Exploring the Effects of Item-Specific Factors in Sequential and IRTree Models

Abstract

Test items for which the item score reflects a sequential or IRTree modeling outcome are considered. For such items, we argue that item-specific factors, although not empirically measurable, will often be present across stages of the same item. In this paper, we present a conceptual model that incorporates such factors. We use the model to demonstrate how the varying conditional distributions of item-specific factors across stages become absorbed into the stage-specific item discrimination and difficulty parameters, creating ambiguity in the interpretations of item and person parameters beyond the first stage. We discuss implications in relation to various applications considered in the literature, including methodological studies of (1) repeated attempt items; (2) answer change/review; (3) on-demand item hints; (4) item skipping behavior; and (5) Likert scale items. Our own empirical applications, as well as several examples published in the literature, show patterns of violations of item parameter invariance across stages that are highly suggestive of item-specific factors. For applications using sequential or IRTree models as analytical models, or for which the resulting item score might be viewed as the outcome of such a process, we recommend (1) regular inspection of data or analytic results for empirical evidence (or theoretical expectations) of item-specific factors; and (2) sensitivity analyses to evaluate the implications of item-specific factors for the intended inferences or applications.

Acknowledgements

This research was performed using the compute resources and assistance of the UW-Madison Center For High Throughput Computing (CHTC) in the Department of Computer Sciences. The CHTC is supported by UW-Madison, the Advanced Computing Initiative, the Wisconsin Alumni Research Foundation, the Wisconsin Institutes for Discovery, and the National Science Foundation, and is an active member of the OSG Consortium, which is supported by the National Science Foundation and the U.S. Department of Energy’s Office of Science.

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Weicong Lyu.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

1.1 Example Constructed Response Items with 0, 1, 2 Scoring, Trends in International Mathematics and Science Study (TIMSS)

Fig. 7. A sample mathematics eighth grade item.

Fig. 8. A sample science eighth grade item.

Appendix B

1.1 Graphical Display of IRTree Models for Several Measurement Applications

Fig. 9. IRTree model of answer changes (Jeon et al., 2017).

Fig. 10. IRTree model of skipping behavior (Debeer et al., 2017).

Fig. 11. IRTree model, three-category Likert rating scale item, verbal aggression dataset (Jeon and De Boeck, 2016).

Appendix C

Sensitivity of Scoring in the Presence of On-Demand Hints to Item-Specific Factors

   Bolsinova et al. (2022) consider a multinomial logistic model for test administration conditions where examinees can request on-demand hints. The approach assigns separate scores to four possible outcomes on each item: (1) incorrect response without hint (\(IH_-\)), (2) incorrect response with hint (\(IH_+\)), (3) correct response with hint (\(CH_+\)), and (4) correct response without hint (\(CH_-\)). The result is scoring that accommodates responses obtained both with and without hints.

   Bolsinova et al. examine and compare a variety of models before arriving at a result that implies scoring \(IH_-\) and \(IH_+\) equivalently, with a higher score for \(CH_+\), and the highest score for \(CH_-\). In other words, incorrect answers, whether with or without a hint, were found equally indicative of lower ability, a correct answer without a hint most indicative of high ability, and a correct answer with a hint somewhere in between.

   Although their scoring functions are arrived at through the application of a multinomial logistic model, the paper also considers an IRTree model, which arguably provides the best psychological representation of the response process associated with on-demand hint selection. Under an IRTree representation, Stage 1 represents the hint use decision (Yes, No), and Stages 2 and 3 represent the final item score (Correct, Incorrect) under conditions of No Hint (Stage 2) or Hint (Stage 3). The different stages can be modeled using different latent traits or the same latent trait. For simplicity, we assume the same latent trait \(\theta \) applies across stages.
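
   Spelled out, the IRTree representation factors the probability of each of the four observable outcomes into the Stage 1 hint decision and the conditional correctness at the stage reached (with Stage 2 the no-hint branch and Stage 3 the hint branch):

$$\begin{aligned} \Pr (CH_-|\theta ,\eta _j)&=\Pr (Y_j=0|\theta ,\eta _j)\Pr (X_j=1|Y_j=0,\theta ,\eta _j),\\ \Pr (IH_-|\theta ,\eta _j)&=\Pr (Y_j=0|\theta ,\eta _j)\left[ 1-\Pr (X_j=1|Y_j=0,\theta ,\eta _j)\right] ,\\ \Pr (CH_+|\theta ,\eta _j)&=\Pr (Y_j=1|\theta ,\eta _j)\Pr (X_j=1|Y_j=1,\theta ,\eta _j),\\ \Pr (IH_+|\theta ,\eta _j)&=\Pr (Y_j=1|\theta ,\eta _j)\left[ 1-\Pr (X_j=1|Y_j=1,\theta ,\eta _j)\right] . \end{aligned}$$

Any dependence between the Stage 1 term and the Stage 2/3 terms beyond \(\theta \) is precisely what a shared item-specific factor \(\eta _j\) induces.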

   We seek here to show how the presence/absence of item-specific factors, despite not being directly observable, has the potential to significantly alter the scoring as estimated using a multinomial logistic model. This illustration demonstrates our claim (in the conclusion of the main paper) that the issue of item-specific factors transcends applications using sequential or IRTree models as analytic models, as this application uses a multinomial logistic model.

   First, we note that when considered in the context of real data, as in Bolsinova et al., there is evidence for the presence of item-specific factors. As already noted in the main paper, Fig. 1 of Bolsinova et al. shows that when evaluated against a single overall latent proficiency, the majority of items show higher estimated difficulties when administered with hints than without. Unless the hints tend to hurt item performance (which we assume is unlikely), such results suggest that respondents requesting hints tend to have lower levels on an item-specific factor (\(\eta _j\)) than respondents not requesting hints. The lower mean level on \(\eta _j\) for respondents who request hints is absorbed into the “with hint” item difficulty estimate, making items often appear more difficult for respondents who request hints.

   Second, we conduct a sensitivity analysis to show how the presence/absence of an item-specific factor can significantly alter the scoring function implied by a multinomial logistic model. Across scenarios with and without item-specific factors, our data generating approach maintains (1) consistent effects in the relative difficulties of the items when administered with and without hints, and (2) consistent effects in the relationship between the latent proficiency \(\theta \) and examinee requests for hints. Because of the psychological plausibility of IRTrees in this context, we conduct our sensitivity analysis using an IRTree model of the form in Fig. 12 as the data generating model. We consider two scenarios: (1) item-specific factors are present across stages (the decision to request a hint is related to the correct response outcome beyond the influence of \(\theta \)) versus (2) item-specific factors are not present across stages (the decision to request a hint is NOT related to the final outcome beyond effects accounted for by \(\theta \)).

Fig. 12. IRTree model of on-demand hints (Bolsinova et al., 2022).

   Under Scenario 1 (item-specific factors present), we simulate data from the following model for 10,000 respondents to 10 items. We assume \(\theta \sim \mathcal {N}(0,1)\). At Stage 1, the stage in which the decision about the hint is made, we simulate for each item independent \(\eta _j\sim \mathcal {N}(0,1)\), and assume \(b_j=0\) for all items. Using the item-specific factor model, we treat the hint decision (\(Y_j=1\) implies hint; \(Y_j=0\) implies no hint) as a stochastic outcome whose probability is maximized when the probability of a correct response without a hint is .5, and which decreases as that probability moves away from .5, either toward 1.0 or 0. Consistent with the reasoning in Bolsinova et al. (2022), this implies that lower \(\theta \) examinees are more inclined to request a hint when they have a higher item-specific factor \(\eta _j\) (i.e., they feel they can achieve a correct answer with the support of a hint), while higher \(\theta \) examinees are more inclined to request a hint when they have a lower item-specific factor \(\eta _j\) (i.e., they are unlikely to answer the item correctly without the hint).

Assuming a maximum probability of hint selection equal to .75, this produces a Stage 1 model of

$$\begin{aligned}\Pr (Y_j=1|\theta ,\eta _j)=.75-\left| \Pr (X_j=1|Y_j=0, \theta ,\eta _j)-.5\right| .\end{aligned}$$

At Stages 2 (where hint is not requested) and 3 (where hint is requested), we model the probability of correct response using the item-specific factor model (\(\Pr (X_j=1|Y_j=0,\theta ,\eta _j)\) for Stage 2, \(\Pr (X_j=1|Y_j=1,\theta ,\eta _j)\) for Stage 3) where the item difficulties at Stage 2 are all \(b_j=0\), but the difficulties at Stage 3 are set at \(b_j=-1.5\), so as to reflect the effect of the hint in reducing item difficulty.
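
   To make the generating model concrete, the following is a minimal simulation sketch. The additive logistic form assumed for the item-specific factor model (success probability \(\text {logit}^{-1}(\theta +\eta _j-b_j)\)), together with the function and variable names and the use of NumPy, are illustrative assumptions rather than the authors' code; the Stage 1 hint probability, the stage-specific difficulties, and the sample sizes follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n_persons=10_000, n_items=10, shared_eta=True,
             hint_b=-1.5, max_hint_prob=0.75):
    """Simulate hint requests and item responses under the IRTree generating model.

    shared_eta=True  -> Scenario 1: the same item-specific factor drives the
                        Stage 1 hint decision and the Stage 2/3 response.
    shared_eta=False -> Scenario 2: the Stage 1 factor is drawn independently
                        of the factor operating at Stages 2 and 3.
    Assumes an additive logistic item-specific factor model,
    P(X_j = 1) = sigmoid(theta + eta_j - b_j), as an illustrative simplification.
    """
    theta = rng.normal(size=(n_persons, 1))            # latent proficiency
    eta_resp = rng.normal(size=(n_persons, n_items))   # eta_j at Stages 2/3
    eta_stage1 = eta_resp if shared_eta else rng.normal(size=(n_persons, n_items))

    # Stage 1: the probability of requesting a hint peaks (at max_hint_prob) when
    # the no-hint success probability is .5 and falls off linearly toward 0 or 1.
    p_correct_no_hint = sigmoid(theta + eta_stage1 - 0.0)   # b_j = 0 without hint
    p_hint = max_hint_prob - np.abs(p_correct_no_hint - 0.5)
    hint = rng.binomial(1, p_hint)

    # Stages 2 (no hint, b_j = 0) and 3 (hint, b_j = -1.5): response correctness.
    b = np.where(hint == 1, hint_b, 0.0)
    correct = rng.binomial(1, sigmoid(theta + eta_resp - b))

    # Four-category outcome per item: 0 = IH-, 1 = IH+, 2 = CH+, 3 = CH-.
    category = np.select(
        [(correct == 0) & (hint == 0), (correct == 0) & (hint == 1),
         (correct == 1) & (hint == 1), (correct == 1) & (hint == 0)],
        [0, 1, 2, 3])
    return theta.ravel(), hint, correct, category
```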

   The only feature distinguishing Scenario 2 from Scenario 1 is whether the \(\eta _j\) at Stage 1 are the same as those at Stages 2 and 3, or whether the \(\eta _{1j}\) at Stage 1 are independent of the \(\eta _{2j}=\eta _{3j}\) that apply at Stages 2 and 3. In both scenarios an item-specific factor is present within each stage, so the psychometric phenomenon occurring within each stage is identical; the only difference is the unobservable dependence of the item-specific factors across stages induced by Scenario 1.
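
   With a generator of this form (such as the hypothetical simulate() sketched above), the two scenarios differ only in whether the Stage 1 factor is re-drawn:

```python
# Scenario 1: a common eta_j operates at Stage 1 and at Stages 2/3.
theta1, hint1, correct1, cat1 = simulate(shared_eta=True)

# Scenario 2: the Stage 1 eta_1j is independent of eta_2j = eta_3j.
theta2, hint2, correct2, cat2 = simulate(shared_eta=False)
```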

   For data generated under each scenario, we fit a multinomial logistic model and report the scoring function for each analysis, as provided in Table 3. The scoring function reports the empirically best score for each score category on each item according to the model.

Table 3 Scoring functions for each scenario, 10 simulation items, multinomial logistic model.

   Note that while the \(IH_+\) and \(CH_-\) scores are fixed for identification, we observe very different forms of relative scoring for \(IH_-\) and \(CH_+\) depending on the presence or absence of item-specific factors. Scenario 1 (item-specific factors present) provides much more credit for \(CH_+\) relative to \(IH_-\), while Scenario 2 (no item-specific factors) gives more credit to \(IH_-\) than \(CH_+\). Importantly, the difference occurs despite the consistent influences of (1) \(\theta \) on hint selection and (2) \(\theta \) on response correctness across scenarios, and thus the difference can be attributed to the role item-specific factors play in simultaneously affecting both hint requests and ultimate response correctness. In the context of the multinomial logistic model, such a result can be attributed to the way in which the item-specific factors introduce correlations between the error terms of the category propensities, a violation of its independence of irrelevant alternatives assumption.
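
   Reproducing the multinomial logistic estimation and the empirically best scores of Table 3 is beyond these code sketches. As a rough, assumption-laden check on the direction of the implied scoring, one can instead regress a single item's four-category outcome on a rest-score proxy for proficiency using a multinomial logit (statsmodels' MNLogit) and compare the category slopes; the rest-score proxy and the hypothetical function implied_scoring_direction() below are simplifications for illustration, not the procedure of Bolsinova et al.

```python
import numpy as np
import statsmodels.api as sm

def implied_scoring_direction(category, correct, item):
    """Multinomial logit of one item's four-category outcome
    (0 = IH-, 1 = IH+, 2 = CH+, 3 = CH-) on a rest-score proficiency proxy.
    Larger slopes on the proxy mark categories treated as more indicative of
    high proficiency; the reference category is IH- (coded 0)."""
    n_items = correct.shape[1]
    rest_score = correct[:, np.arange(n_items) != item].sum(axis=1)
    X = sm.add_constant(rest_score.astype(float))
    res = sm.MNLogit(category[:, item], X).fit(disp=False)
    # res.params has one column per non-reference category (IH+, CH+, CH-);
    # its second row holds the slopes on the rest-score proxy.
    return res.params
```

Comparing the \(CH_+\) slope (relative to the \(IH_-\) reference) across data simulated under the two scenarios gives an informal sense of how the relative treatment of these categories depends on a shared \(\eta _j\).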

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lyu, W., Bolt, D.M. & Westby, S. Exploring the Effects of Item-Specific Factors in Sequential and IRTree Models. Psychometrika 88, 745–775 (2023). https://doi.org/10.1007/s11336-023-09912-x
