
ORIGINAL RESEARCH article

Front. Psychol., 25 September 2023
Sec. Quantitative Psychology and Measurement

Interchangeability between factor analysis, logistic IRT, and normal ogive IRT

  • Department of Business Administration, Kwangwoon University, Seoul, Republic of Korea

In existing studies, it has been argued that factor analysis (FA) is equivalent to item response theory (IRT) and that IRT models that use different functions (i.e., logistic and normal ogive) are also interchangeable. However, these arguments have weak links. The proof of equivalence between FA and normal ogive IRT assumes a normal distribution. The interchangeability between the logistic and normal ogive IRT models depends on a scaling constant, but few scholars have examined whether the usual values of 1.7 or 1.702 maximize interchangeability. This study addresses these issues through Monte Carlo simulations. First, the FA model produces almost identical results to those of the normal ogive model even under severe nonnormality. Second, no single scaling constant maximizes the interchangeability between logistic and normal ogive models. Instead, users should choose different scaling constants depending on their purpose in using a model and the number of response categories (i.e., dichotomous or polytomous). Third, the interchangeability between logistic and normal ogive models is determined by several conditions. The interchangeability is high if the data are dichotomous or if the latent variables follow a symmetric distribution, and vice versa. In summary, the interchangeability between FA and normal ogive models is greater than expected, but that between logistic and normal ogive models is not.

Introduction

When dealing with discrete data (e.g., yes/no or Likert scale data), we can use either factor analysis (FA) or item response theory (IRT). Typical FA is for continuous data, but we can use FA for discrete data by assuming underlying continuous variables. There are various IRT models, most of which use logistic or normal ogive functions. That is, we have three models: FA, logistic IRT, and normal ogive IRT. Assume that we use them for two purposes: estimating parameters from given data and generating data from given parameters. Interchangeability is the degree to which models produce approximate results for each purpose when their apparent differences are removed. For example, if a transformation exists that makes the parameters estimated by one model identical to those estimated by another model from the same data, the two models are fully interchangeable for parameter estimation. If no statistical analysis is better than a random guess at distinguishing between the data generated by two models, the two models are fully interchangeable for data generation. We know that these three models are roughly interchangeable in some cases (Wirth and Edwards, 2007), but we know little about under what conditions and to what extent these models are interchangeable.

FA and IRT have different traditions and seemingly unrelated formulas. Nevertheless, scholars have begun to explain the relationship between FA and IRT (Lord and Novick, 1968; Bartholomew, 1983; Muthén, 1983). An explicit explanation was provided by Takane and de Leeuw (1987), who algebraically proved the equivalence of some FA models with some IRT models and presented formulas for transforming an FA model to an IRT model. These findings led Takane and de Leeuw (1987) to claim “[i]t is clear … that IRT and FA are two alternative formulations of a same model” (pp. 396–397). Modern scholars explain FA and IRT models within the same framework (Wirth and Edwards, 2007). However, few scholars have expressed the cautious view that the claims of model equivalence are overgeneralized to areas beyond what the proofs substantiate. The algebraic approach found in existing studies requires an assumption to obtain a solution, and real-world data rarely satisfy this assumption. This study, through an empirical approach (i.e., Monte Carlo simulations), examines three weak links in the claim that FA and IRT models are interchangeable.

First, Takane and de Leeuw’s proof assumes that latent variables (i.e., factors in FA or abilities in IRT) follow a normal distribution. There is not yet a proof that assumes another distribution or a general proof that does not assume any distribution. This study empirically examines whether Takane and de Leeuw’s proof is robust to moderate or severe normality violations.

Second, Takane and de Leeuw’s proof applies to the normal ogive model rather than the logistic model. The normal ogive function is mathematically unwieldy because it does not have a closed form. Users typically use logistic models, which have simple formulas and produce results approximating those of normal ogive models. What makes the parameters of the two models interchangeable is a scaling constant, often denoted as $D$. The values almost always used are 1.7 and 1.702 (Haley, 1952; Birnbaum, 1968). These values have gone unchallenged for decades, despite the lack of empirical evidence that either is the best choice for its original purpose. This study examines which value maximizes the interchangeability between models.

Third, few scholars have examined whether interchangeability for one purpose guarantees interchangeability for another. When existing studies have stated that one model is equivalent to or exchangeable with another model, they have rarely clarified the purpose of the models. For example, if two models are interchangeable when used to generate data, can we be sure that the two models are also interchangeable when used to estimate parameters? Few scholars have examined interchangeability from various angles.

This study addresses these issues through Monte Carlo simulations. Before explaining the simulations, the basic concepts are explained for readers unfamiliar with the topic.

Literature review

Probability distributions

Skewness and kurtosis

When describing a probability distribution, we often use skewness and kurtosis (Figure 1) in addition to the mean (i.e., $\mu$) and variance (i.e., $\sigma^2$). Skewness is defined as $E((X - \mu)^3)/\sigma^3$, where $E(\cdot)$ is the expected-value operator. Kurtosis is defined as $E((X - \mu)^4)/\sigma^4 - 3$. A distribution with negative kurtosis is platykurtic, and a distribution with positive kurtosis is leptokurtic.
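To make these definitions concrete, the following R snippet (an illustration added here, not part of the article’s simulation code) estimates skewness and kurtosis from a large sample by using the moment-based definitions above:

```r
# Moment-based skewness and (excess) kurtosis, matching the definitions above
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
kurtosis <- function(x) mean((x - mean(x))^4) / sd(x)^4 - 3

set.seed(1)
z <- rnorm(1e6)               # standard normal sample
c(skewness(z), kurtosis(z))   # both approximately 0
```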

FIGURE 1

Figure 1. Skewed, normal, platykurtic, and leptokurtic distributions. Left solid line, positively skewed, skewness = 2; left dashed line, negatively skewed, skewness = −2; right solid line, normal distribution, kurtosis = 0; right dashed line, platykurtic distribution (i.e., Wigner semicircle distribution), kurtosis = −1; right dotted line, leptokurtic distribution (i.e., double exponential or Laplace distribution), kurtosis = 3. All distributions have zero mean and unit variance. The distributions on the left have zero kurtosis, and the distributions on the right have zero skewness.

Normal and logistic distributions

Of particular interest to this study are the normal and logistic distributions (Figure 2). Let $\Phi$ denote the standard normal ogive function: $\Phi(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{X} \exp(-t^2/2)\,dt$. Here, the ogive function is a cumulative distribution function. Let $\Psi$ denote the standard logistic ogive function: $\Psi(X) = 1/(1 + \exp(-X)) = \exp(X)/(1 + \exp(X))$. In this case, the word ogive is usually omitted because typical users employ the logistic function without associating it with a logistic distribution (Savalei, 2006). Both distributions have zero skewness, but the logistic distribution has greater kurtosis (i.e., 1.2) than the normal distribution (i.e., zero).

FIGURE 2

Figure 2. A comparison of normal and logistic distributions. Left, the cumulative distribution (i.e., ogive) functions; right, the probability density functions; solid, the standard normal distribution; dotted, the logistic distribution with zero mean and scaling constant of 1.702.

Statisticians have described distributions with different kurtosis values in several ways, and the normal and logistic distributions fit these descriptions. The logistic function has a thicker center (i.e., X near zero) and tails (i.e., beyond and around ±2) and less thick shoulders (i.e., around ±1) than those of the normal ogive function (Balanda and MacGillivray, 1988). The logistic distribution has a greater propensity to produce outliers than does the normal distribution (Westfall, 2014). The probability density function (PDF) of a normal distribution crosses that of a logistic distribution with the same mean and variance four times (Dyson, 1943). Since a PDF is the derivative of an ogive function, applying Dyson’s description to the ogive function, the logistic function minus the normal ogive function has four local minima or maxima with the signs +, −, +, and – (Figure 3). The different kurtosis values make the two functions different.

FIGURE 3

Figure 3. Difference between normal and logistic distributions when using three constants. Left, difference between logistic and normal ogive (i.e., cumulative) functions; right, difference between logistic and normal probability density functions; horizontal line containing zero: the standard normal distribution; solid line, logistic distribution using the scaling constant of 1.702; dashed line, logistic distribution using the scaling constant of 1.814; dotted line, logistic distribution using the scaling constant of 1.6.

To quantify the difference, the logistic function has a maximum absolute error of approximately 0.01 from the normal ogive function (Haley, 1952); IRT studies describe the value of 0.01 as sufficiently small. For example, citing this value, Birnbaum (1968) argued that “any graph of a [normal ogive function] would serve equally well to illustrate [a logistic function]” (p. 399). Few studies explain this value using a numerical example, allowing the possibility that their readers misconceive it as a relative error of 1% rather than an absolute error. One of the $X$ values at which the maximum absolute difference between the two functions is located is −2.044 if we use Haley’s (1952) scaling constant: $\Phi(-2.044) \approx 0.020$ and $\Psi(1.702 \times -2.044) \approx 0.030$, a relative difference of approximately 50%. Inputting an $X$ value less than −2.044 decreases the absolute difference but increases the relative difference. For example, $\Phi(-3) \approx 0.001$ and $\Psi(1.702 \times -3) \approx 0.006$. Whether this difference is small is debatable.
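These figures are easy to verify in base R, where pnorm is the normal ogive and plogis the logistic ogive (this check is an illustration added here, not from the article’s code):

```r
D <- 1.702
pnorm(-2.044)       # ~0.020
plogis(D * -2.044)  # ~0.030: absolute gap ~0.01, relative gap ~50%
pnorm(-3)           # ~0.0013
plogis(D * -3)      # ~0.0060: smaller absolute gap, larger relative gap

# The maximum absolute gap over a fine grid is ~0.0095, i.e., "about 0.01"
x <- seq(-6, 6, by = 1e-4)
max(abs(plogis(D * x) - pnorm(x)))
```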

Let us review another reference point for determining whether 0.01 is a negligible size. Statisticians have devised dozens of closed-form functions that approximate the normal ogive function, most of which approximate the normal ogive function better than the logistic function does. For example, some have a maximum absolute error of approximately 0.0000001 (Shore, 2005). Unfortunately, these functions are more complex than the logistic function. The better they approximate the normal distribution, the more terms they have (Dombi and Jónás, 2018). Their complexity makes it difficult for humans to remember and understand them, although computers may have no problem calculating them. In summary, the logistic distribution has the advantage of simplicity, but it is not the best approximation of the normal distribution. This property can prevent IRT models that use the two functions from being fully interchangeable.

FA and IRT models

FA models

This study uses a unidimensional FA model. Let us suppose there are $n$ persons and $k$ items. The continuous score of person $i = 1, \ldots, n$ for item $j = 1, \ldots, k$ (i.e., $X^*_{ij}$) has zero mean and unit variance. $X^*_{ij}$ is a linear combination of person $i$’s latent variable (i.e., $\theta_i$) and an error (i.e., $e_{ij}$): $X^*_{ij} = \lambda_j \theta_i + e_{ij}$, where $\lambda_j$ ($\leq 1$) is the factor loading. $\theta_i$ is drawn independently from a normal or nonnormal distribution with zero mean and unit variance. $e_{ij}$ is independently drawn from a normal distribution with zero mean and variance $1 - \lambda_j^2$. We can transform the continuous score $X^*_{ij}$ into a discrete score $X_{ij}$ by using thresholds. If there are two categories, $X_{ij}$ is 0 or 1. Let $\tau_j$ denote the threshold of item $j$. We obtain $X_{ij} = 0$ if $X^*_{ij} < \tau_j$ and $X_{ij} = 1$ otherwise. If there are $C$ categories, $X_{ij}$ is one of $\{0, 1, \ldots, C-1\}$. Let $\tau_{jc}$ denote the $c$th threshold of item $j$, where $c$ is one of $0, 1, \ldots, C$. The first and last thresholds are trivial: $\tau_{j0} = -\infty$ and $\tau_{jC} = \infty$. We obtain $X_{ij} = c$ if $\tau_{jc} \leq X^*_{ij} < \tau_{j,c+1}$.
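As a concrete illustration, the following R sketch generates discrete scores for a single item under this FA model (the parameter values here are arbitrary choices for illustration, not those used in the simulations below):

```r
set.seed(1)
n      <- 10000
lambda <- 0.7                      # factor loading
tau    <- c(-Inf, -0.5, 0.5, Inf)  # thresholds for C = 3 categories

theta  <- rnorm(n)                           # latent variable
e      <- rnorm(n, sd = sqrt(1 - lambda^2))  # error with variance 1 - lambda^2
x_star <- lambda * theta + e                 # continuous score (zero mean, unit variance)
x      <- cut(x_star, breaks = tau, labels = FALSE, right = FALSE) - 1  # discrete: 0, 1, 2
table(x) / n                                 # observed category proportions
```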

IRT models

Normal ogive

If there are two categories, the two-parameter model is $P(X_{ij} = 1 \mid \theta_i) = \Phi(\alpha_j \theta_i - \beta_j)$, where $\alpha_j$ is the slope or discrimination parameter of item $j$ and $\beta_j$ is the location or difficulty parameter of item $j$. We automatically obtain $P(X_{ij} = 0 \mid \theta_i) = 1 - P(X_{ij} = 1 \mid \theta_i)$. If there are $C$ categories, the two-parameter model is $P(X_{ij} \geq c \mid \theta_i) = \Phi(\alpha_j \theta_i - \beta_{jc})$, where $\beta_{jc}$ is the $c$th ($c = 1, \ldots, C-1$) location parameter of item $j$. We obtain $P(X_{ij} = c \mid \theta_i) = P(X_{ij} \geq c \mid \theta_i) - P(X_{ij} \geq c+1 \mid \theta_i)$, where $P(X_{ij} \geq C \mid \theta_i) = 0$.

Logistic

Birnbaum’s (1968) two-parameter model describes dichotomous item scores as $P(X_{ij} = 1 \mid \theta_i) = \Psi(D(\alpha_j \theta_i - \beta_j))$, where $D$ is a scaling constant. When $C$ is greater than 2, the two-parameter model is $P(X_{ij} \geq c \mid \theta_i) = \Psi(D(\alpha_j \theta_i - \beta_{jc}))$.
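The correspondence between the two models for a dichotomous item can be seen directly in R (illustrative parameter values; pnorm and plogis again serve as $\Phi$ and $\Psi$):

```r
alpha <- 1.0; beta <- 0.5; D <- 1.702  # illustrative item parameters
theta <- seq(-3, 3, by = 1)

p_normal   <- pnorm(alpha * theta - beta)         # normal ogive model
p_logistic <- plogis(D * (alpha * theta - beta))  # logistic model with constant D
round(cbind(theta, p_normal, p_logistic), 3)      # near-identical columns
```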

Interchangeability between models

Between logistic and normal ogive models

Scaling constant

The link connecting the logistic and normal ogive models is the scaling constant $D$. The purpose of this constant is to make the results of the two models as nearly equal as possible. The first to use this constant in logistic models was Birnbaum (1968), who used a value of 1.7, citing Haley (1952). Haley’s original value was 1.702 (Camilli, 1994), the value used in this study. Three other values (i.e., 1.6, 1.749, and 1.814) have been proposed as scaling constants. Moreover, further thought reveals that any value between 1.6 and 1.814 could serve as a scaling constant. This study is interested in which value is best but first describes the rationale behind each value and how well each makes the two distributions approximate each other.

Review of existing rationales

Let us start with the old constants. The simplest rationale for approximating the normal and logistic distributions is to equalize the variances of the two distributions, which gives 1.814 (i.e., $\pi/\sqrt{3}$). However, this value creates excessively large nontail errors (Figure 3). Haley’s (1952) idea was to minimize the maximum absolute error, which makes the maximum tail error and the maximum nontail error equal (Camilli, 1994). Therefore, the value he obtained, 1.702, makes the difference between the two distributions oscillate between the maximum absolute errors (i.e., approximately −0.01 and 0.01). Amemiya (1981) compared the two distributions at 13 points: 1.6 is the value he found by trial and error to minimize the difference between the two distributions. Of the 13 points Amemiya (1981) reviewed, 11 are in the nontail region (i.e., X = 0, 0.1, 0.2, …, 0.9, and 1), and only two are in the tail (i.e., X = 2 and 3). These constants are based on relatively simple rationales.

Few scholars have attempted to find a better value than these older ones. An exception is Savalei (2006), who suggested a value of 1.749, which minimizes the Kullback–Leibler (KL) information (Kullback and Leibler, 1951). Savalei did not directly demonstrate how minimizing the KL information relates to maximizing the interchangeability between the logistic and normal ogive models. In any case, her study rests on more sophisticated statistical rationales than those behind the existing values, and our best current knowledge therefore predicts that 1.749 maximizes model interchangeability. An interesting point raised by Savalei is that the KL divergence when the normal ogive function is approximated by the logistic function differs from the KL divergence when the logistic function is approximated by the normal ogive function (the value that minimizes the latter is 1.814). This asymmetry suggests that the departure and arrival models should be made explicit (i.e., from normal ogive to logistic) when discussing interchangeability.

Issues illustrated by graphs

An intuitive understanding can originate from graphs (Figure 3). First, choosing a constant involves a trade-off. The difference between the two ogive functions has four local minima or maxima (i.e., $X \approx -2$, −0.5, 0.5, and 2). Let the errors near −2 and 2 be the tail errors and the errors near −0.5 and 0.5 be the nontail errors. Minimizing one of these comes at the expense of the other. A large constant (e.g., 1.814) reduces the tail errors at the cost of increasing the nontail errors; a small constant (e.g., 1.6) does the opposite. The difference between these constants is how much weight to give to the tail and nontail errors.

The above discussion suggests that using 1.702 makes sense only in a special case, i.e., when we give equal weight to the tail and nontail errors. Birnbaum’s (1968) explanation that using 1.702 ensures that the maximum errors never exceed 0.01 (i.e., an easy number to remember) appeals to human intuition but misses the point. We should give these two errors optimal weights, not equal weights. Nontail regions are more frequent than tail regions in a bell-shaped distribution, so if we weight frequencies, the nontail errors outweigh the tail errors. However, the tail errors may exert a greater impact than the nontail errors do. It is difficult to predict which of these two scenarios is correct in the IRT model.

Second, minimizing differences in the ogive functions differs from minimizing differences in the PDFs. One might think that both questions have the same answer (Cook, 2010). However, a comparison of the left and right sides of Figure 3 reveals that a constant that performs well in one case may perform poorly in the other. In the ogive function, 1.702 has smaller maximum errors than those obtained with 1.6 or 1.814. In the PDF, however, 1.702 has a larger maximum error than 1.6 does. That does not mean that 1.6 is the best. Little is known about which value minimizes the PDF difference between the normal and logistic distributions.

Rationales not examined in existing studies

Little research has examined the difference between the normal and logistic distributions from various angles. Haley (1952) focused on minimizing the maximum absolute error of the ogive functions, leaving two gaps. First, the usual approach to error minimization is to minimize the sum of all errors, whether absolute or squared. Second, which scaling constant minimizes the difference in PDFs has not yet been examined. Study 1 addresses these issues.

In existing studies, the logistic and normal distributions, rather than the logistic and normal ogive models, have been the objects of comparison. The two problems may have different answers. Probability distributions produce continuous outputs, whereas typical IRT models produce discrete outputs. Understanding the logistic and normal distributions is only a means to understanding the difference between logistic and normal ogive models. In addition to an indirect examination through probability distributions, this study directly examines our ultimate concern, that is, the interchangeability between logistic and normal ogive models. Studies 2 through 4 address this issue.

Between the FA and normal ogive models

Takane and de Leeuw’s (1987) proof suggests that we can transform FA models into two-parameter normal ogive models, and vice versa, if the latent variables follow a normal distribution. The formulas for transforming the parameters of the FA model to those of the normal ogive model are as follows:

$\alpha_j = \lambda_j / \sqrt{1 - \lambda_j^2} \quad \mathrm{and} \quad \beta_{jc} = \tau_{jc} / \sqrt{1 - \lambda_j^2}$     (1)

Rearranging these formulas yields the inverse transformation:

$\lambda_j = \alpha_j / \sqrt{1 + \alpha_j^2} \quad \mathrm{and} \quad \tau_{jc} = \beta_{jc} / \sqrt{1 + \alpha_j^2}$     (2)

The formulas are proven only when the latent variables follow a normal distribution. Latent variables in the real world are not exactly normally distributed. If the formulas do not hold even for data that slightly violate normality, it is difficult to apply them to parameter estimation and data generation for practical purposes. On the other hand, if the formulas hold for data that severely violate normality, we should use them more aggressively than we do now.
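A direct implementation of Equations 1 and 2 is straightforward; the sketch below (function names are mine, not from the article’s code) transforms parameters in both directions and confirms that the two transformations are inverses:

```r
# Equation 1: FA parameters -> normal ogive IRT parameters
fa_to_irt <- function(lambda, tau) {
  d <- sqrt(1 - lambda^2)
  list(alpha = lambda / d, beta = tau / d)
}
# Equation 2: normal ogive IRT parameters -> FA parameters
irt_to_fa <- function(alpha, beta) {
  d <- sqrt(1 + alpha^2)
  list(lambda = alpha / d, tau = beta / d)
}

p <- fa_to_irt(lambda = 0.7, tau = c(-0.5, 0.5))
irt_to_fa(p$alpha, p$beta)  # recovers lambda = 0.7 and tau = (-0.5, 0.5)
```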

Approaches to model interchangeability

Isolating model interchangeability from others

We can use an FA or IRT model in two directions: to estimate parameters from given data and to generate data from given parameters. Most users use a model for parameter estimation, so it makes sense for academic research to focus on parameter estimation as well. However, research focusing on parameter estimation may have difficulty isolating whether differences in parameter estimates between models are due to differences in models or other causes. The FA, logistic, and normal ogive models have their own estimation techniques and, in many cases, dedicated software. For example, Wirth and Edwards (2007) used weighted least squares for categorical data (WLS) and modified WLS for categorical data (MWLS) for the FA model, the expectation–maximization (EM) technique for the logistic model, and the Markov chain Monte Carlo estimation technique for the normal ogive model. In this case, it is challenging to isolate whether the differences in the parameter estimates are due to different models or different estimation techniques. Their work also suggests that the effect of different estimation techniques overpowers that of different models. Parameter estimates of the FA model by MWLS were approximate to those of the logistic and normal ogive models but were meaningfully different from those of the FA model via WLS. In summary, the above discussion suggests that isolating model interchangeability from other causes requires using only one model for parameter estimation or omitting it.

Overview of this study

Each study

This study examines model interchangeability from various angles through four studies (Table 1). First, Study 1 examines only probability distributions instead of IRT models, as previous studies did (Haley, 1952; Amemiya, 1981; Savalei, 2006). Second, Study 2 examines model interchangeability in parameter estimation and uses a logistic model for parameter estimation. Third, Studies 3 and 4 examine model interchangeability in data generation and use all three models for data generation. Among these, the Kolmogorov–Smirnov (KS) test performed in Study 3 needs additional explanation.

TABLE 1

Table 1. Summary of each study.

KS test

Study 3 performs the KS test, which tests whether two samples were generated from the same probability distribution. This study proposes a weak criterion and a strong criterion for interchangeability based on the KS test. The weak criterion uses the KS test directly. First, one dataset is generated from each of the two models. Second, the KS test is performed on the two datasets. If there is no statistically significant difference, the two models pass the weak criterion for interchangeability. The original KS test targets continuous data and produces conservative results when applied to discrete data (Conover, 1972). To address this issue, R’s dgof package (Arnold and Emerson, 2011) provides p values calculated by using Monte Carlo simulations.

The strong criterion involves a blind test. For example, suppose that three sets of data are generated from two models, and two persons bet on which two were generated from the same model. One makes random guesses, and the other uses statistical information (i.e., a statistical guess). This person realizes that the KS statistic contains information even if there is no significant difference among the statistics of the three datasets. After comparing the KS statistics of the three datasets, this person guesses that the pair of datasets with the smallest statistic was generated from the same model. If this statistical guess cannot outperform random guessing, the two models are considered fully interchangeable; otherwise, model interchangeability is measured by how much the statistical guess outperforms random guessing.

Study 1: Finding scaling constants from probability distributions

Study 1 addresses which scaling constant minimizes the difference between logistic and normal distributions. Current knowledge on this issue is limited in two ways: First, it covers only the ogive functions of the two distributions, not their PDFs; second, the knowledge is derived by minimizing the maximum error (Haley, 1952), not the total error (i.e., the sum of absolute or squared errors), which is a typical concern in error minimization.

Haley’s choice may have been due to the computational advantage of the former over the latter. A human can find the $D$ that minimizes the maximum error (i.e., $\min_D \max_\theta |\Psi(D\theta) - \Phi(\theta)|$) without a computer. Differential calculus simplifies the calculation, and the known approximate value of 1.814 enables us to obtain a convergent solution in only three iterations (Camilli, 1994). However, minimizing the total error requires integration [e.g., $\min_D \int (\Psi(D\theta) - \Phi(\theta))^2 \, d\theta$]; its formula is not in a closed form and is challenging for a human to compute without a computer. Haley’s (1952) solution of $D = 1.702$ may be due to his limited access to a computer.

This study takes a computational approach. First, the scaling constants that minimize the maximum absolute error of the two ogive functions and PDFs [i.e., $\min_D \max_\theta |\psi(D\theta) - \phi(\theta)|$ for the PDFs] are obtained, where $\psi$ is the PDF of the logistic distribution and $\phi$ is the PDF of the standard normal distribution. Second, the scaling constants that minimize the sum of the absolute errors [i.e., $\int |\Psi(D\theta) - \Phi(\theta)| \, d\theta$ and $\int |\psi(D\theta) - \phi(\theta)| \, d\theta$] and of the squared errors [i.e., $\int (\Psi(D\theta) - \Phi(\theta))^2 \, d\theta$ and $\int (\psi(D\theta) - \phi(\theta))^2 \, d\theta$] are obtained.

Methods

Obtaining the value that minimizes the maximum absolute error is a double optimization problem (i.e., $\min_D$ and $\max_\theta$). First, this study obtains the maximum absolute error (i.e., $\max_\theta$) by using R’s (R Core Team, 2023) optimize function. Second, the $D$ value that minimizes these maximum absolute errors (i.e., $\min_D$) is obtained by a brute-force approach. This study computes the maximum absolute error for each $D$ on a grid with 0.0001 spacing between 1.6 and 1.8 and then reports the value that minimizes it.

Obtaining the value that minimizes the sum of the absolute or squared errors involves brute-force optimization. To obtain the integrals, this study uses R’s integrate function, which relies on QUADPACK routines (Piessens et al., 2011). All code used in this study is publicly available.1
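The ogive part of this search can be reproduced compactly as follows (a sketch under the stated grid; for robustness, the inner maximization here uses a fine grid in place of the optimize call):

```r
xs <- seq(-8, 8, by = 1e-3)  # evaluation grid for theta
max_abs_err <- function(D) max(abs(plogis(D * xs) - pnorm(xs)))
sq_err <- function(D) {
  integrate(function(x) (plogis(D * x) - pnorm(x))^2, -Inf, Inf)$value
}

Ds <- seq(1.6, 1.8, by = 1e-4)          # brute-force grid for D
Ds[which.min(sapply(Ds, max_abs_err))]  # ~1.7017, Haley's constant
Ds[which.min(sapply(Ds, sq_err))]       # constant minimizing the total squared error
```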

Results and discussion

Results

Table 2 shows the results: 1.7017 represents Haley’s (1952) constant to four decimal places (Camilli, 1994).

TABLE 2

Table 2. Constants that minimize the difference between the two distributions (Study 1).

Explanation of unexpected results

The constants obtained from the ogive functions are close to one another, but the constants obtained from the PDFs are not. The ogive function difference essentially has only two components, the tails and the nontails, and minimizing the maximum error is equivalent to equalizing the maximum tail error and the maximum nontail error. The PDF difference has three components: the tails, shoulders, and center (Figure 3). Minimizing the maximum error simply involves minimizing the error in the area where the maximum error is located. According to further analysis, the maximum error is not in the tails for $D$ values ranging between 1.6 and 1.8. If $D$ is less than 1.677, the shoulders have the maximum error; otherwise, the center has the maximum error. Therefore, minimizing the maximum error between $D$ = 1.6 and 1.677 minimizes the errors in the shoulders; the errors in the tails and center are irrelevant.

However, minimizing the sum of errors considers all areas. For example, minimizing the sum of the absolute errors involves minimizing the area between the curve and the line at y = 0 in the right graph of Figure 3. This minimization requires that each area be of a similar size. D = 1.6 creates an excessively small center area and excessively large tail areas; a constant greater than 1.6 increases the center area and decreases the tail areas.

Need for additional simulations

One may question the generalizability of Study 1. Study 1 considered only probability distributions and did not examine IRT models by using these distributions. Although the probability distributions are continuous, IRT models use them to generate or explain discrete item scores. In the process, a new factor that we are unaware of may intervene. Study 2 addresses this issue.

Study 2: Finding scaling constants from IRT models

In existing studies (Haley, 1952; Amemiya, 1981; Savalei, 2006) and in Study 1, only normal and logistic distributions have been examined. This approach is indirect because IRT users are interested in models, and probability distributions are merely a means of describing models. There is no self-evident criterion as to which of these studies’ rationales, such as the maximum error, equal variances, or the KL information, are best applied to the interchangeability of IRT models. Study 2 addresses this issue by directly examining normal ogive and logistic models.

Methods

The simulation has 2 conditions. Namely:

• Number of categories (C): dichotomous = 2 and polytomous = 5.

Each condition has 100,000 iterations. For each iteration, unidimensional data with a sample size of 10,000 and 10 items are generated by using the two-parameter normal ogive model. To increase the realism of the simulation, this study randomizes parameter values by drawing them independently for every item in every iteration. The discrimination parameter of each item (i.e., $\alpha_j$) is drawn from a uniform distribution between 0.6 and 1.4. These values correspond to factor loadings (i.e., $\lambda_j$) of 0.514 and 0.814 (Equation 2), respectively. These values are approximately 0.5 and 0.8, respectively, which Yang and Green (2015) describe as typical factor loadings in FA.

In the dichotomous condition, the location parameter of each item (i.e., $\beta_j$) is drawn from a uniform distribution between −1.3 and 1.3. Using these values makes the expected proportion of each response category (i.e., 0 and 1) range between 0.179 and 0.821 under the normal distribution. In the polytomous condition, this study creates the four location parameters of each item (i.e., $\beta_{j1}$, $\beta_{j2}$, $\beta_{j3}$, and $\beta_{j4}$) in two steps. First, four values are drawn from a uniform distribution between −2 and 2 and then sorted in ascending order. Second, this study subtracts 0.2 from the first, subtracts 0.1 from the second, adds 0.1 to the third, and adds 0.2 to the fourth. These values make the average proportion of each category (i.e., 0, 1, 2, 3, and 4) 0.184, 0.193, 0.245, 0.193, and 0.184, respectively, under the normal distribution. However, these numbers are only averages, and the proportions of each dataset vary.

The latent variable of each person (i.e., $\theta_i$) is independently drawn from a standard normal distribution. The $\theta_i$ values are then substituted into the normal ogive model with the preceding parameters to generate data. The data are used to estimate the parameters of the logistic model via the EM technique and the mirt package (Chalmers, 2012). Then, in a manner similar to Study 1, the value of $D$ that minimizes the difference between the parameters used to generate the data and the estimated parameters is determined.
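One iteration of the dichotomous condition can be sketched as follows (a simplified illustration of the procedure above, not the article’s own code; mirt’s 2PL reports logistic slopes without $D$, so dividing the estimated slopes by the generating $\alpha_j$ indicates the implied scaling constant):

```r
library(mirt)
set.seed(1)
n <- 10000; k <- 10
alpha <- runif(k, 0.6, 1.4)   # discrimination parameters
beta  <- runif(k, -1.3, 1.3)  # location parameters
theta <- rnorm(n)             # latent variables

# Generate dichotomous data from the two-parameter normal ogive model
p    <- pnorm(outer(theta, alpha) - matrix(beta, n, k, byrow = TRUE))
data <- (matrix(runif(n * k), n, k) < p) * 1

# Estimate the logistic model via the EM technique
fit <- mirt(as.data.frame(data), 1, itemtype = "2PL", verbose = FALSE)
est <- coef(fit, simplify = TRUE)$items  # logistic slopes in column "a1"
est[, "a1"] / alpha                      # implied scaling constant per item
```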

Results and discussion

Results

Table 3 shows the results.

TABLE 3

Table 3. Optimal scaling constant when the data generated by a normal ogive model are estimated by a logistic model (Study 2).

Explanation of unexpected results

Values greater than 1.702

A reasonable expectation is that the nontails are more important in parameter estimation than the tails because the nontails are more frequent. Study 1 also suggested that the scaling constants that minimize the PDF difference between the normal and logistic distributions (i.e., 1.627 and 1.629) are less than 1.702. However, Study 2 produced results in the opposite direction. The scaling constants that make the parameter estimates of the logistic model best approximate those of the normal ogive model are greater than 1.702: the values are 1.739 in the dichotomous condition and 1.753 in the polytomous condition according to the squared error criterion. A possible explanation is that the tails have a sufficiently large impact on the parameter estimates to offset their low frequency. The tails contain outliers, “which usually exert disproportionate influence” (Aguinis et al., 2013, p. 2) on parameter estimates. For example, adding an outlier to randomly generated two-variable data can turn a near-zero correlation into a near-perfect correlation (Trauth, 2017). As reviewed earlier (Figure 3), a large scaling constant reduces the tail errors rather than the nontail errors.

Number of categories

IRT users have been using the same scaling constant regardless of the number of categories of items (i.e., dichotomous or polytomous). However, the results suggest that the scaling constant that produces the most approximate parameter estimates to the normal ogive model differs between dichotomous and polytomous data. A possible explanation is that categories of dichotomous data do not specialize in either the tails or nontails, but those of polytomous data do. For example, in data with response categories of 0 and 1, both categories have one tail and share the nontail. Therefore, low weights for the tails do not severely increase the estimation error for 0 and 1. However, in data with response categories of 0, 1, 2, 3, and 4, only categories at both ends (i.e., 0 and 4) have tails, and the rest (i.e., 1, 2, and 3) focus on the nontails. Therefore, low weights for the tails can increase the estimation error for 0 and 4.

Need for additional simulations

Studies 1 and 2 focused only on the scaling constant. Study 3 compares the three models, including the FA model.

Study 3: Model interchangeability in terms of the KS test

Study 3 answers three questions. First, it examines whether Takane and de Leeuw’s (1987) proof that the FA model is equivalent to the normal ogive model holds even if the data moderately or severely violate normality. Second, Study 3 examines which scaling constants make the logistic model produce results that best approximate those of the normal ogive model. Third, Study 3 examines how well the results of the logistic model approximate those of the FA and normal ogive models when using the best-performing scaling constant.

Methods

Simulation design and data generation

The simulation has a 2 × 5 full factorial design with 10 conditions. Namely:

• Number of categories (C): dichotomous = 2 and polytomous = 5, and

• Distribution of latent variables: normal, skewed, platykurtic, leptokurtic and severe.

Each of the five distributions has the following skewness (first value) and kurtosis (second value): normal (0, 0), skewed (2, 0), platykurtic (0, −1.2), leptokurtic (0, 7), and severe (3, 21). Existing studies (Lee et al., 2020) used values of (2, 7) for moderate normality violation and values of (3, 21) for extreme violation. This study splits the first pair into the skewed and leptokurtic conditions and uses the second pair as it is. Moreover, the platykurtic condition is added by using the skewness and kurtosis of the uniform distribution. However, this condition’s distribution differs from the uniform distribution because this study creates nonnormal distributions by transforming the normal distribution (Fleishman, 1978).

The sample size (i.e., n) is 10,000. This study focuses on large samples because preliminary research suggests that small samples produce irregular results, probably due to sampling error. This study performs 20,000 iterations for each condition. Each iteration uses the parameter values described in Study 2. Each person’s latent variable is independently drawn from a distribution with zero mean, unit variance, and predetermined skewness and kurtosis. Each model produces two sets of item scores. Item scores generated from the same model are identical except for randomness. For example, two item scores from the FA model are generated by using the same factor loading, thresholds, and latent values; the only difference is the errors.

Determining interchangeability

Only the first of the two item scores of each model is used for comparison between models. The KS statistic and p value are recorded for each pair of the 13 models (i.e., 1 FA, 1 normal ogive, and 11 logistic). If the p value between any pair of models is less than 0.05, those models fail to satisfy the weak criterion for interchangeability.

An index, denoted as b, quantifies how much better a statistical guess is than a random guess in a blind test. b ranges between −1 (i.e., always incorrect) and 1 (i.e., always correct), where 0 is the expected value of random guessing. The blind test consists of two comparisons. For example, suppose we compare Model A and Model B. Each model has two parallel item scores: Model A has A1 and A2, and Model B has B1 and B2. The first comparison focuses on A1. The KS statistic between A1 and B1 is subtracted from the KS statistic between A1 and A2. If the resulting value is negative (i.e., the parallel pair has the smaller statistic), the statistical guess is correct, and b is 0.5. If the value is positive, the statistical guess is incorrect, and b is −0.5. If the value is zero, no statistical guess is made, and b is 0. The second comparison focuses on B1. The KS statistic between B1 and A1 is subtracted from the KS statistic between B1 and B2. If the resulting value is negative, b is 0.5; if it is zero, b is 0; and if it is positive, b is −0.5.
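The following sketch implements this index for one model pair (variable names are illustrative; as noted above, dgof::ks.test can substitute for the base ks.test with discrete data):

```r
ks_stat <- function(x, y) suppressWarnings(unname(ks.test(x, y)$statistic))

blind_b <- function(A1, A2, B1, B2) {
  score <- function(d) if (d < 0) 0.5 else if (d > 0) -0.5 else 0
  # A guess is correct when the parallel pair has the smaller KS statistic
  b1 <- score(ks_stat(A1, A2) - ks_stat(A1, B1))  # comparison focusing on A1
  b2 <- score(ks_stat(B1, B2) - ks_stat(B1, A1))  # comparison focusing on B1
  b1 + b2  # ranges from -1 to 1; 0 is the expected value of random guessing
}
```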

Results and discussion

Results

There was no case showing a statistically significant difference in the KS test. The KS statistics between scores generated by using the same model (i.e., parallel scores) were 0.0054 or 0.0055 on average. The KS statistic between the FA and normal ogive models was 0.0055. The KS statistic between the FA and logistic models ranged from 0.0062 to 0.0073, depending on the scaling constant used, and the KS statistic between the normal ogive and logistic models also ranged from 0.0062 to 0.0073. In summary, all models pass the weak criterion for interchangeability.

Table 4 shows the blind test results, i.e., the strong criterion of interchangeability. The comparison between the FA and logistic models is omitted except for the case of D = 1.70 because it has a pattern similar to that between the normal ogive model and the logistic model. Even in the latter comparison, the results for D = 1.65 and 1.66 are omitted to save space. First, the results of the FA and normal ogive models have near-zero values, suggesting that the data generated by the two models are indistinguishable. Second, the results of the normal ogive and logistic models vary depending on the conditions and scaling constants used. This study considers 0.05 as the criterion for passing the blind test. In dichotomous data, the logistic models using the optimal scaling constant under the normal, platykurtic, and leptokurtic conditions meet this criterion, but none of the logistic models meet the criterion under the skewed and severe conditions. In polytomous data, none of the logistic models meet this criterion.

TABLE 4

Table 4. Statistical guessing vs. random guessing (Study 3).

Explanation of unexpected results

Equivalence of the FA and normal ogive models

A reasonable expectation is that the FA and normal ogive models will produce distinguishable data, at least under severe violations of normality. However, even under the severe violation condition, the blind test results produced negative values (i.e., −0.005 and −0.001), indicating that the data generated by the two models cannot be distinguished via the KS statistic. This result provides empirical evidence that the two models are equivalent regardless of normality violations.

Asymmetric use of scaling constants

For decades, IRT users have applied the same constant for parameter estimation and data generation. Therefore, a reasonable expectation is that the scaling constant that performed best in Study 2 would also perform best in Study 3. However, except for the dichotomous platykurtic condition, the models that produced the best results in the blind test were those that used a scaling constant (i.e., between 1.68 and 1.74) smaller than the optimal constants from Study 2 (i.e., 1.734 to 1.753 depending on the condition). A possible explanation is that the tails are less important in data generation than in parameter estimation. Suppose two persons A and B have $\theta_A = 2$ and $\theta_B = 4$. Both A and B belong to the tail, but B is more extreme than A. In parameter estimation, B has a greater influence than A. However, in data generation, if the location parameter is not extreme, A and B generate the same data. Existing studies have emphasized the disproportionate influence of outliers on parameter estimation but have rarely mentioned their influence on data generation (Aguinis et al., 2013). Different mechanisms operate in parameter estimation and data generation.

Need for an additional simulation

Study 3 suggests that the distribution of latent variables affects the optimal scaling constant but does not answer why. Study 3 analyzes only the output of data generation rather than its process, thereby providing no clue as to why different distributions (e.g., normal vs. skewed) perform differently. Study 4 addresses this issue.

Study 4: How parameters affect the data generated by each model

Study 4 addresses the unanswered questions raised by Study 3. First, Study 4 compares data generated by the FA, normal ogive, and logistic models for given parameters. To simplify the comparison, Study 4 uses the data generated by the normal ogive model as reference data and assesses the difference between the data generated by other models and the reference data. Study 3 had randomized parameters, but Study 4 uses predetermined parameters. This design enables us to observe how changes in parameters affect the data generated by each model.

Second, Study 4 compares data generated by the logistic models using different scaling constants under various distributions. To simplify the comparison, Study 4 uses only the largest (i.e., 1.74), the smallest (i.e., 1.68), and the middle (i.e., 1.71) scaling constants that produced good performance in Study 3. Moreover, Study 4 focuses on dichotomous data to save space.

Methods

The simulation has 5 × 2 × 5 = 50 conditions. Namely:

• Distribution of latent variables: normal, skewed, platykurtic, leptokurtic and severe;

• Discrimination parameter ($\alpha$) values: 0.6 and 1.4;

• Location parameter ($\beta$) values: −1, −0.5, 0, 0.5, and 1.

Each condition is used to generate one million latent variables with zero mean, unit variance, and predetermined skewness and kurtosis. These latent variables are input into the FA, normal ogive, and logistic models (with scaling constants of 1.68, 1.71, and 1.74) to generate single-item dichotomous data. To simplify the comparison, the reported value is the proportion of response category 1 generated by the FA and logistic models minus the proportion of response category 1 generated by the normal ogive model.
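One cell of this design can be sketched as follows (an illustration under the normal condition, reusing Equation 2 to give the FA model matching parameters; the numbers produced are illustrative, not the article’s results):

```r
set.seed(1)
n <- 1e6; alpha <- 1.4; beta <- 0.5  # one of the 50 parameter combinations
theta <- rnorm(n)                    # normal condition

# Reference proportion from the normal ogive model
p_no <- mean(runif(n) < pnorm(alpha * theta - beta))

# Logistic models with the three scaling constants
p_log <- sapply(c(1.68, 1.71, 1.74),
                function(D) mean(runif(n) < plogis(D * (alpha * theta - beta))))

# FA model with parameters transformed via Equation 2
lambda <- alpha / sqrt(1 + alpha^2); tau <- beta / sqrt(1 + alpha^2)
x_star <- lambda * theta + rnorm(n, sd = sqrt(1 - lambda^2))
p_fa   <- mean(x_star >= tau)

round(c(FA = p_fa - p_no, D1.68 = p_log[1] - p_no,
        D1.71 = p_log[2] - p_no, D1.74 = p_log[3] - p_no), 4)
```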

Results and discussion

Results

Figure 4 summarizes the results. The FA model produces data that are almost indistinguishable from those of the normal ogive model across all distributions and parameter values. However, logistic models produce distinguishable data according to scaling constants, distributions, and parameter values.

FIGURE 4

Figure 4. How parameters affect the data generated by each model (Study 4). Top, discrimination parameter (alpha) = 0.6; bottom, discrimination parameter = 1.4.

Explanation of unexpected results

Differences between the FA and logistic models

The way the FA model approximates the normal ogive model differs from the way the logistic model approximates it. The FA model generates data approximating those of the normal ogive model for any combination of parameters. The logistic model exploits the symmetry of the distribution. For example, consider the model with D = 1.71 under the normal condition at the top of Figure 4. If the location parameters are −1 and −0.5, this model generates more responses in category 1 than does the normal ogive model. However, if the location parameters are 0.5 and 1, this model generates fewer responses in category 1 than does the normal ogive model by approximately the same amount. That is, the logistic model approximates the normal ogive model by making the excess on one side offset the shortage on the other side. However, the excess does not exactly match the shortage. For example, for the model with D = 1.68 under the platykurtic condition at the top of Figure 4, there are four excesses and one shortage.

Skewness and polytomous data

The excess-offsets-shortage approach works well if the following conditions are met: (1) the distribution has zero skewness (i.e., is symmetric) and (2) the data are dichotomous. Nonzero skewness of a distribution causes a mismatch between excess and shortage. Figure 4 shows that a skewed distribution makes the data difference between the logistic and normal ogive models asymmetric and irregular. Study 4 also provides clues as to why the blind test results of polytomous data in Study 3 were worse than those of dichotomous data. If there are only two response categories, excess on one side offsets shortage on the other, but if there are multiple response categories, it is difficult for this mechanism to work.

Effects of parameters

Figure 4 suggests that the location and discrimination parameters can affect the optimal D. There can be main effects as well as interactions between them.

Overall discussion

Summary of the results

This study examined the interchangeability between the FA model and two IRT models (i.e., logistic and normal ogive). First, the equivalence of the FA and normal ogive models proved under the assumption of normality (Takane and de Leeuw, 1987) is extremely robust, even to severe normality violations. This study compared the results of the two models from various angles but could not find a difference beyond the level attributable to sampling error. Second, the interchangeability of the logistic and normal ogive models is complex and depends on several variables. No single scaling constant maximizes the interchangeability of the two models. The best-performing scaling constant depends on the purpose of using the model (e.g., parameter estimation or data generation), the number of response categories (i.e., dichotomous or polytomous), and the distribution of latent variables (e.g., normal or severely nonnormal).

Interchangeability between FA and normal ogive models

Exploiting the transformation formula

The results suggest that we should exploit Takane and de Leeuw’s (1987) transformation formulas more aggressively than we currently do. If two models give us the same result, we can pick whichever is more convenient. For example, one (e.g., FA) may be faster to compute than the other (e.g., normal ogive), or its software may be easier to obtain. In particular, for parameter estimation, the FA model is likely to be much faster than the normal ogive model due to the functional complexity of the normal distribution. In this case, we can use the easier or faster model to obtain the result and then transform it into the other model’s result. The interchangeability between the two models can also be used to improve pedagogical effectiveness. For example, some students are more familiar with the FA model than with the IRT model, and others are the opposite. Illustrating an unfamiliar model with a familiar one helps students learn the new model easily. We have underused Takane and de Leeuw’s formulas relative to their potential.

Renaissance of the normal ogive model

The first IRT models used the normal ogive function (Thurstone, 1928; Lord, 1952; van der Linden, 2016). The later-developed logistic model has become more popular than the established normal ogive model because the former is much easier and faster to compute. For example, two decades ago, computing the normal ogive model was “time-consuming” (Baker and Kim, 2004, p. 16) even on a modern computer of the time, whereas the logistic model could be computed on a pocket calculator. Similar differences are still reported today. For example, a comparison of commonly used R packages showed that estimating the normal ogive model by using the MCMCirtKD function in the MCMCpack package was 19 to 512 times more time-consuming than estimating the logistic model using the mirt package (Chalmers, 2012). However, the results suggest that the normal ogive model need no longer be more time-consuming than the logistic model. As described earlier, after obtaining the results from the FA model, we can use Takane and de Leeuw’s formulas (Equation 1) to transform them into those of the normal ogive model, which is likely to take a similar amount of time as estimating the logistic model. The irt.fa and fa2irt functions in the psych package (Revelle, 2022) can facilitate this transformation.

Even without this transformation, some software computes the normal ogive model in a fraction of the time, especially for Bayesian IRT (Albert, 1992; Fox, 2010). Furthermore, advances in hardware and software will further narrow the computation-time gap between the two models to a negligible level. As this mutable issue (i.e., computation speed) diminishes, an immutable one arises. The results suggest an advantage of the normal ogive model over the logistic model: the former is fully interchangeable with the FA model, while the latter is not. Therefore, it is easier for the FA and normal ogive models to constitute a generalized model than it is for the FA and logistic models. We also need to be consistent. Statistical models conventionally assume a normal distribution, so describing item responses by using a normal distribution is consistent with this convention (Savalei, 2006). The results suggest that the normal ogive model should be more highly regarded and used more often than it currently is.

Notably, this study did not examine the performance of the three models, so it is inappropriate to conclude that one is better than the other. The results have little to say about which model should be chosen by users who use the models in isolation. However, the results do suggest that users who value interchangeability between models have reason to prefer FA and normal ogive IRT models over logistic IRT models.

Interchangeability between the normal ogive and logistic models

Repositioning the scaling constant

From constant to variable

A constant is fixed and unchanging. For example, π is 3.1416 anytime and anywhere. Scholars have used the constant D = 1.702 in this sense. Just as no one challenges π = 3.1416, few scholars have proposed that D should be a different value. An exception is Savalei (2006), who argued that the scaling constant she obtained, 1.749, should replace the existing constant, not that these constants should be chosen on a case-by-case basis. In other words, existing studies have assumed that one constant fits all. The results present a new perspective. The optimal scaling constant depends on the purpose of using the model and the number of response categories, and users should choose a suitable scaling constant based on these variables. The scaling ‘constant’ is a variable, not a constant.

Omitting D from the model

Redefining the nature of the scaling constant raises the need to reconsider its role in the logistic model. A model exists to explain the most with the least, so it should not include an element that does not contribute to explanatory power. From this perspective, the scaling constant D is unnecessary in a logistic model because including or omitting D does not affect the model’s explanatory power. Today, some scholars omit D from their logistic models, and others include it. Even the latter scholars seem to do so reluctantly to maintain historical consistency. For example, Wirth and Edwards (2007) argued that “[i]t is important to note that D is something of a historical artifact” (p. 61). Baker and Kim (2004) also stated that “[t]he use of the 1.702 multiplier is a carry over from an earlier time when the normal ogive was the standard” (p. 16). In any case, the mixed use of the two styles creates inconvenience and confusion for users, e.g., by causing users to make mistakes when comparing studies. We should have a clear answer as to whether it is desirable to include D in the logistic model and unify the two styles accordingly.

The results answer this question by showing that D is not only unnecessary but also changing. Let us suppose some textbook authors choose to describe the logistic model in the traditional Birnbaum style of including D. Perhaps their plan is to simply state that D is 1.7 or 1.702, which takes up very little space. However, textbooks that maintain this style may be criticized for holding an outdated view whenever new research suggests that D should be something other than 1.702. That said, a lengthy discussion of D in the first introduction to a model can be disruptive to the flow. An easy solution to this dilemma is to omit D from the model.

Segmenting users

Even if we omit D from the logistic model, we should not ignore the original intent of its inclusion. Whereas users who use the logistic model alone may be uninterested in its interchangeability with other models (i.e., FA or normal ogive), others (e.g., meta-analysts) may be interested in accurately comparing the results obtained from different models. This study argues for segmenting users based on these different needs. Most users find it convenient to use a model with D omitted. Those who do care about interchangeability should care about it in a more sophisticated way than today’s approach of relying on “a historical artifact” (Wirth and Edwards, 2007, p. 61). Therefore, we need to know what factors influence the optimal scaling constant, regardless of whether we include it in the model itself or perform a separate transformation.

Factors affecting interchangeability

As is often the case with distributions with different kurtosis (Dyson, 1943), the logistic function deviates four times from the normal ogive function (i.e., at $X \approx -2$, −0.5, 0.5, and 2). Since these two functions are bilaterally symmetric, we can simplify the four errors into two types, the tails (i.e., $X \approx -2$ and 2) and nontails (i.e., $X \approx -0.5$ and 0.5). No scaling constant can decrease both errors; decreasing one increases the other. The best approach is to weight the two in proportion to their importance. However, the fact that their relative importance depends on several conditions complicates matters.

The conditions that increase the relative importance of the tail errors compared to the nontail errors are (1) if the model is used for parameter estimation and (2) if the data are polytomous. The conditions that decrease the relative importance of the tail errors are (3) if the model is used for data generation and (4) if the data are dichotomous. The more important the tail errors are, the larger the scaling constant that should be used, and vice versa.

The conditions increasing the interchangeability between the logistic and normal ogive models are (A) dichotomous data, (B) zero skewness of the latent variables, and (C) nonpositive kurtosis of the latent variables. The conditions decreasing the interchangeability between the two models are (D) polytomous data, (E) nonzero skewness of the latent variables, and (F) positive kurtosis of the latent variables.

Purpose of using a model

Users typically implement a model for parameter estimation. Parameter estimation is sensitive to the tails, and the logistic model assumes heavier tails than the normal ogive model does. Therefore, the logistic model produces parameter estimates with larger absolute values than those of the normal ogive model. Using a scaling constant greater than 1.702 (e.g., 1.739 or 1.753) counterbalances the logistic model’s tendency to overinterpret the tails.

If a model is used to generate data, the optimal scaling constant depends on what the data are to be used for. First, if users plan to use the data for parameter estimation, they should choose a scaling constant greater than 1.702 (e.g., 1.739 or 1.753), just as they did for parameter estimation. Using these scaling constants undergenerates extreme data, which offsets the tendency to overinterpret extreme data in parameter estimation. Second, if users generate data for purposes other than parameter estimation, they should use smaller scaling constants than those just mentioned. The goal is not to undergenerate extreme data but to generate data approximate to those of a normal ogive model.

Dichotomous or polytomous data

Birnbaum (1968) originally proposed the scaling constant for dichotomous data. Subsequent scholars used the same scaling constant when proposing logistic models for polytomous data, i.e., graded response models (De Ayala, 1994), and few have suggested that a different scaling constant should be used. The present results suggest that the optimal scaling constant for dichotomous data is not optimal for polytomous data: a larger scaling constant should be used for polytomous data. The difference between the optimal scaling constants of the two data types varied across studies: 0.014 in Study 2 and 0.02 in Study 3 (in the case of a normal distribution).

The importance of the tails differs between dichotomous and polytomous data. In dichotomous data, every response category has its own tail. For example, response category 0 has a left tail, and response category 1 has a right tail. An overgeneration of response category 0 on the left tail can fully cancel out an undergeneration of that response category on the right tail. Therefore, in dichotomous data, nontail errors are relatively more important than tail errors. Moreover, having overgeneration offset undergeneration helps logistic models to be interchangeable with normal ogive models.

However, response categories in polytomous data are specialized to either the tails or the nontails. For example, response categories 0 and 4 have tails, but response categories 1, 2, and 3 do not. An overgeneration of response category 0 on the left tail cannot fully cancel out an undergeneration of that response category on the right tail. Therefore, there is a strong need to reduce errors in the tails to which response categories 0 and 4 belong. Moreover, since overgeneration can rarely offset undergeneration, the logistic model is less interchangeable with the normal ogive model in polytomous data than in dichotomous data.
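
This contrast can be made visible with a sketch of a five-category graded response model (all parameter values are illustrative; the cumulative boundaries use D = 1.702). Only the extreme categories 0 and 4 occupy a tail, whereas categories 1–3 peak in the middle of the latent scale:

theta <- seq(-4, 4, by = 0.01)
a <- 1.5; b <- c(-2, -0.7, 0.7, 2); D <- 1.702                # illustrative parameters
Pstar <- sapply(b, function(bk) plogis(D * a * (theta - bk))) # P(Y >= k), k = 1, ..., 4
P <- cbind(1 - Pstar[, 1],                      # category 0: left tail only
           Pstar[, -ncol(Pstar)] - Pstar[, -1], # categories 1-3: nontail regions
           Pstar[, ncol(Pstar)])                # category 4: right tail only
matplot(theta, P, type = "l", lty = 1, xlab = "theta", ylab = "Category probability")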

Distribution of latent variables

Latent variables are often assumed to follow a normal distribution. Few scholars have suggested that a violation of the normality assumption can affect the interchangeability between the logistic and normal ogive models or the optimal scaling constant. This study examined the effect of nonnormality by dividing it into skewness and kurtosis (a sketch of generating such nonnormal latent variables appears at the end of this subsection). First, a normality violation decreases the interchangeability between the logistic and normal ogive models (Study 3). An exception is the platykurtic distribution, for which the results are mixed. Among normality violations, violations of skewness have a greater effect than violations of kurtosis. The logistic and normal distributions have different kurtosis, but both have zero skewness (i.e., they are symmetric). Symmetry helps the logistic model produce results that approximate those of the normal ogive model because a surplus in one tail cancels out a shortage in the other tail; asymmetry creates a mismatch between surplus and shortage.

Second, latent variables that follow a nonnormal distribution require different scaling constants than those that follow a normal distribution. Scaling constants larger than those optimal under normality produced good results for the platykurtic distribution, whereas smaller scaling constants produced good results for the severely nonnormal distribution. For skewed and leptokurtic distributions, the results are mixed.
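
As promised above, here is a minimal R sketch of generating a nonnormal latent variable with the power method of Fleishman (1978); the coefficients below are commonly reported values that approximate skewness 2 and excess kurtosis 7 and are used here only as an assumed example:

set.seed(2)
z <- rnorm(1e5)                                # standard normal deviates
fb <- 0.761585; fc <- 0.260022; fd <- 0.053072 # Fleishman coefficients (assumed values)
fa <- -fc                                      # keeps the mean of theta at zero
theta <- fa + fb * z + fc * z^2 + fd * z^3     # skewness ~ 2, excess kurtosis ~ 7
round(c(mean(theta), sd(theta)), 2)            # approximately 0 and 1 by construction

The achieved skewness and kurtosis can be checked with, e.g., psych::skew(theta) and psych::kurtosi(theta) from the psych package (Revelle, 2022).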

Concluding remarks

What we see may not be what we get. At first glance, the FA model is different from the two IRT models (i.e., logistic and normal ogive), and the two IRT models are similar. However, the FA and normal ogive models are more interchangeable than expected, whereas the normal ogive and logistic models are not. When a probability distribution is graphed (e.g., Figure 2), the tails are less visible than the nontails, so we are likely to perceive the tails as less important than the nontails. Sometimes the tails are much more important than the nontails, but other times they are not. Larger scaling constants have advantages in the tails, and smaller constants have advantages in the nontails. Therefore, no single constant can provide maximum interchangeability between the logistic and normal ogive models in all situations. Choosing a scaling constant is thus a more complex process than scholars have assumed.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://github.com/eunscho/FAIRT.

Author contributions

EC: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2021S1A5A2A03061515) and the Research Grant of Kwangwoon University in 2023.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aguinis, H., Gottfredson, R. K., and Joo, H. (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organ. Res. Methods 16, 270–301. doi: 10.1177/1094428112470848

Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. J. Educ. Stat. 17, 251–269. doi: 10.2307/1165149

Amemiya, T. (1981). Qualitative response models: a survey. J. Econ. Lit. 19, 1483–1536.

Arnold, T. B., and Emerson, J. W. (2011). Nonparametric goodness-of-fit tests for discrete null distributions. R J. 3:34. doi: 10.32614/RJ-2011-016

Baker, F. B., and Kim, S.-H. (2004). Item response theory: parameter estimation techniques. 2nd Edn. Boca Raton: Taylor & Francis.

Balanda, K. P., and MacGillivray, H. L. (1988). Kurtosis: a critical review. Am. Stat. 42, 111–119. doi: 10.1080/00031305.1988.10475539

Bartholomew, D. J. (1983). Latent variable models for ordered categorical data. J. Econ. 22, 229–243. doi: 10.1016/0304-4076(83)90101-X

Birnbaum, A. (1968). “Some latent trait models and their use in inferring an examinee’s ability” in Statistical theories of mental test scores. eds. F. M. Lord and M. R. Novick (Massachusetts: Addison-Wesley), 397–479.

Camilli, G. (1994). Origin of the scaling constant d = 1.7 in item response theory. J. Educ. Behav. Stat. 19, 293–295. doi: 10.2307/1165298

Chalmers, R. P. (2012). mirt: a multidimensional item response theory package for the R environment. J. Stat. Softw. 48, 1–29. doi: 10.18637/jss.v048.i06

Conover, W. J. (1972). A Kolmogorov goodness-of-fit test for discontinuous distributions. J. Am. Stat. Assoc. 67, 591–596. doi: 10.1080/01621459.1972.10481254

Cook, J. D. (2010). Normal approximation to logistic distribution. Available at: https://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/

De Ayala, R. J. (1994). The influence of multidimensionality on the graded response model. Appl. Psychol. Meas. 18, 155–170. doi: 10.1177/014662169401800205

Dombi, J., and Jónás, T. (2018). Approximations to the normal probability distribution function using operators of continuous-valued logic. Acta Cybern. 23, 829–852. doi: 10.14232/actacyb.23.3.2018.7

Dyson, F. J. (1943). A note on kurtosis. J. R. Stat. Soc. 106, 360–361. doi: 10.2307/2980484

Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika 43, 521–532. doi: 10.1007/BF02293811

Fox, J.-P. (2010). Bayesian item response modeling: theory and applications. New York, NY: Springer.

Haley, D. C. (1952). Estimation of the dosage mortality relationship when the dose is subject to error. Stanford, CA: Applied Mathematics and Statistics Laboratory, Stanford University.

Kullback, S., and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Stat. 22, 79–86. doi: 10.1214/aoms/1177729694

Lee, S., Sriutaisuk, S., and Kim, H. (2020). Using the tidyverse package in R for simulation studies in SEM. Struct. Equ. Model. Multidiscip. J. 27, 468–482. doi: 10.1080/10705511.2019.1644515

Lord, F. M. (1952). A theory of test scores. Psychometric Monographs, 7. Available at: https://www.psychometricsociety.org/sites/main/files/file-attachments/mn07.pdf?1576607452

Lord, F. M., and Novick, M. R. (1968). Statistical theories of mental test scores. Menlo Park: Addison-Wesley.

Muthén, B. O. (1983). Latent variable structural equation modeling with categorical data. J. Econ. 22, 43–65. doi: 10.1016/0304-4076(83)90093-3

Piessens, R., Doncker-Kapenga, E. D., Überhuber, C., and Kahaner, D. (2011). Quadpack: a subroutine package for automatic integration. Berlin, Heidelberg: Springer.

R Core Team (2023). R: a language and environment for statistical computing [computer software]. Vienna, Austria: R Foundation for Statistical Computing.

Revelle, W. (2022). psych: procedures for psychological, psychometric, and personality research [computer software]. Available at: https://cran.r-project.org/web/packages/psych/index.html

Savalei, V. (2006). Logistic approximation to the normal: the KL rationale. Psychometrika 71, 763–767. doi: 10.1007/s11336-004-1237-y

Shore, H. (2005). Accurate RMM-based approximations for the CDF of the normal distribution. Commun. Stat. Theory Methods 34, 507–513. doi: 10.1081/STA-200052102

Takane, Y., and de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika 52, 393–408. doi: 10.1007/BF02294363

Thurstone, L. L. (1928). Attitudes can be measured. Am. J. Sociol. 33, 529–554. doi: 10.1086/214483

Trauth, M. H. (2017). “Outliers and correlation coefficients” in MATLAB and Python recipes for earth scientists. Springer Nature Switzerland AG.

van der Linden, W. J. (2016). “Introduction” in Handbook of item response theory. ed. W. J. van der Linden (New York: Taylor & Francis), 1–12.

Westfall, P. H. (2014). Kurtosis as peakedness, 1905–2014. R.I.P. Am. Stat. 68, 191–195. doi: 10.1080/00031305.2014.917055

Wirth, R. J., and Edwards, M. C. (2007). Item factor analysis: current approaches and future directions. Psychol. Methods 12, 58–79. doi: 10.1037/1082-989X.12.1.58

Yang, Y., and Green, S. B. (2015). Evaluation of structural equation modeling estimates of reliability for scales with ordered categorical items. Methodology 11, 23–34. doi: 10.1027/1614-2241/a000087

Keywords: item response theory, logistic model, normal ogive model, Monte Carlo simulation, normal distribution, logistic distribution, factor analysis

Citation: Cho E (2023) Interchangeability between factor analysis, logistic IRT, and normal ogive IRT. Front. Psychol. 14:1267219. doi: 10.3389/fpsyg.2023.1267219

Received: 26 July 2023; Accepted: 05 September 2023;
Published: 25 September 2023.

Edited by:

Wenchao Ma, University of Alabama, United States

Reviewed by:

Peida Zhan, Zhejiang Normal University, China
Max Auerswald, University of Ulm, Germany

Copyright © 2023 Cho. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Eunseong Cho, bene@kw.ac.kr
