The Correlational Agreement Coefficient CA(≤,D)—a mathematical analysis of a descriptive goodness-of-fit measure

https://doi.org/10.1016/j.mathsocsci.2004.03.003

Abstract

The Correlational Agreement Coefficient, CA(≤,D), was introduced by J.F.J. van Leeuwe in 1974 within Item Tree Analysis (ITA), a data-analytic method to derive quasi orders (surmise relations) on sets of bi-valued test items. Recently, it has become of interest in connection with Knowledge Space Theory (KST). The coefficient CA(≤,D) is used as a descriptive goodness-of-fit measure to select, among competing surmise relations, one with maximal CA(≤,D) value. Formal aspects like boundedness, decomposition, and the interplay between consistency of a surmise relation (with a binary data matrix) and the attainment of the maximum value of CA(≤,D) are investigated. Dependence of CA(≤,D) on trivial response patterns is quantified by a functional relationship that collects the impact of trivial response patterns in a single “bias term”. These considerations should warn against uncritical use of the coefficient. Mathematical reasons are given for why several heuristically plausible properties fail.

Introduction

In the field of knowledge assessment and acquisition based on prerequisite relationships, a central problem is to derive reflexive, transitive binary relations on sets of bi-valued test items. Such relations model hierarchies between items based on solvability dependencies of the type: “Given a positive response to an item J (e.g., J solved), it can be surmised that another item I will also be responded to positively (e.g., I solved)”. These binary relations (quasi orders) are central within Knowledge Space Theory (KST), introduced by Doignon and Falmagne (1985, 1999). In KST, they are called surmise relations. However, given a field of knowledge and a set of bi-valued test items appropriate enough to allow for fine-grained and representative coverage of the field, the problem is how to establish a reasonable surmise relation on the item set. Item Tree Analysis (ITA) is a data-analytic method for the derivation of surmise relations on sets of bi-valued test items. ITA was introduced by Airasian, Bart, and Krus in 1973 (Airasian and Bart, 1973; Bart and Krus, 1973) and was developed into its present form by Leeuwe (1974). In particular, Leeuwe (1974) introduced the Correlational Agreement Coefficient, CA(≤,D), as part of ITA. In ITA, CA(≤,D) is used as a descriptive goodness-of-fit measure to select, among competing surmise relations, one with maximal CA(≤,D) value.

Recently, ITA, and in particular CA(≤,D), has become of interest in connection with KST; see Held et al. (1995), Held and Korossy (1998), Schrepp (1999, 2003), and Schrepp et al. (1999). Efficient adaptive computer-based knowledge assessment procedures require surmise relations of “a trade-off type”. On the one hand, such a relation should reflect the data as well as possible (descriptive adequacy); on the other hand, it should have as large a cardinality as possible as a set. The authors tried to achieve this by applying ITA and the coefficient CA(≤,D) (cp. Section 12).

Leeuwe (1974) reports: “…This coefficient [partial order reproducibility coefficient] cannot serve therefore [stationarity in tolerance level L=0] as a criterion for choosing the best solution … This procedure [CA(≤,D)] has the advantage that it gives a lower value not only in the case that too many relations are constructed [larger tolerance levels], but also in the case that the number of relations is very low [smaller tolerance levels]”.

ITA's renaissance in connection with KST has led to criticisms of CA(≤,D). Held and Korossy (1998) stress the “ad hoc” (descriptive) nature of CA(≤,D): “…we will apply two ad hoc criteria [one, the CA(≤,D)]”. Schrepp (1999) illustrates that CA(≤,D) can be reduced by non-comparable item pairs: “…for relations which contain many non-connected item pairs it seems possible that the correct relation $\le_L$ will not have the best CA($\le_L$) value”. Another criticism of ITA and CA(≤,D) is voiced by Wesiak et al. (2004). They observe that trivial response patterns (i.e., all or none of the items answered positively), though empirically irrelevant with respect to solvability dependencies between items, do drastically manipulate ITA solutions. This is due to CA(≤,D)'s dependence on such patterns (cp. Section 11).

In the light of these observations, a comprehensive mathematical analysis of CA(≤,D) is missing. The elaborations so far are heuristic, based on experimentation with certain data sets, and deeper properties of CA(≤,D) are not yet known. This work therefore provides a coherent and extensive mathematical treatment of CA(≤,D). In particular, it warns against uncritical use of the coefficient and, where the coefficient is used, indicates what one needs to pay attention to. This work may also be viewed as a general guide to carrying out a first mathematical analysis of ad hoc formulated coefficients. Additionally, Section 12 addresses methodological issues regarding goodness-of-fit measures in general. Besides the criteria proposed by Goodman and Kruskal (1954, 1959, 1963, 1972), reviewed by Bishop et al. (1975) and Liebetrau (1983), Section 12 mentions the importance of purpose-specific goodness-of-fit measures and the problem of trade-off between different fit criteria.

This section reviews Leeuwe's (1974) Item Tree Analysis.

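We use the following notation ($m, n \in \mathbb{N}$):

  • $Q := \{I_l : 1 \le l \le m\}$ set of dichotomous items,

  • $P := \{P_k : 1 \le k \le n\}$ sample of subjects,

  • $D := (d'_{kl})$ corresponding binary (=0/1) $n \times m$ data matrix,

and for every $(I_i, I_j) \in Q \times Q$ ($1 \le i, j \le m$), the 2×2 table notation

$$\begin{array}{cc|cc} & & \multicolumn{2}{c}{I_j} \\ & & 1 & 0 \\ \hline I_i & 1 & a_{ij} & b_{ij} \\ & 0 & c_{ij} & d_{ij} \end{array}$$

with $a_{ij}, b_{ij}, c_{ij}, d_{ij} \in \mathbb{N} \cup \{0\}$; in respective order, the absolute frequencies of subjects solving items $I_i$ and $I_j$ [$a_{ij}$], solving $I_i$, not $I_j$ [$b_{ij}$], solving $I_j$, not $I_i$ [$c_{ij}$], and solving neither $I_i$ nor $I_j$ [$d_{ij}$]. Then, the ITA rule for generating binary relations $\le_L$ ($0 \le L \le n$) is given by

$$I_i \le_L I_j \;:\Leftrightarrow\; c_{ij} \le L.$$

This $L$ ($0 \le L \le n$) is called the tolerance level. The ITA rule represents STEP1 of ITA. The latter consists of five steps, STEP1–STEP5: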
  • STEP1.

    Determine the binary relations $\le_L$ for $L = 0, 1, \dots, n$.

  • STEP2.

    From the $\le_L$ ($0 \le L \le n$), remove those that are not transitive.

  • STEP3.

    Set a critical value $0 < c \le 1$ for the proportions, $p_L$, of subjects not contradicting the respective surmise relations $\le_L$ in STEP2.

  • STEP4.

    From the surmise relations in STEP2, remove those with $p_L < c$.

  • STEP5.

    From the remaining surmise relations (after STEP4), which always include $\le_0$, select one with maximal CA(≤,D) value.

The Correlational Agreement Coefficient is used as a goodness-of-fit measure to handle the selection problem in STEP5. From the remaining surmise relations, select an “optimal” one, i.e., one with maximal CA(≤,D) value.
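As an illustration of STEP1 to STEP4, the following is a minimal Python sketch, not the authors' implementation. The function name `ita_candidates` and the representation of a relation as a set of index pairs `(i, j)` (meaning item i ≤_L item j) are assumptions made here for illustration; STEP5 is omitted because it requires the coefficient defined later.

```python
from itertools import product


def ita_candidates(D, c):
    """Return {L: relation} for the relations surviving STEP1 to STEP4.

    D is a 0/1 data matrix given as a list of rows (subjects x items);
    c is the critical value from STEP3 (0 < c <= 1).
    """
    n, m = len(D), len(D[0])
    # cnt[i][j] = c_ij: number of subjects solving item j but not item i
    cnt = [[sum(1 for row in D if row[j] == 1 and row[i] == 0)
            for j in range(m)] for i in range(m)]
    kept = {}
    for L in range(n + 1):
        # STEP1: item i <=_L item j  iff  c_ij <= L
        rel = {(i, j) for i, j in product(range(m), repeat=2)
               if cnt[i][j] <= L}
        # STEP2: drop relations that are not transitive
        if any((i, k) not in rel
               for (i, j) in rel for (j2, k) in rel if j2 == j):
            continue
        # STEP3/STEP4: p_L = share of subjects contradicting no pair of rel
        ok = sum(1 for row in D
                 if all(not (row[j] == 1 and row[i] == 0)
                        for (i, j) in rel))
        if ok / n >= c:
            kept[L] = rel
    return kept


# Hypothetical toy data: a perfect chain item0 <= item1 <= item2
D = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(sorted(ita_candidates(D, 0.75)))
```

On this toy matrix only the tolerance level L = 0 survives a critical value of 0.75, since L = 1 yields a non-transitive relation and larger levels yield the full relation, which half of the subjects contradict.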

Basic concepts and the definition of empirical Pearson correlation are reviewed (Section 2). The definition of theoretical correlation is presented (Section 3). Empirical and theoretical correlation are compared in regard to coincidence (Section 4) and boundedness (Section 5). Based on this, CA(≤,D) is defined coherently (Section 6). A natural decomposition of the coefficient CA(≤,D) into four partial functions is given (Section 7). The coefficient is then analyzed in regard to boundedness (Section 8). An analysis of the interplay between the consistency of a surmise relation ≤ with a data matrix D and the attainment of the maximum value of CA(≤,D) is presented (Sections 9 and 10). We conclude with the analysis of the dependence of CA(≤,D) on trivial response patterns (Section 11). The work ends with a discussion (Section 12).

Note that all proofs are deferred to an appendix, section-wise (Appendix A).

Basic concepts

We review basic conventions regarding terminology and notation.

Let Q, P, and D be defined as in Section 1.2. The row $z_k$ ($1 \le k \le n$) of D encodes the responses of subject $P_k$ to all items in Q, whereas column $s_l$ ($1 \le l \le m$) of D encodes the responses of all subjects in P to item $I_l$.

Definition 1

Let $Q = \{I_l : 1 \le l \le m\}$ ($m \in \mathbb{N}$). We define: $S := \{\le\, \subseteq Q \times Q : \le \text{ quasi order on } Q\}$, $D_n \in$

Theoretical correlation derived through idealization

Section 4 gives motivation for the form and name of theoretical correlation.

Definition 8

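Let $(I_i, I_j) \in [Q \times Q]_A$ and $\le\, \in S$. Theoretical correlation, $r_{ij}^*$, between $I_i$ and $I_j$, derived through idealization, is defined as

$$r_{ij}^* := \begin{cases} 1 & : (I_i, I_j) \in \le \,\wedge\, (I_j, I_i) \in \le \\[4pt] \sqrt{\dfrac{(1 - p_{I_i}) \cdot p_{I_j}}{(1 - p_{I_j}) \cdot p_{I_i}}} & : (I_i, I_j) \in \le \,\wedge\, (I_j, I_i) \notin \le \\[10pt] \sqrt{\dfrac{(1 - p_{I_j}) \cdot p_{I_i}}{(1 - p_{I_i}) \cdot p_{I_j}}} & : (I_i, I_j) \notin \le \,\wedge\, (I_j, I_i) \in \le \\[10pt] 0 & : (I_i, I_j) \notin \le \,\wedge\, (I_j, I_i) \notin \le \end{cases}$$

Theoretical correlation $r_{ij}^*$ is well-defined for every $(I_i, I_j) \in [Q \times Q]_A$: for such pairs, $s_i, s_j \ne 0_n, 1_n$, i.e., $p_{I_i}, p_{I_j} \ne 0, 1$.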
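The four cases of Definition 8 can be sketched as a small Python function. This is an illustrative sketch under one stated assumption: the second and third cases are read as square roots of the displayed ratios, which is the reading consistent with the bound 0 ≤ r*_ij ≤ n−1 stated in Proposition 13. The function name and its boolean arguments are hypothetical.

```python
from math import sqrt


def theoretical_corr(p_i, p_j, i_le_j, j_le_i):
    """Theoretical correlation r*_ij for solution frequencies p_i, p_j.

    i_le_j / j_le_i state whether (I_i, I_j) resp. (I_j, I_i) lies in
    the surmise relation. Both frequencies must differ from 0 and 1.
    """
    if not (0 < p_i < 1 and 0 < p_j < 1):
        raise ValueError("p_i and p_j must lie strictly between 0 and 1")
    if i_le_j and j_le_i:      # equivalent items
        return 1.0
    if i_le_j:                 # I_i strictly below I_j
        # assumed square-root reading of the second case
        return sqrt((1 - p_i) * p_j / ((1 - p_j) * p_i))
    if j_le_i:                 # I_j strictly below I_i
        return sqrt((1 - p_j) * p_i / ((1 - p_i) * p_j))
    return 0.0                 # not comparable


print(theoretical_corr(0.75, 0.25, True, False))
```

For p_i = 0.75 and p_j = 0.25 the ratio under the root is 0.0625/0.5625 = 1/9, so the value is 1/3 up to floating-point rounding.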

Comparing empirical and theoretical correlation: coincidence

Lemma 9

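Let $\le\, \in S$, which is consistent with binary response data D. Then, for all $(I_i, I_j) \in\, \le \cap\, [Q \times Q]_A$,

$$r_{ij} = \begin{cases} 1 & : (I_j, I_i) \in \le \\[4pt] \sqrt{\dfrac{(1 - p_{I_i})\, p_{I_j}}{(1 - p_{I_j})\, p_{I_i}}} & : (I_j, I_i) \notin \le \end{cases}$$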

Proof

See Appendix A.1.□
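The following worked computation sketches why the statement of Lemma 9 is plausible; it is not the proof from Appendix A.1. It assumes that empirical correlation is the Pearson (phi) coefficient of the 2×2 table from the notation section, that consistency of a pair in ≤ with D means c_ij = 0 (no subject solves I_j without solving I_i), and that the lemma's second case is a square root.

```latex
% Assumptions: r_{ij} is the phi coefficient of the 2x2 table;
% consistency of (I_i, I_j) \in \le with D is read as c_{ij} = 0.
\begin{align*}
r_{ij}
  &= \frac{a_{ij} d_{ij} - b_{ij} c_{ij}}
          {\sqrt{(a_{ij}+b_{ij})(c_{ij}+d_{ij})(a_{ij}+c_{ij})(b_{ij}+d_{ij})}} \\
  &\overset{c_{ij}=0}{=}
     \frac{a_{ij} d_{ij}}{\sqrt{(a_{ij}+b_{ij})\, d_{ij}\, a_{ij}\, (b_{ij}+d_{ij})}}
   = \sqrt{\frac{a_{ij}\, d_{ij}}{(a_{ij}+b_{ij})(b_{ij}+d_{ij})}} \\
  &= \sqrt{\frac{(1-p_{I_i})\, p_{I_j}}{(1-p_{I_j})\, p_{I_i}}},
\end{align*}
% using, for c_{ij} = 0:
%   p_{I_i} = (a_{ij}+b_{ij})/n,   p_{I_j} = a_{ij}/n,
%   1 - p_{I_i} = d_{ij}/n,        1 - p_{I_j} = (b_{ij}+d_{ij})/n.
% If also (I_j, I_i) \in \le, then b_{ij} = 0 and the expression equals 1.
```

The condition that the pair fulfills A keeps a_ij and d_ij positive here, so no division by zero occurs.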

The next corollary gives a first answer to the question of coincidence.

Corollary 10

Let $\le\, \in S$, consistent with D. Let $(I_i, I_j) \in \le$ fulfill A. Then, theoretical correlation $r_{ij}^*$ equals empirical correlation $r_{ij}$ (i.e., $r_{ij}^* = r_{ij}$).

What can be said about coincidence in the case of not-≤-comparable item pairs?

Comparing empirical and theoretical correlation: boundedness

Empirical correlation uniformly lies in the interval [−1,1] (Lemma 6). What about theoretical correlation?

Proposition 13

Let Q={Il: 1≤l≤m} (m∈N). It holds:

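  • (Relative Interval Nesting). Let $D \in M(n \times m; \{0,1\})$, $\le\, \in S$, and let $(I_i, I_j) \in [Q \times Q]_A$. Then (since $(I_i, I_j) \in [Q \times Q]_A$, $n \ge 2$), $$0 \le r_{ij}^* \le n - 1.$$

  • (Proper Divergence to +∞). For $m \ge 2$, there exists an $\le\, \in S$ and a pair $(I_i, I_j) \in Q \times Q$ with $i < j$ and $[(I_i, I_j) \in \le \,\wedge\, (I_j, I_i) \notin \le]$, such that $$\forall n \ge 2\ \exists D_{n-1} \in M(n \times m; \{0,1\}) : \big[[(r_{ij})_{n-1} = -1] \wedge [(r_{ij}^*)_{n-1} = n - 1]\big].$$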
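One data matrix that attains both extremes in Proposition 13 can be checked numerically. The construction below is an illustrative guess rather than the example used in the paper's appendix: one subject solves only I_i and the remaining n−1 subjects solve only I_j, and the theoretical correlation is computed with the square-root reading of Definition 8 for a pair with (I_i, I_j) in ≤ and (I_j, I_i) not in ≤. The helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def extreme_example(n):
    """Return (empirical r_ij, theoretical r*_ij) for the assumed matrix."""
    rows = [(1, 0)] + [(0, 1)] * (n - 1)   # n subjects, two items
    s_i = [r[0] for r in rows]
    s_j = [r[1] for r in rows]
    p_i, p_j = sum(s_i) / n, sum(s_j) / n  # 1/n and (n-1)/n
    r_emp = pearson(s_i, s_j)
    r_theo = sqrt((1 - p_i) * p_j / ((1 - p_j) * p_i))
    return r_emp, r_theo


for n in (2, 5, 10):
    print(n, extreme_example(n))
```

The two columns are complementary, so the empirical correlation is −1, while the ratio under the root is (n−1)², which gives a theoretical correlation of n−1.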

Defining the coefficient CA(≤,D)

Definition 15

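Let $Q := \{I_l : 1 \le l \le m\}$ ($m \in \mathbb{N}$, $m \ge 2$), $\le$ be a surmise relation on $Q$, and $D = (d'_{kl}) \in M(n \times m; \{0,1\})$. Further, let

$${<}_Q^{\prime} := \{(I_i, I_j) \in Q \times Q : i < j \text{ and } (I_i, I_j) \text{ fulfills } A\}.$$

The Correlational Agreement Coefficient, CA(≤,D), is defined as

$$\mathrm{CA}(\le, D) := 1 - \frac{2}{m(m-1)} \sum_{(I_i, I_j) \in {<}_Q^{\prime}} (r_{ij} - r_{ij}^*)^2.$$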
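A minimal Python sketch of this definition follows. It rests on assumptions that this excerpt does not confirm: property A is read as "both item columns are non-constant", empirical correlation is the ordinary Pearson correlation of the two columns, and theoretical correlation uses the square-root reading of Definition 8. The names `ca` and `pearson` and the encoding of ≤ as a set of index pairs are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))


def ca(rel, D):
    """CA(<=, D) for a relation rel (set of (i, j) pairs) and 0/1 rows D."""
    n, m = len(D), len(D[0])
    cols = [[row[l] for row in D] for l in range(m)]
    p = [sum(col) / n for col in cols]
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            # assumed reading of property A: both columns non-constant
            if not (0 < p[i] < 1 and 0 < p[j] < 1):
                continue
            r = pearson(cols[i], cols[j])
            ij, ji = (i, j) in rel, (j, i) in rel
            if ij and ji:
                r_star = 1.0
            elif ij:
                r_star = sqrt((1 - p[i]) * p[j] / ((1 - p[j]) * p[i]))
            elif ji:
                r_star = sqrt((1 - p[j]) * p[i] / ((1 - p[i]) * p[j]))
            else:
                r_star = 0.0
            total += (r - r_star) ** 2
    # the normalisation uses all m(m-1)/2 pairs, as in the definition
    return 1 - 2 / (m * (m - 1)) * total


D = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
chain = {(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)}
print(ca(chain, D))
```

For this toy chain, which the data never contradict and in which every pair is comparable, empirical and theoretical correlations coincide, so the value is 1 up to rounding. Replacing the chain by the purely reflexive relation on the same data lowers the value to 20/27.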

We close this section with two (actually obvious) remarks.

Decomposing the coefficient CA(≤,D)

We begin with some notation.

Definition 16

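Let $Q := \{I_l : 1 \le l \le m\}$ ($m \ge 2$), $D \in M(n \times m; \{0,1\})$, and $\le\, \in S$. We define:

$$\begin{aligned} {<}_Q^{\prime\,\cong} &:= {<}_Q^{\prime} \cap \{(I_i, I_j) \in Q \times Q : (I_i, I_j) \in \le \,\wedge\, (I_j, I_i) \in \le\}, \\ {<}_Q^{\prime\,\ll} &:= {<}_Q^{\prime} \cap \{(I_i, I_j) \in Q \times Q : (I_i, I_j) \in \le \,\wedge\, (I_j, I_i) \notin \le\}, \\ {<}_Q^{\prime\,\gg} &:= {<}_Q^{\prime} \cap \{(I_i, I_j) \in Q \times Q : (I_i, I_j) \notin \le \,\wedge\, (I_j, I_i) \in \le\}, \\ {<}_Q^{\prime\,\not\asymp} &:= {<}_Q^{\prime} \cap \{(I_i, I_j) \in Q \times Q : (I_i, I_j) \notin \le \,\wedge\, (I_j, I_i) \notin \le\}. \end{aligned}$$

The family $F := ({<}_Q^{\prime\,\cong}, {<}_Q^{\prime\,\ll}, {<}_Q^{\prime\,\gg}, {<}_Q^{\prime\,\not\asymp})$ of subsets of ${<}_Q^{\prime}$ fulfills

$$\begin{aligned} {<}_Q^{\prime} &= \bigcup_{k \in \{\cong, \ll, \gg, \not\asymp\}} {<}_Q^{\prime\,k} && \text{(covering property)}, \\ {<}_Q^{\prime\,k} \cap {<}_Q^{\prime\,l} &= \emptyset \ \text{ for } k, l \in \{\cong, \ll, \gg, \not\asymp\},\ k \ne l && \text{(pairwise disjoint)}. \end{aligned}$$

In general, $F$ may not be a partition of ${<}_Q^{\prime}$, since one of the members ${<}_Q^{\prime\,i}$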

Boundedness of CA(≤,D)

Proposition 18

Let Q≔{Il: 1≤l≤m} (m≥2). It holds:

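  • (Relative Interval Nesting). If $D \in M(n \times m; \{0,1\})$ for $n \in \mathbb{N}$ fixed, then, for all $\le\, \in S$, $$1 - n^2 \le \mathrm{CA}(\le, D) \le 1.$$ That is, the partial function $\mathrm{CA}(\cdot, D) : S \to \mathbb{R}$, $\le\, \mapsto \mathrm{CA}(\cdot, D)(\le) := \mathrm{CA}(\le, D)$, has a bounded range $\mathrm{CA}(\cdot, D)(S) \subset [1 - n^2, 1]$.

  • (Proper Divergence to −∞). There exists an $\le\, \in S$ and a sequence $(D_n)_{n \in \mathbb{N}}$ in $\mathcal{D}$ with $$\lim_{n \to \infty} \big(\mathrm{CA}(\le, D_n)\big)_{n \in \mathbb{N}} = -\infty,$$ in the sense of diverging properly to −∞.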

Proof

See Appendix A.4.□
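The interval in the first part of Proposition 18 can be made plausible by combining the two boundedness statements for single item pairs. The short derivation below is a sketch under that reading and need not match the argument given in Appendix A.4; the sum runs over the item pairs with i < j that fulfill A, of which there are at most m(m−1)/2.

```latex
% Per pair: empirical correlation lies in [-1, 1] (Lemma 6) and
% theoretical correlation lies in [0, n-1] (Proposition 13).
\begin{align*}
-1 \le r_{ij} \le 1,\quad 0 \le r_{ij}^* \le n-1
  &\;\Longrightarrow\; -n \le r_{ij} - r_{ij}^* \le 1
   \;\Longrightarrow\; 0 \le (r_{ij} - r_{ij}^*)^2 \le n^2, \\
0 \;\le\; \frac{2}{m(m-1)} \sum_{i<j,\ (I_i,I_j)\ \text{fulfills}\ A} (r_{ij} - r_{ij}^*)^2
  &\;\le\; \frac{2}{m(m-1)} \cdot \frac{m(m-1)}{2} \cdot n^2 \;=\; n^2, \\
  &\;\Longrightarrow\; 1 - n^2 \;\le\; \mathrm{CA}(\le, D) \;\le\; 1 .
\end{align*}
```

Since the lower bound depends on n, it does not contradict the second part of the proposition, where the number of subjects grows along the sequence of data matrices.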

Consistency–maximum problem

Reconsider the example in Lemma 11:

Lemma 20 Counterexample

Let Q≔{Il: 1≤l≤m} (m≥2) and ≤∈S be a total fit to D. Then, it is not necessarily the case that CA(≤,D)=1. In other words, consistency does not imply maximum in general.

Proof

See Appendix A.5.□

Remark

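If we presuppose consistency, and that CA(≤,D) depends on $(r_{ij} - r_{ij}^*)^2 > 0$ for a not-≤-comparable item pair $(I_i, I_j) \in {<}_Q^{\prime}$, then we have:

$$\mathrm{CA}(\le, D) := 1 - \frac{2}{m(m-1)} \sum_{(I_i, I_j) \in {<}_Q^{\prime}} (r_{ij} - r_{ij}^*)^2 = 1 - \underbrace{\frac{2}{m(m-1)} \sum_{(I_i, I_j) \in {<}_Q^{\prime},\ \delta_{ij} > 0} \underbrace{(r_{ij} - r_{ij}^*)^2}_{> 0}}_{> 0} < 1.$$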

Equivalence between consistency and maximum is not a

Maximum–consistency problem

The converse implication is also not true in general.

Lemma 22 Counterexample

Let $Q := \{I_l : 1 \le l \le m\}$ ($m \ge 2$), $D \in \mathcal{D}$, and $\le\, \in S$ with CA(≤,D)=1. Then, it is not necessarily the case that ≤ is consistent with D. In other words, maximum does not imply consistency in general.

Proof

See Appendix A.6.□

Proposition 23 states that maximum CA(≤,D)=1 implies consistency, provided no subject contradicts any of the non-reflexive pairs $I_i \le I_j$ with non-existent empirical correlation $r_{ij}$.

Proposition 23

Functional relationship for equivalent data matrices

Wesiak et al. (2004) observe a “data-related” problem arising when trivial response patterns are included/excluded in/from the input data matrix for ITA. Such response patterns, though empirically irrelevant with respect to solvability dependencies between items, do drastically manipulate ITA solutions. Larger/smaller optimal Lopt (stronger/weaker structures ≤opt) are obtained by adding/removing trivial patterns to/from the input data.
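The mechanism behind this observation can be illustrated with a small Python check. It is a sketch on hypothetical toy data, not a reproduction of the analysis by Wesiak et al. (2004): adding all-solved and none-solved response patterns leaves the counts c_ij, and hence the relation ≤_L at a fixed tolerance level L, unchanged, while the empirical correlation entering CA(≤,D) changes.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))


def c_count(D, i, j):
    """c_ij: number of subjects solving item j but not item i."""
    return sum(1 for row in D if row[j] == 1 and row[i] == 0)


def column(D, l):
    return [row[l] for row in D]


# Hypothetical two-item data in which the items are uncorrelated
base = [[1, 0], [0, 1], [1, 1], [0, 0]]
# Same data plus three all-solved and three none-solved (trivial) patterns
padded = base + [[1, 1]] * 3 + [[0, 0]] * 3

r_base = pearson(column(base, 0), column(base, 1))
r_padded = pearson(column(padded, 0), column(padded, 1))
print(c_count(base, 0, 1), c_count(padded, 0, 1))
print(r_base, r_padded)
```

In this example c_01 stays at 1, whereas the empirical correlation moves from 0 to 0.6, so any coefficient built from correlations reacts to the trivial patterns even though they carry no information about dependencies between the two items.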

Lemma 24 bunches the impact of such patterns in a single

Major misconceptions in CA(≤,D) publications

Two major misconceptions are present in some of the CA(≤,D) publications mentioned in Section 1.1:

  • (A)

    The coefficient CA(≤,D) does not measure goodness-of-fit with respect to the fit criterion “number of response patterns in D matching all pairs in ≤”. In the terminology of knowledge spaces (Doignon and Falmagne, 1999), this is referred to as “number of response patterns in D matching one of the knowledge states in the quasi ordinal knowledge space K, corresponding to ≤”.

Acknowledgements

This research was supported by grants from the University of Graz to Ali Ünlü.

References (27)

  • J.-P. Doignon et al., Spaces for the assessment of knowledge, International Journal of Man–Machine Studies (1985)
  • M. Schrepp, On the empirical construction of implications between bi-valued test items, Mathematical Social Sciences (1999)
  • P.W. Airasian et al., Ordering theory: a new and useful measurement model, Educational Technology (1973)
  • W.M. Bart et al., An ordering-theoretic method to determine hierarchies among items, Educational and Psychological Measurement (1973)
  • G. Birkhoff, Rings of sets, Duke Mathematical Journal (1937)
  • Y.M.M. Bishop et al., Discrete Multivariate Analysis: Theory and Practice (1975)
  • J. Bortz, Statistik (1989)
  • H. Cramér, Mathematical Methods of Statistics (1946)
  • C.M. Dayton et al., A probabilistic model for validation of behavioral hierarchies, Psychometrika (1976)
  • J.-P. Doignon et al., Knowledge Spaces (1999)
  • G. Fischer, Lineare Algebra (1995)
  • L.A. Goodman et al., Measures of association for cross classifications, Journal of the American Statistical Association (1954)
  • L.A. Goodman et al., Measures of association for cross classifications: II. Further discussion and references, Journal of the American Statistical Association (1959)