Assessing agreement with intraclass correlation coefficient and concordance correlation coefficient for data with repeated measures

https://doi.org/10.1016/j.csda.2012.11.004

Abstract

The intraclass correlation coefficient and the concordance correlation coefficient are two popular scaled indices for assessing the closeness between observers who make measurements for quantitative responses. These two indices are usually based on subject and observer effects only, and therefore cannot be used if the observer produces repeated measurements rather than replicated readings. In this paper, we consider not only subject and observer effects but also time effects for data with repeated measurements, since it is difficult to obtain true replications in practice. We compare these two agreement indices for different combinations of random or fixed effects of observer and time. Finally, we use image data from 2D-echocardiograms to illustrate the proposed methodology and the comparison of these two indices. If there is a need to choose between these two indices for repeated measurements, we recommend using the new concordance correlation coefficient since it does not require ANOVA assumptions.

Introduction

The intraclass correlation coefficient (ICC) and the concordance correlation coefficient (CCC) are two popular indices for assessing agreement between quantitative measurements taken from different observers. ICC and CCC are usually used for data without and with replications, and a comparison of the two indices for this data structure under a general model is reported by Chen and Barnhart (2008). However, this methodology cannot be used if repeated measurements rather than replications are collected. Several authors have studied the agreement indices ICC and CCC for data with repeated measurements. Vangeneugden et al. (2005) investigated the ICC using linear mixed models with serial correlation for inter-observer, intra-observer, and absolute agreement, where both observer and time are treated as random effects. King et al. (2007) proposed a CCC for assessing inter-observer agreement for a response with repeated measurements, where both observer and time are treated as fixed effects. Chen and Barnhart (2011) also proposed a new CCC for assessing inter-observer, intra-observer, and absolute agreement for data with repeated measurements, where observers and times are treated as random because researchers may assess agreement among many observers who take measurements at different time points.
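
As a point of reference for the indices discussed above, the following is a minimal sketch of the classical two-observer CCC (Lin's concordance correlation coefficient) for data without repeated measurements. It is not the repeated-measures CCC proposed in this paper; the function name `lin_ccc` and the use of divide-by-n sample moments are assumptions made for illustration only.

```python
import numpy as np


def lin_ccc(x, y):
    """Classical two-observer CCC (illustrative, hypothetical helper).

    x, y: paired readings from two observers on the same subjects.
    Uses divide-by-n sample moments, which is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x = ((x - mean_x) ** 2).mean()
    var_y = ((y - mean_y) ** 2).mean()
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    # Twice the covariance, scaled by the expected squared difference
    # between the two observers when their readings are uncorrelated.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

For example, `lin_ccc([1, 2, 3], [2, 3, 4])` evaluates to 4/7 (about 0.571) even though the Pearson correlation of the two series is 1, because the CCC penalizes the constant shift between observers.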

In addition to treating both observer and time as random effects in defining CCC, there are situations where researchers may be interested in the case of random observer and fixed time, or fixed observer and random time. For example, suppose a study is designed to assess agreement among subjects’ measurements taken by observers (e.g., nurses) during two shifts. Researchers can conduct such an agreement study by randomly selecting several nurses from the nurse population and obtaining measurements from the chosen nurses at two time points. In this study, nurse is random and time is fixed because different nurses may produce different measurements at the specific time. Another example is assessing agreement among a fixed number of observers in an imaging study, where patients may have no scheduled visits at which these observers take measurements. In this example, observer is fixed and time is random because the same observers may produce different measurements at different times. Therefore, any combination of fixed or random effects for observer and time can arise in an agreement study of CCC or ICC, depending on the goals of the researchers.

In this paper, we propose new CCCs and ICCs for the remaining combinations of random or fixed effects for observer and time. We summarize and compare CCCs and ICCs across combinations of random or fixed effects for data with repeated measurements and illustrate the methodology with an example from an imaging study. Section 2 studies four combinations of random or fixed effects of observer and time for the two indices, and introduces the new CCCs and ICCs for the remaining combinations for different agreement assessments. Section 3 presents the estimation and inference for the methods introduced in Section 2. Section 4 demonstrates the performance of the new CCC for the case of random observer and fixed time through a simulation study. Section 5 illustrates the four combinations for CCC and ICC using image data. Finally, Section 6 discusses the comparison between these two indices.

Section snippets

Methodology

Consider N randomly selected subjects on whom measurements are taken by J observers at K time points. The two factors, observer and time, can each be treated as either random or fixed. If a factor is treated as random, its levels are regarded as a random sample from the corresponding population. If a factor is treated as fixed, its levels constitute the entire finite set of levels for that factor. Chen and Barnhart (2011) proposed CCCR for random observers and random

Estimation and inference

The point estimation and statistical inference of CCC for case (1) has been proposed by Chen and Barnhart (2011). Similar to their previous work, we present the estimation and inference of CCC for case (2) in Section 3.1, while the remaining cases can be done in a similar fashion. The estimation and inference of ICC for cases (1) through (4) are shown in Section 3.2 for different ANOVA assumptions.
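
To illustrate what an ANOVA-based ICC estimate looks like in the simplest setting, the sketch below computes the classical one-way random-effects ICC from between-subject and within-subject mean squares. This is not the ICC for cases (1) through (4) in this paper, which also involves observer and time effects; the function name `icc_oneway` and the balanced subjects-by-readings layout are assumptions made for illustration.

```python
import numpy as np


def icc_oneway(ratings):
    """One-way random-effects ICC for a single reading (illustrative helper).

    ratings: array of shape (n_subjects, k_readings), balanced design assumed.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    subject_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # Between-subject mean square from the one-way ANOVA table.
    msb = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    # Within-subject (residual) mean square.
    msw = ((ratings - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

With `ratings = [[1, 2], [3, 4], [5, 6]]`, the between-subject mean square is 8 and the within-subject mean square is 0.5, so the estimate is 7.5/8.5 (about 0.882).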

Simulation

To evaluate the performance of CCCRinter, CCCRintra, and CCCRabs, we carried out simulations based on 1000 Monte Carlo data sets. We only present the simulation results for the case of random observers and fixed times. Simulation results for the case of random observers and random times were reported previously (Chen and Barnhart, 2011). The results for the case of fixed observers and random times as well as fixed observers and fixed times are similar to the case of random observers and fixed

Data analysis

We use the image data discussed in Chen and Barnhart (2011) for illustrations. The purpose of the image study is to evaluate the pulmonary arterial hypertension measures by 2D-echocardiograms. To assess the agreement between sonographers who measure the 2D-echocardiogram images, two sonographers make measurements twice on 10 patients. The variables of interest for assessing agreement are Visual Ejection Fraction, Biplane Ejection Fraction, and Right Atrium Volume. The number of patients without

Discussion

In this paper, we have proposed new indices for CCC for assessing inter-observer, intra-observer, and absolute agreement under all four combinations for random or fixed effects of observer and time factors for data with repeated measurements. The point estimates of these CCCs regarding random or fixed effects are obtained by using the method of moments approach for each component of the CCC index. The sample estimates approach used in the paper is a non-parametric approach that has the

References (9)

  • C.-C. Chen et al. Comparison of ICC and CCC for assessing agreement for data without and with replications. Computational Statistics and Data Analysis (2008)
  • H.X. Barnhart et al. Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine (2005)
  • C.-C. Chen et al. Assessing agreement with repeated measures for random observers. Statistics in Medicine (2011)
  • T.S. King et al. A repeated measures concordance correlation coefficient. Statistics in Medicine (2007)
There are more references available in the full text version of this article.
