Assessing agreement with intraclass correlation coefficient and concordance correlation coefficient for data with repeated measures

doi:10.1016/j.csda.2012.11.004

Computational Statistics & Data Analysis

Volume 60, April 2013, Pages 132-145

https://doi.org/10.1016/j.csda.2012.11.004

Abstract

The intraclass correlation coefficient and the concordance correlation coefficient are two popular scaled indices for assessing the closeness between observers who make measurements for quantitative responses. These two indices are usually based on subject and observer effects only, so they cannot be used when an observer produces repeated measurements rather than replicated readings. In this paper, we consider not only subject and observer effects but also time effects for data with repeated measurements, since true replications are difficult to obtain in practice. We compare these two agreement indices for different combinations of random or fixed effects of observer and time. Finally, we use image data from 2D-echocardiograms to illustrate the proposed methodology and the comparison of the two indices. When a choice between the two indices is needed for repeated measurements, we recommend the new concordance correlation coefficient because it does not require the ANOVA assumptions.

Introduction

The intraclass correlation coefficient ($ICC$) and the concordance correlation coefficient ($CCC$) are two popular indices for assessing agreement between quantitative measurements taken by different observers. Both are usually applied to data without or with replications, and Chen and Barnhart (2008) compared the two indices for this data structure under a general model. However, this methodology cannot be used when repeated measurements rather than replications are collected. Several authors have studied $ICC$ and $CCC$ for data with repeated measurements. Vangeneugden et al. (2005) investigated the $ICC$ using linear mixed models with serial correlation for inter-observer, intra-observer, and absolute agreement, treating both observer and time as random effects. King et al. (2007) proposed a $CCC$ for assessing inter-observer agreement for a response with repeated measurements, treating both observer and time as fixed effects. Chen and Barnhart (2011) proposed a new $CCC$ for assessing inter-observer, intra-observer, and absolute agreement for data with repeated measurements, treating observers and times as random because researchers may assess agreement among many observers who take measurements at different time points.
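As a point of reference for the indices discussed here, the classic two-observer $CCC$ for data without repeated measures can be estimated from sample moments as $2s_{12}/(s_1^2 + s_2^2 + (\bar{y}_1 - \bar{y}_2)^2)$. The sketch below covers only that baseline form, not the repeated-measures indices of the cited papers; the function name `lin_ccc` and the use of divisor-$n$ moments are assumptions made for illustration.

```python
import numpy as np


def lin_ccc(x, y):
    """Moment estimate of the two-observer CCC for paired readings.

    Hypothetical helper for illustration; uses divisor-n sample moments.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    mean_x, mean_y = x.mean(), y.mean()
    # Sample covariance and variances with divisor n (ddof=0).
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    var_x, var_y = x.var(), y.var()
    # The squared mean difference penalises systematic shifts between observers.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Perfectly correlated readings with a constant shift of 1 between observers.
shifted = lin_ccc([1, 2, 3], [2, 3, 4])  # 4/7, about 0.571
```

A constant shift between observers lowers the index even though the two sets of readings are perfectly correlated, which is what separates agreement from correlation.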

In addition to treating both observer and time as random effects in defining $CCC$, there are situations where researchers may be interested in random observers with fixed times, or fixed observers with random times. For example, suppose a study is designed to assess agreement among subjects’ measurements taken by observers (e.g., nurses) over two shifts. Researchers can conduct the agreement study by randomly selecting several nurses from the nurse population and obtaining measurements from the chosen nurses at two time points. In this study, nurse is random and time is fixed because different nurses may produce different measurements at a specific time. Another example is an imaging study that assesses agreement among a fixed number of observers, where patients may have no scheduled visits for measurements by these observers. In this example, observer is fixed and time is random because the same observers may produce different measurements at different times. Therefore, any combination of fixed or random effects for observer and time can arise in an agreement study of $CCC$ or $ICC$, depending on the goals of the researchers.

In this paper, we propose new $CCC$s and $ICC$s for the remaining combinations of random or fixed effects for observer and time. We summarize and compare the $CCC$s and $ICC$s across combinations of random or fixed effects for data with repeated measurements and illustrate the methodology with an example from an imaging study. Section 2 studies the four combinations of random or fixed effects of observer and time for the two indices, and introduces the new $CCC$s and $ICC$s for the remaining combinations for different agreement assessments. Section 3 presents the estimation and inference for the methods introduced in Section 2. Section 4 demonstrates the performance of the new $CCC$ for the case of random observer and fixed time by a simulation study. Section 5 illustrates the four combinations for $CCC$ and $ICC$ using image data. Finally, Section 6 discusses the comparison between these two indices.

Section snippets

Methodology

Consider $N$ randomly selected subjects on whom measurements are taken by $J$ observers at $K$ time points. Two factors, observer and time, can each be treated as either random or fixed. If a factor is treated as random, its levels are regarded as a random sample from the corresponding population. If a factor is treated as fixed, its levels in the study are the complete, finite set of levels for that factor. Chen and Barnhart (2011) proposed ${CCC}_{R}$ for random observers and random

Estimation and inference

The point estimation and statistical inference of $CCC$ for case (1) were proposed by Chen and Barnhart (2011). As in that work, we present the estimation and inference of $CCC$ for case (2) in Section 3.1; the remaining cases can be handled in a similar fashion. The estimation and inference of $ICC$ for cases (1) through (4) are shown in Section 3.2 for different ANOVA assumptions.
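To make the role of ANOVA mean squares concrete, the simplest ANOVA-based index is the one-way random-effects $ICC$, computed from the between-subject and within-subject mean squares as $(MS_B - MS_W)/(MS_B + (k-1)MS_W)$ for $k$ readings per subject. The sketch below implements only this textbook case, which ignores separate observer and time effects; it is not the estimator for cases (1) through (4), and the name `icc_oneway` is hypothetical.

```python
import numpy as np


def icc_oneway(ratings):
    """One-way random-effects ICC from an (n_subjects, n_readings) array.

    Hypothetical helper for illustration; assumes a balanced design.
    """
    y = np.asarray(ratings, dtype=float)
    if y.ndim != 2 or y.shape[0] < 2 or y.shape[1] < 2:
        raise ValueError("need a 2-D array with at least 2 subjects and 2 readings")
    n, k = y.shape
    subject_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((y - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Three subjects, two readings each, with a constant offset between readings.
example = icc_oneway([[1, 2], [2, 3], [3, 4]])  # (2 - 0.5) / (2 + 0.5) = 0.6
```

Because the index is built from ANOVA mean squares, its interpretation depends on the ANOVA model being appropriate for the data.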

Simulation

To evaluate the performance of ${CCC}_{R}^{inter}$, ${CCC}_{R}^{intra}$, and ${CCC}_{R}^{abs}$, we carried out simulations based on 1000 Monte Carlo data sets. We only present the simulation results for the case of random observers and fixed times. Simulation results for the case of random observers and random times were reported previously (Chen and Barnhart, 2011). The results for the case of fixed observers and random times as well as fixed observers and fixed times are similar to the case of random observers and fixed
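A Monte Carlo study of this kind could be organized roughly as follows. This is a minimal sketch under assumed settings, not the authors' simulation design: the additive model, the variance parameters, the sample sizes, and the helper names (`simulate_readings`, `mean_pairwise_ccc`, `monte_carlo`) are all hypothetical, and the statistic is the plain two-observer moment $CCC$ averaged over time points rather than ${CCC}_{R}^{inter}$, ${CCC}_{R}^{intra}$, or ${CCC}_{R}^{abs}$. Observer effects are redrawn in every data set (random observers), while the time effects are held at the same constants (fixed times).

```python
import numpy as np


def simulate_readings(rng, n_subjects, n_observers, time_effects,
                      sd_subject=2.0, sd_observer=0.5, sd_error=1.0, mu=50.0):
    """Draw y[i, j, k] = mu + subject_i + observer_j + time_k + error_ijk.

    Assumed additive model with illustrative parameter values.
    """
    time_effects = np.asarray(time_effects, dtype=float)
    subject = rng.normal(0.0, sd_subject, size=(n_subjects, 1, 1))
    # Random observers: a fresh set of observer effects in every data set.
    observer = rng.normal(0.0, sd_observer, size=(1, n_observers, 1))
    # Fixed times: the same constants in every data set.
    time = time_effects.reshape(1, 1, -1)
    error = rng.normal(0.0, sd_error,
                       size=(n_subjects, n_observers, time_effects.size))
    return mu + subject + observer + time + error


def mean_pairwise_ccc(y):
    """Average the two-observer moment CCC (observers 0 and 1) over time points."""
    values = []
    for k in range(y.shape[2]):
        a, b = y[:, 0, k], y[:, 1, k]
        cov = np.mean((a - a.mean()) * (b - b.mean()))
        values.append(2.0 * cov / (a.var() + b.var() + (a.mean() - b.mean()) ** 2))
    return float(np.mean(values))


def monte_carlo(n_reps=1000, seed=0):
    """Return one estimate per simulated data set (30 subjects, 2 observers, 2 times)."""
    rng = np.random.default_rng(seed)
    return np.array([
        mean_pairwise_ccc(simulate_readings(rng, 30, 2, [0.0, 1.0]))
        for _ in range(n_reps)
    ])


estimates = monte_carlo()
print(f"mean = {estimates.mean():.3f}, SD = {estimates.std(ddof=1):.3f}")
```

Summaries such as the mean and standard deviation of the estimates across data sets can then be compared with the value implied by the assumed variance components.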

Data analysis

We use the image data discussed in Chen and Barnhart (2011) for illustrations. The purpose of the image study is to evaluate the pulmonary arterial hypertension measures by 2D-echocardiograms. To assess the agreement between sonographers who measure the 2D-echocardiogram images, two sonographers make measurements twice on 10 patients. The variables of interest for assessing agreement are Visual Ejection Fraction, Biplane Ejection Fraction, and Right Atrium Volume. The number of patients without

Discussion

In this paper, we have proposed new $CCC$ indices for assessing inter-observer, intra-observer, and absolute agreement under all four combinations of random or fixed effects of the observer and time factors for data with repeated measurements. The point estimates of these $CCC$s under random or fixed effects are obtained by using the method of moments approach for each component of the $CCC$ index. The sample estimates approach used in the paper is a non-parametric approach that has the

References (9)

C.-C. Chen et al. Comparison of $ICC$ and $CCC$ for assessing agreement for data without and with replications. Computational Statistics and Data Analysis (2008).
H.X. Barnhart et al. Assessing intra, inter and total agreement with replicated readings. Statistics in Medicine (2005).
C.-C. Chen et al. Assessing agreement with repeated measures for random observers. Statistics in Medicine (2011).
T.S. King et al. A repeated measures concordance correlation coefficient. Statistics in Medicine (2007).

There are more references available in the full text version of this article.

