Regression Models for Clustered Binary Responses: Implications of Ignoring the Intracluster Correlation in an Analysis of Perinatal Mortality in Twin Gestations
Introduction
Epidemiologists frequently encounter data that arise from sibships, with twin studies being a special subset in which the cluster size is two. Analyses of these data present statistical challenges because the responses of twins within a sibship are correlated. Responses from twins are therefore said to be “clustered” or “correlated” within a pregnancy, and each such pregnancy is referred to as a “cluster.”
Many statistical models rest on an “independence” assumption, under which every observation in the study is statistically independent of every other. However, twins tend to resemble one another more closely than two randomly chosen individuals who are not twins. This non-independence within a cluster makes the responses (outcomes) correlated; the correlation is commonly referred to as the intracluster or within-cluster correlation.
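The within-pair correlation of a binary outcome can be estimated directly from twin pairs. The sketch below is a minimal illustration, not the estimator used in this study: it computes a pairwise (double-entry) Pearson correlation, entering each pair twice so that the result does not depend on which twin is labelled twin 1. The function name `twin_pair_icc` and the toy pairs are hypothetical.

```python
from statistics import mean


def twin_pair_icc(pairs):
    """Within-pair (double-entry) correlation of a binary outcome in twin pairs.

    Each pair (y1, y2) is entered twice, as (y1, y2) and (y2, y1), so the
    estimate does not depend on which twin is labelled "twin 1".
    """
    first = [a for a, _ in pairs] + [b for _, b in pairs]
    second = [b for _, b in pairs] + [a for a, _ in pairs]
    m = mean(first)  # equals mean(second) by construction
    cov = mean((u - m) * (v - m) for u, v in zip(first, second))
    var = mean((u - m) ** 2 for u in first)
    return cov / var


# Hypothetical pairs: 1 = perinatal death, 0 = survival.
pairs = [(1, 1)] * 3 + [(0, 0)] * 15 + [(1, 0)] * 2
print(round(twin_pair_icc(pairs), 4))  # prints 0.6875
```

A value near zero would indicate that knowing one twin's outcome says little about the co-twin's; the positive value in this toy example reflects the concordant pairs.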
The phenomenon of clustering and the consequences for statistical inference of ignoring the dependence among observations are fairly well understood. Nonetheless, several recent studies of multiple pregnancies have not accounted for this dependence. For instance, a PubMed literature search with the MeSH headings “twins” and “perinatal mortality” identified 12 studies published in English between January 2002 and April 2003, none of which accounted for non-independence (1–12). Conventional methods of statistical analysis, such as ordinary logistic regression, ignore the intracluster correlation, resulting in incorrect variance estimates and statistically inefficient estimates of the regression parameters (13).
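One way to see why ignoring the correlation yields incorrect variances is to compare the independence-based variance of an overall proportion with a cluster-robust (sandwich-type) variance, in which residuals are summed within each cluster before squaring. This is a minimal sketch on hypothetical toy data, without small-sample corrections; the function name `proportion_variances` is illustrative.

```python
def proportion_variances(clusters):
    """Overall proportion with independence-based and cluster-robust variances.

    clusters: list of clusters, each a list of 0/1 responses (e.g., one list
    per twin pregnancy).
    """
    n = sum(len(c) for c in clusters)
    p = sum(sum(c) for c in clusters) / n
    # Naive variance: treats all n responses as independent Bernoulli draws.
    naive = p * (1 - p) / n
    # Cluster-robust variance: residuals are summed within each cluster before
    # squaring, so positively correlated responses inflate the variance.
    robust = sum((sum(c) - len(c) * p) ** 2 for c in clusters) / n**2
    return p, naive, robust


# Hypothetical twin pregnancies: 1 = perinatal death, 0 = survival.
clusters = [[1, 1]] * 3 + [[0, 0]] * 15 + [[1, 0]] * 2
p, naive, robust = proportion_variances(clusters)
print(p, naive, robust, robust / naive)
```

For clusters of size two, the ratio of the robust to the naive variance equals 1 + ρ, where ρ is the double-entry within-pair correlation of the responses; for these toy data ρ = 0.6875 and the ratio is 1.6875. An analysis that assumes independence therefore understates the variance whenever the within-pair correlation is positive.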
It is important to distinguish two types of covariates for clustered data: 1) cluster-varying covariates, whose values are unique to each subject within a cluster (e.g., birthweight), and 2) cluster-constant covariates, which share the same value for all subjects within a cluster (e.g., maternal smoking). Another important consideration is the classification of regression models into marginal and cluster-specific approaches. In a marginal regression model, the goal is to make inferences about population parameters, that is, about responses averaged over (possibly heterogeneous) clusters. In the cluster-specific approach, the probability of the response is modeled as a function of the covariates and a parameter specific to each cluster (14). Cluster-specific approaches therefore address changes in an exposure for subjects within a cluster. As an illustration, one might be interested in assessing the change in the risk of fetal death if the mother quits smoking during pregnancy.
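The difference between marginal and cluster-specific parameters can be made concrete under an assumed logistic-normal random-intercept model, logit P(Y_ij = 1 | x_i, b_i) = β0 + β1 x_i + b_i with b_i ~ N(0, σ²). The model form and the parameter values below are assumptions for illustration, not estimates from this study. The sketch averages the cluster-specific probabilities over b_i numerically (midpoint rule) and reports the implied population-averaged log odds ratio, which is smaller in magnitude than β1 whenever σ > 0. It also includes the closed-form approximation β_M ≈ β1 / √(1 + c²σ²) with c = 16√3/(15π), described by Zeger, Liang and Albert (1988). All function names are hypothetical.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, grid=4000, width=8.0):
    """Average of expit(eta + b) over b ~ N(0, sigma^2), by the midpoint rule."""
    if sigma == 0:
        return expit(eta)
    lo = -width * sigma
    h = 2 * width * sigma / grid
    total = 0.0
    for k in range(grid):
        b = lo + (k + 0.5) * h
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += expit(eta + b) * density * h
    return total


def marginal_log_or(beta0, beta1, sigma):
    """Population-averaged log odds ratio (x = 1 vs x = 0) implied by the
    cluster-specific model logit P(Y = 1 | x, b) = beta0 + beta1 * x + b."""
    p0 = marginal_prob(beta0, sigma)
    p1 = marginal_prob(beta0 + beta1, sigma)
    return logit(p1) - logit(p0)


def approx_marginal_beta(beta1, sigma):
    """Closed-form attenuation approximation beta1 / sqrt(1 + c^2 sigma^2)."""
    c = 16 * math.sqrt(3) / (15 * math.pi)
    return beta1 / math.sqrt(1 + c**2 * sigma**2)


# Hypothetical values: cluster-specific log OR of 1.0, random-intercept SD of 2.0.
print(marginal_log_or(-1.0, 1.0, 2.0), approx_marginal_beta(1.0, 2.0))
```

The two parameters answer different questions: β1 compares exposure levels within a cluster (same b_i), whereas the marginal log odds ratio compares population subgroups averaged over clusters, so neither is "wrong" when it differs from the other.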
The purposes of this article are to: 1) illustrate the concept of clustering of perinatal deaths in twins; 2) review marginal and cluster-specific models that account for the intracluster correlation; 3) evaluate the impact of cluster-varying and cluster-constant risk factors for perinatal mortality; and 4) examine the implications of ignoring the intracluster correlation on inferences. We also provide strategies for choosing an appropriate model, discuss the impact of missing data on regression inferences, and discuss a few limitations and caveats of some regression models for clustered binary responses.
Regression Models for Clustered Binary Responses
Let $Y_{ij}$ denote a binary response (coded 0/1) for the $j$th subject ($1 \le j \le n_i$) from the $i$th cluster ($1 \le i \le K$). In the setting of twins, $Y_{i1}$ and $Y_{i2}$ may denote the perinatal mortality status of twins 1 and 2, respectively, in the $i$th pregnancy. When $n_i = 1$ for all $i$, the observations are said to be independent (e.g., singleton births), while $n_i = 2$ indicates data from twins. Associated with the response $Y_{ij}$, let $z_{ij}$ denote a vector of $q$ cluster-varying covariates, and $x_i$ denote a $p-q$
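In practice, the distinction between cluster-varying covariates (the z_ij) and cluster-constant covariates (the x_i) can be checked directly in long-format data with one record per fetus. The sketch below labels a covariate cluster-constant if it takes a single value within every cluster. The function name, the field names (`pregnancy`, `birthweight`, `smoking`), and the values are hypothetical. This is an empirical check: a covariate that happens not to vary within any cluster in a particular sample will be labelled cluster-constant.

```python
from collections import defaultdict


def classify_covariates(rows, cluster_key, covariates):
    """Label each covariate as cluster-constant or cluster-varying.

    rows: long-format records (one dict per fetus); cluster_key names the field
    that identifies the pregnancy. A covariate is labelled cluster-constant if it
    takes a single value within every cluster in the data.
    """
    seen = {c: defaultdict(set) for c in covariates}
    for row in rows:
        for c in covariates:
            seen[c][row[cluster_key]].add(row[c])
    return {
        c: (
            "cluster-constant"
            if all(len(vals) == 1 for vals in seen[c].values())
            else "cluster-varying"
        )
        for c in covariates
    }


# Hypothetical records for two twin pregnancies.
rows = [
    {"pregnancy": 1, "twin": 1, "death": 0, "birthweight": 2450, "smoking": 1},
    {"pregnancy": 1, "twin": 2, "death": 1, "birthweight": 1980, "smoking": 1},
    {"pregnancy": 2, "twin": 1, "death": 0, "birthweight": 2700, "smoking": 0},
    {"pregnancy": 2, "twin": 2, "death": 0, "birthweight": 2650, "smoking": 0},
]
print(classify_covariates(rows, "pregnancy", ["birthweight", "smoking"]))
# {'birthweight': 'cluster-varying', 'smoking': 'cluster-constant'}
```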
Material and Methods
This study used the matched multiple birth file for all twin births in the United States, 1995–1997, which links information from birth certificates and from fetal and infant death certificates for twin and higher-order pregnancies (33). From 304,466 twin fetuses, 3602 twins delivered before 20 weeks and 2451 twins with birthweight < 250 g were sequentially excluded. The analysis was also restricted to blacks and whites, which excluded a further 11,477 twins of “other” race/ethnicity. Finally, 1710 twins
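The sequential exclusions can be tallied to track the size of the analytic sample at each step. This sketch uses only the three counts stated in full above; the final exclusion of 1710 twins is cut off in this excerpt and its reason is not given, so it is left out. The function name `apply_exclusions` is illustrative.

```python
def apply_exclusions(start, exclusions):
    """Apply exclusions in order; return (reason, n_excluded, n_remaining) per step."""
    remaining = start
    steps = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        steps.append((reason, n_excluded, remaining))
    return steps


# Counts as stated in the text; the final step (1710 twins) is omitted because
# its reason is truncated in this excerpt.
steps = apply_exclusions(304_466, [
    ("delivered before 20 weeks", 3_602),
    ("birthweight < 250 g", 2_451),
    ("race/ethnicity other than black or white", 11_477),
])
for reason, n_excluded, n_remaining in steps:
    print(f"{reason}: -{n_excluded:,} -> {n_remaining:,}")
```

After the three stated steps, 286,936 twin fetuses remain, before the final (truncated) exclusion.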
Marginal Regression Models
The overall perinatal mortality rate was 31.3 per 1000 twin births (n = 8937). A marginal model was fit using three estimation methods: ordinary logistic regression (OLR), and logistic regression based on generalized estimating equations (GEE) with independence (GEE-i) and exchangeable (GEE-e) working correlation structures:
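Under the exchangeable working correlation (GEE-e), the common within-cluster correlation is typically re-estimated at each iteration from Pearson residuals. The sketch below shows one such moment-estimator step, of the form described by Liang and Zeger (1986), given fitted probabilities from a current marginal model; it is not a complete GEE solver, and the function name, argument layout, and toy inputs are assumptions. Under the independence structure (GEE-i) this correlation is fixed at zero, so GEE-i reproduces the OLR point estimates and differs only in its robust variance.

```python
import math


def exchangeable_alpha(clusters, fitted, n_params):
    """One moment-estimator update of the exchangeable working correlation.

    clusters: list of clusters, each a list of 0/1 responses.
    fitted:   matching lists of fitted marginal probabilities (current fit).
    n_params: number of regression parameters p (degrees-of-freedom correction).
    """
    # Pearson residuals r_ij = (y_ij - mu_ij) / sqrt(mu_ij * (1 - mu_ij)).
    resid = [
        [(y - mu) / math.sqrt(mu * (1 - mu)) for y, mu in zip(ys, mus)]
        for ys, mus in zip(clusters, fitted)
    ]
    n_obs = sum(len(r) for r in resid)
    n_pairs = sum(len(r) * (len(r) - 1) // 2 for r in resid)
    # Scale (dispersion) estimate from squared residuals.
    phi = sum(e * e for r in resid for e in r) / (n_obs - n_params)
    # Sum of residual cross-products over all within-cluster pairs j < k.
    cross = sum(
        r[j] * r[k]
        for r in resid
        for j in range(len(r))
        for k in range(j + 1, len(r))
    )
    return cross / (n_pairs - n_params) / phi


# Hypothetical twin pregnancies and an intercept-only fit (mu = 0.2 for everyone).
clusters = [[1, 1]] * 3 + [[0, 0]] * 15 + [[1, 0]] * 2
fitted = [[0.2, 0.2] for _ in clusters]
print(exchangeable_alpha(clusters, fitted, n_params=1))
```

With `n_params=0` (no degrees-of-freedom correction) and fitted probabilities equal to the overall proportion, the estimate reduces to the simple double-entry within-pair correlation of the toy data, 0.6875.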
Discussion
The implications of ignoring the intracluster dependency while modeling binary responses can be substantial. In our study, the variance estimates of cluster-constant covariates based on OLR were considerably smaller than the corresponding estimates from the estimating procedures (GEE) that accounted for the correlation (Table 1). For instance, the variance for the birthweight effect on perinatal mortality was larger in the GEE than in the OLR model, especially among twins with increased
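The pattern reported here for cluster-constant covariates can be reproduced in a simple case: a single binary cluster-constant exposure (for example, a maternal characteristic shared by both twins), for which the logistic model is saturated. The OLR and GEE-i point estimates then both equal the crude log odds ratio, and only the variances differ: the model-based variance assumes independence, whereas the sandwich variance sums residuals within each pregnancy before squaring. This is a minimal sketch on hypothetical counts, without small-sample corrections; it is not the analysis reported in Table 1, and the function name is illustrative.

```python
import math


def log_or_variances(clusters_unexposed, clusters_exposed):
    """Log odds ratio for a binary cluster-constant exposure, with model-based
    (independence) and cluster-robust (sandwich) variances.

    Each argument is a list of clusters (lists of 0/1 responses). Because the
    exposure is constant within a cluster, each cluster falls in one group.
    """
    def group_stats(clusters):
        n = sum(len(c) for c in clusters)
        p = sum(sum(c) for c in clusters) / n
        info = n * p * (1 - p)  # "bread": information for logit(p) in this group
        meat = sum((sum(c) - len(c) * p) ** 2 for c in clusters)  # per-cluster scores
        return p, 1 / info, meat / info**2

    p0, naive0, robust0 = group_stats(clusters_unexposed)
    p1, naive1, robust1 = group_stats(clusters_exposed)
    log_or = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))
    # Groups contain disjoint clusters, so their variances add.
    return log_or, naive0 + naive1, robust0 + robust1


# Hypothetical twin pregnancies: 1 = perinatal death, 0 = survival.
unexposed = [[0, 0]] * 40 + [[1, 1]] * 2 + [[1, 0]] * 2
exposed = [[0, 0]] * 15 + [[1, 1]] * 3 + [[1, 0]] * 2
log_or, var_naive, var_robust = log_or_variances(unexposed, exposed)
print(round(log_or, 3), round(math.sqrt(var_naive), 3), round(math.sqrt(var_robust), 3))
```

Because concordant pairs dominate these toy data, the robust standard error is larger than the naive one, so confidence intervals from the independence analysis would be too narrow.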
References (48)
Size-discordant twin pairs have higher perinatal mortality rates than nondiscordant pairs. Am J Obstet Gynecol (2002).
Does chorionicity or zygosity predict adverse perinatal outcomes in twins? Am J Obstet Gynecol (2002).
MIXOR: A computer program for mixed-effects ordinal regression analysis. Comput Methods Programs Biomed (1996).
A matched cohort comparison of the outcome of twin versus singleton pregnancies in Flanders, Belgium. Twin Res (2003).
Twin pregnancy outcome and chorionicity. Acta Obstet Gynecol Scand (2003).
Reduction in the occurrence of uterine rupture in Central India. J Obstet Gynaecol (2002).
Vaginal breech delivery and perinatal mortality in twins. J Obstet Gynaecol (2002).
Twin births at University of Maiduguri Teaching Hospital: Incidence, pregnancy complications and outcome. Niger J Med (2002).
Obstetric outcome among women with unexplained infertility after IVF: A matched case–control study. Hum Reprod (2002).
Comparison of the perinatal morbidity and mortality of the presenting twin and its co-twin. J Perinatol (2002).
Multifetal gestation—maternal and perinatal outcome of 112 pregnancies. Fetal Diagn Ther.
St. Vincent's Declaration 10 years on: Outcomes of diabetic pregnancies. Diabet Med.
Fetal or infant death in twin pregnancy: Neurodevelopmental consequence for the survivor. Arch Dis Child Fetal Neonatal Ed.
An overview of methods for the analysis of longitudinal data. Stat Med.
Random-effects models for serial observations with binary response. Biometrics.
Generalized Linear Models.
Longitudinal data analysis using generalized linear models. Biometrika.
Quasilikelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika.
Longitudinal data analysis for discrete and continuous outcomes. Biometrics.
Models for longitudinal data: A generalized estimating equations approach. Biometrics.
A parametric model for cluster correlated categorical data. Biometrics.
Marginally specified logistic-normal models for longitudinal binary data. Biometrics.
Likelihood inference in a correlated probit regression model. Biometrika.
Correlated binary regression using a quadratic exponential model. Biometrika.
This article was presented at the Society for Pediatric and Perinatal Epidemiologic Research (SPER) 16th annual meeting held in Palm Desert, CA, June 2003.
Dr. Ananth is partially supported by a grant (R01-HD038902) from the National Institutes of Health. Dr. Platt is a career scientist of the Canadian Institutes of Health Research.