Regression Models for Clustered Binary Responses: Implications of Ignoring the Intracluster Correlation in an Analysis of Perinatal Mortality in Twin Gestations

doi:10.1016/j.annepidem.2004.08.007

Annals of Epidemiology

Volume 15, Issue 4, April 2005, Pages 293-301

https://doi.org/10.1016/j.annepidem.2004.08.007

Purpose

Dependent binary responses, such as health outcomes in twin pairs or siblings, frequently arise in perinatal epidemiologic research. This gives rise to correlated data, which must be taken into account during analysis to avoid erroneous statistical and biological inferences.

Methods

We present an analysis of perinatal mortality (fetal deaths plus deaths within the first 28 days) in twins in relation to cluster-varying risk factors (those unique to each fetus within a twin pregnancy, such as birthweight) and cluster-constant risk factors (those identical for both twins within a sibship, such as maternal smoking status). Marginal models (ordinary logistic regression [OLR] and logistic regression using generalized estimating equations [GEE]) and cluster-specific models (conditional and random-intercept logistic regression) are fit and their results contrasted. The United States “matched multiple data” file of twin births (1995–1997), which includes 285,226 twins from 142,613 pregnancies, was used to examine the implications of ignoring clustering on regression inferences.

Results

The OLR models provide variance estimates for cluster-constant covariates that ranged from 7% to 71% smaller than those from GEE-based models. This underestimation is even more pronounced for some cluster-varying covariates, ranging from 21% to 198%.

Conclusions

Ignoring the cluster dependency is likely to affect the precision of covariate effects and consequently interpretation of results. With widespread availability of appropriate software, statistical methods for taking the intracluster dependency into account are easily implemented and necessary.

Introduction

Epidemiologists frequently encounter data that arise from sibships, with twin studies being a special subset in which the cluster size is two. Analyses of these data present statistical challenges because responses for each twin within a sibship are correlated. Responses from twins are therefore said to be “clustered” or “correlated” within a pregnancy, and each such pregnancy is referred to as a “cluster.”

An important premise of many statistical models is called the “independence” assumption, where every observation in the study is assumed to be statistically independent of the others. However, twins tend to be more like one another than two randomly chosen individuals who are not twins. This non-independence within a cluster results in responses or outcomes being correlated, with the correlation commonly referred to as the intracluster or within-cluster correlation.
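The intracluster correlation described above can be estimated directly from counts of twin pairs. The following is a minimal sketch, assuming exchangeable twins and a simple moment estimator; the function names and the pair counts are hypothetical and are not taken from the study. It also computes the familiar design effect, 1 + (n − 1)ρ, which approximates how much the variance of a cluster-level quantity is inflated relative to independent observations when all clusters have size n.

```python
def pairwise_icc(both, one, neither):
    """Moment estimate of the intracluster correlation for binary twin pairs.

    both / one / neither: number of pairs in which two, one, or zero twins
    experienced the outcome. Twins within a pair are treated as exchangeable.
    """
    pairs = both + one + neither
    p = (2 * both + one) / (2 * pairs)   # pooled outcome proportion
    p_both = both / pairs                # P(Y_i1 = 1 and Y_i2 = 1)
    # Covariance of the two responses divided by their common variance.
    return (p_both - p * p) / (p * (1 - p))


def design_effect(rho, cluster_size=2):
    """Approximate variance inflation for equal-sized clusters."""
    return 1 + (cluster_size - 1) * rho


# Hypothetical counts for 1000 twin pairs (illustrative only).
rho = pairwise_icc(both=30, one=40, neither=930)
print(f"intracluster correlation: {rho:.3f}")
print(f"design effect for pairs:  {design_effect(rho):.3f}")
```

With these illustrative counts the correlation is about 0.58. When the probability that both twins experience the outcome equals the square of the pooled proportion, the estimate is zero, which corresponds to the independence assumption.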

The phenomenon of clustering and the consequences of ignoring the dependency among observations on statistical inferences are fairly well understood. Nonetheless, several recent studies on multiple pregnancies have not accounted for the dependence in the observations. For instance, a literature search on PubMed with the MeSH headings “twins” and “perinatal mortality” identified 12 studies published in English between January 2002 and April 2003, none of which accounted for non-independence (1–12). Conventional methods of statistical analysis (e.g., ordinary logistic regression) typically ignore the intracluster correlation, resulting in incorrect variance estimates and statistically inefficient estimates of the regression parameters (13).
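To make the variance problem concrete, the sketch below contrasts two variance estimates for a pooled mortality proportion: a naive estimate that treats every twin as independent, and a cluster-robust (sandwich-type) estimate that sums residuals within each pregnancy before squaring. This is only the single-proportion analogue of the contrast between ordinary logistic regression and a robust-variance approach, not the authors' analysis; the function names and data are hypothetical.

```python
def naive_variance(clusters):
    """Variance of the pooled proportion assuming all subjects are independent."""
    outcomes = [y for cluster in clusters for y in cluster]
    n = len(outcomes)
    p = sum(outcomes) / n
    return p * (1 - p) / n


def cluster_robust_variance(clusters):
    """Sandwich-type variance: residuals are summed within each cluster first."""
    outcomes = [y for cluster in clusters for y in cluster]
    n = len(outcomes)
    p = sum(outcomes) / n
    cluster_sums = [sum(y - p for y in cluster) for cluster in clusters]
    return sum(s * s for s in cluster_sums) / (n * n)


# Hypothetical data: 1000 twin pairs, outcome coded 1 = death, 0 = survival.
clusters = [(1, 1)] * 30 + [(1, 0)] * 40 + [(0, 0)] * 930

naive = naive_variance(clusters)
robust = cluster_robust_variance(clusters)
print(f"naive variance:          {naive:.3e}")
print(f"cluster-robust variance: {robust:.3e}")
print(f"ratio (robust / naive):  {robust / naive:.3f}")
```

Because deaths in these illustrative data tend to occur in both twins of a pair, the robust variance exceeds the naive one. With clusters of size one the two estimates coincide.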

It is important to distinguish two types of covariates for clustered data: 1) cluster-varying covariates, which are unique to each subject within a cluster (e.g., birthweight), and 2) cluster-constant covariates, which share the same value for all subjects within a cluster (e.g., maternal smoking). Another important consideration is the classification of regression models into marginal and cluster-specific approaches. In the marginal regression model, the goal is to make inferences about population parameters, that is, averaging responses over (possibly heterogeneous) clusters. In the cluster-specific approach, the probability of the response is modeled as a function of the covariates and a parameter specific to each cluster (14). Therefore, cluster-specific approaches address changes in an exposure for subjects within a cluster. As an illustration, one might be interested in assessing the change in the risk of a fetal death if the mother quits smoking during pregnancy.
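The distinction between marginal and cluster-specific parameters can be illustrated numerically. The sketch below assumes a random-intercept logistic model, logit P(Y = 1 | b, x) = α + βx + b with b drawn from a normal distribution with mean 0 and standard deviation σ, and averages over b by simple numerical integration to obtain the population-averaged probabilities. All parameter values and function names are hypothetical; the point is only that the marginal log odds ratio is closer to zero than the cluster-specific β whenever σ > 0.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, half_width=8.0, steps=4000):
    """Average expit(eta + sigma * z) over z ~ N(0, 1) by the trapezoidal rule."""
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * expit(eta + sigma * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)


def marginal_log_or(alpha, beta, sigma):
    """Population-averaged log odds ratio for a binary covariate x (1 vs 0)."""
    p0 = marginal_prob(alpha, sigma)
    p1 = marginal_prob(alpha + beta, sigma)
    return logit(p1) - logit(p0)


# Hypothetical values: cluster-specific log odds ratio beta = 1.0.
for sigma in (0.0, 1.0, 2.0):
    print(f"sigma = {sigma:.1f}: marginal log OR = "
          f"{marginal_log_or(-3.0, 1.0, sigma):.3f}")
```

When σ = 0 there is no between-cluster heterogeneity and the two parameters agree. As σ grows, the marginal effect shrinks toward zero, which is why coefficients from marginal and cluster-specific models are not directly comparable.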

The purposes of this article are to: 1) illustrate the concept of clustering of perinatal deaths in twins; 2) review marginal and cluster-specific models that account for the intracluster correlation; 3) evaluate the impact of cluster-varying and cluster-constant risk factors for perinatal mortality; and 4) examine the implications of ignoring the intracluster correlation on inferences. We also provide strategies for choosing an appropriate model, discuss the impact of missing data on regression inferences, and discuss a few limitations and caveats of some of the regression models for clustered binary responses.

Section snippets

Regression Models for Clustered Binary Responses

Let Y_ij denote a binary response (coded 0/1) for the jth subject (1 ≤ j ≤ n_i) from the ith cluster (1 ≤ i ≤ K). In the setting of twins, Y_i1 and Y_i2 may denote perinatal mortality status of twins 1 and 2, respectively, in the ith pregnancy. When n_i = 1 for all i, the observations are said to be independent (e.g., singleton births), while n_i = 2 would indicate data from twins. Associated with the response Y_ij, let z_ij denote a q-covariate vector of cluster-varying covariates, and x_i denote a p-q

Material and Methods

This study used the matched multiple birth file for all twin births in the United States, 1995–1997, which includes information from birth, and fetal and infant death certificates in twin and higher order pregnancies (33). From 304,466 twin fetuses, 3602 twins who were delivered before 20 weeks and 2451 twins with birthweight < 250g were sequentially excluded. Analysis was also restricted to blacks and whites, and therefore excluded 11,477 twins of “other” race/ethnicity. Finally, 1710 twins
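The sequential exclusions listed above can be checked against the cohort size reported in the abstract. The short sketch below does only this arithmetic; the helper names are hypothetical, and the reason for the final exclusion of 1710 twins is cut off in the text, so it is labeled as such rather than guessed. The death count used for the rate is the n = 8937 reported with the overall perinatal mortality rate.

```python
def apply_exclusions(initial, exclusions):
    """Subtract each exclusion count in order and return the remaining total."""
    remaining = initial
    for _reason, count in exclusions:
        remaining -= count
    return remaining


def rate_per_1000(events, total):
    return 1000 * events / total


exclusions = [
    ("delivered before 20 weeks", 3602),
    ("birthweight < 250 g", 2451),
    ("race/ethnicity other than black or white", 11477),
    ("final exclusion (reason truncated in the text)", 1710),
]

twins = apply_exclusions(304466, exclusions)
print(f"twins retained:       {twins}")
print(f"implied pregnancies:  {twins // 2}")
print(f"perinatal mortality:  {rate_per_1000(8937, twins):.1f} per 1000")
```

The remainder, 285,226 twins, equals twice the 142,613 pregnancies given in the abstract, and 8937 deaths among them corresponds to the reported 31.3 per 1000 twin births.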

Marginal Regression Models

The overall perinatal mortality rate was 31.3 (n = 8937) per 1000 twin births. A marginal model was fit using three different methods of estimation: the OLR, and GEE-based logistic regression models with independence (GEE-i) and exchangeable (GEE-e) working correlation structures:

$\log\left[\frac{\mu_{ij}^{m}}{1 - \mu_{ij}^{m}}\right] = \alpha^{m} + \gamma_{1}^{m}(\text{z-birthweight}) + \gamma_{2}^{m}(\text{z-birthweight})^{2} + \gamma_{3}^{m}(\text{Second twin}) + \gamma_{4}^{m}(\text{Smaller twin}) + \gamma_{5}^{m}(\text{Male sex}) + \beta_{6}^{m}\left(\frac{\text{Gestational age} - 35.5}{3.5}\right) + \beta_{7}^{m}\left(\frac{\text{Gestational age} - 35.5}{3.5}\right)^{2} + \beta_{8}^{m}(\text{Black race}) + \beta_{9}^{m}(\text{Primigravida}) + \beta_{10}^{m}(\ldots$

Discussion

The implications of ignoring the intracluster dependency while modeling binary responses can be substantial. In our study, the variance estimates of cluster-constant covariates based on OLR were considerably smaller than the corresponding estimates from the estimating procedures (GEE) that accounted for the correlation (Table 1). For instance, the variance for the birthweight effect on perinatal mortality was larger in the GEE than in the OLR model, especially among twins with increased

References (48)

R.S. Hartley et al. Size-discordant twin pairs have higher perinatal mortality rates than nondiscordant pairs. Am J Obstet Gynecol (2002)
J. Dube et al. Does chorionicity or zygosity predict adverse perinatal outcomes in twins? Am J Obstet Gynecol (2002)
D. Hedeker et al. MIXOR: A computer program for mixed-effects ordinal regression analysis. Comput Methods Programs Biomed (1996)
Y. Jacquemyn et al. A matched cohort comparison of the outcome of twin versus singleton pregnancies in Flanders, Belgium. Twin Res (2003)
S. Baghdadi et al. Twin pregnancy outcome and chorionicity. Acta Obstet Gynecol Scand (2003)
S. Chhabra et al. Reduction in the occurrence of uterine rupture in Central India. J Obstet Gynaecol (2002)
A.J. Roopnarinesingh et al. Vaginal breech delivery and perinatal mortality in twins. J Obstet Gynaecol (2002)
E.I. Nwobodo et al. Twin births at University of Maiduguri Teaching Hospital: Incidence, pregnancy complications and outcome. Niger J Med (2002)
R. Isaksson et al. Obstetric outcome among women with unexplained infertility after IVF: A matched case–control study. Hum Reprod (2002)
I.M. Usta et al. Comparison of the perinatal morbidity and mortality of the presenting twin and its co-twin. J Perinatol (2002)
A. Strauss et al. Multifetal gestation—maternal and perinatal outcome of 112 pregnancies. Fetal Diagn Ther (2002)
M.J. Platt et al. St. Vincent's Declaration 10 years on: Outcomes of diabetic pregnancies. Diabet Med (2002)
S.V. Glinianaia et al. Fetal or infant death in twin pregnancy: Neurodevelopmental consequence for the survivor. Arch Dis Child Fetal Neonatal Ed (2002)
S.L. Zeger et al. An overview of methods for the analysis of longitudinal data. Stat Med (1992)
R. Stiratelli et al. Random-effects models for serial observations with binary response. Biometrics (1984)
P. McCullagh et al. Generalized Linear Models (1989)
K.-Y. Liang et al. Longitudinal data analysis using generalized linear models. Biometrika (1986)
R.W.M. Wedderburn. Quasilikelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika (1974)
S.L. Zeger et al. Longitudinal data analysis for discrete and continuous outcomes. Biometrics (1986)
S.L. Zeger et al. Models for longitudinal data: A generalized estimating equations approach. Biometrics (1988)
S.G. Meester et al. A parametric model for cluster correlated categorical data. Biometrics (1994)
P.J. Heagerty. Marginally specified logistic-normal models for longitudinal binary data. Biometrics (1999)
Y. Ochi et al. Likelihood inference in a correlated probit regression model. Biometrika (1984)
L.P. Zhao et al. Correlated binary regression using a quadratic exponential model. Biometrika (1990)

This article was presented at the Society for Pediatric and Perinatal Epidemiologic Research (SPER) 16th annual meeting held in Palm Desert, CA, June 2003.

Dr. Ananth is partially supported by a grant (R01-HD038902) from the National Institutes of Health. Dr. Platt is a career scientist of the Canadian Institutes of Health Research.
