Elsevier

Annals of Epidemiology

Volume 15, Issue 4, April 2005, Pages 293-301
Regression Models for Clustered Binary Responses: Implications of Ignoring the Intracluster Correlation in an Analysis of Perinatal Mortality in Twin Gestations

https://doi.org/10.1016/j.annepidem.2004.08.007

Purpose

Dependent binary responses, such as health outcomes in twin pairs or siblings, frequently arise in perinatal epidemiologic research. This gives rise to correlated data, which must be taken into account during analysis to avoid erroneous statistical and biological inferences.

Methods

An analysis of perinatal mortality (fetal deaths plus deaths within the first 28 days) in twins is presented in relation to cluster-varying risk factors (those unique to each fetus within a twin pregnancy, such as birthweight) and cluster-constant risk factors (those identical for both twins within a sibship, such as maternal smoking status). Marginal models (ordinary logistic regression [OLR] and logistic regression using generalized estimating equations [GEE]) and cluster-specific models (conditional and random-intercept logistic regression) are fit and their results contrasted. The United States “matched multiple data” file of twin births (1995–1997), which includes 285,226 twins from 142,613 pregnancies, was used to examine the implications of ignoring clustering on regression inferences.

Results

The OLR models provide variance estimates for cluster-constant covariates that ranged from 7% to 71% smaller than those from GEE-based models. This underestimation is even more pronounced for some cluster-varying covariates, ranging from 21% to 198%.

Conclusions

Ignoring the cluster dependency is likely to affect the precision of covariate effects and consequently interpretation of results. With widespread availability of appropriate software, statistical methods for taking the intracluster dependency into account are easily implemented and necessary.

Introduction

Epidemiologists frequently encounter data that arise from sibships, with twin studies being a special subset in which the cluster size is two. Analyses of these data present statistical challenges because responses for each twin within a sibship are correlated. Responses from twins are therefore said to be “clustered” or “correlated” within a pregnancy, and each such pregnancy is referred to as a “cluster.”

An important premise of many statistical models is called the “independence” assumption, where every observation in the study is assumed to be statistically independent of the others. However, twins tend to be more like one another than two randomly chosen individuals who are not twins. This non-independence within a cluster results in responses or outcomes being correlated, with the correlation commonly referred to as the intracluster or within-cluster correlation.
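The within-cluster correlation described above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' analysis: it assumes each pregnancy contributes a shared, normally distributed effect on the log-odds of death, and the function names, baseline log-odds, and standard deviation are hypothetical values chosen only for illustration.

```python
import math
import random


def simulate_twin_pairs(n_pairs, baseline_logit, sd_shared, seed=0):
    """Simulate binary outcomes for twin pairs that share a pregnancy-level effect."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Shared effect: identical for both twins in the same pregnancy (assumption).
        shared = rng.gauss(0.0, sd_shared)
        p = 1.0 / (1.0 + math.exp(-(baseline_logit + shared)))
        pairs.append((int(rng.random() < p), int(rng.random() < p)))
    return pairs


def intracluster_correlation(pairs):
    """Pearson correlation between the outcomes of twin 1 and twin 2."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)


# Hypothetical settings: same baseline risk, with and without a shared effect.
pairs_correlated = simulate_twin_pairs(20000, baseline_logit=-2.0, sd_shared=2.0, seed=1)
pairs_independent = simulate_twin_pairs(20000, baseline_logit=-2.0, sd_shared=0.0, seed=1)

rho_correlated = intracluster_correlation(pairs_correlated)
rho_independent = intracluster_correlation(pairs_independent)
print(f"shared effect present: rho = {rho_correlated:.3f}")
print(f"no shared effect:      rho = {rho_independent:.3f}")
```

When the shared effect is switched off, the two outcomes in a pair behave like those of two unrelated individuals and the estimated correlation should be close to zero.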

The phenomenon of clustering and the consequences of ignoring the dependency among observations on statistical inferences are fairly well understood. Nonetheless, several recent studies on multiple pregnancies have not accounted for the dependence in the observations. For instance, a literature search on PubMed with the MeSH headings “twins” and “perinatal mortality” identified 12 studies published in English between January 2002 and April 2003, none of which accounted for non-independence (1–12). Conventional methods of statistical analysis (logistic regression) typically ignore the intracluster correlation, resulting in incorrect variance estimates and statistically inefficient estimates of the regression parameters (13).
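A worked equation may help show why ignoring the correlation yields incorrect variance estimates. The sketch below covers only the simplest case, estimating a single proportion from K twin pairs, and is not the regression setting of the article. The symbols p (common probability of death), ρ (intracluster correlation), and p̂ (the sample proportion) are introduced here for illustration.

```latex
% Variance of the number of deaths in one twin pair (Y_{i1}, Y_{i2} binary,
% each with probability p, correlation \rho):
\operatorname{Var}(Y_{i1} + Y_{i2})
  = 2p(1-p) + 2\rho\,p(1-p)
  = 2p(1-p)(1+\rho)

% Variance of the overall proportion from K independent pairs:
\operatorname{Var}(\hat{p})
  = \operatorname{Var}\!\left(\frac{1}{2K}\sum_{i=1}^{K}\,(Y_{i1}+Y_{i2})\right)
  = \frac{p(1-p)}{2K}\,(1+\rho)

% The naive formula that treats all 2K twins as independent drops the factor (1+\rho):
\operatorname{Var}_{\text{naive}}(\hat{p}) = \frac{p(1-p)}{2K}
```

Under these assumptions, a positive ρ means the naive variance is too small by the factor 1 + ρ, so confidence intervals that ignore clustering are too narrow.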

It is important to distinguish the two types of covariates for clustered data: 1) cluster-varying covariates, which are unique to each subject within a cluster (e.g., birthweight), and 2) cluster-constant covariates, which share the same value for all subjects within a cluster (e.g., maternal smoking). Another important consideration is the classification of regression models into marginal and cluster-specific approaches. In the marginal regression model, the goal is to make inferences about population parameters, that is, averaging responses over (possibly heterogeneous) clusters. In the cluster-specific approach, the probability of the response is modeled as a function of the covariates and a parameter specific to each cluster (14). Therefore, cluster-specific approaches address changes in an exposure for subjects within a cluster. As an illustration, one might be interested in assessing the change in the risk of a fetal death if the mother quits smoking during pregnancy.
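The distinction between the two model classes has a numerical consequence: in a random-intercept logistic model, the population-averaged (marginal) log odds ratio is generally closer to zero than the cluster-specific one. The following sketch illustrates this by averaging cluster-specific probabilities over a normal random intercept with a simple trapezoid rule. The intercept, coefficient, and standard deviation are hypothetical values, not estimates from the article.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_probability(linear_predictor, sd_intercept, n_grid=2001, width=8.0):
    """Average expit(linear_predictor + b) over b ~ Normal(0, sd_intercept**2).

    Uses a trapezoid rule on a standard-normal grid spanning +/- `width`.
    """
    step = 2.0 * width / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for k in range(n_grid):
        z = -width + k * step
        w = math.exp(-0.5 * z * z)
        if k == 0 or k == n_grid - 1:
            w *= 0.5  # trapezoid end-point weights
        total += w * expit(linear_predictor + sd_intercept * z)
        weight_sum += w
    return total / weight_sum


# Hypothetical cluster-specific model: logit P(death | b) = alpha + beta * exposed + b
alpha = -3.0
beta_cluster_specific = 1.0
sd_intercept = 2.0

p_unexposed = marginal_probability(alpha, sd_intercept)
p_exposed = marginal_probability(alpha + beta_cluster_specific, sd_intercept)
beta_marginal = logit(p_exposed) - logit(p_unexposed)

print(f"cluster-specific log odds ratio: {beta_cluster_specific:.3f}")
print(f"marginal log odds ratio:         {beta_marginal:.3f}")
```

The two coefficients answer different questions. The cluster-specific value describes the change within a given pregnancy, whereas the marginal value compares exposed and unexposed groups averaged over all pregnancies.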

The purposes of this article are to: 1) illustrate the concept of clustering of perinatal deaths in twins; 2) review marginal and cluster-specific models that account for the intracluster correlation; 3) evaluate the impact of cluster-varying and cluster-constant risk factors for perinatal mortality; and 4) examine the implications of ignoring the intracluster correlation on inferences. We also provide strategies for choosing an appropriate model, discuss the impact of missing data on regression inferences, and discuss a few limitations and caveats of some of the regression models for clustered binary responses.

Section snippets

Regression Models for Clustered Binary Responses

Let $Y_{ij}$ denote a binary response (coded 0/1) for the $j$th subject ($1 \le j \le n_i$) from the $i$th cluster ($1 \le i \le K$). In the setting of twins, $Y_{i1}$ and $Y_{i2}$ may denote perinatal mortality status of twins 1 and 2, respectively, in the $i$th pregnancy. When $n_i = 1$ for all $i$, the observations are said to be independent (e.g., singleton births), while $n_i = 2$ would indicate data from twins. Associated with the response $Y_{ij}$, let $z_{ij}$ denote a $q$-covariate vector of cluster-varying covariates, and $x_i$ denote a $p-q$

Material and Methods

This study used the matched multiple birth file for all twin births in the United States, 1995–1997, which includes information from birth, and fetal and infant death certificates in twin and higher order pregnancies (33). From 304,466 twin fetuses, 3602 twins who were delivered before 20 weeks and 2451 twins with birthweight < 250g were sequentially excluded. Analysis was also restricted to blacks and whites, and therefore excluded 11,477 twins of “other” race/ethnicity. Finally, 1710 twins
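The sequential exclusions can be checked arithmetically against the sample size reported in the abstract (285,226 twins from 142,613 pregnancies). The short sketch below does that. The variable names are illustrative, and the reason for the final exclusion of 1710 twins is left unspecified because it is not stated in this excerpt.

```python
initial_twins = 304_466

# (reason, number of twins excluded), applied in the order given in the text.
exclusions = [
    ("delivered before 20 weeks", 3_602),
    ("birthweight < 250 g", 2_451),
    ('"other" race/ethnicity', 11_477),
    ("final exclusion (reason not stated in this excerpt)", 1_710),
]

remaining = initial_twins
for reason, count in exclusions:
    remaining -= count
    print(f"after excluding {count:>6,} ({reason}): {remaining:,} twins")

analytic_twins = remaining
analytic_pregnancies = analytic_twins // 2  # two twins per pregnancy
print(f"analytic sample: {analytic_twins:,} twins from {analytic_pregnancies:,} pregnancies")
```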

Marginal Regression Models

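The overall perinatal mortality rate was 31.3 (n = 8937) per 1000 twin births. A marginal model was fit using three different methods of estimation: the OLR, and GEE-based logistic regression models with independence (GEE-i) and exchangeable (GEE-e) working correlation structures:

$$
\log\left[\frac{\mu_{ij}^{m}}{1-\mu_{ij}^{m}}\right] = \alpha^{m} + \gamma_{1}^{m}(\text{zbirthweight}) + \gamma_{2}^{m}(\text{zbirthweight})^{2} + \gamma_{3}^{m}(\text{Second twin}) + \gamma_{4}^{m}(\text{Smaller twin}) + \gamma_{5}^{m}(\text{Male sex}) + \beta_{6}^{m}\left(\frac{\text{Gestational age}-35.5}{3.5}\right) + \beta_{7}^{m}\left(\frac{\text{Gestational age}-35.5}{3.5}\right)^{2} + \beta_{8}^{m}(\text{Black race}) + \beta_{9}^{m}(\text{Primigravida}) + \beta_{10}^{m}(\ldots
$$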
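Two quantities in this passage are simple to reproduce: the mortality rate per 1000 births, and the centered and scaled gestational age terms that enter the model. The sketch below assumes the rate's denominator is the analytic sample of 285,226 twins reported in the abstract. The helper names are hypothetical, and no model coefficients are implied.

```python
def perinatal_mortality_rate(deaths, births, per=1000):
    """Deaths per `per` births."""
    return per * deaths / births


def scaled_gestational_age(weeks, center=35.5, scale=3.5):
    """Gestational age centered at 35.5 weeks and divided by 3.5, as in the model."""
    return (weeks - center) / scale


# Assumption: 285,226 twins (from the abstract) is the denominator of the rate.
rate = perinatal_mortality_rate(deaths=8937, births=285_226)
print(f"perinatal mortality: {rate:.1f} per 1000 twin births")

# Linear and quadratic gestational age terms for a twin delivered at 39 weeks.
ga_linear = scaled_gestational_age(39.0)
ga_quadratic = ga_linear ** 2
print(f"scaled gestational age: {ga_linear}, squared: {ga_quadratic}")
```

Centering and scaling in this way puts the linear and quadratic terms on a comparable scale, with a value of zero at 35.5 weeks.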

Discussion

The implications of ignoring the intracluster dependency while modeling binary responses can be substantial. In our study, the variance estimates of cluster-constant covariates based on OLR were considerably smaller than the corresponding estimates from the estimating procedures (GEE) that accounted for the correlation (Table 1). For instance, the variance for the birthweight effect on perinatal mortality was larger in the GEE than in the OLR model, especially among twins with increased
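The direction of this effect for a cluster-constant exposure can be sketched with simulated data. The example below is not a GEE fit. It swaps in a simpler cluster-aware estimator, the between-pair variance of pair-level means, and compares it with the naive binomial variance that treats every twin as independent. The exposure effect, baseline log-odds, and shared-effect standard deviation are hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_pairs(n_pairs, sd_shared, seed=0):
    """Return (exposed, deaths_in_pair) for each simulated twin pregnancy."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_pairs):
        exposed = i % 2  # cluster-constant exposure, e.g. maternal smoking
        shared = rng.gauss(0.0, sd_shared)  # pregnancy-level effect (assumption)
        p = expit(-2.0 + 0.5 * exposed + shared)
        deaths = int(rng.random() < p) + int(rng.random() < p)
        rows.append((exposed, deaths))
    return rows


def risk_and_variances(rows, exposed):
    """Risk in one exposure group with naive and cluster-based variance estimates."""
    deaths = [d for e, d in rows if e == exposed]
    k = len(deaths)  # number of pairs in this group
    risk = sum(deaths) / (2 * k)
    naive_var = risk * (1 - risk) / (2 * k)  # treats 2k twins as independent
    pair_means = [d / 2 for d in deaths]
    cluster_var = sum((m - risk) ** 2 for m in pair_means) / (k - 1) / k
    return risk, naive_var, cluster_var


rows = simulate_pairs(20000, sd_shared=2.0, seed=7)
risk0, naive0, cluster0 = risk_and_variances(rows, exposed=0)
risk1, naive1, cluster1 = risk_and_variances(rows, exposed=1)

risk_difference = risk1 - risk0
naive_variance = naive0 + naive1
cluster_variance = cluster0 + cluster1
variance_ratio = cluster_variance / naive_variance

print(f"risk difference: {risk_difference:.4f}")
print(f"cluster-based / naive variance: {variance_ratio:.2f}")
```

A ratio above 1 indicates that the naive variance understates the uncertainty in the exposure effect, which is the pattern the passage above describes for OLR relative to GEE.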

References (48)

  • R.S. Hartley et al.

    Size-discordant twin pairs have higher perinatal mortality rates than nondiscordant pairs

    Am J Obstet Gynecol

    (2002)
  • J. Dube et al.

    Does chorionicity or zygosity predict adverse perinatal outcomes in twins?

    Am J Obstet Gynecol

    (2002)
  • D. Hedeker et al.

    MIXOR: A computer program for mixed-effects ordinal regression analysis

    Comput Methods Programs Biomed

    (1996)
  • Y. Jacquemyn et al.

    A matched cohort comparison of the outcome of twin versus singleton pregnancies in Flanders, Belgium

    Twin Res

    (2003)
  • S. Baghdadi et al.

    Twin pregnancy outcome and chorionicity

    Acta Obstet Gynecol Scand

    (2003)
  • S. Chhabra et al.

    Reduction in the occurrence of uterine rupture in Central India

    J Obstet Gynaecol

    (2002)
  • A.J. Roopnarinesingh et al.

    Vaginal breech delivery and perinatal mortality in twins

    J Obstet Gynaecol

    (2002)
  • E.I. Nwobodo et al.

    Twin births at University of Maiduguri Teaching Hospital: Incidence, pregnancy complications and outcome

    Niger J Med

    (2002)
  • R. Isaksson et al.

    Obstetric outcome among women with unexplained infertility after IVF: A matched case–control study

    Hum Reprod

    (2002)
  • I.M. Usta et al.

    Comparison of the perinatal morbidity and mortality of the presenting twin and its co-twin

    J Perinatol

    (2002)
  • A. Strauss et al.

    Multifetal gestation—maternal and perinatal outcome of 112 pregnancies

    Fetal Diagn Ther

    (2002)
  • M.J. Platt et al.

    St. Vincent's Declaration 10 years on: Outcomes of diabetic pregnancies

    Diabet Med

    (2002)
  • S.V. Glinianaia et al.

    Fetal or infant death in twin pregnancy: Neurodevelopmental consequence for the survivor

    Arch Dis Child Fetal Neonatal Ed

    (2002)
  • S.L. Zeger et al.

    An overview of methods for the analysis of longitudinal data

    Stat Med

    (1992)
  • R. Stiratelli et al.

    Random-effects models for serial observations with binary response

    Biometrics

    (1984)
  • P. McCullagh et al.

    Generalized Linear Models

    (1989)
  • K.-Y. Liang et al.

    Longitudinal data analysis using generalized linear models

    Biometrika

    (1986)
  • R.W.M. Wedderburn

    Quasilikelihood functions, generalized linear models, and the Gauss-Newton method

    Biometrika

    (1974)
  • S.L. Zeger et al.

    Longitudinal data analysis for discrete and continuous outcomes

    Biometrics

    (1986)
  • S.L. Zeger et al.

    Models for longitudinal data: A generalized estimating equations approach

    Biometrics

    (1988)
  • S.G. Meester et al.

    A parametric model for cluster correlated categorical data

    Biometrics

    (1994)
  • P.J. Heagerty

    Marginally specified logistic-normal models for longitudinal binary data

    Biometrics

    (1999)
  • Y. Ochi et al.

    Likelihood inference in a correlated probit regression model

    Biometrika

    (1984)
  • L.P. Zhao et al.

    Correlated binary regression using a quadratic exponential model

    Biometrika

    (1990)
    This article was presented at the Society for Pediatric and Perinatal Epidemiologic Research (SPER) 16th annual meeting held in Palm Desert, CA, June 2003.

    Dr. Ananth is partially supported by a grant (R01-HD038902) from the National Institutes of Health. Dr. Platt is a career scientist of the Canadian Institutes of Health Research.