Flexible modeling of survival data with covariates subject to detection limits via multiple imputation
Introduction
Biomedical datasets frequently contain variables that are subject to censoring. Censoring due to detection limits (DLs) often occurs in practice when medical instruments are unable to measure a biological factor below a certain known value. In the motivating Genetic and Inflammatory Markers of Sepsis (GenIMS) study, it is of interest to model the survival times of patients with community-acquired pneumonia using several biomarkers and demographic covariates. In this study, the survival times are subject to right censoring while three important biomarkers are left-censored below known DLs. Our main objective in this paper is to develop a computationally efficient procedure for conducting inference in survival models with covariates subject to DLs.
Traditionally, a few common approaches have been taken to handle censoring due to DLs. Most simply, a complete case approach can be applied, in which data from individuals with censored covariates are discarded entirely. While we prove in Section 3.1 that a complete case analysis yields consistent estimates in accelerated failure time models when covariates are censored due to DLs, efficiency is lost because data are deleted. To use all of the data, a second approach has become increasingly common, in which censored observations are replaced with a fixed value such as the DL, DL/2, or DL/√2 (Hornung and Reed, 1990), or with the conditional mean of the censored covariate (Austin and Hoch, 2004, Giovanini, 2008, Arunajadai and Rauh, 2012). These naive substitution methods have been shown to produce biased parameter estimates in generalized linear models (Lynn, 2001, Austin and Brunner, 2003, Lubin et al., 2004, Rigobon and Stoker, 2007, Rigobon and Stoker, 2009, Helsel, 2012, Bernhardt et al., in press) and in survival models (D’Angelo and Weissfeld, 2008, Sattar et al., 2012).
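The contrast between complete case analysis and naive substitution can be seen in a small simulation. This is a minimal sketch under assumptions that are not the paper's: a linear regression instead of an AFT model, a single standard-normal covariate, a DL of 0 (about half the values censored), and hypothetical true coefficients 1 and 2. Because the selection in the complete case analysis depends only on the covariate, the slope is still estimated consistently. Replacing censored values by the DL distorts the covariate and biases the slope.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    return np.polyfit(x, y, 1)[0]

rng = np.random.default_rng(2014)
n = 200_000
beta0, beta1 = 1.0, 2.0            # hypothetical true coefficients
x = rng.normal(0.0, 1.0, n)        # covariate, e.g. a log-biomarker
y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)

dl = 0.0                           # hypothetical detection limit (about 50% censored)
below = x < dl

# Complete case: drop every individual whose covariate is below the DL.
cc_slope = ols_slope(x[~below], y[~below])

# Naive substitution: replace every censored value by the DL itself.
x_sub = np.where(below, dl, x)
sub_slope = ols_slope(x_sub, y)

print(f"complete case: {cc_slope:.3f}, substitution: {sub_slope:.3f}")
```

With these settings the complete case slope stays near 2. The substituted slope is pulled toward roughly 2.9, which is the ratio Cov(y, max(x, 0)) / Var(max(x, 0)) for a standard normal x. The complete case estimate has a larger variance, and that efficiency loss is what motivates imputation.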
A few researchers have considered alternative methods for handling censored covariates in survival models. Langohr et al. (2004) and Sattar et al. (2012) proposed fully parametric survival models with a single interval-censored predictor. Lee et al. (2003) also considered survival models with a single covariate subject to DLs, though they proposed semiparametric Cox proportional hazards models in which the relative risk function for the censored covariate is replaced by a nonparametric estimate of its expected value. More recently, D’Angelo and Weissfeld (2008) developed an indexing approach in which censored covariate values are directly replaced by their conditional expectation given a linear combination of the fully observed covariates. While their method performs reasonably well, it is somewhat ad hoc and limited to cases in which no more than two covariates are subject to DLs.
In this paper, we develop a straightforward, computationally efficient multiple imputation method for handling multiple covariates subject to DLs in accelerated failure time (AFT) models for censored survival data. To increase the flexibility of the AFT model, we recommend using the seminonparametric (SNP) distribution to model the error term. We establish the consistency and asymptotic normality of the multiple imputation estimator and propose a convenient variance estimation method. We additionally suggest an iterative version of this estimator, which improves efficiency after only a few updates. Though multiple imputation has been studied in many missing-data problems, its development for censored covariates in flexible survival models is nonstandard. Through numerical studies, we demonstrate that the proposed estimator leads to unbiased estimates and can be more efficient than several competing methods. We also show that using the flexible SNP distribution is more robust than typical parametric methods.
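The paper derives its own variance estimator for the multiple imputation estimator, and its form is not reproduced in this excerpt. For orientation only, the sketch below shows the textbook pooling step known as Rubin's rules. It averages the M completed-data estimates and adds the between-imputation spread to the average within-imputation variance. Treat it as a generic reference point, not as the authors' estimator. The input numbers are hypothetical.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool M completed-data analyses with Rubin's rules.

    estimates: (M, p) array of point estimates, one row per imputed dataset.
    variances: (M, p) array of the matching squared standard errors.
    Returns the pooled estimate and its total variance for each coefficient.
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = est.shape[0]
    pooled = est.mean(axis=0)              # average of the M point estimates
    within = var.mean(axis=0)              # W: mean within-imputation variance
    between = est.var(axis=0, ddof=1)      # B: between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical results for one coefficient from M = 3 imputed datasets.
pooled, total = rubin_combine([[0.50], [0.56], [0.47]], [[0.010], [0.012], [0.011]])
print(pooled, total)
```

The (1 + 1/M) factor accounts for using a finite number of imputations. A variance derived specifically for censored covariates in an SNP-AFT model, as the paper proposes, may differ from this generic rule.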
The remainder of the paper is organized as follows. In Section 2, we review AFT models and the SNP distribution. We also briefly explain how to fit AFT models with an SNP error term. In Section 3, we develop the proposed multiple imputation methods and establish their asymptotic properties. In Section 4, we carry out extensive simulations to compare the performance of the proposed methods with several simpler approaches. In Section 5, we apply the proposed methods to the dataset from the GenIMS study. Finally, in Section 6, we discuss the limitations of the proposed multiple imputation methods and some avenues for further research. The technical details for the proposition and theorems appearing in this paper are provided in the online Supplementary material.
Seminonparametric accelerated failure time model
We first present the proposed seminonparametric accelerated failure time (SNP-AFT) model and discuss the algorithm for fitting the model when covariates are fully observed.
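The SNP family is usually traced to Gallant and Nychka (1987), listed in the references. In its common form, the density is a squared polynomial times the standard normal density, normalized to integrate to one. In an AFT model, this density would serve as the density of Z in log T = XᵀΒ + σZ. The sketch below implements only that generic form. The exact parameterization used in the paper (any constraint on the coefficients, location/scale standardization, or choice of degree K) is not shown in this excerpt, and the coefficient values are hypothetical.

```python
import math
import numpy as np

def normal_moment(n):
    """E[Z**n] for Z ~ N(0, 1): zero for odd n, (n - 1)!! for even n."""
    if n % 2 == 1:
        return 0.0
    result = 1.0
    for k in range(n - 1, 0, -2):
        result *= k
    return result

def snp_density(z, coefs):
    """SNP density f(z) = P_K(z)^2 * phi(z) / C with P_K(z) = sum_k a_k z^k.

    C = a^T M a, where M[j, k] = E[Z^(j+k)], makes f integrate to one.
    coefs = [1.0] (K = 0) gives back the standard normal density.
    """
    a = np.asarray(coefs, dtype=float)
    K = len(a) - 1
    M = np.array([[normal_moment(j + k) for k in range(K + 1)] for j in range(K + 1)])
    norm_const = a @ M @ a
    z = np.asarray(z, dtype=float)
    poly = np.polynomial.polynomial.polyval(z, a)   # a_0 + a_1 z + ... + a_K z^K
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return poly**2 * phi / norm_const

grid = np.linspace(-10.0, 10.0, 20001)
dz = grid[1] - grid[0]
f = snp_density(grid, [1.0, 0.3, -0.2])     # hypothetical K = 2 coefficients
print(f.sum() * dz)                         # numerical integral, approximately 1
```

Squaring the polynomial keeps the density nonnegative for any coefficients. Raising K lets the error distribution become skewed or multimodal while still nesting the normal AFT model as the K = 0 case.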
Problem set-up
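For the remainder of this paper, we assume that some covariates in $\mathbf{X}_i$ are subject to censoring due to lower DLs. Thus, for the $i$th individual, we let $\mathbf{X}_i = (\mathbf{X}_{i1}^T, \mathbf{X}_{i2}^T)^T$, where $\mathbf{X}_{i1}$ is the $p_1$-dimensional vector of covariates fully observed for each individual and $\mathbf{X}_{i2}$ is the $p_2$-dimensional vector of covariates subject to censoring below $\mathbf{d}$, the vector of DLs. For $\mathbf{X}_{i2}$, we only observe $\mathbf{W}_i = \max(\mathbf{X}_{i2}, \mathbf{d})$ (componentwise) and $\boldsymbol{\Delta}_i = I(\mathbf{X}_{i2} \ge \mathbf{d})$, so that the complete set of observed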
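For a covariate censored below a lower DL, the usual observed data are the value max(X, DL) together with an indicator of whether X was actually measured. A single imputation then replaces each censored value with a random draw from a model for X restricted to values below the DL. The sketch below assumes that representation. It also assumes a normal imputation model N(mu, sigma²) with hypothetical values mu = 0 and sigma = 1, sampled by inverse-CDF draws from the truncated normal. In practice mu and sigma would come from a fitted model for the censored covariate given the fully observed covariates, and a proper imputation model would typically also condition on the survival outcome. The paper's actual imputation distribution is not shown in this excerpt, and the helper names are hypothetical.

```python
import random
from statistics import NormalDist

def observe(x, dl):
    """Observed data for one covariate with a lower DL:
    the value max(x, dl) and an indicator that x was measured (x >= dl)."""
    return max(x, dl), x >= dl

def impute_below_dl(mu, sigma, dl, rng):
    """Draw X ~ N(mu, sigma^2) conditioned on X < dl by inverse-CDF sampling."""
    dist = NormalDist(mu, sigma)
    p_below = dist.cdf(dl)                  # P(X < dl) under the assumed model
    u = (1.0 - rng.random()) * p_below      # uniform on (0, p_below]
    return dist.inv_cdf(u)

def complete_covariate(w, measured, mu, sigma, dl, rng):
    """Keep measured values; replace censored ones with a random draw below the DL."""
    return w if measured else impute_below_dl(mu, sigma, dl, rng)

rng = random.Random(1)
dl = -0.5                                   # hypothetical detection limit
true_x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
observed = [observe(x, dl) for x in true_x]
imputed = [complete_covariate(w, m, 0.0, 1.0, dl, rng) for w, m in observed]
```

Repeating the last step M times with fresh random draws gives M completed datasets. Each is analyzed with the full-data model, and the M results are then combined.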
Simulation
We conducted numerical studies to assess the performance of the proposed multiple imputation and iterated multiple imputation estimators. We designed the simulations to mimic the application described in Section 5. Specifically, we generated three covariates: one representing an “age” variable and two representing the log-cytokines TNF and IL-10. Observations of the two log-cytokine covariates were censored at the DLs
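The excerpt breaks off before the simulation settings, so the sketch below only illustrates the general shape of such a design. It uses one fully observed "age" covariate and two correlated log-cytokines left-censored at their DLs. Survival times come from an AFT model and are right-censored by an independent censoring time. Every numeric value is a hypothetical placeholder, not the paper's setting: the coefficients, error scale, correlation, DLs, and uniform censoring range. The normal error term is also an assumption.

```python
import numpy as np

def simulate_dataset(n, rng, dls=(-0.5, -0.5), beta=(2.0, 0.3, 0.5, -0.4), sigma=0.8):
    """Generate one hypothetical dataset resembling the design described above.

    Covariates: age (fully observed) and two log-cytokines (left-censored at dls).
    Survival: log T = b0 + b1*age + b2*x2 + b3*x3 + sigma*eps with eps ~ N(0, 1),
    right-censored by an independent Uniform(1, 30) censoring time.
    """
    age = rng.normal(0.0, 1.0, n)
    cov = np.array([[1.0, 0.5], [0.5, 1.0]])             # assumed cytokine correlation
    cyt = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    X = np.column_stack([np.ones(n), age, cyt])
    log_t = X @ np.asarray(beta) + sigma * rng.standard_normal(n)
    log_c = np.log(rng.uniform(1.0, 30.0, n))             # censoring times on log scale
    y = np.minimum(log_t, log_c)                          # observed log time
    event = log_t <= log_c                                # True = event observed
    dl = np.asarray(dls)
    measured = cyt >= dl                                  # cytokine at or above its DL
    w = np.maximum(cyt, dl)                               # censored values recorded at the DL
    return age, w, measured, y, event

rng = np.random.default_rng(2014)
age, w, measured, y, event = simulate_dataset(500, rng)
print("fraction below DL:", 1 - measured.mean(axis=0), "event rate:", event.mean())
```

Keeping the covariate censoring (left, at the DLs) separate from the outcome censoring (right, by the censoring time) mirrors the two distinct kinds of incompleteness in the GenIMS data.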
Application to GenIMS data
We demonstrate the performance of our proposed multiple imputation and iterated multiple imputation methods by applying them to data from the Genetic and Inflammatory Markers of Sepsis (GenIMS) study. One of the purposes of the GenIMS study was to identify the relationship between the survival time of patients with community-acquired pneumonia (CAP) and several biomarkers for inflammatory responses in the body (Kellum et al., 2007). The data for the GenIMS study were obtained from
Discussion
We have proposed a multiple imputation method for handling covariates censored due to DLs in AFT survival models. We have proven that the proposed estimator is consistent and asymptotically normal, with standard errors that are relatively easy to estimate. We also suggested an iterated version of the multiple imputation procedure, which can yield substantial efficiency gains. Through numerical studies, we demonstrated that the multiple imputation procedure and iterated multiple
Acknowledgments
The authors would like to express their appreciation to the editor, an associate editor, and two anonymous referees for their valuable comments. The authors would also like to thank Dr. Lan Kong of Penn State College of Medicine and the CRISMA (Clinical Research, Investigation, and Systems Modeling of Acute Illness) Center at the University of Pittsburgh for providing us with the GenIMS dataset. The research of Wang is supported by NSF Award DMS-1007420 and NSF CAREER Award DMS-1149355. The
References (29)
- … et al., 2003. The proportional hazards regression with a censored covariate. Statistics & Probability Letters.
- … et al., 2012. Handling covariates subject to limits of detection in regression. Environmental and Ecological Statistics.
- … et al., 2003. Type I error inflation in the presence of a ceiling effect. The American Statistician.
- … et al., 2004. Estimating linear regression models in the presence of a censored independent variable. Statistics in Medicine.
- … et al., 2008. An index approach for the Cox model with left censored covariates. Statistics in Medicine.
- … et al., 2013. Statistical methods for generalized linear models with covariates subject to detection limits. Statistics in Biosciences.
- … et al., 2008. ‘Smooth’ inference for survival functions with arbitrarily censored data. Statistics in Medicine.
- … et al., 1987. Semi-nonparametric maximum likelihood estimation. Econometrica.
- Giovanini, J., 2008. Generalized linear mixed models with censored covariates. Ph.D. Thesis. Oregon State...
- …, 2012. Statistics for Censored Environmental Data Using Minitab and R.
- … Estimation of average concentration in the presence of nondetectable values. Applied Occupational and Environmental Hygiene.
- … Understanding the inflammatory cytokine response in pneumonia and sepsis. Archives of Internal Medicine.
- … A parametric survival model with an interval-censored covariate. Statistics in Medicine.
- … Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B.