Abstract
Mean residual life (MRL) is the remaining life expectancy of a subject who has survived to a certain time point, and it can be used as an alternative to the hazard function for characterizing the distribution of a time-to-event variable. Inference and application of MRL models have primarily focused on full-cohort studies. In practice, case-cohort and nested case-control designs are commonly used within large cohorts that have long follow-up and study rare diseases, particularly when the exposures of interest are costly molecular biomarkers. Like the full-cohort design, they enable prospective inference, but at substantially lower cost. In this paper, we study the modeling and inference of a family of generalized MRL models under case-cohort and nested case-control designs. Built upon the idea of inverse selection probability weighting, weighted estimating equations are constructed to estimate the regression parameters and the baseline MRL function. Asymptotic properties of the proposed estimators are established, and their finite-sample performance is evaluated by extensive numerical simulations. An application to the New York University Women’s Health Study is presented to illustrate the proposed models and to demonstrate a model diagnostic method that guides practical implementation.
References
Cai T, Zheng Y (2013) Resampling procedures for making inference under nested case-control studies. J Am Stat Assoc 108(504):1532–1544
Chen YQ (2007) Additive expectancy regression. J Am Stat Assoc 102(477):153–166
Chen YQ, Cheng S (2005) Semiparametric regression analysis of mean residual life with censored survival data. Biometrika 92(1):19–29
Chen YQ, Cheng S (2006) Linear life expectancy regression with censored data. Biometrika 93(2):303–313
Chen K, Lo SH (1999) Case-cohort and case-control analysis with Cox’s model. Biometrika 86(4):755–764
Chen YQ, Jewell NP, Lei X, Cheng SC (2005) Semiparametric estimation of proportional mean residual life model in presence of censoring. Biometrics 61(1):170–178
Clendenen TV, Ge W, Koenig KL, Axelsson T, Liu M, Afanasyeva Y, Andersson A, Arslan AA, Chen Y, Hallmans G, Lenner P, Kirchhoff T, Lundin E, Shore RE, Sund M, Zeleniuch-Jacquotte A (2015) Genetic polymorphisms in vitamin D metabolism and signaling genes and risk of breast cancer: a nested case-control study. PLoS ONE 10(10):e0140478
Efron B (1979) Bootstrap methods: another look at the Jackknife. Ann Stat 7(1):1–26
Fleming TR, Harrington DP (1991) Counting processes and survival analysis. John Wiley & Sons, Inc., New York
Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, Mulvihill JJ (1989) Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81(24):1879–1886
Ge W, Clendenen TV, Afanasyeva Y, Koenig KL, Agnoli C, Brinton LA, Dorgan JF, Eliassen AH, Falk RT, Hallmans G, Hankinson SE, Hoffman-Bolton J, Key TJ, Krogh V, Nichols HB, Sandler DP, Schoemaker MJ, Sluss PM, Sund M, Swerdlow AJ, Visvanathan K, Liu M, Zeleniuch-Jacquotte A (2018) Circulating anti-Müllerian hormone and breast cancer risk: a study in ten prospective cohorts. Int J Cancer 142(11):2215–2226
James IR (1986) On estimating equations with censored data. Biometrika 73:35–42
Kupper LL, McMichael AJ, Spirtas R (1975) A hybrid epidemiologic study design useful in estimating relative risk. J Am Stat Assoc 70(351a):524–528. https://doi.org/10.1080/01621459.1975.10482466
Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80(3):557–572
Liu M, Lu W, Shore RE, Zeleniuch-Jacquotte A (2010a) Cox regression model with time-varying coefficients in nested case-control studies. Biostatistics 11(4):693–706
Liu M, Lu W, Tseng Ch (2010b) Cox regression in nested case-control studies with auxiliary covariates. Biometrics 66(2):374–381
Lu W, Liu M (2012) On estimation of linear transformation models with nested case-control sampling. Lifetime Data Anal 18(1):80–93
Lu W, Tsiatis AA (2006) Semiparametric transformation models for the case-cohort study. Biometrika 93(1):207–214
Ma H, Shi J, Zhou Y (2017) Proportional mean residual life model with censored survival data under case-cohort design. arXiv:1708.01634 [math, stat]
Maguluri G, Zhang CH (1994) Estimation in the mean residual life regression model. J R Stat Soc Ser B Methodol 56(3):477–489
Oakes D, Dasu T (1990) A note on residual life. Biometrika 77(2):409–410
Prentice RL (1986) A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1):1–11
Reid N (1981) Influence functions for censored data. Ann Stat 9(1):78–92
Samuelsen S (1997) A pseudolikelihood approach to analysis of nested case-control studies. Biometrika 84(2):379–394
Scarmo S, Afanasyeva Y, Lenner P, Koenig KL, Horst RL, Clendenen TV, Arslan AA, Chen Y, Hallmans G, Lundin E, Rinaldi S, Toniolo P, Shore RE, Zeleniuch-Jacquotte A (2013) Circulating levels of 25-hydroxyvitamin D and risk of breast cancer: a nested case-control study. Breast Cancer Res 15(1):R15
Scheike TH, Juul A (2004) Maximum likelihood estimation for Cox’s regression model under nested case-control sampling. Biostatistics 5(2):193–206
Sun L, Zhang Z (2009) A class of transformed mean residual life models with censored survival data. J Am Stat Assoc 104(486):803–815
Sun L, Song X, Zhang Z (2012) Mean residual life models with time-dependent coefficients under right censoring. Biometrika 99(1):185–197
Thomas DC (1977) Addendum to methods of cohort analysis: appraisal by application to asbestos mining by Liddell, F. D. K., McDonald, J. C., and Thomas, D. C. J R Stat Soc Ser A Gen 140(4):483–485
Yang G, Zhou Y (2014) Semiparametric varying-coefficient study of mean residual life models. J Multivar Anal 128:226–238
Zhang LX (2000) A functional central limit theorem for asymptotically negatively dependent random fields. Acta Math Hungar 86(3):237–259
Acknowledgements
The authors would like to thank the Associate Editor and two referees for their valuable suggestions. The research was partially supported by NIH Grant R01CA178949.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix
1.1 Simulation study
We conducted numerical simulations to compare the IPCW method and the QPS estimator in a full cohort of 1000 subjects with censoring rates of approximately 10%, 30% and 80%. A total of 500 simulations were conducted for each of the proportional and additive MRL models. We report the bias, the standard deviation (SD) of the estimates, the average of the estimated standard errors (SE) and the coverage probability (CP) of the 95% Wald-type confidence intervals (see Table 6). The SEs of the estimates were calculated by the standard bootstrap method. The two estimators performed similarly when the censoring rate was low: the biases were all small, the means of the estimated SEs were close to the empirical SDs of the parameter estimates, and the 95% Wald-type confidence intervals had proper coverage. However, when the censoring rate was 80%, the IPCW method performed worse and underestimated the SEs, which led to low coverage probabilities.
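The four summaries reported above (bias, SD, average SE and CP) can be computed from replicated estimates as in the following minimal Python sketch. The function name `summarize_simulation`, the toy normal draws and the constant standard errors are illustrative assumptions; this is not the authors' simulation code, and the toy numbers are unrelated to Table 6.

```python
import numpy as np


def summarize_simulation(estimates, std_errors, true_value, z=1.959964):
    """Summarize replicated estimates of one scalar parameter.

    estimates  : one point estimate per simulated data set
    std_errors : the matching estimated standard errors
    true_value : the parameter value used to generate the data
    z          : normal quantile for a 95% Wald-type interval
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)

    bias = estimates.mean() - true_value   # average estimate minus truth
    sd = estimates.std(ddof=1)             # empirical SD across replications
    mean_se = std_errors.mean()            # average of the estimated SEs

    # Proportion of Wald intervals that cover the true value.
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    cp = np.mean((lower <= true_value) & (true_value <= upper))

    return {"bias": bias, "sd": sd, "se": mean_se, "cp": cp}


# Toy replications (hypothetical): 500 unbiased estimates with SD 0.1.
rng = np.random.default_rng(0)
est = rng.normal(loc=0.5, scale=0.1, size=500)
se = np.full(500, 0.1)
summary = summarize_simulation(est, se, true_value=0.5)
print(summary)
```

When the average SE is well below the empirical SD, the Wald intervals are too narrow and the coverage falls below the nominal level, which is the pattern described for the IPCW method at 80% censoring.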
1.2 Regularity conditions
Let \(u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }), u_{\breve{\varvec{Z}}}(t;\varvec{\beta })\) and \(u_{\bar{\varvec{Z}}}(t;\varvec{\beta })\) be the limits of \(\tilde{\varvec{Z}}(t;\varvec{\beta }), \breve{\varvec{Z}}(t;\varvec{\beta })\) and \(\bar{\varvec{Z}}(t;\varvec{\beta })\), respectively. We assume the following regularity conditions:
- (C1) \(\sup \mathrm{supp}(F) \le \sup \mathrm{supp}(G)\), where \(F(\cdot )\) and \(G(\cdot )\) are the distribution functions of T and C, respectively;
- (C2) \(\varvec{Z}\) is bounded;
- (C3) \(m_*(t)\) is continuously differentiable on \([0,\tau ]\);
- (C4) \(A=\int _0^\tau E[(\varvec{Z}-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)) (\varvec{Z}-u_{\breve{\varvec{Z}}}(t;\varvec{\beta }_*))' (\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}\}dN(t) -Y(t)d\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}\})]\) is nonsingular.
Proof of Theorem 1(i)
First, we establish the consistency of the estimators \(\hat{m}_0(t)\) and \(\hat{\varvec{\beta }}\). Condition (C3) implies that \(m_0(t)\) is of bounded variation on \([0,\tau ]\). Define \(\mathcal {B}=\{\varvec{\beta }:||\varvec{\beta }-\varvec{\beta }_*||\le \epsilon \}\) for any \(\epsilon >0\), and note that \(E(w_i|\mathcal {F})=1\). By the strong law of large numbers and the fact that \(E\{w_idM_i(t)\}=0\), for large n, \(t\in [0,\tau ], \varvec{\beta }\in \mathcal {B}\), and sufficiently large \(\theta \),
By (11), (12), and the monotonicity and continuity of the function g, for any \(t\in [0,\tau ]\) and \(\varvec{\beta }\in \mathcal {B}\), there exists a unique \(\hat{m}_0(t;\varvec{\beta })\) that satisfies
Note that (11) and (12) hold for any \(\theta >0\) if and only if \(\varvec{\beta }=\varvec{\beta }_*\). Then \(\hat{m}_0(t;\varvec{\beta })\) converges to \(m_0(t;\varvec{\beta })\) uniformly in \(t\in [0,\tau ]\) and in \(\varvec{\beta }\) over a compact set that contains the true parameter \(\varvec{\beta }_*\), and \(m_0(t;\varvec{\beta }_*) = m_*(t)\). Thus, to prove the existence and uniqueness of \(\hat{\varvec{\beta }}\) and \(\hat{m}_0(t)\), it suffices to show that there exists a unique solution to \(U(\varvec{\beta })=0\). Taking the derivative of (13) with respect to \(\varvec{\beta }\), we have
which is a first-order linear ordinary differential equation in \(d\hat{m}_0(t)/d\varvec{\beta }\). The solution is
where
Let \(\hat{A}(\varvec{\beta }_*) \doteq dU(\varvec{\beta })/d\varvec{\beta }|_{\varvec{\beta }=\varvec{\beta }_*}\). We have
Thus, \(\hat{A}(\varvec{\beta }_*)\) converges in probability to a nonrandom matrix A. It is easy to check that \(U(\varvec{\beta }_*)\rightarrow 0\) almost surely, and A is nonsingular by (C4). The convergence of \(\hat{A}(\varvec{\beta }_*)\) and the continuity of \({A}(\varvec{\beta })\) imply that we can find a small neighborhood of \(\varvec{\beta }_*\) in which \(\hat{A}(\varvec{\beta })\) is nonsingular when n is large enough. Therefore, it follows from the inverse function theorem that, within a small neighborhood of \(\varvec{\beta }_*\), there exists a unique solution \(\hat{\varvec{\beta }}\) to \(U(\varvec{\beta })=0\) for sufficiently large n. Thus, there exist unique estimators \(\hat{\varvec{\beta }}\) and \(\hat{m}_0(t)\). Since \(\hat{\varvec{\beta }}\) is strongly consistent for \(\varvec{\beta }_*\), it follows from the uniform convergence of \(\hat{m}_0(t;\varvec{\beta })\) to \(m_0(t;\varvec{\beta })\) that \(\hat{m}_0(t)\doteq \hat{m}_0(t;\hat{\varvec{\beta }}) \rightarrow m_0(t;\varvec{\beta }_*) = m_*(t)\) almost surely on \([0,\tau ]\).
Proof of Theorem 1(ii)
In this section, we first prove Theorem 1(ii) under the CC design and then under the NCC design, following the proof of Lu and Liu (2012). We know from Eq. (8) that
Subtracting the above two equations and using a Taylor expansion, we have
Hence, following the first-order ordinary differential equation,
Let \(U(\varvec{\beta }_*) \doteq U(\varvec{\beta }_*,\hat{m}_0(t;\varvec{\beta }_*))\) and we have
Applying a Taylor expansion again to \(U(\varvec{\beta }_*,\hat{m}_0(t;\varvec{\beta }_*)) - U(\varvec{\beta }_*,m_*(\cdot ))\), we have
Thus,
As defined in the main text, let \(\mathcal {G}_i\) be the \(\sigma \)-field generated by \(\{\tilde{T}_i,\delta _i,\varvec{Z}_i, i=1,\ldots ,n\}\) and \(\mathcal {F}_i\) be the \(\sigma \)-field generated by \(\{\tilde{T}_i,\delta _i, i=1,\ldots ,n\}\). We denote
It is evident that \(E(1-\gamma _i/p_{0i}|\mathcal {F}_i)=0\) and \(E\{\eta _i(1-\gamma _i/p_{0i})\}=E\{\eta _i E (1-\gamma _i/p_{0i}|\mathcal {F}_i)\}=0\). Following the proof in Lu and Tsiatis (2006), \(\mathrm{var}\{\eta _i(1-\gamma _i/p_{0i})\}= E \{\eta _i^{\otimes 2}(1-p_{0i})/p_{0i}\}- E [\eta _i(1-p_{0i})/p_{0i}]^{\otimes 2} = \Sigma _2\). Conditional on \(\mathcal {F}_i\), the terms \(\{\eta _i(1-\gamma _i/p_{0i}),i=1,\ldots ,n\}\) and the first term of \(\sqrt{n}U(\varvec{\beta }_*)\) are uncorrelated. Therefore, \(\sqrt{n}U(\varvec{\beta }_*)\) is asymptotically normal with mean zero and variance-covariance matrix \(\Sigma =\Sigma _1+\Sigma _2\). By a Taylor expansion and the consistency of \(\hat{\varvec{\beta }}\), it follows that
\(\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }_*)\rightarrow N\{0, A^{-1}\Sigma (A^{-1})^{'}\}\) in distribution.
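The two moment facts used for the CC design, \(E(1-\gamma _i/p_{0i}|\mathcal {F}_i)=0\) and the factor \((1-p_{0i})/p_{0i}\) in \(\Sigma _2\), can be checked numerically. The sketch below is only an illustration under assumptions that are not stated in this appendix: subcohort indicators \(\gamma _i\) are drawn independently as Bernoulli with a common, hypothetical probability `p0`, about 10% of subjects are cases, and the weight is taken to be \(w_i=\delta _i+(1-\delta _i)\gamma _i/p_{0i}\), which is the form consistent with the expression for \(E\{(w_i-1)^2|\mathcal {F}\}\) used later in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200_000
p0 = 0.2                       # hypothetical subcohort sampling probability
delta = rng.random(n) < 0.1    # hypothetical event indicators (about 10% cases)
gamma = rng.random(n) < p0     # subcohort membership, Bernoulli(p0) by assumption

# Assumed inverse-probability weight: cases get weight 1,
# sampled non-cases get 1/p0, unsampled non-cases get 0.
w = np.where(delta, 1.0, gamma / p0)

# The centered sampling term 1 - gamma/p0 has mean 0 and variance (1 - p0)/p0.
centered = 1.0 - gamma / p0
mean_centered = centered.mean()
var_centered = centered.var()

# The weights average to about 1, mirroring E(w_i | F) = 1.
mean_w = w.mean()

print(mean_centered, var_centered, (1 - p0) / p0, mean_w)
```

With `p0 = 0.2` the theoretical variance is \((1-0.2)/0.2 = 4\), so a smaller sampling fraction inflates the sampling contribution \(\Sigma _2\) to the total variance.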
Under the nested case-control design, the asymptotic distribution of \(\hat{\varvec{\beta }}\) is more difficult to derive because the NCC sampling scheme is a dynamic process. The probabilities of being selected as a control are neither constant nor independent across subjects. Thus, we use the central limit theorem for asymptotically negatively dependent random variables (Zhang 2000), which has been used in the proof of Lu and Liu (2012). Based on the following asymptotic representation, we have
By the martingale central limit theorem, \(\sqrt{n}U_1(\varvec{\beta }_*) = \frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*) \rightarrow N(0,\Sigma _1)\) as \(n \rightarrow \infty \). Since \(E(w_i-1|\mathcal {F}_i) = 0\), it is evident that \(U_1(\varvec{\beta }_*)\) and \(U_2(\varvec{\beta }_*)\) are uncorrelated. However, because of the NCC sampling scheme, \(w_i\) and \(w_j\) (\(i \ne j\)) are correlated even after conditioning on \(\mathcal {F}\). Since \((w_i-1)^2 = (1-\delta _i)(\gamma _i-p_{0i})^2/p_{0i}^2\), we have \(E\{(w_i-1)^2|\mathcal {F}\} = (1-\delta _i)(1-p_{0i})/p_{0i}\). Thus, the conditional variance of \(\sqrt{n}U_2(\varvec{\beta }_*)\) can be written as
According to Samuelsen (1997), for \(i\ne j\), \(\mathrm{Cov}(\gamma _i,\gamma _j|\mathcal {F})=\rho _{ij}(1-p_{0i})(1-p_{0j})\), where \(\rho _{ij}=-\frac{m}{n}\int _0^{\min (\tilde{T}_i,\tilde{T}_j)} \left\{ \frac{\bar{g}_1(t)}{\bar{y}(t)}dm_*(t)+\frac{\bar{g}_2(t)}{\bar{y}(t)}dt\right\} +O_p(n^{-2})\). Thus, with some algebra, \(\mathrm{Var}\{\sqrt{n}U_2(\varvec{\beta }_*)|\mathcal {F}\}\) can be written as
where \(\bar{g}_1(t) = \sum _{j=1}^n\frac{Y_j(t)\dot{g}\{\hat{m}_0(t)+\hat{\varvec{\beta }}'\varvec{Z}_j\}}{g\{\hat{m}_0(t)+\hat{\varvec{\beta }}'\varvec{Z}_j\}}, \bar{g}_2(t) = \sum _{j=1}^n\frac{Y_j(t)}{g\{\hat{m}_0(t)+\hat{\varvec{\beta }}'\varvec{Z}_j\}}\) and \(\bar{y}(t)=\sum _{j=1}^nY_j(t)\). By the strong law of large numbers, we have
where
Thus, by the central limit theorem for asymptotically negatively dependent random variables (Zhang 2000), we have \(\sqrt{n}U_2(\varvec{\beta }_*) \rightarrow N(0,\Sigma _2)\) as \(n\rightarrow \infty \), and
in distribution as \(n \rightarrow \infty \). It is easy to see that \(\Sigma _1+\Sigma _2 = \Sigma \). By a Taylor expansion and the consistency of \(\hat{\varvec{\beta }}\), we have \(\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }_*)\rightarrow N\{0, A^{-1}\Sigma (A^{-1})^{'}\}\) in distribution.
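The limiting covariance \(A^{-1}\Sigma (A^{-1})'\) has the usual sandwich form. As a hedged illustration of how such a matrix would be turned into standard errors and Wald-type intervals, the sketch below takes plug-in estimates of A and \(\Sigma \) as given inputs. The function names and the toy matrices are hypothetical; the appendix does not say how A and \(\Sigma \) would be estimated, and the simulation study reports bootstrap SEs rather than this plug-in route.

```python
import numpy as np


def sandwich_covariance(A, Sigma, n):
    """Approximate covariance of beta_hat: A^{-1} Sigma (A^{-1})' / n."""
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    A_inv = np.linalg.inv(A)
    return A_inv @ Sigma @ A_inv.T / n


def wald_intervals(beta_hat, cov, z=1.959964):
    """95% Wald-type intervals, one row (lower, upper) per coefficient."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.sqrt(np.diag(cov))
    return np.column_stack([beta_hat - z * se, beta_hat + z * se])


# Toy inputs (hypothetical): diagonal A and Sigma with n = 100 subjects.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
Sigma = np.array([[4.0, 0.0], [0.0, 16.0]])
cov = sandwich_covariance(A, Sigma, n=100)
ci = wald_intervals([0.5, -0.2], cov)
print(cov)
print(ci)
```

Because \(\Sigma =\Sigma _1+\Sigma _2\), the same function applies to both designs; only the sampling term \(\Sigma _2\) differs between the CC and NCC cases.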
Jin, P., Zeleniuch-Jacquotte, A. & Liu, M. Generalized mean residual life models for case-cohort and nested case-control studies. Lifetime Data Anal 26, 789–819 (2020). https://doi.org/10.1007/s10985-020-09499-w
DOI: https://doi.org/10.1007/s10985-020-09499-w