
Generalized mean residual life models for case-cohort and nested case-control studies

Lifetime Data Analysis

Abstract

Mean residual life (MRL) is the remaining life expectancy of a subject who has survived to a certain time point, and it can be used as an alternative to the hazard function for characterizing the distribution of a time-to-event variable. Inference and application of MRL models have primarily focused on full-cohort studies. In practice, case-cohort and nested case-control designs are commonly used within large cohorts that have long follow-up and study rare diseases, particularly when the exposures of interest are costly molecular biomarkers. Like the full-cohort design, they enable prospective inference, but at substantially lower cost. In this paper, we study the modeling and inference of a family of generalized MRL models under case-cohort and nested case-control designs. Building on the idea of inverse selection probability weighting, we construct weighted estimating equations to estimate the regression parameters and the baseline MRL function. Asymptotic properties of the proposed estimators are established, and finite-sample performance is evaluated by extensive numerical simulations. An application to the New York University Women’s Health Study is presented to illustrate the proposed models and demonstrate a model diagnostic method to guide practical implementation.


References

  • Cai T, Zheng Y (2013) Resampling procedures for making inference under nested case-control studies. J Am Stat Assoc 108(504):1532–1544
  • Chen YQ (2007) Additive expectancy regression. J Am Stat Assoc 102(477):153–166
  • Chen YQ, Cheng S (2005) Semiparametric regression analysis of mean residual life with censored survival data. Biometrika 92(1):19–29
  • Chen YQ, Cheng S (2006) Linear life expectancy regression with censored data. Biometrika 93(2):303–313
  • Chen K, Lo SH (1999) Case-cohort and case-control analysis with Cox’s model. Biometrika 86(4):755–764
  • Chen YQ, Jewell NP, Lei X, Cheng SC (2005) Semiparametric estimation of proportional mean residual life model in presence of censoring. Biometrics 61(1):170–178
  • Clendenen TV, Ge W, Koenig KL, Axelsson T, Liu M, Afanasyeva Y, Andersson A, Arslan AA, Chen Y, Hallmans G, Lenner P, Kirchhoff T, Lundin E, Shore RE, Sund M, Zeleniuch-Jacquotte A (2015) Genetic polymorphisms in vitamin D metabolism and signaling genes and risk of breast cancer: a nested case-control study. PLoS ONE 10(10):e0140478
  • Efron B (1979) Bootstrap methods: another look at the Jackknife. Ann Stat 7(1):1–26
  • Fleming TR, Harrington DP (1991) Counting processes and survival analysis. John Wiley & Sons, Inc., New York
  • Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, Mulvihill JJ (1989) Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81(24):1879–1886
  • Ge W, Clendenen TV, Afanasyeva Y, Koenig KL, Agnoli C, Brinton LA, Dorgan JF, Eliassen AH, Falk RT, Hallmans G, Hankinson SE, Hoffman-Bolton J, Key TJ, Krogh V, Nichols HB, Sandler DP, Schoemaker MJ, Sluss PM, Sund M, Swerdlow AJ, Visvanathan K, Liu M, Zeleniuch-Jacquotte A (2018) Circulating anti-Müllerian hormone and breast cancer risk: a study in ten prospective cohorts. Int J Cancer 142(11):2215–2226
  • James IR (1986) On estimating equations with censored data. Biometrika 73:35–42
  • Kupper LL, McMichael AJ, Spirtas R (1975) A hybrid epidemiologic study design useful in estimating relative risk. J Am Stat Assoc 70(351a):524–528. https://doi.org/10.1080/01621459.1975.10482466
  • Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of Martingale-based residuals. Biometrika 80(3):557–572
  • Liu M, Lu W, Shore RE, Zeleniuch-Jacquotte A (2010a) Cox regression model with time-varying coefficients in nested case-control studies. Biostatistics 11(4):693–706
  • Liu M, Lu W, Tseng Ch (2010b) Cox regression in nested case-control studies with auxiliary covariates. Biometrics 66(2):374–381
  • Lu W, Liu M (2012) On estimation of linear transformation models with nested case-control sampling. Lifetime Data Anal 18(1):80–93
  • Lu W, Tsiatis AA (2006) Semiparametric transformation models for the case-cohort study. Biometrika 93(1):207–214
  • Ma H, Shi J, Zhou Y (2017) Proportional mean residual life model with censored survival data under case-cohort design. arXiv:1708.01634 [math, stat]
  • Maguluri G, Zhang CH (1994) Estimation in the mean residual life regression model. J R Stat Soc Ser B Methodol 56(3):477–489
  • Oakes D, Dasu T (1990) A note on residual life. Biometrika 77(2):409–410
  • Prentice RL (1986) A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1):1–11
  • Reid N (1981) Influence functions for censored data. Ann Stat 9(1):78–92
  • Samuelsen S (1997) A pseudolikelihood approach to analysis of nested case-control studies. Biometrika 84(2):379–394
  • Scarmo S, Afanasyeva Y, Lenner P, Koenig KL, Horst RL, Clendenen TV, Arslan AA, Chen Y, Hallmans G, Lundin E, Rinaldi S, Toniolo P, Shore RE, Zeleniuch-Jacquotte A (2013) Circulating levels of 25-hydroxyvitamin D and risk of breast cancer: a nested case-control study. Breast Cancer Res 15(1):R15
  • Scheike TH, Juul A (2004) Maximum likelihood estimation for Cox’s regression model under nested case-control sampling. Biostatistics 5(2):193–206
  • Sun L, Zhang Z (2009) A class of transformed mean residual life models with censored survival data. J Am Stat Assoc 104(486):803–815
  • Sun L, Song X, Zhang Z (2012) Mean residual life models with time-dependent coefficients under right censoring. Biometrika 99(1):185–197
  • Thomas DC (1977) Addendum to methods of cohort analysis: appraisal by application to asbestos mining by Liddell, F. D. K., McDonald, J. C., and Thomas, D. C. J R Stat Soc Ser A Gen 140(4):483–485
  • Yang G, Zhou Y (2014) Semiparametric varying-coefficient study of mean residual life models. J Multivar Anal 128:226–238
  • Zhang LX (2000) A functional central limit theorem for asymptotically negatively dependent random fields. Acta Math Hungar 86(3):237–259

Acknowledgements

The authors would like to thank the Associate Editor and two referees for their valuable suggestions. The research was partially supported by NIH Grant RO1CA178949.

Author information

Corresponding author

Correspondence to Mengling Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


1.1 Simulation study

We conducted numerical simulations to compare the IPCW estimator and the QPS estimator in a full cohort of 1000 subjects, with censoring rates of approximately 10%, 30% and 80%. A total of 500 simulations were conducted for each of the proportional and additive MRL models. We report the bias, the standard deviation (SD) of the estimates, the average of the estimated standard errors (SE) and the coverage probability (CP) of the 95% Wald-type confidence intervals (see Table 6). The SEs were calculated by the standard bootstrap method. The two estimators performed similarly when the censoring rate was low: the biases were small, the means of the estimated SEs were close to the empirical SDs of the parameter estimates, and the 95% Wald-type confidence intervals had proper coverage. However, when the censoring rate was 80%, the performance of the IPCW estimator deteriorated: its SEs were underestimated, which led to low coverage probabilities.
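The four summary measures reported for each scenario can be computed from the replicate-level output as in the following minimal Python sketch. The function name `summarize_simulation` and its arguments are hypothetical and are not taken from the authors' code; the sketch only assumes that each replicate yields a point estimate and an estimated SE for a scalar parameter.

```python
from statistics import mean, stdev


def summarize_simulation(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation replicates for one scalar parameter.

    estimates  : point estimates, one per replicate
    std_errors : estimated standard errors, one per replicate
    true_value : parameter value used to generate the data
    z          : normal quantile for the Wald interval (1.96 for 95%)
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two replicates of equal length")

    bias = mean(estimates) - true_value   # average estimate minus truth
    sd = stdev(estimates)                 # empirical SD of the estimates
    se = mean(std_errors)                 # average estimated SE
    # A Wald interval est +/- z * se covers the truth iff |est - truth| <= z * se.
    covered = [abs(est - true_value) <= z * s
               for est, s in zip(estimates, std_errors)]
    cp = sum(covered) / len(covered)      # empirical coverage probability
    return {"bias": bias, "sd": sd, "se": se, "cp": cp}


if __name__ == "__main__":
    # Toy replicates for illustration only (not results from the paper).
    print(summarize_simulation([0.9, 1.1, 1.0, 1.3], [0.1] * 4, 1.0))
```

An average SE well below the empirical SD, together with a coverage probability below the nominal level, is the pattern described above for the IPCW estimator under heavy censoring.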

Table 6 Comparison between IPCW estimator and QPS estimator

1.2 Regularity conditions

Let \(u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }), u_{\breve{\varvec{Z}}}(t;\varvec{\beta })\) and \(u_{\bar{\varvec{Z}}}(t;\varvec{\beta })\) be the limits of \(\tilde{\varvec{Z}}(t;\varvec{\beta }), \breve{\varvec{Z}}(t;\varvec{\beta })\) and \(\bar{\varvec{Z}}(t;\varvec{\beta })\), respectively. We assume the following regularity conditions:

(C1) sup supp(F) \(\le \) sup supp(G), where \(F(\cdot )\) and \(G(\cdot )\) are the distribution functions of T and C, respectively;

(C2) \(\varvec{Z}\) is bounded;

(C3) \(m_*(t)\) is continuously differentiable on [0,\(\tau \)];

(C4) \(A=\int _0^\tau E[(\varvec{Z}-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)) (\varvec{Z}-u_{\breve{\varvec{Z}}}(t;\varvec{\beta }_*))' (\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}\}dN(t) -Y(t)d\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}\})]\) is nonsingular.

Proof of Theorem 1(i)

We first establish the consistency of the estimators \(\hat{m}_0(t)\) and \(\hat{\varvec{\beta }}\). Condition (C3) implies that \(m_0(t)\) is of bounded variation on \([0,\tau ]\). For any \(\epsilon >0\), define \(\mathcal {B}=\{\varvec{\beta }:||\varvec{\beta }-\varvec{\beta }_*||\le \epsilon \}\), and note that \(E(w_i|\mathcal {F})=1\). By the strong law of large numbers and the fact that \(E\{w_idM_i(t)\}=0\), for large n, all \(t\in [0,\tau ]\) and \(\varvec{\beta }\in \mathcal {B}\), and sufficiently large \(\theta \),

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^nw_i \left[ dN_i(t)-Y_i(t)\frac{dg\{m_0(t)+\theta +\varvec{\beta }'\varvec{Z}_i\}+dt}{g\{m_0(t)+\theta +\varvec{\beta }'\varvec{Z}_i\}}\right] <0, \end{aligned} \tag{11}$$
$$\begin{aligned} \frac{1}{n}\sum _{i=1}^nw_i \left[ dN_i(t)-Y_i(t)\frac{dg\{m_0(t)-\theta +\varvec{\beta }'\varvec{Z}_i\}+dt}{g\{m_0(t)-\theta +\varvec{\beta }'\varvec{Z}_i\}}\right] >0. \end{aligned} \tag{12}$$

By (11), (12), and the monotonicity and continuity of the function \(g\), for any \(t\in [0,\tau ]\) and \(\varvec{\beta }\in \mathcal {B}\), there exists a unique \(\hat{m}_0(t;\varvec{\beta })\) that satisfies

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^nw_i\left[ dN_i(t)-Y_i(t)\frac{dg\{\hat{m}_0(t;\varvec{\beta })+\varvec{\beta }'\varvec{Z}_i\}+dt}{g\{\hat{m}_0(t;\varvec{\beta })+\varvec{\beta }'\varvec{Z}_i\}}\right] =0. \end{aligned} \tag{13}$$
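For orientation, the form of (13) can be traced back to the standard inversion formula that links the hazard and the MRL function. The sketch below assumes, as the estimating equations suggest but this appendix does not restate, that the model is \(m(t|\varvec{Z})=g\{m_0(t)+\varvec{\beta }'\varvec{Z}\}\) with \(m(t|\varvec{Z})=E(T-t|T>t,\varvec{Z})\), and that \(M_i(t)\) is the usual counting-process martingale; the cumulative hazard symbol \(\Lambda \) is introduced here only for illustration.

$$\begin{aligned} d\Lambda (t|\varvec{Z})&= \frac{dm(t|\varvec{Z})+dt}{m(t|\varvec{Z})} = \frac{dg\{m_0(t)+\varvec{\beta }'\varvec{Z}\}+dt}{g\{m_0(t)+\varvec{\beta }'\varvec{Z}\}}, \\ dM_i(t)&= dN_i(t)-Y_i(t)\,d\Lambda (t|\varvec{Z}_i). \end{aligned}$$

Under these assumptions, (13) is the weighted sample analogue of \(E\{w_idM_i(t)\}=0\), solved for the baseline function at a fixed \(\varvec{\beta }\).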

Note that (11) and (12) hold for any \(\theta >0\) if and only if \(\varvec{\beta }=\varvec{\beta }_*\). It follows that \(\hat{m}_0(t;\varvec{\beta })\) converges to \(m_0(t;\varvec{\beta })\) uniformly in \(t\in [0,\tau ]\) and in \(\varvec{\beta }\) over a compact set containing the true parameter \(\varvec{\beta }_*\), and that \(m_0(t;\varvec{\beta }_*) = m_*(t)\). Thus, to prove the existence and uniqueness of \(\hat{\varvec{\beta }}\) and \(\hat{m}_0(t)\), it suffices to show that there exists a unique solution to \(U(\varvec{\beta })=0\). Taking the derivative of (13) with respect to \(\varvec{\beta }\), we have

$$\begin{aligned}&\frac{d\hat{m}_0(t)}{d\varvec{\beta }}\frac{\sum _{i=1}^nw_i[\dot{g}\{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}dN_i(t)-Y_i(t)d\dot{g}\{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}]}{\sum _{i=1}^nw_iY_i(t)\dot{g}\{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}} - d(\frac{d\hat{m}_0(t)}{d\varvec{\beta }}) \\&\quad = - \frac{\sum _{i=1}^nw_i\varvec{Z}_i[\dot{g}\{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}dN_i(t)-Y_i(t)d\dot{g}\{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}]}{\sum _{i=1}^nw_iY_i(t)\dot{g}\{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}}, \end{aligned}$$

which is a first-order linear ordinary differential equation in \(d\hat{m}_0(t)/d\varvec{\beta }\). The solution is

$$\begin{aligned} \frac{d\hat{m}_0(t)}{d\varvec{\beta }}= \frac{d\hat{m}_0(t;\varvec{\beta })}{d\varvec{\beta }}=-\frac{1}{K(t;\varvec{\beta })}\int _t^\tau K(u;\varvec{\beta })Q(u;\varvec{\beta }) \equiv -\breve{\varvec{Z}}(t;\varvec{\beta }), \end{aligned}$$

where

$$\begin{aligned} K(t;\varvec{\beta })= & {} \exp \left\{ -\int _0^t\frac{\sum _{i=1}^nw_i[\dot{g} \{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}dN_i(t)-Y_i(t)d\dot{g} \{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}]}{\sum _{i=1}^nw_iY_i(t) \dot{g}\{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}}\right\} ,\\ Q(t;\varvec{\beta })= & {} \frac{\sum _{i=1}^nw_i\varvec{Z}_i[\dot{g}\{\hat{m}_0(t)+ \varvec{\beta }'\varvec{Z}_i\}dN_i(t)-Y_i(t)d\dot{g} \{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}]}{\sum _{i=1}^nw_iY_i(t)\dot{g} \{\hat{m}_0(t)+\varvec{\beta }'\varvec{Z}_i\}}. \end{aligned}$$

Let \(\hat{A}(\varvec{\beta }_*) \doteq dU(\varvec{\beta })/d\varvec{\beta }|_{\varvec{\beta }=\varvec{\beta }_*}\). We have

$$\begin{aligned} \hat{A}(\varvec{\beta }_*)\doteq & {} \frac{1}{n}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-\varvec{\bar{Z}}(t;\varvec{\beta }_*)][\varvec{Z}_i-\varvec{\breve{Z}}(t;\varvec{\beta }_*)]'[\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dN_i(t)\\&\quad -Y_i(t)d\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}]\\= & {} \frac{1}{n}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)][\varvec{Z}_i-u_{\breve{\varvec{Z}}}(t;\varvec{\beta }_*)]'[\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dN_i(t)\\&\quad -Y_i(t)d\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}] +o_p(1)\\= & {} A+o_p(1) \end{aligned}$$

Thus, \(\hat{A}(\varvec{\beta }_*)\) converges in probability to a nonrandom matrix A. It is easy to check that \(U(\varvec{\beta }_*)\rightarrow 0\) almost surely, and A is nonsingular by (C4). The convergence of \(\hat{A}(\varvec{\beta }_*)\) and the continuity of \({A}(\varvec{\beta })\) imply that there is a small neighborhood of \(\varvec{\beta }_*\) in which \(dU(\varvec{\beta })/d\varvec{\beta }\) is nonsingular when n is large enough. Therefore, it follows from the inverse function theorem that within a small neighborhood of \(\varvec{\beta }_*\), there exists a unique solution \(\hat{\varvec{\beta }}\) to \(U(\varvec{\beta })=0\) for sufficiently large n. Thus, there exist unique estimators \(\hat{\varvec{\beta }}\) and \(\hat{m}_0(t)\). Since \(\hat{\varvec{\beta }}\) is strongly consistent for \(\varvec{\beta }_*\), it follows from the uniform convergence of \(\hat{m}_0(t;\varvec{\beta })\) to \(m_0(t;\varvec{\beta })\) that \(\hat{m}_0(t)\doteq \hat{m}_0(t;\hat{\varvec{\beta }}) \rightarrow m_0(t;\varvec{\beta }_*) = m_*(t)\) almost surely on \([0,\tau ]\).

Proof of Theorem 1(ii)

In this section, we first prove Theorem 1(ii) under the case-cohort (CC) design and then under the nested case-control (NCC) design, following the proof of Lu and Liu (2012). We know from Eq. (8) that

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n w_i\left[ g\{m_0(t)+\varvec{\beta }^{'}\varvec{Z}_i\}dN_i(t)-Y_i(t)d[g\{m_0(t)+\varvec{\beta }^{'}\varvec{Z}_i\}+t]\right] \\&\quad = \frac{1}{n}\sum _{i=1}^nw_ig\{m_0(t)+\varvec{\beta }'\varvec{Z}_i\}dM_i(t), \\&\frac{1}{n}\sum _{i=1}^n w_i\left[ g\{\hat{m}_0(t)+\varvec{\beta }^{'}\varvec{Z}_i\}dN_i(t)-Y_i(t)d[g\{\hat{m}_0(t)+\varvec{\beta }^{'}\varvec{Z}_i\}+t]\right] =0. \end{aligned}$$

Subtracting the first equation from the second and applying a first-order Taylor expansion of \(g\) around \(m_0(t)+\varvec{\beta }'\varvec{Z}_i\), we have

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^nw_i\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}[\hat{m}_0(t)-m_0(t)]dN_i(t) \\&\qquad -\frac{1}{n}\sum _{i=1}^nw_iY_i(t)d\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}[\hat{m}_0(t)-m_0(t)] \\&\qquad -\frac{1}{n}\sum _{i=1}^nw_iY_i(t)\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}[d\hat{m}_0(t)-dm_0(t)] \\&\quad =-\frac{1}{n}\sum _{i=1}^nw_ig\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}dM_i(t). \end{aligned}$$

Hence, dividing both sides by \(\frac{1}{n}\sum _{i=1}^nw_iY_i(t)\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i\}\) gives a first-order linear ordinary differential equation in \(\hat{m}_0(t)-m_0(t)\), which is solved in the same way as the equation for \(d\hat{m}_0(t)/d\varvec{\beta }\):

$$\begin{aligned}&[\hat{m}_0(t)-m_0(t)]\frac{\sum _{i=1}^nw_i[\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}dN_i(t)-Y_i(t)d\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}]}{\sum _{i=1}^nw_iY_i(t)\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}} \\&\qquad -[d\hat{m}_0(t)-dm_0(t)] =-\frac{\sum _{i=1}^nw_ig\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}dM_i(t)}{\sum _{i=1}^nw_iY_i(t)\dot{g}\{m_0(t)+\varvec{\beta }'\varvec{Z}_i \}}, \\&\hat{m}_0(t)-m_0(t) = -\frac{1}{K(t;\varvec{\beta })}\int _t^\tau K(u;\varvec{\beta })\frac{\sum _{i=1}^nw_ig\{m_0(u)+\varvec{\beta }'\varvec{Z}_i \}dM_i(u)}{\sum _{i=1}^nw_iY_i(u)\dot{g}\{m_0(u)+\varvec{\beta }'\varvec{Z}_i \}}. \end{aligned}$$

Let \(U(\varvec{\beta }_*) \doteq U(\varvec{\beta }_*,\hat{m}_0(t;\varvec{\beta }_*))\) and we have

$$\begin{aligned} U(\varvec{\beta }_*,m_*(t))= & {} \frac{1}{n}\sum _{i=1}^n\int _0^{\tau }w_i\varvec{Z}_i[g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dN_i(t) \\&-Y_i(t)dg\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}-Y_i(t)dt]. \end{aligned}$$

Applying a Taylor expansion again to \(U(\varvec{\beta }_*,\hat{m}_0(t;\varvec{\beta }_*)) - U(\varvec{\beta }_*,m_*(\cdot ))\), we have

$$\begin{aligned}&U(\varvec{\beta }_*,\hat{m}_0(t;\varvec{\beta }_*)) - U(\varvec{\beta }_*,m_*(\cdot )) \\&\quad = \frac{1}{n}\sum _{i=1}^n\int _0^\tau w_i\varvec{Z}_i[\hat{m}_0(t)-m_0(t)]\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dN_i(t) \\&\qquad -w_i\varvec{Z}_iY_i(t)[\hat{m}_0(t)-m_0(t)]d\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\} \\&\qquad -w_i\varvec{Z}_iY_i(t)[d\hat{m}_0(t)-dm_0(t)]\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\} \\&\quad =\frac{1}{n}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-\bar{\varvec{Z}}(t;\varvec{\beta }_*)][\hat{m}_0(t)-m_0(t)][\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dN_i(t) \\&\qquad -Y_i(t)d\dot{g}\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}] - w_i\bar{\varvec{Z}}(t;\varvec{\beta }_*)g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t) \\&\quad =-\frac{1}{n}\sum _{i=1}^n\int _0^\tau w_i[\tilde{\varvec{Z}}(t;\varvec{\beta }_*)+\bar{\varvec{Z}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t). \end{aligned}$$

Thus,

$$\begin{aligned}&\sqrt{n}U(\varvec{\beta }_*) = \sqrt{n}U(\varvec{\beta }_*,\hat{m}_0(t;\varvec{\beta }_*)) \\&\quad = \sqrt{n}U(\varvec{\beta }_*,m_*)+\sqrt{n}[U(\varvec{\beta }_*,\hat{m}_0(t;\varvec{\beta }_*))-U(\varvec{\beta }_*,m_*)] \\&\quad =\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-\bar{\varvec{Z}}(t;\varvec{\beta }_*)-\tilde{\varvec{Z}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*) \\&\quad =\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*)\\&\qquad + o_p(1) \\&\quad =\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*)\\&\qquad +\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau (w_i-1)[\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)] \\&\qquad \quad g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*) + o_p(1)\\&\quad =\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*)\\&\qquad -\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau (1-\delta _i)(1-\gamma _i/p_{0i})[\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)] \\&\qquad \quad g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*) + o_p(1). \end{aligned}$$

As defined in the main text, let \(\mathcal {G}_i\) be the \(\sigma \)-field generated by \(\{\tilde{T}_i,\delta _i,\varvec{Z}_i, i=1,\ldots ,n\}\) and \(\mathcal {F}_i\) be the \(\sigma \)-field generated by \(\{\tilde{T}_i,\delta _i, i=1,\ldots ,n\}\). We denote

$$\begin{aligned} \eta _i = (1-\delta _i)\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)- u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*' \varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*). \end{aligned}$$

It is evident that \(E(1-\gamma _i/p_{0i}|\mathcal {F}_i)=0\) and \(E\{\eta _i(1-\gamma _i/p_{0i}|\mathcal {F}_i)\}=E\{\eta _i E (1-\gamma _i/p_{0i}|\mathcal {F}_i)\}=0\). Following the proof in Lu and Tsiatis (2006), \(var\{\eta _i(1-\gamma _i/p_{0i})\}= E \{\eta _i^{\otimes 2}(1-p_{0i})/p_{0i}\}- E [\eta _i(1-p_{0i})/p_{0i}]^{\otimes 2} = \Sigma _2\). Conditional on \(\mathcal {F}_i\), \(\{\eta _i(1-\gamma _i/p_{0i}),i=1,\ldots ,n\}\) and the first term of \(\sqrt{n}U(\varvec{\beta }_*)\) are uncorrelated. Therefore, \(\sqrt{n}U(\varvec{\beta }_*)\) is asymptotically normal with mean zero and variance-covariance matrix \(\Sigma =\Sigma _1+\Sigma _2\), where \(\Sigma _1\) is the asymptotic variance of the first term. By a Taylor expansion and the consistency of \(\hat{\varvec{\beta }}\), it follows that

$$\begin{aligned}&\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }_*) \\&\quad = -A^{-1}\sqrt{n}U(\varvec{\beta }_*) +o_p(1) \\&\quad = -A^{-1}\frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]\\&\qquad g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*)+o_p(1), \end{aligned}$$

thus \(\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }_*)\rightarrow N\{0, A^{-1}\Sigma (A^{-1})^{'}\}\) in distribution.
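The weight \(w_i\) is not restated in this appendix, but the last step of the expansion of \(\sqrt{n}U(\varvec{\beta }_*)\), where \(w_i-1\) is replaced by \(-(1-\delta _i)(1-\gamma _i/p_{0i})\), implies \(w_i=\delta _i+(1-\delta _i)\gamma _i/p_{0i}\). The following minimal Python sketch works under that assumption, with a constant sampling probability for non-cases; the function names and the toy parameter values are hypothetical. It illustrates numerically that the weights average to about one, in line with \(E(w_i|\mathcal {F})=1\).

```python
import random


def case_cohort_weights(delta, gamma, p0):
    """Inverse selection probability weights w_i = delta_i + (1 - delta_i) * gamma_i / p0_i.

    delta : event indicators (1 = case, 0 = non-case)
    gamma : selection indicators (1 = sampled into the subcohort)
    p0    : selection probabilities for non-cases
    Cases always receive weight 1; sampled non-cases receive 1 / p0_i;
    unsampled non-cases receive 0.
    """
    return [d + (1 - d) * g / p for d, g, p in zip(delta, gamma, p0)]


def simulate_mean_weight(n=20000, case_rate=0.1, p=0.2, seed=1):
    """Average weight in one simulated cohort (toy settings, assumed values)."""
    rng = random.Random(seed)
    delta = [1 if rng.random() < case_rate else 0 for _ in range(n)]
    gamma = [1 if rng.random() < p else 0 for _ in range(n)]
    weights = case_cohort_weights(delta, gamma, [p] * n)
    return sum(weights) / n


if __name__ == "__main__":
    print(case_cohort_weights([1, 0, 0], [0, 1, 0], [0.5, 0.5, 0.5]))
    print(simulate_mean_weight())  # should be close to 1
```

Because each non-case contributes either 0 or \(1/p_{0i}\), a small sampling probability inflates the variance of the weighted sums; this is the extra variability that \(\Sigma _2\) accounts for.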

Under the nested case-control design, the asymptotic distribution of \(\hat{\varvec{\beta }}\) is more difficult to derive because NCC sampling is a dynamic process: the probability of being selected as a control is not constant, and the selection indicators are not independent across subjects. We therefore use the central limit theorem for asymptotically negatively dependent random variables (Zhang 2000), as in the proof of Lu and Liu (2012). We start from the asymptotic representation

$$\begin{aligned} U(\varvec{\beta }_*)= & {} \frac{1}{n}\sum _{i=1}^n\int _0^\tau w_i[\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)] \\&g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*) +o_p(\frac{1}{\sqrt{n}}) \\\equiv & {} U_1(\varvec{\beta }_*) + U_2(\varvec{\beta }_*) +o_p(\frac{1}{\sqrt{n}}). \end{aligned}$$
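The two pieces \(U_1(\varvec{\beta }_*)\) and \(U_2(\varvec{\beta }_*)\) are not written out at this point. Reading them off the case-cohort expansion of \(\sqrt{n}U(\varvec{\beta }_*)\) given earlier, a plausible reading (an assumption here, not a statement taken from the original derivation) is the split of \(w_i\) into \(1+(w_i-1)\):

$$\begin{aligned} U_1(\varvec{\beta }_*)&= \frac{1}{n}\sum _{i=1}^n\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*), \\ U_2(\varvec{\beta }_*)&= \frac{1}{n}\sum _{i=1}^n(w_i-1)\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*). \end{aligned}$$

Under this reading, \(U_1\) is the full-cohort term and \(U_2\) carries all of the variability due to subsampling.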

By the martingale central limit theorem, \(\sqrt{n}U_1(\varvec{\beta }_*) = \frac{1}{\sqrt{n}}\sum _{i=1}^n\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t;\varvec{\beta }_*,m_*) \rightarrow N(0,\Sigma _1)\) in distribution as \(n \rightarrow \infty \). Since \(E(w_i-1|\mathcal {F}_i) = 0\), it is evident that \(U_1(\varvec{\beta }_*)\) and \(U_2(\varvec{\beta }_*)\) are uncorrelated. However, because of the NCC sampling scheme, \(w_i\) and \(w_j\) (\(i \ne j\)) are correlated even after conditioning on \(\mathcal {F}\). Since \((w_i-1)^2 = (1-\delta _i)(\gamma _i-p_{0i})^2/p_{0i}^2\), we have \(E\{(w_i-1)^2|\mathcal {F}\} = (1-\delta _i)(1-p_{0i})/p_{0i}\). Thus, the conditional variance of \(\sqrt{n}U_2(\varvec{\beta }_*)\) can be written as

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n\frac{1-p_{0i}}{p_{0i}}(1-\delta _i) \left[ \int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t)\right] ^{\otimes 2} \\&\qquad +\frac{1}{n}\sum _{i\ne j}E\{(\frac{\gamma _i}{p_{0i}}-1)(\frac{\gamma _j}{p_{0j}}-1)|\mathcal {F}\} \\&\qquad \times (1-\delta _i)(1-\delta _j)\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t) \\&\qquad \quad \left[ \int _0^\tau [\varvec{Z}_j-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_j\}dM_j(t)\right] '. \end{aligned}$$
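The diagonal term in this display rests on the conditional second moment of \(w_i-1\). As a short check, assume that the weight has the form \(w_i=\delta _i+(1-\delta _i)\gamma _i/p_{0i}\) implied by the expansion in the CC case, and that, given \(\mathcal {F}\), the indicator \(\gamma _i\) of a non-case is Bernoulli with success probability \(p_{0i}\). Then

$$\begin{aligned} w_i-1&= (1-\delta _i)\frac{\gamma _i-p_{0i}}{p_{0i}}, \\ E\{(w_i-1)^2|\mathcal {F}\}&= (1-\delta _i)\frac{E\{(\gamma _i-p_{0i})^2|\mathcal {F}\}}{p_{0i}^2} = (1-\delta _i)\frac{p_{0i}(1-p_{0i})}{p_{0i}^2} = (1-\delta _i)\frac{1-p_{0i}}{p_{0i}}, \end{aligned}$$

where the first equality in the second line uses \((1-\delta _i)^2=1-\delta _i\).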

According to Samuelsen (1997), for \(i\ne j\), \(Cov(\gamma _i,\gamma _j|\mathcal {F})=\rho _{ij}(1-p_{0i})(1-p_{0j})\), where \(\rho _{ij}=-\frac{m}{n}\int _0^{\min (\tilde{T_i},\tilde{T_j})} \left( \frac{\bar{g}_1(t)}{\bar{y}(t)}dm_*(t)+\frac{\bar{g}_2(t)}{\bar{y}(t)}dt\right) +O_p(n^{-2})\). Thus, with some algebra, \(Var\{\sqrt{n}U_2(\varvec{\beta }_*)|\mathcal {F}\}\) can be written as

$$\begin{aligned}&\frac{1}{n}\sum _{i=1}^n(1-\delta _i)\frac{1-p_{0i}}{p_{0i}}\left[ \int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t)\right] ^{\otimes 2} \\&\qquad -m\int _0^\tau [\frac{1}{n}\sum _{i=1}^nY_i(t)\frac{1-p_{0i}}{p_{0i}}(1-\delta _i)\int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)] \\&\qquad \quad g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t)]^{\otimes 2}\left( \frac{\bar{g}_1(t)}{\bar{y}(t)}dm_*(t) +\frac{\bar{g}_2(t)}{\bar{y}(t)}dt\right) +o_p(1), \end{aligned}$$

where \(\bar{g}_1(t) = \sum _{j=1}^n\frac{Y_j(t)\dot{g}\{\hat{m}_0(t)+\hat{\varvec{\beta }}'\varvec{Z}_j\}}{g\{\hat{m}_0(t)+\hat{\varvec{\beta }}'\varvec{Z}_j\}}, \bar{g}_2(t) = \sum _{j=1}^n\frac{Y_j(t)}{g\{\hat{m}_0(t)+\hat{\varvec{\beta }}'\varvec{Z}_j\}}, \bar{y}(t)=\sum _{j=1}^nY_j(t)\). By the strong law of large numbers, we have

$$\begin{aligned} \Sigma _2\equiv & {} \lim _{n\rightarrow \infty } Var\{\sqrt{n}U_2(\varvec{\beta }_*)|\mathcal {F}\} \\= & {} E\left[ \frac{1-s_0}{s_0}(1-\delta )\left[ \int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM_i(t)\right] ^{\otimes 2}\right] \\&-m\int _0^\tau (E[Y(t)(1-\delta )\frac{1-s_0}{s_0} \\&\quad \int _0^\tau [\varvec{Z}_i-u_{\bar{\varvec{Z}}}(t;\varvec{\beta }_*)-u_{\tilde{\varvec{Z}}}(t;\varvec{\beta }_*)]g\{m_*(t)+\varvec{\beta }_*'\varvec{Z}_i\}dM(t)]) ^{\otimes 2} \\&\qquad \left( \frac{\bar{g}_1(t)}{\bar{y}(t)}dm_*(t)+\frac{\bar{g}_2(t)}{\bar{y}(t)}dt\right) , \end{aligned}$$

where

$$\begin{aligned} s_{0i} = \lim _{n\rightarrow \infty } p_{0i} = 1-\exp \left\{ -m\int _0^{\tilde{T}_i} \left( \frac{\bar{g}_1(t)}{\bar{y}(t)}dm_0(t)+\frac{\bar{g}_2(t)}{\bar{y}(t)}dt\right) \right\} . \end{aligned}$$
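For a finite cohort, the inclusion probability \(p_{0i}\) whose limit is \(s_{0i}\) can be computed directly from the observed times. The Python sketch below assumes the standard product form of Samuelsen (1997), in which a non-case escapes selection at every case time at which it is at risk; this appendix does not restate the formula, so the function name, the handling of cases (probability set to 1 by convention) and the absence of tie handling are assumptions made for illustration.

```python
def ncc_inclusion_probabilities(times, delta, m):
    """Probability that each subject is ever sampled as a control.

    times : observed times (minimum of event and censoring time)
    delta : event indicators (1 = case, 0 = non-case)
    m     : number of controls drawn per case from the risk set

    Assumed form (Samuelsen 1997):
        p0_i = 1 - prod_j {1 - m / (n(t_j) - 1)},
    with the product over case times t_j <= times[i], j != i, and n(t_j)
    the number of subjects at risk at t_j.
    """
    n = len(times)
    probs = []
    for i in range(n):
        if delta[i] == 1:
            probs.append(1.0)  # convention: cases are always included
            continue
        not_sampled = 1.0
        for j in range(n):
            # Only case times at which subject i is still at risk matter.
            if j == i or delta[j] != 1 or times[j] > times[i]:
                continue
            at_risk = sum(1 for t in times if t >= times[j])
            eligible = at_risk - 1  # potential controls, excluding the case
            not_sampled *= 1.0 - min(1.0, m / eligible)
        probs.append(1.0 - not_sampled)
    return probs


if __name__ == "__main__":
    # One case at time 1 with three subjects still at risk, one control drawn.
    print(ncc_inclusion_probabilities([1.0, 2.0, 3.0, 4.0], [1, 0, 0, 0], m=1))
```

Subjects with longer follow-up are exposed to more sampling occasions and therefore have larger inclusion probabilities, which matches the dependence of \(s_{0i}\) on \(\tilde{T}_i\) in the limit above.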

Thus, by the central limit theorem for asymptotically negatively dependent random variables (Zhang 2000), we have \(\sqrt{n}U_2(\varvec{\beta }_*) \rightarrow N(0,\Sigma _2)\) as \(n\rightarrow \infty \), and

$$\begin{aligned} \sqrt{n}U(\varvec{\beta }_*) \rightarrow N(0,\Sigma _1+\Sigma _2), \end{aligned}$$

in distribution as \(n \rightarrow \infty \). It is easy to see that \(\Sigma _1+\Sigma _2 = \Sigma \). By a Taylor expansion and the consistency of \(\hat{\varvec{\beta }}\), we have \(\sqrt{n}(\hat{\varvec{\beta }}-\varvec{\beta }_*)\rightarrow N\{0, A^{-1}\Sigma (A^{-1})^{'}\}\) in distribution.


Cite this article

Jin, P., Zeleniuch-Jacquotte, A. & Liu, M. Generalized mean residual life models for case-cohort and nested case-control studies. Lifetime Data Anal 26, 789–819 (2020). https://doi.org/10.1007/s10985-020-09499-w


  • DOI: https://doi.org/10.1007/s10985-020-09499-w
