Abstract
In epidemiological studies, the case-cohort design is widely used for its outstanding cost-effectiveness. Most existing work on the case-cohort design focuses on univariate failure time data. However, clustered failure time data are commonly encountered in epidemiological studies. In this article, we study the marginal nonmixture cure model for clustered failure time data with a cure fraction in the context of the case-cohort design. A sieve semiparametric likelihood method is proposed to estimate the parametric and nonparametric components. The proposed method is easy to implement. The resulting estimators are shown to be strongly consistent and asymptotically normal. Simulation studies are carried out to assess the finite sample performance of the proposed method. We also analyze a real dataset from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial to illustrate our method.
References
Amico M, Van Keilegom I (2018) Cure models in survival analysis. Annu Rev Stat Appl 5:311–345
Bahari F, Parsi S, Ganjali M (2021) Empirical likelihood inference in general linear model with missing values in response and covariates by MNAR mechanism. Stat Pap 62(2):591–622
Barlow WE (1994) Robust variance estimation for the case-cohort design. Biometrics 50(4):1064–1072
Berkson J, Gage RP (1952) Survival curve for cancer patients following treatment. J Am Stat Assoc 47(259):501–515
Boag JW (1949) Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc Ser B Stat Methodol 11(1):15–53
Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc Ser B 26:211–252
Breslow NE, Wellner JA (2007) Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand J Stat 34(1):86–102
Chen HY (2001) Weighted semiparametric likelihood method for fitting a proportional odds regression model to data from the case-cohort design. J Am Stat Assoc 96(456):1446–1457
Chen CM, Lu TFC (2012) Marginal analysis of multivariate failure time data with a surviving fraction based on semiparametric transformation cure models. Comput Stat Data Anal 56(3):645–655
Chen MH, Ibrahim JG, Sinha D (1999) A new Bayesian model for survival data with a surviving fraction. J Am Stat Assoc 94(447):909–919
Chen MH, Ibrahim JG, Sinha D (2002) Bayesian inference for multivariate survival data with a cure fraction. J Multivar Anal 80(1):101–126
Chen YH, Chatterjee N, Carroll RJ (2008) Retrospective analysis of haplotype-based case-control studies under a flexible model for gene-environment association. Biostatistics 9(1):81–99
Chen CM, Lu TFC, Hsu CM (2013) Association estimation for clustered failure time data with a cure fraction. Comput Stat Data Anal 57:210–222
Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc Ser B 34:187–220
Deng LF, Ding JL, Liu YY et al (2018) Regression analysis for the proportional hazards model with parameter constraints under case-cohort design. Comput Stat Data Anal 117:194–206
Ding JL, Chen XL, Fang HY et al (2018) Case-cohort design for accelerated hazards model. Stat Interface 11(4):657–668
Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38(4):1041–1046
Han B, Wang XG (2020) Semiparametric estimation for the non-mixture cure model in case-cohort and nested case-control studies. Comput Stat Data Anal 144:106874
Hu T, Xiang LM (2013) Efficient estimation for semiparametric cure models with interval-censored data. J Multivar Anal 121:139–151
June CH, O’Connor RS, Kawalekar OU et al (2018) CAR T cell immunotherapy for human cancer. Science 359(6382):1361–1365
Kalbfleisch JD, Lawless JF (1988) Likelihood analysis of multi-state models for disease incidence and mortality. Stat Med 7(1–2):149–160
Kuk AYC, Chen CH (1992) A mixture model combining logistic regression with proportional hazards regression. Biometrika 79(3):531–541
Lai X, Yau KKW (2008) Long-term survivor model with bivariate random effects: applications to bone marrow transplant and carcinoma study data. Stat Med 27(27):5692–5708
Li Y, Panagiotou OA, Black A et al (2016) Multivariate piecewise exponential survival modeling. Biometrics 72(2):546–553
Li W, Li RS, Feng ZD et al (2020) Semiparametric isotonic regression analysis for risk assessment under nested case-control and case-cohort designs. Stat Methods Med Res 29(8):2328–2343
Lorentz GG (1986) Bernstein polynomials. Chelsea Publishing Co, New York
Lu SE, Shih JH (2006) Case-cohort designs and analysis for clustered failure time data. Biometrics 62(4):1138–1148
Ma SG (2007) Additive risk model with case-cohort sampled current status data. Stat Pap 48(4):595–608
Maller RA, Zhou S (1992) Estimating the proportion of immunes in a censored sample. Biometrika 79(4):731–739
Niu Y, Peng Y (2013) A semiparametric marginal mixture cure model for clustered survival data. Stat Med 32(14):2364–2373
Niu Y, Peng Y (2014) Marginal regression analysis of clustered failure time data with a cure fraction. J Multivar Anal 123:129–142
Peng YW, Taylor JMG (2011) Mixture cure model with random effects for the analysis of a multi-center tonsil cancer study. Stat Med 30(3):211–223
Peng YW, Taylor JMG (2014) Cure models. In: Handbook of survival analysis. Chapman and Hall, Boca Raton
Peng YW, Xu JF (2012) An extended cure model and model selection. Lifetime Data Anal 18(2):215–233
Peng YW, Taylor JMG, Yu BB (2007) A marginal regression model for multivariate failure time data with a surviving fraction. Lifetime Data Anal 13(3):351–369
Pollard D (1984) Convergence of stochastic processes. Springer, New York
Portier F, El Ghouch A, Van Keilegom I (2017) Efficiency and bootstrap in the promotion time cure model. Bernoulli 23(4B):3437–3468
Prentice RL (1986) A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1):1–11
Prorok PC, Andriole GL, Bresalier RS et al (2000) Design of the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control Clin Trials 21:273S-309S
Segal MR, Neuhaus JM, James IR (1997) Dependence estimation for marginal models of multivariate survival data. Lifetime Data Anal 3(3):251–268
Self SG, Prentice RL (1988) Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Stat 16(1):64–81
Shen XT (1997) On methods of sieves and penalization. Ann Stat 25(6):2555–2591
Shen XT, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22(2):580–615
Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10(4):1040–1053
Sy JP, Taylor JMG (2000) Estimation in a Cox proportional hazards cure model. Biometrics 56(1):227–236
Taylor JMG (1995) Semi-parametric estimation in failure time mixture models. Biometrics 51(3):899–907
Tsodikov A (1998) A proportional hazards model taking account of long-term survivors. Biometrics 54(4):1508–1516
Tsodikov AD, Ibrahim JG, Yakovlev AY (2003) Estimating cure rates from survival data: an alternative to two-component mixture models. J Am Stat Assoc 98(464):1063–1078
van de Geer SA (2000) Applications of empirical process theory. Cambridge University Press, Cambridge
van der Vaart AW (1998) Asymptotic statistics, Cambridge series in statistical and probabilistic mathematics, vol 3. Cambridge University Press, Cambridge
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Xu J, Peng Y (2014) Nonparametric cure rate estimation with covariates. Can J Stat 42(1):1–17
Yakovlev AY, Tsodikov AD (1996) Stochastic models of tumor latency and their biostatistical applications. World Scientific, Singapore
Yau KKW, Ng ASK (2001) Long-term survivor mixture model with random effects: application to a multi-centre clinical trial of carcinoma. Stat Med 20(11):1591–1607
Zhang H, Schaubel DE, Kalbfleisch JD (2011) Proportional hazards regression for the analysis of clustered survival data from case-cohort studies. Biometrics 67(1):18–28
Zhao W, Chen YQ, Hsu L (2017) On estimation of time-dependent attributable fraction from population-based case-control studies. Biometrics 73(3):866–875
Acknowledgements
The authors thank the associate editor and two reviewers for their constructive and insightful comments. They are grateful to the National Cancer Institute for access to NCI’s data collected by the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. The statements contained herein are solely those of the authors and do not represent concurrence by NCI. The second author was partially supported by the China Postdoctoral Science Foundation (Grant No. 2021TQ0349). The third author was partially supported by Dalian High-level Talent Innovation Project (Grant No. 2020RD09).
Appendix
In this Appendix, we present the proofs of Theorems 1–3. Let \(W_{\cdot j}=\{T_{\cdot j},\delta _{\cdot j},\varvec{Z}_{\cdot j}\}, j=1,\ldots ,K\) denote the data for a generic cluster and \(\varvec{W}=\{W_{\cdot 1}, \ldots , W_{\cdot K}\}\). Similarly, we denote \(W^{\xi }_{\cdot j}=\{T_{\cdot j},\delta _{\cdot j},\xi _{\cdot j}\varvec{Z}_{\cdot j},\xi _{\cdot j}\}, j=1,\ldots ,K\) as a single observation for a generic cluster under the case-cohort design, and \(\varvec{W}^{\xi }=\{W^{\xi }_{\cdot 1}, \ldots , W^{\xi }_{\cdot K}\}\). Furthermore, we define the function class \(\mathcal {L}_{n}=\{l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })=\sum _{j=1}^{K}l^{w}(\varvec{\theta };W_{\cdot j}^{\xi }):\varvec{\theta }\in \varvec{\Theta }_{n}\}\). Throughout the proofs, let \(Pg=\int g(x)\textrm{d}P(x)\) denote the expectation of \(g(X)\) under the distribution \(P\), and \(P_{n}g=n^{-1}\sum _{i=1}^{n} g(X_{i})\) the expectation of \(g(X)\) under the empirical measure \(P_n\). We use \(\widetilde{C}\) to denote a universal positive constant, which may vary from place to place.
For any \(\epsilon >0\), the covering number \(N(\epsilon , \mathcal {L}_{n}, L_1(P_n))\) is defined as the smallest positive integer \(\kappa \) for which there exist \(\{\varvec{\theta }^{(1)},\ldots ,\varvec{\theta }^{(\kappa )}\}\) such that
for all \(\varvec{\theta }\in \varvec{\Theta }_{n}\), where \(k=1,\ldots ,\kappa , \varvec{\theta }^{(k)}=(\varvec{\beta }^{(k)}, F^{(k)})\in \varvec{\Theta }_{n}\). If \(\kappa \) does not exist, we define \(N(\epsilon ,\mathcal {L}_{n},L_1(P_n))=\infty \).
Lemma 1
Under conditions (C1)–(C3), the covering number of the function class \(\mathcal {L}_{n}\) satisfies
where \(\widetilde{C}\) is a constant, \(m=o(n^{\nu })\) with \(0<\nu <1\) and the size of the sieve space \(\varvec{\Theta }_{n}\) is controlled by \(M_{n}=O(n^{c})\) with a constant \(c \in (0,\infty )\).
Proof of Lemma 1
For any \(\varvec{\theta }^{1}=(\varvec{\beta }^{1},F^{1}), \varvec{\theta }^{2}=(\varvec{\beta }^{2},F^{2})\in \varvec{\Theta }_{n}\), under conditions (C1)–(C3), there exists a large enough constant \(\widetilde{C}\) such that
where \(\Vert g\Vert _{\infty }=\sup _t|g(t)|\) for a function g. Denote \(\varvec{\gamma }^{j}=(\gamma _{0,j},\ldots ,\gamma _{m,j})^{T}\) as the Bernstein coefficients vector corresponding to \(F^{j},j=1,2\). Then, we obtain that
By plugging (7) into (8), it is easy to show
Thus, for any \(\varvec{\theta }\in \varvec{\Theta }_{n}\),
where \(k=1,\ldots ,\kappa \). According to Lemma 2.5 of van de Geer (2000), we can show that \(\{\varvec{\beta }\in \mathbb {R}^{p}: \Vert \varvec{\beta }\Vert \le M\}\) is covered by \((10\widetilde{C}M/\epsilon )^{p}\) balls with radius \(\epsilon /(2\widetilde{C})\) and \(\{\varvec{\gamma }\in \mathbb {R}^{m+1}: \sum _{k=0}^{m}| \gamma _{k}|\le M_{n}\}\) is covered by \((10\widetilde{C}M_{n}/\epsilon )^{m+1}\) balls with radius \(\epsilon /(2\widetilde{C})\). As a consequence, the covering number of the function class \(\mathcal {L}_{n}\) satisfies
We finish the proof of Lemma 1. \(\square \)
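The step from Bernstein coefficients to a sup-norm bound can be illustrated numerically. The sketch below assumes the sieve element has the form \(F(t)=\sum _{k=0}^{m}\gamma _{k}b_{k,m}(t/\tau )\) with \(b_{k,m}(u)=\binom{m}{k}u^{k}(1-u)^{m-k}\); this parameterization, the function names and the evaluation grid are illustrative assumptions and are not taken from the paper's displays. Since the basis functions are nonnegative and sum to one, \(\Vert F^{1}-F^{2}\Vert _{\infty }\le \max _{k}|\gamma _{k,1}-\gamma _{k,2}|\le \sum _{k}|\gamma _{k,1}-\gamma _{k,2}|\), which is why covering the coefficient set suffices to cover the functions.

```python
import math
import random


def bernstein_basis(k, m, u):
    """Bernstein basis polynomial b_{k,m}(u) for u in [0, 1]."""
    return math.comb(m, k) * u**k * (1.0 - u) ** (m - k)


def sieve_function(gamma, t, tau=1.0):
    """Evaluate F(t) = sum_k gamma[k] * b_{k,m}(t / tau) (assumed sieve form)."""
    m = len(gamma) - 1
    u = t / tau
    return sum(g * bernstein_basis(k, m, u) for k, g in enumerate(gamma))


def sup_distance(gamma1, gamma2, tau=1.0, grid_size=501):
    """Approximate sup_t |F1(t) - F2(t)| on an equally spaced grid of [0, tau]."""
    grid = [tau * i / (grid_size - 1) for i in range(grid_size)]
    return max(
        abs(sieve_function(gamma1, t, tau) - sieve_function(gamma2, t, tau))
        for t in grid
    )


# Two arbitrary coefficient vectors (hypothetical values, for illustration only).
random.seed(0)
m = 8
gamma1 = [random.uniform(0, 1) for _ in range(m + 1)]
gamma2 = [random.uniform(0, 1) for _ in range(m + 1)]

max_coef_gap = max(abs(a - b) for a, b in zip(gamma1, gamma2))
l1_coef_gap = sum(abs(a - b) for a, b in zip(gamma1, gamma2))
sup_gap = sup_distance(gamma1, gamma2)

print(f"sup gap {sup_gap:.4f}, max coef gap {max_coef_gap:.4f}, l1 coef gap {l1_coef_gap:.4f}")
```

The printed values let one check the chain of inequalities for a particular pair of coefficient vectors; the grid maximum only approximates the true supremum from below.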
Lemma 2
Under conditions (C1)–(C3), we have
almost surely.
Proof of Lemma 2
Under conditions (C1)–(C3), we have the uniform bound of \(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })\). Without loss of generality, we assume \(\sup _{\varvec{\theta }\in \varvec{\Theta }}|l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|\le 1\). Then we have \(P(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }))^{2}\le P(\sup _{\varvec{\theta }\in \varvec{\Theta }}|l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|)^{2}\le 1\). Let \(\alpha _{n}= n^{-1/2+\iota }(\log n)^{1/2}\) with \(\nu /2<\iota <1/2\). It is easy to see that \(\{\alpha _{n}\}\) is a nonincreasing sequence of positive numbers. Let \(\epsilon _{n}=\epsilon \alpha _{n}\) with \(\epsilon >0\). For any \(\varvec{\theta }\in \varvec{\Theta }_{n}\) and sufficiently large n, we then have
Denote the observations as \(\mathcal {W}^{\xi }=\{\varvec{W}^{\xi }_1, \ldots , \varvec{W}^{\xi }_n\}\). Following Pollard (1984), we denote \(P^{0}_{n}\) as the signed measure that places mass \(\pm n^{-1}\) at each of the observations \(\mathcal {W}^{\xi }\). According to Pollard (1984, p. 31) and formula (9), we have the following inequality
Given \(\mathcal {W}^{\xi }\), by the definition of the covering number, we can choose \(\varvec{\theta }^{(1)},\ldots ,\varvec{\theta }^{(\kappa )}\), where \(\kappa =N(\epsilon _{n}/2,\mathcal {L}_n,L_{1}(P_{n}))\), such that
for all \(\varvec{\theta }\in \varvec{\Theta }_{n}\). For each \(\varvec{\theta }\in \varvec{\Theta }_{n}\), write \(\varvec{\theta }^{*}\) for the \(\varvec{\theta }^{(k)}\) at which the minimum is achieved. By some simple calculations, we have the following inequality
According to the formula (10), we obtain
By the definition of the covering number \(N(\epsilon _{n}/2,\mathcal {L}_{n},L_{1}(P_n))\), for each \(\varvec{\theta }^{(k)}\), there exists \(\tilde{\varvec{\theta }}^{(k)} \in \varvec{\Theta }_{n}\) such that \(P_{n}|l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi })-l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi })|<\epsilon _{n}/2\). Then we have
Combining Lemma 2.2.7 of van der Vaart and Wellner (1996) with \(|l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi })|\le 1\), we then have
According to Lemma 1 and formulas (11) and (12), we have
By taking expectations over \(\mathcal {W}^{\xi }\), we obtain
According to \(M_n=O(n^c)\), \(m=o(n^\nu )\) and \(\iota > \nu /2\), we can show that
Hence, we obtain that
By the Borel–Cantelli lemma, we have \(\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}}|P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-Pl_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|\rightarrow 0\) almost surely. \(\square \)
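The role of the condition \(\iota >\nu /2\) can be seen by comparing the growth of the log covering bound from Lemma 1 with \(n\epsilon _{n}^{2}\). The sketch below is only a heuristic check: it assumes the tail bound has an exponent proportional to \(n\epsilon _{n}^{2}\) (the exact constants of the displayed inequalities are not used), takes the boundary case \(m=n^{\nu }\) and \(M_{n}=n^{c}\), and uses hypothetical values for \(p\), \(c\), \(M\) and \(\widetilde{C}\). When the ratio of the two quantities tends to zero, the exponential term dominates and the probabilities are summable.

```python
import math


def log_covering_bound(n, eps_n, p=2, nu=0.3, c=1.0, C=1.0, M=1.0):
    """log of (10*C*M/eps)^p * (10*C*M_n/eps)^(m+1), with m = n**nu and M_n = n**c.

    All default constants are hypothetical and chosen only for illustration.
    """
    m = n**nu
    M_n = n**c
    return p * math.log(10 * C * M / eps_n) + (m + 1) * math.log(10 * C * M_n / eps_n)


def entropy_to_exponent_ratio(n, iota, eps=1.0, **kwargs):
    """Ratio log N(eps_n) / (n * eps_n^2) for eps_n = eps * n^(-1/2 + iota) * sqrt(log n)."""
    eps_n = eps * n ** (-0.5 + iota) * math.sqrt(math.log(n))
    return log_covering_bound(n, eps_n, **kwargs) / (n * eps_n**2)


sample_sizes = [10**3, 10**4, 10**5]

# iota = 0.4 satisfies nu/2 < iota < 1/2 for nu = 0.3; iota = 0.1 violates it.
ratios_ok = [entropy_to_exponent_ratio(n, iota=0.4, nu=0.3) for n in sample_sizes]
ratios_bad = [entropy_to_exponent_ratio(n, iota=0.1, nu=0.3) for n in sample_sizes]

for n, a, b in zip(sample_sizes, ratios_ok, ratios_bad):
    print(f"n={n:>6}: ratio with iota=0.4 is {a:.4f}, with iota=0.1 is {b:.4f}")
```

Under these assumed constants the ratio shrinks with \(n\) when \(\iota >\nu /2\) and grows when \(\iota <\nu /2\), which matches the way the condition is used in the proof.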
Proof of Theorem 1
According to Lemmas 1 and 2, we can show that
and
almost surely as \(n\rightarrow \infty \). Define \(\varvec{\Theta }_{\epsilon }=\{\varvec{\theta }:\textrm{d}(\varvec{\theta },\varvec{\theta }_0)\ge \epsilon ,\varvec{\theta }\in \varvec{\Theta }_n \}\) for \(\epsilon > 0\),
and
where \(M_{K}(\varvec{\theta };\varvec{W}^{\xi })=-l_{K}^w(\varvec{\theta };\varvec{W}^{\xi })\). We have
If \(\hat{\varvec{\theta }}_n\in \varvec{\Theta }_{\epsilon }\), we then have
By the identifiability of model (1), we obtain that \(\delta _\epsilon =\inf _{\varvec{\Theta }_{\epsilon }}PM_{K}(\varvec{\theta };\varvec{W}^{\xi })-PM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi })>0\). By conditions (C1)–(C3), it is easy to show
with \(\zeta _{n}=\zeta _{1n}+\zeta _{2n}\). Hence, we have \(\zeta _{n}\ge \delta _\epsilon \). This implies that \(\{\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\ge \epsilon \}\subseteq \{\zeta _n\ge \delta _\epsilon \}\). According to formula (13) and the strong law of large numbers, both \(\zeta _{1n}\rightarrow 0\) and \(\zeta _{2n}\rightarrow 0\) almost surely as \(n \rightarrow \infty \). Thus, \(\bigcap ^{\infty }_{k=1}\bigcup ^{\infty }_{n=k} \{\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\ge \epsilon \} \subseteq \bigcap ^{\infty }_{k=1}\bigcup ^{\infty }_{n=k} \{\zeta _n\ge \delta _\epsilon \}\), and the event on the right-hand side has probability zero, which proves that \(\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\rightarrow 0\) almost surely. It is easy to see that \(\Vert \hat{\varvec{\beta }}_n- \varvec{\beta }_0\Vert \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\) and \(\Vert \hat{F}_n- F_0\Vert _2 \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\). Therefore, we have \(\Vert \hat{\varvec{\beta }}_n- \varvec{\beta }_0\Vert \rightarrow 0\) and \(\Vert \hat{F}_n- F_0\Vert _2 \rightarrow 0\) almost surely. This completes the proof of Theorem 1. \(\square \)
Proof of Theorem 2
Now we will obtain the convergence rate of \(\hat{\varvec{\theta }}_n\). By Theorem 1.6.2 of Lorentz (1986), there exists a Bernstein polynomial \(F_{n0}\) which satisfies \(\Vert F_{n0}-F_0\Vert _\infty =O(m^{-r/2})\). For convenience, we define \(\varvec{\theta }_{n0}=(\varvec{\beta }_0,F_{n0})\). It is easy to see that \(\textrm{d}(\varvec{\theta }_{n0},\varvec{\theta }_0)=O(n^{-r\nu /2})\). For any \(\eta >0\), we define the class of functions
for a given observation \(\varvec{W}^{\xi }\). One can easily obtain that \(P(l_{K}^{w}(\varvec{\theta }_{0};\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }))\le \widetilde{C}\textrm{d}(\varvec{\theta }_{0},\varvec{\theta }_{n0})\le \widetilde{C}n^{-r\nu /2}\). Using the relationship between Hellinger distance and Kullback–Leibler information, we then have
Therefore, for sufficiently large n, it yields that
for any \(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }) \in \mathcal {F}_{\eta }\).
Note that \(\mathcal {F}_{\eta }\) is uniformly bounded under conditions (C1)–(C3). Furthermore, by some algebraic manipulations, we obtain \(P(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }))^2\le \widetilde{C}\eta ^2\) for any \(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }) \in \mathcal {F}_{\eta }\). By Lemma 3.4.2 of van der Vaart and Wellner (1996), we can prove that
where \(J_{[]}\{\eta ,\mathcal {F}_{\eta },L_2(P)\}=\int ^{\eta }_0 [1+\log N_{[]}\{\epsilon ,\mathcal {F}_{\eta },L_2(P)\}]^{1/2}\textrm{d}\epsilon \). For \(0<\epsilon <\eta \), by Shen and Wong (1994, p. 597), we have \(\log N_{[]}\{\epsilon ,\mathcal {F}_{\eta },L_2(P)\}\le \widetilde{C}N\log (\eta /\epsilon )\) with \(N=m+1\). Then we can show that \(J_{[]}\{\eta ,\mathcal {F}_{\eta },L_2(P)\} \le \widetilde{C}N^{1/2}\eta \). This yields \(\varphi _n(\eta )=N^{1/2}\eta +N/n^{1/2}\). One can easily show that \(\varphi _n(\eta )/\eta \) is monotonically decreasing with respect to \(\eta \) and \(r_n^2\varphi _n(1/r_n)=r_nN^{1/2}+r_n^2N/n^{1/2}\le \widetilde{C}n^{1/2}\), where \(r_n=N^{-1/2}n^{1/2}=n^{(1-\nu )/2}\).
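The two properties of \(\varphi _n\) used here can be verified directly. The short derivation below assumes that \(N=m+1\) is of exact order \(n^{\nu }\), so that \(r_n=N^{-1/2}n^{1/2}\) is of order \(n^{(1-\nu )/2}\). First,

\[\frac{\varphi _n(\eta )}{\eta }=N^{1/2}+\frac{N}{n^{1/2}\eta },\]

which is decreasing in \(\eta \) because the second term is. Second, substituting \(\eta =1/r_n\) and \(r_n^2=n/N\),

\[r_n^2\varphi _n(1/r_n)=r_nN^{1/2}+\frac{r_n^2N}{n^{1/2}}=n^{1/2}+\frac{(n/N)N}{n^{1/2}}=2n^{1/2}.\]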
Note that \(P_n(l_{K}^{w}(\hat{\varvec{\theta }}_n;\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }))\ge 0\) and \(\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{n0})\le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{0})+\textrm{d}(\varvec{\theta }_0,\varvec{\theta }_{n0})\rightarrow 0\) in probability. Therefore, by Theorem 3.4.1 of van der Vaart and Wellner (1996), we have \(n^{(1-\nu )/2}\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{n0})=O_p(1)\). Together with \(\textrm{d}(\varvec{\theta }_{n0},\varvec{\theta }_{0})=O(n^{-r\nu /2})\), it is easy to show that \(\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{0})=O_p(n^{-(1-\nu )/2}+n^{-r\nu /2})\). According to \(\Vert \hat{\varvec{\beta }}_n- \varvec{\beta }_0\Vert \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\) and \(\Vert \hat{F}_n- F_0\Vert _2 \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\), we have \(\Vert \hat{\varvec{\beta }}_n-\varvec{\beta }_0\Vert = O_p(n^{-\min (r\nu /2,(1-\nu )/2)})\) and \(\Vert \hat{F}_n-F_0\Vert _2 = O_p(n^{-\min (r\nu /2,(1-\nu )/2)})\). The proof of Theorem 2 is completed. \(\square \)
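To see how the two terms in the rate trade off, the following sketch computes the exponent \(\min (r\nu /2,(1-\nu )/2)\) and the value of \(\nu \) that balances the approximation error \(n^{-r\nu /2}\) against the estimation error \(n^{-(1-\nu )/2}\). Equating the exponents gives \(\nu =1/(1+r)\) and the rate \(n^{-r/(2(1+r))}\). The function names are illustrative, and the sketch ignores any further restrictions on \(\nu \) and \(r\) imposed by the conditions of the theorems.

```python
from fractions import Fraction


def rate_exponent(nu, r):
    """Exponent a such that d(theta_hat, theta_0) = O_p(n^{-a}), a = min(r*nu/2, (1-nu)/2)."""
    return min(r * nu / 2, (1 - nu) / 2)


def balancing_nu(r):
    """Value of nu in (0, 1) that equates r*nu/2 and (1-nu)/2."""
    return Fraction(1) / (1 + r)


for r in (Fraction(1), Fraction(2), Fraction(4)):
    nu_star = balancing_nu(r)
    print(f"r={r}: balancing nu={nu_star}, rate exponent={rate_exponent(nu_star, r)}")
```

With exact rational arithmetic the balancing exponent equals \(r/(2(1+r))\), which approaches the parametric value \(1/2\) as the smoothness \(r\) grows.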
Proof of Theorem 3
First, we introduce some necessary notation. The linear span of \(\varvec{\Theta }-\varvec{\theta }_0\) is denoted as \(\mathcal {V}\), where \(\varvec{\Theta }\) is the parameter space and \(\varvec{\theta }_0\) is the true value of \(\varvec{\theta }\). For convenience, we denote \(\varrho _n=n^{-\min \{r\nu /2,(1-\nu )/2\}}\). Following the arguments of van der Vaart (1998, p. 296), for any \(\varvec{\theta }\in \{\varvec{\theta }\in \varvec{\Theta }:\Vert \varvec{\theta }-\varvec{\theta }_0\Vert =O(\varrho _n)\}\), the first order directional derivative of \(l_{K}^w(\varvec{\theta };\varvec{W}^\xi )\) in the direction \(\psi \in \mathcal {V}\) is defined as
The second order directional derivative of \(l_{K}^w(\varvec{\theta };\varvec{W}^\xi )\) is given by
The Fisher inner product on the space \(\mathcal {V}\) is denoted by \(\langle \psi ,\tilde{\psi } \rangle =P(\dot{l}_{K}^w(\varvec{\theta };\varvec{W}^\xi )[\psi ]\dot{l}_{K}^w(\varvec{\theta };\varvec{W}^\xi )[\tilde{\psi }])\). Write a smooth functional of \(\varvec{\theta }\) as follows
where \(\Vert \varvec{\mu }_1\Vert \le 1\) and \(\mu _2(t) \in \mathcal {F}\). The Fisher norm for \(\psi \in \mathcal {V}\) is defined as \(\Vert \psi \Vert =\langle \psi ,\psi \rangle ^{1/2}\). The closed linear span of \(\mathcal {V}\) is denoted as \(\overline{\mathcal {V}}\) under the Fisher norm. We can easily find that \((\overline{\mathcal {V}},\Vert \cdot \Vert )\) is a Hilbert space. Similar to \(\dot{l}_{K}^w(\varvec{\theta };\varvec{W}^\xi )[\psi ]\), for any \(\psi \in \mathcal {V}\), the first directional derivative of \(\Lambda (\varvec{\theta })\) at \(\varvec{\theta }_0\) is given by
Similar to Shen (1997), \(\dot{\Lambda }(\varvec{\theta }_0)[\psi ]\) is linear in \(\psi \) and
For any \(\psi ^*\in \varvec{\Theta }\), there exists \(\psi _n^*\in \varvec{\Theta }_n\) such that \(\Vert \psi _n^*-\psi ^*\Vert =O(n^{-r\nu /2})\). Then we have \(\varrho _n\Vert \psi _n^*-\psi ^*\Vert =o(n^{-1/2})\) with \(r>1\) and \(r\nu >1/2\). For convenience, we define \(\rho [\varvec{\theta }-\varvec{\theta }_0;\varvec{W}^{\xi }]=l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })- \dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\varvec{\theta }-\varvec{\theta }_0]\) and let \(\epsilon _n\) be any positive sequence satisfying \(\epsilon _n=o(n^{-1/2})\). Some algebra yields that
Next, we discuss the calculation of \(I_1, I_2\) and \(I_3\) in turn. For \(I_1\), by Chebyshev’s inequality, \(\Vert \psi _n^*-\psi ^*\Vert =O(n^{-r\nu /2})\) and \(P\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*]=0\), it is easy to show that
which implies that \(P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*]=o_p(n^{-1/2})\) and \(I_1=\epsilon _n\times o_p(n^{-1/2})\).
For \(I_2\), by some algebraic calculations, we have
where \(\tilde{\varvec{\theta }}\) lies between \(\hat{\varvec{\theta }}_n\) and \(\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*\). Denote the function class \(\mathcal {L}_3=\{\dot{l}_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })[\psi _n^*]-\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*]:\varvec{\theta }\in \varvec{\Theta }_n \text { and } \Vert \varvec{\theta }-\varvec{\theta }_0\Vert =O(\varrho _n)\}\). For any \(\dot{l}_{K}^{w}(\varvec{\theta }_i;\varvec{W}^{\xi })[\psi _n^*]-\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*]\in \mathcal {L}_3 \) \((i=1,2)\), we have \(\bigl |\dot{l}_{K}^{w}(\varvec{\theta }_1;\varvec{W}^{\xi })[\psi _n^*]-\dot{l}_{K}^{w}(\varvec{\theta }_2;\varvec{W}^{\xi })[\psi _n^*]\bigr |\le \widetilde{C}\Vert \varvec{\theta }_1-\varvec{\theta }_2\Vert \). Thus, it follows that
Note that \(N(\epsilon ,\mathcal {L}_3,L_2(Q)) \le \widetilde{C}e^{3/\epsilon }\), so that \(\int _0^{\infty }\sup _{Q}\sqrt{\log N(\epsilon ,\mathcal {L}_3,L_2(Q))}\textrm{d}\epsilon = \int _0^{\widetilde{C}\varrho _n}\sup _{Q}\sqrt{\log N(\epsilon ,\mathcal {L}_3,L_2(Q))}\textrm{d}\epsilon + \int _{\widetilde{C}\varrho _n}^{\infty } 0 \textrm{d}\epsilon < \infty \) with \(\nu >1/2\) and \(r>1\). Under conditions (C1)–(C3), \(\dot{l}_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })[\psi _n^*]\) is uniformly bounded. According to Theorem 2.8.3 of van der Vaart and Wellner (1996), we know that \(\mathcal {L}_3\) is a Donsker class. Furthermore, it follows from Corollary 2.3.12 of van der Vaart and Wellner (1996) that
Therefore, we have \(I_2=\epsilon _n\times o_p(n^{-1/2})\).
For \(I_3\), note that \(P(\ddot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\hat{\varvec{\theta }}_n-\varvec{\theta }_0,\hat{\varvec{\theta }}_n-\varvec{\theta }_0])= -P(\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\hat{\varvec{\theta }}_n-\varvec{\theta }_0]\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\hat{\varvec{\theta }}_n-\varvec{\theta }_0])\). For any \(\varvec{\theta }\in \{\varvec{\theta }:\textrm{d}(\varvec{\theta },\varvec{\theta }_0)=O(\varrho _n)\}\), we have \(P(\ddot{l}_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })[\varvec{\theta }-\varvec{\theta }_0,\varvec{\theta }-\varvec{\theta }_0] - \ddot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\varvec{\theta }-\varvec{\theta }_0,\varvec{\theta }-\varvec{\theta }_0]) = O(\varrho _n^3)\) and \(\varrho _n^3 = o(n^{-1})\) with \(\nu <1/3\) and \(r>2\). This yields
where \(\tilde{\varvec{\theta }}\) is between \(\hat{\varvec{\theta }}_n\) and \(\varvec{\theta }_0\). Combining the Cauchy–Schwarz inequality, \(\Vert \psi _n^*\Vert ^2\rightarrow \Vert \psi ^*\Vert ^2<\infty \) and \(\varrho _n\Vert \psi _n^*-\psi ^*\Vert =o(n^{-1/2})\), we have that
According to the results of \(I_1,I_2\) and \(I_3\), we obtain
By some algebraic calculations, we then have \(\sqrt{n}\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle = \sqrt{n}(P_n-P)\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*] + o_p(1)\) with \(P\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]=0\) and \(Var(\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]) = \Vert \psi ^*\Vert ^2<\infty \). By the central limit theorem, it yields that \(\sqrt{n}\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle \rightarrow N(0,\Vert \psi ^*\Vert ^2)\). By the Riesz representation theorem, there exists \(\psi ^*\in {\bar{\mathcal {V}}} \text { such that } \dot{\Lambda }(\varvec{\theta }_0)[\psi ]=\langle \psi ,\psi ^*\rangle \) for any \(\psi \in \bar{\mathcal {V}}\) and \(\Vert \psi ^*\Vert = \Vert \dot{\Lambda }(\varvec{\theta }_0)\Vert \). Note that \(\Lambda (\varvec{\theta })-\Lambda (\varvec{\theta }_0)=\dot{\Lambda }(\varvec{\theta }_0)[\varvec{\theta }-\varvec{\theta }_0]\). Therefore, we have that
in distribution. That is, \(\sqrt{n}[\varvec{\mu }_1^T(\hat{\varvec{\beta }}_n-\varvec{\beta }_0)+ \int _0^{\tau }\mu _2(t)(\hat{F}_n(t)-F_0(t))\textrm{d}t] \rightarrow N(0, \Vert \dot{\Lambda }(\varvec{\theta }_0)\Vert ^2)\) in distribution.
In particular, if we set \(\mu _2(\cdot )=0\) in the formula (14), then we have \(\Lambda _{\varvec{\beta }}(\varvec{\theta })=\varvec{\mu }_1^T\varvec{\beta }\). The first order directional derivative of \(\Lambda _{\varvec{\beta }}(\varvec{\theta })\) is defined as \(\dot{\Lambda }_{\varvec{\beta }}(\varvec{\theta }_0)[\psi ]=\frac{\textrm{d}\Lambda _{\varvec{\beta }}(\varvec{\theta }_0+t\psi )}{\textrm{d}t}\Big |_{t=0}\) and
Similarly, it follows that \(\sqrt{n}\varvec{\mu }_1^T(\hat{\varvec{\beta }}_n-\varvec{\beta }_0) \rightarrow N(0, \Vert \dot{\Lambda }_{\varvec{\beta }}(\varvec{\theta }_0)\Vert ^2)\) in distribution. The proof of Theorem 3 is completed. \(\square \)
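In practice, the asymptotic normality of \(\sqrt{n}\varvec{\mu }_1^T(\hat{\varvec{\beta }}_n-\varvec{\beta }_0)\) is what justifies Wald-type confidence intervals for linear combinations of the regression coefficients. The sketch below shows the arithmetic only. It assumes that a consistent estimate `sigma_hat` of \(\Vert \dot{\Lambda }_{\varvec{\beta }}(\varvec{\theta }_0)\Vert \) is already available; how to obtain it is not covered in this Appendix, and the function name and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(mu1, beta_hat, sigma_hat, n, level=0.95):
    """Wald interval for mu1^T beta based on sqrt(n) * mu1^T (beta_hat - beta_0) -> N(0, sigma^2).

    sigma_hat is an assumed, externally supplied estimate of the asymptotic
    standard deviation; n is the number of clusters.
    """
    point = sum(a * b for a, b in zip(mu1, beta_hat))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # standard normal quantile
    half_width = z * sigma_hat / math.sqrt(n)
    return point - half_width, point + half_width


# Hypothetical inputs, for illustration only.
lower, upper = wald_interval(mu1=(1.0, 0.0), beta_hat=(0.5, -0.2), sigma_hat=2.0, n=400)
print(f"95% interval for the first coefficient: ({lower:.3f}, {upper:.3f})")
```

Choosing \(\varvec{\mu }_1\) as a coordinate vector, as in the usage line, gives an interval for a single coefficient.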
About this article
Cite this article
Xie, P., Han, B. & Wang, X. Case-cohort studies for clustered failure time data with a cure fraction. Stat Papers (2023). https://doi.org/10.1007/s00362-023-01448-7
DOI: https://doi.org/10.1007/s00362-023-01448-7
Keywords
- Case-cohort design
- Clustered failure times
- Cure fraction
- Nonmixture cure model
- Sieve method
- Weighted likelihood