Case-cohort studies for clustered failure time data with a cure fraction

  • Regular Article
  • Statistical Papers

Abstract

In epidemiological studies, the case-cohort design is widely used because of its outstanding cost-effectiveness. Most existing work on the case-cohort design focuses on univariate failure time data. However, clustered failure time data are commonly encountered in epidemiological studies. In this article, we study the marginal nonmixture cure model for clustered failure time data with a cure fraction in the context of the case-cohort design. A sieve semiparametric likelihood method is proposed to estimate the parametric and nonparametric components. The proposed method is easy to implement. The resulting estimators are shown to be strongly consistent and asymptotically normal. Simulation studies are carried out to assess the finite sample performance of the proposed method. We also analyze a real dataset from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial to illustrate our method.

References

  • Amico M, Van Keilegom I (2018) Cure models in survival analysis. Annu Rev Stat Appl 5:311–345

  • Bahari F, Parsi S, Ganjali M (2021) Empirical likelihood inference in general linear model with missing values in response and covariates by MNAR mechanism. Stat Pap 62(2):591–622

  • Barlow WE (1994) Robust variance estimation for the case-cohort design. Biometrics 50(4):1064–1072

  • Berkson J, Gage RP (1952) Survival curve for cancer patients following treatment. J Am Stat Assoc 47(259):501–515

  • Boag JW (1949) Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc Ser B Stat Methodol 11(1):15–53

  • Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc Ser B 26:211–252

  • Breslow NE, Wellner JA (2007) Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scand J Stat 34(1):86–102

  • Chen HY (2001) Weighted semiparametric likelihood method for fitting a proportional odds regression model to data from the case-cohort design. J Am Stat Assoc 96(456):1446–1457

  • Chen CM, Lu TFC (2012) Marginal analysis of multivariate failure time data with a surviving fraction based on semiparametric transformation cure models. Comput Stat Data Anal 56(3):645–655

  • Chen MH, Ibrahim JG, Sinha D (1999) A new Bayesian model for survival data with a surviving fraction. J Am Stat Assoc 94(447):909–919

  • Chen MH, Ibrahim JG, Sinha D (2002) Bayesian inference for multivariate survival data with a cure fraction. J Multivar Anal 80(1):101–126

  • Chen YH, Chatterjee N, Carroll RJ (2008) Retrospective analysis of haplotype-based case-control studies under a flexible model for gene-environment association. Biostatistics 9(1):81–99

  • Chen CM, Lu TFC, Hsu CM (2013) Association estimation for clustered failure time data with a cure fraction. Comput Stat Data Anal 57:210–222

  • Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc Ser B 34:187–220

  • Deng LF, Ding JL, Liu YY et al (2018) Regression analysis for the proportional hazards model with parameter constraints under case-cohort design. Comput Stat Data Anal 117:194–206

  • Ding JL, Chen XL, Fang HY et al (2018) Case-cohort design for accelerated hazards model. Stat Interface 11(4):657–668

  • Farewell VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38(4):1041–1046

  • Han B, Wang XG (2020) Semiparametric estimation for the non-mixture cure model in case-cohort and nested case-control studies. Comput Stat Data Anal 144:106874

  • Hu T, Xiang LM (2013) Efficient estimation for semiparametric cure models with interval-censored data. J Multivar Anal 121:139–151

  • June CH, O’Connor RS, Kawalekar OU et al (2018) CAR T cell immunotherapy for human cancer. Science 359(6382):1361–1365

  • Kalbfleisch JD, Lawless JF (1988) Likelihood analysis of multi-state models for disease incidence and mortality. Stat Med 7(1–2):149–160

  • Kuk AYC, Chen CH (1992) A mixture model combining logistic regression with proportional hazards regression. Biometrika 79(3):531–541

  • Lai X, Yau KKW (2008) Long-term survivor model with bivariate random effects: applications to bone marrow transplant and carcinoma study data. Stat Med 27(27):5692–5708

  • Li Y, Panagiotou OA, Black A et al (2016) Multivariate piecewise exponential survival modeling. Biometrics 72(2):546–553

  • Li W, Li RS, Feng ZD et al (2020) Semiparametric isotonic regression analysis for risk assessment under nested case-control and case-cohort designs. Stat Methods Med Res 29(8):2328–2343

  • Lorentz GG (1986) Bernstein polynomials. Chelsea Publishing Co, New York

  • Lu SE, Shih JH (2006) Case-cohort designs and analysis for clustered failure time data. Biometrics 62(4):1138–1148

  • Ma SG (2007) Additive risk model with case-cohort sampled current status data. Stat Pap 48(4):595–608

  • Maller RA, Zhou S (1992) Estimating the proportion of immunes in a censored sample. Biometrika 79(4):731–739

  • Niu Y, Peng Y (2013) A semiparametric marginal mixture cure model for clustered survival data. Stat Med 32(14):2364–2373

  • Niu Y, Peng Y (2014) Marginal regression analysis of clustered failure time data with a cure fraction. J Multivar Anal 123:129–142

  • Peng YW, Taylor JMG (2011) Mixture cure model with random effects for the analysis of a multi-center tonsil cancer study. Stat Med 30(3):211–223

  • Peng YW, Taylor JMG (2014) Cure models. In: Handbook of survival analysis. Chapman and Hall, Boca Raton

  • Peng YW, Xu JF (2012) An extended cure model and model selection. Lifetime Data Anal 18(2):215–233

  • Peng YW, Taylor JMG, Yu BB (2007) A marginal regression model for multivariate failure time data with a surviving fraction. Lifetime Data Anal 13(3):351–369

  • Pollard D (1984) Convergence of stochastic processes. Springer, New York

  • Portier F, El Ghouch A, Van Keilegom I (2017) Efficiency and bootstrap in the promotion time cure model. Bernoulli 23(4B):3437–3468

  • Prentice RL (1986) A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1):1–11

  • Prorok PC, Andriole GL, Bresalier RS et al (2000) Design of the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control Clin Trials 21:273S–309S

  • Segal MR, Neuhaus JM, James IR (1997) Dependence estimation for marginal models of multivariate survival data. Lifetime Data Anal 3(3):251–268

  • Self SG, Prentice RL (1988) Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Stat 16(1):64–81

  • Shen XT (1997) On methods of sieves and penalization. Ann Stat 25(6):2555–2591

  • Shen XT, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22(2):580–615

  • Stone CJ (1982) Optimal global rates of convergence for nonparametric regression. Ann Stat 10(4):1040–1053

  • Sy JP, Taylor JMG (2000) Estimation in a Cox proportional hazards cure model. Biometrics 56(1):227–236

  • Taylor JMG (1995) Semi-parametric estimation in failure time mixture models. Biometrics 51(3):899–907

  • Tsodikov A (1998) A proportional hazards model taking account of long-term survivors. Biometrics 54(4):1508–1516

  • Tsodikov AD, Ibrahim JG, Yakovlev AY (2003) Estimating cure rates from survival data: an alternative to two-component mixture models. J Am Stat Assoc 98(464):1063–1078

  • van de Geer SA (2000) Applications of empirical process theory. Cambridge University Press, Cambridge

  • van der Vaart AW (1998) Asymptotic statistics, Cambridge series in statistical and probabilistic mathematics, vol 3. Cambridge University Press, Cambridge

  • van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

  • Xu J, Peng Y (2014) Nonparametric cure rate estimation with covariates. Can J Stat 42(1):1–17

  • Yakovlev AY, Tsodikov AD (1996) Stochastic models of tumor latency and their biostatistical applications. World Scientific, Singapore

  • Yau KKW, Ng ASK (2001) Long-term survivor mixture model with random effects: application to a multi-centre clinical trial of carcinoma. Stat Med 20(11):1591–1607

  • Zhang H, Schaubel DE, Kalbfleisch JD (2011) Proportional hazards regression for the analysis of clustered survival data from case-cohort studies. Biometrics 67(1):18–28

  • Zhao W, Chen YQ, Hsu L (2017) On estimation of time-dependent attributable fraction from population-based case-control studies. Biometrics 73(3):866–875

Acknowledgements

The authors thank the associate editor and two reviewers for their constructive and insightful comments. They are grateful to the National Cancer Institute for access to NCI’s data collected by the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. The statements contained herein are solely those of the authors and do not represent concurrence by NCI. The second author was partially supported by the China Postdoctoral Science Foundation (Grant No. 2021TQ0349). The third author was partially supported by Dalian High-level Talent Innovation Project (Grant No. 2020RD09).

Author information

Corresponding author

Correspondence to Xiaoguang Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (eps 14 KB)

Supplementary file 2 (pdf 95 KB)

Appendix

In this Appendix, we present the proofs of Theorems 1–3. Let \(W_{\cdot j}=\{T_{\cdot j},\delta _{\cdot j},\varvec{Z}_{\cdot j}\}, j=1,\ldots ,K\) denote the data for a generic cluster and \(\varvec{W}=\{W_{\cdot 1}, \ldots , W_{\cdot K}\}\). Similarly, we denote by \(W^{\xi }_{\cdot j}=\{T_{\cdot j},\delta _{\cdot j},\xi _{\cdot j}\varvec{Z}_{\cdot j},\xi _{\cdot j}\}, j=1,\ldots ,K\) a single observation for a generic cluster under the case-cohort design, and write \(\varvec{W}^{\xi }=\{W^{\xi }_{\cdot 1}, \ldots , W^{\xi }_{\cdot K}\}\). Furthermore, we define the function class \(\mathcal {L}_{n}=\{l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })=\sum _{j=1}^{K}l^{w}(\varvec{\theta };W_{\cdot j}^{\xi }):\varvec{\theta }\in \varvec{\Theta }_{n}\}\). Throughout the proofs, let \(Pg=\int g(x)\textrm{d}P(x)\) denote the expectation of g(X) under the distribution P, and \(P_{n}g=n^{-1}\sum _{i=1}^{n} g(X_{i})\) the expectation of g(X) under the empirical measure \(P_n\). We use \(\widetilde{C}\) to denote a universal positive constant, which may vary from place to place.

For any \(\epsilon >0\), the covering number \(N(\epsilon , \mathcal {L}_{n}, L_1(P_n))\) is defined as the smallest positive integer \(\kappa \) for which there exist \(\{\varvec{\theta }^{(1)},\ldots ,\varvec{\theta }^{(\kappa )}\}\) such that

$$\begin{aligned} \min _{k\in \{1,\ldots ,\kappa \}}\frac{1}{n}\sum ^{n}_{i=1}|l_{K}^w(\varvec{\theta };\varvec{W}_{i}^{\xi })-l_{K}^w(\varvec{\theta }^{(k)};\varvec{W}_{i}^{\xi })|< \epsilon , \end{aligned}$$

for all \(\varvec{\theta }\in \varvec{\Theta }_{n}\), where \(k=1,\ldots ,\kappa , \varvec{\theta }^{(k)}=(\varvec{\beta }^{(k)}, F^{(k)})\in \varvec{\Theta }_{n}\). If \(\kappa \) does not exist, we define \(N(\epsilon ,\mathcal {L}_{n},L_1(P_n))=\infty \).
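
To make the definition concrete, the following minimal sketch builds an \(\epsilon \)-cover in the empirical \(L_1(P_n)\) distance for a small, finite toy class. The class \(f_{\theta }(x)=\theta x\), the sample, and the greedy construction are all assumptions made for illustration; they are not the class \(\mathcal {L}_{n}\) of the paper, and the greedy count is only an upper bound on the covering number of the toy class.

```python
import random

def l1_empirical(f_vals, g_vals):
    """Empirical L1(P_n) distance between two functions given by their sample values."""
    return sum(abs(a - b) for a, b in zip(f_vals, g_vals)) / len(f_vals)

def greedy_centers(function_values, eps):
    """Greedily pick centers so that every function lies within eps of some center.

    A function becomes a new center only if it is at distance >= eps from all
    centers chosen so far, so len(result) upper-bounds the covering number.
    """
    centers = []
    for idx, vals in enumerate(function_values):
        if all(l1_empirical(vals, function_values[c]) >= eps for c in centers):
            centers.append(idx)
    return centers

# Hypothetical toy class f_theta(x) = theta * x evaluated on a simulated sample.
rng = random.Random(0)
sample = [rng.uniform(0.0, 1.0) for _ in range(50)]
thetas = [j / 100 for j in range(101)]
function_values = [[th * x for x in sample] for th in thetas]

eps = 0.05
centers = greedy_centers(function_values, eps)
print(len(centers), "centers suffice for eps =", eps)
```

Shrinking `eps` in this toy example increases the number of centers, which is the behaviour that the polynomial bound in Lemma 1 quantifies for the sieve class.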

Lemma 1

Under conditions (C1)–(C3), the covering number of the function class \(\mathcal {L}_{n}\) satisfies

$$\begin{aligned} N(\epsilon ,\mathcal {L}_{n},L_1(P_n))\le \widetilde{C}M_{n}^{(m+1)}\epsilon ^{-(m+p+1)}, \end{aligned}$$

where \(\widetilde{C}\) is a constant, \(m=o(n^{\nu })\) with \(0<\nu <1\) and the size of the sieve space \(\varvec{\Theta }_{n}\) is controlled by \(M_{n}=O(n^{c})\) with a constant \(c \in (0,\infty )\).

Proof of Lemma 1

For any \(\varvec{\theta }^{1}=(\varvec{\beta }^{1},F^{1}), \varvec{\theta }^{2}=(\varvec{\beta }^{2},F^{2})\in \varvec{\Theta }_{n}\), under conditions (C1)–(C3), there exists a large enough constant \(\widetilde{C}\) such that

$$\begin{aligned} \mid l_{K}^{w}(\varvec{\theta }^{1};\varvec{W}^{\xi }) - l_{K}^{w}(\varvec{\theta }^{2};\varvec{W}^{\xi })\mid \le \widetilde{C} (\Vert \varvec{\beta }^{1}-\varvec{\beta }^{2}\Vert +\Vert F^{1}-F^{2}\Vert _{\infty }), \end{aligned}$$
(7)

where \(\Vert g\Vert _{\infty }=\sup _t|g(t)|\) for a function g. Denote \(\varvec{\gamma }^{j}=(\gamma _{0,j},\ldots ,\gamma _{m,j})^{T}\) as the Bernstein coefficients vector corresponding to \(F^{j},j=1,2\). Then, we obtain that

$$\begin{aligned} \Vert F^{1}-F^{2}\Vert _{\infty } ={}&\sup _{t} \Bigg | \sum _{k=0}^{m}\gamma _{k,1}B_{k}(t,m,\tau )- \sum _{k=0}^{m}\gamma _{k,2}B_{k}(t,m,\tau ) \Bigg | \nonumber \\ \le {}&\max _{0\le k\le m}\mid \gamma _{k,1}- \gamma _{k,2}\mid \nonumber \\ := {}&\Vert \varvec{\gamma }^{1} -\varvec{\gamma }^{2}\Vert _{\infty }. \end{aligned}$$
(8)
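
Inequality (8) rests on the Bernstein basis functions being nonnegative and summing to one. The sketch below checks this numerically; it assumes the standard form \(B_{k}(t,m,\tau )=\binom{m}{k}(t/\tau )^{k}(1-t/\tau )^{m-k}\) on \([0,\tau ]\), which is not restated in this Appendix, and uses arbitrary simulated coefficients.

```python
from math import comb
import random

def bernstein_basis(k, m, t, tau):
    """Assumed basis: C(m, k) (t/tau)^k (1 - t/tau)^(m - k) on [0, tau]."""
    u = t / tau
    return comb(m, k) * u**k * (1.0 - u) ** (m - k)

def bernstein_poly(gamma, t, tau):
    """Evaluate sum_k gamma_k B_k(t, m, tau) with m = len(gamma) - 1."""
    m = len(gamma) - 1
    return sum(g * bernstein_basis(k, m, t, tau) for k, g in enumerate(gamma))

def sup_diff_on_grid(gamma1, gamma2, tau, n_grid=2001):
    """Grid approximation of the sup-norm distance between two Bernstein polynomials."""
    grid = [tau * i / (n_grid - 1) for i in range(n_grid)]
    return max(
        abs(bernstein_poly(gamma1, t, tau) - bernstein_poly(gamma2, t, tau))
        for t in grid
    )

random.seed(0)
m, tau = 8, 5.0  # hypothetical degree and end point
gamma1 = [random.uniform(0, 1) for _ in range(m + 1)]
gamma2 = [random.uniform(0, 1) for _ in range(m + 1)]

lhs = sup_diff_on_grid(gamma1, gamma2, tau)
rhs = max(abs(a - b) for a, b in zip(gamma1, gamma2))
print(lhs, "<=", rhs)
```

The grid maximum can only underestimate the true supremum, so this is a sanity check of (8) rather than a proof.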

Substituting (8) into (7), we obtain

$$\begin{aligned} \mid l_{K}^{w}(\varvec{\theta }^{1};\varvec{W}^{\xi }) - l_{K}^{w}(\varvec{\theta }^{2};\varvec{W}^{\xi })\mid \le \widetilde{C} (\Vert \varvec{\beta }^{1}-\varvec{\beta }^{2}\Vert +\Vert \varvec{\gamma }^{1} -\varvec{\gamma }^{2}\Vert _{\infty }). \end{aligned}$$

Thus, for any \(\varvec{\theta }\in \varvec{\Theta }_{n}\),

$$\begin{aligned} \frac{1}{n}\sum ^{n}_{i=1}\mid l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }_i) - l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi }_i)\mid \le \widetilde{C}(\Vert \varvec{\beta }-\varvec{\beta }^{(k)}\Vert +\Vert \varvec{\gamma }-\varvec{\gamma }^{(k)}\Vert _{\infty }), \end{aligned}$$

where \(k=1,\ldots ,\kappa \). According to Lemma 2.5 of van de Geer (2000), \(\{\varvec{\beta }\in \mathbb {R}^{p}: \Vert \varvec{\beta }\Vert \le M\}\) is covered by \((10\widetilde{C}M/\epsilon )^{p}\) balls of radius \(\epsilon /(2\widetilde{C})\), and \(\{\varvec{\gamma }\in \mathbb {R}^{m+1}: \sum _{k=0}^{m}| \gamma _{k}|\le M_{n}\}\) is covered by \((10\widetilde{C}M_{n}/\epsilon )^{m+1}\) balls of radius \(\epsilon /(2\widetilde{C})\). As a consequence, the covering number of the function class \(\mathcal {L}_{n}\) satisfies

$$\begin{aligned} N(\epsilon ,\mathcal {L}_{n},L_1(P_n))\le \left( \frac{10\widetilde{C}M}{\epsilon } \right) ^{p}\cdot \left( \frac{10\widetilde{C}M_{n}}{\epsilon } \right) ^{m+1} \le \widetilde{C}M_{n}^{(m+1)}\epsilon ^{-(m+p+1)}. \end{aligned}$$

This completes the proof of Lemma 1. \(\square \)

Lemma 2

Under conditions (C1)–(C3), we have

$$\begin{aligned} \sup _{\varvec{\theta }\in \varvec{\Theta }_{n}}|P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-Pl_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|\rightarrow 0, \end{aligned}$$

almost surely.

Proof of Lemma 2

Under conditions (C1)–(C3), \(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })\) is uniformly bounded. Without loss of generality, we assume \(\sup _{\varvec{\theta }\in \varvec{\Theta }}|l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|\le 1\). Then we have \(P(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }))^{2}\le P(\sup _{\varvec{\theta }\in \varvec{\Theta }}|l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|)^{2}\le 1\). Let \(\alpha _{n}= n^{-1/2+\iota }(\log n)^{1/2}\) with \(\nu /2<\iota <1/2\). It is easy to see that, for sufficiently large n, \(\{\alpha _{n}\}\) is a nonincreasing sequence of positive numbers. Let \(\epsilon _{n}=\epsilon \alpha _{n}\) with \(\epsilon >0\). For any \(\varvec{\theta }\in \varvec{\Theta }_{n}\) and sufficiently large n, we then have

$$\begin{aligned} \textrm{var} (P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }) )/(4\epsilon _{n})^{2}\le \frac{(1/n)P(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }))^{2}}{16\epsilon ^{2}\alpha _{n}^{2}} \le \frac{1}{16\epsilon ^{2}n^{2\iota }\log n}\le \frac{1}{2}. \end{aligned}$$
(9)

Denote the observations by \(\mathcal {W}^{\xi }=\{\varvec{W}^{\xi }_1, \ldots , \varvec{W}^{\xi }_n\}\). Following Pollard (1984), we denote by \(P^{o}_{n}\) the signed measure that places mass \(\pm n^{-1}\) at each of the observations in \(\mathcal {W}^{\xi }\). According to Pollard (1984, p. 31) and (9), we have the following inequality

$$\begin{aligned} P(\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}}|P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-Pl_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|>8\epsilon _{n}) \le 4P (\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}} |P^{o}_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }) |>2\epsilon _{n}). \end{aligned}$$

Given \(\mathcal {W}^{\xi }\), by the definition of the covering number, we can choose \(\varvec{\theta }^{(1)},\ldots ,\varvec{\theta }^{(\kappa )}\), where \(\kappa =N(\epsilon _{n}/2,\mathcal {L}_n,L_{1}(P_{n}))\), such that

$$\begin{aligned} \min _{k \in \{1,\ldots ,\kappa \}}P_{n}|l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi })|< \epsilon _{n}/2, \end{aligned}$$

for all \(\varvec{\theta }\in \varvec{\Theta }_{n}\). For each \(\varvec{\theta }\in \varvec{\Theta }_{n}\), write \(\varvec{\theta }^{*}\) for the \(\varvec{\theta }^{(k)}\) at which the minimum is achieved. By some simple calculations, we have the following inequality

$$\begin{aligned} |P_{n}^{o}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-P_{n}^{o}l_{K}^{w}(\varvec{\theta }^{*};\varvec{W}^{\xi }) | \le P_{n}|l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }^{*};\varvec{W}^{\xi })|. \end{aligned}$$
(10)

According to the formula (10), we obtain

$$\begin{aligned} {}&P (\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}} |P^{o}_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }) |>2\epsilon _{n}|\mathcal {W}^{\xi } )\nonumber \\ {}&\quad \le P (\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}} \{ |P^{o}_{n}l_{K}^{w}(\varvec{\theta }^{*};\varvec{W}^{\xi }) |+ P_{n}|l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }^{*};\varvec{W}^{\xi })| \}>2\epsilon _{n}|\mathcal {W}^{\xi } ) \nonumber \\ {}&\quad \le P (\max _{k\in \{1,\ldots ,\kappa \}} |P^{o}_{n}l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi }) |>3\epsilon _{n}/2|\mathcal {W}^{\xi } )\nonumber \\ {}&\quad \le N(\epsilon _{n}/2,\mathcal {L}_{n},L_{1}(P_n)) \max _{k\in \{1,\ldots ,\kappa \}}P ( |P^{o}_{n}l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi }) |>3\epsilon _{n}/2|\mathcal {W}^{\xi }). \end{aligned}$$
(11)

By the definition of the covering number \(N(\epsilon _{n}/2,\mathcal {L}_{n},L_{1}(P_n))\), for each \(\varvec{\theta }^{(k)}\), there exists \(\tilde{\varvec{\theta }}^{(k)} \in \varvec{\Theta }_{n}\) such that \(P_{n}|l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi })-l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi })|<\epsilon _{n}/2\). Then we have

$$\begin{aligned} P ( |P^{o}_{n}l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi }) |>3\epsilon _{n}/2|\mathcal {W}^{\xi } ) \le {}&P ( (P_{n} |l_{K}^{w}(\varvec{\theta }^{(k)};\varvec{W}^{\xi })-l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi })| \\ {}&+|P^{o}_{n}l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi }) | )>3\epsilon _{n}/2|\mathcal {W}^{\xi } ) \\ \le {}&P ( |P^{o}_{n}l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi }) | >\epsilon _{n}|\mathcal {W}^{\xi }). \end{aligned}$$

Combining Lemma 2.2.7 of van der Vaart and Wellner (1996) with \(|l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi })|\le 1\), we then have

$$\begin{aligned} P ( |P^{o}_{n}l_{K}^{w}(\tilde{\varvec{\theta }}^{(k)};\varvec{W}^{\xi }) | >\epsilon _{n}|\mathcal {W}^{\xi } ) \le 2\exp (-n\epsilon _{n}^2/2). \end{aligned}$$
(12)
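
The tail bound (12) can be illustrated by simulation. The sketch below fixes a hypothetical set of values bounded by 1 in absolute value, attaches independent random signs (our reading of the \(\pm n^{-1}\) masses of \(P^{o}_{n}\)), and compares the observed frequency of a large signed mean with \(2\exp (-n\epsilon ^{2}/2)\). The sample size, threshold, and uniform values are arbitrary choices made here.

```python
import math
import random

def signed_mean(values, rng):
    """n^{-1} * sum_i sigma_i * v_i with independent random signs sigma_i = +/-1."""
    return sum(v if rng.random() < 0.5 else -v for v in values) / len(values)

def tail_frequency(values, eps, n_rep, rng):
    """Monte Carlo frequency of |signed mean| > eps, conditional on the values."""
    hits = sum(abs(signed_mean(values, rng)) > eps for _ in range(n_rep))
    return hits / n_rep

rng = random.Random(1)
n, eps = 200, 0.15                                # hypothetical settings
values = [rng.uniform(-1, 1) for _ in range(n)]   # plays the role of l_K^w, bounded by 1

freq = tail_frequency(values, eps, n_rep=2000, rng=rng)
bound = 2 * math.exp(-n * eps**2 / 2)
print("simulated tail frequency:", freq, "bound:", bound)
```

In runs of this kind the simulated frequency sits well below the bound, which is expected because the exponential inequality is a worst-case statement over all values bounded by 1.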

By Lemma 1 together with (11) and (12), we have

$$\begin{aligned} P (\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}} |P^{o}_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }) |>2\epsilon _{n}|\mathcal {W}^{\xi } ) {}&\quad \le 2N(\epsilon _{n}/2,\mathcal {L}_{n},L_{1}(P_n))\exp (-n\epsilon _{n}^2/2)\\ {}&\quad \le 2\widetilde{C}M_n^{(m+1)}(\epsilon _{n}/2)^{-(p+m+1)}\exp (-n\epsilon _{n}^2/2). \end{aligned}$$

By taking expectations over \(\mathcal {W}^{\xi }\), we obtain

$$\begin{aligned} P (\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}} |P^{o}_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }) |>2\epsilon _{n} ) {}&\le 2\widetilde{C}M_n^{(m+1)}(\epsilon _{n}/2)^{-(p+m+1)}\exp (-n\epsilon _{n}^2/2). \end{aligned}$$

According to \(M_n=O(n^c)\), \(m=o(n^\nu )\) and \(\iota > \nu /2\), we can show that

$$\begin{aligned} {}&P (\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}} |P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-Pl_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }) |>8\epsilon _{n} ) \le 4P (\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}} |P^{o}_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }) |>2\epsilon _{n} )\\ {}&\quad \le 8\widetilde{C}M_n^{(m+1)}(\epsilon _{n}/2)^{-(p+m+1)}\exp (-n\epsilon _{n}^2/2)\\ {}&\quad \le 8\widetilde{C}\exp \{(m+1)c\log n-(p+m+1)[\log (\epsilon n^{-1/2+\iota }(\log n)^{1/2})-\log 2]\\ {}&\qquad -n\epsilon ^2n^{-1+2\iota }\log n/2\}\\ {}&\quad \le 8\widetilde{C}\exp \{(p+m+1)[(c+1/2-\iota )\log n-\log \log n/2 -\log \epsilon +\log 2 ]\\ {}&\qquad -\epsilon ^2n^{2\iota }\log n/2\}\\ {}&\quad \le 8\widetilde{C}\exp (-\widetilde{C}n^{2\iota } \log n). \end{aligned}$$

Hence, we obtain that

$$\begin{aligned} \sum _{n=1}^{\infty } P(\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}}|P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-Pl_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|>8\epsilon _{n}) <\infty . \end{aligned}$$

By the Borel–Cantelli lemma, we have \(\sup _{\varvec{\theta }\in \varvec{\Theta }_{n}}|P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-Pl_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|\rightarrow 0\) almost surely. \(\square \)
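
The summability used for the Borel–Cantelli step comes from the term \(\epsilon ^{2}n^{2\iota }\log n/2\) eventually dominating the polynomial-in-\(\log n\) growth term, because \(m=o(n^{\nu })\) and \(\nu <2\iota \). The following sketch evaluates the exponent in the second-to-last line of the display above for illustrative values; the choices of \(p\), \(c\), \(\nu \), \(\iota \), \(\epsilon \) and the sequence \(m=\lfloor n^{\nu }/\log n\rfloor \) are assumptions made here, and the constant factor \(8\widetilde{C}\) is omitted.

```python
import math

def log_tail_bound(n, p=2, c=1.0, nu=0.3, iota=0.4, eps=1.0):
    """Exponent (p+m+1)[(c+1/2-iota) log n - (log log n)/2 - log eps + log 2]
    minus eps^2 n^{2 iota} log n / 2, for a hypothetical m = o(n^nu)."""
    m = int(n**nu / math.log(n))  # assumed sieve size, illustrative only
    growth = (p + m + 1) * (
        (c + 0.5 - iota) * math.log(n)
        - 0.5 * math.log(math.log(n))
        - math.log(eps)
        + math.log(2)
    )
    decay = eps**2 * n ** (2 * iota) * math.log(n) / 2
    return growth - decay

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, round(log_tail_bound(n), 1))
```

With these values the exponent is negative and decreases rapidly in \(n\), so the corresponding probabilities are summable.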

Proof of Theorem 1

According to Lemmas 1 and 2, we can show that

$$\begin{aligned} N(\epsilon ,\mathcal {L}_{n},L_1(P_n))\le \widetilde{C}M_{n}^{(m+1)}\epsilon ^{-(m+p+1)}, \end{aligned}$$

and

$$\begin{aligned} \sup _{\varvec{\theta }\in \varvec{\Theta }_{n}}|P_{n}l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-Pl_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })|\rightarrow 0, \end{aligned}$$
(13)

almost surely as \(n\rightarrow \infty \). Define \(\varvec{\Theta }_{\epsilon }=\{\varvec{\theta }:\textrm{d}(\varvec{\theta },\varvec{\theta }_0)\ge \epsilon ,\varvec{\theta }\in \varvec{\Theta }_n \}\) for \(\epsilon > 0\),

$$\begin{aligned} {}&\zeta _{1n}=\sup _{\varvec{\theta }\in \varvec{\Theta }_n}|P_nM_{K}(\varvec{\theta };\varvec{W}^{\xi })-PM_{K}(\varvec{\theta };\varvec{W}^{\xi })|, \end{aligned}$$

and

$$\begin{aligned} {}&\zeta _{2n}=P_nM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi })-PM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi }), \end{aligned}$$

where \(M_{K}(\varvec{\theta };\varvec{W}^{\xi })=-l_{K}^w(\varvec{\theta };\varvec{W}^{\xi })\). We have

$$\begin{aligned} \inf _{\varvec{\Theta }_{\epsilon }}PM_{K}(\varvec{\theta };\varvec{W}^{\xi })={}&\inf _{\varvec{\Theta }_{\epsilon }}\{PM_{K}(\varvec{\theta };\varvec{W}^{\xi })-P_nM_{K}(\varvec{\theta };\varvec{W}^{\xi }) +P_nM_{K}(\varvec{\theta };\varvec{W}^{\xi })\}\\ \le {}&\zeta _{1n}+ \inf _{\varvec{\Theta }_{\epsilon }}P_nM_{K}(\varvec{\theta };\varvec{W}^{\xi }). \end{aligned}$$

If \(\hat{\varvec{\theta }}_n\in \varvec{\Theta }_{\epsilon }\), we then have

$$\begin{aligned} \inf _{\varvec{\Theta }_{\epsilon }}P_nM_{K}(\varvec{\theta };\varvec{W}^{\xi })=P_nM_{K}(\hat{\varvec{\theta }}_n;\varvec{W}^{\xi }) {}&\le P_nM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi })=\zeta _{2n}+PM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi }). \end{aligned}$$

By the identifiability of model (1), we obtain that \(\delta _\epsilon =\inf _{\varvec{\Theta }_{\epsilon }}PM_{K}(\varvec{\theta };\varvec{W}^{\xi })-PM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi })>0\). By conditions (C1)–(C3), it is easy to show

$$\begin{aligned} \inf _{\varvec{\Theta }_{\epsilon }}PM_{K}(\varvec{\theta };\varvec{W}^{\xi })\le \zeta _{1n}+\zeta _{2n}+PM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi })=\zeta _{n}+PM_{K}(\varvec{\theta }_0;\varvec{W}^{\xi }), \end{aligned}$$

with \(\zeta _{n}=\zeta _{1n}+\zeta _{2n}\). Hence, we have \(\zeta _{n}\ge \delta _\epsilon \). This shows that \(\{\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\ge \epsilon \}\subseteq \{\zeta _n\ge \delta _\epsilon \}\). According to (13) and the strong law of large numbers, both \(\zeta _{1n}\rightarrow 0\) and \(\zeta _{2n}\rightarrow 0\) almost surely as \(n \rightarrow \infty \). Thus, \(\bigcap ^{\infty }_{k=1}\bigcup ^{\infty }_{n=k} \{\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\ge \epsilon \} \subseteq \bigcap ^{\infty }_{k=1}\bigcup ^{\infty }_{n=k} \{\zeta _n\ge \delta _\epsilon \}\), and the event on the right-hand side has probability zero, which proves that \(\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\rightarrow 0\) almost surely. It is easy to see that \(\Vert \hat{\varvec{\beta }}_n- \varvec{\beta }_0\Vert \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\) and \(\Vert \hat{F}_n- F_0\Vert _2 \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\). Therefore, we have \(\Vert \hat{\varvec{\beta }}_n- \varvec{\beta }_0\Vert \rightarrow 0\) and \(\Vert \hat{F}_n- F_0\Vert _2 \rightarrow 0\) almost surely. This completes the proof of Theorem 1. \(\square \)

Proof of Theorem 2

Now we will obtain the convergence rate of \(\hat{\varvec{\theta }}_n\). By Theorem 1.6.2 of Lorentz (1986), there exists a Bernstein polynomial \(F_{n0}\) which satisfies \(\Vert F_{n0}-F_0\Vert _\infty =O(m^{-r/2})\). For convenience, we define \(\varvec{\theta }_{n0}=(\varvec{\beta }_0,F_{n0})\). It is easy to see that \(\textrm{d}(\varvec{\theta }_{n0},\varvec{\theta }_0)=O(n^{-r\nu /2})\). For any \(\eta >0\), we define the class of functions

$$\begin{aligned} \mathcal {F}_{\eta }=\{l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }):\varvec{\theta }\in \varvec{\Theta }_{n}, \eta /2<\textrm{d}(\varvec{\theta },\varvec{\theta }_{n0})\le \eta \}, \end{aligned}$$

for a given observation \(\varvec{W}^{\xi }\). One can easily obtain that \(P(l_{K}^{w}(\varvec{\theta }_{0};\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }))\le \widetilde{C}\textrm{d}(\varvec{\theta }_{0},\varvec{\theta }_{n0})\le \widetilde{C}n^{-r\nu /2}\). Using the relationship between the Hellinger distance and the Kullback–Leibler information, we then have

$$\begin{aligned} P(l_{K}^{w}(\varvec{\theta }_{0};\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi }))\ge \widetilde{C}\textrm{d}^2(\varvec{\theta }_0,\varvec{\theta }). \end{aligned}$$

Therefore, for sufficiently large n, it yields that

$$\begin{aligned} P(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi })) ={}&P(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })- l_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi }))+P(l_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi }) \\ {}&-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi })) \\ \le {}&-\widetilde{C}\eta ^2+\widetilde{C}n^{-r\nu /2}\\ ={}&-\widetilde{C}\eta ^2, \end{aligned}$$

for any \(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }) \in \mathcal {F}_{\eta }\).
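
The approximation step at the start of this proof relies on Bernstein polynomials approximating a smooth function increasingly well as the degree \(m\) grows. The sketch below illustrates this with the classical Bernstein operator applied to a hypothetical smooth target \(F(t)=1-e^{-t}\) on \([0,\tau ]\). It does not reproduce the specific polynomial \(F_{n0}\) from Theorem 1.6.2 of Lorentz (1986); the target, the interval, and the degrees are assumptions chosen for illustration.

```python
from math import comb, exp

def bernstein_operator(f, m, t, tau):
    """Classical Bernstein approximation: sum_k f(tau*k/m) C(m,k) u^k (1-u)^(m-k), u = t/tau."""
    u = t / tau
    return sum(
        f(tau * k / m) * comb(m, k) * u**k * (1.0 - u) ** (m - k)
        for k in range(m + 1)
    )

def sup_error(f, m, tau, n_grid=501):
    """Grid approximation of the sup-norm error of the degree-m Bernstein approximation."""
    grid = [tau * i / (n_grid - 1) for i in range(n_grid)]
    return max(abs(bernstein_operator(f, m, t, tau) - f(t)) for t in grid)

def target(t):
    """Hypothetical smooth target function (not taken from the paper)."""
    return 1.0 - exp(-t)

tau = 3.0
errors = {m: sup_error(target, m, tau) for m in (10, 20, 40)}
for m, err in errors.items():
    print(m, err)
```

For this twice-differentiable target, doubling \(m\) roughly halves the error, which is in line with the rate \(O(m^{-r/2})\) when \(r=2\).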

Note that \(\mathcal {F}_{\eta }\) is uniformly bounded under conditions (C1)–(C3). Furthermore, by some algebraic manipulations, we obtain \(P(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }))^2\le \widetilde{C}\eta ^2\) for any \(l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }) \in \mathcal {F}_{\eta }\). By Lemma 3.4.2 of van der Vaart and Wellner (1996), we can prove that

$$\begin{aligned} E_P \Vert n^{1/2}(P_n-P) \Vert _{\mathcal {F}_{\eta }} \le \widetilde{C}J_{[]}\{\eta ,\mathcal {F}_{\eta },L_2(P)\} \left[ 1+\frac{J_{[]}\{\eta ,\mathcal {F}_{\eta },L_2(P)\}}{\eta ^2n^{1/2}} \right] , \end{aligned}$$

where \(J_{[]}\{\eta ,\mathcal {F}_{\eta },L_2(P)\}=\int ^{\eta }_0 [1+\log N_{[]}\{\epsilon ,\mathcal {F}_{\eta },L_2(P)\}]^{1/2}\textrm{d}\epsilon \). For \(0<\epsilon <\eta \), by Shen and Wong (1994, p. 597), we have \(\log N_{[]}\{\epsilon ,\mathcal {F}_{\eta },L_2(P)\}\le \widetilde{C}N\log (\eta /\epsilon )\) with \(N=m+1\). Then we can show that \(J_{[]}\{\eta ,\mathcal {F}_{\eta },L_2(P)\} \le \widetilde{C}N^{1/2}\eta \). This yields \(\varphi _n(\eta )=N^{1/2}\eta +N/n^{1/2}\). One can easily show that \(\varphi _n(\eta )/\eta \) is monotonically decreasing with respect to \(\eta \) and \(r_n^2\varphi _n(1/r_n)=r_nN^{1/2}+r_n^2N/n^{1/2}\le \widetilde{C}n^{1/2}\), where \(r_n=N^{-1/2}n^{1/2}=n^{(1-\nu )/2}\).
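
As a worked check of the last bound, substituting \(\eta =1/r_n\) with \(r_n=N^{-1/2}n^{1/2}\) into \(\varphi _n(\eta )=N^{1/2}\eta +N/n^{1/2}\) gives

$$\begin{aligned} r_n^2\varphi _n(1/r_n) = r_n^2\left( \frac{N^{1/2}}{r_n}+\frac{N}{n^{1/2}}\right) = r_nN^{1/2}+\frac{r_n^2N}{n^{1/2}} = n^{1/2}+\frac{n}{N}\cdot \frac{N}{n^{1/2}} = 2n^{1/2}, \end{aligned}$$

so the inequality holds with the constant 2 for any \(N\). Treating \(N\) as being of exact order \(n^{\nu }\) (an assumption made here for illustration) is only needed for the identification \(r_n=n^{(1-\nu )/2}\).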

Note that \(P_n(l_{K}^{w}(\hat{\varvec{\theta }}_n;\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_{n0};\varvec{W}^{\xi }))\ge 0\) and \(\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{n0})\le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{0})+\textrm{d}(\varvec{\theta }_0,\varvec{\theta }_{n0})\rightarrow 0\) in probability. Therefore, by Theorem 3.4.1 of van der Vaart and Wellner (1996), we have \(n^{(1-\nu )/2}\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{n0})=O_p(1)\). Together with \(\textrm{d}(\varvec{\theta }_{n0},\varvec{\theta }_{0})=O(n^{-r\nu /2})\), it is easy to show that \(\textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_{0})=O_p(n^{-(1-\nu )/2}+n^{-r\nu /2})\). According to \(\Vert \hat{\varvec{\beta }}_n- \varvec{\beta }_0\Vert \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\) and \(\Vert \hat{F}_n- F_0\Vert _2 \le \textrm{d}(\hat{\varvec{\theta }}_n,\varvec{\theta }_0)\), we have \(\Vert \hat{\varvec{\beta }}_n-\varvec{\beta }_0\Vert = O_p(n^{-\min (r\nu /2,(1-\nu )/2)})\) and \(\Vert \hat{F}_n-F_0\Vert _2 = O_p(n^{-\min (r\nu /2,(1-\nu )/2)})\). The proof of Theorem 2 is completed. \(\square \)
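
The rate exponent \(\min (r\nu /2,(1-\nu )/2)\) trades approximation error against estimation error. The sketch below evaluates it and locates the value of \(\nu \) at which the two terms balance, \(\nu =1/(1+r)\), which gives the exponent \(r/\{2(1+r)\}\). The function names are hypothetical, \(r=2\) is an arbitrary example, and the admissible range of \(\nu \) is set by the conditions of the paper, which are not restated in this Appendix.

```python
def rate_exponent(r, nu):
    """Exponent a in the rate n^{-a}, with a = min(r*nu/2, (1 - nu)/2)."""
    return min(r * nu / 2.0, (1.0 - nu) / 2.0)

def balancing_nu(r):
    """Value of nu at which r*nu/2 equals (1 - nu)/2."""
    return 1.0 / (1.0 + r)

r = 2.0                      # illustrative smoothness level
nu_star = balancing_nu(r)
best_on_grid = max(rate_exponent(r, j / 1000) for j in range(1, 1000))

print("balancing nu:", nu_star)
print("exponent at balancing nu:", rate_exponent(r, nu_star))
print("best exponent on a grid of nu values:", best_on_grid)
```

Larger \(r\) pushes the balanced exponent toward \(1/2\), the parametric rate, while small \(r\) slows the rate accordingly.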

Proof of Theorem 3

We first introduce some notation. The linear span of \(\varvec{\Theta }-\varvec{\theta }_0\) is denoted by \(\mathcal {V}\), where \(\varvec{\Theta }\) is the parameter space and \(\varvec{\theta }_0\) is the true value of \(\varvec{\theta }\). For convenience, write \(\varrho _n=n^{-\min \{r\nu /2,(1-\nu )/2\}}\). Following the arguments of van der Vaart (1998, p. 296), for any \(\varvec{\theta }\in \{\varvec{\theta }\in \varvec{\Theta }:\Vert \varvec{\theta }-\varvec{\theta }_0\Vert =O(\varrho _n)\}\), the first order directional derivative of \(l_{K}^w(\varvec{\theta };\varvec{W}^\xi )\) in the direction \(\psi \in \mathcal {V}\) is defined as

$$\begin{aligned} \dot{l}_{K}^w(\varvec{\theta };\varvec{W}^\xi )[\psi ]=\frac{\textrm{d}l_{K}^w(\varvec{\theta }+t\psi ;\varvec{W}^\xi )}{\textrm{d}t}\biggl |_{t=0}. \end{aligned}$$

The second order directional derivative of \(l_{K}^w(\varvec{\theta };\varvec{W}^\xi )\) is given by

$$\begin{aligned} \ddot{l}_{K}^w(\varvec{\theta };\varvec{W}^\xi )[\psi ,\tilde{\psi }]= \frac{\textrm{d}^2l_{K}^w(\varvec{\theta }+t\psi +\tilde{t}\tilde{\psi };\varvec{W}^\xi )}{\textrm{d}\tilde{t}\textrm{d}t}\biggl |_{t=0,\tilde{t}=0} =\frac{\textrm{d}\dot{l}_{K}^w(\varvec{\theta }+\tilde{t}\tilde{\psi };\varvec{W}^\xi )}{\textrm{d}\tilde{t}}\biggl |_{\tilde{t}=0}. \end{aligned}$$

The Fisher inner product on the space \(\mathcal {V}\) is defined by \(\langle \psi ,\tilde{\psi } \rangle =P(\dot{l}_{K}^w(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ]\dot{l}_{K}^w(\varvec{\theta }_0;\varvec{W}^\xi )[\tilde{\psi }])\). Consider the following smooth functional of \(\varvec{\theta }\):

$$\begin{aligned} \Lambda (\varvec{\theta }) = \varvec{\mu }_1^T\varvec{\beta }+ \int _0^{\tau }\mu _2(t)F(t)dt, \end{aligned}$$
(14)

where \(\Vert \varvec{\mu }_1\Vert \le 1\) and \(\mu _2(t) \in \mathcal {F}\). The Fisher norm for \(\psi \in \mathcal {V}\) is defined by \(\Vert \psi \Vert ^{2}=\langle \psi ,\psi \rangle \). The closed linear span of \(\mathcal {V}\) under the Fisher norm is denoted by \(\overline{\mathcal {V}}\). It is easy to see that \((\overline{\mathcal {V}},\Vert \cdot \Vert )\) is a Hilbert space. Similar to \(\dot{l}_{K}^w(\varvec{\theta };\varvec{W}^\xi )[\psi ]\), for any \(\psi \in \mathcal {V}\), the first order directional derivative of \(\Lambda (\varvec{\theta })\) at \(\varvec{\theta }_0\) is given by

$$\begin{aligned} \dot{\Lambda }(\varvec{\theta }_0)[\psi ]=\frac{\textrm{d}\Lambda (\varvec{\theta }_0+t\psi )}{\textrm{d}t}\biggl |_{t=0}. \end{aligned}$$

Similar to Shen (1997), \(\dot{\Lambda }(\varvec{\theta }_0)[\psi ]\) is linear in \(\psi \) and

$$\begin{aligned} \Vert \dot{\Lambda }(\varvec{\theta }_0)\Vert =\sup _{\psi \in \overline{\mathcal {V}}:\Vert \psi \Vert >0} \frac{|\dot{\Lambda }(\varvec{\theta }_0)[\psi ]|}{\Vert \psi \Vert }<\infty . \end{aligned}$$
(15)
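Because \(\Lambda \) in (14) is linear in \(\varvec{\theta }\), its directional derivative can be written down explicitly. In the sketch below, the decomposition \(\psi =(\psi _{\varvec{\beta }},\psi _F)\) of a direction into a Euclidean component and a functional component is notation introduced here for illustration only; it does not appear in the original text.

$$\begin{aligned} \Lambda (\varvec{\theta }_0+t\psi ) ={}&\varvec{\mu }_1^T(\varvec{\beta }_0+t\psi _{\varvec{\beta }})+\int _0^{\tau }\mu _2(s)\{F_0(s)+t\psi _F(s)\}\textrm{d}s =\Lambda (\varvec{\theta }_0)+t\Bigl \{\varvec{\mu }_1^T\psi _{\varvec{\beta }}+\int _0^{\tau }\mu _2(s)\psi _F(s)\textrm{d}s\Bigr \},\\ \dot{\Lambda }(\varvec{\theta }_0)[\psi ] ={}&\varvec{\mu }_1^T\psi _{\varvec{\beta }}+\int _0^{\tau }\mu _2(s)\psi _F(s)\textrm{d}s. \end{aligned}$$

In particular, taking \(\psi =\varvec{\theta }-\varvec{\theta }_0\) gives \(\Lambda (\varvec{\theta })-\Lambda (\varvec{\theta }_0)=\dot{\Lambda }(\varvec{\theta }_0)[\varvec{\theta }-\varvec{\theta }_0]\) with no remainder term.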

For any \(\psi ^*\in \varvec{\Theta }\), there exists \(\psi _n^*\in \varvec{\Theta }_n\) such that \(\Vert \psi _n^*-\psi ^*\Vert =O(n^{-r\nu /2})\). Then \(\varrho _n\Vert \psi _n^*-\psi ^*\Vert =o(n^{-1/2})\) holds when \(r>1\) and \(r\nu >1/2\). For convenience, we define \(\rho [\varvec{\theta }-\varvec{\theta }_0;\varvec{W}^{\xi }]=l_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })- \dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\varvec{\theta }-\varvec{\theta }_0]\) and let \(\epsilon _n\) be any positive sequence satisfying \(\epsilon _n=o(n^{-1/2})\). Some algebra yields that

$$\begin{aligned} 0 \le {}&P_n[l_{K}^{w}(\hat{\varvec{\theta }}_n;\varvec{W}^{\xi })-l_{K}^{w}(\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*;\varvec{W}^{\xi })]\\ ={}&P_n\{[l_{K}^{w}(\hat{\varvec{\theta }}_n;\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })] - [l_{K}^{w}(\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*;\varvec{W}^{\xi })-l_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })]\}\\ ={}&\mp \epsilon _n P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*]+P_n\{\rho [\hat{\varvec{\theta }}_n-\varvec{\theta }_0;\varvec{W}^{\xi }]-\rho [\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*-\varvec{\theta }_0;\varvec{W}^{\xi }]\}\\ ={}&\mp \epsilon _n P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi ^*]\mp \epsilon _n P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*] +(P_n-P)\\ {}&\cdot \{\rho [\hat{\varvec{\theta }}_n-\varvec{\theta }_0;\varvec{W}^{\xi }]-\rho [\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*-\varvec{\theta }_0;\varvec{W}^{\xi }]\}+P\{\rho [\hat{\varvec{\theta }}_n-\varvec{\theta }_0;\varvec{W}^{\xi }]\\ {}&-\rho [\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*-\varvec{\theta }_0;\varvec{W}^{\xi }]\}\\ :={}&\mp \epsilon _n P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi ^*]+ I_1 + I_2 + I_3. \end{aligned}$$

Next, we bound \(I_1, I_2\) and \(I_3\) in turn. For \(I_1\), by Chebyshev's inequality, \(\Vert \psi _n^*-\psi ^*\Vert =O(n^{-r\nu /2})\) and \(P\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*]=0\), it is easy to show that, for any \(\epsilon >0\),

$$\begin{aligned} {}&P\left( \frac{\mid P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*] \mid }{n^{-1/2}} \ge \epsilon \right) \\ {}&\quad \le \frac{\textrm{Var}(P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*])}{n^{-1}\epsilon ^2} \\ {}&\quad = \frac{\textrm{Var}(\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*])}{\epsilon ^{2}} \\ {}&\quad = \frac{P(\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*]\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*])}{\epsilon ^2} \\ {}&\quad = \frac{\Vert \psi _n^*-\psi ^* \Vert ^2}{\epsilon ^2} \\ {}&\quad \rightarrow 0, \end{aligned}$$

which implies that \(P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*-\psi ^*]=o_p(n^{-1/2})\) and \(I_1=\epsilon _n\times o_p(n^{-1/2})\).

For \(I_2\), by some algebraic calculations, we have

$$\begin{aligned} I_2 ={}&(P_n-P)(\rho [\hat{\varvec{\theta }}_n-\varvec{\theta }_0;\varvec{W}^{\xi }]-\rho [\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*-\varvec{\theta }_0;\varvec{W}^{\xi }])\\ ={}&(P_n-P)(\dot{l}_{K}^{w}(\tilde{\varvec{\theta }};\varvec{W}^{\xi })[\mp \epsilon _n\psi _n^*] \pm \epsilon _n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*])\\ ={}&\mp \epsilon _n (P_n-P)(\dot{l}_{K}^{w}(\tilde{\varvec{\theta }};\varvec{W}^{\xi })[\psi _n^*] -\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*]), \end{aligned}$$

where \(\tilde{\varvec{\theta }}\) lies between \(\hat{\varvec{\theta }}_n\) and \(\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*\). Denote the function class \(\mathcal {L}_3=\{\dot{l}_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })[\psi _n^*]-\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*]:\varvec{\theta }\in \varvec{\Theta }_n \text { and } \Vert \varvec{\theta }-\varvec{\theta }_0\Vert =O(\varrho _n)\}\). For any \(\dot{l}_{K}^{w}(\varvec{\theta }_i;\varvec{W}^{\xi })[\psi _n^*]-\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*]\in \mathcal {L}_3 \) \((i=1,2)\), we have \(\bigl |\dot{l}_{K}^{w}(\varvec{\theta }_1;\varvec{W}^{\xi })[\psi _n^*]-\dot{l}_{K}^{w}(\varvec{\theta }_2;\varvec{W}^{\xi })[\psi _n^*]\bigr |\le \widetilde{C}\Vert \varvec{\theta }_1-\varvec{\theta }_2\Vert \). Thus, it follows that

$$\begin{aligned} N(\epsilon ,\mathcal {L}_3,L_2(Q))\le N(\epsilon ,\{\varvec{\theta }:\varvec{\theta }\in \varvec{\Theta }_n \text { and } \Vert \varvec{\theta }-\varvec{\theta }_0\Vert \le \widetilde{C}\varrho _n\},\Vert \cdot \Vert ). \end{aligned}$$

Note that \(N(\epsilon ,\mathcal {L}_3,L_2(Q)) \le \widetilde{C}e^{3/\epsilon }\), so that \(\int _0^{\infty }\sup _{Q}\sqrt{\log N(\epsilon ,\mathcal {L}_3,L_2(Q))}d\epsilon = \int _0^{\widetilde{C}\varrho _n}\sup _{Q}\sqrt{\log N(\epsilon ,\mathcal {L}_3,L_2(Q))}d\epsilon + \int _{\widetilde{C}\varrho _n}^{\infty } 0 d\epsilon < \infty \) with \(r\nu >1/2\) and \(r>1\). Under conditions (C1)-(C3), \(\dot{l}_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })[\psi _n^*]\) is uniformly bounded. By Theorem 2.8.3 of van der Vaart and Wellner (1996), \(\mathcal {L}_3\) is a Donsker class. Furthermore, it follows from Corollary 2.3.12 of van der Vaart and Wellner (1996) that

$$\begin{aligned} (P_n-P)(\dot{l}_{K}^{w}(\tilde{\varvec{\theta }};\varvec{W}^{\xi })[\psi _n^*] -\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\psi _n^*]) =o_p(n^{-1/2}). \end{aligned}$$

Therefore, we have \(I_2=\epsilon _n\times o_p(n^{-1/2})\).

For \(I_3\), note that \(P(\ddot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\hat{\varvec{\theta }}_n-\varvec{\theta }_0,\hat{\varvec{\theta }}_n-\varvec{\theta }_0])= -P(\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\hat{\varvec{\theta }}_n-\varvec{\theta }_0]\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\hat{\varvec{\theta }}_n-\varvec{\theta }_0])\). For any \(\varvec{\theta }\in \{\varvec{\theta }:\textrm{d}(\varvec{\theta },\varvec{\theta }_0)=O(\varrho _n)\}\), we have \(P(\ddot{l}_{K}^{w}(\varvec{\theta };\varvec{W}^{\xi })[\varvec{\theta }-\varvec{\theta }_0,\varvec{\theta }-\varvec{\theta }_0] - \ddot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^{\xi })[\varvec{\theta }-\varvec{\theta }_0,\varvec{\theta }-\varvec{\theta }_0]) = O(\varrho _n^3)\) and \(\varrho _n^3 = o(n^{-1})\) with \(\nu <1/3\) and \(r>2\). It then follows that

$$\begin{aligned} {}&P(\rho [\hat{\varvec{\theta }}_n-\varvec{\theta }_0;\varvec{W}^\xi ]) \\ {}&\quad = P(l_{K}^{w}(\hat{\varvec{\theta }}_n;\varvec{W}^\xi ) - l_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi ) - \dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\hat{\varvec{\theta }}_n-\varvec{\theta }_0]) \\ {}&\quad = \frac{1}{2}P(\ddot{l}_{K}^{w}(\tilde{\varvec{\theta }};\varvec{W}^\xi )[\hat{\varvec{\theta }}_n-\varvec{\theta }_0,\hat{\varvec{\theta }}_n-\varvec{\theta }_0] - \ddot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\hat{\varvec{\theta }}_n-\varvec{\theta }_0,\hat{\varvec{\theta }}_n-\varvec{\theta }_0])\\ {}&\qquad \ + \frac{1}{2}P(\ddot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\hat{\varvec{\theta }}_n-\varvec{\theta }_0,\hat{\varvec{\theta }}_n-\varvec{\theta }_0]) \\ {}&\quad = \epsilon _n \times o_p(n^{-1/2}) + \frac{1}{2}P(\ddot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\hat{\varvec{\theta }}_n-\varvec{\theta }_0,\hat{\varvec{\theta }}_n-\varvec{\theta }_0]) \\ {}&\quad = \epsilon _n \times o_p(n^{-1/2}) - \frac{1}{2}\Vert \hat{\varvec{\theta }}_n-\varvec{\theta }_0\Vert ^2, \end{aligned}$$

where \(\tilde{\varvec{\theta }}\) is between \(\hat{\varvec{\theta }}_n\) and \(\varvec{\theta }_0\). Combining the Cauchy–Schwarz inequality, \(\Vert \psi _n^*\Vert ^2\rightarrow \Vert \psi ^*\Vert ^2<\infty \) and \(\varrho _n\Vert \psi _n^*-\psi ^*\Vert =o(n^{-1/2})\), we have that

$$\begin{aligned} I_3 ={}&- \frac{1}{2}\Vert \hat{\varvec{\theta }}_n-\varvec{\theta }_0\Vert ^2 + \frac{1}{2}\Vert \hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*-\varvec{\theta }_0\Vert ^2 + \epsilon _n\times o_p(n^{-1/2})\\ ={}&\pm \epsilon _n \langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi _n^*\rangle +\frac{1}{2}\Vert \epsilon _n\psi _n^*\Vert ^2 + \epsilon _n\times o_p(n^{-1/2}) \\ ={}&\pm \epsilon _n\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle +\frac{1}{2}\epsilon _n^2\Vert \psi _n^*\Vert ^2 + \epsilon _n\times o_p(n^{-1/2}) \\ ={}&\pm \epsilon _n\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle + \epsilon _n\times o_p(n^{-1/2}). \end{aligned}$$
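The second to fourth equalities in the display for \(I_3\) rest on three elementary facts, spelled out here as a sketch. It assumes that \(\Vert \hat{\varvec{\theta }}_n-\varvec{\theta }_0\Vert =O_p(\varrho _n)\) also holds in the Fisher norm (Theorem 2 states the rate for \(\textrm{d}\)); that transfer is an assumption of this illustration.

$$\begin{aligned} \frac{1}{2}\Vert a\pm \epsilon _nb\Vert ^2-\frac{1}{2}\Vert a\Vert ^2 ={}&\pm \epsilon _n\langle a,b\rangle +\frac{1}{2}\epsilon _n^2\Vert b\Vert ^2, \qquad a=\hat{\varvec{\theta }}_n-\varvec{\theta }_0,\ b=\psi _n^*,\\ |\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi _n^*-\psi ^*\rangle | \le {}&\Vert \hat{\varvec{\theta }}_n-\varvec{\theta }_0\Vert \,\Vert \psi _n^*-\psi ^*\Vert =O_p(\varrho _n)\Vert \psi _n^*-\psi ^*\Vert =o_p(n^{-1/2}),\\ \frac{1}{2}\epsilon _n^2\Vert \psi _n^*\Vert ^2 ={}&\epsilon _n\times O(\epsilon _n) =\epsilon _n\times o(n^{-1/2}). \end{aligned}$$

The first line is the polarization of the squared norm, the second is the Cauchy–Schwarz inequality combined with \(\varrho _n\Vert \psi _n^*-\psi ^*\Vert =o(n^{-1/2})\), and the third uses \(\Vert \psi _n^*\Vert ^2\rightarrow \Vert \psi ^*\Vert ^2<\infty \) together with \(\epsilon _n=o(n^{-1/2})\).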

According to the results of \(I_1,I_2\) and \(I_3\), we obtain

$$\begin{aligned} 0 \le {}&P_n( l_{K}^{w}(\hat{\varvec{\theta }}_n;\varvec{W}^\xi ) - l_{K}^{w}(\hat{\varvec{\theta }}_n\pm \epsilon _n\psi _n^*;\varvec{W}^\xi ) )\nonumber \\ ={}&\mp \epsilon _nP_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*] \pm \epsilon _n\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle + \epsilon _n\times o_p(n^{-1/2}). \end{aligned}$$
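The passage from this two-sided inequality to a linear representation can be made explicit. The display below is a sketch of the standard argument: divide by \(\epsilon _n>0\) and treat the upper and lower signs separately, with each \(o_p(n^{-1/2})\) denoting a generic remainder.

$$\begin{aligned} \text {upper signs:}\quad {}&P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]-\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle \le o_p(n^{-1/2}),\\ \text {lower signs:}\quad {}&\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle -P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]\le o_p(n^{-1/2}),\\ \text {hence}\quad {}&\bigl |\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle -P_n\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]\bigr |=o_p(n^{-1/2}). \end{aligned}$$

Multiplying by \(\sqrt{n}\) turns the remainder into \(o_p(1)\).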

It follows that \(\sqrt{n}\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle = \sqrt{n}(P_n-P)\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*] + o_p(1)\), where \(P\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]=0\) and \(\textrm{Var}(\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]) = \Vert \psi ^*\Vert ^2<\infty \). By the central limit theorem, \(\sqrt{n}\langle \hat{\varvec{\theta }}_n-\varvec{\theta }_0,\psi ^*\rangle \rightarrow N(0,\Vert \psi ^*\Vert ^2)\) in distribution. By the Riesz representation theorem, there exists \(\psi ^*\in \overline{\mathcal {V}}\) such that \(\dot{\Lambda }(\varvec{\theta }_0)[\psi ]=\langle \psi ,\psi ^*\rangle \) for any \(\psi \in \overline{\mathcal {V}}\) and \(\Vert \psi ^*\Vert = \Vert \dot{\Lambda }(\varvec{\theta }_0)\Vert \). Since \(\Lambda \) is linear in \(\varvec{\theta }\), \(\Lambda (\varvec{\theta })-\Lambda (\varvec{\theta }_0)=\dot{\Lambda }(\varvec{\theta }_0)[\varvec{\theta }-\varvec{\theta }_0]\). Therefore, we have that

$$\begin{aligned} \sqrt{n}(\Lambda (\hat{\varvec{\theta }}_n)-\Lambda (\varvec{\theta }_0))=\sqrt{n}(P_n-P)(\dot{l}_{K}^{w}(\varvec{\theta }_0;\varvec{W}^\xi )[\psi ^*]) + o_p(1) \rightarrow N(0,\Vert \dot{\Lambda }(\varvec{\theta }_0)\Vert ^2), \end{aligned}$$

in distribution. That is, \(\sqrt{n}[\varvec{\mu }_1^T(\hat{\varvec{\beta }}_n-\varvec{\beta }_0)+ \int _0^{\tau }\mu _2(t)(\hat{F}_n(t)-F_0(t))dt] \rightarrow N(0, \Vert \dot{\Lambda }(\varvec{\theta }_0)\Vert ^2)\) in distribution.

In particular, if we set \(\mu _2(\cdot )=0\) in (14), then we have \(\Lambda _{\varvec{\beta }}(\varvec{\theta })=\varvec{\mu }_1^T\varvec{\beta }\). The first order directional derivative of \(\Lambda _{\varvec{\beta }}(\varvec{\theta })\) at \(\varvec{\theta }_0\) is defined as \(\dot{\Lambda }_{\varvec{\beta }}(\varvec{\theta }_0)[\psi ]=\frac{\textrm{d}\Lambda _{\varvec{\beta }}(\varvec{\theta }_0+t\psi )}{\textrm{d}t}\Big |_{t=0}\) and

$$\begin{aligned} \Vert \dot{\Lambda }_{\varvec{\beta }}(\varvec{\theta }_0)\Vert =\sup _{\psi \in \overline{\mathcal {V}}:\Vert \psi \Vert >0} \frac{|\dot{\Lambda }_{\varvec{\beta }}(\varvec{\theta }_0)[\psi ]|}{\Vert \psi \Vert }. \end{aligned}$$
(16)

Similarly, it follows that \(\sqrt{n}\varvec{\mu }_1^T(\hat{\varvec{\beta }}_n-\varvec{\beta }_0) \rightarrow N(0, \Vert \dot{\Lambda }_{\varvec{\beta }}(\varvec{\theta }_0)\Vert ^2)\) in distribution. The proof of Theorem 3 is completed. \(\square \)


About this article


Cite this article

Xie, P., Han, B. & Wang, X. Case-cohort studies for clustered failure time data with a cure fraction. Stat Papers (2023). https://doi.org/10.1007/s00362-023-01448-7
