Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data

Zhou, Qingning; Cai, Jianwen; Zhou, Haibo

doi:10.1007/s10985-019-09461-5

Published: 07 January 2019

Lifetime Data Analysis, Volume 26, pages 85–108 (2020)

Abstract

We propose a two-stage outcome-dependent sampling design and inference procedure for studies that concern interval-censored failure time outcomes. This design enhances study efficiency by allowing the selection probabilities of the second-stage sample, for which the expensive exposure variable is ascertained, to depend on the first-stage observed interval-censored failure time outcomes. In particular, the second-stage sample is enriched by selectively including subjects who are known or observed to experience the failure at an early or late time. We develop a sieve semiparametric maximum pseudo likelihood procedure that makes use of all available data from the proposed two-stage design. The resulting regression parameter estimator is shown to be consistent and asymptotically normal, and a consistent estimator for its asymptotic variance is derived. Simulation results demonstrate that the proposed design and inference procedure perform well in practical situations and are more efficient than existing designs and methods. An application to a phase 3 HIV vaccine trial is provided.

References

Bickel PJ, Klaassen CA, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore
Chatterjee N, Chen Y-H, Breslow NE (2003) A pseudoscore estimator for regression problems with two-phase sampling. J Am Stat Assoc 98(461):158–168
Chen D-G, Sun J, Peace KE (2012) Interval-censored time-to-event data: methods and applications. CRC Press, Boca Raton
Chen K, Lo S-H (1999) Case-cohort and case-control analysis with Cox’s model. Biometrika 86(4):755–764
Cornfield J (1951) A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. J Nat Cancer Inst 11(6):1269–1275
Ding J, Zhou H, Liu Y, Cai J, Longnecker MP (2014) Estimating effect of environmental contaminants on women’s subfecundity for the MoBa study data with an outcome-dependent sampling scheme. Biostatistics 15(4):636–650
Ding J, Lu T-S, Cai J, Zhou H (2017) Recent progresses in outcome-dependent sampling with failure time data. Lifetime Data Anal 23(1):57–82
Gilbert PB, Peterson ML, Follmann D, Hudgens MG, Francis DP, Gurwith M, Heyward WL, Jobes DV, Popovic V, Self SG, Sinangil F, Burke D, Berman PW (2005) Correlation between immunologic responses to a recombinant glycoprotein 120 vaccine and incidence of HIV-1 infection in a phase 3 HIV-1 preventive vaccine trial. J Infect Dis 191(5):666–677
Harro CD, Judson FN, Gorse GJ, Mayer KH, Kostman JR, Brown SJ, Koblin B, Marmor M, Bartholow BN, Popovic V et al (2004) Recruitment and baseline epidemiologic profile of participants in the first phase 3 HIV vaccine efficacy trial. J Acquir Immune Defic Syndr 37(3):1385–1392
Huang J, Rossini A (1997) Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 92(439):960–967
Huang J, Wellner JA (1997) Interval censored survival data: a review of recent progress. In: Proceedings of the first Seattle symposium in biostatistics, pp 123–169. Springer
Huang J, Zhang Y, Hua L (2012) Consistent variance estimation in semiparametric models with application to interval-censored data. In: Chen DG, Sun J, Peace KE (eds) Interval-censored time-to-event data: methods and applications, pp 233–268
Kang S, Cai J (2009) Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika 96(4):887–901
Kulich M, Lin D (2004) Improving the efficiency of relative-risk estimation in case-cohort studies. J Am Stat Assoc 99(467):832–844
Li Z, Nan B (2011) Relative risk regression for current status data in case-cohort studies. Canad J Stat 39(4):557–577
Li Z, Gilbert P, Nan B (2008) Weighted likelihood method for grouped survival data in case-cohort studies with application to HIV vaccine trials. Biometrics 64(4):1247–1255
Lorentz GG (1986) Bernstein polynomials. Chelsea Publishing Co, New York
Prentice RL (1986) A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73(1):1–11
Self SG, Prentice RL (1988) Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Stat 16(1):64–81
Shen X, Wong WH (1994) Convergence rate of sieve estimates. Ann Stat 22(2):580–615
Song R, Zhou H, Kosorok MR (2009) A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome. Biometrika 96(1):221–228
Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
Sun Y, Qian X, Shou Q, Gilbert PB (2017) Analysis of two-phase sampling data with semiparametric additive hazards models. Lifetime Data Anal 23(3):377–399
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
Weaver MA, Zhou H (2005) An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. J Am Stat Assoc 100(470):459–469
Whittemore AS (1997) Multistage sampling designs and estimating equations. J R Stat Soc B 59(3):589–602
Xue H, Lam K, Li G (2004) Sieve maximum likelihood estimator for semiparametric regression models with current status data. J Am Stat Assoc 99(466):346–356
Yu J, Liu Y, Sandler DP, Zhou H (2015) Statistical inference for the additive hazards model under outcome-dependent sampling. Canad J Stat 43(3):436–453
Zeng D, Lin DY (2014) Efficient estimation of semiparametric transformation models for two-phase cohort studies. J Am Stat Assoc 109(505):371–383
Zhang Y, Hua L, Huang J (2010) A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scand J Stat 37(2):338–354
Zhou H, Weaver M, Qin J, Longnecker M, Wang M (2002) A semiparametric empirical likelihood method for data from an outcome-dependent sampling scheme with a continuous outcome. Biometrics 58(2):413–421
Zhou H, Song R, Wu Y, Qin J (2011) Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67(1):194–202
Zhou Q, Zhou H, Cai J (2017a) Case-cohort studies with interval-censored failure time data. Biometrika 104(1):17–29
Zhou Q, Hu T, Sun J (2017b) A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. J Am Stat Assoc 112(518):664–672
Zhou Q, Cai J, Zhou H (2018) Outcome-dependent sampling with interval-censored failure time data. Biometrics 74(1):58–67

Acknowledgements

The authors thank the Editor, Associate Editor and reviewers for their helpful comments and suggestions that have improved the paper. The authors also thank the Global Solutions in Infectious Diseases (GSID) and Dr. Peter Gilbert for providing data from the phase 3 HIV vaccine trial VAX004. This research was partially supported by grants from the National Institutes of Health (R01ES021900, P01CA142538 and P30ES010126). Qingning Zhou’s work was supported, in part, by funds provided by the University of North Carolina at Charlotte.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of North Carolina at Charlotte, Fretwell 335L, 9201 University City Blvd., Charlotte, NC, 28223, USA
Qingning Zhou
Department of Biostatistics, University of North Carolina at Chapel Hill, 3101D McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
Jianwen Cai
Department of Biostatistics, University of North Carolina at Chapel Hill, 3104C McGavran-Greenberg Hall, Chapel Hill, NC, 27599, USA
Haibo Zhou

Corresponding author

Correspondence to Qingning Zhou.

Electronic supplementary material

The supplementary materials include the two lemmas and their proofs, as well as some additional simulation results for a smaller cohort size N=1000.

Appendix: Proofs of Theorems 1 and 2

In the appendix, we sketch the proofs of Theorems 1 and 2. Let $O\,=\,\left\{ Y=\{U,\,V,\,\Delta _1=I(T\le U),\,\Delta _2=I(U<T\le V)\},\, RZ,\, R\right\} $ denote a single observation, where U and V are two random examination times, Z is the p-dimensional covariate vector and R is the indicator of an observation being in the validation sample. The following regularity conditions are needed for proving the theorems:

(C1)
There exists $\eta >0$ such that $P(V-U\ge \eta )=1$. The union of the supports of U and V is contained in the interval $[\sigma ,\tau ]$, where $0<\sigma<\tau <+\infty $.
(C2)
The distribution of Z, denoted by $G_Z(z)$, has a bounded support and is not concentrated on any proper subspace of $R^p$.
(C3)
For $r=1$ or 2, the function $\Lambda _0(t)\in \mathcal {M}$ is continuously differentiable up to order r in $[\sigma ,\tau ]$ with the first derivative being strictly positive, and satisfies $\alpha ^{-1}<\Lambda _0(\sigma )<\Lambda _0(\tau )<\alpha $ for some positive constant $\alpha $. Also $\beta _0$ is an interior point of $\mathcal {B}$, a compact subset of $R^p$. $\mathcal {M}$ and $\mathcal {B}$ are defined in Sect. 3.
(C4)
The conditional density g(u, v|z) of (u, v) given z has bounded partial derivatives with respect to u and v, and the bounds of these partial derivatives do not depend on (u, v, z).
(C5)
$E\{\mathrm{var}(Z|U)\}$ and $E\{\mathrm{var}(Z|V)\}$ are positive definite.

These conditions are commonly used in studies of interval-censored data (e.g. Huang and Wellner 1997; Huang and Rossini 1997; Zhang et al. 2010). In addition, as in Zeng and Lin (2014), one can show that the conclusions of Theorems 1 and 2 hold under the proposed sampling scheme if they hold under independent sampling. Specifically, Zeng and Lin (2014), in their Appendix, used Le Cam’s third lemma to establish the following result for general two-phase cohort studies: the consistency and asymptotic normality of the MLE hold under any sampling mechanism satisfying their condition (C.6) if they hold under independent sampling. It is easy to verify that our two-stage ODS scheme satisfies condition (C.6) in Zeng and Lin (2014) with

$$\begin{aligned} p(1;y)= & {} I(y\in A_1)[p_0+(1-p_0)p_1]+I(y\in A_2)[p_0+(1-p_0)p_2]\\&+I(y\in A_3)p_0+I(y\in A_4)p_0, \end{aligned}$$

where $y=\{u,v,\delta _1,\delta _2\}$ is the outcome, $\{A_1,A_2,A_3,A_4\}$ are the four strata defined based on the outcome given in (1), $p_0$ is the sampling fraction of the SRS component, and $p_1$ and $p_2$ are the sampling fractions of the supplemental components from the two tail strata $A_1$ and $A_2$, respectively. Thus, we assume in the following that the observations $\{O_i,\,i=1,\ldots ,N\}$ are independent and identically distributed.
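As an illustration of the selection probability $p(1;y)$ above, the following minimal sketch evaluates it by stratum and draws the validation indicator $R$. The function names, the integer coding 1 to 4 for the strata $A_1,\ldots ,A_4$, and the use of independent Bernoulli draws are assumptions made for illustration only; the strata themselves are defined in (1) of the main text, and the actual design may fix the sample sizes rather than sample by independent coin flips.

```python
import random


def selection_prob(stratum, p0, p1, p2):
    """P(R = 1 | Y in A_stratum): SRS fraction p0 plus a supplemental
    fraction (p1 or p2) applied to the subjects the SRS did not select."""
    if stratum == 1:
        return p0 + (1 - p0) * p1
    if stratum == 2:
        return p0 + (1 - p0) * p2
    if stratum in (3, 4):
        return p0
    raise ValueError("stratum must be 1, 2, 3 or 4")


def draw_validation_indicators(strata, p0, p1, p2, seed=0):
    """Hypothetical helper: one Bernoulli draw of R per first-stage subject."""
    rng = random.Random(seed)
    return [int(rng.random() < selection_prob(k, p0, p1, p2)) for k in strata]
```

For example, with $p_0=0.1$ and $p_1=0.5$, a subject in the tail stratum $A_1$ is selected with probability $0.1+0.9\times 0.5=0.55$, whereas a subject in $A_3$ or $A_4$ is selected with probability $0.1$.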

Note that the proofs of our theorems differ from those of Zhou et al. (2017b, 2018) in several aspects. First, our likelihood function is not exact, since an estimate of the covariate distribution rather than the true one is used. Thus, we have to deal with the difference between our approximate likelihood and the exact likelihood that assumes the covariate distribution to be known. To establish consistency and the rate of convergence, our proofs follow ideas similar to those in Zhou et al. (2017b, 2018), except that we additionally need to establish the closeness of the approximate and exact likelihoods. To establish asymptotic normality and derive the asymptotic covariance matrix, our approach is quite different from those in Zhou et al. (2017b, 2018), since we have to account for the additional variability induced by the estimated covariate distribution.

Before proving Theorems 1 and 2, we first define the class of functions $\mathcal {L}_N=\{l(\theta ,O): \theta \in \Theta _N\}$, where $l(\theta ,O)$ is the log-likelihood function based on a single observation O given by

$$\begin{aligned} \begin{aligned} l(\theta ,O)&\,=\, R\log f(Y|Z;\theta )+(1-R)\log f_Y(Y) \\&\,=\, R\log f(Y|Z;\theta )+(1-R)\log \int f(Y|z;\theta )dG_Z(z), \end{aligned} \end{aligned}$$

where the covariate distribution $G_Z(z)$ is assumed to be known and

$$\begin{aligned} f(Y|Z;\theta )=(1-S(U|Z))^{\Delta _{1}} (S(U|Z)-S(V|Z))^{\Delta _{2}} S(V|Z)^{1-\Delta _{1}-\Delta _{2}} \end{aligned}$$

with $S(t|Z)=\exp (-\Lambda (t)e^{\beta 'Z})$. Let $P_N$ denote the empirical measure. For any $\epsilon >0$, we define the covering number $N(\epsilon ,\mathcal {L}_N,L_1(P_N))$ as the smallest value of $\kappa $ for which there exists $\{\theta ^{(1)},\ldots ,\theta ^{(\kappa )}\}\subseteq \Theta _N$ such that

$$\begin{aligned} \min _{j\in \{1,\cdots ,\kappa \}}\frac{1}{N}\sum _{i=1}^N\Big |l(\theta ,O_i) -l(\theta ^{(j)},O_i)\Big |<\epsilon \end{aligned}$$

for all $\theta \in \Theta _N$. If no such $\kappa $ exists, define $N(\epsilon ,{\mathcal {L}}_N,L_1(P_N))=\infty $.
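The single-observation conditional likelihood $f(Y|Z;\theta )$ defined above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the convention of passing the cumulative baseline hazard $\Lambda $ as a Python callable are assumptions.

```python
import math


def surv(t, z, beta, cum_hazard):
    """S(t|Z) = exp(-Lambda(t) * exp(beta'Z)) under the proportional hazards model."""
    lin = sum(b * zj for b, zj in zip(beta, z))
    return math.exp(-cum_hazard(t) * math.exp(lin))


def cond_likelihood(u, v, delta1, delta2, z, beta, cum_hazard):
    """f(Y|Z; theta) for one interval-censored observation.

    delta1 = I(T <= U), delta2 = I(U < T <= V); at most one of them is 1,
    so exactly one factor of the product form is active.
    """
    s_u = surv(u, z, beta, cum_hazard)
    s_v = surv(v, z, beta, cum_hazard)
    if delta1 == 1:
        return 1.0 - s_u      # failure before U
    if delta2 == 1:
        return s_u - s_v      # failure in (U, V]
    return s_v                # failure after V
```

For fixed examination times and covariates, the three possible contributions sum to one, which is a quick sanity check on any implementation.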

Proof of Theorem 1

We now prove the strong consistency of $\hat{\theta }_N$. Based on Lemma 1 in the Supplementary Materials, the covering number of $\mathcal {L}_N$ satisfies

$$\begin{aligned} N\left( \epsilon ,\mathcal {L}_N,L_1(P_N)\right) \le KM_N^{(m+1)}\epsilon ^{-(p+m+1)} \, . \end{aligned}$$

Then by Lemma 2 in the Supplementary Materials, we have

$$\begin{aligned} \sup _{\theta \in \Theta _N} \big |P_Nl(\theta ,O)-Pl(\theta , O)\big |\rightarrow 0\quad \text {almost surely}. \end{aligned} \tag{A.1}$$

Furthermore, define

$$\begin{aligned} \begin{aligned} P_N \hat{l}(\theta ,O)&=\frac{1}{N}\sum _{i=1}^N \left\{ R_i\log f(Y_i|Z_i;\theta )+(1-R_i)\log \int f(Y_i|z;\theta )d\hat{G}_Z(z)\right\} \\&=\frac{1}{N}\sum _{i=1}^N \left[ R_i\log f(Y_i|Z_i;\theta )+(1-R_i)\right. \\&\qquad \left. \log \left\{ \sum _{k=1}^4 \frac{N_k}{N (n_k+n_{0k})} \sum _{r\in I_{v_k}} f(Y_i|Z_r;\theta ) \right\} \right] . \end{aligned} \end{aligned}$$

Then it is easy to show that

$$\begin{aligned} \sup _{\theta \in \Theta _N} \big |P_N\hat{l}(\theta ,O)-P_Nl(\theta , O)\big |\rightarrow 0\quad \text {almost surely}. \end{aligned} \tag{A.2}$$
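The plug-in marginal $\int f(Y_i|z;\theta )d\hat{G}_Z(z)$ appearing in $P_N\hat{l}(\theta ,O)$ is a stratum-weighted average over the validation sample. The sketch below illustrates that weighting under stated assumptions: it reads $n_k+n_{0k}$ as the total number of validated subjects in stratum $k$ and $N_k$ as the first-stage size of stratum $k$, and it assumes every stratum contains at least one validated subject. The function names are hypothetical.

```python
from collections import Counter


def empirical_weights(validated_strata, cohort_sizes):
    """Weight N_k / (N * (n_k + n_0k)) for each validated subject.

    validated_strata: stratum label of each validated subject.
    cohort_sizes: dict mapping stratum label k to the first-stage size N_k.
    """
    total = sum(cohort_sizes.values())      # N
    counts = Counter(validated_strata)      # validated count per stratum
    return [cohort_sizes[k] / (total * counts[k]) for k in validated_strata]


def estimated_marginal(y, validated_z, weights, cond_lik):
    """Plug-in estimate of the marginal density of y: sum_r w_r * f(y | Z_r)."""
    return sum(w * cond_lik(y, z) for w, z in zip(weights, validated_z))
```

Because the weights within stratum $k$ add up to $N_k/N$, they sum to one overall, so the estimated marginal is a proper weighted average of conditional likelihoods.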

Let $M(\theta , O)=-l(\theta , O)$ and $P_N\hat{M}(\theta , O)=-P_N\hat{l}(\theta , O)$. Define $K_\epsilon =\{\theta : d(\theta , \theta _0) \ge \epsilon , \theta \in \Theta _N\}$ for $\epsilon > 0$ and

$$\begin{aligned} \begin{aligned} \zeta _{1N}=&\sup _{\theta \in \Theta _N} |P_NM(\theta , O)-PM(\theta ,O)|,\\ \zeta _{2N}=&\sup _{\theta \in \Theta _N} |P_N\hat{M}(\theta , O)-P_NM(\theta ,O)|,\\ \zeta _{3N}=&P_N\hat{M}(\theta _0,O)-PM(\theta _0,O). \end{aligned} \end{aligned}$$

Then we obtain

$$\begin{aligned} \inf _{K_\epsilon }PM(\theta , O)\le \zeta _{1N}+\zeta _{2N}+\inf _{K_\epsilon }P_N\hat{M}(\theta , O). \end{aligned} \tag{A.3}$$
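To see why (A.3) holds, note that $K_\epsilon \subseteq \Theta _N$, so for every $\theta \in K_\epsilon $ the definitions of $\zeta _{1N}$ and $\zeta _{2N}$ give

$$\begin{aligned} PM(\theta ,O)&=\{PM(\theta ,O)-P_NM(\theta ,O)\}+\{P_NM(\theta ,O)-P_N\hat{M}(\theta ,O)\}+P_N\hat{M}(\theta ,O)\\&\le \zeta _{1N}+\zeta _{2N}+P_N\hat{M}(\theta ,O), \end{aligned}$$

and taking the infimum over $K_\epsilon $ on both sides yields the stated inequality.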

If $\hat{\theta }_N\in K_\epsilon ,$ we have

$$\begin{aligned} \inf _{K_\epsilon } P_N\hat{M}(\theta , O)=P_N\hat{M}(\hat{\theta }_N,O)\le P_N\hat{M}(\theta _0,O)=\zeta _{3N}+PM(\theta _0,O). \end{aligned} \tag{A.4}$$

Define $\delta _\epsilon =\inf _{K_\epsilon }P M(\theta , O)-PM(\theta _0,O)$. Then under Conditions (C1) - (C5), using the same arguments as those in Zhang et al. (2010, p. 352), we can prove $\delta _\epsilon >0$. It follows from (A.3) and (A.4) that

$$\begin{aligned} \inf _{K_\epsilon }PM(\theta , O)\le \zeta _{1N}+\zeta _{2N} + \zeta _{3N} +PM(\theta _0,O) = \zeta _N+ PM(\theta _0,O) \end{aligned}$$

with $\zeta _N=\zeta _{1N}+\zeta _{2N}+\zeta _{3N},$ and hence $\zeta _N \ge \delta _\epsilon .$ This gives $\{\hat{\theta }_N \in K_{\epsilon } \}\subseteq \{\zeta _N \ge \delta _{\epsilon }\}$, and by (A.1), (A.2) and the strong law of large numbers, we have $\zeta _{N}\rightarrow 0$ almost surely. Therefore, $\cap _{k=1}^{\infty }\cup _{N=k}^{\infty }\{\hat{\theta }_N \in K_{\epsilon } \} \subseteq \cap _{k=1}^{\infty }\cup _{N=k}^{\infty }\{\zeta _N \ge \delta _{\epsilon }\}$, and the event on the right-hand side has probability zero, which proves that $d(\hat{\theta }_N,\theta _0)\rightarrow 0$ almost surely.

Now we derive the rate of convergence by using Theorem 3.4.1 of van der Vaart and Wellner (1996). Below let $\tilde{K}$ denote a universal positive constant that may differ from place to place. First note from Theorem 1.6.2 of Lorentz (1986) that there exists a Bernstein polynomial $\Lambda _{N0}$ such that $\Vert \Lambda _{N0}-\Lambda _{0}\Vert _{\infty } = O(m^{-r/2})=O(N^{-r\nu /2}).$ Define $\theta _{N0}=(\beta _0,\Lambda _{N0})$; then $d(\theta _{N0},\theta _0)=O(N^{-r\nu /2})$. For any $\eta >0,$ define the class of functions ${\mathcal {F}}_{\eta }=\{l(\theta ,O)-l(\theta _{N0},O): \theta \in \Theta _N,\, \eta /2 < d(\theta ,\theta _{N0})\le \eta \}.$ One can easily show that $P(l(\theta _0,O)-l(\theta _{N0},O))\le \tilde{K}d(\theta _0,\theta _{N0})\le \tilde{K}N^{-r\nu /2}.$ Also, under Conditions (C1)–(C5), using the same arguments as those in Zhang et al. (2010, p. 352), we obtain $P(l(\theta _0,O)-l(\theta ,O))\ge \tilde{K} d^2(\theta _0,\theta )$. Therefore, for large N, we have $P(l(\theta ,O)-l(\theta _{N0},O))=P(l(\theta ,O)-l(\theta _0,O))+P(l(\theta _0,O)-l(\theta _{N0},O))\le -\tilde{K}\eta ^2+\tilde{K}N^{-r \nu /2}=-\tilde{K}\eta ^2,$ for any $l(\theta ,O)-l(\theta _{N0},O)\in {\mathcal {F}}_{\eta }.$
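The Bernstein polynomial approximation invoked above can be illustrated numerically. The sketch below builds the classical degree-$m$ Bernstein polynomial of a function on $[\sigma ,\tau ]$ and measures its sup-norm error on a grid. It is only an illustration of the approximation property: the sieve space of Sect. 3 also imposes constraints on the coefficients that are not reproduced here, and the function names and the test function $t^2$ are assumptions.

```python
import math


def bernstein_approx(f, m, sigma, tau):
    """Degree-m Bernstein polynomial of f on [sigma, tau], returned as a callable."""
    # Coefficients are the values of f at the m + 1 equally spaced nodes.
    coefs = [f(sigma + k * (tau - sigma) / m) for k in range(m + 1)]

    def approx(t):
        x = (t - sigma) / (tau - sigma)     # rescale to [0, 1]
        return sum(c * math.comb(m, k) * x ** k * (1 - x) ** (m - k)
                   for k, c in enumerate(coefs))

    return approx


def sup_error(f, m, sigma, tau, grid=200):
    """Maximum absolute approximation error over an equally spaced grid."""
    g = bernstein_approx(f, m, sigma, tau)
    pts = [sigma + i * (tau - sigma) / grid for i in range(grid + 1)]
    return max(abs(f(t) - g(t)) for t in pts)
```

For $f(t)=t^2$ the error equals $(\tau -\sigma )^2x(1-x)/m$ with $x=(t-\sigma )/(\tau -\sigma )$, so its maximum is $(\tau -\sigma )^2/(4m)$, in line with the $O(m^{-r/2})$ rate for $r=2$.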

Following the calculations in Shen and Wong (1994, p. 597), we have that for $0<\varepsilon <\eta $, $\log N_{[]}(\varepsilon ,{\mathcal {F}}_{\eta },L_2(P))\le \tilde{K} (m+1)\log (\eta /\varepsilon )$. Some algebraic manipulations give $P(l(\theta ,O)-l(\theta _{N0},O))^2\le \tilde{K} \eta ^2$ for any $l(\theta ,O)-l(\theta _{N0},O)\in {\mathcal {F}}_{\eta }.$ Also, under Conditions (C1)–(C4), ${\mathcal {F}}_{\eta }$ is uniformly bounded. Then by Lemma 3.4.2 of van der Vaart and Wellner (1996), we obtain

$$\begin{aligned} E_P\Vert N^{1/2}(P_N-P)\Vert _{{\mathcal {F}}_{\eta }}\le \tilde{K}J_{[]}(\eta ,{\mathcal {F}}_{\eta },L_2(P)) \biggl \{1+\frac{J_{[]}(\eta ,{\mathcal {F}}_{\eta },L_2(P))}{\eta ^2N^{1/2}}\biggr \} \end{aligned}$$

with

$$\begin{aligned} J_{[]}(\eta ,{\mathcal {F}}_{\eta },L_2(P))=\int _0^\eta \Big \{1+\log N_{[]}(\varepsilon ,{\mathcal {F}}_{\eta },L_2(P))\Big \}^{1/2}d\varepsilon \le \tilde{K} (m+1)^{1/2}\eta . \end{aligned}$$

Then we have

$$\begin{aligned}&\sup _{\eta /2< d(\theta ,\theta _{N0})\le \eta ,\theta \in \Theta _N} N^{1/2}\left[ (P_N\hat{l}(\theta ,O)-Pl(\theta ,O))-(P_N\hat{l}(\theta _{N0},O) -Pl(\theta _{N0},O))\right] \\&\quad =\sup _{\eta /2 < d(\theta ,\theta _{N0})\le \eta ,\theta \in \Theta _N} N^{1/2}\Big [\big \{(P_N\hat{l}(\theta ,O)-P_Nl(\theta , O)) - (P_N\hat{l}(\theta _{N0},O)\\&\qquad -P_Nl(\theta _{N0}, O))\big \}\\&\qquad + \big \{(P_Nl(\theta ,O)-Pl(\theta ,O))-(P_Nl(\theta _{N0},O)-Pl(\theta _{N0},O))\big \}\Big ]\\&\quad \le 2N^{1/2}\sup _{\theta \in \Theta _N} \big |P_N\hat{l}(\theta ,O)-P_Nl(\theta , O)\big | + E_P\Vert N^{1/2}(P_N-P)\Vert _{{\mathcal {F}}_{\eta }}\\&\quad \le \tilde{K} \big \{(m+1)^{1/2}\eta +(m+1)N^{-1/2}\big \}. \end{aligned}$$

Define $\phi _N(\eta )=(m+1)^{1/2}\eta +(m+1)N^{-1/2}$. It is obvious that $\phi _N(\eta )/\eta $ is decreasing in $\eta $. Let $r_N=N^{\min \{(1-\nu )/2,\,r\nu /2\}}$; then $r_N^2\phi _N(1/r_N)=r_N(N^\nu +1)^{1/2}+r_N^2(N^\nu +1)N^{-1/2}\le \tilde{K}N^{1/2}$.

Note that $d(\hat{\theta }_N,\theta _{N0})\le d(\hat{\theta }_N,\theta _0)+d(\theta _0,\theta _{N0})\rightarrow 0$ in probability. It then follows from Theorem 3.4.1 of van der Vaart and Wellner (1996) that $r_N d(\hat{\theta }_N,\theta _{N0})=O_p(1)$. Furthermore, by $d(\theta _{N0},\theta _0)=O(N^{-r\nu /2})$, we have $r_N d(\hat{\theta }_N,\theta _0)\le r_N d(\hat{\theta }_N,\theta _{N0}) + r_N d(\theta _{N0},\theta _0) = O_p(1)$, which completes the proof. $\square $
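The exponent in $r_N=N^{\min \{(1-\nu )/2,\,r\nu /2\}}$ trades off a term that shrinks as the sieve grows against a term that improves with it. The small sketch below computes the exponent exactly and the value of $\nu $ that equates the two terms. The helper names are hypothetical, and any additional restrictions that the theorem statement places on $\nu $ are ignored here.

```python
from fractions import Fraction


def rate_exponent(nu, r):
    """Exponent a in r_N = N**a, with a = min{(1 - nu)/2, r*nu/2}."""
    nu = Fraction(nu)
    return min((1 - nu) / 2, Fraction(r) * nu / 2)


def balancing_nu(r):
    """Value of nu solving (1 - nu)/2 = r*nu/2, that is nu = 1/(1 + r)."""
    return Fraction(1, 1 + r)
```

Under this balancing choice the exponent is $r/\{2(1+r)\}$, that is $1/4$ for $r=1$ and $1/3$ for $r=2$.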

Proof of Theorem 2

We now sketch the proof of the asymptotic normality of $\hat{\beta }_N$. Let $l_\beta (\theta ,O)$ denote the score for $\beta $ given by

$$\begin{aligned} l_\beta (\theta ,O)\,=\, R\frac{f_\beta (Y|Z;\theta )}{f(Y|Z;\theta )}+(1-R)\frac{\int f_\beta (Y|z;\theta )dG_Z(z)}{\int f(Y|z;\theta )dG_Z(z)}, \end{aligned}$$

where

$$\begin{aligned} f_\beta (Y|Z;\theta )=(-S_\beta (U|Z))^{\Delta _{1}} (S_\beta (U|Z)-S_\beta (V|Z))^{\Delta _{2}} S_\beta (V|Z)^{1-\Delta _{1}-\Delta _{2}} \end{aligned}$$

and

$$\begin{aligned} S_\beta (t|Z)=\exp (-\Lambda (t)e^{\beta 'Z})(-\Lambda (t)e^{\beta 'Z})Z. \end{aligned}$$
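The expression for $S_\beta (t|Z)$ is the gradient of $S(t|Z)=\exp (-\Lambda (t)e^{\beta 'Z})$ with respect to $\beta $, which can be checked numerically by central finite differences. The sketch below is only such a check: the function names are hypothetical, and the value of $\Lambda (t)$ is passed directly as a number.

```python
import math


def surv(cum_haz, beta, z):
    """S(t|Z) = exp(-Lambda(t) * exp(beta'Z)), with cum_haz = Lambda(t)."""
    lin = sum(b * zj for b, zj in zip(beta, z))
    return math.exp(-cum_haz * math.exp(lin))


def surv_beta(cum_haz, beta, z):
    """Analytic gradient: S(t|Z) * (-Lambda(t) * exp(beta'Z)) * Z."""
    lin = sum(b * zj for b, zj in zip(beta, z))
    scale = surv(cum_haz, beta, z) * (-cum_haz * math.exp(lin))
    return [scale * zj for zj in z]


def surv_beta_numeric(cum_haz, beta, z, h=1e-6):
    """Central finite-difference approximation to the same gradient."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        dn = list(beta)
        up[j] += h
        dn[j] -= h
        grad.append((surv(cum_haz, up, z) - surv(cum_haz, dn, z)) / (2 * h))
    return grad
```

The two gradients should agree up to finite-difference error for any choice of $\Lambda (t)>0$, $\beta $ and $Z$.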

Consider a parametric smooth submodel with parameter $\theta _{(s)}=(\beta ,\Lambda _{(s)})$, where $\Lambda _{(0)}=\Lambda $ and

$$\begin{aligned} \left. \frac{\partial \Lambda _{(s)}}{\partial s}\right| _{s=0}=h. \end{aligned}$$

Let $\mathcal {H}\subseteq L_2(P)$ denote the class of functions h defined by this equation. The score operator for $\Lambda $ is

$$\begin{aligned} \begin{aligned} l_\Lambda (\theta ,O)[h]&\,=\,\left. \frac{\partial l(\theta _{(s)},O)}{\partial s}\right| _{s=0}\\&\,=\, R\frac{f_\Lambda (Y|Z;\theta )[h]}{f(Y|Z;\theta )}+(1-R)\frac{\int f_\Lambda (Y|z;\theta )[h]dG_Z(z)}{\int f(Y|z;\theta )dG_Z(z)}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} f_\Lambda (Y|Z;\theta )[h]= & {} (-S_\Lambda (U|Z)[h])^{\Delta _{1}} (S_\Lambda (U|Z)[h]\\&-S_\Lambda (V|Z)[h])^{\Delta _{2}} S_\Lambda (V|Z)[h]^{1-\Delta _{1}-\Delta _{2}} \end{aligned}$$

and

$$\begin{aligned} S_\Lambda (t|Z)[h]=\exp (-\Lambda (t)e^{\beta 'Z})(-e^{\beta 'Z})h(t). \end{aligned}$$

According to Bickel et al. (1993), the efficient score for $\beta $ is

$$\begin{aligned} l^*(\theta ,O)\,=\,l_\beta (\theta ,O)-l_\Lambda (\theta ,O)[h_0], \end{aligned}$$

where $h_0\in \mathcal {H}^p$ minimizes $P\Vert l_\beta (\theta ,O)-l_\Lambda (\theta ,O)[h]\Vert ^2$ and is called the least favorable direction. Then the information for $\beta $ is

$$\begin{aligned} I(\beta )=P\big [l^*(\theta ,O)^{\otimes 2}\big ]=P\big [(l_\beta (\theta ,O)-l_\Lambda (\theta ,O)[h_0])^{\otimes 2}\big ], \end{aligned}$$

where $v^{\otimes 2} = v v' $ for a column vector $v\in R^p$. Under Conditions (C1)–(C5), the existence of the least favorable direction and the positive definiteness of the information can be established along the lines of Theorem 4.1 in Huang and Wellner (1997).
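As a reminder of a standard projection fact (stated here for orientation rather than as part of the authors' argument, and assuming $\mathcal {H}$ is a linear space), the minimizing property of $h_0$ can be expressed through normal equations: the efficient score is orthogonal to every score for $\Lambda $,

$$\begin{aligned} P\big [\{l_\beta (\theta ,O)-l_\Lambda (\theta ,O)[h_0]\}\,l_\Lambda (\theta ,O)[h]\big ]=0\quad \text {for all } h\in \mathcal {H}. \end{aligned}$$

Applying this with $h$ equal to each component of $h_0$ gives $P\big [l^*(\theta ,O)\,l_\Lambda (\theta ,O)[h_0]'\big ]=0$, so the information can equivalently be written as

$$\begin{aligned} I(\beta )=P\big [l^*(\theta ,O)\,l^*(\theta ,O)'\big ]=P\big [l^*(\theta ,O)\,l_\beta (\theta ,O)'\big ]. \end{aligned}$$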

Since $\hat{\theta }_N$ maximizes $P_N\hat{l}(\theta ,O)$, which is obtained by replacing $G_Z$ in $P_Nl(\theta ,O)$ with its consistent estimator $\hat{G}_Z$, $\hat{\theta }_N$ is the solution to the functional equation $P_N\hat{l}^*(\theta ,O)=0$, where $P_N\hat{l}^*(\theta ,O)$ is obtained by replacing $G_Z$ in $P_Nl^*(\theta ,O)$ with $\hat{G}_Z$. First note that $\hat{G}_Z$ is a $\sqrt{N}$-consistent estimator of $G_Z$. Along the lines of the proof of Theorem 2 in Zhang et al. (2010), one can establish that

$$\begin{aligned} N^{1/2} ( \hat{\beta }_N - \beta _0 ) = N^{1/2} I^{-1}(\beta _0) P_N\hat{l}^*(\theta _0,O)+o_p(1). \end{aligned}$$

Following the proof of Theorem 2 in Weaver and Zhou (2005), one can further establish that

$$\begin{aligned} N^{1/2} ( \hat{\beta }_N - \beta _0 ) \rightarrow \, N ( 0 , \Sigma ) \end{aligned}$$

in distribution with the asymptotic covariance matrix given by

$$\begin{aligned} \Sigma =I^{-1}(\beta _0)+\sum _{l=1}^4 \frac{\pi _l^2}{\rho _l \rho _v + \pi _l \rho _0\rho _v}I^{-1}(\beta _0)\Sigma _l(\theta _0)I^{-1}(\beta _0). \end{aligned}$$

Here $I(\beta )$ is the information for $\beta $ defined above, and

$$\begin{aligned} \Sigma _l(\theta )=\mathrm{var}_{\{Z|Y\in A_l\}}\left\{ \sum _{k=1}^4 [\pi _k(1-\rho _0\rho _v)-\rho _k\rho _v] E_{\{Y|Y\in A_k\}}[M_Z(Y;\theta )]\right\} , \end{aligned}$$

where

$$\begin{aligned} M_Z(Y;\theta )=\frac{\partial f(Y|Z;\theta )/\partial \theta }{f_Y(Y;\theta )}-\frac{\partial f_Y(Y;\theta )/\partial \theta }{[f_Y(Y;\theta )]^2}f(Y|Z;\theta ), \end{aligned}$$

with

$$\begin{aligned} \frac{\partial f(Y|Z;\theta )}{\partial \theta }\equiv & {} f_\beta (Y|Z;\theta )-f_\Lambda (Y|Z;\theta )[h_0]\quad \text {and}\quad \frac{\partial f_Y(Y;\theta )}{\partial \theta } \\\equiv & {} \int \frac{\partial f(Y|z;\theta )}{\partial \theta } dG_Z(z). \end{aligned}$$

$\square $

About this article

Cite this article

Zhou, Q., Cai, J. & Zhou, H. Semiparametric inference for a two-stage outcome-dependent sampling design with interval-censored failure time data. Lifetime Data Anal 26, 85–108 (2020). https://doi.org/10.1007/s10985-019-09461-5

Received: 27 November 2017
Accepted: 02 January 2019
Published: 07 January 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s10985-019-09461-5
