Abstract
In a recurrent event setting, we introduce a new score designed to evaluate how well a given model predicts the expected cumulative number of recurrent events. This score can be seen as an extension of the Brier score for single time-to-event data, but it works for recurrent events with or without a terminal event. Theoretical results show that, under standard assumptions in a recurrent event context, our score can be asymptotically decomposed as the sum of the theoretical mean squared error between the model and the true expected cumulative number of recurrent events and an inseparability term that does not depend on the model. This decomposition is further illustrated in simulation studies. It is also shown that this score should be used in comparison with a reference model, such as a nonparametric estimator that does not include the covariates. Finally, the score is applied to the prediction of hospitalisations on a dataset of patients suffering from atrial fibrillation, and the prediction performances of different models, such as the Cox model, the Aalen model and the Ghosh and Lin model, are compared.
References
Andersen PK, Angst J, Ravn H (2019) Modeling marginal features in studies of recurrent events in the presence of a terminal event. Lifetime Data Anal 25:681–695
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer series in statistics. Springer-Verlag, New York
Andersen PK, Gill RD (1982) Cox’s regression model for counting processes: a large sample study. Ann Stat 10(4):1100–1120
Bouaziz O, Geffray S, Lopez O (2015) Semiparametric inference for the recurrent events process by means of a single-index model. Statistics 49:361–385
Bouaziz O, Lopez O (2010) Conditional density estimation in a censored single-index regression model. Bernoulli 16:514–542
Bradley AA, Schwartz SS, Hashino T (2008) Sampling uncertainty and confidence intervals for the Brier score and Brier skill score. Weather Forecast 23:992–1006
Cook RJ, Lawless J (2007) The statistical analysis of recurrent events. Springer Science & Business Media, New York, USA
Cook RJ, Lawless JF (1997) Marginal analysis of recurrent events and a terminating event. Stat Med 16:911–924
Cox DR (1972) Regression models and life-tables. J R Stat Soc Ser B (Methodol) 34:187–202
Dabrowska DM (1989) Uniform consistency of the kernel conditional Kaplan-Meier estimate. Ann Stat 17(3):1157–1167
Gerds TA, Kattan MW, Schumacher M, Yu C (2013) Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Stat Med 32:2173–2184
Gerds TA, Schumacher M (2006) Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biom J 48:1029–1040
Ghosh D, Lin D (2003) Semiparametric analysis of recurrent events data in the presence of dependent censoring. Biometrics 59:877–885
Ghosh D, Lin DY (2002) Marginal regression models for recurrent and terminal events. Stat Sin 12(3):663–688
Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999) Assessment and comparison of prognostic classification schemes for survival data. Stat Med 18:2529–2545
Harrell FE Jr, Lee KL, Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15:361–387
Heagerty PJ, Zheng Y (2005) Survival model predictive accuracy and ROC curves. Biometrics 61:92–105
Hougaard P (2000) Analysis of multivariate survival data, vol 564. Springer, New York, USA
Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS (2008) Random survival forests. Ann Appl Stat 2:841–860
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley series in probability and statistics. Wiley-Interscience (John Wiley & Sons), Hoboken
Lin D, Wei L, Ying Z (1998) Accelerated failure time models for counting processes. Biometrika 85:605–618
Lin DY, Wei L-J, Yang I, Ying Z (2000) Semiparametric regression for the mean and rate functions of recurrent events. J R Stat Soc Ser B (Stat Methodol) 62:711–730
Liu L, Wolfe RA, Huang X (2004) Shared frailty models for recurrent events and a terminal event. Biometrics 60:747–756
Prentice RL, Williams BJ, Peterson AV (1981) On the regression analysis of multivariate failure time data. Biometrika 68:373–379
Rondeau V, Mathoulin-Pelissier S, Jacqmin-Gadda H, Brouste V, Soubeyran P (2007) Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events. Biostatistics 8:708–721
Scheike TH (2002) The additive nonparametric and semiparametric Aalen model as the rate function for a counting process. Lifetime Data Anal 8:247–262
Schoop R, Schumacher M, Graf E (2011) Measures of prediction error for survival data with longitudinal covariates. Biom J 53:275–293
Schroder J, Bouaziz O, Agner BR, Martinussen T, Madsen PL, Li D, Dixen U (2019) Recurrent event survival analysis predicts future risk of hospitalization in patients with paroxysmal and persistent atrial fibrillation. PLoS One 14:e0217983
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW (2010) Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology (Cambridge, Mass) 21:128
Van Oirbeek R, Lesaffre E (2016) Exploring the clustering effect of the frailty survival model by means of the Brier score. Commun Stat-Simul Comput 45:3294–3306
Acknowledgements
We thank the reviewers for their constructive criticisms and comments that have helped improve the paper.
8 Appendix: proofs of the convergence of the prediction criterion for the expected cumulative number of recurrent events under the two scenarios
In the proof of Proposition 1, we need to verify the key equality from Eq. (3). This result depends on the modelling assumptions and has already been proved in all three different scenarios, see Section 1 of Supplementary Information, Sect. 2.2 of the main manuscript and Section 2 of Supplementary Information for the right-censoring case with no terminal event, the terminal event case, and the dependence on prior counts case, respectively. In the proof of Proposition 2, we also need to have \(\mathbb E[\mu ^*(\tau \mid \bar{X}(\tau ))]<\infty\) which also depends on the different modelling assumptions made under each scenario.
8.1 Proof of Proposition 1
In all three scenarios, we directly have:
Using the fact that \(\mathbb E[\int _0^tdN(u)/(G_c(u\mid \bar{X}(u)))\mid {\bar{X}}(t)]=\mu ^*(t\mid {\bar{X}}(t))\), we conclude that
where
Now, using the identity \(a^2-b^2=(a-b)(a+b)\) and observing that \(\int _0^tdN(u)/(G_c(u\mid \bar{X}(u)))=\sum _{\text {ev.}\le t}\{1/(G_c(\text {ev.}\mid \bar{X}(\text {ev.})))\}\) either equals 0 if no recurrent event was observed before time t or is at least 1 if at least one recurrent event was observed before time t (each weight \(1/G_c\) being at least 1), we conclude that
almost surely. Taking the expectation on both sides proves \(A(t)\ge 0\).
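The decomposition established in this proof can be checked numerically. The following is a minimal simulation sketch, not the author's code, under a hypothetical setting chosen purely for illustration: events follow a Poisson process with rate \(1+2x\), there is no terminal event, and censoring is exponential and independent of the covariate, so that \(G_c\) is known in closed form. Under these assumptions, the gap in score between a covariate-free reference prediction and the true mean function should be close to the mean squared error of the reference, because the term \(A(t)\) cancels in the difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 20000, 1.0

# Hypothetical data-generating model (an assumption for illustration only):
# events follow a Poisson process with rate 1 + 2x, so mu*(t | x) = (1 + 2x) t.
x = rng.uniform(size=n)
rate = 1.0 + 2.0 * x
mu_star = rate * t

# Independent exponential censoring: G_c(u) = P(C >= u) = exp(-0.5 u) is known.
cens_rate = 0.5
c = rng.exponential(scale=1.0 / cens_rate, size=n)

# Simulate event times on [0, t] and keep those that occur before censoring.
counts = rng.poisson(rate * t)
subject = np.repeat(np.arange(n), counts)
times = rng.uniform(0.0, t, size=counts.sum())
seen = times <= c[subject]

# Z_i = sum over observed events of 1 / G_c(event time).
z = np.bincount(subject[seen], weights=np.exp(cens_rate * times[seen]), minlength=n)


def score(pred):
    """Empirical version of E[(Z - pred)^2] with the true G_c plugged in."""
    return float(np.mean((z - pred) ** 2))


mu_ref = np.full(n, z.mean())           # covariate-free reference prediction
gap = score(mu_ref) - score(mu_star)    # A(t) cancels in this difference
mse_ref = float(np.mean((mu_star - mu_ref) ** 2))

print(f"score(true model) = {score(mu_star):.3f}")
print(f"score(reference)  = {score(mu_ref):.3f}")
print(f"score gap = {gap:.3f}, MSE of reference = {mse_ref:.3f}")
```

Note that the score of the true model is not close to zero: it estimates \(A(t)\), which is why the score is meaningful only relative to a reference model.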
8.2 Proof of Proposition 2
We start by proving that \(\mathbb E[\mu ^*(\tau \mid \bar{X}(\tau ))]<\infty\) in the presence of a terminal event (the scenario without a terminal event follows from the same arguments). We have for all \(t\in [0,\tau ]: \mathbb P[C\ge t\mid {\bar{X}}(t)] \ge \mathbb P[T\ge t\mid {\bar{X}}(t)]\ge c\), from Assumption 2. From the same assumption, \(N(\tau )\) is almost surely bounded by a constant. As a consequence,
is almost surely bounded, where the equality has been proved in Sect. 2.2. In the dependence on prior counts case, we have for all \(t\in [0,\tau ]: \mathbb P[C\ge t\mid {\bar{X}}(t)] \ge \mathbb P[T\ge t\mid \bar{X}(t)]=\sum _{l=1}^{L+1}\mathbb P[T\ge t,N(t-)=l-1\mid {\bar{X}}(t)]\ge (L+1)c>0\), where the last two bounds come from Assumption MSM in the Supplementary Information. From the same assumption, \(N(\tau )\) is almost surely bounded by a constant. As a consequence,
is almost surely bounded, where the equality has been proved in the Supplementary Information. The rest of the proof of Proposition 2 is identical in all three scenarios.
We first write \(F_{X(t)}(x)=\mathbb P[X(t)\le x]\), we let \(\mathcal X_{u,v}\) denote the support of the joint distribution of \((X(u),X(v))\) and we write \(F_{X(u),X(v)}(x,y)=\mathbb P[X(u)\le x,X(v)\le y]\). We then introduce the quantity
Write:
By decomposing the square term into three other terms, we bound C(t) in the following way: \(C(t)\le |C_1(t)|+|C_2(t)|+|C_3(t)|\) with
For \(C_1(t)\) we have
and we can deal with all four terms in the same fashion. For instance, for the first term,
using the fact that \(\int _0^t dN(v)/({\hat{G}}_c(v\mid y)G_c(v\mid y))\) is bounded. Then, since \(\int _0^t\mathbb E[dN(u)/(1-G(u-\mid X(u))) \mid X(u)=x]=\mu ^*(t\mid {\bar{X}}(t))\) and \({\hat{G}}_c(u\mid x)^{-1}\) is asymptotically uniformly bounded, we conclude that \(|C_1(t)|\) tends toward 0 in probability using the uniform consistency of the censoring estimator.
For \(C_2(t)\) we use the consistency of \(\widehat{\mu }\) and the fact that \(\mathbb E[\mu ^*(t\mid {\bar{X}}(t))]\) is finite to prove that \(|C_2(t)|\) tends towards 0 in probability.
For \(C_3(t)\), we directly write \((\widehat{\mu }(t\mid x))^2-(\mu (t\mid x))^2=(\widehat{\mu }(t\mid x)-\mu (t\mid x))(\widehat{\mu }(t\mid x)+\mu (t\mid x))\) and we use the fact that \(\mu (t\mid x)\) is bounded and the consistency of \(\widehat{\mu }\) to prove that \(|C_3(t)|\) tends towards 0 in probability.
Similarly to C(t) we obtain the following bound: \(D(t)\le |D_1(t)|+|D_2(t)|+|D_3(t)|\) with
We now use the bound \(|D_1(t)|\le |D_{1,1}(t)| + |D_{1,2}(t)|+|D_{1,3}(t)|\) with
and
The term \(|D_{1,1}(t)|\) converges towards 0 in probability from the strong law of large numbers. The term \(|D_{1,2}(t)|\) is bounded by
\(\sup _{u,v,x,y} |\chi (u,v,x,y)|\) converges towards 0 from the uniform consistency of \({\hat{G}}\) while the other term converges towards a bounded quantity from the law of large numbers. The same argument applies to \(|D_{1,3}(t)|\) which also converges towards 0 in probability.
For \(D_2(t)\) we write \(|D_2(t)|\le |D_{2,1}(t)| + |D_{2,2}(t)|+ |D_{2,3}(t)|+ |D_{2,4}(t)|\) with
The \(D_{2,1}(t)\) term converges towards 0 in probability from the law of large numbers. For \(D_{2,2}(t)\), \(D_{2,3}(t)\) and \(D_{2,4}(t)\) we use the consistency of \(\widehat{\mu }\), the convergence in probability of \(n^{-1}\sum _i\int _0^t dN_i(u)/(G_c(u\mid X_i(u)))\), the boundedness of \(\mathbb E[\mu ^*(t\mid {\bar{X}}(t))]\), the uniform consistency of \({\hat{G}}\) and the asymptotic boundedness of \(\widehat{\mu }\) and \({\hat{G}}_c(u\mid x)^{-1}\) to prove that all three terms converge towards 0 in probability.
Finally, for \(D_3(t)\), we write
Each of the three terms converges towards 0 in probability using the law of large numbers for the first term and the uniform consistency of \(\widehat{\mu }\) for the other two.
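The role played by the uniform consistency of the censoring estimator can be illustrated with a small plug-in experiment. This is a minimal sketch under simplifying assumptions that are not taken from the paper: no terminal event, censoring independent of the covariate and observed for every subject, so that \(G_c\) can be estimated by the empirical survival function of the censoring times (a stand-in for whatever conditional estimator is used in practice). The score computed with the estimated weights should then be close to the score computed with the true weights.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 50000, 1.0

# Hypothetical setting (assumed for illustration): Poisson events with rate
# 1 + 2x, no terminal event, censoring independent of x and always observed.
x = rng.uniform(size=n)
rate = 1.0 + 2.0 * x
cens_rate = 0.5
c = rng.exponential(scale=1.0 / cens_rate, size=n)

counts = rng.poisson(rate * t)
subject = np.repeat(np.arange(n), counts)
times = rng.uniform(0.0, t, size=counts.sum())
seen = times <= c[subject]
subject, times = subject[seen], times[seen]

# True G_c(u) = P(C >= u) at each observed event time, and its empirical
# counterpart: the fraction of subjects whose censoring time is >= u.
g_true = np.exp(-cens_rate * times)
c_sorted = np.sort(c)
g_hat = 1.0 - np.searchsorted(c_sorted, times, side="left") / n


def score(g, pred):
    """Empirical score with inverse censoring weights 1 / g at each event."""
    z = np.bincount(subject, weights=1.0 / g, minlength=n)
    return float(np.mean((z - pred) ** 2))


pred = rate * t  # model under evaluation; here the true mean function
score_true_g = score(g_true, pred)
score_hat_g = score(g_hat, pred)
max_err = float(np.max(np.abs(g_hat - g_true)))

print(f"max |G_hat - G| at event times = {max_err:.4f}")
print(f"score with true G = {score_true_g:.3f}, with estimated G = {score_hat_g:.3f}")
```

With covariate-dependent censoring or a terminal event, the empirical survival function would have to be replaced by a suitable conditional estimator; the sketch only conveys the plug-in idea.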
8.3 Proof of Proposition 3
First, note that the Brier score can be written in the following way:
We now study our prediction score \(\textrm{MSE}'(t,\pi )\). Using standard martingale properties (see for instance Andersen et al. (1993)), we directly have that \(\mathbb E[dN(t)\mid X]=H(t\mid X)\lambda ^*(t\mid X)dt\), where \(H(t\mid X)=\mathbb P[T>t\mid X]=S(t\mid X) G_c(t\mid X)\) under independent censoring and \(\lambda ^*\) is the hazard rate of \(T^*\). As a consequence,
since \(S(u\mid X)\lambda ^*(u\mid X)\) is equal to the conditional density function of \(T^*\). Also, it is important to notice that
where the first equality is due to the fact that N can only jump once and thus \((\int _0^t dN(u)/(G_c(u\mid X)))^2\) is simply equal to \(\Delta I(T\le t)/(G_c(T\mid X))^2\). Now,
with
Now, using Eq. (14), we can rewrite B(t) in the following way:
This shows that \(B(t)\ge 0\) and that this quantity does not depend on \(\pi\).
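The single-event case treated in this proof can also be checked by simulation. The sketch below is illustrative only and rests on assumptions that are not in the paper: \(T^*\) given \(x\) is exponential with rate \(e^{x}\), and censoring is exponential and independent of \((T^*,x)\), so that \(G_c\) is known. Because the data are simulated, the latent status \(I(T^*\le t)\) is available and the uncensored Brier score can be computed directly. The difference between the score and the Brier score, which estimates \(B(t)\), should then be positive and roughly the same for two very different predictions \(\pi\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 50000, 1.0

# Hypothetical single-event model (assumed for illustration): T* | x is
# exponential with rate exp(x); censoring C is exponential with rate 0.5,
# independent of (T*, x), so G_c(u) = exp(-0.5 u).
x = rng.uniform(size=n)
haz = np.exp(x)
t_star = rng.exponential(scale=1.0 / haz)
cens_rate = 0.5
c = rng.exponential(scale=1.0 / cens_rate, size=n)
t_obs = np.minimum(t_star, c)
delta = t_star <= c

# With a single jump, int_0^t dN(u) / G_c(u) = Delta * I(T <= t) / G_c(T).
z = np.where(delta & (t_obs <= t), np.exp(cens_rate * t_obs), 0.0)
y = (t_star <= t).astype(float)  # latent status, known only in a simulation


def score(pi):
    """Empirical version of the score E[(Z - pi)^2] with the true G_c."""
    return float(np.mean((z - pi) ** 2))


def brier(pi):
    """Uncensored Brier score, computable here because T* is simulated."""
    return float(np.mean((y - pi) ** 2))


pi_model = 1.0 - np.exp(-haz * t)   # true P(T* <= t | x)
pi_const = np.full(n, 0.5)          # a deliberately crude prediction
b_model = score(pi_model) - brier(pi_model)
b_const = score(pi_const) - brier(pi_const)

print(f"mean of Z = {z.mean():.3f}, mean of P(T* <= t | x) = {pi_model.mean():.3f}")
print(f"B(t) estimated with the true model = {b_model:.3f}")
print(f"B(t) estimated with the constant   = {b_const:.3f}")
```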
Cite this article
Bouaziz, O. Assessing model prediction performance for the expected cumulative number of recurrent events. Lifetime Data Anal 30, 262–289 (2024). https://doi.org/10.1007/s10985-023-09610-x
DOI: https://doi.org/10.1007/s10985-023-09610-x