Abstract
The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past two decades. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and ensemble methods, have begun comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods use augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and can be easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) are used to illustrate these new methods.
Acknowledgment
We thank the NRG Oncology Statistics and Data Management Center for providing de-identified RTOG 9410 clinical trial data under a data use agreement.
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was partially supported by the National Institutes of Health (R01CA163687: AMM, RLS, YC; U10-CA180822: CH).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2021-0014).
© 2022 Walter de Gruyter GmbH, Berlin/Boston