Regression trees and ensembles for cumulative incidence functions

Youngjoo Cho; Annette M. Molinaro; Chen Hu; Robert L. Strawderman

doi:10.1515/ijb-2021-0014

Published by De Gruyter March 25, 2022

From the journal The International Journal of Biostatistics

Abstract

The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past two decades. The problems of modeling, estimation and inference have been treated using parametric, nonparametric and semi-parametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and ensemble methods, have begun only comparatively recently. In this paper, we propose a novel approach to estimating cumulative incidence curves in a competing risks setting using regression trees and associated ensemble estimators. The proposed methods use augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and can be easily implemented using existing R packages. Data from the Radiation Therapy Oncology Group (trial 9410) are used to illustrate these new methods.
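To make the role of the Brier score concrete, the following is a minimal sketch under a strong simplifying assumption: there is no censoring, so the event time and cause are observed for every subject. In that case the Brier score at a fixed time t for a predicted cumulative incidence is the mean squared error against the indicator that the event of interest has occurred by t, and a tree split can be chosen to minimize it. The function names `brier_score` and `best_split` and the toy data are hypothetical and chosen only for illustration; the augmented estimators that the paper uses to handle censoring are not reproduced here.

```python
def event_indicator(time, cause_observed, t, cause=1):
    """Return 1.0 if the event of interest occurred by time t, else 0.0."""
    return 1.0 if (time <= t and cause_observed == cause) else 0.0


def brier_score(times, causes, predictions, t, cause=1):
    """Empirical Brier score at time t for predicted cumulative incidences.

    Assumes fully observed (uncensored) event times and causes.
    """
    total = 0.0
    for time, k, p in zip(times, causes, predictions):
        z = event_indicator(time, k, t, cause)
        total += (z - p) ** 2
    return total / len(times)


def node_loss(z_values):
    """Sum of squared deviations from the node mean (the node's Brier loss)."""
    if not z_values:
        return 0.0
    mean = sum(z_values) / len(z_values)
    return sum((z - mean) ** 2 for z in z_values)


def best_split(x, times, causes, t, cause=1):
    """Search thresholds on one covariate x for the split with minimal loss.

    Each child node predicts the proportion of its subjects who had the
    event of interest by time t. Returns (threshold, total_loss).
    """
    z = [event_indicator(ti, k, t, cause) for ti, k in zip(times, causes)]
    best = None
    for c in sorted(set(x))[:-1]:  # the largest value cannot split the node
        left = [zi for xi, zi in zip(x, z) if xi <= c]
        right = [zi for xi, zi in zip(x, z) if xi > c]
        loss = node_loss(left) + node_loss(right)
        if best is None or loss < best[1]:
            best = (c, loss)
    return best


# Toy data (hypothetical): one covariate, two competing causes.
x = [1, 2, 3, 4, 5, 6]
times = [1, 2, 1, 5, 6, 7]
causes = [1, 1, 1, 2, 1, 1]

score = brier_score(times, causes, [0.5] * 6, t=3)
split = best_split(x, times, causes, t=3)
```

With these toy data, subjects with x at most 3 all experience the event of interest by t = 3 and the others do not, so the search selects the threshold 3. With censored data the indicator is not observed for every subject, which is the gap that the censoring-adjusted loss estimators described in the abstract are designed to address.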

Keywords: Brier score; CART; cause-specific hazard; competing risks; Fine and Gray model; random forests

Corresponding author: Robert L. Strawderman, Department of Biostatistics & Computational Biology, University of Rochester, Rochester, NY, United States, E-mail: robert_strawderman@urmc.rochester.edu

Funding source: National Institutes of Health

Award Identifier / Grant number: R01CA163687

Acknowledgment

We thank the NRG Oncology Statistics and Data Management Center for providing de-identified RTOG 9410 clinical trial data under a data use agreement.

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was partially supported by the National Institutes of Health (R01CA163687: AMM, RLS, YC; U10-CA180822: CH).
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2021-0014).

Received: 2021-02-15

Accepted: 2022-03-02

Published Online: 2022-03-25
