Abstract
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to predict survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach constructs semiparametrically efficient estimators of the linear association between each predictor and the survival outcome, and uses them to build a test statistic for detecting an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic. This calibration enables the construction of confidence intervals for the maximal association between the predictors and the outcome, and it also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support this asymptotic guarantee at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
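The confirmation bias mentioned above can be seen in a small simulation. The sketch below is not the authors' method. It assumes uncensored, standard normal data in which the outcome is independent of every predictor, so every true correlation is 0. It then compares the naive estimate for the predictor with the largest sample correlation against the estimate for a predictor fixed before seeing the data. Censoring, survival-specific estimators and the stabilization step are deliberately left out. The function names `sample_correlations` and `simulate` and the settings `n`, `p`, `reps` are hypothetical choices made for illustration.

```python
import numpy as np


def sample_correlations(X, y):
    """Pearson correlation of y with each column of X (population-style std, ddof=0)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return Xs.T @ ys / len(y)


def simulate(n=100, p=1000, reps=200, seed=0):
    """Assumed null setting: y is independent of all p predictors.

    Returns the average naive |correlation| for (a) the predictor selected as
    most correlated in each sample and (b) a predictor fixed in advance.
    """
    rng = np.random.default_rng(seed)
    selected, prespecified = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # no true association with any column of X
        corr = sample_correlations(X, y)
        selected.append(np.max(np.abs(corr)))  # data-driven choice of the "best" predictor
        prespecified.append(abs(corr[0]))      # predictor 0, chosen before seeing the data
    return float(np.mean(selected)), float(np.mean(prespecified))


sel, pre = simulate()
print(f"selected predictor: {sel:.3f}  prespecified predictor: {pre:.3f}")
```

Although every true correlation is 0, the selected predictor's naive estimate is typically several times larger than the prespecified one. Inference on the maximal association has to account for this effect of selection.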
Funding Statement
AL was supported by the National Institutes of Health (NIH) under award number DP2-LM013340.
IWM was supported by NIH under award 1R01 AG062401 and by the National Science Foundation (NSF) under award DMS-2112938. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH or NSF.
Acknowledgments
We thank Peter Gilbert for suggesting the application in Section 7.
Citation
Tzu-Jung Huang, Alex Luedtke, and Ian W. McKeague. "Efficient estimation of the maximal association between multiple predictors and a survival outcome." Ann. Statist. 51(5): 1965–1988, October 2023. https://doi.org/10.1214/23-AOS2313