Abstract
Interpretability and stability are two important characteristics required when applying high dimensional data in statistics. Although the former has been addressed to some extent by many existing forecasting methods, the latter, in the sense of controlling the fraction of wrongly discovered features, is still largely underdeveloped. Under the accelerated failure time (AFT) model, this paper introduces a controlled variable selection method built on the general framework of Model-X knockoffs to tackle high dimensional data. We provide theoretical justification for asymptotic false discovery rate (FDR) control. The proposed method is attractive because it controls the FDR while preserving predictive power. Several simulation examples are conducted to assess its finite sample performance in terms of interpretability and stability. A real data example from an acute myeloid leukemia study is analyzed to demonstrate the utility of the proposed method in practice.
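To make "controlling the fraction of wrongly discovered features" concrete, the following is a minimal sketch of the generic knockoff+ selection rule of Barber and Candès (2015), which takes one importance statistic W_j per feature and returns the features whose statistic exceeds a data-dependent threshold. It is an illustration under assumptions, not the paper's procedure: the abstract does not say how the statistics are built under the AFT model or which threshold variant is used, and the function names and the example values of W are hypothetical.

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Return the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q,
    or infinity if no such t exists (then nothing is selected)."""
    W = np.asarray(W, dtype=float)
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics estimate the number of false discoveries
        # among the positive ones; the "+1" is the knockoff+ correction.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")

def knockoff_select(W, q):
    """Indices of features selected at target FDR level q."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)

# Hypothetical statistics: a large positive W_j suggests that the original
# feature matters more than its knockoff copy.
W_example = [5.0, 4.0, 3.5, 3.0, 2.5, -1.0, 0.5, 2.0, 1.5, -0.2]
threshold = knockoff_plus_threshold(W_example, q=0.2)
selected = knockoff_select(W_example, q=0.2)
```

In this toy input the threshold lands at the first value of t where at most one "estimated false discovery" remains per five selections. How each W_j is computed (for example, from a penalized fit on the original features and their knockoff copies) is left open here.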
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) (Nos. 12301364, 11901175), the Anhui Province Natural Science Foundation Youth Project (No. 2308085QA09), and the Hubei Key Laboratory of Big Data in Science and Technology (No. E3KF291001).
About this article
Cite this article
He, B., Xia, D. & Pan, Y. High dimensional controlled variable selection with model-X knockoffs in the AFT model. Comput Stat 39, 1993–2009 (2024). https://doi.org/10.1007/s00180-023-01426-5