Abstract
In this paper, we propose a new correlation, called stable correlation, to measure the dependence between two random vectors. The new correlation is well defined without any moment condition and is zero if and only if the two random vectors are independent. We also study its other theoretical properties. Based on the new correlation, we further propose a robust model-free feature screening procedure for ultrahigh dimensional data and establish its sure screening property and rank consistency property without imposing the subexponential or sub-Gaussian tail condition that is commonly required in the feature screening literature. We also examine the finite sample performance of the proposed robust feature screening procedure via Monte Carlo simulation studies and illustrate the procedure with a real data example.
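The abstract does not define stable correlation, so the following is only a minimal sketch of the generic marginal screening recipe it alludes to: score each feature by a dependence measure against the response, rank the scores, and keep the top d. As a stand-in for the paper's measure, the sketch uses the absolute Spearman rank correlation, which also needs no moment condition but, unlike the stable correlation described above, detects only monotone dependence. The names `ranks`, `abs_spearman`, `screen`, and the cutoff `d` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ranks(v):
    # Ordinal ranks 0..n-1; ties are broken by position, which is
    # adequate for continuous data where ties have probability zero.
    order = np.argsort(v, kind="stable")
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(len(v))
    return r

def abs_spearman(x, y):
    # Stand-in dependence measure (NOT the paper's stable correlation):
    # absolute Pearson correlation of the ranks.
    return abs(np.corrcoef(ranks(x), ranks(y))[0, 1])

def screen(X, y, d, measure=abs_spearman):
    # Score every column of X marginally against y, then return the
    # indices of the d highest-scoring columns together with all scores.
    scores = np.array([measure(X[:, j], y) for j in range(X.shape[1])])
    keep = np.argsort(-scores)[:d]
    return keep, scores

# Hypothetical heavy-tailed example: Cauchy features and noise, so no
# moments exist and moment-based tail conditions fail.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_t(df=1, size=(n, p))
y = X[:, 0] + 2 * X[:, 1] + rng.standard_t(df=1, size=n)
selected, scores = screen(X, y, d=20)
```

Any other dependence measure with the signature `measure(x, y) -> float` can be passed to `screen` in place of `abs_spearman`; an implementation of the stable correlation itself would follow the definition given in the body of the paper.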
Acknowledgements
The first author was supported by the National Natural Science Foundation of China (Grant No. 11701034). The second author was supported by the National Science Foundation of the USA (Grant No. DMS-1820702). The authors are grateful to the two anonymous referees for their constructive comments and suggestions, which led to significant improvement of an earlier version of the manuscript.
Cite this article
Guo, X., Li, R., Liu, W. et al. Stable correlation and robust feature screening. Sci. China Math. 65, 153–168 (2022). https://doi.org/10.1007/s11425-019-1702-5
DOI: https://doi.org/10.1007/s11425-019-1702-5