
The minimum weighted covariance determinant estimator for high-dimensional data

  • Regular Article
Advances in Data Analysis and Classification

Abstract

In a variety of applications, it is desirable to perform a robust analysis of high-dimensional measurements that is not harmed by the presence of a possibly large percentage of outlying measurements. The minimum weighted covariance determinant (MWCD) estimator, which is based on implicit weights assigned to individual observations, is a promising and flexible extension of the popular minimum covariance determinant (MCD) estimator of the expectation and scatter matrix of multivariate data. In this work, a regularized version of the MWCD, denoted as the minimum regularized weighted covariance determinant (MRWCD) estimator, is proposed together with an outlier detection procedure. In several simulation scenarios, the novel MRWCD estimator outperforms other available robust estimators, especially in estimating the scatter matrix of contaminated high-dimensional data.
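The abstract outlines the idea but does not state the paper's algorithm, weight function, regularization target or outlier cutoff. The following is therefore only a minimal NumPy sketch of an MWCD-type concentration iteration with regularization, built on several assumptions:

* Implicit weights are assigned by the rank of each observation's Mahalanobis distance, with the closest observation receiving the largest weight. The specific weights here are hypothetical trimmed, linearly decreasing ones.
* The weighted covariance is shrunk linearly toward a scaled identity target with a fixed, hypothetical `rho`.
* Observations are flagged by a hypothetical robust z-score rule.

All function names are illustrative. This is not the authors' MRWCD estimator or their outlier detection procedure.

```python
import numpy as np


def trimmed_linear_weights(n, h):
    """Hypothetical weight sequence: linearly decreasing for the h
    best-ranked observations and zero for the remaining n - h,
    normalised so that the weights sum to 1."""
    w = np.zeros(n)
    w[:h] = (h - np.arange(h)) / h  # 1, (h-1)/h, ..., 1/h
    return w / w.sum()


def regularized_weighted_cov(X, w, rho):
    """Weighted mean and weighted covariance (weights sum to 1), shrunk
    toward the assumed target trace(S)/p * I. For rho in (0, 1] and
    trace(S) > 0 the result is positive definite even when p > n.
    No consistency factor is applied (a simplification)."""
    p = X.shape[1]
    mu = w @ X
    Xc = X - mu
    S = (Xc * w[:, None]).T @ Xc
    target = np.trace(S) / p * np.eye(p)
    return mu, rho * target + (1.0 - rho) * S


def mahalanobis_sq(X, mu, cov):
    """Squared Mahalanobis distance of every row of X."""
    diff = X - mu
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def mrwcd_sketch(X, h, rho=0.5, max_iter=100):
    """Concentration-type iteration with implicit (rank-based) weights:
    the observation with the smallest distance gets the largest weight.
    Stops when the weight assignment no longer changes or after max_iter."""
    n = X.shape[0]
    w_sorted = trimmed_linear_weights(n, h)
    # Assumed start: coordinatewise median and an equally weighted,
    # regularized covariance of all observations.
    _, cov = regularized_weighted_cov(X, np.full(n, 1.0 / n), rho)
    mu = np.median(X, axis=0)
    w = None
    for _ in range(max_iter):
        d2 = mahalanobis_sq(X, mu, cov)
        new_w = np.empty(n)
        new_w[np.argsort(d2, kind="stable")] = w_sorted
        if w is not None and np.array_equal(new_w, w):
            break
        w = new_w
        mu, cov = regularized_weighted_cov(X, w, rho)
    return mu, cov, mahalanobis_sq(X, mu, cov)


def flag_outliers(d2, cutoff=3.0):
    """Hypothetical rule: flag observations whose distance has a robust
    z-score (median and scaled MAD) above the cutoff."""
    d = np.sqrt(d2)
    med = np.median(d)
    mad = 1.4826 * np.median(np.abs(d - med))
    return (d - med) / mad > cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 80))  # n = 60 < p = 80
    X[50:] += 10.0                     # last 10 rows are shifted outliers
    mu, cov, d2 = mrwcd_sketch(X, h=45)
    print(np.flatnonzero(flag_outliers(d2)))
```

Shrinking toward a positive definite target keeps the scatter estimate invertible when there are more variables than observations. Plain MCD-type estimators break down in that setting because the subset covariance is singular. The choice of `h`, `rho` and the cutoff is left to the user in this sketch.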

Fig. 1
Fig. 2

Acknowledgements

The research was supported by the projects GA21-05325S and GA19-05704S of the Czech Science Foundation. The authors would like to thank Jurjen Duintjer Tebbens for discussion, and would like to thank the anonymous referees, an associate editor, and the editor-in-chief for their time and constructive advice.

Author information

Correspondence to Jan Kalina.

About this article

Cite this article

Kalina, J., Tichavský, J. The minimum weighted covariance determinant estimator for high-dimensional data. Adv Data Anal Classif 16, 977–999 (2022). https://doi.org/10.1007/s11634-021-00471-6

  • DOI: https://doi.org/10.1007/s11634-021-00471-6
