Network-Based Discriminant Analysis for Multiclassification

Li-Pang Chen¹ (ORCID: orcid.org/0000-0001-5440-5036)

Abstract

Classification for multi-label responses, known as multiclassification, is an important problem in supervised learning. In the framework of statistical learning, discriminant analysis is a powerful method for multiclassification. As complex data become increasingly available, analyzing them becomes more challenging. One important feature of complex data is network structure, which is ubiquitous in high-dimensional data because of strong or weak correlations among variables. Although discriminant analysis is an established supervised learning method for multiclassification and relevant extensions have been explored, few methods are available that handle multiclassification while accommodating network structures. To incorporate network structures in predictors and improve the accuracy of classification, we propose network-based linear discriminant analysis and network-based quadratic discriminant analysis in this paper. The main advantage of the proposed methods is that they estimate the inverse of the covariance matrices directly and classify multi-label responses instead of being restricted to binary responses. In addition, the proposed methods are easy to compute and implement. Finally, numerical studies are conducted to assess the performance of the proposed methods, and the numerical results indicate that the proposed methods outperform their competitors.
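To make the role of the inverse covariance matrix concrete, the following is a minimal sketch of a multiclass linear discriminant rule written in terms of a precision matrix. It is not the estimator proposed in the paper: the function names, the `ridge` parameter, and the simulated data are assumptions introduced for illustration, and the precision matrix is obtained from a ridge-regularized inverse of the pooled covariance as a stand-in. A network-based approach would presumably replace that step with a sparse precision estimate, for example one from the graphical lasso.

```python
import numpy as np


def fit_lda_with_precision(X, y, ridge=1e-3):
    """Estimate class priors, class means, and a shared precision matrix."""
    classes = np.unique(y)
    n, p = X.shape
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.vstack([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance around each observation's class mean.
    centered = X - means[np.searchsorted(classes, y)]
    pooled = centered.T @ centered / (n - len(classes))
    # Stand-in estimator (assumption): ridge-regularized inverse.
    # A sparse, network-aware precision estimate would be plugged in here.
    precision = np.linalg.inv(pooled + ridge * np.eye(p))
    return classes, priors, means, precision


def predict_lda(X, classes, priors, means, precision):
    """Assign each row of X to the class with the largest discriminant score."""
    # delta_k(x) = x' Theta mu_k - 0.5 * mu_k' Theta mu_k + log(pi_k)
    linear = X @ precision @ means.T
    offset = -0.5 * np.einsum("kp,pq,kq->k", means, precision, means)
    scores = linear + offset + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


# Hypothetical simulated data: three well-separated Gaussian classes.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [6.0, 0.0, 0.0], [0.0, 6.0, 0.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(size=(150, 3))

model = fit_lda_with_precision(X, y)
classes, priors, means, precision = model
accuracy = np.mean(predict_lda(X, *model) == y)
print(f"training accuracy: {accuracy:.3f}")
```

Because the rule only ever uses the precision matrix, any estimator that returns one directly can be swapped in without inverting a covariance matrix afterwards. A quadratic variant would use one precision matrix per class and add a log-determinant term to each score.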

Data Availability

The real dataset is available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/glass+identification.

Acknowledgements

The author thanks the editor, the associate editor, and three referees for their helpful comments on the initial version.

Author information

Authors and Affiliations

Department of Statistics, National Chengchi University, Taipei, 116, Taiwan (ROC)
Li-Pang Chen

Corresponding author

Correspondence to Li-Pang Chen.

Ethics declarations

Ethics Approval

This research does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The author declares no competing interests.

Additional information

Supplementary Material

The supplementary material contains simulation studies and their numerical results for the proposed methods.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, LP. Network-Based Discriminant Analysis for Multiclassification. J Classif 39, 410–431 (2022). https://doi.org/10.1007/s00357-022-09414-y

Accepted: 30 March 2022
Published: 02 June 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s00357-022-09414-y
