Abstract
We are motivated by the problem of identifying potentially nonlinear regression relationships between high-dimensional outputs and high-dimensional inputs of heterogeneous data. This requires performing regression, clustering, and model selection simultaneously. In this framework, we apply mixture of experts (MoE) models, which are among the most popular ensemble learning techniques developed in the field of neural networks. In particular, we consider a general class of MoE models characterized by multiple Gaussian experts whose means are polynomials of the input variables and whose covariance matrices have block-diagonal structures. Moreover, each expert is weighted by a gating network defined as a softmax function of a polynomial of the input variables. These models involve several hyper-parameters, including the number of mixture components, the complexity of the softmax gating networks and Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices. We provide a non-asymptotic theory for the selection of such complex hyper-parameters via the slope heuristic approach in a penalized maximum likelihood estimation framework. Specifically, we establish a non-asymptotic risk bound for the penalized maximum likelihood estimator, which takes the form of an oracle inequality, under lower-bound assumptions on the penalty function.
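For concreteness, the following is a minimal sketch, in generic notation (the symbols \(K\), \(w_k\), \(\boldsymbol{\upsilon}_k\), \(\boldsymbol{\Sigma}_k\), \(S_m\), and \(\mathrm{pen}\) are illustrative and need not match the paper's own), of the softmax-gated Gaussian mixture of experts density and of the penalized model selection criterion referred to in the abstract:
\[
  s_{\boldsymbol{\psi}}(\mathbf{y}\mid\mathbf{x})
  = \sum_{k=1}^{K} g_k(\mathbf{x};\boldsymbol{w})\,
    \phi\!\bigl(\mathbf{y};\,\boldsymbol{\upsilon}_k(\mathbf{x}),\,\boldsymbol{\Sigma}_k\bigr),
  \qquad
  g_k(\mathbf{x};\boldsymbol{w})
  = \frac{\exp\{w_k(\mathbf{x})\}}{\sum_{l=1}^{K}\exp\{w_l(\mathbf{x})\}},
\]
where each gating score \(w_k\) and each mean \(\boldsymbol{\upsilon}_k\) is a polynomial of the input \(\mathbf{x}\), \(\phi\) denotes the multivariate Gaussian density, and each \(\boldsymbol{\Sigma}_k\) is constrained to be block-diagonal. Given a collection of candidate models \(\{S_m\}_{m\in\mathcal{M}}\) indexed by these hyper-parameters, the selected model is the penalized maximum likelihood minimizer
\[
  \widehat{m} \in \operatorname*{arg\,min}_{m\in\mathcal{M}}
  \Bigl\{ -\tfrac{1}{n}\sum_{i=1}^{n}\log \widehat{s}_m(\mathbf{y}_i\mid\mathbf{x}_i) + \mathrm{pen}(m) \Bigr\}.
\]
The announced oracle inequality then bounds the risk of the selected estimator, up to a multiplicative constant, by the best trade-off between approximation error and penalty over the collection, provided \(\mathrm{pen}(m)\) exceeds a stated lower bound.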