Abstract
In Bayesian analysis of a statistical model, the predictive distribution is obtained by marginalizing over the parameters with their posterior distributions. Compared to the frequently used point-estimate plug-in method, the predictive distribution yields more reliable predictive likelihoods for new data, especially when the amount of training data is small. Bayesian estimation of a Dirichlet mixture model (DMM) is, in general, not analytically tractable. In our previous work, we proposed a method based on global variational inference for analytically approximating the posterior distributions of the parameters in the DMM. In this paper, we extend that study and propose an algorithm for calculating the predictive distribution of the DMM with the local variational inference (LVI) method. The true predictive distribution of the DMM is analytically intractable. By exploiting the concavity of the multivariate inverse beta function, we introduce an upper bound to the true predictive distribution. Since this upper bound has a global minimum, the problem reduces to seeking an approximation to the true predictive distribution. The approximate predictive distribution obtained by minimizing the upper bound is analytically tractable, which facilitates computation of the predictive likelihood. Evaluations on both synthesized and real data demonstrate the good performance of the proposed LVI-based method in comparison with several conventional methods.
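The contrast between the plug-in method and the predictive distribution can be illustrated on a model where the predictive is tractable in closed form. The following sketch (a toy Beta-Bernoulli model, not the paper's DMM algorithm; all function names are illustrative) shows how the plug-in point estimate can fail badly on small training sets, while marginalizing over the posterior hedges against that failure:

```python
import math

def safe_log(p):
    # Guard against log(0): return -inf instead of raising.
    return math.log(p) if p > 0.0 else float("-inf")

def plugin_loglik(train, test):
    # Plug-in: score the test data with the maximum-likelihood
    # point estimate of the success probability.
    p = sum(train) / len(train)
    return sum(safe_log(p if x else 1.0 - p) for x in test)

def predictive_loglik(train, test, a=1.0, b=1.0):
    # Bayesian predictive: marginalize the Bernoulli parameter over its
    # Beta posterior; the per-point predictive probability of a success
    # is the posterior mean (a + k) / (a + b + n).
    k, n = sum(train), len(train)
    p = (a + k) / (a + b + n)
    return sum(safe_log(p if x else 1.0 - p) for x in test)

train = [1, 1, 1]   # tiny, all-success training set
test = [1, 0, 1]
print(plugin_loglik(train, test))      # -inf: plug-in estimate p = 1
                                       # assigns zero mass to a failure
print(predictive_loglik(train, test))  # finite: the predictive hedges
```

The same effect motivates the predictive distribution of the DMM: with little training data, the posterior uncertainty about the parameters carries real information that a point estimate discards.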
Notes
In the extreme case where the posterior distribution has zero variance, the point estimate carries absolute certainty.
Another Bayesian estimation method was proposed in [28]. However, the method in [28] used the multiple lower-bounds (MLB) approximation to derive an analytically tractable solution. In contrast, the method presented in Ma et al., Bayesian estimation of Dirichlet mixture model with variational inference (unpublished), used the single lower-bound (SLB) approximation. As discussed in that work, the MLB approximation-based solution cannot guarantee convergence, while the SLB approximation-based solution is more concise and guarantees convergence.
If a function f(x) is not convex in x but is convex as a function of ln x, it is said to be “convex relative to” ln x.
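A numeric illustration of this footnote (an assumed example, not the function used in the paper): f(x) = √x is concave in x, yet g(t) = f(exp(t)) = exp(t/2) is convex in t = ln x, so √x is convex relative to ln x. A midpoint test makes the distinction concrete:

```python
import math

def midpoint_convex(g, t1, t2):
    # Midpoint convexity check: g((t1 + t2) / 2) <= (g(t1) + g(t2)) / 2.
    return g(0.5 * (t1 + t2)) <= 0.5 * (g(t1) + g(t2))

# In x: sqrt fails the convexity check (it is concave) ...
print(midpoint_convex(math.sqrt, 1.0, 9.0))   # sqrt(5) > (1 + 3) / 2

# ... but in t = ln x the same function passes for the same pair of points.
print(midpoint_convex(lambda t: math.sqrt(math.exp(t)),
                      math.log(1.0), math.log(9.0)))
```

A midpoint check at one pair of points is of course only suggestive; convexity proper requires the inequality for all point pairs and weights.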
To prevent confusion, we use f(x; a) to denote the PDF of x parameterized by parameter a. f(x|a) is used to denote the conditional PDF of x given a, where both x and a are random variables. Both f(x; a) and f(x|a) have exactly the same mathematical expressions.
\(\tilde {\mathbf {u}}_{\backslash j}\) denotes all the elements in \(\tilde {\mathbf {u}}\) except \(\tilde {u}_j\).
The KL divergence from f(x) to g(x) is calculated as \(\text {KL}(f\|g)=\int f(x)\ln \frac {f(x)}{g(x)} dx\).
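For a concrete instance of this integral (a sketch under assumed Gaussian densities, not a quantity from the paper), the KL divergence between two univariate Gaussians has a closed form that can be checked against direct numerical evaluation of the footnote's integral:

```python
import math

def kl_gauss(m1, s1, m2, s2):
    # Closed form for KL(N(m1, s1^2) || N(m2, s2^2)).
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def kl_numeric(m1, s1, m2, s2, lo=-20.0, hi=20.0, n=200000):
    # Midpoint-rule approximation of  integral f(x) ln(f(x)/g(x)) dx.
    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        fx, gx = pdf(x, m1, s1), pdf(x, m2, s2)
        if fx > 0.0:
            total += fx * math.log(fx / gx) * h
    return total

print(kl_gauss(0.0, 1.0, 1.0, 2.0))    # closed form
print(kl_numeric(0.0, 1.0, 1.0, 2.0))  # numerical integral, close to it
```

Note the asymmetry of the divergence: KL(f‖g) and KL(g‖f) generally differ, which is why the footnote fixes the direction "from f(x) to g(x)".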
⊘ denotes element-wise division.
Here, the dimensionalities of the mDWT coefficients are the same for all the channels.
References
Bjørnstad, J.F. (1990). Predictive likelihood: a review. Statistical Science, 5, 242–254.
Bishop, C.M. (2006). Pattern recognition and machine learning. New York: Springer.
Sorenson, H.W. (1980). Parameter estimation: principles and problems. New York: Marcel Dekker.
Kamen, E.W., & Su, J. (1999). Introduction to optimal estimation, ser. Advanced textbooks in control and signal processing. London: Springer.
Gelman, A., Meng, X.-L., Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–807.
Sinharay, S., & Stern, H.S. (2003). Posterior predictive model checking in hierarchical models. Journal of Statistical Planning and Inference, 111, 209–221.
Patel, J.K., & Read, C.B. (1996). Handbook of the normal distribution, ser. Statistics, textbooks and monographs. Marcel Dekker.
Jain, A.K., Duin, R.P.W., Mao, J. (2000). Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 4–37.
McLachlan, G., & Peel, D. (2000). Finite mixture models, ser. Wiley series in probability and statistics: applied probability and statistics. Wiley.
Figueiredo, M.A.T., & Jain, A.K. (2002). Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 381–396.
McLachlan, G.J., & Krishnan, T. (2008). The EM algorithm and extensions, ser. Wiley series in probability and statistics. Wiley-Interscience.
Banfield, J.D., & Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821.
Ma, Z. (2011). Non-Gaussian statistical models and their applications. Ph.D. dissertation, US-AB, Stockholm: KTH - Royal Institute of Technology.
Ma, Z., & Leijon, A. (2011). Bayesian estimation of beta mixture models with variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11), 2160–2173.
Atapattu, S., Tellambura, C., Jiang, H. (2011). A mixture Gamma distribution to model the SNR of wireless channels. IEEE Transactions on Wireless Communications, 10(12), 4193–4203.
Ma, Z., Leijon, A., Kleijn, W.B. (2013). Vector quantization of LSF parameters with a mixture of Dirichlet distributions. IEEE Transactions on Audio, Speech, and Language Processing, 21(9), 1777–1790.
Bouguila, N., Ziou, D., Vaillancourt, J. (2004). Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Transactions on Image Processing, 13(11), 1533–1543.
Blei, D.M. (2004). Probabilistic models of text and images. Ph.D. dissertation. University of California, Berkeley.
Rana, P.K., Ma, Z., Taghia, J., Flierl, M. (2013). Multiview depth map enhancement by variational Bayes inference estimation of Dirichlet mixture models. In Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP).
Ma, Z., & Leijon, A. (2010). Modeling speech line spectral frequencies with Dirichlet mixture models. In Proceedings of INTERSPEECH (pp. 2370–2373).
Blei, D.M., Ng, A.Y., Jordan, M.I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Blei, D.M., & Jordan, M.I. (2005). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1, 121–144.
Orbanz, P., & Teh, Y.W. (2010). Bayesian nonparametric models. Encyclopedia of Machine Learning, 88–89.
Orbanz, P. (2010). Construction of nonparametric Bayesian models from parametric Bayes equations. In Advances in neural information processing systems.
Ghahramani Z. (2012). Bayesian non-parametrics and the probabilistic approach to modelling. Philosophical Transactions of the Royal Society A, 371.
Minka, T.P. (2003). Estimating a Dirichlet distribution. Technical report.
Bouguila, N., & Ziou, D. (2007). High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10), 1716–1731.
Fan, W., Bouguila, N., Ziou, D. (2012). Variational learning for finite Dirichlet mixture models and applications. IEEE Transactions on Neural Networks and Learning Systems, 23(5), 762–774.
Palmer, J.A. (2003). Relative convexity. ECE Dept., UCSD Tech. Rep.
Blei, D.M., & Lafferty, J.D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1, 17–35.
Jaakkola, T.S., & Jordan, M.I. (2000). Bayesian parameter estimation via variational methods. Statistics and Computing, 10, 25–37.
Jaakkola, T.S. (2001). Tutorial on variational approximation methods. In M. Opper & D. Saad (Eds.), Advances in mean field methods (pp. 129–159). Cambridge: MIT Press.
Hoffman, M., Blei, D., Cook, P. (2010). Bayesian nonparametric matrix factorization for recorded music. In Proceedings of the international conference on machine learning.
Minka, T.P. (2001). Expectation propagation for approximate Bayesian inference. In Proceedings of the seventeenth conference on uncertainty in artificial intelligence (pp. 362–369).
Minka, T.P. (2001). A family of algorithms for approximate Bayesian inference. Ph.D. dissertation. Massachusetts Institute of Technology.
Ma, Z. (2012). Bayesian estimation of the Dirichlet distribution with expectation propagation. In Proceeding of the 20th European signal processing conference (pp. 689–693).
Ma, Z., & Leijon, A. (2011). Approximating the predictive distribution of the beta distribution with the local variational method. In Proceedings of IEEE international workshop on machine learning for signal processing (pp. 1–6).
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Brookes, M. (2013). The matrix reference manual. Available online: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html. Accessed 9 Aug 2013.
Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., Arnaldi, B. (2007). A review of classification algorithms for EEG-based brain-computer interfaces. Journal of Neural Engineering, 4(2), R1.
Prasad, S., Tan, Z.-H., Prasad, R., Cabrera, A.F., Gu, Y., Dremstrup, K. (2011). Feature selection strategy for classification of single-trial EEG elicited by motor imagery. In International symposium on wireless personal multimedia communications (WPMC) (pp. 1–4).
Ma, Z., Tan, Z.-H., Prasad, S. (2012). EEG signal classification with super-Dirichlet mixture model. In Proceedings of IEEE statistical signal processing workshop (pp. 440–443).
Subasi, A. (2007). EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Systems with Applications, 32(4), 1084–1093.
Farina, D., Nascimento, O.F., Lucas, M.F., Doncarli, C. (2007). Optimization of wavelets for classification of movement-related cortical potentials generated by variation of force-related parameters. Journal of Neuroscience Methods, 162, 357–363.
Ma, Z., & Leijon, A. (2011). Super-Dirichlet mixture models using differential line spectral frequencies for text-independent speaker identification. In Proceedings of INTERSPEECH (pp. 2349–2352).
BCI competition III. http://www.bbci.de/competition/iii.
Lal, T.N., Schroder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Scholkopf, B. (2004). Support vector channel selection in BCI. IEEE Transactions on Biomedical Engineering, 51(6), 1003–1010.
Malina, W. (1981). On an extended fisher criterion for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(5), 611–614.
Ma, Z., Leijon, A., Tan, ZH. et al. Predictive Distribution of the Dirichlet Mixture Model by Local Variational Inference. J Sign Process Syst 74, 359–374 (2014). https://doi.org/10.1007/s11265-013-0769-8