Predictive Distribution of the Dirichlet Mixture Model by Local Variational Inference

Journal of Signal Processing Systems

Abstract

In Bayesian analysis of a statistical model, the predictive distribution is obtained by marginalizing over the parameters with respect to their posterior distributions. Compared with the frequently used plug-in method based on a point estimate, the predictive distribution gives a more reliable predictive likelihood for new data, especially when the amount of training data is small. Bayesian estimation of a Dirichlet mixture model (DMM) is, in general, not analytically tractable. In our previous work, we proposed a global variational inference-based method for analytically approximating the posterior distributions of the DMM parameters. In this paper, we extend that study and propose an algorithm that calculates the predictive distribution of the DMM with the local variational inference (LVI) method. The true predictive distribution of the DMM is analytically intractable. By exploiting the concavity of the multivariate inverse beta function, we introduce an upper bound to the true predictive distribution. Since this upper bound has a global minimum, the problem reduces to seeking an approximation to the true predictive distribution. The approximated predictive distribution obtained by minimizing the upper bound is analytically tractable, which facilitates computing the predictive likelihood. Evaluations on synthesized and real data demonstrate the good performance of the proposed LVI-based method in comparison with some conventionally used methods.
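The contrast between the plug-in method and the predictive distribution can be seen in a much simpler conjugate setting than the DMM. The sketch below assumes a categorical likelihood with a symmetric Dirichlet prior (concentration `alpha = 1.0`, chosen only for illustration), where marginalizing the parameters over their posterior gives the closed form \((n_k + \alpha)/(N + K\alpha)\). It is not the paper's LVI method, which targets the DMM, whose predictive distribution has no closed form; all function names are hypothetical.

```python
import math
from collections import Counter


def plugin_probs(data, num_categories):
    """Point-estimate (maximum-likelihood) plug-in: relative frequencies n_k / N."""
    counts = Counter(data)
    n = len(data)
    return [counts[k] / n for k in range(num_categories)]


def predictive_probs(data, num_categories, alpha=1.0):
    """Posterior predictive under an assumed symmetric Dirichlet(alpha) prior.

    Integrating the categorical parameters over their Dirichlet posterior gives
    p(x_new = k | data) = (n_k + alpha) / (N + K * alpha).
    """
    counts = Counter(data)
    denom = len(data) + num_categories * alpha
    return [(counts[k] + alpha) / denom for k in range(num_categories)]


def log_likelihood(probs, new_data):
    """Log-likelihood of new data; -inf if any observed category has probability 0."""
    total = 0.0
    for k in new_data:
        if probs[k] == 0.0:
            return float("-inf")
        total += math.log(probs[k])
    return total


train = [0, 0, 1]  # small training set: category 2 is never observed
test = [2, 0]
K = 3
print(log_likelihood(plugin_probs(train, K), test))      # -inf
print(log_likelihood(predictive_probs(train, K), test))  # ln(1/12), about -2.485
```

With only three training samples, the plug-in estimate assigns zero probability to the unseen category, so the log-likelihood of the test data collapses to −∞, whereas the predictive distribution keeps it finite.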

Figure 1
Figure 2
Figure 3

Notes

  1. In the extreme case where the posterior distribution has zero variance, the point estimate is known with absolute certainty.

  2. Another Bayesian estimation method was proposed in [28]. However, that method used the multiple lower-bounds (MLB) approximation to derive an analytically tractable solution. In contrast, the method presented in Ma et al., Bayesian estimation of Dirichlet mixture model with variational inference (unpublished), used the single lower-bound (SLB) approximation. As discussed in that work, the MLB-based solution cannot guarantee convergence, whereas the SLB-based solution is more concise and guarantees convergence.

  3. If a function f(x) is not convex in x but convex in ln x, it is called “convex relative to” ln x.

  4. To avoid confusion, we use f(x; a) to denote the PDF of x parameterized by a, and f(x|a) to denote the conditional PDF of x given a, where both x and a are random variables. The two have exactly the same mathematical expression.

  5. \(\tilde {\mathbf {u}}_{\backslash j}\) denotes all the elements in \(\tilde {\mathbf {u}}\) except \(\tilde {u}_j\).

  6. The KL divergence from f(x) to g(x) is calculated as \(\text {KL}(f\|g)=\int f(x)\ln \frac {f(x)}{g(x)} dx\)

  7. ⊘ is the element-wise division.

  8. Here, the dimensionalities of the mDWT coefficients are the same for all the channels.
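The relative convexity of footnote 3 can be checked numerically by substituting \(t = \ln x\) and testing whether \(g(t) = f(e^t)\) is convex. The sketch below uses \(f(x) = \sqrt{x}\) purely as an illustrative assumption (it is not a function from the paper): \(\sqrt{x}\) is concave in x, yet \(e^{t/2}\) is convex in t, so \(\sqrt{x}\) is convex relative to ln x. A finite-difference test on a grid is evidence rather than proof, and the helper names are hypothetical.

```python
import math


def second_differences(g, points, h=1e-3):
    """Central second differences (g(t+h) - 2 g(t) + g(t-h)) / h^2 at each point."""
    return [(g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h) for t in points]


def is_convex_on(g, points, tol=1e-6):
    """Numerical check: every second difference is non-negative up to tol."""
    return all(d >= -tol for d in second_differences(g, points))


def f(x):
    """Example function (an assumption for illustration): f(x) = sqrt(x)."""
    return math.sqrt(x)


xs = [0.1 * i for i in range(1, 101)]  # grid on x in (0, 10]
ts = [math.log(x) for x in xs]         # the same grid in t = ln x

convex_in_x = is_convex_on(f, xs)
convex_in_log_x = is_convex_on(lambda t: f(math.exp(t)), ts)
print(convex_in_x, convex_in_log_x)  # False True
```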
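For two Dirichlet densities, the family underlying the DMM, the KL divergence of footnote 6 has a well-known closed form, \(\text{KL}(\text{Dir}(\boldsymbol\alpha)\|\text{Dir}(\boldsymbol\beta)) = \ln\Gamma(\alpha_0) - \sum_k \ln\Gamma(\alpha_k) - \ln\Gamma(\beta_0) + \sum_k \ln\Gamma(\beta_k) + \sum_k (\alpha_k - \beta_k)(\psi(\alpha_k) - \psi(\alpha_0))\), with \(\alpha_0 = \sum_k \alpha_k\), \(\beta_0 = \sum_k \beta_k\), and \(\psi\) the digamma function. The sketch below is a minimal standard-library implementation of that formula. Python's `math` module provides `lgamma` but no digamma, so \(\psi\) is approximated with the usual recurrence plus asymptotic series (an assumption of this sketch; `scipy.special.digamma` is an alternative if SciPy is available). Function names are hypothetical, not from the paper.

```python
import math


def digamma(x):
    """Approximate digamma psi(x) for x > 0.

    Shifts x above 10 with psi(x) = psi(x + 1) - 1/x, then applies the
    asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def kl_dirichlet(alpha, beta):
    """Closed-form KL(Dir(alpha) || Dir(beta)) in nats."""
    if len(alpha) != len(beta):
        raise ValueError("alpha and beta must have the same length")
    a0 = sum(alpha)
    b0 = sum(beta)
    # log normalizers: ln Gamma(a0) - sum ln Gamma(a_k) - [ln Gamma(b0) - sum ln Gamma(b_k)]
    kl = math.lgamma(a0) - sum(math.lgamma(a) for a in alpha)
    kl -= math.lgamma(b0) - sum(math.lgamma(b) for b in beta)
    # expectation term: sum (a_k - b_k) * E[ln x_k] under Dir(alpha)
    psi_a0 = digamma(a0)
    kl += sum((a - b) * (digamma(a) - psi_a0) for a, b in zip(alpha, beta))
    return kl


print(kl_dirichlet([1.0, 1.0], [2.0, 2.0]))  # approximately 0.2082 (= 2 - ln 6)
```

As a sanity check, Dir(1, 1) is the uniform density on [0, 1] and Dir(2, 2) is 6x(1 − x), so integrating the definition in footnote 6 directly gives 2 − ln 6, matching the closed form.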

References

  1. Bjørnstad, J.F. (1990). Predictive likelihood: a review. Statistical Science, 5, 242–254.

  2. Bishop, C.M. (2006). Pattern recognition and machine learning. New York: Springer.

  3. Sorenson, H.W. (1980). Parameter estimation: principles and problems. New York: Marcel Dekker.

  4. Kamen, E.W., & Su, J. (1999). Introduction to optimal estimation, ser. Advanced textbooks in control and signal processing. London: Springer.

  5. Gelman, A., Meng, X.-L., Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–807.

  6. Sinharay, S., & Stern, H.S. (2003). Posterior predictive model checking in hierarchical models. Journal of Statistical Planning and Inference, 111, 209–221.

  7. Patel, J.K., & Read, C.B. (1996). Handbook of the normal distribution, ser. Statistics, textbooks and monographs. Marcel Dekker.

  8. Jain, A.K., Duin, R.P.W., Mao, J. (2000). Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 4–37.

  9. McLachlan, G., & Peel, D. (2000). Finite mixture models, ser. Wiley series in probability and statistics: applied probability and statistics. Wiley.

  10. Figueiredo, M.A.T., & Jain, A.K. (2002). Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 381–396.

  11. McLachlan, G.J., & Krishnan, T. (2008). The EM algorithm and extensions, ser. Wiley series in probability and statistics. Wiley-Interscience.

  12. Banfield, J.D., & Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821.

  13. Ma, Z. (2011). Non-Gaussian statistical models and their applications. Ph.D. dissertation, US-AB, Stockholm: KTH - Royal Institute of Technology.

  14. Ma, Z., & Leijon, A. (2011). Bayesian estimation of beta mixture models with variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(11), 2160–2173.

  15. Atapattu, S., Tellambura, C., Jiang, H. (2011). A mixture Gamma distribution to model the SNR of wireless channels. IEEE Transactions on Wireless Communications, 10(12), 4193–4203.

  16. Ma, Z., Leijon, A., Kleijn, W.B. (2013). Vector quantization of LSF parameters with a mixture of Dirichlet distributions. IEEE Transactions on Audio, Speech, and Language Processing, 21(9), 1777–1790.

  17. Bouguila, N., Ziou, D., Vaillancourt, J. (2004). Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Transactions on Image Processing, 13(11), 1533–1543.

  18. Blei, D.M. (2004). Probabilistic models of text and images. Ph.D. dissertation. University of California, Berkeley.

  19. Rana, P.K., Ma, Z., Taghia, J., Flierl, M. (2013). Multiview depth map enhancement by variational Bayes inference estimation of Dirichlet mixture models. In Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP).

  20. Ma, Z., & Leijon, A. (2010). Modeling speech line spectral frequencies with Dirichlet mixture models. In Proceedings of INTERSPEECH (pp. 2370–2373).

  21. Blei, D.M., Ng, A.Y., Jordan, M.I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

  22. Blei, D.M., & Jordan, M.I. (2005). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1, 121–144.

  23. Orbanz, P., & Teh, Y.W. (2010). Bayesian nonparametric models. Encyclopedia of Machine Learning, 88–89.

  24. Orbanz, P. (2010). Construction of nonparametric Bayesian models from parametric Bayes equations. In Advances in neural information processing systems.

  25. Ghahramani, Z. (2012). Bayesian non-parametrics and the probabilistic approach to modelling. Philosophical Transactions of the Royal Society A, 371.

  26. Minka, T.P. (2003). Estimating a Dirichlet distribution. Annals of Physics, 2000(8), 1–13.

  27. Bouguila, N., & Ziou, D. (2007). High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10), 1716–1731.

  28. Fan, W., Bouguila, N., Ziou, D. (2012). Variational learning for finite Dirichlet mixture models and applications. IEEE Transactions on Neural Networks and Learning Systems, 23(5), 762–774.

  29. Palmer, J.A. (2003). Relative convexity. ECE Dept., UCSD Tech. Rep.

  30. Blei, D.M., & Lafferty, J.D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1, 17–35.

  31. Jaakkola, T.S., & Jordan, M.I. (2000). Bayesian parameter estimation via variational methods. Statistics and Computing, 10, 25–37.

  32. Jaakkola, T.S. (2001). Tutorial on variational approximation methods. In M. Opper & D. Saad (Eds.), Advances in mean field methods (pp. 129–159). Cambridge: MIT Press.

  33. Hoffman, M., Blei, D., Cook, P. (2010). Bayesian nonparametric matrix factorization for recorded music. In Proceedings of the international conference on machine learning.

  34. Minka, T.P. (2001). Expectation propagation for approximate Bayesian inference. In Proceedings of the seventeenth conference on uncertainty in artificial intelligence (pp. 362–369).

  35. Minka, T.P. (2001). A family of algorithms for approximate Bayesian inference. Ph.D. dissertation. Massachusetts Institute of Technology.

  36. Ma, Z. (2012). Bayesian estimation of the Dirichlet distribution with expectation propagation. In Proceedings of the 20th European signal processing conference (pp. 689–693).

  37. Ma, Z., & Leijon, A. (2011). Approximating the predictive distribution of the beta distribution with the local variational method. In Proceedings of IEEE international workshop on machine learning for signal processing (pp. 1–6).

  38. Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.

  39. Brookes, M. (2013). The matrix reference manual. Available online: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html. Accessed 9 Aug 2013.

  40. Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., Arnaldi, B. (2007). A review of classification algorithms for EEG-based brain-computer interfaces. Journal of Neural Engineering, 4(2), R1.

  41. Prasad, S., Tan, Z.-H., Prasad, R., Cabrera, A.F., Gu, Y., Dremstrup, K. (2011). Feature selection strategy for classification of single-trial EEG elicited by motor imagery. In International symposium on wireless personal multimedia communications (WPMC) (pp. 1–4).

  42. Ma, Z., Tan, Z.-H., Prasad, S. (2012). EEG signal classification with super-Dirichlet mixture model. In Proceedings of IEEE statistical signal processing workshop (pp. 440–443).

  43. Subasi, A. (2007). EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Systems with Applications, 32(4), 1084–1093.

  44. Farina, D., Nascimento, O.F., Lucas, M.F., Doncarli, C. (2007). Optimization of wavelets for classification of movement-related cortical potentials generated by variation of force-related parameters. Journal of Neuroscience Methods, 162, 357–363.

  45. Ma, Z., & Leijon, A. (2011). Super-Dirichlet mixture models using differential line spectral frequencies for text-independent speaker identification. In Proceedings of INTERSPEECH (pp. 2349–2352).

  46. BCI competition III. http://www.bbci.de/competition/iii.

  47. Lal, T.N., Schröder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Schölkopf, B. (2004). Support vector channel selection in BCI. IEEE Transactions on Biomedical Engineering, 51(6), 1003–1010.

  48. Malina, W. (1981). On an extended Fisher criterion for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(5), 611–614.

Author information

Correspondence to Zhanyu Ma.

About this article

Cite this article

Ma, Z., Leijon, A., Tan, ZH. et al. Predictive Distribution of the Dirichlet Mixture Model by Local Variational Inference. J Sign Process Syst 74, 359–374 (2014). https://doi.org/10.1007/s11265-013-0769-8

  • DOI: https://doi.org/10.1007/s11265-013-0769-8