Deterministic Approximate EM Algorithm; Application to the Riemann Approximation EM and the Tempered EM
Figure 1. (a) Average values, with standard deviation, over 2000 simulations of the negative log-likelihood along the steps of the Riemann EM. The Riemann EM increases the likelihood. (b) Average and standard deviation of the relative parameter reconstruction errors at the end of the Riemann EM.
Figure 2. (a) Visual representation of the number of Riemann intervals over the EM steps for each profile φ_i. The total numbers of Riemann intervals computed over 100 EM iterations are: 5150 for "low", 14,950 for "medium", 50,500 for "linear" and 104,950 for "high". (b) For each profile, average evolution of the negative log-likelihood, with standard deviation, over 50 simulations. The results are fairly similar, in particular between "medium", "high" and "linear".
Figure 3. (a) Visual representation of the number of Riemann intervals over the EM steps for each profile φ_i. In higher dimension, the computational complexities of the profiles are very different. More precisely, the total numbers of Riemann squares computed over 100 EM iterations are: 4534 for "square root", 125,662 for "5 square root", 348,550 for "low" and 2,318,350 for "medium". (b) For each profile, average evolution of the negative log-likelihood, with standard deviation, over 50 simulations. The "square root" profile performs poorly. However, the other three are comparable despite their different computational complexities.
Figure 4. 500 sample points from a Mixture of Gaussians with 3 classes. The true centroid of each Gaussian is depicted by a black cross, and the true covariance matrices are represented by the confidence ellipses of level 0.8, 0.99 and 0.999 around the centre. Each sub-figure corresponds to one of the three different versions of the true parameters. From (a) to (c): the true means μ_k of the two left clusters (μ_1 and μ_2) are getting closer while everything else stays identical.
Figure 5. Typical final positioning of the centroids by EM (first row) and tmp-EM with the oscillating profile (second row) when the initialisation is made at the barycentre of all data points (blue crosses). The three columns represent the three gradually more ambiguous parameter sets. Each figure shows the positions of the estimated centroids after convergence of the algorithm (orange crosses), with their estimated covariance matrices (orange confidence ellipses). In each simulation, 500 sample points were drawn from the real GMM (small green crosses). In these examples, tmp-EM managed to correctly identify the positions of the three real centroids.
Figure 6. Typical final positioning of the centroids by EM (first row) and tmp-EM with the oscillating profile (second row) when the initialisation is made by selecting two points in the isolated cluster and one in the lower ambiguous cluster (blue crosses). The three columns represent the three gradually more ambiguous parameter sets. Each figure shows the positions of the estimated centroids after convergence of the algorithm (orange crosses), with their estimated covariance matrices (orange confidence ellipses). In each simulation, 500 sample points were drawn from the real GMM (small green crosses). In these examples, although EM kept two centroids on the isolated cluster, tmp-EM managed to correctly identify the positions of the three real centroids.
Figure 7. Results over many simulations of the Riemann EM and tmp-Riemann EM on the Beta-Gaussian model. The tempered Riemann EM reaches relative errors on the real parameters that are orders of magnitude below those of the Riemann EM with no temperature. The likelihood reached is also lower with the tempering.
Abstract
1. Introduction
2. Materials and Methods
2.1. Deterministic Approximate EM Algorithm and Its Convergence for the Exponential Family
2.1.1. Context and Motivation
- (E) With the current parameter estimate, compute the conditional probability distribution of the latent variables given the observations;
- (M) To get the next parameter estimate, maximise over the parameters the resulting expected complete-data log-likelihood; a concrete Gaussian-mixture instance of these two steps is sketched below.
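To fix ideas, here is a minimal Python sketch of these two steps for a Gaussian Mixture Model. It is an illustrative implementation under standard GMM assumptions (the model, the initialisation near the barycentre and the numerical safeguard are choices made for this sketch, not the exact code used in the experiments of Section 3).

```python
# Minimal EM for a Gaussian Mixture Model (illustrative sketch, not the
# authors' implementation). The E step computes the conditional class
# probabilities given the current parameters; the M step maximises the
# expected complete-data log-likelihood in closed form.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Illustrative initialisation: all centroids near the data barycentre.
    pi = np.full(K, 1.0 / K)
    mu = X.mean(axis=0) + 0.1 * rng.standard_normal((K, d))
    Sigma = np.stack([np.cov(X.T) for _ in range(K)])

    for _ in range(n_iter):
        # E step: responsibilities p(class = k | x_i, current parameters).
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)            # (n, K)
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M step: closed-form maximisation of the expected complete log-likelihood.
        Nk = resp.sum(axis=0)                                    # (K,)
        pi = Nk / n
        mu = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (resp[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(d)                         # numerical safeguard
    return pi, mu, Sigma
```

Here the E step has a closed form because the latent class variable is finite; the deterministic approximations studied in the rest of the section are needed precisely when this conditional expectation cannot be computed exactly.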
2.1.2. Theorem
- Conditions M1–3 on the model.
- M1. , and is a σ-finite positive Borel measure on . Let , and . Define , and as:
- M2. Assume that
- (a*) and are continuous on ;
- (b) for all , is finite and continuous on ;
- (c) there exists a continuous function such that for all ;
- (d) g is positive, finite and continuous on and the level set is compact for any .
- M3. Define and, for any , . Assume either that:
- (a) The set is compact or
- () for all compact sets is finite.
- The conditions on the approximation. Assume that the approximation is deterministic. Let . For all indices u, for any compact set , one of the two following configurations, (4) or (5), holds:
- (i)
- (a)
- and, with probability 1;
- (b)
- converges towards a connected component of .
- (ii)
- Let be the set closure function and be any point-to-set distance function within . If has an empty interior, then such that:
- When g is differentiable on , its stationary points are the stable points of the EM, i.e., .
- The M1–3 conditions in Theorem 1 are almost identical to the similarly named M1–3 conditions in [7]. The only difference is in , where we remove the hypothesis that S has to be a continuous function of z, since it is not needed when the approximation is not stochastic.
- The property of condition (4) can seem hard to verify since S is not integrated here against a probability density function. However, when z is a finite variable, as is the case for finite mixtures, this integral becomes a finite sum. Hence, condition (4) is better adapted to finite mixtures, while condition (5) is better adapted to continuous hidden variables.
- The two sufficient conditions (4) and (5) involve a certain form of convergence of towards . If the latent variable z is continuous, this excludes countable (and finite) approximations such as sums of Dirac functions, since they have a measure of zero. In particular, this excludes Quasi-Monte Carlo approximations. However, one look at the proof of Theorem 1 (in Appendix A.1) or at the following sketch of proof reveals that it is actually sufficient to verify the corresponding convergence for any compact set K, where denotes the approximated E step in the Stable Approximate EM. This condition can be verified by finite approximations; one such deterministic, finite approximation of the E step is sketched below.
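For illustration, the sketch below approximates the conditional expectation of a sufficient statistic by a finite Riemann sum on a compact grid whose resolution grows with the iteration index, in the spirit of the Riemann approximation EM of Section 2.2. The one-dimensional Gaussian toy model, the grid bounds and the "linear" growth profile are assumptions made for this sketch only.

```python
# Riemann-sum approximation of the E step (illustrative sketch).
# The conditional expectation of the sufficient statistic S(z) under
# p(z | y; theta) is replaced by a finite Riemann sum on a compact grid
# whose number of intervals grows with the EM iteration index n.
import numpy as np

def approx_E_step(y, theta, S, log_complete_lik, n_intervals, grid_bounds=(-10.0, 10.0)):
    """Return the Riemann approximation of E[S(z) | y; theta]."""
    a, b = grid_bounds
    z = np.linspace(a, b, n_intervals)           # regular grid on a compact set
    w = np.exp(log_complete_lik(y, z, theta))    # unnormalised conditional density
    w /= w.sum()                                 # equal interval widths cancel out
    return (S(z) * w).sum()

# Toy model (assumed for this sketch): y | z ~ N(z, 1), z ~ N(mu, 1),
# sufficient statistic S(z) = z, and an increasing profile of grid sizes.
def log_complete_lik(y, z, mu):
    return -0.5 * (y - z) ** 2 - 0.5 * (z - mu) ** 2

y_obs, mu = 1.3, 0.0
for n in range(1, 6):
    n_intervals = 50 * n                         # "linear" growth profile (illustrative)
    s_n = approx_E_step(y_obs, mu, lambda z: z, log_complete_lik, n_intervals)
    print(n, n_intervals, s_n)
```

Because the grid is finite, this is a deterministic, finite-sum E step of the kind allowed by the compact-set condition above, even though it is not absolutely continuous with respect to the distribution of the latent variable.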
2.1.3. Sketch of Proof
- converges towards a connected component of
- If has an empty interior, then such that converges towards . Moreover, converges towards the set , where .
- (a) the C1–2 conditions of Proposition 10 of [7]:
- (C1) There exists g, a Lyapunov function (a generic statement of this notion is recalled just after this sketch of proof) relative to , such that for all , is compact, and:
- (C2) is compact OR (C2’) is finite for all compact .
- (b)
- (c)
- ∀ compact :
- converges towards a connected component of .
- If has an empty interior, then the sequence converges towards a and converges towards the set .
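For completeness, here is a generic statement of the Lyapunov-function notion used in condition (C1), in the sense of [7]; the notation (a point-to-set map T on Θ and a target set 𝓛) is chosen here for illustration and may differ from the paper's own symbols.

```latex
% Generic Lyapunov-function condition, in the sense of [7].
% T is a point-to-set map on \Theta and \mathcal{L} a target set; notation illustrative.
g \text{ is a Lyapunov function relative to } (T, \mathcal{L})
\quad\Longleftrightarrow\quad
\begin{cases}
g(\theta') \ \geq\ g(\theta), & \forall\, \theta \in \Theta,\ \forall\, \theta' \in T(\theta),\\[2pt]
g(\theta') \ >\ g(\theta), & \forall\, \theta \in \Theta \setminus \mathcal{L},\ \forall\, \theta' \in T(\theta).
\end{cases}
```

In the EM setting, T is the (approximate) EM update map, 𝓛 its set of fixed points, and the observed likelihood g plays the role of the Lyapunov function, as stated in Appendix A.1.3.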
2.2. Riemann Approximation EM
2.2.1. Context and Motivation
2.2.2. Theorem and Proof
2.3. Tempered EM
2.3.1. Context and Motivation
2.3.2. Theorem
- (T1)
- (T2)
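To make the role of the temperature concrete, the sketch below shows a tempered E step for a mixture model: the conditional class probabilities are raised to the power 1/T_n and renormalised, with a temperature sequence converging to 1. The particular decreasing, oscillating profile written here is an illustrative choice only; it is not claimed to be, or to satisfy, the profile governed by conditions (T1)–(T2) above.

```python
# Tempered E step for a mixture model (illustrative sketch, not the authors'
# exact implementation). The responsibilities are flattened by the temperature
# T_n > 1 (raised to the power 1/T_n, then renormalised), which makes the
# algorithm less greedy in early iterations; T_n -> 1 recovers the exact E step.
import numpy as np

def tempered_responsibilities(dens, T):
    """dens: (n, K) array of pi_k * f_k(x_i); T: current temperature."""
    tempered = dens ** (1.0 / T)
    return tempered / tempered.sum(axis=1, keepdims=True)

def example_temperature(n, T0=5.0, r=2.0, b=0.5):
    """An illustrative decreasing, oscillating profile with T_n -> 1.
    The paper's own profiles and their admissibility conditions may differ."""
    return 1.0 + (T0 - 1.0) * np.exp(-n / r) * (1.0 + b * np.sin(n))
```

In the Gaussian-mixture sketch of Section 2.1.1, the responsibility computation at iteration n would simply use tempered_responsibilities(dens, example_temperature(n)) instead of the plain normalisation.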
2.3.3. Sketch of Proof
2.3.4. Examples of Models That Verify the Conditions
Gaussian Mixture Model
Poisson Count with Random Effect
- is measurable on .
- is continuous on K (and on ).
- With , then
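As a concrete illustration, one common specification of a Poisson count model with a Gaussian random effect is sketched below, together with a Riemann-grid evaluation of its marginal log-likelihood, which has no closed form. The specification z_i ~ N(mu, sigma^2), y_i | z_i ~ Poisson(exp(z_i)), the grid width and the function names are assumptions of this sketch; the paper's exact model and notation may differ.

```python
# One common specification of a Poisson count with random effect (illustrative):
#     z_i ~ N(mu, sigma^2)        (latent random effect),
#     y_i | z_i ~ Poisson(exp(z_i)).
# The marginal likelihood has no closed form, so the E step integrals can be
# approximated deterministically, e.g. by the Riemann sums of Section 2.2.
import numpy as np
from scipy.stats import poisson, norm

def simulate(n, mu, sigma, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(mu, sigma, size=n)
    return rng.poisson(np.exp(z))

def riemann_marginal_loglik(y, mu, sigma, n_intervals=400, half_width=6.0):
    """log p(y; mu, sigma), approximated on a compact grid around mu."""
    z = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n_intervals)
    dz = z[1] - z[0]
    loglik = 0.0
    for yi in y:
        integrand = poisson.pmf(yi, np.exp(z)) * norm.pdf(z, mu, sigma)
        loglik += np.log(integrand.sum() * dz)
    return loglik
```

This is the type of model for which the deterministic approximations of the E step studied above are useful, since the conditional expectations of the sufficient statistics must themselves be computed by numerical integration.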
2.4. Tempered Riemann Approximation EM
Context, Theorem and Proof
3. Results
3.1. Riemann Approximation EM: Two Applications
3.1.1. Application to a Gaussian Model with the Beta Prior
3.1.2. Application in Two Dimensions
3.2. Tempered EM: Application to Mixtures of Gaussian
3.2.1. Context and Experimental Protocol
3.2.2. Experimental Results Analysis
3.3. Tempered Riemann Approximation EM: Application to a Gaussian Model with Beta Prior
4. Discussion and Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proofs of the Two Main Theorems
Appendix A.1. Proof of the General Theorem
Appendix A.1.1. Verifying the Conditions of Proposition 2
Equivalent Formulation of the Convergence
Use the Uniform Continuity
Sufficient Condition for Convergence
Further Simplifications of the Desired Result with Successive Sufficient Conditions
Conclusions
Appendix A.1.2. Applying Proposition 2
Appendix A.1.3. Verifying the Conditions of Proposition 1
- (set closure) plays the part of the compact K
- plays the part of the set
- is also compact thanks to hypothesis
- The likelihood g is the Lyapunov function with regard to
- is the K-valued sequence (since K is ).
Appendix A.1.4. Applying Proposition 1
- converges towards a connected component of
- If has an empty interior, then converges towards a and converges towards , where .
Appendix A.2. Proof of the Tempering Theorem
Appendix A.2.1. Taylor Development
Appendix A.2.2. Two Intermediary Lemmas
- (i)
- (ii)
- (iii)
- (iv)
- (v)
Appendix A.2.3. Verifying the Conditions of Lemma A2
- (i) We have by hypothesis.
- (ii) and (iii) are implied by the fact that and are continuous.
- (iv) and (v) are also hypotheses of the theorem. Hence we can apply Lemma A2. This means that:
References
1. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
2. Wu, C.J. On the convergence properties of the EM algorithm. Ann. Stat. 1983, 11, 95–103.
3. Boyles, R.A. On the convergence of the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1983, 45, 47–50.
4. Lange, K. A gradient algorithm locally equivalent to the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 425–437.
5. Delyon, B.; Lavielle, M.; Moulines, E. Convergence of a stochastic approximation version of the EM algorithm. Ann. Stat. 1999, 27, 94–128.
6. Wei, G.C.; Tanner, M.A. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Am. Stat. Assoc. 1990, 85, 699–704.
7. Fort, G.; Moulines, E. Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Stat. 2003, 31, 1220–1259.
8. Kuhn, E.; Lavielle, M. Maximum likelihood estimation in nonlinear mixed effects models. Comput. Stat. Data Anal. 2005, 49, 1020–1038.
9. Allassonnière, S.; Kuhn, E.; Trouvé, A. Construction of Bayesian deformable models via a stochastic approximation algorithm: A convergence study. Bernoulli 2010, 16, 641–678.
10. Allassonnière, S.; Chevallier, J. A New Class of EM Algorithms. Escaping Local Maxima and Handling Intractable Sampling. Comput. Stat. Data Anal. 2021, 159, 107159.
11. Neal, R.M.; Hinton, G.E. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models; Springer: Dordrecht, The Netherlands, 1998; pp. 355–368.
12. Ng, S.K.; McLachlan, G.J. On the choice of the number of blocks with the incremental EM algorithm for the fitting of normal mixtures. Stat. Comput. 2003, 13, 45–55.
13. Cappé, O.; Moulines, E. On-line expectation–maximization algorithm for latent data models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2009, 71, 593–613.
14. Chen, J.; Zhu, J.; Teh, Y.W.; Zhang, T. Stochastic expectation maximization with variance reduction. Adv. Neural Inf. Process. Syst. 2018, 31, 7967–7977.
15. Karimi, B.; Wai, H.T.; Moulines, E.; Lavielle, M. On the global convergence of (fast) incremental expectation maximization methods. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 2837–2847.
16. Fort, G.; Moulines, E.; Wai, H.T. A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm. Adv. Neural Inf. Process. Syst. 2020, 34, 16972–16982.
17. Kuhn, E.; Matias, C.; Rebafka, T. Properties of the stochastic approximation EM algorithm with mini-batch sampling. Stat. Comput. 2020, 30, 1725–1739.
18. Balakrishnan, S.; Wainwright, M.J.; Yu, B. Statistical guarantees for the EM algorithm: From population to sample-based analysis. Ann. Stat. 2017, 45, 77–120.
19. Dwivedi, R.; Ho, N.; Khamaru, K.; Wainwright, M.J.; Jordan, M.I.; Yu, B. Singularity, misspecification and the convergence rate of EM. Ann. Stat. 2020, 48, 3161–3182.
20. Booth, J.G.; Hobert, J.P. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1999, 61, 265–285.
21. Levine, R.A.; Casella, G. Implementations of the Monte Carlo EM algorithm. J. Comput. Graph. Stat. 2001, 10, 422–439.
22. Levine, R.A.; Fan, J. An automated (Markov chain) Monte Carlo EM algorithm. J. Stat. Comput. Simul. 2004, 74, 349–360.
23. Pan, J.X.; Thompson, R. Quasi-Monte Carlo EM algorithm for MLEs in generalized linear mixed models. In COMPSTAT; Physica-Verlag: Heidelberg, Germany, 1998; pp. 419–424.
24. Jank, W. Quasi-Monte Carlo sampling to improve the efficiency of Monte Carlo EM. Comput. Stat. Data Anal. 2005, 48, 685–701.
25. Attias, H. Inferring Parameters and Structure of Latent Variable Models by Variational Bayes. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1999; pp. 21–30.
26. Bishop, C. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
27. Tzikas, D.G.; Likas, A.C.; Galatsanos, N.P. The variational approximation for Bayesian inference. IEEE Signal Process. Mag. 2008, 25, 131–146.
28. Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680.
29. Swendsen, R.H.; Wang, J.S. Replica Monte Carlo simulation of spin-glasses. Phys. Rev. Lett. 1986, 57, 2607.
30. Geyer, C.J.; Thompson, E.A. Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Am. Stat. Assoc. 1995, 90, 909–920.
31. Ueda, N.; Nakano, R. Deterministic annealing EM algorithm. Neural Netw. 1998, 11, 271–282.
32. Naim, I.; Gildea, D. Convergence of the EM algorithm for Gaussian mixtures with unbalanced mixing coefficients. In Proceedings of the 29th International Conference on Machine Learning; Omnipress: Madison, WI, USA, 2012; pp. 1655–1662.
33. Chen, H.F.; Guo, L.; Gao, A.J. Convergence and robustness of the Robbins–Monro algorithm truncated at randomly varying bounds. Stoch. Process. Their Appl. 1987, 27, 217–231.
34. Van Laarhoven, P.J.; Aarts, E.H. Simulated annealing. In Simulated Annealing: Theory and Applications; Springer: Dordrecht, The Netherlands, 1987; pp. 7–15.
35. Aarts, E.; Korst, J. Simulated Annealing and Boltzmann Machines; John Wiley and Sons Inc.: New York, NY, USA, 1988.
36. Hukushima, K.; Nemoto, K. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 1996, 65, 1604–1608.
37. Titterington, D.; Smith, A.; Makov, U. Statistical Analysis of Finite Mixture Distributions; Wiley: New York, NY, USA, 1985.
38. Ho, N.; Nguyen, X. Convergence rates of parameter estimation for some weakly identifiable finite mixtures. Ann. Stat. 2016, 44, 2726–2755.
39. Dwivedi, R.; Ho, N.; Khamaru, K.; Wainwright, M.; Jordan, M.; Yu, B. Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020; pp. 1866–1876.
40. Winkelbauer, A. Moments and absolute moments of the normal distribution. arXiv 2012, arXiv:1209.4340.
| Parameter Family | cl. | EM (barycenter) | EM (2v1) | tmp-EM, decreasing T (barycenter) | tmp-EM, decreasing T (2v1) | tmp-EM, decreasing oscillating T (barycenter) | tmp-EM, decreasing oscillating T (2v1) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.52 (1.01) | 1.52 (1.24) | 0.60 (1.08) | 0.87 (1.20) | 0.04 (0.26) | 0.29 (0.64) |
| 1 | 2 | 0.55 (1.05) | 1.53 (1.25) | 0.64 (1.10) | 0.96 (1.25) | 0.05 (0.31) | 0.30 (0.64) |
| 1 | 3 | 0.01 (0.06) | 0.01 (0.03) | 0.01 (0.10) | 0.01 (0.02) | 0.03 (0.17) | 0.03 (0.19) |
| 2 | 1 | 1.00 (1.42) | 1.69 (1.51) | 0.96 (1.41) | 1.10 (1.46) | 0.09 (0.47) | 0.37 (0.86) |
| 2 | 2 | 1.03 (1.44) | 1.71 (1.52) | 1.08 (1.46) | 1.11 (1.46) | 0.12 (0.57) | 0.32 (0.79) |
| 2 | 3 | 0.01 (0.05) | 0.02 (0.03) | 0.01 (0.03) | 0.01 (0.04) | 0.01 (0.05) | 0.04 (0.22) |
| 3 | 1 | 1.56 (1.75) | 1.79 (1.77) | 1.63 (1.76) | 1.38 (1.71) | 0.31 (0.97) | 0.39 (0.98) |
| 3 | 2 | 1.51 (1.74) | 1.88 (1.76) | 1.52 (1.74) | 1.30 (1.68) | 0.30 (0.93) | 0.39 (0.97) |
| 3 | 3 | 0.02 (0.04) | 0.02 (0.04) | 0.01 (0.03) | 0.02 (0.06) | 0.01 (0.04) | 0.07 (0.30) |
Lartigue, T.; Durrleman, S.; Allassonnière, S. Deterministic Approximate EM Algorithm; Application to the Riemann Approximation EM and the Tempered EM. Algorithms 2022, 15, 78. https://doi.org/10.3390/a15030078