Minimum Mutual Information and Non-Gaussianity Through the Maximum Entropy Method: Theory and Properties
Figure 2. Semi-logarithmic graphs of $I_g(c)$ (black thick line), $I_4(c)$ (grey thick line), and of the successive growing estimates of $I_4(c)$: $I_g$, $I_g + I_{ng,4}$, $I_g + I_{ng,6}$ and $I_g + I_{ng,8}$ (grey thin lines). See text for details.
Figure 3. Field of the non-Gaussian MI $I_{ng,4}$ along 6 bivariate cross-sections of the set of allowed moments. Two varying moments are featured in each cross-section: A ($c_g$, $m_{2,1}$), B ($c_g$, $m_{3,1}$), C ($c_g$, $m_{2,2}$), D ($m_{2,1}$, $m_{3,1}$) at $c_g = 0$, E ($m_{2,1}$, $m_{2,2}$) at $c_g = 0$ and F ($m_{3,1}$, $m_{2,2}$) at $c_g = 0$. The letter G indicates points or curves where Gaussianity holds.
Figure 4. Graphs depicting the total MI (a), Gaussian MI (b) and non-Gaussian MI (c) of order 8 for 6 cases (A–F) of different signal-noise combinations, with the signal weight $s$ in abscissas varying from 0 up to 1. See text and Table 1 for details about the cases and their color code.
Figure 5. Collection of stamp-format ME-PDFs for cases A–F (see text for details) and signal weight $s$ = 0.1 (a), $s$ = 0.5 (b) and $s$ = 0.9 (c), over the $[-3, 3]^2$ support set.
Abstract
1. Introduction
2. MI Estimation from Maximum Entropy PDFs
2.1. General Properties of Bivariate Mutual Information
2.2. Congruency between Information Moment Sets
- Definition 1: Following the notation of [28], we define the moment class of bivariate PDFs of , as:
- Lemma 1: Given two encapsulated (nested) information moment sets, i.e., with including more constraining functions than , the respective ME-PDFs , if they exist, satisfy the following conditions:
- Definition 2: If two information moment sets , are related by linear affine relationships, then the sets are referred to as “congruent”, a property hereafter denoted as , and consequently both PDF sets are equal, i.e., [15]. A stronger condition than congruency is ME-congruency, denoted as , which holds when the associated ME-PDFs are equal. For example, both the univariate constraint sets and for lead to the same ME-PDF, the standard Gaussian N(0,1). Consequently, both information moment sets are ME-congruent but not congruent since . This is because the Lagrange multiplier of the ME functional (see Appendix 1) corresponding to the fourth moment is set to zero, without any constraining effect. Congruency implies ME-congruency, but the converse does not hold.
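The N(0,1) example of Definition 2 can be checked numerically. The sketch below is my own construction, not the authors' code; the grid limits, optimizer choice, and function names are assumptions. It solves the moment-constrained maximum entropy problem by minimizing the standard convex dual of the entropy functional, and verifies that appending the non-binding constraint E[X⁴] = 3 reproduces the same ME-PDF with a near-zero Lagrange multiplier.

```python
# Minimal numerical sketch (illustrative only, not the paper's code) of the
# ME-congruency example of Definition 2: the maximum-entropy PDF under
# {E[X]=0, E[X^2]=1} is N(0,1), and adding the non-binding constraint
# E[X^4]=3 returns the same PDF with a ~zero Lagrange multiplier.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-8.0, 8.0, 4001)          # quadrature grid (assumed support)
dx = x[1] - x[0]

def me_pdf(lam, feats):
    """Exponential-family density exp(-sum_k lam_k T_k(x)), normalized on the grid."""
    logp = -sum(l * f for l, f in zip(lam, feats))
    logp = logp - logp.max()               # numerical stabilization
    p = np.exp(logp)
    return p / (p.sum() * dx)

def dual(lam, feats, mom):
    """Convex dual: log-partition + lam.mom; its minimizer gives the ME-PDF."""
    logp = -sum(l * f for l, f in zip(lam, feats))
    m = logp.max()
    logZ = m + np.log(np.exp(logp - m).sum() * dx)
    return logZ + float(np.dot(lam, mom))

def solve(feats, mom):
    res = minimize(dual, x0=np.zeros(len(mom)), args=(feats, mom), method="BFGS")
    return res.x, me_pdf(res.x, feats)

# Constraint set 1: {E[X]=0, E[X^2]=1}
lam1, p1 = solve([x, x**2], [0.0, 1.0])
# Constraint set 2: adds E[X^4]=3 (the Gaussian value), which should not bind
lam2, p2 = solve([x, x**2, x**4], [0.0, 1.0, 3.0])

gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print("max |p1 - N(0,1)|:", np.abs(p1 - gauss).max())
print("max |p2 - p1|    :", np.abs(p2 - p1).max())
print("4th-moment Lagrange multiplier (expected ~0):", lam2[2])
```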
2.3. MI Estimation from Maximum Entropy Anamorphoses
- Theorem 1: Let be a pair of single random variables (RVs), distributed as the ME-PDF associated with the independent constraints . Both variables can be obtained from a previous ME-anamorphosis. Let be a subset of , i.e., and , such that all independent moment sets are ME-congruent (see Definition 2), i.e., , that is, such that the independent extra moments in do not further constrain the ME-PDF. Each marginal moment set is decomposed as . For simplicity of notation, let us denote . Then the following inequalities between constrained mutual informations hold:
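The formulas of Lemma 1 and Theorem 1 did not survive extraction. The block below is a hedged restatement of the monotonicity chain they rest on, written in generic notation of my own ($C \subseteq C'$ joint moment sets, $H_{me}(\cdot)$ the maximum entropy subject to a set, $I$ the true MI); it need not match the paper's equation numbering such as (5) or (7a).

```latex
% Hedged sketch of the monotonicity argument behind Lemma 1 and Theorem 1
% (generic notation of my own, not necessarily the paper's).
\begin{align*}
  C \subseteq C' \;&\Longrightarrow\; H_{me}(C') \,\le\, H_{me}(C)
      && \text{(extra constraints cannot raise the maximum entropy)}\\
  I_{me}(C) \;&:=\; H(X) + H(Y) - H_{me}(C)
      && \text{(ME-based MI estimate; the marginals are exact ME-PDFs by hypothesis)}\\
  C \subseteq C' \;&\Longrightarrow\; I_{me}(C) \,\le\, I_{me}(C') \,\le\, I
      && \text{(a non-decreasing sequence of lower bounds on the MI).}
\end{align*}
```

This is the logic visible in Figure 2, where the successive estimates built from larger moment sets grow towards the MI they bound.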
3. MI Decomposition under Gaussian Marginals
3.1. Gaussian Anamorphosis and Gaussian Correlation
3.2. Gaussian and Non-Gaussian MI
- Theorem 2: Given , a pair of rotated standardized variables (A being an invertible 2 × 2 matrix), one has the following result with proof in Appendix 2:
- Corollary 1: For standard Gaussian variables and standardized rotated ones, we have
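Theorem 2 and Corollary 1 reach this extraction without their formulas. As a small grounded illustration of the Gaussian term $I_g(c_g)$ that appears throughout Section 3, the sketch below (my own; the correlation value 0.7 is arbitrary) evaluates the mutual information of a standardized bivariate Gaussian both from the closed form $I_g = -\tfrac12\ln(1-c_g^2)$ and from the Gaussian differential-entropy formula, confirming that the two agree.

```python
# Check (illustrative; not from the paper) that the MI of a standardized
# bivariate Gaussian with correlation c_g equals -0.5*ln(1 - c_g**2),
# using the closed-form differential entropies of Gaussian vectors.
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a d-variate Gaussian: 0.5*ln((2*pi*e)^d * det(cov))."""
    d = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(cov))

c_g = 0.7                                   # arbitrary correlation for the example
cov = np.array([[1.0, c_g], [c_g, 1.0]])

I_from_entropies = (gaussian_entropy(cov[:1, :1]) + gaussian_entropy(cov[1:, 1:])
                    - gaussian_entropy(cov))
I_closed_form = -0.5 * np.log(1.0 - c_g ** 2)

print(I_from_entropies, I_closed_form)      # both ~0.337 nats
```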
3.3. The Sequence of Non-Gaussian MI Lower Bounds from Cross-Constraints
3.4. Non-Gaussian MI across the Polytope of Cross Moments
4. The Effect of Noise and Nonlinearity on Non-Gaussian MI
Table 1. Case studies of signal-noise combinations and their color code (see text for details).

Case study | Color
---|---
A—Gaussian noise (reference) | Black
B—Additive non-Gaussian independent noise | Red
C—Multiplicative noise | Blue
D—Smooth nonlinear homeomorphism | Magenta
E—Smooth non-injective transformation | Green
F—Combined non-Gaussianity | Cyan
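The defining formulas of cases A–F occupied table columns that did not survive extraction, so they are not reproduced here. The sketch below is purely illustrative: it builds hypothetical stand-ins for the six labels using placeholder formulas of my own (not the paper's), combining a common Gaussian signal G with weight s and different perturbations, and prints the correlation of each output with the signal.

```python
# Purely illustrative stand-ins for the six case studies above; the functional
# forms are placeholders of my own, chosen only to match the qualitative labels.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
s = 0.5                                    # signal weight, as in Figure 4's abscissa

def standardize(z):
    return (z - z.mean()) / z.std()

G  = rng.standard_normal(n)                # common Gaussian signal
W1 = rng.standard_normal(n)                # independent Gaussian noise
W2 = rng.exponential(1.0, n) - 1.0         # centered non-Gaussian noise
W3 = rng.standard_normal(n)

cases = {
    "A  Gaussian noise (reference)":          s * G + np.sqrt(1 - s**2) * W1,
    "B  Additive non-Gaussian noise":         s * G + np.sqrt(1 - s**2) * W2,
    "C  Multiplicative noise":                s * G + np.sqrt(1 - s**2) * G * W3,
    "D  Smooth nonlinear homeomorphism":      np.tanh(s * G + np.sqrt(1 - s**2) * W1),
    "E  Smooth non-injective transformation": (s * G + np.sqrt(1 - s**2) * W1) ** 2,
    "F  Combined non-Gaussianity":            np.tanh(s * G) + np.sqrt(1 - s**2) * G * W3,
}

for name, y in cases.items():
    y = standardize(y)
    print(f"{name:42s} corr(G, Y) = {np.corrcoef(G, y)[0, 1]: .3f}")
```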
7. Discussion and Conclusions
Acknowledgments
References
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons, Inc.: New York, NY, USA, 1991. [Google Scholar]
- Ebrahimi, N.; Soofi, E.S.; Soyer, R. Information measures in perspective. Int. Stat. Rev. 2010, 78, 383–412. [Google Scholar] [CrossRef]
- Stögbauer, H.; Kraskov, A.; Astakhov, S.A.; Grassberger, P. Least-dependent-component analysis based on mutual information. Phys. Rev. E 2004, 70, 066123:1–066123:17. [Google Scholar] [CrossRef]
- Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Netw. 2000, 13, 411–430. [Google Scholar] [CrossRef]
- Fraser, H.M.; Swinney, H.L. Independent coordinates for strange attractors from mutual information. Phys. Rev. A 1986, 33, 1134–1140. [Google Scholar] [CrossRef] [PubMed]
- DelSole, T. Predictability and information theory. Part I: Measures of predictability. J. Atmos. Sci. 2004, 61, 2425–2440. [Google Scholar] [CrossRef]
- Majda, A.; Kleeman, R.; Cai, D. A mathematical framework for quantifying predictability through relative entropy. Meth. Appl. Anal. 2002, 9, 425–444. [Google Scholar]
- Darbellay, G.A.; Vajda, I. Entropy expressions for multivariate continuous distributions. IEEE Trans. Inform. Theor. 2000, 46, 709–712. [Google Scholar] [CrossRef]
- Nadarajah, S.; Zografos, K. Expressions for Rényi and Shannon entropies for bivariate distributions. Inform. Sci. 2005, 170, 173–189. [Google Scholar] [CrossRef]
- Khan, S.; Bandyopadhyay, S.; Ganguly, A.R.; Saigal, S.; Erickson, D.J.; Protopopescu, V.; Ostrouchov, G. Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Phys. Rev. E 2007, 76, 026209:1–026209:15. [Google Scholar] [CrossRef]
- Walters-Williams, J.; Li, Y. Estimation of mutual information: A survey. Lect. Notes Comput. Sci. 2009, 5589, 389–396. [Google Scholar]
- Globerson, A.; Tishby, N. The minimum information principle for discriminative learning. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, Banff, Canada, 7–11 July 2004; pp. 193–200.
- Jaynes, E.T. On the rationale of maximum-entropy methods. Proc. IEEE 1982, 70, 939–952. [Google Scholar] [CrossRef]
- Wackernagel, H. Multivariate Geostatistics—An Introduction with Applications; Springer-Verlag: Berlin, Germany, 1995. [Google Scholar]
- Shams, S.A. Convergent iterative procedure for constructing bivariate distributions. Comm. Stat. Theor. Meth. 2010, 39, 1026–1037. [Google Scholar] [CrossRef]
- Ebrahimi, N.; Soofi, E.S.; Soyer, R. Multivariate maximum entropy identification, transformation, and dependence. J. Multivariate Anal. 2008, 99, 1217–1231. [Google Scholar] [CrossRef]
- Abramov, R. An improved algorithm for the multidimensional moment-constrained maximum entropy problem. J. Comput. Phys. 2007, 226, 621–644. [Google Scholar] [CrossRef]
- Abramov, R. The multidimensional moment-constrained maximum entropy problem: A BFGS algorithm with constraint scaling. J. Comput. Phys. 2009, 228, 96–108. [Google Scholar] [CrossRef]
- Abramov, R. The multidimensional maximum entropy moment problem: A review on numerical methods. Commun. Math. Sci. 2010, 8, 377–392. [Google Scholar] [CrossRef]
- Rockinger, M.; Jondeau, E. Entropy densities with an application to autoregressive conditional skewness and kurtosis. J. Econometrics 2002, 106, 119–142. [Google Scholar] [CrossRef]
- Pires, C.A.; Perdigão, R.A.P. Non-Gaussianity and asymmetry of the winter monthly precipitation estimation from the NAO. Mon. Wea. Rev. 2007, 135, 430–448. [Google Scholar] [CrossRef]
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138:1–066138:16. [Google Scholar] [CrossRef]
- Myers, J.L.; Well, A.D. Research Design and Statistical Analysis, 2nd ed.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2003. [Google Scholar]
- Calsaverini, R.S.; Vicente, R. An information-theoretic approach to statistical dependence: Copula information. Europhys. Lett. 2009, 88, 68003:1–6. [Google Scholar] [CrossRef]
- Monahan, A.H.; DelSole, T. Information theoretic measures of dependence, compactness, and non-Gaussianity for multivariate probability distributions. Nonlinear Proc. Geoph. 2009, 16, 57–64. [Google Scholar] [CrossRef]
- Guo, D.; Shamai, S.; Verdú, S. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. Inform. Theor. 2005, 51, 1261–1283. [Google Scholar] [CrossRef]
- Pires, C.A.; Perdigão, R.A.P. Minimum mutual information and non-Gaussianity through the maximum entropy method: Estimation from finite samples. Entropy 2012, submitted for publication. [Google Scholar]
- Shore, J.E.; Johnson, R.W. Axiomatic derivation of the principle of maximum entropy and the principle of the minimum cross-entropy. IEEE Trans. Inform. Theor. 1980, 26, 26–37. [Google Scholar] [CrossRef]
- Koehler, K.J.; Symmanowski, J.T. Constructing multivariate distributions with specific marginal distributions. J. Multivariate Anal. 1995, 55, 261–282. [Google Scholar] [CrossRef]
- Cruz-Medina, I.R.; Osorio-Sánchez, M.; García-Páez, F. Generation of multivariate random variables with known marginal distribution and a specified correlation matrix. InterStat 2010, 16, 19–29. [Google Scholar]
- van Hulle, M.M. Edgeworth approximation of multivariate differential entropy. Neural Comput. 2005, 17, 1903–1910. [Google Scholar] [CrossRef]
- Comon, P. Independent component analysis, a new concept? Signal Process. 1994, 36, 287–314. [Google Scholar] [CrossRef] [Green Version]
- van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: New York, NY, USA, 1998. [Google Scholar]
- Hilbert, D. Über die Darstellung definiter Formen als Summe von Formenquadraten. Math. Ann. 1888, 32, 342–350. [Google Scholar] [CrossRef]
- Ahmadi, A.A.; Parrilo, P.A. A positive definite polynomial hessian that does not factor. In Proceedings of the Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference, Shanghai, China, 15–18 December 2009; pp. 16–18.
- Wolfram, S. The Mathematica Book, 3rd ed.; Cambridge University Press: Cambridge, UK, 1996. [Google Scholar]
- Bocquet, M.; Pires, C.; Lin, W. Beyond Gaussian statistical modeling in geophysical data assimilation. Mon. Wea. Rev. 2010, 138, 2997–3023. [Google Scholar] [CrossRef]
- Sura, P.; Newman, M.; Penland, C.; Sardeshmukh, P. Multiplicative noise and non-Gaussianity: A paradigm for atmospheric regimes? J. Atmos. Sci. 2005, 62, 1391–1406. [Google Scholar] [CrossRef]
- Rioul, O. A simple proof of the entropy-power inequality via properties of mutual information. arXiv 2007. arXiv:cs/0701050v2 [cs.IT]. [Google Scholar]
- Guo, D.; Wu, Y.; Shamai, S.; Verdú, S. Estimation in Gaussian noise: Properties of the minimum mean-square error. IEEE Trans. Inform. Theor. 2011, 57, 2371–2385. [Google Scholar]
- Gilbert, J.; Lemaréchal, C. Some numerical experiments with variable-storage quasi-Newton algorithms. Math. Program. 1989, 45, 407–435. [Google Scholar] [CrossRef]
Appendix 1
Form and Numerical Estimation of ME-PDFs
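The body of this appendix is not preserved in this extraction. As a hedged placeholder, written in generic notation of my own and not necessarily the appendix's exact equations, the moment-constrained ME-PDF has the standard exponential-family form, and its Lagrange multipliers are obtained by minimizing the convex dual, e.g., with quasi-Newton schemes such as those referenced above.

```latex
% Standard moment-constrained maximum-entropy form and its convex dual
% (generic notation; a sketch, not necessarily the appendix's exact equations).
\begin{align*}
  \rho_{\lambda}(\mathbf{x}) &= \exp\!\Big(-\lambda_0 - \sum_{k=1}^{m}\lambda_k\,T_k(\mathbf{x})\Big),
  \qquad
  e^{\lambda_0} = \int \exp\!\Big(-\sum_{k=1}^{m}\lambda_k\,T_k(\mathbf{x})\Big)\,d\mathbf{x}, \\[4pt]
  \psi(\lambda) &= \lambda_0(\lambda) + \sum_{k=1}^{m}\lambda_k\,\mu_k ,
  \qquad
  \frac{\partial \psi}{\partial \lambda_k} = \mu_k - \mathbb{E}_{\rho_{\lambda}}\!\left[T_k(\mathbf{X})\right],
\end{align*}
```

so that minimizing the convex function $\psi$ (for instance by a quasi-Newton iteration) enforces the prescribed moments $\mu_k = \mathbb{E}[T_k(\mathbf{X})]$ on the resulting density.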
Appendix 2
- Proof of Theorem 1: Since X follows the ME-PDF generated by and ME-congruency holds, we have with similar identities for Y. Therefore, the first inequality of (7a) follows directly from (5). The second one is obtained by difference and application of Lemma 1 to (q.e.d.).
- Proof of Theorem 2: The first equality of (11) comes from Equation (1) as and from the negentropy definition of Gaussianized variables , where is the entropy of the Gaussian fit. From the entropy formula of transformed variables we get, leading to the negentropy equality . Finally, the last equation of (11) comes from the equality and the definition of negentropy, i.e., (q.e.d.).
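The inline equations of these proofs are missing from the extraction. For readability, the block below collects the standard identities that the wording of the proof of Theorem 2 appeals to (entropy of transformed variables, the negentropy definition), together with the Gaussian MI closed form used throughout Section 3. The notation is mine and need not match the paper's Equation (11).

```latex
% Standard identities invoked in the proof of Theorem 2 (generic notation; a sketch).
\begin{align*}
  h(A\mathbf{X}) &= h(\mathbf{X}) + \ln\lvert\det A\rvert
      && \text{(entropy of linearly transformed variables)}\\
  J(\mathbf{X}) &:= h(\mathbf{X}_{g}) - h(\mathbf{X}), \qquad
      \mathbf{X}_{g}\sim\mathcal{N}\!\big(\boldsymbol{\mu}_{\mathbf{X}},\Sigma_{\mathbf{X}}\big)
      && \text{(negentropy with respect to the Gaussian fit)}\\
  I_{g} &= -\tfrac{1}{2}\ln\!\big(1-c_{g}^{2}\big)
      && \text{(MI of the bivariate Gaussian fit with correlation } c_g\text{).}
\end{align*}
```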
© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).