A Conditional Mutual Information Estimator for Mixed Data and an Associated Conditional Independence Test
Figure (synthetic data with known ground truth): MSE (on a log scale) of each method as a function of the sample size (in abscissa) over the nine retained settings.
Figure (sensitivity to dimensionality): Left: MSE (on a log scale) of each method for the multidimensional conditional mutual information (M-CMI) as the dimension of the conditioning variable increases from 0 to 4, with the sample size fixed at 2000. Middle: MSE (on a log scale) of each method except LH for the multidimensional mutual information (M-MI) as the number of observations increases. Right: MSE (on a log scale) of each method except LH for the multidimensional independent conditional mutual information (M-ICMI) as the number of observations increases.
Abstract
1. Introduction
2. Related Work
2.1. Conditional Mutual Information
2.2. Conditional Independence Tests
3. Hybrid Conditional Mutual Information Estimation for Mixed Data
3.1. Proposed Hybrid Estimator
Algorithm 1 Hybrid estimator CMIh
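The full CMIh algorithm is specified in the paper; as a rough, hedged illustration of the entropy-decomposition backbone such hybrid estimators build on, the sketch below estimates I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(Z) − H(X,Y,Z), with each entropy term computed by the Kozachenko–Leonenko k-NN estimator. This is a simplified, fully continuous stand-in only: it does not implement the paper's hybrid treatment of mixed discrete–continuous coordinates, and the function names and the default k = 3 are our own choices.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(samples, k=3):
    """Kozachenko-Leonenko k-NN entropy estimate in nats (Chebyshev metric)."""
    n, d = samples.shape
    dist, _ = cKDTree(samples).query(samples, k=k + 1, p=np.inf)
    eps = dist[:, -1]  # distance to the k-th neighbour, self excluded
    return digamma(n) - digamma(k) + d * np.mean(np.log(2.0 * eps))

def cmi_entropy_decomposition(x, y, z, k=3):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z), term by term via kl_entropy.

    Simplified sketch for fully continuous data; the paper's CMIh additionally
    handles discrete and mixed coordinates inside each entropy term.
    """
    xz, yz, xyz = np.hstack([x, z]), np.hstack([y, z]), np.hstack([x, y, z])
    return (kl_entropy(xz, k) + kl_entropy(yz, k)
            - kl_entropy(z, k) - kl_entropy(xyz, k))
```

On jointly independent variables the four terms nearly cancel, so the estimate should be close to zero; the k-NN biases of the individual entropy terms only partially cancel, which is one motivation for hybrid corrections.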
3.2. Experimental Illustration
- MI quantitative. with .
- MI mixed. and , we get ;
- MI mixed imbalanced. and . The ground truth is , where is the Euler-Mascheroni constant.
- CMI quantitative, CMI mixed and CMI mixed imbalanced. We use the previous setting and add an independent qualitative random variable .
- CMI quantitative ., and , the ground truth is then .
- CMI mixed ., and , the ground truth is then .
- CMI mixed imbalanced ., and , the ground truth is .
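The exact generative settings and ground-truth formulas above are elided in this extract. As a generic stand-in with a closed-form ground truth, the snippet below draws a bivariate Gaussian with correlation rho, for which I(X;Y) = −½ log(1 − ρ²), and estimates it with the KSG k-NN estimator of Kraskov et al.; this particular setting and the choice k = 3 are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG (algorithm 1) k-NN mutual information estimate, in nats."""
    n = x.shape[0]
    x = x.reshape(n, -1)
    y = y.reshape(n, -1)
    xy = np.hstack([x, y])
    # distance to the k-th neighbour in the joint space (Chebyshev metric)
    dist, _ = cKDTree(xy).query(xy, k=k + 1, p=np.inf)
    eps = dist[:, -1]
    tx, ty = cKDTree(x), cKDTree(y)
    # neighbour counts strictly inside eps in each marginal space, self excluded
    nx = np.array([len(tx.query_ball_point(x[i], eps[i] * (1 - 1e-12), p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(ty.query_ball_point(y[i], eps[i] * (1 - 1e-12), p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Bivariate Gaussian: ground truth I(X;Y) = -0.5 * log(1 - rho^2)
rng = np.random.default_rng(0)
rho, n = 0.8, 2000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
truth = -0.5 * np.log(1 - rho ** 2)
```

With a few thousand observations the estimate is typically within a few hundredths of a nat of the ground truth, which is the kind of MSE-versus-sample-size comparison the synthetic benchmarks above report.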
4. Testing Conditional Independence
4.1. Local-Adaptive Permutation Test for Mixed Data
Algorithm 2 Local-Adaptive permutation test
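As a hedged sketch of the local permutation idea (not the paper's exact Local-Adaptive procedure): X is permuted only among samples whose Z values are close, so the X–Z relation is preserved under the null while any X–Y dependence beyond Z is destroyed. The neighbourhood size `k_perm`, the generic callable `stat`, and all names here are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_permutation_pvalue(x, y, z, stat, k_perm=5, n_perm=100, seed=0):
    """Permutation p-value for H0: X independent of Y given Z.

    `stat(x, y, z)` is any user-supplied dependence statistic that is large
    under dependence (e.g. a CMI estimate). X is resampled locally in Z.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    z = np.asarray(z, float).reshape(n, -1)
    # for each sample, its k_perm nearest neighbours in Z (self included)
    _, nbrs = cKDTree(z).query(z, k=k_perm)
    observed = stat(x, y, z)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # replace each x_i by the x value of one of its Z-neighbours
        idx = nbrs[np.arange(n), rng.integers(0, k_perm, size=n)]
        null[b] = stat(x[idx], y, z)
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Compared with a global permutation, the local scheme keeps the X–Z dependence intact in the null draws, which is what makes the resulting test approximately calibrated under conditional independence.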
4.2. Experimental Illustration
4.2.1. Simulated Data
4.2.2. Real Data
Preprocessed DWD Dataset
- Case 1: latitude is unconditionally independent of longitude as the 349 weather stations are distributed irregularly on the map.
- Case 2: latitude is dependent on longitude given temperature, as both latitude and longitude influence temperature: moving a thermometer towards the equator generally increases the measured temperature, and the climate in West Germany is more oceanic and less continental than in East Germany.
ADHD-200 Dataset
- Case 2: hyperactivity/impulsivity level is independent of medication status given attention deficit level, which has been confirmed by Cui et al. [49].
EasyVista IT Monitoring System
- Case 1 represents a conditional independence between message dispatcher at time t and metric insertion at time t given status metric extraction at time t and message dispatcher and metric insertion at time .
- Case 2 represents a conditional independence between group history insertion at time t and collector monitoring information at time t given status metric extraction at time t and group history insertion and collector monitoring information at time .
- Case 3 represents a conditional dependence between status metric extraction at time t and group history insertion at time t given status metric extraction at time .
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Generative Processes Used for the Different Configurations on the Three Structures Chain, Fork and Collider
Appendix A.1. Processes for Chain
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- with probability (0.6,0.1,0.1,0.1,0.1)
- with probability and the other four realizations with probability
- with probability and the other four realizations with probability
and where the functions s and t are defined by
Appendix A.2. Processes for Fork
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- with probability and the other four realizations with probability
- with probability and the other four realizations with probability
- with probability (0.6,0.1,0.1,0.1,0.1).
and where the functions p and q are defined by
Appendix A.3. Processes for Collider
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- For the configuration ’’:
- with probability (0.6,0.1,0.1,0.1,0.1)
- with probability (0.6,0.1,0.1,0.1,0.1)
- with probability and the other four realizations with probability
and where the function m is defined by
References
- Spirtes, P.; Glymour, C.N.; Scheines, R.; Heckerman, D. Causation, Prediction, and Search; MIT Press: Cambridge, MA, USA, 2000. [Google Scholar]
- Whittaker, J. Graphical Models in Applied Multivariate Statistics; Wiley Publishing: New York, NY, USA, 2009. [Google Scholar]
- Vinh, N.; Chan, J.; Bailey, J. Reconsidering mutual information based feature selection: A statistical significance view. In Proceedings of the AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 28. [Google Scholar]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: New York, NY, USA, 2006. [Google Scholar]
- Székely, G.J.; Rizzo, M.L.; Bakirov, N.K. Measuring and testing dependence by correlation of distances. Ann. Stat. 2007, 35, 2769–2794. [Google Scholar] [CrossRef]
- Gretton, A.; Bousquet, O.; Smola, A.; Schölkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings of the International Conference on Algorithmic Learning Theory, Singapore, 8–11 October 2005; pp. 63–77. [Google Scholar]
- Gretton, A.; Smola, A.; Bousquet, O.; Herbrich, R.; Belitski, A.; Augath, M.; Murayama, Y.; Pauls, J.; Schölkopf, B.; Logothetis, N. Kernel constrained covariance for dependence measurement. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, Hastings, Barbados, 6–8 January 2005; pp. 112–119. [Google Scholar]
- Póczos, B.; Ghahramani, Z.; Schneider, J. Copula-based kernel dependency measures. arXiv 2012, arXiv:1206.4682. [Google Scholar]
- Berrett, T.B.; Samworth, R.J. Nonparametric independence testing via mutual information. Biometrika 2019, 106, 547–566. [Google Scholar] [CrossRef]
- Wyner, A.D. A definition of conditional mutual information for arbitrary ensembles. Inf. Control. 1978, 38, 51–59. [Google Scholar] [CrossRef]
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 623–656. [Google Scholar] [CrossRef]
- Frenzel, S.; Pompe, B. Partial Mutual Information for Coupling Analysis of Multivariate Time Series. Phys. Rev. Lett. 2007, 99, 204101. [Google Scholar] [CrossRef]
- Vejmelka, M.; Paluš, M. Inferring the directionality of coupling with conditional mutual information. Phys. Rev. E 2008, 77, 026214. [Google Scholar] [CrossRef]
- Scott, D.W. Multivariate Density Estimation: Theory, Practice, and Visualization; John Wiley & Sons: New York, NY, USA, 2015. [Google Scholar]
- Cabeli, V.; Verny, L.; Sella, N.; Uguzzoni, G.; Verny, M.; Isambert, H. Learning clinical networks from medical records based on information estimates in mixed-type data. PLoS Comput. Biol. 2020, 16, e1007866. [Google Scholar] [CrossRef]
- Marx, A.; Yang, L.; van Leeuwen, M. Estimating conditional mutual information for discrete-continuous mixtures using multi-dimensional adaptive histograms. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), SIAM, Virtual Event, 29 April–1 May 2021; pp. 387–395. [Google Scholar]
- Beirlant, J.; Dudewicz, E.J.; Györfi, L.; Van der Meulen, E.C. Nonparametric entropy estimation: An overview. Int. J. Math. Stat. Sci. 1997, 6, 17–39. [Google Scholar]
- Kozachenko, L.F.; Leonenko, N.N. Sample estimate of the entropy of a random vector. Probl. Peredachi Informatsii 1987, 23, 9–16. [Google Scholar]
- Singh, H.; Misra, N.; Hnizdo, V.; Fedorowicz, A.; Demchuk, E. Nearest neighbor estimates of entropy. Am. J. Math. Manag. Sci. 2003, 23, 301–321. [Google Scholar] [CrossRef]
- Singh, S.; Póczos, B. Finite-sample analysis of fixed-k nearest neighbor density functional estimators. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef]
- Ross, B.C. Mutual Information between Discrete and Continuous Data Sets. PLoS ONE 2014, 9, e87357. [Google Scholar]
- Gao, W.; Kannan, S.; Oh, S.; Viswanath, P. Estimating mutual information for discrete-continuous mixtures. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Rahimzamani, A.; Asnani, H.; Viswanath, P.; Kannan, S. Estimators for multivariate information measures in general probability spaces. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montreal, QC, Canada, 3–8 December 2018; Volume 31. [Google Scholar]
- Mesner, O.C.; Shalizi, C.R. Conditional Mutual Information Estimation for Mixed, Discrete and Continuous Data. IEEE Trans. Inf. Theory 2020, 67, 464–484. [Google Scholar] [CrossRef]
- Ahmad, A.; Khan, S.S. Survey of state-of-the-art mixed data clustering algorithms. IEEE Access 2019, 7, 31883–31902. [Google Scholar] [CrossRef]
- Mukherjee, S.; Asnani, H.; Kannan, S. CCMI: Classifier based conditional mutual information estimation. In Proceedings of the 35th Uncertainty in Artificial Intelligence Conference, Tel Aviv, Israel, 22–25 July 2020; pp. 1083–1093. [Google Scholar]
- Mondal, A.; Bhattacharjee, A.; Mukherjee, S.; Asnani, H.; Kannan, S.; Prathosh, A. C-MI-GAN: Estimation of conditional mutual information using minmax formulation. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), Virtual, 3–6 August 2020; pp. 849–858. [Google Scholar]
- Meynaoui, A. New Developments around Dependence Measures for Sensitivity Analysis: Application to Severe Accident Studies for Generation IV Reactors. Ph.D. Thesis, INSA de Toulouse, Toulouse, France, 2019. [Google Scholar]
- Shah, R.D.; Peters, J. The hardness of conditional independence testing and the generalised covariance measure. Ann. Stat. 2020, 48, 1514–1538. [Google Scholar] [CrossRef]
- Fukumizu, K.; Gretton, A.; Sun, X.; Schölkopf, B. Kernel measures of conditional dependence. In Proceedings of the Advances in Neural Information Processing Systems 20 (NIPS 2007), Vancouver, BC, Canada, 3–6 December 2007; Volume 20. [Google Scholar]
- Zhang, K.; Peters, J.; Janzing, D.; Schölkopf, B. Kernel-Based Conditional Independence Test and Application in Causal Discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, UAI’11, Barcelona, Spain, 14–17 July 2011; pp. 804–813. [Google Scholar]
- Strobl, E.V.; Zhang, K.; Visweswaran, S. Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. J. Causal Inference 2019, 7. [Google Scholar] [CrossRef]
- Zhang, Q.; Filippi, S.; Flaxman, S.; Sejdinovic, D. Feature-to-Feature Regression for a Two-Step Conditional Independence Test. In Proceedings of the Association for Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, 11–15 August 2017. [Google Scholar]
- Doran, G.; Muandet, K.; Zhang, K.; Schölkopf, B. A Permutation-Based Kernel Conditional Independence Test. In Proceedings of the Association for Uncertainty in Artificial Intelligence UAI, Quebec City, QC, Canada, 23–27 July 2014; pp. 132–141. [Google Scholar]
- Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 2012, 13, 723–773. [Google Scholar]
- Tsagris, M.; Borboudakis, G.; Lagani, V.; Tsamardinos, I. Constraint-based causal discovery with mixed data. Int. J. Data Sci. Anal. 2018, 6, 19–30. [Google Scholar] [CrossRef]
- Berry, K.J.; Johnston, J.E.; Mielke, P.W. Permutation statistical methods. In The Measurement of Association; Springer: Berlin/Heidelberg, Germany, 2018; pp. 19–71. [Google Scholar]
- Runge, J. Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics 2018, Lanzarote, Spain, 9–11 April 2018; pp. 938–947. [Google Scholar]
- Manoukian, E.B. Mathematical Nonparametric Statistics; Taylor & Francis: Tokyo, Japan, 2022. [Google Scholar]
- Antos, A.; Kontoyiannis, I. Estimating the entropy of discrete distributions. In Proceedings of the IEEE International Symposium on Information Theory 2001, Washington, DC, USA, 24–29 June 2001; p. 45. [Google Scholar]
- Vollmer, M.; Rutter, I.; Böhm, K. On Complexity and Efficiency of Mutual Information Estimation on Static and Dynamic Data. In Proceedings of the EDBT, Vienna, Austria, 26–29 March 2018; pp. 49–60. [Google Scholar]
- Bentley, J.L. Multidimensional binary search trees used for associative searching. Commun. ACM 1975, 18, 509–517. [Google Scholar] [CrossRef]
- Romano, J.P.; Wolf, M. Exact and approximate stepdown methods for multiple hypothesis testing. J. Am. Stat. Assoc. 2005, 100, 94–108. [Google Scholar] [CrossRef]
- Mooij, J.M.; Peters, J.; Janzing, D.; Zscheischler, J.; Schölkopf, B. Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks. J. Mach. Learn. Res. 2016, 17, 1103–1204. [Google Scholar]
- Cao, Q.; Zang, Y.; Sun, L.; Sui, M.; Long, X.; Zou, Q.; Wang, Y. Abnormal neural activity in children with attention deficit hyperactivity disorder: A resting-state functional magnetic resonance imaging study. Neuroreport 2006, 17, 1033–1036. [Google Scholar] [CrossRef]
- Bauermeister, J.J.; Shrout, P.E.; Chávez, L.; Rubio-Stipec, M.; Ramírez, R.; Padilla, L.; Anderson, A.; García, P.; Canino, G. ADHD and gender: Are risks and sequela of ADHD the same for boys and girls? J. Child Psychol. Psychiatry 2007, 48, 831–839. [Google Scholar] [CrossRef]
- Willcutt, E.G.; Pennington, B.F.; DeFries, J.C. Etiology of inattention and hyperactivity/impulsivity in a community sample of twins with learning difficulties. J. Abnorm. Child Psychol. 2000, 28, 149–159. [Google Scholar] [CrossRef]
- Cui, R.; Groot, P.; Heskes, T. Copula PC algorithm for causal discovery from mixed data. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Riva del Garda, Italy, 19–23 September 2016; pp. 377–392. [Google Scholar]
| Dim of Z | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| CMIh | 8.30 (0.14) | 5.30 (0.05) | 4.37 (0.04) | 4.16 (0.04) | 4.39 (0.08) |
| FP | 16.19 (0.40) | 22.09 (0.27) | 24.28 (0.21) | 25.91 (0.08) | 27.41 (0.07) |
| LH | 0.54 (0.07) | 1.09 (0.02) | 6.52 (0.12) | 58.58 (13.74) | 691.68 (123.90) |
| MS | 16.28 (0.40) | 22.08 (0.07) | 24.26 (0.10) | 26.07 (0.06) | 27.73 (0.06) |
| RAVK | 16.14 (0.11) | 22.07 (0.07) | 24.28 (0.08) | 25.89 (0.09) | 27.44 (0.14) |
| Structure | Config | CMIh-LocT (0.01) | CMIh-LocT (0.05) | CMIh-LocAT (0.01) | CMIh-LocAT (0.05) | CMIh-GloT (0.01) | CMIh-GloT (0.05) | MS-LocT (0.01) | MS-LocT (0.05) | MS-LocAT (0.01) | MS-LocAT (0.05) | MS-GloT (0.01) | MS-GloT (0.05) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chain | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Chain | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.9 | 1 | 0.9 | 0 | 0 |
| Chain | | 1 | 0.9 | 1 | 0.9 | 1 | 0.8 | 1 | 1 | 1 | 1 | 1 | 1 |
| Chain | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Chain | | 0 | 0 | 0.8 | 0.4 | 0 | 0 | 0 | 0 | 0.5 | 0.3 | 0 | 0 |
| Chain | | 1 | 0.9 | 1 | 0.9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Fork | | 0.9 | 0.9 | 0.9 | 0.9 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| Fork | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
| Fork | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Fork | | 1 | 1 | 1 | 0.9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Fork | | 0 | 0 | 0.9 | 0.8 | 0 | 0 | 0 | 0 | 0.8 | 0.5 | 0 | 0 |
| Fork | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.9 | 1 | 1 | 1 | 1 |
| Collider | | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Collider | | 1 | 1 | 1 | 1 | 0.8 | 0.9 | 1 | 1 | 1 | 1 | 1 | 1 |
| Collider | | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Collider | | 0 | 0 | 0.4 | 0.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Collider | | 0.6 | 1 | 1 | 1 | 0.2 | 0.4 | 0 | 0 | 0 | 0 | 0 | 0 |
| Collider | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.4 | 0.9 |
| | CMIh-LocT | CMIh-LocAT | MS-LocT | MS-LocAT |
|---|---|---|---|---|
| Case 1 | 0.05 | 0.05 | 0.03 | 0.03 |
| Case 2 | 0 | 0 | 0.09 | 0.08 |
| | CMIh-LocT | CMIh-LocAT | MS-LocT | MS-LocAT |
|---|---|---|---|---|
| Case 1 | 0.36 | 0.36 | 1 | 1 |
| Case 2 | 0.17 | 0.19 | 1 | 1 |
| | CMIh-LocT (0.01) | CMIh-LocT (0.05) | CMIh-LocAT (0.01) | CMIh-LocAT (0.05) | MS-LocT (0.01) | MS-LocT (0.05) | MS-LocAT (0.01) | MS-LocAT (0.05) |
|---|---|---|---|---|---|---|---|---|
| Case 1 | 1 | 0.75 | 1 | 0.75 | 0.67 | 0.58 | 0.75 | 0.58 |
| Case 2 | 1 | 0.67 | 1 | 0.67 | 0.92 | 0.75 | 1 | 0.83 |
| Case 3 | 0.75 | 0.83 | 0.75 | 0.83 | 0 | 0 | 0 | 0 |
Zan, L.; Meynaoui, A.; Assaad, C.K.; Devijver, E.; Gaussier, E. A Conditional Mutual Information Estimator for Mixed Data and an Associated Conditional Independence Test. Entropy 2022, 24, 1234. https://doi.org/10.3390/e24091234