Abstract
As databases grow larger, data mining becomes increasingly important. Exploring and understanding such data requires the help of computers and of data mining methods such as clustering. This paper introduces some information-theoretic distances and shows how they can be used for clustering. More precisely, we want to classify a finite set of discrete random variables, with the classification based on the correlation between these variables. In the design of a clustering system the choice of the notion of distance is crucial, so several information distances and classification methods are provided. We also show that, in order to obtain distances over clusters, the variables that are functions of other variables have to be removed from the starting set. The last part presents some applications run in Matlab.
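The paper's own constructions are not reproduced on this page. As a minimal sketch of the kind of distance the abstract refers to, the Python below estimates the classical information distance between two discrete random variables, d(X, Y) = H(X, Y) - I(X; Y) = H(X|Y) + H(Y|X), together with a normalized variant D(X, Y) = 1 - I(X; Y)/H(X, Y), from paired samples using plug-in frequency estimates. The function names and the toy data are illustrative assumptions, not the authors' code.

```python
# Sketch (not the paper's code): plug-in estimates of Shannon entropy,
# mutual information, and the information distance
#   d(X, Y) = H(X, Y) - I(X; Y) = H(X|Y) + H(Y|X)
# from paired samples of two discrete random variables.

from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in estimate of H(X) in bits from a list of outcomes."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def joint_entropy(xs, ys):
    """Plug-in estimate of H(X, Y) from paired outcomes."""
    return entropy(list(zip(xs, ys)))

def information_distance(xs, ys):
    """d(X, Y) = H(X, Y) - I(X; Y), which equals H(X|Y) + H(Y|X)."""
    hxy = joint_entropy(xs, ys)
    mi = entropy(xs) + entropy(ys) - hxy   # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return hxy - mi

def normalized_distance(xs, ys):
    """D(X, Y) = 1 - I(X; Y) / H(X, Y), a value in [0, 1]."""
    hxy = joint_entropy(xs, ys)
    if hxy == 0.0:                         # both variables are constant
        return 0.0
    mi = entropy(xs) + entropy(ys) - hxy
    return 1.0 - mi / hxy

if __name__ == "__main__":
    x = [0, 0, 1, 1, 0, 1, 0, 1]
    y = [1, 1, 0, 0, 1, 0, 1, 0]   # y is a function of x (y = 1 - x)
    z = [0, 1, 0, 1, 1, 0, 0, 0]   # only weakly related to x
    print(information_distance(x, y))  # 0.0: each variable determines the other
    print(information_distance(x, z))  # > 0
```

Note that d(X, Y) = 0 whenever X and Y are functions of each other, so d is a metric only on equivalence classes of variables. This is consistent with the abstract's remark that variables which are functions of other variables must be removed from the starting set before distances over clusters can be obtained.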
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Houllier, M., Luo, Y. (2010). Information Distances over Clusters. In: Zhang, L., Lu, B.L., Kwok, J. (eds.) Advances in Neural Networks - ISNN 2010. Lecture Notes in Computer Science, vol. 6063. Springer, Berlin, Heidelberg.
DOI: https://doi.org/10.1007/978-3-642-13278-0_46
Print ISBN: 978-3-642-13277-3
Online ISBN: 978-3-642-13278-0