Abstract
We propose a new center-based iterative clustering algorithm, K- Harmonic Means (KHM), which is essentially insensitive to the initialization of the centers, demonstrated through a set of experiments. The dependency of the K-Means performance on the initialization of the centers has been a major problem; a similar issue exists for an alternative algorithm, Expectation Maximization (EM). Many have tried to generate good initializations to solve the sensitivity problem. KHM addresses the intrinsic problem by replacing the minimum distance from a data point to the centers, used in K-means, by the Harmonic Averages of the distances from the data point to all centers. KHM significantly improves the quality of clustering results comparing with both K- Means and EM. The KHM algorithm has been implemented in both sequential and parallel languages and tested on hundreds of randomly generated datasets with different data distribution and clustering characteristics.
Primary Contact: bzhang@hpl.hp.com. This document is released as a technical report in Oct. 1999, available at http://www.hpl.hp.com/techreports/1999/HPL-1999-124.html.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Anderberg, M. R. 1973. Cluster analysis for applications. Academic Press, New York. 35p.
Bay, S. D. (1999). The UCI KDD Archive [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Department of Information and Computer Science.
Bezdek, Ehrlich, & Full, “FCM: THE FUZZY c-MEANS CLUSTERING ALGORITHM”, Computers & Geosciences, v.10, pp.191–203, 1984
Bradley, P., Fayyad, U. M., and Reina, C.A., “Scaling EM Clustering to Large Databases,” MS Technical Report, 1998.
Bradley, P., Fayyad, U. M., C.A., “Refining Initial Points for KM Clustering”, MS Technical Report MSR-TR-98-36, May 1998.
Bradley, P., Fayyad, U.M., and Reina, C.A., “Scaling Clustering to Large Databases”, KDD98, 1998.
Duda, R., Hart, P., “Pattern Classification and Scene Analysis”, John Wiley & Sons, 1972.
Dempster, A. P., Laird, N.M., and Rubin, D.B., “Miximum Likelyhood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
Fayyad, U. M., Piatetsky-Shapiro, G. Smyth, P. and Uthurusamy, R., “Advances in Knowledge Discovery and Data Mining”, AAAI Press 1996
Gersho & Gray, “Vector Quantization and Signal Compression”, KAP, 1992
Gill, P.E., Murray, W. and Wright, H.M., “Practical Optimization”, Academic Press, 1981.
Gonzales, T.F., “Clustering to Minimize the Maximum Intercluster Distance”, Theo. Comp. Sci. 38, p293–306, 1985.
Kaufman, L. and Rousseeuw, P. J., “Finding Groups in Data: An Introduction to Cluster Analysis”, John Wiley & Sons, 1990.
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. Pp. 281–297 in: L. M. Le Cam & J. Neyman [eds.] Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1. University of California Press, Berkeley. xvii + 666 p.
McKenzie, P. and Alder, M., “Initializing the EM Algorithm for Use in Gaussian Mixture Modeling”, The Univ. of Western Australia, Center for Information Processing Systems, Manuscript.
McLachlan, G. J. and Krishnan, T., “The EM Algorithm and Extensions.”, John Wiley & Sons, Inc., 1997
Pelleg, D. and Moore, A, “Accelerating Exact K-Means Algorithms with Geometric Reasoning”, KDD-99, Proc. of the Fifth ACM SIGKDD Intern. Conf. On Knowledge Discovery and Data Mining, page 277–281.
Rendner, R.A. and Walker, H.F., “Mixture Densities, Maximum Likelihood and The EM Algorithm”, SIAM Review, vol. 26 #2, 1984.
Selim, S.Z. and Ismail, M.A., “K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality”, IEEE Trans. On PAMI-6, #1, 1984.
Zhang, B., Hsu, M., Forman, G., “Accurate Recasting of Parameter Estimation Algorithms using Sufficient Statistics for Efficient Parallel Speed-up”, To appear in PKDD 2000, September, Lyon, France.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, B., Hsu, M., Dayal, U. (2001). K-Harmonic Means -A Spatial Clustering Algorithm with Boosting. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science(), vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_4
Download citation
DOI: https://doi.org/10.1007/3-540-45244-3_4
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41773-6
Online ISBN: 978-3-540-45244-7
eBook Packages: Springer Book Archive