Stability of k-Means Clustering

Shai Ben-David¹,
Dávid Pál¹ &
Hans Ulrich Simon²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4539)

Included in the following conference series:

International Conference on Computational Learning Theory

Abstract

We consider the stability of k-means clustering problems. Clustering stability is a common heuristic used to determine the number of clusters in a wide variety of clustering applications. We continue the theoretical analysis of clustering stability by establishing a complete characterization of clustering stability in terms of the number of optimal solutions to the clustering optimization problem. Our results complement earlier work of Ben-David, von Luxburg and Pál by settling the main problem left open there. Our analysis shows that, for probability distributions with finite support, the stability of k-means clusterings depends solely on the number of optimal solutions to the underlying optimization problem for the data distribution. These results challenge the common belief and practice that view stability as an indicator of the validity, or meaningfulness, of the choice of a clustering algorithm and number of clusters.
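
The characterization in the abstract can be illustrated on a toy case. The sketch below is not the authors' construction. It assumes a one-dimensional distribution with finite support, enumerates every partition of the support into k non-empty clusters by brute force, and counts the partitions that attain the minimum k-means cost. The function names `kmeans_cost` and `optimal_partitions` are hypothetical, and counting optimal partitions of the support is only a simplified stand-in for the paper's notion of the number of optimal solutions.

```python
from fractions import Fraction
from itertools import product


def kmeans_cost(points, weights, labels, k):
    """Weighted k-means cost of the partition encoded by `labels`.

    Assumption: points are 1-D and every cluster 0..k-1 is non-empty.
    Exact rational arithmetic avoids floating-point ties being missed.
    """
    cost = Fraction(0)
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        mass = sum(weights[i] for i in members)
        # The optimal center of a cluster is its weighted mean.
        center = sum(weights[i] * points[i] for i in members) / mass
        cost += sum(weights[i] * (points[i] - center) ** 2 for i in members)
    return cost


def optimal_partitions(points, weights, k):
    """Return (minimum cost, set of partitions attaining it) by brute force."""
    best_cost, best = None, set()
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:
            continue  # skip labelings that leave some cluster empty
        # Canonical form, so that relabeled clusters count only once.
        partition = frozenset(
            frozenset(i for i, lab in enumerate(labels) if lab == c)
            for c in range(k)
        )
        cost = kmeans_cost(points, weights, labels, k)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, {partition}
        elif cost == best_cost:
            best.add(partition)
    return best_cost, best


if __name__ == "__main__":
    uniform = [Fraction(1, 3)] * 3
    # Symmetric support: two partitions tie for the optimum.
    cost, parts = optimal_partitions([0, 1, 2], uniform, 2)
    print(cost, len(parts))  # prints: 1/6 2
    # Asymmetric support: the optimum is unique.
    cost, parts = optimal_partitions([0, 1, 10], uniform, 2)
    print(cost, len(parts))  # prints: 1/6 1
```

Read against the abstract's characterization, the symmetric support {0, 1, 2} has more than one optimal solution when counted this way, while the support {0, 1, 10} has exactly one. The brute-force enumeration is exponential in the support size and is meant only for very small examples.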

References

Extended version of this paper. Available at http://www.cs.uwaterloo.ca/~dpal/papers/stability/stability.pdf or at http://www.cs.uwaterloo.ca/~shai/publications/stability.pdf
Ben-David, S.: A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In: Proceedings of the Conference on Computational Learning Theory, pp. 415–426 (2004)
Ben-David, S., von Luxburg, U., Pál, D.: A sober look at clustering stability. In: Proceedings of the Conference on Computational Learning Theory, pp. 5–19 (2006)
Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing 7, 6–17 (2002)
Dudoit, S., Fridlyand, J.: A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology 3(7) (2002)
Lange, T., Braun, M.L., Roth, V., Buhmann, J.: Stability-based model selection. Advances in Neural Information Processing Systems 15, 617–624 (2003)
Levine, E., Domany, E.: Resampling method for unsupervised estimation of cluster validity. Neural Computation 13(11), 2573–2593 (2001)
Meila, M.: Comparing clusterings. In: Proceedings of the Conference on Computational Learning Theory, pp. 173–187 (2003)
Pollard, D.: Strong consistency of k-means clustering. The Annals of Statistics 9(1), 135–140 (1981)
Rakhlin, A., Caponnetto, A.: Stability of k-means clustering. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA (2007)
von Luxburg, U., Ben-David, S.: Towards a statistical theory of clustering. In: PASCAL workshop on Statistics and Optimization of Clustering (2005)

Author information

Authors and Affiliations

David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
Shai Ben-David & Dávid Pál
Ruhr-Universität Bochum, Germany
Hans Ulrich Simon

Editor information

Nader H. Bshouty, Claudio Gentile

About this paper

Cite this paper

Ben-David, S., Pál, D., Simon, H.U. (2007). Stability of k-Means Clustering. In: Bshouty, N.H., Gentile, C. (eds) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_4

DOI: https://doi.org/10.1007/978-3-540-72927-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72925-9
Online ISBN: 978-3-540-72927-3
eBook Packages: Computer Science (R0)
