[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

On clusterings: Good, bad and spectral

Published: 01 May 2004 Publication History

Abstract

We motivate and develop a natural bicriteria measure for assessing the quality of a clustering that avoids the drawbacks of existing measures. A simple recursive heuristic is shown to have poly-logarithmic worst-case guarantees under the new measure. The main result of the article is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees; another finds a "good" clustering, if one exists.

References

[1]
Alpert, C., Kahng, A., and Yao, Z. 1999. Spectral partitioning: The more eigenvectors the better. Discr. Appl. Math. 90, 3--26.
[2]
Azar, Y., Fiat, A., Karlin, A., McSherry F., and Saia, J. 2001. Spectral analysis of data. In Proceedings of 33rd Symposium on Theory of Computing, ACM, New York, 619--626.
[3]
Benczur, A., and Karger, D. 1996. Approximate s--t min-cuts in O(n2) time. In Proceedings of 28th Symposium on Theory of Computing. ACM, New York, 47--55.
[4]
Charikar, M., Chekuri, C., Feder, T., and Motwani, R. 1997. Incremental clustering and dynamic information retrieval. In Proceedings of 29th Symposium on Theory of Computing. ACM, New York, 626--635.
[5]
Charikar, M., Guha, S., Shmoys, D., and Tardos, E. 1999. A constant-factor approximation for the k-median problem. In Proceedings of 31st Symposium on Theory of Computing. ACM, New York, 1--10.
[6]
Cheng, D., Kannan, R., Vempala, S., and Wang, G. 2003. On a recursive spectral algorithm for clustering from pairwise similarities. MIT LCS Technical Report MIT-LCS-TR-906.
[7]
Dhillon, I. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the 7th ACM Conference on Knowledge Discovery and Data Mining. ACM, New York, 269--274.
[8]
Drineas, P., Frieze, A., Kannan, R., Vempala, S., and Vinay, V. 1999. Clustering in large graphs and matrices. Proceedings of 10th Symposium on Discrete Algorithms. ACM, New York, 291--299.
[9]
Dyer, M., and Frieze, A. 1985. A simple heuristic for the p-center problem. Oper. Res. Lett. 3(6), 285--288.
[10]
Eigencluster See http://www-math.mit.edu/cluster/.
[11]
Frieze, A., Kannan, R., and Vempala, S. 1998. Fast Monte-Carlo algorithms for finding low-rank approximations, In Proceedings of 39th Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif, 370--378.
[12]
Garey, M., and Johnson, D. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco, Calif.
[13]
Indyk, P., 1999. A sublinear time approximation scheme for clustering in metric spaces, In Proceedings of 40th Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., 154--159.
[14]
Jain, K., and Vazirani, V. 2001. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. ACM 48, 2, 274--296.
[15]
Leighton, T., and Rao, S. 1999. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. J. ACM 46, 6, 787--832.
[16]
Papadimitriou, C., Raghavan, P., Tamaki, H., and Vempala, S. 2000. Latent semantic indexing: a probabilistic analysis, J. Comput. Syst. Sci., 61, 217--235.
[17]
Shi, J., and Malik, J. 2000. Normalized cuts and image segmentation. IEEE Trans. Patt. Anal. Mach. Intell., 22, 8, 888--905, (See http://www-2.cs.cmu.edu/∼jshi/Grouping/.)
[18]
Sinclair, A., and Jerrum, M. 1989. Approximate counting, uniform generation and rapidly mixing Markov chains. Inf. Comput. 82, 93--133.
[19]
Stewart, G. 1973. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Rev. 154, 727--764.
[20]
Weiss, Y. 1999. Segmentation using eigenvectors: A unifying view. In Proceedings of IEEE International Conference on Computer Vision. IEEE Computer Society Press, Los Alamitos, Calif., 975--982.

Cited By

View all
  • (2025)A Wasserstein distance-based spectral clustering method for transaction data analysisExpert Systems with Applications10.1016/j.eswa.2024.125418260(125418)Online publication date: Jan-2025
  • (2024)Axioms for clustering simple unweighted graphs: No impossibility resultPLOS Complex Systems10.1371/journal.pcsy.00000111:2(e0000011)Online publication date: 3-Oct-2024
  • (2024)Well-connectedness and community detectionPLOS Complex Systems10.1371/journal.pcsy.00000091:3(e0000009)Online publication date: 5-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 51, Issue 3
May 2004
153 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/990308
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2004
Published in JACM Volume 51, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Clustering
  2. graph algorithms
  3. spectral methods

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)105
  • Downloads (Last 6 weeks)12
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2025)A Wasserstein distance-based spectral clustering method for transaction data analysisExpert Systems with Applications10.1016/j.eswa.2024.125418260(125418)Online publication date: Jan-2025
  • (2024)Axioms for clustering simple unweighted graphs: No impossibility resultPLOS Complex Systems10.1371/journal.pcsy.00000111:2(e0000011)Online publication date: 3-Oct-2024
  • (2024)Well-connectedness and community detectionPLOS Complex Systems10.1371/journal.pcsy.00000091:3(e0000009)Online publication date: 5-Nov-2024
  • (2024)Approximating Small Sparse CutsProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649747(319-330)Online publication date: 10-Jun-2024
  • (2024)A Comprehensive Survey on Community Detection With Deep LearningIEEE Transactions on Neural Networks and Learning Systems10.1109/TNNLS.2021.3137396(1-21)Online publication date: 2024
  • (2024)Streaming Local Community Detection Through Approximate ConductanceIEEE Transactions on Big Data10.1109/TBDATA.2023.331025110:1(12-22)Online publication date: Feb-2024
  • (2024)On Approximating Cutwidth and Pathwidth2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS61266.2024.00051(713-729)Online publication date: 27-Oct-2024
  • (2024)Enhanced Spectral Ensemble Clustering for Fault Diagnosis: Application to Photovoltaic SystemsIEEE Access10.1109/ACCESS.2024.349797712(170418-170436)Online publication date: 2024
  • (2024)Bayan algorithm: Detecting communities in networks through exact and approximate optimization of modularityPhysical Review E10.1103/PhysRevE.110.044315110:4Online publication date: 24-Oct-2024
  • (2024)A survey on semi-supervised graph clusteringEngineering Applications of Artificial Intelligence10.1016/j.engappai.2024.108215133(108215)Online publication date: Jul-2024
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media