[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1772690.1772862acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
poster

Web-scale k-means clustering

Published: 26 April 2010 Publication History

Abstract

We present two modifications to the popular k-means clustering algorithm to address the extreme requirements for latency, scalability, and sparsity encountered in user-facing web applications. First, we propose the use of mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, we achieve sparsity with projected gradient descent, and give a fast ε-accurate projection onto the L1-ball. Source code is freely available: http://code.google.com/p/sofia-ml

References

[1]
L. Bottou and Y. Bengio. Convergence properties of the kmeans algorithm. In Advances in Neural Information Processing Systems. 1995.
[2]
J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In ICML '08: Proceedings of the 25th international conference on Machine learning, 2008.
[3]
C. Elkan. Using the triangle inequality to accelerate k-means. In ICML '03: Proceedings of the 20th international conference on Machine learning, 2003.
[4]
D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. Rcv1: A new benchmark collection for text categorization research. J. Mach. Learn. Res., 5, 2004.
[5]
D. Witten and R. Tibshirani. A framework for feature selection in clustering. To Appear: Journal of the American Statistical Association, 2010.
[6]
X. Wu and V. Kumar. The Top Ten Algorithms in Data Mining. Chapman & Hall/CRC, 2009.

Cited By

View all
  • (2024)Continuous Online Semantic Implicit Representation for Autonomous Ground Robot Navigation in Unstructured EnvironmentsRobotics10.3390/robotics1307010813:7(108)Online publication date: 18-Jul-2024
  • (2024)High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall DataMathematics10.3390/math1213193012:13(1930)Online publication date: 21-Jun-2024
  • (2024)Learning in Deep Radial Basis Function NetworksEntropy10.3390/e2605036826:5(368)Online publication date: 26-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN:9781605587998
DOI:10.1145/1772690

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 April 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. scalability
  2. sparse solutions
  3. unsupervised clustering

Qualifiers

  • Poster

Conference

WWW '10
WWW '10: The 19th International World Wide Web Conference
April 26 - 30, 2010
North Carolina, Raleigh, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)381
  • Downloads (Last 6 weeks)42
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Continuous Online Semantic Implicit Representation for Autonomous Ground Robot Navigation in Unstructured EnvironmentsRobotics10.3390/robotics1307010813:7(108)Online publication date: 18-Jul-2024
  • (2024)High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall DataMathematics10.3390/math1213193012:13(1930)Online publication date: 21-Jun-2024
  • (2024)Learning in Deep Radial Basis Function NetworksEntropy10.3390/e2605036826:5(368)Online publication date: 26-Apr-2024
  • (2024)SCALE-BOSS-MR: Scalable Time Series Classification Using Multiple Symbolic RepresentationsApplied Sciences10.3390/app1402068914:2(689)Online publication date: 13-Jan-2024
  • (2024)PhyberSIM: a tool for the generation of ground truth to evaluate brain fiber clustering algorithmsFrontiers in Neuroscience10.3389/fnins.2024.139651818Online publication date: 30-May-2024
  • (2024)A comprehensive investigation of clustering algorithms for User and Entity Behavior AnalyticsFrontiers in Big Data10.3389/fdata.2024.13758187Online publication date: 9-May-2024
  • (2024)City composition and accessibility statistics in and around ParisFrontiers in Big Data10.3389/fdata.2024.13540077Online publication date: 1-Mar-2024
  • (2024)Image-Based Generative Artificial Intelligence in Radiology: Comprehensive UpdatesKorean Journal of Radiology10.3348/kjr.2024.039225:11(959)Online publication date: 2024
  • (2024)A 60-cm water body map obtained using aerial photography: Application to the Tama and Tsurumi riversHydrological Research Letters10.3178/hrl.18.118:1(1-6)Online publication date: 2024
  • (2024)A privacy-preserving vehicle trajectory clustering framework隐私保护下的车辆轨迹聚类方法研究Frontiers of Information Technology & Electronic Engineering10.1631/FITEE.230036925:7(988-1002)Online publication date: 27-Jul-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

EPUB

View this article in ePub.

ePub

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media