Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

Richard Matthew McCutchen¹ &
Samir Khuller¹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5171))

Included in the following conference series:

International Workshop on Approximation Algorithms for Combinatorial Optimization
International Workshop on Randomization and Approximation Techniques in Computer Science

1465 Accesses

Abstract

Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming (4 + ε)-approximation algorithm using O(ε ^− 1 kz) memory for the problem with outliers, in which the clustering is allowed to drop up to z of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming (6 + ε)-approximation algorithm using O(ε ^− 1 ln (ε ^− 1) k + k ²) memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

An Improved Data Stream Algorithm for Clustering

k-Median/Means with Outliers Revisited: A Simple Fpt Approximation

Streaming Algorithms for Smallest Intersecting Ball of Disjoint Balls

References

Agarwal, P., Har-Peled, S., Yu, H.: Robust shape fitting via peeling and grating coresets. In: Proc. of ACM Symp. on Discrete Algorithms (SODA), pp. 182–191 (2006)
Google Scholar
Aggarwal, G., Feder, T., Kenthapadi, K., Khuller, S., Panigrahy, R., Thomas, D., Zhu, A.: Achieving anonymity via clustering. In: Proc. of ACM Principles of Database Systems (PODS), pp. 153–162 (2006)
Google Scholar
Aldenderfer, M.S., Blashfield, R.K.: Cluster Analysis. Sage, Beverly Hills (1984)
Google Scholar
Bădoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 250–257 (2002)
Google Scholar
Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic infomation retrieval. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 626–635 (1997)
Google Scholar
Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 30–39 (2003)
Google Scholar
Charikar, M., Khuller, S., Mount, D., Narasimhan, G.: Algorithms for facility location problems with outliers. In: Proc. of ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 642–651 (2001)
Google Scholar
Chen, K.: A constant factor approximation algorithm for k-median clustering with outliers. In: Proc. of ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 826–835 (2008)
Google Scholar
Everitt, B.: Cluster Analysis. Heinemann Educational, London (1974)
Google Scholar
Gonzalez, T.: Clustering to minimize the maximum inter-cluster distance. Theoretical Computer Science 38, 293–306 (1985)
Article MATH MathSciNet Google Scholar
Guha, S.: The k-center karma of a data stream (unpublished manuscript) (2007)
Google Scholar
Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams. In: Proc. of IEEE Foundations of Computer Science (FOCS), pp. 359–366 (2000)
Google Scholar
Hartigan, J.A.: Clustering Algorithms. Wiley, Chichester (1975)
MATH Google Scholar
Hochbaum, D., Shmoys, D.B.: A best possible approximation algorithm for the k-center problem. Math. of Operations Research 10, 180–184 (1985)
Article MATH MathSciNet Google Scholar
Jain, A.K., Dubes, R.C.: Algorithms for clustering data. Prentice Hall, NJ (1988)
MATH Google Scholar
Rasmussen, E.: Clustering algorithms. In: Frakes, W., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs (1992)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Maryland,
Richard Matthew McCutchen & Samir Khuller

Authors

Richard Matthew McCutchen
View author publications
You can also search for this author in PubMed Google Scholar
Samir Khuller
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Ashish Goel Klaus Jansen José D. P. Rolim Ronitt Rubinfeld

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Matthew McCutchen, R., Khuller, S. (2008). Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity. In: Goel, A., Jansen, K., Rolim, J.D.P., Rubinfeld, R. (eds) Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques. APPROX RANDOM 2008 2008. Lecture Notes in Computer Science, vol 5171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85363-3_14

Download citation

DOI: https://doi.org/10.1007/978-3-540-85363-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85362-6
Online ISBN: 978-3-540-85363-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

Abstract

Access this chapter

Preview

Similar content being viewed by others

An Improved Data Stream Algorithm for Clustering

k-Median/Means with Outliers Revisited: A Simple Fpt Approximation

Streaming Algorithms for Smallest Intersecting Ball of Disjoint Balls

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

Abstract

Access this chapter

Preview

Similar content being viewed by others

An Improved Data Stream Algorithm for Clustering

k-Median/Means with Outliers Revisited: A Simple Fpt Approximation

Streaming Algorithms for Smallest Intersecting Ball of Disjoint Balls

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation