[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

  • Conference paper
Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques (APPROX 2008, RANDOM 2008)

Abstract

Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming (4 + ε)-approximation algorithm using O(ε − 1 kz) memory for the problem with outliers, in which the clustering is allowed to drop up to z of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming (6 + ε)-approximation algorithm using O(ε − 1 ln (ε − 1) k + k 2) memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Agarwal, P., Har-Peled, S., Yu, H.: Robust shape fitting via peeling and grating coresets. In: Proc. of ACM Symp. on Discrete Algorithms (SODA), pp. 182–191 (2006)

    Google Scholar 

  2. Aggarwal, G., Feder, T., Kenthapadi, K., Khuller, S., Panigrahy, R., Thomas, D., Zhu, A.: Achieving anonymity via clustering. In: Proc. of ACM Principles of Database Systems (PODS), pp. 153–162 (2006)

    Google Scholar 

  3. Aldenderfer, M.S., Blashfield, R.K.: Cluster Analysis. Sage, Beverly Hills (1984)

    Google Scholar 

  4. Bădoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 250–257 (2002)

    Google Scholar 

  5. Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic infomation retrieval. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 626–635 (1997)

    Google Scholar 

  6. Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 30–39 (2003)

    Google Scholar 

  7. Charikar, M., Khuller, S., Mount, D., Narasimhan, G.: Algorithms for facility location problems with outliers. In: Proc. of ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 642–651 (2001)

    Google Scholar 

  8. Chen, K.: A constant factor approximation algorithm for k-median clustering with outliers. In: Proc. of ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 826–835 (2008)

    Google Scholar 

  9. Everitt, B.: Cluster Analysis. Heinemann Educational, London (1974)

    Google Scholar 

  10. Gonzalez, T.: Clustering to minimize the maximum inter-cluster distance. Theoretical Computer Science 38, 293–306 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  11. Guha, S.: The k-center karma of a data stream (unpublished manuscript) (2007)

    Google Scholar 

  12. Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams. In: Proc. of IEEE Foundations of Computer Science (FOCS), pp. 359–366 (2000)

    Google Scholar 

  13. Hartigan, J.A.: Clustering Algorithms. Wiley, Chichester (1975)

    MATH  Google Scholar 

  14. Hochbaum, D., Shmoys, D.B.: A best possible approximation algorithm for the k-center problem. Math. of Operations Research 10, 180–184 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  15. Jain, A.K., Dubes, R.C.: Algorithms for clustering data. Prentice Hall, NJ (1988)

    MATH  Google Scholar 

  16. Rasmussen, E.: Clustering algorithms. In: Frakes, W., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs (1992)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Ashish Goel Klaus Jansen José D. P. Rolim Ronitt Rubinfeld

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matthew McCutchen, R., Khuller, S. (2008). Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity. In: Goel, A., Jansen, K., Rolim, J.D.P., Rubinfeld, R. (eds) Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques. APPROX RANDOM 2008 2008. Lecture Notes in Computer Science, vol 5171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85363-3_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-85363-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85362-6

  • Online ISBN: 978-3-540-85363-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics