DOI: 10.1145/3627673.3679890
Short paper

Coresets for Deletion-Robust k-Center Clustering

Published: 21 October 2024

Abstract

The k-center clustering problem is of fundamental importance for a broad range of machine learning and data science applications. In this paper, we study the deletion-robust version of the problem. Specifically, we aim to extract a small subset of a given data set, referred to as a coreset, that contains a provably good set of k centers even after an adversary deletes up to z arbitrarily chosen points from the data set. We propose a 4-approximation algorithm that provides a coreset of size O(kz). To our knowledge, this is the first algorithm for deletion-robust k-center clustering with a theoretical guarantee. Moreover, we accompany our theoretical results with extensive experiments, demonstrating that our algorithm achieves significantly better robustness than non-trivial baselines against three heuristic gray-box and white-box adversarial deletion attacks.
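The abstract states the deletion-robust setting only at a high level. The Python sketch below is a hedged illustration of that setting, not the authors' 4-approximation algorithm, whose construction is not described on this page. It assumes Euclidean distance and implements three pieces: the k-center cost (the largest distance from any point to its nearest center), Gonzalez's classic farthest-first traversal (a 2-approximation for k-center without deletions), and an evaluation in which an adversary deletes a set of points and the k centers may only be drawn from the coreset points that survive. The function names `kcenter_radius`, `gonzalez`, and `radius_after_deletion` are hypothetical, chosen for illustration.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]  # a point in Euclidean space (assumption: any metric works)


def kcenter_radius(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-center cost: the largest distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def gonzalez(points: Sequence[Point], k: int, first: int = 0) -> List[int]:
    """Farthest-first traversal (Gonzalez, 1985), a 2-approximation for
    k-center without deletions. Returns the indices of the chosen centers."""
    if k < 1 or not points:
        raise ValueError("need k >= 1 and at least one point")
    chosen = [first]
    # dist[i] is the distance from point i to its nearest chosen center so far.
    dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=dist.__getitem__)  # farthest point
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def radius_after_deletion(points: Sequence[Point], coreset: Sequence[int],
                          deleted: Set[int], k: int) -> float:
    """Hypothetical evaluation of the deletion-robust setting: after the
    points indexed by `deleted` are removed, centers may only be chosen from
    the surviving coreset points, and the cost is measured on all survivors."""
    alive = [p for i, p in enumerate(points) if i not in deleted]
    candidates = [points[i] for i in coreset if i not in deleted]
    if not candidates:
        raise ValueError("every coreset point was deleted")
    picked = gonzalez(candidates, k)
    return kcenter_radius(alive, [candidates[j] for j in picked])


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,)]
    k, z = 2, 1
    # Naive, unguaranteed baseline: keep the first k + z farthest-first points.
    coreset = gonzalez(pts, k + z)
    print(coreset, radius_after_deletion(pts, coreset, {4}, k))  # [0, 4, 2] 1.0
```

Here the coreset of size k + z happens to recover a radius of 1.0 after point 20 is deleted, but nothing guarantees a good radius for every deletion set of size z. Per the abstract, the paper's algorithm addresses this with a coreset of size O(kz) that keeps a 4-approximation against any such deletion.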



Information

Published In

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
October 2024
5705 pages
ISBN: 9798400704369
DOI: 10.1145/3627673
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. coreset
  2. deletion robustness
  3. k-center clustering

Conference

CIKM '24

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics

Article Metrics

  • Total citations: 0
  • Total downloads: 70
  • Downloads (last 12 months): 70
  • Downloads (last 6 weeks): 15
Reflects downloads up to 06 Jan 2025
