[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3292500.3330826acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Coresets for Minimum Enclosing Balls over Sliding Windows

Published: 25 July 2019 Publication History

Abstract

Coresets are important tools to generate concise summaries of massive datasets for approximate analysis. A coreset is a small subset of points extracted from the original point set such that certain geometric properties are preserved with provable guarantees. This paper investigates the problem of maintaining a coreset to preserve the minimum enclosing ball (MEB) for a sliding window of points that are continuously updated in a data stream. Although the problem has been extensively studied in batch and append-only streaming settings, no efficient sliding-window solution is available yet. In this work, we first introduce an algorithm, called AOMEB, to build a coreset for MEB in an append-only stream. AOMEB improves the practical performance of the state-of-the-art algorithm while having the same approximation ratio. Furthermore, using AOMEB as a building block, we propose two novel algorithms, namely SWMEB and SWMEB+, to maintain coresets for MEB over the sliding window with constant approximation ratios. The proposed algorithms also support coresets for MEB in a reproducing kernel Hilbert space (RKHS). Finally, extensive experiments on real-world and synthetic datasets demonstrate that SWMEB and SWMEB+ achieve speedups of up to four orders of magnitude over the state-of-the-art batch algorithm while providing coresets for MEB with rather small errors compared to the optimal ones.

References

[1]
Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. 2004. Approximating extent measures of points. J. ACM, Vol. 51, 4 (2004), 606--635.
[2]
Pankaj K. Agarwal and R. Sharathkumar. 2010. Streaming Algorithms for Extent Problems in High Dimensions. In SODA . 1481--1489.
[3]
Olivier Bachem, Mario Lucic, and Andreas Krause. 2018. Scalable k-Means Clustering via Lightweight Coresets. In KDD. 1119--1127.
[4]
Mihai Badoiu and Kenneth L. Clarkson. 2008. Optimal Core-sets for Balls. Comput. Geom., Vol. 40, 1 (2008), 14--22.
[5]
Mihai Badoiu, Sariel Har-Peled, and Piotr Indyk. 2002. Approximate clustering via core-sets. In STOC. 250--257.
[6]
Albert Bifet, Geoff Holmes, Bernhard Pfahringer, and Ricard Gavaldà. 2011. Mining frequent closed graphs on evolving data streams. In KDD. 591--599.
[7]
Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization .Cambridge university press.
[8]
Vladimir Braverman, Harry Lang, Keith Levin, and Morteza Monemizadeh. 2016. Clustering Problems on Sliding Windows. In SODA. 1374--1390.
[9]
Vladimir Braverman and Rafail Ostrovsky. 2007. Smooth Histograms for Sliding Windows. In FOCS. 283--293.
[10]
Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. 2018. Fast Coreset-based Diversity Maximization under Matroid Constraints. In WSDM . 81--89.
[11]
Timothy M. Chan. 2006. Faster core-set constructions and data-stream algorithms in fixed dimensions. Comput. Geom., Vol. 35, 1--2 (2006), 20--35.
[12]
Timothy M. Chan and Vinayak Pathak. 2014. Streaming and dynamic algorithms for minimum enclosing balls in high dimensions. Comput. Geom., Vol. 47, 2 (2014), 240--247.
[13]
Fu-Lai Chung, Zhaohong Deng, and Shitong Wang. 2009. From Minimum Enclosing Ball to Fast Fuzzy Inference System Training on Large Datasets. IEEE Trans. Fuzzy Syst., Vol. 17, 1 (2009), 173--184.
[14]
Kenneth L. Clarkson. 2010. Coresets, Sparse Greedy Approximation, and the Frank-Wolfe Algorithm. ACM Trans. Algorithms, Vol. 6, 4 (2010), 63:1--63:30.
[15]
Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 2002. Maintaining Stream Statistics over Sliding Windows. SIAM J. Comput., Vol. 31, 6 (2002), 1794--1813.
[16]
Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, and Morteza Zadimoghaddam. 2017. Submodular Optimization Over Sliding Windows. In WWW. 421--430.
[17]
Dan Feldman and Tamir Tassa. 2015. More Constraints, Smaller Coresets: Constrained Matrix Approximation of Sparse Big Data. In KDD . 249--258.
[18]
Kaspar Fischer, Bernd G"a rtner, and Martin Kutz. 2003. Fast Smallest-Enclosing-Ball Computation in High Dimensions. In ESA. 630--641.
[19]
Bernd G"artner. 1999. Fast and Robust Smallest Enclosing Balls. In ESA. 325--338.
[20]
Sariel Har-Peled and Soham Mazumdar. 2004. On coresets for k-means and k-median clustering. In STOC. 291--300.
[21]
Jonathan H. Huggins, Trevor Campbell, and Tamara Broderick. 2016. Coresets for Scalable Bayesian Logistic Regression. In NIPS . 4080--4088.
[22]
Piyush Kumar, Joseph S. B. Mitchell, and Emre Alper Yildirim. 2003. Approximate Minimum Enclosing Balls in High Dimensions using Core-Sets. ACM J. Exp. Algorithmics, Vol. 8 (2003).
[23]
Thomas Larsson and Linus K"allberg. 2013. Fast and Robust Approximation of Smallest Enclosing Balls in Arbitrary Dimensions. Comput. Graph. Forum, Vol. 32, 5 (2013), 93--101.
[24]
S. Muthukrishnan. 2005. Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science, Vol. 1, 2 (2005), 117--236.
[25]
Rasmus Pagh, Francesco Silvestri, Johan Sivertsen, and Matthew Skala. 2015. Approximate Furthest Neighbor in High Dimensions. In SISAP. 3--14.
[26]
Jeff M. Phillips and Wai Ming Tai. 2018. Near-Optimal Coresets of Kernel Density Estimates. In SoCG. 66:1--66:13.
[27]
Piyush Rai, Hal Daumé III, and Suresh Venkatasubramanian. 2009. Streamed Learning: One-Pass SVMs. In IJCAI. 1211--1216.
[28]
Ivor W. Tsang, James T. Kwok, and Pak-Ming Cheung. 2005. Core Vector Machines: Fast SVM Training on Very Large Data Sets. J. Mach. Learn. Res., Vol. 6 (2005), 363--392.
[29]
Yanhao Wang, Qi Fan, Yuchen Li, and Kian-Lee Tan. 2017. Real-Time Influence Maximization on Dynamic Social Streams. PVLDB, Vol. 10, 7 (2017), 805--816.
[30]
Yanhao Wang, Yuchen Li, and Kian-Lee Tan. 2018. Efficient Representative Subset Selection over Sliding Windows. IEEE Trans. Knowl. Data Eng. (2018).
[31]
Yanhao Wang, Yuchen Li, and Kian-Lee Tan. 2019. Coresets for Minimum Enclosing Balls over Sliding Windows. (2019).

Cited By

View all
  • (2023)A Streaming Approach to the Core Vector MachineArtificial Intelligence and Soft Computing10.1007/978-3-031-23480-4_8(91-101)Online publication date: 24-Jan-2023
  • (2022)Investigating The Scalability of Kernel Minimum Enclosing Balls for Novelty Detection: Algorithms with Empirical Evaluations2022 IEEE Symposium Series on Computational Intelligence (SSCI)10.1109/SSCI51031.2022.10022106(1127-1134)Online publication date: 4-Dec-2022
  • (2022)Adaptive k-center and diameter estimation in sliding windowsInternational Journal of Data Science and Analytics10.1007/s41060-022-00318-z14:2(155-173)Online publication date: 2-Apr-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN:9781450362016
DOI:10.1145/3292500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. coresets
  2. minimum enclosing ball
  3. streaming algorithm

Qualifiers

  • Research-article

Funding Sources

Conference

KDD '19
Sponsor:

Acceptance Rates

KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)0
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)A Streaming Approach to the Core Vector MachineArtificial Intelligence and Soft Computing10.1007/978-3-031-23480-4_8(91-101)Online publication date: 24-Jan-2023
  • (2022)Investigating The Scalability of Kernel Minimum Enclosing Balls for Novelty Detection: Algorithms with Empirical Evaluations2022 IEEE Symposium Series on Computational Intelligence (SSCI)10.1109/SSCI51031.2022.10022106(1127-1134)Online publication date: 4-Dec-2022
  • (2022)Adaptive k-center and diameter estimation in sliding windowsInternational Journal of Data Science and Analytics10.1007/s41060-022-00318-z14:2(155-173)Online publication date: 2-Apr-2022
  • (2022)Minimum Epsilon-Kernel Computation for Large-Scale Data ProcessingJournal of Computer Science and Technology10.1007/s11390-022-2429-637:6(1398-1411)Online publication date: 30-Nov-2022
  • (2022)Simple and Efficient Acceleration of the Smallest Enclosing Ball for Large Data Sets in $$E^2$$: Analysis and Comparative ResultsComputational Science – ICCS 202210.1007/978-3-031-08751-6_52(720-733)Online publication date: 15-Jun-2022
  • (2021)A Fully Dynamic Algorithm for k-Regret Minimizing Sets2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00144(1631-1642)Online publication date: Apr-2021
  • (2021)Classification in Non-stationary Environments Using Coresets over Sliding WindowsAdvances in Computational Intelligence10.1007/978-3-030-85030-2_11(126-137)Online publication date: 21-Aug-2021
  • (2020)Sliding window algorithms for k-clustering problemsProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3496455(8716-8727)Online publication date: 6-Dec-2020
  • (2020)Reactive Concept Drift Detection Using Coresets Over Sliding Windows2020 IEEE Symposium Series on Computational Intelligence (SSCI)10.1109/SSCI47803.2020.9308521(1350-1355)Online publication date: 1-Dec-2020
  • (2020)Shells within Minimum Enclosing Balls2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)10.1109/DSAA49011.2020.00030(178-187)Online publication date: Oct-2020

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media