[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3452021.3458322acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Minimum Coresets for Maxima Representation of Multidimensional Data

Published: 20 June 2021 Publication History

Abstract

Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set P of points in $ \mathbbR ^d $, where d is a small constant, and an error parameter $ \varepsilon \in (0,1) $, a subset $ Q \subseteq P $ is an $ \varepsilon $-coreset for the maxima representation of P iff the maximum of Q is an $ \varepsilon $-approximation of the maximum of P for any vector $ u \in \mathbbR ^d $, where the maximum is taken over the inner products between the set of points (P or Q) and u. We define a novel minimum $\varepsilon$-coreset problem that asks for an $\varepsilon$-coreset of the smallest size for the maxima representation of a point set. For the two-dimensional case, we develop an optimal polynomial-time algorithm for the minimum $ \varepsilon $-coreset problem by transforming it into the shortest-cycle problem in a directed graph. Then, we prove that this problem is NP-hard in three or higher dimensions and present polynomial-time approximation algorithms in an arbitrary fixed dimension. Finally, we provide extensive experimental results on both real and synthetic datasets to demonstrate the superior performance of our proposed algorithms.

Supplementary Material

MP4 File (PODS21-pods043.mp4)
Presentation video for a paper entitled "Minimum Coresets for Maxima Representation of Multidimensional Data" on PODS 2021

References

[1]
Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. 2004. Approximating extent measures of points. J. ACM, Vol. 51, 4 (2004), 606--635.
[2]
Pankaj K Agarwal, Sariel Har-Peled, Kasturi R Varadarajan, et almbox. 2005. Geometric approximation via coresets. In Combinatorial and computational geometry. Cambridge University Press, 1--30.
[3]
Pankaj K. Agarwal, Nirman Kumar, Stavros Sintos, and Subhash Suri. 2017. Efficient Algorithms for k-Regret Minimizing Sets. In SEA. 7:1--7:23.
[4]
Pankaj K. Agarwal, Jeff M. Phillips, and Hai Yu. 2010. Stability of epsilon-Kernels. In ESA (1). 487--499.
[5]
Pankaj K. Agarwal and Hai Yu. 2007. A space-optimal data-stream algorithm for coresets in the plane. In SoCG. 1--10.
[6]
Boris Aronov, Micha Sharir, and Boaz Tagansky. 1997. The Union of Convex Polyhedra in Three Dimensions. SIAM J. Comput., Vol. 26, 6 (1997), 1670--1688.
[7]
Sunil Arya and Timothy M. Chan. 2014. Better ε-Dependencies for Offline Approximate Nearest Neighbor Search, Euclidean Minimum Spanning Trees, and ε-Kernels. In SoCG. 416--425.
[8]
Sunil Arya, Guilherme Dias da Fonseca, and David M. Mount. 2017. Near-Optimal epsilon-Kernel Construction and Related Problems. In SoCG. 10:1--10:15.
[9]
Abolfazl Asudeh, Azade Nazi, Nan Zhang, and Gautam Das. 2017. Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative. In SIGMOD. 821--834.
[10]
Franz Aurenhammer. 1991. Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure. ACM Comput. Surv., Vol. 23, 3 (1991), 345--405.
[11]
Olivier Bachem, Mario Lucic, and Andreas Krause. 2018. Scalable k-Means Clustering via Lightweight Coresets. In KDD. 1119--1127.
[12]
C. Bradford Barber, David P. Dobkin, and Hannu Huhdanpaa. 1996. The Quickhull Algorithm for Convex Hulls. ACM Trans. Math. Softw., Vol. 22, 4 (1996), 469--483.
[13]
Jon Louis Bentley, Mark G. Faust, and Franco P. Preparata. 1982. Approximation Algorithms for Convex Hulls. Commun. ACM, Vol. 25, 1 (1982), 64--68.
[14]
Avrim Blum, Vladimir Braverman, Ananya Kumar, Harry Lang, and Lin F. Yang. 2018. Approximate Convex Hull of Data Streams. In ICALP. 21:1--21:13.
[15]
Avrim Blum, Sariel Har-Peled, and Benjamin Raichel. 2016. Sparse Approximation via Generating Point Sets. In SODA. 548--557.
[16]
Christos Boutsidis, Petros Drineas, and Malik Magdon-Ismail. 2013. Near-Optimal Coresets for Least-Squares Regression. IEEE Trans. Inf. Theory, Vol. 59, 10 (2013), 6880--6892.
[17]
Wei Cao, Jian Li, Haitao Wang, Kangning Wang, Ruosong Wang, Raymond Chi-Wing Wong, and Wei Zhan. 2017. k-Regret Minimizing Set: Efficient Algorithms and Hardness. In ICDT. 11:1--11:19.
[18]
Timothy M. Chan. 2006. Faster core-set constructions and data-stream algorithms in fixed dimensions. Comput. Geom., Vol. 35, 1--2 (2006), 20--35.
[19]
Timothy M. Chan. 2009. Dynamic Coresets. Discret. Comput. Geom., Vol. 42, 3 (2009), 469--488.
[20]
Timothy M. Chan. 2016. Dynamic Streaming Algorithms for Epsilon-Kernels. In SoCG. 27:1--27:11.
[21]
Timothy M. Chan. 2017. Applications of Chebyshev Polynomials to Low-Dimensional Computational Geometry. In SoCG. 26:1--26:15.
[22]
Gautam Das and Michael T. Goodrich. 1997. On the Complexity of Optimization Problems for 3-dimensional Convex Polyhedra and Decision Trees. Comput. Geom., Vol. 8 (1997), 123--137.
[23]
Edsger W. Dijkstra. 1959. A note on two problems in connexion with graphs. Numer. Math., Vol. 1 (1959), 269--271.
[24]
Uriel Feige. 1998. A Threshold of ln n for Approximating Set Cover. J. ACM, Vol. 45, 4 (1998), 634--652.
[25]
Robert J. Fowler, Mike Paterson, and Steven L. Tanimoto. 1981. Optimal Packing and Covering in the Plane are NP-Complete. Inf. Process. Lett., Vol. 12, 3 (1981), 133--137.
[26]
Michael L. Fredman and Robert Endre Tarjan. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM, Vol. 34, 3 (1987), 596--615.
[27]
Sariel Har-Peled and Soham Mazumdar. 2004. On coresets for k-means and k-median clustering. In STOC. 291--300.
[28]
Sariel Har-Peled and Manor Mendel. 2006. Fast Construction of Nets in Low-Dimensional Metrics and Their Applications. SIAM J. Comput., Vol. 35, 5 (2006), 1148--1184.
[29]
Ian Harris, Timothy J Osborn, Phil Jones, and David Lister. 2020. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci. Data, Vol. 7, 1 (2020), 1--18.
[30]
Lingxiao Huang, Jian Li, Jeff M. Phillips, and Haitao Wang. 2016. epsilon-Kernel Coresets for Stochastic Points. In ESA. 50:1--50:18.
[31]
Nirman Kumar and Stavros Sintos. 2018. Faster Approximation Algorithm for the k-Regret Minimizing Set and Related Problems. In ALENEX. 62--74.
[32]
Gang Luo, Kun-Lung Wu, and Philip S. Yu. 2009. Answering linear optimization queries with an approximate stream index. Knowl. Inf. Syst., Vol. 20, 1 (2009), 95--121.
[33]
Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. 2020. Coresets for Data-efficient Training of Machine Learning Models. In ICML. 6950--6960.
[34]
Stanislav Morozov and Artem Babenko. 2018. Non-metric Similarity Graphs for Maximum Inner Product Search. In NeurIPS . 4726--4735.
[35]
Danupon Nanongkai, Atish Das Sarma, Ashwin Lall, Richard J. Lipton, and Jun (Jim) Xu. 2010. Regret-Minimizing Representative Databases. PVLDB, Vol. 3, 1 (2010), 1114--1124.
[36]
James B. Orlin and Antonio Sede n o-Noda. 2017. An O(nm) time algorithm for finding the min length directed cycle in a graph. In SODA. 1866--1879.
[37]
Peng Peng and Raymond Chi-Wing Wong. 2014. Geometry approach for k-regret query. In ICDE. 772--783.
[38]
Franco P. Preparata and Michael Ian Shamos. 1985. Computational Geometry - An Introduction .Springer.
[39]
Suraj Shetiya, Abolfazl Asudeh, Sadia Ahmed, and Gautam Das. 2019. A Unified Optimization Algorithm For Solving "Regret-Minimizing Representative" Problems. Proc. VLDB Endow., Vol. 13, 3 (2019), 239--251.
[40]
Shulong Tan, Zhixin Zhou, Zhaozhuo Xu, and Ping Li. 2019. On Efficient Retrieval of Top Similarity Vectors. In EMNLP/IJCNLP (1). 5235--5245.
[41]
Yanhao Wang, Yuchen Li, and Kian-Lee Tan. 2019. Coresets for Minimum Enclosing Balls over Sliding Windows. In KDD. 314--323.
[42]
Min Xie, Raymond Chi-Wing Wong, and Ashwin Lall. 2020. An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J., Vol. 29, 1 (2020), 147--175.
[43]
Min Xie, Raymond Chi-Wing Wong, Jian Li, Cheng Long, and Ashwin Lall. 2018. Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality. In SIGMOD. 959--974.
[44]
Albert Yu, Pankaj K. Agarwal, and Jun Yang. 2012. Processing a large number of continuous preference top-k queries. In SIGMOD. 397--408.
[45]
Hai Yu, Pankaj K. Agarwal, Raghunath Poreddy, and Kasturi R. Varadarajan. 2008. Practical Methods for Shape Fitting and Kinetic Data Structures using Coresets. Algorithmica, Vol. 52, 3 (2008), 378--402.
[46]
Hamid Zarrabi-Zadeh. 2011. An Almost Space-Optimal Streaming Algorithm for Coresets in Fixed Dimensions. Algorithmica, Vol. 60, 1 (2011), 46--59.
[47]
Zhixin Zhou, Shulong Tan, Zhaozhuo Xu, and Ping Li. 2019. Möbius Transformation for Fast Inner Product Search on Graph. In NeurIPS . 8216--8227.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS'21: Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
June 2021
440 pages
ISBN:9781450383813
DOI:10.1145/3452021
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. convex hull
  2. coreset
  3. epsilon-kernel
  4. maxima representation
  5. regret minimizing set

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)4
Reflects downloads up to 19 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Computing Instance-Optimal Kernels in Two DimensionsDiscrete & Computational Geometry10.1007/s00454-024-00637-xOnline publication date: 7-Apr-2024
  • (2023)rkHit: Representative Query with Uncertain PreferenceProceedings of the ACM on Management of Data10.1145/35892711:2(1-26)Online publication date: 20-Jun-2023
  • (2023)Continuous $k$k-Regret Minimization Queries: A Dynamic Coreset ApproachIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.316683535:6(5680-5694)Online publication date: 1-Jun-2023
  • (2022)Happiness maximizing sets under group fairness constraintsProceedings of the VLDB Endowment10.14778/3565816.356583016:2(291-303)Online publication date: 1-Oct-2022
  • (2022)Minimum Epsilon-Kernel Computation for Large-Scale Data ProcessingJournal of Computer Science and Technology10.1007/s11390-022-2429-637:6(1398-1411)Online publication date: 30-Nov-2022
  • (2022)Computing Online Average Happiness Maximization Sets over Data StreamsWeb and Big Data10.1007/978-3-031-25201-3_2(19-33)Online publication date: 11-Aug-2022

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media