[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Beyond equi-joins: ranking, enumeration and factorization

Published: 01 July 2021 Publication History

Abstract

We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with n denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of k, the k top-ranked answers are returned in O(n polylog n + k log k) time. This is within a polylogarithmic factor of O(n + k log k), i.e., the best known complexity for equi-joins, and even of O(n + k), i.e., the time it takes to look at the input and return k answers in any order. Our guarantees extend to join queries with selections and many types of projections (namely those called "free-connex" queries and those that use bag semantics). Remarkably, they hold even when the number of join results is n for a join of relations. The key ingredient is a novel O(n polylog n)-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first nontrivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.

References

[1]
2020. Bird Occurrences in Oceania, From https://www.gbif.org/.
[2]
2021. TPC Benchmark H (Decision Support) Revision 3.0.0. http://tpc.org/tpch/
[3]
Mahmoud Abo Khamis, Ryan R. Curtin, Benjamin Moseley, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2019. On Functional Aggregate Queries with Additive Inequalities. In PODS. 414--431.
[4]
Mahmoud Abo Khamis, Hung Q Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2018. In-database learning with sparse tensors. In PODS. 325--340.
[5]
Mahmoud Abo Khamis, Hung Q Ngo, and Dan Suciu. 2017. What do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog have to do with one another?. In PODS. 429--444.
[6]
Pankaj K. Agarwal. 2017. Range Searching. In Handbook of Discrete and Computational Geometry, Third Edition, Jacob E. Goodman, Joseph O'Rourke, and Csaba D. Tóth (Eds.). Chapman and Hall/CRC, 1057--1092.
[7]
Pankaj K Agarwal, Xiao Hu, Stavros Sintos, and Jun Yang. 2021. Dynamic Enumeration of Similarity Joins. CoRR (2021). arXiv:2105.01818
[8]
Noga Alon, Raphael Yuster, and Uri Zwick. 1995. Color-coding. J. ACM 42, 4 (1995), 844--856.
[9]
Albert Atserias, Martin Grohe, and Dániel Marx. 2013. Size Bounds and Query Plans for Relational Joins. SIAM J. Comput. 42, 4 (2013), 1737--1767.
[10]
Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. 2007. On acyclic conjunctive queries and constant delay enumeration. In International Workshop on Computer Science Logic (CSL). 208--222.
[11]
Nurzhan Bakibayev, Tomáš Kočiský, Dan Olteanu, and Jakub Závodný. 2013. Aggregation and Ordering in Factorised Databases. PVLDB 6, 14 (2013), 1990--2001.
[12]
Nurzhan Bakibayev, Dan Olteanu, and Jakub Závodný. 2012. FDB: A Query Engine for Factorised Relational Databases. PVLDB 5, 11 (2012), 1232--1243.
[13]
Christoph Berkholz, Fabian Gerhardt, and Nicole Schweikardt. 2020. Constant Delay Enumeration for Conjunctive Queries: A Tutorial. ACM SIGLOG News 7, 1 (2020), 4--33.
[14]
Christoph Berkholz, Jens Keppeler, and Nicole Schweikardt. 2017. Answering Conjunctive Queries Under Updates. In PODS. 303--318.
[15]
Christoph Berkholz and Nicole Schweikardt. 2019. Constant Delay Enumeration with FPT-Preprocessing for Conjunctive Queries of Bounded Submodular Width. In 44th International Symposium on Mathematical Foundations of Computer Science (MFCS) (LIPIcs), Vol. 138. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 58:1--58:15.
[16]
Johann Brault-Baron. 2013. De la pertinence de l'énumération: complexité en logiques propositionnelle et du premier ordre. Ph.D. Dissertation. Université de Caen. https://hal.archives-ouvertes.fr/tel-01081392
[17]
Johann Brault-Baron. 2016. Hypergraph Acyclicity Revisited. ACM Comput. Surv. 49, 3, Article 54 (Dec. 2016), 26 pages.
[18]
David Bremner, Timothy M Chan, Erik D Demaine, Jeff Erickson, Ferran Hurtado, John Iacono, Stefan Langerman, and Perouz Taslakian. 2006. Necklaces, convolutions, and X+ Y. In European Symposium on Algorithms. Springer, 160--171.
[19]
Nofar Carmeli and Markus Kröll. 2019. On the Enumeration Complexity of Unions of Conjunctive Queries. In PODS. 134--148.
[20]
Nofar Carmeli and Markus Kröll. 2020. Enumeration Complexity of Conjunctive Queries with Functional Dependencies. Theory Comput. Syst. 64, 5 (2020), 828--860.
[21]
Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, and Mirek Riedewald. 2021. Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries. In PODS. 325--341.
[22]
Nofar Carmeli, Shai Zeevi, Christoph Berkholz, Benny Kimelfeld, and Nicole Schweikardt. 2020. Answering (Unions of) Conjunctive Queries Using Random Access and Random-Order Enumeration. In PODS. 393--409.
[23]
Lijun Chang, Xuemin Lin, Wenjie Zhang, Jeffrey Xu Yu, Ying Zhang, and Lu Qin. 2015. Optimal enumeration: Efficient top-k tree matching. PVLDB 8, 5 (2015), 533--544.
[24]
Yuan-Chi Chang, Lawrence Bergman, Vittorio Castelli, Chung-Sheng Li, Ming-Ling Lo, and John R Smith. 2000. The onion technique: indexing for linear optimization queries. In SIGMOD. 391--402.
[25]
Bernard Chazelle. 1988. Functional approach to data structures and its use in multidimensional searching. SIAM J. Comput. 17, 3 (1988), 427--462.
[26]
Sophie Cluet and Guido Moerkotte. 1995. Efficient evaluation of aggregates on bulk types. In Proceedings of the Fifth International Workshop on Database Programming Languages 5. 1--10.
[27]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3rd ed.). The MIT Press.
[28]
Gautam Das, Dimitrios Gunopulos, Nick Koudas, and Dimitris Tsirogiannis. 2006. Answering top-k queries using views. In VLDB. 451--462.
[29]
Mark De Berg, Marc Van Kreveld, Mark Overmars, and Otfried Schwarzkopf. 1997. Computational geometry. In Computational geometry. Springer, 1--17.
[30]
Shaleen Deep and Paraschos Koutris. 2021. Ranked Enumeration of Conjunctive Query Results. In ICDT, Vol. 186. 5:1--5:19.
[31]
David J. DeWitt, Jeffrey F. Naughton, and Donovan A. Schneider. 1991. An Evaluation of Non-Equijoin Algorithms. In VLDB. 443--452.
[32]
Mengsu Ding, Shimin Chen, Nantia Makrynioti, and Stefan Manegold. 2021. Progressive Join Algorithms Considering User Preference. In CIDR. https://ir.cwi.nl/pub/30501/30501.pdf
[33]
Arnaud Durand. 2020. Fine-Grained Complexity Analysis of Queries: From Decision to Counting and Enumeration. In PODS. 331--346.
[34]
Jost Enderle, Matthias Hampel, and Thomas Seidl. 2004. Joining Interval Data in Relational Databases. In SIGMOD. 683--694.
[35]
Ronald Fagin, Amnon Lotem, and Moni Naor. 2003. Optimal aggregation algorithms for middleware. J. Comput. System Sci. 66, 4 (2003), 614--656.
[36]
Jonathan Finger and Neoklis Polyzotis. 2009. Robust and efficient algorithms for rank join evaluation. In SIGMOD. 415--428.
[37]
Michel Gondran and Michel Minoux. 2008. Graphs, Dioids and Semirings: New Models and Algorithms (Operations Research/Computer Science Interfaces Series). Springer.
[38]
Georg Gottlob, Gianluigi Greco, Nicola Leone, and Francesco Scarcello. 2016. Hypertree Decompositions: Questions and Answers. In PODS. 57--74.
[39]
M.H. Graham. 1979. On the universal relation. Technical Report. Univ. of Toronto.
[40]
C. A. R. Hoare. 1962. Quicksort. Comput. J. 5, 1 (01 1962), 10--16.
[41]
Vagelis Hristidis, Nick Koudas, and Yannis Papakonstantinou. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. SIGMOD Record 30, 2 (2001), 259--270.
[42]
Muhammad Idris, Martín Ugarte, Stijn Vansummeren, Hannes Voigt, and Wolfgang Lehner. 2019. Efficient Query Processing for Dynamically Changing Datasets. SIGMOD Record 48, 1 (2019), 33--40.
[43]
Muhammad Idris, Martín Ugarte, Stijn Vansummeren, Hannes Voigt, and Wolfgang Lehner. 2020. General dynamic Yannakakis: conjunctive queries with theta joins under updates. VLDB J. 29 (2020), 619--653.
[44]
Ihab F Ilyas, Walid G Aref, and Ahmed K Elmagarmid. 2004. Supporting top-k join queries in relational databases. VLDB J. 13, 3 (2004), 207--221.
[45]
Ihab F Ilyas, George Beskales, and Mohamed A Soliman. 2008. A survey of top-k query processing techniques in relational database systems. Comput. Surveys 40, 4 (2008), 11.
[46]
Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu. 2019. Boolean Tensor Decomposition for Conjunctive Queries with Negation. In ICDT. 21:1--21:19.
[47]
Mahmoud Abo Khamis, Hung Q. Ngo, Christopher Ré, and Atri Rudra. 2016. Joins via Geometric Resolutions: Worst Case and Beyond. TODS 41, 4, Article 22 (2016), 45 pages.
[48]
Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Nan Tang, and Panos Kalnis. 2017. Fast and scalable inequality joins. VLDB J. 26, 1 (2017), 125--150.
[49]
Paraschos Koutris, Tova Milo, Sudeepa Roy, and Dan Suciu. 2017. Answering Conjunctive Queries with Inequalities. Theory of Computing Systems 61, 1 (2017), 2--30.
[50]
Arun Kumar, Jeffrey Naughton, and Jignesh M Patel. 2015. Learning generalized linear models over normalized data. In SIGMOD. 1969--1984.
[51]
Srijan Kumar, William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2018. Community interaction and conflict on the web. https://snap.stanford.edu/data/soc-RedditHyperlinks.html. In WWW. 933--943.
[52]
Rundong Li, Wolfgang Gatterbauer, and Mirek Riedewald. 2020. Near-Optimal Distributed Band-Joins through Recursive Partitioning. In SIGMOD. 2375--2390.
[53]
Qingyun Liu, Jack W. Stokes, Rob Mead, Tim Burrell, Ian Hellen, John Lambert, Andrey Marochko, and Weidong Cui. 2018. Latte: Large-Scale Lateral Movement Detection. In MILCOM. 1--6.
[54]
Neha Makhija and Wolfgang Gatterbauer. 2021. Towards a Dichotomy for Minimally Factorizing the Provenance of Self-Join Free Conjunctive Queries. CoRR abs/2105.14307 (2021). arXiv:2105.14307 https://arxiv.org/abs/2105.14307
[55]
Nikos Mamoulis, Man Lung Yiu, Kit Hung Cheng, and David W Cheung. 2007. Efficient top-k aggregation of ranked inputs. TODS 32, 3 (2007), 19.
[56]
Dániel Marx. 2013. Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries. J. ACM 60, 6, Article 42 (2013), 51 pages.
[57]
Apostol Natsev, Yuan-Chi Chang, John R Smith, Chung-Sheng Li, and Jeffrey Scott Vitter. 2001. Supporting incremental join queries on ranked inputs. In VLDB. 281--290. http://www.vldb.org/conf/2001/P281.pdf
[58]
Gonzalo Navarro, Juan L. Reutter, and Javiel Rojas-Ledesma. 2020. Optimal Joins Using Compact Data Structures. In ICDT, Vol. 155. 21:1--21:21.
[59]
Hung Q Ngo. 2018. Worst-case optimal join algorithms: Techniques, results, and open problems. In PODS. 111--124.
[60]
Hung Q Ngo, Dung T Nguyen, Christopher Re, and Atri Rudra. 2014. Beyond worst-case analysis for joins with minesweeper. In PODS. 234--245.
[61]
Hung Q Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2018. Worst-case optimal join algorithms. J. ACM 65, 3 (2018), 16.
[62]
Hung Q Ngo, Christopher Ré, and Atri Rudra. 2014. Skew Strikes Back: New Developments in the Theory of Join Algorithms. SIGMOD Record 42, 4 (Feb. 2014), 5--16.
[63]
Dan Olteanu and Jiewen Huang. 2008. Using OBDDs for efficient query evaluation on probabilistic databases. (2008), 326--340.
[64]
Dan Olteanu and Jiewen Huang. 2009. Secondary-storage confidence computation for conjunctive queries with inequalities. In SIGMOD. 389--402.
[65]
Dan Olteanu and Maximilian Schleich. 2016. F: Regression Models over Factorized Views. PVLDB 9, 13 (2016), 1573--1576.
[66]
Dan Olteanu and Maximilian Schleich. 2016. Factorized databases. SIGMOD Record 45, 2 (2016).
[67]
Dan Olteanu and Jakub Závodný. 2011. On factorisation of provenance polynomials. In TaPP. https://www.usenix.org/conference/tapp11/factorisation-provenance-polynomials
[68]
Dan Olteanu and Jakub Závodnỳ. 2012. Factorised representations of query results: size bounds and readability. In ICDT. 285--298.
[69]
Dan Olteanu and Jakub Závodnỳ. 2015. Size bounds for factorised representations of query results. TODS 40, 1 (2015), 2.
[70]
Krishna Kumar P., Paul Langton, and Wolfgang Gatterbauer. 2020. Factorized Graph Representations for Semi-Supervised Learning from Sparse Data. In SIGMOD. 1383--1398.
[71]
Christos H. Papadimitriou and Mihalis Yannakakis. 1999. On the complexity of database queries. J. Comput. System Sci. 58, 3 (1999), 407--427.
[72]
Saladi Rahul and Yufei Tao. 2019. A Guide to Designing Top-k Indexes. SIGMOD Record 48, 2 (2019).
[73]
Maximilian Schleich, Dan Olteanu, and Radu Ciucanu. 2016. Learning linear regression models over factorized joins. In SIGMOD. 3--18.
[74]
Luc Segoufin. 2015. Constant Delay Enumeration for Conjunctive Queries. SIGMOD Record 44, 1 (2015), 10--17.
[75]
Robert E Tarjan and Mihalis Yannakakis. 1984. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J. Comput. 13, 3 (1984), 566--579.
[76]
Panayiotis Tsaparas, Themistoklis Palpanas, Yannis Kotidis, Nick Koudas, and Divesh Srivastava. 2003. Ranked join indices. In ICDE. IEEE, 277--288.
[77]
Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, and Xiaofeng Yang. 2020. Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries. PVLDB 13, 9 (2020), 1582--1597.
[78]
Nikolaos Tziavelis, Wolfgang Gatterbauer, and Mirek Riedewald. 2020. Optimal Join Algorithms Meet Top-k. In SIGMOD. 2659--2665.
[79]
Nikolaos Tziavelis, Wolfgang Gatterbauer, and Mirek Riedewald. 2021. Beyond Equi-joins: Ranking, Enumeration and Factorization. CoRR abs/2101.12158 (2021). arXiv:2101.12158
[80]
Moshe Y. Vardi. 1982. The Complexity of Relational Query Languages (Extended Abstract). In STOC. 137--146.
[81]
Todd L. Veldhuizen. 2014. Triejoin: A Simple, Worst-Case Optimal Join Algorithm. In ICDT. 96--106.
[82]
Dan E. Willard. 1996. Applications of Range Query Theory to Relational Data Base Join and Selection Operations. J. Comput. System Sci. 52, 1 (1996), 157--169.
[83]
Dan E Willard. 2002. An algorithm for handling many relational calculus queries efficiently. J. Comput. System Sci. 65, 2 (2002), 295--331.
[84]
Huanhuan Wu, James Cheng, Silu Huang, Yiping Ke, Yi Lu, and Yanyan Xu. 2014. Path Problems in Temporal Graphs. PVLDB 7, 9 (2014), 721--732.
[85]
Minji Wu, Laure Berti-Equille, Amélie Marian, Cecilia M Procopiuc, and Divesh Srivastava. 2010. Processing top-k join queries. PVLDB 3, 1 (2010), 860--870.
[86]
Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K Nicholson, Mirek Riedewald, and Alessandra Sala. 2018. Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs. In WWW. 489--498.
[87]
Xiaofeng Yang, Mirek Riedewald, Rundong Li, and Wolfgang Gatterbauer. 2018. Any-k Algorithms for Exploratory Analysis with Conjunctive Queries. In International Workshop on Exploratory Search in Databases and the Web (ExploreDB). 1--3.
[88]
Mihalis Yannakakis. 1981. Algorithms for Acyclic Database Schemes. In VLDB. 82--94.
[89]
Clement Tak Yu and Meral Z Ozsoyoglu. 1979. An algorithm for tree-query membership of a distributed query. In COMPSAC. IEEE, 306--312.

Cited By

View all
  • (2024)Complex Event Recognition meets Hierarchical Conjunctive QueriesProceedings of the ACM on Management of Data10.1145/36958342:5(1-26)Online publication date: 7-Nov-2024
  • (2024)Worst-Case-Optimal Similarity Joins on Graph DatabasesProceedings of the ACM on Management of Data10.1145/36392942:1(1-26)Online publication date: 26-Mar-2024
  • (2023)Efficient Computation of Quantiles over JoinsProceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3584372.3588670(303-315)Online publication date: 18-Jun-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 14, Issue 11
July 2021
732 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 July 2021
Published in PVLDB Volume 14, Issue 11

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)27
  • Downloads (Last 6 weeks)4
Reflects downloads up to 17 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Complex Event Recognition meets Hierarchical Conjunctive QueriesProceedings of the ACM on Management of Data10.1145/36958342:5(1-26)Online publication date: 7-Nov-2024
  • (2024)Worst-Case-Optimal Similarity Joins on Graph DatabasesProceedings of the ACM on Management of Data10.1145/36392942:1(1-26)Online publication date: 26-Mar-2024
  • (2023)Efficient Computation of Quantiles over JoinsProceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems10.1145/3584372.3588670(303-315)Online publication date: 18-Jun-2023
  • (2023)Tractable Orders for Direct Access to Ranked Answers of Conjunctive QueriesACM Transactions on Database Systems10.1145/357851748:1(1-45)Online publication date: 2-Jan-2023
  • (2022)Conjunctive Queries with ComparisonsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517830(108-121)Online publication date: 10-Jun-2022

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media