[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3448016.3457288acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Building Advanced SQL Analytics From Low-Level Plan Operators

Published: 18 June 2021 Publication History

Abstract

Analytical queries virtually always involve aggregation and statistics. SQL offers a wide range of functionalities to summarize data such as associative aggregates, distinct aggregates, ordered-set aggregates, grouping sets, and window functions. In this work, we propose a unified framework for advanced statistics that composes all flavors of complex SQL aggregates from low-level plan operators. These operators can reuse materialized intermediate results, which decouples monolithic aggregation logic and speeds up complex multi-expression queries. The contribution is therefore twofold: our framework modularizes aggregate implementations, and outperforms traditional systems whenever multiple aggregates are combined. We integrated our approach into the high-performance database system Umbra and experimentally show that we compute complex aggregates faster than the state-of-the-art HyPer system.

Supplementary Material

MP4 File (3448016.3457288.mp4)
Analytical queries virtually always involve aggregation and statistics. SQL offers a wide range of functionalities to summarize data such as associative aggregates, distinct aggregates, ordered-set aggregates, grouping sets, and window functions. In this work, we propose a unified framework for advanced statistics that composes all flavors of complex SQL aggregates from low-level plan operators. These operators can reuse materialized intermediate results, which decouples monolithic aggregation logic and speeds up complex multi-expression queries. The contribution is therefore twofold: our framework modularizes aggregate implementions, and outperforms traditional systems whenever multiple aggregates are combined. We integrated our approach into a high-performance database system and experimentally show that we compute complex aggregates faster than the state-of-the-art HyPer system.

References

[1]
Martina-Cezara Albutiu, Alfons Kemper, and Thomas Neumann. 2012. Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems. PVLDB, Vol. 5, 10 (2012).
[2]
Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer Ö zsu. 2013a. Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited. PVLDB, Vol. 7, 1 (2013).
[3]
Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M. Tamer Ö zsu. 2013b. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE.
[4]
Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M. Tamer Ö zsu. 2015. Main-Memory Hash Joins on Modern Processor Architectures. IEEE Trans. Knowl. Data Eng., Vol. 27, 7 (2015).
[5]
Maximilian Bandle, Jana Giceva, and Thomas Neumann. 2021. To Partition, or Not to Partition, That is the Join Question in a Real System. In SIGMOD.
[6]
Ronald Barber, Guy M. Lohman, Ippokratis Pandis, Vijayshankar Raman, Richard Sidle, Gopi K. Attaluri, Naresh Chainani, Sam Lightstone, and David Sharpe. 2014. Memory-Efficient Hash Joins. PVLDB, Vol. 8, 4 (2014).
[7]
Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD.
[8]
Sebastian Breß, Bastian Kö cher, Henning Funke, Steffen Zeuch, Tilmann Rabl, and Volker Markl. 2018. Generating custom code for efficient query execution on heterogeneous processors. VLDB J., Vol. 27, 6 (2018).
[9]
Yu Cao, Chee-Yong Chan, Jie Li, and Kian-Lee Tan. 2012. Optimization of Analytic Window Functions. PVLDB, Vol. 5, 11 (2012).
[10]
Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. 2008. Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB, Vol. 1, 2 (2008).
[11]
Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Carsten Binnig, Ugur cC etintemel, and Stan Zdonik. 2015a. An Architecture for Compiling UDF-centric Workflows. PVLDB, Vol. 8, 12 (2015).
[12]
Andrew Crotty, Alex Galakatos, Kayhan Dursun, Tim Kraska, Ugur cC etintemel, and Stanley B. Zdonik. 2015b. Tupleware: "Big" Data, Big Analytics, Small Clusters. In CIDR.
[13]
Jens Dittrich and Joris Nix. 2020. The Case for Deep Query Optimisation. In CIDR.
[14]
Stefan Edelkamp and Armin Weiß. 2019. BlockQuicksort: Avoiding Branch Mispredictions in Quicksort. ACM J. Exp. Algorithmics, Vol. 24, 1 (2019).
[15]
Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, and Volker Markl. 2012. Spinning Fast Iterative Data Flows. Proc. VLDB Endow., Vol. 5, 11 (2012).
[16]
Ziqiang Feng and Eric Lo. 2015. Accelerating aggregation using intra-cycle parallelism. In ICDE.
[17]
Michael J. Freitag, Maximilian Bandle, Tobias Schmidt, Alfons Kemper, and Thomas Neumann. 2020. Adopting Worst-Case Optimal Joins in Relational Database Systems. PVLDB, Vol. 13, 11 (2020).
[18]
Johann Christoph Freytag. 1987. A Rule-Based View of Query Optimization. In SIGMOD.
[19]
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. 1997. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals. Data Min. Knowl. Discov., Vol. 1, 1 (1997).
[20]
Timo Kersten, Viktor Leis, and Thomas Neumann. 2021. Tidy Tuples and Flying Start: Fast Compilation and Fast Execution of Relational Queries in Umbra. VLDB J., Vol. 30 (2021).
[21]
Harald Lang, Viktor Leis, Martina-Cezara Albutiu, Thomas Neumann, and Alfons Kemper. 2013. Massively Parallel NUMA-aware Hash Joins. In IMDM.
[22]
Viktor Leis, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven Parallelism: A NUMA-aware Query Evaluation Framework for the Many-core Age. In SIGMOD.
[23]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015a. How Good Are Query Optimizers, Really? PVLDB, Vol. 9, 3 (2015).
[24]
Viktor Leis, Kan Kundhikanjana, Alfons Kemper, and Thomas Neumann. 2015b. Efficient Processing of Window Functions in Analytical SQL Queries. PVLDB, Vol. 8, 10 (2015).
[25]
Guy M. Lohman. 1988. Grammar-like Functional Rules for Representing Query Optimization Alternatives. In SIGMOD.
[26]
Ingo Mü ller, Peter Sanders, Arnaud Lacurie, Wolfgang Lehner, and Franz F"a rber. 2015. Cache-Efficient Aggregation: Hashing Is Sorting. In SIGMOD.
[27]
Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, Vol. 4, 9 (2011).
[28]
Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In CIDR.
[29]
Shoumik Palkar, James J. Thomas, Anil Shanbhag, Malte Schwarzkopf, Saman P. Amarasinghe, and Matei Zaharia. 2017. Weld: A Common Runtime for High Performance Data Analysis. In CIDR.
[30]
Duy-Hung Phan and Pietro Michiardi. 2016. A novel, low-latency algorithm for multiple Group-By query optimization. In ICDE.
[31]
Holger Pirk, Oscar Moll, Matei Zaharia, and Sam Madden. 2016. Voodoo - A Vector Algebra for Portable Database Performance on Modern Hardware. PVLDB, Vol. 9, 14 (2016).
[32]
Orestis Polychroniou and Kenneth A. Ross. 2013. High throughput heavy hitter aggregation for modern SIMD processors. In DaMoN.
[33]
Vijayshankar Raman, Gopi K. Attaluri, Ronald Barber, Naresh Chainani, David Kalmuk, Vincent KulandaiSamy, Jens Leenstra, Sam Lightstone, Shaorong Liu, Guy M. Lohman, Tim Malkemus, René Mü ller, Ippokratis Pandis, Berni Schiefer, David Sharpe, Richard Sidle, Adam J. Storm, and Liping Zhang. 2013. DB2 with BLU Acceleration: So Much More than Just a Column Store. PVLDB, Vol. 6, 11 (2013).
[34]
Stefan Schuh, Xiao Chen, and Jens Dittrich. 2016. An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory. In SIGMOD.
[35]
P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. 1979. Access Path Selection in a Relational Database Management System. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data (Boston, Massachusetts) (SIGMOD '79). Association for Computing Machinery, New York, NY, USA, 23--34. https://doi.org/10.1145/582095.582099
[36]
Richard Wesley and Fei Xu. 2016. Incremental Computation of Common Windowed Holistic Aggregates. PVLDB, Vol. 9, 12 (2016).
[37]
Wenjian Xu, Ziqiang Feng, and Eric Lo. 2016. Fast Multi-Column Sorting in Main-Memory Column-Stores. In SIGMOD.

Cited By

View all
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
  • (2024)Sharing Queries with Nonequivalent User-defined Aggregate FunctionsACM Transactions on Database Systems10.1145/364913349:2(1-46)Online publication date: 10-Apr-2024
  • (2023)Declarative Sub-Operators for Universal Data ProcessingProceedings of the VLDB Endowment10.14778/3611479.361153916:11(3461-3474)Online publication date: 1-Jul-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. database systems
  2. query optimization
  3. query processing

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS '21
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)49
  • Downloads (Last 6 weeks)6
Reflects downloads up to 12 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
  • (2024)Sharing Queries with Nonequivalent User-defined Aggregate FunctionsACM Transactions on Database Systems10.1145/364913349:2(1-46)Online publication date: 10-Apr-2024
  • (2023)Declarative Sub-Operators for Universal Data ProcessingProceedings of the VLDB Endowment10.14778/3611479.361153916:11(3461-3474)Online publication date: 1-Jul-2023
  • (2022)Designing an open framework for query optimization and compilationProceedings of the VLDB Endowment10.14778/3551793.355180115:11(2389-2401)Online publication date: 1-Jul-2022
  • (2022)Efficient Evaluation of Arbitrarily-Framed Holistic SQL Aggregates and Window FunctionsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526184(1243-1256)Online publication date: 10-Jun-2022
  • (2022)Practical planning and execution of groupjoin and nested aggregatesThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-022-00765-x32:6(1165-1190)Online publication date: 22-Oct-2022
  • (2021)ModularisProceedings of the VLDB Endowment10.14778/3484224.348422914:13(3308-3321)Online publication date: 1-Sep-2021
  • (2021)Evolution of a compiling query engineProceedings of the VLDB Endowment10.14778/3476311.347641014:12(3207-3210)Online publication date: 1-Jul-2021
  • (2021)Database technology for the massesProceedings of the VLDB Endowment10.14778/3476249.347629614:11(2483-2490)Online publication date: 1-Jul-2021
  • (2021)A practical approach to groupjoin and nested aggregatesProceedings of the VLDB Endowment10.14778/3476249.347628814:11(2383-2396)Online publication date: 1-Jul-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media