DOI: 10.1145/3183713.3190658
research-article

BIPie: Fast Selection and Aggregation on Encoded Data using Operator Specialization

Published: 27 May 2018

Abstract

Advances in modern hardware, such as increases in the size of main memory available on computers, have made it possible to analyze data at a much higher rate than before. In this paper, we demonstrate that there is tremendous room for improvement in the processing of analytical queries on modern commodity hardware. We introduce BIPie, a query processing engine that implements highly efficient decoding, selection, and aggregation for analytical queries executing on a columnar storage engine in MemSQL. We demonstrate that these operations are interdependent and must be fused and considered together to achieve very high performance. We propose and compare multiple strategies for decoding, selection, and aggregation (with GROUP BY), all of which are designed to take advantage of modern CPU architectures, including SIMD. We implemented these approaches in MemSQL, a high-performance hybrid transaction and analytical processing database designed for commodity hardware. We thoroughly evaluate the performance of the approach across a range of parameters, and demonstrate a two- to four-fold speedup over previously published TPC-H Query 1 performance.
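
The idea of fusing selection and aggregation on encoded data can be illustrated with a minimal Python sketch. This is not BIPie's implementation, which the abstract describes as SIMD-oriented code inside MemSQL; it only assumes plain dictionary encoding, and the names `dictionary_encode` and `fused_filter_group_sum` as well as the sample columns are hypothetical. The sketch evaluates the filter once per distinct dictionary entry and accumulates sums directly by group code, so no row value is decoded inside the scan loop.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Return (dictionary, codes): sorted distinct values and one integer code per row."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def fused_filter_group_sum(
    group_dict: Sequence[Hashable],
    group_codes: Sequence[int],
    filter_dict: Sequence[Hashable],
    filter_codes: Sequence[int],
    amounts: Sequence[int],
    predicate: Callable[[Hashable], bool],
) -> Dict[Hashable, Tuple[int, int]]:
    """Filter and aggregate in one pass over encoded columns.

    Returns {group value: (sum of amounts, row count)} for groups with
    at least one qualifying row.
    """
    # Selection on encoded data: test each distinct value once, not each row.
    code_passes = [predicate(value) for value in filter_dict]

    # Aggregation on encoded data: the group code indexes the accumulators.
    sums = [0] * len(group_dict)
    counts = [0] * len(group_dict)
    for group_code, filter_code, amount in zip(group_codes, filter_codes, amounts):
        if code_passes[filter_code]:
            sums[group_code] += amount
            counts[group_code] += 1

    # Decode group keys only once, when producing the final result.
    return {
        group_dict[code]: (sums[code], counts[code])
        for code in range(len(group_dict))
        if counts[code] > 0
    }


# Hypothetical sample data, loosely shaped like a TPC-H Query 1 style scan.
flags = ["A", "N", "R", "N", "A", "R"]
ship_dates = ["1998-08-01", "1998-09-15", "1998-07-04",
              "1998-10-01", "1998-09-01", "1998-06-30"]
quantities = [10, 20, 30, 40, 50, 60]

flag_dict, flag_codes = dictionary_encode(flags)
date_dict, date_codes = dictionary_encode(ship_dates)

result = fused_filter_group_sum(
    flag_dict, flag_codes, date_dict, date_codes, quantities,
    predicate=lambda date: date <= "1998-09-02",
)
print(result)  # {'A': (60, 2), 'R': (90, 2)}
```

In this sketch the per-row work is a table lookup and two array updates. A real engine would additionally have to handle bit-packed codes, wider aggregates, and vectorized execution, none of which is modeled here.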

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. aggregation
  2. bipie
  3. column store
  4. encoded data
  5. operator specialization
  6. query processing
  7. selection

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18, June 10-15, 2018, Houston, TX, USA

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 3
Reflects downloads up to 27 Jan 2025

Cited By

  • (2024) SingleStore-V: An Integrated Vector Database System in SingleStore. Proceedings of the VLDB Endowment 17:12 (3772-3785). DOI: 10.14778/3685800.3685805. Online publication date: 8-Nov-2024
  • (2022) Cloud-Native Transactions and Analytics in SingleStore. Proceedings of the 2022 International Conference on Management of Data (2340-2352). DOI: 10.1145/3514221.3526055. Online publication date: 10-Jun-2022
  • (2021) Charting the design space of query execution using VOILA. Proceedings of the VLDB Endowment 14:6 (1067-1079). DOI: 10.14778/3447689.3447709. Online publication date: 12-Apr-2021
  • (2019) Make the most out of your SIMD investments: counter control flow divergence in compiled query pipelines. The VLDB Journal 29:2-3 (757-774). DOI: 10.1007/s00778-019-00547-y. Online publication date: 16-Jul-2019
