[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Building efficient query engines in a high-level language

Published: 01 June 2014 Publication History

Abstract

In this paper we advocate that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming language Scala. The key technique to regain efficiency is to apply generative programming: the Scala code that constitutes the query engine, despite its high-level appearance, is actually a program generator that emits specialized, low-level C code. We show how the combination of high-level and generative programming allows to easily implement a wide spectrum of optimizations that are difficult to achieve with existing low-level query compilers, and how it can continuously optimize the query engine.
We evaluate our approach with the TPC-H benchmark and show that: (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler, (b) these performance improvements require programming just a few hundred lines of high-level code instead of complicated low-level code that is required by existing query compilers and, finally, that (c) the compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for efficiently compiling query engines.

References

[1]
D. J. Abadi, S. Madden, and N. Hachem. Column stores vs. Row stores: How Different Are They Really? In ACM SIGMOD, pages 967--980, 2008.
[2]
D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J. N. Gray, W. F. King, B. G. Lindsay, R. Lorie, J. W. Mehl, T. G. Price, F. Putzolu, P. G. Selinger, M. Schkolnick, D. R. Slutz, I. L. Traiger, B. W. Wade, and R. A. Yost. A history and evaluation of System R. Comm. ACM, 24(10):632--646, 1981.
[3]
F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner. SAP HANA: data management for modern business applications. SIGMOD Record, 40(4):45--51, 2012.
[4]
G. Graefe. Volcano-an extensible and parallel query evaluation system. IEEE Transactions on Knowledge and Data Engineering, 6(1):120--135, 1994.
[5]
R. Greer. Daytona and the fourth-generation language Cymbal. In ACM SIGMOD, pages 525--526, 1999.
[6]
L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh. Extensible Query Processing in Starburst. In ACM SIGMOD, pages 377--388, 1989.
[7]
S. Harizopoulos, V. Liang, D. J. Abadi, and S. Madden. Performance tradeoffs in read-optimized databases. In VLDB, pages 487--498, 2006.
[8]
G. C. Hunt and J. R. Larus. Singularity: Rethinking the Software Stack. SIGOPS Oper. Syst. Rev., 41(2):37--49, 2007.
[9]
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: a high-performance, distributed main memory transaction processing system. PVLDB, 1(2):1496--1499, 2008.
[10]
C. Koch. Abstraction without regret in data management systems. In CIDR, 2013.
[11]
C. Koch. Abstraction without regret in database systems building: a manifesto. IEEE Data Eng. Bull., 37(1):70--79, 2014.
[12]
K. Krikellas, S. Viglas, and M. Cintra. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.
[13]
C. Lattner. LLVM: An Infrastructure for Multi-Stage Optimization. http://llvm.org/.
[14]
S. Manegold, M. L. Kersten, and P. Boncz. Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB, 2(2):1648--1653, 2009.
[15]
T. Neumann. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB, 4(9):539--550, 2011.
[16]
M. Odersky and M. Zenger. Scalable Component Abstractions. In OOPSLA, pages 41--57, 2005.
[17]
Oracle Corporation. TimesTen Database Architecture. http://download.oracle.com/otn_hosted_doc/timesten/603/TimesTen-Documentation/arch.pdf.
[18]
S. Padmanabhan, T. Malkemus, A. Jhingran, and R. Agarwal. Block oriented processing of relational database operations in modern computer architectures. In ICDE, pages 567--574, 2001.
[19]
V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-Time Query Processing. In ICDE, pages 60--69, 2008.
[20]
J. Rao, H. Pirahesh, C. Mohan, and G. Lohman. Compiled Query Execution Engine using JVM. In ICDE, pages 23--, 2006.
[21]
T. Rompf and M. Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In Generative Programming and Component Engineering, pages 127--136, 2010. http://scala-lms.github.io/.
[22]
T. Rompf, A. K. Sujeeth, N. Amin, K. J. Brown, V. Jovanovic, H. Lee, M. Jonnalagedda, K. Olukotun, and M. Odersky. Optimizing data structures in high-level programs: new directions for extensible compilers based on staging. In POPL, pages 497--510, 2013.
[23]
J. Sompolski, M. Zukowski, and P. Boncz. Vectorization vs. compilation in query execution. In DaMoN, pages 33--40, 2011.
[24]
M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A Column-oriented DBMS. In VLDB, pages 553--564, 2005.
[25]
M. Stonebraker and U. Cetintemel. "One Size Fits All": An Idea Whose Time Has Come and Gone. In ICDE, pages 2--11, 2005.
[26]
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). In VLDB, pages 1150--1160, 2007.
[27]
W. Taha and T. Sheard. MetaML and multi-stage programming with explicit annotations. Theor. Comput. Sci., 248(1-2):211--242, 2000.
[28]
Transaction Processing Performance Council. TPC-H, a decision support benchmark. http://www.tpc.org/tpch.
[29]
B. M. Zane, J. P. Ballard, F. D. Hinshaw, D. A. Kirkpatrick, and L. Premanand Yerabothu. Optimized SQL code generation, 2008. US Patent 7430549 B2.
[30]
R. Zhang, S. Debray, and R. T. Snodgrass. Micro-specialization: dynamic code specialization of database management systems. In Code Generation and Optimization, pages 63--73, 2012.
[31]
R. Zhang, R. Snodgrass, and S. Debray. Application of Micro-specialization to Query Evaluation Operators. In ICDE Workshops, pages 315--321, 2012.
[32]
R. Zhang, R. Snodgrass, and S. Debray. Micro-Specialization in DBMSes. In ICDE, pages 690--701, 2012.
[33]
M. Zukowski, P. A. Boncz, N. Nes, and S. HÃl'man. MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull., (2):17--22, 2005.

Cited By

View all
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
  • (2024)UltraPrecise: A GPU-Based Framework for Arbitrary-Precision Arithmetic in Database Systems2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00294(3837-3850)Online publication date: 13-May-2024
  • (2023)Compilation of SQL Queries for Efficient Distributed In-Memory ProcessingProceedings of the 33rd Annual International Conference on Computer Science and Software Engineering10.5555/3615924.3615940(149-154)Online publication date: 11-Sep-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 7, Issue 10
June 2014
146 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 June 2014
Published in PVLDB Volume 7, Issue 10

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)29
  • Downloads (Last 6 weeks)0
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Query Compilation Without RegretsProceedings of the ACM on Management of Data10.1145/36549682:3(1-28)Online publication date: 30-May-2024
  • (2024)UltraPrecise: A GPU-Based Framework for Arbitrary-Precision Arithmetic in Database Systems2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00294(3837-3850)Online publication date: 13-May-2024
  • (2023)Compilation of SQL Queries for Efficient Distributed In-Memory ProcessingProceedings of the 33rd Annual International Conference on Computer Science and Software Engineering10.5555/3615924.3615940(149-154)Online publication date: 11-Sep-2023
  • (2023)Declarative Sub-Operators for Universal Data ProcessingProceedings of the VLDB Endowment10.14778/3611479.361153916:11(3461-3474)Online publication date: 1-Jul-2023
  • (2022)Designing an open framework for query optimization and compilationProceedings of the VLDB Endowment10.14778/3551793.355180115:11(2389-2401)Online publication date: 1-Jul-2022
  • (2022)Exploiting Data Skew for Improved Query PerformanceIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.300644634:5(2176-2189)Online publication date: 1-May-2022
  • (2022)Automatic Decomposition of a Sequential Algorithm for MapReduce Frameworks2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)10.1109/SIBIRCON56155.2022.10017034(1780-1783)Online publication date: 11-Nov-2022
  • (2022)Near Data Processing in Taurus Database2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00170(1662-1674)Online publication date: May-2022
  • (2022)On inter-operator data transfers in query processing2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00066(820-832)Online publication date: May-2022
  • (2022)Low-latency query compilationThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-022-00741-531:6(1171-1184)Online publication date: 10-May-2022
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media