
Building Efficient Query Engines in a High-Level Language

Authors: Amir Shaikhha, Yannis Klonatos, Christoph Koch

ACM Transactions on Database Systems (TODS), Volume 43, Issue 1

Article No.: 4, Pages 1 - 45

https://doi.org/10.1145/3183653

Published: 11 April 2018


Abstract

Abstraction without regret refers to the vision of using high-level programming languages for systems development without a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter, as is the case with existing monolithic implementations that are hard to maintain and extend.

In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level programming language Scala. The key technique for regaining efficiency is generative programming: LegoBase performs source-to-source compilation and optimizes database systems code by converting the high-level Scala code to specialized, low-level C code. We show how generative programming makes it easy to implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other.
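
To make the idea of generating specialized low-level code from a high-level description more concrete, the following is a minimal sketch, not LegoBase's actual API or code generator. It assumes a hypothetical toy query-plan AST (`Scan`, `Select`, `Sum`, plus a few expression nodes) and a hypothetical naming convention in which column `c` of table `t` is stored as a C array `t_c` with length `t_size`, i.e., a columnar layout. The Scala generator walks the plan and emits a single fused C loop in which every selection predicate becomes a condition guarding the aggregation.

```scala
// Hypothetical toy expression AST (illustrative only, not LegoBase's API).
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(v: Int) extends Expr
final case class Lt(l: Expr, r: Expr) extends Expr
final case class Mul(l: Expr, r: Expr) extends Expr

// Hypothetical toy query-plan AST.
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class Select(child: Plan, pred: Expr) extends Plan
final case class Sum(child: Plan, e: Expr) extends Plan

object CGen {
  // Assumed columnar layout: column `c` of table `t` is the C array `t_c`,
  // so a column reference inside the loop becomes `t_c[i]`.
  private def expr(t: String, e: Expr): String = e match {
    case Col(c)    => s"${t}_$c[i]"
    case Lit(v)    => v.toString
    case Lt(l, r)  => s"(${expr(t, l)} < ${expr(t, r)})"
    case Mul(l, r) => s"(${expr(t, l)} * ${expr(t, r)})"
  }

  // Walks down a chain of Select nodes to the Scan, collecting the
  // predicates (innermost first) and the scanned table name.
  private def collect(p: Plan): (String, List[Expr]) = p match {
    case Scan(t) => (t, Nil)
    case Select(c, pred) =>
      val (t, ps) = collect(c)
      (t, ps :+ pred)
    case Sum(_, _) => throw new IllegalArgumentException("nested aggregate")
  }

  // Emits one fused C loop: no per-operator iterators remain in the output,
  // only a tight loop over the column arrays.
  def compile(p: Plan): String = p match {
    case Sum(child, e) =>
      val (table, preds) = collect(child)
      val cond =
        if (preds.isEmpty) "1" else preds.map(expr(table, _)).mkString(" && ")
      s"""long acc = 0;
         |for (int i = 0; i < ${table}_size; i++) {
         |  if ($cond) acc += ${expr(table, e)};
         |}""".stripMargin
    case other => throw new IllegalArgumentException(s"unsupported root: $other")
  }
}
```

For example, `CGen.compile(Sum(Select(Scan("lineitem"), Lt(Col("qty"), Lit(24))), Mul(Col("price"), Col("disc"))))` yields a loop whose body is `if ((lineitem_qty[i] < 24)) acc += (lineitem_price[i] * lineitem_disc[i]);`. Switching to a row layout in this sketch would only require changing how `Col` is rendered (e.g., `lineitem[i].qty`), which hints at why such a layout change is easier to express at the level of the generator than by rewriting low-level code by hand.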

We evaluate our approach with the TPC-H benchmark and show that (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database as well as an existing query compiler; (b) programmers need to provide just a few hundred lines of high-level code to implement the optimizations, instead of the complicated low-level code required by existing query compilation approaches; (c) these optimizations may come at the cost of using more system memory in exchange for improved performance; and (d) the compilation overhead is low compared to the overall execution time, making our approach usable in practice for compiling query engines.

Supplementary Material

a4-shaikhha-apndx.pdf (shaikhha.zip, 357.98 KB): supplemental movie, appendix, image, and software files for "Building Efficient Query Engines in a High-Level Language".

