[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Generating application-specific data layouts for in-memory databases

Published: 01 July 2019 Publication History

Abstract

Database applications are often developed with object-oriented languages while using relational databases as the backend. To accelerate these applications, developers would manually design customized data structures to store data in memory, and ways to utilize such data structures to answer queries. Doing so is brittle and requires a lot of effort. Alternatively, developers might automate the process by using relational physical design tools to create materialized views and indexes instead. However, the characteristics of object-oriented database applications are often distinct enough from traditional database applications such that classical relational query optimization techniques often cannot speed up queries that arise from such applications, as our experiments show.
To address this, we build Chestnut, a data layout generator for in-memory object-oriented database applications. Given a memory budget, Chestnut generates customized in-memory data layouts and query plans to answer queries written using a subset of the Rails API, a common framework for building object-oriented database applications. Chestnut differs from traditional query optimizers and physical designers in two ways. First, Chestnut automatically generates data layouts that are customized for the application after analyzing their queries, hence Chestnut-generated data layouts are designed to be efficient to answer queries from such applications. second, Chestnut uses a novel enumeration and verification-based algorithm to generate query plans that use such data layouts, rather than rule-based approaches as in traditional query optimizers. We evaluated Chestnut on four open-source Rails database applications. The result shows that it can reduce average query processing time by over 3.6X (and up to 42X), as compared to other in-memory relational database engines.

References

[1]
Active record association. https://guides.rubyonrails.org/association_basics.html.
[2]
Active record query interface. https://guides.rubyonrails.org/active_record_querying.html.
[3]
Comparing oodb with LINQ. https://msdn.microsoft.com/en-us/library/aa479863.aspx.
[4]
Curated list of awesome rails lists. https://project-awesome.org/ekremkaraca/awesome-rails.
[5]
Database size reported by web developers. https://meta.discourse.org/t/slow-sql-queries/16604.
[6]
Database size reported by web developers. http://www.redmine.org/issues/23318.
[7]
Dexter, an automatic indexer for postgres. https://github.com/ankane/dexter.
[8]
Django, a python web application framework. https://www.djangoproject.com/.
[9]
Gurobi ilp solver. http://www.gurobi.com/.
[10]
Hibernate, an orm framework for java. http://hibernate.org/orm/.
[11]
Huginn, anagentmonitor. https://github.com/huginn/huginn.
[12]
Kandan, a chatting application. https://github.com/kandanapp/kandan.
[13]
Lobsters, a forumapplication. https://github.com/lobsters/lobsters.
[14]
Nested set model, a model to represent tree using relational table. https://en.wikipedia.org/wiki/Nested_set_model.
[15]
Protocol buffer, a language-neutral, platform-neutral extensible mechanism for serializing structured data. https://developers.google.com/protocol-buffers.
[16]
ranking of popular open source rails applications. http://www.opensourcerails.com/.
[17]
Redmine, a project management application. https://github.com/redmine/redmine.
[18]
Ruby on rails, a ruby web application framework. https://rubyonrails.org/.
[19]
Stx-btree library. https://github.com/bingmann/stx-btree.
[20]
top7 forceful rails apps for business management. https://www.cleveroad.com/blog/effective-ruby-on-rails-open-source-apps-to-help-your-business.
[21]
who uses huginn. https://github.com/huginn/huginn/wiki/Companies-and-People-Using-Huginn.
[22]
who uses redmine. http://www.redmine.org/projects/redmine/wiki/weareusingredmine.
[23]
Z3 theorem prover. https://github.com/Z3Prover/z3.
[24]
S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in sql databases. In PVLDB, pages 496--505, 2000.
[25]
S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, pages 359--370, 2004.
[26]
I. Alagiannis, S. Idreos, and A. Ailamaki. H2O: A hands-free adaptive store. In SIGMOD, pages 1103--1114, 2014.
[27]
M. Atkinson, D. DeWitt, D. Maier, F. Bancilhon, K. Dittrich, and S. Zdonik. The object-oriented database system manifesto. Elsevier, 1990.
[28]
T. Azim, M. Karpathiotakis, and A. Ailamaki. Recache: Reactive caching for fast analytics over heterogeneous data. PVLDB, 11(3):324--337, 2017.
[29]
J. A. Blakeley, W. J. McKenna, and G. Graefe. Experiences building the open oodb query optimizer. In SIGMOD, pages 287--296, 1993.
[30]
N. Bruno and S. Chaudhuri. Automatic physical database tuning: A relaxation-based approach. In SIGMOD, 2005.
[31]
N. Bruno and S. Chaudhuri. Constrained physical design tuning. PVLDB, 19(1):4--15, 2008.
[32]
S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft sql server. In PVLDB, pages 146--155, 1997.
[33]
A. Cheung and O. Arden. Statusquo: Making familiar abstractions perform using program analysis. In CIDR, 2013.
[34]
A. Cheung, A. Solar-Lezama, and S. Madden. Optimizing database-backed applications with query synthesis. In PLDI, pages 3--14, 2013.
[35]
S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR, 2017.
[36]
S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: Proving query rewrites with univalent SQL semantics. In PLDI, pages 510--524, 2017.
[37]
D. Dash, N. Polyzotis, and A. Ailamaki. Cophy: A scalable, portable, and interactive index advisor for large workloads. PVLDB, 4(6):362--372, 2011.
[38]
L. Fegaras and D. Maier. An algebraic framework for physical OODB design. In DBLP, pages 6--8, 1995.
[39]
J. Gjengset, M. Schwarzkopf, J. Behrens, L. T. Araújo, M. Ek, E. Kohler, M. F. Kaashoek, and R. Morris. Noria: dynamic, partially-stateful data-flow for high-performance web applications. In USENIX, pages 213--231, 2018.
[40]
D. Gluche and M. Scholl. Physical design in OODBMS. In Grundlagen von Datenbanken, pages 21--25, 1996.
[41]
M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden. Hyrise: A main memory hybrid storage engine. PVLDB, 4(2):105--116, 2010.
[42]
S. Idreos, N. Dayan, W. Qin, M. Akmanalp, S. Hilgard, A. Ross, J. Lennon, V. Jain, H. Gupta, D. Li, et al. Design continuums and the path toward self-designing key-value stores that know and learn. In CIDR, 2019.
[43]
A. Jindal, K. Karanasos, S. Rao, and H. Patel. Selecting subexpressions to materialize at datacenter scale. PVLDB, 11(7):800--812, 2018.
[44]
W. Kim and F. H. Lochovsky. Object-oriented concepts, databases, and applications. ACM Press/Addison-Wesley Publishing Co., 1989.
[45]
H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. Coradd: Correlation aware database designer for materialized views and indexes. PVLDB, 3(1):1103--1113, 2010.
[46]
A. Kipf, T. Kipf, B. Radke, V. Leis, P. Boncz, and A. Kemper. Learned cardinalities: Estimating correlated joins with deep learning. In CIDR, 2019.
[47]
Y. Li and J. M. Patel. Widetable: An accelerator for analytical data processing. PVLDB, 7(10):907--918, 2014.
[48]
C. Loncaric, M. D. Ernst, and E. Torlak. Generalized data structure synthesis. In ICSE, 2018.
[49]
C. Loncaric, E. Torlak, and M. D. Ernst. Fast synthesis of fast collections. In PLDI, pages 355--368, 2016.
[50]
R. Marcus and O. Papaemmanouil. Towards a hands-free query optimizer through deep learning. In CIDR, 2019.
[51]
M. J. Mior, K. Salem, A. Aboulnaga, and R. Liu. Nose: Schema design for nosql applications. In ICDE, pages 181--192, 2016.
[52]
J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. Learning state representations for query optimization with deep reinforcement learning. In DEEM, pages 4:1--4:4, 2018.
[53]
A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: Fast distributed transactions for partitioned database systems. In SIGMOD, pages 1--12, 2012.
[54]
Transaction Processing Performance Council. The TPC-H benchmark. http://www.tpc.org/information/benchmarks.asp, 1999.
[55]
C. Wang, A. Cheung, and R. Bodík. Speeding up symbolic reasoning for relational queries. PACMPL, 2(OOPSLA):157:1--157:25, 2018.
[56]
C. Yan and A. Cheung. Leveraging lock contention to improve OLTP application performance. PVLDB, 9(5):444--455, 2016.
[57]
C. Yan, J. Yang, A. Cheung, and S. Lu. Understanding database performance inefficiencies in real-world web applications. In CIKM, 2017.
[58]
J. Yang, C. Yan, P. Subramaniam, S. Lu, and A. Cheung. How not to structure your database-backed web applications: A study of performance bugs in the wild. In ICSE, pages 800--810, 2018.
[59]
J. Yang, C. Yan, P. Subramaniam, S. Lu, and A. Cheung. PowerStation: Automatically detecting and fixing inefficiencies of database-backed web applications in ide. In ESEC/FSE, pages 884--887, 2018.

Cited By

View all
  • (2024)Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage EnginesProceedings of the ACM on Management of Data10.1145/36393022:1(1-28)Online publication date: 26-Mar-2024
  • (2023)Towards Auto-Generated Data SystemsProceedings of the VLDB Endowment10.14778/3611540.361163516:12(4116-4129)Online publication date: 1-Aug-2023
  • (2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 24-Aug-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 12, Issue 11
July 2019
543 pages

Publisher

VLDB Endowment

Publication History

Published: 01 July 2019
Published in PVLDB Volume 12, Issue 11

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)1
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage EnginesProceedings of the ACM on Management of Data10.1145/36393022:1(1-28)Online publication date: 26-Mar-2024
  • (2023)Towards Auto-Generated Data SystemsProceedings of the VLDB Endowment10.14778/3611540.361163516:12(4116-4129)Online publication date: 1-Aug-2023
  • (2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 24-Aug-2023
  • (2023)Predicate Pushdown for Data Science PipelinesProceedings of the ACM on Management of Data10.1145/35892811:2(1-28)Online publication date: 20-Jun-2023
  • (2023)Fine-Tuning Data Structures for Query ProcessingProceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3579990.3580016(149-161)Online publication date: 17-Feb-2023
  • (2020)Deductive optimization of relational data storageProceedings of the ACM on Programming Languages10.1145/34282384:OOPSLA(1-30)Online publication date: 13-Nov-2020
  • (2020)Managing data constraints in database-backed web applicationsProceedings of the ACM/IEEE 42nd International Conference on Software Engineering10.1145/3377811.3380375(1098-1109)Online publication date: 27-Jun-2020

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media