More Web Proxy on the site http://driver.im/

research-article

Generating application-specific data layouts for in-memory databases

Authors:

Alvin CheungAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 12, Issue 11

Pages 1513 - 1525

https://doi.org/10.14778/3342263.3342630

Published: 01 July 2019 Publication History

Abstract

Database applications are often developed with object-oriented languages while using relational databases as the backend. To accelerate these applications, developers would manually design customized data structures to store data in memory, and ways to utilize such data structures to answer queries. Doing so is brittle and requires a lot of effort. Alternatively, developers might automate the process by using relational physical design tools to create materialized views and indexes instead. However, the characteristics of object-oriented database applications are often distinct enough from traditional database applications such that classical relational query optimization techniques often cannot speed up queries that arise from such applications, as our experiments show.

To address this, we build Chestnut, a data layout generator for in-memory object-oriented database applications. Given a memory budget, Chestnut generates customized in-memory data layouts and query plans to answer queries written using a subset of the Rails API, a common framework for building object-oriented database applications. Chestnut differs from traditional query optimizers and physical designers in two ways. First, Chestnut automatically generates data layouts that are customized for the application after analyzing their queries, hence Chestnut-generated data layouts are designed to be efficient to answer queries from such applications. second, Chestnut uses a novel enumeration and verification-based algorithm to generate query plans that use such data layouts, rather than rule-based approaches as in traditional query optimizers. We evaluated Chestnut on four open-source Rails database applications. The result shows that it can reduce average query processing time by over 3.6X (and up to 42X), as compared to other in-memory relational database engines.

References

[1]

Active record association. https://guides.rubyonrails.org/association_basics.html.

[2]

Active record query interface. https://guides.rubyonrails.org/active_record_querying.html.

[3]

Comparing oodb with LINQ. https://msdn.microsoft.com/en-us/library/aa479863.aspx.

[4]

Curated list of awesome rails lists. https://project-awesome.org/ekremkaraca/awesome-rails.

[5]

Database size reported by web developers. https://meta.discourse.org/t/slow-sql-queries/16604.

[6]

Database size reported by web developers. http://www.redmine.org/issues/23318.

[7]

Dexter, an automatic indexer for postgres. https://github.com/ankane/dexter.

[8]

Django, a python web application framework. https://www.djangoproject.com/.

[9]

Gurobi ilp solver. http://www.gurobi.com/.

[10]

Hibernate, an orm framework for java. http://hibernate.org/orm/.

[11]

Huginn, anagentmonitor. https://github.com/huginn/huginn.

[12]

Kandan, a chatting application. https://github.com/kandanapp/kandan.

[13]

Lobsters, a forumapplication. https://github.com/lobsters/lobsters.

[14]

Nested set model, a model to represent tree using relational table. https://en.wikipedia.org/wiki/Nested_set_model.

[15]

Protocol buffer, a language-neutral, platform-neutral extensible mechanism for serializing structured data. https://developers.google.com/protocol-buffers.

[16]

ranking of popular open source rails applications. http://www.opensourcerails.com/.

[17]

Redmine, a project management application. https://github.com/redmine/redmine.

[18]

Ruby on rails, a ruby web application framework. https://rubyonrails.org/.

[19]

Stx-btree library. https://github.com/bingmann/stx-btree.

[20]

top7 forceful rails apps for business management. https://www.cleveroad.com/blog/effective-ruby-on-rails-open-source-apps-to-help-your-business.

[21]

who uses huginn. https://github.com/huginn/huginn/wiki/Companies-and-People-Using-Huginn.

[22]

who uses redmine. http://www.redmine.org/projects/redmine/wiki/weareusingredmine.

[23]

Z3 theorem prover. https://github.com/Z3Prover/z3.

[24]

S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in sql databases. In PVLDB, pages 496--505, 2000.

Digital Library

[25]

S. Agrawal, V. Narasayya, and B. Yang. Integrating vertical and horizontal partitioning into automated physical database design. In SIGMOD, pages 359--370, 2004.

Digital Library

[26]

I. Alagiannis, S. Idreos, and A. Ailamaki. H2O: A hands-free adaptive store. In SIGMOD, pages 1103--1114, 2014.

Digital Library

[27]

M. Atkinson, D. DeWitt, D. Maier, F. Bancilhon, K. Dittrich, and S. Zdonik. The object-oriented database system manifesto. Elsevier, 1990.

[28]

T. Azim, M. Karpathiotakis, and A. Ailamaki. Recache: Reactive caching for fast analytics over heterogeneous data. PVLDB, 11(3):324--337, 2017.

Digital Library

[29]

J. A. Blakeley, W. J. McKenna, and G. Graefe. Experiences building the open oodb query optimizer. In SIGMOD, pages 287--296, 1993.

Digital Library

[30]

N. Bruno and S. Chaudhuri. Automatic physical database tuning: A relaxation-based approach. In SIGMOD, 2005.

Digital Library

[31]

N. Bruno and S. Chaudhuri. Constrained physical design tuning. PVLDB, 19(1):4--15, 2008.

Digital Library

[32]

S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft sql server. In PVLDB, pages 146--155, 1997.

Digital Library

[33]

A. Cheung and O. Arden. Statusquo: Making familiar abstractions perform using program analysis. In CIDR, 2013.

[34]

A. Cheung, A. Solar-Lezama, and S. Madden. Optimizing database-backed applications with query synthesis. In PLDI, pages 3--14, 2013.

Digital Library

[35]

S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR, 2017.

[36]

S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: Proving query rewrites with univalent SQL semantics. In PLDI, pages 510--524, 2017.

Digital Library

[37]

D. Dash, N. Polyzotis, and A. Ailamaki. Cophy: A scalable, portable, and interactive index advisor for large workloads. PVLDB, 4(6):362--372, 2011.

Digital Library

[38]

L. Fegaras and D. Maier. An algebraic framework for physical OODB design. In DBLP, pages 6--8, 1995.

Digital Library

[39]

J. Gjengset, M. Schwarzkopf, J. Behrens, L. T. Araújo, M. Ek, E. Kohler, M. F. Kaashoek, and R. Morris. Noria: dynamic, partially-stateful data-flow for high-performance web applications. In USENIX, pages 213--231, 2018.

Digital Library

[40]

D. Gluche and M. Scholl. Physical design in OODBMS. In Grundlagen von Datenbanken, pages 21--25, 1996.

[41]

M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudre-Mauroux, and S. Madden. Hyrise: A main memory hybrid storage engine. PVLDB, 4(2):105--116, 2010.

Digital Library

[42]

S. Idreos, N. Dayan, W. Qin, M. Akmanalp, S. Hilgard, A. Ross, J. Lennon, V. Jain, H. Gupta, D. Li, et al. Design continuums and the path toward self-designing key-value stores that know and learn. In CIDR, 2019.

[43]

A. Jindal, K. Karanasos, S. Rao, and H. Patel. Selecting subexpressions to materialize at datacenter scale. PVLDB, 11(7):800--812, 2018.

Digital Library

[44]

W. Kim and F. H. Lochovsky. Object-oriented concepts, databases, and applications. ACM Press/Addison-Wesley Publishing Co., 1989.

Digital Library

[45]

H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. Coradd: Correlation aware database designer for materialized views and indexes. PVLDB, 3(1):1103--1113, 2010.

Digital Library

[46]

A. Kipf, T. Kipf, B. Radke, V. Leis, P. Boncz, and A. Kemper. Learned cardinalities: Estimating correlated joins with deep learning. In CIDR, 2019.

[47]

Y. Li and J. M. Patel. Widetable: An accelerator for analytical data processing. PVLDB, 7(10):907--918, 2014.

Digital Library

[48]

C. Loncaric, M. D. Ernst, and E. Torlak. Generalized data structure synthesis. In ICSE, 2018.

Digital Library

[49]

C. Loncaric, E. Torlak, and M. D. Ernst. Fast synthesis of fast collections. In PLDI, pages 355--368, 2016.

Digital Library

[50]

R. Marcus and O. Papaemmanouil. Towards a hands-free query optimizer through deep learning. In CIDR, 2019.

[51]

M. J. Mior, K. Salem, A. Aboulnaga, and R. Liu. Nose: Schema design for nosql applications. In ICDE, pages 181--192, 2016.

[52]

J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. Learning state representations for query optimization with deep reinforcement learning. In DEEM, pages 4:1--4:4, 2018.

Digital Library

[53]

A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: Fast distributed transactions for partitioned database systems. In SIGMOD, pages 1--12, 2012.

Digital Library

[54]

Transaction Processing Performance Council. The TPC-H benchmark. http://www.tpc.org/information/benchmarks.asp, 1999.

[55]

C. Wang, A. Cheung, and R. Bodík. Speeding up symbolic reasoning for relational queries. PACMPL, 2(OOPSLA):157:1--157:25, 2018.

Digital Library

[56]

C. Yan and A. Cheung. Leveraging lock contention to improve OLTP application performance. PVLDB, 9(5):444--455, 2016.

Digital Library

[57]

C. Yan, J. Yang, A. Cheung, and S. Lu. Understanding database performance inefficiencies in real-world web applications. In CIKM, 2017.

Digital Library

[58]

J. Yang, C. Yan, P. Subramaniam, S. Lu, and A. Cheung. How not to structure your database-backed web applications: A study of performance bugs in the wild. In ICSE, pages 800--810, 2018.

Digital Library

[59]

J. Yang, C. Yan, P. Subramaniam, S. Lu, and A. Cheung. PowerStation: Automatically detecting and fixing inefficiencies of database-backed web applications in ide. In ESEC/FSE, pages 884--887, 2018.

Digital Library

Cited By

Chatterjee SPekala MKruglyak LIdreos S(2024)Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage EnginesProceedings of the ACM on Management of Data10.1145/36393022:1(1-28)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639302
Cheung AAhmad MHaynes BKittivorawong CLaddad SLiu XWang CYan C(2023)Towards Auto-Generated Data SystemsProceedings of the VLDB Endowment10.14778/3611540.361163516:12(4116-4129)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.14778/3611540.3611635
Dong RLiu JZhu YYan CMozafari BWang X(2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611515
Show More Cited By

Generating application-specific data layouts for in-memory databases
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Recommendations

Data page layouts for relational databases on deep memory hierarchies

Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and ...
Non-Relational Databases in Big Data
ICTCS '16: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies

These days' Big data is becoming a very essential component for the industries where large volume of data at very high speed is used to solve particular data problems. Generally, big data is first analyzed and then used with other available data in the ...
Access-Aware In-memory Data Cache Middleware for Relational Databases
HPCC-CSS-ICESS '15: Proceedings of the 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conf on Embedded Software and Systems

Data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, reads and other complex queries. In recent years, scientists regularly encounter limitations due to large data sets in many ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 12, Issue 11

July 2019

543 pages

ISSN:2150-8097

Editors:
Lei Chen,
Fatma Özcan

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 July 2019

Published in PVLDB Volume 12, Issue 11

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

7
Total Citations
View Citations
211
Total Downloads

Downloads (Last 12 months)26
Downloads (Last 6 weeks)1

Reflects downloads up to 02 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Chatterjee SPekala MKruglyak LIdreos S(2024)Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage EnginesProceedings of the ACM on Management of Data10.1145/36393022:1(1-28)Online publication date: 26-Mar-2024
https://dl.acm.org/doi/10.1145/3639302
Cheung AAhmad MHaynes BKittivorawong CLaddad SLiu XWang CYan C(2023)Towards Auto-Generated Data SystemsProceedings of the VLDB Endowment10.14778/3611540.361163516:12(4116-4129)Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.14778/3611540.3611635
Dong RLiu JZhu YYan CMozafari BWang X(2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 24-Aug-2023
https://dl.acm.org/doi/10.14778/3611479.3611515
Yan CLin YHe Y(2023)Predicate Pushdown for Data Science PipelinesProceedings of the ACM on Management of Data10.1145/35892811:2(1-28)Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1145/3589281
Shaikhha AKelepeshis MGhorbani MDubach CBruening DHardekopf B(2023)Fine-Tuning Data Structures for Query ProcessingProceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3579990.3580016(149-161)Online publication date: 17-Feb-2023
https://dl.acm.org/doi/10.1145/3579990.3580016
Feser JMadden STang NSolar-Lezama A(2020)Deductive optimization of relational data storageProceedings of the ACM on Programming Languages10.1145/34282384:OOPSLA(1-30)Online publication date: 13-Nov-2020
https://dl.acm.org/doi/10.1145/3428238
Yang JSethi UYan CCheung ALu SRothermel GBae D(2020)Managing data constraints in database-backed web applicationsProceedings of the ACM/IEEE 42nd International Conference on Software Engineering10.1145/3377811.3380375(1098-1109)Online publication date: 27-Jun-2020
https://dl.acm.org/doi/10.1145/3377811.3380375

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents