DOI: 10.1145/2882903.2915231
research-article

Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads

Published: 14 June 2016

Abstract

Data-intensive applications seek to obtain insights in real time by analyzing a combination of historical data sets alongside recently collected data. To support such hybrid workloads, database management systems (DBMSs) must handle both fast ACID transactions and complex analytical queries on the same database. The current trend, however, is to use specialized systems that are optimized for only one of these workloads, which requires an organization to maintain separate copies of the database. This increases the cost of deploying a database application in terms of both storage and administration overhead.
To overcome this barrier, we present a hybrid DBMS architecture that efficiently supports varied workloads on the same database. Our approach differs from previous methods in that we use a single execution engine that is oblivious to the storage layout of data, without sacrificing the performance benefits of the specialized systems. This obviates the need to maintain separate copies of the database in multiple independent systems. We also present a technique to continuously evolve the database's physical storage layout by analyzing the queries' access patterns and choosing the optimal layout for different segments of data within the same table. To evaluate this work, we implemented our architecture in an in-memory DBMS. Our results show that our approach delivers up to 3x higher throughput than static storage layouts across different workloads. We also demonstrate that our continuous adaptation mechanism allows the DBMS to achieve a near-optimal layout for an arbitrary workload without requiring any manual tuning.
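The abstract names two mechanisms: a single execution engine that is oblivious to the storage layout, and continuous adaptation of each data segment's layout based on the queries' access patterns. The following Python sketch illustrates both ideas on a toy scale. It is not the paper's implementation: `Segment`, `choose_layout`, and `scan_sum` are hypothetical names, the in-memory representation is deliberately simplified, and the greedy grouping in `choose_layout` is an assumed stand-in for the paper's actual layout-selection algorithm.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical illustration: these names are NOT Peloton's API.
# A layout partitions a segment's columns into groups stored together.
Layout = Tuple[Tuple[str, ...], ...]


@dataclass
class Segment:
    """A horizontal slice of a table, stored as one or more column groups.

    One group holding every column behaves like a row store; one group
    per column behaves like a column store; anything in between is hybrid.
    """
    columns: Tuple[str, ...]
    layout: Layout
    groups: List[List[tuple]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.groups:
            self.groups = [[] for _ in self.layout]
        # Map each column to (group index, offset within that group's tuples).
        self._where = {col: (g, off)
                       for g, group in enumerate(self.layout)
                       for off, col in enumerate(group)}

    def insert(self, row: Dict[str, object]) -> None:
        # Split the logical row across the physical column groups.
        for g, group in enumerate(self.layout):
            self.groups[g].append(tuple(row[c] for c in group))

    def num_rows(self) -> int:
        return len(self.groups[0]) if self.groups else 0

    def get(self, row_id: int, col: str) -> object:
        # Layout-oblivious access: callers never see the physical grouping.
        g, off = self._where[col]
        return self.groups[g][row_id][off]

    def relayout(self, new_layout: Layout) -> "Segment":
        # Rebuild the same logical rows under a different physical layout.
        out = Segment(self.columns, new_layout)
        for r in range(self.num_rows()):
            out.insert({c: self.get(r, c) for c in self.columns})
        return out


def choose_layout(columns: Sequence[str],
                  accessed: Sequence[FrozenSet[str]]) -> Layout:
    """Greedy stand-in heuristic (an assumption, not the paper's method).

    Walk the observed access patterns from most to least frequent; each
    pattern claims its not-yet-assigned columns as one group. Columns that
    no query touched end up together in a final group.
    """
    assigned: set = set()
    groups: List[Tuple[str, ...]] = []
    for pattern, _count in Counter(accessed).most_common():
        group = tuple(c for c in columns if c in pattern and c not in assigned)
        if group:
            groups.append(group)
            assigned.update(group)
    rest = tuple(c for c in columns if c not in assigned)
    if rest:
        groups.append(rest)
    return tuple(groups)


def scan_sum(seg: Segment, col: str) -> int:
    # An "operator" written once against get(), whatever the layout is.
    return sum(seg.get(r, col) for r in range(seg.num_rows()))


if __name__ == "__main__":
    cols = ("id", "name", "price", "qty")
    seg = Segment(cols, (cols,))  # start as a row store
    for i in range(4):
        seg.insert({"id": i, "name": f"item{i}", "price": 10 * i, "qty": i + 1})
    # Recent workload: three scans over price/qty, one full-row lookup.
    history = [frozenset({"price", "qty"})] * 3 + [frozenset(cols)]
    new_layout = choose_layout(cols, history)
    print(new_layout)  # (('price', 'qty'), ('id', 'name'))
    seg = seg.relayout(new_layout)
    print(scan_sum(seg, "price"))  # 60
```

Because `scan_sum` only calls `get`, the same operator code runs unchanged before and after `relayout`, which is the property the abstract attributes to a layout-oblivious engine. A real system would also have to keep the segment consistent under concurrent transactions while its layout changes, which this sketch ignores.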


Cited By

  • (2024) KGFabric: A Scalable Knowledge Graph Warehouse for Enterprise Data Interconnection. Proceedings of the VLDB Endowment 17:12 (3841-3854). DOI: 10.14778/3685800.3685810. Online publication date: 8-Nov-2024
  • (2024) Two Birds With One Stone: Designing a Hybrid Cloud Storage Engine for HTAP. Proceedings of the VLDB Endowment 17:11 (3290-3303). DOI: 10.14778/3681954.3682001. Online publication date: 1-Jul-2024
  • (2024) SuccinctKV: a CPU-efficient LSM-tree Based KV Store with Scan-based Compaction. ACM Transactions on Architecture and Code Optimization 21:4 (1-26). DOI: 10.1145/3695873. Online publication date: 20-Nov-2024

Information

Published In

ACM Conferences
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2016

Author Tags

  1. HTAP
  2. column-stores
  3. hybrid workloads

Qualifiers

  • Research-article

Funding Sources

  • U.S. National Science Foundation
  • Intel Science and Technology Center for Big Data

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 88
  • Downloads (last 6 weeks): 3
Reflects downloads up to 03 Jan 2025

Citations

  • (2024) Demonstrating PARS: A Decision Support System for Developing Vertical Partitioning Plans. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (5249-5253). DOI: 10.1145/3627673.3679224. Online publication date: 21-Oct-2024
  • (2024) HTAP Databases: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:11 (6410-6429). DOI: 10.1109/TKDE.2024.3389693. Online publication date: Nov-2024
  • (2024) Effortless Locality on Data Systems Using Relational Fabric. IEEE Transactions on Knowledge and Data Engineering 36:12 (7410-7422). DOI: 10.1109/TKDE.2024.3386827. Online publication date: Dec-2024
  • (2024) A prefetching indexing scheme for in-memory database systems. Future Generation Computer Systems 156 (179-190). DOI: 10.1016/j.future.2024.03.012. Online publication date: Jul-2024
  • (2024) Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques. Journal of Computer Science and Technology 39:2 (346-368). DOI: 10.1007/s11390-024-3538-1. Online publication date: 1-Mar-2024
  • (2024) A survey on hybrid transactional and analytical processing. The VLDB Journal 33:5 (1485-1515). DOI: 10.1007/s00778-024-00858-9. Online publication date: 4-Jun-2024
  • (2023) Mammoths are Slow: The Overlooked Transactions of Graph Data. Proceedings of the VLDB Endowment 17:4 (904-911). DOI: 10.14778/3636218.3636241. Online publication date: 1-Dec-2023
