DOI: 10.1145/3514221.3517834
research-article

Proteus: Autonomous Adaptive Storage for Mixed Workloads

Published: 11 June 2022

Abstract

Enterprises use distributed database systems to meet the demands of mixed or hybrid transaction/analytical processing (HTAP) workloads that contain both transactional (OLTP) and analytical (OLAP) requests. Distributed HTAP systems typically maintain one complete copy of the data in a row-oriented storage format that is well suited to OLTP workloads and a second complete copy in a column-oriented storage format optimized for OLAP workloads. Maintaining these data copies consumes significant storage space and system resources. Conversely, if a system stores data in a single format, either OLTP or OLAP workload performance suffers. This paper presents Proteus, a distributed HTAP database system that adaptively and autonomously selects and changes its storage layout to optimize for mixed workloads. Proteus generates physical execution plans that utilize storage-aware operators for efficient transaction execution. Using comprehensive HTAP workloads and state-of-the-art comparison systems, we demonstrate that Proteus delivers superior HTAP performance while providing OLTP and OLAP performance on par with designs specialized for either type of workload.
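The trade-off between the two storage formats described in the abstract can be illustrated with a minimal sketch. This is not Proteus's implementation; the toy table, the column names, and every function below are hypothetical and exist only to show why a point lookup favors rows while an aggregate scan favors columns.

```python
from typing import Dict, List

# Hypothetical toy table: three orders with an id, a customer, and an amount.
ROWS: List[Dict[str, object]] = [
    {"id": 1, "customer": "a", "amount": 10.0},
    {"id": 2, "customer": "b", "amount": 25.0},
    {"id": 3, "customer": "a", "amount": 5.0},
]


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, list]:
    """Convert a row-oriented table into a column-oriented one."""
    return {name: [row[name] for row in rows] for name in rows[0]}


COLUMNS = to_columns(ROWS)


def point_lookup_row(rows: List[Dict[str, object]], key: int) -> Dict[str, object]:
    """OLTP-style read on rows: the matching row already holds every attribute."""
    return next(row for row in rows if row["id"] == key)


def point_lookup_col(columns: Dict[str, list], key: int) -> Dict[str, object]:
    """Same read on columns: the record is rebuilt with one access per column."""
    i = columns["id"].index(key)
    return {name: values[i] for name, values in columns.items()}


def total_amount_col(columns: Dict[str, list]) -> float:
    """OLAP-style aggregate on columns: touches only the 'amount' column."""
    return sum(columns["amount"])


def total_amount_row(rows: List[Dict[str, object]]) -> float:
    """Same aggregate on rows: walks every row to read a single attribute."""
    return sum(row["amount"] for row in rows)
```

Both layouts return the same answers; they differ in how much data each access pattern has to touch, which is the cost that an adaptive layout choice tries to balance.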



Information & Contributors

Information

Published In

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN: 9781450392495
DOI: 10.1145/3514221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adaptive storage
  2. hybrid databases
  3. mixed workloads

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '22
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 133
  • Downloads (last 6 weeks): 14
Reflects downloads up to 07 Mar 2025

Citations

Cited By

  • (2025) A Multiple Compression Approach using Attribute-based Signatures. Open Research Europe 5 (49). DOI: 10.12688/openreseurope.19247.1. Online publication date: 10-Feb-2025
  • (2024) Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD. Proceedings of the VLDB Endowment 17:11 (3629-3643). DOI: 10.14778/3681954.3682026. Online publication date: 30-Aug-2024
  • (2024) Partition, Don't Sort! Compression Boosters for Cloud Data Ingestion Pipelines. Proceedings of the VLDB Endowment 17:11 (3456-3469). DOI: 10.14778/3681954.3682013. Online publication date: 30-Aug-2024
  • (2024) The optimization problem of system load balancing and its solution for industrial Internet-of-Things data management. SCIENTIA SINICA Informationis 54:10 (2343). DOI: 10.1360/SSI-2023-0211. Online publication date: 30-Sep-2024
  • (2024) HTAP Databases: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:11 (6410-6429). DOI: 10.1109/TKDE.2024.3389693. Online publication date: Nov-2024
  • (2024) SharDAG: Scaling DAG-Based Blockchains Via Adaptive Sharding. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2068-2081). DOI: 10.1109/ICDE60146.2024.00165. Online publication date: 13-May-2024
  • (2024) Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques. Journal of Computer Science and Technology 39:2 (346-368). DOI: 10.1007/s11390-024-3538-1. Online publication date: 6-Jun-2024
  • (2024) A survey on hybrid transactional and analytical processing. The VLDB Journal - The International Journal on Very Large Data Bases 33:5 (1485-1515). DOI: 10.1007/s00778-024-00858-9. Online publication date: 4-Jun-2024
  • (2023) Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes. Proceedings of the VLDB Endowment 16:11 (3293-3301). DOI: 10.14778/3611479.3611526. Online publication date: 1-Jul-2023
  • (2023) Towards a Signature Based Compression Technique for Big Data Storage. 2023 IEEE 39th International Conference on Data Engineering Workshops (ICDEW) (100-104). DOI: 10.1109/ICDEW58674.2023.00022. Online publication date: Apr-2023
