DSB: a decision support benchmark for workload-driven and traditional database systems

Published: 01 September 2021

Abstract

We describe a new benchmark, DSB, for evaluating both workload-driven and traditional database systems on modern decision support workloads. DSB is adapted from the widely used, industry-standard TPC-DS benchmark. It enhances TPC-DS with complex data distributions and challenging yet semantically meaningful query templates. DSB also introduces configurable and dynamic workloads to assess the adaptability of database systems. Since workload-driven and traditional database systems have different performance dimensions, including the additional resources required for tuning and maintaining the systems, we provide guidelines on evaluation methodology and metrics to report. We present a case study on how to evaluate both workload-driven and traditional database systems with DSB. The code for the DSB benchmark is open source and is available at https://aka.ms/dsb.

      Published In

      Proceedings of the VLDB Endowment  Volume 14, Issue 13
      September 2021
      168 pages
      ISSN: 2150-8097

      Publisher

      VLDB Endowment

      Publication History

      Published: 01 September 2021
      Published in PVLDB Volume 14, Issue 13

      Qualifiers

      • Research-article

      Article Metrics

      • Downloads (Last 12 months): 61
      • Downloads (Last 6 weeks): 14
      Reflects downloads up to 20 Dec 2024

      Cited By

      • (2024) Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems. Proceedings of the VLDB Endowment 17:11 (3680-3693). DOI: 10.14778/3681954.3682030. Online publication date: 1-Jul-2024
      • (2024) The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-Actions. Proceedings of the VLDB Endowment 17:11 (3373-3387). DOI: 10.14778/3681954.3682007. Online publication date: 30-Aug-2024
      • (2024) LST-Bench: Benchmarking Log-Structured Tables in the Cloud. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI: 10.1145/3639314. Online publication date: 26-Mar-2024
      • (2024) Modeling Shifting Workloads for Learned Database Systems. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639293. Online publication date: 26-Mar-2024
      • (2024) Sub-optimal Join Order Identification with L1-error. Proceedings of the ACM on Management of Data 2:1 (1-24). DOI: 10.1145/3639272. Online publication date: 26-Mar-2024
      • (2023) Sample-Efficient Cardinality Estimation Using Geometric Deep Learning. Proceedings of the VLDB Endowment 17:4 (740-752). DOI: 10.14778/3636218.3636229. Online publication date: 1-Dec-2023
      • (2023) An Empirical Evaluation of Columnar Storage Formats. Proceedings of the VLDB Endowment 17:2 (148-161). DOI: 10.14778/3626292.3626298. Online publication date: 1-Oct-2023
      • (2023) Analyzing the Impact of Cardinality Estimation on Execution Plans in Microsoft SQL Server. Proceedings of the VLDB Endowment 16:11 (2871-2883). DOI: 10.14778/3611479.3611494. Online publication date: 24-Aug-2023
      • (2023) Efficient Query Re-optimization with Judicious Subquery Selections. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589330. Online publication date: 20-Jun-2023
      • (2023) MorphStream: Adaptive Scheduling for Scalable Transactional Stream Processing on Multicores. Proceedings of the ACM on Management of Data 1:1 (1-26). DOI: 10.1145/3588913. Online publication date: 30-May-2023
