Article, DOI: 10.1109/ICDE.2009.130

Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning

Published: 29 March 2009

Abstract

One of the most challenging aspects of managing a very large data warehouse is identifying how queries will behave before they start executing. Yet knowing their performance characteristics (their runtimes and resource usage) can solve two important problems. First, every database vendor struggles with managing unexpectedly long-running queries. When these long-running queries can be identified before they start, they can be rejected or scheduled for a time when they will not cause extreme resource contention for the other queries in the system. Second, deciding whether a system can complete a given workload in a given time period (or whether a bigger system is necessary) depends on knowing the resource requirements of the queries in that workload. We have developed a system that uses machine learning to accurately predict the performance metrics of database queries whose execution times range from milliseconds to hours. For training and testing our system, we used both real customer queries and queries generated from an extended set of TPC-DS templates. The extensions mimic queries that caused customer problems. We used these queries to compare how accurately different techniques predict metrics such as elapsed time, records used, disk I/Os, and message bytes. The most promising technique was not only the most accurate, but also predicted these metrics simultaneously, using only information available prior to query execution. We validated the accuracy of this machine learning technique on a number of HP Neoview configurations. We were able to predict individual query elapsed time within 20% of its actual time for 85% of the test queries. Most importantly, we were able to correctly identify both the short and the long-running (up to two hours) queries to inform workload management and capacity planning.
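The accuracy figure quoted in the abstract (elapsed time predicted within 20% of the actual time for 85% of test queries) can be read as a simple relative-error criterion. The sketch below is one plausible way to compute such a figure and is not taken from the paper: the function name `fraction_within`, the `tolerance` parameter, and the sample timings are all illustrative assumptions.

```python
def fraction_within(predicted, actual, tolerance=0.20):
    """Return the fraction of queries whose predicted value lies within
    `tolerance` (relative error) of the actual value.

    Hypothetical helper for illustration; the paper's exact evaluation
    procedure is not described in the abstract.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one query is required")
    hits = sum(
        1 for p, a in zip(predicted, actual)
        if abs(p - a) <= tolerance * a
    )
    return hits / len(actual)


# Made-up elapsed times in seconds, spanning sub-second to two-hour queries.
actual = [0.5, 12.0, 300.0, 7200.0]
predicted = [0.55, 15.0, 310.0, 6000.0]

score = fraction_within(predicted, actual)
print(f"{score:.0%} of queries predicted within 20% of actual elapsed time")
```

With these made-up numbers, three of the four predictions fall inside the 20% band; the 12-second query, predicted at 15 seconds, is off by 25% and falls outside it.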


Published In

ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering
March 2009, 1772 pages
ISBN: 9780769535456
Publisher: IEEE Computer Society, United States

Author Tags

1. database performance prediction
2. machine learning
3. operational business intelligence

Cited By

• (2024) Saving Money for Analytical Workloads in the Cloud. Proceedings of the VLDB Endowment 17:11 (3524-3537). DOI: 10.14778/3681954.3682018. Online publication date: 1-Jul-2024
• (2024) Challenges & Opportunities in Automating DBMS: A Qualitative Study. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (2013-2023). DOI: 10.1145/3691620.3695264. Online publication date: 27-Oct-2024
• (2024) Performance or Efficiency? A Tale of Two Cores for DB Workloads. Proceedings of the 20th International Workshop on Data Management on New Hardware (1-5). DOI: 10.1145/3662010.3663444. Online publication date: 10-Jun-2024
• (2024) ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges. ACM SIGMOD Record 52:4 (19-30). DOI: 10.1145/3641832.3641836. Online publication date: 19-Jan-2024
• (2023) Quantum Machine Learning for Join Order Optimization using Variational Quantum Circuits. Proceedings of the International Workshop on Big Data in Emergent Distributed Environments (1-7). DOI: 10.1145/3579142.3594299. Online publication date: 18-Jun-2023
• (2023) Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift. Companion of the 2023 International Conference on Management of Data (225-237). DOI: 10.1145/3555041.3589677. Online publication date: 4-Jun-2023
• (2022) Tiresias. Proceedings of the VLDB Endowment 15:11 (3126-3136). DOI: 10.14778/3551793.3551857. Online publication date: 29-Sep-2022
• (2022) Zero-shot cost models for out-of-the-box learned cost prediction. Proceedings of the VLDB Endowment 15:11 (2361-2374). DOI: 10.14778/3551793.3551799. Online publication date: 29-Sep-2022
• (2022) Multi-Tenant Cloud Data Services: State-of-the-Art, Challenges and Opportunities. Proceedings of the 2022 International Conference on Management of Data (2465-2473). DOI: 10.1145/3514221.3522566. Online publication date: 10-Jun-2022
• (2021) Database workload characterization with query plan encoders. Proceedings of the VLDB Endowment 15:4 (923-935). DOI: 10.14778/3503585.3503600. Online publication date: 1-Dec-2021
