Comprehensive and Efficient Workload Summarization

Shaleen Deep¹, Anja Gruenheid¹, Paraschos Koutris², Stratis Viglas³ & Jeffrey Naughton⁴


Abstract

This work studies the problem of constructing a representative workload from a given input analytical query workload, such that the representative workload approximates the input workload with guarantees. We discuss our work in the context of workload analysis and monitoring. As an example, evolving usage patterns in a database system can cause load imbalance and performance regressions; these can be controlled by monitoring a representative workload over time. To construct such a workload in a principled manner, we formalize the notions of workload representativity and coverage. These metrics capture the intuition that the distribution of features in a compressed workload should match a target distribution (increasing representativity) and that the compressed workload should include common queries as well as outliers (increasing coverage). We show that solving this problem optimally is computationally hard and present a novel greedy algorithm that provides approximation guarantees. We compare our techniques to established algorithms in this problem space, such as sampling and clustering, and demonstrate advantages and key trade-offs.
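To make the idea of a greedy, coverage-driven summary concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each query is described by a hypothetical set of feature strings and uses plain set coverage (the fraction of distinct workload features that appear in the summary) as a stand-in objective; the abstract does not give the actual definitions of representativity or coverage. The names `greedy_summary`, `coverage`, and the example `workload` are illustrative only. For monotone submodular objectives such as set coverage under a size budget, this greedy rule is known to achieve a (1 − 1/e) approximation; whether the paper's guarantee takes this form is not stated here.

```python
from typing import Dict, FrozenSet, List


def greedy_summary(workload: Dict[str, FrozenSet[str]], k: int) -> List[str]:
    """Pick up to k query ids, each time taking the query that adds the most new features."""
    covered: set = set()
    chosen: List[str] = []
    remaining = dict(workload)
    while remaining and len(chosen) < k:
        # Marginal gain = number of not-yet-covered features a query contributes.
        # Iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda q: len(remaining[q] - covered))
        if not remaining[best] - covered:
            break  # no remaining query adds a new feature
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def coverage(workload: Dict[str, FrozenSet[str]], chosen: List[str]) -> float:
    """Fraction of distinct workload features present in the chosen queries (assumed metric)."""
    all_features = set().union(*workload.values())
    got = set().union(*(workload[q] for q in chosen))
    return len(got) / len(all_features) if all_features else 1.0


# Hypothetical workload: query id -> extracted features (tables, operators, ...).
workload = {
    "q1": frozenset({"orders", "lineitem", "join", "sum"}),
    "q2": frozenset({"orders", "lineitem", "join"}),
    "q3": frozenset({"customer", "filter"}),
    "q4": frozenset({"orders", "sum"}),
    "q5": frozenset({"nation", "region", "join", "outlier_udf"}),
}

summary = greedy_summary(workload, k=2)
print(summary, coverage(workload, summary))
```

In this toy workload the second pick is the outlier-style query `q5` rather than the near-duplicate `q2`, which mirrors the intuition that a summary should include outliers as well as common queries.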

Notes

K‑medoids is an iterative greedy algorithm that chooses \(k\) cluster centers, assigns all points to the closest center and iteratively refines the points in each cluster.
Hierarchical clustering (in its divisive form) is a top-down approach where all points start in a single cluster and the algorithm recursively splits the points into \(k\) disjoint clusters.
A small sample is chosen to make sure that clustering algorithms can finish execution.
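The K-medoids description in the notes above can be sketched as follows. This is a simplified alternating variant (assign points to the closest center, then re-pick each center as the cluster member with the smallest total distance to the other members), operating on a precomputed distance matrix. It is an illustration under assumptions, not the implementation used in the article: the function names, the choice of the first \(k\) points as initial centers, and the 1-D example data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def assign(dist: Sequence[Sequence[float]], medoids: List[int]) -> List[int]:
    """Label every point with the index of its closest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(
    dist: Sequence[Sequence[float]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid updates until the medoids stop changing."""
    medoids = list(range(k))  # assumption: the first k points seed the clusters
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old center for an empty cluster
                continue
            # New center: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical 1-D data with two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Because every iteration scans all pairs within each cluster, the cost grows quickly with the number of points, which is consistent with the note that clustering baselines are run on a small sample.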


Author information

Authors and Affiliations

Microsoft Gray Systems Lab, Madison, USA
Shaleen Deep & Anja Gruenheid
Department of Computer Sciences, University of Wisconsin-Madison, Madison, USA
Paraschos Koutris
School of Informatics, University of Edinburgh, Edinburgh, UK
Stratis Viglas
Celonis, München, Germany
Jeffrey Naughton

Corresponding author

Correspondence to Shaleen Deep.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the article.

Additional information

This work was done while the authors Gruenheid, Viglas and Naughton were employed at Google.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Deep, S., Gruenheid, A., Koutris, P. et al. Comprehensive and Efficient Workload Summarization. Datenbank Spektrum 22, 249–256 (2022). https://doi.org/10.1007/s13222-022-00427-w


Received: 21 August 2022
Accepted: 08 October 2022
Published: 17 November 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s13222-022-00427-w
