- research-article, November 2024
Forecasting Algorithms for Intelligent Resource Scaling: An Experimental Analysis
- Yanlei Diao,
- Dominik Horn,
- Andreas Kipf,
- Oleksandr Shchur,
- Ines Benito,
- Wenjian Dong,
- Davide Pagano,
- Pascal Pfeil,
- Vikram Nathan,
- Balakrishnan Narayanaswamy,
- Tim Kraska
SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing, Pages 126–143
https://doi.org/10.1145/3698038.3698564
There has been a growing demand for making modern cloud-based data analytics systems cost-effective and easy to use. AI-powered intelligent resource scaling is one such effort, aiming at automating scaling decisions for serverless offerings like Amazon ...
- research-article, November 2024
Vista: Machine Learning based Database Performance Troubleshooting Framework in Amazon RDS
SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing, Pages 83–98
https://doi.org/10.1145/3698038.3698519
Database performance troubleshooting is a complex multi-step process that broadly involves three key stages- (a) Detection: determining what's wrong and when; (b) Root Cause Analysis (RCA): reasoning about why is the performance poor; (c) Resolution: ...
- research-article, August 2024
Databases Unbound: Querying All of the World's Bytes with AI
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4546–4554
https://doi.org/10.14778/3685800.3685916
Over the past five decades, the relational database model has proven to be a scaleable and adaptable model for querying a variety of structured data, with use cases in analytics, transactions, graphs, streaming and more. However, most of the world's data ...
- research-article, August 2024
Resource Management in Aurora Serverless
- Bradley Barnhart,
- Marc Brooker,
- Daniil Chinenkov,
- Tony Hooper,
- Jihoun Im,
- Prakash Chandra Jha,
- Tim Kraska,
- Ashok Kurakula,
- Alexey Kuznetsov,
- Grant McAlister,
- Arjun Muthukrishnan,
- Aravinthan Narayanan,
- Douglas Terry,
- Bhuvan Urgaonkar,
- Jiaming Yan
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4038–4050
https://doi.org/10.14778/3685800.3685825
Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora with full MySQL and PostgreSQL compatibility. It automatically offers capacity scale-up/down (i.e., vertical scaling) based on a customer database application's needs. ...
Why TPC is Not Enough: An Analysis of the Amazon Redshift Fleet
- Alexander van Renen,
- Dominik Horn,
- Pascal Pfeil,
- Kapil Vaidya,
- Wenjian Dong,
- Murali Narayanaswamy,
- Zhengchun Liu,
- Gaurav Saxena,
- Andreas Kipf,
- Tim Kraska
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11, Pages 3694–3706
https://doi.org/10.14778/3681954.3682031
Database research and development is heavily influenced by benchmarks, such as the industry-standard TPC-H and TPC-DS for analytical systems. However, these twenty-year-old benchmarks neither capture how databases are deployed nor what workloads modern ...
Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD
- Geoffrey X. Yu,
- Ziniu Wu,
- Ferdi Kossmann,
- Tianyu Li,
- Markos Markakis,
- Amadou Ngom,
- Samuel Madden,
- Tim Kraska
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11, Pages 3629–3643
https://doi.org/10.14778/3681954.3682026
Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious ...
- short-paper, June 2024
Mallet: SQL Dialect Translation with LLM Rule Generation
aiDM '24: Proceedings of the Seventh International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Article No.: 3, Pages 1–5
https://doi.org/10.1145/3663742.3663973
Translating between the SQL dialects of different systems is important for migration and federated query processing. Existing approaches rely on hand-crafted translation rules, which tend to be incomplete and hard to maintain, especially as the number of ...
- research-article, June 2024
Predicate Caching: Query-Driven Secondary Indexing for Cloud Data Warehouses
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 347–359
https://doi.org/10.1145/3626246.3653395
Cloud data warehouses are today's standard for analytical query processing. Multiple cloud vendors offer state-of-the-art systems, such as Amazon Redshift. We have observed that customer workloads experience highly repetitive query patterns, i.e., users ...
- research-article, June 2024
Intelligent Scaling in Amazon Redshift
- Vikram Nathan,
- Vikramank Singh,
- Zhengchun Liu,
- Mohammad Rahman,
- Andreas Kipf,
- Dominik Horn,
- Davide Pagano,
- Gaurav Saxena,
- Balakrishnan Narayanaswamy,
- Tim Kraska
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 269–279
https://doi.org/10.1145/3626246.3653394
Cloud-based data warehouses are built to be easy to use, requiring minimal intervention from customers as their workloads scale. However, there are still many dimensions of a workload that they do not scale with automatically. For example, in cloud-...
- research-article, June 2024
Stage: Query Execution Time Prediction in Amazon Redshift
- Ziniu Wu,
- Ryan Marcus,
- Zhengchun Liu,
- Parimarjan Negi,
- Vikram Nathan,
- Pascal Pfeil,
- Gaurav Saxena,
- Mohammad Rahman,
- Balakrishnan Narayanaswamy,
- Tim Kraska
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 280–294
https://doi.org/10.1145/3626246.3653391
Query performance (e.g., execution time) prediction is a critical component of modern DBMSes. As a pioneering cloud data warehouse, Amazon Redshift relies on an accurate execution time prediction for many downstream tasks, ranging from high-level ...
- research-article, June 2024
Automated Multidimensional Data Layouts in Amazon Redshift
- Jialin Ding,
- Matt Abrams,
- Sanghita Bandyopadhyay,
- Luciano Di Palma,
- Yanzhu Ji,
- Davide Pagano,
- Gopal Paliwal,
- Panos Parchas,
- Pascal Pfeil,
- Orestis Polychroniou,
- Gaurav Saxena,
- Aamer Shah,
- Amina Voloder,
- Sherry Xiao,
- Davis Zhang,
- Tim Kraska
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 55–67
https://doi.org/10.1145/3626246.3653379
Analytic data systems typically use data layouts to improve the performance of scanning and filtering data. Common data layout techniques include single-column sort keys, compound sort keys, and more complex multidimensional data layouts such as the Z-...
- research-article, July 2023
Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 11, Pages 3293–3301
https://doi.org/10.14778/3611479.3611526
The last decade of database research has led to the prevalence of specialized systems for different workloads. Consequently, organizations often rely on a combination of specialized systems, organized in a Data Mesh. Data meshes present significant ...
- article, June 2023
Technical Perspective for Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory
Separation of compute and storage has become the de facto standard for cloud database systems. First proposed in 2007 for database systems [2], it is now widely adopted by all major cloud providers such as Amazon Redshift, Google BigQuery, and Snowflake. ...
- research-article, June 2023
Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift
- Gaurav Saxena,
- Mohammad Rahman,
- Naresh Chainani,
- Chunbin Lin,
- George Caragea,
- Fahim Chowdhury,
- Ryan Marcus,
- Tim Kraska,
- Ippokratis Pandis,
- Balakrishnan (Murali) Narayanaswamy
SIGMOD '23: Companion of the 2023 International Conference on Management of Data, Pages 225–237
https://doi.org/10.1145/3555041.3589677
There has been a lot of excitement around using machine learning to improve the performance and usability of database systems. However, few of these techniques have actually been used in the critical path of customer-facing database services. In this ...
FactorJoin: A New Cardinality Estimation Framework for Join Queries
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, Article No.: 41, Pages 1–27
https://doi.org/10.1145/3588721
Cardinality estimation is one of the most fundamental and challenging problems in query optimization. Neither classical nor learning-based methods yield satisfactory performance when estimating the cardinality of the join queries. They either rely on ...
Extract-Transform-Load for Video Streams
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 9, Pages 2302–2315
https://doi.org/10.14778/3598581.3598600
Social media, self-driving cars, and traffic cameras produce video streams at large scales and cheap cost. However, storing and querying video at such scales is prohibitively expensive. We propose to treat large-scale video analytics as a data ...
The Case for Learned In-Memory Joins
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 7, Pages 1749–1762
https://doi.org/10.14778/3587136.3587148
In-memory join is an essential operator in any database engine. It has been extensively investigated in the database literature. In this paper, we study whether exploiting the CDF-based learned models to boost the join performance is practical. To the ...
Robust Query Driven Cardinality Estimation under Changing Workloads
- Parimarjan Negi,
- Ziniu Wu,
- Andreas Kipf,
- Nesime Tatbul,
- Ryan Marcus,
- Sam Madden,
- Tim Kraska,
- Mohammad Alizadeh
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 6, Pages 1520–1533
https://doi.org/10.14778/3583140.3583164
Query driven cardinality estimation models learn from a historical log of queries. They are lightweight, having low storage requirements, fast inference and training, and are easily adaptable for any kind of query. Unfortunately, such models can suffer ...
Can Learned Models Replace Hash Functions?
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 3, Pages 532–545
https://doi.org/10.14778/3570690.3570702
Hashing is a fundamental operation in database management, playing a key role in the implementation of numerous core database data structures and algorithms. Traditional hash functions aim to mimic a function that maps a key to a random value, which can ...
- proceeding, September 2022