More Web Proxy on the site http://driver.im/

research-article

Lero: applying learning-to-rank in query optimizer

Authors:

Xingguang Chen,

Jingren ZhouAuthors Info & Claims

The VLDB Journal, Volume 33, Issue 5

Pages 1307 - 1331

https://doi.org/10.1007/s00778-024-00850-3

Published: 25 April 2024 Publication History

Abstract

In recent studies, machine learning techniques have been employed to support or enhance cost-based query optimizers in DBMS. Although these approaches have shown superiority in certain benchmarks, they also suffer from certain drawbacks. These include unstable performance, high training costs, and slow model updating, which can be attributed to the inherent challenges of predicting the cost or latency of execution plans using machine learning models. In this paper, we introduce a learning-to-rank query optimizer, called Lero, which builds on top of the native query optimizer and continuously learns to improve query optimization. The key observation is that the relative order or rank of plans, rather than the exact cost or latency, is sufficient for query optimization. Lero employs a pairwise approach to train a classifier to compare any two plans and tell which one is better. Such a binary classification task is much easier than the regression task to predict the cost or latency, in terms of model efficiency and effectiveness. Rather than building a learned optimizer from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native optimizer. With its non-intrusive design, Lero can be implemented on top of any existing DBMS with minimum integration efforts. We implement Lero and demonstrate its outstanding performance using PostgreSQL and Spark SQL. In our experiments, Lero achieves near-optimal performance on several benchmarks. It reduces the execution time of the native PostgreSQL optimizer by up to

70 %

and other learned query optimizers by up to

37 %

on single-machine environments. On distributed environments, our Lero improves the running time of the native Spark SQL optimizer by up to

27 %

. Meanwhile, Lero continuously learns and automatically adapts to query workloads and changes in data.

References

[1]

Agrawal, S., Chaudhuri, S., Kollár, L., Marathe, A.P., Narasayya, V.R., Syamala, M.: Database tuning advisor for microsoft SQL server 2005. In: PVLDB, pp. 1110–1121 (2004)

[2]

Aken, D.V., Pavlo, A., Gordon, G.J., Zhang, B.: Automatic database management system tuning through large-scale machine learning. In: SIGMOD, pp. 1009–1024 (2017)

[3]

Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In: SIGMOD, pp. 1383–1394 (2015)

[4]

Balsa implementation. https://github.com/balsa-project/balsa (2022)

[5]

Chaiken R, Jenkins B, Larson PÅ, Ramsey B, Shakib D, Weaver S, and Zhou J Scope: easy and efficient parallel processing of massive data sets PVLDB 2008 1 2 1265-1276

Digital Library

[6]

Chaudhuri, S., Narasayya, V.R.: An efficient cost-driven index selection tool for microsoft SQL server. In: PVLDB, pp. 146–155 (1997)

[7]

Chen, J., Pan, X., Monga, R., Bengio, S., Jozefowicz, R.: Revisiting distributed synchronous sgd. arXiv:1604.00981 (2016)

[8]

Council(TPC), T.P.P.: TPC-DS Vesion 2 and Version 3. http://www.tpc.org/tpcds/ (2021)

[9]

Council(TPC), T.P.P.: Tpc-h vesion 2 and version 3. http://www.tpc.org/tpch/ (2021)

[10]

Dey A, Bhaumik S, Doraiswamy H, and Haritsa JR Efficiently approximating query optimizer plan diagrams PVLDB 2008 1 2 1325-1336

Digital Library

[11]

Ding, B., Das, S., Marcus, R., Wu, W., Chaudhuri, S., Narasayya, V.R.: Ai meets ai: leveraging query executions to improve index recommendations. In: SIGMOD, pp. 1241–1258 (2019)

[12]

Doraiswamy H, Darera PN, and Haritsa JR Identifying robust plans through plan diagram reduction PVLDB 2008 1 1 1124-1140

[13]

Dutt A, Wang C, Nazi A, Kandula S, Narasayya V, and Chaudhuri S Selectivity estimation for range predicates using lightweight models PVLDB 2019 12 9 1044-1057

Digital Library

[14]

Freund Y, Iyer R, Schapire RE, and Singer Y An efficient boosting algorithm for combining preferences J. Mach. Learn. Res. 2003 4 933-969

Digital Library

[15]

Fuhr N Optimum polynomial retrieval functions based on the probability ranking principle ACM Trans. Inf. Syst. 1989 7 3 183-204

Digital Library

[16]

Han, Y.: Github repository: stats end-to-end cardest benchmark. https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark (2021)

[17]

Han Y, Wu Z, Wu P, Zhu R, Yang J, Liang TW, Zeng K, Cong G, Qin Y, Pfadler A, Qian Z, Zhou J, Li J, and Cui B Cardinality estimation in dbms: a comprehensive benchmark evaluation PVLDB 2021 15 4 752-765

Digital Library

[18]

Hester, T., Vecerik, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., Horgan, D., Quan, J., Sendonaris, A., Osband, I., et al.: Deep q-learning from demonstrations. In: AAAI, pp. 3223–3230 (2018)

[19]

Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., Binnig, C.: Deepdb: learn from data, not from queries!. pp. 992–1005 (2020)

[20]

Johannes, F., Eyke, H.: Preference Learning. Preference Learning (2011)

[21]

Kanet JJ and Sridharan V Scheduling with inserted idle time: problem taxonomy and literature review Oper. Res. 2000 48 1 99-110

Digital Library

[22]

Karatzoglou, A., Baltrunas, L., Shi, Y.: Learning to rank for recommender systems. In: RecSys, pp. 493–494. ACM (2013)

[23]

Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., Kemper, A.: Learned cardinalities: Estimating correlated joins with deep learning. In: CIDR (2019)

[24]

Krishnan, S., Yang, Z., Goldberg, K., Hellerstein, J., Stoica, I.: Learning to optimize join queries with deep reinforcement learning. arXiv:1808.03196 (2018)

[25]

Leis V, Gubichev A, Mirchev A, Boncz P, Kemper A, and Neumann T How good are query optimizers, really? PVLDB 2015 9 3 204-215

Digital Library

[26]

Lero on postgresql. https://github.com/Blondig/Lero-on-PostgreSQL (2022)

[27]

Lero on spark sql. https://github.com/Blondig/Lero-on-Spark (2023)

[28]

Li, J., König, A.C., Narasayya, V., Chaudhuri, S.: Robust estimation of resource consumption for sql queries using statistical techniques. arXiv:1208.0278 (2012)

[29]

Li, P., Wu, Q., Burges, C.: Mcrank: Learning to rank using multiple classification and gradient boosting. Advances in neural information processing systems 20 (2007)

[30]

Liu J, Dong W, Zhou Q, and Li D Fauce: fast and accurate deep ensembles with uncertainty for cardinality estimation PVLDB 2021 14 11 1950-1963

Digital Library

[31]

Liu TY Learning to rank for information retrieval Found. Trends Inf. Retr. 2009 3 3 225-331

Digital Library

[32]

Liu, W., Liu, Q., Tang, R., Chen, J., He, X., Heng, P.A.: Personalized re-ranking with item relationships for e-commerce. In: CIKM, pp. 925–934 (2020)

[33]

Marcus, R.: Bao appendix. https://rmarcus.info/appendix.html (2020)

[34]

Marcus, R.: Github repository: Bao for postgresql. https://github.com/learnedsystems/BaoForPostgreSQL (2020)

[35]

Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., Kraska, T.: Bao: Making learned query optimization practical. In: SIGMOD, pp. 1275–1288 (2021)

[36]

Marcus R, Negi P, Mao H, Zhang C, Alizadeh M, Kraska T, Papaemmanouil O, and Tatbul N Neo: A learned query optimizer PVLDB 2019 12 11 1705-1718

Digital Library

[37]

Marcus R and Papaemmanouil O Plan-structured deep neural network models for query performance prediction PVLDB 2019 12 11 1733-1746

Digital Library

[38]

Moerkotte G, Neumann T, and Steidl G Preventing bad plans by bounding the impact of cardinality estimation errors PVLDB 2009 2 1 982-993

Digital Library

[39]

Mou, L., Li, G., Zhang, L., Wang, T., Jin, Z.: Convolutional neural networks over tree structures for programming language processing. In: AAAI, pp. 1287–1293 (2016)

[40]

Negi, P., Interlandi, M., Marcus, R., Alizadeh, M., Kraska, T., Friedman, M., Jindal, A.: Steering query optimizers: a practical take on big data workloads. In: SIGMOD, pp. 2557–2569 (2021)

[41]

Pang, L., Xu, J., Ai, Q., Lan, Y., Cheng, X., Wen, J.: Setrank: Learning a permutation-invariant ranking model for information retrieval. In: SIGIR, pp. 499–508 (2020)

[42]

Pilotscope. https://github.com/alibaba/pilotscope (2024)

[43]

Query Planning. https://www.postgresql.org/docs/current/runtime-config-query.html (2023)

[44]

Reddy, N., Haritsa, J.R.: Analyzing plan diagrams of database query optimizers. In: PVLDB, pp. 1228–1240 (2005)

[45]

Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Readings in Artificial Intelligence and Databases, pp. 511–522. Elsevier (1989)

[46]

Sun J and Li G An end-to-end learning-based cost estimator PVLDB 2019 13 3 307-319

Digital Library

[47]

Swezey R, Grover A, Charron B, and Ermon S Pirank: Scalable learning to rank via differentiable sorting Adv. Neural Inf. Process. Syst. 2021 34 21644-21654

[48]

Thompson WR On the likelihood that one unknown probability exceeds another in view of the evidence of two samples Biometrika 1933 25 3–4 285-294

[49]

Trummer, I., Wang, J., Maram, D., Moseley, S., Jo, S., Antonakakis, J.: Skinnerdb: Regret-bounded query evaluation via reinforcement learning. In: SIGMOD, pp. 1153–1170 (2019)

[50]

Tzoumas K, Deshpande A, and Jensen CS Lightweight graphical models for selectivity estimation without independence assumptions PVLDB 2011 4 11 852-863

Digital Library

[51]

Valentin, G., Zuliani, M., Zilio, D.C., Lohman, G.M., Skelley, A.: DB2 advisor: An optimizer smart enough to recommend its own indexes. In: ICDE, pp. 101–110 (2000)

[52]

Vargas, S., Castells, P.: Rank and relevance in novelty and diversity metrics for recommender systems. In: Proceedings of the fifth ACM conference on Recommender systems, pp. 109–116 (2011)

[53]

Wang J, Trummer I, and Basu D UDO: universal database optimization using reinforcement learning PVLDB 2021 14 13 3402-3414

Digital Library

[54]

Wang X, Qu C, Wu W, Wang J, and Zhou Q Are we ready for learned cardinality estimation? PVLDB 2021 14 9 1640-1654

Digital Library

[55]

Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüs, H., Naughton, J.F.: Predicting query execution time: Are optimizer cost models really unusable? In: ICDE, pp. 1081–1092 (2013)

[56]

Wu, Z., Shaikhha, A.: Bayescard: A unified bayesian framework for cardinality estimation. arXiv:2012.14743 (2020)

[57]

Yang, Z., Chiang, W., Luan, S., Mittal, G., Luo, M., Stoica, I.: Balsa: Learning a query optimizer without expert demonstrations. In: SIGMOD, pp. 931–944 (2022)

[58]

Yang Z, Kamsetty A, Luan S, Liang E, Duan Y, Chen X, and Stoica I Neurocard: One cardinality estimator for all tables PVLDB 2021 14 1 61-73

[59]

Yang Z, Liang E, Kamsetty A, Wu C, Duan Y, Chen X, Abbeel P, Hellerstein JM, Krishnan S, and Stoica I Deep unsupervised cardinality estimation PVLDB 2019 13 3 279-292

Digital Library

[60]

Yu, X., Li, G., Chai, C., Tang, N.: Reinforcement learning with tree-lstm for join order selection. In: ICDE, pp. 1297–1308 (2020)

[61]

Zhang X, Chang Z, Wu H, Li Y, Chen J, Tan J, Li F, and Cui B A unified and efficient coordinating framework for autonomous DBMS tuning Proc. ACM Manag. Data 2023 1 2 1-26

Digital Library

[62]

Zhi Kang, J.K., Tan, S.Y., Cheng, F., Sun, S., He, B.: Efficient deep learning pipelines for accurate cost estimations over large scale query workload. In: SIGMOD, pp. 1014–1022 (2021)

[63]

Zhou, L.: A survey on contextual multi-armed bandits. arXiv:1508.03326 (2015)

[64]

Zhou, X., Sun, J., Li, G., Feng, J.: Query performance prediction for concurrent queries using graph embedding. PVLDB 13(9), 1416–1428 (2020)

[65]

Zhu R, Chen W, Ding B, Chen X, Pfadler A, Wu Z, and Zhou J Lero: A learning-to-rank query optimizer PVLDB 2023 16 6 1466-1479

Digital Library

[66]

Zhu R, Weng L, Wei W, Wu D, Peng J, Wang Y, Ding B, Lian D, Zheng B, and Zhou J Pilotscope: Steering databases with machine learning drivers PVLDB 2024 17 5 980-993

Digital Library

[67]

Zhu, R., Wu, Z., Chai, C., Pfadler, A., Ding, B., Li, G., Zhou, J.: Learned query optimizer: At the forefront of ai-driven databases. In: EDBT, pp. 1–4 (2022)

[68]

Zhu R, Wu Z, Han Y, Zeng K, Pfadler A, Qian Z, Zhou J, and Cui B Flat: Fast, lightweight and accurate method for cardinality estimation PVLDB 2021 14 9 1489-1502

Digital Library

Index Terms

Lero: applying learning-to-rank in query optimizer

Index terms have been assigned to the content through auto-classification.

Recommendations

Lero: A Learning-to-Rank Query Optimizer

A recent line of works apply machine learning techniques to assist or rebuild cost-based query optimizers in DBMS. While exhibiting superiority in some benchmarks, their deficiencies, e.g., unstable performance, high training cost, and slow model ...
Learned Query Optimizer: What is New and What is Next
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data

In recent times, learned query optimizer has becoming a hot research topic in learned databases. It serves as the most suitable experimental plots for utilizing numerous machine-learning techniques and exhibits its superiority with enough evidence. In ...
A Learned Query Optimizer for Spatial Join
SIGSPATIAL '21: Proceedings of the 29th International Conference on Advances in Geographic Information Systems

The importance and complexity of spatial join resulted in many join algorithms, some of which run on big-data platforms such as Hadoop and Spark. This paper proposes the first machine-learning-based query optimizer for spatial join operation which can ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases

The VLDB Journal — The International Journal on Very Large Data Bases Volume 33, Issue 5

Sep 2024

533 pages

Issue’s Table of Contents

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 25 April 2024

Accepted: 15 March 2024

Revision received: 08 February 2024

Received: 28 October 2023

Author Tags

Qualifiers

Research-article

Funding Sources

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 18 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents