[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3368089.3409710acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
research-article

Detecting optimization bugs in database engines via non-optimizing reference engine construction

Published: 08 November 2020 Publication History

Abstract

Database Management Systems (DBMS) are used ubiquitously. To efficiently access data, they apply sophisticated optimizations. Incorrect optimizations can result in logic bugs, which cause a query to compute an incorrect result set. We propose Non-Optimizing Reference Engine Construction (NoREC), a fully-automatic approach to detect optimization bugs in DBMS. Conceptually, this approach aims to evaluate a query by an optimizing and a non-optimizing version of a DBMS, to then detect differences in their returned result set, which would indicate a bug in the DBMS. Obtaining a non-optimizing version of a DBMS is challenging, because DBMS typically provide limited control over optimizations. Our core insight is that a given, potentially randomly-generated optimized query can be rewritten to one that the DBMS cannot optimize. Evaluating this unoptimized query effectively corresponds to a non-optimizing reference engine executing the original query. We evaluated NoREC in an extensive testing campaign on four widely-used DBMS, namely PostgreSQL, MariaDB, SQLite, and CockroachDB. We found 159 previously unknown bugs in the latest versions of these systems, 141 of which have been fixed by the developers. Of these, 51 were optimization bugs, while the remaining were error and crash bugs. Our results suggest that NoREC is effective, general and requires little implementation effort, which makes the technique widely applicable in practice.

Supplementary Material

Auxiliary Teaser Video (fse20main-p315-p-teaser.mp4)
Main Video
Auxiliary Presentation Video (fse20main-p315-p-video.mp4)
Main Video

References

[1]
Shadi Abdul Khalek and Sarfraz Khurshid. 2010. Automated SQL Query Generation for Systematic Testing of Database Engines. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE '10). ACM, New York, NY, USA, 329-332. https://doi.org/10.1145/1858996.1859063
[2]
Hardik Bati, Leo Giakoumakis, Steve Herbert, and Aleksandras Surna. 2007. A Genetic Approach for Random Testing of Database Systems. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB '07). VLDB Endowment, 1243-1251.
[3]
Carsten Binnig, Donald Kossmann, and Eric Lo. 2007. Reverse Query Processing. Proceedings-International Conference on Data Engineering, 506-515. https: //doi.org/10.1109/ICDE. 2007.367896
[4]
Carsten Binnig, Donald Kossmann, Eric Lo, and M. Tamer Özsu. 2007. QAGen: Generating Query-Aware Test Databases. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD '07). Association for Computing Machinery, New York, NY, USA, 341-352. https://doi.org/10.1145/ 1247480.1247520
[5]
Carl Friedrich Bolz, Darya Kurilova, and Laurence Tratt. 2016. Making an Embedded DBMS JIT-friendly. In 30th European Conference on Object-Oriented Programming, ECOOP 2016, July 18-22, 2016, Rome, Italy. 4 : 1-4 : 24. https: //doi.org/10.4230/LIPIcs.ECOOP. 2016.4
[6]
Robert Brummayer and Armin Biere. 2009. Fuzzing and Delta-Debugging SMT Solvers. In Proceedings of the 7th International Workshop on Satisfiability Modulo Theories (SMT '09). Association for Computing Machinery, New York, NY, USA, 1-5. https://doi.org/10.1145/1670412.1670413
[7]
Nicolas Bruno and Surajit Chaudhuri. 2005. Flexible Database Generators. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB '05). VLDB Endowment, 1097-1107.
[8]
Nicolas Bruno, Surajit Chaudhuri, and Ravi Ramamurthy. 2009. Power Hints for Query Optimization. In Proceedings of the 2009 IEEE International Conference on Data Engineering (ICDE '09). IEEE Computer Society, USA, 469-480. https: //doi.org/10.1109/ICDE. 2009.68
[9]
Nicolas Bruno, Surajit Chaudhuri, and Dilys Thomas. 2006. Generating Queries with Cardinality Constraints for DBMS Testing. IEEE Trans. on Knowl. and Data Eng. 18, 12 (Dec. 2006 ), 1721-1725. https://doi.org/10.1109/TKDE. 2006.190
[10]
Donald D. Chamberlin and Raymond F. Boyce. 1974. SEQUEL: A Structured English Query Language. In Proceedings of the 1974 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control (SIGFIDET '74). ACM, New York, NY, USA, 249-264. https://doi.org/10.1145/800296.811515
[11]
Tsong Y Chen, Shing C Cheung, and Shiu Ming Yiu. 1998. Metamorphic testing: a new approach for generating next test cases. Technical Report. Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong.
[12]
Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. ACM Comput. Surv. 51, 1, Article Article 4 (Jan. 2018 ), 27 pages. https://doi.org/10.1145/3143561
[13]
And Clover. 2019. Bug submission: left join filter on negated expression including NOTNULL. https://www.mail-archive.com/[email protected]/ msg117434.html
[14]
E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 ( June 1970 ), 377-387. https://doi.org/10.1145/362384.362685
[15]
E. F. Codd. 1972. Relational Completeness of Data Base Sublanguages. IBM Corporation.
[16]
Pascal Cuoq, Benjamin Monate, Anne Pacalet, Virgile Prevosto, John Regehr, Boris Yakobowski, and Xuejun Yang. 2012. Testing Static Analyzers with Randomly Generated Programs. In Proceedings of the 4th International Conference on NASA Formal Methods (NFM'12). Springer-Verlag, Berlin, Heidelberg, 120-125. https: //doi.org/10.1007/978-3-642-28891-3_12
[17]
DB-Engines. 2019. DB-Engines Ranking ( July 2019 ). https://db-engines.com/en/ ranking
[18]
Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya. 2018. Plan Stitch: Harnessing the Best of Many Plans. Proc. VLDB Endow. 11, 10 ( June 2018 ), 1123-1136. https://doi.org/10.14778/3231751.3231761
[19]
Ramez Elmasri and Sham Navathe. 2017. Fundamentals of database systems. Vol. 7. Pearson.
[20]
Leo Giakoumakis and César A Galindo-Legaria. 2008. Testing SQL Server's Query Optimizer: Challenges, Techniques and Experiences. IEEE Data Eng. Bull. 31, 1 ( 2008 ), 36-43.
[21]
Torsten Grabs, Steve Herbert, and Xin (Shin) Zhang. 2008. Testing Challenges for Extending SQL Server's Query Processor: A Case Study. In Proceedings of the 1st International Workshop on Testing Database Systems (DBTest '08). Association for Computing Machinery, New York, NY, USA, Article Article 2, 6 pages. https: //doi.org/10.1145/1385269.1385272
[22]
Goetz Graefe. 1993. Query evaluation techniques for large databases. ACM Computing Surveys (CSUR) 25, 2 ( 1993 ), 73-169.
[23]
Jim Gray, Prakash Sundaresan, Susanne Englert, Ken Baclawski, and Peter J. Weinberger. 1994. Quickly Generating Billion-Record Synthetic Databases. SIGMOD Rec. 23, 2 (May 1994 ), 243-252. https://doi.org/10.1145/191843.191886
[24]
Zhongxian Gu, Mohamed A. Soliman, and Florian M. Waas. 2012. Testing the Accuracy of Query Optimizers. In Proceedings of the Fifth International Workshop on Testing Database Systems (DBTest '12). ACM, New York, NY, USA, Article 11, 6 pages. https://doi.org/10.1145/2304510.2304525
[25]
Antonin Guttman. 1984. R-Trees: A Dynamic Index Structure for Spatial Searching. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD '84). Association for Computing Machinery, New York, NY, USA, 47-57. https://doi.org/10.1145/602259.602266
[26]
Kenneth Houkjaer, Kristian Torp, and Rico Wind. 2006. Simple and Realistic Data Generation. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06). VLDB Endowment, 1243-1246.
[27]
William E. Howden. 1978. Theoretical and Empirical Studies of Program Testing. In Proceedings of the 3rd International Conference on Software Engineering (ICSE '78). IEEE Press, Piscataway, NJ, USA, 305-311.
[28]
Matt Jibson. 2016. Testing Random, Valid SQL in CockroachDB. https://www. cockroachlabs.com/blog/testing-random-valid-sql-in-cockroachdb/
[29]
Matt Jibson. 2019. SQLsmith: Randomized SQL Testing in CockroachDB. https: //www.cockroachlabs.com/blog/sqlsmith-randomized-sql-testing/
[30]
Jinho Jung, Hong Hu, Joy Arulraj, Taesoo Kim, and Woonhak Kang. 2019. APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems. Proc. VLDB Endow. 13, 1 (Sept. 2019 ), 57-70. https: //doi.org/10.14778/3357377.3357382
[31]
Timotej Kapus and Cristian Cadar. 2017. Automatic Testing of Symbolic Execution Engines via Program Generation and Diferential Testing. In Proceedings of the 32Nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017 ). IEEE Press, Piscataway, NJ, USA, 590-600.
[32]
S. A. Khalek, B. Elkarablieh, Y. O. Laleye, and S. Khurshid. 2008. Query-Aware Test Generation Using a Relational Constraint Solver. In Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE '08). IEEE Computer Society, Washington, DC, USA, 238-247. https://doi.org/10. 1109/ASE. 2008.34
[33]
Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler Validation via Equivalence Modulo Inputs. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '14). ACM, New York, NY, USA, 216-226. https://doi.org/10.1145/2594291.2594334
[34]
Eric Lo, Carsten Binnig, Donald Kossmann, M. Tamer Özsu, and Wing-Kai Hon. 2010. A framework for testing DBMS features. The VLDB Journal 19, 2 ( 01 Apr 2010 ), 203-230. https://doi.org/10.1007/s00778-009-0157-y
[35]
Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. Proc. VLDB Endow. 12, 11 ( July 2019 ), 1705-1718. https: //doi.org/10.14778/3342263.3342644
[36]
William M McKeeman. 1998. Diferential testing for software. Digital Technical Journal 10, 1 ( 1998 ), 100-107.
[37]
Chaitanya Mishra, Nick Koudas, and Calisto Zuzarte. 2008. Generating Targeted Queries for Database Testing. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08). ACM, New York, NY, USA, 499-510. https://doi.org/10.1145/1376616.1376668
[38]
Andrea Neufeld, Guido Moerkotte, and Peter C. Lockemann. 1993. Generating Consistent Test Data: Restricting the Search Space by a Generator Formula. The VLDB Journal 2, 2 (April 1993 ), 173-214.
[39]
Thomas Neumann and Bernhard Radke. 2018. Adaptive Optimization of Very Large Join Queries. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery, New York, NY, USA, 677-692. https://doi.org/10.1145/3183713.3183733
[40]
Stack Overflow. 2019. Developer Survey Results 2019. https://insights. stackoverflow.com/survey/2019
[41]
Andrew Pavlo and Matthew Aslett. 2016. What's Really New with NewSQL? SIGMOD Rec. 45, 2 (Sept. 2016 ), 45-55. https://doi.org/10.1145/3003665.3003674
[42]
Meikel Poess and John M. Stephens. 2004. Generating Thousand Benchmark Queries in Seconds. In Proceedings of the Thirtieth International Conference on Very Large Data Bases-Volume 30 (VLDB '04). VLDB Endowment, 1045-1053.
[43]
John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. 2012. Test-Case Reduction for C Compiler Bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). Association for Computing Machinery, New York, NY, USA, 335-346. https://doi.org/10.1145/2254064.2254104
[44]
Manuel Rigger. 2019. LEFT JOIN in view malfunctions with NOTNULL. https: //www.sqlite.org/src/tktview?name=c31034044b
[45]
Manuel Rigger and Zhendong Su. 2020. ESEC/FSE 20 Artifact for "Detecting Optimization Bugs in Database Engines via Non-Optimizing Reference Engine Construction". https://doi.org/10.5281/zenodo.3947858
[46]
Manuel Rigger and Zhendong Su. 2020. Testing Database Engines via Pivoted Query Synthesis.
[47]
Sergio Segura and Zhi Quan Zhou. 2018. Metamorphic Testing 20 Years Later: A Hands-on Introduction. In Proceedings of the 40th International Conference on Software Engineering: Companion Proceeedings (ICSE '18). Association for Computing Machinery, New York, NY, USA, 538-539. https://doi.org/10.1145/ 3183440.3183468
[48]
P. Grifiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. 1979. Access Path Selection in a Relational Database Management System. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data (SIGMOD '79). Association for Computing Machinery, New York, NY, USA, 23-34. https://doi.org/10.1145/582095.582099
[49]
Andreas Seltenreich. 2019. SQLSmith. https://github.com/anse1/sqlsmith
[50]
Donald R Slutz. 1998. Massive stochastic testing of SQL. In VLDB, Vol. 98. 618-622.
[51]
SQLite3. 2020. Generated Columns. https://sqlite.org/gencol.html
[52]
SQLite3. 2020. How SQLite Is Tested. https://www.sqlite.org/testing.html
[53]
SQLite3. 2020. Most Widely Deployed and Used Database Engine. https: //www.sqlite.org/mostdeployed.html
[54]
SQLite3. 2020. The SQLite Query Optimizer Overview. https://www.sqlite.org/ optoverview.html
[55]
SQLite3. 2020. The Use Of assert() In SQLite. https://www.sqlite.org/assert.html
[56]
Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, Paul Bardea, Amruta Ranade, Ben Darnell, Bram Gruneir, Justin Jafray, Lucy Zhang, and Peter Mattis. 2020. CockroachDB: The Resilient Geo-Distributed SQL Database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 1493-1509. https://doi.org/10.1145/3318464. 3386134
[57]
Tencent Blade Team. 2019. Magellan 2.0. https://blade.tencent.com/magellan2/ index_en.html
[58]
Manasi Vartak, Venkatesh Raghavan, and Elke A. Rundensteiner. 2010. QRelX: Generating Meaningful Queries That Provide Cardinality Assurance. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD '10). Association for Computing Machinery, New York, NY, USA, 1215-1218. https://doi.org/10.1145/1807167.1807323
[59]
Marianne Winslett and Vanessa Braganholo. 2019. Richard Hipp Speaks Out on SQLite. SIGMOD Rec. 48, 2 (Dec. 2019 ), 39-46. https://doi.org/10.1145/3377330. 3377338
[60]
Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, and Sriram Rao. 2018. Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12, 3 (Nov. 2018 ), 210-222. https://doi.org/10.14778/3291264. 3291267
[61]
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11). ACM, New York, NY, USA, 283-294. https://doi.org/10.1145/1993498.1993532

Cited By

View all
  • (2024)WINGFUZZProceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference10.5555/3691992.3692022(479-492)Online publication date: 10-Jul-2024
  • (2024)Detecting logic bugs in database engines via equivalent expression transformationProceedings of the 18th USENIX Conference on Operating Systems Design and Implementation10.5555/3691938.3691982(821-835)Online publication date: 10-Jul-2024
  • (2024)Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database ConstructionProceedings of the VLDB Endowment10.14778/3659437.365944517:8(1884-1897)Online publication date: 1-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2020
1703 pages
ISBN:9781450370431
DOI:10.1145/3368089
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 November 2020

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. DBMS testing
  2. database testing
  3. query optimizer bugs
  4. test oracle

Qualifiers

  • Research-article

Conference

ESEC/FSE '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)137
  • Downloads (Last 6 weeks)11
Reflects downloads up to 18 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)WINGFUZZProceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference10.5555/3691992.3692022(479-492)Online publication date: 10-Jul-2024
  • (2024)Detecting logic bugs in database engines via equivalent expression transformationProceedings of the 18th USENIX Conference on Operating Systems Design and Implementation10.5555/3691938.3691982(821-835)Online publication date: 10-Jul-2024
  • (2024)Detecting Metadata-Related Logic Bugs in Database Systems via Raw Database ConstructionProceedings of the VLDB Endowment10.14778/3659437.365944517:8(1884-1897)Online publication date: 1-Apr-2024
  • (2024)Understanding and Reusing Test Suites Across Database SystemsProceedings of the ACM on Management of Data10.1145/36988292:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Finding Logic Bugs in Spatial Database Engines via Affine Equivalent InputsProceedings of the ACM on Management of Data10.1145/36988102:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Step-wise Execution of Data-Centric SystemsCompanion Proceedings of the 2024 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity10.1145/3689491.3691821(19-21)Online publication date: 20-Oct-2024
  • (2024)Keep It Simple: Testing Databases via Differential Query PlansProceedings of the ACM on Management of Data10.1145/36549912:3(1-26)Online publication date: 30-May-2024
  • (2024)Testing Gremlin-Based Graph Database Systems via Query DisassemblingProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680392(1695-1707)Online publication date: 11-Sep-2024
  • (2024)Testing Graph Database Systems with Graph-State Persistence OracleProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680311(666-677)Online publication date: 11-Sep-2024
  • (2024)Finding Cross-Rule Optimization Bugs in Datalog EnginesProceedings of the ACM on Programming Languages10.1145/36498158:OOPSLA1(110-136)Online publication date: 29-Apr-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media