[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

On the Caching Schemes to Speed Up Program Reduction

Published: 24 November 2023 Publication History

Abstract

Program reduction is a highly practical, widely demanded technique to help debug language tools, such as compilers, interpreters and debuggers. Given a program P that exhibits a property ψ, conceptually, program reduction iteratively applies various program transformations to generate a vast number of variants from P by deleting certain tokens and returns the minimal variant preserving ψ as the result.
A program reduction process inevitably generates duplicate variants, and the number of them can be significant. Our study reveals that on average 61.8% and 24.3% of the generated variants in two representative program reducers HDD and Perses, respectively, are duplicates. Checking them against ψ is thus redundant and unnecessary, which wastes time and computation resources. Although it seems that simply caching the generated variants can avoid redundant property tests, such a trivial method is impractical in the real world due to the significant memory footprint. Therefore, a memory-efficient caching scheme for program reduction is in great demand.
This study is the first effort to conduct a systematic, extensive analysis of memory-efficient caching schemes for program reduction. We first propose to use two well-known compression methods, ZIP and SHA, to compress the generated variants before they are stored in the cache. Furthermore, our keen understanding on the program reduction process motivates us to propose a novel, domain-specific, both memory and computation-efficient caching scheme, Refreshable Compact Caching (RCC). Our key insight is two-fold: ① by leveraging the correlation between variants and the original program P, we losslessly encode each variant into an equivalent, compact, canonical representation; ② periodically, stale cache entries, which will never be accessed, are timely removed to minimize the memory footprint over time.
Our extensive evaluation on 31 real-world C compiler bugs demonstrates that caching schemes help avoid issuing redundant queries by 61.8% and 24.3% in HDD and Perses, respectively; correspondingly, the runtime performance is notably boosted by 22.8% and 18.2%. With regard to the memory efficiency, all three methods use less memory than the state-of-the-art string-based scheme STR. Specifically, ZIP and SHA cut down the memory footprint by more than 80% and 90% in both Perses and HDD compared to STR; moreover, the highly-scalable, domain-specific RCC dominates peer schemes, and outperforms the SHA by 96.4% and 91.74% in HDD and Perses, respectively.

References

[1]
Dimitris Andreour. 2015. ObjectExplorer. Retrieved May 29, 2023 from https://github.com/DimitrisAndreou/memory-measurer
[2]
Antoine Balestrat. 2012. CCG: A Random C Code Generator. Retrieved May 29, 2023 from https://github.com/Merkil/ccg/
[3]
David W. Binkley, Nicolas Gold, Mark Harman, Syed S. Islam, Jens Krinke, and Shin Yoo. 2014. ORBS: Language-independent program slicing. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (FSE-22). Shing-Chi Cheung, Alessandro Orso, and Margaret-Anne D. Storey (Eds.), ACM, 109–120. DOI:
[4]
Bobby R. Bruce, Tianyi Zhang, Jaspreet Arora, Guoqing Harry Xu, and Miryung Kim. 2020. JShrink: In-depth investigation into debloating modern Java applications. In Proceedings of the ESEC/FSE’20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event. Prem Devanbu, Myra B. Cohen, and Thomas Zimmermann (Eds.), ACM, 135–146. DOI:
[5]
Federico Brunero and Petros Elia. 2023. Fundamental limits of combinatorial multi-access caching. IEEE Transactions on Information Theory 69, 2 (2023), 1037–1056. DOI:
[6]
Darius Buntinas, Brice Goglin, David Goodell, Guillaume Mercier, and Stéphanie Moreaud. 2009. Cache-efficient, intranode, large-message MPI communication with MPICH2-Nemesis. In Proceedings of the ICPP 2009, International Conference on Parallel Processing. IEEE Computer Society, 462–469. DOI:
[7]
Yang Chen, Alex Groce, Chaoqiang Zhang, Weng-Keen Wong, Xiaoli Z. Fern, Eric Eide, and John Regehr. 2013. Taming compiler fuzzers. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI’13. Hans-Juergen Boehm and Cormac Flanagan (Eds.), ACM, 197–208. DOI:
[8]
Hong Tai Chou and David J. DeWitt. 1986. An evaluation of buffer management strategies for relational database systems. Algorithmica 1, 1 (1986), 311–336. DOI:
[9]
Shaul Dar, Michael J. Franklin, Björn Þór Jónsson, Divesh Srivastava, and Michael Tan. 1996. Semantic data caching and replacement. In VLDB’96, Proceedings of 22th International Conference on Very Large Data Bases. T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, and Nandlal L. Sarda (Eds.), Morgan Kaufmann, 330–341. Retrieved from http://www.vldb.org/conf/1996/P330.PDF
[10]
L. Peter Deutsch and Jean-loup Gailly. 1996. ZLIB Compressed Data Format Specification version 3.3 (RFC’50), RFC Editor, 11 pages. https://www.rfc-editor.org/info/rfc1950
[11]
Alastair Donaldson and David MacIver. 2021. Test Case Reduction: Beyond Bugs. Retrieved May 29, 2023 from https://blog.sigplan.org/2021/05/25/test-case-reduction-beyond-bugs
[12]
GCC. 2017. A Guide to Testcase Reduction. Retrieved May 29, 2023 from https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction
[13]
Mark Adler Greg Roelofs. 1996. A Massively Spiffy Yet Delicately Unobtrusive Compression Library. Retrieved Jul 1, 2023 from https://zlib.net
[14]
Shay Gueron, Simon Johnson, and Jesse Walker. 2011. SHA-512/256. In Proceedings of the 8th International Conference on Information Technology: New Generations, ITNG 2011.Shahram Latifi (Ed.), IEEE Computer Society, 354–358. DOI:
[15]
Jim Handy. 1998. The Cache Memory Book (2nd Ed.): The Authoritative Reference on Cache Design. Academic Press, Inc.
[16]
Kihong Heo, Woosuk Lee, Pardis Pashakhanloo, and Mayur Naik. 2018. Effective program debloating via reinforcement learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018. David Lie, Mohammad Mannan, Michael Backes, and XiaoFeng Wang (Eds.), ACM, 380–394. DOI:
[17]
Satia Herfert, Jibesh Patra, and Michael Pradel. 2017. Automatically reducing tree-structured test inputs. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017. Grigore Rosu, Massimiliano Di Penta, and Tien N. Nguyen (Eds.), IEEE Computer Society, 861–871. DOI:
[18]
Renáta Hodován and Ákos Kiss. 2016. Modernizing hierarchical delta debugging. In Proceedings of the 7th International Workshop on Automating Test Case Design, Selection, and Evaluation, A-TEST@SIGSOFT FSE 2016. Tanja E. J. Vos, Sigrid Eldh, and Wishnu Prasetya (Eds.), ACM, 31–37. DOI:
[19]
Renáta Hodován, Ákos Kiss, and Tibor Gyimóthy. 2017. Coarse hierarchical delta debugging. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017. IEEE Computer Society, 194–203. DOI:
[20]
Renáta Hodován, Ákos Kiss, and Tibor Gyimóthy. 2017. Tree preprocessing and test outcome caching for efficient hierarchical delta debugging. In 12th IEEE/ACM International Workshop on Automation of Software Testing (AST@ICSE’17, Buenos Aires, Argentina, May 20-21, 2017), IEEE Computer Society, 23–29. DOI:
[21]
Theodore Johnson and Dennis E. Shasha. 1994. 2Q: A low overhead high performance buffer management replacement algorithm. In VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases. Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo (Eds.), Morgan Kaufmann, 439–450. Retrieved from http://www.vldb.org/conf/1994/P439.PDF
[22]
Christian Gram Kalhauge and Jens Palsberg. 2019. Binary reduction of dependency graphs. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019.Marlon Dumas, Dietmar Pfahl, Sven Apel, and Alessandra Russo (Eds.), ACM, 556–566. DOI:
[23]
Christian Gram Kalhauge and Jens Palsberg. 2021. Logical bytecode reduction. In Proceedings of the PLDI’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event. Stephen N. Freund and Eran Yahav (Eds.), ACM, 1003–1016. DOI:
[24]
J. Kelsey, S. Change, and R. Perlner. 2016. SHA-3 derived functions: CSHAKE, KMAC, TupleHash and ParallelHash.NISTSpecialPublication 800 (2016), 185 pages. https://www.nist.gov/publications/sha-3-derived-functions-cshake-kmac-tuplehash-and-parallelhash
[25]
Ákos Kiss, Renáta Hodován, and Tibor Gyimóthy. 2018. HDDr: A recursive variant of the hierarchical Delta debugging algorithm. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation, A-TEST@SIGSOFT FSE 2018. Wishnu Prasetya, Tanja E. J. Vos, and Sinem Getir (Eds.), ACM, 16–22. DOI:
[26]
Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI’14.Michael F. P. O’Boyle and Keshav Pingali (Eds.), ACM, 216–226. DOI:
[27]
Vu Le, Chengnian Sun, and Zhendong Su. 2015. Finding deep compiler bugs via guided stochastic program mutation. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015. Jonathan Aldrich and Patrick Eugster (Eds.), ACM, 386–399. DOI:
[28]
Bastien Lecoeur, Hasan Mohsin, and Alastair F. Donaldson. 2023. Program reconditioning: Avoiding undefined behaviour when finding and reducing compiler bugs. Proceedings of the ACM on Programming Languages 7, PLDI (2023), 25 pages. DOI:
[29]
Christopher Lidbury, Andrei Lascu, Nathan Chong, and Alastair F. Donaldson. 2015. Many-core compiler fuzzing. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation.David Grove and Stephen M. Blackburn (Eds.), ACM, 65–76. DOI:
[30]
Ming Liu, Liang Luo, Jacob Nelson, Luis Ceze, Arvind Krishnamurthy, and Kishore Atreya. 2017. IncBricks: Toward in-network computation with an in-network cache. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2017. Yunji Chen, Olivier Temam, and John Carter (Eds.), ACM, 795–809. DOI:
[31]
Vsevolod Livinskii, Dmitry Babokin, and John Regehr. 2020. Random testing for C and C++ compilers with YARPGen. Proceedings of the ACM on Programming Languages 4, OOPSLA (2020), 196:1–196:25. DOI:
[32]
LLVM. 2017. How to Submit an LLVM Bug Report. Retrieved Mar 20, 2023 from https://llvm.org/docs/HowToSubmitABug.html
[33]
LLVM/Clang. 2022. Clang Documentation – LibTooling. Retrieved May 29, 2023 from https://clang.llvm.org/docs/LibTooling.html
[34]
Ghassan Misherghi and Zhendong Su. 2006. HDD: Hierarchical delta debugging. In Proceedings of the 28th International Conference on Software Engineering (ICSE 2006). Leon J. Osterweil, H. Dieter Rombach, and Mary Lou Soffa (Eds.), ACM, 142–151. DOI:
[35]
Elizabeth J. O’Neil, Patrick E. O’Neil, and Gerhard Weikum. 1993. The LRU-K page replacement algorithm for database disk buffering. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’93 Washington, DC, USA, May 26-28, 1993), Peter Buneman and Sushil Jajodia (Eds.). ACM Press, 297–306.
[36]
John Regehr. 2015. Reducers are Fuzzers – EMBEDDED IN ACADEMIA. Retrieved Jul 1, 2023 from https://blog.regehr.org/archives/1284
[37]
John Regehr. 2016. [creduce-dev] cache. Retrieved May 29, 2023 from http://www.flux.utah.edu/listarchives/creduce-dev/msg00284.html
[38]
John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. 2012. Test-case reduction for C compiler bugs. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI’12. Jan Vitek, Haibo Lin, and Frank Tip (Eds.), ACM, 335–346. DOI:
[39]
Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, and David Hough. 2013. Precimonious: Tuning assistant for floating-point precision. In SC’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. IEEE, 1–12.
[40]
Mozilla Security. 2008. Lithium: Line-Based Testcase Reducer. Retrieved May 29, 2023 from https://github.com/MozillaSecurity/lithium
[41]
Chengnian Sun, Vu Le, and Zhendong Su. 2016. Finding compiler bugs via live code mutation. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016. Eelco Visser and Yannis Smaragdakis (Eds.), ACM, 849–863. DOI:
[42]
Chengnian Sun, Vu Le, Qirun Zhang, and Zhendong Su. 2016. Toward understanding compiler bugs in GCC and LLVM. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016. Andreas Zeller and Abhik Roychoudhury (Eds.), ACM, 294–305. DOI:
[43]
Chengnian Sun, Yuanbo Li, Qirun Zhang, Tianxiao Gu, and Zhendong Su. 2018. Perses: Syntax-guided program reduction. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018. Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.), ACM, 361–371. DOI:
[44]
Yutian Tang, Hao Zhou, Xiapu Luo, Ting Chen, Haoyu Wang, Zhou Xu, and Yan Cai. 2022. XDebloat: Towards automated feature-oriented app debloating. IEEE Transactions on Software Engineering 48, 11 (2022), 4501–4520. DOI:
[45]
Jia Le Tian, Mengxiao Zhang, Zhenyang Xu, Yongqiang Tian, Yiwen Dong, and Chengnian Sun. 2023. Ad hoc syntax-guided program reduction. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023.Satish Chandra, Kelly Blincoe, and Paolo Tonella (Eds.), ACM, New York, NY.
[46]
Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Chengnian Sun, and Shing-Chi Cheung. 2023. Revisiting the evaluation of deep learning-based compiler testing. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. Edith Elkind (Ed.), ijcai.org.
[47]
Frank Tip, Peter F. Sweeney, Chris Laffra, Aldo Eisma, and David Streeter. 2002. Practical extraction techniques for Java. ACM Transactions on Programming Languages and Systems 24, 6 (2002), 625–666. DOI:
[48]
Guancheng Wang, Ruobing Shen, Junjie Chen, Yingfei Xiong, and Lu Zhang. 2021. Probabilistic delta debugging. In Proceedings of the ESEC/FSE’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Diomidis Spinellis, Georgios Gousios, Marsha Chechik, and Massimiliano Di Penta (Eds.), ACM, 881–892. DOI:
[49]
Theodore Luo Wang, Yongqiang Tian, Yiwen Dong, Zhenyang Xu, and Chengnian Sun. 2023. Compilation consistency modulo debug information. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2023. Tor M. Aamodt, Natalie D. Enright Jerger, and Michael M. Swift (Eds.), ACM, 146–158. DOI:
[50]
Zhenyang Xu, Yongqiang Tian, Mengxiao Zhang, Gaosen Zhao, Yu Jiang, and Chengnian Sun. 2023. Pushing the limit of 1-minimality of language-agnostic program reduction. Proceedings of the ACM on Programming Languages 7, OOPSLA1 (2023), 29 pages. DOI:
[51]
Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011. Mary W. Hall and David A. Padua (Eds.), ACM, 283–294. DOI:
[52]
Andreas Zeller and Ralf Hildebrandt. 2002. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering 28, 2 (2002), 183–200. DOI:
[53]
Mengxiao Zhang, Zhenyang Xu, Yongqiang Tian, Yu Jiang, and Chengnian Sun. 2023. PPR: Pairwise program reduction. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). Satish Chandra, Kelly Blincoe, and Paolo Tonella (Eds.), ACM, New York, NY.

Cited By

View all
  • (2024)T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical SyntaxACM Transactions on Software Engineering and Methodology10.1145/3690631Online publication date: 30-Aug-2024
  • (2024)Towards Understanding the Bugs in Solidity CompilerProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680362(1312-1324)Online publication date: 11-Sep-2024
  • (2024)LPR: Large Language Models-Aided Program ReductionProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3652126(261-273)Online publication date: 11-Sep-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Software Engineering and Methodology
ACM Transactions on Software Engineering and Methodology  Volume 33, Issue 1
January 2024
933 pages
EISSN:1557-7392
DOI:10.1145/3613536
  • Editor:
  • Mauro Pezzè
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 November 2023
Online AM: 05 September 2023
Accepted: 24 July 2023
Revised: 07 June 2023
Received: 05 October 2022
Published in TOSEM Volume 33, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Program reduction
  2. delta debugging
  3. debugging

Qualifiers

  • Research-article

Funding Sources

  • Natural Sciences and Engineering Research Council of Canada (NSERC) through the Discovery Grant, a project under Waterloo-Huawei Joint Innovation Lab, and CFI-JELF Project

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)221
  • Downloads (Last 6 weeks)18
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical SyntaxACM Transactions on Software Engineering and Methodology10.1145/3690631Online publication date: 30-Aug-2024
  • (2024)Towards Understanding the Bugs in Solidity CompilerProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680362(1312-1324)Online publication date: 11-Sep-2024
  • (2024)LPR: Large Language Models-Aided Program ReductionProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3652126(261-273)Online publication date: 11-Sep-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full Text

View this article in Full Text.

Full Text

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media