More Web Proxy on the site http://driver.im/

research-article

Fuzzing MLIR Compiler Infrastructure via Operation Dependency Analysis

Authors:

Jianrong WangAuthors Info & Claims

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 1287 - 1299

https://doi.org/10.1145/3650212.3680360

Published: 11 September 2024 Publication History

Abstract

MLIR (Multi-Level Intermediate Representation) compiler infrastructure has gained widespread popularity in recent years. It introduces dialects to accommodate various levels of abstraction within the representation. Due to its fundamental role in compiler construction, it is critical to ensure its correctness. Recently, a grammar-based fuzzing technique (i.e., MLIRSmith) has been proposed for it and achieves notable effectiveness. However, MLIRSmith generates test programs in a random manner, which restricts the exploration of the input space, thereby limiting the overall fuzzing effectiveness. In this work, we propose a novel fuzzing technique, called MLIR. As complicated or uncommon data/control dependencies among various operations are often helpful to trigger MLIR bugs, it constructs the operation dependency graph for an MLIR program and defines the associated operation dependency coverage to guide the fuzzing process. To drive the fuzzing process towards increasing operation dependency coverage, MLIR then designs a set of dependency-targeted mutation rules. By applying MLIR to the latest revisions of the MLIR compiler infrastructure, it detected 63 previously unknown bugs, among which 38/48 bugs have been fixed/confirmed by developers.

References

[1]

2023. Flang. https://github.com/llvm/llvm-project/tree/main/flang

[2]

2023. gcov. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html

[3]

2023. IREE. https://openxla.github.io/iree/

[4]

2023. MLIRod. https://github.com/tju-chenyaosuo/MLIRod

[5]

2023. TOSA Dialect. https://mlir.llvm.org/docs/Dialects/TOSA/

[6]

2024. M. Zalewski. american fuzzy lop. http://lcamtuf.coredump.cx/afl

[7]

2024. MLIR Pass Documentation. https://mlir.llvm.org/docs/Passes

[8]

Lukas Bernhard, Tobias Scharnowski, Moritz Schloegel, Tim Blazytko, and Thorsten Holz. 2022. JIT-Picking: Differential Fuzzing of JavaScript Engines. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022, Heng Yin, Angelos Stavrou, Cas Cremers, and Elaine Shi (Eds.). ACM, 351–364.

Digital Library

[9]

Junjie Chen, Jibesh Patra, Michael Pradel, Yingfei Xiong, Hongyu Zhang, Dan Hao, and Lu Zhang. 2020. A survey of compiler testing. ACM Computing Surveys (CSUR), 53, 1 (2020), 1–36.

Digital Library

[10]

Junjie Chen and Chenyao Suo. 2022. Boosting compiler testing via compiler optimization exploration. ACM Transactions on Software Engineering and Methodology (TOSEM), 31, 4 (2022), 1–33.

Digital Library

[11]

Jeanne Ferrante, Karl J Ottenstein, and Joe D Warren. 1987. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS), 9, 3 (1987), 319–349.

Digital Library

[12]

Andrea Fioraldi, Dominik Maier, Heiko Eiß feldt, and Marc Heuse. 2020. $AFL++$: Combining incremental steps of fuzzing research. In 14th USENIX Workshop on Offensive Technologies (WOOT 20).

[13]

Tianchang Gao, Junjie Chen, Yingquan Zhao, Yuqun Zhang, and Lingming Zhang. 2023. Vectorizing Program Ingredients for Better JVM Testing. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023, Seattle, WA, USA, July 17-21, 2023, René Just and Gordon Fraser (Eds.). ACM, 526–537.

Digital Library

[14]

Harrison Green and Thanassis Avgerinos. 2022. GraphFuzz: library API fuzzing with lifetime-aware dataflow graphs. In Proceedings of the 44th International Conference on Software Engineering. 1070–1081.

Digital Library

[15]

Samuel Groß, Simon Koch, Lukas Bernhard, Thorsten Holz, and Martin Johns. 2023. FUZZILLI: Fuzzing for JavaScript JIT Compiler Vulnerabilities. In 30th Annual Network and Distributed System Security Symposium, NDSS 2023, San Diego, California, USA, February 27 - March 3, 2023. The Internet Society.

[16]

Yu Hao, Hang Zhang, Guoren Li, Xingyun Du, Zhiyun Qian, and Ardalan Amiri Sani. 2022. Demystifying the Dependency Challenge in Kernel Fuzzing. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 659–671.

[17]

Xiaoyu He, Xiaofei Xie, Yuekang Li, Jianwen Sun, Feng Li, Wei Zou, Yang Liu, Lei Yu, Jianhua Zhou, Wenchang Shi, and Wei Huo. 2021. SoFi: Reflection-Augmented Fuzzing for JavaScript Engines. In CCS ’21: 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15 - 19, 2021, Yongdae Kim, Jong Kim, Giovanni Vigna, and Elaine Shi (Eds.). ACM, 2229–2242.

[18]

Adrian Herrera, Hendra Gunadi, Shane Magrath, Michael Norrish, Mathias Payer, and Antony L. Hosking. 2021. Seed selection for successful fuzzing. In ISSTA ’21: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, Denmark, July 11-17, 2021, Cristian Cadar and Xiangyu Zhang (Eds.). ACM, 230–243. https://doi.org/10.1145/3460319.3464795

Digital Library

[19]

Christian Holler, Kim Herzig, and Andreas Zeller. 2012. Fuzzing with code fragments. In 21st USENIX Security Symposium (USENIX Security 12). 445–458.

[20]

He Jiang, Zhide Zhou, Zhilei Ren, Jingxuan Zhang, and Xiaochen Li. 2021. CTOS: Compiler testing for optimization sequences of LLVM. IEEE Transactions on Software Engineering, 48, 7 (2021), 2339–2358.

[21]

Jiajun Jiang, Yumeng Wang, Junjie Chen, Delin Lv, and Mengjiao Liu. 2023. Variable-Based Fault Localization via Enhanced Decision Tree. ACM Transactions on Software Engineering and Methodology, 33, 2 (2023), 1–32.

[22]

Jianfeng Jiang, Hui Xu, and Yangfan Zhou. 2021. RULF: Rust library fuzzing via API dependency graph traversal. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 581–592.

Digital Library

[23]

Kyungtae Kim, Dae R. Jeong, Chung Hwan Kim, Yeongjin Jang, Insik Shin, and Byoungyoung Lee. 2020. HFL: Hybrid Fuzzing on the Linux Kernel. In 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020. The Internet Society.

[24]

Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 2–14. https://doi.org/10.1109/CGO51591.2021.9370308

Digital Library

[25]

Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. ACM Sigplan Notices, 49, 6 (2014), 216–226.

Digital Library

[26]

Bastien Lecoeur, Hasan Mohsin, and Alastair F Donaldson. 2023. Program Reconditioning: Avoiding Undefined Behaviour When Finding and Reducing Compiler Bugs. Proceedings of the ACM on Programming Languages, 7, PLDI (2023), 1801–1825.

Digital Library

[27]

Suyoung Lee, HyungSeok Han, Sang Kil Cha, and Sooel Son. 2020. Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer. In 29th USENIX Security Symposium, USENIX Security 2020, August 12-14, 2020, Srdjan Capkun and Franziska Roesner (Eds.). USENIX Association, 2613–2630.

[28]

Caroline Lemieux and Koushik Sen. 2018. FairFuzz: a targeted mutation strategy for increasing greybox fuzz testing coverage. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, Marianne Huchard, Christian Kästner, and Gordon Fraser (Eds.). ACM, 475–485.

Digital Library

[29]

Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2021. Sysevr: A framework for using deep learning to detect software vulnerabilities. IEEE Transactions on Dependable and Secure Computing, 19, 4 (2021), 2244–2258.

[30]

Christopher Lidbury, Andrei Lascu, Nathan Chong, and Alastair F Donaldson. 2015. Many-core compiler fuzzing. ACM SIGPLAN Notices, 50, 6 (2015), 65–76.

Digital Library

[31]

Jiawei Liu, Jinkun Lin, Fabian Ruffy, Cheng Tan, Jinyang Li, Aurojit Panda, and Lingming Zhang. 2023. Nnsmith: Generating diverse and valid test cases for deep learning compilers. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 530–543.

Digital Library

[32]

Alessandro Mantovani, Andrea Fioraldi, and Davide Balzarotti. 2022. Fuzzing with data dependency information. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). 286–302.

[33]

William S Moses, Lorenzo Chelini, Ruizhe Zhao, and Oleksandr Zinenko. 2021. Polygeist: Raising C to polyhedral MLIR. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT). 45–59.

Digital Library

[34]

Shankara Pailoor, Andrew Aday, and Suman Jana. 2018. MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation. In 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, August 15-17, 2018, William Enck and Adrienne Porter Felt (Eds.). USENIX Association, 729–743.

[35]

Richard Schumi and Jun Sun. 2021. SpecTest: Specification-Based Compiler Testing. In Fundamental Approaches to Software Engineering - 24th International Conference, FASE 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, Esther Guerra and Mariëlle Stoelinga (Eds.) (Lecture Notes in Computer Science, Vol. 12649). Springer, 269–291. https://doi.org/10.1007/978-3-030-71500-7_14

Digital Library

[36]

Mayank Sharma, Pingshi Yu, and Alastair F Donaldson. 2023. RustSmith: Random Differential Compiler Testing for Rust. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1483–1486.

Digital Library

[37]

Qingchao Shen, Haoyang Ma, Junjie Chen, Yongqiang Tian, Shing-Chi Cheung, and Xiang Chen. 2021. A Comprehensive Study of Deep Learning Compiler Bugs. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021). Association for Computing Machinery, New York, NY, USA. 968–980. isbn:9781450385626 https://doi.org/10.1145/3468264.3468591

Digital Library

[38]

Nicolas Vasilache, Oleksandr Zinenko, Aart JC Bik, Mahesh Ravishankar, Thomas Raoux, Alexander Belyaev, Matthias Springer, Tobias Gysi, Diego Caballero, and Stephan Herhut. 2023. Structured Operations: Modular Design of Code Generators for Tensor Compilers. In International Workshop on Languages and Compilers for Parallel Computing. 141–156.

[39]

Haoyu Wang, Junjie Chen, Chuyue Xie, Shuang Liu, Zan Wang, Qingchao Shen, and Yingquan Zhao. 2023. MLIRSmith: Random Program Generation for Fuzzing MLIR Compiler Infrastructure. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering.

Digital Library

[40]

Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2017. Skyfire: Data-Driven Seed Generation for Fuzzing. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017. IEEE Computer Society, 579–594.

[41]

Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2019. Superion: grammar-aware greybox fuzzing. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, Joanne M. Atlee, Tevfik Bultan, and Jon Whittle (Eds.). IEEE / ACM, 724–735.

Digital Library

[42]

Junjie Wang, Zhiyi Zhang, Shuang Liu, Xiaoning Du, and Junjie Chen. 2023. FuzzJIT: Oracle-Enhanced Fuzzing for JavaScript Engine JIT Compiler. In 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023, Joseph A. Calandrino and Carmela Troncoso (Eds.). USENIX Association, 1865–1882.

[43]

Zan Wang, Yingquan Zhao, Shuang Liu, Jun Sun, Xiang Chen, and Huarui Lin. 2019. Map-coverage: A novel coverage criterion for testing thread-safe classes. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). 722–734.

Digital Library

[44]

Mingyuan Wu, Minghai Lu, Heming Cui, Junjie Chen, Yuqun Zhang, and Lingming Zhang. 2023. Jitfuzz: Coverage-guided fuzzing for jvm just-in-time compilers. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 56–68.

Digital Library

[45]

Mingyuan Wu, Minghai Lu, Heming Cui, Junjie Chen, Yuqun Zhang, and Lingming Zhang. 2023. JITfuzz: Coverage-guided Fuzzing for JVM Just-in-Time Compilers. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023. IEEE, 56–68.

[46]

Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. 2024. Fuzz4all: Universal fuzzing with large language models. Proc. IEEE/ACM ICSE.

Digital Library

[47]

Chen Yang, Junjie Chen, Jiajun Jiang, and Yuliang Huang. 2024. Dependency-aware code naturalness. Proceedings of the ACM on Programming Languages.

[48]

Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation. 283–294.

Digital Library

[49]

Hongbin Zhang, Mingjie Xing, Yanjun Wu, and Chen Zhao. 2023. Compiler Technologies in Deep Learning Co-Design: A Survey. Intelligent Computing.

[50]

Yingquan Zhao, Zan Wang, Junjie Chen, Mengdi Liu, Mingyuan Wu, Yuqun Zhang, and Lingming Zhang. 2022. History-driven test program synthesis for JVM testing. In Proceedings of the 44th International Conference on Software Engineering. 1133–1144.

Digital Library

[51]

Zhide Zhou, Zhilei Ren, Guojun Gao, and He Jiang. 2021. An empirical study of optimization bugs in GCC and LLVM. Journal of Systems and Software, 174 (2021), 110884.

Cited By

Yang CChen JJiang JHuang Y(2024)Dependency-Aware Code NaturalnessProceedings of the ACM on Programming Languages10.1145/36897948:OOPSLA2(2355-2377)Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3689794

Index Terms

Fuzzing MLIR Compiler Infrastructure via Operation Dependency Analysis
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Recommendations

MLIRSmith: Random Program Generation for Fuzzing MLIR Compiler Infrastructure
ASE '23: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering

MLIR (Multi-Level Intermediate Representation) compiler infrastructure has gained popularity in recent years to support the construction of many compilers. Instead of designing a new IR with a single abstraction for each domain, MLIR compiler ...
Automated conformance testing for JavaScript engines via deep compiler fuzzing
PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation

JavaScript (JS) is a popular, platform-independent programming language. To ensure the interoperability of JS programs across different platforms, the implementation of a JS engine should conform to the ECMAScript standard. However, doing so is ...
A Survey of Compiler Testing

Virtually any software running on a computer has been processed by a compiler or a compiler-like tool. Because compilers are such a crucial piece of infrastructure for building software, their correctness is of paramount importance. To validate and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

ISSTA 2024: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

September 2024

1928 pages

ISBN:9798400706127

DOI:10.1145/3650212

General Chair:
Maria Christakis
TU Wien, Austria
,
Program Chair:
Michael Pradel
University of Stuttgart, Germany

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 September 2024

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Young Elite Scientists Sponsorship Program by CAST

Conference

ISSTA '24

Sponsor:

SIGSOFT

ISSTA '24: 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis

September 16 - 20, 2024

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Upcoming Conference

ISSTA '25

Sponsor:
sigsoft

34th ACM SIGSOFT International Symposium on Software Testing and Analysis

June 25 - 28, 2025

Trondheim , Norway

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
232
Total Downloads

Downloads (Last 12 months)232
Downloads (Last 6 weeks)38

Reflects downloads up to 03 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Yang CChen JJiang JHuang Y(2024)Dependency-Aware Code NaturalnessProceedings of the ACM on Programming Languages10.1145/36897948:OOPSLA2(2355-2377)Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3689794

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents