Abstract
We present BIRD: A Binary Intermediate Representation for formally verified Decompilation of x86-64 binaries. BIRD is a generic language capable of representing a binary program at various stages of decompilation. Decompilation can consist of various small translation passes, each raising the abstraction level from assembly to source code. Where most decompilation frameworks do not guarantee that their translations preserve the program’s operational semantics or even provide any formal semantics, translation passes built on top of BIRD must prove their output to be bisimilar to their input. This work presents the mathematical machinery needed to define BIRD. Moreover, it provides two instantiations - one representing x86-64 assembly, and one where registers have been replaced by variables—as well as a formally proven correct translation pass between them. This translation serves both as a practical first step in trustworthy decompilation as well as a proof of concept that semantic preserving translations of low-level programs are feasible. The entire effort has been formalized in the Coq theorem prover. As such, it does not only provide a mathematical formalism but can also be exported as executable code to be used in a decompiler. We envision BIRD to be used to define provably correct binary-level analyses and program transformations.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Made available here: https://doi.org/10.5281/zenodo.7928215.
References
Barthe, G., Demange, D., Pichardie, D.: A formally verified SSA-based middle-end: Static single assignment meets compcert. In: Proceedings of the 21st European Conference on Programming Languages and Systems. p. 47–66. ESOP’12, Springer-Verlag, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28869-2_3
Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: A binary analysis platform. In: Gopalakrishnan, G., Qadeer, S. (eds.) Computer Aided Verification. pp. 463–469. Springer, Berlin Heidelberg, Berlin, Heidelberg (2011). 0.1007/978-3-642-22110-1_37
Brumley, D., Lee, J., Schwartz, E.J., Woo, M.: Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In: USENIX Security Symposium (2013)
Burk, K., Pagani, F., Kruegel, C., Vigna, G.: Decomperson: How humans decompile and what we can learn from it. In: 31st USENIX Security Symposium (USENIX Security 22), pp. 2765–2782. USENIX Association, Boston, MA (Aug 2022) https://www.usenix.org/conference/usenixsecurity22/presentation/burk
Canzanese, R., Oyer, M., Mancoridis, S., Kam, M.: A survey of reverse engineering tools for the 32-bit microsoft windows environment (01 2005)
Coq Development Team: The coq proof assistant. https://coq.inria.fr/ Accessed 12 May 2023
Cytron, R., Ferrante, J., Rosen, B.K., Wegman, M.N., Zadeck, F.K.: Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13(4), 451–490 (oct 1991). https://doi.org/10.1145/115372.115320
Dasgupta, S., Park, D., Kasampalis, T., Adve, V.S., Roşu, G.: A complete formal semantics of x86–64 user-level instruction set architecture. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 1133–1148. PLDI 2019, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3314221.3314601
Eagle, C.: The IDA Pro Book: The Unofficial Guide to the World’s Most Popular Disassembler. IT Pro, No Starch Press (2008). https://books.google.de/books?id=BoFaZ1dB1H0C
Hamlen, K.W., Fisher, D., Lundquist, G.R.: Source-free machine-checked validation of native code in coq. In: Proceedings of the 3rd ACM Workshop on Forming an Ecosystem Around Software Transformation, pp. 25–30. FEAST’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3338502.3359759
Hex Rays: Ida pro. https://hex-rays.com/ida-pro/ Accessed 12 May 2023
Kennedy, A., Benton, N., Jensen, J.B., Dagand, P.E.: Coq: The world’s best macro assembler? In: Proceedings of the 15th Symposium on Principles and Practice of Declarative Programming, pp. 13–24. PPDP ’13, Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2505879.2505897
Kumar, R., Mullen, E., Tatlock, Z., Myreen, M.O.: Software verification with ITPs should use binary code extraction to reduce the TCB. In: Avigad, J., Mahboubi, A. (eds.) Interactive Theorem Proving, pp. 362–369. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-94821-8_21
Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: International Symposium on Code Generation and Optimization, 2004. CGO 2004, pp. 75–86 (2004). https://doi.org/10.1109/CGO.2004.1281665
Lattner, C., et al.: MLIR: Scaling compiler infrastructure for domain specific computation. In: 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 2–14 (2021). https://doi.org/10.1109/CGO51591.2021.9370308
Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009). https://doi.org/10.1145/1538788.1538814, http://xavierleroy.org/publi/compcert-CACM.pdf
Letouzey, P.: Extraction in coq: An overview. In: Beckmann, A., Dimitracopoulos, C., Löwe, B. (eds.) Logic and Theory of Algorithms. pp. 359–369. Springer, Berlin Heidelberg, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69407-6_39
N. S. Agency: Ghidra. https://ghidra-sre.org/Accessed 12 May 2023
Naus, N., Verbeek, F., Ravindran, B.: A formal semantics for P-Code. In: 14th International Conference on Verified Software: Theories, Tools, and Experiments (2022). https://doi.org/10.1007/978-3-031-25803-9_7
Palsberg, J.: Type-based analysis and applications. In: ACM SIGPLAN/SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (07 2001). https://doi.org/10.1145/379605.379635
PNF Software: Jeb decompiler. https://www.pnfsoftware.com/ Accessed 12 May 2023
Schulte, E., Ruchti, J., Noonan, M., Ciarletta, D., Loginov, A.: Evolving exact decompilation. In: Shoshitaishvili, Y., Wang, R.F. (eds.) Workshop on Binary Analysis Research. San Diego, CA, USA (Feb 18–21 2018). https://doi.org/10.14722/bar.2018.23008, https://www.ndss-symposium.org/ndss2018/bar-workshop-programme/
Vector 35 Inc.: Binary ninja. https://binary.ninja/ Accessed 12 May 2023
Verbeek, F., Bharadwaj, A., Bockenek, J., Roessle, I., Weerwag, T., Ravindran, B.: X86 instruction semantics and basic block symbolic execution. Archive of Formal Proofs (Oct 2021). https://isa-afp.org/entries/X86_Semantics.html, Formal proof development
Yakdan, K., Dechand, S., Gerhards-Padilla, E., Smith, M.: Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 158–177. IEEE Computer Society, Los Alamitos, CA, USA (May 2016). https://doi.org/10.1109/SP.2016.18
Zakowski, Y., Beck, C., Yoon, I., Zaichuk, I., Zaliva, V., Zdancewic, S.: Modular, compositional, and executable formal semantics for LLVM IR. Proc. ACM Program. Lang. 5(ICFP) (Aug 2021). https://doi.org/10.1145/3473572
Acknowledgements
This work is supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract No. N66001-21-C-4028.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Engel, D., Verbeek, F., Ravindran, B. (2023). BIRD: A Binary Intermediate Representation for Formally Verified Decompilation of X86-64 Binaries. In: Prevosto, V., Seceleanu, C. (eds) Tests and Proofs. TAP 2023. Lecture Notes in Computer Science, vol 14066. Springer, Cham. https://doi.org/10.1007/978-3-031-38828-6_1
Download citation
DOI: https://doi.org/10.1007/978-3-031-38828-6_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38827-9
Online ISBN: 978-3-031-38828-6
eBook Packages: Computer ScienceComputer Science (R0)