[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

BIRD: A Binary Intermediate Representation for Formally Verified Decompilation of X86-64 Binaries

  • Conference paper
  • First Online:
Tests and Proofs (TAP 2023)

Abstract

We present BIRD: A Binary Intermediate Representation for formally verified Decompilation of x86-64 binaries. BIRD is a generic language capable of representing a binary program at various stages of decompilation. Decompilation can consist of various small translation passes, each raising the abstraction level from assembly to source code. Where most decompilation frameworks do not guarantee that their translations preserve the program’s operational semantics or even provide any formal semantics, translation passes built on top of BIRD must prove their output to be bisimilar to their input. This work presents the mathematical machinery needed to define BIRD. Moreover, it provides two instantiations - one representing x86-64 assembly, and one where registers have been replaced by variables—as well as a formally proven correct translation pass between them. This translation serves both as a practical first step in trustworthy decompilation as well as a proof of concept that semantic preserving translations of low-level programs are feasible. The entire effort has been formalized in the Coq theorem prover. As such, it does not only provide a mathematical formalism but can also be exported as executable code to be used in a decompiler. We envision BIRD to be used to define provably correct binary-level analyses and program transformations.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 35.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 44.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Made available here: https://doi.org/10.5281/zenodo.7928215.

References

  1. Barthe, G., Demange, D., Pichardie, D.: A formally verified SSA-based middle-end: Static single assignment meets compcert. In: Proceedings of the 21st European Conference on Programming Languages and Systems. p. 47–66. ESOP’12, Springer-Verlag, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28869-2_3

  2. Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: A binary analysis platform. In: Gopalakrishnan, G., Qadeer, S. (eds.) Computer Aided Verification. pp. 463–469. Springer, Berlin Heidelberg, Berlin, Heidelberg (2011). 0.1007/978-3-642-22110-1_37

    Google Scholar 

  3. Brumley, D., Lee, J., Schwartz, E.J., Woo, M.: Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In: USENIX Security Symposium (2013)

    Google Scholar 

  4. Burk, K., Pagani, F., Kruegel, C., Vigna, G.: Decomperson: How humans decompile and what we can learn from it. In: 31st USENIX Security Symposium (USENIX Security 22), pp. 2765–2782. USENIX Association, Boston, MA (Aug 2022) https://www.usenix.org/conference/usenixsecurity22/presentation/burk

  5. Canzanese, R., Oyer, M., Mancoridis, S., Kam, M.: A survey of reverse engineering tools for the 32-bit microsoft windows environment (01 2005)

    Google Scholar 

  6. Coq Development Team: The coq proof assistant. https://coq.inria.fr/ Accessed 12 May 2023

  7. Cytron, R., Ferrante, J., Rosen, B.K., Wegman, M.N., Zadeck, F.K.: Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13(4), 451–490 (oct 1991). https://doi.org/10.1145/115372.115320

  8. Dasgupta, S., Park, D., Kasampalis, T., Adve, V.S., Roşu, G.: A complete formal semantics of x86–64 user-level instruction set architecture. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 1133–1148. PLDI 2019, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3314221.3314601

  9. Eagle, C.: The IDA Pro Book: The Unofficial Guide to the World’s Most Popular Disassembler. IT Pro, No Starch Press (2008). https://books.google.de/books?id=BoFaZ1dB1H0C

  10. Hamlen, K.W., Fisher, D., Lundquist, G.R.: Source-free machine-checked validation of native code in coq. In: Proceedings of the 3rd ACM Workshop on Forming an Ecosystem Around Software Transformation, pp. 25–30. FEAST’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3338502.3359759

  11. Hex Rays: Ida pro. https://hex-rays.com/ida-pro/ Accessed 12 May 2023

  12. Kennedy, A., Benton, N., Jensen, J.B., Dagand, P.E.: Coq: The world’s best macro assembler? In: Proceedings of the 15th Symposium on Principles and Practice of Declarative Programming, pp. 13–24. PPDP ’13, Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2505879.2505897

  13. Kumar, R., Mullen, E., Tatlock, Z., Myreen, M.O.: Software verification with ITPs should use binary code extraction to reduce the TCB. In: Avigad, J., Mahboubi, A. (eds.) Interactive Theorem Proving, pp. 362–369. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-94821-8_21

  14. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analysis & transformation. In: International Symposium on Code Generation and Optimization, 2004. CGO 2004, pp. 75–86 (2004). https://doi.org/10.1109/CGO.2004.1281665

  15. Lattner, C., et al.: MLIR: Scaling compiler infrastructure for domain specific computation. In: 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 2–14 (2021). https://doi.org/10.1109/CGO51591.2021.9370308

  16. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009). https://doi.org/10.1145/1538788.1538814, http://xavierleroy.org/publi/compcert-CACM.pdf

  17. Letouzey, P.: Extraction in coq: An overview. In: Beckmann, A., Dimitracopoulos, C., Löwe, B. (eds.) Logic and Theory of Algorithms. pp. 359–369. Springer, Berlin Heidelberg, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69407-6_39

  18. N. S. Agency: Ghidra. https://ghidra-sre.org/Accessed 12 May 2023

  19. Naus, N., Verbeek, F., Ravindran, B.: A formal semantics for P-Code. In: 14th International Conference on Verified Software: Theories, Tools, and Experiments (2022). https://doi.org/10.1007/978-3-031-25803-9_7

  20. Palsberg, J.: Type-based analysis and applications. In: ACM SIGPLAN/SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (07 2001). https://doi.org/10.1145/379605.379635

  21. PNF Software: Jeb decompiler. https://www.pnfsoftware.com/ Accessed 12 May 2023

  22. Schulte, E., Ruchti, J., Noonan, M., Ciarletta, D., Loginov, A.: Evolving exact decompilation. In: Shoshitaishvili, Y., Wang, R.F. (eds.) Workshop on Binary Analysis Research. San Diego, CA, USA (Feb 18–21 2018). https://doi.org/10.14722/bar.2018.23008, https://www.ndss-symposium.org/ndss2018/bar-workshop-programme/

  23. Vector 35 Inc.: Binary ninja. https://binary.ninja/ Accessed 12 May 2023

  24. Verbeek, F., Bharadwaj, A., Bockenek, J., Roessle, I., Weerwag, T., Ravindran, B.: X86 instruction semantics and basic block symbolic execution. Archive of Formal Proofs (Oct 2021). https://isa-afp.org/entries/X86_Semantics.html, Formal proof development

  25. Yakdan, K., Dechand, S., Gerhards-Padilla, E., Smith, M.: Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 158–177. IEEE Computer Society, Los Alamitos, CA, USA (May 2016). https://doi.org/10.1109/SP.2016.18

  26. Zakowski, Y., Beck, C., Yoon, I., Zaichuk, I., Zaliva, V., Zdancewic, S.: Modular, compositional, and executable formal semantics for LLVM IR. Proc. ACM Program. Lang. 5(ICFP) (Aug 2021). https://doi.org/10.1145/3473572

Download references

Acknowledgements

This work is supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract No. N66001-21-C-4028.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Daniel Engel .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Engel, D., Verbeek, F., Ravindran, B. (2023). BIRD: A Binary Intermediate Representation for Formally Verified Decompilation of X86-64 Binaries. In: Prevosto, V., Seceleanu, C. (eds) Tests and Proofs. TAP 2023. Lecture Notes in Computer Science, vol 14066. Springer, Cham. https://doi.org/10.1007/978-3-031-38828-6_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-38828-6_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-38827-9

  • Online ISBN: 978-3-031-38828-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics