research-article
Open access

Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis

Published: 08 October 2024

Abstract

Code deobfuscation, which attempts to simplify code that has been intentionally obfuscated to prevent understanding, is a critical technique for downstream security analysis tasks like malware detection. While there has been significant prior work on code deobfuscation, most techniques either do not handle obfuscations that modify control flow or target only specific classes of control-flow obfuscations, making them unsuitable for handling new types of obfuscations or combinations of existing ones. In this paper, we study a new deobfuscation technique that is based on program synthesis and that can handle a broad class of control-flow obfuscations. Given an obfuscated program P, our approach aims to synthesize a smallest program that is a control-flow reduction of P and that is semantically equivalent to P. Since our method does not assume knowledge about the types of obfuscations that have been applied to the original program, the underlying synthesis problem is very challenging. To address this challenge, we propose a novel trace-informed compositional synthesis algorithm that leverages hints present in dynamic traces of the obfuscated program to decompose the synthesis problem into a set of simpler subproblems. In particular, we show how dynamic traces can be used to infer a suitable control-flow skeleton of the deobfuscated program and to synthesize each basic block independently. We have implemented this approach in a tool called Chisel and evaluated it on 546 benchmarks that have been obfuscated using combinations of six different obfuscation techniques. Our evaluation shows that our approach is effective and that it produces code that is almost identical (modulo variable renaming) to the original (non-obfuscated) program in 86% of cases. Our evaluation also shows that Chisel significantly outperforms existing techniques.
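To make the setting concrete, the following is a minimal Python sketch and not Chisel's algorithm. It shows a hand-written control-flow-flattened loop, the block-level traces it produces, and a smaller candidate program checked against it on a few inputs. All names (`obfuscated_sum`, `candidate_sum`, `successors`) are hypothetical and chosen for illustration. Agreement on sample inputs only suggests equivalence; it does not prove it.

```python
def obfuscated_sum(n):
    """Flattened loop: a dispatcher selects the next block via `state`."""
    state, total, i = 0, 0, 0
    trace = []                      # ids of the blocks executed, in order
    while state != 3:               # state 3 is the exit
        trace.append(state)
        if state == 0:              # block 0: initialisation
            total, i = 0, 1
            state = 1
        elif state == 1:            # block 1: loop test
            state = 2 if i <= n else 3
        elif state == 2:            # block 2: loop body
            total += i
            i += 1
            state = 1
    return total, trace


def candidate_sum(n):
    """Candidate deobfuscated program with ordinary structured control flow."""
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total


def successors(traces):
    """Collect the block-to-block transitions observed in the traces."""
    edges = set()
    for t in traces:
        edges.update(zip(t, t[1:]))
    return edges


inputs = range(0, 6)
results = [obfuscated_sum(n) for n in inputs]

# Observed transitions hint at the skeleton: 2 -> 1 is a back edge, i.e. a loop.
edges = successors(trace for _, trace in results)

# The candidate must reproduce the obfuscated program's output on every input tried.
agree = all(out == candidate_sum(n) for n, (out, _) in zip(inputs, results))

print(sorted(edges), agree)
```

In this sketch, the observed transitions play the role of the trace hints mentioned in the abstract: they suggest a single loop, and each block's effect (initialisation, test, body) can then be considered separately.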

Published In

Proceedings of the ACM on Programming Languages  Volume 8, Issue OOPSLA2
October 2024
2691 pages
EISSN:2475-1421
DOI:10.1145/3554319
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deobfuscation
  2. Obfuscation
  3. Program Synthesis

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation
