research-article

Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis

Authors:

Ramtine Tofighi-Shirazi,

Irina-Mariuca Asavoae,

Philippe Elbaz-Vincent,

Thanh-Ha Le

SPRO'19: Proceedings of the 3rd ACM Workshop on Software Protection

Pages 3 - 14

https://doi.org/10.1145/3338503.3357719

Published: 15 November 2019

Abstract

We present a new approach that combines binary analysis techniques with machine learning classification to provide a static and generic evaluation technique for opaque predicates, regardless of their construction. We use this technique as a static, automated deobfuscation tool to remove the opaque predicates introduced by obfuscation mechanisms. According to our experimental results, our models achieve up to 98% accuracy at detecting and deobfuscating state-of-the-art opaque predicate patterns. By contrast, leading-edge deobfuscation methods based on symbolic execution are less accurate, mostly due to SMT solver constraints and the limited scalability of dynamic symbolic analyses. Our approach underlines the efficiency of hybrid symbolic analysis and machine learning techniques for a static and generic deobfuscation methodology.
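For readers unfamiliar with the term, the following minimal sketch illustrates what an opaque predicate is and why removing it preserves program behavior. It is not taken from the paper: the predicate, the function names (`opaque_predicate`, `obfuscated_abs`, `plain_abs`, `always_true_on`), and the bounded brute-force check are all illustrative assumptions, and the check stands in for neither the paper's machine learning models nor an SMT-based analysis.

```python
from typing import Callable, Iterable


def opaque_predicate(x: int) -> bool:
    # x * (x + 1) is a product of two consecutive integers, so it is
    # always even: this condition holds for every integer x.
    return (x * (x + 1)) % 2 == 0


def obfuscated_abs(x: int) -> int:
    # Hypothetical obfuscated function: the branch guarded by the opaque
    # predicate is always taken, and the fall-through code is dead.
    if opaque_predicate(x):
        return x if x >= 0 else -x
    return x ^ 0x5A  # bogus code, never executed


def plain_abs(x: int) -> int:
    # The same function once the opaque predicate and dead code are removed.
    return x if x >= 0 else -x


def always_true_on(pred: Callable[[int], bool], values: Iterable[int]) -> bool:
    # Bounded brute-force check (an assumption for illustration only):
    # it gives evidence on the sampled inputs, not a proof for all inputs.
    return all(pred(v) for v in values)


if __name__ == "__main__":
    sample = range(-1000, 1000)
    print(always_true_on(opaque_predicate, sample))
    print(all(obfuscated_abs(v) == plain_abs(v) for v in sample))
```

A static deobfuscator has to reach the same conclusion (the branch is invariant, so the dead path can be dropped) without enumerating inputs, which is the decision the abstract describes delegating to a classifier.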

Information & Contributors

Information

Published In

SPRO'19: Proceedings of the 3rd ACM Workshop on Software Protection

November 2019

87 pages

ISBN:9781450368353

DOI:10.1145/3338503

General Chair: Paolo Falcarin, University of East London, UK

Program Chair: Michael 'MiZu' Zunke, Thales Group, Germany

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

French National Research Agency

Conference

CCS '19

Sponsor:

SIGSAC

CCS '19: 2019 ACM SIGSAC Conference on Computer and Communications Security

November 15, 2019

London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 8 of 14 submissions, 57%

Bibliometrics

Total Citations: 13

Total Downloads: 326

Downloads (last 12 months): 51

Downloads (last 6 weeks): 9

Reflects downloads up to 27 Dec 2024

Cited By

De Sutter B, Schrittwieser S, Coppens B, Kochberger P (2024). Evaluation Methodologies in Software Protection Research. ACM Computing Surveys. Online publication date: 2-Nov-2024. https://doi.org/10.1145/3702314

Mariano B, Wang Z, Pailoor S, Collberg C, Dillig I (2024). Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (2211-2241). Online publication date: 8-Oct-2024. https://dl.acm.org/doi/10.1145/3689789

Aragón-Jurado J, Jareño J, De la Torre J, Ruiz P, Dorronsoro B (2024). Two-Level Software Obfuscation with Cooperative Co-Evolutionary Algorithms. 2024 IEEE Congress on Evolutionary Computation (CEC) (1-8). Online publication date: 30-Jun-2024. https://doi.org/10.1109/CEC60901.2024.10612116

Zhang Z, Tao G, Shen G, An S, Xu Q, Liu Y, Ye Y, Wu Y, Zhang X, Calandrino J, Troncoso C (2023). PELICAN. Proceedings of the 32nd USENIX Conference on Security Symposium (2365-2382). Online publication date: 9-Aug-2023. https://dl.acm.org/doi/10.5555/3620237.3620370

De Pasquale G, Nakanishi F, Ferla D, Cavallaro L (2023). ROPfuscator: Robust Obfuscation with ROP. 2023 IEEE Security and Privacy Workshops (SPW) (1-10). Online publication date: May-2023. https://doi.org/10.1109/SPW59333.2023.00026

Xiao X, Wang Y, Hu Y, Gu D (2023). xVMP: An LLVM-based Code Virtualization Obfuscator. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (738-742). Online publication date: Mar-2023. https://doi.org/10.1109/SANER56733.2023.00082

Tang K, Shan Z, Zhang C, Xu L, Qiao M, Liu F (2022). DFSGraph: Data Flow Semantic Model for Intermediate Representation Programs Based on Graph Network. Electronics 11:19 (3230). Online publication date: 8-Oct-2022. https://doi.org/10.3390/electronics11193230

Li Y, Wen B, Zheng H (2022). Generic O-LLVM Automatic Multi-architecture Deobfuscation Framework Based on Symbolic Execution. Proceedings of the 4th International Conference on Advanced Information Science and System (1-6). Online publication date: 25-Nov-2022. https://dl.acm.org/doi/10.1145/3573834.3574541

Banescu S, Valenzuela S, Guggenmos M, Ahmadvand M, Pretschner A (2021). Dynamic Taint Analysis versus Obfuscated Self-Checking. Proceedings of the 37th Annual Computer Security Applications Conference (182-193). Online publication date: 6-Dec-2021. https://dl.acm.org/doi/10.1145/3485832.3485926

Shafiq S, Mashkoor A, Mayr-Dorn C, Egyed A (2021). A Literature Review of Using Machine Learning in Software Development Life Cycle Stages. IEEE Access 9 (140896-140920). Online publication date: 2021. https://doi.org/10.1109/ACCESS.2021.3119746
