Defeating Opaque Predicates Statically through Machine Learning and Binary Analysis

Published: 15 November 2019

Abstract

We present a new approach that combines binary analysis techniques with machine learning classification to provide a static and generic technique for evaluating opaque predicates, regardless of how they are constructed. We use this technique as a static, automated deobfuscation tool that removes the opaque predicates introduced by obfuscation mechanisms. According to our experimental results, our models reach up to 98% accuracy at detecting and deobfuscating state-of-the-art opaque predicate patterns. By contrast, leading-edge deobfuscation methods based on symbolic execution are less accurate, mostly because of the constraints of SMT solvers and the limited scalability of dynamic symbolic analyses. Our approach underlines the efficiency of hybrid symbolic analysis and machine learning techniques for a static and generic deobfuscation methodology.
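As a toy illustration of the terms used in the abstract (not the authors' implementation), the sketch below defines an opaquely true predicate, contrasts it with an ordinary input-dependent one, and turns a predicate expression into token counts that a classifier could take as features. The function names, the token-count featurization, and the sampled range are all assumptions made for this example; the sampling check is only a sanity check for the toy and is not a static analysis.

```python
import re
from collections import Counter


def opaque_true(x: int) -> bool:
    # x * (x + 1) is a product of two consecutive integers, so it is always even.
    return (x * (x + 1)) % 2 == 0


def real_branch(x: int) -> bool:
    # An ordinary predicate whose value depends on the input, for contrast.
    return x % 3 == 0


def token_counts(expr: str) -> Counter:
    # Hypothetical featurization: count identifiers, constants, and operators.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|==|[-+*/%()<>&|^~]", expr)
    return Counter(tokens)


def always_same(pred, domain) -> bool:
    # True when the predicate takes a single value over the sampled domain.
    return len({pred(x) for x in domain}) == 1


if __name__ == "__main__":
    domain = range(-256, 256)
    print(always_same(opaque_true, domain))   # the opaque predicate never varies
    print(always_same(real_branch, domain))   # the ordinary predicate does
    print(token_counts("(x * (x + 1)) % 2 == 0"))
```

In a real pipeline the labels would come from knowing which predicates an obfuscator inserted, and the features would be derived from the binary rather than from a source-level string.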

Information & Contributors

Information

Published In

SPRO'19: Proceedings of the 3rd ACM Workshop on Software Protection
November 2019
87 pages
ISBN: 9781450368353
DOI: 10.1145/3338503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deobfuscation
  2. machine learning
  3. obfuscation
  4. opaque predicate
  5. software protection
  6. symbolic execution

Qualifiers

  • Research-article

Funding Sources

  • French National Research Agency

Conference

CCS '19
Acceptance Rates

Overall Acceptance Rate 8 of 14 submissions, 57%

Cited By

  • (2024) Evaluation Methodologies in Software Protection Research. ACM Computing Surveys. DOI: 10.1145/3702314. Online publication date: 2-Nov-2024
  • (2024) Control-Flow Deobfuscation using Trace-Informed Compositional Program Synthesis. Proceedings of the ACM on Programming Languages, 8:OOPSLA2, 2211-2241. DOI: 10.1145/3689789. Online publication date: 8-Oct-2024
  • (2024) Two-Level Software Obfuscation with Cooperative Co-Evolutionary Algorithms. 2024 IEEE Congress on Evolutionary Computation (CEC), 1-8. DOI: 10.1109/CEC60901.2024.10612116. Online publication date: 30-Jun-2024
  • (2023) PELICAN. Proceedings of the 32nd USENIX Conference on Security Symposium, 2365-2382. DOI: 10.5555/3620237.3620370. Online publication date: 9-Aug-2023
  • (2023) ROPfuscator: Robust Obfuscation with ROP. 2023 IEEE Security and Privacy Workshops (SPW), 1-10. DOI: 10.1109/SPW59333.2023.00026. Online publication date: May-2023
  • (2023) xVMP: An LLVM-based Code Virtualization Obfuscator. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 738-742. DOI: 10.1109/SANER56733.2023.00082. Online publication date: Mar-2023
  • (2022) DFSGraph: Data Flow Semantic Model for Intermediate Representation Programs Based on Graph Network. Electronics, 11:19, 3230. DOI: 10.3390/electronics11193230. Online publication date: 8-Oct-2022
  • (2022) Generic O-LLVM Automatic Multi-architecture Deobfuscation Framework Based on Symbolic Execution. Proceedings of the 4th International Conference on Advanced Information Science and System, 1-6. DOI: 10.1145/3573834.3574541. Online publication date: 25-Nov-2022
  • (2021) Dynamic Taint Analysis versus Obfuscated Self-Checking. Proceedings of the 37th Annual Computer Security Applications Conference, 182-193. DOI: 10.1145/3485832.3485926. Online publication date: 6-Dec-2021
  • (2021) A Literature Review of Using Machine Learning in Software Development Life Cycle Stages. IEEE Access, 9, 140896-140920. DOI: 10.1109/ACCESS.2021.3119746. Online publication date: 2021
