Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection

Published: 02 September 2021

Abstract

Recent work has shown that adversarial Windows malware samples—referred to as adversarial EXEmples in this article—can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally demanding validation steps to discard malware variants that do not execute correctly in sandbox environments. In this work, we overcome these limitations by developing a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations of the Windows Portable Executable file format. These attacks, named Full DOS, Extend, and Shift, inject the adversarial payload by, respectively, manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better tradeoff in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open-source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge from subject-matter experts directly into the learning process.
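
To make the nature of these manipulations concrete, the sketch below shows how the Full DOS attack can locate the bytes it is allowed to perturb. This is a minimal plain-Python illustration based only on the documented PE layout, not the paper's secml-malware implementation, and the function names are our own: every byte of the DOS header and stub may be rewritten except the two-byte MZ magic and the four-byte e_lfanew field at offset 0x3C, which the Windows loader reads to find the PE header.

import struct

MZ_MAGIC = b"MZ"
E_LFANEW_OFFSET = 0x3C   # location of the 4-byte pointer to the PE header
DOS_HEADER_SIZE = 0x40   # the DOS header occupies the first 64 bytes

def full_dos_editable_offsets(exe_bytes: bytes) -> list:
    """Return the byte offsets that a Full DOS attack may safely perturb."""
    if exe_bytes[:2] != MZ_MAGIC:
        raise ValueError("not a PE file: missing MZ magic")
    # e_lfanew: little-endian 32-bit offset of the "PE\0\0" signature.
    pe_offset = struct.unpack_from("<I", exe_bytes, E_LFANEW_OFFSET)[0]
    # Editable region 1: the DOS header minus the MZ magic and e_lfanew.
    # Editable region 2: the DOS stub, which Windows loaders never execute.
    return list(range(2, E_LFANEW_OFFSET)) + list(range(DOS_HEADER_SIZE, pe_offset))

def write_payload(exe_bytes: bytes, payload: bytes) -> bytes:
    """Overwrite the editable bytes with an (already optimized) payload."""
    mutable = bytearray(exe_bytes)
    for offset, value in zip(full_dos_editable_offsets(exe_bytes), payload):
        mutable[offset] = value
    return bytes(mutable)

In the attacks themselves, the payload written into this region is not arbitrary: it is optimized against the target classifier, by gradient descent in the white-box setting and by query-based strategies in the black-box one, while the Extend and Shift attacks enlarge the editable region by growing the DOS header or displacing the content of the first section.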

      Published In

ACM Transactions on Privacy and Security, Volume 24, Issue 4
      November 2021
      295 pages
      ISSN:2471-2566
      EISSN:2471-2574
      DOI:10.1145/3476876

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 02 September 2021
      Accepted: 01 June 2021
      Revised: 01 March 2021
      Received: 01 August 2020
      Published in TOPS Volume 24, Issue 4

      Author Tags

      1. Adversarial examples
      2. evasion
      3. malware detection
      4. semantics-invariant manipulations

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

• ALOHA
      • RexLearn

      Cited By

• (2025) Practical clean-label backdoor attack against static malware detection. Computers & Security 150, 104280. DOI: 10.1016/j.cose.2024.104280. Online publication date: Mar-2025.
• (2025) SLIFER: Investigating performance and robustness of malware detection pipelines. Computers & Security 150, 104264. DOI: 10.1016/j.cose.2024.104264. Online publication date: Mar-2025.
• (2024) Evolving AI-Based Malware Detection. Machine Intelligence Applications in Cyber-Risk Management, 135–158. DOI: 10.4018/979-8-3693-7540-2.ch007. Online publication date: 22-Nov-2024.
• (2024) Evaluating Realistic Adversarial Attacks against Machine Learning Models for Windows PE Malware Detection. Future Internet 16, 5, 168. DOI: 10.3390/fi16050168. Online publication date: 12-May-2024.
• (2024) When Adversarial Perturbations meet Concept Drift: An Exploratory Analysis on ML-NIDS. Proceedings of the 2024 Workshop on Artificial Intelligence and Security, 149–160. DOI: 10.1145/3689932.3694757. Online publication date: 6-Nov-2024.
• (2024) How to Train your Antivirus: RL-based Hardening through the Problem Space. Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses, 130–146. DOI: 10.1145/3678890.3678912. Online publication date: 30-Sep-2024.
• (2024) CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models. ACM Transactions on Privacy and Security 27, 3, 1–26. DOI: 10.1145/3678007. Online publication date: 12-Jul-2024.
• (2024) AdverSPAM: Adversarial SPam Account Manipulation in Online Social Networks. ACM Transactions on Privacy and Security 27, 2, 1–31. DOI: 10.1145/3643563. Online publication date: 14-Mar-2024.
• (2024) It Is All about Data: A Survey on the Effects of Data on Adversarial Robustness. ACM Computing Surveys 56, 7, 1–41. DOI: 10.1145/3627817. Online publication date: 9-Apr-2024.
• (2024) Efficient Malware Analysis Using Metric Embeddings. Digital Threats: Research and Practice 5, 1, 1–20. DOI: 10.1145/3615669. Online publication date: 21-Mar-2024.
