DOI: 10.1145/3628797.3628996
research-article

Binary Representation Embedding and Deep Learning For Binary Code Similarity Detection in Software Security Domain

Published: 07 December 2023

Abstract

Binary Code Similarity Detection (BCSD) is the process of analyzing the binary representations of two functions, programs, or related entities to produce a quantitative score that signifies how similar they are. The task supports a wide range of applications, including the binary search problem: searching a binary file for code segments that are similar to a given binary code segment. These capabilities open up numerous applications in binary code analysis, such as software vulnerability detection, clone detection, and malware analysis. In this paper, we introduce BiSim-Inspector, a BCSD tool based on Deep Learning (DL). The tool leverages Bytes2vec, a method we develop to transform the bytecode of binary functions into vectors, which are then fed into a Convolutional Neural Network - Gated Recurrent Unit (CNN-GRU) model. We also conduct a series of experiments that assess the effectiveness of our method by comparing it with existing state-of-the-art (SOTA) tools, using BinaryCorp, a large-scale, well-structured, and diversified dataset for BCSD. The results show that our framework achieves a Recall of 89%, which is 25% higher than existing SOTA methods, without compromising training or prediction time.
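The abstract frames BCSD as scoring pairs of functions and then using those scores to search a pool of binary code. The Python sketch below illustrates only that surrounding search-and-evaluate pipeline, under stated assumptions: `embed_function` is a hypothetical stand-in encoder that hashes byte bigrams into a fixed-size vector, not the authors' Bytes2vec method or their CNN-GRU model. Cosine similarity as the score, the width `DIM`, the toy byte strings, and the recall-at-k metric are likewise illustrative choices that are not taken from the paper.

```python
import math
import zlib
from typing import Dict, List, Sequence

DIM = 64  # hypothetical embedding width, chosen for illustration only


def embed_function(code: bytes, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in encoder (NOT Bytes2vec or CNN-GRU).

    Counts hashed byte bigrams into a fixed-size vector and L2-normalises it,
    so functions with similar byte sequences get similar vectors.
    """
    vec = [0.0] * dim
    for i in range(len(code) - 1):
        bucket = zlib.crc32(code[i:i + 2]) % dim  # hash each 2-byte window
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec  # empty/1-byte input stays all zeros


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; equals the dot product because inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def search(query: bytes, pool: Dict[str, bytes], k: int) -> List[str]:
    """Binary search problem: names of the k pool functions most similar to `query`."""
    q = embed_function(query)
    scores = {name: similarity(q, embed_function(code)) for name, code in pool.items()}
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


def recall_at_k(queries: Dict[str, bytes], pool: Dict[str, bytes],
                truth: Dict[str, str], k: int) -> float:
    """Fraction of queries whose ground-truth match appears among the top-k results."""
    hits = sum(1 for name, code in queries.items() if truth[name] in search(code, pool, k))
    return hits / len(queries)


if __name__ == "__main__":
    # Toy x86-64 byte strings, invented for this example.
    pool = {
        "add_O0": bytes.fromhex("554889e5897dfc8975f88b55fc8b45f801d05dc3"),
        "add_O2": bytes.fromhex("8d0437c3"),
        "ret_zero": bytes.fromhex("31c0c3"),
    }
    queries = {"q_add": bytes.fromhex("554889e5897dfc8975f88b55fc8b45f801d05dc3")}
    truth = {"q_add": "add_O0"}
    print(search(queries["q_add"], pool, k=2))
    print(recall_at_k(queries, pool, truth, k=1))
```

Here `recall_at_k(queries, pool, truth, k=1)` reports the fraction of queries whose known match ranks first; the abstract does not state which cutoff its reported 89% Recall uses. In a learned system, a trained encoder would take the place of `embed_function`, while the ranking and scoring steps need not change.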

    Published In

    ACM Other conferences
    SOICT '23: Proceedings of the 12th International Symposium on Information and Communication Technology
    December 2023
    1058 pages
    ISBN:9798400708916
    DOI:10.1145/3628797
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Binary Code Similarity Detection
    2. Deep Learning
    3. Software Security

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SOICT 2023

    Acceptance Rates

    Overall Acceptance Rate 147 of 318 submissions, 46%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 100
    • Downloads (Last 12 months): 77
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 20 Jan 2025
