DOI: 10.1145/2950290.2983962

Fine-grained binary code authorship identification

Published: 01 November 2016

Abstract

Binary code authorship identification is the task of determining the authors of a piece of binary code from a set of known authors. Modern software often contains code from multiple authors, yet existing techniques assume that each program binary is written by a single author. We present a new, finer-grained technique for the harder problem of determining the author of each basic block. Our evaluation shows that the technique identifies the author of a basic block with 52% accuracy among 282 authors, compared with 0.4% accuracy for random guessing, and that it provides a practical solution for identifying multiple authors in software.
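
The chance-level figure quoted above can be checked with a few lines of arithmetic. The sketch below assumes the random guess is uniform over the 282 candidate authors; the function and variable names are illustrative and do not come from the paper.

```python
# Chance-level accuracy for a uniform random guess among N candidate authors.
# Assumption: the guesser picks each author with equal probability, so the
# expected accuracy is 1/N regardless of how blocks are distributed.

NUM_AUTHORS = 282         # number of authors stated in the abstract
REPORTED_ACCURACY = 0.52  # per-basic-block accuracy stated in the abstract


def random_guess_accuracy(num_authors: int) -> float:
    """Expected accuracy of a uniform random guess over num_authors classes."""
    if num_authors <= 0:
        raise ValueError("num_authors must be positive")
    return 1.0 / num_authors


baseline = random_guess_accuracy(NUM_AUTHORS)
improvement = REPORTED_ACCURACY / baseline

print(f"random-guess baseline: {baseline:.2%}")       # about 0.35%, reported as 0.4%
print(f"improvement over chance: {improvement:.0f}x")
```

Under this assumption the baseline is 1/282, which rounds to the 0.4% quoted in the abstract, so the reported 52% is roughly 147 times better than chance.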

Cited By

  • (2024) Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey. Information, 15:3 (131). DOI: 10.3390/info15030131. Online publication date: 28-Feb-2024
  • (2024) VeriBin: A Malware Authorship Verification Approach for APT Tracking through Explainable and Functionality-Debiasing Adversarial Representation Learning. ACM Transactions on Privacy and Security, 27:3 (1-37). DOI: 10.1145/3669901. Online publication date: 20-Jul-2024
  • (2024) Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets. ACM Computing Surveys, 56:8 (1-36). DOI: 10.1145/3653973. Online publication date: 26-Mar-2024

    Published In

    FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
    November 2016
    1156 pages
    ISBN: 9781450342186
    DOI: 10.1145/2950290
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Basic block level
    2. Code features
    3. Software forensics

    Qualifiers

    • Abstract

    Conference

    FSE'16
    Acceptance Rates

    Overall Acceptance Rate 17 of 128 submissions, 13%

    Citations

    • (2023) SCS-Gan: Learning Functionality-Agnostic Stylometric Representations for Source Code Authorship Verification. IEEE Transactions on Software Engineering, 49:4 (1426-1442). DOI: 10.1109/TSE.2022.3177228. Online publication date: 1-Apr-2023
    • (2022) BinMLM: Binary Authorship Verification with Flow-aware Mixture-of-Shared Language Model. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (1023-1033). DOI: 10.1109/SANER53432.2022.00120. Online publication date: Mar-2022
    • (2021) Authorship attribution of source code: a language-agnostic approach and applicability in software engineering. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (932-944). DOI: 10.1145/3468264.3468606. Online publication date: 20-Aug-2021
    • (2021) Binary Code Authorship Identification with Neural Representation Learning. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery (1407-1415). DOI: 10.1007/978-3-030-70665-4_153. Online publication date: 27-Jun-2021
    • (2020) Towards Attribution in Mobile Markets. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (771-785). DOI: 10.1145/3372297.3417281. Online publication date: 30-Oct-2020
    • (2020) A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps. 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME) (617-628). DOI: 10.1109/ICSME46990.2020.00064. Online publication date: Sep-2020
    • (2020) Optimizing Multi-class Classification of Binaries Based on Static Features. Malware Analysis Using Artificial Intelligence and Deep Learning (249-268). DOI: 10.1007/978-3-030-62582-5_9. Online publication date: 21-Dec-2020