Statistical similarity of binaries

Published: 02 June 2016

Abstract

We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled using different compilers or has been modified. The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures. We have implemented our approach in a tool called Esh, and applied it to find various prominent vulnerabilities across compilers and versions, including Heartbleed, Shellshock and Venom. We show that Esh produces high-accuracy results with few to no false positives, a crucial factor when searching for vulnerabilities in stripped binaries.
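The three steps named in the abstract (decompose into fragments, score fragment pairs, lift fragment scores statistically to a procedure score) can be illustrated with a small Python sketch. It is not Esh and does not reproduce its algorithm. The fragment representation (a frozenset of instruction tokens), the Jaccard index standing in for semantic fragment similarity, and the rule of summing log ratios against a corpus-average background are all assumptions chosen for illustration. The names `fragment_similarity`, `best_match`, `local_evidence` and `procedure_similarity` are hypothetical.

```python
"""Hedged sketch of similarity by composition (not the Esh implementation).

Assumptions, for illustration only:
  * each procedure is already decomposed into non-empty fragments, each a
    frozenset of normalized instruction tokens;
  * fragment similarity is a Jaccard index, a syntactic stand-in for the
    semantic fragment similarity described in the abstract;
  * lifting sums, over query fragments, a log ratio of the target's best
    match to the corpus-average best match.
"""
import math
from typing import FrozenSet, List, Sequence

Fragment = FrozenSet[str]
Procedure = List[Fragment]


def fragment_similarity(a: Fragment, b: Fragment) -> float:
    """Jaccard index of two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match(frag: Fragment, proc: Procedure) -> float:
    """Similarity of `frag` to its closest fragment in `proc` (proc must be non-empty)."""
    return max(fragment_similarity(frag, other) for other in proc)


def local_evidence(frag: Fragment, target: Procedure,
                   corpus: Sequence[Procedure], eps: float = 1e-6) -> float:
    """Log ratio of the target's best match to the corpus-average best match.

    Positive: the target explains `frag` better than a typical corpus procedure.
    Zero: `frag` matches the target no better than it matches the corpus on
    average (e.g. boilerplate present everywhere). `eps` keeps log() finite.
    """
    background = sum(best_match(frag, p) for p in corpus) / len(corpus)
    return math.log((best_match(frag, target) + eps) / (background + eps))


def procedure_similarity(query: Procedure, target: Procedure,
                         corpus: Sequence[Procedure]) -> float:
    """Lift fragment-level evidence to a procedure score by summing over the query."""
    return sum(local_evidence(frag, target, corpus) for frag in query)


if __name__ == "__main__":
    query = [frozenset({"mov", "add", "cmp"}),
             frozenset({"call:memcpy", "len", "buf"}),
             frozenset({"ret", "xor"})]
    # A hypothetical "recompiled" variant: same fragments plus extra instructions.
    variant = [frozenset({"mov", "add", "cmp", "nop"}),
               frozenset({"call:memcpy", "len", "buf"}),
               frozenset({"ret", "xor", "pop"})]
    unrelated = [frozenset({"push", "sub", "jmp"}),
                 frozenset({"call:printf", "fmt"})]
    corpus = [variant, unrelated]
    print(procedure_similarity(query, variant, corpus))    # about 2.079 (3 * ln 2)
    print(procedure_similarity(query, unrelated, corpus))  # large negative score
```

The corpus background is the design choice worth noting in this sketch: a fragment that every procedure matches equally well contributes nothing, so the score is driven by fragments that the target matches better than the rest of the corpus does.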

Information

Published In

ACM SIGPLAN Notices, Volume 51, Issue 6
PLDI '16
June 2016
726 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2980983
  • Editor: Andy Gill
ACM Conferences
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2016
726 pages
ISBN: 9781450342612
DOI: 10.1145/2908080
  • General Chair: Chandra Krintz
  • Program Chair: Emery Berger
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 June 2016
Published in ACM SIGPLAN Notices, Volume 51, Issue 6

Author Tags

  1. partial equivalence
  2. static binary analysis
  3. statistical similarity
  4. verification-aided similarity

Qualifiers

  • Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 133
  • Downloads (last 6 weeks): 19
Reflects downloads up to 20 Dec 2024

Citations

Cited By

  • (2024) Fast Cross-Platform Binary Code Similarity Detection Framework Based on CFGs Taking Advantage of NLP and Inductive GNN. Chinese Journal of Electronics 33:1 (128-138). DOI: 10.23919/cje.2022.00.228. Online publication date: Jan-2024
  • (2024) Semantic aware-based instruction embedding for binary code similarity detection. PLOS ONE 19:6 (e0305299). DOI: 10.1371/journal.pone.0305299. Online publication date: 11-Jun-2024
  • (2024) Vulnerabilities and Security Patches Detection in OSS: A Survey. ACM Computing Surveys 57:1 (1-37). DOI: 10.1145/3694782. Online publication date: 9-Sep-2024
  • (2024) CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques. Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (143-154). DOI: 10.1145/3652032.3657572. Online publication date: 20-Jun-2024
  • (2024) CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (149-161). DOI: 10.1145/3650212.3652117. Online publication date: 11-Sep-2024
  • (2024) R2I: A Relative Readability Metric for Decompiled Code. Proceedings of the ACM on Software Engineering 1:FSE (383-405). DOI: 10.1145/3643744. Online publication date: 12-Jul-2024
  • (2024) Dynamic Neural Control Flow Execution: an Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (1215-1225). DOI: 10.1145/3627673.3679726. Online publication date: 21-Oct-2024
  • (2024) A Semantics-Based Approach on Binary Function Similarity Detection. IEEE Internet of Things Journal 11:15 (25910-25924). DOI: 10.1109/JIOT.2024.3389014. Online publication date: 1-Aug-2024
  • (2024) Are We There Yet? Filling the Gap Between Binary Similarity Analysis and Binary Software Composition Analysis. 2024 IEEE 9th European Symposium on Security and Privacy (EuroS&P) (506-523). DOI: 10.1109/EuroSP60621.2024.00034. Online publication date: 8-Jul-2024
  • (2024) BinCodex: A comprehensive and multi-level dataset for evaluating binary code similarity detection techniques. BenchCouncil Transactions on Benchmarks, Standards and Evaluations 4:2 (100163). DOI: 10.1016/j.tbench.2024.100163. Online publication date: Jun-2024