An Efficient Platform for the Automatic Extraction of Patterns in Native Code

Published: 01 February 2017

Abstract

Many software tools, including decompilers, code quality analyzers, recognizers of packed executable files, authorship analyzers, and malware detectors, search for patterns in binary code. Machine learning algorithms, trained on programs drawn from the very large number of applications available in open source code repositories, can find patterns that the manual approach does not detect. To this end, we have created a versatile platform for the automatic extraction of patterns from native code, capable of processing large binary files. Its implementation has been parallelized, providing substantial runtime performance benefits on multicore architectures. Compared to single-processor execution, the best configuration achieves an average speedup of 3.5×, against a theoretical maximum of 4×.
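The speedup figures in the abstract can be put in perspective with a short calculation. The sketch below is illustrative only and is not taken from the article's platform: it assumes the theoretical maximum of 4 corresponds to four cores, and it uses Amdahl's law as a simple model to estimate what fraction of the work would have to be parallel to reach the reported speedup. The function names are hypothetical.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / workers


def amdahl_parallel_fraction(speedup: float, workers: int) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / n), to solve for p.

    p is the fraction of the run time that is parallelizable under the model.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Figures reported in the abstract; four workers is an assumption.
    speedup, workers = 3.5, 4
    print(f"efficiency: {parallel_efficiency(speedup, workers):.1%}")
    print(f"implied parallel fraction: {amdahl_parallel_fraction(speedup, workers):.3f}")
```

Under these assumptions, a speedup of 3.5 on four cores corresponds to a parallel efficiency of 87.5% and, in Amdahl's model, to roughly 95% of the work being parallelizable.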

Published In

Scientific Programming, Volume 2017, February 2017
ISSN: 1058-9244
EISSN: 1875-919X

Publisher

Hindawi Limited

London, United Kingdom
