Abstract
We employ classical NLP techniques (n-grams and various smoothing algorithms) combined with machine learning for non-NLP applications: the detection, classification, and reporting of weaknesses related to vulnerabilities or bad coding practices found in artificial constrained languages, such as programming languages and their compiled counterparts. In our results summary we compare and contrast the NLP approach with the signal processing approach, and we report promising results for specific test cases of open-source software written in C, C++, and Java. For this task we use the NLP framework of the open-source MARF and its MARFCAT application, the latter of which was originally designed for the Static Analysis Tool Exposition (SATE) workshop.
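To make the general idea concrete, the following is a minimal sketch of n-gram classification with smoothing applied to source code. It is not MARF's or MARFCAT's API, and the actual pipeline may differ. The whitespace tokenization, the class labels `weak` and `safe`, the two one-line training samples, and all function and class names are assumptions chosen purely for illustration. It trains one bigram model per class with add-delta smoothing and assigns a new fragment to the class whose model gives it the highest log-probability.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


class BigramModel:
    """Bigram model with add-delta smoothing (Laplace when delta == 1)."""

    def __init__(self, delta=1.0):
        self.delta = delta
        self.pair_counts = Counter()     # counts of (previous, current) pairs
        self.context_counts = Counter()  # counts of 'previous' tokens
        self.vocab = set()

    def train(self, tokens):
        self.vocab.update(tokens)
        for prev, cur in bigrams(tokens):
            self.pair_counts[(prev, cur)] += 1
            self.context_counts[prev] += 1

    def log_prob(self, tokens, vocab_size):
        """Smoothed log-probability of the token sequence under this model."""
        total = 0.0
        for prev, cur in bigrams(tokens):
            num = self.pair_counts[(prev, cur)] + self.delta
            den = self.context_counts[prev] + self.delta * vocab_size
            total += math.log(num / den)
        return total


def classify(tokens, models):
    """Return the label of the model that scores the tokens highest."""
    vocab = set(tokens)
    for model in models.values():
        vocab |= model.vocab
    return max(models, key=lambda label: models[label].log_prob(tokens, len(vocab)))


# Hypothetical toy training data: one fragment per class.
models = {"weak": BigramModel(), "safe": BigramModel()}
models["weak"].train("strcpy ( buf , input ) ;".split())
models["safe"].train("strncpy ( buf , input , n ) ;".split())

print(classify("strcpy ( dst , src ) ;".split(), models))
```

Smoothing matters here because most token pairs in an unseen fragment never occur in training; without it, a single unseen pair would drive the probability of the whole fragment to zero for every class.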
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Mokhov, S.A., Paquet, J., Debbabi, M. (2014). The Use of NLP Techniques in Static Code Analysis to Detect Weaknesses and Vulnerabilities. In: Sokolova, M., van Beek, P. (eds) Advances in Artificial Intelligence. Canadian AI 2014. Lecture Notes in Computer Science, vol 8436. Springer, Cham. https://doi.org/10.1007/978-3-319-06483-3_33
DOI: https://doi.org/10.1007/978-3-319-06483-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06482-6
Online ISBN: 978-3-319-06483-3
eBook Packages: Computer Science, Computer Science (R0)