The Use of NLP Techniques in Static Code Analysis to Detect Weaknesses and Vulnerabilities

Serguei A. Mokhov,
Joey Paquet &
Mourad Debbabi

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8436)

Included in the following conference series:

Canadian Conference on Artificial Intelligence


Abstract

We employ classical NLP techniques (n-grams and various smoothing algorithms) combined with machine learning for non-NLP applications: the detection, classification, and reporting of weaknesses related to vulnerabilities or bad coding practices found in artificial constrained languages, such as programming languages and their compiled counterparts. In our results summary, we compare and contrast the NLP approach with the signal processing approach, and we report concrete, promising results for specific test cases of open-source software written in C, C++, and Java. For the task we use the open-source MARF NLP framework and its MARFCAT application; the latter was originally designed for the Static Analysis Tool Exposition (SATE) workshop.
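
To make the n-gram-plus-smoothing idea concrete, the following is a minimal sketch, not MARF's or MARFCAT's actual implementation. It assumes byte-level bigrams, add-delta smoothing, and a simple maximum-likelihood choice between per-class models; the names `BigramModel` and `classify`, the class labels, and the training snippets are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bigrams(data: bytes):
    """Return an iterator over consecutive byte pairs in the input."""
    return zip(data, data[1:])


class BigramModel:
    """Byte-bigram language model with add-delta smoothing (illustrative only)."""

    def __init__(self, delta: float = 1.0, vocab_size: int = 256):
        self.delta = delta              # smoothing constant added to every count
        self.vocab_size = vocab_size    # number of possible byte values
        self.pair_counts = Counter()    # counts of (previous byte, next byte)
        self.context_counts = Counter() # counts of the previous byte alone

    def train(self, data: bytes) -> None:
        """Accumulate bigram counts from one training sample."""
        for a, b in bigrams(data):
            self.pair_counts[(a, b)] += 1
            self.context_counts[a] += 1

    def log_prob(self, data: bytes) -> float:
        """Smoothed log-likelihood of the sample; unseen bigrams stay finite."""
        total = 0.0
        for a, b in bigrams(data):
            num = self.pair_counts[(a, b)] + self.delta
            den = self.context_counts[a] + self.delta * self.vocab_size
            total += math.log(num / den)
        return total


def classify(sample: bytes, models: dict) -> str:
    """Pick the label whose model assigns the sample the highest likelihood."""
    return max(models, key=lambda label: models[label].log_prob(sample))


if __name__ == "__main__":
    # Hypothetical toy training data; a real study would train on labeled corpora.
    weak, clean = BigramModel(), BigramModel()
    for snippet in (b"strcpy(buf, input);", b"gets(line);"):
        weak.train(snippet)
    for snippet in (b"strncpy(buf, input, sizeof(buf) - 1);",
                    b"fgets(line, sizeof(line), stdin);"):
        clean.train(snippet)
    print(classify(b"strcpy(name, user);", {"weak": weak, "clean": clean}))
```

Treating the input as raw bytes means the same sketch applies to source text and to compiled artifacts alike, which matches the abstract's mention of programming languages "and their compiled counterparts". Other smoothing schemes could be substituted in `log_prob` without changing the rest of the sketch.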



Author information

Authors and Affiliations

Concordia University, Montreal, QC, Canada
Serguei A. Mokhov, Joey Paquet & Mourad Debbabi


Editor information

Editors and Affiliations

Faculty of Medicine and School of Electrical Engineering and Computer Science, Department of Epidemiology & Community Medicine, University of Ottawa, 451 Smyth Road, Room 3105, K1H 8M5, Ottawa, ON, Canada
Marina Sokolova
Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, N2L 3G1, Waterloo, ON, Canada
Peter van Beek


About this paper

Cite this paper

Mokhov, S.A., Paquet, J., Debbabi, M. (2014). The Use of NLP Techniques in Static Code Analysis to Detect Weaknesses and Vulnerabilities. In: Sokolova, M., van Beek, P. (eds) Advances in Artificial Intelligence. Canadian AI 2014. Lecture Notes in Computer Science, vol 8436. Springer, Cham. https://doi.org/10.1007/978-3-319-06483-3_33


DOI: https://doi.org/10.1007/978-3-319-06483-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06482-6
Online ISBN: 978-3-319-06483-3
eBook Packages: Computer Science, Computer Science (R0)
