Research article
DOI: 10.1145/3475716.3475781

An Empirical Study of Rule-Based and Learning-Based Approaches for Static Application Security Testing

Published: 11 October 2021

Abstract

Background: Static Application Security Testing (SAST) tools aim to help developers detect security issues in source code, typically by scanning it against rule-based checks for known vulnerability patterns. However, significant shortcomings of these tools, notably high false positive rates, have made learning-based Software Vulnerability Prediction (SVP) models an increasingly popular alternative. Aims: Despite the similar objectives of the two approaches, their comparative value remains unexplored. We provide an empirical analysis of SAST tools and SVP models to identify their relative capabilities for source code security analysis. Method: We evaluate the detection and assessment performance of several common SAST tools and SVP models on a variety of vulnerability datasets. We further assess the viability and potential benefits of combining the two approaches. Results: SAST tools and SVP models provide similar detection capabilities, but SVP models exhibit better overall performance for both detection and assessment. Unifying the two approaches is difficult because they exhibit little synergy. Conclusions: Our study yields 12 main findings on the capabilities and synergy of the two approaches, from which we derive recommendations for their use and improvement.
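The rule-based approach the abstract describes can be made concrete with a short sketch. Token-based SAST tools such as Flawfinder and RATS essentially match each source line against a table of patterns for known-dangerous constructs and report a weakness category for every hit. Everything below (the rule table, the CWE mappings, and the C snippet being scanned) is a hypothetical illustration, not rules or data from the study.

```python
import re

# Hypothetical rule table in the spirit of token-based SAST tools such as
# Flawfinder or RATS: each rule maps a regex for a dangerous C construct
# to a CWE category. Illustrative only; not the rules used in the study.
RULES = {
    "CWE-120": re.compile(r"\b(strcpy|strcat|sprintf|gets)\s*\("),  # buffer overflow
    "CWE-78": re.compile(r"\bsystem\s*\("),                         # OS command injection
}

def scan(source: str) -> list[tuple[int, str]]:
    """Return (line_number, cwe_id) for every rule that matches a line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for cwe, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, cwe))
    return findings

# A made-up C fragment to scan.
c_code = """\
char buf[16];
strcpy(buf, user_input);                     /* unbounded copy */
snprintf(buf, sizeof buf, "%s", user_input); /* bounded: not flagged */
system(command);                             /* shell injection risk */
"""

print(scan(c_code))  # [(2, 'CWE-120'), (4, 'CWE-78')]
```

Because the matching is purely lexical, a call such as `strcpy(dst, "ok")` that is provably safe would still be flagged; this context-blindness is the mechanism behind the high false positive rates the abstract attributes to SAST tools, whereas SVP models instead learn a classifier over code features to predict which components are vulnerable.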




Published In

ESEM '21: Proceedings of the 15th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
October 2021, 368 pages
ISBN: 9781450386654
DOI: 10.1145/3475716

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Machine Learning
  2. Security
  3. Static Application Security Testing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Cyber Security Cooperative Research Centre

Conference

ESEM '21

Acceptance Rates

ESEM '21 paper acceptance rate: 24 of 124 submissions (19%)
Overall acceptance rate: 130 of 594 submissions (22%)


Cited By

  • (2024) Advances and challenges in artificial intelligence text generation. Frontiers of Information Technology & Electronic Engineering 25(1), 64-83. DOI: 10.1631/FITEE.2300410. Online: 8 Feb 2024
  • (2024) Do Developers Use Static Application Security Testing (SAST) Tools Straight Out of the Box? A Large-Scale Empirical Study. Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 454-460. DOI: 10.1145/3674805.3690750. Online: 24 Oct 2024
  • (2024) Automatic Data Labeling for Software Vulnerability Prediction Models: How Far Are We? Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, 131-142. DOI: 10.1145/3674805.3686675. Online: 24 Oct 2024
  • (2024) Code Defect Detection Model with Multi-layer Bi-directional Long Short Term Memory based on Self-Attention Mechanism. Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering, 1656-1660. DOI: 10.1145/3650400.3650676. Online: 17 Apr 2024
  • (2024) Evaluating C/C++ Vulnerability Detectability of Query-Based Static Application Security Testing Tools. IEEE Transactions on Dependable and Secure Computing, 1-18. DOI: 10.1109/TDSC.2024.3354789. Online: 2024
  • (2024) Methods and Algorithms for Cross-Language Search of Source Code Fragments. 2024 International Conference on Information Technologies (InfoTech), 1-4. DOI: 10.1109/InfoTech63258.2024.10701403. Online: 11 Sep 2024
  • (2024) LLM-CloudSec: Large Language Model Empowered Automatic and Deep Vulnerability Analysis for Intelligent Clouds. IEEE INFOCOM 2024 Workshops (INFOCOM WKSHPS), 1-6. DOI: 10.1109/INFOCOMWKSHPS61880.2024.10620804. Online: 20 May 2024
  • (2024) Incivility detection in open source code review and issue discussions. Journal of Systems and Software 209(C). DOI: 10.1016/j.jss.2023.111935. Online: 14 Mar 2024
  • (2024) Securing tomorrow: a comprehensive survey on the synergy of Artificial Intelligence and information security. AI and Ethics. DOI: 10.1007/s43681-024-00529-z. Online: 30 Jul 2024
  • (2024) Seq2Seq-AFL: Fuzzing via sequence-to-sequence model. International Journal of Machine Learning and Cybernetics 15(10), 4403-4421. DOI: 10.1007/s13042-024-02153-z. Online: 23 Apr 2024
