research-article

Automatic software vulnerability assessment by extracting vulnerability elements

Published: 01 October 2023

Abstract

Software vulnerabilities pose threats to software security. When multiple vulnerabilities must be handled, the most urgent ones need to be fixed first, so it is critical to assess the severity of vulnerabilities in advance. However, an increasing number of vulnerability descriptions do not follow templates, which reduces the performance of existing software vulnerability assessment approaches. In this paper, we propose an automated vulnerability assessment approach that uses vulnerability elements to predict the severity of six vulnerability metrics (i.e., Access Vector, Access Complexity, Authentication, Confidentiality Impact, Integrity Impact, and Availability Impact). First, we use BERT-MRC to extract vulnerability elements from vulnerability descriptions. Second, we assess the six metrics using the extracted vulnerability elements instead of the full descriptions. We conducted experiments on our manually labeled dataset. The experimental results show that our approach improves Accuracy by 12.03%, 14.37%, and 38.65% over three baselines, respectively.
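To make the assessment task concrete, the following is a minimal sketch of the label space and the Accuracy measure. It assumes the six metrics are the CVSS v2 base metrics, whose names match those listed in the abstract; the value sets come from the CVSS v2 specification, not from the paper. The function names `accuracy` and `per_metric_accuracy` and the toy labels are hypothetical, and nothing here reproduces the authors' model or data.

```python
from typing import Dict, List

# Assumption: the six metrics are CVSS v2 base metrics. Each metric is a
# three-class problem with the value set defined by the CVSS v2 specification.
CVSS_V2_METRICS: Dict[str, List[str]] = {
    "Access Vector": ["Local", "Adjacent Network", "Network"],
    "Access Complexity": ["High", "Medium", "Low"],
    "Authentication": ["Multiple", "Single", "None"],
    "Confidentiality Impact": ["None", "Partial", "Complete"],
    "Integrity Impact": ["None", "Partial", "Complete"],
    "Availability Impact": ["None", "Partial", "Complete"],
}


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Return the fraction of predictions that equal the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def per_metric_accuracy(
    predictions: Dict[str, List[str]],
    references: Dict[str, List[str]],
) -> Dict[str, float]:
    """Compute Accuracy separately for each of the six metrics."""
    scores: Dict[str, float] = {}
    for metric, labels in CVSS_V2_METRICS.items():
        # Reject values outside the metric's label space before scoring.
        for value in predictions[metric] + references[metric]:
            if value not in labels:
                raise ValueError(f"{value!r} is not a valid {metric} label")
        scores[metric] = accuracy(predictions[metric], references[metric])
    return scores


if __name__ == "__main__":
    # Toy, made-up labels for two vulnerabilities; not data from the paper.
    toy_pred = {m: [v[0], v[1]] for m, v in CVSS_V2_METRICS.items()}
    toy_ref = {m: [v[0], v[2]] for m, v in CVSS_V2_METRICS.items()}
    for metric, score in per_metric_accuracy(toy_pred, toy_ref).items():
        print(f"{metric}: {score:.2f}")
```

Scoring each metric separately mirrors the framing of six independent multi-class predictions; how the paper aggregates Accuracy across metrics is not stated in the abstract.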

Highlights

We use vulnerability elements, extracted from vulnerability descriptions with BERT-MRC, for vulnerability assessment.
We constructed a vulnerability element dataset covering seven vulnerability elements.
The experimental results show that our approach improves Accuracy by 12.03% to 38.65% compared with three state-of-the-art approaches.

Cited By

  • (2024) Coca: Improving and Explaining Graph Neural Network-Based Vulnerability Detection Systems. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3639168. Online publication date: 20-May-2024
  • (2024) SCL-CVD. Computers and Security 145:C. DOI: 10.1016/j.cose.2024.103994. Online publication date: 1-Oct-2024
  • (2024) Estimating vulnerability metrics with word embedding and multiclass classification methods. International Journal of Information Security 23:1, pp. 247-270. DOI: 10.1007/s10207-023-00734-7. Online publication date: 1-Feb-2024

Published In

Journal of Systems and Software, Volume 204, Issue C
Oct 2023
486 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Vulnerability assessment
  2. Deep learning
  3. Multi-class classification
  4. Mining software repositories

Qualifiers

  • Research-article
