Abstract
Given the significant advances of large language models (LLMs) in natural language processing (NLP) [5], this study evaluates and compares how well language models establish associations between entities in the cybersecurity domain. Our experimental framework compares ground-truth links from several cybersecurity knowledge bases (including the mappings of MITRE CAPEC, D3FEND, and CVE entries to ATT&CK) against links predicted by language models using semantic textual similarity (STS). These links span a broad range of abstraction levels, covering threat descriptions, attack patterns, defense strategies, and vulnerabilities. The language models evaluated are diverse: state-of-the-art models from STS leaderboards, LLMs (GPT-3.5 and PaLM), and ATTACK BERT [1], a cybersecurity domain-specific language model. Our experiments show how STS-based link prediction differs across language models and data sources, informing the broader application of STS in cybersecurity.
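The evaluation idea described in the abstract can be sketched roughly as follows: score each source entry (for example, an attack pattern) against every candidate target (for example, an ATT&CK technique) by semantic similarity, rank the candidates, and check whether the known link appears near the top. This is a minimal sketch under stated assumptions, not the authors' pipeline. A bag-of-words vector stands in for the language-model embeddings, cosine similarity is the similarity score, each source is assumed to have exactly one ground-truth target, and top-k hit rate is an assumed metric (the abstract does not name one). All identifiers and descriptions below are hypothetical toy data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a language-model encoder (hypothetical):
    bag-of-words term counts instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def rank_targets(source_text: str, targets: dict) -> list:
    """Rank candidate target ids by similarity of their descriptions to the source text."""
    src = embed(source_text)
    scores = {tid: cosine(src, embed(desc)) for tid, desc in targets.items()}
    return sorted(scores, key=lambda tid: scores[tid], reverse=True)


def hit_rate_at_k(sources: dict, targets: dict, gold_links: dict, k: int = 1) -> float:
    """Fraction of sources whose ground-truth target is among the top-k predictions
    (assumed metric; assumes one gold target per source)."""
    hits = sum(
        1
        for sid, text in sources.items()
        if gold_links[sid] in rank_targets(text, targets)[:k]
    )
    return hits / len(sources)


# Hypothetical toy knowledge-base entries (not real ATT&CK/CAPEC text).
targets = {
    "technique_phishing": "adversary sends phishing email with malicious attachment to gain access",
    "technique_bruteforce": "adversary guesses passwords repeatedly to brute force account credentials",
    "technique_sqli": "exploit injection flaw in public facing web application database query",
}
sources = {
    "pattern_1": "attacker repeatedly guesses passwords for an account",
    "pattern_2": "malicious email attachment delivered in a phishing campaign",
    "pattern_3": "sql injection into a web application query",
}
gold_links = {
    "pattern_1": "technique_bruteforce",
    "pattern_2": "technique_phishing",
    "pattern_3": "technique_sqli",
}

print(hit_rate_at_k(sources, targets, gold_links, k=1))  # prints 1.0
```

Swapping `embed` for a real sentence encoder (one of the leaderboard models, an LLM embedding endpoint, or a domain model such as ATTACK BERT) would let the same ranking and scoring loop compare models on identical ground-truth links, which is the kind of contrast the abstract describes.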
References
[1] Abdeen, B., Al-Shaer, E., Singhal, A., Khan, L., Hamlen, K.: SMET: semantic mapping of CVE to ATT&CK and its application to cybersecurity. In: Atluri, V., Ferrara, A.L. (eds.) DBSec 2023. LNCS, vol. 13942, pp. 243–260. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-37586-6_15
[2] Aghaei, E., Niu, X., Shadid, W., Al-Shaer, E.: SecureBERT: a domain-specific language model for cybersecurity. In: Li, F., Liang, K., Lin, Z., Katsikas, S.K. (eds.) Security and Privacy in Communication Systems. LNICST, vol. 462, pp. 39–56. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-25538-0_3
[3] Akbar, K.A., Halim, S.M., Hu, Y., Singhal, A., Khan, L., Thuraisingham, B.: Knowledge mining in cybersecurity: from attack to defense. In: Sural, S., Lu, H. (eds.) DBSec 2022. LNCS, vol. 13383, pp. 110–122. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-10684-2_7
[4] Al-Hawawreh, M., Aljuhani, A., Jararweh, Y.: ChatGPT for cybersecurity: practical applications, challenges, and future directions. Clust. Comput. 26(6), 3421–3436 (2023)
[5] Bubeck, S., et al.: Sparks of artificial general intelligence: early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023)
[6] Crumpler, W., Lewis, J.A.: The Cybersecurity Workforce Gap. JSTOR (2019)
[7] Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From ChatGPT to ThreatGPT: impact of generative AI in cybersecurity and privacy. IEEE Access 11, 80218–80245 (2023)
[8] Hugging Face: MTEB Leaderboard (2023). https://huggingface.co/spaces/mteb/leaderboard. Accessed 1 Dec 2023
[9] Kaiser, F.K., Andris, L.J., Tennig, T.F., Iser, J.M., Wiens, M., Schultmann, F.: Cyber threat intelligence enabled automated attack incident response. In: 2022 3rd International Conference on Next Generation Computing Applications (NextComp), pp. 1–6. IEEE (2022)
[10] Kanakogi, K., et al.: Tracing CVE vulnerability information to CAPEC attack patterns using natural language processing techniques. Information 12(8), 298 (2021)
[11] Kuppa, A., Aouad, L., Le-Khac, N.A.: Linking CVE’s to MITRE ATT&CK techniques. In: Proceedings of the 16th International Conference on Availability, Reliability and Security, pp. 1–12 (2021)
[12] McKenna, N., Li, T., Cheng, L., Hosseini, M.J., Johnson, M., Steedman, M.: Sources of hallucination by large language models on inference tasks. arXiv preprint arXiv:2305.14552 (2023)
[13] Min, B., et al.: Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput. Surv. 56(2), 1–40 (2023)
[14] Ranade, P., Piplai, A., Joshi, A., Finin, T.: CyBERT: contextualized embeddings for the cybersecurity domain. In: 2021 IEEE International Conference on Big Data (Big Data), pp. 3334–3342. IEEE (2021)
[15] Roy, S., Panaousis, E., Noakes, C., Laszka, A., Panda, S., Loukas, G.: SoK: the MITRE ATT&CK framework in research and practice. arXiv preprint arXiv:2304.07411 (2023)
[16] Sarker, I.H., Furhad, M.H., Nowrozy, R.: AI-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput. Sci. 2, 1–18 (2021)
[17] VentureBeat: Mental Health: 66% of cybersecurity analysts experienced burnout this year (2023). https://venturebeat.com/security/mental-health-cybersecurity-analysts/. Accessed 19 July 2023
[18] Wåreus, E., Hell, M.: Automated CPE labeling of CVE summaries with machine learning. In: Maurice, C., Bilge, L., Stringhini, G., Neves, N. (eds.) DIMVA 2020. LNCS, vol. 12223, pp. 3–22. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52683-2_1
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Soltani, A., Nkashama, D.K., Masakuna, J.F., Frappier, M., Tardif, PM., Kabanza, F. (2024). Extended Abstract: Assessing Language Models for Semantic Textual Similarity in Cybersecurity. In: Maggi, F., Egele, M., Payer, M., Carminati, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2024. Lecture Notes in Computer Science, vol 14828. Springer, Cham. https://doi.org/10.1007/978-3-031-64171-8_19
DOI: https://doi.org/10.1007/978-3-031-64171-8_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64170-1
Online ISBN: 978-3-031-64171-8
eBook Packages: Computer Science, Computer Science (R0)