Extended Abstract: Assessing Language Models for Semantic Textual Similarity in Cybersecurity

Arian Soltani, DJeff Kanda Nkashama, Jordan Felicien Masakuna, Marc Frappier, Pierre-Martin Tardif & Froduald Kabanza

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14828)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

In light of the significant strides made by large language models (LLMs) in natural language processing (NLP) [5], our research evaluates and contrasts their proficiency in establishing associations within cybersecurity. Our experimental framework compares actual connections from various cybersecurity knowledge graphs (including MITRE CAPEC, D3FEND, and CVE connections to ATT&CK) against predictions made by LLMs using semantic textual similarity (STS). These connections span a broad spectrum, covering diverse abstractions of threat descriptions, attack patterns, defense strategies, and vulnerabilities. The language models chosen for this study are varied, comprising state-of-the-art models from STS leaderboards, LLMs (GPT-3.5 and PaLM), and ATTACK BERT [1], a cybersecurity domain-specific language model. Our experiments provide insights into the differences between language models and data sources, thereby facilitating the broader application of STS in cybersecurity.
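
The comparison described in the abstract, scoring candidate links by semantic textual similarity and checking them against known knowledge-graph connections, might be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the `embed` function is a toy bag-of-words stand-in for a real language model, the hit-rate-at-k metric is an assumed choice, and the identifiers and descriptions (`vuln-a`, `tech-1`, and so on) are invented placeholders rather than actual CVE or ATT&CK entries.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a language model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_targets(source_text, targets):
    """Return target ids sorted by decreasing similarity to source_text."""
    src = embed(source_text)
    scored = [(cosine(src, embed(text)), tid) for tid, text in targets.items()]
    # Ties are broken by target id so the ranking is deterministic.
    return [tid for _, tid in sorted(scored, key=lambda p: (-p[0], p[1]))]


def hit_rate_at_k(sources, targets, true_links, k):
    """Fraction of sources whose top-k ranked targets contain a true link."""
    hits = 0
    for sid, text in sources.items():
        top_k = rank_targets(text, targets)[:k]
        if true_links[sid] & set(top_k):
            hits += 1
    return hits / len(sources)


# Hypothetical descriptions and links, invented for illustration only.
sources = {
    "vuln-a": "attacker sends crafted phishing email with malicious attachment",
    "vuln-b": "weak password allows brute force login attempts",
}
targets = {
    "tech-1": "phishing email with malicious attachment delivered to user",
    "tech-2": "brute force password guessing against login",
    "tech-3": "data encrypted for impact by ransomware",
}
true_links = {"vuln-a": {"tech-1"}, "vuln-b": {"tech-2"}}

print(hit_rate_at_k(sources, targets, true_links, k=1))
```

In a realistic setting, `embed` would be replaced by a sentence-embedding model, and the ground-truth links would come from the knowledge graphs named in the abstract; the ranking and scoring logic would stay the same.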

References

1. Abdeen, B., Al-Shaer, E., Singhal, A., Khan, L., Hamlen, K.: SMET: semantic mapping of CVE to ATT&CK and its application to cybersecurity. In: Atluri, V., Ferrara, A.L. (eds.) DBSec 2023. LNCS, vol. 13942, pp. 243–260. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-37586-6_15
2. Aghaei, E., Niu, X., Shadid, W., Al-Shaer, E.: SecureBERT: a domain-specific language model for cybersecurity. In: Li, F., Liang, K., Lin, Z., Katsikas, S.K. (eds.) Security and Privacy in Communication Systems. LNICST, vol. 462, pp. 39–56. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-25538-0_3
3. Akbar, K.A., Halim, S.M., Hu, Y., Singhal, A., Khan, L., Thuraisingham, B.: Knowledge mining in cybersecurity: from attack to defense. In: Sural, S., Lu, H. (eds.) DBSec 2022. LNCS, vol. 13383, pp. 110–122. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-10684-2_7
4. Al-Hawawreh, M., Aljuhani, A., Jararweh, Y.: ChatGPT for cybersecurity: practical applications, challenges, and future directions. Clust. Comput. 26(6), 3421–3436 (2023)
5. Bubeck, S., et al.: Sparks of artificial general intelligence: early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023)
6. Crumpler, W., Lewis, J.A.: The Cybersecurity Workforce Gap. JSTOR (2019)
7. Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From ChatGPT to ThreatGPT: impact of generative AI in cybersecurity and privacy. IEEE Access 11, 80218–80245 (2023)
8. Huggingface: MTEB Leaderboard (2023). https://huggingface.co/spaces/mteb/leaderboard. Accessed 1 Dec 2023
9. Kaiser, F.K., Andris, L.J., Tennig, T.F., Iser, J.M., Wiens, M., Schultmann, F.: Cyber threat intelligence enabled automated attack incident response. In: 2022 3rd International Conference on Next Generation Computing Applications (NextComp), pp. 1–6. IEEE (2022)
10. Kanakogi, K., et al.: Tracing CVE vulnerability information to CAPEC attack patterns using natural language processing techniques. Information 12(8), 298 (2021)
11. Kuppa, A., Aouad, L., Le-Khac, N.A.: Linking CVE’s to MITRE ATT&CK techniques. In: Proceedings of the 16th International Conference on Availability, Reliability and Security, pp. 1–12 (2021)
12. McKenna, N., Li, T., Cheng, L., Hosseini, M.J., Johnson, M., Steedman, M.: Sources of hallucination by large language models on inference tasks. arXiv preprint arXiv:2305.14552 (2023)
13. Min, B., et al.: Recent advances in natural language processing via large pre-trained language models: a survey. ACM Comput. Surv. 56(2), 1–40 (2023)
14. Ranade, P., Piplai, A., Joshi, A., Finin, T.: CyBERT: contextualized embeddings for the cybersecurity domain. In: 2021 IEEE International Conference on Big Data (Big Data), pp. 3334–3342. IEEE (2021)
15. Roy, S., Panaousis, E., Noakes, C., Laszka, A., Panda, S., Loukas, G.: SoK: the MITRE ATT&CK framework in research and practice. arXiv preprint arXiv:2304.07411 (2023)
16. Sarker, I.H., Furhad, M.H., Nowrozy, R.: AI-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput. Sci. 2, 1–18 (2021)
17. Venturebeat: Mental Health: 66% of cybersecurity analysts experienced burnout this year (2023). https://venturebeat.com/security/mental-health-cybersecurity-analysts/. Accessed 19 July 2023
18. Wåreus, E., Hell, M.: Automated CPE labeling of CVE summaries with machine learning. In: Maurice, C., Bilge, L., Stringhini, G., Neves, N. (eds.) DIMVA 2020. LNCS, vol. 12223, pp. 3–22. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52683-2_1

Author information

Authors and Affiliations

GRIC, Université de Sherbrooke, Sherbrooke, QC, J1K 2R1, Canada
Arian Soltani, DJeff Kanda Nkashama, Jordan Felicien Masakuna, Marc Frappier, Pierre-Martin Tardif & Froduald Kabanza

Corresponding author

Correspondence to Arian Soltani.

Editor information

Editors and Affiliations

AWS, San Diego, CA, USA
Federico Maggi
Boston University, Boston, MA, USA
Manuel Egele
EPFL, Lausanne, Switzerland
Mathias Payer
Politecnico di Milano, Milan, Italy
Michele Carminati

About this paper

Cite this paper

Soltani, A., Nkashama, D.K., Masakuna, J.F., Frappier, M., Tardif, PM., Kabanza, F. (2024). Extended Abstract: Assessing Language Models for Semantic Textual Similarity in Cybersecurity. In: Maggi, F., Egele, M., Payer, M., Carminati, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2024. Lecture Notes in Computer Science, vol 14828. Springer, Cham. https://doi.org/10.1007/978-3-031-64171-8_19

DOI: https://doi.org/10.1007/978-3-031-64171-8_19
Published: 09 July 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64170-1
Online ISBN: 978-3-031-64171-8
eBook Packages: Computer Science, Computer Science (R0)
