
Multi-level fine-tuning, data augmentation, and few-shot learning for specialized cyber threat intelligence

Published: 01 November 2023

Abstract

Gathering cyber threat intelligence from open sources is becoming increasingly important for achieving and maintaining a high level of security as systems grow larger and more complex. However, these open sources are often subject to information overload, so it is useful to apply machine learning models that condense the incoming information to what is strictly necessary. Previous studies and applications have shown, however, that existing classifiers cannot process information about emerging cybersecurity events, such as new malware names or novel attack contexts, because of their low generalisation capability. We therefore propose a system that overcomes this problem by training a new classifier for each new incident. Since standard training methods would require a large amount of labelled data, we combine three low-data regime techniques – transfer learning, data augmentation, and few-shot learning – to train a high-quality classifier from very few labelled instances. We evaluated our approach on a novel dataset derived from the Microsoft Exchange Server data breach of 2021, labelled by three experts. Our findings reveal an increase in F1 score of more than 21 points compared to standard training methods and more than 18 points compared to a state-of-the-art few-shot learning method. Furthermore, the classifier trained with this method on only 32 instances scores fewer than 5 F1 points below a classifier trained on 1,800 instances.
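
To make the three-stage idea concrete, below is a minimal sketch in Python, assuming PyTorch and the Hugging Face transformers library. The checkpoint name, the binary label set, the toy examples, and the augment() placeholder are illustrative assumptions, not the authors' actual models or augmentation strategy; the intermediate fine-tuning stage that the title's "multi-level fine-tuning" presumably refers to (adapting the base model to a broad cyber-threat corpus before the few-shot stage) is noted in comments but omitted to keep the sketch self-contained.

```python
# Hedged sketch of the low-data training pipeline: a pretrained encoder is
# adapted to a new incident from a handful of labelled texts. All names below
# (checkpoint, labels, examples) are illustrative placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

BASE = "bert-base-uncased"  # assumption: any BERT-style checkpoint; the paper's
                            # multi-level setup would fine-tune this on a broad
                            # cybersecurity corpus first (omitted here).
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=2)

def augment(text):
    """Stand-in for data augmentation: the paper uses stronger (generative)
    augmentation; here each instance is merely duplicated with a trivial
    perturbation to show where augmentation enters the pipeline."""
    return [text, text.lower()]

# Few-shot labelled data for the new incident (toy examples).
few_shot = [
    ("New Exchange zero-day actively exploited in the wild", 1),
    ("Quarterly all-hands meeting moved to Friday", 0),
]

texts, labels = [], []
for text, label in few_shot:
    for variant in augment(text):      # expand the tiny labelled set
        texts.append(variant)
        labels.append(label)

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = list(zip(enc["input_ids"], enc["attention_mask"], torch.tensor(labels)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

# Plain supervised fine-tuning stands in for the prompt-based few-shot methods
# the paper compares against; a few epochs suffice on this little data.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(10):
    for input_ids, attention_mask, y in loader:
        loss = model(input_ids=input_ids,
                     attention_mask=attention_mask,
                     labels=y).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The design point the sketch illustrates is the ordering: transfer learning supplies general (and, in the paper, domain-specific) knowledge, augmentation multiplies the few incident-specific labels, and only then is the classifier head fine-tuned, which is what makes 32 instances competitive with 1,800.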



Published In

Computers and Security, Volume 134, Issue C
November 2023, 485 pages

Publisher

Elsevier Advanced Technology Publications

United Kingdom

Publication History

Published: 01 November 2023

Author Tags

  1. Cyber threat intelligence
  2. Few-shot learning
  3. Transfer learning
  4. Data augmentation
  5. Information overload

Qualifiers

  • Research-article
