Abstract
The first step in an organization's incident response plan is to establish whether a reported event is in fact an incident. This is not an easy task, especially for a novel event that has not previously been documented. Classifying a novel event typically involves consulting a database of events with similar keywords and then relying on a subjective human decision. Efforts have been made to categorize events, but there is no universal list of all possible incidents because each incident can be described in many different ways. In this paper we propose automating the process of receiving and classifying an event, based on the assumption that the main difference between an event and an incident in the field of security is that an event is a positive or neutral occurrence, whereas an incident has strictly negative connotations. We applied sentiment analysis to event reports from the RISI dataset, and the results supported our assumption. We further observed that the sentiment analysis score and magnitude parameters of similar incidents were also very similar. We therefore used them as features in a machine learning model, along with other features obtained from each report such as impact and duration, to predict the likelihood that an event is an incident. We found that using sentiment analysis as a feature of the model increases its accuracy, precision, and recall by at least 10%. Our approach differs from the typical incident classification approach in two ways: we train the system to recognize incidents before any incident actually takes place, and the system can recognize incidents even if their descriptions do not include keywords it has previously encountered.
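As a rough illustration of the idea summarized above, the following minimal sketch trains a plain logistic regression on four features per report: sentiment score, sentiment magnitude, impact, and duration. Everything specific in it is an assumption made for illustration only: the feature values are invented toy numbers (not RISI data), the scaling of impact and duration to the range 0 to 1 is hypothetical, the function names are hypothetical, and the sentiment values are assumed to come from a separate sentiment analysis step that is not shown. It is not a reproduction of the model used in the paper.

```python
import math

# Toy, invented feature rows: (sentiment_score, sentiment_magnitude, impact, duration).
# Score is assumed to lie in [-1, 1]; impact and duration are assumed scaled to [0, 1].
FEATURES = [
    (-0.8, 1.6, 0.9, 0.7),
    (-0.6, 1.2, 0.7, 0.5),
    (-0.7, 2.0, 0.8, 0.9),
    (-0.5, 0.9, 0.6, 0.4),
    (0.4, 0.8, 0.1, 0.1),
    (0.0, 0.2, 0.2, 0.1),
    (0.6, 1.1, 0.1, 0.2),
    (0.1, 0.3, 0.3, 0.2),
]
# Hypothetical labels: 1 = incident, 0 = ordinary event.
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def incident_probability(row, weights, bias):
    """Estimated probability that a report with these features is an incident."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(rows, labels):
            error = incident_probability(row, weights, bias) - label
            for i in range(n_features):
                grad_w[i] += error * row[i]
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
    return weights, bias


weights, bias = train_logistic_regression(FEATURES, LABELS)

# Score two unseen, equally invented reports.
negative_report = (-0.7, 1.4, 0.8, 0.6)
neutral_report = (0.5, 0.95, 0.1, 0.15)
print("negative report:", round(incident_probability(negative_report, weights, bias), 3))
print("neutral report:", round(incident_probability(neutral_report, weights, bias), 3))
```

Because the classifier works on numeric features rather than on matched keywords, a report can be scored even when its wording has not been seen before, which is the property the abstract emphasizes. Whether sentiment features actually help on real reports is an empirical question that this toy data cannot answer.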
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ibrishimova, M.D., Li, K.F. (2018). Automating Incident Classification Using Sentiment Analysis and Machine Learning. In: Traore, I., Woungang, I., Ahmed, S., Malik, Y. (eds) Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments. ISDDC 2018. Lecture Notes in Computer Science, vol 11317. Springer, Cham. https://doi.org/10.1007/978-3-030-03712-3_5
DOI: https://doi.org/10.1007/978-3-030-03712-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03711-6
Online ISBN: 978-3-030-03712-3
eBook Packages: Computer Science, Computer Science (R0)