Applications of Machine Learning in Cyber Security: A Review
Figure 2. Malware infection process and consequences; side effects denoted with dashed arrows.
Figure 3. Ransomware infection process and consequences; side effects denoted with dashed arrows.
Figure 4. Phishing infection process and consequences.
Figure 5. DDoS infection process and consequences.
Figure 6. SQL injection infection process and consequences.
Figure 7. Zero-day exploit infection process and consequences.
Figure 8. DNS tunneling infection process and consequences.
Figure 9. XSS attack infection process and consequences.
Figure 10. Social engineering-based infection process and consequences.
Figure 11. Bubble chart of datasets by year created and number of attack labels, with dataset names as annotations. Bubble size indicates the number of sub-groups or scenarios.
User perception of false negatives (FN) and false positives (FP) in IDSs and recommended resolutions.
Abstract
1. Introduction
2. Review Method
3. The Cyber Security Research Landscape
- Advanced Threat Detection and Response: New methods for identifying patterns and anomalies indicative of potential security threats enhance the ability to detect and respond to those threats [5].
- IoT security: The rapid integration of Internet of Things (IoT) devices into the digital ecosystem necessitates secure frameworks that mitigate their inherent vulnerabilities [6].
- The role of encryption: The evolution of encryption technologies, including the development of quantum-resistant algorithms, is critical for protecting data against future quantum-capable adversaries [9].
- Blockchain for security: Blockchain’s decentralized framework helps ensure the integrity and transparency of transactions and data exchanges in cyber security infrastructures [10], and it also supports identity management and secure communication.
- Cyber security awareness and training: Awareness and training are vital for reducing susceptibility to human error, which causes a significant proportion of security breaches [11].
- Incident response planning: Characterized by well-defined protocols and responsibilities, incident response planning demonstrates preparedness and resilience [12].
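To make the blockchain integrity point concrete, the minimal Python sketch below shows only the hash-linking idea: each block stores the SHA-256 hash of its predecessor, so altering any earlier record breaks verification. It deliberately omits decentralization, consensus, and digital signatures, and the function names (`block_hash`, `build_chain`, `verify_chain`) are hypothetical, chosen for illustration.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # conventional all-zero back-link for the first block


def block_hash(index, data, prev_hash):
    # Hash a canonical JSON encoding so dictionary key order cannot change the digest.
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to its predecessor through the predecessor's hash."""
    chain = []
    prev = GENESIS_PREV
    for i, data in enumerate(records):
        h = block_hash(i, data, prev)
        chain.append({"index": i, "data": data, "prev": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain):
    """Return True only if every stored hash and every back-link is consistent."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev:
            return False  # back-link broken: an earlier block was altered
        if block_hash(block["index"], block["data"], block["prev"]) != block["hash"]:
            return False  # block contents no longer match the stored hash
        prev = block["hash"]
    return True
```

Changing a record, e.g. `chain[0]["data"] = "alice->bob: 500"`, makes `verify_chain(chain)` return `False`. Recomputing that block's hash does not hide the change, because the next block's stored back-link no longer matches.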
3.1. Types of Cyber Security
3.1.1. Infrastructure and Network Security
- Physical security measures: biometric access control, security personnel, and surveillance systems that prevent unauthorized physical access [14].
- Virtual protection mechanisms: sophisticated intrusion detection systems, routine security assessments to identify and remediate vulnerabilities, and keeping software and hardware up to date [15].
- Redundancy and resilience: backup systems and alternative data routes that ensure service continuity [13].
- Firewalls: a line of defense between secure/internal and potentially unsafe/external networks [16].
- Two-Factor Authentication (2FA): requiring two forms of user identification before granting network access [19].
- Remote access management: restricting network access to authorized personnel, e.g., through a Virtual Private Network (VPN).
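2FA schemes vary; one common second factor is a time-based one-time password (TOTP) generated by an authenticator app. The following Python sketch follows RFC 4226 (HOTP) and RFC 6238 (TOTP) with their default parameters (HMAC-SHA1, 30 s time step, 6 digits). The function names and the ±1-step drift window are illustrative assumptions; a production system should rely on a vetted library and store secrets securely.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_second_factor(secret: bytes, submitted: str, at=None,
                         window: int = 1, step: int = 30) -> bool:
    """Accept the code for the current step or +/- `window` steps (clock drift)."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        candidate = totp(secret, now + drift * step, step)
        # Constant-time comparison avoids leaking matching prefixes via timing.
        if hmac.compare_digest(candidate, submitted):
            return True
    return False
```

With the RFC test secret `b"12345678901234567890"`, `hotp(secret, 0)` yields `"755224"` and `totp(secret, at=59, digits=8)` yields `"94287082"`, matching the published test vectors. The password is only the first factor; this code stands in for the second.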
3.1.2. Application, Information Security, and Human Factors
3.2. Cyber Security Attack Types
3.2.1. Malware
3.2.2. Ransomware
3.2.3. Phishing
3.2.4. DDoS
3.2.5. SQL Injections
3.2.6. Zero-Day Exploit
3.2.7. DNS Tunneling
3.2.8. XSS Attacks
3.2.9. Social Engineering
4. Dataset Availability and Assessment
- include audit logs and raw network data;
- provide a variety of modern attacks;
- represent realistic and diverse normal traffic;
- be labeled;
- comply with ethical AI principles and privacy protocols (e.g., GDPR);
- be accepted by the scientific community.
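Some of these criteria can be checked mechanically before a dataset is used. The sketch below assumes that each record carries a string label, with `None` marking an unlabeled record and a hypothetical `"normal"` label for benign traffic. It reports labeling coverage, whether normal traffic is present, the distinct attack classes, and the class imbalance ratio. The ethical, privacy, and community-acceptance criteria cannot be assessed this way.

```python
from collections import Counter


def audit_labels(labels, normal_label="normal"):
    """Summarize label coverage of a candidate IDS dataset.

    `labels` holds one entry per record; None marks an unlabeled record.
    The name of the benign class (`normal_label`) is an assumption.
    """
    total = len(labels)
    labeled = [y for y in labels if y is not None]
    counts = Counter(labeled)
    attack_classes = sorted(c for c in counts if c != normal_label)
    # Imbalance ratio: size of the largest class divided by the smallest one.
    ratio = max(counts.values()) / min(counts.values()) if counts else float("inf")
    return {
        "labeled_fraction": len(labeled) / total if total else 0.0,
        "has_normal_traffic": normal_label in counts,
        "attack_classes": attack_classes,
        "imbalance_ratio": ratio,
    }
```

For example, eight normal, four DoS, two SQL injection, and two unlabeled records give a labeled fraction of 0.875 and an imbalance ratio of 4.0.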
5. Intrusion Detection System Evaluation
- Explainability: whether IDS decisions can be easily explained to the user,
- Bias: whether class imbalance, multicollinearity, and outliers are handled effectively, since these ultimately affect the model’s ability to avoid false predictions,
- Robustness: evaluating the model against both attack and normal traffic and analyzing the repeatability of the outcomes,
- Efficiency: the model’s inference execution time, and whether it is reported.
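The robustness and efficiency criteria are commonly quantified with confusion-matrix rates and measured inference latency. The sketch below assumes binary labels (`1` for attack, `0` for normal traffic) and a hypothetical `predict` callable that scores a single sample. It is an illustration of the metrics, not an evaluation protocol prescribed by the reviewed studies.

```python
import time


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN), with `positive` marking the attack class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # benign traffic raised an alert
        elif t == positive:
            fn += 1  # attack went undetected
        else:
            tn += 1
    return tp, fp, tn, fn


def _safe_ratio(num, den):
    return num / den if den else 0.0


def ids_report(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "detection_rate": _safe_ratio(tp, tp + fn),       # recall on attacks
        "false_positive_rate": _safe_ratio(fp, fp + tn),  # false alarms on benign traffic
        "false_negative_rate": _safe_ratio(fn, fn + tp),  # missed attacks
        "precision": _safe_ratio(tp, tp + fp),
    }


def mean_inference_ms(predict, samples, repeats=3):
    """Average wall-clock latency per sample in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x in samples:
            predict(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))
```

Reporting false positive and false negative rates separately, rather than accuracy alone, matters because attack traffic is usually a small minority of the data, so a model can be highly accurate while missing most attacks.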
6. Shortcomings in Existing IDSs
6.1. False Identification
6.2. Ethical AI and Compliance Aspects
- Transparency: Transparency is essential in cyber security, and transparency about the model’s decision-making process is especially vital for explaining why the model raises an alert or makes a specific prediction. Explainable AI (XAI) techniques such as SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-Agnostic Explanations) can elucidate the inner workings of a model and provide insight into its threat detection behavior, supporting human understanding and trust. An illustrative case is the IBM Watson for CS (https://medium.com/trusted-ai/explainability-using-shap-in-ibm-watson-openscale-55548adedf38, accessed on 3 November 2024) project, which focused on making Watson’s detection approach transparent by using natural language processing to reveal real-time information for a batch of incidents. SHAP integration proved essential for explaining which features contributed to threat detection, making the model easier for cyber security teams to interpret.
- Fairness and bias mitigation: In cyber security, fairness means that ML models built for threat detection do not discriminate against any class of malicious activity or any potentially relevant data source from which threats might arise. For example, a team addressing bias in a cyber security dataset could use SMOTE (Synthetic Minority Over-sampling Technique) to obtain a more balanced dataset, allowing the system to recognize a larger and more diverse set of malicious threats and thus lowering the risk of biased detection. Microsoft has published extensively in this domain (https://www.microsoft.com/de-ch/ai/responsible-ai, accessed on 3 November 2024).
- Privacy and security: Privacy-preserving techniques such as differential privacy and federated learning help protect individual data. Google’s Federated Learning for Mobile Threat Detection epitomizes privacy-centric cyber security (https://research.google/pubs/federated-learning-for-mobile-keyboard-prediction-2/, accessed on 3 November 2024). This approach detects malware by training models directly on users’ devices rather than on centralized servers. Because training is distributed, detection happens without exposing sensitive data, striking a balance between security and privacy.
- Accountability with auditability: This can involve maintaining audit trails and auditing any changes in model performance. The EU’s AI4Cyber (https://ai4cyber.eu/, accessed on 3 November 2024) project team included an audit framework for cyber security models. This framework enabled regular reviews and impact evaluations to gauge model performance, to verify conformity with the EU GDPR, and to provide an accountable mechanism for handling complaints and problems relating to the model.
- Robustness and security against adversarial attacks: Models used in cyber security must also be resistant to adversarial attacks that aim to exploit vulnerabilities in the training data or the model structure. Such robustness can be achieved through robust training paradigms such as adversarial training. DARPA’s Guaranteeing AI Robustness against Deception project (https://www.darpa.mil/program/guaranteeing-ai-robustness-against-deception, accessed on 3 November 2024) is an example of a project that secures ML models against adversarial threats. In this project, models were trained on artificially generated cyber security data to prepare them for real-world malicious attacks.
- User-centric design and human oversight: Ethical AI in cyber security requires a user-centric design that prioritizes human oversight. A human-in-the-loop (HITL) approach allows human intervention when automation alone is insufficient, providing a layer of ethical decision-making. Cisco’s Umbrella security platform (https://umbrella.cisco.com/, accessed on 3 November 2024) implements HITL strategies effectively by flagging uncertain cases for human review.
- Compliance: Integrating frameworks such as the GDPR, the EU AI Act, and NIST’s AI Risk Management Framework helps ensure the lawful use and application of ethical AI. Periodic evaluations keep models compliant as regulations are amended. NIST’s AI Compliance for Federal Cyber Security project requires federal cyber security models to comply with the NIST (https://www.nist.gov/itl/ai-risk-management-framework, accessed on 3 November 2024) principles for AI ethics, fairness, and robustness, setting a template for cyber security teams throughout the U.S. to maintain policy compliance and establishing a standard for ethical AI across public-sector cyber security.
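The fairness bullet mentions SMOTE for rebalancing a skewed dataset. The pure-Python sketch below shows the core SMOTE-style interpolation: pick a minority sample, pick one of its k nearest minority neighbors, and place a synthetic point at a random position on the segment between them. The function name and the random choice of seed samples are illustrative simplifications; in practice, the `SMOTE` class from the imbalanced-learn library is the usual choice, and oversampling should be applied only to the training split so that synthetic points do not leak into evaluation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    `minority` is a list of numeric feature vectors from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority neighbors (Euclidean, excluding itself).
    neighbors = []
    for i, a in enumerate(minority):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(minority) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        a, b = minority[i], minority[j]
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, the new points stay within the region the minority class already occupies rather than duplicating existing records exactly, as naive random oversampling would.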
7. Open Questions
- RQ1: Which features used in ML/AI training can be considered sensitive data, and how can these be protected without losing utility?
- RQ2: Which datasets and ML approaches used for intrusion detection have been affected by unnecessarily high dimensionality or multicollinearity?
- RQ3: What methods can be used to detect and mitigate bias in IDS models?
- RQ4: Which of the modern approaches for transparency and explainability are useful for cyber security-relevant datasets and IDS models, and how should they be adapted?
- RQ5: How can we protect ML models for IDS from adversarial attacks, including evasion and poisoning attacks?
- RQ6: What are the legal and accountability implications of ML/AI-based decisions in IDS?
- RQ7: Which modern approaches in incremental learning can be adapted to enable IDS models to learn continuously and adapt to evolving threats without introducing new ethical or security issues?
- RQ8: What optimizations can be applied to IDS techniques to make them computationally efficient while offering acceptable trade-offs between accuracy and resource consumption?
- RQ9: What frameworks and certification processes can be developed to standardize ethical practices in ML/AI for IDS?
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- bin Zainuddin, A.A.; Sairin, H.; Mazlan, I.A.; Muslim, N.N.A.; Sabarudin, W.A.S.W. Enhancing IoT Security: A Synergy of Machine Learning, Artificial Intelligence, and Blockchain. Data Sci. Insights 2024, 2, 11.
- Mammeri, Z.Z. Introduction to Computer Security; Wiley Data and Cybersecurity: Hoboken, NJ, USA, 2024.
- Manikandan, V.; Raj, V.; Janakiraman, S.; Sivaraman, R.; Amirtharajan, R. Let wavelet authenticate and tent-map encrypt: A sacred connect against a secret nexus. Soft Comput. 2024, 28, 6839–6853.
- Hayagreevan, H.; Khamaru, S. Security of and by Generative AI platforms. arXiv 2024, arXiv:2410.13899.
- Mijwil, M.; Salem, I.E.; Ismaeel, M.M. The Significance of Machine Learning and Deep Learning Techniques in Cybersecurity: A Comprehensive Review. Iraqi J. Comput. Sci. Math. 2023, 4, 87–101.
- Alrawais, A.; Alhothaily, A.; Hu, C.; Cheng, X. Fog computing for the internet of things: Security and privacy issues. IEEE Internet Comput. 2017, 21, 34–42.
- Azam, N.; Michala, A.L.; Ansari, S.; Truong, N.B. Modelling Technique for GDPR-Compliance: Toward a Comprehensive Solution. In Proceedings of the GLOBECOM 2023—2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3300–3305.
- Kulesza, J.; Balleste, R. Cybersecurity and Human Rights in the Age of Cyberveillance; Rowman & Littlefield: Lanham, MD, USA, 2015.
- Chen, L.; Chen, L.; Jordan, S.; Liu, Y.K.; Moody, D.; Peralta, R.; Perlner, R.A.; Smith-Tone, D. Report on Post-Quantum Cryptography; US Department of Commerce, National Institute of Standards and Technology: Gaithersburg, MD, USA, 2016; Volume 12.
- Kshetri, N. Can blockchain strengthen the internet of things? IT Prof. 2017, 19, 68–72.
- Hadlington, L. Human factors in cybersecurity; Examining the link between Internet addiction, impulsivity, attitudes towards cybersecurity, and risky cybersecurity behaviours. Heliyon 2017, 3, e00346.
- Cichonski, P.; Millar, T.; Grance, T.; Scarfone, K. Computer security incident handling guide. NIST Spec. Publ. 2012, 800, 1–147.
- Sharma, S.; Mishra, N. Anomaly detection in Smart Traffic Light system using blockchain: Securing through proof of stake and machine learning. J. Auton. Intell. 2024, 7, 1087.
- Wisdom, D.D.; Vincent, O.R.; Igulu, K.; Hyacinth, E.A.; Christian, A.U.; Oduntan, O.E.; Hauni, A.G. Industrial IoT Security Infrastructures and Threats. In Communication Technologies and Security Challenges in IoT: Present and Future; Springer: Singapore, 2024; pp. 369–402.
- Tarab, H.I. Cyber-attack detection and identification using deep learning. Int. J. Comput. Artif. Intell. 2024, 5, 42–49.
- Swathi, G.C.; Kumar, G.K.; Kumar, A.S. Ensemble classification to predict botnet and its impact on IoT networks. Meas. Sensors 2024, 33, 101130.
- Buedi, E.D.; Ghorbani, A.A.; Dadkhah, S.; Ferreira, R.L. Enhancing EV Charging Station Security Using A Multi-dimensional Dataset: CICEVSE2024. Res. Sq. 2024.
- Lightbody, D.; Ngo, D.M.; Temko, A.; Murphy, C.C.; Popovici, E. Dragon_Pi: IoT Side-Channel Power Data Intrusion Detection Dataset and Unsupervised Convolutional Autoencoder for Intrusion Detection. Future Internet 2024, 16, 88.
- Murthy, A.; Asghar, M.R.; Tu, W. A lightweight Intrusion Detection for Internet of Things-based smart buildings. Secur. Priv. 2024, 7, e386.
- Nijim, M.; Kanumuri, V.; Al Aqqad, W.; Albataineh, H. Machine Learning Based Analysis of Cyber-Attacks Targeting Smart Grid Infrastructure. In Proceedings of the International Conference on Advances in Computing Research, Madrid, Spain, 3–5 June 2024; Springer: Cham, Switzerland, 2024; pp. 334–349.
- Pulimamidi, R. To enhance customer (or patient) experience based on IoT analytical study through technology (IT) transformation for E-healthcare. Meas. Sensors 2024, 33, 101087.
- Bolat-Akça, B.; Bozkaya, E. Digital twin-assisted intelligent anomaly detection system for Internet of Things. Ad Hoc Netw. 2024, 158, 103484.
- Sikorski, M.; Honig, A. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software; No Starch Press: San Francisco, CA, USA, 2012.
- Ucci, D.; Aniello, L.; Baldoni, R. Survey of machine learning techniques for malware analysis. Comput. Secur. 2019, 81, 123–147.
- Javaid, A.; Niyaz, Q.; Sun, W.; Alam, M. A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), New York, NY, USA, 3–5 December 2015; pp. 21–26.
- Savage, K.; Coogan, P.; Lau, H. The Evolution of Ransomware, Symantec Security Response; Symantec Corporation: Mountain View, CA, USA, 2015.
- Kharraz, A.; Robertson, W.; Balzarotti, D.; Bilge, L.; Kirda, E. Cutting the gordian knot: A look under the hood of ransomware attacks. In Proceedings of the Detection of Intrusions and Malware, and Vulnerability Assessment: 12th International Conference, DIMVA 2015, Milan, Italy, 9–10 July 2015; Springer: Cham, Switzerland, 2015; pp. 3–24.
- Richardson, R.; North, M.M. Ransomware: Evolution, mitigation and prevention. Int. Manag. Rev. 2017, 13, 10.
- Liska, A.; Gallo, T. Ransomware: Defending Against Digital Extortion; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2016.
- Hadnagy, C. Social Engineering: The Art of Human Hacking; John Wiley & Sons: Hoboken, NJ, USA, 2010.
- Collier, H.; Morton, C. Teenagers: A Social Media Threat Vector. In Proceedings of the International Conference on Cyber Warfare and Security, Johannesburg, South Africa, 26–27 March 2024; Volume 19, pp. 55–61.
- Hix, J.; Teng, J.; Juker, M.; Ryan, G. AI-Based Phishing Countermeasures; Embry-Riddle Aeronautical University, Prescott Campus: Prescott, AZ, USA, 2024.
- Adekunle, T.S.; Alabi, O.O.; Lawrence, M.O.; Ebong, G.N.; Ajiboye, G.O.; Bamisaye, T.A. The Use of AI to Analyze Social Media Attacks for Predictive Analytics. J. Comput. Theor. Appl. 2024, 2, 169–178.
- Ussatova, O.; Zhumabekova, A.; Karyukin, V.; Matson, E.T.; Ussatov, N. The development of a model for the threat detection system with the use of machine learning and neural network methods. Int. J. Innov. Res. Sci. Stud. 2024, 7, 863–877.
- Abu-Amara, F.; Hosani, R.A.; Tamimi, H.A.; Hamdi, B.A. Spreading cybersecurity awareness via gamification: Zero-day game. Int. J. Inf. Technol. 2024, 16, 2945–2953.
- Heartfield, R.; Loukas, G. A taxonomy of attacks and a survey of defence mechanisms for semantic social engineering attacks. ACM Comput. Surv. (CSUR) 2015, 48, 1–39.
- Mirkovic, J.; Reiher, P. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Comput. Commun. Rev. 2004, 34, 39–53.
- Kambourakis, G.; Kolias, C.; Stavrou, A. The mirai botnet and the iot zombie armies. In Proceedings of the MILCOM 2017—2017 IEEE Military Communications Conference (MILCOM), Baltimore, MD, USA, 23–25 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 267–272.
- Zekri, M.; El Kafhali, S.; Aboutabit, N.; Saadi, Y. DDoS attack detection using machine learning techniques in cloud computing environments. In Proceedings of the 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech), Rabat, Morocco, 24–26 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–7.
- Zargar, S.T.; Joshi, J.; Tipper, D. A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks. IEEE Commun. Surv. Tutor. 2013, 15, 2046–2069.
- Jemal, I.; Cheikhrouhou, O.; Hamam, H.; Mahfoudhi, A. Sql injection attack detection and prevention techniques using machine learning. Int. J. Appl. Eng. Res. 2020, 15, 569–580.
- Falor, A.; Hirani, M.; Vedant, H.; Mehta, P.; Krishnan, D. A deep learning approach for detection of SQL injection attacks using convolutional neural networks. In Proceedings of the Data Analytics and Management: ICDAM 2021, Polkowice, Poland, 26 June 2021; Springer: Singapore, 2022; Volume 2, pp. 293–304.
- Sabottke, C.; Suciu, O.; Dumitraș, T. Vulnerability disclosure in the age of social media: Exploiting twitter for predicting Real-World exploits. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15), Washington, DC, USA, 12–14 August 2015; pp. 1041–1056.
- Radhakrishnan, K.; Menon, R.R.; Nath, H.V. A survey of zero-day malware attacks and its detection methodology. In Proceedings of the TENCON 2019—2019 IEEE Region 10 Conference (TENCON), Kochi, India, 17–20 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 533–539.
- Farnham, G.; Atlasis, A. Detecting DNS tunneling. SANS Inst. Infosec Read. Room 2013, 9, 1–32.
- Zhang, R.; Zhang, Y.; Ren, K. Distributed privacy-preserving access control in sensor networks. IEEE Trans. Parallel Distrib. Syst. 2011, 23, 1427–1438.
- Abualghanam, O.; Alazzam, H.; Elshqeirat, B.; Qatawneh, M.; Almaiah, M.A. Real-time detection system for data exfiltration over DNS tunneling using machine learning. Electronics 2023, 12, 1467.
- Matti, E. Evaluation of Open Source Web Vulnerability Scanners and Their Techniques Used to Find SQL Injection and Cross-Site Scripting Vulnerabilities. Dissertation. 2021. Available online: https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177606 (accessed on 3 November 2024).
- Venkatesha, S.; Reddy, K.R.; Chandavarkar, B. Social engineering attacks during the COVID-19 pandemic. SN Comput. Sci. 2021, 2, 78.
- Granger, S. Social Engineering Fundamentals, Part I: Hacker Tactics. 2003. Available online: https://api.semanticscholar.org/CorpusID:110906298 (accessed on 3 November 2024).
- Wilson, M.; Hash, J. Building an information technology security awareness and training program. NIST Spec. Publ. 2003, 800, 1–39.
- Kus, D.; Wagner, E.; Pennekamp, J.; Wolsing, K.; Fink, I.B.; Dahlmanns, M.; Wehrle, K.; Henze, M. A False Sense of Security? Revisiting the State of Machine Learning-Based Industrial Intrusion Detection. In Proceedings of the 8th ACM on Cyber-Physical System Security Workshop, Nagasaki, Japan, 30 May 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 73–84.
- Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6.
- Sharafaldin, I.; Lashkari, A.H.; Hakak, S.; Ghorbani, A.A. Developing Realistic Distributed Denial of Service (DDoS) Attack Dataset and Taxonomy. In Proceedings of the 2019 International Carnahan Conference on Security Technology (ICCST), Chennai, India, 1–3 October 2019; pp. 1–8.
- Houda, Z.A.E.; Brik, B.; Khoukhi, L. “Why Should I Trust Your IDS?”: An Explainable Deep Learning Framework for Intrusion Detection Systems in Internet of Things Networks. IEEE Open J. Commun. Soc. 2022, 3, 1164–1176.
- Thakkar, A.; Lohiya, R. Fusion of statistical importance for feature selection in Deep Neural Network-based Intrusion Detection System. Inf. Fusion 2023, 90, 353–363.
- Satyanarayana, G.; Chatrapathi, K.S. Improving Intrusion Detection Performance with Genetic Algorithm-Based Feature Extraction and Ensemble Machine Learning Methods. Int. J. Intell. Syst. Appl. Eng. 2023, 11, 100–112.
- Yin, Y.; Jang-Jaccard, J.; Xu, W.; Singh, A.; Zhu, J.; Sabrina, F.; Kwak, J. IGRF-RFE: A hybrid feature selection method for MLP-based network intrusion detection on UNSW-NB15 dataset. J. Big Data 2023, 10, 15.
- Pinto, A.; Herrera, L.C.; Donoso, Y.; Gutierrez, J.A. Survey on Intrusion Detection Systems Based on Machine Learning Techniques for the Protection of Critical Infrastructure. Sensors 2023, 23, 2415.
- Thakkar, A.; Lohiya, R. A Review on Challenges and Future Research Directions for Machine Learning-Based Intrusion Detection System. Arch. Comput. Methods Eng. 2023, 30, 4245–4269.
- Thakkar, A.; Lohiya, R. A survey on intrusion detection system: Feature selection, model, performance measures, application perspective, challenges, and future research directions. Artif. Intell. Rev. 2022, 55, 453–563.
- Sarker, I. Deep Cybersecurity: A Comprehensive Overview from Neural Network and Deep Learning Perspective. SN Comput. Sci. 2021, 2, 154.
- Chan, J.Y.L.; Leow, S.M.H.; Bea, K.T.; Cheng, W.K.; Phoong, S.W.; Hong, Z.W.; Chen, Y.L. Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review. Mathematics 2022, 10, 1283.
- Boukerche, A.; Zheng, L.; Alfandi, O. Outlier Detection: Methods, Models, and Classification. ACM Comput. Surv. 2020, 53, 55.
- Kumar, P.; Bhatnagar, R.; Gaur, K.; Bhatnagar, A. Classification of imbalanced data: Review of methods and applications. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1099, 012077.
- Nabi, F.; Zhou, X. Enhancing Intrusion Detection Systems Through Dimensionality Reduction: A Comparative Study of Machine Learning Techniques for Cyber Security. Cyber Secur. Appl. 2024, 2, 100033.
- Zoghi, Z.; Serpen, G. UNSW-NB15 Computer Security Dataset: Analysis through Visualization. arXiv 2021, arXiv:2101.05067.
- Musleh, D.; Alotaibi, M.; Alhaidari, F.; Rahman, A.; Mohammad, R.M. Intrusion Detection System Using Feature Extraction with Machine Learning Algorithms in IoT. J. Sens. Actuator Netw. 2023, 12, 29.
- Dehlaghi-Ghadim, A.; Moghadam, M.H.; Balador, A.; Hansson, H. Anomaly Detection Dataset for Industrial Control Systems. arXiv 2023, arXiv:2305.09678.
- Kumar, A.; Sharma, I. CNN-based Approach for IoT Intrusion Attack Detection. In Proceedings of the 2023 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 23–25 March 2023; pp. 492–496.
- Subbiah, S.; Anbananthen, K.S.M.; Thangaraj, S.; Kannan, S.; Chelliah, D. Intrusion detection technique in wireless sensor network using grid search random forest with Boruta feature selection algorithm. J. Commun. Netw. 2022, 24, 264–273.
- Imanbayev, A.; Tynymbayev, S.; Odarchenko, R.; Gnatyuk, S.; Berdibayev, R.; Baikenov, A.; Kaniyeva, N. Research of Machine Learning Algorithms for the Development of Intrusion Detection Systems in 5G Mobile Networks and Beyond. Sensors 2022, 22, 9957.
- Moustafa, N.; Slay, J. The Significant Features of the UNSW-NB15 and the KDD99 Data Sets for Network Intrusion Detection Systems. In Proceedings of the 2015 4th International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), Kyoto, Japan, 5 November 2015; pp. 25–31.
- Siganos, M.; Radoglou-Grammatikis, P.; Kotsiuba, I.; Markakis, E.; Moscholios, I.; Goudos, S.; Sarigiannidis, P. Explainable AI-Based Intrusion Detection in the Internet of Things. In Proceedings of the 18th International Conference on Availability, Reliability and Security, Benevento, Italy, 29 August–1 September 2023; Association for Computing Machinery: New York, NY, USA, 2023.
- Bacevicius, M.; Paulauskaite-Taraseviciene, A. Machine Learning Algorithms for Raw and Unbalanced Intrusion Detection Data in a Multi-Class Classification Problem. Appl. Sci. 2023, 13, 7328.
- Hnamte, V.; Hussain, J. Dependable intrusion detection system using deep convolutional neural network: A Novel framework and performance evaluation approach. Telemat. Inform. Rep. 2023, 11, 100077.
- Hnamte, V.; Hussain, J. DCNNBiLSTM: An Efficient Hybrid Deep Learning-Based Intrusion Detection System. Telemat. Inform. Rep. 2023, 10, 100053.
- Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019.
- Strandberg, P.E.; Söderman, D.; Dehlaghi-Ghadim, A.; Leon, M.; Markovic, T.; Punnekkat, S.; Moghadam, M.H.; Buffoni, D. The Westermo network traffic data set. Data Brief 2023, 50, 109512.
- Yang, J.; Li, T.; Liang, G.; He, W.; Zhao, Y. A simple recurrent unit model based intrusion detection system with DCGAN. IEEE Access 2019, 7, 83286–83296.
- Dunmore, A.; Jang-Jaccard, J.; Sabrina, F.; Kwak, J. A Comprehensive Survey of Generative Adversarial Networks (GANs) in Cybersecurity Intrusion Detection. IEEE Access 2023, 11, 76071–76094.
- Ho, C.Y.; Lai, Y.C.; Chen, I.W.; Wang, F.Y.; Tai, W.H. Statistical analysis of false positives and false negatives from real traffic with intrusion detection/prevention systems. IEEE Commun. Mag. 2012, 50, 146–154.
- Pietraszek, T.; Tanner, A. Data mining and machine learning—Towards reducing false positives in intrusion detection. Inf. Secur. Tech. Rep. 2005, 10, 169–183.
- Ohta, S.; Kurebayashi, R.; Kobayashi, K. Minimizing false positives of a decision tree classifier for intrusion detection on the internet. J. Netw. Syst. Manag. 2008, 16, 399–419.
- Pietraszek, T. Using adaptive alert classification to reduce false positives in intrusion detection. In Proceedings of the Recent Advances in Intrusion Detection: 7th International Symposium, RAID 2004, Sophia Antipolis, France, 15–17 September 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 102–124.
- Hachmi, F.; Boujenfa, K.; Limam, M. Enhancing the accuracy of intrusion detection systems by reducing the rates of false positives and false negatives through multi-objective optimization. J. Netw. Syst. Manag. 2019, 27, 93–120.
- Jose, J.; Jose, D.V. AS-CL IDS: Anomaly and signature-based CNN-LSTM intrusion detection system for internet of things. Int. J. Adv. Technol. Eng. Explor. 2023, 10, 1622–1639.
- Al Jallad, K.; Aljnidi, M.; Desouki, M.S. Anomaly detection optimization using big data and deep learning to reduce false-positive. J. Big Data 2020, 7, 68.
- Latah, M.; Toker, L. Minimizing false positive rate for DoS attack detection: A hybrid SDN-based approach. ICT Express 2020, 6, 125–127.
- Pitre, P.; Gandhi, A.; Konde, V.; Adhao, R.; Pachghare, V. An intrusion detection system for zero-day attacks to reduce false positive rates. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Vij, C.; Saini, H. Intrusion detection systems: Conceptual study and review. In Proceedings of the 2021 6th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 694–700.
- Azeez, N.A.; Bada, T.M.; Misra, S.; Adewumi, A.; Van der Vyver, C.; Ahuja, R. Intrusion detection and prevention systems: An updated review. In Data Management, Analytics and Innovation: Proceedings of ICDMAI 2019, Volume 1; Springer: Singapore, 2020; pp. 685–696.
- Shin, Y.; Kim, K. Comparison of anomaly detection accuracy of host-based intrusion detection systems based on different machine learning algorithms. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 33.
- Laghrissi, F.; Douzi, S.; Douzi, K.; Hssina, B. IDS-attention: An efficient algorithm for intrusion detection systems using attention mechanism. J. Big Data 2021, 8, 149.
- Jiang, Y.; Atif, Y. A selective ensemble model for cognitive cybersecurity analysis. J. Netw. Comput. Appl. 2021, 193, 103210.
- Alkhudaydi, O.A.; Krichen, M.; Alghamdi, A.D. A deep learning methodology for predicting cybersecurity attacks on the internet of things. Information 2023, 14, 550.
- Alahmadi, B.A.; Axon, L.; Martinovic, I. 99% false positives: A qualitative study of SOC analysts’ perspectives on security alarms. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 2783–2800.
- Al-Shehari, T.; Rosaci, D.; Al-Razgan, M.; Alfakih, T.; Kadrie, M.; Afzal, H.; Nawaz, R. Enhancing Insider Threat Detection in Imbalanced Cybersecurity Settings Using the Density-Based Local Outlier Factor Algorithm. IEEE Access 2024, 12, 34820–34834.

Type | Role | Methods |
---|---|---|
Infrastructure security | Protects infrastructure, such as power networks and data centers, and ensures that no security gaps remain | Physical security–virtual security–redundancy/resilience |
Network security | Protects networks from intrusions using tools such as intrusion detection and prevention systems (IDPS), remote access management (AC), two-factor authentication (2FA), and firewalls | Firewalls–IDPS–2FA–AC |
Application security | Uses complex, obfuscated code to protect and encrypt data and code so that they are difficult to crack | Security by design (operating system, embedded, application) |
Information security | Protects data from unauthorized access and modification | Database and communication encryption–AC |
User education | Safeguards all of the above systems by reducing human-error factors, especially those related to granting access | |

Threat Type | Level of Threat | Sophistication | Potential Impact | Mitigation Complexity |
---|---|---|---|---|
Malware | Moderate to high | Moderate | High | Moderate to high |
Ransomware | Moderate to high | Moderate | High | Moderate to high |
Phishing | High | Moderate | High | Moderate |
Distributed Denial of Service (DDoS) | High | High | Very high | Very high |
SQL Injection | High | Moderate | Very high | Moderate to high |
Zero-day exploits | Very high | Very high | Very high | Very high |
Domain Name System (DNS) tunneling | Moderate to high | High | Moderate to high | High |
Cross-Site Scripting (XSS) | Moderate to high | Moderate | Moderate to high | Moderate |
Social engineering | High | Variable | High | Moderate to high |
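
The ratings above are qualitative. One way to compare threats programmatically is to encode each rating on an ordinal scale and rank threats by the mean encoded value. The numeric scale in this sketch (Moderate = 2, Moderate to high = 2.5, High = 3, Very high = 4) is a hypothetical choice made for illustration and is not part of the review; "Variable" has no fixed position, so it is skipped. The ratings themselves are transcribed from the table, and the names `SCALE`, `THREATS`, `score`, and `rank_threats` are illustrative.

```python
from statistics import mean
from typing import Optional

# HYPOTHETICAL ordinal encoding of the qualitative ratings (an assumption for
# illustration, not taken from the review). "Variable" has no fixed position.
SCALE: dict[str, Optional[float]] = {
    "Moderate": 2.0,
    "Moderate to high": 2.5,
    "High": 3.0,
    "Very high": 4.0,
    "Variable": None,
}

# (level of threat, sophistication, potential impact, mitigation complexity),
# transcribed from the threat table.
THREATS: dict[str, tuple[str, str, str, str]] = {
    "Malware": ("Moderate to high", "Moderate", "High", "Moderate to high"),
    "Ransomware": ("Moderate to high", "Moderate", "High", "Moderate to high"),
    "Phishing": ("High", "Moderate", "High", "Moderate"),
    "DDoS": ("High", "High", "Very high", "Very high"),
    "SQL injection": ("High", "Moderate", "Very high", "Moderate to high"),
    "Zero-day exploits": ("Very high", "Very high", "Very high", "Very high"),
    "DNS tunneling": ("Moderate to high", "High", "Moderate to high", "High"),
    "XSS": ("Moderate to high", "Moderate", "Moderate to high", "Moderate"),
    "Social engineering": ("High", "Variable", "High", "Moderate to high"),
}


def score(ratings: tuple[str, ...]) -> float:
    """Mean of the encoded ratings, skipping ratings with no fixed value."""
    values = [v for r in ratings if (v := SCALE[r]) is not None]
    return mean(values)


def rank_threats() -> list[tuple[str, float]]:
    """Threats sorted from highest to lowest mean encoded rating (stable for ties)."""
    scored = [(name, score(ratings)) for name, ratings in THREATS.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_threats():
        print(f"{name:20s} {value:.3f}")
```

Under this assumed scale, zero-day exploits (4.0) and DDoS (3.5) rank highest and XSS (2.25) lowest. A different scale, or weighting potential impact above sophistication, can reorder the middle of the list, so the output is a sorting aid rather than a risk score.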

Modeling Approach | Prevalence (%) |
---|---|
SVM | 20.83 |
RNN | 8.33 |
Regression methods | 4.17 |
Isolation forest, random forest, XGBoost | 19.44 |
Autoencoders | 2.78 |
Unspecified classification | 9.72 |
Digital twins | 1.39 |
Multiobjective optimization | 1.39 |
Hybrid models | 18.06 |
Decision trees | 13.89 |
KNN | 4.17 |
Generative AI | 4.17 |
Ensemble methods | 15.28 |
Bayesian networks | 5.56 |
NN | 9.72 |
CNN | 18.06 |
DNN | 25 |
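
The prevalence values sum to about 182%, so they are not shares of a single partition; the likeliest reading is that one study can use several modeling approaches. Every value in the column equals k/72 × 100 for some integer k, rounded to two decimals, which suggests (but does not confirm) a pool of 72 reviewed studies. The sketch below recovers the implied per-approach counts under that assumed denominator. `N_STUDIES` and `implied_count` are illustrative names, and the denominator is an inference, not a figure stated in the table.

```python
import math

# Prevalence (%) per modeling approach, transcribed from the table above.
PREVALENCE: dict[str, float] = {
    "SVM": 20.83,
    "RNN": 8.33,
    "Regression methods": 4.17,
    "Isolation, random forest, XGBoost": 19.44,
    "Autoencoders": 2.78,
    "Unspecified classification": 9.72,
    "Digital twins": 1.39,
    "Multiobjective optimization": 1.39,
    "Hybrid models": 18.06,
    "Decision trees": 13.89,
    "KNN": 4.17,
    "Generative AI": 4.17,
    "Ensemble methods": 15.28,
    "Bayesian networks": 5.56,
    "NN": 9.72,
    "CNN": 18.06,
    "DNN": 25.0,
}

# ASSUMPTION: inferred number of reviewed studies; not stated in the table.
N_STUDIES = 72


def implied_count(pct: float, n: int = N_STUDIES) -> int:
    """Integer k such that k / n * 100 rounds to pct at two decimals."""
    k = round(pct * n / 100)
    # Reject the denominator if no integer count reproduces the published value.
    if not math.isclose(k / n * 100, pct, abs_tol=0.005):
        raise ValueError(f"{pct}% is not a multiple of 100/{n}")
    return k


COUNTS = {name: implied_count(pct) for name, pct in PREVALENCE.items()}

if __name__ == "__main__":
    total = sum(COUNTS.values())
    print(f"approach mentions: {total}")
    print(f"column total: {sum(PREVALENCE.values()):.2f}%")
    print(f"mean approaches per study: {total / N_STUDIES:.2f}")
```

Under the assumed denominator, the table corresponds to 131 approach mentions, roughly 1.8 per study, which accounts for the column totalling about 182% rather than 100%. A `ValueError` would indicate that a candidate denominator cannot reproduce one of the published percentages.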

Dataset | Attack Diversity | Realism | Balance | Quality | Size and Complexity |
---|---|---|---|---|---|
UNSW-NB15 | | | Good, some imbalanced classes | | Large and complex |
NSL-KDD | | | Improved balance and reduced redundancy | | Manageable |
CICIDS2017/2018 | | | Generally good, some imbalance | | Very large |
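
The Balance column is qualitative. A common way to make it concrete is to compute each class's share of the records and the imbalance ratio (majority-class count divided by minority-class count) from a dataset's label column. The sketch below does this with `collections.Counter`; the function name `class_balance` and the label list in the usage example are hypothetical and are not drawn from UNSW-NB15, NSL-KDD, or CICIDS2017/2018.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def class_balance(labels: Iterable[Hashable]) -> tuple[dict[Hashable, float], float]:
    """Return per-class shares and the majority/minority imbalance ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels supplied")
    total = sum(counts.values())
    # Share of all records that belong to each class.
    shares = {label: n / total for label, n in counts.items()}
    # A ratio of 1.0 means perfectly balanced classes; larger means more skew.
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio


if __name__ == "__main__":
    # Hypothetical labels: 90 normal flows, 8 DoS flows, 2 probe flows.
    labels = ["normal"] * 90 + ["dos"] * 8 + ["probe"] * 2
    shares, ratio = class_balance(labels)
    print(shares)  # {'normal': 0.9, 'dos': 0.08, 'probe': 0.02}
    print(ratio)   # 45.0
```

A high ratio such as the 45.0 in this made-up example is the kind of skew summarized as "some imbalance" above; it usually calls for stratified splits, class weighting, or per-class metrics rather than overall accuracy.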

Dataset | Audit Logs & Raw Data | Modern Attacks | Real or Simulated | Labelled | AI & GDPR Compliant | Scientifically Accepted |
---|---|---|---|---|---|---|
UNSW-NB15 | Yes (raw packet data, audit logs unclear) | Partially (2015) | Simulated (generated in a controlled environment) | Yes | Presumed yes (anonymized) | Yes (widely used) |
NSL-KDD | Yes (raw packet data, audit logs unclear) | No (1999) | Simulated (attacks injected into normal traffic) | Yes | Presumed yes (derived from KDD’99; privacy concerns addressed) | Yes (still a reference dataset in the community) |
CIC datasets | Yes | Yes (2017/19) | Simulated (attacks emulating real-world situations) | Yes | Yes (anonymized) | Yes (recently gaining acceptance, but of limited applicability) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Vourganas, I.J.; Michala, A.L. Applications of Machine Learning in Cyber Security: A Review. J. Cybersecur. Priv. 2024, 4, 972-992. https://doi.org/10.3390/jcp4040045