Enhancing network intrusion detection: a dual-ensemble approach with CTGAN-balanced data and weak classifiers

Mohammad Reza Abbaszadeh Bavil Soflaei¹,
Arash Salehpour² &
Karim Samadzamini¹

Abstract

With the expansion of the Internet, Internet of Things devices, and related services, effective intrusion detection systems are vital to cybersecurity. This study combines ensemble learning techniques with generative adversarial networks in a novel framework for network behavior classification on the UNSW-NB15 dataset. Like many real-world datasets, UNSW-NB15 is imbalanced, with significantly fewer instances of intrusion than of normal network behavior. Our main contribution to the existing literature is the use of a conditional tabular generative adversarial network (CTGAN) to address this imbalance. Previous approaches often overlooked the issue; the proposed framework achieves a substantial improvement in model performance by balancing the dataset. By training three shallow binary classification algorithms (decision trees, logistic regression, and Gaussian naive Bayes) on both the CTGAN-balanced data and the original imbalanced dataset, we observe marked improvements in identifying network intrusions. The study employs a two-stage label-wise ensembling process that ends in a final XGBoost meta-classifier. The framework reaches 98% accuracy for binary classification and 95% for multi-class classification, outperforming existing state-of-the-art models. By offering a robust framework for effective intrusion detection, this work marks a substantial step forward in addressing data imbalance in the UNSW-NB15 dataset.
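The pipeline outlined in the abstract (balance the training data, fit three weak classifiers, then combine their outputs with a boosted meta-classifier) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only to keep the example self-contained: the data are synthetic rather than UNSW-NB15, random oversampling of the minority class stands in for CTGAN, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and a standard stacking ensemble stands in for the two-stage label-wise procedure, which the abstract does not describe in detail. The helper names `oversample_minority` and `build_stack` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def oversample_minority(X, y, seed=0):
    """Assumed stand-in for CTGAN: resample minority rows until classes match."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts_X, parts_y = [X], [y]
    for label, count in zip(classes, counts):
        if count < target:
            # Draw the missing rows (with replacement) from this class only.
            idx = rng.choice(np.flatnonzero(y == label),
                             size=target - count, replace=True)
            parts_X.append(X[idx])
            parts_y.append(y[idx])
    return np.vstack(parts_X), np.concatenate(parts_y)


def build_stack(seed=0):
    """Three weak base learners feeding a boosted meta-classifier."""
    base = [
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=seed)),
        ("logreg", make_pipeline(StandardScaler(),
                                 LogisticRegression(max_iter=1000))),
        ("gnb", GaussianNB()),
    ]
    # Assumed stand-in for the XGBoost meta-classifier named in the abstract.
    meta = GradientBoostingClassifier(random_state=seed)
    return StackingClassifier(estimators=base, final_estimator=meta, cv=5)


# Synthetic imbalanced binary data (about 10% positives), not UNSW-NB15.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Balance only the training split; the test split keeps its original skew.
X_bal, y_bal = oversample_minority(X_train, y_train)

model = build_stack().fit(X_bal, y_bal)
accuracy = model.score(X_test, y_test)
print(f"class counts after balancing: {np.bincount(y_bal)}")
print(f"test accuracy: {accuracy:.3f}")
```

Balancing is applied after the train/test split so that the held-out data keep their original class ratio. A generative model such as CTGAN would produce new synthetic rows rather than duplicates, so results from this sketch say nothing about the figures reported in the abstract.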

Data availability

The dataset used in this study is UNSW-NB15, which is publicly available at https://www.kaggle.com/datasets/mrwellsdavid/unsw-nb15.

Author information

Authors and Affiliations

Department of Computer Engineering, University College of Nabi Akram, Tabriz, Iran
Mohammad Reza Abbaszadeh Bavil Soflaei & Karim Samadzamini
Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
Arash Salehpour

Contributions

This manuscript represents a collaborative effort where each author contributed significantly to its development.

Corresponding author

Correspondence to Arash Salehpour.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Soflaei, M.R.A.B., Salehpour, A. & Samadzamini, K. Enhancing network intrusion detection: a dual-ensemble approach with CTGAN-balanced data and weak classifiers. J Supercomput 80, 16301–16333 (2024). https://doi.org/10.1007/s11227-024-06108-7

Accepted: 25 March 2024
Published: 10 April 2024
Issue Date: July 2024
DOI: https://doi.org/10.1007/s11227-024-06108-7
