research-article

Anomaly Detection in Cybersecurity Datasets via Cooperative Co-evolution-based Feature Selection

Published: 04 February 2022

Abstract

Anomaly detection in big cybersecurity datasets is important but challenging and computationally expensive. Feature selection (FS) removes irrelevant and redundant features and selects a subset of features, which can improve the performance of machine learning algorithms; it is therefore an effective preprocessing step for anomaly detection techniques. The main objective of this article is to improve and quantify the accuracy and scalability of both supervised and unsupervised anomaly detection techniques. To this end, a novel FS-based anomaly detection approach, called Anomaly Detection Using Feature Selection (ADUFS), is introduced. Experimental analysis was performed on five benchmark cybersecurity datasets with and without feature selection, and the performance of both supervised and unsupervised anomaly detection techniques was investigated. The experimental results indicate that using a dataset with a reduced number of features, rather than the original dataset, yields better performance in terms of true positive rate (TPR) and false positive rate (FPR) than the existing techniques for anomaly detection. For example, with FS, the multilayer perceptron, a supervised anomaly detection technique, increased the TPR by over 200% and decreased the FPR by about 97% on the KDD99 dataset. Similarly, with FS, the local outlier factor, an unsupervised anomaly detection technique, increased the TPR by more than 40% and decreased the FPR by 15% and 36% on the Windows 7 and NSL-KDD datasets, respectively. In addition, all anomaly detection techniques require less computational time when using datasets with a suitable subset of features rather than the entire datasets. Furthermore, the performance results have been compared with six other state-of-the-art techniques based on a decision tree (J48).
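
The TPR and FPR figures quoted in the abstract are relative changes between a run on all features and a run on a selected subset of features. As a minimal sketch of how such figures can be computed: the labels below are invented toy values, not results from the article, and the helper names (`confusion`, `tpr`, `fpr`, `relative_change_pct`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts, treating label 1 as anomaly and 0 as normal."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(y_true, y_pred):
    """Count true/false positives and negatives from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def tpr(c):
    """True positive rate: share of real anomalies that were flagged."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def fpr(c):
    """False positive rate: share of normal records that were flagged."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


def relative_change_pct(before, after):
    """Percentage change from `before` to `after` (positive = increase)."""
    return (after - before) / before * 100.0


# Toy labels (assumption, for illustration only): 4 anomalies, 6 normal records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_all_features = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]  # detector on all features
pred_selected = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]      # detector after FS

c_all = confusion(y_true, pred_all_features)
c_sel = confusion(y_true, pred_selected)

print(f"TPR change: {relative_change_pct(tpr(c_all), tpr(c_sel)):+.1f}%")
print(f"FPR change: {relative_change_pct(fpr(c_all), fpr(c_sel)):+.1f}%")
```

Under this reading, a TPR increase of "over 200%" means the TPR with FS is more than three times the TPR without it, while an FPR decrease of "about 97%" means only about 3% of the original false positive rate remains.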

Published In

ACM Transactions on Management Information Systems, Volume 13, Issue 3
September 2022
312 pages
ISSN:2158-656X
EISSN:2158-6578
DOI:10.1145/3512349

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 February 2022
Accepted: 01 November 2021
Revised: 01 September 2021
Received: 01 October 2020
Published in TMIS Volume 13, Issue 3

Author Tags

  1. Anomaly detection
  2. feature selection
  3. cybersecurity
  4. Big Data
  5. cooperative co-evolution
  6. machine learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Edith Cowan University (ECU) Higher Degree by Research Scholarship (HDRS)
  • ECU School of Science Research Scholarship
