Abstract
Ever since the internet became part of the everyday lives of humans providing network security has been considered of utmost importance. Over the years lot of time and energy has been devoted by people in the research community and industry to provide better, improved and secure mechanisms to ensure secure communications on the internet. Amongst the many fields of study, the most prominent and ever evolving one has been the study of network traffic for attack detection and mitigation. The advent of new technologies has led to an increase in the pace of network based attacks and therefore novel modified approaches are needed to be able to cope with these latest trends. Distributed machine learning with the development of new tools and frameworks like RDD structure in Apache Spark provides an immense scope of growth in this direction. Moreover, the dynamic nature of present day network traffic called concept drift has also necessitated studying solutions from a different angle. We, therefore, in this paper have worked on distributed machine learning based ensemble techniques to detect the presence of concept drift in network traffic and detect network based attacks. The work has been done in three parts. Firstly, two classifiers, namely, Random Forest and Logistic Regression have been used as level ‘0′ learners and Support Vector Machine has been used as level ‘1′ learner. Secondly, to handle the process of concept drift we have used a sliding window based K-means clustering. And thirdly ensemble based techniques for detection of attacks in the traffic. The experiments have been performed on three datasets, namely, the NSL-KDD dataset, the CIDDS-2017 dataset and generated Testbed dataset. These tests have been conducted on different machines by varying the number of executor cores to study time latency in a distributed environment. An accuracy of 93% on NSL-KDD, 98% on CIDDS-2017 and 97% on Testbed datasets for SVM based blending model have been achieved.
Similar content being viewed by others
References
Choras, M., Kozik, R., Bruna, M.P.T., Yautsiukhin, A., Churchill, A., Maciejewska, I., Eguinoa, I., Jomni, A.: Comprehensive approach to increase cyber security and resilience. In: 2015 10th International Conference on Availability, Reliability and Security, pp. 686–692 (2015). https://doi.org/10.1109/ARES.2015.30
Zlomislić, V., Fertalj, K., Sruk, V.: Denial of service attacks, defences and research challenges. Clust. Comput. 20(1), 661–671 (2017). https://doi.org/10.1007/s10586-017-0730-x
Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K.: Network anomaly detection: methods, systems and tools. IEEE Commun. Surv. Tutor. 16(1), 303–336 (2013). https://doi.org/10.1109/SURV.2013.052213.00046
Jain, M., Kaur, G.: A study of feature reduction techniques and classification for network anomaly detection. J. Comput. Inf. Technol. 27(4), 1–16 (2019). https://doi.org/10.20532/cit.2019.1004591
Yang, C.: Anomaly network traffic detection algorithm based on information entropy measurement under the cloud computing environment. Clust. Comput. 22(4), 8309–8317 (2019). https://doi.org/10.1007/s10586-018-1755-5
Rajendran, R., Kumar, S.S., Palanichamy, Y., Arputharaj, K.: Detection of DoS attacks in cloud networks using intelligent rule based classification system. Clust. Comput. 22(1), 423–434 (2019). https://doi.org/10.1007/s10586-018-2181-4
Satam, P., Alipour, H., Al-Nashif, Y.B., Hariri, S.: Anomaly behavior analysis of DNS protocol. J. Internet Serv. Inf. Secur. 5(4), 85–97 (2015). https://doi.org/10.22667/JISIS.2015.11.31.085
Jaber, A.N., Rehman, S.U.: FCM–SVM based intrusion detection system for cloud computing environment. Clust. Comput. 23, 3221–3231 (2020). https://doi.org/10.1007/s10586-020-03082-6
Bhuvaneswari Amma, N.G., Selvakumar, S.: A statistical class center based triangle area vector method for detection of denial of service attacks. Clust. Comput. (2020). https://doi.org/10.1007/s10586-020-03120-3
Pacheco, J., Hariri, S.: Anomaly behavior analysis for IoT sensors. Trans. Emerg. Telecommun. Technol. 29(4), e3188 (2018). https://doi.org/10.1002/ett.3188
Stiawan, D., Heryanto, A., Berdadi, A., Rini, D.P., Subroto, I.M.I., Idris, M.Y., Abdullah, A.H., Kerim, B., Budiarto, R.: An approach for optimizing ensemble intrusion detection systems. IEEE Access 9, 6930–6947 (2020). https://doi.org/10.1109/ACCESS.2020.3046246
Prasad, K.M., Reddy, A.R.M., Rao, K.V.: Defad: ensemble classifier for DDoS enabled flood attack defense in distributed network environment. Clust. Comput. 21(4), 1765 (2018). https://doi.org/10.1007/s10586-018-2808-5
Choras, M., Wozniak, M.: Concept drift analysis for improving anomaly detection systems in cybersecurity. In: Central European Cybersecurity Conference, CECC, pp. 35–42 (2017). https://doi.org/10.18690/978-961-286-114-8.3
Karimi, A.M., Niyaz, Q., Sun, W., Javaid, A.Y., Devabhaktuni, V.K.: Distributed network traffic feature extraction for a real-time IDS. In: 2016 IEEE International Conference on Electro Information Technology (EIT), pp. 0522–0526 (2016). https://doi.org/10.1109/EIT.2016.7535295
Kato, K., Klyuev, V.: Development of a network intrusion detection system using Apache Hadoop and Spark. In: 2017 IEEE Conference on Dependable and Secure Computing, pp. 416–423 (2017). https://doi.org/10.1109/DESEC.2017.8073860
Csaba, B.: Processing intrusion data with machine learning and MapReduce. Acad. Appl. Res. Mil. Sci. 16(1), 37–52 (2017). https://folyoirat.ludovika.hu/index.php/aarms/article/view/1612
Apache Spark RDD. https://spark.apache.org/docs/latest/rdd-programming-guide.html. Accessed 3 May 2020
Jain, M., Kaur, G.: A novel distributed semi-supervised approach for detection of network based attacks. In: 2019 9th International Conference on Cloud Computing, Data Science and Engineering (Confluence), pp. 120–125 (2019). https://doi.org/10.1109/CONFLUENCE.2019.8776616
Resende, P.A.A., Drummond, A.C.: HTTP and contact-based features for Botnet detection. Secur. Priv. 1(5), e41 (2018). https://doi.org/10.1002/spy2.41
Gupta, G.P., Kulariya, M.: A framework for fast and efficient cyber security network intrusion detection using Apache Spark. Procedia Comput. Sci. 93, 824–831 (2016). https://doi.org/10.1016/j.procs.2016.07.238
Alnafessah, A., Casale, G.: Artificial neural networks based techniques for anomaly detection in Apache Spark. Clust. Comput. 23(2), 1345–1360 (2020). https://doi.org/10.1007/s10586-019-02998-y
NSL-KDD. http://nsl.cs.unb.ca/NSL-KDD/. Accessed 29 Apr 2020
CIDDS-2017 (2017). https://www.hs-coburg.de/forschung/forschungsprojekte-oeffentlich/informationstechnologie/cidds-coburg-intrusion-detection-data-sets.html. Accessed 25 Apr 2020
Kaur, G., Jain, M.: A comparison of two blending-based ensemble techniques for network anomaly detection in Spark distributed environment. Int. J. Ad Hoc Ubiquitous Comput. 35(2), 71–83 (2020). https://doi.org/10.1504/IJAHUC.2020.109794
Aburomman, A.A., Reaz, M.B.I.: A survey of intrusion detection systems based on ensemble and hybrid classifiers. Comput. Secur. 65, 135–152 (2017). https://doi.org/10.1016/j.cose.2016.11.004
Tama, B.A., Comuzzi, M., Rhee, K.H.: TSE-IDS: a two-stage classifier ensemble for intelligent anomaly based intrusion detection system. IEEE Access 7, 94497–94507 (2019). https://doi.org/10.1109/ACCESS.2019.2928048
Rajagopal, S., Kundapur, P.P., Hareesha, K.S.: A stacking ensemble for network intrusion detection using heterogeneous datasets. Secur. Commun. Netw. (2020). https://doi.org/10.1155/2020/4586875
Spinosa, E.J., de Leon, F., de Carvalho, A.P., Gama, J.: Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In: Proceedings of the 2008 ACM Symposium on Applied Computing, pp. 976–980 (2008). https://doi.org/10.1145/1363686.1363912
Wankhade, K.K., Dongre, S.S.: A new adaptive ensemble boosting classifier for concept drifting stream data. Int. J. Model. Opt. 2(4), 493 (2012). https://doi.org/10.7763/IJMO.2012.V2.169
Yuan, X., Wang, R., Zhuang, Y., Zhu, K., Hao, J.: A concept drift based ensemble incremental learning approach for intrusion detection. In: 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 350–357 (2018). https://doi.org/10.1109/Cybermatics_2018.2018.00087
Hameed, S., Ali, U.: HADEC: Hadoop-based live DDoS detection framework. EURASIP J. Inf. Secur. 1, 1–9 (2018). https://doi.org/10.1186/s13635-018-0081-z
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008). https://doi.org/10.1145/1327452.1327492
Kozik, R., Choraś, M.: Pattern extraction algorithm for netflow-based botnet activities detection. Secur. Commun. Netw. (2017). https://doi.org/10.1155/2017/6047053
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Jain, M., Kaur, G. Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data. Cluster Comput 24, 2099–2114 (2021). https://doi.org/10.1007/s10586-021-03249-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10586-021-03249-9