DOI: 10.1145/3299815.3314439
research-article

Intrusion Detection Using Big Data and Deep Learning Techniques

Published: 18 April 2019

Abstract

This paper integrates Big Data and Deep Learning techniques to improve the performance of intrusion detection systems. Three classifiers are used to classify network traffic datasets: a Deep Feed-Forward Neural Network (DNN) and two ensemble techniques, Random Forest and Gradient Boosting Tree (GBT). To select the most relevant attributes from the datasets, we use a homogeneity metric to evaluate features. Two recently published datasets, UNSW-NB15 and CICIDS2017, are used to evaluate the proposed method, and 5-fold cross-validation is used to evaluate the machine learning models. We implemented the method in the distributed computing environment Apache Spark: the deep learning technique is implemented with the Keras Deep Learning Library integrated with Spark, while the ensemble techniques are implemented with the Apache Spark Machine Learning Library. On the UNSW-NB15 dataset, DNN achieves high accuracy: 99.16% for binary classification and 97.01% for multiclass classification. On the CICIDS2017 dataset, the GBT classifier achieves the best accuracy for binary classification at 99.99%, while DNN has the highest accuracy for multiclass classification at 99.56%.
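The abstract does not spell out how the homogeneity metric is computed. As a rough illustration only, the sketch below assumes the standard entropy-based definition of homogeneity, h = 1 - H(C|K) / H(C), and treats each value of a discrete (or pre-binned) feature as a cluster K scored against the class labels C. The function names, the toy feature names, and the ranking step are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C) of a label sequence, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(classes, clusters):
    """Conditional entropy H(C | K) of class labels given cluster ids."""
    n = len(classes)
    joint = Counter(zip(clusters, classes))
    cluster_sizes = Counter(clusters)
    return -sum(
        (n_ck / n) * math.log(n_ck / cluster_sizes[k])
        for (k, _cls), n_ck in joint.items()
    )


def homogeneity(classes, clusters):
    """Homogeneity h = 1 - H(C|K) / H(C); defined as 1.0 when H(C) is 0."""
    h_c = entropy(classes)
    if h_c == 0.0:
        return 1.0
    return 1.0 - conditional_entropy(classes, clusters) / h_c


def rank_features(rows, labels):
    """Score each feature by homogeneity (assumed usage) and sort descending."""
    scores = {
        name: homogeneity(labels, [row[name] for row in rows])
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy records with hypothetical, already-discretised features.
labels = ["attack", "attack", "normal", "normal"]
rows = [
    {"proto_flag": 1, "parity": 0},
    {"proto_flag": 1, "parity": 1},
    {"proto_flag": 0, "parity": 0},
    {"proto_flag": 0, "parity": 1},
]
ranked = rank_features(rows, labels)
```

In this toy data, `proto_flag` separates the two classes perfectly and so ranks above `parity`, which carries no class information. Continuous features would need to be binned or clustered first; how the paper handles that step is not stated in the abstract.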

Published In

ACMSE '19: Proceedings of the 2019 ACM Southeast Conference
April 2019
295 pages
ISBN: 9781450362511
DOI: 10.1145/3299815
  • Conference Chair: Dan Lo
  • Program Chair: Donghyun Kim
  • Publications Chair: Eric Gamess
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Intrusion detection system
  2. artificial neural networks
  3. big data
  4. deep learning
  5. ensemble techniques
  6. feature selection
  7. machine learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACM SE '19: 2019 ACM Southeast Conference
April 18 - 20, 2019
Kennesaw, GA, USA

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%

Article Metrics

  • Downloads (last 12 months): 120
  • Downloads (last 6 weeks): 18

Reflects downloads up to 14 Dec 2024.

Cited By

  • (2024) A Biological Immunity-Based Neuro Prototype for Few-Shot Anomaly Detection with Character Embedding. Cyborg and Bionic Systems 5. DOI: 10.34133/cbsystems.0086. Online publication date: 16-Jan-2024.
  • (2024) The Improved Network Intrusion Detection Techniques Using the Feature Engineering Approach with Boosting Classifiers. Mathematics 12:24 (3909). DOI: 10.3390/math12243909. Online publication date: 11-Dec-2024.
  • (2024) Machine Learning-Based Methodologies for Cyber-Attacks and Network Traffic Monitoring: A Review and Insights. Information 15:11 (741). DOI: 10.3390/info15110741. Online publication date: 20-Nov-2024.
  • (2024) SEDAT: A Stacked Ensemble Learning-Based Detection Model for Multiscale Network Attacks. Electronics 13:15 (2953). DOI: 10.3390/electronics13152953. Online publication date: 26-Jul-2024.
  • (2024) A Critical Review of Artificial Intelligence Based Approaches in Intrusion Detection: A Comprehensive Analysis. Journal of Engineering 2024 (1-16). DOI: 10.1155/2024/3909173. Online publication date: 15-Apr-2024.
  • (2024) ENIDS: A Deep Learning-Based Ensemble Framework for Network Intrusion Detection Systems. IEEE Transactions on Network and Service Management 21:5 (5809-5825). DOI: 10.1109/TNSM.2024.3414305. Online publication date: Oct-2024.
  • (2024) A Network Intrusion Detection Model for IoT Networks. 2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG) (1-8). DOI: 10.1109/SEB4SDG60871.2024.10629842. Online publication date: 2-Apr-2024.
  • (2024) Apache Spark Powered: Enhancing Network Intrusion Detection System Using Random Forest. 2024 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES) (289-294). DOI: 10.1109/NILES63360.2024.10753188. Online publication date: 19-Oct-2024.
  • (2024) A Comprehensive Survey on Anomaly Detection in Social Media Networks: Challenges, Methods, and Future Directions. 2024 4th International Conference on Sustainable Expert Systems (ICSES) (363-370). DOI: 10.1109/ICSES63445.2024.10763303. Online publication date: 15-Oct-2024.
  • (2024) Deep Attention Learning for Extreme Minority Class Intrusion Detection in Network Traffic. 2024 International Conference on Knowledge Engineering and Communication Systems (ICKECS) (1-9). DOI: 10.1109/ICKECS61492.2024.10617078. Online publication date: 18-Apr-2024.
