Authors:
Arnaud Rosay (1); Eloïse Cheval (2); Florent Carlier (3) and Pascal Leroux (3)
Affiliations:
(1) STMicroelectronics, Rue Pierre-Félix Delarue, Le Mans, France
(2) Polytech Nantes, Nantes University, Rue Christian Pauc, Nantes, France
(3) CREN, Le Mans University, Avenue Olivier Messiaen, Le Mans, France
Keyword(s):
Network Intrusion Detection, CIC-IDS2017, CSE-CIC-IDS2018, CICFlowMeter, LycoSTand, LYCOS-IDS2017, Machine Learning.
Abstract:
With an ever-increasing number of connected devices, network intrusion detection is more important than ever. Over the past few decades, several datasets have been created to address this security issue. Analysis of older datasets, such as KDD-Cup99 and NSL-KDD, revealed problems and paved the way for newer datasets that solved the identified issues. Among the recent datasets for network intrusion detection, CIC-IDS2017 is now widely used. It has the advantage of being available both as raw data and as flow-based features in CSV files. In this paper, we analyze this dataset in detail and report several problems we discovered in the flows extracted from the network packets. To address these issues, we propose a new open-source feature extraction tool called LycoSTand. We create the LYCOS-IDS2017 dataset by extracting features from the CIC-IDS2017 raw data files. A performance comparison between the original and the new datasets shows significant improvements for all the machine learning algorithms we tested. Beyond the improvements on CIC-IDS2017, we discuss other datasets that are affected by the same problems and for which LycoSTand could be used to generate improved network intrusion detection datasets.