research-article

Representativeness-Based Instance Selection for Intrusion Detection

Published: 01 January 2021

Abstract

With the continuous development of network technology, an intrusion detection system must cope with both detection efficiency and storage requirements when dealing with large data. A reasonable way of alleviating this problem is instance selection, which reduces storage space and improves intrusion detection efficiency by selecting representative instances. An instance is representative not only within its own class but also with respect to other classes, and this representativeness reflects the importance of the instance. Because existing instance selection algorithms do not take both aspects into account, some selected instances are redundant and some important instances are removed, which increases storage space and reduces efficiency. Therefore, a new representativeness of an instance is proposed that considers not only the influence of all instances of the same class on the selected instance but also the influence of instances of different classes on it. Moreover, it treats the influence of instances of different classes as an advantageous factor. Based on this representativeness, two instance selection algorithms are proposed to handle balanced and imbalanced data problems in intrusion detection. The first, a representative-based instance selection for balanced data named RBIS, selects the same proportion of instances from each class. The second, a representative-based instance selection for imbalanced data named RBIS-IM, selects important majority instances according to the number of instances of the minority class. In comparisons with other algorithms on benchmark intrusion detection data sets, experimental results verify the effectiveness of the proposed RBIS and RBIS-IM algorithms and demonstrate that they achieve a better balance between accuracy and reduction rate (RBIS) or between balanced accuracy and reduction rate (RBIS-IM).
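The two selection strategies described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the representativeness formula, so the `score` function below is a hypothetical stand-in (mean distance to other-class instances minus mean distance to same-class instances), not the authors' definition. The function names, the `keep_ratio` parameter, and the choice to cap every class at the minority-class size are likewise assumptions made for illustration. The reduction rate is taken in its usual sense, the fraction of instances removed.

```python
import math
from collections import defaultdict


def score(i, X, y):
    """Hypothetical stand-in score, NOT the paper's representativeness.

    It uses both same-class and different-class instances: the mean
    distance to other classes minus the mean distance to the own class.
    """
    same = [math.dist(X[i], X[j]) for j in range(len(X))
            if j != i and y[j] == y[i]]
    other = [math.dist(X[i], X[j]) for j in range(len(X))
             if y[j] != y[i]]
    mean_same = sum(same) / len(same) if same else 0.0
    mean_other = sum(other) / len(other) if other else 0.0
    return mean_other - mean_same


def rank_by_class(X, y):
    """Group instance indices by class, best score first."""
    ranked = defaultdict(list)
    for i in range(len(X)):
        ranked[y[i]].append(i)
    for label in ranked:
        ranked[label].sort(key=lambda i: score(i, X, y), reverse=True)
    return ranked


def select_balanced(X, y, keep_ratio):
    """RBIS-style idea: keep the same proportion from each class."""
    kept = []
    for indices in rank_by_class(X, y).values():
        k = max(1, math.ceil(keep_ratio * len(indices)))
        kept.extend(indices[:k])
    return sorted(kept)


def select_imbalanced(X, y):
    """RBIS-IM-style idea (assumed): cap each class at the minority size."""
    ranked = rank_by_class(X, y)
    n_minority = min(len(indices) for indices in ranked.values())
    kept = []
    for indices in ranked.values():
        kept.extend(indices[:n_minority])
    return sorted(kept)


def reduction_rate(n_original, n_kept):
    """Fraction of the original instances that were removed."""
    return 1 - n_kept / n_original


if __name__ == "__main__":
    # Toy data: six instances of class 0 (majority), two of class 1.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
         (5, 5), (5, 6)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    kept = select_imbalanced(X, y)
    print(kept, reduction_rate(len(X), len(kept)))
```

On the toy data, the imbalanced variant keeps both minority instances and only two of the six majority instances, which is the kind of trade-off between class balance and reduction rate that the abstract refers to.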

Cited By

  • (2023) On detecting distributed denial of service attacks using fuzzy inference system. Cluster Computing, 26:2 (1337-1351). DOI: 10.1007/s10586-022-03657-5. Online publication date: 1-Apr-2023
  • (2021) An effective genetic algorithm-based feature selection method for intrusion detection systems. Computers and Security, 110:C. DOI: 10.1016/j.cose.2021.102448. Online publication date: 29-Dec-2021

Published In

Security and Communication Networks, Volume 2021, Issue 2021
10967 pages
ISSN: 1939-0114
EISSN: 1939-0122

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley & Sons, Inc., United States
