CN111797997A - Network intrusion detection method, model construction method, device and electronic equipment - Google Patents
Network intrusion detection method, model construction method, device and electronic equipment
- Publication number
- CN111797997A (application CN202010655063.6A)
- Authority
- CN
- China
- Prior art keywords
- feature
- attribute
- randomness
- intrusion detection
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The application provides a network intrusion detection method, a model construction method, a device, and electronic equipment, and relates to the technical field of network security. The network intrusion detection model construction method comprises: acquiring network traffic data; converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; determining a training feature vector based on the randomness of each feature attribute; and training a machine learning model with the training feature vector to obtain a network intrusion detection model. The network intrusion detection model is used together with the network intrusion detection method. Feature attributes are optimized according to their uncertainty and low-quality feature attributes are eliminated, which prevents high-quality feature vectors from being discarded in later stages because of interference from low-quality feature vectors. As a result, higher-quality feature vectors can be extracted during dimensionality reduction, and the accuracy of network intrusion detection is improved.
Description
Technical Field
The present application relates to the field of network security technologies, and in particular, to a network intrusion detection method, a model construction method, an apparatus, and an electronic device.
Background
With the rapid development of informatization and IT technology, network technologies are applied ever more widely and deeply. While the openness of the network brings great convenience, it also gives rise to many network security problems, and network attacks from both inside and outside have become one of the main threats faced by enterprises. Network-based intrusion detection systems have therefore become one of the important tools with which network administrators, system operation and maintenance personnel, and others maintain the security of enterprise networks. Network intrusion detection systems have been under development for more than two decades; in the past, intrusion detection relied primarily on known signatures and anomaly detection techniques. With the rapid rise of machine learning and artificial intelligence, research on intrusion detection systems based on machine learning, deep learning, and related technologies has become a hot topic in both academia and industry.
At present, the main research methods for network intrusion detection systems are based on machine learning or deep learning, and fall into two categories: supervised and unsupervised. Supervised methods usually treat intrusion detection as a classification task solved with machine learning and deep learning algorithms: features are selected manually on a labeled data set, feature selection or dimensionality reduction of various kinds is then applied to the large number of feature attributes, models are trained and optimized with different algorithms, and a supervised model is finally formed. Unsupervised methods usually approach intrusion detection from the perspective of anomaly detection, using a traditional machine learning algorithm when features are selected manually and a deep learning algorithm when features are selected automatically.
Each of these detection techniques and methods is effective under specific conditions, but each also has shortcomings. Methods based on machine learning with manually determined features depend on feature selection and feature screening: using too many features easily leads to the curse of dimensionality, and relevance is easily lost in a high-dimensional or even ultra-high-dimensional space, while using too few features tends to miss critical features. Methods based on deep learning do not require manual feature selection, because the neural network learns features by itself, but network traffic data can only be fed into a deep learning algorithm after preprocessing. This preprocessing loses some attributes and features of the original traffic data, which reduces the accuracy of network intrusion detection.
Disclosure of Invention
In view of this, embodiments of the present application provide a network intrusion detection method, a model construction method, a device, and an electronic device, so as to solve the problem of reduced network intrusion detection accuracy in the prior art.
The embodiment of the application provides a network intrusion detection model construction method, which comprises the following steps: acquiring network traffic data; converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; determining a training feature vector based on the randomness of each feature attribute; and training a machine learning model with the training feature vector to obtain a network intrusion detection model.
In this implementation, before the feature vectors are processed, the feature attributes are optimized according to their uncertainty and low-quality feature attributes are eliminated. The high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, feature selection and subsequent processing are then performed on the screening result, and a machine learning model is then trained to obtain the network intrusion detection model. This prevents high-quality feature vectors from being discarded during feature selection and subsequent processing because of interference from low-quality feature vectors, so that higher-quality feature vectors can be extracted during dimensionality reduction. In addition, the algorithm is computationally simple and performs well, which improves the accuracy and efficiency of the generated network intrusion detection model.
Optionally, the converting the network traffic data into at least one feature vector includes: converting the network traffic data into the at least one feature vector in units of sessions; and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
In this implementation, labeling the feature vectors with feature attributes and attack categories provides a basis for the subsequent screening of feature attributes and can improve the efficiency and accuracy of feature screening.
Optionally, the determining a training feature vector based on the randomness of the feature attributes includes: respectively determining attribute randomness of each characteristic attribute corresponding to each attack category; and determining a training feature vector based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
In this implementation, the relevance between a feature attribute and an attack category is determined from the randomness of the feature attribute with respect to that attack category, and the training feature vector is then determined based on this randomness. Feature vectors whose feature attributes are of relatively high quality can thus be accurately extracted for model training, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the determining the attribute randomness of each feature attribute corresponding to the attack category respectively includes: respectively determining first randomness of each feature vector corresponding to each attack category; dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness; determining second randomness, corresponding to each attack category, of each characteristic attribute based on the plurality of first randomness sets; and determining the attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
In this implementation, the feature vectors are divided into first randomness sets based on the randomness of the feature vectors with respect to the attack categories, and the second randomness of each feature attribute with respect to the attack categories is then determined within each first randomness set. The second randomness reflects how strongly a feature attribute influences the judgment of the attack category when the randomness of the feature vectors with respect to the attack categories is the same. The attribute randomness is then determined from it, so that the attribute randomness reflects the quality of the feature attributes more accurately, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the determining a training feature vector based on the attribute randomness comprises: dividing all the characteristic attributes into a plurality of second randomness sets based on the values of the attribute randomness; the second randomness sets are arranged in a descending order according to the values of the attribute randomness, and a preset number of characteristic attributes are selected from the second randomness sets after the second randomness sets are arranged in the descending order as characteristic attribute vectors; converting the network traffic data into a feature vector to be trained based on the feature attribute vector; and sequentially carrying out dimensionality reduction processing and standard normal transformation processing on the characteristic vector to be trained to obtain the training characteristic vector.
In this implementation, the feature attribute vector is determined from the second randomness sets whose attribute randomness values rank highest, so that the high-quality feature attributes that have a large influence on network intrusion judgment can be screened out quickly and accurately. Feature extraction and processing are then performed again based on the feature attribute vector, so that the resulting training feature vector has higher feature attribute quality, which improves the detection accuracy of the generated network intrusion detection model.
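The selection and transformation steps described above can be illustrated with a minimal Python sketch. It is not the implementation of the application: the per-attribute scores are hypothetical, attributes are ranked individually rather than through second randomness sets, the dimensionality-reduction step is omitted for brevity, and the helper names `select_top_attributes`, `project`, and `standardize` are assumptions introduced only for illustration.

```python
import statistics

def select_top_attributes(randomness, k):
    """Indices of the k attributes with the largest randomness, in descending order."""
    order = sorted(range(len(randomness)), key=lambda i: randomness[i], reverse=True)
    return order[:k]

def project(vectors, attr_indices):
    """Keep only the selected attributes of each feature vector."""
    return [[v[i] for i in attr_indices] for v in vectors]

def standardize(vectors):
    """Z-score each column: subtract the mean, divide by the population std."""
    cols = list(zip(*vectors))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in vectors]

# Hypothetical attribute randomness values and feature vectors (assumed data).
randomness = [2.0, 0.0, 1.5]
vectors = [[1, 7, 10.0], [2, 7, 10.1], [3, 7, 55.0], [4, 7, 90.0]]

selected = select_top_attributes(randomness, k=2)   # preset number k = 2
training = standardize(project(vectors, selected))
```

In this sketch the constant second attribute receives the lowest score and is dropped, and each remaining column ends up with zero mean and unit variance. A real pipeline would insert a dimensionality-reduction step between `project` and `standardize`.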
The embodiment of the application also provides a network intrusion detection method, which comprises the following steps: acquiring network traffic data; converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; converting the at least one feature vector into an input feature vector; and inputting the input feature vector into the network intrusion detection model obtained by the network intrusion detection model construction method, and determining whether network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
In this implementation, before the feature vectors are processed, the feature attributes are optimized according to their uncertainty and low-quality feature attributes are eliminated. The high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, feature selection and subsequent processing are then performed on the screening result, and a machine learning model is then trained to obtain the network intrusion detection model. This prevents high-quality feature vectors from being discarded during feature selection and subsequent processing because of interference from low-quality feature vectors, so that higher-quality feature vectors can be extracted during dimensionality reduction. Network intrusion detection is then performed with the network intrusion detection model trained on the same principle. The algorithm is computationally simple and performs well, which improves the accuracy and efficiency of network intrusion detection.
The embodiment of the present application further provides a device for constructing a network intrusion detection model, the device including: a training data acquisition module, used for acquiring network traffic data; a first feature acquisition module, used for converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; a training feature vector acquisition module, used for determining a training feature vector based on the randomness of each feature attribute; and a model training module, used for training a machine learning model with the training feature vector to obtain a network intrusion detection model.
In this implementation, before the feature vectors are processed, the feature attributes are optimized according to their uncertainty and low-quality feature attributes are eliminated. The high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, feature selection and subsequent processing are then performed on the screening result, and a machine learning model is then trained to obtain the network intrusion detection model. This prevents high-quality feature vectors from being discarded during feature selection and subsequent processing because of interference from low-quality feature vectors, so that higher-quality feature vectors can be extracted during dimensionality reduction. In addition, the algorithm is computationally simple and performs well, which improves the accuracy and efficiency of the generated network intrusion detection model.
Optionally, the first feature obtaining module is specifically configured to: converting the network traffic data into the at least one feature vector in units of sessions; and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
In this implementation, labeling the feature vectors with feature attributes and attack categories provides a basis for the subsequent screening of feature attributes and can improve the efficiency and accuracy of feature screening.
Optionally, the training feature vector obtaining module is specifically configured to: respectively determining attribute randomness of each characteristic attribute corresponding to each attack category; and determining a training feature vector based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
In this implementation, the relevance between a feature attribute and an attack category is determined from the randomness of the feature attribute with respect to that attack category, and the training feature vector is then determined based on this randomness. Feature vectors whose feature attributes are of relatively high quality can thus be accurately extracted for model training, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the training feature vector obtaining module is specifically configured to: respectively determining first randomness of each feature vector corresponding to each attack category; dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness; determining second randomness, corresponding to each attack category, of each characteristic attribute based on the plurality of first randomness sets; and determining the attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
In this implementation, the feature vectors are divided into first randomness sets based on the randomness of the feature vectors with respect to the attack categories, and the second randomness of each feature attribute with respect to the attack categories is then determined within each first randomness set. The second randomness reflects how strongly a feature attribute influences the judgment of the attack category when the randomness of the feature vectors with respect to the attack categories is the same. The attribute randomness is then determined from it, so that the attribute randomness reflects the quality of the feature attributes more accurately, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the training feature vector obtaining module is specifically configured to: dividing all the characteristic attributes into a plurality of second randomness sets based on the values of the attribute randomness; the second randomness sets are arranged in a descending order according to the values of the attribute randomness, and a preset number of characteristic attributes are selected from the second randomness sets after the second randomness sets are arranged in the descending order as characteristic attribute vectors; converting the network traffic data into a feature vector to be trained based on the feature attribute vector; and sequentially carrying out dimensionality reduction processing and standard normal transformation processing on the characteristic vector to be trained to obtain the training characteristic vector.
In this implementation, the feature attribute vector is determined from the second randomness sets whose attribute randomness values rank highest, so that the high-quality feature attributes that have a large influence on network intrusion judgment can be screened out quickly and accurately. Feature extraction and processing are then performed again based on the feature attribute vector, so that the resulting training feature vector has higher feature attribute quality, which improves the detection accuracy of the generated network intrusion detection model.
An embodiment of the present application further provides a network intrusion detection apparatus, the apparatus including: a detection data acquisition module, used for acquiring network traffic data; a second feature acquisition module, used for converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; an input feature vector acquisition module, used for converting the at least one feature vector into an input feature vector; and an intrusion detection module, used for inputting the input feature vector into the network intrusion detection model obtained by the network intrusion detection model construction device and determining whether network intrusion exists in the network traffic data based on the output result of the network intrusion detection model.
In this implementation, before the feature vectors are processed, the feature attributes are optimized according to their uncertainty and low-quality feature attributes are eliminated. The high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, feature selection and subsequent processing are then performed on the screening result, and a machine learning model is then trained to obtain the network intrusion detection model. This prevents high-quality feature vectors from being discarded during feature selection and subsequent processing because of interference from low-quality feature vectors, so that higher-quality feature vectors can be extracted during dimensionality reduction. Network intrusion detection is then performed with the network intrusion detection model trained on the same principle. The algorithm is computationally simple and performs well, which improves the accuracy and efficiency of network intrusion detection.
An embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes steps in any one of the above implementation manners when reading and executing the program instructions.
The embodiment of the present application further provides a readable storage medium, in which computer program instructions are stored, and the computer program instructions are read by a processor and executed to perform the steps in any of the above implementation manners.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a method for constructing a network intrusion detection model according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating an attribute randomness determining step according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of a training feature vector determining step according to an embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating a network intrusion detection method according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a network intrusion detection model building apparatus according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of a network intrusion detection device according to an embodiment of the present disclosure.
Reference numerals: 30 - network intrusion detection model construction device; 31 - training data acquisition module; 32 - first feature acquisition module; 33 - training feature vector acquisition module; 34 - model training module; 40 - network intrusion detection device; 41 - detection data acquisition module; 42 - second feature acquisition module; 43 - input feature vector acquisition module; 44 - intrusion detection module.
Detailed Description
The technical solution in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The applicant's research found that machine learning methods based on manually selected features tend to use many features. Without dimensionality reduction, performance degrades severely and overfitting may occur. If dimensionality reduction is applied directly to the feature attributes, the weight characteristics of the attributes may be lost and some originally important attributes may be discarded, so the attribute feature vectors used in model training may not be accurate enough, which ultimately affects the accuracy of the model. Deep-learning-based methods, on the other hand, learn features automatically, but the network traffic must first be converted into images or data sequences. This conversion can lose original traffic information, which degrades the quality of the training data and ultimately affects the accuracy of the model.
In order to solve the above problems in the prior art, an embodiment of the present application provides a method for constructing a network intrusion detection model. Referring to Fig. 1, which is a schematic flowchart of a method for constructing a network intrusion detection model according to an embodiment of the present application, the method includes the following specific steps:
Step S12: acquire network traffic data.
Since the network intrusion detection model construction method belongs to the training and construction stage of the network intrusion detection model, offline data carrying commonly used category labels can be used as the input network traffic data in this embodiment.
In particular, the network traffic data may be a data set in pcap (packet capture) format that has been classified and labeled with attack categories. Optionally, the attack categories in this embodiment may be denoted $Label_0, Label_1, Label_2, \dots, Label_{L-1}$, where $Label_0$ denotes the normal (non-attack) category, giving $L$ categories in total.
Optionally, the attack categories may include common attacks such as vulnerability overflow, scanning, extraction, and Distributed Denial of Service (DDoS) attacks.
Step S14: convert the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute.
All network traffic data in the pcap files is read and parsed, and the network traffic data is converted into at least one feature vector, with one session as the unit.
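As an illustration of session-based conversion, the following Python sketch groups already-parsed packets into sessions and derives a few simple per-session attributes. It is a sketch under stated assumptions only: the `Packet` record is a hypothetical stand-in for the output of a pcap parser, a session is assumed to be identified by a direction-independent 5-tuple, and the endpoint that sends the first packet is assumed to be the uplink side. None of these choices is specified by the application.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    """Hypothetical parsed packet record (not a type from any pcap library)."""
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    length: int

def session_key(p: Packet) -> tuple:
    """Direction-independent 5-tuple so both directions map to one session."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)

def sessions_to_vectors(packets):
    """Group packets by session and emit one simple feature vector per session."""
    groups = defaultdict(list)
    for p in packets:
        groups[session_key(p)].append(p)
    vectors = {}
    for key, pkts in groups.items():
        # Assumption: the sender of the first packet is the client (uplink side).
        client = (pkts[0].src, pkts[0].sport)
        up = [p.length for p in pkts if (p.src, p.sport) == client]
        down = [p.length for p in pkts if (p.src, p.sport) != client]
        vectors[key] = [
            len(up), len(down),                       # uplink / downlink packet counts
            sum(up), sum(down),                       # uplink / downlink byte counts
            sum(up) / len(up) if up else 0.0,         # mean uplink packet length
            sum(down) / len(down) if down else 0.0,   # mean downlink packet length
        ]
    return vectors

packets = [
    Packet("10.0.0.1", 40000, "10.0.0.2", 80, "tcp", 60),
    Packet("10.0.0.2", 80, "10.0.0.1", 40000, "tcp", 1500),
    Packet("10.0.0.1", 40000, "10.0.0.2", 80, "tcp", 40),
    Packet("10.0.0.3", 5353, "10.0.0.4", 53, "udp", 80),
]
vectors = sessions_to_vectors(packets)
```

Here the three TCP packets form one session and the single UDP packet forms another, so two feature vectors are produced.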
Further, so that feature attribute screening can be performed subsequently, in this embodiment the feature vectors are labeled with feature attributes and categories, and the feature vector corresponding to each session may be represented as $V = \{V_1, V_2, V_3, \dots, V_v, Label_i\}$, where $v$ denotes the number of original feature attributes, $V_i$ denotes feature attribute $i$, and $Label_i$ denotes attack category $i$.
Commonly used feature attributes span more than one hundred dimensions. Optionally, the feature attributes may include: the number of uplink packets, the number of downlink packets, the number of uplink bytes, the number of downlink bytes, the average uplink packet length, the average downlink packet length, the distribution of the first 50 bytes, N-Gram values, and the like.
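Two of the listed attributes, the distribution of the first 50 bytes and the N-Gram value, can be sketched as follows. The application does not define how they are computed, so this is only one plausible reading: the byte distribution is taken to be the relative frequency of each byte value in the first 50 payload bytes, and the N-Gram value is taken to be a count of overlapping byte n-grams. The function names are assumptions introduced for illustration.

```python
from collections import Counter

def first_bytes_distribution(payload: bytes, n_bytes: int = 50) -> list:
    """Relative frequency of each byte value (0-255) over the first n_bytes."""
    head = payload[:n_bytes]
    counts = Counter(head)          # iterating bytes yields integers 0-255
    total = len(head)
    return [counts.get(b, 0) / total if total else 0.0 for b in range(256)]

def byte_ngrams(payload: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in the payload."""
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))

payload = b"GET /index.html HTTP/1.1\r\n"   # hypothetical session payload
dist = first_bytes_distribution(payload)
grams = byte_ngrams(payload, n=2)
```

The resulting 256-entry distribution and the n-gram counts could then be appended to the per-session feature vector as additional feature attributes.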
Step S16: a training feature vector is determined based on the randomness of each feature attribute.
Step S16 screens each feature attribute $V_i$ of the feature vectors generated in step S14, optimizing the feature attributes based on the uncertainty of the value range of each feature attribute over the whole data set. The basic principle of the optimization is as follows: given a feature attribute $V_i$, obtain the attack category values of all data in the data set on that feature attribute. The larger the randomness (that is, the larger the uncertainty) of the attack category values of the data on this attribute, the better the feature attribute performs in the classification model, so the feature attribute is judged to be a high-quality feature attribute. If all the values on the attribute are very similar or have little randomness, the attribute contributes little to the classification model, so the feature attribute is judged to be a low-quality feature attribute.
Specifically, step S16 may include the following sub-steps:
Step S161: determine the attribute randomness of each feature attribute with respect to each attack category.
Further, referring to fig. 2, fig. 2 is a schematic flowchart of a step of determining attribute randomness provided in an embodiment of the present application, where the step of determining attribute randomness specifically includes the following steps:
Step S1611: determine the first randomness of each feature vector with respect to each attack category.
In this embodiment, $D$ denotes the data set of all feature vectors, with elements $D_1, D_2, \ldots, D_N$, where $D_i$ denotes the $i$-th feature vector in the sample set and $N$ denotes the total number of data items in the data set; $d_i$ denotes the number of feature vectors in $D$ whose attack category is $i$; and $g_i$ denotes the first randomness of attack category $i$ in $D$ [formula not reproduced].
$G\{g_0, g_1, \ldots, g_{L-1}\}$ then denotes the randomness of the data set with respect to the different attack categories [formula not reproduced].
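The exact formula for the randomness of the data set is not available in this text. As a hedged sketch only, the example below assumes that the randomness is the Shannon entropy of the attack category distribution, with each category weighted by its share $d_i / N$ of the data set. The function name `label_randomness` is hypothetical.

```python
import math
from collections import Counter

def label_randomness(labels):
    """Shannon entropy (in bits) of the attack category labels.

    ASSUMPTION: entropy is used as the randomness measure; the source
    text does not reproduce its own formula.
    """
    n = len(labels)                      # N: total number of feature vectors
    counts = Counter(labels)             # d_i: number of vectors per category i
    return -sum((d / n) * math.log2(d / n) for d in counts.values())

labels = [0, 0, 1, 1]                    # two normal sessions, two attack sessions
G = label_randomness(labels)
```

Under this assumption, a data set containing a single category has zero randomness, and an evenly split two-category data set has a randomness of one bit.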
Step S1612: divide the feature vectors corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness.
$T_k$ denotes the set of feature vectors in the data set whose first randomness has the value $k$, and $T_{i,k}$ denotes the set of feature vectors whose first randomness in feature attribute $V_i$ has the value $k$; each $T_{i,k}$ is one of the first randomness sets.
Step S1613: determine the second randomness of each feature attribute with respect to each attack category based on the plurality of first randomness sets.
$G_{i,k}$ denotes the second randomness of feature attribute $V_i$ with respect to each attack category within the set $T_{i,k}$; it is calculated in the same manner as the randomness described above.
Step S1614: determine the attribute randomness of each feature attribute with respect to the different attack categories based on the value of the second randomness.
$G_i$ denotes the randomness of feature attribute $V_i$ with respect to the attack categories [formula not reproduced], where $G = G\{g_0, g_1, \ldots, g_{L-1}\}$.
$E(i)$ denotes the attribute randomness of feature attribute $V_i$ with respect to the attack categories: $E(i) = G - G_i$.
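The relation $E(i) = G - G_i$ can be illustrated with a short sketch. Two assumptions are made here that the text does not confirm: randomness is measured as Shannon entropy, and $G_i$ is the size-weighted sum of the randomness of each subset obtained by grouping the feature vectors by their value of attribute $V_i$. The names `entropy` and `attribute_randomness` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy in bits (ASSUMED randomness measure)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def attribute_randomness(values, labels):
    """Return E(i) = G - G_i for one feature attribute with discrete values."""
    groups = defaultdict(list)           # one subset of labels per attribute value k
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    # ASSUMPTION: G_i is the size-weighted sum of the per-subset randomness.
    g_i = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - g_i

labels = [0, 0, 1, 1]
informative = attribute_randomness([1, 1, 2, 2], labels)    # separates the categories
uninformative = attribute_randomness([5, 5, 5, 5], labels)  # constant attribute
```

Under these assumptions, an attribute whose values separate the attack categories perfectly receives the full randomness of the data set, while a constant attribute receives zero and would be ranked last.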
Step S162: determine training feature vectors based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a training feature vector determining step according to an embodiment of the present disclosure. Specifically, the training feature vector determination step S162 may include the following sub-steps:
Step S1621: divide all the feature attributes into a plurality of second randomness sets based on the values of the attribute randomness.
The second randomness sets may be denoted as $e = \{e_1, e_2, \ldots, e_v\}$.
Step S1622: arrange the plurality of second randomness sets in descending order of attribute randomness, and select a preset number of feature attributes from the sorted sets as the feature attribute vector.
The feature attribute set in descending order is defined as $E = \{E_1, E_2, \ldots, E_v\}$.
It should be understood that the value of the preset number may be adjusted according to the specific network intrusion detection requirement, and in this embodiment, it may be, but is not limited to, H, where H is a positive integer.
The feature attribute vector may be expressed as $NV = \{NV_1, NV_2, \ldots, NV_H\}$, $1 < H < v$.
Step S1623: convert the network traffic data into a feature vector to be trained based on the feature attribute vector.
Step S1624: sequentially perform dimensionality reduction and standard normal transformation on the feature vector to be trained to obtain the training feature vector.
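The ranking and selection described in steps S1621 to S1623 can be sketched as follows. The sketch assumes the attribute randomness values are already available as a list indexed by feature attribute; the function names `select_top_attributes` and `project` and the sample numbers are hypothetical.

```python
def select_top_attributes(randomness, h):
    """Return the indices of the h attributes with the largest attribute randomness."""
    order = sorted(range(len(randomness)), key=lambda i: randomness[i], reverse=True)
    return order[:h]

def project(vector, indices):
    """Keep only the selected feature attributes of one feature vector."""
    return [vector[i] for i in indices]

attribute_randomness_values = [0.10, 0.80, 0.05, 0.40]   # illustrative values for v = 4
selected = select_top_attributes(attribute_randomness_values, 2)
reduced_vector = project([7, 8, 9, 10], selected)
```

Here the preset number H is 2, so the two attributes with the largest values are retained and every session vector is rewritten to contain only those attributes.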
Optionally, in this embodiment, the dimensionality of the feature vector to be trained may be reduced by Principal Component Analysis (PCA). PCA applies a linear transformation that re-expresses the original data in a set of dimensions that are linearly uncorrelated with one another. It can be used to extract the principal feature components of the data, is commonly used for dimensionality reduction of high-dimensional data, and retains most of the characteristics of the data while reducing its dimensionality.
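To make the PCA step concrete, the following dependency-free sketch extracts only the leading principal component by power iteration on the sample covariance matrix and projects the data onto it. It is an illustration under stated assumptions, not the procedure of the application: the function name `first_principal_component`, the iteration count, and the toy data are hypothetical, and a practical system would more likely rely on a numerical library.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (component, projections) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Project every centered sample onto the component (d dimensions -> 1).
    projections = [sum(c[j] * w[j] for j in range(d)) for c in centered]
    return w, projections

data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]   # points on the line y = 2x
component, reduced = first_principal_component(data)
```

Because the toy data lie on a single line, one component captures all of the variance, and the two original dimensions collapse to one without loss.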
The standard normal transformation (Standard Normal Variate, SNV) is also called standard normal variate correction or normalization. After the standard normal transformation, the feature attributes in the feature vector to be trained are screened again to generate the training feature vector $SNV = \{SNV_1, SNV_2, \ldots, SNV_P\}$, $1 < P < H$.
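The text does not give the formula of the standard normal transformation. The sketch below assumes it amounts to rescaling each feature attribute (column) to zero mean and unit variance using the population standard deviation; this is an assumption, and the function name `standardize_columns` is hypothetical.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Scale every attribute (column) to zero mean and unit variance.

    ASSUMPTION: per-attribute z-score; the source does not state the formula.
    """
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]

rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaled = standardize_columns(rows)
```

After scaling, attributes measured in very different units (for example byte counts and packet counts) contribute on a comparable scale to the subsequent model training.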
Step S18: train a machine learning model with the training feature vector to obtain a network intrusion detection model.
The data are processed according to the SNV feature attributes to generate an SNV-based training feature vector set, which is fed into the machine learning model for training to obtain the network intrusion detection model.
Optionally, the machine learning model in this embodiment may be a learning model based on a supervised machine learning algorithm such as random forest, Support Vector Machine (SVM), or Gradient Boosting Decision Tree (GBDT).
After the network intrusion detection model is obtained, real-time network intrusion detection requires inputting network traffic data into the model for detection. This embodiment therefore provides a network intrusion detection method. Referring to fig. 4, fig. 4 is a schematic flow diagram of a network intrusion detection method provided in this embodiment of the present application. The specific steps of the network intrusion detection method may be as follows:
Step S21: acquire network traffic data.
Step S22: convert the network traffic data into at least one feature vector, where each feature vector corresponds to at least one feature attribute.
Step S23: convert the at least one feature vector into an input feature vector.
Optionally, the feature vector conversion may be vectorization based on the SNV feature vector.
Step S24: input the input feature vector into the network intrusion detection model, and determine whether a network intrusion exists in the network traffic data based on the output of the network intrusion detection model.
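Steps S21 to S24 can be summarized in a small sketch. It assumes the model returns an attack category index in which 0 corresponds to the normal category $Label_0$, and it uses stand-in callables for the preprocessing and the trained model; `detect_intrusion`, `identity`, and `toy_model` are hypothetical names used only for illustration.

```python
def detect_intrusion(feature_vector, selected_indices, transform, model):
    """Return True when the model assigns any attack category other than 0."""
    reduced = [feature_vector[i] for i in selected_indices]  # keep the screened attributes
    model_input = transform(reduced)                         # same preprocessing as in training
    label = model(model_input)                               # predicted attack category index
    return label != 0                                        # category 0 is the normal category

# Hypothetical stand-ins: no real preprocessing and a one-rule "model".
def identity(vector):
    return vector

def toy_model(vector):
    return 1 if vector[0] > 1000 else 0

alert = detect_intrusion([5, 2000, 3], [1], identity, toy_model)
quiet = detect_intrusion([5, 10, 3], [1], identity, toy_model)
```

The point of the sketch is the ordering: the detection stage must apply the same attribute selection and transformation that were used when the model was trained before the model is queried.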
Based on the network intrusion detection model construction method and the network intrusion detection method provided by the embodiments of the present application, before dimensionality reduction is performed on the high-dimensional network traffic data, the feature attributes are first screened and filtered based on their randomness: low-value feature attributes are discarded and high-value feature attributes are retained. Feature selection can thus be performed quickly, efficiently, and accurately while preserving the feature attribute correlations of the original traffic data, which improves the online detection performance of a machine-learning-based intrusion detection system while maintaining the accuracy of the detection model.
In order to cooperate with the method for constructing a network intrusion detection model provided in this embodiment, an apparatus 30 for constructing a network intrusion detection model is also provided in this embodiment. Referring to fig. 5, fig. 5 is a schematic block diagram of a network intrusion detection model building apparatus according to an embodiment of the present disclosure.
The network intrusion detection model building apparatus 30 includes:
a training data obtaining module 31, configured to obtain network traffic data;
a first feature obtaining module 32, configured to convert the network traffic data into at least one feature vector, where each feature vector corresponds to at least one feature attribute;
a training feature vector obtaining module 33, configured to determine a training feature vector based on randomness of each feature attribute;
a model training module 34, configured to perform machine learning model training by using the training feature vectors to obtain a network intrusion detection model.
Optionally, the first feature obtaining module 32 is specifically configured to: converting the network traffic data into at least one feature vector by taking a session as a unit; and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
Optionally, the training feature vector obtaining module 33 is specifically configured to: respectively determining attribute randomness of each characteristic attribute corresponding to each attack category; and determining training feature vectors based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
Optionally, the training feature vector obtaining module 33 is specifically configured to: respectively determining first randomness of each feature vector corresponding to each attack category; dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness; determining second randomness, corresponding to each attack category, of each characteristic attribute based on the plurality of first randomness sets; and determining attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
Optionally, the training feature vector obtaining module 33 is specifically configured to: divide all the feature attributes into a plurality of second randomness sets based on the values of the attribute randomness; arrange the plurality of second randomness sets in descending order of attribute randomness, and select a preset number of feature attributes from the sorted sets as the feature attribute vector; convert the network traffic data into a feature vector to be trained based on the feature attribute vector; and sequentially perform dimensionality reduction and standard normal transformation on the feature vector to be trained to obtain the training feature vector.
In order to cooperate with the network intrusion detection method provided in this embodiment, an embodiment of the present application further provides a network intrusion detection device 40. Referring to fig. 6, fig. 6 is a schematic block diagram of a network intrusion detection device according to an embodiment of the present disclosure.
The network intrusion detection device 40 includes:
a detection data obtaining module 41, configured to obtain network traffic data;
a second feature obtaining module 42, configured to convert the network traffic data into at least one feature vector, where each feature vector corresponds to at least one feature attribute;
an input feature vector obtaining module 43, configured to convert at least one feature vector into an input feature vector;
an intrusion detection module 44, configured to input the input feature vector into a network intrusion detection model obtained by the network intrusion detection model building device, and determine whether a network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores program instructions, and when the processor reads and runs the program instructions, the processor executes the steps in any one of the method for building a network intrusion detection model and the method for detecting network intrusion provided by this embodiment.
It should be understood that the electronic device may be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or other electronic device having a logical computing function.
The embodiment of the application also provides a readable storage medium, wherein computer program instructions are stored in the readable storage medium, and the computer program instructions are read by a processor and run to execute the steps in the network intrusion detection model building method or the network intrusion detection method.
To sum up, the embodiments of the present application provide a network intrusion detection method, a model construction method, an apparatus, and an electronic device, where the network intrusion detection model construction method includes: acquiring network traffic data; converting the network traffic data into at least one feature vector, where each feature vector corresponds to at least one feature attribute; determining a training feature vector based on the randomness of each feature attribute; and performing machine learning model training with the training feature vector to obtain a network intrusion detection model.
In this implementation, feature attributes are optimized according to their uncertainty before the feature vectors are processed: the high-dimensional feature vectors used in network intrusion detection are screened, low-quality feature attributes are excluded, and feature selection and subsequent processing are carried out on the screening result. A machine learning model is then trained to obtain the network intrusion detection model used for network intrusion detection. Excluding low-quality feature attributes first prevents high-quality feature vectors from being discarded because of interference from low-quality feature vectors during feature selection and subsequent processing, so higher-quality feature vectors can be extracted during dimensionality reduction. The algorithm is also computationally simple and performs well, which improves the accuracy and efficiency of the resulting network intrusion detection model.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Therefore, the present embodiment further provides a readable storage medium in which computer program instructions are stored; when the computer program instructions are read and executed by a processor, they perform the steps of the network intrusion detection model construction method or the network intrusion detection method. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Claims (10)
1. A method for constructing a network intrusion detection model is characterized by comprising the following steps:
acquiring network flow data;
converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute;
determining a training feature vector based on the randomness of each feature attribute;
and performing machine learning model training by adopting the training characteristic vector to obtain a network intrusion detection model.
2. The method of claim 1, wherein converting the network traffic data into at least one feature vector comprises:
converting the network traffic data into the at least one feature vector in units of sessions;
and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
3. The method of claim 1 or 2, wherein the determining a training feature vector based on the randomness of the feature attributes comprises:
respectively determining attribute randomness of each characteristic attribute corresponding to each attack category;
and determining a training feature vector based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
4. The method of claim 3, wherein the separately determining the attribute randomness of each feature attribute corresponding to the attack category comprises:
respectively determining first randomness of each feature vector corresponding to each attack category;
dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness;
determining second randomness, corresponding to each characteristic attribute, of each attack category based on the plurality of first randomness sets;
and determining the attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
5. The method of claim 4, wherein the determining training feature vectors based on the attribute randomness comprises:
dividing all the characteristic attributes into a plurality of second randomness sets based on the values of the attribute randomness;
the second randomness sets are arranged in a descending order according to the value of the attribute randomness, and a preset number of characteristic attributes are selected from the second randomness sets after the second randomness sets are arranged in the descending order as characteristic attribute vectors;
converting the network traffic data into a feature vector to be trained based on the feature attribute vector;
and sequentially carrying out dimensionality reduction processing and standard normal transformation processing on the characteristic vector to be trained to obtain the training characteristic vector.
6. A method for network intrusion detection, the method comprising:
acquiring network flow data;
converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute;
converting the at least one feature vector into an input feature vector;
inputting the input feature vector into a network intrusion detection model obtained by the network intrusion detection model construction method according to any one of claims 1 to 5, and determining whether network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
7. A network intrusion detection model building apparatus, the apparatus comprising:
the training data acquisition module is used for acquiring network flow data;
the first characteristic acquisition module is used for converting the network traffic data into at least one characteristic vector, and each characteristic vector corresponds to at least one characteristic attribute;
the training feature vector acquisition module is used for determining a training feature vector based on the randomness of each feature attribute;
and the model training module is used for performing machine learning model training by adopting the training characteristic vectors to obtain a network intrusion detection model.
8. A network intrusion detection device, the device comprising:
the detection data acquisition module is used for acquiring network flow data;
the second characteristic acquisition module is used for converting the network flow data into at least one characteristic vector, and each characteristic vector corresponds to at least one characteristic attribute;
an input feature vector obtaining module, configured to convert the at least one feature vector into an input feature vector;
an intrusion detection module, configured to input the input feature vector into the network intrusion detection model obtained by the network intrusion detection model construction apparatus according to claim 7, and determine whether network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
9. An electronic device comprising a memory and a processor, the memory having program instructions stored therein, wherein the processor, when executing the program instructions, performs the steps of the method of any of claims 1-6.
10. A storage medium having stored thereon computer program instructions for executing the steps of the method according to any one of claims 1 to 6 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010655063.6A CN111797997A (en) | 2020-07-08 | 2020-07-08 | Network intrusion detection method, model construction method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111797997A (en) | 2020-10-20 |
Family
ID=72809753
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010655063.6A | Pending | CN111797997A (en) | 2020-07-08 | 2020-07-08 | Network intrusion detection method, model construction method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111797997A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100115618A1 (en) * | 2008-11-03 | 2010-05-06 | Korea University Industry And Academy Collaboration Foundation | Method and device for detecting unknown network worms |
CN102158486A (en) * | 2011-04-02 | 2011-08-17 | 华北电力大学 | Method for rapidly detecting network invasion |
CN103944887A (en) * | 2014-03-24 | 2014-07-23 | 西安电子科技大学 | Intrusion event detection method based on hidden conditional random field |
CN104219253A (en) * | 2014-10-13 | 2014-12-17 | 吉林大学 | Multi-step attack alarm associated network service interface development method |
CN106656981A (en) * | 2016-10-21 | 2017-05-10 | 东软集团股份有限公司 | Network intrusion detection method and device |
CN109962909A (en) * | 2019-01-30 | 2019-07-02 | 大连理工大学 | A kind of network intrusions method for detecting abnormality based on machine learning |
CN110188883A (en) * | 2019-04-22 | 2019-08-30 | 中国移动通信集团河北有限公司 | Failure analysis methods, calculate equipment and computer storage medium at device |
CN110602120A (en) * | 2019-09-19 | 2019-12-20 | 国网江苏省电力有限公司信息通信分公司 | Network-oriented intrusion data detection method |
CN111314329A (en) * | 2020-02-03 | 2020-06-19 | 杭州迪普科技股份有限公司 | Traffic intrusion detection system and method |
Non-Patent Citations (2)
Title |
---|
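Zhang Kejun et al.: "Research on a hybrid intrusion detection model based on DBN and TSVM", Computer Applications and Software (《计算机应用与软件》), vol. 35, no. 5, pages 313 - 317 * |
Zhu Wenjie et al.: "SVM intrusion detection technology based on information entropy", Computer Engineering & Science (《计算机工程与科学》), pages 47 - 51 * |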
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210374239A1 (en) * | 2019-02-15 | 2021-12-02 | Sophos Limited | Augmented security recognition tasks |
US11681800B2 (en) * | 2019-02-15 | 2023-06-20 | Sophos Limited | Augmented security recognition tasks |
CN113553589A (en) * | 2021-07-30 | 2021-10-26 | 江苏易安联网络技术有限公司 | Extraction method, device and application of malicious software propagation characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||