CN112019497A - Word embedding-based multi-stage network attack detection method - Google Patents
- Publication number
- CN112019497A (application CN202010660792.0A)
- Authority
- CN
- China
- Prior art keywords
- attack
- stage
- data
- vector
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Computer Security & Cryptography (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a word embedding-based multi-stage network attack detection method comprising the following steps: 1) performing feature selection on a data set composed of the network traffic features observed after attacks occur; 2) vectorizing the network traffic data using a word embedding method; 3) constructing a current vector and a historical vector, and constructing training samples using negative sampling; 4) establishing a word embedding-based multi-stage attack detection model, computing an association vector, computing the association probability with a supervised learning classification algorithm, and judging how likely the current data is to belong to a multi-stage attack. The advantage of the method is that the intrusion detection system can automatically associate attack stages at the data packet level without predefined association rules, while avoiding the problem of alert-level multi-stage attack detection, in which some attack stages generate no alert.
Description
Technical Field
The invention relates to a word embedding-based multi-stage network attack detection method, which is suitable for detecting multi-stage network attacks carried out purposefully by an attacker in industrial internet boundary protection scenarios.
Background
Industrial internet boundary protection generally covers five aspects: identification, protection, detection, response and recovery. Intrusion detection is an important link in this protection: by monitoring the continuous network traffic of the industrial internet and analyzing the traffic characteristics that follow an attack, it locates the attack, identifies the occurrence of a security event, and provides information to the security response and security recovery mechanisms.
As industrial internet boundary protection technology has developed, it has become increasingly difficult for an attacker to penetrate a network by exploiting isolated vulnerabilities and security flaws (such as SQL injection attacks or denial-of-service attacks). To intrude successfully, an attacker therefore often has to combine a series of techniques such as network probing, vulnerability discovery and flaw exploitation, and to penetrate step by step, so that a single intrusion consists of several stages and forms a multi-stage attack. Moreover, to conceal the attack, the attacker often disguises some attack stages as normal network behavior; these disguised stages are nevertheless associated with the other stages in achieving the goal of the attack.
Traditional machine-learning-based intrusion detection generally models network traffic, either identifying attacks from the network characteristics of known attacks or detecting them through anomalies in network packets. It largely ignores the sequential correlation of network data and therefore cannot detect multi-stage attacks, whose detection poses new challenges. Existing multi-stage attack detection methods, on the other hand, are mainly rule-based or based on statistical learning algorithms. Rule-based methods require rules to be written manually and are generally used to extract multi-stage attacks from the data of attacks that have already occurred and to perform association analysis. Methods based on statistical learning mainly use a hidden Markov model, whose parameters are learned by statistical analysis of a large number of attack samples; however, the hidden Markov model relies on an independence assumption, namely that the current state depends only on the previous state, so it cannot learn deeper multi-stage attack characteristics.
Disclosure of Invention
The invention aims to provide a word embedding-based multi-stage network attack detection method, starting from the observation that the network packets belonging to the different stages of a multi-stage attack are latently correlated. Unlike existing methods, the invention targets the sequential characteristics of network traffic data and the planned, multi-stage behavior of the attacker.
In order to achieve the above purpose, the invention is realized by the following technical scheme:
a word embedding-based multi-stage network attack detection method comprises the following steps:
1) carrying out feature selection on a data set formed by network traffic features after attack occurs;
2) vectorizing network traffic data using a word embedding method;
3) respectively constructing a current vector and a historical vector, and constructing a training sample by using a negative sampling method;
4) establishing a multi-stage attack detection model based on word embedding, calculating an association vector, calculating association probability by using a supervised learning classification algorithm, and judging the possibility that the current data belongs to multi-stage attack.
The feature selection in step 1) comprises the following steps:
step 1.1, randomly dividing a data set consisting of a large number of attack samples into a training set, a validation set and a test set; let $X = (x_1, x_2, x_3, \ldots, x_i)$ denote a data sequence and $x_i = (x_i^{(1)}, x_i^{(2)}, x_i^{(3)}, \ldots, x_i^{(j)})$ denote a single data packet, where $x_i^{(j)}$ is the j-th feature of packet $x_i$;
step 1.2, analyzing the attack data according to the network packet structure and the network transmission protocol, performing preliminary feature selection and feature construction, and selecting n features;
the vectorization of the network traffic data in step 2) comprises the following steps:
step 2.1, keeping the n network traffic features obtained by feature selection and deleting the unselected features from the original data set, where network traffic refers to the amount of information passing through a network device or transmission medium per unit time;
step 2.2, dividing the data set into one sequence per feature, i.e. $S = (s^{(1)}, s^{(2)}, s^{(3)}, \ldots, s^{(j)})$, where a single feature sequence is denoted $s^{(j)} = (x_1^{(j)}, x_2^{(j)}, x_3^{(j)}, \ldots, x_i^{(j)})$, finally obtaining no more sequences than there are features;
step 2.3, applying the skip-gram word embedding method of word2vec to the obtained feature sequences: a single feature sequence is used as the corpus, the values within a window are taken as one sample each time, the center word of the window is the input and the remaining words are the output, a neural network is constructed to predict the remaining words from the center word, and the hidden-layer weights of the network are used as the vector of the input word. The other feature sequences are embedded in the same way, giving a word embedding vector for each feature, denoted $v_i^{(j)} \in \mathbb{R}^k$;
step 2.4, concatenating the vectors of the individual features to obtain the vector representation of the network traffic data, i.e. $v_i = (v_i^{(1)}, v_i^{(2)}, v_i^{(3)}, \ldots, v_i^{(j)})$;
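Steps 2.2 and 2.3 can be illustrated with a minimal sketch in plain Python. It only shows how packets are split into per-feature sequences and how skip-gram (center word, context word) training pairs are drawn from one sequence; the neural network training itself is omitted. The function names, the field names `protocol` and `port`, and the window size are hypothetical and do not come from the source.

```python
from typing import Dict, List, Sequence, Tuple


def feature_sequences(packets: Sequence[Dict[str, str]],
                      features: Sequence[str]) -> Dict[str, List[str]]:
    """Step 2.2 (sketch): turn packet rows into one token sequence per feature."""
    return {f: [p[f] for p in packets] for f in features}


def skipgram_pairs(seq: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """Step 2.3 (sketch): (center, context) pairs; the center word is the
    network input and every other word inside the window is a target."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(seq):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        pairs.extend((center, seq[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy data; the values are illustrative only.
packets = [
    {"protocol": "TCP", "port": "80"},
    {"protocol": "TCP", "port": "445"},
    {"protocol": "UDP", "port": "53"},
]
seqs = feature_sequences(packets, ["protocol", "port"])
pairs = skipgram_pairs(seqs["port"], window=1)
```

Each feature sequence would then be passed to a skip-gram trainer separately, so that every feature gets its own embedding table.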
The construction of the current vector, the historical vector and the training samples in step 3) comprises the following steps:
step 3.1, creating a vector $H \in \mathbb{R}^{m \times k}$ of length m to store the history information of the multi-stage attack; the data at any time t is denoted $D \in \mathbb{R}^k$ and represents the information of the current data; let $A_i$ denote the corresponding attack stage;
step 3.2, initializing H with the data of the first attack stage; data $D^+ \in A_2$ from the second attack stage is taken as a positive example and $[H, D^+]$ as a positive sample with label 1; data from other moments is taken as a negative example $D^-$ and $[H, D^-]$ as a negative sample with label 0; positive and negative samples are constructed at a ratio of 1:g, where g > 1 is an algorithm parameter set manually for sampling during training;
step 3.3, updating H according to the attack stage so that the new vector contains the information of the attack stages that have appeared so far, and repeating the sample construction of step 3.2 until M samples are constructed;
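The sample construction of step 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: negatives are drawn uniformly with replacement from a pool of data from other moments, and the function name `build_samples` and its arguments are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[List[Vector], Vector, int]  # (H, D, label)


def build_samples(history: List[Vector],
                  positives: Sequence[Vector],
                  others: Sequence[Vector],
                  g: int,
                  rng: random.Random) -> List[Sample]:
    """For each positive [H, D+] (label 1), add g negatives [H, D-] (label 0)."""
    samples: List[Sample] = []
    for d_pos in positives:
        samples.append((history, d_pos, 1))
        # Assumption: uniform sampling with replacement from the other moments.
        for d_neg in rng.choices(others, k=g):
            samples.append((history, d_neg, 0))
    return samples


# Toy vectors with k = 2; all values are illustrative only.
H = [[0.1, 0.2]]
stage2 = [[1.0, 1.0], [0.5, 0.5]]
other_moments = [[9.0, 9.0], [8.0, 8.0]]
samples = build_samples(H, stage2, other_moments, g=3, rng=random.Random(0))
```

Step 3.3 would repeat this call after H has been updated with the next attack stage, until M samples have been collected.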
the establishment of the word embedding-based multi-stage attack detection model in step 4) comprises the following steps:
step 4.1, based on the training samples constructed in step 3, converting the sequence modeling problem into a classification problem, and computing an association vector from the current data vector D and the historical vector H, denoted R(H, D), as the input of the classifier;
the association vector R(H, D) is calculated as
$R(H, D) = D \odot [h_1, h_2, \ldots, h_m]$
where $h_m$ denotes the m-th vector in H; the formula takes the Hadamard product of D with each vector $h_m$ in H;
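As a small worked example of the association vector, the sketch below computes the Hadamard (element-wise) product of D with every row of H using plain Python lists. Returning the result as m rows of length k is an assumption; the source does not say whether the rows are flattened before being fed to the classifier.

```python
from typing import List

Vector = List[float]


def association_vector(H: List[Vector], D: Vector) -> List[Vector]:
    """R(H, D): element-wise product of D with each history vector h in H."""
    if any(len(h) != len(D) for h in H):
        raise ValueError("every vector in H must have the same length as D")
    return [[d * x for d, x in zip(D, h)] for h in H]


# Toy example with m = 2 history vectors and k = 3.
H = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
D = [2.0, 2.0, 2.0]
R = association_vector(H, D)
```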
the optimization objective is
Wherein S represents the size of a training set, H and D are input of a model, y is a label and represents the real output of the model, and p (D, H) represents the association probability between current data D and historical data H which have been attacked;
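The formula of the objective is not shown in this text. Given the symbols defined above (S samples, binary label y, association probability p(D, H)), one plausible reading is the standard binary cross-entropy; the sketch below is written under that assumption and should not be taken as the patent's exact objective. The function name `log_loss` and the clipping constant `eps` are hypothetical.

```python
import math
from typing import Sequence


def log_loss(labels: Sequence[int], probs: Sequence[float],
             eps: float = 1e-12) -> float:
    """Assumed objective: -(1/S) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# Confident correct predictions give a lower loss than uninformative ones.
good = log_loss([1, 0], [0.9, 0.1])
uninformative = log_loss([1, 0], [0.5, 0.5])
```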
step 4.2, initializing a history vector H;
step 4.3, reading the real-time network traffic data, selecting the characteristics used by the read-in real-time network traffic data according to the network traffic characteristics selected in the step 1.2, and vectorizing the characteristics according to the step 2 to obtain a real-time data vector D;
step 4.4, using the model established in step 4.1 to judge the relation between D and the attack stages that have occurred, and outputting the association probability $P_a = p(D \mid H)$;
step 4.5, defining a threshold: if $P_a$ is greater than the given threshold, D is added to the cache, and H is updated when the cache reaches the specified size;
step 4.6, repeating steps 4.3 to 4.5.
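The detection loop of steps 4.3 to 4.6 can be sketched as below. The trained classifier is replaced by a caller-supplied function `prob`, and the rule used to update H (append the cached vectors and keep the most recent entries so that the length of H is unchanged) is an assumption, since the source does not specify how H is updated. All names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Vector = List[float]


def detect_stream(stream: Iterable[Vector],
                  H: List[Vector],
                  prob: Callable[[List[Vector], Vector], float],
                  threshold: float,
                  cache_size: int) -> Tuple[List[float], List[Vector]]:
    """Return the association probability of every D and the final H."""
    cache: List[Vector] = []
    results: List[float] = []
    for D in stream:                      # step 4.3: next vectorized packet
        p_a = prob(H, D)                  # step 4.4: P_a = p(D | H)
        results.append(p_a)
        if p_a > threshold:               # step 4.5: cache associated data
            cache.append(D)
            if len(cache) >= cache_size:
                # Assumed update rule: keep the newest len(H) vectors.
                H = (H + cache)[-len(H):]
                cache = []
    return results, H


def toy_prob(H: List[Vector], D: Vector) -> float:
    """Stand-in for the trained classifier; simply reads the first component."""
    return D[0]


stream = [[0.9], [0.1], [0.8], [0.7]]
probs, final_H = detect_stream(stream, [[0.0], [0.0]], toy_prob,
                               threshold=0.5, cache_size=2)
```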
Compared with the prior art, the invention has the following obvious and prominent substantive characteristics and remarkable advantages:
the intrusion detection system can automatically associate the attack stage from the data packet level without defining association rules, and simultaneously avoids the problem that no alarm is generated in part of the attack stage when multi-stage attack detection is carried out from the alarm level.
Drawings
FIG. 1 is a general flow chart of the process of the present invention.
FIG. 2(a) is a diagram of a multi-stage attack detection model in the training phase of the present invention.
FIG. 2(b) is a diagram of a multi-stage attack detection model at the testing stage of the present invention.
Fig. 3(a) is a diagram of a network traffic data vectorization model according to the present invention.
Fig. 3(b) is an exemplary diagram of vectorization of network traffic data according to the present invention.
FIG. 4 shows the receiver operating characteristic (ROC) curves of the multi-stage attack detection model.
FIG. 5 is a confusion matrix corresponding to the multi-stage attack detection results in the test set.
FIG. 6(a) is a diagram showing the real-time detection result of multi-stage attack in the test set according to the present invention.
FIG. 6(b) is a diagram of the original attack phase distribution of network traffic in the test suite according to the present invention.
Fig. 6(c) is a diagram of the multi-stage attack real-time detection result in other network scenarios according to the present invention.
Fig. 6(d) is the original attack stage distribution diagram under other network scenarios.
Detailed Description
The invention is described in detail below with reference to the drawings and preferred embodiments.
The first embodiment is as follows:
as shown in fig. 1, a multi-stage network attack detection method based on word embedding includes the following steps:
1) carrying out feature selection on a data set formed by network traffic features after attack occurs;
2) vectorizing network traffic data using a word embedding method;
3) respectively constructing a current vector and a historical vector, and constructing a training sample by using a negative sampling method;
4) establishing a multi-stage attack detection model based on word embedding, calculating an association vector, calculating association probability by using a supervised learning classification algorithm, and judging the possibility that the current data belongs to multi-stage attack.
Example two:
this embodiment is substantially the same as the first embodiment, and is characterized in that:
in this embodiment, a multi-stage network attack detection method based on word embedding, the feature selection in step 1) includes the following steps:
step 1.1, randomly dividing a data set consisting of a large number of attack samples into a training set, a validation set and a test set, with a split ratio of 80%:10%:10% in this embodiment; let $X = (x_1, x_2, x_3, \ldots, x_i)$ denote a data sequence and $x_i = (x_i^{(1)}, x_i^{(2)}, x_i^{(3)}, \ldots, x_i^{(j)})$ denote a single data packet, where $x_i^{(j)}$ is the j-th feature of packet $x_i$;
step 1.2, analyzing the attack data according to the network packet structure and the network transmission protocol, performing preliminary feature selection and feature construction, and selecting n features; in this embodiment, the IP address, network protocol, port number, data length and time difference are selected as the original features, and port numbers are mapped to commonly exploited ports;
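The 80%:10%:10% random split of step 1.1 can be sketched as follows. The function name, the fixed seed and the choice to treat each list item as one attack sample are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Iterable[T],
                  ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into training, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    # Whatever remains after the first two cuts becomes the test set.
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


train, val, test = split_dataset(range(100))
```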
the vectorization of the network traffic data in the step 2) comprises the following steps:
step 2.1, keeping the n network traffic features obtained by feature selection and deleting the unselected features from the original data set, where network traffic refers to the amount of information passing through a network device or transmission medium per unit time;
step 2.2, dividing the data set into one sequence per feature, i.e. $S = (s^{(1)}, s^{(2)}, s^{(3)}, \ldots, s^{(j)})$, where a single feature sequence is denoted $s^{(j)} = (x_1^{(j)}, x_2^{(j)}, x_3^{(j)}, \ldots, x_i^{(j)})$, finally obtaining no more sequences than there are features; in this embodiment, only the network protocol, the port number and the IP address are finally selected for vectorization; because the source port and the destination port are of the same nature, only one sequence is constructed to learn the port embedding vectors, and the same applies to the IP addresses;
step 2.3, applying the skip-gram word embedding method of word2vec to the obtained feature sequences: a single feature sequence is used as the corpus, the values within a window are taken as one sample each time, the center word of the window is the input and the remaining words are the output, a neural network is constructed to predict the remaining words from the center word, and the hidden-layer weights of the network are used as the vector of the input word. The other feature sequences are embedded in the same way, giving a word embedding vector for each feature, denoted $v_i^{(j)} \in \mathbb{R}^k$; in this embodiment, the vector dimension k is set to 8;
step 2.4, concatenating the vectors of the individual features to obtain the vector representation of the network traffic data, i.e. $v_i = (v_i^{(1)}, v_i^{(2)}, v_i^{(3)}, \ldots, v_i^{(j)})$;
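Step 2.4, together with the shared port and IP sequences described in step 2.2, can be sketched as a table lookup followed by concatenation. The embedding tables below are hand-written toy values with k = 2 rather than trained vectors with k = 8, and the field names and their order are assumptions.

```python
from typing import Dict, List

Vector = List[float]
Tables = Dict[str, Dict[str, Vector]]

# Source and destination fields reuse one embedding table each.
SHARED_TABLE = {
    "protocol": "protocol",
    "src_port": "port", "dst_port": "port",
    "src_ip": "ip", "dst_ip": "ip",
}
FIELD_ORDER = ("protocol", "src_port", "dst_port", "src_ip", "dst_ip")


def packet_vector(packet: Dict[str, str], tables: Tables) -> Vector:
    """Concatenate the per-feature embeddings of one packet."""
    vec: Vector = []
    for field in FIELD_ORDER:
        vec.extend(tables[SHARED_TABLE[field]][packet[field]])
    return vec


# Toy embedding tables; values are illustrative only.
tables: Tables = {
    "protocol": {"TCP": [0.1, 0.2]},
    "port": {"80": [0.3, 0.4], "445": [0.5, 0.6]},
    "ip": {"10.0.0.1": [0.7, 0.8], "10.0.0.2": [0.9, 1.0]},
}
packet = {"protocol": "TCP", "src_port": "445", "dst_port": "80",
          "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"}
v = packet_vector(packet, tables)
```

Because both port fields read from the same table, port 80 maps to the same vector whether it appears as a source or a destination port.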
The training sample construction in the step 3) comprises the following steps:
step 3.1, maintaining a vector $H \in \mathbb{R}^{m \times j \times k}$ of length m to store the history information of the multi-stage attack; the data at any time t is denoted $D \in \mathbb{R}^k$ and represents the information of the current data; let $A_i$ denote the corresponding attack stage; the length of H in this embodiment is 50;
step 3.2, initializing H with the data of the first attack stage; data $D^+ \in A_2$ from the second attack stage is taken as a positive example and $[H, D^+]$ as a positive sample with label 1; data from other moments is taken arbitrarily as a negative example $D^-$ and $[H, D^-]$ as a negative sample with label 0; positive and negative samples are constructed at a ratio of 1:g, and to enhance the generalization ability of the model the ratio is set to 1:50 in this embodiment;
step 3.3, updating H according to the attack stage so that the new vector contains the information of the attack stages that have appeared so far, and repeating the sample construction of step 3.2 until M samples are constructed, where M is 20400 in this embodiment;
the establishment of the word embedding-based multi-stage attack detection model in the step 4) comprises the following steps:
step 4.1, based on the training samples constructed in step 3, converting the sequence modeling problem into a classification problem, computing an association vector from the current data vector D and the historical vector H, denoted R(H, D), and training the word embedding-based multi-stage attack detection model;
the association vector is calculated as
$R(H, D) = D \odot [h_1, h_2, \ldots, h_m]$
where $h_m$ denotes the m-th vector in H; the formula takes the Hadamard product of D with each vector $h_m$ in H.
The optimization objective is
Wherein S represents the size of the training set, H and D are the input of the model, y is the label and represents the real output of the model, and p (D, H) represents the association probability between the current data D and the historical data H in which the attack has occurred.
Step 4.2, initializing a history vector H, reading real-time network traffic data, wherein the initialization H is the first stage of the multi-stage attack;
step 4.3, selecting from the incoming real-time network traffic data the features chosen in step 1.2, and vectorizing them according to step 2 to obtain the real-time data vector D;
step 4.4, using the model established in step 4.1 to judge the relation between D and the attack stages that have occurred, and outputting the association probability $P_a = p(D \mid H)$;
step 4.5, defining a threshold: if $P_a$ is greater than the given threshold, D is added to the cache, and H is updated when the cache reaches the specified size; in this embodiment the threshold is selected according to the recall rate and the accuracy rate of the model, and the cache size is set to 50;
step 4.6, repeating steps 4.3 to 4.5.
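The source states only that the threshold of step 4.5 is chosen according to the recall and accuracy of the model. One common way to do this, shown here purely as an assumption, is to scan candidate thresholds on a validation set and keep the one with the highest F1 score. The function name and the candidate set are hypothetical.

```python
from typing import Sequence, Tuple


def best_threshold(labels: Sequence[int],
                   probs: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1, predicting positive when p > t."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if p > t and y == 1)
        fp = sum(1 for y, p in zip(labels, probs) if p > t and y == 0)
        fn = sum(1 for y, p in zip(labels, probs) if p <= t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best_f1:  # keep the first threshold reaching the best score
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation scores; values are illustrative only.
threshold, f1 = best_threshold([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9])
```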
Example three:
this embodiment is basically the same as the second embodiment, and is characterized in that:
in this embodiment, when a word embedding-based multi-stage attack detection model is established, a plurality of classification algorithms are used to output association probabilities, and the results are shown in the following table through comparison analysis in multiple aspects of accuracy, recall, F1, AUC, and inference time:
the network traffic characteristics selected in this embodiment are as follows:
(1) protocol, network transport protocol types such as TCP, UDP, ICMP, etc.;
(2) data length, length of a single data packet;
(3) delta time, the time difference between the current packet and the last packet;
(4) source port, source host port number;
(5) destination port, destination host port number;
(6) source IP, source host IP address;
(7) destination IP, destination host IP address;
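The seven features listed above can be gathered into one record per packet, as in the following sketch. The raw packet tuple layout, the class name and the convention that the first packet has a delta time of 0 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical raw layout: (timestamp, protocol, length, sport, dport, src, dst)
RawPacket = Tuple[float, str, int, int, int, str, str]


@dataclass
class PacketFeatures:
    protocol: str
    data_length: int
    delta_time: float
    source_port: int
    destination_port: int
    source_ip: str
    destination_ip: str


def extract_features(raw: Iterable[RawPacket]) -> List[PacketFeatures]:
    """Build one feature record per packet; delta_time is the gap to the
    previous packet (0.0 for the first packet, by assumption)."""
    features: List[PacketFeatures] = []
    previous_ts: Optional[float] = None
    for ts, proto, length, sport, dport, src, dst in raw:
        delta = 0.0 if previous_ts is None else ts - previous_ts
        features.append(
            PacketFeatures(proto, length, delta, sport, dport, src, dst))
        previous_ts = ts
    return features


# Toy capture using documentation-range addresses.
raw_packets = [
    (0.0, "TCP", 60, 51000, 80, "192.0.2.1", "192.0.2.2"),
    (0.5, "TCP", 1500, 80, 51000, "192.0.2.2", "192.0.2.1"),
    (2.0, "UDP", 90, 53000, 53, "192.0.2.1", "192.0.2.3"),
]
features = extract_features(raw_packets)
```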
In this embodiment, fig. 2(a) shows the word embedding-based multi-stage attack detection model in the training stage. The inputs of the model are the historical attack stage data and the current data; word embedding vectorization yields the historical vector and the current vector, the association vector is computed by the method of step 4.1, and a classifier finally determines whether the current data is associated with the historical attack stages. Fig. 2(b) shows the model in the test stage, which acquires network traffic in real time, initializes the historical vector using an intrusion detection system, outputs the association probability between the current data and the historical vector, adds the current data to a cache when the probability is greater than a given threshold, and updates the historical vector when the cache reaches the specified size.
In this embodiment, the multivariate word embedding model for vectorizing network data in step 2 is shown in fig. 3(a), and fig. 3(b) gives an example of multivariate word embedding vectorization. Note that the example in fig. 3(b) contains three features but only two sequences after serialization: port numbers have the same meaning on the source host and the target host, so the source and target port numbers are treated as one sequence, and the same port number has the same word vector on both hosts.
In this embodiment, fig. 4 shows the receiver operating characteristic (ROC) curves of the proposed multi-stage attack detection model with different classifiers. The ensemble learning classifier achieves the best result, mainly because, although vectors obtained by word embedding are usually high-dimensional data, the word embedding dimension in this embodiment is not high, so a conventional machine learning classification algorithm can obtain good results.
Fig. 5 shows the confusion matrix corresponding to the multi-stage attack detection results in the test set, and it can be seen that each attack stage is correctly detected with an accuracy of over 90%.
Fig. 6(a) shows, for the network traffic of the test set, the association probability output by the multi-stage attack model between the data at the current time and the previous attack stages, and fig. 6(b) shows where the different attack stages occur in the original test set data. Comparing fig. 6(a) with fig. 6(b) shows that the model correctly detects the data associated with the previous attack stages. To test the generalization performance of the model, the multi-stage attack detection model of the invention is further evaluated in a different network scenario; the corresponding detection result and the original attack stage distribution are shown in fig. 6(c) and fig. 6(d), from which it can be seen that attack stage 2 to attack stage 4 are correctly associated in that scenario.
The present invention is not limited to the above embodiments, and those skilled in the art can implement the present invention in other various embodiments according to the disclosure of the present invention, so that all designs and concepts of the present invention can be changed or modified without departing from the scope of the present invention.
Claims (4)
1. A multi-stage network attack detection method based on word embedding is characterized by comprising the following steps:
1) carrying out feature selection on a data set formed by network traffic features after attack occurs;
2) vectorizing network traffic data using a word embedding method;
3) respectively constructing a current vector and a historical vector, and constructing a training sample by using a negative sampling method;
4) establishing a multi-stage attack detection model based on word embedding, calculating an association vector, calculating association probability by using a supervised learning classification algorithm, and judging the possibility that the current data belongs to multi-stage attack.
2. The word-embedding-based multi-stage network attack detection method according to claim 1, wherein the feature selection in the step 1) comprises the following steps:
step 1.1, randomly dividing a data set consisting of a large number of attack samples into a training set, a validation set and a test set; let $X = (x_1, x_2, x_3, \ldots, x_i)$ denote a data sequence and $x_i = (x_i^{(1)}, x_i^{(2)}, x_i^{(3)}, \ldots, x_i^{(j)})$ denote a single data packet, where $x_i^{(j)}$ is the j-th feature of packet $x_i$;
step 1.2, analyzing the attack data according to the network packet structure and the network transmission protocol, performing preliminary feature selection and feature construction, and selecting n features;
the vectorization of the network traffic data in step 2) comprises the following steps:
step 2.1, keeping the n network traffic features obtained by feature selection and deleting the unselected features from the original data set, where network traffic refers to the amount of information passing through a network device or transmission medium per unit time;
step 2.2, dividing the data set into one sequence per feature, i.e. $S = (s^{(1)}, s^{(2)}, s^{(3)}, \ldots, s^{(j)})$, where a single feature sequence is denoted $s^{(j)} = (x_1^{(j)}, x_2^{(j)}, x_3^{(j)}, \ldots, x_i^{(j)})$, finally obtaining no more sequences than there are features;
step 2.3, applying the skip-gram word embedding method of word2vec to the obtained feature sequences: a single feature sequence is used as the corpus, the values within a window are taken as one sample each time, the center word of the window is the input and the remaining words are the output, a neural network is constructed to predict the remaining words from the center word, and the hidden-layer weights of the network are used as the vector of the input word. The other feature sequences are embedded in the same way, giving a word embedding vector for each feature, denoted $v_i^{(j)} \in \mathbb{R}^k$;
step 2.4, concatenating the vectors of the individual features to obtain the vector representation of the network traffic data, i.e. $v_i = (v_i^{(1)}, v_i^{(2)}, v_i^{(3)}, \ldots, v_i^{(j)})$.
3. The word-embedding-based multi-stage network attack detection method according to claim 1, wherein the constructing of the training samples in the step 3) comprises the following steps:
step 3.1, creating a vector $H \in \mathbb{R}^{m \times k}$ of length m to store the history information of the multi-stage attack; the data at any time t is denoted $D \in \mathbb{R}^k$ and represents the information of the current data; let $A_i$ denote the corresponding attack stage;
step 3.2, initializing H with the data of the first attack stage; data $D^+ \in A_2$ from the second attack stage is taken as a positive example and $[H, D^+]$ as a positive sample with label 1; data from other moments is taken arbitrarily as a negative example $D^-$ and $[H, D^-]$ as a negative sample with label 0; positive and negative samples are constructed at a ratio of 1:g, where g > 1 is an algorithm parameter set manually for sampling during training;
step 3.3, updating H according to the attack stage so that the new vector contains the information of the attack stages that have appeared so far, and repeating the sample construction of step 3.2 until M samples are constructed.
4. The word-embedding-based multi-stage network attack detection method according to claim 1, wherein the establishing of the word-embedding-based multi-stage attack detection model in the step 4) comprises the following steps:
step 4.1, based on the training samples constructed in step 3, converting the sequence modeling problem into a classification problem, computing an association vector from the current data vector D and the historical vector H, denoted R(H, D), and training the word embedding-based multi-stage attack detection model;
the association vector R(H, D) is calculated as
$R(H, D) = D \odot [h_1, h_2, \ldots, h_m]$
where $h_m$ denotes the m-th vector in H; the formula takes the Hadamard product of D with each vector $h_m$ in H;
the optimization objective is
Wherein S represents the size of a training set, H and D are input of a model, y is a label and represents the real output of the model, and p (D, H) represents the association probability between current data D and historical data H which have been attacked;
step 4.2, initializing the history vector H and reading real-time network traffic data, wherein H is initialized to the first stage of the multi-stage attack;
step 4.3, selecting from the read-in real-time network traffic data the features chosen in step 1.2, and vectorizing them according to step 2 to obtain the real-time data vector D;
step 4.4, using the model established in step 4.1 to judge the relation between D and the attack stages that have already occurred, and outputting the association probability $P_a = p(D \mid H)$;
step 4.5, defining a threshold: if $P_a$ is greater than the given threshold, adding D to a cache, and updating H when the cache reaches the specified size, wherein the threshold is selected according to the recall rate and the accuracy rate of the model, and the cache size is set to 50;
step 4.6, repeating steps 4.3 to 4.5.
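Steps 4.1 to 4.6 can be sketched as follows, under several assumptions that are not stated in the claim. The formula of the optimization objective is not reproduced in this text, so `bce_loss` uses the mean binary cross-entropy, which is one loss consistent with the definitions of S, y and p(D, H). `toy_model` is a hypothetical stand-in for the trained classifier, the default threshold of 0.5 is illustrative, and updating H by appending the cached vectors is an assumed rule.

```python
import math

def association(history, d):
    """R(H, D): Hadamard (element-wise) product of D with each h_i in H."""
    return [[x * y for x, y in zip(d, h)] for h in history]

def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over S samples (assumed form of the objective)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

def toy_model(history, d):
    """Hypothetical stand-in for p(D | H): sigmoid of the mean entry of R(H, D)."""
    flat = [v for row in association(history, d) for v in row]
    score = sum(flat) / len(flat)
    return 1.0 / (1.0 + math.exp(-score))

def detect(stream, first_stage, model, threshold=0.5, cache_size=50):
    """Score each incoming vector against H and grow H from a cache of hits."""
    history = list(first_stage)      # step 4.2: H starts as the first stage
    cache, flagged = [], []
    for d in stream:                 # steps 4.3 and 4.6: one vector at a time
        p_a = model(history, d)      # step 4.4: association probability
        if p_a > threshold:          # step 4.5: cache vectors above threshold
            cache.append(d)
            flagged.append((d, p_a))
            if len(cache) >= cache_size:
                history = history + cache  # assumed update rule for H
                cache = []
    return flagged, history

# Toy run with a small cache so that the update of H is visible.
stream = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0]]
flagged, history = detect(stream, [[1.0, 1.0]], toy_model, cache_size=2)
```

Any callable that takes `(history, d)` and returns a probability can replace `toy_model`, so the loop stays independent of how the classifier is trained.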
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010660792.0A CN112019497B (en) | 2020-07-10 | 2020-07-10 | Word embedding-based multi-stage network attack detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010660792.0A CN112019497B (en) | 2020-07-10 | 2020-07-10 | Word embedding-based multi-stage network attack detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112019497A (en) | 2020-12-01 |
CN112019497B (en) | 2021-12-03 |
Family
ID=73498505
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010660792.0A Active CN112019497B (en) | 2020-07-10 | 2020-07-10 | Word embedding-based multi-stage network attack detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112019497B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107807987A (en) * | 2017-10-31 | 2018-03-16 | 广东工业大学 | A kind of string sort method, system and a kind of string sort equipment |
CN109190372A (en) * | 2018-07-09 | 2019-01-11 | 四川大学 | A kind of JavaScript Malicious Code Detection model based on bytecode |
CN109117482A (en) * | 2018-09-17 | 2019-01-01 | 武汉大学 | A kind of confrontation sample generating method towards the detection of Chinese text emotion tendency |
CN109670307A (en) * | 2018-12-04 | 2019-04-23 | 成都知道创宇信息技术有限公司 | A kind of SQL injection recognition methods based on CNN and massive logs |
CN109766693A (en) * | 2018-12-11 | 2019-05-17 | 四川大学 | A kind of cross-site scripting attack detection method based on deep learning |
Non-Patent Citations (1)
Title |
---|
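Chen Yi et al.: "HTTP slow DoS attack detection method based on one-dimensional convolutional neural network", Journal of Computer Applications (《计算机应用》) * |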
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113221100A (en) * | 2021-02-09 | 2021-08-06 | 上海大学 | Countermeasure intrusion detection method for industrial internet boundary protection |
CN113098735A (en) * | 2021-03-31 | 2021-07-09 | 上海天旦网络科技发展有限公司 | Inference-oriented application flow and index vectorization method and system |
CN113179256A (en) * | 2021-04-12 | 2021-07-27 | 中国电子科技集团公司第三十研究所 | Time information safety fusion method and system for time synchronization system |
CN113179256B (en) * | 2021-04-12 | 2022-02-08 | 中国电子科技集团公司第三十研究所 | Time information safety fusion method and system for time synchronization system |
CN112995209A (en) * | 2021-04-20 | 2021-06-18 | 北京智源人工智能研究院 | Flow monitoring method, device, equipment and medium |
CN112995209B (en) * | 2021-04-20 | 2021-08-17 | 北京智源人工智能研究院 | Flow monitoring method, device, equipment and medium |
CN113591971A (en) * | 2021-07-28 | 2021-11-02 | 上海数鸣人工智能科技有限公司 | User individual behavior prediction method based on DPI time series word embedded vector |
CN113591971B (en) * | 2021-07-28 | 2024-05-07 | 上海数鸣人工智能科技有限公司 | User individual behavior prediction method based on DPI time sequence word embedded vector |
CN117118687A (en) * | 2023-08-10 | 2023-11-24 | 国网冀北电力有限公司张家口供电公司 | Multi-stage attack dynamic detection system based on unsupervised learning |
CN117118687B (en) * | 2023-08-10 | 2024-08-20 | 国网冀北电力有限公司张家口供电公司 | Multi-stage attack dynamic detection system based on unsupervised learning |
CN118075030A (en) * | 2024-04-19 | 2024-05-24 | 鹏城实验室 | Network attack detection method and device, electronic equipment and storage medium |
CN118075030B (en) * | 2024-04-19 | 2024-07-02 | 鹏城实验室 | Network attack detection method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112019497B (en) | 2021-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112019497B (en) | Word embedding-based multi-stage network attack detection method | |
CN112953924B (en) | Network abnormal flow detection method, system, storage medium, terminal and application | |
Radford et al. | Network traffic anomaly detection using recurrent neural networks | |
Yuan et al. | DeepDefense: identifying DDoS attack via deep learning | |
CN106911669B (en) | DDOS detection method based on deep learning | |
JP6835703B2 (en) | Cyber attack detection system, feature selection system, cyber attack detection method, and program | |
JP2019110513A (en) | Anomaly detection method, learning method, anomaly detection device, and learning device | |
CN112333195B (en) | APT attack scene reduction detection method and system based on multi-source log correlation analysis | |
CN110768946A (en) | Industrial control network intrusion detection system and method based on bloom filter | |
Anil et al. | A hybrid method based on genetic algorithm, self-organised feature map, and support vector machine for better network anomaly detection | |
Saurabh et al. | Nfdlm: A lightweight network flow based deep learning model for ddos attack detection in iot domains | |
GB2583892A (en) | Adaptive computer security | |
CN111367908A (en) | Incremental intrusion detection method and system based on security assessment mechanism | |
Al-Shabi | Design of a network intrusion detection system using complex deep neuronal networks | |
CN117061254B (en) | Abnormal flow detection method, device and computer equipment | |
Huynh et al. | On the performance of intrusion detection systems with hidden multilayer neural network using DSD training | |
CN116170237B (en) | Intrusion detection method fusing GNN and ACGAN | |
CN115277178B (en) | Abnormality monitoring method, device and storage medium based on enterprise network flow | |
CN114444075B (en) | Method for generating evasion flow data | |
Leevy et al. | IoT attack prediction using big Bot-IoT data | |
Alrawashdeh et al. | Optimizing Deep Learning Based Intrusion Detection Systems Defense Against White-Box and Backdoor Adversarial Attacks Through a Genetic Algorithm | |
Yashwanth et al. | Network Intrusion Detection using Auto-encoder Neural Networks and MLP | |
Girubagari et al. | Hybrid Intelligent Anomaly Detection System Using Attention based Deep Learning Approach for Cyber Attacks Prevention | |
CN117574135B (en) | Power grid attack event detection method, device, equipment and storage medium | |
Avram et al. | Tiny network intrusion detection system with high performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||