CN113242160A - Protocol identification method based on state machine - Google Patents
- Publication number
- CN113242160A (application number CN202110782472.7A)
- Authority
- CN
- China
- Prior art keywords
- protocol
- data
- model
- state machine
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/18—Protocol analysers
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to the technical field of information transmission protocol identification, and in particular discloses a protocol identification method based on a state machine. The method comprises the following steps: acquiring protocol data; providing the protocol data to a model in a project state machine, and making a data set and a corresponding model; training the model to obtain a model array comprising a plurality of AI models; constructing a state machine based on project, arranging a corresponding state model on the state machine, and deploying the model array in the state model; and collecting a new protocol data stream and importing it into the state machine for identification so as to obtain the protocol type. By training on the protocol data, the method obtains a model array comprising multiple AI models and deploys it in the constructed state machine. The data to be verified are then imported and matched against the model array in the state machine to determine the protocol type, which enables identification of uncommon or unknown protocols and improves the protocol identification effect.
Description
Technical Field
The invention relates to the technical field of information transmission protocol identification, in particular to a protocol identification method based on a state machine.
Background
With the development of the information security field, protocol identification has gradually entered the era of big data and AI. Protocol security problems involving data and signals have become particularly prominent, and the number of incidents is rising. Comprehensively detecting the state of signals and data transmission protocols helps ensure the normal operation of network data flows, so how to identify protocols is important. At present, the field mainly relies on protocol identification software based on Wireshark. However, because of rapid technical development in information security, protocol standards keep multiplying. Existing software can identify only open protocols, performs poorly on relatively uncommon or unknown protocols, and has difficulty meeting the identification requirements for information transmission protocols.
Disclosure of Invention
Therefore, it is necessary to provide a protocol identification method based on a state machine for solving the technical problem that the existing software has poor identification effect on an uncommon or unknown protocol.
A protocol identification method based on a state machine comprises the following steps:
S1: acquiring protocol data;
S2: providing the protocol data to a model in a project state machine, and making a data set and a corresponding model;
S3: training the model to obtain a model array comprising a plurality of AI models;
S4: constructing a state machine based on project, arranging a corresponding state model on the state machine, and deploying the model array in the state model;
S5: collecting a new protocol data stream, and importing the new protocol data stream into the state machine for identification so as to obtain a protocol type.
In one embodiment, the step S1 further includes: protocol basic data is acquired through signal acquisition or network packet capturing, and the protocol basic data is preprocessed to acquire protocol packet data.
In one embodiment, the preprocessing of the protocol basic data in step S1 includes cleaning the protocol basic data.
In one embodiment, in step S3, the AI models include a dbcs7799 protocol model and a tds073 protocol model.
In one embodiment, the step S5 further includes:
S51: analyzing the length of the protocol data stream to obtain a data stream state, and segmenting the protocol packet data according to the data stream state;
S52: constructing basic segments by project on the basis of the segmented data, and constructing an automatic protocol identification model on the basic segments by a cracking algorithm, so as to crack each node one by one.
In one embodiment, the cracking algorithm in step S52 includes:
S521: extracting protocol fingerprints and establishing corresponding protocol verification rules from the protocol samples;
S522: performing fast fingerprint matching and fast verification of the protocol identification result in the protocol identification stage.
In one embodiment, the cracking algorithm is carried out in parallel, and nodes are automatically distributed according to the node computing power of single equipment.
In one embodiment, in step S52, SDN technology is used to perform on-demand allocation of traffic and compute nodes.
In one embodiment, plaintext information cracked by each node is combined according to a time sequence to obtain industrial control information used for industrial control behavior recognition.
In one embodiment, the step S5 further includes: before step S51, the collected protocol data stream is monitored, checked, and uploaded to the state machine for matching identification by big data means.
In the above state-machine-based protocol identification method, a model array comprising multiple AI models is obtained by training models on the protocol data. After the state machine is built based on project, the model array is deployed in it; the data to be verified are imported into the state machine and matched against the model array to determine the protocol type. Uncommon or unknown protocols can thereby be identified, and the protocol identification effect is improved.
Drawings
FIG. 1 is a flow diagram of a state machine based protocol identification method in one embodiment of the invention;
FIG. 2 is a flow diagram of protocol identification in one embodiment of the invention;
FIG. 3 is a flow diagram of cracking recognition in one embodiment of the invention;
FIG. 4 is a logic diagram of a state machine based protocol identification method in an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein.
The invention relates to a protocol identification method based on a state machine. Specifically, a plurality of AI models are obtained through data training; a state machine is then established and the AI models are deployed in it, providing a database against which subsequently acquired protocol data are matched and identified so as to determine the type of the protocol to be identified. Referring to FIG. 1, the state-machine-based protocol identification method of the present invention includes the following steps:
S1, acquiring protocol data. Preferably, step S1 further includes: acquiring protocol basic data through signal acquisition or network packet capture, and preprocessing the protocol basic data to obtain protocol packet data, wherein the preprocessing comprises cleaning or filtering the protocol basic data. Cleaning the protocol basic data is a process of re-examining and verifying the data; it aims to delete duplicate information, correct existing errors, and ensure data consistency. In other words, it finds and corrects recognizable errors in the data files, including checking data consistency and handling invalid and missing values, so as to improve the reliability of the data.
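The cleaning described in step S1 can be illustrated with a minimal Python sketch. It assumes packets arrive as byte strings, that a missing value is represented by `None`, and that a packet shorter than a hypothetical `min_len` counts as invalid; none of these details are specified in the source.

```python
from typing import Iterable, List, Optional


def clean_packets(raw: Iterable[Optional[bytes]], min_len: int = 4) -> List[bytes]:
    """Drop missing, too-short, and duplicate packets, keeping first-seen order.

    `min_len` is an illustrative validity threshold, not a value from the patent.
    """
    seen = set()
    cleaned: List[bytes] = []
    for pkt in raw:
        if pkt is None or len(pkt) < min_len:  # missing or invalid value
            continue
        if pkt in seen:                        # duplicate information
            continue
        seen.add(pkt)
        cleaned.append(pkt)
    return cleaned
```

For example, `clean_packets([b"abcd", None, b"ab", b"abcd"])` keeps only the first `b"abcd"`.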
S2, data set generation: the protocol data are provided to the model in the project state machine, and the data set and corresponding model are made. Specifically, for different protocol application scenarios, the protocol data need to be made into data sets expressed in the various protocol data forms, and corresponding models are then selected according to the data sets. The models are preset in the project state machine: an existing project state machine can be loaded into the system, and its models are used as the preset models corresponding to the various applications. In other words, data packets of the various protocols are formed and the models are selected so that the models can be trained in subsequent processing.
S3, model establishment: training the model to obtain a model array comprising a plurality of AI models. Preferably, the AI models include the dbcs7799 protocol model and the tds073 protocol model.
S4, constructing a state machine: constructing a state machine based on project, arranging a corresponding state model on the state machine, and deploying the model array in the state model.
S5, protocol identification: collecting a new protocol data stream, and importing the new protocol data stream into the state machine for identification so as to obtain the protocol type. Preferably, referring to FIG. 2, step S5 further includes:
S51, data preprocessing: analyzing the length of the protocol data stream to obtain a data stream state, and segmenting the protocol packet data according to the data stream state. Before step S51, the collected protocol data stream is monitored and checked, and then uploaded to the state machine by big data means for matching identification. For example, the protocol data stream may first be compressed, and the compressed stream then uploaded to the state machine in batches to increase the upload rate.
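A rough Python sketch of the segment-then-compress idea follows. The source does not say how the data stream state determines segment boundaries, so fixed-length segmentation is used here purely as a stand-in; the 4-byte length prefix and the function names are likewise assumptions made for illustration.

```python
import zlib
from typing import List


def segment_stream(stream: bytes, segment_len: int) -> List[bytes]:
    """Split a stream into fixed-length segments (the last one may be shorter)."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [stream[i:i + segment_len] for i in range(0, len(stream), segment_len)]


def pack_batch(segments: List[bytes]) -> bytes:
    """Length-prefix each segment, then compress the whole batch for upload."""
    framed = b"".join(len(s).to_bytes(4, "big") + s for s in segments)
    return zlib.compress(framed)


def unpack_batch(blob: bytes) -> List[bytes]:
    """Inverse of pack_batch: decompress and split on the length prefixes."""
    framed = zlib.decompress(blob)
    out: List[bytes] = []
    i = 0
    while i < len(framed):
        n = int.from_bytes(framed[i:i + 4], "big")
        out.append(framed[i + 4:i + 4 + n])
        i += 4 + n
    return out
```

The receiving side would call `unpack_batch` to recover the segments before matching them; compressing one framed batch rather than each segment separately is a design choice that favors fewer, larger uploads.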
S52, cracking and identification: constructing basic segments by project on the basis of the segmented data, and constructing an automatic protocol identification model on the basic segments by a cracking algorithm, so as to crack each node one by one.
Referring to FIG. 3, the cracking algorithm in step S52 includes: S521: extracting protocol fingerprints and establishing corresponding protocol verification rules from the protocol samples; S522: performing fast fingerprint matching and fast verification of the protocol identification result in the protocol identification stage. Furthermore, the cracking algorithm is performed in parallel, and nodes are automatically allocated according to the node computing power of a single device; that is, steps S521 and S522 are performed synchronously and have no logical order between them. Further preferably, in step S52, SDN technology is used to allocate traffic and compute nodes on demand. The plaintext information cracked by each node is combined in time order to obtain industrial control information for industrial control behavior identification. The industrial control behaviors include, for example, air conditioner control and manipulator arm posture adjustment; other industrial control behaviors may of course be involved, depending on the scenario in which the protocol identification method is applied, and are not described further here.
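The parallel processing and time-ordered merging can be sketched as follows. This is only an illustration under stated assumptions: a local thread pool stands in for the compute nodes (the SDN-based allocation is not modeled), each segment carries a capture timestamp, and `decode` is a hypothetical callable that turns one payload into plaintext.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Segment = Tuple[float, bytes]  # (capture timestamp, payload); an assumed layout
Decoded = Tuple[float, str]    # (capture timestamp, plaintext)


def decode_in_parallel(
    segments: List[Segment],
    decode: Callable[[bytes], str],
    workers: int = 4,
) -> List[Decoded]:
    """Decode every segment on a worker pool, then merge results by timestamp."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so zip() below is safe.
        texts = list(pool.map(lambda seg: decode(seg[1]), segments))
    results = [(ts, text) for (ts, _), text in zip(segments, texts)]
    return sorted(results, key=lambda r: r[0])  # combine in time order
```

Sorting after the pool finishes keeps the merge independent of which worker completes first.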
The protocol identification method is fully described below in conjunction with an embodiment of the present invention.
Referring to FIG. 4, a data signal is first collected by a collector to obtain protocol data. During data acquisition, protocol basic data can be acquired by signal acquisition or network packet capture and then preprocessed by data cleaning to eliminate invalid or duplicate data, which improves the reliability of the data and yields the protocol packet data. In this embodiment, two sets of raw data are taken as an example.
Raw data sample 1:
323736000002ffffff90ffffffc83333300000000000000000000000000000000000000000000000000032323300000000000000000000000000000000000000000000001fffffff8b0800000000000003ffffff8550416a03310c3c4f5effffffe117ffffff84ffffff916cffffffadffffffedffffffdcffffffda2404ffffffeafffffff4ffffffd8ffffffb7fffffff461ffffffbd157a0804427f5579ffffffb349ffffff9625ffffffd011fffffff6ffffffc8ffffffb646ffffff92253affffffa41cffffff89ffffff84ffffffe2ffffff80ffffffc0ffffffa0ffffff94ffffffc2ffffffcc18ffffffc436163746ffffffbc51ffffffd310193c4affffffdcffffffaaffffffafffffff944936ffffffd1ffffff92ffffff98ffffffd159420837ffffffa9ffffff87ffffffa90affffff88082affffffc8ffffffd21501ffffffa7fffffff3ffffffd725ffffff8cffffffc0ffffffc7ffffffb6ffffffa17912ffffff843bffffffbaffffff8bffffffd3fffffff7ffffffefffffffcf743cffffffbe3730ffffff91ffffffcbffffffb867401effffffbcffffffa441ffffffa279553affffffcbffffffc8ffffffeaffffffcd5d7918fffffff961727366ffffffd5ffffffbd7fffffffb417ffffffc8fffffff4ffffffa6ffffff9955ffffffc7ffffffaeffffffb86cffffffb3ffffffef4fffffffef6619ffffffe7327c3effffffdc6e22ffffff8603fffffff6ffffffb95affffffaa566afffffffcffffffef7bfffffffd6fffffff9e7affffffed33777effffffdd6d71ffffff9ffffffff552ffffffdb27ffffffbeffffffcfffffffab3ffffffffbffffffbfffffffabffffff81ffffffd8010000。
Raw data sample 2:
343036000200ffffffc0ffffff813443312346312373616c655f7265636f72642331393939643037303331353130363337320000000c20000000000000000040000000000b5220ffffff9fffffffffffffffffffffffffffffffbf7effffffc4080000000000000001ffffffc00000000064fffffffb400000000000000001ffffff88ffffff88ffffff88ffffff8800000001ffffffc000000000000c20ffffff9fffffffffffffffffffffffffffffffffffffffffffffffddffffffd0400000000024ffffff91ffffffa0ffffff9fffffffffffffffffffffffffffffffbf7effffffc4084e554c000000000100000000000000000000000000324e554c000000000000010000000000000000ffffff9fffffffffffffffffffffffffffffffffffffffffffffffde100000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffff9fffffffffffffffffffffffffffffffffffffffffffffffdf100000000000000000400000000000ffffffbb00000000000000。
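The long `ffffff` runs in the two samples look like bytes of 0x80 and above that were printed through a signed, sign-extending hex format, so that for instance 0x90 appears as `ffffff90`. That reading is an assumption, not something the source states. Under it, a dump can be turned back into raw bytes with a small Python helper (the function name is hypothetical):

```python
def decode_sign_extended_hex(dump: str) -> bytes:
    """Recover bytes from a dump where bytes >= 0x80 appear as 'ffffffXX'.

    Assumes bytes below 0x80 are printed as exactly two hex digits, so a
    byte boundary that starts with 'ffffff' always marks a sign-extended byte.
    """
    # Keep only hex digits; this drops stray punctuation or whitespace.
    digits = "".join(c for c in dump.lower() if c in "0123456789abcdef")
    out = bytearray()
    i = 0
    while i < len(digits):
        if digits.startswith("ffffff", i) and i + 8 <= len(digits):
            out.append(int(digits[i + 6:i + 8], 16))  # keep the low byte only
            i += 8
        else:
            out.append(int(digits[i:i + 2], 16))
            i += 2
    return bytes(out)
```

Applied to the first 34 characters of raw data sample 1, the helper yields the eleven bytes `32 37 36 00 00 02 90 c8 33 33 30`.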
After the two sets of raw data are obtained, they are provided to a model in the project state machine to create data sets. Specifically, the two sets of raw data are processed using a Dbcs7799 protocol data form and a Zdsh8848 protocol data form, respectively:
dbcs7799 protocol data
Zdsh8848 protocol data
After the processing, data sets in two protocol forms are obtained. Corresponding models are then selected for the different data sets and trained to obtain AI models in different forms. Model training is common to all data mining technologies and is used to compute the data mining model. Before the model is built, the data are prepared and preprocessed; in particular, preprocessing defines the identification fields to be assigned to relevant information, such as mining types and specific control fields. Model training feeds additional sets of values to the data mining model; these can serve as the description for the testing phase, whose results are used as examples to determine when the algorithm terminates. In other embodiments, when the amount of collected protocol data is large, more protocol models can be generated after training, so that a model array comprising a plurality of protocol models is obtained.
While the model array is being obtained, a state machine is made based on project. The project software is ontology editing and knowledge acquisition software developed in Java, that is, an ontology development tool and knowledge-based editor, and it is open source. It is mainly used to build ontologies in a semantic network and is a core development tool for that purpose. It supports building ontology concept classes, relations, attributes, and instances, and it hides the specific ontology description language, so the user only needs to build the domain ontology model at the concept level, which reduces the difficulty of model building. A state machine, short for finite state automaton, is a mathematical model abstracted from the operating rules of real things. It comprises: states (State), of which a state machine has at least two; events (Event), an event being a trigger condition or command for performing an operation; actions (Action), performed after the event occurs; and transitions (Transition), that is, changes from one state to another. After a state machine comprising multiple groups of variable relations is constructed through the software, the model array is deployed in it. The state of a state model in the state machine indicates whether a protocol is matched; the event is the identification matching of an input protocol data stream against an AI model; the action is the successful match of the protocol data stream with an AI model; and the transition is the state machine outputting the protocol data type corresponding to the matched AI model, which achieves the purpose of identifying the protocol data type.
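The four elements (state, event, action, transition) can be made concrete with a minimal Python sketch. The state names, the class name, and the representation of each deployed model as a boolean matcher function are all assumptions made for illustration; the source does not describe how the trained AI models are invoked.

```python
from typing import Callable, Dict, Optional

Matcher = Callable[[bytes], bool]  # hypothetical stand-in for a trained AI model


class ProtocolStateMachine:
    """Toy finite state machine holding a model array of protocol matchers."""

    def __init__(self, models: Dict[str, Matcher]) -> None:
        self.models = models                  # the deployed model array
        self.state = "IDLE"                   # IDLE, MATCHING, MATCHED, UNMATCHED
        self.protocol: Optional[str] = None

    def feed(self, flow: bytes) -> Optional[str]:
        """Event: an input data stream is matched against each model in turn."""
        self.state = "MATCHING"
        self.protocol = None
        for name, matches in self.models.items():
            if matches(flow):
                self.protocol = name          # action: record the successful match
                self.state = "MATCHED"        # transition: output the protocol type
                return name
        self.state = "UNMATCHED"
        return None
```

A caller would build the machine once with the model array and call `feed` for each new stream, reading the returned protocol name or `None`.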
Specifically, in the process of identifying and matching the protocol data, because the protocol data stream is often long, it needs to be segmented according to the characteristics of its different segments. A plurality of basic segments are constructed through the software, and the basic segments are then substituted into the state machine in sequence for identification and matching. Before protocol identification, a corresponding protocol verification rule is set in the state machine. Protocol fingerprints are then extracted from the protocol sample, that is, the protocol data stream to be verified; in other words, the features of the protocol data stream that a computer can recognize are extracted. The extracted features are substituted into the model array for matching identification, and after matching, the protocol identification result is quickly verified again according to the protocol verification rule so as to improve its reliability.
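The match-then-verify flow can be sketched in Python as below. The source does not define what a protocol fingerprint or a verification rule looks like, so this sketch assumes the fingerprint is simply the first `n` bytes of a stream and a rule is any predicate over the whole stream; all names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Rule = Callable[[bytes], bool]  # verification rule: a predicate over the stream


def extract_fingerprint(sample: bytes, n: int = 3) -> bytes:
    """Assumed fingerprint: the first n bytes of the stream."""
    return sample[:n]


def build_library(
    samples: Dict[str, Tuple[bytes, Rule]], n: int = 3
) -> Dict[bytes, Tuple[str, Rule]]:
    """Map each known sample's fingerprint to its protocol name and rule."""
    return {extract_fingerprint(s, n): (name, rule) for name, (s, rule) in samples.items()}


def identify(
    flow: bytes, library: Dict[bytes, Tuple[str, Rule]], n: int = 3
) -> Optional[str]:
    """Fast fingerprint lookup, followed by verification of the candidate."""
    hit = library.get(extract_fingerprint(flow, n))
    if hit is None:
        return None                        # no fingerprint match
    name, rule = hit
    return name if rule(flow) else None    # reject matches that fail verification
```

Keeping verification as a second step means a stream whose fingerprint happens to collide with a known protocol is still rejected when it breaks that protocol's rule.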
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several embodiments of the present invention. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A protocol identification method based on a state machine is characterized by comprising the following steps:
S1: acquiring protocol data;
S2: providing the protocol data to a model in a project state machine, and making a data set and a corresponding model;
S3: training the model to obtain a model array comprising a plurality of AI models;
S4: constructing a state machine based on project, arranging a corresponding state model on the state machine, and deploying the model array in the state model;
S5: collecting a new protocol data stream, and importing the new protocol data stream into the state machine for identification so as to obtain a protocol type.
2. The state-machine-based protocol identification method according to claim 1, wherein the step S1 further comprises: acquiring protocol basic data through signal acquisition or network packet capture, and preprocessing the protocol basic data to obtain protocol packet data.
3. The state-machine-based protocol identification method according to claim 2, wherein in step S1, the preprocessing of the protocol basic data comprises cleaning the protocol basic data.
4. The state-machine-based protocol recognition method of claim 3, wherein in step S3, the AI models comprise a dbcs7799 protocol model and a tds073 protocol model.
5. The state-machine-based protocol identification method according to claim 4, wherein the step S5 further comprises:
S51: analyzing the length of the protocol data flow to obtain a data flow state, and segmenting the protocol packet data according to the data flow state;
S52: constructing basic segments by project on the basis of the segmented data, and constructing an automatic protocol identification model on the basic segments by a cracking algorithm, so as to crack each node one by one.
6. The state-machine-based protocol identification method according to claim 5, wherein the cracking algorithm in the step S52 comprises:
S521: extracting protocol fingerprints and establishing corresponding protocol verification rules from the protocol samples;
S522: performing fast fingerprint matching and fast verification of the protocol identification result in the protocol identification stage.
7. The state-machine based protocol recognition method of claim 6, wherein the cracking algorithms are performed in parallel and automatically assign nodes based on node power of individual devices.
8. The state-machine based protocol identification method of claim 7, wherein in step S52, SDN technology is used for on-demand allocation of traffic and compute nodes.
9. The state-machine based protocol recognition method of claim 8, wherein the plaintext information cracked by each node is combined according to a time sequence to obtain industrial control information for industrial control behavior recognition.
10. The state-machine-based protocol identification method according to claim 5, wherein the step S5 further comprises: before step S51, monitoring and checking the collected protocol data stream, and uploading it to the state machine by big data means for matching identification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110782472.7A CN113242160A (en) | 2021-07-12 | 2021-07-12 | Protocol identification method based on state machine |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113242160A (en) | 2021-08-10 |
Family
ID=77135245
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110782472.7A (Pending) CN113242160A (en) | 2021-07-12 | 2021-07-12 | Protocol identification method based on state machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113242160A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106131023A (en) * | 2016-07-15 | 2016-11-16 | 深圳市永达电子信息股份有限公司 | A kind of Information Security Risk strength identifies system |
CN106850338A (en) * | 2016-12-30 | 2017-06-13 | 西可通信技术设备(河源)有限公司 | A kind of R+1 classes application protocol recognition method and device based on semantic analysis |
US10063434B1 (en) * | 2017-08-29 | 2018-08-28 | Extrahop Networks, Inc. | Classifying applications or activities based on network behavior |
US20200344185A1 (en) * | 2019-04-26 | 2020-10-29 | Oracle International Corporation | Directed acyclic graph based framework for training models |
CN110661682A (en) * | 2019-09-19 | 2020-01-07 | 上海天旦网络科技发展有限公司 | Automatic analysis system, method and equipment for universal interconnection data |
Non-Patent Citations (1)
Title |
---|
刘文祺: ""基于机器学习的网络安全关键技术研究"", 《中国博士学位论文全文数据库信息科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114760234A (en) * | 2022-03-30 | 2022-07-15 | 中核武汉核电运行技术股份有限公司 | Verification system and method for protocol analysis result of industrial control system |
CN114760234B (en) * | 2022-03-30 | 2024-05-10 | 中核武汉核电运行技术股份有限公司 | Verification system and method for industrial control system protocol analysis result |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210810 |