CN108683658A

CN108683658A - Method for identifying traffic anomalies in industrial control networks based on a benchmark model built from multiple RBM networks

Info

Publication number: CN108683658A
Application number: CN201810449297.8A
Authority: CN
Inventors: 李怡晨; 马颖华; 李生红; 张波; 梁启联
Original assignee: Information And Communication Branch Of Jiangsu Electric Power Co Ltd; State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; Global Energy Interconnection Research Institute; Shanghai Jiao Tong University
Current assignee: Information And Communication Branch Of Jiangsu Electric Power Co Ltd; State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; Global Energy Interconnection Research Institute; Shanghai Jiao Tong University
Priority date: 2018-05-11
Filing date: 2018-05-11
Publication date: 2018-10-19
Anticipated expiration: 2038-05-11
Also published as: CN108683658B

Abstract

A method for identifying traffic anomalies in an industrial control network based on a benchmark model built from multiple RBM networks. Features are extracted from the industrial control network to generate a training data set; the benchmark model is trained to obtain a normal benchmark model of the industrial control network, comprising multiple RBM models, together with the abnormal data clusters in the training data set; the normal benchmark model is then used to evaluate network messages in real time and thereby detect traffic anomalies. Through parameter settings, the invention can decide internally whether to reduce dimensionality and to what dimension, and it offers better robustness. The number of clusters need not be set in advance; it is determined by the degree of correlation between the models, which better matches practical applications.

Description

Method for identifying traffic anomalies in industrial control networks based on a benchmark model built from multiple RBM networks

Technical field

The invention relates to a technology in the computer field, and in particular to a method that builds a benchmark model from multiple restricted Boltzmann machine (RBM) networks and identifies abnormal network traffic according to that benchmark model.

Background

As attack methods keep changing, detection techniques based on known attack signatures can no longer protect a network from attack, so attack detection on the network traffic itself is necessary. Network traffic captures consist of massive amounts of traffic data that record all activities and behaviors of power grid terminals. By analyzing and consolidating these packets, features can be extracted from them to discover attacks. However, because the volume of network traffic is huge, attack identification requires real-time processing, which places high demands on the efficiency of the detection algorithm. Traditional neural network learning methods and most machine learning methods tend to fall short on this problem. For a power grid network traffic attack detection system, processing this massive data efficiently and accurately is a major challenge.

Summary of the invention

In view of the shortcomings of the prior art and the special circumstances of the power grid industrial control environment, the present invention proposes a method for identifying traffic anomalies in industrial control networks based on a benchmark model built from multiple RBM networks. By monitoring the volume and timing of industrial control network traffic, a benchmark model of that traffic is obtained by clustering; the benchmark model is then used to identify the various working states of the industrial control devices in the network and to find the abnormal states among them.

The present invention is achieved through the following technical solution:

The present invention relates to a method for identifying traffic anomalies in industrial control networks based on a benchmark model built from multiple RBM networks. Features are extracted from the industrial control network to generate a training data set; the benchmark model is trained to obtain a normal benchmark model of the industrial control network, comprising multiple RBM models, together with the abnormal data clusters in the training data set; the normal benchmark model is then used to evaluate network messages in real time, thereby detecting traffic anomalies.

The training data set is obtained by performing feature extraction and merging according to the network characteristics of the industrial control network, and then dividing the result by time period into training data in the form of data clusters.

The network characteristics of the industrial control network include, but are not limited to: messages are copied from a bypass through the front-end collector or the network equipment of the industrial control network.

Feature extraction means: according to the protocol used to transmit industrial control network traffic, features such as the time, quantity, and type of message transmissions are extracted; feature selection is then performed to remove redundant features from the data set, yielding the extracted message features.

Merging means: features are merged according to the quantity of traffic data within the merging time period Ta.

The data clusters are obtained by dividing the data set into time periods, using the traffic transmission time of the industrial control network as the clustering time period Tb; each period forms one data cluster.
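The merging by Ta and the division into Tb data clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_clusters`, the `(timestamp_seconds, feature_index)` input format, and the default window lengths (10 minutes and 1 hour, the values used in the embodiment) are choices made for the example.

```python
from collections import defaultdict


def build_clusters(messages, n_features, ta=600, tb=3600):
    """Count messages per feature in Ta windows, then group windows into Tb clusters.

    messages: iterable of (timestamp_seconds, feature_index) pairs (assumed format).
    Returns {cluster_index: [row, ...]}; each row holds n_features counts and each
    cluster holds tb // ta rows. Ta windows without traffic stay all zero.
    Tb periods that contain no message at all are not created by this sketch.
    """
    rows_per_cluster = tb // ta
    clusters = defaultdict(
        lambda: [[0] * n_features for _ in range(rows_per_cluster)]
    )
    for ts, feature in messages:
        cluster_idx = int(ts // tb)          # which Tb period the message falls in
        row_idx = int((ts % tb) // ta)       # which Ta window inside that period
        clusters[cluster_idx][row_idx][feature] += 1
    return dict(clusters)


# Three messages in the first hour, one in the second.
demo = build_clusters([(5, 0), (30, 0), (700, 2), (3700, 1)], n_features=3)
```

Each data cluster is then a small matrix (six rows for Ta = 10 min and Tb = 1 hour) that describes how traffic was distributed over one clustering period.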

The benchmark model includes at least one RBM network. Its initial parameters are set randomly; inputting any data cluster updates the RBM network parameters, and accepting data clusters that follow different patterns increases the number of RBM networks.

The network parameters of the RBM network include, but are not limited to: the learning rate α, the number of iterations n, the numbers of visible-layer and hidden-layer nodes, the root mean square error threshold e, the merging time period Ta, and the clustering time period Tb of the time clusters. The learning rate α is the size of each parameter change after the RBM model receives feedback; the larger the learning rate, the faster convergence starts, but the harder it is to converge to an accurate value. The number of iterations n is the number of training iterations the RBM network runs to converge; to prevent the RBM model from overfitting, a certain residual error is tolerated. The number of visible-layer nodes is determined by the features of the input data; the number of hidden-layer nodes depends on the dimension after dimensionality reduction and the accuracy required for convergence, and a reasonable value generally has to be found by experiment. The root mean square error threshold e expresses the degree of similarity between the input data and an existing RBM; the larger the root mean square error, the lower the similarity, so a larger threshold yields fewer models after clustering but a larger error. The merging time period Ta is the period over which the counts of individual records are merged after feature extraction; it characterizes the short-term traffic transmission behavior of the network segment. The clustering time period Tb of the time clusters is the period covered by each input to an RBM model; it contains the data of several merging time periods and represents the traffic transmission pattern into and out of the network segment over a stretch of time.

Training means: a data cluster is input into the initialized benchmark model and every RBM model in the benchmark model is tested. The reconstructed output of the data cluster under each model is calculated, along with the square root error between the reconstructed output and the original data. Depending on the distance to each model, either the parameters of a trained model are refined or a model is added to the benchmark model. Once all of the training data set has been processed, the result is the normal benchmark model of the industrial control network, comprising multiple RBM models, and the abnormal data clusters in the training data set.
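To make the reconstruction step concrete, the sketch below trains a very small Bernoulli RBM and measures the reconstruction error of a data cluster. Several things here are assumptions and not taken from the patent: the class name `TinyRBM`, the use of one-step contrastive divergence (CD-1) as the training rule, and the reading of the "square root error" as the root mean square error over all entries of the cluster.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyRBM:
    """Bernoulli RBM trained with one-step contrastive divergence (assumed rule)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Random initial weights, zero biases.
        self.w = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible   # visible biases
        self.c = [0.0] * n_hidden    # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def reconstruct(self, v):
        """Deterministic reconstruction: visible -> hidden -> visible."""
        return self.visible_probs(self.hidden_probs(v))

    def fit(self, rows, alpha=0.02, n_iter=1000):
        for _ in range(n_iter):
            for v0 in rows:
                h0 = self.hidden_probs(v0)
                h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                # CD-1 update: positive phase minus negative phase.
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.w[i][j] += alpha * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b[i] += alpha * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.c[j] += alpha * (h0[j] - h1[j])
        return self

    def rmse(self, rows):
        """Root mean square error between a cluster and its reconstruction."""
        sq = [(x - r) ** 2 for v in rows for x, r in zip(v, self.reconstruct(v))]
        return math.sqrt(sum(sq) / len(sq))


cluster = [[1.0, 0.0, 1.0, 0.0]] * 3          # toy data cluster, already in [0, 1]
rbm = TinyRBM(n_visible=4, n_hidden=2)
before = rbm.rmse(cluster)
after = rbm.fit(cluster, alpha=0.1, n_iter=200).rmse(cluster)
```

A cluster that follows the pattern a model was trained on reconstructs with a small error, while a cluster with a different pattern does not; this difference is what the method uses as the distance between a data cluster and a model.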

The distance to each model mentioned above is characterized by, but not limited to, the square root error.

Abnormal data clusters are defined as follows: an abnormality degree is set for each data cluster according to the number of data clusters in its RBM model after clustering. The more data clusters an RBM model contains, the better the model matches the transmission pattern of the network segment and the lower the abnormality degree of the corresponding data clusters. The messages corresponding to an abnormal data cluster are abnormal data.

The abnormality degree is the percentage of abnormal data in the model. It is determined by the number of data clusters in the RBM model after clustering: the more data clusters an RBM model contains, the lower its abnormality degree. It characterizes the abnormal state of the RBM model.

Adding to the benchmark model means: when, during training, the distances between the output data and the original data all exceed the set threshold, the features of the data cluster match none of the existing RBM network patterns, that is, the cluster belongs to a new pattern type. A new RBM network is therefore created, the data cluster is input into it for training and the network parameters are adjusted, and the newly created and initialized RBM network is finally added to the benchmark model.

Adjusting the network parameters means: the RBM models that meet the preset abnormality detection threshold are gathered into a multi-RBM model set. The model set contains K RBM models, each with its own parameters and data clusters.

The original data is the data obtained after feature extraction, in the form it has before being input into the RBM network; it is the original counterpart of the reconstructed output.

The abnormality detection threshold is the error between an RBM model and the normal benchmark model. A smaller threshold means a smaller error; an RBM that satisfies the threshold is a normal benchmark model.

Refining the training model parameters means: when, during training, some of the distances between the output data and the original data fall within the threshold, the RBM model in the model set with the smallest distance is selected, the original data of the data cluster is added to the training data set of that benchmark model, the RBM network is retrained, and the model parameters are updated.

When a model's training data set holds too much data, some redundant data are discarded at random according to a preset data set size; the new data set is then used for training and the corresponding benchmark model parameters are updated.
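The add, refine, and discard rules above can be sketched as one loop over the data clusters. To keep the example short, `MeanModel` is a hypothetical stand-in for an RBM that reconstructs every row as the column means of its training data; it is not the model used by the invention. The function name `build_model_set` is also hypothetical, and the default threshold (0.03) and data set cap (100 data clusters) are the values given in the embodiment.

```python
import math
import random


class MeanModel:
    """Stand-in for an RBM: reconstructs every row as the column means of its data."""

    def __init__(self, rows):
        self.fit(rows)

    def fit(self, rows):
        self.mean = [sum(col) / len(col) for col in zip(*rows)]

    def rmse(self, rows):
        sq = [(x - m) ** 2 for row in rows for x, m in zip(row, self.mean)]
        return math.sqrt(sum(sq) / len(sq))


def build_model_set(clusters, threshold=0.03, max_clusters=100, seed=0):
    """Assign each data cluster to the closest matching model, or start a new model."""
    rng = random.Random(seed)
    models, data = [], []      # data[k] lists the data clusters held by models[k]
    for cluster in clusters:
        errors = [m.rmse(cluster) for m in models]
        matches = [k for k, e in enumerate(errors) if e < threshold]
        if not matches:
            # No existing model fits: the cluster is a new pattern, so add a model.
            models.append(MeanModel(cluster))
            data.append([cluster])
            continue
        d = min(matches, key=lambda k: errors[k])   # model with the smallest distance
        data[d].append(cluster)
        while len(data[d]) > max_clusters:
            # Too much data: discard randomly chosen clusters down to the cap.
            data[d].pop(rng.randrange(len(data[d])))
        models[d].fit([row for c in data[d] for row in c])   # retrain on the updated set
    return models, data


quiet = [[0.0, 0.0], [0.0, 0.02]]
busy = [[1.0, 0.98], [1.0, 1.0]]
models, data = build_model_set([quiet, busy, quiet, busy, quiet])
```

With the two alternating toy patterns, the loop ends with two models, one holding three data clusters and the other two. The number of models is never specified up front; it grows only when a cluster matches nothing that already exists.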

Real-time network message evaluation means: after feature extraction and merging, the network messages are divided by time period into test data in the form of data clusters and input into the normal benchmark model of the industrial control network. All RBM models in it are tested and the distance between the output data of the test data cluster and the original data is calculated. When the distance is greater than the abnormality error value, the network messages corresponding to the test data cluster are abnormal messages.
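The evaluation rule can be sketched as follows. The sketch assumes that the distance is the root mean square error on features scaled to [0, 1], that a test cluster is normal as soon as one model of the normal benchmark model reconstructs it within the error value, and that the embodiment's error value of 5% corresponds to 0.05. The names `is_abnormal` and `always` are hypothetical, and the reconstructors are simple stand-ins for trained RBMs.

```python
import math


def rmse(rows, reconstructed):
    sq = [(x - r) ** 2
          for row, rec in zip(rows, reconstructed)
          for x, r in zip(row, rec)]
    return math.sqrt(sum(sq) / len(sq))


def is_abnormal(cluster, reconstructors, error_value=0.05):
    """Flag a data cluster when no normal model reconstructs it within error_value.

    reconstructors: callables that map a list of rows to their reconstruction;
    each one stands in for an RBM of the normal benchmark model.
    """
    errors = [rmse(cluster, reconstruct(cluster)) for reconstruct in reconstructors]
    return all(e >= error_value for e in errors)


def always(row_template):
    """Hypothetical stand-in model that reconstructs every row as row_template."""
    return lambda rows: [list(row_template) for _ in rows]


normal_models = [always([0.0, 0.0]), always([1.0, 1.0])]
known_pattern = is_abnormal([[0.0, 0.01], [0.0, 0.0]], normal_models)
unknown_pattern = is_abnormal([[0.5, 0.5], [0.4, 0.6]], normal_models)
```

The first test cluster is close to one of the stand-in models and is accepted; the second is far from both and is flagged, so the messages behind it would be reported as abnormal.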

Technical effects

Compared with the prior art, the technical effects of the present invention include:

1) Real-time traffic is processed faster: when all the traffic from one hour of a given network segment of the power grid industrial control network arrives, the invention can complete anomaly identification and parameter updating within one minute. Because the invention is built on RBM networks, whether to reduce dimensionality, and to what dimension, can be decided internally through parameter settings. Because weakly related data can be discarded when the parameters are updated, the data stay effective while redundancy is avoided, so the hardware requirements are lower;

2) A benchmark model built with the RBM method is nonlinear, which makes the normal benchmark model of the industrial control network used by the invention more robust. In addition, modeling with multiple RBMs effectively avoids the influence of different working states on the data and helps capture more of the normal working states, so that abnormal states are identified more accurately.

3) The invention uses hierarchical clustering, so the number of clusters does not have to be set in advance; it is determined by the degree of correlation between the models, which better matches practical applications.

Description of the drawings

Fig. 1 is a flow chart of the automatic construction of the normal benchmark model of the industrial control network according to the invention;

Fig. 2 is a flow chart of the abnormal traffic detection method based on the normal benchmark model according to the invention.

Detailed description

This embodiment operates on message data from electricity consumption data sampling, collected continuously every day across all network segments. Based on the data set dataset from one selected network segment, 15 days of data are used to build the benchmark model: the first 15 days of the message data are the benchmark model training data train_data, and the following 8 days are the test data test_data.

As shown in Fig. 1, this embodiment relates to a method for detecting anomalies in electricity data collection traffic in a power grid industrial control network, which comprises the following steps:

Before detection, some parameters are initialized and set. The data preprocessing of the method comprises the following parts. The data features are determined from the afn and fn fields, which describe the nature of a message transmission in the communication protocol; extracting the feature types of all the data yields 97 message features, and the data are converted according to these 97 features. The message counts are then merged at a sampling interval of 10 minutes (Ta = 10 min, Tb = 1 hour): every ten minutes forms one message transmission record, which is all zeros when no data were transmitted in those ten minutes. Finally, the merged data are min-max normalized: a simple rescaling adjusts the value of each dimension to [0, 1], using the function x = (x - min)/(max - min). The normalized features can then be used as the input of the K-RBM algorithm.
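The min-max step can be written out as a short worked example. The function name `min_max_normalize` is hypothetical, and mapping a constant column (max equal to min) to 0.0 is an assumption added here to avoid division by zero; the source does not say how that case is handled.

```python
def min_max_normalize(rows):
    """Rescale every column of rows to [0, 1] using (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # Assumed handling: a constant column (max == min) is mapped to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Three merged records with three features each (message counts per 10 minutes).
counts = [[0, 4, 7], [5, 4, 9], [10, 4, 8]]
scaled = min_max_normalize(counts)
```

In a deployment, the minimum and maximum of each feature would presumably be taken from the training data and reused for test data, so that both are scaled the same way.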

The RBM network parameters of the clustering model are set at the same time. The number of visible-layer nodes in the RBM network is set to 96, because the input to the RBM is 97-dimensional and the visible-layer nodes are indexed from 0. The number of hidden-layer nodes is set to 11, the learning rate α = 0.02, the number of RBM iterations is 1000, the RBM root mean square error threshold is 0.03, and the time cluster period Tb is set to 1 hour.
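For reference, the embodiment's settings can be collected in one configuration object. The class name `KRBMConfig` and its field names are hypothetical. The sketch stores the number of visible units as 97, which is one reading of the statement that the setting is 96 with nodes indexed from 0 (indices 0 to 96).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KRBMConfig:
    """Parameter values of the embodiment; names are illustrative only."""

    n_visible: int = 97            # one visible unit per message feature (indices 0..96)
    n_hidden: int = 11             # hidden-layer nodes
    learning_rate: float = 0.02    # alpha
    n_iterations: int = 1000       # n
    rmse_threshold: float = 0.03   # e
    ta_seconds: int = 10 * 60      # merging period Ta
    tb_seconds: int = 60 * 60      # clustering period Tb

    @property
    def rows_per_cluster(self) -> int:
        """Number of Ta windows that make up one Tb data cluster."""
        return self.tb_seconds // self.ta_seconds


config = KRBMConfig()
```

With these values, each data cluster consists of six merged records of 97 features, and the hidden layer compresses each record to 11 dimensions.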

The abnormality degree of an RBM after clustering is set as follows: when the data clusters of an RBM model account for i% of all data clusters, the corresponding abnormality degree is 1 - i%. The abnormality detection threshold is 1%, and the abnormality detection error value is 5%.
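The abnormality degree defined here is a simple share calculation, shown below as a worked example. The function name `abnormality_degrees` and the cluster counts are hypothetical; the comparison against the detection threshold is left out.

```python
def abnormality_degrees(cluster_counts):
    """Map each model to 1 - (its share of all data clusters)."""
    total = sum(cluster_counts.values())
    return {model: 1.0 - count / total for model, count in cluster_counts.items()}


# Hypothetical result of clustering 100 data clusters into three RBM models.
degrees = abnormality_degrees({"R1": 90, "R2": 9, "R3": 1})
```

A model that absorbed most of the data clusters gets a low abnormality degree, while a model that holds only a single cluster gets a degree close to 1.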

As shown in Fig. 1, the specific steps are as follows:

Step 1) After the preprocessing described above, train_data becomes the sample set data = {x1, x2, ..., xm}, where each sample has the feature categories xi = {t1, t2, ..., t97}. The data are then split according to the time cluster period Tb. Ta ensures that the RBM model receives several data segments at once; together these segments represent the temporal transmission pattern of the traffic data within Tb. The resulting data clusters are denoted data_i (i = 1, 2, ..., n), n data clusters in total, and the benchmark model is then built iteratively.

Step 2) Train on the first data cluster data_1 and record the model parameters para_1 of its RBM model. Add this RBM model to the model set R as R1, add para_1 to the parameter set P as P1, add the data in the data cluster to the model data set D as D1, and set the model count K to 1. The benchmark model is then obtained iteratively, as follows:

Step 3) Take data cluster data_j and test it against the parameters of every RBM model in the parameter set P. If the root mean square error e between the data reconstructed by model y and the original data is less than the RBM root mean square error threshold, the state flag state_y = true; otherwise state_y = false, where y is any model in the model set. All state flags are then checked. If all are false, the data cluster fits none of the existing models: with n existing models, train on data_j, record the model parameters para_j of the resulting RBM model, add the RBM model to the model set R as Rn+1, add para_j to the parameter set P as Pn+1, add the data in the data cluster to the model data set D as Dn+1, and set the model count K to n+1. If any state flag is true, compute e_d = min(e), where d is the corresponding model; add the data in data_j to Dd, train on Dd, and update the corresponding parameters Pd. When adding a data cluster to a model brings the number of its data clusters above 100, randomly chosen data clusters j are discarded during training so that the number of data clusters used for training always stays at 100.

Step 4) After iterating over all the data, calculate the abnormality degree of each data cluster using the abnormality degree method described above. RBM models whose abnormality degree is greater than the abnormality threshold are extracted and regarded as abnormal; RBM models whose abnormality degree is below the threshold are set as the benchmark model, which may include multiple RBM models.

Step 5) As shown in Fig. 2, test how effective the benchmark model is on real-time data. After the preprocessing described above, test_data becomes the test set data_test = {x1, x2, ..., xm}, where each sample has the feature categories xi = {t1, t2, ..., t97}. data_test is then split into data clusters according to the time cluster period Tb. The data clusters are read in turn and tested against the parameters of all RBM models in the normal benchmark model. If a similar model exists, the data cluster conforms to the normal benchmark model and its messages are normal messages that follow the data transmission pattern. If no model yields a square root error smaller than the abnormality detection error value, the messages are abnormal messages.

In this embodiment, to verify the performance and effectiveness of the benchmark model built from multiple RBM models, the method of the invention was extensively analyzed and evaluated on the dataset of the power grid industrial control network described above. The method can find abnormal traffic in the power grid industrial control network as well as traffic that does not follow the normal network transmission pattern; the specific accuracy is to be verified in the concrete deployment environment of the power grid.

Those skilled in the art may make local adjustments to the above specific implementation in different ways without departing from the principle and purpose of the invention. The scope of protection of the invention is defined by the claims and is not limited by the above specific implementation; every implementation within that scope is bound by the invention.

Claims

1. A method for identifying traffic anomalies in an industrial control network based on a benchmark model built from multiple RBM networks, characterized in that features are extracted from the industrial control network to generate a training data set; the benchmark model is trained to obtain a normal benchmark model of the industrial control network comprising multiple RBM models and the abnormal data clusters in the training data set; and the normal benchmark model of the industrial control network is used to evaluate network messages in real time, thereby detecting traffic anomalies;

The benchmark model includes at least one RBM network; the benchmark model updates the RBM network parameters when any data cluster is input, the initial parameters of the benchmark model are set randomly, and the number of RBM networks is increased by accepting data clusters that follow different patterns;

The network parameters of the RBM network include: the learning rate α, the number of iterations n, the numbers of visible-layer and hidden-layer nodes, the root mean square error threshold e, the merging time period Ta, and the clustering time period Tb of the time clusters.

2. The method according to claim 1, wherein the training means: a data cluster is input into the initialized benchmark model; all RBM models in the benchmark model are tested; the reconstructed output of the data cluster under the benchmark model is calculated; the square root error between the reconstructed output and the original data is calculated; according to the distance to each model, the training model parameters are refined or the benchmark model is added to; and, after all training data sets have been trained, the normal benchmark model of the industrial control network comprising multiple RBM models and the abnormal data clusters in the training data set are obtained.

3. The method according to claim 1, wherein the training data set is obtained by performing feature extraction and merging according to the network characteristics of the industrial control network and then dividing the result by time period into training data in the form of data clusters.

4. The method according to claim 3, wherein the feature extraction means: according to the protocol used to transmit industrial control network traffic data, features such as the time, quantity, and type of message transmissions are extracted and feature selection is performed, removing the redundant features in the data set to obtain the extracted message features.

5. The method according to claim 1 or 2, wherein the abnormal data clusters are defined as follows: the abnormality degree of each data cluster is set according to the number of data clusters in the RBM model after clustering; the more data clusters an RBM model contains, the better the model matches the transmission pattern of the network segment and the lower the abnormality degree of the corresponding data clusters; the messages corresponding to an abnormal data cluster are abnormal data;

The abnormality degree is the percentage of abnormal data in the model and is determined by the number of data clusters in the RBM model after clustering; the more data clusters an RBM model contains, the lower the abnormality degree of that RBM model; it characterizes the abnormal state of the RBM model.

6. The method according to claim 2, wherein adding to the benchmark model means: when, during training, the distances between the output data and the original data all exceed the set threshold, the features of the data cluster match none of the existing RBM network patterns, that is, the cluster belongs to a new pattern type; a new RBM network is therefore created, the data cluster is input into that RBM network for training and the network parameters are adjusted, and the newly created and initialized RBM network is finally added to the benchmark model.

7. The method according to claim 6, wherein adjusting the network parameters means: the RBM models that meet the preset abnormality detection threshold are gathered into a multi-RBM model set; the model set contains multiple RBM models, the number of RBM models is K, and each RBM model has its own parameters and data clusters.

8. The method according to claim 2, wherein refining the training model parameters means: when, during training, some of the distances between the output data and the original data fall within the threshold, the RBM model in the model set with the smallest distance is selected, the original data corresponding to the data cluster is added to the training data set of the corresponding benchmark model, the RBM network is retrained, and the model parameters are updated.

9. The method according to claim 8, wherein, when a model's training data set holds too much data, some redundant data are discarded at random according to a preset data set size, the new data set is trained, and the corresponding benchmark model parameters are updated.

10. The method according to claim 1, wherein the real-time network message evaluation means: after feature extraction and merging, the network messages are divided by time period into test data in the form of data clusters and input into the normal benchmark model of the industrial control network; all RBM models in it are tested and the distance between the output data of the test data cluster and the original data is calculated; when the distance is greater than the abnormality error value, the network messages corresponding to the test data cluster are abnormal messages.