CN118659986B - Progressive service flow classification method and device based on convolutional neural network

Info

Publication number: CN118659986B
Application number: CN202411143981.5A
Authority: CN
Inventors: 许郑勇; 曹达明; 翟江涛; 刘光杰; 吉小鹏
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Filing date: 2024-08-20
Publication date: 2024-11-19
Anticipated expiration: 2044-08-20

Abstract

The invention provides a progressive service flow classification method and device based on a convolutional neural network. The method comprises the following steps: periodically collecting data packets using a set initial time window size, and constructing the feature input and index of the model; inputting the data into a progressive one-dimensional convolutional neural network to learn the time-series features of the network traffic data, dynamically selecting an output head to output a classification result according to the number of data packets actually used, and evaluating the confidence of the classification result; and feeding the evaluation result back to adjust the time window size, thereby flexibly adjusting the number of data packets required for classification. By applying a progressive processing strategy to the high-speed, complex network traffic of a data center environment, the invention makes classification decisions in a timely manner and significantly improves processing speed and responsiveness.

Description

Progressive service flow classification method and device based on convolutional neural network

Technical Field

The invention relates to the field of network traffic classification, in particular to a progressive service traffic classification method and device based on a convolutional neural network.

Background

With the rapid expansion of cloud computing and big data services, the size and complexity of data centers continue to increase. A data center hosts various network applications and services such as cloud computing, big data analysis, online media and virtualization. These applications differ in their network bandwidth, latency and reliability requirements, so refined priority policies are urgently needed to meet these diverse requirements and provide consistently high-quality service. In a data center environment, accurate traffic classification is the basis of efficient resource management: by identifying and classifying business applications, network resources can be allocated preferentially according to the real-time requirements of different applications.

Traditional traffic analysis methods, mostly used for offline classification, typically need to observe and analyze the complete data flow in order to classify and predict traffic effectively. In the rapidly changing network environment of a data center, this approach causes significant processing delay. It is particularly disadvantageous in scenarios with extremely high real-time requirements, because waiting for a complete data flow means that a large amount of data must be collected and analyzed before any resource allocation or prioritization decision is made. For example, the invention of a network traffic classification method, system, device and readable storage medium (CN 116319582A) encodes all data packets of each flow according to packet size and time and classifies them as data vectors. The invention of a traffic classification method and system based on the combination of statistical features and spatio-temporal features (CN 116701995A) obtains multidimensional statistical features of a network session, such as the maximum and minimum packet sizes, the average packet size, and the maximum and minimum time intervals.

Existing techniques for online traffic classification mainly aim to shorten feature extraction and classification time, and fall into two categories. The first category focuses on extracting effective statistical features from flow segments. For example, the invention of an encrypted network traffic classification method based on GCNN and MoE (CN 116319583A) discloses a technique that divides mobile application traffic over a period of time into several traffic blocks and then quickly converts the traffic into a graph dataset; the invention of a deep neural network-based encrypted traffic classification method (CN 116232696A) discloses a technique that slices each data flow into segments of t seconds and extracts the arrival time and packet length of each data packet. Although these approaches significantly shorten feature extraction and classification time compared with conventional techniques that wait for a complete data flow, they still rely on fairly long segments of flow data to ensure the integrity and representativeness of the extracted features.

The second category captures the first few packets or bytes of a network session or flow in order to identify the traffic class. The invention of a network traffic classification method based on a CNN-Transformer hybrid architecture (CN 116051883A) discloses a method that converts the first 784 bytes of each flow into a traffic image; the invention of a network traffic classification method with an adaptive convolutional neural network structure (CN 117793020A) discloses a method that retains the first 32 packets of each session and the first 512 bytes of each packet for classification. Although this type of approach can readily achieve near-real-time classification, it places high demands and overhead on the system; on heavily loaded networks in particular, it is difficult to capture the specific leading packets or bytes.

Disclosure of Invention

Object of the invention: the invention aims to provide a progressive service traffic classification method based on a convolutional neural network that effectively adapts to and analyzes the network traffic characteristics of a data center. Through a progressive learning mechanism, the method improves the model's classification accuracy on dynamic and complex network traffic and ensures accurate identification and classification of various traffic types. In addition, a dynamic data window adjustment strategy allows the model to adjust its processing window automatically according to real-time traffic changes, optimizing response speed and accuracy. Furthermore, the method constructs a multi-head output model that simultaneously performs traffic classification, bandwidth prediction and duration prediction, which improves resource utilization efficiency and reduces computing resource and time consumption.

The invention provides a progressive service flow classification method based on a convolutional neural network, which comprises the following steps:

Step 1, introducing a dynamic time window mechanism to acquired network data, setting the size of an initial window, and preprocessing the data in the window, wherein the preprocessing comprises data standardization, feature extraction and feature vector construction of a progressive convolutional neural network;

step 2, matching the dimension of the feature vector constructed in the step 1 with the input dimension of the progressive convolutional neural network, and supplementing the dimension in a zero filling mode if the dimension does not accord with the dimension input by the progressive convolutional neural network; generating a corresponding index vector for each feature vector to distinguish the original length of the feature vector before filling;

Step 3, inputting the constructed feature vector and the corresponding index vector into a progressive convolutional neural network, wherein the progressive convolutional neural network comprises three classification tasks, namely an application classification task, a traffic bandwidth classification task and a traffic duration task; the progressive convolutional neural network can accept feature vectors generated under time windows with different sizes, and meanwhile, a multi-head output selection mechanism is adopted: dynamically adjusting the output layer structure according to the input feature vectors and the index vectors, wherein each output head corresponds to one feature vector with original length (namely the number of data packets), and can simultaneously output classification results of three classification tasks;

Step 4, preprocessing the high-dimensional feature vectors of the three tasks that the progressive convolutional neural network produces before normalization, wherein the elements of the three vectors respectively represent the scores obtained by the inference results of each class in the bandwidth prediction task, the duration prediction task and the application classification task (the first element of each vector being the score of the first class); the preprocessing comprises multiplying each high-dimensional feature vector by its task factor, where the three task factors represent the relative weights of the three classification tasks and are all set to 1 if the three tasks are valued equally; finally, normalization is performed so that all elements of the vectors are scaled proportionally into the range (0, 1);

Step 5, inputting the feature vectors preprocessed in step 4 into a decision module, which evaluates the confidence of the results of the three classification tasks and calculates a confidence index; if the confidence index is greater than or equal to a preset threshold (e.g. 0.5), the classification result is confirmed as final, the classification process for the flow ends and data packet collection stops; if the confidence index is lower than the preset threshold, the time window is expanded, data packet collection continues, and steps 1 to 3 are repeated for reclassification until the confidence index reaches the preset threshold or the complete data flow has been collected.
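The control flow of steps 1 to 5 can be sketched as a loop that widens the window until the decision module is satisfied. The following is only a minimal illustration under stated assumptions: the window is measured in packets, it grows by a fixed increment, and `classify_progressively`, `toy_classify` and `toy_confidence` are hypothetical names standing in for the trained network and decision module, which are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases for this sketch.
Packet = Tuple[float, int, int]      # (timestamp, payload length, direction)
Prediction = Tuple[int, int, int]    # (application, bandwidth, duration) class ids


def classify_progressively(
    packets: Sequence[Packet],
    classify: Callable[[Sequence[Packet]], Tuple[Prediction, List[float]]],
    confidence: Callable[[List[float]], float],
    initial_window: int = 10,   # assumed initial window size, in packets
    growth: int = 10,           # assumed fixed window increment
    threshold: float = 0.5,     # example threshold from step 5
) -> Tuple[Prediction, int]:
    """Grow the window until the result is trusted or the flow is exhausted."""
    window = initial_window
    while True:
        used = packets[:window]                # step 1: packets inside the window
        prediction, scores = classify(used)    # steps 2-4: inference + preprocessing
        if confidence(scores) >= threshold:    # step 5: accept the result
            return prediction, len(used)
        if window >= len(packets):             # complete flow collected: stop
            return prediction, len(used)
        window += growth                       # otherwise expand the window


def toy_classify(used: Sequence[Packet]) -> Tuple[Prediction, List[float]]:
    # Stand-in for the network: the score sharpens as more packets are seen.
    return (0, 1, 2), [min(1.0, len(used) / 40.0)]


def toy_confidence(scores: List[float]) -> float:
    # Stand-in for the decision module.
    return scores[0]


flow = [(0.01 * i, 100, 1) for i in range(100)]
result, packets_used = classify_progressively(flow, toy_classify, toy_confidence)
```

With these stand-ins the loop stops at the second window, so only 20 of the 100 packets are consumed; a flow shorter than the initial window is classified once on whatever packets exist.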

Step 1 comprises the steps of:

Step 1-1, capturing network data packets, dividing the data streams according to five-tuple (source IP address, destination IP address, source port number, destination port number, transport layer protocol), applying a dynamically adjustable time window to each data stream, and setting the initial size of the time window, wherein the time window defines the capturing quantity of the data packets;

step 1-2, extracting characteristic information of all data packets in a time window, wherein the characteristic information comprises the packet length, the time interval and the packet transmission direction;

the packet length refers to the total byte number of the network packet payload;

the time interval refers to the difference between the time stamps of two consecutive data packets;

The data packet transmission direction is defined as follows: a data packet sent from the server to the client is transmitted in the forward direction and is set to +1; a data packet sent from the client to the server is transmitted in the reverse direction and is set to -1;

Step 1-3, constructing a feature vector from the feature information, the elements of which comprise, for each data packet up to the nth, its transmission direction and its packet length, together with the arrival time of the first data packet and the time intervals between consecutive data packets (for example, between the first data packet and the second data packet); n represents the number of data packets in the time window;

Step 1-4, normalizing the feature vector: a zero-center normalization method is applied to the data packet transmission direction and packet length features so that their values are distributed within the range (-1, +1), and a max-min normalization method is applied to the time interval features so that their values are scaled into the range (0, +1).
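Steps 1-2 to 1-4 can be illustrated with a short sketch. Several details are assumptions made for illustration only: zero-center normalization is taken to mean subtracting the mean and dividing by the largest absolute deviation, the first packet is given an interval of 0.0, the three features are interleaved per packet, and the endpoints of each range are reached by the extreme values. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Packet = Tuple[float, int, int]  # (timestamp in seconds, payload bytes, direction +1/-1)


def zero_center(values: Sequence[float]) -> List[float]:
    """Subtract the mean, then divide by the largest absolute deviation."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    scale = max(abs(c) for c in centered) or 1.0   # constant input: avoid dividing by 0
    return [c / scale for c in centered]


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def window_features(packets: Sequence[Packet]) -> List[float]:
    """Build a normalized feature vector from the packets in one time window."""
    times = [p[0] for p in packets]
    lengths = [float(p[1]) for p in packets]
    directions = [float(p[2]) for p in packets]
    # Interval of packet i is t_i - t_(i-1); the first packet gets 0.0 (assumption).
    intervals = [0.0] + [b - a for a, b in zip(times, times[1:])]
    d, l, t = zero_center(directions), zero_center(lengths), min_max(intervals)
    # Assumed layout: [d_1, l_1, t_1, d_2, l_2, t_2, ...].
    return [x for triple in zip(d, l, t) for x in triple]


packets = [(0.0, 100, 1), (0.1, 300, -1), (0.4, 200, 1)]
feats = window_features(packets)
```

For three packets the sketch yields nine values: direction and length entries lie between -1 and +1, and interval entries lie between 0 and 1.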

Step 2 comprises the steps of:

Step 2-1, matching the dimension of the normalized feature vector with the input dimension of the progressive convolutional neural network, and supplementing the dimension in a zero filling mode if the dimension of the feature vector does not meet the input requirement of the one-dimensional convolutional neural network;

Step 2-2, generating an index vector: the index vector consists of the elements 0 and 1, where element 1 marks a feature actually generated in the feature vector and element 0 marks a padded position in the feature vector; the index vector and the feature vector have the same length.
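Zero filling and index vector generation can be sketched in a few lines. This is an illustration only; it assumes that padding is appended at the end of the vector, and `pad_with_index` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def pad_with_index(features: Sequence[float], input_length: int) -> Tuple[List[float], List[int]]:
    """Zero-pad a feature vector and build the matching 0/1 index vector."""
    if len(features) > input_length:
        raise ValueError("feature vector is longer than the network input")
    pad = input_length - len(features)
    padded = list(features) + [0.0] * pad        # zeros appended at the end (assumption)
    index = [1] * len(features) + [0] * pad      # 1 = real feature, 0 = padding
    return padded, index


padded, index = pad_with_index([0.5, -0.2, 0.1], 6)
```

The number of ones in the index vector recovers the original length, which lets the network tell a genuine zero-valued feature from padding.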

In step 3, the hierarchical structure of the progressive convolutional neural network includes:

L1-1: the input layer receives two types of data sequences, the sizes of the two types of data sequences are L1 and L2 respectively, the first type of data sequence is main input, and the input is the characteristic vector generated in the step 1 and represents the time and space characteristics of the network flow; the second is auxiliary input, the input is index vector, which indicates the actual length of the input convolution sequence, namely the use quantity of data packets of one stream;

L1-2: shared convolution layers, which perform convolution operations on the input feature vector;

L1-201: a first convolution layer comprising 32 one-dimensional convolution kernels of size 5, all with a stride of 1, activated using a ReLU function;

L1-202: a first max pooling layer, which applies a 2×1 max pooling operation to the output feature vector of the first convolution layer for spatial downsampling;

L1-203: a second convolution layer comprising 64 one-dimensional convolution kernels of size 4, all with a stride of 1, activated using a ReLU function; the input of the second convolution layer is rescaled by a 1×1 convolution and then added to the output of the second convolution layer to form a residual connection;

L1-204: a second max pooling layer, which applies a 2×1 max pooling operation to the output feature vector of the second convolution layer for spatial downsampling;

L1-205: a third convolution layer comprising 128 one-dimensional convolution kernels of size 3, all with a stride of 1, activated using a ReLU function; the input of the third convolution layer is rescaled by a 1×1 convolution and then added to the output of the third convolution layer to form a residual connection;

L1-206: a third max pooling layer, which applies a 2×1 max pooling operation to the output feature vector of the third convolution layer for spatial downsampling;

L1-3: a fully connected layer, which passes the output of the third max pooling layer through a fully connected layer with 256 units, activated using a ReLU function;

L1-4: a Lambda layer, into which the index vector is input for conditional judgment, and which selects the output branch to be activated according to the actual length of the input feature vector;

L1-5: a branch output layer comprising M branch outputs, each branch comprising the outputs of three mutually independent classification tasks; the output of the fully connected layer is passed through the fully connected layer of the selected branch, which maps the feature vector to 3 5-dimensional class output vectors; the weights and biases of the fully connected layers in different branches are mutually independent, and the three classification tasks and their classes are defined as: { application type: 0 denotes application A, 1 denotes application B, 2 denotes application C, 3 denotes application D, 4 denotes application E }, { traffic bandwidth: 0 denotes bandwidth A, 1 denotes bandwidth B, 2 denotes bandwidth C, 3 denotes bandwidth D, 4 denotes bandwidth E }, { duration: 0 denotes time A, 1 denotes time B, 2 denotes time C, 3 denotes time D, 4 denotes time E };

L1-6: a Softmax layer, which applies Softmax normalization to the output vectors of the branch output layer to obtain 3 5-dimensional probability distribution vectors, where each element of an output vector corresponds to the posterior probability of one class.
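The sequence length seen by each shared layer follows from the kernel sizes and pooling factors listed above. The sketch below traces those lengths; it is a shape calculation only, not the network itself. The padding mode is not stated in the text, so both "same" padding (which keeps lengths equal, as a residual addition would require) and "valid" padding are shown, and the input length of 160 used in the example is the value given later in the embodiment.

```python
from typing import List, Tuple


def conv1d_len(length: int, kernel: int, stride: int = 1, same_padding: bool = True) -> int:
    """Output length of a one-dimensional convolution."""
    if same_padding:
        return -(-length // stride)            # ceil(length / stride)
    return (length - kernel) // stride + 1     # 'valid': no padding


def pool_len(length: int, pool: int = 2) -> int:
    """Output length of non-overlapping max pooling."""
    return length // pool


def trace_shapes(input_length: int, same_padding: bool = True) -> Tuple[List[Tuple[str, int, int]], int]:
    """Return (layer name, length, channels) per layer and the flattened size."""
    layers = [("conv1", 5, 32), ("conv2", 4, 64), ("conv3", 3, 128)]  # kernel, filters
    length = input_length
    shapes: List[Tuple[str, int, int]] = []
    for name, kernel, channels in layers:
        length = conv1d_len(length, kernel, same_padding=same_padding)
        shapes.append((name, length, channels))
        length = pool_len(length)
        shapes.append((name.replace("conv", "pool"), length, channels))
    flat = length * shapes[-1][2]              # input size of the 256-unit dense layer
    return shapes, flat


shapes_same, flat_same = trace_shapes(160)
shapes_valid, flat_valid = trace_shapes(160, same_padding=False)
```

Under the "same" padding assumption the length halves at each pooling layer (160, 80, 40, 20), so the 256-unit fully connected layer would receive 20 × 128 values; without padding the lengths shrink faster and the flattened size is smaller.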

In step 3, training the progressive multitasking convolutional neural network by:

step 3-a1, constructing a training data set, which specifically comprises the following steps:

Step 3-a1-1, collecting data: collecting original network flow data from various network environments, covering common service scenes, and identifying and marking service types through Deep Packet Inspection (DPI); in the acquisition process, marking the bandwidth and duration of each data stream at the same time;

Step 3-a1-2, data classification: classifying the collected original network traffic data according to service type, and classifying and labeling the bandwidth and the duration based on predefined intervals to obtain label pairs; for example, the large-bandwidth flow interval may be set to 100 Mbps or more and the small-bandwidth flow interval to less than 100 Mbps; the long-duration flow interval may be set to 60 seconds or more and the short-duration flow interval to less than 60 seconds;

Step 3-a1-3, dividing the data set: randomly dividing the data and the label pairs marked in the step 3-a1-2 into mutually exclusive training sets, verification sets and test sets;

step 3-a2, executing a training process, specifically comprising the following steps:

Step 3-a2-1, designing a loss function: a categorical cross-entropy loss function is used to measure the difference between the class distribution output by the neural network and the real labels;

Step 3-a2-2, initializing parameters: the weights of the first convolution layer L1-201, the second convolution layer L1-203, the third convolution layer L1-205 and the fully connected layer are initialized using a normal initialization method, and the bias vectors are initialized to a small constant value of 0.01;

Step 3-a2-3, setting hyperparameters: the network structure parameters, including convolution kernel size and number, batch size and learning rate, are preset without additional tuning;

Step 3-a3, mixed-precision training, specifically comprising the following steps:

Step 3-a3-1, automatically converting the data and calculation of the first convolution layer L1-201, the second convolution layer L1-203 and the third convolution layer L1-205 into a half-precision floating point FP16 in the training process, wherein a single-precision floating point FP32 is used for a full-connection layer, and the numerical stability is ensured;

Step 3-a3-2, setting a loss scaling, initializing a scaling factor, and if the gradient overflows, reducing the scaling factor; if the gradient does not overflow, the scale factor size is maintained;

Step 3-a3-3, calculating an adaptive learning rate and updating the weights for each layer using the LAMB (Layer-wise Adaptive Moments optimizer for Batch training) optimizer;

Step 3-a3-4, terminating training: training is stopped when the performance on the validation set has not improved for X₁ consecutive epochs (X₁ generally takes a value of 5 to 10; in a neural network, one epoch is a training pass in which all samples of the entire training data set pass through the neural network once);
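The loss scaling rule of step 3-a3-2 and the stopping rule of step 3-a3-4 are both small pieces of bookkeeping and can be sketched independently of any training framework. The initial scale of 2^15, the halving factor and the names `LossScaler` and `stop_epoch` are assumptions made for illustration; the text only states that the factor is reduced on overflow and kept otherwise.

```python
import math
from typing import Iterable, List


class LossScaler:
    """Shrink the loss scale on gradient overflow; otherwise keep it unchanged."""

    def __init__(self, init_scale: float = 2.0 ** 15, backoff: float = 0.5) -> None:
        self.scale = init_scale      # assumed initial scaling factor
        self.backoff = backoff       # assumed reduction factor on overflow

    def update(self, grads: Iterable[float]) -> bool:
        """Return True if the optimizer step may be applied."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            self.scale *= self.backoff   # reduce the factor and skip this step
        return not overflow


def stop_epoch(val_scores: List[float], patience: int) -> int:
    """Index of the epoch at which training stops (last epoch if never triggered)."""
    best, stale = float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, stale = score, 0       # improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:        # X1 consecutive epochs without improvement
                return epoch
    return len(val_scores) - 1


scaler = LossScaler()
step_ok = scaler.update([0.1, float("inf")])
stopped_at = stop_epoch([0.5, 0.6, 0.6, 0.59, 0.6, 0.7], patience=3)
```

In the example the overflowing gradient halves the scale and the step is skipped, and training stops at epoch index 4 because the validation score failed to exceed 0.6 for three consecutive epochs.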

Step 3-a4, model testing and deployment, which specifically comprises the following steps:

Step 3-a4-1, evaluating the classification accuracy, recall, F1 score and speed of the trained progressive convolutional neural network on the test set;

Step 3-a4-2, converting the trained progressive convolutional neural network into a format suitable for the deployment environment (e.g., TensorFlow SavedModel, PyTorch ScriptModule), and providing a unified, standardized API call interface according to the processing mode, through which the progressive convolutional neural network can be loaded in the deployment environment.

In step 3, the decision module uses the multi-layer perceptron MLP to evaluate the confidence level of the classification result of the progressive convolutional neural network, and the structural hierarchy of the decision module includes:

L2-1: an input layer, which receives 3 5-dimensional feature vectors, namely the preprocessed high-dimensional feature vectors that the convolutional neural network produces before normalization;

L2-2: a first hidden layer, which passes the input through a fully connected layer containing 256 neurons, activated using a ReLU function, and outputs a 256-dimensional feature vector;

L2-3: a second hidden layer, which passes the output of the first hidden layer through a fully connected layer containing 128 neurons, applies batch normalization (Batch Normalization), activates with an exponential linear unit (ELU) function, and outputs a 128-dimensional feature vector;

L2-4: a third hidden layer, which passes the output of the second hidden layer through a fully connected layer containing 64 neurons, applies random dropout (Dropout), activates with an ELU function, and outputs a 64-dimensional feature vector;

L2-5: an output layer containing a single neuron, activated using a sigmoid function, whose output is limited to between 0 and 1.
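The forward pass of the decision module can be sketched with plain lists. This is an inference-time illustration under several assumptions: the three 5-dimensional vectors are flattened into one 15-element input, the weights are random placeholders rather than trained values, batch normalization is omitted, and dropout is treated as the identity (as it is at inference). All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]   # (weights, bias)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Placeholder random weights; a trained model would load learned values."""
    weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def elu(x: List[float]) -> List[float]:
    return [v if v > 0 else math.exp(v) - 1.0 for v in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def confidence(scores: List[float], layers: List[Layer]) -> float:
    h = relu(dense(scores, *layers[0]))        # L2-2: 256 units, ReLU
    h = elu(dense(h, *layers[1]))              # L2-3: 128 units, ELU (batch norm omitted)
    h = elu(dense(h, *layers[2]))              # L2-4: 64 units, ELU (dropout = identity)
    return sigmoid(dense(h, *layers[3])[0])    # L2-5: single sigmoid neuron


rng = random.Random(0)
sizes = [15, 256, 128, 64, 1]                  # 3 x 5 inputs flattened (assumption)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
score = confidence([0.1] * 15, layers)
```

Whatever the weights, the sigmoid keeps the output strictly between 0 and 1, so it can be compared directly with the preset threshold of step 5.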

In step 3, the multi-layer perceptron MLP is trained by:

step 3-b1, constructing a training data set, which specifically comprises the following steps:

Step 3-b1-1, collecting data: each time the progressive multitask convolutional neural network processes data, the logits vectors it outputs are recorded;

Step 3-b1-2, generating label pairs: the inference result output by the progressive convolutional neural network is compared with the real label; if the inference result matches the real label, the data instance has been judged correctly by the progressive convolutional neural network and is marked as 1; if the inference result differs from the real label, the data instance has been judged incorrectly by the progressive convolutional neural network and is marked as 0;

step 3-b1-3, dividing the data set: randomly dividing the data and the label pairs generated in the step 3-b1-2 into mutually exclusive training sets, verification sets and test sets;
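Building the training pairs for the multi-layer perceptron amounts to attaching a 0/1 correctness flag to each recorded logits vector. The sketch below assumes that an instance counts as correct only when all three tasks match the real labels; the text does not say whether the comparison is per task or joint, so this is an illustrative choice, and `make_label_pairs` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Labels = Tuple[int, int, int]   # (application, bandwidth, duration) class ids


def make_label_pairs(
    logits: Sequence[Sequence[float]],
    predictions: Sequence[Labels],
    truths: Sequence[Labels],
) -> List[Tuple[List[float], int]]:
    """Pair each logits vector with 1 (judged correctly) or 0 (judged incorrectly)."""
    pairs = []
    for vec, pred, true in zip(logits, predictions, truths):
        correct = 1 if tuple(pred) == tuple(true) else 0   # joint match (assumption)
        pairs.append((list(vec), correct))
    return pairs


pairs = make_label_pairs(
    logits=[[0.1] * 15, [0.2] * 15],
    predictions=[(0, 1, 2), (3, 3, 3)],
    truths=[(0, 1, 2), (3, 0, 3)],
)
```

The second instance is marked 0 because its bandwidth class is wrong even though the other two tasks are right.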

Step 3-b2, executing a training process, specifically comprising the following steps:

Step 3-b2-1, designing a loss function: a binary cross-entropy loss function is used to measure the difference between the output of the multi-layer perceptron MLP and the real label;

Step 3-b2-2, initializing parameters: initializing weights of the first hidden layer to the third hidden layer by using truncated normal distribution, and initializing a bias vector to be a zero vector;

Step 3-b2-3, setting hyperparameters: the network structure parameters, batch size and learning rate are preset without additional tuning;

step 3-b3, floating point quantization training, specifically comprising the following steps:

Step 3-b3-1, converting the training set, the labels and the model weights into 16-bit floating point numbers, and performing half-precision floating point FP16 operations in forward propagation and back propagation;

Step 3-b3-2, setting the precision loss factor between 0 and 1, and adjusting it dynamically according to the maximum and minimum input values;

Step 3-b3-3, updating the parameters using the AdamW adaptive learning rate optimization algorithm;

Step 3-b3-4, terminating training: training is stopped when the performance on the validation set has not improved for X₁ consecutive epochs;

Step 3-b4, model testing and deployment, which specifically comprises the following steps:

Step 3-b4-1, evaluating the performance indexes of the trained multi-layer perceptron MLP on a test set, wherein the performance indexes comprise accuracy and speed;

Step 3-b4-2, converting the trained multi-layer perceptron MLP into a format suitable for the deployment environment, and providing a unified calling interface according to the processing mode.

The invention also provides a progressive service flow classification device based on the convolutional neural network, which comprises:

the preprocessing module is used for preprocessing the acquired network data to form characteristic input and selection input;

The convolutional neural network classification module is used for carrying out service classification, bandwidth and duration prediction on the characteristic input;

and the decision module is used for carrying out confidence evaluation on the classification result and feeding back the evaluation result to the dynamic window to adjust the size.

The invention also provides a flow classification device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program when executed by the processor implements the steps of the method.

The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method.

The beneficial effects are that: compared with the prior art, the invention has the following advantages:

1. The design adopts a strategy of adjusting the data window size dynamically on demand, so that the system can first classify traffic preliminarily using the data in a small window. Based on the confidence of the classification result, the system then dynamically adjusts the amount of data needed for the next classification. This avoids the delay that traditional approaches incur by waiting to observe the whole data flow and reduces resource waste during data processing, thereby improving response speed and optimizing resource utilization efficiency.

2. The design adopts a multi-head output mechanism, an independent output head is designed for the input generated by different data window sizes, and the model output is dynamically adjusted according to the number of data packets in the window.

The invention discloses a progressive flow classification method based on a convolutional neural network, which can effectively classify flow without waiting for complete data flow, collect data through a preliminary small window and carry out quick classification, then dynamically adjust the size of a subsequent data window according to the confidence level of classification results, is suitable for high-speed and dynamic network environments of a data center, and provides a new effective tool for real-time network flow management and priority adjustment.

Drawings

FIG. 1 is a flow chart of the steps of the method of the present invention.

Fig. 2 is a diagram of network traffic data preprocessing.

Fig. 3 is a block diagram of a progressive multitasking one-dimensional convolutional neural network.

Fig. 4 is a training process of the progressive one-dimensional convolutional neural network of the present invention.

FIG. 5 is a diagram of the multi-layer perceptron network architecture of the decision module.

FIG. 6 is a diagram illustrating a multi-layered perceptron training process in accordance with the present invention.

Detailed Description

The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings and detailed description.

As shown in fig. 1, the invention discloses a progressive traffic classification method based on a convolutional neural network. First, network traffic is collected and preliminarily divided into flows according to the five-tuple. The data set is then processed by multistage preprocessing: a dynamic time window is introduced, an initial window size is set, part of the data packets of each flow are intercepted by the window, and feature information is extracted from them. The feature information is input into the constructed progressive one-dimensional convolutional neural network to learn the time-series features of the network traffic data, and a decision module evaluates the confidence of the preliminary classification result; the next operation is determined by the evaluation result. If the evaluation shows that the classification result is trustworthy, collection of the flow is terminated, which saves resources; if the evaluation shows that the classification result is not satisfactory, collection of the flow continues and the number of data packets used is increased, which improves the classification accuracy of the model. Whereas a traditional convolutional neural network model is limited to analyzing sequences of a single length, the progressive convolutional neural network adopts a multi-head output selection mechanism that dynamically selects a suitable output head to output the classification result according to the feature sequence generated from the number of data packets actually used, so that tasks which would otherwise require several models are merged into a single model.
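The preliminary division of traffic by five-tuple can be sketched as a dictionary keyed on the tuple. One point is an assumption of this illustration: the two endpoints are ordered so that packets travelling in both directions of the same conversation fall into one flow, which the direction feature (+1/-1) presupposes but the text does not spell out. The `Packet` fields and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    length: int


FlowKey = Tuple[Tuple[str, int], Tuple[str, int], str]


def flow_key(p: Packet) -> FlowKey:
    """Order the endpoints so both directions share one key (assumption)."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.protocol)


def group_flows(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets into flows, keeping each flow in timestamp order."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in sorted(packets, key=lambda q: q.timestamp):
        flows[flow_key(p)].append(p)
    return dict(flows)


capture = [
    Packet(0.00, "10.0.0.1", "10.0.0.2", 40000, 443, "TCP", 120),
    Packet(0.02, "10.0.0.2", "10.0.0.1", 443, 40000, "TCP", 1400),
    Packet(0.01, "10.0.0.3", "10.0.0.2", 50000, 53, "UDP", 60),
]
flows = group_flows(capture)
```

Each resulting list is what the dynamic time window is then applied to: the first packets of a flow are taken for the initial classification attempt, and more are appended if the window is expanded.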

The method specifically comprises the following steps:

Step 4, preprocessing high-dimensional feature vectors of three tasks which are not normalized before being output by a progressive convolutional neural network, wherein the high-dimensional feature vectors represent scores obtained by first-class reasoning results in bandwidth prediction tasks, the scores obtained by first-class reasoning results in duration prediction tasks and the scores obtained by first-class results in application classification tasks, the preprocessing comprises the steps of multiplying each high-dimensional feature vector by a task factor, namely, the task factors represent the proportion of the three classification tasks, setting the proportion of the task factors to be 1 if the three classification tasks are valued equally, and finally carrying out normalization processing to scale all elements in the vectors equally to a range of (0, 1);
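The preprocessing of step 4 can be illustrated as a weighting followed by a rescaling. The normalization formula is not specified in the text, so min-max scaling over the concatenated 15 elements is used here purely as an assumption; note that it maps the smallest and largest elements exactly to 0 and 1. The function name and the `factors` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def preprocess_logits(
    bandwidth: Sequence[float],
    duration: Sequence[float],
    application: Sequence[float],
    factors: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # equal weights by default
) -> List[float]:
    """Weight each task's unnormalized scores, then rescale all elements together."""
    f_bw, f_dur, f_app = factors
    weighted = (
        [f_bw * v for v in bandwidth]
        + [f_dur * v for v in duration]
        + [f_app * v for v in application]
    )
    lo, hi = min(weighted), max(weighted)
    span = (hi - lo) or 1.0                      # all-equal input: avoid dividing by 0
    return [(v - lo) / span for v in weighted]   # min-max scaling (assumption)


out = preprocess_logits(
    bandwidth=[2.0, 0.0, -1.0, 0.5, 1.0],
    duration=[0.0, 0.0, 0.0, 0.0, 0.0],
    application=[3.0, 1.0, 0.0, 0.0, 0.0],
)
```

Raising one task factor above the others stretches that task's scores before rescaling, so its classes dominate the range seen by the decision module.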

In this embodiment, as shown in fig. 2, step 1 specifically includes the following steps:

Step 1-5, matching the normalized feature vector with the input dimension of the progressive convolutional neural network, and supplementing the dimension in a zero filling mode if the feature vector dimension does not meet the input requirement of the one-dimensional convolutional neural network;

Step 1-6, generating an index vector: the vector is composed of elements '0' and '1', the element '1' represents the feature part actually generated in the feature vector, the element '0' represents the filling part in the feature vector, and the lengths of the index vector and the feature vector are kept consistent.

In this embodiment, as shown in fig. 3, the classification module of the progressive network traffic classification includes a one-dimensional convolutional neural network, and the hierarchical structure of the convolutional neural network is as follows:

L1-1: an input layer for receiving two data sequences, each set to a length of 160; the first is the main input, namely the feature vector generated in step 1, which represents the temporal and spatial features of the network flow; the second is the auxiliary input, namely the index vector, which indicates the actual length of the sequence fed to the convolution, that is, the number of data packets used from one flow;

L1-5: a branch output layer comprising M branch outputs, each branch containing the outputs of three mutually independent tasks; the output of the preceding fully connected layer is passed through the fully connected layer of the one selected branch, which maps the feature vector to three 5-dimensional class output vectors; the weights and biases of the fully connected layers in different branches are independent of one another; the three classification tasks and their classes are defined as: {application type: 0: application A, 1: application B, 2: application C, 3: application D, 4: application E}, {traffic bandwidth: 0: bandwidth A, 1: bandwidth B, 2: bandwidth C, 3: bandwidth D, 4: bandwidth E}, {duration: 0: time A, 1: time B, 2: time C, 3: time D, 4: time E};
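How a branch might be chosen from the index vector can be sketched as below. The text states that there are M branches and that the index vector carries the actual input length, but it does not give the lengths served by each branch; the values in `HEAD_LENGTHS` and the function `select_head` are therefore purely hypothetical.

```python
from bisect import bisect_left

# Hypothetical lengths served by M = 4 branches; not taken from the source.
HEAD_LENGTHS = [20, 40, 80, 160]

def select_head(index_vector, head_lengths=HEAD_LENGTHS):
    """Return the number of the branch whose length covers the real input."""
    actual_len = sum(index_vector)  # number of '1' entries = real features
    pos = bisect_left(head_lengths, actual_len)
    # Clamp so that an over-long input still maps to the last branch.
    return min(pos, len(head_lengths) - 1)
```

Under these assumed lengths, an index vector with 21 ones selects branch 1, and only that branch's fully connected layer would be evaluated.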

In this embodiment, as shown in fig. 4, the progressive one-dimensional convolutional neural network is trained by:

Step 2-1-1, collecting data: raw network traffic data is collected from various network environments and covers common service scenarios; for example, the following may be captured separately: WeChat traffic, NetEase Cloud Music traffic, Tencent Video traffic, Baidu Search traffic and Tencent QQ traffic; the bandwidth and duration of each flow are recorded;

Step 2-1-2, data classification: classifying the collected raw network traffic data by service type, and labeling the bandwidth and duration by class according to predefined intervals; the bandwidth may be divided into five intervals: first interval: less than 0.01 Mbps; second interval: 0.01 Mbps to 0.05 Mbps; third interval: 0.05 Mbps to 0.1 Mbps; fourth interval: 0.1 Mbps to 0.5 Mbps; fifth interval: greater than 0.5 Mbps; the duration may be divided into five intervals: first interval: less than 0.01 s; second interval: 0.01 s to 0.5 s; third interval: 0.5 s to 1 s; fourth interval: 1 s to 5 s; fifth interval: greater than 5 s;

Step 2-1-3, dividing the data set: randomly dividing the data-label pairs labeled in step 2-1-2 into mutually exclusive training, validation and test sets;
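The interval labeling of step 2-1-2 can be sketched with a binary search over the interval edges listed there. The names below are illustrative, and the handling of a value that falls exactly on an edge is an assumption, because the text does not say which interval a boundary belongs to.

```python
from bisect import bisect_right

BANDWIDTH_EDGES_MBPS = [0.01, 0.05, 0.1, 0.5]  # edges from step 2-1-2
DURATION_EDGES_S = [0.01, 0.5, 1.0, 5.0]       # edges from step 2-1-2

def interval_label(value, edges):
    """Map a measurement to a class label 0..4 (first to fifth interval).

    Assumption: a value equal to an edge is assigned to the upper interval.
    """
    return bisect_right(edges, value)
```

A flow with a bandwidth of 0.03 Mbps and a duration of 0.7 s would receive the labels 1 and 2, that is, the second bandwidth interval and the third duration interval.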

step 2-2, executing a training process, specifically comprising the following steps:

Step 2-2-1, designing a loss function: a categorical cross-entropy loss function is adopted to measure the difference between the class distribution output by the neural network and the true labels;

Step 2-2-2, initializing parameters: the weights of the first convolution layer L1-201, the second convolution layer L1-203, the third convolution layer L1-205 and the fully connected layer are initialized using a normal-distribution initialization method, and the bias vectors are initialized to a small constant value of 0.01;

Step 2-2-3, setting hyperparameters: presetting the network structure parameters, including the size and number of convolution kernels, the batch size and the learning rate, without additional tuning;

Step 2-3, mixed-precision training, specifically comprising the following steps:

Step 2-3-1, automatically converting the data and computation of the first convolution layer L1-201, the second convolution layer L1-203 and the third convolution layer L1-205 to half-precision floating point (FP16) during training, while the fully connected layer uses single-precision floating point (FP32) to ensure numerical stability;

Step 2-3-2, setting loss scaling: the scaling factor is initialized to 1024; if the gradient overflows, the scaling factor is reduced; if the gradient does not overflow, the scaling factor is kept unchanged;

Step 2-3-3, calculating an adaptive learning rate for each layer and updating the weights by using the LAMB optimizer (Layer-wise Adaptive Moments optimizer for Batch training);

Step 2-3-4, terminating training: training is stopped when the performance on the validation set has not improved for 10 consecutive epochs (in a neural network, one epoch is one pass of all samples of the entire training data set through the network);
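The loss-scaling rule of step 2-3-2 can be sketched as a small helper. The initial factor of 1024 comes from the text; the reduction factor of 0.5 and the class name `LossScaler` are assumptions, since the text only says the factor is reduced on overflow.

```python
import math

class LossScaler:
    """Loss scaling for FP16 training: shrink on overflow, otherwise keep."""

    def __init__(self, init_scale=1024.0, backoff=0.5):
        self.scale = init_scale
        self.backoff = backoff  # assumption: halve the factor on overflow

    def step(self, scaled_grads):
        """Return unscaled gradients, or None when the update must be skipped."""
        if any(not math.isfinite(g) for g in scaled_grads):
            self.scale *= self.backoff  # overflow: reduce the scaling factor
            return None
        # No overflow: the scaling factor is left unchanged.
        return [g / self.scale for g in scaled_grads]
```

In a training loop the loss would be multiplied by `scale` before backpropagation, and the optimizer update applied only when `step` returns gradients.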

Step 2-4, model testing and deployment, which specifically comprises the following steps:

Step 2-4-1, evaluating the classification accuracy, recall, F1 score and speed of the trained progressive convolutional neural network on the test set;

Step 2-4-2, converting the trained progressive convolutional neural network into a format suitable for the deployment environment (e.g., TensorFlow SavedModel, PyTorch ScriptModule), and providing a unified, standardized API call interface according to the processing mode, wherein the interface can load the progressive convolutional neural network in the deployment environment.

In this embodiment, as shown in fig. 5, the decision module of the progressive traffic classification includes a multi-layer perceptron, and the hierarchical structure of the network is as follows:

L2-3: the second hidden layer, in which the output of the first hidden layer is passed through a fully connected layer of 128 neurons, batch normalization is applied, and the ELU (exponential linear unit) function is used for activation, outputting a 128-dimensional feature vector;

L2-4: the third hidden layer, in which the output of the second hidden layer is passed through a fully connected layer of 64 neurons, dropout (random discarding) is applied, and the ELU function is used for activation, outputting a 64-dimensional feature vector;

In this embodiment, as shown in fig. 6, the multi-layer perceptron is trained by:

step 3-1, constructing a training data set, which specifically comprises the following steps:

Step 3-1-1, collecting data: each time the progressive multitask convolutional neural network processes data, the logits vectors it outputs are recorded;

Step 3-1-2, generating a label: comparing the inference result output by the progressive convolutional neural network with the true label; if the inference result matches the true label, the data instance has been judged correctly by the progressive convolutional neural network and is marked as 1; if the inference result differs from the true label, the data instance has been judged incorrectly by the convolutional neural network and is marked as 0;

Step 3-1-3, dividing the data set: randomly dividing the data-label pairs generated in step 3-1-2 into mutually exclusive training, validation and test sets;
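The label generation of step 3-1-2 can be sketched as below. The text does not say how the three tasks are combined into one label, so this sketch assumes an instance is marked 1 only when all three task predictions match; the function names and the record layout are hypothetical.

```python
def decision_label(predicted, truth):
    """Return 1 if the CNN's inference matches the true labels, else 0.

    Assumption: predicted and truth each hold three class labels
    (application, bandwidth, duration) and all three must agree.
    """
    return int(tuple(predicted) == tuple(truth))

def build_decision_dataset(records):
    """Turn (logits, predicted, truth) records into (logits, label) pairs."""
    return [(logits, decision_label(pred, truth))
            for logits, pred, truth in records]
```

The resulting pairs are the binary-labeled training data for the decision module, with the recorded logits as input.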

step 3-2, executing a training process, specifically comprising the following steps:

Step 3-2-1, designing a loss function: a binary cross-entropy loss function is adopted to measure the difference between the output of the multi-layer perceptron (MLP) and the true labels;

Step 3-2-2, initializing parameters: initializing weights of the first hidden layer to the third hidden layer by using truncated normal distribution, and initializing a bias vector to be a zero vector;

Step 3-2-3, setting hyperparameters: presetting the network structure parameters, batch size and learning rate, without additional tuning;

Step 3-3, floating point quantization training, specifically comprising the following steps:

Step 3-3-1, converting the training set, the labels and the model weights into 16-bit floating point numbers, and performing half-precision floating point (FP16) operations in forward propagation and backward propagation;

Step 3-3-2, setting the precision loss factor to 0.5, and adjusting it dynamically according to the maximum and minimum input values;

Step 3-3-3, using the AdamW adaptive learning-rate optimization algorithm to adjust the learning rate and update the parameters;

Step 3-3-4, terminating training: training is stopped when the performance on the validation set has not improved for 10 consecutive epochs;
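The stopping rule can be sketched as follows, reading it as "stop after 10 consecutive epochs without improvement on the validation set", which is the reading consistent with the claims. The class name `EarlyStopping` and the choice of a higher-is-better metric are assumptions.

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = None
        self.stale_epochs = 0

    def should_stop(self, val_metric):
        """Record one epoch's validation metric (assumed higher is better)."""
        if self.best is None or val_metric > self.best:
            self.best = val_metric
            self.stale_epochs = 0  # improvement: reset the counter
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

Calling `should_stop` once per epoch with the validation accuracy ends training as soon as it returns `True`.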

Step 3-4, model testing and deployment, which specifically comprises the following steps:

Step 3-4-1, evaluating the performance indexes of the trained multi-layer perceptron MLP on a test set, wherein the performance indexes comprise accuracy and speed;

Step 3-4-2, converting the trained multi-layer perceptron MLP into a format suitable for the deployment environment, and providing a unified call interface according to the processing mode.

The invention provides a progressive service traffic classification method and device based on a convolutional neural network. There are many methods and ways to implement this technical solution, and the above description is only a preferred embodiment of the invention. It should be noted that a person skilled in the art can make a number of improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented using the prior art.

Claims

1. The progressive service flow classification method based on the convolutional neural network is characterized by comprising the following steps of:

Step 3, inputting the constructed feature vector and the corresponding index vector into a progressive convolutional neural network, wherein the progressive convolutional neural network comprises three classification tasks, namely an application classification task, a traffic bandwidth classification task and a traffic duration task; the progressive convolutional neural network can accept feature vectors generated under time windows with different sizes, and meanwhile, a multi-head output selection mechanism is adopted: dynamically adjusting an output layer structure according to the input feature vector and the index vector, wherein each output head corresponds to a feature vector with an original length, and each output head can output classification results of three classification tasks simultaneously;

Step 4, preprocessing the unnormalized high-dimensional feature vectors (α₁,α₂,α₃,α₄,α₅), (β₁,β₂,β₃,β₄,β₅), (γ₁,γ₂,γ₃,γ₄,γ₅) of the three tasks, taken before the output of the progressive convolutional neural network, wherein α₁ represents the score obtained by the first class of inference result in the bandwidth prediction task, β₁ represents the score obtained by the first class of inference result in the duration prediction task, and γ₁ represents the score obtained by the first class of result in the application classification task; the preprocessing comprises multiplying the high-dimensional feature vectors by task factors w₁, w₂ and w₃ respectively, namely (α₁,α₂,α₃,α₄,α₅)*w₁, (β₁,β₂,β₃,β₄,β₅)*w₂, (γ₁,γ₂,γ₃,γ₄,γ₅)*w₃, wherein the task factors represent the relative weights of the three classification tasks and are all set to 1 if the three classification tasks are valued equally, and finally performing normalization to scale all elements of the vectors into the range (0, 1);

Step 5, inputting the feature vectors preprocessed in step 4 into a decision module, which evaluates the confidence of the results of the three classification tasks and calculates a confidence index; if the confidence index is greater than or equal to a preset threshold, the classification result is confirmed as the final classification result, the classification process of the flow ends and the collection of data packets ends; if the confidence index is lower than the preset threshold, the time window is expanded, the collection of data packets continues, and steps 1 to 3 are repeated for reclassification until the confidence index reaches the preset threshold or the complete data flow has been collected;
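The feedback loop of step 5 can be sketched as below. The callables `classify` and `confidence` are hypothetical stand-ins for the progressive convolutional neural network and the decision module, and the threshold, initial window and growth factor are illustrative values that do not come from the claims; the window is expressed as a packet count for simplicity.

```python
def classify_progressively(packets, classify, confidence,
                           threshold=0.9, window=8, growth=2):
    """Expand the packet window until the decision module trusts the result.

    Assumptions: classify(pkts) returns a classification result and
    confidence(result) returns a float; both are placeholders.
    """
    while True:
        used = packets[:window]
        result = classify(used)
        # Stop when confident, or when the complete flow has been used.
        if confidence(result) >= threshold or len(used) == len(packets):
            return result, len(used)
        window *= growth  # low confidence: expand the window and retry
```

The second return value is the number of packets actually consumed, which is the quantity the progressive strategy aims to keep small.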

2. The method according to claim 1, wherein step 1 comprises the steps of:

Step 1-1, capturing network data packets, dividing the data streams according to five-tuple, namely a source IP address, a destination IP address, a source port number, a destination port number and a transport layer protocol, applying a dynamically adjustable time window to each data stream, and setting the initial size of the time window, wherein the time window defines the capturing quantity of the data packets;

Step 1-3, constructing a feature vector ((d₁*L₁,T₁),(d₂*L₂,T₂),…,(d_N*L_N,T_N)) based on the feature information, wherein d_N represents the transmission direction of the N-th data packet, L_N represents the packet length of the N-th data packet, T₁ represents the arrival time of the first data packet, T₂ represents the time interval between the first data packet and the second data packet, and N represents the number of data packets in the time window;
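The feature vector of step 1-3 can be sketched as follows. The encoding of the direction d as +1 or -1 is an assumption, as is the generalization that each later T is the gap to the preceding packet; the claim defines only T₁ and T₂ explicitly. The function name and the packet tuple layout are hypothetical.

```python
def build_feature_vector(packets):
    """Build ((d1*L1, T1), ..., (dN*LN, TN)) for one time window.

    packets: list of (timestamp_s, direction, length) tuples in arrival
    order. Assumption: direction is +1 or -1.
    """
    features = []
    prev_time = None
    for timestamp, direction, length in packets:
        # T1 is the first packet's arrival time; later T values are the
        # intervals between consecutive packets.
        t = timestamp if prev_time is None else timestamp - prev_time
        features.append((direction * length, t))
        prev_time = timestamp
    return features
```

Three packets at 0.0 s, 0.5 s and 0.75 s with signed lengths 100, -1500 and 60 produce the pairs (100, 0.0), (-1500, 0.5) and (60, 0.25).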

3. The method according to claim 2, wherein step 2 comprises the steps of:

4. A method according to claim 3, characterized in that in step 3, the progressive multitasking convolutional neural network is trained by:

Step 3-a1-1, collecting data: collecting raw network traffic data from various network environments, and identifying and labeling service types through deep packet inspection; during collection, the bandwidth and duration of each data flow are labeled at the same time;

Step 3-a1-2, data classification: classifying the collected original network flow data according to service types, and classifying and labeling the bandwidth and the duration based on a predefined interval to obtain a label pair;

Step 3-a2-2, initializing parameters: the weights of the first convolution layer L1-201, the second convolution layer L1-203, the third convolution layer L1-205 and the fully connected layer are initialized using a normal-distribution initialization method, and the bias vectors are initialized to a constant value;

Step 3-a3-1, automatically converting the data and calculation of the first convolution layer L1-201, the second convolution layer L1-203 and the third convolution layer L1-205 into a half-precision floating point FP16 in the training process, wherein a single-precision floating point FP32 is used for a full connection layer;

Step 3-a3-4, terminating training: training is stopped when the performance on the validation set has not improved for X₁ consecutive epochs;

Step 3-a4-2, converting the trained progressive convolutional neural network into a format suitable for the deployment environment, and providing a unified, standardized API call interface according to the processing mode.

5. The method of claim 4, wherein in step 3, the decision module evaluates confidence in the classification result of the progressive convolutional neural network using a multi-layer perceptron MLP, and the structural hierarchy of the decision module comprises:

6. The method according to claim 5, wherein in step 3, the multi-layer perceptron MLP is trained by:

7. A progressive traffic classification device based on the method of any one of claims 1 to 6, comprising:

the progressive convolutional neural network is used for carrying out service classification, bandwidth and duration prediction on the characteristic input;

8. A traffic classification device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program when executed by the processor implements the steps of the method according to any one of claims 1-6.

9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1-6.