CN115333959B

CN115333959B - Flow prediction method of distributed network platform

Info

Publication number: CN115333959B
Application number: CN202211244691.0A
Authority: CN
Inventors: 胡夕国; 胡玥
Original assignee: Nantong Zhonghong Network Technology Co ltd
Current assignee: Nantong Zhonghong Network Technology Co ltd
Priority date: 2022-10-12
Filing date: 2022-10-12
Publication date: 2023-03-31
Anticipated expiration: 2042-10-12
Also published as: CN115333959A

Abstract

The invention relates to a flow prediction method of a distributed network platform, belonging to the technical field of network flow prediction. The method comprises the following steps: obtaining target candidate sampling matrixes according to the rank of the low-rank matrix and linear independent row vectors in the low-rank matrix; obtaining a correlation characteristic value corresponding to each target candidate sampling matrix according to each linear independent row vector in the low-rank matrix corresponding to each target candidate sampling matrix; according to the sparse matrix, obtaining entropy values corresponding to the target candidate sampling matrixes; obtaining the optimal data length corresponding to the flow data sequence according to the correlation characteristic value and the entropy value; obtaining a flow prediction network according to each sub-flow data sequence corresponding to the optimal data length; and inputting the flow data to be predicted into a flow prediction network to obtain predicted flow data at the next moment. The invention can reduce the network training time and improve the prediction precision of the flow prediction network.

Description

Flow prediction method of distributed network platform

Technical Field

The invention relates to the technical field of network traffic prediction, in particular to a traffic prediction method of a distributed network platform.

Background

With the continuing development of networks, the services and applications they carry are increasingly abundant. The key problems to be solved are strengthening network management and improving network operating speed and utilization, and the key to solving them is predicting network traffic; prediction means estimating and inferring in advance from past or present information.

Existing network traffic prediction methods generally predict traffic with a trained traffic prediction network. Each input during network training is usually the traffic data of some period of time, and this traffic data sequence is not divided according to its real period before training. As a result, the traditional training method increases the training time of the neural network, and the prediction accuracy of the resulting traffic prediction network is relatively low.

Disclosure of Invention

The invention provides a flow prediction method of a distributed network platform, which is used for solving the problem of lower prediction precision when the existing method predicts flow data, and adopts the following technical scheme:

an embodiment of the present invention provides a traffic prediction method for a distributed network platform, including the following steps:

acquiring a flow data sequence corresponding to a target time period;

uniformly dividing the flow data sequence by using different data lengths to obtain sub flow data sequences corresponding to the data lengths; according to the sub-flow data sequences, a sampling matrix corresponding to each data length is constructed; decomposing the sampling matrix to obtain a sparse matrix and a low-rank matrix corresponding to the sampling matrix;

obtaining each candidate sampling matrix according to the rank of the low-rank matrix; screening the candidate sampling matrixes according to the linear independent row vectors in the low-rank matrix to obtain target candidate sampling matrixes;

obtaining a correlation characteristic value corresponding to each target candidate sampling matrix according to each linear independent row vector in the low-rank matrix corresponding to each target candidate sampling matrix; obtaining entropy values corresponding to the target candidate sampling matrixes according to the sparse matrix; obtaining the optimal data length corresponding to the flow data sequence according to the correlation characteristic value and the entropy value;

obtaining a flow prediction network according to each sub-flow data sequence corresponding to the optimal data length; and inputting the flow data to be predicted into a flow prediction network to obtain predicted flow data at the next moment.

Beneficial effects: the method uniformly divides the flow data sequence by using different data lengths and constructs the sampling matrix corresponding to each data length; each sampling matrix is decomposed to obtain the sparse matrix and the low-rank matrix corresponding to it; candidate sampling matrices are then obtained according to the rank of the low-rank matrix, and are screened according to the linearly independent row vectors in each low-rank matrix to obtain the target candidate sampling matrices; a correlation characteristic value corresponding to each target candidate sampling matrix is obtained according to the linearly independent row vectors in its low-rank matrix, and an entropy value corresponding to each target candidate sampling matrix is obtained according to its sparse matrix; the optimal data length corresponding to the flow data sequence is then obtained according to the correlation characteristic value and the entropy value; finally, a flow prediction network is obtained according to the sub-flow data sequences corresponding to the optimal data length, and the flow data to be predicted is input into the flow prediction network to obtain predicted flow data at the next moment. The method not only reduces the amount of labeled data and the network training time during network training, but also improves the prediction precision of the flow prediction network.

Preferably, the method for obtaining each candidate sampling matrix according to the rank of the low-rank matrix includes:

arranging the data lengths in a sequence from small to large to obtain a data length sequence;

acquiring the rank of a low-rank matrix corresponding to each sampling matrix corresponding to the data length sequence, and constructing to obtain a rank sequence;

acquiring a minimum rank position in the rank sequence;

selecting a preset rank on the left side of each minimum rank position and a preset rank on the right side of each minimum rank position in the rank sequence; and recording the sampling matrix corresponding to each minimum rank, the sampling matrix corresponding to the preset rank at the left side of each minimum rank position and the sampling matrix corresponding to the preset rank at the right side of each minimum rank position as candidate sampling matrices.

Preferably, the method for screening the candidate sampling matrices according to the linearly independent row vectors in the low-rank matrix to obtain the target candidate sampling matrices includes:

acquiring each linear independent row vector in a low-rank matrix corresponding to each candidate sampling matrix;

constructing, according to each linearly independent row vector in the low-rank matrix, the equation sets corresponding to each linearly independent row vector in the low-rank matrix;

calculating each ideal row vector corresponding to each linearly independent row vector according to the equation sets corresponding to that linearly independent row vector;

obtaining the correlation rank of the low-rank matrix corresponding to each candidate sampling matrix according to the cosine similarity between each linear independent row vector and each corresponding ideal row vector;

and screening the candidate sampling matrixes according to the correlation rank to obtain target candidate sampling matrixes.

Preferably, the method for obtaining each ideal row vector corresponding to each linearly independent row vector includes:

for any candidate sampling matrix:

obtaining each linearly independent row vector in the low-rank matrix corresponding to the candidate sampling matrix; suppose the linearly independent row vectors are the linearly independent row vector A = [A1 A2 A3], the linearly independent row vector B = [B1 B2 B3] and the linearly independent row vector C = [C1 C2 C3], where A1, A2, A3, B1, B2, B3, C1, C2 and C3 are the parameters of the linearly independent row vectors in the low-rank matrix;

for the linearly independent row vector A:

constructing, according to the linearly independent row vectors of the low-rank matrix, the equation sets corresponding to the linearly independent row vector A, namely a first equation set, a second equation set and a third equation set, whose unknowns are coefficients (formula images not reproduced);

calculating the values of the coefficients in the first equation set according to the first equation set corresponding to the linearly independent row vector A;

obtaining a first ideal value corresponding to the linearly independent row vector A according to the values of the coefficients in the first equation set;

constructing a first ideal row vector corresponding to the linearly independent row vector A according to the first ideal value;

calculating the values of the coefficients in the second equation set according to the second equation set corresponding to the linearly independent row vector A;

obtaining a second ideal value corresponding to the linearly independent row vector A according to the values of the coefficients in the second equation set;

constructing a second ideal row vector corresponding to the linearly independent row vector A according to the second ideal value;

calculating the values of the coefficients in the third equation set according to the third equation set corresponding to the linearly independent row vector A;

obtaining a third ideal value corresponding to the linearly independent row vector A according to the values of the coefficients in the third equation set;

constructing a third ideal row vector corresponding to the linearly independent row vector A according to the third ideal value;

the first ideal row vector, the second ideal row vector and the third ideal row vector are the ideal row vectors corresponding to the linearly independent row vector A.

Preferably, the method for obtaining the correlation rank of the low-rank matrix corresponding to each candidate sampling matrix according to the cosine similarity between each linearly independent row vector and each corresponding ideal row vector includes:

for any candidate sampling matrix:

calculating each cosine similarity between each linear independent row vector in the low-rank matrix corresponding to the candidate sampling matrix and each corresponding ideal row vector to obtain each cosine similarity corresponding to the linear independent row vector A, each cosine similarity corresponding to the linear independent row vector B and each cosine similarity corresponding to the linear independent row vector C;

judging whether the cosine similarity corresponding to the linearly independent row vector A is larger than a preset similarity threshold value; if so, subtracting 1 from the rank of the low-rank matrix corresponding to the candidate sampling matrix, otherwise subtracting 0 from the rank of the low-rank matrix corresponding to the candidate sampling matrix;

judging whether the cosine similarity corresponding to the linearly independent row vector B is larger than the preset similarity threshold value; if so, subtracting 1 from the rank of the low-rank matrix corresponding to the candidate sampling matrix, otherwise subtracting 0 from the rank of the low-rank matrix corresponding to the candidate sampling matrix;

judging whether the cosine similarity corresponding to the linearly independent row vector C is larger than the preset similarity threshold value; if so, subtracting 1 from the rank of the low-rank matrix corresponding to the candidate sampling matrix, otherwise subtracting 0 from the rank of the low-rank matrix corresponding to the candidate sampling matrix;

and counting an accumulated value of the rank-subtracted values of the low-rank matrix corresponding to the candidate sampling matrix, and marking the value obtained by subtracting the accumulated value from the rank of the low-rank matrix corresponding to the candidate sampling matrix as the related rank of the low-rank matrix corresponding to the candidate sampling matrix.

Preferably, the method for screening the candidate sampling matrices according to the correlation rank to obtain the target candidate sampling matrices includes:

arranging the relevant ranks of the low-rank matrix corresponding to each candidate sampling matrix according to the sequence from large to small, and constructing to obtain a relevant rank sequence; and recording each candidate sampling matrix corresponding to the minimum correlation rank in the correlation rank sequence as a target candidate sampling matrix.

Preferably, the method for obtaining the correlation characteristic value corresponding to each target candidate sampling matrix according to each linearly independent row vector in the low-rank matrix corresponding to each target candidate sampling matrix includes:

for any target candidate sampling matrix:

obtaining, among the cosine similarities between each linearly independent row vector in the low-rank matrix corresponding to the target candidate sampling matrix and each corresponding ideal row vector, the cosine similarities that are greater than the preset similarity threshold value, and recording them as the target cosine similarities corresponding to the target candidate sampling matrix;

and calculating the mean value of the cosine similarity of each target corresponding to each target candidate sampling matrix, and recording the mean value as the correlation characteristic value corresponding to each target candidate sampling matrix.
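The averaging step above can be sketched in a few lines of Python. This is only an illustration: the function name `correlation_characteristic_value` is hypothetical, the default threshold of 0.8 is the value the embodiment sets for the preset similarity threshold, and returning 0.0 when no similarity exceeds the threshold is an assumption, since the text does not cover that case.

```python
def correlation_characteristic_value(similarities, threshold=0.8):
    """Mean of the cosine similarities that exceed the preset threshold."""
    # Keep only the "target" cosine similarities (strictly above the threshold).
    targets = [s for s in similarities if s > threshold]
    if not targets:
        # Assumption: the source does not say what to do when nothing qualifies.
        return 0.0
    return sum(targets) / len(targets)


# Hypothetical similarities for one target candidate sampling matrix.
value = correlation_characteristic_value([0.9, 0.5, 1.0])
print(value)
```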

Preferably, the method for obtaining the optimal data length corresponding to the flow data sequence according to the correlation characteristic value and the entropy value includes:

obtaining the characteristic value of each target candidate sampling matrix according to the correlation characteristic value and the entropy value corresponding to that target candidate sampling matrix;

and recording the data length of the target candidate sampling matrix corresponding to the maximum characteristic value as the optimal data length corresponding to the flow data sequence.

Drawings

To more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the invention, and a person skilled in the art can derive other drawings from them without inventive effort.

Fig. 1 is a flowchart of a traffic prediction method of a distributed network platform according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by those skilled in the art based on the embodiments of the present invention belong to the protection scope of the embodiments of the present invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The embodiment provides a traffic prediction method for a distributed network platform, which is described in detail as follows:

as shown in fig. 1, the traffic prediction method for the distributed network platform includes the following steps:

Step S001, acquiring a flow data sequence corresponding to the target time period.

The embodiment mainly utilizes the traffic prediction network to predict the traffic data in the future time period, and when the network is used for prediction, the network can be regarded as a fitter for fitting the internal rules of the data, wherein the period is an important property when the internal rules of the data are fitted; and the traffic data has certain periodicity in general, so that the prediction can be carried out by using a network. If the fitting of the neural network to the periodic information is not good, the accuracy of the prediction result is difficult to guarantee while the calculation amount is increased. Therefore, the embodiment analyzes the traffic data sequence to obtain the optimal data length corresponding to the traffic data sequence, where the optimal data length is close to the real period of the traffic data sequence; then, each sub-flow data sequence corresponding to the optimal data length is used as input in network training to train the network, and a trained flow prediction network is obtained; and then inputting the flow data to be predicted into a flow prediction network to obtain the predicted flow data at the next moment. In the embodiment, the input data is not random flow data of a period of time during network training, but a data sequence obtained by dividing the flow data of a period of time according to the period of the data sequence; then training the flow prediction network according to the divided data sequence; therefore, the method not only can reduce the labeled data volume and the network training time during network training, but also can improve the prediction accuracy of the traffic prediction network.

In this embodiment, each flow data corresponding to a target time period and a flow data sequence corresponding to the target time period are selected from a database, and a flow data curve corresponding to the flow data sequence is constructed and obtained according to the flow data sequence; the present embodiment sets the target time period to one day; as another embodiment, the time length of another target time period may be set according to different requirements, for example, the target time period may be set to 5 hours. And because the flow data curve has the problem of period nesting, the conventional spectrum analysis method is difficult to directly calculate the optimal period of the flow data sequence.

Step S002, uniformly dividing the flow data sequence by utilizing different data lengths to obtain sub flow data sequences corresponding to the data lengths; constructing and obtaining a sampling matrix corresponding to each data length according to each sub-flux data sequence; and decomposing the sampling matrix to obtain a sparse matrix and a low-rank matrix corresponding to the sampling matrix.

In this embodiment, the traffic data sequence is divided by using different data lengths, and a sampling matrix corresponding to each data length is constructed; decomposing each sampling matrix to obtain a sparse matrix and a low-rank matrix corresponding to each sampling matrix; the low-rank matrix is an original matrix without noise influence, and the sparse matrix is a matrix influenced by noise; the low-rank matrix and the sparse matrix are the basis for obtaining the optimal data length through subsequent analysis.

In this embodiment, the number of data lengths is set to a, and the data lengths are arranged in ascending order to obtain a data length sequence. The differences between adjacent data lengths in the data length sequence are equal; both the initial data length and the difference between adjacent data lengths are set to small values, and the maximum data length in the data length sequence is much smaller than the total length of the traffic data sequence, so that each resulting sampling matrix generally has many rows. For any data length: the traffic data sequence is uniformly divided according to the data length to obtain the sub-traffic data sequences corresponding to the data length, and the sampling matrix corresponding to the data length is then constructed from these sub-traffic data sequences. For example, if the traffic data sequence is [1 2 3 4 2 1 2 3 4 3 1 2 3 4 5] and the data length is set to 5, that is, 5 is taken as the period, uniformly dividing the traffic data sequence by the data length 5 gives three sub-traffic data sequences [1 2 3 4 2], [1 2 3 4 3] and [1 2 3 4 5], and the sampling matrix corresponding to this data length is $\begin{bmatrix} 1 & 2 & 3 & 4 & 2 \\ 1 & 2 & 3 & 4 & 3 \\ 1 & 2 & 3 & 4 & 5 \end{bmatrix}$. The sampling matrix corresponding to each data length in the data length sequence can be obtained in this way.
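The division into sub-sequences and the construction of the sampling matrix can be illustrated with a short NumPy sketch. The function name `build_sampling_matrix` is hypothetical, and dropping an incomplete trailing chunk is an assumption, because the text only describes the case where the sequence divides evenly.

```python
import numpy as np


def build_sampling_matrix(sequence, data_length):
    """Split `sequence` into consecutive chunks of `data_length`, one chunk per row."""
    seq = np.asarray(sequence, dtype=float)
    n_rows = len(seq) // data_length
    if n_rows == 0:
        raise ValueError("data_length exceeds the sequence length")
    # Assumption: samples that do not fill a complete row are discarded.
    return seq[: n_rows * data_length].reshape(n_rows, data_length)


# The example sequence from the text, divided with data length 5.
traffic = [1, 2, 3, 4, 2, 1, 2, 3, 4, 3, 1, 2, 3, 4, 5]
sampling = build_sampling_matrix(traffic, 5)
print(sampling)
```

Repeating the call for every data length in the data length sequence gives one sampling matrix per candidate period.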

In this embodiment, each sampling matrix is decomposed by RPCA (robust principal component analysis) to obtain the low-rank matrix and the sparse matrix corresponding to it. The low-rank matrix represents the degree of redundancy of the matrix data: the lower the rank of the low-rank matrix, the closer the data length of the corresponding sampling matrix is to the real period of the flow data sequence, and the higher the rank, the larger the difference between that data length and the real period. The sparse matrix is a noise matrix: the larger the entropy value of the noise matrix, the larger the difference between the data length of the corresponding sampling matrix and the real period of the flow data sequence, and the smaller the entropy value, the smaller that difference.
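The text names RPCA but does not say which solver is used. The sketch below is one common choice, principal component pursuit solved with a fixed-step augmented Lagrangian (ADMM) iteration; the function names, the default weights `lam` and `mu`, and the stopping rule are all assumptions taken from standard practice rather than from the source.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=2000):
    """Split M into a low-rank part L and a sparse part S with M ~ L + S."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    # Assumed defaults: the usual principal component pursuit heuristics.
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * max(np.abs(M).sum(), 1e-12)) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


# Decompose the example sampling matrix for data length 5.
M = np.array([[1, 2, 3, 4, 2], [1, 2, 3, 4, 3], [1, 2, 3, 4, 5]], dtype=float)
L, S = rpca(M)
print(np.linalg.matrix_rank(L, tol=1e-6))
```

The rank of `L` feeds the rank sequence used in step S003, and `S` is the matrix whose entropy value is used in step S004.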

Step S003, obtaining each candidate sampling matrix according to the rank of the low-rank matrix; and screening the candidate sampling matrixes according to the linearly independent row vectors in the low-rank matrix to obtain target candidate sampling matrixes.

In this embodiment, each sampling matrix is constructed by uniformly dividing the traffic data sequence according to a different data length. Because the rank of a matrix is the number of linearly independent row vectors in it, a smaller rank of the low-rank matrix corresponding to a sampling matrix indicates that the data length of that sampling matrix is closer to the real period of the traffic data sequence; as the data length increases, it first slowly approaches the real period of the traffic data sequence and may then slowly move away from it. Linear independence here is absolute: a row vector is linearly independent as long as it cannot be represented exactly by the other row vectors of the matrix. Periodic traffic data, however, usually differs slightly from one period to the next, so the data length of the sampling matrix whose low-rank matrix has the minimum rank may not be the one closest to the real period in the data length sequence; one of the neighbouring data lengths may be closest instead. For example, consider a low-rank matrix of rank 3 and an ideal matrix of rank 2 (matrix images not reproduced) that differ only in one entry, 7 in the low-rank matrix versus 6 in the ideal matrix. The difference between the two matrices is small, that is, the difference between [2 4 7] and [2 4 6] is small, and the correlation between the rows [1 2 3] and [2 4 7] of the low-rank matrix is large although not one hundred percent linear. The data length of the sampling matrix corresponding to this low-rank matrix may therefore be closest to the real period of the traffic data sequence in the data length sequence even though the rank of the low-rank matrix is not the minimum. If this embodiment referred only to the rank of the low-rank matrix to find the real period of the traffic data sequence, this phenomenon would be ignored. Therefore, this embodiment obtains the candidate sampling matrices according to the data length of the sampling matrix corresponding to the low-rank matrix with the minimum rank, and then analyzes the linearly independent row vectors in the low-rank matrix corresponding to each candidate sampling matrix to obtain the target candidate sampling matrices. The target candidate sampling matrices subsequently serve as the basis for obtaining the optimal data length corresponding to the traffic data sequence, the optimal data length being the one closest to the real period of the traffic data sequence in the data length sequence.

(a) The specific process of obtaining each candidate sampling matrix according to the rank of the low-rank matrix is as follows:

in this embodiment, the rank of the low-rank matrix corresponding to each sampling matrix corresponding to the data length sequence is obtained, and a rank sequence is constructed; acquiring a minimum rank position in a rank sequence, wherein the minimum rank may be multiple; then, selecting a preset rank on the left side of each minimum rank position and a preset rank on the right side of each minimum rank position in the rank sequence, and recording a sampling matrix corresponding to each minimum rank, a sampling matrix corresponding to the preset rank on the left side of each minimum rank position and a sampling matrix corresponding to the preset rank on the right side of each minimum rank position as candidate sampling matrices; the present embodiment sets and selects 5 ranks on the left side of each minimum rank position and 5 ranks on the right side of each minimum rank position in the rank sequence.
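The selection of candidates around each minimum rank position can be sketched as follows. The function name `candidate_indices` is hypothetical; the default window of 5 follows the embodiment, and clipping the window at the ends of the rank sequence is an assumption.

```python
def candidate_indices(ranks, window=5):
    """Indices of every minimum rank plus up to `window` neighbours on each side."""
    min_rank = min(ranks)
    selected = set()
    for pos, rank in enumerate(ranks):
        if rank == min_rank:  # the minimum rank may occur more than once
            lo = max(0, pos - window)               # assumption: clip at the left end
            hi = min(len(ranks), pos + window + 1)  # assumption: clip at the right end
            selected.update(range(lo, hi))
    return sorted(selected)


# Hypothetical rank sequence, ordered like the data length sequence.
ranks = [6, 5, 4, 3, 4, 5, 6, 7, 3, 5]
print(candidate_indices(ranks, window=1))  # [2, 3, 4, 7, 8, 9]
```

Each returned index identifies a data length, and therefore a sampling matrix, that is kept as a candidate.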

(b) Screening each candidate sampling matrix according to each linear independent row vector in the low-rank matrix, and obtaining each target candidate sampling matrix comprises the following specific processes:

Each linearly independent row vector in the low-rank matrix corresponding to each candidate sampling matrix is acquired; according to these linearly independent row vectors, the equation sets corresponding to each linearly independent row vector in the low-rank matrix corresponding to each candidate sampling matrix are constructed; each ideal row vector corresponding to each linearly independent row vector is then calculated according to the equation sets corresponding to that linearly independent row vector. For any linearly independent row vector, the number of equation sets corresponding to it equals the number of parameters in the vector, the number of ideal row vectors corresponding to it also equals the number of parameters in the vector, and the number of equations in any one of its equation sets equals the number of parameters in the vector minus 1. The specific process of obtaining each ideal row vector corresponding to each linearly independent row vector is as follows:

For any candidate sampling matrix: each linearly independent row vector in the low-rank matrix corresponding to the candidate sampling matrix is obtained. Suppose the linearly independent row vectors are the linearly independent row vector A = [A1 A2 A3], the linearly independent row vector B = [B1 B2 B3] and the linearly independent row vector C = [C1 C2 C3], where A1, A2, A3, B1, B2, B3, C1, C2 and C3 are the parameters of the linearly independent row vectors in the low-rank matrix corresponding to the candidate sampling matrix. According to these linearly independent row vectors, the equation sets corresponding to the linearly independent row vector A are constructed, namely a first equation set, a second equation set and a third equation set, whose unknowns are coefficients (formula images not reproduced). According to the first equation set corresponding to the linearly independent row vector A, the values of the coefficients in the first equation set are calculated; according to the obtained coefficient values, a first ideal value corresponding to the linearly independent row vector A is obtained; according to the first ideal value, a first ideal row vector corresponding to the linearly independent row vector A is constructed. According to the second equation set corresponding to the linearly independent row vector A, the values of the coefficients in the second equation set are calculated; according to the obtained coefficient values, a second ideal value corresponding to the linearly independent row vector A is obtained; according to the second ideal value, a second ideal row vector corresponding to the linearly independent row vector A is constructed. According to the third equation set corresponding to the linearly independent row vector A, the values of the coefficients in the third equation set are calculated; according to the obtained coefficient values, a third ideal value corresponding to the linearly independent row vector A is obtained; according to the third ideal value, a third ideal row vector corresponding to the linearly independent row vector A is constructed.

By analogy, a first ideal row vector, a second ideal row vector and a third ideal row vector corresponding to the linearly independent row vector B can be obtained, and a first ideal row vector, a second ideal row vector and a third ideal row vector corresponding to the linearly independent row vector C can be obtained. Therefore, the ideal row vectors corresponding to the linearly independent row vector a, the ideal row vectors corresponding to the linearly independent row vector B, and the ideal row vectors corresponding to the linearly independent row vector C can be obtained through the above processes.
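The equation sets themselves are not legible in this text, so the following sketch rests on an assumed reading that matches the stated counts (one equation set per parameter, each with one equation fewer than the number of parameters): for each position j of vector A, the remaining positions of A are expressed as a linear combination of the same positions of the other vectors, and the resulting coefficients are used to predict an ideal value for position j. The function name `ideal_row_vectors` and the third row of the example matrix are hypothetical.

```python
import numpy as np


def ideal_row_vectors(rows, target):
    """Return one ideal row vector per position of rows[target] (assumed reading)."""
    rows = np.asarray(rows, dtype=float)
    a = rows[target]
    others = np.delete(rows, target, axis=0)  # e.g. vectors B and C when target is A
    ideals = []
    for j in range(a.size):
        keep = np.arange(a.size) != j
        # Equation set j: a[keep] = sum_i k_i * others[i, keep]
        coeffs, *_ = np.linalg.lstsq(others[:, keep].T, a[keep], rcond=None)
        ideal = a.copy()
        ideal[j] = coeffs @ others[:, j]  # ideal value for the left-out position
        ideals.append(ideal)
    return np.array(ideals)


# Rows [1 2 3] and [2 4 7] come from the text; the third row is made up.
low_rank = [[1, 2, 3], [2, 4, 7], [0, 1, 1]]
ideals = ideal_row_vectors(low_rank, target=1)
print(ideals)
```

Under this reading, the third ideal row vector of [2 4 7] is [2 4 6], which agrees with the ideal row quoted in the discussion of step S003.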

Then the cosine similarities between each linearly independent row vector in the low-rank matrix corresponding to the candidate sampling matrix and each of its ideal row vectors are calculated, giving the cosine similarities corresponding to the linearly independent row vector A, the cosine similarities corresponding to the linearly independent row vector B and the cosine similarities corresponding to the linearly independent row vector C. It is judged whether the cosine similarity corresponding to the linearly independent row vector A is larger than a preset similarity threshold value; if so, 1 is subtracted from the rank of the low-rank matrix corresponding to the candidate sampling matrix, otherwise 0 is subtracted. The same judgment is made for the linearly independent row vector B and for the linearly independent row vector C, again subtracting 1 from the rank when the cosine similarity is larger than the preset similarity threshold value and 0 otherwise. The subtracted values are accumulated, and the value obtained by subtracting the accumulated value from the rank of the low-rank matrix corresponding to the candidate sampling matrix is recorded as the correlation rank of the low-rank matrix corresponding to the candidate sampling matrix. This embodiment sets the preset similarity threshold value to 0.8.
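A minimal sketch of the correlation rank computation is given below. The function names are hypothetical. The text does not say how the several cosine similarities of one vector are combined before the threshold test, so this sketch assumes that a vector reduces the rank only when all of its similarities exceed the threshold.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def correlation_rank(rank, vectors, ideals_per_vector, threshold=0.8):
    """Rank of the low-rank matrix minus the number of nearly dependent vectors."""
    reduction = 0
    for vec, ideals in zip(vectors, ideals_per_vector):
        sims = [cosine_similarity(vec, ideal) for ideal in ideals]
        # Assumption: every similarity of the vector must exceed the threshold.
        if sims and all(s > threshold for s in sims):
            reduction += 1
    return rank - reduction


# Hypothetical input: one vector close to its ideal vectors, one orthogonal to its ideal vector.
vectors = [[2, 4, 7], [1, 0, 0]]
ideals_per_vector = [[[3, 4, 7], [2, 5, 7], [2, 4, 6]], [[0, 1, 0]]]
print(correlation_rank(3, vectors, ideals_per_vector))  # 2
```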

Therefore, in this embodiment, the above method yields, for each candidate sampling matrix, the ideal row vectors corresponding to each linearly independent row vector in its low-rank matrix, the cosine similarities between each linearly independent row vector and its corresponding ideal row vectors, and the correlation rank of its low-rank matrix. The correlation ranks of the low-rank matrices corresponding to the candidate sampling matrices are then arranged in descending order to construct a correlation rank sequence, and each candidate sampling matrix whose low-rank matrix has the minimum correlation rank in that sequence is recorded as a target candidate sampling matrix.
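The screening step can be illustrated with a short sketch. The function name and the dictionary layout (data length mapped to correlation rank) are assumptions made for the example; the text only requires a descending ordering and the selection of every candidate that shares the minimum correlation rank.

```python
def select_target_candidates(correlation_ranks):
    """Return the data lengths whose low-rank matrix has the minimum correlation rank.

    correlation_ranks -- dict mapping a candidate's data length to its correlation rank
    """
    # Correlation rank sequence, arranged from large to small.
    ordered = sorted(correlation_ranks.items(), key=lambda item: item[1], reverse=True)
    min_rank = ordered[-1][1]
    # Several candidates may share the minimum; all of them become targets.
    return [length for length, rank in ordered if rank == min_rank]
```

Ties are kept on purpose, since a later step compares the remaining targets with one another.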

Step S004, obtaining correlation characteristic values corresponding to the target candidate sampling matrixes according to the linear independent row vectors in the low-rank matrix corresponding to the target candidate sampling matrixes; obtaining entropy values corresponding to the target candidate sampling matrixes according to the sparse matrix; and obtaining the optimal data length corresponding to the flow data sequence according to the correlation characteristic value and the entropy value.

In this embodiment, the correlation characteristic value and the entropy value corresponding to each target candidate sampling matrix are obtained by analyzing the target candidate sampling matrices; the optimal data length corresponding to the flow data sequence is then obtained from the correlation characteristic value and the entropy value corresponding to each target candidate sampling matrix. The optimal data length is the data length in the data length sequence that is closest to the real period of the flow data sequence. The specific process of obtaining the optimal data length corresponding to the flow data sequence is as follows:

The entropy value of the sparse matrix corresponding to each target candidate sampling matrix is calculated and recorded as the entropy value corresponding to that target candidate sampling matrix; the smaller the entropy value, the closer the data length corresponding to the target candidate sampling matrix is to the real period of the flow data sequence. For any target candidate sampling matrix: among the cosine similarities between each linearly independent row vector in the low-rank matrix corresponding to the target candidate sampling matrix and each corresponding ideal row vector, those greater than the preset similarity threshold are obtained and recorded as the target cosine similarities corresponding to the target candidate sampling matrix. The mean value of the target cosine similarities corresponding to each target candidate sampling matrix is then calculated and recorded as the correlation characteristic value corresponding to that target candidate sampling matrix; the larger the correlation characteristic value, the closer the data length corresponding to the target candidate sampling matrix is to the real period of the flow data sequence. The eigenvalue of each target candidate sampling matrix is obtained from its correlation characteristic value and its entropy value; the eigenvalue of any target candidate sampling matrix is calculated according to the following formula:

(formula not reproduced in this text)

wherein the quantities in the formula are, respectively, the eigenvalue of the target candidate sampling matrix, the correlation characteristic value corresponding to the target candidate sampling matrix, and the entropy value corresponding to the target candidate sampling matrix. The larger the eigenvalue is, the closer the data length corresponding to the target candidate sampling matrix is to the real period of the flow data sequence.

In this embodiment, the data length of the target candidate sampling matrix corresponding to the maximum eigenvalue is recorded as the optimal data length corresponding to the traffic data sequence.
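The selection in step S004 can be sketched as below, under clearly stated assumptions. The patent's scoring formula is not reproduced in this text, so `score` is a hypothetical stand-in that only preserves the stated directions (larger correlation characteristic value is better, smaller entropy is better). The entropy is assumed to be the Shannon entropy of the rounded entries of the sparse matrix; all function names and the rounding precision are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(matrix, decimals=2):
    """Shannon entropy (bits) of the matrix entries after rounding (assumed definition)."""
    values = [round(x, decimals) for row in matrix for x in row]
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def correlation_feature(similarities, threshold=0.8):
    """Mean of the cosine similarities above the threshold (0.0 if none qualify)."""
    targets = [s for s in similarities if s > threshold]
    return sum(targets) / len(targets) if targets else 0.0


def score(correlation, entropy):
    """HYPOTHETICAL stand-in for the patent's formula, which is not available here."""
    return correlation / (1.0 + entropy)


def optimal_data_length(candidates):
    """candidates: dict mapping data length -> (cosine similarities, sparse matrix)."""
    best_length, best_score = None, float("-inf")
    for length, (similarities, sparse) in candidates.items():
        value = score(correlation_feature(similarities), shannon_entropy(sparse))
        if value > best_score:
            best_length, best_score = length, value
    return best_length
```

Any other function that increases with the correlation characteristic value and decreases with the entropy value could be substituted for `score` without changing the rest of the sketch.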

Step S005, obtaining a flow prediction network according to each sub-flow data sequence corresponding to the optimal data length; and inputting the flow data to be predicted into a flow prediction network to obtain the predicted flow data at the next moment.

This embodiment acquires each sub-flow data sequence corresponding to the optimal data length and uses these sub-flow data sequences as the input for training an LSTM flow prediction network, thereby obtaining the trained LSTM flow prediction network. The specific training process and network structure of the LSTM flow prediction network are prior art and are therefore not described in detail. The flow data to be predicted is then input into the trained LSTM flow prediction network to obtain the predicted flow data at the next moment.
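As an illustration of how the training input might be prepared, the following sketch divides a flow data sequence uniformly by the optimal data length and pairs each sub-flow data sequence with the value that immediately follows it. The text leaves the LSTM training details to the prior art, so the pairing scheme, the dropping of a trailing remainder, and the function names are all assumptions; the network itself is not shown.

```python
def split_into_subsequences(sequence, length):
    """Uniformly divide `sequence` into consecutive pieces of `length` values.

    Assumption: a trailing remainder shorter than `length` is dropped.
    """
    count = len(sequence) // length
    return [sequence[i * length:(i + 1) * length] for i in range(count)]


def next_step_pairs(subsequences):
    """Pair each sub-sequence with the first value of the following sub-sequence.

    Assumption: the prediction target is the flow value at the next moment.
    """
    return [
        (subsequences[i], subsequences[i + 1][0])
        for i in range(len(subsequences) - 1)
    ]
```

For a sequence of ten values and an optimal data length of three, this gives three sub-sequences and two (input, target) pairs that a sequence model could be trained on.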

Beneficial effects: in this embodiment, the flow data sequence is uniformly divided using different data lengths, and a sampling matrix corresponding to each data length is constructed. Each sampling matrix is decomposed to obtain its corresponding sparse matrix and low-rank matrix. Candidate sampling matrices are then obtained according to the rank of the low-rank matrix, and the candidate sampling matrices are screened according to the linearly independent row vectors in each low-rank matrix to obtain the target candidate sampling matrices. Next, the correlation characteristic value corresponding to each target candidate sampling matrix is obtained from the linearly independent row vectors in its low-rank matrix, and the entropy value of the sparse matrix corresponding to each target candidate sampling matrix is obtained. The optimal data length corresponding to the flow data sequence is then obtained from the correlation characteristic value and the entropy value. Finally, a flow prediction network is obtained from each sub-flow data sequence corresponding to the optimal data length, and the flow data to be predicted is input into the flow prediction network to obtain the predicted flow data at the next moment. The method of this embodiment can reduce both the amount of labeled data and the network training time required for network training, and can also improve the prediction accuracy of the flow prediction network.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A flow prediction method of a distributed network platform is characterized by comprising the following steps:

acquiring a flow data sequence corresponding to a target time period;

uniformly dividing the flow data sequence by using different data lengths to obtain each sub-flow data sequence corresponding to each data length; according to the sub-flow data sequences, a sampling matrix corresponding to each data length is constructed; decomposing the sampling matrix to obtain a sparse matrix and a low-rank matrix corresponding to the sampling matrix;

obtaining a flow prediction network according to each sub-flow data sequence corresponding to the optimal data length; inputting the flow data to be predicted into a flow prediction network to obtain predicted flow data at the next moment;

the method for obtaining each candidate sampling matrix according to the rank of the low-rank matrix comprises the following steps:

acquiring a minimum rank position in the rank sequence;

selecting a preset rank on the left side of each minimum rank position and a preset rank on the right side of each minimum rank position in the rank sequence; recording the sampling matrix corresponding to each minimum rank, the sampling matrix corresponding to the preset rank at the left side of each minimum rank position and the sampling matrix corresponding to the preset rank at the right side of each minimum rank position as candidate sampling matrices;

the method for screening the candidate sampling matrixes according to the linear independent row vectors in the low-rank matrix to obtain the target candidate sampling matrixes comprises the following steps:

according to each linear independent row vector in the low-rank matrix, each equation set corresponding to each linear independent row vector in the low-rank matrix is constructed;

screening the candidate sampling matrixes according to the correlation rank to obtain target candidate sampling matrixes;

the method for obtaining the correlation eigenvalue corresponding to each target candidate sampling matrix according to each linear independent row vector in the low-rank matrix corresponding to each target candidate sampling matrix comprises the following steps:

for any target candidate sampling matrix:

among the cosine similarities between each linearly independent row vector in the low-rank matrix corresponding to the target candidate sampling matrix and each corresponding ideal row vector, obtaining those greater than a preset similarity threshold and recording them as the target cosine similarities corresponding to the target candidate sampling matrix;

calculating the mean value of the target cosine similarities corresponding to each target candidate sampling matrix, and recording the mean value as the correlation characteristic value corresponding to each target candidate sampling matrix;

the method for obtaining the optimal data length corresponding to the flow data sequence according to the correlation characteristic value and the entropy value comprises the following steps:

obtaining the eigenvalue of each target candidate sampling matrix according to the correlation eigenvalue and the corresponding entropy value corresponding to each target candidate sampling matrix;

recording the data length of the target candidate sampling matrix corresponding to the maximum eigenvalue as the optimal data length corresponding to the flow data sequence;

calculating entropy values of the sparse matrix corresponding to each target candidate sampling matrix, and recording the entropy values as entropy values corresponding to each target candidate sampling matrix;

calculating the eigenvalue of any target candidate sampling matrix according to the following formula:

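(formula not reproduced in this text)

wherein the quantities in the formula are, respectively, the eigenvalue of the target candidate sampling matrix, the correlation characteristic value corresponding to the target candidate sampling matrix, and the entropy value corresponding to the target candidate sampling matrix.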

2. The traffic prediction method for distributed network platforms according to claim 1, wherein the method of obtaining each ideal row vector corresponding to each linearly independent row vector comprises:

for any candidate sampling matrix:

obtaining each linearly independent row vector in the low-rank matrix corresponding to the candidate sampling matrix; supposing the linearly independent row vectors are, respectively, the linearly independent row vector A = [A1 A2 A3], the linearly independent row vector B = [B1 B2 B3] and the linearly independent row vector C = [C1 C2 C3], where A1, A2, A3, B1, B2, B3, C1, C2 and C3 are parameters of the linearly independent row vectors in the low-rank matrix;

for the linearly independent row vector A:

constructing a first equation set, a second equation set and a third equation set corresponding to the linearly independent row vector A (equation sets not reproduced in this text), the symbols in the equation sets being coefficients;

according to the first equation set corresponding to the linearly independent row vector A, calculating the values of the coefficients in the first equation set;

according to the values of the coefficients in each equation set, obtaining the corresponding ideal row vector of the linearly independent row vector A (formulas not reproduced in this text).

3. The method for traffic prediction of a distributed network platform according to claim 2, wherein the method for obtaining the correlation rank of the low-rank matrix corresponding to each candidate sampling matrix according to the cosine similarity between each linearly independent row vector and each corresponding ideal row vector comprises:

for any candidate sampling matrix:

calculating cosine similarities between each linear independent row vector in the low-rank matrix corresponding to the candidate sampling matrix and each ideal row vector corresponding to the candidate sampling matrix to obtain cosine similarities corresponding to a linear independent row vector A, cosine similarities corresponding to a linear independent row vector B and cosine similarities corresponding to a linear independent row vector C;

judging whether the cosine similarity corresponding to the linearly independent row vector B is greater than a preset similarity threshold; if so, subtracting 1 from the rank of the low-rank matrix corresponding to the candidate sampling matrix; otherwise, subtracting 0 from the rank of the low-rank matrix corresponding to the candidate sampling matrix;

and counting an accumulated value of the subtracted values of the ranks of the low-rank matrix corresponding to the candidate sampling matrix, and marking a value obtained by subtracting the accumulated value from the rank of the low-rank matrix corresponding to the candidate sampling matrix as the correlation rank of the low-rank matrix corresponding to the candidate sampling matrix.

4. The traffic prediction method for a distributed network platform according to claim 1, wherein the method for obtaining target candidate sampling matrices by screening the candidate sampling matrices according to the correlation rank comprises:

arranging the correlation ranks of the low-rank matrixes corresponding to the candidate sampling matrixes according to a sequence from large to small, and constructing to obtain a correlation rank sequence; and recording each candidate sampling matrix corresponding to the minimum correlation rank in the correlation rank sequence as a target candidate sampling matrix.