
CN115730717A - Power load prediction method and system based on combination of transfer learning strategy and multiple channels - Google Patents

Power load prediction method and system based on combination of transfer learning strategy and multiple channels Download PDF

Info

Publication number
CN115730717A
CN115730717A
Authority
CN
China
Prior art keywords
layer
model
data
bilstm
goodness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211433918.6A
Other languages
Chinese (zh)
Inventor
周杭霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University filed Critical China Jiliang University
Priority to CN202211433918.6A priority Critical patent/CN115730717A/en
Publication of CN115730717A publication Critical patent/CN115730717A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a power load prediction method and system based on a transfer learning strategy combined with multiple channels, relating to the technical field of power load prediction. The key points of the technical scheme are: the combined multi-channel CNN and BiLSTM model fully utilizes the advantages of CNN in extracting local data features and of BiLSTM hidden-layer weight back-propagation to gradually optimize the training weights; compared with the traditional single-channel model, the multi-channel model adopts convolution kernels of several sizes to learn features at different levels of detail, fully considering the influence of the features and the time-sequence variation of the data. The improved layered transfer learning strategy can improve the prediction accuracy for small-scale data samples, or for a model whose data samples are insufficient; compared with the prediction effect of direct transfer learning, the prediction accuracy is further improved.

Description

Power load prediction method and system based on combination of transfer learning strategy and multiple channels
Technical Field
The invention relates to the technical field of power load prediction, in particular to a power load prediction method and system based on a transfer learning strategy and combined with multiple channels.
Background
The power load prediction is used as the basis of power system scheduling, and accurate prediction has important guiding significance on the operation planning of the power system. The existing literature has conducted a great deal of research on power load prediction. The linear method represented by the autoregressive moving average model is simple and feasible, but the power load is easily influenced by various factors such as meteorological factors, hours, holidays and the like, the influence of the linear method on the nonlinear factors in the power load prediction is difficult to evaluate, and the prediction accuracy cannot be guaranteed. The common machine learning models applied to the load prediction field comprise support vector regression, random forest algorithm, extreme learning machine and the like, the methods have great advantages in processing nonlinear problems, but the problems of difficult data correlation processing, multiple characteristic dimensions, large data scale and low processing speed still exist.
At present, data are subjected to dimensionality reduction and decomposition during load data preprocessing, common methods for decomposing load data include wavelet decomposition, empirical mode decomposition, variational mode decomposition and an improved mode decomposition method, the wavelet decomposition method relies on subjective experience to set a basis function, the empirical mode decomposition method has the problem of mode aliasing, and reconstruction errors can be caused by introduction of white noise in the improved mode decomposition method. Therefore, the combined model prediction method is widely used, the advantages of the base model on different feature processing capabilities are fully utilized to improve the overall model performance, wherein a Long Short Term Memory Network (LSTM) embodies good nonlinear representation capability and time sequence analysis capability in the prediction process, in the CNN-LSTM combined model, CNN is used for feature extraction of prediction variables, and an LSTM model is used for learning a time sequence rule for the extracted features, so that the CNN-LSTM combined model has better prediction accuracy compared with a single model.
However, the existing load prediction research relies on a large amount of data to train for extracting features, and for the situation of insufficient data samples, it is still difficult to accurately predict the load. Therefore, how to research and design a power load prediction method and system based on a transfer learning strategy and combined with multiple channels, which can overcome the above defects, is a problem that needs to be solved urgently at present.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a power load prediction method and system based on a transfer learning strategy and combined with multiple channels, and when data samples are insufficient, the improved layered transfer learning strategy can effectively reduce prediction errors compared with direct transfer learning.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect, a power load prediction method based on a transfer learning strategy combined with multiple channels is provided, which includes the following steps:
constructing a multi-channel CNN-BiLSTM model, and layering according to the model structure;
inputting target domain data into the multi-channel CNN-BiLSTM model for training after sliding window processing to obtain an initial prediction model;
inputting source domain data into an initial prediction model after sliding window processing, wherein each sliding window can obtain a prediction result;
evaluating the prediction result by the goodness-of-fit to obtain a corresponding goodness-of-fit value;
dividing source domain data according to the magnitude of the goodness-of-fit value, grouping and sequencing according to goodness-of-fit intervals, and removing data with goodness-of-fit lower than a threshold;
sequentially inputting the divided source domain data into different levels of the multi-channel CNN-BiLSTM model for hierarchical migration learning by using an improved hierarchical migration learning strategy, and adjusting the model according to the target domain data to obtain a final prediction model;
and predicting the power load by using the final prediction model to obtain a final prediction result.
Further, the multi-channel CNN-BiLSTM model comprises four parallel channels;
each channel comprises an input layer, a CNN layer, a BiLSTM layer, a splicing operation layer, a full connection layer and an output layer;
the CNN layers of the four channels each comprise two serial one-dimensional convolutions, the first one-dimensional convolution being used for raising the data dimensionality and the second one-dimensional convolution for extracting feature information of different scales;
the features extracted by the CNN layer are transmitted as input to two serial BiLSTM layers, and each BiLSTM layer models the frequency and time level of the input data;
the outputs of the final BiLSTM layers of the four channels are spliced and transmitted to a full connection layer;
and finally, the full connection layer flattens the data to one dimension to obtain the final output.
Further, the layering of the multi-channel CNN-BiLSTM model is specifically:
in the first channel: the 1st one-dimensional convolution is used as the L1 layer, the 2nd one-dimensional convolution as the L2 layer, the 1st BiLSTM as the L3 layer, and the 2nd BiLSTM as the L4 layer;
the whole channel model of the second channel is used as the L5 layer;
the whole channel model of the third channel is used as the L6 layer;
and the whole channel model of the fourth channel is used as the L7 layer.
Further, the goodness of fit is the ratio of the degree of variation explainable by the independent variables to the total degree of variation.
Further, the goodness-of-fit is corrected according to the number of samples and the number of features to eliminate the influence of the number of samples.
Further, the calculation formula of the goodness of fit is specifically:

R²_Adjusted = 1 − (SSE / (n − p − 1)) / (SST / (n − 1))

wherein R²_Adjusted expresses the corrected goodness of fit; n represents the number of samples; p represents the number of features; SSE represents the sum of squared errors; SST represents the total sum of squared deviations.
Further, for both the target domain data and the source domain data, text data are converted into numerical data by performing label coding on the features.
Further, the characteristics include holidays, non-holidays, workdays, saturdays, sundays, daylight savings times, and winter times.
Further, the specific process of the hierarchical transfer learning is as follows:
s1: sorting the source domain data according to the goodness of fit, the data being divided into L groups {X1, X2, …, XL};
s2: X1 is input to the L1 layer of the model for training with the other layers frozen; the training weight of this layer is kept, and the L1 layer is then frozen;
s3: X2 is input to the L2 layer of the model for training, the weight of the previous layer is loaded, and whether the effect is improved is judged;
s4: if the effect of the current layer is improved, the layer together with the previous layer, then the layer together with the previous two layers, and so on, are unfrozen in turn until all layers up to the current one are unfrozen; the prediction effects for the different numbers of unfrozen layers are compared, and the training weight with the largest improvement is kept;
s5: if the effect of the current layer is not improved, the weight of this layer is not kept; the layer is trained with the previous group's data, and if the effect is still not improved, the next layer is trained with the data of the layer whose weight was not kept;
s6: and so on, until XL has been trained and the corresponding weights are kept.
In a second aspect, a power load prediction system based on a transfer learning strategy and combined with multiple channels is provided, which includes:
the model layering module is used for constructing a multi-channel CNN-BiLSTM model and layering according to the model structure;
the data training module is used for inputting target domain data into the multi-channel CNN-BiLSTM model for training after the target domain data is processed by a sliding window to obtain an initial prediction model;
the data prediction module is used for inputting the source domain data into the initial prediction model after the source domain data are processed by sliding windows, and each sliding window can obtain a prediction result;
the evaluation analysis module is used for evaluating the prediction result by the goodness-of-fit to obtain a corresponding goodness-of-fit value;
the sequencing and dividing module is used for dividing the source domain data according to the fitting goodness value, grouping and sequencing according to the fitting goodness interval and removing the data with the fitting goodness lower than a threshold value;
the migration learning module is used for sequentially inputting the divided source domain data into different levels of the multi-channel CNN-BiLSTM model for layered migration learning by using an improved layered migration learning strategy, and adjusting the model according to the target domain data to obtain a final prediction model;
and the load prediction module is used for predicting the power load by using the final prediction model to obtain a final prediction result.
Compared with the prior art, the invention has the following beneficial effects:
1. the multi-channel CNN and BiLSTM combined model fully utilizes the advantages of CNN in extracting local data features and of BiLSTM hidden-layer weight back-propagation to gradually optimize the training weights; compared with the traditional single-channel model, the multi-channel model uses parallel convolution layers with kernels of several sizes to learn features at different levels of detail, fully considering the influence of the features and the time-sequence variation of the data; the improved layered migration learning strategy can improve the prediction accuracy for small-scale data samples or for a model with insufficient data samples, and the prediction effect is further improved compared with direct migration learning;
2. according to the method, the corrected goodness of fit eliminates the influence of the number of samples on its value, which can further improve the prediction accuracy when the number of samples is small;
3. the text data is converted into numerical data by adopting a mode of carrying out label coding on the characteristics, so that model calculation is facilitated;
4. according to the structural characteristics of the multi-channel model, the data of different fitting goodness can be more fully trained by layering from shallow to deep.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart in an embodiment of the invention;
FIG. 2 is a block diagram of a multi-channel CNN-BiLSTM model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the layering of the multi-channel CNN-BiLSTM model in an embodiment of the present invention;
FIG. 4 is a diagram illustrating source domain data partitioning according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the distribution of power load data in an embodiment of the present invention;
FIG. 6 is a schematic diagram of prediction errors for different numbers of migration layers in an embodiment of the present invention;
FIG. 7 is a comparison of predicted values for an embodiment of the present invention;
fig. 8 is a block diagram of a system in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1: the method for predicting the power load based on the migration learning strategy and combining multiple channels is specifically realized by the following steps as shown in fig. 1.
Step one: a multi-channel CNN-BiLSTM model is constructed and layered according to the model structure.
As shown in FIG. 2, the multi-channel CNN-BiLSTM model includes four parallel channels; each channel comprises an input layer, a CNN layer, a BiLSTM layer, a splicing operation layer, a full connection layer and an output layer; the CNN layers of the four channels each comprise two serial one-dimensional convolutions, the first used for raising the data dimensionality and the second for extracting feature information of different scales; the features extracted by the CNN layer are transmitted as input to two serial BiLSTM layers, and each BiLSTM layer models the frequency and time level of the input data; the outputs of the final BiLSTM layers of the four channels are spliced and transmitted to a full connection layer; finally, the full connection layer flattens the data to one dimension to obtain the final output.
As shown in fig. 3, the layering of the multi-channel CNN-BiLSTM model is specifically as follows: in the first channel, the 1st one-dimensional convolution is used as the L1 layer, the 2nd one-dimensional convolution as the L2 layer, the 1st BiLSTM as the L3 layer, and the 2nd BiLSTM as the L4 layer; the whole channel model of the second channel is used as the L5 layer; the whole channel model of the third channel as the L6 layer; and the whole channel model of the fourth channel as the L7 layer.
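The parallel-kernel idea of the four channels can be sketched in plain NumPy. The code below is an illustrative toy rather than the patented model: it uses one fixed moving-average filter per kernel size in place of the 16/32 learned kernels, and omits the BiLSTM layers entirely.

```python
import numpy as np

def conv1d_same(x, kernel):
    """One-dimensional cross-correlation with zero ('same') padding, stride 1,
    as used by deep-learning Conv1D layers."""
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, (pad, k - 1 - pad))
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def multichannel_features(x, kernel_sizes=(2, 4, 6, 8)):
    """Run one toy filter per channel and splice the channel outputs,
    mirroring the four parallel kernel sizes of the second convolution layer."""
    outs = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k          # fixed moving-average stand-in for a learned kernel
        outs.append(conv1d_same(x, kernel))
    return np.concatenate(outs)          # splicing operation across channels

x = np.arange(8.0)
feats = multichannel_features(x)         # 4 channels x 8 points = 32 features
```

With 'same' padding each channel preserves the input length, so the spliced output grows linearly with the number of channels rather than shrinking with kernel size.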
Step two: the target domain data are input into the multi-channel CNN-BiLSTM model for training after sliding-window processing to obtain an initial prediction model.
Step three: the source domain data are input into the initial prediction model after sliding-window processing, and each sliding window yields a prediction result.
The external features and the load values within the sliding window at each time step are taken as input; the specific input at time t is as follows:

X = {X_{t−T+1}, X_{t−T+2}, X_{t−T+3}, …, X_t}

X_t = {load_t, daytype_t, holiday_t, climate_t}

wherein X is the input sequence, representing the inputs at the T time points up to and including time t; T is the size of the sliding window; X_t is the input feature data at time t; load_t is the load at time t; daytype_t is the date type; holiday_t indicates whether the day is a holiday; and climate_t is the summer-time/winter-time feature.
The output form of the prediction model is as follows:

Y = {load_{t+1}, load_{t+2}, …, load_{t+120}}

in the formula, Y is the output sequence, representing the predicted load values at the N (= 120) time points after time t.
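The sliding-window construction of the inputs X and outputs Y can be sketched as follows. The window length T = 24, the 100-point fake series, and the horizon of 5 (shortened from the 120-point output for brevity) are illustrative values, not from the patent.

```python
import numpy as np

def make_windows(series, T, horizon):
    """Build (X, Y) pairs: X is the window {x_{t-T+1}, ..., x_t} and
    Y the next `horizon` values {load_{t+1}, ..., load_{t+horizon}}."""
    X, Y = [], []
    for t in range(T - 1, len(series) - horizon):
        X.append(series[t - T + 1:t + 1])        # inputs up to and including t
        Y.append(series[t + 1:t + 1 + horizon])  # targets after t
    return np.array(X), np.array(Y)

# 100 fake load points; T = 24 and horizon = 5 are illustrative
series = np.arange(100.0)
X, Y = make_windows(series, T=24, horizon=5)
```

Each row of X pairs with the horizon values that immediately follow it, so consecutive rows overlap by T − 1 points.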
Step four: the prediction results are evaluated with the goodness of fit to obtain corresponding goodness-of-fit values.
Goodness of fit refers to the ratio of the degree of variation that can be explained by the independent variable to the total degree of variation. The sum of the squared differences between the sample values and the predicted values is called the sum of the squared errors. The total sum of squared deviations equals the sum of squared errors plus the sum of squared regressions. The higher the goodness of fit, the higher the degree to which the data can be interpreted by the model, and the better the regression model. The specific calculation formula is as follows:
SSE = Σ_{i=1}^{n} (y_i − ŷ_i)²

SST = Σ_{i=1}^{n} (y_i − ȳ)²

SSR = SST − SSE

R² = SSR / SST = 1 − SSE / SST

in the formula, n is the number of samples of the test set; y_i is the true value; ȳ is the sample mean; ŷ_i is the predicted value; SSR is the regression sum of squares; SSE is the sum of the squares of the errors; SST is the sum of the squares of the total deviations.
Because the value of the goodness of fit increases with the number of independent variables, the corrected goodness of fit eliminates the influence of the number of samples on this value. The calculation formula is as follows:

R²_Adjusted = 1 − (SSE / (n − p − 1)) / (SST / (n − 1))

wherein R²_Adjusted expresses the corrected goodness of fit; n represents the number of samples; p represents the number of features; SSE represents the sum of squared errors; SST represents the total sum of squared deviations.
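As a check on the definitions, the plain and corrected goodness of fit can be computed directly in NumPy. This is a generic sketch of the standard R² and adjusted R² formulas, not code from the patent.

```python
import numpy as np

def goodness_of_fit(y_true, y_pred, p):
    """Return (R^2, adjusted R^2); p is the number of features."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    sse = np.sum((y_true - y_pred) ** 2)         # error sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - sse / sst
    # correction divides each sum of squares by its degrees of freedom
    r2_adj = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    return r2, r2_adj
```

For any imperfect fit with p ≥ 1 the adjusted value is strictly smaller than the plain R², which is the intended penalty for small samples.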
Step five: as shown in fig. 4, in order to find the source domain data most similar in goodness of fit to the target domain data, the source domain data are divided according to the goodness-of-fit value, grouped and sorted by goodness-of-fit interval, and the data with goodness of fit below the threshold are removed.
Specifically, data volumes of 1/4, 2/4 and 3/4 of the target domain are used to simulate the data-missing situation; data whose goodness of fit is smaller than σ are removed from the source domain data, since such data are considered to fit the target domain data poorly. The goodness-of-fit interval of the remaining source domain data is [σ, 1]; the source domain data are then divided by goodness of fit into intervals and sorted.
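Step five can be sketched as follows. The threshold σ = 0.5 and the four equal-width goodness-of-fit intervals are illustrative assumptions; the patent specifies neither the threshold value nor the number of groups.

```python
def partition_source_domain(goodness, sigma=0.5, n_groups=4):
    """Drop source windows whose goodness of fit is below sigma, then bucket
    the remainder into n_groups equal-width intervals over [sigma, 1].
    Returns lists of window indices, ordered from low to high goodness."""
    width = (1.0 - sigma) / n_groups
    groups = [[] for _ in range(n_groups)]
    for i, g in enumerate(goodness):
        if g < sigma:
            continue                     # goodness below threshold: discard
        b = min(int((g - sigma) / width), n_groups - 1)
        groups[b].append(i)
    return groups
```

The returned groups feed the hierarchical transfer step in ascending order, so the best-fitting source data are migrated last, into the deepest layers.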
Step six: using the improved hierarchical transfer learning strategy, the divided source domain data are sequentially input into different levels of the multi-channel CNN-BiLSTM model for hierarchical transfer learning, and the model is adjusted according to the target domain data to obtain the final prediction model.
Step seven: the power load is predicted with the final prediction model to obtain the final prediction result.
The specific process of the hierarchical migration learning comprises the following steps:
s1: sorting the source domain data according to the goodness of fit, the data being divided into L groups {X1, X2, …, XL};
s2: X1 is input to the L1 layer of the model for training with the other layers frozen; the training weight of this layer is kept, and the L1 layer is then frozen;
s3: X2 is input to the L2 layer of the model for training, the weight of the previous layer is loaded, and whether the effect is improved is judged;
s4: if the effect of the current layer is improved, the layer together with the previous layer, then the layer together with the previous two layers, and so on, are unfrozen in turn until all layers up to the current one are unfrozen; the prediction effects for the different numbers of unfrozen layers are compared, and the training weight with the largest improvement is kept;
s5: if the effect of the current layer is not improved, the weight of this layer is not kept; the layer is trained with the previous group's data, and if the effect is still not improved, the next layer is trained with the data of the layer whose weight was not kept;
s6: and so on, until XL has been trained and the corresponding weights are kept.
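The control flow of steps s1–s6 can be sketched with stand-in routines. The `train` and `evaluate` callables are placeholders for the real multi-channel CNN-BiLSTM training and validation, and the sketch only tracks the best error; the actual strategy would also restore the weights of the best unfreeze depth.

```python
def hierarchical_transfer(groups, layers, train, evaluate):
    """Sketch of the layered transfer strategy.

    groups   -- source-domain data groups sorted by goodness of fit (s1)
    layers   -- model layers L1..L7, one trained per group
    train    -- train(data, trainable_layers): placeholder training routine
    evaluate -- evaluate() -> error, lower is better (placeholder)
    """
    best_err = float("inf")
    carried = None                       # data carried over when a layer did not improve (s5)
    for i, data in enumerate(groups):
        if carried is not None:
            data = carried
        train(data, [layers[i]])         # s2/s3: train current layer, others frozen
        err = evaluate()
        if err < best_err:               # s4: improved -> unfreeze progressively deeper
            best_err, carried = err, None
            for j in range(i - 1, -1, -1):
                train(data, layers[j:i + 1])
                e = evaluate()
                if e < best_err:         # real method keeps the best depth's weights
                    best_err = e
        else:
            carried = data               # s5: discard this layer's weights, reuse the data
    return best_err
```

With a monotonically improving evaluation the loop visits every unfreeze depth, so the number of evaluations grows as 1 + 2 + … + L over the L groups.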
In addition, the distribution of the load data shows that the power load fluctuates regularly in summer, while the peak-valley fluctuation of the load is larger in winter. The overall load changes periodically in units of weeks: the daily load values from Monday to Friday are higher, the Saturday load value is lower than a weekday's, and Sunday's is lower than Saturday's. If a certain day holds a major event or is a holiday (as at the first peak of the winter-week power load curve, whose date falls on a Monday), the load value of that day is lower. Seven features are thus derived: holiday, non-holiday, workday, Saturday, Sunday, daylight saving time and winter time. These features of the data set are label-encoded, with each non-numerical feature coding shown in Table 1 below.
TABLE 1 Feature coding

Feature   Holiday/non-holiday   Workday/Saturday/Sunday   Summer time/winter time
Code      0/1                   0/1/2                     0/1
The purpose of label encoding is to convert text data into numerical data to facilitate model computation.
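A minimal sketch of the label coding of Table 1. The record field names (`holiday`, `daytype`, `summer_time`) are illustrative, not from the patent.

```python
def label_encode(records):
    """Convert the text features of Table 1 into numerical codes."""
    day_code = {"workday": 0, "saturday": 1, "sunday": 2}
    return [
        (0 if r["holiday"] else 1,      # holiday -> 0, non-holiday -> 1
         day_code[r["daytype"]],        # workday/saturday/sunday -> 0/1/2
         0 if r["summer_time"] else 1)  # summer time -> 0, winter time -> 1
        for r in records
    ]
```

The integer codes can then sit alongside the numeric load values in each input window without further preprocessing.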
Experimental validation analysis
1. The experimental data in the invention are derived from one year (12 months) of load data for each state in the United States, acquired from the Open Energy Data Initiative (OEDI) website. The facility power load data of hospitals in 9 areas are selected as the experimental data set: the hospital facility power load data of eight different areas serve as source domain data, with the full 12 months regarded as sufficient data (24 time points per day over 365 days, i.e. 8760 records per area, 8 × 8760 records in total). The remaining area serves as the target domain, and data volumes of 1/4, 2/4 and 3/4 of the target domain (i.e. 3, 6 and 9 months of target domain data) are selected to simulate insufficient data samples. The training set and the test set of the target domain data are divided according to the proportion of 5.
The normalized distribution of a portion of representative load data is shown in fig. 5. Because the power load of the selected data set fluctuates within 600-1500 kilowatt-hours (kW·h), a large numerical span that is unfavorable for model training, the data are first normalized. A small number of abnormal values also exist in the data; to improve the prediction effect, the abnormal values of the original power load data are processed with a mean-square-value method.
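The normalization step can be sketched as min-max scaling to [0, 1]. This is an assumption, since the passage states only that the data are normalized; the mean-square-value outlier handling is not reproduced here.

```python
import numpy as np

def min_max_normalize(load):
    """Scale load values to [0, 1] (min-max scaling, assumed here)."""
    load = np.asarray(load, dtype=float)
    lo, hi = load.min(), load.max()
    return (load - lo) / (hi - lo)

# kW·h values spanning the stated 600-1500 range
scaled = min_max_normalize([600.0, 1050.0, 1500.0])
```

Whatever scaler is used, its parameters must be fitted on the training split only and reused to inverse-transform the predictions back to kW·h.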
2. The invention uses 3 indexes of Mean Absolute Error (MAE), mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) to evaluate the experimental result of the embodiment. The evaluation index calculation formula is shown below.
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|

MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i|

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )

wherein n is the number of prediction samples, ŷ_i is the model predicted value, and y_i ∈ {y_1, y_2, …, y_n} are the real values.
3. Parameter setting
The parameters of the CNN layer are set as follows: the Conv1D convolutions use 'same' padding; the first-layer one-dimensional convolution in each of the four channels uses kernels of size 1 with stride 1 and 16 kernels, to raise the data dimensionality; the second-layer one-dimensional convolutions of the four channels use kernel sizes of 2, 4, 6 and 8 with strides of 1, 2, 3 and 4 respectively, and 32 kernels, to extract feature information of different scales. Batch normalization is applied between the convolution layer and the activation-function layer to prevent vanishing and exploding gradients and to accelerate the learning process; the activation function is Leaky ReLU, so that the input layer can still learn effectively in the case of negative values.
The parameters of the BiLSTM layer are set as follows: the activation function is Sigmoid, dropout is set to 0.2, and the output dimensions of both BiLSTM layers are 64. During training, the optimizer is Adam, the loss function is MSE, the number of iterations is set to 300, and the batch size to 32. An early-stopping strategy and a learning-rate-decay (reduce-learning-rate) strategy are adopted to reduce training time, with the patience values of early stopping and of learning-rate reduction set to 7 and 23, respectively. The initial learning rate is set to 0.01, the minimum learning rate to 1e-5, and the decay coefficient to 0.5. During model fine-tuning, the weights of all layers before the fully connected layer are kept unchanged and only the fully connected layer is trained, with the learning rate set to 1e-4 and the batch size to 16.
4. Performance verification
1. Base model performance
In order to verify the performance of the proposed multi-channel CNN-BiLSTM neural network model, three models (LSTM, BiLSTM and CNN-BiLSTM) are compared in the experiments. All models are trained and predicted on the same data set (the sufficient 12 months of target domain data); the last column of the table denotes the proposed multi-channel CNN-BiLSTM model. The comparison of prediction errors of all models is shown in Table 2.
TABLE 2 Comparison of prediction errors for different models

Model/index   LSTM    BiLSTM   CNN-BiLSTM   Proposed model
MAE           72.28   67.25    62.50        59.07
MAPE/%        6.83    6.48     6.11         5.60
RMSE          85.66   83.00    79.06        72.15
As can be seen from the table, when the bidirectional and unidirectional LSTM are trained on the same data set, all 3 evaluation indexes of the BiLSTM model decrease, indicating that the bidirectional LSTM performs better on this data set; combining CNN on the basis of the bidirectional LSTM brings a further improvement; and compared with the single-channel CNN-BiLSTM, the proposed model has higher prediction accuracy, with the 3 evaluation indexes reduced by 5.49%, 8.35% and 8.72%, respectively.
2. Improved hierarchical migration learning performance
The following data and training methods were used to perform comparative experiments, with the multi-channel CNN-BiLSTM as the base model:
A. data loss 1: target domain 3 months data;
B. data missing 2: target domain 6 months data;
C. data missing 3: target domain 9 months data;
D. no migration: directly training and predicting without using transfer learning;
E. direct migration: training and predicting target domain data and partial source domain data (source domain data left after eliminating source domain data with low goodness of fit) by adopting a common transfer learning method;
F. layered migration: the same data as in direct migration (E) were used, trained and predicted with the hierarchical migration learning strategy presented herein.
The comparison of prediction errors for the data-missing cases is shown in Table 3.
TABLE 3 data loss experiment comparison results
As can be seen from the above table, when the target domain data cover only 3 months the prediction error is large, and the error gradually decreases as the target domain data samples increase; using the insufficient target domain data together with part of the source domain data, the direct migration method improves the prediction result to a certain extent compared with no transfer learning; and compared with direct transfer learning, the prediction error of the proposed hierarchical transfer learning strategy is further reduced.
For the convenience of further analysis, taking the case of 6 months of target domain data as the reference, compared with direct transfer learning the MAE index is reduced by 6%, the MAPE by 5.75% and the RMSE by 4.28%.
After the transfer of each layer finishes and before the transfer of the next layer starts, the prediction error of the current layer is calculated; the prediction errors of the different transfer layers, and the prediction error after fine-tuning with the target-domain data, are shown in fig. 6. As the figure shows, the prediction error after each layer's transfer learning decreases as the number of layers increases, and after the transfer of the seventh layer finishes, fine-tuning the model with the target-domain data reduces the prediction error markedly.
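The layer-by-layer procedure can be sketched as the following control loop. This is a toy sketch of the bookkeeping only: `train_layer` is a hypothetical callback standing in for actually training one model layer on one source-data group and returning its prediction error, and the fallback-to-the-previous-group rule is a simplified reading of the strategy described here.

```python
def hierarchical_transfer(groups, train_layer):
    """Train layer i on source-data group i, keeping weights only when the
    prediction error improves; otherwise retry with the previous group's data."""
    kept = {}                    # layer index -> data group whose weights were kept
    best_error = float("inf")
    for i, data in enumerate(groups):
        error = train_layer(i, data)
        if error < best_error:   # effect improved: keep this layer's weights
            best_error, kept[i] = error, data
        elif i > 0:              # no improvement: retrain with the previous group
            error = train_layer(i, groups[i - 1])
            if error < best_error:
                best_error, kept[i] = error, groups[i - 1]
    return kept, best_error
```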
The actual values at the predicted points are compared with the predicted values under data missing (no transfer), under the direct transfer learning strategy, and under the hierarchical transfer learning proposed herein (the result after model fine-tuning is taken as the method of this paper); the comparison of predicted values is shown in fig. 7. In the figure, the ordinate is the load value and the abscissa is the serial number of the sample point. As can be seen, the prediction of the proposed method fits the true values best at the peaks and valleys of the curve, especially in comparison with the data-missing case. The five predicted days in fig. 7 run from Wednesday to Sunday; the figure shows that the weekday load is higher than the weekend load and the Saturday load is higher than the Sunday load, so the load exhibits periodicity over time, verifying the validity of the selected periodic features.
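The sliding-window preprocessing that both the training and prediction steps rely on can be sketched as follows (a minimal sketch; the window length and the one-step-ahead horizon are assumptions, not the patent's actual configuration):

```python
import numpy as np

def sliding_windows(series, window, horizon=1):
    """Split a load series into (input window, target) pairs."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])              # input window
        y.append(series[i + window + horizon - 1])  # value to predict
    return np.array(X), np.array(y)
```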
Embodiment 2: a power load prediction system based on the combination of a transfer learning strategy and multiple channels, used to implement the prediction method described in Embodiment 1. As shown in fig. 8, the prediction system includes a model layering module, a data training module, a data prediction module, an evaluation and analysis module, a sorting and dividing module, a transfer learning module, and a load prediction module.
The model layering module is used for constructing a multichannel CNN-BiLSTM model and layering it according to the model structure; the data training module is used for inputting the target-domain data, after sliding-window processing, into the multichannel CNN-BiLSTM model for training to obtain an initial prediction model; the data prediction module is used for inputting the source-domain data, after sliding-window processing, into the initial prediction model, where each sliding window yields a prediction result; the evaluation and analysis module is used for evaluating the prediction results with the goodness of fit to obtain corresponding goodness-of-fit values; the sorting and dividing module is used for dividing the source-domain data according to the goodness-of-fit values, grouping and sorting them by goodness-of-fit interval, and removing data whose goodness of fit is below a threshold; the transfer learning module is used for sequentially inputting the divided source-domain data into different levels of the multichannel CNN-BiLSTM model for hierarchical transfer learning using the improved hierarchical transfer learning strategy, and adjusting the model with the target-domain data to obtain a final prediction model; and the load prediction module is used for predicting the power load with the final prediction model to obtain the final prediction result.
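The corrected goodness of fit used by the evaluation and analysis module (detailed in claims 4-6) can be sketched as follows (an assumed implementation of the standard adjusted R²; variable names are illustrative):

```python
import numpy as np

def adjusted_r2(y_true, y_pred, p):
    """Goodness of fit corrected by sample count n and feature count p."""
    n = len(y_true)
    sse = np.sum((y_true - y_pred) ** 2)            # error sum of squares
    sst = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squared deviations
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
```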
The working principle is as follows: the combined multichannel CNN and BiLSTM model fully exploits the strengths of the CNN in extracting local data features and of the BiLSTM in back-propagating hidden-layer weights, progressively optimizing the training weights. Compared with a traditional single-channel model, the multichannel model uses convolution layers of several sizes to learn different detail features, fully accounting for both the influence of the features and the temporal variation of the data. The improved hierarchical transfer learning strategy can improve the prediction accuracy of a model on small-scale or insufficient data samples, and further improves accuracy compared with direct transfer learning.
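As a concrete illustration of the architecture described above, the following is a minimal PyTorch sketch of a four-channel CNN-BiLSTM. All layer sizes, kernel widths and hyperparameters here are illustrative assumptions, not the patent's actual configuration; the two serial BiLSTM layers are realized with `num_layers=2`.

```python
import torch
import torch.nn as nn

class Channel(nn.Module):
    """One channel: two stacked 1-D convolutions followed by two BiLSTM layers."""
    def __init__(self, in_feats, conv_ch, kernel, hidden):
        super().__init__()
        self.conv1 = nn.Conv1d(in_feats, conv_ch, kernel_size=1)   # raise dimensionality
        self.conv2 = nn.Conv1d(conv_ch, conv_ch, kernel_size=kernel,
                               padding=kernel // 2)                 # scale-specific features
        self.bilstm = nn.LSTM(conv_ch, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)

    def forward(self, x):                  # x: (batch, time, features)
        h = x.transpose(1, 2)              # Conv1d expects (batch, channels, time)
        h = torch.relu(self.conv2(torch.relu(self.conv1(h))))
        out, _ = self.bilstm(h.transpose(1, 2))
        return out[:, -1, :]               # last time step of the final BiLSTM

class MultiChannelCNNBiLSTM(nn.Module):
    """Four parallel channels with different kernel sizes; their BiLSTM outputs
    are spliced and mapped by a fully connected layer to one load value."""
    def __init__(self, in_feats=8, conv_ch=32, hidden=64, kernels=(1, 3, 5, 7)):
        super().__init__()
        self.channels = nn.ModuleList(
            Channel(in_feats, conv_ch, k, hidden) for k in kernels)
        self.fc = nn.Linear(len(kernels) * 2 * hidden, 1)  # 2x for bidirectional

    def forward(self, x):
        return self.fc(torch.cat([c(x) for c in self.channels], dim=1))
```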
The above embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above are only embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A power load prediction method based on the combination of a transfer learning strategy and multiple channels, characterized by comprising the following steps:
constructing a multichannel CNN-BiLSTM model, and layering it according to the model structure;
inputting target-domain data, after sliding-window processing, into the multichannel CNN-BiLSTM model for training to obtain an initial prediction model;
inputting source-domain data, after sliding-window processing, into the initial prediction model, wherein each sliding window yields a prediction result;
evaluating the prediction result by using the goodness-of-fit to obtain a corresponding goodness-of-fit value;
dividing the source-domain data according to the magnitude of the goodness-of-fit values, grouping and sorting them by goodness-of-fit interval, and removing data whose goodness of fit is below a threshold;
sequentially inputting the divided source-domain data into different levels of the multichannel CNN-BiLSTM model for hierarchical transfer learning using an improved hierarchical transfer learning strategy, and adjusting the model according to the target-domain data to obtain a final prediction model;
and predicting the power load by using the final prediction model to obtain a final prediction result.
2. The method for predicting power load based on the combination of the transfer learning strategy and multiple channels according to claim 1, wherein the multichannel CNN-BiLSTM model comprises four parallel channels;
each channel comprises an input layer, a CNN layer, a BiLSTM layer, a splicing operation layer, a fully connected layer and an output layer;
the CNN layer of each of the four channels comprises two serial one-dimensional convolutions, the first one-dimensional convolution being used for raising the data dimensionality and the second one-dimensional convolution being used for extracting feature information at different scales;
the features extracted by the CNN layer are passed as input to two serial BiLSTM layers, each BiLSTM layer modeling the frequency and time levels of the input data;
the outputs of the last BiLSTM layers of the four channels are spliced and passed to the fully connected layer;
and finally the fully connected layer flattens the data to one dimension to obtain the final output.
3. The method for predicting power load based on the combination of the transfer learning strategy and multiple channels according to claim 2, wherein the multichannel CNN-BiLSTM model is layered as follows:
in the first channel, the 1st one-dimensional convolution is taken as the L1 layer, the 2nd one-dimensional convolution as the L2 layer, the 1st BiLSTM as the L3 layer, and the 2nd BiLSTM as the L4 layer;
the entire channel model of the second channel is taken as the L5 layer;
the entire channel model of the third channel is taken as the L6 layer;
and the entire channel model of the fourth channel is taken as the L7 layer.
4. The method according to claim 1, wherein the goodness-of-fit is the proportion of the total variation that can be explained by the independent variables.
5. The method according to claim 4, wherein the goodness-of-fit is corrected according to the number of samples and the number of features to eliminate the influence of the number of samples.
6. The method for predicting the power load based on the combination of the transfer learning strategy and the multiple channels according to claim 5, wherein the calculation formula of the goodness-of-fit is specifically as follows:
R²_Adjusted = 1 - [SSE / (n - p - 1)] / [SST / (n - 1)]
wherein R²_Adjusted represents the corrected goodness of fit; n represents the number of samples; p represents the number of features; SSE denotes the error sum of squares; and SST denotes the total sum of squared deviations.
7. The method according to claim 1, wherein the target-domain data and the source-domain data are obtained by converting text features into numerical data through label encoding.
8. The method of claim 7, wherein the features include holiday, non-holiday, weekday, Saturday, Sunday, daylight saving time, and winter time.
9. The method for predicting the power load based on the combination of the transfer learning strategy and the multiple channels according to claim 1, wherein the specific process of the hierarchical transfer learning is as follows:
S1: sorting the source-domain data according to goodness of fit and dividing them into L groups {X1, X2, …, XL};
S2: X1 is input to the L1 layer of the model for training while the other layers are frozen; the trained weights of this layer are kept, and the L1 layer is then frozen;
S3: X2 is input to the L2 layer of the model for training, the weights of the previous layer are loaded, and whether the effect improves is judged;
S4: if the effect of the current layer improves, this layer is unfrozen together with the previous layer, then with the previous two layers, and so on, until all preceding layers are unfrozen; the prediction effects of the different numbers of unfrozen layers are compared, and the training weights giving the greatest improvement are kept;
S5: if the effect of the current layer does not improve, the weights of this layer are not kept; the layer is retrained with the previous layer's data, and if the effect still does not improve, the next layer is also trained with the previous layer's data without keeping the weights;
S6: and so on, until XL has been trained and the corresponding weights are kept.
10. A power load prediction system based on the combination of a transfer learning strategy and multiple channels, characterized by comprising:
a model layering module, used for constructing a multichannel CNN-BiLSTM model and layering it according to the model structure;
a data training module, used for inputting the target-domain data, after sliding-window processing, into the multichannel CNN-BiLSTM model for training to obtain an initial prediction model;
a data prediction module, used for inputting the source-domain data, after sliding-window processing, into the initial prediction model, where each sliding window yields a prediction result;
the evaluation analysis module is used for evaluating the prediction result by the goodness-of-fit to obtain a corresponding goodness-of-fit value;
the sequencing and dividing module is used for dividing the source domain data according to the fitting goodness value, grouping and sequencing according to the fitting goodness interval and removing the data with the fitting goodness lower than a threshold value;
a transfer learning module, used for sequentially inputting the divided source-domain data into different levels of the multichannel CNN-BiLSTM model for hierarchical transfer learning using an improved hierarchical transfer learning strategy, and adjusting the model according to the target-domain data to obtain a final prediction model;
and the load prediction module is used for predicting the power load by using the final prediction model to obtain a final prediction result.
CN202211433918.6A 2022-11-16 2022-11-16 Power load prediction method and system based on combination of transfer learning strategy and multiple channels Pending CN115730717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211433918.6A CN115730717A (en) 2022-11-16 2022-11-16 Power load prediction method and system based on combination of transfer learning strategy and multiple channels


Publications (1)

Publication Number Publication Date
CN115730717A true CN115730717A (en) 2023-03-03

Family

ID=85295995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211433918.6A Pending CN115730717A (en) 2022-11-16 2022-11-16 Power load prediction method and system based on combination of transfer learning strategy and multiple channels

Country Status (1)

Country Link
CN (1) CN115730717A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116128049A (en) * 2023-04-04 2023-05-16 厦门大学 XGBoost model-based migration condition selection method for water quality prediction model

Similar Documents

Publication Publication Date Title
CN111563706A (en) Multivariable logistics freight volume prediction method based on LSTM network
CN110222888A (en) Daily average power load prediction method based on BP neural network
CN109002904B (en) Hospital outpatient quantity prediction method based on Prophet-ARMA
CN107506868B (en) Method and device for predicting short-time power load
CN112712203B (en) Day-highest load prediction method and system for power distribution network
CN112258251B (en) Grey correlation-based integrated learning prediction method and system for electric vehicle battery replacement demand
CN106022549A (en) Short term load predication method based on neural network and thinking evolutionary search
CN114154716B (en) Enterprise energy consumption prediction method and device based on graph neural network
CN117595231B (en) Intelligent power grid distribution management system and method thereof
CN115169703A (en) Short-term power load prediction method based on long-term and short-term memory network combination
CN114862032B (en) XGBoost-LSTM-based power grid load prediction method and device
CN116205355B (en) Power load prediction method, device and storage medium
CN112330052A (en) Distribution transformer load prediction method
CN115456306A (en) Bus load prediction method, system, equipment and storage medium
CN118014137A (en) Method and system for predicting power conversion requirements of power conversion station based on data driving
CN115860797A (en) Electric quantity demand prediction method suitable for new electricity price reform situation
CN115358437A (en) Power supply load prediction method based on convolutional neural network
CN116703464A (en) Electric automobile charging demand modeling method and device, electronic equipment and storage medium
CN114091776A (en) K-means-based multi-branch AGCNN short-term power load prediction method
CN115730717A (en) Power load prediction method and system based on combination of transfer learning strategy and multiple channels
CN111709585A (en) Air conditioner load prediction method and device and storage medium
CN114912716A (en) Short-term power load prediction method and system based on bidirectional long-term and short-term memory network
CN109829115B (en) Search engine keyword optimization method
CN112598181A (en) Load prediction method, device, equipment and storage medium
CN117408833A (en) Short-term load prediction method based on load identification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhou Hangxia

Inventor after: OuYang Fulian

Inventor after: Wang Jun

Inventor after: Liu Qian

Inventor before: Zhou Hangxia

CB03 Change of inventor or designer information