CN115238592A - Multi-time-interval meteorological prediction distribution parallel trust strategy optimized power generation control method - Google Patents
- Publication number
- CN115238592A (application CN202210967237.1A)
- Authority
- CN
- China
- Prior art keywords
- state
- action
- strategy
- function
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/04—Power grid distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/02—Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a multi-time-interval meteorological prediction, distributed parallel trust policy optimization power generation control method, which combines multi-time-scale meteorological prediction, distributed parallelism, and trust policy optimization neural networks for power generation control of a novel power system. First, the multi-time-scale meteorological prediction in the method processes meteorological data at different time scales and forecasts future weather changes. Second, the distributed parallel trust policy optimization in the method coordinates the power plants within a region and enables fast response. The improved method addresses the fast and stable regulation of novel power systems at different time scales under continuously changing weather: it realizes generation control driven by meteorological prediction, improves regulation accuracy, and increases regulation speed.
Description
Technical Field
The invention belongs to the field of power generation control for novel power systems, relates to artificial intelligence, quantum techniques, and power generation control methods, and is applicable to power generation control of novel power systems and integrated energy systems.
Background
Existing automatic generation control for novel power systems does not fully account for environmental factors, so the system cannot accurately track the environment when adjusting its output.
In addition, conventional policy optimization networks require large amounts of training data; the high dimensionality of these data slows network training and easily causes the curse of dimensionality.
The multi-time-interval meteorological prediction, distributed parallel trust policy optimization power generation control method proposed here addresses the inability of the novel power system to track the environment accurately, accelerates its training, and eliminates the curse of dimensionality.
Disclosure of Invention
The invention provides a multi-time-interval meteorological prediction, distributed parallel trust policy optimization power generation control method, which combines multi-time-scale meteorological prediction, distributed parallelism, and trust policy optimization neural networks for power generation control of a novel power system; in use, the method comprises the following steps:
step (1): define each controlled power generation area as an agent and label the areas {Agent_1, Agent_2, …, Agent_i}, where i is the index of each power generation area; the areas do not interfere with one another yet remain interconnected, which gives the system high robustness;
step (2): initialize the parameters of the stacked self-coding neural network and the gated recurrent unit, collect a three-year data set of wind intensity and illumination intensity, extract a meteorological feature data set, and input it into the stacked self-coding neural network;
the stacked self-coding neural network is formed by stacking several self-coding (autoencoder) neural networks, where x is the input meteorological feature vector, x is an n-dimensional vector, and x ∈ R^n; the hidden layer h^(1) of self-coding network AE_1 is used as the input of self-coding network AE_2; after AE_2 is trained, its hidden layer h^(2) is used as the input of self-coding network AE_3, and so on; stacking layer by layer reduces the feature dimension of the meteorological data, accelerates training of the gated recurrent unit, and preserves the key information in the data; the hidden layers have different dimensions, so that:

h^(1) = f(W^(1) x + b^(1)),  h^(2) = f(W^(2) h^(1) + b^(2)),  …,  h^(p) = f(W^(p) h^(p-1) + b^(p)),  with the output vector given by softmax(h^(p))

where h^(1) is the hidden layer of self-coding neural network AE_1, h^(2) is the hidden layer of AE_2, h^(p-1) is the hidden layer of AE_(p-1), and h^(p) is the hidden layer of AE_p; W^(1), W^(2), and W^(p) are the parameter matrices of hidden layers h^(1), h^(2), and h^(p); b^(1), b^(2), and b^(p) are the biases of AE_1, AE_2, and AE_p; f() is the activation function; p is the number of stacked self-coding layers; softmax() is the normalized exponential function, used as a classifier;
in the stacked self-coding neural network, if h^(1) is m-dimensional and h^(2) is k-dimensional, stacking AE_1 onto AE_2 trains a network with the structure n → m → k; the network n → m → n is trained to obtain the transformation n → m, then the network m → k → m is trained to obtain the transformation m → k, and finally AE_1 and AE_2 are stacked to obtain the network n → m → k; by stacking AE_1 through AE_p layer by layer and finally applying the softmax function, the output vector is obtained; after the stacked self-coding neural network is trained, it yields initial network parameter values and the dimension-reduced meteorological features as its output;
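What follows is a minimal, hedged sketch of the greedy layer-wise training just described (train AE_1 on x, feed its hidden layer into AE_2, and so on). It is written in PyTorch for illustration only; the layer sizes, activation, optimizer, learning rate, and epoch count are assumptions, not values given in the patent.

```python
# Minimal sketch of greedy layer-wise training for a stacked autoencoder (n -> m -> k ...).
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

def train_autoencoder(data, hidden_dim, epochs=50, lr=1e-3):
    """Train one autoencoder (in_dim -> hidden_dim -> in_dim) and return its encoder."""
    in_dim = data.shape[1]
    encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
    decoder = nn.Linear(hidden_dim, in_dim)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        recon = decoder(encoder(data))
        loss = loss_fn(recon, data)          # reconstruction loss between x and its decoding
        loss.backward()
        opt.step()
    return encoder

def train_stacked_autoencoder(x, layer_dims):
    """Greedy layer-wise stacking: train AE_1 on x, AE_2 on h(1), and so on."""
    encoders, h = [], x
    for dim in layer_dims:                    # e.g. [m, k] realizes n -> m -> k
        enc = train_autoencoder(h, dim)
        encoders.append(enc)
        with torch.no_grad():
            h = enc(h)                        # hidden layer becomes the next AE's input
    return nn.Sequential(*encoders), h        # stacked encoder and reduced features

# usage sketch: rows are weather feature samples (wind / illumination features)
# x = torch.tensor(weather_features, dtype=torch.float32)
# stacked_encoder, reduced = train_stacked_autoencoder(x, layer_dims=[64, 16])
```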
step (3): let x_t denote the output vector of the self-coding neural network at time t; the stacked self-coding neural network is pre-trained to obtain the initial network parameter values and the meteorological features; the update gate and the reset gate of the gated recurrent unit take [h_(t-1), x_t] as their input, and their outputs are respectively:

z_t = σ(W_z · [h_(t-1), x_t])
r_t = σ(W_r · [h_(t-1), x_t])

where z_t is the output of the update gate at time t, r_t is the output of the reset gate at time t, h_(t-1) is the hidden state of the gated recurrent unit at time (t-1), x_t is the input at time t, [ , ] denotes concatenation of two vectors, W_z is the weight matrix of the update gate, W_r is the weight matrix of the reset gate, and σ() is the sigmoid function;
the gated recurrent unit discards and memorizes input information through these two gates; the candidate hidden state h̃_t at time t is:

h̃_t = tanh(W_h̃ · [r_t * h_(t-1), x_t])

where tanh() is the tanh activation function, W_h̃ is the weight matrix of the candidate hidden state, and * denotes the element-wise product;
after the update gate passes the updated state information through the tanh activation, a vector of candidate values is created from the input and the candidate hidden state h̃_t is computed; the network then computes the state h_t at time t:

h_t = (1 - z_t) * h_(t-1) + z_t * h̃_t

the reset gate determines how much of the past state is remembered: when r_t is 0, the state information h_(t-1) at time (t-1) is forgotten and the candidate hidden state is reset to the information input at time t; the update gate determines how much of the past state enters the new state: when z_t is 1, the candidate hidden state is carried into the state h_t at time t; by storing and filtering information through the update and reset gates, the gated recurrent unit retains the important features and learns the dependencies needed to obtain the optimal meteorological prediction;
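To make the gate computations above concrete, the following is a small sketch of a GRU cell implementing exactly the update-gate, reset-gate, candidate-state, and hidden-state equations just listed. The dimensions and the use of torch.nn.Linear for the weight matrices are illustrative assumptions; a production model would normally use torch.nn.GRU.

```python
# Hand-written GRU cell mirroring the equations above; for illustration only.
import torch

class GRUCellSketch(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.W_z = torch.nn.Linear(hidden_dim + input_dim, hidden_dim)  # update-gate weights
        self.W_r = torch.nn.Linear(hidden_dim + input_dim, hidden_dim)  # reset-gate weights
        self.W_h = torch.nn.Linear(hidden_dim + input_dim, hidden_dim)  # candidate-state weights

    def forward(self, x_t, h_prev):
        concat = torch.cat([h_prev, x_t], dim=-1)        # [h_{t-1}, x_t]
        z_t = torch.sigmoid(self.W_z(concat))            # update gate
        r_t = torch.sigmoid(self.W_r(concat))            # reset gate
        h_cand = torch.tanh(self.W_h(torch.cat([r_t * h_prev, x_t], dim=-1)))  # candidate state
        h_t = (1.0 - z_t) * h_prev + z_t * h_cand        # new hidden state
        return h_t

# usage sketch: step through a sequence of reduced weather features
# cell = GRUCellSketch(input_dim=16, hidden_dim=32)
# h = torch.zeros(1, 32)
# for x_t in reduced_feature_sequence:    # each x_t of shape (1, 16)
#     h = cell(x_t, h)
```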
step (4): after the stacked self-coding neural network and the gated recurrent unit are trained, the meteorological data to be predicted are passed through the stacked self-coding neural network into the gated recurrent unit, and the resulting meteorological prediction is fed into the novel power system; in the novel power system, each power generation area is given three parallel trust policy optimization networks with short, medium, and long horizons, where the short horizon is one day, the medium horizon is fifteen days, and the long horizon is three months;
step (5): initialize the parameters of the parallel trust policy optimization network in each area, set the network's policy, and initialize the parallel expectation value table in the network with all initial expectation values equal to 0;
step (6): set the number of iterations to X, set the initial number of searches to a positive integer V, and initialize the search count of every intrinsic action to V;
step (7): in the current state, the parallel trust policy optimization network in each agent selects an action according to its policy, obtains the reward corresponding to that action in the current environment, feeds the reward back into the parallel expectation value table, and increments the iteration counter by one; if the current iteration count equals X, the iterations are complete and the trained parallel trust policy optimization network is obtained;
step (8): carry out policy optimization and parallel value optimization in the parallel trust policy optimization network as follows:
the core of the parallel trust policy optimization network is the actor-critic method; in the policy optimization of the parallel trust policy optimization network, the Markov decision process is the tuple (S, A, P, r, ρ_0, γ), where S is the state space composed of wind power intensity, illumination intensity, frequency deviation Δf, area control error ACE, and the tie line power exchange assessment index CPS, and any state s ∈ S; A is the action space composed of power adjustments ΔP_Gi of different magnitudes, i = 1, 2, …, 2^j, where j is the number of qubits of the quantum superposition action |A⟩, and any action a ∈ A; P is the transition probability distribution matrix for moving from any state s to a state s' through any action a; r() is the reward function; ρ_0 is the probability distribution of the initial state s_0; γ is the discount factor; let π denote the stochastic policy π: S × A → [0,1]; the expected cumulative reward function η(π) under policy π is:

η(π) = E_{s_0, a_0, …}[ Σ_{t=0}^∞ γ^t r(s_t) ],  with s_0 ~ ρ_0 and a_t ~ π(·|s_t)

where s_0 is the initial state, a_0 is the action selected by the stochastic policy π in state s_0, γ^t is the discount factor at time t, s_t is the state at time t, a_t is the action at time t, r(s_t) is the reward in state s_t, and a_t ~ π(·|s_t) denotes sampling the action a_t from policy π in state s_t;
introduce the state-action value function Q^π(s_t, a_t), the state value function V^π(s_t), the advantage function A^π(s, a), and the probability distribution function ρ_π(s);
the state-action value function Q^π(s_t, a_t) gives the cumulative reward obtained after executing action a_t in state s_t under policy π:

Q^π(s_t, a_t) = E_{s_{t+1}, a_{t+1}, …}[ Σ_{l=0}^∞ γ^l r(s_{t+l}) ]

where s_{t+1} is the state at time (t+1); s_{t+l} is the state at time (t+l); a_{t+1} is the action at time (t+1); a_{t+l} is the action at time (t+l); a_{t+l} ~ π(·|s_{t+l}) denotes sampling the action a_{t+l} from policy π in state s_{t+l}; γ^l is the discount factor applied l steps ahead; l is a non-negative integer; r(s_{t+l}) is the reward in state s_{t+l};
the state value function V^π(s_t) gives the cumulative reward from state s_t under policy π; V^π(s_t) is the mean of Q^π(s_t, a_t) over the action a_t:

V^π(s_t) = E_{a_t ~ π}[ Q^π(s_t, a_t) ]
the advantage function A^π(s, a) gives the advantage of taking an arbitrary action a in an arbitrary state s compared with the average; the advantage function A^π(s, a) is:

A^π(s, a) = Q^π(s, a) - V^π(s)   (8)

where Q^π(s, a) is the state-action value function when the state s_t is an arbitrary state s and the action a_t is an arbitrary action a, and V^π(s) is the state value function when the state s_t is an arbitrary state s;
the probability distribution function ρ_π(s) gives the discounted visitation distribution of an arbitrary state s under policy π; the probability distribution function ρ_π(s) is:

ρ_π(s) = Σ_{t=0}^∞ γ^t P(s_t = s)   (9)

where P(s_t = s) is the probability that the state s_t at time t equals the arbitrary state s;
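As a reading aid for the value functions just defined, here is a minimal Monte Carlo sketch that estimates Q^π from discounted rollout returns, V^π as their average over sampled actions, and the advantage A^π = Q^π − V^π as in equation (8). The rollout format and hashable states are assumptions made only for illustration; the patent does not prescribe this estimator.

```python
# Monte Carlo estimates of Q, V, and A = Q - V; a toy sketch, not the patent's procedure.
import numpy as np

def discounted_return(rewards, gamma):
    """Sum_{l>=0} gamma^l * r_{t+l} for a reward sequence starting at time t."""
    g, ret = 1.0, 0.0
    for r in rewards:
        ret += g * r
        g *= gamma
    return ret

def estimate_advantage(rollouts, gamma=0.99):
    """rollouts: list of (state, action, rewards_from_here) tuples gathered under policy pi."""
    q_estimates = {}                        # (state, action) -> list of returns
    v_estimates = {}                        # state -> list of returns
    for state, action, rewards in rollouts:
        ret = discounted_return(rewards, gamma)
        q_estimates.setdefault((state, action), []).append(ret)
        v_estimates.setdefault(state, []).append(ret)
    advantage = {}
    for (state, action), rets in q_estimates.items():
        q = np.mean(rets)                   # estimate of Q^pi(s, a)
        v = np.mean(v_estimates[state])     # estimate of V^pi(s), averaged over sampled actions
        advantage[(state, action)] = q - v  # equation (8)
    return advantage
```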
the parallel value optimization of the parallel trust policy optimization network performs parallel optimization of the state-action value function Q^π(s_t, a_t); in the optimization of Q^π(s_t, a_t), qubits and the Grover search method are introduced to accelerate network training and eliminate the curse of dimensionality;
parallel value optimization quantizes the action a_t and, in state s_t, updates the number of searches V instead of updating the action probabilities in state s_t, as follows:
suppose the action space contains 2^j intrinsic actions; the superposition of the 2^j intrinsic actions |a_t⟩ represents the quantized action: a_t is quantized into the j-dimensional quantum superposition action |A⟩, each qubit of which is a superposition of the two states |0⟩ and |1⟩; the quantum superposition action is equivalent to |A⟩; the j-dimensional quantum superposition action |A⟩ is expressed as:

|A⟩ = Σ_a C_a |a⟩

where |a⟩ is a quantum action observed from the j-dimensional quantum superposition action |A⟩, C_a is the probability amplitude of quantum action |a⟩, and |C_a| is the modulus of that amplitude, satisfying Σ_a |C_a|² = 1;
when the quantum superposition action |A⟩ is observed, |A⟩ collapses to a quantum action |a⟩, each qubit of which is |0⟩ or |1⟩; each qubit of the quantum action |a⟩ carries an expected value; these expected values differ from state to state and are used to select the action in a given state; they are updated as follows:
if, under policy π in state s_t, the quantum superposition action |A⟩ collapses to a quantum action |a⟩ and the cumulative reward η increases, then for each qubit of action |a⟩ the expected value of |0⟩ decreases and the expected value of |1⟩ increases; the increments and decrements of the expected values are updated according to policy π; when an action is selected in the same state, every qubit whose expected value is positive is set to |1⟩ and the remaining qubits are set to |0⟩, yielding the quantum action; let the parameter vector of policy π be θ = (θ_0, θ_1, …, θ_u); the quantum action is normalized to a concrete action value and converted through the parameter vector θ into the power adjustment ΔP_G1, which Agent_1 outputs in this state; θ_0, θ_1, …, θ_u are the components of the parameter vector θ, and u is a positive integer;
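A toy sketch of the qubit expected-value bookkeeping and action decoding described above. The step size, the bit-string-to-index decoding, and the mapping to a power adjustment through the parameter vector θ are hypothetical choices made only to illustrate the idea; the patent does not specify these details.

```python
# Illustrative qubit expectation update and decoding; all numeric rules are assumptions.
import numpy as np

def update_qubit_expectations(expect, action_bits, reward_improved, step=0.1):
    """expect: per-state array of j expected values; action_bits: observed |a> as 0/1 bits."""
    direction = 1.0 if reward_improved else -1.0
    for q, bit in enumerate(action_bits):
        # when the cumulative reward rises and the qubit collapsed to |1>, its expectation rises
        expect[q] += direction * step * (1.0 if bit == 1 else -1.0)
    return expect

def decode_action(expect, theta):
    """Qubits with positive expectation become |1>, the rest |0>; map the bits to a delta_P value."""
    bits = (expect > 0).astype(int)
    index = int("".join(map(str, bits)), 2)        # intrinsic action index in [0, 2^j)
    normalized = index / (2 ** len(bits) - 1)      # normalize to [0, 1]
    return float(np.dot(theta[: len(bits)], bits)) * normalized  # assumed mapping to delta_P_G

# usage sketch (j = 4 qubits, hypothetical parameter vector theta):
# expect = np.zeros(4); theta = np.array([0.5, 0.25, 0.125, 0.0625])
# expect = update_qubit_expectations(expect, action_bits=[1, 0, 1, 1], reward_improved=True)
# delta_p = decode_action(expect, theta)
```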
quantum action selection uses the Grover search method to obtain, for the state s_t at time t, the quantum action |a⟩ produced when the quantum superposition action |A⟩ is observed and collapses; the number of searches V is obtained and updated as follows:
first, all intrinsic actions are superposed with equal weight by applying j Hadamard gates to j independent qubits initialized to |0⟩, giving the initialized quantum superposition action |A_0⟩:

|A_0⟩ = H^{⊗j} |0⟩^{⊗j} = (1/√(2^j)) Σ_a |a⟩

where H is the Hadamard gate, which converts the ground state |0⟩ into an equal-weight superposition; |A_0⟩ is the initialized quantum superposition action, a superposition of the 2^j intrinsic actions with equal probability amplitude; H^{⊗j} denotes applying j Hadamard gates in sequence to the 2^j initialized intrinsic actions; the two parts of the Grover iteration operator, U_{|a⟩} and U_{|A_0⟩}, are respectively:

U_{|a⟩} = I - 2|a⟩⟨a|,   U_{|A_0⟩} = I - 2|A_0⟩⟨A_0| = H^{⊗j}(I - 2|0⟩^{⊗j}⟨0|^{⊗j})H^{⊗j}

where I is the identity matrix of appropriate dimension; ⟨a| is the conjugate (bra) of quantum action |a⟩; ⟨0| is the conjugate of |0⟩; ⟨A_0| is the conjugate of |A_0⟩; |a⟩⟨a| is the outer product of |a⟩; |A_0⟩⟨A_0| is the outer product of the initialized quantum superposition action |A_0⟩; U_{|a⟩} and U_{|A_0⟩} are quantum oracles (black boxes); when U_{|a⟩} acts on an intrinsic action |a_t⟩, it shifts the phase of the component along |a⟩ by 180°; when U_{|A_0⟩} acts on an intrinsic action |a_t⟩, it shifts the phase of the component along |A_0⟩ by 180°;
the Grover iteration is written as U_Grov:

U_Grov = U_{|A_0⟩} U_{|a⟩}

each intrinsic action is iterated according to U_Grov; the smallest number of iterations that brings the superposition closest to the target action is recorded and updated as the search count V corresponding to each intrinsic action, and the policy π is updated at the same time;
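The following is a small classical state-vector simulation of the Grover iteration U_Grov used above: amplitudes start in the equal-weight superposition |A_0⟩ and the component of a chosen action |a⟩ is amplified at each iteration. This is a NumPy illustration of standard Grover amplification, not the patent's search-count update rule.

```python
# State-vector simulation of Grover amplitude amplification over 2^j intrinsic actions.
import numpy as np

def grover_amplify(num_qubits, target_action, iterations):
    n = 2 ** num_qubits
    psi = np.full(n, 1.0 / np.sqrt(n))          # equal-weight superposition |A_0>
    a = np.zeros(n); a[target_action] = 1.0     # marked action |a>
    for _ in range(iterations):
        psi = psi - 2.0 * a * (a @ psi)         # U_{|a>} = I - 2|a><a| (flip target phase)
        mean = psi.mean()
        psi = 2.0 * mean - psi                  # inversion about the mean, i.e. -(I - 2|A_0><A_0|)
    return np.abs(psi) ** 2                     # observation probabilities of each action

# usage sketch: with j = 4 qubits, about floor(pi/4 * sqrt(16)) = 3 iterations
# concentrate nearly all probability on the marked action
# probs = grover_amplify(num_qubits=4, target_action=5, iterations=3)
# print(probs[5])   # close to 1
```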
according to equation (9), when the state s_t at time t is an arbitrary state s and the policy is updated from π to π̃, the expected cumulative reward function becomes:

η(π̃) = η(π) + Σ_s ρ_π̃(s) Σ_a π̃(a|s) A^π(s, a)   (14)

where π̃ is the updated policy; P(s_t = s) is the probability, under policy π̃, that the state s_t at time t equals the arbitrary state s; A^π(s, a) is the advantage function of policy π when the state s_t at time t is an arbitrary state s; ρ_π̃(s) is the discounted visitation distribution of state s under policy π̃; a ~ π̃(·|s) denotes sampling the action a from policy π̃ in state s; A^π(s_t, a_t) is the advantage of taking action a_t in state s_t over the average; and η(π) is the expected cumulative reward function under policy π;
provided that in every state s the expected advantage Σ_a π̃(a|s) A^π(s, a) ≥ 0, the action values selected by the updated policy π̃ in state s_t either increase the cumulative reward η or, when the expected advantage is zero, leave it unchanged; the policy is therefore updated continually to optimize the cumulative reward η;
because the updated policy π̃ appears in equation (14) through its probability distribution ρ_π̃, the objective is expensive to evaluate and difficult to optimize directly; a surrogate function L_π(π̃) is therefore introduced to reduce the computational complexity:

L_π(π̃) = η(π) + Σ_s ρ_π(s) Σ_a π̃(a|s) A^π(s, a)

the policy update then seeks π̃ = argmax L_π(π̃), where argmax() is the function that returns the argument maximizing a function, and ρ_π(s) is the probability distribution function of the arbitrary state s under policy π;
the difference between the surrogate function L_π(π̃) and η(π̃) is that L_π(π̃) ignores the change in state visitation density caused by the policy change: it uses ρ_π as the visitation frequency instead of ρ_π̃, which is obtained by approximating π̃ with π; when π and π̃ satisfy certain constraints, the surrogate function L_π(π̃) can replace the original expected cumulative reward function η(π̃);
in the update of the parameter vector θ, the policy π is parameterized by the parameter vector θ in the form of the arbitrary parameter θ, written π_θ; π_θ(a|s) is an arbitrary action a in an arbitrary state s under the parameterized policy π_θ; for any parameter θ, when the policy has not yet been updated, the surrogate function and the original cumulative reward function are exactly equal, i.e.:

L_{π_θ}(π_θ) = η(π_θ)   (16)

and the derivatives of the surrogate function and of the original cumulative reward function with respect to the arbitrary parameter coincide at the policy π_θ, i.e.:

∇_θ̃ L_{π_θ}(π_θ̃) |_{θ̃=θ} = ∇_θ̃ η(π_θ̃) |_{θ̃=θ}   (17)

so if the policy changes from π_θ to π̃ by a sufficiently small step and the surrogate function value L increases, the cumulative reward η also increases; the policy can therefore be improved by taking the surrogate function as the optimization objective;
equations (16) and (17) show that updating the policy from π_θ to π̃ by a sufficiently small step increases the cumulative reward η; define π' as the policy with the largest cumulative reward value among the old policies, and define the intermediate divergence variable α; to increase the lower bound of the cumulative reward η, a conservative iteration policy π_new(a|s) is set:

π_new(a|s) = (1 - α) π_old(a|s) + α π'(a|s)   (18)

where π_new is the new policy; π_old is the current policy; α = D_TV^max(π_old, π_new) is the maximum total variation divergence between π_new and π_old; π_old(·|s) is the action distribution selected by policy π_old in an arbitrary state s; π_new(·|s) is the action distribution selected by policy π_new in an arbitrary state s; D_TV(π_old(·|s) || π_new(·|s)) is the total variation divergence between π_old(·|s) and π_new(·|s); π'(a|s) is an arbitrary action a selected in an arbitrary state s under policy π'; and π_old(a|s) is an arbitrary action a selected in an arbitrary state s under policy π_old;
for any stochastic policy, let the intermediate entropy variable ε = max_{s,a} |A^π(s, a)|, where max_{s,a}|·| takes the maximum absolute value over arbitrary states s and actions a; identifying π̃ with π_new and π with π_old, the surrogate function value L_π and the cumulative reward η satisfy:

η(π_new) ≥ L_{π_old}(π_new) - (4εγ / (1 - γ)²) α²

where γ is the discount factor;
the maximum relative entropy is D_KL^max(π, π̃) = max_s D_KL(π(·|s) || π̃(·|s)), where π(·|s) is the action distribution selected by policy π in an arbitrary state s, π̃(·|s) is the action distribution selected by policy π̃ in an arbitrary state s, and D_KL(π(·|s) || π̃(·|s)) is the relative entropy between π(·|s) and π̃(·|s);
the total variation divergence and the relative entropy satisfy D_TV(π(·|s) || π̃(·|s))² ≤ D_KL(π(·|s) || π̃(·|s)), where D_TV(π(·|s) || π̃(·|s)) is the total variation divergence between π(·|s) and π̃(·|s); substituting this relation into the bound above gives:

η(π̃) ≥ L_π(π̃) - C · D_KL^max(π, π̃),   C = 4εγ / (1 - γ)²

where C is the penalty coefficient;
under this constraint, for the continually updated policy sequence π_0 → π_1 → … → π_X there holds η(π_0) ≤ η(π_1) ≤ … ≤ η(π_X), where → denotes a policy update; π_0, π_1, …, π_X is the policy sequence of the parallel trust policy optimization network, and η(π_0), η(π_1), …, η(π_X) are the cumulative rewards of the policies in that sequence;
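The monotonicity of this policy sequence follows from the standard minorization argument used in trust-region policy optimization; the short derivation below is included as a reading aid under that assumption and is not quoted from the patent.

```latex
% Define the surrogate lower bound
%   M_i(\pi) = L_{\pi_i}(\pi) - C\, D_{\mathrm{KL}}^{\max}(\pi_i, \pi).
% By the penalized bound above, \eta(\pi_{i+1}) \ge M_i(\pi_{i+1}),
% and since D_{\mathrm{KL}}^{\max}(\pi_i, \pi_i) = 0, \eta(\pi_i) = M_i(\pi_i). Hence
\begin{align*}
  \eta(\pi_{i+1}) - \eta(\pi_i) \;\ge\; M_i(\pi_{i+1}) - M_i(\pi_i),
\end{align*}
% so any \pi_{i+1} chosen to increase M_i cannot decrease \eta, giving
% \eta(\pi_0) \le \eta(\pi_1) \le \dots \le \eta(\pi_X).
```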
considering the parameterized policy π_θ̃ and the parameter vector θ̃, the terms unrelated to the parameter vector θ̃ are pruned;
after conversion to the parameter variables, the expected cumulative reward function satisfies:

η(θ̃) ≥ L_θ(θ̃) - C · D_KL^max(θ, θ̃)

and the constrained (penalized) optimization after the parameter conversion is:

maximize over θ̃:  L_θ(θ̃) - C · D_KL^max(θ, θ̃)

where η(θ̃) ≐ η(π_θ̃) and L_θ(θ̃) ≐ L_{π_θ}(π_θ̃), with ≐ denoting equality after the variable conversion; θ is the parameter vector to be updated; θ̃ is the parameter vector obtained after updating θ; π_θ is the policy π parameterized by the parameter vector θ; π_θ̃ is the policy π parameterized by the parameter vector θ̃; η(θ̃) is the expected cumulative reward function of policy π_θ̃; L_θ(θ̃) is the surrogate function of policy π_θ̃; D_KL(θ, θ̃) is the relative entropy between π_θ and π_θ̃; and D_KL^max(θ, θ̃) is the maximum relative entropy after the parameter conversion;
equations (21) to (24) give the updating procedure of the parallel policy optimization network parameter vector θ̃; updating the parameter vector optimizes the selection weights of the actions, thereby optimizing the parallel control;
to guarantee that the cumulative reward η increases, L_θ(θ̃) - C · D_KL^max(θ, θ̃) is maximized; however, with C as the penalty coefficient, D_KL^max(θ, θ̃) becomes very small at every update, so each update takes only a short step and the update speed drops; the penalty term is therefore turned into a constraint term:

maximize over θ̃:  L_θ(θ̃)   subject to   D_KL^max(θ, θ̃) ≤ δ

where δ is a constant;
equation (14) samples according to the policy π̃; because the updated policy π̃ is not yet known before the update, it cannot be sampled, so importance sampling is used to rewrite the parameterized cumulative reward function; terms unrelated to the arbitrary parameter θ are ignored and ρ_θ is used in place of ρ_θ̃; the update of the parallel trust policy optimization network finally becomes:

maximize over θ̃:  E_{s ~ ρ_θ, a ~ π_θ} [ (π_θ̃(a|s) / π_θ(a|s)) Q_θ(s, a) ]   subject to   D_KL^max(θ, θ̃) ≤ δ

where E_{s ~ ρ_θ, a ~ π_θ}[·] denotes sampling with respect to the probability distribution ρ_θ and the state-action value Q_θ of the parameterized policy π_θ, and Q_θ(s, a) is the state-action value function of policy π_θ when the state s_t is an arbitrary state s and the action a_t is an arbitrary action a;
the parameter vector θ̃ is updated according to the set constraint, and the policy π is updated with the updated parameter vector, completing the policy update in the parallel policy optimization network; the new policy then selects actions in the current state, and the iteration proceeds step by step;
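To ground the final constrained update, here is a compact sketch of an importance-sampled surrogate objective with a crude KL trust-region check for a small discrete-action policy. The network shape, the five state features, the eight discrete ΔP levels, the reject-the-step rule when the KL exceeds δ, and the direct use of Q-values are illustrative assumptions; TRPO-style implementations usually solve this step with a conjugate-gradient and line-search procedure instead.

```python
# Importance-sampled surrogate with a KL check; a sketch, not the patent's exact update.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(5, 64), nn.Tanh(), nn.Linear(64, 8))  # 5 state features -> 8 assumed delta_P levels
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
delta = 0.01  # KL trust-region radius (assumed)

def surrogate_and_kl(states, actions, q_values, old_log_probs, old_probs):
    logits = policy(states)
    log_probs = torch.log_softmax(logits, dim=-1)
    new_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)   # actions: 1-D LongTensor
    ratio = torch.exp(new_log_probs - old_log_probs)                       # pi_new(a|s) / pi_old(a|s)
    surrogate = (ratio * q_values).mean()                                  # importance-sampled objective
    kl = (old_probs * (torch.log(old_probs) - log_probs)).sum(dim=-1).mean()  # KL(pi_old || pi_new)
    return surrogate, kl

def update(states, actions, q_values):
    with torch.no_grad():
        old_log_all = torch.log_softmax(policy(states), dim=-1)
        old_probs = old_log_all.exp()
        old_log_probs = old_log_all.gather(1, actions.unsqueeze(1)).squeeze(1)
        old_params = [p.clone() for p in policy.parameters()]
    surrogate, _ = surrogate_and_kl(states, actions, q_values, old_log_probs, old_probs)
    optimizer.zero_grad()
    (-surrogate).backward()                     # ascend the surrogate objective
    optimizer.step()
    with torch.no_grad():
        _, kl_after = surrogate_and_kl(states, actions, q_values, old_log_probs, old_probs)
        if kl_after > delta:                    # crude trust-region enforcement
            for p, p_old in zip(policy.parameters(), old_params):
                p.copy_(p_old)                  # reject the step
```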
step (9): after the iterations are complete, the trained parallel trust policy optimization network regulates the power adjustment ΔP_Gi of each power generation area of the novel power system so that every area reaches the optimal tie line power exchange assessment index CPS; each power generation area can reach the optimal tie line power exchange assessment index CPS by the method of steps (1) to (8); through the training of the networks in each power generation area, the areas cooperate to reach dynamic balance; finally the frequency deviation Δf between the power generation areas approaches 0, the power exchange assessment index CPS approaches 100%, and the whole novel power system gradually reaches the global optimum.
Compared with the prior art, the invention has the following advantages and effects:
(1) In the distributed system, the modules are independent of one another and the whole system is a multi-line parallel framework, so a fault in one module does not affect normal operation of the whole, giving high robustness; adding a real-time meteorological prediction network to the novel power system lets the system interact fully with its environment, track the environment accurately, and perform intelligent power generation control and regulation.
(2) Adding the stacked self-coding neural network to the meteorological prediction neural network reduces the feature dimension of the weather data and accelerates network training.
(3) Compared with existing policy optimization networks, the parallel trust policy optimization network reduces the dimension of the expectation value table and eliminates the curse of dimensionality.
Drawings
FIG. 1 is a block diagram of the meteorological prediction distribution parallel trust policy optimization of the method of the present invention.
FIG. 2 is a control flow diagram of the multi-time-interval meteorological prediction distribution parallel trust strategy optimized power generation of the method of the present invention.
Fig. 3 is a block diagram of a stacked self-coding network of the method of the present invention.
FIG. 4 is a block diagram of a gated loop unit of the method of the present invention.
Detailed Description
The invention provides a multi-time-interval meteorological prediction distribution parallel trust strategy optimized power generation control method, which is explained in detail by combining the accompanying drawings as follows:
FIG. 1 is a framework diagram of parallel trust strategy optimization of meteorological prediction distribution in the method of the present invention.
First, each controlled power generation area is defined as a controlled agent, and the areas are labeled {Agent_1, Agent_2, …, Agent_i};
Secondly, initializing parameters of a stack self-coding neural network and a gate control cycle unit in a meteorological prediction neural network, inputting meteorological data of the previous year, extracting meteorological features, and inputting the meteorological features into the stack self-coding neural network and the gate control cycle unit respectively;
then, training the stack self-coding neural network to obtain a parameter initial value for training the gating circulation unit, predicting future meteorological data by using the trained gating circulation unit, and finishing the training if the prediction effect reaches the standard;
then, inputting meteorological data to be predicted into a gating circulation unit through a stack self-coding neural network to obtain a prediction result, and inputting the prediction result into a novel power system;
Then, three parallel trust policy optimization networks with short, medium, and long horizons are set up in each power generation area, where the short horizon is one day, the medium horizon is fifteen days, and the long horizon is three months;
then, initializing system parallel trust strategy optimization network parameters, setting parallel trust strategy optimization network strategies, initializing parallel expectation value tables in the parallel trust strategy optimization network, setting the initial expectation value as 0, setting the search times as V, and setting the iteration times as X;
then, pre-training the parallel trust strategy optimization network, and inputting the initial values of the pre-trained network parameters into the parallel trust strategy optimization network;
Then, in the current state, the parallel trust policy optimization network in each agent selects an action according to its policy, obtains the reward corresponding to that action in the current environment, feeds the reward back to the parallel expectation value table, increments the iteration count by one, and checks whether the current iteration count equals X; if not, the parallel expectation value table is updated, the temporal-difference error in the experience pool is updated, and the optimization policy is updated; if the iteration count equals X, training of the parallel trust policy optimization network is complete;
and finally, controlling the novel power system according to the trained parallel trust strategy optimization network, and regulating and controlling the power output of each agent to ensure that the novel power system reaches the optimal tie line power exchange assessment index CPS.
FIG. 2 is a control flow diagram of the multi-time-interval meteorological prediction distribution parallel trust strategy optimized power generation of the method of the present invention.
First, the meteorological prediction neural network part runs: the gated recurrent unit is trained with meteorological data from the previous period, and the trained unit then predicts future weather from the current period's meteorological data;
Then, taking Agent_1 of the novel power system as an example, Agent_1 contains three parallel trust policy optimization networks with short, medium, and long horizons; Agent_1 receives the meteorological data output by the prediction neural network and, using Agent_1's frequency deviation Δf_1, area control error ACE_1, and tie line power exchange assessment index CPS, selects actions according to the policy in each parallel trust policy optimization network;
Finally, the parallel expectation value table is updated, the temporal-difference error in the experience pool is updated, the optimization policy is updated, and this cycle yields Agent_1's optimal tie line power exchange assessment index CPS;
Besides Agent_1, every other power generation area obtains its own optimal tie line power exchange assessment index CPS by the same method.
Fig. 3 is a block diagram of a stacked self-coding network of the method of the present invention.
First, the model parameters are initialized and the current meteorological data set is input into the self-coding neural network; an initial self-coding neural network is built to compress the meteorological data from the original n dimensions to m dimensions;
Then, the output y of the self-coding neural network is discarded, the hidden layer h is taken as the new input, a new self-encoder is trained, and the encoders are stacked layer by layer, reducing the feature dimension of the data while retaining the key information;
then, comparing the trained data with the actual data, calculating a loss function, and updating system parameters;
and finally, inputting the trained initial parameter values into a self-coding neural network.
FIG. 4 is a block diagram of a gated loop unit of the method of the present invention.
Firstly, initializing model parameters, and inputting a current meteorological data set into a gating cycle unit;
Then, the reset gate captures the short-term dependencies in the time series and the update gate captures the long-term dependencies, and the network parameters are updated accordingly;
and finally, using the trained network for predicting the meteorological data at the current stage.
Claims (1)
1. A multi-time-interval meteorological prediction, distributed parallel trust policy optimization power generation control method, characterized in that the method combines multi-time-scale meteorological prediction, distributed parallelism, and trust policy optimization neural networks for power generation control of a novel power system; in use, the method comprises the following steps:
step (1): define each controlled power generation area as an agent and label the areas {Agent_1, Agent_2, …, Agent_i},
where i is the index of each power generation area; the areas do not interfere with one another yet remain interconnected, which gives the system high robustness;
step (2): initialize the parameters of the stacked self-coding neural network and the gated recurrent unit, collect a three-year data set of wind intensity and illumination intensity, extract a meteorological feature data set, and input it into the stacked self-coding neural network;
the stacked self-coding neural network is formed by stacking several self-coding (autoencoder) neural networks, where x is the input meteorological feature vector, x is an n-dimensional vector, and x ∈ R^n; the hidden layer h^(1) of self-coding network AE_1 is used as the input of self-coding network AE_2; after AE_2 is trained, its hidden layer h^(2) is used as the input of self-coding network AE_3, and so on; stacking layer by layer reduces the feature dimension of the meteorological data, accelerates training of the gated recurrent unit, and preserves the key information in the data; the hidden layers have different dimensions, so that:

h^(1) = f(W^(1) x + b^(1)),  h^(2) = f(W^(2) h^(1) + b^(2)),  …,  h^(p) = f(W^(p) h^(p-1) + b^(p)),  with the output vector given by softmax(h^(p))

where h^(1) is the hidden layer of self-coding neural network AE_1, h^(2) is the hidden layer of AE_2, h^(p-1) is the hidden layer of AE_(p-1), and h^(p) is the hidden layer of AE_p; W^(1), W^(2), and W^(p) are the parameter matrices of hidden layers h^(1), h^(2), and h^(p); b^(1), b^(2), and b^(p) are the biases of AE_1, AE_2, and AE_p; f() is the activation function; p is the number of stacked self-coding layers; softmax() is the normalized exponential function, used as a classifier;
in the stacked self-coding neural network, if h^(1) is m-dimensional and h^(2) is k-dimensional, stacking AE_1 onto AE_2 trains a network with the structure n → m → k; the network n → m → n is trained to obtain the transformation n → m, then the network m → k → m is trained to obtain the transformation m → k, and finally AE_1 and AE_2 are stacked to obtain the network n → m → k; by stacking AE_1 through AE_p layer by layer and finally applying the softmax function, the output vector is obtained; after the stacked self-coding neural network is trained, it yields initial network parameter values and the dimension-reduced meteorological features as its output;
step (3): let x_t denote the output vector of the self-coding neural network at time t; the stacked self-coding neural network is pre-trained to obtain the initial network parameter values and the meteorological features; the update gate and the reset gate of the gated recurrent unit take [h_(t-1), x_t] as their input, and their outputs are respectively:

z_t = σ(W_z · [h_(t-1), x_t])
r_t = σ(W_r · [h_(t-1), x_t])

where z_t is the output of the update gate at time t, r_t is the output of the reset gate at time t, h_(t-1) is the hidden state of the gated recurrent unit at time (t-1), x_t is the input at time t, [ , ] denotes concatenation of two vectors, W_z is the weight matrix of the update gate, W_r is the weight matrix of the reset gate, and σ() is the sigmoid function;
the gated recurrent unit discards and memorizes input information through these two gates; the candidate hidden state h̃_t at time t is:

h̃_t = tanh(W_h̃ · [r_t * h_(t-1), x_t])

where tanh() is the tanh activation function, W_h̃ is the weight matrix of the candidate hidden state, and * denotes the element-wise product;
after the update gate passes the updated state information through the tanh activation, a vector of candidate values is created from the input and the candidate hidden state h̃_t is computed; the network then computes the state h_t at time t:

h_t = (1 - z_t) * h_(t-1) + z_t * h̃_t

the reset gate determines how much of the past state is remembered: when r_t is 0, the state information h_(t-1) at time (t-1) is forgotten and the candidate hidden state is reset to the information input at time t; the update gate determines how much of the past state enters the new state: when z_t is 1, the candidate hidden state is carried into the state h_t at time t; by storing and filtering information through the update and reset gates, the gated recurrent unit retains the important features and learns the dependencies needed to obtain the optimal meteorological prediction;
step (4): after the stacked self-coding neural network and the gated recurrent unit are trained, the meteorological data to be predicted are passed through the stacked self-coding neural network into the gated recurrent unit, and the resulting meteorological prediction is fed into the novel power system; in the novel power system, each power generation area is given three parallel trust policy optimization networks with short, medium, and long horizons, where the short horizon is one day, the medium horizon is fifteen days, and the long horizon is three months;
step (5): initialize the parameters of the parallel trust policy optimization network in each area, set the network's policy, and initialize the parallel expectation value table in the network with all initial expectation values equal to 0;
step (6): set the number of iterations to X, set the initial number of searches to a positive integer V, and initialize the search count of every intrinsic action to V;
step (7): in the current state, the parallel trust policy optimization network in each agent selects an action according to its policy, obtains the reward corresponding to that action in the current environment, feeds the reward back into the parallel expectation value table, and increments the iteration counter by one; if the current iteration count equals X, the iterations are complete and the trained parallel trust policy optimization network is obtained;
step (8): carry out policy optimization and parallel value optimization in the parallel trust policy optimization network as follows:
the core of the parallel trust policy optimization network is the actor-critic method; in the policy optimization of the parallel trust policy optimization network, the Markov decision process is the tuple (S, A, P, r, ρ_0, γ), where S is the state space composed of wind power intensity, illumination intensity, frequency deviation Δf, area control error ACE, and the tie line power exchange assessment index CPS, and any state s ∈ S; A is the action space composed of power adjustments ΔP_Gi of different magnitudes, i = 1, 2, …, 2^j, where j is the number of qubits of the quantum superposition action |A⟩, and any action a ∈ A; P is the transition probability distribution matrix for moving from any state s to a state s' through any action a; r() is the reward function; ρ_0 is the probability distribution of the initial state s_0; γ is the discount factor; let π denote the stochastic policy π: S × A → [0,1]; the expected cumulative reward function η(π) under policy π is:

η(π) = E_{s_0, a_0, …}[ Σ_{t=0}^∞ γ^t r(s_t) ],  with s_0 ~ ρ_0 and a_t ~ π(·|s_t)

where s_0 is the initial state, a_0 is the action selected by the stochastic policy π in state s_0, γ^t is the discount factor at time t, s_t is the state at time t, a_t is the action at time t, r(s_t) is the reward in state s_t, and a_t ~ π(·|s_t) denotes sampling the action a_t from policy π in state s_t;
introduce the state-action value function Q^π(s_t, a_t), the state value function V^π(s_t), the advantage function A^π(s, a), and the probability distribution function ρ_π(s);
the state-action value function Q^π(s_t, a_t) gives the cumulative reward obtained after executing action a_t in state s_t under policy π:

Q^π(s_t, a_t) = E_{s_{t+1}, a_{t+1}, …}[ Σ_{l=0}^∞ γ^l r(s_{t+l}) ]

where s_{t+1} is the state at time (t+1); s_{t+l} is the state at time (t+l); a_{t+1} is the action at time (t+1); a_{t+l} is the action at time (t+l); a_{t+l} ~ π(·|s_{t+l}) denotes sampling the action a_{t+l} from policy π in state s_{t+l}; γ^l is the discount factor applied l steps ahead; l is a non-negative integer; r(s_{t+l}) is the reward in state s_{t+l};
the state value function V^π(s_t) gives the cumulative reward from state s_t under policy π; V^π(s_t) is the mean of Q^π(s_t, a_t) over the action a_t:

V^π(s_t) = E_{a_t ~ π}[ Q^π(s_t, a_t) ]

the advantage function A^π(s, a) gives the advantage of taking an arbitrary action a in an arbitrary state s compared with the average; the advantage function A^π(s, a) is:

A^π(s, a) = Q^π(s, a) - V^π(s)   (8)

where Q^π(s, a) is the state-action value function when the state s_t is an arbitrary state s and the action a_t is an arbitrary action a, and V^π(s) is the state value function when the state s_t is an arbitrary state s;
the probability distribution function ρ_π(s) gives the discounted visitation distribution of an arbitrary state s under policy π; the probability distribution function ρ_π(s) is:

ρ_π(s) = Σ_{t=0}^∞ γ^t P(s_t = s)   (9)

where P(s_t = s) is the probability that the state s_t at time t equals the arbitrary state s;
the parallel value optimization of the parallel trust policy optimization network performs parallel optimization of the state-action value function Q^π(s_t, a_t); in the optimization of Q^π(s_t, a_t), qubits and the Grover search method are introduced to accelerate network training and eliminate the curse of dimensionality;
parallel value optimization quantizes the action a_t and, in state s_t, updates the number of searches V instead of updating the action probabilities in state s_t, as follows:
suppose the action space contains 2^j intrinsic actions; the superposition of the 2^j intrinsic actions |a_t⟩ represents the quantized action: a_t is quantized into the j-dimensional quantum superposition action |A⟩, each qubit of which is a superposition of the two states |0⟩ and |1⟩; the quantum superposition action is equivalent to |A⟩; the j-dimensional quantum superposition action |A⟩ is expressed as:

|A⟩ = Σ_a C_a |a⟩

where |a⟩ is a quantum action observed from the j-dimensional quantum superposition action |A⟩, C_a is the probability amplitude of quantum action |a⟩, and |C_a| is the modulus of that amplitude, satisfying Σ_a |C_a|² = 1;
when the quantum superposition action |A⟩ is observed, |A⟩ collapses to a quantum action |a⟩, each qubit of which is |0⟩ or |1⟩; each qubit of the quantum action |a⟩ carries an expected value; these expected values differ from state to state and are used to select the action in a given state; they are updated as follows:
if, under policy π in state s_t, the quantum superposition action |A⟩ collapses to a quantum action |a⟩ and the cumulative reward η increases, then for each qubit of action |a⟩ the expected value of |0⟩ decreases and the expected value of |1⟩ increases; the increments and decrements of the expected values are updated according to policy π; when an action is selected in the same state, every qubit whose expected value is positive is set to |1⟩ and the remaining qubits are set to |0⟩, yielding the quantum action; let the parameter vector of policy π be θ = (θ_0, θ_1, …, θ_u); the quantum action is normalized to a concrete action value and converted through the parameter vector θ into the power adjustment ΔP_G1, which Agent_1 outputs in this state; θ_0, θ_1, …, θ_u are the components of the parameter vector θ, and u is a positive integer;
quantum action selection uses the Grover search method to obtain, for the state s_t at time t, the quantum action |a⟩ produced when the quantum superposition action |A⟩ is observed and collapses; the number of searches V is obtained and updated as follows:
first, all intrinsic actions are superposed with equal weight by applying j Hadamard gates to j independent qubits initialized to |0⟩, giving the initialized quantum superposition action |A_0⟩:

|A_0⟩ = H^{⊗j} |0⟩^{⊗j} = (1/√(2^j)) Σ_a |a⟩

where H is the Hadamard gate, which converts the ground state |0⟩ into an equal-weight superposition; |A_0⟩ is the initialized quantum superposition action, a superposition of the 2^j intrinsic actions with equal probability amplitude; H^{⊗j} denotes applying j Hadamard gates in sequence to the 2^j initialized intrinsic actions; the two parts of the Grover iteration operator, U_{|a⟩} and U_{|A_0⟩}, are respectively:

U_{|a⟩} = I - 2|a⟩⟨a|,   U_{|A_0⟩} = I - 2|A_0⟩⟨A_0| = H^{⊗j}(I - 2|0⟩^{⊗j}⟨0|^{⊗j})H^{⊗j}

where I is the identity matrix of appropriate dimension; ⟨a| is the conjugate (bra) of quantum action |a⟩; ⟨0| is the conjugate of |0⟩; ⟨A_0| is the conjugate of |A_0⟩; |a⟩⟨a| is the outer product of |a⟩; |A_0⟩⟨A_0| is the outer product of the initialized quantum superposition action |A_0⟩; U_{|a⟩} and U_{|A_0⟩} are quantum oracles (black boxes); when U_{|a⟩} acts on an intrinsic action |a_t⟩, it shifts the phase of the component along |a⟩ by 180°; when U_{|A_0⟩} acts on an intrinsic action |a_t⟩, it shifts the phase of the component along |A_0⟩ by 180°;
the Grover iteration is written as U_Grov:

U_Grov = U_{|A_0⟩} U_{|a⟩}

each intrinsic action is iterated according to U_Grov; the smallest number of iterations that brings the superposition closest to the target action is recorded and updated as the search count V corresponding to each intrinsic action, and the policy π is updated at the same time;
according to equation (9), when the state s_t at time t is an arbitrary state s and the policy is updated from π to π̃, the expected cumulative reward function becomes:

η(π̃) = η(π) + Σ_s ρ_π̃(s) Σ_a π̃(a|s) A^π(s, a)   (14)

where π̃ is the updated policy; P(s_t = s) is the probability, under policy π̃, that the state s_t at time t equals the arbitrary state s; A^π(s, a) is the advantage function of policy π when the state s_t at time t is an arbitrary state s; ρ_π̃(s) is the discounted visitation distribution of state s under policy π̃; a ~ π̃(·|s) denotes sampling the action a from policy π̃ in state s; A^π(s_t, a_t) is the advantage of taking action a_t in state s_t over the average; and η(π) is the expected cumulative reward function under policy π;
provided that in every state s the expected advantage Σ_a π̃(a|s) A^π(s, a) ≥ 0, the action values selected by the updated policy π̃ in state s_t either increase the cumulative reward η or, when the expected advantage is zero, leave it unchanged; the policy is therefore updated continually to optimize the cumulative reward η;
because the updated policy π̃ appears in equation (14) through its probability distribution ρ_π̃, the objective is expensive to evaluate and difficult to optimize directly; a surrogate function L_π(π̃) is therefore introduced to reduce the computational complexity:

L_π(π̃) = η(π) + Σ_s ρ_π(s) Σ_a π̃(a|s) A^π(s, a)

the policy update then seeks π̃ = argmax L_π(π̃), where argmax() is the function that returns the argument maximizing a function, and ρ_π(s) is the probability distribution function of the arbitrary state s under policy π;
the difference between the surrogate function L_π(π̃) and η(π̃) is that L_π(π̃) ignores the change in state visitation density caused by the policy change: it uses ρ_π as the visitation frequency instead of ρ_π̃, which is obtained by approximating π̃ with π; when π and π̃ satisfy certain constraints, the surrogate function L_π(π̃) can replace the original expected cumulative reward function η(π̃);
in the update of the parameter vector θ, the policy π is parameterized by the parameter vector θ in the form of the arbitrary parameter θ, written π_θ; π_θ(a|s) is an arbitrary action a in an arbitrary state s under the parameterized policy π_θ; for any parameter θ, when the policy has not yet been updated, the surrogate function and the original cumulative reward function are exactly equal, i.e.:

L_{π_θ}(π_θ) = η(π_θ)   (16)

and the derivatives of the surrogate function and of the original cumulative reward function with respect to the arbitrary parameter coincide at the policy π_θ, i.e.:

∇_θ̃ L_{π_θ}(π_θ̃) |_{θ̃=θ} = ∇_θ̃ η(π_θ̃) |_{θ̃=θ}   (17)

so if the policy changes from π_θ to π̃ by a sufficiently small step and the surrogate function value L increases, the cumulative reward η also increases; the policy can therefore be improved by taking the surrogate function as the optimization objective;
equations (16) and (17) show that updating the policy from π_θ to π̃ by a sufficiently small step increases the cumulative reward η; define π' as the policy with the largest cumulative reward value among the old policies, and define the intermediate divergence variable α; to increase the lower bound of the cumulative reward η, a conservative iteration policy π_new(a|s) is set:

π_new(a|s) = (1 - α) π_old(a|s) + α π'(a|s)   (18)

where π_new is the new policy; π_old is the current policy; α = D_TV^max(π_old, π_new) is the maximum total variation divergence between π_new and π_old; π_old(·|s) is the action distribution selected by policy π_old in an arbitrary state s; π_new(·|s) is the action distribution selected by policy π_new in an arbitrary state s; D_TV(π_old(·|s) || π_new(·|s)) is the total variation divergence between π_old(·|s) and π_new(·|s); π'(a|s) is an arbitrary action a selected in an arbitrary state s under policy π'; and π_old(a|s) is an arbitrary action a selected in an arbitrary state s under policy π_old;
for any stochastic policy, let the intermediate entropy variable ε = max_{s,a} |A^π(s, a)|, where max_{s,a}|·| takes the maximum absolute value over arbitrary states s and actions a; identifying π̃ with π_new and π with π_old, the surrogate function value L_π and the cumulative reward η satisfy:

η(π_new) ≥ L_{π_old}(π_new) - (4εγ / (1 - γ)²) α²

where γ is the discount factor;
the maximum relative entropy is D_KL^max(π, π̃) = max_s D_KL(π(·|s) || π̃(·|s)), where π(·|s) is the action distribution selected by policy π in an arbitrary state s, π̃(·|s) is the action distribution selected by policy π̃ in an arbitrary state s, and D_KL(π(·|s) || π̃(·|s)) is the relative entropy between π(·|s) and π̃(·|s);
the total variation divergence and the relative entropy satisfy D_TV(π(·|s) || π̃(·|s))² ≤ D_KL(π(·|s) || π̃(·|s)), where D_TV(π(·|s) || π̃(·|s)) is the total variation divergence between π(·|s) and π̃(·|s); substituting this relation into the bound above gives:

η(π̃) ≥ L_π(π̃) - C · D_KL^max(π, π̃),   C = 4εγ / (1 - γ)²

where C is the penalty coefficient;
under this constraint, for the continually updated policy sequence π_0 → π_1 → … → π_X there holds η(π_0) ≤ η(π_1) ≤ … ≤ η(π_X), where → denotes a policy update; π_0, π_1, …, π_X is the policy sequence of the parallel trust policy optimization network, and η(π_0), η(π_1), …, η(π_X) are the cumulative rewards of the policies in that sequence;
considering the parameterized policy π_θ̃ and the parameter vector θ̃, the terms unrelated to the parameter vector θ̃ are pruned;
after conversion to the parameter variables, the expected cumulative reward function satisfies:

η(θ̃) ≥ L_θ(θ̃) - C · D_KL^max(θ, θ̃)

and the constrained (penalized) optimization after the parameter conversion is:

maximize over θ̃:  L_θ(θ̃) - C · D_KL^max(θ, θ̃)

where η(θ̃) ≐ η(π_θ̃) and L_θ(θ̃) ≐ L_{π_θ}(π_θ̃), with ≐ denoting equality after the variable conversion; θ is the parameter vector to be updated; θ̃ is the parameter vector obtained after updating θ; π_θ is the policy π parameterized by the parameter vector θ; π_θ̃ is the policy π parameterized by the parameter vector θ̃; η(θ̃) is the expected cumulative reward function of policy π_θ̃; L_θ(θ̃) is the surrogate function of policy π_θ̃; D_KL(θ, θ̃) is the relative entropy between π_θ and π_θ̃; and D_KL^max(θ, θ̃) is the maximum relative entropy after the parameter conversion;
equations (21) to (24) give the updating procedure of the parallel policy optimization network parameter vector θ̃; updating the parameter vector optimizes the selection weights of the actions, thereby optimizing the parallel control;
to guarantee that the cumulative reward η increases, L_θ(θ̃) - C · D_KL^max(θ, θ̃) is maximized; however, with C as the penalty coefficient, D_KL^max(θ, θ̃) becomes very small at every update, so each update takes only a short step and the update speed drops; the penalty term is therefore turned into a constraint term:

maximize over θ̃:  L_θ(θ̃)   subject to   D_KL^max(θ, θ̃) ≤ δ

where δ is a constant;
equation (14) samples according to the policy π̃; because the updated policy π̃ is not yet known before the update, it cannot be sampled, so importance sampling is used to rewrite the parameterized cumulative reward function; terms unrelated to the arbitrary parameter θ are ignored and ρ_θ is used in place of ρ_θ̃; the update of the parallel trust policy optimization network finally becomes:

maximize over θ̃:  E_{s ~ ρ_θ, a ~ π_θ} [ (π_θ̃(a|s) / π_θ(a|s)) Q_θ(s, a) ]   subject to   D_KL^max(θ, θ̃) ≤ δ

where E_{s ~ ρ_θ, a ~ π_θ}[·] denotes sampling with respect to the probability distribution ρ_θ and the state-action value Q_θ of the parameterized policy π_θ, and Q_θ(s, a) is the state-action value function of policy π_θ when the state s_t is an arbitrary state s and the action a_t is an arbitrary action a;
the parameter vector θ̃ is updated according to the set constraint, and the policy π is updated with the updated parameter vector, completing the policy update in the parallel policy optimization network; the new policy then selects actions in the current state, and the iteration proceeds step by step;
step (9): after the iterations are complete, the trained parallel trust policy optimization network regulates the power adjustment ΔP_Gi of each power generation area of the novel power system so that every area reaches the optimal tie line power exchange assessment index CPS; each power generation area can reach the optimal tie line power exchange assessment index CPS by the method of steps (1) to (8); through the training of the networks in each power generation area, the areas cooperate to reach dynamic balance; finally the frequency deviation Δf between the power generation areas approaches 0, the power exchange assessment index CPS approaches 100%, and the whole novel power system gradually reaches the global optimum.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210967237.1A CN115238592A (en) | 2022-08-12 | 2022-08-12 | Multi-time-interval meteorological prediction distribution parallel trust strategy optimized power generation control method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210967237.1A CN115238592A (en) | 2022-08-12 | 2022-08-12 | Multi-time-interval meteorological prediction distribution parallel trust strategy optimized power generation control method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115238592A true CN115238592A (en) | 2022-10-25 |
Family
ID=83678600
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210967237.1A Pending CN115238592A (en) | 2022-08-12 | 2022-08-12 | Multi-time-interval meteorological prediction distribution parallel trust strategy optimized power generation control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115238592A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117040030A (en) * | 2023-10-10 | 2023-11-10 | 国网浙江宁波市鄞州区供电有限公司 | New energy consumption capacity risk management and control method and system |
CN117040030B (en) * | 2023-10-10 | 2024-04-02 | 国网浙江宁波市鄞州区供电有限公司 | New energy consumption capacity risk management and control method and system |
CN118297364A (en) * | 2024-06-06 | 2024-07-05 | 贵州乌江水电开发有限责任公司 | Production scheduling system and method for watershed centralized control hydropower station |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |