CN114336759A - Micro-grid autonomous operation voltage control method based on deep reinforcement learning - Google Patents
- Publication number: CN114336759A (application CN202210021329.0A)
- Authority: CN (China)
- Prior art keywords: microgrid, network, current, state, target
- Prior art date: 2022-01-10
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention provides a micro-grid autonomous operation voltage control method based on deep reinforcement learning, which comprises the following steps: step S1, establishing a microgrid model for islanded operation of the microgrid; step S2, establishing a neural network for voltage control of the microgrid by adopting the DDPG algorithm in deep reinforcement learning; and step S3, designing a suitable reward function according to the microgrid model and the control requirements, and obtaining a converged voltage through DDPG algorithm training. The method has the advantage of high stability: it effectively stabilizes the load voltage and effectively maintains voltage stability when external factors cause the voltage to fluctuate.
Description
Technical Field
The invention belongs to the field of isolated microgrid control, and particularly relates to a microgrid autonomous operation voltage control method based on deep reinforcement learning.
Background
The intermittency and randomness of distributed power supply output make large-scale use of distributed generation difficult; a micro-grid formed jointly by distributed power supplies, energy storage devices and loads is an effective means of solving this problem. When the micro-grid operates as an island, it must be able to separate from the power distribution network, supply the load independently, and provide stable voltage and frequency. In recent years, with the rapid development of artificial intelligence, control technologies have been continuously developed and updated, and how to apply new-generation artificial intelligence technology to smart grids and the energy internet is a research focus in the current grid and energy field. In the control of isolated micro-grids, however, research on deep-reinforcement-learning-based control is scarce, and voltage control methods require continued study.
Disclosure of Invention
The invention aims to provide a micro-grid autonomous operation voltage control method based on deep reinforcement learning, and the method has the advantages of strong control capability and good stability.
In order to achieve the above object, the present invention provides a deep-reinforcement-learning-based voltage control method for autonomous micro-grid operation, which comprises the following steps: step S1, establishing a microgrid model for islanded operation of the microgrid; step S2, establishing a neural network for voltage control of the microgrid by adopting the DDPG algorithm in deep reinforcement learning; and step S3, designing a suitable reward function according to the microgrid model and the control requirements, and obtaining a converged voltage through DDPG algorithm training.
Preferably, the step S1 of establishing the classical model of the microgrid during islanded operation comprises the following steps: step S101, establishing a microgrid electrical model; step S102, establishing the differential equations matching the microgrid electrical model of step S101; wherein the microgrid electrical model of step S101 comprises: a micro-grid module connected to the public power grid through the PCC line; and a switch CB, arranged on the PCC, controlling the connection or disconnection of the microgrid and the public power grid.
Preferably, the microgrid module comprises: a distributed power generation unit whose output is connected to an inverter unit; a filter unit, one end of which is connected to the inverter unit and the other end to the transformer unit; and a load unit, one end of which is connected to the transformer unit through the PCC and the other end of which is grounded. The filter unit consists of a filter resistance R_t and a filter inductance L_t connected in series; the load unit comprises a load resistance R, a load inductor branch and a load capacitance C connected in parallel; the load inductor branch consists of a load inductor resistance R_L and a load inductance L connected in series.
Preferably, the differential equations of step S102 are as follows:

L_t (d i_tabc/dt) = v_tabc − v_abc − R_t i_tabc
C (d v_abc/dt) = i_tabc − i_Labc − v_abc/R
L (d i_Labc/dt) = v_abc − R_L i_Labc

wherein v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance;

when the frequency of the steady-state voltage and current signals is ω_0 = 2πf_0, a Park transformation is performed on the differential equations of step S102, with the d axis chosen in the same direction as the voltage vector so that the q-axis voltage component is zero, and the differential equations in the dq coordinate system are obtained as:

dI_td/dt = −(R_t/L_t) I_td + ω_0 I_tq − (1/L_t) V_d + (1/L_t) V_td
dI_tq/dt = −ω_0 I_td − (R_t/L_t) I_tq
dI_Ld/dt = −(R_L/L) I_Ld + (1/L) V_d
dV_d/dt = (1/C) I_td − (1/C) I_Ld − (1/(RC)) V_d

wherein I_td and I_tq are the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
Preferably, the step S2 of establishing the neural network of the constant-voltage-controlled microgrid by the DDPG algorithm comprises the following steps: step S201, establishing the state-space expression of microgrid islanded operation; step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by the DDPG algorithm; step S203, respectively updating the parameters of the networks established in step S202. The state-space expression of microgrid islanded operation in step S201 is as follows:

ẋ = Ax + Bu, y = Cx, wherein C = [0 0 0 1];

from the differential equations in step S102, the state quantity is determined as x = [I_td I_tq I_Ld V_d]^T and the input as u = V_td.
Preferably, at time t, the current action network parameter is θ, the target action network parameter is θ', the current evaluation network parameter is ω, and the target evaluation network parameter is ω'; the functions of the current action network, the target action network, the current evaluation network and the target evaluation network described in step S202 are as follows: the current action network generates a specific exploratory or non-exploratory action a_t from the current state s_t; the target action network generates a_{t+1}, used for the predicted value, from the subsequent state s_{t+1} given by the environment; the current evaluation network computes the behavioral value corresponding to the state s_t and the generated action a_t; the target evaluation network generates Q'(s_{t+1}, a_{t+1}, ω'), used for calculating the target value, from the subsequent state s_{t+1} and the action a_{t+1}.
Preferably, the updating of the parameters of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically comprises the following steps: step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

J(θ) = E( Σ_{n=0}^{N} γ^n R_{t+n} )

wherein N is the number of steps from the current state to the termination state, γ^n provides the attenuation, J is the attenuated objective function at time t, E(·) is the expectation function, and R_{t+n} is the return value at time t+n;
step S232, updating the current evaluation network parameter omega by minimizing the loss function; the loss function is:
L = E( (y − Q(s_t, a_t, ω))^2 )

wherein Q(s_t, a_t, ω) is the output value of the current evaluation network at time t;
step S233, updating the target action network parameter θ' and the target evaluation network parameter ω' respectively by soft update; the soft update expressions are:

ω' ← τω + (1−τ)ω'
θ' ← τθ + (1−τ)θ'

wherein τ is the update coefficient, with 0 ≤ τ ≤ 0.01.
Preferably, the step S3 specifically comprises the following steps: step S301, designing a reward function and a termination function; step S302, training the microgrid model of step S101 with the DDPG algorithm of step S202 in simulation software to obtain a voltage control strategy.
Preferably, the reward function (8) is a piecewise function of the voltage deviation: a positive reward is returned when the deviation is within the allowable range, a negative reward when the deviation is large, and a very low negative reward, with entry into the termination state, when a severe deviation occurs; wherein e represents the difference between the voltage of the PCC point and the reference voltage, and V* represents the per-unit voltage of the PCC point;

the termination function (9) returns the termination state when a severe voltage deviation occurs, and the non-termination state otherwise.
preferably, the training of the DDPG algorithm in step S302 specifically includes the following steps:
step S321, setting microgrid operation data, including the following steps: step S321a, randomly initializing neural network parameters, including a current action network parameter θ, a current evaluation network parameter ω, a target action network parameter θ ═ θ ', a target evaluation network parameter ω ═ ω', emptying the set D of experience playback, and setting the maximum training times; step S322b, setting a loop from T ═ 1 to the maximum loop number T in training, where T is a positive integer;
step S322, deep learning of the microgrid in island mode by the DDPG algorithm, comprising the following steps: step S322a, initializing S as the first state of the current state sequence; step S322b, obtaining the action A based on the first state S in the current action network; step S322c, executing action A to obtain a new state S', obtaining the reward R according to the reward function (8) in step S301, and obtaining the termination-state information is_end according to the termination function (9) in step S301;
step S323, storing the sample information, namely storing the five-tuple sample information {S, A, R, S', is_end} obtained in step S322 into the experience replay set D;
step S324, sample experience replay, comprising the following steps: step S324a, setting s = s', i.e. assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, …, m, from the experience replay set D and calculating the current target value y_j:

y_j = R_j, if is_end_j is the termination state;
y_j = R_j + γ Q'(s'_j, π_θ'(s'_j), ω'), otherwise;

wherein π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically comprising the following steps: step S325a, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network, using the mean-square-error loss function L = (1/m) Σ_{j=1}^{m} (y_j − Q(s_j, A_j, ω))^2; step S325b, updating the parameter θ of the current action network by gradient back-propagation through the neural network, using the loss-gradient expression ∇_θ J ≈ (1/m) Σ_{j=1}^{m} ∇_a Q(s_j, a, ω)|_{a=π_θ(s_j)} ∇_θ π_θ(s_j);

step S326, updating the parameters of the target evaluation network and the target action network through parameter transmission, using the soft update of step S233;
step S327, judging whether the training has converged or the maximum number of training episodes has been reached; if the reward R obtained in training has converged or the maximum number of training episodes has been reached, control of the microgrid voltage in island mode by the DDPG algorithm is achieved and the procedure ends; otherwise, go to step S328;

step S328, judging whether the state S' is a termination state or the maximum number of cycles has been reached; if the state S' is a termination state or the maximum number of cycles has been reached, return to step S322a; otherwise, return to step S322b, until the state S' is a termination state or the maximum number of cycles is reached.
In summary, compared with the prior art, the deep-reinforcement-learning-based micro-grid autonomous operation voltage control method provided by the invention has the following beneficial effect: artificial-intelligence deep reinforcement learning is applied to micro-grid control. By applying the deep deterministic policy gradient to isolated-microgrid voltage control, the load voltage can be effectively stabilized, and the voltage can be effectively kept stable when external causes make it fluctuate.
Drawings
FIG. 1 is a flow chart of a voltage control method according to the present invention;
FIG. 2 is an electrical model diagram of the microgrid of the present invention in islanding operation;
FIG. 3 is a learning flow chart of the isolated microgrid DDPG algorithm of the present invention.
Detailed Description
The technical solution, the structural features, the achieved objects and the effects of the embodiments of the present invention will be described in detail with reference to fig. 1 to 3 of the embodiments of the present invention.
It should be noted that the drawings are in simplified form and not to precise scale; they are provided only for convenience and clarity in describing the embodiments of the present invention, not to limit them. Any structural modification, change of proportional relationship or adjustment of size that does not affect the function and achievable purpose of the invention falls within the scope of its technical content.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a micro-grid autonomous operation voltage control method based on deep reinforcement learning, wherein a micro-grid can be divided into two major operation modes of grid-connected operation and island operation according to different operation modes; when the micro-grid is in the island operation mode, better power quality needs to be provided for the load, so that a reasonable control strategy needs to be provided, and the observed quantity and the input quantity are continuous quantities; as shown in fig. 1, the voltage control method includes:
step S1, establishing a microgrid model when a microgrid isolated island operates;
step S2, establishing a neural network for voltage control of the microgrid by adopting the deep deterministic policy gradient (DDPG) algorithm in deep reinforcement learning;
step S3, designing a suitable reward function for the microgrid model and the voltage control requirement described in step S1, and obtaining a converged voltage through DDPG algorithm training.
The step S1 of establishing the microgrid model during microgrid island operation includes the following steps:
s101, establishing a microgrid electrical model;
step S102, establishing a differential equation matched with the microgrid electrical model in the step S101;
specifically, as shown in fig. 2, the microgrid electrical model established in step S101 includes: the micro-grid module is connected with the public power grid module through a PCC (Point of Common Coupling, namely a public connection Point, namely a connection position of more than one user load in the power system) circuit; a switch CB provided on the PCC for controlling the connection and disconnection between the microgrid module and the public power grid module (Mainnet); when the switch CB is turned off, the micro-grid module is disconnected with the public power grid module, namely the micro-grid is isolated from the public power grid, and a micro-grid island operation mode is formed; when the switch CB is switched on, the micro-grid module is conducted with the public power grid module, namely the micro-grid is conducted with the public power grid in a closed mode, and a micro-grid-connected operation mode is formed.
As shown in fig. 2, the microgrid module in the microgrid electrical model comprises: a distributed power generation unit whose output is connected to the inverter unit (Inverter) 1, providing the inverter unit 1 with a direct-current supply; a filter unit 2, one end of which is electrically connected to the inverter unit 1 and the other end to the transformer unit (Transformer) 3; and a load unit 4, one end of which is electrically connected to the transformer unit 3 through the PCC and the other end of which is grounded. The filter unit 2 consists of a filter resistance R_t and a filter inductance L_t connected in series; the load unit 4 comprises a load resistance R, a load inductor branch and a load capacitance C connected in parallel; the load inductor branch consists of a load inductor resistance R_L and a load inductance L connected in series. This arrangement of the load improves the model, making the microgrid electrical model more adaptable during islanded operation.
In one embodiment, the distributed power generation unit is a combination of a photovoltaic device and an energy storage device, equivalent to a stable direct-current supply, and the unit parameters of the microgrid in islanded operation are set as follows: distributed power generation unit output voltage V_dc = 800 V; microgrid frequency f = 50 Hz; transformer unit 3 low-voltage side U_1 = 600 V and high-voltage side U_2 = 13800 V; filter unit 2 filter inductance L_t = 0.3 mH and filter resistance R_t = 1.5 mΩ; load unit 4 load resistance R = 76 Ω, load inductance L = 111.9 mH, load inductor resistance R_L = 0.3515 Ω and load capacitance C = 62.855 μF.
Further, since the microgrid system is balanced, the differential equations established in step S102 and matched with the microgrid electrical model are as follows:

L_t (d i_tabc/dt) = v_tabc − v_abc − R_t i_tabc
C (d v_abc/dt) = i_tabc − i_Labc − v_abc/R        (1)
L (d i_Labc/dt) = v_abc − R_L i_Labc

wherein v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance.

A Park transformation is applied to the differential equations (1) (the Park transformation projects the three-phase a, b and c quantities onto the direct axis (d axis), the quadrature axis (q axis) and the zero axis (0 axis) perpendicular to the dq plane), with the d axis chosen in the same direction as the voltage vector so that the q-axis voltage component is zero. When the micro-grid is in the islanded operation mode, its frequency f is provided by the inverter through a constant-frequency internal oscillator and the system frequency is controlled in open loop, so the frequency of the steady-state voltage and current signals is ω_0 = 2πf, and the differential equations in the dq coordinate system are obtained as:

dI_td/dt = −(R_t/L_t) I_td + ω_0 I_tq − (1/L_t) V_d + (1/L_t) V_td
dI_tq/dt = −ω_0 I_td − (R_t/L_t) I_tq        (2)
dI_Ld/dt = −(R_L/L) I_Ld + (1/L) V_d
dV_d/dt = (1/C) I_td − (1/C) I_Ld − (1/(RC)) V_d

wherein I_td and I_tq are respectively the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
Wherein, the step S2 of establishing the neural network of the constant-voltage-controlled microgrid by the deep deterministic policy gradient (DDPG) algorithm in deep reinforcement learning comprises the following steps:
step S201, establishing a state space expression of micro-grid island operation;
step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting a DDPG algorithm;
and step S203, respectively updating the parameters of the action network, the target action network, the current evaluation network and the target evaluation network established in the step S202.
Specifically, since the observed state quantity and the input quantity are continuous quantities, the state-space expression of the microgrid in islanded operation established in step S201 is:

ẋ = Ax + Bu, y = Cx, with C = [0 0 0 1]

where, from the differential equations (2) obtained in step S102, the state quantity is x = [I_td I_tq I_Ld V_d]^T, the input is u = V_td, and the parameters in the matrices A, B and C are the electrical-model parameters of the dq-coordinate differential equations of step S102.
At time t, the current action network parameter is θ, the target action network parameter is θ', the current evaluation network parameter is ω and the target evaluation network parameter is ω'; the functions of the four networks established in step S202 are as follows: the current action network generates a specific exploratory or non-exploratory action a_t from the current state s_t; the target action network generates a_{t+1}, used for the predicted value, from the subsequent state s_{t+1} given by the environment; the current evaluation network computes the behavioral value corresponding to the state s_t and the generated action a_t; the target evaluation network generates Q'(s_{t+1}, a_{t+1}, ω'), used for calculating the target value, from the subsequent state s_{t+1} and the action a_{t+1}. The expression of the target value is:

y = R_t + γ Q'(s_{t+1}, a_{t+1}, ω')        (3)

In formula (3), y is the target value, R_t is the return value at time t, γ is the discount factor with 0 < γ < 1, and Q'(s_{t+1}, a_{t+1}, ω') is the output value of the target evaluation network at time t+1.
In order to add some randomness and learning coverage to the learning process, DDPG adds a certain noise N to the selected action, so that the expression of the final action A interacting with the environment is:

A = π_θ(s) + N        (4)

wherein π_θ(·) is the output function of the current action network, θ is the current action network parameter, and N is a random noise function.
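Equation (4) then amounts to adding zero-mean noise to the deterministic policy output. The Gaussian noise scale and the clipping to a bounded per-unit range in the sketch below are assumptions, since the patent does not specify the noise process N:

```python
import numpy as np
import torch

def select_action(actor, s: np.ndarray, sigma: float = 0.1) -> float:
    """A = pi_theta(s) + N, equation (4); sigma is an assumed exploration-noise scale."""
    with torch.no_grad():
        a = actor(torch.as_tensor(s, dtype=torch.float32)).item()
    # clipping to the actor's bounded per-unit range is an assumption of this sketch
    return float(np.clip(a + np.random.normal(0.0, sigma), -1.0, 1.0))
```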
Further, according to the state-space equation of the microgrid, at time t the d-axis component of the load voltage V_d, the per-unit voltage error e and its integral ∫e are selected as the current state quantity, i.e. s_t = (V_d, e, ∫e); the per-unit value of the inverter port voltage V_td is selected as the current action input, i.e. the action A interacting with the environment.
The updating of the parameters of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically comprises the following steps:

Step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

J(θ) = E( Σ_{n=0}^{N} γ^n R_{t+n} )        (5)

where N is the number of steps from the current state to the termination state, γ^n provides the attenuation, J is the attenuated objective function at time t, E(·) is the expectation function, and R_{t+n} is the return value at time t+n.

Step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

L = E( (y − Q(s_t, a_t, ω))^2 )        (6)

where Q(s_t, a_t, ω) is the output value of the current evaluation network at time t.

Step S233, updating the target action network parameter θ' and the target evaluation network parameter ω' respectively by soft update; the soft update expressions are:

ω' ← τω + (1−τ)ω'
θ' ← τθ + (1−τ)θ'        (7)

where τ is the update coefficient, with 0 ≤ τ ≤ 0.01; in this embodiment τ = 0.001.
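A sketch of the soft update of equation (7), applied parameter-wise with τ = 0.001 as in this embodiment:

```python
import torch

def soft_update(target: torch.nn.Module, source: torch.nn.Module,
                tau: float = 0.001) -> None:
    """omega' <- tau*omega + (1 - tau)*omega', and likewise for theta' (equation (7))."""
    with torch.no_grad():
        for p_tgt, p_src in zip(target.parameters(), source.parameters()):
            p_tgt.mul_(1.0 - tau).add_(tau * p_src)
```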
Wherein, step S3 specifically includes the following steps:
step S301, designing a reward function and a termination function;
step S302, training the microgrid model of step S101 with the DDPG algorithm of step S202 in simulation software to obtain a converged voltage.
The purpose of designing the reward function in step S301 is to judge the actions taken by the deep-reinforcement-learning controller during training. Specifically, a positive reward is obtained when the voltage deviation is within the allowable range, a negative reward is obtained when the voltage deviation is large, and a very low negative reward is obtained, with entry into the termination state, when a severe deviation occurs; the reward function (8) is therefore a piecewise function of the deviation, where e represents the difference between the voltage of the PCC point and the reference voltage and V* represents the per-unit voltage of the PCC point.

Furthermore, a termination function (9) is set in accordance with the termination condition of the reward function; it judges whether the system state has entered the termination state. If the termination state is entered, the iteration of the current episode ends; otherwise, the iterative process of the current episode continues.
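The reward (8) and termination (9) logic can be sketched as below. Because the text does not reproduce the expressions, the allowable band (±5%), the severe-deviation cutoff (±20%) and the reward magnitudes are assumed placeholders chosen only to match the qualitative description above:

```python
def reward_and_done(v_pu: float, v_ref: float = 1.0,
                    band: float = 0.05, cutoff: float = 0.20) -> tuple:
    """Piecewise reward per step S301; band, cutoff and magnitudes are assumed values."""
    e = v_pu - v_ref              # deviation of the per-unit PCC voltage from reference
    if abs(e) <= band:
        return 1.0, False         # within the allowable range: positive reward
    if abs(e) <= cutoff:
        return -abs(e), False     # large deviation: negative reward
    return -100.0, True           # severe deviation: very low reward, terminate
```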
in one embodiment, by combining the reward function and the termination function designed in step S301, a requirement Learning Toolbox in MATLAB2020a is used to perform constant voltage control on the microgrid model in the islanding mode proposed in step 1 by using a DDPG algorithm, where the maximum iteration number is T, as shown in fig. 3, and the training of the DDPG algorithm in step S302 specifically includes the following steps:
step S321, setting the microgrid operation data, comprising the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes (set to 150 in this embodiment); step S321b, setting a loop from t = 1 to the maximum number of cycles T in training (T a positive integer);
step S322, deep learning of the microgrid in island mode by the DDPG algorithm, comprising the following steps: step S322a, initializing S = (V_d, e, ∫e) as the first state of the current state sequence; step S322b, obtaining an action A based on the first state S in the current action network; step S322c, executing action A to obtain the new state S' = (V_d', e', ∫e'), obtaining the reward R from the reward function (8) in step S301, and obtaining the termination-state information is_end from the termination function (9) in step S301;
step S323, storing the sample information, specifically storing the five-tuple sample information {S, A, R, S', is_end} obtained in step S322 into the experience replay set D;
step S324, sample experience replay, comprising the following steps: step S324a, setting s = s', i.e. assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, …, m (m = 64 in this embodiment), from the experience replay set D and calculating the current target value y_j:

y_j = R_j, if is_end_j is the termination state;
y_j = R_j + γ Q'(s'_j, π_θ'(s'_j), ω'), otherwise;

wherein π_θ'(·) is the output function of the target action network;
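Steps S323 and S324 store transitions in D and compute the bootstrap target y_j from a random minibatch. A sketch with m = 64 as in this embodiment; the discount factor and buffer capacity are assumptions, and the networks are those of the earlier PyTorch sketch:

```python
import random
from collections import deque

import numpy as np
import torch

GAMMA, BATCH = 0.99, 64                # gamma assumed (patent states only 0 < gamma < 1)
replay = deque(maxlen=100_000)         # experience replay set D; capacity assumed

def compute_targets(batch, critic_target, actor_target):
    """y_j = R_j + gamma * Q'(s'_j, pi_theta'(s'_j), omega'); y_j = R_j at termination."""
    _, _, r, s2, done = (torch.as_tensor(np.stack(col), dtype=torch.float32)
                         for col in zip(*batch))
    with torch.no_grad():
        q_next = critic_target(s2, actor_target(s2)).squeeze(-1)
    return r + GAMMA * (1.0 - done) * q_next

# usage: replay.append((s, [a], r, s2, float(is_end)))
#        batch = random.sample(replay, BATCH); y = compute_targets(batch, ...)
```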
step S325, updating the current network parameters, specifically comprising the following steps: step S325a, updating the current evaluation network parameter ω by gradient back-propagation through the neural network, using the mean-square-error loss function L = (1/m) Σ_{j=1}^{m} (y_j − Q(s_j, A_j, ω))^2; step S325b, updating the current action network parameter θ by gradient back-propagation through the neural network, using the loss-gradient expression ∇_θ J ≈ (1/m) Σ_{j=1}^{m} ∇_a Q(s_j, a, ω)|_{a=π_θ(s_j)} ∇_θ π_θ(s_j);
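Step S325 then takes two gradient steps against the actor and critic objects of the earlier sketch: the critic descends the mean-square-error loss, and the actor descends the mean of −Q(s, π_θ(s), ω), which is equivalent to ascending the loss gradient of J(θ). The learning rates below are assumptions:

```python
import torch

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # learning rates assumed
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def update_networks(s, a, y):
    """Step S325a: critic MSE step; step S325b: deterministic policy-gradient step."""
    critic_loss = ((y - critic(s, a).squeeze(-1)) ** 2).mean()  # (1/m) sum (y_j - Q)^2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()  # minimizing this ascends J(theta)
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```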
Step S326, updating the parameters of the target evaluation network and the target action network through parameter transmission, using the soft update of step S233;
step S327, judging whether the training has converged or the maximum number of training episodes has been reached; if the reward R obtained in training has converged or the maximum number of training episodes has been reached, control of the microgrid voltage in island mode by the DDPG algorithm is achieved and the procedure ends; otherwise, go to step S328;

step S328, judging whether the state S' is a termination state or the maximum number of cycles has been reached; if the state S' is a termination state or the maximum number of cycles has been reached, return to step S322a; otherwise, return to step S322b, until the state S' is a termination state or the maximum number of cycles is reached.
Through continued training with the DDPG algorithm in deep reinforcement learning, the reward obtained by the deep-reinforcement-learning controller during training converges, i.e. the difference between the voltage and the reference voltage becomes smaller and smaller, and the converged microgrid voltage in island mode is thus obtained.
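Assembled from the sketches above, steps S321 to S328 reduce to the episode/step skeleton below; the episode length, the zero initial state, and the direct use of the bounded per-unit action as the plant input u are simplifying assumptions of this sketch:

```python
import random

import numpy as np
import torch

MAX_EPISODES, MAX_STEPS = 150, 500   # 150 episodes per the embodiment; steps assumed

for episode in range(MAX_EPISODES):                    # step S321b: loop over episodes
    x = np.zeros(4)                                    # plant state [I_td, I_tq, I_Ld, V_d]
    s, e_int = np.zeros(3), 0.0                        # S322a: s = (V_d, e, ∫e)
    for _ in range(MAX_STEPS):
        a = select_action(actor, s)                    # S322b: A = pi_theta(s) + N
        x, v_d = plant_step(x, a)                      # S322c: act on the plant model
        r, done = reward_and_done(v_d)                 # reward (8), termination (9)
        e = v_d - 1.0
        e_int += e * Ts
        s2 = np.array([v_d, e, e_int])
        replay.append((s, [a], r, s2, float(done)))    # S323: store {S, A, R, S', is_end}
        s = s2                                         # S324a: s <- s'
        if len(replay) >= BATCH:
            batch = random.sample(replay, BATCH)       # S324b: sample m transitions
            sb, ab, *_ = (torch.as_tensor(np.stack(c), dtype=torch.float32)
                          for c in zip(*batch))
            y = compute_targets(batch, critic_target, actor_target)
            update_networks(sb, ab, y)                 # S325: gradient steps
            soft_update(critic_target, critic)         # S326: soft target updates
            soft_update(actor_target, actor)
        if done:
            break                                      # S328: terminal state ends episode
```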
In summary, compared with the existing voltage control, the micro-grid autonomous operation voltage control method based on deep reinforcement learning provided by the invention has the advantage of high stability, and can effectively maintain the voltage stability of the micro-grid in an island mode.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.
Claims (10)
1. A micro-grid autonomous operation voltage control method based on deep reinforcement learning is characterized by comprising the following steps:
step S1, establishing a microgrid model when a microgrid isolated island operates;
step S2, establishing a neural network of the voltage control microgrid by adopting a DDPG algorithm in deep reinforcement learning;
and step S3, designing a proper reward function aiming at the microgrid model and the control requirements, and obtaining converged voltage through DDPG algorithm training.
2. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 1, wherein the step S1 of establishing a classical model when a microgrid is in an island operation mode comprises the following steps:
s101, establishing a microgrid electrical model;
step S102, establishing a differential equation matched with the microgrid electrical model in the step S101;
wherein, the microgrid electrical model in step S101 includes: the micro-grid module is connected with the public power grid module through a PCC circuit; and the switch CB is arranged on the PCC and controls the connection or disconnection of the microgrid and the public power grid.
3. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 2, wherein the microgrid module comprises:
the output end of the distributed power generation unit is connected with an inverter unit (1);
a filter unit (2) having one end electrically connected to the inverter unit (1) and the other end connected to a transformer unit (3);
a load cell (4) having one end electrically connected to the transformer cell (3) via a PCC and the other end grounded;
the filter unit (2) consists of a filter resistance R_t and a filter inductance L_t connected in series;

the load unit (4) comprises a load resistance R, a load inductor branch and a load capacitance C connected in parallel;

the load inductor branch consists of a load inductor resistance R_L and a load inductance L connected in series.
4. The deep-reinforcement-learning-based microgrid autonomous operation voltage control method according to claim 3, characterized in that the differential equations in step S102 are as follows:

L_t (d i_tabc/dt) = v_tabc − v_abc − R_t i_tabc
C (d v_abc/dt) = i_tabc − i_Labc − v_abc/R
L (d i_Labc/dt) = v_abc − R_L i_Labc

wherein v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance;

when the frequency of the steady-state voltage and current signals is ω_0, a Park transformation is performed on the differential equations of step S102, with the d axis chosen in the same direction as the voltage vector so that the q-axis voltage component is zero, and the differential equations in the dq coordinate system are obtained as:

dI_td/dt = −(R_t/L_t) I_td + ω_0 I_tq − (1/L_t) V_d + (1/L_t) V_td
dI_tq/dt = −ω_0 I_td − (R_t/L_t) I_tq
dI_Ld/dt = −(R_L/L) I_Ld + (1/L) V_d
dV_d/dt = (1/C) I_td − (1/C) I_Ld − (1/(RC)) V_d

wherein I_td and I_tq are the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
5. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 4, wherein the step S2 of establishing the neural network of the constant voltage control microgrid by using DDPG algorithm includes the following steps:
step S201, establishing a state space expression of micro-grid island operation;
step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting a DDPG algorithm;
step S203, respectively updating the parameters of the action network, the target action network, the current evaluation network and the target evaluation network established in the step S202;
the state-space expression of microgrid islanded operation in step S201 is as follows:

ẋ = Ax + Bu, y = Cx, wherein C = [0 0 0 1];

from the differential equations in step S102, the state quantity is determined as x = [I_td I_tq I_Ld V_d]^T and the input as u = V_td.
6. The method according to claim 5, wherein at time t the current action network parameter is θ, the target action network parameter is θ', the current evaluation network parameter is ω and the target evaluation network parameter is ω', and the functions of the current action network, the target action network, the current evaluation network and the target evaluation network in step S202 are as follows:

the current action network generates a specific exploratory or non-exploratory action a_t from the current state s_t;

the target action network generates a_{t+1}, used for the predicted value, from the subsequent state s_{t+1} given by the environment;

the current evaluation network computes the behavioral value corresponding to the state s_t and the generated action a_t;

the target evaluation network generates Q'(s_{t+1}, a_{t+1}, ω'), used for calculating the target value, from the subsequent state s_{t+1} and the action a_{t+1}.
7. The microgrid autonomous operation voltage control method based on deep reinforcement learning as claimed in claim 6, wherein the parameter updating of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically includes the following steps:
step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

J(θ) = E( Σ_{n=0}^{N} γ^n R_{t+n} )

wherein N is the number of steps from the current state to the termination state, γ^n provides the attenuation, J is the attenuated objective function at time t, E(·) is the expectation function, and R_{t+n} is the return value at time t+n;
step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

L = E( (y − Q(s_t, a_t, ω))^2 )

wherein Q(s_t, a_t, ω) is the output value of the current evaluation network at time t;
step S233, updating the target action network parameter θ' and the target evaluation network parameter ω' respectively by soft update; the soft update expressions are:

ω' ← τω + (1−τ)ω'
θ' ← τθ + (1−τ)θ'

wherein τ is the update coefficient, with 0 ≤ τ ≤ 0.01.
8. The method according to claim 7, wherein the step S3 specifically includes the following steps:
step S301, designing a reward function and a termination function;
step S302, training the microgrid model of step S101 with the DDPG algorithm of step S202 in simulation software to obtain the voltage control strategy.
9. The method for controlling the autonomous operation voltage of the microgrid based on deep reinforcement learning of claim 8, characterized in that the reward function (8) is a piecewise function of the voltage deviation: a positive reward is returned when the deviation is within the allowable range, a negative reward when the deviation is large, and a very low negative reward, with entry into the termination state, when a severe deviation occurs; wherein e represents the difference between the voltage of the PCC point and the reference voltage, and V* represents the per-unit voltage of the PCC point;

the termination function (9) returns the termination state when a severe voltage deviation occurs, and the non-termination state otherwise.
10. the method for controlling the autonomous operation voltage of the microgrid based on the deep reinforcement learning of claim 9, wherein the training of the DDPG algorithm in the step S302 specifically comprises the following steps:
step S321, setting the microgrid operation data, comprising the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes; step S321b, setting a loop from t = 1 to the maximum number of cycles T in training, where T is a positive integer;
step S322, deep learning of the microgrid in island mode by the DDPG algorithm, comprising the following steps: step S322a, initializing S as the first state of the current state sequence; step S322b, obtaining the action A based on the first state S in the current action network; step S322c, executing action A to obtain a new state S', obtaining the reward R according to the reward function (8) in step S301, and obtaining the termination-state information is_end according to the termination function (9) in step S301;
step S323, storing the sample information, namely storing the five-tuple sample information {S, A, R, S', is_end} obtained in step S322 into the experience replay set D;
step S324, sample experience replay, comprising the following steps: step S324a, setting s = s', i.e. assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, …, m, from the experience replay set D and calculating the current target value y_j:

y_j = R_j, if is_end_j is the termination state;
y_j = R_j + γ Q'(s'_j, π_θ'(s'_j), ω'), otherwise;

wherein π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically comprising the following steps: step S325a, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network, using the mean-square-error loss function L = (1/m) Σ_{j=1}^{m} (y_j − Q(s_j, A_j, ω))^2; step S325b, updating the parameter θ of the current action network by gradient back-propagation through the neural network, using the loss-gradient expression ∇_θ J ≈ (1/m) Σ_{j=1}^{m} ∇_a Q(s_j, a, ω)|_{a=π_θ(s_j)} ∇_θ π_θ(s_j);

step S326, updating the parameters of the target evaluation network and the target action network through parameter transmission, using the soft update of step S233;
step S327, judging whether the training has converged or the maximum number of training episodes has been reached; if the reward R obtained in training has converged or the maximum number of training episodes has been reached, control of the microgrid voltage in island mode by the DDPG algorithm is achieved and the procedure ends; otherwise, go to step S328;

step S328, judging whether the state S' is a termination state or the maximum number of cycles has been reached; if the state S' is a termination state or the maximum number of cycles has been reached, return to step S322a; otherwise, return to step S322b, until the state S' is a termination state or the maximum number of cycles is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210021329.0A CN114336759A (en) | 2022-01-10 | 2022-01-10 | Micro-grid autonomous operation voltage control method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210021329.0A CN114336759A (en) | 2022-01-10 | 2022-01-10 | Micro-grid autonomous operation voltage control method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114336759A true CN114336759A (en) | 2022-04-12 |
Family
ID=81026351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210021329.0A Pending CN114336759A (en) | 2022-01-10 | 2022-01-10 | Micro-grid autonomous operation voltage control method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114336759A (en) |
- 2022-01-10: CN application CN202210021329.0A filed; published as CN114336759A (en), status Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110365056A (en) * | 2019-08-14 | 2019-10-22 | 南方电网科学研究院有限责任公司 | Distributed energy participation power distribution network voltage regulation optimization method based on DDPG |
CN110365057A (en) * | 2019-08-14 | 2019-10-22 | 南方电网科学研究院有限责任公司 | Distributed energy participation power distribution network peak regulation scheduling optimization method based on reinforcement learning |
CN110535146A (en) * | 2019-08-27 | 2019-12-03 | 哈尔滨工业大学 | The Method for Reactive Power Optimization in Power of Policy-Gradient Reinforcement Learning is determined based on depth |
CN112187074A (en) * | 2020-09-15 | 2021-01-05 | 电子科技大学 | Inverter controller based on deep reinforcement learning |
CN113872198A (en) * | 2021-09-29 | 2021-12-31 | 电子科技大学 | Active power distribution network fault recovery method based on reinforcement learning method |
CN113780688A (en) * | 2021-11-10 | 2021-12-10 | 中国电力科学研究院有限公司 | Optimized operation method, system, equipment and medium of electric heating combined system |
Non-Patent Citations (4)
Title |
---|
JIAJUN DUAN等: "Deep-Reinforcement-Learning-Based Autonomous Voltage Control for Power Grid Operations", 《IEEE TRANSACTIONS ON POWER SYSTEMS》, vol. 35, no. 1, 12 September 2019 (2019-09-12), pages 814, XP011765957, DOI: 10.1109/TPWRS.2019.2941134 * |
LILONG XIE: "Research on Autonomous Operation Control of Microgrid Based on Deep Reinforcement Learning", 《2021 IEEE 5TH CONFERENCE ON ENERGY INTERNET AND ENERGY SYSTEM INTEGRATION (EI2)》, 25 October 2021 (2021-10-25), pages 2503 - 2507, XP034090455, DOI: 10.1109/EI252483.2021.9713298 * |
苏诗慧; 雷勇; 李永凯; 朱英伟: "Medium- and short-term photovoltaic power forecasting based on an improved DDPG algorithm", Semiconductor Optoelectronics (半导体光电), no. 05, 15 October 2020 (2020-10-15), pages 116 - 122 *
龚锦霞; 刘艳敏: "Coordinated optimization of active distribution networks based on the deep deterministic policy gradient algorithm", Automation of Electric Power Systems (电力系统自动化), no. 06, 25 March 2020 (2020-03-25), pages 155 - 167 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116014772A (en) * | 2022-11-25 | 2023-04-25 | 国网上海市电力公司 | Battery energy storage system control method based on improved virtual synchronous machine |
CN118263842A (en) * | 2024-05-29 | 2024-06-28 | 南京师范大学 | Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning |
CN118263842B (en) * | 2024-05-29 | 2024-08-20 | 南京师范大学 | Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |