
CN114336759A - Micro-grid autonomous operation voltage control method based on deep reinforcement learning - Google Patents

Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Info

Publication number
CN114336759A
Authority
CN
China
Prior art keywords
microgrid
network
current
state
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210021329.0A
Other languages
Chinese (zh)
Inventor
肖金星
徐冰雁
孙俭
陈云峰
叶影
郭磊
陈龙
汤衡
沈杰士
刘杨名
曹春
骆国连
徐建国
杨军
谢黎龙
李勇汇
张宇威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
State Grid Shanghai Electric Power Co Ltd
Original Assignee
Wuhan University WHU
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU, State Grid Shanghai Electric Power Co Ltd filed Critical Wuhan University WHU
Priority to CN202210021329.0A priority Critical patent/CN114336759A/en
Publication of CN114336759A publication Critical patent/CN114336759A/en
Pending legal-status Critical Current

Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention provides a micro-grid autonomous operation voltage control method based on deep reinforcement learning, which comprises the following steps: step S1, establishing a microgrid model for island operation of the microgrid; step S2, establishing the neural networks for voltage control of the microgrid by adopting the DDPG algorithm from deep reinforcement learning; and step S3, designing a suitable reward function according to the microgrid model and the control requirements, and obtaining a converged voltage through DDPG training. The method has high stability: it effectively stabilizes the load voltage, and also effectively maintains voltage stability when external factors cause the voltage to fluctuate.

Description

Micro-grid autonomous operation voltage control method based on deep reinforcement learning
Technical Field
The invention belongs to the field of isolated microgrid control, and particularly relates to a microgrid autonomous operation voltage control method based on deep reinforcement learning.
Background
Distributed generation is intermittent and random, which makes it difficult to deploy at large scale; a microgrid formed jointly by distributed generation, energy storage devices and loads is an effective means of solving this problem. When a microgrid operates as an island, it must be able to separate from the distribution network, supply its loads independently, and provide stable voltage and frequency. In recent years, with the rapid development of artificial intelligence, control technologies have been continuously developed and updated; how to apply the new generation of artificial intelligence technology to smart grids and the energy internet is a research focus in the grid and energy fields. In the control of isolated microgrids, research on deep-reinforcement-learning-based control remains limited, and voltage control methods require continued study.
Disclosure of Invention
The invention aims to provide a micro-grid autonomous operation voltage control method based on deep reinforcement learning, and the method has the advantages of strong control capability and good stability.
In order to achieve the above object, the present invention provides a method for controlling voltage of autonomous operation of a micro-grid based on deep reinforcement learning, which comprises the following steps: step S1, establishing a microgrid model when a microgrid isolated island operates; step S2, establishing a neural network of the voltage control microgrid by adopting a DDPG algorithm in deep reinforcement learning; and step S3, designing a proper reward function according to the microgrid model and the control requirements, and obtaining converged voltage through DDPG algorithm training.
Preferably, the step S1 of establishing the classical model of the microgrid during island operation includes the following steps: s101, establishing a microgrid electrical model; step S102, establishing a differential equation matched with the microgrid electrical model in the step S101; wherein, the microgrid electrical model in step S101 includes: the micro-grid module is connected with a public power grid through a PCC circuit; and the switch CB is arranged on the PCC and controls the connection or disconnection of the microgrid and the public power grid.
Preferably, the microgrid module comprises: the distributed power generation unit, whose output end is connected with an inverter unit; a filter unit, one end of which is connected with the inverter unit circuit and the other end with the transformer unit; and a load unit, one end of which is connected with the transformer unit circuit through the PCC and the other end grounded; the filter unit consists of a filter resistor R_t and a filter inductance L_t connected in series; the load unit comprises a load resistor R, a load inductor branch and a load capacitor C connected in parallel; and the load inductor branch consists of a load inductor resistance R_L and a load inductor L connected in series.
Preferably, the differential equations described in step S102 are as follows:

$$L_t \frac{di_{tabc}}{dt} = v_{tabc} - v_{abc} - R_t\, i_{tabc},\qquad C \frac{dv_{abc}}{dt} = i_{tabc} - i_{Labc} - \frac{v_{abc}}{R},\qquad L \frac{di_{Labc}}{dt} = v_{abc} - R_L\, i_{Labc} \quad (1)$$

where v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance;

when the frequency of the steady-state voltage and current signals is ω_0 = 2πf_0, the Park transformation is performed on the differential equations of step S102, and the d axis is selected in the same direction as the voltage vector so that the q-axis voltage component is zero, giving the differential equations in the dq coordinate system:

$$\frac{dI_{td}}{dt} = -\frac{R_t}{L_t} I_{td} + \omega_0 I_{tq} - \frac{V_d}{L_t} + \frac{V_{td}}{L_t},\qquad \frac{dI_{tq}}{dt} = -\omega_0 I_{td} - \frac{R_t}{L_t} I_{tq},$$
$$\frac{dI_{Ld}}{dt} = -\frac{R_L}{L} I_{Ld} + \frac{V_d}{L},\qquad \frac{dV_d}{dt} = \frac{I_{td}}{C} - \frac{I_{Ld}}{C} - \frac{V_d}{RC} \quad (2)$$

where I_td and I_tq are the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
Preferably, the step S2 of establishing the neural network of the constant voltage control microgrid by using the DDPG algorithm includes the following steps: step S201, establishing a state space expression of microgrid island operation; step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting the DDPG algorithm; step S203, respectively updating the parameters of the networks established in step S202; the state space expression of microgrid island operation in step S201 is as follows:

$$\dot{x} = Ax + Bu,\qquad y = Cx$$

where

$$A = \begin{bmatrix} -R_t/L_t & \omega_0 & 0 & -1/L_t \\ -\omega_0 & -R_t/L_t & 0 & 0 \\ 0 & 0 & -R_L/L & 1/L \\ 1/C & 0 & -1/C & -1/(RC) \end{bmatrix},\qquad B = \begin{bmatrix} 1/L_t \\ 0 \\ 0 \\ 0 \end{bmatrix},\qquad C = [0\ \ 0\ \ 0\ \ 1];$$

from the differential equations in step S102, the state quantity is x = [I_td  I_tq  I_Ld  V_d]^T and the input is u = V_td.
Preferably, at time t, the current action network parameter is θ, the target action network parameter is θ', the current evaluation network parameter is ω, and the target evaluation network parameter is ω'; the functions of the current action network, the target action network, the current evaluation network and the target evaluation network described in step S202 are as follows: the current action network generates a specific exploratory or non-exploratory action a_t based on the current state s_t; the target action network generates, for the subsequent state s_{t+1} given by the environment, the action a_{t+1} used for the predicted value; the current evaluation network calculates the value of the state s_t and the generated action a_t; the target evaluation network generates Q'(s_{t+1}, a_{t+1}, ω'), used for calculating the target value, based on the subsequent state s_{t+1} and the action a_{t+1}.
Preferably, the updating of the parameters of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically includes the following steps: step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

$$J(\theta) = E\!\left[\sum_{n=0}^{N} \gamma^{\,n} R_{t+n}\right] \quad (5)$$

where N is the number of steps from the current state to the termination state, γ^n is the discount (attenuation) applied n steps after time t, J is the objective function, E(·) is the expectation function, and R_{t+n} is the return value at time t+n;
step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

$$L = E\!\left[\left(y - Q(s_t, a_t, \omega)\right)^2\right] \quad (6)$$

where Q(s_t, a_t, ω) is the output value of the current evaluation network at time t;
step S233, updating the target action network parameter theta 'and the target evaluation network parameter omega' respectively in a soft updating mode; the expression of the soft update mode is as follows:
ω'←τω+(1-τ)ω'
θ'←τθ+(1-τ)θ'
where τ is the update coefficient, in the range 0 ≤ τ ≤ 0.01.
Preferably, the step S3 specifically includes the following steps: step S301, designing a reward function and a termination function; step S302, training the microgrid model provided in step S101 by using the DDPG algorithm in step S202 by using simulation software to obtain a voltage control strategy.
Preferably, the expression of the reward function is:

[Equation (8): piecewise reward function of the voltage deviation e; original equation image not reproduced]

where e represents the difference between the PCC-point voltage and the reference voltage, and V* represents the per-unit voltage of the PCC point;

the termination function is represented by:

[Equation (9): termination indicator is_end as a function of the deviation e; original equation image not reproduced]
preferably, the training of the DDPG algorithm in step S302 specifically includes the following steps:
step S321, setting the microgrid operation data, including the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes; step S321b, setting a loop from t = 1 to the maximum cycle number T in training, where T is a positive integer;
step S322, deep learning of the microgrid in the island mode by using a DDPG algorithm, comprising the following steps: step S322a, initializing S to be the first state of the current state sequence; step S322b, obtaining behavior A based on the first state S in the current action network; step S322c, executing action A to obtain a new state S', obtaining a reward R according to the reward function (8) in step S301, and obtaining termination state information is _ end according to the termination function (9) in step S301;
step S323, storing sample information, namely storing the five-tuple sample information {s, A, R, s', is_end} obtained in step S322 into the experience replay set D;
step S324, experience replay sampling, including the following steps: step S324a, letting s = s', i.e., assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, ..., m, from the experience replay set D, and calculating the current target value y_j:

$$y_j = \begin{cases} R_j, & \text{is\_end}_j \text{ is true} \\ R_j + \gamma\, Q'\!\left(s'_j,\ \pi_{\theta'}(s'_j),\ \omega'\right), & \text{otherwise} \end{cases} \quad (10)$$

where π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically including the following steps: step S325a, using the mean square error loss function, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network; the mean square error loss function is

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, A_j, \omega)\right)^2 \quad (11)$$

step S325b, using the loss gradient expression, updating the parameter θ of the current action network by gradient back-propagation through the neural network; the loss gradient is

$$\nabla_\theta J \approx \frac{1}{m}\sum_{j=1}^{m} \nabla_a Q(s_j, a, \omega)\Big|_{a=\pi_\theta(s_j)}\, \nabla_\theta\, \pi_\theta(s_j) \quad (12)$$
Step S326, updating the parameters of the target evaluation network and the target action network by using the soft updating mode in the step S323 through parameter transmission;
step S327, judging whether the training is converged or whether the maximum training times is reached; if the reward R obtained by training is converged or reaches the maximum training times, the control of the DDPG algorithm on the microgrid voltage in the island mode is realized, the operation is ended, otherwise, the operation goes to the step S328;
step S328, determining whether the state S' is a termination state or whether the maximum cycle number is reached; if the state S' is the termination state or the maximum number of cycles is reached, return to step S322 a; otherwise, the process returns to step S322b until state S' is the end state or the maximum number of cycles is reached.
In summary, compared with the prior art, the micro-grid autonomous operation voltage control method based on deep reinforcement learning provided by the invention applies the artificial-intelligence technique of deep reinforcement learning to microgrid control, with the following beneficial effects: by applying the deep deterministic policy gradient to isolated-microgrid voltage control, the load voltage can be effectively stabilized, and the voltage can be effectively kept stable when external causes make it fluctuate.
Drawings
FIG. 1 is a flow chart of a voltage control method according to the present invention;
FIG. 2 is an electrical model diagram of the microgrid of the present invention in islanding operation;
FIG. 3 is a learning flow chart of the isolated microgrid DDPG algorithm of the present invention.
Detailed Description
The technical solution, the structural features, the achieved objects and the effects of the embodiments of the present invention will be described in detail with reference to fig. 1 to 3 of the embodiments of the present invention.
It should be noted that the drawings are in simplified form and not to precise scale; they are used only for convenience and clarity in describing the embodiments of the present invention and do not limit them. Any structural modification, change of proportional relationship or adjustment of size that does not affect the function and purpose of the present invention falls within the scope of its technical content.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a micro-grid autonomous operation voltage control method based on deep reinforcement learning. According to its mode of operation, a microgrid can run in two major modes: grid-connected operation and island operation. When the microgrid is in the island operation mode, it must provide good power quality to the load, so a reasonable control strategy is needed; the observed quantities and input quantities are continuous. As shown in fig. 1, the voltage control method includes:
step S1, establishing a microgrid model when a microgrid isolated island operates;
step S2, establishing the neural networks of the voltage control microgrid by adopting the deep deterministic policy gradient (DDPG) algorithm from deep reinforcement learning;
step S3, designing a suitable reward function for the microgrid model and the voltage control requirement described in step S1, and obtaining a converged voltage through DDPG algorithm training.
The step S1 of establishing the microgrid model during microgrid island operation includes the following steps:
s101, establishing a microgrid electrical model;
step S102, establishing a differential equation matched with the microgrid electrical model in the step S101;
specifically, as shown in fig. 2, the microgrid electrical model established in step S101 includes: the micro-grid module is connected with the public power grid module through a PCC (Point of Common Coupling, namely a public connection Point, namely a connection position of more than one user load in the power system) circuit; a switch CB provided on the PCC for controlling the connection and disconnection between the microgrid module and the public power grid module (Mainnet); when the switch CB is turned off, the micro-grid module is disconnected with the public power grid module, namely the micro-grid is isolated from the public power grid, and a micro-grid island operation mode is formed; when the switch CB is switched on, the micro-grid module is conducted with the public power grid module, namely the micro-grid is conducted with the public power grid in a closed mode, and a micro-grid-connected operation mode is formed.
As shown in fig. 2, the microgrid module in the microgrid electrical model comprises: the distributed power generation unit, whose output end is connected with an inverter unit (Inverter) 1 and which provides a direct-current supply for the inverter unit 1; a filter unit 2, one end of which is electrically connected to the inverter unit 1 and the other end to a transformer unit (Transformer) 3; and a load unit 4, one end of which is electrically connected with the transformer unit 3 through the PCC and the other end grounded. The filter unit 2 consists of a filter resistor R_t and a filter inductance L_t connected in series; the load unit 4 comprises a load resistor R, a load inductor branch and a load capacitor C connected in parallel; the load inductor branch consists of a load inductor resistance R_L in series with the load inductor L. Providing these multiple load elements improves the model and makes the microgrid electrical model more adaptable during island operation.
In one embodiment, the distributed power generation unit is formed by combining a photovoltaic device and an energy storage device, equivalently forming a stable direct-current supply, and the parameters of each unit of the microgrid in the island operation mode are set as follows: distributed power generation unit output voltage V_dc = 800 V; microgrid frequency f = 50 Hz; transformer unit 3 low-voltage side U_1 = 600 V and high-voltage side U_2 = 13800 V; filter unit 2 filter inductance L_t = 0.3 mH and filter resistance R_t = 1.5 mΩ; load unit 4 load resistance R = 76 Ω, load inductance L = 111.9 mH, load inductor resistance R_L = 0.3515 Ω, and load capacitance C = 62.855 μF.
Further, since the microgrid system is balanced, the differential equations established in step S102 and matched with the microgrid electrical model are as follows:

$$L_t \frac{di_{tabc}}{dt} = v_{tabc} - v_{abc} - R_t\, i_{tabc},\qquad C \frac{dv_{abc}}{dt} = i_{tabc} - i_{Labc} - \frac{v_{abc}}{R},\qquad L \frac{di_{Labc}}{dt} = v_{abc} - R_L\, i_{Labc} \quad (1)$$

where v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance.

The Park transformation is performed on differential equation (1) (the Park transformation projects the three-phase a, b, c currents onto the direct axis (d axis), the quadrature axis (q axis) and the zero axis (0 axis) perpendicular to the dq plane), and the d axis is selected in the same direction as the voltage vector, so that the q-axis voltage component is zero. When the microgrid is in the island operation mode, its frequency f is provided by the inverter through a constant-frequency internal oscillator, the system frequency is controlled in open loop, and the frequency of the steady-state voltage and current signals is ω_0 = 2πf; the differential equations in the dq coordinate system are then:

$$\frac{dI_{td}}{dt} = -\frac{R_t}{L_t} I_{td} + \omega_0 I_{tq} - \frac{V_d}{L_t} + \frac{V_{td}}{L_t},\qquad \frac{dI_{tq}}{dt} = -\omega_0 I_{td} - \frac{R_t}{L_t} I_{tq},$$
$$\frac{dI_{Ld}}{dt} = -\frac{R_L}{L} I_{Ld} + \frac{V_d}{L},\qquad \frac{dV_d}{dt} = \frac{I_{td}}{C} - \frac{I_{Ld}}{C} - \frac{V_d}{RC} \quad (2)$$

where I_td and I_tq are respectively the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
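For reference, a standard amplitude-invariant Park transformation can be sketched as below; this particular convention (amplitude-invariant, q axis defined with a minus sine) is an assumption, since the patent only names the transform:

```python
import numpy as np

def park(xa, xb, xc, theta):
    """Project three-phase quantities (a, b, c) onto the d, q, 0 axes."""
    k = 2.0 / 3.0  # amplitude-invariant scaling
    d = k * (xa * np.cos(theta) + xb * np.cos(theta - 2*np.pi/3) + xc * np.cos(theta + 2*np.pi/3))
    q = -k * (xa * np.sin(theta) + xb * np.sin(theta - 2*np.pi/3) + xc * np.sin(theta + 2*np.pi/3))
    z = k * 0.5 * (xa + xb + xc)  # zero-sequence component (vanishes in a balanced system)
    return d, q, z
```

With the d axis aligned to the voltage vector (theta tracking ω_0 t), the q-axis voltage component returned here is zero in steady state, as the text requires.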
Wherein, the step S2 of establishing the neural network of the constant voltage control microgrid by using the deep deterministic policy gradient (DDPG) algorithm in deep reinforcement learning includes the following steps:
step S201, establishing a state space expression of micro-grid island operation;
step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting a DDPG algorithm;
and step S203, respectively updating the parameters of the action network, the target action network, the current evaluation network and the target evaluation network established in the step S202.
Specifically, since the observed state quantities and input quantities are continuous, the state space expression of the microgrid in island operation established by the state space equation in step S201 is as follows:

$$\dot{x} = Ax + Bu,\qquad y = Cx$$

where

$$A = \begin{bmatrix} -R_t/L_t & \omega_0 & 0 & -1/L_t \\ -\omega_0 & -R_t/L_t & 0 & 0 \\ 0 & 0 & -R_L/L & 1/L \\ 1/C & 0 & -1/C & -1/(RC) \end{bmatrix},\qquad B = \begin{bmatrix} 1/L_t \\ 0 \\ 0 \\ 0 \end{bmatrix},\qquad C = [0\ \ 0\ \ 0\ \ 1].$$

From the differential equations obtained in step S102, the state quantity is x = [I_td  I_tq  I_Ld  V_d]^T and the input is u = V_td; the entries of the matrices A, B and C are the electrical-model parameters appearing in the dq-coordinate differential equations of step S102.
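As an illustration, the following Python sketch assembles A, B and C from the embodiment parameters given above and steps the plant with forward-Euler integration. The matrix entries follow the dq equations as reconstructed here, so they should be read as a sketch of the model rather than the patent's verbatim matrices, and the time step dt is an assumed value:

```python
import numpy as np

# Embodiment parameters (SI units)
Lt, Rt = 0.3e-3, 1.5e-3            # filter inductance [H] and resistance [ohm]
R, L, RL = 76.0, 111.9e-3, 0.3515  # load resistance, inductance, inductor resistance
C = 62.855e-6                      # load capacitance [F]
w0 = 2 * np.pi * 50.0              # steady-state angular frequency, f = 50 Hz

# State x = [I_td, I_tq, I_Ld, V_d]^T, input u = V_td
A = np.array([
    [-Rt / Lt,  w0,       0.0,      -1.0 / Lt      ],
    [-w0,      -Rt / Lt,  0.0,       0.0           ],
    [ 0.0,      0.0,     -RL / L,    1.0 / L       ],
    [ 1.0 / C,  0.0,     -1.0 / C,  -1.0 / (R * C) ],
])
B = np.array([1.0 / Lt, 0.0, 0.0, 0.0])
Cmat = np.array([0.0, 0.0, 0.0, 1.0])  # output y = V_d

def plant_step(x, u, dt=1e-5):
    """One forward-Euler step of dx/dt = A x + B u; returns (x_next, V_d)."""
    x_next = x + dt * (A @ x + B * u)
    return x_next, float(Cmat @ x_next)

# Example: start at rest and apply a constant inverter d-axis voltage
x = np.zeros(4)
x, v_d = plant_step(x, u=650.0)
```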
At time t, let the current action network parameter be θ, the target action network parameter θ', the current evaluation network parameter ω and the target evaluation network parameter ω'. The functions of the current action network, the target action network, the current evaluation network and the target evaluation network established in step S202 are as follows: the current action network generates a specific exploratory or non-exploratory action a_t based on the current state s_t; the target action network generates, for the subsequent state s_{t+1} given by the environment, the action a_{t+1} used for the predicted value; the current evaluation network calculates the value of the state s_t and the generated action a_t; the target evaluation network generates Q'(s_{t+1}, a_{t+1}, ω'), used for calculating the target value, based on the subsequent state s_{t+1} and the action a_{t+1}. The expression of the target value is:

$$y = R_t + \gamma\, Q'(s_{t+1}, a_{t+1}, \omega') \quad (3)$$

where y is the target value, R_t is the return value at time t, γ is a discount factor with 0 < γ < 1, and Q'(s_{t+1}, a_{t+1}, ω') is the output value of the target evaluation network at time t+1.
In order to add randomness and improve coverage during the learning process, DDPG adds a certain noise N to the selected behavior A, so that the expression of the final action A interacting with the environment is:

$$A = \pi_\theta(s) + N \quad (4)$$

where π_θ(·) is the output function of the current action network, θ is the current action network parameter, and N is a random noise function.
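A minimal sketch of this exploration step, assuming zero-mean Gaussian noise with a slowly decaying standard deviation (the patent does not specify the distribution or schedule of N):

```python
import numpy as np

def explore(policy_action, episode, sigma0=0.1, decay=0.995):
    """Return pi_theta(s) + N with annealed Gaussian exploration noise."""
    sigma = sigma0 * decay ** episode  # shrink exploration as training progresses
    return policy_action + np.random.normal(0.0, sigma)
```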
Further, according to the state space equation of the microgrid, at time t the d-axis component of the load voltage V_d, the per-unit voltage error e and its integral ∫e dt are selected as the current state quantity, i.e., s_t = (V_d, e, ∫e dt); the per-unit value of the inverter port voltage V_td is selected as the current action input quantity, i.e., the action interacting with the environment is a_t = V_td.
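In code, this state can be accumulated alongside the plant simulation; in the sketch below the per-unit base V_BASE and the class structure are illustrative assumptions:

```python
import numpy as np

V_BASE = 600.0  # assumed per-unit voltage base (transformer low-voltage side)

class StateTracker:
    """Builds the RL state s_t = (V_d, e, integral of e) from the plant output."""
    def __init__(self, v_ref_pu=1.0, dt=1e-5):
        self.v_ref_pu, self.dt, self.int_e = v_ref_pu, dt, 0.0

    def observe(self, v_d):
        e = v_d / V_BASE - self.v_ref_pu  # per-unit voltage error
        self.int_e += e * self.dt         # running integral of the error
        return np.array([v_d, e, self.int_e], dtype=np.float32)
```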
The updating of the parameters of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically includes the following steps:

step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

$$J(\theta) = E\!\left[\sum_{n=0}^{N} \gamma^{\,n} R_{t+n}\right] \quad (5)$$

where N is the number of steps from the current state to the termination state, γ^n is the discount (attenuation) applied n steps after time t, J is the objective function, E(·) is the expectation function, and R_{t+n} is the return value at time t+n.

step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

$$L = E\!\left[\left(y - Q(s_t, a_t, \omega)\right)^2\right] \quad (6)$$

where Q(s_t, a_t, ω) is the output value of the current evaluation network at time t.

step S233, updating the target action network parameter θ' and the target evaluation network parameter ω' respectively by soft update; the soft update expressions are:

$$\omega' \leftarrow \tau\omega + (1-\tau)\,\omega',\qquad \theta' \leftarrow \tau\theta + (1-\tau)\,\theta' \quad (7)$$

where τ is the update coefficient, with 0 ≤ τ ≤ 0.01; in this embodiment τ = 0.001.
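Equation (7) is a simple convex blend of parameter sets; a sketch with the parameters held in a dictionary of NumPy arrays (the container layout is an assumption):

```python
def soft_update(target, current, tau=0.001):
    """In-place soft update: p' <- tau * p + (1 - tau) * p' for each parameter."""
    for name in target:
        target[name] = tau * current[name] + (1.0 - tau) * target[name]
    return target
```

Because τ is small, the target networks trail the current networks slowly, which stabilizes the bootstrapped target value of equation (3).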
Wherein, step S3 specifically includes the following steps:
step S301, designing a reward function and a termination function;
step S302, training the microgrid model proposed in step S101 by using the DDPG algorithm in step S202 by using simulation software to obtain a converged voltage.
The purpose of designing the reward function in step S301 is to judge the actions taken by the deep reinforcement learning controller during training. Specifically, a positive reward is obtained when the voltage deviation is within the allowable range, a negative reward when the voltage deviation is large, and a very low negative reward, together with entry into the termination state, when a serious deviation occurs; accordingly, the expression of the reward function is:

[Equation (8): piecewise reward function of the voltage deviation e; original equation image not reproduced]

where e represents the difference between the PCC-point voltage and the reference voltage, and V* represents the per-unit voltage of the PCC point.
Furthermore, a termination function is set in combination with the termination condition of the reward function. The termination function judges whether the system state has entered the termination state: if so, the current round of iteration ends; if not, the iteration of the current round continues. The termination function is represented as:

[Equation (9): termination indicator is_end as a function of the deviation e; original equation image not reproduced]
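Since the original piecewise expressions are not reproduced above, the sketch below only illustrates their stated structure; the deviation bands and reward magnitudes are hypothetical placeholders, not the patent's values:

```python
def reward_and_done(e, small_band=0.02, large_band=0.10):
    """Piecewise reward in the spirit of eq. (8) and termination flag of eq. (9).

    e: per-unit deviation of the PCC voltage from the reference.
    All numeric constants are illustrative assumptions.
    """
    if abs(e) <= small_band:   # deviation within the allowable range
        return 1.0, False      # positive reward, continue the episode
    if abs(e) <= large_band:   # large but tolerable deviation
        return -1.0, False     # negative reward, continue the episode
    return -100.0, True        # serious deviation: very low reward, terminate
```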
in one embodiment, by combining the reward function and the termination function designed in step S301, a requirement Learning Toolbox in MATLAB2020a is used to perform constant voltage control on the microgrid model in the islanding mode proposed in step 1 by using a DDPG algorithm, where the maximum iteration number is T, as shown in fig. 3, and the training of the DDPG algorithm in step S302 specifically includes the following steps:
step S321, setting the microgrid operation data, including the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes (150 in this embodiment); step S321b, setting a loop from t = 1 to the maximum cycle number T in training (T is a positive integer);

step S322, deep learning of the microgrid in the island mode with the DDPG algorithm, including the following steps: step S322a, initializing s = (V_d, e, ∫e dt) as the first state of the current state sequence; step S322b, obtaining the action A = π_θ(s) + N from the current action network based on the first state s; step S322c, executing the action A to obtain the new state s' = (V_d', e', ∫e' dt), obtaining the reward R from the reward function (8) in step S301, and obtaining the termination state information is_end from the termination function (9) in step S301;
step S323, storing sample information, specifically storing the five-tuple sample information {s, A, R, s', is_end} obtained in step S322 into the experience replay set D;
step S324, experience replay sampling, including the following steps: step S324a, letting s = s', that is, assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, ..., m (m = 64 in this embodiment), from the experience replay set D, and calculating the current target value y_j:

$$y_j = \begin{cases} R_j, & \text{is\_end}_j \text{ is true} \\ R_j + \gamma\, Q'\!\left(s'_j,\ \pi_{\theta'}(s'_j),\ \omega'\right), & \text{otherwise} \end{cases} \quad (10)$$

where π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically including the following steps: step S325a, using the mean square error loss function, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network; the mean square error loss function is

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, A_j, \omega)\right)^2 \quad (11)$$

step S325b, using the loss gradient expression, updating the parameter θ of the current action network by gradient back-propagation through the neural network; the loss gradient is

$$\nabla_\theta J \approx \frac{1}{m}\sum_{j=1}^{m} \nabla_a Q(s_j, a, \omega)\Big|_{a=\pi_\theta(s_j)}\, \nabla_\theta\, \pi_\theta(s_j) \quad (12)$$
Step S326, updating the parameters of the target evaluation network and the target action network by using the soft updating mode in the step S323 through parameter transmission;
step S327, judging whether the training is converged or whether the maximum training times is reached; if the reward R obtained by training is converged or reaches the maximum training times, the control of the DDPG algorithm on the microgrid voltage in the island mode is realized, the operation is ended, otherwise, the operation goes to the step S328;
step S328, judging whether the state S' is a termination state or whether the maximum cycle number is reached; if the state S' is the termination state or the maximum number of cycles is reached, return to step S322 a; otherwise, return to step S322b until state S' is the end state or the maximum number of cycles is reached.
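Putting steps S321 to S328 together, the training procedure has the familiar DDPG control flow. The following Python skeleton keeps that flow visible with tiny linear stand-ins for the four networks; the env object (for example, a wrapper around the plant, state-tracker and reward sketches above), the horizon T_MAX, the learning rate and the noise level are all assumptions, not values from the patent:

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-ins for the four networks (a real implementation would use
# multi-layer networks, e.g. via the MATLAB Reinforcement Learning Toolbox).
theta = rng.normal(0.0, 0.1, size=4)   # current action network: state (3) + bias
omega = rng.normal(0.0, 0.1, size=5)   # current evaluation network: state + action + bias
theta_t, omega_t = theta.copy(), omega.copy()  # S321a: targets start equal

D = deque(maxlen=100_000)              # experience replay set D
GAMMA, TAU, LR, M, T_MAX = 0.99, 0.001, 1e-3, 64, 500

def pi(s, th):                         # actor output: a = th . [s, 1]
    return float(th @ np.append(s, 1.0))

def q(s, a, w):                        # critic output: Q = w . [s, a, 1]
    return float(w @ np.concatenate([s, [a, 1.0]]))

for episode in range(150):                       # S321a: maximum training episodes
    s = env.reset()                              # S322a: first state (V_d, e, ∫e)
    for t in range(1, T_MAX + 1):                # S321b: loop t = 1..T
        a = pi(s, theta) + rng.normal(0.0, 0.05) # S322b: action plus noise, eq. (4)
        s2, r, done = env.step(a)                # S322c: reward (8), is_end (9)
        D.append((s, a, r, s2, done))            # S323: store the five-tuple
        s = s2                                   # S324a
        if len(D) >= M:
            for sj, aj, rj, sj2, dj in random.sample(D, M):  # S324b
                yj = rj if dj else rj + GAMMA * q(sj2, pi(sj2, theta_t), omega_t)  # eq. (10)
                feat = np.concatenate([sj, [aj, 1.0]])
                omega += LR * (yj - q(sj, aj, omega)) * feat      # S325a: descend eq. (11)
                grad_a = omega[3]                 # dQ/da for the linear critic
                theta += LR * grad_a * np.append(sj, 1.0)         # S325b: ascend eq. (12)
            omega_t = TAU * omega + (1.0 - TAU) * omega_t         # S326: soft update, eq. (7)
            theta_t = TAU * theta + (1.0 - TAU) * theta_t
        if done:                                  # S328: stop the episode on termination
            break
    # S327: a convergence check on the episode reward could stop training early
```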
Through continuous training with the DDPG algorithm in deep reinforcement learning, the reward obtained by the deep reinforcement learning controller during training converges, i.e., the difference between the voltage and the reference voltage becomes smaller and smaller, and the converged microgrid voltage in the island mode is obtained through training.
In summary, compared with the existing voltage control, the micro-grid autonomous operation voltage control method based on deep reinforcement learning provided by the invention has the advantage of high stability, and can effectively maintain the voltage stability of the micro-grid in an island mode.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (10)

1. A micro-grid autonomous operation voltage control method based on deep reinforcement learning is characterized by comprising the following steps:
step S1, establishing a microgrid model when a microgrid isolated island operates;
step S2, establishing a neural network of the voltage control microgrid by adopting a DDPG algorithm in deep reinforcement learning;
and step S3, designing a proper reward function aiming at the microgrid model and the control requirements, and obtaining converged voltage through DDPG algorithm training.
2. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 1, wherein the step S1 of establishing a classical model when a microgrid is in an island operation mode comprises the following steps:
s101, establishing a microgrid electrical model;
step S102, establishing a differential equation matched with the microgrid electrical model in the step S101;
wherein, the microgrid electrical model in step S101 includes: the micro-grid module is connected with the public power grid module through a PCC circuit; and the switch CB is arranged on the PCC and controls the connection or disconnection of the microgrid and the public power grid.
3. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 2, wherein the microgrid module comprises:
the output end of the distributed power generation unit is connected with an inverter unit (1);
a filter unit (2) having one end electrically connected to the inverter unit (1) and the other end connected to a transformer unit (3);
a load cell (4) having one end electrically connected to the transformer cell (3) via a PCC and the other end grounded;
the filter unit (2) consists of a filter resistor R_t and a filter inductance L_t connected in series;
the load unit (4) comprises a load resistor R, a load inductor branch and a load capacitor C connected in parallel;
the load inductor branch consists of a load inductor resistance R_L and a load inductor L connected in series.
4. The deep reinforcement learning-based microgrid autonomous operation voltage control method according to claim 3, characterized in that the differential equation in step S102 is as follows:
$$L_t \frac{di_{tabc}}{dt} = v_{tabc} - v_{abc} - R_t\, i_{tabc},\qquad C \frac{dv_{abc}}{dt} = i_{tabc} - i_{Labc} - \frac{v_{abc}}{R},\qquad L \frac{di_{Labc}}{dt} = v_{abc} - R_L\, i_{Labc} \quad (1)$$

where v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance;

when the frequency of the steady-state voltage and current signals is ω_0 = 2πf_0, the Park transformation is performed on the differential equations of step S102, and the d axis is selected in the same direction as the voltage vector so that the q-axis voltage component is zero, giving the differential equations in the dq coordinate system:

$$\frac{dI_{td}}{dt} = -\frac{R_t}{L_t} I_{td} + \omega_0 I_{tq} - \frac{V_d}{L_t} + \frac{V_{td}}{L_t},\qquad \frac{dI_{tq}}{dt} = -\omega_0 I_{td} - \frac{R_t}{L_t} I_{tq},$$
$$\frac{dI_{Ld}}{dt} = -\frac{R_L}{L} I_{Ld} + \frac{V_d}{L},\qquad \frac{dV_d}{dt} = \frac{I_{td}}{C} - \frac{I_{Ld}}{C} - \frac{V_d}{RC} \quad (2)$$

where I_td and I_tq are the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
5. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 4, wherein the step S2 of establishing the neural network of the constant voltage control microgrid by using DDPG algorithm includes the following steps:
step S201, establishing a state space expression of micro-grid island operation;
step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting a DDPG algorithm;
step S203, respectively updating the parameters of the action network, the target action network, the current evaluation network and the target evaluation network established in the step S202;
the state space expression of the microgrid isolated island operation in step S201 is as follows:
$$\dot{x} = Ax + Bu,\qquad y = Cx$$

where

$$A = \begin{bmatrix} -R_t/L_t & \omega_0 & 0 & -1/L_t \\ -\omega_0 & -R_t/L_t & 0 & 0 \\ 0 & 0 & -R_L/L & 1/L \\ 1/C & 0 & -1/C & -1/(RC) \end{bmatrix},\qquad B = \begin{bmatrix} 1/L_t \\ 0 \\ 0 \\ 0 \end{bmatrix},\qquad C = [0\ \ 0\ \ 0\ \ 1];$$

from the differential equations in step S102, the state quantity is x = [I_td  I_tq  I_Ld  V_d]^T and the input is u = V_td.
6. The method according to claim 5, wherein at time t, the current network parameter of the action is θ, the target network parameter of the action is θ ', the current evaluation network parameter is ω, and the target evaluation network parameter is ω', and the functions of the current action network, the target action network, the current evaluation network, and the target evaluation network in step S202 are as follows:
the current action network can be based on the current state stGenerating exploratory or unexploredSpecific behavior at
The target action network can give a subsequent state s according to the environmentt+1Generating a used for predictive valuet+1
The current evaluation network is able to calculate a state stAnd the generated behavior atThe corresponding behavioral value;
the target evaluation network can be based on a subsequent state st+1And action at+1Generating Q' for calculating target valuet+1,at+1ω '), and ω').
7. The microgrid autonomous operation voltage control method based on deep reinforcement learning as claimed in claim 6, wherein the parameter updating of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically includes the following steps:
step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

$$J(\theta) = E\!\left[\sum_{n=0}^{N} \gamma^{\,n} R_{t+n}\right] \quad (5)$$

where N is the number of steps from the current state to the termination state, γ^n is the discount (attenuation) applied n steps after time t, J is the objective function, E(·) is the expectation function, and R_{t+n} is the return value at time t+n;
step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

$$L = E\!\left[\left(y - Q(s_t, a_t, \omega)\right)^2\right] \quad (6)$$

where Q(s_t, a_t, ω) is the output value of the current evaluation network at time t;
step S233, updating the target action network parameter theta 'and the target evaluation network parameter omega' respectively in a soft updating mode; the expression of the soft update mode is as follows:
ω'←τω+(1-τ)ω'
θ'←τθ+(1-τ)θ'
where τ is the update coefficient, in the range 0 ≤ τ ≤ 0.01.
8. The method according to claim 7, wherein the step S3 specifically includes the following steps:
step S301, designing a reward function and a termination function;
step S302, training the microgrid model provided in step S101 by using the DDPG algorithm in step S202 by using simulation software to obtain a voltage control strategy.
9. The method for controlling the autonomous operation voltage of the microgrid based on the deep reinforcement learning method of claim 8, wherein the expression of the reward function is as follows:
[Equation (8): piecewise reward function of the voltage deviation e; original equation image not reproduced]

where e represents the difference between the PCC-point voltage and the reference voltage, and V* represents the per-unit voltage of the PCC point;

the termination function is represented by:

[Equation (9): termination indicator is_end as a function of the deviation e; original equation image not reproduced]
10. the method for controlling the autonomous operation voltage of the microgrid based on the deep reinforcement learning of claim 9, wherein the training of the DDPG algorithm in the step S302 specifically comprises the following steps:
step S321, setting the microgrid operation data, including the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes; step S321b, setting a loop from t = 1 to the maximum cycle number T in training, where T is a positive integer;
step S322, deep learning of the microgrid in the island mode by using a DDPG algorithm, comprising the following steps: step S322a, initializing S to be the first state of the current state sequence; step S322b, obtaining behavior A based on the first state S in the current action network; step S322c, executing action A to obtain a new state S', obtaining a reward R according to the reward function (8) in step S301, and obtaining termination state information is _ end according to the termination function (9) in step S301;
step S323, storing sample information, namely storing the five-tuple sample information {s, A, R, s', is_end} obtained in step S322 into the experience replay set D;
step S324, experience replay sampling, including the following steps: step S324a, letting s = s', assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, ..., m, from the experience replay set D, and calculating the current target value y_j:

$$y_j = \begin{cases} R_j, & \text{is\_end}_j \text{ is true} \\ R_j + \gamma\, Q'\!\left(s'_j,\ \pi_{\theta'}(s'_j),\ \omega'\right), & \text{otherwise} \end{cases} \quad (10)$$

where π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically including the following steps: step S325a, using the mean square error loss function, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network; the mean square error loss function is

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, A_j, \omega)\right)^2 \quad (11)$$

step S325b, using the loss gradient expression, updating the parameter θ of the current action network by gradient back-propagation through the neural network; the loss gradient is

$$\nabla_\theta J \approx \frac{1}{m}\sum_{j=1}^{m} \nabla_a Q(s_j, a, \omega)\Big|_{a=\pi_\theta(s_j)}\, \nabla_\theta\, \pi_\theta(s_j) \quad (12)$$
Step S326, updating the parameters of the target evaluation network and the target action network by using the soft updating mode in the step S323 through parameter transmission;
step S327, judging whether the training is converged or whether the maximum training times is reached; if the reward R obtained by training is converged or reaches the maximum training times, the control of the DDPG algorithm on the microgrid voltage in the island mode is realized, the operation is ended, otherwise, the operation goes to the step S328;
step S328, determining whether the state S' is a termination state or whether the maximum cycle number is reached; if the state S' is the termination state or the maximum number of cycles is reached, return to step S322 a; otherwise, the process returns to step S322b until state S' is the end state or the maximum number of cycles is reached.
CN202210021329.0A 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning Pending CN114336759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021329.0A CN114336759A (en) 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021329.0A CN114336759A (en) 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114336759A true CN114336759A (en) 2022-04-12

Family

ID=81026351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021329.0A Pending CN114336759A (en) 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114336759A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116014772A (en) * 2022-11-25 2023-04-25 国网上海市电力公司 Battery energy storage system control method based on improved virtual synchronous machine
CN118263842A (en) * 2024-05-29 2024-06-28 南京师范大学 Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365056A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network voltage regulation optimization method based on DDPG
CN110365057A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network peak regulation scheduling optimization method based on reinforcement learning
CN110535146A (en) * 2019-08-27 2019-12-03 哈尔滨工业大学 The Method for Reactive Power Optimization in Power of Policy-Gradient Reinforcement Learning is determined based on depth
CN112187074A (en) * 2020-09-15 2021-01-05 电子科技大学 Inverter controller based on deep reinforcement learning
CN113780688A (en) * 2021-11-10 2021-12-10 中国电力科学研究院有限公司 Optimized operation method, system, equipment and medium of electric heating combined system
CN113872198A (en) * 2021-09-29 2021-12-31 电子科技大学 Active power distribution network fault recovery method based on reinforcement learning method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365056A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network voltage regulation optimization method based on DDPG
CN110365057A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network peak regulation scheduling optimization method based on reinforcement learning
CN110535146A (en) * 2019-08-27 2019-12-03 哈尔滨工业大学 The Method for Reactive Power Optimization in Power of Policy-Gradient Reinforcement Learning is determined based on depth
CN112187074A (en) * 2020-09-15 2021-01-05 电子科技大学 Inverter controller based on deep reinforcement learning
CN113872198A (en) * 2021-09-29 2021-12-31 电子科技大学 Active power distribution network fault recovery method based on reinforcement learning method
CN113780688A (en) * 2021-11-10 2021-12-10 中国电力科学研究院有限公司 Optimized operation method, system, equipment and medium of electric heating combined system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIAJUN DUAN et al.: "Deep-Reinforcement-Learning-Based Autonomous Voltage Control for Power Grid Operations", IEEE Transactions on Power Systems, vol. 35, no. 1, 12 September 2019, pages 814, XP011765957, DOI: 10.1109/TPWRS.2019.2941134 *
LILONG XIE: "Research on Autonomous Operation Control of Microgrid Based on Deep Reinforcement Learning", 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), 25 October 2021, pages 2503-2507, XP034090455, DOI: 10.1109/EI252483.2021.9713298 *
苏诗慧; 雷勇; 李永凯; 朱英伟: "Medium- and short-term photovoltaic power forecasting based on an improved DDPG algorithm" (基于改进DDPG算法的中短期光伏发电功率预测), Semiconductor Optoelectronics (半导体光电), no. 05, 15 October 2020, pages 116-122 *
龚锦霞; 刘艳敏: "Coordinated optimization of active distribution networks based on the deep deterministic policy gradient algorithm" (基于深度确定策略梯度算法的主动配电网协调优化), Automation of Electric Power Systems (电力系统自动化), no. 06, 25 March 2020, pages 155-167 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116014772A (en) * 2022-11-25 2023-04-25 国网上海市电力公司 Battery energy storage system control method based on improved virtual synchronous machine
CN118263842A (en) * 2024-05-29 2024-06-28 南京师范大学 Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning
CN118263842B (en) * 2024-05-29 2024-08-20 南京师范大学 Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN114336759A (en) Micro-grid autonomous operation voltage control method based on deep reinforcement learning
CN113113928B (en) Flexible-direct system direct-current bus voltage control method and device based on deep reinforcement learning
CN107171328B (en) A kind of modeling of Distributed Power Flow controller and emulation mode based on ADPSS
CN110880774B (en) Self-adaptive adjustment inverter controller
CN108736519B (en) Self-adaptive control method and device for virtual synchronous generator of photovoltaic power station
Saadatmand et al. Adaptive critic design-based reinforcement learning approach in controlling virtual inertia-based grid-connected inverters
CN111525581B (en) Voltage control method for micro-grid system with unbalanced load
Xiong et al. Deep reinforcement learning based parameter self-tuning control strategy for VSG
Lü et al. Energy economy optimization and comprehensive performance improvement for PEMFC/LIB hybrid system based on hierarchical optimization
Dong et al. Output control method of microgrid VSI control network based on dynamic matrix control algorithm
CN117097189A (en) Control method of photovoltaic inverter
Liu et al. Power distribution strategy based on state of charge balance for hybrid energy storage systems in all-electric ships
CN117990986B (en) Current transformer impedance measurement method and device, electronic equipment and medium
CN113224797B (en) PI parameter configuration method for voltage and current double closed-loop control system of inverter
CN106950831A (en) A kind of reactive-load compensation method for offline optimization/switch online
CN112086996B (en) Agent-based improved droop control method for parallel inverter
Ren et al. Multivariable control method in STATCOM application for performance improvement
CN113258614B (en) Island micro-grid elastic distributed frequency and voltage recovery control method
CN107800145A (en) STATCOM control system based on Two-Degree-of-Freedom Internal Model Control
CN114374334A (en) Harmonic power control method of multi-inverter parallel system
Wang et al. Design and implementation of an LCL grid‐connected inverter based on capacitive current fractional proportional–integral feedback strategy
CN117559528B (en) Micro-grid stability domain determining method and system based on micro-grid reduced order model
CN116150969B (en) Stability analysis method for optical storage-virtual synchronous generator
CN117394421B (en) Improved active disturbance rejection control method of energy storage converter based on supercoiled sliding mode observer
kumar Mahto et al. Design of Controller for three-phase Grid-Connected Inverter Using Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination