
CN114336759A - Micro-grid autonomous operation voltage control method based on deep reinforcement learning - Google Patents

Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Info

Publication number
CN114336759A
Authority
CN
China
Prior art keywords
microgrid
network
current
state
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210021329.0A
Other languages
Chinese (zh)
Inventor
肖金星
徐冰雁
孙俭
陈云峰
叶影
郭磊
陈龙
汤衡
沈杰士
刘杨名
曹春
骆国连
徐建国
杨军
谢黎龙
李勇汇
张宇威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
State Grid Shanghai Electric Power Co Ltd
Original Assignee
Wuhan University WHU
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU, State Grid Shanghai Electric Power Co Ltd filed Critical Wuhan University WHU
Priority to CN202210021329.0A priority Critical patent/CN114336759A/en
Publication of CN114336759A publication Critical patent/CN114336759A/en
Pending legal-status Critical Current

Landscapes

  • Feedback Control In General (AREA)

Abstract

The invention provides a micro-grid autonomous operation voltage control method based on deep reinforcement learning, which comprises the following steps: step S1, establishing a microgrid model for island operation of the microgrid; step S2, establishing the neural networks for voltage control of the microgrid by adopting the DDPG algorithm from deep reinforcement learning; and step S3, designing a suitable reward function according to the microgrid model and the control requirements, and obtaining a converged voltage through DDPG training. The method has high stability: it effectively stabilizes the load voltage, and also effectively maintains voltage stability when external factors cause the voltage to fluctuate.

Description

Micro-grid autonomous operation voltage control method based on deep reinforcement learning
Technical Field
The invention belongs to the field of isolated microgrid control, and particularly relates to a microgrid autonomous operation voltage control method based on deep reinforcement learning.
Background
Distributed generation is intermittent and random, which makes it difficult to deploy at large scale; a microgrid formed jointly by distributed generation, energy storage devices and loads is an effective means of solving this problem. When a microgrid operates as an island, it must be able to separate from the distribution network, supply its loads independently, and provide stable voltage and frequency. In recent years, with the rapid development of artificial intelligence, control technologies have been continuously developed and updated; how to apply the new generation of artificial intelligence technology to smart grids and the energy internet is a research focus in the grid and energy fields. In the control of isolated microgrids, research on deep-reinforcement-learning-based control remains limited, and voltage control methods require continued study.
Disclosure of Invention
The invention aims to provide a micro-grid autonomous operation voltage control method based on deep reinforcement learning, and the method has the advantages of strong control capability and good stability.
In order to achieve the above object, the present invention provides a method for controlling voltage of autonomous operation of a micro-grid based on deep reinforcement learning, which comprises the following steps: step S1, establishing a microgrid model when a microgrid isolated island operates; step S2, establishing a neural network of the voltage control microgrid by adopting a DDPG algorithm in deep reinforcement learning; and step S3, designing a proper reward function according to the microgrid model and the control requirements, and obtaining converged voltage through DDPG algorithm training.
Preferably, the step S1 of establishing the classical model of the microgrid during island operation includes the following steps: s101, establishing a microgrid electrical model; step S102, establishing a differential equation matched with the microgrid electrical model in the step S101; wherein, the microgrid electrical model in step S101 includes: the micro-grid module is connected with a public power grid through a PCC circuit; and the switch CB is arranged on the PCC and controls the connection or disconnection of the microgrid and the public power grid.
Preferably, the microgrid module comprises: the distributed power generation unit, whose output end is connected with an inverter unit; a filter unit, one end of which is connected with the inverter unit circuit and the other end with the transformer unit; and a load unit, one end of which is connected with the transformer unit circuit through the PCC and the other end grounded; the filter unit consists of a filter resistor R_t and a filter inductance L_t connected in series; the load unit comprises a load resistor R, a load inductor branch and a load capacitor C connected in parallel; and the load inductor branch consists of a load inductor resistance R_L and a load inductor L connected in series.
Preferably, the differential equations described in step S102 are as follows:

$$L_t \frac{di_{tabc}}{dt} = v_{tabc} - v_{abc} - R_t\, i_{tabc},\qquad C \frac{dv_{abc}}{dt} = i_{tabc} - i_{Labc} - \frac{v_{abc}}{R},\qquad L \frac{di_{Labc}}{dt} = v_{abc} - R_L\, i_{Labc} \quad (1)$$

where v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance;

when the frequency of the steady-state voltage and current signals is ω_0 = 2πf_0, the Park transformation is performed on the differential equations of step S102, and the d axis is selected in the same direction as the voltage vector so that the q-axis voltage component is zero, giving the differential equations in the dq coordinate system:

$$\frac{dI_{td}}{dt} = -\frac{R_t}{L_t} I_{td} + \omega_0 I_{tq} - \frac{V_d}{L_t} + \frac{V_{td}}{L_t},\qquad \frac{dI_{tq}}{dt} = -\omega_0 I_{td} - \frac{R_t}{L_t} I_{tq},$$
$$\frac{dI_{Ld}}{dt} = -\frac{R_L}{L} I_{Ld} + \frac{V_d}{L},\qquad \frac{dV_d}{dt} = \frac{I_{td}}{C} - \frac{I_{Ld}}{C} - \frac{V_d}{RC} \quad (2)$$

where I_td and I_tq are the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
Preferably, the step S2 of establishing the neural network of the constant voltage control microgrid by using the DDPG algorithm includes the following steps: step S201, establishing a state space expression of microgrid island operation; step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting the DDPG algorithm; step S203, respectively updating the parameters of the networks established in step S202; the state space expression of microgrid island operation in step S201 is as follows:

$$\dot{x} = Ax + Bu,\qquad y = Cx$$

where

$$A = \begin{bmatrix} -R_t/L_t & \omega_0 & 0 & -1/L_t \\ -\omega_0 & -R_t/L_t & 0 & 0 \\ 0 & 0 & -R_L/L & 1/L \\ 1/C & 0 & -1/C & -1/(RC) \end{bmatrix},\qquad B = \begin{bmatrix} 1/L_t \\ 0 \\ 0 \\ 0 \end{bmatrix},\qquad C = [0\ \ 0\ \ 0\ \ 1];$$

from the differential equations in step S102, the state quantity is x = [I_td  I_tq  I_Ld  V_d]^T and the input is u = V_td.
Preferably, at time t, the current action network parameter is θ, the target action network parameter is θ', the current evaluation network parameter is ω, and the target evaluation network parameter is ω'; the functions of the current action network, the target action network, the current evaluation network and the target evaluation network described in step S202 are as follows: the current action network generates a specific exploratory or non-exploratory action a_t based on the current state s_t; the target action network generates, for the subsequent state s_{t+1} given by the environment, the action a_{t+1} used for the predicted value; the current evaluation network calculates the value of the state s_t and the generated action a_t; the target evaluation network generates Q'(s_{t+1}, a_{t+1}, ω'), used for calculating the target value, based on the subsequent state s_{t+1} and the action a_{t+1}.
Preferably, the updating of the parameters of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically includes the following steps: step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

$$J(\theta) = E\!\left[\sum_{n=0}^{N} \gamma^{\,n} R_{t+n}\right] \quad (5)$$

where N is the number of steps from the current state to the termination state, γ^n is the discount (attenuation) applied n steps after time t, J is the objective function, E(·) is the expectation function, and R_{t+n} is the return value at time t+n;
step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

$$L = E\!\left[\left(y - Q(s_t, a_t, \omega)\right)^2\right] \quad (6)$$

where Q(s_t, a_t, ω) is the output value of the current evaluation network at time t;
step S233, updating the target action network parameter theta 'and the target evaluation network parameter omega' respectively in a soft updating mode; the expression of the soft update mode is as follows:
ω'←τω+(1-τ)ω'
θ'←τθ+(1-τ)θ'
where τ is the update coefficient, in the range 0 ≤ τ ≤ 0.01.
Preferably, the step S3 specifically includes the following steps: step S301, designing a reward function and a termination function; step S302, training the microgrid model provided in step S101 by using the DDPG algorithm in step S202 by using simulation software to obtain a voltage control strategy.
Preferably, the expression of the reward function is:

[Equation (8): piecewise reward function of the voltage deviation e; original equation image not reproduced]

where e represents the difference between the PCC-point voltage and the reference voltage, and V* represents the per-unit voltage of the PCC point;

the termination function is represented by:

[Equation (9): termination indicator is_end as a function of the deviation e; original equation image not reproduced]
preferably, the training of the DDPG algorithm in step S302 specifically includes the following steps:
step S321, setting the microgrid operation data, including the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes; step S321b, setting a loop from t = 1 to the maximum cycle number T in training, where T is a positive integer;
step S322, deep learning of the microgrid in the island mode by using a DDPG algorithm, comprising the following steps: step S322a, initializing S to be the first state of the current state sequence; step S322b, obtaining behavior A based on the first state S in the current action network; step S322c, executing action A to obtain a new state S', obtaining a reward R according to the reward function (8) in step S301, and obtaining termination state information is _ end according to the termination function (9) in step S301;
step S323, storing sample information, namely storing the five-tuple sample information {s, A, R, s', is_end} obtained in step S322 into the experience replay set D;
step S324, experience replay sampling, including the following steps: step S324a, letting s = s', i.e., assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, ..., m, from the experience replay set D, and calculating the current target value y_j:

$$y_j = \begin{cases} R_j, & \text{is\_end}_j \text{ is true} \\ R_j + \gamma\, Q'\!\left(s'_j,\ \pi_{\theta'}(s'_j),\ \omega'\right), & \text{otherwise} \end{cases} \quad (10)$$

where π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically including the following steps: step S325a, using the mean square error loss function, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network; the mean square error loss function is

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, A_j, \omega)\right)^2 \quad (11)$$

step S325b, using the loss gradient expression, updating the parameter θ of the current action network by gradient back-propagation through the neural network; the loss gradient is

$$\nabla_\theta J \approx \frac{1}{m}\sum_{j=1}^{m} \nabla_a Q(s_j, a, \omega)\Big|_{a=\pi_\theta(s_j)}\, \nabla_\theta\, \pi_\theta(s_j) \quad (12)$$
Step S326, updating the parameters of the target evaluation network and the target action network by using the soft updating mode in the step S323 through parameter transmission;
step S327, judging whether the training is converged or whether the maximum training times is reached; if the reward R obtained by training is converged or reaches the maximum training times, the control of the DDPG algorithm on the microgrid voltage in the island mode is realized, the operation is ended, otherwise, the operation goes to the step S328;
step S328, determining whether the state S' is a termination state or whether the maximum cycle number is reached; if the state S' is the termination state or the maximum number of cycles is reached, return to step S322 a; otherwise, the process returns to step S322b until state S' is the end state or the maximum number of cycles is reached.
In summary, compared with the prior art, the micro-grid autonomous operation voltage control method based on deep reinforcement learning provided by the invention applies the artificial-intelligence technique of deep reinforcement learning to microgrid control, with the following beneficial effects: by applying the deep deterministic policy gradient to isolated-microgrid voltage control, the load voltage can be effectively stabilized, and the voltage can be effectively kept stable when external causes make it fluctuate.
Drawings
FIG. 1 is a flow chart of a voltage control method according to the present invention;
FIG. 2 is an electrical model diagram of the microgrid of the present invention in islanding operation;
FIG. 3 is a learning flow chart of the isolated microgrid DDPG algorithm of the present invention.
Detailed Description
The technical solution, the structural features, the achieved objects and the effects of the embodiments of the present invention will be described in detail with reference to fig. 1 to 3 of the embodiments of the present invention.
It should be noted that the drawings are in simplified form and not to precise scale; they are used only for convenience and clarity in describing the embodiments of the present invention and do not limit them. Any structural modification, change of proportional relationship or adjustment of size that does not affect the function and purpose of the present invention falls within the scope of its technical content.
It is to be noted that, in the present invention, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a micro-grid autonomous operation voltage control method based on deep reinforcement learning. According to its mode of operation, a microgrid can run in two major modes: grid-connected operation and island operation. When the microgrid is in the island operation mode, it must provide good power quality to the load, so a reasonable control strategy is needed; the observed quantities and input quantities are continuous. As shown in fig. 1, the voltage control method includes:
step S1, establishing a microgrid model when a microgrid isolated island operates;
step S2, establishing the neural networks of the voltage control microgrid by adopting the deep deterministic policy gradient (DDPG) algorithm from deep reinforcement learning;
step S3, designing a suitable reward function for the microgrid model and the voltage control requirement described in step S1, and obtaining a converged voltage through DDPG algorithm training.
The step S1 of establishing the microgrid model during microgrid island operation includes the following steps:
s101, establishing a microgrid electrical model;
step S102, establishing a differential equation matched with the microgrid electrical model in the step S101;
specifically, as shown in fig. 2, the microgrid electrical model established in step S101 includes: the micro-grid module is connected with the public power grid module through a PCC (Point of Common Coupling, namely a public connection Point, namely a connection position of more than one user load in the power system) circuit; a switch CB provided on the PCC for controlling the connection and disconnection between the microgrid module and the public power grid module (Mainnet); when the switch CB is turned off, the micro-grid module is disconnected with the public power grid module, namely the micro-grid is isolated from the public power grid, and a micro-grid island operation mode is formed; when the switch CB is switched on, the micro-grid module is conducted with the public power grid module, namely the micro-grid is conducted with the public power grid in a closed mode, and a micro-grid-connected operation mode is formed.
As shown in fig. 2, the microgrid module in the microgrid electrical model comprises: the distributed power generation unit, whose output end is connected with an inverter unit (Inverter) 1 and which provides a direct-current supply for the inverter unit 1; a filter unit 2, one end of which is electrically connected to the inverter unit 1 and the other end to a transformer unit (Transformer) 3; and a load unit 4, one end of which is electrically connected with the transformer unit 3 through the PCC and the other end grounded. The filter unit 2 consists of a filter resistor R_t and a filter inductance L_t connected in series; the load unit 4 comprises a load resistor R, a load inductor branch and a load capacitor C connected in parallel; the load inductor branch consists of a load inductor resistance R_L in series with the load inductor L. Providing these multiple load elements improves the model and makes the microgrid electrical model more adaptable during island operation.
In one embodiment, the distributed power generation unit is formed by combining a photovoltaic device and an energy storage device, equivalently forming a stable direct-current supply, and the parameters of each unit of the microgrid in the island operation mode are set as follows: distributed power generation unit output voltage V_dc = 800 V; microgrid frequency f = 50 Hz; transformer unit 3 low-voltage side U_1 = 600 V and high-voltage side U_2 = 13800 V; filter unit 2 filter inductance L_t = 0.3 mH and filter resistance R_t = 1.5 mΩ; load unit 4 load resistance R = 76 Ω, load inductance L = 111.9 mH, load inductor resistance R_L = 0.3515 Ω, and load capacitance C = 62.855 μF.
Further, since the microgrid system is balanced, the differential equations established in step S102 and matched with the microgrid electrical model are as follows:

$$L_t \frac{di_{tabc}}{dt} = v_{tabc} - v_{abc} - R_t\, i_{tabc},\qquad C \frac{dv_{abc}}{dt} = i_{tabc} - i_{Labc} - \frac{v_{abc}}{R},\qquad L \frac{di_{Labc}}{dt} = v_{abc} - R_L\, i_{Labc} \quad (1)$$

where v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance.

The Park transformation is performed on differential equation (1) (the Park transformation projects the three-phase a, b, c currents onto the direct axis (d axis), the quadrature axis (q axis) and the zero axis (0 axis) perpendicular to the dq plane), and the d axis is selected in the same direction as the voltage vector, so that the q-axis voltage component is zero. When the microgrid is in the island operation mode, its frequency f is provided by the inverter through a constant-frequency internal oscillator, the system frequency is controlled in open loop, and the frequency of the steady-state voltage and current signals is ω_0 = 2πf; the differential equations in the dq coordinate system are then:

$$\frac{dI_{td}}{dt} = -\frac{R_t}{L_t} I_{td} + \omega_0 I_{tq} - \frac{V_d}{L_t} + \frac{V_{td}}{L_t},\qquad \frac{dI_{tq}}{dt} = -\omega_0 I_{td} - \frac{R_t}{L_t} I_{tq},$$
$$\frac{dI_{Ld}}{dt} = -\frac{R_L}{L} I_{Ld} + \frac{V_d}{L},\qquad \frac{dV_d}{dt} = \frac{I_{td}}{C} - \frac{I_{Ld}}{C} - \frac{V_d}{RC} \quad (2)$$

where I_td and I_tq are respectively the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
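For reference, a standard amplitude-invariant Park transformation can be sketched as below; this particular convention (amplitude-invariant, q axis defined with a minus sine) is an assumption, since the patent only names the transform:

```python
import numpy as np

def park(xa, xb, xc, theta):
    """Project three-phase quantities (a, b, c) onto the d, q, 0 axes."""
    k = 2.0 / 3.0  # amplitude-invariant scaling
    d = k * (xa * np.cos(theta) + xb * np.cos(theta - 2*np.pi/3) + xc * np.cos(theta + 2*np.pi/3))
    q = -k * (xa * np.sin(theta) + xb * np.sin(theta - 2*np.pi/3) + xc * np.sin(theta + 2*np.pi/3))
    z = k * 0.5 * (xa + xb + xc)  # zero-sequence component (vanishes in a balanced system)
    return d, q, z
```

With the d axis aligned to the voltage vector (theta tracking ω_0 t), the q-axis voltage component returned here is zero in steady state, as the text requires.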
Wherein, the step S2 of establishing the neural network of the constant voltage control microgrid by using the deep deterministic policy gradient (DDPG) algorithm in deep reinforcement learning includes the following steps:
step S201, establishing a state space expression of micro-grid island operation;
step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting a DDPG algorithm;
and step S203, respectively updating the parameters of the action network, the target action network, the current evaluation network and the target evaluation network established in the step S202.
Specifically, since the observed state quantities and input quantities are continuous, the state space expression of the microgrid in island operation established by the state space equation in step S201 is as follows:

$$\dot{x} = Ax + Bu,\qquad y = Cx$$

where

$$A = \begin{bmatrix} -R_t/L_t & \omega_0 & 0 & -1/L_t \\ -\omega_0 & -R_t/L_t & 0 & 0 \\ 0 & 0 & -R_L/L & 1/L \\ 1/C & 0 & -1/C & -1/(RC) \end{bmatrix},\qquad B = \begin{bmatrix} 1/L_t \\ 0 \\ 0 \\ 0 \end{bmatrix},\qquad C = [0\ \ 0\ \ 0\ \ 1].$$

From the differential equations obtained in step S102, the state quantity is x = [I_td  I_tq  I_Ld  V_d]^T and the input is u = V_td; the entries of the matrices A, B and C are the electrical-model parameters appearing in the dq-coordinate differential equations of step S102.
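As an illustration, the following Python sketch assembles A, B and C from the embodiment parameters given above and steps the plant with forward-Euler integration. The matrix entries follow the dq equations as reconstructed here, so they should be read as a sketch of the model rather than the patent's verbatim matrices, and the time step dt is an assumed value:

```python
import numpy as np

# Embodiment parameters (SI units)
Lt, Rt = 0.3e-3, 1.5e-3            # filter inductance [H] and resistance [ohm]
R, L, RL = 76.0, 111.9e-3, 0.3515  # load resistance, inductance, inductor resistance
C = 62.855e-6                      # load capacitance [F]
w0 = 2 * np.pi * 50.0              # steady-state angular frequency, f = 50 Hz

# State x = [I_td, I_tq, I_Ld, V_d]^T, input u = V_td
A = np.array([
    [-Rt / Lt,  w0,       0.0,      -1.0 / Lt      ],
    [-w0,      -Rt / Lt,  0.0,       0.0           ],
    [ 0.0,      0.0,     -RL / L,    1.0 / L       ],
    [ 1.0 / C,  0.0,     -1.0 / C,  -1.0 / (R * C) ],
])
B = np.array([1.0 / Lt, 0.0, 0.0, 0.0])
Cmat = np.array([0.0, 0.0, 0.0, 1.0])  # output y = V_d

def plant_step(x, u, dt=1e-5):
    """One forward-Euler step of dx/dt = A x + B u; returns (x_next, V_d)."""
    x_next = x + dt * (A @ x + B * u)
    return x_next, float(Cmat @ x_next)

# Example: start at rest and apply a constant inverter d-axis voltage
x = np.zeros(4)
x, v_d = plant_step(x, u=650.0)
```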
At time t, let the current action network parameter be θ, the target action network parameter θ', the current evaluation network parameter ω and the target evaluation network parameter ω'. The functions of the current action network, the target action network, the current evaluation network and the target evaluation network established in step S202 are as follows: the current action network generates a specific exploratory or non-exploratory action a_t based on the current state s_t; the target action network generates, for the subsequent state s_{t+1} given by the environment, the action a_{t+1} used for the predicted value; the current evaluation network calculates the value of the state s_t and the generated action a_t; the target evaluation network generates Q'(s_{t+1}, a_{t+1}, ω'), used for calculating the target value, based on the subsequent state s_{t+1} and the action a_{t+1}. The expression of the target value is:

$$y = R_t + \gamma\, Q'(s_{t+1}, a_{t+1}, \omega') \quad (3)$$

where y is the target value, R_t is the return value at time t, γ is a discount factor with 0 < γ < 1, and Q'(s_{t+1}, a_{t+1}, ω') is the output value of the target evaluation network at time t+1.
In order to add randomness and improve coverage during the learning process, DDPG adds a certain noise N to the selected behavior A, so that the expression of the final action A interacting with the environment is:

$$A = \pi_\theta(s) + N \quad (4)$$

where π_θ(·) is the output function of the current action network, θ is the current action network parameter, and N is a random noise function.
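A minimal sketch of this exploration step, assuming zero-mean Gaussian noise with a slowly decaying standard deviation (the patent does not specify the distribution or schedule of N):

```python
import numpy as np

def explore(policy_action, episode, sigma0=0.1, decay=0.995):
    """Return pi_theta(s) + N with annealed Gaussian exploration noise."""
    sigma = sigma0 * decay ** episode  # shrink exploration as training progresses
    return policy_action + np.random.normal(0.0, sigma)
```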
Further, according to the state space equation of the microgrid, at time t the d-axis component of the load voltage V_d, the per-unit voltage error e and its integral ∫e dt are selected as the current state quantity, i.e., s_t = (V_d, e, ∫e dt); the per-unit value of the inverter port voltage V_td is selected as the current action input quantity, i.e., the action interacting with the environment is a_t = V_td.
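In code, this state can be accumulated alongside the plant simulation; in the sketch below the per-unit base V_BASE and the class structure are illustrative assumptions:

```python
import numpy as np

V_BASE = 600.0  # assumed per-unit voltage base (transformer low-voltage side)

class StateTracker:
    """Builds the RL state s_t = (V_d, e, integral of e) from the plant output."""
    def __init__(self, v_ref_pu=1.0, dt=1e-5):
        self.v_ref_pu, self.dt, self.int_e = v_ref_pu, dt, 0.0

    def observe(self, v_d):
        e = v_d / V_BASE - self.v_ref_pu  # per-unit voltage error
        self.int_e += e * self.dt         # running integral of the error
        return np.array([v_d, e, self.int_e], dtype=np.float32)
```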
The updating of the parameters of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically includes the following steps:

step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

$$J(\theta) = E\!\left[\sum_{n=0}^{N} \gamma^{\,n} R_{t+n}\right] \quad (5)$$

where N is the number of steps from the current state to the termination state, γ^n is the discount (attenuation) applied n steps after time t, J is the objective function, E(·) is the expectation function, and R_{t+n} is the return value at time t+n.

step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

$$L = E\!\left[\left(y - Q(s_t, a_t, \omega)\right)^2\right] \quad (6)$$

where Q(s_t, a_t, ω) is the output value of the current evaluation network at time t.

step S233, updating the target action network parameter θ' and the target evaluation network parameter ω' respectively by soft update; the soft update expressions are:

$$\omega' \leftarrow \tau\omega + (1-\tau)\,\omega',\qquad \theta' \leftarrow \tau\theta + (1-\tau)\,\theta' \quad (7)$$

where τ is the update coefficient, with 0 ≤ τ ≤ 0.01; in this embodiment τ = 0.001.
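Equation (7) is a simple convex blend of parameter sets; a sketch with the parameters held in a dictionary of NumPy arrays (the container layout is an assumption):

```python
def soft_update(target, current, tau=0.001):
    """In-place soft update: p' <- tau * p + (1 - tau) * p' for each parameter."""
    for name in target:
        target[name] = tau * current[name] + (1.0 - tau) * target[name]
    return target
```

Because τ is small, the target networks trail the current networks slowly, which stabilizes the bootstrapped target value of equation (3).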
Wherein, step S3 specifically includes the following steps:
step S301, designing a reward function and a termination function;
step S302, training the microgrid model proposed in step S101 by using the DDPG algorithm in step S202 by using simulation software to obtain a converged voltage.
The purpose of designing the reward function in step S301 is to judge the actions taken by the deep reinforcement learning controller during training. Specifically, a positive reward is obtained when the voltage deviation is within the allowable range, a negative reward when the voltage deviation is large, and a very low negative reward, together with entry into the termination state, when a serious deviation occurs; accordingly, the expression of the reward function is:

[Equation (8): piecewise reward function of the voltage deviation e; original equation image not reproduced]

where e represents the difference between the PCC-point voltage and the reference voltage, and V* represents the per-unit voltage of the PCC point.
Furthermore, a termination function is set in combination with the termination condition of the reward function. The termination function judges whether the system state has entered the termination state: if so, the current round of iteration ends; if not, the iteration of the current round continues. The termination function is represented as:

[Equation (9): termination indicator is_end as a function of the deviation e; original equation image not reproduced]
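Since the original piecewise expressions are not reproduced above, the sketch below only illustrates their stated structure; the deviation bands and reward magnitudes are hypothetical placeholders, not the patent's values:

```python
def reward_and_done(e, small_band=0.02, large_band=0.10):
    """Piecewise reward in the spirit of eq. (8) and termination flag of eq. (9).

    e: per-unit deviation of the PCC voltage from the reference.
    All numeric constants are illustrative assumptions.
    """
    if abs(e) <= small_band:   # deviation within the allowable range
        return 1.0, False      # positive reward, continue the episode
    if abs(e) <= large_band:   # large but tolerable deviation
        return -1.0, False     # negative reward, continue the episode
    return -100.0, True        # serious deviation: very low reward, terminate
```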
in one embodiment, by combining the reward function and the termination function designed in step S301, a requirement Learning Toolbox in MATLAB2020a is used to perform constant voltage control on the microgrid model in the islanding mode proposed in step 1 by using a DDPG algorithm, where the maximum iteration number is T, as shown in fig. 3, and the training of the DDPG algorithm in step S302 specifically includes the following steps:
step S321, setting the microgrid operation data, including the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes (150 in this embodiment); step S321b, setting a loop from t = 1 to the maximum cycle number T in training (T is a positive integer);

step S322, deep learning of the microgrid in the island mode with the DDPG algorithm, including the following steps: step S322a, initializing s = (V_d, e, ∫e dt) as the first state of the current state sequence; step S322b, obtaining the action A = π_θ(s) + N from the current action network based on the first state s; step S322c, executing the action A to obtain the new state s' = (V_d', e', ∫e' dt), obtaining the reward R from the reward function (8) in step S301, and obtaining the termination state information is_end from the termination function (9) in step S301;
step S323, storing sample information, specifically storing the five-tuple sample information {s, A, R, s', is_end} obtained in step S322 into the experience replay set D;
step S324, experience replay sampling, including the following steps: step S324a, letting s = s', that is, assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, ..., m (m = 64 in this embodiment), from the experience replay set D, and calculating the current target value y_j:

$$y_j = \begin{cases} R_j, & \text{is\_end}_j \text{ is true} \\ R_j + \gamma\, Q'\!\left(s'_j,\ \pi_{\theta'}(s'_j),\ \omega'\right), & \text{otherwise} \end{cases} \quad (10)$$

where π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically including the following steps: step S325a, using the mean square error loss function, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network; the mean square error loss function is

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, A_j, \omega)\right)^2 \quad (11)$$

step S325b, using the loss gradient expression, updating the parameter θ of the current action network by gradient back-propagation through the neural network; the loss gradient is

$$\nabla_\theta J \approx \frac{1}{m}\sum_{j=1}^{m} \nabla_a Q(s_j, a, \omega)\Big|_{a=\pi_\theta(s_j)}\, \nabla_\theta\, \pi_\theta(s_j) \quad (12)$$
Step S326, updating the parameters of the target evaluation network and the target action network by using the soft updating mode in the step S323 through parameter transmission;
step S327, judging whether the training is converged or whether the maximum training times is reached; if the reward R obtained by training is converged or reaches the maximum training times, the control of the DDPG algorithm on the microgrid voltage in the island mode is realized, the operation is ended, otherwise, the operation goes to the step S328;
step S328, judging whether the state S' is a termination state or whether the maximum cycle number is reached; if the state S' is the termination state or the maximum number of cycles is reached, return to step S322 a; otherwise, return to step S322b until state S' is the end state or the maximum number of cycles is reached.
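Putting steps S321 to S328 together, the training procedure has the familiar DDPG control flow. The following Python skeleton keeps that flow visible with tiny linear stand-ins for the four networks; the env object (for example, a wrapper around the plant, state-tracker and reward sketches above), the horizon T_MAX, the learning rate and the noise level are all assumptions, not values from the patent:

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)

# Linear stand-ins for the four networks (a real implementation would use
# multi-layer networks, e.g. via the MATLAB Reinforcement Learning Toolbox).
theta = rng.normal(0.0, 0.1, size=4)   # current action network: state (3) + bias
omega = rng.normal(0.0, 0.1, size=5)   # current evaluation network: state + action + bias
theta_t, omega_t = theta.copy(), omega.copy()  # S321a: targets start equal

D = deque(maxlen=100_000)              # experience replay set D
GAMMA, TAU, LR, M, T_MAX = 0.99, 0.001, 1e-3, 64, 500

def pi(s, th):                         # actor output: a = th . [s, 1]
    return float(th @ np.append(s, 1.0))

def q(s, a, w):                        # critic output: Q = w . [s, a, 1]
    return float(w @ np.concatenate([s, [a, 1.0]]))

for episode in range(150):                       # S321a: maximum training episodes
    s = env.reset()                              # S322a: first state (V_d, e, ∫e)
    for t in range(1, T_MAX + 1):                # S321b: loop t = 1..T
        a = pi(s, theta) + rng.normal(0.0, 0.05) # S322b: action plus noise, eq. (4)
        s2, r, done = env.step(a)                # S322c: reward (8), is_end (9)
        D.append((s, a, r, s2, done))            # S323: store the five-tuple
        s = s2                                   # S324a
        if len(D) >= M:
            for sj, aj, rj, sj2, dj in random.sample(D, M):  # S324b
                yj = rj if dj else rj + GAMMA * q(sj2, pi(sj2, theta_t), omega_t)  # eq. (10)
                feat = np.concatenate([sj, [aj, 1.0]])
                omega += LR * (yj - q(sj, aj, omega)) * feat      # S325a: descend eq. (11)
                grad_a = omega[3]                 # dQ/da for the linear critic
                theta += LR * grad_a * np.append(sj, 1.0)         # S325b: ascend eq. (12)
            omega_t = TAU * omega + (1.0 - TAU) * omega_t         # S326: soft update, eq. (7)
            theta_t = TAU * theta + (1.0 - TAU) * theta_t
        if done:                                  # S328: stop the episode on termination
            break
    # S327: a convergence check on the episode reward could stop training early
```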
Through continuous training with the DDPG algorithm in deep reinforcement learning, the reward obtained by the deep reinforcement learning controller during training converges, i.e., the difference between the voltage and the reference voltage becomes smaller and smaller, and the converged microgrid voltage in the island mode is obtained through training.
In summary, compared with the existing voltage control, the micro-grid autonomous operation voltage control method based on deep reinforcement learning provided by the invention has the advantage of high stability, and can effectively maintain the voltage stability of the micro-grid in an island mode.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (10)

1. A micro-grid autonomous operation voltage control method based on deep reinforcement learning is characterized by comprising the following steps:
step S1, establishing a microgrid model when a microgrid isolated island operates;
step S2, establishing a neural network of the voltage control microgrid by adopting a DDPG algorithm in deep reinforcement learning;
and step S3, designing a proper reward function aiming at the microgrid model and the control requirements, and obtaining converged voltage through DDPG algorithm training.
2. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 1, wherein the step S1 of establishing a classical model when a microgrid is in an island operation mode comprises the following steps:
s101, establishing a microgrid electrical model;
step S102, establishing a differential equation matched with the microgrid electrical model in the step S101;
wherein, the microgrid electrical model in step S101 includes: the micro-grid module is connected with the public power grid module through a PCC circuit; and the switch CB is arranged on the PCC and controls the connection or disconnection of the microgrid and the public power grid.
3. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 2, wherein the microgrid module comprises:
the output end of the distributed power generation unit is connected with an inverter unit (1);
a filter unit (2) having one end electrically connected to the inverter unit (1) and the other end connected to a transformer unit (3);
a load cell (4) having one end electrically connected to the transformer cell (3) via a PCC and the other end grounded;
the filter unit (2) consists of a filter resistor R_t and a filter inductance L_t connected in series;
the load unit (4) comprises a load resistor R, a load inductor branch and a load capacitor C connected in parallel;
the load inductor branch consists of a load inductor resistance R_L and a load inductor L connected in series.
4. The deep reinforcement learning-based microgrid autonomous operation voltage control method according to claim 3, characterized in that the differential equation in step S102 is as follows:
$$L_t \frac{di_{tabc}}{dt} = v_{tabc} - v_{abc} - R_t\, i_{tabc},\qquad C \frac{dv_{abc}}{dt} = i_{tabc} - i_{Labc} - \frac{v_{abc}}{R},\qquad L \frac{di_{Labc}}{dt} = v_{abc} - R_L\, i_{Labc} \quad (1)$$

where v_tabc is the three-phase output voltage of the inverter, i_tabc is the output current of the inverter, L_t and R_t are the filter inductance and filter resistance, v_abc is the load-side voltage, i_Labc is the branch current of the load inductor, R is the load resistance, L is the load inductance, C is the load capacitance, and R_L is the load inductor resistance;

when the frequency of the steady-state voltage and current signals is ω_0 = 2πf_0, the Park transformation is performed on the differential equations of step S102, and the d axis is selected in the same direction as the voltage vector so that the q-axis voltage component is zero, giving the differential equations in the dq coordinate system:

$$\frac{dI_{td}}{dt} = -\frac{R_t}{L_t} I_{td} + \omega_0 I_{tq} - \frac{V_d}{L_t} + \frac{V_{td}}{L_t},\qquad \frac{dI_{tq}}{dt} = -\omega_0 I_{td} - \frac{R_t}{L_t} I_{tq},$$
$$\frac{dI_{Ld}}{dt} = -\frac{R_L}{L} I_{Ld} + \frac{V_d}{L},\qquad \frac{dV_d}{dt} = \frac{I_{td}}{C} - \frac{I_{Ld}}{C} - \frac{V_d}{RC} \quad (2)$$

where I_td and I_tq are the d- and q-axis components of i_tabc, V_d is the d-axis component of v_abc, V_td is the d-axis component of v_tabc, and I_Ld is the d-axis component of i_Labc.
5. The microgrid autonomous operation voltage control method based on deep reinforcement learning of claim 4, wherein the step S2 of establishing the neural network of the constant voltage control microgrid by using DDPG algorithm includes the following steps:
step S201, establishing a state space expression of micro-grid island operation;
step S202, establishing an action network, a target action network, a current evaluation network and a target evaluation network by adopting a DDPG algorithm;
step S203, respectively updating the parameters of the action network, the target action network, the current evaluation network and the target evaluation network established in the step S202;
the state space expression of the microgrid isolated island operation in step S201 is as follows:
$$\dot{x} = Ax + Bu,\qquad y = Cx$$

where

$$A = \begin{bmatrix} -R_t/L_t & \omega_0 & 0 & -1/L_t \\ -\omega_0 & -R_t/L_t & 0 & 0 \\ 0 & 0 & -R_L/L & 1/L \\ 1/C & 0 & -1/C & -1/(RC) \end{bmatrix},\qquad B = \begin{bmatrix} 1/L_t \\ 0 \\ 0 \\ 0 \end{bmatrix},\qquad C = [0\ \ 0\ \ 0\ \ 1];$$

from the differential equations in step S102, the state quantity is x = [I_td  I_tq  I_Ld  V_d]^T and the input is u = V_td.
6. The method according to claim 5, wherein at time t, the current network parameter of the action is θ, the target network parameter of the action is θ ', the current evaluation network parameter is ω, and the target evaluation network parameter is ω', and the functions of the current action network, the target action network, the current evaluation network, and the target evaluation network in step S202 are as follows:
the current action network can be based on the current state stGenerating exploratory or unexploredSpecific behavior at
The target action network can give a subsequent state s according to the environmentt+1Generating a used for predictive valuet+1
The current evaluation network is able to calculate a state stAnd the generated behavior atThe corresponding behavioral value;
the target evaluation network can be based on a subsequent state st+1And action at+1Generating Q' for calculating target valuet+1,at+1ω '), and ω').
7. The microgrid autonomous operation voltage control method based on deep reinforcement learning as claimed in claim 6, wherein the parameter updating of the current action network, the target action network, the current evaluation network and the target evaluation network in step S203 specifically includes the following steps:
step S231, updating the current action network parameter θ by optimizing the objective function; the optimization objective function is:

$$J(\theta) = E\!\left[\sum_{n=0}^{N} \gamma^{\,n} R_{t+n}\right] \quad (5)$$

where N is the number of steps from the current state to the termination state, γ^n is the discount (attenuation) applied n steps after time t, J is the objective function, E(·) is the expectation function, and R_{t+n} is the return value at time t+n;
step S232, updating the current evaluation network parameter ω by minimizing the loss function; the loss function is:

$$L = E\!\left[\left(y - Q(s_t, a_t, \omega)\right)^2\right] \quad (6)$$

where Q(s_t, a_t, ω) is the output value of the current evaluation network at time t;
step S233, updating the target action network parameter theta 'and the target evaluation network parameter omega' respectively in a soft updating mode; the expression of the soft update mode is as follows:
ω'←τω+(1-τ)ω'
θ'←τθ+(1-τ)θ'
where τ is the update coefficient, in the range 0 ≤ τ ≤ 0.01.
8. The method according to claim 7, wherein the step S3 specifically includes the following steps:
step S301, designing a reward function and a termination function;
step S302, training the microgrid model provided in step S101 by using the DDPG algorithm in step S202 by using simulation software to obtain a voltage control strategy.
9. The method for controlling the autonomous operation voltage of the microgrid based on the deep reinforcement learning method of claim 8, wherein the expression of the reward function is as follows:
[Equation (8): piecewise reward function of the voltage deviation e; original equation image not reproduced]

where e represents the difference between the PCC-point voltage and the reference voltage, and V* represents the per-unit voltage of the PCC point;

the termination function is represented by:

[Equation (9): termination indicator is_end as a function of the deviation e; original equation image not reproduced]
10. the method for controlling the autonomous operation voltage of the microgrid based on the deep reinforcement learning of claim 9, wherein the training of the DDPG algorithm in the step S302 specifically comprises the following steps:
step S321, setting the microgrid operation data, including the following steps: step S321a, randomly initializing the neural network parameters, including the current action network parameter θ and the current evaluation network parameter ω, with the target action network parameter θ' = θ and the target evaluation network parameter ω' = ω, emptying the experience replay set D, and setting the maximum number of training episodes; step S321b, setting a loop from t = 1 to the maximum cycle number T in training, where T is a positive integer;
step S322, deep learning of the microgrid in the island mode by using a DDPG algorithm, comprising the following steps: step S322a, initializing S to be the first state of the current state sequence; step S322b, obtaining behavior A based on the first state S in the current action network; step S322c, executing action A to obtain a new state S', obtaining a reward R according to the reward function (8) in step S301, and obtaining termination state information is _ end according to the termination function (9) in step S301;
step S323, storing sample information, namely storing the five-tuple sample information {s, A, R, s', is_end} obtained in step S322 into the experience replay set D;
step S324, experience replay sampling, including the following steps: step S324a, letting s = s', assigning the new state s' to the first state s; step S324b, sampling m samples {s_j, A_j, R_j, s'_j, is_end_j}, j = 1, 2, ..., m, from the experience replay set D, and calculating the current target value y_j:

$$y_j = \begin{cases} R_j, & \text{is\_end}_j \text{ is true} \\ R_j + \gamma\, Q'\!\left(s'_j,\ \pi_{\theta'}(s'_j),\ \omega'\right), & \text{otherwise} \end{cases} \quad (10)$$

where π_θ'(·) is the output function of the target action network;
step S325, updating the current network parameters, specifically including the following steps: step S325a, using the mean square error loss function, updating the parameter ω of the current evaluation network by gradient back-propagation through the neural network; the mean square error loss function is

$$L = \frac{1}{m}\sum_{j=1}^{m}\left(y_j - Q(s_j, A_j, \omega)\right)^2 \quad (11)$$

step S325b, using the loss gradient expression, updating the parameter θ of the current action network by gradient back-propagation through the neural network; the loss gradient is

$$\nabla_\theta J \approx \frac{1}{m}\sum_{j=1}^{m} \nabla_a Q(s_j, a, \omega)\Big|_{a=\pi_\theta(s_j)}\, \nabla_\theta\, \pi_\theta(s_j) \quad (12)$$
Step S326, updating the parameters of the target evaluation network and the target action network by using the soft updating mode in the step S323 through parameter transmission;
step S327, judging whether the training is converged or whether the maximum training times is reached; if the reward R obtained by training is converged or reaches the maximum training times, the control of the DDPG algorithm on the microgrid voltage in the island mode is realized, the operation is ended, otherwise, the operation goes to the step S328;
step S328, determining whether the state S' is a termination state or whether the maximum cycle number is reached; if the state S' is the termination state or the maximum number of cycles is reached, return to step S322 a; otherwise, the process returns to step S322b until state S' is the end state or the maximum number of cycles is reached.
CN202210021329.0A 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning Pending CN114336759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021329.0A CN114336759A (en) 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021329.0A CN114336759A (en) 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114336759A true CN114336759A (en) 2022-04-12

Family

ID=81026351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021329.0A Pending CN114336759A (en) 2022-01-10 2022-01-10 Micro-grid autonomous operation voltage control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114336759A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116014772A (en) * 2022-11-25 2023-04-25 国网上海市电力公司 Battery energy storage system control method based on improved virtual synchronous machine
CN118263842A (en) * 2024-05-29 2024-06-28 南京师范大学 Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365056A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network voltage regulation optimization method based on DDPG
CN110365057A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network peak regulation scheduling optimization method based on reinforcement learning
CN110535146A (en) * 2019-08-27 2019-12-03 哈尔滨工业大学 The Method for Reactive Power Optimization in Power of Policy-Gradient Reinforcement Learning is determined based on depth
CN112187074A (en) * 2020-09-15 2021-01-05 电子科技大学 Inverter controller based on deep reinforcement learning
CN113780688A (en) * 2021-11-10 2021-12-10 中国电力科学研究院有限公司 Optimized operation method, system, equipment and medium of electric heating combined system
CN113872198A (en) * 2021-09-29 2021-12-31 电子科技大学 Active power distribution network fault recovery method based on reinforcement learning method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365056A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network voltage regulation optimization method based on DDPG
CN110365057A (en) * 2019-08-14 2019-10-22 南方电网科学研究院有限责任公司 Distributed energy participation power distribution network peak regulation scheduling optimization method based on reinforcement learning
CN110535146A (en) * 2019-08-27 2019-12-03 哈尔滨工业大学 The Method for Reactive Power Optimization in Power of Policy-Gradient Reinforcement Learning is determined based on depth
CN112187074A (en) * 2020-09-15 2021-01-05 电子科技大学 Inverter controller based on deep reinforcement learning
CN113872198A (en) * 2021-09-29 2021-12-31 电子科技大学 Active power distribution network fault recovery method based on reinforcement learning method
CN113780688A (en) * 2021-11-10 2021-12-10 中国电力科学研究院有限公司 Optimized operation method, system, equipment and medium of electric heating combined system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIAJUN DUAN et al.: "Deep-Reinforcement-Learning-Based Autonomous Voltage Control for Power Grid Operations", IEEE Transactions on Power Systems, vol. 35, no. 1, 12 September 2019, pages 814, XP011765957, DOI: 10.1109/TPWRS.2019.2941134 *
LILONG XIE: "Research on Autonomous Operation Control of Microgrid Based on Deep Reinforcement Learning", 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), 25 October 2021, pages 2503-2507, XP034090455, DOI: 10.1109/EI252483.2021.9713298 *
苏诗慧; 雷勇; 李永凯; 朱英伟: "Medium- and short-term photovoltaic power forecasting based on an improved DDPG algorithm" (基于改进DDPG算法的中短期光伏发电功率预测), Semiconductor Optoelectronics (半导体光电), no. 05, 15 October 2020, pages 116-122 *
龚锦霞; 刘艳敏: "Coordinated optimization of active distribution networks based on the deep deterministic policy gradient algorithm" (基于深度确定策略梯度算法的主动配电网协调优化), Automation of Electric Power Systems (电力系统自动化), no. 06, 25 March 2020, pages 155-167 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116014772A (en) * 2022-11-25 2023-04-25 国网上海市电力公司 Battery energy storage system control method based on improved virtual synchronous machine
CN118263842A (en) * 2024-05-29 2024-06-28 南京师范大学 Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning
CN118263842B (en) * 2024-05-29 2024-08-20 南京师范大学 Direct-current micro-grid coordinated control method based on multi-agent system and deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN114336759A (en) Micro-grid autonomous operation voltage control method based on deep reinforcement learning
CN113113928B (en) Flexible-direct system direct-current bus voltage control method and device based on deep reinforcement learning
CN107171328B (en) A kind of modeling of Distributed Power Flow controller and emulation mode based on ADPSS
CN110880774B (en) Self-adaptive adjustment inverter controller
CN108736519B (en) Self-adaptive control method and device for virtual synchronous generator of photovoltaic power station
Saadatmand et al. Adaptive critic design-based reinforcement learning approach in controlling virtual inertia-based grid-connected inverters
CN111525581B (en) Voltage control method for micro-grid system with unbalanced load
Xiong et al. Deep reinforcement learning based parameter self-tuning control strategy for VSG
Lü et al. Energy economy optimization and comprehensive performance improvement for PEMFC/LIB hybrid system based on hierarchical optimization
Dong et al. Output control method of microgrid VSI control network based on dynamic matrix control algorithm
CN117097189A (en) Control method of photovoltaic inverter
Liu et al. Power distribution strategy based on state of charge balance for hybrid energy storage systems in all-electric ships
CN117990986B (en) Current transformer impedance measurement method and device, electronic equipment and medium
CN113224797B (en) PI parameter configuration method for voltage and current double closed-loop control system of inverter
CN106950831A (en) A kind of reactive-load compensation method for offline optimization/switch online
CN112086996B (en) Agent-based improved droop control method for parallel inverter
Ren et al. Multivariable control method in STATCOM application for performance improvement
CN113258614B (en) Island micro-grid elastic distributed frequency and voltage recovery control method
CN107800145A (en) STATCOM control system based on Two-Degree-of-Freedom Internal Model Control
CN114374334A (en) Harmonic power control method of multi-inverter parallel system
Wang et al. Design and implementation of an LCL grid‐connected inverter based on capacitive current fractional proportional–integral feedback strategy
CN117559528B (en) Micro-grid stability domain determining method and system based on micro-grid reduced order model
CN116150969B (en) Stability analysis method for optical storage-virtual synchronous generator
CN117394421B (en) Improved active disturbance rejection control method of energy storage converter based on supercoiled sliding mode observer
kumar Mahto et al. Design of Controller for three-phase Grid-Connected Inverter Using Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination