CN115169957A - Power distribution network scheduling method, device and medium based on deep reinforcement learning - Google Patents
Info
- Publication number
- CN115169957A CN115169957A CN202210893449.XA CN202210893449A CN115169957A CN 115169957 A CN115169957 A CN 115169957A CN 202210893449 A CN202210893449 A CN 202210893449A CN 115169957 A CN115169957 A CN 115169957A
- Authority
- CN
- China
- Prior art keywords
- distribution network
- power
- power distribution
- scheduled
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 77
- 230000002787 reinforcement Effects 0.000 title claims abstract description 27
- 230000006870 function Effects 0.000 claims abstract description 75
- 230000008569 process Effects 0.000 claims abstract description 46
- 230000009471 action Effects 0.000 claims abstract description 44
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 39
- 238000012549 training Methods 0.000 claims abstract description 24
- 238000004364 calculation method Methods 0.000 claims abstract description 17
- 238000004146 energy storage Methods 0.000 claims description 62
- 230000005611 electricity Effects 0.000 claims description 26
- 238000010248 power generation Methods 0.000 claims description 22
- 239000003795 chemical substances by application Substances 0.000 claims description 18
- 238000004590 computer program Methods 0.000 claims description 11
- 239000000446 fuel Substances 0.000 claims description 7
- 230000005540 biological transmission Effects 0.000 claims description 6
- 238000002347 injection Methods 0.000 claims description 6
- 239000007924 injection Substances 0.000 claims description 6
- 238000007599 discharging Methods 0.000 claims description 5
- 230000007704 transition Effects 0.000 claims description 5
- 239000002283 diesel fuel Substances 0.000 claims description 3
- 238000005070 sampling Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 238000011084 recovery Methods 0.000 description 3
- 239000000243 solution Substances 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 230000007774 longterm Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 238000012502 risk assessment Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000007613 environmental effect Effects 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 230000001771 impaired effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000012913 prioritisation Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
- H02J3/06—Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/10—The dispersed energy generation being of fossil origin, e.g. diesel generators
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a power distribution network scheduling method, device and medium based on deep reinforcement learning. The method comprises the following steps: for a power distribution network to be scheduled, constructing operation constraints and cost functions corresponding to a plurality of devices, constraints and a cost function for electric energy transactions between the power distribution network to be scheduled and a main network, and risk constraints on the node voltages and branch powers of the power distribution network, thereby obtaining a scheduling model of the power distribution network to be scheduled; acquiring state variables, action variables and a reward function and constructing a Markov decision process; training a strategy network corresponding to the Markov decision process through the SAC algorithm in combination with basic data; and scheduling the power distribution network to be scheduled based on the output of the trained strategy network. Compared with the prior art, by constructing a Markov decision process, the strategy network trained through the SAC algorithm adapts to online operation and complex calculation, achieves millisecond-level computation, and offers markedly improved generalization capability.
Description
Technical Field
The invention relates to the field of power systems, in particular to a power distribution network scheduling method, device and medium based on deep reinforcement learning.
Background
Risk assessment of a power system is a static security analysis method that integrates the probability and the severity of operating states and can quantitatively reflect the operational safety of the system. However, in the prior art, risk-assessment-based power distribution network scheduling does not account for the uncertainty of renewable energy generation and load; furthermore, because the calculations involved in the scheduling process are highly non-convex and difficult to express explicitly, conventional methods are difficult to apply and the resulting solutions generalize poorly.
Disclosure of Invention
The invention provides a power distribution network scheduling method, device and medium based on deep reinforcement learning, and aims to solve the technical problem of poor generalization capability in the prior art.
In order to solve the technical problem, an embodiment of the present invention provides a power distribution network scheduling method based on deep reinforcement learning, including:
constructing operation constraints and cost functions corresponding to a plurality of devices respectively for a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transactions of the power distribution network to be scheduled and a main network, constructing node voltage and branch power risk constraints of the power distribution network to be scheduled, and obtaining a scheduling model (further, an economic scheduling model) of the power distribution network to be scheduled;
acquiring a state variable, an action variable and a reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
training a strategy network corresponding to the Markov decision process by a SAC algorithm in combination with basic data in the Markov decision process;
and scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
As a preferred scheme, the training of the policy network corresponding to the markov decision process by the SAC algorithm specifically includes:
updating parameters of the SAC algorithm through an ASAM algorithm and a PER algorithm, and training an intelligent agent and a strategy network corresponding to the Markov decision process through the updated SAC algorithm; wherein the parameters of the SAC algorithm comprise soft Q network parameters, temperature coefficients and network parameters of the policy network.
Preferably, the plurality of devices comprise not less than one diesel engine set and not less than one energy storage system;
the operation constraint of the diesel engine set is as follows:
$\underline{P}^G_i \le P^G_{i,t} \le \overline{P}^G_i, \quad \forall i \in \mathcal{N}_G, \ \forall t \in \mathcal{T}$

wherein $P^G_{i,t}$ is the active power output of the ith diesel engine set in the power distribution network to be scheduled in period t, $\underline{P}^G_i$ is the minimum active power of the ith diesel engine set of the power distribution network to be scheduled, $\overline{P}^G_i$ is the maximum active power of the ith diesel engine set of the power distribution network to be scheduled, $\mathcal{N}_G$ is the set of all nodes connected to a diesel engine set, and $\mathcal{T}$ is the set of all periods in the scheduling cycle;
the cost function of the diesel engine set is as follows:
wherein the fuel cost is the sum of the fuel costs of all the diesel engine sets in the power distribution network to be scheduled in period t, the carbon emission cost is the sum of the carbon emission costs of all the diesel engine sets in the power distribution network to be scheduled in period t, $a_{G,i}$, $b_{G,i}$ and $c_{G,i}$ are the fuel cost coefficients of the ith diesel engine set, and $d_{G,i}$ and $e_{G,i}$ are the carbon emission cost coefficients of the ith diesel engine set.
Preferably, the operation constraints of the energy storage system are as follows:
wherein $P^E_{i,t}$ is the active power output of the ith energy storage system in the power distribution network to be scheduled in period t, $\overline{P}^{E,C}_i$ is the maximum charging power of the ith energy storage system in the power distribution network to be scheduled, $\overline{P}^{E,D}_i$ is the maximum discharging power of the ith energy storage system in the power distribution network to be scheduled, $\mathcal{N}_E$ is the set of all nodes connected to an energy storage system in the power distribution network to be scheduled, $SOC_{i,t}$ is the state of charge of the ith energy storage system in the power distribution network to be scheduled in period t, $\underline{SOC}_{i,t}$ is the minimum state of charge allowed for the ith energy storage system in the power distribution network to be scheduled in period t, $\overline{SOC}_{i,t}$ is the maximum state of charge allowed for the ith energy storage system in the power distribution network to be scheduled in period t, $\eta_C$ is the charging efficiency of the energy storage system, $\eta_D$ is the discharging efficiency of the energy storage system, and $E_i$ is the capacity of the ith energy storage system in the power distribution network to be scheduled;
the cost function of the energy storage system is:
wherein the charging and discharging cost is the sum of the charging and discharging costs of all energy storage systems in the power distribution network to be scheduled in period t, and $a_{E,i}$ is the cost coefficient of the ith energy storage system in the power distribution network to be scheduled.
As a preferred scheme, the cost function of the transaction between the power distribution network to be scheduled and the main network electric energy is as follows:
wherein the transaction cost is the cost of the power distribution network to be scheduled purchasing electricity from the main network in period t, $P^M_t > 0$ is the power purchased by the power distribution network to be scheduled from the main network in period t, $P^M_t < 0$ is the power sold by the power distribution network to be scheduled to the main network in period t, $\alpha_{M,t}$ is the real-time electricity price in period t, and the price spread coefficient is the proportional difference between the main network's electricity purchase/sale prices and the real-time price;
the constraint of the electric energy transaction between the power distribution network to be scheduled and the main network is as follows:
wherein $Q^M_t$ is the reactive power flowing from the main network to the power distribution network to be scheduled in period t, $S^M_t$ is the apparent power flowing from the main network to the power distribution network to be scheduled in period t, $\underline{S}^M$ is the minimum capacity of the transmission line, and $\overline{S}^M$ is the maximum capacity of the transmission line.
As a preferred scheme, the constructing risk constraints of the node voltage and the branch power of the power distribution network to be scheduled includes:
constructing an internal power flow calculation model of the power distribution network to be scheduled:
wherein $P_{i,t}$ is the net injected active power at node i in period t, $Q_{i,t}$ is the net injected reactive power at node i in period t, $P_{ij,t}$ is the active power flowing on branch ij in period t, $Q_{ij,t}$ is the reactive power flowing on branch ij in period t, N is the set of all nodes in the power distribution network to be scheduled, $B_{ij}$ characterizes the power flow on branch ij, $S_{ij,t}$ is the apparent power flowing on branch ij in period t, $P^{PV}_{i,t}$ is the active power of photovoltaic generation at node i in period t, $P^W_{i,t}$ is the active power of wind generation at node i in period t, $P^L_{i,t}$ is the active power of the load at node i in period t, $Q^G_{i,t}$ is the reactive power generated by the diesel engine set at node i in period t, $Q^W_{i,t}$ is the reactive power of wind generation at node i in period t, $Q^L_{i,t}$ is the reactive power of the load at node i in period t, and $N_0$ is the set of nodes at which the power distribution network is connected to the main network;
constructing risk constraints of node voltage amplitude and branch apparent power of the power distribution network to be scheduled on the basis of the internal load flow calculation model, specifically:
wherein the first quantity is the node voltage amplitude risk of the power distribution network to be scheduled in period t, the second quantity is the branch apparent power risk of the power distribution network to be scheduled in period t, $\varepsilon_V$ is the node voltage amplitude risk threshold of the power distribution network to be scheduled, $\varepsilon_S$ is the branch apparent power risk threshold of the power distribution network to be scheduled, $w_i$ is the weight of node i, and $w_{ij}$ is the weight of branch ij.
As a preferred scheme, the obtaining of the state variable, the action variable, and the reward function of the scheduling model specifically includes:
defining the state variable $s_t$ of the scheduling model in period t and the action variable $a_t$ in period t:
wherein $P^W_{i,t}$ is the active power of wind generation at node i in period t, $P^{PV}_{i,t}$ is the active power of photovoltaic generation at node i in period t, $P^L_{i,t}$ is the load at node i in period t, $\alpha_{M,t}$ is the real-time electricity price, $SOC_{i,t}$ is the state of charge of the energy storage system at node i in period t, $P^G_{i,t}$ is the active power of the diesel engine set at node i in period t, and $P^E_{i,t}$ is the charging/discharging power of the energy storage system at node i in period t;
defining a reward function for the scheduling model:
wherein $r(s_t, a_t)$ is the reward obtained by the agent for taking action $a_t$ in state $s_t$, the first term is the weighted total operating cost in period t, the second term is the weighted total penalty in period t, and $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ are the weights of the reward components.
As a preferred scheme, the building a markov decision process for the scheduling model based on the state variable, the action variable and the reward function specifically includes:
constructing the Markov decision process according to the following equation:
$(\mathcal{S}, \mathcal{A}, \mathcal{P}, r)$

wherein $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $\mathcal{P}$ is the state transition probability function, and r is the reward function.
Correspondingly, the embodiment of the invention also provides a power distribution network scheduling device based on deep reinforcement learning, which comprises the following components:
the system comprises a constraint module, a master network and a scheduling module, wherein the constraint module is used for constructing operation constraints and cost functions corresponding to a plurality of devices of a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transactions of the power distribution network to be scheduled and a master network, constructing risk constraints of node voltage and branch power of the power distribution network to be scheduled, and obtaining a scheduling model of the power distribution network to be scheduled;
the Markov decision process building module is used for obtaining a state variable, an action variable and a reward function of the scheduling model and building a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
the training module is used for training a strategy network corresponding to the Markov decision process through a SAC algorithm in combination with basic data in the Markov decision process;
and the scheduling module is used for scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
Correspondingly, the embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium comprises a stored computer program; when the computer program runs, the equipment where the computer readable storage medium is located is controlled to execute the power distribution network scheduling method based on deep reinforcement learning.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a power distribution network scheduling method and device based on deep reinforcement learning and a computer readable storage medium, wherein the method comprises the following steps: constructing operation constraints and cost functions respectively corresponding to a plurality of devices for a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transactions between the power distribution network to be scheduled and a main network, constructing node voltage and branch power risk constraints of the power distribution network to be scheduled, and obtaining a scheduling model of the power distribution network to be scheduled; acquiring state variables, action variables and a reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variables, the action variables and the reward function; training a strategy network corresponding to the Markov decision process through a SAC algorithm in combination with basic data in the Markov decision process; and scheduling the power distribution network to be scheduled based on the output of the trained strategy network. Compared with the prior art, the method has the advantages that a Markov decision process is built, the strategy network trained through the SAC algorithm can adapt to online operation and complex calculation, millisecond-level rapid calculation is realized, and generalization capability is obviously improved.
Drawings
FIG. 1: the invention provides a flow diagram of an embodiment of a power distribution network scheduling method based on deep reinforcement learning.
FIG. 2: the invention provides a charge state schematic diagram of an embodiment of a power distribution network energy storage system.
FIG. 3: the invention provides a schematic diagram of a training process of an embodiment of a policy network.
FIG. 4: the invention provides a schematic structural diagram of an embodiment of a power distribution network dispatching device based on deep reinforcement learning.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment is as follows:
referring to fig. 1, fig. 1 is a power distribution network scheduling method based on deep reinforcement learning according to an embodiment of the present invention, including steps S1 to S4, where:
step S1, constructing operation constraints and cost functions respectively corresponding to a plurality of devices for a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transaction between the power distribution network to be scheduled and a main network, constructing risk constraints of node voltage and branch power of the power distribution network to be scheduled, and obtaining a scheduling model (further, an economic scheduling model) of the power distribution network to be scheduled.
In this embodiment, the plurality of devices includes no less than one diesel unit and no less than one energy storage system;
the operation constraint of the diesel engine set is as follows:
$\underline{P}^G_i \le P^G_{i,t} \le \overline{P}^G_i, \quad \forall i \in \mathcal{N}_G, \ \forall t \in \mathcal{T}$

wherein $P^G_{i,t}$ is the active power output of the ith diesel engine set (i.e., the diesel engine set at the ith node) in the power distribution network to be scheduled in period t, $\underline{P}^G_i$ is the minimum active power of the ith diesel engine set of the power distribution network to be scheduled, $\overline{P}^G_i$ is the maximum active power of the ith diesel engine set of the power distribution network to be scheduled, $\mathcal{N}_G$ is the set of all nodes connected to a diesel engine set, and $\mathcal{T}$ is the set of all periods in the scheduling cycle.
The cost function of the diesel engine set is as follows:
wherein the fuel cost is the sum of the fuel costs of all the diesel engine sets in the power distribution network to be scheduled in period t, the carbon emission cost is the sum of the carbon emission costs of all the diesel engine sets in the power distribution network to be scheduled in period t, $a_{G,i}$, $b_{G,i}$ and $c_{G,i}$ are the fuel cost coefficients of the ith diesel engine set, and $d_{G,i}$ and $e_{G,i}$ are the carbon emission cost coefficients of the ith diesel engine set.
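As a non-limiting illustration, the diesel cost terms can be evaluated as in the following Python sketch. The quadratic fuel-cost form $a P^2 + b P + c$ and the emission-cost form $d P^2 + e P$ assumed here are illustrative only; the exact functional forms and coefficient values are those of the formulas of this embodiment.

```python
# Illustrative sketch (not the embodiment's exact formulas): per-period fuel and
# carbon-emission costs of the diesel engine sets, assuming quadratic cost forms.

def diesel_costs(p_g, fuel_coeffs, carbon_coeffs):
    """p_g: {unit i: active power output in period t};
    fuel_coeffs: {i: (a, b, c)}; carbon_coeffs: {i: (d, e)} (hypothetical values)."""
    fuel = carbon = 0.0
    for i, p in p_g.items():
        a, b, c = fuel_coeffs[i]
        d, e = carbon_coeffs[i]
        fuel += a * p ** 2 + b * p + c      # assumed fuel-cost form
        carbon += d * p ** 2 + e * p        # assumed carbon-emission-cost form
    return fuel, carbon

fuel_cost, carbon_cost = diesel_costs(
    p_g={1: 0.8, 2: 0.5},
    fuel_coeffs={1: (0.02, 1.5, 0.3), 2: (0.03, 1.2, 0.2)},
    carbon_coeffs={1: (0.01, 0.4), 2: (0.015, 0.3)},
)
```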
The operation constraint of the energy storage system is as follows:
wherein $P^E_{i,t}$ is the active power output of the ith energy storage system (i.e., the energy storage system at the ith node) in the power distribution network to be scheduled in period t ($P^E_{i,t} > 0$ indicates discharging and $P^E_{i,t} < 0$ indicates charging), $\overline{P}^{E,C}_i$ is the maximum charging power of the ith energy storage system in the power distribution network to be scheduled, $\overline{P}^{E,D}_i$ is the maximum discharging power of the ith energy storage system in the power distribution network to be scheduled, both being greater than 0, $\mathcal{N}_E$ is the set of all nodes connected to an energy storage system in the power distribution network to be scheduled, $SOC_{i,t}$ is the state of charge of the ith energy storage system in the power distribution network to be scheduled in period t, $\underline{SOC}_{i,t}$ is the minimum state of charge allowed for the ith energy storage system in the power distribution network to be scheduled in period t, $\overline{SOC}_{i,t}$ is the maximum state of charge allowed for the ith energy storage system in the power distribution network to be scheduled in period t, $\eta_C$ is the charging efficiency of the energy storage system, $\eta_D$ is the discharging efficiency of the energy storage system ($\eta_C, \eta_D \in [0,1]$), and $E_i$ is the capacity of the ith energy storage system in the power distribution network to be scheduled.

Among these constraints, the first characterizes the capacity limit of the converter to which the energy storage system is connected. The second avoids overcharging and overdischarging, which would shorten the life of the energy storage system. The third characterizes the relationship between the state of charge in the next period and the state of charge and the charging/discharging power in the current period. To facilitate scheduling of the next cycle, the SOC at the last period of each scheduling cycle should return to its initial value, i.e., $SOC_{i,0} = SOC_{i,T}$.

Moreover, the minimum and maximum allowed states of charge $\underline{SOC}_{i,t}$ and $\overline{SOC}_{i,t}$ vary with the current period t rather than remaining constant, as illustrated by the slopes of segments A–B, E–D, C–D and A–F in Fig. 2.
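As a non-limiting illustration, the following Python sketch checks the converter and SOC limits and advances the state of charge to the next period. It assumes a one-hour period and the sign convention that a positive power means discharging; the exact SOC transition equation is the one of this embodiment, so the recursion below is an assumption.

```python
# Illustrative sketch of the energy-storage constraints and SOC transition.
# Assumes P^E > 0 is discharging, P^E < 0 is charging, and a one-hour period.

def next_soc(soc, p_e, e_cap, eta_c=0.95, eta_d=0.95, dt=1.0):
    if p_e >= 0:                                 # discharging
        return soc - p_e * dt / (eta_d * e_cap)
    return soc - eta_c * p_e * dt / e_cap        # charging (p_e < 0 raises SOC)

def storage_feasible(p_e, soc, p_ch_max, p_dis_max, soc_min, soc_max):
    return (-p_ch_max <= p_e <= p_dis_max) and (soc_min <= soc <= soc_max)

soc_t1 = next_soc(soc=0.5, p_e=-0.2, e_cap=2.0)  # charging at 0.2 MW
print(storage_feasible(-0.2, soc_t1, p_ch_max=0.5, p_dis_max=0.5,
                       soc_min=0.1, soc_max=0.9))
```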
The cost function of the energy storage system is:
wherein the charging and discharging cost is the sum of the charging and discharging costs of all energy storage systems in the power distribution network to be scheduled in period t, and $a_{E,i}$ is the cost coefficient of the ith energy storage system in the power distribution network to be scheduled.
Further, the cost function of the electric energy transaction between the power distribution network to be scheduled and the main network is as follows:
wherein the transaction cost is the cost of the power distribution network to be scheduled purchasing electricity from the main network in period t, $P^M_t > 0$ is the power purchased by the power distribution network to be scheduled from the main network in period t, $P^M_t < 0$ is the power sold by the power distribution network to be scheduled to the main network in period t, $\alpha_{M,t}$ is the real-time electricity price in period t, and the price spread coefficient is the proportional difference between the main network's electricity purchase/sale prices and the real-time price. Its purpose is to make the price at which the power distribution network sells electricity to the main network lower than the price at which it purchases electricity from the main network, so as to promote the consumption of power inside the power distribution network and reduce the negative influence of disturbances inside the power distribution network on the main network.
The constraint of the transaction between the power distribution network to be scheduled and the main network electric energy is as follows:
wherein $Q^M_t$ is the reactive power flowing from the main network to the power distribution network to be scheduled in period t, $S^M_t$ is the apparent power flowing from the main network to the power distribution network to be scheduled in period t, $\underline{S}^M$ is the minimum capacity of the transmission line, and $\overline{S}^M$ is the maximum capacity of the transmission line.
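As a non-limiting illustration, the power-exchange cost and the tie-line capacity check can be evaluated as in the following Python sketch. It assumes the power distribution network buys at $(1 + \sigma)\,\alpha_{M,t}$ and sells at $(1 - \sigma)\,\alpha_{M,t}$, where $\sigma$ stands for the price spread coefficient; the exact pricing rule is given by the formula of this embodiment.

```python
# Illustrative sketch of the main-network exchange cost and tie-line constraint.

def exchange_cost(p_m, price, sigma=0.1):
    """p_m > 0: buying from the main network; p_m < 0: selling to the main network."""
    if p_m >= 0:
        return (1.0 + sigma) * price * p_m
    return (1.0 - sigma) * price * p_m          # negative cost = revenue

def exchange_feasible(p_m, q_m, s_min, s_max):
    s = (p_m ** 2 + q_m ** 2) ** 0.5            # apparent power on the tie line
    return s_min <= s <= s_max

print(exchange_cost(0.4, price=60.0), exchange_cost(-0.4, price=60.0))
print(exchange_feasible(0.4, 0.1, s_min=0.0, s_max=1.0))
```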
And constructing risk constraints of the node voltage and branch power of the power distribution network to be scheduled, including:
constructing an internal power flow calculation model of the power distribution network to be scheduled:
wherein $P_{i,t}$ is the net injected active power at node i in period t, $Q_{i,t}$ is the net injected reactive power at node i in period t, $P_{ij,t}$ is the active power flowing on branch ij (branch ij being the branch from node i to node j) in period t, $Q_{ij,t}$ is the reactive power flowing on branch ij in period t, N is the set of all nodes in the power distribution network to be scheduled, $B_{ij}$ characterizes the power flow on branch ij, $S_{ij,t}$ is the apparent power flowing on branch ij in period t, $P^{PV}_{i,t}$ is the active power of photovoltaic generation at node i in period t, $P^W_{i,t}$ is the active power of wind generation at node i in period t, $P^L_{i,t}$ is the active power of the load at node i in period t, $Q^G_{i,t}$ is the reactive power generated by the diesel engine set at node i in period t, $Q^W_{i,t}$ is the reactive power of wind generation at node i in period t, $Q^L_{i,t}$ is the reactive power of the load at node i in period t, and $N_0$ is the set of nodes at which the power distribution network is connected to the main network; if node i is not connected to the corresponding device, the corresponding generation, storage or load term is 0.
The node voltage calculation formula is as follows:
wherein $V_{j,t}$ is the voltage amplitude of node j in period t, $V_{i,t}$ is the voltage amplitude of node i in period t, $r_{ij}$ and $x_{ij}$ are respectively the resistance and reactance of branch ij, and $V_0$ is the node voltage at the connection with the main network, which is a preset value.
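As a non-limiting illustration, the following Python sketch propagates node voltages along a radial feeder. It assumes the commonly used linearized DistFlow relation $V_{j,t} = V_{i,t} - (r_{ij} P_{ij,t} + x_{ij} Q_{ij,t}) / V_0$, which uses exactly the quantities defined above; the exact node voltage formula is the one of this embodiment, so this relation is an assumption.

```python
# Illustrative sketch: downstream node voltage under a linearized DistFlow
# assumption, in per-unit values.

def downstream_voltage(v_i, p_ij, q_ij, r_ij, x_ij, v0=1.0):
    return v_i - (r_ij * p_ij + x_ij * q_ij) / v0

v0 = 1.0                      # preset voltage at the main-network connection (p.u.)
v1 = downstream_voltage(v0, p_ij=0.6, q_ij=0.2, r_ij=0.01, x_ij=0.02)
v2 = downstream_voltage(v1, p_ij=0.3, q_ij=0.1, r_ij=0.01, x_ij=0.02)
print(v1, v2)
```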
Constructing risk constraints of node voltage amplitude and branch apparent power of the power distribution network to be scheduled on the basis of the internal load flow calculation model, specifically:
wherein the first quantity is the node voltage amplitude risk of the power distribution network to be scheduled in period t, the second quantity is the branch apparent power risk of the power distribution network to be scheduled in period t, $\varepsilon_V$ is the node voltage amplitude risk threshold of the power distribution network to be scheduled, $\varepsilon_S$ is the branch apparent power risk threshold of the power distribution network to be scheduled, $w_i$ is the weight of node i, and $w_{ij}$ is the weight of branch ij, the weights satisfying a normalization condition.

The network-level risks aggregate, respectively, the voltage amplitude risk at each node i in period t and the apparent power risk of each branch ij, the latter summed over the set of all branches in the power distribution network.
The node voltage magnitude risk and branch apparent power risk are defined as integrating the product of the probability density function and the severity function:
wherein $PDF(V_{i,t})$ and $PDF(S_{ij,t})$ are respectively the probability density functions of the node voltage amplitude $V_{i,t}$ and the branch apparent power $S_{ij,t}$, which can be obtained by probabilistic power flow calculation, for example by the point estimation method combined with a Gram-Charlier expansion; $Sev_V(V_{i,t})$ and $Sev_S(S_{ij,t})$ are the severity functions of the node voltage amplitude $V_{i,t}$ and the branch apparent power $S_{ij,t}$, defined in terms of the following limits: $\underline{V}$ and $\overline{V}$ are respectively the lower and upper limits of the node voltage amplitude, and $\underline{S}$ and $\overline{S}$ are respectively the lower and upper limits of the branch apparent power.
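As a non-limiting illustration, the risk index — the expectation of the severity over the probability distribution of the monitored quantity — can be approximated numerically as in the following Python sketch. The Monte-Carlo samples stand in for the probabilistic power-flow result, and the severity function used here (zero inside the limits, growing linearly with the violation outside them) is an illustrative assumption rather than the exact severity definition of this embodiment.

```python
# Illustrative sketch: risk of a node voltage amplitude as the sample average of
# a severity function over draws from its probability distribution.

import random

def severity(x, lower, upper):
    if x < lower:
        return lower - x
    if x > upper:
        return x - upper
    return 0.0

def risk(samples, lower, upper):
    return sum(severity(x, lower, upper) for x in samples) / len(samples)

voltage_samples = [random.gauss(1.0, 0.03) for _ in range(10_000)]
print(risk(voltage_samples, lower=0.95, upper=1.05))
```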
And S2, acquiring a state variable, an action variable and a reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function.
In this embodiment, the obtaining of the state variables, the action variables, and the reward function of the scheduling model is preferably:
defining the state variable $s_t$ of the scheduling model in period t and the action variable $a_t$ in period t:
wherein $P^W_{i,t}$ is the active power of wind generation at node i in period t, $P^{PV}_{i,t}$ is the active power of photovoltaic generation at node i in period t, $P^L_{i,t}$ is the load at node i in period t, $\alpha_{M,t}$ is the real-time electricity price, $SOC_{i,t}$ is the state of charge of the energy storage system at node i in period t, $P^G_{i,t}$ is the active power of the diesel engine set at node i in period t, and $P^E_{i,t}$ is the charging/discharging power of the energy storage system at node i in period t.
Wind power generation, photovoltaic power generation, load and electricity price are exogenous state variables: they are determined by system uncertainty and are not influenced by the action variables. The energy storage state of charge is an endogenous state variable, which is affected by the action variables. For exogenous states, the state transition is implemented by reading the data of the next period from the dataset; for endogenous states, the state transition is implemented by calculating the state of charge of the next period. The definition of the action variables is based on the decision variables of the optimization model; however, the active power exchanged with the main network can be obtained from the active power of the diesel engine set at each node and the charging/discharging power of the energy storage system at each node combined with the power flow calculation, and is therefore not included in the action variables.
At the same time, a reward function of the scheduling model is defined:
wherein $r(s_t, a_t)$ is the reward obtained by the agent for taking action $a_t$ in state $s_t$, the first term is the weighted total operating cost in period t (comprising the fuel cost and carbon emission cost of the diesel engine sets, the charging and discharging cost of the energy storage systems, and the cost of purchasing electricity from the main network), the second term is the weighted total penalty in period t (comprising the penalty for violating the state-of-charge constraint, the penalty for the node voltage amplitude over-limit risk, and the penalty for the branch apparent power over-limit risk), and $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ are the weights of the reward components.
The agent learns through interaction with the environment: the agent perceives the current environment state $s_t$ and performs action $a_t$, the environment transitions to the next state $s_{t+1}$, and the agent obtains the reward $r(s_t, a_t)$.
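As a non-limiting illustration, one interaction step and the negative-cost reward can be sketched as follows in Python. The cost and penalty values are placeholders for the formulas of this embodiment, and the assignment of the weights $\omega_1$–$\omega_4$ to the individual components is a hypothetical choice for illustration only.

```python
# Illustrative sketch of the reward as the negative weighted sum of operating
# costs and constraint penalties; components and weights are placeholders.

def reward(costs, penalties, weights=(1.0, 1.0, 1.0, 1.0)):
    w1, w2, w3, w4 = weights
    fuel_and_carbon, storage, trade = costs
    soc_pen, voltage_risk_pen, power_risk_pen = penalties
    total_cost = w1 * (fuel_and_carbon + storage + trade)
    total_penalty = w2 * soc_pen + w3 * voltage_risk_pen + w4 * power_risk_pen
    return -(total_cost + total_penalty)

r_t = reward(costs=(120.0, 8.0, 35.0), penalties=(0.0, 0.4, 0.1))
print(r_t)
```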
And, based on the state variables, the action variables and the reward function, a Markov decision process $(\mathcal{S}, \mathcal{A}, \mathcal{P}, r)$ is constructed for the scheduling model, wherein $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $\mathcal{P}$ is the state transition probability function, and r is the reward function. The goal of the agent is to maximize the long-term accumulated reward through interaction with the environment; the costs and penalties in the reward function are therefore defined as negative, so as to guide the agent to minimize the operating cost while satisfying the constraints.
And S3, training a strategy network corresponding to the Markov decision process through a SAC algorithm in combination with basic data in the Markov decision process.
Specifically, in this embodiment, parameters of the SAC algorithm are updated by an ASAM algorithm and a PER algorithm, and an agent and a policy network corresponding to the markov decision process are trained by the updated SAC algorithm; wherein the parameters of the SAC algorithm comprise soft Q network parameters, temperature coefficients and network parameters of the policy network.
The objective function of the SAC algorithm is the maximization of:
$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha \mathcal{H}(\pi(\cdot \mid s_t)) \right]$

wherein $\pi(a_t \mid s_t)$ is the probability that the agent takes action $a_t$ in state $s_t$, $\rho_\pi$ is the state-action trajectory distribution generated by the policy $\pi$, $\mathcal{H}(\pi(\cdot \mid s_t))$ is the entropy of the policy $\pi$, $\alpha$ is the temperature coefficient, which reflects the relative importance of the policy entropy and the reward in the objective function of the SAC algorithm (as $\alpha \to 0$ the objective degenerates to the maximization of the long-term accumulated reward of a conventional reinforcement learning algorithm), and $\mathbb{E}$ denotes the mathematical expectation. By adding the maximization of the policy entropy to the objective function, the SAC algorithm effectively encourages the agent to explore the unknown state-action space and improves the learning speed of the agent.
The SAC algorithm is based on artificial neural networks: the soft Q function is parameterized as $Q_\theta(s_t, a_t)$ with soft Q network parameters $\theta$, and the Gaussian policy is parameterized as $\pi_\phi(a_t \mid s_t)$ with policy network parameters $\phi$.
The inputs of a soft Q network are a state and an action, and its output is the 1-dimensional Q value of the state-action pair; the input of the policy network is a state, and its outputs are the mean and standard deviation of the Gaussian action. To relieve the over-estimation problem of the soft Q function, two soft Q networks with parameters $\theta_i$ (i = 1, 2) are established and trained independently at the same time, and the smaller of the two output Q values is used to update the parameters of the soft Q networks and the policy network. The records $(s_t, a_t, s_{t+1}, r_t)$ of the agent's interaction with the environment are stored in an experience replay pool, and each time the network parameters are updated, a batch of samples is drawn from the experience replay pool to perform stochastic gradient descent.
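As a non-limiting illustration, the two soft Q networks and the use of the smaller of their outputs can be sketched in PyTorch as follows (the patent does not name a framework, and the network sizes are hypothetical).

```python
# Illustrative PyTorch sketch: twin soft Q networks; the smaller output is used
# in the updates to mitigate over-estimation of the soft Q function.

import torch
import torch.nn as nn

class SoftQNetwork(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),            # 1-dimensional Q value
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

q1, q2 = SoftQNetwork(10, 3), SoftQNetwork(10, 3)   # hypothetical dimensions
s, a = torch.randn(32, 10), torch.randn(32, 3)
q_min = torch.min(q1(s, a), q2(s, a))                # smaller Q used in updates
```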
For the parameter update of the soft Q networks, the soft Bellman residual is minimized, wherein the samples are drawn from the experience replay pool $\mathcal{D}$. Each soft Q network has a corresponding target network, whose parameters $\bar{\theta}_i$ are obtained by soft updates of the soft Q network parameters:

$\bar{\theta}_i \leftarrow \tau \theta_i + (1 - \tau)\bar{\theta}_i$

where $\tau$ is the smoothing factor of the target network and is much smaller than 1. In the soft Bellman residual, the smaller of the Q values output by the two target networks is substituted.
In order to improve generalization capability, an Adaptive Sharpness-Aware Minimization (ASAM) algorithm is introduced into the parameter update of the soft Q networks, with the following objective function:

wherein $\epsilon_i$ is the perturbation applied to the network parameters $\theta_i$ (i = 1, 2), $\rho$ is a hyper-parameter defining the size of the neighborhood, and the normalization operator of the network parameters is defined, for a fully connected network, from the layer weights, where $W^k_i$ is the weight matrix of the kth layer of the ith soft Q network and $\lambda$ is the weight decay coefficient of the L2 regularization.
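As a non-limiting illustration, one adaptive-sharpness-aware update step can be sketched in PyTorch as follows: the inner maximization is approximated to first order, the perturbation is scaled element-wise by the parameter magnitudes (standing in for the normalization operator of a fully connected network), the gradient is re-evaluated at the perturbed parameters, and the outer step updates the original parameters. The hyper-parameter values are hypothetical and the loss closure is a placeholder.

```python
# Illustrative PyTorch sketch of an ASAM-style perturbation-and-update step.

import torch

def asam_step(model, loss_fn, optimizer, rho=0.5, eta=1e-12):
    loss_fn(model).backward()                       # gradient at current parameters
    with torch.no_grad():
        scaled = [(p, p.grad * (p.abs() + eta) ** 2)
                  for p in model.parameters() if p.grad is not None]
        norm = torch.sqrt(sum(((p.abs() + eta) * p.grad).pow(2).sum()
                              for p, _ in scaled))
        eps = [(p, rho * g / (norm + eta)) for p, g in scaled]
        for p, e in eps:
            p.add_(e)                               # move to the worst-case neighborhood
    optimizer.zero_grad()
    loss_fn(model).backward()                       # gradient at the perturbed point
    with torch.no_grad():
        for p, e in eps:
            p.sub_(e)                               # restore original parameters
    optimizer.step()                                # sharpness-aware parameter update
```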
For policy networks, however, the updated goal is to minimize the Kullback-Leibler divergence of the policy:
wherein $Q_\theta(s_t, a_t)$ is replaced by the smaller of the Q values output by the two soft Q networks. The temperature coefficient $\alpha$ measures the trade-off between the reward and the policy entropy in the objective function. The magnitude of the reward function has a direct effect on the temperature coefficient $\alpha$, so the performance of the SAC algorithm is impaired unless the temperature coefficient is adjusted across different tasks or during training of the same task. During training, the temperature coefficient is therefore adjusted automatically, with the goal of minimizing:
And updating soft Q network parameters, strategy network parameters and temperature coefficients based on random gradient descent. Updating of soft Q network parameters requires solving a min-max type optimization problem:
First, the inner max problem is approximated by a first-order Taylor expansion to solve for the optimal $\epsilon_i$, and $\theta_i$ is then updated by gradient descent. The network parameter update formulas are as follows:

wherein $\lambda_Q$ and $\lambda_\pi$ are respectively the learning rates of the soft Q networks and the policy network, and $\lambda_\alpha$ is the step size for updating the temperature coefficient $\alpha$.
Secondly, a Prioritized Experience Replay (PER) algorithm assigns each sample a priority based on the absolute value of its temporal-difference (TD) error and differentiates the sampling probabilities accordingly:
$P(k) = \dfrac{p(k)^{\beta_1}}{\sum_j p(j)^{\beta_1}}$

wherein $P(k)$ is the sampling probability of the kth sample in the experience replay pool, $p(k)$ is the priority of the kth sample in the experience replay pool, and $\beta_1$ measures the degree of prioritization ($\beta_1 = 0$ corresponds to equal-probability sampling). In proportional prioritization, the priority $p(k)$ is defined as follows:
p(k)=|δ(k)|+ε;
wherein $\delta(k)$ is the TD error of the kth sample in the experience replay pool; that is, a sample with a larger absolute TD error is considered to have a higher learning value. $\epsilon$ is a small positive number which ensures that a sample still has some probability of being drawn even when its TD error is 0.
For the ith soft Q network, the calculation of the TD error $\delta_i$ is closely related to the loss function. The TD error used to update the priority of the kth sample in the experience replay pool is the average of $\delta_i$ (i = 1, 2) given by the above expression.
Prioritized sampling introduces a bias into the estimate of the soft Q function; the bias is therefore corrected by weighting the samples with importance sampling (IS) when the loss function is calculated:

$w_k = \left( N \cdot P(k) \right)^{-\beta_2}$

wherein $w_k$ is the IS weight of the kth sample in the experience replay pool, which is normalized for stability, N is the size of the experience replay pool, and $\beta_2$ is the compensation strength of the IS weights, with $\beta_2 = 1$ giving full compensation. $\beta_2$ starts from an initial value at the beginning of training and increases linearly to 1 by the end of training.
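As a non-limiting illustration, the proportional prioritization, sampling probabilities and IS weights described above can be sketched in Python as follows; the priority and weight forms follow the standard PER scheme assumed by this description, and the hyper-parameter values are hypothetical.

```python
# Illustrative sketch of proportional prioritized experience replay with
# importance-sampling weights normalized by their maximum.

import random

class PrioritizedReplay:
    def __init__(self, beta1=0.6, eps=1e-4):
        self.data, self.priority = [], []
        self.beta1, self.eps = beta1, eps

    def add(self, sample):
        p_max = max(self.priority, default=1.0)
        self.data.append(sample)
        self.priority.append(p_max)            # new samples get the current max priority

    def sample(self, n, beta2):
        scaled = [p ** self.beta1 for p in self.priority]
        total = sum(scaled)
        probs = [s / total for s in scaled]    # P(k) proportional to p(k)^beta1
        idx = random.choices(range(len(self.data)), weights=probs, k=n)
        big_n = len(self.data)
        weights = [(big_n * probs[k]) ** (-beta2) for k in idx]
        w_max = max(weights)
        weights = [w / w_max for w in weights]  # normalize for stability
        return idx, [self.data[k] for k in idx], weights

    def update_priority(self, idx, td_errors):
        for k, d in zip(idx, td_errors):
            self.priority[k] = abs(d) + self.eps   # p(k) = |delta(k)| + eps
```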
The training of the policy network corresponding to the markov decision process comprises (with reference to figure 3):
Step S31: randomly initialize the policy network parameters $\phi$ and the two soft Q network parameters $\theta_1$, $\theta_2$, and copy the soft Q network parameters to the corresponding target networks: $\bar{\theta}_1 \leftarrow \theta_1$, $\bar{\theta}_2 \leftarrow \theta_2$.
Step S32: in each period of each scheduling cycle, the agent perceives the environment state by reading the wind power generation, photovoltaic power generation, load and electricity price of the current period from the training dataset (the basic data comprises the training dataset and a historical dataset) together with the state of charge of the energy storage systems in the current period; the agent samples and executes an action $a_t \sim \pi_\phi(a_t \mid s_t)$ from the Gaussian distribution given by the action mean and variance output by the policy network; the environment transitions to the next state $s_{t+1}$ by reading the wind power generation, photovoltaic power generation, load and electricity price of the next period from the training dataset and calculating the state of charge of the energy storage systems in the next period; the agent obtains the reward $r(s_t, a_t)$, and the sample $(s_t, a_t, s_{t+1}, r_t)$ is stored in the experience replay pool with the current maximum priority $p = \max_j p_j$.
Step S33: in each period of each scheduling cycle, the kth sample is drawn from the experience replay pool with probability $P(k)$, its IS weight $w_k$ and TD error $\delta(k)$ are calculated, its priority $p(k)$ is updated, and the soft Q network loss function $J_Q(\theta_i)$ is accumulated with the IS weight $w_k$; this process draws n samples in total.
Step S34: in each period of each scheduling cycle, the optimal perturbation $\epsilon_i$ of the network parameters within the adaptive-sharpness-defined neighborhood is calculated, the soft Q network parameters $\theta_1$, $\theta_2$, the policy network parameters $\phi$ and the temperature coefficient $\alpha$ are updated by gradient descent, and the target network parameters $\bar{\theta}_1$, $\bar{\theta}_2$ are soft-updated.
And step S35, repeating the steps S32 to S34 until the current scheduling period is finished.
And S36, repeating the steps S32 to S35 until the number of the dispatching cycles reaches a preset value and the cycle reward curve tends to be stable.
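As a non-limiting illustration, the control flow of steps S31 to S36 can be sketched in Python as follows; the environment, agent and replay-pool interfaces are placeholders standing in for the components described above, and only the loop structure is shown.

```python
# Illustrative skeleton of the training procedure in steps S31-S36.

def train(env, agent, replay, n_cycles, periods_per_cycle, batch_size):
    for cycle in range(n_cycles):                          # S36: repeat scheduling cycles
        state = env.reset()                                # read first-period data
        for t in range(periods_per_cycle):                 # S35: repeat periods
            action = agent.sample_action(state)            # S32: Gaussian policy sample
            next_state, reward, _ = env.step(action)
            replay.add((state, action, next_state, reward))        # stored with max priority
            idx, batch, is_weights = replay.sample(batch_size, agent.beta2)  # S33
            td_errors = agent.update(batch, is_weights)    # S34: ASAM + gradient descent,
            replay.update_priority(idx, td_errors)         # priorities and soft target update
            state = next_state
```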
And S4, scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
In this embodiment, with the trained strategy network, at each period the agent perceives the current environment state $s_t$ by reading the wind power generation, photovoltaic power generation, load and electricity price of the current period from the real-time data together with the state of charge of the energy storage systems in the current period, and executes the action $a_t$ given by the action mean output by the strategy network; the environment transitions to the next state $s_{t+1}$ by reading the wind power generation, photovoltaic power generation, load and electricity price of the next period from the real-time data and calculating the state of charge of the energy storage systems in the next period; the agent obtains the reward $r(s_t, a_t)$. The same steps are performed for each period until the end of the current scheduling cycle.
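As a non-limiting illustration, the online scheduling stage can be sketched in Python as follows: at every period the trained strategy network outputs the action mean, which is executed directly without exploration noise, and the state advances using real-time measurements. The interfaces are placeholders for the components of this embodiment.

```python
# Illustrative sketch of the online dispatch loop using the trained network.

def dispatch(env, policy, periods_per_cycle):
    state = env.read_realtime_state()          # wind, PV, load, price, SOC
    schedule = []
    for t in range(periods_per_cycle):
        action = policy.action_mean(state)     # deterministic dispatch decision
        schedule.append(action)
        state = env.apply(action)              # next real-time state and SOC
    return schedule
```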
Correspondingly, referring to fig. 4, an embodiment of the present invention further provides a power distribution network scheduling apparatus based on deep reinforcement learning, including:
the constraint module 101 is configured to construct an operation constraint and a cost function respectively corresponding to a plurality of devices for a power distribution network to be scheduled, construct a constraint and a cost function corresponding to electric energy transaction between the power distribution network to be scheduled and a main network, construct a risk constraint of node voltage and branch power of the power distribution network to be scheduled, and obtain a scheduling model of the power distribution network to be scheduled;
a markov decision process constructing module 102, configured to obtain a state variable, an action variable, and a reward function of the scheduling model, and construct a markov decision process for the scheduling model based on the state variable, the action variable, and the reward function;
a training module 103, configured to train, in the markov decision process, a policy network corresponding to the markov decision process through a SAC algorithm in combination with basic data;
and the scheduling module 104 is configured to schedule the power distribution network to be scheduled based on the output of the trained policy network.
Correspondingly, the embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium comprises a stored computer program; when the computer program runs, the device where the computer readable storage medium is located is controlled to execute the power distribution network scheduling method based on the deep reinforcement learning.
The modules integrated in the power distribution network scheduling device based on deep reinforcement learning may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a power distribution network scheduling method and device based on deep reinforcement learning and a computer readable storage medium, wherein the method comprises the following steps: constructing operation constraints and cost functions respectively corresponding to a plurality of devices for a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transactions between the power distribution network to be scheduled and a main network, constructing node voltage and branch power risk constraints of the power distribution network to be scheduled, and obtaining a scheduling model of the power distribution network to be scheduled; acquiring state variables, action variables and a reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variables, the action variables and the reward function; training a strategy network corresponding to the Markov decision process through a SAC algorithm in combination with basic data in the Markov decision process; and scheduling the power distribution network to be scheduled based on the output of the trained strategy network. Compared with the prior art, the method has the advantages that a Markov decision process is built, the strategy network trained through the SAC algorithm can adapt to online operation and complex calculation, millisecond-level rapid calculation is realized, and generalization capability is obviously improved.
It should be noted that the above-described apparatuses are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement without inventive effort.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the invention, may occur to those skilled in the art and are intended to be included within the scope of the invention.
Claims (10)
1. A power distribution network scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
constructing operation constraints and cost functions corresponding to a plurality of devices respectively for a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transactions of the power distribution network to be scheduled and a main network, constructing node voltage and branch power risk constraints of the power distribution network to be scheduled, and obtaining a scheduling model of the power distribution network to be scheduled;
acquiring a state variable, an action variable and a reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
training a strategy network corresponding to the Markov decision process through a SAC algorithm in combination with basic data in the Markov decision process;
and scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
2. The power distribution network scheduling method based on deep reinforcement learning according to claim 1, wherein the training of the policy network corresponding to the markov decision process by the SAC algorithm specifically comprises:
updating parameters of the SAC algorithm through an ASAM algorithm and a PER algorithm, and training an intelligent agent and a strategy network corresponding to the Markov decision process through the updated SAC algorithm; wherein the parameters of the SAC algorithm comprise soft Q network parameters, temperature coefficients and network parameters of the policy network.
3. The power distribution network dispatching method based on deep reinforcement learning of claim 1, wherein the plurality of devices comprise not less than one diesel engine set and not less than one energy storage system;
the operation constraint of the diesel engine set is as follows:
wherein P^G_{i,t} is the active power output of the ith diesel engine set in the power distribution network to be scheduled during period t, P^G_{i,min} is the minimum active power of the ith diesel engine set of the power distribution network to be scheduled, P^G_{i,max} is the maximum active power of the ith diesel engine set of the power distribution network to be scheduled, N_G is the set of all nodes to which diesel engine sets are connected, and T is the set of all periods in the scheduling cycle;
the cost function of the diesel engine set is:
wherein C^G,fuel_t is the sum of the fuel costs of all the diesel engine sets in the power distribution network to be scheduled during period t, C^G,carbon_t is the sum of the carbon emission costs of all the diesel engine sets in the power distribution network to be scheduled during period t, a_{G,i}, b_{G,i} and c_{G,i} are the fuel cost coefficients of the ith diesel engine set, and d_{G,i} and e_{G,i} are the carbon emission cost coefficients of the ith diesel engine set.
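The formulas referenced in claim 3 are not reproduced in this extracted text; a plausible reconstruction from the variable definitions above, with assumed notation (N_G for the diesel node set, T for the scheduling periods), is:

```latex
% Assumed reconstruction of the claim-3 formulas; notation is illustrative.
% Operation constraint: output bounded between minimum and maximum active power.
P^{G}_{i,\min} \le P^{G}_{i,t} \le P^{G}_{i,\max},
\qquad \forall i \in \mathcal{N}_{G},\ \forall t \in \mathcal{T}

% Cost functions: quadratic fuel cost plus linear carbon-emission cost.
C^{G,\mathrm{fuel}}_{t} = \sum_{i \in \mathcal{N}_{G}}
  \Bigl( a_{G,i}\,(P^{G}_{i,t})^{2} + b_{G,i}\,P^{G}_{i,t} + c_{G,i} \Bigr), \qquad
C^{G,\mathrm{carbon}}_{t} = \sum_{i \in \mathcal{N}_{G}}
  \Bigl( d_{G,i}\,P^{G}_{i,t} + e_{G,i} \Bigr)
```

The quadratic-plus-linear form is inferred from the three fuel coefficients and two carbon coefficients listed per unit, which is the conventional thermal-unit cost parameterization.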
4. The power distribution network scheduling method based on deep reinforcement learning of claim 3, wherein the operation constraints of the energy storage system are as follows:
wherein P^E_{i,t} is the active power output of the ith energy storage system in the power distribution network to be scheduled during period t, P^ch_{i,max} is the maximum charging power of the ith energy storage system in the power distribution network to be scheduled, P^dis_{i,max} is the maximum discharging power of the ith energy storage system in the power distribution network to be scheduled, N_E is the set of all nodes connected with an energy storage system in the power distribution network to be scheduled, SOC_{i,t} is the state of charge of the ith energy storage system in the power distribution network to be scheduled during period t, SOC_{i,t,min} and SOC_{i,t,max} are respectively the minimum and maximum state of charge allowed for the ith energy storage system during period t, η_C is the charging efficiency of the energy storage system, η_D is the discharging efficiency of the energy storage system, and E_i is the capacity of the ith energy storage system in the power distribution network to be scheduled;
the cost function of the energy storage system is:
5. The distribution network scheduling method based on deep reinforcement learning of claim 4, wherein the cost function of the electric power transaction between the distribution network to be scheduled and the main network is as follows:
wherein C^M_t is the cost for the power distribution network to be scheduled of purchasing electricity from the main network during period t, P^M_t > 0 denotes the power purchased by the power distribution network to be scheduled from the main network during period t, P^M_t < 0 denotes the power sold by the power distribution network to be scheduled to the main network during period t, α_{M,t} is the real-time electricity price for period t, and σ is the proportional difference between the electricity purchase and sale prices and the real-time price of the main network;
the constraint of the electric energy transaction between the power distribution network to be scheduled and the main network is as follows:
wherein Q^M_t is the reactive power flowing from the main network to the power distribution network to be scheduled during period t, S^M_t is the apparent power flowing from the main network to the power distribution network to be scheduled during period t, S^M_min is the minimum capacity of the transmission line, and S^M_max is the maximum capacity of the transmission line.
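A plausible reconstruction of the claim-5 formulas, consistent with the variable definitions above, is shown below; the use of σ as a single price-deviation proportion and of a dispatch interval Δt are assumptions.

```latex
% Assumed reconstruction of the claim-5 formulas; \sigma and \Delta t are assumed notation.
% Transaction cost: buying is marked up and selling marked down relative to the real-time price.
C^{M}_{t} =
  \begin{cases}
    (1+\sigma)\,\alpha_{M,t}\,P^{M}_{t}\,\Delta t, & P^{M}_{t} > 0 \ \text{(purchase)}\\[2pt]
    (1-\sigma)\,\alpha_{M,t}\,P^{M}_{t}\,\Delta t, & P^{M}_{t} < 0 \ \text{(sale)}
  \end{cases}

% Tie-line capacity constraint on the apparent power exchanged with the main network:
S^{M}_{t} = \sqrt{(P^{M}_{t})^{2} + (Q^{M}_{t})^{2}},
\qquad S^{M}_{\min} \le S^{M}_{t} \le S^{M}_{\max}
```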
6. The power distribution network scheduling method based on deep reinforcement learning of claim 5, wherein the constructing of the risk constraints of the node voltage and the branch power of the power distribution network to be scheduled comprises:
constructing an internal power flow calculation model of the power distribution network to be scheduled:
wherein P_{i,t} is the net injected active power of node i during period t, Q_{i,t} is the net injected reactive power of node i during period t, P_{ij,t} is the active power flowing on branch ij during period t, Q_{ij,t} is the reactive power flowing on branch ij during period t, N is the set of all nodes in the power distribution network to be scheduled, B_{ij} characterizes the power flow direction on branch ij, S_{ij,t} is the apparent power flowing on branch ij during period t, P^PV_{i,t} is the active power of the photovoltaic power generation at node i during period t, P^WT_{i,t} is the active power of the wind power generation at node i during period t, P^L_{i,t} is the active power of the load at node i during period t, Q^G_{i,t} is the reactive power generated by the diesel engine set at node i during period t, Q^WT_{i,t} is the reactive power of the wind power generation at node i during period t, Q^L_{i,t} is the reactive power of the load at node i during period t, and N_0 is the set of nodes at which the power distribution network is connected with the main network;
constructing risk constraints of node voltage amplitude and branch apparent power of the power distribution network to be scheduled on the basis of the internal power flow calculation model, and specifically:
wherein R^V_t is the node voltage amplitude risk of the power distribution network to be scheduled during period t, R^S_t is the branch apparent power risk of the power distribution network to be scheduled during period t, ε_V is the node voltage amplitude risk threshold of the power distribution network to be scheduled, ε_S is the branch apparent power risk threshold of the power distribution network to be scheduled, w_i is the weight of node i, w_{ij} is the weight of branch ij, r^V_{i,t} is the voltage amplitude risk at node i during period t, r^S_{ij,t} is the apparent power risk of branch ij, and L is the set of all branches in the power distribution network.
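One plausible reading of the claim-6 formulas, under the assumptions that B_{ij} acts as a signed incidence/direction indicator and that the dispatchable diesel and storage injections enter the active balance (the extracted variable list names only the renewable and load terms explicitly), is:

```latex
% One plausible reading of the claim-6 formulas; notation and the dispatchable
% injection terms are assumptions.
% Nodal power balance: net injections equal the signed sum of branch flows.
P_{i,t} = \sum_{j} B_{ij}\,P_{ij,t}, \qquad
Q_{i,t} = \sum_{j} B_{ij}\,Q_{ij,t}, \qquad \forall i \in N \setminus N_{0}

% Net injections composed of generation minus load at each node:
P_{i,t} = P^{G}_{i,t} + P^{E}_{i,t} + P^{\mathrm{PV}}_{i,t} + P^{\mathrm{WT}}_{i,t} - P^{L}_{i,t}, \qquad
Q_{i,t} = Q^{G}_{i,t} + Q^{\mathrm{WT}}_{i,t} - Q^{L}_{i,t}

% Branch apparent power:
S_{ij,t} = \sqrt{P_{ij,t}^{2} + Q_{ij,t}^{2}}

% Weighted aggregation of per-node and per-branch risks against the thresholds:
R^{V}_{t} = \sum_{i \in N} w_{i}\, r^{V}_{i,t} \le \varepsilon_{V}, \qquad
R^{S}_{t} = \sum_{ij \in L} w_{ij}\, r^{S}_{ij,t} \le \varepsilon_{S}
```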
7. The power distribution network scheduling method based on deep reinforcement learning according to claim 6, wherein the obtaining of the state variables, the action variables and the reward functions of the scheduling model specifically includes:
defining a state variable s of the scheduling model over a period t t And an action variable a during a period t t :
wherein P^WT_{i,t} is the active power of the wind power generation at node i during period t, P^PV_{i,t} is the active power of the photovoltaic power generation at node i during period t, P^L_{i,t} is the active power of the load at node i during period t, α_{M,t} is the real-time electricity price, SOC_{i,t} is the state of charge of the energy storage system at node i during period t, P^G_{i,t} is the active power of the diesel engine set at node i during period t, and P^E_{i,t} is the charging and discharging power of the energy storage system at node i during period t;
and defining a reward function of the scheduling model:
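The state, action and reward expressions of claim 7 are not reproduced in this extracted text. A reconstruction consistent with the variable definitions is sketched below; the penalty weights λ_V and λ_S and the exact composition of the reward are assumptions (the reward is plausibly the negative of total operating cost with penalties on the claim-6 risks).

```latex
% Assumed reconstruction of the claim-7 definitions; \lambda_{V} and \lambda_{S}
% and the exact reward composition are assumptions.
s_{t} = \bigl( P^{\mathrm{WT}}_{i,t},\ P^{\mathrm{PV}}_{i,t},\ P^{L}_{i,t},\ \alpha_{M,t},\ SOC_{i,t} \bigr), \qquad
a_{t} = \bigl( P^{G}_{i,t},\ P^{E}_{i,t} \bigr)

% Reward: negative total operating cost, penalized by the voltage and
% branch-power risks of claim 6.
r_{t} = -\bigl( C^{G,\mathrm{fuel}}_{t} + C^{G,\mathrm{carbon}}_{t} + C^{E}_{t} + C^{M}_{t} \bigr)
        \;-\; \lambda_{V}\,R^{V}_{t} \;-\; \lambda_{S}\,R^{S}_{t}
```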
8. The distribution network dispatching method based on deep reinforcement learning of claim 7, wherein the Markov decision process is constructed for the dispatching model based on the state variables, the action variables and the reward function, and specifically comprises:
constructing the Markov decision process according to the following equation:
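The equation referenced in claim 8 is not reproduced in this extracted text. A standard formulation consistent with the SAC setting of claims 1 and 2 is assumed below: the Markov decision process is defined by its state space, action space, transition probability, reward and discount factor, and the strategy network maximizes the entropy-regularized expected return.

```latex
% Assumed standard formulation of the Markov decision process and the
% maximum-entropy objective optimized by SAC.
\mathcal{M} = \bigl( \mathcal{S},\ \mathcal{A},\ p,\ r,\ \gamma \bigr), \qquad
s_{t+1} \sim p(\,\cdot \mid s_{t}, a_{t}), \quad a_{t} \sim \pi_{\theta}(\,\cdot \mid s_{t})

J(\pi_{\theta}) = \mathbb{E}\Bigl[\, \sum_{t} \gamma^{t}
  \bigl( r(s_{t}, a_{t}) + \alpha\, \mathcal{H}\bigl(\pi_{\theta}(\cdot \mid s_{t})\bigr) \bigr) \Bigr]
```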
9. A power distribution network scheduling device based on deep reinforcement learning, characterized by comprising:
the system comprises a constraint module, a master network and a scheduling module, wherein the constraint module is used for constructing operation constraints and cost functions corresponding to a plurality of devices of a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transactions of the power distribution network to be scheduled and a master network, constructing risk constraints of node voltage and branch power of the power distribution network to be scheduled, and obtaining a scheduling model of the power distribution network to be scheduled;
the Markov decision process building module is used for obtaining a state variable, an action variable and a reward function of the scheduling model and building a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
the training module is used for training a strategy network corresponding to the Markov decision process through an SAC algorithm in combination with basic data in the Markov decision process;
and the scheduling module is used for scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; the computer program, when executed, controls an apparatus on which the computer-readable storage medium is located to execute the power distribution network scheduling method based on deep reinforcement learning according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210893449.XA CN115169957A (en) | 2022-07-27 | 2022-07-27 | Power distribution network scheduling method, device and medium based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115169957A true CN115169957A (en) | 2022-10-11 |
Family
ID=83496657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210893449.XA Pending CN115169957A (en) | 2022-07-27 | 2022-07-27 | Power distribution network scheduling method, device and medium based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115169957A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116316755A (en) * | 2023-03-07 | 2023-06-23 | 西南交通大学 | Energy management method for electrified railway energy storage system based on reinforcement learning |
CN116316755B (en) * | 2023-03-07 | 2023-11-14 | 西南交通大学 | Energy management method for electrified railway energy storage system based on reinforcement learning |
CN116562464A (en) * | 2023-07-03 | 2023-08-08 | 南京菁翎信息科技有限公司 | Deep reinforcement learning-based low-carbon optimal scheduling method for power system |
CN116562464B (en) * | 2023-07-03 | 2023-09-19 | 南京菁翎信息科技有限公司 | Deep reinforcement learning-based low-carbon optimal scheduling method for power system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111884213B (en) | Power distribution network voltage adjusting method based on deep reinforcement learning algorithm | |
CN114725936B (en) | Power distribution network optimization method based on multi-agent deep reinforcement learning | |
Li et al. | A merged fuzzy neural network and its applications in battery state-of-charge estimation | |
CN113572157B (en) | User real-time autonomous energy management optimization method based on near-end policy optimization | |
CN112186743A (en) | Dynamic power system economic dispatching method based on deep reinforcement learning | |
CN109347149A (en) | Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning | |
CN113326994A (en) | Virtual power plant energy collaborative optimization method considering source load storage interaction | |
CN115169957A (en) | Power distribution network scheduling method, device and medium based on deep reinforcement learning | |
CN110518580A (en) | Active power distribution network operation optimization method considering micro-grid active optimization | |
CN114362187B (en) | Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning | |
CN112507614A (en) | Comprehensive optimization method for power grid in distributed power supply high-permeability area | |
CN117057553A (en) | Deep reinforcement learning-based household energy demand response optimization method and system | |
CN116451880B (en) | Distributed energy optimization scheduling method and device based on hybrid learning | |
CN114723230B (en) | Micro-grid double-layer scheduling method and system for new energy power generation and energy storage | |
Kim et al. | Optimize the operating range for improving the cycle life of battery energy storage systems under uncertainty by managing the depth of discharge | |
CN113972645A (en) | Power distribution network optimization method based on multi-agent depth determination strategy gradient algorithm | |
Liu et al. | Multi-state joint estimation of series battery pack based on multi-model fusion | |
CN116359742B (en) | Energy storage battery state of charge on-line estimation method and system based on deep learning combination extended Kalman filtering | |
CN117277327A (en) | Grid-connected micro-grid optimal energy management method based on intelligent agent | |
CN117039981A (en) | Large-scale power grid optimal scheduling method, device and storage medium for new energy | |
CN115276067B (en) | Distributed energy storage voltage adjusting method suitable for dynamic topological change of power distribution network | |
CN114048576B (en) | Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid | |
CN117559429A (en) | Light Chu Zhi flexible power distribution system decision model training method, system and storage medium | |
CN116979579A (en) | Electric automobile energy-computing resource scheduling method based on safety constraint of micro-grid | |
CN115001002A (en) | Optimal scheduling method and system for solving energy storage participation peak clipping and valley filling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||