
CN115169957A - Power distribution network scheduling method, device and medium based on deep reinforcement learning - Google Patents

Power distribution network scheduling method, device and medium based on deep reinforcement learning Download PDF

Info

Publication number
CN115169957A
CN115169957A
Authority
CN
China
Prior art keywords
distribution network
power
power distribution
scheduled
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210893449.XA
Other languages
Chinese (zh)
Inventor
陈铭
刘刚刚
侯凯
马顺
阮楠千
许银亮
梅诗妍
曾瑜
胡晋岚
孙罡
姜玉梁
周妍
秦燕
秦万祥
赵芳菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd filed Critical Guangdong Power Grid Co Ltd
Priority to CN202210893449.XA priority Critical patent/CN115169957A/en
Publication of CN115169957A publication Critical patent/CN115169957A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/10The dispersed energy generation being of fossil origin, e.g. diesel generators
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Water Supply & Treatment (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Public Health (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention provides a power distribution network scheduling method, device and medium based on deep reinforcement learning. The method comprises: constructing, for the power distribution network to be scheduled, operation constraints and cost functions for each of a plurality of devices, constraints and a cost function for electric energy trading between the power distribution network to be scheduled and the main grid, and risk constraints on the node voltages and branch powers of the power distribution network, to obtain a scheduling model of the power distribution network to be scheduled; acquiring state variables, action variables and a reward function and constructing a Markov decision process; training the policy network corresponding to the Markov decision process through the SAC algorithm in combination with basic data; and scheduling the power distribution network to be scheduled based on the output of the trained policy network. Compared with the prior art, by constructing a Markov decision process, the policy network trained through the SAC algorithm adapts to online operation and complex computation, achieves millisecond-level fast calculation, and significantly improves generalization capability.

Description

Power distribution network scheduling method, device and medium based on deep reinforcement learning
Technical Field
The invention relates to the field of power systems, in particular to a power distribution network scheduling method, device and medium based on deep reinforcement learning.
Background
Risk assessment of a power system is a static security analysis method that combines the probability and the severity of operating states and can quantitatively reflect the operational safety of the system. However, risk-based power distribution network scheduling in the prior art does not account for the uncertainty of renewable generation and load; moreover, because the computations involved in scheduling are highly non-convex and hard to express explicitly, conventional methods are difficult to solve and the resulting solutions generalize poorly.
Disclosure of Invention
The invention provides a power distribution network scheduling method, device and medium based on deep reinforcement learning, and aims to solve the technical problem of poor generalization capability in the prior art.
In order to solve the technical problem, an embodiment of the present invention provides a power distribution network scheduling method based on deep reinforcement learning, including:
constructing, for a power distribution network to be scheduled, operation constraints and cost functions for each of a plurality of devices, constructing constraints and a cost function for electric energy trading between the power distribution network to be scheduled and the main grid, and constructing risk constraints on the node voltages and branch powers of the power distribution network to be scheduled, to obtain a scheduling model (specifically, an economic dispatch model) of the power distribution network to be scheduled;
acquiring a state variable, an action variable and a reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
training a strategy network corresponding to the Markov decision process by a SAC algorithm in combination with basic data in the Markov decision process;
and scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
As a preferred scheme, the training of the policy network corresponding to the Markov decision process by the SAC algorithm specifically includes:
updating the parameters of the SAC algorithm through the ASAM algorithm and the PER algorithm, and training the agent and the policy network corresponding to the Markov decision process through the updated SAC algorithm; wherein the parameters of the SAC algorithm comprise the soft Q network parameters, the temperature coefficient and the network parameters of the policy network.
Preferably, the plurality of devices comprises at least one diesel unit and at least one energy storage system;
the operation constraint of the diesel units is:
P^{G,min}_i ≤ P^G_{i,t} ≤ P^{G,max}_i,  ∀ i ∈ N_G, ∀ t ∈ T
where P^G_{i,t} is the active power output of the i-th diesel unit of the power distribution network to be scheduled during period t, P^{G,min}_i and P^{G,max}_i are the minimum and maximum active power of the i-th diesel unit, N_G is the set of all nodes connected to a diesel unit, and T is the set of all periods in the scheduling cycle;
the cost functions of the diesel units are:
C^{G,fuel}_t = Σ_{i∈N_G} [ a_{G,i}·(P^G_{i,t})² + b_{G,i}·P^G_{i,t} + c_{G,i} ]
C^{G,carbon}_t = Σ_{i∈N_G} [ d_{G,i}·(P^G_{i,t})² + e_{G,i}·P^G_{i,t} ]
where C^{G,fuel}_t is the sum of the fuel costs of all diesel units of the power distribution network to be scheduled during period t, C^{G,carbon}_t is the sum of the carbon emission costs of all diesel units during period t, a_{G,i}, b_{G,i} and c_{G,i} are the fuel cost coefficients of the i-th diesel unit, and d_{G,i} and e_{G,i} are the carbon emission cost coefficients of the i-th diesel unit.
Preferably, the operation constraints of the energy storage system are:
-P^{ES,C,max}_i ≤ P^{ES}_{i,t} ≤ P^{ES,D,max}_i,  ∀ i ∈ N_E, ∀ t ∈ T
SOC^{min}_{i,t} ≤ SOC_{i,t} ≤ SOC^{max}_{i,t}
SOC_{i,t+1} = SOC_{i,t} - η_C·P^{ES}_{i,t}/E_i  (charging, P^{ES}_{i,t} < 0);  SOC_{i,t+1} = SOC_{i,t} - P^{ES}_{i,t}/(η_D·E_i)  (discharging, P^{ES}_{i,t} ≥ 0)
where P^{ES}_{i,t} is the active power output of the i-th energy storage system of the power distribution network to be scheduled during period t, P^{ES,C,max}_i is the maximum charging power of the i-th energy storage system, P^{ES,D,max}_i is its maximum discharging power, N_E is the set of all nodes of the power distribution network connected to an energy storage system, SOC_{i,t} is the state of charge of the i-th energy storage system during period t, SOC^{min}_{i,t} and SOC^{max}_{i,t} are the minimum and maximum states of charge allowed for the i-th energy storage system during period t, η_C and η_D are the charging and discharging efficiencies of the energy storage system, and E_i is the capacity of the i-th energy storage system;
the cost function of the energy storage system is:
C^{ES}_t = Σ_{i∈N_E} a_{E,i}·|P^{ES}_{i,t}|
where C^{ES}_t is the sum of the charging and discharging costs of all energy storage systems of the power distribution network to be scheduled during period t, and a_{E,i} is the cost coefficient of the i-th energy storage system.
As a preferred scheme, the cost function of electric energy trading between the power distribution network to be scheduled and the main grid is:
C^M_t = (1 + β_M)·α_{M,t}·P^M_t,  if P^M_t ≥ 0;   C^M_t = (1 - β_M)·α_{M,t}·P^M_t,  if P^M_t < 0
where C^M_t is the cost of the power distribution network to be scheduled for purchasing electricity from the main grid during period t, P^M_t > 0 is the power purchased by the power distribution network from the main grid during period t, P^M_t < 0 is the power sold by the power distribution network to the main grid during period t, α_{M,t} is the real-time electricity price during period t, and β_M is the proportional difference between the purchase/sale prices and the real-time price;
the constraints of electric energy trading between the power distribution network to be scheduled and the main grid are:
S^M_t = √( (P^M_t)² + (Q^M_t)² )
S^{M,min} ≤ S^M_t ≤ S^{M,max}
where Q^M_t is the reactive power flowing from the main grid to the power distribution network to be scheduled during period t, S^M_t is the apparent power flowing from the main grid to the power distribution network during period t, and S^{M,min} and S^{M,max} are the minimum and maximum capacity of the tie line.
As a preferred scheme, constructing the risk constraints of the node voltages and branch powers of the power distribution network to be scheduled includes:
constructing an internal power flow calculation model of the power distribution network to be scheduled, in a linearized branch flow form:
P_{i,t} = P^{PV}_{i,t} + P^{WT}_{i,t} + P^G_{i,t} + P^{ES}_{i,t} - P^L_{i,t} + P^M_t·1(i ∈ N_0)
Q_{i,t} = Q^G_{i,t} + Q^{WT}_{i,t} - Q^L_{i,t} + Q^M_t·1(i ∈ N_0)
P_{ij,t} = Σ_k P_{jk,t} - P_{j,t}   (the sum runs over all branches jk leaving node j)
Q_{ij,t} = Σ_k Q_{jk,t} - Q_{j,t}
S_{ij,t} = √( P_{ij,t}² + Q_{ij,t}² )
where P_{i,t} is the net injected active power of node i during period t, Q_{i,t} is the net injected reactive power of node i during period t, P_{ij,t} is the active power flowing on branch ij during period t, Q_{ij,t} is the reactive power flowing on branch ij during period t, N is the set of all nodes of the power distribution network to be scheduled, B_{ij} characterizes the power flow on branch ij, S_{ij,t} is the apparent power flowing on branch ij during period t, P^{PV}_{i,t} is the active power of photovoltaic generation at node i during period t, P^{WT}_{i,t} is the active power of wind generation at node i during period t, P^L_{i,t} is the active power of the load at node i during period t, Q^G_{i,t} is the reactive power generated by the diesel unit at node i during period t, Q^{WT}_{i,t} is the reactive power of wind generation at node i during period t, Q^L_{i,t} is the reactive power of the load at node i during period t, N_0 is the set of nodes through which the power distribution network is connected to the main grid, and 1(i ∈ N_0) equals 1 if node i belongs to N_0 and 0 otherwise;
constructing the risk constraints of the node voltage amplitudes and branch apparent powers of the power distribution network to be scheduled on the basis of the internal power flow calculation model, specifically:
R^V_t = Σ_{i∈N} w_i·R^V_{i,t} ≤ ε_V
R^S_t = Σ_{ij} w_{ij}·R^S_{ij,t} ≤ ε_S
where R^V_t is the node voltage amplitude risk of the power distribution network to be scheduled during period t, R^S_t is the branch apparent power risk of the power distribution network to be scheduled during period t, ε_V is the node voltage amplitude risk threshold of the power distribution network to be scheduled, ε_S is the branch apparent power risk threshold of the power distribution network to be scheduled, w_i is the weight of node i, and w_{ij} is the weight of branch ij.
As a preferred scheme, the obtaining of the state variables, action variables and reward function of the scheduling model specifically includes:
defining the state variable s_t of the scheduling model during period t and the action variable a_t during period t:
s_t = { P^{WT}_{i,t}, P^{PV}_{i,t}, P^L_{i,t}, α_{M,t}, SOC_{i,t} }
a_t = { P^G_{i,t}, P^{ES}_{i,t} }
where P^{WT}_{i,t} is the active power of wind generation at node i during period t, P^{PV}_{i,t} is the active power of photovoltaic generation at node i during period t, P^L_{i,t} is the load of node i during period t, α_{M,t} is the real-time electricity price, SOC_{i,t} is the state of charge of the energy storage system at node i during period t, P^G_{i,t} is the active power of the diesel unit at node i during period t, and P^{ES}_{i,t} is the charging/discharging power of the energy storage system at node i during period t;
defining the reward function of the scheduling model:
r(s_t, a_t) = -( R^C_t + R^P_t )
R^C_t = ω_1·( C^{G,fuel}_t + C^{G,carbon}_t + C^{ES}_t + C^M_t )
R^P_t = ω_2·P^{SOC}_t + ω_3·P^V_t + ω_4·P^S_t
where r(s_t, a_t) is the reward obtained by the agent for taking action a_t in state s_t, R^C_t is the weighted total operating cost during period t, R^P_t is the weighted total penalty during period t (P^{SOC}_t, P^V_t and P^S_t being the penalties for violating the state-of-charge constraint, the node voltage amplitude risk threshold and the branch apparent power risk threshold, respectively), and ω_1, ω_2, ω_3 and ω_4 are the weights of the reward components.
As a preferred scheme, the constructing of a Markov decision process for the scheduling model based on the state variables, action variables and reward function specifically includes:
constructing the Markov decision process according to the following equations:
M = ( S, A, P, r )
s_{t+1} ~ P( · | s_t, a_t )
where S is the state space, A is the action space, P is the state transition probability function, and r is the reward function.
Correspondingly, the embodiment of the invention also provides a power distribution network scheduling device based on deep reinforcement learning, which comprises the following components:
a constraint module, configured to construct, for a power distribution network to be scheduled, operation constraints and cost functions for each of a plurality of devices, construct constraints and a cost function for electric energy trading between the power distribution network to be scheduled and the main grid, and construct risk constraints on the node voltages and branch powers of the power distribution network to be scheduled, to obtain a scheduling model of the power distribution network to be scheduled;
the Markov decision process building module is used for obtaining a state variable, an action variable and a reward function of the scheduling model and building a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
the training module is used for training a strategy network corresponding to the Markov decision process through a SAC algorithm in combination with basic data in the Markov decision process;
and the scheduling module is used for scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
Correspondingly, the embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium comprises a stored computer program; when the computer program runs, the equipment where the computer readable storage medium is located is controlled to execute the power distribution network scheduling method based on deep reinforcement learning.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
The embodiments of the invention provide a power distribution network scheduling method and device based on deep reinforcement learning and a computer-readable storage medium, the method comprising: constructing, for a power distribution network to be scheduled, operation constraints and cost functions for each of a plurality of devices, constraints and a cost function for electric energy trading between the power distribution network to be scheduled and the main grid, and risk constraints on the node voltages and branch powers of the power distribution network to be scheduled, to obtain a scheduling model of the power distribution network to be scheduled; acquiring the state variables, action variables and reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on them; training the policy network corresponding to the Markov decision process through the SAC algorithm in combination with basic data; and scheduling the power distribution network to be scheduled based on the output of the trained policy network. Compared with the prior art, by constructing the Markov decision process, the policy network trained through the SAC algorithm adapts to online operation and complex computation, achieves millisecond-level fast calculation, and significantly improves generalization capability.
Drawings
FIG. 1 is a schematic flow diagram of an embodiment of the power distribution network scheduling method based on deep reinforcement learning provided by the invention.
FIG. 2 is a schematic diagram of the state of charge of an embodiment of the power distribution network energy storage system provided by the invention.
FIG. 3 is a schematic diagram of the training process of an embodiment of the policy network provided by the invention.
FIG. 4 is a schematic structural diagram of an embodiment of the power distribution network scheduling device based on deep reinforcement learning provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment is as follows:
Referring to FIG. 1, FIG. 1 shows a power distribution network scheduling method based on deep reinforcement learning according to an embodiment of the present invention, comprising steps S1 to S4:
Step S1, constructing, for a power distribution network to be scheduled, operation constraints and cost functions for each of a plurality of devices, constructing constraints and a cost function for electric energy trading between the power distribution network to be scheduled and the main grid, and constructing risk constraints on the node voltages and branch powers of the power distribution network to be scheduled, to obtain a scheduling model (specifically, an economic dispatch model) of the power distribution network to be scheduled.
In this embodiment, the plurality of devices includes at least one diesel unit and at least one energy storage system;
the operation constraint of the diesel units is:
P^{G,min}_i ≤ P^G_{i,t} ≤ P^{G,max}_i,  ∀ i ∈ N_G, ∀ t ∈ T
where P^G_{i,t} is the active power output of the i-th diesel unit (i.e. the diesel unit at the i-th node) of the power distribution network to be scheduled during period t, P^{G,min}_i and P^{G,max}_i are the minimum and maximum active power of the i-th diesel unit, N_G is the set of all nodes connected to a diesel unit, and T is the set of all periods in the scheduling cycle.
The cost functions of the diesel units are:
C^{G,fuel}_t = Σ_{i∈N_G} [ a_{G,i}·(P^G_{i,t})² + b_{G,i}·P^G_{i,t} + c_{G,i} ]
C^{G,carbon}_t = Σ_{i∈N_G} [ d_{G,i}·(P^G_{i,t})² + e_{G,i}·P^G_{i,t} ]
where C^{G,fuel}_t is the sum of the fuel costs of all diesel units of the power distribution network to be scheduled during period t, C^{G,carbon}_t is the sum of the carbon emission costs of all diesel units during period t, a_{G,i}, b_{G,i} and c_{G,i} are the fuel cost coefficients of the i-th diesel unit, and d_{G,i} and e_{G,i} are the carbon emission cost coefficients of the i-th diesel unit.
The operation constraints of the energy storage system are:
-P^{ES,C,max}_i ≤ P^{ES}_{i,t} ≤ P^{ES,D,max}_i,  ∀ i ∈ N_E, ∀ t ∈ T
SOC^{min}_{i,t} ≤ SOC_{i,t} ≤ SOC^{max}_{i,t}
SOC_{i,t+1} = SOC_{i,t} - η_C·P^{ES}_{i,t}/E_i  (charging, P^{ES}_{i,t} < 0);  SOC_{i,t+1} = SOC_{i,t} - P^{ES}_{i,t}/(η_D·E_i)  (discharging, P^{ES}_{i,t} ≥ 0)
where P^{ES}_{i,t} is the active power output of the i-th energy storage system (i.e. the energy storage system at the i-th node) of the power distribution network to be scheduled during period t (P^{ES}_{i,t} > 0 indicates discharging and P^{ES}_{i,t} < 0 indicates charging), P^{ES,C,max}_i is the maximum charging power of the i-th energy storage system, P^{ES,D,max}_i is its maximum discharging power, and both are greater than 0; N_E is the set of all nodes of the power distribution network connected to an energy storage system, SOC_{i,t} is the state of charge of the i-th energy storage system during period t, SOC^{min}_{i,t} and SOC^{max}_{i,t} are the minimum and maximum states of charge allowed for the i-th energy storage system during period t, η_C and η_D are the charging and discharging efficiencies of the energy storage system (η_C, η_D ∈ [0,1]), and E_i is the capacity of the i-th energy storage system.
In these constraints, the first constraint characterizes the capacity limit of the converter to which the energy storage system is connected. The second constraint avoids overcharging and over-discharging, which would shorten the life of the energy storage system. The third constraint characterizes the relationship between the state of charge in the next period and the state of charge and charging/discharging power in the current period. To facilitate scheduling in the next cycle, the SOC at the last period of each scheduling cycle is returned to its initial value, i.e. SOC_{i,0} = SOC_{i,T}.
Moreover, the bounds SOC^{min}_{i,t} and SOC^{max}_{i,t} vary with the current time t instead of remaining constant, as shown in FIG. 2: the slopes of segments A-B and E-D are determined by the maximum charging power of the i-th energy storage system, and the slopes of segments C-D and A-F are determined by its maximum discharging power.
The cost function of the energy storage system is:
C^{ES}_t = Σ_{i∈N_E} a_{E,i}·|P^{ES}_{i,t}|
where C^{ES}_t is the sum of the charging and discharging costs of all energy storage systems of the power distribution network to be scheduled during period t, and a_{E,i} is the cost coefficient of the i-th energy storage system.
Further, the cost function of electric energy trading between the power distribution network to be scheduled and the main grid is:
C^M_t = (1 + β_M)·α_{M,t}·P^M_t,  if P^M_t ≥ 0;   C^M_t = (1 - β_M)·α_{M,t}·P^M_t,  if P^M_t < 0
where C^M_t is the cost of the power distribution network to be scheduled for purchasing electricity from the main grid during period t, P^M_t > 0 is the power purchased by the power distribution network from the main grid during period t, P^M_t < 0 is the power sold by the power distribution network to the main grid during period t, α_{M,t} is the real-time electricity price during period t, and β_M is the proportional difference between the purchase/sale prices and the real-time price. Its purpose is to make the price at which the power distribution network purchases electricity higher than the price at which it sells electricity, so as to promote consumption of power inside the distribution network and reduce the negative impact of internal disturbances of the distribution network on the main grid.
The constraints of electric energy trading between the power distribution network to be scheduled and the main grid are:
S^M_t = √( (P^M_t)² + (Q^M_t)² )
S^{M,min} ≤ S^M_t ≤ S^{M,max}
where Q^M_t is the reactive power flowing from the main grid to the power distribution network to be scheduled during period t, S^M_t is the apparent power flowing from the main grid to the power distribution network during period t, and S^{M,min} and S^{M,max} are the minimum and maximum capacity of the tie line.
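As a hedged sketch of the trading cost above (the value of β_M and the function name are assumptions), the piecewise purchase/sale pricing may be evaluated as follows.

```python
def grid_trade_cost(p_m, price_rt, beta=0.1):
    """Cost of trading with the main grid for one period.

    p_m > 0: power bought from the main grid, p_m < 0: power sold.
    The purchase price is assumed to be (1 + beta) * price_rt and the
    selling price (1 - beta) * price_rt, so buying is dearer than selling.
    """
    if p_m >= 0:
        return (1.0 + beta) * price_rt * p_m
    return (1.0 - beta) * price_rt * p_m       # negative value: revenue from selling

print(grid_trade_cost(0.5, price_rt=0.8))      # buying 0.5 -> positive cost
print(grid_trade_cost(-0.5, price_rt=0.8))     # selling 0.5 -> negative cost
```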
Constructing the risk constraints of the node voltages and branch powers of the power distribution network to be scheduled includes:
constructing an internal power flow calculation model of the power distribution network to be scheduled, in a linearized branch flow form:
P_{i,t} = P^{PV}_{i,t} + P^{WT}_{i,t} + P^G_{i,t} + P^{ES}_{i,t} - P^L_{i,t} + P^M_t·1(i ∈ N_0)
Q_{i,t} = Q^G_{i,t} + Q^{WT}_{i,t} - Q^L_{i,t} + Q^M_t·1(i ∈ N_0)
P_{ij,t} = Σ_k P_{jk,t} - P_{j,t}   (the sum runs over all branches jk leaving node j)
Q_{ij,t} = Σ_k Q_{jk,t} - Q_{j,t}
S_{ij,t} = √( P_{ij,t}² + Q_{ij,t}² )
where P_{i,t} is the net injected active power of node i during period t, Q_{i,t} is the net injected reactive power of node i during period t, P_{ij,t} is the active power flowing on branch ij (branch ij being the branch from node i to node j) during period t, Q_{ij,t} is the reactive power flowing on branch ij during period t, N is the set of all nodes of the power distribution network to be scheduled, B_{ij} characterizes the power flow on branch ij, S_{ij,t} is the apparent power flowing on branch ij during period t, P^{PV}_{i,t} is the active power of photovoltaic generation at node i during period t, P^{WT}_{i,t} is the active power of wind generation at node i during period t, P^L_{i,t} is the active power of the load at node i during period t, Q^G_{i,t} is the reactive power generated by the diesel unit at node i during period t, Q^{WT}_{i,t} is the reactive power of wind generation at node i during period t, Q^L_{i,t} is the reactive power of the load at node i during period t, N_0 is the set of nodes through which the power distribution network is connected to the main grid, and 1(i ∈ N_0) equals 1 if node i belongs to N_0 and 0 otherwise. If node i is not connected to the corresponding device, the corresponding term (e.g. P^{PV}_{i,t}, P^{WT}_{i,t}, P^L_{i,t}, Q^G_{i,t}, Q^{WT}_{i,t} or Q^L_{i,t}) is 0.
The node voltage calculation formula is:
V_{j,t} = V_{i,t} - ( r_{ij}·P_{ij,t} + x_{ij}·Q_{ij,t} ) / V_0
where V_{j,t} is the voltage amplitude of node j during period t, V_{i,t} is the voltage amplitude of node i during period t, r_{ij} and x_{ij} are the resistance and reactance of branch ij, respectively, and V_0 is the node voltage at the connection with the main grid, which is a preset value.
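A minimal sketch of the linearized voltage-drop calculation above is given below, assuming a radial feeder; the node numbering and all branch data are hypothetical.

```python
def feeder_voltages(v0, branches, p_flow, q_flow, r, x):
    """Voltage magnitudes along a radial feeder using the linearized drop
    V_j = V_i - (r_ij * P_ij + x_ij * Q_ij) / V_0.

    branches is a list of (i, j) pairs ordered from the root node 0;
    p_flow, q_flow, r and x are dicts keyed by (i, j).
    """
    v = {0: v0}                                   # node 0 is the main-grid bus
    for (i, j) in branches:
        v[j] = v[i] - (r[(i, j)] * p_flow[(i, j)] + x[(i, j)] * q_flow[(i, j)]) / v0
    return v

branches = [(0, 1), (1, 2)]
print(feeder_voltages(1.0, branches,
                      p_flow={(0, 1): 0.4, (1, 2): 0.2},
                      q_flow={(0, 1): 0.1, (1, 2): 0.05},
                      r={(0, 1): 0.02, (1, 2): 0.03},
                      x={(0, 1): 0.04, (1, 2): 0.05}))
```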
On the basis of the internal power flow calculation model, the risk constraints of the node voltage amplitudes and branch apparent powers of the power distribution network to be scheduled are constructed, specifically:
R^V_t = Σ_{i∈N} w_i·R^V_{i,t} ≤ ε_V
R^S_t = Σ_{ij∈L} w_{ij}·R^S_{ij,t} ≤ ε_S
where R^V_t is the node voltage amplitude risk of the power distribution network to be scheduled during period t, R^S_t is the branch apparent power risk of the power distribution network to be scheduled during period t, ε_V is the node voltage amplitude risk threshold of the power distribution network to be scheduled, ε_S is the branch apparent power risk threshold of the power distribution network to be scheduled, w_i is the weight of node i and w_{ij} is the weight of branch ij, satisfying Σ_{i∈N} w_i = 1 and Σ_{ij∈L} w_{ij} = 1; R^V_{i,t} is the voltage amplitude risk at node i during period t, R^S_{ij,t} is the apparent power risk of branch ij, and L is the set of all branches in the power distribution network.
The node voltage amplitude risk and the branch apparent power risk are defined as the integral of the product of a probability density function and a severity function:
R^V_{i,t} = ∫ PDF(V_{i,t})·Sev_V(V_{i,t}) dV_{i,t}
R^S_{ij,t} = ∫ PDF(S_{ij,t})·Sev_S(S_{ij,t}) dS_{ij,t}
where PDF(V_{i,t}) and PDF(S_{ij,t}) are the probability density functions of the node voltage amplitude V_{i,t} and the branch apparent power S_{ij,t}, respectively, which can be obtained by probabilistic power flow calculation, for example by the point estimate method combined with the Gram-Charlier expansion; Sev_V(V_{i,t}) and Sev_S(S_{ij,t}) are the severity functions of the node voltage amplitude V_{i,t} and the branch apparent power S_{ij,t}, which satisfy:
Sev_V(V_{i,t}) = V^{min} - V_{i,t}  if V_{i,t} < V^{min};  0  if V^{min} ≤ V_{i,t} ≤ V^{max};  V_{i,t} - V^{max}  if V_{i,t} > V^{max}
Sev_S(S_{ij,t}) = S^{min} - S_{ij,t}  if S_{ij,t} < S^{min};  0  if S^{min} ≤ S_{ij,t} ≤ S^{max};  S_{ij,t} - S^{max}  if S_{ij,t} > S^{max}
where V^{min} and V^{max} are the lower and upper limits of the node voltage amplitude, respectively, and S^{min} and S^{max} are the lower and upper limits of the branch apparent power, respectively.
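The risk integral above may be approximated numerically, for example from samples produced by a probabilistic power flow; the following sketch assumes Monte-Carlo voltage samples, a simple band-deviation severity and illustrative limits.

```python
import numpy as np

def voltage_risk(samples, v_lo=0.95, v_hi=1.05, bins=50):
    """Approximate the node-voltage risk as a discretized integral of
    PDF(V) * Sev(V), where the PDF is estimated by a histogram of sampled
    voltages and the severity is the amount by which the voltage leaves
    the band [v_lo, v_hi] (zero inside the band).
    """
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    sev = np.maximum(v_lo - centers, 0.0) + np.maximum(centers - v_hi, 0.0)
    return float(np.sum(hist * sev * np.diff(edges)))

rng = np.random.default_rng(0)
v_samples = rng.normal(1.0, 0.04, size=10000)   # assumed voltage distribution
print(voltage_risk(v_samples))
```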
Step S2, acquiring the state variables, action variables and reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variables, action variables and reward function.
In this embodiment, the state variables, action variables and reward function of the scheduling model are preferably obtained as follows:
defining the state variable s_t of the scheduling model during period t and the action variable a_t during period t:
s_t = { P^{WT}_{i,t}, P^{PV}_{i,t}, P^L_{i,t}, α_{M,t}, SOC_{i,t} }
a_t = { P^G_{i,t}, P^{ES}_{i,t} }
where P^{WT}_{i,t} is the active power of wind generation at node i during period t, P^{PV}_{i,t} is the active power of photovoltaic generation at node i during period t, P^L_{i,t} is the load of node i during period t, α_{M,t} is the real-time electricity price, SOC_{i,t} is the state of charge of the energy storage system at node i during period t, P^G_{i,t} is the active power of the diesel unit at node i during period t, and P^{ES}_{i,t} is the charging/discharging power of the energy storage system at node i during period t.
Wind generation, photovoltaic generation, load and electricity price are exogenous state variables: they are determined by system uncertainty and are not affected by the action variables. The energy storage state of charge is an endogenous state variable, which is affected by the action variables. For exogenous states, the state transition is realized by reading the data of the next period from the data set; for endogenous states, the state transition is realized by calculating the state of charge of the next period. The definition of the action variables is based on the decision variables of the optimization model; however, the active power exchanged with the main grid can be obtained from the active power P^G_{i,t} of the diesel unit at each node and the charging/discharging power P^{ES}_{i,t} of the energy storage system at each node in combination with the power flow calculation, and is therefore not included among the action variables.
At the same time, the reward function of the scheduling model is defined:
r(s_t, a_t) = -( R^C_t + R^P_t )
R^C_t = ω_1·( C^{G,fuel}_t + C^{G,carbon}_t + C^{ES}_t + C^M_t )
R^P_t = ω_2·P^{SOC}_t + ω_3·P^V_t + ω_4·P^S_t
where r(s_t, a_t) is the reward obtained by the agent for taking action a_t in state s_t, R^C_t is the weighted total operating cost during period t (including the fuel cost and carbon emission cost of the diesel units, the charging/discharging cost of the energy storage systems and the cost of purchasing electricity from the main grid), R^P_t is the weighted total penalty during period t (including the penalty P^{SOC}_t for violating the state-of-charge constraint, the penalty P^V_t for the node voltage amplitude risk exceeding its threshold and the penalty P^S_t for the branch apparent power risk exceeding its threshold), and ω_1, ω_2, ω_3 and ω_4 are the weights of the reward components.
The agent learns through interaction with the environment: the agent perceives the current environment state s_t and performs action a_t, the environment transitions to the next state s_{t+1}, and the agent obtains the reward r(s_t, a_t).
Based on the state variables, action variables and reward function, a Markov decision process is constructed for the scheduling model, specifically according to the following equations:
M = ( S, A, P, r )
s_{t+1} ~ P( · | s_t, a_t )
where S is the state space, A is the action space, P is the state transition probability function and r is the reward function. The goal of the agent is to maximize the long-term accumulated reward obtained by interacting with the environment; the cost and penalty in the reward function are therefore defined as negative, so as to guide the agent to minimize the operating cost and satisfy the constraints.
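For illustration, a minimal environment skeleton implementing this Markov decision process (exogenous data read from a data set, endogenous SOC propagated, reward equal to the negative cost plus penalty) could look as follows; the cost and penalty terms and all numeric values are stand-ins, not the embodiment's actual model.

```python
import numpy as np

class DistributionNetworkEnv:
    """Minimal MDP skeleton for the dispatch problem: wind, PV, load and
    price are exogenous and read from a data array, the SOC is endogenous
    and propagated, and the reward is the negative weighted cost plus a
    penalty. All cost/penalty expressions below are placeholders.
    """

    def __init__(self, data, e_cap=2.0):
        self.data = data            # array of shape (T, 4): wind, pv, load, price
        self.e_cap = e_cap

    def reset(self):
        self.t, self.soc = 0, 0.5
        return self._obs()

    def _obs(self):
        wind, pv, load, price = self.data[self.t]
        return np.array([wind, pv, load, price, self.soc], dtype=np.float32)

    def step(self, action):
        p_g, p_es = action                                      # diesel and ESS set-points
        cost = 0.02 * p_g**2 + 1.5 * p_g + 0.05 * abs(p_es)     # stand-in cost terms
        self.soc -= p_es / self.e_cap                           # simplified SOC transition
        penalty = 10.0 * max(0.0, abs(self.soc - 0.5) - 0.4)    # stand-in SOC penalty
        self.t += 1
        done = self.t >= len(self.data) - 1
        return self._obs(), -(cost + penalty), done, {}

env = DistributionNetworkEnv(np.random.rand(24, 4))
s = env.reset()
s, r, done, _ = env.step(np.array([0.5, 0.1]))
print(s, r, done)
```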
Step S3, training the policy network corresponding to the Markov decision process through the SAC algorithm in combination with basic data.
Specifically, in this embodiment, the parameters of the SAC algorithm are updated through the ASAM algorithm and the PER algorithm, and the agent and the policy network corresponding to the Markov decision process are trained through the updated SAC algorithm; the parameters of the SAC algorithm comprise the soft Q network parameters, the temperature coefficient and the network parameters of the policy network.
The objective function of the SAC algorithm is to maximize:
J(π) = Σ_t E_{(s_t, a_t)~ρ_π} [ r(s_t, a_t) + α·H( π(·|s_t) ) ]
where π(a_t|s_t) is the probability that the agent takes action a_t in state s_t, ρ_π is the state-action trajectory generated by policy π, H(π(·|s_t)) = E_{a_t~π}[ -log π(a_t|s_t) ] is the entropy of policy π, α is the temperature coefficient, which reflects the relative importance of the policy entropy and the reward in the objective function of the SAC algorithm (as α → 0 the objective degenerates to the maximization of long-term accumulated reward in conventional reinforcement learning), and E[·] denotes the mathematical expectation. By adding the maximization of the policy entropy to the objective function, the SAC algorithm effectively encourages the agent to explore unknown regions of the state-action space and improves its learning speed.
The SAC algorithm is based on artificial neural networks: the soft Q function is parameterized as Q_θ(s_t, a_t) with soft Q network parameters θ, and the Gaussian policy is parameterized as π_φ(a_t|s_t) with policy network parameters φ.
The input of the soft Q network is a state and an action, and its output is the 1-dimensional Q value of the state-action pair; the input of the policy network is a state, and its output is the mean and standard deviation of the Gaussian action. To alleviate the over-estimation problem of the soft Q function, two soft Q networks with parameters θ_i (i = 1, 2) are established and trained independently at the same time, and the smaller of the two output Q values is used to update the parameters of the soft Q networks and the policy network. The records (s_t, a_t, s_{t+1}, r_t) of the agent's interaction with the environment are stored in an experience replay pool, and each time the network parameters are updated, a batch of samples is drawn from the experience replay pool to perform stochastic gradient descent.
The parameters of the soft Q networks are updated through the soft Bellman residual:
J_Q(θ_i) = E_{(s_t, a_t)~D} [ ½·( Q_{θ_i}(s_t, a_t) - y_t )² ]
where D is the experience replay pool. Each soft Q network has a corresponding target network, whose parameters θ̄_i are obtained by soft updates of the soft Q network parameters:
θ̄_i ← τ·θ_i + (1 - τ)·θ̄_i
where τ is the smoothing coefficient of the target network and is much smaller than 1. In the soft Bellman residual, the smaller of the Q values output by the two target networks is substituted into the target value:
y_t = r(s_t, a_t) + γ·( min_{j=1,2} Q_{θ̄_j}(s_{t+1}, a_{t+1}) - α·log π_φ(a_{t+1}|s_{t+1}) ),   a_{t+1} ~ π_φ(·|s_{t+1})
where γ is the discount factor.
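A short sketch of the soft Bellman target described above, taking the smaller of the two target-network Q values and adding the entropy term, is shown below; the tensor layout (one value per sampled transition) is an assumption.

```python
import torch

def soft_q_target(reward, next_q1, next_q2, next_logp, alpha, gamma=0.99, done=None):
    """Soft Bellman target y_t used to regress both soft Q networks:
    the element-wise minimum of the two target-network Q values plus the
    entropy term -alpha * log pi, discounted and added to the reward.
    """
    if done is None:
        done = torch.zeros_like(reward)
    next_q = torch.min(next_q1, next_q2) - alpha * next_logp
    return reward + gamma * (1.0 - done) * next_q

# tiny example with fabricated batch values
r = torch.tensor([1.0, -0.5])
q1, q2 = torch.tensor([2.0, 1.0]), torch.tensor([1.8, 1.2])
logp = torch.tensor([-1.0, -0.7])
print(soft_q_target(r, q1, q2, logp, alpha=0.2))
```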
in order to improve generalization capability, an Adaptive Sharpness Aware Minimization (ASAM) algorithm is introduced into parameter update of the soft Q network, and an objective function of the algorithm is as follows:
Figure BDA0003768467550000157
wherein e is i As a network parameter theta i (i =1, 2), p being a hyper-parameter defining this neighborhood,
Figure BDA0003768467550000158
for a normalized operator of a network parameter, for a fully connected network:
Figure BDA0003768467550000159
Figure BDA00037684675500001510
and λ is a weight attenuation coefficient of L2 regularization, which is the weight coefficient of the kth layer of the ith soft Q network.
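The inner maximization of the ASAM objective can be approximated by a single normalized gradient step, as sketched below; the element-wise magnitude normalization operator used here is an assumption for a fully connected network.

```python
import torch

def asam_perturbation(params, grads, rho=0.5, eps=1e-12):
    """One adaptive-sharpness step: the gradient is rescaled by the
    parameter magnitudes (T_theta) and the perturbation
    epsilon = rho * T_theta^2 * grad / ||T_theta * grad|| is returned.
    """
    t_grads = [p.abs() * g for p, g in zip(params, grads)]        # T_theta * grad
    norm = torch.sqrt(sum(tg.pow(2).sum() for tg in t_grads)) + eps
    return [rho * p.abs() * tg / norm for p, tg in zip(params, t_grads)]

w = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)
loss = w.pow(2).sum()
loss.backward()
eps_hat = asam_perturbation([w], [w.grad], rho=0.1)
print(eps_hat[0])    # perturbation added to w before the sharpness-aware update
```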
For the policy network, the update goal is to minimize the Kullback-Leibler divergence of the policy, which amounts to minimizing:
J_π(φ) = E_{s_t~D, a_t~π_φ} [ α·log π_φ(a_t|s_t) - Q_θ(s_t, a_t) ]
where Q_θ(s_t, a_t) is the smaller of the Q values output by the two soft Q networks. The temperature coefficient α measures the trade-off between the reward and the policy entropy in the objective function. The magnitude of the reward function has a direct effect on the temperature coefficient α, so the performance of the SAC algorithm is impaired unless the temperature coefficient is adjusted across different tasks or during the training of the same task. During training, the temperature coefficient is therefore adjusted automatically with the goal of minimizing:
J(α) = E_{a_t~π_φ} [ -α·log π_φ(a_t|s_t) - α·H̄ ]
where H̄ is the target policy entropy.
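A compact sketch of the policy loss and the automatic temperature adjustment described above follows; the batch values and the log-parameterization of α are assumptions.

```python
import torch

def policy_and_alpha_losses(logp, q_min, log_alpha, target_entropy):
    """Policy loss (alpha * log pi - Q, using the smaller of the two soft
    Q values) and the temperature loss for automatic alpha adjustment,
    both returned as batch means. target_entropy is the entropy target.
    """
    alpha = log_alpha.exp()
    policy_loss = (alpha.detach() * logp - q_min).mean()
    alpha_loss = -(log_alpha * (logp + target_entropy).detach()).mean()
    return policy_loss, alpha_loss

logp = torch.tensor([-1.2, -0.8])                 # log pi(a_t | s_t) of sampled actions
q_min = torch.tensor([1.5, 0.9])                  # min of the two soft Q values
log_alpha = torch.zeros(1, requires_grad=True)
print(policy_and_alpha_losses(logp, q_min, log_alpha, target_entropy=-2.0))
```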
The soft Q network parameters, policy network parameters and temperature coefficient are then updated based on stochastic gradient descent. Updating the soft Q network parameters requires solving a min-max optimization problem:
first, the inner max problem is approximated by a first-order Taylor expansion to obtain the optimal perturbation ε̂_i, and then θ_i is updated by gradient descent. The parameter update formulas are:
ε̂_i = ρ·T_{θ_i}²·∇_{θ_i} J_Q(θ_i) / || T_{θ_i}·∇_{θ_i} J_Q(θ_i) ||_2
θ_i ← θ_i - λ_Q·∇_{θ_i} [ J_Q(θ_i + ε̂_i) + λ·||θ_i||²_2 ]
φ ← φ - λ_π·∇_φ J_π(φ)
α ← α - λ_α·∇_α J(α)
where λ_Q and λ_π are the learning rates of the soft Q networks and the policy network, respectively, and λ_α is the step size for updating the temperature coefficient α.
Secondly, through the Prioritized Experience Replay (PER) algorithm, each sample is given a priority based on the absolute value of its temporal-difference (TD) error, and the sampling probabilities are differentiated:
P(k) = p(k)^{β_1} / Σ_j p(j)^{β_1}
where P(k) is the sampling probability of the k-th sample in the experience replay pool, p(k) is the priority of the k-th sample in the experience replay pool, and β_1 measures the degree of prioritization (β_1 = 0 corresponds to uniform sampling). In proportional prioritization, the priority p(k) is defined as:
p(k) = |δ(k)| + ε
where δ(k) is the TD error of the k-th sample in the experience replay pool, i.e. a sample with a larger absolute TD error is considered to have a higher learning value, and ε is a small positive number ensuring that a sample still has some probability of being drawn even if its TD error is 0.
For the i-th soft Q network, the TD error δ_i is closely related to its loss function:
δ_i(k) = r_k + γ·( min_{j=1,2} Q_{θ̄_j}(s_{k+1}, a_{k+1}) - α·log π_φ(a_{k+1}|s_{k+1}) ) - Q_{θ_i}(s_k, a_k)
and the TD error used to update the priority of the k-th sample in the experience replay pool is the average of δ_1(k) and δ_2(k) (i = 1, 2).
The prioritization in sampling introduces a bias into the soft Q function estimate, so when calculating the loss function the bias is removed by weighting the samples with importance sampling (IS):
w_k = ( N·P(k) )^{-β_2} / max_j w_j
J_Q(θ_i) = E_k [ w_k · ½·( Q_{θ_i}(s_k, a_k) - y_k )² ]
where w_k is the IS weight of the k-th sample in the experience replay pool, which is normalized by the maximum weight for stability, N is the size of the experience replay pool, and β_2 measures the compensation strength of the IS weights (the compensation is complete when β_2 = 1). β_2 starts from an initial value at the beginning of training and increases linearly to 1 by the end of training.
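The prioritized sampling probabilities and importance-sampling weights described above may be computed as in the following sketch; the values of β_1, β_2 and ε are illustrative.

```python
import numpy as np

def per_probabilities(td_errors, beta1=0.6, eps=1e-3):
    """Proportional prioritization: p(k) = |delta(k)| + eps and
    P(k) = p(k)^beta1 / sum_j p(j)^beta1."""
    p = np.abs(td_errors) + eps
    probs = p**beta1
    return probs / probs.sum()

def is_weights(probs, idx, beta2):
    """Importance-sampling weights w_k = (N * P(k))^(-beta2), normalized
    by the maximum weight for stability; beta2 is annealed towards 1."""
    n = len(probs)
    w = (n * probs[idx]) ** (-beta2)
    return w / w.max()

td = np.array([0.5, -1.2, 0.1, 2.0])
probs = per_probabilities(td)
batch = np.random.default_rng(0).choice(len(td), size=2, p=probs)
print(probs, batch, is_weights(probs, batch, beta2=0.4))
```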
The training of the policy network corresponding to the Markov decision process comprises the following steps (with reference to FIG. 3):
Step S31, randomly initializing the policy network parameters φ and the two soft Q network parameters θ_1, θ_2, and copying the soft Q network parameters to the corresponding target networks: θ̄_1 ← θ_1, θ̄_2 ← θ_2.
Step S32, in each period of each scheduling cycle, the agent perceives the environment state, reading the wind generation, photovoltaic generation, load and electricity price of the current period from the training data set (the basic data comprises the training data set and a historical data set) together with the state of charge of the energy storage systems in the current period; an action is sampled from the Gaussian distribution defined by the action mean and variance output by the policy network and executed, a_t ~ π_φ(a_t|s_t); the environment transitions to the next state s_{t+1}, the wind generation, photovoltaic generation, load and electricity price of the next period are read from the training data set, and the state of charge of the energy storage systems in the next period is calculated; the agent obtains the reward r(s_t, a_t), and the sample (s_t, a_t, s_{t+1}, r_t) is stored in the experience replay pool with the current maximum priority p = max_j p_j.
Step S33, in each period of each scheduling cycle, the k-th sample is drawn from the experience replay pool with probability P(k), its IS weight w_k and TD error δ(k) are calculated, its priority p(k) is updated, and the soft Q network loss function J_Q(θ_i) is accumulated with the IS weight w_k; this process draws n samples in total.
Step S34, in each period of each scheduling cycle, the optimal neighborhood perturbation ε̂_i of the network parameters defined by the adaptive sharpness is calculated, the soft Q network parameters θ_1, θ_2, the policy network parameters φ and the temperature coefficient α are updated based on gradient descent, and the target network parameters θ̄_1, θ̄_2 are soft-updated.
Step S35, repeating steps S32 to S34 until the current scheduling cycle ends.
Step S36, repeating steps S32 to S35 until the number of scheduling cycles reaches a preset value and the cycle reward curve tends to be stable.
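A high-level skeleton of the training procedure in steps S31 to S36 is sketched below; the environment, action-selection and update routines are stand-ins passed in as callables, and the prioritized replay bookkeeping is only indicated in the comments.

```python
import numpy as np

def train(env_step, env_reset, select_action, update_networks,
          episodes=3, horizon=24):
    """Skeleton of the training loop: for each scheduling cycle the agent
    senses the state, samples an action from the Gaussian policy, stores
    the transition (with maximal priority in the full method) and performs
    one prioritized SAC/ASAM/PER update per period.
    """
    replay = []
    for ep in range(episodes):
        s = env_reset()
        ep_reward = 0.0
        for t in range(horizon):
            a = select_action(s)
            s2, r, done = env_step(s, a)
            replay.append((s, a, r, s2))      # stored with max priority p = max_j p_j
            update_networks(replay)           # stands in for steps S33 and S34
            s, ep_reward = s2, ep_reward + r
            if done:
                break
        print(f"cycle {ep}: reward {ep_reward:.3f}")

# dummy stand-ins so the skeleton runs end-to-end
rng = np.random.default_rng(0)
train(env_step=lambda s, a: (s + 0.01 * a, -float(np.abs(a).sum()), False),
      env_reset=lambda: np.zeros(2),
      select_action=lambda s: rng.normal(size=2),
      update_networks=lambda buf: None)
```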
Step S4, scheduling the power distribution network to be scheduled based on the output of the trained policy network.
In this embodiment, with the trained policy network, at each period the agent perceives the current environment state s_t, reading the wind generation, photovoltaic generation, load and electricity price of the current period from the real-time data together with the state of charge of the energy storage systems in the current period, and executes the action a_t given by the action mean output by the policy network; the environment transitions to the next state s_{t+1}, the wind generation, photovoltaic generation, load and electricity price of the next period are read from the real-time data, the state of charge of the energy storage systems in the next period is calculated, and the agent obtains the reward r(s_t, a_t). The same steps are performed for each period until the end of the current scheduling cycle.
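At deployment time, the sketch below illustrates taking the Gaussian mean output by the trained policy network as the dispatch action for one period; the state layout and the action normalization are assumptions.

```python
import numpy as np

def dispatch_one_period(policy_mean, state):
    """Deterministic dispatch: the mean output by the trained policy
    network is taken as the action (diesel set-points and ESS
    charge/discharge powers). policy_mean stands in for the trained
    network's forward pass.
    """
    action = policy_mean(state)
    return np.clip(action, -1.0, 1.0)   # actions assumed normalized to [-1, 1]

state = np.array([0.3, 0.1, 0.6, 0.8, 0.5])   # wind, pv, load, price, soc
print(dispatch_one_period(lambda s: np.tanh(s[:2]), state))
```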
Correspondingly, referring to fig. 4, an embodiment of the present invention further provides a power distribution network scheduling apparatus based on deep reinforcement learning, including:
the constraint module 101 is configured to construct an operation constraint and a cost function respectively corresponding to a plurality of devices for a power distribution network to be scheduled, construct a constraint and a cost function corresponding to electric energy transaction between the power distribution network to be scheduled and a main network, construct a risk constraint of node voltage and branch power of the power distribution network to be scheduled, and obtain a scheduling model of the power distribution network to be scheduled;
a markov decision process constructing module 102, configured to obtain a state variable, an action variable, and a reward function of the scheduling model, and construct a markov decision process for the scheduling model based on the state variable, the action variable, and the reward function;
a training module 103, configured to train, in the markov decision process, a policy network corresponding to the markov decision process through a SAC algorithm in combination with basic data;
and the scheduling module 104 is configured to schedule the power distribution network to be scheduled based on the output of the trained policy network.
Correspondingly, the embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium comprises a stored computer program; when the computer program runs, the device where the computer readable storage medium is located is controlled to execute the power distribution network scheduling method based on the deep reinforcement learning.
If the modules integrated in the power distribution network scheduling device based on deep reinforcement learning are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
The embodiments of the invention provide a power distribution network scheduling method and device based on deep reinforcement learning and a computer-readable storage medium, the method comprising: constructing, for a power distribution network to be scheduled, operation constraints and cost functions for each of a plurality of devices, constraints and a cost function for electric energy trading between the power distribution network to be scheduled and the main grid, and risk constraints on the node voltages and branch powers of the power distribution network to be scheduled, to obtain a scheduling model of the power distribution network to be scheduled; acquiring the state variables, action variables and reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on them; training the policy network corresponding to the Markov decision process through the SAC algorithm in combination with basic data; and scheduling the power distribution network to be scheduled based on the output of the trained policy network. Compared with the prior art, by constructing the Markov decision process, the policy network trained through the SAC algorithm adapts to online operation and complex computation, achieves millisecond-level fast calculation, and significantly improves generalization capability.
It should be noted that the above-described apparatuses are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement without inventive effort.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modifications, equivalent replacements, improvements and the like made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A power distribution network scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
constructing operation constraints and cost functions respectively corresponding to a plurality of devices of a power distribution network to be scheduled, constructing constraints and a cost function corresponding to electric energy transactions between the power distribution network to be scheduled and a main network, and constructing node voltage and branch power risk constraints of the power distribution network to be scheduled, to obtain a scheduling model of the power distribution network to be scheduled;
acquiring a state variable, an action variable and a reward function of the scheduling model, and constructing a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
training a strategy network corresponding to the Markov decision process through a SAC algorithm in combination with basic data in the Markov decision process;
and scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
2. The power distribution network scheduling method based on deep reinforcement learning according to claim 1, wherein the training of the strategy network corresponding to the Markov decision process through the SAC algorithm specifically comprises:
updating parameters of the SAC algorithm through an ASAM algorithm and a PER algorithm, and training an intelligent agent and a strategy network corresponding to the Markov decision process through the updated SAC algorithm; wherein the parameters of the SAC algorithm comprise soft Q network parameters, temperature coefficients and network parameters of the policy network.
3. The power distribution network scheduling method based on deep reinforcement learning according to claim 1, wherein the plurality of devices comprise at least one diesel engine set and at least one energy storage system;
the operation constraint of the diesel engine set is:
P_i^{G,min} ≤ P_{i,t}^G ≤ P_i^{G,max},  ∀i ∈ N_G, ∀t ∈ T
wherein P_{i,t}^G is the active power output of the ith diesel engine set in the power distribution network to be scheduled in the period t, P_i^{G,min} is the minimum active power of the ith diesel engine set of the power distribution network to be scheduled, P_i^{G,max} is the maximum active power of the ith diesel engine set of the power distribution network to be scheduled, N_G is the set of all nodes to which a diesel engine set is connected, and T is the set of all time periods in the scheduling cycle;
the cost function of the diesel engine set is:
C_t^{G,fuel} = Σ_{i∈N_G} [ a_{G,i} (P_{i,t}^G)^2 + b_{G,i} P_{i,t}^G + c_{G,i} ]
together with a carbon emission cost C_t^{G,carbon}, summed over all diesel engine sets and determined by the active output P_{i,t}^G with the coefficients d_{G,i} and e_{G,i};
wherein C_t^{G,fuel} is the sum of the fuel costs of all the diesel engine sets in the power distribution network to be scheduled in the period t, C_t^{G,carbon} is the sum of the carbon emission costs of all the diesel engine sets in the power distribution network to be scheduled in the period t, a_{G,i}, b_{G,i} and c_{G,i} are the fuel cost coefficients of the ith diesel engine set, and d_{G,i} and e_{G,i} are the carbon emission cost coefficients of the ith diesel engine set.
4. The power distribution network scheduling method based on deep reinforcement learning of claim 3, wherein the operation constraints of the energy storage system are as follows:
-P_i^{E,ch} ≤ P_{i,t}^E ≤ P_i^{E,dis},  ∀i ∈ N_E, ∀t ∈ T
SOC_{i,t}^{min} ≤ SOC_{i,t} ≤ SOC_{i,t}^{max}
together with a state-of-charge transition equation in which SOC_{i,t+1} is determined by SOC_{i,t}, the charging and discharging power of the period, the charging efficiency η_C, the discharging efficiency η_D and the capacity E_i;
wherein P_{i,t}^E is the active output of the ith energy storage system in the power distribution network to be scheduled in the period t, P_i^{E,ch} is the maximum charging power of the ith energy storage system in the power distribution network to be scheduled, P_i^{E,dis} is the maximum discharging power of the ith energy storage system in the power distribution network to be scheduled, N_E is the set of all nodes to which an energy storage system is connected in the power distribution network to be scheduled, SOC_{i,t} is the state of charge of the ith energy storage system in the power distribution network to be scheduled in the period t, SOC_{i,t}^{min} is the minimum state of charge allowed by the ith energy storage system in the period t, SOC_{i,t}^{max} is the maximum state of charge allowed by the ith energy storage system in the period t, η_C is the charging efficiency of the energy storage system, η_D is the discharging efficiency of the energy storage system, and E_i is the capacity of the ith energy storage system in the power distribution network to be scheduled;
the cost function of the energy storage system is:
C_t^E = Σ_{i∈N_E} a_{E,i} |P_{i,t}^E|
wherein C_t^E is the sum of the charging and discharging costs of all the energy storage systems of the power distribution network to be scheduled in the period t, and a_{E,i} is the cost coefficient of the ith energy storage system in the power distribution network to be scheduled.
5. The distribution network scheduling method based on deep reinforcement learning of claim 4, wherein the cost function of the electric power transaction between the distribution network to be scheduled and the main network is as follows:
C_t^M = (1 + β) a_{M,t} P_t^M when P_t^M ≥ 0, and C_t^M = (1 - β) a_{M,t} P_t^M when P_t^M < 0
wherein C_t^M is the cost of the power distribution network to be scheduled purchasing electricity from the main network in the period t, P_t^M > 0 denotes the power purchased by the power distribution network to be scheduled from the main network in the period t, P_t^M < 0 denotes the power sold by the power distribution network to be scheduled to the main network in the period t, a_{M,t} is the real-time electricity price in the period t, and β is the proportion by which the purchase and sale prices differ from the real-time price of the main network;
the constraints of the electric energy transaction between the power distribution network to be scheduled and the main network are as follows:
S_t^M = sqrt( (P_t^M)^2 + (Q_t^M)^2 )
S^{M,min} ≤ S_t^M ≤ S^{M,max}
wherein Q_t^M is the reactive power flowing from the main network to the power distribution network to be scheduled in the period t, S_t^M is the apparent power flowing from the main network to the power distribution network to be scheduled in the period t, S^{M,min} is the minimum capacity of the transmission line, and S^{M,max} is the maximum capacity of the transmission line.
6. The power distribution network scheduling method based on deep reinforcement learning of claim 5, wherein the constructing of the risk constraints of the node voltage and the branch power of the power distribution network to be scheduled comprises:
constructing an internal power flow calculation model of the power distribution network to be scheduled:
P_{i,t} = P_{i,t}^{PV} + P_{i,t}^{WT} + P_{i,t}^{G} + P_{i,t}^{E} - P_{i,t}^{L}
Q_{i,t} = Q_{i,t}^{G} + Q_{i,t}^{WT} - Q_{i,t}^{L}
with the active and reactive power exchanged with the main network additionally included in the injections of the nodes in N_0; the net nodal injections are related to the branch power flows P_{ij,t} and Q_{ij,t} through the coefficients B_{ij} that characterize the power flow direction on branch ij, and the apparent power flowing on branch ij satisfies
S_{ij,t} = sqrt( (P_{ij,t})^2 + (Q_{ij,t})^2 )
wherein P_{i,t} is the net injected active power of node i in the period t, Q_{i,t} is the net injected reactive power of node i in the period t, P_{ij,t} is the active power flowing on branch ij in the period t, Q_{ij,t} is the reactive power flowing on branch ij in the period t, N is the set of all nodes in the power distribution network to be scheduled, B_{ij} characterizes the power flow direction on branch ij, S_{ij,t} is the apparent power flowing on branch ij in the period t, P_{i,t}^{PV} is the active power of the photovoltaic generation at node i in the period t, P_{i,t}^{WT} is the active power of the wind power generation at node i in the period t, P_{i,t}^{L} is the active power of the load at node i in the period t, Q_{i,t}^{G} is the reactive power generated by the diesel engine set at node i in the period t, Q_{i,t}^{WT} is the reactive power of the wind power generation at node i in the period t, Q_{i,t}^{L} is the reactive power of the load at node i in the period t, and N_0 is the set of nodes at which the power distribution network is connected to the main network;
constructing risk constraints of node voltage amplitude and branch apparent power of the power distribution network to be scheduled on the basis of the internal power flow calculation model, and specifically:
R_t^V = Σ_{i∈N} w_i R_{i,t}^V ≤ ε_V
R_t^S = Σ_{ij∈Ω} w_{ij} R_{ij,t}^S ≤ ε_S
wherein R_t^V is the node voltage amplitude risk of the power distribution network to be scheduled in the period t, R_t^S is the branch apparent power risk of the power distribution network to be scheduled in the period t, ε_V is the node voltage amplitude risk threshold of the power distribution network to be scheduled, ε_S is the branch apparent power risk threshold of the power distribution network to be scheduled, w_i is the weight of node i, w_{ij} is the weight of branch ij, R_{i,t}^V is the voltage amplitude risk of node i in the period t, R_{ij,t}^S is the apparent power risk of branch ij, and Ω is the set of all branches in the power distribution network.
7. The power distribution network scheduling method based on deep reinforcement learning according to claim 6, wherein the obtaining of the state variables, the action variables and the reward functions of the scheduling model specifically includes:
defining a state variable s_t of the scheduling model in the period t and an action variable a_t in the period t:
s_t = ( P_{i,t}^{WT}, P_{i,t}^{PV}, P_{i,t}^{L}, α_{M,t}, SOC_{i,t} )
a_t = ( P_{i,t}^{G}, P_{i,t}^{E} )
wherein P_{i,t}^{WT} is the active power of the wind power generation at node i in the period t, P_{i,t}^{PV} is the active power of the photovoltaic generation at node i in the period t, P_{i,t}^{L} is the load of node i in the period t, α_{M,t} is the real-time electricity price, SOC_{i,t} is the state of charge of the energy storage system at node i in the period t, P_{i,t}^{G} is the active power of the diesel engine set at node i in the period t, and P_{i,t}^{E} is the charging and discharging power of the energy storage system at node i in the period t;
and defining a reward function of the scheduling model:
r(s_t, a_t) = -( C_t^{Σ} + D_t^{Σ} )
wherein r(s_t, a_t) is the reward obtained by the agent for taking action a_t in state s_t, C_t^{Σ} is the weighted total operation cost in the period t, D_t^{Σ} is the weighted total penalty in the period t, and ω_1, ω_2, ω_3 and ω_4 are the weights of the reward components.
8. The distribution network dispatching method based on deep reinforcement learning of claim 7, wherein the Markov decision process is constructed for the dispatching model based on the state variables, the action variables and the reward function, and specifically comprises:
constructing the Markov decision process according to the following equation:
( S, A, P, r )
wherein S is the state space, A is the action space, P is the state transition probability function giving the probability that the scheduling model transfers from the state s_t to the next state s_{t+1} under the action a_t, and r is the reward function.
9. A power distribution network scheduling device based on deep reinforcement learning, characterized by comprising:
the system comprises a constraint module, a master network and a scheduling module, wherein the constraint module is used for constructing operation constraints and cost functions corresponding to a plurality of devices of a power distribution network to be scheduled, constructing constraints and cost functions corresponding to electric energy transactions of the power distribution network to be scheduled and a master network, constructing risk constraints of node voltage and branch power of the power distribution network to be scheduled, and obtaining a scheduling model of the power distribution network to be scheduled;
the Markov decision process building module is used for obtaining a state variable, an action variable and a reward function of the scheduling model and building a Markov decision process for the scheduling model based on the state variable, the action variable and the reward function;
the training module is used for training a strategy network corresponding to the Markov decision process through an SAC algorithm in combination with basic data in the Markov decision process;
and the scheduling module is used for scheduling the power distribution network to be scheduled based on the output of the trained strategy network.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program; the computer program controls, when executed, an apparatus on which the computer readable storage medium is located to execute the method for scheduling a power distribution network based on deep reinforcement learning according to any one of claims 1 to 8.
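Purely for readability, the cost and reward structure recited in claims 3 to 7 may be read as in the following illustrative sketch; the numerical coefficients, the linear carbon-cost term, the absolute-value storage cost and the single risk weight are assumptions made for this sketch only and are not the claimed formulas:

# Illustrative reading of claims 3-7; coefficient values and the exact carbon and
# storage cost forms are assumptions made for this sketch only.

def diesel_cost(p_g, a=0.01, b=2.0, c=5.0, d=0.5, e=0.1):
    fuel = a * p_g ** 2 + b * p_g + c      # quadratic fuel cost with coefficients a, b, c
    carbon = d * p_g + e                   # carbon emission cost, assumed linear here
    return fuel + carbon

def storage_cost(p_e, a_e=0.05):
    return a_e * abs(p_e)                  # charge/discharge cost with coefficient a_E

def grid_trade_cost(p_m, price, beta=0.1):
    # buying (p_m >= 0) is charged above the real-time price, selling below it
    factor = (1 + beta) if p_m >= 0 else (1 - beta)
    return factor * price * p_m

def reward(p_g, p_e, p_m, price, v_risk, s_risk, w=(1.0, 1.0, 1.0, 10.0)):
    cost = w[0] * diesel_cost(p_g) + w[1] * storage_cost(p_e) + w[2] * grid_trade_cost(p_m, price)
    penalty = w[3] * (v_risk + s_risk)     # node-voltage and branch-power risk penalty
    return -(cost + penalty)               # reward is the negated weighted cost plus penalty

# Example: 0.8 MW of diesel output, 0.2 MW discharged from storage,
# 0.5 MW bought from the main grid at a price of 60, with small residual risks.
print(reward(0.8, 0.2, 0.5, 60.0, v_risk=0.01, s_risk=0.02))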
CN202210893449.XA 2022-07-27 2022-07-27 Power distribution network scheduling method, device and medium based on deep reinforcement learning Pending CN115169957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210893449.XA CN115169957A (en) 2022-07-27 2022-07-27 Power distribution network scheduling method, device and medium based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210893449.XA CN115169957A (en) 2022-07-27 2022-07-27 Power distribution network scheduling method, device and medium based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115169957A true CN115169957A (en) 2022-10-11

Family

ID=83496657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210893449.XA Pending CN115169957A (en) 2022-07-27 2022-07-27 Power distribution network scheduling method, device and medium based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115169957A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116316755A (en) * 2023-03-07 2023-06-23 西南交通大学 Energy management method for electrified railway energy storage system based on reinforcement learning
CN116316755B (en) * 2023-03-07 2023-11-14 西南交通大学 Energy management method for electrified railway energy storage system based on reinforcement learning
CN116562464A (en) * 2023-07-03 2023-08-08 南京菁翎信息科技有限公司 Deep reinforcement learning-based low-carbon optimal scheduling method for power system
CN116562464B (en) * 2023-07-03 2023-09-19 南京菁翎信息科技有限公司 Deep reinforcement learning-based low-carbon optimal scheduling method for power system

Similar Documents

Publication Publication Date Title
CN111884213B (en) Power distribution network voltage adjusting method based on deep reinforcement learning algorithm
CN114725936B (en) Power distribution network optimization method based on multi-agent deep reinforcement learning
Li et al. A merged fuzzy neural network and its applications in battery state-of-charge estimation
CN113572157B (en) User real-time autonomous energy management optimization method based on near-end policy optimization
CN112186743A (en) Dynamic power system economic dispatching method based on deep reinforcement learning
CN109347149A (en) Micro-capacitance sensor energy storage dispatching method and device based on depth Q value network intensified learning
CN113326994A (en) Virtual power plant energy collaborative optimization method considering source load storage interaction
CN115169957A (en) Power distribution network scheduling method, device and medium based on deep reinforcement learning
CN110518580A (en) Active power distribution network operation optimization method considering micro-grid active optimization
CN114362187B (en) Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
CN112507614A (en) Comprehensive optimization method for power grid in distributed power supply high-permeability area
CN117057553A (en) Deep reinforcement learning-based household energy demand response optimization method and system
CN116451880B (en) Distributed energy optimization scheduling method and device based on hybrid learning
CN114723230B (en) Micro-grid double-layer scheduling method and system for new energy power generation and energy storage
Kim et al. Optimize the operating range for improving the cycle life of battery energy storage systems under uncertainty by managing the depth of discharge
CN113972645A (en) Power distribution network optimization method based on multi-agent depth determination strategy gradient algorithm
Liu et al. Multi-state joint estimation of series battery pack based on multi-model fusion
CN116359742B (en) Energy storage battery state of charge on-line estimation method and system based on deep learning combination extended Kalman filtering
CN117277327A (en) Grid-connected micro-grid optimal energy management method based on intelligent agent
CN117039981A (en) Large-scale power grid optimal scheduling method, device and storage medium for new energy
CN115276067B (en) Distributed energy storage voltage adjusting method suitable for dynamic topological change of power distribution network
CN114048576B (en) Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid
CN117559429A (en) Light Chu Zhi flexible power distribution system decision model training method, system and storage medium
CN116979579A (en) Electric automobile energy-computing resource scheduling method based on safety constraint of micro-grid
CN115001002A (en) Optimal scheduling method and system for solving energy storage participation peak clipping and valley filling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination