
CN116454926A - Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network - Google Patents


Info

Publication number
CN116454926A
CN116454926A (application CN202310696501.7A)
Authority
CN
China
Prior art keywords
agent
markov
value
model
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310696501.7A
Other languages
Chinese (zh)
Other versions
CN116454926B (en)
Inventor
李佳勇
海征
陈大波
张聪
朱利鹏
帅智康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202310696501.7A
Publication of CN116454926A
Application granted
Publication of CN116454926B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/26: Arrangements for eliminating or reducing asymmetry in polyphase networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06312: Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12: Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38: Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381: Dispersed generators
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00: Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20: The dispersed energy generation being of renewable origin
    • H02J2300/22: The renewable source being solar energy
    • H02J2300/24: The renewable source being solar energy of photovoltaic origin
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00: Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/50: Arrangements for eliminating or reducing asymmetry in polyphase networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Power Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Biophysics (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A multi-type resource collaborative regulation and control method for three-phase imbalance treatment of a distribution network belongs to the technical field of three-phase imbalance treatment of distribution networks and comprises the steps of: S1, setting a five-tuple as the basis for model construction; S2, constructing a Markov decision model and solving it to obtain a control strategy for the parallel capacitor bank and the phase change switch; S3, constructing a Markov game model and solving it so that each selected agent selectively attends to the information of the non-selected agents in the Q-value estimation model; S4, adopting two-step collaborative training to construct the upper-layer agent of the Markov decision model and the lower-layer agents of the Markov game model. The control method solves the problems that existing physical-model-based control techniques depend excessively on refined modeling and are difficult to apply to online treatment of three-phase unbalance in partially observable distribution networks, and it significantly improves the current unbalance compensation and voltage unbalance treatment effects of the distribution network.

Description

Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network
Technical Field
The invention belongs to the technical field of three-phase unbalance management of power distribution networks and relates to a multi-type resource cooperative regulation and control method for three-phase unbalance management of a distribution network.
Background
At present, the integration of a high proportion of distributed new energy into the power distribution network has become an important direction of energy development in China. However, the rapid increase in new-energy penetration not only causes frequent voltage fluctuations in the distribution network but also produces three-phase unbalanced currents and voltages. In addition, the injection power of a large number of dispersedly connected single-phase distributed photovoltaic sources aggravates the three-phase unbalance of the distribution network and seriously jeopardizes its safe and reliable operation.
In the prior art, real-time voltage control of the distribution network focuses only on the fast reactive-power adjustment capability of the photovoltaic inverter, ignoring the inherently asymmetric nature of the distribution network and the distributed synergy among the parallel capacitor bank, the phase change switch and the photovoltaic inverter. As a result, the adjustment capability of each type of controllable equipment cannot be fully invoked to mitigate the three-phase unbalance of voltage and current in the distribution network and improve power quality.
Disclosure of Invention
To achieve this purpose, the invention provides a multi-type resource collaborative regulation and control method for three-phase unbalance management of a power distribution network. The method solves the problems that existing physical-model-based control techniques depend excessively on refined modeling and are difficult to apply to online treatment of three-phase unbalance in partially observable distribution networks, and it remarkably improves the current unbalance compensation and voltage unbalance treatment effects of the distribution network.
The technical scheme adopted by the invention is as follows:
the first aspect of the embodiment of the invention provides a multi-type resource collaborative regulation and control method for three-phase imbalance treatment of a distribution network, which comprises the following steps: s1 setting five-tuple setAs construction model coordinates; s2, constructing a Markov decision model, and solving the Markov decision model by adopting a first calculation method to obtain a control strategy of the parallel capacitor bank and the phase change switch; s3, constructing a Markov game model, and solving the Markov game model by adopting a second calculation method to enable the selected agent to selectively pay attention to information of non-selected agents in the Q value estimation model; s4, constructing an upper-layer agent of a Markov decision model and a lower-layer agent of a Markov game model by adopting two-step collaborative training.
The first calculation method adopts a deep Q-network (DQN), which fits the action-value function with a deep neural network, to obtain the optimal control strategy of the parallel capacitor bank and the phase change switch;
the second calculation method adopts a multi-attention actor-critic (MAAC) method, which introduces an attention mechanism, to solve the Markov game, so that each selected agent selectively attends to the relevant information of the non-selected agents during Q-value estimation, reducing the computational complexity and storage space;
the two-step method adopts a multi-time-scale control method to cooperatively train the upper-layer agent of the Markov decision model and the lower-layer agents of the Markov game model, so that the parallel capacitor banks, phase change switches and photovoltaic inverters act cooperatively.
Further, constructing the Markov decision model from the five-tuple comprises: a state space S, an action space A, a reward function R and a state transition probability function P. S2.1, setting the state space S to contain the active power and reactive power of all nodes of the power distribution network, the active power of the photovoltaic devices and the node voltage amplitudes; S2.2, setting the action space A to contain the action instructions of the parallel capacitor banks and the phase change switches; S2.3, setting the reward function R to contain the sum of the zero-sequence and negative-sequence current components flowing through the transmission-distribution connection node, the voltage out-of-limit penalty value and the voltage unbalance out-of-limit penalty value; S2.4, setting the state transition probability function P. The upper-layer agent characterized by the state space S, the action space A and the reward function R is used to maximize the cumulative discounted reward.
Further, solving the Markov decision model comprises: S2.5, fitting the action-value function with a deep neural network; given a state s, taking an action a and then continuously interacting with the environment according to a policy π to obtain the expected reward, the action-value function can be defined as the Q function:

Q_π(s, a; φ) = E_π[ Σ_{t=0}^{∞} γ^t r_t | s_0 = s, a_0 = a ]

where E_π denotes the expected value under policy π, γ is the discount factor, φ are the weight parameters of the Q network to be optimized, t denotes time t, and r_t is the reward function value at time t;
the Markov decision model is then solved: according to the predicted Q values, the agent selects the action with the largest Q value, which takes effect at the preset next moment.
Further, solving the Markov decision model further comprises: S2.6, applying a target Q network and an experience replay mechanism; S2.7, updating the parameters φ of the loss function with the Adam optimizer, wherein all evaluation networks can be iteratively updated by minimizing a joint regression loss function, the loss function being:

L(φ) = E[ ( y_t − Q(s_t, a_t; φ) )² ],  y_t = r_t + γ max_{a'} Q(s_{t+1}, a'; φ')

where E denotes the expected value, y_t is the target Q value, r_t is the reward function value, γ is the discount factor, φ' are the weight parameters of the target Q network, and Q(s_t, a_t; φ) is the predicted Q value;
S2.8, using an ε-greedy policy to select the actions of the Q network.
Further, the Markov game model comprises: S3.1, setting the state space S_i; the state space S_i contains the active power and reactive power of all nodes in distribution network area i, the active power and reactive power of the photovoltaic devices, the node voltage amplitudes, and the status information of the parallel capacitor banks and phase change switches at time t; S3.2, setting the action space A_i as the reactive power output values of the photovoltaic inverters in the area; S3.3, setting the reward function R_i as the reward function shared by the lower-layer multi-agents. At every time interval Δt, each agent obtains a corresponding action strategy according to the local state information of its area, power flow calculation of the three-phase asymmetric power distribution network is then carried out to obtain measurement information such as the voltage amplitudes of all nodes, and finally the reward function value at the current moment is calculated on this basis and the three-phase asymmetric power distribution network transitions to the next moment.
Further, solving the Markov game model includes: S3.4, considering the local observation state information and action information of the agent itself as well as the contribution degree of the local information of the other agents; S3.5, iteratively updating all evaluation networks by minimizing the joint regression loss function based on three trainable parameter-sharing matrices; S3.6, each agent updating the parameters of its own action network based on the policy gradient; and S3.7, updating the target network parameters so that each agent selectively attends to the relevant information of the other agents during Q-value estimation.
The beneficial effects of the invention are as follows: S1, a five-tuple is set as the basis for model construction; S2, a Markov decision model is constructed and solved with the first calculation method to obtain the control strategy of the parallel capacitor bank and the phase change switch; S3, a Markov game model is constructed and solved with the second calculation method so that each selected agent selectively attends to the information of the non-selected agents in the Q-value estimation model; S4, two-step collaborative training is adopted to construct the upper-layer agent of the Markov decision model and the lower-layer agents of the Markov game model. The method solves the problems that existing physical-model-based control techniques depend excessively on refined modeling and are difficult to apply to online treatment of three-phase unbalance in partially observable distribution networks, and it significantly improves the current unbalance compensation and voltage unbalance treatment effects of the distribution network.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for collaborative regulation of multiple types of resources for three-phase imbalance treatment of a distribution network according to an embodiment of the present invention;
FIG. 2 is a diagram of a multi-type resource collaborative regulation framework for three-phase imbalance management of a power distribution network according to an embodiment of the present invention;
FIG. 3 is a diagram of a DQN method network architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the attention mechanism of the agent Q function according to an embodiment of the present invention;
FIG. 5 is a flow chart of a top level agent training provided in an embodiment of the present invention;
FIG. 6 is a flow chart of an underlying multi-agent training process according to one embodiment of the present invention;
FIG. 7 is a flowchart of an implementation strategy of a multi-time scale control method according to an embodiment of the present invention;
FIG. 8a is a plot of amplitude versus frequency of phase voltage at node a according to one embodiment of the present invention;
FIG. 8b is a plot of amplitude versus frequency for another phase of voltage at node a according to one embodiment of the present invention;
FIG. 9a is a plot of node b phase voltage magnitude versus frequency for an embodiment of the present invention;
FIG. 9b is a plot of amplitude versus frequency for another phase of voltage at node b according to one embodiment of the present invention;
FIG. 10a is a plot of amplitude versus frequency of phase voltage at node c according to one embodiment of the present invention;
FIG. 10b is a plot of amplitude versus frequency for another phase of voltage at node c according to one embodiment of the present invention;
FIG. 11a is a plot of voltage imbalance frequency provided by an embodiment of the present invention;
FIG. 11b is a plot of another voltage imbalance frequency provided by an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for collaborative regulation of multiple types of resources for three-phase imbalance treatment of a distribution network according to an embodiment of the present invention. The first aspect of the embodiment of the invention provides a multi-type resource collaborative regulation and control method for three-phase imbalance treatment of a distribution network, which comprises the following steps: S1, setting a five-tuple (state space, action space, reward function, state transition probability function and discount factor) as the basis for model construction; S2, constructing a Markov decision model and solving it with a first calculation method to obtain a control strategy for the parallel capacitor bank and the phase change switch; S3, constructing a Markov game model and solving it with a second calculation method so that each selected agent selectively attends to the information of the non-selected agents in the Q-value estimation model; S4, adopting two-step collaborative training to construct the upper-layer agent of the Markov decision model and the lower-layer agents of the Markov game model.
The first calculation method adopts a deep Q-network (DQN), which fits the action-value function with a deep neural network, to obtain the optimal control strategy of the parallel capacitor bank and the phase change switch;
the second calculation method adopts a multi-attention actor-critic (MAAC) method, which introduces an attention mechanism, to solve the Markov game, so that each selected agent selectively attends to the relevant information of the non-selected agents during Q-value estimation, reducing the computational complexity and storage space;
the two-step method adopts a multi-time-scale control method to cooperatively train the upper-layer agent of the Markov decision model and the lower-layer agents of the Markov game model, so that the parallel capacitor banks, phase change switches and photovoltaic inverters act cooperatively.
In the present embodiment, the action-value function Q(s, a; φ) is characterized by a deep neural network, referred to as the Q network. The output of this network is a real number, called the Q value, which represents the long-term cumulative reward that the agent can obtain by taking a certain action in a certain state.
In this embodiment, the five-tuple is composed of the state space, action space, reward function, state transition probability function and discount factor of the upper-layer agent.
Referring to fig. 2, fig. 2 is a schematic diagram of the multi-type resource collaborative regulation framework for three-phase imbalance treatment of a power distribution network according to an embodiment of the present invention.
In this embodiment, step S2, constructing the Markov decision model and solving it with the first calculation method, is the long-time-scale control sub-step; control on the long time scale may comprise the following steps:
specifically, according to the discreteness of the parallel capacitor bank and the phase change switch action mode, the control problem is modeled as a Markov decision process, and is described by a five-tuple:
illustratively, the five-tuple includes: state spaceRepresenting a set of upper level agent state spaces. At preset time or experimental result time t, the state space of the upper intelligent agent is formed by the active power and the reactive power of all nodes of the power distribution network,Active power, node voltage amplitude, etc. of the photovoltaic device, and is defined as +.>
Illustratively, an action space A represents the set of upper-layer agent actions. At time t, the action of the upper-layer agent consists of the action instructions of the parallel capacitor banks and the phase change switches at that moment, and can be defined as the action vector a_t.
Since each parallel capacitor bank has the two actions of switching on and switching off, the size of this binary action set grows with the number of parallel capacitor banks N_CB and can be expressed as 2^N_CB. Similarly, each phase change switch has three different actions, conducting phase A, phase B or phase C, so the size of its action set can be expressed as 3^N_PS, where N_PS is the number of phase change switches.
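As an illustrative sketch (with n_cb and n_ps as assumed names for the numbers of parallel capacitor banks and phase change switches, not taken from the patent), the joint discrete action set handled by the upper-layer agent can be enumerated as follows:

```python
from itertools import product

def joint_action_space(n_cb: int, n_ps: int):
    """Enumerate all joint actions: 2**n_cb capacitor states x 3**n_ps switch states."""
    cb_states = list(product((0, 1), repeat=n_cb))           # each bank: off / on
    ps_states = list(product(("A", "B", "C"), repeat=n_ps))  # each switch: conducted phase
    return [(cb, ps) for cb, ps in product(cb_states, ps_states)]

# Example: 2 capacitor banks and 1 phase change switch -> 2**2 * 3**1 = 12 joint actions
actions = joint_action_space(2, 1)
assert len(actions) == 12
```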
Illustratively, a reward function R represents the reward function of the upper-layer agent. At time t, the reward of the upper-layer agent comprises three parts: the sum of the zero-sequence and negative-sequence current components flowing through the transmission-distribution connection node, the voltage out-of-limit penalty value and the voltage unbalance out-of-limit penalty value at that moment. To make the reward function value tend to its maximum, it can be calculated as r_t = −( I_t^(0) + I_t^(2) + f_V,t + f_U,t ).
It should be noted that I_t^(0) and I_t^(2) denote the amplitudes of the zero-sequence and negative-sequence current components through the transmission-distribution connection node at time t, as expressed in formulas (1) and (2); f_V,t and f_U,t denote the penalty term for node voltage out-of-limit and the penalty term for voltage unbalance violation, as shown in formulas (3) and (4):

I_t^(0) = (1/3) | İ_a,t + İ_b,t + İ_c,t |     (1)

I_t^(2) = (1/3) | İ_a,t + α² İ_b,t + α İ_c,t |,  α = e^(j2π/3)     (2)

f_V,t = Σ_j Σ_φ [ max( V_j,φ,t − V_max, 0 ) + max( V_min − V_j,φ,t, 0 ) ]     (3)

f_U,t = Σ_j max( ε_j,t − ε_max, 0 )     (4)

where, in formulas (1) and (2), İ_φ,t (φ = a, b, c) is the phase-φ current phasor through the transmission-distribution connection node, obtained from the phase voltage of amplitude V_φ,t and from P_φ,t and Q_φ,t, the active and reactive power of phase φ through the transmission-distribution connection node.

In formula (4), ε_j,t is the voltage unbalance degree of three-phase node j, which can be calculated from formula (5), and the summation runs over the set of all three-phase nodes of the three-phase asymmetric power distribution network:

ε_j = ( V_j^(2) / V_j^(1) ) × 100%     (5)

where, in formula (5), V_j^(1) and V_j^(2) are the positive-sequence and negative-sequence components obtained from the phase voltages of node j.
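The sequence quantities used in formulas (1), (2) and (5) can be sketched numerically as follows; this is an illustrative helper that assumes complex per-phase voltage phasors and per-phase active/reactive power at the connection node are available, and it is not taken from the patent itself:

```python
import numpy as np

ALPHA = np.exp(2j * np.pi / 3)  # 120-degree rotation operator

def sequence_currents(p_abc, q_abc, v_abc):
    """Zero- and negative-sequence current magnitudes at the interface node.

    p_abc, q_abc: per-phase active/reactive power; v_abc: per-phase complex voltage phasors.
    """
    i_abc = np.conj((np.asarray(p_abc) + 1j * np.asarray(q_abc)) / np.asarray(v_abc))
    i_zero = abs(i_abc.sum()) / 3.0                                       # formula (1)
    i_neg = abs(i_abc[0] + ALPHA**2 * i_abc[1] + ALPHA * i_abc[2]) / 3.0  # formula (2)
    return i_zero, i_neg

def voltage_unbalance(v_abc):
    """Voltage unbalance degree: negative- over positive-sequence magnitude, in %."""
    v_abc = np.asarray(v_abc)
    v_pos = abs(v_abc[0] + ALPHA * v_abc[1] + ALPHA**2 * v_abc[2]) / 3.0
    v_neg = abs(v_abc[0] + ALPHA**2 * v_abc[1] + ALPHA * v_abc[2]) / 3.0
    return 100.0 * v_neg / v_pos                                          # formula (5)
```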
Further, regarding the state transition probability function: since the state of the power distribution network at the next moment depends only on the state at the current moment and the action taken under the current policy, the state transition probability function P obeys a Markov decision process.
It should be noted that, based on the power flow calculation result, the three-phase unbalanced operation condition of the actual power distribution network is simulated, and in the training process of the whole model, the state transition relation satisfies the power flow constraint of the power distribution network.
Illustratively, γ is the discount factor used to balance the weights of immediate rewards and future rewards.
In this embodiment, at each time t the upper-layer agent obtains the corresponding action instruction a_t from the global observation state s_t of the three-phase asymmetric distribution network; power flow calculation of the distribution network is then carried out based on this action instruction to obtain the reward function value r_t at the current moment and the observation state s_{t+1} at the next moment. Cycling over this step, the objective of the upper-layer agent is to learn the optimal switching strategy of the regulating equipment through repeated interaction between the agent and the three-phase asymmetric distribution network environment, thereby maximizing the cumulative discounted reward Σ_t γ^t r_t.
Further, the Markov decision process is solved with the first calculation method, which adopts the DQN method, to obtain the optimal control strategy of the parallel capacitor bank and the phase change switch.
It should be noted that the DQN method uses a deep neural network to fit the action-value function Q(s, a; φ), where s and a respectively denote the state and action of the environment and φ are the weight parameters of the Q network to be optimized.
Referring to fig. 3, fig. 3 is a network architecture diagram of the DQN method according to an embodiment of the invention. As can be seen, the Q network consists of an input layer, two hidden layers and an output layer. Its input is the global state information s_t of the distribution network at the current time t, the number of input neurons being the number of elements in the state vector; its output contains the predicted Q values of all possible joint actions of the parallel capacitor banks and phase change switches in state s_t, with one output neuron per action combination.
Further, according to these predicted Q values, the agent selects the action with the largest Q value, which takes effect at the next moment.
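The Q-network structure just described can be sketched in PyTorch as follows; the hidden-layer width of 128 neurons is an assumed value and the class name is illustrative, not the patent's own code:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Input: global state s_t; output: one Q value per joint discrete action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),   # hidden layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),      # hidden layer 2
            nn.Linear(hidden, n_actions),              # Q value for each action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def greedy_action(q_net: QNetwork, state: torch.Tensor) -> int:
    """Select the action with the largest predicted Q value."""
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```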
Preferably, in order to improve the stability and convergence of the Q network during training, the DQN method introduces a target Q network and an experience replay mechanism, with the loss function shown in formula (6):

L(φ) = E[ ( y_t − Q(s_t, a_t; φ) )² ],  y_t = r_t + γ max_{a'} Q(s_{t+1}, a'; φ')     (6)

where, in formula (6), E denotes the expected value, y_t is the target Q value, r_t is the reward function value, γ is the discount factor, φ' are the weight parameters of the target Q network, and Q(s_t, a_t; φ) is the predicted Q value.
The parameters φ of the loss function are updated with the Adam optimizer, and the update formula of the Q-network parameters can be obtained as:

φ_{t+1} = φ_t − η ∇_φ L(φ_t)     (7)

where, in formula (7), φ_t and φ_{t+1} are the Q-network parameters at time t and time t+1 respectively, and η is the learning rate. In order to ensure that the agent can actively explore the unknown environment while effectively exploiting the environment information, an ε-greedy policy is adopted to select the actions of the Q network, namely:

a_t = a random action from the action space, if ρ < ε;  a_t = argmax_a Q(s_t, a; φ), otherwise     (8)

where, in formula (8), ε is a constant and ρ is a randomly generated number in [0, 1]. When ρ < ε, the agent randomly selects one action from the action space; otherwise the agent selects the action with the largest Q value in the current state.
It should be noted that the ε-greedy strategy means that with high probability the agent selects the action with the largest Q value, while with the remaining small probability it selects a random action for exploration, so as to avoid falling into a locally optimal solution.
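A minimal sketch of the ε-greedy selection of formula (8) and one update step of the loss in formula (6) with the Adam optimizer is given below; the function names and batch layout are assumptions for illustration:

```python
import random
import torch
import torch.nn.functional as F

def epsilon_greedy(q_net, state, n_actions, eps):
    """Formula (8): explore with probability eps, otherwise act greedily."""
    if random.random() < eps:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def dqn_update(q_net, target_net, optimizer, batch, gamma):
    """One minimization step of the loss in formula (6); optimizer is torch.optim.Adam."""
    s, a, r, s_next = batch  # tensors sampled from the experience replay buffer
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = r + gamma * target_net(s_next).max(dim=1).values  # target Q value
    loss = F.mse_loss(q_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                          # parameter update, formula (7)
    return loss.item()
```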
Referring to fig. 5, for further explanation of the above principle, fig. 5 is a flowchart of upper-level agent training according to an embodiment of the present invention;
In this embodiment, the step of constructing the Markov game model and solving it with the second calculation method is the short-time-scale control sub-step, in which the cooperative control problem of the photovoltaic inverters is modeled as a partially observable Markov game. The model uses multiple agents to represent the optimization decisions and information interaction of different areas, and each agent is independently responsible for the action instructions of the photovoltaic inverters in its own sub-area. The Markov game is mainly composed of the following parts.
Specifically, a state space S_i represents the set of states of lower-layer agent i. The state s_{i,t,k} of agent i at the k-th interval Δt within time t contains the active power and reactive power of all nodes in distribution network area i, the active power and reactive power of the photovoltaic devices, the node voltage amplitudes, and the status information of the parallel capacitor banks and phase change switches within time t.
Specifically, an action space A_i represents the set of actions of lower-layer agent i. The action a_{i,t,k} of all photovoltaic inverters of agent i at the k-th interval Δt within time t can be expressed as the ratio of each inverter's reactive output to its maximum reactive output, from which the reactive output value of each photovoltaic inverter in area i is obtained.
Specifically, a reward function R_i represents the reward function shared by the lower-layer multi-agents. At the k-th interval Δt within time t, the reward value of agent i can be defined as:

r_{i,t,k} = −( I_{t,k}^(0) + I_{t,k}^(2) ) / I_ref − f_{V,i,t,k} − f_{U,i,t,k}     (9)

where, in formula (9), I_ref is a three-phase unbalance current reference value, and f_{V,i,t,k} and f_{U,i,t,k} are the penalty terms applied when a node voltage in area i exceeds its limit and when the voltage unbalance degree exceeds its limit, with the same form as formulas (3) and (4).

The state transition probability function of the lower-layer multi-agent model and the value of the discount factor γ are designed similarly to those of the upper-layer agent.
Further, in the lower-layer multi-agent architecture, at every interval Δt each agent obtains its corresponding action policy a_{i,t,k} from the local state information s_{i,t,k}; power flow calculation of the three-phase asymmetric distribution network is then carried out based on the forward-backward sweep method to obtain measurement information such as the voltage amplitudes of all nodes; finally, the reward function value at the current moment is calculated on this basis and the three-phase asymmetric distribution network transitions to the state at the next moment.
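A schematic sketch of one lower-layer control interval is shown below; run_power_flow and shared_reward are assumed placeholders standing in for the forward-backward sweep power-flow solver and the shared reward of formula (9), and are not the patent's own implementation:

```python
def lower_layer_step(agents, local_states, run_power_flow, shared_reward):
    """One short-time-scale interval of the lower-layer multi-agent control.

    agents:         one actor policy per distribution-network area
    local_states:   local observation vector s_i for each area
    run_power_flow: callable(actions) -> (node_measurements, next_local_states);
                    assumed to wrap a forward-backward sweep power-flow solver
    shared_reward:  callable(node_measurements, area_index) -> reward of formula (9)
    """
    # Each agent maps its local observation to PV inverter reactive-power ratios.
    actions = [agent.act(s_i) for agent, s_i in zip(agents, local_states)]
    node_measurements, next_local_states = run_power_flow(actions)
    rewards = [shared_reward(node_measurements, i) for i in range(len(agents))]
    return actions, rewards, next_local_states
```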
With reference to fig. 6, further explanation of the above principle is provided in fig. 6, which is a flowchart of the lower multi-agent training according to an embodiment of the present invention.
In this embodiment, the Markov game is solved with the second calculation method, which adopts the MAAC (multi-attention actor-critic) method, so that each agent selectively attends to the relevant information of the other agents during Q-value estimation, thereby greatly reducing the computational complexity and storage space.
Referring to fig. 4, fig. 4 is a schematic diagram of the attention mechanism of the agent Q function according to an embodiment of the present invention. The Q function Q_i(o, a) of agent i considers not only the agent's own local observation state information o_i and action information a_i, but also the contribution degree x_i of the local information of the other agents, as shown in formula (10):

Q_i(o, a) = f_i( g_i(o_i, a_i), x_i )     (10)

where, in formula (10), f_i denotes a two-layer MLP (multi-layer perceptron) and g_i denotes a single-layer MLP encoder, while the contribution x_i is expressed as a weighted sum of the encoded values of all agents except agent i, as shown in formula (11):

x_i = Σ_{j≠i} α_j v_j = Σ_{j≠i} α_j h( V g_j(o_j, a_j) )     (11)

where, in formula (11), V is the parameter-sharing matrix that converts the encoding of agent j into a 'value'; h is a nonlinear activation function; α_j is the attention weight allocated to agent j, obtained by a bilinear mapping between the 'query' of agent i and the 'key' of agent j, after which the similarity between the encoded values is passed through a softmax operation, as shown in formula (12):

α_j ∝ exp( e_j^T W_k^T W_q e_i ),  with e_i = g_i(o_i, a_i) and e_j = g_j(o_j, a_j)     (12)

where, in formula (12), W_q is the parameter-sharing matrix that converts the encoding of agent i into a 'query' and W_k is the parameter-sharing matrix that converts the encoding of agent j into a 'key'.
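A compact PyTorch sketch of the shared query/key/value attention of formulas (10)–(12) is given below; the module name, dimensions and activation choice are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAttention(nn.Module):
    """Shared W_q, W_k, V matrices used by every agent's critic."""
    def __init__(self, embed_dim: int, attend_dim: int):
        super().__init__()
        self.w_q = nn.Linear(embed_dim, attend_dim, bias=False)  # query matrix W_q
        self.w_k = nn.Linear(embed_dim, attend_dim, bias=False)  # key matrix W_k
        self.v = nn.Linear(embed_dim, attend_dim, bias=False)    # value matrix V

    def forward(self, e_i: torch.Tensor, e_others: torch.Tensor) -> torch.Tensor:
        """e_i: (embed_dim,) encoding of agent i; e_others: (n-1, embed_dim)."""
        scores = self.w_k(e_others) @ self.w_q(e_i)               # bilinear similarity
        alpha = F.softmax(scores, dim=0)                          # formula (12)
        values = F.leaky_relu(self.v(e_others))                   # h(V g_j(o_j, a_j))
        return (alpha.unsqueeze(1) * values).sum(dim=0)           # x_i, formula (11)
```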
Based on the three trainable parameter-sharing matrices W_q, W_k and V, all evaluation (critic) networks can be iteratively updated by minimizing the joint regression loss function, as shown in formula (13):

L_Q(ψ) = Σ_{i=1}^{N} E[ ( Q_i^ψ(o, a) − y_i )² ],  y_i = r_i + γ E[ Q_i^ψ'(o', a') − β log π_θ'(a_i' | o_i') ]     (13)

where, in formula (13), ψ and ψ' are the parameters of the evaluation networks and of the target evaluation networks, π_θ' denotes the target action network, and β is the temperature coefficient balancing reward maximization and policy entropy.

Each agent can then update the parameters θ_i of its own action network based on the policy gradient, as shown in formula (14):

∇_{θ_i} J(π_θ) = E[ ∇_{θ_i} log π_{θ_i}(a_i | o_i) ( − β log π_{θ_i}(a_i | o_i) + A_i(o, a) ) ]     (14)

where, in formula (14), the expectation also covers −i, the set of all agents except agent i, whose observations and actions enter Q_i(o, a), and A_i(o, a) is the multi-agent advantage function.
Finally, the target network parameters are updated based on formula (15):

ψ' ← τ ψ + (1 − τ) ψ',  θ' ← τ θ + (1 − τ) θ'     (15)

where, in formula (15), τ is the soft update coefficient.
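A one-function sketch of the soft target update of formula (15), written for PyTorch modules (illustrative, not the patent's own code):

```python
import torch

@torch.no_grad()
def soft_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float):
    """Formula (15): let the target network slowly track the online network."""
    for p_target, p_online in zip(target_net.parameters(), online_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_online)
```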
In this embodiment, S4 adopts two-step collaborative training to construct an upper level agent of a markov decision model and a lower level agent of a markov game model.
Referring to fig. 7, fig. 7 is a flowchart of the execution strategy of the multi-time-scale control method according to an embodiment of the present invention. The upper- and lower-layer agents are cooperatively trained with the two-step method, each with its corresponding set of parameters to be optimized: φ denotes the weight parameter set to be optimized of the upper-layer agent, and ψ and θ denote the weight parameter sets to be optimized of the lower-layer multi-agents.
In one embodiment, constructing the Markov decision model from the five-tuple comprises: a state space S, an action space A, a reward function R and a state transition probability function P. S2.1, setting the state space S to contain the active power and reactive power of all nodes of the power distribution network, the active power of the photovoltaic devices and the node voltage amplitudes; S2.2, setting the action space A to contain the action instructions of the parallel capacitor banks and the phase change switches; S2.3, setting the reward function R to contain the sum of the zero-sequence and negative-sequence current components flowing through the transmission-distribution connection node, the voltage out-of-limit penalty value and the voltage unbalance out-of-limit penalty value; S2.4, setting the state transition probability function P. The upper-layer agent characterized by the state space S, the action space A and the reward function R is used to maximize the cumulative discounted reward.
In this embodiment, in order to verify the correctness and feasibility of the above method, the construction and training process of the proposed method is carried out in Python 3.9 with the PyTorch framework. Other frameworks capable of model construction and training may also be selected, depending on experimental suitability.
Further, solving the Markov decision model comprises: S2.5, fitting the action-value function with a deep neural network; given a state s, taking an action a and then continuously interacting with the environment according to a policy π to obtain the expected reward, the action-value function can be defined as the Q function:

Q_π(s, a; φ) = E_π[ Σ_{t=0}^{∞} γ^t r_t | s_0 = s, a_0 = a ]

where E_π denotes the expected value under policy π, γ is the discount factor, φ are the weight parameters of the Q network to be optimized, t denotes time t, and r_t is the reward function value at time t.
The Markov decision model is then solved: according to the predicted Q values, the agent selects the action with the largest Q value, which takes effect at the preset next moment.
In this embodiment, for the construction and training process of the proposed method carried out in Python 3.9 with the PyTorch framework, the parameters of the DQN method are set accordingly.
Further, solving the Markov decision model further comprises: S2.6, applying a target Q network and an experience replay mechanism; S2.7, updating the parameters φ of the loss function with the Adam optimizer, wherein all evaluation networks can be iteratively updated by minimizing a joint regression loss function, the loss function being:

L(φ) = E[ ( y_t − Q(s_t, a_t; φ) )² ],  y_t = r_t + γ max_{a'} Q(s_{t+1}, a'; φ')

where E denotes the expected value, y_t is the target Q value, r_t is the reward function value, γ is the discount factor, φ' are the weight parameters of the target Q network, and Q(s_t, a_t; φ) is the predicted Q value.
S2.8, using an ε-greedy policy to select the actions of the Q network.
In this embodiment, the Q network outputs a real value according to the function Q_φ(s, a), which is typically a parameterized function, such as a neural network with parameters φ. The target Q network is used in DQN (Deep Q-Network), a Q-learning algorithm based on deep learning that mainly combines value-function approximation with neural network techniques and trains the network using a target network and experience replay.
In this embodiment, the experience replay mechanism is specifically as follows: experience replay is used to break the correlation between sample data. An experience replay buffer is constructed, and the experience data (s_t, a_t, r_t, s_{t+1}) obtained from the interaction between the agent and the three-phase asymmetric distribution network environment in a training round is stored into the buffer. When the buffer reaches its set capacity, on the one hand the agent starts to update its own network parameters, i.e. a certain number of experience samples are first randomly extracted from the replay buffer and the network parameters are then iteratively updated based on the extracted samples; on the other hand, the replay buffer automatically deletes the sample experience data generated by the earliest interactions with the three-phase asymmetric distribution network environment and stores the most recently collected sample experience data.
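A minimal sketch of such an experience replay buffer, assuming a fixed capacity and uniform random sampling (illustrative names, not the patent's code):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay: the oldest transitions are dropped automatically."""
    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # deque evicts the earliest entries

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Random minibatch; random sampling breaks the temporal correlation of the data."""
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)
```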
In this embodiment, the Adam optimizer (Adaptive Moment Estimation) provides fast gradient descent, although it tends to oscillate around the optimal value.
Further, the Markov game model comprises: S3.1, setting the state space S_i; the state space S_i contains the active power and reactive power of all nodes in distribution network area i, the active power and reactive power of the photovoltaic devices, the node voltage amplitudes, and the status information of the parallel capacitor banks and phase change switches at time t; S3.2, setting the action space A_i as the reactive power output values of the photovoltaic inverters in the area; S3.3, setting the reward function R_i as the reward function shared by the lower-layer multi-agents. At every time interval Δt, each agent obtains a corresponding action strategy according to the local state information of its area, power flow calculation of the three-phase asymmetric power distribution network is then carried out to obtain measurement information such as the voltage amplitudes of all nodes, and finally the reward function value at the current moment is calculated on this basis and the three-phase asymmetric power distribution network transitions to the next moment.
It should be noted that in multi-agent reinforcement learning, each agent has its own value function and autonomously learns and formulates its strategy based on environmental observations and interactions, with the goal of maximizing its own utility value.
Further, since each agent does not take into account the impact of its policy on the other agents when interacting with the environment, situations of competition or cooperation may arise under the mutual influence of the interacting agents. Multi-agent decisions can be analyzed specifically using game theory. For different multi-agent reinforcement learning scenarios, different game frameworks can be adopted to model the interaction, which can be divided into three categories as a whole.
For example, in a static game all agents make decisions at the same time and each agent takes only one action. Since each agent acts only once, it may adopt unexpected deceptive or betraying strategies to benefit itself from the game. Thus, in a static game, each agent needs to consider and guard against the deception and betrayal of the other agents when formulating its policy, so as to reduce its own losses.
For example, a repeated game is one in which multiple agents take repeated decision actions in the same state. The total value function of each agent is therefore the sum of its values over all decision actions. Compared with a static game, a repeated game largely avoids malicious action decisions among the agents, so that the sum of the total benefit values of all agents is improved as a whole.
By way of example, a stochastic game (or Markov game) can be regarded as a Markov process in which multiple agents make action decisions over multiple states. Based on its own state, each agent can make the optimal action decision for improving its own value function by observing the environment and predicting the actions of the other agents.
Further, solving the Markov game model includes: S3.4, considering the local observation state information and action information of the agent itself as well as the contribution degree of the local information of the other agents; S3.5, iteratively updating all evaluation networks by minimizing the joint regression loss function based on the three trainable parameter-sharing matrices; S3.6, each agent updating the parameters of its own action network based on the policy gradient; and S3.7, updating the target network parameters so that each agent selectively attends to the relevant information of the other agents during Q-value estimation.
In this embodiment, the parameters of the MAAC method used in the training process for solving the Markov game model are set accordingly.
in combination with the related principles of the above embodiments, the related data is further added to demonstrate the feasibility. Specifically, the three-phase unbalanced current represents the advantage of the proposed control method in the aspect of compensating the three-phase unbalanced current, and in order to introduce a compensation degree index to quantify the treatment effect of the current unbalance, the compensation degree of the negative sequence and zero sequence current components is respectively defined as follows:
(16)
(17)
wherein in the formulas (16) and (17), andThe compensation degree of positive sequence, negative sequence and zero sequence current components respectively;Andthe amplitudes of the current components of the negative sequence before and after reactive power compensation are respectively; andThe amplitudes of the zero sequence current components before and after reactive power compensation are respectively obtained. Then, respectively randomly extracting two typical days from sample data of a test set, and testing 960 groups of sample data to obtain the average zero sequence current, the average negative sequence current and the compensation degree value in the test set as follows:
furthermore, in the control method, the average negative sequence current component and the average zero sequence current component which pass through the transmission and distribution connection node in the test set are greatly reduced compared with the original value, the compensation degree is more than 55%, and the advantages of the method in the aspect of three-phase unbalanced current compensation are verified.
Specifically, in order to verify the advantages of the proposed control method in voltage control, a success-rate index is introduced to quantify the effect of node voltage amplitude control, and the voltage regulation success rate is defined as:

η_V = ( N_safe / N_limit ) × 100%     (18)

where, in formula (18), η_V is the success rate of voltage amplitude adjustment, N_limit is the number of voltage out-of-limit events before the regulation method is adopted, and N_safe is the number of those events whose voltage amplitude is brought back within the safe range after the regulation method is adopted. Then, based on the 960 groups of sample data, a voltage control verification test is performed, and the statistical result of the voltage amplitude adjustment success rate after adopting the method is obtained.
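A small sketch of the success-rate index of formula (18), with assumed argument names:

```python
def voltage_success_rate(n_safe: int, n_limit: int) -> float:
    """Formula (18): share of former out-of-limit events restored to the safe range, in %."""
    return 100.0 * n_safe / n_limit if n_limit else 100.0
```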
preferably, the experimental results of the maximum value, the minimum value and the maximum value of the voltage unbalance degree of the three-phase voltages of the nodes a, b and c in the test set data after the method is adopted are as follows:
referring to fig. 8a to 11b, the probability distribution diagrams of the phase voltage amplitudes of the nodes a, b and c and the three-phase voltage unbalance in the test set data except the node 0 before and after the proposed method are shown.
FIG. 8a is a graph showing a frequency distribution of the amplitude of the phase voltage at node a, wherein two 4000 frequency values exist in the range of 0.95p.u and 0.97 p.u; FIG. 8b is a plot of amplitude versus frequency for another phase voltage at node a, up to 6000 at 0.99 p.u. frequency, in accordance with one embodiment of the present invention; FIG. 9a is a plot of node b phase voltage magnitude versus frequency for a frequency of 4500 frequency values around 0.98p.u according to an embodiment of the present invention; FIG. 9b is a plot of amplitude versus frequency for another phase of voltage at node b for an embodiment of the present invention, the plot having frequency values above 4000 at 0.99p.u to 1 p.u; FIG. 10a is a plot of the amplitude versus frequency of the phase voltage at node c versus 3500 for a plot of 0.96p.u to 0.98p.u frequency; FIG. 10b is a plot of amplitude versus frequency for another phase voltage at node c for an embodiment of the present invention, with dense frequency values of ripple occurring at 0.98p.u to 1.01 p.u; FIG. 11a is a plot of voltage imbalance frequency distribution for an embodiment of the present invention, centered around 1400 at 0.5% imbalance to 1.5% imbalance frequency; FIG. 11b is a plot of another voltage imbalance frequency distribution graph showing frequency values at 0% imbalance and frequency values zero before 2% imbalance, and no more than 1500 frequency values, according to an embodiment of the present invention.
It should be noted that, by jointly scheduling different types of regulating equipment such as the parallel capacitor banks, phase change switches and photovoltaic inverters, the proposed control method not only avoids voltage out-of-limit events entirely (100%), but also keeps the voltage amplitudes of phases a, b and c relatively stable and close to the rated voltage. In addition, the method also makes the three-phase voltage amplitudes of all nodes similar to one another, i.e. it keeps the voltage unbalance degree within the safe range.

Claims (6)

1. A multi-type resource cooperative regulation method for three-phase unbalanced management of a distribution network is characterized in that,
S1, setting a five-tuple as the basis for model construction, comprising a state space S, an action space A, a reward function R and a state transition probability function P;
S2, constructing a Markov decision model, and solving the Markov decision model by adopting a first calculation method to obtain a control strategy of the parallel capacitor bank and the phase change switch;
S3, constructing a Markov game model, and solving the Markov game model by adopting a second calculation method to enable the selected agent to selectively pay attention to the information of the non-selected agents in the Q-value estimation model;
S4, constructing an upper-layer agent of the Markov decision model and lower-layer agents of the Markov game model by adopting two-step collaborative training;
the first calculation method adopts a deep Q-network (DQN), which fits the action-value function with a deep neural network, to obtain the optimal control strategy of the parallel capacitor bank and the phase change switch;
the second calculation method adopts a multi-attention actor-critic (MAAC) method, which introduces an attention mechanism, to solve the Markov game, so that the selected agent selectively pays attention to the relevant information of the non-selected agents in the Q-value estimation process, reducing the computational complexity and storage space;
the two-step method adopts a multi-time-scale control method to cooperatively train the upper-layer agent of the Markov decision model and the lower-layer agents of the Markov game model, so that the parallel capacitor bank, the phase change switch and the photovoltaic inverter act cooperatively.
2. The resource collaborative regulation method of claim 1, wherein S1 comprises:
S2.1, setting the state space S to contain the active power and reactive power of all nodes of the power distribution network, the active power of the photovoltaic devices and the node voltage amplitudes;
S2.2, setting the action space A to contain the action instructions of the parallel capacitor banks and the phase change switches;
S2.3, setting the reward function R to contain the sum of the zero-sequence and negative-sequence current components flowing through the transmission-distribution connection node, the voltage out-of-limit penalty value and the voltage unbalance out-of-limit penalty value; wherein I_t^(0) and I_t^(2) are respectively the zero-sequence and negative-sequence component amplitudes through the transmission-distribution connection node at time t, f_V,t and f_U,t are respectively the penalty term for node voltage out-of-limit and the penalty term for voltage unbalance violation, and t denotes time t;
S2.4, setting the state transition probability function P;
the upper-layer agent characterized by the state space S, the action space A and the reward function R is used to maximize the cumulative discounted reward.
3. The resource collaborative regulation and control method of claim 1, wherein the markov decision model solution includes;
S2.5, fitting the action-value function with a deep neural network; given a state s, taking an action a and then continuously interacting with the environment according to a policy π to obtain the expected reward defines the action-value function as the Q function:

Q_π(s, a; φ) = E_π[ Σ_{t=0}^{∞} γ^t r_t | s_0 = s, a_0 = a ];

where E_π denotes the expected value under policy π, γ is the discount factor, φ are the weight parameters of the Q network to be optimized, t denotes time t, and r_t is the reward function value at time t;
and solving the Markov decision model, and selecting the action with the maximum Q value by the intelligent agent according to the predicted Q value, and taking effect at the preset next moment.
4. A resource co-ordination method as claimed in claim 1 or claim 3, wherein the markov decision model solution further comprises;
S2.6, applying a target Q network and an experience replay mechanism;
S2.7, updating the parameters φ of the loss function with the Adam optimizer, wherein all evaluation networks can be iteratively updated by minimizing a joint regression loss function, the loss function being:

L(φ) = E[ ( y_t − Q(s_t, a_t; φ) )² ],  y_t = r_t + γ max_{a'} Q(s_{t+1}, a'; φ');

where E denotes the expected value, y_t is the target Q value, r_t is the reward function value, γ is the discount factor, φ' are the weight parameters of the target Q network, and Q(s_t, a_t; φ) is the predicted Q value;
S2.8, using an ε-greedy policy to select the actions of the Q network.
5. The resource co-regulation method of claim 1, wherein the markov game model comprises;
s3.1 setting State spaceThe state space->Comprising distribution network areas->Active power and reactive power of all nodes in the photovoltaic system, active power and reactive power of the photovoltaic system, node voltage amplitude and time +.>Status information of the internal parallel capacitor bank and the phase change switch;
s3.2 setting an action spaceThe reactive power output value of each photovoltaic inverter in the area is calculated;
S3.3, setting the reward function, which is shared by the lower-layer multi-agents;
in the lower-layer multi-agent architecture of the Markov game model, at every control interval each agent obtains its action strategy according to the local state information of its own area; the power flow of the three-phase asymmetric power distribution network is then calculated to obtain measurement information such as the voltage amplitudes of all nodes; finally, the reward function value at the current moment is calculated on this basis, and the three-phase asymmetric power distribution network transitions to the next moment.
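A schematic Python loop for one control interval of the lower-layer Markov game described in claim 5 is sketched below; `agents` and `env` are hypothetical placeholders for the agent policies and the three-phase asymmetric power-flow environment, not interfaces defined by the patent.

    def lower_layer_step(agents, env, t):
        """One control interval of the lower-layer Markov game (claim 5)."""
        # 1. Each agent acts on the local state information of its own area.
        actions = {aid: agent.act(env.local_observation(aid, t))
                   for aid, agent in agents.items()}
        # 2. Run the three-phase asymmetric power flow with the new reactive setpoints.
        measurements = env.run_power_flow(actions, t)
        # 3. Compute the shared reward of S3.3 from the resulting measurements.
        reward = env.shared_reward(measurements)
        # 4. The distribution network transitions to the next moment.
        return actions, reward, t + 1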
6. The resource cooperative regulation and control method of claim 1, wherein solving the Markov game model comprises:
S3.4, evaluating the local observed state information and action information of each agent, as well as the contribution degree of each agent's local information;
S3.5, based on three trainable shared parameter matrices, iteratively updating all evaluation networks by minimizing a joint regression loss function;
S3.6, each agent updating the parameters of its own action network based on the policy gradient;
and S3.7, updating the target network parameters, so that each selected agent pays attention to the relevant information of the non-selected agents in the Q-value estimation process.
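Under the usual attention-critic formulation, the "three trainable parameter sharing matrices" of S3.5 can be read as query/key/value matrices, and the target-network update of S3.7 as a soft (Polyak) update; the sketch below illustrates that reading only, with the embedding dimension, module names and update coefficient chosen as assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SharedAttention(nn.Module):
        """Attention over the other agents' encoded local information, using
        three shared trainable matrices (query, key, value)."""
        def __init__(self, embed_dim: int = 64):
            super().__init__()
            self.W_q = nn.Linear(embed_dim, embed_dim, bias=False)
            self.W_k = nn.Linear(embed_dim, embed_dim, bias=False)
            self.W_v = nn.Linear(embed_dim, embed_dim, bias=False)

        def forward(self, e_i: torch.Tensor, e_others: torch.Tensor) -> torch.Tensor:
            # e_i: (B, D) embedding of the selected agent's observation-action pair
            # e_others: (B, N-1, D) embeddings of the non-selected agents
            q = self.W_q(e_i).unsqueeze(1)                      # (B, 1, D)
            k = self.W_k(e_others)                              # (B, N-1, D)
            v = self.W_v(e_others)                              # (B, N-1, D)
            w = F.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
            return (w @ v).squeeze(1)                           # attended context (B, D)

    def soft_update(target_net: nn.Module, online_net: nn.Module, tau: float = 0.01):
        """Polyak-style soft update of the target network parameters (S3.7, assumed form)."""
        with torch.no_grad():
            for p_t, p in zip(target_net.parameters(), online_net.parameters()):
                p_t.mul_(1.0 - tau).add_(tau * p)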
CN202310696501.7A 2023-06-13 2023-06-13 Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network Active CN116454926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310696501.7A CN116454926B (en) 2023-06-13 2023-06-13 Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network

Publications (2)

Publication Number Publication Date
CN116454926A true CN116454926A (en) 2023-07-18
CN116454926B CN116454926B (en) 2023-09-01

Family

ID=87132361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310696501.7A Active CN116454926B (en) 2023-06-13 2023-06-13 Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network

Country Status (1)

Country Link
CN (1) CN116454926B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019129729A1 (en) * 2017-12-31 2019-07-04 Vito Nv Unbalance compensation by optimally redistributing current
KR20210051043A (en) * 2019-10-29 2021-05-10 중앙대학교 산학협력단 Method and apparatus for optimizing home energy management system in three-phase unbalanced low-voltage distribution network
CN113489015A (en) * 2021-06-17 2021-10-08 清华大学 Power distribution network multi-time scale reactive voltage control method based on reinforcement learning
CN115117901A (en) * 2022-06-17 2022-09-27 佳源科技股份有限公司 Distribution area three-phase imbalance optimization method and system applying distributed photovoltaic access
CN115986750A (en) * 2022-12-30 2023-04-18 南京邮电大学 Voltage regulation method for layered multi-agent deep reinforcement learning power distribution network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DI CAO; JUNBO ZHAO: "Attention Enabled Multi-Agent DRL for Decentralized Volt-Var Control of Active Distribution System Using PV Inverters and SVCs", IEEE TRANSACTIONS ON SUSTAINABLE ENERGY, vol. 12, no. 3, pages 1582 - 1592, XP011862135, DOI: 10.1109/TSTE.2021.3057090 *
YOONGUN JUNG; CHANGHEE HAN: "Adaptive Volt-Var Control in Smart PV Inverter for Mitigating Voltage Unbalance at PCC Using Multiagent Deep Reinforcement Learning", APPLIED SCIENCE, vol. 11, no. 19, pages 1 - 14 *
ZHANG JIAN; CUI MINGJIAN; YAO XIAOYI: "Dual-Time-Scale Coordinated Optimization of Active Distribution Networks Based on Data-Driven and Physical Models", AUTOMATION OF ELECTRIC POWER SYSTEMS, pages 1 - 16 *
HUANG HUI; YU HONGQI; LIU PENGWEI: "Centralized Reactive Power and Voltage Control Strategy Considering Three-Phase Active Power Unbalance", YUNNAN ELECTRIC POWER TECHNOLOGY, vol. 48, no. 2, pages 31 - 36 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116823803A (en) * 2023-07-21 2023-09-29 深圳鑫舟生物信息科技有限公司 Biological compensation physiotherapy system
CN116823803B (en) * 2023-07-21 2024-01-30 深圳鑫舟生物信息科技有限公司 Biological compensation physiotherapy system
CN116961139A (en) * 2023-09-19 2023-10-27 南方电网数字电网研究院有限公司 Scheduling method and scheduling device for power system and electronic device
CN116961139B (en) * 2023-09-19 2024-03-19 南方电网数字电网研究院有限公司 Scheduling method and scheduling device for power system and electronic device
CN117477607A (en) * 2023-12-28 2024-01-30 国网江西综合能源服务有限公司 Three-phase imbalance treatment method and system for power distribution network with intelligent soft switch
CN117477607B (en) * 2023-12-28 2024-04-12 国网江西综合能源服务有限公司 Three-phase imbalance treatment method and system for power distribution network with intelligent soft switch
CN117806170A (en) * 2024-02-23 2024-04-02 中国科学院近代物理研究所 Microbeam focusing control method and device
CN117806170B (en) * 2024-02-23 2024-05-10 中国科学院近代物理研究所 Microbeam focusing control method and device

Also Published As

Publication number Publication date
CN116454926B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN116454926B (en) Multi-type resource cooperative regulation and control method for three-phase unbalanced management of distribution network
Yang et al. Reinforcement learning in sustainable energy and electric systems: A survey
CN110535146B (en) Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning
CN114217524B (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
CN113937829B (en) Multi-target reactive power control method of active power distribution network based on D3QN
Zhang et al. Deep reinforcement learning for load shedding against short-term voltage instability in large power systems
CN115588998A (en) Graph reinforcement learning-based power distribution network voltage reactive power optimization method
CN115409650A (en) Power system voltage control method based on near-end strategy optimization algorithm
CN103618315B (en) A kind of line voltage idle work optimization method based on BART algorithm and super-absorbent wall
CN117973644A (en) Distributed photovoltaic power virtual acquisition method considering optimization of reference power station
Mu et al. Graph multi-agent reinforcement learning for inverter-based active voltage control
CN117039981A (en) Large-scale power grid optimal scheduling method, device and storage medium for new energy
CN115759370A (en) Mapping operation method based on MADDPG algorithm
Wei et al. Social cognitive optimization algorithm with reactive power optimization of power system
CN113344283B (en) Energy internet new energy consumption capability assessment method based on edge intelligence
CN117833263A (en) New energy power grid voltage control method and system based on DDPG
CN115276067B (en) Distributed energy storage voltage adjusting method suitable for dynamic topological change of power distribution network
CN117893043A (en) Hydropower station load distribution method based on DDPG algorithm and deep learning model
CN115133540B (en) Model-free real-time voltage control method for power distribution network
CN117117989A (en) Deep reinforcement learning solving method for unit combination
CN116896112A (en) Active power distribution network distributed power supply collaborative optimization operation method and system
CN115983373A (en) Near-end strategy optimization method based on graph convolution neural network
Ao et al. The application of DQN in thermal process control
Vakula et al. Evolutionary Prisoner's Dilemma in updating fuzzy linguistic model to damp power system oscillations
CN113837654B (en) Multi-objective-oriented smart grid hierarchical scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant