CN114861792A

CN114861792A - Complex power grid key node identification method based on deep reinforcement learning

Info

Publication number: CN114861792A
Application number: CN202210484829.8A
Authority: CN
Inventors: 王红; 张岩; 齐林海
Original assignee: North China Electric Power University
Current assignee: North China Electric Power University
Priority date: 2022-05-06
Filing date: 2022-05-06
Publication date: 2022-08-05

Abstract

A complex power grid key node identification method based on deep reinforcement learning, belonging to the technical field of electric power big data processing. The method draws on the interactive learning of the deep reinforcement learning model DDQN (Double Deep Q-Network): an agent explores the environment autonomously, and the experience data formed from environment, action and reward information are used to compute the Q value of an action in a given state, thereby evaluating the value of taking that action in that state of the complex power grid. Being data-driven, the method overcomes the limitations in adaptability, algorithm efficiency and accuracy that arise when a mathematical model is built from the business mechanism in the complex environment of a power distribution network. It avoids the distribution assumptions and feature modeling of states, based on a large amount of prior knowledge, that the traditional complex power grid reconstruction process requires. It reduces the complexity of key node identification, is better suited to key node identification in large-scale power grids in a big data context, and offers higher robustness and accuracy.

Description

Complex power grid key node identification method based on deep reinforcement learning

Technical Field

The invention relates to a complex power grid key node identification method based on deep reinforcement learning, and belongs to the technical field of electric power big data processing.

Background

The electric power system is a basic guarantee for the stable and healthy development of the economy and for people's livelihoods in every country, and its safe, stable and continuous operation is crucial to social development. With the rapid development of human society, the demand for electricity has grown greatly, and the scale and complexity of power grids keep increasing, so that grid systems have become very complex. Concepts such as carbon peaking and carbon neutrality further promote the development of new energy power, but as the share of new energy connected to the power system keeps growing and changing, the coupling between different parts of the grid increases, which raises both the probability that disturbance signals occur and their ability to propagate. Disturbance signals are transmitted through the nodes of the grid. When a key link of the power system is disturbed, the disturbance spreads more widely and its impact runs deeper, systemic accidents become more likely, and the resulting financial and manpower losses can be severe. A complex power grid has a non-homogeneous topological structure: the key nodes are few, but a fault at one of them can greatly affect the topology and operation of the network and can even spread rapidly across the whole network. Once a critical link in a complex power grid fails, a large-area power outage can follow, causing losses that are far from small.

Power grid accidents are not caused by inefficacy alone; the instability of the power system itself also bears part of the responsibility. Improving the stability of the power system, identifying the key nodes of the complex power grid, and using limited financial and material resources to deploy monitoring and additional protection in a targeted way are therefore of great significance for the safe and stable operation of the power system.

At present, existing key node identification methods fall into two categories. The first is based on the static characteristics of the power grid and identifies key nodes from the pure topological structure, using measures such as topological betweenness and degree centrality. Such methods are computationally simple but inaccurate for real power grids because they ignore the physical characteristics of electrical engineering. The second combines the topological structure with electrical characteristics, using measures such as topological entropy and energy functions. Such methods demand very rich prior knowledge and can hardly avoid the influence of subjective factors on the model, so the resulting mechanism analysis model has difficulty describing high-dimensional, complex and time-varying object characteristics accurately, and its robustness is low when applied to large-scale power grids.

Traditional models built on mathematical statistics and on prior knowledge combined with mechanism analysis are inadequate for this task, for at least two reasons. First, modeling data measured in the real world requires the support of a large amount of prior knowledge, and the quality of the modeling directly affects the performance of the model and the robustness of the identification result. Second, the actual electrical data of a complex power grid are often influenced by a coupled multi-physical-field system; their characteristics are complex and strongly correlated with their context, so fitting the model generally requires a very large amount of computation, sometimes more than can be afforded.

Deep reinforcement learning learns the characteristics of objects adaptively and in a data-driven way, depends little on a specific mathematical model, and can transfer what it has learned in a source domain. The deep reinforcement learning model DDQN (Double Deep Q-Network) is a typical model in the field and is applied in many domains. It has a low modeling cost and relies entirely on data to learn the latent features of the data adaptively. Through double-Q cyclic iteration it guides the model to narrow the gap to the target distribution, overcomes the difficulty of computing and converging a dynamically changing Q value function with a regression model, and avoids the overestimation of Q values that affects the DQN model.

A method for identifying the key nodes of a complex power grid based on deep reinforcement learning is therefore provided. First, the complex power grid is abstracted into an intuitive, simply described unweighted undirected graph of the connections between nodes and links. Second, the node attributes and link attributes of the topological structure are preprocessed and normalized to facilitate subsequent calculation; in the design of the reward function, weights are assigned to the obtained topological and electrical information, which is integrated with the abstract topological graph to form a weighted undirected topological structure. In the design of the neural network, a classical convolutional neural network (CNN) model is adopted to improve feature extraction from the exploration data of the agent; following the DDQN scheme, the CNN is duplicated to form a double-Q network, and double iterative perception is used to improve the convergence speed and feature extraction capability of the model. Through the back propagation algorithm, the model learns to generate the Q value distribution that is most similar to the known information and consistent with the data obtained from the interaction of the agent with the complex power grid. The method overcomes the insufficient feature extraction and poor robustness among highly coupled nodes that conventional explicit modeling methods may suffer from. In addition, the model can be applied to other networks with more complex features: it is only necessary to add neural network layers as the complexity grows, to train longer, and to increase the number of explorations by the agent so that the latent features of the data are fully extracted. The model therefore has high generalization capability and stability.

In summary, a reward matrix for the complex power grid environment is constructed by combining the power grid topological structure with prior knowledge such as electrical distance; the large volume of interaction data from the agent together with the reward matrix forms the environment to be solved; and the deep double-Q network of deep reinforcement learning is applied to learn the optimal strategy for completing the target task. The result is an efficient and robust artificial intelligence method suitable for key node identification in complex power grids.

Disclosure of Invention

The invention aims to provide a complex power grid key node identification method based on deep reinforcement learning that addresses the complex modeling, low identification efficiency and low robustness of traditional power grid key node identification methods.

The invention adopts a deep double-Q network (DDQN), which has strong decision-making and autonomous feature extraction capabilities, as its main framework, and combines it with double Q value iterative perception to summarize the regularities in exploration data with complex relationships. The method obtains the latent features of the data through adaptive exploration and exploitation and through unsupervised learning, and overcomes the shortcomings in adaptability, algorithm efficiency and accuracy that stem from the large amount of prior knowledge needed for complex modeling of the power grid. The method of the invention has good robustness.

First, abstraction preprocessing is performed on the static complex power grid. Based on graph theory, all large component nodes, including generators and transformers, and the links between them are simplified into points and lines. Only the existence of the nodes and their physical connections are considered, so that the complex power grid is abstracted into a pure unweighted undirected connected graph.

Next, the power grid topology attributes and the electrical engineering physical characteristic data are normalized. The topology attributes that are counted include the node degree (in-degree and out-degree) and the node strength; the electrical engineering physical characteristics include the electrical distance, that is, the equivalent impedance. These data are normalized to (0,1). Normalization makes differences within the same kind of data easier to compare and alleviates the dimension explosion problem of the neural network.

Then, the reward function is designed using a Gaussian function and a subjective-objective weighting method. The smaller the equivalent impedance, the tighter the connection between two nodes. The electrical distance represented by the equivalent impedance is therefore inverse-distance weighted with a Gaussian function, so that two nodes with a smaller equivalent impedance receive a larger weight. The proportions of the different attributes in the reward function are configured by the subjective-objective weighting method, yielding a deep reinforcement learning reward function suited to the selected static complex network.

Next, the exploration process of the agent is recorded in an experience pool, and the exploration data are fed to the dual neural networks, which iteratively estimate the Q value. Recording the exploration process in the experience pool balances the exploration and exploitation mechanisms and increases the reuse rate of the data. Each record comprises a node state s, an action a, a reward r and the next node state s', packaged as a tuple of the form (s, a, r, s') and stored in the experience pool, where it is used to train the neural network on the Q values of the different node states. In the double-Q network, the two neural networks predict the Q values of state-action pairs at different time steps and are synchronized after a certain number of time steps. Thanks to this double perception the loss function decreases faster, the regression prediction is more stable, and the result is more accurate.
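The experience pool described above can be illustrated with a minimal sketch. The class name `ExperiencePool`, the default capacity and the uniform sampling policy are assumptions made for illustration; the source specifies only that tuples of the form (s, a, r, s') are stored and reused for training.

```python
import random
from collections import deque, namedtuple

# One exploration step: state s, action a, reward r, next state s_next.
Transition = namedtuple("Transition", ["s", "a", "r", "s_next"])


class ExperiencePool:
    """Fixed-capacity store of (s, a, r, s') tuples.

    The capacity and the uniform sampling are assumed hyperparameters,
    not values taken from the source.
    """

    def __init__(self, capacity=10000, seed=None):
        self.buffer = deque(maxlen=capacity)  # the oldest record is dropped first
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self.buffer.append(Transition(s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling lets one stored record be reused in many updates.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

Sampling from the pool, rather than training only on the latest step, is what raises the reuse rate of the exploration data.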

Third, a suitable neural network model is designed according to the scale of the static power grid, and it is trained to fit the Q values in the different states. The neural network model is a classical CNN comprising an input layer that takes the states of all n power grid nodes, a convolutional layer with 30 convolution kernels of size 3 × 3, a convolutional layer with 60 convolution kernels of size 3 × 3, a last hidden layer that is fully connected, and a final output layer that is fully connected. The output layer produces a vector containing the Q value of each legal action, representing the value of a state transition to each of the different power grid nodes. For any input state, the trained neural network yields the action selection distribution formed from the Q values, and a series of consecutive inputs finally produces the complete link with the highest value.
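To make the layer stack concrete, the following sketch computes the output shape and parameter count of each layer. Several details are assumptions that the source does not state: the state is taken to be a single-channel n × n matrix, the convolutions use stride 1 without padding, the hidden fully connected layer has a hypothetical width `hidden_units`, and the output layer has one Q value per candidate next node.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size along one dimension after one convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def describe_network(n_nodes, hidden_units=128):
    """Return (layer name, output shape, parameter count) for each layer.

    Assumptions (not from the source): single-channel n x n input,
    stride 1, no padding, `hidden_units` neurons in the hidden layer.
    """
    layers = []
    h = w = n_nodes
    channels = 1
    layers.append(("input", (channels, h, w), 0))
    for name, kernels in (("conv1", 30), ("conv2", 60)):
        # 3 x 3 weights per input channel plus one bias per kernel.
        params = kernels * (channels * 3 * 3 + 1)
        h, w = conv_out(h), conv_out(w)
        channels = kernels
        layers.append((name, (channels, h, w), params))
    flat = channels * h * w
    layers.append(("fc_hidden", (hidden_units,), flat * hidden_units + hidden_units))
    # One Q value per candidate next node (action).
    layers.append(("fc_output", (n_nodes,), hidden_units * n_nodes + n_nodes))
    return layers


for name, shape, params in describe_network(30):
    print(name, shape, params)
```

Under these assumptions most parameters sit in the first fully connected layer, which is one reason the layer sizes would need to be revisited for larger grids.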

Finally, the optimal link is determined using the node Q value as the criterion, and the key nodes of the global power grid are output with their different levels of importance according to how often each node appears in the optimal solution set. Specifically, the trained neural network gives the value Q of each action in each state; the optimal link between any two points is selected according to the principle of Q value maximization; and the key nodes are ranked by the frequency with which they appear in the set of optimal links.

Drawings

Fig. 1 is the unweighted abstract structure diagram of the IEEE 30-node system.

Fig. 2 is a diagram of the iterative perception mechanism of the double-Q neural network.

Fig. 3 is a schematic diagram of the CNN neural network.

Fig. 4 is a graph of the importance of the key nodes identified in the IEEE 30-node power system using the DDQN model.

Fig. 5 is the operational flow diagram of the DDQN model.

Fig. 6 shows the process by which the DDQN model solves for the key nodes of the power grid.

Fig. 7 is a flow chart of the present invention.

Detailed Description

The invention is further illustrated with reference to the following figures and examples, but the practice of the invention is not limited thereto.

Example:

The power grid structure of this embodiment is the open-source IEEE 30-node system.

Step 1: Perform abstraction preprocessing on the static complex power grid. Based on graph theory, all large component nodes, including generators and transformers, and the links between them are simplified into points and lines. Only the existence of the nodes and their physical connections are considered, so that the complex power grid is abstracted into a pure unweighted undirected connected graph with 30 nodes and 41 edges, as shown in Fig. 1.
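The abstraction in Step 1 can be sketched as building an adjacency map from a list of connections. The edge list below is a hypothetical five-node toy grid, not the IEEE 30-node data, and the function names are illustrative.

```python
def build_unweighted_graph(edges):
    """Build an adjacency map {node: set of neighbours} from undirected edges."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops; only links between distinct nodes matter
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def count_edges(adjacency):
    """Each undirected edge appears in two neighbour sets."""
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2


# Hypothetical toy grid; (2, 1) repeats (1, 2) to mimic a parallel line.
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (2, 1)]
graph = build_unweighted_graph(toy_edges)
print(len(graph), count_edges(graph))
```

Storing neighbours in sets collapses parallel lines into a single edge, which matches the idea of keeping only the existence of a physical connection.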

Step 2: Apply (0,1) normalization to the power grid topology attributes and the electrical engineering physical characteristic data. The topology attributes are counted first. To normalize the node degree (in-degree and out-degree), the attribute data are scanned once from beginning to end to find the maximum degree $d_{\max}$ and the minimum degree $d_{\min}$. The min-max normalization formula then gives the normalized degree $D$ of each node in the complex power grid:

$$D = \frac{d - d_{\min}}{d_{\max} - d_{\min}}$$

To normalize the node strength, the attribute data are scanned once from beginning to end to find the maximum node strength $s_{\max}$ and the minimum node strength $s_{\min}$. The min-max normalization formula then gives the normalized value $S$ of the node strength $s$ of each node in the complex power grid:

$$S = \frac{s - s_{\min}}{s_{\max} - s_{\min}}$$

The electrical engineering physical characteristics comprise the electrical distance, that is, the equivalent impedance, which is also normalized to (0,1). The attribute data are scanned once from beginning to end to find the maximum equivalent impedance $z_{\max}$ and the minimum equivalent impedance $z_{\min}$. The min-max normalization formula then gives the normalized value $Z$ of each equivalent impedance $z$ in the complex power grid:

$$Z = \frac{z - z_{\min}}{z_{\max} - z_{\min}}$$
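The min-max normalization used for all three attributes in Step 2 can be sketched as one helper function. The function name and the handling of a constant attribute (mapped to 0.0) are assumptions; the degree values are a hypothetical toy example.

```python
def min_max_normalize(values):
    """Scale a {key: value} attribute table with min-max normalization.

    One pass finds the minimum and maximum; each value x is then mapped
    to (x - min) / (max - min).
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumed convention for a constant attribute, not from the source.
        return {key: 0.0 for key in values}
    return {key: (value - lo) / (hi - lo) for key, value in values.items()}


# Hypothetical node degrees for a five-node toy grid.
degrees = {1: 2, 2: 2, 3: 3, 4: 2, 5: 1}
print(min_max_normalize(degrees))
```

The same helper applies unchanged to node strengths keyed by node and to equivalent impedances keyed by node pair.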

Step 3: Design the reward function using a Gaussian function and a subjective-objective weighting method. The smaller the equivalent impedance, the tighter the connection between two nodes. The electrical distance represented by the equivalent impedance is therefore inverse-distance weighted with a Gaussian function, so that a smaller equivalent impedance gives a higher weight between the two nodes. The formula is as follows:

The proportions of the different attributes in the reward function are then configured by the subjective-objective weighting method, yielding the deep reinforcement learning reward function suited to the selected static complex network. The formula is as follows:

where $Z_{ij}$ denotes the equivalent impedance between nodes i and j after the Gaussian transformation, $d_j$ denotes the degree of node j, and $s_j$ denotes the strength of node j.
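Since the exact expressions are not reproduced in this text, the following is only a sketch of one way such a reward could be assembled. The Gaussian kernel form, the bandwidth `sigma`, the linear combination and the weights in `weights` are all assumptions for illustration, not the formulas of the invention.

```python
import math


def gaussian_weight(z, sigma=0.5):
    """Assumed Gaussian kernel: a smaller impedance z gives a weight nearer 1."""
    return math.exp(-(z ** 2) / (2 * sigma ** 2))


def reward(z_ij, d_j, s_j, weights=(0.5, 0.25, 0.25), sigma=0.5):
    """Hypothetical reward for moving from node i to node j.

    z_ij: normalized equivalent impedance of the link i-j
    d_j:  normalized degree of node j
    s_j:  normalized strength of node j
    weights: assumed proportions of the three attributes
    """
    w_z, w_d, w_s = weights
    return w_z * gaussian_weight(z_ij, sigma) + w_d * d_j + w_s * s_j


# A tightly coupled, well-connected neighbour earns a higher reward.
print(reward(0.1, 0.8, 0.7) > reward(0.9, 0.2, 0.3))
```

In practice the weights would come from the subjective-objective weighting step rather than being fixed by hand as they are here.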

Step 4: Record the exploration process of the agent in an experience pool, and feed the exploration data to the dual neural networks, which iteratively estimate the Q value. Recording the exploration process in the experience pool balances the exploration and exploitation mechanisms and increases the reuse rate of the data. Each record comprises a node state s, an action a, a reward r and the next node state s', packaged as a tuple of the form (s, a, r, s') and stored in the experience pool, where it is used to train the neural network on the Q values of the different node states. In the double-Q network, the two neural networks predict the Q values of state-action pairs at different time steps and are synchronized after a certain number of time steps. Thanks to this double perception the loss function decreases faster, the regression prediction is more stable, and the result is more accurate. The working mechanism of the double-Q network is shown in Fig. 2. The loss function is:

$$L = \mathbb{E}\left[\left(r_{t+1} + \gamma\, Q_{\text{target}}\left(s_{t+1}, \arg\max_{a} Q_{\text{eval}}(s_{t+1}, a; \theta)\right) - Q_{\text{eval}}(s_t, a_t; \theta)\right)^2\right]$$

where $r$ denotes the reward value, $a$ denotes the action, $\gamma$ is the discount factor, $Q_{\text{target}}$ is the value that the fixed target network of the double-Q network assigns to the action selected by the current training network, and $Q_{\text{eval}}(s, a; \theta)$ is the value predicted by the current training network.
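The role of the two networks in this loss can be illustrated with a small sketch in which plain lookup tables stand in for the neural networks. The table contents, the `actions` helper and the discount factor of 0.9 are hypothetical values chosen for illustration.

```python
def double_q_target(r, s_next, q_eval, q_target, actions, gamma=0.9):
    """Target value r + gamma * Q_target(s', argmax_a Q_eval(s', a))."""
    legal = actions(s_next)
    if not legal:
        return r  # no legal action left: no bootstrap term
    # The evaluation network selects the action ...
    best = max(legal, key=lambda a: q_eval[(s_next, a)])
    # ... and the fixed target network scores it.
    return r + gamma * q_target[(s_next, best)]


def squared_td_error(r, s, a, s_next, q_eval, q_target, actions, gamma=0.9):
    """One sample of the squared term inside the expectation of the loss."""
    target = double_q_target(r, s_next, q_eval, q_target, actions, gamma)
    return (target - q_eval[(s, a)]) ** 2


# Hypothetical Q tables keyed by (state, action).
q_eval = {(1, 2): 0.2, (2, 3): 1.0, (2, 4): 0.5}
q_target = {(2, 3): 0.4, (2, 4): 0.9}
actions = lambda s: {2: [3, 4]}.get(s, [])
print(double_q_target(1.0, 2, q_eval, q_target, actions))
```

In this toy case the evaluation table prefers action 3 while the target table would prefer action 4; decoupling selection from evaluation in this way is what limits the overestimation seen with a single network.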

Step 5: Design a suitable neural network model according to the scale of the static power grid, and train it to fit the Q values in the different states. The neural network model is a classical CNN comprising an input layer that takes the states of all 30 power grid nodes, a convolutional layer with 30 convolution kernels of size 3 × 3, a convolutional layer with 60 convolution kernels of size 3 × 3, a last hidden layer that is fully connected, and a final output layer that is fully connected. The output layer produces a vector containing the Q value of each legal action, representing the value of a state transition to each of the different power grid nodes, as shown in Fig. 3. For any input state, the trained neural network yields the action selection distribution formed from the Q values, and a series of consecutive inputs finally produces the complete link with the highest value.
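How a complete link emerges from a series of consecutive inputs can be sketched as a greedy rollout over Q values. Here a lookup table `q`, assumed to hold Q values already conditioned on the destination node, stands in for the trained network; the rule that a node is not revisited and the `max_steps` limit are further assumptions.

```python
def greedy_link(q, start, goal, neighbours, max_steps=100):
    """Follow the highest-Q legal action from start until goal is reached.

    Returns the list of visited nodes, or None if the rollout hits a
    dead end or exceeds max_steps.
    """
    path, visited = [start], {start}
    state = start
    while state != goal and len(path) <= max_steps:
        # Legal actions: unvisited neighbours (assumed rule to avoid cycles).
        legal = [n for n in neighbours[state] if n not in visited]
        if not legal:
            return None
        state = max(legal, key=lambda n: q[(state, n)])
        path.append(state)
        visited.add(state)
    return path if state == goal else None


# Hypothetical four-node grid and Q table keyed by (state, next node).
neighbours = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
q = {(1, 2): 0.3, (1, 3): 0.8, (2, 1): 0.1, (2, 3): 0.2, (2, 4): 0.9,
     (3, 1): 0.1, (3, 2): 0.4, (3, 4): 0.7, (4, 2): 0.0, (4, 3): 0.0}
print(greedy_link(q, 1, 4, neighbours))
```

Repeating such a rollout for every pair of nodes would give the set of optimal links on which the node frequencies are later counted.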

Step 6: Determine the optimal link using the node Q value as the criterion, and output the key nodes of the global power grid with their different levels of importance according to how often each node appears in the optimal solution set. Specifically, the trained neural network gives the value Q of each action in each state; the optimal link between any two points is selected according to the principle of Q value maximization; and the key nodes are ranked by the frequency with which they appear in the set of optimal links. The result is shown in Fig. 4. The calculation formula is as follows:

where $l_{jk}(i)$ denotes the number of optimal links that contain node i, and $l_{jk}$ denotes the number of optimal links between all nodes.
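The frequency-based ranking can be sketched as follows. The scoring used here, the share of optimal links that contain a node, is an assumed reading of the definitions above, as is the choice to count the endpoints of a link; the list of links is a hypothetical toy example.

```python
from collections import Counter


def rank_key_nodes(optimal_links):
    """Rank nodes by the share of optimal links in which they appear.

    optimal_links: list of node sequences, one per node pair.
    Returns (node, score) pairs sorted by descending score, then node id.
    """
    counts = Counter()
    for link in optimal_links:
        counts.update(set(link))  # count a node at most once per link
    total = len(optimal_links)
    scores = {node: count / total for node, count in counts.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical optimal links between four node pairs.
links = [[1, 3, 4], [2, 3, 4], [1, 3], [2, 4]]
print(rank_key_nodes(links))
```

Nodes at the top of the ranking are the candidates for targeted monitoring and protection.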

The above embodiment is a preferred embodiment of the present invention, but the embodiments of the invention are not limited to it. For example, the number of neuron layers in the neural network may be increased or decreased according to actual needs, and different deep learning models may be used. The model has a clear advantage in extracting features from complex data and can be applied to complex power grids with a larger number of states.


Claims

1. A complex power grid key node identification method based on deep reinforcement learning, characterized in that a deep double-Q network from deep reinforcement learning is adopted, an interactive learning mode is established from a Markov perspective, and the Q value of each power grid node state is evaluated from the experience data collected through the interaction of an agent with the complex power grid environment; then, using the spatial-temporal distribution characteristics and latent features of the data among the nodes captured by the neural network, a state-action CNN model suited to a static power grid environment is trained from the contextual action information and reward information, the optimal link between any two points in the power grid is found, and the frequency with which each node appears on the optimal links is counted to complete the ranking of the global key nodes of the power grid, the method comprising the following steps:

Step 1: performing abstraction preprocessing on the static complex power grid;

Step 2: normalizing the power grid topology attributes and the electrical engineering physical characteristic data;

Step 3: designing a reward function using a Gaussian function and a subjective-objective weighting method;

Step 4: recording the exploration process of the agent in an experience pool, and feeding the exploration data to the dual neural networks, which iteratively estimate the Q value;

Step 5: designing a suitable neural network model according to the scale of the static power grid, and training it to fit the Q values in the different states;

Step 6: determining the optimal link using the node Q value as the criterion, and outputting the key nodes of the global power grid with their different levels of importance according to how often each node appears in the optimal solution set.

2. The complex power grid key node identification method based on deep reinforcement learning according to claim 1, wherein the abstraction preprocessing of the complex power grid in step 1 simplifies, based on graph theory, all large component nodes including generators and transformers and the links between them into points and lines, and considers only the existence of the nodes and their physical connections.

3. The complex power grid key node identification method based on deep reinforcement learning according to claim 1, wherein the power grid topology attributes counted in step 2 include the node degree (in-degree and out-degree) and the node strength; the electrical engineering physical characteristics include the electrical distance, that is, the equivalent impedance; and the above data are subjected to (0,1) normalization.

4. The complex power grid key node identification method based on deep reinforcement learning according to claim 1, wherein designing the reward function in step 3 comprises inverse-distance weighting the equivalent impedance with a Gaussian function and configuring the proportions of the different attributes in the reward function by a subjective-objective weighting method, so as to obtain the deep reinforcement learning reward function suited to the selected static complex network.

5. The complex power grid key node identification method based on deep reinforcement learning according to claim 1, wherein recording the exploration process of the agent in an experience pool in step 4 balances the exploration and exploitation mechanisms and increases the reuse rate of the data; each record comprises a node state s, an action a, a reward r and the next node state s', packaged as a tuple of the form (s, a, r, s') and stored in the experience pool, where it is used by the neural network of step 5 to train the Q values of the different node states; and the dual neural networks predict the Q values of state-action pairs at different time steps and are synchronized after a certain number of time steps, so that, thanks to the double perception, the loss function decreases faster, the regression prediction is more stable, and the result is more accurate.

6. The complex power grid key node identification method based on deep reinforcement learning according to claim 1, wherein the neural network model in step 5 is a classical CNN model comprising an input layer and two convolutional layers, the last hidden layer being a fully connected layer and the final output layer being a fully connected layer that outputs a vector containing the Q value of each legal action, representing the value of a state transition to each of the different power grid nodes.

7. The complex power grid key node identification method based on deep reinforcement learning according to claim 1, wherein using the node Q value as the criterion for the optimal link in step 6 comprises obtaining, with the trained neural network, the value Q of each action in each state, selecting the optimal link between any two points according to the principle of Q value maximization, and ranking the key nodes by the frequency with which they appear in the set of optimal links.