KR20240146224A

KR20240146224A - Physics-informed reinforcement learning model for compensating unbalanced voltage in the distribution system

Info

Publication number: KR20240146224A
Application number: KR1020230040906A
Authority: KR
Inventors: 최승연; 윤영걸
Original assignee: 고려대학교 산학협력단
Priority date: 2023-03-29
Filing date: 2023-03-29
Publication date: 2024-10-08
Also published as: WO2024205050A1

Abstract

A device and method for mitigating voltage unbalance in a distribution system, based on a reinforcement learning model that considers physical constraints, are disclosed. The voltage unbalance mitigation method is performed by a voltage unbalance mitigation device that includes at least a processor, and comprises receiving observation data of the distribution system, predicting an action from the observation data using a trained artificial neural network model, and calculating a constrained action corresponding to the action.

Description

{PHYSICS-INFORMED REINFORCEMENT LEARNING MODEL FOR COMPENSATING UNBALANCED VOLTAGE IN THE DISTRIBUTION SYSTEM}

The present invention relates to output control of inverter-based distributed energy resources in a distribution system, and more particularly to a technique for controlling active and reactive power output in consideration of the characteristics of the distribution system.

As the amount of renewable energy connected to the distribution system increases, the number of single-phase-connected, inverter-based distributed energy resources also increases. This causes three-phase voltage unbalance, which damages power equipment and shortens its lifespan, and thereby affects stable and economical grid operation.

Existing inverter output control techniques for mitigating voltage unbalance rely on either a feedback control structure for compensation or an optimization technique. With feedback control logic, the parameters are fixed, which makes control over a wide range of conditions difficult. With optimization, the computation must be repeated for every given environment, which takes a long time.

Accordingly, the present invention proposes a technique for mitigating voltage unbalance in a distribution system based on a reinforcement learning model that considers the physical constraints of the distribution system.

Korean Laid-Open Patent Publication No. 2020-0117170 (published October 14, 2020)
Korean Laid-Open Patent Publication No. 2012-0065533 (published June 21, 2012)

The technical problem to be solved by the present invention is to provide a device and method for mitigating voltage unbalance in a distribution system based on a reinforcement learning model that considers the physical constraints of the distribution system.

A voltage unbalance mitigation method according to one embodiment of the present invention is performed by a voltage unbalance mitigation device that includes at least a processor, and comprises: receiving observation data of a distribution system; predicting an action from the observation data using a trained artificial neural network model; and calculating a constrained action corresponding to the action.

The device and method for mitigating voltage unbalance in a distribution system according to an embodiment of the present invention can resolve the voltage unbalance caused by single-phase loads and distributed energy resources connected to the distribution system, thereby contributing to stable grid operation.

In addition, grid stability can be improved in future distribution systems with a growing share of distributed energy resources such as photovoltaic generation, wind generation, and energy storage systems.

In addition, because the present invention uses a model trained offline, it can derive control signals quickly and is therefore suitable for real-time control.

A brief description of each drawing is provided for a fuller understanding of the drawings cited in the detailed description of the present invention.
FIG. 1 is a conceptual diagram illustrating a voltage unbalance mitigation device and method according to one embodiment of the present invention.
FIG. 2 shows the DNN structure of the proposed technique: (a) the actor network and (b) the critic network.
FIG. 3 shows the modified IEEE test feeder to which the proposed technique is applied.
FIG. 4 shows the randomly generated scenarios: (a) load profiles, (b) PV profiles, and (c) SOC profiles.
FIG. 5 shows a comparison of the VUF across scenarios: (a) without control and (b) with control by the proposed technique.
FIG. 6 shows a comparison of the average VUF without control and under each control model.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of describing those embodiments. The embodiments may be implemented in various forms and are not limited to the embodiments described in this specification.

Since the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. However, this is not intended to limit the embodiments to the specific forms disclosed; they include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may be present. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present. Other expressions describing the relationship between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

The terminology used in this specification is used only to describe particular embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in this specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the scope of the patent application is not restricted or limited by these embodiments. The same reference numerals in the drawings denote the same members.

The reinforcement learning used in the proposed method, Safe Deep Reinforcement Learning (SDRL), can be regarded as an advanced Deep Reinforcement Learning (DRL) model. The proposed technique satisfies physical constraints and comprises two modules: a learning module (LM) and a constraint module (CM). The main algorithms used in these modules are Deep Deterministic Policy Gradient (DDPG) and Quadratic Programming (QP), respectively. A schematic diagram of the proposed technique is shown in FIG. 1.

More specifically, the voltage unbalance mitigation device, implemented as a computing device including at least a processor and/or a memory, includes a learning module (which may also be called a learning unit) and a constraint module (which may also be called a constraint unit). The device may further include a receiving unit (which may also be called a receiving module) that receives observed data of the distribution system. Depending on the embodiment, the device may also be understood to include a sensing unit (which may also be called a sensing module) for measuring data of the distribution system. The sensing unit may observe the distribution system and transmit the observed data to the receiving unit. The computing device includes a personal computer (PC), a server, and the like.

MDP (Markov Decision Process)

An MDP is a tuple consisting of S, A, P, R, and γ. Here, S denotes the states, A the actions, P the transition probability, R the rewards, and γ the discount factor. The MDP is defined as in Equation 1.

[Equation 1]

$$\mathrm{MDP} = (S, A, P, R, \gamma)$$

As shown in FIG. 1, the components of the MDP form a closed loop, in which the state, action, and reward are indicated by dotted arrows. P and γ are included in the reward calculation step. In addition, P is replaced by a DNN (Deep Neural Network) within DDPG, and the γ element is not considered, because every time step in the simulation is equally valuable. The remaining MDP components for mitigating voltage unbalance in the distribution system (state, action, and reward) are described in detail below.

1) Observations and States

The data observed from the distribution system include the state and/or the reward. Observations are measured from the distribution system: the three-phase voltage (which may mean its magnitude and phase) at the ESS (Energy Storage System) PCC (Point of Common Coupling) node is observed (or measured). The state may mean an observation or measurement in the distribution system, and it is used to calculate the reward and as the input to the critic and actor networks. The state is given by Equation 2.

[Equation 2]

In Equation 2, |V_n| denotes the magnitude of each phase voltage and the accompanying angle term denotes the phase of each phase voltage (n ∈ {a, b, c}). In the MDP, the three-phase active power output of the ESS is regarded as the action.
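One plausible way to write the state of Equation 2, based only on the definitions in the surrounding text, is sketched below. The symbol $\theta_n$ for the phase angle and the ordering of the entries are assumptions; the text states only that the magnitude and phase of the three-phase voltage at the ESS PCC node are observed.

```latex
s = \left[\, |V_a|,\ |V_b|,\ |V_c|,\ \theta_a,\ \theta_b,\ \theta_c \,\right]
```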

2) Actions

The action is the control variable of the reinforcement learning (RL) agent. The action can be expressed as in Equation 3.

[Equation 3]

In Equation 3, the first group of terms denotes the per-phase active power output of the ESS, and the second group denotes the per-phase power factor.
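Under the description above, the action of Equation 3 could be written as the following vector. The symbols $P_n$ (per-phase ESS active power output) and $pf_n$ (per-phase power factor) and the ordering of the entries are placeholders assumed here, not notation confirmed by the source.

```latex
a = \left[\, P_a,\ P_b,\ P_c,\ pf_a,\ pf_b,\ pf_c \,\right]
```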

3) Rewards

The reward is the payoff (revenue) obtained at each time step, and the sum of the rewards is the episode return. An episode is the agent's entire experience from start to completion. At the end of an episode, the agent updates the actor and critic networks from the return. The reward function for minimizing voltage unbalance can be expressed in terms of the voltage unbalance factor (VUF), as in Equation 4.

[Equation 4]

$$R = -\mathrm{VUF} = -\frac{V_2}{V_1} \times 100$$

In Equation 4, V₁ and V₂ denote the positive-sequence component (positive-sequence voltage magnitude) and the negative-sequence component (negative-sequence voltage magnitude) of the ESS PCC voltage, respectively.
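As a rough illustration of Equation 4, the sketch below computes the VUF from three phase-voltage phasors using the standard symmetrical-component (Fortescue) transform and returns its negative as the reward. The function names (`sequence_magnitudes`, `vuf_percent`, `reward`, `phasor`) and the per-unit example values are assumptions introduced for illustration; the source does not specify an implementation.

```python
import cmath
import math

# Fortescue rotation operator: unit phasor at +120 degrees.
A = cmath.exp(2j * math.pi / 3)


def phasor(magnitude, angle_deg):
    """Build a complex phasor from a magnitude and an angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))


def sequence_magnitudes(va, vb, vc):
    """Return (|V1|, |V2|), the positive- and negative-sequence magnitudes."""
    v1 = (va + A * vb + A ** 2 * vc) / 3
    v2 = (va + A ** 2 * vb + A * vc) / 3
    return abs(v1), abs(v2)


def vuf_percent(va, vb, vc):
    """Voltage unbalance factor in percent: V2 / V1 * 100."""
    v1, v2 = sequence_magnitudes(va, vb, vc)
    return v2 / v1 * 100.0


def reward(va, vb, vc):
    """Reward of Equation 4: the negative VUF."""
    return -vuf_percent(va, vb, vc)


# Hypothetical measurement: phase b sags to 0.9 pu, angles stay balanced.
example_reward = reward(phasor(1.0, 0.0), phasor(0.9, -120.0), phasor(1.0, 120.0))
```

With perfectly balanced voltages the negative-sequence component vanishes, so the reward approaches zero, which is its maximum.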

Learning Module (LM)

The learning module may train a given DNN model to generate an action prediction model, and/or may use the generated action prediction model to predict the action corresponding to a state that is measured at an arbitrary point in the distribution system and received in real time. When actions are predicted in real time, only the one of the two artificial neural network models that predicts actions may be used.

The state is normalized in order to train the DNN layers of the learning module (LM). To train the network parameters efficiently, z-score normalization may be applied at the DNN input layer. The mean and standard deviation of the state may be calculated from the base case, which corresponds to the case without ESS control (no-ESS control case). The normalized state is calculated as in Equation 5.

[Equation 5]

In Equation 5, s is a state, S is the set of states, μ is the mean of the states, and σ is the standard deviation of the states (which, depending on the embodiment, may instead denote the variance); the result is the normalized state.
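The z-score normalization described for Equation 5 can be sketched as follows, i.e., (s − μ) / σ applied per state feature. This is a minimal sketch under assumptions: the function names, the use of the population standard deviation, and the small `eps` guard against a zero standard deviation are choices made here, not details given in the source.

```python
import statistics


def fit_normalizer(base_case_states):
    """Compute per-feature mean and standard deviation from base-case states.

    base_case_states is a list of state vectors recorded without ESS control.
    """
    columns = list(zip(*base_case_states))
    mu = [statistics.fmean(col) for col in columns]
    sigma = [statistics.pstdev(col) for col in columns]
    return mu, sigma


def normalize(state, mu, sigma, eps=1e-12):
    """Apply z-score normalization (s - mu) / sigma to one state vector."""
    return [(s - m) / max(sd, eps) for s, m, sd in zip(state, mu, sigma)]


# Hypothetical two-feature base case, for illustration only.
mu, sigma = fit_normalizer([[1.0, 10.0], [3.0, 30.0]])
normalized = normalize([3.0, 10.0], mu, sigma)
```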

The learning module (LM) learns actions from states and rewards using the DDPG algorithm, which includes two DNNs called the actor network and the critic network. As shown in FIG. 2, these two networks are designed to predict, from the initial state, the action and the expectation of the return (which may mean the estimated reward). That is, one of the two DNNs may be an artificial neural network that outputs an action for a given state, and the other may be an artificial neural network that estimates the reward for the action taken in that state. The agent of the DDPG algorithm selects actions based on the actor network. The DNNs make it possible to formulate the MDP with continuous states and actions.

The actor network may include three fully connected (FC) layers and three activation function layers. The activation function layers may include two ReLU (Rectified Linear Unit) layers and one hyperbolic tangent (tanh) layer. At the end, a scaling layer outputs the action for controlling the ESS. The output of the hyperbolic tangent layer is bounded between -1 and +1, whereas the action itself is limited by the capacity of the ESS. The scaling layer makes it possible to set the bounds of the actor network output with appropriate scale and bias parameters.

More specifically, exemplary operations of the fully connected layer, the ReLU layer, the hyperbolic tangent layer, and the scaling layer are given by Equations 6 to 9, respectively.

[Equation 6]

$$y = w^{T}x + b$$

In Equation 6, y is the output, x is the input, w is the weight parameter, and b is the bias parameter.

[Equation 7]

In Equation 7, y is the output, x is the input, and a is the scaling factor.

[Equation 8]

$$y = \tanh(x)$$

In Equation 8, y is the output and x is the input.

[Equation 9]

$$y = ax + b$$

In Equation 9, y is the output, x is the input, a is the scaling coefficient, and b is the bias coefficient.
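The layer operations above can be chained into a forward pass of the actor network, sketched below in plain Python. This is an illustration under assumptions: the hidden width of 16, the random initialization range, the ESS rating of 100 (arbitrary units), and the power factor range of 0.5 to 1 used to set the scaling layer are all hypothetical. The standard ReLU max(0, x) is used; the role of the scaling factor a mentioned for Equation 7 is not specified in the text, so it is omitted here.

```python
import math
import random


def fully_connected(x, w, b):
    """Equation 6 applied per output unit: y = w^T x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    """Standard rectified linear unit, max(0, x), applied element-wise."""
    return [max(0.0, v) for v in x]


def tanh(x):
    """Equation 8: element-wise hyperbolic tangent, bounded in [-1, 1]."""
    return [math.tanh(v) for v in x]


def scaling(x, scale, bias):
    """Equation 9: y = a * x + b, mapping [-1, 1] to the action bounds."""
    return [a * v + b for v, a, b in zip(x, scale, bias)]


def init_layer(n_in, n_out, rng):
    """Randomly initialize one fully connected layer (weights, biases)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def actor_forward(state, layers, scale, bias):
    """FC -> ReLU -> FC -> ReLU -> FC -> tanh -> scaling."""
    h = state
    for (w, b), activation in zip(layers, (relu, relu, tanh)):
        h = activation(fully_connected(h, w, b))
    return scaling(h, scale, bias)


rng = random.Random(0)
layers = [init_layer(6, 16, rng), init_layer(16, 16, rng), init_layer(16, 6, rng)]
# Three power outputs in [-100, 100] and three power factors in [0.5, 1.0].
scale = [100.0] * 3 + [0.25] * 3
bias = [0.0] * 3 + [0.75] * 3
action = actor_forward([0.1] * 6, layers, scale, bias)
```

Because tanh is bounded, the scaling layer guarantees that the raw action stays within the chosen box; the coupling constraints between phases are left to the constraint module.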

The DDPG algorithm may be implemented with the MATLAB Deep Learning Toolbox. The DNN parameters of the critic network and the DNN parameters of the actor network are initialized with random values. The parameters of both networks may be updated as described in the existing reference ("Deep Deterministic Policy Gradient (DDPG) Agents," mathworks.com, https://kr.mathworks.com/help/reinforcement-learning/ug/ddpg-agents.html (accessed Oct. 26, 2022)).

Constraint Module (CM)

The constraint module performs an optimization based on Quadratic Programming (QP). That is, according to its objective function, the constraint module finds the constrained action that is closest to the action output by the learning module and that satisfies the constraints. The output of the constraint module may be a signal for controlling the energy storage system, and may therefore be transmitted to the energy storage system.

1) Constraints

The per-phase active powers of the ESS are constrained, as in Equation 10, to match the scheduled ESS profile.

[Equation 10]

In Equation 10, P_ESS is the ESS profile scheduled one day ahead. That is, the three-phase ESS output is assumed to have been determined by day-ahead scheduling, and the present invention is applied to real-time per-phase control. In other words, Equation 10 expresses the constraint that the three-phase ESS output P_ESS is a fixed constant and that the sum of the per-phase ESS outputs, which are the control variables, must equal it.

In addition, to reflect realistic ESS operation, the outputs of the individual phases of the ESS must have the same sign, as in Equation 11. The sign of each per-phase output must match that of the predetermined three-phase ESS output. That is, if the day-ahead schedule calls for charging (positive), the per-phase outputs must also be positive, and if it calls for discharging (negative), the per-phase outputs must also be negative.

[Equation 11]

The power factor of each phase of the ESS is restricted, as in Equation 12, to account for the voltage, current, and apparent power limits of the inverter. That is, the range of the power factor is constrained in order to determine the range of reactive power that the inverter can output.

[Equation 12]

That is, the power factor of each phase may take a value within a predetermined range: greater than or equal to a (a real number less than 1, for example 0.5) and less than or equal to b (a real number greater than or equal to a and less than or equal to 1, for example 1).
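The three constraints described for Equations 10 to 12 can be summarized as follows. This is a sketch under assumptions: the symbols $P_n$ (per-phase ESS active power) and $pf_n$ (per-phase power factor) are placeholders chosen here, the sign condition is written as a product although the source may express it differently, and $pf_{\min}$ and $pf_{\max}$ stand for the bounds called a and b in the text (exemplary values 0.5 and 1).

```latex
\begin{aligned}
P_a + P_b + P_c &= P_{ESS} \\
P_n \, P_{ESS} &\ge 0, \quad n \in \{a, b, c\} \\
pf_{\min} \le pf_n &\le pf_{\max}, \quad n \in \{a, b, c\}
\end{aligned}
```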

2) QP (Quadratic Programming)

QP may be introduced to constrain the output action of the learning module (LM). The objective function is given by Equation 13.

[Equation 13]

Here, u and u₀ are given by Equations 14 and 15, respectively. That is, Equation 13 is an objective function for finding the action u that is closest to the action u₀ determined by the learning module.

[Equation 14]

[Equation 15]

Here, u is the constrained action derived from u₀, and u₀ is the original action from the learning module (LM), i.e., the action output by the learning module. In addition, P_ESS denotes the three-phase active power output, and the remaining terms denote the per-phase ESS active power outputs (phases a, b, and c) and the per-phase power factors.
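Read together with the description of the constraint module as choosing the action with the minimum squared error, the objective and the two action vectors might take the following form. The per-phase symbols $P_n$ and $pf_n$, the superscript 0 marking the learning module's original values, and the absence of any scaling factor in the objective are assumptions made for this sketch.

```latex
\begin{aligned}
\min_{u} \ & \lVert u - u_0 \rVert_2^2 \\
u &= [\,P_a,\ P_b,\ P_c,\ pf_a,\ pf_b,\ pf_c\,]^{T} \\
u_0 &= [\,P_a^{0},\ P_b^{0},\ P_c^{0},\ pf_a^{0},\ pf_b^{0},\ pf_c^{0}\,]^{T}
\end{aligned}
```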

The constraint module (CM) selects the closest action, i.e., the one with the minimum squared error, that satisfies the constraints. The action satisfies Equations 16 and 17.

[Equation 16]

[Equation 17]

Here, f is a constant-value matrix obtained from the observations, and g is the matrix that is multiplied by the constrained action. In other words, f is the zero-order term and g is the coefficient of the first-order term. The remaining two terms are the lower bound and the upper bound of the action, respectively.
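The idea of choosing the closest feasible action can be illustrated with a small projection routine. Note the substitutions: instead of a general-purpose QP solver with the matrices f and g, which are not reproduced in the text, this sketch handles only the three constraints described in words (per-phase powers sum to P_ESS, per-phase powers share the sign of P_ESS, power factors lie within a range), for which the least-squares solution reduces to a sort-based projection onto a scaled simplex plus clipping. All function names and numeric values are hypothetical.

```python
def project_to_simplex(v, total):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) = total}, total >= 0."""
    if total == 0:
        return [0.0] * len(v)
    ordered = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - total) / k
        if value - candidate > 0:
            theta = candidate  # keep the shift of the largest valid k
    return [max(x - theta, 0.0) for x in v]


def constrain_action(u0, p_ess, pf_min=0.5, pf_max=1.0):
    """Return the action closest to u0 (least squares) that meets the constraints.

    u0 holds three per-phase active powers followed by three power factors.
    """
    p0, pf0 = u0[:3], u0[3:]
    sign = -1.0 if p_ess < 0 else 1.0
    # Flip signs so the discharging case reuses the non-negative projection.
    flipped = project_to_simplex([sign * x for x in p0], abs(p_ess))
    p = [sign * x for x in flipped]
    # The power factor bounds are independent box constraints, so clipping is exact.
    pf = [min(max(x, pf_min), pf_max) for x in pf0]
    return p + pf


# Hypothetical raw action from the learning module, scheduled P_ESS of 90.
u = constrain_action([60.0, 50.0, -10.0, 0.3, 0.8, 1.2], p_ess=90.0)
```

Because the power and power factor parts do not interact in these three constraints, the problem separates and each part can be projected on its own; a full implementation with apparent-power limits would need a proper QP solver.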

The constraint terms of the QP are formulated according to Equations 18 to 21.

[Equation 18]

[Equation 19]

[Equation 20]

[Equation 21]

As shown in FIG. 3, the simulation environment was built on the IEEE 13-node test feeder. The PV was installed at node 680, and the ESS was installed at node 675, where the highest average VUF was observed. In other words, FIG. 3 shows the distribution system that is the monitoring target of the present invention. The distribution system to which the present invention is applied may take various forms; the invention is not limited by the number of buses, the number of lines, the location or number of connected photovoltaic plants (PV), wind power plants (WT), or energy storage systems (ESS), the presence or absence of a low-voltage distribution system, the location of transformers, or the magnitude of the operating voltage.

As shown in FIG. 4, load, PV (photovoltaic generation), and ESS SOC profiles were generated to simulate a variety of scenarios. The randomly generated scenarios are assumed to be the results of day-ahead optimization. One thousand scenarios were generated to train and test the proposed technique. Of these, 700 scenarios were used for offline training to mitigate voltage unbalance, and 300 scenarios were used to test the trained agent in online simulation.

For each phase, load and PV profiles were generated for operation between 0 and 1 pu. Each phase of the load also operates randomly, in order to simulate a variety of unbalanced-voltage conditions. SOC profiles were generated for operation between 10% and 90%.
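The scenario generation and the 700/300 split described above might be set up as in the sketch below. It is a simplification under assumptions: each scenario is reduced to a single snapshot with uniformly drawn values rather than a full time-series profile, and the dictionary keys, function names, and seed are hypothetical.

```python
import random


def make_scenario(rng):
    """Draw one scenario: per-phase load and PV in pu, and ESS SOC in percent."""
    return {
        "load_pu": [rng.uniform(0.0, 1.0) for _ in range(3)],  # phases a, b, c
        "pv_pu": [rng.uniform(0.0, 1.0) for _ in range(3)],
        "soc_pct": rng.uniform(10.0, 90.0),
    }


def make_dataset(n_total=1000, n_train=700, seed=0):
    """Generate n_total scenarios and split them into training and test sets."""
    rng = random.Random(seed)
    scenarios = [make_scenario(rng) for _ in range(n_total)]
    return scenarios[:n_train], scenarios[n_train:]


train_set, test_set = make_dataset()
```

Drawing each load phase independently is what produces the unbalanced conditions the agent has to learn to correct.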

Learning Module (LM) Hyperparameters

As shown in Table 1, the learning module (LM) includes parameters for the DNNs and for the agent training options.

[Table 1]

Voltage Unbalance Mitigation Results

The proposed technique can control both the active power and the reactive power of the ESS. To evaluate its performance, a technique that controls only the active power was run on the same scenarios. In addition, one heuristic optimization algorithm, Particle Swarm Optimization (PSO), was simulated to compare performance and computation time. To satisfy the operating limits, the constraint module (CM) was connected to the PSO.

Fig. 5 shows the performance for the best scenario in the test set (scenario 223). The VUF was calculated at only 6 of the 13 nodes in the simulated distribution system, because the other nodes are single-phase or two-phase nodes. The highest VUF (4.35%) was observed at bus 675 in scenario 223, and it decreased to 3.08% when the proposed technique was applied. The VUFs at the other nodes and in the other scenarios also decreased noticeably.
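A VUF value of the kind reported above can be computed from the three phase-voltage phasors at a node. The following is a minimal sketch, assuming the common definition of VUF as the ratio of negative-sequence to positive-sequence voltage magnitude; the function names and the example phasors are illustrative and are not taken from the embodiment.

```python
import cmath
import math

# 120-degree rotation operator used in the symmetrical-component transform.
A = cmath.exp(2j * math.pi / 3)


def phasor(magnitude, angle_deg):
    """Build a complex phasor from a magnitude and an angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))


def vuf_percent(va, vb, vc):
    """Return |V_negative| / |V_positive| * 100 for three phase phasors."""
    v_pos = (va + A * vb + A**2 * vc) / 3
    v_neg = (va + A**2 * vb + A * vc) / 3
    return abs(v_neg) / abs(v_pos) * 100.0


# Hypothetical example: phase b sags to 0.95 pu while a and c stay at 1.0 pu.
unbalanced = vuf_percent(phasor(1.0, 0), phasor(0.95, -120), phasor(1.0, 120))
print(round(unbalanced, 2))  # 1.69
```

Because the sequence transform needs all three phases, this calculation only applies to three-phase nodes, which is consistent with the VUF being evaluated at 6 of the 13 nodes.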

Fig. 6 compares the average VUF without control against the average VUF of each control model. Nodes 671 and 675 are connected to heavy loads that cause high unbalance, and their neighboring nodes have more unbalanced voltages than nodes 632 and 633. In the no-control case, the same active power and reactive power were output on each phase. The model based on the proposed technique provided equally appropriate control for the test scenarios and the training scenarios. PSO was simulated on the test scenarios for comparison with the other models. PSO showed the lowest VUF, because it optimizes each scenario individually at the cost of a very long computation time. Performance and computation time are shown in Table 2.

[Table 2]

In Table 2, the computation time for training is the total training time over the 700 training scenarios, and the computation time for testing is the average over the 300 test scenarios. The VUF improvement is the difference between the no-control case and each model.
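The improvement metric can be illustrated with a short sketch. The function name is hypothetical, and the example reuses only the single-bus figures quoted for bus 675 in scenario 223, not the averaged values of Table 2.

```python
from statistics import mean


def vuf_improvement(vuf_no_control, vuf_model):
    """Mean VUF without control minus mean VUF with the model (percentage points)."""
    return mean(vuf_no_control) - mean(vuf_model)


# Single-bus example: 4.35% without control, 3.08% with the proposed technique.
print(round(vuf_improvement([4.35], [3.08]), 2))  # 1.27
```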

The device described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those skilled in the art will appreciate that the processing device may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over network-connected computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known to and usable by those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiment, and vice versa.

Although the present invention has been described with reference to the embodiments illustrated in the drawings, these are merely exemplary, and those skilled in the art will understand that various modifications and other equivalent embodiments are possible. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

A voltage imbalance resolution method performed by a voltage imbalance resolution device including at least a processor, the method comprising:
receiving observation data of a distribution system;
predicting an action from the observation data using a trained artificial neural network model; and
calculating a constrained action corresponding to the action.

The voltage imbalance resolution method of claim 1,
wherein the observation data includes a state, and the state includes the per-phase voltage magnitude and phase angle of the three-phase voltage at an arbitrary point in the distribution system.

The voltage imbalance resolution method of claim 2,
wherein predicting the action comprises normalizing the state and then inputting the normalized state into the trained artificial neural network model,
wherein the normalized state is generated through Mathematical Expression 1,
[Mathematical Expression 1]
wherein s is a state, S is the set of states, μ denotes the mean of the states, and σ denotes the standard deviation of the states.

The voltage imbalance resolution method of claim 3,
wherein calculating the constrained action comprises calculating the constrained action through quadratic programming optimization,
wherein the quadratic programming optimization has Mathematical Expression 2 as its objective function,
[Mathematical Expression 2]
wherein u is the constrained action and u₀ is the action output by the trained artificial neural network model.

The voltage imbalance resolution method of claim 4,
wherein calculating the constrained action comprises calculating the constrained action using a first constraint,
wherein the first constraint is expressed by Mathematical Expression 3,
[Mathematical Expression 3]
wherein P_ESS is a predetermined three-phase ESS (Energy Storage System) output, and the remaining symbol in Mathematical Expression 3 denotes the output of the ESS.

The voltage imbalance resolution method of claim 5,
wherein calculating the constrained action comprises calculating the constrained action by further using a second constraint,
wherein the second constraint is expressed by Mathematical Expression 4.
[Mathematical Expression 4]

The voltage imbalance resolution method of claim 6,
wherein calculating the constrained action comprises calculating the constrained action by further using a third constraint,
wherein the third constraint is expressed by Mathematical Expression 5,
[Mathematical Expression 5]
wherein Mathematical Expression 5 includes a term denoting the power factor.