KR20230103790A

KR20230103790A - Adversarial learning-based image correction method and apparatus for deep learning analysis of heterogeneous images

Info

Publication number: KR20230103790A
Application number: KR1020220014250A
Authority: KR
Inventors: 김남규; 김준우
Original assignee: Kookmin University Industry-Academic Cooperation Foundation (국민대학교산학협력단)
Priority date: 2021-12-30
Filing date: 2022-02-03
Publication date: 2023-07-07

Abstract

The present invention relates to an adversarial learning-based image correction method and apparatus for deep learning analysis of heterogeneous images. The method comprises the following steps:

- generating at least one patch image from each original image, based on image sets that each consist of original images from a different domain;
- generating a training dataset by selecting a portion of the patch images generated for each original image in the image sets;
- identifying the RGB label and the source of the original image that corresponds to each training image in the training dataset;
- updating the training dataset by tagging each training image with the RGB label and with a domain label corresponding to the source;
- applying the training images of the training dataset to a feature extraction model to generate image features; and
- building a color constancy model by performing adversarial training that repeats an attribute prediction task and a domain classification task for each training image, using the image features.

The invention therefore operates through the interaction of two components: a domain classifier, which predicts the domain in which an image was captured, and a lighting predictor, which predicts lighting values. It removes domain-specific characteristics by training in a direction that degrades domain classification performance.

Description

ADVERSARIAL LEARNING-BASED IMAGE CORRECTION METHOD AND APPARATUS FOR DEEP LEARNING ANALYSIS OF HETEROGENEOUS IMAGES

The present invention relates to image correction technology. More particularly, it relates to a technique for improving the performance of an image color constancy model through end-to-end adversarial learning that simultaneously uses image data generated in heterogeneous environments.

Recent advances in CNN (Convolutional Neural Network) algorithms have achieved remarkable results across a wide variety of image applications, in both academia and industry. These applications include image classification, object detection, art therapy, and movie rating prediction.

Image data, however, differs from text data. Text is produced according to the consistent conventions of language, whereas the form and structure of image data vary greatly with the shooting environment. As a result, the same information may be expressed as different features in different images. This applies not only to each image's environmental information but even to its image-specific information. That information can then act as noise across images and ultimately degrade the analysis performance of the model.

To solve this problem, researchers have studied methods that improve the consistency of the training data itself, that is, methods that convert images captured in heterogeneous environments so that they appear to have been captured in the same environment. A representative example is color constancy. It removes the heterogeneity of the shooting environment by consistently representing object colors that would otherwise appear different under different lights or illumination. Color constancy is an essential processing step in computer vision applications and can be widely used to produce images that match what human vision perceives.

Color constancy techniques are broadly divided into statistics-based and learning-based approaches.

- Statistics-based techniques can be applied as an image preprocessing step, mainly using Intel's OpenCV library or MathWorks' MATLAB.
- Learning-based techniques began to be studied relatively recently. They are dominated by CNN-based studies and have the advantage of outperforming existing techniques with simple networks.

Learning-based techniques, however, are also limited in generalization: a model trained on images captured in one environment may perform poorly on images captured in other environments.

Korean Patent Registration No. 10-1462440 (2014.11.11)

An embodiment of the present invention aims to provide an adversarial learning-based image correction method and apparatus for deep learning analysis of heterogeneous images. The method and apparatus operate through the interaction of two components:

- a domain classifier, which predicts the domain, that is, the environment in which an image was captured; and
- a lighting predictor, which predicts lighting values.

They remove domain characteristics by training in a direction that degrades domain classification performance.

In one embodiment, an adversarial learning-based image correction method for deep learning analysis of heterogeneous images includes:

- generating at least one patch image from each original image, based on image sets that are each composed of original images of a different domain;
- generating a training dataset by selecting some of the patch images for each original image of the image sets;
- identifying the RGB label and the source of the original image corresponding to each training image of the training dataset;
- updating the training dataset by tagging each training image with the RGB label and with a domain label corresponding to the source;
- generating image features by applying the training images of the training dataset to a feature extraction model; and
- constructing a color constancy model by performing adversarial learning in which an attribute prediction task and a domain classification task are repeated for each training image, using the image features.

Generating the patch image may include the following:

- generating the pixel data of each original image independently for each RGB channel;
- grouping, within the pixel data of each RGB channel, the pixels of an adjacent area of a specific size; and
- computing a pixel value for each RGB channel by performing an operation on the grouped pixels.
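The patch generation steps above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation. It makes two assumptions: the neighbourhoods are non-overlapping k x k blocks, and the "operation" on the grouped pixels is a per-channel mean. The function name `make_patch_grid` is hypothetical.

```python
import numpy as np


def make_patch_grid(image: np.ndarray, k: int) -> np.ndarray:
    """Group each k x k neighbourhood of an H x W x 3 image and reduce it
    to one value per RGB channel.

    Assumption: the grouped pixels are reduced with a mean; the source only
    says that a value is "calculated" from them.
    """
    h, w, c = image.shape
    if c != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    # Crop rows/columns that do not fill a complete k x k neighbourhood.
    h_crop, w_crop = (h // k) * k, (w // k) * k
    img = image[:h_crop, :w_crop].astype(np.float64)
    # Split into (block rows, k, block cols, k, channel). The channel axis is
    # never mixed, so each RGB channel is processed independently.
    blocks = img.reshape(h_crop // k, k, w_crop // k, k, c)
    # Reduce the two within-block axes: one value per block and channel.
    return blocks.mean(axis=(1, 3))
```

For example, a 4 x 4 x 3 image with k = 2 yields a 2 x 2 x 3 array. Each entry is the mean of one channel over one 2 x 2 neighbourhood.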

Generating the patch image may include determining the brightness of the patch image by summing the pixel values of the RGB channels.

Generating the training dataset may include selecting the top n patch images by brightness, where n is a natural number.
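The brightness-based selection described above can be sketched in NumPy as follows. The sketch assumes that a patch's brightness is the sum of its R, G and B values over all of its pixels. It also assumes the candidate patches from one original image are stacked into an array of shape (N, h, w, 3). The function name is hypothetical.

```python
import numpy as np


def select_brightest_patches(patches: np.ndarray, n: int) -> np.ndarray:
    """Return the indices of the n brightest patches, brightest first.

    patches: array of shape (N, h, w, 3) holding N candidate patches cut
    from one original image.
    Assumption: brightness is the sum of the R, G and B pixel values over
    all pixels of the patch.
    """
    if n < 1:
        raise ValueError("n must be a natural number")
    # Sum every pixel value of every channel -> one brightness score per patch.
    brightness = patches.reshape(patches.shape[0], -1).sum(axis=1)
    # argsort is ascending; reverse it so the brightest patch comes first.
    order = np.argsort(brightness)[::-1]
    return order[:n]
```

If n exceeds the number of candidate patches, all of them are returned in brightness order.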

Identifying the RGB label and the source may include:

- defining the RGB value of one of the attributes of the original image as the RGB label; and
- defining a camera characteristic associated with the original image as the source, and determining the domain label according to a domain classification result for that camera characteristic.
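The labeling step above can be pictured with a small data structure that carries both labels for each training patch. This is a minimal sketch under one assumption: each distinct camera (source) forms its own domain and is numbered in order of first appearance. The class name, the function name and the camera identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingSample:
    """One training patch tagged with both labels used in adversarial learning."""
    patch_id: str
    rgb_label: Tuple[float, float, float]  # illuminant RGB of the source image
    domain_label: int                      # integer id of the source camera


def tag_samples(
    patch_ids: Sequence[str],
    rgb_labels: Sequence[Sequence[float]],
    sources: Sequence[str],
) -> Tuple[List[TrainingSample], Dict[str, int]]:
    """Attach the RGB label and a domain label to every training patch.

    Assumption: each distinct camera (source) is its own domain, numbered
    in order of first appearance.
    """
    domain_index: Dict[str, int] = {}
    samples: List[TrainingSample] = []
    for pid, rgb, src in zip(patch_ids, rgb_labels, sources):
        if src not in domain_index:
            domain_index[src] = len(domain_index)
        r, g, b = rgb
        samples.append(
            TrainingSample(pid, (float(r), float(g), float(b)), domain_index[src])
        )
    return samples, domain_index
```

With sources ["cam_A", "cam_B", "cam_A"], for example, the patches receive domain labels 0, 1 and 0.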

Generating the image features may include applying, as the feature extraction model, a CNN (Convolutional Neural Network) model that receives a training image as input and produces the image features as output.

Generating the image features may include producing a multi-dimensional feature vector of a preset size as the image features. This vector is produced by a CNN model that includes a convolutional layer, a max pooling layer and a flatten layer.
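A minimal NumPy sketch of such a feature extractor is shown below. It has one convolutional layer, one 2 x 2 max pooling layer and a flatten step. The layer count, kernel size, ReLU activation and function names are assumptions for illustration only. A practical implementation would use a deep learning framework rather than explicit loops.

```python
import numpy as np


def conv2d(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution (cross-correlation) with stride 1.

    x: (H, W, C_in); kernels: (k, k, C_in, C_out); bias: (C_out,).
    """
    k = kernels.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h_out, w_out, kernels.shape[3]))
    for i in range(h_out):
        for j in range(w_out):
            window = x[i:i + k, j:j + k, :]  # (k, k, C_in)
            # Contract over the k, k, C_in axes -> one value per output channel.
            out[i, j] = np.tensordot(window, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out


def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping size x size max pooling over an (H, W, C) array."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size, x.shape[2]).max(axis=(1, 3))


def extract_features(patch: np.ndarray, kernels: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Convolution -> ReLU -> max pooling -> flatten, giving a fixed-size vector."""
    z = np.maximum(conv2d(patch, kernels, bias), 0.0)  # ReLU is an assumption
    return max_pool2d(z, 2).reshape(-1)  # flatten layer
```

For example, a 32 x 32 x 3 patch and eight 3 x 3 kernels give a 30 x 30 x 8 convolution output. Pooling reduces this to 15 x 15 x 8, so the feature vector has a fixed size of 1,800.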

Constructing the color constancy model may include:

- predicting the attribute of a training image by inputting the image features into an attribute prediction model that includes two fully connected layers; and
- predicting the domain of the training image by inputting the image features into a domain classification model that includes one gradient reversal layer (GRL) and two fully connected layers.
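The two prediction heads described above can be sketched in NumPy as an illustrative forward pass. The source does not state the following details, so they are assumptions:

- a ReLU sits between the two fully connected layers;
- the attribute head produces three regression outputs for the illuminant RGB value;
- the domain head ends in a softmax over domains.

The helper names `init_head`, `attribute_head` and `domain_head` are hypothetical.

```python
import numpy as np


def _relu(x):
    return np.maximum(x, 0.0)


def _softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_head(rng, in_dim, hidden, out_dim):
    """Random weights for a two-layer fully connected head (illustrative init)."""
    return {
        "w1": rng.normal(scale=0.1, size=(in_dim, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.1, size=(hidden, out_dim)), "b2": np.zeros(out_dim),
    }


def attribute_head(features, p):
    """FC -> ReLU -> FC: regresses the three illuminant (RGB) values."""
    return _relu(features @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]


def domain_head(features, p):
    """GRL -> FC -> ReLU -> FC -> softmax over domains.

    The GRL is the identity in the forward pass, so it does not appear here;
    it only changes the gradient during the backward pass.
    """
    logits = _relu(features @ p["w1"] + p["b1"]) @ p["w2"] + p["b2"]
    return _softmax(logits)
```

Both heads read the same feature batch. The attribute head returns one RGB estimate per sample, and the domain head returns one probability vector per sample.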

Constructing the color constancy model may include performing the adversarial learning so that, over the iterations, the loss of the attribute prediction model decreases while the loss of the domain classification model increases.
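The effect of the gradient reversal layer (GRL) on this objective can be illustrated with a minimal NumPy sketch. A GRL is conventionally the identity in the forward pass and multiplies the incoming gradient by -λ in the backward pass. The toy example below uses a hypothetical one-layer logistic domain classifier and hypothetical function names. In it, a gradient-descent step on the shared features raises the domain classification loss instead of lowering it.

```python
import numpy as np


class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, x: np.ndarray) -> np.ndarray:
        return x

    def backward(self, grad_out: np.ndarray) -> np.ndarray:
        return -self.lam * grad_out


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def domain_loss(f, w):
    """Binary cross-entropy of a toy logistic domain classifier for a sample
    whose true domain label is 1."""
    return float(-np.log(_sigmoid(f @ w)))


def domain_loss_grad(f, w):
    """Gradient of domain_loss with respect to the features f."""
    return -(1.0 - _sigmoid(f @ w)) * w


def feature_gradient(grad_attr, grad_domain, grl):
    """Gradient reaching the shared features: the attribute-loss gradient plus
    the domain-loss gradient after it passes backwards through the GRL."""
    return grad_attr + grl.backward(grad_domain)
```

In a complete model, only the gradient flowing into the feature extractor is reversed. The domain classifier's own weights still receive the ordinary gradient, so the classifier keeps trying to identify the domain while the features are pushed to hide it.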

The method may further include:

- predicting the attribute of an input image using the color constancy model; and
- restoring the original image of the input image by removing the predicted attribute from the input image.
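One common way to remove an estimated illuminant is a von Kries-style diagonal correction, sketched below in NumPy. The source does not specify the removal operation, so the following are all assumptions:

- each channel is divided by the corresponding illuminant component;
- the illuminant is normalized so that a neutral light leaves the image unchanged;
- pixel values lie in the range [0, 1];
- the function is named `correct_image`.

```python
import numpy as np


def correct_image(image: np.ndarray, illuminant) -> np.ndarray:
    """Remove an estimated illuminant from an H x W x 3 image with values in [0, 1].

    Assumption: a von Kries-style diagonal model, i.e. each channel is divided
    by the corresponding illuminant component.
    """
    ill = np.asarray(illuminant, dtype=np.float64)
    # Scale the illuminant so that a neutral light (1, 1, 1) leaves the image unchanged.
    ill = ill / np.linalg.norm(ill) * np.sqrt(3.0)
    corrected = image.astype(np.float64) / ill  # broadcasts over the channel axis
    return np.clip(corrected, 0.0, 1.0)
```

Under this model, a gray surface photographed under a colored light becomes gray again once the correct illuminant is divided out.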

The method may further include evaluating the performance of the color constancy model by calculating the angle between the ground-truth value and the predicted value of the input image's attribute.
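The angle between the ground-truth and predicted illuminant vectors is commonly called the angular error in the color constancy literature. A minimal NumPy sketch follows; the function name is hypothetical.

```python
import numpy as np


def angular_error_deg(ground_truth, prediction) -> float:
    """Angle in degrees between the true and predicted illuminant RGB vectors.

    The measure ignores vector length, so only the chromaticity (direction) counts.
    """
    gt = np.asarray(ground_truth, dtype=np.float64)
    pred = np.asarray(prediction, dtype=np.float64)
    cos = gt @ pred / (np.linalg.norm(gt) * np.linalg.norm(pred))
    # Clip to [-1, 1] so rounding cannot push the value outside arccos's domain.
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Because the measure is scale-invariant, predicting (2, 2, 2) for a true illuminant of (1, 1, 1) gives an error of essentially 0 degrees.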

In one embodiment, an adversarial learning-based image correction apparatus for deep learning analysis of heterogeneous images includes:

- a patch image generation unit that generates at least one patch image from each original image, based on image sets that are each composed of original images of a different domain;
- a training dataset generation unit that generates a training dataset by selecting some of the patch images for each original image of the image sets;
- an original image identification unit that identifies the RGB label and the source of the original image corresponding to each training image of the training dataset;
- a training dataset update unit that updates the training dataset by tagging each training image with the RGB label and with a domain label corresponding to the source;
- an image feature generation unit that generates image features by applying the training images of the training dataset to a feature extraction model; and
- a model building unit that constructs a color constancy model by performing adversarial learning in which attribute prediction and domain classification are repeated for each training image, using the image features.

The apparatus may further include:

- an image attribute prediction unit that predicts the attribute of an input image using the color constancy model; and
- an original image restoration unit that restores the original image of the input image by removing the predicted attribute from the input image.

The apparatus may further include a model performance evaluation unit. This unit evaluates the performance of the color constancy model by calculating the angle between the ground-truth value and the predicted value of the input image's attribute.

The disclosed technology may have the following effects. This does not mean, however, that a specific embodiment must include all of these effects or only these effects. The scope of rights of the disclosed technology should therefore not be understood as being limited by them.

The adversarial learning-based image correction method and apparatus for deep learning analysis of heterogeneous images, according to an embodiment of the present invention, operate through the interaction of two components: a domain classifier that predicts the domain in which an image was captured, and a lighting predictor that predicts lighting values. They can remove domain characteristics by training in a direction that degrades domain classification performance.

FIG. 1 is a diagram illustrating an image correction system according to the present invention.
FIG. 2 is a diagram explaining the functional configuration of the image correction device of FIG. 1.
FIG. 3 is a flowchart illustrating the adversarial learning-based image correction method for deep learning analysis of heterogeneous images according to the present invention.
FIG. 4 is a diagram illustrating an image correction process according to the present invention.
FIG. 5 is a diagram illustrating the patch generation and label tagging process according to the present invention.
FIG. 6 is a diagram illustrating the feature extraction and adversarial learning process according to the present invention.
FIG. 7 is a diagram illustrating an embodiment of feature updating in the adversarial learning process according to the present invention.
FIGS. 8 to 11 are diagrams explaining the experiments on the method according to the present invention.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of the present invention should not be construed as limited by the embodiments described in the text. Because the embodiments can be modified in various ways and take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. Likewise, the objects and effects presented in the present invention do not mean that a specific embodiment must include all of them or only them, and the scope of the present invention should not be understood as limited by them.

Meanwhile, the terms used in this application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and these terms should not limit the scope of rights. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it may be directly connected to that component, or other components may exist in between. In contrast, when a component is referred to as being "directly connected" to another component, no intervening components exist. Other expressions describing relationships between components should be interpreted in the same way, for example "between" versus "immediately between", or "adjacent to" versus "directly adjacent to".

Singular expressions should be understood to include plural expressions unless the context clearly dictates otherwise. Terms such as "comprise" or "have" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof. They do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Identification codes attached to steps (e.g., a, b, c) are used for convenience of explanation and do not describe the order of the steps. Unless the context clearly specifies a particular order, the steps may occur in an order different from the one stated. That is, the steps may occur in the stated order, be performed substantially simultaneously, or be performed in reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices that store data readable by a computer system, for example ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the related art. They should not be given ideal or excessively formal meanings unless the present application explicitly defines them so.

Color constancy refers to the compensatory ability of human vision to perceive the color of an object consistently under different lighting conditions. A camera, by contrast, records the light reflected from an object through its lens as the object's color. It therefore captures colors exactly as they appear under the lighting at the time of shooting. Unlike human vision, a camera may consequently fail to capture the intrinsic characteristics of the target object because of the lighting. The research field that addresses this difficulty is called color constancy or white balance, and it can be broadly divided into statistics-based and learning-based approaches. The process may consist of two steps: first estimating the illumination, then correcting the original image using the estimated illumination.

Traditional statistics-based color constancy techniques estimate the illumination from specific statistical assumptions about the image color distribution.

- The Gray-World algorithm assumes that the average color of an image is achromatic. It estimates the illumination by computing the mean of each RGB channel.
- The White-Point (or Max-RGB) algorithm assumes that the brightest points in the RGB channels are produced by perfect reflection of the light source. It estimates the illumination from the color of that reflection.
- The Gray-Edge algorithm assumes that the average of the image edges is achromatic. It estimates the illumination by computing the mean edge color of each RGB channel.
- The relatively recent Grey-Pixel algorithm extracts the set of pixels whose reflectance is closest to achromatic. It estimates the illumination as the average reflectance of those pixels.

Because they rely on relatively simple assumptions drawn from observation, statistics-based techniques have the advantage of being applicable to a single image. They are difficult to apply, however, when their assumptions do not hold, for example when the color distribution is highly diverse or the input image contains no achromatic colors. To overcome these limitations, learning-based techniques that discover rules from the data themselves have recently been preferred.
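The following minimal NumPy sketch illustrates three of the statistical estimators described above: Gray-World, Max-RGB and Gray-Edge. Each returns a unit-length RGB illuminant estimate. The Gray-Edge variant here is a simplification: it uses first-order derivatives with no pre-smoothing and a plain mean of gradient magnitudes. Published variants add Gaussian smoothing and a Minkowski norm. Grey-Pixel is not shown, and the function names are illustrative.

```python
import numpy as np


def _unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)


def gray_world(image: np.ndarray) -> np.ndarray:
    """Average scene color assumed achromatic: illuminant ~ per-channel mean."""
    return _unit(image.reshape(-1, 3).mean(axis=0))


def max_rgb(image: np.ndarray) -> np.ndarray:
    """Brightest response per channel assumed to be a perfect reflection of the light."""
    return _unit(image.reshape(-1, 3).max(axis=0))


def gray_edge(image: np.ndarray) -> np.ndarray:
    """Average edge assumed achromatic: illuminant ~ mean gradient magnitude per channel."""
    gy, gx = np.gradient(image.astype(np.float64), axis=(0, 1))
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return _unit(magnitude.reshape(-1, 3).mean(axis=0))
```

Consider a scene made only of gray surfaces under a colored light. All three assumptions hold for such a scene, so all three estimators recover the direction of the light exactly.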

Learning-based techniques estimate the illumination with a model learned from data. They can be broadly divided into gamut mapping and regression inference.

- Gamut mapping builds a restricted color space, called a gamut, and then projects the color space of a new input image onto that space. It works well when the existing color space resembles that of the new image, but its performance drops when the two differ.
- Regression inference estimates the illumination vector by learning image features directly. Machine learning models such as support vector machines and decision trees are commonly used.

Recently, researchers have actively pursued regression inference using deep learning, aiming to make the most of the overall characteristics of the training data. Earlier studies extracted statistical properties from the entire image. The first study to apply a CNN to color constancy instead achieved strong performance by learning on patches, which let it detect local variations within an image. Various other learning-based methods that improve performance by refining the CNN structure are also being studied.

Adversarial networks, best known as generative adversarial networks (GANs), can be used to generate realistic images from meaningless noise. An adversarial network consists of a generator module and a discriminator module trained with minimax learning. The generator tries to produce fake images that look as real as possible, while the discriminator tries to distinguish fake images from real ones as accurately as possible. Through this process, the generator learns to produce realistic images from meaningless noise. This dual-network structure is widely used for image generation and in other applications such as fake news detection, time series synthesis, natural language processing, and domain adaptation.
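For reference, the minimax learning described above is usually written as the following objective. Here G is the generator, D is the discriminator, x is a real image and z is a noise vector. This is the standard GAN formulation, given only as background and not as part of the claimed method.

$$
\min_{G}\max_{D}\; V(D,G) = \mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
$$

The discriminator maximizes V by assigning high probability to real images and low probability to generated ones. The generator minimizes V by making D(G(z)) large.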

Among these applications, domain adaptation is a subfield of transfer learning. It uses a source domain with a sufficient amount of data to efficiently solve problems in a target domain that has relatively little data or no ground-truth labels. The goal is to improve classification performance on target-domain data while maintaining classification performance on source-domain data. Before training, the features of the source domain and those of the target domain follow different distributions, a phenomenon known as domain shift. The core of domain adaptation is learning that overcomes domain shift by making the feature distributions of the different domains identical. There are two main approaches to domain shift:

- feeding target-domain data into a model trained on the source domain, and making the extracted features match the source domain's feature distribution; and
- training on data from both domains simultaneously, so that only the features common to both domains' distributions are extracted.

FIG. 1 is a diagram illustrating an image correction system according to the present invention.

Referring to FIG. 1, the image correction system 100 may include a user terminal 110, an image correction device 130 and a database 150.

The user terminal 110 may correspond to a computing device that connects to the image correction device 130, provides an image requiring correction, and receives and displays the corrected image. The user terminal 110 may be implemented as a smartphone, a laptop or a computer. It is not necessarily limited to these and may also be implemented as various other devices, such as a tablet PC. The user terminal 110 may be connected to the image correction device 130 through a network, and a plurality of user terminals 110 may be connected to the image correction device 130 simultaneously.

The image correction device 130 may be implemented as a server that performs the adversarial learning-based image correction method for deep learning analysis of heterogeneous images according to the present invention. The server may take the form of a computer or a program. The image correction device 130 may be connected to the user terminal 110 through a wired or wireless network and may exchange data with it.

In one embodiment, the image correction device 130 may operate in conjunction with various external systems (or servers) while performing the adversarial learning-based image correction method for deep learning analysis of heterogeneous images according to the present invention. In this way, the image correction device 130 can access various image data through SNS services, portal sites, blogs and the like. It can thereby collect the data needed to gather training data for adversarial learning and to build the learning model. In addition, the image correction device 130 may restore the original image of an input image and provide it in response to a user's request.

The database 150 may correspond to a storage device that stores various information required for the operation of the image correction device 130. For example, the database 150 may store information about training datasets for learning and evaluation. It may also store learning algorithms and model information for building a color constancy model. The database 150 is not necessarily limited thereto and may store information collected or processed in various forms while the image correction device 130 performs the adversarial learning-based image correction method for deep learning analysis of heterogeneous images according to the present invention.

FIG. 2 is a diagram explaining the functional configuration of the image correction device of FIG. 1.

Referring to FIG. 2, the image correction device 130 may include the following components:
- a patch image generation unit 210;
- a training dataset generation unit 220;
- an original image identification unit 230;
- a training dataset update unit 240;
- an image feature generation unit 250;
- a model building unit 260;
- an original image restoration unit 270;
- a model performance evaluation unit 280;
- a control unit (not shown in FIG. 2).

However, the image correction device 130 according to an embodiment of the present invention need not include all of the above functional components at the same time. Depending on the embodiment, some of the components may be omitted, or some or all of them may be selectively included. In addition, the image correction device 130 may, as needed, be implemented as independent devices that each selectively include some of the above components. The adversarial learning-based image correction method for deep learning analysis of heterogeneous images according to the present invention may then be performed through interworking among those devices. The operation of each component is described in detail below.

The patch image generation unit 210 may generate patch images based on a data set including a plurality of images. Here, a patch image may correspond to a partial image extracted from an original image, and a plurality of patch images may be extracted from one original image as needed. The patch image generation unit 210 may generate patch images according to a preset size and number.

Meanwhile, the data set including a plurality of images may be collected and constructed in advance and may consist of original images classified into different domains. That is, the data set may include a plurality of image sets, and each image set may be composed independently of images classified into a specific domain. In particular, the data set may be generated to include image sets for at least two different domains.

Here, a domain may represent the environment in which an image was captured. For example, a domain may be expressed by the characteristics of the camera that captured the image. That is, when image a is captured by camera A, the domain of image a may be classified as 'camera A' (or 'domain A'). Likewise, when image b is captured by camera B, whose characteristics differ from those of camera A, the domain of image b may be classified as 'camera B' (or 'domain B').

In one embodiment, the patch image generation unit 210 may generate the pixel data of each original image independently for each RGB channel. It may then group the pixels within an adjacent region of a specific size in each channel and compute a pixel value for each RGB channel from the grouped pixels. Each pixel of an image may be expressed by its red (R), green (G), and blue (B) values. Accordingly, the patch image generation unit 210 may generate the RGB channel values for each pixel of the original image and, based on them, generate patch images corresponding to partial images of the original image.

For example, if the original image is 4*4 in size, the patch image generation unit 210 may compute the R, G, and B channel values of each of the 16 pixels and express them as a 4*4*3 array. The patch image generation unit 210 may then define adjacent regions of size 2*2 in each of the R, G, and B channels and group the corresponding adjacent pixels. Each group of adjacent pixels may be combined into one patch image. The pixel value of that patch image may be determined by summing the four pixel values in the region.

In one embodiment, the patch image generation unit 210 may determine the brightness of a patch image by summing its pixel values across the RGB channels. That is, the brightness of a patch image may be calculated as the sum of its R, G, and B pixel values. When a patch image contains a plurality of pixels, its brightness may be determined as the total of the R, G, and B values over all of its pixels.
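The grouping and brightness rules above can be sketched with NumPy as follows. This is a minimal illustration, not the claimed implementation. The function names `make_patches` and `patch_brightness` are hypothetical. The image is assumed to be an `(H, W, 3)` array whose height and width are exact multiples of the block size `p`.

```python
import numpy as np

def make_patches(image: np.ndarray, p: int) -> np.ndarray:
    """Sum each p x p block of adjacent pixels separately for R, G and B.

    image: (H, W, 3) array, with H and W assumed to be multiples of p.
    Returns an (H // p, W // p, 3) array; cell [i, j, c] is the sum of
    channel c over block (i, j), as in the 4*4 -> 2*2 example.
    """
    h, w, c = image.shape
    blocks = image.reshape(h // p, p, w // p, p, c)  # split rows and columns into blocks
    return blocks.sum(axis=(1, 3))                   # collapse the in-block axes

def patch_brightness(patch: np.ndarray) -> float:
    """Brightness = total of the R, G and B values over every pixel in the patch."""
    return patch.sum()

# Toy 4*4 image; only the top-left 2*2 block of the R channel is filled in.
img = np.zeros((4, 4, 3), dtype=int)
img[:2, :2, 0] = [[1, 2], [13, 2]]
print(make_patches(img, 2)[0, 0, 0])             # 18, i.e. 1 + 2 + 13 + 2
print(patch_brightness(np.array([18, 20, 15])))  # 53
```

Using `reshape` followed by a sum over the block axes avoids explicit Python loops. The same two lines work for any block size that divides the image dimensions.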

The training dataset generation unit 220 may generate a training dataset by selecting some of the patch images extracted from each original image of the image sets. Each image set may correspond to a different domain, so the patch images extracted from each image set may also correspond to different domains. The training dataset generated by the training dataset generation unit 220, however, may combine the patch images of the different domains into a single dataset. The training dataset generation unit 220 may build the training dataset from all patch images as they are. Alternatively, it may select only the patch images that satisfy a predetermined condition.

In one embodiment, the training dataset generation unit 220 may select the top n patch images, ranked by brightness, where n is a natural number. For example, when four patch images are generated from one original image, the training dataset generation unit 220 may select only the two brightest patch images and add them to the training dataset. The training dataset generation unit 220 may vary the number of patch images selected from each original image according to the characteristics of the training data, the training conditions, and the like.
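Top-n selection by brightness could look like the sketch below. The helper name `select_brightest` and the `(N, p, p, 3)` patch layout are assumptions. For FIG. 5, `n` would be 2. For the 32*32-patch embodiment described with FIG. 6, `n` would be 100.

```python
import numpy as np

def select_brightest(patches: np.ndarray, n: int) -> np.ndarray:
    """Return the n patches with the largest brightness, brightest first.

    patches: (N, p, p, 3) stack of patches cut from one original image.
    Brightness is the sum of all R, G and B values in a patch.
    """
    brightness = patches.reshape(len(patches), -1).sum(axis=1)
    # Sort by descending brightness; 'stable' keeps the original order on ties.
    order = np.argsort(-brightness, kind="stable")
    return patches[order[:n]]

# Four constant 2*2 patches with brightness 12, 48, 24 and 36.
patches = np.stack([np.full((2, 2, 3), v) for v in (1, 4, 2, 3)])
top2 = select_brightest(patches, 2)
print(top2[:, 0, 0, 0])  # [4 3]
```

Sorting the negated brightness instead of slicing `argsort(...)[-n:]` keeps `n = 0` well-defined, returning an empty selection.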

The original image identification unit 230 may identify the RGB label and the source of the original image corresponding to each training image in the training dataset. Here, the RGB label may be expressed as per-channel RGB values of the image. The source may correspond to the origin of the image, for example the capture environment or the camera characteristics. The original images of the image sets may be created in advance with label information on the RGB label and the source. That is, the original image identification unit 230 may identify the RGB label and the source by referring to the label information of the original image.

If an original image does not include label information, the original image identification unit 230 may determine the RGB label through attribute analysis of the original image. It may then classify the source by predicting the camera characteristics of the original image. Meanwhile, the operation of the original image identification unit 230 may be performed in parallel with the operations of the patch image generation unit 210 and the training dataset generation unit 220.

In one embodiment, the original image identification unit 230 may define the RGB value of one of the attributes of the original image as the RGB label. It may define the camera characteristics associated with the original image as the source, and determine the domain label from the result of classifying the domain based on those camera characteristics. That is, the RGB label of an image may be expressed as the RGB value of a specific attribute of the image. For example, image attributes may include illumination, color, brightness, and saturation. The original image identification unit 230 may designate one of these attributes and define its RGB value as the RGB label. The original image identification unit 230 may also classify the original image into one of the predefined domains according to the characteristics of the camera with which it was captured. In this case, the domains may be defined based on the cameras used to capture the images.

The training dataset update unit 240 may update the training dataset by tagging each training image with two labels: the RGB label identified by the original image identification unit 230 and the domain label corresponding to the source. That is, every training image in the training dataset may consist of image data carrying both an RGB label and a domain label. For example, the first tag of a training image may correspond to the RGB label, and the second tag may correspond to the domain label. Meanwhile, the operation of the training dataset update unit 240 may follow the operation of the original image identification unit 230. It may be performed in parallel with the operations of the patch image generation unit 210 and the training dataset generation unit 220.
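One way to represent a tagged training sample is sketched below. The record layout is hypothetical, including `TrainingSample`, `tag_patches`, and the integer domain indices. Its purpose is to show that every patch inherits both labels from the original image it was cut from.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

@dataclass
class TrainingSample:
    patch: object                    # patch pixels (e.g. a NumPy array); type left open
    rgb_label: Tuple[int, int, int]  # illuminant RGB of the source image
    domain_label: int                # index of the source camera (domain)

def tag_patches(patches: Sequence[object], rgb_label: Sequence[int], camera: str,
                domain_index: Dict[str, int]) -> List[TrainingSample]:
    """Attach the original image's RGB label and domain label to each of its patches."""
    d = domain_index[camera]
    return [TrainingSample(p, tuple(rgb_label), d) for p in patches]

domain_index = {"Domain 1": 0, "Domain 2": 1}
samples = tag_patches(["Patch 1", "Patch 2"], [243, 228, 86], "Domain 1", domain_index)
print(samples[1].rgb_label, samples[1].domain_label)  # (243, 228, 86) 0
```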

The image feature generation unit 250 may generate image features by applying the training images of the training dataset to a feature extraction model. Here, the feature extraction model may correspond to a learning model built in advance. The image feature generation unit 250 may obtain image feature information by applying each training image to this independently built learning model. The image features may correspond to the unique feature information contained in an image. They may be expressed differently depending on the feature extraction method applied to the training images. The image features extracted from the training images may then be used as training data for adversarial learning in a later step.

In one embodiment, the image feature generation unit 250 may use a convolutional neural network (CNN) model as the feature extraction model; the CNN receives a training image as input and outputs image features. In one embodiment, the image feature generation unit 250 may generate a multidimensional feature vector of a preset size as the image features. It does so through a CNN model including a convolutional layer, a max pooling layer, and a flatten layer.

For example, the image feature generation unit 250 may feed each patch image into the CNN model, which performs a convolution operation through one convolutional layer and one max pooling layer. Based on the result of the convolution operation, the CNN model may then output a 3,080-dimensional vector through a flatten layer. The image feature generation unit 250 may use this high-dimensional vector as the image features.
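A forward pass through one convolutional layer, one max pooling layer and a flatten layer can be sketched in NumPy as below. The kernel size (3), filter count (8) and pooling size (2) are assumptions chosen for illustration. The document does not state the hyperparameters behind its 3,080-dimensional output, so this sketch yields a different size: 1,800 for a 32*32*3 patch. The weights here are random, whereas in the described method they are learned during adversarial training.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """'Valid' 2-D convolution (cross-correlation, as in most deep learning libraries).

    x: (H, W, C_in) patch; w: (k, k, C_in, C_out) kernels; b: (C_out,) biases.
    Returns an (H - k + 1, W - k + 1, C_out) feature map.
    """
    k = w.shape[0]
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((oh, ow, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j) at every output position.
            out += x[i:i + oh, j:j + ow, :] @ w[i, j]
    return out + b

def max_pool(x, p=2):
    """Non-overlapping p x p max pooling; rows/columns that do not fill a block are dropped."""
    h, w, c = x.shape
    x = x[: h - h % p, : w - w % p]
    return x.reshape(h // p, p, w // p, p, c).max(axis=(1, 3))

def extract_features(patch, w, b):
    """Convolution -> max pooling -> flatten: one feature vector per patch."""
    return max_pool(conv2d_valid(patch, w, b)).reshape(-1)

rng = np.random.default_rng(0)
patch = rng.random((32, 32, 3))
w = rng.normal(scale=0.1, size=(3, 3, 3, 8))
b = np.zeros(8)
features = extract_features(patch, w, b)
print(features.shape)  # (1800,): 30*30 map -> 15*15 after pooling, times 8 filters
```

The flattened size equals the pooled height times the pooled width times the number of filters. Any reported dimension, such as 3,080, follows from the unstated patch size, kernel size, padding and filter count.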

The model building unit 260 may build a color constancy model by performing adversarial learning. In this learning, an attribute prediction task and a domain classification task are repeated for each training image using the image features. That is, the adversarial learning process may correspond to updating the model parameters for attribute prediction and for domain classification while each of the two tasks is performed on the training images. The adversarial learning process may be repeated until a predetermined learning goal is achieved. At least one of the models built through adversarial learning may be used as a color constancy model that removes the heterogeneity of the capture environment from an image.

In one embodiment, the model building unit 260 may predict the attributes of a training image by inputting its image features into an attribute prediction model including two fully connected layers. The model building unit 260 may also predict the domain of the training image by inputting the image features into a domain classification model including one gradient reversal layer (GRL) and two fully connected layers. The image features generated by the image feature generation unit 250 may be provided as inputs to both models. The attribute prediction model and the domain classification model may each be trained, based on those image features, through the attribute prediction process and the domain classification process, respectively. The specific process of adversarial learning is described in more detail with reference to FIG. 6.
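The two heads that share the image features can be sketched as NumPy forward passes. Several choices here are assumptions, not details from the document: the ReLU between the two fully connected layers, the hidden width of 64, the softmax output, and the 1,800-dimensional features. The GRL does not appear in the forward pass because it is the identity there; it only changes gradients during backpropagation.

```python
import numpy as np

def dense(x, w, b):
    """Fully connected layer: (batch, d_in) @ (d_in, d_out) + (d_out,)."""
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def init_two_layer(rng, d_in, d_hidden, d_out):
    """Small random weights for a two-layer fully connected head."""
    return {"w1": rng.normal(scale=0.01, size=(d_in, d_hidden)), "b1": np.zeros(d_hidden),
            "w2": rng.normal(scale=0.01, size=(d_hidden, d_out)), "b2": np.zeros(d_out)}

def illumination_head(feats, p):
    """Two fully connected layers -> one RGB illuminant estimate per patch."""
    return dense(relu(dense(feats, p["w1"], p["b1"])), p["w2"], p["b2"])

def domain_head(feats, p):
    """(GRL, identity in the forward pass) + two fully connected layers -> domain probabilities."""
    logits = dense(relu(dense(feats, p["w1"], p["b1"])), p["w2"], p["b2"])
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
feats = rng.random((4, 1800))  # 4 patches with hypothetical 1,800-dim features
rgb_pred = illumination_head(feats, init_two_layer(rng, 1800, 64, 3))
dom_prob = domain_head(feats, init_two_layer(rng, 1800, 64, 2))
print(rgb_pred.shape, dom_prob.shape)  # (4, 3) (4, 2)
```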

The original image restoration unit 270 may restore the original image of an input image in two steps. First, it predicts an attribute of the input image with the color constancy model. It then removes the predicted attribute from the input image. For example, the original image restoration unit 270 may predict the illumination attribute of the input image with the color constancy model and remove it from the input image. This restores an original image from which the lighting effect has been removed. Because the color constancy model can effectively predict the RGB values of the illumination regardless of the domain of the input image, the restored image can be close to the true original.
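The document says only that the predicted illumination is "removed" from the input image. A common way to do this, assumed in the sketch below, is a diagonal (von Kries-style) correction that divides each channel by the estimated illuminant. The function name `remove_illuminant`, the linear-RGB input, and the rescaling of the illuminant to mean 1 are all illustrative choices.

```python
import numpy as np

def remove_illuminant(image, illuminant):
    """Diagonal (von Kries-style) correction: divide each channel by the illuminant.

    image: (H, W, 3) array of linear RGB values.
    illuminant: estimated illuminant RGB, e.g. the color constancy model's output.
    The illuminant is rescaled to mean 1 so overall brightness is roughly preserved.
    """
    e = np.asarray(illuminant, dtype=float)
    e = e / e.mean()
    return image / e  # broadcasts over the channel axis

# A neutral grey surface rendered under the FIG. 5 illuminant, then corrected.
illum = [243, 228, 86]
grey = np.full((2, 2, 3), 0.5)
observed = grey * (np.asarray(illum, float) / np.mean(illum))
restored = remove_illuminant(observed, illum)
print(np.allclose(restored, 0.5))  # True
```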

The model performance evaluation unit 280 may evaluate the performance of the color constancy model by calculating the angle between the ground truth and the predicted value of an attribute of the input image. For example, when the illumination of an image is used as the attribute information, the model performance evaluation unit 280 may use the angular error (AE) as the performance metric of the color constancy model. Here, the angular error may be expressed as the angle between the actual illumination and the predicted illumination. The better the model performs, the smaller the angular error.
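The angular error is the angle, in degrees, between the ground-truth and predicted illuminant RGB vectors, that is, the arccosine of their normalized dot product. The sketch below computes it. The function name `angular_error` is hypothetical.

```python
import numpy as np

def angular_error(gt_rgb, pred_rgb):
    """Angle in degrees between ground-truth and predicted illuminant RGB vectors."""
    gt = np.asarray(gt_rgb, dtype=float)
    pred = np.asarray(pred_rgb, dtype=float)
    cos = gt @ pred / (np.linalg.norm(gt) * np.linalg.norm(pred))
    # Clip to guard against tiny floating-point overshoot outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

print(round(angular_error([1, 1, 0], [1, 0, 0]), 6))  # 45.0
```

Because only the direction of the vectors matters, a prediction that is a scaled copy of the true illuminant, such as `[486, 456, 172]` against `[243, 228, 86]`, has an error of essentially zero. The metric therefore ignores overall brightness and measures only color cast.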

The control unit (not shown in FIG. 2) may control the overall operation of the image correction device 130. It may also manage the control flow or data flow among the patch image generation unit 210, the training dataset generation unit 220, the original image identification unit 230, the training dataset update unit 240, the image feature generation unit 250, the model building unit 260, the original image restoration unit 270, and the model performance evaluation unit 280.

FIG. 3 is a flowchart illustrating the adversarial learning-based image correction method for deep learning analysis of heterogeneous images according to the present invention. FIG. 4 is a diagram illustrating the image correction process according to the present invention.

Referring to FIGS. 3 and 4, in Phase 1 of FIG. 4, the image correction device 130 may extract patches from the captured images through the patch image generation unit 210 (S310, ①). It may then generate a training dataset by selecting only the necessary patches through the training dataset generation unit 220 (S320, ②). At the same time, the image correction device 130 may identify the RGB label and source of each original image through the original image identification unit 230 (S330, ③). It may then update the training dataset by tagging each patch with its RGB label and domain value through the training dataset update unit 240 (S340, ④).

In Phase 2 of FIG. 4, the image correction device 130 may extract image features with a CNN from the updated training dataset through the image feature generation unit 250 (S350, ⑤). Finally, in the adversarial learning step, the image correction device 130 may use the extracted image features, through the model building unit 260, to perform two tasks simultaneously: illumination prediction (A) and domain classification (B) (S360, ⑥). Through this overall process, a color constancy model with high generalization performance can be built.

FIG. 5 is a diagram illustrating the patch generation and label tagging process according to the present invention.

Referring to FIG. 5, the upper left corresponds to the original image 510. The first tag 520, 'Domain 1', may indicate the source of the image, that is, which camera captured it. The second tag 530, 'RGB: [243, 228, 86]', may indicate the conditions under which the image was captured, that is, the R, G, and B values of the illumination at the time of capture.

In FIG. 5, 'Image 1', the original image 510, is a 4*4 image consisting of 16 pixels. Each pixel may have a value for each of red (R), green (G), and blue (B), so the information of Image 1 may be represented as a 4*4*3 array, as shown in the lower left. Next, patches may be created by grouping adjacent pixels in each of the R, G, and B channels; here, each patch groups a 2*2 block, i.e., four pixels. As a result, the information of the original image 510 may be represented as a 2*2*3 array, as shown in the lower right. The number in each cell is the sum of the values of the pixels grouped into that cell.

For example, the value 18 in the first cell of the lower-right figure is calculated by adding 1, 2, 13, and 2 from the lower-left figure. The lower-right figure shows only the R values of the four patches; each patch also has G and B values.

The next step calculates the brightness of each patch. Although this step is omitted from FIG. 5, the total brightness of a patch can be calculated by adding its R, G, and B values. For example, if the first patch in the lower right has R=18, G=20, and B=15, its brightness is 53, the sum of its pixel intensity values over the RGB channels. Here, it is assumed that the brightest patches are selected and used as training data, because they are the data with the greatest influence on illumination prediction. The upper right shows the result of selecting, under this criterion, the two patches with the largest sum of RGB pixel intensities.

Meanwhile, the selected 'Patch 1' (540) and 'Patch 2' were both created from 'Image 1'. Both patches therefore inherit the labels of 'Image 1' and may carry the label values 'RGB: [243, 228, 86]' and 'Domain 1' as tags. These tags indicate that both patches were captured with the 'Domain 1' camera and that the illumination at the time of capture was 'RGB: [243, 228, 86]'. These two patches and their label values may be used as input data for adversarial learning in a later step.

FIG. 6 is a diagram illustrating the feature extraction and adversarial learning process according to the present invention.

Referring to FIG. 6, an embodiment of the feature extraction and adversarial learning process performed by the image correction device 130 is shown. The four patches on the left may consist of two patches each from 'Domain 1' and 'Domain 2'. The CNN 610 in the center of FIG. 6 receives each of these patches as input and performs a convolution operation through one convolutional layer and one max pooling layer. It then extracts a 3,080-dimensional vector through a flatten layer. The extracted feature vectors may be used as inputs to the domain discriminator 620 at the upper right and the illumination estimator 630 at the lower right.

The illumination estimator 630 may be configured by stacking two fully connected layers and may be trained to predict the values corresponding to the RGB labels. In FIG. 6, each patch is tagged with the illumination at the time of capture: 'RGB: [a, b, c]', 'RGB: [d, e, f]', 'RGB: [g, h, i]', and 'RGB: [j, k, l]'. These values may be the prediction targets. In one embodiment, each image may be divided into 32*32 patches, and only the 100 patches with the highest brightness may be used for training. Through this repeated training, the high-dimensional vectors extracted by the CNN 610 may be updated to predict the RGB illumination values.

The domain discriminator 620 may consist of one GRL (gradient reversal layer) [5] and two fully connected layers. It may be trained to determine which source each patch came from by predicting its domain label, 'Domain 1' or 'Domain 2'. Unlike in a typical deep learning model, however, this branch drives the training in the direction that increases the loss between the correct answer and the prediction. That is, the parameters may be updated so that the correct 'Domain Label' can no longer be distinguished well. As a result, characteristics that cluster strongly by domain can be removed from image data generated in heterogeneous environments. This is implemented through the GRL, the key element of the adversarial learning. During backpropagation, the GRL multiplies the gradient it receives by a negative value between -1 and 0 before passing it back toward the CNN 610.
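The GRL behavior can be made concrete with a minimal NumPy sketch using hand-written forward and backward methods. The class name `GradientReversal`, the coefficient `lam`, and the gradient values are illustrative assumptions. With `lam` in (0, 1], the backward multiplier `-lam` lies in the range between -1 and 0 described above.

```python
import numpy as np

class GradientReversal:
    """Gradient reversal layer: identity forward, negated and scaled gradient backward."""

    def __init__(self, lam: float):
        # lam in (0, 1] gives a backward multiplier -lam in [-1, 0).
        self.lam = lam

    def forward(self, x):
        return x  # features reach the domain discriminator unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out  # reversed gradient sent back to the feature extractor

grl = GradientReversal(lam=0.5)
feat = np.array([1.0, 2.0, 3.0])
out = grl.forward(feat)
# Made-up gradients of each loss with respect to the shared feature vector.
g_illum = np.array([0.1, 0.0, -0.2])   # from the illumination estimator's loss
g_domain = np.array([0.4, -0.4, 0.2])  # from the domain discriminator's loss, at the GRL output
g_feat = g_illum + grl.backward(g_domain)  # total gradient reaching the CNN features
```

A descent step along `-g_feat` on the CNN lowers the illumination loss while raising the domain loss. The discriminator's own fully connected layers sit after the GRL, so they still receive the un-reversed gradient and keep trying to tell domains apart. In an automatic-differentiation framework, the same layer would typically be written as a custom-gradient function.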

Because the illumination estimator 630 and the domain discriminator 620 interact simultaneously during adversarial learning, the resulting model can accurately predict the RGB values of the illumination regardless of the domain of the input data.
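For reference, GRL-based adversarial training of the kind cited as [5] is commonly written as the saddle-point objective below. The symbols are introduced here only for illustration and are not taken from this document:
- $G_f$, $G_y$ and $G_d$ are the CNN 610, the illumination estimator 630 and the domain discriminator 620;
- $y_i$ and $d_i$ are the RGB label and the domain label of patch $x_i$;
- $L_y$ and $L_d$ are the illumination and domain losses, whose exact forms are not specified here;
- $\lambda$ is the GRL coefficient.

$$
E(\theta_f,\theta_y,\theta_d)=\frac{1}{N}\sum_{i=1}^{N}L_y\!\left(G_y\big(G_f(x_i;\theta_f);\theta_y\big),\,y_i\right)-\lambda\,\frac{1}{N}\sum_{i=1}^{N}L_d\!\left(G_d\big(G_f(x_i;\theta_f);\theta_d\big),\,d_i\right)
$$

$$
(\hat\theta_f,\hat\theta_y)=\arg\min_{\theta_f,\theta_y}E(\theta_f,\theta_y,\hat\theta_d),\qquad \hat\theta_d=\arg\max_{\theta_d}E(\hat\theta_f,\hat\theta_y,\theta_d)
$$

Maximizing $E$ over $\theta_d$ is the same as minimizing $L_d$, so the discriminator still learns to classify domains. Meanwhile, the $-\lambda$ term pushes $\theta_f$ toward features on which the discriminator fails. Inserting the GRL lets both updates happen in a single backward pass with ordinary gradient descent.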

FIG. 7 is a diagram illustrating an embodiment of feature updates during the adversarial learning process according to the present invention.

Referring to FIG. 7, the features before training (Before feature) 710 show the domain-specific characteristics of the original images as clearly separated green and blue groups. As adversarial learning progresses, these characteristics fade, and the domain-specific characteristics are no longer visible in the features after training (After feature) 730. That is, the after-training features 730 can express only the characteristics inherent to the image, excluding domain characteristics. Restoring an image based on these features can therefore yield a restored image close to the actual image. As a result, a generalizable color constancy model can be derived through the adversarial learning according to the present invention.

FIGS. 8 to 11 are diagrams explaining experiments on the method according to the present invention.

This section describes experiments on the image correction method according to the present invention. The experiments evaluate a color constancy model whose generalization performance has been improved through adversarial learning on images captured in heterogeneous environments.

FIG. 9(a) shows the characteristics of the benchmark dataset used in the experiments. The dataset contains 7,022 images in total, captured with three cameras:

- 4,902 images captured with the cameras of companies 'C' and 'N' were used for model training.
- 2,120 images captured with the camera of company 'S', which were not used in training, were used for inference.

If a model trained on the 'C' and 'N' camera images also performs well on the 'S' camera images, it can be regarded as having high generalization capability.
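The camera-held-out protocol above can be sketched as follows, assuming each image record carries a `camera` field. The record layout, the function name `split_by_camera` and the toy records are hypothetical. Training cameras also receive integer domain labels, which the domain classifier is later trained on: 'Domain 1' and 'Domain 2' in the description become 0 and 1 here.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def split_by_camera(records: Iterable[Record], train_cameras: List[str],
                    test_cameras: List[str]) -> Tuple[List[Record], List[Record]]:
    """Hold out whole cameras: no test camera may contribute training images."""
    if set(train_cameras) & set(test_cameras):
        raise ValueError("train and test cameras must be disjoint")
    domain_label = {cam: i for i, cam in enumerate(train_cameras)}  # e.g. 'C' -> 0, 'N' -> 1
    train: List[Record] = []
    test: List[Record] = []
    for rec in records:
        cam = rec["camera"]
        if cam in domain_label:
            train.append({**rec, "domain": domain_label[cam]})  # tag with domain label
        elif cam in test_cameras:
            test.append(dict(rec))  # test images need no domain label
    return train, test


if __name__ == "__main__":
    toy = [{"id": 1, "camera": "C"}, {"id": 2, "camera": "N"},
           {"id": 3, "camera": "S"}, {"id": 4, "camera": "C"}]
    train, test = split_by_camera(toy, ["C", "N"], ["S"])
    print([(r["id"], r["domain"]) for r in train])  # [(1, 0), (2, 1), (4, 0)]
    print([r["id"] for r in test])                  # [3]
```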

Angular error, widely used as an evaluation metric for color constancy, can be used to evaluate model performance. As given in Equation 1, it is the angle between the RGB value of the ground-truth illuminant, $\rho_{gt}$, and the predicted RGB value, $\rho_{est}$. The better the model, the smaller the value.

[Equation 1]

$$\mathrm{err}_{\angle}(\rho_{gt}, \rho_{est}) = \cos^{-1}\left(\frac{\rho_{gt} \cdot \rho_{est}}{\lVert \rho_{gt} \rVert \, \lVert \rho_{est} \rVert}\right)$$
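The angular error, the arccosine of the normalized dot product between the ground-truth and predicted illuminant RGB vectors, translates directly into code. Below is a minimal sketch in plain Python that returns degrees, the unit in which angular error is conventionally reported. The function name is hypothetical, and the cosine is clamped to [-1, 1] to guard against floating-point round-off.

```python
import math
from typing import Sequence


def angular_error(gt: Sequence[float], est: Sequence[float]) -> float:
    """Angle in degrees between ground-truth and estimated illuminant RGB vectors."""
    dot = sum(a * b for a, b in zip(gt, est))
    norm = math.sqrt(sum(a * a for a in gt)) * math.sqrt(sum(b * b for b in est))
    if norm == 0.0:
        raise ValueError("illuminant vectors must be non-zero")
    cos = max(-1.0, min(1.0, dot / norm))  # clamp round-off outside [-1, 1]
    return math.degrees(math.acos(cos))
```

Only the direction of the vectors matters, so the metric ignores overall brightness. For example, `angular_error((1, 2, 3), (2, 4, 6))` is 0 up to round-off.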

The experiments compare comparison models (A), (B) and (C) with model (D) according to the present invention. FIG. 8 outlines the overall experimental setup:

- Comparison models (A) and (B) are trained using only the images from company 'C' and company 'N', respectively. Training uses the patch-based CNN structure that serves as the 'feature extractor' of the method according to the present invention.
- Comparison model (C) uses images from both 'C' and 'N' for training, but simply merges the two image sets without using domain information.
- Model (D) is the model according to the present invention, whose generalization performance is improved through adversarial learning.

To evaluate these four models, each one performs inference on the images from company 'S', and the angular errors of their inference results are compared.

The 'Patch-based CNN Training' block in FIG. 8 follows the basic structure, except that a batch normalization layer may be inserted between the convolutional layer and the max pooling layer to improve generalization performance. The hyperparameters may be set to a batch size of 64 and 10 epochs. The hardware and software environment used for the experiments is shown in FIG. 9(b).

In these experiments, both the comparison models and the model according to the present invention are trained on patches. At inference, each patch-level input likewise predicts the illuminant RGB value of its own original image. Performance can then be evaluated in two ways:

- computing the angular error between ground truth and prediction for each patch (Each Patch); or
- aggregating the patches and computing the angular error per image.

Patches are aggregated by pooling the predicted RGB values of the 100 brightest patches of each image, typically with average pooling or median pooling. FIG. 10 shows the mean angular error of each of the four models under these three evaluation methods, and FIG. 11 plots the results.
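The image-level evaluation above can be sketched under the following assumptions:

- Each patch is represented by its mean RGB pixel value and its predicted illuminant RGB.
- Brightness is the sum of a patch's RGB values, as in claim 3.
- `aggregate_patches`, with its `top_k` and `method` parameters, is a hypothetical helper, not the authors' code.

The resulting per-image estimate would then be scored with the angular error of Equation 1.

```python
import statistics
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def aggregate_patches(patches: Sequence[Tuple[RGB, RGB]], top_k: int = 100,
                      method: str = "median") -> RGB:
    """Pool the predicted illuminants of the top_k brightest patches of one image.

    Each patch is a pair (patch_rgb, predicted_illuminant_rgb).
    """
    if method not in ("mean", "median"):
        raise ValueError("method must be 'mean' or 'median'")
    if not patches:
        raise ValueError("need at least one patch")
    # Brightness of a patch = sum of its R, G and B values; keep the brightest top_k.
    ranked = sorted(patches, key=lambda p: sum(p[0]), reverse=True)[:top_k]
    pool = statistics.mean if method == "mean" else statistics.median
    # Pool each channel of the predicted illuminants independently.
    return tuple(pool([pred[c] for _, pred in ranked]) for c in range(3))
```

Median pooling is the more robust of the two choices when a few bright patches produce outlying predictions.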

As FIGS. 10 and 11 show, the unit of angular-error computation (per patch versus per image) made little difference, and the method according to the present invention achieved the lowest angular error under all three methods. This indicates that the method produced a model with superior generalization performance by removing features with high domain coherence through adversarial learning.

The large performance gap between comparison models (A) and (B), trained only on 'C' and 'N' images respectively, can be attributed to the characteristics of the image sources. The 'N' images used for training have a slightly bluish cast, and the 'S' images used for inference have a slightly blue-green cast. This accidental domain similarity between the two datasets likely caused the performance of model (B) to be overestimated.

The adversarial learning-based image correction method for deep learning analysis of heterogeneous images according to the present invention offers a way to make effective use of large volumes of image data in deep learning for computer vision.

Images captured in heterogeneous environments may carry the same information, yet their features can be expressed differently depending on the capture environment. Not only the differing environmental information but even the intrinsic information of each image may then be represented by different features. This information then acts as mutual noise and degrades the model's analytical performance.

Because the method improves the color constancy of images generated in heterogeneous environments through adversarial learning, it can be widely used in deep learning applications that aim to improve generalization performance.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it can be variously modified and changed without departing from the spirit and scope of the present invention as set forth in the claims below.

100: image correction system
110: user terminal 130: image correction apparatus
150: database
210: patch image generation unit 220: training dataset generation unit
230: original image identification unit 240: training dataset update unit
250: image feature generation unit 260: model building unit
270: original image restoration unit 280: model performance evaluation unit

Claims

1. An adversarial learning-based image correction method for deep learning analysis of heterogeneous images, comprising:
generating at least one patch image from each original image based on image sets, each composed of original images of different domains;
generating a training dataset by selecting a portion of the at least one patch image for each original image of the image sets;
identifying an RGB label and a source of the original image corresponding to each training image of the training dataset;
updating the training dataset by tagging each training image with the RGB label and a domain label corresponding to the source;
generating image features by applying the training images of the training dataset to a feature extraction model; and
constructing a color constancy model by performing adversarial learning that repeats an attribute prediction task and a domain classification task for each training image using the image features.

2. The method of claim 1, wherein generating the patch image comprises:
independently generating pixel data of each original image for each RGB channel;
grouping, in the pixel data of each RGB channel, the pixels of an adjacent area defined with a specific size; and
calculating a pixel value for each RGB channel by performing an operation on the grouped pixels.

3. The method of claim 2, wherein generating the patch image comprises determining the brightness of the patch image by summing the pixel values for the RGB channels.

4. The method of claim 3, wherein generating the training dataset comprises selecting the top n patch images (where n is a natural number) based on the brightness of the patch images.

5. The method of claim 1, wherein identifying the RGB label and the source comprises:
defining an RGB value among the attributes of the original image as the RGB label; and
defining a camera characteristic associated with the original image as the source, and determining the domain label according to a result of domain classification on the camera characteristic.

6. The method of claim 1, wherein generating the image features comprises applying, as the feature extraction model, a convolutional neural network (CNN) model that receives the training image as input and outputs the image features.

7. The method of claim 6, wherein generating the image features comprises generating, as the image features, a multi-dimensional feature vector of a preset size through a CNN model including a convolutional layer, a max pooling layer, and a flatten layer.

8. The method of claim 1, wherein constructing the color constancy model comprises:
predicting attributes of a training image by inputting the image features into an attribute prediction model including two fully connected layers; and
predicting the domain of the training image by inputting the image features into a domain classification model including one gradient reversal layer (GRL) and two fully connected layers.

9. The method of claim 8, wherein constructing the color constancy model comprises performing the adversarial learning, during the repetition, in a direction that decreases the loss of the attribute prediction model while increasing the loss of the domain classification model.

10. The method of claim 1, further comprising:
predicting attributes of an input image using the color constancy model; and
restoring the original image of the input image by removing the predicted attributes from the input image.

11. The method of claim 10, further comprising evaluating the performance of the color constancy model by calculating the angle between the correct answer and the predicted value for the attribute of the input image.

12. An adversarial learning-based image correction apparatus for deep learning analysis of heterogeneous images, comprising:
a patch image generation unit configured to generate at least one patch image from each original image based on image sets, each composed of original images of different domains;
a training dataset generation unit configured to generate a training dataset by selecting a portion of the at least one patch image for each original image of the image sets;
an original image identification unit configured to identify an RGB label and a source of the original image corresponding to each training image of the training dataset;
a training dataset update unit configured to update the training dataset by tagging each training image with the RGB label and a domain label corresponding to the source;
an image feature generation unit configured to generate image features by applying the training images of the training dataset to a feature extraction model; and
a model building unit configured to construct a color constancy model by performing adversarial learning that repeats attribute prediction and domain classification for each training image using the image features.

13. The apparatus of claim 12, further comprising an original image restoration unit configured to predict an attribute of an input image using the color constancy model and to restore the original image of the input image by removing the predicted attribute from the input image.

14. The apparatus of claim 13, further comprising a model performance evaluation unit configured to evaluate the performance of the color constancy model by calculating the angle between the correct answer and the predicted value of the attribute of the input image.