KR102475177B1 - Method and apparatus for processing image

Info

Publication number: KR102475177B1
Application number: KR1020200145931A
Authority: KR
Inventors: 박종빈; 정종진; 박성주; 김경원
Original assignee: 한국전자기술연구원
Priority date: 2020-11-04
Filing date: 2020-11-04
Publication date: 2022-12-07
Also published as: KR20220060210A

Abstract

The present invention relates to an image processing method and apparatus. An image processing method according to an embodiment of the present invention includes: a generating step of performing nonlinear data dimensionality reduction on each of query image information (q) and N pieces of search target image information (s₁ to s_N, where N is a natural number of 2 or more; hereinafter, s₁ to s_N are referred to as s_i), to generate q' and s_i', which represent the attributes of the corresponding image with at least one element; and a comparison step of indirectly comparing each s_i with q using the dimensionally reduced q' and s_i'.

Description

Image processing method and apparatus {METHOD AND APPARATUS FOR PROCESSING IMAGE}

The present invention relates to an image processing method and apparatus, and more particularly, to an image processing method and apparatus capable of mutually comparing given pieces of image information through preprocessing techniques such as support for semantic variable control.

Among various kinds of image processing, a technique for comparing the mutual similarity between a plurality of images (hereinafter referred to as “image comparison processing”) is sometimes required. That is, image comparison processing is a technique that, when image information (q) corresponding to a query is input, compares the mutual similarity between the query image information (q) and each element of a set (S) corresponding to a search space or gallery (where S={s₁, s₂, …s_i, …s_N}, i∈{1, 2, …N}, and N is a natural number of 2 or more). Hereinafter, S is referred to as the “search target set”, and the elements s₁ to s_N of the search target set (S) are referred to as “search target image information”.

Such image comparison processing occurs frequently in data search, comparison, learning, and the like. For example, when query image information (q) is given, the similarity between q and the i-th element (s_i) of the search target set (S) can be expressed as “similarity(q, s_i)” (where i∈{1, 2, …N}). If similarity(q, s_i) can be calculated, many applications need a process that calculates the similarity for each of the N elements of the search target set (S) and then obtains the subset of S that satisfies some condition (for example, the similarity is the largest, is the smallest, falls within a threshold range, or falls outside a threshold range).
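The selection process described above can be sketched as follows. This is a minimal illustration, not the patented method: the similarity function (negative squared distance), the function names, and the 0-based indexing are all assumptions introduced here, since the text leaves similarity(q, s_i) unspecified.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def neg_squared_distance(a: Vector, b: Vector) -> float:
    """Stand-in similarity (an assumption): values closer to 0 mean more similar."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def score_all(q: Vector, S: Sequence[Vector],
              similarity: Callable[[Vector, Vector], float]) -> List[Tuple[int, float]]:
    """Compute similarity(q, s_i) for every element of S (index i is 0-based here)."""
    return [(i, similarity(q, s)) for i, s in enumerate(S)]


def most_similar(scores: List[Tuple[int, float]]) -> int:
    """Index of the element with the largest similarity."""
    return max(scores, key=lambda pair: pair[1])[0]


def within_range(scores: List[Tuple[int, float]], low: float, high: float) -> List[int]:
    """Indices of the elements whose similarity lies in [low, high]."""
    return [i for i, value in scores if low <= value <= high]
```

Swapping `max` for `min`, or negating the range test, would give the other conditions the text mentions (smallest similarity, outside a threshold range).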

<Problem when the query image information (q) is corrupted>

In practice, however, the query image information (q) or an element of the search target set (S) may be occluded by some object, contain noise, be damaged, or have been captured at too small a size. Cases in which an image is contaminated or carries insufficient information for such reasons occur frequently, and conventionally a sound comparison between the query image information (q) and the elements of the search target set (S) has been difficult in these cases.

<Problem of difficulty in semantic-level comparison>

In addition, comparison metrics such as MSE (Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio) have conventionally been used to compare q with S={s₁, s₂, …s_i, …s_N}. These metrics are effective for judging how closely pixel values match, but they are not designed to compare images in a way that reflects higher-level object characteristics. To address this, metrics such as SSIM (Structural Similarity Index Measure) (hereinafter referred to as “improved metrics”) were designed to reflect the structural similarity present in an image. Even these improved metrics, however, do not go as far as the semantic (meaning) level of comparison that people ordinarily have in mind.
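For reference, the two pixel-level metrics named above can be sketched as below using their standard definitions. The flat-list image representation and the default peak value of 255 are assumptions made for the illustration.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[float], b: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)
```

Both functions look only at per-pixel differences: the same stripe pattern shifted by one pixel, such as `[0, 255, 0, 255]` against `[255, 0, 255, 0]`, scores as maximally different, which is the limitation the paragraph above describes.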

Here, “semantic-level comparison” means a comparative analysis between meanings at a much higher level than a simple pixel-to-pixel comparison, such as 'the relationship between pink clothes and red clothes' or 'the relationship between pink clothes and pink flowers'. Such semantic-level comparison could conceivably be performed using various conventional techniques. However, that kind of comparison processing typically requires a large amount of computation, which leads to problems such as slow processing, high energy consumption, and inefficient use of computing resources.

KR 10-2013-0108425 A

To solve the problems of the prior art described above, an object of the present invention is to provide an image comparison processing technique between a query input image and a search target set, and in particular a new kind of image comparison processing technique that enables semantic-level variable control by actively and selectively changing high-level feature elements present in an image.

Another object of the present invention is to provide an image comparison processing technique that can perform its computation efficiently.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

To solve the above problems, an image processing method according to an embodiment of the present invention includes: a generating step of performing nonlinear data dimensionality reduction on each of query image information (q) and N pieces of search target image information (s₁ to s_N, where N is a natural number of 2 or more; hereinafter, s₁ to s_N are referred to as s_i), to generate q' and s_i', which represent the attributes of the corresponding image with at least one element; and a comparison step of indirectly comparing each s_i with q using the dimensionally reduced q' and s_i'.

The q' and s_i' may be generated using a model trained in advance with a machine learning technique.

The model may include an input layer into which input data is fed, an output layer from which output data is produced, and at least one hidden layer that connects the input layer and the output layer and has fewer nodes than either of them.

The q' and s_i' may be generated based on the at least one hidden layer whose node values are set by inputting q and s_i, respectively, into the model.
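The bottleneck structure described above resembles an autoencoder, and reading q' off the hidden layer can be sketched as follows. This is only an illustration under several assumptions: the class and function names are hypothetical, there is a single hidden layer, tanh is an arbitrary choice of nonlinearity, and the weights are random rather than pre-trained as the text requires.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random (untrained) weights; a real model would load trained weights instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights: Matrix, x: List[float]) -> List[float]:
    """Weighted sum followed by tanh, which makes the mapping nonlinear."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class BottleneckModel:
    """Input layer -> narrower hidden layer -> output layer of the input size."""

    def __init__(self, n_input: int, n_hidden: int, seed: int = 0) -> None:
        if n_hidden >= n_input:
            raise ValueError("the hidden layer must have fewer nodes than the input layer")
        rng = random.Random(seed)
        self.encoder = make_layer(n_input, n_hidden, rng)
        self.decoder = make_layer(n_hidden, n_input, rng)

    def encode(self, x: List[float]) -> List[float]:
        """Hidden-layer node values, playing the role of q' or s_i'."""
        return apply_layer(self.encoder, x)

    def decode(self, z: List[float]) -> List[float]:
        """Map a reduced representation back to the input dimension."""
        return apply_layer(self.decoder, z)
```

Calling `encode` on q and on each s_i with the same model instance would yield representations that live in the same reduced space and can therefore be compared with each other.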

The comparison step may compare q' and s_i' directly, or may compare the elements of q' and s_i' that correspond to at least one attribute.

The comparison step may include additionally processing at least one of q' and s_i' within the reduced dimension and then performing the comparison.

The additional processing may include changing the value of an element of q' or s_i' that relates to at least one attribute.

The image processing method according to an embodiment of the present invention may further include an extraction step of extracting information about the attributes of q and s_i.

The comparison step may include additionally processing at least one of q' and s_i' within the reduced dimension using the information extracted in the extraction step.

The additional processing may include removing from q' at least one attribute that is contaminated in q, or removing from the corresponding s_i' at least one attribute that is contaminated in any one s_i.

The additional processing may include injecting into each s_i' at least one attribute that is contaminated in q.
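The removal and injection operations described above can be sketched as edits to individual elements of the reduced representations. The sketch assumes, purely for illustration, that a given attribute corresponds to one known index of the vector and that a "neutral" value of 0.0 stands for the attribute being absent; neither assumption comes from the text, and both function names are hypothetical.

```python
from typing import List

Vector = List[float]


def remove_attribute(code: Vector, index: int, neutral: float = 0.0) -> Vector:
    """Return a copy of a reduced vector with one attribute element neutralized."""
    edited = list(code)
    edited[index] = neutral
    return edited


def inject_attribute(q_code: Vector, s_codes: List[Vector], index: int) -> List[Vector]:
    """Return copies of every s_i' whose element at `index` is overwritten with q''s value."""
    return [code[:index] + [q_code[index]] + code[index + 1:] for code in s_codes]
```

Either route equalizes the contaminated attribute across q' and every s_i', so that it no longer drives the comparison; removal does so by discarding it and injection by making all candidates share it.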

The comparison step may include performing variable control by selectively changing attributes of q' or s_i'.

The comparison step may include decoding q' and s_i' and then comparing them, or additionally processing q' or s_i' within the reduced dimension, decoding the result, and then comparing.

The generating step may include performing the data dimensionality reduction through a model trained in advance with a machine learning technique while removing from q' at least one attribute that is contaminated in q, or removing from the corresponding s_i' at least one attribute that is contaminated in any one s_i.

The generating step may generate a plurality of q' representing different attributes, and may include generating a plurality of each of s₁' to s_N' so as to correspond to the attribute of each q'.

An image processing apparatus according to an embodiment of the present invention includes: a memory storing query image information (q) and N pieces of search target image information (s₁ to s_N, where N is a natural number of 2 or more; hereinafter, s₁ to s_N are referred to as s_i); and a control unit that controls the comparison of each s_i with q using the q and s_i stored in the memory.

In addition, an image processing apparatus according to an embodiment of the present invention includes: a communication unit that receives, from a first device, query image information (q) and N pieces of search target image information (s₁ to s_N, where N is a natural number of 2 or more; hereinafter, s₁ to s_N are referred to as s_i); and a control unit that controls the comparison of each s_i with q using the q and s_i received by the communication unit, and controls transmission of the comparison result to the first device or a second device.

The control unit may perform nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i', which represent the attributes of the corresponding image with at least one element, and may indirectly compare each s_i with q using the dimensionally reduced q' and s_i'.

An image processing apparatus according to another embodiment of the present invention includes: a memory storing query image information (q) and N pieces of search target image information (s₁ to s_N, where N is a natural number of 2 or more; hereinafter, s₁ to s_N are referred to as s_i); a nonlinear dimension conversion unit that performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i', which represent the attributes of the corresponding image with at least one element; and a comparison unit that compares each s_i with q using q' and s_i'.

An image processing apparatus according to yet another embodiment of the present invention includes: a communication unit that receives, from a first device, query image information (q) and N pieces of search target image information (s₁ to s_N, where N is a natural number of 2 or more; hereinafter, s₁ to s_N are referred to as s_i), and transmits the comparison result of a comparison unit to the first device or a second device; a nonlinear dimension conversion unit that performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i', which represent the attributes of the corresponding image with at least one element; and the comparison unit, which compares each s_i with q using q' and s_i'.

The present invention configured as described above provides an image comparison processing technique between a query input image and a search target set, and has the advantage of enabling a new kind of image comparison processing in which semantic-level variable control is possible by actively and selectively changing high-level feature elements present in an image.

In addition, because the image comparison processing according to the present invention performs its computation efficiently through data dimensionality reduction, it has the advantage of allowing faster comparison than comparison in the large image domain, and of reducing the amount of stored data.

In addition, the present invention has the advantage that it can be used in various fields that require image-related search, similarity comparison, classification, and the like.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

FIG. 1 shows an overall conceptual diagram of the image comparison processing of the present invention.
FIG. 2 shows a block diagram of an image processing apparatus 100 according to an embodiment of the present invention.
FIG. 3 shows a block diagram of the control unit 150 in the image processing apparatus 100 according to an embodiment of the present invention.
FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present invention.
FIG. 5 shows a conceptual diagram of image comparison using the nonlinear dimensionality reduction processing of the present invention.
FIG. 6 shows an example of face comparison according to the present invention.
FIG. 7 shows another example of face comparison according to the present invention.
FIG. 8 shows an example of comparison after q and s_i have each been expressed as a plurality of representation vectors in the present invention.

The above objects and means of the present invention, and the effects that follow from them, will become clearer through the following detailed description taken together with the accompanying drawings, so that those of ordinary skill in the art to which the present invention belongs will be able to readily practice its technical idea. In describing the present invention, detailed descriptions of related known techniques are omitted where they are judged likely to obscure the gist of the invention unnecessarily.

The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural, as the case may be, unless the text specifically states otherwise. In this specification, terms such as "include", "comprise", "provide", and "have" do not exclude the presence or addition of one or more components other than those mentioned.

In this specification, terms such as “or” and “at least one” may denote one of the words listed together, or a combination of two or more of them. For example, “A or B” and “at least one of A and B” may include only one of A and B, or may include both A and B.

In this specification, the information given in descriptions that follow “for example” and similar phrases, such as cited characteristics, variables, or values, may not match exactly, and the embodiments of the invention according to its various embodiments should not be limited by effects such as variations, including tolerances, measurement errors, limits of measurement accuracy, and other commonly known factors.

In this specification, when a component is described as being 'connected' or 'coupled' to another component, it may be directly connected or coupled to that other component, but it should be understood that further components may exist in between. By contrast, when a component is described as being 'directly connected' or 'directly coupled' to another component, it should be understood that no other component exists in between.

In this specification, when a component is described as being 'on' or 'in contact with' another component, it may be in direct contact with or connected to that other component, but it should be understood that a further component may exist in between. By contrast, when a component is described as being 'directly on' or 'in direct contact with' another component, it may be understood that no further component exists in between. Other expressions describing the relationship between components, such as 'between' and 'directly between', may be interpreted in the same way.

In this specification, terms such as 'first' and 'second' may be used to describe various components, but the components should not be limited by these terms. Nor should these terms be interpreted as fixing the order of the components; they may be used to distinguish one component from another. For example, a 'first component' may be named a 'second component', and similarly a 'second component' may be named a 'first component'.

Unless otherwise defined, all terms used in this specification may be used with the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are clearly and specifically defined.

Hereinafter, a preferred embodiment according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows an overall conceptual diagram of the image comparison processing of the present invention.

The present invention relates to a technique for processing image information in order to compare images. Referring to FIG. 1, given query image information (q) and a search target set (S) (where S={s₁, s₂, …s_i, …s_N}, i∈{1, 2, …N}, and N is a natural number of 2 or more), the image information processing of the present invention can output the results q' and S' (where S'={s₁', s₂', …s_i', …s_N'}, i∈{1, 2, …N}). Here, the query image information (q) is the image information requested as a query, and the search target set (S) is the set of search target image information (s₁ to s_N) to be compared with the query image information (q).

The query image information (q) and q' may have the same or different data dimensions, and each piece of image information (s₁ to s_N) in the search target set (S) may have the same data dimension as, or a different data dimension from, the corresponding element (s₁' to s_N') of S'. For example, if the input image is 100x100, the output image may be 100x100, 1000x1000, or 10x10. In short, the image information processing may leave the output image the same size as the input image, make it larger, or make it smaller.

In particular, performing image information processing in which the dimension of the output data is reduced relative to that of the input data (hereinafter referred to as “dimension reduction processing”) is an essential part of the present invention, because dimension reduction processing improves computational efficiency and reduces the required storage space.

That is, through dimension reduction processing, q' may carry information whose data dimension is reduced relative to q, and each element of S' (that is, s_i') may carry information whose data dimension is reduced relative to the corresponding element of S (s_i).

For example, through dimension reduction processing, image information A may be output as A', whose data dimension is reduced. Here, A may be expressed as a vector, a two-dimensional matrix, or a three-dimensional matrix having a elements, and A' may be expressed as a vector, a two-dimensional matrix, or a three-dimensional matrix having a' elements (where a>a'). In this case, dimension reduction processing may turn a three-dimensional matrix A into a smaller three-dimensional matrix, a two-dimensional matrix, or a vector A', thereby reducing its data dimension. Likewise, dimension reduction processing may turn a two-dimensional matrix A into a smaller two-dimensional matrix or a vector A', thereby reducing its data dimension.
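The condition a>a' can be checked by counting elements, as in the short sketch below. The helper names and the example shapes are illustrative assumptions; the text does not prescribe particular sizes beyond the 100x100, 1000x1000, and 10x10 example given earlier.

```python
from math import prod
from typing import Tuple

Shape = Tuple[int, ...]


def num_elements(shape: Shape) -> int:
    """Element count of a vector, 2D matrix, or 3D matrix with this shape."""
    return prod(shape)


def is_reduction(in_shape: Shape, out_shape: Shape) -> bool:
    """True when the output holds fewer elements than the input (a > a')."""
    return num_elements(out_shape) < num_elements(in_shape)
```

For instance, a 100x100x3 matrix holds 30000 elements, so mapping it to a 10x10x3 matrix (300 elements) or to a 128-element vector both count as dimension reduction, whereas mapping a 100x100 image to 1000x1000 does not.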

In the present invention, however, image information processing that expands the dimension of the dimensionally reduced information again after dimension reduction processing (for example, converting the final output image to the same size as the input image, or larger) (hereinafter referred to as “dimension expansion processing”) may optionally be added.

Meanwhile, the present invention can compare the query image information (q) with each of the elements (s₁ to s_N) of the search target set (S). However, the present invention does not compare q with each s_i directly; instead, it generates q' and s_i' through dimension reduction processing of q and s_i, and then uses the dimensionally reduced q' and s_i' to compare each s_i with q indirectly. That is, the present invention can perform the indirect comparison of each s_i with q either by comparing q' with s_i', or by additionally processing at least one of q' and s_i' (image information processing) and then comparing them.
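The indirect comparison described above can be sketched as a small pipeline: reduce, optionally process within the reduced space, then compare. Everything concrete here is an assumption made for illustration: `toy_encoder` is a hypothetical stand-in for the trained nonlinear reduction, Euclidean distance is an arbitrary comparison measure, and the optional `process` hook stands for whatever additional processing is applied.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def toy_encoder(x: Sequence[float]) -> Vector:
    """Hypothetical nonlinear reduction: tanh of the sum of each adjacent pair."""
    return [math.tanh(x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two reduced vectors; smaller means more similar."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def indirect_compare(q: Sequence[float], S: Sequence[Sequence[float]],
                     encode: Callable[[Sequence[float]], Vector],
                     process: Optional[Callable[[Vector], Vector]] = None) -> List[float]:
    """Compare q with each s_i through q' and s_i' rather than in the image domain."""
    q_code = encode(q)
    if process is not None:
        q_code = process(q_code)  # optional additional processing of q'
    distances = []
    for s in S:
        s_code = encode(s)
        if process is not None:
            s_code = process(s_code)  # same processing applied to each s_i'
        distances.append(euclidean(q_code, s_code))
    return distances
```

Because the loop only touches the reduced vectors, the per-candidate cost scales with the reduced dimension rather than the image size, which is the efficiency argument the description makes.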

In FIG. 1, the q' and S' output through the image information processing have been transformed so that they can be readily compared with each other at the semantic level. Although a single q is shown for clarity of explanation, there may be more than one.

Calculating the mutual similarity between the query image information (q) and the elements (s₁ to s_N) of the search target set (S) has conventionally been an essential process in many applications. With regard to this process, the present invention performs indirect comparison through image information processing and thereby enables <removal of contamination elements from the query image information (q)>, <removal of contamination elements present in the search target set (S)>, <injection of contamination elements present in the query image information (q) into each piece of image information (s₁ to s_N) of the search target set (S)>, <support for semantic-level comparison between the query image information (q) and the search target set (S)>, <support for active semantic-level variable control between the query image information (q) and the search target set (S)>, and the like. These are described in detail later.

FIG. 2 shows a block diagram of an image processing apparatus 100 according to an embodiment of the present invention.

The image processing apparatus 100 according to an embodiment of the present invention is an apparatus that processes image information in order to compare images, and may be an electronic device capable of computing or a computing network.

For example, the electronic device may be a desktop personal computer (PC), a laptop PC, a tablet PC, a netbook computer, a workstation, a personal digital assistant (PDA), a smartphone, a smartpad, or a mobile phone, but is not limited thereto.

As shown in FIG. 2, the image processing device 100 may include an input unit 110, a communication unit 120, a display 130, a memory 140, and a control unit 150.

The input unit 110 generates input data in response to various user inputs and may include various input means. For example, the input unit 110 may include a keyboard, a keypad, a dome switch, a touch panel, a touch key, a touch pad, a mouse, a menu button, and the like, but is not limited thereto.

The communication unit 120 is a component that communicates with other devices. For example, the communication unit 120 may perform wireless communication such as 5th generation communication (5G), long term evolution-advanced (LTE-A), long term evolution (LTE), Bluetooth, Bluetooth Low Energy (BLE), near field communication (NFC), or WiFi communication, or may perform wired communication such as cable communication, but is not limited thereto. For instance, the communication unit 120 may receive images, machine learning models, and the like from another device, and may transmit image information processing results (dimensionality reduction results, additional processing results, comparison results, etc.), machine learning models, and the like to another device. A machine learning model may be received from another device when, in the image processing method described later, training of the machine learning model is performed on that other device.

The display 130 displays various image data on a screen and may be composed of a non-emissive panel or an emissive panel. For example, the display 130 may include a liquid crystal display (LCD), a light emitting diode (LED) display, an organic LED (OLED) display, a micro electro mechanical systems (MEMS) display, or an electronic paper display, but is not limited thereto. The display 130 may also be combined with the input unit 110 and implemented as a touch screen or the like.

The memory 140 stores various types of information necessary for the operation of the image processing device 100. The stored information may include images, image information processing results (dimensionality reduction results, additional processing results, comparison results, etc.), machine learning models, and program information related to the image processing method described later, but is not limited thereto. For example, depending on its type, the memory 140 may include a hard disk type, a magnetic media type, a compact disc read only memory (CD-ROM), an optical media type, a magneto-optical media type, a multimedia card micro type, a flash memory type, a read only memory (ROM) type, or a random access memory (RAM) type, but is not limited thereto. In addition, depending on its purpose or location, the memory 140 may be a cache, a buffer, a main memory, an auxiliary memory, or a separately provided storage system, but is not limited thereto.

The control unit 150 may perform various control operations of the image processing device 100. That is, the control unit 150 may control the execution of the image processing method described later, and may control the operation of the remaining components of the image processing device 100, namely the input unit 110, the communication unit 120, the display 130, and the memory 140. For example, the control unit 150 may include a processor, which is hardware, or a process, which is software executed on that processor, but is not limited thereto.

FIG. 3 shows a block diagram of the control unit 150 in the image processing device 100 according to an embodiment of the present invention.

The control unit 150 controls the execution of the image processing method according to an embodiment of the present invention and, as shown in FIG. 3, may include a nonlinear dimensionality reduction unit 151, a semantic variable extraction unit 152, a semantic variable conversion unit 153, a nonlinear dimension expansion unit 154, a learning unit 155, and a comparison unit 156. For example, the nonlinear dimensionality reduction unit 151, the semantic variable extraction unit 152, the semantic variable conversion unit 153, the nonlinear dimension expansion unit 154, the learning unit 155, and the comparison unit 156 may each be a hardware component of the control unit 150 or a process, that is, software executed by the control unit 150, but are not limited thereto.

Hereinafter, the image processing method according to the present invention is described in more detail.

FIG. 4 shows a flowchart of an image processing method according to an embodiment of the present invention.

The image processing method according to an embodiment of the present invention is a method of processing image information for comparison between images and, as shown in FIG. 4, may include steps S201 and S202.

That is, the nonlinear dimensionality reduction unit 151 may perform dimensionality reduction processing to generate q' and s_i' (S201). The comparison unit 156 may then use the dimension-reduced q' and s_i' to indirectly compare each s_i with q (S202).

Specifically, in S201, the nonlinear dimensionality reduction unit 151 may perform data dimensionality reduction on the query image information (q) and on each of N pieces of search target image information (s₁ to s_N, hereinafter referred to as s_i), where N is a natural number of 2 or more, thereby generating q' and s_i'. Here, q' and s_i' each represent the attributes of the corresponding image with at least one element. An attribute may be information about any of the various characteristics contained in the image.

For instance, information about an object contained in an image, or information about smaller detailed objects contained in that object, may be an attribute. As one example, if an image contains a "human face", the attributes may include information about the human face object, information about detailed characteristics of that object (skin color, eye color, hair color, gender, age, etc.), and information about detailed objects within the face (eyes, nose, lips, ears, forehead, jawline, etc.).

The result output by the nonlinear dimensionality reduction unit 151 carries information whose data dimension has been nonlinearly reduced relative to the input.

For instance, the dimension-reduced q' and s_i' may be expressed as vectors having a plurality of elements, which may be referred to as "representation vectors". A representation vector lies in an algebraic space whose dimension equals that of the vector, and this space may be referred to as the "latent space". That is, the nonlinear dimensionality reduction unit 151 may receive an input image (grayscale or color) of q or s_i and generate a representation vector, or equivalently a latent space.

In general, an image can be represented as a vector, a two-dimensional matrix, or a three-dimensional matrix, and the present invention is not limited to any particular format. As one example, a color image that is 100 pixels wide and 100 pixels high with three RGB color channels can be represented as a 100x100x3 three-dimensional matrix. If the nonlinear dimensionality reduction unit 151 reduces it to a 2x1 representation vector, the 100x100x3 image (30,000 values) can be said to have been encoded (or mapped) into a two-dimensional latent space. Of course, instead of a representation vector, the 100x100x3 image may also be reduced to a three-dimensional or two-dimensional matrix having fewer elements than the image.

Here, the nonlinear dimensionality reduction unit 151 performs dimensionality reduction processing, that is, nonlinear dimensionality reduction. The nonlinear dimensionality reduction process can be expressed as a nonlinear function g(). When q is input, it outputs a low-dimensional representation vector g(q). The function g() used for dimensionality reduction (hereinafter, the "dimensionality reduction function") may also be expressed as a composition of several functions, in the form g() = g₁(g₂(g₃(…))). Each of the partial functions may be linear or nonlinear, and the composite g() as a whole is a nonlinear function. That is, g() outputs the dimension-reduced q' when it receives q, and outputs the dimension-reduced s_i' when it receives s_i.
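The composition g() = g₁(g₂(g₃(…))) can be illustrated with a minimal sketch. The three partial functions below, their weights, and the block-averaging step are assumptions chosen only for illustration and are not taken from the specification; the point is that two linear steps combined with one nonlinear step (tanh) already yield a nonlinear g().

```python
import math

def g3(image):
    # Linear step (assumed): collapse a flattened image into two coarse
    # averages, the mean of the first half and the mean of the second half.
    half = len(image) // 2
    return [sum(image[:half]) / half,
            sum(image[half:]) / (len(image) - half)]

def g2(v):
    # Nonlinear step (assumed): element-wise tanh.
    return [math.tanh(x) for x in v]

def g1(v):
    # Linear step (assumed): fixed 2x2 mixing with arbitrary weights.
    return [0.5 * v[0] + 0.5 * v[1], 0.5 * v[0] - 0.5 * v[1]]

def g(image):
    # Dimensionality reduction function as a composition of partial functions.
    return g1(g2(g3(image)))

# Stand-in for a flattened 100x100x3 image (30,000 values).
q = [0.0] * 15000 + [1.0] * 15000
q_prime = g(q)  # two-element representation vector
```

Because of the tanh step, doubling every pixel of q does not double q', which is what distinguishes this g() from a purely linear projection.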

In particular, the dimensionality reduction processing may be performed using various dimensionality reduction algorithms. For example, the dimensionality reduction algorithm may be Laplacian eigenmaps, Isomap, LLE (Locally Linear Embedding), or t-SNE (t-distributed Stochastic Neighbor Embedding), but is not limited thereto.

Alternatively, the dimensionality reduction processing may be performed using a machine learning model trained with a machine learning technique such as a deep learning technique. In this case, the learning unit 155 may generate the machine learning model by performing training according to the machine learning technique. For example, the deep learning technique may be an autoencoder, a variational autoencoder (VAE), or a generative adversarial network (GAN), but is not limited thereto.

A machine learning model is a model trained according to a machine learning technique using training data consisting of pairs of input data and output data (a data set). That is, the machine learning model includes an input layer that receives the input data, an output layer that emits the output data, and a hidden layer that lies between the input layer and the output layer and holds a function describing the relationship between the input data and the output data. The hidden layer may itself consist of multiple layers. When input data is fed to the machine learning model, output data determined by that function is produced.

Such a machine learning model is also referred to as an "artificial neural network". Each layer of the artificial neural network consists of at least one filter, and each filter has a matrix of weights, that is, a matrix of nodes. Each element (pixel or node) of a filter's matrix may thus correspond to a weight value.

For instance, an autoencoder is a kind of generative model, and depending on how it is trained its output may be an image. Here, a generative model may be defined as a model that, given predetermined parameter values, outputs corresponding data following a specific probability distribution. For example, a generative model with a one-dimensional Gaussian probability distribution can generate data whose occurrences form a bell-shaped distribution centered on the mean. Likewise, data generated by a model expressed as a sum of several two-dimensional Gaussian functions would show a distribution resembling several mountain peaks.

An autoencoder typically consists of encoder hidden layers that compress the data, decoder hidden layers that decompress the data, and a central layer located between the encoder and the decoder. The central part of the hidden layers has fewer nodes than the input layer and the output layer. In a symmetric autoencoder, in which the encoder and the decoder mirror each other, the hidden part has 2m+1 layers (where m is a natural number greater than or equal to 0), the central layer is the layer located in the middle of these layers, and each layer contains a plurality of nodes (that is, a matrix). In a general autoencoder, of course, the hidden layers acting as the encoder and those acting as the decoder need not be symmetric.
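The 2m+1 layer structure of a symmetric autoencoder can be sketched as a list of layer widths. The helper name `symmetric_hidden_sizes` and the geometric shrinking schedule are hypothetical choices made for this illustration; the specification only requires that the central layer be narrower than the input and output layers.

```python
def symmetric_hidden_sizes(input_dim, latent_dim, m):
    """Return the widths of the 2m+1 hidden layers of a symmetric autoencoder.

    Assumption: encoder widths shrink geometrically from input_dim toward
    latent_dim; any monotonically narrowing schedule would serve as well.
    """
    ratio = (latent_dim / input_dim) ** (1.0 / (m + 1))
    encoder = [max(latent_dim, round(input_dim * ratio ** k))
               for k in range(1, m + 1)]
    # m encoder layers, one central layer, m mirrored decoder layers.
    return encoder + [latent_dim] + encoder[::-1]

# Flattened 100x100x3 input, two-element central layer, m = 2.
sizes = symmetric_hidden_sizes(100 * 100 * 3, 2, 2)
```

With m = 0 the helper returns only the central layer, which corresponds to the smallest autoencoder covered by the 2m+1 formula.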

In the case of a variational autoencoder (VAE), a statistical technique is applied inside the hidden layers, so that the latent space is learned in a more continuous and structured way than with a plain autoencoder.

An autoencoder typically has a structure in which dimensionality reduction is performed through encoding, and the result is then decoded and expanded back to the original dimension. That is, the hidden layer (central layer) of the autoencoder has fewer nodes than the input layer and the output layer.

For instance, if an autoencoder is trained with X as input so that X is output, an encoding of X is formed in the hidden layers. An autoencoder trained in this way tends to produce an output very close to X even when a value X + delta is later input. Mathematically, this can be described as arbitrary multi-dimensional information being projected onto a multi-dimensional manifold by the autoencoder before the result is output. From an application standpoint, delta can be thought of as noise, an occluded region, a stain, or the like. That is, X + delta is an image containing noise, occluded regions, stains, and so on, and inputting it yields a clean output X from which these have been removed.
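The projection view of this behavior can be sketched with a deliberately simplified stand-in. The code below uses a linear encoder and decoder that project onto a one-dimensional subspace; this is an assumption made for brevity, since a trained autoencoder is nonlinear and its manifold is curved. The vector `u`, the sample `X`, and the perturbation `delta` are all illustrative values.

```python
# Unit vector spanning an assumed one-dimensional "manifold" of clean data.
u = [3 / 5, 4 / 5]

def encode(x):
    # 2-D input -> 1-D code: coordinate of x along u.
    return x[0] * u[0] + x[1] * u[1]

def decode(z):
    # 1-D code -> 2-D reconstruction lying on the manifold.
    return [z * u[0], z * u[1]]

X = [3.0, 4.0]        # clean sample on the manifold (5 * u)
delta = [-0.4, 0.3]   # perturbation orthogonal to the manifold
noisy = [X[0] + delta[0], X[1] + delta[1]]

# Encoding and then decoding discards the off-manifold component.
restored = decode(encode(noisy))
```

In this toy setting only the component of delta orthogonal to the manifold is removed; a perturbation lying along `u` would pass through unchanged, which hints at why the training data determine what counts as removable contamination.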

That is, when q is input to an autoencoder trained with identical input and output, the node values (i.e., weight values) of the encoding of q are set (determined) in its hidden layers, and in particular the node values associated with the dimension-reduced q' are set in the central layer. Likewise, when s_i is input to the same trained autoencoder, the node values associated with the dimension-reduced s_i' are set in the central layer. The representation vectors q' and s_i' can be derived by using the matrix of node values set in this way, either as it is or after further processing.

The characteristics of the autoencoder described above may be applied similarly to a variational autoencoder (VAE) and a generative adversarial network (GAN).

The final dimension produced by the nonlinear dimensionality reduction unit 151 is variable. The smaller the dimension of the representation vector, the higher the information compression, but the original image may then not be restored properly, because information is lost during dimensionality reduction. Conversely, the closer the dimension of the representation vector is to that of the original image, the better the restoration, but since almost no compression takes place, meaningful attributes (features) can no longer be extracted. This is a trade-off, and the dimension of the representation vector is preferably determined through trial and error, varied training, and the type of application.

The representation vectors q' and s_i' derived in this way represent the attributes of the corresponding image with at least one element. As one example, when q and s_i are images related to a "human face", the representation vectors q' and s_i' may include elements related to attributes such as "skin color", "eye color", "hair color", "gender", "age", "eyes", "nose", "lips", "ears", "forehead", or "jawline". That is, if at least one element at a particular position of the representation vectors q' and s_i' is related to the "skin color" attribute, changing the value of that element changes the original "skin color" when the representation vector is restored to an image. This attribute extraction function may be performed by the semantic variable extraction unit 152.

An attribute may have a plurality of detailed attributes. As one example, if the attributes are {age, gender, hairstyle, clothing color, face occlusion pattern, noise pattern}, they may have detailed attributes such as age: {teens, 20s, 30s, 40s or older}, gender: {male, female, unknown}, clothing color: {blue, white, black, orange}, face occlusion pattern: {mask, sunglasses, glasses, scarf, hand}, and noise pattern: {Gaussian random noise, salt-and-pepper noise, burst noise}. In the present invention, however, "attribute" and "detailed attribute" are not distinguished and may both be referred to simply as "attribute".

That is, the representation vectors q' and s_i' output by the nonlinear dimensionality reduction unit 151, a key element of the present invention, express each attribute of the corresponding image as an element value, and therefore have the advantage of expressing information at a high, semantic level.

In addition, because the representation vectors q' and s_i' have a lower data dimension than the images q and s_i, comparison can be performed faster than in the image domain, where data are generally large, and the amount of stored data can also be reduced.
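A comparison carried out on representation vectors rather than on images can be sketched as follows. The Euclidean distance and the helper name `rank_by_similarity` are assumptions for illustration; this passage does not fix a particular similarity measure, and the numeric vectors are made-up examples.

```python
import math

def rank_by_similarity(q_prime, s_primes):
    """Return gallery indices ordered from most to least similar to q'.

    Assumption: similarity is measured by Euclidean distance in the
    latent space (smaller distance means more similar).
    """
    distances = [math.dist(q_prime, s) for s in s_primes]
    return sorted(range(len(s_primes)), key=lambda i: distances[i])

q_prime = [0.2, -0.1]                                  # reduced query
s_primes = [[0.9, 0.8], [0.25, -0.05], [-0.7, 0.3]]    # reduced gallery
ranking = rank_by_similarity(q_prime, s_primes)        # [1, 2, 0]
```

Each distance here involves two values instead of the 30,000 values of a 100x100x3 image, which is the source of the speed and storage advantage described above.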

Because the present invention thus uses the representation vectors q' and s_i', which express information at the semantic level, it enables <removal of contamination elements from the query image information (q)>, <removal of contamination elements present in the search target set (S)>, <injection of contamination elements present in the query image information (q) into each piece of image information (s₁ to s_N) of the search target set (S)>, and the like. These are described in detail below.

Meanwhile, in S201 or S202, the semantic variable extraction unit 152 extracts semantic variables for each of the images q and s_i, and can thereby extract the attributes contained in each image. In particular, the semantic variable extraction unit 152 can identify which element of the dimension-reduced representation vectors q' and s_i' is related to a particular attribute of the corresponding image.

As one example, when a "smiling face" image is input to a trained autoencoder (first case), a first representation vector is generated from the node values set in the central layer of its hidden layers. Likewise, when a "non-smiling face" image is input to the trained autoencoder (second case), a second representation vector is generated from the node values set in the central layer. Comparing the first and second representation vectors reveals the position at which the element value changed the most, and the element so identified can be regarded as the element related to the "smiling expression" attribute.
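The comparison of the first and second representation vectors amounts to finding the position with the largest absolute difference. The sketch below assumes four-element vectors with made-up values; the helper name `most_changed_index` is hypothetical.

```python
def most_changed_index(v1, v2):
    """Return the position whose element value differs most between v1 and v2."""
    diffs = [abs(a - b) for a, b in zip(v1, v2)]
    return max(range(len(diffs)), key=lambda i: diffs[i])

# Illustrative representation vectors (values are assumptions).
smiling = [0.31, -1.20, 0.85, 0.10]   # first case: "smiling face"
neutral = [0.29, -1.15, -0.60, 0.12]  # second case: "non-smiling face"

idx = most_changed_index(smiling, neutral)  # position 2
```

A single image pair may differ in more than expression, so in practice one would likely average the differences over many labeled pairs before attributing a position to an attribute; that averaging step is not shown here.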

For example, the semantic variable extraction unit 152 may define attributes such as "human face", "facial feature points", and "smiling appearance" in advance, and extract from an input image the attributes of the objects contained in that image.

Through the semantic variable extraction unit 152, each attribute of the representation vectors q' and s_i' can be identified, and the identified attributes (for example, common attributes) can subsequently be used in the variable conversion processing of the semantic variable conversion unit 153. The information analyzed by the semantic variable extraction unit 152 may also be provided to another device through the communication unit 120, allowing an external user or program to control a particular variable.

Meanwhile, in S202, the semantic variable conversion unit 153 may perform variable conversion by using the attributes of the representation vectors q' and s_i' identified by the semantic variable extraction unit 152 and processing the element value related to any one of those attributes.

Specifically, the semantic variable conversion unit 153 may perform the following roles.

(1) When q is contaminated (for example, q is occluded by an arbitrary object, damaged, affected by noise, or geometrically deformed), the cause of the contamination can be removed. That is, the cause of the contamination can be removed by processing, in the representation vector q', the element value of the attribute related to that contamination. Such removal of contamination elements can also be applied when a particular element (s_i) of S is contaminated.

In this way, the present invention enables <removal of contamination elements from the query image information (q)>, <removal of contamination elements present in the search target set (S)>, and the like.

(2) Semantic-level conversion becomes possible by applying a predetermined weight to the element values of a representation vector, or by forming a linear combination of two or more different representation vectors. As one example, the semantic variable conversion unit 153 can add the element values of the representation vector corresponding to the "glasses" attribute to the representation vector of "the face of a person not wearing glasses", thereby turning it into a representation vector corresponding to "the face of a person wearing glasses". Likewise, computing <representation vector of "person wearing glasses" - representation vector of "person"> yields a representation vector related to "glasses", and computing <representation vector of "person wearing glasses" - representation vector of "glasses"> yields the representation vector of "person". Through this, the present invention enables <support for semantic-level comparison between q and S> and <support for active semantic-level variable control between q and S>. That is, the latent space spanned by the representation vectors can be organized so that it carries the high-level meaning that humans have in mind.

As one example, suppose that the facial expression attribute contains four attributes {"neutral", "smiling", "crying", "frowning"}, and that the nonlinear dimensionality reduction unit 151 performs dimensionality reduction using a variational autoencoder. If face photographs labeled {"neutral face", "smiling face", "crying face", "frowning face"} are collected and each group is learned to generate representation vectors, representation vectors for {"neutral face", "smiling face", "crying face", "frowning face"} are obtained, and these can be converted into representation vectors for {"smiling", "crying", "frowning"}. For instance, to find the representation vector for "smiling", the representation vector of "neutral face" is subtracted from the representation vector of "smiling face"; the common attribute "face" then cancels out and only the values of the "smiling" representation vector remain. If, rather than simply adding or subtracting representation vectors, the element values of a particular attribute are also weighted within a linear combination as described above, semantic-level variable control becomes possible.
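The subtraction and weighted addition of representation vectors described in (2) can be sketched as follows. The three-element vectors, their values, and the helper names `subtract` and `add_weighted` are assumptions for illustration; real latent spaces are rarely this cleanly separable.

```python
def subtract(a, b):
    """Element-wise difference a - b of two representation vectors."""
    return [x - y for x, y in zip(a, b)]

def add_weighted(base, direction, weight):
    """Linear combination base + weight * direction."""
    return [x + weight * d for x, d in zip(base, direction)]

# Illustrative representation vectors (values are assumptions).
person_with_glasses = [0.5, 1.5, -0.25]
person = [0.5, 0.25, -0.25]

# <"person wearing glasses" - "person"> leaves the "glasses" component.
glasses = subtract(person_with_glasses, person)            # [0.0, 1.25, 0.0]

# Adding it to another face yields that face "wearing glasses";
# the weight controls how strongly the attribute is applied.
other_person = [-1.0, 0.0, 0.75]
other_with_glasses = add_weighted(other_person, glasses, 1.0)
half_strength = add_weighted(other_person, glasses, 0.5)
```

The same two helpers cover the facial expression example: subtracting the "neutral face" vector from the "smiling face" vector gives a "smiling" direction that can then be added to other faces with a chosen weight.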

(3) Owing to the feature of (2) above, the present invention also enables <injection of a contaminating element present in the query image information (q) into each piece of image information (s₁ to s_N) of the search target set (S)>. That is, when q contains some contaminating object, that object can be estimated and injected into each element (s_i) of the search target set (S). Specifically, the representation vector corresponding to the contaminating object of q is identified in q'; then, after S is dimension-reduced, the element values of the attribute corresponding to the contamination source are added to the representation vector of each image (s_i'), or their weights are adjusted. In this way, the contaminating element present in q can be injected into every element of the search target set (S).
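
A minimal sketch of this injection step is given below. It assumes, purely for illustration, that the contaminating attribute occupies a single known element of the representation vector; the patent does not specify how the contaminant is located in q', and the names `add_scaled`, `q_prime` and `gallery_prime` are hypothetical.

```python
def add_scaled(base, delta, weight=1.0):
    """Return base + weight * delta, element-wise."""
    return [b + weight * d for b, d in zip(base, delta)]

# Hypothetical latent vectors; index 2 is ASSUMED to encode the contaminant (e.g. a mask).
q_prime = [0.3, 0.7, 0.9]                            # contaminated query q'
gallery_prime = [[0.3, 0.6, 0.0], [0.8, 0.1, 0.0]]   # uncontaminated s_i'

# Estimate the contaminant component of q' (assumption: it lives only in index 2).
contaminant = [0.0, 0.0, q_prime[2]]

# Inject the same contaminant into every s_i' so that q' and s_i' are compared
# under the same condition.
injected = [add_scaled(s, contaminant) for s in gallery_prime]
```

Passing a `weight` below 1.0 would correspond to the weight adjustment mentioned in the text rather than a full injection.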

Meanwhile, in addition to (1) above, which is performed by the semantic variable conversion unit 153, the present invention can achieve the same effects as (1), namely <removal of contaminating elements from the query image information (q)> and <removal of contaminating elements present in the search target set (S)>, through the following (4), which is performed by the nonlinear dimension reduction unit 151.

(4) When q or s_i is contaminated (for example, occluded by some object, damaged, noisy, or geometrically distorted), the cause of the contamination can also be removed through the dimension reduction process itself. In this case, the nonlinear dimension reduction unit 151 performs dimension reduction using a machine learning model such as an autoencoder, a variational autoencoder (VAE), or a generative adversarial network (GAN). Because the hidden (central) layer of such a model has fewer nodes than its input and output layers, dimension reduction through the model amounts to encoding through that hidden layer. In particular, the model may have been trained on a variety of first images of a kind similar to the contaminated q or s_i, where the first images are free of the contamination in question. When the contaminated q or s_i is then input to the model, the node values of the hidden layer are set as if the contamination source had been removed, so that a representation vector q' or s_i' corresponding to the uncontaminated state can be generated. On this principle, the present invention enables <removal of contaminating elements from the query image information (q)> and <removal of contaminating elements present in the search target set (S)> during the dimension reduction process.

Meanwhile, in S202, the comparison unit 156 may compare the representation vectors q' and s_i' directly, or may compare the elements for at least one attribute contained in the representation vectors q' and s_i'. This comparison may also be performed after the contamination-removing dimension reduction according to (4) above.

The comparison unit 156 may also perform the comparison after at least one of q' and s_i' has been further processed within the reduced dimension. The further processing may be any one of the processes (1) to (3) described above.

That is, the further processing may change the element values of q' or s_i' related to at least one attribute. It may remove from q' at least one attribute that is contaminated in q. It may remove from each s_i' at least one attribute that is contaminated in any one s_i. It may inject into each s_i' at least one attribute that is contaminated in q. It may also control variables by selectively changing at least one attribute in q' or s_i'. Since these processes have been described in detail in (1) to (3), further description is omitted.

Meanwhile, in S202, the nonlinear dimension expansion unit 154 may decode q' and s_i'. It may also decode q' or s_i' after they have been further processed within the reduced dimension.

Here, “decoding” refers to converting q' and s_i', which were dimension-reduced and carry semantic-level information, back into images. Decoding expands the dimension of the data; “dimension expansion” may be understood as the opposite of the dimension reduction described above.

That is, the nonlinear dimension expansion unit 154 can expand a dimension-reduced representation vector back into an image of the original size, a larger image, or a smaller image. An autoencoder, a variational autoencoder (VAE), or a generative adversarial network (GAN) may be used for this purpose, but the invention is not limited to these.

Following the decoding by the nonlinear dimension expansion unit 154, the comparison unit 156 may, in S202, compare the decoded results of q' and s_i'. It may also compare the results obtained by further processing and then decoding q' or s_i'.

For example, let q'' denote the result of further processing q', and let s_i'' denote the result of further processing s_i'. The comparison unit 156 may then compare the decoded result of q' with that of s_i', the decoded result of q'' with that of s_i', the decoded result of q' with that of s_i'', or the decoded result of q'' with that of s_i''.

In summary, for the mutual comparison of q and the i-th element s_i of S, applying the nonlinear dimension reduction g() to each input image yields the representation vectors g(q) and g(s_i) (that is, q' and s_i'), which allows q and s_i to be compared indirectly. Using a predetermined comparison method defined as similarity(), the comparison result similarity(g(q), g(s_i)) can be obtained, for example in the form similarity(q', s_i'), similarity(q'', s_i'), similarity(q', s_i''), or similarity(q'', s_i'').
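
The indirect comparison similarity(g(q), g(s_i)) can be sketched as below. The text leaves both g() and similarity() open, so this sketch makes two loud assumptions: `g` is a toy stand-in (row averaging followed by tanh) rather than a trained encoder, and `similarity` is cosine similarity. The helper `rank` and the toy images are likewise hypothetical.

```python
import math

def g(image):
    """Stand-in for the nonlinear dimension reduction g(): averages each row
    and applies tanh. A real system would use a trained encoder instead."""
    return [math.tanh(sum(row) / len(row)) for row in image]

def similarity(u, v):
    """Cosine similarity between two representation vectors (an assumed choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(q, S):
    """Return gallery indices sorted by similarity(g(q), g(s_i)), best first."""
    q_prime = g(q)
    scores = [similarity(q_prime, g(s)) for s in S]
    return sorted(range(len(S)), key=lambda i: scores[i], reverse=True)

# Toy 2x2 "images" (made-up pixel values).
q = [[1.0, 1.0], [0.0, 0.0]]
S = [[[0.0, 0.0], [1.0, 1.0]],   # first gallery image: unlike q
     [[0.9, 1.0], [0.1, 0.0]]]   # second gallery image: close to q

order = rank(q, S)  # zero-based gallery indices, most similar first
```

Note that the comparison never touches the pixel data directly; only the reduced vectors are compared, which is the sense in which the comparison is indirect.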

For reference, the manifold hypothesis provides the theoretical background that makes dimension reduction possible. The hypothesis states that data residing in a high-dimensional space actually lie on a manifold of lower dimension. By analogy, a clock's mainspring exists in three-dimensional space, yet once it is unwound and laid flat it can be represented on a single plane; similarly, the actual distribution of data expressed in high dimensions can be represented by a lower-dimensional manifold.

In particular, it is preferable in the present invention to use a nonlinear method for the dimension reduction. For example, conventional techniques such as the Wiener filter build a mathematical model of how the contamination arose and invert it to remove the contaminating element. The conventional PCA (Principal Component Analysis) method finds a new coordinate system that describes the input data and then processes the data in view of the importance of each dimension. However, these conventional techniques generally assume a linear model, and when a nonlinear model is used they require simplifications such as assuming linearity over a specific local interval. Even such a simplified model has a further difficulty: its detailed parameters are assumed to be prior knowledge, so they in turn have to be estimated. Of course, nonlinear dimension reduction is theoretically more complex than a linear model, and the time and computation required for training may increase. However, techniques and architectures developed in recent artificial neural network research can reduce that training time and computation. Examples of such neural networks include the autoencoder, the variational autoencoder (VAE), and the generative adversarial network (GAN), and their training can be controlled through the learning unit 155.

FIG. 5 is a conceptual diagram of image comparison using the nonlinear dimension reduction according to the present invention.

Referring to FIG. 5, when q is input, the present invention does not compute its similarity to the elements of S directly. Instead, it applies a transformation g(), corresponding to dimension reduction, to q and to the i-th element (s_i) of S, and then performs the comparison similarity(g(q), g(s_i)). The dimension reduction can be written as the operator g(), which can also be expressed as a composition of multiple functions such as g() = g₁(g₂(g₃(…))). Each function in the composition may be linear or nonlinear.
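
The composite form g() = g₁(g₂(g₃(…))) can be illustrated with a small sketch. The three stages below are hypothetical examples of linear and nonlinear functions chosen for this illustration; the patent does not prescribe them, and `compose` is an assumed helper name.

```python
import math
from functools import reduce

def compose(*stages):
    """Return g = g1(g2(g3(...))): the last stage listed is applied first."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(stages), x)

# Hypothetical stages; each may be linear or nonlinear.
def g3(x):  # linear: average adjacent pairs, halving the length
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def g2(x):  # nonlinear: element-wise tanh
    return [math.tanh(v) for v in x]

def g1(x):  # linear: keep the first two elements
    return x[:2]

g = compose(g1, g2, g3)
reduced = g([0.0, 0.0, 2.0, 4.0, 1.0, 1.0, 5.0, 7.0])  # 8 values -> 2 values
```

As soon as one stage (here `g2`) is nonlinear, the composition as a whole is nonlinear, even though the other stages are linear.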

One advantage of the dimension reduction is that the data dimension of g(q) and g(s_i) (that is, of q' and s_i') can be far smaller than that of the original input images q and s_i; for example, a two-dimensional image of 1024x1024 pixels may be represented as a 100x1 vector. This idea resembles the method used in conventional PCA (principal component analysis), but PCA is a linear transformation, whereas the dimension reduction of the present invention uses a nonlinear method.
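
The size of the reduction in the 1024x1024 example can be worked out directly. The sketch below assumes a single-channel image with one scalar value per pixel, which the text does not state explicitly.

```python
height, width = 1024, 1024      # input image size from the example above
latent_dim = 100                # length of the reduced 100x1 vector

# Assumption: one scalar value per pixel (single-channel image).
input_values = height * width   # number of scalar values per image
ratio = input_values / latent_dim
```

Under that assumption each image shrinks from 1,048,576 values to 100, roughly a 10,000-fold reduction in the data that has to be stored and compared per image.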

Another advantage is that the output representation vector can carry a far wider range of meaning than information produced by a linear transformation such as PCA. PCA is limited to finding eigenvectors and eigenvalues from the distribution, specifically the variance, of a given dataset, whereas a nonlinear method such as an autoencoder or a variational autoencoder allows the dimension-reduced representation vectors to form a latent space that is richer at the semantic level.

FIG. 6 shows an example of face comparison according to the present invention.

In FIG. 6, assume that the image q is a “frowning face” and that the images corresponding to the elements of S (S={s₁, s₂, …, s_i, …, s_N}, i∈{1, 2, …, N}) are a “frowning face” (s₁), an “angry face” (s₂), ..., a “smiling face” (s_i), ..., and a “sad face” (s_N), respectively. After the dimension reduction according to the present invention, q becomes g(q) (that is, q') and S becomes S'={g(s₁), g(s₂), …, g(s_i), …, g(s_N)}, and g(q) can be compared with each g(s_i). In particular, in the latent space described by representation vectors, the representation vector of the attribute corresponding to “facial expression” can be removed from the “face” representation vector and the representation vector of the “smiling” attribute injected in its place. The facial expressions of the faces in q and S can thus be converted to smiling ones while leaving other factors, such as face shape and skin color, unchanged as far as possible. In other words, by treating facial expression as a controlled variable, the likelihood of mismatches caused by facial expression can be reduced when images are compared.
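
The effect of controlling the facial-expression variable can be sketched as follows. The latent layout (which elements encode identity and which encode expression), the vector values, and the helper names are all assumptions made for this illustration; a real latent space would not necessarily separate attributes this cleanly.

```python
# Assumed latent layout (hypothetical): elements 0-1 encode identity-related
# attributes, elements 2-3 encode the facial expression.
EXPRESSION_SLOTS = (2, 3)
SMILING = [0.0, 0.0, 1.0, 0.0]   # made-up "smiling" attribute vector

def set_expression(vec, target):
    """Remove the expression component of vec and inject the target expression."""
    out = list(vec)
    for i in EXPRESSION_SLOTS:
        out[i] = target[i]
    return out

def sq_distance(u, v):
    """Squared Euclidean distance (smaller means more similar)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

q_prime = [0.5, 0.2, 0.0, 1.0]        # frowning face of person A
s_primes = [[0.9, 0.8, 0.0, 1.0],     # frowning face of person B
            [0.5, 0.2, 1.0, 0.0]]     # smiling face of person A

q_ctrl = set_expression(q_prime, SMILING)
s_ctrl = [set_expression(s, SMILING) for s in s_primes]

before = [sq_distance(q_prime, s) for s in s_primes]  # expression dominates
after = [sq_distance(q_ctrl, s) for s in s_ctrl]      # identity dominates
```

In this toy setup the different person with the same expression is the closer match before the control step, while the same person becomes the closer match once every vector has been set to the same expression.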

FIG. 7 shows another example of face comparison according to the present invention.

Referring to FIG. 7, when the q image is contaminated (or the s_i image is contaminated), for example because it contains noise, the area around the mouth is covered by a mask, or part of the image is damaged, the present invention can remove the attribute corresponding to the contaminating element from q' (or s_i') after dimension reduction.

The contamination situation can be described more concretely as follows. Suppose that q is contaminated by an element a, so that the contaminated query is q̃ = q + a. If q̃ is compared with s_i as it is, the comparison cannot work properly. For an accurate comparison, the contaminating element a therefore has to be removed before the comparison with the elements of S. In the prior art, q̃ was processed in the image dimension and then compared. The expression q̃ = q + a is used here only for simplicity; more generally, the contaminated state can be written as an arbitrary function, q̃ = f(q). That is, the prior art relied on finding the contamination source a or the function f() and then applying its inverse. Because processing and comparison have to be performed in the image dimension, however, the prior art suffers from increased computation and increased data storage for that computation.

In contrast, the present invention encodes high-dimensional image data into a lower dimension (that is, performs dimension reduction) and then removes noise in the reduced dimension. As described above, this is supported by the manifold hypothesis; random noise in particular follows no regular pattern and therefore may not be encoded by the nonlinear dimension reduction.

The process of (4) described above can also be applied. That is, when part of a face is occluded (a contamination source is present), a machine learning model is trained using occluded and unoccluded faces, and dimension reduction with the trained model yields a representation vector that includes the structure hidden behind the occlusion. In other words, once a machine learning model for “nonlinear dimension reduction that removes occlusions” has been obtained through training, inputting an occluded face to the model produces a representation vector that can correspond to “the representation vector of the face without occlusion”.

In addition, although not shown in FIG. 7, a contaminating element present in q may be extracted and injected into each element of S through the semantic variable conversion unit 153, in accordance with (1) described above.

Although FIGS. 6 and 7 have been described using the “facial expression” attribute as an example, the present invention is of course not limited to it and can be applied to various attributes.

FIG. 8 shows an example of comparison after q and s_i have each been expressed as a plurality of representation vectors in the present invention.

Referring to FIG. 8, in S201 a plurality of q', namely g⁽¹⁾(q), g⁽²⁾(q), …, g⁽ᵏ⁾(q), can be generated for q. Likewise, a plurality of s_i', namely g⁽¹⁾(s_i), g⁽²⁾(s_i), …, g⁽ᵏ⁾(s_i), can be generated for s_i. Here, k is a natural number of 2 or more. The k instances of q' or s_i' may represent mutually different attributes. That is, k instances of each of s₁' to s_N' can be generated so as to correspond to the attribute of each q'.

These multiple representation vectors can be generated through different dimension reduction processes (multiple machine learning models). For instance, k machine learning models, each performing dimension reduction while processing a different attribute, can be used to generate k instances of q' and k instances of s_i'. Alternatively, they can be generated by applying various linear combinations to the representation vector output by a single dimension reduction process (a single machine learning model).

Then, in S202, the comparison can be performed using the multiple q' and s_i'. The multiple q' and s_i' may of course also be compared after further processing or after decoding. The multiple representation vectors generated in this way provide new meaning that was not present in the originally input images, which can make the comparison in S202 more precise and richer.

For example, even if the input q is merely a “side face”, the method described above can expand the images of q and S into a plurality of representation vectors corresponding to viewing angles, such as {“side face”, “front face”, “face seen from above”, “face seen from below”}. Likewise, even if q is merely a “face in its twenties”, a plurality of expanded representation vectors such as {“teens face”, “…s face”, “…s face”, “…s face”, “…s face”} can be generated for the faces of q and S, so that the comparison is not a simple one but is carried out over a variety of generated independent variables. This enables applications such as finding a person who has been missing for decades, or transforming image information captured in a space A so that it looks as if it had been captured in other spaces such as {B, C, D, E} before the comparison is performed.

Further, referring to FIG. 8, because the images of S, as well as q, can be expanded into such representation vectors, clues that a simple comparison would miss can be found. For example, a direct comparison of q and s_i may reveal no significant similarity, yet g⁽¹⁾(q), obtained by dimension-reducing q so as to transform it to a first variable, may be similar to g⁽³⁾(s_i), obtained by dimension-reducing s_i so as to transform it to a third variable. In that case, the relationship between the first and third variables can be examined to infer how the events are associated and causally related.
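
The cross-variable comparison described above can be sketched as an exhaustive search over all pairs of representations. The vectors below are made up, cosine similarity is an assumed choice for the comparison, and `best_pair` is a hypothetical helper; in practice each vector would come from one of the k dimension reduction models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two representation vectors (an assumed choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_pair(q_reps, s_reps):
    """Compare every g^(a)(q) with every g^(b)(s_i) and return (a, b, score)
    for the most similar pair, using 1-based indices as in the text."""
    scored = [(cosine(u, v), a, b)
              for a, u in enumerate(q_reps, start=1)
              for b, v in enumerate(s_reps, start=1)]
    score, a, b = max(scored)
    return a, b, score

# Made-up representation vectors for k = 3 variables.
q_reps = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # g^(1)(q), g^(2)(q), g^(3)(q)
s_reps = [[0.0, -1.0], [-1.0, 1.0], [2.0, 0.1]]  # g^(1)(s_i), g^(2)(s_i), g^(3)(s_i)

a, b, score = best_pair(q_reps, s_reps)
```

With these toy values the strongest match is between the first variable of q and the third variable of s_i, mirroring the situation in the text where the same-variable comparisons reveal little.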

S201, the further processing in S202, and the processes (1) to (4) described above can be used as a kind of image preprocessing performed before the mutual comparison of given image information. They can be used in various services such as search and data cleansing.

In summary, the present invention differs from the prior art in the following respects.

Even when the image information of q or s_i is contaminated (occluded by some object, noisy, damaged, or captured at too small a size), the present invention can remove the contaminating element either through the dimension reduction performed by the nonlinear dimension reduction unit 151 or through further processing of the dimension-reduced q' or s_i' in that dimension by the semantic variable conversion unit 153. In particular, when only q contains a contaminating element, the further processing by the semantic variable conversion unit 153 can identify that element and inject it into s_i, which offsets the comparison mismatch caused by the contaminating element.

<Support for semantic-level comparison>

The present invention encodes input image information into semantic information through dimension reduction and then i) uses the encoded information as is, ii) weights the element values of specific attributes in the encoded information, iii) applies a predetermined transformation to the encoded information, or iv) after any one of i) to iii), converts the result back into the image domain through decoding, which expands the dimension again. The mutual comparison is then performed on the result of any one of i) to iv). The encoded information contains semantic information at the high level that ordinary people would recognize by common sense. By using this encoded information directly or indirectly in the comparison, semantic-level comparison and indirect comparison of q and s_i become possible.

<Active variable control at the semantic level>

The present invention also supports variable control at the semantic level. Variable control is a methodology widely used in scientific research and can equally be applied to comparisons between images. What distinguishes the present invention is that it provides a semantic-level variable control function through a process of actively transforming the images.

In the prior art, such variable control was performed in advance, at the stage of preparing q and S. For example, if S is “a set of faces of people not separated by gender” and is taken as the universal set, the subset obtained by extracting only “female faces” can be defined as S_f. To compare the images of the subset S_f with q, the prior art applies variable control to q and S_f in the same image dimension and then performs the comparison.

In contrast, the present invention enables semantic-level variable control by actively transforming the images corresponding to q or the elements of S while dimension-reducing them, for example: “match to a smiling face”, “match to a sad face”, “match to the face of an average Korean in their …s”, “match to the faces of people wearing glasses”, “match to the faces of people not wearing glasses”, “match to the faces of people wearing masks”, “match to the faces of people not wearing masks”, “match by converting to look like a photograph taken with a specific camera”, and “match the image style so that it looks like a picture drawn by Hong Gil-dong”.

That is, the prior art identifies the major features of the elements in a given set S, determines the detailed attributes of each feature, classifies the existing elements using each attribute as a variable to form subsets, and only then controls the variables. The present invention, by contrast, controls variables by selectively transforming and processing the attributes of the elements in the dimension-reduced state.
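
The contrast between the two approaches can be sketched as follows. The labels, the vectors and the position of the “glasses” attribute in the representation vector are made up for this illustration, and both function names are hypothetical.

```python
# Each gallery entry: (label assigned in advance, representation vector s_i').
# Labels, vectors and the "glasses" slot (index 1) are made up for illustration.
gallery = [("glasses", [0.2, 1.0]),
           ("no_glasses", [0.2, 0.0]),
           ("no_glasses", [0.7, 0.0])]

def conventional_control(gallery, wanted_label):
    """Prior-art style: classify in advance, then keep only the matching subset."""
    return [vec for label, vec in gallery if label == wanted_label]

def latent_control(gallery, slot, value):
    """Approach of the text: keep every element and overwrite one attribute slot."""
    return [vec[:slot] + [value] + vec[slot + 1:] for _, vec in gallery]

subset = conventional_control(gallery, "glasses")     # only 1 of 3 elements remains
unified = latent_control(gallery, slot=1, value=1.0)  # all 3 remain, all "with glasses"
```

Subset selection discards candidates that do not already satisfy the condition, whereas the latent transformation keeps the whole search target set comparable under the chosen condition.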

In particular, the present invention can define the attributes in advance, train a machine learning model that performs the dimensionality reduction in accordance with those attributes, and use the trained model for variable control. The present invention can also perform variable control by applying processing (arithmetic operations), such as linear combination, to the representation vectors that describe the latent space corresponding to each attribute.
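One way such a linear combination could look is sketched below. The text does not specify how the representation vector of an attribute is obtained, so the sketch assumes a common heuristic: the attribute direction is the difference between the mean reduced vectors of samples with and without the attribute. Removal by subtracting the projection onto that direction is likewise an assumption, and all names and values are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def attribute_direction(with_attr, without_attr):
    """Assumed heuristic: difference of class means in the reduced space."""
    return [a - b for a, b in zip(mean(with_attr), mean(without_attr))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shift(z, direction, alpha):
    """Linear combination z + alpha * direction (injects or weakens an attribute)."""
    return [x + alpha * d for x, d in zip(z, direction)]

def remove_attribute(z, direction):
    """Subtract the projection of z onto the attribute direction."""
    coeff = dot(z, direction) / dot(direction, direction)
    return shift(z, direction, -coeff)

# Toy reduced vectors (hypothetical): first element tracks "glasses".
glasses = [[2.0, 1.0], [2.0, 3.0]]
no_glasses = [[0.0, 1.0], [0.0, 3.0]]

d = attribute_direction(glasses, no_glasses)
z = [2.0, 5.0]                             # a reduced vector with glasses
z_removed = remove_attribute(z, d)         # attribute removed
z_injected = shift([0.0, 5.0], d, 1.0)     # attribute injected
print(d, z_removed, z_injected)
```

The same `shift` helper covers both directions of control: a negative coefficient removes or weakens an attribute, a positive one injects it.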

The present invention configured as described above provides an image comparison processing technique between a query input image and a search target set. By actively and selectively changing the high-level feature elements present in an image, it enables a new kind of image comparison processing in which variables can be controlled at the semantic level. In addition, because the image comparison processing according to the present invention performs its computation efficiently through data dimensionality reduction, it can compare faster than comparison in the image domain, where the data are large, and it can also reduce the amount of stored data. Furthermore, the present invention can be used in a variety of fields that require image-related search, similarity comparison, classification, and the like.
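The effect of the reduction on comparison cost and storage can be illustrated with a toy encoder and decoder. The weights below are random and untrained, so they only stand in for a pre-trained model; the layer sizes, the tanh nonlinearity, and the squared Euclidean distance are assumptions made for this sketch.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random dense-layer weights; a stand-in for trained weights (assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]

def forward(weights, x):
    """Dense layer followed by tanh, which makes the mapping nonlinear."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

rng = random.Random(0)
IMAGE_DIM, LATENT_DIM = 64, 4  # toy sizes chosen for illustration

W_enc = make_layer(IMAGE_DIM, LATENT_DIM, rng)  # input layer -> narrow hidden layer
W_dec = make_layer(LATENT_DIM, IMAGE_DIM, rng)  # narrow hidden layer -> output layer

def encode(x):
    """Reduced representation taken from the narrow hidden layer."""
    return forward(W_enc, x)

def decode(z):
    """Expand a reduced representation back to image size."""
    return forward(W_dec, z)

q = [rng.random() for _ in range(IMAGE_DIM)]
S = [[rng.random() for _ in range(IMAGE_DIM)] for _ in range(5)]

q_latent = encode(q)
S_latent = [encode(s) for s in S]  # can be computed once and stored

# Each comparison now touches LATENT_DIM values instead of IMAGE_DIM values.
best = min(range(len(S)), key=lambda i: sq_dist(q_latent, S_latent[i]))
reduction = IMAGE_DIM // LATENT_DIM
print(best, reduction)
```

With these toy sizes each stored element and each distance computation is 16 times smaller than in the image domain; the actual ratio depends on the image size and on the width of the hidden layer used.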

Although specific embodiments have been described in the detailed description of the present invention, various modifications are of course possible without departing from the scope of the present invention. The scope of the present invention is therefore not limited to the described embodiments and should be defined by the claims below and their equivalents.

100: image information processing device
110: input unit
120: communication unit
130: display
140: memory
150: input unit
151: nonlinear dimension reduction unit
152: semantic variable extraction unit
153: semantic variable conversion unit
154: nonlinear dimension expansion unit
154: learning unit
155: comparison unit

Claims

An image processing method in which each step is performed by a computing device, the method comprising:
a generating step of performing nonlinear data dimensionality reduction on each of query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i), to generate q' and s_i' that each represent attributes of the corresponding image with at least one element; and
a comparing step of indirectly comparing each s_i with q using the dimensionally reduced q' and s_i',
wherein q' and s_i' are generated using a model pre-trained by a machine learning technique,
wherein the model includes an input layer that receives input data, an output layer that outputs output data, and at least one hidden layer that connects the input layer and the output layer and has fewer nodes than each of the input layer and the output layer, and
wherein q' and s_i' are generated based on the at least one hidden layer, whose node values are set by inputting q and s_i, respectively, to the model.

An image processing method in which each step is performed by a computing device, the method comprising:
a generating step of performing nonlinear data dimensionality reduction on each of query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i), to generate q' and s_i' that each represent attributes of the corresponding image with at least one element; and
a comparing step of indirectly comparing each s_i with q using the dimensionally reduced q' and s_i',
wherein the generating step includes performing the data dimensionality reduction while, through a model pre-trained by a machine learning technique, removing from q' at least one attribute that contaminates q, or removing from the corresponding s_i' at least one attribute that contaminates any one s_i.

The image processing method according to claim 1 or 2,
wherein the generating step includes generating a plurality of q' representing mutually different attributes, and generating a plurality of each of s₁' to s_N' so as to correspond to the attribute of each q'.

The image processing method according to claim 1 or 2,
wherein the comparing step compares q' and s_i' directly, or compares the elements of at least one attribute included in q' and s_i'.

The image processing method according to claim 1 or 2,
wherein the comparing step includes additionally processing at least one of q' and s_i' within the corresponding reduced dimension and then performing the comparison.

The image processing method according to claim 5,
wherein the additional processing includes changing an element value of q' or s_i' related to at least one attribute.

The image processing method according to claim 5,
further comprising an extraction step of extracting information about the attributes of q and s_i,
wherein the comparing step includes additionally processing at least one of q' and s_i' within the corresponding reduced dimension using the information extracted in the extraction step.

The image processing method according to claim 5,
wherein the additional processing includes removing from q' at least one attribute that contaminates q, or removing from the corresponding s_i' at least one attribute that contaminates any one s_i.

The image processing method according to claim 5,
wherein the additional processing includes injecting at least one attribute that contaminates q into each s_i'.

The image processing method according to claim 1 or 2,
wherein the comparing step includes controlling variables by selectively changing attributes of q' or s_i'.

The image processing method according to claim 1 or 2,
wherein the comparing step includes decoding q' and s_i' and comparing the decoded results, or additionally processing q' or s_i' within the corresponding reduced dimension, decoding the result, and then performing the comparison.

An image processing device comprising:
a memory storing query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i); and
a control unit that controls comparison of each s_i with q using q and s_i stored in the memory,
wherein the control unit performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i' that each represent attributes of the corresponding image with at least one element, and indirectly compares each s_i with q using the dimensionally reduced q' and s_i',
wherein q' and s_i' are generated using a model pre-trained by a machine learning technique,
wherein the model includes an input layer that receives input data, an output layer that outputs output data, and at least one hidden layer that connects the input layer and the output layer and has fewer nodes than each of the input layer and the output layer, and
wherein q' and s_i' are generated based on the at least one hidden layer, whose node values are set by inputting q and s_i, respectively, to the model.

An image processing device comprising:
a memory storing query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i); and
a control unit that controls comparison of each s_i with q using q and s_i stored in the memory,
wherein the control unit performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i' that each represent attributes of the corresponding image with at least one element, and indirectly compares each s_i with q using the dimensionally reduced q' and s_i', and
wherein, when generating q' and s_i', the control unit performs the data dimensionality reduction while, through a model pre-trained by a machine learning technique, removing from q' at least one attribute that contaminates q, or removing from the corresponding s_i' at least one attribute that contaminates any one s_i.

An image processing device comprising:
a communication unit that receives, from a first device, query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i); and
a control unit that controls comparison of each s_i with q using q and s_i received by the communication unit, and controls transmission of the comparison result to the first device or a second device,
wherein the control unit performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i' that each represent attributes of the corresponding image with at least one element, and indirectly compares each s_i with q using the dimensionally reduced q' and s_i',
wherein q' and s_i' are generated using a model pre-trained by a machine learning technique,
wherein the model includes an input layer that receives input data, an output layer that outputs output data, and at least one hidden layer that connects the input layer and the output layer and has fewer nodes than each of the input layer and the output layer, and
wherein q' and s_i' are generated based on the at least one hidden layer, whose node values are set by inputting q and s_i, respectively, to the model.

An image processing device comprising:
a communication unit that receives, from a first device, query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i); and
a control unit that controls comparison of each s_i with q using q and s_i received by the communication unit, and controls transmission of the comparison result to the first device or a second device,
wherein the control unit performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i' that each represent attributes of the corresponding image with at least one element, and indirectly compares each s_i with q using the dimensionally reduced q' and s_i', and
wherein, when generating q' and s_i', the control unit performs the data dimensionality reduction while, through a model pre-trained by a machine learning technique, removing from q' at least one attribute that contaminates q, or removing from the corresponding s_i' at least one attribute that contaminates any one s_i.

An image processing device comprising:
a memory storing query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i);
a nonlinear dimension conversion unit that performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i' that each represent attributes of the corresponding image with at least one element; and
a comparison unit that compares each s_i with q using q' and s_i',
wherein q' and s_i' are generated using a model pre-trained by a machine learning technique,
wherein the model includes an input layer that receives input data, an output layer that outputs output data, and at least one hidden layer that connects the input layer and the output layer and has fewer nodes than each of the input layer and the output layer, and
wherein q' and s_i' are generated based on the at least one hidden layer, whose node values are set by inputting q and s_i, respectively, to the model.

An image processing device comprising:
a memory storing query image information (q) and N pieces (where N is a natural number of 2 or more) of search target image information (s₁ to s_N) (hereinafter, s₁ to s_N are referred to as s_i);
a nonlinear dimension conversion unit that performs nonlinear data dimensionality reduction on each of q and s_i to generate q' and s_i' that each represent attributes of the corresponding image with at least one element; and
a comparison unit that compares each s_i with q using q' and s_i',
wherein, when generating q' and s_i', the nonlinear dimension conversion unit performs the data dimensionality reduction while, through a model pre-trained by a machine learning technique, removing from q' at least one attribute that contaminates q, or removing from the corresponding s_i' at least one attribute that contaminates any one s_i.