KR102476679B1

KR102476679B1 - Apparatus and method for object detection

Info

Publication number: KR102476679B1
Application number: KR1020210024081A
Authority: KR
Inventors: 이성환; 정홍규; 김건욱
Original assignee: 고려대학교 산학협력단
Priority date: 2020-02-24
Filing date: 2021-02-23
Publication date: 2022-12-13
Anticipated expiration: 2041-02-23
Also published as: KR20210107551A

Abstract

An embodiment of the present invention provides an object detection apparatus that uses the similarity between categories and an eigenvector of each of the categories. The object detection apparatus includes an input module and a processor that analyzes data input to the input module. The processor is configured to: set an eigenvector for each of the primary categories using a similarity graph, generated to represent the similarity between the primary categories based on sentence data for the primary categories, and feature information of objects extracted through a CNN (Convolution Neural Network) from a primary image corresponding to the primary categories; modify the similarity graph, using sentence data for a secondary category different from the primary categories, so that it represents the similarity between categories including the primary categories and the secondary category; and set an eigenvector of the secondary category using the feature information of the objects extracted from the primary image, the modified similarity graph, and feature information of objects extracted through the CNN from a secondary image corresponding to the secondary category.

Description

Apparatus and method for object detection

The present invention relates to an object detection apparatus and method and, more particularly, to an object detection apparatus and method that use eigenvectors and a similarity graph generated from the correlation between categories, so that deep learning-based object detection is possible with only a small amount of training data for objects of a new category.

Object detection, which identifies the location and type of objects in an image, is a core element of image processing technology. Object detection technology has recently advanced rapidly thanks to big data and deep learning. Deep learning-based object detection uses a large amount of training data to detect the location and type of the various objects present in an image more accurately.

Highly accurate object detection requires a large amount of training data whose correct answers have been labeled in advance, yet most industrial and academic fields still lack sufficient data. Because it takes considerable time and cost for people to identify the type and location of objects in images from many fields, deep learning-based object detection remains difficult to use in practice. In addition, it is practically difficult to secure a large amount of additional training data for a new type of object so that it can be learned in a real service environment.

Accordingly, research on few-shot learning has recently attracted attention. Few-shot learning aims to correctly predict objects of a new category in an image when a model trained on a large amount of training data is given only a small amount of training data for that new, previously unlearned category. Conventional few-shot learning research has been limited to image classification, which predicts the type of an object in an image. Meanwhile, building training data for object detection costs relatively more than building training data for object classification. Interest in few-shot object detection is therefore growing.

The present invention is intended to solve the problems described above. Its technical object is to provide an object detection apparatus and method that use eigenvectors and a similarity graph generated from the correlation between categories, so that deep learning-based object detection is possible with only a small amount of training data for objects of a new category.

The technical objects of the present invention are not limited to the one stated above, and further technical objects may be derived from the following description.

As a technical means for achieving the above object, one aspect of the present invention provides an object detection apparatus that uses the similarity between categories and an eigenvector of each of the categories. The object detection apparatus includes an input module and a processor that analyzes data input to the input module. The processor is configured to: set an eigenvector for each of the primary categories using a similarity graph, generated to represent the similarity between the primary categories based on sentence data for the primary categories, and feature information of objects extracted through a CNN (Convolution Neural Network) from a primary image corresponding to the primary categories; modify the similarity graph, using sentence data for a secondary category different from the primary categories, so that it represents the similarity between categories including the primary categories and the secondary category; and set an eigenvector of the secondary category using the feature information of the objects extracted from the primary image, the modified similarity graph, and feature information of objects extracted through the CNN from a secondary image corresponding to the secondary category.

Another aspect of the present invention provides an object detection method that uses the similarity between categories and an eigenvector of each of the categories. The object detection method includes: setting an eigenvector for each of the primary categories using a similarity graph, generated to represent the similarity between the primary categories based on sentence data for the primary categories, and feature information of objects extracted through a CNN (Convolution Neural Network) from a primary image corresponding to the primary categories; modifying the similarity graph, using sentence data for a secondary category different from the primary categories, so that it represents the similarity between categories including the primary categories and the secondary category; and setting an eigenvector of the secondary category using the feature information of the objects extracted from the primary image, the modified similarity graph, and feature information of objects extracted through the CNN from a secondary image corresponding to the secondary category.

According to the means described above, highly accurate deep learning-based object detection is possible with only a small amount of training data for objects of a new category.

In addition, according to the present invention, high-speed machine learning can be performed in fields, such as robotics, where training data for object detection is extremely scarce.

In addition, according to the present invention, an existing object detection model trained on a large amount of data can be adapted to a new service environment by additionally applying only a small amount of new training data to it.

The effects of the present invention are not limited to those described above and include all effects that can be understood from the following description.

FIG. 1 is a block diagram showing the configuration of an object detection apparatus according to an embodiment of the present invention.
FIGS. 2 to 4 are diagrams for explaining an object detection method using the object detection apparatus shown in FIG. 1.
FIG. 5 is a flowchart showing the sequence of an object detection method according to another embodiment of the present invention.
FIGS. 6 and 7 are flowcharts showing the detailed process of some steps of the object detection method shown in FIG. 5.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. The accompanying drawings are provided only to make the embodiments disclosed in this specification easier to understand and do not limit the technical ideas disclosed in it. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and the size, form, and shape of each component shown in the drawings may be modified in various ways. Throughout the specification, the same or similar parts are given the same or similar reference numerals.

The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification and do not in themselves have distinct meanings or roles. In addition, where a detailed description of related known technology was judged likely to obscure the gist of the embodiments disclosed in this specification, that description has been omitted.

Throughout the specification, when a part is said to be "connected (joined, contacted, or coupled)" to another part, this covers not only the case where it is "directly connected (joined, contacted, or coupled)" but also the case where it is "indirectly connected (joined, contacted, or coupled)" with another member in between. Likewise, when a part is said to "include (have or be provided with)" a component, this means that, unless specifically stated otherwise, it may further "include (have or be provided with)" other components rather than excluding them.

Ordinal terms such as "first" and "second" are used in this specification only to distinguish one component from another and do not limit the order of or relationship between components. For example, a first component of the present invention may be named a second component, and similarly a second component may be named a first component.

The "apparatus" referred to below may be implemented as a computer or a portable terminal capable of connecting to a server or another terminal over a network. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser. The portable terminal is, for example, a wireless communication device that guarantees portability and mobility, and may include all kinds of handheld wireless communication devices such as terminals based on IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), Wibro (Wireless Broadband Internet), or LTE (Long Term Evolution) communication, smartphones, and tablet PCs. The network may be implemented as a wired network such as a Local Area Network (LAN), a Wide Area Network (WAN), or a Value Added Network (VAN), or as any kind of wireless network such as a mobile radio communication network or a satellite communication network.

FIG. 1 is a block diagram showing the configuration of an object detection apparatus 100 according to an embodiment of the present invention, and FIGS. 2 to 4 are diagrams for explaining object detection using the object detection apparatus 100. The object detection apparatus 100 is described in detail below with reference to FIGS. 1 to 4.

Referring to FIG. 1, the object detection apparatus 100 includes an input module 110 and a processor 120 and may further include a memory 130. The input module 110 may include a device containing the hardware and software needed to transmit and receive signals, such as control signals or data signals, over a wired or wireless connection with other network devices. The processor 120 may include various kinds of devices that control and process data. The processor 120 may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program. In one example, the processor 120 may be implemented as a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or the like, but the scope of the present invention is not limited to these.

The memory 130 should be interpreted as a collective term for non-volatile storage devices, which retain stored information even when power is not supplied, and volatile storage devices, which require power to retain stored information. The memory 130 may temporarily or permanently store data processed by the processor 120. In addition to volatile storage devices that require power to retain stored information, the memory 130 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited to these.

The input module 110 receives the data required by the programs and functions executed by the processor 120. The input module 110 may receive image data corresponding to each of several categories for the deep learning and object detection performed by the processor 120. Here, a category means a type of object. In one example, the categories may be person, bicycle, and airplane, but the scope of the present invention is not limited to these. Although not shown in the drawings, the input module 110 may include a camera module that captures an object to obtain an image, a computer input device, a data input circuit, and a communication module.

The processor 120 analyzes the data input to the input module 110. The processor 120 may set an eigenvector for each of the primary categories using a similarity graph generated to represent the similarity between the primary categories and feature information of objects extracted through a CNN (Convolution Neural Network) from a primary image corresponding to the primary categories. The CNN, a convolutional neural network, should be interpreted as covering the various artificial neural network models used for deep learning-based data learning, object classification, and image data analysis, and in particular for detecting objects in image data. In one example, the processor 120 may use a fully connected layer (FCL) to learn to infer, from image data, the location where an object exists and its shape and type. In addition, the processor 120 may use global average pooling (GAP) to reduce the dimensions of an image and turn it into a vector, and may learn so that this vector represents the unique characteristics of each category.
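As a hedged illustration of the global average pooling step mentioned above, the following NumPy sketch collapses a feature map into a single vector. The array layout (channels, height, width), the function name `global_average_pool`, and the toy values are assumptions made for this example; the specification does not state them.

```python
import numpy as np

def global_average_pool(feature_map):
    """Average each channel over its spatial positions.

    feature_map: array of shape (channels, height, width) (assumed layout).
    Returns a vector of shape (channels,).
    """
    feature_map = np.asarray(feature_map, dtype=float)
    if feature_map.ndim != 3:
        raise ValueError("expected a (channels, height, width) array")
    return feature_map.mean(axis=(1, 2))

# Hypothetical 2-channel, 2x2 feature map standing in for CNN output.
fmap = np.array([[[1.0, 2.0], [3.0, 4.0]],
                 [[0.0, 0.0], [0.0, 8.0]]])
pooled = global_average_pool(fmap)
```

Each entry of `pooled` summarizes one channel, which is what lets a whole image region be compared with a per-category vector of the same length.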

The feature information of the objects extracted from the primary image through the CNN may include at least one of shape information, size information, and contour information for each of the objects extracted from the primary image. In one example, the primary categories may include person, airplane, bicycle, bus, train, motorcycle, monitor, TV, chair, sofa, cat, tiger, cow, and sheep, but the scope of the present invention is not limited to these. The primary image means an image that contains objects corresponding to each of the primary categories. In one example, the primary image may include an image of a person, an image of an airplane, and an image of a bus.

The similarity graph represents the similarity calculated from sentence data for the primary categories. The similarity graph may be generated by the processor 120 and stored in the memory 130. To generate the similarity graph, the processor 120 first performs word embedding that takes into account the co-occurrence frequency of the words contained in the sentence data for the categories. Here, word embedding may mean using Global Vectors for Word Representation (GloVe) to optimize the process of embedding the relationships between words based on how often the words co-occur in sentences. The processor 120 then calculates the cosine similarity between the resulting word embeddings and generates a similarity graph that represents the similarity between categories in a form such as a matrix.
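The cosine similarity step described above can be sketched as follows. This is a minimal illustration that assumes each category already has a word embedding; the function name `build_similarity_graph` and the 3-dimensional toy vectors are hypothetical and are not GloVe output.

```python
import numpy as np

def build_similarity_graph(embeddings):
    """Return (names, matrix), where matrix[i, j] is the cosine similarity
    between the embeddings of categories i and j."""
    names = list(embeddings)
    stacked = np.stack([np.asarray(embeddings[n], dtype=float) for n in names])
    # Normalize each row to unit length so a dot product equals the cosine.
    unit = stacked / np.linalg.norm(stacked, axis=1, keepdims=True)
    return names, unit @ unit.T

# Hypothetical toy embeddings; real word embeddings are much longer.
toy_embeddings = {
    "person": [1.0, 0.2, 0.0],
    "bicycle": [0.3, 1.0, 0.1],
    "motor bike": [0.2, 1.0, 0.3],
    "airplane": [0.1, 0.2, 1.0],
}
names, graph = build_similarity_graph(toy_embeddings)
```

Storing the result as a symmetric matrix keeps the "graph" easy to extend later: a new category only needs one additional row and column.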

Referring to FIG. 2, which shows an example of the similarity graph, the similarity between person and airplane may be set to 0.32, between person and motor bike to 0.52, between person and bicycle to 0.53, between person and horse to 0.5, and between motor bike and bicycle to 0.9. The similarity graph according to the present invention is not limited to the form shown in FIG. 2 and may be generated as any of various graph forms capable of showing the names of the categories and the similarity values between them.

The primary categories may include a first, a second, and a third category. When the similarity between the first category and the second category is greater than the similarity between the first category and the third category, the processor 120 may set the eigenvectors so that the difference between the eigenvector of the first category and the eigenvector of the second category is smaller than the difference between the eigenvector of the first category and the eigenvector of the third category. In this regard, refer to FIG. 3, which shows an example in which the eigenvectors set for each of the primary categories are visualized as a graph with an X axis and a Y axis.

As described above with reference to FIG. 2, the similarity between bicycle and motor bike (0.9) is higher than the similarity between bicycle and person (0.53). Accordingly, in the graph of FIG. 3, the point for bicycle is set closer to the point for motor bike than to the point for person. That is, the magnitude of the difference between the eigenvector of bicycle and the eigenvector of motor bike is set smaller than the magnitude of the difference between the eigenvector of bicycle and the eigenvector of person. Setting each category's eigenvector in consideration of the similarity between categories, and not only the feature information of objects, makes it possible to set the eigenvector of a new category effectively, as described below.
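The ordering constraint in this passage can be checked with a few lines of code. The sketch below uses the two similarity values quoted from FIG. 2; the 2-D vectors and the function name `ordering_is_consistent` are hypothetical and only stand in for the points plotted in FIG. 3.

```python
import numpy as np

def ordering_is_consistent(vectors, similarity, anchor, near, far):
    """Check that the category more similar to `anchor` is also closer to it."""
    if similarity[(anchor, near)] <= similarity[(anchor, far)]:
        raise ValueError("`near` must be more similar to `anchor` than `far`")
    d_near = np.linalg.norm(vectors[anchor] - vectors[near])
    d_far = np.linalg.norm(vectors[anchor] - vectors[far])
    return bool(d_near < d_far)

# Similarity values taken from the FIG. 2 example in the text.
similarity = {("bicycle", "motor bike"): 0.9, ("bicycle", "person"): 0.53}
# Hypothetical 2-D category vectors.
vectors = {
    "bicycle": np.array([1.0, 1.0]),
    "motor bike": np.array([1.2, 0.9]),
    "person": np.array([3.0, 2.5]),
}
consistent = ordering_is_consistent(
    vectors, similarity, "bicycle", "motor bike", "person"
)
```

A check like this could serve as a sanity test after training, independent of how the vectors were actually learned.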

The processor 120 may modify the similarity graph, using sentence data for a secondary category different from the primary categories, so that it represents the similarity between categories including the primary categories and the secondary category. In one example, the processor 120 may use sentence data for the electric kick scooter, a new category not included in the primary categories, to modify a similarity graph that shows only the similarity between the primary categories (person, bicycle, motorcycle, airplane) so that it also includes the secondary category. The modification may be performed in a manner similar to the similarity graph generation process described above.
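One way to picture this modification is as appending a row and a column to an existing similarity matrix. The sketch below assumes the graph is stored as a NumPy matrix and that word embeddings for the existing categories are still available; `cosine`, `add_category`, and all vectors are hypothetical names and values chosen for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def add_category(names, graph, embeddings, new_name, new_embedding):
    """Return a copy of the graph with one extra row and column for new_name."""
    n = len(names)
    new_row = np.array([cosine(embeddings[name], new_embedding) for name in names])
    extended = np.ones((n + 1, n + 1))  # the diagonal stays at 1.0
    extended[:n, :n] = graph            # existing similarities are kept as-is
    extended[n, :n] = new_row
    extended[:n, n] = new_row
    return names + [new_name], extended

# Hypothetical 2-D embeddings for two primary categories.
embeddings = {"person": [1.0, 0.2], "bicycle": [0.3, 1.0]}
names = ["person", "bicycle"]
s = cosine(embeddings["person"], embeddings["bicycle"])
graph = np.array([[1.0, s], [s, 1.0]])
new_names, new_graph = add_category(
    names, graph, embeddings, "electric kick scooter", [0.4, 1.0]
)
```

Because the primary-category block is copied unchanged, adding a secondary category in this way does not disturb similarities that were already computed.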

The processor 120 may set the eigenvector of the secondary category using the feature information of the objects extracted from the primary image, the modified similarity graph, and feature information of objects extracted through the CNN from a secondary image corresponding to the secondary category. In one example, when the processor 120 sets the eigenvector of the secondary category after the eigenvectors of the primary categories have already been set, it can set that eigenvector with only a small number of images corresponding to the secondary category.
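The text does not give a formula for combining these three inputs. Purely as an assumption for illustration, the sketch below blends a similarity-weighted average of the primary-category vectors with the mean feature vector of a few secondary-category images; the function name `init_secondary_vector`, the blending weight `alpha`, and all numbers are hypothetical, and this is not presented as the method of the specification.

```python
import numpy as np

def init_secondary_vector(primary_vectors, similarity_to_new,
                          few_shot_features, alpha=0.5):
    """Blend a similarity-weighted prior with few-shot evidence.

    primary_vectors:   (K, d) vectors already set for K primary categories.
    similarity_to_new: (K,) similarities between each primary category and
                       the new category, read from the modified graph.
    few_shot_features: (N, d) pooled CNN features of the N few-shot images.
    alpha:             hypothetical weight given to the prior.
    """
    primary_vectors = np.asarray(primary_vectors, dtype=float)
    weights = np.asarray(similarity_to_new, dtype=float)
    few_shot_features = np.asarray(few_shot_features, dtype=float)
    if weights.sum() <= 0:
        raise ValueError("similarities must have a positive sum")
    prior = (weights / weights.sum()) @ primary_vectors
    evidence = few_shot_features.mean(axis=0)
    return alpha * prior + (1.0 - alpha) * evidence

vector = init_secondary_vector(
    primary_vectors=[[0.0, 0.0], [2.0, 2.0]],
    similarity_to_new=[0.25, 0.75],
    few_shot_features=[[1.0, 1.0], [3.0, 1.0]],
)
```

The design intent of such a blend is that categories with high similarity pull the new vector toward themselves, so only a few images are needed to refine it.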

In more detail, the processor 120 may first divide the primary image to obtain primary divided images for each location within the primary image. The processor 120 may determine, from the primary divided images, the location where an object exists within the primary image. The processor 120 may identify an object in the primary image and classify the identified object into one of the primary categories. The processor 120 may perform a convolution between each primary divided image in which an object exists and the eigenvector set for the category of the object in that divided image. A convolution result may thereby be generated for each of the primary categories. Using the convolution result for each of the primary categories, the processor 120 may perform more precise object analysis on images corresponding to each of the primary categories.

Next, the processor 120 may divide the secondary image to obtain secondary divided images for each location within the secondary image. The processor 120 may determine, from the secondary divided images, the location where an object exists within the secondary image. The processor 120 may perform a convolution between each secondary divided image in which an object exists and the eigenvector set for the secondary category. A convolution result may thereby be generated for the secondary category. Using the convolution result for the secondary category, the processor 120 may perform more precise object analysis on images corresponding to the secondary category.
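The convolution between a divided image and a category vector can be sketched as a 1x1 convolution over a CNN feature map, where each grid cell plays the role of one divided image. This reading, the (channels, height, width) layout, and the function name `category_response` are assumptions for illustration rather than details given in the text.

```python
import numpy as np

def category_response(feature_map, category_vector):
    """Apply the category vector as a 1x1 convolution kernel.

    feature_map:     (channels, height, width) features, one cell per region.
    category_vector: (channels,) vector set for one category.
    Returns a (height, width) map of per-region responses.
    """
    feature_map = np.asarray(feature_map, dtype=float)
    category_vector = np.asarray(category_vector, dtype=float)
    if feature_map.shape[0] != category_vector.shape[0]:
        raise ValueError("channel count must match vector length")
    # A 1x1 convolution is a dot product over channels at every location.
    return np.einsum("c,chw->hw", category_vector, feature_map)

fmap = np.zeros((2, 2, 2))
fmap[:, 0, 1] = [1.0, 2.0]  # only the top-right cell holds object-like features
response = category_response(fmap, [3.0, 0.5])
```

Cells whose features align with the category vector produce large responses, which is one way the location and the category of an object can be scored together.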

The primary image and the secondary image described above each include a plurality of images. The number of images included in the primary image may be set to be greater than the number of images included in the secondary image. In one example, the processor 120 may first complete data learning for objects classified into the primary categories based on thousands of images corresponding to the primary categories. The processor 120 may then perform data learning for objects classified into the secondary category using only a small number of images corresponding to the secondary category, for example just a few images of the secondary category. In this way, by using the eigenvectors already set for the primary categories and the similarity graph, the processor 120 may more accurately extract the locations and types of objects from images classified into the secondary category.
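One way to picture how a handful of secondary images can be combined with what was learned from the primary categories is sketched below. The text does not specify the update rule, so the blend used here is purely an assumption for illustration: a similarity-weighted average of the primary vectors serves as a prior, and it is mixed with the mean feature of the few secondary images through a hypothetical weight `alpha`. The function names are likewise hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def secondary_vector(primary_vectors: Dict[str, Vector],
                     similarity_to_new: Dict[str, float],
                     few_shot_features: Sequence[Vector],
                     alpha: float = 0.5) -> Vector:
    """Blend a similarity-weighted prior with the mean of the few-shot features.

    ASSUMPTION: this weighting scheme is illustrative only; the source does not
    state how the secondary category's vector is computed.
    """
    total = sum(similarity_to_new[name] for name in primary_vectors)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    dim = len(next(iter(primary_vectors.values())))
    prior = [
        sum(similarity_to_new[name] * vec[i] for name, vec in primary_vectors.items()) / total
        for i in range(dim)
    ]
    observed = mean_vector(few_shot_features)
    return [alpha * p + (1 - alpha) * o for p, o in zip(prior, observed)]


# Two primary categories learned from many images (toy 2-D vectors).
primary = {"cat": [1.0, 0.0], "truck": [0.0, 1.0]}
# Similarity of the new category to each primary category, read off the graph.
similarity = {"cat": 0.75, "truck": 0.25}
# Features from only two images of the new category.
few_shot = [[1.0, 0.0], [0.5, 0.5]]
new_vector = secondary_vector(primary, similarity, few_shot)
```

The design intent of such a blend is that the prior keeps the new vector close to the vectors of similar, well-trained categories when very few secondary images are available.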

Reference is now made to FIG. 4, which shows an example of the image data analysis process of the object detection apparatus 100. After performing deep learning on the primary image corresponding to the primary categories and the secondary image corresponding to the secondary category, the processor 120 may receive an analysis target image 410 corresponding to any one of the categories including the primary categories and the secondary category. The processor 120 may identify an object from the analysis target image by using the CNN, the convolution result for each of the primary categories, and the convolution result for the secondary category. Reference numeral 420 indicates that the processor 120 determines, within the analysis target image, the location of the object identified from the analysis target image, and classifies the identified object into any one of the categories including the primary categories and the secondary category. Finally, an analysis result 430 in which the object has been identified from the analysis target image 410 may be generated.

FIG. 5 is a flowchart showing the sequence of an object detection method according to another embodiment of the present invention, and FIGS. 6 and 7 are flowcharts showing the detailed processes of some steps of the object detection method shown in FIG. 5. Hereinafter, the object detection method according to the present embodiment is described in detail with reference to FIGS. 5 to 7.

The object detection method according to the present embodiment may be performed by the object detection apparatus (100 in FIG. 1) described above with reference to FIGS. 1 to 4. For example, the steps described below may be executed by the processor (120 in FIG. 1) of the object detection apparatus (100 in FIG. 1). Accordingly, the description of the object detection apparatus (100 in FIG. 1) given above with reference to FIGS. 1 to 4 applies equally to the object detection method below.

The object detection method according to the present embodiment includes: setting an eigenvector of each of the primary categories by using a similarity graph generated to indicate the similarity between the primary categories based on sentence data for the primary categories, and feature information of objects extracted, through a convolutional neural network (CNN), from a primary image corresponding to the primary categories (S110); modifying the similarity graph, by using sentence data for a secondary category different from the primary categories, so that the similarity graph indicates the similarity between categories including the primary categories and the secondary category (S120); and setting an eigenvector of the secondary category by using the feature information of the objects extracted from the primary image, the modified similarity graph, and feature information of objects extracted, through the CNN, from a secondary image corresponding to the secondary category (S130).
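The construction of a similarity graph from sentence data, and its modification when a secondary category is added (S120), can be sketched as follows. The text does not say how sentences are turned into similarities, so this sketch substitutes a simple bag-of-words cosine similarity as a stand-in for whatever sentence representation is actually used. All function names and the example sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # graph[a][b] = similarity between categories a and b


def bag_of_words(sentence: str) -> Counter:
    """Lower-cased word counts; a stand-in for a real sentence embedding."""
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(sentences: Dict[str, str]) -> Graph:
    """Fully connected similarity graph over the given categories."""
    bows = {name: bag_of_words(text) for name, text in sentences.items()}
    return {a: {b: cosine(bows[a], bows[b]) for b in bows if b != a} for a in bows}


def add_category(graph: Graph, sentences: Dict[str, str], name: str, text: str) -> Graph:
    """Return a copy of the graph extended with edges to one new category."""
    new_bow = bag_of_words(text)
    updated = {a: dict(edges) for a, edges in graph.items()}
    updated[name] = {}
    for other, other_text in sentences.items():
        s = cosine(new_bow, bag_of_words(other_text))
        updated[other][name] = s
        updated[name][other] = s
    return updated


primary_sentences = {
    "cat": "a small furry animal",
    "truck": "a large road vehicle",
}
graph = build_graph(primary_sentences)
extended = add_category(graph, primary_sentences, "dog", "a furry animal")
```

Here the existing edges between primary categories are left untouched and only the edges to the new category are added, which is one plausible reading of "modifying" the graph.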

Step S110 may include: dividing the primary image to obtain primary divided images for each location within the primary image (S111); determining, from the primary divided images, the location at which an object exists in the primary image (S112); and identifying an object in the primary image and classifying the identified object into one of the primary categories (S113). Step S110 may further include generating a convolution result for each of the primary categories by performing a convolution of each primary divided image in which an object exists with the eigenvector set for the category of the object in that divided image (S114).

The feature information of the objects extracted from the primary image may include at least one of shape information, size information, and contour information of each of the objects extracted from the primary image.

The primary categories include first to third categories. In step S110, when the similarity between the first category and the second category is greater than the similarity between the first category and the third category, the eigenvectors may be set such that the difference between the eigenvector set for the first category and the eigenvector set for the second category is smaller than the difference between the eigenvector set for the first category and the eigenvector set for the third category.
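The ordering condition above can be checked mechanically. The sketch below assumes that the "difference" between two eigenvectors is measured as Euclidean distance, which the text does not specify, and the helper name `ordering_violations` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance; ASSUMED here as the measure of vector 'difference'."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ordering_violations(anchor: str,
                        similarity_to_anchor: Dict[str, float],
                        vectors: Dict[str, Sequence[float]]) -> List[Tuple[str, str]]:
    """List pairs (a, b) where a is more similar to the anchor than b,
    yet a's vector is not strictly closer to the anchor's vector than b's."""
    others = [name for name in vectors if name != anchor]
    violations = []
    for a in others:
        for b in others:
            if similarity_to_anchor[a] > similarity_to_anchor[b]:
                if not distance(vectors[anchor], vectors[a]) < distance(vectors[anchor], vectors[b]):
                    violations.append((a, b))
    return violations


vectors = {"first": [0.0, 0.0], "second": [1.0, 0.0], "third": [3.0, 4.0]}

# "second" is more similar to "first" and is also closer: condition satisfied.
ok = ordering_violations("first", {"second": 0.9, "third": 0.2}, vectors)

# If "third" were the more similar category, the same vectors would violate it.
bad = ordering_violations("first", {"second": 0.1, "third": 0.8}, vectors)
```

With these toy vectors the distances from "first" are 1 and 5, so the first call reports no violation and the second reports the pair `("third", "second")`.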

The primary image used in step S110 and the secondary image used in step S130 each include a plurality of images, and the number of images included in the primary image may be set to be greater than the number of images included in the secondary image.

Step S130 may include: dividing the secondary image to obtain secondary divided images for each location within the secondary image (S131); determining, from the secondary divided images, the location at which an object exists in the secondary image (S132); and generating a convolution result for the secondary category by performing a convolution of each secondary divided image in which an object exists with the eigenvector set for the secondary category (S133).

The object detection method according to the present embodiment may further include, after step S130: receiving an analysis target image corresponding to any one of the categories including the primary categories and the secondary category (S140); and identifying an object from the analysis target image by using the CNN, the convolution result for each of the primary categories, and the convolution result for the secondary category (S150).

Step S150 may include: determining, within the analysis target image, the location of the object identified from the analysis target image (S151); and classifying the object identified from the analysis target image into any one of the categories including the primary categories and the secondary category (S152).
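The locating and classifying sub-steps can be illustrated with a short sketch. It assumes that each location of the analysis target image has already been reduced to a feature vector (for example by a CNN), that a location's score for a category is the inner product of its feature with that category's vector, and that a hypothetical `threshold` decides whether an object is present at all. None of these specifics come from the text.

```python
from typing import Dict, List, Sequence, Tuple

Position = Tuple[int, int]


def score(feature: Sequence[float], vector: Sequence[float]) -> float:
    """Inner product between a location's feature and a category vector."""
    return sum(f * w for f, w in zip(feature, vector))


def detect(patch_features: Dict[Position, Sequence[float]],
           category_vectors: Dict[str, Sequence[float]],
           threshold: float) -> List[Tuple[Position, str, float]]:
    """For each location, pick the best-scoring category (classification)
    and keep the location only if that score reaches the threshold (localization)."""
    detections = []
    for position, feature in sorted(patch_features.items()):
        best_name, best_score = max(
            ((name, score(feature, vec)) for name, vec in category_vectors.items()),
            key=lambda item: item[1],
        )
        if best_score >= threshold:
            detections.append((position, best_name, best_score))
    return detections


# Toy features for three locations of an analysis target image.
patch_features = {
    (0, 0): [0.0, 0.0],   # background
    (0, 1): [0.9, 0.1],
    (1, 0): [0.1, 0.8],
}
# "cat" stands for a primary category, "dog" for the secondary category.
category_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
detections = detect(patch_features, category_vectors, threshold=0.5)
labels = [(position, name) for position, name, _ in detections]
```

Because the primary and secondary vectors are scored in the same loop, an object of the few-shot secondary category is located and classified in exactly the same way as one from a primary category.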

The object detection method described with reference to FIGS. 5 to 7 may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is illustrative, and a person of ordinary skill in the art to which the present invention pertains will understand that it can be readily modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form. The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

An object detection apparatus using a similarity between categories and an eigenvector set for each of the categories, the apparatus comprising:
an input module; and
a processor configured to analyze data input to the input module,
wherein the processor is configured to perform:
setting an eigenvector of each of primary categories by using a similarity graph generated to indicate the similarity between the primary categories based on sentence data for the primary categories, and feature information of objects extracted, through a convolutional neural network (CNN), from a primary image corresponding to the primary categories;
modifying the similarity graph, by using sentence data for a secondary category different from the primary categories, so that the similarity graph indicates the similarity between categories including the primary categories and the secondary category; and
setting an eigenvector of the secondary category by using the feature information of the objects extracted from the primary image, the modified similarity graph, and feature information of objects extracted, through the CNN, from a secondary image corresponding to the secondary category.

The object detection apparatus of claim 1,
wherein the processor is configured to further perform:
dividing the primary image to obtain primary divided images for each location in the primary image, determining, from the primary divided images, a location at which an object exists in the primary image, and identifying an object in the primary image and classifying the identified object into any one of the primary categories.

The object detection apparatus of claim 2,
wherein the processor is configured to further perform:
generating a convolution result for each of the primary categories by performing a convolution of a primary divided image in which an object exists, among the primary divided images, with the eigenvector set for the category of the object in that primary divided image.

The object detection apparatus of claim 1,
wherein the feature information of the objects extracted from the primary image includes at least one of shape information, size information, and contour information of each of the objects extracted from the primary image.

The object detection apparatus of claim 1,
wherein the primary categories include first to third categories, and
wherein the processor is configured such that,
when the similarity between the first category and the second category is greater than the similarity between the first category and the third category, the difference between the eigenvector set for the first category and the eigenvector set for the second category is set to be smaller than the difference between the eigenvector set for the first category and the eigenvector set for the third category.

The object detection apparatus of claim 1,
wherein the primary image and the secondary image each include a plurality of images, and the number of images included in the primary image is set to be greater than the number of images included in the secondary image.

The object detection apparatus of claim 3,
wherein the processor is configured to further perform:
dividing the secondary image to obtain secondary divided images for each location in the secondary image, determining, from the secondary divided images, a location at which an object exists in the secondary image, and generating a convolution result for the secondary category by performing a convolution of a secondary divided image in which an object exists, among the secondary divided images, with the eigenvector set for the secondary category.

The object detection apparatus of claim 7,
wherein the processor is configured to further perform:
receiving an analysis target image corresponding to any one of categories including the primary categories and the secondary category; and
identifying an object from the analysis target image by using the CNN, the convolution result for each of the primary categories, and the convolution result for the secondary category.

The object detection apparatus of claim 8,
wherein the processor is configured to further perform:
determining, within the analysis target image, the location of the object identified from the analysis target image, and classifying the object identified from the analysis target image into any one of the categories including the primary categories and the secondary category.

An object detection method performed by an object detection apparatus using a similarity between categories and an eigenvector set for each of the categories, the method comprising:
(a) setting, by the object detection apparatus, an eigenvector of each of primary categories by using a similarity graph generated to indicate the similarity between the primary categories based on sentence data for the primary categories, and feature information of objects extracted, through a convolutional neural network (CNN), from a primary image corresponding to the primary categories;
(b) modifying, by the object detection apparatus, the similarity graph by using sentence data for a secondary category different from the primary categories, so that the similarity graph indicates the similarity between categories including the primary categories and the secondary category; and
(c) setting, by the object detection apparatus, an eigenvector of the secondary category by using the feature information of the objects extracted from the primary image, the modified similarity graph, and feature information of objects extracted, through the CNN, from a secondary image corresponding to the secondary category.

The object detection method of claim 10,
wherein step (a) comprises:
obtaining, by the object detection apparatus, primary divided images for each location in the primary image by dividing the primary image;
determining, by the object detection apparatus, a location at which an object exists in the primary image from the primary divided images; and
identifying, by the object detection apparatus, an object in the primary image and classifying the identified object into any one of the primary categories.

The object detection method of claim 11,
wherein step (a) further comprises:
generating, by the object detection apparatus, a convolution result for each of the primary categories by performing a convolution of a primary divided image in which an object exists, among the primary divided images, with the eigenvector set for the category of the object in that primary divided image.

The object detection method of claim 10,
wherein the feature information of the objects extracted from the primary image includes at least one of shape information, size information, and contour information of each of the objects extracted from the primary image.

The object detection method of claim 10,
wherein the primary categories include first to third categories, and
wherein, in step (a), when the similarity between the first category and the second category is greater than the similarity between the first category and the third category, the difference between the eigenvector set for the first category and the eigenvector set for the second category is set to be smaller than the difference between the eigenvector set for the first category and the eigenvector set for the third category.

The object detection method of claim 10,
wherein the primary image used in step (a) and the secondary image used in step (c) each include a plurality of images, and the number of images included in the primary image is set to be greater than the number of images included in the secondary image.

The object detection method of claim 12,
wherein step (c) comprises:
obtaining, by the object detection apparatus, secondary divided images for each location in the secondary image by dividing the secondary image;
determining, by the object detection apparatus, a location at which an object exists in the secondary image from the secondary divided images; and
generating, by the object detection apparatus, a convolution result for the secondary category by performing a convolution of a secondary divided image in which an object exists, among the secondary divided images, with the eigenvector set for the secondary category.

The object detection method of claim 16, further comprising:
(d) receiving, by the object detection apparatus, an analysis target image corresponding to any one of categories including the primary categories and the secondary category; and
(e) identifying, by the object detection apparatus, an object from the analysis target image by using the CNN, the convolution result for each of the primary categories, and the convolution result for the secondary category.

The object detection method of claim 17,
wherein step (e) comprises:
determining, by the object detection apparatus, the location, within the analysis target image, of the object identified from the analysis target image; and
classifying, by the object detection apparatus, the object identified from the analysis target image into any one of the categories including the primary categories and the secondary category.

A non-transitory computer-readable recording medium on which a computer program for performing the object detection method according to any one of claims 10 to 18 is recorded.