KR20220076398A

KR20220076398A - Object recognition processing apparatus and method for AR device

Info

Publication number: KR20220076398A
Application number: KR1020210168994A
Authority: KR
Inventors: 이경한; 함성민
Original assignee: 서울대학교산학협력단 (Seoul National University R&DB Foundation)
Priority date: 2020-11-30
Filing date: 2021-11-30
Publication date: 2022-06-08
Also published as: KR102592551B1

Abstract

An object recognition processing apparatus for an augmented reality (AR) device according to an embodiment of the present invention includes a communication module; a memory storing an object recognition program; and a processor that executes the object recognition program. The object recognition program includes: an AR information collection module that receives, from at least one AR device, an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured the image; an object detection module that distinguishes each object in the AR image; an object tracking and dataset collection module that receives the object category information produced by the object detection module, tracks objects based on the similarity of each object, and extracts a data set for learning each object; an object learning module that receives the data set from the object tracking and dataset collection module and performs learning to distinguish each object based on the similarity of the received data set; and an object recognition module that inputs the objects distinguished in an input AR image by the object detection module into the object learning module and outputs identification information of each object.

Description

Object recognition processing apparatus and method for AR device {OBJECT RECOGNITION PROCESSING APPARATUS AND METHOD FOR AR DEVICE}

The present invention relates to an object recognition processing apparatus and method for an AR device.

Augmented reality (AR) systems began with an event-based usage model, in which a scene is viewed through a smartphone screen only at special events (museums, machine assembly, and so on), and are now shifting toward a usage model in which a wearable device such as smart glasses is worn.

Existing event-based AR systems could obtain the information to be augmented either by providing identification means for recognizing specific conditions (QR codes, license plates, location coordinates) or by learning a small number of components (exhibits, machine parts). However, when the existing learning and inference methods are applied to the new usage model, unless the scope of objects is restricted to a few components as in event-based AR systems, the difficulty of learning limits the achievable result to classifying objects by category (person, car, chair, etc.).

Overcoming this requires an efficient alternative that takes the characteristics of AR into account, rather than simply adding more training or more models. First, it is very difficult to build a data set sufficient to learn every object. Second, even if time and space are restricted and a data set covering every object within that scope is obtained, there is a limit to how quickly the model can be revised for each new object whenever the situation goes beyond that scope.

In addition, unlike conventional deep learning in controlled settings, the AR environment, in which object images are frequently degraded (blur, occlusion, etc.), must also be taken into account.

A deep learning algorithm for the new AR systems has two main requirements.

First, it must be able to distinguish individual objects, going beyond object categories. Second, it must remain effective even when object images are degraded.

Widely known image deep learning techniques for object detection/classification simultaneously infer the region of an image in which a specific object exists (the bounding box) and the category of that object; Mask R-CNN, YOLOv3, SSD, and others are being actively studied. When the input is video, running such an image technique on every frame is inefficient. Object tracking deep learning techniques are therefore used instead: given the bounding box of a specific object, independent of its category (class-agnostic), they follow the object's movement across consecutive frames and output the moved bounding box. MDNet, SiamFC, SiamRPN++, and others are being studied in this area.

Compared with image-based object detection, object tracking copes more easily with object deformation, in which the same object looks different over time because of changes in viewing angle, lighting, and so on.

Exploiting this property, an AR system can use object tracking to build a training data set and, at the same time, combine a classification model trained on that data set with a Siamese similarity model. This is expected to make it possible to distinguish objects by individual object ID, which is more specific than category, and to overcome degradation (deformation) of object images.

An object of the present invention is to provide an object recognition processing apparatus that applies a new learning and inference method capable of improving the performance of object recognition algorithms in augmented reality systems.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

As a technical means for achieving the above object, an object recognition processing apparatus for an augmented reality (AR) device according to an embodiment of the present invention includes a communication module; a memory storing an object recognition program; and a processor that executes the object recognition program. The object recognition program includes: an AR information collection module that receives, from at least one AR device, an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured the image; an object detection module that distinguishes each object in the AR image; an object tracking and dataset collection module that receives the object category information produced by the object detection module, tracks objects based on the similarity of each object, and extracts a data set for learning each object; an object learning module that receives the data set from the object tracking and dataset collection module and performs learning to distinguish each object based on the similarity of the received data set; and an object recognition module that inputs the objects distinguished in an input AR image by the object detection module into the object learning module and outputs identification information of each object.

According to another embodiment of the present invention, an object recognition processing method using an object recognition processing apparatus for an augmented reality (AR) device includes: receiving, from at least one AR device, an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured the image; distinguishing each object in the AR image and generating object category information for the distinguished objects; receiving the object category information, tracking each object based on its similarity, and extracting a data set for learning the object; receiving the extracted data set and building an object learning module by performing learning to distinguish each object based on the similarity of the received data set; and an object recognition step of inputting the objects distinguished in an input AR image into the object learning module and outputting identification information of each object.

According to the above solution of the present application, an AR system uses object tracking to build a training data set and, at the same time, combines a classification model trained on that data set with a Siamese similarity model. This makes it possible to distinguish objects by individual object ID, which is more specific than category, and to overcome degradation (deformation) of object images.

FIG. 1 is a block diagram showing the configuration of an object recognition processing apparatus for an AR device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of an object recognition program according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the operation of an object detection module according to an embodiment of the present invention.
FIGS. 4 and 5 are diagrams illustrating the operation of an object tracking and dataset collection module according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the operation of an object learning module according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the operation of an object recognition module according to an embodiment of the present invention.
FIG. 8 is a flowchart showing an object recognition method according to an embodiment of the present invention.

Hereinafter, embodiments of the present application will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can easily carry them out. However, the present application may be implemented in various different forms and is not limited to the embodiments described herein. To describe the present application clearly, parts irrelevant to the description are omitted from the drawings, and similar reference numerals denote similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them.

Throughout this specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member is present between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless stated otherwise. Terms of degree such as "about" and "substantially" used throughout this specification mean at or close to the stated value when manufacturing and material tolerances inherent to the stated meaning are presented, and are used to prevent unscrupulous infringers from unfairly exploiting disclosures in which exact or absolute values are stated to aid understanding of the present application. The terms "step of (doing) ~" and "step of ~" used throughout this specification do not mean "step for ~".

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings and the description below. However, the present invention is not limited to the embodiments described herein and may be embodied in other forms. Like reference numerals denote like elements throughout the specification.


Hereinafter, an object recognition processing apparatus for an AR device according to an embodiment of the present invention will be described.

FIG. 1 is a block diagram showing the configuration of an object recognition processing apparatus for an AR device according to an embodiment of the present invention.

Referring to FIG. 1, the object recognition processing apparatus 100 performs object recognition using AR images collected from a plurality of AR devices 200 to 204 and may operate as a server. To this end, the object recognition processing apparatus 100 includes a communication module 110, a memory 120, a processor 130, and a database 140.

Next, the communication module 110 may include a communication module that uses a wired network such as a local area network (LAN), a wide area network (WAN), or a value added network (VAN), or any kind of wireless network such as a mobile radio communication network or a satellite communication network. In particular, the communication module 110 provides a communication interface with each of the AR devices 200 to 204 connected through an external communication network.

The memory 120 stores the object recognition program. The object recognition program includes: an AR information collection module 310 that receives, from each of the AR devices 200 to 204, at least an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured the image; an object detection module 320 that distinguishes each object in the AR image; an object tracking and dataset collection module 330 that receives the object category information produced by the object detection module 320, tracks objects based on the similarity of each object, and extracts a data set for learning each object; an object learning module 340 that receives the data set from the object tracking and dataset collection module 330 and performs learning to distinguish each object based on the similarity of the received data set; and an object recognition module 350 that inputs the objects distinguished in an input AR image by the object detection module 320 into the object learning module 340 and outputs identification information of each object.

The memory 120 should be understood as a collective term for non-volatile storage devices, which retain stored information without power, and volatile storage devices, which require power to retain stored information. The memory 120 may also temporarily or permanently store data processed by the processor 130. In addition to volatile storage devices that require power to retain stored information, the memory 120 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited thereto.

The processor 130 executes the object recognition program stored in the memory 120. The processor 130 may include various types of devices that control and process data. The processor 130 may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions included in a program. For example, the processor 130 may be implemented as a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

The database 140 may manage the AR images collected from each AR device, together with the capture time of each AR image, location information of the place where it was captured, information on the AR device that captured it, and the like.

The object recognition processing apparatus 100 may operate as a server that receives AR images from each of the AR devices 200 to 204 and provides information on objects identified by the object recognition program. In this case, the object recognition processing apparatus 100 may operate under a cloud computing service model such as Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS). The object recognition processing apparatus 100 may also be deployed as a private cloud, a public cloud, or a hybrid cloud.

FIG. 2 is a block diagram showing the configuration of an object recognition program according to an embodiment of the present invention.

The object recognition program includes: an AR information collection module 310 that receives, from each of the AR devices 200 to 204, at least an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured the image; an object detection module 320 that distinguishes each object in the AR image; an object tracking and dataset collection module 330 that receives the object category information produced by the object detection module 320, tracks objects based on the similarity of each object, and extracts a data set for learning each object; an object learning module 340 that receives the data set from the object tracking and dataset collection module 330 and performs learning to distinguish each object based on the similarity of the received data set; and an object recognition module 350 that inputs the objects distinguished in an input AR image by the object detection module 320 into the object learning module 340 and outputs identification information of each object.
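As a rough illustration of how the modules hand data to one another, the sketch below wires hypothetical stand-ins for modules 320 to 350 as plain Python callables, with frames assumed to come from module 310. The class name `ObjectRecognitionProgram`, the `Detection` tuple, and the rule of retraining only when tracking yields new samples are illustrative assumptions, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), an assumed box format
Detection = Tuple[Box, str]       # (bounding box, object category)

@dataclass
class ObjectRecognitionProgram:
    """Hypothetical wiring of modules 320-350; frames are supplied by module 310."""
    detect: Callable[[Any], List[Detection]]                              # module 320
    track_and_collect: Callable[[Any, List[Detection]], Dict[str, list]]  # module 330
    learn: Callable[[Dict[str, list]], None]                              # module 340
    recognize: Callable[[Any, Detection], str]                            # module 350

    def process_frame(self, frame: Any) -> List[str]:
        detections = self.detect(frame)
        dataset = self.track_and_collect(frame, detections)
        if dataset:  # retrain only when tracking produced new training samples
            self.learn(dataset)
        # Recognition returns one identifier per detected object.
        return [self.recognize(frame, det) for det in detections]
```

Keeping each stage as an injected callable lets a detector such as YOLO, a Siamese tracker, and the learned model be swapped independently.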

As defined above, the AR information collection module 310 collects from each of the AR devices 200 to 204 not only the AR image but also the time at which the AR image was captured, location information of the place where it was captured (e.g., GPS information), and information on the AR device that captured it (e.g., a unique terminal number). In particular, the AR device information may later be used to distinguish the users of each AR device.
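A minimal sketch of the record the AR information collection module 310 might keep for each received image, assuming a simple in-memory store. `ARRecord`, `ARInfoCollector`, and the latitude/longitude fields are hypothetical names; the image is treated as opaque bytes, and grouping by device ID reflects the note that device information can later distinguish users.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass(frozen=True)
class ARRecord:
    image: bytes            # encoded AR frame (encoding is an assumption)
    captured_at: datetime   # capture time of the AR image
    latitude: float         # capture location, e.g. from GPS
    longitude: float
    device_id: str          # AR device identifier, e.g. a unique terminal number

class ARInfoCollector:
    """Hypothetical stand-in for the AR information collection module (310)."""

    def __init__(self) -> None:
        self._by_device: Dict[str, List[ARRecord]] = {}

    def receive(self, record: ARRecord) -> None:
        # Group records by device so later stages can separate users.
        self._by_device.setdefault(record.device_id, []).append(record)

    def records_for(self, device_id: str) -> List[ARRecord]:
        # Return this device's records in capture-time order.
        return sorted(self._by_device.get(device_id, []), key=lambda r: r.captured_at)
```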

FIG. 3 is a diagram illustrating the operation of an object detection module according to an embodiment of the present invention.

The object detection module 320 generates a bounding box for each object in the AR image, assigns object category information to each bounding box, and passes the result to the object tracking and dataset collection module 330.

As shown in FIG. 3, a bounding box is generated for each object in the AR image using a conventional technique such as YOLO. Object category information is then assigned to each bounding box, and the result is passed to the object tracking and dataset collection module 330. The object tracking and dataset collection module 330 performs object tracking based on the object category information received from the object detection module 320 and feeds the tracking result back to the object detection module 320 for use in object detection.

When an object category classification result is obtained in the object detection step, the object category information is also passed to the object recognition module 350, which additionally uses the category classification result to output the identification information of the object. As category information, the object detection module 320 may provide classifications such as person, car, or animal.

In addition, as shown in FIG. 3, when an AR image contains both objects that have already been detected and are being tracked and objects that were not previously tracked, detection is performed on the area excluding the regions of the tracked objects, that is, on newly appearing objects. Each newly detected object is recognized as a new object and passed to both the object tracking and dataset collection module 330 and the object recognition module 350.
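The idea of detecting only outside the regions already being tracked can be approximated as follows. Rather than masking the image before detection as the text describes, this sketch post-filters a detector's output by intersection-over-union (IoU) against the tracked boxes, which is an assumed substitute; `new_objects` and the `max_iou=0.3` cutoff are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def new_objects(detections: Sequence[Tuple[Box, str]],
                tracked: Sequence[Box],
                max_iou: float = 0.3) -> List[Tuple[Box, str]]:
    """Keep detections that do not substantially overlap any tracked box."""
    return [(box, cat) for box, cat in detections
            if all(iou(box, t) <= max_iou for t in tracked)]
```

The surviving detections would be the ones forwarded as new objects to modules 330 and 350.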

FIGS. 4 and 5 are diagrams illustrating the operation of the object tracking and dataset collection module according to an embodiment of the present invention.

The object tracking and dataset collection module 330 receives the object category information produced by the object detection module 320, tracks each object based on its similarity, and extracts a data set for learning the object.

As shown in FIG. 4, the module receives the images of a plurality of frames from the AR video, together with information on the objects distinguished in each frame by the object detection module 320. The object tracking and dataset collection module 330 may then apply a Siamese network and track an object by extracting, in each frame, the object with the highest similarity.
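To make the per-frame matching step concrete, here is a minimal sketch assuming each detected object has already been mapped to an embedding vector by the shared-weight branch of a Siamese network, with cosine similarity standing in for the network's similarity output. The names `track` and `cosine`, the fixed template, and the `min_sim` cutoff are illustrative assumptions; trackers such as SiamFC instead score a search region by cross-correlating feature maps.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]
Candidate = Tuple[Box, Sequence[float]]  # (bounding box, embedding from the shared Siamese branch)

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; a stand-in for the Siamese network's similarity score."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def track(template: Sequence[float],
          frames: List[List[Candidate]],
          min_sim: float = 0.0) -> List[Optional[Tuple[Box, float]]]:
    """For each frame, keep the candidate most similar to the target template."""
    path: List[Optional[Tuple[Box, float]]] = []
    for candidates in frames:
        scored = [(cosine(template, emb), box) for box, emb in candidates]
        if not scored:
            path.append(None)  # no detections in this frame
            continue
        sim, box = max(scored, key=lambda s: s[0])
        path.append((box, sim) if sim >= min_sim else None)
    return path
```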

In particular, as expressed in Equation 1 below, the module selects as the training data set either the plurality of frames, among all frames used to track the object, for which the sum of the pairwise similarities of the object is minimized, or a predetermined number of frames whose pairwise similarity is at or below a threshold.

[Equation 1]

$$K^{*} = \underset{K \subseteq U,\; n(K) = N}{\arg\min} \sum_{\substack{i, j \in K \\ i < j}} S_{ij}$$

Here i and j denote frame numbers, S_ij denotes the similarity between frames i and j, K is a subset of U, the set of all frames, n(K) is the number of frames in K, and N is the predetermined number of frames to be selected.
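Both selection rules described for Equation 1 can be sketched directly from a pairwise similarity matrix. This is a minimal illustration, assuming `S[i][j]` is symmetric and already computed (for example, by the Siamese network); the exhaustive search is exponential in the number of frames and is only meant for small `U`, and the greedy threshold rule is one assumed way to pick "a predetermined number of frames whose similarity is at or below a threshold". Function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def select_min_similarity(S: Matrix, k: int) -> Tuple[List[int], float]:
    """Exhaustively find the k-frame subset K minimising the sum of S[i][j] over pairs i < j in K."""
    n = len(S)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    best, best_cost = None, float("inf")
    for K in combinations(range(n), k):
        cost = sum(S[i][j] for i, j in combinations(K, 2))
        if cost < best_cost:
            best, best_cost = K, cost
    return list(best), best_cost

def select_below_threshold(S: Matrix, threshold: float, max_frames: int) -> List[int]:
    """Greedy alternative: keep a frame if its similarity to every kept frame is <= threshold."""
    chosen: List[int] = []
    for i in range(len(S)):
        if all(S[i][j] <= threshold for j in chosen):
            chosen.append(i)
            if len(chosen) == max_frames:
                break
    return chosen
```

Either way, the kept frames are mutually dissimilar views of the tracked object, which is what the training set needs.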

In this way, because the data set for each object is collected from mutually dissimilar frames within the video in which that object was tracked, a learning model covering the various deformed appearances of each object can be built.

The configuration of the Siamese network used to determine similarity is shown in FIG. 5; it is a known algorithm for computing the similarity of each object included in each frame.

The object tracking and dataset collection module 330 adds to the training data set additional identification information for each object, extracted from feature information of the object identified during object detection or object tracking, and the object learning module 340 learns each object based on the data set to which the additional identification information has been added.

For example, the feature information may be facial features in the case of a person, features identified from the license plate in the case of a car, or information such as a barcode attached to an article.

Because such information is characteristic of the object it belongs to, using it allows the object to be identified more accurately and makes it easier to assign identification information to each object.
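One way to carry the additional identification information alongside each training sample is sketched below. The record layout (`Sample`, `ObjectDataset`) and the lookup helper `find_by_aux_id` are hypothetical; the text only says that such information is added to the dataset. The sketch shows why it helps: an exact key such as a plate number or barcode can link a new sample to an existing object ID directly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Sample:
    frame_id: int
    category: str                  # from the object detection module
    embedding: List[float]         # from the Siamese encoder
    aux_id: Optional[str] = None   # e.g. a plate number, barcode, or face-feature key


@dataclass
class ObjectDataset:
    object_id: str
    samples: List[Sample] = field(default_factory=list)

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)

    def aux_ids(self) -> Set[str]:
        return {s.aux_id for s in self.samples if s.aux_id is not None}


def find_by_aux_id(datasets: Dict[str, ObjectDataset], aux_id: str) -> Optional[str]:
    """Return the object ID whose dataset already carries this auxiliary identifier, if any."""
    for object_id, dataset in datasets.items():
        if aux_id in dataset.aux_ids():
            return object_id
    return None


car = ObjectDataset("car-1")
car.add(Sample(frame_id=3, category="car", embedding=[0.2, 0.7], aux_id="PLATE-1234"))
datasets = {car.object_id: car}
print(find_by_aux_id(datasets, "PLATE-1234"))  # car-1
print(find_by_aux_id(datasets, "PLATE-9999"))  # None
```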

FIG. 6 is a diagram illustrating the operation of the object learning module according to an embodiment of the present invention.

The object learning module 340 receives a dataset from the object tracking and dataset collection module 330 and performs learning to distinguish each object based on the similarity of the received dataset.

As shown in FIG. 6, the object learning module 340 assigns object identification information that distinguishes each object based on the similarity of the received datasets, and performs learning by updating the datasets while additionally taking into account the capture time of the AR image and the location information of the place where it was captured.

For example, based on the location information of the places where the AR images collected from each AR device were captured, datasets to which object identification information has been assigned are collected separately for each distinct location (location A, location B). For each piece of object identification information, such a dataset contains similar frame images, object detection results (information on the object's category), and the like.

As the preceding steps are repeated and a new dataset is input, it is compared with the existing datasets, and the datasets are updated around those with high similarity. In this process, the datasets are updated so that they include data collected from different AR devices, yielding datasets that contain diverse features and have an even distribution.
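The update described above can be sketched as follows, under several assumptions that the text does not fix: datasets are kept separately per location, the similarity between a new entry and a dataset is taken as its similarity to the closest member, a new ID is opened when no dataset reaches the threshold, and an even spread across AR devices is enforced by capping how many entries each device may contribute to one ID (the oldest capture is replaced first). `Entry`, `LocationDatasets` and `per_device_cap` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Entry:
    embedding: Sequence[float]
    device_id: str
    timestamp: float   # capture time of the AR image
    location: str      # e.g. "A" or "B"


Similarity = Callable[[Sequence[float], Sequence[float]], float]


class LocationDatasets:
    """Per-location datasets keyed by object ID (per_device_cap must be >= 1)."""

    def __init__(self, sim: Similarity, threshold: float, per_device_cap: int):
        self.sim = sim
        self.threshold = threshold
        self.per_device_cap = per_device_cap
        self.data: Dict[str, Dict[str, List[Entry]]] = defaultdict(dict)
        self._next_id = 0

    def update(self, entry: Entry) -> str:
        by_id = self.data[entry.location]  # datasets are kept separate per location
        best_id, best_score = None, float("-inf")
        for object_id, entries in by_id.items():
            # Similarity to a dataset = similarity to its closest member.
            score = max(self.sim(entry.embedding, e.embedding) for e in entries)
            if score > best_score:
                best_id, best_score = object_id, score
        if best_id is None or best_score < self.threshold:
            best_id = f"{entry.location}-{self._next_id}"
            self._next_id += 1
            by_id[best_id] = []
        entries = by_id[best_id]
        if Counter(e.device_id for e in entries)[entry.device_id] >= self.per_device_cap:
            # Drop this device's oldest entry so that no single device dominates.
            oldest = min(
                (k for k, e in enumerate(entries) if e.device_id == entry.device_id),
                key=lambda k: entries[k].timestamp,
            )
            del entries[oldest]
        entries.append(entry)
        return best_id


def sim(a: Sequence[float], b: Sequence[float]) -> float:
    return 1.0 - abs(a[0] - b[0])  # toy 1-D similarity for the example


store = LocationDatasets(sim, threshold=0.8, per_device_cap=1)
print(store.update(Entry([0.0], "dev1", 1.0, "A")))   # A-0
print(store.update(Entry([0.1], "dev2", 2.0, "A")))   # A-0 (similar enough)
print(store.update(Entry([0.05], "dev1", 3.0, "A")))  # A-0, replaces dev1's older entry
print(store.update(Entry([0.0], "dev1", 4.0, "B")))   # B-1 (different location)
```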

The Siamese network described above may also be used to determine the similarity between datasets or to update the datasets in a form suited to each region.

FIG. 7 is a diagram illustrating the operation of the object recognition module according to an embodiment of the present invention.

The object recognition module 350 inputs the objects detected in an input AR image by the object detection module 320 into the object learning module 340 and outputs identification information for each object.

Here, based on the similarity between an object detected by the object detection module 320 and each object's dataset in the object learning module 340, the object recognition module 350 may classify the object under existing object identification information, assign new object identification information, or classify it as unconfirmed. The Siamese network described above may be used to determine the similarity between the input AR image and the datasets of the object learning module 340.

If the similarity between the input AR image and one specific dataset is greater than or equal to the threshold, the object is classified as corresponding to the object identification information (ID) that represents that dataset in the object learning module 340.

If the similarity between the input AR image and every dataset falls below the threshold, the object learning module 340 is regarded as not yet holding a dataset for the object; new object identification information (ID) is assigned, and a dataset is built on the basis of that ID.

If the similarity between the input AR image and a plurality of datasets is greater than or equal to the threshold, no specific object identification information (ID) is assigned, and the object is classified as unconfirmed.
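The three cases above amount to a simple decision rule over per-ID similarity scores. The sketch assumes those scores have already been computed (for example with the Siamese network) and that the caller supplies the next free ID; the function name `recognize` and its return convention are hypothetical.

```python
from typing import Dict, Optional, Tuple


def recognize(scores: Dict[str, float], threshold: float, new_id: str) -> Tuple[str, Optional[str]]:
    """Three-way decision over per-ID similarity scores.

    scores maps each known object ID to the similarity between the input
    object and that ID's dataset. Returns (decision, object_id).
    """
    matches = [object_id for object_id, s in scores.items() if s >= threshold]
    if len(matches) == 1:
        return "existing", matches[0]    # exactly one dataset is similar enough
    if not matches:
        return "new", new_id             # no dataset yet; caller starts one under new_id
    return "unconfirmed", None           # several datasets match; do not guess


print(recognize({"p1": 0.9, "p2": 0.3}, 0.7, "p3"))   # ('existing', 'p1')
print(recognize({"p1": 0.2, "p2": 0.3}, 0.7, "p3"))   # ('new', 'p3')
print(recognize({"p1": 0.8, "p2": 0.75}, 0.7, "p3"))  # ('unconfirmed', None)
```

Declining to pick an ID when several datasets match avoids merging two distinct objects into one identity on ambiguous evidence.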

FIG. 8 is a flowchart illustrating an object recognition method according to an embodiment of the present invention.

First, an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured it are received from at least one AR device (S810). This operation is performed by the AR information collection module 310 of the object recognition processing apparatus 100. The AR data received in this way is used to build the learning module and, later, to perform inference on newly input AR images.

Next, each object in the AR image is separated, and category information for the separated objects is generated (S820). This operation is performed by the object detection module 320 of the object recognition processing apparatus 100: a bounding box is generated for each object in the image, and the object is classified into a category. The detected object information is then passed both to the object tracking and dataset collection step (S830) and to the object recognition step (S850).

Next, the object category information is received, each object is tracked based on its similarity, and a dataset for learning the object is extracted (S830). This operation is performed by the object tracking and dataset collection module 330 of the object recognition processing apparatus 100. Similarity is computed with a Siamese network, and an optimal set of frames is extracted as the dataset using Equation 1 described above; for example, either the plurality of frames for which the sum of the pairwise similarities of the object is minimized, or a predetermined number of frames whose pairwise similarity is less than or equal to a threshold, is selected as the training dataset. In addition, adding to the training dataset the additional identification information of each object, extracted from the feature information of the object identified during detection or tracking, can improve the accuracy of the dataset for each ID.

Next, the extracted dataset is received, and an object learning module is built by performing learning to distinguish each object based on the similarity of the received dataset (S840). Based on that similarity, object identification information that distinguishes each object is assigned, and learning is performed by updating the dataset while additionally considering the capture time of the AR image and the location information of the place where it was captured.

Next, an object recognition step is performed in which the objects separated from an input AR image are input into the object learning module and their identification information is output (S850). Based on the similarity between an object and each object's dataset in the object learning module, the object may be classified under existing object identification information, assigned new object identification information, or classified as unconfirmed.
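The flow of FIG. 8 can be summarized as a training path (S820 to S840, repeated for each received record) and an inference path (S820 to S850 for a new AR image). The sketch below only wires the steps together; every stage is an injected callable, and `ARRecord`, `build_model` and `infer` are hypothetical names, not an API from the source. The toy stages in the example exist purely to make it runnable.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class ARRecord:
    """What the AR information collection module receives in S810."""
    frames: List[Any]
    capture_time: float
    location: str
    device_id: str


def build_model(records: List[ARRecord], detect: Callable, track_and_collect: Callable,
                learn: Callable) -> Optional[Any]:
    """Training path: S820 -> S830 -> S840, repeated for every received record."""
    model = None
    for record in records:
        detections = [detect(frame) for frame in record.frames]  # S820
        dataset = track_and_collect(detections)                  # S830
        model = learn(model, dataset, record)                    # S840 (may use time/location)
    return model


def infer(model: Any, record: ARRecord, detect: Callable, recognize: Callable) -> List[Any]:
    """Inference path: S820 -> S850 for a newly input AR image."""
    return [recognize(model, obj) for frame in record.frames for obj in detect(frame)]


# Toy stages: each frame is a list of labels, and the "model" is the set of known labels.
model = build_model(
    [ARRecord([["cup"], ["pen"]], 0.0, "A", "dev1")],
    detect=lambda frame: frame,
    track_and_collect=lambda dets: [obj for frame in dets for obj in frame],
    learn=lambda model, dataset, record: (model or set()) | set(dataset),
)
print(infer(model, ARRecord([["cup", "key"]], 1.0, "A", "dev2"),
            detect=lambda frame: frame, recognize=lambda m, obj: obj in m))
# prints [True, False]
```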

The object recognition method according to an embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media may be any available media that can be accessed by a computer and include both volatile and non-volatile media and both removable and non-removable media. Computer-readable media may also include computer storage media, which include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the method and system of the present invention have been described with reference to specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present application is illustrative, and those of ordinary skill in the art to which it pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: object recognition processing apparatus
110: communication module
120: memory
130: processor
140: database
300: object recognition program
310: AR information collection module
320: object detection module
330: object tracking and dataset collection module
340: object learning module
350: object recognition module

Claims

1. An object recognition processing apparatus for an AR (Augmented Reality) device, comprising:
a communication module;
a memory in which an object recognition program is stored; and
a processor that executes the object recognition program,
wherein the object recognition program includes: an AR information collection module that receives, from at least one AR device, an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured the image; an object detection module that distinguishes each object in the AR image; an object tracking and dataset collection module that receives the object category information produced by the object detection module, tracks each object based on its similarity, and extracts a dataset for learning the object; an object learning module that receives the dataset from the object tracking and dataset collection module and performs learning to distinguish each object based on the similarity of the received dataset; and an object recognition module that inputs the objects distinguished by the object detection module in an input AR image into the object learning module and outputs identification information of the object.

2. The object recognition processing apparatus of claim 1,
wherein the object detection module generates a bounding box for each object in the AR image, assigns the object category information to each object's bounding box, and transmits the object category information to the object tracking and dataset collection module.

3. The object recognition processing apparatus of claim 1,
wherein the object detection module generates a bounding box for each object in the AR image, assigns the object category information to each object's bounding box, and transmits the object category information to the object recognition module, and
the object recognition module further uses the object category information to output the identification information of the object.

4. The object recognition processing apparatus of claim 1,
wherein, when the AR image contains an object that has been detected and is being tracked, the object detection module performs object detection on the remaining area excluding the area of that object.

5. The object recognition processing apparatus of claim 1,
wherein the object tracking and dataset collection module applies a Siamese network to the objects in each frame of the AR image distinguished by the object detection module, and tracks the object by extracting, in each frame, the object with the greatest similarity.

6. The object recognition processing apparatus of claim 5,
wherein the object tracking and dataset collection module selects, as the dataset for learning, either a plurality of frames for which the sum of the similarities between objects among all frames used for object tracking is minimized, or a predetermined number of frames whose mutual similarity is less than or equal to a threshold.

7. The object recognition processing apparatus of claim 1,
wherein the object tracking and dataset collection module adds, to the dataset for learning, additional identification information of each object extracted from the feature information of the object identified during object detection or object tracking, and
the object learning module learns each object based on the dataset to which the additional identification information has been added.

8. The object recognition processing apparatus of claim 1,
wherein the object learning module assigns the object identification information distinguishing each object based on the similarity of the received dataset, and performs learning by updating the dataset while additionally considering the capture time of the AR image and the location information of the place where the AR image was captured.

9. The object recognition processing apparatus of claim 1,
wherein the object recognition module,
based on the similarity between an object distinguished by the object detection module and each object's dataset in the object learning module, classifies the object under existing object identification information, assigns new object identification information, or classifies the object as unconfirmed.

10. An object recognition processing method using an object recognition processing apparatus for an AR (Augmented Reality) device, the method comprising:
(a) receiving, from at least one AR device, an AR image, the capture time of the AR image, location information of the place where the AR image was captured, and information on the AR device that captured the image;
(b) distinguishing each object in the AR image and generating category information for the distinguished objects;
(c) receiving the object category information, tracking each object based on its similarity, and extracting a dataset for learning the object;
(d) receiving the extracted dataset and building an object learning module by performing learning to distinguish each object based on the similarity of the received dataset; and
(e) inputting the objects distinguished in an input AR image into the object learning module and outputting identification information of the object.

11. The object recognition processing method of claim 10,
wherein step (b) generates a bounding box for each object in the AR image and assigns the object category information to each object's bounding box.

12. The object recognition processing method of claim 10,
wherein step (b) generates a bounding box for each object in the AR image, assigns the object category information to each object's bounding box, and passes the object category information to step (e), and
step (e) further uses the object category information to output the identification information of the object.

13. The object recognition processing method of claim 10,
wherein, in step (b), when the AR image contains an object that has been detected and is being tracked, object detection is performed on the remaining area excluding the area of that object.

14. The object recognition processing method of claim 10,
wherein step (c) applies a Siamese network to the objects in each frame of the AR image and tracks the object by extracting, in each frame, the object with the greatest similarity.

15. The object recognition processing method of claim 14,
wherein step (c) selects, as the dataset for learning, either a plurality of frames for which the sum of the similarities between objects among all frames used for object tracking is minimized, or a predetermined number of frames whose mutual similarity is less than or equal to a threshold.

16. The object recognition processing method of claim 10,
wherein step (c) adds, to the dataset for learning, additional identification information of each object extracted from the feature information of the object identified during object detection or object tracking, and
step (d) learns each object based on the dataset to which the additional identification information has been added.

17. The object recognition processing method of claim 10,
wherein step (d) assigns the object identification information distinguishing each object based on the similarity of the received dataset, and performs learning by updating the dataset while additionally considering the capture time of the AR image and the location information of the place where the AR image was captured.

18. The object recognition processing method of claim 10,
wherein step (e), based on the similarity between an object distinguished in step (b) and each object's dataset in the object learning module, classifies the object under existing object identification information, assigns new object identification information, or classifies the object as unconfirmed.

19. A computer-readable medium on which a program for executing the method according to any one of claims 10 to 18 is recorded.