KR20240087443A

KR20240087443A - Training method and apparatus of object search model for unsupervised domain adaptation

Info

Publication number: KR20240087443A
Application number: KR1020220173165A
Authority: KR
Inventors: 심재영; 양재원; 김두현
Original assignee: 울산과학기술원 (Ulsan National Institute of Science and Technology)
Priority date: 2022-12-12
Filing date: 2022-12-12
Publication date: 2024-06-19

Abstract

A method of training an object search model for unsupervised domain adaptation according to an embodiment may include: detecting a source object in source data belonging to a source domain based on a trained object detection model; detecting a target object in target data belonging to a target domain based on the object detection model; extracting, based on a feature extraction model, source feature data from the detected source object and target feature data from the detected target object; determining a reliability score for the detected target object based on intermediate feature data generated from the source feature data and the target feature data; clustering the target objects detected from the target data based on the target feature data; refining the clusters of the target objects by applying uniqueness logic; and training an object identification model including the feature extraction model by using an objective function based on the refined clustering result and the reliability score.

Description

Method and apparatus for training an object search model for unsupervised domain adaptation {TRAINING METHOD AND APPARATUS OF OBJECT SEARCH MODEL FOR UNSUPERVISED DOMAIN ADAPTATION}

A method and apparatus for training an object search model for unsupervised domain adaptation are provided.

A video-data-based AI service may refer to a service that uses AI technologies to extract the types and characteristics of objects present in videos and images, derives meaningful information from the extracted features, and provides that information to users. Through such services, users can receive medical, security, defect-detection, crime-recognition, and situation-recognition services with high recognition rates and accuracy. Technology for finding a specific person in videos in which people appear is part of the AI image processing technology used in video-data-based AI services, and it is advancing rapidly; visual intelligence has already reached recognition rates that exceed human level.

Technology for finding a specific person in videos can be broadly divided into person re-identification technology and person search technology. Person re-identification has the drawback of requiring constrained capture environments and conditions: to find a given person, it uses images in which only one person was captured, or person images already detected in the full frame. Person search, by contrast, first detects people in footage in which many people appear, such as CCTV video, and then finds the given person among them, so it imposes no such restrictions on application. Person search is therefore easier to apply in real-world settings and applicable to a wider range of fields than person re-identification. In particular, it can be applied to wide-area security and surveillance systems that recognize people or situations by analyzing video collected from CCTV cameras, vehicle black boxes (dashcams), and the like. However, because person search combines person detection with person re-identification, it faces far more complex and diverse challenges than person re-identification alone, and solving them requires the development of various technologies.

Meanwhile, machine learning is a field of AI in which a model is trained on a source domain and the training results can be checked on a target domain. Learning methods in machine learning can be broadly classified into supervised learning and unsupervised learning. Supervised learning provides a machine learning model with data together with labels corresponding to the correct answers. Unsupervised learning analyzes data without labels to distinguish features and then learns from the distinguished features. Person search technology may apply supervised learning and unsupervised learning separately to machine learning models, or may apply a learning scheme that mixes the two.

The background art described above was possessed or acquired by the inventors in the process of deriving the present disclosure, and is not necessarily publicly known art disclosed to the general public before the filing of this application.

Korean Unexamined Patent Publication No. 10-2022-0068373 (published May 26, 2022) presents a pedestrian tracking apparatus and method in a CCTV environment. Korean Registered Patent Publication No. 10-2112859 (published May 13, 2020) presents a method of training a deep learning model for labeling work and an apparatus using the same.

A method and apparatus for training an object search model for unsupervised domain adaptation according to an embodiment may improve the clustering process used for labeling by applying uniqueness information during object identification training, and may mitigate the performance degradation caused by the domain gap by using an objective function to which a reliability computed by a part classification technique is applied.

However, the technical objectives are not limited to those described above, and other technical objectives may exist.

The method of training the object search model may include performing supervised training of the object detection model using training data of the source domain.

The method of training the object search model may further include, after the object detection model is trained, performing unsupervised training of the object identification model using the source feature data and the target feature data.

The determining of the reliability score may include generating, based on a part classification model, a classification result for a plurality of parts from the intermediate feature data, and calculating the reliability score based on the average of the probabilities of predicting the correct answer for the plurality of parts in the classification result.
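The reliability computation above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that the intermediate feature of one detected object is split into K parts (for example, horizontal stripes), that the part classification model outputs K logits for each part, and that the correct class for the k-th part is its own index k. The function names `softmax` and `reliability_score` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reliability_score(part_logits: Sequence[Sequence[float]]) -> float:
    """Average probability that each part is classified as its own part index.

    part_logits[k] holds the (assumed) part classifier logits for the k-th
    part of one detected object's intermediate feature; the correct class
    for that part is assumed to be k.
    """
    correct = [softmax(logits)[k] for k, logits in enumerate(part_logits)]
    return sum(correct) / len(correct)
```

Under these assumptions, a score near 1 means every part is recognized as its own part, while a uniform classifier yields 1/K, so poorly localized or ambiguous target detections receive low scores.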

The classifying into parts may include training the part classification model in a direction that decreases a part loss function calculated based on the correct-answer prediction probabilities with which the plurality of parts are classified.
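Under the same assumptions (K parts per object, with the correct class of part k being k), the part loss function could be an average cross-entropy over the parts, as in the hypothetical sketch below. The patent text does not give the exact form, and `part_classification_loss` is an illustrative name.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def part_classification_loss(part_logits: Sequence[Sequence[float]]) -> float:
    """Mean cross-entropy over parts, assuming part k's target class is k."""
    losses = [-log_softmax(logits)[k] for k, logits in enumerate(part_logits)]
    return sum(losses) / len(losses)
```

Lowering this loss raises each part's correct-answer probability, which in turn raises the reliability score computed from the same probabilities.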

The refining of the clusters may include refining the clusters of the target objects by grouping the clustered target objects by image and applying the uniqueness logic to each group, and resetting negative objects and positive objects based on the refined clusters.
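One possible reading of the uniqueness logic is that a single identity appears at most once in any one image, so two detections from the same image should not share a cluster. The sketch below assumes that rule. For each cluster, it keeps the same-image member closest to the cluster centroid and moves the other same-image members to new singleton clusters. Positives and negatives are then re-derived from the refined labels. The tie-breaking rule, the use of label -1 for unclustered detections, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def refine_by_uniqueness(
    image_ids: Sequence[str],
    labels: Sequence[int],
    features: Sequence[Sequence[float]],
) -> List[int]:
    """Split clusters holding more than one detection from the same image.

    Assumed rule: an identity appears at most once per image. Among the
    same-image members of a cluster, only the one nearest the cluster
    centroid keeps the label; the rest get fresh singleton labels.
    Label -1 marks unclustered detections and is left unchanged.
    """
    refined = list(labels)

    # Compute each cluster's centroid from its current members.
    members: Dict[int, List[int]] = defaultdict(list)
    for i, lab in enumerate(labels):
        if lab != -1:
            members[lab].append(i)
    centroids = {
        lab: [sum(features[i][d] for i in idx) / len(idx)
              for d in range(len(features[idx[0]]))]
        for lab, idx in members.items()
    }

    # Group detections by (image, cluster) and resolve duplicates.
    groups: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for i, (img, lab) in enumerate(zip(image_ids, labels)):
        if lab != -1:
            groups[(img, lab)].append(i)

    next_label = max(labels, default=-1) + 1
    for (_, lab), idx in groups.items():
        if len(idx) < 2:
            continue
        keep = min(idx, key=lambda i: math.dist(features[i], centroids[lab]))
        for i in idx:
            if i != keep:
                refined[i] = next_label
                next_label += 1
    return refined


def positives_and_negatives(labels: Sequence[int], anchor: int) -> Tuple[List[int], List[int]]:
    """Reset positives (same refined cluster) and negatives (different cluster)."""
    pos = [i for i, lab in enumerate(labels)
           if i != anchor and lab != -1 and lab == labels[anchor]]
    neg = [i for i, lab in enumerate(labels) if lab != labels[anchor]]
    return pos, neg
```

For example, if two detections from image "a" and one from image "b" all fall into cluster 0, the image-"a" detection farther from the centroid is split off into a new cluster. It then becomes a negative for the other two.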

The training of the object identification model may further include updating the parameters of the feature extraction model so that the distance between the feature vectors of a negative object and a reference object increases and the distance between the feature vectors of a positive object and the reference object decreases.
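Pulling positives toward the reference object and pushing negatives away from it is commonly expressed as a triplet margin loss. The sketch below uses that standard formulation as a stand-in. The Euclidean distance and the margin of 0.3 are assumed values, not taken from the patent.

```python
import math
from typing import Sequence


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.3,  # assumed margin value
) -> float:
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    d_ap = math.dist(anchor, positive)  # distance to be reduced
    d_an = math.dist(anchor, negative)  # distance to be increased
    return max(0.0, d_ap - d_an + margin)
```

Minimizing this value over (reference, positive, negative) triplets drawn from the refined clusters pushes the feature extractor's parameters in the direction described above.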

The refining of the clusters may further include training the feature extraction model in a direction that decreases a loss function calculated based on the refined clustering result.

The training of the object identification model may further include training the object identification model in a direction that decreases a classification domain adaptation loss function calculated based on the reliability score and the refined clustering result.
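One plausible form of the classification domain adaptation loss is a cross-entropy against the refined cluster labels, used as pseudo-labels, with each target sample weighted by its reliability score. The sketch below assumes that form and uses the hypothetical name `weighted_pseudo_label_ce`. The patent text does not specify the formula.

```python
import math
from typing import Sequence


def weighted_pseudo_label_ce(
    logits_batch: Sequence[Sequence[float]],
    pseudo_labels: Sequence[int],
    reliabilities: Sequence[float],
) -> float:
    """Reliability-weighted cross-entropy on refined-cluster pseudo-labels.

    Samples with pseudo-label -1 (unclustered) are skipped, and the result
    is normalized by the sum of the weights actually used.
    """
    total, weight_sum = 0.0, 0.0
    for logits, y, w in zip(logits_batch, pseudo_labels, reliabilities):
        if y < 0:
            continue
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += w * (log_z - logits[y])  # w * (-log p(y))
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0
```

With this weighting, a target detection with a low reliability score contributes little to the gradient, even if its pseudo-label is wrong.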

The training of the object identification model may further include training the object identification model in a direction that decreases a feature domain adaptation loss function calculated based on the reliability score and an inter-domain distance value.
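The feature domain adaptation loss could likewise weight each target feature by its reliability before measuring an inter-domain distance. The sketch below uses an assumed stand-in for the unspecified distance: the squared Euclidean distance between the source mean feature and the reliability-weighted target mean feature. Other discrepancy measures, such as maximum mean discrepancy (MMD), could equally be substituted. Both function names are hypothetical.

```python
from typing import List, Sequence


def weighted_mean(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Per-dimension weighted average of feature vectors."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def feature_da_loss(
    source_feats: Sequence[Sequence[float]],
    target_feats: Sequence[Sequence[float]],
    target_reliabilities: Sequence[float],
) -> float:
    """Squared distance between the source mean and the reliability-weighted target mean."""
    src_mean = weighted_mean(source_feats, [1.0] * len(source_feats))
    tgt_mean = weighted_mean(target_feats, target_reliabilities)
    return sum((s - t) ** 2 for s, t in zip(src_mean, tgt_mean))
```

With this weighting, target features from unreliable detections have little influence on where the target distribution is pulled.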

An apparatus for training an object search model for unsupervised domain adaptation according to an embodiment includes a memory storing instructions for training an object detection model and an object identification model, and a processor executing the instructions stored in the memory. By executing the instructions stored in the memory, the processor may: detect, using the trained object detection model, a source object in source data belonging to a source domain and a target object in target data belonging to a target domain; extract, based on a feature extraction model, source feature data from the source object and target feature data from the target object; determine a reliability score for the target object based on intermediate data generated from the source feature data and the target feature data; cluster the target objects based on the target feature data; refine the clusters of the target objects by applying uniqueness logic; and train an object identification model including the feature extraction model by using an objective function based on the refined clustering result and the reliability score.

The processor may perform supervised training of the object detection model using training data of the source domain, and perform unsupervised training of the object identification model using the source feature data and the target feature data.

The processor may generate, based on a part classification model, a classification result for a plurality of parts from the intermediate feature data, and calculate the reliability score based on the average of the correct-answer prediction probabilities for the plurality of parts in the classification result.

The processor may train the part classification model in a direction that decreases a part classification loss function calculated based on the correct-answer prediction probabilities with which the plurality of parts are classified.

The processor may refine the clusters of the target objects by grouping the clustered target objects by image and applying the uniqueness logic to each group, and may reset negative objects and positive objects based on the refined clusters.

The processor may update the parameters of the feature extraction model so that the distance between the feature vectors of a negative object and a reference object increases and the distance between the feature vectors of a positive object and the reference object decreases.

The processor may train the feature extraction model in a direction that decreases a loss function calculated based on the refined clustering result.

The processor may train the object identification model in a direction that decreases a classification domain adaptation loss function calculated based on the reliability score and the refined clustering result.

The processor may train the object identification model in a direction that decreases a feature domain adaptation loss function calculated based on the reliability score and an inter-domain distance value.

A computer-readable recording medium storing one or more computer programs according to an embodiment may include instructions for performing the method.

Figure 1 shows an example of an unsupervised domain adaptation technique.
Figure 2 shows a block diagram of an apparatus for training an object search model for unsupervised domain adaptation according to an embodiment.
Figure 3 shows a flowchart, organized by domain, of operations performed by a processor in the apparatus for training an object search model for unsupervised domain adaptation according to an embodiment.
Figure 4 shows a flowchart of a method of training an object search model for unsupervised domain adaptation according to an embodiment.
Figure 5 shows an example of the clustering scheme and of the scheme for refining the resulting clusters in the method of training an object search model for unsupervised domain adaptation according to an embodiment.
Figure 6 shows a flowchart of the inference stage of the method of training an object search model for unsupervised domain adaptation according to an embodiment.
Figure 7 shows the results of comparing the performance of an object search model to which the training method according to an embodiment is applied with that of other object search models.
Figure 8 shows the results of comparing the performance of object search models to which the training method according to an embodiment is sequentially applied.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, and substitutes falling within the technical idea described through the embodiments.

Terms such as “first” or “second” may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may also be referred to as a first component.

When a component is referred to as being “connected” to another component, it should be understood that it may be directly connected or coupled to the other component, or that other components may be present in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as “comprise” or “have” are intended to designate the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and, unless clearly defined in this specification, are not to be interpreted in an idealized or overly formal sense.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, identical components are assigned the same reference numerals regardless of the figure in which they appear, and duplicate descriptions thereof are omitted.

Person search technology can be broadly classified into two types according to how its machine learning models are trained: for example, supervised person search and weakly supervised person search.

In supervised person search, both the machine learning model for person detection and the machine learning model for person re-identification can be trained using supervised learning. Supervised person search can achieve high recognition rates when the probability distribution of the source domain matches or is similar to that of the target domain. It is characterized by providing the machine learning model with both bounding box labels, which indicate the locations of people, and ID labels, which indicate their identities. Unlike person re-identification, supervised person search uses large-scale datasets, and it represents the early stage of person search technology, performing the two tasks of person detection and person re-identification together in an end-to-end network. Later developments include techniques that mitigate the impact of incorrect person detection results through a network with an attention structure, and techniques that reflect the quality of detection results in the person re-identification stage through a weighting scheme. Techniques have also been developed that propose an anchor-free machine learning model for the person detection stage to reduce the high computational cost of the existing Faster R-CNN (Region-Based Convolutional Neural Network).

However, the source domain and target domain do not always share the same probability distribution, and when there is insufficient data, training on a different domain may be necessary. The difference between the probability distributions of the source domain and the target domain is called a domain gap, and it can cause performance degradation. Nevertheless, supervised person search techniques have not been developed for cases in which the probability distributions of the source and target domains do not match. Moreover, supervised person search incurs enormous economic and time costs because labels must be generated for large-scale training datasets.

To address this, weakly supervised person search provides only bounding box labels for the target domain, without ID labels. Accordingly, weakly supervised person search can use supervised learning for the person detection model and unsupervised learning for the person re-identification model. Developments in weakly supervised person search include a technique that uses a Siamese network to learn, in the clustering stage, the background context in which a person appears so that it is consistent for each object, and a clustering technique based on the logic that people who appear together in one image are likely to appear in other images as well. Techniques have also been developed that use various contextual cues to learn only a person's own features, separated from background information. However, weakly supervised person search still requires bounding box labels to be generated for the target domain for training, and techniques addressing the probability distribution mismatch between the source and target domains have still not been developed.

The purpose of resolving the probability distribution mismatch between the source and target domains may be to obtain good results when the model is tested in the target domain. Accordingly, during domain adaptation, effectively transferring the knowledge acquired from the source domain to the target domain can be a way to solve the above problem. Such domain adaptation techniques are being actively researched in the field of person re-identification, but not in the field of person search.

Figure 1 shows an example of an unsupervised domain adaptation technique.

According to an embodiment, the unsupervised domain adaptation technique may assume a situation in which the source domain and the target domain differ. Bounding boxes may be detected in both the images of the source domain (e.g., source images) 110 and the images of the target domain (e.g., target images) 130, but a ground truth label for each bounding box may be assumed to exist only for the source domain 110. In other words, training data belonging to the source domain may include source images, ground truth bounding box labels given in advance for the source images, and a ground truth label for each ground truth bounding box.
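A small data model makes the asymmetry between the two domains concrete. In this hypothetical sketch, a source sample carries its ground truth boxes and one identity label per box, while a target sample carries only the image. The (x1, y1, x2, y2) box format and the class names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format


@dataclass
class SourceSample:
    image_path: str
    boxes: List[Box]          # ground truth bounding boxes
    identity_ids: List[int]   # one ground truth identity label per box

    def __post_init__(self) -> None:
        if len(self.boxes) != len(self.identity_ids):
            raise ValueError("each ground truth box needs exactly one identity label")


@dataclass
class TargetSample:
    image_path: str  # no boxes or identity labels are provided
```

Target detections and their pseudo-labels would then be produced during training by the detector and the clustering step, rather than being stored with the sample.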

For example, the source domain may denote a domain over which data (e.g., source data, source images) acquired in a given environment (e.g., a source environment) are distributed, and the target domain may denote a domain over which data (e.g., target data, target images) acquired in a different environment (e.g., a target environment) are distributed. Mathematically, this means that the probability distribution of data belonging to the source domain differs from that of data belonging to the target domain. For example, images acquired by a surveillance camera and images acquired by a smartphone camera exhibit different characteristics and thus may belong to different domains.

For example, a machine learning model (e.g., the object detection model of Figure 2 described below) may be trained using the training data belonging to the source domain described above. The machine learning model may generate, from each image 111, 112 of the source domain, bounding box detection results and/or extraction results of ID labels (ID1, ID2, ID3, ID4) for each detected bounding box, and may extract features of the source objects based on these results. The source domain space 120 shows the distribution, in a feature space, of the feature vectors of the source objects extracted from each image 111, 112. The machine learning model may classify the feature vectors of the source objects into per-image groups 121, 122; for example, image 111 may be classified into group 121 and image 112 into group 122. A machine learning model trained in this way can also be used for images belonging to the target domain. Using a machine learning model trained on the source domain 110 for data (e.g., images) 131, 132, 133 belonging to the target domain 130 can be interpreted as transferring the knowledge learned from the source domain 110 to the target domain 130. Therefore, images 131, 132, 133 belonging to a target domain for which ground truth labels are difficult or impossible to obtain can be classified into groups 141, 142, 143 using this machine learning model, as shown in the target domain space 140. The target domain space 140 shows the distribution, in the feature space, of the feature vectors of the target objects extracted from each image 131, 132, 133. In the target domain space 140, the feature vectors of the target objects may be classified into per-image groups 141, 142, 143 using the knowledge transferred from the source domain 110; for example, image 131 may be classified into group 141, image 132 into group 142, and image 133 into group 143. Accordingly, the target domain may not require labels during training of the machine learning model.

Unsupervised domain adaptive person re-identification (re-ID) may include approaches that reduce the distance (e.g., the discrepancy) between the source domain and the target domain while the target domain is unlabeled. One such approach uses a generative model to transform the style of the source domain to resemble the target domain. Another trains on the source domain and the target domain together. However, unsupervised domain adaptive person re-ID is a person identification technique, not a person search technique. It has no person detection step, so it applies only in the restricted setting where an image contains a single person. The present invention therefore builds on unsupervised domain adaptive person re-ID to address these problems.

FIG. 2 shows a block diagram of an apparatus for training an object search model for unsupervised domain adaptation according to an embodiment.

An electronic device according to an embodiment (e.g., the object search model training apparatus 200) may include a communication unit 210, a memory 220, and a processor 230. The training apparatus 200 may train the object search model before searching for objects (e.g., people). The object search model may include an object detection model and an object identification model. For example, the electronic device may detect objects in an image using the trained object detection model. It may then identify the detected objects using the trained object identification model.
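The two-stage structure (detect, then identify) can be sketched as a small search loop. This is a minimal illustration under stated assumptions, not the patent's inference procedure (which is described with FIG. 6). The callables `detect` and `embed` are hypothetical stand-ins for the trained object detection and identification models. Ranking gallery detections by cosine similarity to a query feature is also an assumption made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); box format is an assumption


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(
    query_feature: Sequence[float],
    gallery_images: Sequence[object],
    detect: Callable[[object], List[Box]],        # hypothetical detection model
    embed: Callable[[object, Box], List[float]],  # hypothetical identification model
) -> List[Tuple[int, Box, float]]:
    """Return (image index, box, similarity) tuples, most similar first."""
    results = []
    for idx, image in enumerate(gallery_images):
        for box in detect(image):        # stage 1: find candidate objects
            feat = embed(image, box)     # stage 2: describe each detected object
            results.append((idx, box, cosine(query_feature, feat)))
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

Keeping the two models as separate callables mirrors the split between the object detection model and the object identification model. Either model can be retrained without touching the search loop.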

The communication unit 210 may be connected to an external device and receive from it the images needed for training and inference of the object search model. The training apparatus 200 may also serve as an object search apparatus, as described with reference to FIG. 6 below. It may therefore also receive, through the communication unit 210, images on which object search is to be performed.

The memory 220 may store instructions for training the object detection model and the object identification model, together with the hyperparameter values required for training. Hyperparameters are values that the user sets directly when building a model. According to an embodiment they include … and …, but are not limited thereto. The memory 220 may also store, temporarily or long-term, data received from an external device through the communication unit 210. The memory 220 may be a non-volatile storage device, which retains stored information even without power, and/or a volatile storage device. For example, the memory 220 may include:
- NAND flash memory, such as a compact flash (CF) card, a secure digital (SD) card, a memory stick, a solid-state drive (SSD), or a micro SD card;
- magnetic computer storage, such as a hard disk drive (HDD);
- optical disc drives, such as CD-ROM or DVD-ROM drives.

The instructions may cover all operations performed by the processor 230. The processor 230 may execute the instructions stored in the memory 220 to train the object detection model and the object identification model.

FIG. 3 shows a flowchart, organized by domain, of the operations performed by the processor of the apparatus for training an object search model for unsupervised domain adaptation according to an embodiment.

The processor 230 may perform operation 3100 using the object detection model 3110 and operation 3200 using the object identification model 3210.

In operation 3100, the training apparatus may perform supervised training of the object detection model 3110 using training data belonging to the source domain. Supervised training is possible because the source domain provides ground-truth labels. Once supervised training is complete, the object detection model 3110 may also be applied to unlabeled target-domain data when training the object identification model 3210, described below.

The object detection model 3110 may be designed and trained to output bounding boxes and object information (e.g., an ID label indicating the object type) for an input image. For example, the input may be from the source domain (e.g., an entire scene image). The output of the object detection model 3110 may then include source bounding box labels, marking where objects are predicted to be, and source object detection ID labels (e.g., the types of the detected objects). The processor 230 may train the object detection model 3110 using the ground-truth (GT) labels of the source domain (e.g., bounding box labels and object detection ID labels). Note that in person search a detected object must be a person; anything else may be treated as background. The training apparatus may therefore train the object detection model 3110 by treating as background every bounding box whose object detection ID label does not indicate a person.
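The person-versus-background rule above amounts to collapsing the ground-truth classes into two labels before detector training. The sketch below assumes string class names, a class called `"person"`, and the label constants `PERSON`/`BACKGROUND`. These are hypothetical choices, not values from the patent.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format

BACKGROUND = 0  # hypothetical label id for "not a person"
PERSON = 1      # hypothetical label id for "person"


def to_person_vs_background(
    gt_boxes: List[Box],
    gt_classes: List[str],
    person_class: str = "person",
) -> List[Tuple[Box, int]]:
    """Relabel every ground-truth box as PERSON or BACKGROUND for detector training."""
    targets = []
    for box, cls in zip(gt_boxes, gt_classes):
        label = PERSON if cls == person_class else BACKGROUND
        targets.append((box, label))
    return targets
```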

The trained object detection model 3110 may then be used to train the object identification model 3210 described below and/or for inference with the object search model. In this case, data from the target domain rather than the source domain may be input to the object detection model 3110. For target-domain input, the output of the object detection model 3110 may be an object detection result (e.g., target bounding box labels and target object detection ID labels). The processor 230 may perform operation 3200 using the objects detected in the target domain.

The object detection model may include a Faster R-CNN model. Faster R-CNN may consist of a deep-learning-based region proposal network (RPN) and a Fast R-CNN model. The RPN selectively searches for regions likely to contain objects and proposes them as candidates. The Fast R-CNN model then receives these proposals and predicts each object's class and location. Faster R-CNN combines both stages into a single, fast, unified network, so only one network needs to be trained and run.

In operation 3200, the training apparatus may perform unsupervised training of the object identification model 3210 using the source domain, the target domain, and an intermediate domain. The unsupervised training may be performed with the object identification model 3210. The inference step is described later with reference to FIG. 6.

The processor 230 may train the object identification model 3210 to identify the objects detected by the object detection model 3110. The object identification model 3210 may receive an image corresponding to an object detection result (e.g., a patch image for a bounding box classified as a person). It may be designed and trained to output an object identification ID label for that image. As described below, this label does not indicate the actual identity of the object. It may instead be a temporary identifier that distinguishes the objects in bounding boxes detected across multiple images. For example, bounding boxes from different images that are determined to contain the same object may receive the same object identification ID label. Bounding boxes determined to contain different objects may receive different object identification ID labels.

The object identification model 3210 according to an embodiment may include the following components:
- a feature extraction model 3211;
- an intermediate domain module (IDM) 3212;
- a global average pooling (GAP) layer 3213;
- a fully-connected (FC) layer 3214;
- a part classification model 3215;
- a clustering algorithm 3216.

The processor 230 may use the feature extraction model 3211 to extract features of the source objects and target objects detected by the object detection model 3110. A source object is an object detected in a source image by the object detection model 3110. A target object is an object detected in a target image by the object detection model 3110. To extract feature data (e.g., a feature vector) for a source object, the processor 230 may crop the region corresponding to that object from the source image. It may then input the resulting source patch image into the feature extraction model 3211. Target objects are handled the same way: the processor 230 crops the corresponding region from the target image and inputs the resulting target patch image into the feature extraction model 3211.
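Cropping a patch image from a detection is simple array slicing. The minimal sketch below assumes the following, none of which the patent specifies:
- an image is a nested list of pixel rows;
- a box is `(x1, y1, x2, y2)` with exclusive upper bounds;
- out-of-bounds boxes are clipped to the image.

A real pipeline would also resize each patch to the feature extractor's input size.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # H x W image as nested lists (assumed representation)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive (assumed)


def crop_patch(image: Image, box: Box) -> Image:
    """Return the sub-image inside `box`, clipped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x1, y1, x2, y2 = box
    # Clip so that boxes extending past the image border still produce a valid patch.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def crop_all(image: Image, boxes: Sequence[Box]) -> List[Image]:
    """Crop one patch per detected bounding box."""
    return [crop_patch(image, b) for b in boxes]
```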

The processor 230 may generate intermediate feature data by mixing the source feature data and the target feature data extracted by the feature extraction model 3211. The intermediate feature data are feature data belonging to an intermediate domain. For example, the processor 230 may perform this mixing with the IDM 3212. IDM-based generation of intermediate feature data is described in: Dai, Y.; Liu, J.; Sun, Y.; Tong, Z.; Zhang, C.; and Duan, L.-Y. 2021. IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 11864-11874. The intermediate domain bridges the source domain and the target domain. Routing through it makes it easier to transfer knowledge learned on the source domain to the target domain during domain adaptation.
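The idea of an intermediate domain can be illustrated by interpolating a source feature and a target feature. This is a deliberately simplified sketch: it uses a fixed, user-chosen mixing ratio `lam` (a hypothetical parameter). The IDM of Dai et al. instead learns the mixing ratios from the features themselves. The sketch shows only the interpolation step, not that module.

```python
from typing import List, Sequence


def mix_features(source: Sequence[float], target: Sequence[float], lam: float) -> List[float]:
    """Interpolate features: lam * source + (1 - lam) * target.

    lam = 1.0 reproduces the source feature, lam = 0.0 the target feature, and
    values in between give a point on the path between the two domains.
    """
    if len(source) != len(target):
        raise ValueError("feature dimensions must match")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]
```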

The processor 230 may pass the source feature data, the target feature data, and the intermediate feature data separately through the GAP layer 3213 to reduce the number of features. The processor 230 may apply the reduced intermediate feature data directly to the domain adaptation loss function. It may apply the reduced source and target feature data directly to the feature extraction loss function. The FC layer 3214 may output the product of its input feature and a weight matrix. The processor 230 may therefore pass each of the reduced source, target, and intermediate feature data separately through the FC layer 3214 to obtain outputs in matrix form. These outputs may be used to compute the values of the domain adaptation loss function and the feature extraction loss function, respectively.
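The GAP-then-FC path can be written out in plain Python to make the shapes concrete. The sketch assumes a C x H x W feature map stored as nested lists and an FC weight matrix with one row per output unit. GAP reduces each channel to its spatial mean, and the FC layer computes W x + b. Layer sizes and weights are illustrative, not values from the patent.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # C x H x W, assumed layout


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over its H x W grid, giving one value per channel."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fully_connected(
    x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]
) -> List[float]:
    """Compute W x + b, where `weight` has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
```

For example, a 2-channel 2x2 map `[[[1, 3], [5, 7]], [[0, 0], [0, 4]]]` pools to `[4.0, 1.0]`. With weight `[[1, 0], [1, 1]]` and bias `[0, 0.5]`, the FC output is `[4.0, 5.5]`.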

The processor 230 may apply the intermediate feature data described above to the part classification model 3215, which classifies them into a plurality of parts. The processor 230 may then apply the predicted probabilities of the correct labels for these parts to a part classification loss function, and use that loss to train the part classification model 3215. The processor 230 may also compute a reliability score as the average of the correct-label probabilities over the parts.

Next, the processor 230 may pass the target feature data through the clustering algorithm 3216 to cluster the target objects, and then refine the resulting clusters. It may apply the refined clustering result to the feature extraction loss function to train the feature extraction model 3211. It may also apply the refined clustering result together with the reliability score to the domain adaptation loss function to train the object identification model 3210. Training the object identification model 3210 in this way also trains the models it contains. The operations of the feature extraction model 3211, the IDM 3212, the GAP layer 3213, the FC layer 3214, the part classification model 3215, and the clustering algorithm 3216 may be performed simultaneously. They are not limited thereto, however, and may also be performed sequentially.

FIG. 4 shows a flowchart of a method for training an object search model for unsupervised domain adaptation according to an embodiment.

In step 410, the processor may detect source objects in source data belonging to the source domain based on the trained object detection model. The processor may perform supervised training of the object detection model using the GT labels of the source domain (e.g., bounding box labels and ID labels). For source objects, the GT labels used for training may also be passed directly to the next step without detection by the object detection model.

In step 420, the processor may detect target objects in target data belonging to the target domain based on the object detection model. Each domain contains a plurality of images, and the objects in an image may be, for example, one or more people.

In step 430, the processor may use the feature extraction model to extract source feature data from the detected source objects and target feature data from the detected target objects. For example, the processor may extract the source and target feature data using a re-ID model.

In step 440, the processor may determine a reliability score for each detected target object based on intermediate feature data generated from the source feature data and the target feature data. The processor may generate the intermediate feature data by inputting the source and target feature data into the IDM. The reliability score is described below with reference to Equations 1 and 2.

In step 450, the processor may cluster the target objects detected from the target data based on the target feature data. Through this clustering, the processor may assign pseudo labels to the target objects.

In step 460, the processor may refine the clusters of target objects by applying uniqueness logic. The uniqueness logic encodes the constraint that the same person cannot appear more than once in a single image. The way clusters are refined with this logic is described below with reference to FIG. 5.

In step 470, the processor may train the object identification model, including the feature extraction model, using an objective function based on the refined clustering result and the reliability scores. An objective function is the function whose maximum or minimum the optimization algorithm seeks when training a model; the present invention may use a loss function as the objective function. The loss (or error) is measured as a non-negative number, in some formulations normalized to lie between 0 and 1, where 0 corresponds to a perfect model. The model may therefore be optimized to bring the loss as close to 0 as possible. The objective function based on the reliability score is described below with reference to Equation 4.

The target domain has no bounding box labels, so object detection there by the object detection model is likely to be inaccurate. Object identification with the object identification model operates on the objects found by the object detection model, so detection errors can propagate and degrade the identification results. To address this, the processor may compute a reliability score to assess each object detection result. The processor may determine the reliability score from the prediction score matrix of the parts classified by the part classification model.

The part classification model may apply partial max pooling to the intermediate feature data, so that each of several divided regions is designated as one part. For example, the part classification model may divide a detected object (e.g., a person) horizontally into equal stripes and label each stripe: the first stripe may be labeled 0, the second 1, and so on. The processor may classify the parts with a part classifier consisting of a single multi-layer perceptron (MLP) layer. Given a mini-batch, the output of the part classification model is a prediction score matrix P. The element in row i and column j of P represents the probability that the i-th part is classified as the j-th part, so the number of columns equals the number of parts. The processor may train the part classification model so that its loss function decreases. The loss function of the part classification model can be expressed as Equation 1.

In Equation 1, the terms denote the ground-truth (GT) label of the i-th part and the predicted probability of the correct label. The equation's hyperparameter may be set to 0.04.
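The stripe-splitting and labeling step that produces the inputs and targets for Equation 1 can be sketched as follows. This is one reading of "partial max pooling" under these assumptions:
- a C x H x W feature map as nested lists;
- `num_parts` equal-height horizontal stripes;
- a max taken over each stripe per channel;
- label k for the k-th stripe from the top.

The MLP part classifier and the loss of Equation 1 itself are not reproduced here.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # C x H x W, assumed layout


def split_into_parts(fmap: FeatureMap, num_parts: int) -> List[Tuple[List[float], int]]:
    """Max-pool each horizontal stripe into one part vector labelled by its stripe index."""
    channels = len(fmap)
    height = len(fmap[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts in this sketch")
    stripe = height // num_parts
    parts = []
    for k in range(num_parts):
        vec = []
        for c in range(channels):
            rows = fmap[c][k * stripe:(k + 1) * stripe]
            vec.append(max(v for row in rows for v in row))  # partial max pooling
        parts.append((vec, k))  # self-supervised label: 0 for the top stripe, 1 next, ...
    return parts
```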

The reliability score may be determined as the average of these predicted probabilities, as expressed in Equation 2.

In Equation 2, the probability term denotes, as in Equation 1, the predicted probability of the correct label. The reliability score therefore also indicates how well the parts are classified. A reliability score of 1 may mean that all parts are classified correctly and that the target object was detected accurately. A score below 1 may mean that the target object was detected inaccurately. For example, the detected target object may be background rather than a person. In that case the part classification model has difficulty classifying its parts, the probability of correct classification drops, and the reliability score may fall below 1.
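Equation 2 itself is not reproduced in this text. Following the description (the average of the correct-label probabilities over the parts), a minimal sketch might look like the code below. It assumes `probs` is the part prediction score matrix for one detected object, with row i holding the class probabilities for part i. By default, part i's correct label is taken to be i, matching the top-to-bottom labeling above.

```python
from typing import List, Optional, Sequence


def reliability_score(
    probs: Sequence[Sequence[float]], labels: Optional[List[int]] = None
) -> float:
    """Average predicted probability of the correct part label.

    A score of 1.0 means every part was classified correctly with full confidence;
    lower scores suggest the detection (e.g., a background box) is unreliable.
    """
    if labels is None:
        labels = list(range(len(probs)))  # part i is expected to be class i
    correct = [row[y] for row, y in zip(probs, labels)]
    return sum(correct) / len(correct)
```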

The processor needs ID labels to train the object identification model, but the target domain in the present invention has none, so ID labels must be assigned. The processor may assign pseudo labels (e.g., object identification ID labels) with the clustering algorithm DBSCAN (density-based spatial clustering of applications with noise). DBSCAN clusters samples according to differences in local density. On its own, however, this clustering tends to be inaccurate. To improve it, the present invention may refine the resulting clusters by applying uniqueness logic.
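Pseudo-label assignment with DBSCAN can be sketched with a small textbook implementation. The sketch makes these assumptions:
- plain Euclidean distance, although in practice a precomputed distance, for example one based on k-reciprocal re-ranking, is sometimes used instead;
- hypothetical values for `eps` and `min_samples`;
- noise points, labeled `-1`, are left without a pseudo label.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[Optional[int]]:
    """Return a cluster id (pseudo label) per point, or NOISE."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        # The point itself is counted, as in the usual DBSCAN definition.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels  # every entry is now a cluster id or NOISE
```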

FIG. 5 shows an example of the clustering method and the cluster refinement method used in the method for training an object search model for unsupervised domain adaptation according to an embodiment.

Diagrams 510 and 520 show the distribution, in the feature space, of the feature vectors of target objects detected from the target images. The processor may assign ID labels based on the clustering result. For example, the processor may designate object 511 as the anchor object in diagrams 510 and 520. The clustering density may concentrate around the features of a shared means of transportation. The clustering algorithm may then classify objects 511, 512, 513, and 514, which use the same means of transportation, as the same object and group them into one cluster. All members of a cluster receive the same ID label, so objects 511, 512, 513, and 514 may share one ID label. To optimize the feature extraction model, the processor may designate as the negative object the closest of the objects that have a different ID label. In diagram 510, this is object 515. Conversely, the processor may designate as the positive object the farthest of the objects that have the same ID label. In diagram 510, this is object 512.

However, objects 511 and 513 are different objects that nonetheless fall in the same cluster, so they must be distinguished. To do so, the processor may group the objects in the cluster by image. The uniqueness logic holds that people appearing in the same image are different people. Accordingly, in group 521, which corresponds to a single image, every object other than the anchor object 511 may be excluded from the cluster. In the refined clustering result, objects 511, 512, and 514 share the same ID label. The processor may therefore change the negative object from object 515 to object 513.
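The refinement step for a single anchor can be sketched as a filter over the anchor's cluster. The sketch assumes each object ID maps to the ID of the image it was detected in. Any other member from the anchor's own image is removed, since by the uniqueness logic it must be a different person. Removed members then become candidates for the negative object. The image IDs in the usage check below are hypothetical and mirror the FIG. 5 example.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def refine_cluster_for_anchor(
    members: Sequence[Hashable],
    image_of: Dict[Hashable, Hashable],
    anchor: Hashable,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split the anchor's cluster into (kept positives, removed same-image members)."""
    anchor_image = image_of[anchor]
    kept, removed = [], []
    for m in members:
        if m != anchor and image_of[m] == anchor_image:
            removed.append(m)  # same image as the anchor, so a different person
        else:
            kept.append(m)
    return kept, removed
```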

Meanwhile, the feature extraction model may extract features of the source domain, the target domain, and the intermediate domain. The processor may train the feature extraction model so as to reduce a loss function calculated based on the refined clustering result. The loss function of the feature extraction model may consist of a batch-hard triplet loss and a cross-entropy-based classification loss.

The batch-hard triplet loss may be expressed as Equation 3.

In Equation 3, for a mini-batch of size P×K, the processor may randomly select P IDs and K feature vectors for each ID. The remaining terms denote the reference (anchor) object, the positive object, and the negative object, respectively. D(·) denotes the Euclidean distance, and m is a hyperparameter (the margin) that may be set by the user.
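Because Equation 3 itself is not reproduced here, the following sketch assumes the widely used batch-hard formulation: for every anchor in a P×K batch, take the farthest positive and the closest negative and apply a hinge with margin m. Averaging (rather than summing) over anchors is an assumption, as are the helper names `sample_pk_batch` and `batch_hard_triplet_loss`.

```python
import math
import random

def euclidean(a, b):
    """D(.): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sample_pk_batch(features_by_id, P, K, rng=random):
    """Randomly pick P IDs, then K feature vectors per ID (batch size P*K)."""
    ids = rng.sample(sorted(features_by_id), P)
    feats, labels = [], []
    for pid in ids:
        for f in rng.sample(features_by_id[pid], K):
            feats.append(f)
            labels.append(pid)
    return feats, labels

def batch_hard_triplet_loss(feats, labels, margin):
    """Mean over anchors of max(0, m + hardest-positive D - hardest-negative D)."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two IDs in the batch (P >= 2)")
    total = 0.0
    for a, (fa, ya) in enumerate(zip(feats, labels)):
        # Hardest positive: farthest sample with the anchor's ID (excluding itself).
        pos = [euclidean(fa, f) for i, (f, y) in enumerate(zip(feats, labels))
               if y == ya and i != a]
        # Hardest negative: closest sample with a different ID.
        neg = [euclidean(fa, f) for f, y in zip(feats, labels) if y != ya]
        if not pos:
            raise ValueError("each ID needs at least two samples (K >= 2)")
        total += max(0.0, margin + max(pos) - min(neg))
    return total / len(feats)
```

For example, with features `[(0, 0), (2, 0), (1, 0), (3, 0)]`, labels `[0, 0, 1, 1]` and `margin=0.5`, every anchor has a hardest-positive distance of 2 and a hardest-negative distance of 1, so the loss is 1.5.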

Through the batch-hard triplet loss, the processor may train the feature extraction model so that the distance between object (513) in FIG. 5 (the negative object) and object (511) (the reference object) increases, and so that the distance between object (512) (the positive object) and object (511) (the reference object) decreases. A loss function quantifies the difference between the actual value and the predicted value, and a model is optimized by reducing this loss. Training determines the model's parameters, so optimizing the model amounts to optimizing those parameters, that is, updating them in the direction that reduces the loss function. Training the feature extraction model therefore means updating the parameters of the feature extraction model.
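To make "updating parameters in the direction that reduces the loss" concrete, the sketch below takes one gradient-descent step on a single triplet. The feature extractor is a hypothetical two-parameter scaling, and the gradient is estimated by central differences rather than by the backpropagation a real training framework would use.

```python
import math

def embed(w, x):
    """Hypothetical toy feature extractor: scale each input dimension by w."""
    return [wi * xi for wi, xi in zip(w, x)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(w, anchor, pos, neg, margin):
    fa, fp, fn = embed(w, anchor), embed(w, pos), embed(w, neg)
    return max(0.0, margin + dist(fa, fp) - dist(fa, fn))

def numerical_grad(loss_fn, w, eps=1e-6):
    """Central-difference estimate of d(loss)/d(w_i) for each parameter."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad

def sgd_step(loss_fn, w, lr):
    """One gradient-descent update: move every parameter against its gradient."""
    return [wi - lr * gi for wi, gi in zip(w, numerical_grad(loss_fn, w))]

anchor, pos, neg = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)

def loss(w):
    return triplet_loss(w, anchor, pos, neg, margin=0.5)

w0 = [1.0, 1.0]
w1 = sgd_step(loss, w0, lr=0.1)
print(loss(w0), round(loss(w1), 6))  # 0.5 0.3
```

The step shrinks the first weight (pulling the positive toward the anchor) and grows the second (pushing the negative away), which is the behaviour the triplet loss is meant to induce.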

The cross-entropy-based classification loss may be expressed as Equation 4.

In Equation 4, one term denotes the predicted probability that a sample belongs to the i-th ID, and k indicates the source domain or the target domain. The label term is the ground-truth ID for the source domain and, for the target domain, the pseudo ID assigned as a result of clustering. N denotes the number of IDs, and another term is a hyperparameter that may be set to 1. A per-sample weighting term may always be 1 when the input comes from the source domain, because the correct cluster is always known. When the input comes from the target domain, uniqueness logic is applied: the weight may be 0 if objects grouped into the same cluster (i.e., objects with the same ID) come from the same image, and 1 if they come from different images.
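Equation 4's symbols are not reproduced here, so the sketch below models only the structure described in the text: a softmax cross-entropy over IDs whose per-sample weight is always 1 for source samples and, for target samples, 0 or 1 according to the uniqueness rule. The function names are hypothetical, the additional hyperparameter that may be set to 1 is not modeled, and averaging over the batch size is an assumption.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def uniqueness_weight(is_source, same_image_as_cluster_mate):
    """Per-sample weight (assumed encoding): always 1 for source samples;
    for target samples, 0 when a same-ID cluster mate comes from the
    same image, otherwise 1."""
    if is_source:
        return 1.0
    return 0.0 if same_image_as_cluster_mate else 1.0

def weighted_id_cross_entropy(logits_batch, labels, weights):
    """Mean of -w * log p(label) over the batch.

    labels: ground-truth IDs for source samples, clustering pseudo IDs
    for target samples.
    """
    total = 0.0
    for logits, y, w in zip(logits_batch, labels, weights):
        total += -w * math.log(softmax(logits)[y])
    return total / len(labels)
```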

Meanwhile, the object identification model may use a domain adaptation loss as its objective function. The domain adaptation loss may be computed in two parts: a classification part and a feature part. For the classification part, the domain adaptation loss may be expressed as Equation 5, based on the reliability score and the refined clustering result described above.

In Equation 5, one term denotes the predicted probability that the intermediate-domain feature carries the label of the i-th ID. Another term denotes the weight that determines how strongly the source domain and the target domain are mixed when the intermediate domain is created, and a further term is a hyperparameter that may be set to 1.
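A minimal sketch of the classification part, assuming that each intermediate feature is a convex mix of a source and a target feature with a mixing weight (called `lam` below, as in intermediate-domain mixup methods) and that the cross-entropy uses the correspondingly mixed label, scaled by each sample's reliability score. The exact form of Equation 5 is not shown here, and the single classifier over source IDs and target pseudo IDs is an assumption.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mix_features(f_src, f_tgt, lam):
    """Intermediate-domain feature as a convex combination (assumed mixing rule)."""
    return [lam * s + (1.0 - lam) * t for s, t in zip(f_src, f_tgt)]

def cls_domain_adaptation_loss(mid_logits, src_labels, tgt_labels, lams, gammas):
    """Reliability-weighted mixed-label cross-entropy on intermediate features.

    mid_logits: ID logits predicted from each intermediate-domain feature.
    src_labels / tgt_labels: source ground-truth ID and refined target
    pseudo ID of the pair that was mixed.
    lams: mixing weight per pair; gammas: reliability score per sample.
    """
    total = 0.0
    for logits, ys, yt, lam, g in zip(mid_logits, src_labels, tgt_labels, lams, gammas):
        p = softmax(logits)
        ce = -(lam * math.log(p[ys]) + (1.0 - lam) * math.log(p[yt]))
        total += g * ce  # low-reliability samples contribute less
    return total / len(mid_logits)
```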

In addition, for the feature part, the domain adaptation loss may be expressed as Equation 6, based on the reliability score and the distances between domains.

In Equation 6, two terms denote the features of the source domain and the target domain, and another denotes the feature of the intermediate domain. The feature domain adaptation loss computes distances between domain features using the L2 norm (Euclidean distance): either the distance between the intermediate-domain feature and the source-domain feature, or the distance between the intermediate-domain feature and the target-domain feature. The reliability score γ is computed for each sample in the mini-batch, and the domain adaptation loss is computed over all samples in the mini-batch. As a result, the domain adaptation loss limits how much intermediate-domain features with low reliability scores influence model training.
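The sketch below shows how the reliability score γ can gate the feature part. Following the claims, γ is taken as the average correct-prediction probability over the parts. The per-sample distance term, a `lam`-weighted sum of the L2 distances from the intermediate feature to the source and target features, is an assumed form, since Equation 6 is not reproduced here; the function names are hypothetical.

```python
import math

def l2(a, b):
    """L2 norm (Euclidean distance) between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reliability_score(part_correct_probs):
    """gamma: mean correct-prediction probability over the parts."""
    return sum(part_correct_probs) / len(part_correct_probs)

def feat_domain_adaptation_loss(f_src, f_tgt, f_mid, lams, gammas):
    """Assumed form: per sample,
    gamma * (lam * d(mid, src) + (1 - lam) * d(mid, tgt)),
    averaged over the mini-batch."""
    total = 0.0
    for s, t, m, lam, g in zip(f_src, f_tgt, f_mid, lams, gammas):
        total += g * (lam * l2(m, s) + (1.0 - lam) * l2(m, t))
    return total / len(f_mid)
```

A sample with γ = 0 contributes nothing, which mirrors the stated goal of limiting the influence of unreliable intermediate features.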

FIG. 6 is a flowchart of the inference stage of a method of training an object search model for unsupervised domain adaptation according to an embodiment.

Once training of the object identification model is complete, the processor may perform an inference stage using test data. The test data may include a query containing an image of a single object (e.g., a person) and gallery images each containing multiple objects. The inference stage may be a test that finds the query object among the gallery images. When the test data is input to the object detection model, the processor may detect the test objects in the test data. The processor may input the detected test objects into the trained feature extraction model (610) to extract test feature data (611), compute the similarity between feature data based on the extracted test feature data (611), and retrieve candidate matches for the query in descending order of similarity. This allows a user to accurately find the query object among many objects. Unlike the training stage described above, in the inference stage the object identification model (600) may consist of only the feature extraction model (610).
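The retrieval step can be sketched as ranking gallery features by their similarity to the query feature. The text does not name the similarity measure; cosine similarity is assumed here, and the features are plain Python lists standing in for outputs of the feature extraction model (610).

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_gallery(query_feat, gallery_feats):
    """Gallery indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: scores[i], reverse=True)
```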

FIG. 7 shows the results of comparing the performance of an object search model trained with the method according to an embodiment against other object search models.

The performance of the object search model trained with the method according to an embodiment of the present invention can be evaluated on the CUHK-SYSU and PRW benchmark datasets. The CUHK-SYSU dataset provides 18,184 images with 96,143 bounding-box labels and 8,432 ID labels. Of these, the training data (e.g., the source domain) comprises 11,206 images with 5,532 ID labels, and the test data (e.g., the target domain) comprises 6,978 gallery images and 2,900 query images. The PRW dataset was captured from video recorded by six different cameras and contains 11,816 frames with 43,110 bounding-box labels and 932 ID labels. Its training data comprises 5,704 images with 482 ID labels, and its test data comprises 6,112 gallery images and 2,057 query images. mAP and Top-k accuracy were used as performance metrics. On CUHK-SYSU, the model achieves performance comparable to supervised and weakly supervised models without using target-domain labels. On PRW, it outperforms the other supervised and weakly supervised models.
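For reference, the two metrics can be computed per query from a ranked list of match flags. This is a simplified sketch: the optional `num_relevant` argument lets the caller count ground-truth matches that never appear in the ranking (for example, missed detections), but the official CUHK-SYSU and PRW evaluation scripts may differ in details.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """AP of one query from a ranked list of booleans (True = correct match)."""
    if num_relevant is None:
        num_relevant = sum(ranked_relevance)
    if num_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / num_relevant

def top_k(ranked_relevance, k):
    """1.0 if any correct match appears in the first k results, else 0.0."""
    return 1.0 if any(ranked_relevance[:k]) else 0.0

def mean_metrics(all_rankings, k):
    """(mAP, Top-k accuracy) averaged over all queries."""
    aps = [average_precision(r) for r in all_rankings]
    hits = [top_k(r, k) for r in all_rankings]
    return sum(aps) / len(aps), sum(hits) / len(hits)
```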

FIG. 8 shows the results of comparing the performance of object search models to which the components of the training method according to an embodiment are applied incrementally.

In FIG. 8, "baseline" denotes an object identification model to which only the basic IDM is applied. "with DA (domain adaptation)" denotes applying to the object identification model a part classification model that computes the reliability score through part classification. "with CR (clustering refinement)" denotes refining the clusters by applying uniqueness logic and applying the result to the object identification model. "CUHK -> PRW" denotes using the CUHK-SYSU dataset as the source domain and the PRW dataset as the target domain; conversely, "PRW -> CUHK" denotes using the PRW dataset as the source domain and the CUHK-SYSU dataset as the target domain. The results show that each technique applied on its own already outperforms the baseline, and that the proposed method (the training method according to an embodiment of the present invention), which applies both DA and CR, performs best.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described; however, those skilled in the art will appreciate that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or command the processing device independently or collectively. Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited number of drawings, those skilled in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are combined or arranged in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set out below.

Claims

A method of training an object search model for unsupervised domain adaptation, the method being performed by a processor and comprising:
detecting a source object in source data belonging to a source domain based on a trained object detection model;
detecting a target object in target data belonging to a target domain based on the object detection model;
extracting, based on a feature extraction model, source feature data from the detected source object and target feature data from the detected target object;
determining a reliability score for the detected target object based on intermediate feature data generated from the source feature data and the target feature data;
clustering the target object detected from the target data based on the target feature data;
refining a cluster for the target object by applying uniqueness logic; and
training an object identification model including the feature extraction model using an objective function based on the refined clustering result and the reliability score.

The method of claim 1, further comprising:
performing supervised training of the object detection model using training data of the source domain.

The method of claim 2, further comprising:
after training the object detection model, performing unsupervised training of the object identification model using the source feature data and the target feature data.

The method of claim 1, wherein determining the reliability score comprises:
generating classification results for a plurality of parts from the intermediate feature data based on a part classification model; and
calculating the reliability score based on an average of correct-prediction probabilities for the plurality of parts in the classification results.

The method of claim 4, wherein the classifying into parts comprises:
training the part classification model to decrease a part loss function calculated based on the correct-prediction probabilities with which the plurality of parts are classified.

The method of claim 1, wherein refining the cluster comprises:
grouping the clustered target objects by image and refining the clusters for the target objects by applying uniqueness logic to each group; and
resetting a negative object and a positive object based on the refined clusters.

The method of claim 6, wherein training the object identification model further comprises:
updating parameters of the feature extraction model so that the distance between the feature vectors of the negative object and a reference object increases and the distance between the feature vectors of the positive object and the reference object decreases.

The method of claim 1, wherein refining the cluster further comprises:
training the feature extraction model to decrease a loss function calculated based on the refined clustering result.

The method of claim 1, wherein training the object identification model further comprises:
training the object identification model to decrease a classification domain adaptation loss function calculated based on the reliability score and the refined clustering result.

The method of claim 1, wherein training the object identification model further comprises:
training the object identification model to decrease a feature domain adaptation loss function calculated based on the reliability score and distance values between domains.

A computer-readable recording medium storing one or more computer programs including instructions for performing the method of any one of claims 1 to 10.

An apparatus for training an object search model for unsupervised domain adaptation, the apparatus comprising:
a memory storing instructions for training an object detection model and an object identification model; and
a processor configured to execute the instructions stored in the memory,
wherein, by executing the instructions stored in the memory, the processor detects, using the trained object detection model, a source object of source data belonging to a source domain and a target object of target data belonging to a target domain; extracts, based on a feature extraction model, source feature data from the source object and target feature data from the target object; determines a reliability score for the target object based on intermediate data generated from the source feature data and the target feature data; clusters the target object based on the target feature data; refines a cluster for the target object by applying uniqueness logic; and trains an object identification model including the feature extraction model using an objective function based on the refined clustering result and the reliability score.

The apparatus of claim 12, wherein the processor
performs supervised training of the object detection model using training data of the source domain, and unsupervised training of the object identification model using the source feature data and the target feature data.

The apparatus of claim 12, wherein the processor
generates classification results for a plurality of parts from the intermediate feature data based on a part classification model, and calculates the reliability score based on an average of correct-prediction probabilities for the plurality of parts in the classification results.

The apparatus of claim 14, wherein the processor
trains the part classification model to decrease a part classification loss function calculated based on the correct-prediction probabilities with which the plurality of parts are classified.

The apparatus of claim 12, wherein the processor
groups the clustered target objects by image, refines the clusters for the target objects by applying uniqueness logic to each group, and resets a negative object and a positive object based on the refined clusters.

The apparatus of claim 16, wherein the processor
updates parameters of the feature extraction model so that the distance between the feature vectors of the negative object and a reference object increases and the distance between the feature vectors of the positive object and the reference object decreases.

The apparatus of claim 12, wherein the processor
trains the feature extraction model to decrease a loss function calculated based on the refined clustering result.

The apparatus of claim 12, wherein the processor
trains the object identification model to decrease a classification domain adaptation loss function calculated based on the reliability score and the refined clustering result.

The apparatus of claim 12, wherein the processor
trains the object identification model to decrease a feature domain adaptation loss function calculated based on the reliability score and distance values between domains.