
WO2020111505A1 - Method and system for generating object gt information for image machine learning - Google Patents

Method and system for generating object GT information for image machine learning

Info

Publication number
WO2020111505A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
image
generating
metadata
frame
Prior art date
Application number
PCT/KR2019/013511
Other languages
French (fr)
Korean (ko)
Inventor
양창모
송재종
추유식
Original Assignee
전자부품연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 전자부품연구원
Publication of WO2020111505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • The present invention relates to a method for generating Ground Truth (GT) information used in machine learning on images or video, and more particularly to a method and system for generating GT information for objects such as people and cars contained in an image or video frame.
  • Object detection is the basic technique underlying object recognition and tracking in images; the conventional approach extracts the GT of each object manually in an offline step and trains a detector on it to perform object detection.
  • In the present invention, standard GT components and a GT description method are defined so that comprehensive information about the person and vehicle objects present in an image or video frame can be described.
  • A further object of the present invention is to provide a method and system that generate GT information efficiently by automating GT tagging.
  • The method for generating GT information of objects in an image according to the present invention includes automatically analyzing the image to generate GT information for each object it contains according to a predetermined GT structure, correcting the generated GT information, and converting the corrected GT information to generate metadata; the GT information for each object includes the object's type, posture, state, location information, and attributes.
  • The GT information generation system includes an image storage unit that stores images; a GT analysis unit that receives an image file from the image storage unit and analyzes it to generate GT information for each object present in each frame; a GT information correction unit that corrects the GT information generated by the GT analysis unit; and a metadata generation unit that converts the corrected GT information into metadata.
  • Standard GT components and a GT description method can thus be defined to describe comprehensive information about objects such as people and vehicles present in a still image or a video frame.
  • By automating GT generation, GT information can be produced more efficiently than with conventional methods.
  • FIG. 1 is a block diagram of a GT information generation system according to an embodiment of the present invention.
  • FIG. 2 is an overall flowchart of a method for generating GT information according to an embodiment of the present invention.
  • FIG. 3 is a detailed flowchart of the GT analysis step shown in FIG. 2.
  • FIG. 4 is a detailed flowchart of the GT information modification step illustrated in FIG. 2.
  • Each component, functional block, or means may consist of one or more sub-components, and the electrical, electronic, and mechanical functions performed by each component may be implemented with various known elements such as electronic circuits, integrated circuits, and application-specific integrated circuits (ASICs), or with mechanical elements; each may be implemented separately, or two or more may be integrated into one.
  • The present invention tags comprehensive information about image objects such as people and vehicles and organizes it into GT metadata.
  • The image-object GT information defined in the present invention is shown in Table 1 below.
  • The frame number has a single value when a still image is input, and identifies each frame when a video is input.
  • The number of objects is the number of objects detected in the corresponding image or frame.
  • An object ID list is constructed with one entry per object; for each object, the type, posture, state, location information, and attribute information are defined under its object ID.
  • Object types are classified as 'person' or 'car', and the posture of an object as it appears in the image or video frame is expressed in eight directions: 'front', 'back', 'left', 'right', 'front-left', 'front-right', 'back-right', and 'back-left'.
  • The state of an object is classified as 'whole', 'cut', or 'overlapped', depending on whether the entire object is visible in the image or video frame and whether it overlaps another object.
  • The location information of an object gives the four coordinates of the object's bounding box, measured from the (0, 0) origin of the image or video frame.
  • The attribute information of an object is composed differently according to the object's type.
  • When the object is a person, for example, the attributes comprise race, gender, age, height, top color, bottom color, and whether glasses are worn; when the object is a car, they may comprise the car's color, license plate number, manufacturer, model, and model year.
  • The present invention thus provides a method and system that define standard GT components capable of describing comprehensive information about the objects in an image or video frame, and that generate and manage GT information accordingly.
  • FIG. 1 is a configuration diagram of a GT information generation system according to an embodiment of the present invention.
  • The system includes an image storage unit 100, a GT analysis unit 110, a GT information correction unit 120, a GT metadata generation unit 130, and a metadata storage unit 140.
  • The image storage unit 100 stores the images or video frames (hereinafter collectively 'frames') that contain the objects whose GT information is to be analyzed and generated.
  • It may be implemented as non-volatile memory such as a hard disk, volatile memory such as RAM, or a register serving as a buffer that temporarily holds streaming data.
  • The GT analysis unit 110 automatically extracts, for each frame of the image received from the image storage unit 100, the information defined by the GT components described above: the object's type, posture, state, location information, and attributes. Automatic GT analysis of the objects in an image can use a self-developed algorithm, open-source software, or a cloud API.
  • The GT information correction unit 120 presents the extracted GT information to the user as a list so that the user can modify the GT information generated by the GT analysis unit.
  • A list of objects with the GT information structure described above is displayed for each frame number (the image number of a still image or the frame number of a video) so that the user can review it.
  • When the user submits corrections, the GT information is updated to reflect the input.
  • The GT metadata generation unit 130 converts the corrected GT information into metadata, and the metadata storage unit 140 stores the metadata.
  • The metadata storage unit 140 and the image storage unit 100 are distinguished from a logical point of view; in hardware, they may be separate storage devices or may reside in a single physical storage device.
  • The automatic GT analysis step proceeds in more detail as follows.
  • The input video is split into frames, and a frame number is assigned to each frame to be analyzed (S211).
  • In the case of a still image, only one frame exists, so no frame splitting is performed.
  • GT analysis is then run automatically on each split frame image (S214).
  • An in-house algorithm, open-source software, or a cloud API can be used; one example of a cloud API is the Sighthound Cloud API.
  • The GT produced by the analysis is preferably organized, as in Table 1 above, into a data structure containing the frame number, the number of objects in each frame, an object ID for each object, and, per object ID, the object's type, posture, state, location information, and attributes.
  • The GT information produced by the analysis is stored for later correction (S216).
  • When automatic analysis is complete, the GT information correction step (S220) is performed.
  • The GT information correction step (S220) lets the user manually revise the automatically analyzed GT information.
  • The stored GT information is first presented to the user as a list through the information correction unit 120 (S222).
  • The object list is organized by image or video frame number.
  • In step S222, when the user selects an object whose information is to be corrected, the selection input is received (S224) and the selected object is switched into an editable state.
  • An input from the user modifying the selected object's GT information is received and applied (S226). Specifically, the user can modify the object's type, posture, state, location information, and attribute information, and the information correction unit 120 applies each correction input to update the GT information.
  • In step S228, it is determined whether GT information correction has been completed for all frames of the image or video; if any frame has not yet been reviewed, the process returns to step S224, and once every frame has been reviewed, GT information correction is complete.
  • When analysis and correction are complete, the GT information is converted into metadata (S230).
  • The generated metadata is stored in an appropriate store (the metadata storage unit 140).
  • Formats such as XML, EXCEL, JSON, and TEXT can be used for the metadata.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A method for generating GT information of objects in an image, according to the present invention, comprises the steps of: performing automatic analysis according to a predetermined GT structure so as to generate GT information of each object included in the image; correcting the generated GT information; and generating metadata by converting the corrected GT information, wherein the GT information for each object comprises the type of the object, the posture of the object, the state of the object, the location information of the object, and the attributes of the object. According to the present invention, standard GT components and a GT description method can be defined so as to describe comprehensive information about objects such as humans and vehicles present in a still image or a video frame, and GT information can be generated more efficiently than by conventional methods by automating its generation. Ultimately, by defining standard GT elements for the vehicle and human objects in images and video frames, a common standard GT generation technique for image and video machine learning can be provided and utilized.

Description

Method and system for generating object GT information for image machine learning
The present invention relates to a method for generating Ground Truth (GT) information used in machine learning on images or video, and more particularly to a method and system for generating GT information for objects such as people and cars contained in an image or video frame.
Object detection is the basic technique underlying object recognition and tracking in images. In the conventional object detection approach, the GT of each object is extracted manually in an offline step and a detector is trained on it to perform object detection.
Extraction of GT and generation of GT information are thus fundamental to object recognition. Conventional GT generation methods, however, had no comprehensive representation for describing the objects contained in an image or video frame. As a result, the elements required for image machine learning were described according to each practitioner's own conventions, and the elements used were inevitably very limited.
In addition, object GT information for machine learning has conventionally been produced by a user inspecting images or video frames one by one and tagging them manually. Because the GT information for vast amounts of training footage must be tagged by hand, this approach costs considerable labor and time.
The GT information generation method of the present invention therefore defines standard GT components and a GT description method that can describe comprehensive information about the person and vehicle objects present in an image or video frame.
A further object of the present invention is to provide a method and system that generate GT information efficiently by automating GT tagging.
The method for generating GT information of objects in an image according to the present invention includes automatically analyzing the image to generate GT information for each object it contains according to a predetermined GT structure, correcting the generated GT information, and converting the corrected GT information to generate metadata; the GT information for each object includes the object's type, posture, state, location information, and attributes.
A GT information generation system according to another embodiment of the present invention includes an image storage unit that stores images; a GT analysis unit that receives an image file from the image storage unit and analyzes it to generate GT information for each object present in each frame; a GT information correction unit that corrects the GT information generated by the GT analysis unit; and a metadata generation unit that converts the corrected GT information to generate metadata.
According to the present invention, standard GT components and a GT description method can be defined to describe comprehensive information about objects such as people and vehicles present in a still image or a video frame.
In addition, by automating GT generation, GT information can be produced more efficiently than with conventional methods.
Furthermore, by defining standard GT elements for the vehicle and person objects in images and video frames, a common standard GT generation technique for image and video machine learning can be established and utilized.
FIG. 1 is a configuration diagram of a GT information generation system according to an embodiment of the present invention.
FIG. 2 is an overall flowchart of a method for generating GT information according to an embodiment of the present invention.
FIG. 3 is a detailed flowchart of the GT analysis step shown in FIG. 2.
FIG. 4 is a detailed flowchart of the GT information correction step shown in FIG. 2.
The objects and effects of the present invention are not limited to those mentioned above; the objects and effects, and the technical configurations for achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings.
In describing the present invention, detailed descriptions of known functions or configurations are omitted where it is judged that they would unnecessarily obscure the subject matter of the invention. The present invention is not limited to the embodiments disclosed below and may be implemented in various other forms. The following embodiments are provided to make the disclosure of the invention complete and to convey the scope of the invention fully to those of ordinary skill in the art to which the invention pertains; they are not intended to limit that scope.
Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them. Terms such as '...unit', '...apparatus', '...device', or '...module' used in the specification denote a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.
Meanwhile, in each embodiment of the present invention, each component, functional block, or means may consist of one or more sub-components, and the electrical, electronic, and mechanical functions performed by each component may be implemented with various known elements such as electronic circuits, integrated circuits, and application-specific integrated circuits (ASICs), or with mechanical elements; each may be implemented separately, or two or more may be integrated into one.
Hereinafter, the configuration of the present invention is described in detail with reference to the accompanying drawings.
The GT information used for machine learning on conventional image objects comprised extremely limited elements that varied with the party performing the learning; the present invention instead tags comprehensive information about image objects such as people and vehicles and organizes it into GT metadata. The image-object GT information defined in the present invention is shown in Table 1 below.
Table 1
  • Frame number
  • Number of objects
  • Object ID (list)
  • Type of object
  • Posture of object
  • State of object
  • Location information of object
  • Properties of object
The frame number has a single value when a still image is input, and identifies each frame when a video is input. The number of objects is the number of objects detected in the corresponding image or frame. An object ID list is constructed with one entry per object, and for each object the type, posture, state, location information, and attribute information are defined under its object ID.
Object types are classified as 'person' or 'car'. The posture of an object as it appears in the image or video frame is expressed in eight directions: 'front', 'back', 'left', 'right', 'front-left', 'front-right', 'back-right', and 'back-left'.
The state of an object is classified as 'whole', 'cut', or 'overlapped', depending on whether the entire object is visible in the image or video frame and whether it overlaps another object.
The location information of an object gives the four coordinates of the object's bounding box, measured from the (0, 0) origin of the image or video frame.
The attribute information of an object is composed differently according to the object's type. When the object is a person, for example, the attributes comprise race, gender, age, height, top color, bottom color, and whether glasses are worn; when the object is a car, they may comprise the car's color, license plate number, manufacturer, model, and model year.
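To make the structure concrete, the sketch below shows one way the Table 1 record could be represented in code. This is a minimal illustration, assuming Python; the class and field names are ours, not the patent's.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ObjectGT:
    """GT information for one object, following the Table 1 fields."""
    object_id: int
    object_type: str                        # 'person' or 'car'
    posture: str                            # one of the eight directions, e.g. 'front-left'
    state: str                              # 'whole', 'cut', or 'overlapped'
    bbox: Tuple[int, int, int, int]         # bounding box measured from the (0, 0) frame origin
    attributes: Dict[str, str] = field(default_factory=dict)  # type-dependent properties

@dataclass
class FrameGT:
    """GT information for one still image or one video frame."""
    frame_number: int
    objects: List[ObjectGT] = field(default_factory=list)

    @property
    def object_count(self) -> int:
        # 'Number of objects' in Table 1 is derived from the object ID list.
        return len(self.objects)
```

Here `attributes` holds the type-dependent properties described above (race, gender, age, and so on for a person; color, plate number, model, and so on for a car).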
In this way, the present invention provides a method and system that define standard GT components capable of describing comprehensive information about the objects present in an image or video frame, and that generate and manage GT information accordingly.
FIG. 1 is a configuration diagram of a GT information generation system according to an embodiment of the present invention.
As shown, the system includes an image storage unit 100, a GT analysis unit 110, a GT information correction unit 120, a GT metadata generation unit 130, and a metadata storage unit 140.
The image storage unit 100 stores the images or video frames (hereinafter collectively 'frames') that contain the objects whose GT information is to be analyzed and generated. It may be implemented as non-volatile memory such as a hard disk, volatile memory such as RAM, or a register serving as a buffer that temporarily holds streaming data.
The GT analysis unit 110 automatically extracts, for each frame of the image received from the image storage unit 100, the information defined by the GT components described above: the object's type, posture, state, location information, and attributes. Automatic GT analysis of the objects in an image can use a self-developed algorithm, open-source software, or a cloud API.
The GT information correction unit 120 presents the extracted GT information to the user as a list so that the user can modify the GT information that the GT analysis unit generated through its analysis. Preferably, a list of objects with the GT information structure described above is displayed for each frame number (the image number of a still image or the frame number of a video) so that the user can review it.
When the user submits corrections to an object's type, posture, state, location information, or attribute information, the GT information is updated to reflect the input.
The GT metadata generation unit 130 converts the corrected GT information into metadata, and the metadata storage unit 140 stores the metadata.
The metadata storage unit 140 and the image storage unit 100 are distinguished from a logical point of view; in hardware, they may be separate storage devices or may reside in a single physical storage device.
Hereinafter, a method of generating GT information according to an embodiment of the present invention is described in detail with reference to FIGS. 2 to 4.
After the program starts, the image or video file for which GT information will be generated is opened (S200), and automatic GT analysis of the file is performed (S210).
The automatic GT analysis step proceeds in more detail as follows.
First, the input video is split into frames, and a frame number is assigned to each frame to be analyzed (S211). In the case of a still image, only one frame exists, so no frame splitting is performed.
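The patent does not tie frame splitting to any particular library; as one possible realization, here is a sketch of step S211 using OpenCV (an assumption, chosen only for illustration):

```python
import cv2

def split_into_frames(video_path: str):
    """Step S211 sketch: yield (frame_number, frame) pairs from a video file."""
    capture = cv2.VideoCapture(video_path)
    frame_number = 0
    while True:
        ok, frame = capture.read()     # decode the next frame
        if not ok:                     # no frame returned: end of the video
            break
        yield frame_number, frame      # the assigned frame number travels with the frame
        frame_number += 1
    capture.release()
```

For a still image, a single decode such as `cv2.imread(path)` yields the one frame, so no splitting loop is needed.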
Next, GT analysis is run automatically on each split frame image (S214). For automatic GT analysis of the image, an in-house algorithm, open-source software, or a cloud API may be used; one example of a cloud API is the Sighthound Cloud API.
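Because the patent leaves the detection backend open, the sketch below takes the detector as a parameter and reuses the FrameGT and ObjectGT records sketched earlier; `detect_objects` and the dictionary keys it returns are hypothetical placeholders, not a real API:

```python
def analyze_frame(frame_number, frame, detect_objects):
    """Step S214 sketch: run a detector and pack results into the Table 1 structure.

    `detect_objects` is a hypothetical callable assumed to return one dict per
    detection with 'type', 'posture', 'state', 'bbox', and 'attributes' keys.
    """
    frame_gt = FrameGT(frame_number=frame_number)
    for object_id, detection in enumerate(detect_objects(frame)):
        frame_gt.objects.append(ObjectGT(
            object_id=object_id,                 # object IDs are assigned per frame
            object_type=detection["type"],
            posture=detection["posture"],
            state=detection["state"],
            bbox=detection["bbox"],
            attributes=detection.get("attributes", {}),
        ))
    return frame_gt
```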
The GT produced by the analysis is preferably organized, as in Table 1 above, into a data structure containing the frame number, the number of objects in each frame, an object ID for each object, and, per object ID, the object's type, posture, state, location information, and attributes.
The GT information produced by the analysis is stored for later correction (S216).
When the input is a video, it is determined whether there is a next frame on which to perform automatic GT analysis (S218); if so, the process returns to step S212, and if no frames remain to be analyzed, the analysis step is complete.
When the automatic GT analysis step (S210) is complete, the GT information correction step (S220) proceeds.
The GT information correction step (S220) supports the user in manually revising the automatically analyzed GT information.
To this end, the stored GT information is first presented to the user as a list through the information correction unit 120 (S222). In this step, the object list is organized by image or video frame number.
Then, when the user selects, from the object list presented in step S222, an object whose information is to be corrected, the selection input is received (S224) and the selected object is switched into an editable state.
An input from the user modifying the selected object's GT information is received and the GT information is corrected (S226). Specifically, the user can modify the object's type, posture, state, location information, and attribute information, and the information correction unit 120 applies the user's correction input for each of these attributes to update the GT information.
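In code, step S226 reduces to overwriting fields of the stored record with the user's input. A minimal sketch, again reusing the ObjectGT record assumed above (the set of editable fields is our assumption):

```python
EDITABLE_FIELDS = {"object_type", "posture", "state", "bbox", "attributes"}

def apply_correction(obj, corrections):
    """Step S226 sketch: apply a user's edits to one object's GT record."""
    for name, value in corrections.items():
        if name not in EDITABLE_FIELDS:        # the object ID itself is not editable
            raise ValueError(f"not an editable GT field: {name}")
        setattr(obj, name, value)              # overwrite the stored field
```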
In the subsequent step (S228), it is determined whether GT information correction has been completed for all frames of the image or video; if any frame has not yet been reviewed, the process returns to step S224, and once every frame has been reviewed for correction, GT information correction is complete.
When analysis and correction are complete, the GT information is converted into metadata (S230). The generated metadata is stored in an appropriate store (the metadata storage unit 140). Formats such as XML, EXCEL, JSON, and TEXT can be used for the metadata.
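As one illustration of step S230, conversion to JSON (one of the formats named above) can be a direct serialization of the records; a sketch assuming the FrameGT structure from earlier:

```python
import json
from dataclasses import asdict

def frames_to_json(frames, out_path):
    """Step S230 sketch: write corrected GT records as JSON metadata."""
    payload = [
        {**asdict(frame_gt), "object_count": frame_gt.object_count}  # include the Table 1 count
        for frame_gt in frames
    ]
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False, indent=2)
```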
The preferred embodiments of the present invention have been described above. They are used in a general sense merely to explain the technical content of the invention easily and to aid understanding, and are not intended to limit the scope of the invention. It will be apparent to those of ordinary skill in the art to which the invention pertains that modifications based on the technical spirit of the invention can be practiced in addition to the embodiments disclosed herein; the scope of the invention should therefore be determined by the recitation of the claims that follow.

Claims (7)

  1. A method for generating GT information of objects in an image, the method comprising:
    automatically analyzing an image to generate GT information of each object included in the image according to a predetermined GT structure;
    correcting the generated GT information; and
    generating metadata by converting the corrected GT information.
  2. The method of claim 1, wherein the generating step analyzes each object included in each frame (a term collectively denoting a still image or each frame of a video; likewise hereinafter) to generate GT information including the type of the object, the posture of the object, the state of the object, the location information of the object, and the attributes of the object.
  3. The method of claim 1, wherein the correcting step comprises:
    presenting, for each frame, the generated GT information of the objects to a user as a list;
    receiving the user's correction input for the GT information selected by the user; and
    updating the GT information according to the correction input,
    wherein the three steps above are repeated until correction review is complete for the objects in every frame.
  4. The method of claim 1, wherein the generating of metadata comprises converting the corrected GT information into one of the formats XML, EXCEL, JSON, and TEXT.
  5. A system for generating GT information of objects in an image, the system comprising:
    an image storage unit that stores images;
    a GT analysis unit that receives an image file from the image storage unit and analyzes it to generate GT information for each object present in each frame;
    a GT information correction unit that corrects the GT information generated by the GT analysis unit; and
    a metadata generation unit that generates metadata by converting the corrected GT information.
  6. The system of claim 5, wherein the GT analysis unit embeds at least one of a self-developed analysis algorithm, open-source software, and a cloud API, and performs the analysis therewith.
  7. The system of claim 5, further comprising a metadata storage unit that stores the metadata.
PCT/KR2019/013511 2018-11-26 2019-10-15 Method and system for generating object gt information for image machine learning WO2020111505A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2018-0146998 2018-11-26
KR1020180146998A KR20200068043A (en) 2018-11-26 2018-11-26 Ground Truth information generation method and system for image machine learning

Publications (1)

Publication Number Publication Date
WO2020111505A1 true WO2020111505A1 (en) 2020-06-04

Family

ID=70851977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/013511 WO2020111505A1 (en) 2018-11-26 2019-10-15 Method and system for generating object gt information for image machine learning

Country Status (2)

Country Link
KR (1) KR20200068043A (en)
WO (1) WO2020111505A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102313944B1 (en) * 2021-05-14 2021-10-18 주식회사 인피닉 Method for tracking object crossing the boundary of the angle of view, and computer program recorded on record-medium for executing method therefor
KR102313940B1 (en) * 2021-05-14 2021-10-18 주식회사 인피닉 Method for tracking object in continuous 3D data, and computer program recorded on record-medium for executing method therefor
KR102310613B1 (en) * 2021-05-14 2021-10-12 주식회사 인피닉 Method for tracking object in continuous 2D image, and computer program recorded on record-medium for executing method therefor
KR102313938B1 (en) * 2021-06-17 2021-10-18 주식회사 인피닉 Method for tracking object through 3D path inference, and computer program recorded on record-medium for executing method therefor
KR102310611B1 (en) * 2021-06-17 2021-10-13 주식회사 인피닉 Method for tracking object through 2D path inference, and computer program recorded on record-medium for executing method therefor
KR102557136B1 (en) * 2021-11-23 2023-07-20 이인텔리전스 주식회사 Method and device for generating user data set for object and line in front of vehicle


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050234960A1 (en) * 2004-04-14 2005-10-20 Microsoft Corporation Automatic data perspective generation for a target variable
US8774515B2 (en) * 2011-04-20 2014-07-08 Xerox Corporation Learning structured prediction models for interactive image labeling
KR20180118596A (en) * 2015-10-02 2018-10-31 트랙터블 리미티드 Semi-automatic labeling of data sets
KR20180029625A (en) * 2016-09-13 2018-03-21 대구대학교 산학협력단 Ground Truth generation program for performance evaluation of image processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CARL VONDRICK: "Video Annotation and Tracking with Active Learning", RESEARCHGATE, January 2011 (2011-01-01), pages 1 - 9 *

Also Published As

Publication number Publication date
KR20200068043A (en) 2020-06-15

Similar Documents

Publication Publication Date Title
WO2020111505A1 (en) Method and system for generating object gt information for image machine learning
CN110705405B (en) Target labeling method and device
WO2017039086A1 (en) Deep learning modularization system on basis of web plug-in and image recognition method using same
CN109635783B (en) Video monitoring method, device, terminal and medium
CN111353555A (en) Label detection method and device and computer readable storage medium
WO2019132589A1 (en) Image processing device and method for detecting multiple objects
WO2012053867A1 (en) Method and apparatus for recognizing an emotion of an individual based on facial action units
EP3172683A1 (en) Method for retrieving image and electronic device thereof
US20200250401A1 (en) Computer system and computer-readable storage medium
WO2022213540A1 (en) Object detecting, attribute identifying and tracking method and system
CN111242083A (en) Text processing method, device, equipment and medium based on artificial intelligence
US11023714B2 (en) Suspiciousness degree estimation model generation device
CN112699758A (en) Sign language translation method and device based on dynamic gesture recognition, computer equipment and storage medium
CN110009038B (en) Training method and device for screening model and storage medium
WO2011093568A1 (en) Method for recognizing layout-based print medium page
CN114359160A (en) Screen detection method and device, electronic equipment and storage medium
WO2024005413A1 (en) Artificial intelligence-based method and device for extracting information from electronic document
CN114821513B (en) Image processing method and device based on multilayer network and electronic equipment
EP4105893A1 (en) Dynamic artifical intelligence camera model update
WO2023095991A1 (en) System for automatically extracting question area and type within content for learning included in electronic document and method therefor
WO2022177069A1 (en) Labeling method and computing device therefor
CN115713621A (en) Cross-modal image target detection method and device by using text information
WO2021118047A1 (en) Method and apparatus for evaluating accident fault in accident image by using deep learning
CN114202719A (en) Video sample labeling method and device, computer equipment and storage medium
CN112131400A (en) Construction method of medical knowledge map for assisting outpatient assistant

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19891015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19891015

Country of ref document: EP

Kind code of ref document: A1