KR20210037858A

KR20210037858A - Video receiving apparatus for synchronizing video data and metadata, and method thereof

Info

Publication number: KR20210037858A
Application number: KR1020190120302A
Authority: KR
Inventors: 정승원; 박미주
Original assignee: 한화테크윈 주식회사
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2021-04-07
Also published as: KR102714326B1

Abstract

The present invention relates to a video receiving apparatus. The apparatus comprises: a receiving unit that receives video data and metadata from an imaging device; a first storage unit that stores the video data; a second storage unit that stores the metadata; and a text track unit that stores the timestamps of the received metadata and, when video data is output from the first storage unit and metadata corresponding to the timestamp of that video data exists, transmits an output signal to the second storage unit so that the metadata is output from the second storage unit at the output time of the video data. As a result, the received video data and metadata are output in synchronization.

Description

Video receiving apparatus for synchronizing video data and metadata in real time, and method thereof

The present invention relates to a video receiving apparatus, and more particularly to a video receiving apparatus that receives video data and metadata transmitted asynchronously from an imaging device and synchronizes the video data and metadata it outputs, and to a method of synchronizing video data and metadata.

Conventional web-browser-based real-time video surveillance did not synchronize video data and metadata when receiving and outputting them. Because video data and metadata are ideally generated at the same time, the received video data and metadata were simply displayed in the order of arrival, with no separate synchronization step. Consequently, the two were well aligned only when transmission and reception were ideal, and were misaligned otherwise.

In practice, when live video data and the metadata produced by analyzing that video are received over separate sessions, video data and metadata for the same instant arrive at different times, and delay arises when the video is played back. As a result, the video and the metadata fall out of synchronization.

In addition, object recognition using machine learning or similar techniques, which is part of video analysis, has conventionally been handled entirely by either the server or the client, placing a heavy load on whichever device performs it.

US Patent Application Publication No. 2006-0256232

The problem to be solved by the present invention is to provide a video receiving apparatus that synchronizes the video data and metadata it outputs.

The problems addressed by the present invention are not limited to the one mentioned above; other problems not mentioned here will be clearly understood by those skilled in the art from the description below.

To solve the above problem, a video receiving apparatus according to an embodiment of the present invention includes: a receiving unit that receives video data and metadata from an imaging device; a first storage unit that stores the video data; a second storage unit that stores the metadata; and a text track unit that stores the timestamps of the received metadata and, when video data is output from the first storage unit and metadata corresponding to the timestamp of that video data exists, transmits an output signal to the second storage unit so that the metadata is output from the second storage unit at the output time of the video data, whereby the output video data and metadata are synchronized.

The text track unit may use the timestamp of the video data as an id value.

The apparatus may further include a video processing unit that calculates the time at which the video data will be output and passes the timestamp of each video frame of the output video data to the text track unit.

The video processing unit may calculate the time at which the video data will be output by calculating the delay incurred in outputting the video data.

The apparatus may further include a display that overlays the output metadata on the output video data and shows the result on a screen.

The apparatus may further include an object extraction unit that performs object detection using the output video data and the output metadata.

The apparatus may further include an object processing unit that performs classification or segmentation of an object using the output video data and the output metadata.

The object processing unit may perform the classification or segmentation of the object using machine learning.

The receiving unit may receive the video data and metadata over a WebSocket connection with the imaging device.

A method of synchronizing video data and metadata in a video receiving apparatus including one or more processors may include: receiving video data and metadata from an imaging device; storing the video data, the metadata, and the timestamps of the metadata; and, when the video data is output, if the stored metadata timestamps include metadata corresponding to the timestamp of the video data, outputting that metadata at the output time of the video data so that the output video data and metadata are synchronized.

Other specific details of the present invention are included in the detailed description and drawings.

Embodiments of the present invention provide at least the following effects.

According to the present invention, received video data and metadata can be output in synchronization. Synchronizing video data and metadata also improves the accuracy of object processing that uses machine learning. Furthermore, because the video data and metadata are synchronized, object detection can be handled by the video transmitting apparatus and object recognition by the video receiving apparatus, sharing the system load between them.

The effects of the present invention are not limited to those exemplified above; further effects are described in this specification.

FIG. 1 is a block diagram of a video receiving apparatus according to an embodiment of the present invention.
FIGS. 2 and 3 are block diagrams of a video receiving apparatus according to embodiments of the present invention.
FIGS. 4 to 6 are diagrams for explaining a process of synchronizing video data and metadata in a video receiving apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart of a method of synchronizing video data and metadata according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined solely by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive sense unless expressly so defined.

The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form includes the plural unless the context specifically states otherwise. As used in this specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more elements other than those mentioned.

FIG. 1 is a block diagram of a video receiving apparatus according to an embodiment of the present invention.

A video receiving apparatus 110 according to an embodiment of the present invention may comprise one or more processors 111, and may further include one or more storage units (not shown) or communication units (not shown).

The processor 111 may execute a program that synchronizes the video data and metadata received from the imaging device, and that program may be stored in the storage unit. The processor 111 synchronizes the video data and metadata received from the imaging device 120.

As shown in FIG. 2, the video receiving apparatus 110 may comprise a receiving unit 210, a first storage unit 220, a second storage unit 230, and a text track unit 240, and may synchronize the video data and metadata received from the imaging device. As shown in FIG. 3, it may further include a video processing unit 310, a display 320, an object extraction unit 330, or an object processing unit 340.

The receiving unit 210 receives video data and metadata from the imaging device 120. It establishes a communication connection with the imaging device 120 and receives the video data and metadata over that connection. The receiving unit 210 may receive the video data and metadata over a WebSocket connection with the imaging device 120, and communication may be carried out using the RTSP/RTP protocols over the WebSocket. Because the imaging device 120 is the side that transmits the data, it acts as the WebSocket server, and the video receiving apparatus 110 acts as the WebSocket client.
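
The description does not say how RTP packets are framed inside the WebSocket messages. One possible arrangement, assumed here purely for illustration, is the interleaved binary framing that RFC 2326 defines for RTSP over TCP: a `$` byte, a one-byte channel identifier, a two-byte big-endian length, and then the payload. The channel numbers and the function name `split_interleaved` below are hypothetical, and Python is used only as a neutral sketch language; the viewer in the description runs in a web browser.

```python
import struct

# Hypothetical channel assignment; a real session negotiates these in SETUP.
VIDEO_CHANNEL = 0
METADATA_CHANNEL = 2


def split_interleaved(buffer: bytes):
    """Split a byte buffer into (channel, payload) frames.

    Returns the list of complete frames and any trailing bytes that do not
    yet form a complete frame.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= 4:
        magic, channel, length = struct.unpack_from("!BBH", buffer, offset)
        if magic != 0x24:  # '$' marks the start of an interleaved frame
            raise ValueError("not an interleaved frame")
        end = offset + 4 + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((channel, buffer[offset + 4:end]))
        offset = end
    return frames, buffer[offset:]
```

Frames on the video channel would then go to the first storage unit and frames on the metadata channel to the second; the leftover bytes are kept and prepended to the next received message.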

The first storage unit 220 stores the video data, and the second storage unit 230 stores the metadata. That is, the video data and metadata received by the receiving unit 210 are kept in separate storage units.

The text track unit 240 stores the timestamps of the received metadata. When video data is output from the first storage unit and metadata corresponding to the timestamp of that video data exists, the text track unit 240 transmits an output signal to the second storage unit so that the metadata is output from the second storage unit at the output time of the video data.

More specifically, the text track unit 240 first stores the timestamp of each received item of metadata. Later, when video data is output from the first storage unit and there is metadata with the same timestamp as that video data, the text track unit 240 transmits an output signal to the second storage unit 230 so that the metadata is output from the second storage unit 230 at the output time of the video data, thereby synchronizing the two.

On receiving the output signal, the second storage unit 230 outputs the metadata at the same time as the video data is output, so that the output video data and metadata are synchronized.
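
The interaction between the text track unit and the second storage unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the patent: the class names `SecondStore` and `TextTrackUnit`, the method names, and the use of an exact timestamp match are all hypothetical, and Python stands in for the browser-side code.

```python
class SecondStore:
    """Holds metadata keyed by timestamp (the 'second storage unit')."""

    def __init__(self):
        self._items = {}

    def put(self, timestamp, metadata):
        self._items[timestamp] = metadata

    def output(self, timestamp):
        # Called in response to the output signal from the text track unit.
        return self._items.pop(timestamp)


class TextTrackUnit:
    """Remembers metadata timestamps and signals the store at output time."""

    def __init__(self, store):
        self._store = store
        self._timestamps = set()

    def on_metadata(self, timestamp, metadata):
        # Store the metadata and remember that this timestamp has metadata.
        self._store.put(timestamp, metadata)
        self._timestamps.add(timestamp)

    def on_video_output(self, frame_timestamp):
        """Return the metadata matching the frame being output, or None."""
        if frame_timestamp not in self._timestamps:
            return None  # no metadata for this frame; video is output alone
        self._timestamps.discard(frame_timestamp)
        return self._store.output(frame_timestamp)
```

Because the lookup happens when the frame is output rather than when the metadata arrives, the order in which the two streams were received does not affect which metadata accompanies which frame.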

The text track unit 240 uses the textTrack of the videoElement in the web viewer. A textTrack is ordinarily used to provide subtitles, descriptions, or other information about a stored video during playback: HTML5 loads a WebVTT file containing start times, end times, and text (subtitles, descriptions) and displays the text as the video plays. The text track unit 240 uses the textTrack of the videoElement in the web browser to retrieve the metadata during playback of the video data and synchronize the two.
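
For readers unfamiliar with WebVTT, a cue consists of an optional identifier line, a timing line of the form `start --> end`, and the cue text. The sketch below formats one such cue as text. Using a frame timestamp as the identifier follows the description; carrying JSON in the cue text, and the helper names `vtt_time` and `vtt_cue`, are assumptions made for illustration.

```python
import json


def vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    ms = total_ms % 1000
    total_s = total_ms // 1000
    return f"{total_s // 3600:02d}:{total_s % 3600 // 60:02d}:{total_s % 60:02d}.{ms:03d}"


def vtt_cue(cue_id: str, start: float, end: float, payload: dict) -> str:
    """Build one WebVTT cue block: identifier, timing line, then text."""
    return f"{cue_id}\n{vtt_time(start)} --> {vtt_time(end)}\n{json.dumps(payload)}\n"
```

In a live viewer the cues would more likely be created programmatically on the track than loaded from a file, but the three parts of a cue are the same in either case.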

The text track unit 240 uses the timestamp of the video data as an id value. When the timestamp (UTC timestamp) of a frame of the video data is entered as an id value, the text track unit is triggered at the time that frame is played, finds the metadata corresponding to the id value, and transmits an output signal to the second storage unit 230 in which the metadata is stored, so that the metadata is output at that time in synchronization with the video data.

The apparatus may further include a video processing unit that calculates the time at which the video data will be output and passes the timestamp of each video frame of the output video data to the text track unit. As shown in FIG. 3, before outputting the video data stored in the first storage unit 220, the video processing unit 310 buffers the video data, calculates the expected time at which the data will be output, and enters the timestamps of the video frames into the text track unit so that the text track unit 240 can operate.

The video processing unit 310 may calculate the time at which the video data will be output by calculating the delay incurred in outputting the video data. Because outputting video data takes time for buffering, this delay is calculated and used to determine the time at which the video data will be output.
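
The description gives no formula for the delay. A simple model, assumed here only for illustration, is a constant frame rate in which the buffering delay equals the number of frames already waiting in the buffer multiplied by the frame duration. The function and parameter names are hypothetical.

```python
def expected_output_time(now_ms: int, frames_ahead: int, frame_duration_ms: int) -> int:
    """Estimate when a newly buffered frame will be shown.

    frames_ahead is the number of frames already buffered but not yet output.
    """
    buffering_delay_ms = frames_ahead * frame_duration_ms
    return now_ms + buffering_delay_ms
```

For example, at 25 frames per second (40 ms per frame) with 15 frames already buffered, a frame arriving at 10 000 ms would be expected on screen at 10 600 ms.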

The video data output from the first storage unit 220 through the video processing unit 310 and the metadata output from the second storage unit 230 are synchronized, and the output metadata may be overlaid on the output video data and shown on the screen of the display 320. The user can thus view the synchronized metadata overlaid on the video data together with the video itself.
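
To draw an overlay, the box in the metadata has to be mapped onto the displayed frame. The sketch below assumes, hypothetically, that the metadata gives each box as normalized `(x, y, w, h)` values between 0 and 1; the description does not specify the coordinate convention.

```python
def to_pixel_rect(box, frame_width, frame_height):
    """Convert a normalized (x, y, w, h) box into integer pixel coordinates.

    Returns (left, top, right, bottom), clamped to the frame.
    """
    x, y, w, h = box
    left = max(0, round(x * frame_width))
    top = max(0, round(y * frame_height))
    right = min(frame_width, round((x + w) * frame_width))
    bottom = min(frame_height, round((y + h) * frame_height))
    return left, top, right, bottom
```

Normalized coordinates keep the overlay correct when the viewer scales the video to a different size from the one the camera analyzed.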

Alternatively, the video data output from the first storage unit 220 through the video processing unit 310 and the metadata output from the second storage unit 230 may be synchronized and input to the object extraction unit 330, where the metadata is used to extract objects present in the video data. As described above, when the timestamp of the video data is entered into the text track unit 240 as an id value, the metadata corresponding to that id value is found, the position of the target object in the video is located from it, and the target object can be extracted.
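
Extraction of the target object amounts to cropping the frame at the position given by the synchronized metadata. The sketch below assumes a frame represented as a list of pixel rows and metadata of the hypothetical form `{"objects": [{"id": ..., "rect": (left, top, right, bottom)}]}`; neither representation comes from the description.

```python
def extract_targets(frame, metadata):
    """Crop each object listed in the metadata out of a row-major frame.

    Returns a dict mapping object id to the cropped rows.
    """
    crops = {}
    for obj in metadata["objects"]:
        left, top, right, bottom = obj["rect"]
        crops[obj["id"]] = [row[left:right] for row in frame[top:bottom]]
    return crops
```

If the metadata belonged to a different frame, the same rectangle would cut out the wrong region, which is why the crop is taken only after the two streams have been matched by timestamp.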

The object processing unit 340 may then perform classification or segmentation of the object using the output video data and the output metadata. The metadata generated by the imaging device 120 may contain only box-shaped information about each object, so object recognition can be performed by classifying and segmenting the object with the help of that metadata. Classification and segmentation are particularly important in surveillance systems, and synchronizing the video data and metadata is important for their accuracy. The object processing unit 340 may use machine learning to perform the classification or segmentation, which can improve its accuracy. Because machine learning can impose a heavy system load, object detection may be performed in the imaging device while classification or segmentation is handled in the video receiving apparatus, so that the system load is shared.

For example, as shown in FIG. 4, a conventional CCTV system performs both object detection and object recognition either in the CCTV camera or in the client. If instead, as shown in FIG. 5, object detection is performed in the CCTV camera and object recognition in the client, the system load can be shared. When detection and recognition are handled by different devices, however, any time difference between the video data and the metadata can reduce accuracy. Synchronizing the video data and metadata in the image processing apparatus according to an embodiment of the present invention can improve that accuracy, and therefore increases the design freedom with which the functions of the video processing system can be distributed.
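
The effect of a time difference on accuracy can be made concrete with a small worked example. The numbers are illustrative and do not come from the description: a 100 x 100 pixel box around an object moving horizontally at 200 pixels per second, compared with the object's true position after a given offset using intersection over union (IoU).

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def box_after(box, speed_px_per_s, offset_s):
    """Shift a box horizontally by the distance travelled during offset_s."""
    dx = round(speed_px_per_s * offset_s)
    return (box[0] + dx, box[1], box[2] + dx, box[3])
```

With a 0.25 s offset the object has moved 50 pixels, the overlap is 50 x 100 = 5 000 square pixels out of a union of 15 000, and the IoU falls to one third; with a 0.5 s offset the stale box no longer overlaps the object at all.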

FIG. 6 shows the entire process, from the generation of video data and metadata in the imaging device to their synchronization in the video receiving apparatus.

An image 611 captured by the image sensor of the imaging device 610 is encoded 612, and metadata 614 is generated through object detection 613. The video data and metadata are then transmitted to the video receiving apparatus 620 via RTSP/RTP over WebSocket. Here, the imaging device 610 is a camera and the video receiving apparatus 620 may be a non-plugin web viewer; the video receiving apparatus 620 connects via RTSP and can receive the real-time video stream together with the results of video analysis of that video data as metadata.
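
The description states only that the metadata carries a timestamp and box-form object information; it does not define a wire format. The sketch below uses a hypothetical JSON layout to show the camera side serializing one frame's detections and the viewer side parsing them back. The field names `timestamp`, `objects`, `x`, `y`, `w`, and `h` are assumptions.

```python
import json


def build_metadata(utc_timestamp_ms, boxes):
    """Serialize detection results for one frame (camera side)."""
    return json.dumps({
        "timestamp": utc_timestamp_ms,
        "objects": [{"x": x, "y": y, "w": w, "h": h} for (x, y, w, h) in boxes],
    })


def parse_metadata(message):
    """Parse a metadata message into (timestamp, list of boxes) (viewer side)."""
    doc = json.loads(message)
    boxes = [(o["x"], o["y"], o["w"], o["h"]) for o in doc["objects"]]
    return doc["timestamp"], boxes
```

Whatever format is used, the essential point is that each metadata message carries the timestamp of the frame it was computed from, so the receiver can pair the two later.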

The video receiving apparatus 620 receives the video data and metadata. Through a WebSocket client 621, a JS RTSP/RTP client 622, and a media router, the video data is stored in video chunks 624, which serve as the first storage unit; the timestamps are stored in a Text Element (325), which serves as the text track unit; and the metadata is passed through a metadata parser and stored in a metadata list 627, which serves as the second storage unit. To play the stored video data in the Video element 629 through MSE 628, a Media Segment is generated, and in the process the time at which each frame will be played is calculated from frameDuration. The id value of the video frame (its UTC timestamp) is entered into the textTrack of the Text Element (325); when the video is played, the textTrack entry for that time becomes active, and the Metadata corresponding to the id value is then looked up and output. The output video data and metadata are input to the object extraction unit 631, which locates and extracts the target object. The extracted target object is input to the object recognition module 632, which performs object classification and segmentation, and the final processed video is displayed on the monitor 633.
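
The timing step in this pipeline can be sketched as follows, assuming a constant frameDuration expressed in milliseconds. Each frame is given a playback interval on the media timeline and tagged with its UTC timestamp as the id; at any playback position, the active interval yields the id used to look up the metadata. The function names are hypothetical, and in a browser this role is played by the video element's text track rather than by hand-written code.

```python
def build_cues(frame_timestamps, frame_duration_ms):
    """Assign each frame a playback interval on the media timeline.

    Returns a list of (start, end, cue_id), where cue_id is the frame's UTC
    timestamp and start/end are milliseconds from the start of playback.
    """
    cues = []
    for index, utc in enumerate(frame_timestamps):
        start = index * frame_duration_ms
        cues.append((start, start + frame_duration_ms, utc))
    return cues


def active_ids(cues, current_time_ms):
    """Return the ids of cues whose interval contains current_time_ms."""
    return [cue_id for start, end, cue_id in cues if start <= current_time_ms < end]
```

The ids returned by `active_ids` would then be used as keys into the metadata list, so that the metadata shown at a given playback position always belongs to the frame on screen.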

FIG. 7 is a flowchart of a method of synchronizing video data and metadata according to an embodiment of the present invention. The detailed description of each step of FIG. 7 corresponds to the detailed description of the synchronization process in the video receiving apparatus of FIGS. 1 to 6, so redundant description is omitted below. Each step of FIG. 7 may be performed by one or more processors included in the image processing apparatus.

In the method of synchronizing video data and metadata in a video receiving apparatus including one or more processors, video data and metadata are received from an imaging device in step S11; the video data, the metadata, and the timestamps of the metadata are each stored in step S12; and in step S13, when the video data is output, if the stored metadata timestamps include metadata corresponding to the timestamp of the video data, that metadata is output at the output time of the video data, so that the output video data and metadata are synchronized.

Embodiments of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may contain program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The computer-readable recording medium may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the invention pertains. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Those of ordinary skill in the art to which the present invention pertains will understand that the invention can be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

110: video receiving apparatus
111: processor
120: imaging device
210: receiving unit
220: first storage unit
230: second storage unit
240: text track unit
310: video processing unit
320: display
330: object extraction unit
340: object processing unit

Claims

A video receiving apparatus comprising:
a receiving unit that receives video data and metadata from an imaging device;
a first storage unit that stores the video data;
a second storage unit that stores the metadata; and
a text track unit that stores the timestamps of the received metadata and, when the video data is output from the first storage unit and metadata corresponding to the timestamp of the video data exists, transmits an output signal to the second storage unit so that the metadata is output from the second storage unit at the output time of the video data,
wherein the output video data and metadata are synchronized.

The video receiving apparatus of claim 1,
wherein the text track unit uses the timestamp of the video data as an id value.

The video receiving apparatus of claim 1,
further comprising a video processing unit that calculates the time at which the video data will be output and passes the timestamp of each video frame of the output video data to the text track unit.

The video receiving apparatus of claim 3,
wherein the video processing unit calculates the time at which the video data will be output by calculating a delay time for outputting the video data.

The video receiving apparatus of claim 1,
further comprising a display that overlays the output metadata on the output video data and displays the result on a screen.

The video receiving apparatus of claim 1,
further comprising an object extraction unit that performs object detection using the output video data and the output metadata.

The video receiving apparatus of claim 1,
further comprising an object processing unit that performs classification or segmentation of an object using the output video data and the output metadata.

The video receiving apparatus of claim 7,
wherein the object processing unit performs the classification or segmentation of the object using machine learning.

The video receiving apparatus of claim 1,
wherein the receiving unit receives the video data and the metadata over a WebSocket connection with the imaging device.

A method of synchronizing video data and metadata in a video receiving apparatus including one or more processors, the method comprising:
receiving video data and metadata from an imaging device;
storing the video data, the metadata, and the timestamps of the metadata; and
when the video data is output, if the stored metadata timestamps include metadata corresponding to the timestamp of the video data, outputting that metadata at the output time of the video data, thereby synchronizing the output video data and metadata.