KR20220027433A - Method for generating multimedia bitstream including deep learning network model and apparatus thereof

Info

Publication number: KR20220027433A
Application number: KR1020200108297A
Authority: KR
Inventors: 정진우; 최병호; 김성제; 홍민수; 이승호
Original assignee: 한국전자기술연구원
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2022-03-08
Also published as: KR102517449B1

Abstract

The present invention relates to a method for generating a multimedia bitstream comprising a deep learning network model and a decoding method thereof. The present invention can transmit, to an additional information area of a multimedia bitstream, a bitstream with a deep learning network model so that deep learning model data can be easily managed, deep learning performance can also be improved since the deep learning network model can be applied in units of a frame, a scene, and a picture, and the deep learning network model can be accurately applied. The method for generating the multimedia bitstream comprises: a step of generating a payload type field; a step of generating a use case field; a step of generating a location index field; a step of generating a parameter index field; a step of generating an attribute field; and a step of generating a payload field.

Description

Method for generating a multimedia bitstream including a deep learning network model and an apparatus therefor

The present invention relates to a method for representing and storing a deep learning network model, and more particularly to a technique for including deep learning network model data in a multimedia bitstream.

Deep learning technology shows encouraging performance, surpassing traditional methods in application fields such as image recognition, audio signal processing, and natural language processing. It improves performance by training a network model composed of weights and biases for a given application field and then applying the trained model back to that field.

For this purpose, the corresponding deep learning network model must be transmitted together with the multimedia bitstream. However, because the multimedia bitstream and the deep learning network model are transmitted separately, it is difficult to manage which deep learning network model should be applied to which bitstream.

The inventors of the present invention have worked to resolve this difficulty in managing deep learning network models in the prior art. After much effort, they completed the present invention, which provides a multimedia bitstream generation method in which a multimedia bitstream and the deep learning network model applied to it can be managed together.

An object of the present invention is to provide a multimedia bitstream generation method that makes the multimedia bitstream and the deep learning network model easy to manage by including deep learning network model information in the multimedia bitstream and generating a single file.

Other objects of the present invention that are not explicitly stated will be additionally considered to the extent that they can be easily inferred from the following detailed description and its effects.

A method for generating a bitstream according to the present invention comprises: generating a payload type field that identifies a deep learning network model; generating a use case field that identifies an algorithm to be applied to multimedia data; generating a location index field that specifies a location in the multimedia data to which the deep learning network model is applied; generating a parameter index field that designates a parameter to be applied to the multimedia data; generating an attribute field indicating an attribute of the multimedia data after the deep learning network model is applied; and generating a payload field including deep learning network model data according to the payload type.

The bitstream is included in the additional information area of the multimedia bitstream.

The bitstream is included in the multimedia data area of the multimedia bitstream.

The location in the multimedia data to which the deep learning network model is applied is specified in units of frames, regions, or scenes of the multimedia data.

The parameter index to be applied to the multimedia data is a video parameter set (VPS) index, a sequence parameter set (SPS) index, or a picture parameter set (PPS) index.

In the step of generating the use case field, the algorithm to be applied to the multimedia data is super resolution or noise reduction.

The attributes of the multimedia data after the deep learning network model is applied include image resolution, frame rate, or color space.

An encoding apparatus according to another embodiment of the present invention includes: a memory; and a bitstream generator that includes one or more processors, generates a bitstream including multimedia data and a deep learning network model to be applied to the multimedia data, and stores the bitstream in the memory, wherein the bitstream generator generates a payload type field that identifies a deep learning network model, generates a use case field that identifies an algorithm to be applied to the multimedia data, generates an index field that specifies a location in the multimedia data to which the deep learning network model is applied, generates a field indicating an attribute of the multimedia data after the deep learning network model is applied, and generates a payload field including deep learning network model data according to the payload type.

A bitstream parsing method according to another embodiment of the present invention comprises: identifying the type of a deep learning network model from a payload type field; identifying an algorithm to be applied to multimedia data from a use case field; identifying, from a location index field, the location in the multimedia data to which the deep learning network model is applied; identifying, from an attribute field, an attribute of the multimedia data after the deep learning network model is applied; and identifying, from a payload field, deep learning network model data according to the payload type.

A decoding apparatus according to yet another embodiment of the present invention includes: a memory; and a bitstream parsing unit that includes one or more processors, parses a bitstream including a deep learning network model to be applied to multimedia data, and stores the result in the memory, wherein the bitstream parsing unit identifies the type of the deep learning network model from a payload type field, identifies an algorithm to be applied to the multimedia data from a use case field, identifies, from a location index field, the location in the multimedia data to which the deep learning network model is applied, identifies, from an attribute field, an attribute of the multimedia data after the deep learning network model is applied, and identifies, from a payload field, deep learning network model data according to the payload type.

According to the present invention, including the deep learning network model data in the multimedia bitstream makes it easy to manage which deep learning network model is applied to which multimedia bitstream.

In addition, because a deep learning network model can be applied at a finer granularity than the whole multimedia bitstream, such as per frame within the bitstream, the performance of the deep learning model can also be improved.

Note also that effects not explicitly mentioned here, including the effects described in the following specification that are expected from the technical features of the present invention and their potential effects, are treated as if they were described in this specification.

FIG. 1 shows an example in which a multimedia bitstream and a deep learning network model are applied according to the prior art.
FIG. 2 is an example of applying a deep learning network model to a multimedia bitstream according to a preferred embodiment of the present invention.
FIG. 3 shows the structure of a multimedia bitstream to which the multimedia bitstream generation method according to a preferred embodiment of the present invention is applied.
FIG. 4 is a flowchart of a method for generating a multimedia bitstream according to another preferred embodiment of the present invention.
FIG. 5 is a flowchart of a multimedia bitstream decoding method according to yet another preferred embodiment of the present invention.
FIG. 6 is a structural diagram of an encoding apparatus according to a preferred embodiment of the present invention.
FIG. 7 is a structural diagram of a decoding apparatus according to a preferred embodiment of the present invention.
※ The accompanying drawings are provided as a reference for understanding the technical idea of the present invention, and they do not limit the scope of the present invention.

The above objects and means of the present invention and the resulting effects will become clearer from the following detailed description taken with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains will be able to practice its technical idea easily. In describing the present invention, detailed descriptions of known technology related to the present invention are omitted where they could unnecessarily obscure its gist.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form, as the case may be, unless otherwise specified. In this specification, terms such as "include", "comprise", "provide", or "have" do not exclude the presence or addition of one or more elements other than those mentioned.

In this specification, terms such as "or" and "at least one" may indicate one of the words listed together, or a combination of two or more of them. For example, "A or B" and "at least one of A and B" may include only one of A and B, or both A and B.

In this specification, in descriptions that follow "for example" and the like, the presented information, such as recited properties, variables, or values, may not match exactly, and the embodiments of the invention should not be limited by effects such as variations, including tolerances, measurement errors, limits of measurement accuracy, and other commonly known factors.

In this specification, when a component is described as being 'connected' or 'coupled' to another component, it may be directly connected or coupled to that component, but it should be understood that other components may exist in between. In contrast, when a component is described as being 'directly connected' or 'directly coupled' to another component, it should be understood that no other component exists in between.

In this specification, when a component is described as being 'on' or 'in contact with' another component, it may directly abut or be connected to that component, but it should be understood that another component may exist in between. In contrast, when a component is described as being 'directly on' or 'in direct contact with' another component, it may be understood that no other component exists in between. Other expressions describing the relationship between components, such as 'between' and 'directly between', may be interpreted in the same way.

In this specification, terms such as 'first' and 'second' may be used to describe various components, but the components should not be limited by these terms. These terms should not be construed as limiting the order of the components and may be used to distinguish one component from another. For example, a 'first component' may be termed a 'second component', and similarly, a 'second component' may be termed a 'first component'.

Unless otherwise defined, all terms used herein have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive sense unless explicitly and specifically defined.

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 shows an example in which a multimedia bitstream and a deep learning network model are applied according to the prior art.

The multimedia bitstream decoding apparatus 1 includes a decoder 2 and a deep learning processing unit 3.

The decoder 2 decodes the multimedia bitstream 4 to produce video/audio.

The deep learning processing unit 3 applies the deep learning network model 5 to the decoded video/audio content to produce video/audio content with improved picture or sound quality.

As deep learning is applied in more fields, the need for a standard format for exchanging deep learning network models is growing. Accordingly, a coalition of companies including Microsoft and Facebook established the ONNX (Open Neural Network Exchange) format, and the Khronos Group, a consortium of hardware and software companies that establishes advanced acceleration standards, established NNEF (Neural Network Exchange Format). In addition, MPEG (Moving Picture Experts Group), an international standards body, is establishing the NNR (Neural Network Representation) standard to efficiently compress the weights and biases of deep learning network models, and work is also under way on the VCM (Video Coding for Machines) standard for compressing and transmitting the feature information of deep learning network models.

However, these standardization efforts only describe how to define a deep learning network model and exchange its weights efficiently. They are specified independently, with no provision for relating the model to the multimedia stream to which it is applied.

Thus, in FIG. 1, the decoder 2 processes the bitstream 4 and the deep learning processing unit 3 applies the deep learning network model 5, but the bitstream 4 and the deep learning network model 5 are in practice managed independently, with no association between them.

When the bitstream 4 and the deep learning network model 5 are managed independently in this way, it is not easy to keep track of which deep learning network model 5 applies to which bitstream 4. The difficulty grows further if the deep learning network model 5 is to be applied per scene or per frame within the same bitstream 4.

FIG. 2 shows the structure of a multimedia bitstream including a deep learning network model according to a preferred embodiment of the present invention, which addresses these problems of the prior art.

In general, multimedia data consisting of video or audio involves a very large amount of data. Multimedia data is therefore usually compressed so that it can be stored and transmitted efficiently.

Various international standards exist for compressing multimedia data, and data compressed for transmission or storage is called a multimedia stream. For example, international standards for video data include H.264 and HEVC (High Efficiency Video Coding), and widely used international standards for image data include JPEG (Joint Photographic Experts Group) and JPEG2000. Standards for audio data include MP3 (MPEG-1 Audio Layer III) and AAC (Advanced Audio Coding).

Most of these multimedia bitstream standards divide the stream into a payload area 10, which stores the multimedia data itself, and an additional information area 20, which carries other information. The additional information area 20 is sometimes transmitted empty. It also contains an unspecified region that was left open, when the standard was defined, for uses not yet determined. It is therefore an object of the present invention to transmit a deep learning network model using part of this region. By inserting the deep learning network model into the additional information area 20, a single stream can carry both the multimedia payload and the deep learning network model.

The additional information area 20 may contain not only a deep learning network model defined in each multimedia standard but also previously defined deep learning network model standards. For example, the ONNX, NNEF, and NNR formats mentioned above may be included in the additional information area 20, and a flag indicating which of them is present may also be added.

The additional information area 20 may include a use case 21 indicating which deep learning model is used, a header 22 that marks the start of each payload, and payloads 23, 24, and 25 containing the deep learning network model data.

In addition, in video standards that have NAL (Network Abstraction Layer) units, such as H.264, HEVC, and VVC, a deep learning network model can be added by defining a new NAL unit type in the unspecified range of NAL unit types. As in the additional information area example, the NAL unit may carry not only a deep learning network model format defined in each multimedia standard but also a previously defined deep learning network model standard. The bitstream definition for the deep learning network model may include a flag indicating which frame, which scene, and so on the deep learning network is applied to.

Table 1 below shows an example of applying the above method to HEVC.

The HEVC standard defines an SEI (Supplemental Enhancement Information) area for transmitting additional information. The present invention proposes adding deep learning network models under payloadType values of the SEI area that are not yet assigned. By assigning the values aaa, bbb, ccc, and ddd to payloadType, the deep learning model formats ONNX, NNEF, NNR, and DNM (Deeplearning Network Model) can be signaled, respectively. DNM denotes a newly defined deep learning network model.
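The payloadType assignment can be sketched as follows. This is a minimal illustration rather than the patent's definition: the numeric values in `PAYLOAD_TYPES` are hypothetical stand-ins for the placeholders aaa, bbb, ccc, and ddd, and the helper names are assumptions. The byte coding shown (one 0xFF byte per full 255, followed by a final byte) is the way HEVC and H.264 SEI messages code payloadType and payloadSize.

```python
from typing import Tuple

# Hypothetical payloadType values: the source leaves them as the
# placeholders aaa, bbb, ccc, ddd, so these numbers are assumptions.
PAYLOAD_TYPES = {"ONNX": 200, "NNEF": 201, "NNR": 202, "DNM": 203}


def encode_sei_value(value: int) -> bytes:
    """Code a payloadType or payloadSize as SEI messages do:
    one 0xFF byte per full 255, then a final byte below 255."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return b"\xff" * (value // 255) + bytes([value % 255])


def decode_sei_value(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for a value coded as above."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    return value + data[pos], pos + 1


def sei_header(model_format: str, payload_size: int) -> bytes:
    """Build the payloadType and payloadSize bytes for one model message."""
    return encode_sei_value(PAYLOAD_TYPES[model_format]) + encode_sei_value(payload_size)
```

A parser would call `decode_sei_value` twice, first for payloadType and then for payloadSize, and look the type up in the same table to decide which model format follows.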

Table 2 below shows the syntax structure when payloadType is ONNX.

ONNX(payloadSize) | Description
use_case | Super resolution, noise reduction, etc.
frame_id | Index of the frame to apply to
vps_id | VPS index to apply
sps_id | SPS index to apply
pps_id | PPS index to apply
frame_resolution | Video resolution after the deep learning model is applied
frame_rate | Frame rate after the deep learning model is applied
color_space | Color space after the deep learning model is applied
ONNX_payload | Payload in ONNX format; all data contained in the ONNX format
ONNX_model_data | Additionally required data not contained in the ONNX format

use_case indicates the application to which the deep learning network model is applied, such as super resolution, which increases resolution, or noise reduction, which removes noise.

frame_id, vps_id (video parameter set), sps_id (sequence parameter set), and pps_id (picture parameter set) indicate which frame, which scene, and which picture the deep learning network model is applied to.

frame_resolution, frame_rate, and color_space represent the attributes of the video after the deep learning network model is applied, namely frame resolution, frame rate, and color space.

ONNX_payload carries the data defined by the ONNX standard and includes all of it.

Besides ONNX, payloads for other payload types such as NNEF, NNR, and DNM may be defined in the same format.
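The Table 2 syntax can be sketched as a small data structure with a pack and parse round trip. The source names the fields but gives no bit widths or value encodings, so the fixed big-endian layout below, the split of frame_resolution into width and height, the explicit payload length, and the class name `OnnxMessage` are all assumptions made for illustration.

```python
import struct
from dataclasses import dataclass

# Assumed fixed header, big-endian, no padding (19 bytes):
# use_case(1) frame_id(4) vps_id(1) sps_id(1) pps_id(1)
# width(2) height(2) frame_rate(2) color_space(1) payload_length(4)
_HEADER = struct.Struct(">BIBBBHHHBI")


@dataclass
class OnnxMessage:
    use_case: int          # e.g. 0 = super resolution, 1 = noise reduction (assumed codes)
    frame_id: int
    vps_id: int
    sps_id: int
    pps_id: int
    width: int             # frame_resolution after the model is applied
    height: int
    frame_rate: int
    color_space: int
    onnx_payload: bytes    # complete ONNX model
    onnx_model_data: bytes = b""  # extra data not covered by ONNX

    def pack(self) -> bytes:
        head = _HEADER.pack(
            self.use_case, self.frame_id, self.vps_id, self.sps_id,
            self.pps_id, self.width, self.height, self.frame_rate,
            self.color_space, len(self.onnx_payload),
        )
        return head + self.onnx_payload + self.onnx_model_data

    @classmethod
    def parse(cls, data: bytes) -> "OnnxMessage":
        *fields, payload_len = _HEADER.unpack_from(data, 0)
        start = _HEADER.size
        end = start + payload_len
        # Whatever follows the ONNX payload is treated as ONNX_model_data.
        return cls(*fields, data[start:end], data[end:])
```

Carrying the payload length explicitly lets the parser separate ONNX_payload from the trailing ONNX_model_data; a real syntax could instead derive the split from payloadSize.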

FIG. 3 shows the structure of a multimedia bitstream to which the multimedia bitstream generation method according to a preferred embodiment of the present invention is applied.

The additional information data 35, whose payloadType is ONNX, has a use_case of SR (Super Resolution) and is applied to the immediately preceding INTRA frame 34.

The additional information data 38, whose payloadType is NNEF, has a use_case of NR (Noise Reduction) and is applied to the two INTER frames 36 and 37.

In this way, several deep learning network models can be applied within a single stream, and a deep learning model can be configured to apply only to specific frames.
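A decoder-side view of this per-frame assignment might look like the sketch below. The message dictionaries, the `frame_ids` list, and the frame indices 0 to 2 are hypothetical; they only mirror the FIG. 3 arrangement of one ONNX super-resolution model for an INTRA frame and one NNEF noise-reduction model for two INTER frames.

```python
from collections import defaultdict

# Hypothetical parsed messages mirroring FIG. 3; frame indices are assumed.
messages = [
    {"payload_type": "ONNX", "use_case": "SR", "frame_ids": [0]},
    {"payload_type": "NNEF", "use_case": "NR", "frame_ids": [1, 2]},
]


def models_by_frame(messages):
    """Map each frame index to the (format, use case) pairs that target it."""
    table = defaultdict(list)
    for message in messages:
        for frame_id in message["frame_ids"]:
            table[frame_id].append((message["payload_type"], message["use_case"]))
    return dict(table)


plan = models_by_frame(messages)
```

Frames that no message targets are simply absent from `plan`, which corresponds to decoding them without any deep learning post-processing.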

FIG. 4 is a flowchart of a method for generating a multimedia bitstream that includes the additional information area described above.

First, the payload type (payloadType) is set (S10). A payload type value is assigned to each deep learning network model format, such as ONNX, NNEF, NNR, and DNM. The payload type value identifies which deep learning network model is included in the additional information area.

Next, a use case field is generated (S20). The use case (use_case) indicates the application for which the deep learning network model is used. Examples include Super Resolution, which increases the resolution of an SD (Standard Definition) video to HD (High Definition), and Noise Reduction, which removes noise.

유즈케이스 필드가 생성되면 유즈케이스가 적용될 위치를 나타내는 위치 인덱스 필드가 생성된다(S30). 비디오 전체에 적용하거나 프레임 단위, 신 단위, 픽쳐 단위 등 다양한 적용 범위를 구성할 수 있다.When the use case field is generated, a location index field indicating a location to which the use case is to be applied is generated (S30). It may be applied to the entire video or various application ranges such as frame unit, scene unit, and picture unit may be configured.

After the location to which the deep learning network is applied has been generated, an index of the parameter to be applied is generated (S40). The parameters include the video parameter set (VPS), the sequence parameter set (SPS), and the picture parameter set (PPS).

Next, a field indicating the attributes of the video after the deep learning network is applied is generated (S50). For example, the frame resolution (frame_resolution), frame rate (frame_rate), and color space (color space) can describe the attributes of the video.

Finally, a payload field is generated (S60). For example, an ONNX payload contains the entire deep learning network model as defined by the ONNX standard, which covers the full model data, including the structure and weights of the deep learning model.

After the payload field has been generated, a new area may be added for data that is not part of the ONNX model data but may be used later.
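Steps S10 to S60 can be sketched as a simple serializer. This is only an illustration under stated assumptions: the numeric codes, the field widths, the 16-byte big-endian header layout, the explicit payload length, and the name `build_model_message` are all hypothetical, since the description does not fix a binary syntax for these fields here.

```python
import struct

# Hypothetical numeric codes; the description names the formats but not their values.
PAYLOAD_TYPES = {"ONNX": 0, "NNEF": 1, "NNR": 2, "DNM": 3}
USE_CASES = {"SR": 0, "NR": 1}
PARAM_SETS = {"VPS": 0, "SPS": 1, "PPS": 2}

# Assumed big-endian layout (16 bytes):
# payload_type:u8, use_case:u8, location_index:u16, param_set_kind:u8,
# param_set_id:u8, width:u16, height:u16, frame_rate:u8, color_space:u8,
# payload_length:u32
_HEADER = struct.Struct(">BBHBBHHBBI")


def build_model_message(payload_type, use_case, location_index,
                        param_set, param_set_id,
                        width, height, frame_rate, color_space,
                        model_bytes):
    """Serialize the fields in the order of steps S10 to S60."""
    header = _HEADER.pack(
        PAYLOAD_TYPES[payload_type],   # S10: payload type
        USE_CASES[use_case],           # S20: use case
        location_index,                # S30: location index
        PARAM_SETS[param_set],         # S40: parameter set kind ...
        param_set_id,                  #      ... and its index
        width, height,                 # S50: attributes after the model is applied
        frame_rate, color_space,
        len(model_bytes),              # assumed length prefix for the payload
    )
    return header + model_bytes        # S60: payload (model data)
```

A call such as `build_model_message("ONNX", "SR", 5, "PPS", 0, 1920, 1080, 30, 1, model_bytes)` would yield the 16-byte header followed by the raw model data.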

The deep learning network model may be included in an SEI message, but it may also be defined and carried as a separate NAL unit type.

Table 3 below shows an example of newly assigning deep learning network model types to nal_unit_type in the additional information area of H.264.

In HEVC, nal_unit_type values 48 through 63 are not defined, so deep learning network models can be assigned to this range. For example, 48 may indicate ONNX, 49 NNEF, 50 NNR, and 51 DNM. The structure of each deep learning network model is as described above. For example, the structure of onnx_rbsp() is similar to the structure carried in the SEI message discussed earlier.
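A minimal sketch of how a parser might recognize these NAL units from the two-byte HEVC NAL unit header (forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1). The 48 to 51 mapping follows the example values given above; the function names and the `MODEL_NAL_TYPES` table are illustrative, not part of any standard.

```python
from typing import Optional, Tuple

# Example assignment from the description; these values are not standardized.
MODEL_NAL_TYPES = {48: "ONNX", 49: "NNEF", 50: "NNR", 51: "DNM"}


def parse_hevc_nal_header(header: bytes) -> Tuple[int, int, int]:
    """Split a 2-byte HEVC NAL unit header into its fields."""
    if len(header) < 2:
        raise ValueError("HEVC NAL unit header is 2 bytes")
    b0, b1 = header[0], header[1]
    if b0 & 0x80:
        raise ValueError("forbidden_zero_bit must be 0")
    nal_unit_type = (b0 >> 1) & 0x3F                 # 6 bits
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)    # 6 bits across both bytes
    nuh_temporal_id_plus1 = b1 & 0x07                # 3 bits
    return nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1


def model_format_for_nal(header: bytes) -> Optional[str]:
    """Return the model format name, or None for ordinary NAL units."""
    nal_unit_type, _, _ = parse_hevc_nal_header(header)
    return MODEL_NAL_TYPES.get(nal_unit_type)
```

A decoder that does not recognize these types can skip such NAL units, which is one reason an unassigned range is a natural place for the model data.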

Table 4 below shows the case in which a deep learning network model is included in an H.264 SEI message.

Table 6 below shows an example in which, in H.264, the deep learning network model is included in a separate NAL unit type rather than in the additional information area.

In the end, the format and content included do not differ much from the HEVC example. Accordingly, this bitstream generation method is not limited to HEVC or H.264, and could be incorporated in a similar form into VVC or any video stream standard defined in the future.

FIG. 5 is a flowchart of a multimedia bitstream decoding method according to another preferred embodiment of the present invention.

A deep learning network model included in the additional information area or in a separate NAL unit type is decoded as follows.

First, the type of deep learning network model is identified from payloadType (S110). Deep learning network models include ONNX, NNEF, NNR, and DNM, but the method is not limited to these, and other types of deep learning models may be added.

After the deep learning network model has been identified, use_case is used to determine which algorithm is to be applied (S120). Various algorithms besides Super Resolution and Noise Reduction may be included.

Next, the location to which the deep learning network model is applied is identified (S130). Because a deep learning network model can be applied not only to the entire video but also in units of frames, pictures, or scenes, the location ID is used to determine which deep learning model applies at which location.

The attributes that the data will have after decoding are determined by parsing the attribute field (S140). This field may include the video resolution (frame_resolution), frame rate, color space, and the like.

Finally, the data of the deep learning network model to be applied to the multimedia bitstream is extracted, and the deep learning operation can be performed (S150).
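The parsing order S110 to S150 can be sketched as follows. The 16-byte big-endian header layout, the numeric codes, the explicit payload length, and the names `ModelMessage` and `parse_model_message` are assumptions for illustration only; the two parameter-set bytes are read but not returned, since the steps above do not use them.

```python
import struct
from typing import NamedTuple

# Hypothetical numeric codes; the description does not assign values.
PAYLOAD_TYPE_NAMES = {0: "ONNX", 1: "NNEF", 2: "NNR", 3: "DNM"}
USE_CASE_NAMES = {0: "SR", 1: "NR"}

# Assumed layout: payload_type:u8, use_case:u8, location_index:u16,
# param_set_kind:u8, param_set_id:u8, width:u16, height:u16,
# frame_rate:u8, color_space:u8, payload_length:u32
_HEADER = struct.Struct(">BBHBBHHBBI")


class ModelMessage(NamedTuple):
    payload_type: str
    use_case: str
    location_index: int
    width: int
    height: int
    frame_rate: int
    color_space: int
    model_bytes: bytes


def parse_model_message(data: bytes) -> ModelMessage:
    if len(data) < _HEADER.size:
        raise ValueError("truncated header")
    ptype, ucase, loc, _kind, _pid, w, h, fps, cs, n = _HEADER.unpack_from(data)
    payload = data[_HEADER.size:_HEADER.size + n]
    if len(payload) != n:
        raise ValueError("truncated payload")
    return ModelMessage(
        PAYLOAD_TYPE_NAMES[ptype],   # S110: model type
        USE_CASE_NAMES[ucase],       # S120: algorithm / use case
        loc,                         # S130: location
        w, h, fps, cs,               # S140: attributes after the model is applied
        payload,                     # S150: model data handed to the inference step
    )
```

An unknown payload type or use case code raises `KeyError` in this sketch; a real parser would more likely skip the message so that newly added model types do not break older decoders.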

FIG. 6 is a structural diagram of an encoding apparatus according to a preferred embodiment of the present invention.

The encoding apparatus 100 according to the present invention includes an encoder 110, a deep learning network model 120, a memory 130, and a bitstream generator 140.

When a video/audio signal is input, the encoding apparatus 100 encodes it into compressed data through the encoder 110. The bitstream generator 140 then combines the encoded multimedia signal with the deep learning network model to be applied to that signal, generates a bitstream, and stores it in the memory 130.

The bitstream generator 140 generates a payload type field (payloadType field) that identifies the deep learning network model. The payload type may be ONNX, NNEF, NNR, DNM, or the like.

It then generates a use case (use_case) field indicating the algorithm or application for which this deep learning network model is used. The use case may be an application such as Super Resolution or Noise Reduction.

Next, it generates an index field indicating the location to which this deep learning network model is applied, and an attribute field indicating the attributes of the multimedia file after the deep learning network model is applied. The location index field may specify the location in units of frames, scenes, pictures, or the whole video. The attribute field covers attributes of the multimedia file such as frame resolution, frame rate, and color space.

Finally, bitstream generation is completed by placing the deep learning network model data corresponding to the payload type in the payload. In addition to the deep learning network model data, new data that has not yet been defined may also be included.

FIG. 7 is a structural diagram of a decoding apparatus according to a preferred embodiment of the present invention.

The decoding apparatus 200 includes a bitstream parser 210, a memory 220, a decoder 230, and a deep learning processing unit 240.

The decoding apparatus 200 parses a bitstream that contains both the multimedia bitstream and the deep learning network model with the bitstream parser 210, decodes the multimedia data into a multimedia file with the decoder 230, and then performs deep learning processing in the deep learning processing unit 240 using the deep learning network model data to produce the final video/audio file.

The bitstream parser 210 extracts the deep learning network model included in the bitstream in the following order.

First, the deep learning network model is identified from the payload type field. The deep learning network model may be ONNX, NNEF, NNR, DNM, or the like.

Next, the use case is identified. The use case is the application to which the deep learning network model is applied. Examples include Super Resolution, which improves resolution, and Noise Reduction, which reduces noise.

After the use case has been identified, the index of the location in the video/audio file to which the deep learning network model is applied is identified. The location in the file may be specified in units such as frames, scenes, or pictures.

Next, the attribute field, which describes the attributes after the deep learning network model is applied, is parsed. The attributes include frame resolution, frame rate, and color space.

Finally, the deep learning network model data carried in the payload is itself extracted. This has the advantage that the appropriate deep learning network model can be applied at the required location in the multimedia file.
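The hand-off from the decoder 230 to the deep learning processing unit 240 can be sketched as below. This is a toy model under stated assumptions: frames are plain lists of numbers, each extracted model is represented by a Python callable keyed by frame index, and `run_deep_learning_stage` is a hypothetical name. Actual inference with an ONNX or NNEF runtime is outside the scope of the sketch.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]                  # stand-in for decoded samples
Model = Callable[[Frame], Frame]     # stand-in for an inference call


def run_deep_learning_stage(decoded_frames: Sequence[Frame],
                            model_by_frame: Dict[int, Model]) -> List[Frame]:
    """Apply a model only where the parsed location index says it applies."""
    output: List[Frame] = []
    for index, frame in enumerate(decoded_frames):
        model = model_by_frame.get(index)
        if model is None:
            output.append(list(frame))    # no model for this frame: pass through
        else:
            output.append(model(frame))   # model applies at this location
    return output
```

Frames without an associated model pass through unchanged, which mirrors the idea that a model can be limited to the locations named in the bitstream.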

According to the present invention as described above, the deep learning network model can be transmitted together with the multimedia stream to which it is applied, which makes it easy to manage. Because the deep learning network model can be applied in units of frames, scenes, or pictures, deep learning performance is also improved and the model can be applied accurately.

The scope of protection of the present invention is not limited to the description and wording of the embodiments explicitly set out above. It is also noted once again that the scope of protection of the present invention cannot be limited by changes or substitutions that are obvious in the technical field to which the present invention pertains.

Claims

generating a payload type field that identifies a deep learning network model;
generating a use case field that identifies an algorithm to be applied to multimedia data;
generating a location index field that specifies a location in the multimedia data to which the deep learning network model is applied;
generating a parameter index field that designates a parameter to be applied to the multimedia data;
generating an attribute field indicating an attribute of the multimedia data after the deep learning network model is applied; and
generating a payload field including deep learning network model data according to the payload type;

The method of claim 1, wherein the bitstream is included in the additional information area of the multimedia bitstream.

The method of claim 1, wherein the bitstream is included in the multimedia data area of the multimedia bitstream.

The method of claim 1, wherein the location in the multimedia data to which the deep learning network model is applied is specified by dividing the multimedia data into frames, regions, or scenes.

The method of claim 1, wherein the parameter index to be applied to the multimedia data is a video parameter set (VPS) index, a sequence parameter set (SPS) index, or a picture parameter set (PPS) index.

The method of claim 1, wherein, in the step of generating the use case field, the algorithm to be applied to the multimedia data is Super Resolution or Noise Reduction.

The method of claim 1, wherein the attribute of the multimedia data after the deep learning network model is applied comprises image resolution, frame rate, or color space.

An encoding device comprising:
a memory; and
a bitstream generator that includes one or more processors, generates a bitstream including multimedia data and a deep learning network model to be applied to the multimedia data, and stores the bitstream in the memory,
wherein the bitstream generator generates a payload type field that identifies the deep learning network model, generates a use case field that identifies an algorithm to be applied to the multimedia data, generates an index field that specifies a location in the multimedia data to which the deep learning network model is applied, generates a field indicating an attribute of the multimedia data after the deep learning network model is applied, and generates a payload field including deep learning network model data according to the payload type.

A bitstream parsing method comprising:
identifying the type of a deep learning network model from a payload type field;
identifying an algorithm to be applied to multimedia data from a use case field;
identifying, from a location index field, the location in the multimedia data to which the deep learning network model is applied;
identifying, from an attribute field, an attribute of the multimedia data after the deep learning network model is applied; and
identifying, from a payload field, the deep learning network model data according to the payload type.

A decoding device comprising:
a memory; and
a bitstream parsing unit that includes one or more processors, parses a bitstream including a deep learning network model to be applied to multimedia data, and stores it in the memory,
wherein the bitstream parsing unit identifies the type of the deep learning network model from a payload type field, identifies an algorithm to be applied to the multimedia data from a use case field, identifies, from a location index field, the location in the multimedia data to which the deep learning network model is applied, identifies, from an attribute field, an attribute of the multimedia data after the deep learning network model is applied, and identifies, from a payload field, the deep learning network model data according to the payload type.