
WO2022247452A1 - Method, apparatus, medium, and device for processing track data in multimedia resources - Google Patents

Method, apparatus, medium, and device for processing track data in multimedia resources

Info

Publication number
WO2022247452A1
WO2022247452A1 PCT/CN2022/083956 CN2022083956W WO2022247452A1 WO 2022247452 A1 WO2022247452 A1 WO 2022247452A1 CN 2022083956 W CN2022083956 W CN 2022083956W WO 2022247452 A1 WO2022247452 A1 WO 2022247452A1
Authority
WO
WIPO (PCT)
Prior art keywords
track data
knowledge image
bit stream
data
main bit
Prior art date
Application number
PCT/CN2022/083956
Inventor
胡颖
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP22810185.3A (EP4351142A4)
Priority to US17/988,713 (US11949966B2)
Publication of WO2022247452A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/58 Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455 Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format

Definitions

  • The present application relates to the field of computer and communication technologies, and in particular to a method, apparatus, medium, and device for processing track data in multimedia resources.
  • AVS3: the third-generation audio and video coding standard.
  • Main bitstream: English term "main bitstream".
  • Knowledge image bitstream: English term "library picture bitstream".
  • Embodiments of the present application provide a method, apparatus, medium, and device for processing track data in multimedia resources, so that, at least to some extent, the association between the pieces of track data can be obtained in advance from the signaling file, avoiding the unnecessary delay caused by fetching knowledge image track data on demand.
  • A method for processing track data in a multimedia resource, including: receiving a signaling file corresponding to the multimedia resource, the signaling file containing descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data including main bitstream track data corresponding to the main bitstream and knowledge image track data corresponding to the knowledge image bitstream, where the dependency identifier contained in the descriptor corresponding to the main bitstream track data points to the descriptor corresponding to the knowledge image track data; parsing the signaling file and determining the dependency relationship between the main bitstream track data and the knowledge image track data according to the dependency identifier; and, according to the dependency relationship, sequentially obtaining the knowledge image track data and the main bitstream track data from the data source side.
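The receiver-side steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the format defined by the application: the `Descriptor` class, its field names, and the track ids are hypothetical stand-ins for the descriptors and the dependency identifier carried in the signaling file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    """Hypothetical per-track descriptor parsed from a signaling file."""
    track_id: str
    is_knowledge_image: bool = False     # hypothetical flag marking a knowledge image track
    dependency_id: Optional[str] = None  # stands in for the dependency identifier


def fetch_order(descriptors: List[Descriptor]) -> List[str]:
    """Return track ids so that each knowledge image track precedes
    the main bitstream track that depends on it."""
    by_id = {d.track_id: d for d in descriptors}
    order: List[str] = []
    for d in descriptors:
        if d.is_knowledge_image:
            continue  # requested only when a main track depends on it
        dep = d.dependency_id
        if dep is not None:
            if dep not in by_id or not by_id[dep].is_knowledge_image:
                raise ValueError(f"{d.track_id} depends on unknown track {dep}")
            if dep not in order:
                order.append(dep)
        order.append(d.track_id)
    return order


descriptors = [
    Descriptor("main-1", dependency_id="lib-1"),
    Descriptor("lib-1", is_knowledge_image=True),
]
print(fetch_order(descriptors))  # ['lib-1', 'main-1']
```

Because the order is derived from the signaling file alone, the receiver knows which knowledge image track to request before it starts fetching the main bitstream track.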
  • A method for processing track data in a multimedia resource, including: generating a signaling file corresponding to the multimedia resource, the signaling file including descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data including main bitstream track data corresponding to the main bitstream and knowledge image track data corresponding to the knowledge image bitstream, where the dependency identifier contained in the descriptor corresponding to the main bitstream track data points to the descriptor corresponding to the knowledge image track data; and sending the signaling file to the data receiver, so that the data receiver determines the dependency relationship between the main bitstream track data and the knowledge image track data according to the dependency identifier in the signaling file, and sequentially acquires the knowledge image track data and the main bitstream track data from the data source side according to the dependency relationship.
  • A device for processing track data in multimedia resources, including: a receiving unit configured to receive a signaling file corresponding to a multimedia resource, the signaling file containing descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data including main bitstream track data corresponding to the main bitstream and knowledge image track data corresponding to the knowledge image bitstream, where the dependency identifier contained in the descriptor corresponding to the main bitstream track data points to the descriptor corresponding to the knowledge image track data; a parsing unit configured to parse the signaling file and determine the dependency relationship between the main bitstream track data and the knowledge image track data according to the dependency identifier; and an acquisition unit configured to sequentially acquire the knowledge image track data and the main bitstream track data from the data source side according to the dependency relationship.
  • The descriptor corresponding to the knowledge image track data contains first element information, which indicates that the descriptor containing it is the descriptor corresponding to the knowledge image track data.
  • The plurality of track data includes at least two knowledge image track data, and the descriptor corresponding to each of the at least two knowledge image track data contains second element information, which indicates the track group in which that knowledge image track data is located.
  • The descriptor corresponding to each of the at least two knowledge image track data contains third element information, whose value indicates whether that knowledge image track data is depended on by a plurality of main bitstream track data.
  • If, among the at least two knowledge image track data, there is target knowledge image track data on which a plurality of main bitstream track data depend, the descriptor corresponding to the target knowledge image track data further includes fourth element information, which indicates the frame rate of specified main bitstream track data among the plurality of main bitstream track data.
  • The descriptor corresponding to each of the at least two knowledge image track data further includes a sample index identifier, which indicates the interval of sample index numbers in the main bitstream track data that index that knowledge image track data.
  • The sample index identifier includes fifth element information and sixth element information. The value of the fifth element information indicates the minimum sample index number in the main bitstream track data that indexes each of the at least two knowledge image track data, and the value of the sixth element information indicates the maximum sample index number in the main bitstream track data that indexes each of the at least two knowledge image track data.
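One way to read the minimum and maximum sample index numbers is as a closed interval per knowledge image track, which a receiver can test against the range of main bitstream samples it intends to play. The sketch below is an assumption-laden illustration: the class name, field names, and track ids are hypothetical, and the two fields merely stand in for the fifth and sixth element information.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnowledgeTrackDescriptor:
    """Hypothetical descriptor of one knowledge image track."""
    track_id: str
    min_sample_index: int  # stands in for the fifth element information
    max_sample_index: int  # stands in for the sixth element information


def tracks_for_samples(descriptors: List[KnowledgeTrackDescriptor],
                       first_sample: int, last_sample: int) -> List[str]:
    """Return ids of knowledge image tracks whose index interval
    overlaps the main bitstream sample range [first_sample, last_sample]."""
    return [
        d.track_id
        for d in descriptors
        if d.min_sample_index <= last_sample and first_sample <= d.max_sample_index
    ]


library = [
    KnowledgeTrackDescriptor("lib-1", 1, 100),
    KnowledgeTrackDescriptor("lib-2", 101, 200),
]
print(tracks_for_samples(library, 90, 120))   # ['lib-1', 'lib-2']
print(tracks_for_samples(library, 150, 160))  # ['lib-2']
```

With such a check, a receiver that seeks to samples 150 to 160 would request only `lib-2` rather than every knowledge image track.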
  • The main bitstream track data includes an index identifier, which indicates the knowledge image track data on which the main bitstream track data depends, or the knowledge image track group on which the main bitstream track data depends.
  • The main bitstream track data includes a track reference type data box; the track reference type data box includes a reference type field, and the reference type field is used to represent the above index identifier.
  • The main bitstream track data includes a track reference data box, and the track reference data box includes the track reference type data box.
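The nesting of a track reference type data box inside a track reference data box matches the general box layout of the ISO base media file format, where a box is a 32-bit big-endian size, a four-character type, and a payload, and a track reference type box carries a list of 32-bit track IDs. The sketch below serializes such a structure. The four-character code `libr` and the track ID 2 are placeholders chosen for illustration; the text above does not give the actual reference type value.

```python
import struct
from typing import Iterable


def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a basic box: 32-bit big-endian size, 4-byte type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly four bytes")
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def track_reference_box(reference_type: bytes, track_ids: Iterable[int]) -> bytes:
    """Build a 'tref' box holding one track reference type box.

    The inner box uses the reference type as its box type and lists
    the referenced track IDs as 32-bit unsigned integers.
    """
    ids = b"".join(struct.pack(">I", t) for t in track_ids)
    return box(b"tref", box(reference_type, ids))


# 'libr' is a hypothetical reference type; 2 is a hypothetical track ID
# of the knowledge image track that the main bitstream track refers to.
tref = track_reference_box(b"libr", [2])
print(tref.hex())
# 0000001474726566 0000000c6c696272 00000002 (printed without spaces)
```

A reader of the main bitstream track can then walk the `tref` box, match the reference type, and resolve the listed track IDs to knowledge image tracks or track groups.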
  • The plurality of track data includes at least two knowledge image track data, and each of the at least two knowledge image track data includes a track group identifier, which indicates the track group in which that knowledge image track data is located.
  • Each of the at least two knowledge image track data also contains first field information indicating whether that knowledge image track data is depended on by multiple main bitstream track data. If the first field information indicates that the knowledge image track data is depended on by one main bitstream track data, the knowledge image track data also includes a field indicating the minimum sample index number in the main bitstream track data that indexes the knowledge image track data, and a field indicating the maximum sample index number in the main bitstream track data that indexes the knowledge image track data.
  • The knowledge image track data also includes fields respectively indicating, for each main bitstream track data in the plurality of main bitstream track data, the minimum sample index number that indexes the knowledge image track data.
  • Each of the at least two knowledge image track data also includes first field information indicating whether that knowledge image track data is depended on by multiple main bitstream track data. If the first field information indicates that the knowledge image track data is depended on by one main bitstream track data, the knowledge image track data also includes a field indicating the number of sample groups in the main bitstream track data that index the knowledge image track data, and a field indicating the sample group index numbers in the main bitstream track data that index the knowledge image track data.
  • The knowledge image track data also includes fields respectively indicating, for each main bitstream track data in the plurality of main bitstream track data, the number of sample groups that index the knowledge image track data, fields respectively indicating the sample group index numbers that index the knowledge image track data, and a field indicating the frame rate of each main bitstream track data in the plurality of main bitstream track data.
  • The device for processing track data in multimedia resources further includes a decoding unit configured to determine a decoding order according to the dependency relationship, and to decode the knowledge image track data and the main bitstream track data in that order to obtain the multimedia resource.
  • The decoding unit is configured to: decode the main bitstream track data; when a sample index number interval of knowledge image track data is obtained while decoding the main bitstream track data, determine, according to that interval, the knowledge image track data that needs to be referenced from among a plurality of knowledge image track data; and decode the knowledge image track data that needs to be referenced.
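The decoding behaviour described above can be illustrated as a simple scheduler: walk the main bitstream samples in order and, the first time a sample falls into the index interval of a knowledge image track, decode that track before the sample. This is only a sketch under stated assumptions; the function name, the dictionary layout, and the track ids are hypothetical, and real decoding is replaced by recording the step order.

```python
from typing import Dict, Iterable, List, Tuple


def decode_schedule(main_sample_indices: Iterable[int],
                    knowledge_intervals: Dict[str, Tuple[int, int]]
                    ) -> List[Tuple[str, object]]:
    """Return the order of decoding steps.

    knowledge_intervals maps a (hypothetical) knowledge image track id to
    the (min, max) sample index interval of main samples that reference it.
    """
    decoded = set()
    steps: List[Tuple[str, object]] = []
    for idx in main_sample_indices:
        for track_id, (low, high) in knowledge_intervals.items():
            if low <= idx <= high and track_id not in decoded:
                # The referenced knowledge image must be available first.
                steps.append(("knowledge", track_id))
                decoded.add(track_id)
        steps.append(("main", idx))
    return steps


steps = decode_schedule([1, 2, 3], {"lib-1": (1, 2), "lib-2": (3, 3)})
print(steps)
# [('knowledge', 'lib-1'), ('main', 1), ('main', 2), ('knowledge', 'lib-2'), ('main', 3)]
```

Each knowledge image track is decoded once and only when a main sample actually needs it, which is one plausible reading of determining the track "that needs to be referenced" from the interval.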
  • A device for processing track data in multimedia resources, including: a generating unit configured to generate a signaling file corresponding to the multimedia resource, the signaling file containing descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data including main bitstream track data corresponding to the main bitstream and knowledge image track data corresponding to the knowledge image bitstream, where the dependency identifier contained in the descriptor corresponding to the main bitstream track data points to the descriptor corresponding to the knowledge image track data; and a sending unit configured to send the signaling file to the data receiver, so that the data receiver determines the dependency relationship between the main bitstream track data and the knowledge image track data according to the dependency identifier in the signaling file, and sequentially acquires the knowledge image track data and the main bitstream track data from the data source side according to the dependency relationship.
  • The generating unit is further configured to: before generating the signaling file corresponding to the multimedia resource, generate the main bitstream track data corresponding to the main bitstream and the knowledge image track data corresponding to the knowledge image bitstream.
  • the main bit stream track data includes an index identifier, and the index identifier is used to indicate the knowledge image track data on which the main bit stream track data depends.
  • A computer-readable medium storing a computer program that, when executed by a processor, implements the method for processing track data in multimedia resources described in the above embodiments.
  • An electronic device, including: one or more processors; and a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method for processing track data in multimedia resources described in the foregoing embodiments.
  • A computer program product or computer program including computer instructions stored in a computer-readable storage medium.
  • The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for processing track data in multimedia resources provided in the various optional embodiments above.
  • The data receiver receives the signaling file corresponding to the multimedia resource and, from the descriptors corresponding to the plurality of track data contained in the signaling file and the dependency identifier contained in the descriptor corresponding to the main bitstream track data, determines the dependency relationship between the main bitstream track data and the knowledge image track data. It then obtains the knowledge image track data and the main bitstream track data from the data source side according to the dependency relationship. The data receiver can therefore learn the relationship between the pieces of track data in advance from the signaling file, and decide whether to obtain knowledge image track data and which knowledge image track data to request.
  • This avoids the unnecessary delay caused by fetching knowledge image track data on demand, and improves the encoding and decoding efficiency of media resources.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied;
  • FIG. 2 shows a schematic diagram of the placement of a video encoding device and a video decoding device in a streaming system;
  • FIG. 3 shows a basic flow diagram of a video encoder;
  • FIG. 4 shows the overall transmission flowchart of a video file according to an embodiment of the present application;
  • FIG. 5 shows a schematic diagram of encoding a video sequence to generate a main bitstream and a knowledge image bitstream;
  • FIG. 6 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application;
  • FIG. 7 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application;
  • FIG. 8 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application;
  • FIG. 9 shows a block diagram of an apparatus for processing track data in multimedia resources according to an embodiment of the present application;
  • FIG. 10 shows a block diagram of an apparatus for processing track data in multimedia resources according to an embodiment of the present application;
  • FIG. 11 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiment of the present application.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concepts of example embodiments to those skilled in the art.
  • The term "plurality" as used herein refers to two or more.
  • "And/or" describes the association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • The system architecture 100 includes a plurality of terminal devices that can communicate with each other through, for example, a network 150.
  • The system architecture 100 may include a first terminal device 110 and a second terminal device 120 interconnected by the network 150.
  • The first terminal device 110 and the second terminal device 120 perform unidirectional data transmission.
  • The first terminal device 110 can encode video data (such as a video picture stream collected by the first terminal device 110) for transmission to the second terminal device 120 through the network 150; the encoded video data is transmitted in the form of one or more encoded video bitstreams.
  • the second terminal device 120 may receive encoded video data from the network 150, decode the encoded video data to restore the video data, and display video pictures according to the restored video data.
  • the system architecture 100 may include a third terminal device 130 and a fourth terminal device 140 performing bidirectional transmission of encoded video data, such as may occur during a video conference.
  • Each of the third terminal device 130 and the fourth terminal device 140 can encode video data (such as a stream of video pictures captured by that terminal device) for transmission via the network 150 to the other of the two terminal devices.
  • Each of the third terminal device 130 and the fourth terminal device 140 can also receive the encoded video data transmitted by the other terminal device, decode it to recover the video data, and display video pictures on an accessible display device based on the recovered video data.
  • The first terminal device 110, the second terminal device 120, the third terminal device 130 and the fourth terminal device 140 may be servers, personal computers and smart phones, but the principles disclosed in this application are not limited thereto. Embodiments disclosed herein are also suitable for use with laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment.
  • the network 150 represents any number of networks, including for example wired and/or wireless communication networks, that communicate encoded video data between the first terminal device 110, the second terminal device 120, the third terminal device 130 and the fourth terminal device 140.
  • Communication network 150 may exchange data in circuit-switched and/or packet-switched channels.
  • the network may include a telecommunications network, a local area network, a wide area network and/or the Internet. For purposes of this application, unless explained below, the architecture and topology of network 150 may be immaterial to the operation of the present disclosure.
  • FIG. 2 shows how a video encoding device and a video decoding device are placed in a streaming environment.
  • the subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, videoconferencing, digital TV (television), storing compressed video on digital media including CDs, DVDs, memory sticks, and the like.
  • The streaming system may include an acquisition subsystem 213, which may include a video source 201 such as a digital camera; the video source creates an uncompressed video picture stream 202.
  • The video picture stream 202 includes samples captured by the digital camera. Compared to the encoded video data 204 (or the encoded video bitstream 204), the video picture stream 202 is depicted as a thick line to emphasize its high data volume.
  • The video picture stream 202 may be processed by an electronic device 220 that includes a video encoding device 203 coupled to the video source 201.
  • The video encoding device 203 may include hardware, software, or a combination of hardware and software to enable or implement aspects of the disclosed subject matter as described in more detail below.
  • The encoded video data 204 (or encoded video bitstream 204) is depicted as a thinner line to emphasize its lower data volume compared to the video picture stream 202, and may be stored on the streaming server 205 for future use.
  • One or more streaming client subsystems, such as client subsystem 206 and client subsystem 208 in FIG. 2, may access the streaming server 205 to retrieve copies 207 and 209 of the encoded video data 204.
  • Client subsystem 206 may include, for example, video decoding device 210 in electronic device 230 .
  • a video decoding device 210 decodes an incoming copy 207 of encoded video data and produces an output video picture stream 211 that may be presented on a display 212, such as a display screen, or another presentation device.
  • encoded video data 204, video data 207, and video data 209 may be encoded according to certain video encoding/compression standards. Examples of these standards include the ITU-T H.265 standard, the Chinese National Video Coding Standard AVS (Audio Video Coding Standard) and the like. This application can be used in the context of AVS.
  • the electronic device 220 and the electronic device 230 may include other components not shown in the figure.
  • The electronic device 220 may include a video decoding device, and the electronic device 230 may also include a video encoding device.
  • HEVC: High Efficiency Video Coding.
  • VVC: Versatile Video Coding.
  • AVS: Audio Video Coding Standard.
  • CTU: Coding Tree Unit.
  • LCU: Largest Coding Unit.
  • the CTU can be further divided into finer divisions to obtain one or more basic coding units CU, which are the most basic elements in a coding process.
  • Predictive Coding: includes intra-frame prediction and inter-frame prediction. The original video signal is predicted from a selected reconstructed video signal, and the difference gives a residual video signal. The encoder decides which predictive coding mode to use for the current CU and informs the decoder. In intra-frame prediction, the predicted signal comes from an area of the same image that has already been coded and reconstructed; in inter-frame prediction, the predicted signal comes from another, already encoded image (called a reference image) that is different from the current image.
  • Transform & Quantization: the residual video signal undergoes a transform operation such as the DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), which converts the signal into the transform domain, where its values are called transform coefficients. The transform coefficients are then subjected to a lossy quantization operation that discards some information, so that the quantized signal is easier to represent compactly. Some video coding standards offer more than one transform, so the encoder also selects one of them for the current CU and informs the decoder. The fineness of quantization is usually determined by the quantization parameter (QP).
  • QP: Quantization Parameter.
  • A larger QP value means that coefficients spanning a larger range of values are quantized to the same output, which usually results in greater distortion and a lower bit rate; conversely, a smaller QP value means that coefficients spanning a smaller range of values are quantized to the same output, which usually results in smaller distortion and a higher bit rate.
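The trade-off between quantization fineness and distortion can be seen with a toy uniform quantizer. The sketch below is an illustration only: it uses a plain step size in place of QP (the mapping from QP to step size is standard-specific and not given here), and the coefficient values are made up.

```python
from typing import List, Tuple


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Map each coefficient to the index of its nearest quantization level."""
    return [round(c / step) for c in coeffs]


def dequantize(levels: List[int], step: float) -> List[float]:
    """Reconstruct approximate coefficients from level indices."""
    return [level * step for level in levels]


def summary(coeffs: List[float], step: float) -> Tuple[int, float]:
    """Return (number of distinct levels used, mean squared error)."""
    levels = quantize(coeffs, step)
    recon = dequantize(levels, step)
    mse = sum((c - r) ** 2 for c, r in zip(coeffs, recon)) / len(coeffs)
    return len(set(levels)), mse


# Made-up transform coefficients for illustration.
coeffs = [-7.0, -3.0, -1.0, 0.0, 2.0, 5.0, 9.0, 14.0]
for step in (2.0, 8.0):  # a larger step plays the role of a larger QP
    distinct, mse = summary(coeffs, step)
    print(f"step={step}: {distinct} distinct levels, MSE={mse}")
```

With the larger step, more coefficients collapse onto the same level, so fewer distinct symbols need to be coded (lower bit rate) while the reconstruction error grows (greater distortion).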
  • Entropy Coding or Statistical Coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and the encoder finally outputs a binary (0 or 1) compressed bitstream. Other information generated during encoding, such as the selected coding mode and motion vector data, is also entropy coded to reduce the bit rate.
  • Statistical coding is a lossless coding method that can effectively reduce the bit rate required to express the same signal. Common statistical coding methods include variable length coding (VLC) and context-based adaptive binary arithmetic coding (CABAC).
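As a concrete instance of variable length coding, the sketch below implements the unsigned Exp-Golomb code, a VLC used for syntax elements in standards such as H.264 and HEVC. It is chosen here purely as an illustration of the idea that frequent (small) values get shorter codewords; the text above does not say which VLC, if any, the described encoder uses.

```python
from typing import List


def exp_golomb_encode(value: int) -> str:
    """Encode a non-negative integer as an unsigned Exp-Golomb bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]                 # binary form of value + 1
    return "0" * (len(code) - 1) + code       # prefix of leading zeros


def exp_golomb_decode(bits: str) -> List[int]:
    """Decode a concatenation of unsigned Exp-Golomb codewords."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":               # count the zero prefix
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2) - 1)
        pos += zeros + 1
    return values


symbols = [0, 1, 0, 3]
stream = "".join(exp_golomb_encode(s) for s in symbols)
print(stream)                     # 1010100100
print(exp_golomb_decode(stream))  # [0, 1, 0, 3]
```

The value 0 costs one bit while 3 costs five, so a source dominated by small values compresses well, and decoding recovers the symbols exactly, which is what makes the method lossless.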
  • Loop Filtering The changed and quantized signal will be reconstructed through inverse quantization, inverse transformation and prediction compensation operations. Compared with the original image, due to the influence of quantization, some information of the reconstructed image is different from the original image, that is, the reconstructed image will produce distortion (Distortion). Therefore, filtering operations can be performed on the reconstructed image, such as filters such as Deblocking filter (DB for short), SAO (Sample Adaptive Offset, adaptive pixel compensation) or ALF (Adaptive Loop Filter, adaptive loop filter) , which can effectively reduce the degree of distortion generated by quantization. Since these filtered reconstructed images will be used as references for subsequent encoded images to predict future image signals, the above filtering operation is also called in-loop filtering, that is, a filtering operation in an encoding loop.
  • FIG. 3 shows a basic flow chart of a video encoder, in which intra prediction is taken as an example for illustration.
  • the difference between the original image signal $s_k[x, y]$ and the predicted image signal is computed to obtain the residual signal $u_k[x, y]$.
  • the residual signal $u_k[x, y]$ is transformed and quantized to obtain quantized coefficients.
  • on the one hand, the quantized coefficients are entropy coded to obtain the coded bit stream; on the other hand, they are inverse quantized and inverse transformed to obtain the reconstructed residual signal.
  • the reconstructed image signal $s'_k[x, y]$ is output through loop filtering.
  • the reconstructed image signal $s'_k[x, y]$ can be used as a reference image of the next frame for motion estimation and motion compensation prediction. The predicted image signal of the next frame is then obtained based on the result of motion compensation prediction $s'_r[x + m_x, y + m_y]$ and the result of intra prediction, and the above process is repeated until encoding is completed.
  • the prediction signal corresponding to the CU can be obtained, and then the reconstruction signal can be obtained after adding the residual signal and the prediction signal, and the reconstruction signal is then subjected to operations such as loop filtering to generate the final output signal.
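  • the encoder and decoder flows above can be sketched on a one-dimensional block as follows. The transform, entropy coding and loop filtering stages are deliberately omitted, and the function names, sample values and step size are illustrative assumptions rather than part of any codec.

```python
def encode_block(original, prediction, step):
    """Residual -> quantization -> reconstruction, as done inside the encoder."""
    residual = [s - p for s, p in zip(original, prediction)]
    levels = [round(u / step) for u in residual]      # transform omitted
    recon_residual = [level * step for level in levels]
    reconstruction = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, reconstruction


def decode_block(levels, prediction, step):
    """The decoder adds the dequantized residual to the same prediction."""
    return [p + level * step for p, level in zip(prediction, levels)]


original = [52, 55, 61, 66]
prediction = [50, 50, 60, 60]
levels, encoder_recon = encode_block(original, prediction, step=4)
decoder_recon = decode_block(levels, prediction, step=4)
print(levels, encoder_recon, decoder_recon)
```

  • the encoder reconstructs exactly what the decoder will reconstruct, which is why the reconstructed signal, and not the original, serves as the reference for later prediction.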
  • the overall transmission process of the video file is shown in Figure 4.
  • the video file is obtained through video capture, and then the video file is transmitted to the receiver after video encoding and video file encapsulation.
  • after receiving the video file, the receiver decapsulates it, decodes the decapsulated video, and finally presents the decoded video.
  • a main bitstream
  • a knowledge image bitstream (library picture bitstream)
  • image frames in the knowledge image bitstream can be referenced by image frames in the main bitstream.
  • the image frames in the knowledge image bitstream are knowledge images.
  • a knowledge image can be a special type of I-frame image which, as an independent image, can be referenced by B-frames and/or P-frames in the main bitstream during decoding.
  • the difference between a knowledge image and an I-frame in the main bitstream is that a knowledge image is not used for display rendering.
  • the main bitstream and the knowledge image bitstream may correspond to the same original video sequence.
  • an EssentialProperty element whose @schemeIdUri attribute is "urn:avs:ims:2018:ds" represents a segment dependency descriptor.
  • at least one segment dependency descriptor is specified at the representation level; it should not be specified at the MPD (media presentation description) level or the adaptation set level.
  • the segment dependency descriptor indicates that each segment in a representation has a non-temporal dependency on other segments (which may be segments in the same representation or in different representations), that is, the other segments are depended upon.
  • the identifier (URL or indicator) of the segment and the image number used by the compression layer in that segment should be included in this descriptor.
  • although the related art indicates the sample information of the knowledge images referenced by some samples in the main bit stream, and also gives segment-level dependencies at the signaling level,
  • these sample-level dependencies and associations can only be obtained by the decoder when it parses the specific samples or segments. If the data receiver has not previously requested or decoded the corresponding knowledge image track, it has to request or decode that track on the fly, which introduces unnecessary delay and reduces the encoding and decoding efficiency of media resources.
  • the technical solution of the embodiments of the present application proposes a new scheme for processing track data in multimedia resources, so that the data receiver can obtain the association between the track data in advance from the signaling file and decide whether to acquire knowledge image track data and which knowledge image track data to request. While ensuring a reasonable allocation of network and CPU resources, this avoids the unnecessary delay caused by having to acquire knowledge image track data on the fly and improves the encoding and decoding efficiency of media resources.
  • Fig. 6 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application.
  • the method for processing the track data in the multimedia resource can be executed by a media playing device, and the media playing device can be a smart phone, a tablet computer, and the like.
  • the method for processing track data in the multimedia resource includes at least step S610 to step S630, which are described in detail as follows:
  • in step S610, a signaling file corresponding to the multimedia resource is received, and the signaling file includes descriptors respectively corresponding to multiple track data of the multimedia resource.
  • the plurality of track data includes main bit stream track data corresponding to the main bit stream and knowledge image track data corresponding to the knowledge image bit stream.
  • the dependency identifier contained in the descriptor corresponding to the main bitstream track data points to the descriptor corresponding to the knowledge image track data on which it depends.
  • the multimedia resource includes specific media resource data, for example, the specific content (video picture, introduction audio, etc.) of the introduction video of item A.
  • the signaling file corresponding to the multimedia resource may be a DASH (Dynamic Adaptive Streaming over HTTP, HTTP-based dynamic adaptive streaming) signaling file.
  • the multiple track data of the multimedia resource may contain one knowledge image track data, or may contain multiple knowledge image track data.
  • the descriptor corresponding to the knowledge image track data may contain first element information, and the first element information is used to indicate that the descriptor containing it is a descriptor corresponding to knowledge image track data.
  • the descriptor corresponding to each of the at least two knowledge image track data may contain second element information, which indicates the track group in which that knowledge image track data is located.
  • the descriptor corresponding to each of the at least two knowledge image track data contains third element information, whose value indicates whether that knowledge image track data is depended on by multiple main bit stream track data. For example, a value of 1 means that the knowledge image track data is depended on by multiple main bit stream track data; a value of 0 means that it is depended on by only one main bit stream track data.
  • the descriptor corresponding to the target knowledge image track data also contains fourth element information, which indicates the frame rate of specified main bit stream track data among the multiple main bit stream track data.
  • the specified main bitstream track data may be the plurality of main bitstream track data, or may be part of the main bitstream track data.
  • the descriptor corresponding to each of the at least two knowledge image track data also contains a sample index identifier, which indicates the range of sample index numbers in the main bit stream track data that index that knowledge image track data.
  • the sample index identifier includes fifth element information and sixth element information: the value of the fifth element information indicates the minimum sample index number in the main bit stream track data that indexes each of the at least two knowledge image track data, and the value of the sixth element information indicates the corresponding maximum sample index number.
  • in step S620, the signaling file is parsed, and the dependency relationship between the main bit stream track data and the knowledge image track data is determined according to the dependency identifier.
  • in step S630, according to the dependency relationship between the main bit stream track data and the knowledge image track data, the knowledge image track data and the main bit stream track data are sequentially acquired from the data source side.
  • the main bitstream track data is acquired from the data source side after the knowledge image track data is acquired.
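  • as a minimal sketch of steps S620 and S630, the dependency relationship can be treated as a graph and the acquisition order derived by topological sorting, so that everything a track depends on is acquired before the track itself. The identifiers below are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def acquisition_order(dependencies, wanted):
    """Return the ids to acquire for `wanted`, dependencies first.

    `dependencies` maps each id to the ids it depends on, as would be
    derived from the dependency identifiers in the signaling file.
    """
    needed, stack = set(), [wanted]
    while stack:
        rep_id = stack.pop()
        if rep_id not in needed:
            needed.add(rep_id)
            stack.extend(dependencies.get(rep_id, ()))
    graph = {rep_id: set(dependencies.get(rep_id, ())) for rep_id in needed}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical ids: two main bit stream tracks sharing one knowledge image track.
deps = {"main_30fps": ["lib_1"], "main_60fps": ["lib_1"], "other": []}
print(acquisition_order(deps, "main_30fps"))
```

  • a track without dependencies yields an order that contains only itself, so the receiver requests nothing extra for it.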
  • the knowledge image track data that the main bit stream track data needs to refer to first can be acquired first; then, during decoding of the main bit stream track data, when decoding reaches a position that needs to refer to other knowledge image track data, that other knowledge image track data is acquired.
  • the decoding process may also be performed after the main bit stream track data and all knowledge image track data are acquired.
  • the main bit stream track data may contain an index identifier, which is used to indicate the knowledge image track data on which the main bit stream track data depends, or to indicate the knowledge image track group on which the main bit stream track data depends.
  • the main bit stream track data includes a track reference type data box, and the track reference type data box includes a reference type field, where the reference type field is used to indicate an index identifier. Based on this, the value of the reference type field is used to indicate the knowledge image track data or the knowledge image track group on which the main bitstream track data depends.
  • the main bitstream track data may include a track reference data box, in this case, the track reference data box includes the track reference type data box.
  • the multiple track data of the multimedia resource may contain at least two knowledge image track data, and each of the at least two knowledge image track data contains a track group identifier, which indicates the track group in which that knowledge image track data is located.
  • after the data receiver sequentially acquires the knowledge image track data and the main bit stream track data from the data source side according to the dependency relationship, it can determine the decoding order according to the dependency relationship, and then decode the knowledge image track data and the main bit stream track data in that order to obtain the multimedia resource.
  • the main bit stream track data can be decoded first; when decoding reaches a sample index number interval that needs to refer to knowledge image track data, the knowledge image track data to be referenced is determined from the multiple knowledge image track data according to that sample index number interval and is then decoded.
  • the decoding process can be performed after the main bit stream track data and all the knowledge image track data have been acquired; alternatively, the knowledge image track data that the main bit stream track data needs to refer to first can be acquired first, and then, during decoding of the main bit stream track data, when decoding reaches a position that needs to refer to other knowledge image track data, that other knowledge image track data is acquired.
  • each of the at least two knowledge image track data also includes first field information indicating whether the knowledge image track data is depended on by multiple main bit stream track data.
  • the first field information may be, for example, multi_main_bitstream. If the value of multi_main_bitstream is 1, it means that the knowledge image track data is dependent on multiple main bit stream track data; if the value of multi_main_bitstream is 0, it means that the knowledge image track data is dependent on one main bit stream track data.
  • the knowledge image track data also includes a field indicating the minimum sample index number in the one main bit stream track data that indexes the knowledge image track data, and a field indicating the maximum sample index number in the one main bit stream track data that indexes the knowledge image track data.
  • the field indicating the minimum value of the sample index number may be sample_number_min
  • the field indicating the maximum value of the sample index number may be sample_number_max.
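  • as a minimal sketch of how a receiver might use sample_number_min and sample_number_max, the following picks the knowledge image track whose range covers a given sample of the main bit stream track. The class name, the track ids, the concrete ranges and the assumption that both bounds are inclusive are illustrative and are not specified above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LibraryTrackInfo:
    track_id: int            # hypothetical track id
    sample_number_min: int   # first main-track sample that indexes this track
    sample_number_max: int   # last main-track sample that indexes this track


def select_library_track(
    tracks: Sequence[LibraryTrackInfo], sample_number: int
) -> Optional[LibraryTrackInfo]:
    """Return the knowledge image track referenced at `sample_number`, if any."""
    for track in tracks:
        # Assumption: the range is inclusive at both ends.
        if track.sample_number_min <= sample_number <= track.sample_number_max:
            return track
    return None


tracks = [LibraryTrackInfo(2, 1, 100), LibraryTrackInfo(3, 101, 200)]
print(select_library_track(tracks, 101))
```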
  • the knowledge image track data also includes fields respectively indicating the minimum sample index number in each of the multiple main bit stream track data that indexes the knowledge image track data, fields respectively indicating the maximum sample index number in each of the multiple main bit stream track data that indexes the knowledge image track data, and a field indicating the frame rate of each of the multiple main bit stream track data.
  • the field indicating the minimum value of the sample index number may be sample_number_min
  • the field indicating the maximum value of the sample index number may be sample_number_max.
  • the knowledge image track data also includes a field indicating the number of sample groups in the one main bit stream track data that index the knowledge image track data, and a field indicating the index numbers of the sample groups in the one main bit stream track data that index the knowledge image track data.
  • the field indicating the number of sample groups may be num_sample_groups, and the field indicating the sample group index number may be group_description_index.
  • the knowledge image track data also includes fields respectively indicating the number of sample groups in each of the multiple main bit stream track data that index the knowledge image track data, fields respectively indicating the index numbers of the sample groups in each of the multiple main bit stream track data that index the knowledge image track data, and a field indicating the frame rate of each of the multiple main bit stream track data.
  • the field indicating the number of sample groups may be num_sample_groups
  • the field indicating the sample group index number may be group_description_index.
  • Fig. 6 illustrates the technical solution of the embodiments of the present application from the perspective of the receiver of the media resource.
  • the implementation details of the embodiments of the present application are further described below from the perspective of the data source side, in conjunction with Fig. 7:
  • FIG. 7 shows a flow chart of a method for processing track data in a multimedia resource according to an embodiment of the present application.
  • the method for processing track data in a multimedia resource can be performed by a media generation device, which can be a smart phone, a tablet computer, and the like.
  • the method for processing track data in the multimedia resource includes at least step S710 to step S720, which are described in detail as follows:
  • in step S710, a signaling file corresponding to the multimedia resource is generated; the signaling file includes descriptors corresponding to multiple track data of the multimedia resource, and the multiple track data includes main bit stream track data corresponding to the main bit stream and knowledge image track data corresponding to the knowledge image bit stream.
  • the dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data on which it depends.
  • the multimedia resource includes specific media resource data, for example, the specific content (video picture, introduction audio, etc.) of the introduction video of item A.
  • the signaling file corresponding to the multimedia resource may be a DASH signaling file.
  • the multiple track data of the multimedia resource may contain one knowledge image track data, or may contain multiple knowledge image track data.
  • the descriptor corresponding to the knowledge image track data may contain first element information, and the first element information is used to indicate that the descriptor containing it is a descriptor corresponding to knowledge image track data.
  • the descriptor corresponding to each of the at least two knowledge image track data may contain second element information, which indicates the track group in which that knowledge image track data is located.
  • the descriptor corresponding to each of the at least two knowledge image track data contains third element information, whose value indicates whether that knowledge image track data is depended on by multiple main bit stream track data. For example, a value of 1 means that the knowledge image track data is depended on by multiple main bit stream tracks; a value of 0 means that it is depended on by only one main bit stream track.
  • the descriptor corresponding to the target knowledge image track data also contains fourth element information, which indicates the frame rate of specified main bit stream track data among the multiple main bit stream track data.
  • the specified main bitstream track data may be the plurality of main bitstream track data, or may be part of the main bitstream track data.
  • the descriptor corresponding to each of the at least two knowledge image track data also contains a sample index identifier, which indicates the interval of sample index numbers in the main bit stream track data that index that knowledge image track data.
  • the sample index identifier includes fifth element information and sixth element information: the value of the fifth element information indicates the minimum sample index number in the main bit stream track data that indexes each of the at least two knowledge image track data, and the value of the sixth element information indicates the corresponding maximum sample index number.
  • in step S720, the signaling file is sent to the data receiver, so that the data receiver determines the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier in the signaling file, and sequentially acquires the knowledge image track data and the main bit stream track data from the data source side according to the dependency relationship.
  • the data source side may also generate main bit stream track data corresponding to the main bit stream and knowledge image track data corresponding to the knowledge image bit stream.
  • the main bit stream track data includes an index identifier, which is used to indicate the knowledge image track data on which the main bit stream track data depends or to indicate the knowledge image track group on which the main bit stream track data depends.
  • the main bit stream track data includes a track reference type data box, and the track reference type data box includes a reference type field, where the reference type field is used to indicate an index identifier. Based on this, the value of the reference type field is used to indicate the knowledge image track data or the knowledge image track group on which the main bitstream track data depends.
  • the main bitstream track data may include a track reference data box, in this case, the track reference data box includes the track reference type data box.
  • the multiple track data of the multimedia resource may contain at least two knowledge image track data, and each of the at least two knowledge image track data contains a track group identifier, which indicates the track group in which that knowledge image track data is located.
  • taking the server as the data source side and the client as the data receiver as an example, the process may specifically include the following steps S801 to S807:
  • in step S801, the server generates a bit stream.
  • in the video encoding stage, the server can generate a main bit stream and one or more knowledge image bit streams.
  • in step S802, the server encapsulates the bit streams to generate track data.
  • in the video file encapsulation stage, the server can encapsulate the main bit stream into a separate file track and each knowledge image bit stream into a separate file track and, based on the reference relationship between the main bit stream and the knowledge image bit streams during decoding, associate the main bit stream track with the knowledge image track through the index relationship between tracks. If the main bit stream track needs to refer to multiple knowledge image tracks, these knowledge image tracks can be associated through a track group, and different knowledge image tracks within the track group can be distinguished by sample index range information, description information, and the like.
  • a main bit stream track may be associated with a knowledge image track, or may be associated with a knowledge image track group.
  • Multiple master bitstream tracks (generally multiple tracks of the same content with different frame rates) can also be associated with the same knowledge image track or knowledge image track group.
  • in step S803, the server generates DASH signaling.
  • in the signaling generation stage, the server can specially mark the media resources corresponding to the knowledge image bit stream and indicate the dependency relationship between the main bit stream media resource and the knowledge image bit stream media resource. If the main bit stream media resource needs to refer to multiple knowledge image media resources, these knowledge image media resources can be associated with each other and distinguished by sample index range information, description information, and the like. The above information is included in the DASH signaling.
  • in step S804, the server sends the DASH signaling to the client.
  • in step S805, the client requests the media file from the server according to the DASH signaling.
  • the client determines from the signaling file whether the required media resource depends on a media resource corresponding to a knowledge image bit stream and, if so, requests the media resource corresponding to the knowledge image bit stream first. If media resources corresponding to multiple knowledge image bit streams are depended on, the corresponding media resource is requested according to the sample index range to which the currently presented frame belongs.
  • in step S806, the server transmits the media file to the client.
  • in step S807, the client decapsulates the media file and presents the corresponding media resource.
  • after requesting the corresponding media resource, the client can preferentially decode the track data corresponding to the knowledge image bit stream according to the index relationship between file tracks. If there is a knowledge image track group, that is, the track data corresponding to the main bit stream depends on track data corresponding to multiple knowledge image bit streams, the corresponding track data is decoded according to the sample index range to which the currently presented frame belongs.
  • the main bit stream track can be indexed to the knowledge image track on which its decoding depends through the track reference data box.
  • the corresponding TrackReferenceTypeBoxes should be added to the TrackReferenceBox (track reference data box) of the main bit stream track, and the track_IDs in the TrackReferenceTypeBoxes data box indicate the knowledge image track or knowledge image track group indexed by the current main bit stream track.
  • the index between the main bit stream track and the knowledge image track is identified by the corresponding reference_type (reference type) index type in TrackReferenceTypeBoxes, and the type field is defined as follows:
  • 'a3lr': the indexed track is the knowledge image track corresponding to the current track.
  • a main bit stream track can be indexed to a knowledge image track or knowledge image track group by 'a3lr'; multiple main bit stream tracks can also be indexed to the same knowledge image track or knowledge image track group by 'a3lr'.
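  • as a minimal sketch, a TrackReferenceTypeBox of type 'a3lr' can be serialized and parsed as follows. The layout assumed here is the generic ISO base media file format one (a 32-bit big-endian size, a four-character box type, then an array of 32-bit track_IDs); the function names and the referenced id are illustrative, and the enclosing TrackReferenceBox is not built.

```python
import struct


def build_track_reference_type_box(reference_type: bytes, track_ids) -> bytes:
    """Serialize one TrackReferenceTypeBox (size, type, track_IDs[])."""
    if len(reference_type) != 4:
        raise ValueError("reference_type must be a four-character code")
    payload = b"".join(struct.pack(">I", track_id) for track_id in track_ids)
    size = 8 + len(payload)  # 4 bytes of size + 4 bytes of type + payload
    return struct.pack(">I4s", size, reference_type) + payload


def parse_track_reference_type_box(data: bytes):
    """Parse one TrackReferenceTypeBox and return (reference_type, track_IDs)."""
    size, reference_type = struct.unpack_from(">I4s", data, 0)
    count = (size - 8) // 4
    track_ids = list(struct.unpack_from(f">{count}I", data, 8))
    return reference_type, track_ids


# Hypothetical example: the main bit stream track references id 100, which
# could be either a knowledge image track id or a track group id.
box = build_track_reference_type_box(b"a3lr", [100])
print(box.hex(), parse_track_reference_type_box(box))
```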
  • a definition of the knowledge image track group is as follows:
  • the track group of the knowledge image is obtained by extending the track group data box, identified by the track group type of 'a3lg'.
  • tracks with the same group ID belong to the same track group.
  • the semantics of each field in Avs3LibraryGroupBox are as follows:
  • multi_main_bitstream indicates whether the knowledge image track is referenced by multiple main bitstream tracks.
  • a value of 1 for this field indicates that the knowledge image track is referenced by multiple main bit stream tracks; a value of 0 indicates that the knowledge image track is referenced by only one main bit stream track. In one embodiment, the default value of this field is 0.
  • sample_number_min indicates the minimum value of the sample index number indexing the current knowledge image track in the main bit stream track or the main bit stream track of a specific frame rate.
  • sample_number_max indicates the maximum value of the sample index number indexing the current knowledge image track in the main bit stream track or the main bit stream track of a specific frame rate.
  • frame_rate indicates, when the knowledge image track is referenced by multiple main bit stream tracks, the frame rate of a certain track among the multiple main bit stream tracks.
  • track_description is a null-terminated string indicating the description information of the knowledge image track.
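  • the field semantics above can be mirrored in a small in-memory model, sketched below, that also applies the rule that tracks with the same group ID belong to the same track group. The class, the `track_id` attribute, the default values and the sample entries are illustrative assumptions; the binary layout and field widths of Avs3LibraryGroupBox are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Avs3LibraryGroupEntry:
    track_id: int                      # hypothetical id of the knowledge image track
    track_group_id: int                # tracks sharing this id form one group
    multi_main_bitstream: int = 0      # 1: referenced by multiple main tracks
    sample_number_min: int = 0
    sample_number_max: int = 0
    frame_rate: Optional[int] = None   # only meaningful if multi_main_bitstream == 1
    track_description: str = ""


def group_tracks(entries):
    """Map each track_group_id to the ids of the tracks that carry it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.track_group_id].append(entry.track_id)
    return dict(groups)


entries = [
    Avs3LibraryGroupEntry(track_id=2, track_group_id=100,
                          sample_number_min=1, sample_number_max=100),
    Avs3LibraryGroupEntry(track_id=3, track_group_id=100,
                          sample_number_min=101, sample_number_max=200),
]
print(group_tracks(entries))
```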
  • sample group information can also be used to distinguish different tracks in the same knowledge image track group.
  • another definition of the knowledge image track group is as follows:
  • the track group of the knowledge image is obtained by extending the track group data box, identified by the track group type of 'a3lg'.
  • tracks with the same group ID belong to the same track group.
  • the semantics of each field in Avs3LibraryGroupBox are as follows:
  • multi_main_bitstream indicates whether the knowledge image track is referenced by multiple main bitstream tracks.
  • a value of 1 for this field indicates that the knowledge image track is referenced by multiple main bit stream tracks; a value of 0 indicates that the knowledge image track is referenced by only one main bit stream track. In one embodiment, the default value of this field is 0.
  • num_sample_groups indicates the number of LibrarySampleGroupEntry sample groups indexing the current knowledge image track in the main bitstream track or the main bitstream track of a specific frame rate.
  • group_description_index indicates the index number of the LibrarySampleGroupEntry sample group indexing the current knowledge image track in the main bitstream track or the main bitstream track of a specific frame rate.
  • frame_rate indicates, when the knowledge image track is referenced by multiple main bit stream tracks, the frame rate of a certain track among the multiple main bit stream tracks.
  • track_description is a null-terminated string indicating the description information of the knowledge image track.
  • the knowledge image descriptor Avs3Library is a SupplementalProperty element, and its @schemeIdUri attribute is "urn:avs:ims:2018:av3l".
  • the descriptor can exist at the level of adaptation set or the level of representation. When the descriptor exists at the adaptation set level, it describes all the representations in the adaptation set; when it exists at the representation level, it describes the corresponding representations.
  • the Avs3Library descriptor indicates the relevant attributes of the knowledge image representation, and the specific attributes are shown in Table 1 below:
  • the server encodes them respectively to generate a bitstream. For example, for media content A, a main bit stream StreamA and a knowledge image bit stream StreamAL are generated; for media content B, a main bit stream StreamB is generated.
  • after the bit streams are generated, the server encapsulates StreamA as TrackA (track A) and StreamAL as TrackAL, and uses a TrackReferenceTypeBox of type 'a3lr' in TrackA to index to TrackAL.
  • the server encapsulates StreamB as TrackB. Since TrackB does not have a corresponding knowledge image track, there is no need to include a TrackReferenceTypeBox of type 'a3lr' in TrackB.
  • after encapsulation, the server describes TrackA and TrackAL each as one representation (for example, RA and RAL respectively).
  • the @dependencyId (dependency identifier) attribute of RA should point to RAL, indicating that the consumption of RA depends on RAL, and RAL needs to be described by the Avs3Library descriptor.
  • for TrackB, the server describes it as a representation (for example, RB) without any special extension.
  • after describing the track data, the server generates DASH signaling and sends the signaling file to the client.
  • after receiving the signaling file, the client can determine the dependency relationship between the representations from it, for example, that RA depends on RAL and that RAL is a knowledge image media resource. Assuming that client 1 needs the media resource corresponding to RA and client 2 needs the media resource corresponding to RB, client 1 first requests the media resource corresponding to RAL from the server and then requests the media resource corresponding to RA. Client 2 can directly request the media resource corresponding to RB.
  • after receiving the media resources corresponding to RAL and RA, client 1 first decodes the media resource corresponding to RAL and then decodes the media resource corresponding to RA. After receiving the media resource corresponding to RB, client 2 can decode it directly.
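  • the client-side decision in this example can be sketched as follows. The MPD fragment is a deliberately minimal, hypothetical one (it omits attributes a complete MPD would carry) that only contains the three representations RA, RAL and RB, the @dependencyId attribute and the Avs3Library SupplementalProperty; the function names are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}
LIBRARY_SCHEME = "urn:avs:ims:2018:av3l"

# Hypothetical, heavily reduced MPD used only for illustration.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet>
      <Representation id="RAL">
        <SupplementalProperty schemeIdUri="urn:avs:ims:2018:av3l"/>
      </Representation>
      <Representation id="RA" dependencyId="RAL"/>
      <Representation id="RB"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def library_ids(mpd_xml):
    """Ids of representations marked with the knowledge image descriptor."""
    root = ET.fromstring(mpd_xml)
    return {
        rep.get("id")
        for rep in root.iterfind(".//dash:Representation", NS)
        if any(prop.get("schemeIdUri") == LIBRARY_SCHEME
               for prop in rep.iterfind("dash:SupplementalProperty", NS))
    }


def request_plan(mpd_xml, wanted_id):
    """Representations to request for `wanted_id`, dependencies first."""
    root = ET.fromstring(mpd_xml)
    reps = {r.get("id"): r for r in root.iterfind(".//dash:Representation", NS)}
    plan = []

    def visit(rep_id):
        if rep_id in plan:
            return
        # @dependencyId is a whitespace-separated list of representation ids.
        for dep in reps[rep_id].get("dependencyId", "").split():
            visit(dep)
        plan.append(rep_id)

    visit(wanted_id)
    return plan


print(request_plan(MPD_XML, "RA"), request_plan(MPD_XML, "RB"), library_ids(MPD_XML))
```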
  • In the example above, the media content contains a main bit stream and a knowledge image bit stream.
  • The following takes media content that includes a main bit stream and multiple knowledge image bit streams as a further example:
  • The server encodes the media content to generate the main bit stream StreamA and the knowledge image bit streams StreamAL1 and StreamAL2.
  • After generating the bit streams, the server encapsulates StreamA as TrackA, StreamAL1 as TrackAL1, and StreamAL2 as TrackAL2. At the same time, it associates TrackAL1 and TrackAL2 through a track group of type 'a3lg', where the parameters are as follows:
  • The TrackReferenceTypeBox of type 'a3lr' in TrackA indexes the corresponding track group (by group_id, which is 100 in this example).
  • After encapsulation, the server describes TrackA, TrackAL1 and TrackAL2 as one representation each (for example, RA, RAL1 and RAL2), where the @dependencyId (dependency identifier) attribute of RA should point to RAL1 and RAL2, indicating that consuming RA depends on RAL1 and RAL2.
  • RAL1 and RAL2 need to be described by the Avs3Library descriptor, as follows:
  • After describing the track data, the server generates DASH signaling and sends the signaling file to the client.
  • After receiving the signaling file, the client can determine the dependency relationships between descriptors from it. For example, RA depends on RAL1 and RAL2, RAL1 and RAL2 are knowledge image media resources, and RAL1 corresponds to the leading samples in RA. Assuming that client 1 needs the media resource corresponding to RA, it first requests the media resources corresponding to RAL1 and RA from the server. When client 1's consumption of RA reaches the 101st sample, it requests the media resource corresponding to RAL2 from the server.
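The lookup in this second example (which knowledge image representation a given main-stream sample needs) can be sketched as below. The concrete ranges are assumptions: the text only states that RAL2 is requested when the 101st sample is reached, so the bounds 1 to 100 and 101 to 200, and the names `LibraryRange` and `library_for_sample`, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LibraryRange:
    rep_id: str
    sample_min: int  # first main-stream sample that references this library
    sample_max: int  # last main-stream sample that references this library


# Assumed ranges for illustration only.
RANGES = [LibraryRange("RAL1", 1, 100), LibraryRange("RAL2", 101, 200)]


def library_for_sample(ranges, sample_number):
    """Return the id of the library representation covering sample_number."""
    for r in ranges:
        if r.sample_min <= sample_number <= r.sample_max:
            return r.rep_id
    return None  # no knowledge image is referenced at this sample
```

A player could call `library_for_sample` slightly ahead of its playback position so that the next knowledge image representation is requested before it is needed.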
  • The technical solutions of the above embodiments of the present application target the knowledge image feature of the AVS3 codec standard and propose a method for file-track-level encapsulation and transmission signaling.
  • The knowledge image track and the main bit stream track can be flexibly associated at the file track level, and this association can be indicated through signaling.
  • Based on this information, the client can decide whether to request a knowledge image track and which knowledge image track to request.
  • The client can also decide the order in which to decode different tracks based on this information, and thus allocate network and CPU resources reasonably.
  • The device embodiments of the present application are introduced below; they can be used to carry out the method for processing track data in multimedia resources in the above embodiments of the present application.
  • For details not disclosed in the device embodiments of the present application, please refer to the embodiments of the method for processing track data in multimedia resources described above.
  • The processing device for track data in a multimedia resource can be set in a media playback device, and the media playback device can be a smart phone, a tablet computer, etc.
  • An apparatus 900 for processing track data in multimedia resources includes a receiving unit 902, a parsing unit 904 and an acquiring unit 906.
  • The receiving unit 902 is configured to receive a signaling file corresponding to a multimedia resource. The signaling file includes descriptors respectively corresponding to a plurality of track data of the multimedia resource; the plurality of track data includes main bit stream track data corresponding to a main bit stream and knowledge image track data corresponding to a knowledge image bit stream; and the dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data. The parsing unit 904 is configured to parse the signaling file and determine the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier. The acquiring unit 906 is configured to acquire the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship.
  • The descriptor corresponding to the knowledge image track data contains first element information, which indicates that the descriptor containing it is a descriptor corresponding to knowledge image track data.
  • The plurality of track data includes at least two knowledge image track data, and the descriptor corresponding to each of them contains second element information, which indicates the track group in which that knowledge image track data is located.
  • The descriptor corresponding to each of the at least two knowledge image track data contains third element information, whose value indicates whether that knowledge image track data is depended on by a plurality of main bit stream track data.
  • If, among the at least two knowledge image track data, there is target knowledge image track data on which a plurality of main bit stream track data depend, the descriptor corresponding to the target knowledge image track data further includes fourth element information, which indicates the frame rate of specified main bit stream track data among the plurality of main bit stream track data.
  • The descriptor corresponding to each of the at least two knowledge image track data further includes a sample index identifier, which indicates the interval of sample index numbers in the main bit stream track data that index that knowledge image track data.
  • The sample index identifier includes fifth element information and sixth element information. The value of the fifth element information indicates the minimum sample index number in the main bit stream track data that indexes each of the at least two knowledge image track data, and the value of the sixth element information indicates the corresponding maximum sample index number.
  • The main bit stream track data includes an index identifier, which indicates the knowledge image track data on which the main bit stream track data depends, or the knowledge image track group on which it depends.
  • The main bit stream track data includes a track reference type box, and the track reference type box includes a reference type field, which is used to represent the index identifier.
  • The main bit stream track data includes a track reference box, and the track reference box contains the track reference type box.
  • The plurality of track data includes at least two knowledge image track data, and each of them includes a track group identifier, which indicates the track group in which that knowledge image track data is located.
  • Each of the at least two knowledge image track data also includes first field information indicating whether that knowledge image track data is depended on by a plurality of main bit stream track data. If the first field information indicates that the knowledge image track data is depended on by one main bit stream track data, the knowledge image track data also includes a field indicating the minimum sample index number in that main bit stream track data that indexes the knowledge image track data, and a field indicating the corresponding maximum sample index number.
  • If the first field information indicates that the knowledge image track data is depended on by a plurality of main bit stream track data, the knowledge image track data also includes fields respectively indicating, for each of those main bit stream track data, the minimum sample index number that indexes the knowledge image track data, fields respectively indicating the corresponding maximum sample index number, and a field indicating the frame rate of each of those main bit stream track data.
  • Alternatively, each of the at least two knowledge image track data also includes first field information indicating whether that knowledge image track data is depended on by a plurality of main bit stream track data. If the first field information indicates that the knowledge image track data is depended on by one main bit stream track data, the knowledge image track data also includes a field indicating the number of sample groups in that main bit stream track data that index the knowledge image track data, and a field indicating the sample group index numbers of those sample groups.
  • If the first field information indicates that the knowledge image track data is depended on by a plurality of main bit stream track data, the knowledge image track data also includes fields respectively indicating, for each of those main bit stream track data, the number of sample groups that index the knowledge image track data, fields respectively indicating the corresponding sample group index numbers, and a field indicating the frame rate of each of those main bit stream track data.
  • The apparatus 900 for processing track data in multimedia resources further includes a decoding unit configured to determine the decoding order according to the dependency relationship and, according to that order, decode the knowledge image track data and the main bit stream track data in sequence to obtain the multimedia resource.
  • The decoding unit is configured to: decode the main bit stream track data; when decoding reaches a sample index number interval in the main bit stream track data that needs to refer to knowledge image track data, determine, according to that interval, the knowledge image track data to be referenced from among a plurality of knowledge image track data; and decode the knowledge image track data to be referenced.
  • Fig. 10 shows a block diagram of a processing device for track data in a multimedia resource according to an embodiment of the present application.
  • The processing device for track data in a multimedia resource can be set in a media generation device, and the media generation device can be a smart phone, a tablet computer, etc.
  • An apparatus 1000 for processing track data in multimedia resources includes a generating unit 1002 and a sending unit 1004.
  • The generating unit 1002 is configured to generate a signaling file corresponding to a multimedia resource. The signaling file includes descriptors respectively corresponding to a plurality of track data of the multimedia resource; the plurality of track data includes main bit stream track data corresponding to a main bit stream and knowledge image track data corresponding to a knowledge image bit stream; and the dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data. The sending unit 1004 is configured to send the signaling file to the data receiver, so that the data receiver determines the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier in the signaling file, and acquires the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship.
  • The generating unit 1002 is further configured to: before generating the signaling file corresponding to the multimedia resource, generate the main bit stream track data corresponding to the main bit stream and the knowledge image track data corresponding to the knowledge image bit stream. The main bit stream track data includes an index identifier, which indicates the knowledge image track data on which the main bit stream track data depends.
  • FIG. 11 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiment of the present application.
  • The computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processes, such as the methods described in the above embodiments, according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage part 1108 into a random access memory (RAM) 1103.
  • The RAM 1103 also stores various programs and data necessary for system operation.
  • The CPU 1101, ROM 1102, and RAM 1103 are connected to each other via a bus 1104.
  • An input/output (I/O) interface 1105 is also connected to the bus 1104.
  • The following components are connected to the I/O interface 1105: an input part 1106 including a keyboard, a mouse, etc.; an output part 1107 including a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage part 1108 including a hard disk, etc.; and a communication part 1109 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication part 1109 performs communication processing via a network such as the Internet.
  • A drive 1110 is also connected to the I/O interface 1105 as needed.
  • A removable medium 1111, such as a magnetic disk, optical disk, magneto-optical disk or semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage part 1108 as needed.
  • The processes described above with reference to the flowcharts can be implemented as computer software programs.
  • The embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program is used to execute the method shown in the flowchart.
  • The computer program may be downloaded and installed from a network via the communication part 1109, and/or installed from the removable medium 1111.
  • When this computer program is executed by the central processing unit (CPU) 1101, the various functions defined in the system of the present application are performed.
  • The computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • A computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable computer program is carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • A computer program embodied on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to wireless or wired media, or any suitable combination of the above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Stored Programmes (AREA)

Abstract

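The embodiments of the present application provide a method, apparatus, medium and device for processing track data in multimedia resources. The method includes: receiving a signaling file corresponding to a multimedia resource, where the signaling file contains descriptors respectively corresponding to a plurality of track data of the multimedia resource, and the dependency identifier contained in the descriptor corresponding to main bit stream track data points to the descriptor corresponding to knowledge image track data; parsing the signaling file, and determining the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier; and acquiring the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship.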

Description

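Method, apparatus, medium and device for processing track data in multimedia resources
This application claims priority to Chinese patent application No. 202110567993.0, filed on May 24, 2021 and entitled "Method, apparatus, medium and device for processing track data in multimedia resources".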
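Technical Field
The present application relates to the field of computer and communication technology, and in particular to a method, apparatus, medium and device for processing track data in multimedia resources.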
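Background
In AVS3 (the third-generation Audio Video coding Standard) video coding technology, the concept of the knowledge image (library picture) was introduced to improve video compression efficiency. When a video sequence is encoded, a main bit stream (main bitstream) and a knowledge image bit stream (library picture bitstream) can be generated, and image frames in the main bit stream can refer to image frames in the knowledge image bit stream during decoding.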
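Summary
The embodiments of the present application provide a method, apparatus, medium and device for processing track data in multimedia resources, so that, at least to some extent, the association between the track data can be obtained in advance from the signaling file, avoiding the unnecessary delay caused by having to fetch knowledge image track data on demand.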
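According to one aspect of the embodiments of the present application, a method for processing track data in multimedia resources is provided, including: receiving a signaling file corresponding to a multimedia resource, where the signaling file contains descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data includes main bit stream track data corresponding to a main bit stream and knowledge image track data corresponding to a knowledge image bit stream, and the dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data; parsing the signaling file, and determining the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier; and acquiring the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship.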
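According to one aspect of the embodiments of the present application, a method for processing track data in multimedia resources is provided, including: generating a signaling file corresponding to a multimedia resource, where the signaling file contains descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data includes main bit stream track data corresponding to a main bit stream and knowledge image track data corresponding to a knowledge image bit stream, and the dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data; and sending the signaling file to a data receiver, so that the data receiver determines the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier in the signaling file, and acquires the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship.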
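According to one aspect of the embodiments of the present application, an apparatus for processing track data in multimedia resources is provided, including: a receiving unit configured to receive a signaling file corresponding to a multimedia resource, where the signaling file contains descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data includes main bit stream track data corresponding to a main bit stream and knowledge image track data corresponding to a knowledge image bit stream, and the dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data; a parsing unit configured to parse the signaling file and determine the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier; and an acquiring unit configured to acquire the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship.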
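In some embodiments of the present application, based on the foregoing solutions, the descriptor corresponding to the knowledge image track data contains first element information, which indicates that the descriptor containing it is a descriptor corresponding to knowledge image track data.
In some embodiments of the present application, based on the foregoing solutions, the plurality of track data includes at least two knowledge image track data, and the descriptor corresponding to each of them contains second element information, which indicates the track group in which that knowledge image track data is located.
In some embodiments of the present application, based on the foregoing solutions, the descriptor corresponding to each of the at least two knowledge image track data contains third element information, whose value indicates whether that knowledge image track data is depended on by a plurality of main bit stream track data.
In some embodiments of the present application, based on the foregoing solutions, if, among the at least two knowledge image track data, there is target knowledge image track data on which a plurality of main bit stream track data depend, the descriptor corresponding to the target knowledge image track data further contains fourth element information, which indicates the frame rate of specified main bit stream track data among the plurality of main bit stream track data.
In some embodiments of the present application, based on the foregoing solutions, the descriptor corresponding to each of the at least two knowledge image track data further contains a sample index identifier, which indicates the interval of sample index numbers in the main bit stream track data that index that knowledge image track data.
In some embodiments of the present application, based on the foregoing solutions, the sample index identifier includes fifth element information and sixth element information. The value of the fifth element information indicates the minimum sample index number in the main bit stream track data that indexes each of the at least two knowledge image track data, and the value of the sixth element information indicates the corresponding maximum sample index number.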
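In some embodiments of the present application, based on the foregoing solutions, the main bit stream track data contains an index identifier, which indicates the knowledge image track data on which the main bit stream track data depends, or the knowledge image track group on which it depends.
In some embodiments of the present application, based on the foregoing solutions, the main bit stream track data contains a track reference type box, and the track reference type box contains a reference type field, which is used to represent the index identifier.
In some embodiments of the present application, based on the foregoing solutions, the main bit stream track data contains a track reference box, and the track reference box contains the track reference type box.
In some embodiments of the present application, based on the foregoing solutions, the plurality of track data includes at least two knowledge image track data, and each of them contains a track group identifier, which indicates the track group in which that knowledge image track data is located.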
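In some embodiments of the present application, based on the foregoing solutions, each of the at least two knowledge image track data further contains first field information indicating whether that knowledge image track data is depended on by a plurality of main bit stream track data. If the first field information indicates that the knowledge image track data is depended on by one main bit stream track data, the knowledge image track data further contains a field indicating the minimum sample index number in that main bit stream track data that indexes the knowledge image track data, and a field indicating the corresponding maximum sample index number.
In some embodiments of the present application, based on the foregoing solutions, if the first field information indicates that the knowledge image track data is depended on by a plurality of main bit stream track data, the knowledge image track data further contains fields respectively indicating, for each of those main bit stream track data, the minimum sample index number that indexes the knowledge image track data, fields respectively indicating the corresponding maximum sample index number, and a field indicating the frame rate of each of those main bit stream track data.
In some embodiments of the present application, based on the foregoing solutions, each of the at least two knowledge image track data further contains first field information indicating whether that knowledge image track data is depended on by a plurality of main bit stream track data. If the first field information indicates that the knowledge image track data is depended on by one main bit stream track data, the knowledge image track data further contains a field indicating the number of sample groups in that main bit stream track data that index the knowledge image track data, and a field indicating the sample group index numbers of those sample groups.
In some embodiments of the present application, based on the foregoing solutions, if the first field information indicates that the knowledge image track data is depended on by a plurality of main bit stream track data, the knowledge image track data further contains fields respectively indicating, for each of those main bit stream track data, the number of sample groups that index the knowledge image track data, fields respectively indicating the corresponding sample group index numbers, and a field indicating the frame rate of each of those main bit stream track data.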
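In some embodiments of the present application, based on the foregoing solutions, the apparatus for processing track data in multimedia resources further includes a decoding unit configured to determine the decoding order according to the dependency relationship and, according to that order, decode the knowledge image track data and the main bit stream track data in sequence to obtain the multimedia resource.
In some embodiments of the present application, based on the foregoing solutions, the decoding unit is configured to: decode the main bit stream track data; when decoding reaches a sample index number interval in the main bit stream track data that needs to refer to knowledge image track data, determine, according to that interval, the knowledge image track data to be referenced from among a plurality of knowledge image track data; and decode the knowledge image track data to be referenced.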
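According to one aspect of the embodiments of the present application, an apparatus for processing track data in multimedia resources is provided, including: a generating unit configured to generate a signaling file corresponding to a multimedia resource, where the signaling file contains descriptors respectively corresponding to a plurality of track data of the multimedia resource, the plurality of track data includes main bit stream track data corresponding to a main bit stream and knowledge image track data corresponding to a knowledge image bit stream, and the dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data; and a sending unit configured to send the signaling file to a data receiver, so that the data receiver determines the dependency relationship between the main bit stream track data and the knowledge image track data according to the dependency identifier in the signaling file, and acquires the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship.
In some embodiments of the present application, based on the foregoing solutions, the generating unit is further configured to: before generating the signaling file corresponding to the multimedia resource, generate the main bit stream track data corresponding to the main bit stream and the knowledge image track data corresponding to the knowledge image bit stream, where the main bit stream track data contains an index identifier that indicates the knowledge image track data on which the main bit stream track data depends.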
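According to one aspect of the embodiments of the present application, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the method for processing track data in multimedia resources described in the above embodiments.
According to one aspect of the embodiments of the present application, an electronic device is provided, including: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for processing track data in multimedia resources described in the above embodiments.
According to one aspect of the embodiments of the present application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for processing track data in multimedia resources provided in the various optional embodiments above.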
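In the technical solutions provided by some embodiments of the present application, a signaling file corresponding to a multimedia resource is received; the dependency relationship between the main bit stream track data and the knowledge image track data is determined from the descriptors respectively corresponding to the plurality of track data contained in the signaling file and from the dependency identifier contained in the descriptor corresponding to the main bit stream track data; and the knowledge image track data and the main bit stream track data are then acquired in sequence from the data source side according to that dependency relationship. The data receiver can thus obtain the association between the track data in advance from the signaling file and decide whether to acquire knowledge image track data and which knowledge image track data to request. While keeping the allocation of network and CPU resources reasonable, this avoids the unnecessary delay caused by having to fetch knowledge image track data on demand and improves the encoding and decoding efficiency of media resources.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present application.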
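Brief Description of the Drawings
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied;
Fig. 2 shows a schematic diagram of how a video encoding apparatus and a video decoding apparatus are placed in a streaming system;
Fig. 3 shows the basic flowchart of a video encoder;
Fig. 4 shows the overall transmission flowchart of a video file according to an embodiment of the present application;
Fig. 5 shows a schematic diagram of encoding a video sequence to generate a main bit stream and a knowledge image bit stream;
Fig. 6 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application;
Fig. 7 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application;
Fig. 8 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application;
Fig. 9 shows a block diagram of an apparatus for processing track data in multimedia resources according to an embodiment of the present application;
Fig. 10 shows a block diagram of an apparatus for processing track data in multimedia resources according to an embodiment of the present application;
Fig. 11 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.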
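Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the present application will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
In addition, the described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present application. However, those skilled in the art will recognize that the technical solutions of the present application may be practiced without one or more of the specific details, or with other methods, components, apparatuses, steps and so on. In other cases, well-known methods, apparatuses, implementations or operations are not shown or described in detail, to avoid obscuring aspects of the present application.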
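The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are only illustrative; they need not include all of the content and operations/steps, nor must they be executed in the order described. For example, some operations/steps can be decomposed and some can be merged or partially merged, so the actual order of execution may change according to the actual situation.
It should be noted that "a plurality of" as used herein means two or more. "And/or" describes the association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.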
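Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
As shown in Fig. 1, the system architecture 100 includes a plurality of terminal devices that can communicate with each other through, for example, a network 150. For example, the system architecture 100 may include a first terminal device 110 and a second terminal device 120 interconnected through the network 150. In the embodiment of Fig. 1, the first terminal device 110 and the second terminal device 120 perform unidirectional data transmission.
For example, the first terminal device 110 may encode video data (such as a video picture stream captured by the first terminal device 110) for transmission to the second terminal device 120 through the network 150, with the encoded video data transmitted in the form of one or more encoded video bit streams. The second terminal device 120 may receive the encoded video data from the network 150, decode it to recover the video data, and display video pictures according to the recovered video data.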
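In one embodiment of the present application, the system architecture 100 may include a third terminal device 130 and a fourth terminal device 140 that perform bidirectional transmission of encoded video data, which may occur, for example, during a video conference. For bidirectional data transmission, each of the third terminal device 130 and the fourth terminal device 140 may encode video data (such as a video picture stream captured by that terminal device) for transmission through the network 150 to the other of the two terminal devices. Each of the third terminal device 130 and the fourth terminal device 140 may also receive the encoded video data transmitted by the other terminal device, decode it to recover the video data, and display video pictures on an accessible display apparatus according to the recovered video data.
In the embodiment of Fig. 1, the first terminal device 110, the second terminal device 120, the third terminal device 130 and the fourth terminal device 140 may be servers, personal computers and smart phones, but the principles disclosed in the present application are not limited to these. The embodiments disclosed in the present application are applicable to laptop computers, tablet computers, media players and/or dedicated video conferencing equipment. The network 150 represents any number of networks that convey encoded video data among the first terminal device 110, the second terminal device 120, the third terminal device 130 and the fourth terminal device 140, including, for example, wired and/or wireless communication networks. The communication network 150 may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network and/or the Internet. For the purposes of the present application, unless explained below, the architecture and topology of the network 150 may be immaterial to the operations disclosed in the present application.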
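In one embodiment of the present application, Fig. 2 shows how a video encoding apparatus and a video decoding apparatus are placed in a streaming environment. The subject matter disclosed in the present application is equally applicable to other video-enabled applications, including, for example, video conferencing, digital TV (television), and storing compressed video on digital media including CDs, DVDs, memory sticks and the like.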
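The streaming system may include a capture subsystem 213, which may include a video source 201 such as a digital camera; the video source creates an uncompressed video picture stream 202. In an embodiment, the video picture stream 202 includes samples taken by the digital camera. Compared with the encoded video data 204 (or encoded video bit stream 204), the video picture stream 202 is depicted as a thick line to emphasize its high data volume. The video picture stream 202 may be processed by an electronic apparatus 220, which includes a video encoding apparatus 203 coupled to the video source 201. The video encoding apparatus 203 may include hardware, software, or a combination of the two to realize or implement aspects of the disclosed subject matter as described in more detail below. Compared with the video picture stream 202, the encoded video data 204 (or encoded video bit stream 204) is depicted as a thin line to emphasize its lower data volume, and it may be stored on a streaming server 205 for future use. One or more streaming client subsystems, such as client subsystem 206 and client subsystem 208 in Fig. 2, may access the streaming server 205 to retrieve copies 207 and 209 of the encoded video data 204. The client subsystem 206 may include, for example, a video decoding apparatus 210 in an electronic apparatus 230. The video decoding apparatus 210 decodes the incoming copy 207 of the encoded video data and produces an output video picture stream 211 that can be presented on a display 212 (such as a display screen) or another presentation apparatus. In some streaming systems, the encoded video data 204, video data 207 and video data 209 (for example, video bit streams) may be encoded according to certain video coding/compression standards. Examples of these standards include ITU-T H.265 and the Chinese national video coding standard AVS (Audio Video coding Standard). The present application may be used in the context of AVS.
It should be noted that the electronic apparatus 220 and the electronic apparatus 230 may include other components not shown in the figure. For example, the electronic apparatus 220 may include a video decoding apparatus, and the electronic apparatus 230 may also include a video encoding apparatus.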
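In one embodiment of the present application, taking the international video coding standards HEVC (High Efficiency Video Coding) and VVC (Versatile Video Coding), as well as AVS, as examples: after a video frame is input, it is divided into several non-overlapping processing units according to a block size, and each processing unit undergoes a similar compression operation. This processing unit is called a CTU (Coding Tree Unit) or an LCU (Largest Coding Unit). A CTU can be further partitioned more finely into one or more basic coding units (CUs); the CU is the most basic element in a coding pass. Some concepts involved in coding a CU are introduced below: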
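As a small illustration of dividing a frame into non-overlapping processing units, the following sketch tiles a frame with fixed-size blocks and clips the blocks at the right and bottom edges. The CTU size of 128 used in the usage note is only an example value, and the function names are assumptions; the further split of a CTU into CUs is not modeled.

```python
import math


def ctu_grid(width, height, ctu_size):
    """Number of CTU columns and rows needed to cover the frame."""
    return math.ceil(width / ctu_size), math.ceil(height / ctu_size)


def ctu_rects(width, height, ctu_size):
    """Yield (x, y, w, h) for each non-overlapping CTU, clipped at the edges."""
    for y in range(0, height, ctu_size):
        for x in range(0, width, ctu_size):
            yield (x, y, min(ctu_size, width - x), min(ctu_size, height - y))
```

For a 1920x1080 frame and a block size of 128, `ctu_grid` gives 15 columns and 9 rows; the last row is only partially filled because 1080 is not a multiple of 128.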
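Predictive coding: predictive coding includes intra prediction, inter prediction and other modes. After the original video signal is predicted from a selected, already reconstructed video signal, a residual video signal is obtained. The encoder must decide which predictive coding mode to use for the current CU and inform the decoder. Intra prediction means that the predicted signal comes from a region of the same picture that has already been coded and reconstructed; inter prediction means that the predicted signal comes from another, already coded picture different from the current picture (called the reference picture).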
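Transform and quantization: the residual video signal is converted into the transform domain by a transform such as the DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), and the result is called the transform coefficients. The transform coefficients then undergo a lossy quantization operation that discards some information, so that the quantized signal is better suited to compressed representation. In some video coding standards there may be more than one transform to choose from, so the encoder must also select one of them for the current CU and inform the decoder. The fineness of quantization is usually determined by the quantization parameter (QP). A larger QP means that coefficients over a wider range of values are quantized to the same output, which usually brings greater distortion and a lower bit rate; conversely, a smaller QP means that coefficients over a narrower range of values are quantized to the same output, which usually brings less distortion and a correspondingly higher bit rate.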
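The trade-off described above can be seen with a plain uniform quantizer. This is a simplified sketch: it takes the quantization step directly rather than a QP, because the mapping from QP to step size is codec-specific and is not given here, and the function names are assumptions.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer level; a larger step merges more values."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [level * step for level in levels]


def max_error(coeffs, step):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    recon = dequantize(quantize(coeffs, step), step)
    return max(abs(a - b) for a, b in zip(coeffs, recon))
```

With a larger step, more coefficients collapse onto the same level (fewer distinct symbols to code, hence a lower bit rate) and the reconstruction error grows, which is the behavior the paragraph attributes to a larger QP.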
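Entropy coding or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency with which each value occurs, and a binarized (0 or 1) compressed bit stream is finally output. Other information produced by encoding, such as the selected coding mode and motion vector data, also needs to be entropy coded to reduce the bit rate. Statistical coding is lossless and can effectively reduce the bit rate needed to express the same signal. Common statistical coding methods include variable length coding (VLC) and context-based adaptive binary arithmetic coding (CABAC).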
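Loop filtering: the transformed and quantized signal is passed through inverse quantization, inverse transformation and prediction compensation to obtain a reconstructed picture. Because of quantization, some information in the reconstructed picture differs from the original picture; that is, the reconstructed picture is distorted. Filtering can therefore be applied to the reconstructed picture, for example with a deblocking filter (DB), SAO (Sample Adaptive Offset) or ALF (Adaptive Loop Filter), which can effectively reduce the distortion produced by quantization. Since these filtered reconstructed pictures serve as references for subsequently coded pictures when predicting future picture signals, the filtering operation is also called loop filtering, that is, filtering inside the coding loop.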
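In one embodiment of the present application, Fig. 3 shows the basic flowchart of a video encoder, using intra prediction as an example. The difference between the original image signal $s_k[x,y]$ and the predicted image signal $\hat{s}_k[x,y]$ is computed to obtain the residual signal $u_k[x,y]$. The residual signal $u_k[x,y]$ is transformed and quantized to obtain quantized coefficients. On the one hand, the quantized coefficients are entropy coded to produce the coded bit stream; on the other hand, they undergo inverse quantization and inverse transformation to produce the reconstructed residual signal $u'_k[x,y]$. The predicted image signal $\hat{s}_k[x,y]$ and the reconstructed residual signal $u'_k[x,y]$ are added to generate the image signal $s^*_k[x,y]$. The image signal $s^*_k[x,y]$ is, on the one hand, input to the intra mode decision module and the intra prediction module for intra prediction, and on the other hand passed through loop filtering to output the reconstructed image signal $s'_k[x,y]$. The reconstructed image signal $s'_k[x,y]$ can serve as the reference picture of the next frame for motion estimation and motion-compensated prediction. The predicted image signal $\hat{s}_k[x,y]$ of the next frame is then obtained from the motion-compensated prediction result $s'_r[x+m_x, y+m_y]$ and the intra prediction result $f(s^*_k[x,y])$, and the above process is repeated until encoding is complete.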
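Based on the above encoding process, at the decoder, for each CU, entropy decoding is performed after the compressed bit stream is obtained, yielding the various mode information and the quantized coefficients. The quantized coefficients are then inverse quantized and inverse transformed to obtain the residual signal. In addition, the prediction signal corresponding to the CU can be obtained from the known coding mode information; adding the residual signal to the prediction signal gives the reconstructed signal, which then passes through loop filtering and other operations to produce the final output signal.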
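The relation between the encoder and decoder paths (residual, quantization, then prediction plus reconstructed residual) can be sketched on a one-dimensional block. This is a deliberately reduced model and not the codec's actual pipeline: the transform, entropy coding and loop filtering steps are omitted, and the function names and the use of a plain quantization step are assumptions.

```python
def encode_block(original, prediction, step):
    """Encoder side: residual = original - prediction, then uniform quantization."""
    residual = [o - p for o, p in zip(original, prediction)]
    return [round(r / step) for r in residual]


def decode_block(levels, prediction, step):
    """Decoder side: dequantize the residual and add it back to the prediction."""
    recon_residual = [level * step for level in levels]
    return [p + r for p, r in zip(prediction, recon_residual)]
```

Because the decoder forms the same prediction as the encoder, only the quantized residual has to be transmitted; with a step of 1 and integer samples the round trip is exact, and with a larger step each sample is off by at most half a step.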
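In short, the overall transmission process of a video file is shown in Fig. 4: a video file is obtained through video capture, and after video encoding and video file encapsulation it is transmitted to the receiver. After receiving the video file, the receiver decapsulates it, decodes the video, and finally presents the decoded video.
In AVS3 video coding technology, the concept of the knowledge image was introduced to improve video compression efficiency. As shown in Fig. 5, when a video sequence is encoded, a main bit stream (main bitstream) and a knowledge image bit stream (library picture bitstream) can be generated, and image frames in the main bit stream can refer to image frames in the knowledge image bit stream during decoding. The image frames in the knowledge image bit stream are knowledge images. A knowledge image can be a special kind of I frame; as an independent picture, it can be referenced by B frames and/or P frames in the main bit stream during decoding. A knowledge image differs from an I frame in the main bit stream in that it is not used for display. The main bit stream and the knowledge image bit stream can correspond to the same original video sequence.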
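The related art also proposes signaling for a segment dependency descriptor. Specifically, an EssentialProperty element whose @schemeIdUri attribute is "urn:avs:ims:2018:ds" (dependent segment) represents a segment dependency descriptor. At least one segment dependency descriptor is specified at the representation level, and it should not be specified at the MPD (media presentation description) level or the adaptation set level. The segment dependency descriptor indicates that each segment in each representation has a non-temporal dependency on other segments (which may be in the same representation or in a different one); the identifier (URL or indicator) of the other segments depended on, and the picture number used by the compression layer within the segment, should be included in the descriptor.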
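A client could detect this descriptor by scanning representation-level EssentialProperty elements for the scheme URI quoted above. The following is a minimal sketch: the MPD fragment and its `value` attribute are hypothetical, since the descriptor's internal syntax is not given here, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
DS_SCHEME = "urn:avs:ims:2018:ds"  # scheme URI quoted in the text

# Hypothetical fragment; the value attribute is a placeholder.
SAMPLE = f"""<MPD xmlns="{MPD_NS}"><Period><AdaptationSet>
  <Representation id="r1">
    <EssentialProperty schemeIdUri="{DS_SCHEME}" value="hypothetical"/>
  </Representation>
  <Representation id="r2"/>
</AdaptationSet></Period></MPD>"""


def reps_with_segment_dependency(mpd_text):
    """Return ids of representations carrying a segment dependency descriptor."""
    root = ET.fromstring(mpd_text)
    found = []
    for rep in root.iter(f"{{{MPD_NS}}}Representation"):
        # Only direct children are checked: the descriptor is representation-level.
        for prop in rep.findall(f"{{{MPD_NS}}}EssentialProperty"):
            if prop.get("schemeIdUri") == DS_SCHEME:
                found.append(rep.get("id"))
                break
    return found
```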
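The related art indicates which knowledge image samples are referenced by certain samples in the main bit stream, and it also expresses dependencies at the segment level through signaling. However, these sample-level dependencies and associations can only be obtained once the decoder has parsed the specific sample or segment. If the data receiver has not previously requested or decoded the corresponding knowledge image track, it must request or decode that track on demand, which introduces unnecessary delay and reduces the encoding and decoding efficiency of media resources.
Therefore, the technical solutions of the embodiments of the present application propose a new scheme for processing track data in multimedia resources, so that the data receiver can obtain the association between the track data in advance from the signaling file and decide whether to acquire knowledge image track data and which knowledge image track data to request. While keeping the allocation of network and CPU resources reasonable, this avoids the unnecessary delay caused by having to fetch knowledge image track data on demand and improves the encoding and decoding efficiency of media resources.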
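The implementation details of the technical solutions of the embodiments of the present application are described in detail below:
Fig. 6 shows a flowchart of a method for processing track data in multimedia resources according to an embodiment of the present application. The method can be executed by a media playback device, which may be a smart phone, a tablet computer, and so on. Referring to Fig. 6, the method includes at least steps S610 to S630, described in detail as follows:
In step S610, a signaling file corresponding to a multimedia resource is received. The signaling file contains descriptors respectively corresponding to a plurality of track data of the multimedia resource. The plurality of track data includes main bit stream track data corresponding to a main bit stream and knowledge image track data corresponding to a knowledge image bit stream. The dependency identifier contained in the descriptor corresponding to the main bit stream track data points to the descriptor corresponding to the knowledge image track data on which it depends.
It should be noted that a multimedia resource contains specific media resource data, for example the specific content (video pictures, narration audio, and so on) of an introduction video for item A. The signaling file corresponding to the multimedia resource may be a DASH (Dynamic Adaptive Streaming over HTTP) signaling file.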
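In one implementation, the plurality of track data of the multimedia resource may contain one knowledge image track data or a plurality of knowledge image track data.
In one embodiment of the present application, the descriptor corresponding to the knowledge image track data may contain first element information, which indicates that the descriptor containing it is a descriptor corresponding to knowledge image track data.
In one embodiment of the present application, if the plurality of track data of the multimedia resource contains at least two knowledge image track data, the descriptor corresponding to each of them may contain second element information, which indicates the track group in which that knowledge image track data is located.
In one embodiment of the present application, if the plurality of track data of the multimedia resource contains at least two knowledge image track data, the descriptor corresponding to each of them contains third element information, whose value indicates whether that knowledge image track data is depended on by a plurality of main bit stream track data. For example, a value of 1 indicates that the knowledge image track data is depended on by a plurality of main bit stream track data, and a value of 0 indicates that it is depended on by one main bit stream track data.
In one embodiment of the present application, if the plurality of track data of the multimedia resource contains at least two knowledge image track data, and among them there is target knowledge image track data on which a plurality of main bit stream track data depend, the descriptor corresponding to the target knowledge image track data further contains fourth element information, which indicates the frame rate of specified main bit stream track data among those main bit stream track data. In one implementation, the specified main bit stream track data may be all of those main bit stream track data or only some of them.
In one embodiment of the present application, if the plurality of track data of the multimedia resource contains at least two knowledge image track data, the descriptor corresponding to each of them further contains a sample index identifier, which indicates the interval of sample index numbers in the main bit stream track data that index that knowledge image track data. In one implementation, the sample index identifier contains fifth element information and sixth element information: the value of the fifth element information indicates the minimum sample index number in the main bit stream track data that indexes each of the at least two knowledge image track data, and the value of the sixth element information indicates the corresponding maximum sample index number.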
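In step S620, the signaling file is parsed, and the dependency relationship between the main bit stream track data and the knowledge image track data is determined according to the dependency identifier.
In step S630, the knowledge image track data and the main bit stream track data are acquired in sequence from the data source side according to the dependency relationship between them.
In one embodiment of the present application, because the main bit stream track data depends on the knowledge image track data, the main bit stream track data is acquired from the data source side after the knowledge image track data has been acquired.
In one implementation, if there are a plurality of knowledge image track data, the knowledge image track data that the main bit stream track data needs to reference first can be acquired first; then, while the main bit stream track data is being decoded, the other knowledge image track data is acquired when decoding reaches a position that needs to reference it. Alternatively, decoding can begin only after the main bit stream track data and all of the knowledge image track data have been acquired.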
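The two acquisition strategies described here (fetch every knowledge image track up front, or fetch only the first one and defer the rest) can be contrasted with a small planner. The tuple-based plan format, the function name, and the sample numbers in the usage note are illustrative assumptions.

```python
def fetch_plan(main_id, libraries, eager):
    """Build an ordered request plan.

    libraries is a list of (rep_id, first_main_sample_that_needs_it) pairs
    in any order.
    """
    libs = sorted(libraries, key=lambda item: item[1])
    if not libs:
        return [("fetch", main_id)]
    if eager:
        # Fetch every knowledge image track before the main track.
        return [("fetch", lib_id) for lib_id, _ in libs] + [("fetch", main_id)]
    # Lazy: only the first-needed library up front, the rest on demand.
    first, rest = libs[0], libs[1:]
    plan = [("fetch", first[0]), ("fetch", main_id)]
    plan += [("fetch_at_sample", lib_id, sample) for lib_id, sample in rest]
    return plan
```

For example, `fetch_plan("RA", [("RAL2", 101), ("RAL1", 1)], eager=False)` requests RAL1 and RA immediately and defers RAL2 until sample 101. The lazy plan lowers start-up traffic, while the eager plan removes any risk of stalling mid-stream.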
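In one embodiment of the present application, the main bit stream track data may contain an index identifier, which indicates the knowledge image track data on which the main bit stream track data depends, or the knowledge image track group on which it depends.
In one implementation, the main bit stream track data contains a track reference type box, and the track reference type box contains a reference type field used to represent the index identifier. On this basis, the value of the reference type field indicates the knowledge image track data, or the knowledge image track group, on which the main bit stream track data depends.
In one embodiment of the present application, the main bit stream track data may contain a track reference box; in that case, the track reference box contains the track reference type box.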
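The nesting of a track reference type box inside a track reference box can be sketched with the generic ISO base media file format layout: a 32-bit big-endian size, a four-character type, and, for the track reference type box, a list of 32-bit track IDs. The 'a3lr' type comes from the example elsewhere in this document; the helper names and the track ID used in the usage note are hypothetical, and large (64-bit) box sizes are not handled.

```python
import struct


def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a plain box: 32-bit size, 4-byte type, then the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def track_reference_box(reference_type: bytes, track_ids) -> bytes:
    """Build a 'tref' box holding one track reference type box."""
    ids = b"".join(struct.pack(">I", tid) for tid in track_ids)
    return box(b"tref", box(reference_type, ids))


def parse_track_references(tref: bytes) -> dict:
    """Return {reference_type: [track_id, ...]} for each child of a 'tref' box."""
    size, kind = struct.unpack_from(">I4s", tref, 0)
    if kind != b"tref" or size != len(tref):
        raise ValueError("not a complete 'tref' box")
    refs, offset = {}, 8
    while offset < size:
        child_size, child_type = struct.unpack_from(">I4s", tref, offset)
        count = (child_size - 8) // 4
        refs[child_type.decode("ascii")] = list(
            struct.unpack_from(f">{count}I", tref, offset + 8))
        offset += child_size
    return refs
```

For instance, `track_reference_box(b"a3lr", [2])` yields a 20-byte 'tref' box, and `parse_track_references` recovers `{"a3lr": [2]}` from it. When the reference targets a track group instead of a single track, the same field would carry the group's identifier.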
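In one embodiment of the present application, the plurality of track data of the multimedia resource may contain at least two knowledge image track data, and each of them contains a track group identifier, which indicates the track group in which that knowledge image track data is located.
In one embodiment of the present application, after acquiring the knowledge image track data and the main bit stream track data in sequence from the data source side according to the dependency relationship, the data receiver can determine the decoding order according to that dependency relationship and then, following the determined order, decode the knowledge image track data and the main bit stream track data in sequence to obtain the multimedia resource.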
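Deriving a decoding order from the dependency relationship amounts to a topological sort. A minimal sketch using Python's standard `graphlib` module (Python 3.9 or later) follows; the track names in the usage note are taken from the examples in this document, and the function name is an assumption.

```python
from graphlib import TopologicalSorter


def decoding_order(dependencies):
    """Return tracks in an order where every track follows the tracks it depends on.

    dependencies maps each track to the set of tracks it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `decoding_order({"TrackA": {"TrackAL"}, "TrackB": set()})` places TrackAL before TrackA; TrackB has no dependency, so its position relative to the others is unconstrained. `TopologicalSorter` raises `graphlib.CycleError` if the dependencies are circular.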
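In one embodiment of the present application, the main bit stream track data can be decoded first; when decoding reaches a sample index number interval in the main bit stream track data that needs to refer to knowledge image track data, the knowledge image track data to be referenced is determined from among the plurality of knowledge image track data according to that interval, and that knowledge image track data is then decoded. In one implementation, decoding can begin after the main bit stream track data and all of the knowledge image track data have been acquired; alternatively, the knowledge image track data that the main bit stream track data needs to reference first can be acquired first, and the other knowledge image track data can be acquired when decoding of the main bit stream track data reaches a position that needs to reference it.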
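In one embodiment of the present application, if the media resource contains at least two knowledge image track data, each of them further contains first field information indicating whether that knowledge image track data is depended on by a plurality of main bit stream track data. In one implementation, the first field information may be, for example, multi_main_bitstream. If multi_main_bitstream is 1, the knowledge image track data is depended on by a plurality of main bit stream track data; if multi_main_bitstream is 0, it is depended on by one main bit stream track data.
In one embodiment of the present application, if the first field information indicates that the knowledge image track data is depended on by one main bit stream track data, the knowledge image track data further contains a field indicating the minimum sample index number in that main bit stream track data that indexes the knowledge image track data, and a field indicating the corresponding maximum sample index number. In one implementation, the field indicating the minimum sample index number may be sample_number_min, and the field indicating the maximum sample index number may be sample_number_max. The segment of the main bit stream track data between the minimum and maximum sample index numbers depends on the knowledge image track data.
In one embodiment of the present application, if the first field information indicates that the knowledge image track data is depended on by a plurality of main bit stream track data, the knowledge image track data further contains fields respectively indicating, for each of those main bit stream track data, the minimum sample index number that indexes the knowledge image track data, fields respectively indicating the corresponding maximum sample index number, and a field indicating the frame rate of each of those main bit stream track data. Similarly, the field indicating the minimum sample index number may be sample_number_min, and the field indicating the maximum sample index number may be sample_number_max.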
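The single-stream and multi-stream cases described above can be modeled in memory as follows. This is only a sketch of the field relationships, not the box syntax: multi_main_bitstream, sample_number_min and sample_number_max are the field names given in the text, while `frame_rate`, `track_group_id`, the class names and the validation rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MainStreamRange:
    sample_number_min: int
    sample_number_max: int
    frame_rate: Optional[float] = None  # assumed name; only in the multi-stream case


@dataclass
class LibraryTrackInfo:
    track_group_id: int
    multi_main_bitstream: int  # 1: several main streams depend on this track
    ranges: list = field(default_factory=list)

    def validate(self) -> bool:
        """Check that the ranges match the multi_main_bitstream flag."""
        if self.multi_main_bitstream == 0:
            ok = len(self.ranges) == 1
        else:
            # "A plurality" means two or more; each entry carries a frame rate.
            ok = len(self.ranges) >= 2 and all(
                r.frame_rate is not None for r in self.ranges)
        return ok and all(
            r.sample_number_min <= r.sample_number_max for r in self.ranges)
```

Carrying a frame rate per main stream in the multi-stream case lets a receiver interpret each sample interval in that stream's own timing.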
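In one embodiment of the present application, if the first field information indicates that the knowledge image track data is depended on by one main bit stream track data, the knowledge image track data further contains a field indicating the number of sample groups in that main bit stream track data that index the knowledge image track data, and a field indicating the sample group index numbers of those sample groups. In one implementation, the field indicating the number of sample groups may be num_sample_groups, and the field indicating the sample group index number may be group_description_index.
In one embodiment of the present application, if the first field information indicates that the knowledge image track data is depended on by a plurality of main bit stream track data, the knowledge image track data further contains fields respectively indicating, for each of those main bit stream track data, the number of sample groups that index the knowledge image track data, fields respectively indicating the corresponding sample group index numbers, and a field indicating the frame rate of each of those main bit stream track data. Similarly, the field indicating the number of sample groups may be num_sample_groups, and the field indicating the sample group index number may be group_description_index.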
图6是从媒体资源的接收方来阐述本申请实施例的技术方案,以下结合图7从数据源侧来对本申请实施例的实现细节做进一步说明:
图7示出了根据本申请实施例的多媒体资源中轨道数据的处理方法的流程图,该多媒体资源中轨道数据的处理方法可以由媒体生成设备来执行,该媒体生成设备可以是智能手机、平板电脑等。参照图7所示,该多媒体资源中轨道数据的处理方法至少包括步骤S710至步骤S720,详细介绍如下:
在步骤S710中,生成多媒体资源对应的信令文件,该信令文件中包含有多媒体资源的多个轨道数据分别对应的描述子,该多个轨道数据包括主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据,该主位流轨道数据对应的描述子中包含的依赖项标识指向知识图像轨道数据对应的描述子。
需要说明的是:多媒体资源包含有具体的媒体资源数据,比如包含有物品A的介绍视频的具体内容(视频画面、介绍音频等)。多媒体资源对应的信令文件可以是DASH信令文件。
在一实施方式中,多媒体资源的多个轨道数据中可以包含有一个知识图像轨道数据,也可以包含有多个知识图像轨道数据。
在本申请的一个实施例中,知识图像轨道数据对应的描述子中可以包含第一元素信息,该第一元素信息用于指示包含第一元素信息的描述子为知识图像轨道数据对应的描述子。
在本申请的一个实施例中,如果多媒体资源的多个轨道数据中包含有至少两个知识图像轨道数据,那么所述至少两个知识图像轨道数据中各个知识图像轨道数据对应的描述子中可以包含第二元素信息,该第二元素信息用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据所在的轨道组。
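In an embodiment of this application, if the multiple pieces of track data of the multimedia resource include at least two pieces of knowledge image track data, the descriptor corresponding to each of them includes third element information, whose value indicates whether that knowledge image track data is depended on by multiple pieces of main bitstream track data. For example, a value of 1 indicates that the knowledge image track data is depended on by multiple pieces of main bitstream track data; a value of 0 indicates that it is depended on by a single piece of main bitstream track data.
In an embodiment of this application, if the multiple pieces of track data of the multimedia resource include at least two pieces of knowledge image track data, and among them there is target knowledge image track data that is depended on by multiple pieces of main bitstream track data, the descriptor corresponding to the target knowledge image track data further includes fourth element information, which indicates the frame rate of specified main bitstream track data among those multiple pieces of main bitstream track data. In an implementation, the specified main bitstream track data may be all of those pieces of main bitstream track data, or only some of them.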
在本申请的一个实施例中,如果多媒体资源的多个轨道数据中包含有至少两个知识图像轨道数据,则所述至少两个知识图像轨道数据中各个知识图像轨道数据对应的描述子中还包含样本索引标识,所述样本索引标识用于指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号区间。在一实施方式中,该样本索引标识包括第五元素信息和第六元素信息,该第五元素信息的值指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号最小值,该第六元素信息的值指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号最大值。
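In step S720, the signaling file is sent to the data receiver, so that the data receiver determines the dependency between the main bitstream track data and the knowledge image track data according to the dependency identifier in the signaling file, and obtains the knowledge image track data and the main bitstream track data in sequence from the data source side according to the dependency.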
在本申请的一个实施例中,数据源侧在生成多媒体资源对应的信令文件之前,还可以生成主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据。该主位流轨道数据中包含有索引标识,该索引标识用于指示主位流轨道数据所依赖的知识图像轨道数据或用于指示主位流轨道数据所依赖的知识图像轨道组。
在一实施方式中,主位流轨道数据中包含有轨道参考类型数据盒,该轨道参考类型数据盒中包含参考类型字段,该参考类型字段用于表示索引标识。基于此,参考类型字段的值用于指示主位流轨道数据所依赖的知识图像轨道数据或所依赖的知识图像轨道组。
在本申请的一个实施例中,主位流轨道数据中可以包含轨道参考数据盒,在这种情况下,轨道参考数据盒包含该轨道参考类型数据盒。
在本申请的一个实施例中,多媒体资源的多个轨道数据中可以包含至少两个知识图像轨道数据,所述至少两个知识图像轨道数据中各个知识图像轨道数据中包含有轨道组标识,该轨道组标识用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据所在的轨道组。
在一实施方式中,知识图像轨道组数据的其它相关内容可以参照前述实施例的技术方案,不再赘述。
以上分别从数据接收方和数据源侧阐述了本申请实施例的技术方案,以下结合图8从整体上对本申请实施例的实现细节做进一步说明:
如图8所示,以服务端作为数据源侧、客户端作为数据接收方为例进行说明,具体可以包括如下步骤S801至步骤S807:
在步骤S801中,服务端生成位流。
在本申请的一个实施例中,服务端可以在视频编码环节,生成主位流,以及一个或多个知识图像位流。
在步骤S802中,服务端封装生成轨道数据。
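In an embodiment of this application, in the video file encapsulation stage, the server may encapsulate the main bitstream as a separate file track and each knowledge image bitstream as a separate file track, and, according to the reference relationship between the main bitstream and the knowledge image bitstreams during decoding, associate the main bitstream track with the knowledge image tracks through an inter-track index relationship. If the main bitstream track needs to reference multiple knowledge image tracks, these knowledge image tracks may be associated through a track group and distinguished within the track group by sample index range information, description information, and the like.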
在一实施方式中,一个主位流轨道可以关联至一个知识图像轨道,也可以关联至一个知识图像轨道组。多个主位流轨道(一般为同一内容不同帧率的多个轨道)也可以关联至同一个知识图像轨道或知识图像轨道组。
在步骤S803中,服务端生成DASH信令。
在本申请的一个实施例中,服务端在信令生成环节,可以将知识图像位流对应的媒体资源进行特殊标记,并指示主位流媒体资源和知识图像位流媒体资源之间的依赖关系。若主位流媒体资源需要参考多个知识图像媒体资源,则这些知识图像媒体资源可以互相关联并采用样本索引范围信息、描述信息等进行区分。上述信息都包括在DASH信令中。
在步骤S804中,服务端向客户端发送DASH信令。
在步骤S805中,客户端根据DASH信令向服务端请求媒体文件。
在本申请的一个实施例中,客户端根据信令文件判断所需的媒体资源是否依赖知识图像位流对应的媒体资源,如果依赖,则优先请求知识图像位流对应的媒体资源。如果依赖多个知识图像位流对应的媒体资源,则根据当前呈现的帧所属的样本索引范围信息来请求对应的媒体资源。
在步骤S806中,服务端向客户端传输媒体文件。
在步骤S807中,客户端解封装媒体文件,呈现对应的媒体资源。
在本申请的一个实施例中,客户端请求对应的媒体资源后,可以根据文件轨道之间的索引关系,优先解码知识图像位流对应的轨道数据。如果存在知识图像轨道组,即主位流对应的轨道数据依赖多个知识图像位流对应的轨道数据,则根据当前呈现的帧所属的样本索引范围信息来解码对应的轨道数据。
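To implement the technical solutions of the foregoing embodiments, the embodiments of this application add several descriptive fields. The following takes extensions of ISOBMFF boxes and DASH MPD signaling as examples, and defines related fields to support the AVS3 knowledge image technique, as follows:
1. Defining the index relationship between the knowledge image track and the main bitstream track:
In an embodiment of this application, the main bitstream track may index, through a track reference box, the knowledge image track on which its decoding depends. A corresponding TrackReferenceTypeBoxes (track reference type box) should be added to the TrackReferenceBox (track reference box) of the main bitstream track; in that TrackReferenceTypeBoxes box, track_IDs indicates the knowledge image track or knowledge image track group indexed by the current main bitstream track.
Specifically, the index between the main bitstream track and the knowledge image track is identified by the reference_type (reference type) in the corresponding TrackReferenceTypeBoxes. The type is defined as follows:
'a3lr': the indexed track is the knowledge image track corresponding to the current track.
In an implementation, one main bitstream track may be indexed to one knowledge image track or knowledge image track group through 'a3lr'; multiple main bitstream tracks may also be indexed to the same knowledge image track or knowledge image track group through 'a3lr'.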
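As a rough illustration of how such a reference could be serialized, the sketch below builds a track reference box ('tref') that contains one track reference type box of type 'a3lr'. It assumes the general ISOBMFF box layout (a 32-bit big-endian size, a four-character type, then the payload, with track_IDs stored as 32-bit unsigned integers). The helper names `make_box` and `make_a3lr_tref` are hypothetical, and 64-bit box sizes are not handled.

```python
import struct
from typing import Iterable


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize a basic box: 32-bit size (header included), 4-byte type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly four bytes")
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def make_a3lr_tref(track_ids: Iterable[int]) -> bytes:
    """Build a 'tref' box holding one 'a3lr' reference type box.

    `track_ids` lists the knowledge image track ids (or a track group id)
    that the main bitstream track indexes.
    """
    ids = b"".join(struct.pack(">I", track_id) for track_id in track_ids)
    return make_box(b"tref", make_box(b"a3lr", ids))
```

Calling `make_a3lr_tref([2])` yields a 20-byte 'tref' box whose single 'a3lr' child references track id 2.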
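2. Defining the knowledge image track group:
In an embodiment of this application, if one main bitstream track needs to reference multiple knowledge image tracks, these knowledge image tracks should be associated through a knowledge image track group. In an implementation, one definition of the knowledge image track group is as follows:
Figure PCTCN2022083956-appb-000007
In the above definition, the knowledge image track group is obtained by extending the track group box and is identified by the track group type 'a3lg'. Among all tracks that contain a TrackGroupTypeBox of type 'a3lg', tracks with the same group ID belong to the same track group. The semantics of the fields in Avs3LibraryGroupBox are as follows:
multi_main_bitstream indicates whether the knowledge image track is referenced by multiple main bitstream tracks. A value of 1 indicates that the knowledge image track is referenced by multiple main bitstream tracks; a value of 0 indicates that it is referenced by only one main bitstream track. In an implementation, the default value of this field is 0.
sample_number_min indicates the minimum sample index number, in the main bitstream track or in the main bitstream track of a specific frame rate, that indexes the current knowledge image track.
sample_number_max indicates the maximum sample index number, in the main bitstream track or in the main bitstream track of a specific frame rate, that indexes the current knowledge image track.
frame_rate: when the knowledge image track is referenced by multiple main bitstream tracks, indicates the frame rate of one of those main bitstream tracks.
track_description is a null-terminated string that indicates the description information of the knowledge image track.
In an implementation, sample group information may also be used to distinguish different tracks in the same knowledge image track group. Specifically, another definition of the knowledge image track group is as follows: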
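Figure PCTCN2022083956-appb-000008
In the above definition, the knowledge image track group is obtained by extending the track group box and is identified by the track group type 'a3lg'. Among all tracks that contain a TrackGroupTypeBox of type 'a3lg', tracks with the same group ID belong to the same track group. The semantics of the fields in Avs3LibraryGroupBox are as follows:
multi_main_bitstream indicates whether the knowledge image track is referenced by multiple main bitstream tracks. A value of 1 indicates that the knowledge image track is referenced by multiple main bitstream tracks; a value of 0 indicates that it is referenced by only one main bitstream track. In an implementation, the default value of this field is 0.
num_sample_groups indicates the number of LibrarySampleGroupEntry sample groups, in the main bitstream track or in the main bitstream track of a specific frame rate, that index the current knowledge image track.
group_description_index indicates the index number of a LibrarySampleGroupEntry sample group, in the main bitstream track or in the main bitstream track of a specific frame rate, that indexes the current knowledge image track.
frame_rate: when the knowledge image track is referenced by multiple main bitstream tracks, indicates the frame rate of one of those main bitstream tracks.
track_description is a null-terminated string that indicates the description information of the knowledge image track.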
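The grouping rule stated above (tracks carrying an 'a3lg' box with the same group ID belong to the same track group) can be illustrated with an in-memory model. This sketch covers only the field semantics; the binary syntax of Avs3LibraryGroupBox is given in the figures and is not reproduced here. The class name `Avs3LibraryGroupInfo`, the `track_id` and `track_group_id` attribute names, and the function `group_library_tracks` are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Avs3LibraryGroupInfo:
    track_id: int                  # hypothetical: id of the knowledge image track
    track_group_id: int            # hypothetical name for the group ID of the 'a3lg' box
    multi_main_bitstream: int = 0  # 1: referenced by multiple main bitstream tracks
    track_description: str = ""    # free-form description of the track


def group_library_tracks(
    infos: Iterable[Avs3LibraryGroupInfo],
) -> Dict[int, List[int]]:
    """Map each group ID to the knowledge image track ids that share it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for info in infos:
        groups[info.track_group_id].append(info.track_id)
    return dict(groups)
```

A reader that has parsed the 'a3lg' boxes of all tracks could use such a map to resolve a group ID referenced through 'a3lr' into the member tracks.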
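3. In the DASH signaling extension, a knowledge image descriptor is defined:
In an embodiment of this application, the knowledge image descriptor Avs3Library is a SupplementalProperty element whose @schemeIdUri attribute is "urn:avs:ims:2018:av3l". The descriptor may be present at the adaptation set level or at the representation level. When present at the adaptation set level, it describes all representations in that adaptation set; when present at the representation level, it describes the corresponding representation. The Avs3Library descriptor indicates the properties of a knowledge image representation; the specific attributes are shown in Table 1 below:
Figure PCTCN2022083956-appb-000009
Table 1
In Table 1, "0…N" in the "Use" column denotes a count (specifically an integer), O denotes Optional, and CM denotes Conditional Mandatory. In the "Data type" column, xs denotes the short int type.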
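In a specific example, assume that media content A and media content B exist on the server, and the server encodes each of them to generate bitstreams. For example, for media content A, a main bitstream StreamA and a knowledge image bitstream StreamAL are generated; for media content B, a main bitstream StreamB is generated.
After generating the bitstreams, the server encapsulates StreamA as TrackA (track A) and StreamAL as TrackAL, and uses a TrackReferenceTypeBox of type 'a3lr' in TrackA to index TrackAL.
In addition, the server encapsulates StreamB as TrackB. Because TrackB has no corresponding knowledge image track, TrackB does not need to contain a TrackReferenceTypeBox of type 'a3lr'.
After encapsulation, the server describes TrackA and TrackAL each as one representation (for example, RA and RAL). The @dependencyId (dependency identifier) attribute of RA should point to RAL, indicating that consuming RA depends on RAL, and RAL must be described with the Avs3Library descriptor. The server describes TrackB as one representation (for example, RB), with no special extension required.
After describing the track data, the server generates DASH signaling accordingly and sends the signaling file to the client.
After receiving the signaling file, the client can determine the dependencies between the descriptors from it, for example that RA depends on RAL and that RAL is a knowledge image media resource. Assume that client 1 needs to request the media resource corresponding to RA and client 2 needs to request the media resource corresponding to RB. Client 1 then needs to first request the media resource corresponding to RAL from the server, and then request the media resource corresponding to RA. Client 2 can request the media resource corresponding to RB directly.
After receiving the media resources corresponding to RAL and RA, client 1 decodes the media resource corresponding to RAL first and then the media resource corresponding to RA. After receiving the media resource corresponding to RB, client 2 can decode it directly.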
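The client-side decision in this example can be sketched against a stripped-down MPD. The MPD string below is an assumption written for illustration: it keeps only the representation ids, the @dependencyId attribute, and a SupplementalProperty with the "urn:avs:ims:2018:av3l" scheme, and it omits the Table 1 attributes and everything else a real MPD would carry. The function names `request_plan` and `is_library` are hypothetical.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
AV3L_SCHEME = "urn:avs:ims:2018:av3l"
NS = {"d": MPD_NS}

# Illustrative, heavily reduced MPD for the RA / RAL / RB example.
SAMPLE_MPD = f"""<MPD xmlns="{MPD_NS}">
  <Period>
    <AdaptationSet>
      <Representation id="RAL">
        <SupplementalProperty schemeIdUri="{AV3L_SCHEME}"/>
      </Representation>
      <Representation id="RA" dependencyId="RAL"/>
      <Representation id="RB"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def _representations(mpd_xml: str) -> dict:
    root = ET.fromstring(mpd_xml)
    return {rep.get("id"): rep for rep in root.iterfind(".//d:Representation", NS)}


def request_plan(mpd_xml: str, wanted_id: str) -> list:
    """Return representation ids in request order: dependencies, then the wanted one."""
    wanted = _representations(mpd_xml)[wanted_id]
    # @dependencyId is a whitespace-separated list of representation ids.
    dependencies = (wanted.get("dependencyId") or "").split()
    return dependencies + [wanted_id]


def is_library(mpd_xml: str, rep_id: str) -> bool:
    """True if the representation carries the knowledge image descriptor."""
    rep = _representations(mpd_xml)[rep_id]
    return any(
        prop.get("schemeIdUri") == AV3L_SCHEME
        for prop in rep.iterfind("d:SupplementalProperty", NS)
    )
```

With this input, client 1 would plan `["RAL", "RA"]` for RA, and client 2 would plan `["RB"]` for RB.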
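In the above example, the media content contains one main bitstream and one knowledge image bitstream. The following describes the case in which the media content contains one main bitstream and multiple knowledge image bitstreams:
In a specific example of this application, assume that media content A exists on the server, and the server encodes it to generate a main bitstream StreamA and knowledge image bitstreams StreamAL1 and StreamAL2.
After generating the bitstreams, the server encapsulates StreamA as TrackA, StreamAL1 as TrackAL1, and StreamAL2 as TrackAL2. It also associates TrackAL1 and TrackAL2 through a track group of type 'a3lg', with the following parameters:
TrackAL1:{group_id=100;sample_number_min=0;sample_number_max=100}
TrackAL2:{group_id=100;sample_number_min=101;sample_number_max=200}
TrackA then uses a TrackReferenceTypeBox of type 'a3lr' to index the corresponding track group (indexed by group_id, which is 100 in this example).
After encapsulation, the server describes TrackA, TrackAL1, and TrackAL2 each as one representation (for example, RA, RAL1, and RAL2). The @dependencyId (dependency identifier) attribute of RA should point to RAL1 and RAL2, indicating that consuming RA depends on RAL1 and RAL2. RAL1 and RAL2 must be described with the Avs3Library descriptor, as follows:
RAL1:{group_id=100;sample_number_min=0;sample_number_max=100}
RAL2:{group_id=100;sample_number_min=101;sample_number_max=200}
After describing the track data, the server generates DASH signaling accordingly and sends the signaling file to the client.
After receiving the signaling file, the client can determine the dependencies between the descriptors from it, for example that RA depends on RAL1 and RAL2, that RAL1 and RAL2 are knowledge image media resources, and that RAL1 corresponds to the earlier samples in RA. Assume that client 1 needs to request the media resource corresponding to RA. Client 1 then first requests the media resources corresponding to RAL1 and RA from the server. When client 1's consumption of RA approaches the 101st sample, it requests the media resource corresponding to RAL2 from the server.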
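The deferred request for RAL2 can be sketched as a look-ahead check over the signaled sample ranges. The function name `library_tracks_to_request`, the `lookahead` parameter, and the inclusive interpretation of the ranges are assumptions for this sketch; the text only says that RAL2 is requested when consumption approaches the 101st sample.

```python
from typing import Dict, List, Set, Tuple


def library_tracks_to_request(
    ranges: Dict[str, Tuple[int, int]],
    current_sample: int,
    lookahead: int,
    already_requested: Set[str],
) -> List[str]:
    """Return knowledge image representations needed within the look-ahead window.

    `ranges` maps a representation id to (sample_number_min, sample_number_max).
    A representation is returned when its range overlaps
    [current_sample, current_sample + lookahead] and it was not requested before.
    """
    window_end = current_sample + lookahead
    return [
        rep_id
        for rep_id, (low, high) in ranges.items()
        if rep_id not in already_requested
        and low <= window_end
        and high >= current_sample
    ]
```

With the ranges from this example and a look-ahead of 10 samples, the client would request RAL1 at the start of playback and RAL2 once playback reaches sample 95.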
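The technical solutions of the foregoing embodiments of this application address the knowledge image feature of the AVS3 codec standard and propose a method for encapsulation and transmission signaling indication at the file track level. With these solutions, knowledge image tracks and main bitstream tracks can be associated flexibly at the file track level, and the association can be indicated through signaling. In the data transmission stage, the client can use this information to decide whether to request a knowledge image track and which knowledge image track to request. Likewise, in the decoding stage, the client can use this information to decide the order in which to decode the different tracks, so that network and CPU resources are allocated reasonably.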
以下介绍本申请的装置实施例,可以用于执行本申请上述实施例中的多媒体资源中轨道数据的处理方法。对于本申请装置实施例中未披露的细节,请参照本申请上述的多媒体资源中轨道数据的处理方法的实施例。
图9示出了根据本申请实施例的多媒体资源中轨道数据的处理装置的框图,该多媒体资源中轨道数据的处理装置可以设置在媒体播放设备内,该媒体播放设备可以是智能手机、平板电脑等。
参照图9所示,根据本申请实施例的多媒体资源中轨道数据的处理装置900,包括:接收单元902、解析单元904和获取单元906。
其中,接收单元902配置为接收多媒体资源对应的信令文件,所述信令文件中包含有所述多媒体资源的多个轨道数据分别对应的描述子,所述多个轨道数据包括主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据,所述主位流轨道数据对应的描述子中包含的依赖项标识指向所述知识图像轨道数据对应的描述子;解析单元904配置为解析所述信令文件,根据所述依赖项标识确定所述主位流轨道数据与所述知识图像轨道数据之间的依赖关系;获取单元906配置为根据所述依赖关系从数据源侧依次获取所述知识图像轨道数据和所述主位流轨道数据。
在本申请的一些实施例中,基于前述方案,所述知识图像轨道数据对应的描述子中包含第一元素信息,所述第一元素信息用于指示包含所述第一元素信息的描述子为知识图像轨道数据对应的描述子。
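In some embodiments of this application, based on the foregoing solutions, the multiple pieces of track data include at least two pieces of knowledge image track data, the descriptor corresponding to each of the at least two pieces of knowledge image track data includes second element information, and the second element information indicates the track group to which each of the at least two pieces of knowledge image track data belongs.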
在本申请的一些实施例中,基于前述方案,所述至少两个知识图像轨道数据中各个知识图像轨道数据对应的描述子中包含第三元素信息,所述第三元素信息的值用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据是否被多个主位流轨道数据所依赖。
在本申请的一些实施例中,基于前述方案,若所述至少两个知识图像轨道数据中存在被多个主位流轨道数据所依赖的目标知识图像轨道数据,则所述目标知识图像轨道数据对应的描述子中还包含第四元素信息,所述第四元素信息用于指示所述多个主位流轨道数据中指定主位流轨道数据的帧率。
在本申请的一些实施例中,基于前述方案,所述至少两个知识图像轨道数据中各个知识图像轨道数据对应的描述子中还包含样本索引标识,所述样本索引标识用于指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号区间。
在本申请的一些实施例中,基于前述方案,所述样本索引标识包括第五元素信息和第六元素信息,所述第五元素信息的值指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号最小值,所述第六元素信息的值指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号最大值。
在本申请的一些实施例中,基于前述方案,所述主位流轨道数据中包含索引标识,所述索引标识用于指示所述主位流轨道数据所依赖的知识图像轨道数据或用于指示所述主位流轨道数据所依赖的知识图像轨道组。
在本申请的一些实施例中,基于前述方案,所述主位流轨道数据中包含轨道参考类型数据盒,所述轨道参考类型数据盒中包含参考类型字段,所述参考类型字段用于表示所述索引标识。
在本申请的一些实施例中,基于前述方案,所述主位流轨道数据中包含轨道参考数据盒,所述轨道参考数据盒包含所述轨道参考类型数据盒。
在本申请的一些实施例中,基于前述方案,所述多个轨道数据中包含至少两个知识图像轨道数据,所述至少两个知识图像轨道数据中各个知识图像轨道数据中包含有轨道组标识,所述轨道组标识用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据所在的轨道组。
在本申请的一些实施例中,基于前述方案,所述至少两个知识图像轨道数据中各个知识图像轨道数据中还包含用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据是否被多个主位流轨道数据所依赖的第一字段信息;若所述第一字段信息指示知识图像轨道数据被一个主位流轨道数据所依赖,则所述知识图像轨道数据中还包含指示该一个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最小值的字段,以及指示该一个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最大值的字段。
在本申请的一些实施例中,基于前述方案,若所述第一字段信息指示知识图像轨道数据被多个主位流轨道数据所依赖,则所述知识图像轨道数据中还包含分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最小值的字段、分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最大值的字段,以及用于指示所述多个主位流轨道数据中各个主位流轨道数据的帧率的字段。
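In some embodiments of this application, based on the foregoing solutions, each of the at least two pieces of knowledge image track data further includes first field information indicating whether that knowledge image track data is depended on by multiple pieces of main bitstream track data; if the first field information indicates that the knowledge image track data is depended on by a single piece of main bitstream track data, the knowledge image track data further includes a field indicating the number of sample groups, in that main bitstream track data, that index the knowledge image track data, and a field indicating the index numbers of those sample groups.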
在本申请的一些实施例中,基于前述方案,若所述第一字段信息指示知识图像轨道数据被多个主位流轨道数据所依赖,则所述知识图像轨道数据中还包含分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本组数量的字段、分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本组索引号的字段,以及用于指示所述多个主位流轨道数据中各个主位流轨道数据的帧率的字段。
在本申请的一些实施例中,基于前述方案,所述的多媒体资源中轨道数据的处理装置900还包括:解码单元,配置为根据所述依赖关系确定解码顺序;根据所述解码顺序,依次对所述知识图像轨道数据和所述主位流轨道数据进行解码处理,得到所述多媒体资源。
在本申请的一些实施例中,基于前述方案,所述解码单元配置为:解码所述主位流轨道数据;在解码到所述主位流轨道数据中需要参考知识图像轨道数据的样本索引号区间时,根据所述样本索引号区间,从多个知识图像轨道数据中确定需要参考的知识图像轨道数据;解码所述需要参考的知识图像轨道数据。
图10示出了根据本申请实施例的多媒体资源中轨道数据的处理装置的框图,该多媒体资源中轨道数据的处理装置可以设置在媒体生成设备内,该媒体生成设备可以是智能手机、平板电脑等。
参照图10所示,根据本申请的一个实施例的多媒体资源中轨道数据的处理装置1000,包括:生成单元1002和发送单元1004。
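The generation unit 1002 is configured to generate a signaling file corresponding to a multimedia resource, the signaling file including descriptors respectively corresponding to multiple pieces of track data of the multimedia resource, the multiple pieces of track data including main bitstream track data corresponding to a main bitstream and knowledge image track data corresponding to a knowledge image bitstream, and a dependency identifier included in the descriptor corresponding to the main bitstream track data pointing to the descriptor corresponding to the knowledge image track data. The sending unit 1004 is configured to send the signaling file to a data receiver, so that the data receiver determines the dependency between the main bitstream track data and the knowledge image track data according to the dependency identifier in the signaling file, and obtains the knowledge image track data and the main bitstream track data in sequence from the data source side according to the dependency.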
在本申请的一些实施例中,基于前述方案,所述生成单元1002还配置为:在生成多媒体资源对应的信令文件之前,生成主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据,所述主位流轨道数据中包含有索引标识,所述索引标识用于指示所述主位流轨道数据所依赖的知识图像轨道数据。
图11示出了适于用来实现本申请实施例的电子设备的计算机系统的结构示意图。
需要说明的是,图11示出的电子设备的计算机系统1100仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
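As shown in FIG. 11, the computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processing, for example the methods described in the foregoing embodiments, according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage part 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores various programs and data required for system operation. The CPU 1101, the ROM 1102, and the RAM 1103 are connected to one another through a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.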
以下部件连接至I/O接口1105:包括键盘、鼠标等的输入部分1106;包括诸如阴极射线管(Cathode Ray Tube,CRT)、液晶显示器(Liquid Crystal Display,LCD)等以及扬声器等的输出部分1107;包括硬盘等的存储部分1108;以及包括诸如LAN(Local Area Network,局域网)卡、调制解调器等的网络接口卡的通信部分1109。通信部分1109经由诸如因特网的网络执行通信处理。驱动器1110也根据需要连接至I/O接口1105。可拆卸介质1111,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1110上,以便于从其上读出的计算机程序根据需要被安装入存储部分1108。
特别地,根据本申请的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的计算机程序。在这样的实施例中,该计算机程序可以通过通信部分1109从网络上被下载和安装,和/或从可拆卸介质1111被安装。在该计算机程序被中央处理单元(CPU)1101执行时,执行本申请的系统中限定的各种功能。
需要说明的是,本申请实施例所示的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的计算机程序。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的计算机程序可以用任何适当的介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。
本领域技术人员在考虑说明书及实践这里公开的实施方式后,将容易想到本申请的其它实施方案。本申请旨在涵盖本申请的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本申请的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。
应当理解的是,本申请并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。

Claims (20)

  1. 一种多媒体资源中轨道数据的处理方法,其特征在于,包括:
    接收多媒体资源对应的信令文件,所述信令文件中包含有所述多媒体资源的多个轨道数据分别对应的描述子,所述多个轨道数据包括主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据,所述主位流轨道数据对应的描述子中包含的依赖项标识指向所述知识图像轨道数据对应的描述子;
    解析所述信令文件,根据所述依赖项标识确定所述主位流轨道数据与所述知识图像轨道数据之间的依赖关系;
    根据所述依赖关系从数据源侧依次获取所述知识图像轨道数据和所述主位流轨道数据。
  2. 根据权利要求1所述的多媒体资源中轨道数据的处理方法,其特征在于,所述知识图像轨道数据对应的描述子中包含第一元素信息,所述第一元素信息用于指示包含所述第一元素信息的描述子为知识图像轨道数据对应的描述子。
  3. 根据权利要求1所述的多媒体资源中轨道数据的处理方法,其特征在于,所述多个轨道数据中包含至少两个知识图像轨道数据,所述至少两个知识图像轨道数据中各个知识图像轨道数据对应的描述子中包含第二元素信息,所述第二元素信息用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据所在的轨道组。
  4. 根据权利要求3所述的多媒体资源中轨道数据的处理方法,其特征在于,所述至少两个知识图像轨道数据中各个知识图像轨道数据对应的描述子中包含第三元素信息,所述第三元素信息的值用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据是否被多个主位流轨道数据所依赖。
  5. 根据权利要求4所述的多媒体资源中轨道数据的处理方法,其特征在于,若所述至少两个知识图像轨道数据中存在被多个主位流轨道数据所依赖的目标知识图像轨道数据,则所述目标知识图像轨道数据对应的描述子中还包含第四元素信息,所述第四元素信息用于指示所述多个主位流轨道数据中指定主位流轨道数据的帧率。
  6. 根据权利要求3所述的多媒体资源中轨道数据的处理方法,其特征在于,所述至少两个知识图像轨道数据中各个知识图像轨道数据对应的描述子中还包含样本索引标识,所述样本索引标识用于指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号区间。
  7. 根据权利要求6所述的多媒体资源中轨道数据的处理方法,其特征在于,所述样本索引标识包括第五元素信息和第六元素信息,所述第五元素信息的值指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号最小值,所述第六元素信息的值指示主位流轨道数据中索引所述至少两个知识图像轨道数据中各个知识图像轨道数据的样本索引号最大值。
  8. 根据权利要求1所述的多媒体资源中轨道数据的处理方法,其特征在于,所述主位流轨道数据中包含索引标识,所述索引标识用于指示所述主位流轨道数据所依赖的知识图像轨道数据或用于指示所述主位流轨道数据所依赖的知识图像轨道组。
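  9. The method for processing track data in a multimedia resource according to claim 8, wherein the main bitstream track data includes a track reference type box, the track reference type box includes a reference type field, and the reference type field is used to represent the index identifier.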
  10. 根据权利要求9所述的多媒体资源中轨道数据的处理方法,其特征在于,所述主位流轨道数据中包含轨道参考数据盒,所述轨道参考数据盒包含所述轨道参考类型数据盒。
  11. 根据权利要求1所述的多媒体资源中轨道数据的处理方法,其特征在于,所述多个轨道数据中包含至少两个知识图像轨道数据,所述至少两个知识图像轨道数据中各个知识图像轨道数据中包含有轨道组标识,所述轨道组标识用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据所在的轨道组。
  12. 根据权利要求11所述的多媒体资源中轨道数据的处理方法,其特征在于,所述至少两个知识图像轨道数据中各个知识图像轨道数据中还包含用于指示所述至少两个知识图像轨道数据中各个知识图像轨道数据是否被多个主位流轨道数据所依赖的第一字段信息;
    若所述第一字段信息指示知识图像轨道数据被一个主位流轨道数据所依赖,则所述知识图像轨道数据中还包含指示该一个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最小值的字段,以及指示该一个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最大值的字段;和/或
    若所述第一字段信息指示知识图像轨道数据被一个主位流轨道数据所依赖,则所述知识图像轨道数据中还包含指示该一个主位流轨道数据中索引所述知识图像轨道数据的样本组数量的字段,以及指示该一个主位流轨道数据中索引所述知识图像轨道数据的样本组索引号的字段。
  13. 根据权利要求12所述的多媒体资源中轨道数据的处理方法,其特征在于,若所述第一字段信息指示知识图像轨道数据被多个主位流轨道数据所依赖,则所述知识图像轨道数据中还包含分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最小值的字段、分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本索引号最大值的字段,以及用于指示所述多个主位流轨道数据中各个主位流轨道数据的帧率的字段;和/或
    若所述第一字段信息指示知识图像轨道数据被多个主位流轨道数据所依赖,则所述知识图像轨道数据中还包含分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本组数量的字段、分别指示所述多个主位流轨道数据中各个主位流轨道数据中索引所述知识图像轨道数据的样本组索引号的字段,以及用于指示所述多个主位流轨道数据中各个主位流轨道数据的帧率的字段。
  14. 根据权利要求1至13中任一项所述的多媒体资源中轨道数据的处理方法,其特征在于,所述处理方法还包括:
    根据所述依赖关系确定解码顺序;
    根据所述解码顺序,依次对所述知识图像轨道数据和所述主位流轨道数据进行解码处理,得到所述多媒体资源。
  15. 根据权利要求14所述的多媒体资源中轨道数据的处理方法,其特征在于,根据所述解码顺序,依次对所述知识图像轨道数据和所述主位流轨道数据进行解码处理,包括:
    解码所述主位流轨道数据;
    在解码到所述主位流轨道数据中需要参考知识图像轨道数据的样本索引号区间时,根据所述样本索引号区间,从多个知识图像轨道数据中确定需要参考的知识图像轨道数据;
    解码所述需要参考的知识图像轨道数据。
  16. 一种多媒体资源中轨道数据的处理方法,其特征在于,包括:
    生成多媒体资源对应的信令文件,所述信令文件中包含有所述多媒体资源的多个轨道数据分别对应的描述子,所述多个轨道数据包括主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据,所述主位流轨道数据对应的描述子中包含的依赖项标识指向所述知识图像轨道数据对应的描述子;
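    sending the signaling file to a data receiver, so that the data receiver determines the dependency between the main bitstream track data and the knowledge image track data according to the dependency identifier in the signaling file, and obtains the knowledge image track data and the main bitstream track data in sequence from the data source side according to the dependency.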
  17. 根据权利要求16所述的多媒体资源中轨道数据的处理方法,其特征在于,在生成多媒体资源对应的信令文件之前,所述处理方法还包括:
    生成主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据,所述主位流轨道数据中包含有索引标识,所述索引标识用于指示所述主位流轨道数据所依赖的知识图像轨道数据。
  18. 一种多媒体资源中轨道数据的处理装置,其特征在于,包括:
    接收单元,配置为接收多媒体资源对应的信令文件,所述信令文件中包含有所述多媒体资源的多个轨道数据分别对应的描述子,所述多个轨道数据包括主位流对应的主位流轨道数据和知识图像位流对应的知识图像轨道数据,所述主位流轨道数据对应的描述子中包含的依赖项标识指向所述知识图像轨道数据对应的描述子;
    解析单元,配置为解析所述信令文件,根据所述依赖项标识确定所述主位流轨道数据与所述知识图像轨道数据之间的依赖关系;
    获取单元,配置为根据所述依赖关系从数据源侧依次获取所述知识图像轨道数据和所述主位流轨道数据。
  19. 一种计算机可读介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至15中任一项所述的多媒体资源中轨道数据的处理方法,或实现如权利要求16至17中任一项所述的多媒体资源中轨道数据的处理方法。
  20. 一种电子设备,其特征在于,包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述一个或多个处理器实现如权利要求1至15中任一项所述的多媒体资源中轨道数据的处理方法,或实现如权利要求16至17中任一项所述的多媒体资源中轨道数据的处理方法。
PCT/CN2022/083956 2021-05-24 2022-03-30 多媒体资源中轨道数据的处理方法、装置、介质及设备 WO2022247452A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22810185.3A EP4351142A4 (en) 2021-05-24 2022-03-30 METHOD AND APPARATUS FOR PROCESSING TRACK DATA IN A MULTIMEDIA RESOURCE, MEDIUM AND DEVICE
US17/988,713 US11949966B2 (en) 2021-05-24 2022-11-16 Processing method, apparatus, medium and device for track data in multimedia resource

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110567993.0 2021-05-24
CN202110567993.0A CN115396678A (zh) 2021-05-24 2021-05-24 多媒体资源中轨道数据的处理方法、装置、介质及设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/988,713 Continuation US11949966B2 (en) 2021-05-24 2022-11-16 Processing method, apparatus, medium and device for track data in multimedia resource

Publications (1)

Publication Number Publication Date
WO2022247452A1 true WO2022247452A1 (zh) 2022-12-01

Family

ID=84113731

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083956 WO2022247452A1 (zh) 2021-05-24 2022-03-30 多媒体资源中轨道数据的处理方法、装置、介质及设备

Country Status (5)

Country Link
US (1) US11949966B2 (zh)
EP (1) EP4351142A4 (zh)
CN (1) CN115396678A (zh)
TW (1) TWI794076B (zh)
WO (1) WO2022247452A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118317066A (zh) * 2023-01-09 2024-07-09 腾讯科技(深圳)有限公司 一种触觉媒体的数据处理方法及相关设备

Also Published As

Publication number Publication date
CN115396678A (zh) 2022-11-25
EP4351142A1 (en) 2024-04-10
TW202247666A (zh) 2022-12-01
TWI794076B (zh) 2023-02-21
EP4351142A4 (en) 2024-07-31
US20230188812A1 (en) 2023-06-15
US11949966B2 (en) 2024-04-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22810185

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2022810185

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022810185

Country of ref document: EP

Effective date: 20240102