
CN108886616A - The method, apparatus and computer system of Video coding - Google Patents

The method, apparatus and computer system of Video coding

Info

Publication number
CN108886616A
CN108886616A CN201780018384.1A CN201780018384A
Authority
CN
China
Prior art keywords
motion vector
vector value
frame number
image
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201780018384.1A
Other languages
Chinese (zh)
Inventor
周焰
郑萧桢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dajiang Innovations Technology Co Ltd
Original Assignee
Shenzhen Dajiang Innovations Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dajiang Innovations Technology Co Ltd filed Critical Shenzhen Dajiang Innovations Technology Co Ltd
Publication of CN108886616A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Studio Devices (AREA)

Abstract

Disclosed are a method, apparatus and computer system for video coding. The method includes: acquiring, by an image signal processor, motion vector information of a first preset number of frames in a 360-degree panoramic video; determining, according to the motion vector information, a minimum motion vector value and a pole motion vector value, where the minimum motion vector value is the motion vector value of the region with the smallest motion vector value in the first preset number of frames, and the pole motion vector value is the motion vector value of the region in those frames that corresponds to the two polar regions of the 360-degree panoramic video; determining, according to the minimum motion vector value, the pole motion vector value and a motion vector value threshold, a rotation angle for a second preset number of frames in the 360-degree panoramic video; and rotating the second preset number of frames according to the rotation angle before encoding them. The technical solution of the embodiments of the present invention can improve the coding efficiency of 360-degree panoramic video.

Description

Video coding method, device and computer system
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office official files or records.
Technical Field
The present invention relates to the field of information technology, and more particularly, to a method, an apparatus, and a computer system for video encoding.
Background
With the expansion of video shooting scenes in recent years, 360-degree panoramic video applications are gradually popularized. A 360-degree panoramic video generally refers to a video with a horizontal viewing angle of 360 degrees (-180 degrees to 180 degrees) and a vertical viewing angle of 180 degrees (-90 degrees to 90 degrees), and is generally presented in a three-dimensional spherical form. A 360-degree panoramic video is usually mapped in a geometric relationship to generate a two-dimensional planar video, and then processed by digital image processing and encoding/decoding.
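The latitude/longitude mapping described above can be sketched as a simple coordinate conversion. The function name below is illustrative (not from the patent); it maps a spherical direction, with azimuth in [-180, 180] degrees and pitch in [-90, 90] degrees, to normalized equirectangular image coordinates:

```python
def sphere_to_equirect(theta_deg, phi_deg):
    """Map azimuth theta and pitch phi (degrees) to normalized (u, v) in [0, 1]."""
    u = (theta_deg + 180.0) / 360.0   # left image edge corresponds to -180 deg azimuth
    v = (90.0 - phi_deg) / 180.0      # top image row corresponds to +90 deg pitch (pole)
    return u, v
```

Multiplying (u, v) by the map's pixel width and height gives the pixel position in the two-dimensional planar video.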
When a 360-degree panoramic video is mapped onto a two-dimensional plane for coding, for example into the longitude-latitude (equirectangular) map format, the upper and lower polar parts of the panoramic video (the areas near vertical viewing angles of 90 degrees and -90 degrees, also called pole areas) are visibly stretched, and the seams at the left and right edges of the longitude-latitude map are visibly discontinuous. Since human attention concentrates mainly on the area near the equator of the panoramic video (the area around a vertical viewing angle of 0 degrees), encoders typically allocate more bits to the central area of the longitude-latitude map and encode it more carefully, while the two poles and the left and right edges receive relatively fewer bits and correspondingly lower coding quality. However, for 360-degree panoramic videos whose polar parts move severely while the central area moves smoothly, this allocation wastes a large number of redundant bits on the smooth central area, and the severely moving polar parts are coded poorly because they receive too few bits, so the overall coding efficiency is low.
Therefore, a method for encoding a 360-degree panoramic video is needed to improve the encoding efficiency of the 360-degree panoramic video.
Disclosure of Invention
The embodiment of the invention provides a video coding method, a video coding device and a computer system, which can improve the coding efficiency of a 360-degree panoramic video.
In a first aspect, a method for video coding is provided, including: acquiring motion vector information of a first preset frame number image in a 360-degree panoramic video through an image signal processor; determining a minimum motion vector value and a pole motion vector value according to the motion vector information, wherein the minimum motion vector value is a motion vector value of a region with the minimum motion vector value in the first preset frame number image, and the pole motion vector value is a motion vector value of a two-pole region corresponding to the 360-degree panoramic video in the first preset frame number image; determining a rotation angle of a second preset frame number image in the 360-degree panoramic video according to the minimum motion vector value, the pole motion vector value and a motion vector value threshold; and rotating the second preset frame number image according to the rotation angle, and then encoding.
In a second aspect, an apparatus for video encoding is provided, including: the image signal processor is used for acquiring motion vector information of a first preset frame number image in the 360-degree panoramic video; a processing unit, configured to determine a minimum motion vector value and a pole motion vector value according to the motion vector information, where the minimum motion vector value is a motion vector value of a region with a minimum motion vector value in the first preset frame number image, and the pole motion vector value is a motion vector value of a two-pole region corresponding to the 360-degree panoramic video in the first preset frame number image; determining a rotation angle of a second preset frame number image in the 360-degree panoramic video according to the minimum motion vector value, the pole motion vector value and a motion vector value threshold; a rotation unit, configured to rotate the second preset frame number image according to the rotation angle; and an encoding unit for encoding the rotated image.
In a third aspect, there is provided a computer system comprising: a memory for storing computer executable instructions; a processor for accessing the memory and executing the computer-executable instructions to perform the operations in the method of the first aspect described above.
In a fourth aspect, a computer storage medium is provided, in which program code is stored, the program code being operable to instruct execution of the method of the first aspect.
According to the technical solution of the embodiments of the present invention, motion vector information is obtained from the image signal processor, the minimum motion vector value and the pole motion vector value are determined from that information, and the rotation angle of the 360-degree panoramic video is determined from these two values. Rotation coding of the 360-degree panoramic video can therefore be realized without a second encoding pass, which improves the compression efficiency, coding quality and overall coding efficiency of the 360-degree panoramic video.
Drawings
Fig. 1 is an architecture diagram of a solution to which an embodiment of the invention is applied.
FIG. 2 is a process architecture diagram of an encoder of an embodiment of the present invention.
Fig. 3 is a schematic diagram of data to be encoded according to an embodiment of the present invention.
Fig. 4a is a diagram of generating a two-dimensional flat video by 360-degree panoramic video mapping according to an embodiment of the present invention.
Fig. 4b is a diagram of video rotation according to an embodiment of the present invention.
Fig. 5 is a schematic flow chart of a method of video encoding of an embodiment of the present invention.
Fig. 6 is a flowchart of a method of video encoding according to an embodiment of the present invention.
FIG. 7 is a schematic block diagram of an apparatus for video encoding of an embodiment of the present invention.
FIG. 8 is a schematic block diagram of a computer system of an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be described below with reference to the accompanying drawings.
It should be understood that the specific examples herein are intended merely to help those skilled in the art better understand the embodiments of the present invention, and are not intended to limit the scope of the embodiments of the present invention.
It should also be understood that the formula in the embodiment of the present invention is only an example, and is not intended to limit the scope of the embodiment of the present invention, and the formula may be modified, and the modifications should also fall within the protection scope of the present invention.
It should also be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should also be understood that the various embodiments described in this specification may be implemented alone or in combination, and are not limited to the embodiments of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Fig. 1 is an architecture diagram of a solution to which an embodiment of the invention is applied.
As shown in fig. 1, the system 100 may receive data to be encoded 102, encode the data to be encoded 102 to generate encoded data 108, and may further transmit or store the encoded data 108. For example, the system 100 may receive 360-degree panoramic video data and perform rotation encoding on it to generate rotation-encoded data. In some embodiments, the components in system 100 may be implemented by one or more processors, which may be processors in a computing device or in a mobile device (e.g., a drone). The processor may be any kind of processor, which is not limited in this embodiment of the present invention. In some possible designs, the processor may include an Image Signal Processor (ISP), an encoder, and the like. One or more memories may also be included in the system 100. The memory may be used to store instructions and data, e.g., computer-executable instructions implementing aspects of embodiments of the invention, the data to be encoded 102, the encoded data 108, etc. The memory may be any kind of memory, which is not limited in this embodiment of the present invention.
Data to be encoded 102 may include text, images, graphical objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, data to be encoded 102 may include sensory data from sensors, which may be visual sensors (e.g., cameras, infrared sensors), microphones, near-field sensors (e.g., ultrasonic sensors, radar), position sensors, temperature sensors, touch sensors, and so forth. In some cases, the data to be encoded 102 may include information from the user, e.g., biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA samples, etc.
Encoding is necessary for efficient and/or secure transmission or storage of data. Encoding of data to be encoded 102 may include data compression, encryption, error correction coding, format conversion, and the like. For example, compression of multimedia data (e.g., video or audio) may reduce the number of bits transmitted in a network. Sensitive data, such as financial information and personal identification information, may be encrypted prior to transmission and storage to protect confidentiality and/or privacy. In order to reduce the bandwidth occupied by video storage and transmission, video data needs to be subjected to encoding compression processing.
Any suitable encoding technique may be used to encode the data to be encoded 102. The type of encoding depends on the data being encoded and the specific encoding requirements.
In some embodiments, the encoder may implement one or more different codecs. Each codec may include code, instructions or computer programs implementing a different coding algorithm. An appropriate encoding algorithm may be selected to encode a given data 102 to be encoded based on a variety of factors, including the type and/or source of the data 102 to be encoded, the receiving entity of the encoded data, available computing resources, network environment, business environment, rules and criteria, and the like.
For example, the encoder may be configured to encode a series of video frames. A series of steps may be taken to encode the data in each frame. In some embodiments, the encoding step may include prediction, transform, quantization, entropy encoding, and like processing steps.
The prediction includes two types, intra prediction and inter prediction, and aims to remove redundant information of a current image block to be coded by using prediction block information. The intra prediction obtains prediction block data using information of the present frame image. The inter-frame prediction utilizes the information of a reference frame to obtain prediction block data, and the process comprises the steps of dividing an image block to be coded into a plurality of sub-image blocks; then, aiming at each sub image block, searching an image block which is most matched with the current sub image block in a reference image as a prediction block; and then subtracting the corresponding pixel values of the sub image block and the prediction block to obtain a residual error, and combining the obtained residual errors corresponding to the sub image blocks together to obtain the residual error of the image block.
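The residual computation at the end of the inter-prediction step above can be sketched as follows. This is a minimal illustration of subtracting a prediction block from a source sub-block, not the patent's implementation:

```python
def block_residual(sub_block, pred_block):
    """Subtract the prediction block from the source sub-block pixel-by-pixel."""
    return [[s - p for s, p in zip(row_s, row_p)]
            for row_s, row_p in zip(sub_block, pred_block)]
```

The residuals of all sub-blocks are then assembled in sub-block order to form the residual of the whole image block.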
The transform step applies a transformation matrix to the residual block to remove the correlation within the residual, i.e. to remove redundant information and thereby improve coding efficiency. The data block is usually transformed with a two-dimensional transform: at the encoding end, the residual information of the data block is multiplied by an NxM transformation matrix and by its transpose, yielding the transform coefficients. The transform coefficients are quantized to obtain quantized coefficients, on which entropy coding is then performed; finally the entropy-coded bit stream, together with the coding mode information (such as the intra prediction mode and motion vector information), is stored or sent to the decoding end. At the decoding end, the bit stream is entropy-decoded to obtain the corresponding residual; the prediction block for each image block is obtained from the decoded motion vector, intra prediction mode and other information; and the value of every pixel in the current sub-image block is reconstructed from the prediction block and the residual.
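The separable two-dimensional transform described above (multiply the residual block by the transform matrix and by its transpose) can be illustrated with a 2-point Hadamard matrix, chosen here only for simplicity; real codecs use larger integer DCT-like matrices:

```python
def matmul(A, B):
    """Plain matrix multiplication for small matrices (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transform_2d(T, X):
    """Separable 2-D transform of residual block X: Y = T * X * T^T."""
    T_t = [list(col) for col in zip(*T)]
    return matmul(matmul(T, X), T_t)

H2 = [[1, 1], [1, -1]]  # 2-point Hadamard transform matrix
```

Note that the top-left (DC) coefficient of the result equals the sum of all residual samples, as expected for a Hadamard-type transform.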
FIG. 2 shows the processing architecture of an encoder according to an embodiment of the present invention. As shown in fig. 2, the prediction process may include intra prediction and inter prediction. Prediction yields the residual corresponding to each data unit (such as a pixel): when a pixel is predicted, the reconstructed value of a reference pixel is fetched from the stored context, and the pixel residual is obtained from that reconstructed reference value and the pixel's own value. The pixel residual is transformed, quantized and then entropy-coded. During quantization, the code rate can be controlled through the quantization parameter. The quantized residual of a pixel is also inverse-quantized, inverse-transformed and reconstructed, and the reconstructed pixel is stored so that, when this pixel later serves as a reference pixel, its reconstructed value can be used to derive the residuals of other pixels.
The Quantization Parameter may include a Quantization step size, a value representing or related to the Quantization step size, for example, a Quantization Parameter (QP) in h.264, h.265 or similar encoders, or a Quantization matrix or a reference matrix thereof, etc.
Fig. 3 shows a schematic diagram of data to be encoded according to an embodiment of the invention.
As shown in fig. 3, the data to be encoded 302 may include a plurality of frames 304. For example, the plurality of frames 304 may represent successive image frames in a video stream. Each frame 304 may include one or more stripes 306. Each slice 306 may include one or more macroblocks 308. Each macroblock 308 may include one or more blocks 310. Each block 310 may include one or more pixels 312. Each pixel 312 may include one or more data sets corresponding to one or more data portions, e.g., a luminance data portion and a chrominance data portion. The data units may be frames, slices, macroblocks, blocks, pixels or groups of any of the above. The size of the data units may vary in different embodiments. By way of example, a frame 304 may include 100 slices 306, each slice 306 may include 10 macroblocks 308, each macroblock 308 may include 4 (e.g., 2x2) blocks 310, and each block 310 may include 64 (e.g., 8x8) pixels 312.
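The example figures at the end of the paragraph imply a fixed pixel count per frame, which can be checked with a one-line calculation:

```python
# Example figures from the text: 100 slices per frame, 10 macroblocks per slice,
# 4 (2x2) blocks per macroblock, 64 (8x8) pixels per block.
slices, mbs_per_slice, blocks_per_mb, px_per_block = 100, 10, 4, 64
pixels_per_frame = slices * mbs_per_slice * blocks_per_mb * px_per_block
```

With these figures each frame contains 256,000 pixels, i.e. the hierarchy multiplies out level by level.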
The technical scheme of the embodiment of the invention can be used for coding the 360-degree panoramic video and can be applied to various products related to the 360-degree panoramic video, such as panoramic cameras, virtual reality products, Head Mounted Devices (HMDs), augmented reality products, video encoders, video decoders and the like.
A 360-degree panoramic video is usually mapped through a geometric relationship to generate a two-dimensional planar video, which is then processed by digital image processing and encoding/decoding. A common format for the two-dimensional planar map obtained from a 360-degree panorama through a specific geometric relationship is the longitude-latitude map (Equirectangular). The longitude-latitude map is the two-dimensional plane obtained by sampling the complete spherical surface over the azimuth angle θ and the pitch angle φ, as shown in fig. 4a.
Besides the longitude-latitude map, common mapped two-dimensional plane formats include the hexahedron, octahedron and icosahedron formats, and other mapping mechanisms can also be used to map a sphere onto a two-dimensional plane map. The mapped two-dimensional plane maps form a two-dimensional plane video, which can be encoded and compressed with common video coding standards such as HEVC/H.265, H.264/AVC, AVS1-P2, AVS2-P2, VP8 and VP9. The two-dimensional plane video may be obtained by mapping the whole spherical video or only part of it; the spherical (or partial spherical) video is usually captured by multiple cameras.
In the embodiment of the present invention, unless otherwise specified, the processing of the 360-degree panoramic video is the processing of the two-dimensional planar video thereof.
In order to improve the compression efficiency and the coding quality of a 360-degree panoramic video and improve the coding efficiency, the embodiment of the invention provides a rotary coding scheme applied to 360-degree panoramic video coding.
As shown in fig. 4b, assuming that the rotation angle of the 360-degree panoramic video is (α, β, γ), each frame of the 360-degree panoramic video is rotated around the x-axis, y-axis and z-axis by the angles α, β and γ, respectively, to obtain the rotated video.
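The per-frame rotation by (α, β, γ) can be sketched with standard 3-D rotation matrices. The composition order Rz·Ry·Rx below is an assumption for illustration; the patent text does not fix a particular order:

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rotate_point(v, alpha, beta, gamma):
    """Rotate a point on the sphere around the x, y, z axes in turn (order assumed)."""
    return apply(rot_z(gamma), apply(rot_y(beta), apply(rot_x(alpha), v)))
```

In practice each spherical sample of the frame is rotated this way (or, equivalently, the inverse rotation is applied when resampling the equirectangular map).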
Fig. 5 shows a schematic flow chart of a method 500 of video encoding of an embodiment of the present invention. The method 500 may be performed by the system 100 shown in fig. 1.
And 510, acquiring motion vector information of a first preset frame number image in the 360-degree panoramic video through the image signal processor.
In the embodiment of the invention, the motion vector information is acquired by the image signal processor, so that a secondary encoding process can be avoided.
The video is preprocessed by the image signal processor to obtain the motion vector information of each pixel point or each area of each frame of image in the video. The motion vector information of different image blocks in the image can be calculated based on the motion vector information of each pixel point or each region of the image.
In the embodiment of the invention, the motion vector information of the first preset frame number image in the 360-degree panoramic video is obtained from the image signal processor. Optionally, the motion vector information of the first preset frame number image is a local motion vector or a global motion vector of the first preset frame number image in the image signal processor.
And 520, determining a minimum motion vector value and a pole motion vector value according to the motion vector information, wherein the minimum motion vector value is a motion vector value of a region with the minimum motion vector value in the first preset frame number image, and the pole motion vector value is a motion vector value of a two-pole region corresponding to the 360-degree panoramic video in the first preset frame number image.
The motion vector values of the different regions may be calculated based on motion vector information of the first preset number of frames image acquired from the image signal processor. Alternatively, the area may be an image Block (Block), that is, the image may be divided into image blocks, and a motion vector value of each image Block is calculated, but the embodiment of the present invention is not limited thereto.
Specifically, for a frame of image, the image is firstly divided into blocks, each image Block obtained by the division is marked as a Block, and a motion vector value of each Block in the image is obtained by calculating motion vector information of each pixel or each region (smaller than the Block).
Optionally, the image block includes a plurality of Coding Tree Units (CTUs), and the motion vector value of the image block is an average value of the motion vector values of the CTUs. The motion vector value of each CTU may be a sum of an absolute value of a horizontal component and an absolute value of a vertical component of the motion vector of all the pixels of each CTU.
For example, a Block may be 512x512 in size and a CTU 64x64, i.e. one Block is 8x8 CTUs. The CTU is taken as the Motion Vector (MV) calculation unit. The image is divided into Blocks of 8x8 CTUs; at the image boundary, where fewer than 8x8 CTUs remain, a Block is formed from the actual number of remaining CTUs. Let the MV of one CTU be denoted MVctu; the value of MVctu may be the sum, over all pixels of the 64x64 pixel block, of the absolute value of the horizontal component (MVx) and the absolute value of the vertical component (MVy) of the pixel MVs, that is:

MVctu = Σ (|MVx| + |MVy|), summed over all pixels in the CTU.

The MV of each CTU in the picture can be calculated according to the above formula. The MV of each Block, denoted MVblock, is then calculated from the MVs of the CTUs in the image. Optionally, the MV value of each Block may be the average of the MVs of all CTUs in that Block.
After the motion vector value of each Block in each frame image in the first preset frame number image is obtained, the minimum motion vector value and the pole motion vector value can be determined according to the motion vector value.
Optionally, when the first preset frame number is 1, that is, when only one frame image is taken, the minimum motion vector value is a motion vector value minMVblock of an image block with a minimum motion vector value in the frame image.
Optionally, when the first preset frame number is greater than 1, the minimum motion vector value is the minimum value, denoted avgMinMVblock, among the averages of the motion vector values of the image blocks at the same position across the first preset number of frames. For example, with a first preset frame number of 5, the minimum motion vector value is the minimum among the per-position averages of the image-block motion vector values over the 5 frames.
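The multi-frame minimum described above (average each block position across the frames, then take the smallest average) can be sketched as:

```python
def min_avg_block_mv(frames):
    """frames: one list of MVblock values per frame, blocks in the same order.
    Returns the minimum of the per-position averages across all frames."""
    n_frames = len(frames)
    n_blocks = len(frames[0])
    averages = [sum(f[i] for f in frames) / n_frames for i in range(n_blocks)]
    return min(averages)
```

With a single frame this reduces to the minimum MVblock of that frame, matching the first-preset-frame-number-equals-1 case.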
Considering that the spherical video has symmetry, only the upper half of the spherical video is actually processed. For example, for each Block in the longitude and latitude map, a Block which is symmetrical about the center of the sphere in the sphere is found first, and the average value of the two MVblocks is used as the MVs of the two blocks, so that only the upper half part of the longitude and latitude map needs to be searched in the searching and calculating process, and half of the calculated amount can be saved.
The pole motion vector value is the motion vector value of the region of the first preset number of frames that corresponds to the two polar regions of the 360-degree panoramic video. Optionally, for one frame, the average of the motion vector values of a predetermined number of Blocks in the polar region may be taken as the pole motion vector value. For example, the top two rows of CTUs in the longitude-latitude map may be selected for the calculation: first, each MVctu of the top two rows is averaged with the MVctu of the CTU that is centrally symmetric to it, giving the actual MVctu values; then the MVctu values of all CTUs in the top two rows of the longitude-latitude map are averaged to give the pole motion vector value, denoted polarMVctu. For a multi-frame image, the average of the per-frame pole motion vector values obtained in this way may be used as the pole motion vector value. For example, with a first preset frame number of 5, the pole motion vector value avgPolarMVctu of the 5 frames is the average of polarMVctu over the 5 frames.
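The pole MV computation described above can be sketched in two steps, per frame and across frames. The function names are illustrative:

```python
def polar_mv_frame(top_row_mvs, symmetric_mvs):
    """polarMVctu for one frame: average each top-row MVctu with its centrally
    symmetric counterpart, then average over all the CTUs considered."""
    paired = [(a + b) / 2.0 for a, b in zip(top_row_mvs, symmetric_mvs)]
    return sum(paired) / len(paired)

def avg_polar_mv(per_frame_polar):
    """avgPolarMVctu: average of polarMVctu over the preset number of frames."""
    return sum(per_frame_polar) / len(per_frame_polar)
```

The pairing with the centrally symmetric CTUs is the same symmetry trick used earlier to halve the search workload over the sphere.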
And 530, determining the rotation angle of the second preset frame number image in the 360-degree panoramic video according to the minimum motion vector value, the pole motion vector value and the motion vector value threshold.
The minimum motion vector value and the pole motion vector value are compared with the motion vector value threshold to decide whether the second preset number of frames in the 360-degree panoramic video needs to be rotated.
It should be understood that, in the embodiment of the present invention, if the rotation angle is 0, it means that no rotation is performed; if the rotation angle is not 0, it indicates that the rotation is performed.
In the embodiment of the invention, the rotation angle of the second preset frame number image is determined by using the minimum motion vector value and the pole motion vector value obtained according to the first preset frame number image. Optionally, the first preset frame number image may be an image of a preset frame number after the first random entry point of the 360-degree panoramic video. The second preset frame number image may be an image of a preset frame number between the first random entry point and the second random entry point of the 360-degree panoramic video, wherein the second random entry point is the next random entry point after the first random entry point.
Optionally, the first preset frame number is less than or equal to the second preset frame number. That is, the rotation angle of a larger number of frames can be determined from a smaller number of frames. For example, the rotation angles of all images from a random entry point to the next random entry point may be determined according to several frames of images after that random entry point. Since the rotation judgment is carried out for each random access point, rotation errors in later images of the video can be avoided, and more accurate rotation encoding is realized.
Alternatively, the motion vector value threshold may be determined according to a motion search range of the 360-degree panoramic video. Alternatively, the motion search range may be determined by the image signal processor.
Specifically, assuming that the size of the motion search range obtained from the image signal processor is (±H, ±V), that is, the search range in the horizontal direction is [-H, H] and the search range in the vertical direction is [-V, V], the motion vector value threshold MVlow may be:

MVlow=64×(H+V)
for example, in a specific embodiment, the motion search range may be (± 16, ± 8), and therefore, the motion vector value threshold MVlow calculated according to the above formula is 1536.
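As a hedged sketch: the threshold formula is not legible in the text above, but a form such as MVlow = 64 × (H + V) reproduces the quoted example, where (±16, ±8) gives 1536. The scale factor 64 is an assumption, not confirmed by the source:

```python
def mv_low(h, v, scale=64):
    """Motion vector value threshold MVlow from a motion search range (±h, ±v).

    `scale` (64 here) is an assumed constant chosen so that the example in
    the text holds: a search range of (±16, ±8) yields MVlow = 1536.
    """
    return scale * (h + v)
```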
Performing the rotation judgment with a motion vector value threshold designed according to the motion search range makes the method applicable to encoding under various motion search conditions.
The minimum motion vector value and the pole motion vector value are compared with the motion vector value threshold to determine the rotation angle of the second preset frame number image.
Optionally, if the pole motion vector value is smaller than the motion vector value threshold, the rotation angle is determined to be 0. This case indicates that the motion vector values of the pole regions are small, and therefore no rotation is required.
Optionally, if the pole motion vector value is not less than the motion vector value threshold and the minimum motion vector value is less than the motion vector value threshold, the rotation angle is determined to be the first angle. This case indicates that the pole regions have large motion vector values while some other region has a small motion vector value, so rotation is required to rotate the region with the minimum motion vector value to the pole region. Optionally, the rotation may proceed as follows: the coordinates on the corresponding spherical surface are calculated from the coordinates of the center point of the region with the minimum motion vector value on the longitude and latitude map; the rotation angle (i.e. the first angle, whose calculation formula is given below) is then calculated by the rotation formula; finally, the rotation may be performed according to the geometric spherical rotation shown in fig. 4 b.
The first angle is determined according to the position of the area with the minimum motion vector value in the first preset frame number image.
Specifically, the first angle may be determined according to the position of the region with the minimum motion vector value in the first preset frame number image. For example, let the coordinates of the center point of the region with the minimum motion vector value (such as a Block) be (m, n). The corresponding spherical coordinates (φ, θ) may be obtained from (m, n) through the mapping relation between sample points on the longitude and latitude map and points on the spherical surface. The specific conversion formulas are as follows:

u=(m+0.5)/W, v=(n+0.5)/H

φ=(u-0.5)*2π, θ=(0.5-v)*π
where W and H are the width and height of the longitude and latitude map, respectively. The rotation angle, namely the first angle (α, β, 0), is then calculated according to the following formulas:
α=90°-θ×180°/π, β=-φ×180°/π
in the case of rotation, the rotation angle is the first angle (α, β, 0), and the region with the minimum motion vector value can be rotated to the pole region by rotating according to the rotation mode shown in fig. 4 b.
Optionally, if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is greater than a predetermined multiple of the minimum motion vector value, the rotation angle is determined to be the first angle. This case indicates that although the minimum motion vector value is also large, the pole motion vector value is much larger than it, so rotation by the first angle is still needed to rotate the region with the smallest motion vector value to the pole region.
Alternatively, the predetermined multiple may be 8, but the embodiment of the present invention is not limited thereto; any multiple that indicates the pole motion vector value is much larger than the minimum motion vector value may be used.
Optionally, if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is not greater than a predetermined multiple of the minimum motion vector value, the rotation angle is determined to be 0. This case indicates that the minimum motion vector value and the pole motion vector value are both large, but the pole motion vector value is not much larger than the minimum motion vector value, so no rotation is required.
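The four decision cases above can be combined into one sketch (hedged; `first_angle` stands for whatever angle was computed from the position of the minimum-MV region):

```python
def rotation_angle(polar_mv, min_mv, mv_low, first_angle, multiple=8):
    """Decide the rotation angle from the pole MV, minimum MV and threshold.

    Returns 0 (no rotation) or first_angle, following the four cases:
      1. polar_mv <  mv_low                              -> 0
      2. polar_mv >= mv_low and min_mv < mv_low          -> first_angle
      3. both >= mv_low and polar_mv >  multiple*min_mv  -> first_angle
      4. both >= mv_low and polar_mv <= multiple*min_mv  -> 0
    """
    if polar_mv < mv_low:
        return 0                 # pole motion is small: no rotation
    if min_mv < mv_low:
        return first_angle       # a quiet region exists: rotate it to the pole
    if polar_mv > multiple * min_mv:
        return first_angle       # pole MV much larger than the minimum MV
    return 0                     # both large and comparable: no rotation
```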
In 540, the second preset frame number image is rotated according to the rotation angle and then encoded.
Specifically, if the rotation angle is 0, no rotation is performed; if the rotation angle is the first angle, the image is rotated by the first angle. After the rotation operation, the image is encoded.
Alternatively, the rotation and encoding operations for the 360-degree panoramic video may both be performed by an encoder, or may be performed separately by different processing units.
Alternatively, in one embodiment, the rotation angle may be calculated before the original video is sent to the video encoder for encoding, and then the rotation angle and the original video are both sent to the video encoder. The video encoder rotates the video and then encodes it, writing the rotation angle into the code stream during encoding; the decoding end obtains the rotation angle from the code stream and rotates the decoded video back. The video rotation angle information may be written in a sequence header, a picture header, a slice header, a video parameter set (VPS), a sequence parameter set (SPS), a picture parameter set (PPS), supplemental enhancement information (SEI), or extension data.
Alternatively, in another embodiment, the rotation angle may be calculated before the original video is sent to the video encoder for encoding; the original video is then rotated by a tool capable of rotating video, and the rotated video is sent to the video encoder for encoding. Because 360-degree panoramic video is viewed through head-mounted devices, virtual reality products, and the like, rotating the video does not affect the information that can be viewed, so the decoded video does not need to be rotated back.
Fig. 6 is a flowchart of a video encoding method according to an embodiment of the invention. It should be understood that fig. 6 is only an example and should not be taken as limiting the embodiments of the present invention.
As shown in fig. 6, before a 360-degree panoramic video is encoded, it is determined whether rotation is required. The pole MV and the minimum MV are obtained by calculating the MVs of the CTUs in the image, and the rotation angle is calculated according to the position corresponding to the minimum MV. The pole MV and the minimum MV are compared with a threshold to judge whether rotation is needed; whether the second preset frame number image needs to be rotated may be determined using the pole MV and the minimum MV obtained from the first preset frame number image. If rotation is required, the corresponding images are rotated by the rotation angle and then encoded; if rotation is not required, encoding is performed directly.
According to the technical solution of the embodiment of the invention, the motion vector information is obtained through the image signal processor, the minimum motion vector value and the pole motion vector value are determined according to the motion vector information, and the rotation angle of the 360-degree panoramic video is determined according to the minimum motion vector value and the pole motion vector value. Rotation encoding of the 360-degree panoramic video can thus be realized without secondary encoding, which can improve the compression efficiency, the encoding quality, and the encoding efficiency of the 360-degree panoramic video.
Having described the method of video encoding of an embodiment of the present invention in detail above, an apparatus and a computer system for video encoding of an embodiment of the present invention will be described below.
Fig. 7 shows a schematic block diagram of an apparatus 700 for video encoding according to an embodiment of the present invention. The apparatus 700 may perform the method of video encoding according to the embodiment of the present invention described above.
As shown in fig. 7, the apparatus 700 may include:
an image signal processor 710, configured to obtain motion vector information of a first preset frame number image in a 360-degree panoramic video;
a processing unit 720, configured to determine a minimum motion vector value and a pole motion vector value according to the motion vector information, where the minimum motion vector value is a motion vector value of a region with a minimum motion vector value in the first preset frame number image, and the pole motion vector value is a motion vector value of a two-pole region corresponding to the 360-degree panoramic video in the first preset frame number image; determining a rotation angle of a second preset frame number image in the 360-degree panoramic video according to the minimum motion vector value, the pole motion vector value and a motion vector value threshold;
a rotation unit 730, configured to rotate the second preset frame number image according to the rotation angle;
and an encoding unit 740 for encoding the rotated image.
Optionally, the processing unit 720 is further configured to:
and determining the motion vector value threshold according to the motion search range of the 360-degree panoramic video.
Optionally, the image signal processor is further configured to:
determining the motion search range.
Optionally, the motion vector information of the first preset frame number image is a local motion vector or a global motion vector of the first preset frame number image in the image signal processor.
Optionally, the first preset frame number image is an image with a preset frame number after the first random entry point of the 360-degree panoramic video.
Optionally, the second preset frame number image is an image of a preset frame number between the first random entry point and a second random entry point of the 360-degree panoramic video, where the second random entry point is a next random entry point of the first random entry point.
Optionally, the first preset frame number is less than or equal to the second preset frame number.
Optionally, the processing unit 720 is specifically configured to:
if the pole motion vector value is smaller than the motion vector value threshold, determining that the rotation angle is 0; or,
if the pole motion vector value is not less than the motion vector value threshold and the minimum motion vector value is less than the motion vector value threshold, determining that the rotation angle is a first angle; or,
if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is greater than the minimum motion vector value of a predetermined multiple, determining that the rotation angle is a first angle; or,
if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is not greater than the minimum motion vector value of a predetermined multiple, determining that the rotation angle is 0;
and the first angle is determined according to the position of the area with the minimum motion vector value in the first preset frame number image.
Optionally, the predetermined multiple is 8.
Optionally, the processing unit 720 is further configured to:
and determining the first angle according to the position of the region with the minimum motion vector value in the first preset frame number image.
Optionally, the minimum motion vector value is a minimum value of average values of motion vector values of image blocks corresponding to the same position in the first preset frame number image, where the image block includes a plurality of pixel points.
Optionally, the image block includes a plurality of coding tree units (CTUs), and the motion vector value of the image block is the average of the motion vector values of the plurality of CTUs, where the motion vector value of each CTU is the sum, over all pixel points of that CTU, of the absolute value of the horizontal component and the absolute value of the vertical component of the motion vector.
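As a hedged sketch of the MV value definitions in this paragraph, assuming per-pixel motion vector component fields are available for each CTU:

```python
import numpy as np

def ctu_mv_value(mvx, mvy):
    """MV value of one CTU: sum over all its pixel points of
    |horizontal component| + |vertical component| of the motion vector."""
    return float(np.abs(mvx).sum() + np.abs(mvy).sum())

def block_mv_value(ctu_mv_values):
    """MV value of an image block: average of its CTUs' MV values."""
    return sum(ctu_mv_values) / len(ctu_mv_values)
```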
Alternatively, the rotation unit 730 and the encoding unit 740 may be implemented by an encoder, or implemented separately, for example, the rotation unit 730 is implemented by a tool capable of rotating the video, and the encoding unit 740 is implemented by an encoder.
It should be understood that the apparatus for video encoding according to the above embodiment of the present invention may be a chip, which may be specifically implemented by a circuit, but the embodiment of the present invention is not limited to a specific implementation form.
FIG. 8 shows a schematic block diagram of a computer system 800 of an embodiment of the invention.
As shown in fig. 8, the computer system 800 may include a processor 810 and a memory 820.
It should be understood that the computer system 800 may also include other components commonly included in computer systems, such as input/output devices, communication interfaces, etc., which are not limited by the embodiments of the present invention.
The memory 820 is used to store computer executable instructions.
The Memory 820 may be various types of memories, and may include a Random Access Memory (RAM), and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory, which is not limited in this embodiment of the present invention.
The processor 810 is configured to access the memory 820 and execute the computer-executable instructions to perform the operations of the method for video encoding of an embodiment of the present invention described above.
The processor 810 may include a microprocessor, a Field-Programmable gate array (FPGA), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and the like, which are not limited in the embodiments of the present invention.
The video encoding apparatus and the computer system according to the embodiments of the present invention may correspond to the execution main body of the video encoding method according to the embodiments of the present invention, and the above and other operations and/or functions of each module in the video encoding apparatus and the computer system are respectively for implementing corresponding flows of the foregoing methods, and are not described herein again for brevity.
The embodiment of the present invention further provides an electronic device, which may include the video coding apparatus or the computer system according to the various embodiments of the present invention.
Embodiments of the present invention also provide a computer storage medium having a program code stored therein, where the program code may be used to instruct a method for performing video coding according to the above embodiments of the present invention.
It should be understood that, in the embodiment of the present invention, the term "and/or" is only one kind of association relation describing an associated object, and means that three kinds of relations may exist. For example, a and/or B, may represent: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (25)

1. A method of video encoding, comprising:
acquiring motion vector information of a first preset frame number image in a 360-degree panoramic video through an image signal processor;
determining a minimum motion vector value and a pole motion vector value according to the motion vector information, wherein the minimum motion vector value is a motion vector value of a region with the minimum motion vector value in the first preset frame number image, and the pole motion vector value is a motion vector value of a two-pole region corresponding to the 360-degree panoramic video in the first preset frame number image;
determining a rotation angle of a second preset frame number image in the 360-degree panoramic video according to the minimum motion vector value, the pole motion vector value and a motion vector value threshold;
and rotating the second preset frame number image according to the rotation angle, and then encoding.
2. The method of claim 1, further comprising:
and determining the motion vector value threshold according to the motion search range of the 360-degree panoramic video.
3. The method of claim 2, further comprising:
determining, by the image signal processor, the motion search range.
4. The method according to any one of claims 1 to 3, wherein the motion vector information of the first predetermined frame number image is a local motion vector or a global motion vector of the first predetermined frame number image in the image signal processor.
5. The method as claimed in any one of claims 1 to 4, wherein the first predetermined frame number image is a predetermined frame number image after the first random access point of the 360-degree panoramic video.
6. The method of claim 5, wherein the second predetermined frame number image is a predetermined frame number image from the first random access point to a second random access point of the 360-degree panoramic video, and wherein the second random access point is a next random access point to the first random access point.
7. The method of any one of claims 1 to 6, wherein the first preset number of frames is less than or equal to the second preset number of frames.
8. The method according to any one of claims 1 to 7, wherein said determining a rotation angle of a second preset frame number of images in the 360 degree panoramic video according to the minimum motion vector value, the pole motion vector value, and a motion vector value threshold comprises:
if the pole motion vector value is smaller than the motion vector value threshold, determining that the rotation angle is 0; or,
if the pole motion vector value is not less than the motion vector value threshold and the minimum motion vector value is less than the motion vector value threshold, determining that the rotation angle is a first angle; or,
if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is greater than the minimum motion vector value of a predetermined multiple, determining that the rotation angle is a first angle; or,
if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is not greater than the minimum motion vector value of a predetermined multiple, determining that the rotation angle is 0;
and the first angle is determined according to the position of the area with the minimum motion vector value in the first preset frame number image.
9. The method of claim 8, wherein the predetermined multiple is 8.
10. The method according to claim 8 or 9, characterized in that the method further comprises:
and determining the first angle according to the position of the region with the minimum motion vector value in the first preset frame number image.
11. The method according to any one of claims 1 to 10, wherein the minimum motion vector value is a minimum value of an average value of motion vector values of image blocks corresponding to the same position in the first preset frame number image, wherein the image blocks include a plurality of pixel points.
12. The method according to claim 11, wherein the image block comprises a plurality of Coding Tree Units (CTUs), and the motion vector value of the image block is an average value of the motion vector values of the plurality of CTUs, wherein the motion vector value of each CTU is a sum of an absolute value of a horizontal component and an absolute value of a vertical component of the motion vector of all pixels of the each CTU.
13. An apparatus for video encoding, comprising:
the image signal processor is used for acquiring motion vector information of a first preset frame number image in the 360-degree panoramic video;
a processing unit, configured to determine a minimum motion vector value and a pole motion vector value according to the motion vector information, where the minimum motion vector value is a motion vector value of a region with a minimum motion vector value in the first preset frame number image, and the pole motion vector value is a motion vector value of a two-pole region corresponding to the 360-degree panoramic video in the first preset frame number image; determining a rotation angle of a second preset frame number image in the 360-degree panoramic video according to the minimum motion vector value, the pole motion vector value and a motion vector value threshold;
a rotation unit, configured to rotate the second preset frame number image according to the rotation angle;
and an encoding unit for encoding the rotated image.
14. The apparatus of claim 13, wherein the processing unit is further configured to:
and determining the motion vector value threshold according to the motion search range of the 360-degree panoramic video.
15. The apparatus of claim 14, wherein the image signal processor is further configured to:
determining the motion search range.
16. The apparatus of any one of claims 13 to 15, wherein the motion vector information of the first predetermined frame number image is a local motion vector or a global motion vector of the first predetermined frame number image in the image signal processor.
17. The apparatus of any one of claims 13 to 16, wherein the first preset frame number image is a preset frame number image after a first random entry point of the 360-degree panoramic video.
18. The apparatus of claim 17, wherein the second predetermined frame number image is a predetermined frame number image from the first random access point to a second random access point of the 360-degree panoramic video, and wherein the second random access point is a next random access point to the first random access point.
19. The apparatus of any one of claims 13 to 18, wherein the first preset number of frames is less than or equal to the second preset number of frames.
20. The apparatus according to any one of claims 13 to 19, wherein the processing unit is specifically configured to:
if the pole motion vector value is smaller than the motion vector value threshold, determining that the rotation angle is 0; or,
if the pole motion vector value is not less than the motion vector value threshold and the minimum motion vector value is less than the motion vector value threshold, determining that the rotation angle is a first angle; or,
if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is greater than the minimum motion vector value of a predetermined multiple, determining that the rotation angle is a first angle; or,
if the pole motion vector value is not less than the motion vector value threshold, the minimum motion vector value is not less than the motion vector value threshold, and the pole motion vector value is not greater than the minimum motion vector value of a predetermined multiple, determining that the rotation angle is 0;
and the first angle is determined according to the position of the area with the minimum motion vector value in the first preset frame number image.
21. The apparatus of claim 20, wherein the predetermined multiple is 8.
22. The apparatus according to claim 20 or 21, wherein the processing unit is further configured to:
and determining the first angle according to the position of the region with the minimum motion vector value in the first preset frame number image.
23. The apparatus according to any one of claims 13 to 22, wherein the minimum motion vector value is a minimum value of an average value of motion vector values of image blocks corresponding to a same position in the first preset frame number image, wherein the image blocks comprise a plurality of pixels.
24. The apparatus of claim 23, wherein the image block comprises a plurality of Coding Tree Units (CTUs), and wherein a motion vector value of the image block is an average of motion vector values of the plurality of CTUs, and wherein a motion vector value of each CTU is a sum of an absolute value of a horizontal component and an absolute value of a vertical component of a motion vector of all pixels of the each CTU.
25. A computer system, comprising:
a memory for storing computer executable instructions;
a processor for accessing the memory and executing the computer-executable instructions to perform operations in the method of any one of claims 1 to 12.
CN201780018384.1A 2017-12-27 2017-12-27 The method, apparatus and computer system of Video coding Pending CN108886616A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/119000 WO2019127100A1 (en) 2017-12-27 2017-12-27 Video coding method, device, and computer system

Publications (1)

Publication Number Publication Date
CN108886616A true CN108886616A (en) 2018-11-23

Family

ID=64325696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780018384.1A Pending CN108886616A (en) 2017-12-27 2017-12-27 The method, apparatus and computer system of Video coding

Country Status (2)

Country Link
CN (1) CN108886616A (en)
WO (1) WO2019127100A1 (en)



Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8437538B2 (en) * 2009-09-29 2013-05-07 Peking University Volumetric image data processing
CN104063843B (en) * 2014-06-18 2017-07-28 长春理工大学 A kind of method of the integrated three-dimensional imaging element image generation based on central projection
US9277122B1 (en) * 2015-08-13 2016-03-01 Legend3D, Inc. System and method for removing camera rotation from a panoramic video

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
WO2017120776A1 (en) * 2016-01-12 2017-07-20 Shanghaitech University Calibration method and apparatus for panoramic stereo video system
CN107135397A (en) * 2017-04-28 2017-09-05 中国科学技术大学 A panoramic video encoding method and device

Non-Patent Citations (2)

Title
JILL BOYCE ET AL: "Spherical rotation orientation SEI for HEVC and AVC coding of 360 video", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 26th Meeting: Geneva, CH *
LIN CHANG ET AL: "Dual ring-band mapping algorithm for panoramic video" (全景视频双环带映射算法), Journal of Computer Applications (《计算机应用》) *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN113905234A (en) * 2019-01-03 2022-01-07 深圳市大疆创新科技有限公司 Video image processing method and device
CN113905235A (en) * 2019-01-03 2022-01-07 深圳市大疆创新科技有限公司 Video image processing method and device
US12155856B2 (en) 2019-01-03 2024-11-26 SZ DJI Technology Co., Ltd. Video image processing method and device
CN111294648A (en) * 2020-02-20 2020-06-16 成都纵横自动化技术股份有限公司 Unmanned aerial vehicle air-ground video transmission method
CN112367486A (en) * 2020-10-30 2021-02-12 维沃移动通信有限公司 Video processing method and device

Also Published As

Publication number Publication date
WO2019127100A1 (en) 2019-07-04

Similar Documents

Publication Publication Date Title
US11184641B2 (en) Coding spherical video data
WO2018095087A1 (en) Deblocking filter method and terminal
US20190191170A1 (en) System and method for improving efficiency in encoding/decoding a curved view video
US20180098090A1 (en) Method and Apparatus for Rearranging VR Video Format and Constrained Encoding Parameters
CN116248875A (en) Spherical projection motion estimation/compensation and mode decision
US20190297332A1 (en) System and method for supporting video bit stream switching
CN110268716B (en) Equivalent rectangular object data processing by spherical projection to compensate for distortion
US11138460B2 (en) Image processing method and apparatus
EP3434021B1 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
US10863198B2 (en) Intra-prediction method and device in image coding system for 360-degree video
CN108886616A (en) The method, apparatus and computer system of Video coding
KR102342874B1 (en) Video decoding method and apparatus using projection type-based quantization parameters in video coding system for 360 degree video
US20200267385A1 (en) Method for processing synchronised image, and apparatus therefor
US20210150665A1 (en) Image processing method and device
US20200374558A1 (en) Image decoding method and device using rotation parameters in image coding system for 360-degree video
US12155858B2 (en) Method for encoding video using effective differential motion vector transmission method in omnidirectional camera, and method and device
Sauer Efficient coding tools for 360° video
WO2019157718A1 (en) Motion compensation method, device and computer system
KR20190027405A (en) Method and apparatus for omnidirectional security video coding in using padding technique

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181123