WO2024164710A1 - Inter-frame coding method and apparatus, electronic device, and computer-readable medium - Google Patents
Inter-frame coding method and apparatus, electronic device, and computer-readable medium
- Publication number
- WO2024164710A1 WO2024164710A1 PCT/CN2023/139651 CN2023139651W WO2024164710A1 WO 2024164710 A1 WO2024164710 A1 WO 2024164710A1 CN 2023139651 W CN2023139651 W CN 2023139651W WO 2024164710 A1 WO2024164710 A1 WO 2024164710A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video frame
- prediction unit
- motion vector
- panoramic video
- candidate block
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 80
- 239000013598 vector Substances 0.000 claims abstract description 166
- 230000002123 temporal effect Effects 0.000 claims abstract description 34
- 230000008569 process Effects 0.000 claims abstract description 31
- 238000004364 calculation method Methods 0.000 claims description 18
- 238000012545 processing Methods 0.000 claims description 18
- 238000004891 communication Methods 0.000 claims description 16
- 238000004590 computer program Methods 0.000 claims description 8
- 230000000694 effects Effects 0.000 abstract description 9
- 238000010586 diagram Methods 0.000 description 13
- 230000009471 action Effects 0.000 description 3
- 238000013507 mapping Methods 0.000 description 3
- 230000008859 change Effects 0.000 description 2
- 238000003491 array Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
Definitions
- the present application relates to the field of image processing technology, and in particular to an inter-frame coding method, an inter-frame coding device, an electronic device and a computer-readable medium.
- the edge parts of the panoramic video can usually be cropped and spliced so that the panoramic video screen can be completely stored in a smaller video frame size.
- the panoramic video screen after cropping and splicing may have visually visible artificial traces, affecting the viewing effect.
- redundant pixels can be introduced into the video frame of the panoramic video.
- redundant pixels can lead to a decrease in coding efficiency.
- the embodiments of the present application provide an inter-frame coding method, device, electronic device and computer-readable medium to solve the problem of low coding efficiency and poor viewing effect.
- the embodiment of the present application discloses an inter-frame coding method, which is applied to a panoramic video, wherein the panoramic video includes at least one panoramic video frame, and the panoramic video frame includes a splicing boundary.
- the method includes:
- the current panoramic video frame is divided into a plurality of coding units, wherein the coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, wherein the co-located video frame has a co-located reference video frame;
- the co-located prediction unit has a co-located motion vector
- the co-located motion vector is scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, to obtain the temporal candidate motion vector of the prediction unit.
- the method further comprises:
- for the prediction unit, if the prediction unit is adjacent to the splicing boundary, at least one candidate block is determined based on the spherical neighbor relationship;
- the motion vector corresponding to the candidate block is used as the spatial candidate motion vector of the prediction unit.
- the step of determining at least one candidate block based on the spherical neighbor relationship includes:
- the candidate block to be confirmed is rotated to obtain a candidate block.
- the method further comprises:
- an interpolation process is performed on the spatial candidate motion vector of the prediction unit.
- the method further comprises:
- Motion estimation is performed based on the temporal candidate motion vectors and/or the spatial candidate motion vectors to determine an optimal motion vector.
- performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
- Sub-pixel precision motion estimation is performed based on the temporal candidate motion vector and/or the spatial candidate motion vector.
- the sub-pixel precision motion estimation includes sub-pixel interpolation calculation
- the step of performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector comprises:
- interpolation is performed based on at least one pixel around the prediction unit that is closest to the splicing boundary.
- the embodiment of the present application further discloses an inter-frame coding device, which is applied to a panoramic video, wherein the panoramic video includes at least one panoramic video frame, and the panoramic video frame includes a splicing boundary, and the device includes:
- a coding unit division module is used for dividing the current panoramic video frame into a plurality of coding units, wherein the coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame;
- a reference prediction unit determination module is used to determine a co-located prediction unit corresponding to the prediction unit in the co-located video frame; the co-located prediction unit has a co-located motion vector;
- the temporal candidate motion vector acquisition module is used to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scale the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, to obtain the temporal candidate motion vector of the prediction unit.
- the device further comprises:
- a candidate block determination module for determining at least one candidate block based on a spherical neighbor relationship for a prediction unit if the prediction unit is adjacent to a splicing boundary;
- the spatial candidate motion vector acquisition module is used to use the motion vector corresponding to the candidate block as the spatial candidate motion vector of the prediction unit.
- the candidate block determination module includes:
- a candidate block to be determined determination submodule used to determine at least one candidate block to be confirmed based on a spherical adjacent relationship
- a rotation processing determination submodule is used to determine whether the candidate block to be confirmed has been subjected to rotation processing
- the candidate block acquisition submodule is used to rotate the candidate block to be confirmed if the candidate block to be confirmed has been rotated to obtain the candidate block.
- the device further comprises:
- the interpolation processing module is used to perform interpolation processing on the spatial candidate motion vector of the prediction unit if the reference pixel area corresponding to the prediction unit has a different resolution from the prediction unit.
- the device further comprises:
- the optimal motion vector determination module is used to perform motion estimation based on the temporal candidate motion vectors and/or the spatial candidate motion vectors to determine the optimal motion vector.
- the optimal motion vector determination module includes:
- the motion estimation submodule is used to perform sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
- the sub-pixel precision motion estimation includes sub-pixel interpolation calculation
- the motion estimation submodule includes:
- the motion estimation unit is used to perform interpolation based on at least one pixel closest to the splicing boundary around the prediction unit during the sub-pixel interpolation calculation process if the prediction unit is adjacent to the splicing boundary.
- the embodiment of the present application also discloses an electronic device, including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;
- a memory, used to store a computer program
- the processor is used to implement the method of the embodiment of the present application when executing the program stored in the memory.
- the embodiments of the present application also disclose one or more computer-readable media on which instructions are stored, which, when executed by one or more processors, enable the processors to execute the methods of the embodiments of the present application.
- the current panoramic video frame is divided into a number of coding units, and the coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame; a co-located prediction unit corresponding to the prediction unit is determined in the co-located video frame; the co-located prediction unit has a co-located motion vector; for the prediction unit, if the prediction unit is adjacent to a splicing boundary, the co-located motion vector is scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
- for prediction units adjacent to the splicing boundary in a panoramic video frame, by considering both the horizontal and the vertical positional relationship in the process of determining the temporal candidate motion vector, the positional relationship between the current panoramic video frame and the reference video frame and that between the co-located video frame and the co-located reference video frame can be fully considered when an object moves across the splicing boundary, so that the motion vector corresponding to the prediction unit can be determined more accurately during encoding and the coding effect is improved without introducing additional redundant pixels, which can ultimately improve the viewing experience of the panoramic video.
- FIG1 is a schematic diagram of stitching a panoramic video provided in an embodiment of the present application.
- FIG2 is a schematic diagram of a panoramic video frame provided in an embodiment of the present application.
- FIG3 is a flowchart of a method for inter-frame coding provided in an embodiment of the present application.
- FIG4 is a schematic diagram of a video frame provided in an embodiment of the present application.
- FIG5 is a schematic diagram of candidate blocks in the process of establishing a spatial candidate list in the prior art.
- FIG6 is a schematic diagram of a panoramic video frame provided in an embodiment of the present application.
- FIG7 is a structural block diagram of an inter-frame coding device provided in an embodiment of the present application.
- FIG8 is a block diagram of an electronic device provided in an embodiment of the present application.
- FIG. 9 is a schematic diagram of a computer-readable medium provided in an embodiment of the present application.
- a panoramic video may include several panoramic video frames.
- a spherical equal area expansion mapping method may be used. After the panoramic video is mapped to a plane by this method, the boundary of the panoramic video may not be a regular rectangular shape. Therefore, in order to reduce redundant pixels as much as possible, the panoramic video is stored in a video frame of a smaller size, and the panoramic video may be cropped and spliced.
- Figure 1 is a schematic diagram of the stitching of a panoramic video according to an embodiment of the present application.
- the boundary of the panoramic video can be approximated to a triangle.
- the boundary of the panoramic video can be divided into part A, part B, and part C, and the pixels of part A and part C are rearranged and rotated so that the boundary of the panoramic video can be approximated to a regular rectangle.
- Figure 2 is a schematic diagram of a panoramic video frame according to an embodiment of the present application; there may be a splicing boundary between the spliced part and the unspliced part, which is used to distinguish the spliced image area from the unspliced one.
- the splicing boundary can be a solid black line, a solid white line, etc., and the present application does not limit this.
- in the embodiment of the present application, since the panoramic video frames in the panoramic video may have undergone cropping, splicing, rotation, and other processing, directly applying a traditional HEVC (High Efficiency Video Coding) encoding method to the panoramic video may not achieve the same coding effect as for ordinary video, making the viewing experience of the panoramic video poor. Therefore, the embodiment of the present application improves the inter-frame encoding method for panoramic video, so that the motion vector corresponding to the prediction unit can be determined more accurately during encoding and the coding effect is improved without introducing additional redundant pixels, which can ultimately improve the viewing experience of the panoramic video.
- HEVC High Efficiency Video Coding
- FIG. 3 a flowchart of an inter-frame coding method provided in an embodiment of the present application is shown, which is applied to a panoramic video, wherein the panoramic video includes at least one panoramic video frame, and the panoramic video frame includes a splicing boundary. Specifically, the following steps may be included:
- Step 301 for a current panoramic video frame, divide the current panoramic video frame into a plurality of coding units, where the coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, where the co-located video frame has a co-located reference video frame;
- a panoramic video may include at least one panoramic video frame.
- the panoramic video frames in the panoramic video may be encoded in sequence.
- the panoramic video frame currently to be encoded is segmented so that the panoramic video frame can be divided into a number of coding units (coding unit, CU).
- the coding unit may further include a prediction unit (prediction unit, PU).
- the prediction unit may be the coding unit itself, or the prediction unit may be obtained by further partitioning the coding unit.
- the panoramic video frame can first be divided into several coding tree units (CTUs).
- the coding tree unit can be further evenly divided into 4 coding units. Thereafter, depending on the prediction mode used, the coding unit can be further divided and some of the units can be used as prediction units.
- the current panoramic video frame may correspond to a reference video frame.
- the current panoramic video frame may be encoded based on the reference video frame.
- the current panoramic video frame may also have a co-located video frame, which may be the encoded video frame whose picture order count (POC) differs least from that of the current panoramic video frame.
- the panoramic video frame can refer to the co-located video frame to determine the motion vector, so that video encoding can be completed based on the change of the image in the time domain.
- the co-located video frame can also usually have its corresponding video frame used as a reference during encoding, that is, the co-located reference video frame.
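- to make the co-located frame selection concrete, a minimal Python sketch follows (illustrative only, not part of the patent text; the function name and data layout are assumptions); it picks the already-encoded frame whose POC is closest to the current frame's POC:

```python
def pick_colocated_frame(current_poc, encoded_pocs):
    """Return the POC of the encoded frame whose picture order count
    differs least from the current frame's POC (the co-located frame)."""
    return min(encoded_pocs, key=lambda poc: abs(poc - current_poc))

# Current frame has POC 8; encoded frames have POCs 0, 4, 6, 12.
print(pick_colocated_frame(8, [0, 4, 6, 12]))  # 6
```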
- Step 302 determining a co-located prediction unit corresponding to the prediction unit in the co-located video frame; the co-located prediction unit has a co-located motion vector;
- the co-located video frame there may be a co-located prediction unit corresponding to the prediction unit in the current panoramic video frame.
- two positions with a high correlation with the prediction unit can be found in the co-located video frame as candidate co-located prediction units, and finally one of the positions is determined as the co-located prediction unit corresponding to the prediction unit. Since the co-located video frame has been encoded, the co-located prediction unit may already have a corresponding co-located motion vector.
- Step 303 for the prediction unit, if the prediction unit is adjacent to the splicing boundary, the co-located motion vector is scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, to obtain the temporal candidate motion vector of the prediction unit.
- in existing encoding methods, usually only the horizontal positional relationship between the current panoramic video frame and the reference video frame is considered in the process of determining the temporal candidate motion vector.
- the prediction unit is adjacent to the splicing boundary in the current panoramic video frame
- the object in the panoramic video may move across the splicing boundary.
- moreover, the spliced part of the panoramic video frame may have undergone cropping, splicing, rotation, and other processing; in this case, using only the horizontal positional relationship to determine the temporal candidate motion vector may fail to describe the object's motion accurately.
- therefore, the co-located motion vector can be scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, and the temporal candidate motion vector is determined while considering both the horizontal and the vertical positional relationship, so that the motion vector corresponding to the prediction unit can be determined more accurately, the video coding effect can be improved, and the user's video viewing experience can ultimately be improved.
- FIG4 is a schematic diagram of a video frame provided in an embodiment of the present application.
- the horizontal distance between the current panoramic video frame and the reference video frame can be tb
- the horizontal distance between the co-located video frame and the co-located reference video frame can be td.
- tb and td can generally be measured as differences in picture order count.
- the vertical distance between the current panoramic video frame and the reference video frame can be b
- the vertical distance between the co-located video frame and the co-located reference video frame can be d.
- the temporal candidate motion vector of the prediction unit can be calculated by scaling the co-located motion vector $(mv_{col,x}, mv_{col,y})$; the original formula is an image that is not reproduced in the text, and a componentwise scaling of the following form is consistent with the variables defined here:

$$ mv_x = \frac{tb}{td}\cdot mv_{col,x}, \qquad mv_y = \frac{b}{d}\cdot mv_{col,y} $$

- where tb is the horizontal distance between the current panoramic video frame and the reference video frame, td is the horizontal distance between the co-located video frame and the co-located reference video frame, b is the vertical distance between the current panoramic video frame and the reference video frame, and d is the vertical distance between the co-located video frame and the co-located reference video frame.
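- as a concrete illustration, the componentwise scaling above can be sketched in Python as follows (illustrative only; the function name is invented, and the tb/td and b/d scaling mirrors the reconstructed formula above, which is itself an assumption):

```python
def scale_colocated_mv(mv_col, tb, td, b, d):
    """Scale a co-located motion vector into a temporal candidate motion vector.

    mv_col: (x, y) motion vector of the co-located prediction unit.
    tb, td: horizontal distances (current frame -> reference frame, and
            co-located frame -> co-located reference frame).
    b, d:   the corresponding vertical distances.
    """
    mv_x, mv_y = mv_col
    return (mv_x * tb / td, mv_y * b / d)

# Example: co-located MV (8, -4), with tb/td = 1/2 and b/d = 1/2
print(scale_colocated_mv((8, -4), tb=1, td=2, b=1, d=2))  # (4.0, -2.0)
```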
- the method further includes:
- several spatial candidate motion vectors in the spatial domain can also be determined to select the best motion vector between the temporal candidate motion vector and the spatial candidate motion vector for further motion estimation to complete the encoding of the image.
- as shown in FIG. 5, at least one of the candidate blocks A0, A1, B1, B0, and B2 adjacent to the current prediction unit (current PU) can usually be selected to provide the spatial candidate motion vector.
- the prediction unit when the prediction unit is adjacent to the splicing boundary in the current panoramic video frame, since the panoramic video frame is expanded in a spherical equal-area manner, the reference value of the candidate block can be lower when the original method is used to select the candidate block. Therefore, for the current panoramic video frame expanded in a spherical equal-area manner, at least one candidate block can be determined based on the spherical adjacent relationship, which can further improve the encoding accuracy of the panoramic video frame.
- the prediction unit may refer to the motion vector of the candidate block to determine its own motion vector.
- the motion vector corresponding to the candidate block may be used as the spatial candidate motion vector of the prediction unit.
- the step of determining at least one candidate block based on the spherical neighbor relationship includes:
- the candidate block to be confirmed may be located on the image area that has been rotated because the prediction unit is close to the splicing boundary. At this time, if the prediction unit directly refers to the motion vector of the candidate block to be confirmed, its accuracy may be relatively low. Therefore, it is necessary to determine whether the candidate block to be confirmed has been rotated in order to further process the candidate block to be confirmed.
- the candidate block to be confirmed can be rotated so that the candidate block to be confirmed can be at the same rotation angle as the prediction unit to obtain the candidate block. For example, if the prediction unit has not been rotated, and the candidate block to be confirmed has been rotated 180 degrees, the candidate block to be confirmed can be rotated 180 degrees to obtain a candidate block corresponding to the rotation angle of the prediction unit. After the rotation process is completed, the motion vector can be recalculated for the candidate block, and the recalculated motion vector can be used as the spatial candidate motion vector of the prediction unit.
- the prediction unit may also be located on a rotated image region.
- the prediction unit may be reversely rotated to restore it to its un-rotated state, and motion vector prediction may be performed in that un-rotated state. For example, if the prediction unit has been rotated 90 degrees clockwise, the prediction unit may be rotated 90 degrees counterclockwise to restore it to its un-rotated state.
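- for illustration, re-aligning a candidate block's motion vector to the prediction unit's rotation angle might look as follows in Python (a sketch only; the patent describes the rotation idea but gives no formula, so the 90-degree steps and the vector transforms are assumptions):

```python
def align_candidate_mv(mv, candidate_angle_deg, pu_angle_deg):
    """Rotate a candidate block's motion vector so that it matches the
    rotation angle of the prediction unit (sketch; conventions assumed).

    Angles are the rotations applied to the image regions during splicing;
    only multiples of 90 degrees are handled here.
    """
    delta = (pu_angle_deg - candidate_angle_deg) % 360
    x, y = mv
    rotations = {0: (x, y), 90: (-y, x), 180: (-x, -y), 270: (y, -x)}
    if delta not in rotations:
        raise ValueError("only multiples of 90 degrees are supported")
    return rotations[delta]

# A candidate taken from a region rotated 180 degrees relative to the PU:
print(align_candidate_mv((6, 2), candidate_angle_deg=180, pu_angle_deg=0))  # (-6, -2)
```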
- the method further includes:
- the prediction unit may have a reference pixel region, the reference pixel region may generally be located at an edge of a video frame, and the prediction unit may perform encoding processing based on the reference pixel region.
- Figure 6 is a schematic diagram of a panoramic video frame provided in an embodiment of the present application; since the splicing boundary is artificially added in post-processing, the resolution of the prediction unit in area F, that is, on the splicing boundary, may differ from that of the corresponding reference pixel area f.
- the calculated spatial candidate motion vector of the prediction unit may not be directly used for motion prediction and other processing.
- the spatial candidate motion vector of the prediction unit can be interpolated based on the motion vectors of other prediction units adjacent to the prediction unit, so that the spatial candidate motion vector of the prediction unit can be normally used for motion prediction and other processing.
- the interpolation calculation of the spatial candidate motion vector of the prediction unit can be performed using a formula of the following form (the original formula is an image that is not reproduced in the text; a distance-weighted combination of the neighboring motion vectors is one form consistent with the variables listed below):

$$ mv'_x = \frac{\sum_i w_i\, mv_{x,i}}{\sum_i w_i}, \qquad mv'_y = \frac{\sum_i w_i\, mv_{y,i}}{\sum_i w_i} $$

- where $w_i$ is the distance between the prediction unit and the i-th adjacent prediction unit; $mv_{x,i}$ and $mv_{y,i}$ are the horizontal and vertical components of the motion vectors of the other prediction units adjacent to the prediction unit; $mv_x$ and $mv_y$ are the horizontal and vertical components of the spatial candidate motion vector of the prediction unit before interpolation; and $(mv'_x, mv'_y)$ is the spatial candidate motion vector of the prediction unit calculated after interpolation.
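- a minimal Python sketch of such a distance-weighted interpolation follows (illustrative only; the weighted-average form mirrors the reconstructed formula above, which is itself an assumption):

```python
def interpolate_spatial_mv(neighbor_mvs, weights):
    """Distance-weighted combination of neighboring prediction units' MVs.

    neighbor_mvs: list of (x, y) motion vectors of PUs adjacent to the
                  current prediction unit.
    weights:      one w_i per neighbor (described by the patent as a
                  distance; the averaging form here is an assumption).
    """
    total = sum(weights)
    mv_x = sum(w * mv[0] for w, mv in zip(weights, neighbor_mvs)) / total
    mv_y = sum(w * mv[1] for w, mv in zip(weights, neighbor_mvs)) / total
    return (mv_x, mv_y)

print(interpolate_spatial_mv([(4, 0), (2, 2)], [1.0, 3.0]))  # (2.5, 1.5)
```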
- the method further includes:
- motion estimation can be further performed, and a search starting point is determined based on the temporal candidate motion vector and/or the spatial candidate motion vector to search for the optimal matching block, and the optimal motion vector is determined based on the optimal matching block.
- performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
- sub-pixel precision motion estimation may be used for motion estimation.
- sub-pixel precision motion estimation can be divided into sub-pixel interpolation and sub-search processes.
- the sub-pixel interpolation step can take the best integer-pixel motion vector found by integer-pixel motion estimation as the center, obtain the eight half-pixel points around it, and interpolate to obtain the sub-pixel reference blocks corresponding to these eight half-pixel points.
- the sub-search process can calculate the costs of these nine points (the eight half-pixel points and the one integer-pixel point) and take the point with the smallest cost as the best sub-pixel motion vector.
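- the nine-point half-pixel refinement can be sketched in Python as follows (illustrative only; the cost function is a stand-in for the encoder's actual matching cost):

```python
import itertools

def half_pel_refine(best_int_mv, cost):
    """Refine the best integer-pixel MV at half-pixel precision (sketch).

    cost(mv) returns the matching cost of a candidate MV expressed in
    half-pel units; the integer point and its eight half-pel neighbours
    (nine points in total) are evaluated and the cheapest one is kept.
    """
    cx, cy = best_int_mv[0] * 2, best_int_mv[1] * 2   # convert to half-pel units
    candidates = [(cx + dx, cy + dy)
                  for dx, dy in itertools.product((-1, 0, 1), repeat=2)]
    return min(candidates, key=cost)

# Toy cost whose minimum lies at (2.5, -1.0) in pixel units, i.e. (5, -2) in half-pel units
best = half_pel_refine((2, -1), cost=lambda mv: (mv[0] - 5) ** 2 + (mv[1] + 2) ** 2)
print(best)  # (5, -2)
```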
- sub-pixel precision motion estimation includes sub-pixel interpolation calculation
- the step of performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector comprises:
- if the prediction unit is adjacent to the splicing boundary, then in the process of determining the eight nearby half-pixel points, since the panoramic video frame is an equal-area expansion of the sphere, the eight half-pixel points directly adjacent to the best integer-pixel motion vector are not the best reference points for the calculation. Therefore, at least one pixel of the prediction unit that is closest to the splicing boundary and is in a spherical neighbor relationship can be selected as a reference pixel in the sub-pixel interpolation calculation for interpolation processing, thereby further improving the calculation accuracy.
- for the current panoramic video frame, the current panoramic video frame is divided into a plurality of coding units, the coding unit including a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame; a co-located prediction unit corresponding to the prediction unit is determined in the co-located video frame; the co-located prediction unit has a co-located motion vector; for the prediction unit, if the prediction unit is adjacent to a splicing boundary, the co-located motion vector is scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, to obtain the temporal candidate motion vector of the prediction unit.
- for a prediction unit adjacent to the splicing boundary in a panoramic video frame, by considering both the horizontal and the vertical positional relationship in the process of determining the temporal candidate motion vector, the positional relationship between the current panoramic video frame and the reference video frame and that between the co-located video frame and the co-located reference video frame can be fully considered when an object moves across the splicing boundary, so that the motion vector corresponding to the prediction unit can be determined more accurately during encoding and the coding effect is improved without introducing additional redundant pixels, which can ultimately improve the viewing experience of the panoramic video.
- FIG. 7 a block diagram of an inter-frame coding apparatus provided in an embodiment of the present application is shown, which is applied to a panoramic video.
- the panoramic video includes at least one panoramic video frame.
- the panoramic video frame includes a splicing boundary.
- the apparatus may include the following modules:
- the coding unit division module 701 is used for dividing the current panoramic video frame into a plurality of coding units, wherein the coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame;
- a reference prediction unit determination module 702 is used to determine a co-located prediction unit corresponding to the prediction unit in the co-located video frame; the co-located prediction unit has a co-located motion vector;
- the temporal candidate motion vector acquisition module 703 is used to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scale the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, to obtain the temporal candidate motion vector of the prediction unit.
- the device further comprises:
- a candidate block determination module for determining at least one candidate block based on a spherical neighbor relationship for a prediction unit if the prediction unit is adjacent to a splicing boundary;
- the spatial candidate motion vector acquisition module is used to use the motion vector corresponding to the candidate block as the spatial candidate motion vector of the prediction unit.
- the candidate block determination module includes:
- a candidate block to be determined determination submodule used to determine at least one candidate block to be confirmed based on a spherical adjacent relationship
- a rotation processing determination submodule is used to determine whether the candidate block to be confirmed has been subjected to rotation processing
- the candidate block acquisition submodule is used to rotate the candidate block to be confirmed if the candidate block to be confirmed has been rotated to obtain the candidate block.
- the device further comprises:
- the interpolation processing module is used to perform interpolation processing on the spatial candidate motion vector of the prediction unit if the reference pixel area corresponding to the prediction unit has a resolution different from that of the prediction unit.
- the device further comprises:
- the optimal motion vector determination module is used to perform motion estimation based on the temporal candidate motion vectors and/or the spatial candidate motion vectors to determine the optimal motion vector.
- the optimal motion vector determination module includes:
- the motion estimation submodule is used to perform sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
- the sub-pixel precision motion estimation includes sub-pixel interpolation calculation
- the motion estimation submodule includes:
- the motion estimation unit is used to perform interpolation based on at least one pixel closest to the splicing boundary around the prediction unit during the sub-pixel interpolation calculation process if the prediction unit is adjacent to the splicing boundary.
- as the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
- an embodiment of the present application further provides an electronic device, as shown in FIG8 , including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, wherein the processor 801, the communication interface 802, and the memory 803 communicate with each other through the communication bus 804.
- the memory 803 is used to store a computer program
- the processor 801 is used to implement the following steps when executing the program stored in the memory 803:
- the current panoramic video frame is divided into a plurality of coding units, wherein the coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, wherein the co-located video frame has a co-located reference video frame;
- the co-located prediction unit has a co-located motion vector
- the co-located motion vector is scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, to obtain the temporal candidate motion vector of the prediction unit.
- the method further comprises:
- for the prediction unit, if the prediction unit is adjacent to the splicing boundary, at least one candidate block is determined based on the spherical neighbor relationship;
- the motion vector corresponding to the candidate block is used as the spatial candidate motion vector of the prediction unit.
- the step of determining at least one candidate block based on the spherical neighbor relationship includes:
- the candidate block to be confirmed is rotated to obtain a candidate block.
- the method further comprises:
- an interpolation process is performed on the spatial candidate motion vector of the prediction unit.
- the method further comprises:
- Motion estimation is performed based on the temporal candidate motion vectors and/or the spatial candidate motion vectors to determine an optimal motion vector.
- performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
- Sub-pixel precision motion estimation is performed based on the temporal candidate motion vector and/or the spatial candidate motion vector.
- the sub-pixel precision motion estimation includes sub-pixel interpolation calculation
- the step of performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector comprises:
- interpolation is performed based on at least one pixel around the prediction unit that is closest to the splicing boundary.
- the communication bus mentioned in the above terminal can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
- PCI Peripheral Component Interconnect
- EISA Extended Industry Standard Architecture
- the communication bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
- the communication interface is used for communication between the above terminal and other devices.
- the memory may include a random access memory (RAM) or a non-volatile memory, such as at least one disk storage.
- the memory may also be at least one storage device located away from the aforementioned processor.
- processors can be general-purpose processors, including central processing units (CPU), network processors (NP), etc.; they can also be digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components.
- CPU central processing units
- NP network processors
- DSP digital signal processors
- ASIC application specific integrated circuits
- FPGA field programmable gate arrays
- a computer-readable storage medium 901 is further provided, in which instructions are stored.
- when run on a computer, the instructions cause the computer to execute the inter-frame coding method in the above embodiments.
- a computer program product including instructions is also provided.
- when the computer program product is run on a computer, the computer executes the inter-frame coding method in the above embodiments.
- the computer program product includes one or more computer instructions.
- the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
- the computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
- the computer-readable storage medium can be any available medium that can be accessed by the computer or a data storage device such as a server or data center that includes one or more available media integrated.
- the available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), etc.
- SSD Solid State Disk
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Embodiments of the present application provide an inter-frame coding method and apparatus, an electronic device, and a computer-readable medium. The method includes: for a current panoramic video frame, dividing the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; and, for the prediction unit, if the prediction unit is adjacent to a splicing boundary, scaling the reference motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit. The positional relationship between the current panoramic video frame and the reference video frame and that between the co-located video frame and the co-located reference video frame can thus be fully considered, so that the motion vector corresponding to the prediction unit can be determined more accurately during encoding, the coding effect is improved, and the viewing experience of the panoramic video can ultimately be improved.
Description
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 10, 2023, with application number 202310097652.0 and entitled "Inter-frame coding method and apparatus, electronic device, and computer-readable medium", the entire contents of which are incorporated herein by reference.
The present application relates to the field of image processing technology, and in particular to an inter-frame coding method, an inter-frame coding apparatus, an electronic device, and a computer-readable medium.
Generally, in the process of mapping a panoramic video to a planar video for storage, in order to improve storage efficiency, the edge portions of the panoramic video can usually be cropped and spliced so that the panoramic video picture can be stored completely in a smaller video frame size. However, the panoramic video picture after cropping and splicing may show visually noticeable artificial traces, affecting the viewing effect. In the prior art, in order to ensure a good viewing effect, redundant pixels can be introduced into the video frames of the panoramic video. However, redundant pixels lead to a decrease in coding efficiency.
Summary of the Invention
Embodiments of the present application provide an inter-frame coding method, apparatus, electronic device, and computer-readable medium, to solve the problems of low coding efficiency and poor viewing effect.
An embodiment of the present application discloses an inter-frame coding method, applied to a panoramic video, where the panoramic video includes at least one panoramic video frame, the panoramic video frame contains a splicing boundary, and the method includes:
for a current panoramic video frame, dividing the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame;
determining, in the co-located video frame, a co-located prediction unit corresponding to the prediction unit, the co-located prediction unit having a co-located motion vector;
for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scaling the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
Optionally, the method further includes:
for the prediction unit, if the prediction unit is adjacent to the splicing boundary, determining at least one candidate block based on a spherical adjacency relationship;
using the motion vector corresponding to the candidate block as a spatial candidate motion vector of the prediction unit.
Optionally, the step of determining at least one candidate block based on a spherical adjacency relationship includes:
determining at least one candidate block to be confirmed based on the spherical adjacency relationship;
determining whether the candidate block to be confirmed has undergone rotation processing;
if the candidate block to be confirmed has undergone rotation processing, rotating the candidate block to be confirmed to obtain a candidate block.
Optionally, the method further includes:
if the reference pixel region corresponding to the prediction unit has a resolution different from that of the prediction unit, performing interpolation processing on the spatial candidate motion vector of the prediction unit.
Optionally, the method further includes:
performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector to determine an optimal motion vector.
Optionally, performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
Optionally, the sub-pixel precision motion estimation includes sub-pixel interpolation calculation;
the step of performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
during the sub-pixel interpolation calculation, if the prediction unit is adjacent to the splicing boundary, performing interpolation based on at least one pixel around the prediction unit that is closest to the splicing boundary.
An embodiment of the present application further discloses an inter-frame coding apparatus, applied to a panoramic video, where the panoramic video includes at least one panoramic video frame, the panoramic video frame contains a splicing boundary, and the apparatus includes:
a coding unit division module, configured to, for a current panoramic video frame, divide the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame;
a reference prediction unit determination module, configured to determine, in the co-located video frame, a co-located prediction unit corresponding to the prediction unit, the co-located prediction unit having a co-located motion vector;
a temporal candidate motion vector acquisition module, configured to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scale the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
Optionally, the apparatus further includes:
a candidate block determination module, configured to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, determine at least one candidate block based on a spherical adjacency relationship;
a spatial candidate motion vector acquisition module, configured to use the motion vector corresponding to the candidate block as a spatial candidate motion vector of the prediction unit.
Optionally, the candidate block determination module includes:
a candidate-block-to-be-determined determination submodule, configured to determine at least one candidate block to be confirmed based on the spherical adjacency relationship;
a rotation processing determination submodule, configured to determine whether the candidate block to be confirmed has undergone rotation processing;
a candidate block acquisition submodule, configured to, if the candidate block to be confirmed has undergone rotation processing, rotate the candidate block to be confirmed to obtain a candidate block.
Optionally, the apparatus further includes:
an interpolation processing module, configured to, if the reference pixel region corresponding to the prediction unit has a resolution different from that of the prediction unit, perform interpolation processing on the spatial candidate motion vector of the prediction unit.
Optionally, the apparatus further includes:
an optimal motion vector determination module, configured to perform motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector to determine an optimal motion vector.
Optionally, the optimal motion vector determination module includes:
a motion estimation submodule, configured to perform sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
Optionally, the sub-pixel precision motion estimation includes sub-pixel interpolation calculation;
the motion estimation submodule includes:
a motion estimation unit, configured to, during the sub-pixel interpolation calculation, if the prediction unit is adjacent to the splicing boundary, perform interpolation based on at least one pixel around the prediction unit that is closest to the splicing boundary.
An embodiment of the present application further discloses an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method according to the embodiments of the present application when executing the program stored in the memory.
Embodiments of the present application further disclose one or more computer-readable media having instructions stored thereon which, when executed by one or more processors, cause the processors to perform the method according to the embodiments of the present application.
The embodiments of the present application include the following advantages:
For the current panoramic video frame, the current panoramic video frame is divided into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame; a co-located prediction unit corresponding to the prediction unit is determined in the co-located video frame, the co-located prediction unit having a co-located motion vector; for the prediction unit, if the prediction unit is adjacent to the splicing boundary, the co-located motion vector is scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit. For prediction units adjacent to the splicing boundary in a panoramic video frame, by considering both the horizontal and the vertical positional relationship in the process of determining the temporal candidate motion vector, the positional relationship between the current panoramic video frame and the reference video frame and that between the co-located video frame and the co-located reference video frame can be fully considered when an object moves across the splicing boundary, so that the motion vector corresponding to the prediction unit can be determined more accurately during encoding, the coding effect is improved without introducing additional redundant pixels, and the viewing experience of the panoramic video can ultimately be improved.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those of ordinary skill in the art from these drawings without creative effort.
FIG. 1 is a schematic diagram of the splicing of a panoramic video provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a panoramic video frame provided in an embodiment of the present application;
FIG. 3 is a flowchart of the steps of an inter-frame coding method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a video frame provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of candidate blocks in the process of establishing a spatial candidate list in the prior art;
FIG. 6 is a schematic diagram of a panoramic video frame provided in an embodiment of the present application;
FIG. 7 is a structural block diagram of an inter-frame coding apparatus provided in an embodiment of the present application;
FIG. 8 is a block diagram of an electronic device provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a computer-readable medium provided in an embodiment of the present application.
To make the above objects, features, and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
In the embodiments of the present application, a panoramic video may contain several panoramic video frames. In the process of mapping the panoramic video to a plane, a spherical equal-area expansion mapping method may be used. After the panoramic video is mapped to a plane by this method, the boundary of the panoramic video may not be a regular rectangle. Therefore, in order to reduce redundant pixels as much as possible and store the panoramic video in a video frame of smaller size, the panoramic video may be cropped and spliced.
Specifically, as shown in FIG. 1, FIG. 1 is a schematic diagram of the splicing of a panoramic video according to an embodiment of the present application. In FIG. 1, the boundary of the panoramic video can be approximated by a triangle. In order to store the panoramic video in as small a space as possible, the boundary of the panoramic video can be divided into part A, part B, and part C, and the pixels of part A and part C can be rearranged and then rotated so that the boundary of the panoramic video approximates a regular rectangle. Meanwhile, in the processed panoramic video, as shown in FIG. 2, FIG. 2 is a schematic diagram of a panoramic video frame according to an embodiment of the present application, there may be a splicing boundary between the spliced part and the unspliced part, which is used to distinguish the spliced image area from the unspliced one. The splicing boundary may be a solid black line, a solid white line, etc., which is not limited in the present application.
In the embodiments of the present application, since the panoramic video frames in the panoramic video may have undergone cropping, splicing, rotation, and other processing, directly applying a traditional HEVC (High Efficiency Video Coding) encoding method to the panoramic video in this situation may not achieve the same coding effect as for ordinary video, making the viewing experience of the panoramic video poor. Therefore, the embodiments of the present application improve the inter-frame coding method for panoramic video, so that the motion vector corresponding to the prediction unit can be determined more accurately during encoding, the coding effect is improved without introducing additional redundant pixels, and the viewing experience of the panoramic video can ultimately be improved.
Referring to FIG. 3, a flowchart of the steps of an inter-frame coding method provided in an embodiment of the present application is shown. The method is applied to a panoramic video, the panoramic video includes at least one panoramic video frame, and the panoramic video frame contains a splicing boundary. Specifically, the method may include the following steps:
Step 301: for a current panoramic video frame, divide the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame;
In the embodiments of the present application, the panoramic video may include at least one panoramic video frame. During video encoding, the panoramic video frames in the panoramic video may be encoded in sequence. The panoramic video frame currently to be encoded is partitioned, so that the panoramic video frame can be divided into several coding units (coding unit, CU). A coding unit may further include a prediction unit (prediction unit, PU). Specifically, the prediction unit may be the coding unit itself, or the prediction unit may be obtained by further partitioning the coding unit.
In a specific implementation, during HEVC encoding, the panoramic video frame may first be divided into several coding tree units (coding tree units, CTU). A coding tree unit may be further evenly divided into four coding units. Thereafter, depending on the prediction mode used, a coding unit may be further partitioned, with some of the resulting units serving as prediction units.
In inter-frame coding, the current panoramic video frame may correspond to a reference video frame, and the current panoramic video frame may be encoded based on the reference video frame. The current panoramic video frame may also have a co-located video frame, which may be the encoded video frame whose picture order count (POC) differs least from that of the current panoramic video frame. The current panoramic video frame may refer to the co-located video frame to determine motion vectors, so that video encoding can be completed based on the temporal changes of the picture. The co-located video frame also usually has the video frame that served as its reference during encoding, i.e., the co-located reference video frame.
Step 302: determine, in the co-located video frame, a co-located prediction unit corresponding to the prediction unit, the co-located prediction unit having a co-located motion vector;
In the co-located video frame, there may be a co-located prediction unit corresponding to the prediction unit in the current panoramic video frame. Generally, two positions highly correlated with the prediction unit can be found in the co-located video frame as candidate co-located prediction units, and one of the positions is finally determined as the co-located prediction unit corresponding to the prediction unit. Since the co-located video frame has already been encoded, the co-located prediction unit may already have a corresponding co-located motion vector.
Step 303: for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scale the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
Generally, in existing encoding methods, the horizontal positional relationship between the current panoramic video frame and the reference video frame is usually considered in the process of determining the temporal candidate motion vector. However, when the prediction unit is adjacent to the splicing boundary in the current panoramic video frame, objects in the panoramic video may move across the splicing boundary, and the spliced part of the panoramic video frame may have undergone cropping, splicing, rotation, and other processing. In this case, determining the temporal candidate motion vector using only the horizontal positional relationship may fail to describe the object's motion accurately. Therefore, the co-located motion vector can be scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as those between the co-located video frame and the co-located reference video frame, and the temporal candidate motion vector is determined while considering both the horizontal and the vertical positional relationship, so that the motion vector corresponding to the prediction unit can be determined more accurately, the video coding effect can be improved, and the user's video viewing experience can ultimately be improved.
In a specific implementation, as shown in FIG. 4, FIG. 4 is a schematic diagram of a video frame provided in an embodiment of the present application. The horizontal distance between the current panoramic video frame and the reference video frame may be tb, and the horizontal distance between the co-located video frame and the co-located reference video frame may be td. Generally, tb and td can be measured as differences in picture order count. The vertical distance between the current panoramic video frame and the reference video frame may be b, and the vertical distance between the co-located video frame and the co-located reference video frame may be d. The temporal candidate motion vector of the prediction unit can then be calculated by scaling the co-located motion vector $(mv_{col,x}, mv_{col,y})$; the original formula is an image that is not reproduced in the text, and a componentwise scaling of the following form is consistent with the variables defined here:

$$ mv_x = \frac{tb}{td}\cdot mv_{col,x}, \qquad mv_y = \frac{b}{d}\cdot mv_{col,y} $$

where tb is the horizontal distance between the current panoramic video frame and the reference video frame, td is the horizontal distance between the co-located video frame and the co-located reference video frame, b is the vertical distance between the current panoramic video frame and the reference video frame, and d is the vertical distance between the co-located video frame and the co-located reference video frame.
In an embodiment of the present application, the method further includes:
S11: for the prediction unit, if the prediction unit is adjacent to the splicing boundary, determining at least one candidate block based on a spherical adjacency relationship;
In the inter-frame coding process, in addition to determining the temporal candidate motion vector, several spatial candidate motion vectors can also be determined, so that the best motion vector can be selected from among the temporal candidate motion vector and the spatial candidate motion vectors for further motion estimation, completing the encoding of the picture.
Regarding the determination of spatial candidate motion vectors, in the prior art, as shown in FIG. 5, at least one of the candidate blocks A0, A1, B1, B0, and B2 adjacent to the current prediction unit (current PU) can usually be selected to provide the spatial candidate motion vector. However, in the embodiments of the present application, when the prediction unit is adjacent to the splicing boundary in the current panoramic video frame, since the panoramic video frame is expanded from the sphere in an equal-area manner, the reference value of a candidate block selected in the original way may be low. Therefore, for a current panoramic video frame expanded in a spherical equal-area manner, at least one candidate block can be determined based on the spherical adjacency relationship, which can further improve the coding accuracy of the panoramic video frame.
S12: using the motion vector corresponding to the candidate block as a spatial candidate motion vector of the prediction unit.
In the embodiments of the present application, the prediction unit may refer to the motion vector of the candidate block to determine its own motion vector. Therefore, the motion vector corresponding to the candidate block can be used as the spatial candidate motion vector of the prediction unit.
In an embodiment of the present application, the step of determining at least one candidate block based on a spherical adjacency relationship includes:
S21: determining at least one candidate block to be confirmed based on the spherical adjacency relationship;
S22: determining whether the candidate block to be confirmed has undergone rotation processing;
Further, after at least one candidate block to be confirmed whose motion vector may serve as a reference is determined based on the spherical adjacency relationship, the candidate block to be confirmed may be located on an image region that has undergone rotation processing, because the prediction unit is close to the splicing boundary. If the prediction unit directly refers to the motion vector of this candidate block to be confirmed, the accuracy may be relatively low. Therefore, it is necessary to determine whether the candidate block to be confirmed has undergone rotation processing, so that it can be further processed.
S23: if the candidate block to be confirmed has undergone rotation processing, rotating the candidate block to be confirmed to obtain a candidate block.
Specifically, if the candidate block to be confirmed or the prediction unit has undergone rotation processing, then in order to accurately determine a motion vector that the prediction unit can refer to, the candidate block to be confirmed can be rotated so that it is at the same rotation angle as the prediction unit, thereby obtaining the candidate block. For example, if the prediction unit has not been rotated while the candidate block to be confirmed has been rotated by 180 degrees, the candidate block to be confirmed can be rotated by 180 degrees to obtain a candidate block corresponding to the rotation angle of the prediction unit. After the rotation processing is completed, the motion vector can be recalculated for the candidate block, and the recalculated motion vector can be used as the spatial candidate motion vector of the prediction unit.
Optionally, the prediction unit may also be located on an image region that has undergone rotation processing. In this case, the prediction unit can be reversely rotated to restore it to its un-rotated state, and motion vector prediction can be performed in that un-rotated state. For example, if the prediction unit has been rotated 90 degrees clockwise, the prediction unit can be rotated 90 degrees counterclockwise to restore it to its un-rotated state.
In an embodiment of the present application, the method further includes:
S31: if the reference pixel region corresponding to the prediction unit has a resolution different from that of the prediction unit, performing interpolation processing on the spatial candidate motion vector of the prediction unit.
Specifically, the prediction unit may have a reference pixel region, which may generally be located at the edge of a video frame, and the prediction unit may be encoded based on the reference pixel region.
If the prediction unit is located on the splicing boundary, since the splicing boundary is artificially added at a later stage, its resolution may differ from that of its corresponding reference pixel region. For example, as shown in FIG. 6, FIG. 6 is a schematic diagram of a panoramic video frame provided in an embodiment of the present application, in which the resolution of the prediction unit at region F, i.e., on the splicing boundary, may differ from that of its corresponding reference pixel region f.
In this case, the calculated spatial candidate motion vector of the prediction unit may not be directly usable for motion prediction and other processing. The spatial candidate motion vector of the prediction unit can then be interpolated based on the motion vectors of other prediction units adjacent to the prediction unit, so that it can be used normally for motion prediction and other processing.
In a specific implementation, the interpolation calculation of the spatial candidate motion vector of the prediction unit can be performed using a formula of the following form (the original formula is an image that is not reproduced in the text; a distance-weighted combination of the neighboring motion vectors is one form consistent with the variables listed below):

$$ mv'_x = \frac{\sum_i w_i\, mv_{x,i}}{\sum_i w_i}, \qquad mv'_y = \frac{\sum_i w_i\, mv_{y,i}}{\sum_i w_i} $$

where $w_i$ is the distance between the prediction unit and the i-th adjacent prediction unit; $mv_{x,i}$ and $mv_{y,i}$ are the horizontal and vertical components of the motion vectors of the other prediction units adjacent to the prediction unit; $mv_x$ and $mv_y$ are the horizontal and vertical components of the spatial candidate motion vector of the prediction unit before interpolation; and $(mv'_x, mv'_y)$ is the spatial candidate motion vector of the prediction unit calculated after interpolation.
In an embodiment of the present application, the method further includes:
S41: performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector to determine an optimal motion vector.
In the embodiments of the present application, after the temporal candidate motion vector and/or the spatial candidate motion vector of the prediction unit are determined, motion estimation can be further performed: a search starting point is determined based on the temporal candidate motion vector and/or the spatial candidate motion vector, a search is performed to find the optimal matching block, and the optimal motion vector is determined based on the optimal matching block.
In an embodiment of the present application, performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
S51: performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
In a specific implementation, in the process of performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector, sub-pixel precision motion estimation can be used in order to further improve the efficiency of motion estimation.
Specifically, sub-pixel precision motion estimation can be divided into a sub-pixel interpolation process and a sub-search process. The sub-pixel interpolation step can take the best integer-pixel motion vector found by integer-pixel motion estimation as the center, obtain the eight half-pixel points around it, and interpolate to obtain the sub-pixel reference blocks corresponding to these eight half-pixel points. The sub-search process can calculate the costs of these nine points (the eight half-pixel points and the one integer-pixel point) and take the point with the smallest cost as the best sub-pixel motion vector.
In an embodiment of the present application, the sub-pixel precision motion estimation includes sub-pixel interpolation calculation;
the step of performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
S61: during the sub-pixel interpolation calculation, if the prediction unit is adjacent to the splicing boundary, performing interpolation based on at least one pixel around the prediction unit that is closest to the splicing boundary.
During the sub-pixel interpolation calculation, if the prediction unit is adjacent to the splicing boundary, then in the process of determining the eight nearby half-pixel points, since the panoramic video frame is an equal-area expansion of the sphere, the eight half-pixel points directly adjacent to the best integer-pixel motion vector are not the best reference points for the calculation. Therefore, at least one pixel of the prediction unit that is closest to the splicing boundary and is in a spherical adjacency relationship can be selected as a reference pixel in the sub-pixel interpolation calculation for interpolation processing, thereby further improving the calculation accuracy.
With the inter-frame coding method provided in the embodiments of the present application, for the current panoramic video frame, the current panoramic video frame is divided into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame; a co-located prediction unit corresponding to the prediction unit is determined in the co-located video frame, the co-located prediction unit having a co-located motion vector; for the prediction unit, if the prediction unit is adjacent to the splicing boundary, the co-located motion vector is scaled according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain the temporal candidate motion vector of the prediction unit. For prediction units adjacent to the splicing boundary in a panoramic video frame, by considering both the horizontal and the vertical positional relationship in the process of determining the temporal candidate motion vector, the positional relationship between the current panoramic video frame and the reference video frame and that between the co-located video frame and the co-located reference video frame can be fully considered when an object moves across the splicing boundary, so that the motion vector corresponding to the prediction unit can be determined more accurately during encoding, the coding effect is improved without introducing additional redundant pixels, and the viewing experience of the panoramic video can ultimately be improved.
It should be noted that the method embodiments are described as a series of action combinations for simplicity of description, but those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application, some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Referring to FIG. 7, a structural block diagram of an inter-frame coding apparatus provided in an embodiment of the present application is shown. The apparatus is applied to a panoramic video, the panoramic video includes at least one panoramic video frame, and the panoramic video frame contains a splicing boundary. Specifically, the apparatus may include the following modules:
a coding unit division module 701, configured to, for a current panoramic video frame, divide the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame;
a reference prediction unit determination module 702, configured to determine, in the co-located video frame, a co-located prediction unit corresponding to the prediction unit, the co-located prediction unit having a co-located motion vector;
a temporal candidate motion vector acquisition module 703, configured to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scale the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
Optionally, the apparatus further includes:
a candidate block determination module, configured to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, determine at least one candidate block based on a spherical adjacency relationship;
a spatial candidate motion vector acquisition module, configured to use the motion vector corresponding to the candidate block as a spatial candidate motion vector of the prediction unit.
Optionally, the candidate block determination module includes:
a candidate-block-to-be-determined determination submodule, configured to determine at least one candidate block to be confirmed based on the spherical adjacency relationship;
a rotation processing determination submodule, configured to determine whether the candidate block to be confirmed has undergone rotation processing;
a candidate block acquisition submodule, configured to, if the candidate block to be confirmed has undergone rotation processing, rotate the candidate block to be confirmed to obtain a candidate block.
Optionally, the apparatus further includes:
an interpolation processing module, configured to, if the reference pixel region corresponding to the prediction unit has a resolution different from that of the prediction unit, perform interpolation processing on the spatial candidate motion vector of the prediction unit.
Optionally, the apparatus further includes:
an optimal motion vector determination module, configured to perform motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector to determine an optimal motion vector.
Optionally, the optimal motion vector determination module includes:
a motion estimation submodule, configured to perform sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
Optionally, the sub-pixel precision motion estimation includes sub-pixel interpolation calculation;
the motion estimation submodule includes:
a motion estimation unit, configured to, during the sub-pixel interpolation calculation, if the prediction unit is adjacent to the splicing boundary, perform interpolation based on at least one pixel around the prediction unit that is closest to the splicing boundary.
As the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
In addition, an embodiment of the present application further provides an electronic device, as shown in FIG. 8, including a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with each other via the communication bus 804;
the memory 803 is configured to store a computer program;
the processor 801 is configured to implement the following steps when executing the program stored in the memory 803:
for a current panoramic video frame, dividing the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame;
determining, in the co-located video frame, a co-located prediction unit corresponding to the prediction unit, the co-located prediction unit having a co-located motion vector;
for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scaling the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
Optionally, the method further includes:
for the prediction unit, if the prediction unit is adjacent to the splicing boundary, determining at least one candidate block based on a spherical adjacency relationship;
using the motion vector corresponding to the candidate block as a spatial candidate motion vector of the prediction unit.
Optionally, the step of determining at least one candidate block based on a spherical adjacency relationship includes:
determining at least one candidate block to be confirmed based on the spherical adjacency relationship;
determining whether the candidate block to be confirmed has undergone rotation processing;
if the candidate block to be confirmed has undergone rotation processing, rotating the candidate block to be confirmed to obtain a candidate block.
Optionally, the method further includes:
if the reference pixel region corresponding to the prediction unit has a resolution different from that of the prediction unit, performing interpolation processing on the spatial candidate motion vector of the prediction unit.
Optionally, the method further includes:
performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector to determine an optimal motion vector.
Optionally, performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
Optionally, the sub-pixel precision motion estimation includes sub-pixel interpolation calculation;
the step of performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector includes:
during the sub-pixel interpolation calculation, if the prediction unit is adjacent to the splicing boundary, performing interpolation based on at least one pixel around the prediction unit that is closest to the splicing boundary.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the above terminal and other devices.
The memory may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located away from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As shown in FIG. 9, in yet another embodiment provided by the present application, a computer-readable storage medium 901 is further provided, in which instructions are stored; when run on a computer, the instructions cause the computer to execute the inter-frame coding method in the above embodiments.
In yet another embodiment provided by the present application, a computer program product containing instructions is further provided; when run on a computer, it causes the computer to execute the inter-frame coding method in the above embodiments.
In the above embodiments, implementation may be achieved in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, as the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The above are only preferred embodiments of the present application and are not intended to limit the protection scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.
Claims (15)
- An inter-frame coding method, characterized in that it is applied to a panoramic video, the panoramic video includes at least one panoramic video frame, and the panoramic video frame contains a splicing boundary, the method comprising: for a current panoramic video frame, dividing the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame; determining, in the co-located video frame, a co-located prediction unit corresponding to the prediction unit, the co-located prediction unit having a co-located motion vector; and, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scaling the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
- The method according to claim 1, characterized in that the method further comprises: for the prediction unit, if the prediction unit is adjacent to the splicing boundary, determining at least one candidate block based on a spherical adjacency relationship; and using the motion vector corresponding to the candidate block as a spatial candidate motion vector of the prediction unit.
- The method according to claim 2, characterized in that the step of determining at least one candidate block based on a spherical adjacency relationship comprises: determining at least one candidate block to be confirmed based on the spherical adjacency relationship; determining whether the candidate block to be confirmed has undergone rotation processing; and, if the candidate block to be confirmed has undergone rotation processing, rotating the candidate block to be confirmed to obtain a candidate block.
- The method according to claim 2, characterized in that the method further comprises: if the reference pixel region corresponding to the prediction unit has a resolution different from that of the prediction unit, performing interpolation processing on the spatial candidate motion vector of the prediction unit.
- The method according to claim 2, characterized in that the method further comprises: performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector to determine an optimal motion vector.
- The method according to claim 5, characterized in that performing motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector comprises: performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
- The method according to claim 6, characterized in that the sub-pixel precision motion estimation includes sub-pixel interpolation calculation; and the step of performing sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector comprises: during the sub-pixel interpolation calculation, if the prediction unit is adjacent to the splicing boundary, performing interpolation based on at least one pixel around the prediction unit that is closest to the splicing boundary.
- An inter-frame coding apparatus, characterized in that it is applied to a panoramic video, the panoramic video includes at least one panoramic video frame, and the panoramic video frame contains a splicing boundary, the apparatus comprising: a coding unit division module, configured to, for a current panoramic video frame, divide the current panoramic video frame into several coding units, where a coding unit includes a prediction unit; the current panoramic video frame corresponds to a reference video frame and a co-located video frame, and the co-located video frame has a co-located reference video frame; a reference prediction unit determination module, configured to determine, in the co-located video frame, a co-located prediction unit corresponding to the prediction unit, the co-located prediction unit having a co-located motion vector; and a temporal candidate motion vector acquisition module, configured to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, scale the co-located motion vector according to the horizontal and vertical positional relationships between the current panoramic video frame and the reference video frame, as well as the horizontal and vertical positional relationships between the co-located video frame and the co-located reference video frame, to obtain a temporal candidate motion vector of the prediction unit.
- The apparatus according to claim 8, characterized in that the apparatus further comprises: a candidate block determination module, configured to, for the prediction unit, if the prediction unit is adjacent to the splicing boundary, determine at least one candidate block based on a spherical adjacency relationship; and a spatial candidate motion vector acquisition module, configured to use the motion vector corresponding to the candidate block as a spatial candidate motion vector of the prediction unit.
- The apparatus according to claim 9, characterized in that the candidate block determination module comprises: a candidate-block-to-be-determined determination submodule, configured to determine at least one candidate block to be confirmed based on the spherical adjacency relationship; a rotation processing determination submodule, configured to determine whether the candidate block to be confirmed has undergone rotation processing; and a candidate block acquisition submodule, configured to, if the candidate block to be confirmed has undergone rotation processing, rotate the candidate block to be confirmed to obtain a candidate block.
- The apparatus according to claim 9, characterized in that the apparatus further comprises: an interpolation processing module, configured to, if the reference pixel region corresponding to the prediction unit has a resolution different from that of the prediction unit, perform interpolation processing on the spatial candidate motion vector of the prediction unit.
- The apparatus according to claim 9, characterized in that the apparatus further comprises: an optimal motion vector determination module, configured to perform motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector to determine an optimal motion vector.
- The apparatus according to claim 12, characterized in that the optimal motion vector determination module comprises: a motion estimation submodule, configured to perform sub-pixel precision motion estimation based on the temporal candidate motion vector and/or the spatial candidate motion vector.
- An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method according to any one of claims 1 to 7 when executing the program stored in the memory.
- One or more computer-readable media having instructions stored thereon which, when executed by one or more processors, cause the processors to perform the method according to any one of claims 1 to 7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310097652.0 | 2023-02-10 | ||
CN202310097652.0A CN115802039B (zh) | 2023-02-10 | 2023-02-10 | Inter-frame coding method and apparatus, electronic device, and computer-readable medium
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024164710A1 true WO2024164710A1 (zh) | 2024-08-15 |
Family
ID=85430850
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/139651 WO2024164710A1 (zh) | 2023-02-10 | 2023-12-18 | Inter-frame coding method and apparatus, electronic device, and computer-readable medium
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115802039B (zh) |
WO (1) | WO2024164710A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN115802039B (zh) * | 2023-02-10 | 2023-06-23 | 天翼云科技有限公司 | Inter-frame coding method and apparatus, electronic device, and computer-readable medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN106162207A (zh) * | 2016-08-25 | 2016-11-23 | 北京字节跳动科技有限公司 | Panoramic video parallel coding method and device |
- CN107396081A (zh) * | 2017-06-19 | 2017-11-24 | 深圳市铂岩科技有限公司 | Optimized coding method and device for panoramic video |
- CN108476322A (zh) * | 2016-01-22 | 2018-08-31 | 联发科技股份有限公司 | Apparatus for inter prediction of spherical images and cubic images |
- WO2019158812A1 (en) * | 2018-02-16 | 2019-08-22 | Nokia Technologies Oy | A method and an apparatus for motion compensation |
- CN110213590A (zh) * | 2019-06-25 | 2019-09-06 | 浙江大华技术股份有限公司 | Method and device for temporal motion vector acquisition, inter-frame prediction, and video coding |
- CN115802039A (zh) * | 2023-02-10 | 2023-03-14 | 天翼云科技有限公司 | Inter-frame coding method and apparatus, electronic device, and computer-readable medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- KR101484171B1 (ko) * | 2011-01-21 | 2015-01-23 | 에스케이 텔레콤주식회사 | Apparatus and method for generating/reconstructing motion information based on predictive motion vector index coding, and apparatus and method for video encoding/decoding using same |
- WO2016165069A1 (en) * | 2015-04-14 | 2016-10-20 | Mediatek Singapore Pte. Ltd. | Advanced temporal motion vector prediction in video coding |
- CN105872386A (zh) * | 2016-05-31 | 2016-08-17 | 深圳易贝创新科技有限公司 | Panoramic camera device and panoramic picture generation method |
- CN107622474B (zh) * | 2017-09-26 | 2021-03-30 | 北京大学深圳研究生院 | Main-viewpoint-based panoramic video mapping method |
- CN114007044A (zh) * | 2021-10-28 | 2022-02-01 | 安徽奇智科技有限公司 | OpenCV-based image stitching system and method |
-
2023
- 2023-02-10 CN CN202310097652.0A patent/CN115802039B/zh active Active
- 2023-12-18 WO PCT/CN2023/139651 patent/WO2024164710A1/zh unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN108476322A (zh) * | 2016-01-22 | 2018-08-31 | 联发科技股份有限公司 | Apparatus for inter prediction of spherical images and cubic images |
- CN106162207A (zh) * | 2016-08-25 | 2016-11-23 | 北京字节跳动科技有限公司 | Panoramic video parallel coding method and device |
- CN107396081A (zh) * | 2017-06-19 | 2017-11-24 | 深圳市铂岩科技有限公司 | Optimized coding method and device for panoramic video |
- WO2019158812A1 (en) * | 2018-02-16 | 2019-08-22 | Nokia Technologies Oy | A method and an apparatus for motion compensation |
- CN110213590A (zh) * | 2019-06-25 | 2019-09-06 | 浙江大华技术股份有限公司 | Method and device for temporal motion vector acquisition, inter-frame prediction, and video coding |
- CN115802039A (zh) * | 2023-02-10 | 2023-03-14 | 天翼云科技有限公司 | Inter-frame coding method and apparatus, electronic device, and computer-readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN115802039B (zh) | 2023-06-23 |
CN115802039A (zh) | 2023-03-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10116934B2 (en) | Image processing method and apparatus | |
US11922599B2 (en) | Video super-resolution processing method and apparatus | |
- WO2017005128A1 (zh) | Image prediction method and related device | |
- WO2021114868A1 (zh) | Noise reduction method, terminal, and storage medium | |
CA3027764C (en) | Intra-prediction video coding method and device | |
- WO2016065872A1 (zh) | Image prediction method and related apparatus | |
US11128874B2 (en) | Motion compensating prediction method and device | |
- CN111179199B (zh) | Image processing method and apparatus, and readable storage medium | |
US10284810B1 (en) | Using low-resolution frames to increase frame rate of high-resolution frames | |
- WO2024164710A1 (zh) | Inter-frame coding method and apparatus, electronic device, and computer-readable medium | |
CN110622214B (zh) | 基于超体素的时空视频分割的快速渐进式方法 | |
US11669942B2 (en) | Image de-warping system | |
US9706220B2 (en) | Video encoding method and decoding method and apparatuses | |
- JP6781823B2 (ja) | Inter-frame predictive coding method and apparatus | |
- CN111754429B (zh) | Motion vector post-processing method and apparatus, electronic device, and storage medium | |
US9432690B2 (en) | Apparatus and method for video processing | |
US9031358B2 (en) | Video retargeting using seam carving | |
US11893704B2 (en) | Image processing method and device therefor | |
- CN113873095B (zh) | Motion compensation method and module, chip, electronic device, and storage medium | |
WO2022017747A1 (en) | Leak-free gradual decoding refresh without restrictions on coding units in clean areas | |
Chen et al. | A shape-adaptive low-complexity technique for 3D free-viewpoint visual applications | |
- CN110958457A (zh) | Mode-dependent affine inheritance | |
- JP2011197727A (ja) | Motion vector detection method and motion vector detection program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23920899 Country of ref document: EP Kind code of ref document: A1 |