
Attention-based dense optical flow calculation method

Info

Publication number
CN114913196A
Authority
CN
China
Prior art keywords
feature
generate
network
optical flow
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111623934.7A
Other languages
Chinese (zh)
Inventor
张继东
吕超
曹靖城
涂娟娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Digital Life Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co Ltd filed Critical Tianyi Digital Life Technology Co Ltd
Priority to CN202111623934.7A
Priority to PCT/CN2022/097531 (published as WO2023123873A1)
Publication of CN114913196A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a dense optical flow calculation method based on an attention mechanism, specifically a dense optical flow calculation method based on Unet and a Transformer. In the invention, two adjacent frames are spliced on a channel by a down-sampling module and then input into a convolution network for down-sampling; a feature processing module then performs global context feature processing on the encoded input sequence of feature maps output by the down-sampling network; and finally, an up-sampling module up-samples and reconstructs the processed feature map into an optical flow map with the same size as the input picture.

Description

Attention-based dense optical flow calculation method
Technical Field
The present invention relates to the field of image processing, and more particularly to dense optical flow computation.
Background
When a moving object is viewed by the human eye, it forms a series of continuously changing images on the retina, and this continuously changing information constantly "flows" through the retina (i.e., the image plane) like a flow of light, hence the term optical flow. Specifically, optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane. The optical flow method calculates the motion of objects between adjacent frames by using the temporal change of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame. Conventional methods for computing optical flow are mainly gradient-based, frequency-based, phase-based, and matching-based.
Dense optical flow is an image registration method that performs point-by-point matching over an entire image or a specified region: it calculates the offset of every point on the image to form a dense optical flow field, with which image registration can be performed at the pixel level. The Horn-Schunck algorithm and most optical flow methods based on region matching fall into the category of dense optical flow. Among deep-learning-based optical flow calculation methods, FlowNet is the most widely used in practical applications.
The patent "robust interpolation optical flow calculation method for pyramid occlusion detection block matching" (CN112509014A) discloses a robust interpolation optical flow calculation method for pyramid occlusion detection block matching, which comprises the steps of firstly carrying out pyramid occlusion detection block matching to obtain a sparse robust motion field, forming k-layer image pyramids on two continuous frames of images through downsampling factors, carrying out block matching on each layer of pyramids, and obtaining a matching result with initial occlusion; obtaining occlusion detection information through an occlusion detection algorithm based on a deformation error; obtaining an accurate sparse matching result through matching, and acquiring a dense optical flow through a robust interpolation algorithm; after obtaining the dense optical flow by a robust interpolation algorithm, optimizing the dense optical flow by global energy functional variational: and obtaining a final optical flow through global energy functional variation optimization.
The patent "an image sequence light stream estimation method based on learnable occlusion mask and secondary deformation optimization" (CN112465872A) discloses an image sequence light stream estimation method based on learnable occlusion mask and secondary deformation optimization, which comprises the steps of firstly inputting any two continuous frames of images in an image sequence, and carrying out feature pyramid downsampling and layering on the images to obtain multi-resolution two-frame features; calculating the correlation degree of the first frame feature and the second frame feature in each layer of pyramid, and constructing a shielding mask-based module by utilizing the correlation degree; then, removing the edge artifact of the deformation feature by using the obtained shielding mask to optimize the optical flow of the image motion edge blur; constructing a secondary deformation optimization module by using the optical flow after the occlusion constraint, and further optimizing the estimation of the optical flow of the image motion edge at a sub-pixel level by secondary deformation; and carrying out the same shielding mask and secondary deformation on the deformation features in each pyramid layer to obtain a residual flow to refine the optical flow, and outputting the final optimized optical flow estimation when the optical flow reaches the pyramid bottom layer.
Both of the above patents effectively improve the accuracy of optical flow estimation, but the accuracy of their dense optical flow still cannot meet the requirements of tasks such as video encoding and HDR composition. Therefore, there is a need for an improved technique to increase the accuracy of dense optical flow computation.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Compared with existing dense optical flow methods, the present method introduces a multi-head self-attention mechanism into the optical flow prediction task and exploits the global self-attention strength of the Transformer in sequence-to-sequence prediction to improve optical flow calculation. In addition, the method improves the accuracy of the dense optical flow map at key positions, while the timeliness of dense optical flow calculation is improved by reducing the depth of the Unet up-sampling and down-sampling networks.
In accordance with one embodiment of the present invention, a method for dense optical flow computation is disclosed, comprising: splicing adjacent frames on a channel to generate a spliced vector diagram; inputting the spliced vector diagram into a down-sampling network for feature extraction to generate feature vectors; mapping the generated feature vectors to a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence; inputting the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence; recombining the generated hidden feature sequence to generate recombined feature vectors; and inputting the recombined feature vectors into an up-sampling network for processing so as to generate a dense optical flow map.
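For concreteness, the following is a minimal PyTorch sketch of these six steps. It is an illustration only: the channel widths, the shortened three-block down/up-sampling path, the head count, and the choice of I = 4 Transformer layers are assumptions rather than values fixed by this disclosure, and positional embeddings are omitted.

```python
# Minimal sketch of the six-step method (assumed hyperparameters throughout).
import torch
import torch.nn as nn

class DenseFlowSketch(nn.Module):
    def __init__(self, dim=256, num_layers=4, num_heads=8):
        super().__init__()
        # Step 2: down-sampling network (shortened to 3 stride-2 conv blocks
        # here; the embodiment described later uses 7 blocks, 5 at stride 2).
        self.down = nn.Sequential(
            nn.Conv2d(6, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Step 3: trainable linear mapping E into the embedding space.
        self.embed = nn.Linear(dim, dim)
        # Step 4: feature processing network of I Transformer layers.
        layer = nn.TransformerEncoderLayer(dim, num_heads,
                                           batch_first=True, norm_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers)
        # Step 6: up-sampling network mirroring the down-sampling path.
        self.up = nn.Sequential(
            nn.ConvTranspose2d(dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
        )

    def forward(self, frame1, frame2):
        x = torch.cat([frame1, frame2], dim=1)   # step 1: splice on channels
        f = self.down(x)                         # step 2: feature extraction
        b, c, h, w = f.shape
        seq = self.embed(f.flatten(2).transpose(1, 2))  # step 3: embed tokens
        z = self.transformer(seq)                # step 4: hidden features z_I
        f = z.transpose(1, 2).reshape(b, c, h, w)       # step 5: recombine
        return self.up(f)                        # step 6: dense optical flow map

flow = DenseFlowSketch()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(flow.shape)  # torch.Size([1, 3, 64, 64])
```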
According to another embodiment of the invention, a system for dense optical flow computation is disclosed that includes a downsampling module, a feature processing module, and an upsampling module. The down-sampling module is configured to: splice adjacent frames on a channel to generate a spliced vector diagram; and input the spliced vector diagram into a down-sampling network for feature extraction to generate feature vectors. The feature processing module is configured to: map the feature vectors generated by the down-sampling module to a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence; and input the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence. The upsampling module is configured to: recombine the hidden feature sequence generated by the feature processing module to generate recombined feature vectors; and input the recombined feature vectors into an up-sampling network for processing so as to generate a dense optical flow map.
In accordance with another embodiment of the present invention, a computing device for dense optical flow computation is disclosed, comprising: a processor; a memory storing instructions that, when executed by the processor, are capable of performing the method as described above.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only some typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 shows a block diagram of a system 100 for dense optical flow computation according to one embodiment of the invention;
FIG. 2 illustrates a detailed view 200 of the modules 101-103 of FIG. 1 according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of a method 300 for dense optical flow computation according to one embodiment of the invention; and
FIG. 4 shows a block diagram 400 of an exemplary computing device, according to an embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to the attached drawings, and the features of the present invention will be further apparent from the following detailed description.
The following terms are used in the present invention with the general meanings well known to those skilled in the art:
Unet: a convolutional network whose down-sampling and up-sampling paths are fully symmetric, and in which feature maps from the down-sampling side can skip the deeper layers and be spliced directly to the corresponding up-sampling side.
Transformer: a Natural Language Processing (NLP) model that employs an attention mechanism, originally proposed for the task of machine translation.
In computer vision, optical flow plays an important role and has very important applications in target object segmentation, recognition, tracking, robot navigation, shape information recovery, and the like. Optical flow calculation is widely applicable, for example to motion detection for video encoding/decoding in cloud-storage video compression, and to motion recognition and video understanding tasks such as detecting objects thrown from height and fall detection. To obtain more accurate motion estimation, dense optical flow computation is a key module in video codec technology. Traditional dense optical flow calculation methods are computationally expensive and have poor timeliness. Existing deep-learning-based optical flow methods improve timeliness, but the accuracy of their dense optical flow maps is low, which adversely affects the quality of video encoding and decoding.
The invention provides a dense optical flow calculation method based on Unet and a Transformer, which introduces a Transformer module into a Unet structure, uses the global self-attention strength of the Transformer in sequence-to-sequence prediction to improve the accuracy of dense optical flow at key positions, and reduces the depth of the Unet up-sampling and down-sampling networks to improve the timeliness of dense optical flow calculation.
FIG. 1 shows a block diagram of a system 100 for dense optical flow computation according to one embodiment of the invention. As shown in fig. 1, the system 100 is divided into modules, with communication and data exchange between the modules being performed in a manner known in the art. In the present invention, each module may be implemented by software or hardware or a combination thereof. As shown in fig. 1, the system 100 may include a downsampling module 101, a feature processing module 102, and an upsampling module 103.
According to an embodiment of the present invention, the downsampling module 101 is configured to splice two adjacent frames on a channel (e.g., a color channel) to form an input picture, which is input to a convolution network for downsampling to obtain a feature map. The feature processing module 102 is configured to perform global context feature processing on the encoded input sequence of feature maps output by the downsampling module 101. The upsampling module 103 is configured as a cascaded upsampler that upsamples the processed feature map to reconstruct an optical flow map of the same size as the input picture.
FIG. 2 illustrates a detailed view 200 of the modules 101-103 of FIG. 1 according to one embodiment of the invention.
As shown in fig. 2, the downsampling module 101 receives two adjacent frames 201, first splices the two frames 201 into an h × w × 6 vector diagram, and then inputs it to a downsampling network composed of 7 convolutional blocks, each consisting of a convolutional layer and a ReLU activation function, where the stride of 5 of the convolutional layers is 2.
Finally, the down-sampling module 101 outputs a feature map for processing by the feature processing module 102; its size is given by an equation rendered as an image in the original (Figure BDA0003439132950000051), presumably h/32 × w/32 spatially given the five stride-2 convolutions.
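A sketch of such a down-sampling network follows, assuming PyTorch. The disclosure fixes only the count of 7 convolution blocks (each a convolutional layer plus ReLU) with 5 of them at stride 2; the kernel sizes, channel widths, and exact stride placement below are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride):
    # One convolution block: a convolutional layer followed by a ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )

# Five stride-2 blocks and two stride-1 blocks: a 2**5 = 32x spatial reduction.
strides  = [2, 1, 2, 2, 1, 2, 2]                  # placement assumed
channels = [6, 32, 32, 64, 128, 128, 256, 256]    # widths assumed
downsampling_net = nn.Sequential(*[
    conv_block(channels[i], channels[i + 1], strides[i]) for i in range(7)
])

x = torch.rand(1, 6, 256, 256)        # spliced adjacent frames (h x w x 6)
print(downsampling_net(x).shape)      # torch.Size([1, 256, 8, 8])
```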
As shown in fig. 2, the feature processing module 102 first maps the sequence of feature maps output by the down-sampling module 101 into the high-dimensional embedding space of the latent layer using a trainable linear mapping E; the calculation is shown in equation (1), which is rendered as an image in the original (Figure BDA0003439132950000052).
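Since equation (1) is available only as an image, the following sketch shows one plausible reading of the embedding step, assuming PyTorch: the feature map is flattened into one token per spatial location and projected by a trainable linear mapping E. The widths C and D are assumed hyperparameters.

```python
import torch
import torch.nn as nn

C, D = 256, 512               # feature channels and embedding width (assumed)
E = nn.Linear(C, D)           # trainable linear mapping E

feat = torch.rand(1, C, 8, 8)             # feature map from the down-sampling network
tokens = feat.flatten(2).transpose(1, 2)  # (1, 64, C): one token per location
z0 = E(tokens)                            # (1, 64, D): high-dimensional embedded sequence
```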
The high-dimensional embedded representation sequence is then input into a feature processing network consisting of I Transformer layers. The specific structure of the Transformer layer is shown in fig. 3. Specifically, each Transformer layer is composed of Multi-head Self-Attention (MSA) and a Multi-Layer Perceptron (MLP), and the output of the i-th layer is given by equations (2) and (3):

z'_i = MSA(LN(z_{i-1})) + z_{i-1},  (2)

z_i = MLP(LN(z'_i)) + z'_i,  (3)

where LN(·) denotes the layer normalization operation. The feature processing module 102 finally outputs a hidden feature sequence z_I.
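A sketch of one such Transformer layer, implementing equations (2) and (3) directly, follows, assuming PyTorch; the embedding width, head count, MLP expansion ratio, and I = 4 are assumptions.

```python
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, dim=512, heads=8, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, z):
        # Equation (2): z'_i = MSA(LN(z_{i-1})) + z_{i-1}
        h = self.ln1(z)
        z = self.msa(h, h, h, need_weights=False)[0] + z
        # Equation (3): z_i = MLP(LN(z'_i)) + z'_i
        return self.mlp(self.ln2(z)) + z

feature_net = nn.Sequential(*[TransformerLayer() for _ in range(4)])  # I = 4 assumed
z_I = feature_net(torch.rand(1, 64, 512))  # hidden feature sequence z_I
```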
As shown in FIG. 2, the upsampling module 103 is a cascaded upsampling network that includes multiple upsampling steps to decode the final output optical flow map 202. First, the upsampling module 103 recombines the hidden feature sequence z_I output by the feature processing module 102 into a feature vector whose size is given by an equation rendered as an image in the original (Figure BDA0003439132950000053). This feature vector is then input into an upsampling network consisting of 7 deconvolution blocks, each consisting of one deconvolution layer and one ReLU activation function, where the stride of 5 of the deconvolution layers is 2. Finally, an optical flow map output of size h × w × 3 is obtained. Furthermore, the invention splices in three skip connections from the down-sampled feature vectors to enable feature aggregation (203, 204, 205) at different resolution levels, thereby refining the details of the optical flow.
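The deconvolution-plus-splice pattern can be sketched as follows, assuming PyTorch; the channel widths and the exact placement of the three skip connections are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeconvBlock(nn.Module):
    # One deconvolution block: a deconvolution layer followed by a ReLU,
    # optionally splicing in a feature map from the down-sampling path.
    def __init__(self, c_in, c_out, stride):
        super().__init__()
        k = 4 if stride == 2 else 3    # kernel chosen to double/preserve size
        self.block = nn.Sequential(
            nn.ConvTranspose2d(c_in, c_out, k, stride=stride, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip=None):
        x = self.block(x)
        if skip is not None:
            x = torch.cat([x, skip], dim=1)   # feature aggregation by splicing
        return x

up = DeconvBlock(256, 128, stride=2)
x = torch.rand(1, 256, 8, 8)          # recombined feature vector
skip = torch.rand(1, 128, 16, 16)     # matching down-sampled feature map
print(up(x, skip).shape)              # torch.Size([1, 256, 16, 16])
```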
FIG. 3 shows a flow diagram of a method 300 for dense optical flow computation according to one embodiment of the present invention.
In step 301, adjacent frames are spliced on a channel to generate a spliced vector diagram. According to one embodiment of the invention, the channel is a color channel, such as an RGB channel. According to one embodiment of the invention, the size of the vector diagram is h × w × 6.
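As a minimal illustration of this step, assuming PyTorch tensors in (batch, channel, height, width) layout, splicing two RGB frames on the channel dimension yields the h × w × 6 input:

```python
import torch

frame_prev = torch.rand(1, 3, 256, 256)   # previous frame (RGB)
frame_curr = torch.rand(1, 3, 256, 256)   # current frame (RGB)
spliced = torch.cat([frame_prev, frame_curr], dim=1)  # splice on channels
print(spliced.shape)   # torch.Size([1, 6, 256, 256]), i.e. h x w x 6
```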
In step 302, the spliced vector diagram is input into a down-sampling network for feature extraction to generate feature vectors. According to one embodiment of the invention, the downsampling network consists of 7 convolution blocks, each consisting of one convolution layer and one ReLU activation function, where the stride of 5 of the convolution layers is 2. According to one embodiment of the invention, the size of the feature vector is given by an equation rendered as an image in the original (Figure BDA0003439132950000061).
In step 303, the feature vectors generated in step 302 are mapped into the high-dimensional embedding space of the latent layer to generate a high-dimensional embedded representation sequence. According to an embodiment of the invention, the feature vectors obtained in step 302 may be mapped into the high-dimensional embedding space of the latent layer using a trainable linear mapping E.
At step 304, the high-dimensional embedded representation sequence is input into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence. According to one embodiment of the invention, each Transformer layer is composed of MSA and MLP for global context feature processing.
In step 305, the hidden feature sequence z_I generated in step 304 is recombined to generate a recombined feature vector whose size is given by an equation rendered as an image in the original (Figure BDA0003439132950000062).
At step 306, the recombined feature vectors are input into the upsampling network for processing to generate a dense optical flow map. The dense optical flow map embodies the optical flow of the object motion between the two adjacent frames acquired in step 301. According to one embodiment of the invention, the upsampling network consists of 7 deconvolution blocks, each consisting of one deconvolution layer and one ReLU activation function, where the stride of 5 of the deconvolution layers is 2. According to one embodiment of the invention, the size of the dense optical flow map is h × w × 3. According to one embodiment of the invention, the upsampling network is a cascaded upsampling network enabling feature aggregation at different resolution levels, thereby refining the details of the dense optical flow.
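As a worked size trace through the method of FIG. 3, assuming a 256 × 256 input: the five stride-2 convolutions imply a 32× spatial reduction, which the five stride-2 deconvolutions of step 306 then mirror.

```python
# Size trace under the assumptions above (input resolution chosen arbitrarily).
h = w = 256                      # step 301: spliced input is 256 x 256 x 6
hh, ww = h // 2**5, w // 2**5    # step 302: five stride-2 convs -> 8 x 8
n_tokens = hh * ww               # steps 303-304: 64 tokens through I layers
print(hh, ww, n_tokens)          # 8 8 64
# steps 305-306: reshape to 8 x 8, then five stride-2 deconvs -> 256 x 256 x 3
```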
In summary, compared with the prior art, the main advantages of the invention are: (1) a multi-head self-attention mechanism is introduced into the optical flow prediction task, and the global self-attention strength of the Transformer in sequence-to-sequence prediction improves the accuracy of dense optical flow at key positions; (2) owing to the strong performance of multi-head self-attention in prediction at the feature level, the depth of the Unet up-sampling and down-sampling networks can be reduced, improving the timeliness of dense optical flow calculation.
FIG. 4 shows a block diagram 400 of an exemplary computing device, which is one example of a hardware device that may be applied to aspects of the present invention, according to one embodiment of the present invention. Computing device 400 may be any machine that may be configured to implement processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an in-vehicle computer, or any combination thereof. Computing device 400 may include components that are connected to or communicate with one another via one or more interfaces and a bus 402. For example, computing device 400 may include a bus 402, one or more processors 404, one or more input devices 406, and one or more output devices 408. The one or more processors 404 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., dedicated processing chips). Input device 406 may be any type of device capable of inputting information to a computing device and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote control. Output device 408 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 400 may also include or be connected to non-transitory storage device 410, which may be any storage device that is non-transitory and enables data storage, and which may include, but is not limited to, a disk drive, an optical storage device, a solid-state memory, a floppy disk, a flexible disk, a hard disk, a tape, or any other magnetic medium, an optical disk or any other optical medium, a ROM (read only memory), a RAM (random access memory), a cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. Non-transitory storage device 410 may be detached from the interface. The non-transitory storage device 410 may have data/instructions/code for implementing the above-described methods and steps. Computing device 400 may also include a communication device 412. The communication device 412 may be any type of device or system capable of communicating with internal apparatus and/or with a network and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an IEEE 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The bus 402 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computing device 400 may also include a working memory 414, which working memory 414 may be any type of working memory capable of storing instructions and/or data that facilitate the operation of processor 404 and may include, but is not limited to, random access memory and/or read only memory devices.
Software components may be located in the working memory 414 including, but not limited to, an operating system 416, one or more application programs 418, drivers, and/or other data and code. Instructions for implementing the above-described methods and steps of the invention may be contained within the one or more applications 418, and the instructions of the one or more applications 418 may be read and executed by the processor 404 to implement the above-described method 300 of the invention.
It should also be appreciated that variations may be made according to particular needs. For example, customized hardware might be used, and/or particular components might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. In addition, connections to other computing devices, such as network input/output devices, may be employed. For example, some or all of the disclosed methods and apparatus can be implemented by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or a hardware programming language (e.g., Verilog, VHDL, C++) with the logic and algorithms of the present invention.
Although aspects of the present invention have been described with reference to the accompanying drawings, the above-described methods and apparatuses are merely examples, and the scope of the present invention is not limited to these aspects but only by the appended claims and their equivalents. Various components may be omitted or replaced with equivalent components. The steps may also be performed in a different order than described in the present invention, and the various components may be combined in various ways. It is also important to note that, as technology develops, many of the described components may be replaced by equivalent components that appear later.

Claims (10)

1. A method for dense optical flow computation, comprising:
splicing adjacent frames on a channel to generate a spliced vector diagram;
inputting the spliced vector diagram into a down-sampling network for feature extraction to generate a feature vector;
mapping the generated feature vectors to a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence;
inputting the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence;
recombining the generated hidden feature sequence to generate recombined feature vectors; and
inputting the recombined feature vectors into an up-sampling network for processing so as to generate a dense optical flow map.
2. The method of claim 1, wherein the downsampling network consists of 7 convolutional blocks, each convolutional block consisting of one convolutional layer and one ReLU activation function, wherein the stride of 5 of the convolutional layers is 2.
3. The method of claim 1, wherein the Transformer layer consists of a multi-head self-attention mechanism and a multi-layer perceptron.
4. The method of claim 1, wherein the upsampling network is a cascaded upsampling network and consists of 7 deconvolution blocks, each deconvolution block consisting of one deconvolution layer and one ReLU activation function, wherein the stride of 5 of the deconvolution layers is 2.
5. The method of claim 1, wherein mapping the generated feature vectors to a high-dimensional embedding space of the potential layer to generate a high-dimensional embedded representation sequence further comprises: the feature vectors are mapped into the high-dimensional embedding space of the latent layer using a trainable linear mapping E.
6. A system for dense optical flow computation, comprising:
a downsampling module configured to:
splicing adjacent frames on a channel to generate a spliced vector diagram;
inputting the spliced vector diagram into a down-sampling network for feature extraction to generate a feature vector;
a feature processing module configured to:
mapping the feature vectors generated by the down-sampling module to a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence;
inputting the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence;
an upsampling module configured to:
recombining the hidden feature sequences generated by the feature processing module to generate recombined feature vectors; and
inputting the recombined feature vectors into an upsampling network for processing so as to generate a dense optical flow map.
7. The system of claim 6, wherein the downsampling network consists of 7 convolution blocks, each convolution block consisting of one convolution layer and one ReLU activation function, wherein the stride of 5 of the convolution layers is 2;
wherein the upsampling network is a cascaded upsampling network and consists of 7 deconvolution blocks, each deconvolution block consisting of one deconvolution layer and one ReLU activation function, wherein the stride of 5 of the deconvolution layers is 2.
8. The system of claim 6, wherein the Transformer layer consists of a multi-head self-attention mechanism and a multi-layer perceptron.
9. The system of claim 6, wherein mapping the generated feature vectors to a high-dimensional embedding space of the potential layer to generate a high-dimensional embedded representation sequence further comprises: the feature vectors are mapped into the high-dimensional embedding space of the latent layer using a trainable linear mapping E.
10. A computing device for dense optical flow computation, comprising:
a processor;
a memory storing instructions that, when executed by the processor, are capable of performing the method of any of claims 1-5.
CN202111623934.7A 2021-12-28 2021-12-28 Attention-based dense optical flow calculation method Pending CN114913196A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111623934.7A CN114913196A (en) 2021-12-28 2021-12-28 Attention-based dense optical flow calculation method
PCT/CN2022/097531 WO2023123873A1 (en) 2021-12-28 2022-06-08 Dense optical flow calculation method employing attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111623934.7A CN114913196A (en) 2021-12-28 2021-12-28 Attention-based dense optical flow calculation method

Publications (1)

Publication Number Publication Date
CN114913196A (en) 2022-08-16

Family

ID=82763430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111623934.7A Pending CN114913196A (en) 2021-12-28 2021-12-28 Attention-based dense optical flow calculation method

Country Status (2)

Country Link
CN (1) CN114913196A (en)
WO (1) WO2023123873A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2722816A3 (en) * 2012-10-18 2017-04-19 Thomson Licensing Spatio-temporal confidence maps
CN111724360B (en) * 2020-06-12 2023-06-02 深圳技术大学 Lung lobe segmentation method, device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164429A1 (en) * 2020-02-21 2021-08-26 京东方科技集团股份有限公司 Image processing method, image processing apparatus, and device
CN113610031A (en) * 2021-08-14 2021-11-05 北京达佳互联信息技术有限公司 Video processing method and video processing device
CN113709455A (en) * 2021-09-27 2021-11-26 北京交通大学 Multilevel image compression method using Transformer

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIENENG CHEN: "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation", arXiv, 8 February 2021 (2021-02-08), pages 1-11 *
LI SEN; XU HONGKE: "Video Frame Prediction Model Based on Spatio-temporal Modeling" (基于时空建模的视频帧预测模型), Internet of Things Technology (物联网技术), no. 02, 20 February 2020 (2020-02-20) *
LI YAOQIAN: "Semi-supervised Spatio-temporal Transformer Network for Semantic Segmentation of Surgical Instruments" (面向手术器械语义分割的半监督时空Transformer网络), Journal of Software (软件学报), vol. 33, no. 4, 26 October 2021 (2021-10-26), pages 1501-1515 *
YANG JIANCHENG; NI BINGBING: "Medical 3D Computer Vision: Research Progress and Challenges" (医学3D计算机视觉：研究进展和挑战), Journal of Image and Graphics (中国图象图形学报), no. 10, 16 October 2020 (2020-10-16) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN116486107B (en) * 2023-06-21 2023-09-05 南昌航空大学 Optical flow calculation method, system, equipment and medium

Also Published As

Publication number Publication date
WO2023123873A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
WO2020177651A1 (en) Image segmentation method and image processing device
Xie et al. Edge-guided single depth image super resolution
CN113066017A (en) Image enhancement method, model training method and equipment
CN110910437B (en) Depth prediction method for complex indoor scene
WO2022104026A1 (en) Consistency measure for image segmentation processes
WO2020146911A2 (en) Multi-stage multi-reference bootstrapping for video super-resolution
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113807361A (en) Neural network, target detection method, neural network training method and related products
WO2024041235A1 (en) Image processing method and apparatus, device, storage medium and program product
CN112991254A (en) Disparity estimation system, method, electronic device, and computer-readable storage medium
Pang et al. Lightweight multi-scale aggregated residual attention networks for image super-resolution
CN114359361A (en) Depth estimation method, depth estimation device, electronic equipment and computer-readable storage medium
Liu et al. Residual-guided multiscale fusion network for bit-depth enhancement
Jin et al. Light field reconstruction via deep adaptive fusion of hybrid lenses
CN114913196A (en) Attention-based dense optical flow calculation method
CN118570054B (en) Training method, related device and medium for image generation model
CN117151987A (en) Image enhancement method and device and electronic equipment
CN111507950B (en) Image segmentation method and device, electronic equipment and computer-readable storage medium
WO2024032331A9 (en) Image processing method and apparatus, electronic device, and storage medium
US11790633B2 (en) Image processing using coupled segmentation and edge learning
CN116630744A (en) Image generation model training method, image generation device and medium
CN116486009A (en) Monocular three-dimensional human body reconstruction method and device and electronic equipment
CN112927250B (en) Edge detection system and method based on multi-granularity attention hierarchical network
CN115272906A (en) Video background portrait segmentation model and algorithm based on point rendering
CN117523560A (en) Semantic segmentation method, semantic segmentation device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination