CN114913196A - Attention-based dense optical flow calculation method - Google Patents
Attention-based dense optical flow calculation method
- Publication number: CN114913196A
- Application number: CN202111623934.7A
- Authority: CN (China)
- Prior art keywords: feature, generate, network, optical flow, layer
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G — Physics › G06 — Computing; calculating or counting › G06T — Image data processing or generation, in general › G06T7/00 — Image analysis › G06T7/20 — Analysis of motion › G06T7/207 — Analysis of motion for motion estimation over a hierarchy of resolutions
- G — Physics › G06 — Computing; calculating or counting › G06N — Computing arrangements based on specific computational models › G06N3/00 — Computing arrangements based on biological models › G06N3/02 — Neural networks › G06N3/04 — Architecture, e.g. interconnection topology › G06N3/045 — Combinations of networks
- G — Physics › G06 — Computing; calculating or counting › G06N — Computing arrangements based on specific computational models › G06N3/00 — Computing arrangements based on biological models › G06N3/02 — Neural networks › G06N3/08 — Learning methods
- G — Physics › G06 — Computing; calculating or counting › G06T — Image data processing or generation, in general › G06T3/00 — Geometric image transformations in the plane of the image › G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting › G06T3/4038 — Image mosaicing, e.g. composing plane images from plane sub-images
Abstract
The invention relates to a dense optical flow calculation method based on an attention mechanism. The invention provides a dense optical flow calculation method based on Unet and a Transformer. In the invention, two adjacent frames are spliced on a channel by a down-sampling module and then input into a convolutional network for down-sampling; a feature processing module then performs global context feature processing on the encoded sequence of feature maps output by the down-sampling network; finally, an up-sampling module up-samples the processed feature maps and reconstructs an optical flow map of the same size as the input picture.
Description
Technical Field
The present invention relates to the field of image processing, and in particular to dense optical flow computation.
Background
When the human eye observes a moving object, the object forms a series of continuously changing images on the retina, and this continuously changing information constantly "flows" through the retina (i.e., the image plane) like a flow of light, hence the term optical flow. Specifically, optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane. The optical flow method computes the motion of objects between adjacent frames by exploiting the temporal change of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame. Conventional methods for computing optical flow are mainly gradient-based, frequency-based, phase-based, and matching-based.
Dense optical flow is an image registration method for point-by-point matching of an image or a specified area, which calculates the offset of all points on the image to form a dense optical flow field. With this dense optical flow field, image registration at the pixel level can be performed. The Horn-Schunck algorithm and most optical flow methods based on region matching fall into the category of dense optical flow. Among optical flow calculation methods using deep learning, FlowNet is the most widely used in practical applications.
The patent "robust interpolation optical flow calculation method for pyramid occlusion detection block matching" (CN112509014A) discloses a robust interpolation optical flow calculation method for pyramid occlusion detection block matching, which comprises the steps of firstly carrying out pyramid occlusion detection block matching to obtain a sparse robust motion field, forming k-layer image pyramids on two continuous frames of images through downsampling factors, carrying out block matching on each layer of pyramids, and obtaining a matching result with initial occlusion; obtaining occlusion detection information through an occlusion detection algorithm based on a deformation error; obtaining an accurate sparse matching result through matching, and acquiring a dense optical flow through a robust interpolation algorithm; after obtaining the dense optical flow by a robust interpolation algorithm, optimizing the dense optical flow by global energy functional variational: and obtaining a final optical flow through global energy functional variation optimization.
The patent "an image sequence light stream estimation method based on learnable occlusion mask and secondary deformation optimization" (CN112465872A) discloses an image sequence light stream estimation method based on learnable occlusion mask and secondary deformation optimization, which comprises the steps of firstly inputting any two continuous frames of images in an image sequence, and carrying out feature pyramid downsampling and layering on the images to obtain multi-resolution two-frame features; calculating the correlation degree of the first frame feature and the second frame feature in each layer of pyramid, and constructing a shielding mask-based module by utilizing the correlation degree; then, removing the edge artifact of the deformation feature by using the obtained shielding mask to optimize the optical flow of the image motion edge blur; constructing a secondary deformation optimization module by using the optical flow after the occlusion constraint, and further optimizing the estimation of the optical flow of the image motion edge at a sub-pixel level by secondary deformation; and carrying out the same shielding mask and secondary deformation on the deformation features in each pyramid layer to obtain a residual flow to refine the optical flow, and outputting the final optimized optical flow estimation when the optical flow reaches the pyramid bottom layer.
Both of the above patents effectively improve the accuracy of optical flow estimation, but in terms of dense optical flow accuracy they still cannot meet the requirements of tasks such as video coding and HDR composition. Therefore, an improved technique is needed to increase the accuracy of dense optical flow computation.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Compared with existing dense optical flow methods, the present method introduces a multi-head self-attention mechanism into the optical flow prediction task, and improves optical flow calculation by exploiting the global self-attention advantage of the Transformer in sequence-to-sequence prediction. In addition, the method improves the accuracy of the dense optical flow map at key positions, while the timeliness of dense optical flow calculation is improved by reducing the depth of the Unet up-sampling and down-sampling networks.
In accordance with one embodiment of the present invention, a method for dense optical flow computation is disclosed, comprising: splicing adjacent frames on a channel to generate a spliced vector map; inputting the spliced vector map into a down-sampling network for feature extraction to generate feature vectors; mapping the generated feature vectors into a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence; inputting the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence; recombining the generated hidden feature sequence to generate recombined feature vectors; and inputting the recombined feature vectors into an up-sampling network for processing to generate a dense optical flow map.
According to another embodiment of the invention, a system for dense optical flow computation is disclosed that includes a down-sampling module, a feature processing module, and an up-sampling module. The down-sampling module is configured to: splice adjacent frames on a channel to generate a spliced vector map; and input the spliced vector map into a down-sampling network for feature extraction to generate feature vectors. The feature processing module is configured to: map the feature vectors generated by the down-sampling module into a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence; and input the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence. The up-sampling module is configured to: recombine the hidden feature sequence generated by the feature processing module to generate recombined feature vectors; and input the recombined feature vectors into an up-sampling network for processing to generate a dense optical flow map.
In accordance with another embodiment of the present invention, a computing device for dense optical flow computation is disclosed, comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the computing device to perform the method described above.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only some typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 shows a block diagram of a system 100 for dense optical flow computation according to one embodiment of the invention;
FIG. 2 illustrates a detailed view 200 of the modules 101-103 of FIG. 1 according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of a method 300 for dense optical flow computation according to one embodiment of the invention; and
FIG. 4 shows a block diagram 400 of an exemplary computing device, according to an embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to the attached drawings, and the features of the present invention will be further apparent from the following detailed description.
The following explains terms used in the present invention, in the general meanings well known to those skilled in the art:
unet: the convolutional layer is completely symmetrical in the down-sampling part and the up-sampling part, and a characteristic diagram of a down-sampling end can skip deep sampling and is spliced to the corresponding up-sampling end.
Transformer: a Natural Language Processing (NLP) model that employs an attention mechanism, originally proposed for the task of machine translation.
In computer vision, optical flow plays an important role and has very important applications in target object segmentation, recognition, tracking, robot navigation, shape information recovery, and the like. Optical flow calculation is widely applicable in many scenarios, for example motion detection for video coding and decoding in cloud-storage video compression tasks, and motion recognition and video understanding tasks such as high-altitude falling-object detection and fall detection. To obtain more accurate motion estimation, dense optical flow computation is a key module in video codec technology. Traditional dense optical flow calculation methods have a large computational cost and poor timeliness. Existing optical flow calculation methods based on deep learning improve timeliness, but the accuracy of the dense optical flow map is low, which adversely affects the quality of video coding and decoding.
The invention provides a dense optical flow calculation method based on Unet and a Transformer, which introduces a Transformer module into a Unet structure, exploits the global self-attention advantage of the Transformer in sequence-to-sequence prediction to improve the accuracy of dense optical flow at key positions, and can reduce the depth of the Unet up-sampling and down-sampling networks, improving the timeliness of dense optical flow calculation.
FIG. 1 shows a block diagram of a system 100 for dense optical flow computation according to one embodiment of the invention. As shown in fig. 1, the system 100 is divided into modules, with communication and data exchange between the modules being performed in a manner known in the art. In the present invention, each module may be implemented by software or hardware or a combination thereof. As shown in fig. 1, the system 100 may include a downsampling module 101, a feature processing module 102, and an upsampling module 103.
According to an embodiment of the present invention, the downsampling module 101 is configured to concatenate two adjacent frames on a channel (e.g., a color channel) to form an input picture and input it into a convolutional network for down-sampling, thereby obtaining a feature map. The feature processing module 102 is configured to perform global context feature processing on the encoded sequence of feature maps output by the downsampling module 101. The upsampling module 103 is configured as a cascaded upsampler that up-samples the processed feature map to reconstruct an optical flow map of the same size as the input picture.
FIG. 2 illustrates a detailed view 200 of the modules 101-103 of FIG. 1 according to one embodiment of the invention.
As shown in fig. 2, the downsampling module 101 receives two adjacent frames 201, first splices them to obtain an h × w × 6 vector map, and then inputs the vector map into a down-sampling network composed of 7 convolutional blocks, each consisting of one convolutional layer and one ReLU activation function, where the stride of 5 of the convolutional layers is 2.
Finally, the down-sampling module 101 outputs a feature map of size (h/32) × (w/32) × C, where C is the channel dimension of the last convolutional block, for processing by the feature processing module 102.
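A minimal PyTorch sketch of such a down-sampling network is given below. The kernel sizes, channel widths, and the placement of the five stride-2 layers among the seven blocks are illustrative assumptions; the description above only fixes the block count, the ReLU activations, and the number of stride-2 layers.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    # One convolutional block: a conv layer followed by a ReLU activation.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )

class DownsamplingNetwork(nn.Module):
    """Sketch of the 7-block down-sampling network: 5 of the 7 conv layers use stride 2,
    so an h x w x 6 spliced input is reduced to roughly (h/32) x (w/32) x C."""
    def __init__(self, in_channels=6, widths=(32, 64, 128, 256, 512, 512, 512)):
        super().__init__()
        # Assumed stride pattern: the first 5 blocks downsample, the last 2 keep resolution.
        strides = (2, 2, 2, 2, 2, 1, 1)
        blocks, prev = [], in_channels
        for w, s in zip(widths, strides):
            blocks.append(conv_block(prev, w, s))
            prev = w
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        skips = []                 # keep intermediate maps for the decoder's skip connections
        for blk in self.blocks:
            x = blk(x)
            skips.append(x)
        return x, skips
```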
As shown in fig. 2, the feature processing module 102 first maps the sequence of feature vectors output by the downsampling module 101 into the high-dimensional embedding space of the latent layer using a trainable linear mapping E, as shown in equation (1):

z_0 = [x_p^1 E; x_p^2 E; …; x_p^N E],  (1)

where x_p^i is the i-th flattened feature vector in the sequence and N is the sequence length.
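The following sketch illustrates one way such a trainable linear mapping E could be realized in PyTorch; the class name, the flattening order, and the embedding dimension (768) are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Flatten the (h/32) x (w/32) x C feature map into a sequence of N = (h/32)*(w/32)
    feature vectors and project each with a trainable linear mapping E (equation (1))."""
    def __init__(self, in_channels=512, embed_dim=768):
        super().__init__()
        self.E = nn.Linear(in_channels, embed_dim)   # trainable linear mapping E

    def forward(self, feat):                          # feat: (B, C, H', W')
        seq = feat.flatten(2).transpose(1, 2)         # (B, N, C) with N = H'*W'
        return self.E(seq)                            # z_0: (B, N, D)
```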
the high-dimensional embedded representation sequence is then input into a feature processing network consisting of I transform layers. The specific structure of the Transformer layer is shown in fig. 3. Specifically, the transform Layer is composed of a Multi-head Self-Attention (MSA) and a Multi-Layer Perceptron (MLP), and the output of the i-th Layer is shown in formulas (2) and (3):
z'_i = MSA(LN(z_{i-1})) + z_{i-1},  (2)

z_i = MLP(LN(z'_i)) + z'_i,  (3)
where LN(·) denotes layer normalization. The feature processing module 102 finally outputs a hidden feature sequence z_I.
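A hedged PyTorch sketch of one such Transformer layer, implementing equations (2) and (3), is shown below; the number of attention heads, the MLP expansion ratio, and the GELU activation inside the MLP are assumptions, as the description does not specify them.

```python
import torch.nn as nn

class TransformerLayer(nn.Module):
    """One of the I feature-processing layers:
    z'_i = MSA(LN(z_{i-1})) + z_{i-1},   z_i = MLP(LN(z'_i)) + z'_i."""
    def __init__(self, dim=768, num_heads=8, mlp_ratio=4.0):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),                                  # assumed activation inside the MLP
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, z):
        y = self.ln1(z)
        z = z + self.msa(y, y, y, need_weights=False)[0]   # multi-head self-attention + residual
        z = z + self.mlp(self.ln2(z))                       # multi-layer perceptron + residual
        return z
```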
As shown in FIG. 2, the upsampling module 103 is a cascaded up-sampling network that includes multiple up-sampling steps to decode the final output optical flow map 202. First, the upsampling module 103 reshapes the hidden feature sequence z_I output by the feature processing module 102 into a feature map of size (h/32) × (w/32) × D, where D is the embedding dimension, and then inputs it into an up-sampling network consisting of 7 deconvolution blocks, each consisting of one deconvolution layer and one ReLU activation function, where the stride of 5 of the deconvolution layers is 2. Finally, an optical flow map of size h × w × 3 is output. Furthermore, the invention incorporates three skip connections with the down-sampled feature maps to enable feature aggregation at different resolution levels (203, 204, 205), thereby refining the details of the optical flow.
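The sketch below illustrates a cascaded upsampler of this form; the channel widths, kernel sizes, and the exact stages at which the three skip connections are concatenated are assumptions chosen so that the tensor shapes line up, since the description only fixes the block count, the ReLU activations, the number of stride-2 layers, and the three skip connections.

```python
import torch
import torch.nn as nn

def deconv_block(in_ch, out_ch, stride):
    # One deconvolution block: a transposed convolution followed by a ReLU activation.
    kernel = 4 if stride == 2 else 3
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=kernel, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )

class UpsamplingNetwork(nn.Module):
    """Cascaded upsampler sketch: 7 deconvolution blocks (5 with stride 2) decode the
    reshaped hidden features back to an h x w x 3 flow map; three skip connections
    concatenate encoder feature maps at intermediate resolutions (assumed widths)."""
    def __init__(self, in_ch=768, skip_chs=(256, 128, 64)):
        super().__init__()
        self.b1 = deconv_block(in_ch, 512, 2)               # h/32 -> h/16
        self.b2 = deconv_block(512 + skip_chs[0], 256, 2)   # h/16 -> h/8, after skip 1
        self.b3 = deconv_block(256 + skip_chs[1], 128, 2)   # h/8  -> h/4, after skip 2
        self.b4 = deconv_block(128 + skip_chs[2], 64, 2)    # h/4  -> h/2, after skip 3
        self.b5 = deconv_block(64, 32, 2)                   # h/2  -> h
        self.b6 = deconv_block(32, 16, 1)
        self.b7 = deconv_block(16, 3, 1)                    # 3-channel flow map, per the description

    def forward(self, x, skips):
        # skips: encoder feature maps at h/16, h/8 and h/4 resolution (coarse to fine).
        x = self.b1(x)
        x = self.b2(torch.cat([x, skips[0]], dim=1))
        x = self.b3(torch.cat([x, skips[1]], dim=1))
        x = self.b4(torch.cat([x, skips[2]], dim=1))
        return self.b7(self.b6(self.b5(x)))
```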
FIG. 3 shows a flow diagram of a method 300 for dense optical flow computation according to one embodiment of the present invention.
In step 301, adjacent frames are spliced on a channel to generate a spliced vector map. According to one embodiment of the invention, the channel is a color channel, such as the RGB channels. According to one embodiment of the invention, the spliced vector map has size h × w × 6.
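As a minimal illustration of this splicing step (the 256 × 256 frame size is an assumed example):

```python
import torch

# Two adjacent RGB frames of shape (B, 3, h, w) are spliced along the channel axis,
# giving the h x w x 6 input described in step 301.
frame1 = torch.randn(1, 3, 256, 256)
frame2 = torch.randn(1, 3, 256, 256)
spliced = torch.cat([frame1, frame2], dim=1)   # shape (1, 6, 256, 256)
```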
In step 302, the spliced vector map is input into a down-sampling network for feature extraction to generate feature vectors. According to one embodiment of the invention, the down-sampling network consists of 7 convolutional blocks, each consisting of one convolutional layer and one ReLU activation function, where the stride of 5 of the convolutional layers is 2. According to one embodiment of the invention, the resulting feature map has size (h/32) × (w/32) × C.
In step 303, the feature vectors generated in step 302 are mapped into the high-dimensional embedding space of the latent layer to generate a high-dimensional embedded representation sequence. According to an embodiment of the invention, the feature vectors obtained in step 302 may be mapped into the high-dimensional embedding space of the latent layer using a trainable linear mapping E.
At step 304, the high-dimensional embedded representation sequence is input into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence. According to one embodiment of the invention, each Transformer layer consists of an MSA block and an MLP and performs global context feature processing.
In step 305, the hidden feature sequence generated in step 304 is recombined to generate recombined feature vectors. According to one embodiment of the invention, the hidden feature sequence z_I is recombined into a feature map of size (h/32) × (w/32) × D.
At step 306, the recombined feature vectors are input into the up-sampling network for processing to generate a dense optical flow map. The dense optical flow map represents the optical flow of object motion between the two adjacent frames acquired in step 301. According to one embodiment of the invention, the up-sampling network consists of 7 deconvolution blocks, each consisting of one deconvolution layer and one ReLU activation function, where the stride of 5 of the deconvolution layers is 2. According to one embodiment of the invention, the size of the dense optical flow map is h × w × 3. According to one embodiment of the invention, the up-sampling network is a cascaded up-sampling network that enables feature aggregation at different resolution levels, thereby refining the details of the dense optical flow.
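The following end-to-end sketch ties steps 301-306 together using the illustrative modules sketched in the detailed description above (DownsamplingNetwork, PatchEmbedding, TransformerLayer, UpsamplingNetwork are hypothetical names introduced in those sketches, not names taken from the patent); a square input is assumed so the sequence can be reshaped back into a feature map.

```python
import torch

def dense_optical_flow(frame1, frame2, encoder, embed, transformer_layers, decoder):
    x = torch.cat([frame1, frame2], dim=1)                # step 301: splice on the channel axis
    feat, skips = encoder(x)                              # step 302: down-sampling / feature extraction
    z = embed(feat)                                       # step 303: trainable linear mapping E
    for layer in transformer_layers:                      # step 304: I Transformer layers
        z = layer(z)
    b, n, d = z.shape
    hp = wp = int(n ** 0.5)                               # step 305: recombine (square map assumed)
    feat = z.transpose(1, 2).reshape(b, d, hp, wp)
    skip_maps = [skips[3], skips[2], skips[1]]            # encoder maps at h/16, h/8, h/4
    return decoder(feat, skip_maps)                       # step 306: up-sample to the dense flow map
```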
In summary, compared with the prior art, the main advantages of the invention are: (1) a multi-head self-attention mechanism is introduced into the optical flow prediction task, and the global self-attention advantage of the Transformer in sequence-to-sequence prediction improves the accuracy of dense optical flow at key positions; (2) owing to the strong performance of multi-head self-attention for prediction at the feature level, the depth of the Unet up-sampling and down-sampling networks can be reduced, improving the timeliness of dense optical flow calculation.
FIG. 4 shows a block diagram 400 of an exemplary computing device, which is one example of a hardware device that may be applied to aspects of the present invention, according to one embodiment of the present invention. Computing device 400 may be any machine that may be configured to implement processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an in-vehicle computer, or any combination thereof. Computing device 400 may include components that may be connected or communicate via one or more interfaces and a bus 402. For example, computing device 400 may include a bus 402, one or more processors 404, one or more input devices 406, and one or more output devices 408. The one or more processors 404 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., dedicated processing chips). Input device 406 may be any type of device capable of inputting information to a computing device and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote control. Output device 408 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 400 may also include or be connected to non-transitory storage device 410, which may be any storage device that is non-transitory and that enables data storage, and which may include, but is not limited to, a disk drive, an optical storage device, a solid-state memory, a floppy disk, a flexible disk, a hard disk, a tape, or any other magnetic medium, an optical disk or any other optical medium, a ROM (read-only memory), a RAM (random access memory), a cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. Non-transitory storage device 410 may be detached from the interface. The non-transitory storage device 410 may have data/instructions/code for implementing the above-described methods and steps. Computing device 400 may also include a communication device 412. The communication device 412 may be any type of device or system capable of communicating with internal apparatus and/or with a network and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an IEEE 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The bus 402 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Software components may be located in the working memory 414 including, but not limited to, an operating system 416, one or more application programs 418, drivers, and/or other data and code. Instructions for implementing the above-described methods and steps of the invention may be contained within the one or more applications 418, and the instructions of the one or more applications 418 may be read and executed by the processor 404 to implement the above-described method 300 of the invention.
It should also be appreciated that variations may be made according to particular needs. For example, customized hardware might also be used and/or particular components might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. In addition, connections to other computing devices, such as network input/output devices, may be employed. For example, some or all of the disclosed methods and apparatus can be implemented with logic and algorithms in accordance with the present invention by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) with an assembly language or hardware programming languages (e.g., VERILOG, VHDL, C++).
Although the aspects of the present invention have been described so far with reference to the accompanying drawings, the above-described methods and apparatuses are merely examples, and the scope of the present invention is not limited to these aspects but only by the appended claims and equivalents thereof. Various components may be omitted or may be replaced with equivalent components. In addition, the steps may also be performed in a different order than described in the present invention. Further, the various components may be combined in various ways. It is also important to note that, as technology develops, many of the described components can be replaced by equivalent components appearing later.
Claims (10)
1. A method for dense optical flow computation, comprising:
splicing adjacent frames on a channel to generate a spliced vector map;
inputting the spliced vector map into a down-sampling network for feature extraction to generate feature vectors;
mapping the generated feature vectors into a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence;
inputting the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence;
recombining the generated hidden feature sequence to generate recombined feature vectors; and
inputting the recombined feature vectors into an up-sampling network for processing to generate a dense optical flow map.
2. The method of claim 1, wherein the down-sampling network consists of 7 convolutional blocks, each convolutional block consisting of one convolutional layer and one ReLU activation function, wherein the stride of 5 of the convolutional layers is 2.
3. The method of claim 1, wherein the Transformer layer consists of a multi-head self-attention mechanism and a multi-layer perceptron.
4. The method of claim 1, wherein the up-sampling network is a cascaded up-sampling network and consists of 7 deconvolution blocks, each deconvolution block consisting of one deconvolution layer and one ReLU activation function, wherein the stride of 5 of the deconvolution layers is 2.
5. The method of claim 1, wherein mapping the generated feature vectors into a high-dimensional embedding space of the latent layer to generate a high-dimensional embedded representation sequence further comprises: mapping the feature vectors into the high-dimensional embedding space of the latent layer using a trainable linear mapping E.
6. A system for dense optical flow computation, comprising:
a downsampling module configured to:
splicing adjacent frames on a channel to generate a spliced vector map; and
inputting the spliced vector map into a down-sampling network for feature extraction to generate feature vectors;
a feature processing module configured to:
mapping the feature vectors generated by the downsampling module into a high-dimensional embedding space of a latent layer to generate a high-dimensional embedded representation sequence; and
inputting the high-dimensional embedded representation sequence into a feature processing network consisting of I Transformer layers to generate a hidden feature sequence; and
an upsampling module configured to:
recombining the hidden feature sequence generated by the feature processing module to generate recombined feature vectors; and
inputting the recombined feature vectors into an up-sampling network for processing to generate a dense optical flow map.
7. The system of claim 6, wherein the down-sampling network consists of 7 convolutional blocks, each convolutional block consisting of one convolutional layer and one ReLU activation function, wherein the stride of 5 of the convolutional layers is 2;
wherein the up-sampling network is a cascaded up-sampling network and consists of 7 deconvolution blocks, each deconvolution block consisting of one deconvolution layer and one ReLU activation function, wherein the stride of 5 of the deconvolution layers is 2.
8. The system of claim 6, wherein the Transformer layer consists of a multi-head self-attention mechanism and a multi-layer perceptron.
9. The system of claim 6, wherein mapping the generated feature vectors into a high-dimensional embedding space of the latent layer to generate a high-dimensional embedded representation sequence further comprises: mapping the feature vectors into the high-dimensional embedding space of the latent layer using a trainable linear mapping E.
10. A computing device for dense optical flow computation, comprising:
a processor;
a memory storing instructions that, when executed by the processor, cause the computing device to perform the method of any one of claims 1-5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111623934.7A CN114913196A (en) | 2021-12-28 | 2021-12-28 | Attention-based dense optical flow calculation method |
PCT/CN2022/097531 WO2023123873A1 (en) | 2021-12-28 | 2022-06-08 | Dense optical flow calculation method employing attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111623934.7A CN114913196A (en) | 2021-12-28 | 2021-12-28 | Attention-based dense optical flow calculation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114913196A true CN114913196A (en) | 2022-08-16 |
Family
ID=82763430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111623934.7A Pending CN114913196A (en) | 2021-12-28 | 2021-12-28 | Attention-based dense optical flow calculation method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114913196A (en) |
WO (1) | WO2023123873A1 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2722816A3 (en) * | 2012-10-18 | 2017-04-19 | Thomson Licensing | Spatio-temporal confidence maps |
CN111724360B (en) * | 2020-06-12 | 2023-06-02 | 深圳技术大学 | Lung lobe segmentation method, device and storage medium |
- 2021-12-28: CN application CN202111623934.7A filed; publication CN114913196A (status: active, Pending)
- 2022-06-08: PCT application PCT/CN2022/097531 filed; publication WO2023123873A1 (status: unknown)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021164429A1 (en) * | 2020-02-21 | 2021-08-26 | 京东方科技集团股份有限公司 | Image processing method, image processing apparatus, and device |
CN113610031A (en) * | 2021-08-14 | 2021-11-05 | 北京达佳互联信息技术有限公司 | Video processing method and video processing device |
CN113709455A (en) * | 2021-09-27 | 2021-11-26 | 北京交通大学 | Multilevel image compression method using Transformer |
Non-Patent Citations (4)
Title |
---|
JIENENG CHEN: "TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation", ARXIV, 8 February 2021 (2021-02-08), pages 1 - 11 * |
LI SEN; XU HONGKE: "Video Frame Prediction Model Based on Spatio-temporal Modeling", Internet of Things Technologies, no. 02, 20 February 2020 (2020-02-20) *
LI YAOQIAN: "Semi-supervised Spatio-temporal Transformer Network for Semantic Segmentation of Surgical Instruments", Journal of Software, vol. 33, no. 4, 26 October 2021 (2021-10-26), pages 1501 - 1515 *
YANG JIANCHENG; NI BINGBING: "Medical 3D Computer Vision: Research Progress and Challenges", Journal of Image and Graphics, no. 10, 16 October 2020 (2020-10-16) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116486107A (en) * | 2023-06-21 | 2023-07-25 | 南昌航空大学 | Optical flow calculation method, system, equipment and medium |
CN116486107B (en) * | 2023-06-21 | 2023-09-05 | 南昌航空大学 | Optical flow calculation method, system, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
WO2023123873A1 (en) | 2023-07-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |