CN116681597A - Image processing method, device, storage medium and terminal - Google Patents


Info

Publication number
CN116681597A
Authority
CN
China
Prior art keywords
target
feature
frame
image
neighboring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210157024.2A
Other languages
Chinese (zh)
Inventor
田秀敏
霰心培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Technology Group Co Ltd
Original Assignee
TCL Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Technology Group Co Ltd filed Critical TCL Technology Group Co Ltd
Priority to CN202210157024.2A priority Critical patent/CN116681597A/en
Publication of CN116681597A publication Critical patent/CN116681597A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an image processing method, an image processing device, a storage medium and a terminal. The method first determines a target video frame in a target video and the adjacent video frames of that target video frame, and warps the adjacent video frames onto the target video frame through an estimation model. Feature extraction is then performed on the target video frame and the adjacent video frames respectively, the correlation amounts between the target video frame and the adjacent video frames are calculated over the whole feature space, and the correlation amounts are normalized. The normalized correlation amounts are multiplied by the convolved features of the adjacent video frames to obtain a plurality of feature maps, which are aggregated to obtain a target aggregated feature map. After the target aggregated feature map is input into an image restoration network, a sharp target video frame is recovered, so that the video deblurring effect can be improved.

Description

Image processing method, device, storage medium and terminal
Technical Field
The present application relates to the field of video deblurring technologies, and in particular, to an image processing method, an image processing device, a storage medium, and a terminal.
Background
Motion blur is often present in video and severely affects video quality. The causes of video motion blur include two cases: in one, the static scene being shot has large depth variation and the camera shakes during the exposure time; in the other, a fast-moving object is shot in the scene, so the degree of blur differs at each pixel of the video image. Therefore, deblurring processing is required for such video.
The video deblurring task aims to improve video quality by recovering sharp frames from a blurred video sequence, typically by using sharper pixel blocks in adjacent frames to remove the blur. However, because adjacent frames are generally spatially misaligned with the current frame due to motion, constructing the visual correspondence between frames remains a challenge. The related art uses homography or optical flow to solve the inter-frame misalignment problem, but the deblurring effect is poor when an object moves rapidly with large displacement.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, a storage medium and a terminal, which can improve the deblurring processing effect of video.
The embodiment of the application provides an image processing method, which comprises the following steps:
acquiring a target frame in a target video and adjacent frames of the target frame;
respectively extracting the characteristics of the target frame and the adjacent frames to obtain target image characteristics corresponding to the target frame and adjacent image characteristics corresponding to the adjacent frames;
calculating a correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature;
Determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image feature;
and performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame.
Correspondingly, the embodiment of the application also provides an image processing device, which comprises:
the acquisition unit is used for acquiring a target frame in a target video and adjacent frames of the target frame;
the extraction unit is used for extracting the characteristics of the target frame and the adjacent frames respectively to obtain target image characteristics corresponding to the target frame and adjacent image characteristics corresponding to the adjacent frames;
a calculation unit configured to calculate a correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature;
a first determining unit configured to determine a target feature map of the neighboring frame based on the correlation amount and the neighboring image feature;
and the first processing unit is used for carrying out image restoration processing on the target feature mapping to obtain a processed image frame corresponding to the target frame.
In some embodiments, the computing unit comprises:
a first determining subunit, configured to determine a number of feature channels of the target image feature and the neighboring image feature;
The point multiplication subunit is used for carrying out point multiplication on the feature vector of the target image feature of each feature channel and the feature vector of the adjacent image feature to obtain a first feature vector of the number of the feature channels;
the summing subunit is used for summing the first feature vectors of the number of the feature channels to obtain a second feature vector;
and the first calculating subunit is used for calculating the index of the second characteristic vector to obtain the correlation quantity.
In some embodiments, the apparatus further comprises:
and the second processing unit is used for carrying out normalization processing on the correlation quantity to obtain a normalized correlation quantity.
In some embodiments, the first determining unit includes:
A second determination subunit configured to determine the target feature map based on the normalized correlation quantity and the neighboring image features.
In some embodiments, the second processing unit comprises:
a second calculating subunit, configured to calculate a sum of the correlation amounts in a specified feature channel;
and the third calculating subunit is used for calculating the ratio of the correlation quantity to the sum value to obtain the normalized correlation quantity.
In some embodiments, the first determining unit includes:
An acquisition subunit, configured to acquire an initial feature map of the neighboring image features;
And a third determining subunit configured to determine the target feature map based on the initial feature map and the correlation amount.
In some embodiments, the third determining subunit is specifically configured to:
performing convolution operation on the initial feature map to obtain a feature map after convolution;
and calculating the product of the characteristic mapping after convolution and the correlation quantity to obtain the target characteristic mapping.
In some embodiments, the third determining subunit is specifically configured to:
performing convolution operation on the initial feature map to obtain a feature map after convolution;
calculating the product of the characteristic mapping after convolution and the correlation quantity in each layer of characteristic pyramid to obtain the sub-characteristic mapping of each layer of characteristic pyramid; and aggregating the sub-feature maps of all feature pyramids to obtain the target feature map.
In some embodiments, the computing unit comprises:
a fourth calculation subunit configured to calculate a first correlation amount of the target frame and the forward neighboring frame based on the target image feature and the first neighboring image feature;
a fifth calculation subunit for calculating a second correlation amount of the target frame and the backward neighboring frame based on the target image feature and the second neighboring image feature.
In some embodiments, the first determining unit comprises:
a fourth determination subunit configured to determine a first feature map of the forward neighboring frame based on the first correlation amount and the first neighboring image feature;
a fifth determining subunit configured to determine a second feature map of the backward neighboring frame based on the second correlation amount and the second neighboring image feature;
and the first connection subunit is used for connecting the first feature map with the second feature map to obtain the target feature map.
In some embodiments, the apparatus further comprises:
a second determination unit configured to determine an inter-frame correlation amount of the target frame based on the target image feature;
and a third determining unit configured to determine a feature map of the target frame based on the inter-frame correlation amount.
In some embodiments, the first processing unit comprises:
the second connection subunit is used for connecting the feature map of the target frame with the target feature map to obtain a connected feature map;
and the first processing subunit is used for carrying out image restoration processing on the connected feature map.
In some embodiments, the extraction unit comprises:
A second processing subunit, configured to perform alignment processing on the adjacent frames based on the target frame, to obtain aligned adjacent frames;
and the extraction subunit is used for respectively extracting the characteristics of the target frame and the aligned adjacent frames to obtain the target image characteristics and the adjacent image characteristics.
Accordingly, an embodiment of the present application also provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the image processing method as described above.
Correspondingly, the embodiment of the application also provides a terminal, which comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions to execute the image processing method.
According to the embodiment of the application, the target video frame in the target video and the adjacent video frames of the target video frame are determined, and the adjacent video frames are warped onto the target video frame through an estimation model; feature extraction is then performed on the target video frame and the adjacent video frames respectively, the correlation amounts between the target video frame and the adjacent video frames are calculated over the whole feature space, and the correlation amounts are normalized; the normalized correlation amounts are multiplied by the convolved features of the adjacent video frames to obtain a plurality of feature maps, which are aggregated to obtain the target aggregated feature map; after the target aggregated feature map is input into the image restoration network, a sharp target video frame is recovered, so that the video deblurring effect can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present application.
Fig. 2 is a schematic process flow diagram of an image processing method according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a network model according to an embodiment of the present application.
Fig. 4 is a block diagram of an image processing apparatus according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides an image processing method, an image processing device, a storage medium and computer equipment. Specifically, the image processing method of the embodiment of the present application may be executed by a terminal. The terminal can be a terminal device such as a smart phone, a tablet computer, a notebook computer, a touch screen device, a personal computer (PC, Personal Computer), a personal digital assistant (PDA, Personal Digital Assistant) and the like.
For example, the terminal may acquire a target frame in the target video, and a neighboring frame of the target frame; respectively extracting the characteristics of the target frame and the adjacent frames to obtain target image characteristics corresponding to the target frame and adjacent image characteristics corresponding to the adjacent frames; calculating the correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature; determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image features; and performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame.
Based on the above problems, embodiments of the present application provide an image processing method, an image processing device, a storage medium, and a terminal, which can improve the deblurring effect of video. The following will describe in detail. The following description of the embodiments is not intended to limit the preferred embodiments.
Referring to fig. 1, fig. 1 is a flowchart of an image processing method according to an embodiment of the application. Taking the example that the image processing method is applied to a terminal, the specific flow of the image processing method can be as follows:
101. and acquiring a target frame in the target video and adjacent frames of the target frame.
In the embodiment of the application, the target video refers to a video with a blurred video frame, the target frame can refer to the blurred video frame in the target video, and the adjacent frame refers to a video frame adjacent to the target frame in the target video.
The number of adjacent frames of the target frame may be one or more.
For example, a target video frame, that is, a target frame, in the target video may be determined by performing blur detection on the target video, and then a video frame adjacent to the target frame is selected from the target video as an adjacent frame to the target frame.
102. And respectively extracting the characteristics of the target frame and the adjacent frames to obtain the target image characteristics corresponding to the target frame and the adjacent image characteristics corresponding to the adjacent frames.
In some embodiments, in order to ensure that the features of the target frame and the adjacent frame correspond to the same pixels, the step of "performing feature extraction on the target frame and the adjacent frame respectively" may include the following operations:
Performing alignment processing on adjacent frames based on the target frames to obtain aligned adjacent frames;
and respectively extracting the characteristics of the target frame and the aligned adjacent frames to obtain the characteristics of the target image and the characteristics of the adjacent images.
The alignment processing for the adjacent frames refers to adjusting the positions of the offset pixels in the adjacent frames according to the positions of the pixels in the target frames so as to obtain the aligned adjacent frames.
Specifically, the alignment processing of the adjacent frames may use an optical flow estimation method. Optical flow estimation calculates the motion information of objects between adjacent frames by using the temporal change of pixels in the image sequence and the correlation between adjacent frames, so as to find the correspondence between the previous frame and the target frame; that is, given two frames, it determines the position to which each point of the previous frame image has moved in the next frame image. The basic principle of optical flow estimation is as follows: for two adjacent frames satisfying constant brightness, small displacement and spatial consistency, find the pixel value after displacement; the changes in x and y are estimated mainly from the brightness of the current position in the x and y directions and the gray-level change between the two adjacent frames.
In the embodiment of the present application, when performing alignment processing on adjacent frames based on a target frame to obtain aligned adjacent frames, the following operations may be included:
acquiring first image pixel information of a target frame and second image pixel information of an adjacent frame;
determining position change information of image pixels in adjacent frames based on the first image pixel information and the second image pixel information;
and adjusting the positions of the image pixels in the adjacent frames according to the position change information to obtain the aligned adjacent frames.
The first image pixel information comprises pixel information of each pixel point in the target frame, and the second image pixel information comprises pixel information of each pixel point in the adjacent frame, wherein the pixel information at least comprises pixel positions, pixel gray values and the like.
The position change information refers to the offset of the pixel points, and then the positions of the image pixels are adjusted according to the offset of the image pixel points in the adjacent frames, so that the aligned adjacent frames can be obtained.
In the embodiment of the present application, after receiving the target frame and the adjacent frame, the adjacent frame may be warped toward the target frame by calculation to perform alignment. The specific calculation may be as follows:

Let the target frame be A_i and the adjacent frame of the target frame be A_{i+1}. The estimation model between the two frames A_i and A_{i+1} is O, and O can estimate the displacement and direction of the corresponding pixel motion between the target frame and the adjacent frame. See formula (1):

O(A_i, A_{i+1}) = (O^x, O^y)    (1)

where O^x and O^y are the estimated x and y components respectively, and x and y represent the offset position parameters, i.e., the offsets, including direction and displacement. Through this formula, the estimation model O matches the (x, y) of each pixel in A_{i+1} to A_i. See formula (2):

Ã_{i+1}(x, y) = A_{i+1}(x + O^x(x, y), y + O^y(x, y))    (2)

where x and y respectively represent the initial position parameters of each pixel point in the adjacent frame; A_{i+1} is then processed according to this formula to obtain the aligned adjacent frame Ã_{i+1}.
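As an illustration only (not part of the patent text), the warping of formula (2) can be implemented with bilinear grid sampling. The sketch below assumes PyTorch, NCHW tensors, and a flow tensor produced by some estimation model O; the function and variable names are hypothetical:

```python
import torch
import torch.nn.functional as F

def warp_to_target(neighbor: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp the adjacent frame A_{i+1} toward the target frame (formula (2)).

    neighbor: (N, C, H, W) adjacent frame; flow: (N, 2, H, W) offsets
    (O^x, O^y) predicted by an estimation model O (assumed given).
    """
    _, _, h, w = neighbor.shape
    # Base grid holding the initial (x, y) position of every pixel.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=neighbor.dtype, device=neighbor.device),
        torch.arange(w, dtype=neighbor.dtype, device=neighbor.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # x + O^x(x, y)
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # y + O^y(x, y)
    # grid_sample expects coordinates normalized to [-1, 1], in (x, y) order.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(neighbor, grid, mode="bilinear", align_corners=True)
```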
Further, image feature extraction is performed on the target frame and the aligned adjacent frame respectively. Specifically, this is done through an encoder network F_θ with trainable parameters θ. The encoder network F_θ comprises 3 convolutional layers and 18 residual blocks, 6 of which are used at full resolution, 6 at half resolution, and 6 at quarter resolution. The encoder network converts a picture into a feature representation: the target frame and the adjacent frame are input into the encoder network, and feature extraction is performed on them to obtain the target image feature F_θ(A_i) of the target frame and the adjacent image feature F_θ(Ã_{i+1}) of the adjacent frame.
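A minimal sketch of such an encoder is shown below, assuming PyTorch. The patent fixes only the layer counts (3 convolutional layers and 18 residual blocks, split 6/6/6 across full, half and quarter resolution), so the channel widths, kernel sizes and strides here are assumptions:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class Encoder(nn.Module):
    """F_theta: 3 conv layers and 18 residual blocks, 6 per resolution.

    The widths (32/64/96) and strides are illustrative assumptions; the
    patent only fixes the layer counts and the three resolutions.
    """
    def __init__(self):
        super().__init__()
        self.full = nn.Sequential(      # full resolution
            nn.Conv2d(3, 32, 3, stride=1, padding=1),
            *[ResidualBlock(32) for _ in range(6)],
        )
        self.half = nn.Sequential(      # half resolution
            nn.Conv2d(32, 64, 3, stride=2, padding=1),
            *[ResidualBlock(64) for _ in range(6)],
        )
        self.quarter = nn.Sequential(   # quarter resolution
            nn.Conv2d(64, 96, 3, stride=2, padding=1),
            *[ResidualBlock(96) for _ in range(6)],
        )

    def forward(self, frame):
        f1 = self.full(frame)
        f2 = self.half(f1)
        f4 = self.quarter(f2)
        return f1, f2, f4  # features at the three scales
```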
103. The amount of correlation of the target frame and the neighboring frame is calculated based on the target image feature and the neighboring image feature.
The correlation quantity refers to visual correlation between the target frame and the adjacent frame, and the visual correlation comprises similarity between the corresponding characteristics of the target frame and the adjacent frame.
In some embodiments, the step of "calculating the correlation amount of the target frame and the neighboring frame based on the target image feature and the neighboring image feature" may include the following operations:
determining the number of feature channels of the target image features and the adjacent image features;
performing point multiplication on the feature vector of the target image feature of each feature channel and the feature vector of the adjacent image feature to obtain a first feature vector of the number of feature channels;
summing the first feature vectors of the number of the feature channels to obtain a second feature vector;
and calculating an index of the second feature vector to obtain a correlation quantity.
The number of feature channels refers to the channel dimension of the frame feature, and the number of feature channels can be set according to practical situations.
In the embodiment of the present application, the calculation formula of the correlation amount may be as follows:
C^{i←i+1}_{xyuv} = exp( Σ_c F_θ(A_i)_{xyc} · F_θ(Ã_{i+1})_{uvc} )

where the target frame may be A_i and the adjacent frame of the target frame may be A_{i+1}; F_θ(A_i)_{xyc} refers to the target image feature of the target frame in the c-dimensional feature channel, and F_θ(Ã_{i+1})_{uvc} refers to the adjacent image feature of the adjacent frame in the c-dimensional feature channel; exp refers to the exponential function with the natural constant e as base; and Σ_c F_θ(A_i)_{xyc} · F_θ(Ã_{i+1})_{uvc} represents the point multiplication of the feature pair of the target frame and the adjacent frame. The dot product, also called the scalar product, yields the length of the projection of one vector in the direction of the other vector, which is a scalar.
Further, the feature vector of the target image feature of each feature channel and the feature vector of the adjacent image feature are subjected to point multiplication to obtain c first feature vectors, then the c first feature vectors are summed to obtain a second feature vector, and finally the index of the second feature vector is calculated to obtain the correlation quantity of the target frame and the adjacent frame.
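Assuming feature tensors of shape (N, C, H, W), the per-channel dot product, the sum over channels and the exponent can be expressed as one einsum followed by exp. This is a sketch only; note that the full (H, W, H, W) correlation volume is memory-heavy at high resolution:

```python
import torch

def correlation_volume(target_feat: torch.Tensor, neighbor_feat: torch.Tensor) -> torch.Tensor:
    """C^{i<-i+1}_{xyuv} = exp( sum_c F(A_i)_{xyc} * F(A~_{i+1})_{uvc} ).

    Both inputs: (N, C, H, W). Output: (N, H, W, H, W), where (x, y)
    indexes a target-frame position and (u, v) an adjacent-frame position.
    """
    dot = torch.einsum("ncxy,ncuv->nxyuv", target_feat, neighbor_feat)
    return torch.exp(dot)
```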
In some embodiments, to aggregate the features on the basis of the feature correspondence, after the step of "calculating the correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature", the following steps may be further included:
normalizing the correlation quantity to obtain a normalized correlation quantity;
the step of normalizing the correlation quantity to obtain a normalized correlation quantity may include the following operations:
Calculating the sum value of the correlation quantity in the appointed characteristic channel;
and calculating the ratio of the correlation quantity to the sum value to obtain a normalized correlation quantity.
In the embodiment of the present application, the normalized correlation amount may be calculated by the following calculation formula:
Ĉ^{i←i+1}_{xyuv} = C^{i←i+1}_{xyuv} / ( Σ_u Σ_v C^{i←i+1}_{xyuv} )

where C^{i←i+1}_{xyuv} refers to the correlation amount of the target frame and the adjacent frame; the correlation amount is a correlation matrix describing the correspondence between a specific feature F_θ(A_i)_{xy} and the adjacent image feature at all spatial locations (u, v).

Σ_u Σ_v denotes that the sum is first applied to the v dimension and then to the u dimension, which gives the sum of the correlation amount over the specified feature channel; the ratio of the correlation amount to this sum is then calculated to obtain the normalized correlation amount Ĉ^{i←i+1}_{xyuv}.
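Continuing the sketch above under the same assumed shapes, the normalization divides each entry by the sum of the correlation amounts over the neighbor positions, summing first over v and then over u:

```python
import torch

def normalize_correlation(corr: torch.Tensor) -> torch.Tensor:
    """corr: (N, H, W, H, W). Sum over v, then over u, and take the ratio."""
    denom = corr.sum(dim=-1).sum(dim=-1)   # (N, H, W): sum over (u, v)
    return corr / denom[..., None, None]   # normalized amount C^_{xyuv}
```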
The step of determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image feature may comprise the following operations:
a target feature map is determined based on the normalized correlation quantity and the neighboring image features.
In some embodiments, in order to enhance the deblurring effect on the target frame, the neighboring frames may include a forward neighboring frame and a backward neighboring frame, the neighboring image features include a first neighboring image feature corresponding to the forward neighboring frame and a second neighboring image feature corresponding to the backward neighboring frame, and the step of "calculating the correlation amount of the target frame and the neighboring frame based on the target image feature and the neighboring image feature" may include the following operations:
Calculating a first correlation amount of the target frame and the forward adjacent frame based on the target image feature and the first adjacent image feature;
a second correlation amount of the target frame and the backward neighboring frame is calculated based on the target image feature and the second neighboring image feature.
For example, the target frame may be A_i, the forward adjacent frame may be A_{i-1}, and the backward adjacent frame may be A_{i+1}. The image feature extracted from the target frame may be F_θ(A_i), the image feature extracted from the forward adjacent frame may be F_θ(Ã_{i-1}), and the image feature extracted from the backward adjacent frame may be F_θ(Ã_{i+1}).
Further, the formula for calculating the first correlation amount of the target frame and the forward adjacent frame is as follows:
C^{i←i-1}_{xyuv} = exp( Σ_c F_θ(A_i)_{xyc} · F_θ(Ã_{i-1})_{uvc} )

where F_θ(A_i)_{xyc} refers to the target image feature of the target frame in the c-dimensional feature channel, and F_θ(Ã_{i-1})_{uvc} refers to the adjacent image feature of the forward adjacent frame in the c-dimensional feature channel. First, the feature vector of the target image feature of each feature channel is point-multiplied with the feature vector of the first adjacent image feature to obtain c first feature vectors; the c first feature vectors are then summed to obtain a second feature vector; finally, the exponent of the second feature vector is calculated to obtain the first correlation amount C^{i←i-1} of the target frame and the forward adjacent frame.
Then, the formula for calculating the second correlation amount of the target frame and the backward adjacent frame is as follows:
C^{i←i+1}_{xyuv} = exp( Σ_c F_θ(A_i)_{xyc} · F_θ(Ã_{i+1})_{uvc} )

where F_θ(A_i)_{xyc} refers to the target image feature of the target frame in the c-dimensional feature channel, and F_θ(Ã_{i+1})_{uvc} refers to the adjacent image feature of the backward adjacent frame in the c-dimensional feature channel. First, the feature vector of the target image feature of each feature channel is point-multiplied with the feature vector of the second adjacent image feature to obtain c first feature vectors; the c first feature vectors are then summed to obtain a second feature vector; finally, the exponent of the second feature vector is calculated to obtain the second correlation amount C^{i←i+1} of the target frame and the backward adjacent frame.
104. A target feature map for the neighboring frame is determined based on the correlation amount and the neighboring image features.
In some embodiments, the step of determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image feature may include the following operations:
acquiring initial feature mapping of adjacent image features;
a target feature map is determined based on the initial feature map and the correlation amount.
Wherein, the initial feature map of the adjacent image features refers to the feature map of the adjacent frame. The target feature map of the adjacent frame is then calculated from the initial feature map and the correlation amount.
In some embodiments, the step of "determining a target feature map based on the initial feature map and the correlation amount" may include the following operations:
Performing convolution operation on the initial feature map to obtain a feature map after convolution;
and calculating the product of the characteristic mapping after convolution and the correlation quantity to obtain the target characteristic mapping.
Specifically, a convolution operation is performed on the initial feature map F_θ(Ã_{i+1}) of the adjacent frame, and the convolved feature map is obtained as g(F_θ(Ã_{i+1})), where g(·) denotes the convolution operation.
In some embodiments, in order to preserve high-resolution image detail, an L = 4 layer feature pyramid may be used. The step of "calculating the product of the convolved feature map and the correlation quantity to obtain the target feature map" may include the following operations:
calculating the product of the feature mapping after convolution and the correlation quantity in each layer of feature pyramid to obtain the sub-feature mapping of each layer of feature pyramid;
and aggregating the sub-feature maps of all feature pyramids to obtain a target feature map.
Specifically, the product of the feature map after convolution and the correlation quantity in each layer of feature pyramid is calculated. The following calculation formula can be used:
ρ(A_{i←i+1})^k_{xy} = Σ_{u,v} Ĉ^{i←i+1,k}_{xyuv} · g(F_θ(Ã_{i+1}))^k_{uv}

where Ĉ^{i←i+1,k} represents the correlation amount of the target frame with the adjacent frame at the k-th layer of the feature pyramid, and g(F_θ(Ã_{i+1}))^k represents the convolved feature map of the adjacent frame at that layer. Calculating their product gives the feature map of the adjacent frame on each layer of the feature pyramid, i.e., the sub-feature map ρ(A_{i←i+1})^k.
Specifically, the sub-feature mappings of all feature pyramids are aggregated, and the following calculation formula can be adopted:
ρ(A_{i←i+1}) = ‖_{k=1}^{L} ρ(A_{i←i+1})^k

where ‖ represents the connection operation, i.e., the aggregated feature maps from each pyramid layer are joined along the channel dimension. The sub-feature maps are aggregated through this formula, and the target feature map ρ(A_{i←i+1}) is obtained.
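Under the same assumed shapes, the per-level product (a weighted aggregation over all neighbor positions) and the connection across pyramid levels might look as follows; upsampling every level to a common size before concatenation is an assumption, since the patent only states that the levels are joined along the channel dimension:

```python
import torch
import torch.nn.functional as F

def aggregate_level(norm_corr: torch.Tensor, conv_feat: torch.Tensor) -> torch.Tensor:
    """rho^k_{xy} = sum_{u,v} C^_{xyuv} * g(F)^k_{uv}.

    norm_corr: (N, H, W, H, W) normalized correlation at pyramid level k;
    conv_feat: (N, C, H, W) convolved adjacent-frame feature at that level.
    """
    return torch.einsum("nxyuv,ncuv->ncxy", norm_corr, conv_feat)

def aggregate_pyramid(levels: list[torch.Tensor], size: tuple[int, int]) -> torch.Tensor:
    """Join the L per-level sub-feature maps along the channel dimension."""
    resized = [
        F.interpolate(m, size=size, mode="bilinear", align_corners=False)
        for m in levels
    ]
    return torch.cat(resized, dim=1)  # rho(A_{i<-i+1})
```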
In some embodiments, when there are multiple neighboring frames, and the target feature map of each neighboring frame needs to be calculated, the step of determining the target feature map of the neighboring frame based on the correlation amount and the neighboring image features may include the following operations:
determining a first feature map for the forward neighboring frame based on the first correlation amount and the first neighboring image feature;
determining a second feature map for the backward neighboring frame based on the second correlation amount and the second neighboring image feature;
and connecting the first feature map with the second feature map to obtain a target feature map.
The first correlation amount refers to the correlation amount of the target frame and the forward adjacent frame; the first adjacent image feature is the image feature of the forward adjacent frame; and the first feature map is the aggregated feature obtained by aggregating the sub-feature maps of each layer of the feature pyramid of the forward adjacent frame.
Specifically, calculating the first feature map according to the first correlation amount and the first neighboring image feature may be according to the following calculation formula:
ρ(A_{i←i-1})^k_{xy} = Σ_{u,v} Ĉ^{i←i-1,k}_{xyuv} · g(F_θ(Ã_{i-1}))^k_{uv}

where Ĉ^{i←i-1,k} represents the correlation amount of the target frame with the forward adjacent frame at the k-th layer of the feature pyramid, and g(F_θ(Ã_{i-1}))^k represents the convolved feature map of the forward adjacent frame. Calculating their product gives the feature map of the forward adjacent frame on each layer of the feature pyramid, i.e., the sub-feature map ρ(A_{i←i-1})^k. The sub-feature maps of the forward adjacent frame in all feature pyramid layers are then aggregated through the connection operation to obtain the first feature map.
Specifically, calculating the second feature map from the second correlation amount and the second adjacent image feature may be according to the following calculation formula:
ρ(A_{i←i+1})^k_{xy} = Σ_{u,v} Ĉ^{i←i+1,k}_{xyuv} · g(F_θ(Ã_{i+1}))^k_{uv}

where Ĉ^{i←i+1,k} represents the correlation amount of the target frame with the backward adjacent frame at the k-th layer of the feature pyramid, and g(F_θ(Ã_{i+1}))^k represents the convolved feature map of the backward adjacent frame. Calculating their product gives the feature map of the backward adjacent frame on each layer of the feature pyramid, i.e., the sub-feature map ρ(A_{i←i+1})^k. The sub-feature maps of the backward adjacent frame in all feature pyramid layers are then aggregated through the connection operation to obtain the second feature map.
Further, the first feature map and the second feature map are connected, and then the target feature map can be obtained.
In some embodiments, referring to fig. 2, fig. 2 is a schematic process flow diagram of an image processing method provided by an embodiment of the present application. As shown in fig. 2, a dot-product operation is performed on the extracted image features of the target frame and the adjacent frame to obtain the correlation amount; the correlation amount is normalized to obtain the normalized correlation amount; a convolution operation is performed on the feature map of the adjacent frame to obtain the convolved feature map; the normalized correlation amount is then multiplied with the convolved feature map to obtain the aggregated feature. Since a multi-layer feature pyramid is designed, the aggregated feature corresponding to each layer of the feature pyramid can be calculated, and the aggregated features are then connected to obtain the target feature map.
105. And performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame.
Specifically, the image restoration processing is performed on the target feature map, and the target feature map can be processed through the clear frame recovery network, so that a processed image frame corresponding to the target frame can be obtained.
Referring to fig. 3, fig. 3 is a schematic diagram of a network model according to an embodiment of the application. As shown in fig. 3, the network comprises, in sequence: an input layer, a convolution layer, three residual blocks, a long short-term memory layer, three residual blocks, a deconvolution layer, three residual blocks, three deconvolution layers, a deconvolution layer and an output layer.
Wherein one convolutional layer and three residual blocks form an encoder network and three residual blocks and one deconvolution layer form a decoder network.
Specifically, the processing flow of the target feature map through the clear frame recovery network may be as follows: the target feature map is input through the input layer and encoded by the encoder, and then passes through the long short-term memory layer, where Long Short-Term Memory (LSTM) is a special recurrent neural network mainly used to solve the problems of vanishing and exploding gradients when training on long sequences. Finally, the decoder maps the feature representation back to an image representation, and the sharp frame is output by the output layer.
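The following is an illustrative sketch of such a network, reusing the ResidualBlock from the encoder sketch above; the convolutional LSTM cell and the channel width are assumptions, since the patent names only the layer sequence:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell (an illustrative assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 4 * channels, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class RestorationNet(nn.Module):
    """Encoder (conv + 3 residual blocks) -> ConvLSTM -> decoder
    (3 residual blocks + deconv); a simplification of the fuller layer
    sequence listed above. ResidualBlock is the one from the encoder
    sketch, and the channel width is an assumed value."""
    def __init__(self, in_ch: int, width: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1),
            *[ResidualBlock(width) for _ in range(3)],
        )
        self.lstm = ConvLSTMCell(width)
        self.decoder = nn.Sequential(
            *[ResidualBlock(width) for _ in range(3)],
            nn.ConvTranspose2d(width, 3, 3, padding=1),  # back to RGB
        )

    def forward(self, feat, state):
        x = self.encoder(feat)
        x, state = self.lstm(x, state)
        return self.decoder(x), state
```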
In some embodiments, in order to take into account intra-frame motion during the exposure time, an inter-frame correlation amount, that is, a correlation amount calculated between the target frame and itself, may also be used. The step of "performing image restoration processing on the target feature map" may further include the following steps:
determining an inter-frame correlation amount of a target frame based on the target image features;
determining a feature map of the target frame based on the inter-frame correlation;
wherein, the calculation of the inter-frame correlation amount may be according to the following calculation formula:
C^{i←i}_{xyuv} = exp( Σ_c F_θ(A_i)_{xyc} · F_θ(A_i)_{uvc} )

where F_θ(A_i)_{xyc} refers to the target image feature of the target frame in the c-dimensional feature channel. First, the feature vector of the target image feature of each feature channel is point-multiplied with itself to obtain c first feature vectors; the c first feature vectors are then summed to obtain a second feature vector; finally, the exponent of the second feature vector is calculated to obtain the inter-frame correlation amount C^{i←i} of the target frame.
Wherein, calculating the feature map of the target frame based on the inter-frame correlation may be according to the following calculation formula:
ρ(A_{i←i})^k_{xy} = Σ_{u,v} Ĉ^{i←i,k}_{xyuv} · g(F_θ(A_i))^k_{uv}

where Ĉ^{i←i,k} represents the correlation amount of the target frame with itself at the k-th layer of the feature pyramid, and g(F_θ(A_i))^k represents the convolved feature map of the target frame. Calculating their product gives the feature of the target frame on each layer of the feature pyramid, i.e., the sub-feature map ρ(A_{i←i})^k. The sub-feature maps of the target frame in all feature pyramid layers are then aggregated through the connection operation to obtain the feature map of the target frame.
The step of performing image restoration processing on the target feature map may include the following operations:
connecting the feature map of the target frame with the target feature map to obtain a connected feature map;
and performing image restoration processing on the connected feature map.
For example, the aggregated feature map of the forward adjacent frame may be a first aggregated feature map, the aggregated feature map of the backward adjacent frame may be a second aggregated feature map, and the aggregated feature map of the target frame may be a third aggregated feature map. The first, second and third aggregated feature maps are connected to obtain the connected feature map, which is then input into the clear frame recovery network for image restoration processing to obtain the processed image frame corresponding to the target frame.
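Putting the pieces together under the same assumptions (the tensor sizes here are placeholders), the three aggregated feature maps are connected along the channel dimension and passed through the RestorationNet from the sketch above:

```python
import torch

# Assumed precomputed aggregated maps (placeholder sizes): forward
# neighbor, backward neighbor, and the target frame itself.
rho_fwd = torch.randn(1, 64, 180, 320)
rho_bwd = torch.randn(1, 64, 180, 320)
rho_self = torch.randn(1, 64, 180, 320)

connected = torch.cat([rho_fwd, rho_bwd, rho_self], dim=1)  # channel-wise join
net = RestorationNet(in_ch=connected.size(1))  # from the sketch above
h0 = torch.zeros(1, 64, 180, 320)              # zero-initialized LSTM state
sharp_frame, _ = net(connected, (h0, h0.clone()))
```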
The embodiment of the application discloses an image processing method, which comprises the following steps: acquiring a target frame in a target video and the adjacent frames of the target frame; respectively extracting the features of the target frame and the adjacent frames to obtain the target image feature corresponding to the target frame and the adjacent image features corresponding to the adjacent frames; calculating the correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature; determining the target feature map of the adjacent frame based on the correlation amount and the adjacent image feature; and performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame. The method determines the target video frame in the target video and the adjacent video frames of the target video frame, warps the adjacent video frames onto the target video frame through an estimation model, then performs feature extraction on the target video frame and the adjacent video frames respectively, calculates the correlation amounts between the target video frame and the adjacent video frames over the whole feature space, normalizes the correlation amounts, multiplies the normalized correlation amounts by the convolved features of the adjacent video frames to obtain a plurality of feature maps, aggregates the feature maps to obtain the target aggregated feature map, and recovers a sharp target video frame after inputting the target aggregated feature map into the image restoration network, so that the video deblurring effect can be improved.
In order to facilitate better implementation of the image processing method provided by the embodiment of the application, the embodiment of the application also provides a device based on the image processing method. Where the meaning of the terms is the same as in the image processing method described above, specific implementation details may be referred to in the description of the method embodiments.
Referring to fig. 4, fig. 4 is a block diagram of an image processing apparatus according to an embodiment of the present application, which can be applied to a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palm computer, a portable media player (Portable Media Player, PMP), and a fixed terminal such as a desktop computer, and the apparatus includes:
an acquiring unit 301, configured to acquire a target frame in a target video, and an adjacent frame of the target frame;
an extracting unit 302, configured to perform feature extraction on the target frame and the adjacent frame, respectively, to obtain a target image feature corresponding to the target frame and an adjacent image feature corresponding to the adjacent frame;
a calculating unit 303 for calculating a correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature;
a first determining unit 304 for determining a target feature map of the neighboring frame based on the correlation amount and the neighboring image feature;
The first processing unit 305 is configured to perform image restoration processing on the target feature map, so as to obtain a processed image frame corresponding to the target frame.
In some embodiments, the computing unit 303 may include:
a first determining subunit, configured to determine a number of feature channels of the target image feature and the neighboring image feature;
the point multiplication subunit is used for carrying out point multiplication on the feature vector of the target image feature of each feature channel and the feature vector of the adjacent image feature to obtain a first feature vector of the number of the feature channels;
the summing subunit is used for summing the first feature vectors of the number of the feature channels to obtain a second feature vector;
and the first calculating subunit is used for calculating the index of the second characteristic vector to obtain the correlation quantity.
In some embodiments, the apparatus may further comprise:
and the second processing unit is used for carrying out normalization processing on the correlation quantity to obtain a normalized correlation quantity.
In some embodiments, the first determining unit 304 may include:
a second determination subunit configured to determine the target feature map based on the normalized correlation quantity and the neighboring image features.
In some embodiments, the second processing unit may include:
a second calculating subunit, configured to calculate a sum of the correlation amounts in a specified feature channel;
and the third calculating subunit is used for calculating the ratio of the correlation quantity to the sum value to obtain the normalized correlation quantity.
In some embodiments, the first determining unit 304 may include:
an acquisition subunit, configured to acquire an initial feature map of the neighboring image features;
and a third determining subunit configured to determine the target feature map based on the initial feature map and the correlation amount.
In some embodiments, the third determining subunit may be specifically configured to:
performing convolution operation on the initial feature map to obtain a feature map after convolution;
and calculating the product of the characteristic mapping after convolution and the correlation quantity to obtain the target characteristic mapping.
In some embodiments, the third determining subunit may be specifically configured to:
performing convolution operation on the initial feature map to obtain a feature map after convolution;
calculating the product of the characteristic mapping after convolution and the correlation quantity in each layer of characteristic pyramid to obtain the sub-characteristic mapping of each layer of characteristic pyramid; and aggregating the sub-feature maps of all feature pyramids to obtain the target feature map.
In some embodiments, the computing unit 303 may include:
a fourth calculation subunit configured to calculate a first correlation amount of the target frame and the forward neighboring frame based on the target image feature and the first neighboring image feature;
a fifth calculation subunit for calculating a second correlation amount of the target frame and the backward neighboring frame based on the target image feature and the second neighboring image feature.
In some embodiments, the first determining unit 304 may include:
a fourth determination subunit configured to determine a first feature map of the forward neighboring frame based on the first correlation amount and the first neighboring image feature;
a fifth determining subunit configured to determine a second feature map of the backward neighboring frame based on the second correlation amount and the second neighboring image feature;
and the first connection subunit is used for connecting the first feature map with the second feature map to obtain the target feature map.
In some embodiments, the apparatus may further comprise:
a second determination unit configured to determine an inter-frame correlation amount of the target frame based on the target image feature;
and a third determining unit configured to determine a feature map of the target frame based on the inter-frame correlation amount.
In some embodiments, the first processing unit 305 may include:
the second connection subunit is used for connecting the feature map of the target frame with the target feature map to obtain a connected feature map;
and the first processing subunit is used for carrying out image restoration processing on the connected feature map.
In some embodiments, the extraction unit 302 may include:
a second processing subunit, configured to perform alignment processing on the adjacent frames based on the target frame, to obtain aligned adjacent frames;
and the extraction subunit is used for respectively extracting the characteristics of the target frame and the aligned adjacent frames to obtain the target image characteristics and the adjacent image characteristics.
The embodiment of the application discloses an image processing device. The acquiring unit 301 acquires a target frame in a target video and the adjacent frames of the target frame; the extracting unit 302 performs feature extraction on the target frame and the adjacent frames respectively to obtain the target image feature corresponding to the target frame and the adjacent image features corresponding to the adjacent frames; the calculating unit 303 calculates the correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature; the first determining unit 304 determines the target feature map of the adjacent frame based on the correlation amount and the adjacent image feature; and the first processing unit 305 performs image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame. In this way, the video deblurring effect can be improved.
The embodiment of the application also provides a terminal. As shown in fig. 5, the terminal may include Radio Frequency (RF) circuitry 601, a memory 602 including one or more storage media, an input unit 603, a display unit 604, a sensor 605, audio circuitry 606, a wireless fidelity (WiFi, Wireless Fidelity) module 607, a processor 608 including one or more processing cores, and a power supply 609. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 5 is not limiting of the terminal, and the terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The RF circuit 601 may be used for receiving and transmitting signals during the process of receiving and transmitting information; in particular, after downlink information of a base station is received, it is handed to the one or more processors 608 for processing; in addition, uplink data is transmitted to the base station. Typically, the RF circuitry 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM, Subscriber Identity Module) card, a transceiver, a coupler, a low noise amplifier (LNA, Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuitry 601 may also communicate with networks and other devices through wireless communications.
The memory 602 may be used to store software programs and modules, and the processor 608 performs various functional applications and image processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 608 and the input unit 603 with access to the memory 602.
The input unit 603 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, the input unit 603 may include a touch-sensitive surface, as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations thereon or thereabout by a user using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection means according to a predetermined program. The input unit 603 may comprise other input devices in addition to a touch sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 604 may be used to display information input by a user or information provided to the user, as well as various graphical user interfaces of the terminal, which may be composed of graphics, text, icons, video and any combination thereof. The display unit 604 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD, Liquid Crystal Display), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch-sensitive surface may overlay the display panel; upon detecting a touch operation on or near it, the touch-sensitive surface passes the operation to the processor 608 to determine the type of touch event, and the processor 608 then provides a corresponding visual output on the display panel based on the type of touch event. Although in fig. 5 the touch-sensitive surface and the display panel are implemented as two separate components for input and output functions, in some embodiments the touch-sensitive surface may be integrated with the display panel to implement the input and output functions.
The terminal may also include at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and the backlight when the terminal moves to the ear.
The audio circuit 606, a speaker, and a microphone may provide an audio interface between the user and the terminal. The audio circuit 606 may transmit the electrical signal converted from received audio data to the speaker, where it is converted into a sound signal for output; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received by the audio circuit 606 and converted into audio data; after being processed by the audio data output processor 608, the audio data is transmitted, for example, to another terminal via the RF circuit 601, or output to the memory 602 for further processing. The audio circuit 606 may also include an earphone jack to provide communication between a peripheral earphone and the terminal.
WiFi is a short-range wireless transmission technology. Through the WiFi module 607, the terminal can help the user send and receive e-mail, browse web pages, access streaming media, and the like, thereby providing the user with wireless broadband Internet access. Although fig. 5 shows the WiFi module 607, it is understood that the module is not an essential part of the terminal and can be omitted as required without changing the essence of the application.
The processor 608 is the control center of the terminal; it connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the software programs and modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the terminal as a whole. Optionally, the processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 608.
The terminal also includes a power supply 609 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically connected to the processor 608 via a power management system, so that charging, discharging, and power-consumption management are handled by the power management system. The power supply 609 may also include one or more of a direct-current or alternating-current power supply, a recharging system, a power-failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
Specifically, in this embodiment, the processor 608 in the terminal loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 608 executes the application programs stored in the memory 602, so as to implement various functions:
acquiring a target frame in a target video and adjacent frames of the target frame;
extracting features from the target frame and the adjacent frames respectively to obtain target image features corresponding to the target frame and adjacent image features corresponding to the adjacent frames;
calculating the correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature;
determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image features;
and performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame.
The embodiment of the present application discloses an image processing method, an image processing device, a storage medium, and a terminal. The image processing method includes: acquiring a target frame in a target video and adjacent frames of the target frame; extracting features from the target frame and the adjacent frames respectively to obtain target image features corresponding to the target frame and adjacent image features corresponding to the adjacent frames; calculating a correlation amount of the target frame and the adjacent frame based on the target image features and the adjacent image features; determining a target feature map for the adjacent frame based on the correlation amount and the adjacent image features; and performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame, thereby improving the deblurring effect on the video.
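For illustration only, the following Python sketch (using the PyTorch library) shows how the steps enumerated above might be composed for a single adjacent frame. The class name VideoDeblurSketch, the layer sizes, and the single-neighbor simplification are assumptions made for the sake of the example, not part of the disclosed embodiment:

import torch
import torch.nn as nn

class VideoDeblurSketch(nn.Module):
    # Illustrative sketch only; the network structure of the disclosed embodiment is not fixed here.
    def __init__(self, channels=64):
        super().__init__()
        self.extract = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.restore = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, target, neighbor):
        # Extract target image features and adjacent image features.
        f_t = self.extract(target)
        f_n = self.extract(neighbor)
        # Correlation amount: point multiplication per channel, summed over channels, then exponentiated.
        corr = torch.exp((f_t * f_n).sum(dim=1, keepdim=True))
        # Target feature map: adjacent image features weighted by the correlation amount.
        target_map = f_n * corr
        # Image restoration processing on the target feature map.
        return self.restore(target_map)

In this sketch the exponentiated channel-wise dot product plays the role of the correlation amount, and weighting the adjacent features by it plays the role of determining the target feature map; the embodiments described above may differ in network structure and in how multiple adjacent frames are fused.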
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a storage medium in which a plurality of instructions are stored, the instructions being capable of being loaded by a processor to perform the steps of any of the image processing methods provided by the embodiments of the present application. For example, the instructions may perform the following steps:
acquiring a target frame in a target video and adjacent frames of the target frame; extracting features from the target frame and the adjacent frames respectively to obtain target image features corresponding to the target frame and adjacent image features corresponding to the adjacent frames; calculating a correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature; determining a target feature map for the adjacent frame based on the correlation amount and the adjacent image features; and performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The instructions stored in the storage medium may perform the steps of any image processing method provided by the embodiments of the present application, and can therefore achieve the beneficial effects achievable by any image processing method provided by the embodiments of the present application; details are given in the previous embodiments and are not repeated here.
The image processing method, device, storage medium, and terminal provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and its core idea. Meanwhile, those skilled in the art may vary the specific implementations and the application scope in light of the ideas of the present application. In summary, the contents of this description should not be construed as limiting the present application.

Claims (14)

1. An image processing method, the method comprising:
acquiring a target frame in a target video and adjacent frames of the target frame;
extracting features from the target frame and the adjacent frames respectively to obtain target image features corresponding to the target frame and adjacent image features corresponding to the adjacent frames;
calculating a correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature;
determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image feature;
and performing image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame.
2. The method of claim 1, wherein the calculating a correlation amount of the target frame and the neighboring frame based on the target image feature and the neighboring image feature comprises:
determining the number of feature channels of the target image features and the adjacent image features;
performing point multiplication on the feature vector of the target image feature and the feature vector of the adjacent image feature in each feature channel to obtain first feature vectors, one per feature channel;
summing the first feature vectors over all the feature channels to obtain a second feature vector;
and calculating the exponent of the second feature vector to obtain the correlation amount.
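A minimal sketch of this computation, assuming the target image feature and the adjacent image feature are tensors of shape (channels, height, width) (the function name and tensor layout are assumptions for illustration):

import torch

def correlation_amount(f_target, f_neighbor):
    # f_target, f_neighbor: (C, H, W); C is the number of feature channels.
    first = f_target * f_neighbor   # point multiplication: C first feature vectors
    second = first.sum(dim=0)       # sum over the feature channels: second feature vector
    return torch.exp(second)        # exponent of the second feature vector: correlation amount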
3. The method according to claim 1, further comprising, after said calculating a correlation amount of said target frame and said adjacent frame based on said target image feature and said adjacent image feature:
normalizing the correlation amount to obtain a normalized correlation amount;
the determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image feature comprises:
determining the target feature map based on the normalized correlation amount and the neighboring image features.
4. The method according to claim 3, wherein the normalizing the correlation amount to obtain a normalized correlation amount comprises:
calculating the sum of the correlation amounts over a specified feature channel;
and calculating the ratio of the correlation amount to the sum to obtain the normalized correlation amount.
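Because the correlation amount from claim 2 is already an exponential, dividing by its sum over the specified channel yields a softmax-style weight. A sketch, with the choice of summation dimension left as an assumption:

import torch

def normalize_correlation(corr, dim=-1):
    # corr: correlation amounts as in claim 2; dim: the specified feature channel.
    total = corr.sum(dim=dim, keepdim=True)  # sum over the specified channel
    return corr / total                      # ratio: normalized correlation amount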
5. The method of claim 1, wherein the determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image feature comprises:
acquiring an initial feature map of the neighboring image features;
the target feature map is determined based on the initial feature map and the correlation amount.
6. The method of claim 5, wherein the determining the target feature map based on the initial feature map and the correlation amount comprises:
performing a convolution operation on the initial feature map to obtain a convolved feature map;
and calculating the product of the convolved feature map and the correlation amount to obtain the target feature map.
7. The method of claim 6, wherein the calculating the product of the convolved feature map and the correlation amount to obtain the target feature map comprises:
calculating the product of the convolved feature map and the correlation amount in each layer of a feature pyramid to obtain a sub-feature map for each layer of the feature pyramid;
and aggregating the sub-feature maps of all the feature pyramid layers to obtain the target feature map.
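One possible reading of claims 5 to 7 in code, where the pyramid is built by bilinear downsampling and the aggregation is a sum; both choices, together with all names and sizes, are assumptions not fixed by the claims:

import torch
import torch.nn as nn
import torch.nn.functional as F

def pyramid_target_feature_map(initial_map, corr, convs):
    # initial_map: (B, C, H, W) initial feature map of the neighboring image features.
    # corr: (B, 1, H, W) correlation amount; convs: an nn.ModuleList with one conv per level.
    size = initial_map.shape[-2:]
    subs = []
    for level, conv in enumerate(convs):
        if level == 0:
            feat, c = initial_map, corr
        else:
            feat = F.interpolate(initial_map, scale_factor=0.5 ** level, mode='bilinear')
            c = F.interpolate(corr, scale_factor=0.5 ** level, mode='bilinear')
        sub = conv(feat) * c  # convolved feature map times the correlation amount
        subs.append(F.interpolate(sub, size=size, mode='bilinear'))  # back to full resolution
    return torch.stack(subs).sum(dim=0)  # aggregate the sub-feature maps

# Example: a hypothetical three-level pyramid over 64-channel features.
convs = nn.ModuleList(nn.Conv2d(64, 64, 3, padding=1) for _ in range(3))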
8. The method of claim 1, wherein the neighboring frames comprise a forward neighboring frame and a backward neighboring frame, the neighboring image features comprising a first neighboring image feature corresponding to the forward neighboring frame and a second neighboring image feature corresponding to the backward neighboring frame;
The calculating a correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature includes:
calculating a first correlation amount of the target frame and the forward neighboring frame based on the target image feature and the first neighboring image feature;
and calculating a second correlation amount of the target frame and the backward neighboring frame based on the target image feature and the second neighboring image feature.
9. The method of claim 8, wherein the determining a target feature map for the neighboring frame based on the correlation amount and the neighboring image feature comprises:
determining a first feature map for the forward neighboring frame based on the first correlation amount and the first neighboring image feature;
determining a second feature map for the backward neighboring frame based on the second correlation amount and the second neighboring image feature;
and connecting the first feature map with the second feature map to obtain the target feature map.
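Under the same tensor-shape assumptions as the earlier sketches, the bidirectional case of claims 8 and 9 might look like this, where "connecting" is taken to mean channel-wise concatenation (an assumption):

import torch

def bidirectional_target_map(f_target, f_forward, f_backward):
    # Features of the target frame and its forward/backward neighboring frames: (C, H, W).
    corr_first = torch.exp((f_target * f_forward).sum(dim=0, keepdim=True))   # first correlation amount
    corr_second = torch.exp((f_target * f_backward).sum(dim=0, keepdim=True)) # second correlation amount
    first_map = f_forward * corr_first     # first feature map, for the forward neighboring frame
    second_map = f_backward * corr_second  # second feature map, for the backward neighboring frame
    return torch.cat([first_map, second_map], dim=0)  # connect into the target feature map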
10. The method of claim 1, further comprising, prior to said image restoration processing of said target feature map:
determining an inter-frame correlation amount of the target frame based on the target image feature;
determining a feature map for the target frame based on the inter-frame correlation amount;
the image restoration processing for the target feature map comprises the following steps:
connecting the feature map of the target frame with the target feature map to obtain a connected feature map;
and performing image restoration processing on the connected feature map.
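One plausible reading of claim 10, treating the target frame's "inter-frame correlation amount" as a self-correlation of its own features (an interpretation for illustration, not something the claim spells out), is sketched below:

import torch

def restoration_input(f_target, target_feature_map):
    # Self-correlation of the target frame's features, read as the inter-frame correlation amount.
    self_corr = torch.exp((f_target * f_target).sum(dim=0, keepdim=True))
    target_frame_map = f_target * self_corr
    # Connect the target frame's feature map with the neighbor-derived target feature map;
    # the connected feature map is what the image restoration step receives.
    return torch.cat([target_frame_map, target_feature_map], dim=0)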
11. The method of claim 1, wherein the extracting features from the target frame and the adjacent frames respectively comprises:
performing alignment processing on the adjacent frames based on the target frame to obtain aligned adjacent frames;
and extracting features from the target frame and the aligned adjacent frames respectively to obtain the target image features and the adjacent image features.
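Claim 11 leaves the alignment method open; flow-based warping is one common choice in video deblurring, sketched here purely under that assumption (the flow itself would have to come from a separate estimator):

import torch
import torch.nn.functional as F

def align_neighbor(neighbor, flow):
    # neighbor: (B, C, H, W) adjacent frame; flow: (B, 2, H, W) per-pixel (x, y) offsets.
    b, _, h, w = neighbor.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    base = torch.stack((xs, ys), dim=-1).float().to(neighbor.device)  # (H, W, 2)
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)               # (B, H, W, 2)
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack((2 * grid[..., 0] / (w - 1) - 1,
                        2 * grid[..., 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(neighbor, grid, align_corners=True)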
12. An image processing apparatus applied to a terminal, comprising:
an acquisition unit, configured to acquire a target frame in a target video and adjacent frames of the target frame;
an extraction unit, configured to extract features from the target frame and the adjacent frames respectively to obtain target image features corresponding to the target frame and adjacent image features corresponding to the adjacent frames;
a calculation unit, configured to calculate a correlation amount of the target frame and the adjacent frame based on the target image feature and the adjacent image feature;
a first determining unit, configured to determine a target feature map for the adjacent frame based on the correlation amount and the adjacent image feature;
and a first processing unit, configured to perform image restoration processing on the target feature map to obtain a processed image frame corresponding to the target frame.
13. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the image processing method of any one of claims 1 to 11.
14. A terminal comprising a processor and a memory, the memory storing a plurality of instructions, the processor loading the instructions to perform the image processing method of any one of claims 1 to 11.