CN112967200B - Image processing method, apparatus, electronic device, medium, and computer program product
- Publication number
- CN112967200B (application CN202110247381.3A)
- Authority
- CN
- China
- Prior art keywords
- target object
- target
- mask
- neural network
- network model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present disclosure relates to an image processing method, apparatus, electronic device, medium, and computer program product. The method comprises: obtaining a first mask of a visible region of a target object in an original image; stacking the original image and the first mask and inputting the stacked result into a first target neural network model to obtain a first feature map corresponding to the target object; and inputting this more accurate first feature map into a second target neural network model to obtain a target complete mask of the target object, thereby improving the accuracy of the obtained target complete mask.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, an apparatus, an electronic device, a medium, and a computer program product.
Background
In the field of image processing, it is often necessary to complete partially occluded target objects in an image. The target object may be a human body or another object. For example, if part of a human body in an image is occluded by an occluding object, the occluded part must be completed through a series of image processing steps to obtain the complete human body. Completing a target object usually requires predicting a target complete mask of the target object.
In the prior art, an original image is generally input into an instance segmentation network, a mask of a visible region of a target object is obtained through the instance segmentation network, and then a target complete mask of the target object is obtained through a neural network model based on the mask of the visible region of the target object.
However, the target complete mask obtained with this prior-art approach is not highly accurate.
Disclosure of Invention
To solve the above technical problems, the present disclosure provides an image processing method, an apparatus, an electronic device, a medium, and a computer program product.
In a first aspect, the present disclosure provides an image processing method, the method comprising:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object, and a part of the region of the target object is blocked by a blocking object;
stacking the original image and the first mask, and inputting the stacked result into a first target neural network model to obtain a first feature map corresponding to the target object;
inputting the first feature map into a second target neural network model to obtain a target complete mask of the target object;
The first target neural network model and the second target neural network model are obtained through training based on an original image sample, a reference visible region mask of a target object sample and a reference complete mask of the target object sample, wherein the original image sample contains the target object sample, and a part of regions of the target object sample are blocked by a blocking object.
Optionally, the method further comprises:
processing the first feature map through a multi-layer convolution network to obtain a second mask of the visible region of the target object.
Optionally, the method further comprises:
subtracting the second mask from the target complete mask of the target object to obtain a mask of the invisible region of the target object.
Optionally, before acquiring the first mask of the visible region of the target object in the original image, the method further includes:
Acquiring a third mask of a visible area of a target object sample in the original image sample;
stacking the original image sample and the third mask, and inputting the stacked result into a first neural network model to obtain a second feature map corresponding to the target object sample;
Inputting the second feature map into a second neural network model to obtain a target complete mask of the target object sample;
And training the first neural network model and the second neural network model by taking the reference complete mask of the target object sample and the reference visible area mask of the target object sample as supervision signals until the first neural network model and the second neural network model converge, taking the converged first neural network model as the first target neural network model, and taking the converged second neural network model as the second target neural network model.
Optionally, training the first neural network model and the second neural network model with the reference complete mask of the target object sample and the reference visible region mask of the target object sample as supervision signals includes:
processing the second feature map through a multi-layer convolution network to obtain a fourth mask of the visible region of the target object sample;
Acquiring a first loss function according to a fourth mask of the visible area of the target object sample and a reference visible area mask of the target object sample;
acquiring a second loss function according to the target complete mask of the target object sample and the reference complete mask of the target object sample;
Training the first neural network model and the second neural network model according to the first loss function and the second loss function.
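The disclosure names a first loss (fourth mask vs. reference visible region mask) and a second loss (predicted complete mask vs. reference complete mask) but does not state their form or how they are combined. A minimal sketch, assuming per-pixel binary cross-entropy for both losses and a weighted sum controlled by a hypothetical `weight` parameter; masks are flattened lists of probabilities in [0, 1]:

```python
import math
from typing import List, Tuple


def bce(pred: List[float], target: List[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between two equally long flat mask lists."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def cascade_loss(
    visible_pred: List[float],   # fourth mask (visible region, from the first model)
    visible_ref: List[float],    # reference visible region mask
    complete_pred: List[float],  # target complete mask (from the second model)
    complete_ref: List[float],   # reference complete mask
    weight: float = 1.0,         # hypothetical balance between the two losses
) -> Tuple[float, float, float]:
    loss1 = bce(visible_pred, visible_ref)    # "first loss function"
    loss2 = bce(complete_pred, complete_ref)  # "second loss function"
    return loss1 + weight * loss2, loss1, loss2
```

Because the first loss supervises an output computed from the second feature map, minimizing the summed loss trains both cascaded models jointly; the choice of optimizer (for example, gradient descent) is likewise an assumption here.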
Optionally, acquiring the first mask of the visible region of the target object in the original image includes:
a first mask of a visible region of a target object in an original image is acquired through an instance segmentation network.
Optionally, inputting the first feature map into a second target neural network model to obtain a target complete mask of the target object includes:
inputting the first feature map into a second target neural network model to obtain a third feature map;
And processing the third feature map through a multi-layer convolution network to obtain a target complete mask of the target object.
Optionally, inputting the first feature map into a second target neural network model to obtain a target complete mask of the target object includes:
The first feature map and a second mask of a visible region of the target object are stacked and then input into the second target neural network model, so that a third feature map is obtained;
And processing the third feature map through a multi-layer convolution network to obtain a target complete mask of the target object.
In a second aspect, the present disclosure provides an image processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a first mask of a visible area of a target object in an original image, wherein the original image contains the target object, and a part of area of the target object is blocked by a blocking object;
the processing module is used for stacking the original image and the first mask and inputting the stacked result into a first target neural network model to obtain a first feature map corresponding to the target object;
The processing module is further configured to input the first feature map into a second target neural network model to obtain a target complete mask of the target object;
The first target neural network model and the second target neural network model are obtained through training based on an original image sample, a reference visible region mask of a target object sample and a reference complete mask of the target object sample, wherein the original image sample contains the target object sample, and a part of regions of the target object sample are blocked by a blocking object.
Optionally, the processing module is further configured to process the first feature map through a multi-layer convolution network to obtain a second mask of the visible area of the target object.
Optionally, the processing module is further configured to subtract the second mask from the target complete mask of the target object to obtain a mask of the invisible area of the target object.
Optionally, the processing module is further configured to: obtain a third mask of the visible area of the target object sample in the original image sample; stack the original image sample and the third mask and input the stacked result into a first neural network model to obtain a second feature map corresponding to the target object sample; input the second feature map into a second neural network model to obtain a target complete mask of the target object sample; and train the first neural network model and the second neural network model with the reference complete mask of the target object sample and the reference visible area mask of the target object sample as supervision signals until both models converge, taking the converged first neural network model as the first target neural network model and the converged second neural network model as the second target neural network model.
Optionally, the processing module is specifically configured to: process the second feature map through a multi-layer convolution network to obtain a fourth mask of the visible area of the target object sample; obtain a first loss function from the fourth mask of the visible area of the target object sample and the reference visible area mask of the target object sample; obtain a second loss function from the target complete mask of the target object sample and the reference complete mask of the target object sample; and train the first neural network model and the second neural network model according to the first loss function and the second loss function.
Optionally, the acquisition module is specifically configured to acquire, through an instance segmentation network, a first mask of a visible region of the target object in the original image.
Optionally, the processing module is specifically configured to input the first feature map into a second target neural network model to obtain a third feature map, and to process the third feature map through a multi-layer convolution network to obtain a target complete mask of the target object.
Optionally, the processing module is specifically configured to stack the first feature map and a second mask of the visible region of the target object, input the stacked result into the second target neural network model to obtain a third feature map, and process the third feature map through a multi-layer convolution network to obtain a target complete mask of the target object.
In a third aspect, the present disclosure provides an electronic device comprising: a processor for executing a computer program stored in a memory, which when executed by the processor implements the steps of the method according to any of the first aspects.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.
In a fifth aspect, the present disclosure provides a computer program product which, when run on a computer, causes the computer to perform the image processing method of any one of the first aspects.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
The first target neural network model and the second target neural network model are trained on the original image sample, the reference visible area mask of the target object sample, and the reference complete mask of the target object sample, that is, they are trained specifically for the target object. Therefore, the first feature map obtained by stacking the original image and the first mask and inputting the result into the first target neural network model is more accurate, and inputting this more accurate first feature map into the second target neural network model improves the accuracy of the resulting target complete mask of the target object.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an image processing system provided by the present disclosure;
FIG. 2 is a schematic flow chart of an image processing provided in the present disclosure;
FIG. 3 is a schematic flow chart of another image processing provided in the present disclosure;
FIG. 4 is a schematic flow chart of still another image processing provided in the present disclosure;
FIG. 5 is a schematic structural view of an image processing apparatus according to the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
The original image of the present disclosure contains a target object, part of which is occluded by an occluding object. For convenience of description, the region of the target object occluded by the occluding object is called the invisible region of the target object, and the region of the target object not occluded by the occluding object is called the visible region of the target object. The target object may be a human body, an animal, or another object; the present disclosure is not limited in this respect.
To complete the invisible region of the target object in the original image, the mask of the invisible region of the target object must be predicted, and the target object is then completed based on that mask. The result can support image processing tasks such as target tracking, target detection, and image segmentation.
The mask of the invisible area of the target object is typically obtained based on the difference between the target complete mask of the target object and the mask of the visible area of the target object, and thus the accuracy of the target complete mask of the target object directly affects the accuracy of the mask of the invisible area of the target object.
To improve the accuracy of the obtained target complete mask of the target object, the present disclosure designs a cascaded neural network model, trains it on the original image sample, the reference visible region mask of the target object sample, and the reference complete mask of the target object sample, and applies the converged cascaded model to obtain the target complete mask of the target object.
The original image sample contains a target object sample, part of which is occluded by an occluding object. The reference visible region mask of the target object sample is an accurately acquired mask of the visible region of the target object sample, and the reference complete mask of the target object sample is an accurately acquired complete mask of the target object sample. For example, both reference masks may be obtained as follows: first capture an image containing the target object sample with no occluding object; then, keeping the pose of the target object sample unchanged, place an occluding object over part of the target object sample and capture a second image. Matting or other image processing of the target object sample in the first image yields the reference complete mask of the target object sample, and matting or other image processing of the target object sample in the second image yields the reference visible region mask of the target object sample.
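The two-shot capture procedure above can be sketched as follows, assuming the matting step has already produced binary 2-D masks for the unoccluded and occluded shots. The function name `reference_masks` and the subset check are illustrative additions, not part of the disclosure; the check reflects the requirement that the pose stays unchanged, so every visible pixel should also lie inside the complete mask.

```python
from typing import List, Tuple

Mask = List[List[int]]  # H x W, values 0 or 1


def reference_masks(unoccluded_matte: Mask, occluded_matte: Mask) -> Tuple[Mask, Mask, int]:
    """Return (reference complete mask, reference visible mask, violation count).

    unoccluded_matte: matting result of the first shot (no occluding object).
    occluded_matte:   matting result of the second shot (same pose, occluder placed).
    """
    if len(unoccluded_matte) != len(occluded_matte):
        raise ValueError("both shots must have the same height")
    complete = [row[:] for row in unoccluded_matte]
    visible = [row[:] for row in occluded_matte]
    # Count pixels marked visible but absent from the complete mask. With an
    # unchanged pose this should be zero; a nonzero count suggests the subject moved.
    violations = sum(
        v * (1 - c)
        for row_c, row_v in zip(complete, visible)
        for c, v in zip(row_c, row_v)
    )
    return complete, visible, violations
```

A sample pair with a nonzero violation count could, for instance, be re-shot or discarded before training; that filtering policy is an assumption.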
The target object of the present disclosure may be a human body, an animal, or other objects, and the following embodiments of the present disclosure are described and illustrated by taking the human body as an example, and other target objects are similar to the human body and are not described in detail.
The image processing method of the present disclosure is performed by an electronic device. The electronic device may be a tablet computer, a mobile phone (such as a foldable-screen or large-screen mobile phone), a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart television, a smart screen, a high-definition television, a 4K television, a smart speaker, a smart projector, or another Internet of Things (IoT) device; the present disclosure does not limit the specific type of the electronic device.
FIG. 1 is a schematic diagram of an image processing system provided in the present disclosure. As shown in FIG. 1, the system includes:
A first target neural network model 101 and a second target neural network model 102, each of which includes a plurality of network layers; the first target neural network model 101 and/or the second target neural network model 102 may be an hourglass network. The input of the first target neural network model 101 is the result of stacking the original image 104 with the first mask 103 of the visible region of the target object, and its output is the first feature map 105 of the target object. The first feature map 105 is the input of the second target neural network model 102; it may also be processed by a multi-layer convolution network 108 (also referred to as a segmentation head) to obtain the second mask 107. The output of the second target neural network model 102 is the third feature map 109, which may be processed by a multi-layer convolution network 110 (also referred to as a segmentation head) to obtain the target complete mask 106 of the target object.
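The data flow of FIG. 1 can be sketched as a function that wires the two models and two segmentation heads together. This is a minimal sketch, assuming tensors are H x W x C nested lists and that the models and heads are supplied as callables; the stub networks in the usage part are placeholders, not the hourglass networks mentioned above. The `stack_visible_mask` flag is a hypothetical switch between the two ways of feeding the second model that the disclosure describes (the first feature map alone, or the first feature map stacked with the second mask).

```python
from typing import Callable, List, Tuple

Tensor = List[List[List[float]]]  # H x W x C nested lists
Net = Callable[[Tensor], Tensor]


def stack(a: Tensor, b: Tensor) -> Tensor:
    """Concatenate two H x W x C tensors along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def cascade_forward(
    image: Tensor,           # original image 104, H x W x 3
    first_mask: Tensor,      # first mask 103, H x W x 1
    net1: Net,               # first target neural network model 101
    net2: Net,               # second target neural network model 102
    head_visible: Net,       # segmentation head 108
    head_complete: Net,      # segmentation head 110
    stack_visible_mask: bool = False,
) -> Tuple[Tensor, Tensor, Tensor]:
    feat1 = net1(stack(image, first_mask))      # first feature map 105
    second_mask = head_visible(feat1)           # second mask 107
    net2_in = stack(feat1, second_mask) if stack_visible_mask else feat1
    feat3 = net2(net2_in)                       # third feature map 109
    complete_mask = head_complete(feat3)        # target complete mask 106
    return feat1, second_mask, complete_mask


if __name__ == "__main__":
    def identity(t: Tensor) -> Tensor:          # placeholder network
        return t

    def last_channel(t: Tensor) -> Tensor:      # placeholder head: keep last channel
        return [[[px[-1]] for px in row] for row in t]

    img = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]  # 1 x 2 x 3
    m = [[[1.0], [0.0]]]                        # 1 x 2 x 1
    f1, m2, ma = cascade_forward(img, m, identity, identity, last_channel, last_channel)
    print(len(f1[0][0]), m2, ma)
```

Passing the networks in as callables keeps the wiring independent of any particular framework, so the same structure would hold if the stubs were replaced by trained models.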
In the present disclosure, the first feature map of the target object and the target complete mask of the target object are predicted by two cascaded target neural network models. Together the two models have more parameters and greater processing capacity, so the predicted target complete mask of the target object can be more accurate.
Fig. 2 is a schematic flow chart of image processing provided in the present disclosure, and with reference to fig. 1, the method in this embodiment is as follows:
S201: Acquire a first mask of a visible region of a target object in an original image.
The original image comprises a target object, and a part of the target object area is blocked by a blocking object. The visible region of the target object refers to the region of the target object displayed in the original image. The original image typically contains a background, a target object, and an occlusion.
Taking FIG. 1 as an example, the target object is a human body and the occluding object is grass. Part of the legs and the feet of the human body are occluded by the grass, while the head, the upper body, and the rest of the legs form the visible region of the human body.
Illustratively, the original image may have a size of H×W×3, where H is the height, W is the width, and 3 is the number of channels.
Optionally, the first mask of the visible region of the target object in the original image may be acquired through an instance segmentation network. The size of the first mask is H×W×1, and its values follow a 0-1 distribution. However, the instance segmentation network is not designed to handle occlusion: it segments objects of many kinds rather than being designed for the target object. When the target object is occluded, the first mask obtained with the instance segmentation network may therefore be incomplete or partially wrong, that is, of low accuracy.
In this step, denote the original image as $I_s$, the instance segmentation network as $N_i(\cdot)$, and the first mask as $M_i$. Then:

$$M_i = N_i(I_s)$$
S202: Stack the original image and the first mask, and input the stacked result into a first target neural network model to obtain a first feature map corresponding to the target object.
Specifically, the original image and the first mask may be stacked to obtain a stacking result, which is input into the first target neural network model to obtain a first feature map corresponding to the target object. Following the description in S201, the stacking result is an H×W×4 image, that is, a four-channel image.
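Stacking here means channel-wise concatenation: an H×W×3 image and an H×W×1 mask become one H×W×4 input. A minimal sketch, assuming channel-last nested lists and a mask given either as H×W or as H×W×1; the helper name `stack_image_and_mask` is hypothetical:

```python
from typing import List, Sequence, Union

Pixel = Sequence[float]
MaskValue = Union[float, Sequence[float]]


def stack_image_and_mask(
    image: List[List[Pixel]],      # H x W x 3
    mask: List[List[MaskValue]],   # H x W or H x W x 1
) -> List[List[List[float]]]:
    """Return the H x W x 4 stack (R, G, B, mask) fed to the first model."""
    h, w = len(image), len(image[0])
    if len(mask) != h or len(mask[0]) != w:
        raise ValueError("image and mask must share height and width")
    if len(image[0][0]) != 3:
        raise ValueError("image must have 3 channels")
    stacked = []
    for y in range(h):
        row = []
        for x in range(w):
            m = mask[y][x]
            m = m[0] if isinstance(m, (list, tuple)) else m  # unwrap H x W x 1
            row.append(list(image[y][x]) + [float(m)])       # 3 + 1 = 4 channels
        stacked.append(row)
    return stacked
```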
Denote the first feature map as $F_m$. The first target neural network model takes as input the channel-wise stack of $I_s$ and $M_i$ and outputs $F_m$.

The purpose of this step is to correct the first mask by means of the first target neural network model.
The first target neural network model is trained with the reference visible region mask of the target object sample as a supervision signal, that is, it is trained specifically for the target object, so the first feature map of the target object that it outputs is more accurate.
S203: Input the first feature map into a second target neural network model to obtain a target complete mask of the target object.
In one implementation manner, the first feature map is input into a second target neural network model to obtain a third feature map, and the third feature map is processed through a multi-layer convolution network to obtain a target complete mask of the target object.
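Specifically, denote the target complete mask of the target object as $M_a$. In this implementation, $M_a$ is obtained by inputting the first feature map $F_m$ into the second target neural network model and processing the resulting third feature map through the multi-layer convolution network.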
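The disclosure describes the segmentation head only as a multi-layer convolution network. One minimal reading, sketched below under that assumption, is a stack of 1×1 convolutions: a per-pixel hidden layer with ReLU, a one-channel output, a sigmoid, and a 0.5 threshold. The weights `w1`, `b1`, `w2`, `b2` and the threshold are hypothetical; a practical head would normally also use spatial kernels such as 3×3.

```python
import math
from typing import List, Tuple


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def seg_head(
    feature_map: List[List[List[float]]],  # H x W x C feature map
    w1: List[List[float]],                 # hidden x C weights of the first 1x1 conv
    b1: List[float],                       # hidden biases
    w2: List[float],                       # weights of the 1-channel output 1x1 conv
    b2: float,                             # output bias
    threshold: float = 0.5,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return per-pixel probabilities and the thresholded binary mask (both H x W)."""
    probs, mask = [], []
    for row in feature_map:
        prow, mrow = [], []
        for px in row:
            # 1x1 conv + ReLU: a small dense layer applied independently per pixel.
            hidden = [max(0.0, sum(w * v for w, v in zip(wr, px)) + b) for wr, b in zip(w1, b1)]
            p = _sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
            prow.append(p)
            mrow.append(1 if p >= threshold else 0)
        probs.append(prow)
        mask.append(mrow)
    return probs, mask
```

Returning both the probabilities and the binary mask keeps the soft output available for a training loss while the thresholded mask serves downstream steps such as S205.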
The second target neural network model is trained with the reference complete mask of the target object sample as a supervision signal, and its input comes from the first target neural network model, which is trained for the target object. Therefore, the target complete mask of the target object output through the second target neural network model is more accurate.
The first target neural network model and the second target neural network model in the present disclosure are trained together as a cascaded network based on an original image sample, a reference visible region mask of a target object sample, and a reference complete mask of the target object sample. This further improves the accuracy of the outputs of both models.
In this embodiment, the first target neural network model and the second target neural network model are trained on the original image sample, the reference visible region mask of the target object sample, and the reference complete mask of the target object sample, that is, they are trained specifically for the target object. Therefore, the first feature map obtained by inputting the original image and the first mask into the first target neural network model is more accurate, and inputting this more accurate first feature map into the second target neural network model improves the accuracy of the resulting target complete mask of the target object.
FIG. 3 is a schematic flow chart of another image processing method provided in the present disclosure. Based on the embodiment shown in FIG. 2, the method further includes:
S204: Process the first feature map through a multi-layer convolution network to obtain a second mask of the visible region of the target object.
In some scenes, image processing may be performed using a mask of a visible region, for example, in a scene of target tracking, target tracking may be performed based on only the upper body of a human body assuming that a target object is a human body.
Therefore, the present disclosure also processes the first feature map through a multi-layer convolution network to obtain the second mask of the visible region of the target object, so that such scenes can use this mask directly and the accuracy of image processing is improved.
In addition, the cascaded network formed by the first target neural network model and the second target neural network model yields the mask of the visible region of the target object and the target complete mask of the target object at the same time. This improves image processing efficiency in scenes that use both masks together.
Optionally, another implementation of S203 is: the first feature map and a second mask of a visible region of the target object are stacked and then input into the second target neural network model, so that a third feature map is obtained; and processing the third feature map through a multi-layer convolution network to obtain a target complete mask of the target object.
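Specifically, denote the first feature map as $F_1$ and the second mask as $M_m$. Correspondingly, the target complete mask $M_c$ of the target object is:
$$M_c = \mathrm{Conv}\big(N_2(\mathrm{Stack}(F_1, M_m))\big)$$
where $\mathrm{Stack}$ denotes channel-wise stacking, $N_2$ the second target neural network model, and $\mathrm{Conv}$ the multi-layer convolution network.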
Stacking the first feature map with the second mask of the visible region before inputting the result into the second target neural network model to obtain the third feature map, and then processing the third feature map through a multi-layer convolution network, can further improve the accuracy of the obtained target complete mask of the target object.
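This alternative changes only what the second model sees: the visible-region mask is appended to the first feature map as an extra channel. A minimal sketch, assuming channel-first NumPy arrays and treating the second model and the convolution head as callables; the function names are hypothetical.

```python
from typing import Callable

import numpy as np


def stack_feature_and_mask(feature_map: np.ndarray, second_mask: np.ndarray) -> np.ndarray:
    """Append the (H, W) visible-region mask to a (C, H, W) feature map as channel C."""
    if feature_map.shape[1:] != second_mask.shape:
        raise ValueError("feature map and mask must share the same spatial size")
    return np.concatenate([feature_map, second_mask[np.newaxis].astype(feature_map.dtype)], axis=0)


def complete_mask_from_stack(
    first_feature_map: np.ndarray,
    second_mask: np.ndarray,
    second_model: Callable[[np.ndarray], np.ndarray],
    conv_head: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Stack -> second model (third feature map) -> multi-layer conv head (complete mask)."""
    third_feature_map = second_model(stack_feature_and_mask(first_feature_map, second_mask))
    return conv_head(third_feature_map)
```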
Optionally, some scenarios apply the mask of the visible region and the target complete mask of the target object at the same time. For example, completing the target object relies on the mask of the invisible region, which is obtained from the target complete mask and the mask of the visible region. Because both of these masks are highly accurate, the resulting mask of the invisible region is also more accurate.
The method further comprises the following step:
S205: Take the difference between the target complete mask of the target object and the second mask to obtain the mask of the invisible region of the target object.
In the present disclosure, because the accuracy of both the target complete mask and the second mask is improved, the mask of the invisible region obtained by taking their difference is also more accurate. Image completion and other processing can then be performed based on this mask, further improving the completion result.
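For binary masks, the difference in S205 amounts to a set difference: pixels inside the complete mask but outside the visible mask. A minimal NumPy sketch, assuming both masks are (H, W) arrays of 0/1 values on the same grid; the function name `invisible_mask` is hypothetical.

```python
import numpy as np


def invisible_mask(complete_mask: np.ndarray, visible_mask: np.ndarray) -> np.ndarray:
    """Pixels inside the complete mask but outside the visible mask (binary (H, W) masks)."""
    complete = complete_mask.astype(bool)
    visible = visible_mask.astype(bool)
    # "complete AND NOT visible" equals clip(complete - visible, 0, 1), but avoids
    # uint8 wrap-around (0 - 1 -> 255) where a visible pixel lies outside the complete mask.
    return (complete & ~visible).astype(np.uint8)
```

Using a logical operation rather than raw subtraction keeps the result binary even when the two predicted masks disagree at a few pixels.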
Fig. 4 is a schematic flow chart of still another image processing method provided in the present disclosure. Building on the embodiment shown in Fig. 2 or Fig. 3, it further includes a process of acquiring the first and second target neural network models through machine learning. As shown in Fig. 4, this process includes:
S401: a third mask of the visible region of the target object sample in the original image sample is acquired.
The original image sample contains a target object sample, part of which is blocked by a blocking object. The visible region of the target object sample is the part of it that is displayed in the original image sample. The original image sample typically contains a background, the target object sample, and the blocking object.
Optionally, a third mask of the target object sample visible region in the original image sample may be obtained by an instance segmentation network.
S402: Stack the original image sample and the third mask, and input the stacked result into a first neural network model to obtain a second feature map corresponding to the target object sample.
S403: Input the second feature map into a second neural network model to obtain a target complete mask of the target object sample.
S404: Train the first and second neural network models using the reference complete mask and the reference visible-region mask of the target object sample as supervision signals until both models converge; take the converged first neural network model as the first target neural network model and the converged second neural network model as the second target neural network model.
Training the first and second neural network models with the reference complete mask and the reference visible-region mask of the target object sample as supervision signals means training the two models in cascade, with the reference visible-region mask as the supervision signal for the first neural network model and the reference complete mask as the supervision signal for the second neural network model.
Specifically, one possible implementation is:
Process the second feature map through a multi-layer convolution network to obtain a fourth mask of the visible region of the target object sample.
Obtain a first loss function from the fourth mask of the visible region of the target object sample and the reference visible-region mask of the target object sample, and obtain a second loss function from the target complete mask of the target object sample and the reference complete mask of the target object sample. Then train the first and second neural network models according to the first and second loss functions: adjust the parameters of both models until they converge, that is, until the first and second loss functions meet a preset requirement, meaning that the output fourth mask of the visible region and the output target complete mask of the target object sample are sufficiently accurate. The converged first neural network model is taken as the first target neural network model, and the converged second neural network model as the second target neural network model.
Specifically, the first neural network model and the second neural network model may be trained according to the first loss function and the second loss function based on a gradient descent method until the first neural network model and the second neural network model converge.
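The text does not name the loss type. The sketch below assumes binary cross-entropy for both the first loss (visible region) and the second loss (complete mask), sums them into one objective, and runs plain gradient descent. To stay self-contained it treats the two predicted logit maps themselves as the trainable parameters, a deliberate simplification: in the actual cascade both losses would be backpropagated through the network weights, and the second loss would also reach the first model through the feature map it produces. All names and the learning rate are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def bce_with_logits(logits: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between sigmoid(logits) and a 0/1 target mask."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def train_toy(ref_visible: np.ndarray, ref_complete: np.ndarray, lr: float = 1.0, steps: int = 200):
    """Gradient descent on L = L1(visible logits, ref_visible) + L2(complete logits, ref_complete)."""
    z_vis = np.zeros(ref_visible.shape)    # stand-in for the fourth-mask logits
    z_full = np.zeros(ref_complete.shape)  # stand-in for the complete-mask logits
    history = []
    for _ in range(steps):
        history.append(bce_with_logits(z_vis, ref_visible) + bce_with_logits(z_full, ref_complete))
        # Gradient of mean BCE with respect to the logits is (sigmoid(z) - y) / N.
        z_vis -= lr * (sigmoid(z_vis) - ref_visible) / ref_visible.size
        z_full -= lr * (sigmoid(z_full) - ref_complete) / ref_complete.size
    return z_vis, z_full, history
```

Summing the two losses with equal weight is itself an assumption; a weighted sum is an equally plausible choice.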
In this embodiment, supervising the visible-region mask output by the first neural network model with the reference visible-region mask improves the accuracy of that mask, and supervising the target complete mask output by the second neural network model with the reference complete mask improves the accuracy of the complete mask. Training the cascade network formed by the two models with both reference masks as supervision signals further improves the accuracy of both outputs. Predicting the target complete mask of the target object with the converged cascade network therefore improves the accuracy of the prediction.
Fig. 5 is a schematic structural diagram of an image processing apparatus according to the present disclosure. As shown in Fig. 5, the apparatus of this embodiment includes an acquisition module 501 and a processing module 502, where:
The acquisition module 501 is configured to obtain a first mask of the visible region of a target object in an original image, where the original image contains the target object and part of the target object is blocked by a blocking object;
The processing module 502 is configured to stack the original image and the first mask and input the stacked result into a first target neural network model to obtain a first feature map corresponding to the target object;
The processing module 502 is further configured to input the first feature map into a second target neural network model, to obtain a target complete mask of the target object;
The first and second target neural network models are obtained through training based on an original image sample, a reference visible-region mask of a target object sample, and a reference complete mask of the target object sample, where the original image sample contains the target object sample and part of the target object sample is blocked by a blocking object.
Optionally, the processing module 502 is further configured to process the first feature map through a multi-layer convolution network to obtain a second mask of the visible area of the target object.
Optionally, the processing module 502 is further configured to take the difference between the target complete mask of the target object and the second mask to obtain the mask of the invisible region of the target object.
Optionally, the processing module 502 is further configured to: obtain a third mask of the visible region of the target object sample in the original image sample; stack the original image sample and the third mask and input the result into a first neural network model to obtain a second feature map corresponding to the target object sample; input the second feature map into a second neural network model to obtain a target complete mask of the target object sample; and train the first and second neural network models using the reference complete mask and the reference visible-region mask of the target object sample as supervision signals until both models converge, taking the converged first neural network model as the first target neural network model and the converged second neural network model as the second target neural network model.
Optionally, the processing module 502 is specifically configured to: process the second feature map through a multi-layer convolution network to obtain a fourth mask of the visible region of the target object sample; obtain a first loss function from the fourth mask of the visible region of the target object sample and the reference visible-region mask of the target object sample; obtain a second loss function from the target complete mask of the target object sample and the reference complete mask of the target object sample; and train the first and second neural network models according to the first and second loss functions.
Optionally, the acquisition module 501 is specifically configured to obtain, through an instance segmentation network, the first mask of the visible region of the target object in the original image.
Optionally, the processing module 502 is specifically configured to input the first feature map into a second target neural network model to obtain a third feature map; and processing the third feature map through a multi-layer convolution network to obtain a target complete mask of the target object.
Optionally, the processing module 502 is specifically configured to stack the first feature map and a second mask of the visible region of the target object, and then input the stack into the second target neural network model to obtain a third feature map; and processing the third feature map through a multi-layer convolution network to obtain a target complete mask of the target object.
The apparatus of this embodiment can be used to execute the technical solutions of the foregoing method embodiments; its implementation principles and technical effects are similar and are not repeated here.
The present disclosure also provides an electronic device, including a processor configured to execute a computer program stored in a memory; when executed by the processor, the computer program performs the steps of the method embodiment described in any of Figs. 2-4. It should be noted that the processor may be a graphics processing unit (GPU); that is, the algorithm of the present disclosure may be executed entirely on the GPU. For example, PyTorch running on CUDA (Compute Unified Device Architecture) may be used.
The present disclosure provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method embodiment described in any of fig. 2-4.
The present disclosure provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method embodiment described in any one of fig. 2-4.
In the above embodiments, the functions may be implemented wholly or partly by software, hardware, or a combination of software and hardware. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)).
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. An image processing method, the method comprising:
acquiring a first mask of a visible region of a target object in an original image, wherein the original image contains the target object, and a part of the region of the target object is blocked by a blocking object;
The original image and the first mask are stacked and then input into a first target neural network model, and a first feature map corresponding to the target object is obtained;
inputting the first feature map into a second target neural network model to obtain a target complete mask of the target object;
The first target neural network model is obtained by training based on an original image sample as input and a reference visible region mask of a target object sample as a supervision signal, the second target neural network model is obtained by training based on a second feature map corresponding to the target object sample as input and a reference complete mask of the target object sample as a supervision signal, wherein the original image sample contains the target object sample, a part of the region of the target object sample is blocked by a blocking object, and a second feature map corresponding to the target object sample is output by the first target neural network model.
2. The method according to claim 1, wherein the method further comprises:
processing the first feature map through a multi-layer convolution network to obtain a second mask of the visible region of the target object.
3. The method as recited in claim 2, further comprising:
taking the difference between the target complete mask of the target object and the second mask to obtain a mask of the invisible region of the target object.
4. The method according to any one of claims 1-3, further comprising, prior to obtaining the first mask of the visible region of the target object in the original image:
acquiring a third mask of a visible region of a target object sample in the original image sample;
stacking the original image sample and the third mask and inputting the result into a first neural network model to obtain a second feature map corresponding to the target object sample;
inputting the second feature map into a second neural network model to obtain a target complete mask of the target object sample; and
training the first neural network model and the second neural network model by taking the reference complete mask of the target object sample and the reference visible region mask of the target object sample as supervision signals until the first neural network model and the second neural network model converge, taking the converged first neural network model as the first target neural network model, and taking the converged second neural network model as the second target neural network model.
5. The method of claim 4, wherein training the first neural network model and the second neural network model using the reference complete mask of the target object samples and the reference visible region mask of the target object samples as supervisory signals comprises:
processing the second feature map through a multi-layer convolution network to obtain a fourth mask of the visible region of the target object sample;
Acquiring a first loss function according to a fourth mask of the visible area of the target object sample and a reference visible area mask of the target object sample;
acquiring a second loss function according to the target complete mask of the target object sample and the reference complete mask of the target object sample;
Training the first neural network model and the second neural network model according to the first loss function and the second loss function.
6. The method according to any one of claims 1-3, wherein inputting the first feature map into the second target neural network model to obtain the target complete mask of the target object comprises:
inputting the first feature map into the second target neural network model to obtain a third feature map; and
processing the third feature map through a multi-layer convolution network to obtain the target complete mask of the target object.
7. The method according to claim 2 or 3, wherein inputting the first feature map into the second target neural network model to obtain the target complete mask of the target object comprises:
stacking the first feature map and the second mask of the visible region of the target object, and inputting the result into the second target neural network model to obtain a third feature map; and
processing the third feature map through a multi-layer convolution network to obtain the target complete mask of the target object.
8. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is configured to acquire a first mask of a visible region of a target object in an original image, wherein the original image contains the target object, and part of the target object is blocked by a blocking object;
the processing module is configured to stack the original image and the first mask and input the stacked result into a first target neural network model to obtain a first feature map corresponding to the target object;
The processing module is further configured to input the first feature map into a second target neural network model to obtain a target complete mask of the target object;
The first target neural network model is obtained by training based on an original image sample as input and a reference visible region mask of a target object sample as a supervision signal, the second target neural network model is obtained by training based on a second feature map corresponding to the target object sample as input and a reference complete mask of the target object sample as a supervision signal, wherein the original image sample contains the target object sample, a part of the region of the target object sample is blocked by a blocking object, and a second feature map corresponding to the target object sample is output by the first target neural network model.
9. An electronic device, comprising: a processor for executing a computer program stored in a memory, which when executed by the processor carries out the steps of the method according to any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1-7.
11. A computer program product, characterized in that the computer program product, when run on a computer, causes the computer to perform the image processing method as claimed in any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110247381.3A CN112967200B (en) | 2021-03-05 | 2021-03-05 | Image processing method, apparatus, electronic device, medium, and computer program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110247381.3A CN112967200B (en) | 2021-03-05 | 2021-03-05 | Image processing method, apparatus, electronic device, medium, and computer program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112967200A CN112967200A (en) | 2021-06-15 |
CN112967200B CN112967200B (en) | 2024-09-13 |
Family ID: 76276728
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110247381.3A Active CN112967200B (en) | 2021-03-05 | 2021-03-05 | Image processing method, apparatus, electronic device, medium, and computer program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112967200B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113902747A (en) * | 2021-08-13 | 2022-01-07 | 阿里巴巴达摩院(杭州)科技有限公司 | Image processing method, computer-readable storage medium, and computing device |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11741639B2 (en) * | 2016-03-02 | 2023-08-29 | Holition Limited | Locating and augmenting object features in images |
JP7179515B2 (en) * | 2018-07-13 | 2022-11-29 | キヤノン株式会社 | Apparatus, control method and program |
CN110070056B (en) * | 2019-04-25 | 2023-01-10 | 腾讯科技(深圳)有限公司 | Image processing method, image processing apparatus, storage medium, and device |
CN110163864B (en) * | 2019-05-28 | 2020-12-04 | 北京迈格威科技有限公司 | Image segmentation method and device, computer equipment and storage medium |
CN110503097A (en) * | 2019-08-27 | 2019-11-26 | 腾讯科技(深圳)有限公司 | Training method, device and the storage medium of image processing model |
CN111145206B (en) * | 2019-12-27 | 2024-03-01 | 联想(北京)有限公司 | Liver image segmentation quality assessment method and device and computer equipment |
CN111339903B (en) * | 2020-02-21 | 2022-02-08 | 河北工业大学 | Multi-person human body posture estimation method |
CN112019828B (en) * | 2020-08-14 | 2022-07-19 | 上海网达软件股份有限公司 | Method for converting 2D (two-dimensional) video into 3D video |
Worldwide Applications
2021-03-05: CN202110247381.3A (CN112967200B), Active
Non-Patent Citations (1)
Title |
---|
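Research on Video Object Segmentation and Completion Based on Prior Information Modeling; Zhou Qiang (周强); China Master's Theses Full-text Database, Information Science and Technology; 2023-01-15 (Issue 1); p. I138-1227 * |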
Also Published As
Publication number | Publication date |
---|---|
CN112967200A (en) | 2021-06-15 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |