CN111292333B - Method and apparatus for segmenting an image - Google Patents
Method and apparatus for segmenting an image
- Publication number
- CN111292333B (application CN201811492808.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- foreground
- foreground image
- area
- probability
- Prior art date
- Legal status: Active (assumed by Google Patents; not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
- G06T7/155—Segmentation; Edge detection involving morphological operators
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
Abstract
The embodiment of the application discloses a method and a device for segmenting an image. One embodiment of the method comprises: acquiring a target image, wherein the target image comprises a foreground image; inputting the target image into a pre-trained semantic segmentation model to obtain a probability set, wherein each probability characterizes the probability that a pixel point included in the target image is located in the region where the foreground image is located; performing a dilation operation on the region where the foreground image is located based on the probability set to obtain a post-dilation image, wherein the dilation operation is used to increase the number of pixel points included in the region where the foreground image is located; and segmenting the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image. This embodiment improves the accuracy of image segmentation.
Description
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for segmenting an image.
Background
Image processing (image processing) is a technique that uses a computer to analyze an image to achieve a desired result, and generally refers to digital image processing. A digital image is a two-dimensional array obtained by photographing with an industrial camera, a video camera, a scanner, or the like; the elements of the array are called pixels, and their values are called gray values.
Currently, the demand for image processing is increasing. For example, in some scenes there is a need to distinguish the foreground image from the background image included in an image, so that image background replacement, photo compositing, film and animation production, portrait special effects, and the like can be realized.
Disclosure of Invention
The embodiment of the application provides a method and a device for segmenting an image.
In a first aspect, an embodiment of the present application provides a method for segmenting an image, the method comprising: acquiring a target image, wherein the target image comprises a foreground image; inputting the target image into a pre-trained semantic segmentation model to obtain a probability set, wherein each probability characterizes the probability that a pixel point included in the target image is located in the region where the foreground image is located; performing a dilation operation on the region where the foreground image is located based on the probability set to obtain a post-dilation image, wherein the dilation operation is used to increase the number of pixel points included in the region where the foreground image is located; and segmenting the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image.
In some embodiments, the foreground image comprises a face image; and performing the dilation operation on the region where the foreground image is located based on the probability set to obtain the post-dilation image comprises: determining the position of the face image in the target image; performing the dilation operation on the foreground image based on the probability set to obtain a dilated foreground image; for a pixel point in the dilated foreground image, determining the pixel point as a target pixel point in response to determining that the distance between the pixel point and the position is less than a preset distance threshold; and determining an image composed of the target pixel points in the dilated foreground image as the post-dilation image.
In some embodiments, the semantic segmentation model is trained in advance by: acquiring a training sample set, wherein each training sample comprises an image and a labeling information set corresponding to the image, and the labeling information is used to characterize whether a pixel point in the image is located in the region where a foreground image is located; and using a machine learning algorithm, taking the image included in a training sample as input and the labeling information set corresponding to the input image as expected output, training to obtain the semantic segmentation model.
In some embodiments, segmenting the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image comprises: segmenting the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image by using a Global Matting algorithm or a deep matting algorithm (Deep Image Matting).
In some embodiments, the structure of the semantic segmentation model is a DeepLab V3+ structure.
In a second aspect, an embodiment of the present application provides an apparatus for segmenting an image, the apparatus comprising: an acquisition unit configured to acquire a target image, wherein the target image includes a foreground image; the input unit is configured to input the target image into a pre-trained semantic segmentation model to obtain a probability set, wherein the probability is used for representing the probability that the pixel points included in the target image are located in the area where the foreground image is located; the expansion unit is configured to perform expansion operation on the area where the foreground image is located based on the probability set to obtain an image after the expansion operation, wherein the expansion operation is used for increasing the number of pixel points included in the area where the foreground image is located; an extraction unit configured to divide the target image into a foreground image and a non-foreground image based on the target image and the post-expansion operation image.
In some embodiments, the foreground image comprises a face image; the expansion unit includes: a first determining module configured to determine the position of the face image in the target image; a dilation module configured to perform a dilation operation on the foreground image based on the probability set to obtain a dilated foreground image; a second determining module configured to determine, for a pixel point in the dilated foreground image, the pixel point as a target pixel point in response to determining that the distance between the pixel point and the position is less than a preset distance threshold; and a third determining module configured to determine an image composed of the target pixel points in the dilated foreground image as the post-dilation image.
In some embodiments, the semantic segmentation model is trained beforehand by: acquiring a training sample set, wherein the training sample comprises an image and a labeling information set corresponding to the image, and the labeling information is used for representing whether pixel points in the image are positioned in an area where a foreground image is positioned; and using a machine learning algorithm, taking an image included in a training sample as input, taking a labeling information set corresponding to the input image as expected output, and training to obtain a semantic segmentation model.
In some embodiments, the extraction unit comprises: an extraction module configured to segment the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image by using a Global Matting algorithm or a deep matting algorithm (Deep Image Matting).
In some embodiments, the structure of the semantic segmentation model is a DeepLab V3+ structure.
In a third aspect, an embodiment of the present application provides an electronic device for segmenting an image, including: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement a method as in any of the embodiments of the method for segmenting an image described above.
In a fourth aspect, embodiments of the present application provide a computer-readable medium for segmenting an image, having stored thereon a computer program which, when executed by a processor, implements a method as in any of the embodiments of the method for segmenting an image described above.
According to the method and the device for segmenting an image provided by the embodiments of the present application, a target image is acquired; the target image is input into a pre-trained semantic segmentation model to obtain a probability set; a dilation operation is performed on the region where the foreground image is located based on the probability set to obtain a post-dilation image, wherein the dilation operation is used to increase the number of pixel points included in the region where the foreground image is located; and finally the target image is segmented into a foreground image and a non-foreground image based on the target image and the post-dilation image, thereby improving the accuracy of image segmentation.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for segmenting an image according to the present application;
FIGS. 3A-3C are schematic diagrams of an application scenario of a method for segmenting an image according to the present application;
FIG. 4 is a flow chart of yet another embodiment of a method for segmenting an image according to the present application;
FIG. 5 is a schematic structural view of one embodiment of an apparatus for segmenting an image according to the present application;
FIG. 6 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of a method for segmenting an image or an apparatus for segmenting an image of the embodiments of the application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as an image processing class application, a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting image presentation, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed herein.
The server 105 may be a server providing various services, such as a background server providing support for images displayed on the terminal devices 101, 102, 103. The background server may analyze and process the received data such as the image processing request, and feed back the processing result (for example, the processed image) to the terminal device.
It should be noted that, the method for dividing an image provided by the embodiment of the present application may be performed by the server 105, and accordingly, the apparatus for dividing an image may be disposed in the server 105. In addition, the method for dividing an image provided by the embodiment of the present application may also be performed by the terminal devices 101, 102, 103, and accordingly, the apparatus for dividing an image may also be provided in the terminal devices 101, 102, 103.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. The system architecture may include only the electronic device on which the method for segmenting an image is operating when the electronic device on which the method for segmenting an image is operating does not require data transmission with other electronic devices.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for segmenting an image according to the present application is shown. The method for segmenting an image comprises the steps of:
in step 201, a target image is acquired.
In the present embodiment, an execution subject of the method for dividing an image (e.g., a server or a terminal device shown in fig. 1) may acquire a target image by a wired connection method or a wireless connection method. Wherein the target image comprises a foreground image. For example, the target image may be an image containing a person, an animal, or the like. Wherein, when the target image is an image containing a person, the foreground image may be an image of the person. When the target image is an image containing an animal, the foreground image may be an image of the animal.
Here, when the execution subject is a server, the execution subject may acquire the target image from a terminal device locally or in communication connection therewith; when the execution subject is a terminal device, the execution subject may acquire the target image from a device storing the target image, which is locally or communicatively connected thereto.
Step 202, inputting the target image into a pre-trained semantic segmentation model to obtain a probability set.
In this embodiment, the execution subject may input the target image obtained in step 201 to a pre-trained semantic segmentation model, to obtain a probability set. The semantic segmentation model can be used for calculating the probability that a pixel point included in an image is located in an area where a foreground image included in the image is located. The probabilities can be used to characterize the probability that the pixels comprised in the target image are located in the region of the foreground image.
It will be appreciated that pixel-level classification typically involves a certain rate of misjudgment. For example, when the colors of the foreground image and the background image are relatively close, it may be difficult for a computer to accurately determine whether each pixel point belongs to the foreground image or the background image. Therefore, calculating, by means of the semantic segmentation model, the probability that each pixel point included in the image is located in the region where the foreground image is located facilitates determining whether each pixel point belongs to the foreground image or the background image.
The structure of the semantic segmentation model may be a fully convolutional network (Fully Convolutional Networks, FCN) structure, an encoder-decoder structure, or a dilated convolution (Dilated Convolutions) structure (a convolutional neural network structure that does not include pooling layers), and may also be another existing semantic segmentation model structure or a semantic segmentation model structure developed in the future.
As an example, the semantic segmentation model may be a correspondence table or database, compiled by a technician from a large amount of statistics, that stores correspondences between pixel points of images and probabilities (i.e., the probabilities that the pixel points are located in the region where the foreground image included in the image is located).
In practice, the semantic Segmentation model may group (Grouping) or segment (Segmentation) pixels in an image according to differences in the meaning of the semantic expression in the image.
In some alternative implementations of the present embodiment, the structure of the semantic segmentation model is a DeepLab V3+ structure.
It can be understood that when the semantic segmentation model adopts this structure, more accurate probabilities can be obtained, so that the segmentation result of the image is more accurate; compared with other semantic segmentation model structures, a semantic segmentation model with the DeepLab V3+ structure can further improve the precision and speed of the segmentation algorithm.
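For illustration only (not part of the disclosed embodiments), the following sketch shows how a per-pixel foreground probability map of the kind described above could be obtained from an off-the-shelf DeepLabV3 model in torchvision; the specific model, pretrained weights, input file name, and the use of the Pascal VOC "person" class as the foreground are assumptions.

```python
# Sketch: per-pixel foreground probabilities from a pretrained DeepLabV3 model
# (assumed stand-in for the semantic segmentation model described above).
import torch
import torchvision.transforms as T
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

model = deeplabv3_resnet50(pretrained=True).eval()   # 21 Pascal VOC classes

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("target.jpg").convert("RGB")       # hypothetical target image
batch = preprocess(image).unsqueeze(0)                # [1, 3, H, W]

with torch.no_grad():
    logits = model(batch)["out"]                      # [1, 21, H, W]
    probs = torch.softmax(logits, dim=1)              # per-class probabilities

PERSON = 15                                           # VOC class index for "person"
foreground_prob = probs[0, PERSON].numpy()            # the "probability set": one value per pixel
```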
In some optional implementations of this embodiment, the semantic segmentation model may be obtained by training the execution body or other electronic devices through the following steps:
First, a training sample set is obtained. Each training sample in the training sample set may include an image and a labeling information set corresponding to the image, where the labeling information may be used to characterize whether a pixel point in the image is located in the region where a foreground image is located. The labeling information may be obtained through manual labeling by related personnel (e.g., annotators), or may be labeled automatically by a device such as a computer.
Then, using a machine learning algorithm, the image included in a training sample is taken as the input of an initial model (e.g., an untrained, or preliminarily trained but not yet fully trained, fully convolutional network, a DeepLab V3+ model, or the like), the labeling information set corresponding to the input image is taken as the expected output of the initial model, and the semantic segmentation model is obtained by training.
Specifically, the difference between the output actually produced by the model during training (which characterizes whether each pixel point in the image is located in the region where the foreground image is located) and the corresponding labeling information may be calculated by using a preset loss function; for example, the Euclidean distance between the actual output and the expected output may be used as the loss function. Then, based on the calculated difference, the parameters of the initial model can be adjusted by back propagation, and training is ended when a preset training end condition is met, thereby obtaining the trained semantic segmentation model.
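As a hedged illustration of the training procedure just described, the sketch below trains an initial segmentation model with a per-pixel cross-entropy loss and back propagation; the dataset layout, model output format, optimizer, and hyperparameters are assumptions rather than part of the disclosure.

```python
# Sketch: supervised training of an initial semantic segmentation model with
# per-pixel labels (1 = inside the foreground region, 0 = elsewhere).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(initial_model, dataset, epochs=10, lr=1e-4, device="cuda"):
    model = initial_model.to(device)
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    criterion = nn.CrossEntropyLoss()                 # per-pixel difference to the labels
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(epochs):
        for images, labels in loader:                 # labels: [N, H, W] with values {0, 1}
            images = images.to(device)
            labels = labels.to(device).long()
            logits = model(images)["out"]             # [N, 2, H, W]; dict output format assumed
            loss = criterion(logits, labels)          # actual output vs. expected output
            optimizer.zero_grad()
            loss.backward()                           # back propagation
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")
    return model
```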
Optionally, after finishing training, the executing body or other electronic devices may further test the obtained semantic segmentation model to determine whether the semantic segmentation model can be used for image segmentation by:
First, a set of test samples is obtained. Each test sample in the test sample set may include an image and a labeling information set corresponding to the image, where the labeling information may be used to characterize whether a pixel point in the image is located in the region where a foreground image is located. The labeling information may be obtained through manual labeling by related personnel (e.g., annotators), or may be labeled automatically by a device such as a computer.
Then, the images included in all or some of the test samples in the test sample set are input into the semantic segmentation model to be tested, so as to obtain output results (characterizing whether the pixel points in the images are located in the region where the foreground image is located) corresponding to the input images.
Then, the obtained output results are compared with the labeling information included in the test samples of the test sample set to determine the accuracy of the semantic segmentation model, and thereby determine whether the semantic segmentation model can be used for image segmentation.
Here, the above accuracy may be used to characterize the accuracy of the semantic segmentation model. As an example, the accuracy may be the ratio of the number of output results that express the same meaning as the labeling information to the number of pixel points of the input images. Alternatively, it may be a value obtained by other means for characterizing the accuracy of the semantic segmentation model.
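A minimal sketch of the accuracy measure mentioned above (the proportion of pixel points whose predicted label agrees with the labeling information); the array shapes and 0/1 encoding are assumptions.

```python
import numpy as np

def pixel_accuracy(predicted_mask: np.ndarray, labeled_mask: np.ndarray) -> float:
    """Ratio of pixel points whose predicted label matches the labeling information."""
    assert predicted_mask.shape == labeled_mask.shape
    return float((predicted_mask == labeled_mask).sum()) / predicted_mask.size
```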
It should be noted that the preset training end conditions may include, but are not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold.
And 203, performing expansion operation on the area where the foreground image is based on the probability set to obtain an image after expansion operation.
In this embodiment, the execution subject may perform the expansion operation on the region where the foreground image is located based on the probability set, to obtain the image after the expansion operation. The image after the expansion operation may be an image obtained after the execution subject performs the expansion operation on the region where the foreground image is located. The dilation operation is used to increase the number of pixels comprised in the region where the foreground image is located.
Here, the dilation operation may be a process of merging background pixel points adjacent to the foreground image into the foreground image, so as to expand the boundary of the foreground image outward. In practice, various dilation algorithms may be employed to dilate the region where the foreground image is located. For example, the dilation algorithms may include, but are not limited to, any of the following: a dilation algorithm based on the Manhattan distance and a dilation algorithm based on structuring-element decomposition. It will be appreciated that the dilation operation can be used to fill holes contained in the foreground image.
As an example, the execution subject may first determine, according to a set of probabilities, a foreground image included in the target image. For example, the execution subject may determine, as the foreground image, an image composed of pixels corresponding to a probability greater than a preset probability threshold in the probability set. Then, the execution body may perform an expansion operation on the area where the determined foreground image is located, to obtain an image after the expansion operation.
As yet another example, the execution subject may first determine, based on the probability set, the region where the foreground image included in the target image is located and an image region that cannot be determined to be foreground or background. For example, the execution subject may determine, as the foreground image, an image composed of the pixel points whose probabilities in the probability set are greater than a preset probability threshold, and determine, as the image region that cannot be determined to be foreground or background, an image composed of the pixel points whose probabilities are greater than another, smaller probability threshold but not greater than the preset probability threshold. Then, for each pixel point in the image region that cannot be determined to be foreground or background, in response to determining that the distance between the pixel point and the region where the foreground image is located is smaller than a preset distance threshold, the execution subject may assign the pixel point to the region where the foreground image is located, thereby expanding the determined region of the foreground image and obtaining an expanded foreground image. Then, the execution subject may perform a dilation operation on the region where the expanded foreground image is located to obtain the post-dilation image.
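One possible concrete form of this step, sketched under the assumption that OpenCV's morphological dilation is an acceptable stand-in for the dilation algorithms mentioned above; the threshold and kernel size are assumed values.

```python
# Sketch: dilate the region where the foreground image is located, starting
# from the per-pixel foreground probability map.
import cv2
import numpy as np

def dilate_foreground(foreground_prob: np.ndarray,
                      prob_threshold: float = 0.5,
                      kernel_size: int = 15,
                      iterations: int = 1) -> np.ndarray:
    # Binarize: pixel points whose probability exceeds the threshold form the foreground region.
    foreground_mask = (foreground_prob > prob_threshold).astype(np.uint8) * 255
    # Dilation merges adjacent background pixel points into the foreground region,
    # increasing the number of pixel points it contains and filling small holes.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.dilate(foreground_mask, kernel, iterations=iterations)
```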
It will be appreciated that the image obtained after the dilation operation is generally a continuous region, which helps ensure that the finally obtained foreground image does not contain holes. This overcomes some of the problems (such as holes in the foreground image and poor boundary fusion of the foreground image) that arise when the foreground image is extracted using the semantic segmentation model alone.
In some use cases, the post-dilation image may be a trimap of the foreground image. It will be appreciated that a trimap is a rough division of an image; for example, an image is divided into a foreground image, a background image, and an unknown region to be solved (i.e., a region composed of pixel points for which it cannot be determined whether they belong to the foreground image or the background image).
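A sketch of how such a trimap could be built from the thresholded probability map and the dilated mask of the previous sketch; the pixel values 255/128/0 for foreground/unknown/background are a common convention assumed here, not prescribed by the present application.

```python
import numpy as np

def build_trimap(foreground_mask: np.ndarray, dilated_mask: np.ndarray) -> np.ndarray:
    """Foreground = 255, unknown band (dilated but not confidently foreground) = 128,
    background = 0. Both inputs are 0/255 uint8 masks of the same shape."""
    trimap = np.zeros_like(foreground_mask, dtype=np.uint8)
    trimap[dilated_mask > 0] = 128       # region gained by dilation: to be solved
    trimap[foreground_mask > 0] = 255    # confidently foreground pixel points
    return trimap
```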
Alternatively, the post-dilation image may be an image marked in the strokes manner. Here, strokes is a way of marking an image with scribbles.
The dilation operation is a well-known technique that is widely studied and applied at present, and is not described in detail herein.
Step 204, dividing the target image into a foreground image and a non-foreground image based on the target image and the image after the dilation operation.
In the present embodiment, the execution subject may segment the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image. The non-foreground image may be the portion of the target image other than the foreground image. For example, the non-foreground image may include a background image and/or an image area that cannot be determined to be foreground or background.
As an example, the execution body may traverse each pixel point of the target image, and if the probability that the pixel point corresponds to (i.e., the probability that the pixel point is located in the area where the foreground image is located) is greater than a preset probability threshold, and the pixel point is located in the area where the foreground image indicated by the image after the expansion operation, the execution body may determine the pixel point as the pixel point included in the foreground image. If the probability that the pixel point corresponds to (i.e., the probability that the pixel point is located in the area where the foreground image is located) is less than or equal to the probability threshold, or the pixel point is located outside the area where the foreground image indicated by the image after the expansion operation is located, the execution subject may determine the pixel point as a pixel point included in the non-foreground image. Thus, the execution subject can divide the target image into a foreground image and a non-foreground image.
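The pixel-wise rule described above (probability above the threshold and inside the dilated region yields a foreground pixel point, everything else is non-foreground) can be written compactly; a sketch with the threshold value assumed:

```python
import numpy as np

def split_masks(foreground_prob: np.ndarray,
                dilated_mask: np.ndarray,
                prob_threshold: float = 0.5):
    """Foreground: probability above the threshold AND inside the dilated region;
    all remaining pixel points belong to the non-foreground image."""
    foreground = (foreground_prob > prob_threshold) & (dilated_mask > 0)
    return foreground, ~foreground
```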
In some optional implementations of this embodiment, the step 204 may include: segmenting the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image by using a matting algorithm. Illustratively, the matting algorithm may include, but is not limited to: a Bayesian matting algorithm (Bayes Matting), a Poisson matting algorithm (Poisson Matting), a closed-form matting algorithm (Closed Form Matting), a Global Matting algorithm, or a deep matting algorithm (Deep Image Matting).
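The application names Global Matting and Deep Image Matting; as a hedged, runnable stand-in, the sketch below uses the closed-form matting implementation from the open-source pymatting package (one of the algorithms listed above) on the trimap produced earlier. The package choice and value conventions are assumptions.

```python
# Sketch: closed-form matting as a stand-in for the matting step.
# Assumes the pymatting package and a trimap with values {0, 128, 255}.
import numpy as np
from pymatting import estimate_alpha_cf

def matte_foreground(image_rgb: np.ndarray, trimap: np.ndarray):
    image = image_rgb.astype(np.float64) / 255.0      # pymatting expects [0, 1] floats
    tri = trimap.astype(np.float64) / 255.0
    alpha = estimate_alpha_cf(image, tri)             # per-pixel opacity in [0, 1]
    foreground = image_rgb * alpha[..., None]          # segmented foreground image
    non_foreground = image_rgb * (1.0 - alpha[..., None])
    return alpha, foreground, non_foreground
```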
As an example, the execution subject may first apply a matting algorithm to the target image to obtain the foreground image included in the target image, and then apply a matting algorithm to extract the foreground image in the post-dilation image. Then, for a pixel point that is included both in the foreground image obtained from the target image and in the foreground image extracted from the post-dilation image, the execution subject may determine that pixel point as a pixel point of the segmented foreground image. For a pixel point that is included only in the foreground image obtained from the target image, or only in the foreground image extracted from the post-dilation image, the execution subject may look up the probability corresponding to that pixel point: if the probability is greater than a predetermined probability threshold, the execution subject may determine the pixel point as a pixel point of the segmented foreground image; if the probability is less than or equal to the probability threshold, the execution subject may determine the pixel point as a pixel point of the segmented non-foreground image. In this way, the execution subject can segment the target image into a foreground image and a non-foreground image.
Alternatively, the executing body may further obtain the position of the foreground image in the target image according to the image after the expansion operation, so that the executing body may divide the target image into the foreground image and the non-foreground image after obtaining the position of the foreground image in the target image.
It can be understood that the semantic segmentation model focuses on the semantic understanding of each pixel point: pixel points with the same semantics are grouped into the same part, the result may consist of several block-shaped regions, and the model does not consider whether these regions are naturally connected. A matting algorithm, by contrast, divides the image into two regions, a foreground image and a non-foreground image, and some matting algorithms handle details such as hair more accurately, producing a gradual transition at the junction of the foreground image and the non-foreground image so that the matting looks more natural. Therefore, by combining the semantic segmentation model with a matting algorithm, the accuracy of segmenting the target image can be improved, and the boundary fusion between the segmented foreground image and non-foreground image can be improved.
With continued reference to fig. 3A-3C, fig. 3A-3C are schematic diagrams of one application scenario of the method for segmenting an image according to the present embodiment. As shown in fig. 3A, the server first acquires the target image, wherein the target image includes a foreground image (illustrated as a cartoon character). The server may then input the target image into a pre-trained semantic segmentation model to obtain a probability set (e.g., a set of probabilities characterizing whether the pixel points included in fig. 3A are located in the region where the foreground image is located). Then, the server performs a dilation operation on the region where the foreground image is located based on the probability set to obtain a post-dilation image (as shown in fig. 3B, the post-dilation image includes a foreground image 303, a background image 301, and a region 302 that cannot be determined to be either foreground or background). Finally, the server segments the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image (as shown in fig. 3C, the region where the foreground image is located is image area 305 and the region where the background image is located is image area 304; the server may then segment the target image according to the foreground region 305 and the background region 304 to obtain the foreground image and the non-foreground image).
According to the method provided by the embodiment of the present application, the acquired target image is input into a pre-trained semantic segmentation model to obtain the set of probabilities that the pixel points included in the target image are located in the region where the foreground image is located; a dilation operation is performed on the region where the foreground image is located based on the probability set; and the target image is then segmented into a foreground image and a non-foreground image based on the target image and the post-dilation image. By combining the semantic segmentation model with the matting algorithm, the accuracy of image segmentation is improved.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for segmenting an image is shown. The flow 400 of the method for segmenting an image comprises the steps of:
step 401, acquiring a target image.
In this embodiment, step 401 is substantially identical to step 201 in the corresponding embodiment of fig. 2, and will not be described herein.
Step 402, inputting the target image into a pre-trained semantic segmentation model to obtain a probability set.
In this embodiment, step 402 is substantially identical to step 202 in the corresponding embodiment of fig. 2, and will not be described herein.
Step 403, determining a position of the face image in the target image.
In the present embodiment, in the case where the foreground image included in the target image includes a face image, the execution subject of the method for segmenting an image (e.g., the server or the terminal device shown in fig. 1) may determine the position of the face image in the target image. The face image may be a human face image, or the face image of an animal or a cartoon character.
As an example, the above-described execution subject may determine the position of the face image in the target image by means of face detection, or may determine the position of the face image in the target image by means of acquiring the position of the face image annotated by the annotator in the target image. The position determined by the execution subject may be a region position where the contour of the face image is located, or may be a position where a feature point (for example, a nose midpoint, a face midpoint, or the like) of the face image is located.
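For the face-detection route mentioned above, a sketch using OpenCV's Haar cascade detector; the detector choice and the use of the face-box center as the "position" are assumptions, and any face detector could serve.

```python
import cv2

def face_center(image_bgr):
    """Return the center (x, y) of the first detected face, or None if no face is found."""
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return (x + w // 2, y + h // 2)   # e.g., an approximate face midpoint
```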
Step 404, performing expansion operation on the foreground image based on the probability set to obtain an expanded foreground image.
In this embodiment, the execution subject may perform the dilation operation on the foreground image based on the probability set to obtain a dilated foreground image. The dilated foreground image may be the image obtained after the execution subject performs the dilation operation on the region where the foreground image is located. The dilation operation is used to increase the number of pixel points included in the region where the foreground image is located.
As an example, the execution subject may first determine, according to a set of probabilities, a foreground image included in the target image. For example, the execution subject may determine, as the foreground image, an image composed of pixels corresponding to a probability greater than a preset probability threshold in the probability set. Then, the executing body may perform an expansion operation on the determined area where the foreground image is located, to obtain an expanded foreground image.
As yet another example, the execution subject may first determine, based on the probability set, the region where the foreground image included in the target image is located and an image region that cannot be determined to be foreground or background. For example, the execution subject may determine, as the foreground image, an image composed of the pixel points whose probabilities in the probability set are greater than a preset probability threshold, and determine, as the image region that cannot be determined to be foreground or background, an image composed of the pixel points whose probabilities are greater than another, smaller probability threshold but not greater than the preset probability threshold. Then, for each pixel point in the image region that cannot be determined to be foreground or background, in response to determining that the distance between the pixel point and the region where the foreground image is located is smaller than a preset distance threshold, the execution subject may assign the pixel point to the region where the foreground image is located, thereby expanding the determined region of the foreground image. Then, the execution subject may perform a dilation operation on the expanded region to obtain the dilated foreground image.
In step 405, for a pixel in the dilated foreground image, in response to determining that a distance between the pixel and the determined location is less than a preset distance threshold, the pixel is determined to be a target pixel.
In this embodiment, the execution body may further determine, for a pixel in the inflated foreground image, the pixel as the target pixel in response to determining that the distance between the pixel and the position determined in step 403 is less than the preset distance threshold. The preset distance threshold may be a preset distance value. For example, the preset distance threshold may be 100 pixels, 30 pixels, etc.
Here, when the above-described position determined by the execution subject is the region position where the contour of the face image is located, the distance between the pixel point and the position determined in step 403 may be characterized by the shortest distance between the pixel point and the region where the contour is located.
In step 406, an image composed of the target pixel points in the dilated foreground image is determined as the post-dilation image.
In this embodiment, the execution subject may determine an image composed of the target pixel points in the dilated foreground image as the post-dilation image.
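Steps 404-406 can be sketched together as follows: keep only those dilated-foreground pixel points whose distance to the face position is below the threshold. The threshold value and the use of the Euclidean distance to a single face point are assumptions made for illustration.

```python
import numpy as np

def restrict_to_face(dilated_mask: np.ndarray,
                     face_xy: tuple,
                     distance_threshold: float = 100.0) -> np.ndarray:
    """Keep dilated-foreground pixel points within the distance threshold of the face position."""
    h, w = dilated_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.sqrt((xs - face_xy[0]) ** 2 + (ys - face_xy[1]) ** 2)
    result = dilated_mask.copy()
    result[dist >= distance_threshold] = 0   # outside the threshold: not a target pixel point
    return result
```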
Step 407, dividing the target image into a foreground image and a non-foreground image based on the target image and the post-dilation operation image.
In this embodiment, step 407 is substantially identical to step 204 in the corresponding embodiment of fig. 2, and will not be described here again.
It can be appreciated that segmenting the target image into a foreground image and a non-foreground image with a matting algorithm generally requires a complex calculation process; by determining the target pixel points, the number of pixel points that need to be calculated can be reduced, thereby reducing the computational complexity and improving the calculation efficiency. In addition, since the hair image is generally located within a certain range of the face image, determining the region formed by the pixel points whose distance from the above position is smaller than the preset distance threshold facilitates analyzing the target image and determining the position of the hair image, thereby improving the accuracy of image segmentation.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the procedure 400 of the method for dividing an image in this embodiment highlights the step of determining the position of the face image in the target image, and thus, the scheme described in this embodiment can reduce the computational complexity of the subsequent step and further improve the calculation efficiency by obtaining the region where the pixel point whose distance from the above-mentioned position is smaller than the preset distance threshold value is located.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for segmenting an image, which corresponds to the method embodiment shown in fig. 2, and which may include the same or corresponding features as the method embodiment shown in fig. 2, except for the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the apparatus 500 for segmenting an image of the present embodiment includes: an acquisition unit 501, an input unit 502, an expansion unit 503, and an extraction unit 504. Wherein the acquisition unit 501 is configured to acquire a target image, wherein the target image comprises a foreground image; the input unit 502 is configured to input the target image into a pre-trained semantic segmentation model to obtain a probability set, wherein the probability is used for representing the probability that a pixel point included in the target image is located in an area where a foreground image is located; the expansion unit 503 is configured to perform expansion operation on the region where the foreground image is located based on the probability set, so as to obtain an image after expansion operation, where the expansion operation is used to increase the number of pixels included in the region where the foreground image is located; the extraction unit 504 is configured to divide the target image into a foreground image and a non-foreground image based on the target image and the post-expansion operation image.
In the present embodiment, the acquisition unit 501 of the apparatus 500 for dividing an image may acquire a target image by a wired connection method or a wireless connection method. Wherein the target image comprises a foreground image. For example, the target image may be an image containing a person, an animal, or the like. Wherein, when the target image is an image containing a person, the foreground image may be an image of the person. When the target image is an image containing an animal, the foreground image may be an image of the animal.
In this embodiment, the input unit 502 may input the target image acquired by the acquisition unit 501 to a pre-trained semantic segmentation model, to obtain a probability set. The semantic segmentation model can be used for calculating the probability that a pixel point included in an image is located in an area where a foreground image included in the image is located. The probabilities can be used to characterize the probability that the pixels comprised in the target image are located in the region of the foreground image.
In this embodiment, the expansion unit 503 may perform expansion operation on the region where the foreground image is located based on the probability set, to obtain an image after expansion operation. The image after the expansion operation may be an image obtained after the execution subject performs the expansion operation on the region where the foreground image is located.
In the present embodiment, the above-described extraction unit 504 may divide the target image into a foreground image and a non-foreground image based on the target image and the post-expansion operation image. Wherein the non-foreground image may be an image other than the foreground image in the image. For example, the non-foreground images may include background images and/or image areas that are not determined to be foreground or background.
In some optional implementations of the present embodiment, the foreground image comprises a face image; the expansion unit includes: a first determining module (not shown) configured to determine the position of the face image in the target image; a dilation module (not shown) configured to perform a dilation operation on the foreground image based on the probability set to obtain a dilated foreground image; a second determining module (not shown) configured to determine, for a pixel point in the dilated foreground image, the pixel point as a target pixel point in response to determining that the distance between the pixel point and the position is less than a preset distance threshold; and a third determining module (not shown) configured to determine an image composed of the target pixel points in the dilated foreground image as the post-dilation image.
In some optional implementations of the present embodiment, the semantic segmentation model is trained in advance by: acquiring a training sample set, wherein the training sample comprises an image and a labeling information set corresponding to the image, and the labeling information is used for representing whether pixel points in the image are positioned in an area where a foreground image is positioned; and using a machine learning algorithm, taking an image included in a training sample as input, taking a labeling information set corresponding to the input image as expected output, and training to obtain a semantic segmentation model.
In some optional implementations of the present embodiment, the extraction unit comprises: an extraction module configured to segment the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image by using a Global Matting algorithm or a deep matting algorithm (Deep Image Matting).
In some alternative implementations of the present embodiment, the structure of the semantic segmentation model is a DeepLab V3+ structure.
According to the device provided by the embodiment of the present application, the acquisition unit 501 acquires a target image, wherein the target image comprises a foreground image; the input unit 502 inputs the target image into a pre-trained semantic segmentation model to obtain a probability set, wherein each probability characterizes the probability that a pixel point included in the target image is located in the region where the foreground image is located; the expansion unit 503 performs a dilation operation on the region where the foreground image is located based on the probability set to obtain a post-dilation image, wherein the dilation operation is used to increase the number of pixel points included in the region where the foreground image is located; and finally the extraction unit 504 segments the target image into a foreground image and a non-foreground image based on the target image and the post-dilation image, thereby improving the accuracy of image segmentation.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing an electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. The drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
The computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Python, Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, for example described as: a processor comprising an acquisition unit, an input unit, a dilation unit, and an extraction unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a target image".
As another aspect, the present application further provides a computer readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target image, the target image containing a foreground image; input the target image into a pre-trained semantic segmentation model to obtain a probability set, each probability representing the probability that a pixel of the target image lies in the region where the foreground image is located; perform, based on the probability set, a dilation operation on the region where the foreground image is located to obtain a dilated image, the dilation operation being used to increase the number of pixels included in the region where the foreground image is located; and segment the target image into a foreground image and a non-foreground image based on the target image and the dilated image.
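For illustration only, and not as a description of the claimed implementation, these steps can be sketched in Python (one of the languages mentioned above). Here `segmentation_model` and `global_matting` are hypothetical callables standing in for the pre-trained semantic segmentation model and for a Global-Matting-style alpha estimation step, OpenCV's `dilate` is one possible realization of the dilation operation, and the trimap construction is a commonly used but assumed intermediate step:

```python
import cv2
import numpy as np

def segment(target_image, segmentation_model, global_matting,
            prob_threshold=0.5, dilation_radius=10):
    """Illustrative sketch only; segmentation_model and global_matting are
    hypothetical callables, not part of the disclosed implementation."""
    # Probability set: per-pixel probability of lying in the foreground region.
    probs = segmentation_model(target_image)               # (H, W) floats in [0, 1]

    # Initial foreground region derived from the probability set.
    fg_mask = (probs > prob_threshold).astype(np.uint8)

    # The dilation operation increases the number of pixels counted as
    # foreground, so thin structures near the boundary are not cut off.
    kernel = np.ones((dilation_radius, dilation_radius), np.uint8)
    dilated_mask = cv2.dilate(fg_mask, kernel, iterations=1)

    # Trimap-style intermediate for the matting step: certain background outside
    # the dilated region, certain foreground in an eroded core, unknown in between.
    trimap = np.full(fg_mask.shape, 128, np.uint8)
    trimap[dilated_mask == 0] = 0
    trimap[cv2.erode(fg_mask, kernel, iterations=1) == 1] = 255

    # Segment the target image into foreground and non-foreground using the
    # target image together with the dilated result.
    alpha = global_matting(target_image, trimap)            # (H, W) floats in [0, 1]
    foreground = target_image.astype(np.float32) * alpha[..., None]
    return foreground, alpha
```

Dilating before matting keeps ambiguous boundary pixels (hair, fingers, and similar fine structures) inside the region handed to the matting step instead of discarding them outright.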
The above description is merely a description of the preferred embodiments of the present application and of the technical principles employed. Persons skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combination of the technical features described above, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions in which the above features are replaced with technical features having similar functions disclosed in (but not limited to) the present application.
Claims (8)
1. A method for segmenting an image, comprising:
acquiring a target image, wherein the target image includes a foreground image and the foreground image includes a face image;
inputting the target image into a pre-trained semantic segmentation model to obtain a probability set, wherein each probability is used to represent the probability that a pixel included in the target image is located in a region where the foreground image is located;
determining a position of the face image in the target image by means of face detection;
determining, based on the probability set, a region where the foreground image included in the target image is located; determining an image region of the target image for which it is uncertain whether it is the foreground image or the background image; for each pixel in the image region for which it is uncertain whether it is the foreground image or the background image, determining the pixel as belonging to the region where the foreground image is located, in response to determining that the distance between the pixel and the region where the foreground image is located is smaller than a preset distance threshold, so as to expand the region where the foreground image is located and obtain an expanded foreground image; and performing a dilation operation on the region where the expanded foreground image is located to obtain a dilated image, wherein the dilation operation is used to increase the number of pixels included in the region where the foreground image is located; and
segmenting the target image into a foreground image and a non-foreground image based on the target image and the dilated image by using a Global Matting algorithm, comprising: extracting the foreground image in the dilated image; for a pixel included both in the foreground image included in the target image and in the foreground image in the dilated image, determining the pixel as a pixel of the segmented foreground image; for a pixel included only in the foreground image of the target image or only in the foreground image of the dilated image, determining the pixel as a pixel of the segmented foreground image in response to determining that the probability corresponding to the pixel is greater than a preset probability threshold; and determining the pixel as a pixel of the segmented non-foreground image in response to determining that the probability corresponding to the pixel is not greater than the probability threshold.
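Purely as an illustrative reading of the per-pixel fusion rule at the end of claim 1 (hypothetical names, not the claimed implementation), the decision can be written as follows, assuming the two foreground masks and the probability map have already been computed:

```python
import numpy as np

def fuse_foreground(fg_in_target, fg_in_dilated, probs, prob_threshold=0.5):
    """Illustrative per-pixel fusion rule (hypothetical names).

    fg_in_target:  boolean (H, W) mask, foreground region found in the target image
    fg_in_dilated: boolean (H, W) mask, foreground region in the dilated image
    probs:         (H, W) foreground probabilities from the semantic segmentation model
    """
    # A pixel present in both foreground images is taken as foreground directly.
    in_both = fg_in_target & fg_in_dilated

    # A pixel present in only one of the two foreground images falls back to the
    # probability set: above the threshold -> foreground, otherwise non-foreground.
    in_one_only = fg_in_target ^ fg_in_dilated
    fallback_fg = in_one_only & (probs > prob_threshold)

    return in_both | fallback_fg   # mask of the segmented foreground image
```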
2. The method of claim 1, wherein the semantic segmentation model is pre-trained by:
acquiring a set of training samples, wherein a training sample includes an image and a set of labeling information corresponding to the image, and the labeling information is used to represent whether a pixel in the image is located in a region where a foreground image is located; and
training, by using a machine learning algorithm, with the image included in a training sample as input and the set of labeling information corresponding to the input image as expected output, to obtain the semantic segmentation model.
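A minimal training sketch consistent with this claim, under the assumption of a PyTorch setup, might look like the following; the data loader, the two-class formulation, and all hyper-parameters are illustrative assumptions, and torchvision's `deeplabv3_resnet50` is a related DeepLab variant used here only as a stand-in for the model structure named in claim 3:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.models.segmentation import deeplabv3_resnet50

# Two classes per pixel: located in the foreground region, or not.
model = deeplabv3_resnet50(num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train(loader: DataLoader, epochs: int = 10) -> None:
    """loader is assumed to yield (image, label_map) pairs: image is an
    N x 3 x H x W float tensor, label_map is an N x H x W long tensor with
    1 for pixels located in the foreground region and 0 otherwise."""
    model.train()
    for _ in range(epochs):
        for image, label_map in loader:
            logits = model(image)["out"]           # N x 2 x H x W
            loss = criterion(logits, label_map)    # labeling set as expected output
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```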
3. The method according to claim 1 or 2, wherein the semantic segmentation model has a DeepLabv3+ structure.
4. An apparatus for segmenting an image, comprising:
an acquisition unit configured to acquire a target image, wherein the target image includes a foreground image and the foreground image includes a face image;
an input unit configured to input the target image into a pre-trained semantic segmentation model to obtain a probability set, wherein each probability is used to represent the probability that a pixel included in the target image is located in a region where the foreground image is located;
a dilation unit configured to: determine a position of the face image in the target image by means of face detection; determine, based on the probability set, a region where the foreground image included in the target image is located; determine an image region of the target image for which it is uncertain whether it is the foreground image or the background image; for each pixel in the image region for which it is uncertain whether it is the foreground image or the background image, determine the pixel as belonging to the region where the foreground image is located, in response to determining that the distance between the pixel and the region where the foreground image is located is smaller than a preset distance threshold, so as to expand the region where the foreground image is located and obtain an expanded foreground image; and perform a dilation operation on the region where the expanded foreground image is located to obtain a dilated image;
an extraction unit configured to segment the target image into a foreground image and a non-foreground image based on the target image and the dilated image by using a Global Matting algorithm;
wherein the extraction unit is further configured to: extract the foreground image in the dilated image; for a pixel included both in the foreground image included in the target image and in the foreground image in the dilated image, determine the pixel as a pixel of the segmented foreground image; for a pixel included only in the foreground image of the target image or only in the foreground image of the dilated image, determine the pixel as a pixel of the segmented foreground image in response to determining that the probability corresponding to the pixel is greater than a preset probability threshold; and determine the pixel as a pixel of the segmented non-foreground image in response to determining that the probability corresponding to the pixel is not greater than the probability threshold.
5. The apparatus of claim 4, wherein the semantic segmentation model is pre-trained by:
acquiring a set of training samples, wherein a training sample includes an image and a set of labeling information corresponding to the image, and the labeling information is used to represent whether a pixel in the image is located in a region where a foreground image is located; and
training, by using a machine learning algorithm, with the image included in a training sample as input and the set of labeling information corresponding to the input image as expected output, to obtain the semantic segmentation model.
6. The apparatus according to claim 4 or 5, wherein the semantic segmentation model has a DeepLabv3+ structure.
7. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-3.
8. A computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811492808.0A CN111292333B (en) | 2018-12-07 | 2018-12-07 | Method and apparatus for segmenting an image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811492808.0A CN111292333B (en) | 2018-12-07 | 2018-12-07 | Method and apparatus for segmenting an image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111292333A CN111292333A (en) | 2020-06-16 |
CN111292333B (en) | 2024-05-17 |
Family
ID=71021471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811492808.0A Active CN111292333B (en) | 2018-12-07 | 2018-12-07 | Method and apparatus for segmenting an image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111292333B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111784712B (en) * | 2020-07-17 | 2023-03-14 | 北京字节跳动网络技术有限公司 | Image processing method, device, equipment and computer readable medium |
CN111768425B (en) * | 2020-07-23 | 2021-08-10 | 腾讯科技(深圳)有限公司 | Image processing method, device and equipment |
CN112508968B (en) * | 2020-12-10 | 2022-02-15 | 马鞍山市瀚海云星科技有限责任公司 | Image segmentation method, device, system and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103985114A (en) * | 2014-03-21 | 2014-08-13 | 南京大学 | Surveillance video person foreground segmentation and classification method |
EP2846307A1 (en) * | 2013-09-04 | 2015-03-11 | Thomson Licensing | Method and apparatus for determining an alpha value for alpha matting |
CN104408745A (en) * | 2014-11-18 | 2015-03-11 | 北京航空航天大学 | Real-time smog scene detection method based on video image |
CN105590312A (en) * | 2014-11-12 | 2016-05-18 | 株式会社理光 | Foreground image segmentation method and apparatus |
CN105631880A (en) * | 2015-12-31 | 2016-06-01 | 百度在线网络技术(北京)有限公司 | Lane line segmentation method and apparatus |
CN108537816A (en) * | 2018-04-17 | 2018-09-14 | 福州大学 | A kind of obvious object dividing method connecting priori with background based on super-pixel |
CN108830149A (en) * | 2018-05-07 | 2018-11-16 | 深圳市恒扬数据股份有限公司 | A kind of detection method and terminal device of target bacteria |
CN108876791A (en) * | 2017-10-23 | 2018-11-23 | 北京旷视科技有限公司 | Image processing method, device and system and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2790154A1 (en) * | 2013-04-09 | 2014-10-15 | Thomson Licensing | Method and apparatus for determining an alpha value for alpha matting |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2846307A1 (en) * | 2013-09-04 | 2015-03-11 | Thomson Licensing | Method and apparatus for determining an alpha value for alpha matting |
CN103985114A (en) * | 2014-03-21 | 2014-08-13 | 南京大学 | Surveillance video person foreground segmentation and classification method |
CN105590312A (en) * | 2014-11-12 | 2016-05-18 | 株式会社理光 | Foreground image segmentation method and apparatus |
CN104408745A (en) * | 2014-11-18 | 2015-03-11 | 北京航空航天大学 | Real-time smog scene detection method based on video image |
CN105631880A (en) * | 2015-12-31 | 2016-06-01 | 百度在线网络技术(北京)有限公司 | Lane line segmentation method and apparatus |
CN108876791A (en) * | 2017-10-23 | 2018-11-23 | 北京旷视科技有限公司 | Image processing method, device and system and storage medium |
CN108537816A (en) * | 2018-04-17 | 2018-09-14 | 福州大学 | A kind of obvious object dividing method connecting priori with background based on super-pixel |
CN108830149A (en) * | 2018-05-07 | 2018-11-16 | 深圳市恒扬数据股份有限公司 | A kind of detection method and terminal device of target bacteria |
Non-Patent Citations (1)
Title |
---|
A global sampling method for alpha matting; He Kaiming et al.; CVPR 2011; 2011-06-25; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111292333A (en) | 2020-06-16 |
Similar Documents

Publication | Title |
---|---|
CN109214343B (en) | Method and device for generating face key point detection model |
CN108985259B (en) | Human body action recognition method and device |
CN108830235B (en) | Method and apparatus for generating information |
CN108509915B (en) | Method and device for generating face recognition model |
US11436863B2 (en) | Method and apparatus for outputting data |
CN109034069B (en) | Method and apparatus for generating information |
US11625433B2 (en) | Method and apparatus for searching video segment, device, and medium |
CN111476871B (en) | Method and device for generating video |
CN109101919B (en) | Method and apparatus for generating information |
CN109993150B (en) | Method and device for identifying age |
CN108986169B (en) | Method and apparatus for processing image |
JP7394809B2 (en) | Methods, devices, electronic devices, media and computer programs for processing video |
CN107609506B (en) | Method and apparatus for generating image |
CN109858333B (en) | Image processing method, image processing device, electronic equipment and computer readable medium |
CN110009059B (en) | Method and apparatus for generating a model |
CN110929780A (en) | Video classification model construction method, video classification device, video classification equipment and media |
CN109272543B (en) | Method and apparatus for generating a model |
CN109389096B (en) | Detection method and device |
CN109214501B (en) | Method and apparatus for identifying information |
CN112749695A (en) | Text recognition method and device |
CN111292333B (en) | Method and apparatus for segmenting an image |
CN108509994B (en) | Method and device for clustering character images |
US20240320807A1 (en) | Image processing method and apparatus, device, and storage medium |
CN110209658B (en) | Data cleaning method and device |
CN110046571B (en) | Method and device for identifying age |
Legal Events

Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
TG01 | Patent term adjustment |