
CN109858333A - Image processing method, device, electronic equipment and computer-readable medium - Google Patents


Info

  • Publication number: CN109858333A (granted as CN109858333B)
  • Application number: CN201811564701.2A
  • Authority: CN (China)
  • Prior art keywords: heat map, image, target, target object, points
  • Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
  • Other languages: Chinese (zh)
  • Inventors: 邰颖, 曹玮剑, 葛彦昊, 汪铖杰, 李季檩, 黄飞跃
  • Current and Original Assignee: Tencent Technology Shenzhen Co Ltd (the listed assignees may be inaccurate)
  • Events: application filed by Tencent Technology Shenzhen Co Ltd; priority to CN201811564701.2A; publication of CN109858333A; application granted; publication of CN109858333B; status active; anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image processing method, an image processing apparatus, an electronic device, and a computer-readable medium. The method comprises: detecting a target object in an image and extracting a target region; inputting the target region into a feature point extraction model to generate a heat map of the target object, the heat map being used to display the probability that each pixel point in the target region is a feature point of the target object; determining a plurality of valid feature points in the heat map; and determining target feature points of the target object in the image according to the plurality of valid feature points. The disclosed method, apparatus, electronic device, and computer-readable medium can quickly and accurately locate facial key points in an image.

Description

Image processing method, image processing device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable medium.
Background
Face keypoint detection, also referred to as face keypoint localization or face alignment, is a detection technique that, given a face image, locates the key regions of the face, including the eyebrows, eyes, nose, mouth, and facial contour. It is a focus of current academic research. The technique builds on accurate face-box detection: once the face box is detected, the eyes, nose, mouth, chin, and so on are located precisely. Face keypoint detection is widely applied in face recognition, face tracking, face animation, 3D face modeling, and other fields.
In recent years, with the explosive growth of personal and network video images, a fully automatic, efficient, and powerful face keypoint localization method is needed; prior-art face keypoint detection methods cannot meet current users' requirements in terms of time efficiency and processing capacity.
Therefore, a new image processing method, apparatus, electronic device and computer readable medium are needed.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the disclosure, and therefore it may contain information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of the above, the present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a computer readable medium, which are capable of quickly and accurately locating a facial key point in an image.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, an image processing method is provided, the method including: detecting a target object in the image and extracting a target region; inputting the target region into a feature point extraction model to generate a heat map of the target object, wherein the heat map is used for displaying the probability that pixel points in the target region are feature points of the target object; determining a plurality of valid feature points in the heat map; and determining the target feature points of the target object in the image according to the plurality of valid feature points.
In one embodiment, the method further comprises: training a stacked hourglass network model through a training image and an accurate heat map corresponding to the training image to generate the feature point extraction model.
According to still another aspect of the present disclosure, there is provided an image processing apparatus, including: an extraction module for detecting a target object in the image and extracting a target region; a model calculation module for inputting the target region into a feature point extraction model to generate a heat map of the target object, the heat map being used to display the probability that pixel points in the target region are feature points of the target object; a valid feature point module for determining a plurality of valid feature points in the heat map; and a feature point module for determining the target feature points of the target object in the image according to the valid feature points.
In one embodiment, the apparatus further comprises a model training module for training the stacked hourglass network model through training images and the accurate heat maps corresponding to the training images to generate the feature point extraction model.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device including: one or more processors; and storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
According to an aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored; the program, when executed by a processor, implements the method as described above.
According to the image processing method, image processing apparatus, electronic device, and computer-readable medium of the present disclosure, facial key points in an image can be quickly and accurately located.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a system block diagram illustrating an image processing method according to an exemplary embodiment.
Fig. 2 is a schematic diagram illustrating an application scenario of an image processing method according to an exemplary embodiment.
Fig. 3 is a schematic diagram illustrating an application scenario of an image processing method according to an exemplary embodiment.
FIG. 4 is a flow diagram illustrating an image processing method according to an exemplary embodiment.
FIG. 5 is a diagram illustrating an image processing method according to an exemplary embodiment.
FIG. 6 is a diagram illustrating an image processing method according to an exemplary embodiment.
Fig. 7 is a flowchart illustrating an image processing method according to another exemplary embodiment.
FIG. 8 is a diagram illustrating an image processing method according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment.
Fig. 10 is a block diagram illustrating an image processing apparatus according to another exemplary embodiment.
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 12 is a schematic diagram illustrating a computer-readable storage medium according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
FIG. 1 is a system block diagram illustrating an image processing method according to an exemplary embodiment.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background server that supports a website where users take pictures or videos with the terminal devices 101, 102, 103. The server 105 may analyze and perform other processing on the received data such as the picture or the video image, and feed back the processing result (e.g., the facial feature points in the picture or the facial feature points of each frame in the video image) to the terminal device.
A user can take pictures or videos through the terminal devices 101, 102 and 103, and the terminal devices 101, 102 and 103 can detect a target object in an image and extract a target area; the terminal devices 101, 102, 103 may, for example, input the target region into a feature point extraction model to generate a heat map of the target object, where the heat map is used to display a probability that a pixel point in the target region is a feature point of the target object; the terminal device 101, 102, 103 may for example determine a plurality of valid feature points in the heat map; and the terminal device 101, 102, 103 may determine the target feature points of the target object in the image, e.g. from the plurality of valid feature points. The terminal devices 101, 102, 103 may further process the pictures or videos taken by the user, for example, through the feature points, and may perform face recognition, real-time makeup processing, or the like, for example.
The terminal devices 101, 102, 103 may also train the stacked hourglass network model, for example, through a training image and an accurate heat map corresponding to the training image, to generate the feature point extraction model.
A user can take pictures or videos through the terminal devices 101, 102, 103, the terminal devices 101, 102, 103 can forward the pictures or videos to the server 105, and the server 105 can detect a target object in an image and extract a target area; the server 105 may, for example, input the target region into a feature point extraction model to generate a heat map of the target object, where the heat map is used to display probabilities that pixel points in the target region are feature points of the target object; the server 105 may, for example, determine a plurality of valid feature points in the heat map; and the server 105 may determine the target feature points of the target object in the image, for example, from the plurality of valid feature points. The server 105 may further process the picture or video taken by the user, for example, by feature points, may perform face recognition, real-time makeup processing, or the like, for example. The server 105 may also return the characteristic points to the terminal devices 101, 102, 103, for example, and the terminal devices 101, 102, 103 may further process the pictures or videos taken by the user, for example, by the characteristic points.
Server 105 may also train the stacked hourglass network model, for example, by training images and their corresponding accurate heat maps, to generate the feature point extraction model.
The server 105 may be a single physical server or may be composed of a plurality of servers. It should be noted that the image processing method provided by the embodiments of the present disclosure may be executed by the server 105 and/or the terminal devices 101, 102, 103; accordingly, the image processing apparatus may be disposed in the server 105 and/or the terminal devices 101, 102, 103. The requesting end that provides the user with picture or video input is typically located in the terminal devices 101, 102, 103.
The image processing method can be applied to various scenarios. A target region of the image is input into a feature point extraction model with a Stacked Hourglass Network structure to generate a heat map, and the feature points in the image are then determined from at least four valid feature points in the output heat map, so that facial key points in the image can be located quickly and accurately.
According to the image processing method of the present disclosure, effective information of the image at different scales can be captured through the feature point extraction model with the stacked hourglass network structure. Furthermore, by stacking multiple hourglass structures, the network estimates heat maps of feature points in face or body images in a coarse-to-fine manner and derives the feature point coordinates from them. The image processing method of the present disclosure can therefore accurately estimate the feature points in a face image.
Fig. 2 is a schematic diagram illustrating an application scenario of an image processing method according to an exemplary embodiment. As shown in fig. 2, a user takes a picture of a person in a user terminal through a camera device, and the terminal device determines the feature points (e.g., the face contour and the key points of the eyes, nose, mouth, etc.) in the image. The terminal device may, for example, display the feature points on the user terminal for the user to select, and then process the user's facial image according to the feature points the user selects; for example, when the user selects the lip feature points, the lip region in the user's image may be filled with a makeup effect.
The terminal device may also, for example, after acquiring the feature points in the picture, automatically process the user's image according to the user's preset parameters, and then display the resulting makeup effect in the terminal device.
Fig. 3 is a schematic diagram illustrating an application scenario of an image processing method according to an exemplary embodiment. As shown in fig. 3, a user takes a video image of an individual in a user terminal through a camera device, and the terminal device transmits the video image to a remote server, and the remote server determines feature points (e.g., a face contour and key points of eyes, a nose, a mouth, etc.) in the video image. The remote server may then return the above feature points to the user's terminal device for further operation by the user.
When the terminal device transmits the video image to the remote server, it may also upload preset parameters to the remote server. The remote server further processes the user's video image according to the preset parameters in combination with the feature points, for example by adding special effects, and then returns the processed video to the terminal device so that the processed video image can be displayed at the user's end.
It should be noted that, although the present disclosure is exemplarily described by processing the user face image and extracting the feature points in the user face image, the image processing method in the present disclosure may also be applied to other image processing scenarios, and the present disclosure is not limited thereto.
The image processing method in the present disclosure will be described in detail below by specific examples:
FIG. 4 is a flow diagram illustrating an image processing method according to an exemplary embodiment. The image processing method 40 includes at least steps S402 to S408.
As shown in fig. 4, in S402, a target object in the image is detected and a target region is extracted. The image may be, for example, an image containing a human face, and the target region may be, for example, a rectangular region.
In one embodiment, a rectangular frame region image is extracted from the image through a face detection algorithm; the rectangular frame region image is scaled to a predetermined size; and the pixels in the scaled rectangular frame region are normalized to generate the target region.
When the rectangular frame region image is extracted from the image through a face detection algorithm, the face position can be detected in an image of any size and a face box (rectangular frame) output; specifically, the face position can be detected through a robust real-time face detection technique ("Robust real-time face detection") and a rectangular face region extracted. It is worth mentioning that when the face region is cropped according to the face box, the target region can be determined by keeping the center of the face box unchanged and enlarging the side length by a factor of 1.3. The factor 1.3 is an empirical value that yields a good face rectangle; different parameters can be set for different usage scenarios and requirements.
When the rectangular frame region image is scaled to the predetermined size, the extracted rectangular frame region can be scaled to 256 × 256. The 256 × 256 size is an empirical value that yields a good face rectangle; different sizes can be set for different usage scenarios and requirements.
When the pixels in the scaled rectangular frame region are normalized to generate the target region, the pixel values of the image in the rectangular frame region are normalized to the [0,1] interval, as illustrated by the sketch below.
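The following is a minimal preprocessing sketch, not the patent's reference implementation; OpenCV's Haar cascade is used here as a stand-in for the robust real-time face detection step, and the 1.3× box expansion, the 256 × 256 size, and the [0,1] normalization follow the values given above.

```python
import cv2
import numpy as np

# Stand-in detector for the "robust real-time face detection" step.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_target_region(image_bgr, expand=1.3, size=256):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray)
    if len(boxes) == 0:
        return None                                # no face detected
    x, y, w, h = boxes[0]
    cx, cy = x + w / 2.0, y + h / 2.0              # keep the box center fixed
    half = expand * max(w, h) / 2.0                # enlarge the side length by 1.3x
    x0, y0 = int(max(cx - half, 0)), int(max(cy - half, 0))
    x1 = int(min(cx + half, image_bgr.shape[1]))
    y1 = int(min(cy + half, image_bgr.shape[0]))
    crop = cv2.resize(image_bgr[y0:y1, x0:x1], (size, size))
    return crop.astype(np.float32) / 255.0         # normalize pixels to [0, 1]
```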
In S404, the target region is input into a feature point extraction model to generate a heat map of the target object, where the heat map is used to display the probability that a pixel point in the target region is a feature point of the target object. The feature point extraction model may be a stacked hourglass network model. The stacked hourglass network is a network model applied to predicting human body postures, and its network structure can capture and integrate information across all scales of an image. The structure is commonly referred to as an hourglass based on the visualization of the pooling and subsequent upsampling steps used to obtain the final output of the network.
The hourglass structure in the feature point extraction model of the present disclosure is a simple, minimal design that can capture all of these features and combine them to output per-pixel predictions. The present disclosure extends a single hourglass by placing multiple hourglass modules together end to end. The hourglass structure is repeated across scales, bottom-up and then top-down, enabling repeated bidirectional inference and greatly improving the image processing performance of the model.
FIG. 5 is a diagram illustrating an image processing method according to an exemplary embodiment. As shown in fig. 5, the target region may be input into the first hourglass structure of the stacked hourglass network model, the output data of the first hourglass structure is then input into a second hourglass structure, and after passing through a third hourglass structure a heat map (Heatmap) with feature point marks is output. The embodiment of fig. 5 uses a stacked hourglass network model with three hourglass structures only as an example to describe the calculation process of the network model; in an actual application scenario, a stacked hourglass network model with any number of hourglass structures may be selected according to the actual situation, and the disclosure is not limited in this respect.
FIG. 6 is a diagram illustrating an image processing method according to an exemplary embodiment. Fig. 6 shows a specific schematic diagram of the hourglass structure in the stacked hourglass network model. As shown in fig. 6, the hourglass is designed as follows: convolution and max-pooling layers process the features down to a very low resolution. At each max-pooling step, the network branches and applies more convolutions at the original, pre-pooling resolution. After reaching the lowest resolution, the network begins top-down upsampling and combines features across scales. To combine the information across two adjacent resolutions, the lower-resolution features are upsampled by nearest-neighbor interpolation and the two sets of features are added element-wise.
In conjunction with the illustrations of fig. 5 and 6, further description is as follows: the target region is input into the first hourglass structure of the stacked hourglass network model, which compresses the target region as required by the computation, processing the features of the target region image to a lower resolution; the output data of the first hourglass structure is then input into the second hourglass structure, which compresses it again. The hourglass topology is symmetric, so for every layer on the way down there is a corresponding layer on the way up. After several rounds of compression, the compressed features are expanded again through a number of consecutive upsampling stages symmetric to the rounds of compression, yielding an output at the preset resolution. The output of the stacked hourglass network model is a set of heat maps; for each given heat map, the model predicts the probability that a feature point is present at each pixel. Each hourglass structure contains a plurality of residual blocks that perform the specific network computations; a minimal sketch of one hourglass module follows.
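As one possible concrete form, the following PyTorch sketch implements a single recursive hourglass module of this kind; the channel count, recursion depth, and residual block design are illustrative assumptions, not values fixed by the disclosure.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Residual(nn.Module):
    """Pre-activation residual block, the basic computation unit of the hourglass."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True), nn.Conv2d(ch, ch // 2, 1),
            nn.BatchNorm2d(ch // 2), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 2, ch // 2, 3, padding=1),
            nn.BatchNorm2d(ch // 2), nn.ReLU(inplace=True), nn.Conv2d(ch // 2, ch, 1),
        )

    def forward(self, x):
        return x + self.body(x)

class Hourglass(nn.Module):
    """One recursive hourglass: pool down to the lowest resolution, then
    upsample and add the skip branch element-wise at each scale."""
    def __init__(self, depth, ch):
        super().__init__()
        self.skip = Residual(ch)    # branch kept at the pre-pooling resolution
        self.down = Residual(ch)
        self.inner = Hourglass(depth - 1, ch) if depth > 1 else Residual(ch)
        self.up = Residual(ch)

    def forward(self, x):            # input H and W must be divisible by 2**depth
        skip = self.skip(x)
        y = self.down(F.max_pool2d(x, 2))                     # halve the resolution
        y = self.up(self.inner(y))
        y = F.interpolate(y, scale_factor=2, mode="nearest")  # nearest-neighbor upsample
        return skip + y                                       # element-wise addition

hg = Hourglass(depth=4, ch=256)
out = hg(torch.randn(1, 256, 64, 64))   # shape preserved: (1, 256, 64, 64)
```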
In one embodiment, the target region is input into the stacked hourglass network model, and the heat map is generated by forward calculation based on the mini-batch gradient descent method (mini-batch). The idea of mini-batch gradient descent is to use a subset of the samples for each parameter update, a compromise between computation time and gradient noise. Of course, other gradient descent methods, such as Batch Gradient Descent and Stochastic Gradient Descent, may also be selected for the forward calculation in this step.
In S406, a plurality of valid feature points in the heat map are determined. A valid feature point is a point whose pixel value is not 0. For example: an extreme point in the heat map is determined, and at least four valid feature points are randomly determined within a predetermined range around the extreme point.
In one embodiment, the extreme point is the point of maximum response in the heat map. The heat map is generated from a two-dimensional Gaussian function, and its calculation formula is:

H(x, y) = \exp\left( -\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2} \right)

where (x_0, y_0) are the coordinates of the annotated face keypoint and \sigma controls the spread of the Gaussian.

It is worth mentioning that the face keypoint coordinates (x_0, y_0) need not be rounded to integers; the fractional part can be retained, avoiding precision loss. The feature point coordinates can then be accurately estimated from the recovered heat map by solving a bivariate quadratic equation of the heat map.
A plurality of valid feature points are then determined in the acquired heat map by random sampling around these feature point coordinates, to facilitate the subsequent calculation.
In S408, target feature points of the target object in the image are determined according to the plurality of valid feature points. For example: the positions and pixel values of the valid feature points are substituted into the heat map equation, and the equation is solved to determine the target feature points of the target object in the image, where the heat map equation is a two-dimensional Gaussian equation.
In one embodiment, the two-dimensional Gaussian used to generate the heat map is a bivariate quadratic function with respect to the keypoint. In theory, the equation can be solved, and the accurate keypoint coordinates recovered, as soon as the positions and pixel values of 4 valid feature points are found. Therefore, for each heat map produced by the stacked hourglass network model, the point with the maximum Gaussian response can be determined in the heat map, 4 points are randomly sampled near that point, the two-dimensional Gaussian function is solved after substituting the positions and pixel values of these 4 points to obtain the precise point coordinates, and finally the point coordinates are scaled back up to the size of the original image, as in the sketch below.
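The following NumPy sketch illustrates this closed-form recovery; it is a reconstruction under assumptions, not the patent's reference code. Assuming a unit-amplitude Gaussian heat map, taking logarithms and differencing pairs of the four sampled points yields a linear system in (x_0, y_0, 2\sigma^2).

```python
import numpy as np

def subpixel_keypoint(heatmap, num_samples=4, rng=None):
    """Recover a sub-pixel keypoint center from a Gaussian heat map.

    Assumes H(x, y) = exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)),
    i.e. unit peak amplitude, so values are positive everywhere.
    """
    rng = rng or np.random.default_rng()
    py, px = np.unravel_index(np.argmax(heatmap), heatmap.shape)  # extreme point
    pts = []
    while len(pts) < num_samples:          # randomly sample points near the maximum
        dx, dy = rng.integers(-2, 3, size=2)
        x, y = px + dx, py + dy
        if (0 <= x < heatmap.shape[1] and 0 <= y < heatmap.shape[0]
                and heatmap[y, x] > 0 and (x, y) not in pts):
            pts.append((x, y))
    xs = np.array([p[0] for p in pts], float)
    ys = np.array([p[1] for p in pts], float)
    ln_h = np.log([heatmap[y, x] for x, y in pts])
    # Differencing point i against point 0 eliminates the quadratic terms:
    #   2*(x_i - x_0)*cx + 2*(y_i - y_0)*cy - (ln h_i - ln h_0)*s
    #     = x_i**2 - x_0**2 + y_i**2 - y_0**2,   with s = 2*sigma**2.
    A = [[2 * (xs[i] - xs[0]), 2 * (ys[i] - ys[0]), -(ln_h[i] - ln_h[0])]
         for i in range(1, num_samples)]
    b = [xs[i] ** 2 - xs[0] ** 2 + ys[i] ** 2 - ys[0] ** 2
         for i in range(1, num_samples)]
    # Degenerate (e.g. collinear) samples make the system singular; resample then.
    cx, cy, s = np.linalg.solve(np.array(A), np.array(b))
    return cx, cy                      # sub-pixel center in heat map coordinates

# Scaling back to the original image: for a 256 x 256 crop and a 64 x 64
# heat map, multiply the recovered coordinates by 4.
```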
According to the image processing method of the present disclosure, effective information of the image at different scales can be captured through a feature point extraction model generated from a stacked hourglass network from the field of deep learning. Furthermore, by stacking multiple hourglass structures, the network estimates heat maps of feature points in face or body images in a coarse-to-fine manner and derives the feature point coordinates from them. The image processing method of the present disclosure can therefore accurately estimate the feature points in a face image.
According to the image processing method of the present disclosure, the fractional part of the coordinate is preserved from the annotated feature point coordinates through to heat map generation, avoiding the precision-loss problem; and the feature point coordinates estimated from the recovered heat map have a closed-form solution, so the precise position of the estimated point can be located in one step. Accuracy is improved while introducing almost no extra computational complexity, greatly improving the efficiency of the final facial feature point recognition.
Video data annotation requires a large amount of manual effort; the image processing method can provide high-precision, semi-automatic generation of video data annotations, effectively reducing that effort. In practical scenarios, the method can be applied to corresponding functions, such as makeup and action effects, in short-video products.
it should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 7 is a flowchart illustrating an image processing method according to another exemplary embodiment. The image processing method 70 shown in fig. 7 is a detailed description of the construction process of the feature point extraction model mentioned in S404 ("input the target region into a feature point extraction model to generate the heat map of the target object") of the flow shown in fig. 4. For example: a stacked hourglass network model is trained through training images and the accurate heat maps corresponding to the training images to generate the feature point extraction model.
As shown in fig. 7, in S702, a target region with a target object is extracted from the training image.
In one embodiment, rectangular frame area images are respectively extracted from a plurality of images through a face detection algorithm; scaling the plurality of rectangular frame area images to a predetermined size; and carrying out normalization processing on the pixels in the plurality of scaled rectangular frame areas to generate a plurality of target areas.
When a rectangular frame region image is extracted from an image through a face detection algorithm, the position of a face can be detected through a Robust real-time face detection analysis technology (Robust real-time face detection), and a rectangular face region is extracted.
In S704, feature points in the target region are determined. For example, the precise coordinates corresponding to the feature points in the target region may be determined, and a plurality of heat maps generated from these precise coordinates.
In one embodiment, a plurality of feature points in the plurality of target regions may be determined, for example, by manual annotation. In another embodiment, the feature points may also be determined in the target regions, for example, by other feature point extraction algorithms.
As above, the heat map is generated from a two-dimensional Gaussian function:

H(x, y) = \exp\left( -\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2} \right)

where (x_0, y_0) are the coordinates of the annotated face keypoint.

It is worth mentioning that, in the embodiments of the present disclosure, the face keypoint coordinates (x_0, y_0) need not be rounded to integers; the fractional part can be retained, avoiding precision loss. A plurality of heat maps are generated from the plurality of precise coordinates.
In S706, an accurate heat map is generated from the feature points. A corresponding accurate heat map is generated for each annotated feature point.
FIG. 8 is a diagram illustrating an image processing method according to an exemplary embodiment. As shown in fig. 8, the conventional stacked hourglass network suffers a loss of accuracy when processing high-definition face images. Specifically, the original stacked hourglass network rounds the annotated training feature point coordinates when converting them to the accurate heat map: the point (29.55, 77.38) in fig. 8, for example, is rounded to (30, 77) when the accurate heat map annotation is generated, and the existing stacked hourglass network then generates the accurate heat map from the two-dimensional Gaussian function centered on (30, 77).
Fig. 8 further exemplarily shows a specific image of a heat map output by the stacked hourglass network model, in the stacked hourglass network model, the color intensity of each pixel point on the heat map represents the probability that the pixel point is a feature point of the target object, and the stronger the color, the greater the probability that the pixel point is a feature point.
This way of generating the map discards the fractional part, resulting in a loss of precision. In particular, the output of the original stacked hourglass network is typically smaller than the original input image (e.g., a 64 × 64 output for a 256 × 256 input), and the process of re-scaling the output to the input size further amplifies the error.
To solve this problem of the existing methods, embodiments of the present disclosure preserve the fractional part in the conversion from the annotated feature point coordinates to the heat map, generating the heat map directly centered on the point (29.55, 77.38), as in the sketch below.
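A NumPy sketch of this sub-pixel heat map generation follows; the heat map size and the \sigma value are illustrative assumptions, not values fixed by the disclosure.

```python
import numpy as np

def gaussian_heatmap(center, shape=(128, 128), sigma=1.0):
    """Render H(x, y) = exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)).

    The center keeps its fractional part, so no precision is lost to rounding.
    """
    x0, y0 = center
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

# Centered directly on the annotated keypoint, not on the rounded (30, 77):
heatmap = gaussian_heatmap((29.55, 77.38))
```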
In S708, a stacked hourglass network model is trained through the target region and the accurate heat map, and the feature point extraction model is generated. Multiple target regions may be input into a stacked hourglass network structure, for example, generating multiple initial heat maps; comparing the plurality of initial heat maps with the plurality of accurate heat maps to determine a plurality of error parameters; and updating the parameters of the stacked hourglass network structure based on a plurality of error parameters and an optimization algorithm to obtain a feature point extraction model.
The hourglass structure includes convolution and max-pooling layers that process the features down to a very low resolution. At each max-pooling step, the network branches and applies more convolutions at the original, pre-pooling resolution. After reaching the lowest resolution, the network begins top-down upsampling and combines features across scales. To combine the information across two adjacent resolutions, the lower-resolution features are upsampled by nearest-neighbor interpolation and the two sets of features are added element-wise.
In one embodiment, the plurality of target regions may each be input into the first hourglass structure of the stacked hourglass network model, which compresses the target region as required by the computation, processing the features of the target region image to a lower resolution; the output data of the first hourglass structure is then input into a second hourglass structure, which compresses it again.

Since the topology of each hourglass in the stacked hourglass network model is symmetric, for every layer on the way down there is a corresponding layer on the way up. After several rounds of compression, the compressed target region is expanded again through a number of consecutive upsampling stages symmetric to the rounds of compression, yielding an output at the preset resolution. During training of the stacked hourglass network model, the output of the model is a plurality of initial heat maps; for each given initial heat map, the model predicts the probability that a feature point exists at each pixel.
In one embodiment, comparing the plurality of initial heat maps with the plurality of accurate heat maps to determine a plurality of error parameters comprises: comparing the initial heat maps with the accurate heat maps through a Euclidean loss function to determine the error parameters. For example, during training, forward calculation is carried out on mini-batches to obtain the recovered heat maps, and a Euclidean loss (Euclidean Loss) between each initial heat map and the corresponding accurate heat map gives the heat map image error.
After the initial heat maps are compared with the accurate heat maps to determine the error parameters, the model parameters are updated through the Adam optimization algorithm based on the heat map image errors, and the iteration is repeated until the model converges. Throughout this process, the whole stacked hourglass network model is trained end to end, as in the sketch below.
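The following PyTorch sketch puts the pieces together. It is a hedged illustration, not the patent's reference code: the stem layer, the 68-keypoint head, the channel count, and the learning rate are assumptions; `Hourglass` is the module sketched earlier; and `loader` is an assumed data loader yielding (target_region, accurate_heatmap) mini-batches.

```python
import torch
from torch import nn

class StackedHourglass(nn.Module):
    def __init__(self, num_stacks=3, ch=256, num_keypoints=68):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 7, stride=4, padding=3)   # 256x256 -> 64x64
        self.stacks = nn.ModuleList(Hourglass(4, ch) for _ in range(num_stacks))
        self.head = nn.Conv2d(ch, num_keypoints, 1)            # one heat map per keypoint

    def forward(self, x):
        x = self.stem(x)
        for hg in self.stacks:          # hourglass modules placed end to end
            x = hg(x)
        return self.head(x)

model = StackedHourglass()
criterion = nn.MSELoss()                # Euclidean loss between heat maps
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)

for regions, accurate_maps in loader:   # mini-batch forward calculation
    initial_maps = model(regions)       # initial (recovered) heat maps
    loss = criterion(initial_maps, accurate_maps)  # heat map image error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                    # Adam update; repeat until convergence
```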
It should be noted that the various algorithms described above are merely exemplary, and in the actual training process of the stacked hourglass network structure, different optimization algorithms may be selected according to different requirements for auxiliary training, which is not limited in this disclosure.
A loss function maps an event (an element of a sample space) to a real number expressing the economic or opportunity cost associated with the event. In comparing the initial heat map with the accurate heat map, various types of loss function may be selected to obtain the error, such as the logarithmic loss function, the squared loss function, and the exponential loss function.
The optimization algorithm can accelerate the model calculation convergence speed in the model training process, so that a better and smaller loss function value is obtained. For example, model parameter updates may be performed based on errors in the heat map image by a first order optimization algorithm, a second order optimization algorithm, or a gradient descent method.
According to the image processing method of the present disclosure, improving the conversion precision between the feature point coordinates and the heat map improves the network's registration performance on high-definition face images. Compared with the original stacked hourglass model, better face registration performance can be obtained, so the method is better suited to subsequent application scenarios such as beauty, makeup, and face slimming.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the programs perform the functions defined by the above-described methods provided by the present disclosure. The programs may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 9 is a block diagram illustrating an image processing apparatus according to an exemplary embodiment. The image processing apparatus 90 includes: an extraction module 902, a model calculation module 904, a valid feature points module 906, and a feature points module 908.
The extraction module 902 is configured to extract a target region from an image; specifically, for example, a rectangular frame region image is extracted from the image through a face detection algorithm, the rectangular frame region image is scaled to a predetermined size, and the pixels in the scaled rectangular frame region are normalized to generate the target region.
The model calculation module 904 is configured to input the target region into a feature point extraction model to generate a heat map of the target object, where the color intensity of each pixel point on the heat map represents the probability that the pixel in the target region is a feature point of the target object. The feature point extraction model may have a stacked hourglass network structure, a network model applied to predicting human body posture that can capture and integrate information across all scales of an image. The structure is commonly referred to as an hourglass based on the visualization of the pooling and subsequent upsampling steps used to obtain the final output of the network. The target region may be input into the first hourglass structure of the stacked hourglass network model, the output data of the first hourglass structure input into the second, and so on, until the last hourglass structure outputs a heat map with feature point marks.
The valid feature point module 906 is configured to determine a plurality of valid feature points in the heat map, a valid feature point being a point whose pixel value is not 0. For example: an extreme point in the heat map is determined, and at least four valid feature points are randomly determined within a predetermined range around the extreme point.
The feature point module 908 is configured to determine target feature points of the target object in the image according to the plurality of valid feature points, for example by determining the feature points of the image from the positions and pixel values of at least four valid feature points based on the two-dimensional Gaussian function. A combined sketch of the four modules follows.
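Tying the modules together, the following sketch reuses the hypothetical helpers from the earlier sketches (`extract_target_region`, `StackedHourglass`, `subpixel_keypoint`); the names and tensor shapes are assumptions for illustration only, not the patent's apparatus.

```python
import torch

class ImageProcessingApparatus:
    def __init__(self, model):
        self.model = model                             # feature point extraction model

    def process(self, image_bgr):
        region = extract_target_region(image_bgr)      # extraction module 902
        if region is None:
            return None
        batch = torch.from_numpy(region).permute(2, 0, 1).unsqueeze(0)
        with torch.no_grad():
            heatmaps = self.model(batch)               # model calculation module 904
        # Valid feature point module 906 and feature point module 908:
        # one sub-pixel solve per keypoint heat map.
        return [subpixel_keypoint(hm) for hm in heatmaps[0].numpy()]
```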
According to the image processing apparatus of the present disclosure, effective information of the image at different scales can be captured through the stacked hourglass network from the field of deep learning. Furthermore, by stacking multiple hourglass structures, the network estimates heat maps (Heatmaps) of feature points in face or body images in a coarse-to-fine manner and derives the feature point coordinates from them. The image processing apparatus of the present disclosure can therefore accurately estimate the feature points in a face image.
Fig. 10 is a block diagram illustrating an image processing apparatus according to another exemplary embodiment. As shown in fig. 10, the image processing apparatus 1000 further includes, in addition to the image processing apparatus 90: model training module 1002.
The model training module 1002 is configured to train the stacked hourglass network model through a training image and an accurate heat map corresponding to the training image, and generate the feature point extraction model. Specific examples thereof include: extracting a plurality of target areas from the plurality of images respectively; determining a plurality of feature points in a plurality of target regions; generating a plurality of accurate heat maps through a plurality of feature points; and training the stacked hourglass network model through the target area and the accurate heat map to generate the feature point extraction model.
According to the image processing apparatus of the present disclosure, improving the conversion precision between the feature point coordinates and the heat map improves the network's registration performance on high-definition face images. Compared with the original stacked hourglass model, better face registration performance can be obtained, so the apparatus is better suited to subsequent application scenarios such as beauty, makeup, and face slimming.
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 1100 according to this embodiment of the disclosure is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 11, electronic device 1100 is embodied in the form of a general purpose computing device. The components of the electronic device 1100 may include, but are not limited to: at least one processing unit 1110, at least one memory unit 1120, a bus 1130 connecting the various system components including the memory unit 1120 and the processing unit 1110, a display unit 1140, and the like.
The storage unit stores program code executable by the processing unit 1110, so that the processing unit 1110 performs the steps according to various exemplary embodiments of the present disclosure described in the method sections above in this specification. For example, the processing unit 1110 may perform the steps shown in fig. 4 and 7.
The memory unit 1120 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM)11201 and/or a cache memory unit 11202, and may further include a read only memory unit (ROM) 11203.
The storage unit 1120 may also include a program/utility 11204 having a set (at least one) of program modules 11205, such program modules 11205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1130 may be representative of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1100 may also communicate with one or more external devices 1100' (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1100, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1150. Also, the electronic device 1100 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1160. The network adapter 1160 may communicate with other modules of the electronic device 1100 via the bus 1130. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiments of the present disclosure.
Fig. 12 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the disclosure.
Referring to fig. 12, a program product 1200 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the computer readable medium to perform the functions of: detecting a target object in the image, and extracting a target area; inputting the target region into a feature point extraction model to generate a heat map of the target object, wherein the heat map is used for displaying the probability that pixel points in the target region are feature points of the target object; determining a plurality of valid feature points in the heat map; and determining the target characteristic points of the target object in the image according to the plurality of effective characteristic points.
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus according to the description of the embodiments, or may be modified accordingly in one or more apparatuses unique from the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements, instrumentalities, or instrumentalities described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (15)

1. An image processing method, comprising:
detecting a target object in the image, and extracting a target area;
inputting the target region into a feature point extraction model to generate a heat map of the target object, wherein the heat map is used for displaying the probability that pixel points in the target region are feature points of the target object;
determining a plurality of valid feature points in the heat map; and
determining the target feature points of the target object in the image according to the plurality of valid feature points.
2. The method of claim 1, wherein determining a plurality of valid feature points in the heat map comprises:
determining an extreme point in the heat map; and
randomly determining the plurality of valid feature points within a predetermined range from the extreme point.
3. The method of claim 1, wherein determining the target feature points of the target object in the image from the plurality of valid feature points comprises:
substituting the positions and the pixel values of the valid feature points into a heat map equation, and solving the heat map equation to determine the target feature points of the target object in the image;
wherein the heat map equation is a two-dimensional Gaussian equation.
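By way of illustration only, and not as a limitation of claim 3: the heat map equation can be written as the two-dimensional Gaussian

$$H(x, y) = A \exp\!\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right),$$

where $(x_0, y_0)$ is the target feature point to be solved for. Taking the logarithm of the pixel value $h_i$ of each valid feature point $(x_i, y_i)$ gives

$$\ln h_i = c_0 + a\,x_i + b\,y_i - d\,(x_i^2 + y_i^2), \qquad a = \frac{x_0}{\sigma^2},\quad b = \frac{y_0}{\sigma^2},\quad d = \frac{1}{2\sigma^2},$$

which is linear in the unknowns $c_0$, $a$, $b$, $d$. With four or more valid feature points the system can be solved by least squares, and the target feature point is then recovered at sub-pixel precision as $x_0 = a/(2d)$ and $y_0 = b/(2d)$. The isotropic, equal-variance form of the Gaussian is an assumption made here for illustration.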
4. The method of claim 1, wherein detecting the target object in the image and extracting the target region comprises:
detecting a target object in the image through a face detection algorithm;
extracting a detection area image according to a detection result;
scaling the detection region image to a predetermined size; and
normalizing the pixels in the scaled detection region image to generate the target region.
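By way of illustration only, a minimal sketch of the preprocessing of claim 4 follows; the zero-mean, unit-variance normalization is an assumption, since the claim leaves the exact normalization open:

```python
import cv2
import numpy as np

def extract_target_region(image, face_box, size=128):
    """Crop the detected face, scale the detection region image to a
    predetermined size, and normalize its pixels to produce the target
    region."""
    x, y, w, h = face_box                            # from face detection
    detection = image[y:y + h, x:x + w]              # detection region image
    detection = cv2.resize(detection, (size, size))  # scale to fixed size
    region = detection.astype(np.float32)
    return (region - region.mean()) / (region.std() + 1e-8)  # normalize
```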
5. The method of claim 1, wherein inputting the target region into a feature point extraction model to generate a heat map of the target object comprises:
inputting the target region into the feature point extraction model, and performing a forward calculation based on a mini-batch gradient descent method to generate the heat map of the target object.
6. The method of claim 1, further comprising:
training a stacked hourglass network model with a training image and an accurate heat map corresponding to the training image to generate the feature point extraction model.
7. The method of claim 6, wherein training the stacked hourglass network model with the training image and the accurate heat map corresponding to the training image to generate the feature point extraction model comprises:
extracting a target region containing a target object from the training image;
determining feature points in the target region;
generating the accurate heat map from the feature points; and
training the stacked hourglass network model with the target region and the accurate heat map to generate the feature point extraction model.
8. The method of claim 7, wherein generating the accurate heat map from the feature points comprises:
determining the precise coordinates of the feature points in the target region; and
generating the accurate heat map from the precise coordinates.
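By way of illustration only, one plausible rendering of the accurate heat map of claims 7 and 8 is a two-dimensional Gaussian of peak value 1 centered at the precise coordinates; the peak width `sigma` is an assumed hyperparameter, not dictated by the claims:

```python
import numpy as np

def render_accurate_heatmap(height, width, cx, cy, sigma=2.0):
    """Render a ground-truth heat map as a 2D Gaussian centered at the
    precise feature point coordinates (cx, cy); one such map is rendered
    per feature point."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
```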
9. The method of claim 7, wherein training the stacked hourglass network model with the target region and the accurate heat map to generate the feature point extraction model comprises:
inputting the target region into the stacked hourglass network model to generate an initial heat map;
comparing the initial heat map with the accurate heat map to determine an error parameter; and
updating parameters of the stacked hourglass network model based on the error parameter and an optimization algorithm to obtain the feature point extraction model.
10. The method of claim 9, wherein inputting the target region into the stacked hourglass network model to generate the initial heat map comprises:
inputting the target region into the stacked hourglass network model, and performing a forward calculation based on a mini-batch gradient descent method to generate the initial heat map.
11. The method of claim 9, wherein comparing the initial heat map to the accurate heat map to determine an error parameter comprises:
comparing the initial heat map with the accurate heat map using a Euclidean loss function to determine the error parameter.
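By way of illustration only, claims 9 to 11 together describe one optimization step. The sketch below assumes a PyTorch implementation: `hourglass_net` is a hypothetical module implementing the stacked hourglass network, and the Euclidean loss is approximated by a summed squared difference between the initial and accurate heat maps.

```python
import torch
import torch.nn.functional as F

def training_step(hourglass_net, optimizer, regions, accurate_heatmaps):
    """One training step: forward a mini-batch of target regions through
    the stacked hourglass network, compare the initial heat maps with the
    accurate heat maps, and update the network parameters."""
    optimizer.zero_grad()
    initial_heatmaps = hourglass_net(regions)        # forward calculation
    loss = F.mse_loss(initial_heatmaps, accurate_heatmaps,
                      reduction="sum")               # Euclidean loss
    loss.backward()                                  # error parameter
    optimizer.step()                                 # optimization algorithm
    return loss.item()
```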
12. An image processing apparatus, comprising:
an extraction module, configured to detect a target object in an image and extract a target region;
a model calculation module, configured to input the target region into a feature point extraction model to generate a heat map of the target object, wherein the heat map is used to display the probability that pixel points in the target region are feature points of the target object;
a valid feature point module, configured to determine a plurality of valid feature points in the heat map; and
a target feature point module, configured to determine the target feature points of the target object in the image according to the plurality of valid feature points.
13. The apparatus of claim 12, further comprising:
a model training module, configured to train a stacked hourglass network model with a training image and an accurate heat map corresponding to the training image to generate the feature point extraction model.
14. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-11.
15. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-11.
CN201811564701.2A 2018-12-20 2018-12-20 Image processing method, image processing device, electronic equipment and computer readable medium Active CN109858333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811564701.2A CN109858333B (en) 2018-12-20 2018-12-20 Image processing method, image processing device, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN109858333A (en) 2019-06-07
CN109858333B CN109858333B (en) 2023-01-17

Family

ID=66891707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811564701.2A Active CN109858333B (en) 2018-12-20 2018-12-20 Image processing method, image processing device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN109858333B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229488A (en) * 2016-12-27 2018-06-29 北京市商汤科技开发有限公司 For the method, apparatus and electronic equipment of detection object key point
US20180330198A1 (en) * 2017-05-14 2018-11-15 International Business Machines Corporation Systems and methods for identifying a target object in an image
CN108227912A (en) * 2017-11-30 2018-06-29 北京市商汤科技开发有限公司 Apparatus control method and device, electronic equipment, computer storage media
CN108875523A (en) * 2017-12-28 2018-11-23 北京旷视科技有限公司 Human synovial point detecting method, device, system and storage medium
CN108647639A (en) * 2018-05-10 2018-10-12 电子科技大学 Real-time body's skeletal joint point detecting method
CN109033946A (en) * 2018-06-08 2018-12-18 东南大学 Merge the estimation method of human posture of directional diagram
CN108986127A (en) * 2018-06-27 2018-12-11 北京市商汤科技开发有限公司 The training method and image partition method of image segmentation neural network, device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AIDEN NIBALI et al.: "Numerical Coordinate Regression with Convolutional Neural Networks", arXiv *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288804B (en) * 2019-07-25 2024-07-02 杭州海康威视数字技术股份有限公司 Target positioning method and device
CN112288804A (en) * 2019-07-25 2021-01-29 杭州海康威视数字技术股份有限公司 Target positioning method and device
CN110826403B (en) * 2019-09-27 2020-11-24 深圳云天励飞技术有限公司 Tracking target determination method and related equipment
CN110826403A (en) * 2019-09-27 2020-02-21 深圳云天励飞技术有限公司 Tracking target determination method and related equipment
CN110738254B (en) * 2019-10-15 2022-03-25 哈尔滨工业大学(深圳) Microscopic image target detection method and system based on depth geometric characteristic spectrum
CN110738254A (en) * 2019-10-15 2020-01-31 哈尔滨工业大学(深圳) microscopic image target detection method and system based on depth geometric characteristic spectrum
CN111126272A (en) * 2019-12-24 2020-05-08 腾讯科技(深圳)有限公司 Posture acquisition method, and training method and device of key point coordinate positioning model
CN111223143A (en) * 2019-12-31 2020-06-02 广州市百果园信息技术有限公司 Key point detection method and device and computer readable storage medium
CN111223143B (en) * 2019-12-31 2023-04-11 广州市百果园信息技术有限公司 Key point detection method and device and computer readable storage medium
CN112699847A (en) * 2021-01-15 2021-04-23 苏州大学 Face characteristic point detection method based on deep learning
CN112699847B (en) * 2021-01-15 2021-12-07 苏州大学 Face characteristic point detection method based on deep learning
CN114463534A (en) * 2021-12-28 2022-05-10 佳都科技集团股份有限公司 Target key point detection method, device, equipment and storage medium
CN114546125A (en) * 2022-04-27 2022-05-27 北京影创信息科技有限公司 Keyboard tracking method and tracking system
CN114546125B (en) * 2022-04-27 2022-08-09 北京影创信息科技有限公司 Keyboard tracking method and tracking system
CN116309442A (en) * 2023-03-13 2023-06-23 北京百度网讯科技有限公司 Method for determining picking information and method for picking target object
CN116309442B (en) * 2023-03-13 2023-10-24 北京百度网讯科技有限公司 Method for determining picking information and method for picking target object

Also Published As

Publication number Publication date
CN109858333B (en) 2023-01-17

Similar Documents

Publication Title
CN109858333B (en) Image processing method, image processing device, electronic equipment and computer readable medium
US11244435B2 (en) Method and apparatus for generating vehicle damage information
CN108898186B (en) Method and device for extracting image
CN109508681B (en) Method and device for generating human body key point detection model
CN108985259B (en) Human body action recognition method and device
CN107330439B (en) Method for determining posture of object in image, client and server
CN108256479B (en) Face tracking method and device
JP7425147B2 (en) Image processing method, text recognition method and device
US9213899B2 (en) Context-aware tracking of a video object using a sparse representation framework
US10970938B2 (en) Method and apparatus for generating 3D information
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
CN111401318B (en) Action recognition method and device
CN110956131B (en) Single-target tracking method, device and system
CN108564102A (en) Image clustering evaluation of result method and apparatus
CN111914756A (en) Video data processing method and device
CN112749695A (en) Text recognition method and device
CN113420690A (en) Vein identification method, device and equipment based on region of interest and storage medium
CN109977832B (en) Image processing method, device and storage medium
CN113763249A (en) Text image super-resolution reconstruction method and related equipment thereof
CN112907569A (en) Head image area segmentation method and device, electronic equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113793370A (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN112861940A (en) Binocular disparity estimation method, model training method and related equipment
CN111292333B (en) Method and apparatus for segmenting an image

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant