CN110728359B - Method, device, equipment and storage medium for searching model structure

Info

Publication number: CN110728359B
Application number: CN201910960380.6A
Authority: CN
Inventors: 希滕; 姜志超; 张刚; 温圣召
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2022-04-26
Anticipated expiration: 2039-10-10
Also published as: CN110728359A

Abstract

The application discloses a method, a device, equipment and a computer readable storage medium for searching a model structure, and relates to the field of artificial intelligence. The method includes determining a model structure search space capable of predicting information of face key points based on a face image. The model structure search space comprises a plurality of candidate model structures, wherein each candidate model structure indicates a fusion mode of a plurality of feature maps obtained from a face image into an output feature map used for predicting the information of key points of the face, and corresponding operation applied to the plurality of feature maps in the fusion. The method also includes searching the model structure search space for a model structure suitable for predicting information of a particular type of face keypoint. The embodiment of the disclosure can automatically search the optimal model structure suitable for the human face key point prediction task.

Description

Method, device, equipment and storage medium for searching model structure

Technical Field

Embodiments of the present disclosure relate generally to the field of artificial intelligence and, more particularly, relate to a method, apparatus, device and computer-readable storage medium for searching a model structure.

Background

In recent years, deep learning techniques have achieved great success in many fields. In deep learning, the quality of the model structure (i.e., the structure of the artificial neural network) strongly influences the performance of the final model. Designing neural network structures manually often requires extensive experience and a very large number of trial combinations. Conventional random search is hardly feasible because the many network parameters yield a very large number of combinations. Therefore, Neural Architecture Search (NAS) has become a research focus in recent years; it replaces tedious manual design with an algorithm that automatically searches for an optimal neural network architecture.

Predicting information of face keypoints based on face images (e.g., the locations of the face keypoints in the face images and/or their corresponding depths) is very challenging. Existing manually designed model structures do not solve this problem well; they are also very complex, which makes real-time prediction difficult on low-cost devices (such as mobile phones). Existing automatic model structure search methods mainly target classification problems and cannot be applied directly to searching for model structures for the face keypoint prediction problem.

Disclosure of Invention

According to an example embodiment of the present disclosure, a scheme for searching a model structure is provided.

In a first aspect of the disclosure, a method for searching a model structure is provided. The method comprises determining a model structure search space capable of predicting information of face key points based on a face image, the model structure search space comprising a plurality of candidate model structures, wherein each candidate model structure indicates a fusion manner of a plurality of feature maps derived from the face image into an output feature map used for predicting information of the face key points, and corresponding operations applied to the plurality of feature maps in the fusion. The method also includes searching, in a model structure search space, a model structure suitable for predicting information of a particular face keypoint based on the face image based on the type of the particular face keypoint.

In a second aspect of the present disclosure, an apparatus for searching a model structure is provided. The apparatus comprises a search space determination module configured to determine a model structure search space capable of predicting information of face keypoints based on a face image, the model structure search space comprising a plurality of candidate model structures, wherein each candidate model structure indicates a fusion manner of a plurality of feature maps derived from the face image into an output feature map used for predicting information of the face keypoints, and a corresponding operation applied to the plurality of feature maps in the fusion. The apparatus also includes a model structure search module configured to search a model structure search space for a model structure suitable for predicting information of a specific face keypoint based on a face image based on a type of the specific face keypoint.

In a third aspect of the disclosure, a computing device is provided, comprising one or more processors; and memory for storing one or more programs that, when executed by the one or more processors, cause the computing device to implement a method according to the first aspect of the disclosure.

In a fourth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements a method according to the first aspect of the present disclosure.

It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:

FIG. 1 illustrates a schematic diagram of an example system in which various embodiments of the present disclosure can be implemented;

FIG. 2A illustrates a schematic diagram of sparse keypoints in a face image, according to some embodiments of the present disclosure;

FIG. 2B illustrates a schematic diagram of dense keypoints in a face image, according to some embodiments of the present disclosure;

FIG. 3 illustrates a flow diagram of an example method for searching a model structure, in accordance with some embodiments of the present disclosure;

FIG. 4 illustrates a schematic diagram of an example model structure search space for a face keypoint detection problem, according to some embodiments of the present disclosure;

FIG. 5 illustrates a flow diagram of an example method for searching a model structure in a model structure search space, in accordance with some embodiments of the present disclosure;

FIG. 6 illustrates an exemplary model structure searched from a model structure search space according to some embodiments of the present disclosure;

FIG. 7 shows a schematic block diagram of an apparatus for searching a model structure according to an embodiment of the present disclosure; and

FIG. 8 illustrates a block diagram of a computing device capable of implementing various embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

In describing embodiments of the present disclosure, the term "include" and its derivatives should be interpreted as being inclusive, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions are also possible below.

In the description of embodiments of the present disclosure, a "model" may learn from training data the associations between respective inputs and outputs, such that after training is complete, for a given input, a corresponding output may be generated. For example, a neural network model is constructed to include a plurality of neurons, each processing an input according to parameters obtained by training, and generating an output. The parameters of all neurons constitute a set of parameters of the neural network model. When a set of parameters for a neural network model is determined, the model may be run to perform a corresponding function. The terms "neural network", "neural network model", "model" and "network" are used interchangeably herein.

As mentioned above, predicting information of face keypoints based on face images (e.g., locating the positions of face keypoints in face images and predicting their depths) is very challenging. Existing manually designed model structures do not solve this problem well; they are also very complex, which makes real-time prediction difficult on low-cost devices (such as mobile phones). Existing automatic model structure search methods mainly target classification problems and cannot be applied directly to searching for model structures for the face keypoint prediction problem.

According to the embodiment of the disclosure, an automatic searching scheme of a model structure for a human face key point prediction problem is provided. The scheme determines a model structure search space capable of predicting information of face key points based on a face image. The model structure search space comprises a plurality of candidate model structures, wherein each candidate model structure indicates a fusion mode of a plurality of feature maps obtained from a face image into an output feature map used for predicting the information of key points of the face, and corresponding operation applied to the plurality of feature maps in the fusion. Then, based on the type of the specific face keypoint, a model structure suitable for predicting information of the specific face keypoint is searched in the model structure search space. The scheme can automatically search the optimal model structure suitable for the human face key point prediction task, so that the efficiency and the accuracy of the human face key point prediction are improved.

Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings. Fig. 1 illustrates a schematic diagram of an example system 100 in which various embodiments of the present disclosure can be implemented. As shown in FIG. 1, the system 100 may include a model structure search means 110, a model training means 120, and a model application means 130. The model structure search means 110, the model training means 120 and the model application means 130 may be implemented in the same physical device or separately implemented in different physical devices. It should be understood that the structure and function of system 100 is depicted in fig. 1 for exemplary purposes only, and is not meant to imply any limitation as to the scope of the present disclosure. Embodiments of the present disclosure may also be applied to systems having different structures and/or functions.

The model structure searching means 110 can implement automatic searching of the model structure. As shown in fig. 1, for the face keypoint prediction problem (for example, locating the position of a face keypoint in a face image and predicting the depth thereof), the model structure search means 110 may determine a model structure search space and search the model structure search space for a model structure 102 suitable for predicting information of the face keypoint 101 based on the face image.

In some embodiments, the model structure search space may include a plurality of candidate model structures that can be used for face keypoint prediction, where each candidate model structure indicates a manner of fusing a plurality of feature maps derived from a face image into an output feature map used for predicting information of face keypoints, and the corresponding operations applied to the plurality of feature maps in the fusion. In some embodiments, the face keypoints 101 may be, for example, sparse keypoints for delineating a face contour and/or facial features, as shown in fig. 2A. Alternatively, the face keypoints 101 may be dense keypoints for depicting face details, as shown in fig. 2B. In some embodiments, the predicted information of the face keypoints 101 may include the positions of the face keypoints 101 in the face image and/or the depths corresponding to the face keypoints 101. In some embodiments, the model structure search means 110 may search the model structure search space, based on the type of the face keypoints 101 (e.g., whether the face keypoints 101 are sparse keypoints or dense keypoints), for the model structure 102 suitable for predicting information of that type of face keypoint.

The model structure 102 found by the model structure search means 110 may be provided to the model training means 120 for training. The model training means 120 may train a model based on the training data 103 and the model structure 102 to obtain a trained model 104. For example, for a face keypoint prediction problem, the training data 103 may include a plurality of training images and ground-truth information (e.g., true position and true depth) about the face keypoints 101 in each training image. The trained model 104 can predict the information of the face keypoints 101 in any face image based on that face image.

The model 104 trained by the model training means 120 based on the model structure 102 may be provided to the model application means 130. The model application means 130 may predict the information of the face keypoints 101 in the face image 105 using the model 104 and generate the prediction result 106. The prediction result 106 may indicate the position of the face keypoint 101 in the input image 105 and/or the depth to which it corresponds.

FIG. 3 illustrates a flow diagram of an example method 300 for searching a model structure, in accordance with some embodiments of the present disclosure. The method 300 may be implemented by the model structure search means 110 as shown in fig. 1. It is to be understood that the method 300 may also include additional blocks not shown and/or may omit blocks shown. The scope of the present disclosure is not limited in this respect.

At block 310, the model structure search means 110 determines a model structure search space capable of predicting information of face key points based on the face image.

In some embodiments, the model structure search space may include a plurality of candidate model structures capable of predicting information of face keypoints based on the face image, wherein each candidate model structure indicates a fusion manner of a plurality of feature maps derived from the face image into an output feature map for predicting information of the face keypoints, and a corresponding operation applied to the plurality of feature maps in the fusion.

FIG. 4 illustrates a schematic diagram of an example model structure search space 400 for a face keypoint detection problem, according to some embodiments of the present disclosure. In the model structure search space 400, feature maps 401-416 are shown, which may be obtained by transforming and fusing various features of an original face image (not shown in fig. 4). For example, feature maps 401-416 may be divided into an input feature map 401, intermediate feature maps 402-408 and 410-416, and an output feature map 409.

In some embodiments, the input feature map 401 may be, for example, a feature map obtained by preprocessing (e.g., downsampling and convolution operations) an original face image. For example, assume that the original image is an RGB image with a scale of 256 × 256 × 3 (i.e., 256 pixels in length, 256 pixels in width, and 3 channels), and the feature map 401 obtained by preprocessing has a scale of 64 × 64 × M (i.e., 64 pixels in length, 64 pixels in width, and M channels). The "scale" as described herein is represented by the length, width and number of channels (also referred to as "dimensions") of the image.

In some embodiments, the output feature map 409 has a scale of, for example, 64 × 64 × N; that is, it has the same length and width as the input feature map 401 but may have a different number of channels. The output feature map 409 may undergo post-processing (e.g., bilinear interpolation) corresponding to the preprocessing performed on the original image to yield output data (not shown in fig. 4) with a scale of 256 × 256 × N, which indicates the predicted information of the face keypoints. For example, taking two-dimensional face keypoints as an example (i.e., only the locations of the face keypoints in the face image need to be predicted), the output data may indicate two-dimensional coordinates (e.g., x-coordinates and y-coordinates) of a plurality of face keypoints in the face image. Taking three-dimensional face keypoints as an example (i.e., both the positions of the face keypoints in the face image and the depths corresponding to the face keypoints need to be predicted), the output data may indicate three-dimensional coordinates (e.g., u-coordinates and v-coordinates in uv space, and depth) of a plurality of face keypoints in the face image.
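The scale arithmetic described above can be checked with a short sketch. The function names, the 4-fold factor (inferred from 256 / 64 in the example), and the concrete channel counts M = 32 and N = 68 are illustrative assumptions; the source only states the example scales.

```python
def preprocess_scale(scale, out_channels, factor=4):
    """Scale after downsampling by `factor` and convolving to `out_channels`.

    `factor=4` is an assumption inferred from the 256 -> 64 example.
    """
    length, width, _ = scale
    return (length // factor, width // factor, out_channels)


def postprocess_scale(scale, factor=4):
    """Scale after interpolating back up by `factor`; channels are unchanged."""
    length, width, channels = scale
    return (length * factor, width * factor, channels)


M, N = 32, 68                                        # hypothetical channel counts
original = (256, 256, 3)                             # RGB face image
input_map_401 = preprocess_scale(original, M)        # 64 x 64 x M
output_map_409 = (input_map_401[0], input_map_401[1], N)  # same size, N channels
output_data = postprocess_scale(output_map_409)      # 256 x 256 x N
```

Only the length and width change during preprocessing and post-processing; the number of channels is set by the convolutions in between.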

Each solid arrow in fig. 4 indicates a downsampling (e.g., 2-fold downsampling) and convolution operation. Through this operation, the length and width of the feature map are each reduced to 1/2. It should be understood that the convolution operations indicated by the solid arrows in fig. 4 may be the same or different (e.g., different convolution kernels). Similarly, each dashed arrow in fig. 4 indicates an upsampling (e.g., 2-fold upsampling) and convolution operation. Through this operation, the length and width of the feature map are each doubled. It should be understood that the convolution operations indicated by the dashed arrows in fig. 4 may be the same or different (e.g., different convolution kernels). Each dot-dash arrow in fig. 4 indicates a convolution operation without any upsampling or downsampling. That is, the length and width of the feature map remain unchanged through this operation. It should be understood that the convolution operations indicated by the dot-dash arrows in fig. 4 may be the same or different (e.g., different convolution kernels). The solid, dashed and dot-dash arrows in fig. 4 show all possible ways of fusing the feature maps of various scales obtained from the original face image into the output feature map 409 for predicting the information of the face keypoints, and the corresponding operations (e.g., convolution operations) applied to the feature maps of various scales in the fusion process.
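As a minimal illustration of the three arrow types, the sketch below tracks only how each one changes the length and width of a feature map. The `Arrow` enum and the function name are hypothetical; the convolution itself is not modeled.

```python
from enum import Enum


class Arrow(Enum):
    """Arrow styles of FIG. 4 (enum and member names are hypothetical)."""
    SOLID = "2-fold downsampling + convolution"
    DASHED = "2-fold upsampling + convolution"
    DOT_DASH = "convolution only"


def spatial_size_after(size: int, arrow: Arrow) -> int:
    """Return the feature-map length/width after following one arrow."""
    if arrow is Arrow.SOLID:
        return size // 2   # length and width are halved
    if arrow is Arrow.DASHED:
        return size * 2    # length and width are doubled
    return size            # dot-dash: length and width stay unchanged
```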

As can be seen from the model structure search space 400 shown in FIG. 4, to obtain the output feature map 409 there may be many combinations of which of the intermediate feature maps 402-408 and 410-416 are selected for fusion, in which topology the selected intermediate feature maps are connected, and which operations are performed on the respective feature maps during fusion. Each combination constitutes a candidate model structure in the model structure search space 400.
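To make the combinatorial nature of the search space concrete, the following toy sketch enumerates every assignment of "unused" or one operation to each connection. The three connections and the two operation names are hypothetical and far smaller than FIG. 4; the sketch also does not filter out assignments that fail to reach the output feature map.

```python
from itertools import product

# Hypothetical toy search space: three possible connections, two operations.
EDGES = [("A", "B"), ("A", "OUT"), ("B", "OUT")]
OPERATIONS = ["conv3x3", "conv5x5"]
UNUSED = None  # marks a connection that is left out of the candidate


def enumerate_candidates(edges, operations):
    """Yield one dict per combination, mapping each used edge to its operation."""
    choices = [UNUSED] + list(operations)
    for assignment in product(choices, repeat=len(edges)):
        yield {edge: op for edge, op in zip(edges, assignment) if op is not UNUSED}


candidates = list(enumerate_candidates(EDGES, OPERATIONS))
```

Even this toy space has (1 + 2)^3 = 27 combinations, which suggests why exhaustive or purely random search becomes impractical for a space the size of FIG. 4.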

Returning to fig. 3, at block 320, the model structure search means 110 searches the model structure search space for a model structure suitable for predicting information of a specific face keypoint based on a face image, based on the type of the specific face keypoint.

In some embodiments, the specific face keypoints may be sparse keypoints for delineating a face contour and/or facial features, as shown in fig. 2A. Alternatively, in other embodiments, the specific face keypoints may be dense keypoints for delineating face details, as shown in fig. 2B. Based on the type of the specific face keypoints, the model structure search means 110 may search the model structure search space 400 shown in fig. 4 for a model structure suitable for predicting information of the specific face keypoints based on a face image.

FIG. 5 illustrates a flow diagram of an example method 500 for searching a model structure in a model structure search space, in accordance with some embodiments of the present disclosure. Method 500 may be viewed, for example, as one example implementation of block 320 as shown in fig. 3. It is to be understood that method 500 may also include additional blocks not shown and/or may omit blocks shown. The scope of the present disclosure is not limited in this respect.

At block 510, the model structure search means 110 selects one of a plurality of candidate model structures included in the model structure search space as a seed model structure. In some embodiments, the seed model structure may be randomly selected.

At block 520, the model structure search means 110 determines a model transition probability for the seed model structure.

In some embodiments, the model transition probabilities may indicate the respective probabilities that the seed model structure is converted into each of the plurality of candidate model structures included in the model structure search space through one conversion. "Conversion" as used herein refers to an operation that alters the manner in which one or more feature maps involved in the seed model structure are fused and/or the operations applied to them. For example, assume that the model structure search space includes a first candidate model structure and a second candidate model structure that are different from the seed model structure. Initially, the model structure search means 110 may initialize the model transition probabilities such that the probability that the seed model structure is converted into the first candidate model structure after one conversion equals the probability that it is converted into the second candidate model structure.
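A minimal sketch of this uniform initialization follows, assuming candidate model structures are identified by hashable labels and that the seed itself is excluded from the conversion targets. Both are illustrative assumptions, as is the function name.

```python
def init_transition_probabilities(seed, candidates):
    """Give every non-seed candidate the same initial transition probability."""
    targets = [c for c in candidates if c != seed]
    if not targets:
        raise ValueError("the search space needs at least one non-seed candidate")
    uniform = 1.0 / len(targets)
    return {c: uniform for c in targets}


# With a first and a second candidate, each starts at probability 0.5.
probabilities = init_transition_probabilities("seed", ["seed", "first", "second"])
```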

Then, the model structure search means 110 may iteratively execute blocks 530 to 560 until an iteration termination condition is satisfied.

At block 530, the model structure search means 110 generates a set of candidate model structures based on the seed model structure and the model transition probabilities. For example, the generated set of candidate model structures is included among a plurality of candidate model structures included in the model structure search space.

In some embodiments, the model structure search means 110 may first determine the number of candidate model structures to be generated. Then, the model structure search means 110 may generate the set of candidate model structures by performing that number of conversions on the seed model structure. Since the probabilities that the seed model structure is converted into the different candidate model structures after one conversion are initially equal, the initially generated set of candidate model structures amounts to an arbitrary selection from the model structure search space.
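One way to read "performing that number of conversions" is to draw each conversion result from the transition probabilities. The sketch below does this with the standard library; drawing with replacement (so duplicates are possible), the function name, and the labels are assumptions.

```python
import random


def generate_candidate_set(transition_probabilities, count, rng):
    """Perform `count` conversions of the seed, each drawn from the probabilities."""
    names = list(transition_probabilities)
    weights = [transition_probabilities[name] for name in names]
    # random.Random.choices samples with replacement, proportional to `weights`.
    return rng.choices(names, weights=weights, k=count)


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
candidate_set = generate_candidate_set({"first": 0.5, "second": 0.5}, count=4, rng=rng)
```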

At block 540, the model structure search means 110 determines performance indicators for the set of candidate model structures. In some embodiments, for each candidate model structure in the set of candidate model structures, the candidate model structure may be utilized to train a model for predicting information for a particular face keypoint. The trained model may be utilized to perform face keypoint prediction tasks (e.g., locating and predicting the depth of particular face keypoints in a face image) to arrive at a prediction. By comparing the prediction result with the actual information of the specific face key point of the face image, the performance index (e.g., accuracy rate of prediction, etc.) of the model can be determined. The determined performance indicator may be considered as a performance indicator of the candidate model structure. In this way, the model structure search means 110 is able to determine a performance indicator for each candidate model structure in the set of candidate model structures.
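The source leaves the performance indicator open (it mentions prediction accuracy as one example). As one hedged possibility, the sketch below compares predicted and true keypoint coordinates using the mean Euclidean distance, where a lower value is better. The metric choice and the function name are assumptions.

```python
import math


def mean_keypoint_error(predicted, ground_truth):
    """Mean Euclidean distance between predicted and true keypoints.

    Works for 2D (x, y) or 3D (u, v, depth) coordinates; lower is better.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need equally sized, non-empty keypoint lists")
    distances = [math.dist(p, t) for p, t in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Two keypoints: one predicted exactly, one off by a 3-4-5 triangle.
error = mean_keypoint_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```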

At block 550, the model structure search means 110 determines whether an iteration termination condition is satisfied. In some embodiments, the iteration termination condition may include one of: the number of iterations reaches a threshold number; or the performance indicators of the candidate model structures selected in two successive iterations vary by less than a threshold (i.e., the search has converged).
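The two termination conditions can be sketched as a single check. The default thresholds and the use of the best performance indicator per iteration as the quantity compared are illustrative assumptions.

```python
def should_terminate(iteration, best_history, max_iterations=50, tolerance=1e-3):
    """Return True when either termination condition holds.

    `best_history` holds the best performance indicator of each iteration so far.
    """
    if iteration >= max_iterations:
        return True  # the number of iterations reached the threshold
    if len(best_history) >= 2:
        # converged: the indicator changed by less than the tolerance
        return abs(best_history[-1] - best_history[-2]) < tolerance
    return False
```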

If the iteration termination condition is not satisfied, the method 500 proceeds to block 560 where the model structure search means 110 updates the model transition probabilities based on the performance metrics of the set of candidate model structures.

In some embodiments, the model structure search means 110 may rank the set of candidate model structures according to their performance indicators. The model structure search means 110 may then update the model transition probability associated with each candidate model structure in the set based on the ranking result, such that the model transition probabilities associated with candidate model structures having better performance indicators exceed those associated with candidate model structures having worse performance indicators. For example, assume that the set of candidate model structures includes a third candidate model structure and a fourth candidate model structure, and that the performance indicator of the third candidate model structure is better than that of the fourth candidate model structure. In some embodiments, the model structure search means 110 may update the model transition probabilities such that the probability that the seed model structure is converted into the third candidate model structure through one conversion exceeds the probability that it is converted into the fourth candidate model structure.
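The source requires only that better-ranked candidates end up with higher transition probabilities; it does not give a formula. The sketch below uses one simple rule chosen for illustration: the probability mass held by the evaluated candidates is redistributed among them in proportion to a linear rank weight. The rule, the function name, and the convention that a higher score is better are all assumptions.

```python
def update_transition_probabilities(probabilities, scores):
    """Redistribute the evaluated candidates' probability mass by rank.

    `scores` maps each evaluated candidate to its performance indicator
    (higher is better here). Unevaluated candidates keep their probability.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    mass = sum(probabilities[c] for c in ranked)
    weights = {c: len(ranked) - rank for rank, c in enumerate(ranked)}
    total = sum(weights.values())
    updated = dict(probabilities)
    for c in ranked:
        updated[c] = mass * weights[c] / total
    return updated


before = {"third": 0.25, "fourth": 0.25, "fifth": 0.25, "sixth": 0.25}
after = update_transition_probabilities(before, {"third": 0.9, "fourth": 0.5})
```

Because the rule only moves mass between evaluated candidates, the probabilities still sum to 1 after the update.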

In response to the model transition probabilities being updated, the method 500 proceeds to block 530 for the next iteration. In this way, the performance indicators of the selected candidate model structures can get better and better until the iteration termination condition is met.

If it is determined at block 550 that the iteration termination condition is satisfied, the method 500 proceeds to block 570, where the model structure search means 110 selects the candidate model structure with the best performance metric from the set of candidate model structures as the final model structure (e.g., the model structure 102 shown in FIG. 1).
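Blocks 510 to 570 can be put together as the following end-to-end sketch. It is not the patented implementation: candidates are plain labels, `evaluate` stands in for training and testing a model (higher is better), the rank-based update rule is an assumption, and all default thresholds are hypothetical.

```python
import random


def search_model_structure(candidates, evaluate, set_size=4,
                           max_iterations=20, tolerance=1e-6, rng=None):
    """Return (best_candidate, best_score) found by the iterative search."""
    rng = rng or random.Random(0)
    seed = rng.choice(candidates)                        # block 510
    targets = [c for c in candidates if c != seed]
    probs = {c: 1.0 / len(targets) for c in targets}     # block 520 (uniform)
    best_history = []
    iteration = 0
    while True:
        iteration += 1
        names = list(probs)
        sampled = set(rng.choices(names,                 # block 530
                                  weights=[probs[n] for n in names],
                                  k=set_size))
        scores = {c: evaluate(c) for c in sampled}       # block 540
        ranked = sorted(scores, key=scores.get, reverse=True)
        best_history.append(scores[ranked[0]])
        converged = (len(best_history) >= 2 and
                     abs(best_history[-1] - best_history[-2]) < tolerance)
        if iteration >= max_iterations or converged:     # block 550
            return ranked[0], scores[ranked[0]]          # block 570
        # Block 560: shift the evaluated candidates' mass toward better ranks.
        mass = sum(probs[c] for c in ranked)
        weights = {c: len(ranked) - i for i, c in enumerate(ranked)}
        total = sum(weights.values())
        for c in ranked:
            probs[c] = mass * weights[c] / total


def toy_score(candidate):
    """Hypothetical stand-in for model training and evaluation."""
    return -abs(candidate - 7)


best, best_score = search_model_structure(list(range(10)), toy_score)
```

In a real setting `evaluate` would dominate the cost, since each call trains and tests a model, so caching scores of previously seen candidates would be a natural refinement.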

FIG. 6 illustrates an exemplary model structure 600 searched from the model structure search space 400 shown in FIG. 4, according to some embodiments of the present disclosure. For example, the example model structure 600 is obtained by the model structure search means 110 by performing the method 300 shown in FIG. 3.

As shown in fig. 6, the feature map 401 is converted into a feature map 402 by a 2-fold downsampling and convolution operation f4, the feature map 402 is converted into a feature map 403 by a 2-fold downsampling and convolution operation f5, the feature map 403 is converted into a feature map 404 by a 2-fold downsampling and convolution operation f6, and the feature map 404 is converted into a feature map 405 by a 2-fold downsampling and convolution operation f7. The feature map 404 and the feature map 405 are fused into a feature map 406, where a convolution operation f3 is performed on the feature map 404 and a 2-fold upsampling and convolution operation f11 is performed on the feature map 405 in the fusion process. The feature map 403 and the feature map 406 are fused into a feature map 407, where a convolution operation f2 is performed on the feature map 403 and a 2-fold upsampling and convolution operation f10 is performed on the feature map 406 in the fusion process. The feature map 402 and the feature map 407 are fused into a feature map 408, where a convolution operation f1 is performed on the feature map 402 and a 2-fold upsampling and convolution operation f9 is performed on the feature map 407 in the fusion process. Finally, the input feature map 401 and the feature map 408 are fused into an output feature map 409, where a convolution operation f0 is performed on the feature map 401 and a 2-fold upsampling and convolution operation f8 is performed on the feature map 408 in the fusion process.
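The structure just described can be traced as a sketch that tracks only the length/width of each feature map and checks that the inputs of every fusion agree in size. The list encoding and the function name are hypothetical; the input size of 64 is taken from the earlier example and is otherwise an assumption.

```python
# Each entry: (target feature map, [(source feature map, resampling), ...]).
# Operation labels f0..f11 follow the description of FIG. 6.
STRUCTURE_600 = [
    (402, [(401, "down")]),                 # f4
    (403, [(402, "down")]),                 # f5
    (404, [(403, "down")]),                 # f6
    (405, [(404, "down")]),                 # f7
    (406, [(404, "same"), (405, "up")]),    # f3, f11
    (407, [(403, "same"), (406, "up")]),    # f2, f10
    (408, [(402, "same"), (407, "up")]),    # f1, f9
    (409, [(401, "same"), (408, "up")]),    # f0, f8
]

RESAMPLE = {"down": lambda s: s // 2, "up": lambda s: s * 2, "same": lambda s: s}


def trace_sizes(structure, input_size):
    """Return the length/width of every feature map, validating each fusion."""
    sizes = {401: input_size}
    for target, sources in structure:
        resized = {RESAMPLE[mode](sizes[src]) for src, mode in sources}
        if len(resized) != 1:
            raise ValueError(f"inputs of feature map {target} differ in size")
        sizes[target] = resized.pop()
    return sizes


sizes = trace_sizes(STRUCTURE_600, input_size=64)
```

The trace shows an encoder-decoder shape: the size halves four times on the way down to feature map 405 and doubles four times on the way back up to the output feature map 409.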

As can be seen from the above description, embodiments of the present disclosure propose an automatic search scheme for model structure for a face keypoint prediction problem. The scheme can determine a model structure search space for predicting information of key points of the face based on the face image. The model structure search space comprises a plurality of candidate model structures, wherein each candidate model structure indicates a fusion mode of a plurality of feature maps obtained from a face image into an output feature map used for predicting the information of key points of the face, and corresponding operation applied to the plurality of feature maps in the fusion. Then, based on the type of the specific face keypoint, a model structure suitable for predicting information of the specific face keypoint is searched in the model structure search space. The embodiment of the disclosure can automatically search the optimal model structure suitable for the human face key point prediction task, thereby improving the efficiency and accuracy of the human face key point prediction.

FIG. 7 shows a schematic block diagram of an apparatus 700 for searching a model structure according to an embodiment of the present disclosure. The apparatus 700 may be included in the model structure search means 110 shown in fig. 1 or implemented as the model structure search means 110. As shown in fig. 7, the apparatus 700 may include a search space determination module 710 configured to determine a model structure search space capable of predicting information of face keypoints based on a face image, the model structure search space including a plurality of candidate model structures, wherein each candidate model structure indicates a fusion manner of a plurality of feature maps derived from the face image into an output feature map for predicting information of the face keypoints, and the corresponding operations applied to the plurality of feature maps in the fusion. The apparatus 700 may further include a model structure search module 720 configured to search, based on the type of specific face keypoints, the model structure search space for a model structure suitable for predicting information of the specific face keypoints based on a face image.

In some embodiments, the specific face keypoints comprise sparse keypoints for delineating a face contour and/or facial features.

In some embodiments, the specific face keypoints comprise dense keypoints for delineating face details.

In some embodiments, the information of the specific face keypoints comprises at least one of: a position of the specific face keypoint in the face image; and a depth corresponding to the specific face keypoint.

In some embodiments, the model structure search module further comprises: a seed selection unit configured to select one of the plurality of candidate model structures as a seed model structure; a probability determination unit configured to determine a model transition probability for the seed model structure, the model transition probability indicating a respective probability that the seed model structure is converted into each of the plurality of candidate model structures through one conversion; an iteration unit configured to iteratively perform the following operations until an iteration termination condition is satisfied: generating a set of candidate model structures based on the seed model structure and the model transition probability; determining a performance metric for the set of candidate model structures; and in response to the iteration termination condition not being satisfied, updating the model transition probability based on the performance metric of the set of candidate model structures; and a model structure determination unit configured to determine, in response to the iteration termination condition being satisfied, the candidate model structure having the best performance metric among the set of candidate model structures as the model structure.
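Purely as a non-limiting sketch, the iterative search described above may be expressed as follows. The function name, the parameter names, the multiplicative `reward`/`penalty` update, and the use of a fixed iteration count as the termination condition are all assumptions made for illustration; the present disclosure does not prescribe them. Here `weights[i]` stands for the unnormalised probability that the seed model structure is converted into `candidates[i]` through one conversion, and a higher score is assumed to be better.

```python
import random


def search_model_structure(candidates, evaluate, num_iterations=10,
                           num_samples=4, reward=1.5, penalty=0.75, rng_seed=0):
    """Return (structure, score) with the best score in the final sampled set."""
    if num_iterations < 1 or not candidates:
        raise ValueError("need at least one iteration and one candidate")
    rng = random.Random(rng_seed)
    weights = [1.0] * len(candidates)  # equal initial transition probabilities
    indices = range(len(candidates))
    for step in range(num_iterations):
        # Generate a set of candidates by sampling conversions of the seed.
        sampled = set(rng.choices(indices, weights=weights, k=num_samples))
        scores = {i: evaluate(candidates[i]) for i in sampled}
        if step == num_iterations - 1:
            break  # termination condition satisfied: no further update
        mean = sum(scores.values()) / len(scores)
        for i, score in scores.items():
            if score > mean:
                weights[i] *= reward   # better candidates become more likely
            elif score < mean:
                weights[i] *= penalty  # worse candidates become less likely
    best = max(scores, key=scores.get)
    return candidates[best], scores[best]


# Toy usage: structures are integers and the score peaks at 7.
toy_space = list(range(10))
structure, score = search_model_structure(toy_space, evaluate=lambda s: -abs(s - 7))
```

In a real setting `evaluate` would train and test each candidate, which dominates the cost of the search, so caching scores for structures that are sampled more than once would be a natural refinement.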

In some embodiments, the iteration termination condition comprises one of: the number of iterations reaches a threshold number; or the variation in the performance metric of the set of candidate model structures is below a threshold.
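As one possible reading of these two conditions, a termination check might look like the sketch below. The function `should_stop`, its parameters, and the choice to measure variation as the difference between the two most recent per-iteration metrics are assumptions for illustration only.

```python
def should_stop(iteration, history, max_iterations=50, min_delta=1e-3):
    """Decide whether to end the search.

    iteration: number of completed iterations.
    history: one performance metric per completed iteration.
    """
    # Condition 1: the number of iterations reaches the threshold number.
    if iteration >= max_iterations:
        return True
    # Condition 2: the metric changed by less than the threshold.
    if len(history) >= 2 and abs(history[-1] - history[-2]) < min_delta:
        return True
    return False
```

Either condition alone ends the search in this sketch; an implementation may equally use only one of them.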

In some embodiments, the plurality of candidate model structures includes a first candidate model structure and a second candidate model structure, and the probability determination unit is further configured to: initialize the model transition probability such that a probability that the seed model structure is converted into the first candidate model structure through one conversion is equal to a probability that the seed model structure is converted into the second candidate model structure.

In some embodiments, the iteration unit is further configured to: determine the number of candidate model structures to be generated; and generate the set of candidate model structures by performing that number of conversions on the seed model structure.

In some embodiments, the set of candidate model structures comprises a third candidate model structure and a fourth candidate model structure, the performance metric of the third candidate model structure being better than the performance metric of the fourth candidate model structure, and the iteration unit is further configured to: update the model transition probability such that a probability that the seed model structure is converted into the third candidate model structure through one conversion exceeds a probability that the seed model structure is converted into the fourth candidate model structure.

In some embodiments, the iteration unit is further configured to, for each candidate model structure in the set of candidate model structures: train the candidate model structure to obtain a model for predicting information of the specific face keypoint based on a face image; predict the information of the specific face keypoint based on a face image by using the model; and determine the performance metric of the candidate model structure based on a prediction result of the model.
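The present disclosure does not fix a particular performance metric in this passage. As an assumed example only, the prediction result could be scored with a normalized mean error, a metric commonly used for face keypoint localization, in which the mean distance between predicted and reference keypoints is divided by a normalizing distance such as the inter-ocular distance. The function name and arguments below are hypothetical; a lower value indicates a better candidate.

```python
import math


def normalized_mean_error(predicted, target, norm_distance):
    """Mean Euclidean keypoint error divided by a normalizing distance."""
    if not predicted or len(predicted) != len(target):
        raise ValueError("keypoint lists must be non-empty and of equal length")
    if norm_distance <= 0:
        raise ValueError("norm_distance must be positive")
    # math.dist accepts points of any equal dimension, so (x, y) and
    # (x, y, depth) keypoints are both supported.
    total = sum(math.dist(p, t) for p, t in zip(predicted, target))
    return total / (len(predicted) * norm_distance)


error = normalized_mean_error(
    predicted=[(0.0, 0.0), (3.0, 4.0)],
    target=[(0.0, 0.0), (0.0, 0.0)],
    norm_distance=5.0,
)
print(error)  # 0.5
```

Because this metric is an error, a search loop that treats higher scores as better would use its negative as the score.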

FIG. 8 illustrates a schematic block diagram of an example device 800 that may be used to implement embodiments of the present disclosure. The device 800 may be used to implement the model structure search apparatus 110, the model training apparatus 120 and/or the model application apparatus 130 shown in FIG. 1. As shown, the device 800 includes a Central Processing Unit (CPU) 801 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 802 or loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 may also store various programs and data required for the operation of the device 800. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The CPU 801 performs the various methods and processes described above, such as processes 300 and/or 500. For example, in some embodiments, processes 300 and/or 500 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When loaded into the RAM 803 and executed by the CPU 801, the computer program may perform one or more of the steps of processes 300 and/or 500 described above. Alternatively, in other embodiments, the CPU 801 may be configured to perform processes 300 and/or 500 in any other suitable manner (e.g., by way of firmware).

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A method for searching a model structure, comprising:

determining a model structure search space capable of predicting information of face key points based on a face image, the model structure search space comprising a plurality of candidate model structures, wherein each candidate model structure indicates a fusion manner of a plurality of feature maps derived from the face image into an output feature map used for predicting information of the face key points, and a corresponding operation applied to the plurality of feature maps in the fusion; and

searching, based on a type of a specific face keypoint, a model structure in the model structure search space suitable for predicting information of the specific face keypoint based on a face image, comprising:

selecting one of the plurality of candidate model structures as a seed model structure;

determining a model transition probability for the seed model structure, the model transition probability indicating a respective probability that the seed model structure is converted into each of the plurality of candidate model structures through one conversion;

iteratively performing the following until an iteration termination condition is satisfied:

generating a set of candidate model structures based on the seed model structure and the model transition probabilities;

determining a performance metric for the set of candidate model structures; and

in response to the iteration termination condition not being satisfied, updating the model transition probability based on the performance metric of the set of candidate model structures; and

determining a candidate model structure with a best performance metric among the set of candidate model structures as the model structure in response to the iteration termination condition being satisfied.

2. The method of claim 1, wherein the specific face keypoints comprise sparse keypoints for delineating face contours and/or facial features.

3. The method of claim 1, wherein the specific face keypoints comprise dense keypoints for delineating face details.

4. The method of claim 1, wherein the information of the specific face keypoints comprises at least one of:

the position of the specific face key point in the face image; and

the depth corresponding to the specific face keypoint.

5. The method of claim 1, wherein the iteration termination condition comprises one of:

the number of iterations reaches a threshold number; or

the variation in the performance metric of the set of candidate model structures is below a threshold.

6. The method of claim 1, wherein the plurality of candidate model structures includes a first candidate model structure and a second candidate model structure, and determining the model transition probability includes:

initializing the model transition probability such that a probability that the seed model structure is converted into the first candidate model structure through one conversion is equal to a probability that the seed model structure is converted into the second candidate model structure.

7. The method of claim 1, wherein generating the set of candidate model structures comprises:

determining the number of candidate model structures to be generated; and

generating the set of candidate model structures by performing that number of conversions on the seed model structure.

8. The method of claim 1, wherein the set of candidate model structures includes a third candidate model structure and a fourth candidate model structure, the performance metric of the third candidate model structure being better than the performance metric of the fourth candidate model structure, and updating the model transition probability comprises:

updating the model transition probability such that a probability that the seed model structure is converted into the third candidate model structure through one conversion exceeds a probability that the seed model structure is converted into the fourth candidate model structure.

9. The method of claim 1, wherein determining performance indicators for the set of candidate model structures comprises:

for each candidate model structure of the set of candidate model structures,

training the candidate model structure to derive a model for predicting information of the specific face keypoints based on a face image;

predicting information of the specific face key point based on a face image by using the model; and

determining the performance metric of the candidate model structure based on a prediction result of the model.

10. An apparatus for searching a model structure, comprising:

a search space determination module configured to determine a model structure search space capable of predicting information of face key points based on a face image, the model structure search space including a plurality of candidate model structures, wherein each candidate model structure indicates a fusion manner of a plurality of feature maps derived from the face image into an output feature map used for predicting information of face key points, and a corresponding operation applied to the plurality of feature maps in the fusion; and

a model structure search module configured to search the model structure search space for a model structure suitable for predicting information of a specific face keypoint based on a face image based on a type of the specific face keypoint, the model structure search module further comprising:

a seed selection unit configured to select one of the plurality of candidate model structures as a seed model structure;

a probability determination unit configured to determine a model transition probability for the seed model structure, the model transition probability indicating a respective probability that the seed model structure is converted into each of the plurality of candidate model structures through one conversion;

an iteration unit configured to iteratively perform the following operations until an iteration termination condition is satisfied:

generating a set of candidate model structures based on the seed model structure and the model transition probability;

determining a performance metric for the set of candidate model structures; and

in response to the iteration termination condition not being satisfied, updating the model transition probability based on the performance metric of the set of candidate model structures; and

a model structure determination unit configured to determine, as the model structure, a candidate model structure with a best performance indicator among the set of candidate model structures in response to the iteration termination condition being satisfied.

11. The apparatus of claim 10, wherein the specific face keypoints comprise sparse keypoints for delineating face contours and/or facial features.

12. The apparatus of claim 10, wherein the specific face keypoints comprise dense keypoints for delineating face details.

13. The apparatus of claim 10, wherein the information of the specific face keypoints comprises at least one of:

the position of the specific face key point in the face image; and

the depth corresponding to the specific face keypoint.

14. The apparatus of claim 10, wherein the iteration termination condition comprises one of:

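the number of iterations reaches a threshold number; or

the variation in the performance metric of the set of candidate model structures is below a threshold.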

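15. The apparatus of claim 10, wherein the plurality of candidate model structures comprises a first candidate model structure and a second candidate model structure, and the probability determination unit is further configured to:

initialize the model transition probability such that a probability that the seed model structure is converted into the first candidate model structure through one conversion is equal to a probability that the seed model structure is converted into the second candidate model structure.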

16. The apparatus of claim 10, wherein the iteration unit is further configured to:

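determine the number of candidate model structures to be generated; and

generate the set of candidate model structures by performing that number of conversions on the seed model structure.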

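17. The apparatus of claim 10, wherein the set of candidate model structures comprises a third candidate model structure and a fourth candidate model structure, the performance metric of the third candidate model structure being better than the performance metric of the fourth candidate model structure, and wherein the iteration unit is further configured to:

update the model transition probability such that a probability that the seed model structure is converted into the third candidate model structure through one conversion exceeds a probability that the seed model structure is converted into the fourth candidate model structure.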

18. The apparatus of claim 10, wherein the iteration unit is further configured to:

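for each candidate model structure of the set of candidate model structures,

train the candidate model structure to obtain a model for predicting information of the specific face keypoint based on a face image;

predict the information of the specific face keypoint based on a face image by using the model; and

determine the performance metric of the candidate model structure based on a prediction result of the model.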

19. A computing device, comprising:

one or more processors; and

memory storing one or more programs that, when executed by the one or more processors, cause the computing device to implement the method of any of claims 1-9.

20. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.