
CN113033263B - Face image age characteristic recognition method - Google Patents

Face image age characteristic recognition method

Info

Publication number
CN113033263B
CN113033263B (application CN201911354494.2A)
Authority
CN
China
Prior art keywords
face image
neural network
age
deep neural
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911354494.2A
Other languages
Chinese (zh)
Other versions
CN113033263A (en)
Inventor
黄映婷
郑文先
黎永冬
肖婷
张阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Yuntian Lifei Technology Co ltd
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Chengdu Yuntian Lifei Technology Co ltd
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Yuntian Lifei Technology Co ltd, Shenzhen Intellifusion Technologies Co Ltd filed Critical Chengdu Yuntian Lifei Technology Co ltd
Priority to CN201911354494.2A priority Critical patent/CN113033263B/en
Publication of CN113033263A publication Critical patent/CN113033263A/en
Application granted granted Critical
Publication of CN113033263B publication Critical patent/CN113033263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a face image age characteristic recognition method, which comprises the following steps: acquiring a face image; and performing age recognition on the face image according to a target model to obtain the age characteristic of the face image. The target model comprises a first-level deep neural network, a second-level deep neural network and a third-level deep neural network: the first-level deep neural network performs gender characteristic extraction on the face image to obtain the gender characteristic of the face image; the second-level deep neural network performs age-group characteristic extraction on the face image to obtain the age-group characteristic of the face image; and the third-level deep neural network performs age characteristic extraction on the face image to obtain the age characteristic of the face image. Because faces of the same age differ across genders, the application helps improve the accuracy of age recognition for different genders in the face recognition process.

Description

Face image age characteristic recognition method
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a face image age characteristic recognition method.
Background
Age estimation based on face images means using computer technology to model how a face changes with age, so that a machine can estimate a person's approximate age or age range from a face image. If the problem of face-based age estimation is solved, various human-computer interaction systems based on age information will find broad application in daily life.
To judge the age in a face image, existing age recognition systems recognize age through a two-stage cascaded neural network. However, because gender influences age recognition, a two-stage neural network can produce large age-recognition errors.
Disclosure of Invention
To overcome the defects of the prior art, the application aims to provide a face image age characteristic recognition method that, given that faces of the same age differ across genders, helps improve the accuracy of age recognition for different genders in the face recognition process.
In a first aspect, the present application provides a method for identifying age characteristics of a face image, including:
acquiring a face image;
Performing age identification on the face image according to the target model to obtain age characteristics of the face image; the target model comprises a first-level deep neural network, a second-level deep neural network and a third-level deep neural network; the first-stage deep neural network performs sex characteristic extraction on the face image to obtain the sex characteristic of the face image; the second-stage deep neural network performs age group feature extraction on the face image to obtain the age group feature of the face image; and the third-stage deep neural network performs the age characteristic extraction on the face image to obtain the age characteristic of the face image.
In one possible implementation, the method further includes:
And carrying out image preprocessing on the face image, wherein the image preprocessing comprises image righting and/or image enhancement and/or image normalization.
In one possible implementation manner, before the age identification is performed on the face image according to the target model to obtain the age characteristic of the face image, the method further includes:
Acquiring training data, wherein the training data comprises one or more sample face images marked with age characteristics and gender characteristics;
and training the target model through a deep learning algorithm according to the sample face image.
In one possible implementation manner, the training the target model through a deep learning algorithm according to the sample face image specifically includes:
Training the first-stage deep neural network through the deep learning algorithm according to the sample face image and the gender characteristic marked by the sample face image;
Training the second-stage deep neural network through the deep learning algorithm according to the sample face image and the age group characteristics of the age characteristics marked by the sample face image;
And training the third-stage deep neural network through a deep learning algorithm according to the sample face image and the age characteristic marked by the sample face image.
In one possible implementation, before the acquiring the face image, the method further includes:
Displaying a first application interface;
receiving a first input operation of a user aiming at a first application interface;
the step of acquiring the face image specifically comprises the following steps:
and responding to the first input operation, and acquiring a face image of the user.
After the age characteristic of the face image is obtained, the method further comprises:
Determining commodity information corresponding to the age characteristics of the face image from a commodity database according to the age characteristics; the commodity database comprises commodity information corresponding to a plurality of age characteristics.
In a second aspect, the present application provides a facial image age characteristic recognition apparatus, including:
the first acquisition unit is used for acquiring the face image;
The identification unit is used for carrying out age identification on the face image according to the target model to obtain age characteristics of the face image; the target model comprises a first-level deep neural network, a second-level deep neural network and a third-level deep neural network; the first-stage deep neural network performs sex characteristic extraction on the face image to obtain the sex characteristic of the face image; the second-stage deep neural network performs age group feature extraction on the face image to obtain the age group feature of the face image; and the third-stage deep neural network performs the age characteristic extraction on the face image to obtain the age characteristic of the face image.
In one possible implementation, the apparatus further includes:
the preprocessing unit is used for carrying out image preprocessing on the face image, wherein the image preprocessing comprises image righting and/or image enhancement and/or image normalization.
In one possible implementation, the apparatus further includes:
The second acquisition unit is used for acquiring training data, wherein the training data comprises one or more sample face images marked with age characteristics and gender characteristics;
And the training unit is used for training the target model through a deep learning algorithm according to the sample face image.
In a third aspect, the present application provides an age characteristic recognition apparatus, comprising: one or more processors, one or more memories, a transceiver; the one or more memories are coupled to the one or more processors, the one or more memories are configured to store computer program code, the computer program code comprising computer instructions that, when executed by the one or more processors, perform a face image age identification method in any of the possible implementations of the above.
In a fourth aspect, the present application provides a computer storage medium comprising computer instructions which, when executed, perform a method of face image age identification in any one of the possible implementations of the above.
The invention provides a face image age characteristic recognition method that adopts a three-level cascaded deep neural network to recognize the age characteristic of a face image, thereby helping to improve the accuracy of age recognition for different genders in the face image recognition process.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Wherein:
fig. 1 is a schematic structural diagram of a Convolutional Neural Network (CNN) provided by the present application;
FIG. 2 is a schematic diagram of a Convolutional Neural Network (CNN) with multiple convolutional layers/pooling layers in parallel, provided by the present application;
FIG. 3 is a schematic diagram of three-dimensional convolution kernel dimension reduction provided by an embodiment of the present application;
FIG. 4 is a system architecture diagram provided by an embodiment of the present application;
FIG. 5 is a flowchart of a method for training a target model according to an embodiment of the present application;
FIG. 6 is a flowchart of a face image age feature recognition method according to an embodiment of the present application;
FIG. 7 is a system diagram provided by an embodiment of the present application;
FIGS. 8-9 are user interface diagrams of a wearing APP provided by an embodiment of the present application;
FIG. 10 is a training device for a target model according to an embodiment of the present application;
fig. 11 is a schematic diagram of an implementation apparatus of a target model according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The invention provides a face image age characteristic recognition method, which comprises the steps of: obtaining a face image; and performing age recognition on the face image according to a target model to obtain the age characteristic of the face image. The target model comprises a first-level deep neural network, a second-level deep neural network and a third-level deep neural network. The first-level deep neural network performs gender characteristic extraction on the face image to obtain the gender characteristic of the face image; the second-level deep neural network performs age-group characteristic extraction on the face image to obtain the age-group characteristic of the face image; and the third-level deep neural network performs age characteristic extraction on the face image to obtain the age characteristic of the face image.
Because faces of the same age differ across genders, the application helps improve the accuracy of age recognition for different genders in the face recognition process.
Because the embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, related terms and related concepts of neural networks related to the embodiments of the present application will be described below.
(1) Deep neural network.
Deep neural networks (Deep Neural Network, DNN), also known as multi-layer neural networks, can be understood as neural networks with many hidden layers; there is no particular metric for "many" here. Dividing a DNN by the position of its layers, the layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer. Typically the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN looks complex, the work of each layer is not complex; it is simply the following linear relational expression: $\vec{y} = W\vec{x} + \vec{b}$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset (bias) vector, and $W$ is the weight matrix (also called the coefficients). Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since the DNN has many layers, the number of coefficient matrices $W$ and offset vectors $\vec{b}$ is also large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$. It should be noted that the input layer has no $W$ parameters. In deep neural networks, more hidden layers make the network better able to characterize complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", meaning that it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; the final objective is to obtain the weight matrix of every layer of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
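The patent does not provide source code; as an illustrative sketch only, the per-layer operation described above can be written in a few lines of NumPy. The layer sizes and the sigmoid activation below are assumptions made for the example, not values taken from the patent.

```python
import numpy as np

def dense_layer(x, W, b):
    """One fully connected layer: y = W x + b, followed by an activation
    (sigmoid is assumed here for illustration)."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

# Hypothetical 3-layer DNN: 8 inputs -> 5 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 8)), np.zeros(5)   # the input layer itself has no W
W2, b2 = rng.standard_normal((2, 5)), np.zeros(2)

x = rng.standard_normal(8)          # input vector
h = dense_layer(x, W1, b1)          # hidden layer output
y = dense_layer(h, W2, b2)          # output vector
print(y.shape)                      # (2,)
```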
(2) A convolutional neural network.
A convolutional neural network (Convolutional Neural Network, CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers. The feature extractor can be seen as a filter, and the convolution process can be seen as convolving a trainable filter with an input image or a convolutional feature plane (feature map). A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer, one neuron may be connected to only part of the neurons in the adjacent layer. A convolutional layer typically contains several feature planes, and each feature plane may be composed of neurons arranged in a rectangular pattern. Neurons of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of the image are the same as those of other parts, which means that image information learned in one part can also be used in another part; the same learned image information can be used for all locations on the image. In the same convolutional layer, multiple convolution kernels may be used to extract different image information; in general, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix with random size, and reasonable weight can be obtained through learning in the training process of the convolution neural network. In addition, the direct benefit of sharing weights is to reduce the connections between layers of the convolutional neural network, while reducing the risk of overfitting.
As shown in fig. 1, convolutional Neural Network (CNN) 100 may include an input layer 110, a convolutional layer/pooling layer 120, where the pooling layer is optional, and a neural network layer 130. This is described in detail below:
Convolution layer/pooling layer 120:
Convolution layer:
The convolutional/pooling layer 120 as shown in fig. 1 may include layers as examples 121-126, in one implementation, 121 being a convolutional layer, 122 being a pooling layer, 123 being a convolutional layer, 124 being a pooling layer, 125 being a convolutional layer, 126 being a pooling layer; in another implementation, 121, 122 are convolutional layers, 123 are pooling layers, 124, 125 are convolutional layers, and 126 are pooling layers. I.e. the output of the convolution layer may be used as input to a subsequent pooling layer or as input to another convolution layer to continue the convolution operation.
Taking the convolutional layer 121 as an example, the convolutional layer 121 may include many convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on the image, the weight matrix is usually moved over the input image in the horizontal direction one pixel at a time (or two pixels at a time, depending on the value of the stride) to complete the task of extracting specific features from the image. The size of the weight matrix is related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolution output of a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same dimensions are applied, and the outputs of the weight matrices are stacked to form the depth dimension of the convolved image. Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract image edge information, another to extract a specific color of the image, and yet another to blur unwanted noise in the image. The multiple weight matrices have the same dimensions, the feature maps extracted by weight matrices of the same dimensions also have the same dimensions, and the extracted feature maps of the same dimensions are combined to form the output of the convolution operation.
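For illustration only, the sliding of a single weight matrix (convolution kernel) over an image with a given stride can be sketched as follows; the single-channel 2D case, the toy image and the kernel values are simplifying assumptions for the example.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide one weight matrix over a 2D image, `stride` pixels at a time,
    producing one feature map (no padding)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # weighted sum over the patch
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
edge_kernel = np.array([[1., 0., -1.],             # a kernel that responds
                        [1., 0., -1.],             # to vertical edges
                        [1., 0., -1.]])
print(conv2d_single(image, edge_kernel, stride=2).shape)  # (2, 2)
```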
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training; each weight matrix formed by the trained weight values can extract information from the input image, thereby helping the Convolutional Neural Network (CNN) 100 make correct predictions.
When Convolutional Neural Network (CNN) 100 has multiple convolutional layers, the initial convolutional layer (e.g., 121) tends to extract more general features, which may also be referred to as low-level features; as the depth of Convolutional Neural Network (CNN) 100 increases, features extracted by the later convolutional layers (e.g., 126) become more complex, such as features of high level semantics, which are more suitable for the problem to be solved.
Pooling layer:
Since it is often desirable to reduce the number of training parameters, the convolutional layers often require periodic introduction of pooling layers, i.e., layers 121-126 as exemplified by convolutional/pooling layer 120 in fig. 1, which may be one convolutional layer followed by a pooling layer, or a multi-layer convolutional layer followed by one or more pooling layers. The only purpose of the pooling layer during image processing is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image. The averaging pooling operator may calculate pixel values in the image over a particular range to produce an average value. The max pooling operator may take the pixel with the largest value in a particular range as the result of max pooling. In addition, just as the size of the weighting matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after the processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel point in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
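As an illustrative sketch (window size, stride and the toy feature map are assumptions), max or average pooling over a feature map can be written as:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce the spatial size of a feature map by max or average pooling."""
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, mode="max"))   # 2x2 output; each value is the max of a 2x2 window
```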
Neural network layer 130:
After processing by the convolutional layer/pooling layer 120, the Convolutional Neural Network (CNN) 100 is not yet sufficient to output the required output information. Because, as previously described, the convolution/pooling layer 120 will only extract features and reduce the parameters imposed by the input image. However, in order to generate the final output information (the required class information or other relevant information), convolutional Neural Network (CNN) 100 needs to utilize neural network layer 130 to generate the output of one or a set of the required number of classes. Thus, multiple hidden layers (such as hidden layer 131, hidden layer 132 through hidden layer 13n shown in fig. 1) and output layer 140 may be included in neural network layer 130, where parameters included in the multiple hidden layers may be pre-trained according to relevant training data for a specific task type.
After the hidden layers in the neural network layer 130 comes the final layer of the entire Convolutional Neural Network (CNN) 100, the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the whole Convolutional Neural Network (CNN) 100 is completed, back propagation begins to update the weights and biases of the aforementioned layers, so as to reduce the loss of the Convolutional Neural Network (CNN) 100 and the error between the result output by the output layer and the ideal result.
It should be noted that, the Convolutional Neural Network (CNN) 100 shown in fig. 1 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models, for example, a plurality of convolutional layers/pooling layers shown in fig. 2 are parallel, and the features extracted respectively are all input to the neural network layer 130 for processing.
Specifically, referring to fig. 3, a schematic diagram of three-dimensional convolution kernel dimension reduction provided by an embodiment of the present application is shown. As mentioned above, a convolutional neural network typically has multiple convolution kernels, and these kernels tend to be three-dimensional, containing data in three dimensions: the x and y directions are the length and width of the data, and the z direction is the depth of the data. In practice, a three-dimensional convolution kernel may be converted to a two-dimensional convolution kernel by general matrix-matrix multiplication (General Matrix-Matrix Multiplication, GEMM).
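A common way to realize this conversion is the im2col trick: patches of the multi-channel input are unrolled into columns so that the 3D convolution becomes a single matrix-matrix multiplication (GEMM). The sketch below is an illustration with assumed shapes (3 channels, 5x5 input, 3x3 kernels, stride 1, no padding), not the patent's implementation.

```python
import numpy as np

def im2col(x, kh, kw):
    """x: (C, H, W) input. Returns a matrix whose columns are the unrolled
    C*kh*kw patches, one column per output position (stride 1, no padding)."""
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.zeros((c * kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i + kh, j:j + kw].ravel()
    return cols

c, h, w, kh, kw, n_kernels = 3, 5, 5, 3, 3, 4
x = np.random.rand(c, h, w)
kernels = np.random.rand(n_kernels, c, kh, kw)

# Each 3D kernel becomes one row of a 2D matrix; the convolution is then a GEMM.
kernel_matrix = kernels.reshape(n_kernels, -1)           # (4, 27)
out = kernel_matrix @ im2col(x, kh, kw)                   # (4, 9)
out = out.reshape(n_kernels, h - kh + 1, w - kw + 1)      # (4, 3, 3) feature maps
print(out.shape)
```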
(3) Back propagation algorithm.
During training, the Convolutional Neural Network (CNN) 100 may correct the values of the parameters in the initial model by using the Back Propagation (BP) algorithm, so that the error loss of the model becomes smaller and smaller. Specifically, the input signal is passed forward until the output produces an error loss, and the parameters of the initial model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation motion dominated by the error loss, and aims to obtain the parameters of the optimal model, such as the weight matrices.
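As a minimal illustration of this idea (forward pass, error loss, then gradient-based parameter update), here is a single-layer gradient-descent loop with a squared-error loss; all shapes, the target values and the learning rate are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((2, 4))    # parameters to be corrected
b = np.zeros(2)
x = rng.standard_normal(4)         # input signal
target = np.array([1.0, 0.0])      # ideal result
lr = 0.1                           # learning rate (assumed)

for step in range(100):
    y = W @ x + b                          # forward propagation
    loss = 0.5 * np.sum((y - target) ** 2) # error loss
    grad_y = y - target                    # back-propagate the error loss
    W -= lr * np.outer(grad_y, x)          # update the weights ...
    b -= lr * grad_y                       # ... and biases so the loss converges

print(round(loss, 6))   # the loss shrinks toward 0 as the parameters are updated
```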
The system architecture 400 provided by embodiments of the present application is described below. Referring to fig. 4, in the system architecture 400, a data acquisition device 460 is configured to acquire training data; in an embodiment of the present application, the training data includes face images labeled with attribute features. The training data is stored in the database 430, and the training device 420 trains the target model 401 based on the training data maintained in the database 430. The target model 401 can be used to implement the face image age characteristic recognition method provided by the embodiment of the present application: a face image is preprocessed and then input into the target model 401, and the target model 401 can recognize the attribute features of the face image.
The target model 401 obtained by training according to the training device 420 may be applied to different systems or devices, such as the execution device 410 shown in fig. 4, where the execution device 410 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an AR/VR, a vehicle-mounted terminal, etc., and may also be a server or cloud terminal, etc. In fig. 4, the execution device 410 is configured with an I/O interface 412 for data interaction with external devices, and a user may input data to the I/O interface 412 through the client device 440, where the input data may include, in an embodiment of the present application: the face image collected by the client device 440.
The preprocessing module 413 is configured to perform preprocessing according to input data (e.g., face images collected by the client device 440) received by the I/O interface 412.
In the process of performing computation or the like by the computing module 411, the execution device 410 may call data, codes or the like in the data storage system 450 for corresponding processing, or may store the data, instructions or the like obtained by the corresponding processing in the data storage system 450.
Finally, the I/O interface 412 returns the processing results, such as the attribute features of the face image identified as described above, to the client device 440 for provision to the user.
It should be noted that fig. 4 is only a system architecture diagram provided by an embodiment of the present application, and the positional relationship between the devices, apparatuses, modules, etc. shown in the figure does not constitute any limitation.
The method for training the object model according to the embodiment of the application is described below.
FIG. 5 is a flowchart of a method for training a target model according to an embodiment of the present application.
The method of training the target model may be performed by the training device 420 shown in fig. 4. The training method comprises the following steps:
S501, establishing a target model.
In one possible implementation, the target model may be a three-level cascade of deep neural networks consisting of multiple independent convolutional neural networks, a first-level deep neural network, a second-level deep neural network, and a third-level deep neural network, respectively. The neural network of each stage includes an N-layer convolutional neural network, each of which may include a convolutional layer, a downsampling layer, and a fully-connected layer.
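Purely as an illustrative sketch (not the patent's actual network), a three-level cascade in which each level is an independent convolutional network with convolutional, downsampling (pooling) and fully connected layers could look roughly as follows in PyTorch. The layer sizes, the 64x64 input resolution, the seven age groups, and the way the previous level's output is fed in as a conditioning vector are all assumptions for the example.

```python
import torch
import torch.nn as nn

class LevelNet(nn.Module):
    """One cascade level: convolutional layers, downsampling, fully connected head.
    `cond_dim` is the size of an optional conditioning vector (e.g. a gender or
    age-group code coming from the previous level)."""
    def __init__(self, n_outputs, cond_dim=0):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 * 16 * 16 + cond_dim, 128), nn.ReLU(),  # assumes 64x64 input
            nn.Linear(128, n_outputs),
        )

    def forward(self, x, cond=None):
        feats = self.features(x)
        if cond is not None:
            feats = torch.cat([feats, cond], dim=1)
        return self.head(feats)

# Level 1: gender (2 classes); level 2: age group (7 classes), conditioned on gender;
# level 3: age (regression), conditioned on the age group.
level1 = LevelNet(n_outputs=2)
level2 = LevelNet(n_outputs=7, cond_dim=2)
level3 = LevelNet(n_outputs=1, cond_dim=7)

img = torch.randn(1, 3, 64, 64)
gender = level1(img).softmax(dim=1)
age_group = level2(img, gender).softmax(dim=1)
age = level3(img, age_group)
print(gender.shape, age_group.shape, age.shape)   # (1, 2) (1, 7) (1, 1)
```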
S502, acquiring training data A.
The data acquisition device acquires training data A and stores it in a database, and the acquired face images are placed into a labeling system to be labeled according to attribute features. The labeled face images are denoted as A = {a1, a2, a3 … ai … an}, where ai is the i-th acquired face image and there are n face images in total.
The attribute features of the face image may be one or more of the following: gender, age, race, expression, etc.; in the embodiment of the application, the attribute characteristics of the face image are gender and age, and the labeling system labels the acquired face image according to the gender and age.
In one possible implementation, the training data may be the IMDB-WIKI dataset. The IMDB-WIKI dataset has over one million face images labeled with gender and age, and is suitable for large-scale network training. In other embodiments, other image libraries may be used for training.
In the training process, a hierarchical training method is adopted, so that training time is saved, and accuracy of a target model is improved.
S503, training the first-stage deep neural network in the target model by using the training data A.
The first-level deep neural network takes a face image ai as input and outputs the gender characteristic of the face image ai. The first-level deep neural network then calculates the loss error of the gender characteristic it outputs for the face image ai, where the loss error reflects the probability that the gender characteristic output for ai is wrong. For example, if the labeled gender of face image a1 is male and the first-level deep neural network outputs female for a1, the first-level deep neural network has recognized incorrectly. The n face images in training data A are input to the first-level deep neural network respectively to calculate the loss error. If the loss error is higher than a preset value, the loss error is back-propagated and the parameters of the first-level deep neural network are updated, until the loss error of the first-level deep neural network is lower than the preset value, at which point the training of the first-level deep neural network is finished.
With the trained first-level deep neural network, the gender characteristic of an input face image can be identified.
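A minimal training-loop sketch for the first-level (gender) network is given below, assuming a `LevelNet`-style network as sketched after step S501 and a PyTorch DataLoader yielding (image, gender label) batches. The loss function, optimizer, learning rate and the "preset value" threshold are all assumptions, not values given by the patent.

```python
import torch
import torch.nn as nn

def train_level1(level1, loader, preset_value=0.1, max_epochs=50, lr=1e-3):
    """Train the first-level network on (image, gender_label) batches until the
    loss error falls below a preset value."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(level1.parameters(), lr=lr)
    for epoch in range(max_epochs):
        total, n = 0.0, 0
        for images, gender_labels in loader:
            logits = level1(images)                   # forward pass
            loss = criterion(logits, gender_labels)   # loss error of the gender output
            optimizer.zero_grad()
            loss.backward()                           # back-propagate the loss error
            optimizer.step()                          # update the parameters
            total, n = total + loss.item(), n + 1
        if total / n < preset_value:                  # training ends below the preset value
            break
    return level1
```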
S504, dividing the face image in the training data A into training data B with female sex and training data C with male sex according to the sex, and respectively training a second-level deep neural network in the target model by using the training data B and the training data C.
First, the face images ai labeled female in training data A are collected into a data set called training data B, and the face images ai labeled male in training data A are collected into a data set called training data C.
Before training the second-level deep neural network, ages are divided into seven age groups: 0-6, 7-12, 12-18, 19-24, 25-40, 41-60, and above 60 years old.
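For illustration, mapping a labeled age to one of the seven age groups listed above (using the group boundaries exactly as stated, with 12 falling in the earlier group) could be written as:

```python
def age_to_group(age):
    """Return the index of the age group for a labeled age, following the seven
    groups listed above (0-6, 7-12, 12-18, 19-24, 25-40, 41-60, above 60)."""
    bounds = [6, 12, 18, 24, 40, 60]       # upper bound of each group
    for idx, upper in enumerate(bounds):
        if age <= upper:
            return idx
    return len(bounds)                      # above 60 years old

print(age_to_group(23))   # 3 -> the 19-24 group, as in the a3 example below
```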
First, the second-level deep neural network takes training data B and the gender label female as input, and outputs the age group of each face image in training data B. The second-level deep neural network then calculates the loss error of its age-group output, where the loss error reflects the probability that the labeled age of a face image in training data B does not fall within the age group output by the second-level deep neural network. For example, if the labeled age of face image a3 in training data B is 23 and the second-level deep neural network outputs the age group 25-40 for a3, then 23 is not within 25-40, so the second-level deep neural network has recognized incorrectly. Similarly, the face images in training data B are input to the second-level deep neural network respectively to calculate the loss error. If the loss error is higher than a preset value, the loss error is back-propagated and the parameters of the second-level deep neural network are updated, until the loss error of the second-level deep neural network is lower than the preset value, at which point the training of the second-level deep neural network is finished.
Likewise, the second-level deep neural network takes training data C and the gender label male as input, and outputs the age group of each face image in training data C. Specifically, this training process is identical to the training process of the above embodiment and will not be described here again.
With the trained second-level deep neural network, the age-group characteristic of a face image can be identified and output according to the face image and the gender characteristic output by the first-level deep neural network.
In this embodiment, the male face images and the female face images are used to train the second-level deep neural network separately, which avoids the influence of gender characteristics on age recognition and improves the accuracy with which the target model identifies age characteristics.
S505, training the third-level deep neural network in the target model by using the training data B and the training data C according to age group classification.
The training data B and the training data C are marked with ages, and the ages of the people are divided into ages of 0-6 years old, 7-12 years old, 12-18 years old, 19-24 years old, 25-40 years old, 41-60 years old and over 60 years old.
The training data B and the training data C are classified according to age groups.
Illustratively, face images of ages 0-6 in training data B are set up as data set D, face images of ages 7-12 in training data B are set up as data set E, face images of ages 12-18 in training data B are set up as data set F, face images of ages 19-24 in training data B are set up as data set G, face images of ages 25-40 in training data B are set up as data set H, face images of ages 41-60 in training data B are set up as data set I, and face images of ages above 60 in training data B are set up as data set J.
Illustratively, face images of ages 0-6 in training data C are set up as data set D1, face images of ages 7-12 in training data C are set up as data set E1, face images of ages 12-18 in training data C are set up as data set F1, face images of ages 19-24 in training data C are set up as data set G1, face images of ages 25-40 in training data C are set up as data set H1, face images of ages 41-60 in training data C are set up as data set I1, and face images of ages above 60 in training data C are set up as data set J1.
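A sketch of how the labeled samples might be partitioned into the gender data sets (B, C) and the per-age-group data sets (D…J and D1…J1) is given below, reusing the age_to_group helper from the earlier sketch; the (path, gender, age) record format is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical labeled samples: (image_path, gender, age) -- format assumed
training_data_a = [
    ("a1.jpg", "male", 53),
    ("a2.jpg", "female", 23),
    ("a3.jpg", "female", 5),
]

def split_by_gender(samples):
    """Training data B (female) and training data C (male)."""
    b = [s for s in samples if s[1] == "female"]
    c = [s for s in samples if s[1] == "male"]
    return b, c

def split_by_age_group(samples):
    """Partition one gender's samples into the seven age-group data sets
    (D..J for females, D1..J1 for males), using age_to_group from above."""
    groups = defaultdict(list)
    for path, gender, age in samples:
        groups[age_to_group(age)].append((path, gender, age))
    return groups

training_data_b, training_data_c = split_by_gender(training_data_a)
female_groups = split_by_age_group(training_data_b)   # keys 0..6 -> data sets D..J
male_groups = split_by_age_group(training_data_c)     # keys 0..6 -> data sets D1..J1
```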
First, the third-level deep neural network takes data set D and the age group 0-6 years as input, and outputs the age of each face image in data set D. The third-level deep neural network then calculates the loss error between the age it outputs and the labeled age of the face images in data set D, where the loss error reflects the probability that the labeled age of a face image in data set D differs from the age output by the third-level deep neural network. For example, if the labeled age of face image a50 is 53 and the third-level deep neural network outputs 50 for a50, the third-level deep neural network has recognized incorrectly. Likewise, the face images in data set D are input to the third-level deep neural network respectively to calculate the loss error. If the loss error is higher than a preset value, the loss error is back-propagated and the parameters of the third-level deep neural network are updated, until the loss error of the third-level deep neural network is lower than the preset value, at which point the training of the third-level deep neural network is finished.
Similarly, the data set E and age group 7-12 years old, the data set F and age group 12-18 years old, the data set G and age group 19-24 years old, the data set H and age group 25-40 years old, the data set I and age group 41-60 years old, and the data set J and age group 60 years old above are respectively input into the third-level deep neural network to train the third-level deep neural network, and specifically, the training process is identical to the training process of the above embodiment, and will not be repeated here.
Similarly, the data set D1 and age group 0-6 years old, the data set E1 and age group 7-12 years old, the data set F1 and age group 12-18 years old, the data set G1 and age group 19-24 years old, the data set H1 and age group 25-40 years old, the data set I1 and age group 41-60 years old, and the data set J1 and age group above 60 years old are respectively input into the third-level deep neural network to train the third-level deep neural network, and specifically, the training process is consistent with the training process of the above embodiment, and will not be repeated here.
With the trained third-level deep neural network, the age characteristic of a face image can be identified and output according to the face image and the age-group characteristic output by the second-level deep neural network.
Through the hierarchical training of the target model, time is saved, the accuracy of the training model is improved, and the target model can accurately identify the age characteristics of the face image.
Alternatively, the training method may be processed by the central processing unit (Central Processing Unit, CPU) in the training device 420 in fig. 4, or may be processed by both the CPU and the graphics processing unit (Graphic Processing Unit, GPU), or may be performed without a GPU, and other suitable processors for neural network computation may be used, which is not limited by the present application.
Through the training process, the target model can output the age of the face image according to the input face image.
Fig. 6 is a flowchart of a face image age characteristic recognition method according to an embodiment of the present application.
The method may be specifically performed by an execution device 410 as shown in fig. 4, and the method specifically includes the following steps:
s601, acquiring a face image.
The face image may be acquired by a client device shown in fig. 4, and the client device may be a mobile phone terminal, a tablet computer, a notebook computer, an AR/VR, a vehicle-mounted terminal, etc., which is not limited herein.
The client device inputs the acquired face image to a preprocessing module in the execution device through an I/O interface.
S602, preprocessing a face image.
A preprocessing module in the execution device receives the face image acquired by the client device through the I/O interface and preprocesses the acquired face image.
When a face image is acquired, the size and position of the face within the acquired image are uncertain due to environmental factors such as illumination intensity and position, which can affect the accuracy of the recognition result.
The face image may be preprocessed by a preprocessing module as shown in fig. 4.
Specifically, the preprocessing module can perform operations such as face image alignment (righting), face image enhancement, and face image normalization on the face images acquired by the client device. Face image alignment is used to obtain a face image in which the face is in an upright, correct position; face image enhancement is used to improve the quality of the face image, making it easier for a computer to process and recognize; and face image normalization is used to obtain face images of a uniform size.
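A rough preprocessing sketch using OpenCV is shown below (assumed to be installed as opencv-python). Histogram equalization stands in for image enhancement and resizing for normalization; face alignment (righting) is omitted because it would additionally require detected facial landmarks. These choices are illustrative assumptions, not the patent's method.

```python
import cv2
import numpy as np

def preprocess_face(image_bgr, size=(64, 64)):
    """Enhance and normalize a face image before feeding it to the target model.
    Alignment (righting) is omitted; it would require facial landmarks."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    enhanced = cv2.equalizeHist(gray)                  # image enhancement (assumed choice)
    resized = cv2.resize(enhanced, size)               # normalization to a uniform size
    normalized = resized.astype(np.float32) / 255.0    # scale pixel values to [0, 1]
    return normalized

# Usage: face = preprocess_face(cv2.imread("face.jpg"))
```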
S603, inputting the preprocessed face image into the target model.
S604, the target model identifies and outputs the age of the face image.
The target model of the embodiment of the application can be three-level cascade deep neural networks, namely a first-level deep neural network, a second-level deep neural network and a third-level deep neural network, wherein each level deep neural network comprises N layers of convolutional neural networks, and each layer of convolutional neural network can comprise a convolutional layer, a downsampling layer and a full-connection layer.
Specifically, the preprocessed face image is input into a first-stage deep neural network, and the first-stage deep neural network outputs the gender characteristics of the face image.
For example, if the acquired face image is of a girl, the gender characteristic of the face image output by the first-level deep neural network is female.
And inputting the sex characteristics of the face image output by the first-stage deep neural network and the face image into a second-stage deep neural network, and outputting the age bracket of the face image by the second-stage deep neural network.
For example, the gender characteristic of the face image output by the first-level deep neural network is female and the person in the acquired face image is 24 years old; since 24 falls within the 19-24 age group, the age group of the face image output by the second-level deep neural network is 19-24 years old.
And inputting the age group of the face image output by the second-stage deep neural network and the face image into a third-stage deep neural network, and outputting the age of the face image by the third-stage deep neural network.
For example, the age range of the face image output by the second-stage deep neural network is 19 to 24 years, the age of the face image acquired is 24, and the age of the face image output by the third-stage deep neural network is 24 years.
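Putting the three stages together, an inference pass matching the example above (gender first, then age group given the gender, then age given the age group) might look like the sketch below, reusing the level1/level2/level3 networks sketched after step S501; everything here is illustrative, not the patent's code, and the 64x64 input shape is an assumption.

```python
import torch

def recognize_age(face_tensor, level1, level2, level3):
    """face_tensor: preprocessed image batch of shape (1, 3, 64, 64) (assumed).
    Returns (gender index, age-group index, estimated age)."""
    with torch.no_grad():
        gender_probs = level1(face_tensor).softmax(dim=1)               # e.g. female
        group_probs = level2(face_tensor, gender_probs).softmax(dim=1)  # e.g. 19-24
        age = level3(face_tensor, group_probs)                          # e.g. 24
    return gender_probs.argmax(1).item(), group_probs.argmax(1).item(), age.item()
```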
The application provides a face image age characteristic recognition method based on a cascaded neural network. Because faces of the same age differ across genders, the method helps improve the accuracy of age recognition for different genders in the face recognition process.
The method for recognizing the age characteristics of the face image based on the cascade neural network is described below with reference to application scenes.
To better understand the embodiments of the present application, an application of the method in a wearing (outfit recommendation) application (APP) is described below. Specifically, the wearing APP receives a personal picture uploaded by a user. First, the wearing APP identifies the gender of the person in the picture: if female, a female homepage is recommended; if male, a male homepage is recommended. After entering the male or female homepage, the wearing APP identifies the age of the person in the picture and recommends clothing items suitable for that age.
The application of the method in the wearing APP can run on a computer system/server. Referring to fig. 7, the system 701 of the present invention includes a terminal 700 and a server 710. The terminal may be implemented by a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), or the like; a server is a device that provides computing services. The terminal may be connected to the server through a wired or wireless network. In this embodiment, the wearing APP is installed on the terminal.
Referring to fig. 8, the user interface shown after logging in to the wearing APP includes an upload-personal-image control 801. The terminal receives and responds to the user's click on the upload-personal-image control 801: the terminal receives a personal image, identifies the gender characteristic of the personal image, determines the age of the personal image in combination with the gender characteristic, and recommends clothing items for that age to the user. The wearing APP then displays the try-on user interface shown in fig. 9.
Illustratively, if the uploaded personal picture is a male, the wearing APP will enter the application interface of the male, further judging that the age of the male is 24 years old, the wearing APP will recommend wearing items of the 24 year old male, such as a male shirt, a T-shirt, sports pants, canvas shoes, etc.; if the personal image is a female, the wearing APP will enter the application interface of the female, and further determine that the female age is 24 years, the wearing APP will recommend wearing items of the female 24 years old, such as one-piece dress, short skirt, T-shirt, high-heeled shoes, and the like.
As shown in fig. 9, the wearing APP displays a user interface for an identified personal image of a 24-year-old female. Fig. 9 includes a try-on control 901, a personal image 902 (the personal image uploaded by the user), and a recommended wearing control 903, which specifically includes clothing-item controls such as a dress control 904, a short skirt suit control 905, an overcoat control 906, and the like.
The clothing-item controls in the recommended wearing control 903 can receive the user's click on each clothing-item control, and in response to the user's click operation the corresponding clothing item is displayed on the personal image 902. The controls 904, 905, and 906 can receive user click operations to switch the item shown on the personal image 902.
This is merely one implementation of the embodiments of the present application and should not be construed as limiting.
The device for identifying the age characteristics of the face image provided by the embodiment of the application comprises the following components:
the first acquisition unit is used for acquiring the face image;
The identification unit is used for carrying out age identification on the face image according to the target model to obtain age characteristics of the face image; the target model comprises a first-level deep neural network, a second-level deep neural network and a third-level deep neural network; the first-stage deep neural network performs sex characteristic extraction on the face image to obtain the sex characteristic of the face image; the second-stage deep neural network performs age group feature extraction on the face image to obtain the age group feature of the face image; and the third-stage deep neural network performs the age characteristic extraction on the face image to obtain the age characteristic of the face image.
In one possible implementation, the apparatus further includes:
the preprocessing unit is used for carrying out image preprocessing on the face image, wherein the image preprocessing comprises image righting and/or image enhancement and/or image normalization.
In one possible implementation manner, before the age identification is performed on the face image according to the target model, the apparatus further includes:
the second acquisition unit is used for acquiring training data, wherein the training data comprises one or more sample face images marked with age characteristics and gender characteristics;
And the training unit is used for training the target model through a deep learning algorithm according to the sample face image.
In one possible implementation manner, for training the target model through a deep learning algorithm according to the sample face image, the apparatus specifically includes:
the first training unit is used for training the first-stage deep neural network through the deep learning algorithm according to the sample face image and the gender characteristic marked by the sample face image;
The second training unit is used for training the second-stage deep neural network through the deep learning algorithm according to the sample face image and the age bracket of the age characteristic marked by the sample face image;
And the third training unit is used for training the third-stage deep neural network through a deep learning algorithm according to the sample face image and the age characteristic marked by the sample face image.
The following describes a training device 1000 for a target model according to an embodiment of the present application. A training apparatus 1000 of a target model shown in fig. 10 (the training apparatus 1000 may be a computer device in particular) includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004. The memory 1001, the processor 1002, and the communication interface 1003 are connected to each other by a bus 1004.
The Memory 1001 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access Memory (Random Access Memory, RAM). The memory 1001 may store a program, and when the program stored in the memory 1001 is executed by the processor 1002, the processor 1002 and the communication interface 1003 are for performing respective steps of a training method of a target model of an embodiment of the present application.
The processor 1002 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (Graphics Processing Unit, GPU), or one or more integrated circuits for executing related programs, so as to perform the functions required by the units in the training apparatus for the target model of the present application or to perform the training method for the target model of the present application.
The processor 1002 may also be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the training method for the target model of the present application may be completed by integrated logic circuits of hardware in the processor 1002 or by instructions in the form of software. The processor 1002 may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 1001; the processor 1002 reads the information in the memory 1001 and, in combination with its hardware, performs the functions required by the units included in the training apparatus for the target model of the embodiment of the present application, or performs the training method for the target model of the embodiment of the present application.
The communication interface 1003 enables communication between the training apparatus 1000 and other devices or communication networks using a transceiver device such as, but not limited to, a transceiver. For example, training data may be acquired through the communication interface 1003.
Fig. 11 shows an execution apparatus 1100 for a target model according to an embodiment of the present application. The execution apparatus 1100 shown in Fig. 11 (which may in particular be a computer device) includes a memory 1101, a processor 1102, a communication interface 1103, and a bus 1104. The memory 1101, the processor 1102, and the communication interface 1103 are communicatively connected to one another through the bus 1104.
The memory 1101 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1101 may store programs, and when the programs stored in the memory 1101 are executed by the processor 1102, the processor 1102 and the communication interface 1103 are configured to perform the respective steps performed by the execution apparatus of the embodiment of the present application.
The processor 1102 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits for executing the associated programs.
The processor 1102 may also be an integrated circuit chip with signal processing capabilities. In implementation, the steps performed by the execution apparatus of the present application may be completed by integrated logic circuits of hardware in the processor 1102 or by instructions in the form of software. The processor 1102 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1101, and the processor 1102 reads the information in the memory 1101.
The communication interface 1103 enables communication between the execution apparatus 1100 and other devices or communication networks using a transceiver device such as, but not limited to, a transceiver. For example, training data may be acquired through the communication interface 1103.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program stored on a computer-readable storage medium; when executed, the program may include the processes of the embodiments of the methods described above. The storage medium may be a USB flash drive, a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The present application may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present application.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to the respective computing/processing device or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present application are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The descriptions of the embodiments have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention through the embodiments disclosed herein.
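To summarize the cascaded inference flow described above (gender characteristic, then age-bracket characteristic, then age characteristic), a short hedged sketch follows. It reuses the illustrative StageNet class from the earlier training sketch; how each level's output is encoded before being fed to the next level is an assumption made for illustration, not a detail fixed by the embodiments.

```python
# Illustrative sketch only: cascaded inference with the three levels of the
# target model (gender -> age bracket -> age). Feature encodings are assumed.
import torch

@torch.no_grad()
def recognize_age(image, gender_net, bracket_net, age_net):
    # image: a preprocessed face image tensor of shape (1, 3, H, W).
    gender = gender_net(image).softmax(dim=1)                  # level 1: gender characteristic
    bracket = bracket_net(image, cond=gender).softmax(dim=1)   # level 2: age-group characteristic
    age_logits = age_net(image, cond=bracket)                  # level 3: age characteristic
    return age_logits.argmax(dim=1)                            # predicted age class index
```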

Claims (7)

1. A method for identifying age characteristics of a face image, comprising:
acquiring a face image;
Inputting the face image into a target model;
Performing age identification on the face image according to the target model to obtain the age characteristic of the face image; the target model comprises a first-level deep neural network, a second-level deep neural network and a third-level deep neural network; the first-level deep neural network performs gender characteristic extraction on the face image to obtain the gender characteristic of the face image, and takes the face image and the gender characteristic output by the first-level deep neural network as input of the second-level deep neural network; the second-level deep neural network performs age group feature extraction on the face image to obtain the age group feature of the face image, and takes the face image and the age group feature output by the second-level deep neural network as input of the third-level deep neural network; the third-level deep neural network performs age characteristic extraction on the face image to obtain the age characteristic of the face image;
Before the age identification is performed on the face image according to the target model to obtain the age characteristic of the face image, the method further comprises: acquiring training data A, wherein the training data A comprises one or more sample face images marked with age characteristics and gender characteristics;
training a first-stage deep neural network in the target model by using training data A;
Dividing the face images in the training data A into training data B whose gender is female and training data C whose gender is male according to gender, and respectively training the second-level deep neural network in the target model by using the training data B and the training data C;
And training the third-level deep neural network in the target model according to the age group classification by using the training data B and the training data C.
2. The method of claim 1, wherein after the acquiring the face image, the method further comprises:
And performing image preprocessing on the face image, wherein the image preprocessing comprises image rectification and/or image enhancement and/or image normalization.
3. The method of claim 1, wherein prior to the acquiring the face image, the method further comprises:
Displaying a first application interface;
receiving a first input operation of a user aiming at a first application interface;
the step of acquiring the face image specifically comprises the following steps:
responding to the first input operation, and acquiring a face image of the user;
after the age characteristic of the face image is obtained, the method further comprises:
Determining commodity information corresponding to the age characteristics of the face image from a commodity database according to the age characteristics; the commodity database comprises commodity information corresponding to a plurality of age characteristics.
4. A facial image age characteristic recognition apparatus, comprising:
the first acquisition unit is used for acquiring the face image;
The recognition unit is used for inputting the face image into a target model;
The identification unit is also used for carrying out age identification on the face image according to the target model to obtain the age characteristic of the face image; the target model comprises a first-level deep neural network, a second-level deep neural network and a third-level deep neural network; the first-level deep neural network performs gender characteristic extraction on the face image to obtain the gender characteristic of the face image, and takes the face image and the gender characteristic output by the first-level deep neural network as input of the second-level deep neural network; the second-level deep neural network performs age group feature extraction on the face image to obtain the age group feature of the face image, and takes the face image and the age group feature output by the second-level deep neural network as input of the third-level deep neural network; the third-level deep neural network performs age characteristic extraction on the face image to obtain the age characteristic of the face image;
The second acquisition unit is used for acquiring training data A, wherein the training data A comprises one or more sample face images marked with age characteristics and gender characteristics; the training unit is used for training the first-stage deep neural network in the target model by utilizing the training data A;
Dividing the face images in the training data A into training data B whose gender is female and training data C whose gender is male according to gender, and respectively training the second-level deep neural network in the target model by using the training data B and the training data C;
And training the third-level deep neural network in the target model according to the age group classification by using the training data B and the training data C.
5. The apparatus of claim 4, wherein the apparatus further comprises:
the preprocessing unit is used for performing image preprocessing on the face image, wherein the image preprocessing comprises image rectification and/or image enhancement and/or image normalization.
6. An age characteristic recognition apparatus, comprising: one or more processors, one or more memories, and a transceiver; the one or more memories are coupled to the one or more processors and are used for storing computer program code, the computer program code comprising computer instructions which, when executed by the one or more processors, cause the method of any one of claims 1-3 to be performed.
7. A computer storage medium comprising computer instructions which, when executed, cause the method according to any one of claims 1-3 to be performed.
CN201911354494.2A 2019-12-24 2019-12-24 Face image age characteristic recognition method Active CN113033263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911354494.2A CN113033263B (en) 2019-12-24 2019-12-24 Face image age characteristic recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911354494.2A CN113033263B (en) 2019-12-24 2019-12-24 Face image age characteristic recognition method

Publications (2)

Publication Number Publication Date
CN113033263A CN113033263A (en) 2021-06-25
CN113033263B (en) 2024-06-11

Family

ID=76458095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911354494.2A Active CN113033263B (en) 2019-12-24 2019-12-24 Face image age characteristic recognition method

Country Status (1)

Country Link
CN (1) CN113033263B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114360148A (en) * 2021-12-06 2022-04-15 深圳市亚略特科技股份有限公司 Automatic selling method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824054A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded depth neural network-based face attribute recognition method
CN105243358A (en) * 2015-09-15 2016-01-13 上海卓悠网络科技有限公司 Face detection-based shopping recommendation system and method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101189765B1 (en) * 2008-12-23 2012-10-15 한국전자통신연구원 Method and apparatus for classification sex-gender based on voice and video
CN104143079B (en) * 2013-05-10 2016-08-17 腾讯科技(深圳)有限公司 The method and system of face character identification
US9400925B2 (en) * 2013-11-15 2016-07-26 Facebook, Inc. Pose-aligned networks for deep attribute modeling
JP6141218B2 (en) * 2014-02-19 2017-06-07 東芝テック株式会社 Product sales data processing apparatus and program
EP3610410A1 (en) * 2017-04-14 2020-02-19 Koninklijke Philips N.V. Person identification systems and methods
CN108573209A (en) * 2018-02-28 2018-09-25 天眼智通(香港)有限公司 A kind of age-sex's recognition methods of the single model multi output based on face and system
CN110532970B (en) * 2019-09-02 2022-06-24 厦门瑞为信息技术有限公司 Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824054A (en) * 2014-02-17 2014-05-28 北京旷视科技有限公司 Cascaded depth neural network-based face attribute recognition method
CN105243358A (en) * 2015-09-15 2016-01-13 上海卓悠网络科技有限公司 Face detection-based shopping recommendation system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Gender Recognition and Age Estimation Based on Face Images; Lu Li; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; Vol. 2010, No. 10; pp. I138-32 *

Also Published As

Publication number Publication date
CN113033263A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
Qin et al. U2-Net: Going deeper with nested U-structure for salient object detection
Ghaderizadeh et al. Hyperspectral image classification using a hybrid 3D-2D convolutional neural networks
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US10796452B2 (en) Optimizations for structure mapping and up-sampling
US10733431B2 (en) Systems and methods for optimizing pose estimation
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN112446398B (en) Image classification method and device
CN113705769B (en) Neural network training method and device
Jiang et al. Fusion of the YOLOv4 network model and visual attention mechanism to detect low-quality young apples in a complex environment
CN112288011B (en) Image matching method based on self-attention deep neural network
CN111507378A (en) Method and apparatus for training image processing model
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
CN110309856A (en) Image classification method, the training method of neural network and device
CN112418392A (en) Neural network construction method and device
CN111353076A (en) Method for training cross-modal retrieval model, cross-modal retrieval method and related device
CN113807399B (en) Neural network training method, neural network detection method and neural network training device
CN110222718B (en) Image processing method and device
CN113449573A (en) Dynamic gesture recognition method and device
WO2021129668A1 (en) Neural network training method and device
Grigorev et al. Depth estimation from single monocular images using deep hybrid network
CN115081588A (en) Neural network parameter quantification method and device
Zhang et al. Learning to detect salient object with multi-source weak supervision
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN114266897A (en) Method and device for predicting pox types, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant