WO2019209131A1

WO2019209131A1 - Method of training a neural network for human facial recognition

Info

Publication number: WO2019209131A1
Application number: PCT/RU2018/000259
Authority: WO
Inventors: Evgeny Alekseevich SMIRNOV (Евгений Алексеевич СМИРНОВ)
Original assignee: Limited Liability Company "TsRT-innovatsii" (Общество с ограниченной ответственностью "ЦРТ-инновации")
Priority date: 2018-04-23
Filing date: 2018-04-23
Publication date: 2019-10-31
Also published as: EA202092529A1; KR20210033940A

Abstract

The invention relates to the field of facial biometrics, and more particularly to the task of training neural networks for facial recognition. Proposed is a method of training neural networks in which a database containing human facial images and a list of look-alikes are provided. Next, a mini-batch of human facial images is generated by first including in it a set of human facial images from the database and then, for each person having at least one image in the mini-batch, adding at least one image of that person's look-alike from the list of look-alikes if a look-alike exists and no image of that look-alike is already in the mini-batch; if no look-alike exists or an image of the look-alike is already in the mini-batch, at least one image of a different person from the database is added instead. The facial images in the mini-batch are then fed to the input of a neural network. A verification training signal and an identification training signal are generated from the results obtained at the output of the neural network, and the neural network is trained using these two signals. Using the same results, each person is matched with another person as a look-alike, and the list of look-alikes is updated whenever a pair of look-alikes is obtained that is not yet present in the list. The above operations are then repeated, starting with the generation of a mini-batch.

Description

METHOD OF TRAINING A NEURAL NETWORK TO RECOGNIZE PEOPLE'S FACES

FIELD OF TECHNOLOGY

The invention relates to the field of facial biometrics, in particular to the task of training neural networks for face recognition.

BACKGROUND

A face recognition method is known from WO2016119076. That document describes a convolutional neural network architecture for solving the face recognition problem; the architecture is trained using a combined identification and verification signal.

Its disadvantage is that the training method of the known invention does not yield a neural network capable of recognizing people's faces with high accuracy.

Patent publication US20160180151 discloses a method of training a convolutional neural network to recognize faces and a method of obtaining training images of people's faces. This training method relies on triplets.

In operation, the network produces vector representations of facial images that are close in Euclidean space for different images of the same person and distant for images of two different people.

The disadvantage of this method is that training examples (triplets) are difficult to obtain, which slows down the training of the neural network.

Patent publication US20150125049 describes a method of classifying faces using a deep convolutional neural network. The method discloses generating a three-dimensional face image from a two-dimensional face image. The identity of the two-dimensional face image can then be classified by feeding the three-dimensional face image to a deep neural network, and this identity may comprise a feature vector. The method also discloses classifying images of one person and verifying whether images belong to the same class or to different classes.

The disadvantage of this method is that classifying faces and obtaining training examples are difficult, which slows down the training of the neural network. The known training method does not produce a neural network that recognizes people with high accuracy.

A recognition method is also known from US20060204058, in which face recognition is based on a similarity measure between the feature vector obtained from the input image and the feature vector obtained from the user's image in the database. In this method, face images captured during face tracking are also clustered, and the resulting clusters are used to train the face recognition system.

The disadvantage of this training method is that it does not train the neural network for subsequent high-precision recognition of people's faces.

A neural network for detecting and recognizing a deformable object is known from US5850470B. That patent describes the structure of a face recognition system based on neural networks trained for classification. To provide sufficient variety of face images in the training set, the algorithm transforms the captured images to create additional training examples, also known as virtual training images. Two types of training images are used. The first training set consists of positive images (face or eye images), which are used for reinforced learning. The second training set consists of negative images (non-face or non-eye images), which are used for anti-reinforced learning. Weight parameters and thresholds are updated by reinforced and anti-reinforced learning.

The disadvantage of this method is that training images of people's faces are difficult to obtain, which slows down the training of the neural network.

Thus, the prior art lacks a method of training a neural network that subsequently allows people to be recognized with high accuracy. Moreover, the known methods of searching for training examples rely on computationally complex algorithms, which slow down the training of neural networks.

In view of these disadvantages of the known recognition and identification methods, the technical problem addressed by the present invention is to train a neural network so that it recognizes people's faces with high accuracy.

SUMMARY OF THE INVENTION

The technical problem is solved by the proposed method of training a neural network to recognize people's faces, in which a database of images of people's faces and a list of doubles are provided. A mini-batch of face images is then formed by first including a set of face images from the database and then, for each person with at least one image in the mini-batch, adding at least one image of that person's double from the list of doubles if a double exists and its image is not yet in the mini-batch; if there is no double, or an image of the double is already in the mini-batch, at least one image of another person from the database is added. Next, the face images in the mini-batch are fed to the input of the neural network. Verification and identification training signals are generated from the results obtained at the output of the neural network, and the neural network is trained using these verification and identification training signals. At the same time, using the same results, each person is matched with another person as his double, and the list of doubles is updated whenever a pair of doubles is obtained that is not yet in the list. The operations are then repeated, starting from the formation of the mini-batch.

The proposed method achieves the technical result of improving the training examples used to train neural networks for face recognition and improving the quality of that training, and, as a result, increasing the accuracy with which the resulting neural network recognizes people's faces, in particular very similar faces. The proposed method also speeds up the training of the neural network.

A neural network trained on examples obtained by forming mini-batches according to the proposed method, i.e. mini-batches containing images of two or more similar people, in particular people from the database and their doubles from the list of doubles, is able to recognize people with very similar faces. In other words, generating a verification training signal, i.e. comparing the images placed in one mini-batch, trains the neural network effectively, and the more similar (but different) the people in these images are, the more difficult the face recognition cases the trained neural network will be able to handle. The similarity of the faces in a mini-batch increases because, with each training iteration, the list of doubles is gradually filled and, at the same time, updated by replacing existing doubles with more similar ones from among the people whose images are available in the database. The doubles are found by reusing the results already obtained for generating the identification training signal, which requires no additional computation or time. In addition, a new mini-batch is formed after each iteration, so that with every iteration the mini-batch is filled with images of people with more similar faces, which improves the quality and speeds up the training of the neural network.

According to one embodiment, training of the neural network is completed when the training quality reaches a given criterion. The criterion makes it possible to determine whether the neural network has been trained at a given iteration.

According to one embodiment, an empty list of doubles is provided. Starting training with an empty list of doubles allows the neural network to fill this list during training, which improves the quality of its training.

According to one embodiment, the number of people whose images are added to the mini-batch and the number of images of each such person are determined by a hyperparameter of the learning algorithm. This allows both numbers to be fixed before training of the neural network starts, which speeds up training, improves its quality and, as a result, increases the accuracy of recognizing people.

According to one embodiment, all images of each person in the database have an identifier, which can be used when forming a mini-batch. The identifier allows the system to select the images of a specific person from the training base and place them in a mini-batch. In addition, by writing the double's identifier to the list of doubles, the system can select from the database the images of a particular person's double and place them in the same mini-batch. Using identifiers increases the speed and accuracy of training the neural network.

According to one embodiment, the people whose images are added to the mini-batch from the database are selected at random. Random selection provides more diverse training examples, so the trained neural network will be able to recognize more difficult images in the future.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is explained in more detail below by way of non-limiting examples of its implementation, with reference to the accompanying drawings, in which: FIG. 1 shows a neural network training system according to one embodiment of the invention;

FIG. 2 is a schematic diagram of mini-batch formation according to one embodiment of the invention;

FIG. 3 shows the results of a search for doubles according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

A method for training a neural network, in accordance with various embodiments of the present invention, can be implemented using, for example, known computer or multiprocessor systems. In other embodiments, the claimed method can be implemented using specialized software and hardware.

In the present description, the term "double" means a person whose face strongly resembles the face of another person; the specific similarity criterion depends on the application. The term "mini-batch" refers to a set of images of people, in particular images of people's faces, selected from the training database. A mini-batch may contain one image of each selected person or several images of at least some of the selected people.

FIG. 1 shows a training system with which the method of training neural networks can be implemented according to one embodiment of the present invention. The training system comprises means 1 for forming a database and means 2 for forming a list of doubles, both connected to means 3 for forming a mini-batch. Means 3 are in turn connected to means 4 for obtaining vector representations, which also contain a classifier that outputs classification results. Means 4 are connected to means 5 for generating an identification training signal, to means 6 for searching for doubles, and to means 7 for generating a verification training signal. Means 6 for searching for doubles are connected to means 2 for forming the list of doubles. Finally, means 5 and means 7 are connected to means 8 for weighting and summing the identification and verification training signals; means 8 are connected to means 4 and form the output of the system.

The sequence of operations in training the neural network is described in more detail below.

To train the neural network (FIG. 1), a training base (database) is required, which is formed by means 1. The training base contains a set of face images grouped by person, i.e. for each person in the database there is a certain number of images of his face. In a particular embodiment, the database may contain one image of each person. In other embodiments, the database may contain more than one image of some or all of the people represented in it. Face images for the training base can be obtained by any method known to a person skilled in the art, for example using a camera. Any computer memory capable of storing face images can serve as the training base. In other embodiments, the images can be stored on other media used as the training base, for example camera memory, a flash card, etc. In another embodiment, the images used to train the neural network are obtained from more than one training base.

It is important to note that each person, and in particular every image of each person in the training base, has an identifier: the images of a particular person carry an identifier associated with the identifier of that person. Using the identifiers, the system can select the images of a specific person during training. In other words, a person's identifier allows the system to select that person's images from the training base and place them in a mini-batch (mini-batch formation is described below). In addition, the identifier of a person's double stored in the list of doubles allows the system to select the double's images from the training base and place them in the same mini-batch (formation of the list of doubles is described below).
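The identifier-based selection can be illustrated with a minimal index that groups image records by person identifier. The record layout `(image_path, person_id)` and the function name `build_person_index` are hypothetical; the description does not specify how the training base is stored.

```python
from collections import defaultdict


def build_person_index(records):
    """Group image paths by person identifier.

    `records` is an iterable of (image_path, person_id) pairs; this layout
    is an illustrative assumption, not a storage format from the patent.
    """
    index = defaultdict(list)
    for image_path, person_id in records:
        index[person_id].append(image_path)
    return dict(index)  # plain dict: unknown identifiers raise KeyError


records = [
    ("img/921_0.jpg", 921),
    ("img/2312_0.jpg", 2312),
    ("img/921_1.jpg", 921),
]
index = build_person_index(records)
print(index[921])  # ['img/921_0.jpg', 'img/921_1.jpg']
```

With such an index, selecting "the images of person 921" for a mini-batch is a single dictionary lookup.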

In addition, training the neural network requires a list of doubles, which is formed by means 2. The list of doubles contains the identifiers of the images of the doubles of the people whose images are in the training base. Here, the list of doubles contains one double for a given person from the training base. In other embodiments, in which a high training speed is not required, the list contains more than one double for a given person. In the preferred embodiment, the initial list of doubles is created empty, i.e. the value "-1" is stored as the identifier of the double's images, indicating that no double exists. During training of the neural network, this list is filled in by the method of the invention, described below. In other embodiments, which require the highest possible training speed, the list of doubles may be filled in from the start.
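An empty list of doubles, using the value -1 from the description as the "no double" marker, might be held as a simple mapping. This is a sketch under stated assumptions: keying by person identifier (rather than by image identifier) and the helper names `init_doubles` and `get_double` are hypothetical.

```python
NO_DOUBLE = -1  # value stored when no double has been found yet


def init_doubles(person_ids):
    """Create an empty list of doubles: every person maps to NO_DOUBLE."""
    return {pid: NO_DOUBLE for pid in person_ids}


def get_double(doubles, person_id):
    """Return the double's identifier, or None if no double is recorded."""
    double_id = doubles.get(person_id, NO_DOUBLE)
    return None if double_id == NO_DOUBLE else double_id


doubles = init_doubles([921, 2312, 11123])
doubles[921] = 11123                 # a double found during training
print(get_double(doubles, 921))      # 11123
print(get_double(doubles, 2312))     # None
```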

Means 3 form a mini-batch as follows. A certain number of people, in particular a certain number of face images belonging to them, are selected at random from the training base, and their images are placed in the mini-batch. Further face images are then added to the mini-batch using the list of doubles, so that each newly added person is the double of one of the people whose images were added earlier: the first such person is the double of the first person in the mini-batch, and the reference position shifts by one each time a new person's images are added. If no double is recorded in the list of doubles for a given person (in particular, there is no identifier of the double's images, i.e. no double has been found yet), or if images of the double are already in the current mini-batch, the next person (in particular, his images) is instead selected at random from the training base, excluding the people whose images are already in the mini-batch (in other words, the selection at this stage is made among the people not yet added to the current mini-batch). The number of people and the number of images of each person in the mini-batch are determined by a hyperparameter of the learning algorithm. Note that at the first training iteration the mini-batch is formed only from images of people selected at random from the training base, since the list of doubles is still empty.
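The formation rule above can be sketched in Python as follows. This is one possible reading, not the patent's implementation: the helper name `form_mini_batch`, its parameters, the dict-based training base and the use of a separate `num_seed` count for the initial random draw are all assumptions.

```python
import random

NO_DOUBLE = -1  # "no double found yet"


def form_mini_batch(index, doubles, num_seed, num_people, images_per_person, rng):
    """Sketch of mini-batch formation by means 3 (hypothetical helper).

    index: dict person_id -> list of image paths (the training base)
    doubles: dict person_id -> double's person_id, or NO_DOUBLE
    num_seed: people drawn at random before doubles are used (must be >= 1)
    num_people, images_per_person: hyperparameters of the learning algorithm
    Returns (people, batch), batch being a list of (image_path, person_id).
    """
    all_ids = list(index)
    target = min(num_people, len(all_ids))
    people = rng.sample(all_ids, min(num_seed, target))
    in_batch = set(people)
    cursor = 0  # position of the person whose double is requested next
    while len(people) < target:
        candidate = doubles.get(people[cursor], NO_DOUBLE)
        cursor += 1  # shift by one with every newly added person
        if candidate == NO_DOUBLE or candidate in in_batch or candidate not in index:
            # No double yet, or already in the batch: draw an unused person
            remaining = [pid for pid in all_ids if pid not in in_batch]
            candidate = rng.choice(remaining)
        people.append(candidate)
        in_batch.add(candidate)

    batch = []
    for pid in people:
        images = index[pid]
        chosen = rng.sample(images, min(images_per_person, len(images)))
        batch.extend((image, pid) for image in chosen)
    return people, batch


# Toy training base: 10 people with 4 images each; person 0's double is 5
index = {pid: [f"{pid}_{k}.jpg" for k in range(4)] for pid in range(10)}
doubles = {pid: NO_DOUBLE for pid in index}
doubles[0] = 5
people, batch = form_mini_batch(index, doubles, num_seed=2, num_people=6,
                                images_per_person=2, rng=random.Random(0))
print(people, len(batch))
```

Because each loop pass adds exactly one person and advances the cursor by one, the cursor always points at a person already in the batch, which matches the "shifting by one" rule in the text.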

FIG. 2 illustrates the mini-batch formation scheme according to one embodiment of the invention. Block A, shown by a dashed line enclosing blocks B and C, contains the identifiers of the people whose images are to be placed in the mini-batch. Block B contains the identifiers of a number of randomly selected people; in this embodiment, block B contains two identifiers: identifier 921 of the first person and identifier 2312 of the second person. Block C contains the identifiers of a number of people who are doubles of people added earlier, starting with those in block B. In this embodiment the mini-batch holds 24 face images, and it is formed as follows. First, three images (0, 1, 2) of the person with identifier 921 are placed in it, followed by two images of the person with identifier 2312. According to the list of doubles, the person with identifier 11123 is the double of person 921, so the images of this person (five images in this case) are added after the images of person 2312. The mini-batch continues to be filled by the same principle: the person with identifier 22 is the double of person 2312, so his images follow those of person 11123; the person with identifier 333 is the double of person 11123; and so on until the mini-batch is full. For each person whose identifier is in block A, the mini-batch contains one or more images.
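The order in which people enter the mini-batch in the FIG. 2 example can be traced with the minimal sketch below. The function `order_people` is hypothetical, and it models only the double-following rule; the random fallback used when no double is known is deliberately omitted, so the sequence simply stops early in that case.

```python
def order_people(seed_ids, doubles, num_people):
    """Order in which people enter the mini-batch (double-following rule only)."""
    people = list(seed_ids)
    cursor = 0  # person whose double is looked up next
    while len(people) < num_people and cursor < len(people):
        double_id = doubles.get(people[cursor])
        cursor += 1
        if double_id is not None and double_id not in people:
            people.append(double_id)
    return people


# Doubles read off the FIG. 2 description
doubles = {921: 11123, 2312: 22, 11123: 333}
print(order_people([921, 2312], doubles, num_people=5))
# [921, 2312, 11123, 22, 333]
```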

Means 4 process the set of face images from the mini-batch with the neural network (FIG. 1) to obtain vector representations and classification values. These vector representations and classification values are the results obtained at the output of the neural network. In particular, a vector representation is obtained at the output of the neural network for each face image; it specifies the position of that face image in vector space. The resulting vector representations are passed through a classifier to obtain classification values.
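The two outputs of means 4 can be illustrated with a toy forward pass. This is only a stand-in under stated assumptions: a single hypothetical linear layer plays the role of the neural network, the embeddings are L2-normalised, and a linear classifier with one column per person produces the classification values.

```python
import numpy as np


def forward(images, backbone_w, classifier_w):
    """Toy stand-in for means 4: vector representations and classification values.

    images: (batch, pixels) flattened face images
    backbone_w: (pixels, dim) hypothetical linear "network" weights
    classifier_w: (dim, num_people) one column per person (class)
    """
    embeddings = images @ backbone_w
    # L2-normalise so cosine similarity between faces is a plain dot product
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = embeddings @ classifier_w  # one classification value per class
    return embeddings, logits


rng = np.random.default_rng(0)
images = rng.normal(size=(6, 64))  # 6 images of 8x8 pixels, flattened
embeddings, logits = forward(images, rng.normal(size=(64, 16)),
                             rng.normal(size=(16, 10)))
print(embeddings.shape, logits.shape)  # (6, 16) (6, 10)
```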

When the neural network is trained with an identification signal, the classifier outputs a classification value for each class (a class contains the images of one person) and selects the maximum classification value as the result for each image. In accordance with known approaches to training neural networks, the known correct classification values are also supplied, and means 5 generate the identification training signal by comparing the correct value with the classification value produced by the classifier. Various methods can be used to generate the identification signal, such as Softmax, L2-Softmax, Proxy Loss or Sampled Softmax.

It is important to note that the classification values obtained are also used by means 6 to search for doubles. Computing classification values is computationally expensive in itself, and if they were computed solely to search for doubles, training of the neural network would slow down considerably. Because these values are already obtained during training with the identification signal, reusing them to search for doubles saves computation and shortens training time. In other embodiments, doubles can be searched for using classification values obtained for generating training signals other than the identification signal; in this case, however, the quality of training of the neural network is lower.

Thus, by reusing the values output by the classifier, a double is found for each person whose face images were placed in the current mini-batch. In other words, for a face image passed through the neural network, the classification value closest to it is selected among all classification values belonging to identifiers of people other than the person whose face image was fed into the neural network. The identifier of the person associated with this classification value is recorded in the list of doubles formed by means 2 as the identifier of the double. This process is repeated at every training iteration, so that the list of doubles is gradually filled and, at the same time, existing doubles are updated, which keeps the list of doubles up to date.
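Softmax, the first option listed for means 5, can be sketched as a cross-entropy between the classification values and the correct person indices. The function name and the convention that labels are class indices are assumptions for illustration.

```python
import numpy as np


def softmax_identification_loss(logits, labels):
    """Softmax cross-entropy identification signal (one option for means 5).

    logits: (batch, num_classes) classification values
    labels: (batch,) index of the correct person for each image
    Returns (mean loss, predicted class per image).
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    predictions = logits.argmax(axis=1)  # maximum classification value
    return loss, predictions


logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
loss, predictions = softmax_identification_loss(logits, np.array([0, 2]))
print(predictions)  # [0 2]
```

Subtracting the row maximum before exponentiating does not change the result but avoids overflow for large classification values.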
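The double search can be sketched by reusing the classification values already computed for the identification signal. Two assumptions go beyond the text: class index equals person identifier, and when a person has several images in the mini-batch their classification values are averaged before picking the highest-scoring other person. The name `update_doubles` is hypothetical.

```python
import numpy as np


def update_doubles(logits, labels, doubles):
    """Reuse classification values to find a double for each person in the batch.

    logits: (batch, num_people) classification values already computed for
            the identification signal; class index == person id (assumption)
    labels: (batch,) person id of each image
    doubles: dict person id -> double id (-1 if none); updated in place
    Returns the number of entries that changed.
    """
    changed = 0
    for person in np.unique(labels):
        # Average the person's scores over all of his images in the batch
        scores = logits[labels == person].mean(axis=0)
        scores[person] = -np.inf  # a person cannot be his own double
        double_id = int(scores.argmax())
        if doubles.get(int(person), -1) != double_id:
            doubles[int(person)] = double_id  # new pair: update the list
            changed += 1
    return changed


logits = np.array([[5.0, 4.0, 0.0], [5.0, 1.0, 3.0], [0.0, 1.0, 6.0]])
labels = np.array([0, 0, 2])
doubles = {0: -1, 1: -1, 2: -1}
print(update_doubles(logits, labels, doubles), doubles)  # 2 {0: 1, 1: -1, 2: 1}
```

No extra forward pass is needed: the search costs one masked argmax per person on values the identification signal already required.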

Means 7 compare the vector representations obtained by means 4 with each other in pairs, using any suitable method that produces a verification signal, in particular a known method such as Triplet Loss or Margin Based Loss. The verification signal extracts information useful for training from the differences between the vector representations of pairs of examples (images) in one mini-batch. This information is used during training to modify the vector representations so that similar faces belonging to different people are farther apart in vector space (according to the cosine similarity between the vectors) than faces of the same person captured under different conditions. The result of these comparisons is the verification training signal.
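A pairwise verification signal can be sketched with a simple margin loss on cosine similarity. This is a deliberately simplified stand-in for Triplet Loss or Margin Based Loss, not either of them exactly; the margin values 0.8 and 0.3 are illustrative assumptions.

```python
import numpy as np


def verification_loss(embeddings, labels, pos_margin=0.8, neg_margin=0.3):
    """Pairwise margin loss on cosine similarity within one mini-batch.

    Same-person pairs are penalised when their similarity is below
    pos_margin, different-person pairs when it is above neg_margin.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T  # cosine similarity of every pair of images
    rows, cols = np.triu_indices(len(labels), k=1)  # each pair counted once
    pair_sim = sim[rows, cols]
    same = labels[rows] == labels[cols]
    pos = np.maximum(0.0, pos_margin - pair_sim[same])
    neg = np.maximum(0.0, pair_sim[~same] - neg_margin)
    pairs = np.concatenate([pos, neg])
    return float(pairs.mean()) if pairs.size else 0.0


emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
print(verification_loss(emb, np.array([0, 0, 1])))
```

Mini-batches built from doubles matter here: pairs of different but similar people are exactly the ones whose similarity exceeds `neg_margin` and therefore produce a non-zero signal.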

The proposed method of training a neural network using the visual similarity of people, in particular of their faces, helps train the neural network to handle "difficult" images, i.e. images of similar faces. A neural network trained on difficult examples containing images of very similar faces produces vector representations that place images belonging to the same person close to each other in vector space and images belonging to different people far from each other. Conversely, a neural network trained on examples of very dissimilar people can make mistakes and produce, for images of two similar people, vector representations that are close to each other, in particular closer than those of two images of one person captured under significantly different conditions.

Next, means 8 weight the identification and verification training signals received from means 5 and 7 with different weights and sum them, and the resulting signal is sent to means 4, i.e. used to train the neural network by backpropagation. The identification signal teaches the neural network to distinguish between different people, while the verification signal additionally teaches it to find similarities between different images of the face of the same person. This completes one training iteration.

After that, the quality of training of the neural network is evaluated. If it has reached a given criterion, in particular a criterion known from the prior art, training stops because the neural network is trained; otherwise, another training iteration begins. The next iteration starts with means 3 forming a new mini-batch from scratch according to the proposed method, as described above.
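The weighting and summation performed by means 8, followed by a backpropagation step, can be written as below. The symbols are illustrative assumptions: $\lambda_{\mathrm{id}}$ and $\lambda_{\mathrm{ver}}$ are the weights (the description says only that they differ), $\theta$ denotes the network parameters, $\eta$ a learning rate, and the plain gradient-descent update stands in for whatever optimizer is actually used.

```latex
\mathcal{L} = \lambda_{\mathrm{id}}\,\mathcal{L}_{\mathrm{id}} + \lambda_{\mathrm{ver}}\,\mathcal{L}_{\mathrm{ver}},
\qquad
\theta \leftarrow \theta - \eta\,\nabla_{\theta}\mathcal{L}
```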
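The outer loop with its stopping criterion can be sketched as follows. `run_iteration` and `evaluate` are hypothetical callables standing for one full iteration (mini-batch formation, forward pass, both signals, backpropagation, update of the list of doubles) and for the quality measure; the `max_iterations` cap is an added safety assumption not taken from the description.

```python
def train(run_iteration, evaluate, target_quality, max_iterations=1000):
    """Repeat training iterations until the quality criterion is reached.

    run_iteration: hypothetical callable performing one training iteration
    evaluate: hypothetical callable returning training quality (higher is better)
    max_iterations: safety cap, an assumption not taken from the description
    Returns (iterations performed, whether the criterion was reached).
    """
    for iteration in range(1, max_iterations + 1):
        run_iteration()
        if evaluate() >= target_quality:
            return iteration, True  # network trained: stop
    return max_iterations, False


scores = iter([0.50, 0.70, 0.92, 0.95])
print(train(lambda: None, lambda: next(scores), target_quality=0.9))  # (3, True)
```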

FIG. 3 shows the results of the search for doubles by the proposed method of training a neural network, according to one embodiment of the invention. Each row shows, on the left, three photographs of one person and, on the right, three photographs of his double found in the training base by the proposed method.

Thus, the proposed method solves the problem of finding training examples for neural networks that subsequently recognize faces with high precision, in particular the faces of very similar people. In addition, the learning algorithm of the proposed method increases training speed while requiring practically no additional memory.

It should also be noted that in other embodiments, in which the neural network is not used to recognize people's faces, the proposed method can teach the neural network to recognize other features of a person's appearance, for example specific facial features such as the eyes or nose. In this case, a "double" is understood to be a person whose corresponding facial features strongly resemble those of the given person, and the similarity criterion depends on the required training outcome of the neural network.

In addition, in other implementations that do not require high accuracy in the subsequent recognition of people, simpler models (for example, Eigenfaces) can be used instead of the neural network to search for training examples.

The present invention is not limited to the specific embodiments disclosed in the description for illustrative purposes; it covers all possible modifications and alternatives within the scope of the present invention as defined by the claims.
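An Eigenfaces-based double search might look like the sketch below: PCA (via SVD) projects the flattened face images, and each person's double is the nearest other person. Averaging each person's projections and using Euclidean distance are illustrative assumptions, as is the function name `eigenface_doubles`.

```python
import numpy as np


def eigenface_doubles(images, labels, num_components=8):
    """Find a double for each person with Eigenfaces (PCA) instead of a network.

    images: (n, pixels) flattened face images; labels: (n,) person ids
    Returns dict person id -> id of the nearest other person.
    """
    centered = images - images.mean(axis=0)
    # Rows of vt are the eigenfaces (principal directions)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:num_components].T
    people = np.unique(labels)
    means = np.stack([projected[labels == p].mean(axis=0) for p in people])
    dist = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a person is not his own double
    return {int(p): int(people[dist[i].argmin()]) for i, p in enumerate(people)}


images = np.array([
    [0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.0, 0.0],   # person 0
    [0.0, 0.1, 0.0, 0.0], [0.2, 0.1, 0.0, 0.0],   # person 1
    [5.0, 5.0, 5.0, 5.0], [5.1, 5.0, 5.0, 5.0],   # person 2
])
print(eigenface_doubles(images, np.array([0, 0, 1, 1, 2, 2])))
# {0: 1, 1: 0, 2: 1}
```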

Claims

CLAIMS

1. A method of training a neural network to recognize people's faces, in which:

a database of images of people's faces is provided;

a list of doubles is provided;

a mini-batch of images of people's faces is formed by first including a set of images of people's faces from the database and then, for each person with at least one image in the mini-batch, adding at least one image of that person's double from the list of doubles if a double exists and its image is not yet in the mini-batch, and, if there is no double or an image of the double is already in the mini-batch, adding at least one image of another person from the database;

the images of people's faces in the mini-batch are fed to the input of the neural network;

verification and identification training signals are generated using the results obtained at the output of the neural network;

the neural network is trained using the verification and identification training signals;

using the same results, each person is matched with another person as his double, and the list of doubles is updated when a pair of doubles that is not yet in the list of doubles is obtained;

the above operations are repeated, starting from the formation of the mini-batch.

2. The method according to claim 1, in which training of the neural network is completed when the training quality reaches a given criterion.

3. The method according to any one of claims 1-2, in which an empty list of doubles is provided.

4. The method according to any one of claims 1-3, in which the number of people whose images are added to the mini-batch and the number of images of each such person are determined by a hyperparameter of the learning algorithm.

5. The method according to claim 1, in which all images of each person in the database have an identifier.

6. The method according to claim 1, in which the people whose images are added to the mini-batch from the database are selected at random.