
WO2020029406A1 - Human face emotion identification method and device, computer device and storage medium - Google Patents


Info

Publication number
WO2020029406A1
WO2020029406A1 (PCT/CN2018/108251)
Authority
WO
WIPO (PCT)
Prior art keywords
image
facial
emotion
training sample
feature vector
Prior art date
Application number
PCT/CN2018/108251
Other languages
French (fr)
Chinese (zh)
Inventor
吴壮伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020029406A1 publication Critical patent/WO2020029406A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, a device, a computer device, and a storage medium for facial emotion recognition.
  • facial expressions are an important carrier of human communication and an important form of non-verbal communication; they can express human emotions well.
  • human emotions affect human behavior to a certain extent. For example, when a driver is in negative emotions such as anger, sadness, or anxiety, it is easy to ignore the surrounding road conditions and respond more slowly to emergencies, resulting in a higher incidence of traffic accidents. Based on this, the behavior of drivers and other personnel can be guided by recognizing facial emotions. For example, if a driver is recognized as being in a negative emotional state, the driver can be prompted to adjust his or her emotional state to avoid a traffic accident. Therefore, how to accurately recognize facial emotions has become an urgent technical problem.
  • the present application provides a facial emotion recognition method, device, computer equipment, and storage medium to accurately recognize facial emotions.
  • the present application provides a facial emotion recognition method, which includes: acquiring video images collected in real time; performing wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors; obtaining a standard energy feature vector, and calculating a Euclidean distance value between each of the energy feature vectors and the standard energy feature vector according to an image difference calculation method; determining whether any of the plurality of Euclidean distance values exceeds a preset threshold; if so, taking the images whose energy feature vectors have Euclidean distance values exceeding the preset threshold as key frame images, where the number of key frame images is at least one; obtaining a pre-stored emotion recognition model, and recognizing the facial emotion in each of the key frame images based on the emotion recognition model; and obtaining the facial emotion corresponding to the video images according to the facial emotions in all the key frame images, to complete the recognition of the facial emotion.
  • the present application provides a facial emotion recognition device, which includes: an acquiring unit for acquiring video images collected in real time; a transform unit for performing wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors; a distance calculation unit configured to obtain a standard energy feature vector and calculate a Euclidean distance value between each energy feature vector and the standard energy feature vector according to an image difference calculation method; a distance judgment unit for determining whether any of the plurality of Euclidean distance values exceeds a preset threshold; a key frame acquisition unit for taking, if any Euclidean distance value exceeds the preset threshold, the images whose energy feature vectors have Euclidean distance values exceeding the preset threshold as key frame images, where the number of key frame images is at least one; an emotion recognition unit configured to obtain a pre-stored emotion recognition model and recognize the facial emotion in each of the key frame images based on the emotion recognition model; and an emotion acquiring unit for obtaining the facial emotion corresponding to the video images according to the facial emotions in all the key frame images, to complete the recognition of the facial emotion.
  • the present application further provides a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the facial emotion recognition method provided in the first aspect.
  • the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the facial emotion recognition method provided in the first aspect.
  • the application provides a method, a device, a computer device, and a storage medium for facial emotion recognition. This method can accurately recognize facial emotions.
  • FIG. 1 is a schematic flowchart of a facial emotion recognition method according to an embodiment of the present application
  • FIGS. 2 to 6 are further schematic flowcharts of a facial emotion recognition method provided by an embodiment of the present application.
  • FIGS. 7 and 8 are specific schematic flowcharts of a facial emotion recognition method provided by an embodiment of the present application.
  • FIG. 9 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a facial emotion recognition device according to an embodiment of the present application.
  • FIGS. 11 to 15 are further schematic block diagrams of a facial emotion recognition device according to an embodiment of the present application.
  • FIG. 16 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a facial emotion recognition method according to an embodiment of the present application.
  • the facial emotion recognition method can be applied to a facial emotion recognition system, and the facial emotion recognition system can be installed in a device with a camera function such as a mobile phone or a car.
  • the facial emotion recognition system can exist in the device as an independent system, or it can be embedded in other systems of the device.
  • a facial emotion recognition system can be embedded in a car driving system to identify the driver's emotions.
  • the facial emotion recognition system may be embedded in an application program of a mobile phone to assist the application program to realize a facial emotion recognition function.
  • the facial emotion recognition method includes steps S101 to S107.
  • the device where the facial emotion recognition system is located invokes a camera to perform real-time image acquisition of the user.
  • the device acquires video images collected within a certain period of time through the camera, for example, the video captured during the last 10 seconds of real-time acquisition. It can be understood that the video will include multiple frames of images.
  • because the facial emotion recognition method relies on information such as a neutral expression image, a standard energy feature vector, and an emotion recognition model, the facial emotion recognition system must also perform the following operations before the user uses it for facial emotion recognition, that is, before step S101:
  • FIG. 2 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application. Prior to step S101, steps S101a, S101b, and S101c are also included.
  • S101b Perform wavelet transform on the neutral expression image to obtain a corresponding standard energy feature vector.
  • a neutral expression image and a standard energy feature vector need to be prepared in advance.
  • the neutral expression may be a facial expression of the user in a relatively stable mood.
  • the facial expressions commonly used by users when taking ID photos can be understood as neutral expressions.
  • the device may issue a voice prompt or a text prompt to prompt the user to make a neutral expression.
  • the image of the user's neutral expression is captured by the camera to obtain a neutral expression image.
  • neutral expression images can also be obtained in other ways.
  • a neutral expression image, such as an ID photo input by the user, is acquired.
  • the user transfers the image of the neutral expression taken in the past into the device where the facial emotion recognition system is located as a neutral expression image.
  • the identity information input by the user is obtained, and the ID photo corresponding to the identity information is then obtained from a background server as the neutral expression image. The background server may be the background server of the vehicle system, of a mobile phone application, or of the facial emotion recognition system, and it can store the ID photos corresponding to users' identity information, or, after obtaining the identity information, call a third-party server or use technologies such as web crawlers to obtain the ID photo corresponding to the user's identity information from network data.
  • the neutral expression image is subjected to Gabor wavelet transform to obtain the corresponding standard energy feature vector, and the neutral expression image and the corresponding standard energy feature vector are stored, so that they can be called for facial emotion recognition whenever the user uses the facial emotion recognition system.
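To make the Gabor step concrete, the following is a minimal sketch of how a standard energy feature vector might be computed from a grayscale neutral expression image. It assumes Python with OpenCV and NumPy; the filter-bank parameters (three kernel sizes, eight orientations, and the sigma/lambda choices) are illustrative assumptions, not values specified by the patent.

```python
import cv2
import numpy as np

def gabor_energy_vector(gray_face, ksizes=(5, 9, 13), n_orientations=8):
    """Filter a grayscale face image with a small Gabor bank and return
    one energy value (sum of squared responses) per scale/orientation."""
    img = gray_face.astype(np.float32)
    energies = []
    for ksize in ksizes:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = cv2.getGaborKernel(ksize=(ksize, ksize), sigma=ksize / 3.0,
                                        theta=theta, lambd=ksize / 2.0,
                                        gamma=0.5, psi=0.0)
            response = cv2.filter2D(img, cv2.CV_32F, kernel)
            energies.append(float(np.sum(response ** 2)))  # sub-band energy
    return np.asarray(energies)

# The standard energy feature vector is computed once from the stored
# neutral expression image and reused for every later comparison:
# standard_vector = gabor_energy_vector(neutral_gray)
```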
  • FIG. 3 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application. Prior to step S101, steps S101d and S101e are also included.
  • S101d Obtain an emotional training sample image set, where the emotional training sample image set includes a plurality of emotional training sample images and an emotional label of a human face in the emotional training sample image.
  • S101e input the emotion training sample images and the corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and store the emotion recognition model.
  • an emotion recognition model needs to be prepared in advance.
  • the facial emotion recognition system needs to acquire a set of emotional training sample images.
  • the emotional training sample image set includes a large number of emotional training sample images and an emotional label of a human face corresponding to each emotional training sample image. It should be noted that the emotional labels of the faces in each of the emotional training sample images can be labeled manually, or can be labeled by other methods, which are not specifically limited here.
  • the emotion training sample images and the corresponding facial emotion labels are input into a convolutional neural network (CNN) model for machine learning to obtain the emotion recognition model, which is then stored in the device where the facial emotion recognition system is located, so that it can be called for emotion recognition when the system is subsequently used.
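As a rough illustration of the kind of convolutional neural network described, here is a compact sketch in PyTorch. The 7-class output matches the seven preset emotions discussed later; the input size, layer shapes, and optimizer settings are assumptions for the sketch, not details taken from the patent.

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """Small CNN mapping a 48x48 grayscale face crop to 7 emotion classes."""
    def __init__(self, n_classes=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):      # x: (batch, 1, 48, 48)
        return self.net(x)     # logits over the 7 preset emotions

model = EmotionCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # trained on (image, emotion label) pairs
```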
  • it may happen that the camera of the device where the facial emotion recognition system is located cannot properly capture the user in real time; for example, if the camera angle is wrong, the user's face may not appear in the video images collected in real time, or only half of the face may be captured. Such video images will inevitably reduce the accuracy of subsequent facial emotion recognition. Therefore, to ensure that a usable video image of the face can be captured and to improve the accuracy of subsequent facial emotion recognition, the camera needs to be calibrated before the real-time video images are acquired.
  • FIG. 4 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application.
  • steps S101f, S101g, S101h and S101j are also included.
  • S101g Extract a preset number of frames as a calibration image from a plurality of frames of the calibration video image according to a preset rule.
  • S101h Based on a pre-stored face detection and recognition model, determine whether face information exists in the calibration image in each frame.
  • step S101j: if there is no face information in at least one frame of the calibration image, issue a prompt message so that the user adjusts the camera angle according to the prompt information, and after the angle is adjusted, return to step S101f until face information exists in each frame of the calibration image.
  • the facial emotion recognition system needs to acquire a segment of a calibration video image collected in real time.
  • the calibration video image includes multiple frames of images. Then, according to a preset extraction rule, an image with a preset number of frames is extracted from a plurality of frames of the calibration video image as a calibration image.
  • the preset extraction rule may extract one image as a calibration image every 1 second.
  • the preset number of frames may be set to 100, for example, and can be adjusted according to actual requirements.
  • the preset extraction rule may not be limited to the above-mentioned rules, and may be set according to actual requirements, and is not limited here.
  • if face information exists in each frame of the calibration image, step S101 can be performed, that is, the step of acquiring the video images collected in real time.
  • a prompt message can be sent by voice or on a display, so that the user can readjust the camera angle according to the prompt.
  • the process then returns to step S101f, that is, the step of obtaining a calibration video image collected in real time, until face information is present in each frame of the calibration image, thereby completing the calibration of the camera angle.
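A minimal sketch of this calibration loop follows, assuming Python with OpenCV. OpenCV's pretrained frontal-face Haar cascade stands in here for the patent's own pre-stored face detection and recognition model, and the sampling interval is an illustrative assumption.

```python
import cv2

# Pretrained cascade used as a stand-in for the patent's face model.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def sample_calibration_frames(capture, n_samples=100, every=30):
    """Keep every `every`-th frame from the live capture as a calibration
    image until `n_samples` frames have been collected."""
    samples, i = [], 0
    while len(samples) < n_samples:
        ok, frame = capture.read()
        if not ok:
            break
        if i % every == 0:
            samples.append(frame)
        i += 1
    return samples

def camera_angle_ok(samples):
    """True only if the detector finds a face in every calibration image;
    otherwise the user should be prompted to adjust the camera angle."""
    for frame in samples:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return False
    return True
```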
  • FIG. 5 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application. Prior to step S101, steps S101k, S101m, S101n, and S101p are also included.
  • S101k Obtain a training sample image set, where the training sample image set includes multiple training sample images and a face label used to characterize whether or not face information exists in the training sample image.
  • S101m Obtain a face Haar feature vector of each training sample image.
  • S101n Input the face Haar feature vector and the face label corresponding to each training sample image into an AdaBoost boosting model based on a decision tree model for training, to obtain a face detection and recognition model.
  • a face detection recognition model needs to be prepared in advance, so as to be used when performing camera angle calibration.
  • a training sample image set is first obtained, where the training sample image set includes multiple training sample images, and a face label corresponding to each training sample image.
  • the face label is used to characterize whether there is face information in a corresponding face sample image.
  • Haar feature extraction is performed on each training sample image to obtain the face Haar feature vector corresponding to each training sample image.
  • the face Haar feature vector corresponding to each training sample image and the corresponding face label are input into an AdaBoost boosting model based on a decision tree model for training, and the face detection and recognition model is obtained.
  • the face detection and recognition model is stored in the device where the facial emotion recognition system is located.
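The training step itself can be sketched with scikit-learn, whose AdaBoostClassifier over shallow decision trees is a direct instance of an AdaBoost boosting model based on a decision tree model. The feature dimensionality, the number of estimators, and the placeholder training data below are assumptions for illustration only; the `estimator` keyword assumes scikit-learn 1.2 or later (older releases call it `base_estimator`).

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder training data: one Haar feature vector per sample image and
# a face label (1 = face information present, 0 = absent).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))
y_train = rng.integers(0, 2, size=200)

face_model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision-tree weak learner
    n_estimators=200,                               # illustrative value
)
face_model.fit(X_train, y_train)

# During calibration: does this frame's Haar feature vector contain a face?
haar_vector = X_train[0]
has_face = face_model.predict(haar_vector.reshape(1, -1))[0] == 1
```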
  • wavelet transform is required for all frame images in the video image to obtain the energy feature vector corresponding to each frame image.
  • the wavelet transform may be, for example, a Gabor wavelet transform.
  • the wavelet transform may also adopt other methods, which is not limited herein.
  • the standard energy feature vector is an energy feature vector obtained by performing wavelet transform on a neutral expression image of a user collected in advance.
  • the standard energy feature vector is stored in advance in a device where the facial emotion recognition system is located. Because the standard energy feature vector is stored in the device in advance, obtaining the standard energy feature vector is specifically to obtain the previously stored standard energy feature vector.
  • the Euclidean distance value between each energy feature vector obtained in step S102 and the standard energy feature vector is calculated according to an image difference calculation method.
  • since the standard energy feature vector is stored in advance in the device where the facial emotion recognition system is located, it can be directly called in step S103, thereby reducing the occupation of CPU resources of the device and reducing the calculation time.
  • alternatively, the device where the facial emotion recognition system is located may store only the neutral expression image in advance. In this case, when the standard energy feature vector is obtained in step S103, the previously stored neutral expression image is obtained first, and then wavelet transform is performed on it to obtain the standard energy feature vector; the time at which the standard energy feature vector is calculated is not limited here.
  • after the Euclidean distance value between each energy feature vector and the standard energy feature vector has been calculated in step S103, a plurality of Euclidean distance values are obtained, and it is then determined whether any of them exceeds a preset threshold. If a Euclidean distance value exceeds the preset threshold, it indicates that the facial expression in the corresponding image differs greatly from the neutral expression, and step S105 is executed.
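The threshold test in steps S103 to S105 amounts to a simple filter over the per-frame energy vectors; a sketch, with the threshold value left as a free parameter, might look as follows.

```python
import numpy as np

def select_key_frames(frames, energy_vectors, standard_vector, threshold):
    """Keep the frames whose energy feature vector lies farther than
    `threshold` (in Euclidean distance) from the neutral standard vector."""
    key_frames = []
    for frame, vec in zip(frames, energy_vectors):
        if np.linalg.norm(np.asarray(vec) - standard_vector) > threshold:
            key_frames.append(frame)
    # An empty result means no expression deviated from neutral; step S108
    # then falls back to the stored neutral expression image.
    return key_frames
```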
  • FIG. 6 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application.
  • if step S104 determines that no Euclidean distance value exceeds the preset threshold, step S108 is executed, that is, the neutral expression image corresponding to the standard energy feature vector is used as the key frame image, and the subsequent steps S106 and S107 are then performed.
  • the facial emotion corresponding to the video image may also be set as a neutral emotion to complete the recognition of the facial emotion.
  • an image whose energy feature vector has a Euclidean distance value exceeding the preset threshold is used as a key frame image, where the number of key frame images is at least one.
  • the number of Euclidean distance values exceeding a preset threshold may be one, or may be two or more. At this time, the number of key frame images is at least one.
  • the emotion recognition model is a model for recognizing facial emotions obtained by performing machine learning training in advance
  • the emotion recognition model may be, for example, a convolutional neural network model.
  • the device where the facial emotion recognition system is located first obtains the emotion recognition model, and then inputs the key frame image as an input value into the emotion recognition model.
  • the emotion recognition model performs emotion recognition on the key frame images to output the facial emotion in each key frame image.
  • FIG. 7 is a specific schematic flowchart of a facial emotion recognition method according to an embodiment of the present application.
  • This step S106 includes steps S1061 to S1063.
  • each of the key frame images is sequentially input as an input value into the emotion recognition model, and the emotion recognition model then outputs the probability values of each key frame image on the various preset emotions.
  • the multiple preset emotions include 7 preset emotions, namely fear, anger, sadness, disgust, joy, surprise, and neutrality.
  • the emotion recognition model will output, for the face in each key frame image, a probability on each of these 7 preset emotions. For example, for a certain key frame image, the probabilities over the above 7 preset emotions may be 10%, 70%, 15%, 5%, 0%, 0%, and 0%.
  • the emotion corresponding to the largest probability value among the multiple probability values for each key frame image is used as the facial emotion in that key frame image. In the example above, anger, with the largest probability value of 70%, is taken as the facial emotion of that key frame image.
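In code, this per-frame decision is a plain argmax over the model's probability output. The ordering of the seven emotions below is an assumption for illustration; only the set of seven and the example probabilities come from the text.

```python
import numpy as np

# Assumed ordering of the seven preset emotions.
EMOTIONS = ["fear", "anger", "sadness", "disgust", "joy", "surprise", "neutral"]

def frame_emotion(probabilities):
    """Return the preset emotion with the largest model probability."""
    return EMOTIONS[int(np.argmax(probabilities))]

frame_emotion([0.10, 0.70, 0.15, 0.05, 0.0, 0.0, 0.0])  # -> "anger"
```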
  • FIG. 8 is a specific schematic flowchart of a facial emotion recognition method according to an embodiment of the present application.
  • This step S107 includes steps S1071 to S1072.
  • the facial emotion with the highest probability of occurrence is used as the facial emotion corresponding to the video image, to complete the recognition of the facial emotion.
  • for example, suppose the number of key frame images is 10, and the emotion recognition model identifies the facial emotion of 8 key frame images as anger, of 1 key frame image as disgust, and of 1 key frame image as fear. Probability statistics over the facial emotions of the 10 key frame images then give anger a probability of occurrence of 80%, disgust 10%, and fear 10%. The anger emotion, having the highest probability of occurrence, is used as the facial emotion corresponding to the entire video image, thereby completing the recognition of the facial emotion within the time period corresponding to the video image.
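This statistics step is effectively a majority vote over the per-key-frame emotions; a sketch follows, reproducing the 8/1/1 example above.

```python
from collections import Counter

def video_emotion(frame_emotions):
    """Return the facial emotion that occurs most often across the key
    frames, together with its share of the frames."""
    emotion, count = Counter(frame_emotions).most_common(1)[0]
    return emotion, count / len(frame_emotions)

video_emotion(["anger"] * 8 + ["disgust", "fear"])  # -> ("anger", 0.8)
```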
  • FIG. 9 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application. After step S107, steps S109 to S112 are also included.
  • S111 Determine whether a probability of a facial emotion belonging to a preset emotion category exceeds a preset probability value.
  • the preset emotion category is the negative emotion category.
  • the facial emotions included in the negative emotion category are four types of fear, anger, sadness, and disgust.
  • suppose the number of video images in the emotion list within the two-minute window is 100; there will then be 100 facial emotions, and the probability of facial emotions belonging to the negative emotion category among these 100 facial emotions is counted, for example, 99%.
  • if the probability of facial emotions belonging to the negative emotion category exceeds the preset probability value of 80%, it means that the user has been in a negative emotional state during these 2 minutes.
  • in this case, the preset prompt mode and the preset prompt information are obtained, and the user is prompted with the preset prompt information according to the preset prompt mode.
  • the preset prompt mode may be, for example, a voice prompt mode, a text display mode, a voice prompt and vibration combination mode, and the like.
  • the preset prompt information may be, for example, "Your current mood is low, please pay attention to driving safely" and the like.
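Putting the monitoring step together, a sketch of the negative-emotion check over the emotion list might look as follows; the `print` call stands in for whatever preset prompt mode (voice, text, vibration) the system actually uses.

```python
NEGATIVE_EMOTIONS = {"fear", "anger", "sadness", "disgust"}

def should_prompt(emotion_list, preset_probability=0.8):
    """emotion_list holds one facial emotion per video segment recorded in
    the preset time period; prompt when the negative-emotion share exceeds
    the preset probability value."""
    share = sum(e in NEGATIVE_EMOTIONS for e in emotion_list) / len(emotion_list)
    return share > preset_probability

if should_prompt(["anger"] * 99 + ["neutral"]):  # 99% negative in the window
    print("Your current mood is low, please pay attention to driving safely.")
```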
  • the facial emotion recognition method in this embodiment can accurately recognize facial emotions.
  • An embodiment of the present application further provides a facial emotion recognition device, which is configured to execute any one of the foregoing facial emotion recognition methods.
  • FIG. 10 is a schematic block diagram of a facial emotion recognition device according to an embodiment of the present application.
  • the facial emotion recognition device 300 may be installed in a device such as a car or a mobile phone.
  • the facial emotion recognition device 300 includes an acquisition unit 301, a transformation unit 302, a distance calculation unit 303, a distance judgment unit 304, a key frame acquisition unit 305, an emotion recognition unit 306 and an emotion acquisition unit 307.
  • the obtaining unit 301 is configured to obtain a video image collected in real time.
  • FIG. 11 is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application.
  • the facial emotion recognition apparatus 300 further includes a storage unit 308.
  • the obtaining unit 301 is further configured to obtain a neutral expression image.
  • the transform unit 302 is further configured to perform wavelet transform on the neutral expression image to obtain a corresponding standard energy feature vector.
  • the storage unit 308 is configured to store the neutral expression image and the standard energy feature vector.
  • FIG. 12 is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application.
  • the facial emotion recognition apparatus 300 further includes an emotion model training unit 309.
  • the obtaining unit 301 is further configured to obtain an emotional training sample image set, where the emotional training sample image set includes a plurality of emotional training sample images and an emotional label of a human face in the emotional training sample image.
  • An emotional model training unit 309 is configured to input the emotional training sample image and a corresponding emotional label into a convolutional neural network model and perform machine learning to obtain an emotional recognition model, and store the emotional recognition model.
  • FIG. 13 is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application.
  • the facial emotion recognition device 300 further includes an extraction unit 310, a face determination unit 311, and a prompting unit 312.
  • the obtaining unit 301 is further configured to obtain a calibration video image collected in real time.
  • the extraction unit 310 is configured to extract an image of a preset number of frames from a plurality of frames of the calibration video image according to a preset rule as a calibration image.
  • the face judging unit 311 is configured to determine whether face information exists in the calibration image in each frame based on a face detection and recognition model stored in advance.
  • the obtaining unit 301 is further configured to obtain real-time collected video images if face information exists in the calibration image in each frame.
  • a prompting unit 312 is configured to issue a prompt message if face information does not exist in at least one frame of the calibration image, so that the user can adjust the camera angle according to the prompt information; after the camera angle is adjusted, the obtaining unit 301 returns to performing the step of acquiring a calibration video image collected in real time, until face information exists in each frame of the calibration image.
  • FIG. 14 is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application.
  • the facial emotion recognition apparatus 300 further includes a vector acquisition unit 313 and a face model training unit 314.
  • the obtaining unit 301 is further configured to obtain a training sample image set, where the training sample image set includes a plurality of training sample images and a face label used to characterize whether there is face information in the training sample image.
  • a vector obtaining unit 313 is configured to obtain a face Haar feature vector of each training sample image.
  • a face model training unit 314 is configured to input the face Haar feature vector and the face label corresponding to each training sample image into an AdaBoost boosting model based on a decision tree model for training, to obtain a face detection and recognition model, and to store the face detection and recognition model.
  • a transformation unit 302 is configured to perform wavelet transformation on all frame images in the video image to obtain corresponding energy feature vectors.
  • the distance calculation unit 303 is configured to obtain a standard energy feature vector, and calculate a Euclidean distance value between each of the energy feature vectors and the standard energy feature vector according to an image difference calculation method.
  • the distance judging unit 304 is configured to judge whether there is an Euclidean distance value among the plurality of Euclidean distance values that exceeds a preset threshold.
  • a key frame obtaining unit 305 is configured to use, as key frame images, the images whose energy feature vectors have Euclidean distance values exceeding the preset threshold, if such Euclidean distance values exist among the plurality of Euclidean distance values.
  • the key frame obtaining unit 305 is further configured to, if there is no Euclidean distance value exceeding the preset threshold value among the plurality of Euclidean distance values, convert the neutral expression image corresponding to the standard energy feature vector As the key frame image.
  • the emotion recognition unit 306 is configured to obtain a pre-stored emotion recognition model, and recognize a facial emotion in each of the key frame images based on the emotion recognition model.
  • the emotion recognition unit 306 is specifically configured to: sequentially input each of the key frame images as an input value into the emotion recognition model; obtain the probability values of each key frame image output by the emotion recognition model on the multiple preset emotions; and use the emotion corresponding to the largest probability value among the multiple probability values corresponding to each key frame image as the facial emotion in that key frame image.
  • the emotion acquiring unit 307 is configured to acquire the facial emotion corresponding to the video image according to the facial emotions in all the key frame images to complete the recognition of the facial emotion.
  • the emotion acquiring unit 307 is specifically configured to: perform probability statistics on the facial emotions in all the key frame images; and use the facial emotion with the highest probability of occurrence as the facial emotion corresponding to the video image, to complete the recognition of the facial emotion.
  • FIG. 15 is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application.
  • the facial emotion recognition device 300 further includes a recording unit 315, a statistics unit 316, a probability judgment unit 317, and an information prompting unit 318.
  • the recording unit 315 is configured to record a time period corresponding to the video image and a facial emotion corresponding to the video image into an emotion list.
  • the statistics unit 316 is configured to count, according to the emotion list, a probability of a facial emotion belonging to a preset emotional category among facial emotions corresponding to all the video images in a preset time period.
  • the probability judging unit 317 is configured to judge whether a probability of a facial emotion belonging to a preset emotion category exceeds a preset probability value.
  • an information prompting unit 318 is configured to obtain a preset prompt mode and preset prompt information if the probability of a facial emotion belonging to the preset emotion category exceeds the preset probability value, and to prompt the user with the preset prompt information according to the preset prompt mode.
  • the facial emotion recognition device 300 in this embodiment can accurately recognize facial emotions.
  • the above-mentioned facial emotion recognition device can be implemented in the form of a computer program, which can be run on a computer device as shown in FIG. 16.
  • FIG. 16 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 may be a terminal such as a mobile phone, or may be a device used in a car.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501.
  • the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • the computer program 5032 includes program instructions. When the program instructions are executed, the processor 502 can execute a method for facial emotion recognition.
  • the processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for running a computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can execute a method for facial emotion recognition.
  • the network interface 505 is used for network communication, such as sending assigned tasks.
  • FIG. 16 is only a block diagram of the part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer equipment 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the processor 502 is configured to run a computer program 5032 stored in the memory to achieve the following functions: acquiring video images collected in real time; performing wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors; obtaining a standard energy feature vector, and calculating a Euclidean distance value between each of the energy feature vectors and the standard energy feature vector according to an image difference calculation method; determining whether any of the plurality of Euclidean distance values exceeds a preset threshold; if so, taking the images whose energy feature vectors have Euclidean distance values exceeding the preset threshold as key frame images, where the number of key frame images is at least one; acquiring a pre-stored emotion recognition model, and recognizing the facial emotion in each of the key frame images based on the emotion recognition model; and acquiring the facial emotion corresponding to the video images according to the facial emotions in all the key frame images, to complete the recognition of the facial emotion.
  • before executing the acquisition of the video images collected in real time, the processor 502 also implements the following functions: acquiring a neutral expression image; performing wavelet transform on the neutral expression image to obtain a corresponding standard energy feature vector; and storing the neutral expression image and the standard energy feature vector.
  • the processor 502 also implements the following function before acquiring video images acquired in real time: acquiring an emotional training sample image set, wherein the emotional training sample image set includes a plurality of emotional training sample images and the Emotion labels of human faces in the emotion training sample image; and inputting the emotion training sample images and corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and storing the emotion recognition model.
  • the processor 502 also implements the following functions before acquiring the video images collected in real time: acquiring a training sample image set, where the training sample image set includes multiple training sample images and face labels used to characterize whether face information exists in the training sample images; obtaining the face Haar feature vector of each training sample image; inputting the face Haar feature vector and the face label corresponding to each training sample image into an AdaBoost boosting model based on a decision tree model for training, to obtain a face detection and recognition model; and storing the face detection and recognition model.
  • before executing the acquisition of the video images collected in real time, the processor 502 also implements the following functions: acquiring a calibration video image collected in real time; extracting a preset number of frames from the multiple frames of the calibration video image as calibration images according to a preset rule; determining, based on a pre-stored face detection and recognition model, whether face information exists in each frame of the calibration image; if face information exists in each frame, performing the step of acquiring the video images collected in real time; and if face information does not exist in at least one frame, issuing a prompt message so that the user adjusts the camera angle according to the prompt information, and returning to the step of acquiring a calibration video image collected in real time.
  • when recognizing the facial emotion in each of the key frame images based on the emotion recognition model, the processor 502 specifically implements the following functions: sequentially inputting each of the key frame images as an input value into the emotion recognition model; obtaining the probability values of each key frame image output by the emotion recognition model on the multiple preset emotions; and using the emotion corresponding to the largest probability value among the multiple probability values corresponding to each key frame image as the facial emotion in that key frame image.
  • when acquiring the facial emotion corresponding to the video image according to the facial emotions in all the key frame images to complete the recognition of the facial emotion, the processor 502 specifically implements the following functions: performing probability statistics on the facial emotions in all the key frame images; and using the facial emotion with the highest probability of occurrence as the facial emotion corresponding to the video image, to complete the recognition of the facial emotion.
  • the processor 502 may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • a person of ordinary skill in the art can understand that all or part of the processes in the embodiment of the method for recognizing facial emotions described above can be performed by a computer program instructing related hardware.
  • the computer program may be stored in a computer-readable storage medium.
  • the computer program is executed by at least one processor in the computer system to implement the process steps of the embodiment including the facial emotion recognition method as described above.
  • the storage medium may be various media that can store program codes, such as a U disk, a mobile hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk, or an optical disk.
  • each unit is only a logical function division, and there may be another division manner in actual implementation.
  • the steps in the method of the embodiment of the present application can be adjusted, combined, and deleted according to actual needs.
  • the units in the apparatus of the embodiment of the present application may be combined, divided, and deleted according to actual needs.
  • Each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • the integrated unit When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium.
  • the technical solution of this application, in essence or in the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a human face emotion identification method, a device, a computer device and a storage medium. The method comprises: obtaining an energy feature vector of each frame of a video image; calculating a Euclidean distance value between each energy feature vector and a standard energy feature vector; screening out key frame images according to the Euclidean distance values; identifying the human face emotion in each key frame image; and obtaining, according to the human face emotions in all key frame images, a human face emotion corresponding to the video image to complete identification of the human face emotion.

Description

Facial emotion recognition method and device, computer device and storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 7, 2018, with application number 201810892915.6 and the invention title "Facial emotion recognition method, device, computer equipment and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field

The present application relates to the field of computer technology, and in particular, to a facial emotion recognition method, a device, a computer device, and a storage medium.

Background Art

In people's daily life, 7% of information is conveyed through language, 38% through voice, and as much as 55% through facial expressions. Facial expressions are thus an important carrier of human communication and an important form of non-verbal communication; they can express human emotional states well.

In general, human emotions affect human behavior to a certain extent. For example, when a driver is in negative emotions such as anger, sadness, or anxiety, it is easy to ignore the surrounding road conditions and respond more slowly to emergencies, resulting in a higher incidence of traffic accidents. Based on this, the behavior of drivers and other personnel can be guided by recognizing facial emotions. For example, if a driver is recognized as being in a negative emotional state, the driver can be prompted to adjust his or her emotional state to avoid a traffic accident. Therefore, how to accurately recognize facial emotions has become an urgent technical problem.
Summary of the Invention

The present application provides a facial emotion recognition method, device, computer device, and storage medium, so as to accurately recognize facial emotions.

In a first aspect, the present application provides a facial emotion recognition method, which includes: acquiring video images collected in real time; performing wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors; obtaining a standard energy feature vector, and calculating a Euclidean distance value between each of the energy feature vectors and the standard energy feature vector according to an image difference calculation method; determining whether any of the plurality of Euclidean distance values exceeds a preset threshold; if so, taking the images whose energy feature vectors have Euclidean distance values exceeding the preset threshold as key frame images, where the number of key frame images is at least one; obtaining a pre-stored emotion recognition model, and recognizing the facial emotion in each of the key frame images based on the emotion recognition model; and obtaining the facial emotion corresponding to the video images according to the facial emotions in all the key frame images, to complete the recognition of the facial emotion.

In a second aspect, the present application provides a facial emotion recognition device, which includes: an acquiring unit for acquiring video images collected in real time; a transform unit for performing wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors; a distance calculation unit configured to obtain a standard energy feature vector and calculate a Euclidean distance value between each energy feature vector and the standard energy feature vector according to an image difference calculation method; a distance judgment unit for determining whether any of the plurality of Euclidean distance values exceeds a preset threshold; a key frame acquisition unit for taking, if any Euclidean distance value exceeds the preset threshold, the images whose energy feature vectors have Euclidean distance values exceeding the preset threshold as key frame images, where the number of key frame images is at least one; an emotion recognition unit configured to obtain a pre-stored emotion recognition model and recognize the facial emotion in each of the key frame images based on the emotion recognition model; and an emotion acquiring unit for obtaining the facial emotion corresponding to the video images according to the facial emotions in all the key frame images, to complete the recognition of the facial emotion.

In a third aspect, the present application further provides a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the facial emotion recognition method provided in the first aspect.

In a fourth aspect, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the facial emotion recognition method provided in the first aspect.

The present application provides a facial emotion recognition method, device, computer device, and storage medium. The method can accurately recognize facial emotions.
Brief Description of the Drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a facial emotion recognition method according to an embodiment of the present application;

FIGS. 2 to 6 are further schematic flowcharts of a facial emotion recognition method according to an embodiment of the present application;

FIGS. 7 and 8 are specific schematic flowcharts of a facial emotion recognition method according to an embodiment of the present application;

FIG. 9 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application;

FIG. 10 is a schematic block diagram of a facial emotion recognition device according to an embodiment of the present application;

FIGS. 11 to 15 are further schematic block diagrams of a facial emotion recognition device according to an embodiment of the present application;

FIG. 16 is a schematic block diagram of a computer device according to an embodiment of the present application.
具体实施方式detailed description
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In the following, the technical solutions in the embodiments of the present application will be clearly and completely described with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
请参阅图1,图1是本申请实施例提供的一种人脸情绪识别方法的示意流程图。该人脸情绪识别方法可以应用于人脸情绪识别系统,该人脸情绪识别系统可以安装于手机、汽车等具备摄像功能的设备中。该人脸情绪识别系统可以作为独立的系统存在于设备中,也可以嵌入至设备的其他系统中。譬如,人脸情绪识别系统可内嵌至车驾驶系统中,以识别司机的情绪。又譬如,该人脸情绪识别系统可内嵌至手机的某个应用程序中,以辅助该应用程序实现人脸情绪识别功能等。如图1所示,该人脸情绪识别方法包括步骤S101~S107。Please refer to FIG. 1, which is a schematic flowchart of a facial emotion recognition method according to an embodiment of the present application. The facial emotion recognition method can be applied to a facial emotion recognition system, and the facial emotion recognition system can be installed in a device with a camera function such as a mobile phone or a car. The facial emotion recognition system can exist in the device as an independent system, or it can be embedded in other systems of the device. For example, a facial emotion recognition system can be embedded in a car driving system to identify the driver's emotions. For another example, the facial emotion recognition system may be embedded in an application program of a mobile phone to assist the application program to realize a facial emotion recognition function. As shown in FIG. 1, the facial emotion recognition method includes steps S101 to S107.
S101、获取实时采集的视频图像。S101. Obtain a video image collected in real time.
当用户开启人脸情绪识别系统以进行人脸情绪识别时,该人脸情绪识别系统所在的设备调用摄像头以对用户进行实时的图像采集。该设备通过摄像头获取实时采集的一定时间段内的视频图像。譬如,获取实时采集的10秒内的视频 图像。可以理解的是,该视频图像将包括多帧图像。When the user turns on the facial emotion recognition system to perform facial emotion recognition, the device where the facial emotion recognition system is located invokes a camera to perform real-time image acquisition of the user. The device acquires video images collected within a certain period of time through a camera. For example, capture video images within 10 seconds of real-time capture. It can be understood that the video image will include multiple frames of images.
由于该人脸情绪识别方法在进行人脸情绪识别时,需要使用到中性表情图像、标准能量特征向量、情绪识别模型等信息,因此,在用户使用该人脸情绪识别系统进行人脸情绪识别之前,即,在步骤S101之前,人脸情绪识别系统还需执行以下操作:Since the facial emotion recognition method needs to use information such as a neutral expression image, a standard energy feature vector, and an emotion recognition model when performing facial emotion recognition, the user uses the facial emotion recognition system for facial emotion recognition. Before, that is, before step S101, the facial emotion recognition system also needs to perform the following operations:
在一实施例中,如图2所示,图2为本申请实施例提供的一种人脸情绪识别方法的另一示意流程图。在步骤S101之前,还包括步骤S101a、S101b和S101c。In an embodiment, as shown in FIG. 2, FIG. 2 is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application. Prior to step S101, steps S101a, S101b, and S101c are also included.
S101a、获取中性表情图像。S101a. Acquire a neutral expression image.
S101b、对所述中性表情图像进行小波变换以得到对应的标准能量特征向量。S101b: Perform wavelet transform on the neutral expression image to obtain a corresponding standard energy feature vector.
S101c、存储所述中性表情图像和标准能量特征向量。S101c. Store the neutral expression image and a standard energy feature vector.
In the embodiment shown in FIG. 2, the neutral expression image and the standard energy feature vector need to be prepared before facial emotion recognition is performed. The neutral expression may be the user's facial expression in a relatively calm emotional state; for example, the expression typically adopted when taking an ID photo can be understood as a neutral expression.
When the user uses the facial emotion recognition system for the first time, the device may issue a voice prompt, a text prompt or the like asking the user to assume a neutral expression; once the user has done so, the camera captures an image of the user's neutral expression to obtain the neutral expression image.
Of course, the neutral expression image may also be obtained in other ways. For example, when the user uses the system for the first time, a neutral expression image input by the user, such as an ID photo, may be acquired; that is, the user transfers an image of a neutral expression taken previously to the device where the system is located. As another example, when the user uses the system for the first time, the identity information input by the user may be acquired, and the ID photo corresponding to that identity information is then obtained from a back-end server as the neutral expression image, where the back-end server may be that of the in-vehicle system, of a mobile phone application, of the facial emotion recognition system itself, and so on. The back-end server may store the ID photos corresponding to users' identity information, or, after receiving the identity information, may call a third-party server or retrieve the corresponding ID photo from network data through technologies such as web crawlers. The manner of obtaining the neutral expression image is not limited here.
After the neutral expression image is obtained, a wavelet transform, such as a Gabor wavelet transform, is applied to it to obtain the corresponding standard energy feature vector, and the neutral expression image and the corresponding standard energy feature vector are both stored, so that when the user later performs facial emotion recognition with the system, they can be called directly for the recognition.
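The embodiment specifies only that a Gabor wavelet transform is applied; as one hedged reading, the energy feature vector could collect one energy value per Gabor sub-band, with the filter-bank size and parameter values below being illustrative assumptions:

```python
# Sketch: Gabor filter bank whose per-band mean response magnitude forms the
# "energy feature vector"; scales, orientations and wavelength are assumed.
import cv2
import numpy as np

def gabor_energy_vector(gray, ksize=31, sigmas=(2.0, 4.0, 8.0), n_orient=6):
    img = gray.astype(np.float32)
    energies = []
    for sigma in sigmas:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            # args: kernel size, sigma, orientation, wavelength, aspect ratio
            kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, 10.0, 0.5)
            response = cv2.filter2D(img, cv2.CV_32F, kernel)
            energies.append(float(np.mean(np.abs(response))))
    return np.asarray(energies, dtype=np.float32)
```

The same helper can produce both the standard energy feature vector of the neutral expression image and, later, the per-frame vectors of step S102.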
In an embodiment, as shown in FIG. 3, which is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application, steps S101d and S101e are further included before step S101.
S101d. Acquire an emotion training sample image set, where the set includes multiple emotion training sample images and an emotion label of the face in each emotion training sample image.
S101e. Input the emotion training sample images and the corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and store the emotion recognition model.
In the embodiment shown in FIG. 3, the emotion recognition model needs to be prepared before facial emotion recognition is performed. Specifically, the facial emotion recognition system acquires an emotion training sample image set comprising a large number of emotion training sample images and, for each, an emotion label of the face it contains. It should be noted that the emotion labels may be assigned manually or by other methods, which is not specifically limited here.
After the emotion training sample image set is obtained, the sample images and the corresponding face emotion labels are input into a convolutional neural network (CNN) model for machine learning, thereby obtaining the emotion recognition model, which is then stored in the device where the facial emotion recognition system is located, so that it can be called for emotion recognition in subsequent use of the system.
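The description specifies only "a convolutional neural network model"; as a hedged sketch of this training step, the following Keras code assumes 48x48 grayscale face crops and the seven preset emotions listed further below, with `train_images` and `train_labels` standing in for the labelled sample set:

```python
# Sketch of S101e: train and store a small CNN emotion classifier.
import tensorflow as tf

def build_emotion_model(input_shape=(48, 48, 1), n_classes=7):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_emotion_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# train_images, train_labels: the labelled emotion training sample set (assumed)
# model.fit(train_images, train_labels, epochs=20)
# model.save("emotion_model.h5")   # stored for later recognition calls
```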
In an embodiment, when the user activates the facial emotion recognition system, the camera of the device may fail to capture the user properly. For example, if the camera angle is wrong, the video images collected in real time may contain no facial information at all, or only half of the user's face; such video images would inevitably reduce the accuracy of subsequent facial emotion recognition. Therefore, to ensure that good facial video images can be captured and to improve the accuracy of subsequent recognition, the camera needs to be calibrated before the real-time video images are acquired.
Specifically, as shown in FIG. 4, which is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application, steps S101f, S101g, S101h and S101j are further included before step S101.
S101f. Acquire a calibration video image collected in real time.
S101g. Extract a preset number of frames from the multiple frames of the calibration video image according to a preset rule as calibration images.
S101h. Based on a pre-stored face detection and recognition model, determine whether face information exists in every frame of the calibration images.
S101j. If face information is absent from at least one frame of the calibration images, issue prompt information so that the user adjusts the angle of the camera according to the prompt information, and after the camera angle is adjusted, return to step S101f until face information exists in every frame of the calibration images.
In the embodiment shown in FIG. 4, the facial emotion recognition system acquires a segment of calibration video collected in real time; it can be understood that this calibration video comprises multiple frames. A preset number of frames are then extracted from it according to a preset extraction rule as calibration images.
In an embodiment, the preset extraction rule may be to extract one image per second as a calibration image, and the preset number of frames may be set to 100. Both can be set according to actual requirements, and the extraction rule is not limited to the one above. After the calibration images are obtained, a pre-stored face detection and recognition model is retrieved, which is used to identify whether face information exists in each calibration image.
If the face detection and recognition model determines that face information exists in every calibration image, the current camera angle is good and suitable facial video images can be captured; at this point, step S101, that is, acquiring the video image collected in real time, can be performed.
If face information is absent from at least one calibration frame, the current camera angle is unsuitable and needs adjustment. In this case, prompt information may be issued by voice, on a display, or in another way, so that the user readjusts the camera angle according to the prompt; after the adjustment, the process returns to step S101f, that is, acquiring a calibration video image collected in real time, until face information exists in every calibration frame, thereby completing the angle calibration of the camera.
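A minimal sketch of this calibration loop is given below, with OpenCV's bundled frontal-face Haar cascade standing in for the pre-stored face detection and recognition model described in connection with FIG. 5; the one-image-per-second rule and the 100-frame budget follow the example above:

```python
# Sketch of S101f-S101j: sample calibration images and check each for a face.
import cv2

face_model = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def calibration_ok(source=0, max_frames=100):
    cap = cv2.VideoCapture(source)
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or 25   # fall back if FPS is unknown
    ok, index, taken = True, 0, 0
    while taken < max_frames:
        ret, frame = cap.read()
        if not ret:
            break
        if index % fps == 0:                      # one calibration image per second
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_model.detectMultiScale(gray, 1.1, 5)
            taken += 1
            if len(faces) == 0:                   # a faceless frame fails the check
                ok = False
                break
        index += 1
    cap.release()
    return ok   # False: prompt the user to adjust the camera, then retry
```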
Since the face detection and recognition model is needed when the camera angle is calibrated, it must be generated in advance and stored in the device where the facial emotion recognition system is located. In an embodiment, as shown in FIG. 5, which is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application, steps S101k, S101m, S101n and S101p are further included before step S101.
S101k. Acquire a training sample image set, where the set includes multiple training sample images and a face label characterizing whether face information exists in each training sample image.
S101m. Acquire the face Haar feature vector of each training sample image.
S101n. Input the face Haar feature vectors and face labels corresponding to the training sample images into an Adaboost boosting model based on a decision tree model for training, to obtain a face detection and recognition model.
S101p. Store the face detection and recognition model.
In the embodiment shown in FIG. 5, the face detection and recognition model is prepared before facial emotion recognition is performed, so that it can be used during camera angle calibration. Specifically, a training sample image set is first acquired, containing multiple training sample images and, for each, a face label characterizing whether face information exists in that sample image. Haar feature extraction is then performed on each training sample image to obtain its face Haar feature vector. The Haar feature vector and face label corresponding to each training sample image are input into an Adaboost boosting model based on decision trees for training, which yields the face detection and recognition model. Finally, the model is stored in the device where the facial emotion recognition system is located.
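Assuming the Haar feature vectors have already been extracted into an (n_samples, n_features) array, the training step might be sketched with scikit-learn, whose AdaBoost classifier uses depth-1 decision trees as its default weak learner, matching the decision-tree-based boosting model described here; `haar_features` and `labels` are placeholders for the prepared training data:

```python
# Sketch of S101k-S101p: train and store the face detection model.
import joblib
from sklearn.ensemble import AdaBoostClassifier

def train_face_detector(haar_features, labels):
    # default weak learner: a depth-1 decision tree (decision stump)
    model = AdaBoostClassifier(n_estimators=200)
    model.fit(haar_features, labels)   # labels: 1 = face present, 0 = absent
    return model

# detector = train_face_detector(haar_features, labels)
# joblib.dump(detector, "face_detector.joblib")   # stored for calibration
```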
S102. Perform a wavelet transform on all frame images in the video image to obtain corresponding energy feature vectors.
After the video image is obtained in step S101, a wavelet transform is applied to all of its frames to obtain the energy feature vector corresponding to each frame. In an embodiment, the wavelet transform may be, for example, a Gabor wavelet transform; of course, other wavelet transforms may also be used, which is not limited here.
S103. Acquire a standard energy feature vector, and calculate the Euclidean distance value between each energy feature vector and the standard energy feature vector according to an image difference operation method.
The standard energy feature vector is the energy feature vector obtained by wavelet-transforming the user's pre-collected neutral expression image. In this embodiment, it is stored in advance in the device where the facial emotion recognition system is located; since the device pre-stores the vector, acquiring the standard energy feature vector specifically means acquiring the pre-stored standard energy feature vector.
After the standard energy feature vector is obtained, the Euclidean distance value between it and each of the energy feature vectors from step S102 is calculated according to the image difference operation method.
It should be noted that, because the standard energy feature vector is pre-stored in the device in this embodiment, it can be called directly in step S103, which reduces the occupation of the device's CPU resources and shortens the calculation time. Of course, in other embodiments, the device may pre-store only the neutral expression image; in that case, when the standard energy feature vector is acquired in step S103, the pre-stored neutral expression image is first retrieved and then wavelet-transformed to obtain the standard energy feature vector. The time at which the standard energy feature vector is calculated is not limited here.
S104. Determine whether a Euclidean distance value exceeding a preset threshold exists among the multiple Euclidean distance values.
After the Euclidean distance value between each energy feature vector and the standard energy feature vector is calculated in step S103, multiple Euclidean distance values are obtained, and it is then determined whether any of them exceeds the preset threshold. If such a value exists, the difference between the facial expression in the video image and the neutral expression is large, and step S105 is executed.
In an embodiment, as shown in FIG. 6, which is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application, when step S104 determines that no Euclidean distance value exceeds the preset threshold, the difference between the facial expression in the current video image and the neutral expression is small. In that case, step S108 is executed, that is, the neutral expression image corresponding to the standard energy feature vector is taken as the key frame image, after which the subsequent steps S106, S107 and so on are performed. Of course, in other embodiments, if step S104 determines that no Euclidean distance value exceeds the preset threshold, the facial emotion corresponding to the video image may instead be set directly to a neutral emotion, thereby completing the recognition of the facial emotion.
S105. If Euclidean distance values exceeding the preset threshold exist among the multiple Euclidean distance values, take the images corresponding to the energy feature vectors whose Euclidean distance values exceed the preset threshold as key frame images, where the number of key frame images is at least one.
In this embodiment, there may be one, two or more Euclidean distance values exceeding the preset threshold, in which case the number of key frame images is correspondingly at least one.
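Steps S102 to S105 can be summarized in one hedged sketch, reusing the `gabor_energy_vector` helper sketched earlier; the threshold value is an assumption to be tuned in practice:

```python
# Sketch of S102-S105: per-frame energy vectors, Euclidean distances to the
# standard vector, and selection of above-threshold frames as key frames.
import cv2
import numpy as np

def select_key_frames(frames, standard_vector, threshold):
    key_frames = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        vec = gabor_energy_vector(gray)                    # see earlier sketch
        distance = np.linalg.norm(vec - standard_vector)   # Euclidean distance
        if distance > threshold:
            key_frames.append(frame)
    return key_frames   # may be empty if every frame stays close to neutral
```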
S106. Acquire the pre-stored emotion recognition model, and recognize the facial emotion in each key frame image based on the emotion recognition model.
In this embodiment, the emotion recognition model is a model for recognizing facial emotions obtained through prior machine learning training, for example a convolutional neural network model. The device where the facial emotion recognition system is located first retrieves the model and then inputs each key frame image into it as an input value; the model performs emotion recognition on the key frame images and outputs the facial emotion in each of them.
Specifically, in an embodiment, as shown in FIG. 7, which is a specific schematic flowchart of a facial emotion recognition method according to an embodiment of the present application, step S106 includes steps S1061 to S1063.
S1061. Sequentially input each key frame image as an input value into the emotion recognition model.
S1062. Acquire the probability values, output by the emotion recognition model, of each key frame image over multiple preset emotions.
S1063. Take the emotion corresponding to the largest of the multiple probability values of each key frame image as the facial emotion in that key frame image.
In the embodiment shown in FIG. 7, each key frame image is sequentially input into the emotion recognition model as an input value, and the model outputs the probability values of each key frame image over multiple preset emotions. For example, the preset emotions may be seven: fear, anger, sadness, disgust, happiness, surprise and neutral. The model identifies, for the face in each key frame image, the probability over these seven preset emotions; for instance, it may output probabilities of 10%, 70%, 15%, 5%, 0%, 0% and 0% respectively for a certain key frame image.
Then, the emotion corresponding to the largest of the probability values of each key frame image is taken as the facial emotion in that key frame image; in the example above, anger, with the largest probability value of 70%, is taken as the facial emotion of that key frame image.
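Continuing the hedged Keras sketch from above (the stored model is assumed to output one softmax probability per preset emotion), steps S1061 to S1063 reduce to an argmax over the model output:

```python
# Sketch of S1061-S1063: pick the preset emotion with the largest probability.
import numpy as np

EMOTIONS = ["fear", "anger", "sadness", "disgust",
            "happiness", "surprise", "neutral"]   # the seven preset emotions

def recognize_key_frame(model, key_frame):
    # key_frame: preprocessed input, e.g. a 48x48x1 array in the sketch above
    probs = model.predict(key_frame[np.newaxis, ...])[0]
    return EMOTIONS[int(np.argmax(probs))]
```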
S107. Acquire the facial emotion corresponding to the video image according to the facial emotions in all the key frame images, so as to complete the recognition of the facial emotion.
Specifically, in an embodiment, as shown in FIG. 8, which is a specific schematic flowchart of a facial emotion recognition method according to an embodiment of the present application, step S107 includes steps S1071 to S1072.
S1071. Perform probability statistics on the facial emotions in all the key frame images.
S1072. Take the facial emotion with the largest probability of occurrence as the facial emotion corresponding to the video image, so as to complete the recognition of the facial emotion.
For example, if there are 10 key frame images and the emotion recognition model identifies anger in eight of them, disgust in one and fear in one, probability statistics over the facial emotions of the 10 key frame images show that anger occurs with a probability of 80%, disgust with 10% and fear with 10%. Anger, the emotion with the largest probability of occurrence, is then taken as the facial emotion corresponding to the whole video image, thereby completing the recognition of the facial emotion within the time period to which the video image corresponds.
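As this example shows, step S107 amounts to simple frequency statistics; one minimal sketch:

```python
# Sketch of S1071-S1072: majority vote over the per-key-frame emotions.
from collections import Counter

def video_emotion(key_frame_emotions):
    # e.g. 8x "anger", 1x "disgust", 1x "fear" -> "anger" (80%)
    return Counter(key_frame_emotions).most_common(1)[0][0]
```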
In an embodiment, as shown in FIG. 9, which is another schematic flowchart of a facial emotion recognition method according to an embodiment of the present application, steps S109 to S112 are further included after step S107.
S109. Record the time period corresponding to the video image and the facial emotion corresponding to the video image into an emotion list.
S110. According to the emotion list, count the probability that the facial emotions corresponding to all the video images within a preset time period belong to a preset emotion class.
S111. Determine whether the probability of the facial emotions belonging to the preset emotion class exceeds a preset probability value.
S112. If the probability of the facial emotions belonging to the preset emotion class exceeds the preset probability value, acquire a preset prompt mode and preset prompt information, and present the preset prompt information to the user according to the preset prompt mode.
For example, suppose the preset time period is 2 minutes and the preset emotion class is a negative emotion class comprising four facial emotions: fear, anger, sadness and disgust. If the emotion list contains 100 video images within those 2 minutes, there are 100 facial emotions; the proportion of them belonging to the negative emotion class is then counted, say 99%. When this proportion exceeds the preset probability value of 80%, the user has been in a negative emotional state throughout these 2 minutes, so the preset prompt mode and preset prompt information are acquired and the information is presented to the user accordingly. The preset prompt mode may be, for example, a voice prompt, a text display, or a combination of voice prompt and vibration; the preset prompt information may be, for example, "Your current mood is low; please drive safely."
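A hedged sketch of this monitoring logic follows, with the 2-minute window, the four negative emotions and the 80% trigger taken from the example above; the emotion-list layout and the prompt text are assumptions:

```python
# Sketch of S109-S112: alert when recent emotions are predominantly negative.
NEGATIVE = {"fear", "anger", "sadness", "disgust"}

def check_emotion_list(emotion_list, window_s=120, trigger=0.8):
    # emotion_list: [(timestamp_seconds, emotion), ...] in time order
    if not emotion_list:
        return
    latest = emotion_list[-1][0]
    recent = [e for t, e in emotion_list if latest - t <= window_s]
    if sum(e in NEGATIVE for e in recent) / len(recent) > trigger:
        print("Your current mood is low; please drive safely.")  # preset prompt
```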
The facial emotion recognition method in this embodiment can accurately recognize facial emotions.
An embodiment of the present application further provides a facial emotion recognition device, configured to execute any one of the foregoing facial emotion recognition methods. Specifically, please refer to FIG. 10, which is a schematic block diagram of a facial emotion recognition device according to an embodiment of the present application. The facial emotion recognition device 300 may be installed in a device such as a car or a mobile phone.
As shown in FIG. 10, the facial emotion recognition device 300 includes an acquisition unit 301, a transformation unit 302, a distance calculation unit 303, a distance judgment unit 304, a key frame acquisition unit 305, an emotion recognition unit 306 and an emotion acquisition unit 307.
The acquisition unit 301 is configured to acquire a video image collected in real time.
In an embodiment, as shown in FIG. 11, which is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application, the facial emotion recognition device 300 further includes a storage unit 308.
The acquisition unit 301 is further configured to acquire a neutral expression image.
The transformation unit 302 is further configured to perform a wavelet transform on the neutral expression image to obtain a corresponding standard energy feature vector.
The storage unit 308 is configured to store the neutral expression image and the standard energy feature vector.
In an embodiment, as shown in FIG. 12, which is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application, the facial emotion recognition device 300 further includes an emotion model training unit 309.
The acquisition unit 301 is further configured to acquire an emotion training sample image set, where the set includes multiple emotion training sample images and an emotion label of the face in each emotion training sample image.
The emotion model training unit 309 is configured to input the emotion training sample images and the corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and to store the emotion recognition model.
In an embodiment, as shown in FIG. 13, which is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application, the facial emotion recognition device 300 further includes an extraction unit 310, a face judgment unit 311 and a prompting unit 312.
The acquisition unit 301 is further configured to acquire a calibration video image collected in real time.
The extraction unit 310 is configured to extract a preset number of frames from the multiple frames of the calibration video image according to a preset rule as calibration images.
The face judgment unit 311 is configured to determine, based on a pre-stored face detection and recognition model, whether face information exists in every frame of the calibration images.
The acquisition unit 301 is further configured to acquire the video image collected in real time if face information exists in every frame of the calibration images.
The prompting unit 312 is configured to issue prompt information if face information is absent from at least one frame of the calibration images, so that the user adjusts the angle of the camera according to the prompt information; after the camera angle is adjusted, the acquisition unit 301 returns to the step of acquiring a calibration video image collected in real time, until face information exists in every frame of the calibration images.
Correspondingly, in an embodiment, as shown in FIG. 14, which is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application, the facial emotion recognition device 300 further includes a vector acquisition unit 313 and a face model training unit 314.
The acquisition unit 301 is further configured to acquire a training sample image set, where the set includes multiple training sample images and a face label characterizing whether face information exists in each training sample image.
The vector acquisition unit 313 is configured to acquire the face Haar feature vector of each training sample image.
The face model training unit 314 is configured to input the face Haar feature vectors and face labels corresponding to the training sample images into an Adaboost boosting model based on a decision tree model for training, to obtain a face detection and recognition model, and to store the face detection and recognition model.
The transformation unit 302 is configured to perform a wavelet transform on all frame images in the video image to obtain corresponding energy feature vectors.
The distance calculation unit 303 is configured to acquire the standard energy feature vector, and calculate the Euclidean distance value between each energy feature vector and the standard energy feature vector according to the image difference operation method.
The distance judgment unit 304 is configured to determine whether a Euclidean distance value exceeding a preset threshold exists among the multiple Euclidean distance values.
The key frame acquisition unit 305 is configured to, if Euclidean distance values exceeding the preset threshold exist among the multiple Euclidean distance values, take the images corresponding to the energy feature vectors whose Euclidean distance values exceed the preset threshold as key frame images, where the number of key frame images is at least one.
In an embodiment, the key frame acquisition unit 305 is further configured to take the neutral expression image corresponding to the standard energy feature vector as the key frame image if no Euclidean distance value exceeding the preset threshold exists among the multiple Euclidean distance values.
The emotion recognition unit 306 is configured to acquire the pre-stored emotion recognition model, and recognize the facial emotion in each key frame image based on the emotion recognition model.
Specifically, in an embodiment, the emotion recognition unit 306 is configured to: sequentially input each key frame image as an input value into the emotion recognition model; acquire the probability values, output by the emotion recognition model, of each key frame image over multiple preset emotions; and take the emotion corresponding to the largest of the multiple probability values of each key frame image as the facial emotion in that key frame image.
The emotion acquisition unit 307 is configured to acquire the facial emotion corresponding to the video image according to the facial emotions in all the key frame images, so as to complete the recognition of the facial emotion.
Specifically, in an embodiment, the emotion acquisition unit 307 is configured to: perform probability statistics on the facial emotions in all the key frame images; and take the facial emotion with the largest probability of occurrence as the facial emotion corresponding to the video image, so as to complete the recognition of the facial emotion.
In an embodiment, as shown in FIG. 15, which is another schematic block diagram of a facial emotion recognition device according to an embodiment of the present application, the facial emotion recognition device 300 further includes a recording unit 315, a statistics unit 316, a probability judgment unit 317 and an information prompting unit 318.
The recording unit 315 is configured to record the time period corresponding to the video image and the facial emotion corresponding to the video image into an emotion list.
The statistics unit 316 is configured to count, according to the emotion list, the probability that the facial emotions corresponding to all the video images within a preset time period belong to a preset emotion class.
The probability judgment unit 317 is configured to determine whether the probability of the facial emotions belonging to the preset emotion class exceeds a preset probability value.
The information prompting unit 318 is configured to acquire a preset prompt mode and preset prompt information if the probability of the facial emotions belonging to the preset emotion class exceeds the preset probability value, and to present the preset prompt information to the user according to the preset prompt mode.
It should be noted that a person skilled in the art can clearly understand that, for the specific implementation processes of the facial emotion recognition device 300 and its units, reference may be made to the corresponding descriptions in the foregoing embodiments of the facial emotion recognition method; for convenience and brevity of description, they are not repeated here.
The facial emotion recognition device 300 in this embodiment can accurately recognize facial emotions.
The above facial emotion recognition device may be implemented in the form of a computer program, which can run on a computer device as shown in FIG. 16.
Please refer to FIG. 16, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a terminal such as a mobile phone, or a device applied in a car.
Referring to FIG. 16, the computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions which, when executed, cause the processor 502 to perform a facial emotion recognition method. The processor 502 is configured to provide computing and control capabilities to support the operation of the entire computer device 500. The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can perform a facial emotion recognition method. The network interface 505 is used for network communication, such as sending assigned tasks. A person skilled in the art can understand that the structure shown in FIG. 16 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: acquiring a video image collected in real time; performing a wavelet transform on all frame images in the video image to obtain corresponding energy feature vectors; acquiring a standard energy feature vector, and calculating the Euclidean distance value between each energy feature vector and the standard energy feature vector according to an image difference operation method; determining whether a Euclidean distance value exceeding a preset threshold exists among the multiple Euclidean distance values; if Euclidean distance values exceeding the preset threshold exist among the multiple Euclidean distance values, taking the images corresponding to the energy feature vectors whose Euclidean distance values exceed the preset threshold as key frame images, where the number of key frame images is at least one; acquiring a pre-stored emotion recognition model, and recognizing the facial emotion in each key frame image based on the emotion recognition model; and acquiring the facial emotion corresponding to the video image according to the facial emotions in all the key frame images, so as to complete the recognition of the facial emotion.
In an embodiment, before acquiring the video image collected in real time, the processor 502 further implements the following functions: acquiring a neutral expression image; performing a wavelet transform on the neutral expression image to obtain a corresponding standard energy feature vector; and storing the neutral expression image and the standard energy feature vector.
In an embodiment, before acquiring the video image collected in real time, the processor 502 further implements the following functions: acquiring an emotion training sample image set, where the set includes multiple emotion training sample images and an emotion label of the face in each emotion training sample image; and inputting the emotion training sample images and the corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and storing the emotion recognition model.
In an embodiment, before acquiring the video image collected in real time, the processor 502 further implements the following functions: acquiring a training sample image set, where the set includes multiple training sample images and a face label characterizing whether face information exists in each training sample image; acquiring the face Haar feature vector of each training sample image; inputting the face Haar feature vectors and face labels corresponding to the training sample images into an Adaboost boosting model based on a decision tree model for training, to obtain a face detection and recognition model; and storing the face detection and recognition model.
In an embodiment, before acquiring the video image collected in real time, the processor 502 further implements the following functions: acquiring a calibration video image collected in real time; extracting a preset number of frames from the multiple frames of the calibration video image according to a preset rule as calibration images; determining, based on a pre-stored face detection and recognition model, whether face information exists in every frame of the calibration images; if face information exists in every frame of the calibration images, performing the step of acquiring the video image collected in real time; and if face information is absent from at least one frame of the calibration images, issuing prompt information so that the user adjusts the angle of the camera according to the prompt information, and after the camera angle is adjusted, returning to the step of acquiring a calibration video image collected in real time, until face information exists in every frame of the calibration images.
In an embodiment, when recognizing the facial emotion in each key frame image based on the emotion recognition model, the processor 502 specifically implements the following functions: sequentially inputting each key frame image as an input value into the emotion recognition model; acquiring the probability values, output by the emotion recognition model, of each key frame image over multiple preset emotions; and taking the emotion corresponding to the largest of the multiple probability values of each key frame image as the facial emotion in that key frame image.
In an embodiment, when acquiring the facial emotion corresponding to the video image according to the facial emotions in all the key frame images to complete the recognition of the facial emotion, the processor 502 specifically implements the following functions: performing probability statistics on the facial emotions in all the key frame images; and taking the facial emotion with the largest probability of occurrence as the facial emotion corresponding to the video image, so as to complete the recognition of the facial emotion.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit, and may also be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
A person of ordinary skill in the art can understand that all or part of the processes in the above embodiments of the facial emotion recognition method may be implemented by a computer program instructing the related hardware. The computer program may be stored in a computer-readable storage medium and is executed by at least one processor in the computer system to implement the process steps of the embodiments of the facial emotion recognition methods described above.
The storage medium may be any of various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disc.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the division into units is only a division by logical function, and there may be other division manners in actual implementation. The steps in the methods of the embodiments of the present application may be reordered, combined and pruned according to actual needs, and the units in the devices of the embodiments may likewise be combined, divided and pruned according to actual needs. The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a terminal, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with the technical field can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. A facial emotion recognition method, comprising:
    acquiring a video image collected in real time;
    performing a wavelet transform on all frame images in the video image to obtain corresponding energy feature vectors;
    acquiring a standard energy feature vector, and calculating a Euclidean distance value between each of the energy feature vectors and the standard energy feature vector according to an image difference operation method;
    determining whether a Euclidean distance value exceeding a preset threshold exists among the plurality of Euclidean distance values;
    if Euclidean distance values exceeding the preset threshold exist among the plurality of Euclidean distance values, taking the images corresponding to the energy feature vectors whose Euclidean distance values exceed the preset threshold as key frame images, wherein the number of the key frame images is at least one;
    acquiring a pre-stored emotion recognition model, and recognizing a facial emotion in each of the key frame images based on the emotion recognition model; and
    acquiring a facial emotion corresponding to the video image according to the facial emotions in all the key frame images, so as to complete the recognition of the facial emotion.
  2. The facial emotion recognition method according to claim 1, wherein before the acquiring a video image collected in real time, the method further comprises: acquiring a neutral expression image; performing a wavelet transform on the neutral expression image to obtain a corresponding standard energy feature vector; and storing the neutral expression image and the standard energy feature vector.
  3. The facial emotion recognition method according to claim 1, wherein before the acquiring a video image collected in real time, the method further comprises: acquiring an emotion training sample image set, wherein the emotion training sample image set includes a plurality of emotion training sample images and an emotion label of the face in each emotion training sample image; and inputting the emotion training sample images and the corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and storing the emotion recognition model.
  4. The facial emotion recognition method according to claim 1, wherein before the acquiring a video image collected in real time, the method further comprises: acquiring a training sample image set, wherein the training sample image set includes a plurality of training sample images and a face label characterizing whether face information exists in each training sample image; acquiring a face Haar feature vector of each training sample image; inputting the face Haar feature vectors and the face labels corresponding to the training sample images into an Adaboost boosting model based on a decision tree model for training, to obtain a face detection and recognition model; and storing the face detection and recognition model.
  5. The facial emotion recognition method according to claim 4, wherein before the acquiring a video image collected in real time, the method further comprises: acquiring a calibration video image collected in real time; extracting a preset number of frames from the plurality of frames of the calibration video image according to a preset rule as calibration images; determining, based on a pre-stored face detection and recognition model, whether face information exists in every frame of the calibration images; if face information exists in every frame of the calibration images, performing the step of acquiring a video image collected in real time; and if face information is absent from at least one frame of the calibration images, issuing prompt information so that the user adjusts the angle of the camera according to the prompt information, and after the angle of the camera is adjusted, returning to the step of acquiring a calibration video image collected in real time, until face information exists in every frame of the calibration images.
  6. 根据权利要求1所述的人脸情绪识别方法,其中,所述基于所述情绪识别模型识别每个所述关键帧图像中的人脸情绪,包括:依次将每个所述关键帧图像作为输入值输入至所述情绪识别模型中;获取所述情绪识别模型输出的每个所述关键帧图像在多种预设情绪上的概率值;以及将每个所述关键帧图像对应的多个概率值中较大的概率值对应的情绪作为所述关键帧图像中的人脸情绪。The method for facial emotion recognition according to claim 1, wherein the identifying facial emotions in each of the key frame images based on the emotion recognition model comprises: sequentially using each of the key frame images as an input Inputting values into the emotion recognition model; obtaining probability values of each of the key frame images output by the emotion recognition model on a plurality of preset emotions; and a plurality of probabilities corresponding to each of the key frame images The emotion corresponding to the larger probability value among the values is used as the facial emotion in the key frame image.
7. The facial emotion recognition method according to claim 1, wherein the obtaining, from the facial emotions in all the key frame images, of the facial emotion corresponding to the video image to complete the facial emotion recognition comprises: performing probability statistics on the facial emotions in all the key frame images; and taking the facial emotion with the highest probability of occurrence as the facial emotion corresponding to the video image, to complete the facial emotion recognition.
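The video-level decision in claim 7 is a frequency statistic over the per-keyframe emotions; a simple majority vote, as sketched here, is one way to read it.

```python
# Claim 7 as a majority vote: the video emotion is the most frequent
# keyframe emotion.
from collections import Counter

def video_emotion(keyframe_emotions):
    """keyframe_emotions: e.g. ["happy", "happy", "neutral"]."""
    return Counter(keyframe_emotions).most_common(1)[0][0]
```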
8. The facial emotion recognition method according to claim 2, wherein after the determining of whether any of the plurality of Euclidean distance values exceeds the preset threshold, the method further comprises: if none of the plurality of Euclidean distance values exceeds the preset threshold, taking the neutral expression image corresponding to the standard energy feature vector as the key frame image, and returning to the step of obtaining the pre-stored emotion recognition model and recognizing, based on the emotion recognition model, the facial emotion in each of the key frame images.
9. The facial emotion recognition method according to claim 2, wherein after the determining of whether any of the plurality of Euclidean distance values exceeds the preset threshold, the method further comprises: if none of the plurality of Euclidean distance values exceeds the preset threshold, setting the facial emotion corresponding to the video image to a neutral emotion, to complete the facial emotion recognition.
10. The facial emotion recognition method according to claim 1, wherein after the obtaining, from the facial emotions in all the key frame images, of the facial emotion corresponding to the video image, the method further comprises: recording the time period corresponding to the video image and the facial emotion corresponding to the video image in an emotion list; counting, according to the emotion list, the probability that the facial emotions corresponding to all the video images within a preset time period belong to a preset emotion category; determining whether the probability of the facial emotions belonging to the preset emotion category exceeds a preset probability value; and if the probability of the facial emotions belonging to the preset emotion category exceeds the preset probability value, obtaining a preset prompt mode and preset prompt information, and presenting the preset prompt information to the user in the preset prompt mode.
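The monitoring step in claim 10 can be sketched as a sliding time window over a recorded emotion list; the ten-minute window, the negative-emotion category, and the 0.5 threshold are illustrative assumptions, not values from the application.

```python
# Illustrative monitoring logic for claim 10 (window, category and
# threshold are assumptions).
from datetime import datetime, timedelta

emotion_list = []  # entries: (timestamp of the video segment, emotion)

def record(emotion):
    emotion_list.append((datetime.now(), emotion))

def should_prompt(category=frozenset({"angry", "sad"}),
                  window_minutes=10, threshold=0.5):
    cutoff = datetime.now() - timedelta(minutes=window_minutes)
    recent = [e for t, e in emotion_list if t >= cutoff]
    if not recent:
        return False
    ratio = sum(e in category for e in recent) / len(recent)
    return ratio > threshold  # True -> show the preset prompt message
```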
11. A facial emotion recognition device, comprising:
    an acquisition unit, configured to acquire video images captured in real time;
    a transformation unit, configured to perform wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors;
    a distance calculation unit, configured to obtain a standard energy feature vector and to calculate, according to an image difference operation method, the Euclidean distance value between each of the energy feature vectors and the standard energy feature vector;
    a distance judgment unit, configured to determine whether any of the plurality of Euclidean distance values exceeds a preset threshold;
    a key frame acquisition unit, configured to take, if any of the plurality of Euclidean distance values exceeds the preset threshold, the images corresponding to the energy feature vectors whose Euclidean distance values exceed the preset threshold as key frame images, wherein the number of key frame images is at least one;
    an emotion recognition unit, configured to obtain a pre-stored emotion recognition model and to recognize, based on the emotion recognition model, the facial emotion in each of the key frame images; and
    an emotion acquisition unit, configured to obtain, from the facial emotions in all the key frame images, the facial emotion corresponding to the video images, to complete the facial emotion recognition.
12. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps: acquiring video images captured in real time; performing wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors; obtaining a standard energy feature vector, and calculating, according to an image difference operation method, the Euclidean distance value between each of the energy feature vectors and the standard energy feature vector; determining whether any of the plurality of Euclidean distance values exceeds a preset threshold; if any of the plurality of Euclidean distance values exceeds the preset threshold, taking the images corresponding to the energy feature vectors whose Euclidean distance values exceed the preset threshold as key frame images, wherein the number of key frame images is at least one; obtaining a pre-stored emotion recognition model, and recognizing, based on the emotion recognition model, the facial emotion in each of the key frame images; and obtaining, from the facial emotions in all the key frame images, the facial emotion corresponding to the video images, to complete the facial emotion recognition.
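Claims 1, 12, and 17 all recite the same keyframe pipeline: a wavelet-derived energy feature vector per frame, compared against a standard vector by Euclidean distance. A sketch with PyWavelets follows; the Haar wavelet, the two-level decomposition, and the threshold are assumptions the claims leave open.

```python
# Hedged sketch of the keyframe selection recited in claims 1/12/17.
import numpy as np
import pywt

def energy_feature_vector(gray_frame, wavelet="haar", level=2):
    """One energy value per wavelet subband of the frame."""
    coeffs = pywt.wavedec2(gray_frame.astype(float), wavelet, level=level)
    bands = [coeffs[0]] + [b for triple in coeffs[1:] for b in triple]
    return np.array([np.sum(b ** 2) for b in bands])

def select_keyframes(frames, standard_vector, threshold):
    """Keep frames whose energy vector is far from the neutral standard."""
    keyframes = []
    for frame in frames:
        v = energy_feature_vector(frame)
        if np.linalg.norm(v - standard_vector) > threshold:  # Euclidean
            keyframes.append(frame)
    return keyframes
```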
13. The computer device according to claim 12, wherein the processor, before executing the acquiring of the video images captured in real time, further implements the following steps: acquiring a neutral expression image; performing wavelet transformation on the neutral expression image to obtain a corresponding standard energy feature vector; and storing the neutral expression image and the standard energy feature vector.
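Claim 13's setup step then reduces to computing and persisting the standard vector from a stored neutral expression image, sketched here by reusing energy_feature_vector from the previous sketch; the file names are hypothetical.

```python
# Illustrative setup for claim 13, reusing energy_feature_vector above.
import cv2
import numpy as np

neutral = cv2.imread("neutral_face.png", cv2.IMREAD_GRAYSCALE)  # stored image
standard_vector = energy_feature_vector(neutral)
np.save("standard_vector.npy", standard_vector)  # persist for later runs
```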
14. The computer device according to claim 12, wherein the processor, before executing the acquiring of the video images captured in real time, further implements the following steps: acquiring an emotion training sample image set, wherein the emotion training sample image set comprises a plurality of emotion training sample images and emotion labels of the faces in the emotion training sample images; and inputting the emotion training sample images and the corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and storing the emotion recognition model.
15. The computer device according to claim 12, wherein the processor, before executing the acquiring of the video images captured in real time, further implements the following steps: acquiring a training sample image set, wherein the training sample image set comprises a plurality of training sample images and face labels indicating whether face information is present in the training sample images; obtaining the face Haar feature vector of each training sample image; inputting the face Haar feature vectors and the face labels corresponding to the training sample images into an AdaBoost boosting model based on a decision tree model for training, to obtain a face detection and recognition model; and storing the face detection and recognition model.
16. The computer device according to claim 15, wherein the processor, before executing the acquiring of the video images captured in real time, further implements the following steps: acquiring a calibration video image captured in real time; extracting, from the frames of the calibration video image and according to a preset rule, a preset number of frames as calibration images; determining, based on the pre-stored face detection and recognition model, whether face information is present in every frame of the calibration images; if face information is present in every frame of the calibration images, performing the step of acquiring the video images captured in real time; and if face information is absent from at least one frame of the calibration images, issuing prompt information so that the user adjusts the camera angle according to the prompt information, and, after the camera angle has been adjusted, returning to the step of acquiring a calibration video image captured in real time, until face information is present in every frame of the calibration images.
17. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps: acquiring video images captured in real time; performing wavelet transformation on all frame images in the video images to obtain corresponding energy feature vectors; obtaining a standard energy feature vector, and calculating, according to an image difference operation method, the Euclidean distance value between each of the energy feature vectors and the standard energy feature vector; determining whether any of the plurality of Euclidean distance values exceeds a preset threshold; if any of the plurality of Euclidean distance values exceeds the preset threshold, taking the images corresponding to the energy feature vectors whose Euclidean distance values exceed the preset threshold as key frame images, wherein the number of key frame images is at least one; obtaining a pre-stored emotion recognition model, and recognizing, based on the emotion recognition model, the facial emotion in each of the key frame images; and obtaining, from the facial emotions in all the key frame images, the facial emotion corresponding to the video images, to complete the facial emotion recognition.
18. The computer-readable storage medium according to claim 17, wherein the computer program, before being executed by the processor to acquire the video images captured in real time, further causes the processor to perform the following steps: acquiring a neutral expression image; performing wavelet transformation on the neutral expression image to obtain a corresponding standard energy feature vector; and storing the neutral expression image and the standard energy feature vector.
19. The computer-readable storage medium according to claim 17, wherein the computer program, before being executed by the processor to acquire the video images captured in real time, further causes the processor to perform the following steps: acquiring an emotion training sample image set, wherein the emotion training sample image set comprises a plurality of emotion training sample images and emotion labels of the faces in the emotion training sample images; and inputting the emotion training sample images and the corresponding emotion labels into a convolutional neural network model for machine learning to obtain an emotion recognition model, and storing the emotion recognition model.
20. The computer-readable storage medium according to claim 17, wherein the computer program, before being executed by the processor to acquire the video images captured in real time, further causes the processor to perform the following steps: acquiring a training sample image set, wherein the training sample image set comprises a plurality of training sample images and face labels indicating whether face information is present in the training sample images; obtaining the face Haar feature vector of each training sample image; inputting the face Haar feature vectors and the face labels corresponding to the training sample images into an AdaBoost boosting model based on a decision tree model for training, to obtain a face detection and recognition model; and storing the face detection and recognition model.
PCT/CN2018/108251 2018-08-07 2018-09-28 Human face emotion identification method and device, computer device and storage medium WO2020029406A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810892915.6 2018-08-07
CN201810892915.6A CN109190487A (en) 2018-08-07 2018-08-07 Face Emotion identification method, apparatus, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020029406A1

Family ID=64921037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108251 WO2020029406A1 (en) 2018-08-07 2018-09-28 Human face emotion identification method and device, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109190487A (en)
WO (1) WO2020029406A1 (en)


Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784277B (en) * 2019-01-17 2023-04-28 南京大学 An emotion recognition method based on smart glasses
CN109934097A (en) * 2019-01-23 2019-06-25 深圳市中银科技有限公司 A kind of expression and mental health management system based on artificial intelligence
CN109816893B (en) * 2019-01-23 2022-11-04 深圳壹账通智能科技有限公司 Information transmission method, information transmission device, server, and storage medium
CN109919001A (en) * 2019-01-23 2019-06-21 深圳壹账通智能科技有限公司 Customer service monitoring method, device, device and storage medium based on emotion recognition
CN109871807B (en) * 2019-02-21 2023-02-10 百度在线网络技术(北京)有限公司 Face image processing method and device
CN109992505B (en) * 2019-03-15 2024-07-02 平安科技(深圳)有限公司 Application program testing method and device, computer equipment and storage medium
CN110047588A (en) * 2019-03-18 2019-07-23 平安科技(深圳)有限公司 Method of calling, device, computer equipment and storage medium based on micro- expression
CN110175526B (en) * 2019-04-28 2024-06-21 平安科技(深圳)有限公司 Training method and device for dog emotion recognition model, computer equipment and storage medium
CN110097004B (en) * 2019-04-30 2022-03-29 北京字节跳动网络技术有限公司 Facial expression recognition method and device
CN110414323A (en) * 2019-06-14 2019-11-05 平安科技(深圳)有限公司 Emotion detection method, device, electronic device and storage medium
CN110399837B (en) * 2019-07-25 2024-01-05 深圳智慧林网络科技有限公司 User emotion recognition method, device and computer readable storage medium
CN110751381A (en) * 2019-09-30 2020-02-04 东南大学 Road rage vehicle risk assessment and prevention and control method
CN110991427B (en) * 2019-12-25 2023-07-14 北京百度网讯科技有限公司 Emotion recognition method and device for video and computer equipment
CN111401198B (en) * 2020-03-10 2024-04-23 广东九联科技股份有限公司 Audience emotion recognition method, device and system
CN111767779B (en) * 2020-03-18 2024-10-22 北京沃东天骏信息技术有限公司 Image processing method, device, equipment and computer readable storage medium
CN111783587B (en) * 2020-06-22 2024-11-29 腾讯数码(天津)有限公司 Interaction method, device and storage medium
CN111859025A (en) * 2020-07-03 2020-10-30 广州华多网络科技有限公司 Expression instruction generation method, device, device and storage medium
CN112541425B (en) * 2020-12-10 2024-09-03 深圳地平线机器人科技有限公司 Emotion detection method, emotion detection device, emotion detection medium and electronic equipment
TWI811605B (en) 2020-12-31 2023-08-11 宏碁股份有限公司 Method and system for mental index prediction
CN114005153A (en) * 2021-02-01 2022-02-01 南京云思创智信息科技有限公司 A real-time recognition method for personalized micro-expressions based on facial diversity
CN113076813B (en) * 2021-03-12 2024-04-12 首都医科大学宣武医院 Training method and device for mask face feature recognition model
CN113128399B (en) * 2021-04-19 2022-05-17 重庆大学 Speech image key frame extraction method for emotion recognition
CN113505665B (en) * 2021-06-28 2023-06-20 哈尔滨工业大学(深圳) Method and device for interpreting students' emotions in school based on video
CN114239640B (en) * 2021-11-15 2025-03-21 国网江西省电力有限公司吉安供电分公司 Substation secondary circuit signal detection method based on wavelet decomposition rolling learning
CN114694234B (en) * 2022-06-02 2023-02-03 杭州智诺科技股份有限公司 Emotion recognition method, system, electronic device and storage medium
CN116563915B (en) * 2023-04-28 2024-07-26 深圳大器时代科技有限公司 Face state recognition method and device based on deep learning algorithm
CN116343314B (en) * 2023-05-30 2023-08-25 之江实验室 Expression recognition method and device, storage medium and electronic equipment
CN118799943B (en) * 2024-07-12 2025-03-18 安徽字节互连科技有限公司 A special identification method and system for AI intelligent platform

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298682B (en) * 2013-07-18 2018-02-23 广州华久信息科技有限公司 A kind of evaluation method and mobile phone of the information recommendation effect based on Facial Expression Image
CN103870820A (en) * 2014-04-04 2014-06-18 南京工程学院 Illumination normalization method for extreme illumination face recognition
CN106295566B (en) * 2016-08-10 2019-07-09 北京小米移动软件有限公司 Facial expression recognizing method and device
CN106980811A (en) * 2016-10-21 2017-07-25 商汤集团有限公司 Facial expression recognition method and facial expression recognition device
CN106951856A (en) * 2017-03-16 2017-07-14 腾讯科技(深圳)有限公司 Bag extracting method of expressing one's feelings and device
CN107358169A (en) * 2017-06-21 2017-11-17 厦门中控智慧信息技术有限公司 A kind of facial expression recognizing method and expression recognition device
CN107633203A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Facial emotions recognition methods, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008391A (en) * 2014-04-30 2014-08-27 首都医科大学 Face micro-expression capturing and recognizing method based on nonlinear dimension reduction
CN105354527A (en) * 2014-08-20 2016-02-24 南京普爱射线影像设备有限公司 Negative expression recognizing and encouraging system
CN107403142A (en) * 2017-07-05 2017-11-28 山东中磁视讯股份有限公司 A kind of detection method of micro- expression
CN107665074A (en) * 2017-10-18 2018-02-06 维沃移动通信有限公司 A kind of color temperature adjusting method and mobile terminal

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627086A (en) * 2020-06-03 2020-09-04 上海商汤智能科技有限公司 Head portrait display method and device, computer equipment and storage medium
CN111860407A (en) * 2020-07-29 2020-10-30 华侨大学 A method, device, device and storage medium for facial expression recognition of characters in video
CN111860407B (en) * 2020-07-29 2023-04-25 华侨大学 Method, device, equipment and storage medium for identifying expression of character in video
CN111950447A (en) * 2020-08-11 2020-11-17 合肥工业大学 Emotion recognition method and system based on walking gesture, storage medium
CN111950447B (en) * 2020-08-11 2023-08-22 合肥工业大学 Emotion recognition method and system based on walking posture, storage medium
CN112016437A (en) * 2020-08-26 2020-12-01 中国科学院重庆绿色智能技术研究院 A method of living body detection based on key frames of face video
CN112016437B (en) * 2020-08-26 2023-02-10 中国科学院重庆绿色智能技术研究院 Living body detection method based on face video key frame
CN112364736A (en) * 2020-10-30 2021-02-12 深圳点猫科技有限公司 Dynamic facial expression recognition method, device and equipment
CN112507824A (en) * 2020-11-27 2021-03-16 长威信息科技发展股份有限公司 Method and system for identifying video image features
CN112418146B (en) * 2020-12-02 2024-04-30 深圳市优必选科技股份有限公司 Expression recognition method, apparatus, service robot, and readable storage medium
CN112418146A (en) * 2020-12-02 2021-02-26 深圳市优必选科技股份有限公司 Expression recognition method and device, service robot and readable storage medium
CN112507959A (en) * 2020-12-21 2021-03-16 中国科学院心理研究所 Method for establishing emotion perception model based on individual face analysis in video
CN112699774A (en) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 Method and device for recognizing emotion of person in video, computer equipment and medium
CN112699774B (en) * 2020-12-28 2024-05-24 深延科技(北京)有限公司 Emotion recognition method and device for characters in video, computer equipment and medium
CN112712022A (en) * 2020-12-29 2021-04-27 华南理工大学 Pressure detection method, system and device based on image recognition and storage medium
CN112712022B (en) * 2020-12-29 2023-05-23 华南理工大学 Pressure detection method, system, device and storage medium based on image recognition
CN112633239A (en) * 2020-12-31 2021-04-09 中国工商银行股份有限公司 Micro-expression identification method and device
CN112686195A (en) * 2021-01-07 2021-04-20 风变科技(深圳)有限公司 Emotion recognition method and device, computer equipment and storage medium
CN113434647A (en) * 2021-06-18 2021-09-24 竹间智能科技(上海)有限公司 Man-machine interaction method, system and storage medium
CN113434647B (en) * 2021-06-18 2024-01-12 竹间智能科技(上海)有限公司 A human-computer interaction method, system and storage medium
CN114360053A (en) * 2021-12-15 2022-04-15 中国科学院深圳先进技术研究院 Motion recognition method, terminal and storage medium
CN114943924A (en) * 2022-06-21 2022-08-26 深圳大学 Pain assessment method, system, device and medium based on facial expression video
CN114943924B (en) * 2022-06-21 2024-05-14 深圳大学 Pain assessment method, system, device and medium based on facial expression video
CN115177252A (en) * 2022-07-13 2022-10-14 承德石油高等专科学校 Intelligent endowment service psychology of can convenient operation is dredged device
CN115177252B (en) * 2022-07-13 2024-01-12 承德石油高等专科学校 Intelligent pension service psychological dispersion device capable of being operated conveniently
CN115019374B (en) * 2022-07-18 2022-10-11 北京师范大学 Method and system for low-consumption detection of students' concentration in smart classroom based on artificial intelligence
CN115019374A (en) * 2022-07-18 2022-09-06 北京师范大学 Method and system for low-consumption detection of students' concentration in smart classroom based on artificial intelligence
CN114998440B (en) * 2022-08-08 2022-11-11 广东数业智能科技有限公司 Multi-mode-based evaluation method, device, medium and equipment
CN114998440A (en) * 2022-08-08 2022-09-02 广东数业智能科技有限公司 Multi-mode-based evaluation method, device, medium and equipment
CN115429271A (en) * 2022-09-09 2022-12-06 北京理工大学 Autism spectrum disorder screening system and method based on eye movement and facial expression
CN116453159A (en) * 2023-04-06 2023-07-18 华院计算技术(上海)股份有限公司 An emotion recognition method, electronic device and medium
CN116665281A (en) * 2023-06-28 2023-08-29 湖南创星科技股份有限公司 Key emotion extraction method based on doctor-patient interaction
CN116665281B (en) * 2023-06-28 2024-05-10 湖南创星科技股份有限公司 Key emotion extraction method based on doctor-patient interaction
CN117312992A (en) * 2023-11-30 2023-12-29 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Emotion recognition method and system for fusion of multi-view face features and audio features
CN117312992B (en) * 2023-11-30 2024-03-12 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Emotion recognition method and system for fusion of multi-view face features and audio features
CN119026923A (en) * 2024-10-28 2024-11-26 福建省万物智联科技有限公司 An identity recognition management method and related equipment for interactive education

Also Published As

Publication number Publication date
CN109190487A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
WO2020029406A1 (en) Human face emotion identification method and device, computer device and storage medium
CN111488433B (en) Artificial intelligence interactive system suitable for bank and capable of improving field experience
WO2020211388A1 (en) Behavior prediction method and device employing prediction model, apparatus, and storage medium
WO2020098249A1 (en) Electronic device, response conversation technique recommendation method and computer readable storage medium
WO2020007129A1 (en) Context acquisition method and device based on voice interaction
WO2025077380A1 (en) Data processing method and apparatus, and electronic device and storage medium
WO2021051602A1 (en) Lip password-based face recognition method and system, device, and storage medium
CN112581938B (en) Speech breakpoint detection method, device and equipment based on artificial intelligence
CN114120425A (en) Emotion recognition method and device, electronic equipment and storage medium
WO2024188277A1 (en) Text semantic matching method and refrigeration device system
CN111832651B (en) Video multi-mode emotion inference method and device
US20210166685A1 (en) Speech processing apparatus and speech processing method
WO2023272833A1 (en) Data detection method, apparatus and device and readable storage medium
CN116010545A (en) Data processing method, device and equipment
CN115661889A (en) Audio and video multimode-based specific character deep forgery detection method
CN114639152A (en) Multimodal voice interaction method, device, device and medium based on face recognition
CN111708988B (en) Infringement video identification method and device, electronic equipment and storage medium
CN118629676A (en) Visitation management method and electronic device
CN111522943A (en) Automatic test method, device, equipment and storage medium for logic node
CN117370934A (en) Multi-mode data enhancement method of sensitive information discovery model
CN113642503B (en) Window service scoring method and system based on image and voice recognition
CN116089906A (en) Multi-mode classification method and system based on dynamic context representation and mode fusion
CN110084143A (en) A kind of emotional information guard method and system for recognition of face
CN111970311B (en) Session segmentation method, electronic device and computer readable medium
CN115394294A (en) Voice recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18929826

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18929826

Country of ref document: EP

Kind code of ref document: A1