
CN112699785B - Group emotion recognition and abnormal emotion detection method based on dimension emotion model - Google Patents

Info

Publication number
CN112699785B
Authority
CN
China
Prior art keywords
emotion
group
image
abnormal
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011601643.3A
Other languages
Chinese (zh)
Other versions
CN112699785A (en
Inventor
潘磊
王艾
赵欣
刘国春
高大鹏
袁小珂
严宏
马婷
朱建刚
严崇耀
卢志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation Flight University of China
Original Assignee
Civil Aviation Flight University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation Flight University of China
Priority to CN202011601643.3A
Publication of CN112699785A
Application granted
Publication of CN112699785B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, in the technical field of intelligent emotion recognition. Based on the PAD three-dimensional emotion model from cognitive psychology, a video data set of group emotion is created through data collection and manual labeling, and the positional relationship of six typical emotions in PAD space is revealed. An emotion prediction model based on group behavior is created, which maps group motion features to three-dimensional coordinates in PAD space. An abnormal emotion classifier is constructed, and the scene is judged to be in an abnormal state when the two abnormal emotions, namely anger and fear, are detected. For videos of group motion, the method can accurately express the continuously changing state of group emotion and can effectively identify global abnormal states.

Description

Group emotion recognition and abnormal emotion detection method based on dimension emotion model
Technical Field
The invention relates to the technical field of intelligent emotion recognition, in particular to a group emotion recognition and abnormal emotion detection method based on a dimension emotion model.
Background
In recent years, with the continuing development of artificial intelligence, deep learning, psychology and cognitive science, using computers to recognize, understand, express and communicate human emotions, and thereby giving computers a more comprehensive and higher level of intelligence, has attracted growing attention and in-depth study in academia. For intelligent video surveillance, this means collecting the speech, facial expressions and body movements of the crowd in a scene, understanding the crowd's emotions, analyzing its emotional state and underlying intention, inferring its next action, and having the computer respond accordingly, so that the surveillance system can communicate at the emotional level. As one of the important directions for future intelligent surveillance technology, vision-based emotion analysis and emotion recognition have significant academic research value.
Humans convey emotion through many channels, including text, speech, facial expressions and body behavior. Although both speech and facial expressions are rich in emotional content, speech signals are difficult to capture clearly in noisy public places. Moreover, given the high crowd density and dynamic changes of dense scenes, existing video analysis techniques struggle to locate each person's face in a crowd and to extract facial expressions accurately. Emotion analysis based on face tracking and facial feature extraction therefore performs poorly in dense scenes. A feasible alternative is to identify and evaluate the emotional state of the group by analyzing the body behavior of the crowd in surveillance video.
Notably, current academic work on emotion analysis from body movement usually takes a single individual as the research object and focuses on mining and recognizing individual posture features and their emotional expression. Unlike individual movement, however, group behavior has its own internal structure and rich external forms under the combined action of subjective, environmental, social and psychological factors. On the one hand, individuals exchange information and cooperate with each other, so the group shows a certain tendency and integrity; on the other hand, individual movement retains a degree of autonomy and randomness, so the group also shows disordered and unstructured characteristics. From the perspective of social psychology, in a dense crowd an individual is influenced by the surroundings, loses some independence and comes to depend on companions; the individual's emotional state gradually converges with that of the crowd, forming a collective, conforming psychological state. Given the particularity of dense scenes and the uniqueness of group psychology, it is therefore necessary to explore methods and strategies specific to the analysis of group emotional states.
Emotion recognition methods based on group behavior currently fall into two main types: recognition methods based on a discrete model and recognition methods based on the A-V two-dimensional emotion model. Both have shortcomings. First, unlike a simple piece of speech or a single image, surveillance video is very rich in content: it contains active group movement, complex group emotion and a certain amount of plot development. A discrete emotion model can identify only a few typical scenes that are simple in form and easily recognizable, and the emotion types it covers are too limited for dense crowds. In addition, group emotion has many subtle features, appears as a combination of several emotions, and changes continuously over time, none of which a discrete model can express effectively. Second, the A-V two-dimensional emotion model measures emotion along two dimensions, Arousal and Valence, where Arousal reflects the intensity of the emotional state and Valence reflects its type. A two-dimensional description is still coarser than a three-dimensional emotion model; for example, the literature that adopts an A-V two-dimensional emotion model distinguishes only four emotion categories, which is clearly insufficient for complex group emotions. Third, the A-V emotion model cannot distinguish certain emotions (for example, anger and fear are both high-Arousal emotions), whereas the PAD three-dimensional emotion model can (anger is a high-dominance emotion and fear is a low-dominance emotion).
In order to solve the problems, the application provides a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, and the PAD dimension model is used as a basis to express the group emotion as a three-dimensional coordinate point in an emotion space so as to realize accurate expression of complex emotion.
Disclosure of Invention
The invention aims to provide a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, which is based on a PAD dimension model and expresses group emotion as a three-dimensional coordinate point in an emotion space so as to realize accurate expression of complex emotion.
The invention provides a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, which comprises the following steps:
S1: establishing a PAD three-dimensional emotion model based on group emotion: the model comprises three dimensions, pleasure P, activation A and dominance D, each taking a value between -1 and +1, and a PAD emotion scale is set as a reference for the emotion dimensions;
S2: establishing a data set of group behavior and group emotion: for video data of different scenes, a standard video data set is obtained through a manual labeling strategy based on principles of cognitive psychology;
S3: compiling statistics on the group emotion data set: based on the standard video data set, the emotion type of each video is defined, videos labeled with the same emotion are grouped, their PAD values are normalized to [-1, 1], and the position of each emotion in PAD space is determined by calculating the center point of the coordinates;
S4: evaluating the group emotion data set: the consistency of the labeling data is checked, and the normplot function in MATLAB is used to verify whether the labeling data follow a Gaussian distribution; if they do not, the output plot is curved;
S5: group emotion recognition and abnormal emotion detection: group motion features are extracted from the video, and the middle-level semantics of the group motion are expressed;
S6: extraction and regression of group emotion features: support vector regression (SVR) is adopted to search for the optimal hyperplane on the training data set, and the regression function is obtained under the constraint of structural risk minimization;
S7: detection of abnormal emotional states: the PAD value of each labeled sample is taken as input, and a support vector machine (SVM) is trained.
Further, in step S2, an emotion labeling system is designed for the manual labeling strategy; the system represents the P-dimension value by the facial expression of a character model, the A-dimension value by the degree of vibration of a heart, and the D-dimension value by the size of the small figure.
Further, consistency in step S4 is judged as follows: the coefficient of variation is calculated, and three indexes of the PAD data are counted and evaluated, namely the sample mean μ, the sample standard deviation σ and the coefficient of variation CV, where the coefficient of variation is defined as:
$$CV = \frac{\sigma}{\mu} \times 100\% \tag{1}$$
If the coefficient of variation is small, the consistency of the labeling data is high; otherwise, the consistency of the labeling data is low.
Further, the extraction of group motion features in step S5 comprises extraction of the foreground region, extraction of optical flow features, extraction of trajectory features, and graphical expression of the motion features. The foreground region is extracted with an improved ViBe+ algorithm, and the foreground region detected in the t-th frame is denoted $R_t$. The optical flow features are extracted and visualized with Gunnar Farnebäck's dense optical flow field; for the t-th frame, the optical flow offsets of pixel $(x, y)$ in the horizontal and vertical directions are $u$ and $v$, respectively. The trajectory features are extracted with the iDT algorithm, which densely samples video pixels and uses optical flow to determine the position of each tracked point in the next frame, forming a tracking trajectory denoted $T(p_1, p_2, \ldots, p_L)$, where $L \le 15$. The graphical expression of the motion features takes three forms: a global motion intensity map, a global motion direction map, and a global motion trajectory map.
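The relationship between the per-pixel offsets $u$, $v$, the foreground region $R_t$ and the intensity and direction maps can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `motion_maps`, the use of the flow magnitude as "intensity" and of the flow angle as "direction", and the zeroing of background pixels are all assumptions, and the flow field is taken as already computed.

```python
import math

def motion_maps(u, v, foreground):
    """u, v: per-pixel optical-flow offsets (2D lists of floats).
    foreground: 2D list of booleans standing in for the region R_t.
    Returns (intensity, direction); background pixels stay at 0.0."""
    rows, cols = len(u), len(u[0])
    intensity = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if foreground[y][x]:
                # Magnitude of the flow vector (assumed intensity measure).
                intensity[y][x] = math.hypot(u[y][x], v[y][x])
                # Flow angle in radians, wrapped to be non-negative.
                direction[y][x] = math.atan2(v[y][x], u[y][x]) % (2 * math.pi)
    return intensity, direction

# Hypothetical 2x2 flow field; the bottom-left pixel is background.
u = [[3.0, 0.0], [0.0, 1.0]]
v = [[4.0, -1.0], [2.0, 0.0]]
fg = [[True, True], [False, True]]
intensity, direction = motion_maps(u, v, fg)
```

Rendering `intensity` and `direction` as gray images would give inputs of the kind that the later texture analysis operates on.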
Further, each trajectory in the global motion trajectory map is drawn as a solid line and carries three attribute features $\langle T(p_1, p_2, \ldots, p_L), L, g_i \rangle$, where $T(p_1, p_2, \ldots, p_L)$ denotes the tracking points $p_i$ that make up the trajectory, $L$ is the length of the trajectory, and $g_i \in [0, 255]$ is the gray value of the i-th segment of the trajectory. $g_i$ is expressed as follows:
Figure GDA0003559410720000042
where $i \in [1, L-1]$.
Further, the middle-level semantics of the group motion in step S5 are analyzed in depth using a gray-level co-occurrence matrix; the statistics adopted are variance, contrast, second moment, entropy, correlation and inverse difference moment;
the variance reflects the degree of gray-level variation in the image; the larger the variance, the greater the gray-level variation. It is calculated as:
$$\mathrm{Var} = \sum_{i}\sum_{j}(i-\mu)^{2}\,P(i,j) \tag{3}$$
where $P(i,j)$ is the normalized co-occurrence matrix entry for gray levels $i$ and $j$, and
$$\mu = \sum_{i}\sum_{j} i\,P(i,j)$$
the contrast measures the distribution of values in the matrix and the local variation in the image, and reflects the sharpness of the image and the depth of the texture grooves. It is calculated as:
$$\mathrm{Con} = \sum_{i}\sum_{j}(i-j)^{2}\,P(i,j) \tag{4}$$
the second moment measures how stable the gray-level variation of the image texture is, and reflects the uniformity of the gray-level distribution and the coarseness of the texture; a larger value indicates a texture pattern that varies uniformly and regularly. It is calculated as:
$$\mathrm{ASM} = \sum_{i}\sum_{j}P(i,j)^{2} \tag{5}$$
the entropy measures the randomness of the information content of the image and reflects the complexity of its gray-level distribution. It is calculated as:
$$\mathrm{Ent} = -\sum_{i}\sum_{j}P(i,j)\,\log P(i,j) \tag{6}$$
the correlation measures the similarity of the elements of the spatial gray-level co-occurrence matrix along the row or column direction and reflects the consistency of the image texture. It is calculated as:
$$\mathrm{Corr} = \sum_{i}\sum_{j}\frac{(i-\mu_x)(j-\mu_y)\,P(i,j)}{\sigma_x\,\sigma_y} \tag{7}$$
where $\mu_x, \mu_y$ and $\sigma_x, \sigma_y$ are the means and standard deviations of the row and column marginal distributions of $P$;
the inverse difference moment reflects the homogeneity of the image texture and measures its local variation; a large value indicates little variation between different regions of the texture, that is, local uniformity. It is calculated as:
$$\mathrm{IDM} = \sum_{i}\sum_{j}\frac{P(i,j)}{1+(i-j)^{2}} \tag{8}$$
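As an illustration of the six statistics named above, the sketch below builds a normalized, symmetric gray-level co-occurrence matrix for a single pixel offset and evaluates the statistics with their common textbook definitions. The patent's exact formulas, offsets and number of gray levels are not stated in this text, so the definitions, the function names `glcm` and `glcm_statistics`, the base-2 logarithm and the toy image are all assumptions.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for pixel offset (dx, dy).
    image: 2D list of integer gray levels in range(levels)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                i, j = image[y][x], image[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pixel pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def glcm_statistics(p):
    """Six texture statistics of a normalized co-occurrence matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    # Correlation is undefined for a constant image; report NaN in that case.
    corr = cov / math.sqrt(var_i * var_j) if var_i * var_j > 0 else float("nan")
    return {
        "variance": var_i,
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "second_moment": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "correlation": corr,
        "inverse_difference_moment": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }

# Toy two-level checkerboard: every horizontal neighbor pair differs.
stats = glcm_statistics(glcm([[0, 1], [1, 0]], levels=2))
```

For the checkerboard, all probability mass sits off the diagonal, so contrast is maximal for two levels and the correlation is -1; a smooth image would show the opposite pattern.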
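Further, the regression function of step S6 is as follows:
$$f(x) = \omega^{T}\varphi(x) + b \tag{9}$$
$$\min_{\omega,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i \\ \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0 \end{cases} \tag{10}$$
where $\omega$ is the weight vector, $C$ is the balance coefficient, $\xi_i^{*}$ and $\xi_i$ are slack variables, $\varphi(\cdot)$ is the nonlinear transformation that maps the data to a high-dimensional space, $b$ is the bias term, $\varepsilon$ is the sensitivity, and $y_i$ is the labeled value of training sample $x_i$;
introducing Lagrange multipliers, formula (10) is converted into:
$$\max_{\alpha,\,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i-\alpha_i^{*})(\alpha_j-\alpha_j^{*})\,k(x_i,x_j) - \varepsilon\sum_{i=1}^{N}(\alpha_i+\alpha_i^{*}) + \sum_{i=1}^{N}y_i(\alpha_i-\alpha_i^{*}) \tag{11}$$
where
$$\sum_{i=1}^{N}(\alpha_i-\alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C$$
the regression function finally obtained is:
$$f(x) = \sum_{i=1}^{N}(\alpha_i-\alpha_i^{*})\,k(x,x_i) + b \tag{12}$$
where $k(x, x_i)$ is the kernel function;
a radial basis function (RBF) kernel is adopted, expressed as:
$$k(x_i,x_j) = \exp\left(-\frac{\lVert x_i-x_j\rVert^{2}}{2\sigma^{2}}\right) \tag{13}$$
After training, a regression model is obtained that performs dimension emotion prediction, predicting a continuous value in PAD space for each video segment. When the group emotion changes over time, it is expressed as a continuous three-dimensional trajectory, presenting the gradual course of the emotion.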
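To make the prediction step concrete, the sketch below evaluates a kernel regression function of the form "bias plus a weighted sum of RBF kernel values over the support vectors" and collects the three outputs into one PAD point. It assumes training has already been done elsewhere and that one scalar regressor is used per PAD dimension; that arrangement, the function names, and all numeric values are hypothetical.

```python
import math

def rbf_kernel(x, z, sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Evaluate f(x) = sum_i c_i * k(x, x_i) + b, where c_i is the
    difference of the two Lagrange multipliers of support vector x_i."""
    return bias + sum(c * rbf_kernel(x, sv, sigma)
                      for sv, c in zip(support_vectors, dual_coefs))

def predict_pad(x, models):
    """models maps "P", "A", "D" to (support_vectors, dual_coefs, bias, sigma).
    One regressor per dimension is an assumption of this sketch."""
    return tuple(svr_predict(x, *models[d]) for d in ("P", "A", "D"))

# Hypothetical trained parameters over a 2-dimensional feature vector.
models = {
    "P": ([[0.0, 0.0]], [0.5], 0.1, 1.0),
    "A": ([[0.0, 0.0]], [-0.4], 0.2, 1.0),
    "D": ([[1.0, 0.0]], [0.3], 0.0, 1.0),
}
pad_point = predict_pad([0.0, 0.0], models)
```

Applying `predict_pad` to the feature vectors of consecutive video segments yields a list of PAD points, which is one way to obtain the three-dimensional trajectory described above.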
Further, for the detection of abnormal emotional states in step S7, the quadratic program for the SVM hyperplane is expressed as:
$$\min_{w,\,\xi,\,\rho} \; \frac{1}{2}\lVert w\rVert^{2} + \frac{1}{\nu N}\sum_{i=1}^{N}\xi_i - \rho \tag{14}$$
subject to $w^{T}\Phi(x_i) \ge \rho - \xi_i$ and $\xi_i \ge 0$, where $x_i$, $i \in \{1, 2, \ldots, N\}$, are the training data; $w^{T}\Phi(x_i) - \rho = 0$ is the maximum-margin decision hyperplane; $\xi_i$ is a slack variable that penalizes outliers; $\nu \in (0, 1]$ is a percentage estimate; and $\Phi(\cdot)$ is a nonlinear mapping of the training data to a high-dimensional feature space. Further, the kernel function is defined as $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$, which performs the dot product in the feature space. With a Gaussian kernel and Lagrange multipliers $\alpha_i$, the decision function is defined as:
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N}\alpha_i\,k(x_i,x) - \rho\right) \tag{15}$$
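The sign-based decision can be illustrated with a short sketch that evaluates a kernel expansion minus an offset for a PAD point. The constraints quoted above match the usual one-class SVM formulation, and the sketch follows that reading; the multipliers, offset, kernel width and support vector are hypothetical, and treating a negative sign as "outside the learned region" is an assumption about how the classifier would be used.

```python
import math

def gaussian_kernel(x, z, sigma):
    """Gaussian kernel between two PAD points."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))

def decision(x, support_vectors, alphas, rho, sigma):
    """Return +1 if the kernel expansion reaches the offset rho, else -1."""
    score = sum(a * gaussian_kernel(sv, x, sigma)
                for sv, a in zip(support_vectors, alphas)) - rho
    return 1 if score >= 0 else -1

# Hypothetical model with a single support vector at the PAD origin.
support_vectors = [[0.0, 0.0, 0.0]]
alphas, rho, sigma = [1.0], 0.5, 1.0

near = decision([0.0, 0.0, 0.0], support_vectors, alphas, rho, sigma)
far = decision([2.0, 0.0, 0.0], support_vectors, alphas, rho, sigma)
```

Under the assumption that the model was fitted to PAD points of one class of scenes, `near` falls inside the learned region and `far` outside it; which side corresponds to "abnormal" depends on which samples were used for training.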
Compared with the prior art, the invention has the following notable advantages:
First, a three-dimensional emotion model is applied for the first time to group emotion recognition in dense crowd scenes; on the basis of the PAD dimension model, group emotion is represented as a three-dimensional coordinate point in emotion space, so that complex emotion is expressed accurately.
Second, a dimension emotion data set for group behavior is created for the first time, and the coordinates of the various emotions in three-dimensional emotion space and the relationships among them are revealed through manual labeling and statistical analysis, laying a data foundation for subsequent emotion analysis.
Third, a series of methods for extracting emotional features from group motion is provided. Under the definitions of dimension emotion, an abstraction process and a mapping from motion to emotion are constructed through support vector regression.
Fourth, fear and anger are defined as abnormal emotions. By identifying these two emotions, it can be judged that an abnormal state has occurred in the scene. This opens up a new approach to intelligent scene detection from the perspective of emotion recognition.
Drawings
FIG. 1 is a diagram of group abnormal emotion detection based on the UMN and PETS 2009 data sets according to an embodiment of the present invention;
FIG. 2 is a diagram of data analysis of PAD dimension for video segments according to an embodiment of the present invention;
FIG. 3 is a flowchart of group emotion recognition and abnormal state detection provided by an embodiment of the present invention;
FIG. 4 is a diagram of extraction of group motion features and middle level semantic representation provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating the effect of GMIC, GMOC and GMTC provided by an embodiment of the present invention;
FIG. 6 is a flowchart illustrating an exemplary method for detecting an abnormal emotional state;
FIG. 7 is a graph of the PAD dimension space for six emotion types provided by embodiments of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are clearly and completely described below with reference to the drawings in the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Psychological theory holds that, because a person's inner emotions are strongly correlated with external behavior, inner emotion and intention can be revealed by subconscious body movements. It is therefore feasible to identify emotional states by describing the emotional attributes of group behavior on the basis of psychological models and principles of social emotion. From a macroscopic point of view, the different forms and movement patterns of a group tend to reflect, as a whole, a number of typical emotional states.
According to an analysis of the existing literature, emotion recognition methods based on group behavior, classified by psychological model, currently fall into two main types: recognition methods based on a discrete model and recognition methods based on the A-V two-dimensional emotion model. Both have shortcomings.
First, unlike a simple piece of speech or a single image, surveillance video is very rich in content: it contains active group movement, complex group emotion and a certain amount of plot development. A discrete emotion model can identify only a few typical scenes that are simple in form and easily recognizable, and the emotion types it covers are too limited for dense crowds. In addition, group emotion has many subtle features, appears as a combination of several emotions, and changes continuously over time, none of which a discrete model can express effectively.
Second, the A-V two-dimensional emotion model measures emotion along two dimensions, Arousal and Valence, where Arousal reflects the intensity of the emotional state and Valence reflects its type. A two-dimensional description is still coarser than a three-dimensional emotion model.
Third, the A-V emotion model cannot distinguish certain emotions (for example, anger and fear are both high-Arousal emotions), whereas the PAD three-dimensional emotion model can (anger is a high-dominance emotion and fear is a low-dominance emotion).
From the perspective of group emotion recognition and based on the PAD three-dimensional emotion model from psychology, the method creates a video data set of group emotion through data collection and manual labeling and reveals the positional relationship of six typical emotions in PAD space. An emotion prediction model based on group behavior is constructed, which maps group motion features to three-dimensional coordinates in PAD space; it comprises motion feature extraction, middle-layer semantic expression, extraction and fusion of group emotion features, and regression of the emotional state. An abnormal emotion classifier is constructed, and the scene is judged to be in an abnormal state when the two abnormal emotions, namely anger and fear, are detected. The method can accurately express the continuously changing state of group emotion and can also effectively identify global abnormal states.
Referring to FIGS. 1-7, the invention provides a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, which comprises the following steps:
S1: establishing a PAD three-dimensional emotion model based on group emotion: the model comprises three dimensions, pleasure P, activation A and dominance D, each taking a value between -1 and +1, and a PAD emotion scale is set as a reference for the emotion dimensions;
S2: establishing a data set of group behavior and group emotion: for video data of different scenes, a standard video data set is obtained through a manual labeling strategy based on principles of cognitive psychology;
S3: compiling statistics on the group emotion data set: based on the standard video data set, the emotion type of each video is defined, videos labeled with the same emotion are grouped, their PAD values are normalized to [-1, 1], and the position of each emotion in PAD space is determined by calculating the center point of the coordinates;
S4: evaluating the group emotion data set: the consistency of the labeling data is checked, and the normplot function in MATLAB is used to verify whether the labeling data follow a Gaussian distribution; if they do not, the output plot is curved;
S5: group emotion recognition and abnormal emotion detection: group motion features are extracted from the video, and the middle-level semantics of the group motion are expressed;
S6: extraction and regression of group emotion features: support vector regression (SVR) is adopted to search for the optimal hyperplane on the training data set, and the regression function is obtained under the constraint of structural risk minimization;
S7: detection of abnormal emotional states: the PAD value of each labeled sample is taken as input, and a support vector machine (SVM) is trained.
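The Gaussian check in step S4 relies on a normal probability plot, in which normally distributed data fall close to a straight line and other data bend away from it. The sketch below reproduces that idea without plotting: it pairs sorted observations with standard normal quantiles and reports their correlation as a measure of straightness. It is not the MATLAB function itself; the plotting positions `(i + 0.5) / n`, the function names and the sample data are assumptions.

```python
from statistics import NormalDist, mean

def normal_probability_points(data):
    """Return (theoretical_quantile, observed_value) pairs, i.e. the points
    a normal probability plot would draw."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n for i = 0..n-1 stay strictly inside (0, 1).
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

def straightness(points):
    """Pearson correlation of the plotted points; values near 1 mean the
    points lie close to a straight line. Requires non-constant data."""
    qs = [q for q, _ in points]
    xs = [x for _, x in points]
    mq, mx = mean(qs), mean(xs)
    cov = sum((q - mq) * (x - mx) for q, x in zip(qs, xs))
    var_q = sum((q - mq) ** 2 for q in qs)
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / (var_q * var_x) ** 0.5

# Hypothetical samples: one built from normal quantiles, one strongly skewed.
gauss_like = [NormalDist(0.2, 0.3).inv_cdf((i + 0.5) / 31) for i in range(31)]
skewed = [2 ** i for i in range(10)]
r_gauss = straightness(normal_probability_points(gauss_like))
r_skewed = straightness(normal_probability_points(skewed))
```

A visibly lower value for the skewed sample corresponds to the bent plot that step S4 describes for data that do not follow a Gaussian distribution.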
In step S1, P (pleasure) represents the positive or negative character of an individual's emotional state, covering opposite states such as like or dislike, satisfaction or dissatisfaction, and pleasure or displeasure. A positive pleasure value represents positive emotion; a negative value represents negative emotion. For a dense crowd, different types of group movement can express positive or negative emotion: slow walking and standing in conversation, for example, represent positive emotion, whereas fighting and running away are typical of negative emotion.
A (activation) represents the level of neurophysiological activation, alertness and bodily energy associated with the emotional state, that is, the intensity of the emotion, ranging from low-arousal states (such as quiet) to high-arousal states (such as surprise). For a dense crowd, the change in the intensity of movement reflects the level of activation. For example, when a crowd that is moving forward freely suddenly starts to flee, this generally indicates that the crowd has been stimulated by some external factor, and its activation changes from a low-arousal state to a high-arousal state.
D (dominance) represents the individual's control over the scene and over other people. It mainly refers to the degree of subjective control the individual has over the emotional state, and distinguishes whether the state arises from the individual or is imposed by the objective environment. For a dense crowd, the variability and homogeneity of individual movements indicate the degree of dominance. When individual movement shows autonomy, randomness and disorder, as with people strolling in squares and streets whose behavior mainly follows their own will, dominance is high. When individual movement shows herd-like, convergent behavior, as when people being evacuated all run in one direction, individual movement is constrained by the group movement pattern and is macroscopically consistent, so dominance is low.
Example 1
In the step S2, an emotion labeling system is designed according to the strategy of manual labeling, and the system represents a P-dimensional value by the facial expression of the character model, represents an a-dimensional value by the vibration degree of the heart, and represents a D-dimensional value by the size of the little.
The emotion data set construction method mainly comprises two methods: deduction mode and extraction mode. The deduction way is composed of
Performers (preferably with professional performance literacy) simulate a typical emotional type (joy, panic, sadness) by physical movement. The emotion contrast of the method is bright, the expressive force is strong, but the deductive form is different from the real emotion, the requirement on performance literacy of an actor is high, and the method is not universal. The extraction mode is to score the emotional state and the display of the group behaviors and the evaluation of each index by adopting an artificial marking method from the video clips of the real scene. The emotion obtained by the method is natural leakage of people, is closer to real life, but has larger workload of later-stage labeling.
Currently, there is no emotional database of group behaviors in the academic world. Some scholars do similar work in research work, and propose some calibrated data sets, but the data sets are not published, and the validity of the data sets cannot be verified. The main purpose of the prior art is to improve the detection efficiency of group behaviors, not emotion analysis. In view of the data set of the individual posture and the individual behavior, the emotion labeling experiment is developed, so that the data set of the group behavior and the group emotion is established, a plurality of real video data are abstracted according to different scenes, and a standard video data set is obtained through a third-party manual labeling strategy.
The emotion data set constructed by the emotion experiment is drawn from the UMN data set, the PETS 2009 data set, the UCF data set, the SCU data set, the UCF Crowd data set, the BEHAVE data set, the Web Abnormal/Normal Crowd data set, the Violent-Flows data set and Rodriguez's data set. It contains 50 videos of dense crowd scenes in total, from which video segments are cut in units of 15 frames, giving 200 video segments in total. Thirty-one volunteers (17 males and 14 females, aged 19 to 35) were invited to label each video segment, which involved the following two tasks:
(1) The volunteers scored the three dimension values P, A and D for each video segment. The score of each dimension is a single choice from the five options {1, 2, 3, 4, 5}, from low to high.
(2) The volunteers determined the type of emotion each video segment presented, as a single choice from the seven options {excited, angry, fear, peaceful, boring, neutral, none of the above}.
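The two labeling tasks above feed the statistics of step S3, in which PAD values are normalized to [-1, 1] and each emotion is located at the center point of its coordinates. The following is a minimal sketch of that step. The linear mapping from the 1 to 5 scale onto [-1, 1], the function names and the sample scores are all assumptions made for illustration; the patent does not state the normalization formula.

```python
from statistics import mean


def normalize_score(score, low=1, high=5):
    """Map a score in [low, high] linearly onto [-1, 1] (assumed mapping)."""
    mid = (low + high) / 2
    half_range = (high - low) / 2
    return (score - mid) / half_range


def emotion_centroid(pad_scores):
    """Average normalized (P, A, D) triples into one PAD coordinate."""
    normalized = [tuple(normalize_score(s) for s in triple) for triple in pad_scores]
    return tuple(mean(axis) for axis in zip(*normalized))


# Hypothetical (P, A, D) scores given to segments labeled "fear".
fear_scores = [(1, 5, 2), (2, 5, 1), (1, 4, 2)]
centroid = emotion_centroid(fear_scores)
print(centroid)
```

Under this assumed mapping, a score of 3 lands at the origin of the corresponding axis, so low pleasure with high activation yields a centroid in the negative-P, positive-A region of PAD space.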
Example 2
The method for judging consistency in step S4 is as follows: the coefficient of variation is calculated, and three indexes of the PAD data are counted and evaluated, namely the sample mean μ, the sample standard deviation σ and the coefficient of variation CV, where the coefficient of variation is defined as:
$$CV = \frac{\sigma}{\mu} \times 100\%$$
If the coefficient of variation is small, the consistency of the labeled data is high; otherwise, the consistency of the labeled data is low.
For the PAD data of each video segment, the labeled values in the same dimension are collected. A large coefficient of variation means a large dispersion relative to the mean, which indicates that the volunteers' scores for the group have low consistency and certainty; a small coefficient indicates high consistency and certainty. In general, if the coefficient of variation of a video exceeds 20%, the data are considered possibly abnormal, indicating large disagreement among the volunteers, and removing the video from the data set can be considered to ensure the credibility of the data.
Taking two video segments as examples, the first showing people running away and the second showing people fighting violently, the PAD statistics of both segments show that the coefficients of variation CV of the labeled data are concentrated in the interval [0, 20%]. The volunteers' scores can therefore be considered concentrated with little disagreement, so the PAD data of the two video segments are credible.
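The consistency check described in this example can be sketched as follows. The sketch assumes the coefficient of variation is computed on the raw 1 to 5 scores of one dimension for one video segment, using the sample standard deviation; the function names and the sample scores are hypothetical, and only the 20% threshold comes from the text.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return CV = sigma / mu as a fraction (0.2 corresponds to 20%)."""
    mu = mean(scores)
    if mu == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(scores) / abs(mu)


def is_consistent(scores, threshold=0.20):
    """True when the volunteers' scores pass the 20% CV criterion."""
    return coefficient_of_variation(scores) <= threshold


# Hypothetical scores from six volunteers for one dimension of one segment.
agreeing = [4, 4, 5, 4, 5, 4]
diverging = [1, 5, 2, 4, 3, 5]
print(is_consistent(agreeing), is_consistent(diverging))
```

A segment whose scores fail the check in any dimension would be a candidate for removal from the data set.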
Example 3
The extraction of group motion features in step S5 comprises extraction of the foreground region, extraction of optical flow features, extraction of trajectory features and graphical expression of the motion features. The foreground region is extracted with an improved ViBe+ algorithm, and the detected foreground region of the t-th frame is denoted $R_t$. The optical flow features are extracted and visualized with Gunnar Farnebäck's dense optical flow field; for the t-th frame image, the optical flow offsets of pixel (x, y) in the horizontal and vertical directions are u and v, respectively. The trajectory features are extracted with the iDT algorithm, which densely samples video pixels and determines the position of each tracked point in the next frame through optical flow, thereby forming a tracking trajectory denoted $T(p_1, p_2, \dots, p_L)$, where $L \le 15$. The motion features are expressed graphically in three forms: a global motion intensity map, a global motion direction map and a global motion trajectory map.
Each trajectory in the global motion trajectory map is drawn as a solid line and carries three attributes $\langle T(p_1, p_2, \dots, p_L), L, g_i \rangle$, where $T(p_1, p_2, \dots, p_L)$ denotes the tracking points $p_i$ that make up the trajectory, L denotes the length of the trajectory, and $g_i \in [0, 255]$ denotes the gray value of the i-th segment of the trajectory. $g_i$ is given by:
Figure GDA0003559410720000131
where $i \in [1, L-1]$.
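As an illustration of how the optical flow offsets u and v could be turned into intensity and direction maps, the sketch below computes a per-pixel magnitude and angle and scales the magnitude to gray levels. The text does not define how the global motion intensity and direction maps are rendered, so the magnitude and angle formulas, the gray scaling and the function names are assumptions; the flow fields here are small hand-written lists rather than the output of a dense optical flow routine.

```python
import math


def motion_maps(u, v):
    """Return per-pixel flow magnitude and direction (degrees in [0, 360))."""
    intensity = [[math.hypot(du, dv) for du, dv in zip(row_u, row_v)]
                 for row_u, row_v in zip(u, v)]
    direction = [[math.degrees(math.atan2(dv, du)) % 360.0
                  for du, dv in zip(row_u, row_v)]
                 for row_u, row_v in zip(u, v)]
    return intensity, direction


def to_gray(values):
    """Scale non-negative values to integer gray levels 0..255 (assumed rendering)."""
    peak = max(max(row) for row in values)
    if peak == 0:
        return [[0 for _ in row] for row in values]
    return [[round(255 * val / peak) for val in row] for row in values]


# Hypothetical 2x2 flow field: horizontal offsets u and vertical offsets v.
u = [[3.0, 0.0], [0.0, -1.0]]
v = [[4.0, 0.0], [2.0, 0.0]]
intensity, direction = motion_maps(u, v)
gray = to_gray(intensity)
print(gray)
```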
The layer semantics in group motion in step S5 are analyzed in depth using a gray-level co-occurrence matrix. The statistics used are variance, contrast, second moment, entropy, correlation and inverse difference moment (reciprocal difference moment).
The variance reflects the degree of gray-level variation in the image; the larger the variance, the larger the gray-level variation. The variance is calculated as:
Figure GDA0003559410720000132
where
Figure GDA0003559410720000133
The contrast measures the distribution of values in the matrix and the amount of local variation in the image, and reflects the sharpness of the image and the depth of the texture grooves. The contrast is calculated as:
Figure GDA0003559410720000134
The second moment measures the stability of gray-level variation in the image texture and reflects the uniformity of the gray-level distribution and the coarseness of the texture; a large value indicates a uniform, regularly varying texture pattern. The second moment is calculated as:
Figure GDA0003559410720000135
The entropy measures the randomness of the information content of the image and reflects the complexity of the gray-level distribution. The entropy is calculated as:
Figure GDA0003559410720000136
The correlation measures the similarity of the elements of the spatial gray-level co-occurrence matrix along the row or column direction and reflects the consistency of the image texture. The correlation is calculated as:
Figure GDA0003559410720000137
The inverse difference moment reflects the homogeneity of the image texture and measures its local variation; a large value indicates little variation between different regions of the texture, that is, the texture is locally uniform. The inverse difference moment is calculated as:
Figure GDA0003559410720000141
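The six statistics named above can be sketched in plain Python. The formula images are not reproduced in this text, so the code uses the common textbook (Haralick-style) definitions, which may differ from the patent's exact formulas in normalization or in the choice of mean; the offset (dx, dy), the function names and the toy image are assumptions. The key `inverse_difference_moment` corresponds to the reciprocal difference moment.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[c / total for c in row] for row in counts]


def glcm_statistics(p):
    """Textbook GLCM statistics; assumed, not transcribed from the patent figures."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * pij for i, _, pij in cells)
    mu_j = sum(j * pij for _, j, pij in cells)
    var_i = sum((i - mu_i) ** 2 * pij for i, _, pij in cells)
    var_j = sum((j - mu_j) ** 2 * pij for _, j, pij in cells)
    denom = math.sqrt(var_i * var_j)
    covariance = sum((i - mu_i) * (j - mu_j) * pij for i, j, pij in cells)
    return {
        "variance": var_i,
        "contrast": sum((i - j) ** 2 * pij for i, j, pij in cells),
        "second_moment": sum(pij ** 2 for _, _, pij in cells),
        "entropy": -sum(pij * math.log(pij) for _, _, pij in cells if pij > 0),
        "correlation": covariance / denom if denom > 0 else 0.0,
        "inverse_difference_moment": sum(pij / (1 + (i - j) ** 2) for i, j, pij in cells),
    }


# Toy two-level image standing in for a motion map quantized to gray levels.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
stats = glcm_statistics(glcm(image, levels=2))
print(stats)
```

In practice the input would be one of the motion maps quantized to a small number of gray levels, and the resulting statistics would form part of the feature vector passed to the regression step.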
Example 4
The step S6 regression function is as follows:
Figure GDA0003559410720000142
Figure GDA0003559410720000143
where ω is the weight vector, C is the balance coefficient, the slack variables are ξ_i and the quantity shown in Figure GDA0003559410720000144, the nonlinear transformation that maps the data to a high-dimensional space is shown in Figure GDA0003559410720000145, b is the bias term, and ε is the sensitivity;
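For orientation, the symbols listed above match the standard ε-insensitive support vector regression primal problem. The block below gives that textbook form as an assumption, not as a transcription of the patent's formula images. The target values $y_i$, the sample count $N$, the second slack variable $\xi_i^*$ and the symbol $\varphi$ for the nonlinear transformation are notation introduced here.

```latex
\min_{\omega,\, b,\, \xi,\, \xi^*} \quad
  \frac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^* \right)

\text{s.t.} \quad
\begin{cases}
  y_i - \omega^{T} \varphi(x_i) - b \le \varepsilon + \xi_i \\
  \omega^{T} \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\
  \xi_i \ge 0, \quad \xi_i^* \ge 0, \quad i = 1, \dots, N
\end{cases}
```

In this form, C trades off the flatness of the regression function against the total amount by which samples are allowed to deviate beyond the ε tube.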
Introducing Lagrange multipliers, equation (10) is transformed into:
Figure GDA0003559410720000146
wherein,
Figure GDA0003559410720000147
The regression function finally obtained is:
Figure GDA0003559410720000148
wherein, k (x, x)i) Is a kernel function;
adopting a radial basis kernel function RBF, and the expression is as follows:
k(xi,xj)=exp(-||xi-xj||2/2σ2) (13)
and obtaining a regression model after training to realize dimension emotion prediction, predicting a continuous value of each video section in the PAD space, and when the group emotion changes along with time, expressing the group emotion as a continuous three-dimensional track so as to present a gradual emotion process.
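The prediction step can be sketched as the evaluation of a kernel expansion with the RBF kernel of equation (13). The sketch assumes the usual SVR form, a bias plus a weighted sum of kernel values over the support vectors, and assumes one regressor per PAD dimension; the function names, the coefficients and the support vectors are hypothetical and would come from training in a real system.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel of equation (13): exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, coefficients, bias, sigma=1.0):
    """Evaluate bias + sum_i c_i * k(x, x_i), the assumed regression form."""
    return bias + sum(c * rbf_kernel(x, sv, sigma)
                      for sv, c in zip(support_vectors, coefficients))


def predict_pad(x, models, sigma=1.0):
    """Predict (P, A, D) with one hypothetical regressor per dimension."""
    return tuple(svr_predict(x, *models[dim], sigma) for dim in ("P", "A", "D"))


# Hypothetical trained parameters: (support vectors, coefficients, bias).
support = [(0.0, 0.0), (1.0, 1.0)]
models = {
    "P": (support, [0.5, -0.25], 0.1),
    "A": (support, [-0.2, 0.6], 0.0),
    "D": (support, [0.1, 0.1], -0.3),
}
pad = predict_pad((0.0, 0.0), models)
print(pad)
```

Applying `predict_pad` to the feature vector of each successive video segment would yield the sequence of PAD points that forms the three-dimensional emotion trajectory.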
Example 5
In step S7, detecting the abnormal emotional state amounts to solving the quadratic programming problem for the SVM hyperplane, expressed as:
Figure GDA0003559410720000151
subject to $w^T\Phi(x_i) \ge \rho - \xi_i$ and $\xi_i \ge 0$, where $x_i$, $i \in \{1, 2, \dots, N\}$, are the training set data; $w^T\Phi(x_i) - \rho = 0$ is the maximum-margin decision hyperplane; $\xi_i$ is the slack variable that penalizes outliers; $\nu \in (0, 1]$ is a percentage estimate; and $\Phi(\cdot)$ is the nonlinear mapping of the training data to a high-dimensional feature space. Further, the kernel function is defined as $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$, which performs the dot product in the feature space. With a Gaussian kernel function, the decision function is defined as:
Figure GDA0003559410720000152
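The decision step can be sketched as follows, assuming the standard one-class SVM decision function, the sign of a weighted sum of Gaussian kernel values minus ρ. The decision-function image is not reproduced in this text, so this form is an assumption; the multipliers, ρ, the kernel width and the PAD coordinates are hypothetical values chosen for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def one_class_decision(x, support_vectors, alphas, rho, sigma=1.0):
    """Return +1 for a normal PAD point and -1 for an abnormal one (assumed form)."""
    score = sum(a * gaussian_kernel(sv, x, sigma)
                for sv, a in zip(support_vectors, alphas)) - rho
    return 1 if score >= 0 else -1


# Hypothetical support vectors near a "normal" region of PAD space.
support_vectors = [(-0.3, -0.4, -0.2), (-0.2, -0.5, -0.1)]
alphas = [0.5, 0.5]
rho = 0.6
calm_point = (-0.25, -0.45, -0.15)
fear_like_point = (-0.6, 0.8, -0.7)
labels = [one_class_decision(p, support_vectors, alphas, rho, sigma=0.5)
          for p in (calm_point, fear_like_point)]
print(labels)
```

A point close to the training data keeps a high kernel sum and is labeled normal, while a point far from it falls below ρ and is flagged as abnormal.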
In the detection result for abnormal group emotion, the coordinates of the six emotion types in PAD space are marked. The gradual change of the curve from light to dark indicates the order of the frame sequence. Because the group emotional state in the video changes continuously, the change of group emotion over time appears as a continuous emotion trajectory in the figure. The emotion initially fluctuates around the coordinate point of "boring", indicating that the group is in a normal state at that time. The trajectory then moves abruptly to the vicinity of the coordinate point of "fear", showing that the group emotion has become abnormal and that the change is very sudden. From a qualitative point of view, the description of the group emotion given by the experiment is therefore consistent with the facts.
The above disclosure is only for a few specific embodiments of the present invention, however, the present invention is not limited to the above embodiments, and any modifications that can be made by those skilled in the art are intended to fall within the scope of the present invention.

Claims (4)

1. A method for group emotion recognition and abnormal emotion detection based on a dimension emotion model, characterized by comprising the following steps:
S1: establishing a PAD three-dimensional emotion model based on group emotion: the model comprises three dimensions, namely pleasure P, activation A and dominance D, the value of each dimension lies between -1 and +1, and a PAD emotion scale is set as a reference for the emotion dimensions;
S2: establishing a group behavior and group emotion data set: for video data of different scenes, obtaining a standard video data set through a manual labeling strategy based on the principles of cognitive psychology;
S3: computing statistics of the group emotion data set: according to the standard video data set, defining the emotion types of the videos, grouping the videos labeled with the same emotion, normalizing their PAD values to [-1, 1], and determining the value of each emotion in PAD space by calculating the center point of the coordinates;
S4: evaluating the group emotion data set: checking whether the labeled data are consistent, and verifying with the normplot function whether the labeled data follow a Gaussian distribution; if they do not, the output plot is curved;
S5: group emotion recognition and abnormal emotion detection: extracting group motion features from the video, and expressing layer semantics in group motion;
the extraction of group motion features comprises extraction of the foreground region, extraction of optical flow features, extraction of trajectory features and graphical expression of the motion features; the foreground region is extracted with an improved ViBe+ algorithm, and the detected foreground region of the t-th frame is denoted $R_t$; the optical flow features are extracted and visualized with Gunnar Farnebäck's dense optical flow field, and for the t-th frame image the optical flow offsets of pixel (x, y) in the horizontal and vertical directions are u and v, respectively; the trajectory features are extracted with the iDT algorithm, which densely samples video pixels and determines the position of each tracked point in the next frame through optical flow, thereby forming a tracking trajectory denoted $T(p_1, p_2, \dots, p_L)$, where $L \le 15$; the motion features are expressed graphically in three forms, namely a global motion intensity map, a global motion direction map and a global motion trajectory map;
each trajectory in the global motion trajectory map is drawn as a solid line and carries three attributes $\langle T(p_1, p_2, \dots, p_L), L, g_i \rangle$, where $T(p_1, p_2, \dots, p_L)$ denotes the tracking points $p_i$ that make up the trajectory, L denotes the length of the trajectory, and $g_i \in [0, 255]$ denotes the gray value of the i-th segment of the trajectory; $g_i$ is given by:
Figure FDA0003605854010000021
wherein i belongs to [1, L-1 ];
the expression of the layer semantics in the group movement of the step S5 is deeply analyzed by adopting a gray level co-occurrence matrix, and the adopted statistics comprise variance, contrast, second moment, entropy, correlation and reciprocal difference moment;
the variance is used for reflecting the gray level change degree of the image, when the variance is larger, the gray level change of the image is larger, and the calculation formula of the variance is as follows:
Figure FDA0003605854010000022
wherein,
Figure FDA0003605854010000023
the contrast is used for measuring the value distribution of the matrix and the local variation in the image and reflecting the definition of the image and the depth of the texture, and a calculation formula of the contrast is as follows:
Figure FDA0003605854010000024
the second moment is used for measuring the gray change stability degree of the image texture and reflecting the gray distribution uniformity degree and the texture thickness degree of the image, if the value of the second moment is larger, the texture mode is in a uniform and regular change, and the calculation formula of the second moment is as follows:
Figure FDA0003605854010000025
the entropy is used for measuring the randomness of the information content of the image and reflecting the complexity of the gray level distribution of the image, and the calculation formula of the entropy is as follows:
Figure FDA0003605854010000026
the correlation is used for measuring the similarity of the elements of the space gray level co-occurrence matrix in the row or column direction and reflecting the consistency of image textures, and a calculation formula of the correlation is as follows:
Figure FDA0003605854010000031
the reciprocal difference moment is used for reflecting the homogeneity of the image texture and measuring the local change of the image texture, if the value is large, the change is absent among different areas of the image texture, the local uniformity is realized, and the calculation formula of the reciprocal difference moment is as follows:
Figure FDA0003605854010000032
S6: extraction and regression of group emotion features: support vector regression (SVR) is adopted; with the support of the training data set, an optimal hyperplane is sought, and the regression function is obtained subject to minimizing the structural risk; the regression function is as follows:
Figure FDA0003605854010000033
Figure FDA0003605854010000034
wherein ω is the weight vector, C is the balance coefficient, the slack variables are ξ_i and the quantity shown in Figure FDA0003605854010000035, the nonlinear transformation that maps the data to a high-dimensional space is shown in Figure FDA0003605854010000036, b is the bias term, and ε is the sensitivity;
introducing Lagrange multipliers, equation (10) is transformed into:
Figure FDA0003605854010000037
wherein $0 \le a_i \le C$,
Figure FDA0003605854010000038
$0 \le a_i \le C$;
the regression function finally obtained is:
Figure FDA0003605854010000039
wherein, k (x, x)i) Is a kernel function;
adopting a radial basis kernel function RBF, and the expression is as follows:
k(xi,xj)=exp(-||xi-xj||2/2σ2) (13)
obtaining a regression model after training to realize dimension emotion prediction, predicting a continuous value of each video section in a PAD space, and expressing the continuous value as a continuous three-dimensional track to present a gradual emotion process when group emotion changes along with time;
s7: detection of abnormal emotional states: and taking the PAD value of each marked sample as an input, and training by a Support Vector Machine (SVM).
2. The method for group emotion recognition and abnormal emotion detection based on the dimension emotion model as claimed in claim 1, wherein in step S2 an emotion labeling system is designed according to the manual labeling strategy, and the system represents the value of the P dimension by the facial expression of the character model, the value of the A dimension by the degree of vibration of the heart, and the value of the D dimension by the size of the small person.
3. The method for group emotion recognition and abnormal emotion detection based on the dimension emotion model as claimed in claim 1, wherein the method for judging consistency in step S4 is as follows: the coefficient of variation is calculated, and three indexes of the PAD data are counted and evaluated, namely the sample mean μ, the sample standard deviation σ and the coefficient of variation CV, where the coefficient of variation is defined as:
$$CV = \frac{\sigma}{\mu} \times 100\%$$
if the coefficient of variation is small, the consistency of the labeled data is high; otherwise, the consistency of the labeled data is low.
4. The method for group emotion recognition and abnormal emotion detection based on the dimension emotion model as claimed in claim 1, wherein detecting the abnormal emotional state in step S7 solves the quadratic programming problem for the SVM hyperplane, expressed as:
Figure FDA0003605854010000042
subject to $w^T\Phi(x_i) \ge \rho - \xi_i$ and $\xi_i \ge 0$, wherein $x_i$, $i \in \{1, 2, \dots, N\}$, are the training set data; $w^T\Phi(x_i) - \rho = 0$ is the maximum-margin decision hyperplane; $\xi_i$ is the slack variable that penalizes outliers; $\nu \in (0, 1]$ is a percentage estimate; $\Phi(\cdot)$ is the nonlinear mapping of the training data to a high-dimensional feature space; further, the radial basis kernel function is defined as $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$, which performs the dot product in the feature space, and with a Gaussian kernel function the decision function is defined as:
Figure FDA0003605854010000051
CN202011601643.3A 2020-12-29 2020-12-29 Group emotion recognition and abnormal emotion detection method based on dimension emotion model Active CN112699785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011601643.3A CN112699785B (en) 2020-12-29 2020-12-29 Group emotion recognition and abnormal emotion detection method based on dimension emotion model


Publications (2)

Publication Number Publication Date
CN112699785A CN112699785A (en) 2021-04-23
CN112699785B true CN112699785B (en) 2022-06-07

Family

ID=75512114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011601643.3A Active CN112699785B (en) 2020-12-29 2020-12-29 Group emotion recognition and abnormal emotion detection method based on dimension emotion model

Country Status (1)

Country Link
CN (1) CN112699785B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743271B (en) * 2021-08-27 2023-08-01 中国科学院软件研究所 Video content effectiveness visual analysis method and system based on multi-modal emotion
CN113822184A (en) * 2021-09-08 2021-12-21 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Expression recognition-based non-feeling emotion abnormity detection method
CN114677725B (en) * 2022-03-02 2024-09-24 郑州大学 Method and device for predicting and evaluating passive emotion situation of population
US11930226B2 (en) * 2022-07-29 2024-03-12 Roku, Inc. Emotion evaluation of contents
CN115905837B (en) * 2022-11-17 2023-06-30 杭州电子科技大学 Semi-supervised self-adaptive marker regression electroencephalogram emotion recognition method for automatic anomaly detection
CN117313723B (en) * 2023-11-28 2024-02-20 广州云趣信息科技有限公司 Semantic analysis method, system and storage medium based on big data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732203A (en) * 2015-03-05 2015-06-24 中国科学院软件研究所 Emotion recognizing and tracking method based on video information
CN107169426A (en) * 2017-04-27 2017-09-15 广东工业大学 A kind of detection of crowd's abnormal feeling and localization method based on deep neural network
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
US10061977B1 (en) * 2015-04-20 2018-08-28 Snap Inc. Determining a mood for a group
CN111368649A (en) * 2020-02-17 2020-07-03 杭州电子科技大学 Emotion perception method operating in raspberry pie
CN111914594A (en) * 2019-05-08 2020-11-10 四川大学 Group emotion recognition method based on motion characteristics

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9907473B2 (en) * 2015-04-03 2018-03-06 Koninklijke Philips N.V. Personal monitoring system
CN111353366A (en) * 2019-08-19 2020-06-30 深圳市鸿合创新信息技术有限责任公司 Emotion detection method and device and electronic equipment
CN110826466B (en) * 2019-10-31 2023-10-03 陕西励爱互联网科技有限公司 Emotion recognition method, device and storage medium based on LSTM audio-video fusion


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Continuous Audiovisual Emotion Recognition Using Feature Selection and LSTM;Reda Elbarougy等;《Journal of Signal Processing》;20201130;第24卷(第6期);第229-235页 *
Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition;Shizhe Chen等;《AVEC"17:Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge》;20171031;第19-26页 *
基于PAD三维情感模型的情感语音研究;张婷;《中国优秀硕士学位论文全文数据库 (信息科技辑)》;20181015;第I136-90页 *
基于结构化认知计算的群体行为分析;张严浩;《中国博士学位论文全文数据库 (信息科技辑)》;20180115;第I138-56页 *
基于语义分析的情感计算技术研究进展;饶元等;《软件学报》;20180314;第29卷(第8期);第2397-2462页 *

Also Published As

Publication number Publication date
CN112699785A (en) 2021-04-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant