CN107463919A - Method for facial expression recognition based on deep 3D convolutional neural networks - Google Patents
Method for facial expression recognition based on deep 3D convolutional neural networks
- Publication number: CN107463919A (application CN201710713962.5A)
- Authority
- CN
- China
- Prior art keywords
- facial
- network
- markers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention proposes a method for facial expression recognition based on deep 3D convolutional neural networks. Its main contents include a 3D Inception-ResNet network, facial landmarks, and a long short-term memory network unit. In the process, the convolutional neural network extracts the spatial relationships within face images and the temporal relationships between different frames of a video. Facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted facial landmarks are used as a network input; this improves the ability to recognize subtle changes of facial expression in a sequence, allowing more accurate recognition. The present invention proposes a method that uses a 3D convolutional neural network and a long short-term memory network to extract the temporal relationships between the frames of a video sequence and extracts facial landmarks to emphasize the more expressive facial components, improving the recognition of subtle changes of facial expression and making a further contribution to new designs in the field of lie detection and to innovative solutions in the judicial field.
Description
Technical field
The present invention relates to the field of facial expression recognition, and more particularly to a method for facial expression recognition based on deep 3D convolutional neural networks.
Background art
Facial expression recognition refers to separating a specific emotional state from a given still image or dynamic video sequence, thereby determining the mental state of the identified subject. Enabling computers to understand and recognize human facial expressions fundamentally changes the relationship between humans and computers, leading to better human-computer interaction; it is a prerequisite for computers to understand human emotion and an effective path for exploring and understanding intelligence. Expression recognition therefore has great potential application value in fields such as psychology, intelligent robotics, intelligent surveillance, virtual reality, and digital photography. Specifically, in psychology, a computer analyzes a person's expression information to infer the person's psychological state and ultimately achieve intelligent human-machine interaction; studying changes in human psychological mood by means of facial expression recognition is an important breakthrough of modern science and technology. In intelligent robotics, computers perform facial expression image acquisition, facial expression image preprocessing, expression analysis, and so on, promoting human-machine communication and reaching a higher technological level. In addition, in digital photography, pictures can be captured automatically upon detection of a smiling expression. Although much research has been done on expression recognition, it has not yet been widely adopted in the market because of the complexity and cost of existing methods; moreover, because facial expressions change quickly and some expressions are difficult to capture and identify, improving the expression recognition rate still presents a considerable challenge.
The present invention proposes a method for facial expression recognition based on deep 3D convolutional neural networks. The network structure consists of a 3D Inception-ResNet layer (3DIR) and a long short-term memory (LSTM) network, which extract the spatial relationships within face images and the temporal relationships between different frames of a video. Facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted facial landmarks are used as a network input; this improves the ability to recognize subtle changes of facial expression in a sequence, allowing more accurate recognition. The present invention thus proposes a method that uses a 3D convolutional neural network and a long short-term memory network to extract the temporal relationships between the frames of a video sequence and extracts facial landmarks to emphasize the more expressive facial components, improving the recognition of subtle changes of facial expression and making a further contribution to new designs in intelligent robotics and to innovative solutions in psychology.
Summary of the invention
For expression recognition, a method is proposed that uses a 3D convolutional neural network and a long short-term memory network to extract the temporal relationships between the frames of a video sequence and extracts facial landmarks, improving the ability to recognize subtle changes of facial expression and making a further contribution to new designs in intelligent robotics and to innovative solutions in psychology.
To solve the above problems, the present invention provides a method for facial expression recognition based on deep 3D convolutional neural networks, whose main contents include:
(1) a 3D Inception-ResNet network;
(2) facial landmarks;
(3) a long short-term memory network unit.
The deep 3D convolutional neural network consists of a 3D Inception-ResNet layer (3DIR) and a long short-term memory (LSTM) network. The LSTM follows the 3D Inception-ResNet layers, and together they extract the spatial relationships within face images and the temporal relationships between different frames of a video. Facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted facial landmarks are used as a network input; this improves the ability to recognize subtle changes of facial expression in a sequence, allowing more accurate recognition.
The long short-term memory network (LSTM) provides the memory function and is responsible for recording contextual information over long time spans. It comprises an input gate (i), a forget gate (f), and an output gate (o); at each time step t these three gates are respectively responsible for the rewriting, maintenance, and retrieval of the memory cell c. Let σ(x) = (1 + exp(−x))⁻¹ be the sigmoid function and φ(x) = tanh(x) the hyperbolic tangent, and let x, h, c, W, and b denote the input, output, cell state, parameter matrices, and bias vectors, respectively. Given the inputs x_t, h_{t−1}, and c_{t−1} at time step t, the LSTM update is given by equation (1):

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
g_t = φ(W_C · [h_{t−1}, x_t] + b_C)
C_t = f_t ∗ C_{t−1} + i_t ∗ g_t
h_t = o_t ∗ φ(C_t)      (1)
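As a concrete illustration of equation (1), the following minimal NumPy sketch performs a single LSTM update. The dictionary layout of the parameter matrices W and bias vectors b is an assumption made for readability and is not specified by the invention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # sigma(x) = (1 + exp(-x))^-1

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update per equation (1); W and b hold one matrix/vector
    per gate, each acting on the concatenated [h_{t-1}, x_t] vector."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])   # forget gate: rewriting control
    i_t = sigmoid(W["i"] @ z + b["i"])   # input gate: maintenance control
    o_t = sigmoid(W["o"] @ z + b["o"])   # output gate: retrieval control
    g_t = np.tanh(W["C"] @ z + b["C"])   # candidate memory content
    c_t = f_t * c_prev + i_t * g_t       # update the memory cell C_t
    h_t = o_t * np.tanh(c_t)             # emit the output h_t
    return h_t, c_t
```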
Further, the 3D Inception-ResNet network achieves a higher recognition rate. Its network structure is as follows: the input is a video of size 10 × 299 × 299 × 3, where the frame count is 10, each frame measures 299 × 299, and 3 denotes the color channels, followed by a stem layer. The 3DIR part comprises A, B, and C layers: the 3DIR-A layers reduce the grid size from 38 × 38 to 18 × 18, the 3DIR-B layers reduce the grid size from 18 × 18 to 8 × 8, and the 3DIR-C layers are followed by average pooling; the result is finally output through a fully connected layer.
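For orientation, here is a minimal PyTorch sketch of the backbone's shape flow under the stated input size. The internals of the 3DIR-A/B/C blocks are not disclosed here, so plain strided Conv3d stages stand in for them, and all channel widths and the six-class output are assumptions:

```python
import torch
import torch.nn as nn

class Backbone3DIRSketch(nn.Module):
    """Shape-flow sketch only: 10x299x299x3 input, grid ~38 -> 18 -> 8,
    average pooling, then a fully connected output layer."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.stem = nn.Sequential(                       # stem: 299 -> ~38
            nn.Conv3d(3, 32, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)))
        # Stand-ins for the 3DIR-A (grid ~38 -> 18) and 3DIR-B (18 -> 8) stages
        self.block_a = nn.Conv3d(64, 128, 3, stride=(1, 2, 2), padding=(1, 0, 0))
        self.block_b = nn.Conv3d(128, 256, 3, stride=(1, 2, 2), padding=(1, 0, 0))
        self.pool = nn.AdaptiveAvgPool3d((10, 1, 1))     # average pooling, 10 frames kept
        self.fc = nn.Linear(256, num_classes)            # fully connected output

    def forward(self, x):                    # x: (N, 3, 10, 299, 299)
        x = self.pool(self.block_b(self.block_a(self.stem(x))))
        return self.fc(x.flatten(2).mean(-1))

print(Backbone3DIRSketch()(torch.randn(1, 3, 10, 299, 299)).shape)  # (1, 6)
```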
Further, facial landmarks are used in the network architecture to distinguish the major facial components from the other parts that contribute less to facial expression. In facial expression recognition, extracting facial landmarks improves the recognition rate. The temporal order of the frames is preserved in the network, and the CNN and LSTM are trained simultaneously in a single network. Building on the original residual network, the facial landmarks are combined with the residual units by replacing the shortcut path: the facial landmarks are multiplied element-wise with the input tensor of the residual unit. To extract the facial landmarks, facial bounding boxes are obtained with the face detector of a cross-platform computer vision library, and 66 facial landmark points are extracted using a face alignment algorithm that regresses local binary features.
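A possible extraction pipeline is sketched below with OpenCV (a cross-platform computer vision library) and its contrib FacemarkLBF model, which regresses local binary features as described; the model file path is a placeholder, and note the stock LBF model returns 68 points rather than the 66 used here:

```python
import cv2

# Haar-cascade face detector shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# FacemarkLBF (opencv-contrib) regresses local binary features;
# "lbfmodel.yaml" is an assumed path to a pretrained model file.
facemark = cv2.face.createFacemarkLBF()
facemark.loadModel("lbfmodel.yaml")

def face_landmarks(frame_bgr):
    """Return the landmark array for the first detected face, else None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    ok, landmarks = facemark.fit(gray, faces)
    return landmarks[0][0] if ok else None   # shape (68, 2), (x, y) points
```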
In the face alignment algorithm, after the facial landmarks of all databases are detected and saved, a facial landmark filter is generated for each sequence in the training stage. Given the facial landmarks of every frame in a sequence, all images in the sequence are resized to match the size of the filters in the network, and weights are assigned to all pixels of every frame in the sequence according to the detected landmark positions: the closer a pixel is to a facial landmark, the larger the weight it receives. Using the Manhattan distance together with a linear weighting function brings the expression recognition rate on the databases to a higher level. The Manhattan distance between a facial landmark and a pixel is the sum of the differences of their corresponding components, and the weighting function that assigns a weight to each feature is a simple linear function of the Manhattan distance, defined as follows:

w(L, P) = 1 − 0.1 d_M(L, P)      (2)

where d_M(L, P) is the Manhattan distance between facial landmark L and pixel P. Each facial landmark position has a peak weight, and the pixels around it receive lower weights in proportion to their distance from the corresponding landmark. To avoid overlap between the weights of two adjacent landmarks, a 7 × 7 window is defined around each facial landmark, and each landmark applies the weighting function over only these 49 pixels. The facial landmarks are added to the network by replacing the shortcut path of the original residual network with the element-wise multiplication of the weighting function w and the input layer x:

y_l = F(x_l) + w ∘ x_l,  x_{l+1} = f(y_l)      (3)

where x_l and x_{l+1} are the input and output of the l-th layer respectively, ∘ is the Hadamard product, F is the residual function, and f is the activation function.
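The weighting of equations (2) and (3) can be made concrete with the short NumPy sketch below; taking the maximum where two landmark windows happen to touch is an assumption used to keep the map well defined:

```python
import numpy as np

def landmark_weight_map(landmarks, height, width):
    """Per-frame weight map per equation (2): each landmark gets a 7x7
    window in which weights fall off linearly with Manhattan distance,
    peaking at 1.0 on the landmark pixel itself (minimum 0.4 at d_M = 6)."""
    w = np.zeros((height, width), dtype=np.float32)
    for lx, ly in np.asarray(landmarks, dtype=int):
        for py in range(max(0, ly - 3), min(height, ly + 4)):
            for px in range(max(0, lx - 3), min(width, lx + 4)):
                d_m = abs(px - lx) + abs(py - ly)        # Manhattan distance
                w[py, px] = max(w[py, px], 1.0 - 0.1 * d_m)
    return w

# Equation (3) then gates the residual unit's shortcut with this map:
#   y = F(x) + w * x  (Hadamard product), followed by x_next = f(y).
```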
After the face is detected, the facial landmark points are extracted by the face alignment algorithm, and the face images are then resized to 299 × 299 pixels. Because larger images and sequences support deeper networks, which can extract more abstract features from the sequences, larger images are chosen as input. All networks use identical settings and are trained on each database separately; the accuracy of the networks is assessed with a subject-independent task and a cross-database task.
In the subject-independent task, each database is divided into a training set and a validation set in a strictly subject-independent manner. On all databases, results are evaluated with 5-fold cross-validation, and the recognition rates are averaged over the 5 folds. For each database and each fold, the proposed network is trained with the settings described above; for comparison, the landmark multiplication unit can be deleted and replaced with a simple shortcut between the input and output of the residual unit. 20% of the subjects are randomly selected as the test group, and the test results for these subjects are reported.
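A strictly subject-independent split can be produced with scikit-learn's GroupKFold, as in this toy sketch (the subject labels are invented for illustration):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

subject_ids = np.repeat(np.arange(10), 2)        # 20 sequences, 10 subjects
X = np.arange(len(subject_ids)).reshape(-1, 1)   # sequence indices

for fold, (tr, va) in enumerate(GroupKFold(n_splits=5).split(X, groups=subject_ids)):
    # No subject may appear in both the training and validation folds.
    assert set(subject_ids[tr]).isdisjoint(subject_ids[va])
    print(f"fold {fold}: validation subjects {sorted(set(subject_ids[va]))}")
```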
In the cross-database task, to test each database, that database is used entirely for testing the network while the remaining databases are used for training. The test results show that the proposed method improves the success rate of expression recognition.
Further, in the long short-term memory network unit, the feature maps obtained from the 3DIR units carry the notion of time across the feature-map sequence. The resulting 3DIR feature maps are vectorized along the sequence dimension to form the sequential input required by the LSTM unit; the vectorized feature maps are fed to the LSTM unit so that the temporal order of the input sequence is preserved as the feature maps are delivered to the LSTM unit. In the training stage, asynchronous stochastic gradient descent is used with a momentum of 0.9, a weight decay of 0.0001, and a learning rate of 0.01, with categorical cross-entropy as the loss function and accuracy as the evaluation metric.
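The stated hyperparameters translate directly into, for example, a PyTorch training step; the placeholder linear model and random batch below are stand-ins for the 3DIR+LSTM network and real data, and the distributed, asynchronous aspect of the gradient descent is not reproduced:

```python
import torch
import torch.nn as nn

model = nn.Linear(256, 6)                     # placeholder for 3DIR+LSTM, 6 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()             # categorical cross-entropy

features = torch.randn(8, 256)                # dummy batch of sequence features
labels = torch.randint(0, 6, (8,))
optimizer.zero_grad()
loss = criterion(model(features), labels)     # loss; accuracy tracked separately
loss.backward()
optimizer.step()
print("training loss:", loss.item())
```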
Brief description of the drawings
Fig. 1 is a system flowchart of the method for facial expression recognition based on deep 3D convolutional neural networks of the present invention.
Fig. 2 is a network framework diagram of the method for facial expression recognition based on deep 3D convolutional neural networks of the present invention.
Fig. 3 is a facial landmark diagram of the method for facial expression recognition based on deep 3D convolutional neural networks of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments may be combined with one another. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a system flowchart of the method for facial expression recognition based on deep 3D convolutional neural networks of the present invention. The method mainly includes the 3D Inception-ResNet network, facial landmarks, and the long short-term memory network unit.
The deep 3D convolutional neural network consists of a 3D Inception-ResNet layer (3DIR) and a long short-term memory (LSTM) network. The LSTM follows the 3D Inception-ResNet layers, and together they extract the spatial relationships within face images and the temporal relationships between different frames of a video. Facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted facial landmarks are used as a network input; this improves the ability to recognize subtle changes of facial expression in a sequence, allowing more accurate recognition.
The long short-term memory network (LSTM) provides the memory function and is responsible for recording contextual information over long time spans. It comprises an input gate (i), a forget gate (f), and an output gate (o); at each time step t these three gates are respectively responsible for the rewriting, maintenance, and retrieval of the memory cell c. Let σ(x) = (1 + exp(−x))⁻¹ be the sigmoid function and φ(x) = tanh(x) the hyperbolic tangent, and let x, h, c, W, and b denote the input, output, cell state, parameter matrices, and bias vectors, respectively. Given the inputs x_t, h_{t−1}, and c_{t−1} at time step t, the LSTM update is given by equation (1) above.
Fig. 2 is a network framework diagram of the method for facial expression recognition based on deep 3D convolutional neural networks of the present invention. A video sequence is input, and the 3DIR combined with the facial landmarks enhances the facial expression features; the subsequent LSTM network takes the enhanced feature maps produced by the 3DIR layers as input, extracts the temporal information from them, and outputs the result through a fully connected layer with a softmax activation function.
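A minimal sketch of this dataflow after the 3DIR backbone, under the assumptions that each of the 10 frames yields a 256-dimensional feature vector and that six expression classes are used:

```python
import torch
import torch.nn as nn

class ExpressionHead(nn.Module):
    """LSTM over per-frame 3DIR features, then a fully connected layer;
    the softmax is folded into the cross-entropy loss at training time."""
    def __init__(self, feat_dim=256, hidden=128, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, feats):          # feats: (N, 10, feat_dim), time-ordered
        out, _ = self.lstm(feats)      # temporal information across frames
        return self.fc(out[:, -1])     # logits from the last time step

logits = ExpressionHead()(torch.randn(2, 10, 256))
print(logits.shape)                    # torch.Size([2, 6])
```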
Further, the 3D Inception-ResNet network achieves a higher recognition rate. Its network structure is as follows: the input is a video of size 10 × 299 × 299 × 3, where the frame count is 10, each frame measures 299 × 299, and 3 denotes the color channels, followed by a stem layer. The 3DIR part comprises A, B, and C layers: the 3DIR-A layers reduce the grid size from 38 × 38 to 18 × 18, the 3DIR-B layers reduce the grid size from 18 × 18 to 8 × 8, and the 3DIR-C layers are followed by average pooling; the result is finally output through a fully connected layer.
Fig. 3 is a facial landmark diagram of the method for facial expression recognition based on deep 3D convolutional neural networks of the present invention. Further, facial landmarks are used in the network architecture to distinguish the major facial components from the other parts that contribute less to facial expression. In facial expression recognition, extracting facial landmarks improves the recognition rate. The temporal order of the frames is preserved in the network, and the CNN and LSTM are trained simultaneously in a single network. Building on the original residual network, the facial landmarks are combined with the residual units by replacing the shortcut path: the facial landmarks are multiplied element-wise with the input tensor of the residual unit. To extract the facial landmarks, facial bounding boxes are obtained with the face detector of a cross-platform computer vision library, and 66 facial landmark points are extracted using a face alignment algorithm that regresses local binary features.
In the face alignment algorithm, after the facial landmarks of all databases are detected and saved, a facial landmark filter is generated for each sequence in the training stage. Given the facial landmarks of every frame in a sequence, all images in the sequence are resized to match the size of the filters in the network, and weights are assigned to all pixels of every frame in the sequence according to the detected landmark positions: the closer a pixel is to a facial landmark, the larger the weight it receives. Using the Manhattan distance together with a linear weighting function brings the expression recognition rate on the databases to a higher level. The Manhattan distance between a facial landmark and a pixel is the sum of the differences of their corresponding components, and the weighting function that assigns a weight to each feature is a simple linear function of the Manhattan distance, defined as follows:

w(L, P) = 1 − 0.1 d_M(L, P)      (2)

where d_M(L, P) is the Manhattan distance between facial landmark L and pixel P. Each facial landmark position has a peak weight, and the pixels around it receive lower weights in proportion to their distance from the corresponding landmark. To avoid overlap between the weights of two adjacent landmarks, a 7 × 7 window is defined around each facial landmark, and each landmark applies the weighting function over only these 49 pixels. The facial landmarks are added to the network by replacing the shortcut path of the original residual network with the element-wise multiplication of the weighting function w and the input layer x:

y_l = F(x_l) + w ∘ x_l,  x_{l+1} = f(y_l)      (3)

where x_l and x_{l+1} are the input and output of the l-th layer respectively, ∘ is the Hadamard product, F is the residual function, and f is the activation function.
After the face is detected, the facial landmark points are extracted by the face alignment algorithm, and the face images are then resized to 299 × 299 pixels. Because larger images and sequences support deeper networks, which can extract more abstract features from the sequences, larger images are chosen as input. All networks use identical settings and are trained on each database separately; the accuracy of the networks is assessed with a subject-independent task and a cross-database task.
In the subject-independent task, each database is divided into a training set and a validation set in a strictly subject-independent manner. On all databases, results are evaluated with 5-fold cross-validation, and the recognition rates are averaged over the 5 folds. For each database and each fold, the proposed network is trained with the settings described above; for comparison, the landmark multiplication unit can be deleted and replaced with a simple shortcut between the input and output of the residual unit. 20% of the subjects are randomly selected as the test group, and the test results for these subjects are reported.
In the cross-database task, to test each database, that database is used entirely for testing the network while the remaining databases are used for training. The test results show that the proposed method improves the success rate of expression recognition.
Further, in the long short-term memory network unit, the feature maps obtained from the 3DIR units carry the notion of time across the feature-map sequence. The resulting 3DIR feature maps are vectorized along the sequence dimension to form the sequential input required by the LSTM unit; the vectorized feature maps are fed to the LSTM unit so that the temporal order of the input sequence is preserved as the feature maps are delivered to the LSTM unit. In the training stage, asynchronous stochastic gradient descent is used with a momentum of 0.9, a weight decay of 0.0001, and a learning rate of 0.01, with categorical cross-entropy as the loss function and accuracy as the evaluation metric.
It will be apparent to those skilled in the art that the present invention is not restricted to the details of the above embodiments and can be realized in other specific forms without departing from its spirit and scope. Moreover, those skilled in the art may make various changes and modifications to the present invention without departing from its spirit and scope, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Claims (10)
- 1. A method for facial expression recognition based on deep 3D convolutional neural networks, characterized in that it mainly comprises: a 3D Inception-ResNet network (one); facial landmarks (two); and a long short-term memory network unit (three).
- 2. The deep 3D convolutional neural networks according to claim 1, characterized in that the network consists of a 3D Inception-ResNet layer (3DIR) and a long short-term memory (LSTM) network; the LSTM follows the 3D Inception-ResNet layers, and together they extract the spatial relationships within face images and the temporal relationships between different frames of a video; facial landmarks help the network attend to the more important facial components in the feature maps, so the extracted facial landmarks are used as a network input, improving the ability to recognize subtle changes of facial expression in a sequence and thereby allowing more accurate recognition.
- 3. The long short-term memory network (LSTM) according to claim 2, characterized in that the LSTM provides the memory function and is responsible for recording contextual information over long time spans; it comprises an input gate (i), a forget gate (f), and an output gate (o), which at each time step t are respectively responsible for the rewriting, maintenance, and retrieval of the memory cell c; let σ(x) = (1 + exp(−x))⁻¹ be the sigmoid function and φ(x) = tanh(x) the hyperbolic tangent, and let x, h, c, W, and b denote the input, output, cell state, parameter matrices, and bias vectors, respectively:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
g_t = φ(W_C · [h_{t−1}, x_t] + b_C)
C_t = f_t ∗ C_{t−1} + i_t ∗ g_t
h_t = o_t ∗ φ(C_t)      (1)
given the inputs x_t, h_{t−1}, and c_{t−1} at time step t, the LSTM update is given by equation (1).
- 4. The 3D Inception-ResNet network (one) according to claim 1, characterized in that the 3D Inception-ResNet network achieves a higher recognition rate; its network structure is as follows: the input is a video of size 10 × 299 × 299 × 3, where the frame count is 10, each frame measures 299 × 299, and 3 denotes the color channels, followed by a stem layer; the 3DIR part comprises A, B, and C layers, the 3DIR-A layers reducing the grid size from 38 × 38 to 18 × 18 and the 3DIR-B layers reducing it from 18 × 18 to 8 × 8; the 3DIR-C layers are followed by average pooling, and the result is finally output through a fully connected layer.
- 5. The facial landmarks (two) according to claim 2, characterized in that facial landmarks are used in the network architecture to distinguish the major facial components from the other parts that contribute less to facial expression; in facial expression recognition, extracting facial landmarks improves the recognition rate; the temporal order of the frames is preserved in the network, and the CNN and LSTM are trained simultaneously in a single network; building on the original residual network, the facial landmarks are combined with the residual units by replacing the shortcut path, multiplying the facial landmarks element-wise with the input tensor of the residual unit; to extract the facial landmarks, facial bounding boxes are obtained with the face detector of a cross-platform computer vision library, and 66 facial landmark points are extracted using a face alignment algorithm that regresses local binary features.
- 6. The face alignment algorithm according to claim 5, characterized in that after the facial landmarks of all databases are detected and saved, a facial landmark filter is generated for each sequence in the training stage; given the facial landmarks of every frame in a sequence, all images in the sequence are resized to match the size of the filters in the network, and weights are assigned to all pixels of every frame in the sequence according to the detected landmark positions, with pixels closer to a facial landmark receiving larger weights; using the Manhattan distance together with a linear weighting function brings the expression recognition rate on the databases to a higher level; the Manhattan distance between a facial landmark and a pixel is the sum of the differences of their corresponding components, and the weighting function that assigns a weight to each feature is a simple linear function of the Manhattan distance, defined as follows:
w(L, P) = 1 − 0.1 d_M(L, P)      (2)
where d_M(L, P) is the Manhattan distance between facial landmark L and pixel P; each facial landmark position has a peak weight, and the surrounding pixels receive lower weights in proportion to their distance from the corresponding landmark; to avoid overlap between the weights of two adjacent landmarks, a 7 × 7 window is defined around each facial landmark, and each landmark applies the weighting function over only these 49 pixels; the facial landmarks are added to the network by replacing the shortcut path of the original residual network with the element-wise multiplication of the weighting function w and the input layer x:
y_l = F(x_l) + w ∘ x_l,  x_{l+1} = f(y_l)      (3)
where x_l and x_{l+1} are the input and output of the l-th layer respectively, ∘ is the Hadamard product, F is the residual function, and f is the activation function.
- 7. The facial landmark points according to claim 5, characterized in that after the face is detected, the facial landmark points are extracted by the face alignment algorithm and the face images are then resized to 299 × 299 pixels; because larger images and sequences support deeper networks, which can extract more abstract features from the sequences, larger images are chosen as input; all networks use identical settings and are trained on each database separately, and the accuracy of the networks is assessed with a subject-independent task and a cross-database task.
- 8. The subject-independent task according to claim 7, characterized in that each database is divided into a training set and a validation set in a strictly subject-independent manner; on all databases, results are evaluated with 5-fold cross-validation, and the recognition rates are averaged over the 5 folds; for each database and each fold, the proposed network is trained with the settings described above, with the landmark multiplication unit deleted and replaced by a simple shortcut between the input and output of the residual unit for comparison; 20% of the subjects are randomly selected as the test group, and the test results for these subjects are reported.
- 9. The cross-database task according to claim 7, characterized in that in the cross-database task, to test each database, that database is used entirely for testing the network while the remaining databases are used for training; the test results show that the proposed method improves the success rate of expression recognition.
- 10. The long short-term memory network unit (three) according to claim 1, characterized in that the feature maps obtained from the 3DIR units carry the notion of time across the feature-map sequence; the resulting 3DIR feature maps are vectorized along the sequence dimension to form the sequential input required by the LSTM unit; the vectorized feature maps are fed to the LSTM unit so that the temporal order of the input sequence is preserved as the feature maps are delivered to the LSTM unit; in the training stage, asynchronous stochastic gradient descent is used with a momentum of 0.9, a weight decay of 0.0001, and a learning rate of 0.01, with categorical cross-entropy as the loss function and accuracy as the evaluation metric.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710713962.5A | 2017-08-18 | 2017-08-18 | Method for facial expression recognition based on deep 3D convolutional neural networks
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201710713962.5A | 2017-08-18 | 2017-08-18 | Method for facial expression recognition based on deep 3D convolutional neural networks
Publications (1)
Publication Number | Publication Date
---|---
CN107463919A | 2017-12-12
Family
ID=60550015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201710713962.5A (CN107463919A, withdrawn) | Method for facial expression recognition based on deep 3D convolutional neural networks | 2017-08-18 | 2017-08-18
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107463919A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN108062538A * | 2017-12-29 | 2018-05-22 | 成都智宝大数据科技有限公司 | Face recognition method and device
CN108229338A * | 2017-12-14 | 2018-06-29 | 华南理工大学 | Video behavior recognition method based on deep convolutional features
CN108280400A * | 2017-12-27 | 2018-07-13 | 广东工业大学 | Expression recognition method based on a deep residual network
CN108319900A * | 2018-01-16 | 2018-07-24 | 南京信息工程大学 | Basic facial expression classification method
CN108376234A * | 2018-01-11 | 2018-08-07 | 中国科学院自动化研究所 | Emotion recognition system and method for video images
CN108596865A * | 2018-03-13 | 2018-09-28 | 中山大学 | Feature map enhancement system and method for convolutional neural networks
CN108682006A * | 2018-04-25 | 2018-10-19 | 南京农业大学 | Non-contact canned compost maturity judging method
CN108960122A * | 2018-06-28 | 2018-12-07 | 南京信息工程大学 | Expression classification method based on spatio-temporal convolutional features
CN109165573A * | 2018-08-03 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for extracting video feature vectors
CN109657716A * | 2018-12-12 | 2019-04-19 | 天津卡达克数据有限公司 | Vehicle appearance damage recognition method based on deep learning
CN109815835A * | 2018-12-29 | 2019-05-28 | 联动优势科技有限公司 | Interactive liveness detection method
CN110046551A * | 2019-03-18 | 2019-07-23 | 中国科学院深圳先进技术研究院 | Method and device for generating a face recognition model
CN110287773A * | 2019-05-14 | 2019-09-27 | 杭州电子科技大学 | Transport hub security inspection image recognition method based on autonomous learning
CN110363129A * | 2019-07-05 | 2019-10-22 | 昆山杜克大学 | Early autism screening system based on the smile paradigm and audio-video behavior analysis
CN110414544A * | 2018-04-28 | 2019-11-05 | 杭州海康威视数字技术股份有限公司 | Target state classification method, apparatus and system
WO2021042372A1 * | 2019-09-06 | 2021-03-11 | 中国医药大学附设医院 | Atrial fibrillation prediction model and prediction system thereof
US11423634B2 | 2018-08-03 | 2022-08-23 | Huawei Cloud Computing Technologies Co., Ltd. | Object detection model training method, apparatus, and device
CN117218422A * | 2023-09-12 | 2023-12-12 | 北京国科恒通科技股份有限公司 | Power grid image recognition method and system based on machine learning
CN117976173A * | 2024-03-28 | 2024-05-03 | 深圳捷工医疗装备股份有限公司 | Signal transmission call management system
2017
- 2017-08-18: CN application CN201710713962.5A filed; published as CN107463919A (status: withdrawn)
Non-Patent Citations (1)
Title
---
BEHZAD HASANI et al.: "Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks", published online: HTTPS://ARXIV.ORG/ABS/1705.07871V1 *
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN108229338A * | 2017-12-14 | 2018-06-29 | 华南理工大学 | Video behavior recognition method based on deep convolutional features
CN108280400A * | 2017-12-27 | 2018-07-13 | 广东工业大学 | Expression recognition method based on a deep residual network
CN108062538A * | 2017-12-29 | 2018-05-22 | 成都智宝大数据科技有限公司 | Face recognition method and device
CN108376234A * | 2018-01-11 | 2018-08-07 | 中国科学院自动化研究所 | Emotion recognition system and method for video images
CN108319900A * | 2018-01-16 | 2018-07-24 | 南京信息工程大学 | Basic facial expression classification method
CN108596865B * | 2018-03-13 | 2021-10-26 | 中山大学 | Feature map enhancement system and method for convolutional neural networks
CN108596865A * | 2018-03-13 | 2018-09-28 | 中山大学 | Feature map enhancement system and method for convolutional neural networks
CN108682006A * | 2018-04-25 | 2018-10-19 | 南京农业大学 | Non-contact canned compost maturity judging method
CN108682006B * | 2018-04-25 | 2021-07-20 | 南京农业大学 | Non-contact canned compost maturity judging method
CN110414544A * | 2018-04-28 | 2019-11-05 | 杭州海康威视数字技术股份有限公司 | Target state classification method, apparatus and system
CN108960122A * | 2018-06-28 | 2018-12-07 | 南京信息工程大学 | Expression classification method based on spatio-temporal convolutional features
CN109165573A * | 2018-08-03 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Method and apparatus for extracting video feature vectors
US11605211B2 | 2018-08-03 | 2023-03-14 | Huawei Cloud Computing Technologies Co., Ltd. | Object detection model training method and apparatus, and device
US11423634B2 | 2018-08-03 | 2022-08-23 | Huawei Cloud Computing Technologies Co., Ltd. | Object detection model training method, apparatus, and device
CN109657716A * | 2018-12-12 | 2019-04-19 | 天津卡达克数据有限公司 | Vehicle appearance damage recognition method based on deep learning
CN109657716B * | 2018-12-12 | 2020-12-29 | 中汽数据(天津)有限公司 | Vehicle appearance damage identification method based on deep learning
CN109815835A * | 2018-12-29 | 2019-05-28 | 联动优势科技有限公司 | Interactive liveness detection method
CN110046551A * | 2019-03-18 | 2019-07-23 | 中国科学院深圳先进技术研究院 | Method and device for generating a face recognition model
CN110287773A * | 2019-05-14 | 2019-09-27 | 杭州电子科技大学 | Transport hub security inspection image recognition method based on autonomous learning
CN110363129B * | 2019-07-05 | 2022-05-27 | 昆山杜克大学 | Early autism screening system based on the smile paradigm and audio-video behavior analysis
CN110363129A * | 2019-07-05 | 2019-10-22 | 昆山杜克大学 | Early autism screening system based on the smile paradigm and audio-video behavior analysis
JP2022523835A | 2019-09-06 | 2022-04-26 | 中國醫藥大學附設醫院 | Atrial fibrillation prediction model and prediction system thereof
WO2021042372A1 * | 2019-09-06 | 2021-03-11 | 中国医药大学附设医院 | Atrial fibrillation prediction model and prediction system thereof
CN117218422A * | 2023-09-12 | 2023-12-12 | 北京国科恒通科技股份有限公司 | Power grid image recognition method and system based on machine learning
CN117218422B * | 2023-09-12 | 2024-04-16 | 北京国科恒通科技股份有限公司 | Power grid image recognition method and system based on machine learning
CN117976173A * | 2024-03-28 | 2024-05-03 | 深圳捷工医疗装备股份有限公司 | Signal transmission call management system
CN117976173B * | 2024-03-28 | 2024-05-28 | 深圳捷工医疗装备股份有限公司 | Signal transmission call management system
Similar Documents
Publication | Publication Date | Title
---|---|---
CN107463919A | | Method for facial expression recognition based on deep 3D convolutional neural networks
Kang et al. | | Deep unsupervised embedding for remotely sensed images based on spatially augmented momentum contrast
CN104217214B | | RGB-D human activity recognition method based on configurable convolutional neural networks
Yu et al. | | Deep learning in remote sensing scene classification: a data augmentation enhanced convolutional neural network framework
Oh et al. | | Approaching the computational color constancy as a classification problem through deep learning
CN109344736B | | Static image crowd counting method based on joint learning
CN111814661B | | Human behavior recognition method based on a residual-recurrent neural network
CN107506740A | | Human behavior recognition method based on a 3D convolutional neural network and a transfer learning model
CN108921822A | | Image object counting method based on convolutional neural networks
CN105469041B | | Facial point detection system based on multi-task regularization and layer-wise supervised neural networks
CN110147743A | | Real-time online pedestrian analysis and counting system and method for complex scenes
CN109697435A | | Pedestrian flow monitoring method, device, storage medium and equipment
CN109376667A | | Object detection method, device and electronic equipment
Baveye et al. | | Deep learning for image memorability prediction: the emotional bias
CN109376747A | | Video flame detection method based on two-stream convolutional neural networks
CN106503687A | | Surveillance video person identification system and method fusing multi-angle facial features
CN107016357A | | Video pedestrian detection method based on temporal convolutional neural networks
CN107403154A | | Gait recognition method based on a dynamic vision sensor
CN105160310A | | Human behavior recognition method based on 3D convolutional neural networks
CN108090403A | | Dynamic face recognition method and system based on 3D convolutional neural networks
CN106529499A | | Gait recognition method based on fused Fourier descriptor and gait energy image features
CN104992223A | | Dense population estimation method based on deep learning
CN109615574A | | Chinese medicine recognition method and system based on GPU and dual-scale image feature comparison
CN106156765A | | Safety detection method based on computer vision
CN104298974A | | Human behavior recognition method based on depth video sequences
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20171212