
CN113505729A - Interview cheating detection method and system based on human facial action units - Google Patents

Interview cheating detection method and system based on human facial action units

Info

Publication number: CN113505729A
Authority: CN (China)
Prior art keywords: face, detection, interview, motion unit, cheating
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110842018.6A
Other languages: Chinese (zh)
Inventor: 陈昊昕
Current Assignee: Shanghai Caili Network Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Shanghai Caili Network Co ltd
Application filed by Shanghai Caili Network Co ltd
Priority to CN202110842018.6A (the priority date is an assumption and is not a legal conclusion)
Publication of CN113505729A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An interview cheating detection method and system based on human facial action units are provided. The method comprises the following steps: performing face recognition on video data acquired during an interview, extracting facial features, and encoding facial action units, head motion units, and/or eye motion units; then using a trained classifier to recognize the combination of the facial features with the facial action units, head motion units, and/or eye motion units, and judging whether cheating exists. The cheating detection method of the invention obtains the detection result by detecting, extracting, and classifying features of the facial action units with a machine-learning classification algorithm; it is not influenced by the invigilator's personal relationships or emotions, ensuring the relative fairness of the monitoring result. The invention improves interview-judgment efficiency, and its misjudgment rate is better than that of the original manual judgment method.

Description

Interview cheating detection method and system based on human facial action units
Technical Field
The invention belongs to the technical field of artificial intelligence, in particular to machine-learning-assisted technology, and specifically relates to an interview cheating detection method and system based on human facial action units.
Background
Face recognition, also known as a face recognition system, is a computer technology that identifies a person by analyzing and comparing the visual feature information of the human face; it is one of the biometric recognition technologies. Face recognition in the broad sense covers a series of related technologies for building a face recognition system, including face image acquisition, face localization, preprocessing for recognition, identity confirmation, and identity search; face recognition in the narrow sense refers to a technique or system that confirms or searches an identity through the face.
Research on face recognition systems began in the 1960s, when the researcher Bledsoe built a semi-automatic face recognition system; this work aroused great interest among researchers in many fields. Advances in computer technology and optical imaging pushed the field forward after the 1980s, and, driven by the urgent demand from many industries, face recognition became a hot topic again in the 1990s, with the leading implementations eventually coming from the United States, Germany, and Japan. The key to the success of a face recognition system is whether it possesses a cutting-edge core algorithm whose recognition rate and recognition speed are practical. After more than thirty years of research, face recognition has become one of the most successful applications in the fields of image analysis and image understanding.
At present, face recognition technology is generally applied to personal identity confirmation (for example, as an unlock password or for personal payment) and personal identity search (for example, online fugitive tracking and public-place surveillance). In application scenarios such as interviews and written examinations, however, face recognition only verifies that the candidate is taking the examination in person; no related reports have yet been found on using face recognition to judge, from the expressions and actions of interviewees, whether cheating exists.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide an interview cheating detection method and system based on human facial action units, so as to at least partially solve at least one of the above technical problems.
In order to achieve the above objective, as a first aspect of the present invention, there is provided an interview cheating detection method based on human facial action units, comprising the following steps:
performing face recognition on video data acquired during the interview, extracting facial features, and encoding facial action units, head motion units, and/or eye motion units;
using a trained classifier to recognize the combination of the facial features with the facial action units, head motion units, and/or eye motion units, and judging whether cheating exists.
As a second aspect of the present invention, there is also provided an interview cheating detection system based on human facial action units, comprising:
a face recognition unit for performing face recognition on the video data acquired during the interview, extracting facial features, and encoding the facial action units;
a classification judging unit for recognizing the facial features and the facial action units with a trained classifier and judging whether cheating exists.
A third aspect of the present invention provides an electronic device comprising a processor and a memory, the memory storing a computer-executable program; when the program is executed by the processor, the processor performs the above interview cheating detection method based on human facial action units.
A fourth aspect of the present invention provides a computer-readable medium storing a computer-executable program which, when executed, implements the above interview cheating detection method based on human facial action units.
Based on the above technical scheme, compared with the prior art, the interview cheating detection method and system based on human facial action units of the present invention have at least one of the following beneficial effects:
1. the invention solves the problems of low efficiency and high misjudgment rate when the cheating behavior of interviewees is judged manually in the prior art;
2. the invention obtains the detection result by detecting, extracting, and classifying features of the facial action units with a machine-learning classification algorithm, so the result is not influenced by the invigilator's personal relationships or emotions, ensuring the relative fairness of the monitoring result;
3. by combining a neural network with a classifier trained on massive data, the method can screen interview videos automatically, improving interview-judgment efficiency at a speed far exceeding the manual method; with enough labeled data, the misjudgment rate can be reduced to a very low level, better than that of the original manual judgment method. According to the inventor's experiments, the accuracy of the current manual method is about 50% with a judgment time of 3 min per video; with the method of the invention, the accuracy rises to 82% and the judgment time per video falls below 1 s, a great improvement in efficiency.
Drawings
FIG. 1 is a list of motion feature representations of expressive faces;
FIG. 2 is a flow chart of the interview cheating detection method based on human facial action units of the present invention;
FIG. 3 is a network structure diagram of the neural network model of the interview cheating detection system based on human facial action units of the present invention;
FIG. 4 is a schematic frame diagram of the network structure adopted in the video classification step of the interview cheating detection system based on human facial action units of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the invention;
FIG. 6 is a schematic view of a computer-readable recording medium according to an embodiment of the present invention;
FIG. 7 is a map of the primary facial action codes according to the present invention;
FIG. 8 is a map of the primary head motion unit codes according to the present invention;
FIG. 9 is a map of the primary eye motion unit codes according to the present invention;
FIG. 10 is a map of the basic expression codes according to the present invention.
Detailed Description
In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that one skilled in the art may, in certain cases, practice the present invention in a manner that does not include the structures, properties, effects or other features described above.
The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities; that is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processing-unit devices and/or microcontroller devices.
The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, so their repeated description may be omitted hereinafter. It will also be understood that, although terms such as first, second, and third may be used herein to describe various devices, elements, components, or sections, these should not be limited by such terms, which serve only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Moreover, the term "and/or" refers to all combinations that include any one or more of the listed items.
Face recognition, also known as a face recognition system, is a computer technology that identifies a person by analyzing and comparing the visual feature information of the human face; it is one of the biometric recognition technologies. After more than thirty years of research, face recognition has become one of the most successful applications in the fields of image analysis and image understanding.
In general, a face recognition system includes image capture, face localization, image preprocessing, and face recognition proper (identity verification or identity lookup). The system input is typically one or a series of face images of undetermined identity, together with several face images of known identity (or their corresponding codes) from a face database; the output is a series of similarity scores indicating the identity of the face to be recognized.
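To make this input/output relationship concrete, the following is a minimal, non-limiting sketch: it assumes a hypothetical embedding step has already turned each face image into a feature vector, and it ranks known identities in a gallery by cosine similarity. It illustrates the generic behavior described above, not the method of this invention.

```python
# Illustrative only: ranking known identities by similarity score.
# The embedding step that produces these vectors is assumed, not shown.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def rank_identities(query: np.ndarray, gallery: dict) -> list:
    """Return (identity, score) pairs, best match first."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)
```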
Current face recognition algorithms can be classified as: recognition algorithms based on facial feature points (feature-based recognition algorithms), recognition algorithms based on the whole face image (appearance-based recognition algorithms), template-based recognition algorithms, recognition algorithms using neural networks, and recognition algorithms using support vector machines (SVMs), among others.
Although research and development of face recognition technology is flourishing, the technology is rarely applied to the recognition of fraud. One reason is that fraudulent actions are usually small in amplitude and fast, while the frame rate of a typical video acquisition system is low, so the relevant actions cannot be captured; moreover, fraudulent actions are easily confused with other involuntary small movements, which makes machine-learning training difficult and the recognition rate low. After long research, the inventors opened a new path: on the basis of existing face recognition technology, and aiming at the low efficiency and high misjudgment rate of manually screening interview videos for cheating in the prior art, the invention provides an interview cheating detection method based on human facial action units.
Specifically, the method uses a convolutional neural network to detect faces in the interview video; once a face is detected, it separately detects several kinds of motion, such as the facial action units, head motion, and eye motion, which differs from conventional machine learning that trains directly on whole-face action expressions. The combination codes used here were obtained through the inventors' extensive practice and machine learning.
Accordingly, the training method of the adopted convolutional neural network specifically includes the following steps:
constructing a neural network model for identifying cheating, the neural network model preferably being a convolutional neural network model;
using the neural network model to detect faces in an interview video, in particular a high-frame-rate interview video (the conventional rate is 24 frames/second; preferably at least 36 frames/second, such as 50 or 200 frames/second); after a face is detected, detecting the facial action units, head motion units, and/or eye motion units separately; labeling the data specially according to the specific combination codes of the different parts obtained through research to obtain a large amount of sample data; and then training a classifier with a machine-learning classification method (a sketch of this flow follows).
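The following minimal sketch shows this training flow under explicit assumptions: `detect_face` and `encode_action_units` are hypothetical stand-ins for the face detector and AU coder (which the patent does not specify at code level), the per-video mean-pooling is a naive choice, and scikit-learn's RandomForestClassifier is only one possible machine-learning classifier.

```python
# Non-limiting sketch of the training flow; the two helper callables are
# assumed, and the pooling and classifier choice are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_samples(videos, labels, detect_face, encode_action_units):
    X, y = [], []
    for frames, label in zip(videos, labels):        # label: 1 = cheating, 0 = normal
        codes = [encode_action_units(face)           # facial + head + eye AU codes
                 for frame in frames                 # high-frame-rate stream (>= 36 fps)
                 if (face := detect_face(frame)) is not None]
        if codes:
            X.append(np.mean(codes, axis=0))         # naive pooling over the video
            y.append(label)
    return np.array(X), np.array(y)

# classifier = RandomForestClassifier().fit(*build_samples(videos, labels,
#                                                          detect_face,
#                                                          encode_action_units))
```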
Accordingly, the corresponding identification method specifically includes the following steps:
performing face recognition on video data acquired during the interview, extracting facial features, and encoding the facial action units;
using the trained classifier to recognize the facial features together with the facial action units, head motion units, and/or eye motion units, and judging whether cheating exists (see the sketch below).
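The identification side then mirrors the training flow. The schematic below reuses the same hypothetical helpers and a classifier trained as sketched above; it illustrates the two steps rather than the patent's exact pipeline.

```python
# Schematic of the two identification steps, with the same assumed helpers.
import numpy as np

def detect_cheating(frames, detect_face, encode_action_units, classifier) -> bool:
    codes = [encode_action_units(face)
             for frame in frames
             if (face := detect_face(frame)) is not None]
    if not codes:
        return False                          # no face found: nothing to judge
    pooled = np.mean(codes, axis=0)           # same pooling as at training time
    return bool(classifier.predict([pooled])[0])  # True = cheating suspected
```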
The facial features, i.e., face key points, are the important feature points of each part of the face, usually contour points and corner points. Face key point detection is a key step in the field of face recognition and analysis, and a precondition and breakthrough for other face-related problems such as automatic face recognition, expression analysis, three-dimensional face reconstruction, and three-dimensional animation. In recent years, deep learning, with its capability for automatic and continuous learning, has been successfully applied to image recognition and analysis, speech recognition, natural language processing, and many other fields, bringing significant improvements. Face key point detection locates the key points of a face in a given image, including points of the eyebrow, eye, nose, mouth, and facial contour regions; because it is affected by factors such as pose and occlusion, it is a challenging task. Its main applications at present are: (1) face pose alignment: face recognition algorithms need to align the face pose to improve model accuracy; (2) face beautification and editing: based on the key points, the face shape, eye shape, nose shape, and so on can be analyzed precisely, so specific facial positions can be modified, enabling entertainment functions such as beautification and face stickers and assisting some face-editing algorithms; (3) facial expression analysis: analyzing expressions based on the key points, for scenes such as interactive entertainment and behavior prediction. The present invention is designed and developed for application (3).
The facial features may be detected, for example, using an image pyramid technique, to further optimize the recognition of the facial features and improve the classification accuracy.
For example, a Multi-Task Cascaded Convolutional Network (MTCNN) may be used to build the neural network model, performing face region detection and face key point detection simultaneously. The MTCNN model mainly uses three cascaded networks and the idea of adding a classifier to candidate boxes for fast and efficient face detection. Its overall structure is divided into a three-layer network structure of P-Net, R-Net, and O-Net: P-Net rapidly generates candidate windows, R-Net performs high-precision filtering and selection of the candidate windows, and O-Net generates the final bounding box and the face key points. Techniques such as the image pyramid, bounding-box regression, and non-maximum suppression (NMS) are used.
P-Net, in full the Proposal Network, is essentially a fully convolutional network (FCN). The image pyramid built in the previous step undergoes preliminary feature extraction and box calibration in this network; bounding-box regression then adjusts the windows, and NMS filters out most of them.
R-Net, in full the Refine Network, is a convolutional neural network; compared with the first-layer P-Net, a fully connected layer is added, so the screening of input data is stricter. The basic idea is to use a more complex network structure than P-Net to further select and adjust the candidate face windows generated by P-Net, achieving high-precision filtering and face-region optimization.
O-Net, in full the Output Network, is a more complex convolutional neural network that adds one more convolutional layer compared with R-Net. It differs from R-Net in that this layer identifies the facial region with more supervision and also regresses the facial feature points, finally outputting five facial key points. The basic idea of O-Net is similar to that of R-Net: a more complex network is used to optimize model performance.
The detailed inference process for face recognition with the multi-task cascaded convolutional network comprises the following steps:
Step 1: repeatedly resize the test image to obtain an image pyramid. The test image is resized by resize_factor (generally between 0.70 and 0.80) until its size reaches the 12 × 12 minimum required by P-Net, yielding images of different sizes: the original image, original × resize_factor, original × resize_factor^2, ..., original × resize_factor^n.
Step 2: feed the obtained image pyramid into P-Net to obtain a large number of candidates; the output map shape is (m, n, 16). Most candidates are screened out by the classification score; the bbox is calibrated with the 4 predicted offsets to obtain its upper-left and lower-right coordinates, and non-maximum suppression (NMS) over the IoU values screens out most of the remaining candidates. Repeating this operation finally yields (num_left_after_nms, 16) candidates.
Step 3: crop the regions from the original image according to the coordinates output by P-Net, using a square crop on the longest side to avoid deformation and keep more detail, then resize to 24 × 24 and feed into R-Net for fine adjustment. R-Net still outputs a one-hot 2-class score, 4 bbox coordinate offsets, and 10 landmark values; most non-face candidates are eliminated by the two-class score, offset adjustment is applied to the bboxes, and the IoU-based NMS described for P-Net is repeated to eliminate most candidates. R-Net finally outputs (num_left_after_rnet, 16) candidates; the original image is cropped by the bbox coordinates (again as a square on the longest side, avoiding deformation and keeping more detail) and fed into O-Net.
Step 4: roughly repeat the P-Net process, with the difference that, in addition to the bbox coordinates, the landmark coordinates are also output. After classification screening, box adjustment, and NMS screening, accurate face bbox coordinates and landmark points are obtained and the task is completed. (A skeleton of the full cascade is sketched below.)
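The four steps can be summarized in one skeleton. Everything here is an assumption-level sketch: the stage networks and helpers are passed in as callables (concrete pyramid, NMS, and square-crop sketches appear in Example 1 below), and the 48 × 48 O-Net input size is the MTCNN convention, which the text itself does not state.

```python
# Skeleton of the P-Net -> R-Net -> O-Net cascade; all callables are assumed.
def mtcnn_infer(img, pnet, rnet, onet, pyramid, nms, square_crop_resize):
    boxes = [b for level in pyramid(img) for b in pnet(level)]          # steps 1-2
    boxes = nms(boxes, iou_threshold=0.6)                               # thin overlaps
    boxes = nms(rnet([square_crop_resize(img, b, 24) for b in boxes]),  # step 3
                iou_threshold=0.6)
    return onet([square_crop_resize(img, b, 48) for b in boxes])        # step 4: bbox + 5 landmarks
```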
Training samples are constructed, for example, using a facial behavior coding system and the Facial Action Coding System (FACS). The facial behavior coding system comprises at least 6 basic expressions (happiness, sadness, surprise, fear, anger, and disgust) and systematically establishes a library of thousands of different facial expression images; FACS divides the human face into about 46 action units (AUs) that are independent yet interconnected according to the anatomical characteristics of the face, analyzes the motion characteristics of these action units, the main regions they control, and the expressions related to them, and provides a large number of illustrative pictures. In the present invention, the facial, head, and eye motions are further optimized and encoded as shown in FIGS. 7, 8, and 9. The basic expression codes are shown in FIG. 10; for example, '6 + 12' is used for 'happy', and the corresponding look-up table reads 'cheek raised' + 'mouth corner raised'. For a cheating action, the corresponding micro-expression or action is decomposed, which avoids the situation where a whole action video is easily confused with other unintentional small movements and the recognition accuracy drops.
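To illustrate how such a look-up works, here is a minimal sketch. Only the '6 + 12' to 'happy' pairing comes from the text (it matches the standard FACS reading of AU6 cheek raiser plus AU12 lip corner puller); the full tables live in FIGS. 7-10, which are not reproduced here.

```python
# Minimal AU-combination lookup; only the "happy" row is given in the text.
EXPRESSION_TABLE = {
    frozenset({6, 12}): "happy",   # AU6 "cheek raised" + AU12 "mouth corner raised"
    # ... remaining rows follow FIG. 10, not reproduced here
}

def decode_expression(active_aus: set) -> str:
    for combo, name in EXPRESSION_TABLE.items():
        if combo <= active_aus:    # every AU in the combination is active
            return name
    return "unknown"
```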
It should be noted that there are 42 muscles in the human face, controlled by different regions of the brain; some can be controlled directly by consciousness, while others are not easily controlled consciously. Muscles that can be directly controlled by consciousness are called "voluntary muscles"; muscles that most people cannot control through consciousness are called "involuntary muscles". According to the muscle characteristics of the human face, facial behavior can be divided into 46 facial action units, 8 head motion units, and 4 eye motion units (shown in FIGS. 7-9). Through combinations of these facial, head, and eye motion units, a person's emotion and facial movement can be recognized, and the behavior judged accordingly.
By using the facial behavior coding system and the facial movement coding system, the sources of model training samples can be richer, the training samples are more accurate, and the obtained results are more accurate.
In the encoding step of the facial action units, the 68 face key points are located from the landmarks obtained in the previous step, detection is performed independently per regional ROI, and the detections of all ROIs are finally merged to obtain the detection result for the whole face.
The encoding step further includes grouping different AUs according to prior knowledge of the facial action units. AUs occurring in the same positional area are put into one group, and the face can be divided into 9 groups such as the left eye, right eye, and nose wing. The AU result of a single face is obtained from the correspondence between the facial action units and the face key points (a grouping sketch follows).
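The sketch below illustrates this grouping step, with the caveat that the concrete AU-to-region assignments are assumptions: the text fixes only the number of groups (9) and names three of them.

```python
# Regional grouping of AUs; assignments are illustrative. Only the 9-group
# split with left eye / right eye / nose wing comes from the text.
AU_GROUPS = {
    "left_eye":  {1, 2, 5, 7},   # assumed brow/lid AUs
    "right_eye": {1, 2, 5, 7},
    "nose_wing": {9},            # AU9 nose wrinkler (standard FACS)
    # ... six further regional groups omitted
}

def merge_roi_detections(per_roi: dict) -> set:
    """Merge per-ROI AU detections into one whole-face AU result."""
    whole_face = set()
    for region, detected_aus in per_roi.items():
        whole_face |= set(detected_aus)
    return whole_face
```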
The classifier is trained by the following method:
A two-channel convolutional neural network is constructed, where one channel's input is the video stream and the other channel's input is the obtained matrix of AU codes. Spatial semantic information of the face, head, and eyes is captured through the encoding of the facial, head, and eye motion units, while motion information at fine temporal resolution is captured through the high-frame-rate video. At the decoding layer, the feature maps obtained from the two channels are fused, finally yielding a classification result on whether the interviewee in the video is cheating (a network sketch follows).
The invention also discloses an electronic device comprising a processor and a memory, wherein the memory is used for storing a computer executable program, and when the computer executable program is executed by the processor, the processor executes the interview cheating detection method based on the human face movement unit.
The electronic device may be embodied in the form of a general purpose computing device, for example. The number of the processors may be one, or may be multiple and work together. The invention also does not exclude that distributed processing is performed, i.e. the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a sum of a plurality of entity devices.
Wherein the memory stores a computer executable program, typically machine readable code, which is executable by said processor to enable the electronic device to perform the method of the invention, or at least some of the steps of the method.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).
Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for data exchange between the electronic device and an external device. The I/O interface may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and/or a memory storage device using any of a variety of bus architectures.
Elements or components not shown in the above examples may also be included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method or at least part of the steps of the method.
The invention also discloses a computer readable medium, on which a computer executable program is stored, wherein the computer executable program, when executed, realizes the above-mentioned interview cheating detection method based on the human face movement unit.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Python, Java, and C++ as well as conventional procedural programming languages such as C, assembly language, or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
Example 1
In this embodiment, the training data are self-labeled video data: about 5,000,000 video segments with a total duration of about 2,500,000 hours.
1. Face recognition and feature point extraction
The system flow chart is shown in FIG. 2. The method first performs face recognition on the input video and detects the face position coordinates in it. Face recognition uses a multi-task cascaded convolutional network (MTCNN).
The inference process of the multi-task cascaded convolutional network is as follows:
step 1, first, continuously performing Resize on a test picture to obtain a picture pyramid.
The resize _ factor can be determined according to the face size distribution of the data set, and is usually set to be 0.70-0.80, and if the face size distribution is too large, the inference time is easy to prolong, and if the face size distribution is too small, some small and medium-sized faces are easy to miss. And performing resize on the test picture according to the set resize _ factor until the size is larger than or equal to 12 x 12 required by Pnet. This makes it possible to obtain a series of pictures of different sizes, such as the original image, the original image reset _ factor 2, the original image reset _ factor n, and the like, and the pictures are stacked to form a pyramid, which is called a picture pyramid. In the resize step, the size of the last image is equal to or larger than 12, and it should be noted that these images are all inputted into Pnet one by one to obtain candidates.
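As a non-limiting sketch of this step, assuming OpenCV is available for resizing:

```python
# Image pyramid as described: shrink by resize_factor until the next level
# would fall below P-Net's 12 x 12 minimum. OpenCV is assumed for resizing.
import cv2

def image_pyramid(img, resize_factor=0.75, min_size=12):
    levels, scale = [], 1.0
    h, w = img.shape[:2]
    while min(h, w) * scale >= min_size:
        levels.append(cv2.resize(img, (int(w * scale), int(h * scale))))
        scale *= resize_factor
    return levels   # fed to P-Net one by one, largest first
```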
Step 2: feed the obtained image pyramid into P-Net to obtain a large number of candidates. All images from the pyramid of step 1 are input into P-Net, giving an output map of shape (m, n, 16). A large portion of the candidates is screened out by the classification score, the bbox is calibrated with the 4 predicted offsets to obtain its upper-left and lower-right coordinates, and non-maximum suppression (NMS) over the IoU values then screens out most of the rest. Specifically, a tensor of shape (num_left, 4), i.e. the upper-left and lower-right absolute coordinates of num_left bboxes, is sorted by classification score from large to small. The IoU between the highest-scoring bbox in the queue and each remaining bbox is computed; boxes with IoU greater than 0.6 (a preset threshold) are removed, and the highest-scoring box is moved to the final result. Repeating this operation removes many heavily overlapping bboxes and finally yields (num_left_after_nms, 16) candidates; the corresponding regions are cropped from the original image by the bbox coordinates, resized to 24 × 24, and input into R-Net.
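A self-contained sketch of this greedy IoU-based NMS loop follows; boxes are rows of (x1, y1, x2, y2, score), and 0.6 is the threshold quoted above.

```python
# Greedy NMS as described: keep the best-scoring box, drop boxes whose IoU
# with it exceeds the preset threshold, and repeat on the remainder.
import numpy as np

def box_area(b):
    return (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])

def nms(boxes: np.ndarray, iou_threshold: float = 0.6) -> np.ndarray:
    order = boxes[:, 4].argsort()[::-1]          # indices by score, descending
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # intersection of the winning box with each remaining candidate
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        iou = inter / (box_area(boxes[[best]])[0] + box_area(boxes[rest]) - inter)
        order = rest[iou <= iou_threshold]       # survivors for the next round
    return boxes[keep]
```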
Step 3: fine-tune the candidate images screened by P-Net through R-Net. The regions are cropped from the original image according to the coordinates output by P-Net; when cropping, the bbox is expanded to a square on its longest edge, so that no deformation occurs and more detail around the face box is preserved when resizing to 24 × 24, and the result is fed into R-Net for fine adjustment. R-Net still outputs a one-hot 2-class score, 4 bbox coordinate offsets, and 10 landmark values; most non-face candidates are removed by the two-class score, offset adjustment is applied to the bboxes (the upper-left and lower-right x and y coordinates are adjusted), and the IoU-based NMS described for P-Net is repeated to remove most candidates. R-Net finally outputs (num_left_after_rnet, 16) candidates; the original image is cropped by the bbox coordinates (again as a square on the longest side, avoiding deformation and keeping more detail) and input into O-Net.
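The square-crop rule can be sketched as below; border clamping is simplified, and the output size (24 × 24 here per the text; 48 × 48 for O-Net is the MTCNN convention, an assumption) is passed in by the caller.

```python
# Expand a bbox to a square on its longest edge before resizing, so the
# face is not distorted; border handling is simplified in this sketch.
import cv2

def square_crop_resize(img, box, out_size):
    x1, y1, x2, y2 = [int(v) for v in box[:4]]
    side = max(x2 - x1, y2 - y1)                  # longest edge of the bbox
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2       # keep the crop centred
    sx, sy = max(cx - side // 2, 0), max(cy - side // 2, 0)
    crop = img[sy:sy + side, sx:sx + side]
    return cv2.resize(crop, (out_size, out_size)) # e.g. 24 for R-Net
```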
Step 4: the images whose candidates have been thinned by R-Net are input into O-Net, which outputs accurate bbox coordinates and landmark coordinates. The process roughly repeats that of P-Net, with the difference that the landmark coordinates are now output in addition to the bbox coordinates. After classification screening, box adjustment, and NMS screening, accurate face bbox coordinates and landmark points are obtained and the task is completed.
2. Facial motion unit coding
The 68 face key points are located from the landmarks obtained in the previous step, detection is performed independently per regional ROI, and the detections of all ROIs are finally merged to obtain the detection result for the whole face.
Different AUs are grouped according to prior knowledge of the facial action units: AUs occurring in the same positional area are put into one group, and the face can be divided into 9 groups such as the left eye, right eye, and nose wing. The AU result of a single face is obtained from the correspondence between the facial action units and the face key points.
3. Video classification
A two-channel convolutional neural network is constructed as shown in FIG. 4: one channel's input is the video stream, and the other channel's input is the obtained matrix of AU codes. Spatial semantic information of the face is captured through the encoding of the facial action units, while motion information at fine temporal resolution is captured through the high-frame-rate video. At the decoding layer, the feature maps obtained from the two channels are fused, finally indicating whether the interviewee in the video is cheating. The specific parameters of the neural network model in FIG. 4 are obtained by training and constitute its hidden layers.
The two-channel convolutional neural network trained in this embodiment can be used at interview or examination sites: the micro-expressions of interviewees or examinees are monitored in real time by the camera acquisition equipment, and slight changes in their expressions and actions show whether cheating behavior exists. Compared with a traditional classifier obtained by machine-learning training on whole actions, the network trained in this embodiment has a higher processing rate and accuracy.
From the above description of the embodiments, those skilled in the art will readily appreciate that the present invention can be implemented by hardware capable of executing a specific computer program, for example the system of the invention and the electronic processing units, servers, clients, mobile phones, control units, processors, and so on included in it, or by a device that includes at least part of the above system or components. The invention can also be implemented by computer software executing the method of the invention, for example control software executed by a microprocessor, an electronic control unit, a client, a server, and so on. It should be noted that the computer software executing the method of the present invention is not limited to being executed by one specific hardware entity; it may also be implemented in a distributed manner by unspecified hardware entities, for example some steps of the method may be executed on one device while another part is executed on a mobile terminal or other equipment. For computer software, the software product may be stored in a computer-readable storage medium (a CD-ROM, USB disk, removable hard disk, and so on) or distributed over a network, as long as it enables an electronic device to perform the method according to the present invention.
In summary, the invention provides an interview cheating detection method and system based on human facial action units, aiming at the low judgment efficiency and high misjudgment rate of the mainly manual judgment of interviewee cheating behavior in existing interview systems. Practical results show that the method improves interview-judgment efficiency and that its misjudgment rate is better than that of the original manual judgment method.
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement the present invention. The invention is not limited to the specific embodiments, but rather should be construed as broadly within the spirit and scope of the invention as defined in the appended claims.

Claims (22)

1. An interview cheating detection method based on human facial action units, characterized by comprising the following steps:
performing face recognition on video data acquired during the interview, extracting facial features, and encoding facial action units, head motion units, and/or eye motion units;
using a trained classifier to recognize the combination of the facial features with the facial action units, head motion units, and/or eye motion units, and judging whether cheating exists.
2. The method of claim 1,
wherein the detection of the facial features is realized using an image pyramid technique.
3. The method of claim 1,
wherein the detection of the facial features uses a multi-task cascaded convolutional network (MTCNN) to build a neural network model, performing face region detection and face key point detection simultaneously.
4. The method of claim 3,
wherein the multi-task cascaded convolutional network model uses three cascaded networks and is divided into a three-layer network structure of P-Net, R-Net, and O-Net.
5. The method of claim 3,
wherein a bounding-box regression algorithm and/or non-maximum suppression (NMS) is adopted in the detection of the facial features.
6. The method of claim 1,
wherein the trained classifier uses a facial behavior coding system and a facial action coding system to construct training samples.
7. The method of claim 6,
wherein, in the face recognition step, facial behavior is divided into 46 facial action units, 8 head motion units, and 4 eye motion units.
8. The method of claim 6,
wherein, in the encoding step of the facial action units, the 68 face key points are located from the landmarks obtained in the previous step, detection is performed independently per regional ROI, and the detections of all ROIs are finally merged to obtain the detection result for the whole face.
9. The method of claim 6,
wherein, when trained, the classifier further groups different facial action units (AUs) according to prior knowledge of them; and/or
AUs occurring in the same positional area are put into one group, and the face is divided into 9 groups including the left eye, right eye, and nose wing; and/or
the AU result of a single face is obtained from the correspondence between the facial action units and the face key points.
10. The method of claim 1,
wherein the classifier is trained by the following method: constructing a two-channel convolutional neural network, where one channel's input is the video stream and the other channel's input is the obtained matrix of AU codes; capturing spatial semantic information of the face through the encoding of the facial action units, and motion information at fine temporal resolution through high-frame-rate video; and then, at the decoding layer, fusing the feature maps obtained from the two channels to finally obtain a classification result on whether the interviewee in the video is cheating.
11. An interview cheating detection system based on human facial action units, comprising:
a face recognition unit for performing face recognition on the video data acquired during the interview, extracting facial features, and encoding the facial action units;
a classification judging unit for recognizing the facial features and the facial action units with a trained classifier and judging whether cheating exists.
12. The interview cheating detection system of claim 11,
wherein the detection of the facial features is realized using an image pyramid technique.
13. The interview cheating detection system of claim 11,
wherein the detection of the facial features uses a multi-task cascaded convolutional network (MTCNN) to build a neural network model, performing face region detection and face key point detection simultaneously.
14. The interview cheating detection system of claim 13,
wherein the multi-task cascaded convolutional network model uses three cascaded networks and is divided into a three-layer network structure of P-Net, R-Net, and O-Net.
15. The interview cheating detection system of claim 13,
wherein a bounding-box regression algorithm and/or non-maximum suppression (NMS) is adopted in the detection of the facial features.
16. The interview cheating detection system of claim 11,
wherein the trained classifier uses a facial behavior coding system and the Facial Action Coding System (FACS) to construct training samples.
17. The interview cheating detection system of claim 16,
wherein, in the face recognition step, facial behavior is divided into 46 facial action units, 8 head motion units, and 4 eye motion units.
18. The interview cheating detection system of claim 16,
wherein, in the encoding step of the facial action units, the 68 face key points are located from the landmarks obtained in the previous step, detection is performed independently per regional ROI, and the detections of all ROIs are finally merged to obtain the detection result for the whole face.
19. The interview cheating detection system of claim 16,
wherein, when trained, the classifier further groups different facial action units (AUs) according to prior knowledge of them; and/or
AUs occurring in the same positional area are put into one group, and the face is divided into 9 groups including the left eye, right eye, and nose wing; and/or
the AU result of a single face is obtained from the correspondence between the facial action units and the face key points.
20. The interview cheating detection system of claim 11,
wherein the classifier is trained by the following method: constructing a two-channel convolutional neural network, where one channel's input is the video stream and the other channel's input is the obtained matrix of AU codes; capturing spatial semantic information of the face through the encoding of the facial action units, and motion information at fine temporal resolution through high-frame-rate video; and then, at the decoding layer, fusing the feature maps obtained from the two channels to finally obtain a classification result on whether the interviewee in the video is cheating.
21. An electronic device comprising a processor and a memory, the memory for storing a computer-executable program, characterized in that:
when the computer-executable program is executed by the processor, the processor performs the interview cheating detection method based on human facial action units of any one of claims 1-10.
22. A computer-readable medium storing a computer-executable program, wherein the computer-executable program, when executed, implements the interview cheating detection method based on human facial action units of any one of claims 1-10.
CN202110842018.6A (filed 2021-07-23, priority 2021-07-23): Interview cheating detection method and system based on human facial action units. Status: Pending. Publication: CN113505729A (en).

Priority Applications (1)

Application Number: CN202110842018.6A; Priority Date: 2021-07-23; Filing Date: 2021-07-23; Title: Interview cheating detection method and system based on human facial action units

Applications Claiming Priority (1)

Application Number: CN202110842018.6A; Priority Date: 2021-07-23; Filing Date: 2021-07-23; Title: Interview cheating detection method and system based on human facial action units

Publications (1)

Publication Number: CN113505729A; Publication Date: 2021-10-15

Family

ID=78014535

Family Applications (1)

Application Number: CN202110842018.6A; Status: Pending; Publication: CN113505729A (en); Title: Interview cheating detection method and system based on human facial action units

Country Status (1)

Country Link
CN (1) CN113505729A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887529A (en) * 2021-11-09 2022-01-04 天津大学 Three-dimensional facial expression generation system based on motion unit feature decomposition


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080293033A1 (en) * 2007-03-28 2008-11-27 Scicchitano Anthony R Identity management system, including multi-stage, multi-phase, multi-period and/or multi-episode procedure for identifying and/or authenticating test examination candidates and/or individuals
CN111738209A (en) * 2020-07-17 2020-10-02 南京晓庄学院 Examination room cheating behavior pre-judging system based on examinee posture recognition
JP6807128B1 (en) * 2020-09-23 2021-01-06 安博 市村 Candidate monitoring method and monitoring system in CBT

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Alrubaish F.A. et al., "Automated Detection for Student Cheating During Written Exams: An Updated Algorithm Supported by Biometric of Intent", Advances in Data Science, Cyber Security and IT Applications (International Conference on Computing, ICC 2019), vol. 1098, 5 December 2019, p. 303.
米扬 (Mi Yang), "Research on facial expression recognition based on deep neural networks", China Master's Theses Full-Text Database (Information Science and Technology), no. 2, 15 February 2020, p. 1.
胡森博 (Hu Senbo), "Research on video-based detection of cheating behavior in online examinations", China Master's Theses Full-Text Database (Social Sciences II), no. 6, 15 June 2020, pp. 1-5.


Similar Documents

Publication Publication Date Title
Kuhnke et al. Two-stream aural-visual affect analysis in the wild
Zhang et al. Unsupervised discovery of object landmarks as structural representations
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
Liong et al. Automatic apex frame spotting in micro-expression database
Pantic et al. Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences
Murtaza et al. Analysis of face recognition under varying facial expression: a survey.
CN111626126A (en) Face emotion recognition method, device, medium and electronic equipment
Rao et al. Sign Language Recognition System Simulated for Video Captured with Smart Phone Front Camera.
KR101996371B1 (en) System and method for creating caption for image and computer program for the same
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
Mohseni et al. Facial expression recognition using anatomy based facial graph
CN114973383A (en) Micro-expression recognition method and device, electronic equipment and storage medium
Huang et al. Real-time automated detection of older adults' hand gestures in home and clinical settings
Katircioglu et al. Self-supervised human detection and segmentation via background inpainting
Yao et al. Micro-expression recognition by feature points tracking
CN111274854B (en) Human body action recognition method and vision enhancement processing system
Jameel et al. A comprehensive study on facial expressions recognition techniques
Shukla et al. Deep Learning Model to Identify Hide Images using CNN Algorithm
CN113743389A (en) Facial expression recognition method and device and electronic equipment
CN113505729A (en) Interview cheating detection method and system based on human body face movement unit
CN116503959B (en) Weak supervision time sequence action positioning method and system based on uncertainty perception
CN108197593B (en) Multi-size facial expression recognition method and device based on three-point positioning method
Lowhur et al. Dense optical flow based emotion recognition classifier
CN111597864A (en) Micro-expression recognition method based on cumulative light stream weighting characteristics
CN105678321B (en) A kind of estimation method of human posture based on Fusion Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination