WO2021232775A1 - Video processing method and apparatus, and electronic device and storage medium - Google Patents
- Publication number
- WO2021232775A1 (PCT/CN2020/137690)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target object
- video
- target
- learning
- detection
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Definitions
- the present disclosure relates to the field of computer vision, and in particular to a video processing method and device, electronic equipment, and storage medium.
- the present disclosure proposes a video processing solution.
- a video processing method including:
- acquiring a video, wherein at least part of the video frames in the video contain the target object; detecting, according to the video, at least one type of learning behavior of the target object in the process of watching the teaching course; and, in the case where it is detected that the target object performs at least one type of learning behavior, generating learning state information according to at least part of the video frames containing the at least one type of learning behavior and/or the duration for which the target object performs the at least one type of learning behavior.
- a video processing device including:
- a video acquisition module configured to acquire a video, wherein at least part of the video frames in the video contain the target object;
- a detection module configured to detect, according to the video, at least one type of learning behavior of the target object in the process of watching the teaching course;
- a generating module configured to, in the case where it is detected that the target object performs at least one type of learning behavior, generate learning state information according to at least part of the video frames containing the at least one type of learning behavior and/or the duration for which the target object performs the at least one type of learning behavior.
- an electronic device including:
- a processor and a memory for storing instructions executable by the processor, wherein the processor is configured to execute the above-mentioned video processing method.
- a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the foregoing video processing method is implemented.
- a computer program including computer readable code, wherein when the computer readable code runs in an electronic device, a processor in the electronic device executes the video processing method described above.
- when it is detected that the target object performs at least one type of learning behavior, the video frames containing that learning behavior can be used to generate intuitive learning state information, and quantified learning state information can be generated according to the duration of the learning behavior.
- the above-mentioned methods can be used to flexibly obtain learning state information of evaluative value, making it convenient for teachers, parents, and other relevant personnel and institutions to effectively and accurately grasp the learning state of students.
- Fig. 1 shows a flowchart of a video processing method according to an embodiment of the present disclosure.
- Fig. 2 shows a block diagram of a video processing device according to an embodiment of the present disclosure.
- Fig. 3 shows a schematic diagram of an application example according to the present disclosure.
- Fig. 4 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- Fig. 5 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- Fig. 1 shows a flowchart of a video processing method according to an embodiment of the present disclosure.
- the method can be applied to a video processing device, and the video processing device can be a terminal device, a server, or other processing equipment.
- terminal devices can be User Equipment (UE), mobile devices, user terminals, terminals, cellular phones, cordless phones, Personal Digital Assistants (PDAs), handheld devices, computing devices, vehicle-mounted devices, etc.
- the video processing method can be applied to a cloud server or a local server
- the cloud server can be a public cloud server or a private cloud server, which can be flexibly selected according to actual conditions.
- the video processing method can also be implemented by a processor invoking computer-readable instructions stored in the memory.
- the video processing method may include:
- Step S11 Obtain a video, where at least part of the video frames in the video contain the target object.
- Step S12 according to the video, detect at least one type of learning behavior of the target object in the process of watching the teaching course.
- Step S13 in a case where it is detected that the target object performs at least one type of learning behavior, generate learning state information according to at least part of the video frames containing at least one type of learning behavior and/or the duration of the target object performing at least one type of learning behavior.
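- The three steps above can be pictured as a small per-behavior pipeline. The sketch below is illustrative only and assumes hypothetical per-frame detector callables (`detect(frame) -> bool`); it is not the patent's specific implementation.

```python
# Minimal sketch of steps S11-S13 under the assumptions stated above.
from dataclasses import dataclass, field

@dataclass
class LearningStateInfo:
    snapshot_frames: list = field(default_factory=list)  # intuitive info: frames containing a behavior
    durations: dict = field(default_factory=dict)         # quantified info: seconds per behavior

def process_video(frames, detectors, fps=25.0):
    """frames: decoded video frames (step S11); detectors: {behavior_name: frame -> bool}."""
    info = LearningStateInfo()
    for name, detect in detectors.items():
        hit_indices = [i for i, frame in enumerate(frames) if detect(frame)]  # step S12
        if hit_indices:                                                        # step S13
            info.snapshot_frames.append(frames[hit_indices[0]])
            info.durations[name] = len(hit_indices) / fps
    return info
```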
- the target object can be any object whose learning status information is acquired, that is, an object with learning status evaluation requirements, and its specific implementation form can be flexibly determined according to actual conditions.
- the target object may be students, such as elementary school students, middle school students, or college students; in a possible implementation manner, the target object may also be adults in further education, such as adults participating in vocational education and training, or elderly people studying at senior colleges.
- the video may be a video recorded by the target object while watching the teaching course.
- the realization form of the teaching course is not limited: it may be a pre-recorded course video, a live-streamed course, or a course taught by a teacher on site; at least part of the video frames in the video can contain the target object, that is, how often the target object appears in the recorded video can be flexibly determined according to the actual situation.
- the target object may always be in the video.
- the target object may also not appear in the video frame at certain moments or certain periods of time.
- this scene can be an online scene, that is, the target object watches the teaching course through online education methods such as online classrooms;
- this scene can also be an offline scene, that is, the target object can watch a teaching course taught by a teacher on the spot in the traditional face-to-face manner, or watch a teaching course played as a video or in another multimedia format in a classroom or other dedicated teaching place.
- the video can be a real-time video, such as a video recorded in real time while the target object studies in an online classroom, or a video of the target object captured by a camera deployed in the classroom while the target object attends class.
- the video can also be a recorded video, for example a playback video of the target object's learning recorded after the target object studies through an online classroom, or a complete classroom learning video collected through cameras deployed in the classroom after the target object attends class.
- the subsequent disclosed embodiments all take the video recorded in real time during the online classroom learning process of the target object as an example to illustrate the video processing process.
- the video processing process in other application scenarios can be flexibly extended with reference to the subsequent disclosed embodiments, which will not be repeated here.
- step S12 can be used to detect at least one type of learning behavior of the target object in the process of watching the teaching course.
- the type and quantity of the detected learning behaviors can be flexibly determined according to actual conditions, and are not limited to the following disclosed embodiments.
- the learning behavior performed by the target object may include at least one of the following behaviors: performing at least one target gesture, expressing a target emotion, paying attention to the display area of the teaching course, generating at least one type of interactive behavior with other objects, not appearing in at least part of the video frames in the video, closing the eyes, and making eye contact with the display area of the teaching course.
- the target gesture can be certain preset gestures that the target object may produce in the process of watching the teaching course; the specific implementation form can be flexibly set according to the actual situation, and details can be found in the subsequent disclosed embodiments, which are not expanded here.
- the target emotion can be emotions that reflect the target object's true feelings about the teaching course during the process of watching the teaching course.
- the specific realization form can also be flexibly set according to the actual situation, and will not be expanded here.
- the display area can be the display area of the teaching course video in an online classroom, for example the display screen of the terminal on which the teaching course is played.
- At least one kind of interaction behavior with other objects can be the learning-related interaction generated by the target object and other objects related to the teaching course during the course of watching the teaching course.
- the realization form of other objects can be flexibly determined according to the actual situation.
- other objects can be teaching objects, such as teachers, etc.
- other objects can also be learning objects other than the target object in the teaching process, such as the target object's classmates; the interactive behavior with other objects can change flexibly depending on the object.
- in an example, when the other object is a teacher, the interaction with other objects can include receiving rewards sent by the teacher, for example receiving small red flowers from the teacher or being commended by name, etc.
- in an example, the interaction with other objects can also include answering the teacher's questions, etc.
- in an example, when the other objects are classmates, the interaction with other objects can include group mutual assistance, group discussion, or group study.
- Not appearing in at least part of the video frames in the video may mean that the target object has left the teaching course at certain moments or during certain time periods; for example, the target object may temporarily leave the current online learning device for personal reasons during online learning, or leave the shooting range of the current online learning device, etc.
- Eye-closing can be the closed-eye operation performed by the target object in the process of watching the teaching course.
- eye contact with the display area of the teaching course can refer to the target object's gaze falling on the display area while watching the teaching course; based on the detection of this behavior, the situation in which the target object does not watch the display area of the teaching course can also be further determined.
- through the above process, comprehensive and flexible behavior detection can be performed on the learning process of the target object, thereby improving the comprehensiveness and accuracy of the learning state information obtained from the detection and making it possible to grasp the learning state of the target object more flexibly and accurately.
- in step S12, which type or types of the learning behaviors mentioned in the above disclosed embodiments are detected can be flexibly set according to actual conditions.
- the various learning behaviors mentioned in the above disclosed embodiments can be detected at the same time, and the specific detection methods and processes can be detailed in the following disclosed embodiments, which will not be expanded here.
- the learning state information may be generated according to a video frame containing at least part of the at least one type of learning behavior and/or the duration of the target object performing at least one type of learning behavior.
- the specific implementation form of the learning status information can be flexibly determined according to the type of learning behavior and the corresponding operation performed.
- in a possible implementation manner, when the learning state information is generated based on video frames that at least partially contain at least one type of learning behavior, the learning state information may include information composed of video frames; in a possible implementation manner, when it is generated according to the duration for which the target object performs at least one type of learning behavior, the learning state information may be data information in digital form; in a possible implementation manner, the learning state information may also include both video frame information and data information; in a possible implementation manner, the learning state information can also include other state information. How to generate the learning state information and its specific implementation form can be found in the subsequent disclosed embodiments, and are not expanded here.
- when it is detected that the target object performs at least one type of learning behavior, the video frames containing that learning behavior can be used to generate intuitive learning state information, and quantified learning state information can be generated according to the duration of the learning behavior.
- the above-mentioned methods can be used to flexibly obtain learning state information of evaluative value, making it convenient for teachers, parents, and other relevant personnel and institutions to effectively and accurately grasp the learning state of students.
- the video can be a video recorded while the target object watches the teaching course, and the scene in which the target object watches the teaching course can be flexibly determined according to the actual situation; correspondingly, the way the video is obtained in step S11 can also change flexibly with the scenario.
- in a possible implementation manner, the way to obtain the video may include: if the video processing device and the device used by the target object for online learning are the same device, that device can directly collect the video of the target object watching the teaching course; if the video processing device and the device used by the target object for online learning are different devices, the device used for online learning can collect the video of the target object watching the teaching course and transmit it to the video processing device in real time and/or non-real time.
- in a possible implementation manner, the way to obtain the video may also include: collecting the video of the target object through offline image acquisition equipment (such as ordinary cameras or shooting devices deployed for security requirements). Further, if the offline image acquisition equipment can itself perform video processing, that is, it can serve as the video processing device, the video acquisition process of step S11 is already completed; if the offline image acquisition equipment cannot perform video processing, the video it collects can be transmitted to the video processing device in real time and/or non-real time.
- step S12 may include:
- Step S121 Perform target object detection on the video to obtain a video frame containing the target object.
- Step S122 Perform at least one type of learning behavior detection on the video frame containing the target object.
- target object detection can be performed on the video to determine the video frame containing the target object in the video. After determining which video frames contain the target object, at least one type of learning behavior detection can be performed on the target object in the video frame containing the target object.
- the method of detecting the target object can be flexibly determined according to the actual situation, and is not limited to the following embodiments.
- the target object in the video can be detected by means such as face detection or face tracking.
- in a possible implementation manner, multiple objects may be detected after face detection or face tracking is performed on the video frame; in this case, the detected face images may be further screened, and one or more of the objects selected as the target object.
- the specific screening method can be flexibly set according to the actual situation, which is not limited in the embodiment of the present disclosure.
- step S122 may be used to perform at least one type of learning behavior detection on the video frame containing the target object.
- the implementation of step S122 can be flexibly changed according to different learning behaviors. For details, refer to the following disclosed embodiments, which will not be expanded here. In the case where multiple types of learning behaviors of the target object need to be detected, multiple methods can be combined to achieve multiple types of learning behavior detection at the same time.
- through the above process, the learning behavior detection of the target object in the process of watching the teaching course can be completed. In addition, by performing target object detection on the video, the learning behavior of not appearing in at least part of the video frames in the video, mentioned in the above disclosed embodiments, can also be determined; learning state information can then be obtained according to the video frames in which the target object is not detected, or the time for which the target object does not appear in at least part of the video frames in the video can be calculated from those video frames and used as learning state information.
- through target object detection on the video, a video frame containing the target object is obtained, and at least one type of learning behavior detection is performed on that video frame; this makes the detection of the target object's at least one type of learning behavior more targeted, making the learning behavior detection more accurate and further improving the accuracy and reliability of the subsequently generated learning state information.
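- As an illustration of step S121 only (the patent does not prescribe a specific detector), the sketch below uses OpenCV's bundled Haar cascade face detector to keep the frames in which a face, i.e. a candidate target object, is found.

```python
import cv2

def frames_with_target(video_path):
    # Haar cascade face detector shipped with opencv-python; one possible target object detector.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    kept = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            kept.append((index, frame, faces))  # frame contains a candidate target object
        index += 1
    cap.release()
    return kept
```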
- step S122 can be flexibly changed according to different learning behaviors.
- the learning behavior may include: performing at least one target gesture;
- performing at least one type of learning behavior detection on the video frame containing the target object may include:
- in the case where the number of consecutive video frames containing the target gesture is detected to be not less than a first threshold, recording at least one of the video frames containing the target gesture as a gesture start frame;
- determining a gesture end frame from the video frames after the gesture start frame; and
- determining, according to the number of gesture start frames and gesture end frames, the number of times and/or the time for which the target object in the video performs the at least one target gesture.
- the learning behavior detection performed on the video frame of the target object may include target gesture detection.
- the target gesture specifically includes can be flexibly set according to actual conditions, and is not limited to the following disclosed embodiments.
- the target gesture includes one or more of a hand-raising gesture, a thumb-up gesture, an OK gesture, and a victory gesture.
- the target gesture can include learning-related gestures made by the target object according to the listening situation in the process of watching the teaching course, such as the hand-raising gesture used to answer a question, the thumb-up gesture used to praise the lecture content or the teacher, the OK gesture expressing understanding or approval of the teaching content, and the victory gesture used to interact with the instructor, etc.
- the method for detecting at least one target gesture on the video frame containing the target object can be flexibly determined according to the actual situation, and is not limited to the following disclosed embodiments.
- the detection of target gestures can be achieved through gesture recognition algorithms; for example, the hand key points of the target object in the video frame, or the image area corresponding to the hand detection frame, can be recognized, gesture detection can be performed on the image area corresponding to the hand key points or the hand detection frame, and whether the target object performs the target gesture can be determined based on the gesture detection result.
- the detection of the target gesture can be achieved through a neural network with a gesture detection function.
- the specific structure and implementation of the neural network with gesture detection function can be flexibly set according to the actual situation.
- the video frame containing the target object can be input into a neural network that can detect multiple gestures at the same time to achieve detection of the target gestures; in a possible implementation manner, the video frame containing the target object can also be input into multiple neural networks each with a single-gesture detection function to realize the detection of multiple target gestures.
- the number of first thresholds can be flexibly set according to the actual situation.
- the number of first thresholds corresponding to different target gestures can be the same or different.
- in an example, the first threshold corresponding to the hand-raising gesture can be set to 6, and the first threshold corresponding to the thumbs-up gesture can be set to 7; if the number of consecutive video frames containing the hand-raising gesture is detected to be not less than 6, at least one frame can be selected from the video frames containing the hand-raising gesture as the gesture start frame of the hand-raising gesture; if the number of consecutive video frames containing the thumbs-up gesture is not less than 7, at least one frame can be selected from the video frames containing the thumbs-up gesture as the gesture start frame of the thumbs-up gesture.
- the first thresholds corresponding to different target gestures may be set to the same value. In an example, the number of the first thresholds may be set to 6.
- the selection method of the gesture start frame can also be flexibly set according to the actual situation.
- the first frame of the detected consecutive video frames containing the target gesture can be used as the gesture start frame of the target gesture; in a possible implementation manner, in order to reduce gesture detection errors, a certain frame after the first frame of the detected consecutive video frames containing the target gesture can also be used as the gesture start frame of the target gesture.
- the gesture end frame can be determined from the video frames after the gesture start frame, that is, the end time of the target gesture in the gesture start frame can be determined.
- the specific determination method can be flexibly selected according to the actual situation, and is not limited to the following disclosed embodiments.
- in a possible implementation manner, in the case where the number of consecutive video frames not containing the target gesture is detected to be not less than a second threshold, at least one of those consecutive video frames is recorded as the gesture end frame.
- the value of the second threshold can also be flexibly set according to actual conditions. The values of the second threshold corresponding to different target gestures can be the same or different.
- the specific setting method can refer to the first threshold, which will not be repeated here.
- in an example, the values of the second threshold corresponding to different target gestures can be the same, for example 10; that is, if, after the gesture start frame, 10 consecutive frames are detected that do not contain the target gesture of the gesture start frame, it is considered that the target object has finished performing the target gesture, and in this case at least one frame can be selected from those consecutive video frames not containing the target gesture as the gesture end frame.
- the selection method of the gesture end frame can also refer to that of the gesture start frame: in an example, the last frame of the consecutive video frames not containing the target gesture can be used as the gesture end frame; in an example, a frame before the last frame of the consecutive video frames not containing the target gesture can also be used as the gesture end frame.
- in a possible implementation manner, one or some video frames that do not contain the target object can also be set as the gesture end frame.
- the number of gesture start frames and gesture end frames contained in the video frames can be used to determine the number of times the target object performs a certain target gesture or certain target gestures, and/or the duration of performing the target gesture(s). How the content related to the target gesture is specifically determined can follow the requirements on the learning state information in step S13; for details, refer to the subsequent disclosed embodiments, which are not expanded here.
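- The gesture start/end frame logic above can be summarized as follows; the sketch assumes a hypothetical per-frame boolean list `has_gesture` produced by a gesture detector, and uses the example thresholds of 6 (first threshold) and 10 (second threshold).

```python
def count_gesture_events(has_gesture, first_threshold=6, second_threshold=10, fps=25.0):
    """Return (number of times, total seconds) the target gesture is performed."""
    events = []                 # (gesture start frame, gesture end frame) pairs
    start = None
    run_with, run_without = 0, 0
    for i, present in enumerate(has_gesture):
        if present:
            run_with, run_without = run_with + 1, 0
            if start is None and run_with >= first_threshold:
                start = i - first_threshold + 1          # gesture start frame
        else:
            run_without, run_with = run_without + 1, 0
            if start is not None and run_without >= second_threshold:
                events.append((start, i))                # gesture end frame
                start = None
    if start is not None:                                # gesture still ongoing at end of video
        events.append((start, len(has_gesture) - 1))
    duration = sum(end - begin + 1 for begin, end in events) / fps
    return len(events), duration
```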
- the learning behavior can include: expressing the target emotion
- performing at least one type of learning behavior detection on the video frame containing the target object may include:
- using a video frame in which it is detected that the target object shows at least one first target expression, or in which the smile value detection result exceeds a target smile value, as a first detection frame; and, in the case where it is detected that the number of consecutive first detection frames exceeds a third threshold, determining that the target object generates the target emotion.
- the target emotion can be any emotion set according to actual needs, for example, it can be a happy emotion that indicates that the target object is focused on learning, or a bored emotion that indicates that the target object is in a poor learning state.
- the following disclosed embodiments are described by taking the target emotion as happy emotion as an example, and the case where the target emotion is other emotions can be expanded with reference to the subsequent disclosed embodiments.
- expression detection and/or smile value detection can be used to achieve the learning behavior detection of the target object.
- the learning behavior of expressing the target emotion can be detected only by expression detection or smile value detection.
- expression detection and smile value detection can also be used together to determine whether the target object expresses the target emotion.
- the subsequent disclosed embodiments are described by taking as an example the determination of whether the target object expresses the target emotion through expression detection and smile value detection. The remaining implementation manners can be expanded with reference to the subsequent disclosed embodiments, and will not be repeated here.
- the expression detection can include the detection of the expressions displayed by the target object, for example, it can detect what kind of expression the target object displays.
- the specific expression division can be flexibly set according to the actual situation.
- in an example, the expressions can be divided into happiness, calmness, etc.;
- the smile value detection can include the detection of the smile intensity of the target object, for example, it can detect how big the smile of the target object is, and the result of the smile value detection can be fed back by numerical values.
- the detection result is set to be between [0,100]. The higher the value, the higher the intensity or amplitude of the target's smile.
- the specific expression detection and smile value detection methods can be flexibly determined according to the actual situation.
- any method that can detect the expression or the degree of smile of the target object can be used as a corresponding detection method, and is not limited to the following disclosed embodiments.
- the expression detection of the target object can be realized by the facial expression recognition neural network
- the smile value detection of the target object can be realized by the smile detection neural network.
- the structure and implementation of the facial expression recognition neural network and the smile value detection neural network are not limited in the embodiments of the present disclosure.
- Any neural network that can realize the expression recognition function through training, and any neural network that can realize the smile value detection function through training, can be applied to the embodiments of the present disclosure.
- facial expression detection and smile value detection can also be realized by detecting the key points of the face and the mouth of the target object in the video.
- in the case where it is detected that the target object in a video frame shows at least one first target expression, or that the smile value detection result exceeds the target smile value, the target object in the video frame is considered to show the target emotion; in this case, the video frame can be used as the first detection frame.
- the specific expression type of the first target expression can be flexibly set according to the actual situation, and is not limited to the following disclosed embodiments.
- happiness may be used as the first target expression, that is, video frames in which the detected expression of the target object is happy may be used as the first detection frame.
- both happy and calm can be used as the first target expression, that is, the detected expression of the target object can be a happy or calm video frame, and both can be used as the first detection frame.
- the specific value of the target smile value can also be flexibly set according to the actual situation, and there is no specific limitation here. Therefore, in a possible implementation manner, a video frame whose smile value detection result exceeds the target smile value may also be used as the first detection frame.
- in a possible implementation manner, in the case where a certain video frame is detected as the first detection frame, it may be determined that the target object generates the target emotion; in a possible implementation manner, in order to improve the accuracy of detection and reduce the impact of detection errors on the learning behavior detection results, it may instead be determined that the target object generates the target emotion only when the number of consecutive first detection frames exceeds the third threshold.
- a video frame sequence in which each frame in the continuous video frames is the first detection frame may be used as the continuous first detection frame.
- the value of the third threshold can be flexibly set according to the actual situation, and can be the same as or different from the first threshold or the second threshold. In an example, the third threshold can be 6; that is, in the case where 6 consecutive frames are detected to be first detection frames, it can be considered that the target object generates the target emotion.
- in an example, a frame can be selected from the consecutive first detection frames as the target emotion start frame; then, after the target emotion start frame, if for 10 consecutive frames the expression of the target object is detected not to be the first target expression, or the smile value detection result of the target object does not exceed the target smile value, or the target object cannot be detected in a certain frame or frames, the target emotion end frame can be further determined; the number of times and/or the time for which the target object generates the target emotion can then be determined according to the target emotion start frame or the target emotion end frame.
- the specific process can refer to the corresponding process of the target gesture, which will not be repeated here.
- through the above process, the first detection frame is determined, so that in the case where the number of consecutive first detection frames exceeds the third threshold, it is determined that the target object generates the target emotion. In this way, the emotion of the target object in the learning process can be flexibly determined based on the target object's expression and smile, so that the emotional state of the target object in the learning process can be perceived more comprehensively and accurately, and more accurate learning state information can be generated.
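- A compact way to express the first-detection-frame rule described above; per-frame expression labels and smile values are assumed to come from the expression-recognition and smile-detection networks, and the concrete threshold values are illustrative.

```python
def detect_target_emotion(expressions, smile_values,
                          first_target_expressions=("happy",),
                          target_smile_value=60,
                          third_threshold=6):
    """expressions: per-frame labels; smile_values: per-frame scores in [0, 100]."""
    consecutive = 0
    for expression, smile in zip(expressions, smile_values):
        is_first_detection_frame = (expression in first_target_expressions
                                    or smile > target_smile_value)
        consecutive = consecutive + 1 if is_first_detection_frame else 0
        if consecutive >= third_threshold:
            return True   # target object is considered to generate the target emotion
    return False
```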
- the learning behavior can include: paying attention to the display area of the teaching course;
- performing at least one type of learning behavior detection on the video frame containing the target object may include:
- the implementation form of the display area of the teaching course can refer to the above-mentioned disclosed embodiments, which will not be repeated here.
- the learning behavior detection of the target object can be achieved through expression detection and face angle detection.
- the detection of the learning behavior of paying attention to the display area of the teaching course can also be realized only by detecting the face angle.
- Subsequent disclosed embodiments are described by taking the use of expression detection and face angle detection to determine whether the target object pays attention to the display area of the teaching course as an example; the remaining implementation manners can be expanded with reference to the subsequent disclosed embodiments and are not repeated here.
- the implementation of expression detection can refer to the above disclosed embodiments, which will not be repeated here;
- the face angle detection can be the detection of the orientation angle of the face.
- the specific face angle detection method can be flexibly determined according to the actual situation. Any method that can detect the face angle of the target object can be used as the face angle detection method, and is not limited to the following disclosed embodiments.
- the face angle detection of the target object can be realized through the face angle detection neural network.
- the structure and implementation of the face angle detection neural network are not limited in the embodiments of the present disclosure, and any neural network that can realize the face angle detection function through training can be applied to the embodiments of the present disclosure.
- the face angle of the target object can also be determined by detecting the key points of the target object's face in the video.
- the form of the face angle that can be detected by the face angle detection can also be flexibly determined according to the actual situation.
- in an example, the face angle of the target object can be determined by detecting the yaw angle and the pitch angle of the target object's face.
- how it is determined that the target object pays attention to the display area of the teaching course can be flexibly set according to the actual situation.
- in a possible implementation manner, in the case where it is detected that the expression of the target object is at least one second target expression and the face angle of the target object is within a target face angle range, it can be considered that the target object pays attention to the display area of the teaching course; in this case, the video frame can be used as the second detection frame.
- the specific expression type of the second target expression can be flexibly set according to the actual situation, and may be the same as or different from the first target expression mentioned in the above disclosed embodiments; it is not limited to the following disclosed embodiments.
- calm can be used as the second target expression, that is, the detected target object's expression is calm and the video frame whose face angle is within the range of the target face angle can be regarded as the second detection frame .
- in an example, video frames in which the detected face angle of the target object is within the target face angle range and the expression is not "other" can also be regarded as second detection frames.
- the specific range value of the target face angle range can also be flexibly set according to the actual situation, and no specific limitation is made here.
- the target face angle range may be static.
- in one example, the face angle range corresponding to a fixed area (such as the display screen that the target object pays attention to in an online scene) can be taken as the target face angle range for the target object watching the teaching course.
- the target face angle range can also be dynamic.
- the target face angle range can be flexibly determined according to the teacher's current position as the teacher moves during the lecture; that is, the value of the target face angle range can change dynamically following the teacher's movement.
- in a possible implementation manner, in the case where a certain video frame is detected as the second detection frame, it can be determined that the target object pays attention to the display area of the teaching course; in a possible implementation manner, in order to improve the accuracy of detection and reduce the impact of detection errors on the learning behavior detection results, it may instead be determined that the target object pays attention to the display area of the teaching course only when the number of consecutive second detection frames exceeds the fourth threshold.
- a video frame sequence in which each frame in the continuous video frames is the second detection frame may be used as the continuous second detection frame.
- the value of the fourth threshold can be flexibly set according to actual conditions, and can be the same as or different from the first threshold, the second threshold, or the third threshold. In an example, the fourth threshold can be 6; that is, when it is detected that 6 consecutive frames are all second detection frames, it can be considered that the target object pays attention to the display area of the teaching course.
- in an example, similarly to the target emotion, an attention start frame can be selected from the consecutive second detection frames; after the attention start frame, if the second detection frame conditions are not met for 10 consecutive frames, or the target object is not detected for 10 consecutive frames, the attention end frame can be further determined, and the number of times and/or the time for which the target object pays attention to the display area of the teaching course can then be determined according to the attention start frame or the attention end frame.
- the specific process can refer to the corresponding process of target gestures and target emotions, which will not be repeated here.
- through the above process, the second detection frame is determined, so that in the case where the number of consecutive second detection frames exceeds the fourth threshold, it is determined that the target object pays attention to the display area of the teaching course.
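- The second-detection-frame rule can be sketched in the same style; per-frame expressions and face angles (yaw, pitch) are assumed detector outputs, and the target face angle range and expression set shown here are illustrative, static choices.

```python
def detect_attention(expressions, face_angles,
                     second_target_expressions=("calm", "happy"),
                     yaw_range=(-30.0, 30.0), pitch_range=(-20.0, 20.0),
                     fourth_threshold=6):
    """face_angles: per-frame (yaw, pitch) in degrees."""
    consecutive = 0
    for expression, (yaw, pitch) in zip(expressions, face_angles):
        in_range = (yaw_range[0] <= yaw <= yaw_range[1]
                    and pitch_range[0] <= pitch <= pitch_range[1])
        is_second_detection_frame = in_range and expression in second_target_expressions
        consecutive = consecutive + 1 if is_second_detection_frame else 0
        if consecutive >= fourth_threshold:
            return True   # target object pays attention to the display area
    return False
```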
- the learning behavior may also include: generating at least one interaction behavior with other objects.
- for the implementation form of the interactive behavior, reference may be made to the above disclosed embodiments, which will not be repeated here.
- the method of detecting the interactive behavior of the video frame containing the target object can be flexibly determined according to the actual situation.
- if the interactive behavior is an online interactive behavior, such as receiving the teacher's approval, whether the target object has an interactive behavior can be determined directly based on the signals transmitted by the other objects.
- the method of detecting whether the target object has an interactive behavior can include: recognizing a target action of the target object to determine whether the target object has an interactive behavior.
- the target action can be flexibly set according to the actual situation of the interactive behavior.
- the target action can include standing up to speak, or speaking with the face facing other objects for more than a certain time value, etc.
- in a possible implementation manner, in the case where the detected learning behavior includes not appearing in at least part of the video frames in the video, step S12 may include:
- the video may also contain video frames that do not contain the target object; these video frames can be regarded as video frames in which the target object is not detected, and in the case where the number of video frames in which the target object is not detected exceeds a preset number of video frames, it is confirmed that the learning behavior of not appearing in at least part of the video frames in the video is detected.
- the number of preset video frames can be flexibly set according to the actual situation.
- in an example, the preset number of video frames can be set to 0; that is, as long as the video contains video frames in which the target object is not detected, the learning behavior of not appearing in at least part of the video frames in the video is considered to be detected.
- in an example, the preset number of video frames can also be a number greater than 0; the specific value can be flexibly decided based on the actual situation.
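- The absence check reduces to counting frames without the target object, as sketched below; `target_detected` is an assumed per-frame boolean from target object detection.

```python
def detect_absence(target_detected, preset_frame_count=0, fps=25.0):
    """Return (absence detected?, seconds in which the target object is missing)."""
    missing_frames = sum(1 for present in target_detected if not present)
    is_absent = missing_frames > preset_frame_count
    return is_absent, missing_frames / fps   # the time can itself serve as learning state information
```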
- the learning behavior can also include closed eyes.
- the learning behavior detection method can be closed eyes detection.
- the specific process of closed-eye detection can be flexibly set according to the actual situation; in a possible implementation manner, it can be realized by a neural network with a closed-eye detection function.
- in a possible implementation manner, whether the target object's eyes are closed can also be determined by detecting the key points of the eyes and the eyeballs: in the case where eyeball key points are detected, it is determined that the target object's eyes are open; in the case where only eye key points are detected and no eyeball key points are detected, it is determined that the target object's eyes are closed.
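- The keypoint-based rule above can be written down directly; the eye and eyeball keypoint lists are assumed outputs of a keypoint detector.

```python
def eyes_closed(eye_keypoints, eyeball_keypoints):
    """True if the eyes are judged closed by the rule described above."""
    if eyeball_keypoints:            # eyeball (pupil) keypoints found -> eyes open
        return False
    return bool(eye_keypoints)       # only eye contour keypoints found -> eyes closed
```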
- the learning behavior can also include eye contact in the display area of the teaching course.
- the learning behavior detection method can refer to the focus on the display area of the teaching course in the above disclosed embodiment.
- the specific detection method can be flexibly changed.
- in an example, closed-eye detection and face angle detection can be performed on the target object at the same time, and a video frame in which the face angle is within the target face angle range and the eyes are not closed is used as a third detection frame; then, when the number of third detection frames exceeds a certain set threshold, it is determined that the target object makes eye contact with the display area of the teaching course.
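- Combining the two signals gives the third-detection-frame rule; per-frame face-angle-in-range flags and closed-eye flags are assumed, and the threshold value is illustrative.

```python
def detect_eye_contact(face_in_range, eyes_closed_flags, threshold=6):
    """Count third detection frames (face angle in range and eyes not closed)."""
    third_detection_frames = sum(
        1 for in_range, closed in zip(face_in_range, eyes_closed_flags)
        if in_range and not closed)
    return third_detection_frames > threshold
```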
- after the detection of at least one type of learning behavior of the target object is achieved through any combination of the implementation manners of the above disclosed embodiments, learning state information can be generated through step S13 in the case where the target object is detected to perform at least one type of learning behavior.
- the specific implementation of step S13 is not limited, and can be flexibly changed according to the actual situation of the detected learning behavior, and is not limited to the following disclosed embodiments.
- the learning state information can be generated based on the video frames containing at least one type of learning behavior; or generated according to the duration for which the target object performs at least one type of learning behavior; or a combination of the two, where part of the learning state information is generated based on the video frames containing at least one type of learning behavior and another part is generated according to the duration for which the target object performs at least one type of learning behavior.
- among the detected learning behaviors, which ones generate learning state information based on the video frames containing the learning behavior, which ones generate it based on the duration for which the target object performs the learning behavior, and the mapping between them, can be flexibly set according to the actual situation.
- in a possible implementation manner, some positive learning behaviors, such as the target object performing at least one target gesture, showing a positive target emotion, or paying attention to the display area of the teaching course, can correspond to generating learning state information based on the video frames containing those learning behaviors; in a possible implementation manner, some negative learning behaviors, such as the target object not appearing in at least part of the video frames in the video, closing the eyes, or not making eye contact with the display area of the teaching course, can correspond to generating learning state information according to the duration of those learning behaviors.
- generating learning state information according to video frames containing at least one type of learning behavior at least in part may include:
- Step S1311 Obtain video frames containing at least one type of learning behavior in the video as a target video frame set;
- Step S1312 Perform face quality detection on at least one video frame in the target video frame set, and use a video frame with a face quality greater than a face quality threshold as a target video frame;
- Step S1313 Generate learning state information according to the target video frame.
- the video frames containing at least one type of learning behavior can be the video frames in which the target object is detected performing at least one type of behavior during learning behavior detection, such as the first detection frame, the second detection frame, and the third detection frame mentioned in the above disclosed embodiments, or the video frames containing the target gesture between the gesture start frame and the gesture end frame, etc.
- in a possible implementation manner, all video frames containing each type of learning behavior can be obtained according to the type of learning behavior, so as to form the target video frame set of that type of learning behavior; in a possible implementation manner, it is also possible to obtain only some of the frames containing each type of learning behavior according to the type of learning behavior, and then obtain the target video frame set of that type of learning behavior based on those partial frames; which partial frames are selected, and the selection method, can be flexibly decided.
- step S1312 may be used to select and obtain the target video frame from the target video frame set. It can be seen from step S1312 that, in a possible implementation manner, face quality detection may be performed on the video frames in the target video frame set, and then video frames with face quality greater than the face quality threshold are used as the target video frames.
- the face quality detection method can be flexibly set according to the actual situation, and is not limited to the following disclosed embodiments.
- in a possible implementation manner, face recognition can be performed on the face in the video frame, and the completeness of the face in the video frame used to determine the face quality; in a possible implementation manner, the face quality can also be determined based on the clarity of the face in the video frame; in a possible implementation manner, the face quality in the video frame can also be comprehensively judged based on multiple parameters such as the completeness, clarity, and brightness of the face in the video frame; in a possible implementation manner, the video frame can also be input into a face quality neural network to obtain the face quality in the video frame.
- the face quality neural network can be obtained by training a large number of face images containing face quality scores.
- the specific implementation form can be flexibly selected according to the actual situation and is not restricted in the embodiments of the present disclosure.
- the specific value of the face quality threshold can be flexibly determined according to the actual situation, which is not limited in the embodiment of the present disclosure.
- different face quality thresholds may be set for each type of learning behavior; in a possible implementation manner, the same face threshold may also be set for each type of learning behavior.
- in a possible implementation manner, the face quality threshold can also be set to the maximum face quality value in the target video frame set; in this case, the video frame with the highest face quality under each type of learning behavior can be directly used as the target video frame.
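- Steps S1311-S1312 amount to filtering a behavior's frame set by a face quality score; the scorer (for example, the face quality neural network mentioned above) is assumed, and passing no threshold reproduces the "take the best frame" variant.

```python
def select_target_frames(behavior_frames, quality_score, face_quality_threshold=None):
    """behavior_frames: target video frame set for one learning behavior."""
    scored = [(quality_score(frame), frame) for frame in behavior_frames]
    if not scored:
        return []
    if face_quality_threshold is None:
        # threshold set to the maximum quality in the set: keep only the best frame
        return [max(scored, key=lambda pair: pair[0])[1]]
    return [frame for score, frame in scored if score > face_quality_threshold]
```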
- among the video frames, there may be certain video frames that contain multiple types of learning behaviors at the same time.
- the manner of processing video frames containing multiple types of learning behaviors can be flexibly changed according to actual conditions.
- these video frames can be assigned to each type of learning behavior, and then selected from the set of video frames corresponding to each type of learning behavior in step S1312 to obtain the target video frame;
- a video frame containing multiple types of learning behaviors at the same time can also be directly selected as the target video frame.
- step S1313 may be used to generate learning state information according to the target video frame.
- the implementation of step S1313 can be flexibly selected according to the actual situation. For details, please refer to the following disclosed embodiments, which will not be expanded here.
- through the above process, the video frames containing at least one type of learning behavior are obtained as the target video frame set, a video frame with higher face quality is selected from the target video frame set of each type of learning behavior as the target video frame, and the learning state information is then generated according to the target video frame.
- in this way, the generated learning state information is based on video frames that contain learning behaviors and have higher face quality, and therefore has higher accuracy, so that the learning state of the target object can be grasped more accurately.
- step S1313 can be flexibly changed.
- step S1313 may include:
- At least one frame of the target video frame can be directly used as the learning state information.
- in an example, the obtained target video frames can be further selected, either randomly or subject to certain conditions, and the selected target video frames directly used as the learning state information; in one example, all of the obtained target video frames can also be directly used as the learning state information.
- the area where the target object is located in the target video frame may be further identified, so as to generate learning state information according to the area where the target object is located.
- the method of recognizing the target object area is not limited in the embodiment of the present disclosure. In a possible implementation manner, it can be implemented by the neural network with the target object detection function mentioned in the above-mentioned disclosed embodiment. After the area of the target object in the target video frame is determined, the target video frame can be further processed accordingly to obtain the learning state information. Among them, the processing method can be flexibly determined.
- In one example, the image of the area where the target object is located in the target video frame can be used as the learning state information; in one example, the background outside the area where the target object is located in the target video frame can also be rendered, for example by adding stickers, adding a mosaic to the background area, or replacing the image of the background area, to obtain learning status information that does not display the current background of the target object, so as to better protect the privacy of the target object.
- The above methods can make the final learning state information more flexible, so that, according to the needs of the target object, learning status information that highlights the target object or learning status information that better protects the privacy of the target object can be obtained.
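- As an illustration of the two options above (using the target object region as the learning state image, or hiding the background), the sketch below assumes the target object region is already available as a bounding box from an upstream detector; it is not an implementation defined by this disclosure.

```python
import numpy as np

def crop_target_region(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Use the image of the area where the target object is located."""
    x1, y1, x2, y2 = box
    return frame[y1:y2, x1:x2].copy()

def pixelate_background(frame: np.ndarray, box: tuple, block: int = 16) -> np.ndarray:
    """Mosaic everything outside the target object region to hide the background."""
    out = frame.copy()
    h, w = out.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y + block, x:x + block] = out[y:y + block, x:x + block].mean(axis=(0, 1))
    x1, y1, x2, y2 = box
    out[y1:y2, x1:x2] = frame[y1:y2, x1:x2]  # restore the target object region
    return out
```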
- Table 1 shows a learning state information generation rule according to an embodiment of the present disclosure.
- M, N, X, Y, and Z are all positive integers, and the specific values can be set according to actual needs.
- the parameters such as M in different rows in Table 1 may be the same or different.
- the above-mentioned parameters such as M are only used as a schematic description, and not as a limitation to the present disclosure.
- The highlight moment is a moment corresponding to a positive learning behavior of the target object.
- For example, the target object can be detected performing target gestures such as raising a hand, expressing the target emotion of happiness, paying attention to the display area of the teaching course, or interacting with the teacher, such as responding to a roll call.
- certain data processing is performed on the video, and after the data processing, further image processing is performed on the video frame to obtain the target video frame as the learning state information.
- generating learning state information according to the duration of the target object performing at least one type of learning behavior may include:
- Step S1321: in the case where it is detected that the time for the target object to perform at least one type of learning behavior is not less than a time threshold, record the duration of the at least one type of learning behavior;
- Step S1322: take the duration corresponding to the at least one type of learning behavior as the learning state information.
- the time threshold can be a certain value flexibly set according to the actual situation, and the time thresholds of different types of learning behaviors can be the same or different.
- the time for the target object to perform these learning behaviors can be counted, so as to feed back to the teacher or parent as learning status information.
- The specific conditions under which statistics are collected, and for which learning behaviors the time is counted, can be flexibly set according to the actual situation.
- the time length of these learning behaviors can be counted and used as the learning status information.
- the duration of at least one type of learning behavior is recorded as the learning state information.
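- A simplified Python sketch of steps S1321 and S1322 is shown below; it assumes per-frame behavior detections and a fixed frame interval are already available, and it aggregates the total time per behavior rather than tracking individual continuous segments.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def record_durations(
    per_frame_behaviors: Iterable[Set[str]],  # behaviors detected in each frame, in order
    frame_interval_s: float,                  # time represented by one frame, in seconds
    time_threshold_s: float,
) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for behaviors in per_frame_behaviors:
        for behavior in behaviors:
            totals[behavior] += frame_interval_s
    # Steps S1321/S1322: keep durations that are not less than the time threshold
    # and use them as the learning state information.
    return {b: t for b, t in totals.items() if t >= time_threshold_s}
```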
- the video processing method proposed in the embodiment of the present disclosure may further include:
- For the segmentation method of the background area and the rendering method of the background area, reference may be made to the above-mentioned disclosed embodiments on identifying the area where the target object is located in the target video frame and the rendering process after the recognition, which will not be repeated here.
- In one example, the background area can be rendered with a universal template preset in the current video processing device; in another example, it can also be rendered by calling other templates or customized templates from a database outside the video processing device, for example, other background templates can be called from a cloud server outside the video processing device to render the background area in the video.
- On the one hand, the privacy of the target object in the video can be protected, and the possibility of privacy leakage caused by the lack of a suitable video capture location is reduced; on the other hand, it can also make the process of watching the teaching course more engaging for the target object.
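- The sketch below illustrates template-based background rendering under the assumption that a person segmentation mask and a background template image are already available; both are assumed inputs rather than anything specified by this disclosure.

```python
import numpy as np

def render_background(frame: np.ndarray, person_mask: np.ndarray,
                      template: np.ndarray) -> np.ndarray:
    """Replace all pixels outside the target object with a background template."""
    # person_mask: HxW boolean array, True where the target object is located.
    assert frame.shape[:2] == person_mask.shape == template.shape[:2]
    out = template.copy()
    out[person_mask] = frame[person_mask]
    return out
```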
- the video processing method proposed in the embodiment of the present disclosure may further include:
- Statistics are collected on the learning state information of at least one target object to obtain a statistical result of the at least one target object, and the learning state statistical data is generated according to the statistical result.
- The target objects contained in a video may be one or more, and the video processing method in the embodiments of the present disclosure may be used to process a single video or multiple videos. Therefore, correspondingly, the learning status information of one target object or of multiple target objects can be obtained.
- statistics can be performed on the learning state information of at least one target object to obtain a statistical result of at least one target object.
- the statistical result may include not only the learning status information of the target object, but also other information related to the target object's viewing of the teaching course.
- the sign-in data of the target object can also be obtained before step S12, that is, before performing learning behavior detection on the target object.
- the check-in data of the target object may include the identity information and check-in time of the target object.
- the specific check-in data acquisition method can be flexibly determined according to the actual check-in method of the target object, which is not limited in the embodiments of the present disclosure.
- the learning state statistical data can be generated according to the at least one statistical result.
- the generation method and content of the learning state statistical data can be flexibly changed according to the realization form of the statistical result.
- the statistical result of the at least one target object is obtained by counting the learning status information of at least one target object, so as to generate the learning status statistical data according to the statistical result of the at least one target object.
- generating the learning state statistical data according to the statistical result of at least one of the target objects includes:
- According to the category to which at least one target object belongs, the statistical result of the target objects contained in the at least one category is obtained, and the learning status statistical data of at least one category is generated; and/or, the statistical result of at least one target object is visualized to generate the learning status statistical data of the at least one target object.
- the category to which the target object belongs may be a category divided according to the identity of the target object.
- the category to which the target object belongs may include at least one of the courses the target object participates in, the institution registered by the target object, and the equipment used by the target object.
- the course that the target object participates in may be the teaching course watched by the target object mentioned in the above disclosed embodiment
- The institution registered by the target object may be the educational institution where the target object is located, or the grade or class where the target object is located; the equipment used by the target object may be the terminal device used by the target object to participate in the online course in an online scene.
- In this way, the statistical results of the target objects contained in at least one category can be obtained according to the category to which the target objects belong; that is, the statistical results of at least one category can be summarized to obtain statistics on the overall learning status of that category. For example, the division can be made according to categories such as equipment, course, and educational institution, and the statistical results of different target objects under the same equipment, under the same course, and in the same educational institution can be obtained respectively. In an example, these statistical results can also be displayed in the form of a report.
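- A hedged sketch of the category grouping described above is shown below; the record fields ("course", "device", and so on) are hypothetical placeholders for whatever the statistical results actually contain.

```python
from collections import defaultdict
from typing import Dict, List

def group_by_category(results: List[dict], category_key: str) -> Dict[str, List[dict]]:
    """Group per-student statistical results by a category such as course or device."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for result in results:
        grouped[result.get(category_key, "unknown")].append(result)
    return dict(grouped)

# Example usage with hypothetical records:
# results = [{"student": "A", "course": "Math", "device": "pad-01", "focus_min": 30}]
# by_course = group_by_category(results, "course")
# by_device = group_by_category(results, "device")
```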
- The statistical results of each category in the report can include not only the overall learning status information of each target object, but also the specific learning status information of each target object, such as the length of time the target object focuses on the display area of the teaching course and the length of smiling time. In addition, the report can also contain other information related to watching the teaching course, such as the check-in time of the target object, the number of check-ins, whether the target object matches a face in the preset database, the sign-in equipment, the sign-in course, and so on.
- the statistical results of at least one target object can also be visualized to obtain the statistical data of the learning state of the at least one target object.
- the visual processing method can be flexibly determined according to the actual situation, for example, the data can be sorted into forms such as charts or videos.
- The content contained in the learning status statistical data can be flexibly determined according to the actual situation; for example, it can include the overall learning status information of the target object, the name of the teaching course watched by the target object, and the specific learning status information of the target object.
- In one example, the statistical results, the number of interactions of the target object, and the emotions of the target object can be organized into a visual report and sent to the target object or to other relevant personnel of the target object, such as the parents of the target object.
- The visualized learning status statistical data can contain text content such as "The subject of the class is XX; student A concentrated for 30 minutes, which is 10% higher than the class average; the student interacted 3 times and smiled 5 times; praise is given and continued effort is encouraged" or "The subject of the class is XX; student B showed a low degree of concentration and a low frequency of gestures such as raising hands; parents are advised to pay close attention and adjust the child's study habits in time", and so on.
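- As a rough illustration only, the snippet below turns one student's statistical result into summary text of the kind quoted above; the field names are assumptions introduced for this example.

```python
def summarize(result: dict) -> str:
    """Build a short readable summary from one student's statistical result."""
    return (
        f"The subject of the class is {result['subject']}; "
        f"student {result['student']} concentrated for {result['focus_min']} minutes, "
        f"interacted {result['interactions']} times and smiled {result['smiles']} times."
    )

# summarize({"subject": "Math", "student": "A", "focus_min": 30,
#            "interactions": 3, "smiles": 5})
```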
- In this way, the learning state statistical data of at least one category is generated, and/or the statistical result of the at least one target object is visualized to generate the learning state statistical data of the at least one target object. Through the above process, the learning state of the target object can be grasped more intuitively and comprehensively through different statistical methods.
- Fig. 2 shows a block diagram of a video processing device according to an embodiment of the present disclosure.
- the video processing device 20 may include:
- the video acquisition module 21 is configured to acquire a video, where at least part of the video frames in the video contain the target object;
- the detection module 22 is used to detect at least one type of learning behavior of the target object in the process of watching the teaching course according to the video;
- The generating module 23 is configured to, in the case of detecting that the target object performs at least one type of learning behavior, generate learning status information based on at least part of the video frames containing the at least one type of learning behavior and/or the duration of the target object performing the at least one type of learning behavior.
- The learning behavior includes at least one of the following behaviors: performing at least one target gesture, expressing a target emotion, paying attention to the display area of the teaching course, producing at least one interactive behavior with other objects, not appearing in at least part of the video frames in the video, closing eyes, and eye contact within the display area of the teaching course.
- the detection module is configured to: perform target object detection on the video to obtain a video frame containing the target object; and perform at least one type of learning behavior detection on the video frame containing the target object.
- The learning behavior includes performing at least one target gesture; the detection module is further configured to: perform detection of at least one target gesture on the video frames containing the target object; in the case where the number of continuous video frames containing at least one target gesture exceeds a first threshold, record at least one of the video frames containing the target gesture as a gesture start frame; in the video frames after the gesture start frame, in the case where the number of continuous video frames that do not contain the target gesture exceeds a second threshold, record at least one of the video frames that do not contain the target gesture as a gesture end frame; and, according to the number of gesture start frames and gesture end frames, determine the number of times and/or the time for which the target object in the video performs at least one target gesture.
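- The following Python sketch illustrates the gesture start/end frame logic just described, assuming a per-frame boolean sequence from an upstream gesture detector; it is a simplified reading of the rule, not the disclosed implementation.

```python
from typing import List, Tuple

def count_gesture_events(gesture_per_frame: List[bool],
                         first_threshold: int,
                         second_threshold: int) -> Tuple[int, List[int]]:
    start_frames: List[int] = []
    run = 0          # consecutive frames containing the target gesture
    gap = 0          # consecutive frames without the gesture after a gesture started
    in_gesture = False
    for i, has_gesture in enumerate(gesture_per_frame):
        if has_gesture:
            run += 1
            gap = 0
            if not in_gesture and run > first_threshold:
                in_gesture = True
                start_frames.append(i - run + 1)  # record a gesture start frame
        else:
            run = 0
            if in_gesture:
                gap += 1
                if gap > second_threshold:
                    in_gesture = False            # a gesture end frame is reached
                    gap = 0
    return len(start_frames), start_frames
```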
- The learning behavior includes expressing a target emotion; the detection module is further configured to: perform expression detection and/or smile value detection on the video frames containing the target object; in the case where it is detected that the target object in a video frame shows at least one first target expression or the smile value detection result exceeds a target smile value, take the detected video frame as a first detection frame; and, in the case where the number of consecutive first detection frames exceeds a third threshold, determine that the target object produces the target emotion.
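- Similarly, the sketch below illustrates the target emotion rule, assuming per-frame expression flags and smile values are provided by upstream detectors.

```python
from typing import List

def target_emotion_detected(first_expression_per_frame: List[bool],
                            smile_value_per_frame: List[float],
                            target_smile_value: float,
                            third_threshold: int) -> bool:
    consecutive = 0
    for has_expression, smile in zip(first_expression_per_frame, smile_value_per_frame):
        if has_expression or smile > target_smile_value:
            consecutive += 1          # this frame counts as a first detection frame
            if consecutive > third_threshold:
                return True           # the target object produces the target emotion
        else:
            consecutive = 0
    return False
```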
- The learning behavior includes paying attention to the display area of the teaching course; the detection module is further configured to: perform expression detection and face angle detection on the video frames containing the target object; in the case where it is detected that the target object in a video frame shows at least one second target expression and the face angle is within the target face angle range, take the detected video frame as a second detection frame; and, in the case where the number of consecutive second detection frames exceeds a fourth threshold, determine that the target object pays attention to the display area of the teaching course.
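- The attention rule can be sketched in the same way, assuming per-frame second target expression flags and face angles (for example, yaw) are provided by upstream detectors.

```python
from typing import List, Tuple

def attention_detected(second_expression_per_frame: List[bool],
                       face_angle_per_frame: List[float],
                       target_angle_range: Tuple[float, float],
                       fourth_threshold: int) -> bool:
    low, high = target_angle_range
    consecutive = 0
    for has_expression, angle in zip(second_expression_per_frame, face_angle_per_frame):
        if has_expression and low <= angle <= high:
            consecutive += 1          # this frame counts as a second detection frame
            if consecutive > fourth_threshold:
                return True           # attention to the display area is confirmed
        else:
            consecutive = 0
    return False
```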
- the generating module is used to: obtain video frames containing at least one type of learning behavior in the video as a target video frame set; perform face quality detection on at least one video frame in the target video frame set, The video frame whose face quality is greater than the face quality threshold is taken as the target video frame; according to the target video frame, the learning state information is generated.
- the generating module is further configured to: use at least one frame of the target video frame as the learning state information; and/or, identify the area where the target object is located in the at least one frame of the target video frame, based on the target object In the area, the learning status information is generated.
- The detection module is configured to: perform target object detection on the video to obtain video frames containing the target object, and take the video frames other than the video frames containing the target object as video frames in which the target object is not detected; in the case where the number of video frames in which the target object is not detected exceeds a preset number of video frames, the detected learning behavior includes: not appearing in at least part of the video frames in the video.
- The generating module is configured to: in the case where it is detected that the time for the target object to perform at least one type of learning behavior is not less than a time threshold, record the duration of the at least one type of learning behavior; and take the duration corresponding to the at least one type of learning behavior as the learning status information.
- the device is further configured to: render a background area in at least part of the video frame in the video, where the background area is an area outside the target object in the video frame.
- the device is further configured to: collect statistics on the learning state information of at least one target object to obtain a statistical result of at least one target object; and generate statistical data of the learning state according to the statistical result of at least one target object.
- The device is further configured to: obtain, according to the category to which the at least one target object belongs, the statistical results of the target objects contained in the at least one category, and generate learning state statistical data of at least one category, wherein the category to which the target object belongs includes at least one of the courses the target object participates in, the institution with which the target object is registered, and the equipment used by the target object; and/or visualize the statistical results of the at least one target object to generate learning state statistical data of the at least one target object.
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
- The application example of the present disclosure proposes a learning system, which can effectively grasp the learning state of students through the video processing method proposed in the above-mentioned disclosed embodiments.
- Fig. 3 shows a schematic diagram of an application example according to the present disclosure.
- the learning system can be composed of three parts: the user end, the educational software service (SaaS, Software-as-a-Service) backend, and the interactive classroom backend. Among them, students watch the teaching courses through the client.
- The client can include two parts: a hardware device used for learning (such as the client with a Windows system or iOS system and the SDK installed, shown in the figure), and the application through which the student logs in to the online classroom (that is, the user APP in the figure).
- The education SaaS backend can be a platform built on the server of the educational institution where the student is located, and the interactive classroom backend can be a platform built on a server that aggregates data from different educational institutions and performs data maintenance. Whether it is the education SaaS backend or the interactive classroom backend, data can be exchanged with the client through the API interface.
- the process of generating learning state information may include:
- the user terminal obtains the learning status information of each student by collecting the videos of the students watching the teaching course process and processing the collected videos.
- The education SaaS backend and the interactive classroom backend call, through the API interface, the learning status information generated on different user terminals to generate learning status statistical data.
- the user terminal processes the collected video, and the process of obtaining the learning status information of each student may include:
- The capture of the student's highlight moments can be performed after the student signs in successfully, and the videos or pictures corresponding to the highlight moments are uploaded to the backend or the cloud. At the same time, it is also possible to configure whether the student can see the uploaded highlight moments in real time.
- the highlight definition rule may include: generating at least one target gesture.
- The target gesture may include raising a hand, a like gesture, an OK gesture, a "little" gesture, and so on; if the student is detected performing one of the above gestures within a period of time, pictures or video frames can be extracted from the video segments containing the gesture. Another rule is expressing the happy target emotion: if the smile value reaches a certain target smile value (such as 99 points), pictures or video frames are extracted from the detected video frames. A further rule is paying attention to the display area of the teaching course: if the student's face orientation remains correct within a period of time, that is, the head pose is within a certain threshold range, pictures or video frames can be extracted from the video within this period of time.
- When the student is not on the screen or is unfocused, the data can be pushed to the parents in real time through the academic status detection, so that the parents can pay attention to the child as soon as possible and correct the child's bad learning habits in time, providing auxiliary supervision.
- The process of detecting the student's academic status can also be carried out after the student signs in successfully. For example, when no one appears in front of the camera for a certain length of time, or the student does not watch the screen, closes the eyes, and so on, it is judged that the student has a low degree of concentration. In this case, the length of time during which the student exhibits the above-mentioned learning behaviors can be counted and used as the result of the academic status detection to obtain the corresponding learning state data.
- the specific academic condition detection configuration rules can refer to the above disclosed embodiments, which will not be repeated here.
- Through the above process, learning status information including highlight moments and academic status detection results can be obtained.
- The process in which the education SaaS backend and the interactive classroom backend use the API interface to call the learning status information generated on different client terminals and generate learning status statistical data can include:
- Report generation, that is, the generation of learning status statistical data of at least one category in the above disclosed embodiment.
- Through the backend or cloud API, student sign-in information and learning status information can be viewed in different dimensions such as device, course, and institution.
- The main data indicators can include: sign-in time, number of sign-ins, face database matching (that is, whether the target object in the above disclosed embodiment matches a face in the preset database), sign-in equipment, sign-in course, focus duration, smile duration, and so on.
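- The snippet below sketches a hypothetical per-student record covering the indicators listed above; every field name is an assumption made for illustration and is not an API defined by this disclosure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StudentRecord:
    student_id: str
    sign_in_times: List[str] = field(default_factory=list)  # ISO timestamps
    sign_in_count: int = 0
    face_db_matched: bool = False   # matched a face in the preset database
    sign_in_device: str = ""
    sign_in_course: str = ""
    focus_seconds: float = 0.0
    smile_seconds: float = 0.0
```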
- Analysis report generation, that is, the visualization process in the above disclosed embodiment that generates the learning status statistical data of at least one target object.
- the education SaaS backend or the interactive classroom backend can unify the students' performance in the online classroom into a complete academic analysis report.
- the report explains the student’s class status through a visual graphical interface.
- The backend can also select notable learning status information and push it to parents or teachers, so that institutional teachers can use it to analyze the student's situation and gradually assist children in improving their learning behavior.
- the learning system can also perform background segmentation processing on the student's learning video when the student is learning through the user terminal.
- the user terminal may provide a background segmentation function for situations where the student does not have a location background suitable for live broadcast or is unwilling to display a background image for privacy protection.
- the SDK on the user side can support several different background templates. For example, several general templates can be preset.
- students can also call customized templates from the interactive classroom backend through the user side.
- The SDK can provide a background template preview interface to the app on the user side, so that students can preview the customized templates that can be called through the app; students can also use the background segmentation stickers in the app on the user side to render the live broadcast background.
- The background rendering can also be manually turned off by the student.
- the APP on the user side can report the data of students using stickers to the corresponding back-end (education SaaS back-end or interactive classroom back-end), and the corresponding back-end can analyze which background stickers are used by students and information such as usage amount as additional learning status information.
- the learning system proposed in the application examples of the present disclosure can not only be applied to online classrooms, but also be extended to other related fields, such as online meetings.
- The writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
- the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
- the computer-readable storage medium may be a volatile computer-readable storage medium or a non-volatile computer-readable storage medium.
- An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the above-mentioned method.
- An embodiment of the present disclosure also provides a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the above method.
- The above-mentioned memory may be a volatile memory, such as a RAM; or a non-volatile memory, such as a ROM, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor.
- the foregoing processor may be at least one of ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It is understandable that, for different devices, the electronic device used to implement the above-mentioned processor function may also be other, and the embodiment of the present disclosure does not specifically limit it.
- the electronic device can be provided as a terminal, server or other form of device.
- the embodiment of the present disclosure also provides a computer program, which implements the foregoing method when the computer program is executed by a processor.
- FIG. 4 is a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
- the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
- the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, and a sensor component 814 , And communication component 816.
- the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
- the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
- the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
- the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
- the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method to operate on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
- The memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
- the power supply component 806 provides power for various components of the electronic device 800.
- the power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.
- the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
- the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
- the audio component 810 is configured to output and/or input audio signals.
- the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
- the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
- the audio component 810 further includes a speaker for outputting audio signals.
- the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
- the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
- the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
- The sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example, the display and keypad of the electronic device 800; the sensor component 814 can also detect a position change of the electronic device 800 or a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and the temperature change of the electronic device 800.
- the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
- the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
- the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof.
- In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
- the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to implement the above methods.
- a non-volatile computer-readable storage medium such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
- FIG. 5 is a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
- The electronic device 1900 may be provided as a server. Referring to Fig. 5, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs.
- the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
- the processing component 1922 is configured to execute instructions to perform the above-described methods.
- the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to the network, and an input output (I/O) interface 1958 .
- the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
- a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
- the present disclosure may be a system, method and/or computer program product.
- the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
- the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
- The computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission media (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
- the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device .
- The computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, executed as a stand-alone software package, partly on the user's computer and partly executed on a remote computer, or entirely on the remote computer or server implement.
- The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
- These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, thereby producing a machine, so that when these instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus that implements the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
- Each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for realizing the specified logical function. In some alternative implementations, the functions marked in the blocks may also occur in an order different from the order marked in the drawings; for example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
- Each block in the block diagrams and/or flowcharts, and the combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or can be realized by a combination of dedicated hardware and computer instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Educational Technology (AREA)
- Educational Administration (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Social Psychology (AREA)
- Psychiatry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
Description
Claims (17)
- 1. A video processing method, characterized in that it comprises: acquiring a video, where at least part of the video frames in the video contain a target object; detecting, according to the video, at least one type of learning behavior of the target object in the process of watching a teaching course; in the case of detecting that the target object performs at least one type of learning behavior, generating learning state information according to at least part of the video frames containing the at least one type of learning behavior and/or the duration of the target object performing the at least one type of learning behavior.
- 2. The method according to claim 1, wherein the learning behavior includes at least one of the following behaviors: performing at least one target gesture, expressing a target emotion, paying attention to the display area of the teaching course, producing at least one interactive behavior with other objects, not appearing in at least part of the video frames in the video, closing eyes, and eye contact within the display area of the teaching course.
- 3. The method according to claim 1 or 2, wherein the detecting at least one type of learning behavior of the target object according to the video comprises: performing target object detection on the video to obtain video frames containing the target object; performing at least one type of learning behavior detection on the video frames containing the target object.
- 4. The method according to claim 3, wherein the learning behavior comprises performing at least one target gesture; the performing at least one type of learning behavior detection on the video frames containing the target object includes: performing detection of at least one target gesture on the video frames containing the target object; in a case where it is detected that the number of continuous video frames containing at least one of the target gestures exceeds a first threshold, recording at least one of the video frames containing the target gesture as a gesture start frame; in the video frames after the gesture start frame, in a case where the number of continuous video frames that do not contain the target gesture exceeds a second threshold, recording at least one of the video frames that do not contain the target gesture as a gesture end frame; determining, according to the number of the gesture start frames and the gesture end frames, the number of times and/or the time for which the target object in the video performs at least one target gesture.
- 5. The method according to claim 3 or 4, wherein the learning behavior includes expressing a target emotion; the performing at least one type of learning behavior detection on the video frames containing the target object includes: performing expression detection and/or smile value detection on the video frames containing the target object; in a case where it is detected that the target object in a video frame shows at least one first target expression or the result of the smile value detection exceeds a target smile value, taking the detected video frame as a first detection frame; in a case where it is detected that the number of consecutive first detection frames exceeds a third threshold, determining that the target object produces the target emotion.
- 6. The method according to any one of claims 3 to 5, wherein the learning behavior includes paying attention to the display area of the teaching course; the performing at least one type of learning behavior detection on the video frames containing the target object includes: performing expression detection and face angle detection on the video frames containing the target object; in a case where it is detected that the target object in a video frame shows at least one second target expression and the face angle is within a target face angle range, taking the detected video frame as a second detection frame; in a case where it is detected that the number of consecutive second detection frames exceeds a fourth threshold, determining that the target object pays attention to the display area of the teaching course.
- 7. The method according to any one of claims 1 to 6, wherein the generating learning state information according to at least part of the video frames containing the at least one type of learning behavior comprises: acquiring video frames containing at least one type of learning behavior in the video as a target video frame set; performing face quality detection on at least one video frame in the target video frame set, and taking a video frame whose face quality is greater than a face quality threshold as a target video frame; generating the learning state information according to the target video frame.
- 8. The method according to claim 7, wherein the generating the learning state information according to the target video frame comprises: taking at least one of the target video frames as the learning state information; and/or identifying the area where the target object is located in at least one of the target video frames, and generating the learning state information based on the area where the target object is located.
- 9. The method according to claim 1 or 2, wherein the detecting at least one type of learning behavior of the target object according to the video comprises: performing target object detection on the video to obtain video frames containing the target object, and taking video frames in the video other than the video frames containing the target object as video frames in which the target object is not detected; in a case where the number of the video frames in which the target object is not detected exceeds a preset number of video frames, the detected learning behavior includes: not appearing in at least part of the video frames in the video.
- 10. The method according to any one of claims 1 to 9, wherein the generating learning state information according to the duration of the target object performing the at least one type of learning behavior comprises: in a case where it is detected that the time for the target object to perform at least one type of learning behavior is not less than a time threshold, recording the duration of the at least one type of learning behavior; taking the duration corresponding to the at least one type of learning behavior as the learning state information.
- 11. The method according to any one of claims 1 to 10, wherein the method further comprises: rendering a background area in at least part of the video frames in the video, where the background area is an area other than the target object in the video frame.
- 12. The method according to any one of claims 1 to 11, wherein the method further comprises: collecting statistics on the learning state information of at least one of the target objects to obtain a statistical result of the at least one target object; generating learning state statistical data according to the statistical result of the at least one target object.
- 13. The method according to claim 12, wherein the generating learning state statistical data according to the statistical result of the at least one target object comprises: obtaining, according to the category to which the at least one target object belongs, the statistical result of the target objects contained in the at least one category, and generating learning state statistical data of the at least one category, wherein the category to which the target object belongs includes at least one of the courses the target object participates in, the institution with which the target object is registered, and the equipment used by the target object; and/or visualizing the statistical result of the at least one target object to generate learning state statistical data of the at least one target object.
- 14. A video processing apparatus, characterized in that it comprises: a video acquisition module, configured to acquire a video, wherein at least part of the video frames in the video contain a target object; a detection module, configured to detect, according to the video, at least one type of learning behavior of the target object in the process of watching a teaching course; a generating module, configured to, in the case of detecting that the target object performs at least one type of learning behavior, generate learning state information according to at least part of the video frames containing the at least one type of learning behavior and/or the duration of the target object performing the at least one type of learning behavior.
- 15. An electronic device, characterized in that it comprises: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute the method according to any one of claims 1 to 12.
- 16. A computer-readable storage medium having computer program instructions stored thereon, characterized in that, when the computer program instructions are executed by a processor, the method according to any one of claims 1 to 13 is implemented.
- 17. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the method according to any one of claims 1 to 13.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020217021262A KR20210144658A (en) | 2020-05-22 | 2020-12-18 | Video processing method and apparatus, electronic device and storage medium |
JP2021538705A JP2022537475A (en) | 2020-05-22 | 2020-12-18 | Video processing method and apparatus, electronic device and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010442733.6A CN111553323A (en) | 2020-05-22 | 2020-05-22 | Video processing method and device, electronic equipment and storage medium |
CN202010442733.6 | 2020-05-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021232775A1 true WO2021232775A1 (en) | 2021-11-25 |
Family
ID=72000950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/137690 WO2021232775A1 (en) | 2020-05-22 | 2020-12-18 | Video processing method and apparatus, and electronic device and storage medium |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP2022537475A (en) |
KR (1) | KR20210144658A (en) |
CN (1) | CN111553323A (en) |
TW (1) | TW202145131A (en) |
WO (1) | WO2021232775A1 (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553323A (en) * | 2020-05-22 | 2020-08-18 | 北京市商汤科技开发有限公司 | Video processing method and device, electronic equipment and storage medium |
CN112287844B (en) * | 2020-10-30 | 2023-04-18 | 北京市商汤科技开发有限公司 | Student situation analysis method and device, electronic device and storage medium |
CN112652200A (en) * | 2020-11-16 | 2021-04-13 | 北京家有课堂科技有限公司 | Man-machine interaction system, man-machine interaction method, server, interaction control device and storage medium |
TWI759016B (en) * | 2020-12-17 | 2022-03-21 | 正文科技股份有限公司 | Testee learning status detection method and testee learning status detection system |
CN112598551B (en) * | 2020-12-24 | 2022-11-29 | 北京市商汤科技开发有限公司 | Behavior guidance scheme generation method and device, computer equipment and storage medium |
CN112613780B (en) * | 2020-12-29 | 2022-11-25 | 北京市商汤科技开发有限公司 | Method and device for generating learning report, electronic equipment and storage medium |
CN112866808B (en) * | 2020-12-31 | 2022-09-06 | 北京市商汤科技开发有限公司 | Video processing method and device, electronic equipment and storage medium |
CN112990723B (en) * | 2021-03-24 | 2021-11-30 | 食安快线信息技术(深圳)有限公司 | Online education platform student learning force analysis feedback method based on user learning behavior deep analysis |
CN113052088A (en) * | 2021-03-29 | 2021-06-29 | 北京大米科技有限公司 | Image processing method and device, readable storage medium and electronic equipment |
CN114663261B (en) * | 2022-05-18 | 2022-08-23 | 火焰蓝(浙江)信息科技有限公司 | Data processing method suitable for training and examination system |
CN114677751B (en) * | 2022-05-26 | 2022-09-09 | 深圳市中文路教育科技有限公司 | Learning state monitoring method, monitoring device and storage medium |
CN116128453B (en) * | 2023-02-18 | 2024-05-03 | 广州市点易资讯科技有限公司 | Online course inspection method, system, equipment and medium |
CN117636219B (en) * | 2023-12-04 | 2024-06-14 | 浙江大学 | Collaborative state analysis method and system in family sibling interaction process |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160148515A1 (en) * | 2014-11-20 | 2016-05-26 | MyChild, Inc. | Web and mobile parent engagement and learning management system |
CN109815795A (en) * | 2018-12-14 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Classroom student's state analysis method and device based on face monitoring |
CN110033400A (en) * | 2019-03-26 | 2019-07-19 | 深圳先进技术研究院 | A kind of classroom monitoring analysis system |
CN110991381A (en) * | 2019-12-12 | 2020-04-10 | 山东大学 | Real-time classroom student state analysis and indication reminding system and method based on behavior and voice intelligent recognition |
CN111553323A (en) * | 2020-05-22 | 2020-08-18 | 北京市商汤科技开发有限公司 | Video processing method and device, electronic equipment and storage medium |
CN112287844A (en) * | 2020-10-30 | 2021-01-29 | 北京市商汤科技开发有限公司 | Student situation analysis method and device, electronic device and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013097311A (en) * | 2011-11-04 | 2013-05-20 | Zenrin Datacom Co Ltd | Learning support device, learning support method and learning support program |
JPWO2018097177A1 (en) * | 2016-11-24 | 2019-10-17 | 株式会社ガイア・システム・ソリューション | Engagement measurement system |
CN108399376B (en) * | 2018-02-07 | 2020-11-06 | 华中师范大学 | Intelligent analysis method and system for classroom learning interest of students |
JP6636670B1 (en) * | 2019-07-19 | 2020-01-29 | 株式会社フォーサイト | Learning system, learning lecture providing method, and program |
- 2020
- 2020-05-22 CN CN202010442733.6A patent/CN111553323A/en active Pending
- 2020-12-18 WO PCT/CN2020/137690 patent/WO2021232775A1/en active Application Filing
- 2020-12-18 JP JP2021538705A patent/JP2022537475A/en active Pending
- 2020-12-18 KR KR1020217021262A patent/KR20210144658A/en not_active Application Discontinuation
- 2021
- 2021-01-07 TW TW110100570A patent/TW202145131A/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN111553323A (en) | 2020-08-18 |
KR20210144658A (en) | 2021-11-30 |
JP2022537475A (en) | 2022-08-26 |
TW202145131A (en) | 2021-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021232775A1 (en) | Video processing method and apparatus, and electronic device and storage medium | |
CN112287844B (en) | Student situation analysis method and device, electronic device and storage medium | |
US9734410B2 (en) | Systems and methods for analyzing facial expressions within an online classroom to gauge participant attentiveness | |
US20220319181A1 (en) | Artificial intelligence (ai)-based system and method for managing education of students in real-time | |
CN111833861B (en) | Event evaluation report generation based on artificial intelligence | |
US9955116B2 (en) | Utilizing eye tracking to determine attendee engagement | |
US20230222932A1 (en) | Methods, systems, and media for context-aware estimation of student attention in online learning | |
WO2021218194A1 (en) | Data processing method and apparatus, electronic device, and storage medium | |
CN111353363A (en) | Teaching effect detection method and device and electronic equipment | |
Zhao et al. | Semi-automated 8 collaborative online training module for improving communication skills | |
CN111556279A (en) | Monitoring method and communication method of instant session | |
CN110598632A (en) | Target object monitoring method and device, electronic equipment and storage medium | |
Baecher | Video in teacher learning: Through their own eyes | |
JP2009267621A (en) | Communication apparatus | |
TWI528336B (en) | Speech skills of audio and video automatic assessment and training system | |
CN116527828A (en) | Image processing method and device, electronic equipment and readable storage medium | |
CN116912723A (en) | Classroom broadcasting guiding method and device, electronic equipment and storage medium | |
CN114830208A (en) | Information processing apparatus, information processing method, and program | |
CN111144255B (en) | Analysis method and device for non-language behaviors of teacher | |
Kushalnagar et al. | Improving classroom visual accessibility with cooperative smartphone recordings | |
Marino | Implicitly Conveying Emotion While Teleconferencing | |
JP2022171465A (en) | Program, method, and information processing device | |
CN115428466A (en) | Simulating audience reaction to performers on a camera | |
CN115052194A (en) | Learning report generation method, device, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
|  | ENP | Entry into the national phase | Ref document number: 2021538705; Country of ref document: JP; Kind code of ref document: A |
|  | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20937075; Country of ref document: EP; Kind code of ref document: A1 |
|  | NENP | Non-entry into the national phase | Ref country code: DE |
|  | 122 | Ep: pct application non-entry in european phase | Ref document number: 20937075; Country of ref document: EP; Kind code of ref document: A1 |
|  | WWE | Wipo information: entry into national phase | Ref document number: 522432705; Country of ref document: SA |