
CN112818929A - Method and device for detecting people fighting, electronic equipment and storage medium

Info

Publication number: CN112818929A
Authority: CN (China)
Prior art keywords: detection result, video frame, detected, fighting, detection
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202110217543.9A
Other languages: Chinese (zh)
Other versions: CN112818929B
Inventors: 张玉阳, 包汉彬, 王中飞, 谢会斌, 李聪廷
Current and original assignee: Jinan Boguan Intelligent Technology Co Ltd
Application filed by Jinan Boguan Intelligent Technology Co Ltd; priority to CN202110217543.9A
Published as CN112818929A; application granted and published as CN112818929B
Current legal status: Active

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/20: Movements or behaviour, e.g. gesture recognition (under G06V40/00, recognition of biometric, human-related or animal-related patterns in image or video data)
    • G06V20/40: Scenes; scene-specific elements in video content (under G06V20/00)
    • G06V20/44: Event detection (under G06V20/40)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a people-fighting detection method, together with a corresponding device, an electronic device and a computer-readable storage medium. The method comprises: acquiring a video frame to be detected, and performing limb-angle-based motion detection on it to obtain a first detection result; performing motion amplitude detection based on optical flow vectors on the video frame to obtain a second detection result; performing dispersion-based detection of the human-body key-point fusion state to obtain a third detection result; generating a fighting detection result from the first, second and third detection results; and, if the fighting detection result is a hit, determining that people fighting is detected. Because the method examines fighting behaviour from multiple angles and combines the individual detection results, the resulting fighting detection result is highly accurate and false alarms are reduced.

Description

Method and device for detecting people fighting, electronic equipment and storage medium
Technical Field
The application relates to the technical field of image processing, and in particular to a people-fighting detection method, a people-fighting detection device, an electronic device and a computer-readable storage medium.
Background
Fighting and similar events that endanger public safety have occurred frequently in recent years, so detecting abnormal crowd behaviour in public places has become a popular research topic in computer vision. More and more public areas are equipped with surveillance cameras, but these cameras only record video passively, serve merely as evidence for after-the-fact investigation, and cannot raise an alarm automatically in real time. To address this, the related art judges whether people are fighting from cues such as the distance between persons, the arm direction and how long an arm stays raised. However, this judgment logic is simple: fighting is inferred from arm motion alone, so when a person in the image is working or holding a hand strap on a bus, a false alarm is easily triggered. The related art is therefore applicable only to specific occasions, and its application scenarios are limited.
Therefore, the tendency of the related art to generate false alarms is a technical problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the above, an object of the present application is to provide a people-fighting detection method, a people-fighting detection device, an electronic device and a computer-readable storage medium that detect fighting behaviour from multiple angles and combine the individual detection results, so that the resulting fighting detection result is highly accurate and false alarms are greatly reduced.
To solve the above technical problem, the application provides a people-fighting detection method comprising:
acquiring a video frame to be detected, and performing limb-angle-based motion detection on the video frame to be detected to obtain a first detection result;
performing motion amplitude detection based on optical flow vectors on the video frame to be detected to obtain a second detection result;
performing dispersion-based detection of the human-body key-point fusion state on the video frame to be detected to obtain a third detection result;
generating a fighting detection result using the first detection result, the second detection result and the third detection result;
and if the fighting detection result is a hit, determining that people fighting is detected.
Optionally, performing limb-angle-based motion detection on the video frame to be detected to obtain a first detection result includes:
acquiring human-body key-point information, and generating an arm vector, a torso vector, a left-leg vector and a right-leg vector from the human-body key-point information;
obtaining a first limb angle from the arm vector and the torso vector, and generating a second limb angle from the left-leg vector and the right-leg vector;
and if the first limb angle is in a first interval or the second limb angle is in a second interval, determining that the first detection result is a hit.
Optionally, acquiring the human-body key-point information includes:
inputting the video frame to be detected into an OpenPose pose detection model trained by distillation learning to obtain the human key-point coordinates of each person in the video frame to be detected;
and determining the human key-point coordinates as the human-body key-point information.
Optionally, performing motion amplitude detection based on optical flow vectors on the video frame to be detected to obtain a second detection result includes:
performing effective-area detection on the video frame to be detected to obtain an effective-area image, and generating the optical flow vectors corresponding to the limb key points from the effective-area image;
calculating a direction entropy corresponding to the optical flow vectors;
obtaining a movement distance corresponding to each limb key point from its optical flow vector, and determining the limb key points whose movement distance is greater than a distance threshold as target limb key points;
and if the number of target limb key points is in a third interval and the direction entropy is in a fourth interval, determining that the second detection result is a hit.
Optionally, performing dispersion-based detection of the human-body key-point fusion state on the video frame to be detected to obtain a third detection result includes:
identifying a plurality of first human key points corresponding to a first person and a plurality of second human key points corresponding to a second person in the video frame to be detected;
respectively calculating the distance between each first human key point and the corresponding second human key point, and calculating the distance standard deviation from those distances;
and if the distance standard deviation is greater than a confidence threshold, determining that the third detection result is a hit.
Optionally, acquiring the video frame to be detected includes:
acquiring an initial video frame, and extracting the person-center coordinates corresponding to each person in the initial video frame;
calculating the person distance between any two persons from their person-center coordinates;
and if any person distance is smaller than a distance threshold, taking the initial video frame as the video frame to be detected.
Optionally, generating a fighting detection result using the first detection result, the second detection result and the third detection result includes:
if the first detection result, the second detection result and the third detection result are all hits, determining that the current detection result is a hit;
and obtaining a plurality of historical detection results, and if the number of hits among the current detection result and the historical detection results is greater than a judgment threshold, determining that the fighting detection result is a hit.
The application also provides a people-fighting detection device, comprising:
a motion detection module for acquiring a video frame to be detected and performing limb-angle-based motion detection on the video frame to be detected to obtain a first detection result;
an amplitude detection module for performing motion amplitude detection based on optical flow vectors on the video frame to be detected to obtain a second detection result;
a fusion detection module for performing dispersion-based detection of the human-body key-point fusion state on the video frame to be detected to obtain a third detection result;
a result generation module for generating a fighting detection result using the first detection result, the second detection result and the third detection result;
and a detection determination module for determining, if the fighting detection result is a hit, that people fighting is detected.
The present application further provides an electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is used for executing the computer program to implement the above people-fighting detection method.
The application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above people-fighting detection method.
According to the people-fighting detection method provided by the application, a video frame to be detected is acquired, and limb-angle-based motion detection is performed on it to obtain a first detection result; motion amplitude detection based on optical flow vectors is performed on the video frame to be detected to obtain a second detection result; dispersion-based detection of the human-body key-point fusion state is performed on the video frame to be detected to obtain a third detection result; a fighting detection result is generated using the first, second and third detection results; and if the fighting detection result is a hit, people fighting is determined to have been detected.
Thus, after acquiring the video frame to be detected, the method performs motion detection on it: a fighting person must raise an arm or leg to make physical contact, so the limb angles differ from those of normal activity, and limb-angle-based motion detection yields the first detection result. People who are fighting also move with large amplitude, so to avoid misjudgment the motion amplitude is examined from the perspective of optical flow vectors, judging whether the detected motion is fighting and yielding the second detection result. Further, people participating in a fight usually take different postures, clearly distinct from the consistent motions of people working or riding public transport, and usually make physical contact, so their key points appear fused in the video frame; dispersion-based detection of the human-body key-point fusion state therefore judges whether the persons' behaviours are consistent, yielding the corresponding third detection result. Generating the fighting detection result from the first, second and third detection results examines the frame for fighting from multiple angles, and when the result is a hit, people fighting is determined to have been detected. Because detection is performed from multiple angles and the individual results are combined, the final result is highly accurate, false alarms are reduced, the method can be applied in more scenarios, and the false-alarm problem of the related art is solved.
In addition, the application provides a people-fighting detection device, an electronic device and a computer-readable storage medium, which achieve the same beneficial effects.
Drawings
In order to illustrate more clearly the technical solutions in the embodiments of the present application or in the related art, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for detecting people fighting provided in the embodiment of the present application;
FIG. 2 is a schematic diagram of a human skeleton frame obtained by connecting key points of a human body according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a first limb angle provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a second limb angle provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of optical flow according to an embodiment of the present application;
fig. 6 is a schematic diagram of a specific human body key point of a video frame to be detected according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a people fighting detection device provided in the embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them; all other embodiments obtained by those skilled in the art from the given embodiments without creative effort fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a people fighting detection method provided in the embodiment of the present application. The method comprises the following steps:
s101: the method comprises the steps of obtaining a video frame to be detected, and carrying out motion detection based on the body angle on the video frame to be detected to obtain a first detection result.
The video frame to be detected is a video frame for carrying out people fighting detection, and the specific content is not limited. The video where the video frame to be detected is located may be referred to as a video to be detected, and the video to be detected may be a video acquired in real time or may be a non-real-time video. The specific acquiring mode of the video to be detected is not limited, for example, the video to be detected may be acquired in real time by the camera device, or the video in the designated path may be determined as the video to be detected, or videos sent by some designated electronic devices may be determined as the video to be detected. The video frame to be detected can be any one of the video frames in the video to be detected, and because each video frame in the video stream has a temporal precedence relationship, each video frame can be sequentially determined as the video frame to be detected from the video frame generated at the first time according to the precedence order of the video stream, namely the generation order of each video frame when people fighting detection is carried out.
It can be understood that the situations of people fighting phenomena recorded in the video to be detected are few, and the people fighting phenomena are not recorded in the video to be detected under most situations. Therefore, in order to reduce the waste of computing resources, it may be determined whether a video frame is a video frame to be detected after the video frame is acquired, and the video frame to be detected is subsequently detected after the video frame is determined to be the video frame to be detected. Therefore, the process of acquiring the video frame to be detected may specifically include the following steps:
step 11: and acquiring an initial video frame, and extracting the central coordinates of the persons corresponding to the persons in the initial video frame.
Step 12: and calculating the personnel distance corresponding to any two personnel by utilizing the personnel center coordinates.
Step 13: and if any person distance is smaller than the distance threshold value, obtaining the video frame to be detected by using the initial video frame.
It can be understood that, when the limb conflict occurs, the distance between the persons is necessarily closer, so whether the person fighting phenomenon possibly exists can be preliminarily judged by judging whether the distance between the persons is smaller. Specifically, in this embodiment, the directly acquired video frame is an initial video frame, and the center coordinates of the person corresponding to each person in the acquired initial video frame are extracted, and the specific extraction manner is not limited. After the person center coordinates are obtained, the distance between any two person center coordinates is calculated, and the distance is the person distance between two corresponding persons. If any person is shorter than the distance threshold, the fact that the distance between at least two persons is short is indicated, people fighting is possible to occur, and therefore the initial video frame can be judged as the video frame to be detected, and follow-up detection can be conducted. Correspondingly, if there is no person whose distance is smaller than the distance threshold, it is indicated that the distance between persons is long, and there is no possibility of a person fighting phenomenon, and the embodiment is not limited to the operation specifically performed in this case, for example, it may be no operation, that is, no operation is performed; or may be an operation to retrieve the original video frame.
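As an illustration of steps 11 to 13, the sketch below computes pairwise person distances from per-person keypoint arrays; the patent does not fix how the person center is extracted or what the distance threshold is, so the mean-of-keypoints center and the pixel threshold here are assumptions.

```python
import numpy as np

DIST_THRESHOLD = 120.0  # assumed pixel threshold, not from the patent

def person_center(keypoints: np.ndarray) -> np.ndarray:
    """Approximate a person's center as the mean of the detected keypoints."""
    return keypoints.mean(axis=0)

def has_close_pair(persons: list) -> bool:
    """persons: list of (K, 2) keypoint arrays, one per person. Return True
    if any two person centers are closer than the threshold, i.e. the
    initial frame should be kept as a video frame to be detected."""
    centers = [person_center(p) for p in persons]
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if np.linalg.norm(centers[i] - centers[j]) < DIST_THRESHOLD:
                return True
    return False
```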
It will also be appreciated that the limb movements of a fighting person differ from normal ones, and the limb angles differ from those of limb movements in most normal situations. Limb-angle-based motion detection can therefore be performed on the video frame to be detected to judge whether any person's movement shows an abnormal limb angle, yielding the corresponding first detection result. The motion detection may be performed on the entire frame or on part of it, for example the part corresponding to two persons who are close together. In one feasible implementation, performing limb-angle-based motion detection on the video frame to be detected to obtain the first detection result may include the following steps:
step 21: and acquiring the key point information of the human body, and generating an arm vector, a trunk vector, a left leg vector and a right leg vector by using the key point information of the human body.
Referring to fig. 2, fig. 2 is a schematic diagram of a human skeleton frame obtained by connecting human key points provided in an embodiment of the present application, where each serial number is a label of each human key point, each point is a human key point, and a connection line between points is used to simulate a human skeleton. The number and the position of the key points of the human body are not limited, and the human skeleton obtained after the connection can represent the motion condition of the limbs of the human body. Using fig. 2 as an example, there are 14 human key points, including head point 1 (point 0), upper torso 9 totally, including central point 1 (point 1), shoulder point 2 (including left shoulder and right shoulder, point 2 and point 5), elbow point 2 (including left elbow and right elbow, point 3 and point 6), wrist point 2 (including left wrist and right wrist, point 4 and point 7), crotch point (including left crotch and right crotch, point 8 and point 11), lower torso 4 totally, including knee point 2 (including left knee and right knee, point 9 and point 12), foot point 2 (including left foot and right foot, point 10 and point 13).
It should be noted that at least two persons are necessarily present in the video frame to be detected, so multiple sets of human-body key-point information are acquired; in this embodiment, a first detection result is obtained for each set of key-point information. The human-body key-point information represents the positions of the human key points, and its specific form is not limited. It may, for example, be coordinates, or serial numbers: several key-point ranges can be preset, each corresponding to a serial number, and the serial number is determined by detecting which range a key point falls into. The way the information is acquired is also not limited; in one feasible implementation, acquiring the human-body key-point information may include the following steps:
step 31: and inputting the video frame to be detected into an openposition attitude detection model obtained based on a distillation learning training mode to obtain human body key point coordinates corresponding to each person in the video frame to be detected.
Step 32: determining the coordinates of the key points of the human body as the key point information of the human body
In this embodiment, the openposition gesture detection model may be used to detect the human key points, and obtain corresponding human key point coordinates, which are the human key point information. The OpenPose human posture detection model is a Convolutional neural network and supervised learning based deep learning network with a Convolutional structure with a function for Fast Feature Embedding as a frame, key points of human body parts are obtained through network forward processing, and posture estimation of human body actions, facial expressions, finger motions and the like can be realized through analysis of the key points. In this embodiment, in order to improve the performance of the openposition posture detection model and improve the recognition perusal of the coordinates of the key points of the human body, the openposition posture detection model can be obtained through training in a distillation learning training mode. The distillation learning training mode is a training mode of knowledge distillation, and the distillation learning comprises two network models, a complex Teacher Model (Teacher Model) and a simple Student Model (Student Model). By adopting the output of the pre-trained teacher model as a supervision signal to train the student model, the representation capability of the student model can be improved, namely the performance of the student model is improved. The embodiment does not limit the specific process of obtaining the coordinates of the human key points by using openposition, and for example, a human key point heat map corresponding to the video frame to be detected may be generated, and the corresponding coordinates of the human key points may be obtained according to the heat map. Furthermore, the coordinates of each human body key point can be connected to obtain a human body skeleton frame diagram similar to that shown in fig. 2.
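The patent names distillation learning but does not give the training objective. The sketch below shows a generic knowledge-distillation loss of the kind commonly used to train a student from a teacher (for heatmap-regression models such as OpenPose, an MSE between teacher and student heatmaps is a common alternative); the temperature T and weight alpha are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    """Blend of the soft-target loss (KL divergence between temperature-
    softened teacher and student distributions) and the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients by T^2, as is standard
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```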
Because the human-body key-point information represents the positions of the key points, it can be used to generate the arm vector, torso vector, left-leg vector and right-leg vector, where the arm vector may be a left-arm vector and/or a right-arm vector. Continuing with FIG. 2, any wrist point and the corresponding shoulder point of the skeleton can be selected to generate the arm vector. For example, take wrist point $I_4(x_4, y_4)$, where $(x_4, y_4)$ are the coordinates of point 4, and shoulder point $I_2(x_2, y_2)$; the arm vector is then

$$\vec{v}_{\mathrm{arm}} = (x_4 - x_2,\ y_4 - y_2).$$

By analogy, from crotch point $I_8(x_8, y_8)$ and center point $I_1(x_1, y_1)$ the torso vector $\vec{v}_{\mathrm{torso}} = (x_8 - x_1,\ y_8 - y_1)$ is obtained. The left-leg vector and the right-leg vector can be obtained in the same way; the specific key-point coordinates selected are not limited.
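A minimal sketch of step 21 under the 14-point layout of FIG. 2; which crotch/knee pairs form the leg vectors, and which indices are left versus right, are choices the patent leaves open, so the assignments below are assumptions.

```python
import numpy as np

# Assumed index layout from FIG. 2: 0 head, 1 torso center, 2/5 shoulders,
# 3/6 elbows, 4/7 wrists, 8/11 crotch points, 9/12 knees, 10/13 feet.
def limb_vectors(kp: np.ndarray) -> dict:
    """Build the limb vectors of step 21 from one person's (14, 2) keypoints."""
    return {
        "arm":       kp[4] - kp[2],    # wrist 4 minus shoulder 2, as in the text
        "torso":     kp[8] - kp[1],    # crotch 8 minus center 1, as in the text
        "left_leg":  kp[9] - kp[8],    # knee 9 minus crotch 8 (assumed choice)
        "right_leg": kp[12] - kp[11],  # knee 12 minus crotch 11 (assumed choice)
    }
```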
Step 22: obtain a first limb angle from the arm vector and the torso vector, and generate a second limb angle from the left-leg vector and the right-leg vector.
After the vectors are obtained, the corresponding angle can be calculated with

$$\theta = \arccos\frac{\vec{A} \cdot \vec{B}}{|\vec{A}|\,|\vec{B}|} = \arccos\frac{a_1 a_2 + b_1 b_2}{\sqrt{a_1^2 + b_1^2}\,\sqrt{a_2^2 + b_2^2}},$$

where the pair of vectors $\vec{A} = (a_1, b_1)$ and $\vec{B} = (a_2, b_2)$ is either the arm vector and the torso vector or the left-leg vector and the right-leg vector, and $\theta$ is the limb angle: depending on which vectors are used, it is the first limb angle or the second limb angle. Referring to FIG. 3, FIG. 3 is a schematic diagram of the first limb angle according to an embodiment of the present application, in which two vectors of each of two persons form the corresponding first limb angles. Referring to FIG. 4, FIG. 4 is a schematic diagram of the second limb angle according to an embodiment of the present application.
Step 23: if the first limb angle is in the first interval or the second limb angle is in the second interval, determine that the first detection result is a hit.
This embodiment does not limit the sizes of the first and second intervals; they can be set according to actual needs and may be the same or different. For example, the first interval may be set to angles greater than 45 degrees and the second interval to angles greater than 60 degrees. If the first limb angle is in the first interval or the second limb angle is in the second interval, the person's limb movement is judged abnormal, so the corresponding first detection result is determined to be a hit; conversely, if neither angle is in its interval, the first detection result is a miss. Since there are multiple sets of human-body key-point information, there are also multiple first and second limb angles; in this case, the first detection result is determined to be a hit if any first limb angle is in the first interval or any second limb angle is in the second interval.
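Putting steps 21 to 23 together, a sketch of the first detector; the 45 and 60 degree interval bounds are the examples given above, not fixed values.

```python
import numpy as np

def angle_deg(a: np.ndarray, b: np.ndarray) -> float:
    """theta = arccos(A.B / (|A||B|)) from step 22, returned in degrees."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def first_result_hit(v: dict) -> bool:
    """Hit if the arm-torso angle is in the first interval (> 45 deg here)
    or the leg-leg angle is in the second interval (> 60 deg here)."""
    first_angle = angle_deg(v["arm"], v["torso"])
    second_angle = angle_deg(v["left_leg"], v["right_leg"])
    return first_angle > 45.0 or second_angle > 60.0
```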
S102: perform motion amplitude detection based on optical flow vectors on the video frame to be detected to obtain a second detection result.
The optical-flow method computes the motion of objects between adjacent frames from the temporal changes of pixels in an image sequence and the correlation between adjacent frames, finding the correspondence between the previous frame and the current frame; the instantaneous rate of change of grey value at a given point of the two-dimensional image plane is usually defined as the optical flow vector. Referring to FIG. 5, FIG. 5 is a schematic diagram of optical flow according to an embodiment of the present application: the motion vector of an object in three-dimensional space (the 3D motion vector) is projected onto the two-dimensional imaging plane, giving a two-dimensional vector that describes the change of position (the 2D optical flow vector). With the optical flow vectors, the amplitude of a person's movements can be judged; since the movements of fighting persons are necessarily large, whether fighting occurs can be detected from the angle of motion amplitude, yielding the second detection result. The specific way the motion amplitude is detected is not limited and can be set according to actual needs; for example, it may be judged whether the movement is fast or whether the movement distance is long. In one feasible implementation, to obtain an accurate second detection result, performing motion amplitude detection based on optical flow vectors on the video frame to be detected may include the following steps:
step 41: and detecting an effective area of the video frame to be detected to obtain an effective area image, and generating optical flow loss corresponding to the key points of the four limbs according to the effective area image.
Because only people can move in a large range or fast when people fight, and parts such as background non-people parts or people not participating in fighting do not move, in order to improve the reliability of the second detection result, the detection of the movement range can be carried out only on the effective area. The effective region may also be referred to as a region of interest (ROI) image, and a specific range of the effective region is not limited, for example, the effective region may be a region corresponding to a person at a short distance, or may be a region corresponding to an arm part and a leg part corresponding to the person, for example, refer to fig. 3 and 4, where a range defined by a white dashed frame is the effective region. After the effective area image in the effective area is obtained, the optical flow loss corresponding to the key points of the four limbs is generated by using the effective area image, and a specific generation manner of the optical flow loss may refer to related technologies, which is not described herein again.
Step 42: calculate the direction entropy of the optical flow vectors.
In scenarios such as manual work, people may also make large, unusual limb movements, which easily causes misjudgment. To avoid misjudgment in such scenarios, the direction entropy of the optical flow vectors may be calculated. Information entropy measures the complexity of a system: the more complex a system, i.e. the more kinds of situations occur, the larger its entropy; the simpler a system, i.e. the fewer kinds of situations occur (in the extreme case one kind with probability 1 and entropy 0), the smaller its entropy. The direction entropy thus indicates how complex the optical flow vectors are in direction. The specific way the direction entropy is computed is not limited in this embodiment and can be chosen as needed.
Step 43: obtain the movement distance of each limb key point from its optical flow vector, and determine the limb key points whose movement distance is greater than the distance threshold as target limb key points.
In this embodiment the optical flow vectors correspond to the limb key points, so the movement distance of each limb key point can be computed from its optical flow vector; the larger the movement distance, the larger the movement amplitude. A distance threshold is also set, and its specific size is not limited; if a limb key point's movement distance is greater than the distance threshold, it is a target limb key point.
Step 44: if the number of target limb key points is in the third interval and the direction entropy is in the fourth interval, determine that the second detection result is a hit.
If the number of target limb key points is in the third interval, the persons' movement amplitude is large; if the direction entropy is in the fourth interval, the persons' movements are complex and inconsistent, so fighting may be occurring, and the second detection result is determined to be a hit. The specific extent of the third interval is not limited; its lower bound may, for example, be 60% or 70% of the number of limb key points. The size and bounds of the fourth interval are likewise not limited; its lower bound is the largest value still considered free of fighting.
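A sketch of steps 42 to 44, assuming the optical flow vectors at the limb keypoints have already been computed (e.g. with a standard optical-flow routine over the effective area); the bin count, distance threshold, count ratio and entropy bound are all illustrative values, not taken from the patent.

```python
import numpy as np

def direction_entropy(flows: np.ndarray, bins: int = 8) -> float:
    """Shannon entropy of flow directions quantized into angular sectors;
    larger values mean more disordered, inconsistent motion (step 42)."""
    angles = np.arctan2(flows[:, 1], flows[:, 0])
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def second_result_hit(flows: np.ndarray, dist_thresh: float = 5.0,
                      count_ratio: float = 0.6, entropy_min: float = 1.5) -> bool:
    """flows: (N, 2) optical flow vectors at the N limb keypoints.
    Steps 43-44: count keypoints that moved farther than the threshold,
    then require both a large count and a high direction entropy."""
    distances = np.linalg.norm(flows, axis=1)         # movement distance per keypoint
    n_targets = int((distances > dist_thresh).sum())  # target limb keypoints
    return n_targets >= count_ratio * len(flows) and direction_entropy(flows) > entropy_min
```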
S103: perform dispersion-based detection of the human-body key-point fusion state on the video frame to be detected to obtain a third detection result.
The limbs of fighting persons interact; mapped to the two-dimensional image, this interaction appears as a fusion state of the human key points. In scenarios such as work or public transport people behave fairly consistently, so the distances between corresponding human key points are fairly uniform and their dispersion is small; in a fight, the persons' movements differ, the distances between corresponding key points vary widely, and the dispersion is large. Whether fighting occurs can therefore be detected using the standard deviation or variance of the distances between the persons' corresponding key points, or any other statistic that characterizes the dispersion of the data, yielding the corresponding third detection result. In one embodiment, obtaining the third detection result may include:
step 51: and identifying a plurality of first human key points corresponding to a first person and a plurality of second human key points corresponding to a second person in the video frame to be detected.
In this embodiment, the first person and the second person may be two persons whose human body center distance is smaller than a distance threshold. Referring to fig. 6, fig. 6 is a schematic diagram of human key points of a specific video frame to be detected according to an embodiment of the present application, where two people in a white dashed frame are a first person and a second person respectively. A plurality of key points are respectively arranged on the first person and the second person, and the key points on the first person and the second person are in one-to-one correspondence.
Step 52: respectively calculate the distance between each first human key point and the corresponding second human key point, and compute the standard deviation of those distances.
It should be noted that this embodiment does not limit the distance computation; for example, Euclidean distances may be computed, giving distances $x_1, x_2, x_3, \ldots, x_n$. The standard deviation of these distances is then

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
the smaller the standard deviation, the smaller the dispersion of the specification data, i.e., the more similar the actions of the two persons, the less likely it is to be a fighting behavior. Conversely, the larger the standard deviation, the larger the degree of dispersion of the caption data, the more different the motions of the two persons, and the more likely the caption is a fighting behavior.
Step 53: if the distance standard deviation is greater than the confidence threshold, determine that the third detection result is a hit.
In this embodiment a confidence threshold is set as the criterion for the standard deviation; if the distance standard deviation is greater than the confidence threshold, a fighting behaviour is likely to be occurring, so the third detection result is determined to be a hit.
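A sketch of steps 51 to 53 for two persons with matched keypoints; the confidence threshold is an assumed pixel value.

```python
import numpy as np

def third_result_hit(kp1: np.ndarray, kp2: np.ndarray,
                     conf_thresh: float = 30.0) -> bool:
    """kp1, kp2: (N, 2) arrays of one-to-one matched keypoints of the two
    nearby persons. Hit when the standard deviation of the pairwise
    Euclidean distances exceeds the confidence threshold."""
    dists = np.linalg.norm(kp1 - kp2, axis=1)  # distance of each matched pair
    return float(np.std(dists)) > conf_thresh
```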
It should be noted that this embodiment does not limit the order in which the first, second and third detection results are generated. In one implementation, the three steps are executed in parallel, i.e. the three detection results are generated simultaneously once the video frame to be detected is acquired; in another, they are executed serially, i.e. the three detection results are obtained in sequence.
S104: generate a fighting detection result using the first detection result, the second detection result and the third detection result.
After the first, second and third detection results are obtained, the fighting detection result is generated from them. In one embodiment, the fighting detection result concerns only the current video frame to be detected; in that case it is generated solely from the three detection results, for example the fighting detection result is a hit if all three detection results are hits, or if any two of them are hits. In another embodiment, since fighting behaviour is persistent, a more accurate fighting detection result can be obtained as follows:
step 61: and if the first detection result, the second detection result and the third detection result are hit, determining that the current detection result is hit.
Step 62: and obtaining a plurality of historical detection results, and if the number of hits corresponding to the current detection result and the historical detection result is greater than a judgment threshold value, determining that the fighting detection result is a hit.
In this embodiment, the attack detection result may be determined by integrating the history detection result and the current detection result. And the historical detection result is the detection result corresponding to a plurality of continuous video frames before the video frame to be detected, and if any video frame is not determined as the video frame to be detected, the corresponding historical detection result is a miss. And detecting the number of hits in the current detection result and the historical detection result, namely the number of hits, whether the number of hits is larger than a judgment threshold value, and if the number of hits is larger than the judgment threshold value, determining that the fighting detection result is a hit. By integrating the detection results of multiple frames, a more accurate fighting detection result can be obtained.
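A sketch of the temporal vote of steps 61 and 62; the window length and judgment threshold are illustrative, and frames never designated as frames to be detected simply contribute misses.

```python
from collections import deque

class FightVote:
    """Sliding window over per-frame results implementing steps 61-62."""
    def __init__(self, window: int = 25, judge_thresh: int = 15):
        self.history = deque(maxlen=window)  # current + historical results
        self.judge_thresh = judge_thresh

    def update(self, r1: bool, r2: bool, r3: bool) -> bool:
        # Step 61: the current result hits only if all three detectors hit.
        self.history.append(r1 and r2 and r3)
        # Step 62: the fighting detection result hits when the number of
        # hits in the window exceeds the judgment threshold.
        return sum(self.history) > self.judge_thresh
```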
S105: if the fighting detection result is a hit, determine that people fighting is detected.
If the fighting detection result is a hit, the video frame to be detected has recorded a people-fighting event, and it is determined that people fighting is detected. In this case an alarm may be raised, or the event may be recorded and reported; this embodiment places no limitation on the response.
By applying the people-fighting detection method provided by this embodiment, motion detection is performed on the video frame to be detected once it is acquired: a fighting person must raise an arm or leg to make physical contact, so the limb angles differ from those of normal activity, and limb-angle-based motion detection yields the first detection result. People who are fighting also move with large amplitude, so to avoid misjudgment the motion amplitude is examined from the perspective of optical flow vectors, yielding the second detection result. Further, people participating in a fight usually take different postures, clearly distinct from the consistent motions of people working or riding public transport, and usually make physical contact, so their key points appear fused in the video frame; dispersion-based detection of the human-body key-point fusion state judges whether the persons' behaviours are consistent, yielding the corresponding third detection result. Generating the fighting detection result from the first, second and third detection results examines the frame from multiple angles, and when the result is a hit, people fighting is determined to have been detected. Because detection is performed from multiple angles and the results are combined, the final result is highly accurate, false alarms are reduced, the method can be applied in more scenarios, and the false-alarm problem of the related art is solved.
The people-fighting detection device provided by the embodiment of the application is introduced below; the people-fighting detection device described below and the people-fighting detection method described above may be referred to in correspondence with each other.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a people fighting detection apparatus provided in an embodiment of the present application, including:
the motion detection module 110 is configured to acquire a video frame to be detected and perform limb-angle-based motion detection on it to obtain a first detection result;
the amplitude detection module 120 is configured to perform motion amplitude detection based on optical flow vectors on the video frame to be detected to obtain a second detection result;
the fusion detection module 130 is configured to perform dispersion-based detection of the human-body key-point fusion state on the video frame to be detected to obtain a third detection result;
the result generation module 140 is configured to generate a fighting detection result using the first, second and third detection results;
and the detection determination module 150 is configured to determine, if the fighting detection result is a hit, that people fighting is detected.
Optionally, the motion detection module 110 includes:
a vector generation unit for acquiring human-body key-point information and generating an arm vector, a torso vector, a left-leg vector and a right-leg vector from it;
an angle determination unit for obtaining a first limb angle from the arm vector and the torso vector and generating a second limb angle from the left-leg vector and the right-leg vector;
and a first hit determination unit for determining that the first detection result is a hit if the first limb angle is in the first interval or the second limb angle is in the second interval.
Optionally, the vector generation unit includes:
a coordinate detection subunit for inputting the video frame to be detected into an OpenPose pose detection model trained by distillation learning to obtain the human key-point coordinates of each person in the video frame to be detected;
and an information determination subunit for determining the human key-point coordinates as the human-body key-point information.
Optionally, the amplitude detection module 120 includes:
an optical flow generation unit for performing effective-area detection on the video frame to be detected to obtain an effective-area image and generating the optical flow vectors of the limb key points from the effective-area image;
a direction entropy calculation unit for calculating the direction entropy of the optical flow vectors;
a movement distance calculation unit for obtaining the movement distance of each limb key point from its optical flow vector and determining the limb key points whose movement distance exceeds the distance threshold as target limb key points;
and a second hit determination unit for determining that the second detection result is a hit if the number of target limb key points is in the third interval and the direction entropy is in the fourth interval.
Optionally, the fusion detection module 130 includes:
an identification unit for identifying a plurality of first human key points corresponding to a first person and a plurality of second human key points corresponding to a second person in the video frame to be detected;
a standard deviation calculation unit for calculating the distance between each first human key point and the corresponding second human key point and computing the standard deviation of the distances;
and a third hit determination unit for determining that the third detection result is a hit if the distance standard deviation is greater than the confidence threshold.
Optionally, the motion detection module 110 further includes:
a center coordinate extraction unit for acquiring an initial video frame and extracting the person-center coordinates of each person in the initial video frame;
a distance calculation unit for calculating the person distance between any two persons from their person-center coordinates;
and a to-be-detected-frame determination unit for taking the initial video frame as the video frame to be detected if any person distance is smaller than the distance threshold.
Optionally, the result generation module 140 includes:
a current result generation unit for determining that the current detection result is a hit if the first, second and third detection results are all hits;
and a fighting result generation unit for acquiring a plurality of historical detection results and determining that the fighting detection result is a hit if the number of hits among the current and historical detection results is greater than the judgment threshold.
In the following, the electronic device provided by the embodiment of the present application is introduced, and the electronic device described below and the people fighting detection method described above can be referred to correspondingly.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Wherein the electronic device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an information input/information output (I/O) interface 104, and a communication component 105.
The processor 101 is configured to control the overall operation of the electronic device 100 to complete all or part of the steps in the above-mentioned people fighting detection method; the memory 102 is used to store various types of data to support operation at the electronic device 100, such data may include, for example, instructions for any application or method operating on the electronic device 100, as well as application-related data. The Memory 102 may be implemented by any type or combination of volatile and non-volatile Memory devices, such as one or more of Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic or optical disk.
The multimedia component 103 may include a screen and an audio component. The screen may, for example, be a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may further be stored in the memory 102 or transmitted through the communication component 105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules such as a keyboard, mouse or buttons, which may be virtual or physical. The communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them; the corresponding communication component 105 may accordingly include a Wi-Fi part, a Bluetooth part and an NFC part.
The electronic device 100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, and is configured to perform the above people-fighting detection method.
The following describes a computer-readable storage medium provided in an embodiment of the present application, and the computer-readable storage medium described below and the above-described people fighting detection method can be referred to in correspondence with each other.
The application also provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the steps of the above people-fighting detection method.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Also, the terms comprise and include, or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that includes a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article or apparatus.
The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A people fighting detection method is characterized by comprising the following steps:
acquiring a video frame to be detected, and performing motion detection based on a limb angle on the video frame to be detected to obtain a first detection result;
performing motion amplitude detection based on the optical flow loss on the video frame to be detected to obtain a second detection result;
detecting the human body key point fusion state of the video frame to be detected based on the discrete degree to obtain a third detection result;
generating a fighting detection result by using the first detection result, the second detection result and the third detection result;
and if the fighting detection result is a hit, determining that people fighting is detected.
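Taken together, claim 1 describes a three-branch pipeline whose per-frame outputs are fused into a single verdict. The following minimal Python sketch is offered only as an illustration of that control flow: the three detector functions are stubs standing in for the detections elaborated in claims 2 to 5, the AND-style fusion anticipates claim 7, and none of the names come from the patent itself.

def limb_angle_detection(frame):
    return True  # stub for the first detection result (see claim 2)

def motion_amplitude_detection(frame):
    return True  # stub for the second detection result (see claim 4)

def keypoint_fusion_detection(frame):
    return True  # stub for the third detection result (see claim 5)

def fighting_detection_result(frame):
    r1 = limb_angle_detection(frame)
    r2 = motion_amplitude_detection(frame)
    r3 = keypoint_fusion_detection(frame)
    return r1 and r2 and r3  # a hit means people fighting is detected

print(fighting_detection_result(frame=None))  # True with the stubs above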
2. The people fighting detection method according to claim 1, wherein performing motion detection based on a limb angle on the video frame to be detected to obtain the first detection result comprises:
acquiring human body key point information, and generating an arm vector, a trunk vector, a left leg vector and a right leg vector by using the human body key point information;
obtaining a first limb included angle by using the arm vector and the trunk vector, and generating a second limb included angle by using the left leg vector and the right leg vector;
and if the first limb included angle is in a first interval or the second limb included angle is in a second interval, determining that the first detection result is a hit.
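A minimal sketch of the limb-angle test of claim 2, assuming NumPy, 2-D keypoints, and made-up interval bounds; the keypoint names and all numeric values are illustrative assumptions, since the patent does not fix them.

import numpy as np

def angle_deg(v1, v2):
    # Included angle between two 2-D vectors, in degrees.
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def limb_angle_hit(kp, first_interval=(60, 180), second_interval=(50, 180)):
    arm = kp["wrist"] - kp["shoulder"]               # arm vector
    trunk = kp["hip_mid"] - kp["neck"]               # trunk vector
    left_leg = kp["left_ankle"] - kp["left_hip"]     # left leg vector
    right_leg = kp["right_ankle"] - kp["right_hip"]  # right leg vector
    a1 = angle_deg(arm, trunk)                       # first limb included angle
    a2 = angle_deg(left_leg, right_leg)              # second limb included angle
    return (first_interval[0] <= a1 <= first_interval[1]
            or second_interval[0] <= a2 <= second_interval[1])

# toy usage with made-up coordinates for one person
kp = {k: np.array(v, dtype=float) for k, v in {
    "shoulder": (0, 0), "wrist": (8, 1), "neck": (0, 0), "hip_mid": (0, 5),
    "left_hip": (-1, 5), "left_ankle": (-4, 10),
    "right_hip": (1, 5), "right_ankle": (4, 10)}.items()}
print(limb_angle_hit(kp))  # True for this pose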
3. The people fighting detection method according to claim 2, wherein the acquiring of the human body key point information comprises:
inputting the video frame to be detected into an OpenPose pose detection model obtained through a distillation-learning training method, to obtain human body key point coordinates corresponding to each person in the video frame to be detected;
and determining the human body key point coordinates as the human body key point information.
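Claim 3 obtains the key points from a pose model trained by distillation. A schematic PyTorch sketch of that training idea, under loudly stated assumptions: both networks are single-layer stand-ins rather than real pose models, the 18-channel output mirrors an OpenPose-style body keypoint heatmap layout, and plain mean-squared error serves as the distillation loss; OpenPose itself is not reimplemented here.

import torch
import torch.nn as nn

teacher = nn.Conv2d(3, 18, 3, padding=1)  # stand-in for a large pretrained pose model
student = nn.Conv2d(3, 18, 3, padding=1)  # stand-in for the compact deployable model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
mse = nn.MSELoss()

frames = torch.randn(4, 3, 64, 64)         # dummy batch of video frames
with torch.no_grad():
    soft_targets = teacher(frames)         # teacher heatmaps act as soft labels
loss = mse(student(frames), soft_targets)  # distillation loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))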
4. The people fighting detection method according to claim 1, wherein performing motion amplitude detection based on optical flow loss on the video frame to be detected to obtain the second detection result comprises:
performing effective area detection on the video frame to be detected to obtain an effective area image, and generating optical flow vectors corresponding to the limb key points according to the effective area image;
calculating a direction entropy corresponding to the optical flow vectors;
obtaining a movement distance corresponding to each limb key point by using the optical flow vectors, and determining the limb key points whose movement distance is greater than a distance threshold as target limb key points;
and if the number of the target limb key points is in a third interval and the direction entropy is in a fourth interval, determining that the second detection result is a hit.
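A minimal sketch of the motion-amplitude test of claim 4. In practice the flow at the limb key points would come from a dense optical flow method (for example cv2.calcOpticalFlowFarneback between consecutive frames); here the flow vectors are supplied directly, and the distance threshold, interval bounds, and histogram bin count are illustrative assumptions.

import numpy as np

def direction_entropy(flow, bins=8):
    # Shannon entropy of the histogram of flow directions.
    angles = np.arctan2(flow[:, 1], flow[:, 0])
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def motion_amplitude_hit(flow, dist_thresh=2.0,
                         third_interval=(3, 50), fourth_interval=(1.0, 3.0)):
    dist = np.linalg.norm(flow, axis=1)         # movement distance per limb key point
    n_target = int((dist > dist_thresh).sum())  # number of target limb key points
    ent = direction_entropy(flow)               # direction entropy
    return (third_interval[0] <= n_target <= third_interval[1]
            and fourth_interval[0] <= ent <= fourth_interval[1])

rng = np.random.default_rng(0)
flow = rng.normal(size=(12, 2)) * 3.0           # dummy flow at 12 limb key points
print(motion_amplitude_hit(flow))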
5. The people fighting detection method according to claim 1, wherein performing human body key point fusion state detection based on the discrete degree on the video frame to be detected to obtain the third detection result comprises:
identifying a plurality of first human key points corresponding to a first person and a plurality of second human key points corresponding to a second person in the video frame to be detected;
respectively calculating the distance between each first human body key point and the corresponding second human body key point, and calculating a distance standard deviation by using the distances;
and if the distance standard deviation is greater than a reliability threshold, determining that the third detection result is a hit.
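A minimal sketch of the fusion-state test of claim 5: pair up the corresponding key points of two people, take the standard deviation of the pairwise distances as the measure of discreteness, and compare it against a threshold. The 18-point skeleton and the threshold value are illustrative assumptions.

import numpy as np

def fusion_hit(kp_a, kp_b, reliability_thresh=1.5):
    # kp_a, kp_b: (N, 2) arrays of corresponding key points of two people.
    d = np.linalg.norm(kp_a - kp_b, axis=1)  # distance per corresponding key point
    return float(d.std()) > reliability_thresh

rng = np.random.default_rng(1)
kp_a = rng.random((18, 2)) * 10              # dummy 18-point skeleton, first person
kp_b = rng.random((18, 2)) * 10              # dummy 18-point skeleton, second person
print(fusion_hit(kp_a, kp_b))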
6. The people fighting detection method according to claim 1, wherein acquiring the video frame to be detected comprises:
acquiring an initial video frame, and extracting the person center coordinates corresponding to each person in the initial video frame;
calculating the person distance corresponding to any two persons by using the person center coordinates;
and if any person distance is smaller than a distance threshold, obtaining the video frame to be detected by using the initial video frame.
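A minimal sketch of the pre-filter of claim 6: a frame is promoted to a frame to be detected only when at least two person centers are close enough to each other. The pixel threshold is an illustrative assumption.

import numpy as np
from itertools import combinations

def frame_needs_detection(centers, dist_thresh=50.0):
    # centers: list of (x, y) person center coordinates in the initial frame.
    return any(np.linalg.norm(np.subtract(a, b)) < dist_thresh
               for a, b in combinations(centers, 2))

print(frame_needs_detection([(100, 100), (130, 110), (400, 300)]))  # True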
7. The people fighting detection method according to any one of claims 1 to 6, wherein generating the fighting detection result by using the first detection result, the second detection result and the third detection result comprises:
if the first detection result, the second detection result and the third detection result are all hits, determining that the current detection result is a hit;
and acquiring a plurality of historical detection results, and if the number of hits among the current detection result and the historical detection results is greater than a judgment threshold, determining that the fighting detection result is a hit.
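A minimal sketch of the temporal vote of claim 7: the current frame counts as a hit only when all three detectors hit, and the final fighting detection result requires enough hits across a sliding window of recent frames. The window size and judgment threshold are illustrative assumptions.

from collections import deque

class FightVoter:
    def __init__(self, window=10, judge_thresh=6):
        self.history = deque(maxlen=window)  # historical detection results
        self.judge_thresh = judge_thresh

    def update(self, hit1, hit2, hit3):
        current = hit1 and hit2 and hit3     # all three results must be hits
        self.history.append(current)
        return sum(self.history) > self.judge_thresh  # fighting detection result

voter = FightVoter()
for frame in range(8):
    print(frame, voter.update(True, True, True))  # becomes True once 7 hits accumulate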
8. A people fighting detection device, characterized by comprising:
the motion detection module is used for acquiring a video frame to be detected and performing motion detection based on a limb angle on the video frame to be detected to obtain a first detection result;
the amplitude detection module is used for performing motion amplitude detection based on optical flow loss on the video frame to be detected to obtain a second detection result;
the fusion detection module is used for detecting the fusion state of the human key points of the video frame to be detected based on the discrete degree to obtain a third detection result;
the result generation module is used for generating a fighting detection result by using the first detection result, the second detection result and the third detection result;
and the determination detection module is used for determining that people fighting is detected if the fighting detection result is a hit.
9. An electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is used for executing the computer program to implement the people fighting detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the people fighting detection method according to any one of claims 1 to 7.
CN202110217543.9A 2021-02-26 2021-02-26 Method and device for detecting people fighting, electronic equipment and storage medium Active CN112818929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110217543.9A CN112818929B (en) 2021-02-26 2021-02-26 Method and device for detecting people fighting, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110217543.9A CN112818929B (en) 2021-02-26 2021-02-26 Method and device for detecting people fighting, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112818929A true CN112818929A (en) 2021-05-18
CN112818929B CN112818929B (en) 2023-04-18

Family

ID=75864075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110217543.9A Active CN112818929B (en) 2021-02-26 2021-02-26 Method and device for detecting people fighting, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112818929B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194967A (en) * 2017-06-09 2017-09-22 南昌大学 Human fall detection method and device based on Kinect depth image
CN108898043A (en) * 2018-02-09 2018-11-27 迈格威科技有限公司 Image processing method, image processing apparatus and storage medium
US20190251675A1 (en) * 2018-02-09 2019-08-15 Megvii Technology Llc Image processing method, image processing device and storage medium
CN110163048A (en) * 2018-07-10 2019-08-23 腾讯科技(深圳)有限公司 Identification model training method, recognition methods and the equipment of hand key point
CN109522793A (en) * 2018-10-10 2019-03-26 华南理工大学 More people's unusual checkings and recognition methods based on machine vision
CN111126411A (en) * 2019-11-07 2020-05-08 浙江大华技术股份有限公司 Abnormal behavior identification method and device
CN111310659A (en) * 2020-02-14 2020-06-19 福州大学 Human body action recognition method based on enhanced graph convolution neural network
CN111814588A (en) * 2020-06-18 2020-10-23 浙江大华技术股份有限公司 Behavior detection method and related equipment and device
CN111931567A (en) * 2020-07-01 2020-11-13 珠海大横琴科技发展有限公司 Human body recognition method and device, electronic equipment and storage medium
CN112380905A (en) * 2020-10-15 2021-02-19 西安工程大学 Abnormal behavior detection method based on histogram and entropy of surveillance video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Qichao: "Research on Behavior Detection in Surveillance Video for Public Security Applications", China Master's Theses Full-text Database, Social Science Series I *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610012A (en) * 2021-08-10 2021-11-05 腾讯音乐娱乐科技(深圳)有限公司 Video detection method and electronic device and computer-readable storage medium
CN119649308A (en) * 2025-02-17 2025-03-18 广东九安智能科技股份有限公司 Method and system for abnormal recognition of surveillance images based on artificial intelligence
CN119649308B (en) * 2025-02-17 2025-04-22 广东九安智能科技股份有限公司 Monitoring image anomaly identification method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN112818929B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
Feng et al. Spatio-temporal fall event detection in complex scenes using attention guided LSTM
US11222239B2 (en) Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
EP3131311B1 (en) Monitoring
JP5459674B2 (en) Moving object tracking system and moving object tracking method
US11429189B2 (en) Monitoring
WO2018025831A1 (en) People flow estimation device, display control device, people flow estimation method, and recording medium
GB2431717A (en) Scene analysis
US20150092981A1 (en) Apparatus and method for providing activity recognition based application service
JP2018181273A (en) Image processing apparatus, method thereof, and program
Mansoor et al. A machine learning approach for non-invasive fall detection using Kinect
CN112818929B (en) Method and device for detecting people fighting, electronic equipment and storage medium
KR102511287B1 (en) Image-based pose estimation and action detection method and appratus
Ezatzadeh et al. A human fall detection framework based on multi-camera fusion
CN115471863A (en) Three-dimensional posture acquisition method, model training method and related equipment
CN117994851A (en) Method, device and equipment for detecting fall of old people based on multitask learning
Srividya et al. Deep learning techniques for physical abuse detection
EP3037916B1 (en) Monitoring
CN113743237A (en) Follow-up action accuracy determination method and device, electronic device and storage medium
KR102363435B1 (en) Apparatus and method for providing feedback on golf swing motion
EP4485369A1 (en) Information processing program, information processing method, and information processing apparatus
CN113657155A (en) Behavior detection method and device, computer equipment and storage medium
CN111144260A (en) A detection method, device and system for jumping over a gate
CN117593792A (en) Abnormal gesture detection method and device based on video frame
Molina-Cabello et al. Neural controller for PTZ cameras based on nonpanoramic foreground detection
CN113408433B (en) Intelligent monitoring gesture recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant