CN118378210A - Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data - Google Patents
Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data
- Publication number
- CN118378210A (Application No. CN202410463865.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- action
- behavior
- information
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/18—Status alarms
- G08B21/24—Reminder alarms, e.g. anti-loss alarms
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Quality & Reliability (AREA)
- Business, Economics & Management (AREA)
- Hospice & Palliative Care (AREA)
- Child & Adolescent Psychology (AREA)
- Emergency Management (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Social Psychology (AREA)
- Alarm Systems (AREA)
Abstract
The invention provides an abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data, comprising a data acquisition module, a data storage module, a data processing and analysis module, a data transmission module and an early warning platform. The data acquisition module comprises a video acquisition module and an audio acquisition module, which respectively acquire video data in a monitoring area and audio data in the scene as raw data; the data storage module stores the collected raw data; the data processing and analysis module preprocesses and cleans the raw data, extracts features, and analyzes and discriminates it; the data transmission module completes data transmission among the different modules; and the early warning platform issues alarms according to the discrimination results of the data processing and analysis module. The system and method fuse and analyze action and sound data to judge whether the current situation falls within the scope of campus security concerns.
Description
Technical Field
The invention relates to the technical field of campus security prevention and control, in particular to an abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data.
Background
With the rapid development of big data, cloud computing and large-model technology, the corresponding technologies used in security monitoring must be updated accordingly. At present, campus security relies mainly on comprehensive coverage by surveillance cameras combined with basic intelligent video analysis, which can raise immediate alarms for fighting, loitering by suspicious persons and similar actions.
However, existing intelligent video analysis cannot accurately and effectively detect bullying behavior, and the detection accuracy of purely visual cameras is limited. Campus bullying is often extremely covert, and actions such as shoving and slapping are easily confused with everyday horseplay or physical activity among students, leading to misjudgments or missed reports.
Secondly, unlike fighting and other overt violence, verbal abuse and other forms of psychological and verbal violence occur frequently yet are easily overlooked.
Prior art in this field includes patents that distinguish behaviors by sound and dynamically capture and recognize human body motions, such as CN202311410907.0, an intelligent campus student behavior analysis system based on artificial intelligence; CN202310901557.1, a voice recognition alarm system for toilets and its recognition method; and CN202211110111.9, a multi-sensor data fusion bullying behavior detection algorithm.
These patent documents address campus bullying from different angles, but each covers only a relatively narrow range of bullying means. Fusing heterogeneous data to recognize the majority of bullying actions, so as to warn of and intervene in campus bullying as early as possible, remains an urgent technical problem.
In order to solve the above problems, an ideal technical solution has long been sought.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data, which fuse and analyze action and sound data and judge whether the current situation falls within the scope of campus security concerns.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: the abnormal behavior detection system based on heterogeneous data fusion analysis of big data comprises a data acquisition module, a data storage module, a data processing and analyzing module, a data transmission module and an early warning platform;
The data acquisition module comprises a video acquisition module and an audio acquisition module, and is used for acquiring video data in a monitoring area and audio data in a scene as original data respectively;
the data storage module is used for storing the collected original data;
the data processing and analysis module is used for preprocessing, cleaning and feature extraction of the raw data and for analyzing and discriminating it, comprising the following:
preprocessing the collected audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction; judging whether the independent sound information of each sound source involves abnormal speech, and, when abnormal speech is identified, storing the audio and the time of occurrence to obtain a speech marking result;
preprocessing the acquired image information and video stream, carrying out image segmentation and contour extraction on each detected person target, dividing person correlation areas on the images in the image frames, establishing a person action correlation vector table according to the action recognition results, and inputting the table into a behavior discrimination model, whose output serves as the action marking result;
carrying out fusion judgment based on the speech marking result and the action marking result:
when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
fusing the person image sequences and, together with the related person audio information, inputting them into an abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the data transmission module is used for completing data transmission between different modules;
the early warning platform is used for sending out an alarm according to the judging result of the data processing and analyzing module.
The preprocessing of the raw data proceeds as follows:
Step 11: collecting the surveillance video data of each monitoring point in the campus through video acquisition equipment, extracting the video data of each monitoring point, and converting all extracted video data into corresponding video image frames;
Step 12: acquiring microphone array data from each monitoring point in the campus using an array of multiple microphones, obtaining raw audio data containing multiple sound signals;
Step 13: processing the collected raw audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction;
Step 14: performing enhancement processing on the independent sound signal of each sound source to obtain the processed independent sound information for each direction.
Based on the above, the acquired video image frame data is further processed as follows:
Step 21: carrying out target detection on the persons in the video image frame set to obtain person target position information, and framing the contours of the corresponding person objects to obtain person contour size information;
Step 22: for each detected person object, processing the video image frames with an image segmentation algorithm, segmenting each person object from the background to form a foreground person image with a transparent background;
Step 23: transmitting the foreground person images to an action recognition module for analysis and action recognition;
Step 24: dividing person correlation areas on the images in the video image frames according to the person contour size information, and establishing a person action correlation vector table according to the action recognition results;
Step 25: feeding the person action correlation vector table into a behavior discrimination model, whose output serves as the action marking result.
The action recognition model is a pretrained convolutional neural network whose output layer uses a softmax activation function; its input is the foreground image data of a single person and its output is a probability distribution over preset action classes. The preset action classes include fist swinging, kicking, pushing, throwing, tripping, curling up, head covering, walking, running, standing and squatting, of which fist swinging, kicking, pushing, throwing, tripping, curling up and head covering are set as abnormal actions;
when the action recognition result assigns the maximum probability to an abnormal action, a person correlation area is constructed, centered on the target person whose initial recognition result has the maximum abnormal-action probability and with a radius of 2h, where h is the height of the person's contour; the person action correlation vector table is a vector table containing the action information of the abnormal-action target person and of the persons within the correlation area;
the action score distribution of each person is first expressed as:
A_i = [P_{1i}, P_{2i}, P_{3i}, …, P_{Ki}]
where K is the number of preset action classes, A_i denotes the array of all preset action scores of the i-th person in a given area, and P_{ki} is the score of the k-th preset action class for the i-th person;
the person action correlation vector table is then built from the correlation area: taking the target person i as the center, with M_1 = A_i, the action score distribution of the whole correlation area is assembled as:
V_j = [M_1, M_2, M_3, …, M_N]
where V_j, a K×N array, is the action correlation vector table of the persons in the j-th area, N is the number of persons in that area, M_1 is the action score vector of the central person, and the remaining columns are the action score vectors of the other persons in the correlation area;
the person action correlation vector table is input into the behavior discrimination model for comprehensive analysis of the action correlations, yielding the behavior discrimination result for the corresponding person object and correlation area.
Based on the above, the acquired independent sound information is further processed as follows:
Step 31: performing abnormal speech recognition on each person's independent sound information through an acoustic recognition model;
Step 32: when the recognition result indicates verbal violence, storing the audio at the moment the verbal violence occurs and recording its time of occurrence, to obtain a speech marking result;
the abnormal speech recognition comprises speech recognition and emotion recognition:
speech recognition technology is used to recognize the semantics of the voice information and convert it into text, which is segmented, extracted and stored in a speech recognition database; keywords and key sentences are then matched and compared against preset speech keywords for judgment;
emotion recognition technology is used to extract acoustic features of pitch, volume and speech rate from the acquired speech data; the matching probabilities of the emotion classes are obtained through a pretrained emotion recognition model, and the class with the maximum matching probability is taken as the speech marking result.
Based on the above, the fusion judgment based on the speech marking result and the action marking result proceeds as follows:
Step 41: accurately acquiring the time of occurrence of the corresponding bullying behavior based on the action marking result or the audio recognition result;
Step 42: when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
Step 43: when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
Step 44: fusing the person image sequences and, together with the related person audio information, inputting them into the abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the abnormal behavior fusion analysis model is a bullying behavior analysis model specially designed for image sequences containing temporal features together with audio data; by fusing the features of these two kinds of data, deeper behavior analysis is achieved.
Based on the above, the early warning platform works as follows:
Step 51: when campus bullying behavior is judged to exist, extracting the person images, video and corresponding audio of the campus bullying behavior, synchronizing and merging the audio data and video data by a time-alignment method, and uploading them to the early warning platform through the data transmission module;
Step 52: the early warning platform matches the target images of the person objects involved in the campus bullying behavior against the current database of campus student and staff information to obtain the information of the persons involved;
Step 53: based on the matching result, the early warning platform issues an early warning reminder and transmits the corresponding personnel information and the merged audio and video of the abnormal behavior to the early warning system for alerting.
Compared with the prior art, the invention has outstanding substantive features and represents remarkable progress. In particular, through the preprocessing, feature extraction and fusion judgment of the audio and video data of the detection area, the invention combines the sound cues and the action cues of bullying behavior, so that bullying is judged more accurately and a wider range of bullying behaviors is covered; the system is not limited to actions and occasions within a specific range, and the level of security protection is higher.
Further, in the preprocessing and analysis of sound, the microphone array data is used to acquire the direction information of each target person's sound source, the independent sound information of each target person is then separated and input into the acoustic recognition model for recognition, and an emotion recognition function is added, which improves the accuracy of acoustic recognition.
Furthermore, the preprocessing and analysis of action behaviors incorporates a neural network alongside the behavior discrimination model, which improves recognition accuracy.
Drawings
Fig. 1 is a schematic structural diagram of an abnormal behavior detection system based on heterogeneous data fusion analysis of big data in an embodiment of the present invention.
Fig. 2 is a schematic flow chart of an abnormal behavior detection method based on heterogeneous data fusion analysis of big data in an embodiment of the invention.
Fig. 3 is a schematic diagram of a branch flow of audio and video fusion analysis in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail through the following specific embodiments.
As shown in FIG. 1, the abnormal behavior detection system based on heterogeneous data fusion analysis of big data comprises a data acquisition module, a data storage module, a data processing and analysis module, a data transmission module and an early warning platform;
The data acquisition module comprises a video acquisition module and an audio acquisition module. The video acquisition module captures image information and video streams in the monitoring area. The audio acquisition module is a microphone array, which samples the spatial characteristics of the sound field and acquires the audio information of persons in scenes such as campuses and classrooms; the microphone array can effectively collect and distinguish the sounds of persons in different directions within the monitoring area.
The data storage module is used for storing the original data obtained from the data acquisition module and the processed data;
The data processing and analyzing module is responsible for processing, cleaning, feature extraction and analysis of the original data. The data of each modality is preprocessed and then fused for comprehensive analysis.
The data transmission module is responsible for effectively transmitting the original multi-mode data acquired from the data acquisition module to the data storage module, the data processing and analyzing module and the like, and transmitting the processed data to the early warning platform.
The early warning platform is responsible for issuing alarms to notify the relevant personnel, providing timely alerts so that necessary actions can be taken.
The method for detecting the abnormal behavior by adopting the system comprises the following steps:
step 1: the acquisition equipment acquires and preprocesses the audio and video data in the monitoring area;
step 2: performing action marking on each person in the video image frame of the preprocessing result;
step 3: voice marking is carried out on the audio data of the preprocessing result;
Step 4: acquiring corresponding audio segments and video segments according to the action marking result and the voice marking result, and inputting the corresponding audio segments and video segments into an abnormal behavior analysis model for fusion recognition;
step 5: abnormal behavior judgment is carried out according to the identification result, and when the judgment result is the Baling behavior, the abnormal behavior judgment is fed back to the early warning system for warning
In this embodiment, the data information of each monitoring point includes: video data information collected by the video collecting device and audio data information collected by the microphone array.
Specifically:
Acquiring and preprocessing the audio and video data in the monitoring area comprises the following steps:
Step 11: the method comprises the steps of collecting monitoring video data information of each monitoring point in a campus area through video collecting equipment, extracting video data of each monitoring point, and converting all the video data extracted by each monitoring point into corresponding video image frames;
Step 12: acquiring microphone array data information of each monitoring point in the campus area by using an array formed by a plurality of microphones, and acquiring original audio data of a plurality of sound signals;
Step 13: the method comprises the steps of processing collected original audio data, determining the direction information of each target person sound source, and separating a plurality of original sound signals mixed together to obtain independent sound information of each target person in each direction;
Step 14: performing enhancement processing on the independent sound signals of each sound source to obtain the processed independent sound information of each direction sound source;
Wherein, the pretreatment result comprises: video image frames and independent sound information.
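The patent does not fix how the video is decoded into image frames; the sketch below shows one conventional way to implement Step 11 with OpenCV, where the sampling stride and the function name are illustrative assumptions rather than details from the patent.

```python
# Minimal sketch of step 11: turning monitoring-point video into image frames.
import cv2

def extract_frames(video_path: str, stride: int = 5) -> list:
    """Decode a surveillance clip and keep every `stride`-th frame."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of stream
            break
        if index % stride == 0:
            frames.append(frame)        # BGR ndarray = one video image frame
        index += 1
    cap.release()
    return frames
```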
In this embodiment, the monitoring points include: campus, playground, classroom, etc.
In this embodiment, the data information of each monitoring point includes: video data information, audio data information, etc.
In this embodiment, the independent sound information is a set of audio data of the different persons in each direction, matched with the video images and derived from the audio data captured by the microphone array. The sound source direction of each target person is determined by a time-difference (TDOA) method, and the independent sound information of each target person in each direction is obtained by a beamforming algorithm.
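The time-difference method and beamforming algorithm are named but not specified; the following sketch shows one standard reading of them, cross-correlation TDOA estimation followed by delay-and-sum beamforming. All function names and parameters are assumptions, not the patent's implementation.

```python
import numpy as np

def estimate_tdoa(x: np.ndarray, y: np.ndarray, sr: int) -> float:
    """Estimate the time difference of arrival between two microphone
    channels from the peak of their cross-correlation (in seconds)."""
    corr = np.correlate(x, y, mode="full")
    lag = int(np.argmax(corr)) - (len(y) - 1)
    return lag / sr

def delay_and_sum(channels: np.ndarray, delays_s: list, sr: int) -> np.ndarray:
    """Align each channel by its estimated delay and average; this
    reinforces the source in the steered direction and attenuates others."""
    out = np.zeros_like(channels[0], dtype=float)
    for ch, d in zip(channels, delays_s):
        shift = int(round(d * sr))
        out += np.roll(ch, -shift)      # undo the per-channel arrival delay
    return out / len(channels)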
Using the microphone array to acquire the audio data of multiple target persons in the detection area allows the audio of persons of interest to be selectively recognized, improving detection precision; combining the two kinds of information, audio data and video data, in the judgment further improves the accuracy of recognizing campus bullying behavior.
Marking the actions of each person in the video image frames of the preprocessing result comprises the following steps:
Step 21: carrying out target detection on the persons in the video image frame set to obtain person target position information, and framing the contours of the corresponding person objects to obtain person contour size information;
Step 22: for each detected person object, processing the video image frames with an image segmentation algorithm, segmenting each person object from the background to form a foreground person image with a transparent background;
Step 23: transmitting the foreground person images to the action recognition module for analysis and action recognition;
Step 24: dividing person correlation areas on the images in the video image frames according to the person contour size information, and establishing a person action correlation vector table according to the action recognition results;
Step 25: feeding the person action correlation vector table into the behavior discrimination model, whose output serves as the action marking result;
the foreground person images are the independent foreground images of all person objects in the detection area.
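Steps 21-22 leave the detector and segmentation algorithm open; a pretrained Mask R-CNN from torchvision is one plausible stand-in, sketched below under that assumption (the score threshold is illustrative).

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# A pretrained Mask R-CNN stands in for the unspecified detector/segmenter.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def foreground_persons(frame_rgb, score_thr: float = 0.8):
    """Return (boxes, masks) for detected persons (COCO label 1);
    the masks cut each person object out of the background."""
    with torch.no_grad():
        out = model([to_tensor(frame_rgb)])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thr)
    return out["boxes"][keep], out["masks"][keep]
```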
In an embodiment, the action recognition module is a pretrained convolutional neural network whose output layer uses a softmax activation function; its input is the foreground image data of a single person and its output is a probability distribution over preset action classes. The preset action classes include fist swinging, kicking, pushing, throwing, tripping, curling up, head covering, walking, running, standing and squatting, of which fist swinging, kicking, pushing, throwing, tripping, curling up and head covering are set as abnormal actions;
when the action recognition result assigns the maximum probability to an abnormal action, a person correlation area is constructed, centered on the target person whose initial recognition result has the maximum abnormal-action probability and with a radius of 2h, where h is the height of the person's contour; the person action correlation vector table is a vector table containing the action information of the abnormal-action target person and of the persons within the correlation area.
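A minimal sketch of the action recognition module under the stated design, a CNN with a softmax output layer over the eleven preset action classes; the layer sizes and input resolution are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 11  # the eleven preset action categories listed above

class ActionRecognizer(nn.Module):
    """Small CNN over a single foreground person image; the softmax output
    is the probability distribution over preset action classes."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, H, W) foreground person crops
        return torch.softmax(self.head(self.features(x)), dim=1)
```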
The action score distribution of each person is first expressed as:
A_i = [P_{1i}, P_{2i}, P_{3i}, …, P_{Ki}]
where K is the number of preset action classes, A_i denotes the array of all preset action scores of the i-th person in a given area, and P_{ki} is the score of the k-th preset action class for the i-th person.
The person action correlation vector table is then built from the correlation area: taking the target person i as the center, with M_1 = A_i, the action score distribution of the whole correlation area is assembled as:
V_j = [M_1, M_2, M_3, …, M_N]
where V_j, a K×N array, is the action correlation vector table of the persons in the j-th area, N is the number of persons in that area, M_1 is the action score vector of the central person, and the remaining columns are the action score vectors of the other persons in the correlation area.
The person action correlation vector table is input into the behavior discrimination model for comprehensive analysis of the action correlations, yielding the behavior discrimination result for the corresponding person object and correlation area.
The behavior discrimination model is a pretrained SVM model: the person action correlation vector table data of multiple persons is input into the SVM, which outputs normal or abnormal behavior; if abnormal behavior occurs the marking result is abnormal, otherwise the marking result is normal.
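The sketch below assembles the K×N vector table V_j and feeds its flattened form to an SVM, matching the described pipeline. Since an SVM needs fixed-length input, it assumes the table is padded or truncated to a fixed N in practice, a detail the patent does not specify; training data (tables labeled normal/abnormal) is assumed and not shown.

```python
import numpy as np
from sklearn.svm import SVC

def build_vector_table(center_scores, neighbor_scores) -> np.ndarray:
    """Assemble V_j = [M_1, M_2, ..., M_N] with the abnormal-action target
    person's score vector A_i as the first column M_1."""
    columns = [center_scores] + list(neighbor_scores)   # each of length K
    return np.stack(columns, axis=1)                    # K x N array

# The behavior discrimination model: an SVM over the flattened table.
clf = SVC(kernel="rbf", probability=True)               # fit on labeled tables first

def discriminate(v_j: np.ndarray) -> int:
    """1 = abnormal (bullying) behavior in the area, 0 = normal."""
    return int(clf.predict(v_j.reshape(1, -1))[0])
```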
Take a person waving an arm or swinging a fist as an example: when a fist-swinging action is detected but the other persons in the person correlation area act normally, the behavior can be judged to be normal; when a fist-swinging action is detected and the other persons in the correlation area show head-covering or cowering actions, it can be judged that bullying behavior is occurring in the area.
The effective gain of this embodiment is that behavior analysis built on person action correlation information replaces single-person action recognition and therefore has higher accuracy; and the moments at which bullying may occur are marked in the video according to the action recognition results of the correlation analysis, i.e., the video is searched rather than classified frame by frame, which suits the detection of small-probability events such as campus bullying and effectively reduces the computation required for subsequent continuous-frame processing.
Voice marking of the audio data of the preprocessing result comprises the following steps:
Step 31: performing abnormal speech recognition on each person's independent sound data through an acoustic recognition model;
Step 32: when the recognition result indicates verbal violence, storing the audio at the moment the verbal violence occurs and recording its time of occurrence, to obtain a speech marking result;
the abnormal speech recognition comprises speech recognition and emotion recognition:
speech recognition technology is used to recognize the semantics of the voice information and convert it into text, which is segmented, extracted and stored in a speech recognition database; keywords and key sentences are then matched and compared against preset speech keywords for judgment;
emotion recognition technology is used to extract acoustic features of pitch, volume and speech rate from the acquired speech data; the matching probabilities of the emotion classes are obtained through a pretrained emotion recognition model, and the class with the maximum matching probability is taken as the speech marking result.
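The pitch, volume and speech-rate features could be extracted as sketched below with librosa; the specific feature definitions and parameters are assumptions, since the patent names only the feature types.

```python
import librosa
import numpy as np

def acoustic_features(wav_path: str) -> np.ndarray:
    """Extract rough pitch / volume / speech-rate cues from a speech clip."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400)        # pitch contour (Hz)
    rms = librosa.feature.rms(y=y)[0]                    # loudness proxy
    onsets = librosa.onset.onset_detect(y=y, sr=sr)      # rough syllable events
    rate = len(onsets) / (len(y) / sr)                   # events per second
    return np.array([np.nanmean(f0), rms.mean(), rate])
```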
The emotion recognition covers seven emotions: relaxed, sad, angry, fearful, surprised, excited and neutral. Relaxed and excited are classed as normal emotions; sad, angry and fearful as abnormal emotions; and neutral and surprised as uncertain. Emotion recognition thus becomes a three-class problem that outputs probabilities for abnormal, normal and uncertain, with the maximum-probability class taken as the emotion recognition result.
When either the speech recognition result or the emotion recognition result is abnormal, the speech marking result is abnormal; otherwise the speech marking result is normal;
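One way to read this OR-combination of the two recognizers is sketched below; the keyword list and class names are illustrative assumptions, not the patent's speech-keyword database.

```python
ABNORMAL, NORMAL = "abnormal", "normal"

# Illustrative keyword list; the patent's preset keyword database is not public.
VIOLENT_KEYWORDS = {"give me your money", "don't tell anyone", "or else"}

def speech_mark(transcript: str, emotion_probs: dict) -> str:
    """Abnormal if either the recognized text hits a preset keyword or the
    emotion classifier's top class (abnormal/normal/uncertain) is abnormal."""
    keyword_hit = any(k in transcript.lower() for k in VIOLENT_KEYWORDS)
    top_emotion = max(emotion_probs, key=emotion_probs.get)
    return ABNORMAL if (keyword_hit or top_emotion == ABNORMAL) else NORMAL
```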
Acquiring the corresponding audio and video segments according to the action marking result and the speech marking result and inputting them into the abnormal behavior analysis model for fusion recognition comprises the following steps:
Step 41: accurately acquiring the time of occurrence of the corresponding bullying behavior based on the action marking result or the audio recognition result;
Step 42: when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
Step 43: when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
Step 44: fusing the person image sequences and, together with the related person audio information, inputting them into the abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the abnormal behavior fusion analysis model is a bullying behavior analysis model specially designed for image sequences containing temporal features together with audio data; by fusing the features of these two kinds of data, deeper behavior analysis is achieved. The bullying behaviors covered include pushing, punching, kicking, jostling, elbow strikes, slapping and the like.
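The internal structure of the abnormal behavior fusion analysis model is not disclosed; the sketch below shows one plausible bimodal design, a recurrent branch over per-frame image features and a dense branch over an audio feature vector, fused by concatenation. All dimensions and the class count are assumed.

```python
import torch
import torch.nn as nn

class BimodalFusionNet(nn.Module):
    """Sketch of a bimodal fusion analyzer: GRU over the image sequence's
    per-frame features, MLP over audio features, concatenated for a
    normal-vs-bullying decision."""
    def __init__(self, img_feat=256, audio_feat=64, hidden=128):
        super().__init__()
        self.temporal = nn.GRU(img_feat, hidden, batch_first=True)
        self.audio = nn.Sequential(nn.Linear(audio_feat, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, 2)

    def forward(self, img_seq: torch.Tensor, audio_vec: torch.Tensor):
        # img_seq: (batch, time, img_feat); audio_vec: (batch, audio_feat)
        _, h = self.temporal(img_seq)                 # h: (1, batch, hidden)
        fused = torch.cat([h[-1], self.audio(audio_vec)], dim=1)
        return self.classifier(fused)                 # logits: {normal, bullying}
```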
This scheme has notable advantages: voice and action information are combined through a screening process so that only selected audio and video undergo deep analysis, which reduces the computational load; and the fusion analysis of action and sound information achieves high-precision judgment of campus bullying behavior, improving judgment efficiency and providing a more accurate and efficient solution for abnormal behavior detection.
When abnormal campus bullying behavior is judged to exist, the information of the persons involved is extracted and fed back to the user or the system for alerting, comprising the following steps:
Step 51: when campus bullying behavior is judged to exist, extracting the person images, video and corresponding audio of the campus bullying behavior, synchronizing and merging the audio data and video data by a time-alignment method, and uploading them to the early warning platform through the data transmission module;
Step 52: the early warning platform matches the target images of the person objects involved in the campus bullying behavior against the current database of campus student and staff information to obtain the information of the persons involved;
Step 53: based on the matching result, the early warning platform issues an early warning reminder and transmits the corresponding personnel information and the merged audio and video of the abnormal behavior to the early warning system for alerting;
the information of the persons involved comprises identity, student or staff number, name and the like.
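Step 51's time alignment could be as simple as pairing segments by start timestamp, as in this sketch; the clip record layout (a `start` field in seconds) and the tolerance value are assumptions, not specified by the patent.

```python
def align_clips(audio_clips, video_clips, tolerance_s: float = 0.5):
    """Pair audio and video segments whose start timestamps fall within
    `tolerance_s` of each other, then hand the pairs off for merging."""
    pairs = []
    for v in video_clips:
        best = min(audio_clips, key=lambda a: abs(a["start"] - v["start"]),
                   default=None)
        if best is not None and abs(best["start"] - v["start"]) <= tolerance_s:
            pairs.append((best, v))   # merged evidence uploaded to the platform
    return pairs
```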
Finally, it should be noted that the above embodiments are intended only to illustrate the technical scheme of the invention and are not limiting. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications may be made to the specific embodiments, or equivalents substituted for some of their technical features, without departing from the spirit of the invention, and such changes are intended to fall within the scope of the claims.
Claims (8)
1. An abnormal behavior detection system based on heterogeneous data fusion analysis of big data is characterized in that: the system comprises a data acquisition module, a data storage module, a data processing and analyzing module, a data transmission module and an early warning platform;
the data acquisition module comprises a video acquisition module and an audio acquisition module, which are respectively used for acquiring video data in a monitoring area and audio data in a scene as original data, and the audio acquisition module adopts an array formed by a plurality of microphones;
the data storage module is used for storing the collected original data;
the data processing and analysis module is used for preprocessing, cleaning and feature extraction of the raw data and for analyzing and discriminating it, comprising the following:
preprocessing the collected audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction; judging whether the independent sound information of each sound source involves abnormal speech, and, when abnormal speech is identified, storing the audio and the time of occurrence to obtain a speech marking result;
preprocessing the acquired image information and video stream, carrying out image segmentation and contour extraction on each detected person target, dividing person correlation areas on the images in the image frames, establishing a person action correlation vector table according to the action recognition results, and inputting the table into a behavior discrimination model, whose output serves as the action marking result;
carrying out fusion judgment based on the speech marking result and the action marking result:
when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
fusing the person image sequences and, together with the related person audio information, inputting them into an abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the data transmission module is used for completing data transmission between different modules;
the early warning platform is used for sending out an alarm according to the judging result of the data processing and analyzing module.
2. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 1, characterized in that the preprocessing of the raw data proceeds as follows:
Step 11: collecting the surveillance video data of each monitoring point in the campus through video acquisition equipment, extracting the video data of each monitoring point, and converting all extracted video data into corresponding video image frames;
Step 12: acquiring microphone array data from each monitoring point in the campus using an array of multiple microphones, obtaining raw audio data containing multiple sound signals;
Step 13: processing the collected raw audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction;
Step 14: performing enhancement processing on the independent sound signal of each sound source to obtain the processed independent sound information for each direction.
3. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 2, characterized in that the acquired video image frame data is further processed as follows:
Step 21: carrying out target detection on the persons in the video image frame set to obtain person target position information, and framing the contours of the corresponding person objects to obtain person contour size information;
Step 22: for each detected person object, processing the video image frames with an image segmentation algorithm, segmenting each person object from the background to form a foreground person image with a transparent background;
Step 23: transmitting the foreground person images to an action recognition module for analysis and action recognition;
Step 24: dividing person correlation areas on the images in the video image frames according to the person contour size information, and establishing a person action correlation vector table according to the action recognition results;
Step 25: feeding the person action correlation vector table into a behavior discrimination model, whose output serves as the action marking result.
4. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 3, characterized in that the action recognition module is a pretrained convolutional neural network whose output layer uses a softmax activation function; its input is the foreground image data of a single person and its output is a probability distribution over the preset action classes to which the person's action belongs;
when the action recognition result assigns the maximum probability to an abnormal action, a person correlation area is constructed, centered on the target person whose initial recognition result has the maximum abnormal-action probability and with a radius of 2h, where h is the height of the person's contour; the person action correlation vector table is a vector table containing the action information of the abnormal-action target person and of the persons within the correlation area;
the action score distribution of each person is first expressed as:
A_i = [P_{1i}, P_{2i}, P_{3i}, …, P_{Ki}]
where K is the number of preset action classes, A_i denotes the array of all preset action scores of the i-th person in a given area, and P_{ki} is the score of the k-th preset action class for the i-th person;
the person action correlation vector table is then built from the correlation area: taking the target person i as the center, with M_1 = A_i, the action score distribution of the whole correlation area is assembled as:
V_j = [M_1, M_2, M_3, …, M_N]
where V_j, a K×N array, is the action correlation vector table of the persons in the j-th area, N is the number of persons in that area, M_1 is the action score vector of the central person, and the remaining columns are the action score vectors of the other persons in the correlation area;
the person action correlation vector table is input into the behavior discrimination model for comprehensive analysis of the action correlations, yielding the behavior discrimination result for the corresponding person object and correlation area.
5. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 2, characterized in that the acquired independent sound information is further processed as follows:
Step 31: performing abnormal speech recognition on each person's independent sound information through an acoustic recognition model;
Step 32: when the recognition result indicates verbal violence, storing the audio at the moment the verbal violence occurs and recording its time of occurrence, to obtain a speech marking result;
the abnormal speech recognition comprises speech recognition and emotion recognition:
speech recognition technology is used to recognize the semantics of the voice information and convert it into text, which is segmented, extracted and stored in a speech recognition database; keywords and key sentences are then matched and compared against preset speech keywords for judgment;
emotion recognition technology is used to extract acoustic features of pitch, volume and speech rate from the acquired speech data; the matching probabilities of the emotion classes are obtained through a pretrained emotion recognition model, and the class with the maximum matching probability is taken as the speech marking result.
6. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 4 or 5, characterized in that the fusion judgment based on the speech marking result and the action marking result proceeds as follows:
Step 41: accurately acquiring the time of occurrence of the corresponding bullying behavior based on the action marking result or the audio recognition result;
Step 42: when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
Step 43: when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
Step 44: fusing the person image sequences and, together with the related person audio information, inputting them into the abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the abnormal behavior fusion analysis model is a bullying behavior analysis model specially designed for image sequences containing temporal features together with audio data; by fusing the features of these two kinds of data, deeper behavior analysis is achieved.
7. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 6, characterized in that the early warning platform works as follows:
Step 51: when campus bullying behavior is judged to exist, extracting the person images, video and corresponding audio of the campus bullying behavior, synchronizing and merging the audio data and video data by a time-alignment method, and uploading them to the early warning platform through the data transmission module;
Step 52: the early warning platform matches the target images of the person objects involved in the campus bullying behavior against the current database of campus student and staff information to obtain the information of the persons involved;
Step 53: based on the matching result, the early warning platform issues an early warning reminder and transmits the corresponding personnel information and the merged audio and video of the abnormal behavior to the early warning system for alerting.
8. A method for detecting abnormal behavior based on heterogeneous data fusion analysis of big data, characterized by using the abnormal behavior detection system based on heterogeneous data fusion analysis of big data of any one of claims 1-7 and detecting as follows:
step 1: the acquisition equipment acquires and preprocesses the audio and video data in the monitoring area;
step 2: performing action marking on each person in the video image frame of the preprocessing result;
step 3: voice marking is carried out on the audio data of the preprocessing result;
Step 4: acquiring corresponding audio segments and video segments according to the action marking result and the voice marking result, and inputting the corresponding audio segments and video segments into an abnormal behavior analysis model for fusion recognition;
step 5: judging abnormal behavior according to the recognition result, and feeding back to the early warning platform for alerting when the judgment result is bullying behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410463865.5A CN118378210A (en) | 2024-04-17 | 2024-04-17 | Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410463865.5A CN118378210A (en) | 2024-04-17 | 2024-04-17 | Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118378210A true CN118378210A (en) | 2024-07-23 |
Family
ID=91911961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410463865.5A Pending CN118378210A (en) | 2024-04-17 | 2024-04-17 | Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118378210A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118689999A (en) * | 2024-08-23 | 2024-09-24 | 南方科技大学 | Campus security expert system based on large model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |