CN118378210A - Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data - Google Patents
Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data
- Publication number
- CN118378210A (Application No. CN202410463865.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- action
- behavior
- information
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/18—Status alarms
- G08B21/24—Reminder alarms, e.g. anti-loss alarms
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Psychiatry (AREA)
- Quality & Reliability (AREA)
- Business, Economics & Management (AREA)
- Hospice & Palliative Care (AREA)
- Child & Adolescent Psychology (AREA)
- Emergency Management (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Social Psychology (AREA)
- Alarm Systems (AREA)
Abstract
The invention provides an abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data, comprising a data acquisition module, a data storage module, a data processing and analysis module, a data transmission module and an early warning platform. The data acquisition module comprises a video acquisition module and an audio acquisition module, which respectively acquire video data in a monitoring area and audio data in the scene as raw data; the data storage module stores the collected raw data; the data processing and analysis module preprocesses and cleans the raw data, extracts features, and analyzes and discriminates it; the data transmission module completes data transmission among the different modules; and the early warning platform issues alarms according to the discrimination results of the data processing and analysis module. The system and method fuse and analyze action and sound data to judge whether the current situation falls within the scope of campus security concerns.
Description
Technical Field
The invention relates to the technical field of campus security prevention and control, in particular to an abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data.
Background
With the rapid development of big data, cloud computing and large-model technology, the corresponding technologies used in security monitoring must be updated accordingly. At present, campus security relies mainly on comprehensive coverage by surveillance cameras combined with basic intelligent video analysis, which can raise immediate alarms for fighting, loitering by suspicious persons and similar actions.
However, existing intelligent video analysis cannot accurately and effectively detect bullying behavior, and the detection accuracy of purely visual cameras is limited. Campus bullying is often extremely covert, and actions such as shoving and slapping are easily confused with everyday horseplay or physical activity among students, leading to misjudgments or missed reports.
Secondly, unlike fighting and other overt violence, verbal abuse and other forms of psychological and verbal violence occur frequently yet are easily overlooked.
Prior art in this field includes patents that distinguish behaviors by sound and dynamically capture and recognize human body motions, such as CN202311410907.0, an intelligent campus student behavior analysis system based on artificial intelligence; CN202310901557.1, a voice recognition alarm system for toilets and its recognition method; and CN202211110111.9, a multi-sensor data fusion bullying behavior detection algorithm.
These patent documents address campus bullying from different angles, but each covers only a relatively narrow range of bullying means. Fusing heterogeneous data to recognize the majority of bullying actions, so as to warn of and intervene in campus bullying as early as possible, remains an urgent technical problem.
In order to solve the above problems, an ideal technical solution has long been sought.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data, which fuse and analyze action and sound data and judge whether the current situation falls within the scope of campus security concerns.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: the abnormal behavior detection system based on heterogeneous data fusion analysis of big data comprises a data acquisition module, a data storage module, a data processing and analyzing module, a data transmission module and an early warning platform;
The data acquisition module comprises a video acquisition module and an audio acquisition module, and is used for acquiring video data in a monitoring area and audio data in a scene as original data respectively;
the data storage module is used for storing the collected original data;
the data processing and analysis module is used for preprocessing, cleaning and feature extraction of the raw data and for analyzing and discriminating it, comprising the following:
preprocessing the collected audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction; judging whether the independent sound information of each sound source involves abnormal speech, and, when abnormal speech is identified, storing the audio and the time of occurrence to obtain a speech marking result;
preprocessing the acquired image information and video stream, carrying out image segmentation and contour extraction on each detected person target, dividing person correlation areas on the images in the image frames, establishing a person action correlation vector table according to the action recognition results, and inputting the table into a behavior discrimination model, whose output serves as the action marking result;
carrying out fusion judgment based on the speech marking result and the action marking result:
when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
fusing the person image sequences and, together with the related person audio information, inputting them into an abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the data transmission module is used for completing data transmission between different modules;
the early warning platform is used for sending out an alarm according to the judging result of the data processing and analyzing module.
The preprocessing of the raw data proceeds as follows:
Step 11: collecting the surveillance video data of each monitoring point in the campus through video acquisition equipment, extracting the video data of each monitoring point, and converting all extracted video data into corresponding video image frames;
Step 12: acquiring microphone array data from each monitoring point in the campus using an array of multiple microphones, obtaining raw audio data containing multiple sound signals;
Step 13: processing the collected raw audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction;
Step 14: performing enhancement processing on the independent sound signal of each sound source to obtain the processed independent sound information for each direction.
Based on the above, the acquired video image frame data is further processed as follows:
Step 21: carrying out target detection on the persons in the video image frame set to obtain person target position information, and framing the contours of the corresponding person objects to obtain person contour size information;
Step 22: for each detected person object, processing the video image frames with an image segmentation algorithm, segmenting each person object from the background to form a foreground person image with a transparent background;
Step 23: transmitting the foreground person images to an action recognition module for analysis and action recognition;
Step 24: dividing person correlation areas on the images in the video image frames according to the person contour size information, and establishing a person action correlation vector table according to the action recognition results;
Step 25: feeding the person action correlation vector table into a behavior discrimination model, whose output serves as the action marking result.
The action recognition model is a pretrained convolutional neural network whose output layer uses a softmax activation function; its input is the foreground image data of a single person and its output is a probability distribution over preset action classes. The preset action classes include fist swinging, kicking, pushing, throwing, tripping, curling up, head covering, walking, running, standing and squatting, of which fist swinging, kicking, pushing, throwing, tripping, curling up and head covering are set as abnormal actions;
when the action recognition result assigns the maximum probability to an abnormal action, a person correlation area is constructed, centered on the target person whose initial recognition result has the maximum abnormal-action probability and with a radius of 2h, where h is the height of the person's contour; the person action correlation vector table is a vector table containing the action information of the abnormal-action target person and of the persons within the correlation area;
the action score distribution of each person is first expressed as:
A_i = [P_{1i}, P_{2i}, P_{3i}, …, P_{Ki}]
where K is the number of preset action classes, A_i denotes the array of all preset action scores of the i-th person in a given area, and P_{ki} is the score of the k-th preset action class for the i-th person;
the person action correlation vector table is then built from the correlation area: taking the target person i as the center, with M_1 = A_i, the action score distribution of the whole correlation area is assembled as:
V_j = [M_1, M_2, M_3, …, M_N]
where V_j, a K×N array, is the action correlation vector table of the persons in the j-th area, N is the number of persons in that area, M_1 is the action score vector of the central person, and the remaining columns are the action score vectors of the other persons in the correlation area;
the person action correlation vector table is input into the behavior discrimination model for comprehensive analysis of the action correlations, yielding the behavior discrimination result for the corresponding person object and correlation area.
Based on the above, the acquired independent sound information is further processed as follows:
Step 31: performing abnormal speech recognition on each person's independent sound information through an acoustic recognition model;
Step 32: when the recognition result indicates verbal violence, storing the audio at the moment the verbal violence occurs and recording its time of occurrence, to obtain a speech marking result;
the abnormal speech recognition comprises speech recognition and emotion recognition:
speech recognition technology is used to recognize the semantics of the voice information and convert it into text, which is segmented, extracted and stored in a speech recognition database; keywords and key sentences are then matched and compared against preset speech keywords for judgment;
emotion recognition technology is used to extract acoustic features of pitch, volume and speech rate from the acquired speech data; the matching probabilities of the emotion classes are obtained through a pretrained emotion recognition model, and the class with the maximum matching probability is taken as the speech marking result.
Based on the above, the fusion judgment based on the speech marking result and the action marking result proceeds as follows:
Step 41: accurately acquiring the time of occurrence of the corresponding bullying behavior based on the action marking result or the audio recognition result;
Step 42: when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
Step 43: when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
Step 44: fusing the person image sequences and, together with the related person audio information, inputting them into the abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the abnormal behavior fusion analysis model is a bullying behavior analysis model specially designed for image sequences containing temporal features together with audio data; by fusing the features of these two kinds of data, deeper behavior analysis is achieved.
Based on the above, the early warning platform works as follows:
Step 51: when campus bullying behavior is judged to exist, extracting the person images, video and corresponding audio of the campus bullying behavior, synchronizing and merging the audio data and video data by a time-alignment method, and uploading them to the early warning platform through the data transmission module;
Step 52: the early warning platform matches the target images of the person objects involved in the campus bullying behavior against the current database of campus student and staff information to obtain the information of the persons involved;
Step 53: based on the matching result, the early warning platform issues an early warning reminder and transmits the corresponding personnel information and the merged audio and video of the abnormal behavior to the early warning system for alerting.
Compared with the prior art, the invention has outstanding substantive features and represents remarkable progress. In particular, through the preprocessing, feature extraction and fusion judgment of the audio and video data of the detection area, the invention combines the sound cues and the action cues of bullying behavior, so that bullying is judged more accurately and a wider range of bullying behaviors is covered; the system is not limited to actions and occasions within a specific range, and the level of security protection is higher.
Further, in the preprocessing and analysis of sound, the microphone array data is used to acquire the direction information of each target person's sound source, the independent sound information of each target person is then separated and input into the acoustic recognition model for recognition, and an emotion recognition function is added, which improves the accuracy of acoustic recognition.
Furthermore, the preprocessing and analysis of action behaviors incorporates a neural network alongside the behavior discrimination model, which improves recognition accuracy.
Drawings
Fig. 1 is a schematic structural diagram of an abnormal behavior detection system based on heterogeneous data fusion analysis of big data in an embodiment of the present invention.
Fig. 2 is a schematic flow chart of an abnormal behavior detection method based on heterogeneous data fusion analysis of big data in an embodiment of the invention.
Fig. 3 is a schematic diagram of a branch flow of audio and video fusion analysis in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail through the following specific embodiments.
As shown in FIG. 1, the abnormal behavior detection system based on heterogeneous data fusion analysis of big data comprises a data acquisition module, a data storage module, a data processing and analysis module, a data transmission module and an early warning platform;
The data acquisition module comprises a video acquisition module and an audio acquisition module. The video acquisition module captures image information and video streams in the monitoring area. The audio acquisition module is a microphone array, which samples the spatial characteristics of the sound field and acquires the audio information of persons in scenes such as campuses and classrooms; the microphone array can effectively collect and distinguish the sounds of persons in different directions within the monitoring area.
The data storage module is used for storing the original data obtained from the data acquisition module and the processed data;
The data processing and analyzing module is responsible for processing, cleaning, feature extraction and analysis of the original data. The data of each modality is preprocessed and then fused for comprehensive analysis.
The data transmission module is responsible for effectively transmitting the original multi-mode data acquired from the data acquisition module to the data storage module, the data processing and analyzing module and the like, and transmitting the processed data to the early warning platform.
The early warning platform is responsible for issuing alarms to notify the relevant personnel, providing timely alerts so that necessary actions can be taken.
The method for detecting the abnormal behavior by adopting the system comprises the following steps:
step 1: the acquisition equipment acquires and preprocesses the audio and video data in the monitoring area;
step 2: performing action marking on each person in the video image frame of the preprocessing result;
step 3: voice marking is carried out on the audio data of the preprocessing result;
Step 4: acquiring corresponding audio segments and video segments according to the action marking result and the voice marking result, and inputting the corresponding audio segments and video segments into an abnormal behavior analysis model for fusion recognition;
step 5: abnormal behavior judgment is carried out according to the identification result, and when the judgment result is the Baling behavior, the abnormal behavior judgment is fed back to the early warning system for warning
In this embodiment, the data information of each monitoring point includes: video data information collected by the video collecting device and audio data information collected by the microphone array.
Specifically:
Acquiring and preprocessing the audio and video data in the monitoring area comprises the following steps:
Step 11: the method comprises the steps of collecting monitoring video data information of each monitoring point in a campus area through video collecting equipment, extracting video data of each monitoring point, and converting all the video data extracted by each monitoring point into corresponding video image frames;
Step 12: acquiring microphone array data information of each monitoring point in the campus area by using an array formed by a plurality of microphones, and acquiring original audio data of a plurality of sound signals;
Step 13: the method comprises the steps of processing collected original audio data, determining the direction information of each target person sound source, and separating a plurality of original sound signals mixed together to obtain independent sound information of each target person in each direction;
Step 14: performing enhancement processing on the independent sound signals of each sound source to obtain the processed independent sound information of each direction sound source;
Wherein, the pretreatment result comprises: video image frames and independent sound information.
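The patent does not fix how the video is decoded into image frames; the sketch below shows one conventional way to implement Step 11 with OpenCV, where the sampling stride and the function name are illustrative assumptions rather than details from the patent.

```python
# Minimal sketch of step 11: turning monitoring-point video into image frames.
import cv2

def extract_frames(video_path: str, stride: int = 5) -> list:
    """Decode a surveillance clip and keep every `stride`-th frame."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of stream
            break
        if index % stride == 0:
            frames.append(frame)        # BGR ndarray = one video image frame
        index += 1
    cap.release()
    return frames
```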
In this embodiment, the monitoring points include: campus, playground, classroom, etc.
In this embodiment, the data information of each monitoring point includes: video data information, audio data information, etc.
In this embodiment, the independent sound information is a set of audio data of the different persons in each direction, matched with the video images and derived from the audio data captured by the microphone array. The sound source direction of each target person is determined by a time-difference (TDOA) method, and the independent sound information of each target person in each direction is obtained by a beamforming algorithm.
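The time-difference method and beamforming algorithm are named but not specified; the following sketch shows one standard reading of them, cross-correlation TDOA estimation followed by delay-and-sum beamforming. All function names and parameters are assumptions, not the patent's implementation.

```python
import numpy as np

def estimate_tdoa(x: np.ndarray, y: np.ndarray, sr: int) -> float:
    """Estimate the time difference of arrival between two microphone
    channels from the peak of their cross-correlation (in seconds)."""
    corr = np.correlate(x, y, mode="full")
    lag = int(np.argmax(corr)) - (len(y) - 1)
    return lag / sr

def delay_and_sum(channels: np.ndarray, delays_s: list, sr: int) -> np.ndarray:
    """Align each channel by its estimated delay and average; this
    reinforces the source in the steered direction and attenuates others."""
    out = np.zeros_like(channels[0], dtype=float)
    for ch, d in zip(channels, delays_s):
        shift = int(round(d * sr))
        out += np.roll(ch, -shift)      # undo the per-channel arrival delay
    return out / len(channels)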
Using the microphone array to acquire the audio data of multiple target persons in the detection area allows the audio of persons of interest to be selectively recognized, improving detection precision; combining the two kinds of information, audio data and video data, in the judgment further improves the accuracy of recognizing campus bullying behavior.
Marking the actions of each person in the video image frames of the preprocessing result comprises the following steps:
Step 21: carrying out target detection on the persons in the video image frame set to obtain person target position information, and framing the contours of the corresponding person objects to obtain person contour size information;
Step 22: for each detected person object, processing the video image frames with an image segmentation algorithm, segmenting each person object from the background to form a foreground person image with a transparent background;
Step 23: transmitting the foreground person images to the action recognition module for analysis and action recognition;
Step 24: dividing person correlation areas on the images in the video image frames according to the person contour size information, and establishing a person action correlation vector table according to the action recognition results;
Step 25: feeding the person action correlation vector table into the behavior discrimination model, whose output serves as the action marking result;
the foreground person images are the independent foreground images of all person objects in the detection area.
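Steps 21-22 leave the detector and segmentation algorithm open; a pretrained Mask R-CNN from torchvision is one plausible stand-in, sketched below under that assumption (the score threshold is illustrative).

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# A pretrained Mask R-CNN stands in for the unspecified detector/segmenter.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def foreground_persons(frame_rgb, score_thr: float = 0.8):
    """Return (boxes, masks) for detected persons (COCO label 1);
    the masks cut each person object out of the background."""
    with torch.no_grad():
        out = model([to_tensor(frame_rgb)])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thr)
    return out["boxes"][keep], out["masks"][keep]
```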
In an embodiment, the action recognition module is a pretrained convolutional neural network whose output layer uses a softmax activation function; its input is the foreground image data of a single person and its output is a probability distribution over preset action classes. The preset action classes include fist swinging, kicking, pushing, throwing, tripping, curling up, head covering, walking, running, standing and squatting, of which fist swinging, kicking, pushing, throwing, tripping, curling up and head covering are set as abnormal actions;
when the action recognition result assigns the maximum probability to an abnormal action, a person correlation area is constructed, centered on the target person whose initial recognition result has the maximum abnormal-action probability and with a radius of 2h, where h is the height of the person's contour; the person action correlation vector table is a vector table containing the action information of the abnormal-action target person and of the persons within the correlation area.
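A minimal sketch of the action recognition module under the stated design, a CNN with a softmax output layer over the eleven preset action classes; the layer sizes and input resolution are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 11  # the eleven preset action categories listed above

class ActionRecognizer(nn.Module):
    """Small CNN over a single foreground person image; the softmax output
    is the probability distribution over preset action classes."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, H, W) foreground person crops
        return torch.softmax(self.head(self.features(x)), dim=1)
```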
The action score distribution of each person is first expressed as:
A_i = [P_{1i}, P_{2i}, P_{3i}, …, P_{Ki}]
where K is the number of preset action classes, A_i denotes the array of all preset action scores of the i-th person in a given area, and P_{ki} is the score of the k-th preset action class for the i-th person.
The person action correlation vector table is then built from the correlation area: taking the target person i as the center, with M_1 = A_i, the action score distribution of the whole correlation area is assembled as:
V_j = [M_1, M_2, M_3, …, M_N]
where V_j, a K×N array, is the action correlation vector table of the persons in the j-th area, N is the number of persons in that area, M_1 is the action score vector of the central person, and the remaining columns are the action score vectors of the other persons in the correlation area.
The person action correlation vector table is input into the behavior discrimination model for comprehensive analysis of the action correlations, yielding the behavior discrimination result for the corresponding person object and correlation area.
The behavior discrimination model is a pretrained SVM model: the person action correlation vector table data of multiple persons is input into the SVM, which outputs normal or abnormal behavior; if abnormal behavior occurs the marking result is abnormal, otherwise the marking result is normal.
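The sketch below assembles the K×N vector table V_j and feeds its flattened form to an SVM, matching the described pipeline. Since an SVM needs fixed-length input, it assumes the table is padded or truncated to a fixed N in practice, a detail the patent does not specify; training data (tables labeled normal/abnormal) is assumed and not shown.

```python
import numpy as np
from sklearn.svm import SVC

def build_vector_table(center_scores, neighbor_scores) -> np.ndarray:
    """Assemble V_j = [M_1, M_2, ..., M_N] with the abnormal-action target
    person's score vector A_i as the first column M_1."""
    columns = [center_scores] + list(neighbor_scores)   # each of length K
    return np.stack(columns, axis=1)                    # K x N array

# The behavior discrimination model: an SVM over the flattened table.
clf = SVC(kernel="rbf", probability=True)               # fit on labeled tables first

def discriminate(v_j: np.ndarray) -> int:
    """1 = abnormal (bullying) behavior in the area, 0 = normal."""
    return int(clf.predict(v_j.reshape(1, -1))[0])
```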
Take a person waving an arm or swinging a fist as an example: when a fist-swinging action is detected but the other persons in the person correlation area act normally, the behavior can be judged to be normal; when a fist-swinging action is detected and the other persons in the correlation area show head-covering or cowering actions, it can be judged that bullying behavior is occurring in the area.
The effective gain of this embodiment is that behavior analysis built on person action correlation information replaces single-person action recognition and therefore has higher accuracy; and the moments at which bullying may occur are marked in the video according to the action recognition results of the correlation analysis, i.e., the video is searched rather than classified frame by frame, which suits the detection of small-probability events such as campus bullying and effectively reduces the computation required for subsequent continuous-frame processing.
Voice marking of the audio data of the preprocessing result comprises the following steps:
Step 31: performing abnormal speech recognition on each person's independent sound data through an acoustic recognition model;
Step 32: when the recognition result indicates verbal violence, storing the audio at the moment the verbal violence occurs and recording its time of occurrence, to obtain a speech marking result;
the abnormal speech recognition comprises speech recognition and emotion recognition:
speech recognition technology is used to recognize the semantics of the voice information and convert it into text, which is segmented, extracted and stored in a speech recognition database; keywords and key sentences are then matched and compared against preset speech keywords for judgment;
emotion recognition technology is used to extract acoustic features of pitch, volume and speech rate from the acquired speech data; the matching probabilities of the emotion classes are obtained through a pretrained emotion recognition model, and the class with the maximum matching probability is taken as the speech marking result.
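The pitch, volume and speech-rate features could be extracted as sketched below with librosa; the specific feature definitions and parameters are assumptions, since the patent names only the feature types.

```python
import librosa
import numpy as np

def acoustic_features(wav_path: str) -> np.ndarray:
    """Extract rough pitch / volume / speech-rate cues from a speech clip."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=65, fmax=400)        # pitch contour (Hz)
    rms = librosa.feature.rms(y=y)[0]                    # loudness proxy
    onsets = librosa.onset.onset_detect(y=y, sr=sr)      # rough syllable events
    rate = len(onsets) / (len(y) / sr)                   # events per second
    return np.array([np.nanmean(f0), rms.mean(), rate])
```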
The emotion recognition covers seven emotions: relaxed, sad, angry, fearful, surprised, excited and neutral. Relaxed and excited are classed as normal emotions; sad, angry and fearful as abnormal emotions; and neutral and surprised as uncertain. Emotion recognition thus becomes a three-class problem that outputs probabilities for abnormal, normal and uncertain, with the maximum-probability class taken as the emotion recognition result.
When either the speech recognition result or the emotion recognition result is abnormal, the speech marking result is abnormal; otherwise the speech marking result is normal;
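One way to read this OR-combination of the two recognizers is sketched below; the keyword list and class names are illustrative assumptions, not the patent's speech-keyword database.

```python
ABNORMAL, NORMAL = "abnormal", "normal"

# Illustrative keyword list; the patent's preset keyword database is not public.
VIOLENT_KEYWORDS = {"give me your money", "don't tell anyone", "or else"}

def speech_mark(transcript: str, emotion_probs: dict) -> str:
    """Abnormal if either the recognized text hits a preset keyword or the
    emotion classifier's top class (abnormal/normal/uncertain) is abnormal."""
    keyword_hit = any(k in transcript.lower() for k in VIOLENT_KEYWORDS)
    top_emotion = max(emotion_probs, key=emotion_probs.get)
    return ABNORMAL if (keyword_hit or top_emotion == ABNORMAL) else NORMAL
```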
Acquiring the corresponding audio and video segments according to the action marking result and the speech marking result and inputting them into the abnormal behavior analysis model for fusion recognition comprises the following steps:
Step 41: accurately acquiring the time of occurrence of the corresponding bullying behavior based on the action marking result or the audio recognition result;
Step 42: when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
Step 43: when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
Step 44: fusing the person image sequences and, together with the related person audio information, inputting them into the abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the abnormal behavior fusion analysis model is a bullying behavior analysis model specially designed for image sequences containing temporal features together with audio data; by fusing the features of these two kinds of data, deeper behavior analysis is achieved. The bullying behaviors covered include pushing, punching, kicking, jostling, elbow strikes, slapping and the like.
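The internal structure of the abnormal behavior fusion analysis model is not disclosed; the sketch below shows one plausible bimodal design, a recurrent branch over per-frame image features and a dense branch over an audio feature vector, fused by concatenation. All dimensions and the class count are assumed.

```python
import torch
import torch.nn as nn

class BimodalFusionNet(nn.Module):
    """Sketch of a bimodal fusion analyzer: GRU over the image sequence's
    per-frame features, MLP over audio features, concatenated for a
    normal-vs-bullying decision."""
    def __init__(self, img_feat=256, audio_feat=64, hidden=128):
        super().__init__()
        self.temporal = nn.GRU(img_feat, hidden, batch_first=True)
        self.audio = nn.Sequential(nn.Linear(audio_feat, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, 2)

    def forward(self, img_seq: torch.Tensor, audio_vec: torch.Tensor):
        # img_seq: (batch, time, img_feat); audio_vec: (batch, audio_feat)
        _, h = self.temporal(img_seq)                 # h: (1, batch, hidden)
        fused = torch.cat([h[-1], self.audio(audio_vec)], dim=1)
        return self.classifier(fused)                 # logits: {normal, bullying}
```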
This scheme has notable advantages: voice and action information are combined through a screening process so that only selected audio and video undergo deep analysis, which reduces the computational load; and the fusion analysis of action and sound information achieves high-precision judgment of campus bullying behavior, improving judgment efficiency and providing a more accurate and efficient solution for abnormal behavior detection.
When abnormal campus bullying behavior is judged to exist, the information of the persons involved is extracted and fed back to the user or the system for alerting, comprising the following steps:
Step 51: when campus bullying behavior is judged to exist, extracting the person images, video and corresponding audio of the campus bullying behavior, synchronizing and merging the audio data and video data by a time-alignment method, and uploading them to the early warning platform through the data transmission module;
Step 52: the early warning platform matches the target images of the person objects involved in the campus bullying behavior against the current database of campus student and staff information to obtain the information of the persons involved;
Step 53: based on the matching result, the early warning platform issues an early warning reminder and transmits the corresponding personnel information and the merged audio and video of the abnormal behavior to the early warning system for alerting;
the information of the persons involved comprises identity, student or staff number, name and the like.
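Step 51's time alignment could be as simple as pairing segments by start timestamp, as in this sketch; the clip record layout (a `start` field in seconds) and the tolerance value are assumptions, not specified by the patent.

```python
def align_clips(audio_clips, video_clips, tolerance_s: float = 0.5):
    """Pair audio and video segments whose start timestamps fall within
    `tolerance_s` of each other, then hand the pairs off for merging."""
    pairs = []
    for v in video_clips:
        best = min(audio_clips, key=lambda a: abs(a["start"] - v["start"]),
                   default=None)
        if best is not None and abs(best["start"] - v["start"]) <= tolerance_s:
            pairs.append((best, v))   # merged evidence uploaded to the platform
    return pairs
```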
Finally, it should be noted that the above embodiments are intended only to illustrate the technical scheme of the invention and are not limiting. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications may be made to the specific embodiments, or equivalents substituted for some of their technical features, without departing from the spirit of the invention, and such changes are intended to fall within the scope of the claims.
Claims (8)
1. An abnormal behavior detection system based on heterogeneous data fusion analysis of big data is characterized in that: the system comprises a data acquisition module, a data storage module, a data processing and analyzing module, a data transmission module and an early warning platform;
the data acquisition module comprises a video acquisition module and an audio acquisition module, which are respectively used for acquiring video data in a monitoring area and audio data in a scene as original data, and the audio acquisition module adopts an array formed by a plurality of microphones;
the data storage module is used for storing the collected original data;
the data processing and analysis module is used for preprocessing, cleaning and feature extraction of the raw data and for analyzing and discriminating it, comprising the following:
preprocessing the collected audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction; judging whether the independent sound information of each sound source involves abnormal speech, and, when abnormal speech is identified, storing the audio and the time of occurrence to obtain a speech marking result;
preprocessing the acquired image information and video stream, carrying out image segmentation and contour extraction on each detected person target, dividing person correlation areas on the images in the image frames, establishing a person action correlation vector table according to the action recognition results, and inputting the table into a behavior discrimination model, whose output serves as the action marking result;
carrying out fusion judgment based on the speech marking result and the action marking result:
when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
fusing the person image sequences and, together with the related person audio information, inputting them into an abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the data transmission module is used for completing data transmission between different modules;
the early warning platform is used for sending out an alarm according to the judging result of the data processing and analyzing module.
2. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 1, characterized in that the preprocessing of the raw data proceeds as follows:
Step 11: collecting the surveillance video data of each monitoring point in the campus through video acquisition equipment, extracting the video data of each monitoring point, and converting all extracted video data into corresponding video image frames;
Step 12: acquiring microphone array data from each monitoring point in the campus using an array of multiple microphones, obtaining raw audio data containing multiple sound signals;
Step 13: processing the collected raw audio data, determining the direction information of each target person's sound source, and separating the mixed original sound signals to obtain the independent sound information of each target person in each direction;
Step 14: performing enhancement processing on the independent sound signal of each sound source to obtain the processed independent sound information for each direction.
3. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 2, characterized in that the acquired video image frame data is further processed as follows:
Step 21: carrying out target detection on the persons in the video image frame set to obtain person target position information, and framing the contours of the corresponding person objects to obtain person contour size information;
Step 22: for each detected person object, processing the video image frames with an image segmentation algorithm, segmenting each person object from the background to form a foreground person image with a transparent background;
Step 23: transmitting the foreground person images to an action recognition module for analysis and action recognition;
Step 24: dividing person correlation areas on the images in the video image frames according to the person contour size information, and establishing a person action correlation vector table according to the action recognition results;
Step 25: feeding the person action correlation vector table into a behavior discrimination model, whose output serves as the action marking result.
4. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 3, characterized in that the action recognition module is a pretrained convolutional neural network whose output layer uses a softmax activation function; its input is the foreground image data of a single person and its output is a probability distribution over the preset action classes to which the person's action belongs;
when the action recognition result assigns the maximum probability to an abnormal action, a person correlation area is constructed, centered on the target person whose initial recognition result has the maximum abnormal-action probability and with a radius of 2h, where h is the height of the person's contour; the person action correlation vector table is a vector table containing the action information of the abnormal-action target person and of the persons within the correlation area;
the action score distribution of each person is first expressed as:
A_i = [P_{1i}, P_{2i}, P_{3i}, …, P_{Ki}]
where K is the number of preset action classes, A_i denotes the array of all preset action scores of the i-th person in a given area, and P_{ki} is the score of the k-th preset action class for the i-th person;
the person action correlation vector table is then built from the correlation area: taking the target person i as the center, with M_1 = A_i, the action score distribution of the whole correlation area is assembled as:
V_j = [M_1, M_2, M_3, …, M_N]
where V_j, a K×N array, is the action correlation vector table of the persons in the j-th area, N is the number of persons in that area, M_1 is the action score vector of the central person, and the remaining columns are the action score vectors of the other persons in the correlation area;
the person action correlation vector table is input into the behavior discrimination model for comprehensive analysis of the action correlations, yielding the behavior discrimination result for the corresponding person object and correlation area.
5. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 2, characterized in that the acquired independent sound information is further processed as follows:
Step 31: performing abnormal speech recognition on each person's independent sound information through an acoustic recognition model;
Step 32: when the recognition result indicates verbal violence, storing the audio at the moment the verbal violence occurs and recording its time of occurrence, to obtain a speech marking result;
the abnormal speech recognition comprises speech recognition and emotion recognition:
speech recognition technology is used to recognize the semantics of the voice information and convert it into text, which is segmented, extracted and stored in a speech recognition database; keywords and key sentences are then matched and compared against preset speech keywords for judgment;
emotion recognition technology is used to extract acoustic features of pitch, volume and speech rate from the acquired speech data; the matching probabilities of the emotion classes are obtained through a pretrained emotion recognition model, and the class with the maximum matching probability is taken as the speech marking result.
6. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 4 or 5, characterized in that the fusion judgment based on the speech marking result and the action marking result proceeds as follows:
Step 41: accurately acquiring the time of occurrence of the corresponding bullying behavior based on the action marking result or the audio recognition result;
Step 42: when the speech marking result indicates verbal violence, extracting the audio information of the relevant persons at the moment the bullying behavior occurs, and, combining the speech marking result with the division of video correlation areas, acquiring in real time the person image sequences of the corresponding time period and area;
Step 43: when the action marking result identifies abnormal bullying behavior, rapidly extracting the person image sequences of the relevant persons in the corresponding time period and area, and simultaneously extracting the audio information of the relevant persons in that time period from the independent sound information according to the action marking result;
Step 44: fusing the person image sequences and, together with the related person audio information, inputting them into the abnormal behavior fusion analysis model for bimodal fusion analysis to judge whether abnormal behavior has occurred;
the abnormal behavior fusion analysis model is a bullying behavior analysis model specially designed for image sequences containing temporal features together with audio data; by fusing the features of these two kinds of data, deeper behavior analysis is achieved.
7. The abnormal behavior detection system based on heterogeneous data fusion analysis of big data according to claim 6, characterized in that the early warning platform works as follows:
Step 51: when campus bullying behavior is judged to exist, extracting the person images, video and corresponding audio of the campus bullying behavior, synchronizing and merging the audio data and video data by a time-alignment method, and uploading them to the early warning platform through the data transmission module;
Step 52: the early warning platform matches the target images of the person objects involved in the campus bullying behavior against the current database of campus student and staff information to obtain the information of the persons involved;
Step 53: based on the matching result, the early warning platform issues an early warning reminder and transmits the corresponding personnel information and the merged audio and video of the abnormal behavior to the early warning system for alerting.
8. A method for detecting abnormal behavior based on heterogeneous data fusion analysis of big data, characterized by using the abnormal behavior detection system based on heterogeneous data fusion analysis of big data of any one of claims 1-7 and detecting as follows:
step 1: the acquisition equipment acquires and preprocesses the audio and video data in the monitoring area;
step 2: performing action marking on each person in the video image frame of the preprocessing result;
step 3: voice marking is carried out on the audio data of the preprocessing result;
Step 4: acquiring corresponding audio segments and video segments according to the action marking result and the voice marking result, and inputting the corresponding audio segments and video segments into an abnormal behavior analysis model for fusion recognition;
step 5: judging abnormal behavior according to the recognition result, and feeding back to the early warning platform for alerting when the judgment result is bullying behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410463865.5A CN118378210A (en) | 2024-04-17 | 2024-04-17 | Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410463865.5A CN118378210A (en) | 2024-04-17 | 2024-04-17 | Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118378210A true CN118378210A (en) | 2024-07-23 |
Family
ID=91911961
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410463865.5A Pending CN118378210A (en) | 2024-04-17 | 2024-04-17 | Abnormal behavior detection system and method based on heterogeneous data fusion analysis of big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118378210A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118689999A (en) * | 2024-08-23 | 2024-09-24 | 南方科技大学 | Campus security expert system based on large model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |