CN112528898B

CN112528898B - Alarm event aggregation method and device based on monitoring video multi-target detection

Info

Publication number: CN112528898B
Application number: CN202011497409.0A
Authority: CN
Inventors: 刘红利; 李征; 王栓
Original assignee: Changyang Technology Beijing Co ltd
Current assignee: Changyang Technology Beijing Co ltd
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2024-10-18
Anticipated expiration: 2040-12-17
Also published as: CN112528898A

Abstract

The invention discloses an alarm event aggregation method and device based on monitoring video multi-target detection. The method comprises: inputting an acquired video frame into a target detection model, judging whether a target object exists in the detection result, judging whether a detected object is a valid target object according to whether its confidence exceeds a set threshold, and entering the next step; extracting apparent feature vectors of the target objects in the video frame through a depth cosine metric learning network model, and obtaining the labels, apparent feature vectors and coordinate positions of the target objects in combination with the preceding step; and judging, through an alarm event aggregation rule based on the label, the system time, the apparent feature vector and the coordinate position of the target object, whether the alarm event corresponding to the target object in the video frame is a repeated alarm event. The invention can effectively remove repeated invalid alarm events, greatly reduces the time managers spend checking for hidden dangers, facilitates efficient management in safe production, is applicable to different industrial scenes, and has strong practicability.

Description

Alarm event aggregation method and device based on monitoring video multi-target detection

Technical Field

The invention relates to the field of intelligent monitoring, in particular to an alarm event aggregation method and device based on monitoring video multi-target detection.

Background

With the rollout of new infrastructure, the industrial internet field faces new development opportunities and challenges. The industrial intelligent alarm management platform has emerged to meet this demand. It analyzes operators and scenes through real-time monitoring video, for example detecting whether safety helmets and work clothes are worn correctly, identifying hazard sources such as smoke and oil leakage, and detecting boundary intrusion by intruding targets, and it captures and saves snapshots, raises alarms, issues voice prompts and so on for targets that meet preset conditions. In the daily safety production of enterprises, the industrial intelligent alarm platform provides standardized safety production application functions and information service support, and realizes dynamic perception of safety risks, instant alarm of accident hazards, and accurate and efficient safety supervision, thereby effectively promoting the implementation of the enterprises' primary responsibility for safety production.

Compared with the traditional alarm management platform, the industrial intelligent alarm management platform can reduce the number of invalid alarms, quickly provide operation guidance for management personnel, and markedly shorten the alarm management cycle. It analyzes and aggregates alarm problems such as invalid alarms and repeated alarms, finds important alarms to the greatest extent, and supports efficient hidden danger investigation, so that the real-time performance and reliability of alarm management are improved and the safety production and safe operation level of industrial processes is effectively raised.

At present, little research has been conducted on alarm event aggregation. Existing approaches only perform target detection on the target object and generate alarms in real time. This produces a large number of identical or invalid alarms, and the excess of repeated alarms greatly increases the time managers spend troubleshooting hidden dangers, which hinders automation and efficient management in safety production.

In view of this, it is of great significance to establish an alarm event aggregation method and device based on monitoring video multi-target detection.

Disclosure of Invention

In view of the problems that clustering and investigating repeated events among alarm events is difficult and time-consuming and hinders automation and efficient management in safety production, an object of the embodiments of the present application is to provide an alarm event aggregation method and device based on monitoring video multi-target detection, so as to solve the technical problems mentioned in the Background section above.

In a first aspect, an embodiment of the present application provides an alarm event aggregation method based on monitoring video multi-target detection, including the following steps:

A target detection step, namely inputting the acquired video frame into a target detection model, judging whether a target object exists in the detection result, judging whether a detected object is a valid target object according to whether its confidence exceeds a set threshold, and entering the metric learning step;

A metric learning step, namely extracting apparent feature vectors of the target objects in the video frame through a depth cosine metric learning network model, and obtaining the labels, apparent feature vectors and coordinate positions of the target objects in combination with the target detection step; and

An aggregation classification step, namely judging, through an alarm event aggregation rule based on the label, the system time, the apparent feature vector and the coordinate position of the target object, whether the alarm event corresponding to the target object in the video frame is a repeated alarm event.

In some embodiments, prior to the target detection step, further comprising:

Acquiring a real-time video stream of the monitoring video through OpenCV, and intercepting the real-time video stream at a certain time interval to obtain video frames. In this way, multiple real-time video streams can be accessed simultaneously for real-time intelligent analysis in practical application.

In some embodiments, the depth cosine metric learning network model extracts apparent feature vectors of the target object to construct an event aggregation feature vector queue. The event-aggregate feature vector queue may be used for similarity calculation of feature vectors.

In some embodiments, the alarm event aggregation rules specifically include:

Querying, by label, whether the target object exists in the constructed event aggregation feature vector queue; if so, entering a time matching stage; otherwise, directly generating an alarm event and adding the label, the apparent feature vector and the system time of the target object to the event aggregation feature vector queue;

Judging whether the difference between the system time and the time of the target object in the event aggregation feature vector queue is within an effective time period; if so, entering a cosine similarity matching stage; otherwise, directly generating an alarm event and adding the label, the apparent feature vector and the system time of the target object to the event aggregation feature vector queue; and

Performing cosine similarity calculation between the apparent feature vector of the target object and the apparent feature vector of the target object within the effective time period in the event aggregation feature vector queue, and judging whether the target object meets the matching rule by comparing the cosine similarity value with a preset cosine similarity threshold. If the cosine similarity value is smaller than the cosine similarity threshold, the apparent feature vector is not matched; an alarm event is directly generated, and the label, the apparent feature vector and the system time of the target object are added to the event aggregation feature vector queue. If the cosine similarity value is greater than or equal to the cosine similarity threshold, the apparent feature vector is matched, and the alarm event corresponding to the target object is a repeated alarm event.

In some embodiments, the target detection model and the depth cosine metric learning network model are encapsulated using the Django framework. After encapsulation, an HTTP interface is provided for client requests, which facilitates model management and external requests.

In some embodiments, the target detection model uses YoloV as the model framework, and the original classification model is replaced with a lightweight classification model on the basis of the YoloV model. The resulting detection model has relatively high detection accuracy and high computational efficiency.

In some embodiments, the activation function of each layer in the network structure of the target detection model is the linear rectification function ReLU, and the activation function of each layer in the network structure of the depth cosine metric learning network model is the exponential linear unit ELU. With these activation functions, the accuracy of the model calculation is high.

In a second aspect, an embodiment of the present application further provides an alarm event aggregation apparatus based on monitoring video multi-target detection, including:

The target detection module is configured to input the acquired video frame into a target detection model, judge whether a target object exists in the detection result, judge whether a detected object is a valid target object according to whether its confidence exceeds a set threshold, and enter the metric learning step;

The metric learning module is configured to extract apparent feature vectors of the target objects in the video frame through the depth cosine metric learning network model, and obtain the labels, apparent feature vectors and coordinate positions of the target objects in combination with the target detection module; and

The aggregation classification module is configured to judge, through an alarm event aggregation rule based on the label, the system time, the apparent feature vector and the coordinate position of the target object, whether the alarm event corresponding to the target object in the video frame is a repeated alarm event.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.

The invention discloses an alarm event aggregation method and device based on monitoring video multi-target detection. The method comprises: inputting an acquired video frame into a target detection model, judging whether a target object exists in the detection result, judging whether a detected object is a valid target object according to whether its confidence exceeds a set threshold, and entering the next step; extracting apparent feature vectors of the target objects in the video frame through a depth cosine metric learning network model, and obtaining the labels, apparent feature vectors and coordinate positions of the target objects in combination with the preceding step; and judging, through an alarm event aggregation rule based on the label, the system time, the apparent feature vector and the coordinate position of the target object, whether the alarm event corresponding to the target object in the video frame is a repeated alarm event. Based on a high-precision target detection model and depth cosine metric learning, the invention performs target detection on real-time video frames of a video monitoring system, extracts the apparent feature vectors of target objects through the depth cosine metric learning model to construct a multi-target apparent feature vector library, and combines a cosine similarity matching rule to effectively remove repeated invalid alarm events. It thereby greatly reduces the time managers spend checking for hidden dangers, facilitates efficient management in safety production, is applicable to different industrial scenes, and has strong practicability.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is an exemplary device architecture diagram to which an embodiment of the present application may be applied;

FIG. 2 is a flowchart of an alarm event aggregation method based on monitoring video multi-target detection according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a network structure of a target detection model of an alarm event aggregation method based on monitoring video multi-target detection according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a network structure of a depth cosine metric learning network model of an alarm event aggregation method based on monitoring video multi-target detection according to an embodiment of the present invention;

FIG. 5 is a flowchart of step S3 of an alarm event aggregation method based on monitoring video multi-target detection according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an alarm event aggregation device based on monitoring video multi-target detection according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a computer apparatus suitable for use in implementing an embodiment of the application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 illustrates an exemplary device architecture 100 of a surveillance video multi-target detection-based alarm event aggregation method or a surveillance video multi-target detection-based alarm event aggregation device to which embodiments of the present application may be applied.

As shown in fig. 1, the apparatus architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various applications, such as a data processing class application, a file processing class application, and the like, may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablets, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services) or as a single piece of software or software module. The present invention is not particularly limited herein.

The server 105 may be a server providing various services, such as a background data processing server processing files or data uploaded by the terminal devices 101, 102, 103. The background data processing server can process the acquired file or data to generate a processing result.

It should be noted that, the alarm event aggregation method based on the surveillance video multi-target detection provided by the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, 103, and accordingly, the alarm event aggregation device based on the surveillance video multi-target detection may be set in the server 105, or may be set in the terminal devices 101, 102, 103.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the above-described apparatus architecture may not include a network, but only a server or terminal device.

Fig. 2 shows an alarm event aggregation method based on monitoring video multi-target detection according to an embodiment of the present application, including the following steps:

S1, inputting the acquired video frames into a target detection model, judging whether a target object exists in the detection result, judging whether a detected object is a valid target object according to whether its confidence exceeds a set threshold, and entering the next step.

In a specific embodiment, before step S1, the method further includes:

A real-time video stream of the monitoring video is acquired through OpenCV, and the real-time video stream is intercepted at a certain time interval to obtain video frames. In this way, multiple real-time video streams can be accessed simultaneously for real-time intelligent analysis in practical application. In a specific embodiment, the RTSP real-time video stream of the current monitoring video is obtained through OpenCV, and one picture is taken every 2 s as the input picture for subsequent model prediction. In practical application, one device needs to access multiple real-time video streams simultaneously for real-time intelligent analysis. Considering the performance of and load on the device, the long operation times in an industrial production environment, and the fact that a target is basically unchanged within 2 s, video frames are intercepted at a time interval of 2 s.
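The interval-based frame interception described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the helper names `sample_frames` and `open_rtsp` are hypothetical, the clock is injectable only to make the logic easy to check, and frames arriving between two sampling points are read and dropped so that the reader does not fall behind the live stream (an assumption, since the source does not describe buffering).

```python
import time
from typing import Callable, Iterator, Optional, Protocol, Tuple


class Capture(Protocol):
    """Minimal subset of the cv2.VideoCapture interface used here."""

    def read(self) -> Tuple[bool, object]: ...


def sample_frames(
    capture: Capture,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    max_frames: Optional[int] = None,
) -> Iterator[object]:
    """Yield roughly one frame every `interval_s` seconds.

    Frames between two sampling points are read and discarded.
    """
    next_due = clock()
    yielded = 0
    while max_frames is None or yielded < max_frames:
        ok, frame = capture.read()
        if not ok:  # stream ended or connection lost
            break
        now = clock()
        if now >= next_due:
            next_due = now + interval_s
            yielded += 1
            yield frame


def open_rtsp(url: str):
    """Open an RTSP stream with OpenCV (imported lazily)."""
    import cv2  # requires the opencv-python package

    capture = cv2.VideoCapture(url)
    if not capture.isOpened():
        raise RuntimeError(f"cannot open stream: {url}")
    return capture
```

In use, each monitored stream would get its own capture object, for example `for frame in sample_frames(open_rtsp(url), interval_s=2.0): ...`, where `url` is the RTSP address of the camera.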

The system framework is designed in the python language. The Django framework is used to package the multi-target detection algorithm and the apparent feature extraction algorithm as model services. OpenCV acquires the monitoring video stream in real time, and video frames are intercepted at a user-defined time interval. The Django-packaged model service is then requested to perform multi-target detection and apparent feature vector extraction, the alarm event aggregation rule judges whether an alarm event is repeated, and finally the non-repeated alarm events are sent in real time to the industrial intelligent alarm management platform for display.

In a specific embodiment, the target detection model uses YoloV as the model framework, and the original classification model is replaced with a lightweight classification model on the basis of the YoloV model. The resulting detection model has relatively high detection accuracy and high computational efficiency.

A high-precision target detection model is the foundation: only when the detected target is correct is the subsequent extraction of the apparent feature vector meaningful. The application is described by taking labor protection compliance wearing, the most common case in an industrial production environment, as an example. Labor protection compliance wearing refers to the requirement that operators correctly wear safety helmets and work clothes in designated operating areas, so as to ensure their safety. The same approach may also be used for multi-target detection in other scenarios. In a specific embodiment, the embodiment of the application adopts the efficient convolutional neural network YoloV (You Only Look Once) as the model framework for multi-target detection. Taking into account the color characteristics of safety helmets and work clothes and the real-time requirements of practical application, it improves on the YoloV model proposed by Joseph Redmon by replacing the classification model in the original algorithm with a lightweight classification model. The network structure of the lightweight classification model is shown in fig. 3; the input size of the target detection model is 416×416, and the activation function of each layer is the linear rectification function ReLU. When samples are labeled, helmet colors and work clothes colors are distinguished, and attention is focused on the head and the upper body. K-means clustering is then used to select 9 anchors suited to safety helmets and work clothes, which replace the original anchors for model training. The labor protection wearing model finally obtained by training reaches a precision of more than 95 percent.
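The anchor selection by k-means clustering can be illustrated with the sketch below. It assumes the 1 − IoU distance on (width, height) pairs that is commonly used for YOLO-style anchor selection; the source does not state which distance was used, and the function names and the random initialisation are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float]  # (width, height) of a labelled bounding box


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(
    boxes: Sequence[Box], k: int = 9, iters: int = 100, seed: int = 0
) -> List[Box]:
    """Cluster (width, height) pairs into k anchors using 1 - IoU as distance."""
    rng = random.Random(seed)
    centers = rng.sample(list(boxes), k)  # initial anchors drawn from the data
    for _ in range(iters):
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda j: iou_wh(box, centers[j]))
            clusters[best].append(box)
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c
            else centers[j]  # keep the old anchor if its cluster is empty
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments no longer change
            break
        centers = new_centers
    # Sort by area so anchors can be assigned from small to large.
    return sorted(centers, key=lambda wh: wh[0] * wh[1])
```

Called as `kmeans_anchors(label_boxes, k=9)` on the widths and heights of the labelled helmet and work clothes boxes, this would return nine anchors ordered by area.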

When the label, the confidence score and the coordinate position of the target object predicted by the target detection model are obtained, it is judged whether the confidence score is larger than a set threshold. If the confidence score is smaller than the set threshold, the detection is discarded; if the confidence score is larger than the set threshold, the apparent feature vector is extracted.
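The confidence check can be sketched as a simple filter. The `Detection` structure, the label name in the comment, and the default threshold of 0.5 are assumptions made for illustration; the source specifies neither a data layout nor a threshold value, and it does not say how a score exactly equal to the threshold is treated (this sketch discards it).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    label: str                       # predicted class, e.g. "no_helmet" (hypothetical)
    confidence: float                # confidence score in [0, 1]
    box: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(
    detections: Iterable[Detection], threshold: float = 0.5
) -> List[Detection]:
    """Keep detections whose confidence is larger than the threshold."""
    return [d for d in detections if d.confidence > threshold]
```

Only the detections returned by `keep_confident` would be passed on to apparent feature extraction.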

S2, extracting apparent feature vectors of the target objects in the video frame through a depth cosine metric learning network model, and obtaining the labels, apparent feature vectors and coordinate positions of the target objects in combination with step S1.

In a specific embodiment, the depth cosine metric learning network model extracts apparent feature vectors of the target object to construct an event aggregation feature vector queue. The event-aggregate feature vector queue may be used for similarity calculation of feature vectors.

Generally, two apparent features extracted from the same target are more similar than those extracted from different targets. The invention therefore selects a deep metric learning network model as the method for extracting this similarity feature space. Metric learning can be understood as a clustering problem: after learning, the metric distance between similar target objects becomes smaller, that is, the distance between samples of the same class is minimized and the distance between samples of different classes is increased, which produces a clustering effect. The depth cosine metric learning network model (cosine metric learning) proposed by Nicolai Wojke et al. is used as the model framework for extracting the apparent feature vectors of detected targets, and the apparent feature vectors of the targets are extracted to construct an event aggregation feature vector queue. Fig. 4 shows the network structure of the depth cosine metric learning network model; the input size of the model is 128×64, the activation function of each layer is the exponential linear unit ELU, and the length of the final output apparent feature vector is 128. The training data set of the depth cosine metric learning network model consists of three parts: the Market-1501 dataset, part of the DukeMTMC-reID dataset and a self-collected dataset. All of them are prepared as training data according to the standard of the Market-1501 dataset, and the final number of pedestrian IDs is 5257.
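For reference, the two activation functions named in this description can be written out as below, together with a unit-length normalisation of an embedding. The normalisation step is an assumption for illustration: the source only gives the output length of 128, but if embeddings are scaled to unit length, cosine similarity reduces to a plain dot product.

```python
import math
from typing import List, Sequence


def relu(x: float) -> float:
    """Linear rectification function: max(0, x)."""
    return x if x > 0 else 0.0


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def l2_normalize(vector: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Unlike ReLU, ELU keeps a small negative output for negative inputs instead of clipping them to zero.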

In a specific embodiment, the Django framework is used to encapsulate the target detection model and the depth cosine metric learning network model. After encapsulation, an HTTP interface is provided for client requests, which facilitates model management and external requests.

The method for generating and aggregating alarm events is packaged in a logic service based on the python language. The logic service takes the acquired video frame as the input image and requests the model service encapsulated by the Django framework. It first requests the YoloV target detection model and judges whether a target object exists in the detection result. If so, it judges whether the confidence is larger than the set threshold; if not, the detection is discarded, and if so, the depth cosine metric learning network model is further requested to extract the feature vector of the current target object.
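A client-side request from the logic service to the model service might look like the following sketch, which uses only the Python standard library. The endpoint paths `/detect` and `/embed`, the JSON payload layout and the base64 image encoding are all hypothetical; the source only states that an HTTP interface is provided.

```python
import base64
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint paths; the source does not name them.
DETECT_PATH = "/detect"
EMBED_PATH = "/embed"


def build_request(
    base_url: str, path: str, jpeg_bytes: bytes, extra: Optional[dict] = None
) -> urllib.request.Request:
    """Build a JSON POST request carrying a base64-encoded frame."""
    payload = {"image": base64.b64encode(jpeg_bytes).decode("ascii")}
    payload.update(extra or {})  # e.g. the box of the target to embed
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_service(request: urllib.request.Request, timeout_s: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

The logic service would first call the detection endpoint, apply the confidence threshold to the returned detections, and call the embedding endpoint only for the detections that pass.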

S3, judging whether the alarm event corresponding to the target object in the video frame is a repeated alarm event or not through an alarm event aggregation rule based on the label, the system time, the apparent feature vector and the coordinate position of the target object.

In a specific embodiment, as shown in fig. 5, step S3 specifically includes:

S31, inquiring whether a target object exists in the constructed event aggregation feature vector queue through a label, if so, entering a time matching stage, otherwise, directly generating an alarm event, and adding the label, the apparent feature vector and the system time of the target object into the event aggregation feature vector queue;

S32, judging whether the difference value between the system time and the time of the target object in the event aggregation feature vector queue is within an effective time period, if so, entering a cosine similarity matching stage, otherwise, directly generating an alarm event, and adding the tag, the apparent feature vector and the system time of the target object into the event aggregation feature vector queue; and

S33, performing cosine similarity calculation between the apparent feature vector of the target object and the apparent feature vector of the target object within the effective time period in the event aggregation feature vector queue, and judging whether the target object meets the matching rule by comparing the cosine similarity value with a preset cosine similarity threshold. If the cosine similarity value is smaller than the cosine similarity threshold, the apparent feature vector is not matched; an alarm event is directly generated, and the label, the apparent feature vector and the system time of the target object are added to the event aggregation feature vector queue. If the cosine similarity value is greater than or equal to the cosine similarity threshold, the apparent feature vector is matched, and the alarm event corresponding to the target object is a repeated alarm event.

After the label, apparent feature vector and coordinate position of the target object are obtained, the label and the current system time of the target object are taken as the query keys. First, it is queried whether the currently obtained target object exists in the event aggregation feature vector queue. If not, the target object appears for the first time: an alarm event is directly generated, and the label, the feature vector and the current system time of the target object are added to the event aggregation feature vector queue. If the current target object exists, it is further judged whether the difference between the current system time and the time of the target in the event aggregation feature vector queue is within the effective time period; in a specific embodiment, the effective time is set to 10 minutes. If the difference is not within the effective time period, the entry in the event aggregation feature vector queue has expired: an alarm event is directly generated and the event aggregation feature vector queue is updated. If the difference is within the effective time period, the cosine similarity matching stage is entered.

The cosine similarity threshold is set according to the labels of the target objects, and the label thresholds of different target objects are different. And performing cosine similarity calculation on the apparent feature vector of the current target object and the apparent feature vector of the target object in the effective period in the event aggregation feature vector queue, and judging whether the current target object accords with a matching rule or not by judging and comparing the cosine similarity value with a preset cosine similarity threshold value. If the cosine similarity value is smaller than the cosine similarity threshold value, the apparent feature vector is unsuccessfully matched, the matching rule is not met, an alarm event is directly generated, and the apparent feature vector of the target in the event aggregation feature vector queue is updated; if the cosine similarity value is greater than or equal to the cosine similarity threshold value, the apparent feature vector is successfully matched, the alarm event corresponding to the current target object belongs to a repeated alarm event, and the alarm event is directly filtered and discarded and is not displayed. Therefore, the effect of aggregation of the alarm events corresponding to the same target object is achieved.
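The three stages S31 to S33 can be sketched as one small class. This is an illustration under stated assumptions, not the patented implementation: the class and field names are hypothetical, the default threshold of 0.8 is invented (the source gives no value), several entries may be kept per label so that multiple targets with the same label can be distinguished, and expired entries are dropped when the queue is updated.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    feature: List[float]   # apparent feature vector
    timestamp: float       # system time (seconds) when the alarm was generated


@dataclass
class AlarmAggregator:
    valid_seconds: float = 600.0     # 10-minute effective period from the embodiment
    default_threshold: float = 0.8   # hypothetical value; the source gives none
    label_thresholds: Dict[str, float] = field(default_factory=dict)
    queue: Dict[str, List[Entry]] = field(default_factory=dict)

    def is_new_alarm(self, label: str, feature: Sequence[float], now: float) -> bool:
        """Return True if an alarm should be generated, False if it is a repeat."""
        # S31 + S32: entries with this label that are still in the effective period.
        recent = [
            e for e in self.queue.get(label, [])
            if now - e.timestamp <= self.valid_seconds
        ]
        # Thresholds are set per label, with a fallback default.
        threshold = self.label_thresholds.get(label, self.default_threshold)
        # S33: similarity at or above the threshold means a repeated alarm event.
        repeated = any(
            cosine_similarity(feature, e.feature) >= threshold for e in recent
        )
        if not repeated:
            recent.append(Entry(list(feature), now))
        self.queue[label] = recent  # expired entries are dropped (assumption)
        return not repeated
```

A caller would push the event to the alarm platform only when `is_new_alarm` returns True and silently discard it otherwise.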

In a specific embodiment, the cosine similarity calculation formula is as follows:

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

Wherein A and B are the apparent feature vectors of the targets, A_i and B_i are the components of vectors A and B respectively, n is the length of the vectors, and cos(θ) is the cosine value. If the cosine similarity value is smaller than the cosine similarity threshold, the target feature vector is not matched with the apparent feature vector of the same target in the feature queue; the event is not a repeated alarm event and is pushed to the industrial intelligent alarm management platform to be displayed as a new alarm event. If the cosine similarity value is larger than or equal to the cosine similarity threshold, the feature vectors are matched, and the current event is a repeated alarm event and can be discarded.
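As a small worked illustration of the cosine similarity calculation, assume two hypothetical two-dimensional vectors A = (3, 4) and B = (4, 3) (the actual apparent feature vectors have length 128) and a hypothetical threshold of 0.9:

$$\cos(\theta) = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^{2}}\,\sqrt{\sum_i B_i^{2}}} = \frac{3 \cdot 4 + 4 \cdot 3}{\sqrt{3^{2} + 4^{2}}\,\sqrt{4^{2} + 3^{2}}} = \frac{24}{5 \cdot 5} = 0.96$$

Since 0.96 is greater than or equal to 0.9, the two vectors would be treated as the same target and the event as a repeated alarm event; with a threshold of 0.98, the event would instead be pushed as a new alarm event.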

In its application scenario, the invention can de-duplicate multi-target alarm events under frame-skipping detection. It analyzes, aggregates and filters alarm problems such as invalid alarms and repeated alarms, finds important alarms to the greatest extent, and supports efficient hidden danger investigation, thereby improving the real-time performance and reliability of alarm management and effectively improving the safety production and safe operation level of industrial processes.

The invention describes the alarm event aggregation method by taking labor protection compliance wearing as an example, but is not limited thereto, and can also be applied to the aggregation of events such as smoke, flame, crude oil leakage, masks, pedestrians and the like.

With further reference to fig. 6, as an implementation of the method shown in the foregoing drawings, the present application provides an embodiment of an alarm event aggregation apparatus based on monitoring video multi-target detection, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus is particularly applicable to various electronic devices.

The embodiment of the application provides an alarm event aggregation device based on monitoring video multi-target detection, which comprises the following components:

The target detection module 1 is configured to input the acquired video frame into a target detection model, judge whether a target object exists in the detection result, confirm the target object according to whether its confidence exceeds a set threshold value, and enter a measurement learning step;

The measurement learning module 2 is configured to extract an apparent feature vector of a target object in a video frame through a depth cosine measurement learning network model, and obtain a label, the apparent feature vector and a coordinate position of the target object in combination with the target detection module 1; and

The aggregation classification module 3 is configured to determine whether the alarm event corresponding to the target object in the video frame is a repeated alarm event according to alarm event aggregation rules based on the tag, the system time, the apparent feature vector and the coordinate position of the target object.
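The aggregation rule applied by the aggregation classification module 3 (label lookup, then time matching, then cosine similarity matching) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `EventAggregator`, the default values of `valid_period` and `default_threshold`, the in-memory dictionary used as the event aggregation feature vector queue, and the pruning of expired entries are all hypothetical. The system time is passed in as `now` (seconds) so that the sketch is deterministic.

```python
import math
from dataclasses import dataclass, field


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class EventAggregator:
    valid_period: float = 60.0                       # seconds; assumed value
    thresholds: dict = field(default_factory=dict)   # label -> cosine threshold
    default_threshold: float = 0.8                   # assumed value
    queue: dict = field(default_factory=dict)        # label -> [(vector, time)]

    def is_new_alarm(self, label, vector, now):
        """Return True to generate an alarm, False to discard it as a repeat."""
        # Stages 1 and 2: entries with this label that fall in the valid period.
        recent = [(v, t) for v, t in self.queue.get(label, [])
                  if now - t <= self.valid_period]
        self.queue[label] = recent  # expired entries are pruned (an assumption)
        # Stage 3: cosine similarity against the label-specific threshold.
        threshold = self.thresholds.get(label, self.default_threshold)
        if any(_cosine(vector, v) >= threshold for v, _ in recent):
            return False  # repeated alarm event: filter and discard
        # No label, no entry in the valid period, or no match: new alarm event.
        recent.append((vector, now))
        return True
```

A caller would invoke `is_new_alarm` once per detected target object, passing the label and apparent feature vector produced by modules 1 and 2, and forward the event to the alarm platform only when the method returns `True`.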

Referring now to fig. 7, there is illustrated a schematic diagram of a computer apparatus 700 suitable for use in an electronic device (e.g., a server or terminal device as illustrated in fig. 1) for implementing an embodiment of the present application. The electronic device shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.

As shown in fig. 7, the computer apparatus 700 includes a Central Processing Unit (CPU) 701 and a Graphics Processor (GPU) 702, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 703 or a program loaded from a storage section 709 into a Random Access Memory (RAM) 704. In the RAM 704, various programs and data required for the operation of the apparatus 700 are also stored. The CPU 701, the GPU 702, the ROM 703, and the RAM 704 are connected to each other through a bus 705. An input/output (I/O) interface 706 is also connected to the bus 705.

The following components are connected to the I/O interface 706: an input section 707 including a keyboard, a mouse, and the like; an output section 708 including a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 709 including a hard disk or the like; and a communication section 710 including a network interface card such as a LAN card, a modem, and the like. The communication section 710 performs communication processing via a network such as the Internet. A drive 711 may also be connected to the I/O interface 706 as needed. A removable medium 712 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 711 as needed, so that a computer program read therefrom is installed into the storage section 709 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 710, and/or installed from the removable medium 712. The above-described functions defined in the method of the present application are performed when the computer program is executed by the Central Processing Unit (CPU) 701 and the Graphics Processor (GPU) 702.

It should be noted that the computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments of the present application may be implemented in software or in hardware. The described modules may also be provided in a processor.

As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: input the obtained video frame into a target detection model, judge whether a target object exists in the detection result, confirm the target object according to whether its confidence exceeds a set threshold value, and enter a measurement learning step; extract apparent feature vectors of the target objects in the video frame through a depth cosine measurement learning network model, and obtain the labels, apparent feature vectors and coordinate positions of the target objects in combination with the preceding step; and judge whether the alarm event corresponding to the target object in the video frame is a repeated alarm event through an alarm event aggregation rule based on the label, the system time, the apparent feature vector and the coordinate position of the target object.

It should be understood that the scope of the application is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims

1. The alarm event aggregation method based on the monitoring video multi-target detection is characterized by comprising the following steps of:

A target detection step of inputting the obtained video frame into a target detection model, judging whether a target object exists in a detection result, judging whether the target object exists according to whether the confidence exceeds a set threshold value, and entering a measurement learning step, wherein the video frame is obtained by using OpenCV, and a multi-target detection algorithm in the target detection model is packaged by using a Django framework;

A measurement learning step, namely extracting an apparent feature vector of the target object in the video frame through a depth cosine measurement learning network model, and obtaining a label, the apparent feature vector and a coordinate position of the target object in combination with the target detection step; the depth cosine metric learning network model extracts the apparent feature vector of the target object to construct an event aggregation feature vector queue; and

An aggregation classification step of judging whether the alarm event of the target object in the video frame is a repeated alarm event through an alarm event aggregation rule based on the tag, system time, apparent feature vector and the coordinate position of the target object; the alarm event aggregation rule specifically includes:

Inquiring whether the constructed event aggregation feature vector queue has the target object or not through the tag, if yes, entering a time matching stage, otherwise directly generating an alarm event, and adding the tag of the target object, the apparent feature vector and the system time into the event aggregation feature vector queue;

judging whether the difference value between the system time and the time of the target object in the event aggregation feature vector queue is within an effective time period, if so, entering a cosine similarity matching stage, otherwise, directly generating an alarm event, and adding the tag of the target object, the apparent feature vector and the system time into the event aggregation feature vector queue; and

And performing cosine similarity calculation on the apparent feature vector of the target object and the apparent feature vector of the target object in an effective time period in the event aggregation feature vector queue, judging whether the target object accords with a matching rule or not by judging that a cosine similarity value is compared with a preset cosine similarity threshold value, if the cosine similarity value is smaller than the cosine similarity threshold value, indicating that the apparent feature vector is unsuccessfully matched, directly generating an alarm event, adding the tag, the apparent feature vector and the system time of the target object into the event aggregation feature vector queue, and if the cosine similarity value is larger than or equal to the cosine similarity threshold value, indicating that the apparent feature vector is successfully matched, wherein the alarm event corresponding to the target object belongs to a repeated alarm event.

2. The alarm event aggregation method based on monitoring video multi-target detection of claim 1, further comprising, prior to the target detection step:

And acquiring a real-time video stream of the monitoring video through OpenCV, and intercepting the real-time video stream according to a certain time interval to obtain the video frame.

3. The alarm event aggregation method based on monitoring video multi-target detection of claim 1, wherein the target detection model and the depth cosine metric learning network model are encapsulated with the Django framework.

4. The alarm event aggregation method based on monitoring video multi-target detection of claim 1, wherein the target detection model adopts YoloV as a model framework, and the original classification model is improved to a lightweight classification model based on the YoloV model.

5. The alarm event aggregation method based on monitoring video multi-target detection of claim 1, wherein the activation function of each layer in the network structure of the target detection model is a linear rectification function ReLU, and the activation function of each layer in the network structure of the depth cosine metric learning network model is an exponential linear unit ELU.

6. An alarm event aggregation device based on monitoring video multi-target detection, characterized by comprising:

The target detection module is configured to input the acquired video frame into a target detection model, judge whether a target object exists in a detection result, judge whether the target object exists according to whether the confidence exceeds a set threshold value, and enter a measurement learning step, wherein the video frame is acquired by using OpenCV, and a multi-target detection algorithm in the target detection model is packaged by using a Django framework;

The measurement learning module is configured to extract apparent feature vectors of target objects in the video frames through a depth cosine measurement learning network model, and obtain labels, apparent feature vectors and coordinate positions of the target objects in combination with the target detection module; the depth cosine metric learning network model extracts the apparent feature vector of the target object to construct an event aggregation feature vector queue; and

An aggregate classification module configured to determine, via an alert event aggregation rule, whether the alert event of the target object in the video frame is a duplicate alert event based on the tag, system time, the apparent feature vector, and the coordinate location of the target object; the alarm event aggregation rule specifically includes:

7. An electronic device, comprising:

One or more processors;

Storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.

8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.