
CN106934378B - Automobile high beam identification system and method based on video deep learning - Google Patents

Automobile high beam identification system and method based on video deep learning

Info

Publication number
CN106934378B
Authority
CN
China
Prior art keywords
key frame
frame
deep learning
module
video data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710156201.4A
Other languages
Chinese (zh)
Other versions
CN106934378A (en)
Inventor
李成栋
丁子祥
许福运
张桂青
郝丽丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201710156201.4A priority Critical patent/CN106934378B/en
Publication of CN106934378A publication Critical patent/CN106934378A/en
Application granted granted Critical
Publication of CN106934378B publication Critical patent/CN106934378B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47Detecting features for summarising video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract


The invention discloses an automobile high beam identification system and method based on video deep learning. The system comprises two parts. A foreground part realizes the identification and processing of high beam violations and comprises a road monitoring equipment module, a video processing and recognition module, a recognition result processing module and a database of violation results to be checked, connected in sequence. A background part processes video and realizes deep learning on it, and comprises a key frame extraction algorithm, a labeled database and a deep learning module. The labeled database is constructed by calling the key frame extraction algorithm to extract key frames from the original video data; its data are used to train the deep learning module, and the trained deep learning module, together with the key frame extraction algorithm, is called by the video processing and recognition module. The invention automatically analyzes and recognizes surveillance video, ensures the completeness of law enforcement evidence, and, like manual judgment, is intelligent.


Description

Automobile high beam identification system and method based on video deep learning
Technical Field
The invention relates to an automobile high beam identification system, in particular to an automobile high beam identification system and method based on video deep learning, and belongs to the technical field of intelligent transportation.
Background
Since the reform and opening-up, China's economy has developed continuously, steadily and rapidly, people's living standards have improved unprecedentedly, and more and more people own private vehicles. The rapid increase in the number of private cars brings convenience to travel, but at the same time the frequency of traffic accidents grows higher and higher.
There are many causes of traffic accidents, and many accidents result from improper use of high beams. At present, high beam violations are supervised mainly by traffic police, and owing to limits on police force and time, not all violations can be effectively supervised. In addition, the high beam snapshot systems developed in recent years all work on captured still pictures, which has certain limitations: 1) the number of captured high beam pictures is small and inconsistent; the pictures may be produced during a driver's normal use of the high beam and are easily misjudged as improper use, so the pictures are insufficient as law enforcement evidence; 2) to obtain the pictures, several capture devices often have to be erected at the same location, so the construction cost is high; 3) the video monitoring equipment already deployed cannot be fully utilized, which wastes resources.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an automobile high beam identification system based on video deep learning.
The invention also provides an automobile high beam identification method based on video deep learning corresponding to the system.
In order to achieve the purpose, the invention adopts the following technical scheme:
an automobile high beam identification system based on video deep learning comprises the following two parts:
the foreground part is used for realizing the identification and processing of the high beam violation behaviors and comprises a road monitoring equipment module, a video processing and identifying module, an identification result processing module and a database of the violation results to be detected, which are connected in sequence;
the background part is used for processing the video and realizing deep learning on it, and comprises a key frame extraction algorithm, a labeled database and a deep learning module, wherein the labeled database is constructed by calling the key frame extraction algorithm to extract key frames from the original video data, the data in the labeled database are used for training the deep learning module, and the trained deep learning module, together with the key frame extraction algorithm, is called by the video processing and recognition module.
As one of the preferable technical solutions, the key frame extraction algorithm is a clustering-based key frame extraction algorithm.
As one of the preferable technical solutions, the deep learning module is a deep learning module based on CNN+LSE (convolutional neural network + least squares estimation).
The system corresponds to an automobile high beam identification method based on video deep learning, and the method specifically comprises the following steps:
(1) the road monitoring equipment module acquires driving video data of the automobile and transmits the driving video data to the video processing and identifying module;
(2) the video processing and recognition module calls the key frame extraction algorithm to extract key frames from the video data and then performs a graying operation; taking the grayed key frames as input, it calls the CNN+LSE-based deep learning module trained on the labeled database to obtain the output label of each key frame, the labels comprising low beam, fog lamp or high beam, and assigns each label to the corresponding key frame image;
(3) the video data and the labeled key frames obtained in step (2) together serve as the input of the recognition result processing module, which judges whether the vehicle violates the regulations; a license plate recognition system is embedded in the recognition result processing module, and when a target vehicle exhibits high beam violation behavior, its license plate is extracted, the vehicle information is acquired, and the suspected violation video data are imported into the database of violation results to be checked. A sketch of the key frame labeling stage of step (2) is given below.
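As a minimal sketch of the labeling stage, the following assumes the HighBeamNet module sketched later in this document, 32×32 grayed key frames carrying their time index j, and illustrative names throughout; none of these names come from the patent itself:

```python
# Hypothetical step (2) glue: label each grayed key frame with the trained
# CNN+LSE module. Input format (time index, 32x32 frame) is an assumption.
import torch

LABELS = {0: -1, 1: 0, 2: 1}  # class index -> label: low beam, fog lamp, high beam

def classify_key_frames(model, key_frames):
    """key_frames: list of (j, frame) pairs, frame a 32x32 array of pixel values."""
    model.eval()
    labeled = []
    with torch.no_grad():
        for j, frame in key_frames:
            x = torch.as_tensor(frame, dtype=torch.float32).view(1, 1, 32, 32)
            k = LABELS[int(model(x).argmax(dim=1))]  # output label in {-1, 0, 1}
            labeled.append((j, k))
    return labeled
```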
In the step (2), the key frame extraction algorithm is as follows:
(2-1) take the i-th segment V_i in the original video database, extract n frames at equal time intervals, and use F_{i,j} to name the frame at the j-th moment of the i-th piece of video data, so that the key frame sequence of the corresponding video data is denoted {F_{i,1}, F_{i,2}, ..., F_{i,n}}, where F_{i,1} is the first frame and F_{i,n} is the last frame; the similarity between two adjacent frames is defined as the similarity of their histograms (i.e., the histogram feature difference), and a predefined threshold δ controls the clustering density; i, j and n are all integers;
(2-2) select the first frame F_{i,1} as the initial cluster center and calculate the similarity between a frame F_{i,j} and the initial cluster center; if the value is less than δ, the distance between the frame and the cluster center frame is judged too large, so F_{i,j} cannot be added to that cluster; if the similarity between F_{i,j} and every cluster center is less than δ, F_{i,j} forms a new cluster and becomes its center; otherwise, the frame is added to the cluster with which it has the greatest similarity, so that the distance between the frame and that cluster's center is smallest;
(2-3) repeat (2-2) until the n frames extracted from the original video data V_i are assigned to their clusters; the key frames can then be selected: from each cluster, the frame nearest to the cluster center is extracted as the representative frame of that cluster, and the representative frames of all clusters constitute the key frames of the original video data V_i. A sketch of this procedure follows.
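A minimal sketch of steps (2-1)-(2-3), assuming grayscale frames and OpenCV histogram correlation as the similarity measure; the 64-bin histogram is an arbitrary choice:

```python
import cv2
import numpy as np

def histogram_similarity(f1: np.ndarray, f2: np.ndarray) -> float:
    # Correlation of 64-bin grayscale histograms; higher means more similar.
    h1 = cv2.calcHist([f1], [0], None, [64], [0, 256])
    h2 = cv2.calcHist([f2], [0], None, [64], [0, 256])
    return cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)

def extract_key_frames(frames, delta: float):
    centers, members = [], []  # one center and one member list per cluster
    for f in frames:           # frames sampled at equal time intervals, step (2-1)
        sims = [histogram_similarity(f, c) for c in centers]
        if not sims or max(sims) < delta:   # step (2-2): too far from every center
            centers.append(f)               # f starts a new cluster and is its center
            members.append([f])
        else:
            members[int(np.argmax(sims))].append(f)  # join the most similar cluster
    # Step (2-3): the member most similar to each center represents its cluster.
    return [max(ms, key=lambda f, c=c: histogram_similarity(f, c))
            for c, ms in zip(centers, members)]
```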
In the step (2), the construction method of the database with the tags comprises the following steps:
the method comprises the steps of taking a large amount of vehicle running video data under a big data background as original video data, calling a key frame extraction algorithm based on clustering to the original video data to extract key frames, manually judging the light types of vehicles in the key frames, and adding labels to each key frame to enable the original key frames to become labeled data, wherein the label types comprise: three types of dipped headlight, fog light and high beam are respectively represented by-1, 0 and 1; storing the key frame data with the label into a labeled database, wherein the data in the labeled database are the original video data and the labeled key frame thereof, and the labeled key frame is represented as (F)i,jK), where k takes the value-1, 0 or 1.
In step (2), the CNN+LSE-based deep learning module is constructed by adopting the LeNet-5 convolutional neural network structure. The module is divided into eight layers: the first six layers form the feature extraction part and the last two layers form the classifier part, where the feature extraction layers adopt a classical convolutional neural network structure and the classifier layers adopt a fully connected structure. The module takes the data in the labeled database as training data and is trained with the combined CNN+LSE algorithm: the feature extraction part is trained with the CNN method and the classifier layers with the LSE method, so as to realize fast learning of the module parameters and enhance the generalization capability of the module.
The specific method comprises the following steps:
A video key frame from the labeled database is input to the first layer of the CNN+LSE-based deep learning module; the second layer performs convolution operations on the output of the previous layer with different convolution kernels; the third layer performs pooling (down-sampling) on the output of the previous layer; the fourth and fifth layers repeat the operations of the second and third layers; the sixth layer expands the output features of the previous layer in order and arranges them into one row; the seventh layer is fully interconnected with the output features of the previous layer; the last layer is likewise fully interconnected with the previous layer. The output of the CNN+LSE-based deep learning module falls into three cases: low beam, fog lamp and high beam, denoted -1, 0 and 1 respectively. A sketch of such a structure follows.
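A minimal PyTorch sketch of this eight-layer structure; the 1×32×32 grayscale input size, channel counts, kernel sizes and tanh activations are assumptions, since the patent fixes only the roles of the layers:

```python
import torch
import torch.nn as nn

class HighBeamNet(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Layers 1-6: feature extraction part (trained by CNN backpropagation).
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # layer 2: convolution
            nn.Tanh(),
            nn.AvgPool2d(2),                   # layer 3: pooling (down-sampling)
            nn.Conv2d(6, 16, kernel_size=5),   # layer 4: convolution
            nn.Tanh(),
            nn.AvgPool2d(2),                   # layer 5: pooling
            nn.Flatten(),                      # layer 6: rasterize features into one row
        )
        # Layers 7-8: classifier part (trained by least squares estimation).
        self.fc = nn.Linear(16 * 5 * 5, 120)       # layer 7: fully connected
        self.out = nn.Linear(120, num_classes)     # layer 8: low beam / fog lamp / high beam

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.fc(self.features(x)))
        return self.out(h)
```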
The deep learning module based on CNN + LSE is trained as follows:
A sample (F_{i,j}, k) is taken from the labeled database; a graying operation is first performed on F_{i,j} to turn the key frame into a grayscale image, and the grayed key frame F'_{i,j} is then input into the module, i.e., the input data are (F'_{i,j}, k). The two parts of the deep learning module are trained with the CNN (convolutional neural network backpropagation) and LSE (least squares estimation) methods respectively. The parameter training method of the feature extraction part is as follows:
(2-A1) initializing all connection weight parameters of the feature extraction part in the deep learning module;
(2-A2) calculating the actual output label O_k corresponding to the input key frame;
(2-A3) calculating the difference between the actual output label O_k and the corresponding ideal output label k;
(2-A4) weight learning: adjusting the connection weight parameter matrix of the feature extraction part in the deep learning module by backpropagation so as to minimize the error;
(2-A5) repeating until all key frames of the video data have been traversed, at which point the parameter training is finished. A sketch of this loop follows.
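A minimal sketch of loop (2-A1)-(2-A5), assuming the HighBeamNet sketch above and a loader of (grayed key frame, label) pairs; the optimizer, learning rate and the +1 label shift are illustrative choices:

```python
import torch
import torch.nn as nn

def train_feature_extractor(model, loader, epochs: int = 10):
    criterion = nn.CrossEntropyLoss()
    # (2-A1): PyTorch randomly initializes the connection weights at construction.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        for frames, labels in loader:             # (2-A5): traverse all key frames
            logits = model(frames)                # (2-A2): actual output O_k
            loss = criterion(logits, labels + 1)  # (2-A3): error vs. ideal k in {-1,0,1}
            optimizer.zero_grad()
            loss.backward()                       # (2-A4): backpropagate the error
            optimizer.step()                      # (2-A4): adjust the weight matrices
```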
the parameter training method of the classifier part is as follows:
(2-B1) the connection weights and biases between the rasterized layer and the fully connected layer are generated randomly, and the output of the fully connected layer is written as the N×L matrix H whose (j,i) entry is G(a_i·x_j + b_i), where G(·) is the activation function, a_i are the connection weights, b_i are the biases, L is the number of nodes of the fully connected layer, N is the number of all key frames, x_j is a key frame, i = 1, 2, ..., L and j = 1, 2, ..., N;
(2-B2) the network output results of the corresponding key frames are written as the output vector Y = [y_1 y_2 ... y_N]^T, where y_j is the output label corresponding to the j-th key frame x_j;
(2-B3) the output weights between the fully connected layer and the output layer are calculated as the least squares solution β = P·H^T·Y, where P = (H^T·H)^{-1}. A sketch of these three steps follows.
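A minimal NumPy sketch of steps (2-B1)-(2-B3), assuming tanh as the activation G and that `feats` holds the rasterized CNN features of the N key frames; the dimensions are illustrative:

```python
import numpy as np

def train_classifier_lse(feats: np.ndarray, labels: np.ndarray, L: int = 120):
    """feats: (N, d) rasterized features; labels: (N,) values in {-1, 0, 1}."""
    N, d = feats.shape
    rng = np.random.default_rng(0)
    a = rng.standard_normal((d, L))          # (2-B1): random connection weights
    b = rng.standard_normal(L)               # (2-B1): random biases
    H = np.tanh(feats @ a + b)               # H[j, i] = G(a_i . x_j + b_i)
    Y = labels.astype(float).reshape(-1, 1)  # (2-B2): Y = [y_1 ... y_N]^T
    P = np.linalg.inv(H.T @ H)               # (2-B3): P = (H^T H)^-1
    beta = P @ H.T @ Y                       # (2-B3): beta = P H^T Y (least squares)
    return a, b, beta                        # np.linalg.pinv is safer if H^T H is ill-conditioned
```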
In step (3), the data in the database of violation results to be checked are the video data judged as violations by the recognition result processing module; these pending results should be checked manually, after which the confirmed information is imported into the violation database and the misjudged information is deleted.
In step (3), the method for judging whether a high beam violation exists is as follows: compute the time interval ΔT = j_2 − j_1 between a key frame F_{i,j_1} labeled as high beam and its next key frame F_{i,j_2}; if ΔT ≥ θ, the vehicle is using the high beam in violation of the regulations, where θ is the violation time threshold. A sketch of this test follows.
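A minimal sketch of this test, assuming key frames arrive as (time index, label) pairs in temporal order, with label 1 denoting high beam and θ expressed in the same units as the time indices:

```python
def has_high_beam_violation(key_frames, theta: float) -> bool:
    """key_frames: list of (j, label) pairs; label 1 means high beam."""
    for (j1, label), (j2, _) in zip(key_frames, key_frames[1:]):
        if label == 1 and (j2 - j1) >= theta:  # ΔT = j2 - j1 ≥ θ
            return True                        # high beam held too long: violation
    return False
```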
The invention has the beneficial effects that:
the invention automatically analyzes and identifies the monitoring video, ensures the completeness of law enforcement evidence, is similar to manual judgment, has intelligence, is simple in equipment arrangement, and can fully utilize the original monitoring equipment. The method comprises the following specific steps:
(1) by mining the video data, the sufficiency of law enforcement evidence is greatly improved while accuracy is ensured, preventing breaks in the evidence chain when a high beam violation occurs;
(2) the requirement on the number of devices at a single location is low, and the large amount of monitoring equipment already deployed can be reused directly, reducing cost and improving equipment utilization;
(3) intelligent judgment of high beam violations based on video deep learning replaces manual law enforcement, realizing true automation and improving efficiency; moreover, after deep learning, the recognition of high beam violations is expected to reach or exceed the level of manual recognition, realizing true intelligence of the recognition system;
(4) the deep learning module learns the system parameters with the CNN+LSE method, so parameter learning is faster, the generalization capability of the module is stronger, and the robustness of the system is improved.
Drawings
FIG. 1 is a schematic diagram of the system architecture of the present invention;
fig. 2 is a diagram of a CNN + LSE-based deep learning module architecture.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, an automobile high beam identification system based on video deep learning includes the following two parts:
the foreground part is used for realizing the identification and processing of the high beam violation behaviors and comprises a road monitoring equipment module, a video processing and identifying module, an identification result processing module and a database of the violation results to be detected, which are connected in sequence;
the background part is used for processing the video and realizing deep learning on it, and comprises a key frame extraction algorithm, a labeled database and a deep learning module, wherein the labeled database is constructed by calling the key frame extraction algorithm to extract key frames from the original video data, the data in the labeled database are used for training the deep learning module, and the trained deep learning module, together with the key frame extraction algorithm, is called by the video processing and recognition module.
The key frame extraction algorithm is a key frame extraction algorithm based on clustering; the deep learning module is a CNN + LSE-based deep learning module.
The system corresponds to an automobile high beam identification method based on video deep learning, and the method specifically comprises the following steps:
(1) the road monitoring equipment module obtains the driving video data of the automobile and transmits the driving video data to the video processing and identifying module.
(2) The video processing and recognition module calls the key frame extraction algorithm to extract key frames from the original video data and then performs a graying operation; taking the grayed key frames as input, it calls the CNN+LSE-based deep learning module trained on the labeled database to obtain the output label of each key frame, the labels comprising low beam, fog lamp or high beam, and assigns each label to the corresponding key frame image.
The key frame extraction algorithm is as follows:
(2-1) take the i-th segment V_i in the original video database, extract n frames at equal time intervals, and use F_{i,j} to name the frame at the j-th moment of the i-th piece of video data, so that the key frame sequence of the corresponding video data is denoted {F_{i,1}, F_{i,2}, ..., F_{i,n}}, where F_{i,1} is the first frame and F_{i,n} is the last frame; the similarity between two adjacent frames is defined as the similarity of their histograms (i.e., the histogram feature difference), and a predefined threshold δ controls the clustering density; i, j and n are all integers;
(2-2) select the first frame F_{i,1} as the initial cluster center and calculate the similarity between a frame F_{i,j} and the initial cluster center; if the value is less than δ, the distance between the frame and the cluster center frame is judged too large, so F_{i,j} cannot be added to that cluster; if the similarity between F_{i,j} and every cluster center is less than δ, F_{i,j} forms a new cluster and becomes its center; otherwise, the frame is added to the cluster with which it has the greatest similarity, so that the distance between the frame and that cluster's center is smallest;
(2-3) repeat (2-2) until the n frames extracted from the original video data V_i are assigned to their clusters; the key frames can then be selected: from each cluster, the frame nearest to the cluster center is extracted as the representative frame of that cluster, and the representative frames of all clusters constitute the key frames of the original video data V_i.
The construction method of the database with the labels comprises the following steps:
the method comprises the steps of taking a large amount of vehicle running video data under a big data background as original video data, calling a key frame extraction algorithm based on clustering to the original video data to extract key frames, manually judging the light types of vehicles in the key frames, and adding labels to each key frame to enable the original key frames to become labeled data, wherein the label types comprise: three types of dipped headlight, fog light and high beam are respectively represented by-1, 0 and 1; storing the key frame data with the label into a labeled database, wherein the data in the labeled database are the original video data and the labeled key frame thereof, and the labeled key frame is represented as (F)i,jK), where k takes the value-1, 0 or 1.
As shown in fig. 2, the CNN+LSE-based deep learning module is constructed by adopting the LeNet-5 convolutional neural network structure. The module is divided into eight layers: the first six layers form the feature extraction part and the last two layers form the classifier part, where the feature extraction layers adopt a classical convolutional neural network structure and the classifier layers adopt a fully connected structure. The module takes the data in the labeled database as training data and is trained with the combined CNN+LSE algorithm: the feature extraction part is trained with the CNN method and the classifier layers with the LSE method, so as to realize fast learning of the module parameters and enhance the generalization capability of the module. The specific method is as follows: a video key frame from the labeled database is input to the first layer of the CNN+LSE-based deep learning module; the second layer performs convolution operations on the output of the previous layer with different convolution kernels; the third layer performs pooling (down-sampling) on the output of the previous layer; the fourth and fifth layers repeat the operations of the second and third layers; the sixth layer expands the output features of the previous layer in order and arranges them into one row; the seventh layer is fully interconnected with the output features of the previous layer; the last layer is likewise fully interconnected with the previous layer. The output of the CNN+LSE-based deep learning module falls into three cases: low beam, fog lamp and high beam, denoted -1, 0 and 1 respectively.
The deep learning module based on CNN + LSE is trained as follows:
A sample (F_{i,j}, k) is taken from the labeled database; a graying operation is first performed on F_{i,j} to turn the key frame into a grayscale image, and the grayed key frame F'_{i,j} is then input into the module, i.e., the input data are (F'_{i,j}, k). The two parts of the deep learning module are trained with the CNN and LSE methods respectively. The parameter training method of the feature extraction part is as follows:
(2-A1) initializing all connection weight parameters of the feature extraction part in the deep learning module;
(2-A2) calculating the actual output label O_k corresponding to the input key frame;
(2-A3) calculating the difference between the actual output label O_k and the corresponding ideal output label k;
(2-A4) weight learning: adjusting the connection weight parameter matrix of the feature extraction part in the deep learning module by backpropagation so as to minimize the error;
(2-A5) repeating until all key frames of the video data have been traversed, at which point the parameter training is finished.
the parameter training method of the classifier part is as follows:
(2-B1) the connection weights and biases between the rasterized layer and the fully connected layer are generated randomly, and the output of the fully connected layer is written as the N×L matrix H whose (j,i) entry is G(a_i·x_j + b_i), where G(·) is the activation function, a_i are the connection weights, b_i are the biases, L is the number of nodes of the fully connected layer, N is the number of all key frames, x_j is a key frame, i = 1, 2, ..., L and j = 1, 2, ..., N;
(2-B2) the network output results of the corresponding key frames are written as the output vector Y = [y_1 y_2 ... y_N]^T, where y_j is the output label corresponding to the j-th key frame x_j;
(2-B3) the output weights between the fully connected layer and the output layer are calculated as the least squares solution β = P·H^T·Y, where P = (H^T·H)^{-1}.
(3) The original video data and the labeled key frames obtained in step (2) together serve as the input of the recognition result processing module, which judges whether the vehicle violates the regulations; a license plate recognition system is embedded in the recognition result processing module, and when a target vehicle exhibits high beam violation behavior, its license plate is extracted, the vehicle information is acquired, and the suspected violation video data are imported into the database of violation results to be checked.
The method for judging whether a high beam violation exists is as follows: compute the time interval ΔT = j_2 − j_1 between a key frame F_{i,j_1} labeled as high beam and its next key frame F_{i,j_2}; if ΔT ≥ θ, the vehicle is using the high beam in violation of the regulations, where θ is the violation time threshold.
(4) The data in the database of violation results to be checked are the video data judged as violations by the recognition result processing module; these pending results should be checked manually, after which the confirmed information is imported into the violation database and the misjudged information is deleted.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the invention is not limited thereto; various modifications and variations that those skilled in the art can make without inventive effort fall within the scope of the invention.

Claims (2)

1. A method for identifying automobile high beams based on video deep learning, characterized in that the specific steps are as follows:

(1) a road monitoring equipment module obtains the driving video data of an automobile and transmits it to a video processing and recognition module;

(2) the video processing and recognition module calls a key frame extraction algorithm to extract key frames from the original video data and then performs a graying operation; taking the grayed key frames as input, it calls a CNN+LSE-based deep learning module trained on a labeled database to obtain the output label of each key frame, the labels comprising low beam, fog lamp or high beam, and assigns each label to the corresponding key frame image;

(3) the original video data and the labeled key frames obtained in step (2) together serve as the input of a recognition result processing module, which judges whether the vehicle violates the regulations; a license plate recognition system is embedded in the recognition result processing module, and when a target vehicle exhibits high beam violation behavior, its license plate is extracted, the vehicle information is acquired, and the suspected violation video data are imported into a database of violation results to be checked;

in step (3), the method for judging whether a high beam violation exists is: compute the time interval ΔT = j_2 − j_1 between a key frame F_{i,j_1} labeled as high beam and its next key frame F_{i,j_2}; if ΔT ≥ θ, the vehicle is using the high beam in violation of the regulations, where θ is the violation time threshold;

in step (2), the key frame extraction algorithm is as follows:

(2-1) take the i-th segment V_i in the original video database, extract n frames at equal time intervals, and use F_{i,j} to name the frame at the j-th moment of the i-th piece of video data, so that the key frame sequence of the corresponding video data is denoted {F_{i,1}, F_{i,2}, ..., F_{i,n}}, where F_{i,1} is the first frame and F_{i,n} is the last frame; the similarity between two adjacent frames is defined as the similarity of their histograms, i.e., the histogram feature difference, and a predefined threshold δ controls the clustering density; i, j and n are all integers;

(2-2) select the first frame F_{i,1} as the initial cluster center and calculate the similarity between a frame F_{i,j} and the initial cluster center; if the similarity is less than δ, the distance between the frame and the cluster center frame is judged too large, so F_{i,j} cannot be added to that cluster; if the similarity between F_{i,j} and every cluster center is less than δ, F_{i,j} forms a new cluster and becomes its center; otherwise, the frame F_{i,j} is added to the cluster with which it has the greatest similarity, so that the distance between the frame F_{i,j} and that cluster's center is smallest;

(2-3) repeat (2-2) until the n frames extracted from the original video data V_i are assigned to their clusters; the key frames can then be selected: from each cluster, the frame nearest to the cluster center is extracted as the representative frame of that cluster, and the representative frames of all clusters constitute the key frames of the original video data V_i;

in step (2), the labeled database is constructed as follows:

a large amount of vehicle driving video data under a big-data background is taken as the original video data; the clustering-based key frame extraction algorithm is called on the original video data to extract key frames; the light type of the vehicle in each key frame is judged manually, and a label is added to each key frame so that the original key frames become labeled data, the label categories comprising low beam, fog lamp and high beam, represented by -1, 0 and 1 respectively; the labeled key frame data are stored in the labeled database, whose data are the original video data and their labeled key frames, represented as (F_{i,j}, k), where k takes the value -1, 0 or 1;

in step (2), the CNN+LSE-based deep learning module is constructed by adopting the LeNet-5 convolutional neural network structure; the module is divided into eight layers, the first six layers being the feature extraction part and the last two layers the classifier part, where the feature extraction layers adopt a classical convolutional neural network structure and the classifier layers adopt a fully connected structure; the data in the labeled database serve as training data, and the deep learning module is trained with the combined CNN+LSE algorithm, the feature extraction part being trained with the CNN method and the classifier layers with the LSE method;

the training process of the CNN+LSE-based deep learning module is as follows:

a sample (F_{i,j}, k) is taken from the labeled database; a graying operation is first performed on F_{i,j} to turn it into a grayscale image, and the grayed key frame F'_{i,j} is then input into the module, i.e., the input data are (F'_{i,j}, k); the two parts of the deep learning module are trained with the CNN and LSE methods respectively; the parameter training method of the feature extraction part is as follows:

(2-A1) initialize all connection weight parameters of the feature extraction part in the deep learning module;

(2-A2) calculate the actual output label O_k corresponding to the input key frame;

(2-A3) calculate the difference between the actual output label O_k and the corresponding ideal output label k;

(2-A4) weight learning: adjust the connection weight parameter matrix of the feature extraction part in the deep learning module by backpropagation so as to minimize the error;

(2-A5) repeat until all key frames of the video data have been traversed; the parameter training is then finished;

the parameter training method of the classifier part is as follows:

(2-B1) the connection weights and biases between the rasterized layer and the fully connected layer are generated randomly, and the output of the fully connected layer is written as the N×L matrix H whose (j,i) entry is G(a_i·x_j + b_i), where G(·) is the activation function, a_i are the connection weights, b_i are the biases, L is the number of nodes of the fully connected layer, N is the number of all key frames, x_j is a key frame, i = 1, 2, ..., L and j = 1, 2, ..., N;

(2-B2) the network output results of the corresponding key frames are written as the output vector Y = [y_1 y_2 ... y_N]^T, where y_j is the output label corresponding to the j-th key frame x_j;

(2-B3) the output weights between the fully connected layer and the output layer are calculated as β = P·H^T·Y, where P = (H^T·H)^{-1}.

2. The method according to claim 1, characterized in that in step (3) the data in the database of violation results to be checked are the video data judged as violations by the recognition result processing module; these pending results should be checked manually, after which the confirmed information is imported into the violation database and the misjudged information is deleted.
CN201710156201.4A 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning Active CN106934378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710156201.4A CN106934378B (en) 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710156201.4A CN106934378B (en) 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning

Publications (2)

Publication Number Publication Date
CN106934378A CN106934378A (en) 2017-07-07
CN106934378B (en) 2020-04-24

Family

ID=59432614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710156201.4A Active CN106934378B (en) 2017-03-16 2017-03-16 Automobile high beam identification system and method based on video deep learning

Country Status (1)

Country Link
CN (1) CN106934378B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6729516B2 (en) * 2017-07-27 2020-07-22 トヨタ自動車株式会社 Identification device
CN108229447B (en) * 2018-02-11 2021-06-11 陕西联森电子科技有限公司 High beam light detection method based on video stream
CN108921060A (en) * 2018-06-20 2018-11-30 安徽金赛弗信息技术有限公司 Motor vehicle based on deep learning does not use according to regulations clearance lamps intelligent identification Method
CN108932853B (en) * 2018-06-22 2021-03-30 安徽科力信息产业有限责任公司 Method and device for recording illegal parking behaviors of multiple motor vehicles
CN109191419B (en) * 2018-06-25 2021-06-29 国网智能科技股份有限公司 Real-time pressing plate detection and state recognition system and method based on machine learning
CN108986476B (en) * 2018-08-07 2019-12-06 安徽金赛弗信息技术有限公司 method, system and storage medium for recognizing non-use of high beam by motor vehicle according to regulations
CN109934106A (en) * 2019-01-30 2019-06-25 长视科技股份有限公司 A kind of user behavior analysis method based on video image deep learning
CN110046547A (en) * 2019-03-06 2019-07-23 深圳市麦谷科技有限公司 Report method, system, computer equipment and storage medium violating the regulations
CN111680638B (en) * 2020-06-11 2020-12-29 深圳北斗应用技术研究院有限公司 Passenger path identification method and passenger flow clearing method based on same

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942751A (en) * 2014-04-28 2014-07-23 中央民族大学 Method for extracting video key frame
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN106407931A (en) * 2016-09-19 2017-02-15 杭州电子科技大学 Novel deep convolution neural network moving vehicle detection method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9978013B2 (en) * 2014-07-16 2018-05-22 Deep Learning Analytics, LLC Systems and methods for recognizing objects in radar imagery

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942751A (en) * 2014-04-28 2014-07-23 中央民族大学 Method for extracting video key frame
CN105590102A (en) * 2015-12-30 2016-05-18 中通服公众信息产业股份有限公司 Front car face identification method based on deep learning
CN106407931A (en) * 2016-09-19 2017-02-15 杭州电子科技大学 Novel deep convolution neural network moving vehicle detection method

Also Published As

Publication number Publication date
CN106934378A (en) 2017-07-07

Similar Documents

Publication Publication Date Title
CN106934378B (en) Automobile high beam identification system and method based on video deep learning
JP6820030B2 (en) A method and device for learning using a plurality of labeled databases having different label sets, and a test method and device using this {LEARNING METHOD AND LEARNING DEVICE USING MULTIPLE LABELED DATABASES WITH DIFFERENT LAB THE SAME}
EP3289528B1 (en) Filter specificity as training criterion for neural networks
CN107729818B (en) Multi-feature fusion vehicle re-identification method based on deep learning
CN107563372B (en) License plate positioning method based on deep learning SSD frame
Butt et al. Convolutional neural network based vehicle classification in adverse illuminous conditions for intelligent transportation systems
CN111079640B (en) Vehicle type identification method and system based on automatic amplification sample
CN107316010A (en) A kind of method for recognizing preceding vehicle tail lights and judging its state
CN109993138A (en) A kind of car plate detection and recognition methods and device
CN103810505A (en) Vehicle identification method and system based on multilayer descriptors
CN108875754B (en) A vehicle re-identification method based on multi-depth feature fusion network
CN105825212A (en) Distributed license plate recognition method based on Hadoop
CN111209905B (en) Defect shielding license plate recognition method based on combination of deep learning and OCR technology
CN108960074B (en) Small-size pedestrian target detection method based on deep learning
CN112115761A (en) Countermeasure sample generation method for detecting vulnerability of visual perception system of automatic driving automobile
CN110826415A (en) Method and device for re-identifying vehicles in scene image
CN114565896A (en) A cross-layer fusion improved YOLOv4 road target recognition algorithm
Agarwal et al. Vehicle Characteristic Recognition by Appearance: Computer Vision Methods for Vehicle Make, Color, and License Plate Classification
CN115116035A (en) A road traffic light recognition system and method based on neural network
CN113361491A (en) Method for predicting pedestrian crossing intention of unmanned automobile
CN111931650A (en) Target detection model construction and red light running responsibility tracing method, system, terminal and medium
CN110555425A (en) Video stream real-time pedestrian detection method
Prawinsankar et al. Traffic Congession Detection through Modified Resnet50 and Prediction of Traffic using Clustering
CN112633163B (en) Detection method for realizing illegal operation vehicle detection based on machine learning algorithm
CN113850112A (en) Road Condition Recognition Method and System Based on Siamese Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant