CN114926667B - Image identification method based on cloud edge cooperation - Google Patents
Image identification method based on cloud edge cooperation
- Publication number
- CN114926667B (application CN202210850570.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- anchor point
- model
- cloud
- point information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides an image identification method based on cloud-edge-end collaboration, which solves the problems of data-upload delay, large bandwidth demand and low analysis accuracy in existing end-to-end image detection methods. The main scheme comprises the following steps: S1, the edge side constructs a MobileNet detection network model, performs forward-propagation computation on the uploaded image and generates the corresponding anchor information; the model extracts and outputs the optimal anchor information, judges possible false detection according to whether optimal anchor information is detected in the current picture, and uploads the suspect picture and its optimal anchor information to the cloud. S2, a RetinaNet detection network model is constructed at the cloud to recheck the pictures uploaded by the edge side; whether the optimal anchor information finally output by the cloud model is consistent with the anchor information uploaded by the edge model is judged, and whether the edge model produced a false detection is judged accordingly. S3, the edge model extracts the anchor information corresponding to the false-detection region of the picture as a feature vector, outputs it through a sub-network, and performs cosine-similarity matching and judgment against the feature vectors corresponding to the anchors of the remaining pictures judged to be false detections.
Description
Technical Field
The invention relates to the technical field of cloud edge image recognition, in particular to an image recognition method based on cloud edge collaboration.
Background
Intelligent video analysis in the power-production environment has become increasingly important for the safe production of electric power. Traditional intelligent video analysis methods suffer from weak robustness, while purely cloud-based or purely edge-based intelligent video methods are uneconomical, delayed or prone to false alarms and therefore cannot meet the requirements of safe power production. In recent years, cloud computing and deep learning have achieved outstanding results in intelligent video analysis, and applying them to the safe production of electric power has become a research hotspot; however, power-safety scenes face challenges such as scattered physical distribution and complex natural conditions. Therefore, an intelligent video analysis method based on a cloud-edge collaborative framework combined with deep learning can effectively support the safe production of electric power.
Intelligent video analysis by mainstream end-to-end methods faces the following challenges:
if intelligent video analysis is performed in the cloud, a high-precision deep learning model can be trained with cloud computing, but uploading the large-scale video and image data collected by terminal devices to the cloud often suffers from huge upload delays caused by limited network bandwidth;
if edge computing is performed near the data source, raw data such as images and video can be obtained directly from nearby terminal nodes, but the limited computing power of edge devices forces them to use lightweight deep learning models for video analysis, so the analysis accuracy is often not guaranteed.
Therefore, how to achieve low-delay, high-precision computation for intelligent video analysis through a cloud-edge collaborative framework, so as to guarantee the safe production of electric power, has become an important problem.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a cloud-edge collaboration framework that realizes feature comparison with low-delay, high-precision computation.
In order to solve the technical problems, the invention adopts the technical scheme that: the image identification method based on cloud edge cooperation comprises the following steps:
S1, the edge side constructs a MobileNet detection network model, performs forward-propagation computation on the uploaded image and generates the corresponding anchor information; the model makes predictions on the anchor information through its sub-networks, then suppresses the candidates, extracts the optimal anchor information and outputs it; whether a possible false detection exists is judged according to whether optimal anchor information is detected in the current picture, and the suspect picture and its optimal anchor information are uploaded to the cloud;
S2, a RetinaNet detection network model is constructed at the cloud to recheck the picture uploaded by the edge side in S1; whether the optimal anchor information finally output by the cloud model is consistent with the anchor information uploaded by the edge model is judged, so as to judge whether the edge model produced a false detection; if so, the picture judged to be a false detection is labeled and sent down to the edge side;
and S3, the edge model asynchronously infers on the picture sent down in S2, extracts the anchor information corresponding to the false-detection region of the picture as a feature vector and outputs it through a sub-network; the feature vector is matched by cosine similarity against the feature vectors corresponding to the anchors of the other pictures judged to be false detections; matches above the similarity threshold are judged false alarms, and those below the similarity threshold are asynchronously uploaded to the cloud model for recheck.
Further, the cloud model construction steps are as follows:
constructing a ResNet50 residual network as the backbone;
fusing different feature layers of the ResNet50 with a feature pyramid network FPN through bottom-up, top-down and lateral connections to generate the corresponding feature maps;
assigning anchors of different sizes to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, where K is the number of target classes to be detected and the 4-vector holds the box coordinates; anchors with an IoU greater than 0.5 are regarded as positive samples;
and constructing the sub-networks, comprising a classification sub-network that predicts the probability of target occurrence and a box sub-network that predicts the coordinate offsets of the candidate regions generated from the anchors; the loss function of the classification sub-network is computed with Cross Entropy Loss, and the loss function of the box sub-network with Smooth L1 Loss.
Further, the edge model construction steps are as follows:
sequentially constructing a 10-layer convolutional neural network, in which layers 1 and 2 reduce the dimensions through two-dimensional convolutions with kernel size 3, layers 3, 4, 5, 7 and 9 are inverted residual convolution layers, and layers 6, 8 and 10 are inverted residual convolution layers that introduce a spatial attention mechanism;
fusing different feature layers of the MobileNet with a feature pyramid network FPN through bottom-up, top-down and lateral connections to generate the corresponding feature maps;
assigning anchors of different sizes to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, where K is the number of target classes to be detected and the 4-vector holds the box coordinates; anchors with an IoU greater than 0.5 are regarded as positive samples;
and constructing sub-networks comprising a classification sub-network, a regression sub-network and a fully connected sub-network, where the classification sub-network, the regression sub-network and their loss functions follow the cloud model, and the loss function of the fully connected sub-network is computed based on the softmax loss.
Further, the Cross Entropy Loss function is defined as follows:

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}\alpha\, y_{ic}\log p_{ic}$$

wherein N is the number of samples, C is the number of target classes to be detected, $y_{ic}$ is the label indicating whether the i-th sample belongs to class c (1 if it does, 0 otherwise), $\alpha$ is a hyper-parameter, and $p_{ic}$ is the confidence that the i-th sample is predicted to belong to class c.

$t_i = (t_x, t_y, t_w, t_h)$ is defined as the coordinate vector of the relative position between the i-th predicted region and the anchor reference region, and $t_i^{*} = (t_x^{*}, t_y^{*}, t_w^{*}, t_h^{*})$ as the coordinate vector of the relative position between the i-th ground-truth region and the anchor reference region:

$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a}$$

$$t_x^{*} = \frac{x^{*} - x_a}{w_a},\quad t_y^{*} = \frac{y^{*} - y_a}{h_a},\quad t_w^{*} = \log\frac{w^{*}}{w_a},\quad t_h^{*} = \log\frac{h^{*}}{h_a}$$

wherein (x, y) represents the center coordinates and (w, h) the height and width of a region's bounding box; x, x_a and x* respectively represent the center abscissas of the predicted region, the anchor and the manually labeled ground-truth region; y, y_a and y* respectively represent the corresponding center ordinates.

The Smooth L1 Loss function is defined as follows:

$$L_{reg} = \sum_{i}\mathrm{Smooth}_{L1}(t_i - t_i^{*}),\qquad \mathrm{Smooth}_{L1}(x) = \begin{cases}0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise}\end{cases}$$
further, the cosine similarity matching calculation formula is as follows:
wherein,the feature vectors of the anchor information corresponding to the false detection regions,and determining the residual feature vectors corresponding to the false detection picture anchors in the edge model.
Compared with the prior art, the invention has the following beneficial effects: the cloud-edge collaborative framework realizes low-delay, high-precision computation; the cloud model and the edge model of the whole framework operate cooperatively, which is reliable and stable; large batches of false-detection pictures occurring in the same period can be judged; the model makes its judgments quickly and accurately; and intelligent video analysis can thus further guarantee the safe production of electric power.
Drawings
The disclosure of the present invention is illustrated with reference to the accompanying drawings. It is to be understood that the drawings are designed solely for the purposes of illustration and not as a definition of the limits of the invention. In the drawings, like reference numerals are used to refer to like parts. Wherein:
fig. 1 schematically shows a cloud edge collaboration flow chart according to an embodiment of the present invention;
fig. 2 schematically shows a network structure diagram of a cloud model according to an embodiment of the present invention;
fig. 3 schematically shows the inverted-residual backbone structure of the edge-side model according to an embodiment of the present invention;
fig. 4 schematically shows the network framework of the edge-side model according to an embodiment of the present invention;
fig. 5 schematically shows the sub-network framework of the edge-side model according to an embodiment of the present invention.
Detailed Description
It is easily understood that according to the technical solution of the present invention, a person skilled in the art can propose various alternative structures and implementation ways without changing the spirit of the present invention. Therefore, the following detailed description and the accompanying drawings are merely illustrative of the technical aspects of the present invention, and should not be construed as limiting or restricting the technical aspects of the present invention.
An embodiment according to the present invention is shown in conjunction with fig. 1-5.
The image identification method based on cloud edge cooperation comprises the following steps:
S1, the edge side constructs a MobileNet detection network model, performs forward-propagation computation on the uploaded image and generates the corresponding anchor information; the model makes predictions on the anchor information through its sub-networks, then suppresses the candidates, extracts the optimal anchor information and outputs it; whether a possible false detection exists is judged according to whether optimal anchor information is detected in the current picture, and the suspect picture and its optimal anchor information are uploaded to the cloud;
S2, a RetinaNet detection network model is constructed at the cloud to recheck the picture uploaded by the edge side in S1; whether the optimal anchor information finally output by the cloud model is consistent with the anchor information uploaded by the edge model is judged, so as to judge whether the edge model produced a false detection; if so, the picture judged to be a false detection is labeled and sent down to the edge side;
and S3, the edge model asynchronously infers on the picture sent down in S2, extracts the anchor information corresponding to the false-detection region of the picture as a feature vector and outputs it through a sub-network; the feature vector is matched by cosine similarity against the feature vectors corresponding to the anchors of the other pictures judged to be false detections; matches above the similarity threshold are judged false alarms, and those below the similarity threshold are asynchronously uploaded to the cloud model for recheck. A simplified sketch of this three-step collaboration flow is given below.
As shown in fig. 2, for the establishment of the cloud RetinaNet detection network model:
backbone network
The backbone ResNet50 residual network is constructed: five blocks Res1, Res2, Res3, Res4 and Res5 are built sequentially based on the residual mapping H(x) = F(x) + x, with down-sampling rates of 2^1, 2^2, 2^3, 2^4 and 2^5 respectively. RetinaNet usually selects three of these modules, namely Res3, Res4 and Res5, as the initial detection layers.
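A minimal PyTorch sketch of the residual mapping H(x) = F(x) + x used inside these blocks is given below; the bottleneck channel layout and the projection shortcut follow the common ResNet-50 design and are assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """ResNet-style bottleneck block implementing H(x) = F(x) + x."""

    def __init__(self, in_ch: int, mid_ch: int, stride: int = 1):
        super().__init__()
        out_ch = mid_ch * 4
        self.residual = nn.Sequential(                       # F(x)
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Projection shortcut when the spatial size or channel count changes.
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.residual(x) + self.shortcut(x))   # H(x) = F(x) + x
```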
Feature pyramid network
The feature pyramid network (FPN) is used to fuse different feature layers of the ResNet50 through bottom-up, top-down and lateral connections. The bottom-up path provides Res3, Res4 and Res5, and the top-down path generates the feature maps P3, P4, P5, P6 and P7, where P3 to P5 are computed from Res3 to Res5 and P6 to P7 are added so that the model can better detect large objects. Benefiting from the larger receptive field, this operation ensures that every level has an appropriate resolution and strong semantic features; combined with the target detection algorithm and the Focal Loss, it improves the detection performance.
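The fusion described above can be sketched as follows. The channel counts and the choice of computing P6 from P5 (rather than from Res5 directly) are assumptions; implementations differ on this point.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal feature pyramid: 1x1 lateral convs, top-down upsampling, extra P6/P7."""

    def __init__(self, c3_ch: int, c4_ch: int, c5_ch: int, out_ch: int = 256):
        super().__init__()
        self.lat3 = nn.Conv2d(c3_ch, out_ch, 1)
        self.lat4 = nn.Conv2d(c4_ch, out_ch, 1)
        self.lat5 = nn.Conv2d(c5_ch, out_ch, 1)
        self.smooth3 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.smooth4 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.p6 = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)   # extra levels for large objects
        self.p7 = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)

    def forward(self, res3, res4, res5):
        p5 = self.lat5(res5)
        p4 = self.smooth4(self.lat4(res4) +
                          F.interpolate(p5, size=res4.shape[-2:], mode="nearest"))
        p3 = self.smooth3(self.lat3(res3) +
                          F.interpolate(p4, size=res3.shape[-2:], mode="nearest"))
        p6 = self.p6(p5)
        p7 = self.p7(F.relu(p6))
        return [p3, p4, p5, p6, p7]
```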
Feature map anchor points for models
RetinaNet borrows the idea of the region proposal network (RPN) in Faster R-CNN. The anchor areas corresponding to the five levels P3, P4, P5, P6 and P7 range from 32^2 to 512^2, and the aspect ratios at each pyramid level are {1:2, 1:1, 2:1}; each anchor is given a one-hot classification vector of length K (the number of target classes) and a box-regression vector of length 4, and anchors with an IoU greater than 0.5 against a ground-truth box are regarded as positive samples.
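A sketch of anchor generation and positive-sample assignment is shown below. The per-level scale factors {2^0, 2^(1/3), 2^(2/3)} giving 9 anchors per cell are the usual RetinaNet defaults and are an assumption here; only the one-hot length-K labelling and the IoU > 0.5 rule come from the text above.

```python
import numpy as np

def box_iou(a, b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def level_anchors(feat_h, feat_w, stride, base_size,
                  ratios=(0.5, 1.0, 2.0), scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Anchor boxes (x1, y1, x2, y2) for one pyramid level: 9 anchors per cell."""
    shapes = []
    for r in ratios:                       # r = height / width
        for s in scales:
            area = (base_size * s) ** 2
            w = np.sqrt(area / r)
            shapes.append((w, w * r))
    anchors = []
    for cy in (np.arange(feat_h) + 0.5) * stride:
        for cx in (np.arange(feat_w) + 0.5) * stride:
            for w, h in shapes:
                anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.asarray(anchors)

def assign_labels(anchors, gt_boxes, gt_classes, num_classes, pos_iou=0.5):
    """Give every anchor a length-K one-hot class vector; IoU > 0.5 marks a positive."""
    onehot = np.zeros((len(anchors), num_classes))
    for idx, a in enumerate(anchors):
        ious = [box_iou(a, g) for g in gt_boxes]
        if ious and max(ious) > pos_iou:
            onehot[idx, gt_classes[int(np.argmax(ious))]] = 1.0
    return onehot
```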
Sub-network and loss function
The classification sub-network predicts the probability of target occurrence for each anchor. It is a small FCN attached to the FPN: four 3 × 3 convolutions, each with C filters and a ReLU activation, are stacked on the feature map of every level, followed by a final 3 × 3 convolution with K × A filters, where the K × A outputs give, for each of the A anchors, the probability of each of the K classes.
Finally, Cross Entropy Loss is used for class prediction. To address the imbalance between positive and negative samples, a hyper-parameter $\alpha$ is introduced to control how much positive and negative samples contribute to the overall classification loss; the resulting loss is defined as follows:

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}\alpha\, y_{ic}\log p_{ic}$$

where N is the number of samples, C is the number of target classes to be detected, $y_{ic}$ is the label indicating whether the i-th sample belongs to class c (1 if it does, 0 otherwise), and $p_{ic}$ is the confidence that the i-th sample is predicted to belong to class c.

For hard-to-classify samples, a modulating factor $(1 - p_{ic})^{\gamma}$ is further added, where $\gamma$ is a hyper-parameter. The Focal Loss function is then defined as follows:

$$L_{focal} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}\alpha\, y_{ic}\,(1 - p_{ic})^{\gamma}\log p_{ic}$$
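The formula above is written with softmax-style class confidences. The short sketch below uses the per-class sigmoid formulation that RetinaNet implementations commonly adopt; that choice, and the default values alpha = 0.25 and gamma = 2.0, are assumptions rather than values stated in the text.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Focal loss over per-class logits.

    logits:  (N, K) raw scores from the classification sub-network
    targets: (N, K) one-hot float labels (all zeros for pure background anchors)
    """
    prob = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)        # probability of the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # positive/negative weighting
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```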
the bounding sub-network is used for localization, which can predict the coordinate offset of each Anchor generating candidate region. The frame prediction sub-network and the classification sub-network are processed in parallel, the two structures are similar, and 4 3 × 3 convolutions are superposed on feature of each hierarchy, each convolution layer has C filters and is activated along with ReLU, finally, a 3 × 3 convolution layer with 4 × A filters is added, and 4 is prediction of frame regression 4 coordinates. In the bounding box regression task, the penalty function typically uses Smooth L1 Loss. Let ti denote the coordinate vector of the relative position of the ith prediction region and the Anchor reference regionWherein x, y, w, h respectively represent the x coordinate and y coordinate of the center of the prediction region and the width and height,coordinate vector representing relative position of ith target real area and Anchor reference area。
Wherein,which represents the coordinates of the center of the circle,indicating the height and width of the region's bounding box,respectively represents the central horizontal coordinates of the real areas of the prediction area, the Anchor and the artificial labeling area,and respectively representing the central vertical coordinates of the real areas of the prediction area, the Anchor and the artificial labeling area.
Smooth L1 Loss is defined as follows:
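The box parameterisation and the Smooth L1 penalty above can be sketched directly; boxes are assumed to be in (cx, cy, w, h) form.

```python
import torch

def encode_boxes(boxes: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
    """Relative offsets t = (t_x, t_y, t_w, t_h) of (cx, cy, w, h) boxes against anchors."""
    tx = (boxes[:, 0] - anchors[:, 0]) / anchors[:, 2]
    ty = (boxes[:, 1] - anchors[:, 1]) / anchors[:, 3]
    tw = torch.log(boxes[:, 2] / anchors[:, 2])
    th = torch.log(boxes[:, 3] / anchors[:, 3])
    return torch.stack([tx, ty, tw, th], dim=1)

def smooth_l1(t_pred: torch.Tensor, t_gt: torch.Tensor) -> torch.Tensor:
    """Smooth L1 loss: 0.5 x^2 when |x| < 1, |x| - 0.5 otherwise."""
    diff = (t_pred - t_gt).abs()
    return torch.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()
```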
as shown in fig. 3, 4 and 5, for the establishment of the detection network model based on MobileNet at the edge:
backbone network
The specific steps of the whole construction of the neural network structure are as follows:
constructing the first layer of the neural network, the zeroth layer being the convolutional layer (conv 2d _ 1): the convolution kernel size is 3 × 3, the number of kernels is 32, and the step size is 2. An input image having an input size of 416 × 416 × 3 is subjected to convolution processing, and the output image size is 208 × 208 × 32.
Construct the second layer of the neural network, a convolutional layer (conv2d_2): kernel size 3 × 3, 64 kernels, stride 1. The 208 × 208 × 32 input is convolved to give an output of size 208 × 208 × 64.
Construct the third layer of the neural network, an inverted residual convolution layer (bneck_1): the inverted residual convolution comprises two 1 × 1 convolutions and one 3 × 3 convolution, each followed by a BN layer and a ReLU activation. The 208 × 208 × 64 feature map passes through the inverted residual convolution to give a 208 × 208 × 64 output, which is fed to bneck_2.
Construct the fourth layer of the neural network, an inverted residual convolution layer (bneck_2): the inverted residual convolution comprises two 1 × 1 convolutions and one 3 × 3 convolution, each followed by a BN layer and a ReLU activation. The 208 × 208 × 64 feature map passes through the inverted residual convolution to give a 104 × 104 × 128 output, which is fed to bneck_3.
Construct the fifth layer of the neural network, an inverted residual convolution layer (bneck_3): the 104 × 104 × 128 feature map passes through the inverted residual convolution to give a 52 × 52 × 256 output, which is fed to samBneck_1.
Construct the sixth layer of the neural network, a sam inverted residual convolution layer (samBneck_1): the 52 × 52 × 256 feature map passes through the sam inverted residual convolution to give a 52 × 52 × 256 output, which is fed to bneck_4.
Construct the seventh layer of the neural network, an inverted residual convolution layer (bneck_4): the 52 × 52 × 256 feature map passes through the inverted residual convolution to give a 26 × 26 × 512 output, which is fed to samBneck_2.
Construct the eighth layer of the neural network, a sam inverted residual convolution layer (samBneck_2): the 26 × 26 × 512 feature map passes through the sam inverted residual convolution to give a 26 × 26 × 512 output, which is fed to bneck_5.
Construct the ninth layer of the neural network, an inverted residual convolution layer (bneck_5): the 26 × 26 × 512 feature map passes through the inverted residual convolution to give a 13 × 13 × 1024 output, which is fed to samBneck_3.
Construct the tenth layer of the neural network, a sam inverted residual convolution layer (samBneck_3): the 13 × 13 × 1024 feature map passes through the sam inverted residual convolution to give a 13 × 13 × 1024 output, which is the last backbone feature map.
In the whole 10-layer convolutional neural network, layers 3, 4, 5, 7 and 9 are inverted residual convolution layers. The inverted residual structure (Inverted Residuals) is shown in fig. 3; the residual connection is applied if and only if the input and output have the same number of channels.
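A minimal sketch of such an inverted residual layer is given below. The depthwise 3 × 3 convolution, the expansion factor and the linear (non-ReLU) projection follow the usual inverted-residual design and are assumptions; the text above states that every convolution is followed by BN and ReLU, so a strictly literal version would also activate the final projection.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual block: 1x1 expand -> 3x3 depthwise -> 1x1 project.

    The skip connection is used if and only if the stride is 1 and the input
    and output channel counts match, as described above.
    """

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 4):
        super().__init__()
        mid_ch = in_ch * expand
        self.use_skip = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1,
                      groups=mid_ch, bias=False),              # depthwise convolution
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_skip else out
```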
In the entire 10-layer convolutional neural network, layers 6, 8 and 10 are inverted residual convolution layers that introduce a spatial attention mechanism (Spatial Attention Module). In these three layers a spatial attention module is added to the inverted residual convolution layer, as shown in fig. 4. The module takes the three-dimensional feature map produced by the feature-extraction network as input and generates a two-dimensional map that represents the importance of each spatial region. Because the weight of a local feature should depend not only on the current region but also on its context, the network does not directly use 1 × 1 convolutions; instead it uses two-dimensional convolutions with kernel size 3 to repeatedly reduce the channel dimension to 1/r of its previous value, until the number of output channels is smaller than r.
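A possible reading of this module is sketched below: 3 × 3 convolutions shrink the channel count by a factor of r until fewer than r channels would remain, a final convolution collapses the result into a single-channel sigmoid map, and the input is re-weighted by that map. The exact stopping rule, the sigmoid and the multiplicative re-weighting are assumptions filling in details the text leaves open.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention: squeeze the channel axis into a 2-D importance map."""

    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        layers = []
        ch = channels
        while ch // r >= r:                       # shrink channels by 1/r with 3x3 convs
            layers += [nn.Conv2d(ch, ch // r, 3, padding=1), nn.ReLU(inplace=True)]
            ch = ch // r
        layers += [nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid()]   # 2-D importance map
        self.attend = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.attend(x)                 # re-weight every spatial position
```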
Feature pyramid network
The feature pyramid network (FPN) fuses different feature layers of the MobileNet through bottom-up, top-down and lateral connections. The bottom-up path provides samBneck_1, samBneck_2 and samBneck_3, and the top-down path generates the feature maps P1, P2 and P3, which are computed from samBneck_1, samBneck_2 and samBneck_3 respectively. Thanks to the larger receptive field, this operation ensures that every level has an appropriate resolution and strong semantic features, thereby improving detection performance.
Feature map anchor points for models
The anchor sizes corresponding to the three levels P1, P2 and P3 are 13^2, 26^2 and 52^2 respectively, and the aspect ratios at each pyramid level are {1:2, 1:1, 2:1}; each anchor is given a one-hot classification vector of length K and a box-regression vector of length 4, and anchors with an IoU greater than 0.5 are regarded as positive samples.
Sub-networks
Compared with RetinaNet, the MobileNet model further adds a fully connected sub-network for embedding-space learning on top of the existing classification and regression sub-networks. The classification sub-network, the regression sub-network and their loss functions follow those of RetinaNet and are not described again here. The structure of the fully connected sub-network is shown in fig. 5: a prediction head is flattened into a one-dimensional vector, and a fully connected network converts this flattened vector into a 128-dimensional vector so as to learn the embedding space.
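A minimal sketch of the fully connected embedding head is shown below; `in_features` must equal the flattened size (C·H·W) of the prediction head it is attached to, and the 128-dimensional output matches the embedding dimension described above.

```python
import torch
import torch.nn as nn

class EmbeddingHead(nn.Module):
    """Flatten a prediction-head feature map and project it to a 128-d embedding."""

    def __init__(self, in_features: int, embed_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(in_features, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        flat = torch.flatten(x, start_dim=1)   # (N, C*H*W) one-dimensional vector per sample
        return self.fc(flat)                   # (N, 128) embedding-space vector
```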
Loss function
For the fully connected sub-network, the loss function is an additive angular margin loss obtained by improving the softmax loss.
The softmax loss function is as follows:

$$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_{j}^{T}x_i + b_{j}}} \tag{1}$$

where N is the number of samples, n is the number of classes, $x_i$ is the feature vector of the i-th sample, $y_i$ is the class to which the i-th sample belongs, $W_j$ is the weight vector of the j-th class, and $b_j$ is the bias term of the j-th class.

The bias $b_j$ is first set to 0, and the inner product of the weight and the input is then written as

$$W_j^{T}x_i = \|W_j\|\,\|x_i\|\cos\theta_j \tag{2}$$

The weight $W_j$ is L2-normalised so that $\|W_j\| = 1$: every value of the vector is divided by the L2 norm of the vector, giving a new $W_j$ whose norm is 1. Substituting this into equation (1) gives

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\|x_i\|\cos\theta_{y_i}}}{\sum_{j=1}^{n}e^{\|x_i\|\cos\theta_j}} \tag{3}$$

Then, on the one hand, the input $x_i$ is also L2-normalised and multiplied by a scale parameter s; on the other hand, $\cos\theta_{y_i}$ is replaced by $\cos(\theta_{y_i} + m)$, where the angular margin m is 0.5 by default. This part is the core of the MobileNet detection network, and the formula is very simple. Equation (4), the additive angular margin loss, is then obtained:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}} \tag{4}$$

subject to

$$W_j = \frac{W_j}{\|W_j\|},\qquad x_i = \frac{x_i}{\|x_i\|},\qquad \cos\theta_j = W_j^{T}x_i \tag{5}$$

In the constraints (5), the first two are exactly the normalisation of the weights and the input features.
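A compact sketch of the additive angular margin loss over the 128-dimensional embeddings is given below. The margin m = 0.5 follows the text; the scale s = 64 is an assumed common default, not a value stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAngularMarginLoss(nn.Module):
    """Additive angular margin (ArcFace-style) loss for embedding-space learning."""

    def __init__(self, embed_dim: int, num_classes: int, s: float = 64.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # cos(theta_j) = normalised class weights . normalised embeddings (bias set to 0)
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)
        theta = torch.acos(cos)
        target = F.one_hot(labels, cos.size(1)).bool()
        # Add the angular margin m on the true class only, then rescale by s.
        cos_margin = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * cos_margin, labels)
```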
After the cloud and edge models have been established, the specific detection steps are as follows:
for edge detection:
the method comprises the steps that a side end receives monitoring images uploaded by video equipment, the images are input into a trained MobileNet detection model, forward propagation calculation is carried out on the model, high-level and bottom-level semantic fusion is carried out on feature maps of different scales generated by forward propagation on the basis of an FPN structure, feature maps of 3 different scales are generated, and corresponding 9 pieces of anchor point information are generated in 5 different feature maps;
the feature graph and the anchor point information respectively enter a classification sub-network and a regression sub-network, the classification sub-network predicts the category information of the anchor point, the regression sub-network predicts the position information of the anchor point, and the full-connection sub-network is used for predicting 128 feature vectors of an embedding space, so that the extraction of the feature of the detection target is realized;
and if the current picture detects the target, the edge terminal uploads the current picture to the cloud for rechecking through the network and judges whether the current picture has false detection or not.
After the cloud receives the recheck picture:
the cloud receives images uploaded by the edge, the images are input into a trained RetinaNet detection model, forward propagation calculation is carried out on the model, high-level and bottom-level semantic fusion is carried out on feature maps of different scales generated by forward propagation on the basis of an FPN structure, feature maps of 3 different scales are generated, and corresponding 9 pieces of anchor point information are generated in 5 different feature maps;
the feature map and the anchor point information respectively enter a classification subnetwork and a regression subnetwork, the classification subnetwork predicts the class information of the anchor points, the regression subnetwork predicts the position information of the anchor points, all the anchor points are subjected to non-maximum suppression, the optimal target detection anchor point is extracted, and the rest anchor points are ignored;
and outputting the final predicted target and position information of the Retinonet detection model, judging whether the final predicted target and position information of the Retinonet detection model are consistent with the target and position information uploaded by the MobileNet detection model on the side, if not, judging that the prediction result of the MobileNet detection model on the side is misjudgment, namely that the detection result of the current picture is misjudgment on the side, and marking the picture as misjudgment by the cloud end and issuing the picture to the side through the network.
After the edge terminal asynchronously receives the photo and the false detection information issued by the cloud:
the edge asynchronously reasons the photos sent by the cloud, extracts the false detection area as 128-dimensional features, asynchronously stores the 128-dimensional features and the marking information, outputs 128-dimensional feature vectors, and outputs the 128-dimensional feature vectors through a full-connection sub-network;
similarity matching, namely if the output of the MobileNet detection model has a position target and position information, outputting the 128-dimensional characteristic vector of the MobileNet detection model networkMisdetection feature vector with edge storageCosine similarity matching is carried out, and a similarity matching calculation formula is as follows
And setting a false alarm threshold value to be 0.6, in the subsequent picture detection process, if the matched similarity value is higher than the threshold value, considering the current detected picture as false alarm, if the matched similarity value is lower than the threshold value, judging that the target is identified by the side end, and simultaneously, asynchronously uploading the picture to the cloud for rechecking.
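In this sketch the stored-feature list and the helper names are illustrative; the 0.6 threshold is the value given above.

```python
import numpy as np

FALSE_ALARM_THRESHOLD = 0.6   # similarity threshold described above

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity A.B / (|A||B|) of two 128-d feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_false_alarm(feature: np.ndarray, stored_false_features: list) -> bool:
    """True when the new detection matches any stored false-detection feature."""
    return any(cosine_similarity(feature, f) > FALSE_ALARM_THRESHOLD
               for f in stored_false_features)

# Detections judged false alarms are dropped on the edge; the remaining ones are
# asynchronously uploaded to the cloud model for another recheck.
```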
The technical scope of the present invention is not limited to the above description, and those skilled in the art can make various changes and modifications to the above-described embodiments without departing from the technical spirit of the present invention, and such changes and modifications should fall within the protective scope of the present invention.
Claims (5)
1. The image identification method based on cloud edge-end cooperation is characterized by comprising the following steps:
S1, an edge side constructs a MobileNet detection network model, performs forward-propagation computation on an uploaded image and generates the corresponding anchor information; the model makes predictions on the anchor information through its sub-networks, then suppresses the candidates, extracts the optimal anchor information and outputs it; whether a possible false detection exists is judged according to whether optimal anchor information is detected in the current picture, and the suspect picture and its optimal anchor information are uploaded to the cloud;
S2, a RetinaNet detection network model is constructed at the cloud to recheck the picture uploaded by the edge side in S1; whether the optimal anchor information finally output by the cloud model is consistent with the anchor information uploaded by the edge model is judged, so as to judge whether the edge model produced a false detection; if so, the picture judged to be a false detection is labeled and sent down to the edge side;
and S3, the edge model asynchronously infers on the picture sent down in S2, extracts the anchor information corresponding to the false-detection region of the picture as a feature vector and outputs it through a sub-network; the feature vector is matched by cosine similarity against the feature vectors corresponding to the anchors of the other pictures judged to be false detections; matches above the similarity threshold are judged false alarms, and those below the similarity threshold are asynchronously uploaded to the cloud model for recheck.
2. The image recognition method based on cloud-edge collaboration as claimed in claim 1, wherein the cloud model construction step is as follows:
constructing a ResNet50 residual network as the backbone;
fusing different feature layers of the ResNet50 with a feature pyramid network FPN through bottom-up, top-down and lateral connections to generate the corresponding feature maps;
assigning anchors of different sizes to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, wherein K is the number of target classes to be detected and the 4-vector holds the box coordinates; anchors with an IoU greater than 0.5 are regarded as positive samples;
and constructing the sub-networks, comprising a classification sub-network that predicts the probability of target occurrence and a box sub-network that predicts the coordinate offsets of the candidate regions generated from the anchors, wherein the loss function of the classification sub-network is computed with Cross Entropy Loss and the loss function of the box sub-network with Smooth L1 Loss.
3. The image recognition method based on cloud-edge collaboration as claimed in claim 1 or 2, wherein the edge model construction steps are as follows:
sequentially constructing a 10-layer convolutional neural network, wherein layers 1 and 2 reduce the dimensions through two-dimensional convolutions with kernel size 3, layers 3, 4, 5, 7 and 9 are inverted residual convolution layers, and layers 6, 8 and 10 are inverted residual convolution layers that introduce a spatial attention mechanism;
fusing different feature layers of the MobileNet with a feature pyramid network FPN through bottom-up, top-down and lateral connections to generate the corresponding feature maps;
assigning anchors of different sizes to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, wherein K is the number of target classes to be detected and the 4-vector holds the box coordinates; anchors with an IoU greater than 0.5 are regarded as positive samples;
and constructing sub-networks comprising a classification sub-network, a regression sub-network and a fully connected sub-network, wherein the classification sub-network, the regression sub-network and their loss functions follow the cloud model, and the loss function of the fully connected sub-network is computed based on the softmax loss.
4. The image recognition method based on cloud-edge collaboration as claimed in claim 2, wherein the Cross Entropy Loss function is defined as follows:

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}\alpha\, y_{ic}\log p_{ic}$$

wherein N is the number of samples, C is the number of target classes to be detected, $y_{ic}$ is the label indicating whether the i-th sample belongs to class c, taking the value 1 if it does and 0 otherwise, $\alpha$ is a hyper-parameter, and $p_{ic}$ is the confidence that the i-th sample is predicted to belong to class c;

$t_i = (t_x, t_y, t_w, t_h)$ is defined as the coordinate vector of the relative position between the i-th predicted region and the anchor reference region, and $t_i^{*} = (t_x^{*}, t_y^{*}, t_w^{*}, t_h^{*})$ as the coordinate vector of the relative position between the i-th ground-truth region and the anchor reference region:

$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a}$$

$$t_x^{*} = \frac{x^{*} - x_a}{w_a},\quad t_y^{*} = \frac{y^{*} - y_a}{h_a},\quad t_w^{*} = \log\frac{w^{*}}{w_a},\quad t_h^{*} = \log\frac{h^{*}}{h_a}$$

wherein (x, y) represents the center coordinates and (w, h) the height and width of a region's bounding box; x, x_a and x* respectively represent the center abscissas of the predicted region, the anchor and the manually labeled ground-truth region; y, y_a and y* respectively represent the corresponding center ordinates;

and the Smooth L1 Loss function is defined as follows:

$$L_{reg} = \sum_{i}\mathrm{Smooth}_{L1}(t_i - t_i^{*}),\qquad \mathrm{Smooth}_{L1}(x) = \begin{cases}0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise}\end{cases}$$
5. The image recognition method based on cloud-edge cooperation according to claim 1, wherein the cosine similarity matching is computed as follows:

$$\mathrm{sim}(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n}A_i B_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}}$$

wherein A is the feature vector of the anchor information corresponding to the false-detection region, and B is a feature vector, stored in the edge model, corresponding to the anchor of a picture already judged to be a false detection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210850570.4A CN114926667B (en) | 2022-07-20 | 2022-07-20 | Image identification method based on cloud edge cooperation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210850570.4A CN114926667B (en) | 2022-07-20 | 2022-07-20 | Image identification method based on cloud edge cooperation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114926667A CN114926667A (en) | 2022-08-19 |
CN114926667B true CN114926667B (en) | 2022-11-08 |
Family
ID=82815564
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210850570.4A Active CN114926667B (en) | 2022-07-20 | 2022-07-20 | Image identification method based on cloud edge cooperation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114926667B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115934298B (en) * | 2023-01-12 | 2024-05-31 | 南京南瑞信息通信科技有限公司 | Front-end and back-end collaborative power monitoring MEC unloading method, system and storage medium |
CN116055338B (en) * | 2023-03-28 | 2023-08-11 | 杭州觅睿科技股份有限公司 | False alarm eliminating method, device, equipment and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111784685A (en) * | 2020-07-17 | 2020-10-16 | 国网湖南省电力有限公司 | Power transmission line defect image identification method based on cloud edge cooperative detection |
CN111967305A (en) * | 2020-07-01 | 2020-11-20 | 华南理工大学 | Real-time multi-scale target detection method based on lightweight convolutional neural network |
CN113408087A (en) * | 2021-05-25 | 2021-09-17 | 国网湖北省电力有限公司检修公司 | Substation inspection method based on cloud side system and video intelligent analysis |
CN113989209A (en) * | 2021-10-21 | 2022-01-28 | 武汉大学 | Power line foreign matter detection method based on fast R-CNN |
WO2022082692A1 (en) * | 2020-10-23 | 2022-04-28 | 华为技术有限公司 | Lithography hotspot detection method and apparatus, and storage medium and device |
CN114697324A (en) * | 2022-03-07 | 2022-07-01 | 南京理工大学 | Real-time video analysis and processing method based on edge cloud cooperation |
-
2022
- 2022-07-20 CN CN202210850570.4A patent/CN114926667B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right |
Denomination of invention: Image recognition method based on cloud edge collaboration Granted publication date: 20221108 Pledgee: Hefei high tech Company limited by guarantee Pledgor: ANHUI JUSHI TECHNOLOGY CO.,LTD. Registration number: Y2024980013371 |