CN112115783A - Face feature point detection method, apparatus and device based on deep knowledge transfer - Google Patents
Face feature point detection method, apparatus and device based on deep knowledge transfer
- Publication number: CN112115783A
- Application number: CN202010809064.1A
- Authority: CN (China)
- Prior art keywords: face, network, training, feature, student
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06V40/161 (Human faces): Detection; Localisation; Normalisation
- G06F18/214 (Pattern recognition): Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241 (Pattern recognition): Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 (Neural networks): Combinations of networks
- G06N3/084 (Neural network learning methods): Backpropagation, e.g. using gradient descent
- G06V40/168 (Human faces): Feature extraction; face representation
- G06V40/172 (Human faces): Classification, e.g. identification
- Y02T10/40 (Climate change mitigation in transportation): Engine management systems
Abstract
The embodiment of the invention discloses a face feature point detection method, apparatus and device based on deep knowledge transfer, wherein the method comprises the following steps: providing a face data set, and cropping face images according to the face detection boxes or feature point bounding boxes provided by the data set to obtain a training set, a verification set and a test set; inputting test samples and training samples into an initial face alignment network framework; training the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a trained model; freezing the model parameters of the teacher network, extracting the deep dark knowledge learned by the teacher network and transferring it to the student network to generate the final face alignment network model; and inputting an RGB face image from a natural scene into the final face alignment network model and outputting the face feature point detection result. The invention achieves high face feature point detection accuracy with a low model parameter count and low computational complexity.
Description
Technical Field
The embodiment of the invention relates to the fields of computer vision and digital image processing, and in particular to a face feature point detection method, apparatus and device based on deep knowledge transfer.
Background
Existing face feature point detection methods cannot effectively solve face feature point localization in natural scenes. Complex methods have huge model parameter counts and high computational complexity and cannot meet running-speed requirements, while simple methods cannot cope with interference from factors such as extreme poses, variable illumination and severe occlusion in natural scenes, so their accuracy cannot meet application requirements.
Disclosure of Invention
The embodiment of the invention aims to provide a face feature point detection method, apparatus and device based on deep knowledge transfer, so as to solve the problems of high computational complexity, low running speed and low accuracy in existing face feature point detection.
In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:
in a first aspect, an embodiment of the present invention provides a face feature point detection method based on deep knowledge transfer, including:
s1: providing a face data set containing face feature point annotations, and cropping face images according to the face detection boxes or feature point bounding boxes provided by the data set to obtain a training set, a verification set and a test set;
s2: acquiring training samples from the training set, acquiring test samples from the test set, and inputting the test samples and training samples into an initial face alignment network framework;
s3: setting the parameters of the convolutional neural network, and training the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a trained model;
s4: freezing the model parameters of the teacher network, extracting the deep dark knowledge learned by the teacher network, transferring it to the student network, and supervising the training process of the student network to generate the final face alignment network model;
s5: inputting an RGB face image from a natural scene into the final face alignment network model, and outputting the face feature point detection result.
In one embodiment of the present invention, step S1 includes:
s1-1: providing a WFLW data set comprising N training pictures and M test pictures, wherein each picture carries a label providing face box information, face feature point position information and several attribute annotations, and N and M are positive integers greater than zero;
s1-2: cropping face images according to the face detection boxes provided by the data set, perturbing the detection boxes, and applying random rotation, scaling and flipping to the face images for data enhancement, to obtain the training set, verification set and test set.
In one embodiment of the present invention, the initial face alignment network framework is generated by:
generating the teacher network with an encoder-decoder structure, wherein the encoder performs feature extraction and encoding on the input image, retaining the feature extraction part of the original network and removing the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1 × 1 convolutional layer, and the decoder comprises three combinations of upsampling and convolutional layers;
adding the decoder after the encoder to spatially upsample the image features extracted by the encoder into feature maps, converting the channel dimension of the feature maps to the number of face feature points, and computing the expected face feature point coordinates on each transformed feature map with a spatial softargmax operation;
providing a student network with the EfficientFAN structure for the final face feature point detection, whose decoder likewise comprises three combinations of upsampling and convolutional layers, using EfficientNet-B0 as the backbone of the student network encoder and removing the final average pooling layer of EfficientNet-B0, the fully connected layer used for classification and the final dimension-raising 1 × 1 convolutional layer;
adding a 1 × 1 convolutional layer after the student network decoder, converting the channel number of the upsampled feature maps to the number of face feature points, and computing the face feature point coordinates on the converted feature maps with a spatial softargmax operation.
In one embodiment of the present invention, step S3 includes:
training the teacher network and the student network separately, and optimizing the network parameters with a feature point loss function $L_P$ computed by the Wing loss, which is expressed as follows:

$$L_P = \sum_{i=1}^{2N} f(P_i - G_i), \qquad f(x) = \begin{cases} \omega \ln\!\left(1 + |x|/\epsilon\right), & |x| < \omega \\ |x| - C, & \text{otherwise} \end{cases}$$

wherein $P \in \mathbb{R}^{1 \times 2N}$ is the predicted face feature point coordinate vector, $G \in \mathbb{R}^{1 \times 2N}$ is the ground-truth face feature point coordinate vector, $N$ is the number of face feature points, $\omega$ and $\epsilon$ are preset parameters of $f(x)$, and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant.
In one embodiment of the present invention, in step S4, extracting deep dark knowledge learned by the teacher network includes:
extracting the pixel distribution information on the feature maps with a feature alignment knowledge distillation method, aligning the pixel distributions of the teacher network feature map and the student network feature map, wherein the feature alignment knowledge distillation loss function is as follows:

$$L_{FA} = \left\lVert A - r(B) \right\rVert_2^2$$

wherein $A$ and $B$ are the feature maps of the teacher network and the student network at the same stage, and $r(\cdot)$ is a 1 × 1 convolutional layer for aligning the channel dimensions of the $A$ and $B$ feature maps.
In one embodiment of the present invention, in step S4, transferring the deep dark knowledge to the student network includes:
extracting face structure information at different scales with a block similarity knowledge distillation method, and transferring the structural information of the face image from the teacher network to the student network.
In a second aspect, an embodiment of the present invention further provides a face feature point detection apparatus based on deep knowledge transfer, including:
a providing module, configured to provide a face data set containing face feature point annotations, and to crop face images according to the face detection boxes or feature point bounding boxes provided by the data set to obtain a training set, a verification set and a test set;
an output module;
a control processing module, configured to acquire training samples from the training set, acquire test samples from the test set, and input the test samples and training samples into an initial face alignment network framework; the control processing module is further configured to set the parameters of the convolutional neural network and train the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a trained model; the control processing module is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer it to the student network, and supervise the training process of the student network to generate the final face alignment network model; the control processing module is further configured to input an RGB face image from a natural scene into the final face alignment network model and output the face feature point detection result through the output module.
In a third aspect, an embodiment of the present invention further provides an electronic device, including at least one processor and at least one memory, wherein the memory is configured to store one or more program instructions, and the processor is configured to execute the one or more program instructions to perform the face feature point detection method based on deep knowledge transfer according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium containing one or more program instructions for execution by a processor to perform the face feature point detection method based on deep knowledge transfer according to the first aspect.
The technical solutions provided by the embodiments of the present invention have at least the following advantages:
according to the method, the device and the equipment for detecting the human face feature points based on the deep knowledge migration, provided by the embodiment of the invention, EfficientFAN is used as a simple and effective lightweight model, the up-sampling recovery process of the feature map is rapidly realized based on the decoder structure of up-sampling and deep separable convolution, and the spatial information of the feature map is effectively saved.
Compared with the current advanced large complex model, the method can achieve comparable human face characteristic point detection precision, but the model parameter quantity and the calculation complexity are obviously reduced.
The invention uses a knowledge distillation method and a knowledge migration module to improve the accuracy of positioning the feature points of the EfficientFAN face of a student network, provides a block similarity knowledge distillation method for learning multi-scale structural information of the face, combines the pixel distribution information on a feature alignment knowledge distillation learning feature map, and supervises and guides the training process of the EfficientFAN together. Under the premise of not changing the network structure and not increasing the model parameters, EfficientFAN obtains a more accurate human face characteristic point detection result through a knowledge migration method. Experimental results on a public data set show that EfficientFAN is a simple and effective human face characteristic point detection network, and the knowledge distillation method effectively improves the human face characteristic point detection precision. In combination, EfficientFAN has excellent performance, precision and speed.
Drawings
Fig. 1 is a flowchart of the face feature point detection method based on deep knowledge transfer according to the present invention.
Fig. 2 is a structural block diagram of the face feature point detection apparatus based on deep knowledge transfer according to the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In the description of the present invention, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "coupled" are to be interpreted broadly, e.g., as directly connected or indirectly connected through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Fig. 1 is a flowchart of the face feature point detection method based on deep knowledge transfer according to the present invention. As shown in Fig. 1, the face feature point detection method based on deep knowledge transfer of the present invention includes:
s1: providing a face data set containing the marks of the face characteristic points, and cutting a face image according to a face detection frame or an enclosing frame of the face characteristic points provided by the face data set to obtain a training set, a verification set and a test set.
Specifically, step S1 includes:
s1-1: a WFLW dataset is provided. The data set originates from IEEE Conference on Computer Vision and Pattern Recognition 2018, and comprises 10000 pictures (7500 training pictures and 2500 test pictures). Each picture label provides face frame information, 98 person face feature point location information, and 6 attribute information (pose, expression, lighting, makeup, occlusion, blur), and divides the entire data set into 6 classes of subsets according to the image attribute information.
S1-2: and cutting the face image according to a face detection frame provided by the face data set, disturbing the face detection frame, and applying random rotation, size scaling and overturning to the face image so as to perform data enhancement to obtain a training set, a verification set and a test set.
S2: and acquiring a training sample from the training set, acquiring a test sample from the test set, and inputting the test sample and the training sample into an initial face alignment network framework.
Specifically, the teacher network adopts an encoder-decoder network structure, using EfficientNet-B7 as the backbone of its encoder. The encoder performs feature extraction and encoding on the input image: only the feature extraction part of the original network is retained, the final average pooling layer and the fully connected layer used for classification are removed, and the final dimension-raising 1 × 1 convolutional layer is also removed so that features are extracted from the last inverted residual module. This leaves the teacher feature map with fewer channels than it would have after the 1 × 1 convolutional layer (640 vs. 2048), preserving more of the original feature information without the loss introduced by raising the dimension, which makes it better suited for analysis by the decoder.
A decoder is added after the last inverted residual module of EfficientNet-B7 to spatially upsample the image features extracted by the encoder. A more natural upsampling method is used to increase the spatial dimension of the feature maps: the combination of an upsampling layer and a convolutional layer replaces deconvolution, first spatially upsampling the feature map with a generic upsampling method and then applying a convolution on the upsampled feature map to enrich its transformation.
The present invention uses a combination of three upsampling layers and convolutional layers as the decoder of the face alignment network, added after the encoder. In the network model, depthwise separable convolutions replace traditional convolutions, reducing the computation of the upsampling process.
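For illustration, here is a minimal PyTorch sketch of one such decoder stage, assuming the 2× nearest-neighbor upsampling described in the next paragraph; the channel widths (640 encoder output channels, 128 decoder channels) follow figures given elsewhere in the description, but the exact layer composition is an assumption.

```python
import torch.nn as nn

class DecoderStage(nn.Module):
    """One decoder stage: 2x nearest-neighbor upsampling followed by a
    depthwise separable convolution (depthwise 3x3 + pointwise 1x1)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.upsample(x)                    # double H and W
        x = self.pointwise(self.depthwise(x))   # depthwise separable conv
        return self.act(self.bn(x))

# Three stages stacked after the encoder (illustrative channel widths)
decoder = nn.Sequential(DecoderStage(640, 128),
                        DecoderStage(128, 128),
                        DecoderStage(128, 128))
```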
Specifically, the scale factor of the upsampling layers is set to 2, that is, each upsampling doubles the length and width of the feature map relative to its input, and the upsampling is realized with nearest-neighbor interpolation. After the decoder, spatial heatmaps are generated with a 1 × 1 convolutional layer that converts the channel dimension of the feature map to the number of face feature points. The expected coordinates of the corresponding face feature point are then computed on each transformed feature map with a spatial softargmax operation.
The spatial softargmax operation can be divided into two steps. In the first step, the output feature map H is normalized with a softmax operation:

$$M_{x,y} = \frac{\exp(H_{x,y})}{\sum_{x'}\sum_{y'} \exp(H_{x',y'})}$$

wherein x and y are pixel indices, exp denotes the exponential function, and the resulting M is the normalized feature map. In the second step, the coordinate $P_l$ of feature point $l$ is finally expressed as the expectation of the pixel coordinates under M:

$$P_l = \sum_{x}\sum_{y} (x, y) \cdot M_{x,y}$$
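A minimal PyTorch sketch of these two steps (the tensor layout is an illustrative assumption):

```python
import torch

def spatial_softargmax(heatmaps):
    """heatmaps: (B, L, H, W) -> (B, L, 2) expected (x, y) coordinates."""
    b, l, h, w = heatmaps.shape
    # Step 1: softmax-normalize each heatmap over all of its pixels
    m = torch.softmax(heatmaps.view(b, l, -1), dim=-1).view(b, l, h, w)
    # Step 2: expectation of pixel coordinates under the normalized map
    xs = torch.arange(w, dtype=m.dtype, device=m.device)
    ys = torch.arange(h, dtype=m.dtype, device=m.device)
    x = (m.sum(dim=2) * xs).sum(dim=-1)   # marginalize rows, weight columns
    y = (m.sum(dim=3) * ys).sum(dim=-1)   # marginalize columns, weight rows
    return torch.stack([x, y], dim=-1)
```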
a small and light student Network called Efficient Face Alignment Network (Efficient FAN) has a Network structure similar to that of a teacher Network and is used for final Face feature point detection. EfficientNet-B0 was used as the backbone for the student network EfficientFAN encoder. Like the teacher's network, the encoder of the student's network also removed the last average pooling layer and the fully connected layer for classification in EfficientNet-B0, as well as the 1 × 1 convolutional layer for the last ascending dimension.
Likewise, a combination of three upsampling layers and convolutional layers is used as the decoder of the student network, added after the encoder. The scale factor of each upsampling layer is 2 and the number of output channels of each convolutional layer is 128. A 1 × 1 convolutional layer is added after the decoder of the student network, converting the channel number of the upsampled feature maps from 128 to the number of face feature points.
Finally, the face feature point coordinates are computed on the converted feature maps with a spatial softargmax operation.
Table 1. Student network structure
The concrete structure of the student network is shown in Table 1, where MBConv denotes the mobile inverted bottleneck module used by EfficientNet, DSConv denotes depthwise separable convolution, and k denotes the size of the convolution kernel.
The teacher network, placed above, and the student network, placed below, are organically linked by a Knowledge Transfer module.
Two knowledge distillation methods are used in the efficient face alignment network based on deep knowledge transfer, so that different types of dark knowledge migrate from the teacher network to the student network EfficientFAN.
The feature alignment knowledge distillation method extracts the pixel distribution information on the feature maps and aligns the pixel distributions of the teacher and student network feature maps, bringing the feature map distribution of the student network close to that of the teacher network.
Correspondingly, the block similarity knowledge distillation method extracts face structure information at different scales and transfers the structural information of the face image from the teacher network to the student network, so that the simple student network can learn the face structure of the current image.
Feature alignment distillation aligns the channel dimensions of the feature maps at the same stage of the teacher and student networks and directly uses the difference between the teacher and student feature maps as supervisory information during student network training.
S3: and setting parameters of the convolutional neural network, training an initial face by utilizing a Pythrch to align a teacher network and a student network in a network frame, and generating a training model until a loss function and the maximum iteration number meet preset conditions.
Specifically, the teacher network and the student network are trained separately, and only the feature point loss function $L_P$ is used to optimize the network parameters. $L_P$ is computed by the Wing loss, which can be expressed as follows:

$$L_P = \sum_{i=1}^{2N} f(P_i - G_i), \qquad f(x) = \begin{cases} \omega \ln\!\left(1 + |x|/\epsilon\right), & |x| < \omega \\ |x| - C, & \text{otherwise} \end{cases}$$

wherein $P \in \mathbb{R}^{1 \times 2N}$ is the predicted face feature point coordinate vector, $G \in \mathbb{R}^{1 \times 2N}$ is the ground-truth face feature point coordinate vector, and $N$ is the number of face feature points. $f(x)$ is a specially designed loss function: for smaller errors it behaves as a logarithmic loss function with an offset, and for larger errors it behaves as an L1 loss function. $\omega$ and $\epsilon$ are preset parameters of $f(x)$, and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant.
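A minimal PyTorch sketch of this loss (the default ω and ε values are illustrative, not the patent's chosen settings):

```python
import math
import torch

def wing_loss(pred, target, omega=10.0, epsilon=2.0):
    """Wing loss: logarithmic for small errors, L1-like for large errors.
    pred, target: (B, 2N) landmark coordinate vectors."""
    x = (pred - target).abs()
    c = omega - omega * math.log(1.0 + omega / epsilon)  # the constant C
    loss = torch.where(x < omega,
                       omega * torch.log(1.0 + x / epsilon),
                       x - c)
    return loss.mean()
```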
S4: model parameters of the teacher network are frozen, deep dark knowledge learned by the teacher network is extracted, the deep dark knowledge is transmitted to the student network, and a final face alignment network model is generated in the training process of the student network.
Specifically, the feature alignment knowledge distillation method extracts the pixel distribution information on the feature maps and aligns the pixel distributions of the teacher and student network feature maps, bringing the feature map distribution of the student network close to that of the teacher network. The feature alignment knowledge distillation loss function can be defined as follows:

$$L_{FA} = \left\lVert A - r(B) \right\rVert_2^2$$

wherein $A$ and $B$ are respectively the feature maps of the teacher network and the student network at the same stage, and $r(\cdot)$ is a 1 × 1 convolutional layer for aligning the channel dimensions of the $A$ and $B$ feature maps.
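A minimal PyTorch sketch of this loss; applying the 1 × 1 convolution r to the student feature map (rather than the teacher's) is an assumption:

```python
import torch.nn as nn

class FeatureAlignLoss(nn.Module):
    """Align the student feature map's pixel distribution with the teacher's;
    a learnable 1x1 convolution r matches the student channels to the teacher's."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.r = nn.Conv2d(student_channels, teacher_channels,
                           kernel_size=1, bias=False)

    def forward(self, teacher_feat, student_feat):
        # teacher_feat: (B, C, H, W); student_feat: (B, C', H, W)
        return ((teacher_feat.detach() - self.r(student_feat)) ** 2).mean()
```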
The block similarity knowledge distillation method extracts face structure information at different scales and transfers the structural information of the face image from the teacher network to the student network, so that the simple student network can learn the face structure of the current image.
Relationship graphs at different scales are constructed for the input feature map, and a similarity matrix is computed from each graph. For a feature map of size H × W, the feature map region can be divided into local blocks of different sizes; the size usually satisfies H = W = 2^n. Taking the whole feature map as the connected domain, relationship graphs are built with local blocks of different sizes as nodes, the nodes being local blocks of size 2^k × 2^k with k = 0, 1, …, n−1. A 2^n × 2^n feature map thus yields, for node size 2^k × 2^k, a relationship graph containing 2^(n−k) × 2^(n−k) local blocks, i.e. graph nodes. For simplicity, each 2^k × 2^k local block is aggregated into a 1 × 1 graph node by average pooling. For a feature map with C channels, the vectorization of the i-th node in the constructed relationship graph can be expressed as $f_i \in \mathbb{R}^C$. The similarity between nodes is measured by the cosine similarity of their vectors; the similarity $a_{ij}$ between the i-th node vector $f_i$ and the j-th node vector $f_j$ is computed as:

$$a_{ij} = \frac{f_i^{\top} f_j}{\lVert f_i \rVert_2 \, \lVert f_j \rVert_2}$$

In particular, the intermediate feature maps of the teacher network and the student network at the same stage have the same resolution but different channel numbers. Suppose the teacher feature map is $A \in \mathbb{R}^{C \times H \times W}$ and the student feature map is $B \in \mathbb{R}^{C' \times H \times W}$. In the relationship graph built with 2^k × 2^k local blocks as nodes, the number of nodes is $4^{n-k}$, and computing the similarity between every pair of nodes yields a similarity matrix of size $4^{n-k} \times 4^{n-k}$. Let $a_{ij}^{T,k}$ be the cosine similarity between the i-th and j-th nodes of the teacher graph built from 2^k × 2^k blocks, and $a_{ij}^{S,k}$ the corresponding similarity on the student feature map. The loss function of the block similarity knowledge distillation method can then be summarized as follows, where the feature map size satisfies H = W = 2^n:

$$L_{PS} = \sum_{k=0}^{n-1} \frac{1}{4^{\,2(n-k)}} \sum_{i=1}^{4^{n-k}} \sum_{j=1}^{4^{n-k}} \left( a_{ij}^{T,k} - a_{ij}^{S,k} \right)^2$$
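A minimal PyTorch sketch of this block similarity loss; representing the node sizes by the pooling kernel s = 2^k and normalizing channel vectors so a Gram matrix yields the cosine similarities are implementation choices made here for brevity:

```python
import torch
import torch.nn.functional as F

def patch_similarity_loss(teacher_feat, student_feat, block_sizes=(1, 2, 4)):
    """Block similarity distillation: for each block size s = 2^k, average-pool
    s x s local blocks into graph nodes, build the cosine-similarity matrices
    of teacher and student nodes, and penalize their squared difference.
    The block_sizes tuple is illustrative; the patent uses all 2^k, k = 0..n-1."""
    loss = teacher_feat.new_zeros(())
    for s in block_sizes:
        t = F.avg_pool2d(teacher_feat, kernel_size=s)   # (B, C,  H/s, W/s)
        u = F.avg_pool2d(student_feat, kernel_size=s)   # (B, C', H/s, W/s)
        # Flatten nodes and L2-normalize channel vectors so the Gram matrix
        # holds the pairwise cosine similarities a_ij
        t = F.normalize(t.flatten(2), dim=1)            # (B, C,  M)
        u = F.normalize(u.flatten(2), dim=1)            # (B, C', M)
        a_t = t.transpose(1, 2) @ t                     # (B, M, M) teacher
        a_s = u.transpose(1, 2) @ u                     # (B, M, M) student
        m = a_t.shape[-1]
        loss = loss + ((a_t.detach() - a_s) ** 2).sum(dim=(1, 2)).mean() / (m * m)
    return loss
```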
Combining the feature alignment and block similarity knowledge distillation methods, a knowledge transfer loss function $L_{KT}$ is introduced and supervises the training process of the student network as part of the overall training loss. The student network thus learns not only the ground-truth label information provided by the annotated face feature point coordinates, but also the more refined face structure knowledge and data distribution knowledge extracted from the teacher network. To optimize the performance of the student network EfficientFAN with the knowledge transfer module and the knowledge distillation methods, the parameters of the pre-trained teacher network are kept frozen, the knowledge transfer loss $L_{KT}$ is added to the training loss, and the dark knowledge learned by the teacher network is distilled and passed to the student network during EfficientFAN training, improving the feature point localization accuracy of the student network. The loss function finally used to optimize EfficientFAN combines the feature point loss $L_P$ with $L_{KT}$:

$$L = L_P + \lambda L_{KT}, \qquad L_{KT} = \sum_{d} \left( L_{PS}^{(d)} + L_{FA}^{(d)} \right)$$

wherein $\lambda$ is an adjustable weighting parameter for balancing the influence of the two loss terms, and $L_{PS}^{(d)}$ and $L_{FA}^{(d)}$ are respectively the block similarity knowledge distillation loss and the feature alignment knowledge distillation loss at the d-th stage of the decoder.
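Putting the pieces together, a short sketch of the combined objective using the functions defined above; the per-stage feature lists and the weight lam are illustrative assumptions:

```python
def student_loss(pred, target, teacher_feats, student_feats, fa_losses, lam=0.5):
    """pred/target: (B, 2N) coordinates; teacher_feats/student_feats: lists of
    per-decoder-stage feature maps; fa_losses: matching FeatureAlignLoss modules."""
    l_kt = sum(patch_similarity_loss(tf, sf) + fa(tf, sf)
               for tf, sf, fa in zip(teacher_feats, student_feats, fa_losses))
    return wing_loss(pred, target) + lam * l_kt
```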
S5: and inputting the RGB face image in the natural scene into a final face alignment network model, and outputting a face characteristic point detection result.
According to the face feature point detection method based on deep knowledge transfer provided by the embodiment of the invention, EfficientFAN serves as a simple and effective lightweight model whose decoder, built on upsampling and depthwise separable convolution, quickly restores the spatial resolution of the feature maps and effectively preserves their spatial information.
Compared with current state-of-the-art large complex models, the method achieves comparable face feature point detection accuracy while significantly reducing model parameter count and computational complexity.
The invention uses knowledge distillation methods and a knowledge transfer module to improve the feature point localization accuracy of the student network EfficientFAN: it proposes a block similarity knowledge distillation method for learning multi-scale face structure information, combines it with feature alignment knowledge distillation for learning the pixel distribution information on feature maps, and uses both together to supervise and guide the training of EfficientFAN. Without changing the network structure or increasing the model parameters, EfficientFAN obtains more accurate face feature point detection results through knowledge transfer. Experimental results on public data sets show that EfficientFAN is a simple and effective face feature point detection network and that the knowledge distillation methods effectively improve its detection accuracy. Overall, EfficientFAN delivers excellent accuracy and speed.
Fig. 2 is a structural block diagram of the face feature point detection apparatus based on deep knowledge transfer according to the present invention. As shown in Fig. 2, the face feature point detection apparatus based on deep knowledge transfer of the present invention includes a providing module 100, an output module 200 and a control processing module 300.
The providing module 100 is configured to provide a face data set containing face feature point annotations, and to crop face images according to the face detection boxes or feature point bounding boxes provided by the data set to obtain a training set, a verification set and a test set. The control processing module 300 is configured to acquire training samples from the training set, acquire test samples from the test set, and input the test samples and training samples into an initial face alignment network framework. The control processing module 300 is further configured to set the parameters of the convolutional neural network and train the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a trained model. The control processing module 300 is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer it to the student network, and supervise the training process of the student network to generate the final face alignment network model. The control processing module 300 is further configured to input an RGB face image from a natural scene into the final face alignment network model and output the face feature point detection result through the output module.
It should be noted that the specific implementation of the face feature point detection apparatus based on deep knowledge transfer in the embodiment of the present invention is similar to that of the corresponding method; for details, refer to the description of the method above, which is not repeated here to reduce redundancy.
Other configurations and functions of the apparatus according to the embodiments of the present invention are known to those skilled in the art and are likewise not described in detail.
An embodiment of the present invention further provides an electronic device, including at least one processor and at least one memory, wherein the memory is configured to store one or more program instructions, and the processor is configured to execute the one or more program instructions to perform the face feature point detection method based on deep knowledge transfer described above.
The disclosed embodiments of the present invention also provide a computer-readable storage medium having stored therein computer program instructions which, when run on a computer, cause the computer to execute the above face feature point detection method based on deep knowledge transfer.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by such a processor. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present invention may be implemented directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EEPROM, or a register. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that the functionality described in the present invention may be implemented in a combination of hardware and software in one or more of the examples described above. When implemented in software, the corresponding functionality may be stored on, or transmitted as, one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. A face feature point detection method based on deep knowledge transfer, characterized by comprising the following steps:
s1: providing a face data set containing face feature point annotations, and cropping face images according to the face detection boxes or feature point bounding boxes provided by the data set to obtain a training set, a verification set and a test set;
s2: acquiring training samples from the training set, acquiring test samples from the test set, and inputting the test samples and training samples into an initial face alignment network framework;
s3: setting the parameters of the convolutional neural network, and training the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a trained model;
s4: freezing the model parameters of the teacher network, extracting the deep dark knowledge learned by the teacher network, transferring it to the student network, and supervising the training process of the student network to generate the final face alignment network model;
s5: inputting an RGB face image from a natural scene into the final face alignment network model, and outputting the face feature point detection result.
2. The face feature point detection method based on deep knowledge transfer according to claim 1, wherein step S1 includes:
s1-1: providing a WFLW data set comprising N training pictures and M test pictures, wherein each picture carries a label providing face box information, face feature point position information and several attribute annotations, and N and M are positive integers greater than zero;
s1-2: cropping face images according to the face detection boxes provided by the data set, perturbing the detection boxes, and applying random rotation, scaling and flipping to the face images for data enhancement, to obtain the training set, verification set and test set.
3. The method of claim 1, wherein the initial face alignment network framework is generated by:
generating the teacher network with an encoder-decoder structure, wherein the encoder performs feature extraction and encoding on the input image, retaining the feature extraction part of the original network and removing the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1 × 1 convolutional layer, and the decoder comprises three combinations of upsampling and convolutional layers;
adding the decoder after the encoder to spatially upsample the image features extracted by the encoder into feature maps, converting the channel dimension of the feature maps to the number of face feature points, and computing the expected face feature point coordinates on each transformed feature map with a spatial softargmax operation;
providing a student network with the EfficientFAN structure for the final face feature point detection, whose decoder likewise comprises three combinations of upsampling and convolutional layers, using EfficientNet-B0 as the backbone of the student network encoder and removing the final average pooling layer of EfficientNet-B0, the fully connected layer used for classification and the final dimension-raising 1 × 1 convolutional layer;
adding a 1 × 1 convolutional layer after the student network decoder, converting the channel number of the upsampled feature maps to the number of face feature points, and computing the face feature point coordinates on the converted feature maps with a spatial softargmax operation.
4. The face feature point detection method based on deep knowledge transfer according to claim 3, wherein step S3 includes:
training the teacher network and the student network separately, and optimizing the network parameters with a feature point loss function $L_P$ computed by the Wing loss, which is expressed as follows:

$$L_P = \sum_{i=1}^{2N} f(P_i - G_i), \qquad f(x) = \begin{cases} \omega \ln\!\left(1 + |x|/\epsilon\right), & |x| < \omega \\ |x| - C, & \text{otherwise} \end{cases}$$

wherein $P \in \mathbb{R}^{1 \times 2N}$ is the predicted face feature point coordinate vector, $G \in \mathbb{R}^{1 \times 2N}$ is the ground-truth face feature point coordinate vector, $N$ is the number of face feature points, $\omega$ and $\epsilon$ are preset parameters of $f(x)$, and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant.
5. The face feature point detection method based on deep knowledge transfer according to claim 4, wherein in step S4, extracting the deep dark knowledge learned by the teacher network comprises:
extracting the pixel distribution information on the feature maps with a feature alignment knowledge distillation method, and aligning the pixel distributions of the teacher network feature map and the student network feature map, wherein the feature alignment knowledge distillation loss function is as follows:

$$L_{FA} = \left\lVert A - r(B) \right\rVert_2^2$$

wherein $A$ and $B$ are the feature maps of the teacher network and the student network at the same stage, and $r(\cdot)$ is a 1 × 1 convolutional layer for aligning the channel dimensions of the $A$ and $B$ feature maps.
6. The face feature point detection method based on deep knowledge transfer according to claim 5, wherein in step S4, transferring the deep dark knowledge to the student network comprises:
extracting face structure information at different scales with a block similarity knowledge distillation method, and transferring the structural information of the face image from the teacher network to the student network.
7. A face feature point detection apparatus based on deep knowledge transfer, characterized by comprising:
a providing module, configured to provide a face data set containing face feature point annotations, and to crop face images according to the face detection boxes or feature point bounding boxes provided by the data set to obtain a training set, a verification set and a test set;
an output module;
a control processing module, configured to acquire training samples from the training set, acquire test samples from the test set, and input the test samples and training samples into an initial face alignment network framework; the control processing module is further configured to set the parameters of the convolutional neural network and train the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a trained model; the control processing module is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer it to the student network, and supervise the training process of the student network to generate the final face alignment network model; the control processing module is further configured to input an RGB face image from a natural scene into the final face alignment network model and output the face feature point detection result through the output module.
8. An electronic device, characterized in that the electronic device comprises: at least one processor and at least one memory;
the memory is configured to store one or more program instructions;
the processor is configured to execute the one or more program instructions to perform the face feature point detection method based on deep knowledge transfer according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized by containing one or more program instructions for execution by a processor to perform the face feature point detection method based on deep knowledge transfer according to any one of claims 1 to 6.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010809064.1A | 2020-08-12 | 2020-08-12 | Depth knowledge migration-based face feature point detection method, device and equipment |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010809064.1A | 2020-08-12 | 2020-08-12 | Depth knowledge migration-based face feature point detection method, device and equipment |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112115783A | 2020-12-22 |
| CN112115783B | 2023-11-14 |
Family (ID=73805270)

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010809064.1A (Active, granted as CN112115783B) | Depth knowledge migration-based face feature point detection method, device and equipment | 2020-08-12 | 2020-08-12 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN112115783B (en) |
Cited By (26)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20220156596A1 | 2020-11-17 | 2022-05-19 | A.I.MATICS Inc. | Neural architecture search method based on knowledge distillation |
| US12051004B2 | 2020-11-17 | 2024-07-30 | Aimatics Co., Ltd. | Neural architecture search method based on knowledge distillation |
| CN112634441A | 2020-12-28 | 2021-04-09 | 深圳市人工智能与机器人研究院 | 3D human body model generation method, system and related equipment |
| CN112634441B | 2020-12-28 | 2023-08-22 | 深圳市人工智能与机器人研究院 | 3D human body model generation method, system and related equipment |
| CN112633406A | 2020-12-31 | 2021-04-09 | 天津大学 | Knowledge distillation-based few-sample target detection method |
| CN112767320A | 2020-12-31 | 2021-05-07 | 平安科技(深圳)有限公司 | Image detection method, image detection device, electronic equipment and storage medium |
| WO2022141859A1 | 2020-12-31 | 2022-07-07 | 平安科技(深圳)有限公司 | Image detection method and apparatus, and electronic device and storage medium |
| CN112734632A | 2021-01-05 | 2021-04-30 | 百果园技术(新加坡)有限公司 | Image processing method, image processing device, electronic equipment and readable storage medium |
| CN112734632B | 2021-01-05 | 2024-02-27 | 百果园技术(新加坡)有限公司 | Image processing method, device, electronic equipment and readable storage medium |
| CN112418195A | 2021-01-22 | 2021-02-26 | 电子科技大学中山学院 | Face key point detection method and device, electronic equipment and storage medium |
| WO2022156331A1 | 2021-01-22 | 2022-07-28 | 北京市商汤科技开发有限公司 | Knowledge distillation and image processing method and apparatus, electronic device, and storage medium |
| CN113705317A | 2021-04-14 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Image processing model training method, image processing method and related equipment |
| CN113705317B | 2021-04-14 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Image processing model training method, image processing method and related equipment |
| CN113052144A | 2021-04-30 | 2021-06-29 | 平安科技(深圳)有限公司 | Training method, device and equipment of living human face detection model and storage medium |
| CN113052144B | 2021-04-30 | 2023-02-28 | 平安科技(深圳)有限公司 | Training method, device and equipment of living human face detection model and storage medium |
| CN113343979A | 2021-05-31 | 2021-09-03 | 北京百度网讯科技有限公司 | Method, apparatus, device, medium and program product for training a model |
| CN113343898A | 2021-06-25 | 2021-09-03 | 江苏大学 | Mask shielding face recognition method, device and equipment based on knowledge distillation network |
| CN113470099A | 2021-07-09 | 2021-10-01 | 北京的卢深视科技有限公司 | Depth imaging method, electronic device and storage medium |
| CN113470099B | 2021-07-09 | 2022-03-25 | 北京的卢深视科技有限公司 | Depth imaging method, electronic device and storage medium |
| CN113628635A | 2021-07-19 | 2021-11-09 | 武汉理工大学 | Voice-driven speaking face video generation method based on teacher and student network |
| CN113628635B | 2021-07-19 | 2023-09-15 | 武汉理工大学 | Voice-driven speaker face video generation method based on teacher student network |
| CN113705361A | 2021-08-03 | 2021-11-26 | 北京百度网讯科技有限公司 | Method and device for detecting model in living body and electronic equipment |
| CN113487614A | 2021-09-08 | 2021-10-08 | 四川大学 | Training method and device for fetus ultrasonic standard section image recognition network model |
| CN113947801A | 2021-12-21 | 2022-01-18 | 中科视语(北京)科技有限公司 | Face recognition method and device and electronic equipment |
| CN114821654A | 2022-05-09 | 2022-07-29 | 福州大学 | Human hand detection method fusing local and depth space-time diagram network |
Patent Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2019128646A1 | 2017-12-28 | 2019-07-04 | 深圳励飞科技有限公司 | Face detection method, method and device for training parameters of convolutional neural network, and medium |
| CN108363962A | 2018-01-25 | 2018-08-03 | 南京邮电大学 | Face detection method and system based on multi-level feature deep learning |
| CN110414400A | 2019-07-22 | 2019-11-05 | 中国电建集团成都勘测设计研究院有限公司 | Automatic detection method and system for safety helmet wearing on construction sites |
| CN110674714A | 2019-09-13 | 2020-01-10 | 东南大学 | Human face and human face key point joint detection method based on transfer learning |
Non-Patent Citations (2)
Title |
---|
Liu Lunhaojie; Wang Chenhui; Lu Hui; Wang Jiahao: "Facial Expression Recognition Based on Transfer Convolutional Neural Networks", Computer Knowledge and Technology (电脑知识与技术), no. 07 *
Zhang Yan'an; Wang Hongyu; Xu Fang: "Face Recognition Based on Deep Convolutional Neural Networks and Center Loss", Science Technology and Engineering (科学技术与工程), no. 35 *
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12051004B2 (en) * | 2020-11-17 | 2024-07-30 | Aimatics Co., Ltd. | Neural architecture search method based on knowledge distillation |
US20220156596A1 (en) * | 2020-11-17 | 2022-05-19 | A.I.MATICS Inc. | Neural architecture search method based on knowledge distillation |
CN112634441A (en) * | 2020-12-28 | 2021-04-09 | 深圳市人工智能与机器人研究院 | 3D human body model generation method, system and related equipment |
CN112634441B (en) * | 2020-12-28 | 2023-08-22 | 深圳市人工智能与机器人研究院 | 3D human body model generation method, system and related equipment |
CN112633406A (en) * | 2020-12-31 | 2021-04-09 | 天津大学 | Knowledge distillation-based few-sample target detection method |
CN112767320A (en) * | 2020-12-31 | 2021-05-07 | 平安科技(深圳)有限公司 | Image detection method, image detection device, electronic equipment and storage medium |
WO2022141859A1 (en) * | 2020-12-31 | 2022-07-07 | 平安科技(深圳)有限公司 | Image detection method and apparatus, and electronic device and storage medium |
CN112734632A (en) * | 2021-01-05 | 2021-04-30 | 百果园技术(新加坡)有限公司 | Image processing method, image processing device, electronic equipment and readable storage medium |
CN112734632B (en) * | 2021-01-05 | 2024-02-27 | 百果园技术(新加坡)有限公司 | Image processing method, device, electronic equipment and readable storage medium |
CN112418195A (en) * | 2021-01-22 | 2021-02-26 | 电子科技大学中山学院 | Face key point detection method and device, electronic equipment and storage medium |
WO2022156331A1 (en) * | 2021-01-22 | 2022-07-28 | 北京市商汤科技开发有限公司 | Knowledge distillation and image processing method and apparatus, electronic device, and storage medium |
CN113705317A (en) * | 2021-04-14 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Image processing model training method, image processing method and related equipment |
CN113705317B (en) * | 2021-04-14 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Image processing model training method, image processing method and related equipment |
CN113052144B (en) * | 2021-04-30 | 2023-02-28 | 平安科技(深圳)有限公司 | Training method, device and equipment of living human face detection model and storage medium |
CN113052144A (en) * | 2021-04-30 | 2021-06-29 | 平安科技(深圳)有限公司 | Training method, device and equipment of living human face detection model and storage medium |
CN113343979A (en) * | 2021-05-31 | 2021-09-03 | 北京百度网讯科技有限公司 | Method, apparatus, device, medium and program product for training a model |
CN113343898A (en) * | 2021-06-25 | 2021-09-03 | 江苏大学 | Mask shielding face recognition method, device and equipment based on knowledge distillation network |
CN113470099A (en) * | 2021-07-09 | 2021-10-01 | 北京的卢深视科技有限公司 | Depth imaging method, electronic device and storage medium |
CN113470099B (en) * | 2021-07-09 | 2022-03-25 | 北京的卢深视科技有限公司 | Depth imaging method, electronic device and storage medium |
CN113628635B (en) * | 2021-07-19 | 2023-09-15 | 武汉理工大学 | Voice-driven talking face video generation method based on a teacher-student network
CN113628635A (en) * | 2021-07-19 | 2021-11-09 | 武汉理工大学 | Voice-driven talking face video generation method based on a teacher-student network
CN113705361A (en) * | 2021-08-03 | 2021-11-26 | 北京百度网讯科技有限公司 | Method and device for detecting model in living body and electronic equipment |
CN113487614A (en) * | 2021-09-08 | 2021-10-08 | 四川大学 | Training method and device for a fetal ultrasound standard plane image recognition network model
CN113947801B (en) * | 2021-12-21 | 2022-07-26 | 中科视语(北京)科技有限公司 | Face recognition method and device and electronic equipment |
CN113947801A (en) * | 2021-12-21 | 2022-01-18 | 中科视语(北京)科技有限公司 | Face recognition method and device and electronic equipment |
CN114821654A (en) * | 2022-05-09 | 2022-07-29 | 福州大学 | Human hand detection method fusing local and deep spatio-temporal graph networks
Also Published As
Publication number | Publication date |
---|---|
CN112115783B (en) | 2023-11-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112115783B (en) | Depth knowledge migration-based face feature point detection method, device and equipment | |
CN109299274B (en) | Natural scene text detection method based on full convolution neural network | |
CN109840556B (en) | Image classification and identification method based on twin network | |
CN110569738B (en) | Natural scene text detection method, equipment and medium based on densely connected network | |
Chen et al. | Convolutional neural network based DEM super resolution | |
CN110782420A (en) | Small target feature representation enhancement method based on deep learning | |
CN113066089B (en) | Real-time image semantic segmentation method based on attention guide mechanism | |
CN110599502B (en) | Skin lesion segmentation method based on deep learning | |
CN114913379B (en) | Remote sensing image small sample scene classification method based on multitasking dynamic contrast learning | |
CN109299303B (en) | Hand-drawn sketch retrieval method based on deformable convolution and depth network | |
TWI803243B (en) | Method for expanding images, computer device and storage medium | |
CN116468919A (en) | Image local feature matching method and system | |
Alshehri | A content-based image retrieval method using neural network-based prediction technique | |
CN117197462A (en) | Lightweight foundation cloud segmentation method and system based on multi-scale feature fusion and alignment | |
CN114782752B (en) | Small sample image integrated classification method and device based on self-training | |
CN117693768A (en) | Semantic segmentation model optimization method and device | |
CN116503398B (en) | Insulator pollution flashover detection method and device, electronic equipment and storage medium | |
CN117765363A (en) | Image anomaly detection method and system based on lightweight memory bank | |
Wang et al. | Insulator defect detection based on improved you-only-look-once v4 in complex scenarios | |
CN116778164A (en) | Semantic segmentation method for an improved DeepLabV3+ network based on multi-scale structure | |
CN117115616A (en) | Real-time low-illumination image target detection method based on convolutional neural network | |
CN117115880A (en) | Lightweight face key point detection method based on heavy parameterization | |
CN116343034A (en) | Remote sensing image change detection method, system, electronic equipment and medium | |
CN116246147A (en) | Cross-species target detection method based on cross-layer feature fusion and linear attention optimization | |
CN114913382A (en) | Aerial photography scene classification method based on CBAM-AlexNet convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||