CN115099294B - Flower image classification algorithm based on feature enhancement and decision fusion - Google Patents
Flower image classification algorithm based on feature enhancement and decision fusion
- Publication number
- CN115099294B (application CN202210275922.8A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a flower image classification algorithm based on feature enhancement and decision fusion, and belongs to the technical field of image classification. The method comprises a data preprocessing stage, a feature extraction and enhancement stage, a training stage, and a decision fusion classification stage. The data preprocessing stage crops, scales, and normalizes the data set. The feature extraction and enhancement stage feeds the preprocessed data into a VGG16 model pre-trained on ImageNet to extract multi-layer deep image features, and introduces a feature enhancement strategy to adaptively assign feature weights. The training stage uses the resulting groups of features to train multiple custom softmax classifiers. The decision fusion classification stage introduces information entropy to represent the degree of certainty of each classifier, determines fusion weights from the entropy, and fuses the decisions to realize classification. The adaptively enhanced features of the invention have stronger expressive power, and the fused classifier has stronger classification capability than a single softmax classifier.
Description
Technical Field
The invention relates to a flower image classification algorithm based on feature enhancement and decision fusion, and belongs to the technical field of image classification.
Background
Image classification is a fundamental problem in computer vision and underlies techniques such as image localization, image detection, and image segmentation. Numerous image classification algorithms exist, including classifiers based on hand-crafted features and neural-network-based methods such as convolutional neural networks, generative adversarial networks, and attention mechanisms.
The quality of the extracted features is the key determinant of classification performance. Previous research has mostly focused on improving image preprocessing methods, network structures, activation functions, and the like in order to extract better image features. Little of this work addresses optimizing the features once extracted, or studies how the decisions of multiple softmax classifiers can be fused, so classification accuracy remains to be improved.
Disclosure of Invention
The invention aims to solve the technical problem of providing a flower image classification algorithm based on feature enhancement and decision fusion, so as to address misclassification caused by insufficient feature expression and by the overly arbitrary decisions of a single softmax classifier.
The technical scheme of the invention is as follows: the flower image classification algorithm based on feature enhancement and decision fusion comprises a data preprocessing stage, a feature extraction and enhancement stage, a training stage, and a decision fusion classification stage. The data preprocessing stage crops, scales, and normalizes the data set. The feature extraction and enhancement stage feeds the preprocessed data into a VGG16 model pre-trained on ImageNet to extract multi-layer deep image features, and introduces a feature enhancement strategy to adaptively assign feature weights. The training stage trains multiple custom softmax classifiers using the groups of features from the previous stage. The decision fusion classification stage introduces information entropy to represent the degree of certainty of each classifier, determines fusion weights from the entropy, and fuses the decisions to realize classification.
The method comprises the following specific steps:
step1: and performing corresponding clipping, scaling and normalization operations on the data set.
The Step1 specifically comprises the following steps:
step1.1: the image dataset is divided into a training set, a validation set, and a test set.
Step1.2: and cutting each image in the training set into k square images (the required times of cutting each image can be used for expanding training set data) by adopting a normal random cutting strategy, and cutting each image in the verification set and the test set into one square image by adopting a central cutting strategy.
Step1.3: the images in the training set, the test set and the verification set after clipping are scaled to the size 224×224 required by the VGG16 model by using Lanczos interpolation.
Step1.4: and linearly transforming the pixel values of the images in the training set, the verification set and the test set after scaling from the range of [0,255] to the range of [0,1] by adopting a maximum and minimum normalization mode.
Step2: and sending the preprocessed data into a VGG16 model pre-trained by the ImageNet to extract a plurality of layers of high-level features, and introducing a feature enhancement strategy to adaptively allocate feature weights so as to realize the effect of adaptively enhancing the features.
The Step2 specifically comprises the following steps:
Step2.1: and removing three full-connection layers of the pretrained VGG16 model, and sending the training set, the verification set and the test set into the full-connection layers to extract high-level features.
Step2.2: the features of blocki _ convj layer and the block5_pool layer of the training set, validation set and test set, respectively, are saved.
Step2.3: the feature enhancement mask strategy is used to adaptively enhance the features of blocki _ convj layers to obtain blocki _ convj _en.
If the blocki _ convj output feature map is M ij=Fij (x, y, P), the blocki _ convj _en output feature map is M ij_en=Fij_en (x, y, P), and the feature enhancement mask M en=Fen(x,y,p),Mij_en=Mij×Men is introduced, wherein 1-x, y-W, 1-P, where W represents the side length of the feature map and P represents the number of channels of the feature map.
Step2.4: the blocki _ convj _en layer and the block5_pool layer are respectively spliced into a new tensor Concat _ij.
The step2.3 specifically comprises the following steps:
Step2.3.1: the blocki _ convj layer feature map, called M ij, has dimensions w×w×p, i.e., each of the P channels has a pixel matrix of size w×w, and the P channels are compressed into one channel to obtain an overlay map M stack with dimensions w×w, and an average map M average is obtained, where M ij=Fij (x, y, P) is M stack=m1+m2+…+mP,Maverage=Mstack/P if one of the channel feature maps is denoted as M p.
Step2.3.2: order theRepresenting a threshold that distinguishes between high response areas, which are more likely to be floral expressing areas, and non-high response areas.
Step2.3.3: if M ij = F (x, y, p) > thres, then M en=1+(Mij (x, y, p) -thres)/(a-thres). If M ij =f (x, y, p) < thres, then M en =b, where a and b are hyper-parameters.
Step2.3.4:Mij_en=Mij×Men。
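A minimal NumPy sketch of Steps 2.3.1-2.3.4 follows. The threshold rule is an assumption: the patent defines thres by a formula not reproduced in this text, so the mean of the average map M_average is used here as a stand-in.

```python
# Sketch of the Step2.3 feature-enhancement mask (NumPy; thres rule assumed).
import numpy as np

def enhance_features(M_ij, a=1.5, b=0.5):
    """M_ij: blocki_convj feature map of shape (W, W, P); 1 <= a <= 2, 0 <= b <= 1.
    Returns M_ij_en = M_ij * M_en."""
    P = M_ij.shape[-1]
    M_stack = M_ij.sum(axis=-1)        # Step2.3.1: compress P channels -> (W, W)
    M_average = M_stack / P            # average map
    thres = float(M_average.mean())    # ASSUMED threshold; the patent's formula differs
    M_en = np.where(M_ij > thres,
                    1.0 + (M_ij - thres) / (a - thres),  # Step2.3.3: boost high response
                    b)                                    # suppress non-high response
    return M_ij * M_en                 # Step2.3.4
```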
Step3: multiple custom softmax classifiers are trained using multiple sets of features from the previous stage.
The Step3 specifically comprises the following steps:
Step3.1: a global average pooling layer (GAP) was connected to a fully connected layer containing 1024 nodes and a softmax classifier was constructed using the softmax function as an activation function.
Step3.2: tensors Concat _ij are respectively connected into the classifier, and the output characteristics of blocki _ convj and the output characteristics of the block5_pool layer are adopted for training and verification.
Step3.3: and saving the model trained by Concat _ij to obtain a decision model dec_ brij.
Step4: information entropy is introduced to represent the certainty degree of each classifier, so that decision fusion weights are distributed in a self-adaptive mode, and classification is achieved through decision fusion.
The Step4 specifically comprises the following steps:
Step4.1: using dec_ brij and testing the test set images separately, the probability that each sample belongs to each class is obtained, and for sample x n, the test result p ij(xn)=[pij1(xn)pij2(xn)…pijL(xn of decision branch dec_ brij is obtained, where L represents the total number of classes of samples, and p ijl(xn) represents the probability that decision dec_ brij predicts that sample x n belongs to class L.
Step4.2: information entropy H (x) is introduced to represent the degree of certainty of the decision model for the decision,Wherein i is more than or equal to 1 and less than or equal to 5, j is more than or equal to 1 and less than or equal to 3.
H ij (x) represents the degree of certainty of the decision branch dec_ brij with respect to the classification result, and p ijl (x) represents the probability that the decision branch dec_ brij decides the sample x as the first class.
Step4.3: the specific method of introducing decision weights ω to decide the decision fusion, ω is defined as follows,
Wherein i is more than or equal to 1 and less than or equal to 5, j is more than or equal to 1 and less than or equal to 3. Omega ij represents the decision weight assigned to dec_ brij.
Step4.4: for sample x n, the output result of dec_ brij blended with the decision weight is
Step4.5: for sample x n, the decision fusion probability belonging to each class is P (x n),
The probability that the prediction sample x belongs to the category L after the decision fusion is represented.
Step4.6: for sample x, the final prediction class Label (x) =argmax (P (x)).
The beneficial effects of the invention are as follows: the invention combines feature enhancement with feature fusion and couples softmax classifiers with information entropy, and has the following advantages:
1. the adaptively enhanced features have stronger expressive power;
2. the fused classifier has stronger classification capability than a single softmax classifier.
Drawings
FIG. 1 is a diagram of a feature extraction and enhancement network of the present invention;
FIG. 2 is a block diagram of decision fusion of the present invention;
FIG. 3 is a dimension diagram of an output feature diagram of a network layer, wherein W represents the side length of the feature diagram, and P represents the number of channels;
FIG. 4 is a feature overlay visualization of the present invention;
FIG. 5 is a feature overlay visualization of the present invention with feature enhancement;
Fig. 6 is a flow chart of the steps of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and detailed description.
As shown in FIG. 6, the flower image classification algorithm based on feature enhancement and decision fusion comprises a data preprocessing stage, a feature extraction and enhancement stage, a training stage, and a decision fusion classification stage. The data preprocessing stage crops, scales, and normalizes the data set; the feature extraction and enhancement stage feeds the preprocessed data into a VGG16 model pre-trained on ImageNet to extract multi-layer deep image features, and introduces a feature enhancement strategy to adaptively assign feature weights; the training stage trains multiple custom softmax classifiers using the groups of features from the previous stage; the decision fusion classification stage introduces information entropy to represent the degree of certainty of each classifier, determines fusion weights from the entropy, and fuses the decisions to realize classification.
The method comprises the following specific steps:
step1: and performing corresponding operations such as clipping, scaling, normalization and the like on the data set.
Step2: and sending the preprocessed data into a VGG16 model pre-trained by the ImageNet to extract multi-layer depth image features, and introducing a feature enhancement strategy to adaptively allocate feature weights so as to realize self-adaptive enhancement features.
Step3: multiple custom softmax classifiers are trained using multiple sets of features from the previous stage.
Step4: information entropy is introduced to represent the certainty degree of each classifier, so that decision fusion weights are distributed in a self-adaptive mode, and classification is achieved through decision fusion.
The Step1 specifically comprises the following steps:
step1.1: the image dataset is divided into a training set, a validation set, and a test set.
In this embodiment, the public Oxford 17 Flowers dataset provided by the University of Oxford is used. It consists of 17 flower categories, selected by the Oxford Visual Geometry Group, that are relatively common in the UK. Each category has 80 pictures, for 1360 pictures in total. Because the dataset is small, 50 images per category are randomly selected as the training set, 15 as the validation set, and 15 as the test set. Furthermore, since extracted features are not available for four of the categories, most of the literature tests on 13 flower categories, and this protocol is also used here to evaluate the performance of the invention.
Step1.2: and cutting each image in the training set into k square images (the required times of cutting each image can be used for expanding training set data) by adopting a normal random cutting strategy, and cutting each image in the verification set and the test set into one square image by adopting a central cutting strategy.
In this embodiment, the training, validation, and test sets are each randomly cropped 5 times, expanding the data to 5 times its original size: 3250 training images, 975 validation images, and 975 test images.
Step1.3: the training set, the test set and the validation set after clipping are scaled to the size 224×224 required by the VGG16 model using Lanczos interpolation.
Step1.4: and linearly transforming the pixel values of the scaled training set, verification set and test set from the range of [0,255] to the range of [0,1] by adopting a maximum and minimum normalization mode.
In this embodiment, taking sample x_1 as an example, as shown in FIG. 1: before normalization the sample resolution is 540×500 and the pixel values lie in [0, 255]; after preprocessing the resolution is the 224×224 required by the model and the pixel values lie in [0, 1], as shown in the image normalization portion on the left of FIG. 1. The pixels before normalization are [[[56,56,56,...,26,26,26],[57,58,58,...,26,26,26],[59,60,61,...,26,26,26],...,[6,5,5,...,14,14,14],[6,5,5,...,14,14,14],[6,5,5,...,13,13,13]],[[78,78,78,...,29,29,29],[79,80,80,...,29,29,29],[82,83,84,...,29,29,29],...,[6,5,5,...,16,16,16],[6,5,5,...,16,16,16],[6,5,5,...,15,15,15]]] with shape 540×500×3, where 540×500 is the sample resolution and 3 is the number of RGB channels. The pixels after normalization are [[[0.21960784,0.21960784,0.21960784,...,0.10196079,0.10196079,0.10196079],[0.22352941,0.22745098,0.22745098,...,0.10196079,0.10196079,0.10196079],...,[0.02352941,0.01960784,0.01960784,...,0.05098039,0.05098039,0.05098039]],[[0.11372549,0.11372549,0.11372549,...,0.03921569,0.03921569,0.03921569],[0.11764706,0.12156863,0.12156863,...,0.03921569,0.03921569,0.03921569],...,[0.02352941,0.01960784,0.02745098,...,0.04705882,0.04705882,0.04705882]]] with shape 224×224×3.
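As a quick check, min-max normalization with minimum 0 and maximum 255 reduces to division by 255, which reproduces the values above:

```python
>>> 56 / 255    # first pixel of sample x_1 before normalization
0.2196078431372549
>>> 26 / 255
0.10196078431372549
```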
Step2.1: and removing three full-connection layers of the pretrained VGG16 model, and sending the training set, the verification set and the test set into the full-connection layers to extract high-level features.
Step2.2: the features of blocki _ convj layer and the block5_pool layer of the training set, validation set and test set, respectively, are saved.
In this embodiment, i in blocki _ convj represents the ith network block, j represents the jth convolution layer, block5_conv2 represents the 2 nd convolution layer of the 5 th network block of the VGG16 model, and, taking block5_conv2 as an example, the dimension of the layer of the feature map is (14,14,512), that is, the size of the feature map is 14×14, and the number of channels is 512.
Step2.3: the feature map of blocki _ convj layers is adaptively enhanced using a feature enhancement mask strategy to yield blocki _ convj _en.
If the blocki _ convj output feature map is M ij=Fij (x, y, P), the blocki _ convj _en output feature map is M ij_en=Fij_en (x, y, P), and the feature enhancement mask M en=Fen(x,y,p),Mij_en=Mij×Men is introduced, wherein 1-x, y-W, 1-P, where W represents the side length of the feature map and P represents the number of channels of the feature map.
In this embodiment, taking the examples of the block5_conv2 and the block5_conv2_en, the dimension of the output feature map M 52、M52_en of the block5_conv2_en is 14×14×512, that is, w=14 and p=512.
Step2.4: the blocki _ convj _en layer and the block5_pool layer are respectively spliced into a new tensor Concat _ij.
In this embodiment, taking the blocking 5_conv2_en and the blocking 5_pool layer as examples, the dimension of the feature map of the blocking 5_conv2_en layer is 14×14×512, and a one-dimensional tensor a of 512 parameters is obtained through the global average pooling layer. The dimension of the block5_pool layer feature map is 7×7×512, and a one-dimensional tensor B with 512 parameters is generated through the global average pooling layer. The two tensors A and B are spliced and fused to generate a one-dimensional tensor Concat _52 containing 1024 parameters.
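The shapes in this splicing step can be checked with a few lines of NumPy, with random arrays standing in for the actual feature maps:

```python
import numpy as np
A = np.random.rand(14, 14, 512).mean(axis=(0, 1))  # GAP over block5_conv2_en -> (512,)
B = np.random.rand(7, 7, 512).mean(axis=(0, 1))    # GAP over block5_pool     -> (512,)
concat_52 = np.concatenate([A, B])                  # Concat_52
assert concat_52.shape == (1024,)
```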
Step2.3.1: the blocki _ convj layer feature map (hereinafter referred to as M ij, as in fig. 1) has a dimension of w×w×p (each of the P channels has a pixel matrix of w×w), and the P channels are compressed into one channel to obtain an overlay map M stack having the dimension of w×w, and an average map M average is obtained, where M ij=Fij (x, y, P), if one of the channel feature maps is denoted as M p, is M stack=m1+m2+…+mP,Maverage=Mstack/P.
In this embodiment, the feature map of 512 channels of the block5_conv2 layer is compressed to one channel, and a 14×14 overlay map M stack is obtained, and the block5_conv2 layer feature map overlay map is shown in fig. 4.
Step2.3.2: order theRepresenting a threshold that distinguishes between high response areas, which are more likely to be floral expressing areas, and non-high response areas.
In this embodiment, the sample distinguishes between the high response region and the non-high response region with threshold thres= 0.24598.
Step2.3.3: if M ij = F (x, y, p) > thres, then M en=1+(Mij (x, y, p) -thres)/(a-thres). If it is
M ij =f (x, y, p) < thres, then M en =b.
In this embodiment, a and b are super parameters, a is a factor for enhancing the characteristics of the high response area, and b is a factor for weakening the non-high response area, wherein a is equal to or less than 1 and equal to or less than 2, and b is equal to or less than 0 and equal to or less than 1.
Step2.3.4: M_ij_en = M_ij × M_en.
In this embodiment, the overlay maps of the block5_conv2 layer and the block5_pool layer after adaptive feature enhancement are shown in FIG. 5.
Step3.1: a global average pooling layer (GAP) was connected to a fully connected layer containing 1024 nodes and a softmax classifier was constructed using the softmax function as an activation function.
Step3.2: tensors Concat _ij are respectively connected into the classifier, and the output characteristics of blocki _ convj and the output characteristics of the block5_pool layer are adopted for training and verification.
In this embodiment, an Adam optimizer is used for training, the sample batch is set to 32, 50 epochs are trained each time, and the model is saved each time the average accuracy is higher than the previous Epoch during training.
Step3.3: and saving the model trained by Concat _ij to obtain a decision model dec_ brij.
In this embodiment, taking dec_br52 as an example, dec_br52 represents a decision model obtained by training a classifier by splicing block5_conv2_en and block5_pool.
Step4.1: using dec_ brij and testing the test set images separately, the probability that each sample belongs to each class is obtained, and for sample x n, the test result p ij(xn)=[pij1(xn)pij2(xn)…pijL(xn of decision branch dec_ brij is obtained, where L represents the total number of classes of samples, and p ijl(xn) represents the probability that decision dec_ brij predicts that sample x n belongs to class L.
In this embodiment, dec_br51, dec_br52, dec_br53 are taken as examples, and the test result of dec_br51 is given as an example p51(x1)= [1.0000000e+00,4.1446593e-17,1.9549345e-15,7.9526921e-20,2.0109854e-20,3.1513676e-19,3.7332855e-20,2.1863510e-23,1.3979170e-16,2.2339542e-20,9.2374559e-15, 5.8055036e-11,3.0085874e-23].
Test results of dec_br52 p52(x1)=[9.9999547e-01,7.5567144e-09,7.8939694e-10, 5.6319975e-17,7.2288855e-13,3.3209550e-14,4.4501590e-12,7.1541192e-16,5.3460147e-10,4.9500820e-12,6.0213007e-10,4.4727053e-06,1.1179825e-15].
Test results of dec_br53 p53(x1)=[9.9999964e-01,8.5260909e-11,6.1670391e-08, 6.0080515e-14,2.3101764e-11,6.7670508e-11,1.8991200e-09,4.3540345e-12,2.2171498e-07,7.6678863e-10,3.7856211e-09,3.0988421e-08,1.2395125e-10].
Step4.2: information entropy H (x) is introduced to represent the degree of certainty of the decision model for the decision,
Wherein i is more than or equal to 1 and less than or equal to 5, j is more than or equal to 1 and less than or equal to 3.H ij (x) represents the degree of certainty of the decision branch dec_ brij with respect to the classification result, and p ijl (x) represents the probability that the decision branch dec_ brij decides the sample x as the first class.
In this embodiment, the information entropy H 51(x1 of dec_br51= 3.986313764776386e-08, the information entropy H 52(x2 of dec_br52) = 8.630309145106366e-05, and the information entropy H 53(x3 of dec_br53) = 7.871773456765137e-06.
Step4.3: the specific method of introducing decision weights ω to decide the decision fusion, ω is defined as follows,
Wherein i is more than or equal to 1 and less than or equal to 5, j is more than or equal to 1 and less than or equal to 3. Omega ij represents the decision weight assigned to dec_ brij.
In this embodiment, the decision weight ω 51 =0.33437882598 52 corresponding to dec_br51, the decision weight ω 52 =0.33331503418887511 corresponding to dec_br52, and the decision weight ω 53 = 0.33334117755140763 corresponding to dec_br53.
Step4.4: for sample x n, the output result of dec_ brij blended with the decision weight is
In this embodiment, the output results of the dec_br51, dec_br52, dec_br53 blended with the decision weights are respectively:
Step4.5: for sample x n, the decision fusion probability belonging to each class is P (x n),
The probability that the prediction sample x belongs to the category L after the decision fusion is represented.
In this embodiment, the decision fusion probability of the sample x 1 belonging to each class is P(x1)=[9.99998371e-01, 2.58543500e-09,2.08537332e-08,1.00000001e-10,1.00000001e-10,1.00000001e-10,6.99720771e-10,1.00000001e-10,7.41182590e-08,3.22268109e-10,1.49593680e-09, 1.50118298e-06,1.07983939e-10].
Step4.6: for sample x, the final prediction class Label (x) =argmax (P (x)).
In this embodiment, as the value of P(x_1) shows, the first probability value is the largest, so sample x_1 belongs to class 0.
The invention can be further illustrated by the following experimental results.
Experimental environment: the CPU is an Intel(R) Core(TM) i7-7700 @ 3.60 GHz, the memory is 32 GB, the operating system is Windows 10, and the development environment is Jupyter Notebook.
Experimental data: the invention classifies images using the public Oxford 17 Flowers dataset.
Analysis of experimental results: the invention introduces an enhancement mask that adaptively enhances the features and improves their expressive power, and introduces information entropy to fuse multiple decisions for classification. On this dataset, the comparison method of Yang reaches 95.41% classification accuracy; Table 1 shows that, under the feature enhancement of the invention, the classification accuracy is superior to Yang's method in most configurations and can reach 97.03%.
TABLE 1
Table 2 shows the classification accuracy of single decisions versus decision fusion on the dataset. The experimental data show that decision fusion works best when the two decisions perform similarly; when the two decisions differ greatly, a "decision neutralization" effect can occur instead.
TABLE 2
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.
Claims (4)
1. A flower image classification algorithm based on feature enhancement and decision fusion is characterized in that:
step1: performing corresponding cropping, scaling, and normalization operations on the data set;
Step2: feeding the preprocessed data into a VGG16 model pre-trained on ImageNet to extract multi-layer high-level features, and introducing a feature enhancement strategy to adaptively assign feature weights, realizing adaptive feature enhancement;
Step3: training a plurality of custom softmax classifiers using the plurality of sets of features of the previous stage;
Step4: introducing information entropy to represent the degree of certainty of each classifier, so as to adaptively assign decision fusion weights, and realizing classification through decision fusion;
the Step2 specifically comprises the following steps:
Step2.1: removing the three fully connected layers of the pre-trained VGG16 model, and feeding the training set, the verification set, and the test set into the truncated network to extract high-level features;
Step2.2: respectively saving the features of the blocki_convj layers and the block5_pool layer of the training set, the verification set, and the test set;
Step2.3: adaptively enhancing the features of the blocki_convj layers by using a feature enhancement masking strategy to obtain blocki_convj_en;
if the blocki_convj output feature map is M_ij = F_ij(x, y, p), the blocki_convj_en output feature map is M_ij_en = F_ij_en(x, y, p), and a feature enhancement mask M_en = F_en(x, y, p) is introduced with M_ij_en = M_ij × M_en, wherein 1 ≤ x, y ≤ W and 1 ≤ p ≤ P, where W represents the side length of the feature map and P represents the number of channels of the feature map;
Step2.4: splicing the blocki_convj_en layers and the block5_pool layer into new tensors Concat_ij respectively;
The Step4 specifically comprises the following steps:
Step4.1: using each dec_brij to test the images of the test set respectively to obtain the probability that each sample belongs to each class, wherein for sample x_n the test result of decision branch dec_brij is p_ij(x_n) = [p_ij1(x_n), p_ij2(x_n), …, p_ijL(x_n)], L represents the total number of classes of samples, and p_ijl(x_n) represents the probability that decision branch dec_brij predicts that sample x_n belongs to the l-th class;
Step4.2: information entropy H(x) is introduced to represent the degree of certainty of the decision model: H_ij(x) = -Σ_{l=1}^{L} p_ijl(x)·log₂ p_ijl(x), wherein 1 ≤ i ≤ 5 and 1 ≤ j ≤ 3;
H_ij(x) represents the degree of certainty of the decision branch dec_brij with respect to the classification result, and p_ijl(x) represents the probability that the decision branch dec_brij decides the sample x as the l-th class;
Step4.3: decision weights ω are introduced to weight the decision fusion, ω being defined as ω_ij = (1 - H_ij(x)) / Σ_{i,j} (1 - H_ij(x)),
wherein 1 ≤ i ≤ 5 and 1 ≤ j ≤ 3; ω_ij represents the decision weight assigned to dec_brij;
Step4.4: for sample x_n, the output result of dec_brij weighted by the decision weight is p′_ij(x_n) = ω_ij · p_ij(x_n);
Step4.5: for sample x_n, the decision fusion probability of belonging to each class is P(x_n) = Σ_{i,j} ω_ij · p_ij(x_n),
where P_l(x) represents the probability that sample x belongs to the l-th class after decision fusion;
Step4.6: for sample x, the final prediction class is Label(x) = argmax(P(x)).
2. The flower image classification algorithm based on feature enhancement and decision fusion as claimed in claim 1, wherein Step1 is specifically:
Step1.1: dividing an image data set into a training set, a verification set, and a test set;
Step1.2: cropping each image in the training set into k square images by adopting a normal random cropping strategy, and cropping each image in the verification set and the test set into one square image by adopting a center cropping strategy;
Step1.3: scaling the cropped images in the training set, the test set, and the verification set to the 224×224 size required by the VGG16 model by using Lanczos interpolation;
Step1.4: linearly transforming the pixel values of the scaled images in the training set, the verification set, and the test set from the range [0, 255] to the range [0, 1] by adopting min-max normalization.
3. The flower image classification algorithm based on feature enhancement and decision fusion as claimed in claim 1, wherein Step2.3 is specifically:
Step2.3.1: the blocki_convj layer feature map, called M_ij, has dimension W×W×P, i.e., each of the P channels has a pixel matrix of size W×W; the P channels are compressed into one channel to obtain an overlay map M_stack with dimension W×W, and the average map M_average is calculated, wherein, writing the p-th channel feature map as m_p, M_stack = m_1 + m_2 + … + m_P and M_average = M_stack / P, with M_ij = F_ij(x, y, p);
Step2.3.2: letting thres denote a threshold distinguishing between a high-response area and a non-high-response area, the high-response area being more likely to be an area expressing flowers;
Step2.3.3: if M_ij(x, y, p) > thres, then M_en = 1 + (M_ij(x, y, p) - thres)/(a - thres);
if M_ij(x, y, p) < thres, then M_en = b, where a and b are hyper-parameters;
Step2.3.4: M_ij_en = M_ij × M_en.
4. The flower image classification algorithm based on feature enhancement and decision fusion as claimed in claim 1, wherein Step3 is specifically:
Step3.1: connecting the global average pooling layer with a fully connected layer containing 1024 nodes, and constructing a softmax classifier by adopting the softmax function as the activation function;
Step3.2: feeding the tensors Concat_ij into the classifier respectively, and adopting the output features of the blocki_convj layer and the block5_pool layer for training and verification;
Step3.3: saving the model trained on Concat_ij to obtain the decision model dec_brij.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210275922.8A CN115099294B (en) | 2022-03-21 | 2022-03-21 | Flower image classification algorithm based on feature enhancement and decision fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210275922.8A CN115099294B (en) | 2022-03-21 | 2022-03-21 | Flower image classification algorithm based on feature enhancement and decision fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115099294A CN115099294A (en) | 2022-09-23 |
CN115099294B true CN115099294B (en) | 2024-07-19 |
Family
ID=83287883
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210275922.8A Active CN115099294B (en) | 2022-03-21 | 2022-03-21 | Flower image classification algorithm based on feature enhancement and decision fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115099294B (en) |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI467498B (en) * | 2011-12-19 | 2015-01-01 | Ind Tech Res Inst | Method and system for recognizing images |
GB2560174B (en) * | 2017-03-01 | 2020-09-23 | Toshiba Kk | Training an automatic speech recognition system |
CN111061889B (en) * | 2018-10-16 | 2024-03-29 | 京东方艺云(杭州)科技有限公司 | Automatic identification method and device for multiple labels of picture |
US20200388287A1 (en) * | 2018-11-13 | 2020-12-10 | CurieAI, Inc. | Intelligent health monitoring |
KR102294638B1 (en) * | 2019-04-01 | 2021-08-27 | 한양대학교 산학협력단 | Combined learning method and apparatus using deepening neural network based feature enhancement and modified loss function for speaker recognition robust to noisy environments |
CN110111313B (en) * | 2019-04-22 | 2022-12-30 | 腾讯科技(深圳)有限公司 | Medical image detection method based on deep learning and related equipment |
CN110222562A (en) * | 2019-04-26 | 2019-09-10 | 昆明理工大学 | A kind of method for detecting human face based on Fast R-CNN |
CN110569905B (en) * | 2019-09-10 | 2023-04-14 | 中电鸿信信息科技有限公司 | Fine-grained image classification method based on generation of confrontation network and attention network |
CN111028207B (en) * | 2019-11-22 | 2023-06-09 | 东华大学 | Button flaw detection method based on instant-universal feature extraction network |
CN112614131A (en) * | 2021-01-10 | 2021-04-06 | 复旦大学 | Pathological image analysis method based on deformation representation learning |
CN113887610B (en) * | 2021-09-29 | 2024-02-02 | 内蒙古工业大学 | Pollen image classification method based on cross-attention distillation transducer |
CN114092819B (en) * | 2022-01-19 | 2022-04-19 | 成都四方伟业软件股份有限公司 | Image classification method and device |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023221951A2 (en) * | 2022-05-14 | 2023-11-23 | 北京大学 | Cell differentiation based on machine learning using dynamic cell images |
CN117195148A (en) * | 2023-09-07 | 2023-12-08 | 西安科技大学 | Ore emotion recognition method based on expression, electroencephalogram and voice multi-mode fusion |
CN117350925A (en) * | 2023-10-31 | 2024-01-05 | 云南电网有限责任公司电力科学研究院 | Inspection image infrared visible light image fusion method, device and equipment |
Non-Patent Citations (1)
Title |
---|
Jia, LY et al.; "A Parallel Convolution and Decision Fusion-Based Flower Classification Method"; MATHEMATICS; 2022-08-31; vol. 10, no. 15; pp. 1-15 *
Also Published As
Publication number | Publication date |
---|---|
CN115099294A (en) | 2022-09-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |