
CN118429625B - Kitchen waste target detection method based on active learning selection strategy - Google Patents


Info

Publication number
CN118429625B
Authority
CN
China
Prior art keywords
sample
target detection
image
representing
kitchen waste
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410900338.6A
Other languages
Chinese (zh)
Other versions
CN118429625A (en)
Inventor
梁桥康
舒立业
秦海
殷义
郑正月
申新朴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202410900338.6A priority Critical patent/CN118429625B/en
Publication of CN118429625A publication Critical patent/CN118429625A/en
Application granted granted Critical
Publication of CN118429625B publication Critical patent/CN118429625B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/091 Active learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/094 Adversarial learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V 10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/7753 Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a kitchen waste target detection method based on an active learning selection strategy. A kitchen waste target detection data set collected in a real scene is acquired; the samples in the data set are clustered, and an initial labeled data set is constructed according to sample representativeness; the labeled data set is used to train the target detection model, while the labeled and unlabeled data sets are used to train the variational autoencoder (VAE) and the adversarial-network (GAN) discriminator in the active learning model. While the overall active learning budget has not been reached, the current target detection model performs inference on each unlabeled image and its enhanced image, and a preliminary sample set is constructed according to the similarity of the inference results; the current VAE and discriminator then screen these samples to obtain the final sample set. After the selected final samples are labeled, the steps are repeated until the overall labeling budget is reached; finally, the resulting target detection model performs target detection on kitchen waste images. The invention achieves a better detection effect at a lower labeling cost.

Description

Kitchen waste target detection method based on active learning selection strategy
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a kitchen waste target detection method based on an active learning selection strategy, which is used for detecting kitchen waste image targets.
Background
At present, domestic garbage treatment plants basically rely on manual sorting, which is labor-intensive and inefficient. This problem can be effectively alleviated by using intelligent control systems on the refuse sorting line, with industrial robots assisting or replacing manual work. These systems rely on a key technique, object detection, which classifies the objects to be sorted and determines their bounding-box positions. With the growth of computing power, deep learning, which automatically learns feature representations from data, has made breakthrough progress, and its precision and accuracy surpass those of traditional target detection algorithms. However, deep learning depends on a large amount of manually annotated data to improve model performance; data annotation is time-consuming and labor-intensive and suffers from subjective errors caused by lack of domain knowledge. In particular, the target detection task requires annotating the category and position of every target in each picture, which poses great challenges to manual annotation cost and efficiency. How to obtain the best detection performance with as few labeled samples as possible has therefore become a difficulty in deep learning target detection.
The core idea of active learning in machine learning is to combine labeled and unlabeled data, making full use of the information in unlabeled data to improve model performance. By designing a query strategy, active learning selects samples whose features are unknown or relatively uncertain to the model for manual annotation, avoiding annotation bias introduced by the model and achieving better model performance at a lower annotation cost. Since Professor D. Angluin proposed the use of queries for labeling samples in 1987, active learning has been widely used in fields such as natural language processing, medical diagnosis, and financial analysis. Active learning selects, from a large amount of unlabeled data, the sample point (or group of points) most valuable for improving model performance and passes it to an Oracle for annotation; the labeled sample is added to the training set to retrain the model, and the process is repeated until the budget is exhausted or a preset termination condition is reached. The core of the method is the query strategy used to select the samples most beneficial for improving model performance. Many informativeness-based and representativeness-based methods have been developed for the pool-based, stream-based, and synthetic-query scenarios of active learning. To avoid both the sampling bias caused by using informativeness-based methods alone and the increased labeling cost caused by representativeness-based methods alone selecting low-information samples, researchers have tried to find a balance between the two, designing query strategies that consider both informativeness and representativeness.
Disclosure of Invention
The invention provides a kitchen waste target detection method based on an active learning selection strategy. It fully exploits the value of unlabeled data through active learning, takes the targets, bounding boxes, and class imbalance of the self-built data set into account when selecting samples, and achieves a better detection effect at a lower labeling cost by adopting a selection strategy based on both informativeness and representativeness.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
a kitchen waste target detection method based on an active learning selection strategy comprises the following steps:
step 1, acquiring a kitchen waste target detection data set acquired in a scene of a real kitchen waste sorting line And pre-treating;
Step 2, for the data set Clustering the image samples in the model, and taking initial labeling budget by category number; selecting the image data with the highest representativeness in each class, manually marking the image data to form an initial marked data set; Updating unlabeled data sets
Step 3, the current marked data setTraining in a target detection model while using a labeled datasetAnd unlabeled data setsTraining a variation self-encoder (VAE) and a discriminator against a network (GAN) in an active learning model;
step 4, when the total budget of active learning is not reached, continuing to step 5, otherwise, jumping to execute step 8;
Step 5, reasoning unlabeled images and enhanced images thereof by using a target detection model obtained by current training, comparing the similarity of reasoning results, and selecting image data with low similarity in unlabeled data sets as a primary selection sample to construct a primary selection sample set of the current round;
Step 6, predicting the labeling confidence of the primary selected samples of the primary selected sample set by using the variable self-encoder VAE and the discriminator against the network GAN obtained by the current training, and selecting a plurality of samples with the lowest labeling confidence to construct a final sample set of the current round;
Step 7, after the final sample set of the current round is manually marked, the final sample set is added to the marked data set And synchronously updating unlabeled data setsReturning to the step 3 to perform model training and labeling sample selection of the next round;
And 8, taking the target detection model obtained by current training as a final kitchen waste target detection model, and carrying out target detection on the kitchen waste image.
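The steps above form a standard active-learning loop. As a minimal sketch (not the patent's code), the control flow of steps 3-8 can be written as follows; the callables `train`, `select`, and `annotate` are illustrative stand-ins for the target detection model training, the two-stage sample selection, and the Oracle labeling stage:

```python
def active_learning_loop(unlabeled, initial_labeled, total_budget, round_size,
                         train, select, annotate):
    """Skeleton of steps 3-8: alternate model training and sample selection
    until the overall labeling budget is reached.

    unlabeled: full pool of samples; initial_labeled: the initial labeled set
    from step 2; train(labeled, pool) returns a trained model; select(model,
    pool, k) picks k samples; annotate(x) returns the labeled version of x.
    """
    labeled = list(initial_labeled)
    pool = [x for x in unlabeled if x not in labeled]
    model = train(labeled, pool)                       # step 3
    while len(labeled) < total_budget and pool:        # step 4
        k = min(round_size, total_budget - len(labeled))
        batch = select(model, pool, k)                 # steps 5-6
        labeled += [annotate(x) for x in batch]        # step 7
        pool = [x for x in pool if x not in batch]
        model = train(labeled, pool)                   # back to step 3
    return model, labeled                              # step 8
```

In the method itself, `select` corresponds to the prediction-consistency screening followed by the VAE/discriminator confidence ranking described below.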
Further, in step 2 the KNN kernel density of a sample is selected as the representativeness evaluation index, calculated as:
rho_k(x_j) = k / ( n * V_d * r_k(x_j)^d ),  with  V_d = pi^(d/2) / Gamma(d/2 + 1)  and  r_k(x_j) = (1/k) * sum_{i=1..k} dist(x_j, x_j^(i))
where rho_k(x_j) is the KNN kernel density of the j-th sample x_j based on its k nearest neighbors; n is the number of samples in the kitchen waste target detection data set D; V_d is the volume of the unit d-dimensional ball, d being the feature dimension of the sample; r_k(x_j) is the average distance between sample x_j and its k neighbors; x_j^(i) is the i-th of the k neighbor samples of x_j; dist(x_j, x_j^(i)) is the distance between sample x_j and its neighbor sample x_j^(i); and Gamma(.) is the Gamma function.
Further, the distance between two samples is taken as the L2-norm distance between the two samples' feature maps:
dist(x_i, x_j) = || f(x_i) - f(x_j) ||_2
where dist(x_i, x_j) is the distance between the feature maps of samples x_i and x_j, and f(x_i) and f(x_j) are the feature maps of x_i and x_j extracted by the feature extraction network.
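The KNN kernel density above can be sketched in a few lines. This is an illustrative implementation (not the patent's code), assuming each sample is represented by a flattened feature vector; note that for high feature dimensions d the unit-ball volume V_d underflows, so in practice features would be reduced to a modest dimension first:

```python
import numpy as np
from math import gamma, pi

def knn_kernel_density(feats, k):
    """KNN kernel density per sample: rho = k / (n * V_d * r_bar^d).

    feats: (n, d) array of feature vectors (flattened feature maps).
    Returns an (n,) array; higher values mean more representative samples.
    """
    n, d = feats.shape
    # pairwise L2 distances between feature vectors
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # exclude each sample itself
    knn = np.sort(dists, axis=1)[:, :k]        # k nearest distances per sample
    r_bar = knn.mean(axis=1)                   # average neighbor distance
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)     # unit d-ball volume via Gamma
    return k / (n * v_d * r_bar ** d)
```

A sample inside a tight cluster gets a small average neighbor distance and hence a high density, matching the intuition that such samples are more representative.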
Further, the target detection adopts a modified RT-DETR model, as follows: first, a ResNet backbone extracts features from the input image, outputting multi-scale features with different downsampling factors; then a hybrid encoder converts the multi-scale features into an image feature sequence through intra-scale feature interaction (AIFI) and the cross-scale feature fusion module (CCFM); next, IoU-aware query selection picks a fixed number of features from the encoder's output sequence as the decoder's initial object queries; finally, the decoder iteratively optimizes the object queries, which a prediction head maps to bounding boxes and class confidence scores.
Further, the similarity of the inference results is compared according to a prediction consistency evaluation index, defined as follows.
Denote the target detection model by M, the image enhancement operation by A, and an unlabeled image input to the model by x_u. Then:
(1) Use the target detection model M to infer on the unlabeled image x_u, then apply the enhancement operation to the inference result:
{ (p_i, b_i) }_{i=1..N} = A( M(x_u) )
where p_i is the probability vector of the i-th inference target over all classes, p_i^c is the probability of class c, b_i is the bounding-box coordinates of the i-th inference target, and N is the number of inference targets;
(2) Apply the enhancement operation to the unlabeled image x_u and use the target detection model M again to infer on the resulting enhanced image; the inference result is:
{ (p'_j, b'_j) }_{j=1..N'} = M( A(x_u) )
where p'_j is the probability vector of the j-th inference target over all classes, p'_j^c is the probability of class c, and b'_j is the bounding-box coordinates of the j-th inference target;
(3) Match each target obtained in step (2) against the boxes { b_i } from step (1), selecting the box with the largest intersection-over-union as the matching box:
b_{i*} = argmax_{b_i} IoU( b'_j, b_i )
where IoU(b'_j, b_i) is the intersection-over-union of b'_j and b_i;
(4) The IoU of the matched boxes is taken as the consistency of the predicted bounding boxes, the consistency of the corresponding class prediction probabilities is computed through JS divergence, and the minimum consistency over all predicted targets is taken as the prediction consistency evaluation index value between the unlabeled image x_u and its enhanced image A(x_u); expressed as:
c_box(j) = IoU( b'_j, b_{i*} ),   c_cls(j) = 1 - JS( p'_j || p_{i*} ),   C( x_u, A(x_u) ) = min_j c_j
where c_box(j) and c_cls(j) respectively denote the consistency of a matched target's bounding box and of its corresponding class prediction probabilities, JS(.||.) denotes the JS-divergence calculation, c_j denotes the prediction consistency of target j, determined jointly by c_box(j) and c_cls(j), and C(x_u, A(x_u)) is the prediction consistency evaluation index of the unlabeled image x_u and its enhanced image.
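The matching and consistency computation of steps (3)-(4) can be sketched as follows. This is an illustrative implementation, not the patent's code; in particular, combining the box and class consistencies with `min` is an assumption made for the sketch:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two class-probability vectors."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def prediction_consistency(preds_a, preds_b):
    """Minimum per-target consistency between two inference results.

    preds_a: targets from A(M(x)) as (prob_vector, box) pairs;
    preds_b: targets from M(A(x)). Each enhanced-image target is matched
    to the original-image box with the largest IoU.
    """
    scores = []
    for p_j, b_j in preds_b:
        i_star = max(range(len(preds_a)), key=lambda i: iou(b_j, preds_a[i][1]))
        c_box = iou(b_j, preds_a[i_star][1])
        c_cls = 1.0 - js_divergence(p_j, preds_a[i_star][0])
        scores.append(min(c_box, c_cls))  # combining rule assumed for the sketch
    return min(scores) if scores else 0.0
```

Identical predictions give a consistency near 1; disagreeing boxes or class distributions drive the index toward 0, and such low-similarity images become preliminary samples.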
Further, on the basis of selecting preliminary samples by comparing inference-result similarity with the prediction consistency evaluation index, samples are further selected from the preliminary samples so that the category-distribution difference between the preliminary sample set of the current round and the current labeled data set L is maximized. Specifically:
Count the category distribution of the current labeled data set L:
q = Normalize( [ sum_{y in Y_L} 1(y = c) ]_{c=1..C} )
where q is the category distribution of all sample targets in the labeled data set L; Y_L is the set of category labels of all sample targets in L; Normalize(.) scales the vector so that each element lies in (0, 1) and all elements sum to 1; 1(.) is the indicator function; sum_{y in Y_L} 1(y = c) is the count of all class-c targets in the set; and y is one element of Y_L.
Let S_0 be the sample set formed by selecting preliminary samples according to the prediction consistency evaluation index, and consider the most confident class of the original-image and enhanced-image predictions:
q_x = Normalize( [ sum_i 1( argmax_c p_i^c = c ) ]_{c=1..C} )
where q_x is the target category distribution of sample x in the set S_0, and p_i^c is the class-c probability of the i-th target in sample x.
Compare the category distributions of the two data sets and select the samples with the largest difference to form the preliminary sample set S of the current round:
S = Sample_{S in S_0}( argmax JS( q_S || q ) )
where Sample(.) is the sampling function over S_0 that determines the sample subset maximizing the JS divergence between the category distributions, and JS(.||.) denotes the JS-divergence calculation.
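A simple way to realize this subset search is greedy selection: repeatedly add the candidate that most increases the JS divergence between the chosen subset's class distribution and the labeled set's. The sketch below is illustrative (the patent does not specify the search procedure); `js` is any JS-divergence function over two distributions:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two distributions."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def class_distribution(labels_per_image, num_classes):
    """Normalized class histogram over all targets in a set of images."""
    counts = np.zeros(num_classes)
    for labels in labels_per_image:
        for c in labels:
            counts[c] += 1
    total = counts.sum()
    return counts / total if total > 0 else np.full(num_classes, 1.0 / num_classes)

def select_divergent(candidates, labeled_dist, num_classes, m, js):
    """Greedily pick m candidates whose joint predicted-class distribution
    maximizes JS divergence from the labeled set's distribution.

    candidates: list of (image_id, predicted_labels), where each target's
    most confident class is used as its predicted label.
    """
    chosen, chosen_labels, pool = [], [], list(candidates)
    for _ in range(min(m, len(pool))):
        best = max(pool, key=lambda c: js(
            class_distribution(chosen_labels + [c[1]], num_classes),
            labeled_dist))
        pool.remove(best)
        chosen.append(best[0])
        chosen_labels.append(best[1])
    return chosen
```

Greedy selection naturally favors images whose predicted classes are rare in the labeled set, which is exactly the class-imbalance mitigation the method aims for.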
Further, the active learning model comprises a variational autoencoder (VAE) and the discriminator of a generative adversarial network (GAN). The VAE downsamples the input data and maps it to a low-dimensional latent space; the resulting latent vector is passed to the GAN discriminator, which judges whether the data is labeled. During training, the input data of the VAE and the discriminator come from the labeled set L and the unlabeled set U; during active learning sample selection, the input data come from the preliminary sample set.
During sample selection, if the latent vector obtained by VAE encoding of an unlabeled sample differs greatly from the latent vectors of labeled data learned by the GAN discriminator, that is, the discriminator predicts a low confidence that the sample is labeled, the sample is considered informative; therefore the samples with the lowest discriminator output confidence are selected from the preliminary sample set of the current round as the final selected samples.
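The final-selection step reduces to ranking preliminary samples by the discriminator's "labeled" confidence and keeping the lowest-scoring ones. The sketch below is illustrative only: the real VAE encoder and discriminator are trained adversarially as described above, whereas `toy_discriminator` here is just a fixed logistic score standing in for a trained network:

```python
import numpy as np

def select_final_samples(latents, discriminator, budget):
    """Step-6 final selection: score each preliminary sample's latent
    vector with the GAN discriminator and keep the `budget` samples with
    the lowest 'labeled' confidence.

    latents: (n, z) array of VAE latent vectors, one per preliminary sample;
    discriminator: callable mapping (n, z) -> (n,) confidences in [0, 1].
    Returns the indices of the selected samples.
    """
    conf = discriminator(latents)
    order = np.argsort(conf)       # ascending: least 'labeled'-like first
    return order[:budget].tolist()

def toy_discriminator(z, w=None):
    """Illustrative stand-in: logistic score along a fixed direction."""
    w = np.ones(z.shape[1]) if w is None else w
    return 1.0 / (1.0 + np.exp(-(z @ w)))
```

Samples the discriminator is least confident about (i.e. whose latent vectors look least like labeled data) are exactly those sent to the Oracle for labeling.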
The invention provides a kitchen waste target detection method based on an active learning selection strategy that achieves a better detection effect at a lower labeling cost through active learning. Samples with uncertain individual information are evaluated according to the consistency of the target detection model's predicted categories and bounding boxes and the consistency between the predicted-category distribution and the labeled data set's category distribution, ensuring that the selected samples carry rich mutual category information and alleviating the class-imbalance problem of the data set. In addition, a variational autoencoder and an adversarial-network discriminator are used to distinguish unlabeled from labeled data, judging data value from the image itself and avoiding the sampling bias a target detection model would introduce. Compared with existing kitchen waste target detection methods, the active learning approach has the following advantages:
(1) The method overcomes both the lack of inductive bias in the Transformer hybrid encoder module of the RT-DETR target detection model and the high data-labeling cost caused by deep learning's heavy dependence on labeled data, achieving a good detection effect at a lower labeling cost by constructing an active learning framework based on model predictions and sample data.
(2) The invention uniformly measures the consistency of the model's category and bounding-box predictions for an unlabeled image and its enhanced image, and computes the consistency between the sample's predicted categories and the labeled set's category distribution; samples with low consistency (i.e. those the model is more uncertain about) are selected, ensuring that the selected samples carry rich mutual category information and alleviating the class-imbalance problem of the data set.
(3) The invention uses a variational autoencoder (VAE) and an adversarial-network discriminator to distinguish unlabeled from labeled data, selecting samples that differ from the learned latent-space vectors of labeled data, thereby maximizing the representational value of newly labeled data.
(4) For the selection of the initial labeled samples, where the untrained adversarial-network discriminator fails, samples with higher local density are obtained with a KNN density estimator, and unique clustered samples are further selected with the C-Means algorithm, ensuring the representativeness of the initial labeled samples and avoiding excessive similarity between them.
(5) The method has strong effectiveness and practicability: full experiments were carried out on the self-built kitchen waste target detection data set OD-KW, and effective experimental results were obtained.
Drawings
FIG. 1 is a schematic diagram of an active learning target detection technique based on a prediction consistency and a sample representative selection strategy according to an embodiment of the present invention.
FIG. 2 is a flow chart of active learning object detection based on predictive consistency and sample representative selection strategy in accordance with an embodiment of the present invention.
FIG. 3 is a comparison of RT-DETR target detection mAP on the OD-KW data set for different active learning methods according to an embodiment of the present invention.
FIG. 4 is a comparison of RT-DETR target detection AP50 on the OD-KW data set for different active learning methods according to an embodiment of the present invention.
Fig. 5 illustrates the influence of two-stage selection strategy fusion on target detection mAP under different labeling amounts according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the effect of a two-stage selection policy sequence in accordance with an embodiment of the present invention.
Detailed Description
According to the technical scheme of the invention, kitchen waste image target detection is realized through active learning based on prediction consistency and a sample-representativeness selection strategy; it can be implemented in the Python programming language for experiments or in the C/C++ programming language for engineering applications.
The invention provides a kitchen waste target detection method based on an active learning selection strategy, wherein a technical roadmap is shown in fig. 1, a processing flow is shown in fig. 2, and the method comprises the following steps:
Step 1: construct the kitchen waste target detection data set OD-KW collected from a real kitchen waste sorting line scene, including data collection and preprocessing.
The images of the OD-KW data set come from the sorting line of a kitchen waste treatment plant in Zhengzhou; 15994 images of actual kitchen waste sorting scenes were collected in different months over one year, with 3771, 4735, 2130, and 4358 images in the four seasons respectively, ensuring the representativeness and richness of the data set's temporal and category distributions. According to the configuration of the laboratory hardware platform, all images were resized from the original resolution of 3256 × 2724 to 640 × 480. After investigating the downstream processing stages of kitchen waste, the targets were divided into eight classes: garbage bags, metal, plastic bottles, other plastics, paper boxes, glass, cola packages, and shoes; the data set remains open and allows new classes to appear in the subsequent active learning process. In this embodiment, to improve efficiency, 3125 images were fully manually annotated and split in an 8:2 ratio into a training set and a validation set; the training-set annotations are called directly by the Oracle annotation stage.
Step 2: cluster the image samples in the data set, taking the number of clusters as the initial labeling budget; select the most representative image in each cluster and manually label it to form the initial labeled data set L; update the unlabeled data set U accordingly.
Since the objective function of the downstream task is still uncertain at this point, the representativeness and diversity of the data are judged from the images themselves via KNN density estimation and C-Means clustering. According to the set initial labeling budget b = 500, the initial labeled data set L is determined from the unlabeled set U, i.e. the training set of step 1, which comprises 2500 samples.
To avoid repeated sampling near density peaks and to ensure the diversity of the selected samples, one unique sample is taken from each C-Means cluster, with the number of clusters equal to the initial labeling budget b = 500. The sampling process clusters the 2500 samples into 500 clusters with C-Means so that each sample carries a cluster label; within each cluster (samples sharing the same cluster label), the sample with the largest representativeness evaluation value is taken as the unique sample selected for that cluster. The clusters are traversed until the initial labeling budget b is reached, forming the initial labeled set L and updating the unlabeled set U.
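The one-sample-per-cluster rule above can be sketched as follows. This is illustrative (not the patent's code): plain k-means stands in for the C-Means step, and `density` is any representativeness score such as the KNN kernel density:

```python
import numpy as np

def kmeans_labels(feats, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm) as an illustrative stand-in
    for the C-Means clustering step."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = feats[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels

def unique_samples(feats, density, k):
    """Pick one sample per cluster: the one with the highest
    representativeness score (e.g. KNN kernel density) in that cluster."""
    labels = kmeans_labels(feats, k)
    picked = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx):
            picked.append(int(idx[np.argmax(density[idx])]))
    return picked
```

Selecting the density peak of each cluster keeps the initial labeled set both representative (high local density) and diverse (one sample per cluster).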
Sample representativeness is estimated via sample density, the idea being that samples closer to their neighbors, i.e. with higher local density, are more representative. In this embodiment the KNN kernel density of a sample is selected as the representativeness evaluation index, calculated as:
rho_k(x_j) = k / ( n * V_d * r_k(x_j)^d ),  with  V_d = pi^(d/2) / Gamma(d/2 + 1)  and  r_k(x_j) = (1/k) * sum_{i=1..k} dist(x_j, x_j^(i))
where rho_k(x_j) is the KNN kernel density of the j-th sample x_j based on its k nearest neighbors; n is the number of samples in the kitchen waste target detection data set; V_d is the volume of the unit d-dimensional ball, d being the feature dimension of the sample; r_k(x_j) is the average distance between sample x_j and its k neighbors; x_j^(i) is the i-th of the k neighbor samples of x_j; dist(x_j, x_j^(i)) is the distance between sample x_j and its neighbor sample x_j^(i); and Gamma(.) is the Gamma function.
The distance between two samples is the L2-norm distance between the two samples' feature maps:
dist(x_i, x_j) = || f(x_i) - f(x_j) ||_2
where dist(x_i, x_j) is the distance between the feature maps of samples x_i and x_j, and f(x_i) and f(x_j) are the feature maps of x_i and x_j extracted by the feature extraction network. The feature extraction network of this embodiment employs a pre-trained ResNet.
In addition, for a sample x_j in the data set, its k neighbors are defined as follows: compute the set of distances between x_j and the other samples in the data set, i.e. { dist(x_j, x_i) : i != j }; sort the distances and select the k nearest samples as the k neighbor samples of x_j.
Step 3: the labeled dataset L selected in step 2 is fed into the RT-DETR target detection model for training; at the same time, the variational autoencoder (VAE) and the discriminator of the generative adversarial network (GAN) in the active learning model are trained using the labeled dataset L and the unlabeled dataset U.
The target detection in this embodiment uses an improved RT-DETR model, as follows. First, a ResNet is taken as the backbone network to extract features, outputting three scales of features C3, C4 and C5 with downsampling factors of 8, 16 and 32 respectively. A hybrid encoder then processes the C5 feature map with one Transformer encoder layer: the two-dimensional C5 features are flattened into vectors, passed through multi-head self-attention and an FFN, and restored to the two-dimensional F5; the multi-scale features C3, C4 and F5 are converted into a sequence of image features through cross-scale feature fusion. Finally, IoU-aware query selection picks a fixed number of features from the encoder output sequence as the initial object queries of the decoder; the decoder iteratively refines the object queries, and a prediction head maps them to target class confidences and bounding boxes.
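The multi-scale structure described above can be illustrated with a small helper that computes the C3/C4/C5 spatial sizes from the stated downsampling factors (an illustrative sketch; the function name is hypothetical):

```python
def rtdetr_feature_shapes(h, w):
    """Spatial sizes of the three backbone outputs C3, C4, C5
    for an input of size (h, w), with strides 8, 16 and 32."""
    return {f"C{i + 3}": (h // s, w // s) for i, s in enumerate((8, 16, 32))}

print(rtdetr_feature_shapes(640, 640))
# → {'C3': (80, 80), 'C4': (40, 40), 'C5': (20, 20)}
```

Only the smallest map (C5) is fed through the Transformer encoder layer, which keeps the self-attention sequence length at 20 × 20 = 400 tokens for a 640 × 640 input.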
In addition, the VAE downsamples the input labeled and unlabeled data and maps them to a low-dimensional latent space, encoding the input data into latent-space vectors from which the GAN discriminator judges whether the data is labeled. The joint training of the VAE and the GAN discriminator is a min-max game: the VAE tries to fool the discriminator into treating all data as labeled, while the discriminator tries to confirm whether an input latent-space vector belongs to the labeled data.
Only the discriminator of the GAN is used in the present invention. When training the VAE+GAN discriminator, the VAE maps labeled and unlabeled data into the same latent space and thus plays the role of the GAN generator; in other words, it tries to cheat the discriminator into believing that all data are labeled. That is, the latent-space vector V_u that the VAE produces for an unlabeled sample should be as similar as possible to the latent-space vector V_l produced for a labeled sample, while the discriminator tries to estimate as accurately as possible whether an input latent-space vector comes from a labeled sample.
Some parameters of the training process are set as follows. The first round of model training is fine-tuned from model weights pretrained on the COCO dataset. The optimizer is AdamW with a base learning rate of 0.0001 and a weight decay coefficient of 0.0001. The batch size is 16, the learning rate of the backbone network is 1×10⁻⁵, and 50 epochs are trained in each active learning round. The hardware platform is a Ubuntu 20.04 system with an NVIDIA GeForce RTX 4090 graphics card, CUDA 11.7 and CuDNN 8.9.4. The experimental environment is PyTorch 2.0.1 and Python 3.8.
Step 4: judge whether to continue selecting samples via active learning: when the overall active learning budget has not been reached, continue to step 5; otherwise, jump to step 8.
The termination condition for active learning in this embodiment is whether the active learning annotation budget has been reached. The number of active learning rounds is set to 6, with an annotation budget of 250 per round.
Step 5: use the currently trained target detection model to run inference on the unlabeled images and their enhanced images, compare the similarity of the inference results, and select the image data with low similarity from the unlabeled dataset as initially selected samples to build the initially selected sample set of the current round.
The similarity of the inference results is compared according to a prediction consistency evaluation index, defined as follows:
(1) Prediction consistency between a candidate image x and its enhanced image A(x).
The prediction result of the trained RT-DETR target detection model M on image x, with the same enhancement applied to it, is expressed as:

A(M(x)) = {(p_i, A(b_i)) : i = 1, …, n}
where p_i is the probability vector over all classes for the i-th inferred target, p_i^c represents the probability of class c, b_i represents the bounding box coordinates of the i-th inferred target, and n represents the number of inferred targets. Similarly, the prediction result on the enhanced image is expressed as:

M(A(x)) = {(p'_j, b'_j) : j = 1, …, n'}
where p'_j is the probability vector over all classes for the j-th inferred target and b'_j represents its bounding box coordinates. Since the number of target detection objects n in a single image is usually greater than 1, each A(b_i) needs to be matched with the boxes {b'_j}: the box in the enhanced image with the largest intersection-over-union with A(b_i) is selected as the matching box, expressed as:

b'_match(i) = argmax_{b'_j} IoU(A(b_i), b'_j)
The IoU value of the matching boxes is taken as the prediction bounding-box consistency, the consistency of the prediction probabilities of the corresponding target class is computed via the JS divergence, and the minimum consistency over all predicted targets is taken as the prediction consistency evaluation index between the unlabeled image x and its enhanced image A(x); this is expressed as:
where c_i^box and c_i^cls respectively represent the consistency of the matched target bounding boxes and the consistency of the corresponding class prediction probabilities, JS(·‖·) represents the divergence computation, c_i represents the prediction consistency of target detection object i, and C(x, A(x)) represents the prediction consistency evaluation index between the unlabeled image x and its enhanced image A(x). All samples in the unlabeled set U are sorted in ascending order of this index, and samples amounting to 2.4 times the annotation budget (600 images) are selected to form the initially selected dataset to be annotated U_s.
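The matching-and-consistency computation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the per-target combination of box consistency and class consistency is assumed here to be the product IoU · (1 − JS), and all function names are hypothetical.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def prediction_consistency(preds, preds_aug):
    """preds / preds_aug: lists of (class_probs, box). Each original target is
    matched to the augmented box with the largest IoU; per-target consistency
    combines box IoU and class JS, and the image score is the minimum."""
    scores = []
    for p, b in preds:
        ious = [iou(b, b2) for _, b2 in preds_aug]
        j = int(np.argmax(ious))                       # matching box
        c_box = ious[j]
        c_cls = 1.0 - js_divergence(p, preds_aug[j][0])
        scores.append(c_box * c_cls)
    return min(scores)
```

Identical predictions on the original and enhanced image give a score near 1; diverging boxes or class probabilities drive the score toward 0, which is why low-scoring images are chosen for annotation.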
(2) Consistency of category distribution.
Based on the predicted categories and the labeled set L, the category-distribution consistency is used to select samples amounting to 2 times the annotation budget (500 images) whose category distribution differs most from that of the labeled set. The category distribution of the labeled set L is counted as:

P_L = Norm([N_1, N_2, …, N_C]),  N_c = Σ_{y ∈ Y_L} 1[y = c],  c = 1, …, C
where P_L represents the category distribution of all sample targets in the labeled dataset L; Y_L represents the set of class labels of all sample targets in the dataset L; Norm(·) is the normalization operation; 1[·] represents the indicator function; N_c represents the count of all class-c targets in the statistics set; and c is one of the classes. Similarly, the predicted category distribution of the candidate sample set U_s considers the most confident class of the original-image and enhanced-image predictions:
where q(x_i) represents the target class distribution of sample x_i in the set U_s, and q_n(x_i) represents the n-th class target probability of sample x_i. For sample x_i, over the class probability vectors of all targets predicted from the original image and of all targets predicted from the enhanced image, the maximum (most confident) n-th class probability among the targets of each result is taken as that result's n-th class probability; the n-th class probabilities of the original-image and enhanced-image predictions are then added to obtain the sample's n-th class probability q_n(x_i).
The category distributions of the two datasets are compared, and the samples with the largest difference are selected to form the initially selected to-be-annotated set of the current round U_r:
where S(·) represents the sampling function over the sample set U_s, which determines the subset of U_s that maximizes the JS divergence between the category distributions, and JS(·‖·) represents the divergence computation.
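A sketch of the category-distribution selection. The patent selects the subset maximizing the JS divergence between set-level distributions; the greedy per-sample ranking below is a simplifying assumption, and all names are illustrative.

```python
import numpy as np

def select_by_class_divergence(labeled_dist, sample_dists, n_select):
    """Rank candidate samples by the JS divergence between their predicted
    class distribution and the labeled-set class distribution; keep the
    n_select most divergent (greedy approximation of the set objective)."""
    def js(p, q):
        p, q = np.asarray(p, float), np.asarray(q, float)
        m = 0.5 * (p + q)
        def kl(a, b):
            mask = a > 0
            return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    scores = [js(labeled_dist, d) for d in sample_dists]
    order = np.argsort(scores)[::-1]          # most divergent first
    return [int(i) for i in order[:n_select]]
```

Samples whose predicted class mix differs most from the labeled set are favoured, which pushes the labeled pool toward covering under-represented classes.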
Step 6: use the currently trained variational autoencoder (VAE) and GAN discriminator to predict the labeling confidence of the initially selected samples in the initially selected to-be-annotated set, and select the samples with the lowest labeling confidence to build the final annotation sample set of the current round.
During active learning sample selection, an unlabeled sample is mapped by the VAE into a latent-space vector, which is fed to the trained discriminator to estimate the confidence (probability) that it comes from a labeled sample. The closer the discriminator output is to 1, the more likely the data is already labeled; the closer to 0, the more likely it is unlabeled.
In this embodiment, the variational autoencoder (VAE) trained in step 3 and the adversarial network discriminator select samples from the to-be-annotated set U_r: if the latent-space vector obtained by VAE encoding of an unlabeled sample differs strongly from the latent-space vectors the GAN discriminator has learned for labeled data, i.e. the prediction confidence of the discriminator is low, the sample is considered to have more labeling value. The b samples with the lowest confidence, equal to the annotation budget, form the final to-be-annotated set; this embodiment sets b = 250:
where S(·) represents the final to-be-annotated set sampling function, which selects the 250 samples of U_r for which the discriminator outputs the minimum labeling confidence, forming the final to-be-annotated set.
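The discriminator-based final selection reduces to ranking candidates by the discriminator's "labeled" probability and keeping the lowest; a minimal sketch (function name illustrative):

```python
import numpy as np

def select_lowest_confidence(confidences, budget):
    """Return indices of the `budget` samples whose discriminator output
    (probability of being 'labeled') is lowest - i.e. the samples the
    VAE+GAN pipeline judges most worth annotating."""
    order = np.argsort(confidences)  # ascending: least 'labeled-like' first
    return [int(i) for i in order[:budget]]

print(select_lowest_confidence([0.9, 0.1, 0.5, 0.3], 2))  # → [1, 3]
```

With the round budget b = 250, `budget=250` would be passed over the confidences of all samples in U_r.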
Step 7: after the final sample set of the current round is manually annotated, it is added to the labeled dataset L, the unlabeled dataset U is updated synchronously, and the process returns to step 3 for the next round of model training and annotation sample selection.
Step 8: when the active learning termination condition is reached, the optimal parameters of the target detection model are obtained. The currently trained target detection model is taken as the final kitchen waste target detection model and used to perform target detection on kitchen waste images.
To facilitate understanding of the technical effects of the present invention, a comparison between the present invention and conventional methods is provided below:
Table 1, table 2, fig. 3 and fig. 4 show the comparison experimental results of labeling performance on the kitchen waste dataset ODKW based on different active learning algorithms of RT-DETR, and the comparison methods include Random (Random), sequential Sampling (SE), CALD and val. From fig. 3 and fig. 4, it is intuitively seen that under different labeling amounts selected by 6 rounds of active learning, the average precision mAP and the AP 50 of the invention are better than those of other methods, and the labeling cost of the selection strategy of the invention is saved by about 15% as a whole. The detailed data in tables 1 and 2 show that the AP of the initial round is 1.3% higher than that of random selection, which shows that the effectiveness of KNN density estimation and C-Means clustering of the invention in selecting initial data is improved by 1% when the labeling quantity is 40% and 50%, and the samples selected by the method based on the selection strategy of prediction consistency and sample representativeness are more beneficial to improving the detection performance of the model.
To further demonstrate the effectiveness of the different parts of the selection strategy based on prediction consistency and sample representativeness, the influence of fusing different selection strategies on the target detection mAP at each annotation amount is verified on the ODKW dataset; the visual results are shown in Fig. 5. Considering both prediction consistency and sample representativeness in sample selection improves model detection accuracy more than using either alone; moreover, using the sample representativeness selection strategy in the first round not only improves the model performance of that round but also benefits subsequent rounds. In addition, the ordering experiment for the two-stage selection strategy of prediction consistency and sample representativeness is shown in Fig. 6: it is better to first select samples based on prediction consistency and then determine the final samples to be annotated according to sample representativeness.
The above embodiments are preferred embodiments of the present application. Those skilled in the art may make various changes or modifications without departing from the general inventive concept, and such changes and modifications shall fall within the protection scope of the present application as claimed.

Claims (6)

1. A kitchen waste target detection method based on an active learning selection strategy, characterized by comprising the following steps:
Step 1: acquiring a kitchen waste target detection dataset D collected in a real kitchen waste sorting line scene and preprocessing it;
Step 2: clustering the image samples in the dataset D, with the number of clusters equal to the initial annotation budget; selecting the most representative image data in each cluster and annotating it manually to form an initial labeled dataset L; updating the unlabeled dataset U;
wherein in step 2 the KNN kernel density of a sample is selected as the representativeness evaluation index, computed as:
ρ_K(x_j) = K / (N · V_m · r̄_K(x_j)^m),  V_m = π^(m/2) / Γ(m/2 + 1),  r̄_K(x_j) = (1/K) · Σ_{i=1}^{K} d(x_j, x_j^(i))

where ρ_K(x_j) represents the KNN kernel density of the j-th sample x_j based on its K nearest neighbors; N is the number of samples in the kitchen waste target detection dataset; V_m represents the volume of the unit m-dimensional ball, with m the feature dimension of the samples; r̄_K(x_j) represents the average distance from sample x_j to its K neighbors; x_j^(i) represents the i-th of the K neighbors of sample x_j; d(x_j, x_j^(i)) represents the distance between sample x_j and its neighbor sample x_j^(i); and Γ(·) represents the Gamma function;
Step 3: training the target detection model on the current labeled dataset L, while training the variational autoencoder (VAE) and the discriminator of the generative adversarial network (GAN) in the active learning model using the labeled dataset L and the unlabeled dataset U;
Step 4: when the overall active learning budget has not been reached, continuing to step 5; otherwise, jumping to step 8;
Step 5: running inference on the unlabeled images and their enhanced images with the currently trained target detection model, comparing the similarity of the inference results, and selecting image data with low similarity from the unlabeled dataset as initially selected samples to build the initially selected sample set of the current round;
Step 6: predicting the labeling confidence of the initially selected samples of the initially selected sample set using the currently trained variational autoencoder VAE and GAN discriminator, and selecting the samples with the lowest labeling confidence to build the final sample set of the current round;
Step 7: after the final sample set of the current round is manually annotated, adding it to the labeled dataset L and synchronously updating the unlabeled dataset U; returning to step 3 for the next round of model training and annotation sample selection;
Step 8: taking the currently trained target detection model as the final kitchen waste target detection model and performing target detection on kitchen waste images.
2. The kitchen waste target detection method based on the active learning selection strategy according to claim 1, wherein the distance between two samples is represented by the L2-norm distance between the feature maps of the two samples:
d(x_i, x_j) = ‖f(x_i) − f(x_j)‖₂

where d(x_i, x_j) represents the distance between the feature maps of sample x_i and sample x_j, and f(x_i), f(x_j) represent the feature maps of sample x_i and sample x_j extracted by the feature extraction network.
3. The kitchen waste target detection method based on the active learning selection strategy according to claim 1, wherein the target detection adopts an improved RT-DETR model, specifically: first, a ResNet is adopted as the backbone network to extract features from the input image, outputting multi-scale features with different downsampling factors; then a hybrid encoder converts the multi-scale features into an image feature sequence through intra-scale feature interaction (AIFI) and a cross-scale feature fusion module (CCFM); next, IoU-aware query selection selects a fixed number of features from the image feature sequence output by the hybrid encoder as the initial object queries of the decoder; finally, the decoder iteratively optimizes the object queries, and a prediction head maps them to bounding boxes and class confidence scores.
4. The kitchen waste target detection method based on the active learning selection strategy according to claim 1, wherein the similarity of the inference results is compared according to a prediction consistency evaluation index, defined as follows:
denoting the target detection model by M, the image enhancement operation by A(·), and an unlabeled image input to the target detection model by x, then:
(1) using the target detection model M to run inference on the unlabeled image x and then applying the enhancement operation to the inference result, expressed as:

A(M(x)) = {(p_i, A(b_i)) : i = 1, …, n}
where p_i is the probability vector over all classes for the i-th inferred target, p_i^c represents the probability of class c, b_i represents the bounding box coordinates of the i-th inferred target, and n represents the number of inferred targets;
(2) applying the enhancement operation to the unlabeled image x and then using the target detection model M to run inference on the resulting enhanced image, the inference prediction result being expressed as:

M(A(x)) = {(p'_j, b'_j) : j = 1, …, n'}
where p'_j is the probability vector over all classes for the j-th inferred target, p'_j^c represents the probability of class c, and b'_j represents the bounding box coordinates of the j-th inferred target;
(3) matching each target detection object obtained in step (2) with A(b_i) and selecting the box with the largest intersection-over-union with A(b_i) as the matching box, expressed as:

b'_match(i) = argmax_{b'_j} IoU(A(b_i), b'_j)
where IoU(A(b_i), b'_j) represents the intersection-over-union of A(b_i) and b'_j;
(4) the IoU value of the matching boxes is taken as the prediction bounding-box consistency, the consistency of the prediction probabilities of the corresponding target class is computed via the JS divergence, and the minimum consistency over all predicted targets is taken as the prediction consistency evaluation index between the unlabeled image x and its enhanced image A(x); this is expressed as:
where c_i^box and c_i^cls respectively represent the consistency of the matched target bounding boxes and the consistency of the corresponding class prediction probabilities, JS(·‖·) represents the divergence computation, c_i represents the prediction consistency of target detection object i, and C(x, A(x)) represents the prediction consistency evaluation index between the unlabeled image x and its enhanced image A(x).
5. The kitchen waste target detection method based on the active learning selection strategy according to claim 4, wherein, on the basis of selecting the initially selected samples by comparing the similarity of the inference results according to the prediction consistency evaluation index, samples are further selected from the initially selected samples so that the category distribution difference between the finally constructed initially selected sample set of the current round and the current labeled dataset L is maximized; specifically:
counting the category distribution of the current labeled dataset L:
P_L = Norm([N_1, N_2, …, N_C]),  N_c = Σ_{y ∈ Y_L} 1[y = c],  c = 1, …, C

where P_L represents the category distribution of all sample targets in the labeled dataset L; Y_L represents the set of class labels of all sample targets in the dataset L; Norm(·) is the normalization operation, which makes the elements of the vector P_L lie in (0, 1) and sum to 1; 1[·] represents the indicator function; N_c represents the count of all class-c targets in the statistics set; and c is one of the classes in {1, …, C};
letting U_s be the sample set formed by selecting the initially selected samples by comparing the similarity of the inference results according to the prediction consistency evaluation index, and considering the most confident class of the original-image and enhanced-image predictions:
where q(x_i) represents the target class distribution of sample x_i in the set U_s, and q_n(x_i) represents the n-th class target probability of sample x_i;
comparing the category distributions of the two datasets and selecting the samples with the largest difference to form the initially selected sample set of the current round U_r:
where S(·) represents the sampling function over the sample set U_s, which determines the subset of U_s that maximizes the JS divergence between the category distributions, and JS(·‖·) represents the divergence computation.
6. The kitchen waste target detection method based on the active learning selection strategy according to claim 1, wherein the active learning model comprises a variational autoencoder (VAE) and a discriminator of a generative adversarial network (GAN); the VAE downsamples the input data and maps it to a low-dimensional latent space, and the resulting latent-space vector is used by the GAN discriminator to judge whether the data is labeled; during training, the input data of the VAE and the discriminator come from the labeled set L and the unlabeled set U, and during active learning sample selection, the input data come from the initially selected sample set U_r.
During active learning sample selection, if the latent-space vector obtained by VAE encoding of an unlabeled sample differs strongly from the latent-space vectors learned by the GAN discriminator, i.e. the discriminator predicts a low confidence that the sample is labeled, then the samples for which the discriminator outputs the lowest confidence, up to the round's annotation budget, are selected from the initially selected sample set of the current round as the final selected samples.
CN202410900338.6A 2024-07-05 2024-07-05 Kitchen waste target detection method based on active learning selection strategy Active CN118429625B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410900338.6A CN118429625B (en) 2024-07-05 2024-07-05 Kitchen waste target detection method based on active learning selection strategy

Publications (2)

Publication Number Publication Date
CN118429625A CN118429625A (en) 2024-08-02
CN118429625B true CN118429625B (en) 2024-10-11




