
CN116958889A - Semi-supervised small sample target detection method based on pseudo tag - Google Patents

Semi-supervised small sample target detection method based on pseudo tag

Info

Publication number
CN116958889A
CN116958889A
Authority
CN
China
Prior art keywords
network
pseudo
image
target detection
small sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310919191.0A
Other languages
Chinese (zh)
Inventor
李忠辉
曹志强
唐英博
王硕
亢晋立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Nengchuang Technology Co ltd
Institute of Automation of Chinese Academy of Science
Original Assignee
Beijing Nengchuang Technology Co ltd
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Nengchuang Technology Co ltd, Institute of Automation of Chinese Academy of Science filed Critical Beijing Nengchuang Technology Co ltd
Priority to CN202310919191.0A priority Critical patent/CN116958889A/en
Publication of CN116958889A publication Critical patent/CN116958889A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the field of computer vision, and in particular relates to a pseudo-label-based semi-supervised small sample target detection method. It aims to solve the problem that, when labeled samples of new-class target objects are limited, the new-class samples lack intra-class variation, which degrades small sample target detection performance. The method comprises the following steps: acquiring an image of the scene to be detected through a vision sensor as a first image; feeding the first image into the student network of a trained pseudo-label-based semi-supervised small sample target detection network to obtain the target detection result corresponding to the scene image to be detected. The pseudo-label-based semi-supervised small sample target detection network comprises a TFA network, a teacher network and a student network, and the three networks share the same network structure. The application can effectively improve the adaptability of the small sample target detection network to new-class objects, and provides technical support for small sample target detection by robots in fields such as home service and office work.

Description

Semi-supervised small sample target detection method based on pseudo tag
Technical Field
The application belongs to the field of computer vision, and particularly relates to a semi-supervised small sample target detection method, system and device based on a pseudo tag.
Background
With the rapid development of artificial intelligence technology, the representation learning ability of deep neural networks keeps improving, and target detection networks trained on large-scale labeled data sets achieve good detection performance. However, building a large-scale labeled data set is time-consuming and labor-intensive, and when objects span many categories and data are hard to acquire, it is difficult to construct a sample set that meets training requirements. For example, a service robot working in actual environments such as homes and offices faces objects of diverse categories and varied appearance, and it is difficult to collect, for every object, training samples that are both sufficient in quantity and rich in intra-class variation. Therefore, how to detect new classes of target objects with only a small amount of annotation data (i.e., small sample target detection) has attracted the attention of researchers.
For small sample target detection, researchers at home and abroad have carried out intensive research, learning class-representative features from a small number of samples by combining learning strategies such as meta-learning or transfer learning. Among them, small sample target detection methods based on transfer learning are widely adopted for their simplicity and efficiency, a representative method being TFA (Two-stage Fine-tuning Approach). TFA is built on the Faster R-CNN target detection network and mainly consists of a feature extractor, a region proposal network RPN (Region Proposal Network), RoI (Region of Interest) Pooling, a classifier and a regressor. TFA divides the training process into two stages: in the first stage, training is performed on a base class data set containing rich annotation data to obtain a general object detector; in the second stage, the weight parameters of the feature extractor, the RPN and the RoI Pooling are frozen, and the classifier and the regressor are fine-tuned with a small amount of labeled data of new-class target objects, thereby realizing detection of new-class target objects. Because labeled data of new-class target objects are scarce and lack intra-class variation, the fine-tuned network has difficulty capturing class-discriminative features from the limited samples of new-class target objects, so the network easily adapts poorly to new classes. In practice, the base class data set contains a large number of images with new-class target objects; these new-class target objects are not labeled and are therefore treated as background in the first-stage training. If these new-class objects could be automatically labeled and learned from, network performance would improve. Here, semi-supervised learning provides an effective solution. Semi-supervised learning generally adopts collaborative training of a teacher network and a student network, in which the teacher network generates pseudo labels for unlabeled new-class target objects and these pseudo labels are used to further train the student network, thus alleviating the scarcity of new-class labeled data.
Therefore, those skilled in the art need to combine semi-supervised learning with small sample target detection methods to fully mine the large number of unlabeled new-class target objects in the base class data set, so as to solve the problem that new-class samples lack intra-class variation when labeled samples of new-class target objects are limited, which degrades small sample target detection performance.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that new-class samples lack intra-class variation when labeled samples of new-class target objects are limited, which degrades small sample target detection performance, the application provides a pseudo-label-based semi-supervised small sample target detection method, comprising the following steps:
step S10, acquiring an image of a scene to be detected through a vision sensor as a first image;
step S20, the first image is sent into a student network in a trained semi-supervised small sample target detection network based on pseudo labels, and a target detection result corresponding to the scene image to be detected is obtained;
the semi-supervised small sample target detection network based on the pseudo tag comprises a pre-training TFA network, a teacher network and a student network; the network structures of the pre-training TFA network, the teacher network and the student network are the same.
In some preferred embodiments, the student network is trained by:
step A10, training data is obtained, wherein the training data comprises a non-labeling training image and a small number of new class labeling samples; inputting the unlabeled training image into a pre-trained TFA network to obtain a predicted result of a new type of target object in the unlabeled training image, wherein the predicted result is used as a first predicted result; the non-labeling training image is an image containing a new type of target object in a base type data set of the COCO data set;
step A20, removing first object candidate frames with confidence scores lower than a set background threshold value from the first prediction result, calculating the values of the rest similar first object candidate frames at set quantiles, taking the values as primary class self-adaptive thresholds of the new classes, and taking the first object candidate frames with confidence scores higher than the primary class self-adaptive thresholds of the corresponding classes as first pseudo tags; the first object candidate frame is an object candidate frame corresponding to the first prediction result;
step A30, respectively carrying out weak data enhancement and strong data enhancement treatment on the non-labeling training image to obtain an image with reinforced weak data and an image with reinforced strong data; inputting the image with the reinforced weak data into a teacher network to obtain a prediction result of a new type of target object in the image with the reinforced weak data, and taking the prediction result as a second prediction result;
step A40, removing a second object candidate frame with the confidence score lower than a set background threshold value from the second prediction result, and taking a second object candidate frame with the confidence score higher than an advanced class self-adaptive threshold value of the corresponding class as a second pseudo tag in the rest second object candidate frames; the second object candidate frame is an object candidate frame corresponding to the second prediction result;
step A50, fusing the first pseudo tag and the second pseudo tag to obtain a third pseudo tag; the third pseudo tag and the strong-data-enhanced image form a new-class unlabeled sample; the new-class unlabeled sample and the small number of new-class labeled samples are input into the student network, a loss value is calculated, the weight parameters of the student network are updated, and the weight parameters of the teacher network are updated using the updated weight parameters of the student network;
step A60, the steps A10 to A50 are circulated until a trained student network is obtained.
In some preferred embodiments, the values of the remaining homogeneous first object candidate boxes at the set quantiles are calculated by:
and calculating the values of the remaining same-class first object candidate boxes at the set quantile using the quantile function of the Python third-party data analysis library pandas.
In some preferred embodiments, among the remaining second object candidate boxes, a second object candidate box having a confidence score higher than the advanced class adaptive threshold of the corresponding class is used as the second pseudo tag, by:
initializing the advanced class-adaptive threshold with the primary class-adaptive threshold, and dynamically adjusting and updating it during training; the dynamic adjustment and update process is as follows:
recording the number of second pseudo labels of the j-th new class generated over n accumulated mini-batches, where n is the number of accumulated mini-batches; when this number exceeds the maximum number of pseudo labels N_max, recomputing the class-adaptive quantile from the number of candidate pseudo labels of the j-th new class accumulated over the n mini-batches;
when this number falls below the minimum number of pseudo labels N_min, likewise recomputing the class-adaptive quantile; in other cases, the class-adaptive quantile remains unchanged;
further, using the quantile function of the Python third-party data analysis library pandas, calculating the value at the class-adaptive quantile of the second object candidate boxes that belong to the same new class and whose confidence scores are not lower than the background threshold τ_b, and updating the advanced class-adaptive threshold with this value;
and taking the second object candidate boxes in the second prediction result whose confidence scores are higher than the advanced class-adaptive threshold of the corresponding class as second pseudo tags.
In some preferred embodiments, the first pseudo tag and the second pseudo tag are fused to obtain a third pseudo tag, and the method comprises the following steps:
combining the first pseudo tag and the second pseudo tag, and taking the combined result as a pseudo tag candidate set;
in the pseudo tag candidate set, suppressing pseudo tags with lower confidence scores that belong to the same object using non-maximum suppression (NMS), thereby obtaining the tag fusion result, namely the third pseudo tag; two pseudo tags in the pseudo tag candidate set are judged to belong to the same object if they are of the same category and their intersection over union (IoU) is not lower than the set IoU threshold.
In some preferred embodiments, the student network has a loss function during training of:
L_total = L_sup + λ_u·L_unsup
wherein L_total is the total loss of the student network during training, L_sup is the supervised loss on the new-class labeled samples, L_unsup is the unsupervised loss on the new-class unlabeled samples, and λ_u is the loss weight hyper-parameter; N_l and N_u respectively denote the numbers of new-class labeled samples and new-class unlabeled samples in a mini-batch; the supervised loss and the unsupervised loss are each composed of the RPN classification loss, the RPN regression loss, the RoI head classification loss and the RoI head regression loss, computed on the new-class labeled samples and on the new-class unlabeled samples, respectively.
In some preferred embodiments, the weight parameters of the student network are updated, and the weight parameters of the teacher network are updated by using the updated weight parameters of the student network, and the method comprises the following steps:
θ_st ← θ_st − γ·∂L_total/∂θ_st
θ_t ← α·θ_t + (1−α)·θ_st
wherein γ is the learning rate hyper-parameter, α is the smoothing hyper-parameter, θ_st denotes the weight parameters of the student network, and θ_t denotes the weight parameters of the teacher network.
In a second aspect of the present application, a semi-supervised small sample target detection system based on pseudo labels is provided, including: the system comprises an image acquisition module and a small sample target detection module;
the image acquisition module is configured to acquire an image of a scene to be detected through the vision sensor and serve as a first image;
the small sample target detection module is configured to send the first image into a student network in a trained semi-supervised small sample target detection network based on a pseudo tag to obtain a target detection result corresponding to the scene image to be detected;
the semi-supervised small sample target detection network based on the pseudo tag comprises a pre-training TFA network, a teacher network and a student network; the network structures of the pre-training TFA network, the teacher network and the student network are the same.
In a third aspect of the present application, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement a pseudo tag based semi-supervised small sample target detection method as described above.
In a fourth aspect of the present application, a processing device is provided, including a processor and a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement a pseudo tag based semi-supervised small sample target detection method as described above.
The application has the beneficial effects that:
the application can effectively improve the adaptability of the small sample target detection network to new objects, and provides technical support for small sample target detection of robots in the fields of home service, office and the like.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
fig. 1 is a schematic flow chart of a semi-supervised small sample target detection method based on pseudo labels.
FIG. 2 is a schematic diagram of a semi-supervised small sample target detection system based on pseudo labels according to the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
The application discloses a semi-supervised small sample target detection method based on a pseudo tag, which is shown in fig. 1 and comprises the following steps:
step S10, acquiring an image of a scene to be detected through a vision sensor as a first image;
step S20, the first image is sent into a student network in a trained semi-supervised small sample target detection network based on pseudo labels, and a target detection result corresponding to the scene image to be detected is obtained;
the semi-supervised small sample target detection network based on the pseudo tag comprises a pre-training TFA network, a teacher network and a student network; the network structures of the pre-training TFA network, the teacher network and the student network are the same.
In order to more clearly describe the pseudo-label-based semi-supervised small sample target detection method of the present application, each step in the method embodiment of the present application is described in detail below with reference to fig. 1.
In the following embodiments, a detailed description is given of a construction and training process of a semi-supervised small sample target detection network based on pseudo labels, and then a description is given of an inference process of small sample target detection.
1. Construction and training process of semi-supervised small sample target detection network based on pseudo tag
This embodiment is a preferred implementation. A semi-supervised small sample target detection network is constructed in advance, comprising a pre-trained TFA network, a teacher network, a student network, a primary class-adaptive threshold screening module, an advanced class-adaptive threshold screening module and a label fusion module. The training procedure of the pre-trained TFA network follows X. Wang, T. E. Huang, T. Darrell, J. E. Gonzalez, and F. Yu. Frustratingly simple few-shot object detection. International Conference on Machine Learning, 2020: 9919-9928. The teacher network and the student network have the same network structure as the pre-trained TFA network; their weight parameters are initialized with the weight parameters of the pre-trained TFA network, and the weight parameters of the feature extractors of the teacher network and the student network are frozen during training. The teacher network and the student network follow the semi-supervised learning paradigm: the teacher network and the pre-trained TFA network generate pseudo labels for the large number of unlabeled new-class target objects in the base class data set, which are used to further train the student network, thereby improving the adaptability of the student network to new classes. The primary class-adaptive threshold screening module screens the pseudo labels generated by the pre-trained TFA network as first pseudo labels; the advanced class-adaptive threshold screening module screens the pseudo labels generated by the teacher network as second pseudo labels; and the label fusion module fuses the first pseudo labels and the second pseudo labels into third pseudo labels, providing a better supervision signal for training the student network. The student network updates its network weights by stochastic gradient descent (SGD) under the guidance of a loss function (the specific loss function setting is detailed below); the teacher network directly updates its weight parameters from the weight parameters of the student network so as to generate better pseudo labels. The training process of the pseudo-label-based semi-supervised small sample target detection network is as follows:
step A10, training data is obtained, wherein the training data comprises a non-labeling training image and a small number of new class labeling samples; inputting the unlabeled training image into a pre-trained TFA network to obtain a predicted result of a new type of target object in the unlabeled training image, wherein the predicted result is used as a first predicted result; the non-labeling training image is an image containing a new type of target object in a base type data set of the COCO data set;
in the present embodiment, first, an image x containing a new class of target object in a base class data set for a COCO data set u Generating x using a pre-trained TFA network u Prediction result y of new type target object 0 I.e. the first prediction result (position, category and corresponding confidence score of the object candidate box containing the new class of target object).
Step A20, removing first object candidate frames with confidence scores lower than a set background threshold value from the first prediction result, calculating the values of the rest similar first object candidate frames at set quantiles, taking the values as primary class self-adaptive thresholds of the new classes, and taking the first object candidate frames with confidence scores higher than the primary class self-adaptive thresholds of the corresponding classes as first pseudo tags; the first object candidate frame is an object candidate frame corresponding to the first prediction result;
in this embodiment, confidence score statistics is performed on the object candidate frames corresponding to the first prediction result (i.e., the first object candidate frames) according to the categories, and confidence scores lower than τ are removed b Wherein τ b As a background threshold constant, the threshold value is preferably set to 0.1 in the application; combining the rest first object candidate frames belonging to the same new class, and marking as wherein />Confidence score for the ith first object candidate box of the jth new class, j=1, 2, …, C New ,C New Is the number of categories of the new category. Calculation of +_In the use of the quantile function in the python third party data analysis tool library, pandas. DataFrame. Quat (see https:// pandas. Pydata. Org/docs/reference/api/pandas. DataFrame. Quat. Html for details of implementation)>The value at the q-ary point is used as the primary class adaptive threshold value of the jth new class +.>Wherein q-ary point is preferably set to 0.85 in the present application. Taking the calculated primary class self-adaptive threshold value of each new class as a screening threshold value of a primary class self-adaptive threshold value screening module, and pre-training a prediction result y of the TFA network 0 First object candidate boxes with medium confidence scores higher than the corresponding class screening threshold are used as first pseudo tags +.>
Step A30, respectively carrying out weak data enhancement and strong data enhancement treatment on the non-labeling training image to obtain an image with reinforced weak data and an image with reinforced strong data; inputting the image with the reinforced weak data into a teacher network to obtain a prediction result of a new type of target object in the image with the reinforced weak data, and taking the prediction result as a second prediction result;
in the present embodiment, the image x containing the new class target object in the base class data set for the COCO data set u The weak data enhancement and the strong data enhancement are respectively utilized (the data enhancement mode is shown in Y.C.Liu, C.Y.Ma, Z.He, C.W.Kuo, K.Chen, P.Zhang, B.Wu, Z.Kira, and P.Vajda.Unbiased teacher for semi-supervised object detection.International Conference on Learning Representations, 2021) to obtain the image with the weak data enhancementAnd strong data enhanced image ++>Teacher network is directed at->Generating a prediction result y of a new type of target object u As a second prediction result (position, category, and corresponding confidence score of the object candidate frame including the new class target object).
Step A40, removing a second object candidate frame with the confidence score lower than a set background threshold value from the second prediction result, and taking a second object candidate frame with the confidence score higher than an advanced class self-adaptive threshold value of the corresponding class as a second pseudo tag in the rest second object candidate frames; the second object candidate frame is an object candidate frame corresponding to the second prediction result;
in this embodiment, the second prediction result y is first u The confidence score of the removal of the corresponding object candidate box (i.e., the second object candidate box) is less than τ b The object candidate frame of the (2) is screened by an advanced category self-adaptive threshold screening module to generate a second pseudo tagClasses in which advanced class adaptive threshold screening modulesThe q-class dividing point and the screening threshold of the primary class self-adaptive threshold screening module are used as initialization respectively for the class self-adaptive dividing point and the advanced class self-adaptive threshold, and the class self-adaptive dividing point and the advanced class self-adaptive threshold are dynamically adjusted and updated in the training process. The specific dynamic adjustment and update process is as follows:
The number of second pseudo labels of the j-th new class generated over n mini-batches accumulated by the advanced class-adaptive threshold screening module is recorded, where n is the number of accumulated mini-batches, preferably set to 500 in the present application. When this number exceeds the maximum number of pseudo labels N_max (preferably set to 250 in the present application), the class-adaptive quantile is recomputed from the number of candidate pseudo labels of the j-th new class accumulated over the n mini-batches; when it falls below the minimum number of pseudo labels N_min (preferably set to 150 in the present application), the class-adaptive quantile is likewise recomputed; in other cases, the class-adaptive quantile remains unchanged. Further, using the quantile function pandas.DataFrame.quantile of the pandas library, the value at the class-adaptive quantile of the second object candidate boxes that belong to the same new class and whose confidence scores are not lower than τ_b is calculated and used to update the advanced class-adaptive threshold. The second object candidate boxes in the second prediction result y_u whose confidence scores are higher than the advanced class-adaptive threshold of the corresponding class are taken as the second pseudo labels.
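The advanced screening module can be pictured with the sketch below. It is only an illustration: the per-class bookkeeping structure and the recompute_quantile callback are assumptions introduced here, and the exact recomputation formula of the class-adaptive quantile is not reproduced; only the trigger conditions (more than N_max or fewer than N_min accumulated pseudo labels over n mini-batches), the τ_b filter, and the pandas-quantile threshold update follow the description above (pd.Series.quantile is used here as the equivalent of the pandas.DataFrame.quantile call cited in the text).

```python
import pandas as pd

TAU_B, N_ACCUM, N_MAX, N_MIN = 0.1, 500, 250, 150  # preferred values above

class AdvancedScreening:
    """Per-class state of the advanced class-adaptive threshold screening module (sketch)."""

    def __init__(self, init_quantiles, init_thresholds):
        # initialized from the primary module's quantile q and screening thresholds
        self.quantile = dict(init_quantiles)
        self.threshold = dict(init_thresholds)
        self.n_pl = {j: 0 for j in init_quantiles}           # accumulated second pseudo labels
        self.cand_scores = {j: [] for j in init_quantiles}   # accumulated candidate scores

    def screen(self, boxes):
        """boxes: list of dicts {'bbox': ..., 'class': j, 'score': s} with score >= TAU_B.
        Returns the second pseudo labels and accumulates per-class statistics."""
        kept = [b for b in boxes if b['score'] > self.threshold[b['class']]]
        for b in boxes:
            self.cand_scores[b['class']].append(b['score'])
        for b in kept:
            self.n_pl[b['class']] += 1
        return kept

    def update(self, recompute_quantile):
        """Called every N_ACCUM mini-batches; recompute_quantile is a user-supplied
        placeholder standing in for the patent's quantile recomputation rule."""
        for j in self.quantile:
            if self.n_pl[j] > N_MAX or self.n_pl[j] < N_MIN:
                self.quantile[j] = recompute_quantile(j, self.n_pl[j], len(self.cand_scores[j]))
            # otherwise the class-adaptive quantile remains unchanged
            scores = pd.Series([s for s in self.cand_scores[j] if s >= TAU_B])
            if not scores.empty:
                self.threshold[j] = scores.quantile(self.quantile[j])
            self.n_pl[j], self.cand_scores[j] = 0, []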
Step A50, fusing the first pseudo tag and the second pseudo tag to obtain a third pseudo tag; enhancing the third pseudo tag with the strong data imageA new type of non-labeling sample is formed, the new type of non-labeling sample and the small amount of new type of labeling sample are input into a student network, a loss value is calculated, the weight parameter of the student network is updated, and the weight parameter of a teacher network is updated by utilizing the updated weight parameter of the student network;
in this embodiment, the first pseudo tag is fused by the tag fusion moduleAnd a second pseudo tag->Fusion is carried out, in particular, by ∈ -> and />Merging, taking the merged result as a pseudo tag candidate set, adopting a Non-maximum suppression NMS (Non-Maximum Suppression) to suppress pseudo tags with low confidence scores belonging to the same object in the pseudo tag candidate set, and further obtaining a tag merging result>Namely a third pseudo tag, wherein the judgment standards of the two pseudo tags belonging to the same object in the pseudo tag candidate set are that the category of the two pseudo tags is the same and the intersection ratio IoU (Intersection over Union) is not lower than the set intersectionAnd a ratio threshold, wherein the ratio threshold is preferably 0.5 in the present application.
The third pseudo labels and the strong-data-enhanced image form a new-class unlabeled sample, which is input into the student network for training together with the small number of new-class labeled samples (x_l, y_l). The loss function of the student network is calculated as follows:
L_total = L_sup + λ_u·L_unsup
wherein L_total is the total loss of the student network during training, L_sup is the supervised loss on the new-class labeled samples, L_unsup is the unsupervised loss on the new-class unlabeled samples, and λ_u is the loss weight hyper-parameter, preferably set to 2 in the present application. The specific forms of the supervised loss and the unsupervised loss are given in formulas (2) and (3): each is composed of the RPN classification loss, the RPN regression loss, the RoI head classification loss and the RoI head regression loss, averaged over the N_l new-class labeled samples and over the N_u new-class unlabeled samples in a mini-batch, respectively, where N_l and N_u are both preferably set to 8 in the present application. The RPN classification loss preferably employs the standard cross-entropy loss, the RPN regression loss and the RoI head regression loss preferably employ the smooth-L1 loss, and the RoI head classification loss preferably employs the focal loss; see T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. Focal loss for dense object detection. IEEE International Conference on Computer Vision, 2017: 2980-2988.
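As a schematic of how the losses combine, the sketch below assumes (this aggregation is an assumption for illustration, not a formula quoted from the patent) that both the supervised and unsupervised branches average the four detector loss terms over their respective samples and are then combined with the weight λ_u = 2; detector_losses is a hypothetical callable standing in for the Faster R-CNN loss heads.

```python
LAMBDA_U = 2.0  # loss weight hyper-parameter (preferred value above)

def student_total_loss(labeled_batch, unlabeled_batch, detector_losses):
    """detector_losses(sample) is assumed to return a dict with the four terms
    rpn_cls, rpn_reg, roi_cls, roi_reg (cross-entropy / smooth-L1 / focal as above)."""
    def branch_loss(samples):
        if not samples:
            return 0.0
        total = 0.0
        for s in samples:
            terms = detector_losses(s)
            total += terms['rpn_cls'] + terms['rpn_reg'] + terms['roi_cls'] + terms['roi_reg']
        return total / len(samples)

    l_sup = branch_loss(labeled_batch)      # on the N_l new-class labeled samples
    l_unsup = branch_loss(unlabeled_batch)  # on the N_u pseudo-labeled samples
    return l_sup + LAMBDA_U * l_unsup
```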
Based on the total loss of the student network, the weight parameters θ_st of the student network are updated by stochastic gradient descent:
θ_st ← θ_st − γ·∂L_total/∂θ_st   (4)
wherein γ is the learning rate hyper-parameter, preferably set to 0.01 in the present application.
Further, θ_st is used to update the weight parameters θ_t of the teacher network:
θ_t ← α·θ_t + (1−α)·θ_st   (5)
wherein α is the smoothing hyper-parameter, preferably set to 0.999 in the present application.
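Formula (5) is an exponential moving average of the student weights. A minimal sketch follows (the plain-dict parameter representation is an assumption; a real implementation would iterate over framework tensors):

```python
ALPHA = 0.999  # smoothing hyper-parameter (preferred value above)

def ema_update(teacher_params, student_params, alpha=ALPHA):
    """In-place update: theta_t <- alpha * theta_t + (1 - alpha) * theta_st,
    where both arguments map parameter names to numeric arrays."""
    for name, theta_st in student_params.items():
        teacher_params[name] = alpha * teacher_params[name] + (1 - alpha) * theta_st
    return teacher_params
```

With α = 0.999 the teacher changes slowly relative to the SGD-updated student, which is what lets it produce the more stable pseudo labels that the method relies on.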
Step A60, the steps A10 to A50 are circulated until a trained student network is obtained.
2. Inference process for small sample target detection
Step S10, acquiring an image of a scene to be detected through a vision sensor as a first image;
in the present embodiment, an image of a scene to be detected is acquired as a first image by the vision sensor ZED 2.
And step S20, the first image is sent into a student network in a trained semi-supervised small sample target detection network based on the pseudo tag, and a target detection result corresponding to the scene image to be detected is obtained.
In this embodiment, the first image is sent to a student network in a trained semi-supervised small sample target detection network based on pseudo labels, so as to obtain a bounding box of a new class of target objects contained in the first image and a class corresponding to the bounding box, thereby realizing small sample target detection.
The application can effectively improve the adaptability of the small sample target detection network to new objects, and provides technical support for small sample target detection of service robots in the fields of home service, office work and the like.
A second embodiment of the present application is a semi-supervised small sample target detection system based on pseudo labels, as shown in fig. 2, including: an image acquisition module 100 and a small sample target detection module 200;
the image acquisition module 100 is configured to acquire an image of a scene to be detected through a vision sensor as a first image;
the small sample target detection module 200 is configured to send the first image to a student network in a trained semi-supervised small sample target detection network based on a pseudo tag, so as to obtain a target detection result corresponding to the scene image to be detected;
the semi-supervised small sample target detection network based on the pseudo tag comprises a pre-training TFA network, a teacher network and a student network; the network structures of the pre-training TFA network, the teacher network and the student network are the same.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working processes and related descriptions of the above-described system may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
It should be noted that, in the semi-supervised small sample target detection system based on the pseudo tag provided in the foregoing embodiment, only the division of the foregoing functional modules is illustrated, in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the modules or steps in the foregoing embodiment of the present application are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into a plurality of sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps related to the embodiments of the present application are merely for distinguishing the respective modules or steps, and are not to be construed as unduly limiting the present application.
A storage device according to a third embodiment of the present application stores therein a plurality of programs adapted to be loaded and executed by a processor to implement a pseudo tag-based semi-supervised small sample target detection method as described above.
A processing device according to a fourth embodiment of the present application includes a processor, a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement a pseudo tag based semi-supervised small sample target detection method as described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device and the related description of the foregoing description may refer to the corresponding process in the foregoing method example, which is not repeated herein.
Those of skill in the art will appreciate that the various illustrative modules, method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the program(s) corresponding to the software modules, method steps, may be embodied in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not intended to be limiting.
The terms "first," "second," "third," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
Thus far, the technical solution of the present application has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present application is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present application, and such modifications and substitutions will be within the scope of the present application.

Claims (10)

1. The semi-supervised small sample target detection method based on the pseudo tag is characterized by comprising the following steps of:
step S10, acquiring an image of a scene to be detected through a vision sensor as a first image;
step S20, the first image is sent into a student network in a trained semi-supervised small sample target detection network based on pseudo labels, and a target detection result corresponding to the scene image to be detected is obtained;
the semi-supervised small sample target detection network based on the pseudo tag comprises a pre-training TFA network, a teacher network and a student network; the network structures of the pre-training TFA network, the teacher network and the student network are the same.
2. The semi-supervised small sample target detection method based on the pseudo tag as set forth in claim 1, wherein the training method of the student network is as follows:
step A10, training data is obtained, wherein the training data comprises a non-labeling training image and a small number of new class labeling samples; inputting the unlabeled training image into a pre-trained TFA network to obtain a predicted result of a new type of target object in the unlabeled training image, wherein the predicted result is used as a first predicted result; the non-labeling training image is an image containing a new type of target object in a base type data set of the COCO data set;
step A20, removing first object candidate frames with confidence scores lower than a set background threshold value from the first prediction result, calculating the values of the rest similar first object candidate frames at set quantiles, taking the values as primary class self-adaptive thresholds of the new classes, and taking the first object candidate frames with confidence scores higher than the primary class self-adaptive thresholds of the corresponding classes as first pseudo tags; the first object candidate frame is an object candidate frame corresponding to the first prediction result;
step A30, respectively carrying out weak data enhancement and strong data enhancement treatment on the non-labeling training image to obtain an image with reinforced weak data and an image with reinforced strong data; inputting the image with the reinforced weak data into a teacher network to obtain a prediction result of a new type of target object in the image with the reinforced weak data, and taking the prediction result as a second prediction result;
step A40, removing a second object candidate frame with the confidence score lower than a set background threshold value from the second prediction result, and taking a second object candidate frame with the confidence score higher than an advanced class self-adaptive threshold value of the corresponding class as a second pseudo tag in the rest second object candidate frames; the second object candidate frame is an object candidate frame corresponding to the second prediction result;
step A50, fusing the first pseudo tag and the second pseudo tag to obtain a third pseudo tag; the third pseudo tag and the strong-data-enhanced image form a new-class unlabeled sample; the new-class unlabeled sample and the small number of new-class labeled samples are input into the student network, a loss value is calculated, the weight parameters of the student network are updated, and the weight parameters of the teacher network are updated using the updated weight parameters of the student network;
step A60, the steps A10 to A50 are circulated until a trained student network is obtained.
3. The pseudo-label-based semi-supervised small sample target detection method, characterized in that the values of the remaining same-class first object candidate boxes at the set quantile are calculated by:
and calculating the values of the remaining same-class first object candidate boxes at the set quantile using the quantile function of the Python third-party data analysis library pandas.
4. The method for detecting the semi-supervised small sample target based on the pseudo tag according to claim 2, wherein the second object candidate frame with the confidence score higher than the high-level class adaptive threshold value of the corresponding class is taken as the second pseudo tag in the rest second object candidate frames, and the method comprises the following steps:
initializing the advanced class-adaptive threshold with the primary class-adaptive threshold, and dynamically adjusting and updating it during training; the dynamic adjustment and update process is as follows:
recording the number of second pseudo labels of the j-th new class generated over n accumulated mini-batches, where n is the number of accumulated mini-batches; when this number exceeds the maximum number of pseudo labels N_max, recomputing the class-adaptive quantile from the number of candidate pseudo labels of the j-th new class accumulated over the n mini-batches;
when this number falls below the minimum number of pseudo labels N_min, likewise recomputing the class-adaptive quantile; in other cases, the class-adaptive quantile remains unchanged;
further, using the quantile function of the Python third-party data analysis library pandas, calculating the value at the class-adaptive quantile of the second object candidate boxes that belong to the same new class and whose confidence scores are not lower than the background threshold τ_b, and updating the advanced class-adaptive threshold with this value;
and taking the second object candidate boxes in the second prediction result whose confidence scores are higher than the advanced class-adaptive threshold of the corresponding class as second pseudo tags.
5. The method for detecting the semi-supervised small sample target based on the pseudo tag according to claim 2 is characterized in that the first pseudo tag and the second pseudo tag are fused to obtain a third pseudo tag, and the method comprises the following steps:
combining the first pseudo tag and the second pseudo tag, and taking the combined result as a pseudo tag candidate set;
in the pseudo tag candidate set, suppressing pseudo tags with lower confidence scores that belong to the same object using non-maximum suppression (NMS), thereby obtaining the tag fusion result, namely the third pseudo tag; two pseudo tags in the pseudo tag candidate set are judged to belong to the same object if they are of the same category and their intersection over union (IoU) is not lower than the set IoU threshold.
6. The semi-supervised small sample target detection method based on pseudo labels as set forth in claim 2, wherein the student network has a loss function in the training process of:
L_total = L_sup + λ_u·L_unsup
wherein L_total is the total loss of the student network during training, L_sup is the supervised loss on the new-class labeled samples, L_unsup is the unsupervised loss on the new-class unlabeled samples, and λ_u is the loss weight hyper-parameter; N_l and N_u respectively denote the numbers of new-class labeled samples and new-class unlabeled samples in a mini-batch; the supervised loss and the unsupervised loss are each composed of the RPN classification loss, the RPN regression loss, the RoI head classification loss and the RoI head regression loss, computed on the new-class labeled samples and on the new-class unlabeled samples, respectively.
7. The pseudo-label-based semi-supervised small sample target detection method, characterized in that the weight parameters of the student network are updated, and the weight parameters of the teacher network are updated using the updated weight parameters of the student network, by the following method:
θ_t ← α·θ_t + (1−α)·θ_st
wherein γ is the learning rate hyper-parameter, α is the smoothing hyper-parameter, θ_st denotes the weight parameters of the student network, and θ_t denotes the weight parameters of the teacher network.
8. A pseudo tag based semi-supervised small sample target detection system, comprising: the system comprises an image acquisition module and a small sample target detection module;
the image acquisition module is configured to acquire an image of a scene to be detected through the vision sensor and serve as a first image;
the small sample target detection module is configured to send the first image into a student network in a trained semi-supervised small sample target detection network based on a pseudo tag to obtain a target detection result corresponding to the scene image to be detected;
the semi-supervised small sample target detection network based on the pseudo tag comprises a pre-training TFA network, a teacher network and a student network; the network structures of the pre-training TFA network, the teacher network and the student network are the same.
9. A storage device having a plurality of programs stored therein, wherein the programs are adapted to be loaded and executed by a processor to implement a pseudo tag based semi-supervised small sample target detection method as recited in any of claims 1-7.
10. A processing device, comprising a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; characterized in that the programs are adapted to be loaded and executed by the processor to implement the pseudo-label-based semi-supervised small sample target detection method of any one of claims 1-7.
CN202310919191.0A 2023-07-25 2023-07-25 Semi-supervised small sample target detection method based on pseudo tag Pending CN116958889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310919191.0A CN116958889A (en) 2023-07-25 2023-07-25 Semi-supervised small sample target detection method based on pseudo tag

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310919191.0A CN116958889A (en) 2023-07-25 2023-07-25 Semi-supervised small sample target detection method based on pseudo tag

Publications (1)

Publication Number Publication Date
CN116958889A true CN116958889A (en) 2023-10-27

Family

ID=88444127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310919191.0A Pending CN116958889A (en) 2023-07-25 2023-07-25 Semi-supervised small sample target detection method based on pseudo tag

Country Status (1)

Country Link
CN (1) CN116958889A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710970A (en) * 2024-02-05 2024-03-15 武汉互创联合科技有限公司 Embryo cell multinuclear target detection method based on semi-supervised algorithm
CN117710970B (en) * 2024-02-05 2024-05-03 武汉互创联合科技有限公司 Embryo cell multinuclear target detection method based on semi-supervised algorithm
CN117975241A (en) * 2024-03-29 2024-05-03 厦门大学 Directional target segmentation-oriented semi-supervised learning method

Similar Documents

Publication Publication Date Title
CN109741332B (en) Man-machine cooperative image segmentation and annotation method
CN108256561B (en) Multi-source domain adaptive migration method and system based on counterstudy
CN116958889A (en) Semi-supervised small sample target detection method based on pseudo tag
CN111161311A (en) Visual multi-target tracking method and device based on deep learning
US20230042187A1 (en) Behavior recognition method and system, electronic device and computer-readable storage medium
Obinata et al. Temporal extension module for skeleton-based action recognition
CN113221903B (en) Cross-domain self-adaptive semantic segmentation method and system
CN104778481A (en) Method and device for creating sample library for large-scale face mode analysis
CN108764084A (en) Video classification methods based on spatial domain sorter network and the time domain network integration
CN112949693A (en) Training method of image classification model, image classification method, device and equipment
CN104680193A (en) Online target classification method and system based on fast similarity network fusion algorithm
CN116977633A (en) Feature element segmentation model training method, feature element segmentation method and device
CN113191183A (en) Unsupervised domain false label correction method and unsupervised domain false label correction device in personnel re-identification
Xiao et al. Self-explanatory deep salient object detection
CN112949456B (en) Video feature extraction model training and video feature extraction method and device
CN111127432B (en) Medical image detection method, device, equipment and storage medium
CN113192108A (en) Human-in-loop training method for visual tracking model and related device
CN110458867B (en) Target tracking method based on attention circulation network
Nikpour et al. Deep reinforcement learning in human activity recognition: A survey
CN113052191A (en) Training method, device, equipment and medium of neural language network model
CN114022509A (en) Target tracking method based on monitoring videos of multiple animals and related equipment
VEERAPPAN et al. Fish counting through underwater fish detection using deep learning techniques
CN117746266B (en) Tree crown detection method, device and medium based on semi-supervised interactive learning
Gong et al. GAM-YOLOv7-tiny and Soft-NMS-AlexNet: Improved lightweight sheep body object detection and pose estimation network
CN114140654B (en) Image action recognition method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination