CN117975190B - Method and device for processing simulated learning mixed sample based on vision pre-training model - Google Patents
- Publication number
- CN117975190B (CN202311868899.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- expert
- network
- sample set
- training model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0475—Generative networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a method and a device for processing a simulated learning mixed sample based on a vision pre-training model. The method comprises the following steps: acquiring an expert sample set; adding target noise to the suboptimal expert samples to obtain noise expert samples, and obtaining a mixed sample set from the noise expert samples and the optimal expert samples; calibrating weight coefficients for the mixed sample set to obtain a redistributed mixed sample set; predicting and scoring the redistributed mixed sample set, and training a strategy network and a reward function network according to the scoring result; scoring each sample of an evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set; updating the weight coefficient corresponding to each sample in the redistributed mixed sample set; and finally performing imitation learning with the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples. By learning differentially from mixed expert samples of different quality, the method improves the sample distribution of the data set and the generalization capability of the imitation learning agent.
Description
Technical Field
The invention relates to the technical field of deep learning, in particular to a method and a device for processing a simulated learning mixed sample based on a vision pre-training model.
Background
With the continuous development of advanced technologies such as robots, autonomous vehicles, and game-playing agents, how to complete complex decision tasks and adapt quickly to environmental changes has become an important research problem.
Generative adversarial imitation learning is one of the representative imitation learning algorithms. Borrowing the idea of the Generative Adversarial Network (GAN), it brings an agent close to the decision capability of an expert agent through adversarial training of the agent and a reward function. Generative adversarial imitation learning aims to imitate the behavior of an expert by training a generative model, so that tasks are solved automatically. In this method, a generative model is trained to generate samples similar to expert behavior, and a discriminative model is used to distinguish the samples produced by the generative model from the real expert samples, which pushes the generative model gradually toward the expert behavior.
In the related art, generative adversarial imitation learning relies on high-quality data. In many task scenarios, labor cost limits the number of optimal expert samples, so there are not enough of them to train the imitation learning agent sufficiently. Common expert data sets usually contain suboptimal expert samples, and imitating these samples degrades the performance of the imitation learning algorithm, so that the related processing is inefficient. In addition, data in the expert data set generalizes poorly in a high-dimensional state space: as the dimension of the state space increases, for example in a visual observation task scenario (where the state is represented as a high-dimensional image), redundant information about the environment increases greatly, and changes in the agent itself become difficult to capture.
Disclosure of Invention
The invention provides a sample processing method and device based on a generative adversarial imitation learning algorithm, to address the following defects of the prior art: the generative adversarial imitation learning algorithm depends on high-quality data, but such data is scarce, and low-quality data prevents the imitation learning agent from being fully trained, so the training effect is poor; in addition, existing expert data sets generalize poorly in a high-dimensional state space, so the trained agent generalizes poorly. The invention improves the quality of the expert sample set and thereby the generalization capability of the imitation learning agent.
The invention provides a method for processing a simulated learning mixed sample based on a vision pre-training model, which comprises the following steps:
acquiring an expert sample set, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample;
adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample according to a generative adversarial network to obtain a mixed sample set;
calibrating weight coefficients for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to a reward function network to obtain a scoring result; training the strategy network and the reward function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target reward function network;
Scoring each sample in an evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating ranking error loss according to the prediction ranking, updating weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization to obtain redistributed weight coefficients, and performing simulated learning on the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set.
The invention also provides a method for processing the simulated learning mixed sample based on the vision pre-training model, wherein the method for acquiring the expert sample set comprises the following steps:
performing forward inference on each sample in the expert sample set based on the vision pre-training model, and determining the feature map of an intermediate network layer as the effective features of the expert sample set.
The invention also provides a visual pre-training model-based simulated learning mixed sample processing method, wherein the generative adversarial network comprises a generation network and a discrimination network;
processing the noise expert sample and the optimal expert sample according to the generative adversarial network to obtain a mixed sample set includes:
performing feature extraction on the noise expert samples and the optimal expert samples according to the generation network to obtain a state characterization vector;
discriminating the source of the state characterization vector according to the discrimination network to obtain a discrimination result;
calculating a loss value of the generation network according to the discrimination result and the first loss function, and performing parameter optimization on the generation network according to that loss value to obtain the optimized generation network; calculating a loss value of the discrimination network according to the discrimination result and the second loss function, and performing parameter optimization on the discrimination network according to that loss value to obtain the optimized discrimination network, so as to output the mixed sample set;
the first loss function is determined based on the occupancy metric of the optimal expert sample state-action pairs, the occupancy metric after adding normally distributed noise, the state characterization vector, the conditional probability distribution corresponding to the state characterization vector, and the discrimination network parameters; the second loss function is determined based on the occupancy metric of the optimal expert sample state-action pairs, the occupancy metric after adding normally distributed noise, the state characterization vector, the conditional probability distribution corresponding to the state characterization vector, and the generation network parameters.
The invention also provides a method for processing the simulated learning mixed sample based on the vision pre-training model, wherein the first loss function comprises:
(formula not reproduced in this text)
the second loss function includes:
(formula not reproduced in this text)
where E is the expectation, D(z) is the discrimination network, ρ*(s, a) is the occupancy metric of the optimal expert sample state-action pairs, ρ'(s, a) is the occupancy metric after adding normally distributed noise, z is the state characterization vector, p(z|x) is the conditional probability distribution corresponding to the state characterization vector, θ_D is the discrimination network parameter, and θ_E is the generation network parameter.
The invention also provides a method for processing the simulated learning mixed sample based on the vision pre-training model, wherein the first loss function further comprises a target regularization term;
the target regularization term is determined according to the KL divergence corresponding to the optimal expert sample and the state characterization vector;
the second loss function further includes the target regularization term.
The invention also provides a method for processing the simulated learning mixed sample based on the vision pre-training model, wherein the evaluation data set is obtained from the expert sample set according to a prior ranking;
calculating the ranking error loss according to the prediction ranking and updating, through gradient optimization, the weight coefficient corresponding to each sample in the redistributed mixed sample set to obtain the redistributed weight coefficients comprises:
calculating a ranking difference loss from the prediction ranking, the prior ranking, and a ranking loss function;
determining the redistributed weight coefficients according to the ranking difference loss;
the ranking loss function includes:
(formula not reproduced in this text)
where i and j are sample indices, η_{ζ_i} is the true cumulative expected return of the i-th sample, η'_{ζ_i} is the predicted cumulative expected return of the i-th sample, η_{ζ_j} is the true cumulative expected return of the j-th sample, and η'_{ζ_j} is the predicted cumulative expected return of the j-th sample.
The invention also provides a device for processing the simulated learning mixed sample based on the vision pre-training model, which comprises the following components:
the sample acquisition module is used for acquiring an expert sample set, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample, and each sample in the expert sample set corresponds to different weight coefficients;
The first processing module is used for adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample according to a generative adversarial network to obtain a mixed sample set;
the second processing module is used for calibrating weight coefficients for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to a reward function network to obtain a scoring result; training the strategy network and the reward function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target reward function network;
The third processing module is used for scoring each sample in the evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating ranking error loss according to the prediction ranking, updating weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization to obtain redistributed weight coefficients, and performing imitation learning on the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a method of processing a simulated learning hybrid sample based on a visual pre-training model as described in any of the above when the program is executed.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of processing a simulated learning hybrid sample based on a visual pre-training model as described in any one of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a method of processing a simulated learning mixture based on a visual pre-training model as described in any one of the above.
According to the method and device for processing a simulated learning mixed sample based on a vision pre-training model provided by the invention, target noise is added to the suboptimal expert samples, which improves their data distribution. The generative adversarial network is parameter-optimized on the noise expert samples and the optimal expert samples to obtain a mixed sample set, which improves the feature distribution of the suboptimal expert samples. The weight coefficient of each sample in the mixed sample set is calibrated, the calibrated mixed sample set is predicted and scored according to the strategy network and the reward function network, and both networks are updated. The samples in the evaluation data set are then scored according to the target reward function network to obtain the corresponding prediction ranking, and the weight coefficient corresponding to each sample in the redistributed mixed sample set is updated according to that ranking. Finally, imitation learning is performed with the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples. Learning differentially from mixed expert samples of different quality in this way improves the sample distribution of the data set and the generalization capability of the imitation learning agent.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow diagram of a method for processing a simulated learning mixture based on a visual pre-training model according to the present invention;
FIG. 2 is a second flow chart of a method for processing a simulated learning mixture based on a visual pre-training model according to the present invention;
FIG. 3 is a schematic flow chart of extracting an intermediate layer feature map based on a vision pre-training model provided by the invention;
FIG. 4 is a third flow chart of a method for processing a simulated learning mixture based on a visual pre-training model according to the present invention;
FIG. 5 is a schematic diagram of a device for processing a simulated learning mixture based on a visual pre-training model according to the present invention;
FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The following describes the method and apparatus for processing a simulated learning mixture based on a visual pre-training model according to the present invention with reference to fig. 1 to 5.
Fig. 1 is a schematic flow chart of a method for processing a simulated learning mixed sample based on a vision pre-training model, as shown in fig. 1, which comprises the following steps:
Step 110, an expert sample set is obtained, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample.
In this step, the expert sample set may include training samples or test samples required for at least one of the following scenarios: image processing tasks such as image classification, object detection, image segmentation, and the like.
For example, for an image classification task, the expert sample set includes image data to be classified and sample image data thereof, and for an image segmentation task, the expert sample set includes image data to be segmented and sample image data thereof.
In this embodiment, the optimal expert samples are of higher quality than the sub-optimal expert samples, e.g. the samples contain more data content and more useful information, etc.
Step 120, adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample according to the generative adversarial network to obtain a mixed sample set.
In this step, the target noise includes normally distributed noise, such as Gaussian noise or Gaussian white noise.
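As an illustration of this step, the following is a minimal sketch of adding zero-mean Gaussian noise to suboptimal expert samples. It assumes each sample has already been flattened into a list of floats; the function names, the `sigma` value, and the fixed seed are hypothetical choices for the example and are not specified by the patent.

```python
import random


def add_gaussian_noise(sample, sigma, rng):
    """Return a copy of `sample` with independent N(0, sigma^2) noise on each entry."""
    return [x + rng.gauss(0.0, sigma) for x in sample]


def build_noise_expert_samples(suboptimal_samples, sigma=0.1, seed=0):
    """Apply Gaussian noise to every suboptimal sample (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [add_gaussian_noise(s, sigma, rng) for s in suboptimal_samples]


# Two toy "suboptimal expert samples", each a flattened feature vector.
suboptimal = [[0.2, 0.4, 0.6], [0.1, 0.9, 0.5]]
noise_expert_samples = build_noise_expert_samples(suboptimal, sigma=0.05)
```

The optimal expert samples are left untouched in this sketch; only the suboptimal ones are perturbed before both groups are passed on to the next stage.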
In this embodiment, the generative adversarial network includes a generation network and a discrimination network. The generation network takes the noise expert samples and the optimal expert samples as input features and extracts a new state characterization; the discrimination network judges the source of the new state characterization; and the corresponding loss functions are calculated from the discrimination results, so that the parameters of the generation network and the discrimination network are optimized. By supervising and guiding the learning of the feature extractor in this way, the hidden-layer state characterization of the suboptimal expert samples approaches that of the optimal expert samples.
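The adversarial interplay described here can be sketched with the standard GAN cross-entropy objectives. This is an assumption: the patent's own first and second loss functions are not reproduced in this text, so the code below is not their exact form. The linear discriminator, the parameter list `theta_d`, and the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def discriminator(z, theta_d):
    """Toy linear D(z): probability that z was extracted from an optimal expert sample."""
    return sigmoid(sum(w * x for w, x in zip(theta_d, z)))


def discrimination_loss(z_optimal, z_noise, theta_d, eps=1e-8):
    """Standard GAN discriminator loss (assumed form): label optimal as 1, noise as 0."""
    real = sum(-math.log(discriminator(z, theta_d) + eps) for z in z_optimal)
    fake = sum(-math.log(1.0 - discriminator(z, theta_d) + eps) for z in z_noise)
    return real / len(z_optimal) + fake / len(z_noise)


def generation_loss(z_noise, theta_d, eps=1e-8):
    """Standard non-saturating generator loss (assumed form): the feature extractor is
    rewarded when characterizations of noise samples are judged to be optimal."""
    return sum(-math.log(discriminator(z, theta_d) + eps) for z in z_noise) / len(z_noise)


theta_d = [1.0, 0.0]
z_optimal = [[3.0, 0.0]]   # state characterization of an optimal expert sample
z_noise = [[-3.0, 0.0]]    # state characterization of a noise expert sample
d_loss = discrimination_loss(z_optimal, z_noise, theta_d)
g_loss = generation_loss(z_noise, theta_d)
```

In this toy setup the discriminator separates the two characterizations well, so `d_loss` is small and `g_loss` is large; training the generation network would lower `g_loss` by moving the noise-sample characterizations toward the optimal ones.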
In this embodiment, the behavior features of the suboptimal expert samples are compared with those of the optimal expert samples, a noise signal is used to simulate the difference between the two distributions, and the model remains compatible with the deep generative imitation learning algorithm architecture; specifically, by minimizing a distribution distance between the two distributions, such as the KL (Kullback-Leibler) divergence, the state characterization of the suboptimal expert samples after feature extraction is made to approach that of the optimal expert samples.
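To make the distribution distance concrete, the sketch below computes the closed-form KL divergence between two diagonal Gaussians. Modeling the two state characterization distributions as diagonal Gaussians is an assumption made for illustration; the patent does not state which family is used.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for diagonal Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        # Per-dimension closed form: log(sq/sp) + (sp^2 + (mp - mq)^2) / (2 sq^2) - 1/2
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


# Characterization statistics of suboptimal (P) versus optimal (Q) samples, toy values.
kl_far = gaussian_kl([1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
kl_near = gaussian_kl([0.1, 0.1], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

The divergence shrinks as the suboptimal characterization statistics move toward the optimal ones, which is the behavior a regularization term of this kind would encourage.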
Step 130, calibrating weight coefficients for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to the reward function network to obtain a scoring result; training the strategy network and the reward function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target reward function network.
In this step, each expert sample corresponds to a different weight coefficient, and the higher the weight coefficient corresponding to the optimal expert sample in the expert sample set is, the higher the quality of the expert sample set is, which is more suitable as the input data of the image processing task.
For example, the weight coefficient β ∈ (0, 1); the closer the weight coefficient of a sample is to 1, the higher the quality of that sample.
In this step, the imitation learning strategy places more emphasis on imitating a sample the closer that sample's weight coefficient is to 1 (that is, the larger the sample's weight).
FIG. 2 is a second flow chart of a method for processing a simulated learning mixed sample based on a visual pre-training model according to the present invention. In the embodiment shown in FIG. 2, in the weight learning stage, a weight coefficient is calibrated for each sample in the mixed sample set to indicate the quality of the sample (initialized to 1, meaning that all samples are considered a priori to be of the same quality); calibrating the weight coefficients yields expert samples redistributed by weight coefficient, that is, the redistributed mixed sample set.
In this embodiment, the redistributed mixed sample set is taken as input into the policy network, predicting the corresponding action output.
For example, suppose the imitation learning scenario is a sequential decision task. Taking automatic driving as an example, the expert samples are road-condition images acquired by cameras, and the predicted actions are the steering wheel angle and the throttle force.
In the embodiment shown in FIG. 2, in the imitation learning stage, the action output of the strategy network, that is, the action prediction result, is input into the reward function network, which scores the different actions (for example, giving high scores to expert samples and low scores to samples generated by the strategy network). The strategy network and the reward function network are then updated by calculating the loss function, yielding the target strategy network and the target reward function network. This improves the ability of the strategy network to generate expert-like actions and the ability of the reward function to distinguish samples generated by the strategy network, that is, adversarial training.
Step 140, scoring each sample in the evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating a ranking error loss according to the prediction ranking, updating weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization to obtain redistributed weight coefficients, and performing imitation learning on the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set.
In this step, the evaluation data set is extracted from the mixed sample set by ranking (i.e., prior ranking) according to preset weight coefficients.
For example, after the target reward function network is obtained, a small portion of samples is extracted from the original mixed expert samples, and the relative quality of these samples is calibrated as prior knowledge to form the evaluation data set.
In this embodiment, the evaluation data set is obtained from the expert sample set according to a prior ranking. Calculating the ranking error loss according to the prediction ranking and updating, through gradient optimization, the weight coefficient corresponding to each sample in the redistributed mixed sample set to obtain the redistributed weight coefficients comprises: calculating a ranking difference loss from the prediction ranking, the prior ranking, and a ranking loss function; and determining the redistributed weight coefficients according to the ranking difference loss. The ranking loss function includes:
(formula not reproduced in this text)
where i and j are sample indices, η_{ζ_i} is the true cumulative expected return of the i-th sample, η'_{ζ_i} is the predicted cumulative expected return of the i-th sample, η_{ζ_j} is the true cumulative expected return of the j-th sample, and η'_{ζ_j} is the predicted cumulative expected return of the j-th sample.
In the embodiment shown in FIG. 2, the data of the evaluation data set is scored by the reward function network to obtain a prediction ranking. The prediction ranking is compared with the calibrated prior ranking to calculate the ranking difference loss, and the weight coefficients are updated using this loss. The smaller the loss, the better the reward function learned in the imitation learning stage, which indirectly indicates that the weight coefficients are reasonably set.
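The comparison between the prediction ranking and the prior ranking can be illustrated with a pairwise logistic ranking loss. This is a commonly used stand-in chosen for the sketch, not the patent's ranking loss function, whose formula is not reproduced in this text. The argument names mirror the true and predicted cumulative expected returns defined above.

```python
import math


def pairwise_ranking_loss(true_returns, predicted_returns):
    """Average logistic loss over all pairs (i, j) where sample i truly ranks above j.

    The loss is small when the predicted returns order each pair the same way as
    the prior (true) returns, and large when the order is reversed.
    """
    loss, pairs = 0.0, 0
    n = len(true_returns)
    for i in range(n):
        for j in range(n):
            if true_returns[i] > true_returns[j]:
                margin = predicted_returns[i] - predicted_returns[j]
                loss += math.log(1.0 + math.exp(-margin))
                pairs += 1
    return loss / pairs if pairs else 0.0


prior = [3.0, 2.0, 1.0]             # calibrated prior ranking of three evaluation samples
consistent = [3.0, 2.0, 1.0]        # reward network agrees with the prior ranking
reversed_scores = [1.0, 2.0, 3.0]   # reward network inverts the prior ranking
loss_consistent = pairwise_ranking_loss(prior, consistent)
loss_reversed = pairwise_ranking_loss(prior, reversed_scores)
```

Because the loss is differentiable in the predicted returns, a gradient-based update of the sample weight coefficients through the reward function network is possible in principle, which matches the gradient optimization mentioned in the text.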
In this embodiment, after the redistributed weight coefficients are obtained, the mixed samples corresponding to the redistributed weight coefficients are fed into the policy network for imitation learning. Imitation learning and weight learning are performed alternately until a preset number of iterations is reached, at which point training stops and the optimized expert samples are output.
In this embodiment, during training the weight coefficients are adaptively optimized so that the samples in the expert sample set are learned differentially, and the weight coefficient of each sample is redistributed, which in turn redistributes the current imitation learning policy. In each round of differential learning, the optimized generation network outputs a return value, and the optimized discrimination network re-ranks the weight coefficients of the samples according to that return value and the ranking loss function. Over multiple rounds of training, the ranking of the weight coefficients is adaptively optimized by maximizing the cumulative expected return, which improves the distribution of the samples in the expert sample set and thus the quality of the expert sample set. Here, the maximized cumulative expected return is the accumulated value of the expected returns predicted by the reward function trained with the adversarial inverse reinforcement learning algorithm.
Specifically, let the current mixed expert policy be denoted π_d, and consider the occupancy measure of this policy during its interaction with the environment. Redistributing the state-action pairs of π_d yields a new policy π_new with a corresponding occupancy measure. The imitation learning loss function after weight coefficient redistribution can be expressed as:
where L_imitation may be the loss function of any conventional imitation learning algorithm, such as behavior cloning or inverse reinforcement learning.
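As a rough illustration of an imitation loss reweighted by per-sample weight coefficients, the sketch below applies the weights to a behavior cloning loss. The squared-error form, the normalization by the weight sum, the scalar actions, and the name `weighted_bc_loss` are all assumptions made for illustration; the exact loss is not given in the text.

```python
def weighted_bc_loss(weights, predicted_actions, expert_actions):
    """Weighted mean of per-sample squared errors between the policy's
    predicted actions and the expert actions (scalar actions for brevity)."""
    if not (len(weights) == len(predicted_actions) == len(expert_actions)):
        raise ValueError("all inputs must have the same length")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        w * (p - e) ** 2
        for w, p, e in zip(weights, predicted_actions, expert_actions)
    )
    return weighted / weight_sum


pred = [0.0, 1.0]
expert = [0.0, 3.0]  # the second sample has squared error 4
uniform = weighted_bc_loss([1.0, 1.0], pred, expert)
downweighted = weighted_bc_loss([1.0, 0.0], pred, expert)
```

Setting the weight of a poorly matched sample to zero removes its contribution, which is the sense in which different expert samples are learned differentially.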
In this embodiment, learning the different expert samples differentially through the weight coefficients maximizes the performance of the imitation learning algorithm, that is, it maximizes the cumulative expected return. The specific optimization problem can be expressed as:
where β* denotes the optimal weight coefficient distribution, and the cumulative expected return of the redistributed policy π_new is expressed as:
where R is the reward function of the environment and γ is the discount factor. Because the purpose of weight learning is to maximize the performance of the imitation learning algorithm, the effect of the current weight coefficient learning can be assessed by evaluating that performance. Let the current time be t; s_t denotes the expert sample state information at time t, and a_t denotes the expert sample action information at time t.
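To make the role of R and γ concrete, the sketch below computes a cumulative return under the assumption that it takes the standard discounted form, i.e. the expectation over trajectories of the sum over t of γ^t R(s_t, a_t). The function names and the use of a plain average over sampled trajectories as the expectation are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated from the last step back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def expected_return(trajectories, gamma):
    """Monte Carlo estimate: mean discounted return over sampled trajectories."""
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


single = discounted_return([1.0, 1.0, 1.0], 0.5)    # 1 + 0.5 + 0.25
average = expected_return([[1.0], [3.0]], 0.9)      # mean of 1.0 and 3.0
```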
According to the simulated learning mixed sample processing method based on the vision pre-training model, target noise is added to the suboptimal expert samples to improve their data distribution, and the parameters of the countermeasure generation network are optimized on the noise expert samples and the optimal expert samples to obtain a mixed sample set, which improves the feature distribution of the suboptimal expert samples. A weight coefficient is calibrated for each sample in the mixed sample set; the calibrated mixed sample set is predicted and scored with the strategy network and the reward function network, and both networks are updated. Each sample in the evaluation data set is then scored with the target reward function network to obtain the corresponding predicted ranking, the weight coefficient of each sample in the redistributed mixed sample set is updated according to the predicted ranking, and imitation learning is performed with the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples. In this way, mixed expert samples of different quality can be learned differentially, the sample distribution of the data set is improved, and the generalization capability of the imitation learning agent is improved.
In some embodiments, obtaining the expert sample set includes: performing forward inference on each sample in the expert sample set based on the vision pre-training model, and determining the feature map of an intermediate layer of the network as the effective features of the expert sample set.
In this embodiment, the vision pre-training model is a deep learning model pre-trained on a large-scale data set; such models are commonly used to initialize network parameters for downstream tasks such as image classification, object detection, and image segmentation.
In this embodiment, general visual reinforcement learning methods use high-dimensional image information as the observation of the environment, which places high demands on the state abstraction of the feature extractor. Because the expressiveness, trainability, and capacity of the image feature extractor are key to algorithm generalization in generative adversarial imitation learning, performing generative imitation learning on features extracted by a model pre-trained on a large-scale visual data set can improve the generalization capability of the agent in a new task or a new environment.
FIG. 3 is a schematic flow chart of extracting an intermediate-layer feature map based on the vision pre-training model. In the embodiment shown in FIG. 3, the original feature map is input into the vision pre-training model (corresponding to a convolutional neural network pre-trained on a large-scale visual data set), only forward inference is performed on the input image data with fixed network parameters, and the feature map of an intermediate layer of the network is extracted as the visual observation input to the subsequent imitation learning algorithm.
In this embodiment, during inference only the normalization layers (BatchNorm) of the pre-trained model may be updated; that is, the mean and standard deviation statistics of the normalization layers of the pre-trained model continue to be updated during training. The updated BatchNorm statistics help the model adapt to shifts in the visual observations, thereby improving the generalization capability of the agent.
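The idea of keeping the learned parameters fixed while still refreshing the normalization statistics can be sketched with a scalar toy layer. This is a simplified, framework-free illustration under stated assumptions: the class `FrozenNormLayer`, its momentum-based running average, and normalization with the running statistics are hypothetical choices, not the layer used in the embodiment.

```python
class FrozenNormLayer:
    """Toy normalization layer: scale and shift stay fixed, while the
    running mean and variance are still refreshed from incoming batches."""

    def __init__(self, scale=1.0, shift=0.0, momentum=0.1, eps=1e-5):
        self.scale, self.shift = scale, shift      # frozen parameters
        self.momentum, self.eps = momentum, eps
        self.running_mean, self.running_var = 0.0, 1.0

    def forward(self, batch, update_stats=True):
        if update_stats:
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            # Exponential moving average of the batch statistics
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        std = (self.running_var + self.eps) ** 0.5
        return [self.scale * (x - self.running_mean) / std + self.shift
                for x in batch]


layer = FrozenNormLayer(momentum=0.5)
out = layer.forward([2.0, 4.0])  # batch mean 3.0, batch variance 1.0
```

After the forward pass the running statistics have moved toward the batch statistics, while `scale` and `shift` are unchanged.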
According to the simulated learning mixed sample processing method based on the vision pre-training model, forward inference is performed on each sample in the expert sample set through the vision pre-training model, and the feature map of an intermediate layer of the network is determined as the effective features of the expert sample set. This reduces research and development cost and provides the data input for the subsequent generative imitation learning algorithm, thereby improving the generalization capability of the agent in a new task or a new environment.
In some embodiments, the countermeasure (adversarial) generation network includes a generation network and a discrimination network. Processing the noise expert samples and the optimal expert samples according to the countermeasure generation network to obtain a mixed sample set comprises: extracting representation features from the noise expert samples and the optimal expert samples with the generation network to obtain a state characterization vector; discriminating the source of the state characterization vector with the discrimination network to obtain a discrimination result; calculating a loss value of the generation network according to the discrimination result and the first loss function, and optimizing the parameters of the generation network according to that loss value to obtain an optimized generation network; and calculating a loss value of the discrimination network according to the discrimination result and the second loss function, and optimizing the parameters of the discrimination network according to that loss value to obtain an optimized discrimination network, so as to output the mixed sample set.
In this embodiment, a feature extraction network E (corresponding to the generation network) and a discrimination network D are initialized. Optimal expert samples ξ* and suboptimal expert samples ξ' are collected from the expert sample set, and normally distributed noise ε is injected into the suboptimal expert samples to obtain noise expert samples. The optimal expert samples and the noise expert samples are input into the feature extraction network E to obtain a new state representation z, namely the state characterization vector. The state characterization vector is input into the discrimination network, which discriminates the source of the features; a loss function is calculated from the discrimination result, and the parameters of the feature extraction network and the discrimination network are optimized.
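The noise injection step can be sketched as follows. This is a minimal illustration that assumes each suboptimal sample is a list of numeric state features and that zero-mean Gaussian noise with a fixed standard deviation is added element-wise; the function name `add_gaussian_noise` and the value of `sigma` are hypothetical.

```python
import random


def add_gaussian_noise(states, sigma, rng):
    """Return a noisy copy of each state; the input list is not modified."""
    return [[x + rng.gauss(0.0, sigma) for x in state] for state in states]


rng = random.Random(0)                    # seeded for reproducibility
suboptimal = [[0.0, 1.0], [2.0, 3.0]]     # toy suboptimal expert states
noisy = add_gaussian_noise(suboptimal, sigma=0.1, rng=rng)
unchanged = add_gaussian_noise(suboptimal, sigma=0.0, rng=rng)
```

The noisy copies would then be passed, together with the optimal expert samples, to the feature extraction network.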
The first loss function is determined based on the occupancy measure of the optimal expert sample state-action pairs, the occupancy measure after adding normally distributed noise, the state characterization vector, the conditional probability distribution corresponding to the state characterization vector, and the discrimination network parameters. The second loss function is determined based on the occupancy measure of the optimal expert sample state-action pairs, the occupancy measure after adding normally distributed noise, the state characterization vector, the conditional probability distribution corresponding to the state characterization vector, and the generation network parameters.
Specifically, the first loss function includes:
The second loss function includes:
where E denotes the expectation, D(z) is the discrimination network, ρ*(s, a) is the occupancy measure of the optimal expert sample state-action pairs, ρ'(s, a) is the occupancy measure after adding normally distributed noise, z is the state characterization vector, p(z|x) is the conditional probability distribution corresponding to the state characterization vector, θ_D denotes the discrimination network parameters, and θ_E denotes the generation network parameters.
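Since the two loss formulas are not reproduced in the text, the sketch below only illustrates the general adversarial pattern they describe, assuming the standard binary cross-entropy objectives of a generative adversarial setup. It takes the discrimination network outputs D(z) as probabilities that a feature came from an optimal expert sample; the function names and this labeling convention are assumptions.

```python
import math


def discriminator_loss(d_optimal, d_noisy, eps=1e-12):
    """Binary cross-entropy: push D(z) toward 1 for features of optimal
    samples and toward 0 for features of noise-injected samples."""
    loss_opt = -sum(math.log(d + eps) for d in d_optimal) / len(d_optimal)
    loss_noisy = -sum(math.log(1.0 - d + eps) for d in d_noisy) / len(d_noisy)
    return loss_opt + loss_noisy


def extractor_loss(d_noisy, eps=1e-12):
    """The feature extractor tries to make noisy-sample features look
    like optimal-sample features to the discriminator."""
    return -sum(math.log(d + eps) for d in d_noisy) / len(d_noisy)


confident = discriminator_loss([0.9], [0.1])   # discriminator separates well
undecided = discriminator_loss([0.5], [0.5])   # discriminator cannot tell
fooled = extractor_loss([0.9])                 # extractor fools discriminator
caught = extractor_loss([0.1])                 # extractor is detected
```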
FIG. 4 is a third flow chart of the method for processing a simulated learning mixed sample based on a visual pre-training model provided by the present invention. In the embodiment shown in FIG. 4, noise is injected into the suboptimal expert samples to obtain noise expert samples, and the optimal expert samples and the noise expert samples are input into the generation network (corresponding to the feature extraction network) to obtain state characterization vectors (corresponding to the state features), where the state characterization vectors are used to provide a mutual information constraint with respect to the optimal expert samples. The state characterization vectors are input into the discrimination network, which discriminates the source of the features to obtain a discrimination result. A loss value of the generation network is calculated from the discrimination result, and the gradient of the feature extraction network is updated according to that loss value to obtain the optimized generation network. At the same time, a loss value of the discrimination network is calculated from the discrimination result, and the gradient of the discrimination network is updated according to that loss value to obtain the optimized discrimination network, after which the mixed sample set is output.
In this embodiment, the first loss function further comprises a target regularization term, which is determined according to the KL divergence between the optimal expert samples and the state characterization vector; the second loss function further comprises the same target regularization term.
In this embodiment, during training of the generation network and the discrimination network, the KL divergence between the optimal expert samples and the extracted features can be added as a regularization term to the loss function of the feature extraction network. This prevents the extracted features from deviating too far from the features of the optimal expert samples and improves algorithm performance.
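A KL regularization term of this kind can be sketched under a simplifying assumption: both the extracted features and the optimal expert sample features are summarized as univariate Gaussians, for which the KL divergence has a closed form. The Gaussian assumption, the coefficient `coeff`, and the function names are illustrative and not taken from the text.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) in closed form."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def regularized_loss(adversarial_loss, kl_terms, coeff):
    """Adversarial loss plus a weighted sum of per-dimension KL terms."""
    return adversarial_loss + coeff * sum(kl_terms)


same = gaussian_kl(0.0, 1.0, 0.0, 1.0)      # identical distributions
shifted = gaussian_kl(1.0, 1.0, 0.0, 1.0)   # means differ by one
total = regularized_loss(1.0, [shifted, shifted], coeff=0.1)
```

The term is zero when the two feature distributions coincide and grows as they drift apart, which is how it discourages large differences between the extracted features and the optimal expert features.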
According to the simulated learning mixed sample processing method based on the vision pre-training model, the noise expert samples and the optimal expert samples are processed through a countermeasure generation network to obtain a mixed sample set: the generation network extracts representation features from the noise expert samples and the optimal expert samples to obtain a state characterization vector, the discrimination network discriminates the source of the state characterization vector to obtain a discrimination result, and the generation network and the discrimination network are then optimized according to the discrimination result with the first loss function and the second loss function, respectively, after which the mixed sample set is output. Noise contrastive estimation is introduced into generative adversarial imitation learning to improve the data distribution of the suboptimal expert samples. The feature difference between optimal and suboptimal expert samples is simulated with normal noise, and the feature extractor is trained adversarially, which improves the feature distribution of the suboptimal expert samples and, in turn, the performance of the imitation learning algorithm on mixed expert samples.
The device for processing the simulated learning mixed sample based on the vision pre-training model is described below, and the device for processing the simulated learning mixed sample based on the vision pre-training model and the method for processing the simulated learning mixed sample based on the vision pre-training model described above can be referred to correspondingly.
Fig. 5 is a schematic structural diagram of a device for processing a simulated learning mixed sample based on a vision pre-training model according to the present invention, as shown in fig. 5, the device for processing a simulated learning mixed sample based on a vision pre-training model includes: a sample acquisition module 510, a first processing module 520, a second processing module 530, and a third processing module 540.
The sample obtaining module 510 is configured to obtain an expert sample set, where the expert sample set includes an optimal expert sample and a suboptimal expert sample, and each sample in the expert sample set corresponds to a different weight coefficient;
the first processing module 520 is configured to add target noise to the suboptimal expert sample to obtain a noise expert sample, and process the noise expert sample and the optimal expert sample according to the countermeasure generation network to obtain a mixed sample set;
The second processing module 530 is configured to calibrate the weight coefficient for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predict the redistributed mixed sample set according to the policy network to obtain an action prediction result; scoring the action prediction result according to the reward function network to obtain a scoring result; training the strategy network and the rewarding function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target rewarding function network;
The third processing module 540 is configured to score each sample in the evaluation data set according to the target reward function network, obtain a prediction ranking corresponding to the evaluation data set, calculate a ranking error loss according to the prediction ranking, update weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization, obtain redistributed weight coefficients, and perform imitative learning on the redistributed weight coefficients according to the target policy network, so as to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set.
According to the simulated learning mixed sample processing device based on the vision pre-training model, target noise is added to the suboptimal expert samples to improve their data distribution, and the parameters of the countermeasure generation network are optimized on the noise expert samples and the optimal expert samples to obtain a mixed sample set, which improves the feature distribution of the suboptimal expert samples. A weight coefficient is calibrated for each sample in the mixed sample set; the calibrated mixed sample set is predicted and scored with the strategy network and the reward function network, and both networks are updated. Each sample in the evaluation data set is then scored with the target reward function network to obtain the corresponding predicted ranking, the weight coefficient of each sample in the redistributed mixed sample set is updated according to the predicted ranking, and imitation learning is performed with the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples. In this way, mixed expert samples of different quality can be learned differentially, the sample distribution of the data set is improved, and the generalization capability of the imitation learning agent is improved.
FIG. 6 is a schematic structural diagram of an electronic device according to the present invention. As shown in FIG. 6, the electronic device may include: a processor 610, a communication interface (Communications Interface) 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a simulated learning mixed sample processing method based on a visual pre-training model, the method comprising: acquiring an expert sample set, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample; adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample according to the countermeasure generation network to obtain a mixed sample set; calibrating weight coefficients for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to the reward function network to obtain a scoring result; training the strategy network and the reward function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target reward function network; scoring each sample in the evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating ranking error loss according to the prediction ranking, updating weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization to obtain redistributed weight coefficients, and performing imitation learning on the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the method for processing a simulated learning hybrid sample based on a visual pre-training model provided by the above methods, the method comprising: acquiring an expert sample set, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample; adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample according to the countermeasure generation network to obtain a mixed sample set; calibrating weight coefficients for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to the reward function network to obtain a scoring result; training the strategy network and the rewarding function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target rewarding function network; scoring each sample in the evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating ranking error loss according to the prediction ranking, updating weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization to obtain redistributed weight coefficients, and performing imitation learning on the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method of simulated learning mixed sample processing based on a visual pre-training model provided by the methods above, the method comprising: acquiring an expert sample set, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample; adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample according to the countermeasure generation network to obtain a mixed sample set; calibrating weight coefficients for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to the reward function network to obtain a scoring result; training the strategy network and the rewarding function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target rewarding function network; scoring each sample in the evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating ranking error loss according to the prediction ranking, updating weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization to obtain redistributed weight coefficients, and performing imitation learning on the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. A method for processing a simulated learning mixture based on a vision pre-training model, comprising:
acquiring an expert sample set, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample;
Adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample according to a countermeasure generation network to obtain a mixed sample set;
calibrating weight coefficients for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to a reward function network to obtain a scoring result; training the strategy network and the reward function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target reward function network;
Scoring each sample in an evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating ranking error loss according to the prediction ranking, updating weight coefficients corresponding to each sample in the redistributed mixed sample set through gradient optimization to obtain redistributed weight coefficients, and performing simulated learning on the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples; the evaluation dataset belongs to the mixed sample set; the acquiring the expert sample set includes:
Performing forward graph reasoning on each sample in the expert sample set based on a vision pre-training model, and determining a feature graph corresponding to a network middle layer as effective features of the expert sample set;
The forward graph reasoning on each sample in the expert sample set based on the vision pre-training model comprises:
Inputting original feature graphs corresponding to all samples in the expert sample set into the vision pre-training model, and only forward reasoning is carried out on input image data through fixed network parameters, and in the reasoning process, only mean value and standard deviation parameters corresponding to a normalization layer of the vision pre-training model are updated;
The expert sample set includes training samples or test samples required for at least one of the following scenarios: image classification, object detection and image segmentation.
2. The visual pre-training model-based simulated learning mixture sample processing method of claim 1, wherein the countermeasure generation network comprises a generation network and a discrimination network;
The processing the noise expert sample and the optimal expert sample according to the countermeasure generation network, and obtaining a mixed sample set includes:
extracting characteristic features of the noise expert samples and the optimal expert samples according to the generating network to obtain a state characterization vector;
performing feature discrimination on the source of the state characterization vector according to the discrimination network to obtain discrimination results;
Calculating a loss value of the generating network according to the judging result and the first loss function, and carrying out parameter optimization on the generating network according to the loss value of the generating network to obtain the optimized generating network; calculating a loss value of the discrimination network according to the discrimination result and the second loss function, and carrying out parameter optimization on the discrimination network according to the loss value of the discrimination network to obtain the optimized discrimination network so as to output the mixed sample set;
The first loss function is determined based on an occupancy metric of an optimal expert sample state action pair, an occupancy metric after normal distribution noise, the state characterization vector, a conditional probability distribution corresponding to the state characterization vector, and a discrimination network parameter, and the second loss function is determined based on an occupancy metric of the optimal expert sample state action pair, an occupancy metric after normal distribution noise, the state characterization vector, a conditional probability corresponding to the state characterization vector, and a generation network parameter.
3. The visual pre-training model-based simulated learning mixture sample processing method of claim 2, wherein the first loss function comprises:
the second loss function includes:
Where E is the expectation, D (z) is the discrimination network, ρ * (s, a) is the occupancy measure of the optimal expert sample state action pair, ρ' (s, a) is the occupancy measure after adding normal distribution noise, z is the state characterization vector, p (z|x) is the conditional probability distribution corresponding to the state characterization vector, θ D is the discrimination network parameter, and θ E is the generation network parameter.
4. The imitation learning mixed sample processing method based on a vision pre-training model of claim 2, wherein the first loss function further comprises a target regularization term,
the target regularization term being determined according to the KL divergence corresponding to the optimal expert sample and the state characterization vector;
and the second loss function further comprises the target regularization term.
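Claim 4 does not spell out the distributions entering the KL divergence. As a hedged illustration only, the sketch below assumes that the state characterization vector z is drawn from a diagonal Gaussian produced by an encoder and that the term penalizes its divergence from a standard normal prior, as in variational bottleneck methods. The function names and the weight `beta` are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def regularized_loss(base_loss, mu, log_var, beta=0.1):
    """Add the KL regularization term, weighted by beta (an assumed hyperparameter)."""
    return base_loss + beta * kl_to_standard_normal(mu, log_var)
```

For example, `kl_to_standard_normal([1.0], [0.0])` evaluates to 0.5, and the term vanishes when the encoder output already matches the prior.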
5. The imitation learning mixed sample processing method based on a vision pre-training model of claim 1, wherein the evaluation data set is obtained from the expert sample set with a prior ranking;
and the calculating of the ranking error loss according to the prediction ranking and the updating, through gradient optimization, of the weight coefficient corresponding to each sample in the redistributed mixed sample set to obtain the redistributed weight coefficients comprises:
calculating a ranking difference loss from the prediction ranking, the prior ranking, and a ranking loss function;
determining the redistributed weight coefficients according to the ranking difference loss;
the ranking loss function comprises:
where i and j are sample indices, and the terms of the function are the true cumulative expected return of sample i, the predicted cumulative expected return of sample i, the true cumulative expected return of sample j, and the predicted cumulative expected return of sample j.
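The ranking loss formula itself is missing from this text. The sketch below therefore substitutes a common pairwise logistic ranking loss, purely as an assumed stand-in: for every pair in which the true cumulative return of sample i exceeds that of sample j, it penalizes predictions that fail to preserve that order. The function name and argument names are hypothetical.

```python
import math


def pairwise_ranking_loss(true_returns, predicted_returns):
    """Average logistic loss over pairs (i, j) whose true returns satisfy R_i > R_j."""
    total, pairs = 0.0, 0
    n = len(true_returns)
    for i in range(n):
        for j in range(n):
            if true_returns[i] > true_returns[j]:
                # A positive margin means the prediction agrees with the prior ranking.
                margin = predicted_returns[i] - predicted_returns[j]
                total += math.log1p(math.exp(-margin))
                pairs += 1
    return total / pairs if pairs else 0.0
```

A prediction that preserves the prior ranking yields a lower loss than one that reverses it, so the gradient of this loss with respect to the sample weights can indicate which samples to up-weight or down-weight.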
6. An imitation learning hybrid sample processing device based on a vision pre-training model, comprising:
the sample acquisition module is used for acquiring an expert sample set, wherein the expert sample set comprises an optimal expert sample and a suboptimal expert sample, and each sample in the expert sample set corresponds to a different weight coefficient;
the first processing module is used for adding target noise to the suboptimal expert sample to obtain a noise expert sample, and processing the noise expert sample and the optimal expert sample with a generative adversarial network to obtain a mixed sample set;
the second processing module is used for calibrating a weight coefficient for each sample in the mixed sample set to obtain a redistributed mixed sample set, and predicting on the redistributed mixed sample set according to a strategy network to obtain an action prediction result; scoring the action prediction result according to a reward function network to obtain a scoring result; and training the strategy network and the reward function network according to the discrimination loss function and the scoring result to obtain a target strategy network and a target reward function network;
the third processing module is used for scoring each sample in an evaluation data set according to the target reward function network to obtain a prediction ranking corresponding to the evaluation data set, calculating a ranking error loss according to the prediction ranking, updating, through gradient optimization, the weight coefficient corresponding to each sample in the redistributed mixed sample set to obtain redistributed weight coefficients, and performing imitation learning with the redistributed weight coefficients according to the target strategy network to obtain optimized expert samples, wherein the evaluation data set belongs to the mixed sample set;
the sample acquisition module is specifically configured to:
perform forward inference on each sample in the expert sample set based on a vision pre-training model, and take the feature map of an intermediate network layer as the effective features of the expert sample set;
wherein the forward inference on each sample in the expert sample set based on the vision pre-training model comprises:
inputting the original feature map corresponding to each sample in the expert sample set into the vision pre-training model, performing only forward inference on the input image data with the network parameters fixed, and, during inference, updating only the mean and standard deviation parameters of the normalization layer of the vision pre-training model;
the expert sample set includes training samples or test samples required for at least one of the following scenarios: image classification, object detection, and image segmentation.
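The inference step described in claim 6, in which the network parameters stay fixed and only the normalization statistics are refreshed, can be illustrated with a toy one-dimensional layer. This is a minimal sketch under stated assumptions: the class `FrozenNormLayer`, the exponential moving average with `momentum`, and the scalar inputs are hypothetical and are not taken from the patent.

```python
class FrozenNormLayer:
    """Toy normalization layer: scale and shift are frozen, running statistics adapt."""

    def __init__(self, scale, shift, momentum=0.1, eps=1e-5):
        self.scale = scale          # learned parameter, never modified here
        self.shift = shift          # learned parameter, never modified here
        self.momentum = momentum    # assumed moving-average factor
        self.eps = eps
        self.running_mean = 0.0
        self.running_std = 1.0

    def forward(self, batch):
        # Batch statistics of the incoming data.
        mean = sum(batch) / len(batch)
        std = (sum((x - mean) ** 2 for x in batch) / len(batch)) ** 0.5
        # Only the mean and standard deviation are updated during inference.
        m = self.momentum
        self.running_mean = (1.0 - m) * self.running_mean + m * mean
        self.running_std = (1.0 - m) * self.running_std + m * std
        # Normalize with the refreshed statistics and the frozen affine parameters.
        return [self.scale * (x - self.running_mean) / (self.running_std + self.eps)
                + self.shift for x in batch]
```

Passing expert-sample batches through such a layer shifts its statistics toward the new data distribution while leaving the pre-trained weights untouched, which is the behaviour the claim describes for the normalization layer of the vision pre-training model.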
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the imitation learning mixed sample processing method based on a vision pre-training model of any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the imitation learning mixed sample processing method based on a vision pre-training model of any one of claims 1 to 5.
9. A computer program product comprising a computer program which, when executed by a processor, implements the imitation learning mixed sample processing method based on a vision pre-training model of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311868899.4A CN117975190B (en) | 2023-12-29 | 2023-12-29 | Method and device for processing simulated learning mixed sample based on vision pre-training model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117975190A (en) | 2024-05-03 |
CN117975190B (en) | 2024-11-05 |
Family ID: 90850310
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202311868899.4A | Active | CN117975190B (en) | 2023-12-29 | 2023-12-29 | Method and device for processing simulated learning mixed sample based on vision pre-training model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117975190B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |