
CN112862094A - DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning - Google Patents

DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning

Info

Publication number
CN112862094A
CN112862094A (application CN202110134999.9A)
Authority
CN
China
Prior art keywords
layer
training
network
learning
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110134999.9A
Other languages
Chinese (zh)
Inventor
张新禹
刘子衿
任祖煜
霍凯
刘振
张双辉
刘永祥
姜卫东
黎湘
卢哲俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110134999.9A priority Critical patent/CN112862094A/en
Publication of CN112862094A publication Critical patent/CN112862094A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of machine learning and specifically relates to a meta-learning based DRBM method. By improving the training-testing algorithm of the network, the algorithm is divided into two stages: meta-learning and model learning. In the meta-learning stage, the training tasks are used to update the network parameters, and the updated parameters serve as the initial values of the network parameters for the model-learning stage, so that these initial values make the training loss function decrease faster and reach the global optimum more easily. In the model-learning stage, the network parameters are updated and tested using the test task. The algorithm introduces meta-learning to improve the training process of the DRBM, so that the gradient-descent direction in the meta-learning stage points toward the "most adaptable" point and the network can quickly adapt to a new task.

Figure 202110134999

Description

DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning
Technical Field
The invention belongs to the field of machine learning, and particularly relates to a fast-adapting discriminative restricted Boltzmann machine (DRBM) method based on meta-learning.
Background
A restricted Boltzmann machine (RBM) is one of the most popular basic models in machine learning and one of the most common building blocks of deep neural networks. An RBM can use its hidden units to extract features and learn the probability distribution of the data, and can generate new samples from the learned distribution; it has been widely studied in fields such as target recognition and probabilistic modeling. The DRBM is an extension of the RBM: its core idea is to construct a discriminant function on a given sample set and to train on the feature vector and the label together as the input of the RBM, so that the RBM acquires a classification capability.
DRBMs were originally proposed in 2008 by Hugo Larochelle and Yoshua Bengio (Larochelle H, Bengio Y. Classification using discriminative restricted Boltzmann machines [C]// Proceedings of the 25th International Conference on Machine Learning. ACM, 2008: 536-543).
Improvements and optimizations of the DRBM network fall mainly into learning-algorithm optimization and model-structure optimization; the fast-adapting DRBM based on meta-learning belongs to the former. With the meta-learning method, DRBM training no longer pursues a network parameter θ that performs best on one specific training set, but instead pursues an initial value θ0 of the network parameters that, across all training tasks, can converge to the optimal solution in only a few gradient steps.
The DRBM algorithm based on meta-learning proposed here is inspired by the MAML (Model-Agnostic Meta-Learning) algorithm. MAML was proposed in 2017 by Chelsea Finn, Pieter Abbeel and Sergey Levine; it is a model-agnostic fast-adaptation meta-learning algorithm, applicable to any model trained by gradient descent and to a variety of learning problems such as classification, regression and reinforcement learning. It achieves good performance on two few-shot image-classification data sets (Omniglot and MiniImagenet), obtains good results on few-shot regression, and accelerates fine-tuning in policy-gradient reinforcement learning with a neural-network policy.
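For readers unfamiliar with MAML, the following is a minimal, illustrative Python sketch of a first-order MAML-style inner/outer update. It is not taken from the patent; grad_loss and the linear model it uses are hypothetical stand-ins for the DRBM gradient.

```python
import numpy as np

def grad_loss(theta, data):
    """Hypothetical per-task loss gradient: squared error of a linear model,
    standing in for the DRBM/neural-network gradient used in practice."""
    x, y = data
    return 2.0 * x.T @ (x @ theta - y) / len(y)

def maml_first_order_step(theta, tasks, alpha=0.01, beta=0.001):
    """One first-order MAML-style meta-update: adapt on each task's support set,
    then apply the query-set gradient at the adapted parameters to theta."""
    meta_grad = np.zeros_like(theta)
    for support, query in tasks:
        theta_i = theta - alpha * grad_loss(theta, support)  # inner (task) adaptation
        meta_grad += grad_loss(theta_i, query)               # outer (meta) gradient
    return theta - beta * meta_grad / len(tasks)
```

Here tasks is a list of (support, query) pairs, each pair being a (features, targets) tuple.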
Disclosure of Invention
The invention aims to solve the problems that, under small-sample conditions, the DRBM network under-fits and the initial values of the network parameters cannot lead the trained network to the global optimum.
The idea of the invention is to divide the algorithm into a meta-learning stage and a model-learning stage by improving the training-testing procedure of the network. In the meta-learning stage the training tasks are used to update the network parameters, and the updated parameters are taken as the initial values of the network parameters for the model-learning stage, so that these initial values make the training loss function decrease faster and reach the global optimum more easily; in the model-learning stage the network parameters are updated with the test task and then tested. The algorithm introduces meta-learning to improve the DRBM training process, so that the gradient-descent direction in the meta-learning stage points toward the most adaptable point and the network can quickly adapt to a new task.
The technical scheme adopted by the invention for solving the technical problems is as follows: a DRBM fast adaptation method based on meta-learning comprises the following steps:
S1, establishing the DRBM network structure. The DRBM network has three layers: a visible layer, a hidden layer and a classification layer. Each layer contains several neurons; nodes within the same layer are not connected, while nodes in adjacent layers are fully connected. The state of each neuron is a binary value, 1 or 0, where 1 means activated and 0 means not activated; activation means that the node represented by the neuron processes data. The distribution of the DRBM is determined by the neuron values. The visible layer represents the input data: the number of visible nodes is determined by the dimension of the input data, and each visible node takes the value of the corresponding dimension of the input. The hidden layer obtains statistical features of the observed data through optimization, and the number of hidden nodes is tuned manually according to the data and the task. The classification-layer units judge the class from the features extracted by the hidden-layer units, and the number of classification nodes is determined by the number of data classes.
The DRBM network is described by its network parameters. Let the visible layer have l nodes, the hidden layer m nodes and the classification layer n nodes. The visible-layer bias vector b is a 1×l vector, the hidden-layer bias vector c is a 1×m vector, and the classification-layer bias vector d is a 1×n vector; the weight matrix between the input (visible) layer and the hidden layer is W, an l×m matrix, and the weight matrix between the classification layer and the hidden layer is U, an n×m matrix. Writing θ = (W, U, b, c, d), the purpose of training the DRBM network is to find the value of θ that best predicts the data class through the network.
DRBM is a model determined based on an energy function, which can be defined as:
E(y, x, h) = -hW^T x^T - bx^T - ch^T - de_y^T - hU^T y^T   (1)
where x denotes the state vector of the visible layer (a 1×l vector), h denotes the state vector of the hidden-layer units (a 1×m vector), and y denotes the state vector of the classification layer (a 1×n vector); y is the one-hot representation of the label, i.e. exactly one node among all the nodes is 1 and the remaining nodes are 0. The joint probability distribution of x, h, y is:
p(y, x, h) = exp(-E(y, x, h)) / Z   (2)

where

Z = Σ_{y,x,h} exp(-E(y, x, h))

is referred to as the partition function.
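To make the parameter shapes and the energy function of equation (1) concrete, here is a small NumPy sketch; the DRBMParams container and the energy helper are illustrative names, not part of the patent.

```python
import numpy as np

class DRBMParams:
    """Container for the DRBM parameters theta = (W, U, b, c, d)."""
    def __init__(self, l, m, n):
        self.W = np.zeros((l, m))  # visible-hidden weight matrix, l x m
        self.U = np.zeros((n, m))  # classification-hidden weight matrix, n x m
        self.b = np.zeros(l)       # visible-layer bias, 1 x l
        self.c = np.zeros(m)       # hidden-layer bias, 1 x m
        self.d = np.zeros(n)       # classification-layer bias, 1 x n

def energy(p, y, x, h):
    """Energy of equation (1) for x (1 x l), h (1 x m) and one-hot y (1 x n)."""
    return -(h @ p.W.T @ x + p.b @ x + p.c @ h + p.d @ y + h @ p.U.T @ y)
```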
S2 meta learning stage:
s2.1 initializing network parameters:
Initialize the visible-layer bias vector b as a 1×l zero vector, the hidden-layer bias vector c as a 1×m zero vector and the classification-layer bias vector d as a 1×n zero vector, with the corresponding gradients Δb (1×l), Δc (1×m) and Δd (1×n) also set to zero; initialize the weight matrix W between the visible layer and the hidden layer as an l×m zero matrix and the weight matrix U between the classification layer and the hidden layer as an n×m zero matrix, with the corresponding gradients ΔW (l×m) and ΔU (n×m) also set to zero. The initial value of the network parameters θ is recorded as θ0. Set the internal learning rate α to 0.01-0.5, the external learning rate β to 0.005-0.05, the momentum learning rate m to 0.5 and the penalty coefficient p to 10^-4.
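A minimal sketch of step S2.1, reusing the DRBMParams container from the previous sketch; the concrete hyperparameter values shown are simply plausible picks from the ranges stated above.

```python
def init_meta_learning(l, m, n):
    """S2.1: zero-initialize theta and its gradient buffers, and set the rates."""
    theta = DRBMParams(l, m, n)       # W, U, b, c, d all zero
    grads = DRBMParams(l, m, n)       # delta-W, delta-U, delta-b, delta-c, delta-d all zero
    hyper = {"alpha": 0.1,            # internal learning rate, from the 0.01-0.5 range
             "beta": 0.01,            # external learning rate, from the 0.005-0.05 range
             "momentum": 0.5,         # momentum learning rate
             "penalty": 1e-4}         # penalty (weight-decay) coefficient
    return theta, grads, hyper
```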
S2.2, completing the training of a task (task):
the training of the meta-learning phase takes one task as a basic unit, and each task comprises two parts, namely a support set (support set) and a challenge set (query set). The process of training with the support set is called internal learning, and the process of training with the challenge set is called external learning. The data categories for each task may be the same as or different from the other tasks. All tasks of the meta-learning phase together constitute a training task (training tasks). All samples in the training task contain both data information and label information. The method comprises the following specific steps:
S2.2.1 Take the support set as the network input and θ0 as the initial value of the network parameters, and complete one training pass:
s2.2.1.1 calculate the probability distribution function of the hidden layer:
p(h|y,x)=sigmoid(x(0)W+y(0)U+c) (3)
where x(0) is the input data, y(0) is the input-data label in one-hot form, and sigmoid(z) = 1/(1 + e^(-z)).
S2.2.1.2 Having obtained the hidden-layer probability distribution function, obtain the hidden-layer node values by Gibbs sampling. The specific method of Gibbs sampling is: generate m random numbers r_i in [0,1], i ∈ [1, m], where m is the number of hidden-layer nodes; if p(h_i|y,x) > r_i, node h_i takes the value 1, otherwise 0. The resulting hidden-layer node configuration is denoted h(0), and sampling h(0) from the probability distribution p(h|y,x) is written h(0) ~ p(h|y,x).
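A minimal sketch of steps S2.2.1.1-S2.2.1.2, assuming the DRBMParams container defined earlier; the helper names are illustrative.

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(p, x, y, rng):
    """Compute p(h|y,x) = sigmoid(xW + yU + c) (eq. 3) and Gibbs-sample h from it."""
    prob_h = sigmoid(x @ p.W + y @ p.U + p.c)       # 1 x m activation probabilities
    h = (rng.random(prob_h.shape) < prob_h) * 1.0   # h_i = 1 if p(h_i|y,x) > r_i, else 0
    return h, prob_h
```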
S2.2.1.3 Reconstruct the visible layer and the classification layer from the hidden-layer node values, and calculate the probability distribution functions of the visible layer and the classification layer respectively:

p(x|h) = sigmoid(hW^T + b),  p(y|h) = exp(hU_y^T + d_y) / Σ_{y'} exp(hU_{y'}^T + d_{y'})   (4)

where y' ranges over all possible values of the class label.
S2.2.1.4 Having obtained the probability distribution functions of the visible layer and the classification layer, use Gibbs sampling, x(1) ~ p(x|h) and y(1) ~ p(y|h), to obtain the node values x(1) and y(1) of the visible layer and the classification layer.
S2.2.1.5 Using the node values x(1) and y(1) of the visible layer and the classification layer, calculate the hidden-layer probability distribution function again:
p(h|y,x)=sigmoid(x(1)W+y(1)U+c) (5)
and obtain h(1) ~ p(h|y,x) by Gibbs sampling.
S2.2.1.6 From x(0), y(0), x(1), y(1), h(0) and h(1), compute the update gradient of the network parameters θ:

∂W = x(0)^T h(0) - x(1)^T h(1),  ∂U = y(0)^T h(0) - y(1)^T h(1),  ∂b = x(0) - x(1),  ∂c = h(0) - h(1),  ∂d = y(0) - y(1)   (6)
S2.2.1.7 Input the samples of the task in turn and correct the gradient according to the internal learning rate α, the momentum learning rate m and the penalty coefficient p:

Δθ_i = m·Δθ_{i-1} + α·(∂θ_i - p·θ)   (7)
wherein i belongs to [1, ns ], and ns is the number of samples in the support set;
S2.2.1.8 Update the network parameters θ according to the gradient:

θ = θ + Δθ_i   (8)
the network parameter after inputting the support set data for training and updating is recorded as thetans
S2.2.2 Take the challenge set as the network input and θns as the initial value of the network parameters, and complete one training pass:
Probability-distribution calculation, node-distribution sampling and computation of the network-parameter update gradient are completed in sequence according to formulas (3), (4), (5) and (6), and the corrected gradient is calculated according to formula (9):

Δθ_i = m·Δθ_{i-1} + β·(∂θ_i - p·θ)   (9)
where β is the external learning rate, i ∈ [1, nq], and nq is the number of samples in the challenge set. Finally, the network parameters are updated according to formula (8), and the resulting network parameters are recorded as θnq.
After the training of one task is completed, the network parameters are updated once, and the updating only keeps the external learning part, namely:
θt+1 = θt + (θnq - θns)   (10)
where t ∈ [1, nt], t denotes the t-th training task and nt the number of training tasks. The network parameters θt+1 are stored and used as the initial network parameters of the next task, which completes the training of one task.
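The task-level update of equation (10) moves the shared initialization by the difference between the challenge-set-trained and support-set-trained parameters, similar in spirit to a Reptile-style outer step. Below is a sketch under that reading, reusing the helpers above; the deep copies and function names are illustrative.

```python
import copy

def meta_train_one_task(theta, grads, task, hyper, rng):
    """One meta-learning task (S2.2): internal then external learning, then eq. (10)."""
    support_x, support_y, query_x, query_y = task
    theta_ns = train_on_set(copy.deepcopy(theta), grads, support_x, support_y,
                            hyper["alpha"], hyper, rng)   # internal learning -> theta_ns
    theta_nq = train_on_set(copy.deepcopy(theta_ns), grads, query_x, query_y,
                            hyper["beta"], hyper, rng)    # external learning -> theta_nq
    for name in ("W", "U", "b", "c", "d"):                # eq. (10): keep only the external part
        getattr(theta, name)[:] += getattr(theta_nq, name) - getattr(theta_ns, name)
    return theta
```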
S2.3 completes all traversals (epoch):
Each traversal trains several tasks; the number of tasks is determined by the size of the data set, and 20-100 groups of tasks can generally be set. After the training of one task is finished, the updated network parameters are taken as the initial network parameters of the next task, and the process of S2.2 is repeated in turn until all tasks have been trained once, which constitutes one traversal. After one traversal is completed, the updated network parameters are taken as the initial network parameters of the next traversal, until all traversals are completed. The finally obtained network parameters are recorded as θnt.
The meta-learning stage usually needs multiple traversals; the number of traversals depends on the convergence speed of the network and is usually set to 50.
S3 model learning phase:
Similar to the meta-learning stage, the task of the model-learning stage also comprises a support set and a challenge set, but the model-learning stage typically has only one task, called the test task, in which the data classes are typically different from those of the training tasks.
S3.1 Take the support set as the network input and θnt as the initial value of the network parameters, and complete one training pass:
Probability-distribution calculation, node-distribution sampling and computation of the network-parameter update gradient are completed in sequence according to formulas (3), (4), (5) and (6), and the corrected gradient is then calculated according to formula (11):

Δθ_i = m·Δθ_{i-1} + α·(∂θ_i - p·θ)   (11)
wherein i belongs to [1, ts ], ts is the number of samples in the support set of the test task, and finally, the network parameter updating is completed according to a formula (8).
S3.2, completing all traversals:
Each traversal completes one training pass over the support set of the test task. After one traversal is completed, the updated network parameters are taken as the initial network parameters of the next traversal, until all traversals are completed; the finally obtained network parameters are recorded as θt.
The model-learning stage generally needs 50-150 traversals; the number of traversals depends on the convergence speed of the network.
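A sketch of the model-learning stage (S3.1-S3.2) under the same assumptions as the earlier helpers; the default number of traversals is just an illustrative value from the stated 50-150 range.

```python
def model_learning(theta_nt, grads, support_x, support_y, hyper, rng, n_epochs=100):
    """S3.1-S3.2: fine-tune the meta-learned parameters on the test task's support set."""
    theta = copy.deepcopy(theta_nt)
    for _ in range(n_epochs):            # each traversal is one pass over the support set
        theta = train_on_set(theta, grads, support_x, support_y,
                             hyper["alpha"], hyper, rng)
    return theta                         # theta_t, used for prediction on the challenge set
```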
S3.3 Input the data of the challenge set into the network with θt as the network parameters, and calculate the prediction probability of the i-th category in turn:
prediction(i)=repeat(d(i),tq)+log(exp(x(0)·W+T(i)·U+c)+1) (12)
where T is a tq×nc matrix, tq is the number of challenge-set samples and nc is the total number of classes; T(i) denotes T with its i-th column equal to 1 and all other columns equal to 0; d(i) denotes the bias vector d with all columns other than the i-th set to 0; repeat(d(i), tq) denotes repeating the vector d(i) tq times to form a matrix of tq rows and n columns.
After the prediction probabilities of all nc categories are calculated, the column containing the maximum value is taken as the category prediction result, which completes the target classification.
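A sketch of the prediction rule of equation (12) in the same NumPy setting; how repeat(d(i), tq) and T(i) enter the score is an interpretation of the description, so the exact broadcasting here is an assumption.

```python
def predict(theta, X_query):
    """Score every class with eq. (12) and return the arg-max class for each sample."""
    tq, nc = X_query.shape[0], theta.d.shape[0]
    scores = np.zeros((tq, nc))
    for i in range(nc):
        T_i = np.zeros((tq, nc)); T_i[:, i] = 1.0   # hypothesised one-hot label for class i
        hidden = np.log(np.exp(X_query @ theta.W + T_i @ theta.U + theta.c) + 1.0)
        scores[:, i] = theta.d[i] + hidden.sum(axis=1)   # class bias + softplus over hidden units
    return scores.argmax(axis=1)
```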
Through relevant experiments, the beneficial effects obtained by the invention are as follows:
(1) after the meta-learning stage, the initial values of the network parameters are closer to the fastest-convergence point; only a small number of samples are needed for training fine-tuning to fit the test samples well, and after fine-tuning the network reaches the global optimum more easily;
(2) the feature-expression capability of the network is enhanced, and the network is less prone to under-fitting when trained with a variety of tasks;
(3) because the algorithm lets the network parameters converge rapidly, it can be applied under small-sample conditions to improve the recognition accuracy of the network;
(4) the method can be used not only as a training-testing method (when the test-task classes differ from the training-task classes) but also as a pre-training algorithm (when the test-task classes are the same as the training-task classes).
Drawings
FIG. 1 is a diagram of the network architecture of the present invention;
FIG. 2 is a flow chart of the algorithm of the present invention;
FIG. 3 illustrates an identification process of High Resolution Range Profile (HRRP) data according to the present invention;
FIG. 4 HRRP data;
FIG. 5 HRRP data processing results;
FIG. 6 HRRP data set processing results;
FIG. 7 compares the experimental results of the present invention with conventional algorithms;
FIG. 8 a sample of the MNIST dataset;
FIG. 9 compares the experimental results of the present invention with conventional algorithms.
Detailed Description
The invention is further illustrated with reference to the accompanying drawings:
example 1 of the present invention demonstrates the complete recognition procedure of the proposed algorithm for HRRP data and the comparison of the recognition results with the conventional algorithm. Example 2 demonstrates the recognition of MNIST data sets by the proposed algorithm in comparison to conventional algorithms.
Fig. 1 shows the network structure of the present invention; the network is divided into a visible layer (v layer), a label layer (y layer) and a hidden layer (h layer). v = (v1, v2, …, vl), h = (h1, h2, …, hm) and y ∈ {0,1}^n are the state vectors of the visible layer, the hidden layer and the label layer respectively, and the label-layer vector y ∈ {0,1}^n is in one-hot form. In this example, the numbers of visible-layer, hidden-layer and label-layer nodes are 201, 130 and 3, respectively.
Fig. 2 is an algorithm flow chart showing a standard meta-learning based fast adaptation DRBM algorithm flow.
FIG. 3 shows the identification process of HRRP data according to the present invention; compared with FIG. 2, data-processing stages are added. Because the DRBM network can only take binary data as input, data preprocessing is carried out after the HRRP data are obtained, in the following four steps. The first step is data sorting: data at different pitch and azimuth angles are selected according to the experiment and then arranged randomly. The second step is range normalization to overcome the distance sensitivity of the signal, i.e. normalization with respect to the sample with the maximum range in the data set. The third step is mean normalization to enhance the network's ability to learn different features; the meaning of HRRP mean normalization lies in two aspects: 1) after the mean is subtracted, the remaining part can be intuitively understood as the part in which each sample differs; 2) during the updating of the network parameters the algorithm oscillates less and converges more easily. The fourth step is data binarization: the amplitude is converted into a probability and sampled, yielding binarized HRRP data.
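A sketch of the four preprocessing steps described above, assuming the HRRP samples are stored as a NumPy array of shape (num_samples, 201); the exact normalization conventions are an interpretation of the text.

```python
def preprocess_hrrp(hrrp, rng=np.random.default_rng(0)):
    """Shuffle, range-normalize, mean-normalize and binarize HRRP samples."""
    data = hrrp[rng.permutation(len(hrrp))]          # step 1: arrange the selected samples randomly
    data = data / np.abs(data).max()                 # step 2: normalize by the largest amplitude in the set
    data = data - data.mean(axis=0)                  # step 3: subtract the per-dimension mean
    prob = (data - data.min()) / (data.max() - data.min() + 1e-12)   # map amplitudes to [0, 1]
    return (rng.random(prob.shape) < prob) * 1.0     # step 4: sample to obtain binary HRRP data
```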
HRRP simulation data output by three-dimensional airplane model electromagnetic simulation software designed by Saibo corporation are used in example 1, the types of the simulated airplanes are F-35, F-117 and P-51, and specific parameters of the airplane are shown in figure 4.
The simulated radar band is the X band. The frequency range is 9.5 GHz-10.5 GHz with a step size of 5 MHz. The polarization mode is vertical polarization. The target pitch angle is 0-10 degrees with a step of 0.1 degrees, and the azimuth angle is 0-90 degrees with a step of 0.1 degrees. The data set therefore contains 201 frequency points, 101 pitch angles and 901 azimuth angles, i.e. 101 × 901 = 91001 samples, each with 201 dimensions.
Fig. 5 illustrates the change in HRRP data during preprocessing, where the x-axis represents the data dimension and the y-axis represents the signal amplitude. (a) The HRRP data before preprocessing, (b) the normalized HRRP data, and (c) the binarized HRRP data.
FIG. 6 shows a data set of 8100 HRRP samples, where the x-axis represents the data dimension, the y-axis the sample number and the z-axis the signal amplitude; samples 1-2700 are F-35 HRRPs, samples 2701-5400 are F-117 HRRPs, and samples 5401-8100 are P-51 HRRPs. The processed HRRP data follow a 0-1 distribution whose probability matches the original amplitude intensity.
In example 1, for each type of airplane we selected the simulation data with azimuth angles of 0° to 10° and 80° to 90° and pitch angles of 3° to 5°, each with a step of 0.1°, giving (101+101) × 21 = 4242 samples per airplane type and 12726 samples in total. The samples whose azimuth angle is a multiple of 0.2° were used as the test set (6363 samples in total), and the rest form the training set (6363 samples in total).
After the data processing is completed, we will go into the training and testing phase of the algorithm, and in example 1 we will perform two sets of experiments:
Experiment 1: train the DRBM using the conventional algorithm. Randomly draw n samples of each of the three target types from the training set and train on them, then randomly draw 2000 samples of each of the three target types from the test set and test;
Experiment 2: train and test the DRBM using the algorithm of the present invention. Randomly draw n samples of each of the three target types from the training set and train on them, where in each task the support set accounts for 1/4 of the samples and the challenge set for 3/4; randomly draw n samples of each of the three target types from the remaining data of the test set as the support set of the test task and train on them; finally, randomly draw 2000 samples of each of the three target types from the test set as the challenge set and test;
the number of samples n was in the range of [20,400], and each set of experiments was performed 50 times (epoch is 50).
Experiment 2 used the following specific training and testing procedure:
S1 meta-learning stage:
Initialize the parameters: the visible-layer bias vector b = (b1, b2, …, b201), the hidden-layer bias vector c = (c1, c2, …, c130), the label-layer bias vector d = (d1, d2, d3), the weight matrix between the visible layer and the hidden layer W = (w_ij) ∈ R^(201×130), and the weight matrix between the label layer and the hidden layer U = (u_ij) ∈ R^(3×130); the network parameters are zero matrices of the corresponding dimensions, and the gradients corresponding to the network parameters are zero vectors of the corresponding dimensions. The internal learning rate α is 0.1, the external learning rate β is 0.001, the momentum learning rate m is 0.5, and the penalty factor p is 10^-4.
From the training set, n samples of each of the three target types are randomly drawn; 1/4 of them are used as the support set and 3/4 as the challenge set, and 50 traversals of training are completed in sequence according to steps S1.2 and S1.3 to obtain the trained initial value of the network parameters, θ0.
S2 model learning phase:
will be selected from
Figure RE-GDA0003007253060000082
Randomly extracting n samples of three types of targets from the residual data as a support set, training, finishing fine adjustment of a network parameter theta according to the steps of S2.1 and S2.2, and finishing trainingThe resultant network parameter is recorded as θs
From the test set, 2000 samples of each of the three target types are randomly drawn as the challenge set and used for testing.
Fig. 7 shows the results of two experiments, where the circle and the square in the graph are the average of the classification correctness obtained by repeating the experiment for 20 times when n takes different values, and the band on each node is the classification correctness interval of 50 experiments. The following conclusions can be drawn from the figure:
1) in experiment 2, the present invention was used as a pre-training method because the training set had the same data type as the test set. It can be seen that when n takes different values, the classification accuracy of experiment 2 is higher than that of experiment 1, which means that the network parameter initial value obtained by meta-learning makes the network more easily reach global optimum than the randomized initial value;
2) under the condition of small samples (n is less than or equal to 200), the method has obvious improvement on the classification performance, which shows that the initial value of the network parameter obtained by meta-learning is faster to converge than the initial value of randomization;
3) when n takes different values, the classification accuracy intervals of experiment 2 are all smaller than those of experiment 1, which shows that the model obtained by training of the invention has stronger stability.
Experiments in example 2 use the handwritten-digit (MNIST) data set. The MNIST data set consists of 70,000 pictures and corresponding labels, of which 60,000 are used for training the neural network and 10,000 for testing it. Each picture is a 28 × 28-pixel image of a handwritten digit from 0 to 9. The pictures are white digits on a black background: black is represented by 0 and white by a floating-point number between 0 and 1, with values closer to 1 being whiter. The MNIST data set provides a label for each picture in one-hot form, i.e. the label vector is a one-dimensional array of length 10. A sample of the MNIST data set is shown in FIG. 8.
The 784 pixel points are combined into a one-dimensional array of length 784, which is the input to the neural network. For the RBM network, the input data can only be binary, so the MNIST data set needs to be preprocessed: random binarization is performed according to the black-and-white brightness of each pixel, i.e. each pixel's white-intensity value is converted into a probability and sampled, so that the result follows a binary (0-1) distribution.
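A minimal sketch of this random binarization, assuming NumPy is imported as above and that the MNIST images are already scaled to [0, 1]:

```python
def binarize_mnist(images, rng=np.random.default_rng(0)):
    """Randomly binarize MNIST: each pixel becomes 1 with probability equal to
    its grayscale intensity, giving the 0/1 input the DRBM expects."""
    flat = images.reshape(len(images), -1)           # flatten 28 x 28 images to length-784 vectors
    return (rng.random(flat.shape) < flat) * 1.0
```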
We performed three sets of experiments in total:
Experiment 1: train the DRBM with the conventional algorithm, using n samples of each digit 0-9 for training and 5000 samples of each digit 8-9 for testing;
Experiment 2: train the DRBM with the conventional algorithm, using n samples of each digit 8-9 for training and 5000 samples of each digit 8-9 for testing;
Experiment 3: train and test the DRBM with the proposed algorithm; the training-task set of the training stage contains n samples of each digit 0-7, where in each task the support set accounts for 1/4 of the samples and the challenge set for 3/4; in the testing stage, n samples of each digit 8-9 in the support set are used for parameter fine-tuning, and 5000 samples of each digit 8-9 in the challenge set are used for testing.
The specific training and testing steps of example 2 are similar to those of example 1, except that the numbers of network nodes and the learning rates are set differently. The number n of samples per class ranges over [5, 1000], each experiment is repeated 20 times, and the results, plotted as classification-accuracy intervals, are shown in FIG. 9:
in the graph, each node is a classification accuracy average value obtained by repeating 20 times of experiments when n takes different values, and a band on each node is a classification accuracy interval of 20 experiments.
The following conclusions can be drawn from the figure:
under small sample conditions (n ≦ 100):
comparing experiment 1 and experiment 2, when the class of the training data is more than that of the test data, the classification accuracy of the network on the test data is reduced. The reason is that the feature expression capability is limited due to the limitation of the network structure, and the more training task types, the more the network is likely to fall into under-fitting, so that the recognition capability of the network for a single task is reduced.
Comparing experiment 1 with experiment 3, although the sample category and number participating in network training are the same, in experiment 3, when n takes different values, the classification accuracy is higher than that in experiment 1, especially under the condition of small sample. This is because the proposed algorithm tends to find the "fastest convergence point" more in the training phase than to approach the optimal point. Good initial parameters can be learned even if a number different from the test data category is input for network training. And when n takes different values, the classification accuracy intervals of the experiment 3 are all smaller than those of the experiment 1, which shows that the model obtained by the algorithm training is stronger in stability.
Comparing experiment 2 with experiment 3, when n takes different values, the classification accuracy of experiment 3 is higher than that of experiment 2, which shows that the algorithm does not reduce the feature expression ability of the network due to too many training categories, and also shows that the initial values of the network parameters learned in the training stage of experiment 3 give consideration to the learning ability of different tasks, and the network can better fit the training data only by a small amount of training.
When the number of samples is sufficient (n = 1000), the classification accuracy of experiment 3 is higher than that of the other two experiments, which indicates that the meta-learning method brings the network parameters to the global optimal point.
In conclusion, the method can effectively improve the DRBM classification accuracy under the condition of small samples, and has higher engineering application value.

Claims (5)

1. A DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning, characterized by comprising the following steps:
S1, establishing the DRBM network structure: the DRBM network comprises three layers, namely a visible layer, a hidden layer and a classification layer; each layer comprises a plurality of neurons, nodes in the same layer are not connected, and nodes in adjacent layers are fully connected; the state of each neuron is a binary value of 1 or 0, where 1 represents activation and 0 represents non-activation, activation meaning that the node represented by the neuron processes data; the distribution of the DRBM is determined by the neuron values, wherein the visible layer represents the input data, the number of visible-layer nodes is determined by the dimension of the input data, and each visible-layer node takes the value of the corresponding dimension of the input data; the hidden layer obtains statistical characteristics of the observed data through optimization, and the number of hidden-layer nodes is adjusted manually according to the data and the task; the classification-layer units judge the class from the data features extracted by the hidden-layer units, and the number of classification-layer nodes is determined by the number of data classes;
the DRBM network is described by its network parameters; let the number of visible-layer nodes be l, the number of hidden-layer nodes be m, and the number of classification-layer nodes be n; the visible-layer bias vector b is a 1×l vector, the hidden-layer bias vector c is a 1×m vector, and the classification-layer bias vector d is a 1×n vector; the weight matrix between the input (visible) layer and the hidden layer is W, an l×m matrix, and the weight matrix between the classification layer and the hidden layer is U, an n×m matrix; writing θ = (W, U, b, c, d), the purpose of training the DRBM network is to find the optimal value of θ so that the data class can be predicted through the network;
DRBM is a model determined based on an energy function, which can be defined as:
E(y, x, h) = -hW^T x^T - bx^T - ch^T - de_y^T - hU^T y^T   (1)
where x denotes the state vector of the visible layer (a 1×l vector), h denotes the state vector of the hidden-layer units (a 1×m vector), and y denotes the state vector of the classification layer (a 1×n vector); y is the one-hot representation of the label, i.e. exactly one node among all the nodes is 1 and the remaining nodes are 0; the joint probability distribution of x, h, y is:

p(y, x, h) = exp(-E(y, x, h)) / Z   (2)

where Z = Σ_{y,x,h} exp(-E(y, x, h)) is called the partition function;
s2 meta learning stage:
s2.1 initializing network parameters:
initialize the visible-layer bias vector b as a 1×l zero vector, the hidden-layer bias vector c as a 1×m zero vector and the classification-layer bias vector d as a 1×n zero vector, with the corresponding gradients Δb (1×l), Δc (1×m) and Δd (1×n) also set to zero; initialize the weight matrix W between the visible layer and the hidden layer as an l×m zero matrix and the weight matrix U between the classification layer and the hidden layer as an n×m zero matrix, with the corresponding gradients ΔW (l×m) and ΔU (n×m) also set to zero; the initial value of the network parameters θ is recorded as θ0; set the internal learning rate α to 0.01-0.5, the external learning rate β to 0.005-0.05, the momentum learning rate m to 0.5 and the penalty coefficient p to 10^-4;
S2.2, completing the training of one task:
the training of the meta-learning stage takes one task as a basic unit, and each task comprises two parts, namely a support set and a challenge set; the process of training by utilizing the support set is called internal learning, and the process of training by utilizing the challenge set is called external learning; the data category of each task may be the same as or different from that of other tasks; all tasks in the meta-learning stage jointly form a training task; all samples in the training task contain both data information and label information; the method comprises the following specific steps:
S2.2.1, take the support set as the network input and θ0 as the initial value of the network parameters, and complete one training pass:
s2.2.1.1 calculate the probability distribution function of the hidden layer:
p(h|y,x)=sigmoid(x(0)W+y(0)U+c) (3)
where x(0) is the input data, y(0) is the input-data label in one-hot form, and sigmoid(z) = 1/(1 + e^(-z));
s2.2.1.2, obtaining a hidden layer probability distribution function, and obtaining hidden layer node values by utilizing Gibbs sampling;
S2.2.1.3, reconstruct the visible layer and the classification layer from the hidden-layer node values, and calculate the probability distribution functions of the visible layer and the classification layer respectively:

p(x|h) = sigmoid(hW^T + b),  p(y|h) = exp(hU_y^T + d_y) / Σ_{y'} exp(hU_{y'}^T + d_{y'})   (4)

where y' ranges over all possible values of the class label;
S2.2.1.4, having obtained the probability distribution functions of the visible layer and the classification layer, use Gibbs sampling, x(1) ~ p(x|h) and y(1) ~ p(y|h), to obtain the node values x(1) and y(1) of the visible layer and the classification layer;
S2.2.1.5, using the node values x(1) and y(1) of the visible layer and the classification layer, calculate the hidden-layer probability distribution function again:
p(h|y,x)=sigmoid(x(1)W+y(1)U+c) (5)
and h(1) ~ p(h|y,x) is obtained by Gibbs sampling;
S2.2.1.6, from x(0), y(0), x(1), y(1), h(0) and h(1), compute the update gradient of the network parameters θ:

∂W = x(0)^T h(0) - x(1)^T h(1),  ∂U = y(0)^T h(0) - y(1)^T h(1),  ∂b = x(0) - x(1),  ∂c = h(0) - h(1),  ∂d = y(0) - y(1)   (6)
S2.2.1.7, input the samples of the task in turn and correct the gradient according to the internal learning rate α, the momentum learning rate m and the penalty coefficient p:

Δθ_i = m·Δθ_{i-1} + α·(∂θ_i - p·θ)   (7)
wherein i belongs to [1, ns ], and ns is the number of samples in the support set;
S2.2.1.8, update the network parameters θ according to the gradient:

θ = θ + Δθ_i   (8)
the network parameters obtained after training and updating on the support-set data are recorded as θns;
S2.2.2, take the challenge set as the network input and θns as the initial value of the network parameters, and complete one training pass:
probability-distribution calculation, node-distribution sampling and computation of the network-parameter update gradient are completed in sequence according to formulas (3), (4), (5) and (6), and the corrected gradient is calculated according to formula (9):

Δθ_i = m·Δθ_{i-1} + β·(∂θ_i - p·θ)   (9)
where β is the external learning rate, i ∈ [1, nq] and nq is the number of samples in the challenge set; finally, the network parameters are updated according to formula (8), and the resulting network parameters are recorded as θnq;
After the training of one task is completed, the network parameters are updated once, and the updating only keeps the external learning part, namely:
θt+1 = θt + (θnq - θns)   (10)
where t ∈ [1, nt], t denotes the t-th training task and nt the number of training tasks; the network parameters θt+1 are stored and taken as the initial network parameters of the next task, which completes the training of one task;
s2.3, completing all traversals:
a plurality of tasks need to be trained in each traversal, and the number of tasks is determined by the size of the data set; after the training of one task is finished, the updated network parameters are used as the initial network parameters of the next task, and the process of S2.2 is repeated in sequence until all tasks have been trained once, which constitutes one traversal; after one traversal is completed, the updated network parameters are used as the initial network parameters of the next traversal until all traversals are completed, and the finally obtained network parameters are recorded as θnt;
S3 model learning phase:
S3.1, take the support set as the network input and θnt as the initial value of the network parameters, and complete one training pass:
probability-distribution calculation, node-distribution sampling and computation of the network-parameter update gradient are completed in sequence according to formulas (3), (4), (5) and (6), and the corrected gradient is then calculated according to formula (11):

Δθ_i = m·Δθ_{i-1} + α·(∂θ_i - p·θ)   (11)
wherein i belongs to [1, ts ], ts is the number of samples in the support set of the test task, and finally, the network parameter updating is completed according to a formula (8);
s3.2, completing all traversals:
each traversal completes one training pass over the support set of the test task; after one traversal is completed, the updated network parameters are used as the initial network parameters of the next traversal until all traversals are completed, and the finally obtained network parameters are recorded as θt;
S3.3, input the data of the challenge set into the network with θt as the network parameters, and calculate the prediction probability of the i-th category in turn:
prediction(i)=repeat(d(i),tq)+log(exp(x(0)·W+T(i)·U+c)+1) (12)
where T is a tq×nc matrix, tq is the number of challenge-set samples and nc is the total number of classes; T(i) denotes T with its i-th column equal to 1 and all other columns equal to 0; d(i) denotes the bias vector d with all columns other than the i-th set to 0; repeat(d(i), tq) denotes repeating the vector d(i) tq times to form a matrix of tq rows and n columns;
and after the prediction probabilities of the nc categories are calculated, taking the column where the maximum value is located as a category prediction result, and finishing the target classification.
2. A meta-learning based fast adaptation DRBM method according to claim 1, wherein in S2.2.1.2 the specific method of Gibbs sampling is: generate m random numbers r_i in [0,1], i ∈ [1, m], where m is the number of hidden-layer nodes; if p(h_i|y,x) > r_i, node h_i takes the value 1, otherwise 0; the resulting hidden-layer node configuration is denoted h(0), and sampling h(0) from the probability distribution p(h|y,x) is written h(0) ~ p(h|y,x).
3. A meta-learning based fast adaptation DRBM method according to claim 1, wherein: and in S2.3, 20-100 groups of tasks are set.
4. A meta-learning based fast adaptation DRBM method according to claim 1, wherein: the meta learning stage sets 50 traversals.
5. A meta-learning based fast adaptation DRBM method according to claim 1, wherein: the set traversal times in the model learning stage are 50-150 times.
CN202110134999.9A 2021-01-29 2021-01-29 DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning Pending CN112862094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110134999.9A CN112862094A (en) 2021-01-29 2021-01-29 DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110134999.9A CN112862094A (en) 2021-01-29 2021-01-29 DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning

Publications (1)

Publication Number Publication Date
CN112862094A true CN112862094A (en) 2021-05-28

Family

ID=75987291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110134999.9A Pending CN112862094A (en) 2021-01-29 2021-01-29 DRBM (discriminative restricted Boltzmann machine) fast adaptation method based on meta-learning

Country Status (1)

Country Link
CN (1) CN112862094A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114844A (en) * 2022-05-09 2022-09-27 东南大学 Meta learning prediction model for reinforced concrete bonding slip curve
CN115114844B (en) * 2022-05-09 2023-09-19 东南大学 A meta-learning prediction model for reinforced concrete bond-slip curves
CN116737939A (en) * 2023-08-09 2023-09-12 恒生电子股份有限公司 Meta learning method, text classification device, electronic equipment and storage medium
CN116737939B (en) * 2023-08-09 2023-11-03 恒生电子股份有限公司 Meta learning method, text classification device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109948029A (en) Deep Hash Image Search Method Based on Neural Network Adaptive
CN105528638B (en) The method that gray relative analysis method determines convolutional neural networks hidden layer characteristic pattern number
CN110188685A (en) A target counting method and system based on double-attention multi-scale cascade network
CN113222011B (en) Small sample remote sensing image classification method based on prototype correction
CN108596327B (en) A Deep Learning-Based Artificial Intelligence Picking Method for Seismic Velocity Spectrum
CN107229914B (en) A handwritten digit recognition method based on deep Q-learning strategy
CN109740679B (en) Target identification method based on convolutional neural network and naive Bayes
CN109146000B (en) Method and device for improving convolutional neural network based on freezing weight
CN112884059A (en) Small sample radar working mode classification method fusing priori knowledge
Minh et al. Automated image data preprocessing with deep reinforcement learning
CN116503676B (en) Picture classification method and system based on knowledge distillation small sample increment learning
CN113541985B (en) Internet of things fault diagnosis method, model training method and related devices
CN113987236B (en) Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network
CN112862094A (en) DRBM (distributed resource management protocol) fast adaptation method based on meta-learning
CN112101364A (en) A Semantic Segmentation Method Based on Incremental Learning of Parameter Importance
Bianchi et al. Improving image classification robustness through selective cnn-filters fine-tuning
US20220318633A1 (en) Model compression using pruning quantization and knowledge distillation
CN111310791A (en) A Dynamic Progressive Automatic Target Recognition Method Based on Small Sample Number Sets
CN113553918B (en) Machine ticket issuing character recognition method based on pulse active learning
Xie et al. Data augmentation of sar sensor image via information maximizing generative adversarial net
Sufikarimi et al. Speed up biological inspired object recognition, HMAX
CN111222529A (en) GoogLeNet-SVM-based sewage aeration tank foam identification method
CN110110625A (en) SAR image target identification method and device
US20230041338A1 (en) Graph data processing method, device, and computer program product
CN112784969A (en) Convolutional neural network accelerated learning method based on sampling

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210528

WD01 Invention patent application deemed withdrawn after publication