
WO2021051987A1 - Method and device for training a neural network model - Google Patents

Method and device for training a neural network model

Info

Publication number
WO2021051987A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
data
network model
training data
training
Application number
PCT/CN2020/102594
Other languages
English (en)
French (fr)
Inventor
于德权
吴觊豪
贾明波
马杰延
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021051987A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method and device for training a neural network model.
  • Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
  • Computer vision is an integral part of intelligent/autonomous systems in many application fields, such as manufacturing, inspection, document analysis, medical diagnosis, and military applications. It studies how to use cameras/video cameras and computers to obtain the data and information about a subject that we need. Figuratively, it equips the computer with eyes (cameras/camcorders) and a brain (algorithms) so that the computer can identify, track, and measure targets in place of the human eye, and so perceive the environment. Because perception can be regarded as extracting information from sensory signals, computer vision can also be regarded as the science of how to make artificial systems "perceive" from images or multi-dimensional data.
  • Computer vision uses various imaging systems in place of the visual organs to obtain input information, and then uses the computer in place of the brain to process and interpret that information.
  • The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision as humans do, and to adapt to the environment autonomously.
  • A clustering-based small-sample learning scheme extracts features of the training data with a neural network model and trains the model by computing the distances between the features of training data of different categories. Because the training data available to a small-sample learning scheme is limited, the trained neural network model generalizes poorly.
  • This application provides a method for training a neural network model that can produce a model with higher accuracy and good generalization ability when the amount of training data is small or the data is imbalanced.
  • According to a first aspect, a method for training a neural network model is provided, including: obtaining a neural network model, first training data, and the category of the first training data, where the neural network model has been trained on second training data.
  • The first training data includes support data and query data; the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data.
  • Each bit of the class center feature of a category is the average of the corresponding bit of the features of that category's support data.
  • The neural network model obtained by training is used to extract features of the training data, and the parameters of some layers of the model are adjusted according to the feature distances between those features. In this way, a neural network model with higher accuracy and strong generalization ability can be obtained.
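The class-center step can be sketched in a few lines of pure Python. This is a minimal illustration that assumes three things:

- features are plain lists of floats;
- the feature distance is squared Euclidean distance;
- the distance-based training signal is a prototypical-network-style cross-entropy over the negative distances.

The document fixes none of these choices, and the names `class_centers`, `nearest_center`, and `distance_loss` are hypothetical.

```python
import math
from collections import defaultdict

def class_centers(support_feats, support_labels):
    """Average the support features of each category bit by bit (element-wise mean)."""
    groups = defaultdict(list)
    for feat, label in zip(support_feats, support_labels):
        groups[label].append(feat)
    return {
        label: [sum(f[i] for f in feats) / len(feats) for i in range(len(feats[0]))]
        for label, feats in groups.items()
    }

def squared_distance(a, b):
    """Squared Euclidean distance, assumed here as the feature distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(query_feat, centers):
    """Predict the category whose class center is closest to the query feature."""
    return min(centers, key=lambda label: squared_distance(query_feat, centers[label]))

def distance_loss(query_feats, query_labels, centers):
    """Mean cross-entropy of softmax(-distance) over the class centers (prototypical-network style)."""
    total = 0.0
    for feat, label in zip(query_feats, query_labels):
        logits = {c: -squared_distance(feat, ctr) for c, ctr in centers.items()}
        m = max(logits.values())  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_z - logits[label]
    return total / len(query_feats)

support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["cat", "cat", "dog", "dog"]
centers = class_centers(support, labels)
print(centers)                                        # {'cat': [0.0, 1.0], 'dog': [5.0, 4.0]}
print(nearest_center([1.0, 1.0], centers))            # cat
print(distance_loss([[1.0, 1.0]], ["cat"], centers))  # close to 0
```

In real training the features would come from the neural network model, and the loss would be back-propagated only into the layers being adjusted. Here the features are fixed lists, to keep the arithmetic visible.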
  • Adjusting the parameters of some layers of the neural network model according to the feature distance between each category's class center feature and the query data features includes: adjusting the parameters of those layers according to both of the following:
    • the feature distance between each category's class center feature and the query data features;
    • the average feature distance between the features of the first training data within each category.
  • The center loss represents the feature distance between each category's class center feature and the query data features.
  • Introducing the center loss can improve the efficiency of training the neural network model and the accuracy of the resulting model.
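The document names a center loss but gives no formula for it. The sketch below assumes the classic formulation: half the squared Euclidean distance from each query feature to its own category's class center, averaged over the queries. It also assumes a hypothetical weight `lam` for adding the center loss to a distance-based loss.

```python
def center_loss(query_feats, query_labels, centers):
    """Mean of 0.5 * ||x - c_y||^2: each query feature x against its own category's center c_y."""
    total = 0.0
    for feat, label in zip(query_feats, query_labels):
        total += 0.5 * sum((x - c) ** 2 for x, c in zip(feat, centers[label]))
    return total / len(query_feats)

def total_loss(distance_term, center_term, lam=0.1):
    """Hypothetical weighted sum; lam is an illustrative assumption, not a value from the document."""
    return distance_term + lam * center_term

centers = {"cat": [0.0, 1.0], "dog": [5.0, 4.0]}
queries = [[1.0, 1.0], [5.0, 6.0]]
labels = ["cat", "dog"]
print(center_loss(queries, labels, centers))  # 1.25
```

Minimizing this term pulls the query features of each category toward that category's center, which is the intuition behind pairing it with the distance-based loss.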
  • Using the neural network model to perform feature extraction on the first training data to obtain the features of the first training data includes:
    • inputting the first training data into the neural network model;
    • performing deep hashing on the features extracted by the neural network model to obtain the features of the first training data.
  • Deep hashing reduces the size of the features and the training time, while still allowing the trained neural network model to reach high accuracy.
  • It can also improve inference speed.
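The document does not say how its deep hashing is performed. The sketch below assumes a common recipe:

- a linear projection followed by `tanh`, used as a differentiable relaxation during training;
- thresholding at zero to obtain compact binary codes, compared by Hamming distance.

The projection matrix `W` is a hypothetical fixed example rather than learned weights.

```python
import math

def hash_layer(feature, weights):
    """Project a feature to len(weights) dimensions and squash with tanh (training-time relaxation)."""
    return [math.tanh(sum(w * x for w, x in zip(row, feature))) for row in weights]

def binarize(relaxed):
    """Inference-time binary code: 1 where the relaxed value is >= 0, else 0."""
    return [1 if v >= 0 else 0 for v in relaxed]

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 4-bit hash projection for 3-dimensional features.
W = [[1.0, -1.0, 0.5], [0.2, 0.3, -1.0], [-0.5, 1.0, 1.0], [1.0, 1.0, 1.0]]
c1 = binarize(hash_layer([0.9, 0.1, 0.4], W))
c2 = binarize(hash_layer([0.8, 0.2, 0.5], W))   # a feature similar to the first
c3 = binarize(hash_layer([-1.0, 0.5, -0.2], W))  # a dissimilar feature
print(c1, c2, c3)                    # [1, 0, 1, 1] [1, 0, 1, 1] [0, 1, 1, 0]
print(hamming(c1, c2), hamming(c1, c3))  # 0 3
```

Binary codes can be packed into machine words, and Hamming distance reduces to XOR plus a bit count. This is where the savings in feature size and inference time described above would come from.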
  • Adjusting the parameters of some layers of the neural network model according to the feature distance between each category's class center feature and the query data features includes:
    • when the data amount of the first training data is less than a preset value, adjusting the hyperparameters through a Bayesian optimization scheme, and adjusting the parameters of those layers according to the feature distance between each category's class center feature and the query data features;
    • when the data amount of the first training data is greater than or equal to the preset value, adjusting the parameters of those layers according to the preset hyperparameters corresponding to the neural network model and the feature distance between each category's class center feature and the query data features.
  • Training the neural network model through the Bayesian optimization scheme is inefficient.
  • Training the neural network model with the preset hyperparameters corresponding to the model yields a model with lower accuracy.
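The document describes its Bayesian optimization scheme only as a flowchart (FIG. 8). The sketch below is a generic, minimal version in pure Python:

- a Gaussian-process surrogate with an RBF kernel;
- the expected-improvement acquisition function;
- a single hyperparameter, normalized to [0, 1] and searched over a discrete candidate grid.

The objective is a toy stand-in for validation accuracy. The kernel length scale, the noise jitter, and every function name are assumptions.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def cholesky(K):
    """Lower-triangular L with L @ L^T == K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Back substitution for L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))  # alpha = K^-1 y
    k_star = [rbf(x_new, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf

def bayes_opt(objective, candidates, init, n_iter=10):
    """Evaluate init points, then repeatedly evaluate the unevaluated candidate with the highest EI."""
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        pool = [c for c in candidates if c not in xs]
        if not pool:
            break
        best = max(ys)
        nxt = max(pool, key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i = ys.index(max(ys))
    return xs[i], ys[i]

def toy_validation_score(x):
    """Hypothetical stand-in for validation accuracy versus a normalized hyperparameter."""
    return -(x - 0.3) ** 2

candidates = [i / 50 for i in range(51)]
best_x, best_y = bayes_opt(toy_validation_score, candidates, init=[0.0, 0.5, 1.0])
print(best_x, best_y)
```

Each iteration refits the surrogate and trains once more, so this is far costlier than a single run with preset hyperparameters. That cost is consistent with the document's remark that the Bayesian scheme is inefficient.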
  • The hyperparameters include one or more of the following: the learning rate, learning rate decay rate, learning rate decay period, number of iteration cycles, batch size, and network structure parameters of the neural network model.
  • According to a second aspect, a method for training a neural network model is provided, including:
    • acquiring first training data and the category of the first training data;
    • when the data amount of the first training data is less than a preset value, adjusting hyperparameters through a Bayesian optimization scheme, and training the neural network model according to the first training data and its category;
    • when the data amount of the first training data is greater than or equal to the preset value, training the neural network model according to the preset hyperparameters corresponding to the model, the first training data, and the category of the first training data.
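The branching on the data amount can be sketched as a small dispatcher. This is a minimal illustration under several assumptions:

- "data amount" is taken to be the number of training samples;
- `bayes_tuner` is a hypothetical callable standing in for whatever Bayesian optimization run is used;
- the preset hyperparameter values are invented for illustration.

```python
from typing import Callable, Dict, Sequence

Hyperparams = Dict[str, float]

def select_hyperparams(
    train_data: Sequence,
    preset_value: int,
    preset_hparams: Hyperparams,
    bayes_tuner: Callable[[Sequence], Hyperparams],
) -> Hyperparams:
    """Tune via Bayesian optimization below the preset data amount; otherwise use the presets."""
    if len(train_data) < preset_value:
        return bayes_tuner(train_data)  # small data: search the hyperparameters
    return dict(preset_hparams)         # enough data: reuse the preset hyperparameters

PRESET = {"learning_rate": 0.01, "batch_size": 64, "epochs": 30}  # hypothetical presets

def fake_tuner(data):
    """Hypothetical stand-in for a Bayesian optimization run."""
    return {"learning_rate": 0.003, "batch_size": 8, "epochs": 100}

print(select_hyperparams(list(range(50)), 1000, PRESET, fake_tuner))    # tuned values
print(select_hyperparams(list(range(5000)), 1000, PRESET, fake_tuner))  # preset values
```

Passing the tuner in as a function keeps the branching logic separate from any particular optimization library.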
  • The type of the neural network model may be a default type or a specified type.
  • The neural network model may be stored in the memory of the electronic device that executes the training method, or it may be received from another electronic device.
  • The method further includes acquiring the neural network model, which has been trained on second training data. Training the neural network model according to the first training data and its category then includes:
    • using the neural network model to perform feature extraction on the first training data to obtain its features. The first training data includes support data and query data. The support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data;
    • adjusting the parameters of some layers of the neural network model according to the feature distance between each category's class center feature and the query data features, to obtain the adjusted neural network model.
  • Each bit of the class center feature of a category is the average of the corresponding bit of the features of that category's support data.
  • The method further includes adjusting the parameters of those layers according to both of the following:
    • the feature distance between each category's class center feature and the query data features;
    • the average distance between the features of the first training data within each category.
  • The method further includes:
    • inputting the first training data into the neural network model;
    • performing deep hashing on the features extracted by the neural network model to obtain the features of the first training data.
  • The hyperparameters include one or more of the following: the learning rate, learning rate decay rate, learning rate decay period, number of iteration cycles, batch size, and network structure parameters of the neural network model.
  • According to a third aspect, a device for training a neural network model is provided, including modules configured to execute the method in the first aspect.
  • According to a fourth aspect, a device for training a neural network model is provided, including modules configured to execute the method in the second aspect.
  • According to a fifth aspect, a device for training a neural network model is provided, including a memory configured to store a program and a processor configured to execute the program stored in the memory. When the program is executed, the processor is configured to execute the method in the first aspect.
  • According to a sixth aspect, a neural network training device is provided, including a memory configured to store a program and a processor configured to execute the program stored in the memory. When the program is executed, the processor is configured to execute the method in the second aspect.
  • According to a seventh aspect, a computer storage medium is provided. It stores program code, and the program code includes instructions for executing the steps of the method in the first aspect or the second aspect.
  • According to an eighth aspect, a chip system is provided, including at least one processor. When program instructions are executed in the at least one processor, the chip system is caused to execute the method in the first aspect or the second aspect.
  • The chip system may further include a memory storing instructions, and the processor is configured to execute the instructions stored in the memory.
  • The processor is configured to execute the method in the first aspect.
  • The chip system may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • The method of the first aspect may specifically refer to the first aspect or to the method in any implementation of the first aspect.
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • Fig. 2 is a schematic diagram of using a convolutional neural network model provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • Fig. 4 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of a method for training a neural network model provided by another embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a method for clustering-based small sample learning provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of a fine-tuning method provided by an embodiment of the present application.
  • Fig. 8 is a schematic flow chart of the Bayesian optimization scheme.
  • Fig. 9 is a schematic structural diagram of a neural network model training device provided by another embodiment of the present application.
  • FIG. 10 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
  • A neural network model can be composed of neural units.
  • A neural unit can be an arithmetic unit that takes $x_s$ and an intercept $b$ as inputs.
  • The output of the arithmetic unit can be $h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$. Here:
    • $s = 1, 2, \ldots, n$, and $n$ is a natural number greater than 1;
    • $W_s$ is the weight of $x_s$;
    • $b$ is the bias of the neural unit.
  • $f$ is the activation function of the neural unit. It introduces nonlinearity into the neural network model, converting the input signal of the neural unit into an output signal.
  • The output signal of the activation function can be used as the input of the next convolutional layer.
  • The activation function can be a sigmoid function.
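The neural-unit formula above, with a sigmoid activation, can be computed directly. This minimal sketch assumes plain Python lists for the inputs $x_s$ and weights $W_s$; the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation f(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(x, w, b, f=sigmoid):
    """Output f(sum_s W_s * x_s + b) of a single neural unit."""
    return f(sum(ws * xs for ws, xs in zip(w, x)) + b)

print(neural_unit([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```

In the example the weighted sum is exactly 0, so the sigmoid returns 0.5.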
  • A neural network model is a network formed by connecting many such single neural units together; that is, the output of one neural unit can be the input of another.
  • The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field.
  • A local receptive field can be a region composed of several neural units.
  • Deep neural network (DNN), also known as a multi-layer neural network model.
  • A DNN can be understood as a neural network model with many hidden layers; there is no special metric for "many" here. By position, the layers inside a DNN fall into three categories: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. In a fully connected neural network model, for example, adjacent layers are fully connected: every neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each layer is not. Each layer computes the linear expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where:
    • $\vec{x}$ is the input vector and $\vec{y}$ is the output vector;
    • $\vec{b}$ is the offset vector;
    • $W$ is the weight matrix (also called the coefficients);
    • $\alpha(\cdot)$ is the activation function.
  • The coefficient from the k-th neuron in the (L−1)-th layer to the j-th neuron in the L-th layer is defined as $W^{L}_{jk}$. Note that the input layer has no $W$ parameters.
  • In a deep neural network, more hidden layers make the network better able to model complex real-world situations.
  • A model with more parameters is more complex and has a greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network model is the process of learning the weight matrices. Its ultimate goal is to obtain the weight matrices of all layers of the trained model (the weight matrices formed by the vectors $W$ of the many layers).
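The layer-by-layer expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be sketched as a forward pass in pure Python. In this sketch `W[j][k]` plays the role of $W^{L}_{jk}$, the coefficient from neuron k of the previous layer to neuron j of the current layer. For brevity the same activation (ReLU, an assumed choice) is applied at every layer, and the weights are hypothetical.

```python
def relu(z):
    """Rectified linear activation, assumed here as alpha()."""
    return max(0.0, z)

def layer_forward(x, W, b, act=relu):
    """One fully connected layer: y_j = act(sum_k W[j][k] * x[k] + b[j])."""
    return [act(sum(W[j][k] * x[k] for k in range(len(x))) + b[j]) for j in range(len(W))]

def dnn_forward(x, layers, act=relu):
    """Apply each (W, b) pair in turn; the input layer itself has no W."""
    for W, b in layers:
        x = layer_forward(x, W, b, act)
    return x

layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -4.0]),  # hidden layer: 2 -> 3 neurons
    ([[1.0, 1.0, 1.0]], [0.5]),                                 # output layer: 3 -> 1 neuron
]
print(dnn_forward([1.0, 2.0], layers))  # [3.5]
```

Training would adjust every `W` and `b` in `layers`; the list as a whole corresponds to "the weight matrices of all layers" mentioned above.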
  • A convolutional neural network (CNN) model is a deep neural network model with a convolutional structure.
  • A convolutional neural network model contains a feature extractor composed of convolutional layers and sub-sampling layers.
  • The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or a convolutional feature map.
  • A convolutional layer is a layer of neurons in a convolutional neural network model that performs convolution on the input signal.
  • In a convolutional layer, a neuron can be connected to only some of the neurons in the adjacent layer.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel.
  • Weight sharing can be understood as meaning that the way image information is extracted does not depend on position. The underlying principle is that the statistics of one part of an image are the same as those of other parts. Image information learned in one part can therefore be used in another part, and the same learned image information can be used at all positions in the image.
  • Within one convolutional layer, multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • A convolution kernel can be initialized as a matrix of random size, and it learns reasonable weights during training of the convolutional neural network model. A direct benefit of weight sharing is that it reduces the connections between the layers of the convolutional neural network model, which also reduces the risk of overfitting.
  • The loss function is an important equation for measuring the difference between the predicted value and the target value. The higher the output value (loss) of the loss function, the greater the difference, so training a deep neural network model becomes a process of reducing this loss as much as possible.
  • A residual network includes convolutional layers and/or pooling layers.
  • A residual network can be described as follows. In a deep neural network model, multiple hidden layers are connected layer by layer: the 1st hidden layer to the 2nd, the 2nd to the 3rd, and the 3rd to the 4th. This is one data operation path of the neural network model, also called neural network model transmission. Besides this path, the residual network has an additional direct-connection branch. The branch connects the 1st hidden layer directly to the 4th, skipping the processing of the 2nd and 3rd hidden layers, so that the data of the 1st hidden layer is transmitted directly to the 4th hidden layer for computation.
  • A highway network can be described as follows. In addition to the operation path and direct-connection branch described above, the deep neural network model also includes a weight acquisition branch. This branch introduces a transform gate to acquire a weight value, and the output weight value T is used in the subsequent operations of the operation path and the direct-connection branch.
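The residual and highway connections can be contrasted in a small sketch. It rests on several assumptions:

- hidden layers 2 and 3 are hypothetical toy layers (element-wise scaling plus ReLU);
- the residual branch adds the 1st layer's output to the path output;
- the transform gate computes T as a sigmoid of a hypothetical gate weight times the input. T weights the operation path and (1 - T) weights the direct branch, following the usual highway-network formulation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_layer(x, w):
    """Hypothetical toy hidden layer: element-wise scaling followed by ReLU."""
    return [max(0.0, wi * xi) for wi, xi in zip(w, x)]

def layer_path(h1, w2, w3):
    """Layer-by-layer path: output of hidden layer 1 -> hidden layer 2 -> hidden layer 3."""
    return hidden_layer(hidden_layer(h1, w2), w3)

def residual_block(h1, w2, w3):
    """Layer path plus a direct branch carrying h1 past layers 2 and 3 (input to layer 4)."""
    return [p + s for p, s in zip(layer_path(h1, w2, w3), h1)]

def highway_block(h1, w2, w3, gate_w, gate_b):
    """Transform gate T mixes the layer path (weight T) and the direct branch (weight 1 - T)."""
    path = layer_path(h1, w2, w3)
    T = [sigmoid(g * x + gate_b) for g, x in zip(gate_w, h1)]
    return [t * p + (1.0 - t) * x for t, p, x in zip(T, path, h1)]

h1 = [1.0, -2.0]
print(residual_block(h1, [0.0, 0.0], [1.0, 1.0]))                   # [1.0, -2.0]
print(highway_block(h1, [0.0, 0.0], [1.0, 1.0], [0.0, 0.0], 0.0))  # [0.5, -1.0]
```

When layers 2 and 3 output zeros, the residual block still passes layer 1's data through unchanged. This is the practical effect of the direct branch.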
  • During training, a convolutional neural network model can use the back propagation (BP) algorithm to correct the parameter values of the initial neural network model, so that the model's reconstruction error loss becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss. The parameters of the initial model are then updated by back-propagating the error-loss information, so that the error loss converges.
  • The back propagation algorithm is a backward pass driven by the error loss. Its aim is to obtain the optimal parameters of the neural network model, such as the weight matrices.
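The forward-pass, error-loss, and backward-pass cycle can be shown on the smallest possible model. The sketch assumes a single sigmoid neuron, a squared-error loss, and plain gradient descent with a hypothetical learning rate `lr`. The gradients are derived by hand with the chain rule, which is what back propagation automates for deeper networks.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, target, lr=0.5):
    """Forward pass, squared-error loss, backward pass by the chain rule, gradient-descent update."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z)
    loss = 0.5 * (y - target) ** 2
    # dL/dy = (y - target), dy/dz = y * (1 - y), dz/dw_i = x_i, dz/db = 1
    delta = (y - target) * y * (1.0 - y)
    w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    b = b - lr * delta
    return w, b, loss

w, b = [0.0, 0.0], 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, [1.0, 0.5], 1.0)
    losses.append(loss)
print(losses[0], losses[-1])  # the first loss is 0.125; the last is much smaller
```

With the initial weights at zero, the first forward pass outputs 0.5, so the first loss is 0.5 × (0.5 − 1)² = 0.125. Each update then moves the output toward the target.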
  • The pixel value of an image can be a red-green-blue (RGB) color value, and the pixel value can be a long integer representing the color.
  • For example, the pixel value is 256×Red + 100×Green + 76×Blue, where Red, Green, and Blue represent the red, green, and blue components. In each color component, a smaller value means lower brightness and a larger value means higher brightness.
  • Alternatively, the pixel values can be grayscale values.
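The example weighting above can be checked with a line of arithmetic. The sketch reproduces the document's formula as written. For contrast, it also shows a common 24-bit packing with 8 bits per channel, which is not the document's formula; both function names are hypothetical.

```python
def document_pixel_value(red, green, blue):
    """Pixel value using the example weighting in the text: 256*Red + 100*Green + 76*Blue."""
    return 256 * red + 100 * green + 76 * blue

def packed_rgb24(red, green, blue):
    """A common alternative: pack 8-bit channels into one integer as 0xRRGGBB."""
    return (red << 16) | (green << 8) | blue

print(document_pixel_value(1, 2, 3))  # 684  (256 + 200 + 228)
print(packed_rgb24(1, 2, 3))          # 66051
```

The packed form can always be decoded back into its three components. The document's weighted sum is simply an example of a long integer derived from the components.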
  • Small-sample research aims to design a learning model that can learn rapidly and identify the category of new samples from only a small number of labeled samples.
  • Existing research approaches applicable to small-sample problems include transfer learning and semi-supervised learning. To some extent, these methods alleviate overfitting and data scarcity when training with a small amount of data.
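The support/query split of the first training data can be illustrated with an episode sampler of the kind commonly used in small-sample learning. The sketch assumes that each category contributes a fixed, hypothetical number of support and query samples, drawn without overlap. The document only says "all or part of the data of each category", so these counts are illustrative.

```python
import random
from collections import defaultdict

def sample_episode(samples, labels, n_support, n_query, rng):
    """Split each category's samples into disjoint support and query parts."""
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    support, query = [], []
    for y, items in by_class.items():
        picked = rng.sample(items, n_support + n_query)  # draw without replacement
        support += [(s, y) for s in picked[:n_support]]
        query += [(s, y) for s in picked[n_support:]]
    return support, query

samples = [f"img{i}" for i in range(15)]
labels = [i % 3 for i in range(15)]  # three categories, five samples each
support, query = sample_episode(samples, labels, n_support=2, n_query=1, rng=random.Random(0))
print(len(support), len(query))  # 6 3
```

The support part would feed the class-center computation, and the query part would be compared against those centers.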
  • FIG. 1 is a schematic diagram of the system architecture of an embodiment of the present application.
  • the system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data collection system 160.
  • the execution device 110 includes a calculation module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114.
  • The calculation module 111 may include the target model/rule 101; the preprocessing module 113 and the preprocessing module 114 are optional.
  • The data collection device 160 is used to collect training data.
  • The training data may include the first training data and the category of the first training data.
  • After collecting the training data, the data collection device 160 stores it in the database 130. The training device 120 then trains the target model/rule 101 based on the training data maintained in the database 130.
  • The training device 120 processes the input first training data and compares the feature distance between the output query data features and each category's class center feature. Training of the target model/rule 101 is complete once this feature distance meets a preset condition.
  • The target model/rule 101 can be used to implement the neural-network-model classification of the embodiments of this application: data to be processed is input (after relevant preprocessing) into the target model/rule 101 to obtain its category.
  • The target model/rule 101 in the embodiments of this application may specifically be a neural network model.
  • The training data maintained in the database 130 does not necessarily all come from the data collection device 160; some may be received from other devices.
  • The training device 120 also does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130; it may obtain training data from the cloud or elsewhere for model training.
  • The above description should not be construed as a limitation on the embodiments of this application.
  • The target model/rule 101 obtained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 can be a terminal (a mobile phone, tablet computer, notebook computer, augmented reality (AR)/virtual reality (VR) device, in-vehicle terminal, and so on), a server, or the cloud.
  • The execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
  • A user can input data to the I/O interface 112 through the client device 140.
  • In this embodiment of the application, the input data may include data to be processed that is input by the client device.
  • The client device 140 here may specifically be a terminal device.
  • The preprocessing module 113 and the preprocessing module 114 preprocess the input data (such as data to be processed) received by the I/O interface 112.
  • The preprocessing module 113 and the preprocessing module 114 may be omitted, or there may be only one preprocessing module.
  • When the preprocessing modules are omitted, the calculation module 111 can process the input data directly.
  • The execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing.
  • Data, instructions, and the like obtained from that processing may also be stored in the data storage system 150.
  • The I/O interface 112 presents the processing result, such as the classification result calculated by the target model/rule 101, to the client device 140 so that it can be provided to the user.
  • The classification result obtained by the target model/rule 101 in the calculation module 111 can be processed by the preprocessing module 113, optionally also by the preprocessing module 114. The processing result is then sent to the I/O interface, which sends it on to the client device 140 for display.
  • Alternatively, the calculation module 111 may transmit the classification results directly to the I/O interface, which then sends them to the client device 140 for display.
  • For different goals or tasks, the training device 120 can generate corresponding target models/rules 101 based on different training data. These target models/rules 101 can then be used to achieve those goals or complete those tasks, providing users with the desired results.
  • The user can manually specify input data through an interface provided by the I/O interface 112.
  • Alternatively, the client device 140 can automatically send input data to the I/O interface 112. If automatic sending by the client device 140 requires the user's authorization, the user can set the corresponding permission in the client device 140.
  • The user can view the result output by the execution device 110 on the client device 140, presented for example as display, sound, or action.
  • The client device 140 can also serve as a data collection terminal, as shown in the figure. It collects the input data fed into the I/O interface 112 and the output results of the I/O interface 112 as new sample data, and stores them in the database 130.
  • Alternatively, as shown in the figure, the I/O interface 112 can directly store the input data fed into it and its output results in the database 130 as new sample data.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application. The positional relationships between the devices, components, modules, and so on shown in the figure do not constitute any limitation.
  • For example, in FIG. 1 the data storage system 150 is external memory relative to the execution device 110; in other cases, the data storage system 150 may be placed in the execution device 110.
  • The target model/rule 101 obtained by training with the training device 120 may be the neural network model in the embodiments of the present application.
  • The neural network model provided in the embodiments of the present application may be a CNN, a deep convolutional neural network (DCNN) model, and so on.
  • Because CNN is a very common neural network model, its structure is described in detail below with reference to FIG. 2.
  • A convolutional neural network model is a deep neural network model with a convolutional structure; it is a deep learning architecture.
  • A deep learning architecture uses machine learning algorithms to learn at multiple levels of abstraction.
  • A CNN is a feed-forward artificial neural network model in which each neuron can respond to the input data.
  • The convolutional neural network model (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (the pooling layer is optional), and a fully connected layer 230.
  • The convolutional layer/pooling layer 220 shown in FIG. 2 may include, for example, layers 221-226.
  • In one implementation, layers 221, 223, and 225 are convolutional layers and layers 222, 224, and 226 are pooling layers, in alternation.
  • In another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer.
  • That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 can include many convolution operators.
  • the convolution operator is also called a kernel. Its function in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator is essentially It can be a weight matrix. This weight matrix is usually pre-defined. In the process of convolution on the image, the weight matrix is usually one pixel after one pixel (or two pixels after two pixels) along the horizontal direction on the input image. ...It depends on the value of stride) to complete the work of extracting specific features from the image.
  • The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • During convolution, the weight matrix extends across the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolution output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied.
  • The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where that depth is determined by the number of weight matrices (the "multiple" mentioned above).
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to eliminate unwanted noise in the image.
  • Because the multiple weight matrices have the same size (rows × columns), the convolution feature maps they extract also have the same size, and these same-size feature maps are then combined to form the output of the convolution operation.
  • In practical applications, the weight values in these weight matrices must be obtained through extensive training.
  • Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network model 200 performs correct prediction.
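  • The sliding-window computation described above can be illustrated with a minimal pure-Python sketch. It assumes a nested-list image indexed as [row][column][channel] and no padding, and, as in most deep learning frameworks, it computes cross-correlation (the kernel is not flipped). The function `conv2d` and the example kernels are illustrative only; they show how each same-size kernel spans the full input depth and how the per-kernel outputs stack into the output depth.

```python
from typing import List

# A 3-D tensor stored as nested lists, indexed [row][column][channel].
Tensor3 = List[List[List[float]]]


def conv2d(image: Tensor3, kernels: List[Tensor3], stride: int = 1) -> Tensor3:
    """Valid (unpadded) convolution of `image` with several same-size kernels.

    Each kernel spans the full depth of the input, so it yields one output
    channel; the outputs of all kernels are stacked along the depth axis.
    """
    rows, cols, depth = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    for k in kernels:
        if len(k) != kh or len(k[0]) != kw or len(k[0][0]) != depth:
            raise ValueError("kernels must share one size and match the input depth")
    out_rows = (rows - kh) // stride + 1
    out_cols = (cols - kw) // stride + 1
    out = [[[0.0] * len(kernels) for _ in range(out_cols)] for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            for n, k in enumerate(kernels):
                acc = 0.0
                # Weighted sum over the window under the kernel, across all channels.
                for i in range(kh):
                    for j in range(kw):
                        for d in range(depth):
                            acc += image[r * stride + i][c * stride + j][d] * k[i][j][d]
                out[r][c][n] = acc
    return out


# Example: a 4x4 single-channel image and two 2x2 kernels.
image = [[[float(4 * r + c)] for c in range(4)] for r in range(4)]
k_sum = [[[1.0], [1.0]], [[1.0], [1.0]]]     # sums the window (a blur-like filter)
k_diff = [[[-1.0], [1.0]], [[-1.0], [1.0]]]  # responds to left-to-right intensity change
out = conv2d(image, [k_sum, k_diff], stride=2)  # shape 2x2x2: one depth slice per kernel
```

  • With stride 2 the 4×4 input shrinks to 2×2, and the two kernels give an output depth of 2.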
  • The initial convolutional layers (such as 221) often extract more general features, which may also be called low-level features.
  • As the depth of the neural network model 200 increases, the features extracted by later convolutional layers (for example, 226) become increasingly complex, such as high-level semantic features.
  • Features with higher-level semantics are better suited to the problem to be solved.
  • A pooling layer may follow a single convolutional layer, or one or more pooling layers may follow multiple convolutional layers.
  • During image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • The average pooling operator computes the average of the pixel values within a specific range of the image as the result of average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
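  • A minimal sketch of the two pooling operators described above, assuming a single-channel feature map stored as nested lists, a square window, and no padding. `pool2d` is an illustrative name; each output pixel is the maximum or the average of its sub-region, so a 2×2 window with stride 2 halves each spatial dimension.

```python
from typing import List

Matrix = List[List[float]]


def pool2d(fmap: Matrix, size: int = 2, stride: int = 2, mode: str = "max") -> Matrix:
    """Max or average pooling over one channel of a feature map (no padding)."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the size x size sub-region that maps to one output pixel.
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(out_row)
    return out


fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [5.0, 6.0, 1.0, 2.0],
    [7.0, 2.0, 9.0, 4.0],
    [0.0, 1.0, 3.0, 8.0],
]
print(pool2d(fmap, mode="max"))  # [[6.0, 2.0], [7.0, 9.0]]
print(pool2d(fmap, mode="avg"))  # [[3.75, 1.25], [2.5, 6.0]]
```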
  • After processing by the convolutional layer/pooling layer 220, the convolutional neural network model 200 is not yet able to output the required information, because, as described above, the convolutional layer/pooling layer 220 only extracts features and reduces the number of parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network model 200 uses the fully connected layer 230 to generate an output for one class or for a group of required classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n shown in FIG. 2) and an output layer 240. The parameters of the multiple hidden layers may be obtained by pre-training on training data relevant to a specific task type; for example, the task type may include image recognition, image classification, and image super-resolution reconstruction.
  • After the multiple hidden layers in the fully connected layer 230 comes the output layer 240, which is the final layer of the entire convolutional neural network model 200.
  • The output layer 240 has a loss function similar to categorical cross-entropy, which is used specifically to compute the prediction error.
  • the convolutional neural network model 200 shown in FIG. 2 is only used as an example of a convolutional neural network model. In specific applications, the convolutional neural network model may also exist in the form of other network models.
  • The convolutional neural network model 200 shown in FIG. 2 may be used to execute the classification method of the embodiments of this application.
  • After the data to be processed passes through the input layer 210, the convolutional layer/pooling layer 220, and the fully connected layer 230, the category of the data to be processed is obtained.
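  • The fully connected stage and the cross-entropy output described above can be sketched in pure Python. This is a minimal illustration, assuming one ReLU hidden layer, a softmax output layer, and small hand-picked weights; the function names and weights are hypothetical and do not come from the original model. The predicted category is the index of the largest softmax probability, and the loss is the negative log probability of the true class.

```python
import math
from typing import List


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, z) for z in v]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    """Categorical cross-entropy for a single sample with an integer class label."""
    return -math.log(probs[label])


# Hypothetical weights: extracted features -> one hidden layer -> 3-class output layer.
features = [1.0, 2.0]
hidden = relu(dense(features, [[1.0, 0.0], [0.0, 1.0]], [0.0, -3.0]))  # [1.0, 0.0]
logits = dense(hidden, [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]], [0.0, 0.0, 0.0])  # [1.0, 0.0, 2.0]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
loss = cross_entropy(probs, label=2)
```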
  • FIG. 3 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network model processor 50.
  • the chip can be set in the execution device 110 as shown in FIG. 1 to complete the calculation work of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101.
  • the algorithms of each layer in the convolutional neural network model as shown in Fig. 2 can be implemented in the chip as shown in Fig. 3.
  • The neural-network processing unit (NPU) 50 is mounted as a coprocessor on the host central processing unit (host CPU), and the host CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 503.
  • the controller 504 controls the arithmetic circuit 503 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 503 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
  • For example, when multiplying an input matrix A by a weight matrix B, the arithmetic circuit 503 fetches the data of matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit 503.
  • The arithmetic circuit 503 then fetches the data of matrix A from the input memory 501, performs the matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 508.
  • the vector calculation unit 507 can perform further processing on the output of the arithmetic circuit 503, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • The vector calculation unit 507 can be used for network computations in the non-convolutional, non-fully-connected (non-FC) layers of the neural network model, such as pooling, batch normalization, and local response normalization.
  • the vector calculation unit 507 can store the processed output vector to the unified buffer 506.
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 507 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer in a neural network model.
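  • As a software analogy only (it does not model the systolic data flow or the on-chip memories), the split of work described above can be sketched as follows: a matrix product is built up in an accumulator from partial sums over slices of the shared dimension, and a separate element-wise step then applies a nonlinearity, as the vector calculation unit 507 does. The tiling scheme, the `tile_k` parameter, and the choice of ReLU are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile_k: int = 2) -> Matrix:
    """Computes a @ b by accumulating partial sums over slices of the shared dimension.

    Each pass over a k-slice stands in for one batch of work by the arithmetic
    circuit; `accumulator` stands in for the accumulator holding partial results.
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    accumulator = [[0.0] * m for _ in range(n)]
    for k0 in range(0, k, tile_k):
        for i in range(n):
            for j in range(m):
                partial = sum(a[i][kk] * b[kk][j] for kk in range(k0, min(k0 + tile_k, k)))
                accumulator[i][j] += partial
    return accumulator


def vector_unit_relu(mat: Matrix) -> Matrix:
    """Element-wise nonlinearity applied to the accumulated output (ReLU here)."""
    return [[max(0.0, v) for v in row] for row in mat]


a = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]   # input matrix A
b = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # weight matrix B
acc = tiled_matmul(a, b)             # [[4.0, -1.0], [0.0, -1.0]]
activations = vector_unit_relu(acc)  # [[4.0, 0.0], [0.0, 0.0]]
```

  • The tile size changes only the order in which partial results are accumulated, not the final product.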
  • the unified memory 506 is used to store input data and output data.
  • The storage unit access controller 505 (direct memory access controller, DMAC) transfers input data in the external memory to the input memory 501 and/or the unified memory 506, stores weight data from the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.
  • The bus interface unit (BIU) 510 implements interaction among the host CPU, the DMAC, and the instruction fetch buffer 509 over the bus.
  • The instruction fetch buffer 509, connected to the controller 504, stores the instructions used by the controller 504.
  • The controller 504 invokes the instructions cached in the instruction fetch buffer 509 to control the working process of the computing accelerator.
  • The unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories.
  • the external memory is a memory external to the NPU.
  • The external memory may be, for example, a double data rate synchronous dynamic random access memory (DDR SDRAM) or a high bandwidth memory (HBM).
  • The operations of each layer in the convolutional neural network model shown in FIG. 2 may be performed by the arithmetic circuit 503 or the vector calculation unit 507.
  • Deep learning technology is developing rapidly.
  • However, training neural network models is still difficult.
  • Experienced engineers are needed to tune the parameters of the neural network model and to select the learning model.
  • Achieving high accuracy also depends on experts tuning the training parameters of the neural network model, which is time-consuming and labor-intensive and hinders rapid iteration of related services.
  • Under the traditional machine learning framework, the task of machine learning is to learn a classification model from sufficient given training data and then use the learned model to classify and predict test data.
  • machine learning algorithms have a key problem: it is difficult to obtain a large amount of training data in some newly emerging fields.
  • Hyperparameters include the learning rate (LR), learning rate decay rate, learning rate decay period, number of iterations, batch size, network structure parameters of the neural network model, and so on.
  • The learning rate is an important hyperparameter in supervised learning and deep learning; it determines whether and when the objective function converges to a local minimum.
  • the learning rate set in the hyperparameters can also be understood as the initial learning rate.
  • the learning rate decay rate can be understood as the decline value of the learning rate in each iteration period. After each learning rate decay cycle, the learning rate decreases.
  • the learning rate decay period can be a positive integer multiple of the iteration period.
  • The number of iterations may also be expressed in epochs. One epoch is a single forward and backward pass over all batches, that is, over the entire input data.
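  • A minimal sketch of how the initial learning rate, decay rate, and decay period combine into a per-epoch learning rate. It assumes the decay rate is a multiplicative factor applied once per decay period (the text's "decline value" could also be read as a subtractive step), and that one iteration period equals one epoch. `step_decay_lr` is a hypothetical helper, not an API of any particular framework.

```python
def step_decay_lr(initial_lr: float, decay_rate: float, decay_period: int, epoch: int) -> float:
    """Learning rate for a given epoch under multiplicative step decay.

    The rate is multiplied by `decay_rate` once every `decay_period` epochs
    (the decay period is a positive integer multiple of the iteration period).
    """
    if decay_period < 1:
        raise ValueError("decay_period must be a positive integer")
    return initial_lr * decay_rate ** (epoch // decay_period)


# Initial LR 0.1, halved every 10 epochs, over 30 epochs.
schedule = [step_decay_lr(0.1, 0.5, 10, e) for e in range(30)]
```

  • With these values the learning rate stays at 0.1 for epochs 0-9, drops to 0.05 for epochs 10-19, and to 0.025 for epochs 20-29.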
  • Hyperparameters can be tuned automatically, for example by grid search, random search, genetic algorithms, particle swarm optimization, or Bayesian optimization.
  • this application proposes a method of neural network model training.
  • Fig. 4 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the present application.
  • In step S401, the neural network model, the first training data, and the categories of the first training data are acquired.
  • They may be acquired by reading them from memory or by receiving them from another device.
  • the neural network model may be obtained by training according to the second training data.
  • the second training data may be different data from the first training data, and the second training data may be, for example, all or part of the public data set.
  • the first training data includes supporting data and query data.
  • the supporting data includes all or part of the data of each category in the first training data.
  • the query data includes all or part of the data of each category in the first training data.
  • the first training data may be text, voice, image, etc., for example.
  • The category of the first training data may be, for example, the part of speech of each word in a sentence (noun, verb, and so on), the emotion a person expresses in their voice while speaking, or the category of a person or object in a picture.
  • In step S402, the neural network model is used to extract features from the first training data to obtain the features of the first training data.
  • the features of the first training data may be features extracted by the neural network model, or may be obtained by processing the features extracted by the neural network model.
  • For example, the first training data may be input into the neural network model, and deep hashing may be applied to the features extracted by the model to obtain the features of the first training data. That is, the features of the first training data may be the result of deep hashing the features extracted by the neural network model.
  • In this case, the feature distance can be expressed as a Hamming distance.
  • Deep hashing reduces the size of the features and the training time while ensuring that the trained neural network model still has high accuracy.
  • It can also improve inference speed.
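  • A minimal sketch of why hashed features are compact and fast to compare. It assumes the hash code is obtained by thresholding each feature value at 0 (in deep hashing a trained hash layer produces the values that are binarized; that layer is not shown). `to_hash_code` and `hamming_distance` are illustrative names. Each feature becomes one bit packed into an integer, and the Hamming distance is a single XOR followed by a bit count.

```python
from typing import List


def to_hash_code(features: List[float]) -> int:
    """Binarize a real-valued feature vector by sign and pack the bits into an int.

    Bit i is 1 when features[i] > 0. Thresholding at 0 is an illustrative assumption.
    """
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two packed hash codes."""
    return bin(code_a ^ code_b).count("1")


f1 = [0.7, -0.2, 0.1, -0.9]
f2 = [0.5, 0.3, -0.4, -0.1]
c1, c2 = to_hash_code(f1), to_hash_code(f2)  # 0b0101 and 0b0011
print(hamming_distance(c1, c2))  # 2
```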
  • In step S403, the parameters of some layers in the neural network model are adjusted according to the feature distance between the class center feature of each category and the features of the query data, to obtain the adjusted neural network model.
  • Each bit of the class center feature of a category is the average of the corresponding bits of the features of that category's supporting data.
  • A center loss may be used to represent the average feature distance among the features of the first training data of each category.
  • the introduction of center loss can improve the efficiency of neural network model training and improve the accuracy of neural network model.
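  • A minimal sketch of the class center features and a center loss. The centers follow the text: an element-wise (bitwise) average of each category's feature vectors. For the center loss, the sketch assumes the common formulation, the mean squared Euclidean distance from each feature to its own class center, since the text only describes it as an average feature distance within each category. Function names and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Vector = List[float]


def class_centers(features: Sequence[Vector], labels: Sequence[Hashable]) -> Dict[Hashable, Vector]:
    """Per-class center: element-wise (bitwise) mean of that class's feature vectors."""
    grouped = defaultdict(list)
    for f, y in zip(features, labels):
        grouped[y].append(f)
    return {y: [sum(col) / len(col) for col in zip(*fs)] for y, fs in grouped.items()}


def center_loss(features: Sequence[Vector], labels: Sequence[Hashable]) -> float:
    """Mean squared Euclidean distance from each feature to its own class center."""
    centers = class_centers(features, labels)
    total = sum(
        sum((fi - ci) ** 2 for fi, ci in zip(f, centers[y]))
        for f, y in zip(features, labels)
    )
    return total / len(features)


feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["a", "a", "b", "b"]
centers = class_centers(feats, labels)  # {"a": [1.0, 0.0], "b": [10.0, 11.0]}
loss = center_loss(feats, labels)       # every point is at squared distance 1, so 1.0
```

  • Minimizing this quantity pulls features of the same category toward their center, which is the intra-class effect the text attributes to the center loss.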
  • Rather than adjusting the network structure of some layers and optimizing the hyperparameters through the Bayesian optimization scheme, the parameters of some layers of the neural network model are adjusted according to the feature distance between the class center feature of each category and the query data features.
  • When the parameters of some layers of the neural network model are adjusted, which layers are adjusted may be preset.
  • the parameters of the last few layers in the neural network model can be adjusted.
  • the preset hyperparameters can be determined based on expert experience.
  • the preset hyperparameters can correspond one-to-one with the neural network model.
  • The neural network model training device may store the preset hyperparameters and their correspondence with the neural network models.
  • Training the neural network model through the Bayesian optimization scheme is inefficient.
  • Training the neural network model only with the preset hyperparameters corresponding to it yields a model with relatively low accuracy.
  • the generalization ability of the neural network model can be improved by adjusting the parameters of some layers of the neural network model obtained by training.
  • Fig. 5 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the present application.
  • an embodiment of the present application provides a method for training a neural network model.
  • First, the training data may be verified. During verification, each picture can be checked for damage: damaged pictures are deleted and undamaged pictures are left unchanged. Verification can also check whether each picture is a three-channel picture and, if it is not, convert it to a three-channel JPG. The training data may also be balanced during verification.
  • A preset condition on the ratio between the data amounts of the categories may be set. If the categories contain roughly equal amounts of data, that is, the ratio satisfies the preset condition, no processing is performed. If the amounts differ greatly, that is, the ratio between the amounts of some two categories does not satisfy the preset condition, a warning message indicating that the training data is unbalanced can be output.
  • Next, the format of the training data may be converted. Format conversion can also be understood as organizing or packaging the training data; for example, the image data and their labels may be converted to the TFRecord format.
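  • A minimal sketch of the balance check described above. It assumes the preset condition is "largest category size / smallest category size ≤ `max_ratio`" with a hypothetical default of 2.0; the text does not specify the exact condition or threshold. `check_balance` returns a warning string when the condition fails and `None` otherwise.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def check_balance(labels: Sequence[Hashable], max_ratio: float = 2.0) -> Optional[str]:
    """Returns a warning message if class sizes are too uneven, otherwise None.

    The preset condition used here (largest class / smallest class <= max_ratio)
    is a hypothetical example.
    """
    if not labels:
        raise ValueError("no training data")
    counts = Counter(labels)
    largest, smallest = max(counts.values()), min(counts.values())
    if largest / smallest > max_ratio:
        return (f"Warning: training data is unbalanced "
                f"(largest class {largest}, smallest class {smallest}).")
    return None


balanced = ["cat"] * 100 + ["dog"] * 90
skewed = ["cat"] * 100 + ["dog"] * 20
```

  • Here `check_balance(balanced)` returns `None`, while `check_balance(skewed)` returns a warning because the ratio 100/20 exceeds 2.0.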
  • Then, the neural network model is trained.
  • Indication information indicating the type of neural network model to be trained may be obtained, so that a neural network model of the specified type is trained. Alternatively, a neural network model of a default type may be trained.
  • The Bayesian optimization scheme can adjust hyperparameters, that is, tune parameters automatically. However, it is inefficient and takes a long time to optimize the hyperparameters.
  • For the Bayesian optimization scheme, see the description of FIG. 8.
  • the first preset value may be, for example, 200, and the neural network model may be trained according to the training data.
  • the data amount of the training data of a single category may be the data amount of the category with the smallest amount of data in the training data, or the data amount of each category in the training data may be averaged as the data amount of the training data of the single category.
  • the second preset value can be 200,000, for example.
  • the Bayesian optimization scheme can be used to adjust the network structure of the neural network model and optimize the hyperparameters, and train according to the training data.
  • The structure of all or some layers of the neural network model may be adjusted, and the parameters of all or some layers may be adjusted.
  • the neural network model can be trained according to the preset hyperparameters and training data corresponding to the neural network model.
  • the optimal neural network model is obtained.
  • the neural network model can be trained according to the small sample learning scheme.
  • the robustness of the neural network model obtained by training can be enhanced, that is, the generalization ability can be improved, and thus the accuracy can be improved.
  • the neural network model may be trained according to the preset hyperparameters corresponding to the neural network model first.
  • If the accuracy of the neural network model obtained by this training meets the requirement, small-sample learning is not performed.
  • The required accuracy may be, for example, 95%.
  • Small-sample learning schemes include a cluster-based small-sample learning scheme, a fine-tuning-based small-sample learning scheme, and so on. For the cluster-based scheme, see the description of FIG. 6; for the fine-tuning-based scheme, see the description of FIG. 7.
  • The accuracy of the neural network model may also be understood as its precision, and it may be determined on the training data or on other labeled data.
  • a variety of small sample learning schemes can be used to train the neural network model.
  • the neural network model with the highest accuracy can be used as the optimal neural network model.
  • the neural network model is trained according to the preset hyperparameters, and the accuracy of the neural network model obtained by training is low.
  • With the Bayesian optimization scheme, a neural network model with higher accuracy can be obtained, but when the amount of data is large, the scheme is inefficient and time-consuming.
  • the training result includes the optimal neural network model obtained by training.
  • The training result may also include the processing results of the optimal neural network model on part of the training data, with the part of each such training sample that has the greatest influence on the processing result highlighted. For example, the pixels of a training image that have the greatest influence on the processing result can be highlighted.
  • Based on this, the reasons for the accuracy achieved by the trained neural network model can be judged manually.
  • the reason may include, for example, poor training data, and/or hyperparameters for training need to be further optimized.
  • For small sample sizes, the embodiments of this application use a small-sample learning scheme to train the neural network model.
  • When the sample size is between 200 and 2000 images per class, Bayesian optimization combined with whole-network fine-tuning is used to train the classification model.
  • When the sample size exceeds 2000 images per class, the samples are sufficient, so the neural network model is trained directly with preset hyperparameters determined from manual experience to obtain a high-precision classification model.
  • During neural network model training, early stopping can be applied: when the number of iterations reaches a preset number and the accuracy of the neural network model no longer improves, training of the neural network model can be stopped.
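  • A minimal early-stopping sketch. It assumes a common "patience" reading of the rule above: training stops once accuracy has not improved for `patience` consecutive epochs. The class name, `patience`, `min_delta`, and the accuracy sequence are illustrative assumptions rather than values from the text.

```python
class EarlyStopping:
    """Signals a stop once accuracy has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int, min_delta: float = 0.0) -> None:
        if patience < 1:
            raise ValueError("patience must be at least 1")
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.epochs_without_improvement = 0

    def step(self, accuracy: float) -> bool:
        """Records one epoch's accuracy; returns True when training should stop."""
        if accuracy > self.best + self.min_delta:
            self.best = accuracy
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


accuracies = [0.50, 0.60, 0.65, 0.64, 0.65, 0.63, 0.70]  # hypothetical per-epoch accuracy
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(accuracies):
    if stopper.step(acc):
        stopped_at = epoch
        break
```

  • Here training stops at epoch 5: the best accuracy, 0.65, was reached at epoch 2 and was not exceeded in the next three epochs.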
  • The neural network model training method provided in the embodiments of this application is fully automated, does not rely on expert tuning, and is simple and easy to use. In particular, when there are fewer than 30 images per class, cluster-based small-sample learning is used to ensure the accuracy of the model.
  • Training the neural network model with the preset hyperparameters corresponding to it can be understood as a general training strategy preset in the system.
  • A neural network model trained by the method provided in the embodiments of this application can achieve accuracy equal to or better than that obtained by manual parameter tuning.
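  • The data-size-based choice of training strategy described above can be sketched as a small selector. The thresholds 200 and 2000 images per class are the example values given in the text; measuring the per-class amount as either the smallest class size or the mean class size follows the two options the text describes. The function name and returned strategy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def choose_training_strategy(labels: Sequence[Hashable], low: int = 200, high: int = 2000,
                             use_mean: bool = False) -> str:
    """Picks a training strategy from the per-class sample count.

    The per-class count is either the size of the smallest class or the mean
    class size, the two options described in the text.
    """
    counts = Counter(labels)
    if use_mean:
        per_class = sum(counts.values()) / len(counts)
    else:
        per_class = min(counts.values())
    if per_class < low:
        return "small_sample_learning"  # e.g. cluster-based or fine-tuning-based
    if per_class <= high:
        return "bayesian_optimization_with_fine_tuning"
    return "preset_hyperparameters"


mixed = ["a"] * 10 + ["b"] * 5000
```

  • The two measures can disagree: for `mixed`, the smallest class has 10 samples (small-sample learning), while the mean is 2505 (preset hyperparameters).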
  • Fig. 6 is a schematic flowchart of a cluster-based small sample learning solution provided by an embodiment of the present application.
  • a small sample learning scheme can be used to train the neural network model.
  • The features of the training data can be extracted by the neural network model.
  • The relation network can use a clustering algorithm to cluster the training data and determine its categories.
  • The training data can be clustered according to its features to determine the clustering result.
  • A cross-entropy loss is used to adjust the neural network model.
  • Minimizing the cross-entropy loss increases the feature distance between features of different classes extracted by the neural network model.
  • Training data includes supporting data and query data.
  • the supporting data may include all or part of the training data.
  • the query data may include all or part of the training data.
  • the union of the support data and the query data may include all the data in the training data.
  • Supporting data and query data may or may not have intersections.
  • the supporting data includes all or part of the data of each category in the training data.
  • the query data includes all or part of the data of each category in the training data.
  • the class center feature of each class in the first training data can be calculated according to the features of the supporting data.
  • the class center feature of each class is the average of the corresponding bits of all supporting data features of the class.
  • the parameters of some layers in the neural network model are adjusted to obtain the adjusted neural network model.
  • the cross entropy loss can be calculated according to the feature distance between the class center feature of each category and the query data feature.
  • the parameters of some layers in the neural network model are adjusted to minimize the cross-entropy loss, so as to obtain the adjusted neural network model.
  • Part of the data can be randomly selected from the training data to form a support set, and other data to form a query set.
  • For each category, the average of the features that the neural network extracts from that category's support data in the support set may be called the support feature.
  • the feature of the query data in the query set extracted by the neural network can be called the query feature.
  • The support set includes support data of each category.
  • The query set includes query data of each category.
  • The cross-entropy loss is calculated according to the feature distance between each query feature and the support feature of the category to which the corresponding query data belongs. The parameters of the neural network model are adjusted according to the cross-entropy loss to train the neural network model.
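  • A minimal sketch of one training episode as described above: each category is randomly split into support and query data, each category's support feature is the mean of its support features, and a cross-entropy loss is computed with negative squared distances to the support features as logits (the formulation used by prototype-style few-shot methods). The feature extractor is omitted, so the inputs are assumed to be already-extracted features; the distance measure, function names, and example data are illustrative assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = List[float]


def split_support_query(features: Sequence[Vector], labels: Sequence[Hashable],
                        n_support: int, rng: random.Random):
    """Randomly splits each class into `n_support` support samples and the rest as queries."""
    by_class: Dict[Hashable, List[Vector]] = defaultdict(list)
    for f, y in zip(features, labels):
        by_class[y].append(f)
    support: Dict[Hashable, List[Vector]] = {}
    query: List[Tuple[Vector, Hashable]] = []
    for y, fs in by_class.items():
        fs = list(fs)
        rng.shuffle(fs)
        support[y] = fs[:n_support]
        query.extend((f, y) for f in fs[n_support:])
    return support, query


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def prototype_cross_entropy(support, query) -> float:
    """Mean cross-entropy of queries, using negative distances to support features as logits."""
    if not query:
        raise ValueError("need at least one query sample")
    classes = list(support)
    centers = {y: [sum(col) / len(col) for col in zip(*support[y])] for y in classes}
    total = 0.0
    for f, y in query:
        logits = [-squared_distance(f, centers[c]) for c in classes]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[classes.index(y)]  # -log softmax probability of true class
    return total / len(query)


feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]   # labels that match the two feature clusters
mixed_labels = [0, 1, 0, 1, 0, 1, 0, 1]  # labels that cut across the clusters
rng = random.Random(0)
loss_good = prototype_cross_entropy(*split_support_query(feats, good_labels, 2, rng))
loss_mixed = prototype_cross_entropy(*split_support_query(feats, mixed_labels, 2, rng))
```

  • Features that cluster by category give a near-zero loss; in training, the feature extractor's parameters would be updated to reduce this loss.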
  • A transfer learning scheme can be used in small-sample learning.
  • For example, a transfer learning scheme that fine-tunes the parameters of a pre-trained neural network model, that is, adjusts the parameters of some of its layers, may be used.
  • This kind of transfer learning scheme can also be called fine-tuning of the neural network model. For details of the fine-tuning scheme, refer to the description of FIG. 7.
  • the neural network model obtained by pre-training may be a neural network model trained on a public data set.
  • the parameters of some layers in the neural network model obtained by pre-training are adjusted to improve the generalization ability of the neural network model obtained by the final training.
  • To improve the training accuracy of the neural network model, a center loss can be introduced when adjusting the neural network model.
  • the center loss may be calculated according to the average value of the distance between the features of the first training data of each category.
  • Training through cross-entropy loss can increase the distance between classes; training through center loss can reduce the distance within classes. According to the cross entropy loss and center loss, training the neural network model can improve the efficiency of neural network model training and improve the accuracy of the neural network model.
  • the neural network model extracts the features of the training data.
  • the bit width of the features may be relatively large. Saving the features of the training data and the calculation based on the features of the training data occupy more resources.
  • the features extracted by the neural network model can be compressed.
  • the deep hash method can be used to compress the features extracted by the neural network model.
  • The feature distance can be expressed as a Hamming distance.
  • For each category, the average of the features that the trained neural network model extracts from that category's training data can be used as the center feature of the category.
  • The trained neural network model can extract the features of the data to be classified, and the data is then classified according to the feature distance between its features and the center feature of each category. For example, the category whose center feature has the smallest feature distance to the features of the data to be classified can be determined as its category.
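  • A minimal sketch of the nearest-center classification described above, using hash codes stored as lists of 0/1 bits. Each category's center is the bitwise average of its codes, as in the text; to compare a center with a code by Hamming distance, the sketch rounds each averaged bit to 0 or 1 (majority vote), which is an assumption not stated in the text. All names and example codes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Bits = List[int]


def center_codes(codes: Sequence[Bits], labels: Sequence[Hashable]) -> Dict[Hashable, Bits]:
    """Per-class center code: bitwise average of the class's hash codes, rounded to 0/1."""
    grouped = defaultdict(list)
    for code, y in zip(codes, labels):
        grouped[y].append(code)
    centers = {}
    for y, cs in grouped.items():
        averages = [sum(bits) / len(bits) for bits in zip(*cs)]
        centers[y] = [1 if a >= 0.5 else 0 for a in averages]  # rounding is an assumption
    return centers


def hamming(a: Bits, b: Bits) -> int:
    return sum(x != y for x, y in zip(a, b))


def classify(code: Bits, centers: Dict[Hashable, Bits]) -> Hashable:
    """Category whose center code has the smallest Hamming distance to `code`."""
    return min(centers, key=lambda y: hamming(code, centers[y]))


train_codes = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0],
               [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
centers = center_codes(train_codes, train_labels)  # {"cat": [1, 1, 0, 0], "dog": [0, 0, 1, 1]}
```

  • For example, the code `[1, 1, 1, 0]` is at distance 1 from the "cat" center and 3 from the "dog" center, so it is classified as "cat".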
  • The cluster-based small-sample learning scheme uses transfer learning for feature extraction, which ensures the accuracy of a neural network model trained with few training samples and reduces the dependence of neural network model training on the amount of training data.
  • Compressing the extracted features by deep hashing keeps the features small, improves the efficiency of feature computation, and reduces resource usage.
  • FIG. 7 is a schematic flowchart of a small sample learning solution based on fine-tuning provided by an embodiment of the present application.
  • the prediction ability of the neural network model for samples outside the training set can be called the generalization ability of the neural network model.
  • An important topic in machine learning is to improve the generalization ability of neural network models.
  • a model with strong generalization ability is a good model.
  • When training data is scarce, the neural network model is prone to underfitting during training: there is not enough training data to learn from, so the model cannot learn the general rules in the training data, resulting in weak generalization ability.
  • In this case, transfer learning can be performed: a neural network model that has already been trained on a large amount of data is trained again on the small amount of training data, adjusting the parameters of some of its layers. Adjusting the parameters of some layers in the neural network model is also called fine-tuning the neural network model.
  • the parameters of the shallow network of the neural network model can be kept unchanged, that is, the weight of the shallow network is unchanged, and the parameters of the last few layers of the neural network model can be adjusted.
  • the robustness of the model can be ensured while ensuring the accuracy of the neural network model with a small sample size.
  • the neural network model trained on the large data set can also be fine-tuned in combination with the Bayesian optimization scheme.
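  • A minimal, framework-free sketch of the fine-tuning idea described above: the shallow layers of a pre-trained model are frozen and only the last few layers receive gradient updates. The `Layer` class, the plain gradient-descent step, and the toy gradients are hypothetical stand-ins; in real frameworks the same effect is usually obtained by marking layers or parameters as non-trainable (for example, setting `trainable = False` on a Keras layer or `requires_grad = False` on PyTorch parameters).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    weights: List[float]
    trainable: bool = True


def freeze_shallow_layers(layers: List[Layer], num_trainable: int) -> None:
    """Keeps only the last `num_trainable` layers trainable; earlier layers are frozen."""
    cutoff = len(layers) - num_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff


def sgd_step(layers: List[Layer], grads: List[List[float]], lr: float) -> None:
    """Plain gradient-descent update that skips frozen layers."""
    for layer, grad in zip(layers, grads):
        if layer.trainable:
            layer.weights = [w - lr * g for w, g in zip(layer.weights, grad)]


model = [Layer(f"layer{i}", [1.0, 1.0]) for i in range(4)]  # stands in for a pre-trained model
freeze_shallow_layers(model, num_trainable=1)
sgd_step(model, grads=[[1.0, 1.0]] * 4, lr=0.1)
```

  • After one step, only the last layer's weights change (to 0.9); the three shallow layers keep their pre-trained values.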
  • Fig. 8 is a schematic flow chart of the Bayesian optimization scheme.
  • the Bayesian optimization scheme can adopt Gaussian process regression, random forest regression and other methods.
  • These methods differ in the surrogate function used for the objective function, that is, in the function used to fit the curve.
  • Gaussian process regression is taken as an example below.
  • In step S801, the hyperparameters are initialized.
  • In step S802, the neural network model is trained.
  • Initialization yields multiple sets of hyperparameters. For each set, the network structure of the neural network model can be adjusted according to that set and the neural network model trained with it, yielding one trained neural network model per set of hyperparameters.
  • In step S803, curve fitting is performed.
  • The relationship between each hyperparameter and the accuracy of the resulting model is fitted with a Gaussian distribution curve.
  • In step S804, the hyperparameters corresponding to the maximum expected accuracy are determined.
  • That is, the fitted curve is used to find the hyperparameters corresponding to the neural network model with the highest expected accuracy.
  • steps S802-S804 are performed, and the neural network model is trained according to the hyperparameters corresponding to the neural network model with the highest accuracy, and the curve fitting is performed again, and the fitting curve is updated. According to the updated curve, the hyperparameters corresponding to the neural network model with the highest expected accuracy are obtained.
  • step S805 the optimal neural network model.
  • the final neural network model can be used as the optimal neural network obtained by training model.
  • one of hyperparameters such as learning rate (LR), learning rate decay rate, learning rate decay period, iteration period, batch size, dropout, etc. can be adjusted Or multiple parameters for optimization.
  • The neural network training method of the embodiments of this application has been described in detail above with reference to the accompanying drawings.
  • The neural network training apparatus of the embodiments of this application is described in detail below with reference to the accompanying drawings. It should be understood that the apparatus described below can perform each step of the neural network training method of the embodiments of this application. To avoid unnecessary repetition, repeated descriptions are omitted when the apparatus is introduced below.
  • Fig. 9 is a schematic structural diagram of a neural network training device provided by an embodiment of the present application.
  • The apparatus 3000 includes an acquisition module 3001 and a processing module 3002.
  • The acquisition module 3001 and the processing module 3002 can be used to perform the neural network training method of the embodiments of this application.
  • The acquisition module 3001 can perform step S401, and the processing module 3002 can perform steps S402-S403.
  • The acquisition module 3001 is configured to obtain a neural network model, first training data, and the categories of the first training data.
  • The neural network model is obtained by training on second training data. The first training data includes support data and query data; the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data.
  • The processing module 3002 is configured to perform feature extraction on the first training data by using the neural network model to obtain features of the first training data.
  • The processing module 3002 is configured to adjust the parameters of some layers in the neural network model according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model.
  • Each element of the class center feature of a category is the average of the corresponding elements of the features of that category's support data.
  • The processing module 3002 is configured to adjust the parameters of those layers according to the feature distance between the class center feature of each category and the query data features, and the average of the feature distances between features of the first training data within each category.
  • The processing module 3002 is configured to input the first training data into the neural network model, and to perform deep hashing on the features extracted by the model to obtain the features of the first training data.
  • The processing module 3002 is configured to: when the data amount of the first training data is less than a preset value, adjust hyperparameters through the Bayesian optimization scheme and adjust the parameters of those layers according to the feature distance between the class center feature of each category and the query data features; and when the data amount of the first training data is greater than or equal to the preset value, adjust the parameters of those layers according to the preset hyperparameters corresponding to the neural network model and the feature distance between the class center feature of each category and the query data features.
  • The hyperparameters include one or more of the learning rate, learning rate decay rate, learning rate decay period, number of epochs, batch size, dropout, and network structure parameters of the neural network model.
  • The acquisition module 3001 is configured to obtain the first training data and the categories of the first training data.
  • The processing module 3002 is configured to: when the data amount of the first training data is less than a preset value, adjust hyperparameters through a Bayesian optimization scheme and train the neural network model according to the first training data and its categories; and when the data amount of the first training data is greater than or equal to the preset value, train the neural network model according to the preset hyperparameters corresponding to the model, the first training data, and the categories of the first training data.
  • The neural network model is obtained by training on second training data.
  • The processing module 3002 is configured to perform feature extraction on the first training data by using the neural network model to obtain features of the first training data.
  • The first training data includes support data and query data; the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data.
  • The processing module 3002 is configured to adjust the parameters of some layers in the neural network model according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model.
  • Each element of the class center feature of a category is the average of the corresponding elements of the features of that category's support data.
  • The processing module 3002 is configured to adjust the parameters of those layers according to the feature distance between the class center feature of each category and the query data features, and the average of the feature distances between features of the first training data within each category.
  • The processing module 3002 is configured to input the first training data into the neural network model, and to perform deep hashing on the features extracted by the model to obtain the features of the first training data.
  • The hyperparameters include one or more of the learning rate, learning rate decay rate, learning rate decay period, number of epochs, batch size, dropout, and network structure parameters of the neural network model.
  • FIG. 10 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
  • The electronic device 1000 shown in FIG. 10 includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004.
  • The memory 1001, the processor 1002, and the communication interface 1003 are communicatively connected to each other through the bus 1004.
  • The memory 1001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • The memory 1001 can store a program. When the program stored in the memory 1001 is executed by the processor 1002, the processor 1002 and the communication interface 1003 are configured to perform the steps of the neural network model training method of the embodiments of this application.
  • The processor 1002 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions required by the units in the neural network model training apparatus of the embodiments of this application, or to perform the neural network model training method of the method embodiments of this application.
  • The processor 1002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the neural network model training method of this application can be completed by integrated logic circuits of hardware in the processor 1002 or by instructions in the form of software.
  • The processor 1002 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Such a processor can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • The general-purpose processor may be a microprocessor, or any conventional processor or the like.
  • The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor.
  • The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 1001. The processor 1002 reads the information in the memory 1001 and, in combination with its hardware, completes the functions required by the units included in the neural network model training apparatus of the embodiments of this application, or performs the neural network model training method of the method embodiments of this application.
  • The communication interface 1003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the device 1000 and other devices or a communication network. For example, one or more of the neural network model, the first training data, and the like can be obtained through the communication interface 1003.
  • The bus 1004 may include a path for transferring information between the components of the device 1000 (for example, the memory 1001, the processor 1002, and the communication interface 1003).
  • An embodiment of this application further provides a computer program storage medium storing program instructions; when the program instructions are executed directly or indirectly, the foregoing method is implemented.
  • An embodiment of this application further provides a chip system including at least one processor; when program instructions are executed in the at least one processor, the foregoing method is implemented.
  • The disclosed system, apparatus, and method may be implemented in other ways.
  • The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • The displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
  • If the functions are implemented as a software functional unit and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • The technical solution of this application essentially, or the part contributing to the prior art, or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application.
  • The foregoing storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

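A neural network model training method in the field of artificial intelligence is provided, including: obtaining a neural network model, first training data, and categories of the first training data, where the neural network model is obtained by training on second training data (S401), the first training data includes support data and query data, the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data; performing feature extraction on the first training data by using the neural network model to obtain features of the first training data (S402); and adjusting parameters of some layers in the neural network model according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model (S403). By adjusting the parameters of some layers of the trained neural network model, a neural network model with good accuracy and generalization ability is obtained.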

Description

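Neural network model training method and apparatus
This application claims priority to Chinese Patent Application No. 201910883124.1, filed with the Chinese Patent Office on September 18, 2019 and entitled "Neural network model training method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular, to a neural network model training method and apparatus.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can respond in ways similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
Computer vision is an integral part of intelligent/autonomous systems in many application fields, such as manufacturing, inspection, document analysis, medical diagnosis, and military applications. It is the study of how to use cameras/video cameras and computers to obtain the data and information we need about a photographed subject. Figuratively speaking, it gives a computer eyes (cameras) and a brain (algorithms) to identify, track, and measure targets in place of human eyes, so that the computer can perceive its environment. Because perception can be regarded as extracting information from sensory signals, computer vision can also be regarded as the science of how to make artificial systems "perceive" from images or multi-dimensional data. In general, computer vision replaces the visual organs with various imaging systems to obtain input information, and then replaces the brain with a computer to process and interpret that input information. The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision as humans do, and to adapt to the environment autonomously.
In a clustering-based few-shot learning scheme, a neural network model extracts features of the training data, the distances between features of training data of different categories are calculated, and the neural network model is trained accordingly. Because the training data for few-shot learning is limited, the trained neural network model generalizes poorly.
Summary
This application provides a neural network model training method that can train a neural network model with high accuracy and good generalization ability even when the amount of training data is small or the data is imbalanced.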
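According to a first aspect, a neural network model training method is provided, including: obtaining a neural network model, first training data, and categories of the first training data, where the neural network model is obtained by training on second training data, the first training data includes support data and query data, the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data; performing feature extraction on the first training data by using the neural network model to obtain features of the first training data; and adjusting parameters of some layers in the neural network model according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model, where each element of the class center feature of a category is the average of the corresponding elements of the features of that category's support data.
The trained neural network model is used to extract features of the training data, and the parameters of some layers of the model are adjusted according to the feature distances between training data features. This yields a neural network model with high accuracy and strong generalization ability.
With reference to the first aspect, in some possible implementations, adjusting the parameters of some layers in the neural network model according to the feature distance between the class center feature of each category and the query data features includes: adjusting the parameters of those layers according to the feature distance between the class center feature of each category and the query data features, and the average of the feature distances between features of the first training data within each category.
The center loss represents the feature distance between the class center feature of each category and the query data features. Introducing the center loss into the training of the neural network model can improve training efficiency and model accuracy.
With reference to the first aspect, in some possible implementations, performing feature extraction on the first training data by using the neural network model to obtain features of the first training data includes: inputting the first training data into the neural network model; and performing deep hashing on the features extracted by the neural network model to obtain the features of the first training data.
Performing deep hashing on the features extracted by the neural network model reduces feature size and training time while keeping training accuracy high. When the trained model is used to determine the category of data, inference speed can also be improved.
With reference to the first aspect, in some possible implementations, adjusting the parameters of some layers in the neural network model according to the feature distance between the class center feature of each category and the query data features includes: when the data amount of the first training data is less than a preset value, adjusting hyperparameters through a Bayesian optimization scheme and adjusting the parameters of those layers according to the feature distance between the class center feature of each category and the query data features; and when the data amount of the first training data is greater than or equal to the preset value, adjusting the parameters of those layers according to preset hyperparameters corresponding to the neural network model and the feature distance between the class center feature of each category and the query data features.
When the data amount is large, training the neural network model through the Bayesian optimization scheme is inefficient. When the data amount is small, a model trained with the preset hyperparameters corresponding to the neural network model has low accuracy. Using the Bayesian optimization scheme only when the amount of first training data is small therefore improves both the accuracy of the trained model and training efficiency.
With reference to the first aspect, in some possible implementations, the hyperparameters include one or more of: learning rate, learning rate decay rate, learning rate decay period, number of epochs, batch size, and network structure parameters of the neural network model.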
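According to a second aspect, a neural network model training method is provided, including: obtaining first training data and categories of the first training data; when the data amount of the first training data is less than a preset value, adjusting hyperparameters through a Bayesian optimization scheme and training a neural network model according to the first training data and its categories; and when the data amount of the first training data is greater than or equal to the preset value, training the neural network model according to preset hyperparameters corresponding to the neural network model, the first training data, and the categories of the first training data.
It should be understood that the type of the neural network model may be a default type or a specified type. The neural network model may be stored in the memory of the electronic device that performs the training method, or may be received from another electronic device.
With reference to the second aspect, in some possible implementations, the method further includes: obtaining the neural network model, where the neural network model is obtained by training on second training data; and training the neural network model according to the first training data and its categories includes: performing feature extraction on the first training data by using the neural network model to obtain features of the first training data, where the first training data includes support data and query data, the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data; and adjusting parameters of some layers in the neural network model according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model, where each element of the class center feature of a category is the average of the corresponding elements of the features of that category's support data.
With reference to the second aspect, in some possible implementations, the method further includes: adjusting the parameters of those layers according to the feature distance between the class center feature of each category and the query data features, and the average of the feature distances between features of the first training data within each category.
With reference to the second aspect, in some possible implementations, the method further includes: inputting the first training data into the neural network model; and performing deep hashing on the features extracted by the neural network model to obtain the features of the first training data.
With reference to the second aspect, in some possible implementations, the hyperparameters include one or more of: learning rate, learning rate decay rate, learning rate decay period, number of epochs, batch size, and network structure parameters of the neural network model.
According to a third aspect, a neural network model training apparatus is provided, including modules configured to perform the method in the first aspect.
According to a fourth aspect, a neural network model training apparatus is provided, including modules configured to perform the method in the second aspect.
According to a fifth aspect, a neural network model training apparatus is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program is executed, the processor is configured to perform the method in the first aspect.
According to a sixth aspect, a neural network training apparatus is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program is executed, the processor is configured to perform the method in the second aspect.
According to a seventh aspect, a computer storage medium is provided. The computer-readable storage medium stores program code, and the program code includes instructions for performing the steps of the method in the first aspect or the second aspect.
According to an eighth aspect, a chip system is provided. The chip system includes at least one processor, and when program instructions are executed in the at least one processor, the chip system is caused to perform the method in the first aspect or the second aspect.
Optionally, in an implementation, the chip system may further include a memory that stores instructions, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to perform the method in the first aspect.
The chip system may specifically be a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC).
It should be understood that, in this application, the method in the first aspect may specifically refer to the method in the first aspect or in any one of the various implementations of the first aspect.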
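Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a system architecture according to an embodiment of this application.
FIG. 2 is a schematic diagram of a convolutional neural network model used in an embodiment of this application.
FIG. 3 is a schematic diagram of a chip hardware structure according to an embodiment of this application.
FIG. 4 is a schematic flowchart of a neural network model training method according to an embodiment of this application.
FIG. 5 is a schematic flowchart of a neural network model training method according to another embodiment of this application.
FIG. 6 is a schematic flowchart of a clustering-based few-shot learning method according to an embodiment of this application.
FIG. 7 is a schematic diagram of a fine-tuning method according to an embodiment of this application.
FIG. 8 is a schematic flowchart of a Bayesian optimization scheme.
FIG. 9 is a schematic structural diagram of a neural network model training apparatus according to another embodiment of this application.
FIG. 10 is a schematic diagram of a hardware structure of an electronic apparatus according to an embodiment of this application.
Detailed Description
The following describes the technical solutions in this application with reference to the accompanying drawings.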
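Because the embodiments of this application involve extensive use of neural network models, for ease of understanding, related terms and concepts of neural network models involved in the embodiments are first introduced below.
(1) Neural network model
A neural network model may consist of neural units. A neural unit may be an operation unit that takes $x_s$ and an intercept $b$ as inputs, and the output of the operation unit may be:
$$h_{W,b}(x)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$$
where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network model to convert the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network model is a network formed by connecting many such single neural units, that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of that local receptive field, where a local receptive field may be a region consisting of several neural units.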
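As a quick numerical check of the formula above, here is a minimal Python sketch of one neural unit, assuming the sigmoid activation mentioned in the text; `sigmoid` and `neuron_output` are illustrative names, not part of the application.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x: list[float], w: list[float], b: float) -> float:
    """h_{W,b}(x) = f(sum_s W_s * x_s + b) with a sigmoid activation f."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return sigmoid(z)


print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # weighted sum is 0, so 0.5
```

With $x=(1,2)$, $W=(0.5,-0.25)$ and $b=0$ the weighted sum is exactly 0, so the sigmoid output is 0.5.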
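(2) Deep neural network model
A deep neural network (DNN) model, also called a multi-layer neural network model, can be understood as a neural network model with many hidden layers; there is no particular threshold for "many". By the position of the layers, the layers inside a DNN fall into three types: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. In a fully connected neural network model, for example, adjacent layers are fully connected, that is, any neuron in layer i is connected to every neuron in layer i+1. Although a DNN looks complex, the work of each layer is simple; it is the following expression:
$$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$$
where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer simply applies this operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in a DNN as follows, taking the coefficient $W$ as an example. Suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W_{24}^{3}$. The superscript 3 denotes the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer. In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W_{jk}^{L}$.
It should be noted that the input layer has no $W$ parameters. In a deep neural network model, more hidden layers enable the network to better describe complex real-world situations. Theoretically, a model with more parameters has higher complexity and greater "capacity", which means it can complete more complex learning tasks. Training a deep neural network model is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network model (the weight matrices formed by the vectors $W$ of many layers).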
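The per-layer expression $\vec{y}=\alpha(W\vec{x}+\vec{b})$ and the $W_{jk}^{L}$ indexing can be sketched in plain Python as follows. This is an illustration only: `tanh` as the default activation and the function names are assumptions, and a real model would use a tensor library.

```python
import math


def dense_layer(x, W, b, activation=math.tanh):
    """Compute y = alpha(W x + b) for one fully connected layer.

    W[j][k] is the coefficient from neuron k of the previous layer (L-1)
    to neuron j of this layer (L), matching the W^L_{jk} convention.
    """
    if len(W) != len(b) or any(len(row) != len(x) for row in W):
        raise ValueError("shape mismatch between x, W and b")
    return [activation(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
            for j in range(len(W))]


def mlp_forward(x, layers, activation=math.tanh):
    """Apply a stack of (W, b) pairs; the input layer itself has no W."""
    for W, b in layers:
        x = dense_layer(x, W, b, activation)
    return x
```

Passing `activation=lambda z: z` makes each layer purely affine, which is handy for checking the indexing by hand.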
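(3) Convolutional neural network model
A convolutional neural network (CNN) model is a deep neural network model with a convolutional structure. It contains a feature extractor consisting of convolutional layers and subsampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or a convolutional feature map. A convolutional layer is a layer of neurons in the CNN model that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature maps, and each feature map may consist of neural units arranged in a rectangle. Neural units in the same feature map share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as extracting image information in a way that does not depend on position. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part, and the same learned image information can be used at all positions of the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
A convolution kernel can be initialized as a matrix of random size, and during training of the CNN model the kernel can learn reasonable weights. In addition, a direct benefit of weight sharing is that it reduces the number of connections between layers of the CNN model while also reducing the risk of overfitting.
(4) Loss function
In training a deep neural network model, we want its output to be as close as possible to the value that we really want to predict. Therefore, the predicted value of the current network can be compared with the desired target value, and the weight vectors of each layer of the model are updated according to the difference between them (there is usually an initialization step before the first update, in which parameters are preconfigured for each layer of the model). For example, if the network's prediction is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the model can predict the desired target value or a value very close to it. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, an important equation for measuring the difference between predicted and target values. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training a deep neural network model becomes the process of reducing this loss as much as possible.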
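(5) Residual network
As the depth of a neural network model keeps increasing, a degradation problem appears: as depth increases, accuracy first rises and then saturates, and increasing the depth further causes accuracy to drop. The biggest difference between an ordinary plain convolutional neural network model and a residual network (ResNet) is that ResNet has many bypass branches that connect the input directly to later layers. By passing the input information directly to the output through these detours, it preserves the integrity of the information and solves the degradation problem. A residual network includes convolutional layers and/or pooling layers.
A residual network may be as follows: in a deep neural network model, in addition to the layer-by-layer connections between hidden layers (for example, hidden layer 1 connected to hidden layer 2, hidden layer 2 to hidden layer 3, and hidden layer 3 to hidden layer 4, which form one data operation path of the model, also vividly called neural network model transmission), the residual network has an additional direct-connection branch that connects hidden layer 1 directly to hidden layer 4, that is, it skips the processing of hidden layers 2 and 3 and transmits the data of hidden layer 1 directly to hidden layer 4 for computation. A highway network may be as follows: in addition to the operation path and direct-connection branch described above, a deep neural network model further includes a weight acquisition branch, which introduces a transform gate to obtain a weight value and outputs the weight value T for use in subsequent operations of the operation path and the direct-connection branch.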
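A minimal sketch of the direct-connection branch described above, assuming (as in ResNet) that the shortcut is merged with the main path by element-wise addition; the text itself only says the data of hidden layer 1 is passed to hidden layer 4. The helper names are hypothetical.

```python
def relu(v):
    """Element-wise ReLU, used here only as an example hidden-layer function."""
    return [max(0.0, x) for x in v]


def plain_path(h1, hidden_layers):
    """Ordinary layer-by-layer path, e.g. h1 -> layer 2 -> layer 3."""
    h = h1
    for layer in hidden_layers:
        h = layer(h)
    return h


def residual_path(h1, hidden_layers):
    """Same path plus a shortcut that carries h1 past the skipped layers.

    The result is what would be handed to hidden layer 4; the shapes of the
    main-path output and h1 must match so they can be added.
    """
    main = plain_path(h1, hidden_layers)
    if len(main) != len(h1):
        raise ValueError("shortcut requires matching shapes")
    return [m + s for m, s in zip(main, h1)]
```

If the skipped layers learn to output zeros, the block reduces to the identity, which is one intuition for why extra depth stops degrading accuracy.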
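(6) Back propagation algorithm
A convolutional neural network model can use the error back propagation (BP) algorithm during training to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, passing the input signal forward to the output produces an error loss, and the parameters in the initial model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, intended to obtain optimal parameters of the neural network model, such as the weight matrices.
(7) Pixel value
The pixel value of an image may be a red-green-blue (RGB) color value, and the pixel value may be a long integer representing a color. For example, the pixel value is 256×Red+100×Green+76×Blue, where Blue represents the blue component, Green the green component, and Red the red component. For each color component, a smaller value means lower brightness and a larger value means higher brightness. For a grayscale image, the pixel value may be a grayscale value.
(8) Few-shot learning
The goal of few-shot learning research is to design learning models that can learn quickly from only a small number of labeled samples and recognize the categories of new samples. Existing research approaches suitable for few-shot problems include transfer learning methods and semi-supervised learning methods, which can alleviate, to some extent, the overfitting and data scarcity problems that arise when training on small amounts of data.
The above briefly introduces some basics of neural network models. The following describes specific neural network models that may be used in image data processing.
The following describes the system architecture of the embodiments of this application in detail with reference to FIG. 1.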
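FIG. 1 is a schematic diagram of the system architecture of an embodiment of this application. As shown in FIG. 1, the system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data collection device 160.
In addition, the execution device 110 includes a computing module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114. The computing module 111 may include a target model/rule 101; the preprocessing modules 113 and 114 are optional.
The data collection device 160 is configured to collect training data. For the neural network model training method of the embodiments of this application, the training data may include the first training data and the categories of the first training data. After collecting the training data, the data collection device 160 stores it in the database 130, and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input first training data and calculates the feature distance between the output query data features and the class center feature of each category, until the feature distance between the query data features output by the training device 120 and the class center feature of each category satisfies a preset condition, thereby completing the training of the target model/rule 101.
The target model/rule 101 can be used to implement classification with the neural network model of the embodiments of this application: after the data to be processed undergoes the relevant preprocessing and is input into the target model/rule 101, the category of that data is obtained. The target model/rule 101 in the embodiments of this application may specifically be a neural network model. It should be noted that, in practice, the training data maintained in the database 130 does not necessarily all come from the data collection device 160 and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The foregoing description should not be construed as a limitation on the embodiments of this application.
The target model/rule 101 obtained by the training device 120 can be applied to different systems or devices, for example the execution device 110 shown in FIG. 1. The execution device 110 may be a terminal, such as a mobile phone, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal, or may be a server, the cloud, or the like. In FIG. 1, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user can input data into the I/O interface 112 through the client device 140; in the embodiments of this application, the input data may include data to be processed that is input by the client device. The client device 140 here may specifically be a terminal device.
The preprocessing modules 113 and 114 are configured to preprocess the input data (such as the data to be processed) received by the I/O interface 112. In the embodiments of this application, the preprocessing modules 113 and 114 may both be absent, or only one of them may be present. When neither is present, the computing module 111 can process the input data directly.
When the execution device 110 preprocesses the input data, or when the computing module 111 of the execution device 110 performs computation or other related processing, the execution device 110 can call data, code, and the like in the data storage system 150 for the corresponding processing, and can also store the data, instructions, and the like obtained from that processing in the data storage system 150.
Finally, the I/O interface 112 presents the processing result, such as the classification result computed by the target model/rule 101, to the client device 140, thereby providing it to the user.
Specifically, the classification result obtained by the target model/rule 101 in the computing module 111 can be processed by the preprocessing module 113 (optionally together with the preprocessing module 114) and then sent to the I/O interface, which sends it to the client device 140 for display.
It should be understood that when the preprocessing modules 113 and 114 are absent from the system architecture 100, the computing module 111 can transmit the classification result directly to the I/O interface, which then sends it to the client device 140 for display.
It is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals, or different tasks, and each target model/rule 101 can then be used to achieve that goal or complete that task, thereby providing the user with the desired result.
In the case shown in FIG. 1, the user can manually specify input data through an interface provided by the I/O interface 112. In another case, the client device 140 can automatically send input data to the I/O interface 112; if automatic sending requires the user's authorization, the user can set the corresponding permission in the client device 140. The user can view the result output by the execution device 110 on the client device 140, presented for example as a display, sound, or action. The client device 140 can also act as a data collection end, collecting the input data fed into the I/O interface 112 and the output results from the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130. Alternatively, the I/O interface 112 can store the input data and output results directly in the database 130 as new sample data, without collection by the client device 140.
It is worth noting that FIG. 1 is merely a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 1 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may be placed in the execution device 110.
As shown in FIG. 1, the target model/rule 101 obtained by the training device 120 may be the neural network model in the embodiments of this application. Specifically, the neural network model provided in the embodiments may be a CNN, a deep convolutional neural network (DCNN), or the like.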
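Because the CNN is a very common neural network model, its structure is described in detail below with reference to FIG. 2. As described in the introduction of basic concepts above, a convolutional neural network model is a deep neural network model with a convolutional structure and is a deep learning architecture; a deep learning architecture performs learning at multiple levels of abstraction by means of machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network model in which each neuron can respond to the data input into it.
As shown in FIG. 2, the convolutional neural network model (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a fully connected layer 230. These layers are described in detail below.
Convolutional layer/pooling layer 220:
Convolutional layer:
As shown in FIG. 2, the convolutional layer/pooling layer 220 may include, for example, layers 221-226. In one implementation, layer 221 is a convolutional layer, 222 a pooling layer, 223 a convolutional layer, 224 a pooling layer, 225 a convolutional layer, and 226 a pooling layer. In another implementation, 221 and 222 are convolutional layers, 223 is a pooling layer, 224 and 225 are convolutional layers, and 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The following uses the convolutional layer 221 as an example to describe the internal working principle of one convolutional layer processing an image.
The convolutional layer 221 may include many convolution operators, also called kernels. In image processing, a convolution operator acts as a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During convolution on an image, the weight matrix usually moves across the input image in the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to extract specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during convolution the weight matrix extends through the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolved output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same type, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the depth is determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image: for example, one weight matrix extracts edge information, another extracts a specific color, and yet another blurs unwanted noise. Because these weight matrices have the same size (rows × columns), the convolutional feature maps they extract also have the same size, and these same-sized feature maps are then combined to form the output of the convolution operation.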
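To make the stride and depth rules concrete, here is a plain-Python sketch of a convolutional layer with no padding. Like most deep-learning frameworks it computes cross-correlation (the kernel is not flipped). The nested-list layout `[channel][row][column]` and the function names are assumptions for illustration.

```python
def conv2d(image, kernel, stride=1):
    """Convolve one kernel over a multi-channel image (no padding).

    image:  [C][H][W] nested lists
    kernel: [C][kh][kw]; its depth C must equal the image depth, so the
            kernel spans the whole input depth and yields a single 2-D map.
    """
    if len(kernel) != len(image):
        raise ValueError("kernel depth must equal input depth")
    if stride < 1:
        raise ValueError("stride must be >= 1")
    depth = len(image)
    ih, iw = len(image[0]), len(image[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out = []
    for r in range(0, ih - kh + 1, stride):      # move down by `stride` rows
        row = []
        for c in range(0, iw - kw + 1, stride):  # move right by `stride` pixels
            row.append(sum(image[ch][r + i][c + j] * kernel[ch][i][j]
                           for ch in range(depth)
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


def conv_layer(image, kernels, stride=1):
    """Apply several same-sized kernels and stack their maps along depth."""
    return [conv2d(image, k, stride) for k in kernels]
```

Each output dimension is `(input - kernel) // stride + 1`, so on a 4×4 input a 2×2 kernel gives a 3×3 map at stride 1 and a 2×2 map at stride 2.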
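In practice, the weight values in these weight matrices must be obtained through extensive training. The weight matrices formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network model 200 makes correct predictions.
When the convolutional neural network model 200 has multiple convolutional layers, the initial convolutional layers (for example, 221) tend to extract more general features, which can also be called low-level features. As the depth of the model 200 increases, the features extracted by later convolutional layers (for example, 226) become increasingly complex, such as high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Because the number of training parameters often needs to be reduced, pooling layers are often introduced periodically after convolutional layers. For layers 221-226 shown in 220 in FIG. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. A pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator computes the average of the pixel values in a specific range as the result of average pooling, and the max pooling operator takes the pixel with the largest value in a specific range as the result of max pooling. In addition, just as the size of the weight matrix in a convolutional layer should be related to the image size, the operators in a pooling layer should also be related to the image size. The image output by the pooling layer may be smaller than the image input to it, and each pixel in the output image represents the average or maximum of the corresponding sub-region of the input image.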
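A minimal sketch of the two pooling operators described above, on a single-channel feature map given as nested lists; window size 2 and stride 2 are illustrative defaults, not values from the application.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max or average pooling over size x size windows of a 2-D feature map."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(0, h - size + 1, stride):
        row = []
        for c in range(0, w - size + 1, stride):
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            # Each output pixel summarizes one sub-region of the input.
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(pool2d(fmap))              # [[6, 8], [14, 16]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 5.5], [11.5, 13.5]]
```

A 2×2 window with stride 2 halves each spatial dimension, which is the size reduction the text attributes to pooling.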
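Fully connected layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network model 200 is not yet able to output the required output information, because, as described above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the model 200 needs the fully connected layer 230 to generate one output or a set of outputs whose number equals the number of required classes. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 to 23n shown in FIG. 2) and an output layer 240. The parameters of these hidden layers can be pre-trained on training data related to a specific task type, such as image recognition, image classification, or image super-resolution reconstruction.
After the hidden layers in the fully connected layer 230, the last layer of the entire convolutional neural network model 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy, specifically used to compute the prediction error. Once forward propagation of the entire model 200 is complete (propagation from 210 to 240 in FIG. 2), back propagation (propagation from 240 to 210 in FIG. 2) starts to update the weights and biases of the layers mentioned above, to reduce the loss of the model 200, that is, the error between the result output by the model 200 through the output layer and the ideal result.
It should be noted that the convolutional neural network model 200 shown in FIG. 2 is merely an example of a convolutional neural network model; in specific applications, the convolutional neural network model may also take the form of other network models.
It should be understood that the convolutional neural network model (CNN) 200 shown in FIG. 2 can be used to perform the classification method of the embodiments of this application. As shown in FIG. 2, after the data to be processed passes through the input layer 210, the convolutional layer/pooling layer 220, and the fully connected layer 230, the category of the data to be processed is obtained.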
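FIG. 3 shows a chip hardware structure provided by an embodiment of this application; the chip includes a neural network model processor 50. The chip can be placed in the execution device 110 shown in FIG. 1 to perform the computation of the computing module 111. It can also be placed in the training device 120 shown in FIG. 1 to perform the training work of the training device 120 and output the target model/rule 101. The algorithms of all layers of the convolutional neural network model shown in FIG. 2 can be implemented in the chip shown in FIG. 3.
The neural-network processing unit (NPU) 50 is mounted as a coprocessor on a host central processing unit (host CPU), and the host CPU assigns tasks. The core of the NPU is the operation circuit 503; the controller 504 controls the operation circuit 503 to fetch data from memory (the weight memory or the input memory) and perform operations.
In some implementations, the operation circuit 503 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 503 is a two-dimensional systolic array. The operation circuit 503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit 503 fetches the data corresponding to matrix B from the weight memory 502 and caches it on each PE in the operation circuit 503. The operation circuit 503 fetches the data of matrix A from the input memory 501, performs a matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 508.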
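The role of the accumulator can be illustrated in software: partial products over slices of the shared dimension are added into an accumulator until the full product is formed. This is only a conceptual sketch; it does not model the systolic-array dataflow or the caching of B on each PE, and `accumulate_matmul` and `tile` are hypothetical names.

```python
def accumulate_matmul(A, B, tile=2):
    """Compute C = A x B by adding partial products into an accumulator.

    The shared dimension k is processed `tile` elements at a time; after each
    pass the accumulator holds a partial result, and after the last pass it
    holds the final result.
    """
    m, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions must match")
    n = len(B[0])
    acc = [[0] * n for _ in range(m)]  # stands in for the accumulator
    for k0 in range(0, k, tile):
        for i in range(m):
            for j in range(n):
                acc[i][j] += sum(A[i][t] * B[t][j]
                                 for t in range(k0, min(k0 + tile, k)))
    return acc
```

The result does not depend on `tile`; only how many partial sums are accumulated along the way changes.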
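The vector calculation unit 507 can further process the output of the operation circuit 503, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector calculation unit 507 can be used for network computation in non-convolutional/non-FC layers of a neural network model, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 507 can store the processed output vectors in the unified memory 506. For example, the vector calculation unit 507 can apply a nonlinear function to the output of the operation circuit 503, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 507 generates normalized values, merged values, or both. In some implementations, the processed output vectors can be used as activation inputs to the operation circuit 503, for example for use in subsequent layers of the neural network model.
The unified memory 506 is configured to store input data and output data.
The direct memory access controller (DMAC) 505 transfers input data from the external memory to the input memory 501 and/or the unified memory 506, stores weight data from the external memory into the weight memory 502, and stores data from the unified memory 506 into the external memory.
The bus interface unit (BIU) 510 is configured to implement interaction among the host CPU, the DMAC, and the instruction fetch buffer 509 through a bus.
The instruction fetch buffer 509, connected to the controller 504, is configured to store instructions used by the controller 504.
The controller 504 is configured to invoke the instructions cached in the instruction fetch buffer 509 to control the working process of the operation accelerator.
Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories, and the external memory is a memory outside the NPU, which may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
In addition, in this application, the operations of the layers of the convolutional neural network model shown in FIG. 2 may be performed by the operation circuit 503 or the vector calculation unit 507.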
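Deep learning technology has developed rapidly, but training neural network models is still difficult: experienced engineers are needed to tune model parameters and to choose a learning model. There are many learning models, such as few-shot learning and transfer learning. At present, achieving high accuracy with a neural network model still relies on expert experience to tune the training parameters, which is time-consuming and labor-intensive and hinders rapid iteration of related services.
In the traditional machine learning framework, the task is to learn a classification model from sufficient training data and then use the learned model to classify and make predictions on test data. However, machine learning algorithms face a key problem: in some newly emerging fields it is difficult to obtain large amounts of training data.
New fields keep emerging, and traditional machine learning requires a large amount of labeled training data for each field, which consumes considerable manpower and material resources. Without large amounts of labeled data, much learning-related research and many applications cannot be carried out. A common situation is that training data becomes outdated, which often requires relabeling a large amount of training data to meet training needs; labeling new data is very expensive and requires considerable manpower and material resources. From another perspective, if we have a large amount of training data under different distributions, discarding it completely would be very wasteful. How to use such data reasonably is the main problem that transfer learning solves. Transfer learning can transfer knowledge from existing data to help future learning; its goal is to use knowledge learned in one environment to help learning tasks in a new environment.
During training of a neural network model, hyperparameters that affect performance need to be set and adjusted. Parameters that define properties of the neural network model or define the training process are called hyperparameters. Hyperparameters include one or more of the learning rate (LR), learning rate decay rate, learning rate decay period, number of iteration periods (epochs), batch size, and network structure parameters of the neural network model.
When optimizing with a gradient descent algorithm, the gradient term in the weight update rule is multiplied by a coefficient, called the learning rate. The learning rate is an important hyperparameter in supervised learning and deep learning; it determines whether and when the objective function converges to a local minimum.
If the learning rate is too large, the weights oscillate back and forth around the global optimum during convergence. To prevent this, a learning rate decay rate can be set so that the learning rate keeps decreasing as the number of training rounds grows, shrinking the step size of gradient descent. Reducing the learning rate as the number of iterations increases speeds up learning. The learning rate set in the hyperparameters can also be understood as the initial learning rate.
The learning rate decay rate can be understood as the amount by which the learning rate decreases per iteration period. Each time a learning rate decay period elapses, the learning rate decreases. The learning rate decay period can be a positive integer multiple of the iteration period.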
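The learning-rate rules above can be sketched as follows. The text describes the decay rate as a per-period drop without fixing the formula, so this sketch assumes a common multiplicative step decay; the toy objective $f(w)=(w-3)^2$ and all function names are illustrative.

```python
def sgd_step(weights, grads, lr):
    """w <- w - lr * dL/dw: the learning rate scales the gradient term."""
    return [w - lr * g for w, g in zip(weights, grads)]


def step_decay_lr(initial_lr, decay_rate, decay_period, epoch):
    """Learning rate at `epoch` under step decay (assumed multiplicative).

    The initial learning rate is multiplied by decay_rate once for every
    completed decay_period epochs.
    """
    if decay_period < 1:
        raise ValueError("decay_period must be a positive number of epochs")
    return initial_lr * decay_rate ** (epoch // decay_period)


def minimize_quadratic(w0=0.0, epochs=60, lr0=0.4, decay_rate=0.5, decay_period=20):
    """Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = [w0]
    for epoch in range(epochs):
        lr = step_decay_lr(lr0, decay_rate, decay_period, epoch)
        w = sgd_step(w, [2.0 * (w[0] - 3.0)], lr)
    return w[0]
```

With an initial rate of 0.4 each step shrinks the error to 20% of its value, so `minimize_quadratic()` ends very close to 3; later, smaller rates only slow the remaining steps.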
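The number of iteration periods, also called the number of epochs, counts single training passes that each consist of forward and backward propagation over all batches; that is, one epoch is a single forward and backward pass over the entire input data. Simply put, the number of epochs is how many times the training data is cycled through during training. For example, if the training set has 1000 samples and the batch size is 10, training on the entire sample set once takes 100 iterations, which is 1 epoch.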
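The epoch/iteration arithmetic in the example can be written as a small helper; the `drop_last` option (discarding a final partial batch) is an assumption added for completeness.

```python
def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batch updates (iterations) needed for one pass over the data."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if drop_last:
        return num_samples // batch_size   # a final partial batch is skipped
    return -(-num_samples // batch_size)   # ceiling division without floats


def total_iterations(num_samples, batch_size, epochs, drop_last=False):
    """Total batch updates for a whole training run of `epochs` epochs."""
    return epochs * iterations_per_epoch(num_samples, batch_size, drop_last)


print(iterations_per_epoch(1000, 10))  # 100, as in the example above
```

With 1001 samples the last batch holds a single sample, so one epoch takes 101 iterations unless that partial batch is dropped.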
Hyperparameters can be adjusted by automatic tuning methods, such as grid search, random search, genetic algorithms, particle swarm optimization, and Bayesian optimization. Bayesian optimization is used as an example below.
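The Bayesian optimization loop of FIG. 8 (steps S801-S805) can be sketched with a Gaussian process surrogate over a single hyperparameter, here $\log_{10}$ of the learning rate. This is a minimal illustration, not the application's implementation: the RBF kernel, its length scale, the expected-improvement acquisition function, the candidate grid, and `toy_accuracy` (a stand-in for actually training the model and measuring accuracy) are all assumptions.

```python
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two scalar hyperparameter values."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def gp_posterior(xs, ys, x_star, noise=1e-6):
    """GP posterior mean and std at x_star (prior mean = mean of observed ys)."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    v = solve_lower(L, [rbf(a, x_star) for a in xs])   # L^{-1} k_*
    w = solve_lower(L, [y - y_mean for y in ys])        # L^{-1} (y - mean)
    mu = y_mean + sum(vi * wi for vi, wi in zip(v, w))  # k_*^T K^{-1} (y - mean)
    var = max(rbf(x_star, x_star) - sum(vi * vi for vi in v), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best accuracy observed so far."""
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayes_opt(objective, candidates, init, n_iter=8):
    xs = list(init)
    ys = [objective(x) for x in xs]                # S801/S802: initial settings
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        best = max(ys)
        scores = [expected_improvement(*gp_posterior(xs, ys, c), best)
                  for c in remaining]              # S803: fit the surrogate curve
        nxt = remaining[scores.index(max(scores))]  # S804: most promising setting
        xs.append(nxt)
        ys.append(objective(nxt))                  # back to S802
    i = ys.index(max(ys))
    return xs[i], ys[i]                            # S805: best setting found


def toy_accuracy(log10_lr):
    """Hypothetical stand-in for 'train the model, return validation accuracy'."""
    return 0.9 - 0.1 * (log10_lr + 2.5) ** 2


candidates = [-5.0 + 0.25 * i for i in range(17)]  # log10(LR) from -5 to -1
best_x, best_acc = bayes_opt(toy_accuracy, candidates, init=[-4.0, -1.5])
```

In practice each call to the objective is a full training run, which is why this scheme becomes slow when the dataset is large; the surrogate is refit after every run, mirroring the repetition of S802-S804.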
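When the amount of training data is small, a clustering scheme can yield a neural network model with a classification function. However, because the amount of training data is small, the resulting model generalizes poorly. To solve this problem, this application proposes a neural network model training method.
FIG. 4 is a schematic flowchart of a neural network model training method according to an embodiment of this application.
In step S401, a neural network model, first training data, and the categories of the first training data are obtained.
Obtaining may mean reading from a memory or receiving from another device. The neural network model may be obtained by training on second training data. The second training data may differ from the first training data; for example, it may be all or part of a public dataset.
The first training data includes support data and query data. The support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data.
The first training data may be, for example, text, speech, or images. Its categories may be, for example, the part of speech of each word in a sentence (noun, verb, and so on), the emotion of a person while speaking a segment of speech, or the category of a person or object in an image.
In step S402, feature extraction is performed on the first training data by using the neural network model to obtain features of the first training data.
The features of the first training data may be the features extracted by the neural network model, or may be obtained by processing those extracted features. The first training data can be input into the neural network model, and deep hashing can be performed on the extracted features to obtain the features of the first training data; that is, the features of the first training data may be the result of deep hashing the features extracted by the model. The feature distance can be expressed as a Hamming distance.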
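A minimal sketch of binarized features and Hamming distance. Deep hashing methods usually learn a hash layer end to end; this sketch substitutes plain sign thresholding of the extracted features, which is an assumption. Packing bits into an integer shows why Hamming distances on hash codes are cheap: an XOR followed by a bit count.

```python
def binarize(features):
    """Sign-threshold a real-valued feature vector into a 0/1 hash code."""
    return [1 if f >= 0 else 0 for f in features]


def hamming_distance(a, b):
    """Number of positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pack_bits(code):
    """Pack a 0/1 code into an int so distances reduce to XOR + popcount."""
    value = 0
    for bit in code:
        value = (value << 1) | bit
    return value


def hamming_packed(a: int, b: int) -> int:
    return bin(a ^ b).count("1")
```

For example, `binarize([0.3, -1.2, 0.0, -0.1])` gives `[1, 0, 1, 0]`, which differs from `[1, 1, 0, 0]` in two positions.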
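Performing deep hashing on the features extracted by the neural network model reduces feature size and training time while keeping training accuracy high. When the trained model is used to determine the category of data, inference speed can also be improved.
In step S403, the parameters of some layers in the neural network model are adjusted according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model.
Before step S403, the class center feature of each category can be computed. Each element of a category's class center feature is the average of the corresponding elements of the features of that category's support data.
The parameters of those layers can also be adjusted according to the average of the feature distances between features of the first training data within each category. The center loss can be used to represent the average of the feature distances between features of the first training data within each category.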
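The class center features and the two distance terms used in step S403 can be sketched as follows. The application does not specify the distance function or how the distances become a loss; this sketch assumes squared Euclidean distance and a softmax over negative distances to the class centers (the loss used by prototypical networks), plus the average within-class pairwise distance. All names are hypothetical.

```python
import math


def class_centers(support):
    """Map label -> list of support features to label -> element-wise mean."""
    centers = {}
    for label, feats in support.items():
        dim = len(feats[0])
        centers[label] = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return centers


def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center_distances(centers, query_feature, dist=sq_euclidean):
    """Feature distance from one query feature to every class center."""
    return {label: dist(c, query_feature) for label, c in centers.items()}


def prototype_loss(centers, queries, dist=sq_euclidean):
    """Mean cross-entropy of softmax(-distance) against each query's true label."""
    total = 0.0
    for label, q in queries:
        logits = {k: -d for k, d in center_distances(centers, q, dist).items()}
        m = max(logits.values())  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_z - logits[label]
    return total / len(queries)


def intra_class_mean_distance(features_by_class, dist=sq_euclidean):
    """Average pairwise distance within each class, averaged over classes."""
    per_class = []
    for feats in features_by_class.values():
        pairs = [(i, j) for i in range(len(feats)) for j in range(i + 1, len(feats))]
        if pairs:
            per_class.append(sum(dist(feats[i], feats[j]) for i, j in pairs) / len(pairs))
    return sum(per_class) / len(per_class) if per_class else 0.0
```

A weighted sum such as `prototype_loss(...) + lam * intra_class_mean_distance(...)`, with `lam` a hypothetical weight, would combine both terms; `hamming_distance`-style functions on hashed features can be passed as `dist` instead.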
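Introducing the center loss into the training of the neural network model can improve training efficiency and model accuracy.
When the data amount of the first training data satisfies a preset condition, the Bayesian optimization scheme adjusts the network structure of some layers of the neural network model and optimizes the hyperparameters, and the parameters of some layers of the model are adjusted according to the feature distance between the class center feature of each category and the query data features.
The layers whose parameters are adjusted may be preset; for example, the parameters of the last few layers of the neural network model can be adjusted.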
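Adjusting only the last few layers amounts to freezing the rest. The sketch below shows the idea with a hypothetical `Layer` record and a plain gradient step; in PyTorch the same effect is commonly obtained by setting `requires_grad = False` on the parameters of the frozen layers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def freeze_all_but_last(layers, n_trainable):
    """Mark only the last n_trainable layers as trainable; shallow layers freeze."""
    cutoff = len(layers) - n_trainable
    for i, layer in enumerate(layers):
        layer.trainable = i >= cutoff
    return layers


def apply_gradients(layers, grads, lr):
    """One gradient-descent step that leaves frozen layers untouched."""
    for layer in layers:
        if layer.trainable:
            layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]
```

Gradients may still be computed for frozen layers here; real frameworks usually skip them, which also saves computation during fine-tuning.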
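When the data amount of the first training data does not satisfy the preset condition, the parameters of those layers are adjusted according to the preset hyperparameters corresponding to the neural network model and the feature distance between the class center feature of each category and the query data features.
The preset hyperparameters may be determined from expert experience and may correspond one-to-one with neural network models. The neural network model training apparatus may store the correspondence between preset hyperparameters and neural network models.
When the data amount is large, training the neural network model through the Bayesian optimization scheme is inefficient. When the data amount is small, a model trained with the preset hyperparameters corresponding to the neural network model has low accuracy. Using the Bayesian optimization scheme only when the amount of first training data is small therefore improves both the accuracy of the trained model and training efficiency.
Through steps S401-S403, adjusting the parameters of some layers of the trained neural network model improves the generalization ability of the model.
FIG. 5 is a schematic flowchart of a neural network model training method according to an embodiment of this application.
To address the low efficiency of manually tuning the parameters for training a neural network model, an embodiment of this application provides a neural network model training method.
First, the training data is preprocessed. Image data is used as an example.
The training data can be validated. During validation, each image can be checked for corruption; corrupted images are deleted and uncorrupted images are left unchanged. Validation can also check whether each image is a three-channel image and, if not, convert it to a three-channel jpg. Validation can further include a balance check on the training data: a preset condition on the ratio between the data amounts of different categories can be configured. If all categories have roughly equal amounts of data, so that every ratio is within the preset condition, no action is taken. If the amounts differ greatly, that is, the ratio between some two categories does not satisfy the preset condition, warning information can be output to indicate that the training data is imbalanced.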
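The balance check can be sketched as follows. Comparing the largest and smallest category counts is equivalent to checking every pair of categories; the ratio threshold of 5 is a hypothetical preset condition, since the text does not give one.

```python
from collections import Counter


def check_balance(labels, max_ratio=5.0):
    """Return a warning string if the data is imbalanced, otherwise None.

    max_ratio is a hypothetical preset condition: the largest category may hold
    at most max_ratio times as many samples as the smallest one.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        return None
    largest, smallest = max(counts.values()), min(counts.values())
    if largest / smallest > max_ratio:
        return (f"Warning: training data is imbalanced "
                f"(largest class {largest} vs smallest class {smallest} samples)")
    return None
```

The check only warns; it does not resample or drop data, matching the behavior described above.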
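The training data can be format-converted. Format conversion can also be understood as organizing or packaging the training data; for example, the images and their labels can be converted into the TFRecord format.
Then the neural network model is trained on the preprocessed training data.
Indication information specifying the type of neural network model to train can be obtained, so that a model of the specified type is trained. Alternatively, a model of a default type can be trained.
A Bayesian optimization scheme can tune hyperparameters automatically, but it is inefficient: optimizing the hyperparameters takes a long time. See the description of FIG. 8 for the Bayesian optimization scheme.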
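When the amount of training data in a single category is greater than or equal to a first preset value (for example, 200), the neural network model can be trained on the training data.
The amount of training data in a single category can be the data volume of the smallest category in the training data, or the average data volume over all categories.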
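When the total amount of training data is less than a second preset value (for example, 200,000), a Bayesian optimization scheme can adjust the network structure of the neural network model and optimize the hyperparameters, and the model is trained on the training data. The structure of all or some layers of the model can be adjusted, as can the parameters of all or some layers.
When the total amount of training data is greater than or equal to the second preset value, the model can be trained using the preset hyperparameters corresponding to the model and the training data.
Training yields the optimal neural network model.
When the amount of training data in a single category is less than the first preset value, the model can be trained with a few-shot learning scheme. Few-shot learning enhances the robustness of the trained model, that is, improves its generalization ability and thus its accuracy.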
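The branch selection described above can be sketched as follows, using the example preset values 200 and 200,000 from the text. `Strategy`, `per_class_volume`, and `choose_strategy` are hypothetical names; the sketch only picks a branch and trains nothing. It also omits the step in which, in the few-shot branch, training with preset hyperparameters is tried first.

```python
from enum import Enum
from statistics import mean
from typing import Dict, Hashable


class Strategy(Enum):
    FEW_SHOT = "few-shot learning"
    BAYESIAN_OPT = "Bayesian optimization of structure and hyperparameters"
    PRESET = "preset hyperparameters"


def per_class_volume(class_counts: Dict[Hashable, int], use_min: bool = True) -> float:
    """'Single-category' volume: the smallest category, or the mean over categories."""
    counts = list(class_counts.values())
    return min(counts) if use_min else mean(counts)


def choose_strategy(
    class_counts: Dict[Hashable, int],
    first_preset: int = 200,  # example value from the text
    second_preset: int = 200_000,  # example value from the text
    use_min: bool = True,
) -> Strategy:
    if per_class_volume(class_counts, use_min) < first_preset:
        return Strategy.FEW_SHOT
    if sum(class_counts.values()) < second_preset:
        return Strategy.BAYESIAN_OPT
    return Strategy.PRESET


if __name__ == "__main__":
    print(choose_strategy({"cat": 50, "dog": 500}))  # Strategy.FEW_SHOT
    print(choose_strategy({"cat": 50, "dog": 500}, use_min=False))  # Strategy.BAYESIAN_OPT
    print(choose_strategy({"cat": 150_000, "dog": 150_000}))  # Strategy.PRESET
```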
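When the single-category data volume is below the first preset value, the model can first be trained with its corresponding preset hyperparameters; if the resulting accuracy meets the target (for example, 95%), few-shot learning is not performed. Few-shot learning schemes include a clustering-based scheme and a fine-tuning-based scheme; see the descriptions of FIG. 6 and FIG. 7, respectively. The accuracy of the neural network model can be measured on the training data or on other labeled data.
Several few-shot learning schemes can be used to train models, and the model with the highest accuracy among them is taken as the optimal neural network model.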
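When the data volume is small, training with preset hyperparameters yields a model with low accuracy. Bayesian optimization can produce a more accurate model, but when the data volume is large it is inefficient and takes a long time.
Choosing, based on the total amount of training data, whether to use Bayesian optimization to adjust the network structure and the training hyperparameters reduces training time and resource usage while preserving the accuracy of the trained model.
Finally, the training result is output. It includes the optimal trained model, and can also include the model's outputs on part of the training data, together with a marking that highlights, in each training sample, the part that most influences the output. For example, the pixels of a training image that most influence the output can be highlighted.
From these highlights, a person can judge what limits the accuracy of the trained model, for example poor-quality training data and/or training hyperparameters that need further optimization.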
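With the training method provided in the embodiments of this application, when samples are scarce (for example, fewer than 200 images per category), the model is trained with a few-shot learning scheme; when there are 200 to 2,000 images per category, the classification model is trained with Bayesian optimization combined with full-network fine-tuning; and when there are more than 2,000 images per category, samples are sufficient and the model is trained directly with preset hyperparameters determined from human experience, yielding a high-accuracy classification model. Training can be combined with early stopping: if the model's accuracy stops improving before the preset number of iterations is reached, training can stop. The method is fully automated, does not rely on expert tuning, and is easy to use. In particular, when there are fewer than 30 samples per category, clustering-based few-shot learning maintains model accuracy.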
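Early stopping as described above can be sketched with a patience counter. The text only says training stops when accuracy no longer improves before the preset number of iterations; the `patience` of 3 epochs, the callback interface, and the name `train_with_early_stop` are assumptions for illustration.

```python
from typing import Callable, Tuple


def train_with_early_stop(
    train_one_epoch: Callable[[], float],
    max_epochs: int,
    patience: int = 3,
) -> Tuple[float, int]:
    """Train until max_epochs, or until accuracy fails to improve `patience` times in a row.

    `train_one_epoch` is a caller-supplied callback that trains for one epoch and
    returns the current accuracy. Returns (best_accuracy, epochs_run).
    """
    best = float("-inf")
    epochs_without_gain = 0
    epochs_run = 0
    for _ in range(max_epochs):
        accuracy = train_one_epoch()
        epochs_run += 1
        if accuracy > best:
            best = accuracy
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # accuracy has stopped improving: stop early
    return best, epochs_run


if __name__ == "__main__":
    history = iter([0.50, 0.60, 0.70, 0.70, 0.69, 0.65, 0.90])
    print(train_with_early_stop(lambda: next(history), max_epochs=10))  # (0.7, 6)
```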
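Combining automatic hyperparameter tuning with the preset hyperparameters corresponding to each neural network model removes the need for manual tuning and its tedious process, tunes parameters automatically, and improves model accuracy. Training a model with its corresponding preset hyperparameters can be understood as applying a general training strategy preset in the system.
Models trained with the method provided in the embodiments of this application can achieve accuracy equal to or better than that of manual tuning.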
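FIG. 6 is a schematic flowchart of a clustering-based few-shot learning scheme according to an embodiment of this application.
To maintain model accuracy with few samples, the neural network model can be trained with a few-shot learning scheme. As described for FIG. 5, this scheme can be used when a single category of the training data has fewer than 200 images.
The neural network model can extract features from the training data. Based on the distances between these features, a relation network can cluster the training data with a clustering algorithm and thereby determine its categories.
That is, the relation network clusters the training data according to its features to obtain a clustering result.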
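Conventional few-shot learning schemes adjust the neural network model with a cross-entropy loss. Minimizing the cross-entropy loss increases the feature distance between different categories in the features the model extracts.
The neural network model extracts features from the training data, which includes support data and query data. The support data and the query data may each include all or part of the training data; their union may include all of it, and they may or may not overlap. Both the support data and the query data include all or part of the data of each category in the training data.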
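The class center feature of each category in the first training data can be computed from the support data features: it is the element-wise average of the features of all support data in that category.
The parameters of some layers of the neural network model are adjusted according to the feature distance between each category's class center feature and the query data features, yielding an adjusted model. Specifically, a cross-entropy loss can be computed from these feature distances, and the parameters of those layers adjusted to minimize it.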
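The text does not say how the feature distances are turned into a cross-entropy loss. The sketch below assumes the prototypical-network convention: a softmax over negative squared Euclidean distances to every class center, with the loss being the negative log-probability of the true category. If hash codes are used, a Hamming distance could replace `squared_distance` (an assumption). Backpropagating into the chosen layers would require an autodiff framework and is not shown; the function names are hypothetical.

```python
import math
from typing import Dict, Hashable, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_cross_entropy(
    query_features: Sequence[Sequence[float]],
    query_labels: Sequence[Hashable],
    centers: Dict[Hashable, Sequence[float]],
) -> float:
    """Mean cross-entropy over queries, using negative squared distances as logits."""
    classes = list(centers)
    total = 0.0
    for feature, label in zip(query_features, query_labels):
        logits = [-squared_distance(feature, centers[c]) for c in classes]
        top = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[classes.index(label)]
    return total / len(query_features)


if __name__ == "__main__":
    centers = {"cat": [0.0, 0.0], "dog": [10.0, 10.0]}
    # A query equidistant from both centers: loss is about ln 2 (0.693).
    print(prototype_cross_entropy([[5.0, 5.0]], ["cat"], centers))
```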
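Part of the training data can be randomly selected to form a support set, and the remaining data forms a query set. The average of the features that the neural network extracts for the support data of a category in the support set can be called the support feature of that category, and the feature the network extracts for each query sample can be called a query feature. The support set includes support data of every category, and the query set includes query data of every category. The cross-entropy loss is computed from the feature distance between each query feature and the support feature of the category to which the corresponding query sample belongs, and the model's parameters are adjusted according to this loss to train the neural network model.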
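A random support/query split can be sketched as follows. This sketch makes the two sets disjoint, which is one of the options the text allows, and samples per category so that each category is represented. A category with no more than `n_support` samples gets no query data, so `n_support` should be smaller than the smallest category. The function name `split_support_query` is hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def split_support_query(
    labels: Sequence[Hashable], n_support: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Per category, randomly pick up to n_support sample indices as support; the rest are query."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    support: List[int] = []
    query: List[int] = []
    for indices in by_class.values():
        shuffled = list(indices)
        rng.shuffle(shuffled)
        support.extend(shuffled[:n_support])
        query.extend(shuffled[n_support:])
    return support, query


if __name__ == "__main__":
    labels = ["cat"] * 5 + ["dog"] * 4
    support, query = split_support_query(labels, n_support=2)
    print(len(support), len(query))  # 4 5
```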
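When the training data is small, training an initial neural network model from scratch may produce a model that only fits the training data and generalizes poorly. Transfer learning can therefore be used in few-shot learning. In one transfer learning scheme, the parameters of a pretrained model are fine-tuned, that is, the parameters of some layers are adjusted. This scheme is also called fine-tuning the neural network model; see the description of FIG. 7.
The pretrained model can be one trained on a public dataset. When training data is scarce, adjusting the parameters of only some layers of the pretrained model improves the generalization ability of the final trained model.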
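To improve training accuracy and model accuracy, a center loss can be introduced when adjusting the neural network model. The center loss can be computed from the average distance between the features of the first training data within each category.
Training with the cross-entropy loss increases inter-class distance, and training with the center loss reduces intra-class distance. Training with both losses improves training efficiency and model accuracy.
The features the neural network model extracts may have a large bit width, so storing them and computing on them consumes considerable resources. The extracted features can therefore be compressed, for example by deep hashing, in which case the feature distance can be expressed as a Hamming distance. Deep hashing reduces feature size and training time while keeping training accuracy high, and speeds up inference when the trained model is used to determine the category of data.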
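Before data is classified with the model obtained by clustering-based few-shot learning, the trained model can extract features from the training data, and the average feature of each category is used as that category's center feature. To classify a sample, the trained model extracts its features, and the sample is classified by the feature distance between those features and each category's center feature; for example, the category whose center feature is nearest (smallest feature distance) is taken as the sample's category.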
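Nearest-center classification can be sketched as follows. The distance function is a parameter so that squared Euclidean distance (on real-valued features) or Hamming distance (on packed hash codes) can be used; both choices, and the names `classify` and `squared_euclidean`, are illustrative assumptions.

```python
from typing import Any, Callable, Dict, Hashable, Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(
    feature: Any,
    centers: Dict[Hashable, Any],
    distance: Callable[[Any, Any], float] = squared_euclidean,
) -> Hashable:
    """Return the category whose center feature is nearest to `feature`."""
    return min(centers, key=lambda c: distance(feature, centers[c]))


if __name__ == "__main__":
    centers = {"cat": [0.0, 0.0], "dog": [5.0, 5.0]}
    print(classify([1.0, 1.0], centers))  # cat

    def hamming(a: int, b: int) -> int:
        # With packed hash codes, count differing bits instead.
        return bin(a ^ b).count("1")

    print(classify(0b1010, {"cat": 0b1011, "dog": 0b0101}, hamming))  # cat
```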
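The clustering-based few-shot learning scheme provided in the embodiments of this application extracts features via transfer learning, which preserves the accuracy of a model trained on few samples and reduces the dependence of training on data volume. Compressing the extracted features with deep hashing keeps them small, makes computing on them more efficient, and reduces resource usage.
FIG. 7 is a schematic flowchart of a fine-tuning-based few-shot learning scheme according to an embodiment of this application.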
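A neural network model's ability to predict samples outside its training set is called its generalization ability. Improving generalization is a central topic in machine learning: only a model that generalizes well is a good model. When training data is insufficient, the model is prone to overfitting: with too little data to learn from, it cannot learn the general patterns in the data and therefore generalizes poorly.
To address the poor generalization of models trained on insufficient data, transfer learning can be used: a model already trained on a large amount of data is trained again on the small training set, adjusting the parameters of some of its layers. Adjusting the parameters of some layers is also called fine-tuning the neural network model.
Starting from an open-source model trained on a large dataset, the parameters (weights) of the shallow layers can be kept unchanged while the parameters of the last few layers are adjusted. Fine-tuning a model trained on a large dataset in this way maintains robustness while preserving accuracy with few samples.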
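Which layers stay frozen can be expressed as a simple mask over the layer list, as in the sketch below. It assumes layers are listed from shallow to deep and that "the last few layers" means a fixed count; `trainable_mask` and the layer names are hypothetical.

```python
from typing import Dict, Sequence


def trainable_mask(layer_names: Sequence[str], num_trainable_last: int) -> Dict[str, bool]:
    """Freeze every layer except the last `num_trainable_last` (ordered shallow to deep)."""
    if num_trainable_last < 0:
        raise ValueError("num_trainable_last must be non-negative")
    cutoff = max(len(layer_names) - num_trainable_last, 0)
    return {name: index >= cutoff for index, name in enumerate(layer_names)}


if __name__ == "__main__":
    layers = ["conv1", "conv2", "conv3", "fc"]
    print(trainable_mask(layers, 2))
    # {'conv1': False, 'conv2': False, 'conv3': True, 'fc': True}
```

In PyTorch, for example, such a mask would typically be applied by setting `requires_grad = False` on the parameters of frozen layers and passing only the remaining parameters to the optimizer.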
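Fine-tuning a model trained on a large dataset can also be combined with the Bayesian optimization scheme.
FIG. 8 is a schematic flowchart of the Bayesian optimization scheme.
The Bayesian optimization scheme can use Gaussian process regression, random forest regression, or similar methods. Different methods use different surrogate functions for the objective function, that is, different fitting functions for curve fitting. Gaussian process regression is used as the example.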
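In step S801, the hyperparameters are initialized.
Initialization yields multiple sets of hyperparameters, that is, of parameters for training the neural network model.
In step S802, the neural network model is trained.
For each of the initialized sets of hyperparameters, the network structure of the model can be adjusted and the model trained with that set, yielding one trained model per set.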
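In step S803, curve fitting is performed.
Assuming that the relationship between the model's hyperparameters and its accuracy follows a Gaussian distribution (that is, it is modeled as a Gaussian process), a Gaussian process curve is fitted to the hyperparameter settings and their resulting accuracies.
In step S804, the hyperparameters with the maximum expected accuracy are determined.
From the fitted curve, the hyperparameters of the model with the highest expected accuracy are obtained.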
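Steps S802 to S804 are then repeated: the model is trained with the hyperparameters that give the highest expected accuracy, the curve is refitted and updated, and the hyperparameters with the highest expected accuracy are read from the updated curve.
In step S805, the optimal neural network model is obtained.
When the preset maximum number of iterations is reached, or the hyperparameters with the highest expected accuracy obtained from curve fitting no longer change, the final model is taken as the optimal trained neural network model.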
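The loop of steps S801 to S805 can be sketched in pure Python as follows. This is a minimal illustration under several assumptions not in the text: a single scalar hyperparameter (here log10 of the learning rate) searched over a fixed grid, a Gaussian process with an RBF kernel and a fixed length scale, and an upper-confidence-bound rule (posterior mean plus `kappa` standard deviations) standing in for the unspecified way of choosing the "maximum expected accuracy" point, since the posterior mean alone would not explore. `fake_accuracy` is a hypothetical stand-in for actually training a model; a real implementation would also fit the kernel hyperparameters and handle several hyperparameters jointly.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def rbf(a: float, b: float, length: float = 0.5) -> float:
    """Squared-exponential kernel on a single (scalar) hyperparameter."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L with L @ L^T == m, for symmetric positive-definite m."""
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(m[i][i] - s)
            else:
                chol[i][j] = (m[i][j] - s) / chol[j][j]
    return chol


def forward(chol: Matrix, b: Sequence[float]) -> List[float]:
    """Solve L y = b by forward substitution."""
    y: List[float] = []
    for i, b_i in enumerate(b):
        y.append((b_i - sum(chol[i][k] * y[k] for k in range(i))) / chol[i][i])
    return y


def backward(chol: Matrix, y: Sequence[float]) -> List[float]:
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(chol[k][i] * x[k] for k in range(i + 1, n))) / chol[i][i]
    return x


def fit_gp(xs: Sequence[float], ys: Sequence[float], noise: float = 1e-5):
    """Fit a GP (RBF kernel, constant mean) and return a predictor x -> (mean, std)."""
    y_mean = sum(ys) / len(ys)
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    chol = cholesky(k)
    alpha = backward(chol, forward(chol, [y - y_mean for y in ys]))

    def predict(x: float) -> Tuple[float, float]:
        k_star = [rbf(x, a) for a in xs]
        mu = y_mean + sum(ks * al for ks, al in zip(k_star, alpha))
        v = forward(chol, k_star)
        var = max(1.0 - sum(t * t for t in v), 0.0)  # prior variance rbf(x, x) == 1
        return mu, math.sqrt(var)

    return predict


def bayes_opt(objective: Callable[[float], float], low: float, high: float,
              n_init: int = 3, max_iter: int = 20, kappa: float = 2.0,
              seed: int = 0) -> Tuple[float, float]:
    """Maximise `objective` on [low, high]; returns (best_x, best_value)."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]  # S801: initial settings
    ys = [objective(x) for x in xs]  # S802: train with each setting
    grid = [low + (high - low) * i / 200 for i in range(201)]
    previous = None
    for _ in range(max_iter):
        predict = fit_gp(xs, ys)  # S803: curve fitting

        def ucb(x: float) -> float:
            mu, sd = predict(x)
            return mu + kappa * sd

        proposal = max(grid, key=ucb)  # S804: most promising setting
        if proposal == previous:  # proposal stopped changing
            break
        previous = proposal
        xs.append(proposal)
        ys.append(objective(proposal))  # back to S802
    best = max(range(len(xs)), key=lambda i: ys[i])  # S805
    return xs[best], ys[best]


if __name__ == "__main__":
    # Hypothetical stand-in for "train with log10(learning rate) = x, return accuracy".
    def fake_accuracy(x: float) -> float:
        return 0.9 - 0.05 * (x + 3.0) ** 2

    best_x, best_acc = bayes_opt(fake_accuracy, -5.0, -1.0)
    print(f"best log10(lr) = {best_x:.2f}, accuracy = {best_acc:.3f}")
```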
Steps S801 to S805 can optimize one or more hyperparameters such as the learning rate (LR), the learning rate decay rate, the learning rate decay period, the number of iterations, the batch size, and dropout.
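The neural network training method of the embodiments of this application has been described in detail above with reference to the accompanying drawings. The neural network training apparatus of the embodiments of this application is described below with reference to the drawings. It should be understood that the apparatus described below can perform each step of the method; to avoid unnecessary repetition, duplicate descriptions are omitted where appropriate.
FIG. 9 is a schematic structural diagram of a neural network training apparatus according to an embodiment of this application. The apparatus 3000 includes an obtaining module 3001 and a processing module 3002, which can be used to perform the neural network training method of the embodiments of this application.
In some embodiments, the obtaining module 3001 can perform step S401, and the processing module 3002 can perform steps S402 and S403.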
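The obtaining module 3001 is configured to obtain a neural network model, first training data, and the categories of the first training data, where the neural network model is trained on second training data, the first training data includes support data and query data, the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data.
The processing module 3002 is configured to use the neural network model to extract features from the first training data to obtain the features of the first training data.
The processing module 3002 is further configured to adjust the parameters of some layers of the neural network model according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model, where each bit of a category's class center feature is the average of the corresponding bit of the features of that category's support data.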
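Optionally, the processing module 3002 is configured to adjust the parameters of those layers according to the feature distance between each category's class center feature and the query data features, and the average feature distance between the features of the first training data within each category.
Optionally, the processing module 3002 is configured to input the first training data into the neural network model and deep-hash the features extracted by the model to obtain the features of the first training data.
Optionally, the processing module 3002 is configured to: when the amount of first training data is less than a preset value, adjust hyperparameters with a Bayesian optimization scheme and adjust the parameters of those layers according to the feature distance between each category's class center feature and the query data features; and when the amount of first training data is greater than or equal to the preset value, adjust the parameters of those layers according to the preset hyperparameters corresponding to the neural network model and the feature distance between each category's class center feature and the query data features.
Optionally, the hyperparameters include one or more of the learning rate, learning rate decay rate, learning rate decay period, number of iteration periods, batch size, dropout, and network structure parameters of the neural network model.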
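In other embodiments, the obtaining module 3001 is configured to obtain first training data and the categories of the first training data.
The processing module 3002 is configured to: when the amount of first training data is less than a preset value, adjust hyperparameters with a Bayesian optimization scheme and train a neural network model according to the first training data and its categories; and when the amount of first training data is greater than or equal to the preset value, train the neural network model according to the preset hyperparameters corresponding to the model, the first training data, and its categories.
Optionally, the neural network model is trained on second training data.
The processing module 3002 is configured to use the neural network model to extract features from the first training data to obtain the features of the first training data, where the first training data includes support data and query data, the support data includes all or part of the data of each category in the first training data, and the query data includes all or part of the data of each category in the first training data.
The processing module 3002 is configured to adjust the parameters of some layers of the neural network model according to the feature distance between the class center feature of each category and the query data features, to obtain an adjusted neural network model, where each bit of a category's class center feature is the average of the corresponding bit of the features of that category's support data.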
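Optionally, the processing module 3002 is configured to adjust the parameters of those layers according to the feature distance between each category's class center feature and the query data features, and the average feature distance between the features of the first training data within each category.
Optionally, the processing module 3002 is configured to input the first training data into the neural network model and deep-hash the features extracted by the model to obtain the features of the first training data.
Optionally, the hyperparameters include one or more of the learning rate, learning rate decay rate, learning rate decay period, number of iteration periods, batch size, dropout, and network structure parameters of the neural network model.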
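FIG. 10 is a schematic diagram of the hardware structure of an electronic apparatus according to an embodiment of this application. The electronic apparatus 1000 shown in FIG. 10 (which may specifically be a computer device) includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004. The memory 1001, the processor 1002, and the communication interface 1003 are communicatively connected to one another through the bus 1004.
The memory 1001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1001 can store a program; when the program stored in the memory 1001 is executed by the processor 1002, the processor 1002 and the communication interface 1003 are configured to perform the steps of the neural network model training method of the embodiments of this application.
The processor 1002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions required of the units in the neural network model training apparatus of the embodiments of this application, or to perform the neural network model training method of the method embodiments of this application.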
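The processor 1002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the neural network model training method of this application can be completed by integrated logic circuits of hardware in the processor 1002 or by instructions in the form of software. The processor 1002 may alternatively be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit, a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1001; the processor 1002 reads information from the memory 1001 and, in combination with its hardware, implements the functions required of the units included in the neural network model training apparatus of the embodiments of this application, or performs the neural network model training method of the method embodiments of this application.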
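The communication interface 1003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1000 and other devices or communication networks. For example, one or more of the neural network model, the first training data, and the like can be obtained through the communication interface 1003.
The bus 1004 may include a path for transferring information between the components of the apparatus 1000 (for example, the memory 1001, the processor 1002, and the communication interface 1003).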
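An embodiment of this application further provides a computer program storage medium having program instructions that, when executed directly or indirectly, implement the foregoing method.
An embodiment of this application further provides a chip system including at least one processor; when program instructions are executed in the at least one processor, the foregoing method is implemented.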
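A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described herein again.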
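In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically alone, or two or more units may be integrated into one unit.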
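When the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application and are not intended to limit its protection scope. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.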

Claims (22)

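  1. A method for training a neural network model, comprising:
    obtaining a neural network model, first training data, and categories of the first training data, wherein the neural network model is trained on second training data, the first training data comprises support data and query data, the support data comprises all or part of the data of each category in the first training data, and the query data comprises all or part of the data of each category in the first training data;
    extracting features from the first training data by using the neural network model, to obtain features of the first training data; and
    adjusting parameters of some layers of the neural network model according to a feature distance between a class center feature of each category and the features of the query data, to obtain an adjusted neural network model, wherein each bit of the class center feature of each category is an average of the corresponding bits of the features of the support data of that category.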
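  2. The method according to claim 1, wherein the adjusting parameters of some layers of the neural network model according to a feature distance between a class center feature of each category and the features of the query data comprises:
    adjusting the parameters of said layers according to the feature distance between the class center feature of each category and the features of the query data, and an average of feature distances between the features of the first training data of each category.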
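  3. The method according to claim 1 or 2, wherein the extracting features from the first training data by using the neural network model, to obtain features of the first training data comprises:
    inputting the first training data into the neural network model; and
    performing deep hashing on features extracted by the neural network model, to obtain the features of the first training data.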
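  4. The method according to any one of claims 1 to 3, wherein the adjusting parameters of some layers of the neural network model according to a feature distance between a class center feature of each category and the features of the query data comprises:
    when a data volume of the first training data is less than a preset value, adjusting hyperparameters by using a Bayesian optimization scheme, and adjusting the parameters of said layers according to the feature distance between the class center feature of each category and the features of the query data; and
    when the data volume of the first training data is greater than or equal to the preset value, adjusting the parameters of said layers according to preset hyperparameters corresponding to the neural network model and the feature distance between the class center feature of each category and the features of the query data.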
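  5. The method according to claim 4, wherein the hyperparameters comprise one or more of: a learning rate, a learning rate decay rate, a learning rate decay period, a number of iteration periods, a batch size, and network structure parameters of the neural network model.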
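  6. A method for training a neural network model, comprising:
    obtaining a neural network model, first training data, and categories of the first training data;
    when a data volume of the first training data is less than a preset value, adjusting hyperparameters by using a Bayesian optimization scheme, and training the neural network model according to the first training data and the categories of the first training data; and
    when the data volume of the first training data is greater than or equal to the preset value, training the neural network model according to preset hyperparameters corresponding to the neural network model, the first training data, and the categories of the first training data.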
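  7. The method according to claim 6, wherein the neural network model is trained on second training data; and
    the training the neural network model according to the first training data and the categories of the first training data comprises:
    extracting features from the first training data by using the neural network model, to obtain features of the first training data, wherein the first training data comprises support data and query data, the support data comprises all or part of the data of each category in the first training data, and the query data comprises all or part of the data of each category in the first training data; and
    adjusting parameters of some layers of the neural network model according to a feature distance between a class center feature of each category and the features of the query data, to obtain an adjusted neural network model, wherein each bit of the class center feature of each category is an average of the corresponding bits of the features of the support data of that category.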
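  8. The method according to claim 7, wherein the adjusting parameters of some layers of the neural network model according to a feature distance between a class center feature of each category and the features of the query data comprises:
    adjusting the parameters of said layers according to the feature distance between the class center feature of each category and the features of the query data, and an average of feature distances between the features of the first training data of each category.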
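  9. The method according to claim 7 or 8, wherein the extracting features from the first training data by using the neural network model, to obtain features of the first training data comprises:
    inputting the first training data into the neural network model; and
    performing deep hashing on features extracted by the neural network model, to obtain the features of the first training data.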
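  10. The method according to any one of claims 6 to 9, wherein the hyperparameters comprise one or more of: a learning rate, a learning rate decay rate, a learning rate decay period, a number of iteration periods, a batch size, and network structure parameters of the neural network model.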
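  11. An apparatus for training a neural network model, comprising:
    an obtaining module, configured to obtain a neural network model, first training data, and categories of the first training data, wherein the neural network model is trained on second training data, the first training data comprises support data and query data, the support data comprises all or part of the data of each category in the first training data, and the query data comprises all or part of the data of each category in the first training data; and
    a processing module, configured to:
    extract features from the first training data by using the neural network model, to obtain features of the first training data; and
    adjust parameters of some layers of the neural network model according to a feature distance between a class center feature of each category and the features of the query data, to obtain an adjusted neural network model, wherein each bit of the class center feature of each category is an average of the corresponding bits of the features of the support data of that category.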
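  12. The apparatus according to claim 11, wherein the processing module is configured to:
    adjust the parameters of said layers according to the feature distance between the class center feature of each category and the features of the query data, and an average of feature distances between the features of the first training data of each category.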
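  13. The apparatus according to claim 11 or 12, wherein the processing module is configured to:
    input the first training data into the neural network model; and
    perform deep hashing on features extracted by the neural network model, to obtain the features of the first training data.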
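  14. The apparatus according to any one of claims 11 to 13, wherein the processing module is configured to:
    when a data volume of the first training data is less than a preset value, adjust the network structure of some layers of the neural network model and optimize hyperparameters by using a Bayesian optimization scheme, and adjust the parameters of said layers according to the feature distance between the class center feature of each category and the features of the query data; and
    when the data volume of the first training data is greater than or equal to the preset value, adjust the parameters of said layers according to preset hyperparameters corresponding to the neural network model and the feature distance between the class center feature of each category and the features of the query data.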
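  15. The apparatus according to claim 14, wherein the hyperparameters comprise one or more of: a learning rate, a learning rate decay rate, a learning rate decay period, a number of iteration periods, a batch size, and network structure parameters of the neural network model.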
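  16. An apparatus for training a neural network model, comprising:
    an obtaining module, configured to obtain a neural network model, first training data, and categories of the first training data; and
    a processing module, configured to:
    when a data volume of the first training data is less than a preset value, adjust the network structure of the neural network model and optimize hyperparameters by using a Bayesian optimization scheme, and train the neural network model according to the first training data and the categories of the first training data; and
    when the data volume of the first training data is greater than or equal to the preset value, train the neural network model according to preset hyperparameters corresponding to the neural network model, the first training data, and the categories of the first training data.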
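  17. The apparatus according to claim 16, wherein the neural network model is trained on second training data; and
    the processing module is configured to:
    extract features from the first training data by using the neural network model, to obtain features of the first training data, wherein the first training data comprises support data and query data, the support data comprises all or part of the data of each category in the first training data, and the query data comprises all or part of the data of each category in the first training data; and
    adjust parameters of some layers of the neural network model according to a feature distance between a class center feature of each category and the features of the query data, to obtain an adjusted neural network model, wherein each bit of the class center feature of each category is an average of the corresponding bits of the features of the support data of that category.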
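  18. The apparatus according to claim 17, wherein the processing module is configured to:
    adjust the parameters of said layers according to the feature distance between the class center feature of each category and the features of the query data, and an average of feature distances between the features of the first training data of each category.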
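  19. The apparatus according to claim 17 or 18, wherein the processing module is configured to:
    input the first training data into the neural network model; and
    perform deep hashing on features extracted by the neural network model, to obtain the features of the first training data.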
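  20. The apparatus according to any one of claims 16 to 19, wherein the hyperparameters comprise one or more of: a learning rate, a learning rate decay rate, a learning rate decay period, a number of iteration periods, a batch size, and network structure parameters of the neural network model.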
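  21. A computer-readable storage medium, wherein the computer-readable medium stores program code to be executed by a device, and the program code comprises instructions for performing the method according to any one of claims 1 to 10.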
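  22. A chip, wherein the chip comprises a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method according to any one of claims 1 to 10.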
PCT/CN2020/102594 2019-09-18 2020-07-17 Method and apparatus for training neural network model WO2021051987A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910883124.1A CN112529146B (zh) 2019-09-18 2019-09-18 Method and apparatus for training neural network model
CN201910883124.1 2019-09-18

Publications (1)

Publication Number Publication Date
WO2021051987A1 true WO2021051987A1 (zh) 2021-03-25

Family

ID=74883014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/102594 WO2021051987A1 (zh) 2019-09-18 2020-07-17 Method and apparatus for training neural network model

Country Status (2)

Country Link
CN (1) CN112529146B (zh)
WO (1) WO2021051987A1 (zh)

Also Published As

Publication number Publication date
CN112529146B (zh) 2023-10-17
CN112529146A (zh) 2021-03-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20864998

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20864998

Country of ref document: EP

Kind code of ref document: A1