
CN114254764B - Feedback-based machine learning model searching method, system, equipment and medium - Google Patents

Feedback-based machine learning model searching method, system, equipment and medium

Info

Publication number
CN114254764B
Authority
CN
China
Prior art keywords
machine learning, learning model, current optimal, training, optimal machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111620457.9A
Other languages
Chinese (zh)
Other versions
CN114254764A (en)
Inventor
沈超
张笑宇
蔺琛皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong University
Priority claimed from CN202111620457.9A
Publication of CN114254764A
Application granted
Publication of CN114254764B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention belongs to the field of machine learning and discloses a feedback-based machine learning model searching method, system, equipment and medium. The method comprises: obtaining initial machine learning model parameters and constructing an initial machine learning model from them; training the initial machine learning model on a preset training data set to obtain training feedback data and a training score for the initial machine learning model; determining a current optimal machine learning model, acquiring its training feedback data, and deriving a search operation for the current optimal machine learning model from that feedback data; judging whether a preset termination condition is met and, when it is not met, modifying the current optimal machine learning model according to the search operation, taking the result as a new initial machine learning model, and repeating the above steps; and, when the termination condition is met, outputting the current optimal machine learning model, which greatly improves the search efficiency of machine learning models.

Description

Feedback-based machine learning model searching method, system, equipment and medium
Technical Field
The invention belongs to the field of machine learning, and relates to a machine learning model searching method, a system, equipment and a medium based on feedback.
Background
With the widespread use of machine learning across industries, how users without the relevant knowledge base can obtain a machine learning model suited to their training task has become an important problem. At present, machine learning models are mainly searched by automated machine learning methods, which aim to automatically find the best-performing model on a data set by designing a set of optimization algorithms. The common approach treats a machine learning model as two parts, an architecture and hyperparameters, and searches using surrogate-model-based methods.
However, this separated search cannot account for the interaction between architecture and hyperparameters, so potentially effective combinations are easily missed. Moreover, surrogate-model-based algorithms essentially treat the search process as an independent machine learning problem, with past search histories serving as the data set for that problem. The growth of this data set limits the search efficiency of such algorithms, so these methods cannot efficiently find a machine learning model suited to the training task in a short time.
Existing research on machine learning model searching generally improves the surrogate-model approach or proposes a new surrogate model to improve the search effect; a genuinely practical and efficient model search method does not yet exist, which limits application in real-world settings to a certain extent.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a machine learning model searching method, a system, equipment and a medium based on feedback.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in a first aspect of the present invention, a feedback-based machine learning model search method includes the steps of:
S1: acquiring initial machine learning model parameters, and constructing an initial machine learning model according to the initial machine learning model parameters;
S2: training the initial machine learning model through a preset training data set to obtain training feedback data and a training score of the initial machine learning model;
S3: when a current optimal machine learning model exists, acquiring the training score of the current optimal machine learning model, and updating the current optimal machine learning model to whichever of the current optimal machine learning model and the initial machine learning model has the higher training score; otherwise, taking the initial machine learning model as the current optimal machine learning model;
S4: acquiring training feedback data of the current optimal machine learning model, and obtaining the search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model;
S5: judging whether a preset termination condition is met; when the termination condition is not met, modifying the current optimal machine learning model according to the current optimal machine learning model and its search operation, taking the result as a new initial machine learning model, and returning to S2; and when the termination condition is met, outputting the current optimal machine learning model.
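The S1-S5 loop can be sketched in a few lines of Python. Here `train_and_score`, `pick_search_op`, and `apply_op` are hypothetical stand-ins for the training, feedback-analysis, and model-modification components described above, not names from the patent; the sketch uses an operation-count termination condition.

```python
# Hedged sketch of the feedback-based search loop (S1-S5).
def feedback_search(initial_params, train_and_score, pick_search_op,
                    apply_op, max_ops=10):
    best_params, best_feedback, best_score = None, None, float("-inf")
    params = initial_params
    for _ in range(max_ops):                       # S5: termination by op count
        feedback, score = train_and_score(params)  # S1 + S2: build and train
        if score > best_score:                     # S3: keep the higher-scoring model
            best_params, best_feedback, best_score = params, feedback, score
        op = pick_search_op(best_feedback)         # S4: op from feedback data
        params = apply_op(best_params, op)         # S5: modify the current best
    return best_params, best_score
```

With toy stand-ins (an integer "model" scored by closeness to 5, and an operation that increments it), the loop converges on the best parameter and keeps it once further operations stop helping.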
Optionally, the initial machine learning model parameters include architecture information, hyperparameter information and training information, wherein the architecture information includes the network structure and the layer-to-layer relationships, the hyperparameter information includes the initializer and the activation function, and the training information includes the optimizer and the learning rate.
Optionally, the preset training data set is an image data set or a text data set.
Optionally, the training score is the highest accuracy achieved by the initial machine learning model during training; the search operations include changing the model architecture, changing the model hyperparameters, and changing the model training configuration; and the preset termination condition is that the number of search operations reaches a preset threshold, or that the search time reaches a preset search time threshold.
Optionally, the training feedback data includes architecture information, accuracy, gradients of each layer, and parameter weights.
Optionally, the specific method for obtaining the search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model includes:
determining the architecture type, convergence type, gradient type and weight type of the current optimal machine learning model according to the architecture information, accuracy, gradient of each layer and parameter weight of the current optimal machine learning model;
obtaining a condition combination of the current optimal machine learning model according to the architecture type, the convergence type, the gradient type and the weight type of the current optimal machine learning model, and obtaining a preset searching operation under the condition combination to serve as the searching operation of the current optimal machine learning model.
Optionally, the specific method for determining the architecture type, the convergence type, the gradient type and the weight type of the current optimal machine learning model according to the architecture information, the accuracy, the gradients of each layer and the parameter weight of the current optimal machine learning model is as follows:
when the architecture information of the current optimal machine learning model is a ResNet architecture, its architecture type is the ResNet architecture; when the architecture information is an Xception architecture, its architecture type is the Xception architecture; when the architecture information is an EfficientNet architecture, its architecture type is the EfficientNet architecture; otherwise, the architecture type of the current optimal machine learning model is other architectures;
When the maximum improvement of the accuracy rate of the current optimal machine learning model in two adjacent training steps does not exceed a preset first threshold value, the convergence type of the current optimal machine learning model is slow convergence; otherwise, the convergence type of the current optimal machine learning model is normal convergence;
when the gradients of the layers of the current optimal machine learning model gradually increase from the output layer to the input layer and the increase exceeds a preset second threshold value, the gradient type of the current optimal machine learning model is an exploding gradient; when the gradients gradually decrease from the output layer to the input layer and the decrease exceeds a preset third threshold value, the gradient type is a vanishing gradient; when the per-layer gradients contain zero values exceeding a certain proportion and the activation function is the ReLU activation function, the gradient type is a dying gradient; otherwise, the gradient type of the current optimal machine learning model is a normal gradient;
when NaN values exist in the parameter weights of the current optimal machine learning model, the weight type of the current optimal machine learning model is NaN weight; otherwise, the weight type of the current optimal machine learning model is a normal weight.
In a second aspect of the present invention, a feedback-based machine learning model search system includes:
the acquisition module is used for acquiring initial machine learning model parameters and constructing an initial machine learning model according to the initial machine learning model parameters;
the training module is used for training the initial machine learning model through a preset training data set to obtain training feedback data and training scores of the initial machine learning model;
the updating module is used for acquiring the training score of the current optimal machine learning model when a current optimal machine learning model exists, and updating the current optimal machine learning model to whichever of the current optimal machine learning model and the initial machine learning model has the higher training score; otherwise, taking the initial machine learning model as the current optimal machine learning model;
the operation determining module is used for acquiring training feedback data of the current optimal machine learning model and obtaining search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model;
the output module is used for judging whether a preset termination condition is met or not, and when the termination condition is not met, the current optimal machine learning model is modified according to the current optimal machine learning model and the search operation of the current optimal machine learning model and is used as an initial machine learning model to be sent to the training module; and when the termination condition is met, outputting the current optimal machine learning model.
In a third aspect of the present invention, a computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the feedback-based machine learning model search method described above when executing the computer program.
In a fourth aspect of the present invention, a computer readable storage medium stores a computer program which, when executed by a processor, implements the steps of the feedback-based machine learning model search method described above.
Compared with the prior art, the invention has the following beneficial effects:
according to the machine learning model searching method based on feedback, training feedback data and training scores in the machine learning model training process are monitored in real time, different searching operations are selected according to different training feedback data, and then a new machine learning model is generated, so that automatic and efficient searching of the machine learning model is achieved. In the method, the searching process of the machine learning model is simple, the operation is convenient, and the searching of the machine learning model can be automatically completed only by providing the initial machine learning model parameters and the training data set, and the manual observation data or the manual searching of the model architecture is not needed. Such characteristics determine that the method is simple to implement and low in complexity. In addition, the machine learning model searching method does not treat the model searching problem as a machine learning problem, but forms the model searching problem as a software engineering searching problem, and solves the problem by using a feedback-based method, dynamically selects searching operations for different machine learning models to generate a new machine learning model, does not need a large amount of searching histories as a data set, has high searching efficiency for the machine learning model, and can effectively reduce resource expenditure in the searching process. In addition, the machine learning model searching method has low requirements on dependent machine learning frameworks, and the existing Keras, tensorFlow and Pytorch can be applied to the machine learning model searching method, so that the searching method can be applied to almost all machine learning frameworks, and efficient machine learning model searching on different machine learning frameworks becomes possible, and the machine learning model searching efficiency is improved.
Drawings
FIG. 1 is a flow chart of a feedback-based machine learning model search method of the present invention;
FIG. 2 is a schematic diagram of a machine learning model condition evaluation in an embodiment of the present invention;
FIG. 3 is a diagram illustrating the results of machine learning model generation in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a machine learning model condition evaluation in accordance with a further embodiment of the present invention;
FIG. 5 is a diagram illustrating the results of machine learning model generation in accordance with yet another embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, in an embodiment of the present invention, a feedback-based machine learning model searching method is provided, including the following steps:
s1: and acquiring initial machine learning model parameters, and constructing an initial machine learning model according to the initial machine learning model parameters.
Wherein the initial machine learning model parameters include architecture information, hyperparameter information, and training information. Specifically, for given initial machine learning model parameters P = {P_archi, P_hyper, P_train}, the implementation may be based on TensorFlow, an end-to-end open-source machine learning platform. P_archi denotes the model's architecture information parameters, including the network structure, the layer-to-layer relationships, and the like; P_hyper denotes the model's hyperparameter information parameters, including the model's initializer, activation function, and the like; and P_train denotes the model's training information parameters, including the optimizer used for model training, the learning rate, and the like.
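As an illustration only, the parameter structure P = {P_archi, P_hyper, P_train} might be laid out as a nested dictionary. Every key name and value below is an assumption made for this sketch, not part of the patent text.

```python
# Hypothetical layout of the initial model parameters P = {P_archi, P_hyper, P_train}.
initial_params = {
    "archi": {                                    # P_archi: architecture information
        "layers": ["conv2d", "batch_norm", "relu", "dense"],
        "connections": [(0, 1), (1, 2), (2, 3)],  # layer-to-layer relationships
    },
    "hyper": {                                    # P_hyper: hyperparameter information
        "initializer": "he_normal",
        "activation": "relu",
    },
    "train": {                                    # P_train: training information
        "optimizer": "adam",
        "learning_rate": 1e-3,
    },
}
```

A model builder would consume this dictionary to construct the initial machine learning model M_init before training begins.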
S2: training an initial machine learning model through a preset training data set to obtain training feedback data and training scores of the initial machine learning model.
Wherein the preset training data set is an image data set or a text data set. The image dataset comprises: CIFAR-10 image dataset, MNIST image dataset, STL-10 dataset, etc.; the text data set includes an IMDB text data set, and the like.
The training feedback data comprise architecture information, accuracy, per-layer gradients and parameter weights. Specifically, the training feedback data include static feedback data, such as the machine learning model's layer relationships and configuration details and the training optimizer's configuration details, and dynamic feedback data, such as the per-layer gradients and parameter weights during training, evaluation indicators such as training accuracy, and the model's other hyperparameters. During training, built-in functions of the AutoKeras library can be used to load the initial machine learning model parameters and generate an initial machine learning model M_init, which is then trained according to the training configuration. The layer parameter configuration L = {l_1, l_2, …, l_n} of the initial machine learning model and the training optimizer parameter set O = {O_1, O_2, …} are recorded. In addition, the training process of the initial machine learning model is monitored, and Data_i = {L_i, A_i, G_i, W_i, …} is recorded in real time in the i-th training iteration, containing feedback information such as the loss function value L_i, the accuracy A_i, the per-layer gradients G_i, and the parameter weights W_i of that iteration.
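A minimal sketch of recording Data_i = {L_i, A_i, G_i, W_i} per iteration and deriving the training score as the highest observed accuracy; the `FeedbackMonitor` class and its field names are assumptions made for illustration, not the patent's implementation (which relies on AutoKeras built-ins).

```python
# Illustrative per-iteration feedback recorder; all names are assumptions.
class FeedbackMonitor:
    """Records Data_i = {L_i, A_i, G_i, W_i} for each training iteration i."""

    def __init__(self):
        self.records = []

    def on_iteration_end(self, loss, accuracy, layer_gradients, layer_weights):
        # One record per training iteration, mirroring Data_i in the text.
        self.records.append({
            "loss": loss,                  # L_i: loss function value
            "accuracy": accuracy,          # A_i: accuracy
            "gradients": layer_gradients,  # G_i: per-layer gradients
            "weights": layer_weights,      # W_i: parameter weights
        })

    def training_score(self):
        # Training score = highest accuracy observed during training.
        return max(r["accuracy"] for r in self.records)
```

In a real Keras setting this role would typically be played by a training callback invoked at the end of each epoch or batch.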
Wherein the training score is the highest accuracy achieved by the initial machine learning model during training. The training score is used to measure model performance; the model with the highest accuracy is generally considered to perform better, and other metrics can also be used as the model score.
S3: when a current optimal machine learning model exists, acquiring the training score of the current optimal machine learning model, and updating the current optimal machine learning model to whichever of the current optimal machine learning model and the initial machine learning model has the higher training score; otherwise, the initial machine learning model is used as the current optimal machine learning model.
Specifically, the highest training accuracy of the initial machine learning model is compared with that of the current optimal machine learning model, and the model with the higher value becomes the current optimal machine learning model; if no current optimal machine learning model exists, the initial machine learning model is directly taken as the current optimal machine learning model.
S4: acquiring training feedback data of the current optimal machine learning model, and obtaining the search operation of the current optimal machine learning model according to that training feedback data.
The specific method for obtaining the search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model comprises the following steps: determining the architecture type, convergence type, gradient type and weight type of the current optimal machine learning model according to the architecture information, accuracy, gradient of each layer and parameter weight of the current optimal machine learning model; obtaining a condition combination of the current optimal machine learning model according to the architecture type, the convergence type, the gradient type and the weight type of the current optimal machine learning model, and obtaining a preset searching operation under the condition combination to serve as the searching operation of the current optimal machine learning model.
Specifically, according to the architecture information, the accuracy, the gradients of each layer and the parameter weights of the current optimal machine learning model, the specific methods for determining the architecture type, the convergence type, the gradient type and the weight type of the current optimal machine learning model are as follows:
Architecture type: when the architecture information of the current optimal machine learning model is a ResNet architecture, its architecture type is the ResNet architecture; when the architecture information is an Xception architecture, its architecture type is the Xception architecture; when the architecture information is an EfficientNet architecture, its architecture type is the EfficientNet architecture; otherwise, the architecture type of the current optimal machine learning model is other architectures.
Convergence type: when the maximum improvement of the accuracy rate of the current optimal machine learning model in two adjacent training steps does not exceed a preset first threshold value, the convergence type of the current optimal machine learning model is slow convergence; otherwise, the convergence type of the current optimal machine learning model is normal convergence.
Gradient type: when the gradients of the layers of the current optimal machine learning model gradually increase from the output layer to the input layer and the increase exceeds a preset second threshold value, the gradient type of the current optimal machine learning model is an exploding gradient; when the gradients gradually decrease from the output layer to the input layer and the decrease exceeds a preset third threshold value, the gradient type is a vanishing gradient; when the per-layer gradients contain zero values exceeding a certain proportion and the activation function is the ReLU activation function, the gradient type is a dying gradient; otherwise, the gradient type of the current optimal machine learning model is a normal gradient.
Weight type: when NaN values exist in the parameter weights of the current optimal machine learning model, the weight type of the current optimal machine learning model is NaN weight; otherwise, the weight type of the current optimal machine learning model is a normal weight.
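The convergence, gradient and weight checks above can be sketched as small predicates. All thresholds below (the first/second/third thresholds and the zero-gradient proportion) are illustrative defaults, not the patent's actual preset values, and the function names are assumptions.

```python
import math

# Hedged sketch of three of the four condition checks; thresholds are placeholders.
def convergence_type(accuracies, t1=0.01):
    # Slow convergence: best accuracy gain between adjacent steps is at most t1.
    gains = [b - a for a, b in zip(accuracies, accuracies[1:])]
    return "slow" if max(gains) <= t1 else "normal"

def gradient_type(grads_out_to_in, activation, t2=10.0, t3=0.9, zero_ratio=0.5):
    # grads_out_to_in: mean |gradient| per layer, output layer first.
    if grads_out_to_in[-1] - grads_out_to_in[0] > t2:
        return "exploding"      # grows toward the input layer
    if grads_out_to_in[0] - grads_out_to_in[-1] > t3:
        return "vanishing"      # shrinks toward the input layer
    zeros = sum(1 for g in grads_out_to_in if g == 0)
    if activation == "relu" and zeros / len(grads_out_to_in) > zero_ratio:
        return "dying"          # too many zero gradients under ReLU
    return "normal"

def weight_type(weights):
    return "nan" if any(math.isnan(w) for w in weights) else "normal"
```

Together with an architecture lookup, these predicates yield the condition combination (architecture type, convergence type, gradient type, weight type) used to select a search operation.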
Specifically, the search operations fall into three major categories, changing the model architecture, changing the model hyperparameters, and changing the model training configuration, so that the model can be comprehensively searched and improved. Changing the model architecture includes, for example: deepening the model, changing layer configurations, changing layer connection relationships, adding or deleting layers, changing the model's pre-training condition, and the like. Changing the model hyperparameters includes, for example: adding data augmentation operations, changing the model's activation function, changing the model's initializer, and the like. Changing the model training configuration includes, for example: changing the training optimizer, changing the model's learning rate, and the like.
In a specific application, a set of search operation priorities is preset for the different condition combinations of the current optimal machine learning model; the higher the priority, the more likely the search operation is to improve the model's performance and score. Each time a new model is searched, the condition combination of the current optimal machine learning model is obtained, and the highest-priority search operation under that combination is selected from the preset priorities as the search operation of the current optimal machine learning model.
The preset search operation priorities are obtained from the results of a large-scale experiment. The specific method is as follows: machine learning models are generated in batches to cover the various hyperparameter combinations of the search space; each generated model is trained, training data such as architecture information, accuracy, per-layer gradients and parameter weights are recorded, and the model's condition combination (architecture type, convergence type, gradient type and weight type) is determined; the various search operations are then applied to the model one by one, training is restarted, and the influence of each search operation on the model's training score is recorded. If the training score improves relative to before the operation was applied, the operation is considered beneficial to training and the score difference is recorded; otherwise it is considered detrimental. In this way, records are completed for all generated machine learning models with their corresponding condition combinations and search operations. Finally, for each condition combination, the average training-score change after applying each search operation is computed; the higher the average change, the better the search operation performs under that condition combination, and the higher its priority when a machine learning model is in that condition combination.
For example, when the condition combination is "ResNet architecture - slow convergence - exploding gradient - normal weight", if the average training-score change of "change the model layer configuration to the Xception architecture" is 0.32 and that of "change the model layer configuration to another architecture" is 0.11, then the operation "change the model layer configuration to the Xception architecture" should be tried first in the search.
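The priority construction just described can be sketched as follows. `rank_operations` and the record format are assumptions made for illustration; the numeric score deltas mirror the example given above.

```python
from collections import defaultdict

# Hypothetical sketch: derive per-condition-combination operation priorities
# from recorded (condition combination, operation, training-score delta) tuples.
def rank_operations(records):
    totals = defaultdict(lambda: [0.0, 0])   # (combo, op) -> [sum of deltas, count]
    for combo, op, delta in records:
        totals[(combo, op)][0] += delta
        totals[(combo, op)][1] += 1
    by_combo = defaultdict(list)
    for (combo, op), (total, count) in totals.items():
        by_combo[combo].append((total / count, op))
    # Higher mean score change = higher priority under that condition combination.
    return {combo: [op for _, op in sorted(pairs, reverse=True)]
            for combo, pairs in by_combo.items()}
```

At search time, the first entry of the ranked list for the current model's condition combination would be tried first.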
S5: judging whether a preset termination condition is met; when the termination condition is not met, modifying the current optimal machine learning model according to the current optimal machine learning model and its search operation, taking the result as a new initial machine learning model, and returning to S2; when the termination condition is met, outputting the current optimal machine learning model.
The preset termination condition is that the search operation times reach a preset search operation times threshold value or the search time reaches a preset search time threshold value.
The method can be applied to the fields of image classification and text classification. After the search on a given image data set or text data set is completed, images or texts to be classified can be acquired and input into the searched machine learning model, which classifies them accurately and rapidly. For example, after searching on an image data set, the searched machine learning model can classify images to be classified with high accuracy and efficiency, effectively predicting the correct category of each image. Taking the CIFAR-10 image data set as an example, the searched machine learning model can predict images in the data set: an image to be classified is acquired and input into the searched model to judge which of the data set's 10 labels it belongs to. After searching on a text data set, the searched machine learning model can likewise classify texts in the data set with high accuracy and efficiency. Taking the IMDB data set as an example, the searched model can judge whether an input text belongs to a positive or a negative review.
According to the feedback-based machine learning model searching method, training feedback data and training scores are monitored in real time during model training, different search operations are selected according to different training feedback data, and a new machine learning model is then generated, achieving automatic and efficient model search. The search process is simple and convenient to operate: only the initial machine learning model parameters and a training data set need to be provided, and the search completes automatically without manual observation of data or manual search of model architectures, so the method is simple to implement and low in complexity. In addition, the method does not treat model search as a machine learning problem but formulates it as a software engineering search problem and solves it with a feedback-based approach, dynamically selecting search operations for different machine learning models to generate new models; it requires no large corpus of search histories as a dataset, searches for machine learning models efficiently, and effectively reduces resource overhead during the search. Furthermore, the method places few requirements on the underlying machine learning framework: existing frameworks such as Keras, TensorFlow and PyTorch can all be used, so the method applies to almost all machine learning frameworks, making efficient model search possible across different frameworks and improving search efficiency.
Further, the condition combination comprises an architecture type, a convergence type, a gradient type and a weight type. The architecture type comprises four types: Resnet architecture, Xception architecture, EfficientNet architecture and other architectures; the convergence type comprises two types: slow convergence and normal convergence; the gradient type comprises four types: vanishing gradient, exploding gradient, dead gradient and normal gradient; and the weight type comprises two types: NaN weight and normal weight. These four categories, comprising twelve conditions in total, describe the training state of a machine learning model more comprehensively than prior work and support a better search for operations likely to improve model performance.
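The four-category condition combination can be represented as a simple value type, sketched below (the class and field names, and the string values, are illustrative assumptions, not part of the original method):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConditionCombination:
    """Four categories, twelve conditions in total."""
    architecture: str  # "Resnet" | "Xception" | "EfficientNet" | "other"
    convergence: str   # "slow" | "normal"
    gradient: str      # "vanishing" | "exploding" | "dead" | "normal"
    weight: str        # "NaN" | "normal"

cc = ConditionCombination("Resnet", "slow", "exploding", "normal")
```

Being frozen, such a value can also serve as a dictionary key mapping each combination to its prioritized list of search operations.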
Further, the search operation of the machine learning model comprises changing the model architecture, changing the model hyperparameters and changing the model training configuration; by combining architecture search with hyperparameter search, the method is more likely than prior work to find a machine learning model with excellent performance.
Referring to fig. 2, in an embodiment of the present invention, a machine learning model whose initial architecture is Resnet is trained on the CIFAR-10 dataset, and a machine learning model with excellent performance is searched on this dataset by the above feedback-based machine learning model searching method, which specifically includes the following steps:
Step 1: evaluating the training conditions of the machine learning model, with the following specific steps:
step 1-1: using TensorFlow and AutoKeras, generate a machine learning model from the initial model architecture and train it according to the training configuration. The AutoKeras tool can conveniently generate a machine learning model from hyperparameters and a model architecture. Record the static feedback data, including the static machine learning model layer relations and configuration details, the training optimizer configuration details, and the like; this recording can be completed before training begins.
Step 1-2: record the dynamic feedback data of the model that change during training, such as the loss function, accuracy, per-layer gradients and parameter weights. Information such as the loss function can be obtained directly from training through the Keras interface, while the per-layer gradients and parameter weights are obtained through a solving method implemented with TensorFlow.
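The feedback recording of steps 1-1 and 1-2 can be sketched framework-agnostically as below; in an actual Keras setup this class would subclass `tf.keras.callbacks.Callback`, and per-layer gradients would be obtained with `tf.GradientTape` (assumed, not shown here):

```python
class FeedbackRecorder:
    """Collects static feedback once, then dynamic feedback per epoch."""
    def __init__(self, static_info):
        # Static feedback: layer relations, optimizer configuration, etc.
        self.static = dict(static_info)
        self.dynamic = []

    def on_epoch_end(self, epoch, logs):
        # `logs` mirrors the per-epoch metrics dict Keras passes to callbacks
        self.dynamic.append({"epoch": epoch,
                             "loss": logs.get("loss"),
                             "accuracy": logs.get("accuracy")})

rec = FeedbackRecorder({"optimizer": "adam", "layers": 50})
rec.on_epoch_end(0, {"loss": 1.2, "accuracy": 0.41})
```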
Step 1-3: after sufficient training feedback data have been collected, typically at the end of the default training, update the current optimal machine learning model. The recorded dynamic and static data are used to evaluate the condition combination of the machine learning model. Training conditions comprise four categories: architecture type, convergence type, gradient type and weight type. In this embodiment, the four conditions satisfied by model 1 are Resnet architecture, normal convergence, normal gradient and normal weight; the four conditions of model 2 are other architecture, slow convergence, dead gradient and normal weight.
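The gradient-type part of this condition evaluation can be sketched as follows (the thresholds and the zero-value proportion are assumed placeholder values, and the endpoint comparison simplifies the "gradually increases/decreases" check of the full method):

```python
def classify_gradient(layer_grads, zero_fraction, activation,
                      explode_thresh=10.0, vanish_thresh=10.0,
                      dead_fraction=0.5):
    """layer_grads holds mean absolute gradients ordered from the
    output layer to the input layer; thresholds are assumptions."""
    rise = layer_grads[-1] - layer_grads[0]  # growth toward the input layer
    if rise > explode_thresh:
        return "exploding gradient"
    if -rise > vanish_thresh:
        return "vanishing gradient"
    if zero_fraction > dead_fraction and activation == "relu":
        return "dead gradient"
    return "normal gradient"
```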
Step 2: machine learning model generation and retraining, with the following specific steps:
step 2-1: for the previously evaluated model training conditions, select search operations from the preset search priorities in order from highest to lowest priority, as the content of the next machine learning model search.
Step 2-2: and regenerating the initial machine learning model by the selected searching operation and the current optimal machine learning model, restarting the training process, and collecting training feedback data again. Referring to fig. 3, in this step, according to training conditions and search priorities, model 1 selects a search operation "module type=other" with the highest priority to replace the model architecture with other architecture, and then obtains model 2; the model 2 selects the search operation 'pretraining=true' with the highest priority according to the four corresponding training conditions, and adjusts the model pretraining setting to generate a model 3.
Referring to FIG. 4, in yet another embodiment of the present invention, a model whose initial architecture is EfficientNet is trained on the CIFAR-100 dataset, and machine learning models with still better performance are searched.
The training feedback data of model 1 are analyzed, and its four-condition combination is judged to be: EfficientNet architecture, slow convergence, vanishing gradient and normal weight; the condition combination of model 2 is: Xception architecture, normal convergence, normal gradient and normal weight.
After the condition combination is obtained, the search operation is looked up; the model search process results are shown in fig. 5. Given the condition combination satisfied by model 1 and the preset search priority, the highest-priority operation "module type = Xception architecture" is selected to improve model 1, replacing its architecture with Xception and generating model 2 for retraining; the highest-priority operation for model 2 is "pretraining = True", which changes the model's pretraining setting to True, thereby generating model 3 to continue the search.
The following are device embodiments of the present invention that may be used to perform method embodiments of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
In still another embodiment of the present invention, a feedback-based machine learning model search system is provided, which can be used to implement the feedback-based machine learning model search method described above, and specifically, the feedback-based machine learning model search system includes an acquisition module, a training module, an updating module, an operation determining module, and an output module.
The acquisition module is used for acquiring initial machine learning model parameters and constructing an initial machine learning model according to them; the training module is used for training the initial machine learning model through a preset training data set to obtain its training feedback data and training score; the updating module is used for, when a current optimal machine learning model exists, acquiring the training score of the current optimal machine learning model and updating it to whichever of the current optimal machine learning model and the initial machine learning model has the higher training score, and otherwise taking the initial machine learning model as the current optimal machine learning model; the operation determining module is used for acquiring the training feedback data of the current optimal machine learning model and obtaining the search operation of the current optimal machine learning model from those data; the output module is used for judging whether a preset termination condition is met, and when it is not met, modifying the current optimal machine learning model according to the current optimal machine learning model and its search operation and sending the result to the training module as the initial machine learning model; when the termination condition is met, outputting the current optimal machine learning model.
All relevant contents of the steps in the foregoing embodiments of the feedback-based machine learning model searching method may be cited in the functional descriptions of the corresponding modules of the feedback-based machine learning model search system in the embodiment of the present invention, and are not repeated here.
The division of modules in the embodiments of the present invention is schematic and merely a logical function division; other division manners are possible in actual implementation. In addition, the functional modules in the embodiments of the present invention may be integrated in one processor, may exist separately and physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in hardware or as software functional modules.
In yet another embodiment of the present invention, a computer device is provided that includes a processor and a memory, the memory being used for storing a computer program including program instructions and the processor being used for executing the program instructions stored in the computer storage medium. The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. It is the computational and control core of the terminal, adapted to load and execute one or more instructions within a computer storage medium to implement the corresponding method flow or functions; the processor of the embodiments of the present invention may be used for the operation of the feedback-based machine learning model search method.
In yet another embodiment of the present invention, a storage medium is provided, specifically a computer-readable storage medium (Memory), which is a memory device in a computer device used for storing programs and data. It can be understood that the computer-readable storage medium here may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer-readable storage medium provides a storage space storing the operating system of the terminal. One or more instructions, which may be one or more computer programs (including program code), are also stored in the storage space and are adapted to be loaded and executed by the processor. The computer-readable storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory. The one or more instructions stored in the computer-readable storage medium may be loaded and executed by a processor to implement the corresponding steps of the feedback-based machine learning model search method in the above embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the specific embodiments of the invention without departing from its spirit and scope, and such modifications and equivalents are intended to be covered by the claims.

Claims (5)

1. An image classification method, characterized by comprising the steps of:
s1: acquiring initial machine learning model parameters, and constructing an initial machine learning model according to the initial machine learning model parameters;
s2: training an initial machine learning model through a preset training data set to obtain training feedback data and training scores of the initial machine learning model; the training data set is an image data set;
s3: when the current optimal machine learning model exists, acquiring the training score of the current optimal machine learning model, and updating the current optimal machine learning model to whichever of the current optimal machine learning model and the initial machine learning model has the higher training score; otherwise, taking the initial machine learning model as the current optimal machine learning model;
s4: acquiring training feedback data of a current optimal machine learning model, and acquiring search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model;
s5: judging whether a preset termination condition is met, when the termination condition is not met, modifying the current optimal machine learning model according to the current optimal machine learning model and the search operation of the current optimal machine learning model, and returning to S2 as an initial machine learning model; outputting a current optimal machine learning model when the termination condition is met;
S6: acquiring and inputting an image to be classified into a current optimal machine learning model to obtain the category of the image to be classified;
the initial machine learning model parameters comprise architecture information, hyperparameter information and training information, wherein the architecture information comprises a network structure and network layer relations, the hyperparameter information comprises an initializer and an activation function, and the training information comprises an optimizer and a learning rate;
the training score is the highest accuracy of the initial machine learning model during training; the search operation comprises changing the model architecture, changing the model hyperparameters and changing the model training configuration; and the preset termination condition is that the number of search operations reaches a preset search-operation-count threshold value or the search time reaches a preset search-time threshold value;
the training feedback data comprises architecture information, accuracy, gradients of each layer and parameter weights;
the specific method for obtaining the search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model comprises the following steps:
determining the architecture type, convergence type, gradient type and weight type of the current optimal machine learning model according to the architecture information, accuracy, gradient of each layer and parameter weight of the current optimal machine learning model;
Obtaining a condition combination of the current optimal machine learning model according to the architecture type, the convergence type, the gradient type and the weight type of the current optimal machine learning model, and obtaining a preset searching operation under the condition combination to serve as the searching operation of the current optimal machine learning model.
2. The image classification method according to claim 1, wherein the specific method for determining the architecture type, the convergence type, the gradient type and the weight type of the current optimal machine learning model according to the architecture information, the accuracy, the gradients of each layer and the parameter weight of the current optimal machine learning model is as follows:
when the architecture information of the current optimal machine learning model is a Resnet architecture, the architecture type of the current optimal machine learning model is the Resnet architecture; when the architecture information of the current optimal machine learning model is an Xception architecture, the architecture type of the current optimal machine learning model is the Xception architecture; when the architecture information of the current optimal machine learning model is an EfficientNet architecture, the architecture type of the current optimal machine learning model is the EfficientNet architecture; otherwise, the architecture type of the current optimal machine learning model is other architectures;
when the maximum improvement of the accuracy of the current optimal machine learning model between two adjacent training steps does not exceed a preset first threshold value, the convergence type of the current optimal machine learning model is slow convergence; otherwise, the convergence type of the current optimal machine learning model is normal convergence;
when the gradients of the layers of the current optimal machine learning model gradually increase from the output layer to the input layer and the increase exceeds a preset second threshold value, the gradient type of the current optimal machine learning model is an exploding gradient; when the gradients of the layers of the current optimal machine learning model gradually decrease from the output layer to the input layer and the decrease exceeds a preset third threshold value, the gradient type of the current optimal machine learning model is a vanishing gradient; when the gradients of the layers of the current optimal machine learning model contain zero values exceeding a certain proportion and the activation function is a ReLU activation function, the gradient type of the current optimal machine learning model is a dead gradient; otherwise, the gradient type of the current optimal machine learning model is a normal gradient;
when NaN values exist in the parameter weights of the current optimal machine learning model, the weight type of the current optimal machine learning model is NaN weight; otherwise, the weight type of the current optimal machine learning model is a normal weight.
3. An image classification system, comprising:
the acquisition module is used for acquiring initial machine learning model parameters and constructing an initial machine learning model according to the initial machine learning model parameters;
the training module is used for training the initial machine learning model through a preset training data set to obtain training feedback data and training scores of the initial machine learning model; the training data set is an image data set;
the updating module is used for, when the current optimal machine learning model exists, acquiring the training score of the current optimal machine learning model and updating the current optimal machine learning model to whichever of the current optimal machine learning model and the initial machine learning model has the higher training score; otherwise, taking the initial machine learning model as the current optimal machine learning model;
the operation determining module is used for acquiring training feedback data of the current optimal machine learning model and obtaining search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model;
the output module is used for judging whether a preset termination condition is met or not, and when the termination condition is not met, the current optimal machine learning model is modified according to the current optimal machine learning model and the search operation of the current optimal machine learning model and is used as an initial machine learning model to be sent to the training module; outputting a current optimal machine learning model when the termination condition is met;
The classification module is used for acquiring and inputting the images to be classified into the current optimal machine learning model to obtain the categories of the images to be classified;
the initial machine learning model parameters comprise architecture information, hyperparameter information and training information, wherein the architecture information comprises a network structure and network layer relations, the hyperparameter information comprises an initializer and an activation function, and the training information comprises an optimizer and a learning rate;
the training score is the highest accuracy of the initial machine learning model during training; the search operation comprises changing the model architecture, changing the model hyperparameters and changing the model training configuration; and the preset termination condition is that the number of search operations reaches a preset search-operation-count threshold value or the search time reaches a preset search-time threshold value;
the training feedback data comprises architecture information, accuracy, gradients of each layer and parameter weights;
the specific method for obtaining the search operation of the current optimal machine learning model according to the training feedback data of the current optimal machine learning model comprises the following steps:
determining the architecture type, convergence type, gradient type and weight type of the current optimal machine learning model according to the architecture information, accuracy, gradient of each layer and parameter weight of the current optimal machine learning model;
Obtaining a condition combination of the current optimal machine learning model according to the architecture type, the convergence type, the gradient type and the weight type of the current optimal machine learning model, and obtaining a preset searching operation under the condition combination to serve as the searching operation of the current optimal machine learning model.
4. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the image classification method according to any one of claims 1 to 2 when the computer program is executed.
5. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the image classification method according to any one of claims 1 to 2.
CN202111620457.9A 2021-12-27 2021-12-27 Feedback-based machine learning model searching method, system, equipment and medium Active CN114254764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111620457.9A CN114254764B (en) 2021-12-27 2021-12-27 Feedback-based machine learning model searching method, system, equipment and medium


Publications (2)

Publication Number Publication Date
CN114254764A CN114254764A (en) 2022-03-29
CN114254764B true CN114254764B (en) 2024-04-05

Family

ID=80798358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111620457.9A Active CN114254764B (en) 2021-12-27 2021-12-27 Feedback-based machine learning model searching method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN114254764B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657805A (en) * 2018-12-07 2019-04-19 泰康保险集团股份有限公司 Hyper parameter determines method, apparatus, electronic equipment and computer-readable medium
CN110991658A (en) * 2019-11-28 2020-04-10 重庆紫光华山智安科技有限公司 Model training method and device, electronic equipment and computer readable storage medium
CN111260073A (en) * 2020-01-09 2020-06-09 京东数字科技控股有限公司 Data processing method, device and computer readable storage medium
WO2020160252A1 (en) * 2019-01-30 2020-08-06 Google Llc Task-aware neural network architecture search
CN113077057A (en) * 2021-04-20 2021-07-06 中国科学技术大学 Unbiased machine learning method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DLPF: A Parallel Deep Learning Programming Framework Based on Heterogeneous Architecture; Wang Yueqing, Dou Yong, Lyu Qi, Li Baofeng, Li Teng; Journal of Computer Research and Development; 2016-06-15 (No. 06); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant