CN117116370A

CN117116370A - Chemical reaction yield prediction method and electronic equipment

Info

Publication number: CN117116370A
Application number: CN202311018103.6A
Authority: CN
Inventors: 杨昊澎
Original assignee: Guangzhou Biaozhi Future Science And Technology Co ltd
Current assignee: Guangzhou Biaozhi Future Science And Technology Co ltd
Priority date: 2023-08-11
Filing date: 2023-08-11
Publication date: 2023-11-24

Abstract

The invention relates to the technical field of chemical reaction yield prediction, and in particular discloses a chemical reaction yield prediction method and an electronic device. The method comprises the following steps: constructing an instantiation model; encoding sample data to generate a reaction vector; acquiring a hyperparameter combination and initializing the instantiation model with it to obtain an initialization model; inputting the reaction vector into the initialization model for operation to obtain model parameters, and obtaining the corresponding evaluation index; reporting the evaluation index to a hyperparameter adjustment tool; screening out an excellent model according to the evaluation indexes corresponding to the initialization models; and saving the model according to the hyperparameter combination and the model parameters of the excellent model. By training and saving a plurality of machine learning models, the invention can obtain a model that is better suited to the chemical reaction, which improves the accuracy of yield prediction; the hyperparameter combinations need no manual adjustment, which further improves the efficiency and quality of model training, so the method has high practical and promotional value.

Description

Chemical reaction yield prediction method and electronic equipment

Technical Field

The invention relates to the technical field of chemical reaction yield prediction, in particular to a chemical reaction yield prediction method and electronic equipment.

Background

Chemical reaction yield prediction model training systems have been developed to improve the accuracy of yield prediction, so that reaction conditions can be optimized, yields improved, and costs reduced. Such a training system can address the following problems:

1. Reducing cost: predicting the reaction yield makes it easier to select the optimal reaction conditions, which lowers the reaction cost and makes chemical synthesis more economical; it also reduces trial-and-error cost, improves the efficiency of chemical experiments, and lowers their failure rate, saving time and money;

2. Improving yield: by optimizing the reaction conditions, selecting the catalyst and the ligand, and so on, the reaction yield is improved and the chemical synthesis process becomes more efficient.

An existing method for constructing a yield prediction model, and the corresponding yield prediction method, predict the yield with a random forest algorithm: a factor mapping module maps the multiple factors in the training data that may affect the yield into a plurality of factor sets, and a model construction module builds a random forest model from the processed training data and weights.

However, the existing approach has the following shortcomings: (1) reactants, products and reaction conditions are described by a single type of descriptor; (2) the accuracy of the machine learning model used, a random forest model, is limited; (3) the hyperparameters of a machine learning model strongly influence its performance, and adjusting them manually is inefficient.

Therefore, a new solution to the above-mentioned problems is needed for those skilled in the art.

Disclosure of Invention

Aiming at the technical problems in the prior art, the invention provides a chemical reaction yield prediction method and electronic equipment.

The invention includes a chemical reaction yield prediction method, comprising:

acquiring configuration information of a user, and constructing an instantiation model according to the configuration information;

reading sample data of chemical yield, and encoding the sample data according to a preset reaction vector generation strategy to generate a reaction vector;

acquiring a hyperparameter combination from a preset hyperparameter adjustment tool, and initializing the instantiation model according to the hyperparameter combination to obtain an initialization model;

inputting the reaction vector into the initialization model for operation to obtain model parameters, and obtaining a corresponding evaluation index according to a preset evaluation strategy;

reporting the evaluation index to the hyperparameter adjustment tool;

acquiring a hyperparameter combination from the hyperparameter adjustment tool again, and repeating the subsequent steps until a preset stopping condition is reached;

screening out an excellent model according to a preset model screening strategy and the evaluation indexes corresponding to the obtained initialization models;

and saving the model according to the hyperparameter combination and the model parameters of the excellent model.

Further, obtaining configuration information of the user, and constructing an instantiation model according to the configuration information, including:

reading configuration information of a user from a set path in a project folder; the configuration information at least comprises model names, the number N of training models and training duration;

and reading model data from the project folder according to the model names in the configuration information, and acquiring the hyperparameter search space of the corresponding model.

Further, reading sample data of chemical yield, and encoding the sample data according to a preset reaction vector generation strategy to generate a reaction vector; comprising the following steps:

reading sample data, and dividing the sample data into a training sample set and a test sample set according to a set proportion; the training sample set and the test sample set both contain reactants, products and reaction conditions;

and sequentially encoding reactants, products and reaction conditions in the training sample set and the test sample set according to a reaction vector generation strategy to generate a reaction vector corresponding to each sample data.

Further, when a hyperparameter combination is acquired from the preset hyperparameter adjustment tool, the hyperparameter combination is the optimal parameter set found by the hyperparameter adjustment tool after traversing the hyperparameter search space with a preset search algorithm; or, the hyperparameter combination is the optimal parameter set found by the hyperparameter adjustment tool after traversing the hyperparameter search space with a preset search algorithm in combination with the evaluation index.

Further, inputting the reaction vector into an initialization model for operation to obtain model parameters, and obtaining corresponding evaluation indexes according to a preset evaluation strategy; comprising the following steps:

inputting the training set reaction vector into an initialization model for training to obtain model parameters;

generating a training model according to the model parameters;

inputting the reaction vector of the test set into a training model for testing to obtain a prediction result;

and evaluating the training model according to the evaluation strategy, the test set reaction vector and the prediction result to obtain an evaluation index.

Further, evaluating the training model according to the evaluation strategy, the test set reaction vector and the prediction result to obtain an evaluation index comprises:

when the training model is a classification model,

calculating the model accuracy $\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$, wherein TP (true positives) represents the number of samples whose true case is positive and whose predicted result is positive, FN (false negatives) represents the number whose true case is positive and whose predicted result is negative, FP (false positives) represents the number whose true case is negative and whose predicted result is positive, and TN (true negatives) represents the number whose true case is negative and whose predicted result is negative;

calculating the model precision $\mathrm{Precision} = \frac{TP}{TP + FP}$;

calculating the model recall $\mathrm{Recall} = \frac{TP}{TP + FN}$;

calculating the model performance parameter $F_1 = \frac{2PR}{P + R}$, wherein P represents the model Precision and R represents the model Recall;

calculating the model AUC value $\mathrm{AUC} = \frac{\sum I(pred_{pos} > pred_{neg})}{pos_{num} \cdot neg_{num}}$, wherein the sum runs over all pairs of one positive example and one negative example, $pred_{pos}$ represents the predicted result for the positive example, $pred_{neg}$ represents the predicted result for the negative example, $pos_{num}$ represents the number of positive examples in the true case, and $neg_{num}$ represents the number of negative examples;

when the training model is a regression model,

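calculating the mean absolute error of the model $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, wherein $y_i$ represents the predicted label value of the i-th sample, $\hat{y}_i$ represents the true label value of the i-th sample, and n represents the total number of samples;

calculating the maximum error of the model $\mathrm{MaxError} = \max_{i} |y_i - \hat{y}_i|$;

calculating the root mean square error of the model $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$;

calculating the coefficient of determination of the model $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}$, wherein $\bar{y}$ represents the average of the n true labels;

calculating the Pearson correlation coefficient of the model $r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}}$, wherein x and y represent the values of the true result and of the predicted result respectively, and $m_x$ and $m_y$ represent the means of x and y respectively;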

calculating the Kendall correlation coefficient $\tau$ of the model from $N_1$ and $N_2$, wherein $N_1$ represents the number of pairs on which the true case and the predicted result are concordant, and $N_2$ represents the number of pairs on which the true case and the predicted result are discordant.

Further, the method also comprises the following steps:

if the type of the instantiation model is a classification model, processing the reaction vector with a preset oversampling algorithm to obtain minority-class reaction vectors;

inputting the original reaction vector and the minority-class reaction vectors together into the initialization model for operation;

and/or,

performing dimension reduction operation on the reaction vector through a preset dimension reduction algorithm to obtain a low-dimension reaction vector;

and inputting the low-dimensional reaction vector into an initialization model for operation.

Further, the evaluation indexes comprise a main evaluation index and a secondary evaluation index; screening an excellent model according to a preset model screening strategy and an evaluation index corresponding to the obtained initialization model; comprising the following steps:

screening a plurality of training models with top ranking according to the main evaluation index;

and screening an optimal training model from the training models according to the secondary evaluation indexes to serve as an excellent model.

The invention also includes an electronic device comprising:

a memory for storing a computer program;

and a processor for implementing the chemical reaction yield prediction method when executing the computer program.

According to the chemical reaction yield prediction method and the electronic device of the invention, an instantiation model is constructed from the acquired configuration information of the user, and the sample data of chemical yield that has been read is encoded to generate a reaction vector. A hyperparameter combination is acquired from a preset hyperparameter adjustment tool and used to initialize the instantiation model, giving an initialization model; the reaction vector is input into the initialization model for operation to obtain model parameters, and a corresponding evaluation index is obtained according to a preset evaluation strategy. Until the preset stopping condition is met, a hyperparameter combination is acquired from the hyperparameter adjustment tool again and the operation is repeated to obtain further model parameters and evaluation indexes. Once the stopping condition is met, an excellent model is screened out according to a preset model screening strategy and saved, and the saved model can be used to predict the yield of the chemical reaction. By training and saving a plurality of machine learning models, the invention can obtain a model that is better suited to the chemical reaction, which improves the accuracy of yield prediction. Because each hyperparameter combination is a better combination derived from the evaluation indexes of the models already trained, no manual adjustment is required, which further improves the efficiency and quality of model training; the method therefore has high practical and promotional value.

Drawings

For a clearer description of embodiments of the invention or of solutions in the prior art, the drawings which are used in the description of the embodiments or of the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained from them without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart showing steps of a chemical reaction yield prediction method according to an embodiment of the present invention;

FIG. 2 is a flowchart showing a chemical reaction yield prediction method according to an embodiment of the present invention;

FIG. 3 is a flowchart showing a chemical reaction yield prediction method according to an embodiment of the present invention;

FIG. 4 is a flowchart showing a chemical reaction yield prediction method according to an embodiment of the present invention;

FIG. 5 is a structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

The present invention is directed to a chemical reaction yield prediction method, as shown in fig. 1, comprising:

Step S10: acquiring configuration information of the user, and constructing an instantiation model according to the configuration information.

Before this step is carried out, model data for a plurality of machine learning models is created, and one or more specified instantiation models are then constructed from these models according to the configuration information of the user. The machine learning models may include XGBoost (eXtreme Gradient Boosting), random forest, SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and the like, and fall into two main types, regression models and classification models. A classification model may be used to predict qualitatively whether a chemical reaction gives any yield, and a regression model may be used to predict quantitatively the value of the yield.

Specifically, as shown in fig. 2, step S10 includes:

Step S101: reading the configuration information of the user from a set path in the project folder. The configuration information comprises at least the model names, the number N of training models, and the training duration.

The project folder in this embodiment contains the model data of the various machine learning models. The configuration information of the user is read from the set path to obtain the user's configuration requirements, such as the model names specified by the user, the number N of training models to be obtained by training, and the total training duration. The user may also include other items in the configuration information as needed, including but not limited to: (1) the type of model (classification model or regression model); (2) whether the model is to be saved after training; (3) whether the model is only trained and not tested; (4) whether the model is only tested and not trained; (5) model parameter adjustment information; (6) the number of parallel model training tasks; (7) a timestamp; (8) the number of models to save after parameter adjustment; (9) the proportions of the training and test sets; (10) a random number seed; (11) parameters related to the sample data (data set name, data set file name), and so on.

Step S102: reading model data from the project folder according to the model names in the configuration information, and acquiring the hyperparameter search space of the corresponding model.

The corresponding model data is read according to the model name in the configuration information, and the hyperparameter search space of that model is acquired for use in subsequent model training and testing.
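As a minimal sketch of steps S101 and S102, the configuration and the per-model search space could be represented as follows. The JSON format, the key names (`model_names`, `n_models`, `max_seconds`, `test_ratio`, `seed`), the default values and the `SEARCH_SPACES` registry are all assumptions made for illustration; the source does not specify the file format or the key names.

```python
import json

# Hypothetical registry: one hyperparameter search space per model name.
SEARCH_SPACES = {
    "random_forest": {"n_estimators": [100, 200, 500], "max_depth": [4, 8, 16]},
    "knn": {"n_neighbors": [3, 5, 7, 9]},
}

# Hypothetical defaults used when the user leaves a field out.
DEFAULTS = {"n_models": 10, "max_seconds": 3600, "test_ratio": 0.2, "seed": 0}


def load_config(text):
    """Parse the user configuration (JSON text) and fill in defaults."""
    cfg = dict(DEFAULTS)
    cfg.update(json.loads(text))
    if "model_names" not in cfg:
        raise ValueError("configuration must name at least one model")
    return cfg


def search_spaces_for(cfg):
    """Look up the search space of every model named in the configuration."""
    return {name: SEARCH_SPACES[name] for name in cfg["model_names"]}


cfg = load_config('{"model_names": ["knn"], "n_models": 20}')
spaces = search_spaces_for(cfg)
```

Keeping the search spaces in a registry keyed by model name mirrors the idea that the model name in the configuration selects both the model and its search space.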

After the instantiation model construction is completed, the next step is performed.

Step S20: reading sample data of chemical yield, and encoding the sample data according to a preset reaction vector generation strategy to generate a reaction vector.

The sample data of chemical yield in this embodiment can be chosen by the user; like the configuration information, the stored sample data is read from a set path in the project folder. The sample data is then processed to obtain a reaction vector that the instantiation model can read and process.

Specifically, as shown in fig. 3, step S20 includes:

Step S201: reading sample data, and dividing the sample data into a training sample set and a test sample set according to a set proportion; both the training sample set and the test sample set contain reactants, products, and reaction conditions.

The set proportion used to divide the samples may be included in the configuration information of the user acquired in step S10; if it is not, the division is performed according to a proportion preset by the system. For example, with 10000 samples in total and a set proportion of 8:2, 8000 samples are assigned to the training sample set and the remaining 2000 to the test sample set. The division may be random or follow another scheme, which is not specifically limited here.
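The 8:2 example can be sketched as a random split. The function name `split_samples` and the use of a fixed seed are illustrative assumptions; the source leaves the division scheme open.

```python
import random


def split_samples(samples, train_parts=8, test_parts=2, seed=0):
    """Randomly split samples into training and test sets by a ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


# Integers stand in for the 10000 reaction samples of the example.
train, test = split_samples(range(10000))
```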

Each sample in the training sample set and the test sample set contains reactants, products, and reaction conditions. All sample data used for the same model belong to the same chemical reaction, but the specific reactants and products differ from sample to sample, as do the reaction conditions: for example, whether a catalyst is used and its type and amount, whether a ligand or solvent is used and their types and amounts, the way the ligand and solvent are added, and so on.

Step S202: sequentially encoding the reactants, products and reaction conditions in the training sample set and the test sample set according to the reaction vector generation strategy to generate a reaction vector corresponding to each sample.

The reaction vector generation strategy of the embodiment of the invention comprises the following steps:

calculating the molecular fingerprints of the reactants and products in the training sample set and the test sample set according to a molecular fingerprint algorithm, and concatenating the molecular fingerprints into a reaction vector;

encoding each reaction condition in the training sample set and the test sample set according to a specific reaction condition encoding scheme, and merging the codes into the reaction vector to obtain the reaction vector corresponding to the sample data.

The optional molecular fingerprint algorithms in this step include Morgan2, Morgan3 and the like, and the reaction conditions are encoded by one-hot encoding.

The algorithms used by the reaction vector generation strategy in this embodiment can be selected according to the chemical reaction; that is, different types of models and different reaction vector encodings may be used when obtaining models for different chemical reactions.
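The concatenation of fingerprints with one-hot encoded conditions might look like the sketch below. It assumes the molecular fingerprints have already been computed as bit lists by a cheminformatics toolkit; the toy 4-bit fingerprints, the condition names and the vocabularies are made up for illustration.

```python
def one_hot(value, vocabulary):
    """Return a one-hot list for value over an ordered vocabulary."""
    return [1 if value == item else 0 for item in vocabulary]


def reaction_vector(reactant_fps, product_fps, conditions, vocabularies):
    """Concatenate fingerprints, then append one-hot encoded conditions."""
    vector = []
    for fingerprint in reactant_fps + product_fps:
        vector.extend(fingerprint)
    # Sorting the condition names fixes the column order for every sample.
    for name in sorted(vocabularies):
        vector.extend(one_hot(conditions.get(name), vocabularies[name]))
    return vector


# Hypothetical vocabularies of reaction conditions.
vocabularies = {"catalyst": ["Pd", "Ni", "none"], "solvent": ["DMF", "THF"]}
vec = reaction_vector(
    reactant_fps=[[1, 0, 1, 0], [0, 0, 1, 1]],  # toy 4-bit fingerprints
    product_fps=[[1, 1, 0, 0]],
    conditions={"catalyst": "Ni", "solvent": "THF"},
    vocabularies=vocabularies,
)
```

Every sample must use the same fingerprint length and the same vocabularies, otherwise the vectors of different samples would not line up column by column.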

Step S30: acquiring a hyperparameter combination from a preset hyperparameter adjustment tool, and initializing the instantiation model according to the hyperparameter combination to obtain an initialization model.

The preset hyperparameter adjustment tool in this embodiment uses a search algorithm to find the optimal hyperparameter combination; the search algorithm may be TPE (Tree-structured Parzen Estimator), Grid Search, BOHB (Bayesian Optimization and Hyperband), or another algorithm. The hyperparameter adjustment tool may be NNI (Neural Network Intelligence), which adjusts hyperparameters automatically when there are many hyperparameters and adjusting them is time-consuming.
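Of the search algorithms named, Grid Search is the simplest to illustrate: it enumerates every combination in the search space. The sketch below is a plain-Python stand-in, not the NNI implementation, and the search space shown is hypothetical. Adaptive algorithms such as TPE or BOHB would instead choose the next combination from earlier results.

```python
import itertools


def grid(search_space):
    """Yield every hyperparameter combination in the search space."""
    names = sorted(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


# Hypothetical search space for a tree-ensemble model.
space = {"n_estimators": [100, 200], "max_depth": [4, 8, 16]}
combos = list(grid(space))
```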

Step S40: inputting the reaction vector into an initialization model for operation, obtaining model parameters, and obtaining corresponding evaluation indexes according to a preset evaluation strategy.

Specifically, as shown in fig. 4, step S40 includes:

Step S401: inputting the training set reaction vector into the initialization model for training to obtain model parameters.

The invention aims to predict the yield of the chemical reaction, so that the information corresponding to reactants, products and reaction conditions in the reaction vector of the training set is used as the input of an initialization model, and the information corresponding to the yield is used as the output of the initialization model, thereby training the initialization model. And obtaining model parameters of the initialization model after training is completed.

Step S402: generating a training model according to the model parameters.

After the model parameters are obtained in step S401, a corresponding training model is generated according to the model parameters and the initialization model.

Step S403: inputting the test set reaction vector into the training model for testing to obtain a prediction result.

The data in the test set reaction vector that corresponds to the reactants, products and reaction conditions is taken as the input of the training model, and the training model predicts the yield of the chemical reaction to give the prediction result.

It should be noted that the test set reaction vector also carries the true result (also referred to as the true case).

Step S404: evaluating the training model according to the evaluation strategy, the test set reaction vector and the prediction result to obtain an evaluation index.

The evaluation index in this step is obtained by combining the true case and the predicted case for the test set reaction vector, and it reflects the predictive performance of the training model.

Specifically, the evaluation strategy in the embodiment of the invention comprises the following steps:

when the training model is a classification model:

(1) Calculating the model accuracy $\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$, where TP (true positives) represents the number of samples whose true case is positive and whose predicted result is positive, FN (false negatives) represents the number whose true case is positive and whose predicted result is negative, FP (false positives) represents the number whose true case is negative and whose predicted result is positive, and TN (true negatives) represents the number whose true case is negative and whose predicted result is negative.

The model Accuracy takes values in [0,1] and gives the proportion of all test samples that are predicted correctly.

(2) Calculating the model precision $\mathrm{Precision} = \frac{TP}{TP + FP}$.

The model Precision takes values in [0,1] and represents the probability that a sample predicted to be positive is actually positive; in yield prediction, for example, it is the proportion of samples predicted to have a yield that actually have one.

(3) Calculating the model recall $\mathrm{Recall} = \frac{TP}{TP + FN}$.

The model Recall takes values in [0,1] and represents the probability that a sample that is actually positive is predicted to be positive; in yield prediction, for example, it is the proportion of samples that actually have a yield that are predicted to have one.

(4) Calculating the model performance parameter $F_1 = \frac{2PR}{P + R}$, where P represents the model Precision and R represents the model Recall.

The model performance parameter $F_1$ takes values in [0,1] and is the harmonic mean of Precision and Recall. The higher the value of $F_1$, the more of the positive cases the model predicts correctly, and the better the model performs.

(5) Calculating the model AUC value $\mathrm{AUC} = \frac{\sum I(pred_{pos} > pred_{neg})}{pos_{num} \cdot neg_{num}}$, where the sum runs over all pairs of one positive example and one negative example, $pred_{pos}$ represents the predicted result for the positive example, $pred_{neg}$ represents the predicted result for the negative example, $pos_{num}$ represents the number of positive examples in the true case, and $neg_{num}$ represents the number of negative examples.

The model AUC takes values in [0,1]. Its meaning is as follows: given a randomly chosen positive sample and a randomly chosen negative sample, AUC is the probability that the model assigns the positive sample a higher probability of being positive than it assigns the negative sample. If samples are classified completely at random, AUC is close to 0.5, so AUC = 0.5 means the classifier does no better than random; a trained model generally has AUC > 0.5. The higher the AUC, the less often a reaction that actually has no yield is falsely reported as having one in yield prediction.
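The classification indexes can be sketched in plain Python using their standard definitions. Labels are assumed to be 1 for positive (has yield) and 0 for negative. The AUC function uses the pairwise-ranking reading given in the explanation above, and counting a tied pair as half correct is an added assumption that the source does not address.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1 from hard 0/1 predictions."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores):
    """Share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Assumption: a tied pair counts as half a correct ranking.
    correct = sum(1.0 if p > q else 0.5 if p == q else 0.0
                  for p in pos for q in neg)
    return correct / (len(pos) * len(neg))


metrics = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
ranking_quality = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch but slow for large test sets.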

And when the training model is a regression model:

(1) Calculating the mean absolute error of the model $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$, where $y_i$ represents the predicted label value of the i-th sample, $\hat{y}_i$ represents the true label value of the i-th sample, and n represents the total number of samples.

The mean absolute error MAE takes values in [0, +∞); the closer the value is to 0, the smaller the error.

(2) Calculating the maximum error of the model $\mathrm{MaxError} = \max_{i} |y_i - \hat{y}_i|$.

The maximum error MaxError takes values in [0, +∞); the closer the value is to 0, the smaller the error.

(3) Calculating the root mean square error of the model $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$.

The root mean square error RMSE takes values in [0, +∞); the closer the value is to 0, the smaller the error.

(4) Calculating the coefficient of determination of the model $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}$, where $\bar{y}$ represents the average of the n true labels.

The coefficient of determination $R^2$ takes values in (−∞, 1]; the closer the value is to 1, the better.

(5) Calculating the Pearson correlation coefficient of the model $r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}}$, where x and y represent the values of the true result and of the predicted result respectively, and $m_x$ and $m_y$ represent the means of x and y respectively.

The Pearson correlation coefficient r takes values in [-1,1]. The larger the absolute value of the coefficient, the stronger the correlation between the two sets of data; a positive value indicates positive correlation, a negative value indicates negative correlation, and 0 indicates no correlation.

(6) Calculating the Kendall correlation coefficient $\tau$ of the model from $N_1$ and $N_2$, where $N_1$ represents the number of pairs on which the true case and the predicted result are concordant, and $N_2$ represents the number of pairs on which the true case and the predicted result are discordant.

The Kendall correlation coefficient $\tau$ takes values in [-1,1]. The larger the absolute value of the coefficient, the stronger the correlation between the two sets of data; a positive value indicates positive correlation, a negative value indicates negative correlation, and 0 indicates no correlation.
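The regression indexes can likewise be sketched with their standard definitions. The code names its arguments `y_true` and `y_pred` for readability. For the Kendall coefficient it assumes the simple form (concordant − discordant) / (concordant + discordant) with tied pairs skipped, which is one common variant and may differ from the exact formula intended in the source.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, maximum error, RMSE and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "max_error": max(abs(e) for e in errors),
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kendall(x, y):
    """(concordant - discordant) / (concordant + discordant); ties skipped."""
    concordant = discordant = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)


m = regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 5])
r = pearson([1, 2, 3], [2, 4, 6])
tau = kendall([1, 2, 3, 4], [1, 3, 2, 4])
```

The sketch does not guard against constant inputs, for which the denominators of R², Pearson and Kendall become zero.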

After the evaluation index of the corresponding model is obtained, step S50 is performed.

Step S50: reporting the evaluation index to a hyper-parameter adjustment tool.

Step S60: acquiring a hyperparameter combination from the hyperparameter adjustment tool again, and executing the subsequent steps until a preset stopping condition is reached.

When the hyperparameter adjustment tool gives a hyperparameter combination again in this step, it does so on the basis of the evaluation index obtained in the previous step. Hyperparameters are set before a machine learning model is trained and are not model parameters obtained through training, so they need to be optimized: a set of optimal hyperparameters is selected for the model to improve the performance and effect of learning. In the invention, hyperparameter optimization is based on the evaluation index of the previous model.

Therefore, when a hyperparameter combination is acquired from the preset hyperparameter adjustment tool, the hyperparameter combination is either the optimal parameter set found by the hyperparameter adjustment tool after traversing the hyperparameter search space with the preset search algorithm, or the optimal parameter set found by the hyperparameter adjustment tool after traversing the hyperparameter search space with the preset search algorithm in combination with the evaluation index.

If the preset stopping condition has not been met, a hyperparameter combination is acquired from the hyperparameter adjustment tool again and execution continues from step S30: the instantiation model is initialized with the newly acquired hyperparameter combination to obtain an initialization model, the reaction vector is input into the initialization model for operation to obtain model parameters, the corresponding evaluation index is obtained according to the preset evaluation strategy, and finally the evaluation index is reported to the hyperparameter adjustment tool.

In the model training process of this embodiment, training and testing may be performed for a model of a single name, or models of several different names may be trained and tested in parallel. For each model, the preset stopping condition may be the number N of training models set in the configuration information of the user, the total training duration set in the configuration information, or some other preset condition; once it is reached, the hyperparameter adjustment tool stops giving hyperparameter combinations and no further initialization models are generated.
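The loop formed by steps S30 to S60 and its two stopping conditions can be sketched as follows. `RandomSearchTuner` is a toy stand-in for the hyperparameter adjustment tool, not the NNI API: it samples at random and only records the reported indexes, whereas an algorithm such as TPE would use them to choose the next combination. `toy_objective` is a hypothetical replacement for training and evaluating a real model.

```python
import random
import time


class RandomSearchTuner:
    """Toy stand-in for a hyperparameter adjustment tool (random search)."""

    def __init__(self, search_space, seed=0):
        self.search_space = search_space
        self.rng = random.Random(seed)
        self.history = []  # (hyperparameters, evaluation index) pairs

    def suggest(self):
        return {k: self.rng.choice(v) for k, v in self.search_space.items()}

    def report(self, params, metric):
        self.history.append((params, metric))


def tune(tuner, train_and_evaluate, n_models, max_seconds):
    """Run trials until n_models are trained or the time budget is spent."""
    start = time.monotonic()
    while (len(tuner.history) < n_models
           and time.monotonic() - start < max_seconds):
        params = tuner.suggest()             # steps S30 / S60
        metric = train_and_evaluate(params)  # step S40
        tuner.report(params, metric)         # step S50
    return tuner.history


def toy_objective(params):
    # Hypothetical score standing in for model training plus evaluation.
    return -abs(params["max_depth"] - 8)


history = tune(RandomSearchTuner({"max_depth": [2, 4, 8, 16]}),
               toy_objective, n_models=5, max_seconds=10.0)
```

Checking both conditions in the loop header means whichever limit is hit first ends the search, matching the either-or wording of the stopping condition.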

Step S70: and screening out an excellent model according to a preset model screening strategy and an evaluation index corresponding to the obtained initialization model.

And obtaining a plurality of training models through the steps, wherein each training model has a corresponding evaluation index, and selecting an excellent model from the training models according to a preset model screening strategy. The evaluation index in the present embodiment includes a main evaluation index and a sub-evaluation index. The primary and secondary evaluation indexes are set by the skilled person, for example, the default primary evaluation index of the classification model is set as the model performance parameter F1, and the default primary evaluation index of the regression model is set as the root mean square error MSE; setting the default sub-evaluation index of the classification model as AUC and the default sub-evaluation index of the regression model as the determination coefficient R ² The main evaluation index and the sub evaluation index may be reset by those skilled in the art according to the circumstances, and the present invention is not limited thereto.

Screening out an excellent model according to the preset model screening strategy and the evaluation indexes corresponding to the obtained initialization models specifically comprises the following steps:

Step S701: screening out a plurality of top-ranked training models according to the main evaluation index.

For example, if N training models belonging to the classification model are obtained in total, the training models ranked in the top 5% by the model performance parameter F1 are screened out.

Step S702: screening out an optimal training model from the plurality of training models according to the secondary evaluation index to serve as the excellent model.

From the training models ranked in the top 5% by the model performance parameter F1, the training model with the optimal AUC value is screened out as the excellent model.

If the user's configuration information specifies the number of excellent models, a corresponding number of training models are selected as excellent models according to the ranking of the evaluation indexes.
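The two-stage screening of steps S701 and S702 can be sketched as below. The function name, the dictionary layout of a trained model's record and the `higher_is_better` flags are assumptions made for illustration; widening the shortlist when more excellent models are requested than the top fraction contains is also an assumption, since the text does not specify that case.

```python
import math


def screen_models(models, primary, secondary, top_fraction=0.05,
                  n_best=1, higher_is_better=(True, True)):
    """Shortlist by the main index, then rank the shortlist by the secondary."""
    if not models:
        return []
    primary_desc, secondary_desc = higher_is_better
    ranked = sorted(models, key=lambda m: m[primary], reverse=primary_desc)
    # Keep at least n_best candidates (assumption, see lead-in).
    keep = max(n_best, math.ceil(len(ranked) * top_fraction))
    shortlist = ranked[:keep]
    shortlist.sort(key=lambda m: m[secondary], reverse=secondary_desc)
    return shortlist[:n_best]


trained = [
    {"name": "a", "f1": 0.90, "auc": 0.80},
    {"name": "b", "f1": 0.88, "auc": 0.95},
    {"name": "c", "f1": 0.70, "auc": 0.99},
    {"name": "d", "f1": 0.60, "auc": 0.50},
]
# A 50% shortlist is used only because this toy list is so short.
excellent = screen_models(trained, "f1", "auc", top_fraction=0.5)
```

For a regression task ranked by an error measure, `higher_is_better=(False, True)` would sort the main index in ascending order.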

Step S80: storing the model according to the hyperparameter combination and the model parameters of the excellent model.

The stored model can be used for predicting the yield of the same chemical reaction.
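One possible way to store the hyperparameter combination together with the model parameters is sketched below. The file layout, the function names and the use of JSON plus `pickle` are assumptions for illustration; the embodiment does not prescribe a storage format.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model(directory, name, hyperparameters, model_parameters):
    """Persist one excellent model as a JSON file plus a pickle file."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{name}.hyperparameters.json").write_text(
        json.dumps(hyperparameters, indent=2), encoding="utf-8")
    with open(out / f"{name}.parameters.pkl", "wb") as fh:
        pickle.dump(model_parameters, fh)


def load_model(directory, name):
    """Read back the hyperparameters and model parameters saved above."""
    out = Path(directory)
    hyperparameters = json.loads(
        (out / f"{name}.hyperparameters.json").read_text(encoding="utf-8"))
    with open(out / f"{name}.parameters.pkl", "rb") as fh:
        model_parameters = pickle.load(fh)
    return hyperparameters, model_parameters


# Round trip with made-up values in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    save_model(tmp, "rf_trial_7", {"n_estimators": 200}, {"weights": [0.1, 0.9]})
    restored = load_model(tmp, "rf_trial_7")
```

Keeping the hyperparameters in a readable file makes it possible to rebuild the initialization model before loading the learned parameters into it. Pickle files should only be loaded from trusted sources.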

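Specifically, before step S40, the method in the embodiment of the present invention further includes:

if the type of the instantiation model is a classification model, processing the reaction vector through a preset oversampling algorithm to obtain a minority reaction vector; and inputting the original reaction vector and the minority reaction vector together into the initialization model for operation.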

For the training task of a classification model, if a yield imbalance problem occurs during model training, that is, the amounts of data of different categories in the training sample set differ greatly, the reaction vectors can be input into an oversampling algorithm to generate additional data for the categories with fewer original samples, that is, to construct minority reaction vectors. The original reaction vectors and the obtained minority reaction vectors are then input into the initialization model together for operation.
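The oversampling step can be illustrated with the simplest possible variant, random duplication of minority-class samples. This is an assumption: the text does not name the preset oversampling algorithm, and interpolation-based methods such as SMOTE are a common alternative. The function name and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(vectors, labels, seed=0):
    """Return extra minority-class samples that balance the class counts."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    extra_vectors, extra_labels = [], []
    for cls, count in counts.items():
        pool = [v for v, y in zip(vectors, labels) if y == cls]
        # Draw (with replacement) as many copies as the class is short of.
        for _ in range(target - count):
            extra_vectors.append(list(rng.choice(pool)))
            extra_labels.append(cls)
    return extra_vectors, extra_labels


X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8]]
y = ["low", "low", "low", "high"]
minority_X, minority_y = random_oversample(X, y)

# Original and minority reaction vectors are fed to the model together.
train_X = X + minority_X
train_y = y + minority_y
```

Oversampling is normally applied to the training set only, so that the test set keeps the original class distribution.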

And/or, the method further comprises the following steps: performing a dimension reduction operation on the reaction vector through a preset dimension reduction algorithm to obtain a low-dimensional reaction vector; and inputting the low-dimensional reaction vector into the initialization model for operation.

If the data in the training sample set and the test sample set have too many feature dimensions, a preset dimension reduction algorithm can be used to perform a dimension reduction operation on the reaction vector; for example, a low-dimensional reaction vector is obtained by applying the PCA algorithm, so that redundant features are removed, the main features of the data are extracted, and the model training efficiency is improved.
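A minimal PCA sketch based on NumPy's singular value decomposition is given below. The function name and the toy matrix are illustrative assumptions, and a library implementation could be used instead.

```python
import numpy as np


def pca_reduce(vectors, n_components):
    """Project row vectors onto their first n_components principal axes."""
    X = np.asarray(vectors, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components


# Toy reaction vectors whose three features are perfectly correlated.
X = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
low, mean, components = pca_reduce(X, n_components=1)
```

The `mean` and `components` fitted on the training set would then be reused for the test set, as `(test_vectors - mean) @ components.T`, so that both sets are projected into the same low-dimensional space.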

The corresponding model parameters are obtained through the above steps, the corresponding evaluation indexes are obtained according to the preset evaluation strategy, and the subsequent steps are executed.

An embodiment of the present invention further provides an electronic device 200, as shown in fig. 5, including: a memory 201 for storing a computer program; and a processor 202 for implementing the chemical reaction yield prediction method of the above embodiment when executing the computer program. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The memory may include at least: any entity or device capable of carrying the computer program code to the photographing apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), or a random access memory (RAM), such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk.

According to the chemical reaction yield prediction method and the electronic equipment, an instantiation model is constructed from the acquired configuration information of the user, and the read sample data of the chemical yield is encoded to generate a reaction vector. A hyperparameter combination is acquired from a preset hyperparameter adjustment tool, and the instantiation model is initialized to obtain an initialization model. The reaction vector is input into the initialization model for operation to obtain model parameters, and a corresponding evaluation index is obtained according to a preset evaluation strategy. Until the preset stop condition is met, a hyperparameter combination is acquired from the hyperparameter adjustment tool again and the operation continues to obtain model parameters and evaluation indexes. Once the preset stop condition is met, an excellent model is screened out according to a preset model screening strategy and stored, and the stored model can be used for yield prediction of the chemical reaction. According to the invention, a plurality of machine learning models are trained and stored, and a model with a higher degree of adaptation to the chemical reaction can be obtained, so that the accuracy of the model in predicting chemical yield is improved. Because each hyperparameter combination is a better one obtained by taking into account the evaluation indexes of the existing models, no manual adjustment is required, which further improves the efficiency and quality of model training; the method therefore has extremely high value for use and popularization.

The invention has been further described with reference to specific embodiments, but it should be understood that the detailed description is not to be construed as limiting the spirit and scope of the invention; various modifications made to the described embodiments by those skilled in the art after reading this disclosure fall within the scope of protection of the invention.

Claims

1. A method for predicting yield of a chemical reaction, the method comprising:

acquiring a hyperparameter combination from a preset hyperparameter adjustment tool, and initializing the instantiation model according to the hyperparameter combination to obtain an initialization model;

inputting the reaction vector into the initialization model for operation, obtaining model parameters, and obtaining corresponding evaluation indexes according to a preset evaluation strategy;

reporting the evaluation index to the hyperparameter adjustment tool;

acquiring a hyperparameter combination from the hyperparameter adjustment tool again, and executing the subsequent steps until a preset stop condition is reached;

2. The method of claim 1, wherein obtaining configuration information of a user and constructing an instantiation model based on the configuration information, comprises:

3. The method for predicting chemical reaction yield according to claim 2, wherein sample data of chemical yield is read, and the sample data is encoded according to a preset reaction vector generation strategy to generate a reaction vector; comprising the following steps:

and sequentially encoding reactants, products and reaction conditions in the training sample set and the test sample set according to the reaction vector generation strategy to generate a reaction vector corresponding to each sample data.

4. A chemical reaction yield prediction method according to claim 3, wherein when a hyperparameter combination is obtained from a preset hyperparameter adjustment tool, the hyperparameter combination is an optimal parameter set found by the hyperparameter adjustment tool after traversing the hyperparameter search space through a preset search algorithm; or, the hyperparameter combination is an optimal parameter set found by the hyperparameter adjustment tool after traversing the hyperparameter search space through a preset search algorithm and the evaluation index.

5. The method for predicting chemical reaction yield according to claim 4, wherein the reaction vector is input into the initialization model to be operated and model parameters are obtained, and corresponding evaluation indexes are obtained according to a preset evaluation strategy; comprising the following steps:

inputting the training set reaction vector into the initialization model for training to obtain the model parameters;

generating a training model according to the model parameters;

inputting the test set reaction vector into the training model for testing to obtain a prediction result;

and evaluating the training model according to the evaluation strategy, the test set reaction vector and the prediction result to obtain the evaluation index.

6. The method of predicting yield of a chemical reaction according to claim 5, wherein evaluating the training model based on the evaluation strategy, the test set reaction vector, and the prediction result to obtain the evaluation index comprises:

when the training model is a classification model,

calculating the accuracy of the model;

calculating the recall of the model;

calculating the model performance parameter $F1 = \frac{2PR}{P + R}$, wherein P represents the model precision (Precision) and R represents the model recall (Recall);

7. The method of predicting yield of a chemical reaction according to claim 5, wherein evaluating the training model based on the evaluation strategy, the test set reaction vector, and the prediction result to obtain the evaluation index comprises:

when the training model is a regression model,

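calculating the maximum error of the model;

calculating the root mean square error of the model;

calculating the determination coefficient of the model $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$, wherein $y_i$ is the i-th real label, $\hat{y}_i$ is the corresponding predicted value, and $\bar{y}$ represents the average of the n real labels;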

8. The method for predicting yield of a chemical reaction of claim 1, further comprising:

if the type of the instantiation model is a classification model, processing the reaction vector through a preset oversampling algorithm to obtain a minority reaction vector;

inputting the original reaction vector and the minority reaction vector into the initialization model together for operation;

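and/or,

performing a dimension reduction operation on the reaction vector through a preset dimension reduction algorithm to obtain a low-dimensional reaction vector;

and inputting the low-dimensional reaction vector into the initialization model for operation.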

9. The method of predicting yield of a chemical reaction according to claim 5, wherein said evaluation index comprises a main evaluation index and a secondary evaluation index; and screening out an excellent model according to a preset model screening strategy and the evaluation index corresponding to the obtained initialization model comprises:

and screening an optimal training model from a plurality of training models according to the secondary evaluation index to serve as an excellent model.

10. An electronic device, comprising:

a memory for storing a computer program;

a processor for implementing a chemical reaction yield prediction method according to any one of claims 1 to 9 when executing said computer program.