
CN116126995A - Index information generation method and device and computer readable storage medium - Google Patents

Index information generation method and device and computer readable storage medium

Info

Publication number
CN116126995A
CN116126995A
Authority
CN
China
Prior art keywords
data
machine learning
learning model
file
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211488395.5A
Other languages
Chinese (zh)
Inventor
李国冬
李云彬
蒋宁
吴海英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202211488395.5A
Publication of CN116126995A
Pending legal-status Critical Current

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 — Information retrieval of unstructured textual data
    • G06F 16/31 — Indexing; Data structures therefor; Storage structures
    • G06F 16/316 — Indexing structures
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 — Information retrieval of structured data, e.g. relational data
    • G06F 16/21 — Design, administration or maintenance of databases
    • G06F 16/219 — Managing data history or versioning
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 — Information retrieval of unstructured textual data
    • G06F 16/33 — Querying
    • G06F 16/3331 — Query processing
    • G06F 16/334 — Query execution
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification provide an index information generation method and apparatus, and a computer-readable storage medium. The method comprises the following steps: after each machine learning model is trained, acquiring the first-class data and second-class data generated by the training; storing the first-class data of each machine learning model in a model-file data set of the model training platform, and storing the second-class data in a text-file data set of the model training platform; and generating file index information from the model-file data set and the text-file data set, where the file index information includes access-entry information for the first-class data and second-class data corresponding to a machine learning model. Because the models are trained on one model training platform and the data generated during training are continuously collected and stored in a unified way, the method solves the technical problem that team members cannot conveniently view the experimental results of other members of the team, and improves the efficiency with which team members view experimental results.

Description

Index information generation method and device and computer readable storage medium
Technical Field
The present disclosure relates to the field of machine learning, and in particular, to a method and apparatus for generating index information, and a computer readable storage medium.
Background
Machine learning and big data have become increasingly widespread in recent years, and their impact on society keeps expanding. Many industries rely on machine learning algorithms and artificial-intelligence models to make critical decisions that affect businesses and individuals every day. A complete machine learning experiment lifecycle produces many artifacts, such as data sets, model training code, model experiment evaluation-index data, and model files.
In the prior art, in team-based development tasks that involve multiple machine learning models, team members cannot conveniently view the experimental results of other members of the team.
Disclosure of Invention
In view of this, embodiments of the present disclosure aim to provide a method for generating index information, so as to solve the technical problem that, in team-based development tasks involving multiple machine learning models, team members cannot conveniently view the experimental results of other members of the team.
Embodiments of the present specification provide a method for generating index information, applied to a model training platform. The method includes: acquiring the first-class data and second-class data generated by training a machine learning model, where the first-class data includes at least the model file data of the machine learning model, and the second-class data includes at least the code file data of the machine learning model and the metafile data of the model file; storing the first-class data of each machine learning model in a model-file data set of the model training platform, and storing the second-class data in a text-file data set of the model training platform; and generating file index information from the model-file data set and the text-file data set, where the file index information includes access-entry information for the first-class data and second-class data of the machine learning model.
One embodiment of the present specification provides a method for displaying index information. The method includes: receiving index information sent by a model training platform, the index information being obtained by the generation method described above; and forming an index page from the index information, where the index page includes access-entry information for the first-class data and second-class data corresponding to a machine learning model, or text identifiers bound to that access-entry information.
One embodiment of the present specification provides an index information generating apparatus applied to a model training platform that trains a plurality of machine learning models. The apparatus comprises: an acquisition unit for acquiring the first-class data and second-class data generated by machine learning model training, where the first-class data includes at least the model file data of a machine learning model, and the second-class data includes at least the code file data of the machine learning model and the metafile data of the model file; a storage unit for storing the first-class data of each machine learning model in a model-file data set of the model training platform and the second-class data in a text-file data set of the model training platform; and a generating unit for generating file index information from the model-file data set and the text-file data set, where the file index information includes access-entry information for the first-class data and second-class data corresponding to the machine learning model.
An embodiment of the present specification provides an electronic device including a memory storing a computer program and a processor implementing the method described above when executing the computer program.
One embodiment of the present specification provides a computer-readable storage medium having a computer program stored therein, which when executed by a processor, implements the method described above.
In the embodiments provided in this specification, for a team-based development task involving multiple machine learning models, the model training platform trains the models, acquires the corresponding first-class and second-class data after each training run, and stores them in a unified way to form a model-file data set and a text-file data set, from which file index information is built. Team members can therefore conveniently view one another's experimental results through the index information, which enables quick viewing and improves the efficiency of viewing and comparison.
Drawings
Fig. 1 is a schematic application environment diagram of a method for generating index information according to an embodiment of the present disclosure.
Fig. 2 is a flowchart of a method for generating index information according to an embodiment of the present disclosure.
FIG. 3 is a block diagram of a model training platform provided in one embodiment of the present description.
Fig. 4 is a schematic diagram of a working process logic of a model training platform according to an embodiment of the present disclosure.
Fig. 5 is a schematic flow chart of a working process of a model training platform according to an embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating a method for displaying index information according to an embodiment of the present disclosure.
Fig. 7 is a block diagram of an index information generating device according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In the related art, machine learning parses data with certain algorithms and learns from it, thereby obtaining a machine learning model. The obtained model can then be used to infer and predict on new data to accomplish specific tasks, such as classification. The machine learning model may be a neural network model, a linear model, a deep learning model, a support vector machine, or another type of model produced by training. The model is a function, learned from data by some algorithm, that realizes a specific mapping; a machine learning model file captures a particular kind of pattern. Machine learning model files generally include files describing the model structure (for example, the structure of a convolutional neural network), such as meta files, and files describing the model parameters (for example, the connection weights between layers), such as ckpt files.
A complete machine learning task generally involves several kinds of data, such as model training data, model training code files, model evaluation-index files, and model files, so tracking and managing the individual data in one task is very important. In team-based development tasks involving multiple machine learning models, under existing practice some team members develop locally, and after training their model file data and evaluation-index file data simply remain on the local machine or are stored on a file server. Other members develop on a model development server and store the model file data and evaluation-index file data in the cloud, and some members do not manage these data at all. As a result, it is inconvenient for team members to view one another's experimental results, and sharing experimental results between team members is likewise inconvenient.
In summary, there is a need for an index-information generation method that trains multiple machine learning models on a model training platform, acquires the corresponding first-class and second-class data after training, and stores them in a unified way. This solves the technical problem that team members cannot conveniently view the experimental results of other members of the team, enables quick viewing of experimental results, and improves the efficiency of viewing and comparison.
As shown in fig. 1, an embodiment of this specification provides an index-information generation system that may include a terminal and a server. The server may be an electronic device with processing capability. For example, it may be a server of a distributed system, or a system in which several processors, memories, network communication modules, and so on operate together. The server may also be a cloud server, an intelligent cloud-computing server or host employing artificial-intelligence technology, or a cluster formed by several servers. With the development of technology, the server may also take any new technical form capable of realizing the functions of this embodiment, for example a new form of "server" based on quantum computing.
In this embodiment of the present specification, the terminal may be an electronic device having network access capability. Specifically, for example, the terminal may be a desktop computer, a tablet computer, a notebook computer, a smart phone, a digital assistant, a shopping terminal, a television, or the like. Alternatively, the terminal may be software capable of running in the electronic device.
The network may be any type of network that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. The one or more networks may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The index-information generation system may also include one or more databases. For example, a database used by a server may be local to the server, or remote from it and reached via a network-based or dedicated connection. The databases may be of different types; in some embodiments, the server may use a relational database. These databases may store, update, and retrieve data in response to commands.
As shown in fig. 2, one embodiment of the present specification provides a method of generating index information. The index information generation method is applied to a model training platform, and the model training platform is used for training a plurality of machine learning models. The index information generating method may include the following steps.
Step S101: after each machine learning model is trained, acquiring the first-class data and second-class data generated by the training; the first-class data includes at least the model file data of the machine learning model, and the second-class data includes at least the code file data of the machine learning model and the metafile data of the model file.
In some cases, a team-based development task may require multiple different kinds of models, and different versions may exist for the same kind of model. Many kinds of data and corresponding files are involved in developing a particular model. Specifically, training data must be prepared before training is performed; the training data may take the form of data-set files, and different kinds of preprocessing of those files produce different versions of them.
After a number of iterations the loss function converges, yielding the model file data of the machine learning model and the model evaluation-index file data. If analysis of the evaluation-index file data shows that the obtained model's evaluation indices differ greatly from the expected ones, the model may need to be retrained one or more times. Before retraining, the hyperparameters of the model, its network parameters, and its structure may need to be modified, which produces multiple versions of the model code file data. After retraining, the corresponding model file data and evaluation-index file data are again obtained.
From the above, a complete machine learning model training lifecycle involves multiple kinds of data that need to be managed, and the same kind of data may exist in different versions. A unified approach to data version management is therefore needed.
In this embodiment, data may be classified into the first-class data and the second-class data according to its size. The first class includes at least the model file data of a machine learning model. In some embodiments, the model file data differs between machine learning frameworks. For example, a model trained with the Keras framework mainly produces file data in h5 format, which contains the model structure, the model weights, the training configuration, and the optimizer state. A model trained with the TensorFlow framework mainly produces file data in meta format, which stores the model graph structure, and file data in ckpt format, which stores variables such as the network weight parameters.
In the present embodiment, the first class of data may include the model file data of a machine learning model, the model evaluation-index file data, and the model training data. The training data may include image data, video data, text data, and the like used as training samples. The evaluation-index file data records the performance indices used to evaluate the model, which may include Accuracy, Precision, Recall, the P-R Curve (Precision-Recall Curve), the F1 Score, and so on.
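The evaluation indices named above are standard quantities. As a minimal, hedged sketch (the function name and the JSON serialization are illustrative assumptions, not part of the patent), they can be computed from confusion-matrix counts and written out as an evaluation-index file:

```python
import json

def evaluation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the evaluation indices listed above from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Serialize as the kind of metric file the platform could store as first-class data.
metrics = evaluation_metrics(tp=80, fp=10, fn=20, tn=90)
metric_file_contents = json.dumps(metrics, indent=2)
```

The P-R curve would additionally require precision/recall pairs at multiple thresholds; only the scalar indices are shown here.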
The second class of data may include the code file data of a machine learning model and the metafile data of the model files. The code file data is the code actually run in a machine learning framework. The metafile data serves as index data for the first-class data: it may record the specific storage location of the first-class data and its data volume. In some embodiments, a cryptographic hash function is used to compute a hash value over the first-class data; the hash value is stored in the metafile data, and the file name of the first-class data is changed accordingly. The uniqueness of the MD5 value can then be used to distinguish multiple first-class files, and the required first-class data can be conveniently located through the storage location, data volume, and hash value recorded in the metafile data.
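The hash-naming scheme just described can be sketched as follows. This is a hedged illustration: the function name, the metafile layout (JSON with `path`, `size`, and `md5` fields), and the directory arguments are assumptions of this sketch, not details specified in the text.

```python
import hashlib
import json
from pathlib import Path

def store_first_class_file(src: Path, store_dir: Path, meta_dir: Path) -> Path:
    """Rename a first-class file to the hash of its contents and write a small
    metafile recording its storage location, data volume, and hash value."""
    data = src.read_bytes()
    digest = hashlib.md5(data).hexdigest()   # the text above mentions MD5
    stored = store_dir / digest              # the file name becomes the hash value
    stored.write_bytes(data)
    meta = {"path": str(stored), "size": len(data), "md5": digest}
    meta_file = meta_dir / (src.name + ".meta.json")
    meta_file.write_text(json.dumps(meta, indent=2))
    return meta_file
```

The hash value in the metafile is what associates the small, Git-versioned metafile with the bulky first-class file in remote storage.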
In this embodiment, tracking tools for the model training process may be used to obtain the first-class and second-class data; for example, the MLflow tool may track the model training process. To make MLflow applicable to the model training platform, it can undergo secondary development so that, in combination with the HTTP protocol, it is invoked through access requests and can thus track machine learning models deployed on the platform.
In a team-based development task involving multiple machine learning models, the MLflow tool can acquire the first-class and second-class data in a timely manner. In some embodiments, other tracking tools for the model training process, such as TensorBoard, may also be employed.
In this embodiment, as shown in FIG. 3, a development team may train multiple machine learning models on a model training platform whose main purpose is to provide an online programming environment. The platform can be deployed on a server or in a cloud environment, and may include a model training module, a data version control module, and a storage module. Specifically, the model training module can be realized with cloud and container technology, and team members perform model training in it; an MLflow tool is also configured there to track experimental results. The data version control module includes a DVC (Data Version Control) module and a Git (open-source distributed version control system) module, and is used to version the experimental results. The storage module stores the experimental results in a unified way. Specifically, the DVC and Git modules can undergo a certain amount of secondary development and be packaged as a software service that exposes an access port based on the HTTP protocol. When the model training platform needs to call the DVC or Git module, it sends an access request to that port over HTTP; the parameters attached to the request serve as control instructions, causing the DVC or Git module to execute the corresponding function.
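One way such an HTTP-wrapped service could translate request parameters into version-control commands is sketched below. The parameter names (`tool`, `action`, `arg`) and the command table are hypothetical; the text only says that request parameters act as control instructions for the DVC or Git module.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from (tool, action) request parameters to the
# underlying command the wrapped DVC/Git service would execute.
COMMANDS = {
    ("dvc", "add"): ["dvc", "add"],
    ("dvc", "push"): ["dvc", "push"],
    ("git", "commit"): ["git", "commit", "-m"],
    ("git", "push"): ["git", "push", "origin"],
}

def build_command(query_string: str) -> list:
    """Turn e.g. 'tool=dvc&action=add&arg=model.h5' into ['dvc', 'add', 'model.h5']."""
    params = parse_qs(query_string)
    key = (params["tool"][0], params["action"][0])
    cmd = list(COMMANDS[key])
    cmd.extend(params.get("arg", []))  # optional positional arguments
    return cmd
```

A real service would run the resulting command in the experiment's working directory and return its output in the HTTP response; only the dispatch step is shown.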
In this embodiment, referring to fig. 4, an annotator may label samples on the model training platform to obtain a sample data set. An algorithm engineer can use the online programming environment provided by the platform to develop and debug a model, and then train the debugged machine learning model with the sample data set to obtain a training result. The training result includes the model file data of the machine learning model, the evaluation-index file data and hyperparameter file data of the model, and the metafile data corresponding to the model file data and the evaluation-index file data. The algorithm engineer can verify on the platform whether the trained model performs as desired. If it does not, the engineer can manually adjust the model parameters and, after debugging, run the training process again; alternatively, data-optimization suggestions can be fed back so that the sample data in the data set can be improved. If the model achieves the desired effect, its processing can end.
In some embodiments, the tracking tool described above, such as MLflow, may be preconfigured in the online programming environment, as may machine learning frameworks such as PyTorch, TensorFlow, XGBoost, and scikit-learn. The online programming environment may be implemented as follows. When the model training platform runs in a cloud environment, members of the development team enter configuration information through an application program interface (API) or a graphical user interface (GUI), and the configuration information is sent to the platform. The platform then creates a corresponding Docker container according to the configuration information. When training is executed, the container is started, and the GPU and the storage are mounted into it according to the configuration information. The container runs the machine learning framework and the MLflow tool, and reads the data-set files for training as configured. In some embodiments, if a member of the development team needs to modify the code of the machine learning model, the code can be adjusted online using Jupyter Notebook.
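As a hedged sketch of how configuration information could be turned into a container launch, the helper below assembles a `docker run` invocation. The configuration keys (`image`, `gpus`, `mounts`, `port`) and the exposed port are illustrative assumptions; the text only says that a container is created from the configuration and that the GPU and storage are mounted into it.

```python
def docker_run_command(cfg: dict) -> list:
    """Assemble a `docker run` invocation from member-supplied configuration."""
    cmd = ["docker", "run", "-d"]
    if cfg.get("gpus"):
        cmd += ["--gpus", cfg["gpus"]]           # mount GPUs into the container
    for host, container in cfg.get("mounts", []):
        cmd += ["-v", f"{host}:{container}"]     # mount storage volumes
    if cfg.get("port"):
        cmd += ["-p", f"{cfg['port']}:8888"]     # e.g. expose Jupyter Notebook
    cmd.append(cfg["image"])                     # image with framework + MLflow baked in
    return cmd
```

The platform would then execute this command (for example via its container orchestration layer) to start the training environment.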
In this embodiment, considering the characteristics of machine learning model development tasks, the files obtained after training each model are divided into first-class data and second-class data, which lays the foundation for the subsequent file storage management.
Step S102: storing the first-class data of each machine learning model in a model-file data set of the model training platform, and storing the second-class data in a text-file data set of the model training platform.
In some cases, after the first-class and second-class data are obtained by the tracking tool of the model training process, these files also need to be stored for later viewing or use.
In this embodiment, the model-file data set stores the first-class data and, correspondingly, the text-file data set stores the second-class data. In some implementations, the model-file data set may use a remote repository such as FS, HDFS, NFS, or NAS, and the text-file data set may use a remote code repository such as Gitee, GitLab, or GitHub. Specifically, a common data storage scheme is built for the different data types (model file data, evaluation-index file data, training data, code file data, and the corresponding metafile data), with unified data version management. The model file data, evaluation-index file data, and training data are each fed through a cryptographic hash function to generate a hash value; the hash value is stored in the corresponding metafile data, the file is renamed to its hash value, and the file is stored in the remote repository. The corresponding metafile data is stored in the remote code repository. In this way, the hash value establishes an association between a metafile and its first-class data, and the first-class data can be conveniently located from the metafile's contents.
The code file data of the machine learning model is stored in the remote code repository and may be versioned with Git. The model file data, evaluation-index file data, and training data could also be version-controlled with Git, which would reduce the dependence on other components; however, for binary files Git must store the changes of every commit, and each modification of a binary file adds the size of a further commit. The amount of data would therefore grow substantially and the remote code repository would swell rapidly. To keep the remote code repository small, some embodiments introduce DVC for data version management: only the metafile data of the model file data, evaluation-index file data, and training data is stored in the remote code repository, while the model file data, evaluation-index file data, and training data themselves are stored in the remote repository.
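The Git/DVC split just described amounts to a simple routing rule. The sketch below records it explicitly; the artifact-kind labels and return strings are illustrative assumptions used only for this example.

```python
def storage_target(artifact_kind: str) -> str:
    """Route an artifact per the split above: bulky binaries go through DVC to
    the remote repository, while code and the small metafiles are versioned
    with Git in the remote code repository."""
    bulky = {"model_file", "metric_file", "training_data"}   # binary, DVC-managed
    small = {"code_file", "meta_file"}                       # text, Git-managed
    if artifact_kind in bulky:
        return "remote repository via DVC"
    if artifact_kind in small:
        return "remote code repository via Git"
    raise ValueError(f"unknown artifact kind: {artifact_kind}")
```

Keeping only the metafiles in Git means each commit adds a few hundred bytes per artifact, regardless of how large the model or data files are.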
In this embodiment, the save operation is implemented with two tools, Git and DVC. Specifically, Git handles the code file data of the machine learning model and the metafile data of the model file data, evaluation-index file data, and training data; DVC handles the model file data, evaluation-index file data, and training data themselves. More specifically, the actual data (model file data, evaluation-index file data, and training data) is first transferred to the remote repository via DVC, and the DVC mapping data, i.e. the metafile data mentioned above, is then pushed with `git push` to a remote code repository such as GitHub or GitLab.
In some embodiments, a directory structure may be prescribed for the model file data, evaluation-index file data, training data, and code file data of a given machine learning model development task. A unified directory structure lets team members locate related files quickly when troubleshooting, and it also enables a multi-version data merging function: by defining the corresponding protocol data, conflicts between data files can be effectively avoided. The unified structure may comprise the following six first-level directories: a dvc directory (data version management), a git directory (code version management), a data_set directory (machine learning model training data), a model_file directory (machine learning model file data), a model_metric directory (model evaluation-index file data), and a source_code directory (code file data).
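The six first-level directories can be scaffolded in a few lines. This is a hedged sketch: the function name is an assumption, and the directory names simply mirror the list above.

```python
from pathlib import Path

# The six first-level directories named above, with their roles.
FIRST_LEVEL_DIRS = [
    "dvc",           # data version management
    "git",           # code version management
    "data_set",      # machine learning model training data
    "model_file",    # machine learning model file data
    "model_metric",  # machine learning model evaluation-index file data
    "source_code",   # code file data
]

def scaffold_experiment(root: Path) -> None:
    """Create the unified first-level directory structure under an experiment root."""
    for name in FIRST_LEVEL_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
```

Because every experiment shares this layout, a merge or comparison tool can pair files across versions by path alone.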
In this embodiment, the operation of storing the first type of data and the second type of data is described with a specific example; please refer to fig. 5. For illustration, a text file data set is stored using GitLab, a model file data set is stored using HDFS, scikit-learn serves as the machine learning framework, and MLflow serves as the tracking tool for the model training process. First, an experimental task for a new local machine learning model can be created on the model training platform; whether an output directory for the new machine learning model exists is determined — if yes, no processing is performed; if not, a new directory is created, a new version branch is created, and the platform switches to the new branch. An algorithm engineer may use the online editing environment provided by the model training platform for model development and debugging; specifically, an initial version of the machine learning model is formulated based on the machine learning framework and debugged.
Next, model training may be performed for the machine learning model using a data set formed from sample data labeled in the model training platform by a labeling person, with MLflow tracking the training process and monitoring whether training succeeds. If training fails, the experimental task ends; if training succeeds, the experimental task ends after a training result is obtained, where the training result includes the model file data of the machine learning model and the machine learning model evaluation index file, and meta file data corresponding to the model file data and the evaluation index file data is generated. The meta file data is added to the staging area, committed to the local code repository, and then pushed from the local code repository to GitLab. DVC may be invoked to transfer the model file data of the machine learning model and the machine learning model evaluation index file data to HDFS for storage.
Step S103: generating file index information according to the model file data set and the text file data set; wherein the file index information includes access entry information for a first class of data and a second class of data of the machine learning model.
In some cases, in a team-based development task involving multiple machine learning models, after the development process has been tracked and unified storage and version control of the data have been realized, team members further need to be able to conveniently check the experimental results of other members of the team.
In this embodiment, the file index information may include access entry information of the first type data and the second type data. The access entry information is expressed as information of actual storage addresses of the first type data and the second type data corresponding to the machine learning model. In some embodiments, the file index information may include, in addition to access entry information of the first type of data and the second type of data corresponding to the machine learning model, information carried by the first type of data and the second type of data corresponding to the machine learning model. In particular, for example, the information carried by the code file data may include specific code content data and corresponding versions. The information carried by the machine learning model evaluation index file data can comprise an evaluation index corresponding to a certain machine learning model. Team members may query information included in the first type of data and the second type of data based on the file index information. The first type data and the second type data may also be downloaded locally, etc., based on the file index information.
In particular, in some embodiments, the first type of data may include model file data of a machine learning model and machine learning model evaluation index file data, and the second type of data may include code file data of the machine learning model and meta file data of the model file data and of the evaluation index file data. In this case, team members may query the model file data of a machine learning model according to the file index information, thereby obtaining the model parameters, network architecture, and so on of the machine learning model. Team members can query the machine learning model evaluation index file data according to the file index information, thereby obtaining the specific evaluation indexes of the machine learning model. Team members can also query the meta file data of the machine learning model evaluation index file data according to the file index information, determine the real storage location of the evaluation index file data from that meta file data, and thereby download the evaluation index file data locally. Further, team members may choose whether to retrain the machine learning model based on the specific evaluation indexes.
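One possible shape of a single file-index entry is sketched below. The field names, the HDFS remote, and the GitLab URL are assumptions for illustration; the text only requires that the entry record where the first-class and second-class data actually live.

```python
# Illustrative sketch of one file-index entry: for each machine learning
# model, the access entry information records the actual storage addresses
# of its first-class (DVC-managed) and second-class (Git-managed) data.

def build_index_entry(model_name, version):
    """Assemble access entry information for one model's artifacts."""
    dvc_remote = "hdfs://warehouse/models"      # assumed DVC remote
    git_remote = "https://gitlab.example/repo"  # assumed code repository
    return {
        "model": model_name,
        "version": version,
        "first_class": {  # heavy artifacts stored via DVC
            "model_file": f"{dvc_remote}/{model_name}/{version}/model.pkl",
            "metric_file": f"{dvc_remote}/{model_name}/{version}/metrics.json",
        },
        "second_class": {  # code and meta files stored via Git
            "code": f"{git_remote}/source_code?ref={version}",
            "meta": f"{git_remote}/dvc/{model_name}.dvc?ref={version}",
        },
    }

entry = build_index_entry("fraud_detector", "v3")
```

A team member resolving this entry would follow the `meta` address to the .dvc file, which in turn points at the real storage location of the artifact — matching the query-then-download flow described above.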
In this embodiment, the file index information is generated from the model file data set and the text file data set. Team members can conveniently check the experimental results of other members of the team according to the file index information. This solves the technical problem that team members cannot conveniently check the experimental results of other members of the team, enables quick checking of experimental results, and improves the efficiency of checking and comparing them. In this embodiment, a model file data set and a text file data set are provided for storing the first type of data and the second type of data, which facilitates disaster recovery processing for both types of data and avoids data loss caused by disk damage. Correspondingly, this centralized management of the first and second types of data also facilitates versioned management of experimental results and experiment tracking.
In some embodiments, before the step of acquiring the first class data and the second class data generated by training after each machine learning model completes training, the method further comprises: receiving a request for executing a plurality of machine learning model training tasks sent by a data port; wherein the request includes configuration information of the machine learning model training task; creating corresponding workspaces for the plurality of machine learning model training tasks according to the configuration information; the machine learning model training task is performed within the workspace.
In some cases, in a team-based development task involving multiple machine learning models, only one team member needs to perform machine learning model training tasks at a certain time node, but that member needs to perform a plurality of machine learning model training tasks simultaneously. Thus, for the case where only one team member is developing but there are a plurality of machine learning models, unified management and storage of files is also required.
The data port indicates that a certain team member, using a terminal device, logs in to the model training platform through an account to execute machine learning model training tasks. The team member may log in through a web page configured on the terminal, a client configured on the terminal, or another type of user terminal. The team member submits a request to perform a plurality of machine learning model training tasks on the web page or client, and the request is then sent to the server.
The configuration information may be a specific configuration file. Specifically, when a team member inputs corresponding information on the web page or client, the web page or client generates a configuration file according to the information. The information may be, among other things, the type of model training framework, code version, operating parameters, etc. Or, the team member directly selects the corresponding configuration through the configuration template provided by the web page or the client, and then the web page or the client generates the configuration file. The configuration file is carried in a request for executing a plurality of machine learning model training tasks sent by the account, and after the server receives the configuration file, the configuration file is parsed to obtain the types, code versions, operation parameters, paths of data set files and the like of model training frames corresponding to the plurality of machine learning model training tasks. The operating parameters may also include, among other things, system version, memory size, GPU card count, etc.
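The server-side parsing step above can be sketched as follows. The concrete field names (`framework`, `code_version`, `run_params`, `dataset_path`) are assumptions for illustration; the text only says the server extracts the model training framework type, code version, operating parameters, and dataset file path from the configuration file.

```python
# Hedged sketch of parsing the configuration file carried in a request to
# execute machine learning model training tasks. A JSON configuration
# format is assumed here purely for illustration.
import json

def parse_task_config(raw):
    """Extract framework type, code version, run parameters, dataset path."""
    cfg = json.loads(raw)
    return {
        "framework": cfg["framework"],
        "code_version": cfg["code_version"],
        "run_params": cfg.get("run_params", {}),  # e.g. memory, GPU count
        "dataset_path": cfg["dataset_path"],
    }

raw = json.dumps({
    "framework": "scikit-learn",
    "code_version": "v1.2",
    "run_params": {"memory_gb": 16, "gpu_count": 1},
    "dataset_path": "data_set/train.csv",
})
config = parse_task_config(raw)
```

The parsed fields are exactly what the server needs in the next step to build a container image and workspace matching the requested framework and operating parameters.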
The workspace may be realized as an isolated and complete training environment built from an image using containerization technology. Corresponding workspaces, or training environments, are built for the plurality of machine learning model training tasks respectively. Specifically, in some embodiments, one container instance may be deployed quickly from a built container image, running in the image an interactive development notebook (e.g., Jupyter Notebook), a tracking tool for the model training process (e.g., MLflow), and data version control tools (Git and DVC). A team member A may log in to the interactive development notebook to control multiple machine learning model training processes; for example, the hyper-parameters of a machine learning model may be modified. Meanwhile, the experimental results of the machine learning models can be collected by the tracking tool during model training and then stored separately by the data version control tools. At this point, another team member B may log in to the web page or client to view the training results of the plurality of machine learning models. For example, specific training results may be rendered by MLflow onto a front-end page for presentation.
The present method is directed to situations where only one team member is present but multiple machine learning model training tasks need to be performed. And creating a corresponding working space for each machine learning model training task by adopting a containerization technology, executing the corresponding machine learning model training task in the working space, and correspondingly, logging in a model training platform to check experimental results of the plurality of machine learning models by other team members who do not need to execute the machine learning model training task. According to the method, a special situation in the development task of the team multi-machine learning model is considered, the team member is guaranteed to be capable of rapidly checking and comparing the experimental result, and the efficiency of the team development task is improved.
In some embodiments, before the step of obtaining the first class data and the second class data generated by training the machine learning model, the method further includes: receiving requests for executing machine learning model training tasks sent by a plurality of data ports, wherein each request includes one or more machine learning model training tasks and configuration information for performing them; creating a corresponding workspace for each data port according to the configuration information; and performing the machine learning model training task within the workspace.
In some cases, in the development task of a team-based multi-machine learning model, at a certain time node, there are multiple team members that need to perform the machine learning model training task. And where at least one team member needs to perform multiple machine learning model training tasks simultaneously. Therefore, unified file management and storage are also required for this case.
The plurality of data ports represent that a plurality of team members log in to the model training platform through their respective terminal devices to execute machine learning model training tasks. Creating a corresponding workspace for a data port according to the configuration information means creating a corresponding workspace for each data port; that is, one data port corresponds to one container instance. One machine learning model training task needs to be performed in some container instances, while a plurality of machine learning model training tasks need to be performed in others.
The request may include one machine learning model training task and its corresponding configuration information, or a plurality of machine learning model training tasks and their corresponding configuration information. The request may also include only one machine learning model training task, only the configuration information corresponding to one machine learning model training task, only a plurality of machine learning model training tasks, or only the configuration information corresponding to a plurality of machine learning model training tasks.
In the method, in the development task of a team-based multi-machine learning model, a plurality of team members need to execute the machine learning model training task, and at least one team member needs to execute a plurality of machine learning model training tasks simultaneously. And creating a corresponding working space for each data port by adopting a containerization technology, executing a corresponding machine learning model training task in the working space, and correspondingly, any team member in the team can log in a model training platform to check the experimental results of the machine learning models of other team members. According to the method, another special situation in the development task of the team multi-machine learning model is considered, the team members are guaranteed to be capable of rapidly checking and comparing experimental results, and the efficiency of the team development task is improved.
In some embodiments, the model file data set stores machine learning model training data in advance, the training data includes training set data, and the workspace is preconfigured with a machine learning model training framework; the step of performing the machine learning model training task within the workspace comprises: acquiring machine learning model training data from the model file data set and acquiring code file data of a machine learning model from the text file data set according to the configuration information; and executing the machine learning model training task using the training data, the code file data, and the training framework.
In some cases, training data used by different machine learning models is different for the development task of one team-like multi-machine learning model. For example, the training data used for the machine learning model for performing image recognition is image data, and the training data used for the machine learning model for performing voice recognition is voice data. There may also be multiple different versions of the data set for the same machine learning model. Therefore, in the development task of one team-like multi-machine learning model, the machine learning model training data also needs to be managed uniformly. Different machine learning models may employ different machine learning model training frameworks, and correspondingly, the code file data employed may also be different, thus allowing for time savings in team member configuration of the corresponding development tools and environments. The machine learning model training framework may be pre-configured directly in the workspace. Also, the machine learning model training data is stored in the model file data set in advance, and the code file data of the machine learning model is stored in the text file data set in advance. When use is required, a call is made from the model file dataset and the text file dataset.
The machine learning model training data includes training set data, which may be classified into an image data file, a text data file, a voice data file, a video data file, and the like.
The machine learning model training framework is a preconfigured machine learning framework, e.g., PyTorch, TensorFlow, XGBoost, or scikit-learn. The code file is the file of code corresponding to the machine learning model.
The method can greatly save time for team members to prepare the data set and time for configuring corresponding development tools and environments. And the working efficiency of team members is improved.
In some embodiments, the first type of data further comprises machine learning model evaluation index file data; after the step of obtaining the first class data and the second class data generated by training the machine learning model, the method further comprises the following steps: and comparing the evaluation index in the evaluation index file data with a preset evaluation index threshold value, and continuously executing the machine learning model training task under the condition that the preset condition is met.
In some cases, a machine learning model requires multiple rounds of iterative training before its loss function converges. Even after the loss function converges, the obtained machine learning model does not necessarily meet the requirements of the task; for example, some evaluation indexes of this machine learning model may not meet the requirements of the present task. Therefore, retraining is needed until the requirements of the task are met.
The machine learning model evaluation index file data represents data for evaluating a machine learning model performance index. The performance metrics may include Accuracy (Accuracy), precision (Precision), recall (Recall), P-R Curve (Precision-Recall Curve), false Positive Rate (FPR), F1 Score, and so forth.
The machine learning model evaluation index threshold can be set manually and autonomously, with the specific setting corresponding to the specific task. Thresholds may be set for all machine learning model evaluation indexes or only for some key ones. As long as, in the evaluation index file data obtained after the training of a certain machine learning model is completed, a certain number of key evaluation indexes reach their thresholds, further training may not be necessary. Which evaluation indexes are selected as key can likewise be set according to the specific task.
The constraint condition may be expressed as continuing to perform the machine learning model training task when a certain evaluation index is below the preset evaluation index threshold; alternatively, the training task may be continued when a certain evaluation index is above the preset evaluation index threshold. Specifically, in one machine learning training task, the evaluation index threshold for accuracy may be set to 99%, and the constraint condition expressed as continuing the training task when the accuracy is below 99%. In another machine learning training task, the evaluation index threshold for the false positive rate may be set to 1%, and the constraint condition expressed as continuing the training task when the false positive rate is above 1%.
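The comparison step above can be sketched as a small decision function. The two thresholds mirror the examples in the text (accuracy 99%, false positive rate 1%); the function name and metric keys are assumptions for illustration.

```python
# Sketch of comparing evaluation indexes against preset thresholds to decide
# whether the machine learning model training task should continue.

def should_retrain(metrics, min_accuracy=0.99, max_fpr=0.01):
    """Return True when the model fails either key threshold."""
    if metrics.get("accuracy", 0.0) < min_accuracy:
        return True   # accuracy below 99%: continue the training task
    if metrics.get("fpr", 1.0) > max_fpr:
        return True   # false positive rate above 1%: continue training
    return False      # all key indexes meet their thresholds

# Accuracy passes (99.5%) but the false positive rate fails (2% > 1%),
# so the training task would continue.
retrain = should_retrain({"accuracy": 0.995, "fpr": 0.02})
```

In a real pipeline, `metrics` would be read from the machine learning model evaluation index file data produced at the end of training.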
According to the method, the situation that the machine learning model obtained after training is completed in an actual application scene and the corresponding evaluation index possibly cannot reach the actual requirement of a task is considered, so that training is needed. The method judges whether to execute training again or not by comparing the machine learning model evaluation index represented by the machine learning model evaluation index file with a preset machine learning model evaluation index threshold. The method has obvious use value and can improve the working efficiency of team members.
In some embodiments, the method further comprises: receiving an access request for the file index information; responding to an access request of a terminal to the file index information, and sending the file index information to the terminal for forming an index page for displaying; the index page comprises access entry information of first-class data and second-class data of the machine learning model or text identification bound with the access entry information.
In some cases, there is such a need in team-based development of multiple machine learning models: each team member expects that the evaluation indexes, model file data, code file data, and so on obtained after the training of each machine learning model is completed can be queried or browsed simultaneously on one terminal page. That is, each team member desires to be able to quickly view and compare the experimental results of other members of the team. Specifically, suppose team members A and B are training the same machine learning model simultaneously. After the model trained by team member A completes training, team member B can quickly learn of this through a front-end page and may then stop training. Therefore, providing a terminal page through which all team members can query their own or other members' experimental results improves the working efficiency of the team.
The index page is a page for visualizing the first type data and the second type data. The page mainly shows information contained in the first type of data and the second type of data. The access entry information may be represented as information of actual storage addresses of the first type of data and the second type of data corresponding to the machine learning model. And the text identifier is bound with the access entry information and can be expressed as the first-type data and the second-type data carrying information.
Specifically, for example, the first type of data includes model file data of a machine learning model, machine learning model evaluation index file data, and machine learning model training data. The second class of data includes code file data of the machine learning model, metadata of the model file, metadata of machine learning model evaluation index file data, and metadata of machine learning model training data. The access entry information may be represented as the actual storage address of the data as described above. The text identifier bound with the access entry information can be represented as the information carried by the data. For example, the information carried by the model file data of the machine learning model includes the category name of the machine learning model, model parameters, and the like. The information carried by the machine learning model evaluation index file data comprises a plurality of evaluation index values of a specific machine learning model. More specifically, the index page is represented as a page obtained by visualizing data included in the model file data of the machine learning model, the machine learning model evaluation index file data, and the code file data of the machine learning model. The machine learning model category name, the machine learning model evaluation index, and the code version of the machine learning model may be visualized. The specific visualization form can take various forms, for example, a graph form, a pie chart form, and a table form. When in the form of a table, the table header may include training start time, training duration, team members, machine learning model names, machine learning model evaluation index, super parameters, and the like.
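The table form of the index page mentioned above can be sketched as follows. The header fields come from the text (training start time, duration, team member, model name, evaluation index, hyper-parameters); the flattening helper and record shape are assumptions — a real page would be rendered by the front end.

```python
# Sketch of flattening experiment records into index-page table rows.

HEADERS = ["start_time", "duration", "member", "model", "accuracy", "hyperparams"]

def render_rows(experiments):
    """Flatten experiment records into table rows keyed by HEADERS."""
    rows = []
    for exp in experiments:
        rows.append([str(exp.get(h, "")) for h in HEADERS])
    return rows

rows = render_rows([
    {"start_time": "2022-11-25 10:00", "duration": "42m", "member": "A",
     "model": "fraud_detector", "accuracy": 0.97, "hyperparams": "lr=0.1"},
])
```

Each row corresponds to one training run, so two team members training the same model can be compared side by side simply by sorting or filtering these rows.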
According to the method, in the development task of a team-oriented multi-machine learning model, each team member can query the experimental results of other members of the team on a front-end page, so that the efficiency of team member query and comparison of the experimental results is improved, the working efficiency of the team members is further improved, and the method has good practical value.
As shown in fig. 6, one embodiment of the present specification provides a method for displaying index information, which may be applied to a terminal. The method comprises the following steps.
Step S201: receiving index information sent by a model training platform; wherein the index information is obtained according to the method for generating index information.
Step S202: forming an index page according to the index information; the index page comprises access entry information of first class data and second class data corresponding to the machine learning model or text identification bound with the access entry information.
Step S203: and displaying the index page.
In some cases, there is such a need in team-based development of multiple machine learning models: each team member expects to be able to query or browse, simultaneously on one front-end page, the evaluation indexes, model files, code files, and so on obtained after the training of each machine learning model is completed. Therefore, a method for displaying index information, applied to a terminal, can be provided. The terminal receives the index information sent by the server and forms an index page from the index information via the front-end page. Team members can quickly view and compare experimental results through the index page.
In some embodiments, the index information may be transmitted from the server to the terminal based on the HTTP protocol or the WebSocket protocol. The corresponding front-end system in the terminal may be built using the Vue framework (a progressive framework for building user interfaces), and the front-end system may adopt the MVVM (Model-View-ViewModel) architecture pattern. The index information can be displayed with the help of data visualization chart libraries such as ECharts and AntV.
As shown in fig. 7, an embodiment of the present specification further provides an apparatus for generating index information. The device is applied to a model training platform, and the model training platform is used for training a plurality of machine learning models; the device comprises: an acquisition unit for acquiring first-class data and second-class data generated by machine learning model training; the first type of data at least comprises model file data of a machine learning model, and the second type of data at least comprises code file data of the machine learning model and metafile data of a model file; the storage unit is used for storing the first type data of each machine learning model into a model file data set of the model training platform, and storing the second type data into a text file data set of the model training platform; a generating unit for generating file index information according to the model file data set and the text file data set; the file index information comprises access entry information of first-class data and second-class data corresponding to the machine learning model.
In some embodiments, the generating means further comprises: the request receiving unit is used for receiving a request which is sent by one data port and used for executing a plurality of machine learning model training tasks; wherein the request includes configuration information of the machine learning model training task; a space creating unit, configured to create a corresponding working space for the plurality of machine learning model training tasks according to the configuration information; and the task execution unit is used for executing the machine learning model training task in the working space.
In some embodiments, the generating means further comprises: the request receiving unit is used for receiving requests which are sent by the plurality of data ports and used for executing the training tasks of the machine learning model; wherein the request includes configuration information for performing a plurality of machine learning model training tasks and the machine learning model training tasks; a space creation unit, configured to create a corresponding workspace for the data port according to the configuration information; and the task execution unit is used for executing the machine learning model training task in the working space.
In some embodiments, the model file data set has pre-stored therein machine learning model training data, the training data comprising training set data, and the workspace has pre-configured therein a machine learning model training framework. A task execution unit comprising: a data acquisition module for acquiring machine learning model training data from the model file data set and code file data of a machine learning model from the text file data set according to the configuration information; and the execution module is used for executing the machine learning model training task by using the training data, the code file data and the training framework.
In some embodiments, the generating means further comprises: a calculation unit configured to calculate a hash value of the first type of data using a cryptographic hash function; and the hash value processing module is used for storing the hash value into the metadata and taking the hash value as the file name of the first type of data.
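The calculation and hash-value processing units above can be sketched as follows. SHA-256 is shown as a generic cryptographic hash choice and the meta-file shape is assumed; the text only requires that the hash be stored in the meta data and used as the file name of the first type of data.

```python
# Sketch of the calculation unit: hash the first-class data content, record
# the hash in the meta file data, and reuse the hash as the file name so
# identical content always maps to the same name (content addressing).
import hashlib
import json

def hash_and_register(content: bytes):
    """Return (file_name, meta_json) for one piece of first-class data."""
    digest = hashlib.sha256(content).hexdigest()
    meta = {"hash": digest, "size": len(content)}  # meta file data
    file_name = digest                             # hash doubles as file name
    return file_name, json.dumps(meta)

name, meta_json = hash_and_register(b"model weights bytes")
```

Content-addressed naming like this is also what makes deduplication and version lookup cheap: resolving a meta file to its artifact is a single hash-keyed lookup in the remote store.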
In some embodiments, the generating means further comprises: the response module is used for responding to an access request of the terminal to the file index information and sending the file index information to the terminal for forming an index page for displaying; the index page comprises access entry information of first-class data and second-class data of the machine learning model or text identification bound with the access entry information.
The embodiment of the specification also provides a display device of the index information. The display device includes: the receiving unit is used for receiving the index information sent by the model training platform; wherein, the index information is obtained by adopting the generation method of the index information; an index forming unit for forming an index page according to the index information; the index page comprises access entry information of first class data and second class data corresponding to a machine learning model or text identifiers bound with the access entry information; and the display unit is used for displaying the index page.
As shown in fig. 8, the embodiment of the present disclosure further provides an electronic device, which may be a smart phone, a tablet computer, an electronic book reader, or another electronic device capable of running an application program. The electronic device in this embodiment may include one or more of the following components: a processor, a network interface, memory, non-volatile storage, and one or more application programs, wherein the one or more application programs may be stored in the non-volatile storage and configured to be executed by the one or more processors, the one or more programs being configured to perform the method described in the foregoing method embodiments.
The present specification also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to execute the index information generation method of any of the above embodiments.
The present specification also provides a computer program product containing instructions which, when executed by a computer, cause the computer to perform the method for generating index information in any of the above embodiments.
It will be appreciated that the specific examples herein are intended only to assist those skilled in the art in better understanding the embodiments of the present disclosure and are not intended to limit the scope of the present disclosure.
It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
It will be appreciated that the various embodiments described in this specification may be implemented either alone or in combination, and are not limited in this regard.
Unless defined otherwise, all technical and scientific terms used in the embodiments of this specification have the same meaning as commonly understood by one of ordinary skill in the art to which this specification belongs. The terminology used in the description is for the purpose of describing particular embodiments only and is not intended to limit the scope of the description. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be appreciated that the processor of the embodiments of the present specification may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be completed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present specification. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present specification may be embodied directly as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, completes the steps of the above methods.
It will be appreciated that the memory in the embodiments of this specification may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable Programmable ROM (EPROM), an Electrically Erasable Programmable ROM (EEPROM), or a flash memory, among others. The volatile memory may be Random Access Memory (RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present specification.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and unit may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this specification, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments.
In addition, each functional unit in each embodiment of the present specification may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present specification, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present specification. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto; any changes or substitutions that can be easily conceived by a person skilled in the art within the technical scope disclosed herein shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating index information, wherein the method is applied to a model training platform, and the method comprises the following steps:
acquiring first class data and second class data generated by training a machine learning model; the first type of data at least comprises model file data of a machine learning model, and the second type of data at least comprises code file data of the machine learning model and metafile data of a model file;
storing the first type data of each machine learning model into a model file data set of the model training platform, and storing the second type data into a text file data set of the model training platform;
generating file index information according to the model file data set and the text file data set; wherein the file index information includes access entry information for a first class of data and a second class of data of the machine learning model.
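The flow of claim 1 can be sketched in a few lines; all structures and path conventions below are assumptions for illustration, not the claimed implementation:

```python
# Illustrative sketch of claim 1: first-class data (model files) goes into a
# model-file data set, second-class data (code files and metafile data) into
# a text-file data set, and the file index information records an access
# entry for each stored item.

model_file_dataset: dict[str, bytes] = {}
text_file_dataset: dict[str, str] = {}

def store_training_outputs(model_name, model_bytes, code_text, meta_text):
    # First-class data: the trained model file itself.
    model_file_dataset[f"{model_name}/model.bin"] = model_bytes
    # Second-class data: code file data and metafile data of the model file.
    text_file_dataset[f"{model_name}/train.py"] = code_text
    text_file_dataset[f"{model_name}/meta.json"] = meta_text

def generate_file_index():
    # Access entry information: one entry per file in either data set.
    return sorted(
        [f"/model-files/{k}" for k in model_file_dataset]
        + [f"/text-files/{k}" for k in text_file_dataset]
    )

store_training_outputs("demo", b"\x00", "print('train')", "{}")
index = generate_file_index()
print(index)
```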
2. The method of claim 1, wherein, prior to the step of acquiring the first-class data and the second-class data generated by machine learning model training, the method further comprises:
receiving a request for executing a plurality of machine learning model training tasks sent by a data port; wherein the request includes configuration information of the machine learning model training task;
creating corresponding workspaces for the plurality of machine learning model training tasks according to the configuration information;
the machine learning model training task is performed within the workspace.
3. The method of claim 1, wherein, prior to the step of acquiring the first-class data and the second-class data generated by machine learning model training, the method further comprises:
receiving requests for executing machine learning model training tasks sent by a plurality of data ports; wherein each request includes a machine learning model training task and configuration information for executing the machine learning model training task;
creating a corresponding working space for the data port according to the configuration information;
the machine learning model training task is performed within the workspace.
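Claims 2 and 3 above both isolate training tasks in workspaces before execution, whether one request carries several tasks or several data ports each send a request. A minimal sketch, with a dict per workspace as a stand-in (all names assumed):

```python
# Hypothetical workspace creation for incoming training requests: one
# workspace per training task, parameterized by the task's configuration
# information from the request.

def create_workspaces(requests):
    workspaces = {}
    for request in requests:
        for task in request["tasks"]:
            workspaces[task["name"]] = {
                "config": task["config"],  # configuration information
                "artifacts": [],           # outputs produced during training
            }
    return workspaces

ws = create_workspaces([
    {"tasks": [{"name": "task-a", "config": {"epochs": 3}},
               {"name": "task-b", "config": {"epochs": 5}}]},
])
print(sorted(ws))
```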
4. The method according to claim 2 or 3, wherein machine learning model training data is pre-stored in the model file data set, the training data comprising training set data, and a machine learning model training framework is pre-configured in the workspace; the step of performing the machine learning model training task within the workspace comprises:
acquiring machine learning model training data from the model file data set according to the configuration information, and acquiring code file data of a machine learning model from the text file data set;
and executing the machine learning model training task by using the training data, the code file data and the training frame.
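The two steps of claim 4 can be sketched as below; the "framework" here is a stand-in callable, not any real training library, and all keys are assumptions:

```python
# Hypothetical execution of a training task inside a workspace: look up the
# training data and the code file data according to the task's configuration,
# then hand both to the pre-configured training framework.

def execute_training_task(config, model_file_dataset, text_file_dataset, framework):
    training_data = model_file_dataset[config["training_data_key"]]
    code_file = text_file_dataset[config["code_file_key"]]
    return framework(training_data, code_file)

# Stand-in framework that just reports what it was given.
def dummy_framework(training_data, code_file):
    return f"trained on {len(training_data)} samples using {code_file}"

result = execute_training_task(
    {"training_data_key": "train_set", "code_file_key": "train.py"},
    {"train_set": [1, 2, 3]},
    {"train.py": "train.py"},
    dummy_framework,
)
print(result)  # trained on 3 samples using train.py
```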
5. The method according to claim 1, wherein the method further comprises:
calculating a hash value of the first type of data using a cryptographic hash function;
and storing the hash value into the metafile data, and using the hash value as the file name of the first-class data.
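Claim 5 names no particular cryptographic hash function; SHA-256 is one common choice. A minimal sketch, with the metafile represented as a plain dict:

```python
# Hash the first-class (model file) data, record the digest in the metafile
# data, and reuse the digest as the file name. SHA-256 is an assumed choice;
# the ".bin" extension and metafile layout are illustrative only.
import hashlib
import json

model_bytes = b"example model weights"
digest = hashlib.sha256(model_bytes).hexdigest()

metafile = {"model_hash": digest}   # hash stored in the metafile data
file_name = f"{digest}.bin"         # hash doubles as the file name

print(file_name)
print(json.dumps(metafile))
```

Because the digest is deterministic, identical model files map to identical file names, which also makes duplicates easy to detect.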
6. The method according to claim 1, wherein the method further comprises:
responding to an access request from a terminal for the file index information by sending the file index information to the terminal so that the terminal forms an index page for display; wherein the index page comprises access entry information of the first-class data and second-class data of the machine learning model, or text identifiers bound to the access entry information.
7. A method for displaying index information, the method comprising:
receiving index information sent by a model training platform; wherein the index information is obtained by the method of any one of claims 1-6;
forming an index page according to the index information; the index page comprises access entry information of first class data and second class data corresponding to a machine learning model or text identifiers bound with the access entry information;
and displaying the index page.
8. An index information generating device is characterized in that the device is applied to a model training platform; the device comprises:
an acquisition unit for acquiring first-class data and second-class data generated by machine learning model training; the first type of data at least comprises model file data of a machine learning model, and the second type of data at least comprises code file data of the machine learning model and metafile data of a model file;
the storage unit is used for storing the first type data of each machine learning model into a model file data set of the model training platform, and storing the second type data into a text file data set of the model training platform;
a generating unit for generating file index information according to the model file data set and the text file data set; the file index information comprises access entry information of first-class data and second-class data corresponding to the machine learning model.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the method of generating index information according to any one of claims 1 to 6 or implementing the method of displaying index information according to claim 7 when executing the computer program.
10. A computer-readable storage medium, wherein a computer program is stored in the readable storage medium, and the computer program when executed by a processor implements the method of generating index information according to any one of claims 1 to 6, or implements the method of displaying index information according to claim 7.
CN202211488395.5A 2022-11-25 2022-11-25 Index information generation method and device and computer readable storage medium Pending CN116126995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211488395.5A CN116126995A (en) 2022-11-25 2022-11-25 Index information generation method and device and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN116126995A 2023-05-16

Family

ID=86298133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211488395.5A Pending CN116126995A (en) 2022-11-25 2022-11-25 Index information generation method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116126995A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination