CN113850372A

CN113850372A - Neural network model training method, device, system and storage medium

Info

Publication number: CN113850372A
Application number: CN202111111622.8A
Authority: CN
Inventors: 郑亚平; 刘朝晖; 贺佳
Original assignee: Shenshuo Railway Branch of China Shenhua Energy Co Ltd
Current assignee: Shenshuo Railway Branch of China Shenhua Energy Co Ltd
Priority date: 2021-09-23
Filing date: 2021-09-23
Publication date: 2021-12-28

Abstract

The application relates to a neural network model training method, device, system and storage medium. The neural network model training method comprises the following steps: acquiring a downloading gradient of the server, and updating the data parameters of the edge model according to the downloading gradient; the downloading gradient is a data parameter difference value of the cloud backup model in the server and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update; obtaining an edge copy model of the edge model after the current data parameter updating through copying, training the edge copy model based on a local training sample, and obtaining a data parameter gradient of the trained edge copy model; selecting an uploading gradient from the data parameter gradients according to a preset selection rule and uploading the uploading gradient; and the uploading gradient is used for indicating the server to complete the data parameter updating of the cloud model based on the expert model. According to the method and the device, only the uploading gradient and the downloading gradient are transmitted, so that the safety of model data network transmission is improved.

Description

Neural network model training method, device, system and storage medium

Technical Field

The present application relates to the field of neural network technology, and in particular, to a neural network model training method, apparatus, system, and storage medium.

Background

With the development of neural network technology, deep learning has emerged, and its application has brought many conveniences to edge computing. In deep learning application scenarios, edge computing places a large number of computing nodes close to the end devices to meet the high-computation and low-latency requirements of deep learning on edge devices, and provides additional benefits in terms of privacy, bandwidth efficiency and scalability. In practice, however, deep learning models are often difficult to train and slow to converge. This is particularly true in an edge computing environment, where edge devices are constrained by their physical size, typically have limited computing power and battery capacity, and struggle to meet the requirements of edge model training.

Traditional model training depends either on the local computing capability of the edge device or on model transfer from a cloud center. Local training occupies a large amount of local computing resources and energy, which affects the performance and battery life of the edge device. Transferring the whole model poses a significant challenge to the security of the network, and a complex model trained in the cloud is not necessarily suitable for deployment on an edge node. Knowledge distillation first fits the problem to be solved with a complex model, and then uses the complex model to guide a simple model so that the simple model converges sooner. In the course of implementing the present application, the inventors found that the conventional technology has at least the following problem: the transmission of model data over the network is insecure.

Disclosure of Invention

In view of the above, there is a need to provide a safe and reliable neural network model training method, apparatus, system and storage medium.

In order to achieve the above object, an embodiment of the present application provides a neural network model training method, including the following steps:

acquiring a downloading gradient of the server, and updating the data parameters of the edge model according to the downloading gradient; the downloading gradient is a data parameter difference value of the cloud backup model in the server and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

obtaining an edge copy model of the edge model after the current data parameter updating through copying, training the edge copy model based on a local training sample, and obtaining a data parameter gradient of the trained edge copy model;

selecting an uploading gradient from the data parameter gradients according to a preset selection rule and uploading the uploading gradient; and the uploading gradient is used for indicating the server to complete the data parameter updating of the cloud model based on the expert model.

In one embodiment, the preset selection rule includes: the number of uploading gradients to select is the product of the total amount of data parameters contained in the edge model and the terminal computing capacity ratio, and the criterion for selecting the uploading gradients is any one or any combination of the following rules: selecting the data parameter gradients in order from largest to smallest, or selecting them at random;

the terminal computing capacity ratio is the ratio of the computing capacity of the target terminal to the sum of the computing capacities of all terminals.

In one embodiment, in the step of selecting the upload gradient from the data parameter gradients according to a preset selection rule, the number of upload gradients is determined by using the following formula:

$$m_j = M \cdot \frac{c_j}{\sum_{i=1}^{n} c_i}$$

where n is the number of all terminals, the target terminal is the j-th terminal, $m_j$ is the number of uploading gradients, M is the total amount of data parameters contained in the edge model, $c_j$ is the computing power of the target terminal, and $c_i$ is the computing power of the i-th terminal.
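As a minimal sketch of this rule, the Python function below computes the number of uploading gradients for every terminal from the total parameter count M and the terminal computing powers. The function name `upload_counts`, the example values, and the choice to round down to an integer are illustrative assumptions; the application does not state how a fractional result is rounded.

```python
def upload_counts(total_params, capacities):
    """Return m_j = M * c_j / sum(c_i) for each terminal j.

    Rounding down to an integer is an assumption of this sketch.
    """
    total_capacity = sum(capacities)
    if total_capacity <= 0:
        raise ValueError("total computing power must be positive")
    return [int(total_params * c / total_capacity) for c in capacities]


# Hypothetical example: M = 1000 parameters, three terminals with powers 1, 3 and 6.
counts = upload_counts(1000, [1, 3, 6])
print(counts)  # [100, 300, 600]
```

Under this rule a terminal with a larger share of the total computing power uploads proportionally more gradient entries.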

A neural network model training method comprises the following steps:

transmitting the download gradient to the terminal; the download gradient is used for indicating the terminal to complete the updating of the data parameter of the edge model; the downloading gradient is a data parameter difference value of the cloud backup model and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

if the uploading gradient uploaded by the terminal is received, updating the data parameter of the cloud model based on the expert model; the uploading gradient is obtained by selecting the data parameter gradient of the trained edge copy model through a terminal according to a preset selection rule; the trained edge copy model is obtained by training the edge copy model based on a local training sample through a terminal; the edge copy model is obtained by copying the edge model after the data parameters are updated at this time through a terminal.

In one embodiment, before the step of transmitting the download gradient to the terminal, the method further comprises the steps of: deploying an expert model reaching a preset training target; and initializing the cloud model according to the expert model reaching the preset training target.

In one embodiment, in the step of completing the updating of the data parameter of the cloud model based on the expert model, the cloud model is updated by using the following loss function:

$$L = \mathrm{KL}\left(P_{expert}(x) \,\|\, P_{cloud}(x)\right)$$

where KL is the KL divergence, $P_{expert}$ is the probability distribution of the expert model, $P_{cloud}$ is the probability distribution of the cloud model, and x is the input to the neural network model.
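For reference, for two discrete distributions P and Q over the same classes, the KL divergence is conventionally defined as follows. This is the standard textbook definition rather than a formula quoted from the application, and the numbers in the worked example are made up for illustration; taking the expert distribution as the first argument is likewise an assumption.

$$\mathrm{KL}(P \,\|\, Q) = \sum_{k} P(k)\,\log\frac{P(k)}{Q(k)}$$

For example, with $P_{expert}(x) = (0.5, 0.5)$, $P_{cloud}(x) = (0.9, 0.1)$ and natural logarithms, the divergence is $0.5\ln(0.5/0.9) + 0.5\ln(0.5/0.1) \approx 0.511$. The divergence is zero only when the two distributions coincide, so minimizing it pulls the output distribution of the cloud model toward that of the expert model.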

A neural network model training apparatus, comprising:

the parameter updating module is used for acquiring the downloading gradient of the server and finishing the updating of the data parameters of the edge model according to the downloading gradient; the downloading gradient is a data parameter difference value of the cloud backup model in the server and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

the training module is used for obtaining, through copying, an edge copy model of the edge model after the current data parameter updating, training the edge copy model based on a local training sample, and obtaining the data parameter gradient of the trained edge copy model;

the uploading module is used for selecting the uploading gradient from the data parameter gradients according to a preset selection rule and uploading the uploading gradient; and the uploading gradient is used for indicating the server to complete the data parameter updating of the cloud model based on the expert model.

A neural network model training apparatus, comprising:

the data transmission module is used for transmitting the downloading gradient to the terminal; the download gradient is used for indicating the terminal to complete the updating of the data parameter of the edge model; the downloading gradient is a data parameter difference value of the cloud backup model and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

the updating module is used for finishing the data parameter updating of the cloud model based on the expert model if the uploading gradient uploaded by the terminal is received; the uploading gradient is obtained by selecting the data parameter gradient of the trained edge copy model through a terminal according to a preset selection rule; the trained edge copy model is obtained by training the edge copy model based on a local training sample through a terminal; the edge copy model is obtained by copying the edge model after the data parameters are updated at this time through a terminal.

A neural network model training system comprises a server and a plurality of terminals which are connected with the server; the terminal is used for executing the step of the neural network model training method realized from the terminal perspective; the server is used for executing the step of the neural network model training method realized from the server perspective; wherein, each terminal adopts a synchronous downloading mode to obtain a downloading gradient; and uploading the uploading gradient by each terminal in an asynchronous uploading mode.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.

One of the above technical solutions has the following advantages and beneficial effects:

According to the present application, during training of the neural network model, neither user data nor model data parameter information is shared across the system network. Only the uploading gradient and the downloading gradient are transmitted, as intermediate calculation results, to update the model data parameters. Because the uploading gradient and the downloading gradient carry less information than the training data, user data and model data parameter information are never output directly onto the network during data transmission. The user data and model data parameter information are thereby protected, and the security of model data transmission over the network is improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application or in the conventional technologies more clearly, the drawings used in the description of the embodiments or the conventional technologies are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram of an application environment of a neural network model training method in one embodiment;

FIG. 2 is a schematic flow chart of a neural network model training method implemented from a terminal perspective in one embodiment;

FIG. 3 is a schematic flow chart diagram illustrating a neural network model training method implemented from a server perspective in one embodiment;

FIG. 4 is a schematic diagram illustrating an update timing sequence of a synchronous download mode and an asynchronous upload mode in one embodiment;

FIG. 5 is a schematic diagram of a neural network model training system in some examples.

Detailed Description

To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Embodiments of the present application are set forth in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.

Spatially relative terms, such as "under," "below," "beneath," "over," and the like, may be used herein to describe the relationship of one element or feature to another element or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements or features described as "below" or "beneath" other elements or features would then be oriented "above" the other elements or features. Thus, the exemplary terms "below" and "beneath" can encompass both an orientation of above and below. In addition, the device may be otherwise oriented (e.g., rotated 90 degrees or at other orientations), and the spatial descriptors used herein are interpreted accordingly.

It will be understood that when an element is referred to as being "connected" to another element, it can be directly connected to the other element or be connected to the other element through intervening elements. Further, "connection" in the following embodiments is understood to mean "electrical connection", "communication connection", or the like, if there is a transfer of electrical signals or data between the connected objects.

As used herein, the singular forms "a", "an" and "the" may include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises/comprising," "includes" or "including," etc., specify the presence of stated features, integers, steps, operations, components, parts, or combinations thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, components, parts, or combinations thereof. Also, as used in this specification, the term "and/or" includes any and all combinations of the associated listed items.

In traditional model training under a local computing mode, edge devices (such as intelligent sensors, programmable logic controllers and edge intelligent routers) back-propagate the gradients generated by the difference between the predicted values of the model and the actual values, and update the model data parameters on a local CPU (central processing unit) or GPU (graphics processing unit). The training process depends on the local computing capability of the edge device, occupies a large amount of local computing resources and energy, and affects the performance and battery life of the edge device. Under a cloud computing mode, a cloud center trains the model centrally and transfers it to the edge, so the training process depends on model transfer from the cloud center. However, transferring the whole model poses a significant challenge to the security of the network, and a complex model trained in the cloud is not necessarily suitable for deployment on an edge node. Knowledge distillation first fits the problem to be solved with a complex model, and then uses the complex model to guide a simple model so that the simple model converges sooner. By deploying a complex model (the expert model) in the cloud center and a simple model (the edge model) on the edge side, the task requirements of the edge device can be met at a lower computing load. However, existing knowledge distillation techniques suffer from problems such as inefficient synchronous updating and insecure transmission of model data over the network.

It should be noted that the neural network model training method, apparatus, system and storage medium of the present application are applied in an edge computing scenario in which cloud computing power sinks from the center to the edge. The training method adopts end-edge collaboration. An "end" refers to a terminal device, such as a host, an Internet of Things device, a mobile phone or an industrial Internet device. An "edge" refers to a server or server cluster deployed on the edge side; the edge may also be implemented in a cloud server environment, that is, the server may be deployed either in the cloud or on the edge side. In the edge computing scenario, one server and n edge devices are deployed. An expert model, a cloud model and cloud backup models of n edge models are deployed in the server, and one edge model and one edge copy model are deployed in each of the n edge devices. The model parameters include structural parameters and data parameters. Except for the expert model, all models have the same structural parameters, that is, the same number of network layers, with each corresponding layer being of the same type. The structural parameters are determined in the initialization stage and remain unchanged during subsequent training. The server in the present application has higher computing power than the edge devices.

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The neural network model training method provided by the application can be applied to the edge computing application environment shown in fig. 1, in which terminal 1, terminal 2, …, terminal n communicate with a server via a network. The terminal can be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer or portable wearable device, and the server can be implemented as an independent server or as a server cluster formed by a plurality of servers.

In one embodiment, as shown in fig. 2, a neural network model training method is provided, which is described by taking the method as an example applied to any one of the terminals in fig. 1, and includes the following steps:

step 202, acquiring a downloading gradient of the server, and updating the data parameter of the edge model according to the downloading gradient; the downloading gradient is a data parameter difference value of the cloud backup model in the server and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

step 204, obtaining, through copying, an edge copy model of the edge model after the current data parameter updating, training the edge copy model based on a local training sample, and obtaining a data parameter gradient of the trained edge copy model;

step 206, selecting an uploading gradient from the data parameter gradients according to a preset selection rule and uploading the uploading gradient; and the uploading gradient is used for indicating the server to complete the data parameter updating of the cloud model based on the expert model.

Specifically, in the model data parameter updating stage, the terminal updates the edge model according to the downloading gradient acquired from the server, and then updates the data parameters of the edge copy model by copying the data parameters of the edge model after the current update; in some examples, if the models in the server have only just been initialized when the downloading gradient would otherwise be obtained, the edge model in the terminal takes its parameters from the initialized cloud model, and once the parameters of the edge model are initialized, the edge copy model is initialized by copying them; the terminal may be an edge device;

in the model training stage, the terminal trains the edge copy model based on a local training sample and obtains the data parameter gradient of the trained edge copy model; the data parameter gradient is the data parameter difference between the edge copy model after the current training and the edge copy model before the current training (that is, the edge model after the current data parameter updating); in some examples, the data parameter difference may be the amount of change in the model data parameters, e.g., the optimal direction of data parameter adjustment calculated by a back propagation algorithm after differentiating the objective function;

in the gradient uploading stage, the terminal selects an uploading gradient from the data parameter gradients according to a preset selection rule and uploads it; the uploading gradient is used for instructing the server to complete the data parameter updating of the cloud model based on the expert model; in some examples, the number of selected uploading gradients is less than the total number of data parameters of the edge model, and the criteria for selecting uploading gradients include selecting the data parameter gradients in order from largest to smallest and selecting them at random.
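The first two stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the model is reduced to a flat list of weights, local training is replaced by plain SGD on a toy linear regression, the downloading gradient is added to the edge model (the sign convention is assumed), and all function names and sample values are hypothetical. The selection of step 206 would then be applied to the returned data parameter gradient.

```python
def apply_download_gradient(edge_params, download_gradient):
    # Step 202: shift the edge model by the difference received from the server.
    # Adding the difference (rather than subtracting it) is an assumption.
    return [w + d for w, d in zip(edge_params, download_gradient)]


def train_copy(params, samples, lr=0.1, epochs=200):
    # Stand-in for local training: fit y = w . x by plain SGD on squared error.
    w = list(params)
    for _ in range(epochs):
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def terminal_round(edge_params, download_gradient, samples):
    edge_params = apply_download_gradient(edge_params, download_gradient)
    edge_copy = list(edge_params)             # step 204: copy the updated edge model
    trained = train_copy(edge_copy, samples)  # only the copy is trained
    # Data parameter gradient: trained copy minus the copy before training.
    param_gradient = [t - e for t, e in zip(trained, edge_params)]
    return edge_params, param_gradient


# Hypothetical local samples generated by y = 2*x0 - 1*x1.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
edge, grad = terminal_round([0.0, 0.0], [0.5, 0.5], samples)
print(edge, [round(g, 3) for g in grad])
```

Training the copy rather than the edge model itself keeps the edge model equal to the state the server expects, so that the next downloading gradient can be applied to it directly.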

During training of the neural network model, neither user data nor model parameter information is shared across the system network. Only the uploading gradient and the downloading gradient are transmitted, as intermediate calculation results, to update the model data parameters. Because the uploading gradient and the downloading gradient carry less information than the training data, user data and model parameter information are never output directly onto the network during data transmission. The user data and model parameter information are thereby protected, and the security of model data transmission over the network is improved.

In one embodiment, the preset selection rule includes: the number of uploading gradients to select is the product of the total amount of data parameters contained in the edge model and the terminal computing capacity ratio, and the criterion for selecting the uploading gradients is any one or any combination of the following rules: selecting the data parameter gradients in order from largest to smallest, or selecting them at random; the terminal computing capacity ratio is the ratio of the computing capacity of the target terminal to the sum of the computing capacities of all terminals.

Specifically, selecting the data parameter gradients in order from largest to smallest improves the convergence rate when training the cloud model, but also increases the probability of overfitting, so that the performance of the cloud model fluctuates to some extent. Selecting the data parameter gradients at random better represents the overall update direction of the model and avoids large fluctuations in model performance, but makes training insensitive to some features of the actual task and may reduce the convergence speed of the model. In some examples, the random selection may use simple random sampling.
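The two selection criteria can be sketched as follows. The function names, the sparse `index -> value` dictionary used to represent the uploading gradient, and ranking by absolute value are assumptions of this sketch; the application only states that gradients are taken from largest to smallest or at random.

```python
import random


def select_top_k(gradients, k):
    # Keep the k entries with the largest absolute value (assumed ranking).
    order = sorted(range(len(gradients)), key=lambda i: abs(gradients[i]), reverse=True)
    return {i: gradients[i] for i in order[:k]}


def select_random(gradients, k, seed=None):
    # Simple random sampling of k distinct positions without replacement.
    rng = random.Random(seed)
    chosen = rng.sample(range(len(gradients)), k)
    return {i: gradients[i] for i in chosen}


gradients = [0.1, -0.9, 0.4, 0.0]
print(select_top_k(gradients, 2))  # {1: -0.9, 2: 0.4}
print(select_random(gradients, 2, seed=0))
```

Sending the positions together with the values lets the server know which data parameters each uploaded entry refers to.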

Specifically, the higher the computing power of the target terminal relative to that of the other terminals, the faster the target terminal trains its edge copy model. Selecting more uploading gradients from terminals with higher computing power therefore speeds up training of the neural network model and improves accuracy and reliability.

In some examples, the computing power of the terminal may be determined according to the actual conditions and requirements of the neural network model. The total amount of data parameters contained in the edge model is the total number of trainable data parameters in the edge model, and it directly determines the size of the model. Specifically, it includes all weights and biases in the network layers that have trainable data parameters, such as the linear layers, convolutional layers and recurrent layers of the model.
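As an illustration of how the total amount M is counted, the sketch below sums weights and biases for a linear layer and a two-dimensional convolutional layer using the usual parameter-count formulas. The layer shapes are made up for the example and do not come from the application.

```python
def linear_params(in_features, out_features):
    # One weight per input-output pair, plus one bias per output.
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_h, kernel_w):
    # One kernel per (input channel, output channel) pair, plus one bias per output channel.
    return out_channels * in_channels * kernel_h * kernel_w + out_channels


# Hypothetical edge model: a 3x3 convolution (1 -> 8 channels) and a 128 -> 10 linear layer.
total = conv2d_params(1, 8, 3, 3) + linear_params(128, 10)
print(total)  # 80 + 1290 = 1370
```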

In one embodiment, as shown in fig. 3, a neural network model training method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:

step S310, transmitting the downloading gradient to the terminal; the download gradient is used for indicating the terminal to complete the updating of the data parameter of the edge model; the downloading gradient is a data parameter difference value of the cloud backup model and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

step S320, if the uploading gradient uploaded by the terminal is received, updating the data parameter of the cloud model based on the expert model; the uploading gradient is obtained by selecting the data parameter gradient of the trained edge copy model through a terminal according to a preset selection rule; the trained edge copy model is obtained by training the edge copy model based on a local training sample through a terminal; the edge copy model is obtained by copying the edge model after the data parameters are updated at this time through a terminal.

Specifically, the server takes a data parameter difference value between the cloud backup model and the cloud model after the last data parameter update as a download gradient required by the current data parameter update of the edge model in the terminal, and transmits the download gradient to the terminal; and if the server receives the uploading gradient uploaded by the terminal, updating the data parameters of the cloud model under the guidance of the expert model. In some examples, after the data parameter update of the cloud model is completed, the server copies the data parameter of the cloud model to update the cloud backup model, so as to obtain a download gradient transmitted to the terminal next time.
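The bookkeeping on the server side can be sketched as follows. The sketch assumes flat parameter lists and a sparse `index -> value` upload, takes the downloading gradient as the updated cloud model minus its backup (the sign is assumed), and refreshes the backup only after the difference has been computed. The `additive_update` function is a placeholder chosen to keep the example runnable; the application updates the cloud model under the guidance of the expert model and does not prescribe this rule.

```python
def download_gradient(cloud_params, backup_params):
    # Difference between the updated cloud model and its pre-update backup.
    return [c - b for c, b in zip(cloud_params, backup_params)]


def additive_update(cloud_params, upload):
    # Placeholder only: adds the sparse upload to the cloud parameters.
    # The application instead updates the cloud model under expert guidance.
    updated = list(cloud_params)
    for index, value in upload.items():
        updated[index] += value
    return updated


def server_round(cloud_params, backup_params, upload, update_cloud=additive_update):
    new_cloud = update_cloud(cloud_params, upload)
    gradient = download_gradient(new_cloud, backup_params)  # backup is still pre-update
    new_backup = list(new_cloud)  # refresh the backup for the next round
    return new_cloud, new_backup, gradient


cloud = [1.0, 2.0, 3.0]
cloud, backup, grad = server_round(cloud, list(cloud), {1: 0.5})
print(grad)  # [0.0, 0.5, 0.0]
```

Only `grad` leaves the server in this sketch; the full cloud parameters are never transmitted.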

Specifically, the preset target may be a preset model learning curve, accuracy, precision, and the like, and the expert model reaching the preset target has the capability of guiding the cloud model to update parameters. In some examples, the expert model that achieves the preset goal is a trained model with extremely high accuracy and reliability; and after the initialization of the cloud model is completed, initializing the cloud backup model by copying the parameters of the cloud model.

Here, the KL divergence (Kullback-Leibler divergence), also called relative entropy, measures the difference between two probability distributions. Specifically, the KL divergence between the probability distribution $P_{expert}$ of the expert model and the probability distribution $P_{cloud}$ of the cloud model is calculated, and the KL divergence is minimized so that the decisions of the cloud model are as close as possible to those of the expert model. In some examples, the objective function of the cloud model updating process is tied to the expert model: the cloud model is updated by minimizing the KL divergence, which in turn brings the decisions of the edge model as close as possible to those of the expert model.
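A minimal Python sketch of such a loss is given below. It assumes that both models output class logits that are converted to probability distributions with a softmax, and that the expert distribution is the first argument of the divergence; neither detail is stated in the application, and the function names and logit values are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax: subtract the maximum before exponentiating.
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) for discrete distributions with strictly positive entries.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def distillation_loss(expert_logits, cloud_logits):
    return kl_divergence(softmax(expert_logits), softmax(cloud_logits))


expert = [2.0, 0.5, -1.0]
far = distillation_loss(expert, [0.0, 0.0, 0.0])     # cloud model far from the expert
near = distillation_loss(expert, [1.0, 0.25, -0.5])  # cloud model closer to the expert
print(far > near > 0.0)  # True
```

The loss falls as the cloud output approaches the expert output and reaches zero when the two distributions are identical.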

It should be understood that although the steps in the flow charts of fig. 2-3 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2-3 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, there is provided a neural network model training apparatus including:

Specifically, the parameter updating module updates the edge model according to the downloading gradient acquired from the server, and then updates the data parameters of the edge copy model by copying the data parameters of the edge model after the current update; in some examples, if the models in the server have only just been initialized when the downloading gradient would otherwise be obtained, the edge model takes its parameters from the initialized cloud model, and once the parameters of the edge model are initialized, the edge copy model is initialized by copying them;

the training module trains the edge copy model based on a local training sample and obtains the data parameter gradient of the trained edge copy model; the data parameter gradient is the data parameter difference between the edge copy model after the current training and the edge copy model before the current training (that is, the edge model after the current data parameter updating); in some examples, the data parameter difference may be the amount of change in the model data parameters;

the uploading module selects an uploading gradient from the data parameter gradients according to a preset selection rule and uploads it; the uploading gradient is used for instructing the server to complete the data parameter updating of the cloud model based on the expert model; in some examples, the number of selected uploading gradients is less than the total number of data parameters of the edge model, and the criteria for selecting uploading gradients include selecting the data parameter gradients in order from largest to smallest and selecting them at random.

Specifically, the data transmission module takes a data parameter difference value between the cloud backup model and the cloud model after the last data parameter update as a download gradient required by the current data parameter update of the edge model in the terminal, and transmits the download gradient to the terminal; and if the uploading gradient uploaded by the terminal is received, the updating module completes the data parameter updating of the cloud model under the guidance of the expert model. In some examples, after the data parameter update of the cloud model is completed, the update module copies the data parameter of the cloud model to update the cloud backup model, so as to obtain a download gradient transmitted to the terminal next time.
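The server-side bookkeeping described above can be sketched as follows. This is a minimal sketch under stated assumptions: parameters are flat lists, the downloading gradient is taken as cloud model minus cloud backup model, and the backup is refreshed immediately after the downloading gradient is computed. The guidance of the expert model is deliberately omitted here, and the sparse uploading gradient is simply added to the cloud parameters. The class and method names are hypothetical.

```python
class CloudServer:
    def __init__(self, initial_params):
        self.cloud = list(initial_params)    # cloud model parameters
        self.backup = list(initial_params)   # cloud backup model (pre-update copy)

    def apply_upload_gradient(self, upload):
        # Simplified update: add each uploaded {index: value} entry.
        # Expert-model guidance is not modelled in this sketch.
        for index, value in upload.items():
            self.cloud[index] += value

    def next_download_gradient(self):
        # Download gradient = updated cloud model minus backup (assumed sign).
        gradient = [c - b for c, b in zip(self.cloud, self.backup)]
        # Refresh the backup so the next download gradient starts from here.
        self.backup = list(self.cloud)
        return gradient

server = CloudServer([1.0, 2.0, 3.0])
server.apply_upload_gradient({0: 0.5, 2: -1.0})
first = server.next_download_gradient()
second = server.next_download_gradient()
```

In this example `first` carries the change caused by the upload, and `second` is all zeros because the cloud model did not change between the two calls.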

For specific limitations of the neural network model training device, reference may be made to the above limitations of the neural network model training method, which are not described herein again. The modules in the neural network model training device can be wholly or partially realized by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, and can also be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules. It should be noted that, in the embodiments of the present application, the division into modules is schematic and is only one kind of logical function division; other division manners are possible in actual implementation.

In one embodiment, a neural network model training system is provided, which comprises a server and a plurality of terminals connected with the server; the terminal is used for executing the step of the neural network model training method realized from the terminal perspective; the server is used for executing the step of the neural network model training method realized from the server perspective; wherein, each terminal adopts a synchronous downloading mode to obtain a downloading gradient; and uploading the uploading gradient by each terminal in an asynchronous uploading mode.

The synchronous downloading means that after the downloading gradients obtained by all the terminals are calculated, the downloading gradients are uniformly downloaded to all the terminals to update the edge model; asynchronous uploading means that each terminal independently calculates to obtain an uploading gradient and uploads the uploading gradient to a server for updating of the cloud model.

In some examples, as shown in fig. 4, when the synchronous downloading mode is adopted, the updating timing sequence is that, after the cloud model is updated, all edge models uniformly obtain the corresponding downloading gradient for updating their data parameters; when the asynchronous uploading mode is adopted, the updating timing sequence is that, after an edge model is updated, its uploading gradient is selected and uploaded regardless of whether the cloud model is currently being updated, and the cloud model updates its data parameters according to the most recently received uploading gradient; the time interval between any two adjacent updates of the cloud model is fixed. Because the cloud model supports asynchronous uploading, the parameter updating efficiency of the neural network model is improved.
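The timing described above can be illustrated with a small simulation. This is a sketch under assumptions that go beyond the text: at each fixed-interval tick the cloud model applies only the most recently arrived uploading gradient (earlier arrivals in the same interval are discarded), the update is a plain addition without the expert model, and every terminal then receives the same downloading gradient. The function name `simulate` and the event format are hypothetical.

```python
def simulate(uploads, interval, ticks, initial_cloud):
    # uploads: list of (arrival_time, terminal_id, {index: value}),
    # arriving asynchronously at arbitrary times.
    cloud = list(initial_cloud)
    backup = list(initial_cloud)
    pending = sorted(uploads, key=lambda u: u[0])
    log = []
    for tick in range(1, ticks + 1):
        now = tick * interval              # cloud updates at a fixed interval
        latest = None
        while pending and pending[0][0] <= now:
            latest = pending.pop(0)        # keep only the most recent arrival
        if latest is not None:
            for index, value in latest[2].items():
                cloud[index] += value
        # Synchronous download: one gradient, sent to all terminals together.
        download = [c - b for c, b in zip(cloud, backup)]
        backup = list(cloud)
        log.append((now, download))
    return cloud, log

uploads = [
    (0.4, "edge-1", {0: 1.0}),
    (0.9, "edge-2", {1: 2.0}),
    (1.5, "edge-1", {0: -0.5}),
]
final_cloud, log = simulate(uploads, interval=1.0, ticks=2, initial_cloud=[0.0, 0.0])
```

Under this reading, the upload from "edge-1" at time 0.4 is superseded by the upload from "edge-2" at time 0.9 before the first cloud update. A real system might instead aggregate all uploads received in an interval; the text does not settle this.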

In some examples, as shown in fig. 5, a trained expert model with extremely high accuracy and reliability is deployed in the server, and in an initialization stage, the server initializes the cloud model based on the expert model; after the initialization of the cloud model is completed, initializing a cloud backup model by copying parameters of the cloud model; updating parameters of the edge model by the initialized cloud model, and after the initialization of the parameters of the edge model is completed, obtaining an edge copy model of the initialized edge model by copying;

taking the neural network model updating process in any one edge device as an example (the process in the other edge devices is the same), the edge device acquires the downloading gradient of the server and completes the current data parameter updating of the edge model according to the downloading gradient; the downloading gradient is the data parameter difference value between the cloud backup model in the server and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update; the edge device copies the data parameters of the edge model after the current data parameter updating to update the edge replica model, trains the updated edge replica model based on a local training sample, and acquires the data parameter gradient of the trained edge replica model after the training is finished; an uploading gradient is then selected from the data parameter gradients and uploaded according to the following formula:

$$m_j = M \cdot \frac{c_j}{\sum_{i=1}^{n} c_i}$$

in the formula, n is the number of all edge devices, the target edge device is the j-th edge device, m_j is the number of uploading gradients, M is the total number of data parameters contained in the edge model, c_j is the computing capability of the target edge device, and c_i is the computing capability of the i-th edge device. The uploading gradient is used to instruct the server to complete the data parameter updating of the cloud model based on the expert model.
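As a worked example of the selection count, the number of uploading gradients for a device is the total number of data parameters M multiplied by that device's share of the total computing capability. The sketch below assumes the result is rounded down to a whole number (the text does not specify rounding) and uses hypothetical capability values.

```python
import math

def upload_count(capabilities, j, total_params):
    # m_j = M * c_j / sum_i c_i, rounded down (rounding is an assumption).
    return math.floor(total_params * capabilities[j] / sum(capabilities))

capabilities = [1.0, 2.0, 5.0]   # hypothetical computing capabilities c_1..c_3
counts = [upload_count(capabilities, j, total_params=1000) for j in range(3)]
```

With these values the three devices upload 125, 250 and 625 gradients respectively, so a more capable device contributes a larger share of the M parameters.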

On the server side, the server transmits the downloading gradient to the edge device; the downloading gradient is used to instruct the edge device to complete the data parameter updating of the edge model; the downloading gradient is the data parameter difference value between the cloud backup model and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update; if the server receives the uploading gradient uploaded by the edge device, the expert model shares its experience with the cloud model, and the data parameter updating of the cloud model is completed; the uploading gradient is selected by the edge device from the data parameter gradients of the trained edge replica model; the trained edge replica model is obtained by the edge device training the edge replica model based on a local training sample; the edge replica model is obtained by the edge device copying the edge model after the data parameters are updated; the cloud model is updated with the following loss function:

$$\mathrm{Loss} = \mathrm{KL}\left(P_{\mathrm{expert}}(x) \,\|\, P_{\mathrm{cloud}}(x)\right)$$

wherein KL is the KL divergence, P_expert is the probability distribution of the expert model, P_cloud is the probability distribution of the cloud model, and x is the input of the neural network model.
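A loss of this kind, a KL divergence between the expert model's and the cloud model's output distributions for the same input x, can be sketched as follows. The sketch assumes that both models produce logits that are turned into probabilities with a softmax, and that the divergence is taken from the expert distribution to the cloud distribution, as is common in knowledge distillation; neither detail is confirmed by the text. The function names are illustrative.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_k p_k * log(p_k / q_k); eps guards against log(0).
    return sum(pk * math.log((pk + eps) / (qk + eps))
               for pk, qk in zip(p, q) if pk > 0)

def expert_guided_loss(expert_logits, cloud_logits):
    # Assumed direction: KL(P_expert(x) || P_cloud(x)).
    return kl_divergence(softmax(expert_logits), softmax(cloud_logits))

same = expert_guided_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = expert_guided_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0])
```

The loss is zero when the cloud model reproduces the expert model's distribution and grows as the two diverge, which is how the expert model can steer the cloud model's update.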

And for the edge equipment 1 to the edge equipment n, acquiring a downloading gradient in a synchronous downloading mode, and uploading the uploading gradient in an asynchronous uploading mode.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

In the description herein, references to "some embodiments," "other embodiments," "desired embodiments," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example.

The technical features of the above embodiments can be arbitrarily combined. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of the present specification.

The above-mentioned embodiments express only several embodiments of the present application, and although their description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A neural network model training method is characterized by comprising the following steps:

acquiring a downloading gradient of a server, and updating the data parameters of the edge model according to the downloading gradient; the downloading gradient is a data parameter difference value of the cloud backup model in the server and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

obtaining an edge replica model of the edge model after the current data parameter is updated through copying, training the edge replica model based on a local training sample, and obtaining a data parameter gradient of the trained edge replica model;

selecting an uploading gradient from the data parameter gradients according to a preset selection rule and uploading the uploading gradient; and the uploading gradient is used for indicating the server to complete the data parameter updating of the cloud model based on an expert model.

2. The neural network model training method of claim 1,

the preset selection rule comprises: setting the number of the uploading gradients to the product of the total number of data parameters contained in the edge model and the terminal computing capacity ratio, and selecting the uploading gradients by either one or a combination of the following rules: selecting the data parameter gradients in order from largest to smallest, or selecting them randomly;

and the terminal computing capacity ratio is the ratio of the computing capacity of the target terminal to the sum of the computing capacities of all the terminals.

3. The neural network model training method of claim 2, wherein in the step of selecting an upload gradient from the data parameter gradients according to a preset selection rule, the upload gradient is selected by using the following formula:

$$m_j = M \cdot \frac{c_j}{\sum_{i=1}^{n} c_i}$$

in the formula, n is the number of all terminals, the target terminal is the j-th terminal, m_j is the number of the uploading gradients, M is the total number of data parameters contained in the edge model, c_j is the computing capability of the target terminal, and c_i is the computing capability of the i-th terminal.

4. A neural network model training method is characterized by comprising the following steps:

transmitting the download gradient to the terminal; the downloading gradient is used for indicating the terminal to complete the updating of the data parameter of the edge model; the downloading gradient is a data parameter difference value of the cloud backup model and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

if the uploading gradient uploaded by the terminal is received, updating the data parameters of the cloud model based on an expert model; the uploading gradient is obtained by selecting the data parameter gradient of the trained edge copy model through the terminal according to a preset selection rule; the trained edge copy model is obtained by training the edge copy model based on a local training sample through the terminal; and the edge replica model is obtained by copying the edge model after the data parameter is updated at this time through the terminal.

5. The neural network model training method of claim 4, wherein the step of transmitting the download gradient to the terminal is preceded by the step of:

deploying the expert model reaching a preset training target;

initializing the cloud model according to the expert model reaching a preset training target.

6. The neural network model training method according to claim 4, wherein in the step of completing the updating of the current data parameters of the cloud model based on the expert model, the cloud model is updated by using a loss function as follows:

$$\mathrm{Loss} = \mathrm{KL}\left(P_{\mathrm{expert}}(x) \,\|\, P_{\mathrm{cloud}}(x)\right)$$

wherein KL is the KL divergence, P_expert is the probability distribution of the expert model, P_cloud is the probability distribution of the cloud model, and x is the input of the neural network model.

7. A neural network model training device, comprising:

the training module is used for obtaining an edge replica model of the edge model after the data parameter updating through copying, training the edge replica model based on a local training sample, and obtaining the data parameter gradient of the trained edge replica model;

the uploading module is used for selecting the uploading gradient from the data parameter gradients according to a preset selection rule and uploading the uploading gradient; and the uploading gradient is used for indicating the server to complete the data parameter updating of the cloud model based on an expert model.

8. A neural network model training device, comprising:

the data transmission module is used for transmitting the downloading gradient to the terminal; the downloading gradient is used for indicating the terminal to complete the updating of the data parameter of the edge model; the downloading gradient is a data parameter difference value of the cloud backup model and the cloud model after the last data parameter updating; the cloud backup model is a copy of the cloud model before the last data parameter update;

the updating module is used for finishing the updating of the data parameters of the cloud model based on an expert model if the uploading gradient uploaded by the terminal is received; the uploading gradient is obtained by selecting the data parameter gradient of the trained edge copy model through the terminal according to a preset selection rule; the trained edge copy model is obtained by training the edge copy model based on a local training sample through the terminal; and the edge replica model is obtained by copying the edge model after the data parameter is updated at this time through the terminal.

9. A neural network model training system is characterized by comprising a server and a plurality of terminals which are connected with the server;

the terminal is configured to perform the steps of the method of any one of claims 1 to 3;

the server is configured to perform the steps of the method of any one of claims 4 to 6;

each terminal acquires the downloading gradient in a synchronous downloading mode; and each terminal uploads the uploading gradient in an asynchronous uploading mode.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.