CN113568740B - Model aggregation method, system, equipment and medium based on federal learning
- Publication number
- CN113568740B (application CN202110804987.2A)
- Authority
- CN
- China
- Prior art keywords
- aggregation
- clients
- model
- index
- performance indexes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources to service a request, the resource being the memory
- G06F9/5027—Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention discloses a model aggregation method, system, equipment and medium based on federal learning. The method is suitable for a management end deployed in a distributed network in which the management end and a plurality of clients are deployed, and comprises the following steps: acquiring the initial training models obtained by the respective training of the plurality of clients, and the performance indexes of the clients; and aggregating each initial training model according to a target aggregation algorithm, selected from a plurality of preset aggregation algorithms to correspond to those performance indexes, to obtain an aggregation model. The initial training models trained by the plurality of clients are thus aggregated on the basis of the clients' performance indexes; by configuring a plurality of aggregation algorithms in advance and selecting the algorithm that best matches the clients' performance indexes, the aggregation speed can be accelerated.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a model aggregation method, system, equipment and medium based on federal learning.
Background
As regulations in various fields mature, industries face increasingly strict data security requirements. Sensitive data is unsuitable for external exchange, yet training models on such data is desirable; meanwhile, small and medium-sized enterprises rarely hold enough data to train high-precision models on their own, so data islands form. Federal learning solves the problem of sensitive data interaction, but the models generated by different clients vary in precision, so existing aggregation schemes have complex flows, low aggregation speed, and weak aggregation capability in specific environments.
Disclosure of Invention
The invention aims to overcome the defects of the existing aggregation models in the prior art, namely complex flows and low aggregation speed, and provides a model aggregation method, system, equipment and medium based on federal learning.
The invention solves the technical problems by the following technical scheme:
the first aspect of the present invention provides a model aggregation method based on federal learning, the model aggregation method is suitable for a management end deployed in a distributed network, wherein the management end and a plurality of clients are deployed in the distributed network, and the model aggregation method includes:
acquiring an initial training model obtained by training each of the plurality of clients;
acquiring performance indexes of the plurality of clients;
selecting a target aggregation algorithm corresponding to the performance indexes of the plurality of clients from a plurality of preset aggregation algorithms;
and aggregating each initial training model according to the target aggregation algorithm to obtain an aggregation model.
Preferably, the obtaining of the performance indexes of the plurality of clients includes:
acquiring hardware performance indexes of the plurality of clients;
and acquiring software performance indexes of the plurality of clients.
Preferably, the obtaining of the hardware performance indexes of the plurality of clients includes:
acquiring the hardware performance indexes of the plurality of clients according to the memories, CPUs, storage spaces and/or electric quantities of the plurality of clients.
Preferably, the obtaining of the software performance indexes of the plurality of clients includes:
acquiring the software performance indexes of the plurality of clients according to the operating systems, training sample counts and/or connected network types of the plurality of clients.
Preferably, the selecting of a target aggregation algorithm corresponding to the performance indexes of the plurality of clients from a plurality of preset aggregation algorithms includes:
acquiring index ranges to which performance indexes of the plurality of clients belong;
and selecting the target aggregation algorithm from the plurality of aggregation algorithms according to the index range.
The second aspect of the invention provides a model aggregation system based on federal learning, which is applicable to a management end deployed in a distributed network, wherein the management end and a plurality of clients are deployed in the distributed network, and the model aggregation system comprises a first acquisition module, a second acquisition module, a selection module and an aggregation module;
The first acquisition module is used for acquiring initial training models obtained by respective training of the plurality of clients;
the second acquisition module is used for acquiring performance indexes of the plurality of clients;
the selection module is used for selecting a target aggregation algorithm corresponding to the performance indexes of the clients from a plurality of preset aggregation algorithms;
and the aggregation module is used for aggregating the initial training models according to the target aggregation algorithm to obtain an aggregation model.
Preferably, the second acquisition module comprises a first acquisition unit and a second acquisition unit;
the first obtaining unit is used for obtaining hardware performance indexes of the plurality of clients;
the second obtaining unit is used for obtaining the software performance indexes of the plurality of clients.
Preferably, the first obtaining unit is specifically configured to obtain the hardware performance indexes of the plurality of clients according to the memories, the CPUs, the storage spaces and/or the electric quantities of the plurality of clients.
Preferably, the second obtaining unit is specifically configured to obtain the software performance indexes of the plurality of clients according to the operating systems, the training sample numbers and/or the connected network types of the plurality of clients.
Preferably, the selection module comprises a third acquisition unit and a selection unit;
The third obtaining unit is configured to obtain an index range to which performance indexes of the plurality of clients belong;
the selection unit is used for selecting the target aggregation algorithm from the aggregation algorithms according to the index range.
The third aspect of the present invention provides a model aggregation method based on federal learning, the model aggregation method is applicable to any client deployed in a distributed network, where a management end and a plurality of clients are deployed in the distributed network, the model aggregation method includes:
uploading, together with the other clients, the initial training models obtained by their respective training to the management end, so that after obtaining each initial training model the management end obtains the performance indexes of the plurality of clients, selects a target aggregation algorithm corresponding to those performance indexes from a plurality of preset aggregation algorithms, and aggregates each initial training model according to the target aggregation algorithm to obtain an aggregation model.
Preferably, the model aggregation method further comprises:
and after the management end is disconnected from the distributed network, electing, together with the other clients, at least one client from the plurality of clients as a new management end.
The fourth aspect of the present invention provides a model aggregation system based on federal learning, the model aggregation system is suitable for any client deployed in a distributed network, wherein a management end and a plurality of clients are deployed in the distributed network, and the model aggregation system includes an uploading module;
The uploading module is used for uploading, together with the other clients, the initial training models obtained by their respective training to the management end, so that after obtaining each initial training model the management end obtains the performance indexes of the plurality of clients, selects a target aggregation algorithm corresponding to those performance indexes from a plurality of preset aggregation algorithms, and aggregates each initial training model according to the target aggregation algorithm to obtain an aggregation model.
Preferably, the model aggregation system further comprises an election module;
The election module is used for electing, together with the other clients, at least one client from the plurality of clients as a new management end after the management end is disconnected from the distributed network.
A fifth aspect of the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the federal learning-based model aggregation method according to the first aspect or performing the federal learning-based model aggregation method according to the third aspect when executing the computer program.
A sixth aspect of the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the federal learning-based model aggregation method according to the first aspect or performs the federal learning-based model aggregation method according to the third aspect.
The invention has the following positive effects:
The initial training models obtained by the training of the plurality of clients and the performance indexes of the clients are obtained first; a target aggregation algorithm corresponding to the performance indexes of the plurality of clients is then selected from a plurality of preset aggregation algorithms; finally, each initial training model is aggregated according to the target aggregation algorithm to obtain an aggregation model. The initial training models trained by the clients are thus aggregated on the basis of the clients' performance indexes, and since the running efficiency of the same aggregation algorithm can differ under different performance indexes, configuring a plurality of aggregation algorithms in advance and selecting the algorithm that best matches the clients' performance indexes can accelerate aggregation.
Drawings
FIG. 1 is a flow chart of a federally learning-based model aggregation method according to embodiment 1 of the present invention.
FIG. 2 is a block diagram of a federally learning-based model aggregation system according to embodiment 2 of the present invention.
FIG. 3 is a flow chart of a federally learning-based model aggregation method according to embodiment 3 of the present invention.
FIG. 4 is a block diagram of a federally learning-based model aggregation system according to embodiment 4 of the present invention.
Fig. 5 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the present embodiment provides a model aggregation method based on federal learning, which is applicable to a management end deployed in a distributed network, where the management end and a plurality of clients are deployed in the distributed network, and specifically, the model aggregation method includes:
step 101, obtaining initial training models obtained by training a plurality of clients.
In this embodiment, the management end creates a plurality of aggregation instances in advance. For example, an aggregation instance may be a face recognition instance, a voice recognition instance, or another kind of instance, and each aggregation instance corresponds to one initial model. When a plurality of clients need to perform model aggregation for a certain aggregation instance, the management end receives the clients' requests to join the aggregation instance and audits them. After the audit passes, the clients download the initial model corresponding to the aggregation instance from the management end, train it to obtain initial training models, and then upload the trained initial training models to the management end via protocols such as HTTP (hypertext transfer protocol).
It should be noted that a client may upload the initial training model file directly to the management end, or may upload only the initial training model parameters.
In this embodiment, the management end can audit the join requests sent by the clients by means of signature authentication: if signature authentication succeeds, the audit passes; if signature authentication fails, the audit fails. Of course, the audit can also be performed in other ways; the audit result is stored in the database and the storage result is returned.
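By way of illustration only, the following minimal Python sketch shows how such a signature-based audit of a join request might look. The HMAC scheme, the CLIENT_KEYS table, and the function name are assumptions introduced for the example; the embodiment does not fix a particular signature mechanism.

```python
import hashlib
import hmac

# Hypothetical pre-shared keys for registered clients (assumption: the
# embodiment does not specify how signing keys are provisioned).
CLIENT_KEYS = {"client-001": b"secret-key-001"}

def audit_join_request(client_id: str, payload: bytes, signature: str) -> bool:
    """Audit a client's request to join an aggregation instance: the audit
    passes when the signature authenticates and fails otherwise."""
    key = CLIENT_KEYS.get(client_id)
    if key is None:
        return False
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest gives a constant-time comparison
    return hmac.compare_digest(expected, signature)
```

A passing audit result would then be stored in the database and returned, as described above.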
Step 102, obtaining performance indexes of a plurality of clients.
In this embodiment, the management end receives the basic information periodically reported by the clients, calculates the clients' performance indexes from that information, and thereby provides a decision basis for scheduling the aggregation algorithm.
In this embodiment, a client's basic information may be its electric quantity, memory, storage space, CPU (central processing unit), operating system, number of training samples, connected network type, or other information, and is not specifically limited here.
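By way of illustration, the basic information a client reports could be represented as follows; the field set mirrors the examples above, and the names are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class ClientInfo:
    """Basic information a client reports periodically to the management
    end (illustrative field set; the embodiment does not fix a schema)."""
    client_id: str
    current_power: float        # remaining electric quantity
    total_power: float
    current_memory: float       # free memory
    total_memory: float
    current_storage: float      # free storage space
    total_storage: float
    cpu_cores: int
    operating_system: str       # e.g. "ios" or "android"
    num_training_samples: int
    network_type: str           # e.g. "4g" or "5g"
```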
Step 103, selecting a target aggregation algorithm corresponding to the performance indexes of the clients from a plurality of preset aggregation algorithms.
In this embodiment, the management end presets a plurality of basic model aggregation algorithms. For example, the aggregation algorithms include, but are not limited to, the model average aggregation algorithm, the BMUF (blockwise model-update filtering) aggregation algorithm, the ADMM (alternating direction method of multipliers) aggregation algorithm, the synchronous SGD (stochastic gradient descent) aggregation algorithm, the asynchronous SGD aggregation algorithm, the AdaDelay (delay-adaptive asynchronous) aggregation algorithm, the Hogwild (lock-free parallel SGD) aggregation algorithm, the exclusion model aggregation algorithm, and the decentralized aggregation algorithm; imported custom aggregation algorithms are also supported.
In this embodiment, the management end selects, from the plurality of preset aggregation algorithms, the target aggregation algorithm corresponding to the performance indexes of the plurality of clients.
And 104, aggregating each initial training model according to a target aggregation algorithm to obtain an aggregation model.
In this embodiment, after the management end generates the aggregation model, all clients participating in aggregation are notified to download the aggregation model, or download the latest initial training model file or initial training model parameters.
In this embodiment, the performance indexes of the plurality of clients acquired by the management end generally fall in the same interval; the management end selects a subset of those performance indexes to choose the aggregation algorithm, and uses only one aggregation algorithm to aggregate the initial training models.
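By way of illustration, a minimal sketch of the model average aggregation algorithm named above is shown below; weighting by training sample count (FedAvg-style) is an assumption, since this embodiment only names the algorithm.

```python
import numpy as np

def model_average(client_models, sample_counts=None):
    """Layer-wise weighted average of the initial training models.

    Each element of client_models is one client's model as a list of
    numpy arrays (one per layer)."""
    n = len(client_models)
    if sample_counts is None:
        coeffs = [1.0 / n] * n            # plain averaging
    else:
        total = sum(sample_counts)        # FedAvg-style weighting (assumption)
        coeffs = [c / total for c in sample_counts]
    return [sum(c * layer for c, layer in zip(coeffs, layers))
            for layers in zip(*client_models)]

# usage: aggregate two single-layer models with 3:1 sample weighting
# aggregated = model_average([[np.ones(3)], [np.zeros(3)]], sample_counts=[3, 1])
```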
In this embodiment, the basic information of the aggregation object is saved, for example its name, unique code, aggregation round count, number of participating clients, aggregation trigger rule (such as a minimum number of devices), aggregation failure rule, selected aggregation flow, and elected master node (i.e., the management end).
In one embodiment, step 102 comprises: acquiring hardware performance indexes of a plurality of clients; and acquiring software performance indexes of a plurality of clients.
In the implementation process, hardware performance indexes of the clients are obtained according to the memories, the CPUs, the storage spaces and/or the electric quantity of the clients.
In this embodiment, when calculating the hardware performance index, the client's memory score is weighted 20%, the CPU score 30%, the storage space score 20%, and the electric quantity score 30%; these weights may be set according to the actual situation and are not specifically limited here.
In the implementation process, according to the operating systems, the training sample numbers and/or the connected network types of the plurality of clients, the software performance indexes of the plurality of clients are obtained.
In this embodiment, when calculating the software performance index, the client's operating system score is weighted 30% (within the operating system index, iOS counts 20% and Android 10%), the training sample count score 40%, and the connected network type score 70% (4G counting 20% and 5G 50%); these weights may be set according to the actual situation and are not specifically limited here.
In this embodiment, the hardware index total score is calculated as:
hardware index total score = (current electric quantity / total electric quantity) × 30% + (current memory / total memory) × 20% + (current storage space / total storage space) × 20% + CPU core count × 30%.
The software index total score is calculated as:
software index total score = (operating system = 1) × (iOS = 20%, Android = 10%) + (training sample count > 10000 = 1, or training sample count < 1000 = 0.2) × 40% + (connected network type = 1) × (4G = 20%, 5G = 50%),
where "(operating system = 1)" means the index (here, the operating system) starts from an initial value of 1 before its weight is applied.
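Reusing the ClientInfo sketch above, the two totals could be computed as follows; the handling of training sample counts between 1000 and 10000, which the formulas leave unspecified, is an assumption flagged in the code.

```python
def hardware_index_total(info: ClientInfo) -> float:
    # hardware index total score per the formula above
    return ((info.current_power / info.total_power) * 0.30
            + (info.current_memory / info.total_memory) * 0.20
            + (info.current_storage / info.total_storage) * 0.20
            + info.cpu_cores * 0.30)

def software_index_total(info: ClientInfo) -> float:
    # software index total score per the formula above; each index starts
    # from an initial value of 1 before its weight is applied
    os_weight = {"ios": 0.20, "android": 0.10}.get(info.operating_system, 0.0)
    if info.num_training_samples > 10000:
        sample_value = 1.0
    elif info.num_training_samples < 1000:
        sample_value = 0.2
    else:
        sample_value = 1.0  # assumption: counts in [1000, 10000] keep the initial value
    net_weight = {"4g": 0.20, "5g": 0.50}.get(info.network_type, 0.0)
    return 1.0 * os_weight + sample_value * 0.40 + 1.0 * net_weight
```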
In one embodiment, step 103 includes:
First, an index range to which performance indexes of a plurality of clients belong is obtained.
Then, the target aggregation algorithm is selected from a plurality of aggregation algorithms according to the index range.
In this embodiment, when a client's hardware index total score is >= 1.5 and its software index total score is <= 0.5, a synchronous aggregation algorithm is suitable. Within that range: if CPU core count × 30% > 2.0, the ADMM aggregation algorithm is suitable; if (current electric quantity / total electric quantity) × 30% > 0.2, the synchronous SGD aggregation algorithm is suitable; if (current memory / total memory) × 20% > 0.1, the model average aggregation algorithm is suitable; and if none of the individual scores (CPU core count, electric quantity, memory) falls in its specified index range, the model average aggregation algorithm is selected by default.
When a client's hardware index total score is <= 1.5 and its software index total score is >= 0.7, an asynchronous aggregation algorithm is suitable. Within that range: if the network type score is > 0.2, the asynchronous SGD aggregation algorithm is suitable; if the network type score is < 0.1, the AdaDelay aggregation algorithm is suitable; if the training sample score ((training sample count > 10000 = 1, or training sample count < 1000 = 0.2) × 40%) is > 0.3, the Hogwild aggregation algorithm is suitable; and if none of the individual scores (operating system, training sample count, network type) falls in its specified index range, the asynchronous SGD aggregation algorithm is selected by default.
It should be noted that a high hardware index total score suits a synchronous aggregation algorithm, a high software index total score suits an asynchronous aggregation algorithm, and the more CPU cores a client has, the stronger its computing power.
If a client's hardware index total score and software index total score meet neither specified index range but are above the minimum thresholds (0.5 and 0.7, respectively), the management end selects the model average aggregation algorithm; below the minimum thresholds, the exclusion model aggregation algorithm is selected.
For example, synchronous aggregation algorithms such as the synchronous SGD aggregation algorithm and the model average aggregation algorithm place higher demands on a client's electric quantity, memory, storage space, and the like; electric quantity is a main index, accounting for 50% of all indexes [hardware index total score = (current electric quantity / total electric quantity) × 30% + (current memory / total memory) × 20% + (current storage space / total storage space) × 20% + CPU core count × 30%], because the aggregation operation can only run after all clients have uploaded their initial training models. Indexes such as CPU, memory, and training sample count are used to decide the aggregation algorithm for large models.
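Pulling the rules above together, a minimal selection routine might look as follows, reusing the score functions sketched earlier. The order in which the individual sub-scores are checked, and the pairing of the minimum thresholds (0.5 with the hardware score, 0.7 with the software score), are assumptions, since the embodiment does not state them explicitly.

```python
def select_aggregation_algorithm(info: ClientInfo) -> str:
    """Map a client's index ranges to a target aggregation algorithm."""
    hw = hardware_index_total(info)
    sw = software_index_total(info)
    if hw >= 1.5 and sw <= 0.5:                        # synchronous family
        if info.cpu_cores * 0.30 > 2.0:
            return "ADMM"
        if (info.current_power / info.total_power) * 0.30 > 0.2:
            return "synchronous SGD"
        if (info.current_memory / info.total_memory) * 0.20 > 0.1:
            return "model average"
        return "model average"                         # family default
    if hw <= 1.5 and sw >= 0.7:                        # asynchronous family
        net = {"4g": 0.20, "5g": 0.50}.get(info.network_type, 0.0)
        if net > 0.2:
            return "asynchronous SGD"
        if net < 0.1:
            return "AdaDelay"
        sample = 1.0 if info.num_training_samples > 10000 else 0.2
        if sample * 0.40 > 0.3:
            return "Hogwild"
        return "asynchronous SGD"                      # family default
    # neither index range matched; assumption: 0.5 is the hardware minimum
    # threshold and 0.7 the software minimum threshold
    if hw > 0.5 and sw > 0.7:
        return "model average"
    return "exclusion model"
```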
In this embodiment, the initial training models obtained by the training of the plurality of clients and the performance indexes of the clients are obtained; a target aggregation algorithm corresponding to those performance indexes is then selected from a plurality of preset aggregation algorithms; finally, each initial training model is aggregated according to the target aggregation algorithm to obtain an aggregation model. Aggregation of the initial training models is thus based on the clients' performance indexes, and since the running efficiency of the same aggregation algorithm can differ under different performance indexes, configuring a plurality of aggregation algorithms in advance and selecting the algorithm that best matches the clients' performance indexes can accelerate aggregation.
Example 2
As shown in fig. 2, the present embodiment provides a model aggregation system based on federal learning, which is suitable for a management end deployed in a distributed network, where the management end and a plurality of clients are deployed in the distributed network, and includes a first acquisition module 1, a second acquisition module 2, a selection module 3, and an aggregation module 4.
The first obtaining module 1 is configured to obtain an initial training model obtained by training each of the plurality of clients.
In this embodiment, the management end creates a plurality of aggregation instances in advance. For example, an aggregation instance may be a face recognition instance, a voice recognition instance, or another kind of instance, and each aggregation instance corresponds to one initial model. When a plurality of clients need to perform model aggregation for a certain aggregation instance, the management end receives the clients' requests to join the aggregation instance and audits them. After the audit passes, the clients download the initial model corresponding to the aggregation instance from the management end, train it to obtain initial training models, and then upload the trained initial training models to the management end via protocols such as HTTP (hypertext transfer protocol).
It should be noted that a client may upload the initial training model file directly to the management end, or may upload only the initial training model parameters.
In this embodiment, the management end can audit the join requests sent by the clients by means of signature authentication: if signature authentication succeeds, the audit passes; if signature authentication fails, the audit fails. Of course, the audit can also be performed in other ways; the audit result is stored in the database and the storage result is returned.
The second obtaining module 2 is configured to obtain performance indexes of a plurality of clients.
In this embodiment, the management end receives the basic information periodically reported by the clients, calculates the clients' performance indexes from that information, and thereby provides a decision basis for scheduling the aggregation algorithm.
In this embodiment, the basic information of the client may be the electric quantity, the memory, the storage space, the CPU, the operating system, the number of training samples, the connected network type, or other information of the client, which is not limited herein.
The selection module 3 is configured to select a target aggregation algorithm corresponding to performance indexes of a plurality of clients from a plurality of preset aggregation algorithms.
In this embodiment, the management end presets a plurality of basic model aggregation algorithms. For example, the aggregation algorithms include, but are not limited to, the model average aggregation algorithm, the BMUF aggregation algorithm, the ADMM aggregation algorithm, the synchronous SGD aggregation algorithm, the asynchronous SGD aggregation algorithm, the AdaDelay aggregation algorithm, the Hogwild aggregation algorithm, the exclusion model aggregation algorithm, and the decentralized aggregation algorithm; imported custom aggregation algorithms are also supported.
In this embodiment, the management end selects, from the plurality of preset aggregation algorithms, the target aggregation algorithm corresponding to the performance indexes of the plurality of clients.
The aggregation module 4 is configured to aggregate each initial training model according to a target aggregation algorithm, so as to obtain an aggregate model.
In this embodiment, after the management end generates the aggregation model, all clients participating in aggregation are notified to download the aggregation model, or download the latest initial training model file or initial training model parameters.
In this embodiment, the performance indexes of the plurality of clients acquired by the management end generally fall in the same interval; the management end selects a subset of those performance indexes to choose the aggregation algorithm, and uses only one aggregation algorithm to aggregate the initial training models.
In this embodiment, the basic information of the aggregation object is saved, for example its name, unique code, aggregation round count, number of participating clients, aggregation trigger rule (such as a minimum number of devices), aggregation failure rule, selected aggregation flow, and elected master node (i.e., the management end).
In an embodiment, as shown in fig. 2, the second acquisition module 2 includes a first acquisition unit 21 and a second acquisition unit 22.
The first obtaining unit 21 is configured to obtain hardware performance indexes of a plurality of clients.
The second obtaining unit 22 is configured to obtain software performance indexes of a plurality of clients.
In a specific implementation process, the first obtaining unit 21 is specifically configured to obtain the hardware performance indexes of the plurality of clients according to the memories, CPUs, storage spaces and/or electric quantities of the plurality of clients.
In this embodiment, when calculating the hardware performance index, the client's memory score is weighted 20%, the CPU score 30%, the storage space score 20%, and the electric quantity score 30%; these weights may be set according to the actual situation and are not specifically limited here.
In a specific implementation process, the second obtaining unit 22 is specifically configured to obtain the software performance indexes of the plurality of clients according to the operating systems, the training sample numbers, and/or the connected network types of the plurality of clients.
In this embodiment, when calculating the software performance index, the client's operating system score is weighted 30% (within the operating system index, iOS counts 20% and Android 10%), the training sample count score 40%, and the connected network type score 70% (4G counting 20% and 5G 50%); these weights may be set according to the actual situation and are not specifically limited here.
In this embodiment, the hardware index total score is calculated as:
hardware index total score = (current electric quantity / total electric quantity) × 30% + (current memory / total memory) × 20% + (current storage space / total storage space) × 20% + CPU core count × 30%.
The software index total score is calculated as:
software index total score = (operating system = 1) × (iOS = 20%, Android = 10%) + (training sample count > 10000 = 1, or training sample count < 1000 = 0.2) × 40% + (connected network type = 1) × (4G = 20%, 5G = 50%),
where "(operating system = 1)" means the index (here, the operating system) starts from an initial value of 1 before its weight is applied.
In an embodiment, as shown in fig. 2, the selection module 3 includes a third acquisition unit 311 and a selection unit 312.
The third obtaining unit 311 is configured to obtain an index range to which performance indexes of the plurality of clients belong.
The selection unit 312 is configured to select a target aggregation algorithm from a plurality of aggregation algorithms according to the index range.
In this embodiment, when a client's hardware index total score is >= 1.5 and its software index total score is <= 0.5, a synchronous aggregation algorithm is suitable. Within that range: if CPU core count × 30% > 2.0, the ADMM aggregation algorithm is suitable; if (current electric quantity / total electric quantity) × 30% > 0.2, the synchronous SGD aggregation algorithm is suitable; if (current memory / total memory) × 20% > 0.1, the model average aggregation algorithm is suitable; and if none of the individual scores (CPU core count, electric quantity, memory) falls in its specified index range, the model average aggregation algorithm is selected by default.
When a client's hardware index total score is <= 1.5 and its software index total score is >= 0.7, an asynchronous aggregation algorithm is suitable. Within that range: if the network type score is > 0.2, the asynchronous SGD aggregation algorithm is suitable; if the network type score is < 0.1, the AdaDelay aggregation algorithm is suitable; if the training sample score ((training sample count > 10000 = 1, or training sample count < 1000 = 0.2) × 40%) is > 0.3, the Hogwild aggregation algorithm is suitable; and if none of the individual scores (operating system, training sample count, network type) falls in its specified index range, the asynchronous SGD aggregation algorithm is selected by default.
It should be noted that a high hardware index total score suits a synchronous aggregation algorithm, a high software index total score suits an asynchronous aggregation algorithm, and the more CPU cores a client has, the stronger its computing power.
If a client's hardware index total score and software index total score meet neither specified index range but are above the minimum thresholds (0.5 and 0.7, respectively), the management end selects the model average aggregation algorithm; below the minimum thresholds, the exclusion model aggregation algorithm is selected.
For example, synchronous aggregation algorithms such as the synchronous SGD aggregation algorithm and the model average aggregation algorithm place higher demands on a client's electric quantity, memory, storage space, and the like; electric quantity is a main index, accounting for 50% of all indexes [hardware index total score = (current electric quantity / total electric quantity) × 30% + (current memory / total memory) × 20% + (current storage space / total storage space) × 20% + CPU core count × 30%], because the aggregation operation can only run after all clients have uploaded their initial training models. Indexes such as CPU, memory, and training sample count are used to decide the aggregation algorithm for large models.
In this embodiment, the initial training models obtained by the training of the plurality of clients and the performance indexes of the clients are obtained; a target aggregation algorithm corresponding to those performance indexes is then selected from a plurality of preset aggregation algorithms; finally, each initial training model is aggregated according to the target aggregation algorithm to obtain an aggregation model. Aggregation of the initial training models is thus based on the clients' performance indexes, and since the running efficiency of the same aggregation algorithm can differ under different performance indexes, configuring a plurality of aggregation algorithms in advance and selecting the algorithm that best matches the clients' performance indexes can accelerate aggregation.
Example 3
The embodiment provides a model aggregation method based on federal learning, which is applicable to any client deployed in a distributed network, wherein a management end and a plurality of clients are deployed in the distributed network, as shown in fig. 3, and the model aggregation method includes:
Step 301, uploading, together with the other clients, the initial training models obtained by their respective training to the management end, so that after obtaining each initial training model the management end obtains the performance indexes of the plurality of clients, selects a target aggregation algorithm corresponding to those performance indexes from a plurality of preset aggregation algorithms, and aggregates each initial training model according to the target aggregation algorithm to obtain an aggregation model.
In this embodiment, the clients send requests to join an aggregation instance to the management end. After the requests pass the audit, the clients download the initial model corresponding to the aggregation instance from the management end, train it to obtain initial training models, and upload the trained initial training models to the management end via protocols such as HTTP.
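By way of illustration, a client-side upload over HTTP might look as follows; the endpoint path, form fields, and use of the requests library are assumptions for the sketch.

```python
import requests

def upload_initial_model(management_url: str, client_id: str, model_path: str):
    """Upload a trained initial model file to the management end over HTTP
    (illustrative endpoint and fields; the embodiment only names HTTP)."""
    with open(model_path, "rb") as f:
        resp = requests.post(
            f"{management_url}/aggregation/upload",
            files={"model": f},
            data={"client_id": client_id},
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()
```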
In one embodiment, as shown in fig. 3, the model aggregation method further includes:
Step 302, after the management end is disconnected from the distributed network, electing, together with the other clients, at least one client from the plurality of clients to serve as a new management end.
In this embodiment, after losing network connection with the management end, the clients can elect a new master node (i.e., a new management end) to perform aggregation and to distribute metadata (e.g., aggregation instance information). Specifically, when the original master node fails, one or more backup master nodes are elected from among the clients and continue executing the aggregation task.
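As a minimal sketch of the election step, the surviving clients could, for example, promote the peer with the highest hardware index total score (reusing the scoring sketch from embodiment 1); the election criterion is an assumption, since the embodiment only requires that at least one client be elected.

```python
def elect_new_management_end(reachable_clients):
    """Elect a new master node (management end) from the clients that are
    still reachable after the original management end goes offline."""
    # assumption: the hardware index total score serves as the ballot
    return max(reachable_clients, key=hardware_index_total)
```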
In this embodiment, the clients upload the initial training models obtained by their respective training to the management end, realizing aggregation of the initial training models. After the management end is disconnected from the distributed network, at least one client is elected from the plurality of clients to serve as a new management end, so the clients can self-organize into a network and perform small-scale model aggregation in weak-network or no-network environments, enhancing their model aggregation capability in specific environments.
Example 4
The present embodiment provides a model aggregation system based on federal learning, which is applicable to any client deployed in a distributed network, where a management end and a plurality of clients are deployed in the distributed network, and as shown in fig. 4, the model aggregation system includes an upload module 41.
The uploading module 41 is configured to upload, together with the other clients, the initial training models obtained by their respective training to the management end, so that after obtaining each initial training model the management end obtains the performance indexes of the plurality of clients, selects a target aggregation algorithm corresponding to those performance indexes from a plurality of preset aggregation algorithms, and aggregates each initial training model according to the target aggregation algorithm to obtain an aggregation model.
In this embodiment, the clients send requests to join an aggregation instance to the management end. After the requests pass the audit, the clients download the initial model corresponding to the aggregation instance from the management end, train it to obtain initial training models, and upload the trained initial training models to the management end via protocols such as HTTP.
In one embodiment, as shown in FIG. 4, the model aggregation system further includes an election module 42.
The election module 42 is configured to elect at least one client from the plurality of clients as a new management client together with other clients after the management client is disconnected from the distributed network.
In this embodiment, after losing network connection with the management end, the clients can elect a new master node (i.e., a new management end) to perform aggregation and to distribute metadata (e.g., aggregation instance information). Specifically, when the original master node fails, one or more backup master nodes are elected from among the clients and continue executing the aggregation task.
In this embodiment, the clients upload the initial training models obtained by their respective training to the management end, realizing aggregation of the initial training models. After the management end is disconnected from the distributed network, at least one client is elected from the plurality of clients to serve as a new management end, so the clients can self-organize into a network and perform small-scale model aggregation in weak-network or no-network environments, enhancing their model aggregation capability in specific environments.
Example 5
Fig. 5 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention. The electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the federal learning-based model aggregation method of embodiment 1 or embodiment 3 is implemented. The electronic device 30 shown in fig. 5 is merely an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.
As shown in fig. 5, the electronic device 30 may be embodied in the form of a general purpose computing device, which may be a server device, for example. Components of electronic device 30 may include, but are not limited to: the at least one processor 31, the at least one memory 32, a bus 33 connecting the different system components, including the memory 32 and the processor 31.
The bus 33 includes a data bus, an address bus, and a control bus.
Memory 32 may include volatile memory such as Random Access Memory (RAM) 321 and/or cache memory 322, and may further include Read Only Memory (ROM) 323.
Memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 31 executes various functional applications and data processing, such as the federal learning-based model aggregation method in any one of embodiments 1 or 3 of the present invention, by running a computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., a keyboard or pointing device). Such communication may take place through an input/output (I/O) interface 35. The electronic device 30 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 36. As shown in fig. 5, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (redundant array of independent disks) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the federal learning-based model aggregation method of any one of embodiments 1 or 3.
More specifically, among others, readable storage media may be employed including, but not limited to: portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the federal learning-based model aggregation method of any one of embodiments 1 or 3 when the program product is run on the terminal device.
Wherein the program code for carrying out the invention may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.
Claims (4)
1. A model aggregation method based on federal learning, applied to a management end deployed in a distributed network in which the management end and a plurality of clients are deployed, characterized in that the model aggregation method comprises:
acquiring an initial training model obtained by training each of the plurality of clients;
obtaining performance indexes of the plurality of clients comprises the following steps:
acquiring software performance indexes of the plurality of clients;
The software performance index comprises an operating system of the client, the number of training samples and the type of the connected network;
the obtaining of the performance indexes of the plurality of clients further comprising:
obtaining the hardware performance indexes of the plurality of clients, which comprises the following steps:
acquiring the hardware performance indexes of the plurality of clients according to the memories, CPUs, storage spaces and electric quantities of the plurality of clients;
selecting a target aggregation algorithm corresponding to the performance indexes of the plurality of clients from a plurality of preset aggregation algorithms, which comprises the following steps:
acquiring index ranges to which performance indexes of the plurality of clients belong;
selecting the target aggregation algorithm from the plurality of aggregation algorithms according to the index range, which comprises:
acquiring a hardware index total score and a software index total score of the client;
the hardware index total score being calculated as: hardware index total score = (current electric quantity / total electric quantity) × 30% + (current memory / total memory) × 20% + (current storage space / total storage space) × 20% + CPU core count × 30%;
the software index total score being calculated as: software index total score = (operating system = 1) × (iOS = 20%, Android = 10%) + (training sample count > 10000 = 1, or training sample count < 1000 = 0.2) × 40% + (connected network type = 1) × (4G = 20%, 5G = 50%);
when the hardware index total score of the client is >= 1.5 and the software index total score is <= 0.5, a synchronous aggregation algorithm is selected;
when the hardware index total score of the client is <= 1.5 and the software index total score is >= 0.7, an asynchronous aggregation algorithm is selected;
and aggregating each initial training model according to the target aggregation algorithm to obtain an aggregation model.
2. A model aggregation system based on federal learning, suitable for a management end deployed in a distributed network in which the management end and a plurality of clients are deployed, characterized by comprising a first acquisition module, a second acquisition module, a selection module and an aggregation module;
The first acquisition module is used for acquiring initial training models obtained by respective training of the plurality of clients;
the second acquisition module is used for acquiring performance indexes of the plurality of clients;
The second acquisition module includes a second acquisition unit: the second obtaining unit is used for obtaining software performance indexes of the plurality of clients; the software performance index comprises an operating system of the client, the number of training samples and the type of the connected network;
The second acquisition module further comprises a first acquisition unit;
The first obtaining unit is used for obtaining the hardware performance indexes of the plurality of clients, and the first obtaining unit is specifically used for obtaining the hardware performance indexes of the plurality of clients according to the memories, the CPUs, the storage spaces and the electric quantity of the plurality of clients;
the selection module is used for selecting a target aggregation algorithm corresponding to the performance indexes of the clients from a plurality of preset aggregation algorithms;
The selection module comprises a third acquisition unit and a selection unit;
The third obtaining unit is configured to obtain an index range to which performance indexes of the plurality of clients belong;
The selecting unit is configured to select, according to the index range, the target aggregation algorithm from the plurality of aggregation algorithms, including:
acquiring a hardware index total score and a software index total score of the client;
the hardware index total score being calculated as: hardware index total score = (current electric quantity / total electric quantity) × 30% + (current memory / total memory) × 20% + (current storage space / total storage space) × 20% + CPU core count × 30%;
the software index total score being calculated as: software index total score = (operating system = 1) × (iOS = 20%, Android = 10%) + (training sample count > 10000 = 1, or training sample count < 1000 = 0.2) × 40% + (connected network type = 1) × (4G = 20%, 5G = 50%);
when the hardware index total score of the client is >= 1.5 and the software index total score is <= 0.5, a synchronous aggregation algorithm is selected;
and when the hardware index total score of the client is <= 1.5 and the software index total score is >= 0.7, an asynchronous aggregation algorithm is selected;
And the aggregation module is used for aggregating the initial training models according to the target aggregation algorithm to obtain an aggregation model.
3. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the federal learning-based model aggregation method of claim 1 when the computer program is executed by the processor.
4. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the federal learning based model aggregation method according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110804987.2A CN113568740B (en) | 2021-07-16 | 2021-07-16 | Model aggregation method, system, equipment and medium based on federal learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113568740A CN113568740A (en) | 2021-10-29 |
CN113568740B (en) | 2024-09-03
Family
ID=78165099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110804987.2A (CN113568740B, Active) | Model aggregation method, system, equipment and medium based on federal learning | 2021-07-16 | 2021-07-16
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113568740B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114254386B (en) * | 2021-12-13 | 2024-06-07 | 北京理工大学 | Federal learning privacy protection system and method based on hierarchical aggregation and blockchain |
CN115081626B (en) * | 2022-07-21 | 2024-05-31 | 山东大学 | Personalized federal few-sample learning system and method based on characterization learning |
CN116028031A (en) * | 2023-03-29 | 2023-04-28 | 中科航迈数控软件(深圳)有限公司 | Code automatic generation model training method, system and storage medium |
CN117114146B (en) * | 2023-08-11 | 2024-03-29 | 南京信息工程大学 | Method, device, medium and equipment for poisoning reconstruction of federal learning model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106131641A (en) * | 2016-06-30 | 2016-11-16 | 乐视控股(北京)有限公司 | A kind of barrage control method, system and Android intelligent television |
CN110929880A (en) * | 2019-11-12 | 2020-03-27 | 深圳前海微众银行股份有限公司 | Method and device for federated learning and computer readable storage medium |
US10713754B1 (en) * | 2018-02-28 | 2020-07-14 | Snap Inc. | Remote distribution of neural networks |
CN111796925A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Method and device for screening algorithm model, storage medium and electronic equipment |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10360012B2 (en) * | 2017-11-09 | 2019-07-23 | International Business Machines Corporation | Dynamic selection of deployment configurations of software applications |
CN112036558A (en) * | 2019-06-04 | 2020-12-04 | 北京京东尚科信息技术有限公司 | Model management method, electronic device, and medium |
US20210150037A1 (en) * | 2019-11-15 | 2021-05-20 | International Business Machines Corporation | Secure Federation of Distributed Stochastic Gradient Descent |
CN111144571B (en) * | 2019-12-20 | 2023-09-05 | 深圳市金溢科技股份有限公司 | Deep learning reasoning operation method and middleware |
CN111355739B (en) * | 2020-03-06 | 2023-03-28 | 深圳前海微众银行股份有限公司 | Data transmission method, device, terminal equipment and medium for horizontal federal learning |
CN111695675B (en) * | 2020-05-14 | 2024-05-07 | 平安科技(深圳)有限公司 | Federal learning model training method and related equipment |
CN112181666B (en) * | 2020-10-26 | 2023-09-01 | 华侨大学 | Equipment assessment and federal learning importance aggregation method based on edge intelligence |
CN112507973B (en) * | 2020-12-29 | 2022-09-06 | 中国电子科技集团公司第二十八研究所 | Text and picture recognition system based on OCR technology |
CN113033712B (en) * | 2021-05-21 | 2021-09-14 | 华中科技大学 | Multi-user cooperative training people flow statistical method and system based on federal learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||