CN118410860B - Efficient knowledge editing method and device in federal learning environment
- Publication number
- CN118410860B (application CN202410892912.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- model
- forgetting
- category
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06N 3/098 — Computing arrangements based on biological models; neural networks; learning methods; distributed learning, e.g. federated learning
- G06F 21/6245 — Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity; protecting access to data via a platform; protecting personal data, e.g. for financial or medical purposes
- G06N 3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
- G06N 3/096 — Neural networks; learning methods; transfer learning
Abstract
The embodiments of the application provide an efficient knowledge editing method and device in a federal learning environment. The method obtains a target model meeting the knowledge editing requirement by constructing a teacher model and a student model and having the student model learn from the teacher model. Compared with the prior-art approach of retraining by calibrating historical training parameters cached in the central server, the method needs no cached historical parameters, so it avoids privacy leakage as well as the drop in the central server's operating efficiency caused by excessive caching. Moreover, the method can be applied to scenarios where only part of the data is to be forgotten; it therefore has universality and can meet different user requirements.
Description
Technical Field
The application relates to the field of network communication, and in particular to a method and a device for efficient knowledge editing in a federal learning environment.
Background
Because federal learning can guarantee data privacy, it is often used to realize distributed deployment and training of a global model in scenarios with high data-privacy requirements, such as intelligent finance, intelligent medical treatment, and intelligent communication.
In the federal learning scenario, a client may initiate a forgetting request, which indicates that the contribution of the target client's local data to global model training should be removed. How to make the performance of the global model obtained after the forgetting operation equal to that of the original global model (i.e., how to implement knowledge editing) is a problem that needs attention.
In the existing knowledge editing method under the federal learning environment, after a target client initiates a forgetting request, the global model is rebuilt by calibrating historical training parameters cached in the central server. However, this method applies only to forgetting all local data of the target client and cannot handle scenarios where only part of the target client's data is to be forgotten. Moreover, keeping too many historical training parameters in the central server is detrimental to privacy protection.
Disclosure of Invention
In view of this, the embodiments of the application provide an efficient knowledge editing method and device under the federal learning environment, which are not only suitable for scenarios where part of the target client's data is forgotten, but also avoid privacy disclosure.
The embodiment of the application provides an efficient knowledge editing method in a federal learning environment, applied to a client and comprising the following steps:
A forgetting request is initiated to a central server to trigger the central server to issue a global model F(ω) to the client; the global model F(ω) is trained on a training data set, the forgetting data to be forgotten indicated by the forgetting request is part of the client's local data, and the local data is a subset of the training data set;
A target processing layer is split from the global model F(ω); the target processing layer is a processing layer correlated with the forgetting data in the training of the global model F(ω);
Data of each category other than the forgetting data is obtained from the local data, and for each category the centroid vector of that category is determined from its data through the target processing layer; the centroid vector of any category characterizes the data of that category;
The target processing layer is taken as a teacher model, a student model with the same structure as the teacher model is constructed, and the student model is trained with the remaining data other than the forgetting data in the local data; when it is determined, based on the centroid vectors of the categories, the teacher model, and the trained student model, that the training iteration stops, the trained student model and the non-target processing layers of the global model F(ω) are combined into the target model required by knowledge editing. The difference between the performance of the target model after the client deletes the forgetting data and the performance of the global model F(ω) before the forgetting data is deleted is within a set range.
The embodiment of the application also provides an efficient knowledge editing device in a federal learning environment, applied to a client and comprising:
a sending module, configured to initiate a forgetting request to the central server so as to trigger the central server to issue a global model F(ω) to the client; the global model F(ω) is trained on a training data set, the forgetting data to be forgotten indicated by the forgetting request is part of the client's local data, and the local data is a subset of the training data set;
a data processing module, configured to obtain data of each category other than the forgetting data from the local data and, for each category, determine the centroid vector of that category from its data through the global model F(ω); the centroid vector of any category characterizes the data of that category;
a training module, configured to split a target processing layer from the global model F(ω), the target processing layer being a processing layer correlated with the forgetting data in the training of the global model F(ω); take the target processing layer as a teacher model, construct a student model with the same structure as the teacher model, train the student model with the remaining data other than the forgetting data in the local data, and, when it is determined based on the centroid vectors of the categories, the teacher model, and the trained student model that the training iteration stops, combine the trained student model and the non-target processing layers of the global model F(ω) into the target model required by knowledge editing.
The embodiment of the application also provides an image detection method based on federal learning, applied to a client and comprising:
inputting a target image to be detected into the target model deployed in the client, the target model being obtained based on the method provided in the above embodiment, and performing target detection on the target image through the target model so as to detect the target object present in the target image;
the target image is any one of the following images:
a CT image, a lane image acquired by a camera aiming at a target lane, and a face image;
Wherein when a CT image is selected as the target image, the target object is a lesion area;
when a lane image is selected as the target image, the target object is a vehicle;
when a face image is selected as the target image, the target object is a living face.
The embodiment of the application also provides electronic equipment, which comprises: a processor and a memory for storing computer program instructions which, when executed by the processor, cause the processor to perform the steps of the method as above.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method as described above.
Embodiments of the present application also provide a machine-readable storage medium storing computer program instructions which, when executed, enable the steps of the method as above to be carried out.
As can be seen from the above technical solution, in this embodiment the client initiates a forgetting request to the central server to trigger the central server to issue the global model F(ω) to the client. The client splits the target processing layer from the global model F(ω), obtains data of each category other than the forgetting data from the local data, and for each category determines the centroid vector of that category from its data through the target processing layer. The target processing layer is taken as a teacher model, a student model with the same structure as the teacher model is constructed and trained with the local data, and when it is determined, based on the centroid vectors of the categories, the teacher model, and the trained student model, that the training iteration stops, the trained student model and the non-target processing layers of the global model F(ω) are combined into the target model required by knowledge editing.
Compared with the prior-art approach of retraining a target model meeting the knowledge editing requirement by calibrating historical training parameters cached in the central server, this method needs no cached historical parameters; it thus avoids privacy leakage as well as the drop in the central server's operating efficiency caused by excessive caching.
Furthermore, the method provided by the application can be applied to scenarios where only part of the data is forgotten; it therefore has universality and can meet different user requirements.
Drawings
- FIG. 1 is a scene architecture diagram of federal learning provided in the related art;
- FIG. 2 is a schematic flow chart of a method according to an embodiment of the present application;
- FIG. 3 is a schematic flow chart of teacher-student model training provided in an embodiment of the present application;
- FIG. 4 is a schematic structural diagram of a device according to an embodiment of the present application;
- FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To better understand the technical solution provided by the embodiments of the present application and to make the above objects, features, and advantages more apparent, the technical solution in the embodiments of the present application is described in further detail below with reference to the accompanying drawings.
Before describing the method provided by the application, the problems of existing knowledge editing are elaborated in the context of the federal learning scenario:
In scenarios with high data-privacy requirements such as intelligent finance, intelligent medical treatment, and intelligent communication, federal learning is often used to realize distributed deployment and training of a global model. Federal learning is adopted because data can be kept scattered across multiple clients while the global model is deployed and trained through client cooperation; the privacy requirement of such scenarios can thus be met without gathering all the data together.
Referring to FIG. 1, FIG. 1 is a scene architecture diagram of federal learning provided in the related art. As shown in FIG. 1, a federal learning scenario generally involves one central server and N clients. The central server selects k of the N clients to participate in training and transmits an initial global model F(ω) to them. Then, in each round of training, each client performs local training on the global model F(ω) using its local data D_k and obtains an updated local model F(ω_k) through gradient descent. Each client uploads its updated local model F(ω_k) to the central server. The central server obtains the updated global model for the round by weighted-average aggregation of the updated local models F(ω_k) received from the k clients. After sending the global model for the round back to the k clients, the central server repeats this iteration until the stopping condition is met, yielding a trained global model Q.
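As an illustration only (not part of the claimed method), the weighted-average aggregation step can be sketched in Python as follows; the function name and the use of per-client sample counts as weights are assumptions, since the text does not fix them:

```python
import torch

def fedavg_aggregate(local_states, num_samples):
    """FedAvg-style weighted average of k client model state dicts.

    local_states: list of model state_dicts, one per participating client.
    num_samples : list of local dataset sizes D_k, used as weights (assumed).
    Assumes all parameters are floating-point tensors.
    """
    total = float(sum(num_samples))
    aggregated = {}
    for key in local_states[0]:
        aggregated[key] = sum(
            state[key] * (n / total)
            for state, n in zip(local_states, num_samples)
        )
    return aggregated
```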
In the process, the data of each client is stored locally and is not transmitted to other clients, so that the data privacy of each client can be protected.
On this basis, a user can initiate a forgetting request according to actual needs; the forgetting request indicates that the contribution of a client's local data to global model training should be removed. For example, client D among the k clients initiates a forgetting request out of actual need, requiring that the contribution of its local data to the global model Q be deleted. However, how to make the performance of the global model obtained after the forgetting operation comparable to that of the original global model is a problem that needs attention; making the two comparable through the performed operation is exactly what knowledge editing means.
In the related art, knowledge editing is realized by the steps of:
1. During training, the central server caches all historical training parameters uploaded by the k clients.
2. After receiving the forgetting request initiated by the target client, the central server issues the current global model to each client (here, the trained global model Q obtained through federal learning).
3. The other k-1 clients, excluding the target client, perform iterative training based on the global model Q with fewer rounds than the original training (e.g., if the federal learning above used 100 rounds, 50 rounds may be used here) and send the local model parameters of each iteration round to the central server.
4. The central server uses the newly received local model parameters to calibrate the stored historical parameters and reconstruct the global model Q'.
However, this approach has the following problems:
First, the forgetting operation takes the client as its smallest unit; that is, it can only forget all local data of the target client. In many application scenarios, however, only part of the target client's data needs to be forgotten, and the above method is not applicable in that case.
Second, the central server retains the historical training parameters of every iteration; if these parameters are obtained by other parties, there is a risk of privacy disclosure, which is not conducive to privacy protection.
Finally, maintaining excessive historical training parameters occupies significant system resources of the central server, which affects its operating efficiency.
Based on the above, in order to solve the above problems, the embodiments of the present application provide a method and an apparatus for efficient knowledge editing in a federal learning environment.
The method provided by the embodiment of the application is described below:
Referring to FIG. 2, FIG. 2 is a schematic flow chart of a method according to an embodiment of the present application. The method is applied to the client.
As shown in FIG. 2, the process may include the following steps:
S201, a forgetting request is initiated to the central server to trigger the central server to issue a global model F(ω) to the client. The global model F(ω) is trained on a training data set, the forgetting data to be forgotten indicated by the forgetting request is part of the client's local data, and the local data is a subset of the training data set.
As an embodiment, the global model is a classification model, and each training sample in the training sample set has a corresponding label from which the category of the training data can be determined.
S202, data of each category other than the forgetting data is obtained from the local data, and for each category the global model F(ω) determines the centroid vector of that category from its data; the centroid vector of any category is used to characterize the data of that category.
In this embodiment, the global model is a classification model, and each piece of local data serves as a training sample configured with a sample tag indicating the category to which the data belongs; the data of each category can therefore be obtained according to the tags carried by the data other than the forgetting data in the local data.
In an embodiment, the global model F(ω) is composed of a feature extraction layer D(ω) (also referred to as a feature extractor) and a classification layer E(ω) (also referred to as a classifier), and determining the centroid vector of a category from its data may be implemented as follows: for each piece of remaining data in the category, the feature extraction layer D(ω) of the global model F(ω) extracts features from that piece to obtain its feature vector; a specified operation is then performed on the feature vectors of all remaining data under the category, and the operation result is taken as the centroid vector under the category. For example, each piece of remaining data in category A is input into the feature extraction layer D(ω) to obtain its feature vector, and the mean of the feature vectors of the remaining data in category A is taken as the centroid vector of category A.
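A minimal sketch of this centroid computation, assuming a PyTorch feature extractor `D` and per-category tensors of remaining data (the names are illustrative, not from the patent):

```python
import torch

@torch.no_grad()
def class_centroids(D, remaining_by_class):
    """Compute one centroid vector per category from the remaining data.

    D                 : feature extraction layer D(omega) of the global model.
    remaining_by_class: dict mapping category id -> tensor of samples [N_m, ...].
    Returns a dict mapping category id -> centroid vector c_m.
    """
    centroids = {}
    for m, x_m in remaining_by_class.items():
        feats = D(x_m)                     # feature vectors of the remaining data
        centroids[m] = feats.mean(dim=0)   # averaging as the "specified operation"
    return centroids
```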
S203, a target processing layer is split from the global model F(ω); the target processing layer is a processing layer correlated with the forgetting data in the training of the global model F(ω). The target processing layer is taken as a teacher model, a student model with the same structure as the teacher model is constructed, and the student model is trained with the remaining data other than the forgetting data in the local data; when it is determined, based on the centroid vectors of the categories, the teacher model, and the trained student model, that the training iteration stops, the trained student model and the non-target processing layers of the global model F(ω) are combined into the target model required by knowledge editing.
The difference between the performance of the target model after the client deletes the forgetting data and the performance of the global model F(ω) before the forgetting data is deleted is within a set range. That is, the two are comparable in performance.
In this embodiment, the feature extraction layer D(ω) is the processing layer most correlated with the forgetting data: it extracts the features of the training data, including the features of each piece of forgetting data. Therefore, the feature extraction layer D(ω) is taken as the target processing layer.
In this embodiment, the target processing layer is split from the global model F(ω) as the teacher model and directly copied as the student model; the student model and the teacher model are initialized with the same initial parameters to eliminate training differences caused by different parameters.
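A hedged sketch of this split-and-copy step, assuming the global model exposes its feature extractor as an attribute (the attribute name is an assumption):

```python
import copy
import torch

def build_teacher_student(global_model):
    """Split the feature extraction layer off as the teacher and clone it
    as the student; both start from identical parameters, eliminating
    training differences caused by different initialization."""
    teacher = global_model.feature_extractor   # D(omega), kept fixed
    student = copy.deepcopy(teacher)           # same structure, same init
    for p in teacher.parameters():
        p.requires_grad_(False)                # only the student is trained
    return teacher, student
```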
In this embodiment, the specific implementation of training the teacher-student models is illustrated in the specific embodiment below and is not expanded here.
Thus, the flow shown in FIG. 2 is completed.
As can be seen from the flow shown in FIG. 2, the client initiates a forgetting request to the central server to trigger the central server to issue the global model F(ω) to the client. The client splits the target processing layer from the global model F(ω), obtains data of each category other than the forgetting data from the local data, and for each category determines the centroid vector of that category from its data through the target processing layer. The target processing layer is taken as a teacher model, a student model with the same structure as the teacher model is constructed and trained with the local data, and when it is determined, based on the centroid vectors of the categories, the teacher model, and the trained student model, that the training iteration stops, the trained student model and the non-target processing layers of the global model F(ω) are combined into the target model required by knowledge editing.
Compared with the prior-art approach of retraining a target model meeting the knowledge editing requirement by calibrating historical training parameters cached in the central server, this method needs no cached historical parameters; it thus avoids privacy leakage as well as the drop in the central server's operating efficiency caused by excessive caching.
Furthermore, the method can be applied to scenarios where only part of the data is forgotten; it therefore has universality and can meet different user requirements.
The specific training process of the teacher-student models is described in detail below with reference to FIG. 3:
As shown in FIG. 3, the process may include the following steps:
S301, a set loss function is obtained.
In the present embodiment, the loss function consists of the following three parts.
1. The loss function of the first part is determined according to the degree of difference between the teacher model's output and the student model's output for each piece of remaining data. One purpose of training the teacher-student models is to make the student model's feature extraction capability on the remaining data as close as possible to the teacher model's, so that the resulting target model is comparable to the original global model in performance on the remaining data. This purpose is achieved by determining the first part of the loss from the difference between the teacher's output and the student's output for each piece of remaining data.
For example, the difference between the teacher's output and the student's output for each piece of remaining data is represented by a Euclidean distance; the larger the distance, the larger the difference. Here, the first part of the loss is expressed as a weighted average of the Euclidean distances over all remaining data.
Illustratively, the loss function L1 of the first part may be expressed as:

$$L_1 = \frac{1}{N_{D_r}} \sum_{x_j \in D_r} \left\lVert D_{\omega}(x_j) - D_{\tilde{\omega}}(x_j) \right\rVert_2$$

where N_{D_r} is the amount of remaining data; x_j is the j-th piece of the remaining data D_r; D_ω(x_j) is the teacher model's output for the j-th piece of remaining data; D_{ω̃}(x_j) is the student model's output for the j-th piece of remaining data; and the norm is the Euclidean distance between the two outputs.
2. The loss function of the second part is determined according to the degree of difference between the student model's output for each piece of forgetting data and the forgetting vector found for that piece among the centroid vectors of the categories. Here, the forgetting vector characterizes the target direction toward which the target processing layer should be optimized and updated after the forgetting data is forgotten.
Another purpose of training the teacher-student models is to make the student model's feature extraction capability on the forgetting data differ from the teacher model's; in other words, the larger the difference between the student's output and the teacher's output for the forgetting data, the better this purpose is achieved. Equivalently, the larger that difference, the smaller the difference between the student's output for the forgetting data and the features of that data under a category other than its own (these features can be represented by the forgetting vector corresponding to the forgetting data, as detailed below). Since each part of the loss is optimized toward smaller values, the second part of the loss is determined from the difference between the student's output for each piece of forgetting data and the corresponding forgetting vector.
For example, the difference between the student model's output for each piece of forgetting data and the corresponding forgetting vector is represented by a Euclidean distance; the larger the distance, the larger the difference, and vice versa. The second part of the loss is expressed as a weighted average of the Euclidean distances over all forgetting data.
Illustratively, the loss function L2 of the second part may be expressed as:

$$L_2 = \alpha \cdot \frac{1}{N_{D_f}} \sum_{x_i \in D_f} \left\lVert D_{\tilde{\omega}}(x_i) - v_i \right\rVert_2$$

where N_{D_f} is the amount of forgetting data; x_i is the i-th piece of the forgetting data D_f; D_{ω̃}(x_i) is the student model's output for the i-th piece of forgetting data; v_i is the forgetting vector corresponding to the i-th piece of forgetting data; the norm is the Euclidean distance between the student's output and the forgetting vector; and α is a first set weight.
The root cause that the forgetting vector can reflect the features of the forgetting data under a category other than its own lies in how it is obtained: for each category's centroid vector, the similarity value between that centroid vector and the teacher model's output for the forgetting data is computed, and among the centroid vectors of categories different from the category to which the forgetting data belongs, the one with the maximum similarity value is determined as the forgetting vector. Since the forgetting vector is the centroid vector most similar to the forgetting data among categories other than its own (not the same category, yet very similar), it can represent the features of the forgetting data under a non-own category.
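A sketch of this forgetting-vector lookup, assuming cosine similarity (the text only says "similarity value", so the metric is an assumption):

```python
import torch
import torch.nn.functional as F

def forgetting_vector(teacher_feat, centroids, own_class):
    """Pick the centroid most similar to the teacher's output for one piece
    of forgetting data, restricted to categories other than its own."""
    best_sim, best_vec = None, None
    for m, c_m in centroids.items():
        if m == own_class:
            continue  # only centroids of other categories are candidates
        sim = F.cosine_similarity(teacher_feat, c_m, dim=0)
        if best_sim is None or sim > best_sim:
            best_sim, best_vec = sim, c_m
    return best_vec
```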
3. The loss function of the third part is a set constraint function that constrains the adjustment amplitude of each parameter of the student model during iterative training. Optionally, the constraint function is obtained by regularizing the parameters of the teacher model against the parameters of the student model.
Illustratively, the loss function L3 of the third part may be expressed as:

$$L_3 = \beta \cdot \left\lVert \omega_D - \tilde{\omega}_D \right\rVert^2$$

where ω_D denotes the parameters of the teacher model; ω̃_D denotes the parameters of the student model; the norm term is the regularization computed over the teacher's and student's parameters; and β is a second set weight.
In summary, the loss function can be expressed as:

$$L = L_1 + L_2 + L_3$$
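Putting the three parts together, a minimal PyTorch-style sketch of the loss (Euclidean distances for L1 and L2, squared parameter regularization for L3; all names are illustrative) could read:

```python
import torch

def editing_loss(teacher, student, x_rest, x_forget, forget_vecs, alpha, beta):
    """L = L1 + L2 + L3 as described above (illustrative sketch).

    x_rest     : batch of remaining data.
    x_forget   : batch of forgetting data.
    forget_vecs: forgetting vectors v_i, one row per forgetting sample.
    """
    # L1: keep the student close to the teacher on the remaining data
    l1 = (teacher(x_rest) - student(x_rest)).norm(dim=1).mean()

    # L2: pull the student's output on forgetting data toward the
    # forgetting vectors (centroids of other categories)
    l2 = alpha * (student(x_forget) - forget_vecs).norm(dim=1).mean()

    # L3: constrain how far the student's parameters drift from the teacher's
    l3 = beta * sum(
        (p_t - p_s).pow(2).sum()
        for p_t, p_s in zip(teacher.parameters(), student.parameters())
    )
    return l1 + l2 + l3
```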
S302, inputting the output result of the teacher model corresponding to each piece of residual data, the output result of the student model corresponding to each piece of residual data and each centroid vector into a preset loss function to obtain a loss function value.
S303, judging whether the loss function value meets the set requirement.
If the result of step S303 is yes, the following step S304 is performed; otherwise, the parameters of the student model are adjusted and the process returns to step S302.
S304, determining that training iteration stops.
In this embodiment, when the loss function value meets the set requirement, the features output by the student model for the remaining data are similar to those output by the teacher model, which provides the basis for making the obtained target model comparable to the original global model in performance on the remaining data.
In this embodiment, the student model obtained when the training iteration stops is the trained student model. The trained student model and the non-target processing layers of the global model F(ω) are combined into the target model, so that the obtained target model meets the requirement of knowledge editing: its performance is comparable to that of the global model F(ω) before the forgetting data is deleted.
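An end-to-end sketch of this iteration, reusing the helpers sketched above; the loop structure, optimizer, stopping threshold, and attribute names are assumptions:

```python
import torch

def edit_knowledge(global_model, x_rest, x_forget, forget_vecs,
                   alpha=1.0, beta=0.01, lr=1e-3, tol=1e-3, max_iters=1000):
    """Train the student until the loss meets the set requirement, then
    splice it with the non-target layers into the target model F'(omega)."""
    teacher, student = build_teacher_student(global_model)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(max_iters):
        loss = editing_loss(teacher, student, x_rest, x_forget,
                            forget_vecs, alpha, beta)
        if loss.item() < tol:              # loss meets the set requirement
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    # combine the trained student with the untouched classification layer
    global_model.feature_extractor = student
    return global_model                    # target model F'(omega)
```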
For a more detailed description, the application is described below by way of a specific embodiment:
In this embodiment, the process includes the following steps:
1. The target client initiates a forget request to the central server.
In this embodiment, the target client initiates, to the central server, a forgetting request to forget part of its local data, for privacy protection or data update. Here, all local data is denoted D_k, all forgetting data D_f, and all remaining data D_r, where

$$D_r = D_k - D_f$$
2. The central server issues the global model F(ω) to the target client. The global model F(ω) is composed of a feature extraction layer D(ω) and a classification layer E(ω); D(ω) is taken as the teacher model, and a student model D(ω̃) with the same structure as the teacher model is constructed.
3. Classify all the remaining data and, for each category m, obtain the corresponding centroid vector using:

$$c_m = \frac{1}{N_m} \sum_{j=1}^{N_m} D_{\omega}\left(x_j^m\right)$$

where N_m is the amount of remaining data under category m; D_ω(x_j^m) is the feature vector of the j-th piece of remaining data under category m (obtained with the feature extraction layer D(ω), i.e., the teacher model's output for that piece); and c_m is the centroid vector of category m.
4. Input each piece of forgetting data x_i into the teacher model D(ω) for feature extraction to obtain its feature vector D_ω(x_i) (i.e., the teacher model's output for that piece), and obtain the forgetting vector corresponding to x_i by:

$$v_i = \arg\max_{c_m,\; m \neq y_i} \operatorname{sim}\left(c_m, D_{\omega}(x_i)\right)$$

where y_i is the category to which x_i belongs and sim(·,·) is the similarity value. In this way, the forgetting vector corresponding to each piece of forgetting data is obtained.
5. Input each piece of forgetting data x_i into the student model D(ω̃) for feature extraction to obtain the student model's output D_{ω̃}(x_i) for that piece.
6. Input each piece of remaining data x_j into the teacher model and the student model respectively to obtain the teacher model's output D_ω(x_j) and the student model's output D_{ω̃}(x_j) for that piece.
7. Substitute the teacher model's outputs D_ω(x_j) for the remaining data, the student model's outputs D_{ω̃}(x_j) for the remaining data, the student model's outputs D_{ω̃}(x_i) for the forgetting data, and the forgetting vectors v_i into the loss function

$$L = L_1 + L_2 + L_3$$

to obtain the loss function value.
8. When the loss function value meets the set requirement, determine that the training iteration stops.
9. Splice the trained student model D(ω̃) with the classification layer E(ω) to obtain the target model F′(ω) that meets the knowledge editing requirement.
After the target model is obtained with the method provided in the above embodiment, it is deployed on any client in the networking architecture shown in FIG. 1. In this way, the clients obtained based on the above federal learning can be applied in many fields.
The application to image detection is described below. The embodiment of the application also provides an image detection method based on federal learning. Based on the distributed cluster shown in FIG. 1, the target model deployed on any client in the cluster is obtained by training with the method provided in the above embodiment. When any client obtains a target image to be detected, it inputs the image into the locally deployed target model, which performs target detection on the image so as to detect the target objects present in it. That is, the image detection method is based on efficient knowledge editing in the federal learning environment.
Optionally, as an embodiment, in the process of obtaining the target model, the training sample set is a CT image set and the sample labels are labels of lesion areas (such as tumors). Correspondingly, the target image is also a CT image, and the target object is a lesion area. That is, the target model in a client of the distributed cluster can be used to detect lesion areas in CT images.
As an embodiment, in the process of obtaining the target model, the training sample set is a set of lane images collected by a camera for a target lane, and the sample labels are labels of vehicles. Correspondingly, the target image is also a lane image, and the target object is a vehicle. That is, the target model in a client of the distributed cluster can be used to detect vehicles in lane images.
As an embodiment, in the process of obtaining the target model, the training sample set is a face image set, and the sample labels are labels of living faces. Correspondingly, the target image is also a face image, and the target object is a living face. That is, the target model in a client of the distributed cluster can be used to detect living faces in face images.
It should be noted that the application of the target model deployed in any of the above clients is not limited to the above image detection scenarios; it is also applicable to other scenarios, which this embodiment does not specifically limit.
The method provided by the embodiment of the application is described above, and the device provided by the embodiment of the application is described below:
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a device according to an embodiment of the present application. The device is applied to a client and comprises: a sending module 401, a data processing module 402, and a training module 403.
a sending module 401, configured to initiate a forgetting request to the central server so as to trigger the central server to issue a global model F(ω) to the client; the global model F(ω) is trained on a training data set, the forgetting data to be forgotten indicated by the forgetting request is part of the client's local data, and the local data is a subset of the training data set;
a data processing module 402, configured to obtain data of each category other than the forgetting data from the local data and, for each category, determine the centroid vector of that category from its data through the global model F(ω); the centroid vector of any category characterizes the data of that category;
a training module 403, configured to split a target processing layer from the global model F(ω), the target processing layer being a processing layer correlated with the forgetting data in the training of the global model F(ω); take the target processing layer as a teacher model, construct a student model with the same structure as the teacher model, train the student model with the remaining data other than the forgetting data in the local data, and, when it is determined based on the centroid vectors of the categories, the teacher model, and the trained student model that the training iteration stops, combine the trained student model and the non-target processing layers of the global model F(ω) into the target model required by knowledge editing.
As one embodiment, the global model F(ω) determining a centroid vector for the category from the data of the category comprises:
for each piece of remaining data in the category, performing feature extraction on the remaining data using the feature extraction layer in the global model F(ω) to obtain the feature vector of that piece;
performing a specified operation on the feature vectors of the remaining data under the category to obtain an operation result, and taking the operation result as the centroid vector under the category.
As one embodiment, training the student model with the remaining data other than the forgetting data in the local data, and determining that the training iteration stops based on the centroid vectors of the categories, the teacher model, and the trained student model, includes:
For each piece of residual data in the residual data, respectively inputting the residual data into a teacher model and a student model to respectively obtain an output result of the teacher model and an output result of the student model;
Inputting the output result of the teacher model corresponding to each piece of residual data, the output result of the student model corresponding to each piece of residual data and each centroid vector into a preset loss function to obtain a loss function value;
if the loss function value meets the set requirement, determining that the training iteration is stopped;
Wherein, the loss function is composed of the following three parts:
The loss function of the first part is determined according to the degree of difference between the output result of the teacher model and the output result of the student model corresponding to each piece of residual data;
The loss function of the second part is determined according to the degree of difference between the student model's output for each piece of the obtained forgetting data and the forgetting vector matched with that piece found among the centroid vectors of the categories; the forgetting vector is used to represent the target direction of the optimization update of the target processing layer after the forgetting data is forgotten;
The loss function of the third part is a set constraint function, and the constraint function is used for constraining the adjustment amplitude of each parameter in the student model in the iterative training process.
As one embodiment, obtaining the forgetting vector matched with the forgetting data from the centroid vectors of the categories comprises:
for each class of centroid vector, obtaining a similarity value of the centroid vector of the class and an output result of the teacher model corresponding to the obtained forgetting data;
from among centroid vectors of categories different from the category to which the forgetting data belongs, a centroid vector under the category corresponding to the maximum similarity value is determined as a forgetting vector.
As one embodiment, the constraint function is obtained by regularized calculation of each parameter of the teacher model and each parameter of the student model.
The embodiment of the application also provides a device which is applied to the client, and the device comprises: the detection module is used for inputting the target image to be detected into the deployed target model in the client, and detecting the target in the target image through the target model so as to detect the target object existing in the target image; the target model is obtained through training by the method provided by the embodiment, and the target image is any one of the following images: a CT image, a lane image acquired by a camera aiming at a target lane, and a face image;
Wherein, when the CT image is selected as the target image, the target object is a focus area; when the lane image is selected as the target image, the target object is a vehicle; when a face image is selected as the target image, the target object is a living face.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 5, the hardware structure may include: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute the machine-executable instructions to implement the methods disclosed in the above examples of the present application.
Based on the same application concept as the above method, the embodiment of the present application further provides a machine-readable storage medium, where a number of computer instructions are stored, where the computer instructions can implement the method disclosed in the above example of the present application when the computer instructions are executed by a processor.
By way of example, the machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state disk, any type of storage disc (e.g., an optical disc or DVD), a similar storage medium, or a combination thereof.
The foregoing is merely exemplary of the present application and is not intended to limit it. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the application shall be included in the scope of the claims of the present application.
Claims (10)
1. An efficient knowledge editing method in a federal learning environment, wherein the method is applied to a client, the method comprising:
initiating a forgetting request to a central server to trigger the central server to issue a global model F(ω) to the client; the global model F(ω) is trained based on a training data set, the forgetting data to be forgotten indicated by the forgetting request is part of the local data of the client, and the local data is a subset of the training data set;
obtaining data of each category other than the forgetting data from the local data, and determining, for each category, a centroid vector of the category from the data of the category by the global model F(ω); the centroid vector of any category is used for describing the characteristics of the data of the category;
splitting a target processing layer from the global model F(ω), the target processing layer being a processing layer correlated with the forgetting data in the training of the global model F(ω); and taking the target processing layer as a teacher model, constructing a student model with the same structure as the teacher model, training the student model with the remaining data other than the forgetting data in the local data, and, when it is determined based on the centroid vectors of the categories, the teacher model, and the trained student model that the training iteration stops, combining the trained student model and the non-target processing layers of the global model F(ω) into a target model required by knowledge editing.
2. The method according to claim 1, wherein the global model F(ω) determining a centroid vector of the category from the data of the category comprises:
for each piece of remaining data in the category, performing feature extraction on the remaining data using the feature extraction layer in the global model F(ω) to obtain a feature vector of the remaining data;
performing a specified operation on the feature vectors of the remaining data under the category to obtain an operation result, and taking the operation result as the centroid vector under the category.
3. The method according to claim 1, wherein training the student model with the remaining data other than the forgetting data in the local data, and determining that the training iteration stops based on the centroid vectors of the categories, the teacher model, and the trained student model, comprises:
For each piece of residual data in the residual data, respectively inputting the residual data into the teacher model and the student model to respectively obtain an output result of the teacher model and an output result of the student model;
Inputting the output result of the teacher model corresponding to each piece of residual data, the output result of the student model corresponding to each piece of residual data and each centroid vector into a preset loss function to obtain a loss function value;
If the loss function value meets the set requirement, determining that training iteration is stopped;
wherein, the loss function is composed of the following three parts:
The loss function of the first part is determined according to the difference degree between the output result of the teacher model and the output result of the student model corresponding to each piece of residual data;
the loss function of the second part is determined according to the degree of difference between the student model's output for each piece of the obtained forgetting data and the forgetting vector matched with that piece found among the centroid vectors of the categories; the forgetting vector is used for representing the target direction of the optimization update of the target processing layer after the forgetting data is forgotten;
The loss function of the third part is a set constraint function, and the constraint function is used for constraining the adjustment amplitude of each parameter in the student model in the iterative training process.
4. The method according to claim 3, wherein obtaining the forgetting vector matched with the forgetting data from the centroid vectors of the categories comprises:
for each class of centroid vector, obtaining a similarity value between the class of centroid vector and an output result of the teacher model corresponding to the obtained forgetting data;
And determining the centroid vector under the category corresponding to the maximum similarity value from the centroid vectors of the categories which are different from the category to which the forgetting data belongs as the forgetting vector.
5. A method according to claim 3, wherein the constraint function is obtained by regularizing each parameter of the teacher model and each parameter of the student model.
6. An efficient knowledge editing device in a federal learning environment, the device being applied to a client, the device comprising:
a sending module, configured to initiate a forgetting request to a central server to trigger the central server to issue a global model F(ω) to the client; the global model F(ω) is trained based on a training data set, the forgetting data to be forgotten indicated by the forgetting request is part of the local data of the client, and the local data is a subset of the training data set;
a data processing module, configured to obtain data of each category other than the forgetting data from the local data and, for each category, determine a centroid vector of the category from the data of the category by the global model F(ω); the centroid vector of any category is used for describing the characteristics of the data of the category;
a training module, configured to split a target processing layer from the global model F(ω), the target processing layer being a processing layer correlated with the forgetting data in the training of the global model F(ω); take the target processing layer as a teacher model, construct a student model with the same structure as the teacher model, train the student model with the remaining data other than the forgetting data in the local data, and, when it is determined based on the centroid vectors of the categories, the teacher model, and the trained student model that the training iteration stops, combine the trained student model and the non-target processing layers of the global model F(ω) into a target model required by knowledge editing.
7. The apparatus according to claim 6, wherein the global model F(ω) determining a centroid vector of the category from the data of the category comprises:
for each piece of remaining data in the category, performing feature extraction on the remaining data using the feature extraction layer in the global model F(ω) to obtain a feature vector of the remaining data;
performing a specified operation on the feature vectors of all remaining data under the category to obtain an operation result, and taking the operation result as the centroid vector under the category;
And/or the number of the groups of groups,
training the student model with the remaining data other than the forgetting data in the local data, and determining that the training iteration stops based on the centroid vectors of the categories, the teacher model, and the trained student model, comprises:
For each piece of residual data in the residual data, respectively inputting the residual data into the teacher model and the student model to respectively obtain an output result of the teacher model and an output result of the student model;
Inputting the output result of the teacher model corresponding to each piece of residual data, the output result of the student model corresponding to each piece of residual data and each centroid vector into a preset loss function to obtain a loss function value;
If the loss function value meets the set requirement, determining that training iteration is stopped;
wherein, the loss function is composed of the following three parts:
The loss function of the first part is determined according to the difference degree between the output result of the teacher model and the output result of the student model corresponding to each piece of residual data;
the loss function of the second part is determined according to the degree of difference between the student model's output for each piece of the obtained forgetting data and the forgetting vector matched with that piece found among the centroid vectors of the categories; the forgetting vector is used for representing the target direction of the optimization update of the target processing layer after the forgetting data is forgotten;
The loss function of the third part is a set constraint function, and the constraint function is used for constraining the adjustment amplitude of each parameter in the student model in the iterative training process;
and/or,
finding, from the centroid vectors of the categories, the forgetting vector matched with the forgetting data comprises:
for the centroid vector of each category, obtaining a similarity value between the centroid vector of the category and the output result of the teacher model corresponding to the forgetting data;
determining, from the centroid vectors of the categories other than the category to which the forgetting data belongs, the centroid vector of the category with the maximum similarity value as the forgetting vector;
and/or,
the constraint function is obtained by a regularization calculation over each parameter of the teacher model and the corresponding parameter of the student model.
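The sketch below ties together claim 7's three computations: the per-category centroid, the forgetting-vector selection, and the three-part loss. Several details are assumptions rather than claim requirements: the "specified operation" is taken to be a mean, the similarity measure is cosine similarity, both "degree of difference" terms use mean squared error, the constraint is a squared-L2 regularizer between corresponding teacher and student parameters (one reading of the regularization in the last clause), and the weights `lam1` to `lam3` are illustrative.

```python
import torch
import torch.nn.functional as F

def class_centroid(feature_extractor, data_of_category):
    """Centroid vector of one category: the mean feature vector of its
    remaining (non-forgotten) samples."""
    with torch.no_grad():
        feats = torch.stack([feature_extractor(x) for x in data_of_category])
    return feats.mean(dim=0)

def forgetting_vector(teacher_out_on_forget, centroids, forget_label):
    """Among centroids of categories other than the forgetting class, pick
    the one most similar to the teacher's output on the forgetting data."""
    best_vec, best_sim = None, float("-inf")
    for label, centroid in centroids.items():
        if label == forget_label:
            continue
        sim = F.cosine_similarity(teacher_out_on_forget, centroid, dim=0)
        if sim > best_sim:
            best_vec, best_sim = centroid, sim
    return best_vec

def three_part_loss(teacher_out, student_out, student_out_on_forget, fvec,
                    teacher, student, lam1=1.0, lam2=1.0, lam3=0.01):
    distill = F.mse_loss(student_out, teacher_out)      # part 1: match the teacher on remaining data
    forget = F.mse_loss(student_out_on_forget, fvec)    # part 2: steer forget outputs toward the forgetting vector
    constraint = sum(((ps - pt) ** 2).sum()             # part 3: limit how far student parameters drift
                     for ps, pt in zip(student.parameters(), teacher.parameters()))
    return lam1 * distill + lam2 * forget + lam3 * constraint
```

During training, the returned loss value would be checked against the claim's "set requirement" (e.g., a threshold or plateau, itself an assumption here) to decide that the iteration stops.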
8. An image detection method based on federated learning, applied to a client, the method comprising:
inputting a target image to be detected into a target model deployed in the client, the target model being obtained based on the method of any one of claims 1 to 5, and performing target detection on the target image through the target model to detect a target object present in the target image;
wherein the target image is any one of the following images:
a CT image, a lane image acquired by a camera aimed at a target lane, or a face image;
wherein when a CT image is selected as the target image, the target object is a lesion area;
when a lane image is selected as the target image, the target object is a vehicle;
and when a face image is selected as the target image, the target object is a living face.
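As a usage illustration only (the claim fixes neither the preprocessing nor the output format, so both are assumptions here), invoking the deployed target model for detection might look like:

```python
import torch

def detect(target_model: torch.nn.Module, target_image: torch.Tensor):
    """Run the deployed target model on one preprocessed image tensor of
    shape (C, H, W); the raw output is interpreted downstream as lesion
    areas, vehicles, or living faces, depending on the image type."""
    target_model.eval()                    # inference mode
    with torch.no_grad():                  # no gradients needed at detection time
        return target_model(target_image.unsqueeze(0))  # add a batch dimension
```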
9. An electronic device, comprising:
a processor; and
a memory in which computer program instructions are stored which, when executed by the processor, cause the processor to perform the steps of the method of any one of claims 1 to 5 and 8.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5 and 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410892912.8A CN118410860B (en) | 2024-07-03 | 2024-07-03 | Efficient knowledge editing method and device in federal learning environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118410860A (en) | 2024-07-30
CN118410860B (en) | 2024-09-24
Family
ID=92032695
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410892912.8A (CN118410860B, Active) | 2024-07-03 | 2024-07-03 | Efficient knowledge editing method and device in federal learning environment
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118410860B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114022811A (en) * | 2021-10-29 | 2022-02-08 | 长视科技股份有限公司 | Water surface floater monitoring method and system based on continuous learning |
CN115064155A (en) * | 2022-06-09 | 2022-09-16 | 福州大学 | End-to-end voice recognition incremental learning method and system based on knowledge distillation |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114787833A (en) * | 2019-09-23 | 2022-07-22 | 普雷萨根私人有限公司 | Distributed Artificial Intelligence (AI)/machine learning training system |
US20230177812A1 (en) * | 2021-12-08 | 2023-06-08 | The Hong Kong University Of Science And Technology | Random sampling consensus federated semi-supervised learning |
CN114429219A (en) * | 2021-12-09 | 2022-05-03 | 之江实验室 | Long-tail heterogeneous data-oriented federal learning method |
CN116468114A (en) * | 2023-03-31 | 2023-07-21 | 华为技术有限公司 | Federal learning method and related device |
CN117152480A (en) * | 2023-04-17 | 2023-12-01 | 北京工商大学 | Personalized federal learning method based on decoupling knowledge distillation |
CN117194974A (en) * | 2023-08-10 | 2023-12-08 | Oppo广东移动通信有限公司 | Data processing method, device, electronic equipment and storage medium |
CN117010534B (en) * | 2023-09-27 | 2024-01-30 | 中国人民解放军总医院 | Dynamic model training method, system and equipment based on annular knowledge distillation and meta federal learning |
CN118036708A (en) * | 2024-02-21 | 2024-05-14 | 南京航空航天大学 | Federal forgetting learning method based on history updating and correction |
Also Published As
Publication number | Publication date |
---|---|
CN118410860A (en) | 2024-07-30 |
Similar Documents
Publication | Title
---|---
US11893781B2 (en) | Dual deep learning architecture for machine-learning systems
Budiharto et al. | Fast object detection for quadcopter drone using deep learning
CN108182394B (en) | Convolutional neural network training method, face recognition method and face recognition device
WO2020125623A1 (en) | Method and device for live body detection, storage medium, and electronic device
Cuevas et al. | A comparison of nature inspired algorithms for multi-threshold image segmentation
US8953888B2 (en) | Detecting and localizing multiple objects in images using probabilistic inference
EP3757873B1 (en) | Facial recognition method and device
CN118410860B (en) | Efficient knowledge editing method and device in federal learning environment
Huang et al. | Multiple objects tracking in the UAV system based on hierarchical deep high-resolution network
CN113129335B (en) | Visual tracking algorithm and multi-template updating strategy based on twin network
Reddy et al. | AdaCrowd: Unlabeled scene adaptation for crowd counting
WO2019128564A1 (en) | Focusing method, apparatus, storage medium, and electronic device
WO2021030899A1 (en) | Automated image retrieval with graph neural network
JP7086878B2 (en) | Learning device, learning method, program and recognition device
Abbott et al. | Deep object classification in low resolution lwir imagery via transfer learning
Duman et al. | Distance estimation from a monocular camera using face and body features
Liu | RETRACTED ARTICLE: Video Face Detection Based on Deep Learning
Liu et al. | Collaborating domain-shared and target-specific feature clustering for cross-domain 3d action recognition
Negi et al. | End-to-end residual learning-based deep neural network model deployment for human activity recognition
Chen et al. | Edge artificial intelligence camera network: an efficient object detection and tracking framework
US11810385B2 (en) | Subject identification based on iterated feature representation
CN115018884A (en) | Visible light infrared visual tracking method based on multi-strategy fusion tree
Wang et al. | A robust long-term pedestrian tracking-by-detection algorithm based on three-way decision
CN115170994A (en) | Video identification method, device, equipment and computer readable storage medium
Chen et al. | Research on warehouse object detection algorithm based on fused densenet and ssd
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant