Privacy-Preserving Federated Learning Using Homomorphic Encryption
<p>Diagram for encryption, decryption, and partial decryption.</p> "> Figure 2
<p>System model for privacy-preserving federated learning.</p> "> Figure 3
<p>Diagram for secure averaging local model vector algorithm.</p> "> Figure 4
<p>Running time to execute homomorphic encryption and decryption with respect to the key size.</p> "> Figure 5
<p>Running time to execute the proposed secure averaging local model algorithm with respect to the number of clients in the cryptosystem with different key sizes.</p> "> Figure 6
<p>Running time to execute the proposed secure averaging local model algorithm with respect to the neural network size in the cryptosystem with different key sizes.</p> "> Figure 7
<p>Running time to execute the proposed PPFL and the PPDL using the Paillier cryptosystem with respect to the number of clients.</p> ">
Abstract
:1. Introduction
2. Related Work
3. Preliminary
3.1. Federated Learning
3.2. Homomorphic Encryption
3.3. Distributed Homomorphic Cryptosystem
- Key generation: Function that generates a public-private key pair (), of a user for given two large prime numbers p and q, where is the number of clients participating in the local training.The public key is calculated by , and the private key corresponding to the public key is calculated by , where denotes the least common multiple (LCM) of x and y. Note that the key size K is . Then, as the selected prime numbers increase, the computational complexity of cryptosystem increases because the complexity for the exponentiation operation of encryption and decryption increases [22]. Then, partially private keys for distributed servers can be obtained by splitting the private key , where is the number of distributed servers [23]. We select that satisfies and at the same time and select y random numbers from . Then, we use these values to define the polynomial . The partial private key is obtained by calculating the polynomial using a non-zero value from .
- Encryption: Function that generates a ciphertext for a plaintext , using a public key , where the key size K is . For simplicity, the ciphertext can be represented by ;
- Decryption: Function that decrypts a ciphertext using a private key and returns m;
- Partial decryption: Function that generates a partially decrypted ciphertext by partially decrypting using a partial private key , , as shown in Figure 1. For simplicity, the partially decrypted ciphertext can be denoted by ;
- Combined decryption: Function that obtains and returns m using partially decrypted ciphertexts for and the partially private key . Note that a vector signifies in Figure 1.
4. System and Attack Models
4.1. System Model
- The CS, CP, and the clients may attempt to abuse each others’ data;
- Both the CS and CP are not simultaneously compromised by attackers;
- The CS and the CP do not cooperate to access client information.
4.2. Attack Model
4.2.1. Single Malicious Entity Attack
- Malicious clients: Clients protect data privacy by encrypting shared parameters using the HE scheme. In the proposed system where all clients have different private keys, even if a malicious client can eavesdrop on all channels between the cloud and the clients, the malicious client cannot access the data because it cannot decrypt the ciphertexts without the corresponding private key. In addition, even if multiple malicious clients cooperate, they cannot access other clients’ data without the corresponding private key;
- Single malicious server: The CS cannot access decrypted values in the proposed system because it only receives and handles encrypted values. For the CP, local model parameters can be accessed through combined decryption when partially decrypted values are delivered from the CS. However, the CS can prevent the CP from accessing the local model parameters by adding random noise to the encrypted local model parameters.
4.2.2. Cooperative Attacks by Multiple Malicious Entities
- Malicious clients and CS: If the CS cooperates with malicious clients, it can access the local model parameters of other clients because the CS can decide which clients participate in every iteration. For example, the CS can determine the list of participants with one client and the other malicious clients and calculate the average of encrypted local model parameters based on the HE scheme to obtain encrypted global model parameters. Then, the malicious clients can access the local parameters of the honest client by offsetting the local model parameter of the malicious clients by sharing their parameters because malicious clients can decrypt the global model parameter from the CS. This threat can be eliminated by ensuring more than one honest client at each iteration. In the proposed system, the threat can be eliminated by delivering the sum of local parameters through the cooperation of two or more honest clients. In addition, the privacy threat can be kept very low by ensuring the randomness of client selection through the KGC. When one honest client participates in the learning process, the conditional probability of all remaining participants being malicious clients can be expressed as , where is the ratio of the number of malicious clients to the total number of clients, and is the number of clients participating in the local training. Thus, the probability of having access to local parameters of honest clients becomes very small, despite multiple malicious clients and CS collaborating. For example, even if half of all clients are malicious and is 20, the probability is less than .
- Malicious clients and CP: The CP cannot access the client information because random noise is added to the client information by the CS. Even if several malicious clients cooperate with the CP, the CP cannot access the local model parameters because the CS samples random noise for each client.
- Malicious CS and CP: When the CS and the CP cooperate to access a shared local model parameter, the client’s information may be leaked. If all the distributed servers participating in the secure aggregation algorithm of SMC are compromised and cooperate with each other, there is no way to protect personal information. This paper assumes that the CS and the CP may be compromised simultaneously but do not cooperate to access client data. These assumptions are needed for building SMC-based DHC. To improve security in practice, we can increase the number of CPs participating in the secure aggregation algorithm, reducing the probability that multiple CPs and CS are malicious servers that cooperate with each other. In addition, an authentication procedure for the distributed cloud servers can be performed at the KGC to guarantee the servers participating in SMC are honest.
5. Privacy Preserving Federated Learning
5.1. Homomorphic Key Generation and Distribution
5.2. FL Local Model Training
5.3. Secure Global Model Update with DHC
- The CS receives the encrypted local model vectors from the clients participating in the training process. Note that the local model vector encrypted with the public key of the i-th client is represented as . Thereafter, the CS generates random vectors with the same size as the local model vector and encrypts them using the client’s public key, as shown in lines 2–3 of Algorithm 1;
- The CS performs homomorphic addition operations with the encrypted local model vectors using the encrypted random vectors in line 5. The result of homomorphic addition between and is represented as . Then, the CS partially decrypts the result of the homomorphic addition using the partial private key in line 6. This process is repeated for local model vectors. Subsequently, the CP sends the partially decrypted vectors to the CP, as shown in lines 8–9;
- As shown in lines 10–11, the CP partially decrypts the partially decrypted vectors using the partial private key and obtains . Afterwards, the CP adds all the decrypted vectors and divides the sum by the number of clients to obtain a vector containing the average parameters in line 12. The result is represented as ;
- The CP encrypts using the public keys of the clients in lines 13–15 and sends the encrypted vectors to the CS in line 16;
- Finally, the CS calculates the encrypted average global model vectors for the clients by performing the homomorphic addition operation with the encrypted sum of random noises , as shown in lines 17–19.
Algorithm 1 Secure averaging local model algorithm |
|
5.4. Data Structure and Protocol
6. Performance Analysis
6.1. Computational Overhead
6.1.1. Computational Overhead on the Client’s Side
6.1.2. Computational Overhead on the Server’s Side
6.2. Communication Overhead
6.2.1. Communication Overhead between Clients and the Cloud Server
6.2.2. Communication Overhead between the Cloud Server and the Computation Provider
6.3. Overhead Comparison
6.4. Security Analysis
- In the attack model of malicious clients, malicious clients can eavesdrop on the communication channels, and obtain ciphertexts and partially decrypted ciphertexts. Because the DHC is semantically secure, as described in [25], an attacker has to break the cryptosystem to obtain private data. If an honest client uses a longer key size, the attacker will have to use more computational resources to break the victim’s cryptosystem.
- In the attack model of a single malicious server, malicious servers can eavesdrop on the communication channels, and obtain ciphertexts and partially decrypted ciphertexts. As in the client attack model, an attacker must break the cryptosystem to obtain private data. Even though CP acquires the model parameters and performs the combined decryption to obtain the plaintext, it cannot access private data because the plaintext includes a random noise added by the CS.
- In the attack model of malicious clients and servers, the malicious entities can cooperate; the malicious client may provide a private key to the malicious server. If all clients use the same private key, the malicious server can access all clients’ private data because a malicious client can provide the private key to the malicious server. On the other hand, since the proposed system allows clients to use different private keys in the same FL-based system, the privacy leakage can be prevented in this attack model.
7. Performance Evaluation
7.1. Computational Overhead
7.2. Communication Overhead
8. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
- Aono, Y.; Hayashi, T.; Wang, L.; Moriai, S. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 2017, 13, 1333–1345. [Google Scholar]
- Hitaj, B.; Ateniese, G.; Perez-Cruz, F. Deep models under the GAN: Information leakage from collaborative deep learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 603–618. [Google Scholar]
- Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl.-Based Syst. 2021, 216, 106775. [Google Scholar] [CrossRef]
- Khan, L.U.; Saad, W.; Han, Z.; Hossain, E.; Hong, C.S. Federated learning for internet of things: Recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutor. 2021, 23, 1759–1799. [Google Scholar] [CrossRef]
- Lin, J.C.W.; Srivastava, G.; Zhang, Y.; Djenouri, Y.; Aloqaily, M. Privacy-preserving multiobjective sanitization model in 6G IoT environments. IEEE Internet Things J. 2020, 8, 5340–5349. [Google Scholar] [CrossRef]
- Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and robust federated learning through personalization. In Proceedings of the International Conference on Machine Learning, Virtual. 18–24 July 2021; pp. 6357–6368. [Google Scholar]
- Lin, F.P.C.; Hosseinalipour, S.; Azam, S.S.; Brinton, C.G.; Michelusi, N. Semi-decentralized federated learning with cooperative D2D local model aggregations. IEEE J. Sel. Areas Commun. 2021, in press. [Google Scholar] [CrossRef]
- Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19. [Google Scholar] [CrossRef]
- Ou, W.; Zeng, J.; Guo, Z.; Yan, W.; Liu, D.; Fuentes, S. A homomorphic-encryption-based vertical federated learning scheme for rick management. Comput. Sci. Inf. Syst. 2020, 17, 819–834. [Google Scholar] [CrossRef]
- Zhang, C.; Li, S.; Xia, J.; Wang, W.; Yan, F.; Liu, Y. Batchcrypt: Efficient homomorphic encryption for cross-silo federated learning. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIXATC 20), Online. 15–17 July 2020; pp. 493–506. [Google Scholar]
- Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. Secureboost: A lossless federated learning framework. IEEE Intell. Syst. 2021, in press. [Google Scholar] [CrossRef]
- Shokri, R.; Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1310–1321. [Google Scholar]
- Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191. [Google Scholar]
- Xu, G.; Li, H.; Liu, S.; Yang, K.; Lin, X. Verifynet: Secure and verifiable federated learning. IEEE Trans. Inf. Forensics Secur. 2019, 15, 911–926. [Google Scholar] [CrossRef]
- Fang, H.; Qian, Q. Privacy Preserving Machine Learning with Homomorphic Encryption and Federated Learning. Future Internet 2021, 13, 94. [Google Scholar] [CrossRef]
- Shamir, A. How to share a secret. Commun. ACM 1979, 22, 612–613. [Google Scholar] [CrossRef]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
- Paillier, P. Public-key cryptosystems based on composite degree residuosity classes. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Singapore, 14–18 November 1999; pp. 223–238. [Google Scholar]
- Gentry, C. Fully homomorphic encryption using ideal lattices. In Proceedings of the Annual ACM Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 169–178. [Google Scholar]
- Liu, X.; Deng, R.H.; Choo, K.K.R.; Weng, J. An efficient privacy-preserving outsourced calculation toolkit with multiple keys. IEEE Trans. Inf. Forensics Secur. 2016, 11, 2401–2414. [Google Scholar] [CrossRef]
- Katz, J.; Lindell, Y. Introduction to Modern Cryptography; CRC Press: Boca Raton, FL, USA, 2020. [Google Scholar]
- Liu, X.; Choo, K.K.R.; Deng, R.H.; Lu, R.; Weng, J. Efficient and privacy-preserving outsourced calculation of rational numbers. IEEE Trans. Dependable Secur. Comput. 2018, 15, 27–39. [Google Scholar] [CrossRef]
- Barker, E.; Barker, E.; Burr, W.; Polk, W.; Smid, M. Recommendation for Key Management: Part 1: General; National Institute of Standards and Technology, Technology Administration: Gaithersburg, MD, USA, 2006. [Google Scholar]
- Bresson, E.; Catalano, D.; Pointcheval, D. A simple public-key cryptosystem with a double trapdoor decryption mechanism and its applications. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Taipei, Taiwan, 30 November–4 December 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 37–54. [Google Scholar]
K (Key Size) | 1024 | 3072 | 15,360 |
Client–CS | 422.1 | 422.4 | 422.4 |
CS–CP () | 8443 | 8448 | 8448 |
CS–CP () | 84,429 | 84,480 | 84,480 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Park, J.; Lim, H. Privacy-Preserving Federated Learning Using Homomorphic Encryption. Appl. Sci. 2022, 12, 734. https://doi.org/10.3390/app12020734
Park J, Lim H. Privacy-Preserving Federated Learning Using Homomorphic Encryption. Applied Sciences. 2022; 12(2):734. https://doi.org/10.3390/app12020734
Chicago/Turabian StylePark, Jaehyoung, and Hyuk Lim. 2022. "Privacy-Preserving Federated Learning Using Homomorphic Encryption" Applied Sciences 12, no. 2: 734. https://doi.org/10.3390/app12020734
APA StylePark, J., & Lim, H. (2022). Privacy-Preserving Federated Learning Using Homomorphic Encryption. Applied Sciences, 12(2), 734. https://doi.org/10.3390/app12020734