In this repository we aim to benchmark the hindrance that Fully Homomorphic Encryption (FHE) introduces and its impact on efficiency.
Our project simulates a Federated Learning (FL) model with the topology shown below.
There are three separate tests that we run:
- Base Case - This example transfers information only in plaintext under our FL model.
- Base Case + FHE - This example transfers information under FHE and performs computation only at the aggregator node in our FL model.
- Base Case + FHE + In-Network Processing - This example attempts to improve the efficiency of the process by introducing in-network processing, which offloads some of the computation to the intermediate switches in our FL model.
We implement our FL model with the help of three major libraries:
- Flask - A Python web application framework that enables our servers to listen for and transfer data. We modify the routes to simulate an FL environment.
- TenSEAL - A library that enables homomorphic encryption operations on tensors. We leverage this library to work easily with ML applications.
- PyTorch - A Python framework that lets us implement ML models with little effort.
The motivation for this project was to understand how much applications suffer when FHE is introduced to protect users' privacy. We wanted to gauge how feasible the tradeoff that FHE imposes really is. For this reason, there are several metrics that we investigate.
| Metric Name | Base | Base + FHE | Base + FHE + INP |
|---|---|---|---|
| Peak Memory (GB) | 10 | 20 | 15 |
| Process Time (s) | 5 | 8 | 12 |
| Max File Size (GB) | 30 | 25 | 40 |
| Time Saved (s) | 30 | 25 | 40 |

INP = In-Network Processing.
In this section we describe the requirements and interface setup for our FL model simulation.
Clone this GitHub repository:
```bash
git clone https://github.com/edzhangsy/CS277.git
```
Change directory into the folder:
```bash
cd CS277
```
Make the setup file executable and run it:
```bash
chmod +x setup.sh
./setup.sh
```
We combine the code for three roles in one repository.
To run the current setup, begin by starting the switch and client nodes:
```bash
python main.py
```
Then run the aggregator node:
```bash
python main.py agg
```
Afterwards, open the localhost web page and change the URL route to /train.
Note: Remember to start the server in the background (e.g., with nohup or inside tmux) so it doesn't get killed when you disconnect your SSH session.
The aggregator first reads the `config.json` file. Then it calls the other machines' `config` interface and sends each of them its config. The configs for the other clients are stored under the `others` dictionary, where the key is the IP address and the value is the config; each config is sent to the corresponding client. After receiving the config (a JSON object) from the aggregator, the other servers register their blueprint dynamically based on the `type` field in the config.
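As a rough illustration, the registration step might look like the following sketch. The blueprint names `client_bp` and `switch_bp` and the `/s` URL prefix are assumptions for illustration, not necessarily the repo's actual identifiers:

```python
# Hypothetical sketch of dynamic blueprint registration based on the
# "type" field of the received config. client_bp and switch_bp are
# assumed to be Flask blueprints defined in client.py and switch.py.
from flask import Flask
from client import client_bp
from switch import switch_bp

app = Flask(__name__)

def register_role(app, config):
    # Pick the blueprint matching the role this node was assigned.
    if config["type"] == "client":
        app.register_blueprint(client_bp)
    elif config["type"] == "switch":
        app.register_blueprint(switch_bp, url_prefix="/s")
```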
Check the `config.json` file and modify the configs if needed. The configs should be self-explanatory. When the type is `client`, there are `client_number` and `index` fields, which indicate how many clients are used in this experiment and the current client's index. This is useful for dividing the training and testing data sets. For example, a client with index 0 among 4 clients slices the data set into 4 slices and operates on slice 0.
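A minimal sketch of that slicing logic (the function name is illustrative; `client_number` and `index` come from the received config):

```python
# Each client takes a contiguous shard of the data set based on its index.
def local_slice(dataset, client_number, index):
    shard = len(dataset) // client_number
    return dataset[index * shard : (index + 1) * shard]

# e.g., client 0 of 4 gets the first quarter of the data
train_subset = local_slice(list(range(100)), client_number=4, index=0)
```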
When you want to start another experiment, just edit `config.json` and restart the aggregator.
Because different types of servers have their own unique interfaces, we separate them into different blueprints. See the Flask documentation for what a blueprint is.
If you are developing the client, just edit `client.py` and add interfaces. The config received from the server is stored in the `config` variable in `client.py` or `switch.py`. For example, if you are developing the client, the config received from the server should be accessible in the local `config` variable.
If you want to send some data to the switch, just read the config to get the address you should send to, then send to that address's interface, for example http://10.10.1.5:5000/s/receive. Then check the status code: 200 means the transfer was successful. See the Flask documentation for how to check the status code.
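A minimal sketch using the `requests` library (the address and payload are illustrative):

```python
import requests

# POST data to a switch's receive interface and check the status code.
resp = requests.post(
    "http://10.10.1.5:5000/s/receive",
    json={"weights": [0.1, 0.2]},  # illustrative payload
)
if resp.status_code == 200:
    print("send succeeded")
else:
    print("send failed with status", resp.status_code)
```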
You should also write the log into the local `log` directory. Just use a dictionary to store the logs. For example, the log dictionary can look like this:
```json
{
    "iteration": [
        {
            "start_time": "timestamp",
            "end_time": "timestamp",
            "byte_received": 50,
            "byte_send": 100
        },
        {
            "start_time": "timestamp",
            "end_time": "timestamp",
            "byte_received": 50,
            "byte_send": 100
        },
        {
            "start_time": "timestamp",
            "end_time": "timestamp",
            "byte_received": 50,
            "byte_send": 100
        }
    ]
}
```
There are 15 machines on CloudLab, all connected physically through one switch. Addresses beginning with `10.10` are local addresses: `node0` has address `10.10.1.1`, `node1` has address `10.10.1.2`, and so on. For convenience, we use `node14` (`10.10.1.15`) as the aggregator.
Remember not to add useless files when you commit.
Remember to start all the clients first, then start the aggregator.
You may need to open port 5000 using iptables (e.g., `sudo iptables -A INPUT -p tcp --dport 5000 -j ACCEPT`).
SEAL is an open source homomorphic encryption library developed by the Cryptography and Privacy Research Group at Microsoft.
For more information, visit the Microsoft SEAL GitHub repository.
Initially, the PySEAL library was chosen for its ease of use, being directly compatible with our setup written in Python.
Note: the Microsoft SEAL library does not work with tensors out of the box. Since PySEAL is simply a Python wrapper around the Microsoft SEAL library, we would have to modify it further to use it on tensors.
Below are the directions for running PySEAL; for more information, visit the PySEAL GitHub repository.
The PySEAL library should be compiled first. After compilation, you will see a `seal.*.so` file. Copy it into the directory of this repo, and you should be able to use it via `import seal`. You can run `seal.sh` to set this up. Examples of using SEAL are included in `5_ckks_basics.py`.
Because seal-python uses pybind to bind the original C++ library, we are dealing with Python objects that wrap C++ objects. It is useful to call the `dir()` function to see which methods are available. For example, after generating the `secret_key`, calling `dir(secret_key)` reveals the `save`, `load`, and `to_string` methods.
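A minimal sketch of this introspection trick. It assumes the compiled `seal` module is importable; the names exposed vary by seal-python version, and the same `dir()` call works on a generated `secret_key` object:

```python
# List the public names pybind exposes on the wrapped SEAL module.
import seal

print([name for name in dir(seal) if not name.startswith("_")])
```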
TenSEAL is a library built on top of the Microsoft SEAL library. It introduces extra features such as dot products and tensor support that make it easy for machine learning applications to invoke FHE. Examples of using TenSEAL are included in `tenseal_ckks.py`. For more information, visit the TenSEAL GitHub repository.
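A minimal sketch of TenSEAL's CKKS API (the encryption parameters are illustrative, not the repo's actual settings):

```python
import tenseal as ts

# Create a CKKS context; parameters here are illustrative defaults.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2**40
context.generate_galois_keys()

# Encrypt two vectors and compute on them homomorphically.
v1 = ts.ckks_vector(context, [1.0, 2.0, 3.0])
v2 = ts.ckks_vector(context, [4.0, 5.0, 6.0])

print((v1 + v2).decrypt())   # ≈ [5.0, 7.0, 9.0]
print(v1.dot(v2).decrypt())  # ≈ [32.0]
```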
For our ML model, we took an existing PyTorch implementation of a basic 2-layer neural network trained on the MNIST dataset. We made some modifications to the `mnist.py` file so that it writes the weight and bias vectors into a JSON file, allowing us to call TenSEAL and encrypt the training tensors directly with FHE. In our complete Federated Learning model, the client calls `replace_weights_mnist.py`, which loads the averaged weights returned by the Aggregator and resumes training.
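A hypothetical sketch of the export step (the layer sizes and file name are illustrative, not the repo's actual model):

```python
# Dump each layer's weights and biases to JSON so they can later be
# encrypted with TenSEAL. The 784-128-10 architecture is illustrative.
import json
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
params = {name: t.tolist() for name, t in model.state_dict().items()}
with open("weights.json", "w") as f:
    json.dump(params, f)
```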
In this section we describe the simulation logic for each scenario in more detail.
For the Base Case, the Client Nodes begin by training their ML models locally. After some training iterations, each client sends its parameters up to the Aggregator Node. In this scenario, our Switch Nodes act as dumb switches that simply forward the files as they come up to the Aggregator Node. Once the Aggregator Node receives all the necessary files, it aggregates and averages them, then sends the new set of files back to all Client Nodes. Finally, once the Client Nodes receive the files back from the Aggregator Node, they update the values in their ML models and continue training through a warm start.
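A minimal sketch of the plaintext averaging step (the function name and the use of NumPy are illustrative):

```python
import numpy as np

def average_weights(client_weights):
    """client_weights: one same-shaped parameter array per client."""
    return np.mean(np.stack(client_weights), axis=0)

# e.g., two clients' updates averaged element-wise -> [2., 3.]
avg = average_weights([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
```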
For the Base Case with FHE, the Client Nodes begin by training their ML models locally. After completing its training, each client encrypts its parameters and sends them as ciphertext up to the Aggregator Node. In this scenario, our Switch Nodes act as dumb switches that simply forward the files as they come up to the Aggregator Node. Once the Aggregator Node receives all the necessary files, it aggregates and averages them, then decrypts the ciphertext back to plaintext before sending the new set of files back to all Client Nodes. Finally, once the Client Nodes receive the files back from the Aggregator Node, they update the values in their ML models and continue training through a warm start.
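A hypothetical sketch of the aggregation over TenSEAL ciphertexts: the aggregator can add encrypted updates and rescale by 1/n without decrypting first (the function name is illustrative):

```python
# enc_updates: a list of ts.ckks_vector ciphertexts, one per client.
def average_encrypted(enc_updates):
    total = enc_updates[0]
    for enc in enc_updates[1:]:
        total = total + enc                   # ciphertext + ciphertext
    return total * (1.0 / len(enc_updates))  # ciphertext * plaintext scalar
```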
For the Base Case with FHE and In-Network Processing, the Client Nodes begin by training their ML models locally. After completing its training, each client encrypts its parameters and sends them as ciphertext up to the Aggregator Node. In this scenario, our Switch Nodes help decrease the load on the aggregator by performing local aggregation, combining two files into one before forwarding them up to the Aggregator Node. Once the Aggregator Node receives all the necessary files, it aggregates the remaining files, averages them, then decrypts the ciphertext back to plaintext before sending the new set of files back to all Client Nodes. Finally, once the Client Nodes receive the files back from the Aggregator Node, they update the values in their ML models and continue training through a warm start.
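A hypothetical sketch of the switch-side step: pairs of encrypted updates are combined into one before forwarding, roughly halving the traffic the aggregator must process (the function name is illustrative):

```python
# enc_updates: encrypted client updates buffered at a switch node.
def combine_pairs(enc_updates):
    combined = []
    for i in range(0, len(enc_updates) - 1, 2):
        combined.append(enc_updates[i] + enc_updates[i + 1])  # homomorphic add
    if len(enc_updates) % 2:  # an odd leftover is forwarded unchanged
        combined.append(enc_updates[-1])
    return combined
```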