This project includes several Python scripts that manage and analyze Ethereum MEV block data.
Additionally, Jupyter Notebook versions of the MEV_boost_EDA, MEV_boost_ML, and feature_select scripts are provided for easier viewing and interaction. During development, the Spyder IDE was primarily used; its variable explorer is convenient for debugging, and it helped speed up model training. This matters because the dataset exceeds 3 million entries, which can strain the Jupyter kernel and slow down processing.
For a deeper understanding of this research👉 you can watch the video on YouTube:
- The `get_parquet.py` script has completed execution. The output file is stored in the `data` folder.
- The `data_process.py` script has been imported into other programs and does not need to be run independently.
- The images generated by the `MEV_boost_EDA.py` and `MEV_boost_ML.py` scripts are stored in the `graphs` folder.
- Purpose: Verifies the format of the file `ethereum_mev_blocks_19580000_to_19589999.parquet`.
- Functionality:
  - Performs basic data processing.
  - Saves the processed data as a CSV file.
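The steps above can be sketched roughly as follows. This is only an illustration: the expected column list, the duplicate-dropping step, and the function names are assumptions, and the actual checks in the script may differ. Reading Parquet files with pandas also requires `pyarrow` or `fastparquet`.

```python
import pandas as pd

# Hypothetical column subset; adjust to the columns the real file contains.
EXPECTED_COLUMNS = ["slot", "block_hash", "value"]


def check_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are missing from df."""
    return [col for col in expected if col not in df.columns]


def basic_clean(df):
    """Illustrative processing step (an assumption): drop exact duplicate rows."""
    return df.drop_duplicates().reset_index(drop=True)


def parquet_to_csv(parquet_path, csv_path):
    """Load a Parquet file, verify its columns, clean it, and save it as CSV."""
    df = pd.read_parquet(parquet_path)  # needs pyarrow or fastparquet installed
    missing = check_columns(df)
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    cleaned = basic_clean(df)
    cleaned.to_csv(csv_path, index=False)
    return cleaned
```

Keeping the column check separate from the file I/O makes it easy to run the same check on any DataFrame, not only one loaded from disk.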
- Purpose: Handles data cleaning and feature engineering.
- Functionality:
  - Processes the payload data to identify the winning bids within the bids data.
  - Stores the results in a DataFrame called `matched_df`.
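One way such matching might be done is an inner join between the bids and the delivered payloads. The sketch below assumes both tables carry `slot` and `block_hash` columns and that a bid wins when its pair appears in the payload data; the function name is hypothetical and the real logic may use different keys.

```python
import pandas as pd


def match_winning_bids(bids_df, payload_df, keys=("slot", "block_hash")):
    """Keep only the bids whose key columns also appear in the payload data."""
    key_list = list(keys)
    # One row per delivered payload, so the join cannot multiply bid rows.
    winners = payload_df[key_list].drop_duplicates()
    return bids_df.merge(winners, on=key_list, how="inner")


bids = pd.DataFrame(
    {"slot": [100, 100, 101], "block_hash": ["0xa", "0xb", "0xc"], "value": [5, 7, 9]}
)
payloads = pd.DataFrame({"slot": [100, 101], "block_hash": ["0xb", "0xc"]})
matched_df = match_winning_bids(bids, payloads)
```

If the same block was submitted to several relays, each of those bid rows is kept, which may or may not be the desired behavior.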
- Purpose: Performs statistical analysis on various datasets.
- Functionality:
  - Analyzes the bids data, payload data, and `matched_df` DataFrame.
  - Provides insights and visualizations to understand the characteristics and distributions within these datasets.
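As a small example of this kind of analysis, the sketch below summarizes bid values per relay. It assumes the bids table has the `relay` and `value` columns named in the dataset query; the function name and the chosen statistics are illustrative only.

```python
import pandas as pd


def summarize_by_relay(bids_df):
    """Count bids and summarize bid value for each relay, busiest relay first."""
    return (
        bids_df.groupby("relay")["value"]
        .agg(["count", "mean", "median", "max"])
        .sort_values("count", ascending=False)
    )
```

The same pattern applies to `matched_df` if the goal is to compare winning bids across relays.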
- Purpose: Dedicated to model training, evaluation, and optimisation.
- Functionality:
  - Uses cleaned and processed data to train machine learning models.
  - Evaluates model performance.
  - Selects the best hyperparameters and slot ranges.
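To illustrate what choosing a slot range could look like, the sketch below trains a random forest on several candidate training windows and keeps the one with the lowest validation error. It is not the project's `RF_turning` function: the function name, the window definition (the slots immediately before a validation start slot), the metric, and the fixed hyperparameters are all assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def pick_slot_range(df, features, target, val_start_slot, candidate_ranges):
    """Try several training windows; return (best_range, best_model, scores)."""
    val = df[df["slot"] >= val_start_slot]
    if val.empty:
        raise ValueError("No validation rows at or after val_start_slot")
    scores = {}
    best_range, best_model = None, None
    for slot_range in candidate_ranges:
        # Training window: the slot_range slots just before the validation slots.
        in_window = (df["slot"] >= val_start_slot - slot_range) & (df["slot"] < val_start_slot)
        train = df[in_window]
        if train.empty:
            continue  # window not covered by the data
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(train[features], train[target])
        scores[slot_range] = mean_absolute_error(val[target], model.predict(val[features]))
        if best_range is None or scores[slot_range] < scores[best_range]:
            best_range, best_model = slot_range, model
    return best_range, best_model, scores
```

Splitting by slot rather than at random keeps the validation data strictly later than the training data, which avoids leaking future bids into training.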
The data used in this project is sourced from the following:
- Eden Public Data: Eden Network Public Data Overview
- MEV-Boost Winning Bid Data: MEV-Boost Data Repository
This repository contains a collection of public domain Ethereum MEV-Boost winning bid data.
The original dataset is large; for model training, 3 million records were used initially. In the GitHub example, this has been reduced to 500,000 records for efficiency. The dataset file used is `Eden_MEV-Boost_bid_20240404.csv`.
Due to the reduced sample size, you might encounter errors in the `slot_range` parameter while running the script `MEV_boost_ML.py`. Specifically, if you use the following lines:
best_train_size1, best_rf1 = RF_turning(8787590, 1201, ...)
best_train_size2, best_rf2 = RF_turning(8787590, 1201, ...)
best_train_size3, best_rf3 = RF_turning(8787590, 1201, ...)
These calls may fail because the reduced dataset does not contain enough data for the requested slot range. To resolve this, you can:
- Adjust the Slot Range Parameter: Change the `1201` value to `201` in the `RF_turning` function calls:
    best_train_size1, best_rf1 = RF_turning(8787590, 201, ...)
    best_train_size2, best_rf2 = RF_turning(8787590, 201, ...)
    best_train_size3, best_rf3 = RF_turning(8787590, 201, ...)
- Download the Full Dataset: Alternatively, you can query and download the complete dataset from Eden Public Data. Use the following SQL query to retrieve the data:
    SELECT block_timestamp, relay, slot, block_hash, gas_used, value, num_tx,
           block_number, timestamp, optimistic_submission
    FROM `eden-data-public.mev_boost.bids`
    WHERE TIMESTAMP_TRUNC(block_timestamp, DAY)
          BETWEEN TIMESTAMP("2024-02-01") AND TIMESTAMP("2024-04-04")
    ORDER BY block_timestamp DESC
    LIMIT 3000000
After downloading the complete dataset, replace the `Eden_MEV-Boost_bid_20240404.csv` file in your local directory with the newly downloaded file.
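Before picking a slot range, it can help to check how many distinct slots the local CSV actually covers. The sketch below assumes that the second argument of `RF_turning` counts slots and that the data has a `slot` column; both helper names are hypothetical.

```python
def distinct_slot_count(slots):
    """Number of distinct slots present in the dataset."""
    return len(set(slots))


def check_slot_range(slots, requested_range):
    """Return (ok, available): ok is True when the data holds at least
    requested_range distinct slots."""
    available = distinct_slot_count(slots)
    return available >= requested_range, available
```

With pandas, the `slots` argument can be the `slot` column of the loaded CSV, for example `check_slot_range(pd.read_csv("Eden_MEV-Boost_bid_20240404.csv")["slot"], 1201)`. If the first returned value is `False`, a smaller range or the full dataset is needed.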
Ensure you have Python 3.6+ installed. You can install the required dependencies via pip:
pip install pandas numpy matplotlib seaborn scikit-learn
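To confirm the environment is ready, a quick check like the one below can list any modules that are still missing. Note that the scikit-learn package is imported under the name `sklearn`. The helper name is hypothetical, and the module list simply mirrors the dependencies named above.

```python
import importlib.util

# Import names, not pip names: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else f"Missing: {missing}")
```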