Ethereum MEV Blocks Analysis Project

This project includes several Python scripts that manage and analyze Ethereum MEV block data.

Jupyter Notebook versions of the MEV_boost_EDA, MEV_boost_ML, and feature_select scripts are also provided for easier viewing and interaction. Development was done primarily in the Spyder IDE, which offered a convenient variable explorer for debugging and faster model training. This matters because the dataset exceeds 3 million entries, which can strain the Jupyter kernel and slow down processing.

For a deeper understanding of this research 👉 you can watch the video on YouTube:

Current Status

The get_parquet.py script has completed execution. The output file is stored in the data folder.
The data_process.py script is imported by the other scripts and does not need to be run on its own.
The images generated by the MEV_boost_EDA.py and MEV_boost_ML.py scripts are stored in the graphs folder.

Scripts Description

`get_parquet.py`

Purpose: Verifies the format of the file ethereum_mev_blocks_19580000_to_19589999.parquet.
Functionality:
- Performs basic data processing.
- Saves the processed data as a CSV file.
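
The Parquet-to-CSV step described above can be sketched roughly as follows. This is a minimal illustration, not the contents of get_parquet.py: the function names `parquet_to_csv` and `basic_clean` are hypothetical, and the "basic processing" shown here (dropping duplicate rows and empty columns) is an assumption, since the actual steps are not listed.

```python
from pathlib import Path

import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical 'basic processing': drop duplicate rows and all-empty columns."""
    df = df.drop_duplicates()
    df = df.dropna(axis="columns", how="all")
    return df.reset_index(drop=True)


def parquet_to_csv(parquet_path: str, csv_path: str) -> pd.DataFrame:
    """Load a Parquet file, clean it lightly, and save the result as CSV."""
    # pandas.read_parquet needs a Parquet engine such as pyarrow or fastparquet.
    df = pd.read_parquet(parquet_path)
    df = basic_clean(df)
    # Create the output folder if it does not exist yet.
    Path(csv_path).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(csv_path, index=False)
    return df
```

Printing `df.dtypes` and `df.head()` on the returned DataFrame is a quick way to verify the file format before further processing.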

`data_process.py`

Purpose: Handles data cleaning and feature engineering.
Functionality:
- Processes the payload data to identify the winning bids within the bids data.
- Stores the results in a DataFrame called matched_df.
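
One way to picture the matching step is an inner join between the bids and the delivered payloads. The sketch below is not the code in data_process.py; it assumes both tables carry `slot` and `block_hash` columns (these names appear in the bids query later in this README) and that a bid is the winning one when its pair of values also appears in the payload data. The function name `match_winning_bids` is hypothetical.

```python
import pandas as pd


def match_winning_bids(bids: pd.DataFrame, payloads: pd.DataFrame) -> pd.DataFrame:
    """Return the bids whose (slot, block_hash) also appears in the payload data."""
    keys = ["slot", "block_hash"]  # assumed join key, not confirmed by the project
    delivered = payloads[keys].drop_duplicates()
    matched_df = bids.merge(delivered, on=keys, how="inner")
    # The same block may be submitted more than once; keep one row per key.
    matched_df = matched_df.sort_values(keys).drop_duplicates(subset=keys, keep="first")
    return matched_df.reset_index(drop=True)
```

An inner join is used so that bids without a delivered payload are dropped rather than kept with missing values.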

`MEV_boost_EDA.py`

Purpose: Performs statistical analysis on various datasets.
Functionality:
- Analyzes the bids data, payload data, and matched_df DataFrame.
- Provides insights and visualizations to understand the characteristics and distributions within these datasets.
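
As a rough illustration of the kind of summary this analysis produces, the sketch below computes the distribution of bids per slot and the bid value statistics per relay. It assumes a bids DataFrame with `slot`, `relay`, and a numeric `value` column; the function name `summarize_bids` is hypothetical and is not taken from MEV_boost_EDA.py.

```python
import pandas as pd


def summarize_bids(bids: pd.DataFrame) -> dict:
    """Summarize how many bids each slot receives and how bid values vary by relay."""
    bids_per_slot = bids.groupby("slot").size()
    value_by_relay = bids.groupby("relay")["value"].describe()
    return {
        "bids_per_slot": bids_per_slot.describe(),  # count, mean, std, quartiles
        "value_by_relay": value_by_relay,           # one row of statistics per relay
    }
```

The returned tables can be printed directly or passed to matplotlib or seaborn for plotting.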

`MEV_boost_ML.py`

Purpose: Dedicated to model training, evaluation, and optimisation.
Functionality:
- Uses cleaned and processed data to train machine learning models.
- Evaluates model performance.
- Selects the best hyperparameters and slot ranges.
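
To make the idea of tuning over slot ranges concrete, here is a hedged sketch of one possible approach. It is not the project's RF_turning function, whose internals are not shown in this README. The function `tune_train_size` and all of its parameters are hypothetical, and it assumes the model is tested on the slots in `[start_slot, start_slot + slot_range)` and trained on a window of slots immediately before `start_slot`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def tune_train_size(df, features, target, start_slot, slot_range, train_sizes):
    """Pick the training-window length (in slots) that gives the lowest test MAE."""
    test = df[(df["slot"] >= start_slot) & (df["slot"] < start_slot + slot_range)]
    if test.empty:
        raise ValueError("no rows fall inside the requested test slot range")
    best = None  # (mae, train_size, model)
    for size in train_sizes:
        train = df[(df["slot"] >= start_slot - size) & (df["slot"] < start_slot)]
        if train.empty:
            continue  # this window is not covered by the data
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(train[features], train[target])
        mae = mean_absolute_error(test[target], model.predict(test[features]))
        if best is None or mae < best[0]:
            best = (mae, size, model)
    if best is None:
        raise ValueError("no candidate training window contains any rows")
    return best[1], best[2]
```

Splitting by slot rather than at random keeps the test data strictly later than the training data, which avoids leaking future bids into the model.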

Dataset Sources

The data used in this project is sourced from the following:

Eden Public Data: Eden Network Public Data Overview
MEV-Boost Winning Bid Data: MEV-Boost Data Repository

This repository contains a collection of public domain Ethereum MEV-Boost winning bid data.

Important Note

The original dataset is large, and for the purposes of model training, it was initially set up with 3 million records. In the GitHub example, this has been reduced to 500,000 records for efficiency. The dataset file used is Eden_MEV-Boost_bid_20240404.csv.

Because of the reduced sample size, you might encounter errors related to the slot_range parameter when running MEV_boost_ML.py, specifically with the following lines:

best_train_size1, best_rf1 = RF_turning(8787590, 1201, ...)
best_train_size2, best_rf2 = RF_turning(8787590, 1201, ...)
best_train_size3, best_rf3 = RF_turning(8787590, 1201, ...)

These calls may fail because the reduced dataset does not cover the configured slot range. To resolve this, you can:

Adjust the Slot Range Parameter: Change the 1201 value to 201 in the RF_turning function calls:

best_train_size1, best_rf1 = RF_turning(8787590, 201, ...)
best_train_size2, best_rf2 = RF_turning(8787590, 201, ...)
best_train_size3, best_rf3 = RF_turning(8787590, 201, ...)

Download the Full Dataset: Alternatively, you can query and download the complete dataset from Eden Public Data. Use the following SQL query to retrieve the data:

SELECT block_timestamp, relay, slot, block_hash, gas_used, value, num_tx, block_number, timestamp, optimistic_submission
FROM `eden-data-public.mev_boost.bids`
WHERE TIMESTAMP_TRUNC(block_timestamp, DAY) BETWEEN TIMESTAMP("2024-02-01") AND TIMESTAMP("2024-04-04")
ORDER BY block_timestamp DESC
LIMIT 3000000

After downloading the complete dataset, replace the Eden_MEV-Boost_bid_20240404.csv file in your local directory with the newly downloaded file.
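
Before running the model script against either the reduced or the full file, it can help to check that the CSV has the expected columns and covers the slots you intend to use. The sketch below is an illustration under assumptions: the column list is copied from the SQL query above, `check_bid_data` is a hypothetical helper, and it assumes the two numeric arguments of RF_turning describe a window of `slot_range` slots starting at the first argument, which this README does not confirm.

```python
import pandas as pd

# Columns selected by the SQL query above.
EXPECTED_COLUMNS = [
    "block_timestamp", "relay", "slot", "block_hash", "gas_used",
    "value", "num_tx", "block_number", "timestamp", "optimistic_submission",
]


def check_bid_data(df: pd.DataFrame, start_slot: int, slot_range: int) -> dict:
    """Validate the columns and report how many slots fall in the assumed window."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    in_window = (df["slot"] >= start_slot) & (df["slot"] < start_slot + slot_range)
    return {
        "rows": len(df),
        "min_slot": int(df["slot"].min()),
        "max_slot": int(df["slot"].max()),
        "slots_in_window": int(df.loc[in_window, "slot"].nunique()),
    }


# Example usage (the file location is an assumption; adjust it to your setup):
# df = pd.read_csv("Eden_MEV-Boost_bid_20240404.csv")
# print(check_bid_data(df, 8787590, 1201))
```

If `slots_in_window` is far below `slot_range`, the file probably does not cover the requested range, and a smaller value such as 201 or the full dataset is the safer choice.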

Installation

Ensure you have Python 3.6+ installed. You can install the required dependencies via pip:

pip install pandas numpy matplotlib seaborn scikit-learn

