Python framework for building blockchain data pipelines.
Cherry is in the early stages of development, so the API is still changing and we are still figuring things out. We would love to help you get started and hear your feedback on our Telegram channel.
The core libraries we use for ingesting, decoding, validating, and transforming blockchain data are implemented in the cherry-core repo.
This project is sponsored by:
- Ingest data from multiple providers with a uniform interface. This makes switching providers as easy as changing a couple lines in config.
- Prebuilt functionality to decode/validate/transform blockchain data.
- Support for both Ethereum (EVM) and Solana (SVM) based blockchains.
- Write data to ClickHouse, Iceberg, Delta Lake, or Parquet/Arrow (via pyarrow).
- Keep datasets fresh with continuous ingestion.
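Because providers share a uniform interface, switching providers amounts to changing a couple of lines of configuration. A minimal sketch of the idea — the dict keys, URLs, and field names below are illustrative only, not Cherry's actual config schema:

```python
# Hypothetical config shapes; Cherry's real schema may differ.
sqd_provider = {"kind": "sqd", "url": "https://example.com/sqd"}
hypersync_provider = {"kind": "hypersync", "url": "https://example.com/hypersync"}

pipeline_config = {
    "provider": sqd_provider,  # swap this entry to change providers
    "query": {"from_block": 0, "logs": []},
    "writer": {"kind": "parquet", "path": "./out"},
}

# Switching providers leaves the rest of the pipeline untouched.
pipeline_config["provider"] = hypersync_provider
```

The point is that the query and writer sections stay the same regardless of where the raw data comes from.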
We are still figuring out our core use cases and building up to them. Here is a rough roadmap:
- Add an option to ingest Solana data from a Geyser plugin/RPC.
- Add an option to ingest EVM data from an Ethereum RPC.
- Implement more advanced validation.
- Add more writers like DuckDB, PostgreSQL.
- Build an end-to-end testing flow so we can test the framework and users can test their pipelines using the same flow.
- Build a benchmark flow so we can optimize the framework and users can optimize their pipelines using the same flow. This will also make it easy to compare the performance of providers and writers.
- Implement more blockchain formats such as SUI, Aptos, and Fuel.
- Implement automatic rollback handling. Currently we don't handle rollbacks, so we stay behind the tip of the chain to avoid writing wrong data.
See the examples directory.

Run an example with:

```
uv run examples/{example_name}/main.py --provider {sqd or hypersync}
```

For examples that require databases or other infrastructure, start the necessary Docker containers first:

```
docker-compose -f examples/{example_name}/docker-compose.yaml up -d
```

After running the example, stop the containers and delete the data:

```
docker-compose -f examples/{example_name}/docker-compose.yaml down -v
```
The Python code uses Python's standard logging module, so it can be configured according to the Python docs. Set the RUST_LOG environment variable according to the env_logger docs to see logs from the Rust modules.

To run an example with trace-level logging for the Rust modules:

```
RUST_LOG=trace uv run examples/{example_name}/main.py --provider {sqd or hypersync}
```
This repo uses uv for development.

- Format the code with `uv run ruff format`
- Lint the code with `uv run ruff check`
- Run type checks with `uv run pyright`
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.