OLake

OLake is the fastest open-source tool for replicating databases to Apache Iceberg. It provides an easy-to-use web interface and a CLI for efficient, scalable, real-time data ingestion. Visit olake.io/docs for the full documentation and benchmarks.

🚀 Getting Started with OLake UI (Recommended)

OLake UI is a web-based interface for managing OLake jobs, sources, destinations, and configurations. You can run the entire OLake stack (UI, Backend, and all dependencies) using Docker Compose. This is the recommended way to get started.

Quick Start (2-step process):

1. Start OLake UI via Docker Compose:

   curl -sSL https://raw.githubusercontent.com/datazip-inc/olake-ui/master/docker-compose.yml | docker compose -f - up -d

2. Access the UI:
   - OLake UI: http://localhost:8000
   - Log in with the default credentials: admin / password.

A detailed getting-started guide for OLake UI can be found here.

Creating Your First Job

With the UI running, you can create a data pipeline in a few steps:

1. Create a Job: Navigate to the Jobs tab and click Create Job.
2. Configure Source: Set up your source connection (e.g., PostgreSQL, MySQL, MongoDB).
3. Configure Destination: Set up your destination (e.g., Apache Iceberg with a Glue, REST, Hive, or JDBC catalog).
4. Select Streams: Choose which tables to sync and configure their sync mode (CDC or Full Refresh).
5. Configure & Run: Give your job a name, set a schedule, and click Create Job to finish.
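The job assembled in these steps boils down to a name, a schedule, and a list of streams, each with a sync mode. The Go types below are a hypothetical model of that shape, not OLake's actual job schema, and the `Validate` checks are illustrative assumptions about what a sensible job must contain.

```go
package main

import (
	"errors"
	"fmt"
)

// SyncMode mirrors the two modes offered when selecting streams.
// All type, field and constant names here are hypothetical.
type SyncMode string

const (
	FullRefresh SyncMode = "full_refresh"
	CDC         SyncMode = "cdc"
)

// StreamSelection is one table picked in the "Select Streams" step.
type StreamSelection struct {
	Name string
	Mode SyncMode
}

// Job is a hypothetical model of what the UI collects.
type Job struct {
	Name     string
	Schedule string // e.g. a cron expression (assumed format)
	Streams  []StreamSelection
}

// Validate applies illustrative checks: a name, at least one stream,
// and a known sync mode for every stream.
func (j Job) Validate() error {
	if j.Name == "" {
		return errors.New("job name is required")
	}
	if len(j.Streams) == 0 {
		return errors.New("select at least one stream")
	}
	for _, s := range j.Streams {
		switch s.Mode {
		case FullRefresh, CDC:
		default:
			return fmt.Errorf("stream %q: unknown sync mode %q", s.Name, s.Mode)
		}
	}
	return nil
}

func main() {
	job := Job{
		Name:     "pg-to-iceberg",
		Schedule: "0 * * * *", // hourly
		Streams: []StreamSelection{
			{Name: "public.orders", Mode: CDC},
			{Name: "public.users", Mode: FullRefresh},
		},
	}
	fmt.Println("valid:", job.Validate() == nil)
}
```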

For a detailed walkthrough, refer to the Jobs documentation.

Performance Benchmarks*

OLake is engineered for high-throughput data replication.

Postgres Connector to Apache Iceberg: (See Detailed Benchmark)
- Full load: Syncs at 46,262 RPS for 4 billion rows. (101× Airbyte, 11.6× Estuary, 3.1× Debezium)
- CDC: Syncs at 36,982 RPS for 50 million changes. (63× Airbyte, 12× Estuary, 2.7× Debezium)
MongoDB Connector to Apache Iceberg: (See Detailed Benchmark)
- Syncs 35,694 records/sec, replicating a 664 GB dataset (230 million rows) in 46 minutes. (20× Airbyte, 15× Debezium, 6× Fivetran)

*These are preliminary results. Fully reproducible benchmark scores will be published soon.
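To put the Postgres throughput figures in wall-clock terms, the small Go helper below divides the row count by rows per second. It assumes the quoted rate is sustained for the entire run, which a real sync may not do; `syncDuration` is a hypothetical helper, not part of OLake.

```go
package main

import (
	"fmt"
	"time"
)

// syncDuration converts a row count and a sustained throughput (rows per second)
// into wall-clock time, assuming the rate stays constant for the whole run.
func syncDuration(rows, rowsPerSec float64) time.Duration {
	return time.Duration(rows / rowsPerSec * float64(time.Second))
}

func main() {
	full := syncDuration(4e9, 46262) // Postgres full-load figures quoted above
	cdc := syncDuration(50e6, 36982) // Postgres CDC figures quoted above
	fmt.Printf("full load: %.1f h\n", full.Hours())
	fmt.Printf("CDC:       %.1f min\n", cdc.Minutes())
}
```

Under that constant-rate assumption, the 4-billion-row full load works out to roughly 24 hours and the 50-million-change CDC run to about 22.5 minutes.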

Getting Started with OLake

Install OLake

Below are different ways you can run OLake:

Source / Connectors

Writers / Destination

Apache Iceberg Docs
1. Catalogs
2. Azure ADLS Gen2
3. Google Cloud Storage (GCS)
4. MinIO (local)
5. Iceberg Table Management
  1. S3 Tables Supported
Parquet Writer

Source Connectors

Functionality	MongoDB	Postgres	MySQL
Full Refresh Sync	✅	✅	✅
Incremental Sync	WIP	WIP	WIP
CDC Sync	✅	✅	✅
Full Load Parallel Processing	✅	✅	✅
CDC Parallel Processing	✅	❌	❌
Resumable Full Load	✅	✅	✅
CDC Heartbeat (Planned)	-	-	-

Destination Writers

Functionality	Local Filesystem	AWS S3	Apache Iceberg
Flattening & Normalization	✅	✅	✅
Partitioning	✅	✅	✅
Schema Data Type Changes	✅	✅	WIP
Schema Evolution	✅	✅	✅
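Flattening turns nested source records (such as MongoDB documents) into flat columns that map cleanly onto Parquet or Iceberg tables. A minimal Go sketch of the idea follows; joining nested keys with `_` and keeping arrays as single values are assumptions for illustration, not OLake's documented naming rules.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten turns a nested document into a single-level map. Nested keys are
// joined with "_" (an assumed convention); arrays and scalars stay as values.
func flatten(doc map[string]any) map[string]any {
	out := make(map[string]any)
	var walk func(key string, v any)
	walk = func(key string, v any) {
		if m, ok := v.(map[string]any); ok && len(m) > 0 {
			for k, child := range m {
				walk(key+"_"+k, child)
			}
			return
		}
		out[key] = v // leaf value: becomes one column
	}
	for k, v := range doc {
		walk(k, v)
	}
	return out
}

func main() {
	doc := map[string]any{
		"_id":      "a1",
		"customer": map[string]any{"name": "Ada", "address": map[string]any{"city": "London"}},
		"tags":     []any{"vip"},
	}
	flat := flatten(doc)

	// Sort keys so the output order is stable (Go map iteration order is random).
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, flat[k])
	}
}
```

A production writer also needs a policy for key collisions (for example a top-level `a_b` field versus a nested `a.b`) and for type changes between records, which is where schema evolution comes in.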

Supported Catalogs For Iceberg Writer

Catalog	Status
Glue Catalog	Supported
Hive Metastore	Supported
JDBC Catalog	Supported
REST Catalog	Supported (including AWS S3 Tables)
Azure Purview	Not planned; submit a request
BigLake Metastore	Not planned; submit a request

⚙️ Core Framework & CLI

For advanced users and automation, OLake's core logic is exposed via a powerful CLI. The core framework handles state management, configuration validation, logging, and type detection. It interacts with drivers using four main commands:

spec: Returns a renderable JSON Schema for a connector's configuration.
check: Validates connection configurations for sources and destinations.
discover: Returns all available streams (e.g., tables) and their schemas from a source.
sync: Executes the data replication job, extracting from the source and writing to the destination.

Find out more about how OLake works here.

Playground

🗺️ Roadmap

Check out our GitHub Project Roadmap and the Upcoming OLake Roadmap to track what's next. If you have ideas or feedback, please share them in our GitHub Discussions or by opening an issue.

❤️ Contributing

We ❤️ contributions, big or small! Check out our Bounty Program. A huge thanks to all our amazing contributors!

To contribute to the OLake core, see CONTRIBUTING.md.
To contribute to the UI, visit the OLake UI Repository.
To contribute to our website and documentation, visit the OLake Docs Repository.
