duckdb/ducklake: DuckLake is an integrated data lake and catalog format

DuckDB DuckLake Extension

DuckLake is currently at version 0.1 and is experimental. If you encounter any issues, please file them in the duckdb/ducklake issue tracker.

DuckLake is an open Lakehouse format that is built on SQL and Parquet. DuckLake stores metadata in a catalog database, and stores data in Parquet files. The DuckLake extension allows DuckDB to directly read and write data from DuckLake.

See the DuckLake website for more information.

Installation

DuckLake can be installed using the INSTALL command:

INSTALL ducklake;

The latest development version can be installed from core_nightly:

FORCE INSTALL ducklake FROM core_nightly;
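
After installing, it can be useful to confirm that the extension is actually available in the current session. The following is a minimal sketch using DuckDB's standard `LOAD` statement and the built-in `duckdb_extensions()` table function; the reported version and install source depend on your DuckDB build.

```sql
-- Install (no-op if already installed) and load the extension explicitly.
INSTALL ducklake;
LOAD ducklake;

-- Check that DuckDB reports the extension as installed and loaded.
SELECT extension_name, installed, loaded, extension_version
FROM duckdb_extensions()
WHERE extension_name = 'ducklake';
```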

Usage

DuckLake databases can be attached using the ATTACH syntax, after which tables can be created, modified and queried using standard SQL.

Below is a short usage example that stores the metadata in a DuckDB database file called metadata.ducklake, and the data in Parquet files in the file_path directory:

ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'file_path/');
USE my_ducklake;
CREATE TABLE my_ducklake.my_table(id INTEGER, val VARCHAR);
INSERT INTO my_ducklake.my_table VALUES (1, 'Hello'), (2, 'World');
FROM my_ducklake.my_table;
┌───────┬─────────┐
│  id   │   val   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ Hello   │
│     2 │ World   │
└───────┴─────────┘
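
To see the split between catalog and data described above, the data directory can be listed from the same session. This is a sketch that assumes the `file_path/` directory from the example and that the inserted rows have already been written out as Parquet files; `glob` is DuckDB's built-in file-listing table function, and the file names and any subdirectory layout are chosen by DuckLake.

```sql
-- List the Parquet files under the DATA_PATH given in the ATTACH statement.
-- The '**' pattern also descends into subdirectories, if any were created.
FROM glob('file_path/**/*.parquet');
```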

Updates

UPDATE my_ducklake.my_table SET val='DuckLake' WHERE id=2;
FROM my_ducklake.my_table;
┌───────┬──────────┐
│  id   │   val    │
│ int32 │ varchar  │
├───────┼──────────┤
│     1 │ Hello    │
│     2 │ DuckLake │
└───────┴──────────┘

Time Travel

FROM my_ducklake.my_table AT (VERSION => 2);
┌───────┬─────────┐
│  id   │   val   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ Hello   │
│     2 │ World   │
└───────┴─────────┘
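
Picking a version number requires knowing which snapshots exist. The sketch below assumes the `my_ducklake` catalog from the example and uses the `snapshots()` table function and the `TIMESTAMP` form of the `AT` clause as described in the DuckLake documentation; the snapshot IDs and timestamps you see depend on the statements you have run.

```sql
-- List the snapshots recorded in the catalog, with their IDs and commit times.
FROM my_ducklake.snapshots();

-- Query the table as of a point in time instead of a snapshot ID.
-- Assumes the table already existed one minute ago; otherwise this fails.
FROM my_ducklake.my_table AT (TIMESTAMP => now() - INTERVAL '1 minute');
```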

Schema Evolution

ALTER TABLE my_ducklake.my_table ADD COLUMN new_column VARCHAR;
FROM my_ducklake.my_table;
┌───────┬──────────┬────────────┐
│  id   │   val    │ new_column │
│ int32 │ varchar  │  varchar   │
├───────┼──────────┼────────────┤
│     1 │ Hello    │ NULL       │
│     2 │ DuckLake │ NULL       │
└───────┴──────────┴────────────┘
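
Adding a column is only one kind of schema change. As a sketch, the statements below rename and then drop a throwaway column so the table from the walkthrough ends up unchanged; the column names `scratch` and `note` are hypothetical and not part of the original example.

```sql
-- Add a temporary column, rename it, then drop it again.
ALTER TABLE my_ducklake.my_table ADD COLUMN scratch VARCHAR;
ALTER TABLE my_ducklake.my_table RENAME COLUMN scratch TO note;
ALTER TABLE my_ducklake.my_table DROP COLUMN note;

-- The table should again have the same columns as before this block ran.
FROM my_ducklake.my_table;
```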

Change Data Feed

FROM my_ducklake.table_changes('my_table', 2, 2);
┌─────────────┬───────┬─────────────┬───────┬─────────┐
│ snapshot_id │ rowid │ change_type │  id   │   val   │
│    int64    │ int64 │   varchar   │ int32 │ varchar │
├─────────────┼───────┼─────────────┼───────┼─────────┤
│           2 │     0 │ insert      │     1 │ Hello   │
│           2 │     1 │ insert      │     2 │ World   │
└─────────────┴───────┴─────────────┴───────┴─────────┘
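
The second and third arguments of `table_changes` bound the snapshot range, so widening the range returns changes from several commits at once. The sketch below assumes that the `UPDATE` from the walkthrough was committed as snapshot 3; check the snapshot list of your own catalog before relying on that number.

```sql
-- All row-level changes between snapshot 2 and snapshot 3, inclusive.
FROM my_ducklake.table_changes('my_table', 2, 3);

-- Summarize how many rows each kind of change touched in that range.
SELECT change_type, count(*) AS changed_rows
FROM my_ducklake.table_changes('my_table', 2, 3)
GROUP BY change_type
ORDER BY change_type;
```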

See the Usage guide for more information.

Building & Loading the Extension

To build the extension, run:

# to build with multiple cores, use `make GEN=ninja release`
make pull
make

To use the extension, start the bundled DuckDB shell:

 ./build/release/duckdb
