Stars
Secure and fast microVMs for serverless computing.
An understandable, fast and scalable Raft Consensus implementation
Distributed stream processing engine in Rust
ParadeDB is a modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.
A library for building fast, reliable and evolvable network services.
Apache DataFusion Comet Spark Accelerator
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as Delta Lake, Apache Hudi, and Apache Iceberg.
A library that provides an in-memory Kafka instance to run your tests against.
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, Du…
Transmute-free Rust library to work with the Arrow format
Apache Doris is an easy-to-use, high performance and unified analytics database.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
IDE-style autocomplete for your existing terminal & shell
A highly efficient daemon for streaming data from Kafka into Delta Lake
This repository is for the active development of the Azure SDK for Rust. For consumers of the SDK, we recommend visiting Docs.rs and looking up the docs for any of the libraries in the SDK.
A native Rust library for Delta Lake, with bindings into Python
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
lakeFS - Data version control for your data lake | Git for data
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs
Apache Spark - A unified analytics engine for large-scale data processing
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.