Stars
Class materials for a distributed systems lecture series
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Make awesome display tables using Python
A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀
A curated list of awesome blogs, videos, tools and resources about Data Contracts
An Awesome List of projects using the pyproject.toml Python configuration file.
Format and convert Python docstrings and generate patches
The Metadata Platform for your Data and AI Stack
DuckDB is an analytical in-process SQL database management system
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Always know what to expect from your data.
🙃 A delightful community-driven (with 2,400+ contributors) framework for managing your zsh configuration. Includes 300+ optional plugins (rails, git, macOS, hub, docker, homebrew, node, php, python…
Visualize and compare datasets, target values and associations, with one line of code.
Algorithms for outlier, adversarial and drift detection
Create HTML profiling reports from Apache Spark DataFrames
Visualizer for neural network, deep learning and machine learning models
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
A Python package to assess and improve fairness of machine learning models.
A flexible, high-performance serving system for machine learning models
Make drawing and labeling bounding boxes a piece of cake
Cross-platform, customizable ML solutions for live and streaming media.