- dianping.com
- Shanghai
Stars
FireFlyer Record file format, writer and reader for DL training samples.
Java bindings for https://github.com/facebookincubator/velox
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Fluss is a streaming storage built for real-time analytics.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
dask / fastparquet
Forked from jcrobak/parquet-python
Python implementation of the Parquet columnar file format.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
A QoS-based scheduling system that brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
BibiGPT v1 · One-click AI summary for audio/video and chat with learning content: Bilibili | YouTube | Tweet | TikTok | Dropbox | Google Drive | Local files | Websites | Podcasts | Meetings | Lectures, etc. Audio/video…
🔬 Online Heap Dump, GC Log, Thread Dump & JFR File Analyzer.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Flowchart for debugging Spark applications
A better notebook for Scala (and more)
A query predictor pipeline and service to predict resource usages of Presto queries
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Warp is a modern, Rust-based terminal with AI built in so you and your team can build great software, faster.
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
🔥 An open-source BI tool that anyone can use, and a powerful data visualization tool. An open-source BI tool alternative to Tableau.
The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them
Data Lineage Tracking And Visualization Solution