Stars
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
Free, simple, fast interactive diagrams for any GitHub repository
Technical report of Kimina-Prover Preview.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstra…
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of lists of statically-shaped tensors, referred to as a Fractal…
A beautiful, simple, clean, and responsive Jekyll theme for academics
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
nnScaler: Compiling DNN models for Parallel Training
MSVBASE is a system that efficiently supports complex queries of both approximate similarity search and relational operators. It integrates high-dimensional vector indices into PostgreSQL, a relati…
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
A unified 3D Transformer Pipeline for visual synthesis
Tutel MoE: Optimized Mixture-of-Experts Library, with support for DeepSeek FP8/FP4
A validation and profiling tool for AI infrastructure
Antares: an automatic engine for multi-platform kernel generation and optimization. It supports CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, and Android CPU/GPU backends.
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
A decoupled transaction component providing transaction processing for applications
Resource scheduling and cluster management for AI
Extension to connect OpenPAI clusters, submit AI jobs, simulate jobs locally, manage files, and so on.
A marketplace that stores examples and job templates for OpenPAI. Users can use openpaimarketplace to share their own jobs or to run and learn from jobs shared by others.