Alibaba
- HangZhou
- http://wangfakang.github.io
Stars
Pipeline Parallelism Emulation and Visualization
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport
Official implementation of the paper "SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training"
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
DeepEP: an efficient expert-parallel communication library
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
Fast OS-level support for GPU checkpoint and restore
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
prime is a framework for efficient, globally distributed training of AI models over the internet.
A tee-like program that copies stdin to rotated log files and can compress them.
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…
HTNN: A cloud-native gateway offering seamless extensibility for Istio and Envoy, natively in Go.
MSCCL++: A GPU-driven communication stack for scalable AI applications
oneAPI Collective Communications Library (oneCCL)
Deep Learning Framework Performance Profiling Toolkit
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A library to analyze PyTorch traces.
Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)