wangfakang (sky) / Starred · GitHub

Organizations

@envoyproxy


Pipeline Parallelism Emulation and Visualization

Python 45 3 Updated Jun 12, 2025

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ 803 68 Updated Jun 3, 2025

Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport

Cuda 51 2 Updated May 9, 2025

Perplexity GPU Kernels

C++ 377 44 Updated Jun 10, 2025

Official implementation of the paper "SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training"

Python 38 7 Updated Dec 11, 2024

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,840 279 Updated May 15, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,208 820 Updated Jun 26, 2025

Morpheus SDK

Python 490 177 Updated Jun 25, 2025

This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

Python 984 97 Updated Sep 29, 2022

Fast OS-level support for GPU checkpoint and restore

C++ 199 20 Updated Jun 18, 2025
HTML 203 36 Updated May 30, 2025

CUDA checkpoint and restore utility

C 345 19 Updated Jan 27, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,854 1,526 Updated Jun 26, 2025

prime is a framework for efficient, globally distributed training of AI models over the internet.

Python 771 82 Updated May 22, 2025
42 3 Updated Nov 5, 2024

A tee-like program that copies stdin to rotated log files and can compress them.

C++ 13 5 Updated Jan 28, 2018

NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…

Python 179 25 Updated Jun 7, 2025

HTNN: A cloud-native gateway offering seamless extensibility for Istio and Envoy, in a native way by Go.

Go 113 37 Updated Jun 15, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 380 57 Updated Jun 26, 2025

oneAPI Collective Communications Library (oneCCL)

C++ 237 81 Updated Jun 12, 2025

DeepLearning Framework Performance Profiling Toolkit

Python 285 27 Updated Mar 28, 2022

mperf is an operator performance tuning toolbox for mobile and embedded platforms

C++ 186 32 Updated Aug 17, 2023

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 42,226 7,041 Updated Dec 9, 2024

A library to analyze PyTorch traces.

Python 391 60 Updated Jun 23, 2025

ROCm Communication Collectives Library (RCCL)

C++ 342 157 Updated Jun 26, 2025

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators

C++ 95 29 Updated Jun 13, 2025
Python 84 39 Updated Dec 11, 2019

NCCL Profiling Kit

Python 138 12 Updated Jul 1, 2024

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

C 1,376 470 Updated Jun 26, 2025