Yuheng Su gipsyh

47 followers · 170 following

ISCAS; UCAS
Beijing, China
https://gipsyh.github.io/
https://orcid.org/0009-0009-2571-8135

Starred repositories

gipsyh / evaltor

Rust 2 Updated May 25, 2025

Rust-GPU / Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

Rust 4,372 183 Updated May 24, 2025

linux-rdma / rdma-core

RDMA core userspace libraries and daemons

C 1,806 745 Updated May 6, 2025

infinigence / FlashOverlap

A lightweight design for computation-communication overlap.

Cuda 128 5 Updated May 6, 2025

microsoft / nnscaler

nnScaler: Compiling DNN models for Parallel Training

Python 112 15 Updated Apr 29, 2025

rsmpi / rsmpi

MPI bindings for Rust

Rust 535 57 Updated Apr 24, 2025

NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

C++ 3,732 920 Updated May 20, 2025

ByteDance-Seed / Triton-distributed

Distributed Triton for Parallel Systems

Python 761 49 Updated May 26, 2025

IST-DASLab / qmoe

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python 275 21 Updated Nov 3, 2023

ISCAS-modelchecker / modelchecker

ModelChecker: A bit-level model checking tool

C++ 7 1 Updated Mar 18, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 17,505 1,696 Updated May 22, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 1,195 94 Updated May 25, 2025

vanna-ai / vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.

Python 17,817 1,587 Updated Apr 10, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,373 601 Updated May 20, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 7,696 773 Updated May 23, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

Cuda 11,572 836 Updated Apr 29, 2025

arminbiere / cadical

CaDiCaL SAT Solver

C++ 447 150 Updated May 23, 2025

pulp-platform / FlooNoC

A Fast, Low-Overhead On-chip Network

SystemVerilog 206 37 Updated May 23, 2025

rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 50,117 7,255 Updated Apr 20, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,300 2,239 Updated Feb 1, 2025

agurfinkel / btor2mlir

BTOR2 MLIR project

C++ 25 7 Updated Jan 17, 2024

CoriolisSP / FuzzBtor2

Random Generator of Btor2 Files

C++ 10 Updated Sep 2, 2023

deepseek-ai / DeepSeek-V3

Python 97,116 15,782 Updated Apr 9, 2025

fuqi-jia / BLAN

Bit-bLAsting solving Non-linear integer constraints.

C++ 21 1 Updated Jul 12, 2024

dtolnay / inventory

Typed distributed plugin registration

Rust 1,112 49 Updated Mar 3, 2025

YosysHQ / eqy

Equivalence checking with Yosys

Python 43 7 Updated May 6, 2025

stepwise-alan / btor2llvm

A tool to convert btor2 files to LLVM.

Python 7 1 Updated Dec 29, 2020

google / souper

A superoptimizer for LLVM IR

C++ 2,233 175 Updated Aug 28, 2024

Froleyks / cerbotor

C++ 3 Updated Mar 4, 2025

MikePopoloski / slang

SystemVerilog compiler and language services

C++ 749 158 Updated May 25, 2025
