LiuShuoJiang (Shuojiang Liu) / Starred · GitHub

Starred repositories

LeetCode for PyTorch

Jupyter Notebook · 440 stars · 72 forks · Updated May 18, 2025

CUDA Templates for Linear Algebra Subroutines

C++ · 7,567 stars · 1,240 forks · Updated May 20, 2025

Solutions for Object Oriented Design Problems

590 stars · 125 forks · Updated Aug 28, 2022

Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, and Assembly, from numerics and SIMD to coroutines, ranges, exception handling, networking, and user-space IO

C++ · 1,772 stars · 68 forks · Updated May 19, 2025

Redis for LLMs

Python · 1,123 stars · 165 forks · Updated May 21, 2025

FULL v0, Cursor, Manus, Same.dev, Lovable, Devin, Replit Agent, Windsurf Agent, VSCode Agent, Dia Browser & Trae AI (And other Open Sourced) System Prompts, Tools & AI Models.

50,351 stars · 15,471 forks · Updated May 21, 2025

Safe Rust wrapper around the CUDA toolkit

Rust · 839 stars · 101 forks · Updated May 7, 2025

Trying to build a fully open-source ggml-hexagon backend for llama.cpp on Android phones equipped with Qualcomm's Hexagon NPU; details can be seen at https://github.com/zhouwg/ggml-hexagon/discussions/18

C++ · 20 stars · Updated May 21, 2025

LLM inference in C/C++

C++ · 41 stars · 4 forks · Updated May 16, 2025

My small collection of C++ utilities

C++ · 402 stars · 110 forks · Updated Apr 29, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust · 4,072 stars · 373 forks · Updated May 21, 2025

Asynchronous Low Latency C++ Logging Library

C++ · 2,193 stars · 208 forks · Updated May 21, 2025

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python · 1,251 stars · 188 forks · Updated May 21, 2025

Utilities intended for use with Llama models.

Python · 7,011 stars · 1,155 forks · Updated May 7, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,046 stars · 154 forks · Updated Jul 29, 2023

Learning how CUDA works

Cuda · 261 stars · 35 forks · Updated Mar 3, 2025

Verilog · 1,514 stars · 326 forks · Updated May 18, 2025

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

C++ · 7,371 stars · 469 forks · Updated May 21, 2025

Integrate the DeepSeek API into popular software

32,414 stars · 3,561 forks · Updated May 13, 2025

cuML - RAPIDS Machine Learning Library

C++ · 4,711 stars · 573 forks · Updated May 21, 2025

Real-time face swap and one-click video deepfake with only a single image

Python · 68,513 stars · 9,643 forks · Updated May 21, 2025

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Python · 28,192 stars · 1,763 forks · Updated Mar 21, 2025

Introduction to Machine Learning Systems

TeX · 1,842 stars · 216 forks · Updated May 21, 2025

Efficient Triton Kernels for LLM Training

Python · 5,042 stars · 326 forks · Updated May 21, 2025

Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥

Python · 39,113 stars · 3,064 forks · Updated May 21, 2025

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Python · 5,318 stars · 459 forks · Updated May 16, 2025

LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.

Python · 772 stars · 79 forks · Updated May 21, 2025

LLM theoretical performance analysis tools, supporting parameter, FLOPs, memory, and latency analysis.

Python · 89 stars · 6 forks · Updated May 21, 2025