Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO

C++ 1,772 68 Updated May 19, 2025

LMCache / LMCache

Redis for LLMs

Python 1,123 165 Updated May 21, 2025

x1xhlol / system-prompts-and-models-of-ai-tools

FULL v0, Cursor, Manus, Same.dev, Lovable, Devin, Replit Agent, Windsurf Agent, VSCode Agent, Dia Browser & Trae AI (And other Open Sourced) System Prompts, Tools & AI Models.

50,351 15,471 Updated May 21, 2025

coreylowman / cudarc

Safe Rust wrapper around the CUDA toolkit

Rust 839 101 Updated May 7, 2025

zhouwg / ggml-hexagon

Forked from ggml-org/llama.cpp

An attempt to build a fully open-source ggml-hexagon backend for llama.cpp on Android phones equipped with Qualcomm's Hexagon NPU; details are at https://github.com/zhouwg/ggml-hexagon/discussions/18

C++ 20 Updated May 21, 2025

chraac / llama.cpp

Forked from ggml-org/llama.cpp

LLM inference in C/C++

C++ 41 4 Updated May 16, 2025

adah1972 / nvwa

My small collection of C++ utilities

C++ 402 110 Updated Apr 29, 2025

ai-dynamo / dynamo

A Datacenter Scale Distributed Inference Serving Framework

Rust 4,072 373 Updated May 21, 2025

odygrd / quill

Asynchronous Low Latency C++ Logging Library

C++ 2,193 208 Updated May 21, 2025

vllm-project / production-stack

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 1,251 188 Updated May 21, 2025

ImagineAILab / ai-by-hand-excel

4,759 580 Updated Jan 28, 2025

meta-llama / llama-models

Utilities intended for use with Llama models.

Python 7,011 1,155 Updated May 7, 2025

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,046 154 Updated Jul 29, 2023