Stars
🎦 Video comparison player for Mac and Windows, built using Electron
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Implementation of popular deep learning networks with TensorRT network definition API
Repository for the book "Crafting Interpreters"
Various translations of OSTEP can be found here. Help the cause and contribute!
One second to read GitHub code with VS Code.
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Winograd minimal convolution algorithm generator for convolutional neural networks.