Stars
Lightning-fast C++/CUDA neural network framework
AivisSpeech: AI Voice Imitation System - Text to Speech Software
This repo contains CUDA-Q Academic materials, including self-paced Jupyter notebook modules for building and optimizing hybrid quantum-classical algorithms using CUDA-Q.
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification.
Parlant is the open-source conversation modeling engine for building better, deliberate Agentic UX. It gives you the power of LLMs without the unpredictability.
This project integrates a custom CUDA-based matrix multiplication kernel into a PyTorch deep learning model, leveraging GPU acceleration for matrix operations. The goal is to compare the performanc…
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Code for the paper "Language Models are Unsupervised Multitask Learners"
An extension library of WMMA API (Tensor Core API)
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
Samples for CUDA developers demonstrating features in the CUDA Toolkit
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
The homepage of the OneBit model quantization framework.
Witness the aha moment of VLM with less than $3.
Library for fast text representation and classification.
Fully open reproduction of DeepSeek-R1
EvaByte: Efficient Byte-level Language Models at Scale
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
Scalable RL solution for advanced reasoning of language models
A collection of advanced CSS styles to create realistic-looking effects for the faces of Pokemon cards.
Explore training for quantized models