aksarben09's list / llm · GitHub
Stars

llm

Large Language Models
12 repositories

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda 249 17 Updated Oct 28, 2024

Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI

Python 281 21 Updated Mar 19, 2025

This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.

Python 102 5 Updated Jul 1, 2024

Reverse Engineering: Decompiling Binary Code with Large Language Models

Python 5,604 379 Updated May 23, 2025

Official inference framework for 1-bit LLMs

Python 19,814 1,480 Updated May 23, 2025
Jupyter Notebook 90 16 Updated Dec 23, 2024

Run PyTorch LLMs locally on servers, desktop and mobile

Python 3,583 249 Updated May 20, 2025

FlashAttention (Metal Port)

Swift 486 27 Updated Sep 22, 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Python 107 15 Updated Oct 15, 2024

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,435 548 Updated May 22, 2025

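fastllm is a high-performance LLM inference library implemented in C++ with no backend dependencies (it depends only on CUDA and does not require PyTorch). It can run the DeepSeek R1 671B INT4 model on a single 4090, reaching 20+ tps at single concurrency.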

C++ 3,574 366 Updated May 20, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,553 1,448 Updated May 25, 2025