- Peak Labs
- Boulder, Colorado
- https://magi.com
- @peakji
Highlights
- Pro
Stars
How to optimize some algorithms in CUDA.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Header-only C++/Python library for fast approximate nearest neighbors
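That description matches hnswlib; a minimal usage sketch under that assumption (the dimensionality and index parameters below are illustrative, not tuned):

```python
# Minimal sketch, assuming the library is hnswlib (pip install hnswlib).
import hnswlib
import numpy as np

dim = 128
data = np.random.rand(10_000, dim).astype(np.float32)

# Build an HNSW index; M and ef_construction trade build time for recall.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=len(data), ef_construction=200, M=16)
index.add_items(data, np.arange(len(data)))

# Query time: higher ef gives better recall at higher latency.
index.set_ef(50)
labels, distances = index.knn_query(data[:5], k=3)
print(labels.shape)  # (5, 3)
```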
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
OLMoE: Open Mixture-of-Experts Language Models
Developer-friendly, serverless vector database for AI applications. Easily add long-term memory to your LLM apps!
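The description matches LanceDB; a quick sketch assuming that library (the table name and records are made up for illustration):

```python
# Minimal sketch, assuming the database is LanceDB (pip install lancedb).
import lancedb

db = lancedb.connect("./lancedb-data")  # serverless: just a local directory

# Hypothetical table holding chat memories as vectors plus text.
table = db.create_table(
    "memories",
    data=[
        {"vector": [0.1, 0.2, 0.3], "text": "user prefers concise answers"},
        {"vector": [0.9, 0.8, 0.7], "text": "project deadline is Friday"},
    ],
)

# Nearest-neighbor search over the stored vectors.
results = table.search([0.1, 0.2, 0.25]).limit(1).to_list()
print(results[0]["text"])
```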
Everything we actually know about the Apple Neural Engine (ANE)
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
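A rough quantization sketch based on AutoAWQ's documented flow; the model path and quant_config values below are the commonly cited defaults, shown here as assumptions:

```python
# Minimal sketch of AWQ 4-bit quantization with AutoAWQ (pip install autoawq).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # assumption: any HF causal LM
quant_path = "mistral-7b-awq"

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# 4-bit weights with group size 128: the typical AWQ setting.
model.quantize(tokenizer, quant_config={
    "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"
})

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```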
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
A natural language interface for computers
Build real-time multimodal AI applications 🤖🎙️📹
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality
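The routing idea behind such frameworks is simple enough to sketch generically; everything below (function names, the threshold, the toy score) is hypothetical and not this framework's API:

```python
# Hypothetical sketch of cost-aware LLM routing; none of these names come
# from the framework above. A learned quality score decides whether a cheap
# model is likely good enough, falling back to a strong model otherwise.

def predicted_win_rate(prompt: str) -> float:
    """Stand-in for a learned router that scores how likely the weak
    model's answer will match the strong model's quality (0..1)."""
    return 0.3 if len(prompt) > 200 else 0.8  # toy heuristic

def route(prompt: str, threshold: float = 0.5) -> str:
    # Below the threshold, pay for the strong model; otherwise save cost.
    return "weak-model" if predicted_win_rate(prompt) >= threshold else "strong-model"

print(route("What is 2 + 2?"))               # weak-model
print(route("Prove the following claim. " * 20))  # strong-model
```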
A generative speech model for daily dialogue.
Minimal container for Chrome's headless shell, useful for automating / driving the web
OpenGFW is a flexible, easy-to-use, open source implementation of GFW (Great Firewall of China) on Linux
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
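The core of BPE fits in a few lines: repeatedly merge the most frequent adjacent token pair. A minimal sketch of that loop (not the repository's code):

```python
# Minimal sketch of the BPE training loop; not the repository's implementation.
from collections import Counter

def train_bpe(ids: list[int], num_merges: int) -> dict[tuple[int, int], int]:
    merges = {}
    next_id = 256  # byte-level tokens occupy 0..255
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]  # most frequent adjacent pair
        merges[pair] = next_id
        # Replace every occurrence of the pair with the new token id.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges

tokens = list("aaabdaaabac".encode("utf-8"))
print(train_bpe(tokens, 3))
```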
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Make images smaller using best-in-class codecs, right in the browser.
A blazing fast inference solution for text embeddings models
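This matches Hugging Face's text-embeddings-inference, which exposes an HTTP API; a client sketch assuming a local server on port 8080 (the port mapping used in its Docker quickstart):

```python
# Minimal client sketch, assuming a local text-embeddings-inference server
# (e.g. started via its Docker image) listening on 127.0.0.1:8080.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/embed",
    json={"inputs": "What is Deep Learning?"},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()[0]  # the /embed route returns a list of vectors
print(len(embedding))       # embedding dimensionality
```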
Retrieval and Retrieval-augmented LLMs
A tokenizer based on Unicode text segmentation (UAX #29), for Go. Split words, sentences and graphemes.
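That library is Go, but the grapheme part of UAX #29 has a rough Python analogue in the third-party regex module, whose \X matches one extended grapheme cluster (this is a substitute illustration, not the Go library's API):

```python
# Rough Python analogue of UAX #29 grapheme segmentation (pip install regex).
import regex

text = "e\u0301 \U0001F44D\U0001F3FD"  # 'é' as e + combining accent; thumbs-up + skin tone
graphemes = regex.findall(r"\X", text)
print(graphemes)  # each multi-codepoint cluster stays whole: ['é', ' ', '👍🏽']
```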