Starred repositories
Production-ready Inference, Ingestion and Indexing built in Rust 🦀
A high-throughput and memory-efficient inference and serving engine for LLMs
🦛 CHONK your texts with Chonkie ✨ — The no-nonsense RAG chunking library
🌱 EcoLogits tracks the energy consumption and environmental footprint of using generative AI models through APIs.
Plug-and-play document processing pipelines with zero-shot models.
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Tools for merging pretrained large language models.
State-of-the-Art Text Embeddings
Simple, safe way to store and distribute tensors
pyright fork with various type checking improvements, improved vscode support and pylance features built into the language server
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
A Python library for calculating a large variety of metrics from text
A fast implementation of Aho-Corasick in Rust.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Integrate Git version control with automatic commit-and-sync and other advanced features in Obsidian.md
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
👽 Out-of-Distribution Detection with PyTorch
Faker is a Python package that generates fake data for you.
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
Generalist and Lightweight Model for Text Classification