Starred repositories
Official implementation of "WhisperNER: Unified Open Named Entity and Speech Recognition"
The Python library for real-time communication
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A fast static site generator in a single binary with everything built-in. https://www.getzola.org
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
A specialized RWKV-7 model for Othello (a.k.a. Reversi) that predicts legal moves, evaluates positions, and performs in-context search. Its performance scales with the number of test-time tokens.
Xournal++ is handwriting note-taking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input fr…
Python programs, usually short, of considerable difficulty, to perfect particular skills.
Bringing BERT into modernity via both architecture changes and scaling
Virtual whiteboard for sketching hand-drawn like diagrams
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Efficient Triton Kernels for LLM Training
Supercharge Your LLM Application Evaluations 🚀
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Reference implementation for DPO (Direct Preference Optimization)
Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024]