Stars
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". An efficient, high-quality synthetic data generation pipeline!
Best practices & guides on how to write distributed PyTorch training code
Triton implementation of FlashAttention2 with support for custom masks.
[MalayMMLU] This is the first-ever Bahasa Melayu multitask benchmark designed to elevate the performance of Large Language Models (LLMs) and Large Vision Language Models (LVLMs).
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
This is Sarawak Malay speech and text data for speech technology research. The data was collected by the Faculty of Computer Science and Information Technology, Universiti Malaysia Sar…
Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
WebLINX is a benchmark for building web navigation agents with conversational capabilities
A small automatic differentiation engine, supporting higher-order derivatives
A translation app for Bahasa Melayu to English with support for Manglish and bahasa pasar.
A pipeline for VITS fine-tuning that enables fast speaker-adaptation TTS and many-to-many voice conversion
AI-powered speech denoising and enhancement
HTTP/WebSocket proxy for starlette/FastAPI
Machine Learning Engineering Open Book
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
PyTorch implementation of VALL-E (zero-shot text-to-speech), with a reproduced demo: https://lifeiteng.github.io/valle/index.html
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Fast inference engine for Transformer models
Fast & Simple repository for pre-training and fine-tuning T5-style models
A Data Streaming Library for Efficient Neural Network Training
A family of diffusion models for text-to-audio generation.
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
🔊 Text-Prompted Generative Audio Model