Microsoft Research
- https://hypjudy.github.io/website/
Highlights
- Pro
Stars
Official inference repo for FLUX.1 models
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
verl: Volcano Engine Reinforcement Learning for LLMs
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Minimal reproduction of DeepSeek R1-Zero
Open-Sora: Democratizing Efficient Video Production for All
A fork to add multimodal model training to open-r1
Fully open reproduction of DeepSeek-R1
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
A curated list of recent diffusion models for video generation, editing, and various other applications.
A high-throughput and memory-efficient inference and serving engine for LLMs
✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows
A generative world for general-purpose robotics & embodied AI learning.
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Playing Pokemon Red with Reinforcement Learning
A suite of image and video neural tokenizers
Get your documents ready for gen AI
This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.
Official inference framework for 1-bit LLMs
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Deezer source separation library including pretrained models.
Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Vary-tiny codebase upon LAVIS (for training from scratch) and a PDF image-text pairs data (about 600k including English/Chinese)
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.
The RedPajama-Data repository contains code for preparing large datasets for training large language models.