- Fudan University
- Shanghai, China
- https://xinyu1205.github.io
Stars
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
Official inference repo for FLUX.1 models
This repo contains the Python code as well as the webpage HTML files for the Spice-E project from VAILab at TAU.
Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Official ComfyUI workflow release! Only 4GB VRAM is enou…
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…
High-Resolution Image Synthesis with Latent Diffusion Models
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Understanding R1-Zero-Like Training: A Critical Perspective
An Open-source RL System from ByteDance Seed and Tsinghua AIR
No fortress, purely open ground. OpenManus is Coming.
verl: Volcano Engine Reinforcement Learning for LLMs
Explore the Multimodal “Aha Moment” on 2B Model
Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Solve Visual Understanding with Reinforced VLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
A very simple GRPO implementation for reproducing R1-like LLM thinking.
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.