- Ph.D. student @ University of Adelaide
- Sydney, Australia
Starred repositories
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Official repository of T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
A Self-Training Framework for Vision-Language Reasoning
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, realistic, and adaptive scene generation for applications in…
Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward
This is the first paper to explore how to effectively use RL for MLLMs; it introduces Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning capability.
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
Large Language Model (LLM) Systems Paper List
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
GRPO Algorithm for Llava Architecture (Based on Verl); a sketch of the group-relative advantage GRPO computes follows this list
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
[LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18)
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Understanding R1-Zero-Like Training: A Critical Perspective
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"
Collections of Papers and Projects for Multimodal Reasoning.
Paper list for Efficient Reasoning.
Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification.
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
Official implementation of UnifiedReward & UnifiedReward-Think
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
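Several of the starred projects above (Video-R1, R1-VL, and the Verl-based Llava repo) build on Group Relative Policy Optimization (GRPO). As a minimal sketch of the shared core idea only, not any repository's implementation: GRPO samples a group of responses per prompt, scores each with a reward (often rule-based), and uses that reward standardized against the group's own mean and standard deviation as the advantage, dropping PPO's learned value function. The function name and toy rewards below are illustrative.

from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-6):
    # Group-relative advantage: each response's reward, standardized
    # against the other responses sampled for the same prompt.
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Example: four responses to one prompt, scored by a rule-based reward.
print(grpo_advantages([1.0, 0.0, 1.0, 0.5]))
# Above-average responses get positive advantages, below-average negative.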