Starred repositories
Awesome Unified Multimodal Models
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
Official PyTorch implementation for "Large Language Diffusion Models"
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
[ICML'25] The PyTorch implementation of paper: "AdaWorld: Learning Adaptable World Models with Latent Actions".
The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
Wan: Open and Advanced Large-Scale Video Generative Models
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
🚀🚀🚀 A curated list of papers on controllable video generation.
[CSUR] A Survey on Video Diffusion Models
A collection of papers on world models for autonomous driving (and robotics).
The repository for paper 'Task-Oriented Communications for Visual Navigation with Edge-Aerial Collaboration in Low Altitude Economy'.
This is the official implementation of "CP-Guard: Malicious agent detection and defense in collaborative bird's eye view segmentation"
A comprehensive list of excellent research papers, models, datasets, and other resources on Vision-Language-Action (VLA) models in robotics.
A modern, responsive academic personal website.
All you need for Multi-Agent Autonomous Driving (MAAD)
Recent multi-robot projects and papers, including SLAM, place recognition, and Large Language Model navigation. (continually updated)
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
A generalized framework for subspace tuning methods in parameter efficient fine-tuning.