Stars
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
A research prototype of a human-centered web agent
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
All-in-one Web Agent framework for post-training. Start building with a few clicks!
The development and future prospects of multimodal reasoning models.
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
ACE-Step: A Step Towards Music Generation Foundation Model
verl: Volcano Engine Reinforcement Learning for LLMs
Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
Agent S: an open agentic framework that uses computers like a human
This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning ca…
A simple screen-parsing tool towards a pure vision-based GUI agent
Explore the Multimodal “Aha Moment” on 2B Model
A mini, open-weights version of our Proxy assistant.
[ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
⚡️HivisionIDPhotos: a lightweight and efficient AI tool and algorithm for making ID photos.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
English pronunciation correction teacher built with Gemini
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
Enforce the output format (JSON Schema, Regex etc) of a language model
Retrieval and Retrieval-augmented LLMs
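Free downloads of English-language magazines, including The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats, updated weekly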