Cescfangs (Cesc) / Starred · GitHub

Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"

Python · 807 stars · 85 forks · Updated Mar 4, 2024

A research prototype of a human-centered web agent

Python · 5,000 stars · 469 forks · Updated Jun 6, 2025

Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

Python · 73 stars · 5 forks · Updated May 29, 2025

All-in-one Web Agent framework for post-training. Start building with a few clicks!

Python · 255 stars · 18 forks · Updated Jun 4, 2025

The development and future prospects of multimodal reasoning models.

341 stars · 14 forks · Updated May 29, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook · 1,168 stars · 43 forks · Updated May 21, 2025

ACE-Step: A Step Towards Music Generation Foundation Model

Python · 2,390 stars · 232 forks · Updated Jun 4, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 9,109 stars · 1,174 forks · Updated Jun 7, 2025

Python · 3,652 stars · 360 forks · Updated May 13, 2025

R1-like Computer-use Agent

Python · 74 stars · 3 forks · Updated Mar 21, 2025

Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"

Python · 108 stars · 7 forks · Updated May 26, 2025

Agent S: an open agentic framework that uses computers like a human

Python · 5,373 stars · 541 forks · Updated Jun 6, 2025

This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning ca…

Python · 590 stars · 13 forks · Updated May 7, 2025

A simple screen-parsing tool for purely vision-based GUI agents

Jupyter Notebook · 22,362 stars · 1,880 forks · Updated Mar 26, 2025

Explore the multimodal “aha moment” on a 2B model

Python · 592 stars · 20 forks · Updated Mar 18, 2025

A mini, open-weights version of our Proxy assistant.

Python · 924 stars · 135 forks · Updated Feb 26, 2025

[ICLR 2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Python · 32 stars · Updated Feb 21, 2025

Python · 6,345 stars · 419 forks · Updated May 21, 2025

This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.

374 stars · 14 forks · Updated Jun 4, 2025

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Python · 62,577 stars · 7,052 forks · Updated Jun 7, 2025

⚡️HivisionIDPhotos: a lightweight and efficient AI tool for making ID photos. A lightweight AI algorithm for ID photo production.

Python · 17,473 stars · 1,889 forks · Updated Apr 4, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python · 7,387 stars · 655 forks · Updated May 31, 2024

A multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python · 14,379 stars · 1,504 forks · Updated Jun 2, 2025

English pronunciation correction teacher built with Gemini

Python · 1,060 stars · 132 forks · Updated Jan 16, 2025

[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

Python · 1,695 stars · 124 forks · Updated Jul 10, 2024

Structured Text Generation

Python · 11,719 stars · 596 forks · Updated Jun 6, 2025

Enforce the output format (JSON Schema, regex, etc.) of a language model

Python · 1,817 stars · 78 forks · Updated Feb 26, 2025

Retrieval and Retrieval-augmented LLMs

Python · 9,859 stars · 723 forks · Updated Jun 4, 2025

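Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats, updated weekly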

CSS · 24,459 stars · 1,982 forks · Updated Jun 6, 2025