GAIR-NLP/PC-Agent-E
Efficient Agent Training for Computer Use

📄 Paper   |   🌐 Website   |   🤖 Model   |   🤗 Dataset


Demo

Check out our demo of PC Agent-E autonomously controlling a computer to complete tasks on Windows and Linux systems!

Windows.mp4
Linux.mp4

Introduction

We introduce PC Agent-E, an efficient agent training framework that elicits strong computer use capabilities with remarkable data efficiency. This framework is implemented with four key components:

  1. Trajectory Collection, gathering a small set of task trajectories from human annotators with PC Tracker;
  2. Thought Completion, reconstructing the latent human thought process before each action;
  3. Trajectory Boost, synthesizing diverse alternative action decisions;
  4. Agent Training, training a native agent model with the augmented trajectories.
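To make the data flow concrete, here is a minimal, self-contained sketch of steps 2 and 3 (Thought Completion and Trajectory Boost) with stubbed model calls. The `Step` fields and function names are illustrative, not the repo's actual interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    observation: str                  # e.g. a screenshot description
    action: str                       # the human annotator's ground-truth action
    thought: str = ""                 # filled in by Thought Completion
    alternatives: list = field(default_factory=list)  # filled in by Trajectory Boost

def complete_thought(step: Step) -> Step:
    # Stub: the real pipeline asks a strong model to reconstruct the latent
    # reasoning that led from `observation` to `action`.
    step.thought = f"To progress, I should {step.action}."
    return step

def boost(step: Step, n: int = 2) -> Step:
    # Stub: the real pipeline samples n diverse alternative action decisions
    # for the same observation.
    step.alternatives = [f"alternative {i} to: {step.action}" for i in range(n)]
    return step

trajectory = [Step("desktop with a browser icon", "double-click the browser icon")]
augmented = [boost(complete_thought(s)) for s in trajectory]
```

Each augmented step then carries the original action, a reconstructed thought, and several alternative decisions, which is what makes a small set of human trajectories go a long way.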


Main Results

Table: Success rate (%) of different models on WindowsAgentArena-V2, an improved benchmark that we also release.

| Models | LibreOffice | Chrome | Edge | System | VS Code | VLC | Utils | Total |
|---|---|---|---|---|---|---|---|---|
| Number of Tasks | 42 | 17 | 13 | 24 | 19 | 14 | 12 | 141 |
| Qwen2.5-VL-72B | 0.0 | 34.7 | 15.4 | 20.8 | 26.3 | 7.6 | 16.7 | 14.9 |
| UI-TARS-1.5-7B | 7.1 | 34.7 | 23.1 | 45.8 | 21.1 | 7.6 | 16.7 | 21.3 |
| UI-TARS-72B-DPO | 0.0 | 40.6 | 38.5 | 58.3 | 36.8 | 7.6 | 25.0 | 26.2 |
| Claude 3.7 Sonnet | 2.4 | 46.5 | 61.5 | 54.2 | 52.6 | 29.0 | 16.7 | 32.6 |
| Claude 3.7 Sonnet (thinking) | 2.4 | 64.1 | 46.2 | 66.7 | 52.6 | 21.9 | 25.0 | 35.4 |
| PC Agent-E (Ours) | 4.8 | 64.1 | 46.2 | 50.0 | 57.9 | 35.7 | 33.3 | 36.0 |

Quick Start

Trajectory Collection

Collect raw human trajectories with PC Tracker. See usage here.

Post Processing

To convert raw human trajectory into high-quality trajectories for training, follow these steps:

  1. Place the recorded trajectories in the data/ directory.
  2. Run post processing pipeline:
# Data refinement
python postprocess/refinement.py

# Thought completion and Trajectory Boost    
python postprocess/boost.py    

Note: You need to prepare your API key in advance.
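The post-processing scripts call an external model API, so credentials must be set before running them. A minimal sketch, assuming an OpenAI-compatible endpoint; the exact variable names the scripts expect may differ:

```shell
# Assumed variable names -- check the postprocess scripts for the exact ones.
export OPENAI_API_KEY="sk-your-key-here"
export OPENAI_BASE_URL="https://api.openai.com/v1"
```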

Agent Training

You can use our dataset or build your own with the steps above. To prepare data for agent training, put the dataset in the data/ directory and run:

python postprocess/prepare.py 
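prepare.py presumably emits conversations in a format LLaMA-Factory can consume. Here is a hypothetical sketch of such a conversion using the sharegpt-style schema LLaMA-Factory supports; the field layout is an assumption, not the repo's actual output format:

```python
import json

def to_sharegpt(trajectory):
    # Hypothetical schema: each step becomes one (user observation,
    # assistant thought + action) turn in a sharegpt-style record.
    conversations = []
    for step in trajectory:
        conversations.append({"from": "human", "value": step["observation"]})
        conversations.append({"from": "gpt",
                              "value": f"{step['thought']}\n{step['action']}"})
    return {"conversations": conversations}

traj = [{"observation": "screenshot of desktop",
         "thought": "I should open the browser.",
         "action": "double_click(browser_icon)"}]
print(json.dumps(to_sharegpt(traj), indent=2))
```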

We recommend using LLaMA-Factory for agent training. To launch distributed training across multiple nodes, you can run:

FORCE_TORCHRUN=1 NNODES=4 NODE_RANK=${PET_NODE_RANK} MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train train/sft.yaml

Set PET_NODE_RANK to the rank of the current node (from 0 to 3).
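For orientation, a LLaMA-Factory SFT config of the kind train/sft.yaml would contain looks roughly like this. All values below are illustrative assumptions, not the repo's actual settings; consult train/sft.yaml for the real ones:

```yaml
# Illustrative only -- see train/sft.yaml in the repo for the real values.
model_name_or_path: Qwen/Qwen2.5-VL-72B-Instruct
stage: sft
do_train: true
finetuning_type: full
dataset: pc_agent_e        # registered in LLaMA-Factory's dataset_info.json
template: qwen2_vl
cutoff_len: 8192
output_dir: saves/pc-agent-e
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3
bf16: true
```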

Agent Deployment

We provide a reference implementation of our PC Agent-E scaffold in the deploy/ directory. To deploy our agent on your computer, run:

python deploy/main.py

Reference scripts for model deployment can be found in scripts/server.sh.
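The scaffold in deploy/ boils down to an observe-decide-act loop. A minimal sketch with stubbed observation, model, and execution functions; all names here are illustrative, and deploy/main.py implements the real versions:

```python
# Minimal sketch of a computer-use agent loop (hypothetical interfaces).
def take_screenshot() -> str:
    return "screenshot of desktop"     # stub; real code captures the screen

def query_model(observation: str, history: list) -> str:
    return "click(100, 200)"           # stub; real code calls the served model

def execute(action: str) -> None:
    pass                               # stub; real code drives mouse/keyboard

def run_agent(max_steps: int = 3) -> list:
    history = []
    for _ in range(max_steps):
        obs = take_screenshot()
        action = query_model(obs, history)
        execute(action)
        history.append((obs, action))
        if action == "done()":         # stop when the model declares the task finished
            break
    return history
```

The served model (see scripts/server.sh) plays the `query_model` role, receiving the observation plus history and returning the next action.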

Acknowledgments

We would like to express our sincere gratitude to Shijie Xia for his meticulous review and constructive suggestions, which significantly improved the quality of this paper. This project is supported by SJTU SEIEE - ByteDance Large Language Model Joint Laboratory, SII.

Citation

If you find this work helpful, please consider citing:

@misc{he2025efficientagenttrainingcomputer,
      title={Efficient Agent Training for Computer Use}, 
      author={Yanheng He and Jiahe Jin and Pengfei Liu},
      year={2025},
      eprint={2505.13909},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.13909}, 
}
