📄 Paper | 🌐 Website | 🤖 Model | 🤗 Dataset
Check out our demo of PC Agent-E autonomously controlling a computer to complete tasks on Windows and Linux systems!
Windows.mp4
Linux.mp4
We introduce PC Agent-E, an efficient agent training framework that elicits strong computer use capabilities with remarkable data efficiency. This framework is implemented with four key components:
- Trajectory Collection, gathering a small set of task trajectories from human annotators with PC Tracker;
- Thought Completion, reconstructing the latent human thought process before each action;
- Trajectory Boost, synthesizing diverse alternative action decisions;
- Agent Training, training native agent model with augmented trajectories.
Table: Results of successful rate (%) for different models on WindowsAgentArena-V2, an improved benchmark we also released.
Models | LibreOffice | Chrome | Edge | System | VS Code | VLC | Utils | Total |
---|---|---|---|---|---|---|---|---|
Number of Tasks | 42 | 17 | 13 | 24 | 19 | 14 | 12 | 141 |
Qwen2.5-VL-72B | 0.0 | 34.7 | 15.4 | 20.8 | 26.3 | 7.6 | 16.7 | 14.9 |
UI-TARS-1.5-7B | 7.1 | 34.7 | 23.1 | 45.8 | 21.1 | 7.6 | 16.7 | 21.3 |
UI-TARS-72B-DPO | 0.0 | 40.6 | 38.5 | 58.3 | 36.8 | 7.6 | 25.0 | 26.2 |
Claude 3.7 Sonnet | 2.4 | 46.5 | 61.5 | 54.2 | 52.6 | 29.0 | 16.7 | 32.6 |
Claude 3.7 Sonnet (thinking) | 2.4 | 64.1 | 46.2 | 66.7 | 52.6 | 21.9 | 25.0 | 35.4 |
PC Agent-E (Ours) | 4.8 | 64.1 | 46.2 | 50.0 | 57.9 | 35.7 | 33.3 | 36.0 |
Collect raw human trajectory with PC Tracker. See usage here.
To convert raw human trajectory into high-quality trajectories for training, follow these steps:
- Place recorded in the
data/
directory. - Run post processing pipeline:
# Data refinement
python postprocess/refinement.py
# Thought completion and Trajectory Boost
python postprocess/boost.py
Note: You need to prepare your API key in advance.
You can use our dataset or build data set with above steps on your own. To prepare data for agent training, put the dataset in the data/
directory, and run:
python postprocess/prepare.py
We recommend using LLaMA-Factory for agent training. To launch distributed training across multiple nodes, you can run:
FORCE_TORCHRUN=1 NNODES=4 NODE_RANK=${PET_NODE_RANK} MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train train/sft.yaml
Replace PET_NODE_RANK with the rank of the current node (from 0 to 3).
We provide a reference implementation of our PC Agent-E scaffold in the deploy/
directory. To deploy our agent on your computer, run:
python deploy/main.py
Reference scripts for model deployment can be found in scripts/server.sh
.
We would like to express our sincere gratitude to Shijie Xia for his meticulous review and constructive suggestions, which significantly improved the quality of this paper. This project is supported by SJTU SEIEE - ByteDance Large Language Model Joint Laboratory, SII.
If you find this work helpful, please consider citing:
@misc{he2025efficientagenttrainingcomputer,
title={Efficient Agent Training for Computer Use},
author={Yanheng He and Jiahe Jin and Pengfei Liu},
year={2025},
eprint={2505.13909},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2505.13909},
}