We present a cost-effective pretraining paradigm for VLA models using only synthetic data, achieving direct sim-to-real transfer and strong zero-shot generalizability for robotic grasping. Key contributions include:
- SynGrasp-1B: a billion-frame synthetic grasping dataset spanning 240 object categories and 10,000+ objects.
- GraspVLA: a VLA model pretrained on SynGrasp-1B that achieves zero-shot generalization to real-world grasping without fine-tuning.
- Unified CoT Framework: GraspVLA integrates autoregressive perception and flow-matching-based action generation into a single reasoning process, enabling joint training on synthetic action data and internet-scale semantic data for open-vocabulary grasping (see the sketch after this list).
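
To make the unified CoT idea concrete, below is a minimal sketch (not the released GraspVLA code) of how a flow-matching action head could be conditioned on perception tokens produced by an autoregressive backbone. All module names, dimensions, and the simple MLP/Euler-sampler design are illustrative assumptions, not the actual architecture.

```python
# Hypothetical sketch of flow-matching action generation conditioned on
# perception context; module names and sizes are illustrative only.
import torch
import torch.nn as nn


class FlowMatchingActionHead(nn.Module):
    """Predicts the flow-matching velocity field v(a_t, t | context)."""

    def __init__(self, action_dim=7, horizon=16, ctx_dim=512, hidden=512):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * action_dim + ctx_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, horizon * action_dim),
        )

    def forward(self, noisy_actions, t, context):
        # noisy_actions: (B, horizon, action_dim), t: (B,), context: (B, ctx_dim)
        x = torch.cat([noisy_actions.flatten(1), context, t.unsqueeze(-1)], dim=-1)
        return self.net(x).view(-1, self.horizon, self.action_dim)

    @torch.no_grad()
    def sample(self, context, steps=10):
        # Euler integration of the learned velocity field from noise (t=0)
        # to an action chunk (t=1), rectified-flow style.
        b, device = context.shape[0], context.device
        a = torch.randn(b, self.horizon, self.action_dim, device=device)
        dt = 1.0 / steps
        for i in range(steps):
            t = torch.full((b,), i * dt, device=device)
            a = a + dt * self.forward(a, t, context)
        return a


def flow_matching_loss(head, actions, context):
    # Conditional flow matching: interpolate between noise and ground-truth
    # actions, and regress the constant velocity (actions - noise).
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], device=actions.device)
    a_t = (1 - t.view(-1, 1, 1)) * noise + t.view(-1, 1, 1) * actions
    v_pred = head(a_t, t, context)
    return ((v_pred - (actions - noise)) ** 2).mean()
```

In this kind of setup, only the action head needs action-labeled (synthetic) data, while the perception tokens that form its context can also be supervised on internet-scale semantic data, which is one plausible reading of how the joint training described above can work.
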
TODO List:
- Release the supplementary material
- Release model weights
- Release SynGrasp-1B dataset