[CVPR 2025] Official repository of "GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities".
[Project Page] [Paper] [Video]
Authors: Rao Fu* · Dingxi Zhang* · Alex Jiang · Wanjia Fu · Austin Funk · Daniel Ritchie · Srinath Sridhar
The demo data contains 5 motion sequences. The file directory looks like this:
demo_data/
├── hand_pose/
│   ├── p<participant id>-<scene>-<sequence id>/
│   │   ├── bboxes/             # bounding boxes for 2D keypoint tracking
│   │   ├── keypoints_2d/       # 2D hand keypoints
│   │   ├── keypoints_3d/       # 3D hand keypoints (triangulated from multi-view 2D keypoints)
│   │   ├── keypoints_3d_mano/  # 3D hand keypoints (extracted from MANO params and normalized; smoother)
│   │   ├── mano_vid/           # visualizations of MANO parameters
│   │   ├── params/             # MANO parameters
│   │   ├── rgb_vid/            # raw multi-view videos
│   │   │   ├── brics-odrind-<camera id>-camx
│   │   │   │   ├── xxx.mp4
│   │   │   │   └── xxx.txt
│   │   │   └── ...
│   │   ├── repro_2d_vid/       # visualizations of 2D hand keypoints
│   │   ├── repro_3d_vid/       # visualizations of 3D hand keypoints
│   │   └── optim_params.txt    # camera parameters
│   └── ...
└── object_pose/
    ├── p<participant id>-<scene>-<sequence id>/
    │   ├── mesh            # reconstructed object mesh
    │   ├── pose            # object pose
    │   ├── render          # visualizations of object pose
    │   └── segmentation    # segmented object frames
    └── ...
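As a quick way to explore the demo layout, the sketch below walks demo_data/hand_pose/ and lists the annotation folders available for each sequence. It is a minimal sketch that only relies on the directory names shown above; the per-file formats inside each folder are not assumed here.

```python
from pathlib import Path

demo_root = Path("demo_data")

# Each sequence folder is named p<participant id>-<scene>-<sequence id>.
for seq_dir in sorted((demo_root / "hand_pose").glob("p*")):
    if not seq_dir.is_dir():
        continue
    # Annotation folders available for this sequence (bboxes, keypoints_3d, params, ...);
    # loose files such as optim_params.txt are skipped here.
    available = sorted(d.name for d in seq_dir.iterdir() if d.is_dir())
    print(seq_dir.name, "->", ", ".join(available))
```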
We store our dataset on Globus. You can download a demo sequence from here, all annotations from here, and access the raw data here.
[2025/05/23] For object poses, access our Globus repository here. Download each .tar.gz separately (each file contains 1000 motion sequences).
[2025/04/30] For multi-view RGB videos, access our Globus repository here. Download each .tar.gz separately (each file contains 10 views; 51 camera views in total).
[2025/04/02] We are pleased to release our full hand pose dataset, available for download here (including all keypoints_3d, keypoints_3d_mano, and params).
Complete text annotations are available here. We used the rewritten_annotation for model training.
More data coming soon! 🔜
The dataset directory should look like this:
./dataset/GigaHands/
├── hand_poses/
│   └── p<participant id>-<scene>/
│       ├── keypoints_3d/       # 3D hand keypoints (triangulated from multi-view 2D keypoints)
│       ├── keypoints_3d_mano/  # 3D hand keypoints (extracted from MANO params and normalized; smoother)
│       └── params/             # MANO parameters
├── object_poses/
│   └── <object name>
│       └── p<participant id>-<scene>_<sequence id>/
│           └── pose            # object 6DoF poses
└── annotations_v2.jsonl        # text annotations
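The text annotations ship as a JSONL file (one JSON object per line). Below is a minimal reading sketch; the exact field names are not documented here, so treat the rewritten_annotation key (mentioned above as the variant used for training) and any others as assumptions to verify against the downloaded file.

```python
import json
from pathlib import Path

anno_path = Path("./dataset/GigaHands/annotations_v2.jsonl")

# JSONL: one JSON object per line.
annotations = []
with anno_path.open("r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            annotations.append(json.loads(line))

print(f"loaded {len(annotations)} annotation records")
# The keys below are assumptions -- inspect one record to confirm the actual schema.
example = annotations[0]
print(sorted(example.keys()))
print(example.get("rewritten_annotation"))  # the text variant the authors say they used for training
```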
This code requires:
- Python 3.8
- conda (Anaconda3 or Miniconda3)
- A CUDA-capable GPU (one is enough)
- Create a virtual environment and install the necessary dependencies (a quick environment sanity check is sketched at the end of this setup section):
conda create -n gigahands python==3.8
conda activate gigahands
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge ffmpeg
pip install -r requirements.txt
- Install EasyMocap
cd third-party/EasyMocap
python setup.py develop
- Download the MANO models and place the MANO_*.pkl files under body_models/smplh.
- Download the pretrained models by running bash dataset/download_pretrained_models.sh; the resulting directory should look like this:
./checkpoints/GigaHands/
./checkpoints/GigaHands/GPT/ # Text-to-motion generation model
./checkpoints/GigaHands/VQVAE/ # Motion autoencoder
./checkpoints/GigaHands/text_mot_match/ # Motion & Text feature extractors for evaluation
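As a quick sanity check of the environment created above (a minimal sketch; it only assumes the conda install commands from this section), you can verify that PyTorch and CUDA are visible:

```python
# Run inside the activated environment: conda activate gigahands
import torch
import torchvision

print("torch:", torch.__version__)              # expected 2.2.0 per the install command above
print("torchvision:", torchvision.__version__)  # expected 0.17.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```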
After downloading all hand pose annotations, run the script below to visualize them.
python visualize_hands.py
You will see videos of the MANO render results and reprojected keypoints in the visualizations directory.
Sampling results from customized descriptions:
python gen_motion_custom.py --resume-pth ./checkpoints/GigaHands/VQVAE/net_last.pth --resume-trans ./checkpoints/GigaHands/GPT/net_best_fid.pth --input-text ./input.txt
The results are saved in the output folder.
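The expected layout of input.txt is not documented here; below is a minimal sketch, assuming one free-form bimanual activity description per line (check gen_motion_custom.py for the format it actually parses):

```python
# Hypothetical example: write two custom descriptions to input.txt, one per line.
# The one-prompt-per-line format is an assumption, not confirmed by this README.
prompts = [
    "a person claps both hands twice",
    "the left hand holds a bowl while the right hand stirs with a spoon",
]
with open("input.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts) + "\n")
```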
Training motion VQ-VAE:
python3 train_vq_hand.py \
--batch-size 256 \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 512 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname GigaHands \
--vq-act relu \
--quantizer ema_reset \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name VQVAE \
--window-size 128
Training T2M GPT model:
python3 train_t2m_trans_hand.py \
--exp-name GPT \
--batch-size 128 \
--num-layers 9 \
--embed-dim-gpt 1024 \
--nb-code 512 \
--n-head-gpt 16 \
--block-size 51 \
--ff-rate 4 \
--drop-out-rate 0.1 \
--resume-pth output/VQVAE/net_last.pth \
--vq-name VQVAE \
--out-dir output \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--dataname GigaHands \
--down-t 2 \
--depth 3 \
--quantizer ema_reset \
--eval-iter 10000 \
--pkeep 0.5 \
--dilation-growth-rate 3 \
--vq-act relu
- Release demo data
- Release hand pose data
- Release multi-view video data
- Release object pose data (13k) and meshes
- Release inference code for text-to-motion task
- Release training code for text-to-motion task
We appreciate help from:
- Public code such as EasyMocap, text-to-motion, TM2T, MDM, and T2M-GPT.
- This research was supported by AFOSR grant FA9550-21-1-0214, NSF CAREER grant #2143576, and ONR DURIP grant N00014-23-1-2804. We would like to thank the OpenAI Research Access Program for API support and extend our gratitude to Ellie Pavlick, Tianran Zhang, Carmen Yu, Angela Xing, Chandradeep Pokhariya, Sudarshan Harithas, Hongyu Li, Chaerin Min, Xindi Qu, Xiaoquan Liu, Hao Sun, Melvin He and Brandon Woodard.
This dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/4.0/.
If you find our work useful in your research, please consider citing:
@article{fu2024gigahands,
  title={GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities},
  author={Fu, Rao and Zhang, Dingxi and Jiang, Alex and Fu, Wanjia and Funk, Austin and Ritchie, Daniel and Sridhar, Srinath},
  journal={arXiv preprint arXiv:2412.04244},
  year={2024}
}