This is the PyTorch implementation of our paper at ECCV 2020 (Spotlight).
The repository mainly includes three parts: (1) RoI feature extraction; (2) training and inference; and (3) relation-aware trajectory generation.
Anaconda 3, Python 3.6.5, PyTorch 0.4.1 (a higher version is OK) and CUDA >= 9.0. For other libs, please refer to requirements.txt.
Please create an environment for this project using Anaconda 3 (install Anaconda first):
>conda create -n envname python=3.6.5 # Create
>conda activate envname # Enter
>pip install -r requirements.txt # Install the provided libs
>sh vRGV/lib/make.sh # Set up the environment for detection
Please download the data here. The folder [ground_data] should be placed in the same directory as vRGV [this project]. Please merge the downloaded vRGV folder with this repo.
Please download the raw videos here, and extract them into ground_data/vidvrd/JPEGImages/. The directory should look like: JPEGImages/ILSVRC2015_train_xxx/000000.JPEG
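If you want to sanity-check the extracted frames, a minimal sketch like the one below (not part of this repo; the root path simply follows the structure above) lists the video folders and confirms the first frame of each exists:

```python
# Minimal sanity check for the frame layout described above (assumption: the
# script is run from the directory that contains ground_data/).
import glob
import os

root = 'ground_data/vidvrd/JPEGImages'
videos = sorted(glob.glob(os.path.join(root, 'ILSVRC2015_train_*')))
print('Found {} video folders'.format(len(videos)))
for video in videos:
    first_frame = os.path.join(video, '000000.JPEG')
    if not os.path.exists(first_frame):
        print('Missing first frame in {}'.format(video))
```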
Feature extraction (needs about 100 GB of storage, because all detected bboxes are dumped along with their features. The storage can be greatly reduced by modifying detect_frame.py to return only the top-40 bboxes and save them in an HDF5 file via h5py.)
>./detection.sh
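If storage is a concern, the idea mentioned above can be sketched roughly as follows. This is not the repo's code: `frame_outputs`, the feature dimension, and the output file name are placeholders standing in for whatever detect_frame.py actually produces.

```python
# Hypothetical sketch of the storage-saving idea: keep only the top-40 boxes per
# frame and write them to a single HDF5 file instead of dumping every detection.
import h5py
import numpy as np

def keep_topk(scores, boxes, feats, k=40):
    # Keep the k highest-scoring detections of one frame.
    order = np.argsort(scores)[::-1][:k]
    return boxes[order], feats[order]

# Dummy detections standing in for the per-frame outputs of detect_frame.py.
frame_outputs = {
    '000000': (np.random.rand(200), np.random.rand(200, 4), np.random.rand(200, 2048)),
    '000001': (np.random.rand(180), np.random.rand(180, 4), np.random.rand(180, 2048)),
}

with h5py.File('frame_features.h5', 'w') as f:
    for frame_id, (scores, boxes, feats) in frame_outputs.items():
        boxes_k, feats_k = keep_topk(scores, boxes, feats)
        grp = f.create_group(frame_id)
        grp.create_dataset('bbox', data=boxes_k, compression='gzip')
        grp.create_dataset('feature', data=feats_k, compression='gzip')
```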
Train
>./ground.sh 0 train # Train the model with GPU id 0
Inference
>./ground.sh 0 val # Output the relation-aware spatio-temporal attention
>python generate_track_link.py # Generate relation-aware trajectories with the Viterbi algorithm (sketched below)
>python eval_ground.py # Evaluate the grounding performance
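For reference, below is a simplified, self-contained sketch of the Viterbi-style linking idea used for trajectory generation: dynamic programming over per-frame candidate boxes with an IoU smoothness term. It is only an illustration; `lam`, the toy inputs, and the function names are not the actual interface of generate_track_link.py.

```python
# Simplified illustration of linking per-frame candidate boxes into a trajectory:
# maximize per-box confidence plus an IoU-based smoothness term between adjacent
# frames with dynamic programming, then backtrack the best path.
import numpy as np

def iou(a, b):
    # IoU of two boxes in (x1, y1, x2, y2) format.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def link_trajectory(boxes, scores, lam=1.0):
    # boxes: list of (N_t, 4) arrays; scores: list of (N_t,) arrays, one per frame.
    T = len(boxes)
    dp = [scores[0].astype(float)]   # best accumulated score ending at each box
    back = []                        # backpointers for path recovery
    for t in range(1, T):
        # trans[i, j] = IoU between box j of frame t-1 and box i of frame t.
        trans = np.array([[iou(p, c) for p in boxes[t - 1]] for c in boxes[t]])
        total = dp[t - 1][None, :] + lam * trans
        back.append(total.argmax(axis=1))
        dp.append(scores[t] + total.max(axis=1))
    # Backtrack the highest-scoring path.
    path = [int(dp[-1].argmax())]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [boxes[t][path[t]] for t in range(T)]

# Toy usage: 3 frames, 2 candidate boxes per frame.
boxes = [np.array([[0, 0, 10, 10], [50, 50, 60, 60]], dtype=float)] * 3
scores = [np.array([0.9, 0.2])] * 3
print(link_trajectory(boxes, scores))
```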
| Query | bicycle-jump_beneath-person | person-feed-elephant | person-stand_above-bicycle | dog-watch-turtle |
|---|---|---|---|---|
| Result | | | | |

| Query | person-ride-horse | person-ride-bicycle | person-drive-car | bicycle-move_toward-car |
|---|---|---|---|---|
| Result | | | | |
If you find the code useful in your research, please kindly cite:
@inproceedings{junbin2020visual,
title={Visual Relation Grounding in Videos},
author={Xiao, Junbin and Shang, Xindi and Yang, Xun and Tang, Sheng and Chua, Tat-Seng},
booktitle={Proceedings of the 16th European Conference on Computer Vision (ECCV)},
year={2020}
}
NUS © NExT++