This is the PyTorch implementation of our paper at ECCV 2020 (Spotlight).
The repository mainly includes three parts: (1) RoI feature extraction; (2) training and inference; and (3) relation-aware trajectory generation.
Anaconda 3, Python 3.6.5, PyTorch 0.4.1 (a higher version is OK) and CUDA >= 9.0. For other libs, please refer to requirements.txt.
Please create an environment for this project using Anaconda 3 (install Anaconda first):
>conda create -n envname python=3.6.5 # Create
>conda activate envname # Enter
>pip install -r requirements.txt # Install the provided libs
>sh vRGV/lib/make.sh # Set up the environment for detection
Please download the data here. The folder [ground_data] should be placed in the same directory as vRGV [this project]. Please merge the downloaded vRGV folder with this repo.
Please download the raw videos here, and extract them into ground_data/vidvrd/JPEGImages/. The directory should look like: JPEGImages/ILSVRC2015_train_xxx/000000.JPEG
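If you want to sanity-check the extracted frames, a minimal sketch like the one below (not part of this repo; the root path simply follows the structure above) lists the video folders and confirms the first frame of each exists:

```python
# Minimal sanity check for the frame layout described above (assumption: the
# script is run from the directory that contains ground_data/).
import glob
import os

root = 'ground_data/vidvrd/JPEGImages'
videos = sorted(glob.glob(os.path.join(root, 'ILSVRC2015_train_*')))
print('Found {} video folders'.format(len(videos)))
for video in videos:
    first_frame = os.path.join(video, '000000.JPEG')
    if not os.path.exists(first_frame):
        print('Missing first frame in {}'.format(video))
```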
Feature extraction (needs about 100 GB of storage, because all detected bboxes are dumped along with their features. The storage can be greatly reduced by modifying detect_frame.py to return only the top-40 bboxes and save them in an HDF5 file via h5py.)
>./detection.sh
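If storage is a concern, the idea mentioned above can be sketched roughly as follows. This is not the repo's code: `frame_outputs`, the feature dimension, and the output file name are placeholders standing in for whatever detect_frame.py actually produces.

```python
# Hypothetical sketch of the storage-saving idea: keep only the top-40 boxes per
# frame and write them to a single HDF5 file instead of dumping every detection.
import h5py
import numpy as np

def keep_topk(scores, boxes, feats, k=40):
    # Keep the k highest-scoring detections of one frame.
    order = np.argsort(scores)[::-1][:k]
    return boxes[order], feats[order]

# Dummy detections standing in for the per-frame outputs of detect_frame.py.
frame_outputs = {
    '000000': (np.random.rand(200), np.random.rand(200, 4), np.random.rand(200, 2048)),
    '000001': (np.random.rand(180), np.random.rand(180, 4), np.random.rand(180, 2048)),
}

with h5py.File('frame_features.h5', 'w') as f:
    for frame_id, (scores, boxes, feats) in frame_outputs.items():
        boxes_k, feats_k = keep_topk(scores, boxes, feats)
        grp = f.create_group(frame_id)
        grp.create_dataset('bbox', data=boxes_k, compression='gzip')
        grp.create_dataset('feature', data=feats_k, compression='gzip')
```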
Train
>./ground.sh 0 train # Train the model with GPU id 0
Inference
>./ground.sh 0 val # Output the relation-aware spatio-temporal attention
>python generate_track_link.py # Generate relation-aware trajectories with the Viterbi algorithm (sketched below)
>python eval_ground.py # Evaluate the grounding performance
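For reference, below is a simplified, self-contained sketch of the Viterbi-style linking idea used for trajectory generation: dynamic programming over per-frame candidate boxes with an IoU smoothness term. It is only an illustration; `lam`, the toy inputs, and the function names are not the actual interface of generate_track_link.py.

```python
# Simplified illustration of linking per-frame candidate boxes into a trajectory:
# maximize per-box confidence plus an IoU-based smoothness term between adjacent
# frames with dynamic programming, then backtrack the best path.
import numpy as np

def iou(a, b):
    # IoU of two boxes in (x1, y1, x2, y2) format.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def link_trajectory(boxes, scores, lam=1.0):
    # boxes: list of (N_t, 4) arrays; scores: list of (N_t,) arrays, one per frame.
    T = len(boxes)
    dp = [scores[0].astype(float)]   # best accumulated score ending at each box
    back = []                        # backpointers for path recovery
    for t in range(1, T):
        # trans[i, j] = IoU between box j of frame t-1 and box i of frame t.
        trans = np.array([[iou(p, c) for p in boxes[t - 1]] for c in boxes[t]])
        total = dp[t - 1][None, :] + lam * trans
        back.append(total.argmax(axis=1))
        dp.append(scores[t] + total.max(axis=1))
    # Backtrack the highest-scoring path.
    path = [int(dp[-1].argmax())]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [boxes[t][path[t]] for t in range(T)]

# Toy usage: 3 frames, 2 candidate boxes per frame.
boxes = [np.array([[0, 0, 10, 10], [50, 50, 60, 60]], dtype=float)] * 3
scores = [np.array([0.9, 0.2])] * 3
print(link_trajectory(boxes, scores))
```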
| Query | bicycle-jump_beneath-person | person-feed-elephant | person-stand_above-bicycle | dog-watch-turtle |
|---|---|---|---|---|
| Result | | | | |

| Query | person-ride-horse | person-ride-bicycle | person-drive-car | bicycle-move_toward-car |
|---|---|---|---|---|
| Result | | | | |
If you find the code useful in your research, please kindly cite:
@inproceedings{junbin2020visual,
title={Visual Relation Grounding in Videos},
author={Xiao, Junbin and Shang, Xindi and Yang, Xun and Tang, Sheng and Chua, Tat-Seng},
booktitle={Proceedings of the 16th European Conference on Computer Vision (ECCV)},
year={2020}
}
NUS © NExT++