
Visual Relation Grounding in Videos

This is the PyTorch implementation of our paper at ECCV 2020 (Spotlight). The repository mainly includes three parts: (1) RoI feature extraction; (2) training and inference; and (3) generation of relation-aware trajectories.

Environment

Anaconda 3, Python 3.6.5, PyTorch 0.4.1 (a higher version is OK), and CUDA >= 9.0. For other libraries, please refer to requirements.txt.

Install

Please create an environment for this project using Anaconda 3 (install Anaconda first):

>conda create -n envname python=3.6.5 # Create
>conda activate envname # Enter
>pip install -r requirements.txt # Install the provided libs
>sh vRGV/lib/make.sh # Set the environment for detection

Data Preparation

Please download the data here. The folder ground_data should be placed in the same directory as vRGV (this project). Please merge the downloaded vRGV folder with this repo.

Please download the raw videos here and extract them into ground_data/vidvrd/JPEGImages/. The directory should look like JPEGImages/ILSVRC2015_train_xxx/000000.JPEG.
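For a quick sanity check of the layout above, a small script like the following can be run from inside the vRGV folder. This is only a convenience sketch; the relative ROOT path assumes ground_data sits next to this repository, as described above.

```python
# Sanity-check the expected data layout (paths follow the README; adjust ROOT if
# your ground_data folder lives elsewhere).
import os
import glob

ROOT = os.path.join('..', 'ground_data', 'vidvrd', 'JPEGImages')  # sibling of vRGV/

videos = sorted(glob.glob(os.path.join(ROOT, 'ILSVRC2015_train_*')))
print(f'found {len(videos)} video folders')
if videos:
    frames = sorted(glob.glob(os.path.join(videos[0], '*.JPEG')))
    first = frames[0] if frames else 'none'
    print(f'{os.path.basename(videos[0])}: {len(frames)} frames, first = {first}')
```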

Usage

Feature Extraction (needs about 100 GB of storage, because all detected bounding boxes are dumped along with their features; this can be greatly reduced by modifying detect_frame.py to return only the top-40 bounding boxes and save them with h5py. A minimal sketch of that idea follows the command below.)

>./detection.sh
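The storage-saving idea mentioned above could look roughly like the sketch below. It is a minimal illustration, assuming per-frame arrays of boxes, scores, and features; the names, shapes, and HDF5 layout are assumptions, not the actual detect_frame.py interface.

```python
# Hypothetical sketch: keep only the top-k detections per frame and store them in HDF5.
import h5py
import numpy as np

def save_topk(h5_path, frame_id, boxes, scores, feats, k=40):
    """boxes: (N, 4), scores: (N,), feats: (N, D) for one frame (illustrative shapes)."""
    order = np.argsort(scores)[::-1][:k]  # indices of the k highest-scoring boxes
    with h5py.File(h5_path, 'a') as f:
        grp = f.require_group(frame_id)
        grp.create_dataset('bbox', data=boxes[order], compression='gzip')
        grp.create_dataset('score', data=scores[order], compression='gzip')
        # Storing features in float16 further cuts disk usage (an extra assumption here).
        grp.create_dataset('feat', data=feats[order].astype(np.float16), compression='gzip')
```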

Train

>./ground.sh 0 train # Train the model with GPU id 0

Inference

>./ground.sh 0 val # Output the relation-aware spatio-temporal attention
>python generate_track_link.py # Generate relation-aware trajectories with the Viterbi algorithm (see the sketch below)
>python eval_ground.py # Evaluate the performance
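The trajectory-generation step links per-frame boxes into trajectories with the Viterbi algorithm. Below is a toy sketch of that kind of dynamic-programming linking, assuming per-frame candidate boxes with attention weights as unary scores and IoU between consecutive boxes as the transition score; these inputs and the weight lam are assumptions for illustration, not the actual interface of generate_track_link.py.

```python
# Toy Viterbi linking over per-frame candidate boxes (illustrative only).
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def viterbi_link(boxes, scores, lam=1.0):
    """boxes: list of (N_t, 4) arrays, scores: list of (N_t,) arrays (e.g. attention weights).
    Returns one box index per frame forming the highest-scoring trajectory."""
    T = len(boxes)
    dp = [scores[0].astype(float)]   # best cumulative score ending at each box of frame 0
    back = []                        # backpointers for frames 1..T-1
    for t in range(1, T):
        trans = np.array([[iou(p, c) for c in boxes[t]] for p in boxes[t - 1]])
        total = dp[-1][:, None] + lam * trans + scores[t][None, :]
        back.append(total.argmax(axis=0))
        dp.append(total.max(axis=0))
    # Backtrack from the best final box.
    path = [int(dp[-1].argmax())]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

In this sketch a larger lam favors smoother trajectories (higher frame-to-frame overlap) over per-frame attention scores.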

Visualization

Example queries (result animations not included here): bicycle-jump_beneath-person, person-feed-elephant, person-stand_above-bicycle, dog-watch-turtle, person-ride-horse, person-ride-bicycle, person-drive-car, bicycle-move_toward-car.

Note

If you find the code useful in your research, please kindly cite:

@inproceedings{junbin2020visual,
  title={Visual Relation Grounding in Videos},
  author={Xiao, Junbin and Shang, Xindi and Yang, Xun and Tang, Sheng and Chua, Tat-Seng},
  booktitle={Proceedings of the 16th European Conference on Computer Vision (ECCV)},
  year={2020}
}

License

NUS © NExT++
