Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

This repository contains code for the CVPR 2023 paper "Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline". The paper introduces the first Untrimmed Audio-Visual dataset (UnAV-100) and proposes to solve the audio-visual event localization problem in more realistic and challenging scenarios. [Project page] [Arxiv].

Requirements

The implementation is based on PyTorch. Follow INSTALL.md to install the required dependencies.

Data preparation

The proposed UnAV-100 dataset can be downloaded from the [Project Page], including YouTube links to the raw videos, annotations, and extracted features. A download script for the raw videos is provided at scripts/video_download.py. Note: after downloading the data, unpack the files under data/unav100. The folder structure should look like:

This folder
│   README.md
│   ...
└───data/
│   └───unav100/
│       └───annotations/
│       │   └───unav100_annotations.json
│       └───av_features/
│       │   └───__2MwJ2uHu0_flow.npy    # all features mixed together
│       │   └───__2MwJ2uHu0_rgb.npy
│       │   └───__2MwJ2uHu0_vggish.npy
│       │       ...
└───libs
│   ...
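Each video contributes three per-stream feature files (RGB, flow, and VGGish audio), all stored flat in av_features/ and keyed by the YouTube video ID. A minimal illustrative loader, assuming only the naming convention shown above (the function name is ours, not part of the repo):

```python
import numpy as np
from pathlib import Path

def load_av_features(video_id: str, feat_dir: str = "data/unav100/av_features") -> dict:
    """Load the three feature streams saved for one video.

    Assumes files named <video_id>_rgb.npy, <video_id>_flow.npy,
    and <video_id>_vggish.npy, as in the folder layout above.
    """
    root = Path(feat_dir)
    return {
        stream: np.load(root / f"{video_id}_{stream}.npy")
        for stream in ("rgb", "flow", "vggish")
    }
```

For example, `load_av_features("__2MwJ2uHu0")` would return a dict with the RGB, flow, and audio feature arrays for that video.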

Training

Run train.py to train the model on the UnAV-100 dataset. This creates an experiment folder under ./ckpt that stores the training config, logs, and checkpoints.

python ./train.py ./configs/avel_unav100.yaml --output reproduce

Evaluation

Run eval.py to evaluate the trained model.

python ./eval.py ./configs/avel_unav100.yaml ./ckpt/avel_unav100_reproduce
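Temporal event localization is conventionally scored with average precision at temporal IoU thresholds, where a predicted segment matches a ground-truth event when their temporal overlap is high enough. As a generic illustration of that overlap measure (this is not the repository's evaluation code; the function is ours):

```python
def temporal_iou(seg_a: tuple, seg_b: tuple) -> float:
    """IoU between two 1-D time segments given as (start, end) in seconds."""
    # Length of the overlapping interval, clipped at zero for disjoint segments.
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    # Union = sum of lengths minus the overlap counted twice.
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0
```

A prediction covering (1.0, 3.0) s against a ground-truth event at (0.0, 2.0) s overlaps for 1 s out of a 3 s union, giving a tIoU of 1/3.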

[Optional] We also provide a pretrained model for UnAV-100, which can be downloaded from this link.

Citation

If you find our dataset and code useful for your research, please cite our paper:

@article{geng2023dense,
  title={Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline},
  author={Geng, Tiantian and Wang, Teng and Duan, Jinming and Cong, Runmin and Zheng, Feng},
  journal={arXiv preprint arXiv:2303.12930},
  year={2023}
}

Acknowledgement

The I3D RGB and flow video features and the VGGish audio features were extracted using video_features. Our baseline model was implemented based on ActionFormer. We thank the authors for sharing their code. If you use our code, please consider citing their works as well.
