Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model
This repository contains the official implementation of our ACM MM 2024 work. More details can be found in our paper. [PDF]
In real-world recon-videos such as surveillance and drone reconnaissance videos, the explicit cues commonly used for sentiment analysis (language, acoustics, and facial expressions) are often missing. Yet these videos are frequently rich in anomalous sentiments (e.g., criminal tendencies), which makes implicit scene information (e.g., actions and object relations) essential for identifying such sentiments quickly and precisely. Motivated by this, this paper proposes a new chat-paradigm Implicit anomalous sentiment Discovering and grounding (IasDig) task, which aims to interactively and rapidly discover and ground anomalous sentiments in recon-videos by leveraging implicit scene information (i.e., actions and object relations). Furthermore, this paper argues that the IasDig task faces two key challenges: scene modeling and scene balancing. To this end, this paper proposes a new Scene-enhanced Video Large Language Model named Hawkeye, i.e., acting like a raptor (e.g., a hawk) to discover and locate its prey, for the IasDig task. Specifically, the approach designs a graph-structured scene modeling module and a balanced heterogeneous MoE module to address these two challenges, respectively. Extensive experimental results on our constructed scene-sparsity and scene-density IasDig datasets demonstrate the clear advantage of Hawkeye over advanced Video-LLM baselines on IasDig, especially on false negative rates. This justifies the importance of scene information for identifying implicit anomalous sentiments and the practicality of Hawkeye for real-world applications.
You can set up the environment by running `conda env create -f environment.yml`.
- Prepare the TSL-300 dataset.
- Prepare the UCF-Crime dataset.
- Split the videos into frames, then extract action features with HigherHRNet and object-relation features with RelTR (a minimal frame-splitting sketch is given after the directory tree below).
- Place the features inside the `dataset` folder.
- Please ensure the data structure is as below.
├── dataset
│   ├── vid_split
│   │   ├── 1_Ekman6_disgust_3
│   │   │   ├── 1.mp4
│   │   │   ├── 2.mp4
│   │   │   └── ...
│   │   └── Abuse028_x264
│   │       ├── 1.mp4
│   │       ├── 2.mp4
│   │       └── ...
│   ├── pose_feat
│   │   ├── 1_Ekman6_disgust_3
│   │   │   ├── frame_1.npy
│   │   │   ├── frame_2.npy
│   │   │   └── ...
│   │   └── Abuse028_x264
│   │       ├── frame_1.npy
│   │       ├── frame_2.npy
│   │       └── ...
│   └── rel_feat
│       ├── 1_Ekman6_disgust_3
│       │   ├── frame_1.npy
│       │   ├── frame_2.npy
│       │   └── ...
│       └── Abuse028_x264
│           ├── frame_1.npy
│           ├── frame_2.npy
│           └── ...
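Below is a minimal sketch of how the per-frame feature files in the tree above could be produced. The `extract_features` placeholder and its 256-dimensional output are assumptions, not the repo's actual pipeline; in practice the features should come from HigherHRNet (actions) and RelTR (object relations).

```python
# Minimal sketch: split a video into frames and save one feature file per frame,
# matching the dataset/pose_feat (or dataset/rel_feat) layout above.
# NOTE: extract_features is a placeholder for HigherHRNet / RelTR inference,
# and the 256-dim output is an assumption.
import os
import cv2
import numpy as np

def extract_features(frame):
    # Placeholder for HigherHRNet (action) or RelTR (object-relation) inference.
    return np.zeros(256, dtype=np.float32)

def prepare_video(video_path, feat_root, video_name):
    out_dir = os.path.join(feat_root, video_name)
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        np.save(os.path.join(out_dir, f"frame_{idx}.npy"), extract_features(frame))
        idx += 1
    cap.release()

# Example (paths are illustrative):
# prepare_video("raw_videos/Abuse028_x264.mp4", "dataset/pose_feat", "Abuse028_x264")
```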
- Download the pretrained Vicuna-v1.5 model from Hugging Face and place it in the `lmsys` folder.
- Download the pretrained LanguageBind model from LanguageBind and place it in the `LanguageBind` folder (a scripted-download sketch is shown below).
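If you prefer scripted downloads, a minimal sketch using `huggingface_hub` is shown below. The exact repository ids (Vicuna size and LanguageBind variant) are assumptions; downloading manually from the model pages works equally well.

```python
# Minimal sketch: fetch the pretrained checkpoints with huggingface_hub.
# The repo ids below are assumptions -- pick the Vicuna size and LanguageBind
# variant used in your setup.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="lmsys/vicuna-7b-v1.5", local_dir="lmsys/vicuna-7b-v1.5")
snapshot_download(repo_id="LanguageBind/LanguageBind_Video_merge",
                  local_dir="LanguageBind/LanguageBind_Video_merge")
```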
$ bash scripts/v1_5/finetune_lora_a100.sh
After training, the checkpoint will be saved in the `output_folder` directory.
| Dataset | FNRs | F2 | mAP@0.1 | mAP@0.2 | mAP@0.3 | URL |
|---|---|---|---|---|---|---|
| TSL | 35.82 | 38.09 | 35.24 | 21.21 | 14.71 | Google drive |
| UCF-Crime | 45.66 | 45.03 | 34.41 | 19.22 | 12.1 | Google drive |
You can evaluate the model by running the command below.
python3 eval.py
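For reference, the sketch below illustrates the segment-level bookkeeping behind metrics such as FNR and mAP@tIoU: temporal IoU matching between predicted and ground-truth segments. It is only an illustration under assumed interval representations; `eval.py` in this repository produces the official numbers.

```python
# Illustrative only: temporal IoU between (start, end) intervals, and the
# fraction of ground-truth segments missed at a given tIoU threshold.
def temporal_iou(pred, gt):
    # pred, gt: (start, end) in seconds or frame indices.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def false_negative_rate(preds, gts, thresh=0.1):
    # A ground-truth segment counts as a false negative if no prediction
    # overlaps it with tIoU >= thresh.
    missed = sum(1 for gt in gts
                 if all(temporal_iou(p, gt) < thresh for p in preds))
    return missed / len(gts) if gts else 0.0

# Example: false_negative_rate([(0, 5), (12, 20)], [(1, 4), (30, 35)], 0.1) -> 0.5
```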
If you find this work useful, please consider citing it.
@inproceedings{zhao2024hawkeye,
title={Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model},
author={Zhao, Jianing and Wang, Jingjing and Jin, Yujie and Luo, Jiamin and Zhou, Guodong},
booktitle={Proceedings of {ACM MM} 2024},
year={2024}
}