Hawkeye

The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model in ACM MM 2024 Oral


If you like our project, please give us a star ⭐ on GitHub for the latest updates.

This repository contains the official implementation of our ACM MM 2024 (Oral) work. More details can be found in our paper. [PDF]

Abstract

In real-world recon-videos, such as surveillance and drone reconnaissance footage, the explicit language, acoustic, and facial-expression cues commonly used for sentiment analysis are often missing. Nevertheless, these videos are rich in anomalous sentiments (e.g., criminal tendencies), whose fast and precise identification urgently requires implicit scene information (e.g., actions and object relations). Motivated by this, this paper proposes a new chat-paradigm task, Implicit anomalous sentiment Discovering and grounding (IasDig), which aims to interactively and quickly discover and ground anomalous sentiments in recon-videos by leveraging implicit scene information (i.e., actions and object relations). Furthermore, this paper argues that the IasDig task faces two key challenges: scene modeling and scene balancing. To this end, this paper proposes a new Scene-enhanced Video Large Language Model named Hawkeye, which acts like a raptor (e.g., a hawk) discovering and locating its prey. Specifically, this approach designs a graph-structured scene modeling module and a balanced heterogeneous MoE module to address these two challenges, respectively. Extensive experimental results on our constructed scene-sparsity and scene-density IasDig datasets demonstrate the clear advantage of Hawkeye over advanced Video-LLM baselines, especially on the false negative rate metric. This confirms the importance of scene information for identifying implicit anomalous sentiments and the practicality of Hawkeye for real-world applications.

Dependencies

You can set up the environment by running conda env create -f environment.yml.

Training Pipeline

Dataset Preparation

  1. Prepare TSL-300 dataset.

  2. Prepare UCF-Crime dataset.

  3. Split the videos into frames, then extract action (pose) features with HigherHRNet and object-relation features with RelTR.

  4. Place the features inside the dataset folder.

    • Please ensure the data structure is as below.
├── dataset
│   ├── vid_split
│   │   ├── 1_Ekman6_disgust_3
│   │   │   ├── 1.mp4
│   │   │   ├── 2.mp4
│   │   │   └── ...
│   │   └── Abuse028_x264
│   │       ├── 1.mp4
│   │       ├── 2.mp4
│   │       └── ...
│   ├── pose_feat
│   │   ├── 1_Ekman6_disgust_3
│   │   │   ├── frame_1.npy
│   │   │   ├── frame_2.npy
│   │   │   └── ...
│   │   └── Abuse028_x264
│   │       ├── frame_1.npy
│   │       ├── frame_2.npy
│   │       └── ...
│   └── rel_feat
│       ├── 1_Ekman6_disgust_3
│       │   ├── frame_1.npy
│       │   ├── frame_2.npy
│       │   └── ...
│       └── Abuse028_x264
│           ├── frame_1.npy
│           ├── frame_2.npy
│           └── ...
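Given the layout above, the per-frame pose and object-relation features for a video can be paired like this (a minimal sketch; the function name and loading logic are illustrative and not part of the repo's code):

```python
from pathlib import Path

import numpy as np


def load_frame_features(root, video_id, frame_idx):
    """Load the pose and object-relation features for one frame.

    Assumes the directory layout shown above:
      dataset/pose_feat/<video_id>/frame_<i>.npy
      dataset/rel_feat/<video_id>/frame_<i>.npy
    """
    root = Path(root)
    pose = np.load(root / "pose_feat" / video_id / f"frame_{frame_idx}.npy")
    rel = np.load(root / "rel_feat" / video_id / f"frame_{frame_idx}.npy")
    return pose, rel
```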

Model checkpoint preparation

  1. Download the pretrained vicuna-v1.5 model from Hugging Face and place it in the lmsys folder.
  2. Download the pretrained LanguageBind model from LanguageBind and place it in the LanguageBind folder.
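The two checkpoints are expected in local folders named after their hub organizations. A small sketch of that mapping, with the actual download left to huggingface_hub (the specific vicuna variant shown is an assumption; pick the one matching your setup):

```python
from pathlib import Path


def local_checkpoint_dir(repo_id, root="."):
    """Map a hub repo id (e.g. "lmsys/vicuna-7b-v1.5") to the local
    folder layout this repo expects (the org name is the folder)."""
    return Path(root) / repo_id


# To actually fetch the weights (requires `pip install huggingface_hub`):
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id="lmsys/vicuna-7b-v1.5",  # assumed variant
#                   local_dir=local_checkpoint_dir("lmsys/vicuna-7b-v1.5"))
```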

Training

$ bash scripts/v1_5/finetune_lora_a100.sh

After training, the checkpoint will be saved in the output_folder directory.

Model Zoo

| Metric | FNRs | F2 | mAP@0.1 | mAP@0.2 | mAP@0.3 | Url |
|---|---|---|---|---|---|---|
| On TSL Dataset | 35.82 | 38.09 | 35.24 | 21.21 | 14.71 | Google drive |
| On UCF-Crime Dataset | 45.66 | 45.03 | 34.41 | 19.22 | 12.1 | Google drive |
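The table reports false negative rates (FNRs), F2 score, and mAP at temporal IoU thresholds. The exact evaluation protocol lives in eval.py; the helpers below are only an illustrative sketch of how these standard quantities are defined:

```python
def fnr(fn, tp):
    """False negative rate: missed anomalies over all true anomalies."""
    return fn / (fn + tp)


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score; beta=2 weights recall higher than precision."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def temporal_iou(pred, gt):
    """IoU of two [start, end] segments, as thresholded for mAP@t."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```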

Evaluation

You can evaluate the model by running the command below.

python3 eval.py

Citation

If you find this work useful, please consider citing it.

@inproceedings{zhao2024hawkeye,
  title={Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model},
  author={Zhao, Jianing and Wang, Jingjing and Jin, Yujie and Luo, Jiamin and Zhou, Guodong},
  booktitle={Proceedings of {ACM MM} 2024},
  year={2024}
}
