The video-based cross-modal auxiliary network (VCAN) consists of two components: the Audio Feature Map Module (AFMM), which enhances multi-scale acoustic sentiment representation; and the Cross-Modal Selection Module (CMSM), which aims to improve the interactivity of audiovisual modalities and eliminate redundant computations.
Code for the TCSVT paper "Video-based Cross-modal Auxiliary Network for Multimodal Sentiment Analysis".
These instructions provide you with a core copy of VCAN that you can adapt and run on your local machine for development and testing purposes. The model runs on Windows using PyCharm Community Edition. See the deployment notes below for how to deploy the project on a live system.
To install the software, download requirements.txt and install the required packages. Some of the most important packages are listed below:
--emd==0.4.0
--librosa==0.8.1
--scikit-learn==0.24.2
...
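Since the versions above are pinned, it can help to confirm they are installed before running the model. Below is a minimal sketch that checks the three packages listed above (the full list is in requirements.txt), assuming Python 3.8+ for `importlib.metadata`.

```python
# check_requirements.py -- confirm the key pinned packages are installed
from importlib.metadata import version, PackageNotFoundError

EXPECTED = {"emd": "0.4.0", "librosa": "0.8.1", "scikit-learn": "0.24.2"}

for package, expected in EXPECTED.items():
    try:
        installed = version(package)
        note = "OK" if installed == expected else f"expected {expected}"
        print(f"{package}: {installed} ({note})")
    except PackageNotFoundError:
        print(f"{package}: not installed")
```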
We only provide the core code of the model; the data pre-processing and classification network code is not included. If you need the corresponding demos, please refer to this repository.
As shown in Fig. 1 of the paper, the output of the AFMM forms part of the input of the CMSM, so after installing the packages above you must run the AFMM and the CMSM in order (a minimal sketch of this workflow follows the list):
- Pre-process the data, i.e., adjust the data types, data structures, and corresponding labels.
- Run the AFMM to generate three Mel-spectrograms (three kinds of audio feature sequences), which serve as the acoustic feature input of the CMSM.
- Run the CMSM to output the filtered video keyframes.
- Classify the video keyframes (images) using purpose-designed classifiers or a cooperative classifier group.
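The sketch below illustrates the AFMM-to-CMSM workflow. The Mel-spectrogram extraction uses standard librosa calls; the analysis scales and the `CrossModalSelection` entry point are placeholders for illustration, not the exact values or API of this repository.

```python
# pipeline_sketch.py -- illustrative only; module names and window sizes
# are placeholders, not the exact configuration used in the paper.
import librosa
import numpy as np

def multi_scale_mel(wav_path, n_fft_list=(512, 1024, 2048), n_mels=64):
    """Step 2 (AFMM side): three Mel-spectrograms at different analysis scales."""
    y, sr = librosa.load(wav_path, sr=None)
    specs = []
    for n_fft in n_fft_list:  # illustrative scales; the paper's values may differ
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                             hop_length=n_fft // 4, n_mels=n_mels)
        specs.append(librosa.power_to_db(mel, ref=np.max))
    return specs

# Step 3 (CMSM side): feed the acoustic features together with the video frames
# and keep only the selected keyframes. `CrossModalSelection` is a placeholder name.
# keyframes = CrossModalSelection()(frames, multi_scale_mel("sample.wav"))
# Step 4: pass `keyframes` to your classifier or cooperative classifier group.
```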
If you use the related datasets as benchmark datasets in your papers, please declare the corresponding citations.
- We provide a simple demo of the basic raw data pre-processing so that users can easily adjust the data structure.
- The cooperative classifier group mentioned in our paper consists of different CNN models; users can retrieve the source code from the corresponding papers or refer to this URL (see the sketch below).
- If you have any questions about using VCAN, please contact me by email (chenrongfei@shu.edu.cn).
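As a rough illustration of the cooperative classifier group mentioned above, the sketch below performs a simple late fusion over several torchvision CNNs: each member scores the selected keyframes and the class probabilities are averaged. The backbone choices, the three sentiment classes, and the averaging rule are assumptions for illustration; the models actually used should be taken from the corresponding papers.

```python
# cooperative_group_sketch.py -- illustrative late-fusion ensemble over keyframes;
# backbones, class count, and fusion rule are assumptions, not the paper's setup.
import torch
import torchvision.models as models

def cooperative_predict(keyframes):
    """keyframes: float tensor of shape (N, 3, 224, 224), already normalized."""
    # In practice each member would be fine-tuned on the sentiment classes and
    # its trained weights loaded here; random initialization is used in this sketch.
    members = [models.resnet18(num_classes=3),
               models.vgg16(num_classes=3),
               models.densenet121(num_classes=3)]
    probs = []
    with torch.no_grad():
        for net in members:
            net.eval()
            probs.append(torch.softmax(net(keyframes), dim=1))
    # simple late fusion: average the class probabilities over all group members
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```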
If you find the dataset and code we have collated helpful for your research, please cite:
@article{chen2022video,
  title={Video-based Cross-modal Auxiliary Network for Multimodal Sentiment Analysis},
  author={Chen, Rongfei and Zhou, Wenju and Li, Yang and Zhou, Huiyu},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2022},
  publisher={IEEE}
}