TMT

A novel Token-disentangling Mutual Transformer (TMT) for multimodal emotion recognition. TMT disentangles inter-modality emotion-consistency features from intra-modality emotion-heterogeneity features and mutually fuses them into more comprehensive multimodal emotion representations through two primary modules: multimodal emotion Token disentanglement and Token mutual Transformer. The Models folder contains the overall TMT model code, and the subNets folder contains the Transformer structures it uses.
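
The toy sketch below only illustrates the two-stage idea described above (disentangle each modality's features into consistency and heterogeneity tokens, then fuse all tokens with a Transformer). It is not the released implementation in Models/; all class names, dimensions, and the pooling choices (ToyTokenDisentangler, ToyMutualFusion, d_model=128) are hypothetical placeholders.

```python
# Illustrative sketch only -- not the code in Models/. All names/sizes are hypothetical.
import torch
import torch.nn as nn

class ToyTokenDisentangler(nn.Module):
    """Project one modality's features into a 'consistency' and a 'heterogeneity' token."""
    def __init__(self, d_model: int = 128):
        super().__init__()
        self.to_consistency = nn.Linear(d_model, d_model)
        self.to_heterogeneity = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) pooled features for one modality
        return self.to_consistency(x), self.to_heterogeneity(x)

class ToyMutualFusion(nn.Module):
    """Fuse the disentangled tokens of all modalities with a small Transformer encoder."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, d_model)
        return self.encoder(tokens).mean(dim=1)  # pooled multimodal representation

# Toy usage with random text/audio/vision features.
text, audio, vision = (torch.randn(8, 128) for _ in range(3))
disentangle = ToyTokenDisentangler()
tokens = torch.stack(
    [t for modality in (text, audio, vision) for t in disentangle(modality)], dim=1
)  # (8, 6, 128): one consistency + one heterogeneity token per modality
representation = ToyMutualFusion()(tokens)  # (8, 128) fused emotion representation
```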

Features

  • Train, test, and compare models in a unified framework.
  • Supports our TMT model.
  • Supports 3 datasets: MOSI, MOSEI, and CH-SIMS.
  • Easy to use; provides Python APIs and command-line tools.
  • Experiment with fully customized multimodal features extracted by the MMSA-FET toolkit.

1. Get Started

Note: From version 2.0, we packaged the project and uploaded it to PyPI in the hope of making it easier to use. If you don't like the new structure, you can always switch back to the v_1.0 branch.

1.1 Use Python API

At present, only the model and test code are uploaded; the training code will be released after the paper is published.

  • Run the provided test scripts (or invoke them from your own Python code, as sketched after this list):

    python test.py
    python test_acc5.py
  • For more detailed usage, please contact ygh2@cug.edu.cn.
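
If you prefer to call the scripts from your own Python code, a minimal sketch (assuming it runs from the cloned repository root, next to test.py) is:

```python
# Minimal sketch: invoke the released test scripts as subprocesses.
# Assumes the current working directory is the cloned TMT repository root.
import subprocess

for script in ("test.py", "test_acc5.py"):
    subprocess.run(["python", script], check=True)
```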

1.3 Clone & Edit the Code

  • Clone this repo and install requirements.
    $ git clone https://github.com/cug-ygh/TMT

2. Datasets

TMT currently supports the MOSI, MOSEI, and CH-SIMS datasets. Use the following links to download raw videos, feature files, and label files. You don't need to download raw videos if you're not planning to run end-to-end tasks.

SHA-256 for feature files:

`MOSI/Processed/unaligned_50.pkl`:  `78e0f8b5ef8ff71558e7307848fc1fa929ecb078203f565ab22b9daab2e02524`
`MOSI/Processed/aligned_50.pkl`:    `d3994fd25681f9c7ad6e9c6596a6fe9b4beb85ff7d478ba978b124139002e5f9`
`MOSEI/Processed/unaligned_50.pkl`: `ad8b23d50557045e7d47959ce6c5b955d8d983f2979c7d9b7b9226f6dd6fec1f`
`MOSEI/Processed/aligned_50.pkl`:   `45eccfb748a87c80ecab9bfac29582e7b1466bf6605ff29d3b338a75120bf791`
`SIMS/Processed/unaligned_39.pkl`:  `c9e20c13ec0454d98bb9c1e520e490c75146bfa2dfeeea78d84de047dbdd442f`
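
A downloaded feature file can be checked against these checksums with a short Python snippet (the local path below is illustrative):

```python
# Verify a downloaded feature file against the SHA-256 values listed above.
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "78e0f8b5ef8ff71558e7307848fc1fa929ecb078203f565ab22b9daab2e02524"
assert sha256sum("MOSI/Processed/unaligned_50.pkl") == expected
```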

TMT uses feature files that are organized as follows:

{
    "train": {
        "raw_text": [],              # raw text
        "audio": [],                 # audio feature
        "vision": [],                # video feature
        "id": [],                    # [video_id$_$clip_id, ..., ...]
        "text": [],                  # bert feature
        "text_bert": [],             # word ids for bert
        "audio_lengths": [],         # audio feature lenth(over time) for every sample
        "vision_lengths": [],        # same as audio_lengths
        "annotations": [],           # strings
        "classification_labels": [], # Negative(0), Neutral(1), Positive(2). Deprecated in v_2.0
        "regression_labels": []      # Negative(<0), Neutral(0), Positive(>0)
    },
    "valid": {***},                  # same as "train"
    "test": {***},                   # same as "train"
}
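
A quick way to confirm a downloaded file matches this layout (the path is illustrative; all of the .pkl files above share the same structure):

```python
# Load a feature file and inspect its structure.
import pickle

with open("MOSI/Processed/unaligned_50.pkl", "rb") as f:
    data = pickle.load(f)

print(data.keys())             # expected: dict_keys(['train', 'valid', 'test'])
print(data["train"].keys())    # the per-split fields listed above
print(len(data["train"]["regression_labels"]))  # number of training samples
```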

3. Citation

Please cite our paper if you find our work useful for your research:

@article{YIN2024108348,
    title   = {Token-disentangling Mutual Transformer for multimodal emotion recognition},
    journal = {Engineering Applications of Artificial Intelligence},
    volume  = {133},
    pages   = {108348},
    year    = {2024},
    issn    = {0952-1976},
    doi     = {https://doi.org/10.1016/j.engappai.2024.108348},
    url     = {https://www.sciencedirect.com/science/article/pii/S0952197624005062}
}
