# 🔥 (ICME 2024) ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance
IEEE International Conference on Multimedia and Expo (ICME), 2024
This is the official repository of ExpGest.
Demo video: `ExpGest_demo.mp4`
ExpGest is a method that accepts audio, phrases, and motion-description text as inputs and, based on a diffusion model, generates highly expressive speaker motion.
## News
- [2025/03/13] Code release! ⭐
- [2024/10/12] ExpGest is on arXiv now.
- [2024/03/08] ExpGest got accepted by ICME 2024! 🎉
## Release plan
- [x] Inference code
- [ ] Training code
## Installation
```bash
conda create -n ExpGest python=3.7
conda activate ExpGest
conda install -n ExpGest pytorch==1.10.0 torchvision==0.11.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
```
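As an optional sanity check (not part of the original steps), you can confirm that the new environment sees the expected PyTorch build and a usable GPU:

```python
# Optional sanity check; assumes it is run inside the activated
# ExpGest conda environment created above.
import torch

print(torch.__version__)           # expected: 1.10.0
print(torch.cuda.is_available())   # True if a CUDA-capable GPU is visible
```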
## Download materials
- Download the CLIP model and put the folder in the repository root.
- Download the pre-trained weights (audio-only control) from here and put them in `./mydiffusion_zeggs`.
- Download the pre-trained weights (action-audio control) from here and put them in `./mydiffusion_zeggs`.
- Download the pre-trained weights (text-audio control) from here and put them in `./mydiffusion_zeggs`.
- Download the WavLM weights from here and put them in `./mydiffusion_zeggs` (a loading sketch follows this list).
- Download the ASR weights from here and put them in `./mydiffusion_zeggs`.
- Download the BERT weights from here and put them in `./mydiffusion_zeggs`.
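If you want to verify the WavLM checkpoint independently of ExpGest, it can be loaded with the reference loader from the official WavLM repository. This is a minimal sketch; the checkpoint filename `WavLM-Large.pt` and its location under `./mydiffusion_zeggs` are assumptions, and the `WavLM` module from microsoft/unilm must be importable:

```python
# Sketch: load a WavLM checkpoint and extract audio features.
# Assumes the WavLM module from https://github.com/microsoft/unilm/tree/master/wavlm
# is on PYTHONPATH and the checkpoint is saved as WavLM-Large.pt (assumed name).
import torch
from WavLM import WavLM, WavLMConfig

checkpoint = torch.load('./mydiffusion_zeggs/WavLM-Large.pt')
cfg = WavLMConfig(checkpoint['cfg'])
model = WavLM(cfg)
model.load_state_dict(checkpoint['model'])
model.eval()

# Extract the last-layer representation of one second of 16 kHz audio.
wav_input_16khz = torch.randn(1, 16000)
with torch.no_grad():
    rep = model.extract_features(wav_input_16khz)[0]
print(rep.shape)  # (1, frames, 1024) for WavLM-Large
```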
## Run the demo
Users can modify the configuration file according to their needs to achieve the desired results. Instructions for modification are provided as comments in the configuration file.
```bash
# For more detailed controls such as texts and phrases, set them in the configuration file.
# Run the audio-control demo:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_audio_only.yml
# Run the hybrid-control demo:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_hybrid.yml
# Run the demo with emotion guidance:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_hybrid_w_emo.yml
```
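To see which options a given config exposes before editing it, one lightweight approach is to list its top-level keys. This is only a sketch, assuming the `.yml` files are plain YAML and PyYAML is installed:

```python
# Sketch: list the top-level options of an ExpGest config file.
# Assumes the config parses as plain YAML and PyYAML is available.
import yaml

with open('../ExpGest_config_hybrid.yml') as f:
    cfg = yaml.safe_load(f)

# The comments in the file document each field; this just lists the keys.
print(sorted(cfg.keys()))
```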
## Citation
```bibtex
@inproceedings{cheng2024expgest,
  title={ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance},
  author={Cheng, Yongkang and Liang, Mingjiang and Huang, Shaoli and Ning, Jifeng and Liu, Wei},
  booktitle={2024 IEEE International Conference on Multimedia and Expo (ICME)},
  pages={1--6},
  year={2024},
  organization={IEEE}
}
```
## Acknowledgments
The PyTorch implementation of ExpGest is based on DiffuseStyleGesture. We also use parts of the excellent code from FreeTalker and MLD. We thank all the authors for their impressive work!
## Contact
For technical questions, please contact cyk990422@gmail.com.