# 🔥 (ICME 2024) ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance
IEEE International Conference on Multimedia and Expo (ICME), 2024
This is the official repository of ExpGest.
Demo video: `ExpGest_demo.mp4`
ExpGest is a method that accepts audio, phrases, and motion-description text as inputs and, based on a diffusion model, generates highly expressive speaker motion.
## News
- [2025/03/13] Code release! ⭐
- [2024/10/12] ExpGest is on arXiv now.
- [2024/03/08] ExpGest got accepted by ICME 2024! 🎉
## Release plan
- [x] Inference code
- [ ] Training code
## Installation
```bash
conda create -n ExpGest python=3.7
conda activate ExpGest
conda install -n ExpGest pytorch==1.10.0 torchvision==0.11.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
```
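As an optional sanity check (not part of the original steps), you can confirm that the new environment sees the expected PyTorch build and a usable GPU:

```python
# Optional sanity check; assumes it is run inside the activated
# ExpGest conda environment created above.
import torch

print(torch.__version__)           # expected: 1.10.0
print(torch.cuda.is_available())   # True if a CUDA-capable GPU is visible
```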
## Download materials
- Download the CLIP model and put the folder in the repository root.
- Download the pre-trained weights (audio-only control) from here and put them in `./mydiffusion_zeggs`.
- Download the pre-trained weights (action-audio control) from here and put them in `./mydiffusion_zeggs`.
- Download the pre-trained weights (text-audio control) from here and put them in `./mydiffusion_zeggs`.
- Download the WavLM weights from here and put them in `./mydiffusion_zeggs` (a loading sketch follows this list).
- Download the ASR weights from here and put them in `./mydiffusion_zeggs`.
- Download the BERT weights from here and put them in `./mydiffusion_zeggs`.
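If you want to verify the WavLM checkpoint independently of ExpGest, it can be loaded with the reference loader from the official WavLM repository. This is a minimal sketch; the checkpoint filename `WavLM-Large.pt` and its location under `./mydiffusion_zeggs` are assumptions, and the `WavLM` module from microsoft/unilm must be importable:

```python
# Sketch: load a WavLM checkpoint and extract audio features.
# Assumes the WavLM module from https://github.com/microsoft/unilm/tree/master/wavlm
# is on PYTHONPATH and the checkpoint is saved as WavLM-Large.pt (assumed name).
import torch
from WavLM import WavLM, WavLMConfig

checkpoint = torch.load('./mydiffusion_zeggs/WavLM-Large.pt')
cfg = WavLMConfig(checkpoint['cfg'])
model = WavLM(cfg)
model.load_state_dict(checkpoint['model'])
model.eval()

# Extract the last-layer representation of one second of 16 kHz audio.
wav_input_16khz = torch.randn(1, 16000)
with torch.no_grad():
    rep = model.extract_features(wav_input_16khz)[0]
print(rep.shape)  # (1, frames, 1024) for WavLM-Large
```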
## Run the demo
Users can modify the configuration file according to their needs to achieve the desired results. Instructions for modification are provided as comments in the configuration file.
```bash
# For more detailed controls such as texts and phrases, set them in the configuration file.
# Run the audio-control demo:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_audio_only.yml
# Run the hybrid-control demo:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_hybrid.yml
# Run the demo with emotion guidance:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_hybrid_w_emo.yml
```
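To see which options a given config exposes before editing it, one lightweight approach is to list its top-level keys. This is only a sketch, assuming the `.yml` files are plain YAML and PyYAML is installed:

```python
# Sketch: list the top-level options of an ExpGest config file.
# Assumes the config parses as plain YAML and PyYAML is available.
import yaml

with open('../ExpGest_config_hybrid.yml') as f:
    cfg = yaml.safe_load(f)

# The comments in the file document each field; this just lists the keys.
print(sorted(cfg.keys()))
```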
## Citation
```bibtex
@inproceedings{cheng2024expgest,
  title={ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance},
  author={Cheng, Yongkang and Liang, Mingjiang and Huang, Shaoli and Ning, Jifeng and Liu, Wei},
  booktitle={2024 IEEE International Conference on Multimedia and Expo (ICME)},
  pages={1--6},
  year={2024},
  organization={IEEE}
}
```
## Acknowledgments
The PyTorch implementation of ExpGest is based on DiffuseStyleGesture. We also use parts of the excellent code from FreeTalker and MLD. We thank all the authors for their impressive work!
## Contact
For technical questions, please contact cyk990422@gmail.com.