
ExpGest

🔥(ICME 2024) ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance

arXiv Paper

IEEE International Conference on Multimedia and Expo (ICME), 2024

This is the official repository of ExpGest.

Demo video: ExpGest_demo.mp4

ExpGest is a method that takes audio, phrases, and motion-description text as input and, based on a diffusion model, generates highly expressive speaker motions.
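
As a rough mental model, the hybrid audio-text guidance can be pictured as blending several conditioned noise predictions inside the diffusion sampler. The snippet below is a generic, minimal sketch of multi-condition classifier-free guidance, not the ExpGest implementation; denoiser, the guidance weights, and the conditioning tensors are hypothetical stand-ins.

# Generic sketch of hybrid (audio + text) guidance in a diffusion sampler.
# NOT the ExpGest code: `denoiser`, `w_audio`, `w_text`, and the conditioning
# tensors are hypothetical placeholders for illustration only.
def hybrid_guided_eps(denoiser, x_t, t, audio_cond, text_cond,
                      w_audio=2.0, w_text=2.0):
    """Blend audio- and text-conditioned noise predictions with an
    unconditional prediction, classifier-free-guidance style."""
    eps_uncond = denoiser(x_t, t, audio=None, text=None)
    eps_audio = denoiser(x_t, t, audio=audio_cond, text=None)
    eps_text = denoiser(x_t, t, audio=None, text=text_cond)
    # Each condition shifts the prediction away from the unconditional one.
    return (eps_uncond
            + w_audio * (eps_audio - eps_uncond)
            + w_text * (eps_text - eps_uncond))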

News 🚩

  • [2025/03/13] Code release!
  • [2024/10/12] ExpGest is on arXiv now.
  • [2024/03/08] ExpGest got accepted by ICME 2024! 🎉

To-Do list 📝

  • Inference code
  • Training code

Requirements 🎉

Conda environments

conda create -n ExpGest python=3.7
conda activate ExpGest 
conda install -n ExpGest pytorch==1.10.0 torchvision==0.11.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
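
After installing, a quick sanity check (not part of the original instructions) can confirm that the expected PyTorch build and CUDA runtime are visible:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"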

Pre-trained model and data

  • Download the CLIP model and put the folder in the repository root.
  • Download the pre-trained weights (audio-control only) from here and put them in ./mydiffusion_zeggs
  • Download the pre-trained weights (action-audio control) from here and put them in ./mydiffusion_zeggs
  • Download the pre-trained weights (text-audio control) from here and put them in ./mydiffusion_zeggs
  • Download the WavLM weights from here and put them in ./mydiffusion_zeggs
  • Download the ASR weights from here and put them in ./mydiffusion_zeggs
  • Download the BERT weights from here and put them in ./mydiffusion_zeggs
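
Once everything is downloaded, a small pre-flight check can confirm the assets are where the demos expect them. This is only a sketch: the CLIP folder name below is a placeholder, and the actual filenames come from the download links above.

# Optional pre-flight check for downloaded assets (not part of the repo).
# './CLIP' is a hypothetical folder name; use whatever the CLIP download unpacks to.
from pathlib import Path

expected = {
    "checkpoints / WavLM / ASR / BERT weights": Path("./mydiffusion_zeggs"),
    "CLIP model folder": Path("./CLIP"),
}
for label, path in expected.items():
    print(f"{label}: {path} -> {'found' if path.exists() else 'MISSING'}")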

Run

Users can modify the configuration file according to their own needs to achieve the desired results. Instructions for modification are provided as comments in the configuration file.

# For more detailed controls, such as texts and phrases, set them in the configuration file.
# Run the audio-control demo:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_audio_only.yml

# Run the hybrid-control demo:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_hybrid.yml

# Run the demo with emotion guidance:
python sample_demo.py --audiowavlm_path '../1_wayne_0_79_79.wav' --max_len 320 --config ../ExpGest_config_hybrid_w_emo.yml
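
To process several clips in one run, a small wrapper can loop over the same command. This is only a sketch: '../my_audio' is a hypothetical folder, and it assumes sample_demo.py is invoked exactly as in the commands above.

# Optional: run the audio-control demo over a folder of WAV files.
# '../my_audio' is a hypothetical folder; the flags mirror the commands above.
import subprocess
from pathlib import Path

for wav in sorted(Path("../my_audio").glob("*.wav")):
    subprocess.run(
        ["python", "sample_demo.py",
         "--audiowavlm_path", str(wav),
         "--max_len", "320",
         "--config", "../ExpGest_config_audio_only.yml"],
        check=True,
    )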

Citation

@inproceedings{cheng2024expgest,
  title={ExpGest: Expressive Speaker Generation Using Diffusion Model and Hybrid Audio-Text Guidance},
  author={Cheng, Yongkang and Liang, Mingjiang and Huang, Shaoli and Ning, Jifeng and Liu, Wei},
  booktitle={2024 IEEE International Conference on Multimedia and Expo (ICME)},
  pages={1--6},
  year={2024},
  organization={IEEE}
}

Acknowledgement

The PyTorch implementation of ExpGest is based on DiffuseStyleGesture. We also use parts of the excellent code from FreeTalker and MLD. We thank all the authors for their impressive work!

Contact

For technical questions, please contact cyk990422@gmail.com
