
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Installation

Create a separate environment if needed

conda create -n EmoVoice python=3.10
conda activate EmoVoice
pip install -r requirements.txt
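
Assuming the requirements pull in PyTorch (typical for LLM-based TTS codebases; this is an assumption, not something stated in the requirements list above), you can sanity-check the install and GPU visibility before decoding:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"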

Decode with checkpoints

bash examples/tts/scripts/inference_EmoVoice.sh
bash examples/tts/scripts/inference_EmoVoice-PP.sh
bash examples/tts/scripts/inference_EmoVoice_1.5B.sh
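
The inference scripts decode test utterances with the released checkpoints. As a rough illustration of what a freestyle text prompt looks like, the sketch below builds a small JSONL manifest pairing target text with a natural-language emotion description. The field names ("text", "emotion_prompt", "output_wav") and the manifest filename are illustrative assumptions, not the repository's actual schema; check the scripts and their config files for the real input format.

# Hypothetical manifest for freestyle emotion-prompted TTS (field names assumed).
import json

samples = [
    {
        "text": "I can't believe we actually won the championship!",
        "emotion_prompt": "An ecstatic young woman, almost shouting with joy.",
        "output_wav": "outputs/sample_001.wav",
    },
]

# Write one JSON object per line, as is common for TTS test manifests.
with open("test_manifest.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")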

Train from scratch

# First Stage: Pretrain TTS
bash examples/tts/scripts/pretrain_EmoVoice.sh
bash examples/tts/scripts/pretrain_EmoVoice-PP.sh
bash examples/tts/scripts/pretrain_EmoVoice_1.5B.sh

# Second Stage: Finetune Emotional TTS
bash examples/tts/scripts/ft_EmoVoice.sh
bash examples/tts/scripts/ft_EmoVoice-PP.sh
bash examples/tts/scripts/ft_EmoVoice_1.5B.sh

Checkpoints

Dataset

Acknowledgements

Citation

If our work and codebase are useful to you, please cite:

@article{yang2025emovoice,
  title={EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting},
  author={Yang, Guanrou and Yang, Chen and Chen, Qian and Ma, Ziyang and Chen, Wenxi and Wang, Wen and Wang, Tianrui and Yang, Yifan and Niu, Zhikang and Liu, Wenrui and others},
  journal={arXiv preprint arXiv:2504.12867},
  year={2025}
}

Paper link: https://arxiv.org/abs/2504.12867

License

Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license because the training data, Emilia, is an in-the-wild dataset. Sorry for any inconvenience this may cause.
