EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Installation

Create a separate environment if needed

conda create -n EmoVoice python=3.10
conda activate EmoVoice
pip install -r requirements.txt

Decode with checkpoints

bash examples/tts/scripts/inference_EmoVoice.sh
bash examples/tts/scripts/inference_EmoVoice-PP.sh
bash examples/tts/scripts/inference_EmoVoice_1.5B.sh
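There is one inference script per model variant (EmoVoice, EmoVoice-PP, EmoVoice_1.5B). If you only want one of them, a small wrapper keeps the choice explicit. This is a hypothetical convenience helper, not part of the repository: `run_inference` and the `DRY_RUN` switch are names introduced here, and it assumes you call it from the repository root, as in the commands above.

```shell
# Hypothetical helper, not shipped with EmoVoice: run the inference script for
# one model variant from the repository root. DRY_RUN=1 prints the command
# instead of executing it.
run_inference() {
  local variant="${1:-EmoVoice}"   # EmoVoice | EmoVoice-PP | EmoVoice_1.5B
  case "$variant" in
    EmoVoice|EmoVoice-PP|EmoVoice_1.5B) ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  local script="examples/tts/scripts/inference_${variant}.sh"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "bash $script"
  else
    bash "$script"
  fi
}
```

For example, after sourcing the file, `DRY_RUN=1 run_inference EmoVoice-PP` prints the command it would run, and `run_inference EmoVoice-PP` runs it.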

Train from scratch

# First Stage: Pretrain TTS
bash examples/tts/scripts/pretrain_EmoVoice.sh
bash examples/tts/scripts/pretrain_EmoVoice-PP.sh
bash examples/tts/scripts/pretrain_EmoVoice_1.5B.sh

# Second Stage: Finetune Emotional TTS
bash examples/tts/scripts/ft_EmoVoice.sh
bash examples/tts/scripts/ft_EmoVoice-PP.sh
bash examples/tts/scripts/ft_EmoVoice_1.5B.sh
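Training is two-staged, and each variant has a matching pretrain and finetune script. The following sketch runs both stages for one variant in order and stops if the first stage fails. It is hypothetical: `train_variant` and `DRY_RUN` are not part of the repository. It also assumes each script is runnable as-is. Any checkpoint path the finetuning script expects from the pretraining stage is configured inside the scripts themselves and is not handled here.

```shell
# Hypothetical helper, not shipped with EmoVoice: run both training stages for
# one variant in order (stage 1 pretraining, then stage 2 emotional finetuning).
# DRY_RUN=1 prints the commands instead of executing them.
train_variant() {
  local variant="${1:-EmoVoice}"   # EmoVoice | EmoVoice-PP | EmoVoice_1.5B
  case "$variant" in
    EmoVoice|EmoVoice-PP|EmoVoice_1.5B) ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  local stage script
  for stage in pretrain ft; do
    script="examples/tts/scripts/${stage}_${variant}.sh"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "bash $script"
    else
      # Stop at the first failing stage so finetuning never starts after a failed pretrain.
      bash "$script" || return 1
    fi
  done
}
```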

Checkpoints

Checkpoints are available on Hugging Face: https://huggingface.co/yhaha/EmoVoice
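One way to fetch the checkpoints is the `huggingface-cli download` command from the `huggingface_hub` package. The sketch below wraps it in a function. The function name, the `DRY_RUN` switch, and the default target directory `checkpoints/EmoVoice` are assumptions made for illustration. Point the inference scripts at wherever you actually store the files.

```shell
# Sketch: fetch the released checkpoints from the Hugging Face repo yhaha/EmoVoice
# using the huggingface_hub CLI (pip install -U huggingface_hub).
# The default target directory is an assumption; adjust it to match the
# checkpoint paths used by the inference scripts.
download_checkpoints() {
  local dest="${1:-checkpoints/EmoVoice}"
  local cmd=(huggingface-cli download yhaha/EmoVoice --local-dir "$dest")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"   # print the command only
  else
    "${cmd[@]}"
  fi
}
```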

Dataset

Pretrain TTS: VoiceAssistant
Finetune Emotional TTS: EmoVoice-DB and part of laions_got_talent

Acknowledgements

Our code is built on SLAM-LLM.

Citation

If our work or codebase is useful to you, please cite it as follows:

@article{yang2025emovoice,
  title={EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting},
  author={Yang, Guanrou and Yang, Chen and Chen, Qian and Ma, Ziyang and Chen, Wenxi and Wang, Wen and Wang, Tianrui and Yang, Yifan and Niu, Zhikang and Liu, Wenrui and others},
  journal={arXiv preprint arXiv:2504.12867},
  year={2025}
}

Paper link: https://arxiv.org/abs/2504.12867

License

Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC license because their training data includes Emilia, an in-the-wild dataset. Sorry for any inconvenience this may cause.
