StyleTTS2-Vocos

StyleTTS2-Vocos is a modified version of StyleTTS2 that replaces the original HiFi-GAN decoder with the Vocos decoder. This project maintains the advantages of the original StyleTTS2 while providing improved inference speed and memory efficiency through the Vocos decoder.

Acknowledgments

This project is built upon StyleTTS2 by @yl4579. I express my deepest gratitude to the original authors for their groundbreaking work.

Key Changes

Vocos Decoder Integration
- Replaced the original StyleTTS2 HiFi-GAN decoder with the Vocos decoder
- Significantly lower, near-constant memory usage during inference
- Faster inference at comparable quality: generation time grows only marginally with audio length, rather than proportionally as in GAN-based models
- For the implementation of the Vocos module, see Modules/vocos.py
- Note: the Vocos decoder offers significant advantages in memory usage and inference speed, but the current implementation in vocos.py may not reach the audio quality of the original HiFi-GAN decoder. I believe the gap comes from the current implementation of the Vocos module, particularly the source module and the forward pass. Contributions that improve quality while keeping the efficiency benefits are welcome.

No-BERT Version
- Provides a version without the PL-BERT model
- Intended for languages, such as Korean, that the multilingual PL-BERT model does not support
- Minimal performance degradation without PL-BERT
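
The speed claim for the Vocos decoder can be checked on your own hardware by timing synthesis for inputs of increasing length. The following is a minimal sketch, not part of this repository: `synthesize` is a hypothetical callable that you would wrap around the inference code from the Demo notebooks, and text length is used only as a rough proxy for audio length.

```python
import time


def time_by_length(synthesize, texts, repeats=3):
    """Return a list of (text length, best wall-clock seconds) pairs.

    `synthesize` is a hypothetical callable taking one text string.
    If it runs on a GPU it should block until the audio is ready
    (with PyTorch, for example, by calling torch.cuda.synchronize()),
    otherwise only the kernel launch time is measured.
    """
    results = []
    for text in texts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            synthesize(text)
            best = min(best, time.perf_counter() - start)
        results.append((len(text), best))
    return results


if __name__ == "__main__":
    # Stand-in for a real model: sleeps in proportion to the input length.
    def fake_synthesize(text):
        time.sleep(0.001 * len(text))

    for length, seconds in time_by_length(fake_synthesize, ["a" * 10, "a" * 40]):
        print(length, "%.4f s" % seconds)
```

Running the same loop once with the Vocos checkpoint and once with a HiFi-GAN checkpoint, on identical texts, gives a like-for-like comparison of how generation time scales.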

Model Download

Pre-trained models can be downloaded from the following Hugging Face repository: https://huggingface.co/5Hyeons/StyleTTS2

Training

The training process consists of two stages, similar to the original StyleTTS2:

First stage training:

accelerate launch train_first.py --config_path ./Configs/config.yml

Second stage training:

python train_second.py --config_path ./Configs/config.yml

For No-BERT version:

python train_second_no_bert.py --config_path ./Configs/config.yml
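
The commands above can also be assembled programmatically, for example when scripting both stages in sequence. This is a minimal sketch: `build_training_commands` is a hypothetical helper, and it assumes the No-BERT variant uses the same first stage script, which the commands above suggest but do not state.

```python
import shlex

# Launchers and script names are copied from the commands in this section.
STAGE_COMMANDS = {
    "first": ["accelerate", "launch", "train_first.py"],
    "second": ["python", "train_second.py"],
    "second_no_bert": ["python", "train_second_no_bert.py"],
}


def build_training_commands(config_path="./Configs/config.yml", use_bert=True):
    """Return the two training commands as argv lists, in run order."""
    second_stage = "second" if use_bert else "second_no_bert"
    return [
        STAGE_COMMANDS[stage] + ["--config_path", config_path]
        for stage in ("first", second_stage)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in build_training_commands(use_bert=False):
        print(shlex.join(argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` so that the second stage starts only if the first one exits successfully.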

Checkpoints are saved as "epoch_1st_%05d.pth" (first stage) and "epoch_2nd_%05d.pth" (second stage), where %05d is the zero-padded epoch number. Checkpoints and TensorBoard logs are written to log_dir.
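
The checkpoint naming pattern makes it easy to locate the most recent checkpoint of a stage. The sketch below assumes only the file name format given above; `checkpoint_name` and `latest_checkpoint` are hypothetical helpers, not functions from this repository.

```python
import re
from pathlib import Path

# Matches names such as epoch_1st_00012.pth or epoch_2nd_00003.pth.
CHECKPOINT_RE = re.compile(r"^epoch_(1st|2nd)_(\d{5,})\.pth$")


def checkpoint_name(stage, epoch):
    """Format a checkpoint file name; stage must be '1st' or '2nd'."""
    if stage not in ("1st", "2nd"):
        raise ValueError("stage must be '1st' or '2nd'")
    return "epoch_%s_%05d.pth" % (stage, epoch)


def latest_checkpoint(log_dir, stage):
    """Return the Path of the highest-epoch checkpoint for a stage, or None."""
    best = None
    for path in Path(log_dir).glob("epoch_%s_*.pth" % stage):
        match = CHECKPOINT_RE.match(path.name)
        if match is None:
            continue  # skip files that only resemble the pattern
        epoch = int(match.group(2))
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best[1] if best else None
```

For example, `latest_checkpoint(log_dir, "1st")` can be used to pick the first stage checkpoint to carry into second stage training, with `log_dir` being whatever directory your configuration specifies.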

Key Differences from Original StyleTTS2

The configuration file has two main differences from the original StyleTTS2:

Decoder Type:

decoder:
  type: 'vocos'  # Changed from original 'hifigan' or 'istftnet'

Root Path:

data_params:
  root_path: "/data/LibriTTS"  # Update this path according to your dataset location
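
Both settings can be sanity-checked before a long training run starts. The following is a minimal sketch that inspects an already loaded configuration dictionary. `check_config` is a hypothetical helper, and it assumes the keys sit exactly where the fragments above show them; the real config.yml may nest them under another section, in which case the lookups need adjusting.

```python
from pathlib import Path

# 'vocos' is the value used by this project; 'hifigan' and 'istftnet'
# are the original options named in the comment above.
KNOWN_DECODERS = {"vocos", "hifigan", "istftnet"}


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []

    decoder_type = config.get("decoder", {}).get("type")
    if decoder_type not in KNOWN_DECODERS:
        problems.append("decoder.type is %r, expected one of %s"
                        % (decoder_type, sorted(KNOWN_DECODERS)))
    elif decoder_type != "vocos":
        problems.append("decoder.type is %r, this project expects 'vocos'"
                        % decoder_type)

    root_path = config.get("data_params", {}).get("root_path")
    if not root_path:
        problems.append("data_params.root_path is not set")
    elif not Path(root_path).is_dir():
        problems.append("data_params.root_path %r is not a directory" % root_path)

    return problems
```

An empty list means both settings look plausible. The dictionary itself would typically come from parsing the YAML file with a YAML library of your choice.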

Inference

Please refer to the Jupyter notebooks in the Demo folder for detailed inference examples:

- Inference_stage1.ipynb: for first stage inference
- Inference_stage2.ipynb: for second stage inference

Contributing

I welcome contributions to improve the Vocos decoder implementation, particularly in the following areas:

- Source module implementation in Modules/vocos.py
- Forward pass optimization
- Quality improvements while maintaining the efficiency benefits

If you have ideas or implementations that could improve the audio quality while maintaining the efficiency benefits, please feel free to contribute.

License

This project is licensed under the MIT License. See the LICENSE file for details.
