[Paper] - [Code] - [Demo] - [Weights]
You can install the Python dependencies with
pip install -r requirements.txt
The supported dataset is
- LJSpeech: a single-speaker English dataset recorded by one speaker with an American accent. It contains 13,100 short audio clips of passages read from seven non-fiction, public domain books. The total audio duration is around 24 hours, which makes it well suited for training single-speaker text-to-speech models.
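As a quick sanity check before training, the transcripts can be inspected with a few lines of Python. This is a minimal sketch, assuming the standard LJSpeech distribution layout, in which `metadata.csv` holds one clip per line as `id|transcription|normalized transcription`. The function name `read_ljspeech_metadata` is hypothetical and not part of this repository, and the repository may expect its own filelist format instead.

```python
import csv


def read_ljspeech_metadata(lines):
    """Yield (clip_id, text) pairs from LJSpeech-style metadata lines.

    Assumes pipe-delimited fields: id|transcription|normalized transcription.
    The normalized transcription is preferred when it is present.
    """
    # QUOTE_NONE keeps quotation marks inside the transcripts untouched.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        clip_id = row[0]
        text = row[2] if len(row) > 2 and row[2] else row[1]
        yield clip_id, text


# Example with one in-memory line; for the real file, pass an open file object:
#   with open("LJSpeech-1.1/metadata.csv", encoding="utf-8") as f:  # assumed path
#       pairs = list(read_ljspeech_metadata(f))
sample = ["LJ001-0001|Printing, in 1485|Printing, in fourteen eighty-five"]
pairs = list(read_ljspeech_metadata(sample))
```

Each clip id corresponds to a file `wavs/<clip_id>.wav` in the standard distribution.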
Train with
python train.py -c configs/vits2_ljs_ring.json -m "model name"
Use
tensorboard --logdir ./logs/
to serve TensorBoard on localhost. It displays the loss curves, synthesized mel-spectrograms, and audio samples.
Synthesize speech with
python inference.py --text "your text to synthesize" --output "output.wav"
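To synthesize several sentences in one go, the command above can be wrapped in a small script. This is a sketch that only relies on the `--text` and `--output` flags shown above. The helper names, the `sample_000.wav` naming scheme, and the `dry_run` switch are assumptions for illustration, not part of the repository.

```python
import subprocess
import sys


def build_inference_command(text, output_path):
    """Build the argument list for one inference.py call."""
    return [sys.executable, "inference.py", "--text", text, "--output", output_path]


def synthesize_all(sentences, prefix="sample", dry_run=True):
    """Build (and optionally run) one inference command per sentence."""
    commands = []
    for index, sentence in enumerate(sentences):
        command = build_inference_command(sentence, f"{prefix}_{index:03d}.wav")
        commands.append(command)
        if not dry_run:
            # check=True raises CalledProcessError if inference.py fails.
            subprocess.run(command, check=True)
    return commands


# With dry_run=True nothing is executed; the commands are only assembled.
commands = synthesize_all(["Hello there.", "This is a second sentence."])
```

Passing `dry_run=False` from the repository root runs each command in turn. Note that every call starts a new Python process, so the model is reloaded for each sentence.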