TTS HW 3

Implementation of a TTS pipeline using a FastSpeech2 model trained on the LJSpeech dataset.

See the results of both models at the end of this README.

Checkpoints

The first model was trained with the following configuration: batch size=20, batch expand size=24, AdamW with warmup, max_lr=5e-4, len_epoch=3000, num_epochs=40, grad_norm_clip=2.
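The two batch settings are easier to read with an example. The sketch below assumes the loader follows the common FastSpeech-style scheme: it collects batch size × batch expand size samples, sorts them by length, and cuts them into real batches of similar-length items. This is an assumption made for illustration, not a copy of the repository's loader; `expand_and_split` and the sample layout are hypothetical names.

```python
def expand_and_split(samples, batch_size, batch_expand_size):
    """Cut one oversized chunk into length-sorted real batches.

    `samples` is a list of (utterance_id, length) pairs (hypothetical layout).
    """
    # Use at most batch_size * batch_expand_size samples per chunk
    chunk = samples[: batch_size * batch_expand_size]
    # Sort by length so every real batch needs little padding
    ordered = sorted(chunk, key=lambda sample: sample[1])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# 20 * 24 = 480 toy samples with made-up lengths
chunk = [(f"utt{i}", (i * 7) % 13 + 1) for i in range(20 * 24)]
batches = expand_and_split(chunk, batch_size=20, batch_expand_size=24)
print(len(batches), len(batches[0]))  # 24 20
```

Under this assumption, each optimizer step still sees 20 samples, while the expand size only controls how many samples are grouped by length at once.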

The second model uses pre-layer normalization in attention. It was trained with batch size=64, max_lr=1e-3. The initialization in MultiHead Attention was replaced with Xavier uniform. All other configuration parameters are the same. To use the model with or without pre-layer norm, add "attn_use_prelayer_norm": true/false to the model's config.
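As a rough illustration of what the pre-layer norm switch changes, the sketch below contrasts the two orderings of normalization and residual connection around a sublayer. It is a minimal pure-Python sketch of the general technique, not the repository's attention code: `layer_norm` has no learnable scale or bias, and `residual_block` is a hypothetical helper.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learnable parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(x, sublayer, use_prelayer_norm):
    """Apply a sublayer with a residual connection in pre-LN or post-LN order."""
    if use_prelayer_norm:
        # Pre-LN: normalize the input, apply the sublayer, then add the residual
        y = sublayer(layer_norm(x))
        return [a + b for a, b in zip(x, y)]
    # Post-LN: apply the sublayer, add the residual, then normalize the sum
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def double(v):
    # Stand-in for attention: any function mapping a vector to a vector
    return [2.0 * t for t in v]


x = [1.0, 2.0, 3.0, 4.0]
print(residual_block(x, double, use_prelayer_norm=True))
print(residual_block(x, double, use_prelayer_norm=False))
```

In the pre-LN ordering the residual path carries the input through unnormalized, which is the usual argument for why it tolerates larger learning rates.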

Installation guide

pip install -r ./requirements.txt

To reproduce training, download the necessary files (LJSpeech, its mel spectrograms, alignments for FastSpeech, and WaveGlow model weights) using a shell script.

sh scripts/download_data.sh

To get pitches and energies, run scripts/setup.py, which saves them in the same data folder as the other files.

python scripts/setup.py

Configs can be found in the hw_tts/configs folder. In particular, use config_test.json for testing.

Training

Parameters set in the config can be overridden by passing the corresponding flags on the command line.

python train.py -c CONFIG -r CHECKPOINT -k WANDB_KEY --wandb_run_name WANDB_RUN_NAME --n_gpu NUM_GPU --batch_size REAL_BATCH_SIZE --batch_expand_size MULTIPLIER_FOR_BATCH_SAMPLING --len_epoch ITERS_PER_EPOCH --waveglow_path WAVEGLOW_WEIGHTS_PATH --data_path PATH_TO_TRAIN_TEXTS --mel_ground_truth PATH_TO_GT_MELS --alignment_path PATH_TO_GT_ALIGNMENTS --pitch_path PATH_TO_GT_PITCHES --energy_path PATH_TO_GT_ENERGIES
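The override behavior can be pictured with a small sketch: a flag that is passed replaces the config value, and a flag that is omitted leaves it untouched. The flag names `--batch_size` and `--len_epoch` come from the command above; the flat config dictionary and the `apply_overrides` helper are assumptions for illustration, and the real config layout and parsing code may differ.

```python
import argparse


def apply_overrides(config, overrides):
    """Return a copy of config where every non-None override wins."""
    merged = dict(config)
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    return merged


parser = argparse.ArgumentParser()
# default=None marks "flag not given", so the config value is kept
parser.add_argument("--batch_size", type=int, default=None)
parser.add_argument("--len_epoch", type=int, default=None)

# Hypothetical flat config; the real config layout may differ
config = {"batch_size": 20, "len_epoch": 3000}
args = parser.parse_args(["--batch_size", "64"])
config = apply_overrides(config, vars(args))
print(config)  # {'batch_size': 64, 'len_epoch': 3000}
```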

Testing

To use the model with or without pre-layer norm, add "attn_use_prelayer_norm": true/false to the model's config.

python test.py -c hw_tts/configs/config_test.json -r CHECKPOINT -t test.txt -o output

test.txt is a file with 3 sentences for evaluation, each on its own line.
output is the folder where the results are saved.

You can tune parameters to speed the audio up or slow it down, raise or lower its pitch, and change its energy. The parameter variants used for generation are listed in config_test.json under params_list. Each entry is a triplet: the first value controls duration (a greater value slows the audio down), the second controls pitch (a greater value raises it), and the third controls energy (a greater value means lower volume).

The current parameter list generates the following audio variants for each text in test.txt:

regular generated audio
audio with +20% / -20% applied to pitch, speed, or energy, one at a time
audio with +20% / -20% applied to pitch, speed, and energy together
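The variants listed above add up to nine triplets per sentence. The sketch below enumerates them in (duration, pitch, energy) order, assuming the values are multipliers of 0.8, 1.0, and 1.2. `build_params_list` is a hypothetical helper, and the actual order and values in config_test.json may differ.

```python
def build_params_list(up=1.2, down=0.8):
    """Enumerate (duration, pitch, energy) multiplier triplets."""
    # Regular audio: every control left at 1.0
    params = [(1.0, 1.0, 1.0)]
    # One control at a time, raised and lowered
    for position in range(3):
        for factor in (up, down):
            triplet = [1.0, 1.0, 1.0]
            triplet[position] = factor
            params.append(tuple(triplet))
    # All three controls changed together
    params.append((up, up, up))
    params.append((down, down, down))
    return params


params_list = build_params_list()
print(len(params_list))  # 9
```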

Results

Generated audio for the following 3 sentences. The number at the start of each filename is the index of the sentence.

A defibrillator is a device that gives a high energy electric shock to the heart of someone who is in cardiac arrest

Massachusetts Institute of Technology may be best known for its math, science and engineering education

Wasserstein distance or Kantorovich Rubinstein metric is a distance function defined between probability distributions on a given metric space

I decided to publish the results of both models described in the report, because the difference in their generation quality is quite subjective.

First model

3_speed.1_pitch.1_energy.1.mp4

3_speed.1_pitch.1_energy.1.2.mp4

3_speed.1_pitch.1_energy.0.8.mp4

3_speed.1_pitch.1.2_energy.1.mp4

3_speed.1_pitch.0.8_energy.1.mp4

3_speed.1.2_pitch.1_energy.1.mp4

3_speed.1.2_pitch.1.2_energy.1.2.mp4

3_speed.0.8_pitch.1_energy.1.mp4

3_speed.0.8_pitch.0.8_energy.0.8.mp4

2_speed.1_pitch.1_energy.1.mp4

2_speed.1_pitch.1_energy.1.2.mp4

2_speed.1_pitch.1_energy.0.8.mp4

2_speed.1_pitch.1.2_energy.1.mp4

2_speed.1_pitch.0.8_energy.1.mp4

2_speed.1.2_pitch.1_energy.1.mp4

2_speed.1.2_pitch.1.2_energy.1.2.mp4

2_speed.0.8_pitch.1_energy.1.mp4

2_speed.0.8_pitch.0.8_energy.0.8.mp4

1_speed.1_pitch.1_energy.1.mp4

1_speed.1_pitch.1_energy.1.2.mp4

1_speed.1_pitch.1_energy.0.8.mp4

1_speed.1_pitch.1.2_energy.1.mp4

1_speed.1_pitch.0.8_energy.1.mp4

1_speed.1.2_pitch.1_energy.1.mp4

1_speed.1.2_pitch.1.2_energy.1.2.mp4

1_speed.0.8_pitch.1_energy.1.mp4

1_speed.0.8_pitch.0.8_energy.0.8.mp4

Second model

3_speed.1_pitch.1_energy.1.2.mp4

3_speed.1_pitch.1_energy.0.8.mp4

3_speed.1_pitch.1.2_energy.1.mp4

3_speed.1_pitch.0.8_energy.1.mp4

3_speed.1.2_pitch.1_energy.1.mp4

3_speed.1.2_pitch.1.2_energy.1.2.mp4

3_speed.0.8_pitch.1_energy.1.mp4

3_speed.0.8_pitch.0.8_energy.0.8.mp4

2_speed.1_pitch.1_energy.1.mp4

2_speed.1_pitch.1_energy.1.2.mp4

2_speed.1_pitch.1_energy.0.8.mp4

2_speed.1_pitch.1.2_energy.1.mp4

2_speed.1_pitch.0.8_energy.1.mp4

2_speed.1.2_pitch.1_energy.1.mp4

2_speed.1.2_pitch.1.2_energy.1.2.mp4

2_speed.0.8_pitch.1_energy.1.mp4

2_speed.0.8_pitch.0.8_energy.0.8.mp4

1_speed.1_pitch.1_energy.1.mp4

1_speed.1_pitch.1_energy.1.2.mp4

1_speed.1_pitch.1_energy.0.8.mp4

1_speed.1_pitch.1.2_energy.1.mp4

1_speed.1_pitch.0.8_energy.1.mp4

1_speed.1.2_pitch.1_energy.1.mp4

1_speed.1.2_pitch.1.2_energy.1.2.mp4

1_speed.0.8_pitch.1_energy.1.mp4

1_speed.0.8_pitch.0.8_energy.0.8.mp4

3_speed.1_pitch.1_energy.1.mp4

kkorolev1/tts_dla
