Description
Discussed in #639
Originally posted by BillyBobQuebec July 10, 2021
I am training Tacotron2-DDC (LJ) from scratch using the provided recipe with no changes. TensorBoard looks good to my eyes, but the alignment and duration seem to be way off when I actually run inference. I suspect the model is being run improperly at inference time, specifically with the wrong r value. Here is the command that I used to start training:
cd ~/repo/coqui-clean
bash recipes/ljspeech/tacotron2-DDC/run.sh
The recipe uses gradual training, which takes "r" as the starting value for the fine decoder (if I understand it correctly) and then changes it over time during training. I suspect inference is using the starting r value instead of the latest r value the fine decoder reached during training. When I try to force a different r value for inference (by passing a recipe config with "r": 2 instead of "r": 6), it gives me this error:
RuntimeError: Error(s) in loading state_dict for Tacotron2:
size mismatch for decoder.linear_projection.linear_layer.weight: copying a param with shape torch.Size([480, 1536]) from checkpoint, the shape in current model is torch.Size([160, 1536]).
size mismatch for decoder.linear_projection.linear_layer.bias: copying a param with shape torch.Size([480]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for decoder.stopnet.1.linear_layer.weight: copying a param with shape torch.Size([1, 1504]) from checkpoint, the shape in current model is torch.Size([1, 1184]).
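The shapes in this error can be reproduced from r with simple arithmetic, which may help when reading it. The sketch below is only an illustration, not the library's code: it assumes 80 mel channels, a 1024-unit decoder RNN, and a 512-dimensional attention context. None of these values are stated above; they are chosen because they match the numbers in the traceback. The helper names `decoder_shapes` and `infer_r` are hypothetical.

```python
# Assumed dimensions (not stated in the post; picked to match the error message).
NUM_MELS = 80           # mel channels per output frame
DECODER_RNN_DIM = 1024  # decoder RNN hidden size
CONTEXT_DIM = 512       # attention context size


def decoder_shapes(r):
    """Expected weight shapes for a reduction factor r (frames per decoder step)."""
    out_dim = NUM_MELS * r  # the decoder emits r mel frames at once
    return {
        "linear_projection.weight": (out_dim, DECODER_RNN_DIM + CONTEXT_DIM),
        "linear_projection.bias": (out_dim,),
        "stopnet.weight": (1, DECODER_RNN_DIM + out_dim),
    }


def infer_r(projection_out_dim):
    """Recover r from the first dimension of the projection weight."""
    if projection_out_dim % NUM_MELS != 0:
        raise ValueError("output size is not a multiple of NUM_MELS")
    return projection_out_dim // NUM_MELS


if __name__ == "__main__":
    for r in (6, 2):
        print(r, decoder_shapes(r))
    print("r implied by checkpoint:", infer_r(480))
    print("r implied by config:", infer_r(160))
```

Under these assumptions, the checkpoint shapes (480 and 1504) correspond to r = 6 and the shapes built from the edited config (160 and 1184) correspond to r = 2, so the saved projection layers appear to be sized for r = 6 even at step 280000.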
Here's the command I used for inference, and here's how it sounds at different points:
cd ~
cp ~/repo/coqui-clean/recipes/ljspeech/tacotron2-DDC/scale_stats.npy .
cp ~/repo/coqui-clean/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json config.json
CUDA_VISIBLE_DEVICES="" tts \
--text "Hello I bought this T.V. today, and it's cold outside. I should probably grab my sweater and go to your moms house." \
--model_path ~/repo/coqui-clean/recipes/ljspeech/tacotron2-DDC/ljspeech-ddc-July-06-2021_09+10AM-8fbadad6/checkpoint_280000.pth.tar \
--config_path config.json \
--out_path output.wav