Inference/recipe not working properly.

Discussed in #639

Originally posted by BillyBobQuebec July 10, 2021
I am training Tacotron2-DDC (LJ) from scratch using the provided recipe with no changes. TensorBoard looks good to my eyes, but the alignment and duration seem way off when I actually run inference. I suspect the model is being run improperly at inference time, specifically with the wrong r value. Here is the command I used to start training:

cd ~/repo/coqui-clean
bash recipes/ljspeech/tacotron2-DDC/run.sh

Since the recipe uses gradual training, which takes "r" as the starting value for the fine decoder (if I understand it correctly) and then changes it over the course of training, I suspect inference is using the starting r value instead of the latest r value the fine decoder reached during training. When I try to force a different r value for inference (by passing a recipe config with "r": 2 instead of "r": 6), it gives me this error:

RuntimeError: Error(s) in loading state_dict for Tacotron2:
size mismatch for decoder.linear_projection.linear_layer.weight: copying a param with shape torch.Size([480, 1536]) from checkpoint, the shape in current model is torch.Size([160, 1536]).
size mismatch for decoder.linear_projection.linear_layer.bias: copying a param with shape torch.Size([480]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for decoder.stopnet.1.linear_layer.weight: copying a param with shape torch.Size([1, 1504]) from checkpoint, the shape in current model is torch.Size([1, 1184]).
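
The shapes in this error can be read as a function of r. The following is a minimal sketch, not code from the TTS repository: it assumes 80 mel channels per frame (not stated in the post) and that the projection layer emits r frames per decoder step. The helper names and the 1024 decoder width are hypothetical, inferred only from the numbers in the traceback.

```python
# Sketch: relate the layer shapes in the size-mismatch error to the r value.
# Assumption: 80 mel channels per frame (not stated in the post).
# Assumption: the projection layer outputs r frames per decoder step.

NUM_MELS = 80  # assumed mel channel count


def infer_r(proj_out_features: int, num_mels: int = NUM_MELS) -> int:
    """Recover r from the output size of decoder.linear_projection."""
    if proj_out_features % num_mels != 0:
        raise ValueError(
            f"{proj_out_features} is not a multiple of {num_mels} mel channels"
        )
    return proj_out_features // num_mels


def stopnet_in_features(r: int, query_dim: int = 1024,
                        num_mels: int = NUM_MELS) -> int:
    """Stopnet input width if it sees the decoder state plus r output frames.

    query_dim=1024 is an assumption chosen to match the traceback numbers.
    """
    return query_dim + num_mels * r


if __name__ == "__main__":
    # Shapes taken from the error message above.
    print("checkpoint r:", infer_r(480))   # layer stored in the checkpoint
    print("config r:", infer_r(160))       # layer built from the edited config
    print("stopnet width at r=6:", stopnet_in_features(6))
    print("stopnet width at r=2:", stopnet_in_features(2))
```

Under these assumptions the checkpoint layers were built with r = 6 and the edited config builds them with r = 2, which would explain why the state_dict cannot be loaded when only the "r" field in the config is changed.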

Here's the command I used for inference, and here's how it sounds at different checkpoints:

cd ~
cp ~/repo/coqui-clean/recipes/ljspeech/tacotron2-DDC/scale_stats.npy .
cp ~/repo/coqui-clean/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json config.json
CUDA_VISIBLE_DEVICES="" tts \
  --text "Hello I bought this T.V. today, and it's cold outside. I should probably grab my sweater and go to your moms house." \
  --model_path ~/repo/coqui-clean/recipes/ljspeech/tacotron2-DDC/ljspeech-ddc-July-06-2021_09+10AM-8fbadad6/checkpoint_280000.pth.tar \
  --config_path config.json \
  --out_path output.wav

280k.mp4

110k.mp4

50k.mp4
