Description
The weights for the acoustic target features (viz. duration, log F0, and log F0 delta) are set to 0 by default, which disables these features for newly built unit-selection voices. These default values were initially hard-coded and never updated, so the weights can only be tuned (or modified at all) by manually editing the mary/halfphoneUnitFeatureDefinition_ac.txt file after it is generated by the AcousticFeatureFileWriter voicebuilding component.
However, virtually all of our published voices do contain manually tweaked acoustic feature weights, enabling prosody for unit selection, and significantly reducing pitch discontinuities in the resulting synthesis output.
To wit,
$ for j in lib/*.jar; do echo "$j"; unzip -c "$j" '*_ac.txt' | tail -n5; done
lib/voice-bits3-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
500 linear | unit_duration
50 linear | unit_logf0
50 linear | unit_logf0delta
lib/voice-dfki-obadiah-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1500 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta
lib/voice-dfki-pavoque-neutral-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta
lib/voice-dfki-pavoque-styles-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta
lib/voice-dfki-poppy-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta
lib/voice-dfki-prudence-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta
lib/voice-dfki-spike-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
2000 linear | unit_duration
50 linear | unit_logf0
0 linear | unit_logf0delta
lib/voice-voxforge-ru-nsh-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
0 linear | unit_duration
0 linear | unit_logf0
0 linear | unit_logf0delta
Accordingly, I believe that enabling the acoustic features, by at least setting reasonable default weights for duration and log F0, would be an improvement. This is especially true when no weight tuning is applied, e.g., in unsupervised voicebuilding workflows.
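Until such defaults exist, the manual edit described above could be scripted. The following is only a rough sketch: the function name set_default_ac_weights is hypothetical, the line format is assumed from the listing above (weight, "linear", "|", feature name), and the weights 1000 and 100 are merely the most common values in that listing, not official or recommended defaults.

```shell
#!/bin/sh
# Hypothetical helper (not part of MaryTTS): replace zero weights for
# unit_duration and unit_logf0 in a feature definition file.
# Nonzero weights, i.e. ones that were already tuned, are left untouched,
# as is unit_logf0delta.
# Assumed line format, taken from the listing above:
#   <weight> linear | <feature name>
set_default_ac_weights() {
    file=$1
    duration_weight=${2:-1000}   # illustrative value, not an official default
    logf0_weight=${3:-100}       # illustrative value, not an official default
    tmp=$(mktemp) || return 1
    awk -v dur="$duration_weight" -v f0="$logf0_weight" '
        # Only touch lines whose weight is exactly 0.
        $1 == "0" && $2 == "linear" && $3 == "|" && $4 == "unit_duration" { $1 = dur }
        $1 == "0" && $2 == "linear" && $3 == "|" && $4 == "unit_logf0"    { $1 = f0 }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

It would be run on the generated file, e.g. `set_default_ac_weights mary/halfphoneUnitFeatureDefinition_ac.txt`, optionally followed by explicit duration and log F0 weights as the second and third arguments.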