enable acoustic features by default

The weights for the acoustic target features (viz. duration, log F0, and log F0 delta) are set to 0 by default, which disables these features for newly built unit-selection voices. These default values were hard-coded initially and never updated, so the weights can only be tuned (or modified at all) by manually editing the mary/halfphoneUnitFeatureDefinition_ac.txt file after it has been generated by the AcousticFeatureFileWriter voicebuilding component.

However, virtually all of our published voices contain manually tweaked acoustic feature weights, which enable prosody for unit selection and significantly reduce pitch discontinuities in the resulting synthesis output.
To wit:

$ for j in lib/*.jar; do echo "$j"; unzip -c "$j" '*_ac.txt' | tail -n5; done
lib/voice-bits3-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
500 linear | unit_duration
50 linear | unit_logf0
50 linear | unit_logf0delta

lib/voice-dfki-obadiah-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1500 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta

lib/voice-dfki-pavoque-neutral-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta

lib/voice-dfki-pavoque-styles-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta

lib/voice-dfki-poppy-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta

lib/voice-dfki-prudence-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
1000 linear | unit_duration
100 linear | unit_logf0
0 linear | unit_logf0delta

lib/voice-dfki-spike-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
2000 linear | unit_duration
50 linear | unit_logf0
0 linear | unit_logf0delta

lib/voice-voxforge-ru-nsh-5.0-SNAPSHOT.jar
ContinuousFeatureProcessors
0 linear | unit_duration
0 linear | unit_logf0
0 linear | unit_logf0delta

Accordingly, I believe that enabling the acoustic features by default, at least by setting reasonable default weights for duration and log F0, would be an improvement. This is especially true when no weight tuning is applied, e.g., in unsupervised voicebuilding workflows.
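
Until the defaults change, the manual edit described above could be scripted as a post-processing step. The following is only a rough sketch: the function name set_acoustic_weights is hypothetical, and it assumes the weight lines have exactly the form "<weight> linear | <feature>" seen in the listing above. It is not part of the voicebuilding tools.

```shell
# Hypothetical helper (not part of MaryTTS): rewrite the three acoustic
# feature weights in a generated feature definition file.
# Usage: set_acoustic_weights FILE DURATION LOGF0 LOGF0DELTA
set_acoustic_weights() {
  file=$1
  tmp=$(mktemp) || return 1
  # Assumption: each weight line has four whitespace-separated fields,
  # "<weight> linear | <feature>"; every other line is copied unchanged.
  awk -v dur="$2" -v f0="$3" -v f0d="$4" '
    NF == 4 && $2 == "linear" && $3 == "|" && $4 == "unit_duration"   { $1 = dur }
    NF == 4 && $2 == "linear" && $3 == "|" && $4 == "unit_logf0"      { $1 = f0 }
    NF == 4 && $2 == "linear" && $3 == "|" && $4 == "unit_logf0delta" { $1 = f0d }
    { print }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original path so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example call, to be run after AcousticFeatureFileWriter has generated the file:
# set_acoustic_weights mary/halfphoneUnitFeatureDefinition_ac.txt 1000 100 0
```

The values 1000, 100, and 0 in the example call are simply the combination that occurs most often in the published voices listed above; they are not a tuned recommendation.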
