unit selection: final boundary durations synthesized 50% shorter than requested

Using the cmu-slt unit-selection voice, the TEXT

uh.

uh.

oh.

produces the following ACOUSTPARAMS, including the predicted boundary durations:

<?xml version="1.0" encoding="UTF-8"?>
<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5" xml:lang="en-US">
  <p>
    <s>
      <phrase>
        <t accent="!H*" g2p_method="lexicon" ph="' V" pos="UH">
uh
<syllable accent="!H*" ph="V" stress="1"><ph d="398" end="0.398275" f0="(0,165) (50,267) (100,235)" p="V"/></syllable>
</t>
        <t pos=".">
.
</t>
        <boundary breakindex="5" duration="400" tone="L-L%"/>
      </phrase>
    </s>
  </p>
  <p>
    <s>
      <phrase>
        <t accent="!H*" g2p_method="lexicon" ph="' V" pos="UH">
uh
<syllable accent="!H*" ph="V" stress="1"><ph d="398" end="0.398275" f0="(0,165) (50,267) (100,235)" p="V"/></syllable>
</t>
        <t pos=".">
.
</t>
        <boundary breakindex="5" duration="400" tone="L-L%"/>
      </phrase>
    </s>
  </p>
  <p>
    <s>
      <phrase>
        <t accent="!H*" g2p_method="lexicon" ph="' @U" pos="UH">
oh
<syllable accent="!H*" ph="@U" stress="1"><ph d="338" end="0.338394" f0="(0,165) (50,311) (100,235)" p="@U"/></syllable>
</t>
        <t pos=".">
.
</t>
        <boundary breakindex="5" duration="400" tone="L-L%"/>
      </phrase>
    </s>
  </p>
</maryxml>

Note the constant duration="400" (ms) for each boundary element.

When this is actually synthesized, however, the REALISED_ACOUSTPARAMS output is:

<?xml version="1.0" encoding="UTF-8"?>
<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5" xml:lang="en-US">
  <p>
    <s>
      <phrase>
        <t accent="!H*" g2p_method="lexicon" ph="' V" pos="UH">
uh
<syllable accent="!H*" ph="V" stress="1"><ph d="88" end="0.088750005" f0="(0,165) (50,267) (100,235)" p="V" units="V_L arctic_a0146 10273 0.045; V_R arctic_a0146 10274 0.04375"/></syllable>
</t>
        <t pos=".">
.
</t>
        <boundary breakindex="5" duration="200" tone="L-L%" units="__L arctic_b0385 67582 0.2"/>
      </phrase>
    </s>
  </p>
  <p>
    <s>
      <phrase>
        <t accent="!H*" g2p_method="lexicon" ph="' V" pos="UH">
uh
<syllable accent="!H*" ph="V" stress="1"><ph d="88" end="0.088750005" f0="(0,165) (50,267) (100,235)" p="V" units="V_L arctic_a0146 10273 0.045; V_R arctic_a0146 10274 0.04375"/></syllable>
</t>
        <t pos=".">
.
</t>
        <boundary breakindex="5" duration="200" tone="L-L%" units="__L arctic_b0385 67582 0.2"/>
      </phrase>
    </s>
  </p>
  <p>
    <s>
      <phrase>
        <t accent="!H*" g2p_method="lexicon" ph="' @U" pos="UH">
oh
<syllable accent="!H*" ph="@U" stress="1"><ph d="246" end="0.2468125" f0="(0,165) (50,311) (100,235)" p="@U" units="@U_L arctic_a0105 7295 0.0880625; @U_R arctic_b0352 65184 0.15875"/></syllable>
</t>
        <t pos=".">
.
</t>
        <boundary breakindex="5" duration="200" tone="L-L%" units="__L arctic_b0352 65185 0.2"/>
      </phrase>
    </s>
  </p>
</maryxml>

Note how the specified boundary durations have been halved from 400 to 200 ms.
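
To check this without reading the dumps by eye, the comparison can be sketched in Python. This is only an illustration, not part of MaryTTS: the helper names `boundary_durations`, `compare` and `sample` are hypothetical, and `sample` builds a stripped-down MaryXML document holding nothing but the boundary elements, standing in for the full ACOUSTPARAMS and REALISED_ACOUSTPARAMS output shown above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <maryxml> root element in the dumps above.
MARY_NS = {"m": "http://mary.dfki.de/2002/MaryXML"}


def boundary_durations(maryxml: str) -> list[int]:
    """Return the duration (ms) of every <boundary> element, in document order."""
    root = ET.fromstring(maryxml.encode("utf-8"))
    return [
        int(b.get("duration"))
        for b in root.iterfind(".//m:boundary", MARY_NS)
        if b.get("duration") is not None
    ]


def compare(requested_xml: str, realised_xml: str) -> list[tuple[int, int, float]]:
    """Pair requested and realised boundary durations and compute their ratio."""
    requested = boundary_durations(requested_xml)
    realised = boundary_durations(realised_xml)
    return [(req, real, real / req) for req, real in zip(requested, realised)]


def sample(durations: list[int]) -> str:
    """Hypothetical helper: a minimal MaryXML document with one boundary per paragraph."""
    body = "".join(
        f'<p><s><phrase><boundary breakindex="5" duration="{d}" tone="L-L%"/>'
        "</phrase></s></p>"
        for d in durations
    )
    return (
        '<maryxml xmlns="http://mary.dfki.de/2002/MaryXML" version="0.5" '
        f'xml:lang="en-US">{body}</maryxml>'
    )


if __name__ == "__main__":
    # Values from the report: 400 ms requested, 200 ms realised, three times.
    for req, real, ratio in compare(sample([400, 400, 400]), sample([200, 200, 200])):
        print(f"requested {req} ms, realised {real} ms, ratio {ratio:.2f}")
```

Passing the two full dumps to `compare` instead of the `sample` documents should work the same way, since only the `boundary` elements are inspected.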

Furthermore, inspecting the PRAAT_TEXTGRID output (or similar) confirms that the boundaries are only 0.2 seconds long.

The units tier shows which units from the unit-selection database were selected to render the boundaries as pauses.

Interestingly, dumping and inspecting the voice data reveals that those units (indices 67582 and 65185) are actually 0.1284 and 0.1529 seconds long, respectively, so neither matches the 0.2 seconds reported for them.
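
The mismatch between the reported and stored unit lengths can be tabulated with a small Python sketch. The field layout of the `units` attribute (unit name, source utterance, unit index, duration in seconds) is an assumption inferred from the dumps above, not taken from MaryTTS documentation; the `Unit` type and `parse_units` function are hypothetical names. The stored lengths are the values quoted in this report.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    name: str          # e.g. "__L" (assumed: half-phone unit name)
    source: str        # e.g. "arctic_b0385" (assumed: source utterance)
    index: int         # unit index in the voice database
    duration_s: float  # duration in seconds as reported in REALISED_ACOUSTPARAMS


def parse_units(attr: str) -> list[Unit]:
    """Split a `units` attribute value into its ';'-separated entries."""
    units = []
    for entry in attr.split(";"):
        name, source, index, duration = entry.split()
        units.append(Unit(name, source, int(index), float(duration)))
    return units


# Unit lengths found by dumping the voice data, keyed by unit index (from this report).
DUMPED_LENGTH_S = {67582: 0.1284, 65185: 0.1529}

if __name__ == "__main__":
    for attr in ("__L arctic_b0385 67582 0.2", "__L arctic_b0352 65185 0.2"):
        (unit,) = parse_units(attr)
        stored = DUMPED_LENGTH_S[unit.index]
        print(
            f"unit {unit.index}: reported {unit.duration_s:.4f} s, "
            f"stored {stored:.4f} s, difference {unit.duration_s - stored:+.4f} s"
        )
```

The same parser also handles the two-entry values on the `ph` elements, whose durations sum to the `end` value of the phone.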

TL;DR: The duration attributes of boundary elements are reduced by 50% between the specified ACOUSTPARAMS and the REALISED_ACOUSTPARAMS, and the synthesized pauses are correspondingly too short.
