A syllable based statistical text to speech system
, Abhijit Pradhan, Aswin Shanmugam S, Anusha Prakash,
Published in IEEE
2013
Abstract
A statistical parametric speech synthesis system uses triphones, phones or full-context phones to address the problem of co-articulation. In this paper, syllables are used as the basic units in the parametric synthesiser. Conventionally, full-context phones in a hidden Markov model (HMM) based speech synthesis framework are modeled with a fixed number of states. This is because each phoneme corresponds to a single indivisible sound. On the other hand, a syllable is made up of a sequence of one or more sounds. To accommodate this variation, a variable number of states is used to model a syllable. Although a variable number of states is required to model syllables, a syllable captures co-articulation well since it is the smallest production unit. A syllable-based speech synthesis system therefore does not require a well-designed question set. The total number of syllables in a language is quite high, and not all of them can be modeled. To address this issue, a fallback unit is modeled instead. The quality of the proposed system is comparable to that of the phoneme-based system in terms of DMOS and WER. © 2013 EURASIP.
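The two ideas in the abstract, a state count that varies with the syllable and a fallback for syllables that are not modeled, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the rule that a syllable gets a fixed number of states per constituent phone, the value of 5 states per phone, the use of phones as the fallback unit, and the names TRAINED_SYLLABLES, num_states and plan_units.

```python
from typing import List, Set, Tuple

# Assumption: 5 emitting states per phone. The abstract does not give a number.
STATES_PER_PHONE = 5

# Hypothetical inventory of syllables that have their own trained model.
TRAINED_SYLLABLES: Set[str] = {"ka", "tam", "a"}


def num_states(phones: List[str], states_per_phone: int = STATES_PER_PHONE) -> int:
    """Assumed rule: the state count grows with the number of sounds in the unit."""
    return states_per_phone * len(phones)


def plan_units(syllables: List[Tuple[str, List[str]]]) -> List[Tuple[str, int]]:
    """Return (unit name, state count) for every unit that would be synthesised.

    Each input item is (syllable, its phone sequence). A syllable with a trained
    model is kept whole; any other syllable is replaced by its phones, which is
    the fallback unit assumed in this sketch.
    """
    plan: List[Tuple[str, int]] = []
    for name, phones in syllables:
        if name in TRAINED_SYLLABLES:
            plan.append((name, num_states(phones)))
        else:
            # Fallback: one fixed-size model per phone.
            plan.extend((phone, STATES_PER_PHONE) for phone in phones)
    return plan


if __name__ == "__main__":
    text = [("ka", ["k", "a"]), ("stri", ["s", "t", "r", "i"])]
    for unit, states in plan_units(text):
        print(unit, states)
```

In this sketch the modeled syllable "ka" receives one model whose length depends on its two phones, while the unmodeled "stri" is decomposed into four fixed-length phone models.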
About the journal
Journal: 21st European Signal Processing Conference (EUSIPCO 2013)
Publisher: IEEE
ISSN: 2219-5491
Open Access: No
Concepts (11)
  •  Signal processing
  •  Telephone sets
  •  HTS
  •  PRODUCTION UNITS
  •  SPEECH SYNTHESIS SYSTEM
  •  STATISTICAL PARAMETRIC SPEECH SYNTHESIS
  •  STATISTICAL TTS
  •  SYLLABLE
  •  TEXT-TO-SPEECH SYSTEM
  •  TTS
  •  Speech synthesis