A syllable based statistical text to speech system

Hema Murthy; Abhijit Pradhan; Aswin Shanmugam S; Anusha Prakash; Veezhinathan Kamakoti

Published in IEEE

2013

Abstract

A statistical parametric speech synthesis system uses triphones, phones or full-context phones to address the problem of co-articulation. In this paper, syllables are used as the basic units in the parametric synthesiser. Conventionally, full-context phones in a hidden Markov model (HMM) based speech synthesis framework are modeled with a fixed number of states, because each phoneme corresponds to a single indivisible sound. A syllable, on the other hand, is made up of a sequence of one or more sounds. To accommodate this variation, a variable number of states is used to model a syllable. Although this variable number of states is required, a syllable captures co-articulation well since it is the smallest production unit. A syllable-based speech synthesis system therefore does not require a well-designed question set. The total number of syllables in a language is quite high, and not all of them can be modeled. To address this issue, a fallback unit is modeled instead. The quality of the proposed system is comparable to that of the phoneme-based system in terms of degradation mean opinion score (DMOS) and word error rate (WER). © 2013 EURASIP.
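
The two ideas in the abstract, a state count that varies with the syllable and a fallback for syllables that are not modeled, can be illustrated with a minimal Python sketch. The abstract does not say how the number of states is chosen or what the fallback unit is, so everything specific here is an assumption made for illustration: the rule of three states per constituent sound, the use of the syllable's own phones as the fallback, and the names `STATES_PER_PHONE`, `states_for_syllable` and `choose_units`.

```python
from typing import List, Set, Tuple

# Assumption: illustrative value only; the abstract gives no state counts.
STATES_PER_PHONE = 3


def states_for_syllable(phones: List[str]) -> int:
    """Return a state count that grows with the number of sounds in the syllable.

    Hypothetical rule: a fixed number of states per constituent phone.
    """
    if not phones:
        raise ValueError("a syllable must contain at least one phone")
    return STATES_PER_PHONE * len(phones)


def choose_units(
    syllables: List[List[str]], modeled: Set[str]
) -> List[Tuple[str, int]]:
    """Pick a synthesis unit and a state count for each input syllable.

    A syllable that has a trained model is used whole. Otherwise it is
    replaced by fallback units, assumed here to be its individual phones.
    """
    units: List[Tuple[str, int]] = []
    for phones in syllables:
        name = "".join(phones)  # hypothetical syllable label
        if name in modeled:
            units.append((name, states_for_syllable(phones)))
        else:
            # Fallback path: one fixed-size model per phone.
            units.extend((phone, STATES_PER_PHONE) for phone in phones)
    return units


if __name__ == "__main__":
    modeled_syllables = {"ka"}
    text = [["k", "a"], ["s", "t", "r", "i"]]
    for unit, n_states in choose_units(text, modeled_syllables):
        print(unit, n_states)
```

In this sketch the modeled syllable "ka" receives a single model whose length depends on its two sounds, while the unmodeled syllable is broken into fixed-length phone models.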