A unified parser for developing Indian language text to speech synthesizers

Hema Murthy; Arun Baby; N. L. Nishanthi; Anju Leela Thomas

doi:10.1007/978-3-319-45510-5_59


Book Chapter


Published in Springer Verlag

2016


Volume: 9924 LNCS

Pages: 514 - 521

Abstract

This paper describes the design of a language-independent parser for text-to-speech synthesis in Indian languages. Indian languages come from 5–6 different language families of the world, and most have their own scripts. This makes parsing for text-to-speech systems for Indian languages a difficult task. Although the number of different families leads to divergence, there is also convergence owing to borrowings across language families. Most importantly, Indian languages are more or less phonetic and can be considered to consist broadly of about 35–38 consonants and 15–18 vowels. In this paper, an attempt is made to unify the languages based on this broad list of phones. A common label set is defined to represent the various phones in Indian languages. A uniform parser is designed across all the languages, capitalising on the syllable structure of Indian languages. The proposed parser converts UTF-8 text to the common label set, applies letter-to-sound rules, and generates the corresponding phoneme sequences. The parser is tested against the custom-built parsers for multiple Indian languages. The TTS results show that the phoneme sequences generated by the proposed parser are more accurate than those generated by language-specific parsers. © Springer International Publishing Switzerland 2016.
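The pipeline the abstract describes (UTF-8 text, then common labels, then letter-to-sound rules, then a phoneme sequence) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny Devanagari tables, the label names, and the function names `to_labels`, `apply_lts`, and `parse` are hypothetical and are not the paper's common label set or its rules. The single letter-to-sound rule shown, dropping a word-final inherent vowel as commonly happens in Hindi, is deliberately simplified.

```python
# Hypothetical, tiny Devanagari tables for illustration only;
# these are NOT the common label set defined in the paper.
CONSONANTS = {
    "\u0915": "k",  # DEVANAGARI LETTER KA
    "\u092e": "m",  # DEVANAGARI LETTER MA
    "\u0932": "l",  # DEVANAGARI LETTER LA
    "\u0928": "n",  # DEVANAGARI LETTER NA
    "\u0930": "r",  # DEVANAGARI LETTER RA
}
VOWELS = {  # independent vowel letters
    "\u0905": "a",
    "\u0906": "aa",
    "\u0907": "i",
}
VOWEL_SIGNS = {  # dependent vowel signs (matras)
    "\u093e": "aa",
    "\u093f": "i",
    "\u0940": "ii",
}
VIRAMA = "\u094d"  # suppresses the inherent vowel of the preceding consonant


def to_labels(word: str) -> list[str]:
    """Map one word, code point by code point, to hypothetical phone labels."""
    labels = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            labels.append(CONSONANTS[ch])
            # A bare consonant carries an inherent vowel unless a vowel
            # sign or a virama follows it.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                labels.append("a")
        elif ch in VOWELS:
            labels.append(VOWELS[ch])
        elif ch in VOWEL_SIGNS:
            labels.append(VOWEL_SIGNS[ch])
        elif ch == VIRAMA:
            continue  # already handled when the consonant was processed
        else:
            raise ValueError(f"character not in the toy tables: {ch!r}")
    return labels


def apply_lts(labels: list[str]) -> list[str]:
    """Toy letter-to-sound rule: drop a word-final 'a' in longer words."""
    if len(labels) > 2 and labels[-1] == "a":
        return labels[:-1]
    return labels


def parse(word: str) -> list[str]:
    """UTF-8 text -> labels -> letter-to-sound rules -> phoneme sequence."""
    return apply_lts(to_labels(word))


print(parse("\u0915\u092e\u0932"))  # the word written KA, MA, LA
```

A real parser of this kind would need one script table per language feeding the same label inventory, which is what lets the later stages stay language independent; the sketch covers only a handful of characters from one script.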

Topics: Parsing (63%), Scripting language (53%), Syllable (53%), and Speech synthesis (51%)


About the journal

Journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher	Springer Verlag
ISSN	0302-9743
Open Access	No

Authors (1)

Hema Murthy
- Department of Computer Science and Engineering

Concepts (11)

Acoustic equipment
Linguistics
Speech synthesis
Telephone sets
Speech
COMMON LABEL SET
Indian languages
LETTER TO SOUND (LTS)
PARSER
SYLLABLE
Computational linguistics
