In developing speech-recognition-based services for any task domain, it is necessary to plan for supporting an increasing number of languages over the life of the service. This paper considers a small-vocabulary speech recognition task in multiple Indian languages. To configure a multi-lingual system in this task domain, an experimental study is presented using data from two linguistically similar languages, Hindi and Marathi. The system is built by training a subspace Gaussian mixture model (SGMM) (Povey et al., 2011; Rose et al., 2011) under a multi-lingual scenario (Burget et al., 2010; Mohan et al., 2012a). For this study, speech data was collected from the targeted user population to develop spoken dialogue systems in an agricultural commodities task domain. It is well known that acoustic, channel and environmental mismatch between data sets from multiple languages is an issue when building multi-lingual systems of this nature. We therefore use a cross-corpus acoustic normalization procedure, which is a variant of speaker adaptive training (SAT) (Mohan et al., 2012a). The resulting multi-lingual system provides the best speech recognition performance for both languages. Further, the effect of sharing "similar" context-dependent states from the Marathi language on Hindi speech recognition performance is presented. © 2013 Elsevier B.V. All rights reserved.
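
As a rough illustration of why an SGMM lends itself to multi-lingual training, the sketch below derives state-level Gaussian means and mixture weights from a small set of shared parameters and a low-dimensional state vector, following the general SGMM formulation (means as M_i v_j, weights as a softmax over w_i . v_j). It is not the authors' implementation. The toy parameter values, the diagonal covariances, and the names `v_hindi` and `v_marathi` are assumptions made purely for illustration; in a multi-lingual setup the shared parameters would be estimated on pooled data, while each language keeps its own state vectors.

```python
import math

def state_means(M, v):
    """Mean of each shared Gaussian i for one state: mu_i = M_i v."""
    return [[sum(m * x for m, x in zip(row, v)) for row in M_i] for M_i in M]

def state_weights(W, v):
    """Mixture weights for one state: softmax over i of (w_i . v)."""
    scores = [sum(a * b for a, b in zip(w_i, v)) for w_i in W]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (a simplifying assumption)."""
    return -0.5 * sum(
        math.log(2 * math.pi * s) + (a - m) ** 2 / s
        for a, m, s in zip(x, mean, var)
    )

def state_log_likelihood(x, v, M, W, variances):
    """log p(x | state) = log sum_i w_i N(x; M_i v, Sigma_i), via log-sum-exp."""
    means = state_means(M, v)
    weights = state_weights(W, v)
    terms = [
        math.log(w) + log_gauss_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Toy shared parameters (hypothetical): 2 Gaussians, 2-dim features, 2-dim subspace.
M = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
W = [[1.0, 0.0], [0.0, 1.0]]
VARIANCES = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical state vectors: one state from each language, same shared parameters.
v_hindi = [1.0, 0.0]
v_marathi = [0.0, 2.0]

frame = [1.0, 0.0]
ll_hindi = state_log_likelihood(frame, v_hindi, M, W, VARIANCES)
ll_marathi = state_log_likelihood(frame, v_marathi, M, W, VARIANCES)
```

Because only the state vectors differ between the two calls, adding a language in this sketch means adding state vectors rather than a full set of Gaussians, which is the property that makes the model attractive when per-language data is limited.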

Srinivasan Umesh

Department of Electrical Engineering

Agriculture

Aluminum

Deep neural networks

Gaussian distribution

Linguistics

Modeling languages

Population statistics

Speech

Speech processing

AGRICULTURAL COMMODITIES

Automatic Speech Recognition

SPEAKER ADAPTIVE TRAININGS

SPEECH RECOGNITION PERFORMANCE

SPOKEN DIALOGUE SYSTEM

SUB-SPACE MODELLING

SUBSPACE GAUSSIAN MIXTURE MODELS

UNDER-RESOURCED LANGUAGES

Speech Recognition


IIT Madras

Speech Communication

The computer-assisted learning of spoken language is closely tied to automatic speech recognition (ASR) technology which, as is well known, is challenging with non-native speech. By focusing on specific phonological differences between the target and source languages of non-native speakers, pronunciation assessment can be made more reliable. The four-way contrast of Hindi stops, where voicing and aspiration are phonemic for each of five distinct places of articulation, is typically challenging for a learner from a different native language group. The improper production of the aspiration contrast is thus often the salient cue to non-native accents of spoken Hindi. In this work, acoustic-phonetic features, motivated by an understanding of the production of the aspirated plosives, are evaluated for the classification of plosives along the aspiration dimension. Several new acoustic measures are proposed for the reliable detection of the aspiration contrast in unvoiced and voiced plosives. The acoustic-phonetic features are shown to perform well in the two-way classification task, and also appear robust to cross-language transfer, where statistical models trained on Marathi speech were tested on native Hindi utterances. In experiments on native and non-native utterances of Hindi words by Tamil-L1 speakers, the acoustic-phonetic features clearly separate the non-native speakers from the native speakers on pronunciation quality of aspirated plosives. The acoustic-phonetic features also outperformed an ASR system based on more generic spectral features in terms of phone-level feedback that was consistent with human judgement. © 2015 Elsevier Ltd.

Journal of Phonetics

Detection of phonemic aspiration for spoken Hindi pronunciation evaluation

This paper is concerned with the efficient configuration of automatic speech recognition (ASR) systems for use by under-served speaker populations. A task domain involving Indian farmers accessing information on agricultural commodities through a spoken dialog system in multiple languages is presented. To facilitate the development of an ASR system for this domain, a speech corpus was collected in rural areas from speakers of four languages over wireless cellular channels. This paper investigates the problem of ASR acoustic modelling for this task domain. Continuous density hidden Markov model (CDHMM) and subspace Gaussian mixture model (SGMM) [1] based techniques are used to train acoustic models in four languages: Assamese, Bengali, Hindi and Marathi. Issues relating to limited linguistic resources and their impact on ASR word accuracy for these languages are addressed. © 2012 IEEE.

2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012

Subspace based for Indian languages

HTM Journal of Heat Treatment and Materials

HTM-Praxis

2013 IEEE Workshop on Automatic Speech Recognition and Understanding

Combination of data borrowing strategies for low-resource LVCSR

International Journal of Computer Applications

Indian Language Speech Database: A Review

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition

Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain

Journal	Speech Communication
Publisher	Elsevier B.V.
ISSN	0167-6393
Open Access	No