This paper is concerned with the efficient configuration of automatic speech recognition (ASR) systems for use by under-served speaker populations. A task domain involving Indian farmers accessing information on agricultural commodities through a spoken dialog system in multiple languages is presented. To facilitate the development of an ASR system for this domain, a speech corpus was collected in rural areas from speakers of four languages over wireless cellular channels. This paper investigates the problem of ASR acoustic modelling for this task domain. Continuous density hidden Markov model (CDHMM) and subspace Gaussian mixture model (SGMM) [1] based techniques are used to train acoustic models in four languages: Assamese, Bengali, Hindi and Marathi. Issues relating to limited linguistic resources and their impact on ASR word accuracy for these languages are addressed. © 2012 IEEE.

Srinivasan Umesh

Department of Electrical Engineering

Acoustic model

ACOUSTIC MODELLING

AGRICULTURAL COMMODITIES

Automatic speech recognition system

CONTINUOUS DENSITY HIDDEN MARKOV MODELS

GAUSSIAN MIXTURE MODEL

Indian languages

LINGUISTIC RESOURCES

Multiple languages

SPEECH CORPORA

SPOKEN DIALOG SYSTEMS

SUBSPACE BASED

TASK DOMAIN

WORD ACCURACIES

Agriculture

Information science

Rural areas

signal processing

Speech Recognition

IIT Madras

Subspace based for Indian languages

2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012

In developing speech recognition based services for any task domain, it is necessary to account for the support of an increasing number of languages over the life of the service. This paper considers a small vocabulary speech recognition task in multiple Indian languages. To configure a multi-lingual system in this task domain, an experimental study is presented using data from two linguistically similar languages - Hindi and Marathi. We do so by training a subspace Gaussian mixture model (SGMM) (Povey et al., 2011; Rose et al., 2011) under a multi-lingual scenario (Burget et al., 2010; Mohan et al., 2012a). Speech data was collected from the targeted user population to develop spoken dialogue systems in an agricultural commodities task domain for this experimental study. It is well known that acoustic, channel and environmental mismatch between data sets from multiple languages is an issue while building multi-lingual systems of this nature. As a result, we use a cross-corpus acoustic normalization procedure which is a variant of speaker adaptive training (SAT) (Mohan et al., 2012a). The resulting multi-lingual system provides the best speech recognition performance for both languages. Further, the effect of sharing "similar" context-dependent states from the Marathi language on the Hindi speech recognition performance is presented. © 2013 Elsevier B.V. All rights reserved.
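For orientation, the parameter sharing that makes multi-lingual SGMM training possible can be written out as follows. This is the basic SGMM form as commonly stated in the literature cited above (Povey et al., 2011), with sub-states and speaker subspaces omitted; the symbols are not taken from this abstract and are introduced here only for illustration.

```latex
% State j, feature vector x, shared Gaussian index i = 1..I
p(x \mid j) = \sum_{i=1}^{I} w_{ji}\, \mathcal{N}\!\left(x;\ \mu_{ji},\ \Sigma_i\right)

% State-specific means and weights are derived from a low-dimensional state vector v_j
\mu_{ji} = M_i\, v_j
\qquad
w_{ji} = \frac{\exp\!\left(w_i^{\top} v_j\right)}{\sum_{i'=1}^{I} \exp\!\left(w_{i'}^{\top} v_j\right)}
```

In the multi-lingual setting described by Burget et al. (2010), the globally shared parameters $M_i$, $w_i$ and $\Sigma_i$ are estimated on data pooled across languages, while the state vectors $v_j$ remain specific to each language. Only the small $v_j$ vectors then have to be learned from a single language's data.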

Speech Communication

Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain

Cross-lingual acoustic modeling using the Subspace Gaussian Mixture Model for low-resource languages of Indian origin is investigated. The focus is on building an acoustic model for a low-resource language with a limited vocabulary by leveraging resources from another language with comparatively larger resources. Experiments were done on the Bengali and Tamil corpora from the MANDI database, with Tamil having greater resources than Bengali. We observed that the word accuracy of the cross-lingual acoustic model for Bengali was approximately 2.5% above that of its CDHMM model and was equivalent to that of its monolingual SGMM model. © 2014 IEEE.
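The word accuracy figures quoted in these abstracts are conventionally obtained by aligning each recognized word sequence against its reference transcript. The sketch below is a minimal illustration of that convention, assuming the usual definition (N - S - D - I) / N over a Levenshtein alignment; the function names and the example sentences are hypothetical, and the abstracts do not state which scoring tool the authors used.

```python
def edit_counts(ref_words, hyp_words):
    """Return (substitutions, deletions, insertions) for a minimum-cost alignment."""
    rows, cols = len(ref_words) + 1, len(hyp_words) + 1
    # Each cell holds (total_errors, substitutions, deletions, insertions).
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)  # only deletions when the hypothesis is empty
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)  # only insertions when the reference is empty
    for i in range(1, rows):
        for j in range(1, cols):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if ref_words[i - 1] == hyp_words[j - 1]:
                diagonal = (cost, subs, dels, ins)          # correct word
            else:
                diagonal = (cost + 1, subs + 1, dels, ins)  # substitution
            cost, subs, dels, ins = dp[i - 1][j]
            deletion = (cost + 1, subs, dels + 1, ins)
            cost, subs, dels, ins = dp[i][j - 1]
            insertion = (cost + 1, subs, dels, ins + 1)
            # Tuples compare on total cost first, so min() picks a cheapest path.
            dp[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent: 100 * (N - S - D - I) / N, N = reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    subs, dels, ins = edit_counts(ref_words, hyp_words)
    return 100.0 * (len(ref_words) - subs - dels - ins) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical five-word utterance with one substituted word.
    print(word_accuracy("price of rice in pune", "price of wheat in pune"))
```

Because insertions are counted as errors, this measure can fall below zero for very poor hypotheses, which distinguishes it from the simpler percent-correct figure.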

2014 20th National Conference on Communications, NCC 2014

Cross-lingual acoustic modeling for Indian languages based on Subspace Gaussian Mixture Models

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Dealing with acoustic mismatch for training multilingual subspace Gaussian mixture models for speech recognition

2011 IEEE Workshop on Automatic Speech Recognition & Understanding

Strategies for using MLP based features with limited target-language training data

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

An investigation of subspace modeling for phonetic and speaker variability in automatic speech recognition

Cross-language bootstrapping based on completely unsupervised training using multilingual A-stabil

Acoustic data sharing for Afghan and Persian languages

Journal	2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012
Open Access	No