On improving acoustic models for TORGO dysarthric speech database

Srinivasan Umesh

doi:10.21437/Interspeech.2017-878

Published in International Speech Communication Association

2017

DOI: 10.21437/Interspeech.2017-878

Volume: 2017-August

Pages: 2695 - 2699

Abstract

Assistive technologies based on speech have been shown to improve the quality of life of people affected by dysarthria, a motor speech disorder. This paper explores multiple ways to improve Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network (DNN) based automatic speech recognition (ASR) systems for the TORGO database of dysarthric speech. Past attempts at developing ASR systems for the TORGO database were limited to training monophone models and performing speaker adaptation over them. Although a recent work attempted to train triphone and neural network models, parameters such as the number of context-dependent states and the dimensionality of the principal component features were not properly tuned. This paper develops speaker-specific ASR models for each dysarthric speaker in the TORGO database by tuning the parameters of the GMM-HMM model and the number of layers and hidden nodes in the DNN. Employing a dropout scheme and sequence-discriminative training in the DNN also gave significant gains. Speaker-adapted features such as feature-space maximum likelihood linear regression (FMLLR) are used to pass speaker information to the DNNs. To the best of our knowledge, this paper presents the best recognition accuracies for the TORGO database to date. Copyright © 2017 ISCA.
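
As an illustration of the speaker-adapted features mentioned in the abstract, the sketch below shows only the application step of an FMLLR-style transform: each feature frame x is mapped to A x + b using a per-speaker matrix A and bias b. It is a minimal sketch, not the paper's pipeline. The function name `apply_fmllr` and the numeric values of `A`, `b`, and `frames` are hypothetical, and the maximum likelihood estimation of A and b is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def apply_fmllr(frames: Sequence[Sequence[float]],
                A: Sequence[Sequence[float]],
                b: Sequence[float]) -> List[Vector]:
    """Apply a per-speaker affine transform x' = A x + b to every frame."""
    dim = len(b)
    if len(A) != dim or any(len(row) != dim for row in A):
        raise ValueError("A must be a square matrix matching the length of b")
    adapted = []
    for x in frames:
        if len(x) != dim:
            raise ValueError("frame dimension does not match the transform")
        # One output coefficient per row of A: dot(row, x) + bias.
        adapted.append([
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)
        ])
    return adapted


# Hypothetical 2-dimensional toy values; real acoustic features have more dimensions.
frames = [[1.0, 2.0], [0.0, -1.0]]
A = [[2.0, 0.0], [0.0, 0.5]]
b = [0.5, -1.0]
adapted = apply_fmllr(frames, A, b)
```

In a setup like the one the abstract describes, the adapted frames, rather than the raw features, would be the input to the DNN, so the network receives features that have already been normalized toward each speaker.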

About the journal

Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher	International Speech Communication Association
ISSN	2308-457X
Open Access	No

Authors (1)

Srinivasan Umesh
- Department of Electrical Engineering

Concepts (19)

Database systems
Deep neural networks
Gaussian distribution
Hidden Markov models
Markov processes
Maximum likelihood
Principal component analysis
Speech
Speech communication
Trellis codes
Automatic speech recognition system
DISCRIMINATIVE TRAINING
DYSARTHRIA
GAUSSIAN MIXTURE MODEL
GMM-HMM
MAXIMUM LIKELIHOOD LINEAR REGRESSION
Principal components
TORGO
Speech recognition
