On improving acoustic models for TORGO dysarthric speech database
Published in International Speech Communication Association
2017
Volume: 2017-August
   
Pages: 2695 - 2699
Abstract
Assistive technologies based on speech have been shown to improve the quality of life of people affected by dysarthria, a motor speech disorder. This paper explores multiple ways to improve Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network (DNN) based automatic speech recognition (ASR) systems for the TORGO database of dysarthric speech. Past attempts at developing ASR systems for the TORGO database were limited to training monophone models and performing speaker adaptation on them. Although a recent work attempted to train triphone and neural network models, parameters such as the number of context-dependent states and the dimensionality of the principal component features were not properly tuned. This paper develops speaker-specific ASR models for each dysarthric speaker in the TORGO database by tuning the parameters of the GMM-HMM model and the number of layers and hidden nodes in the DNN. Employing a dropout scheme and sequence discriminative training in the DNN also gave significant gains. Speaker-adapted features such as feature-space maximum likelihood linear regression (FMLLR) features are used to pass speaker information to the DNNs. To the best of our knowledge, this paper presents the best recognition accuracies reported for the TORGO database to date. Copyright © 2017 ISCA.
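To make the FMLLR step more concrete: once a speaker's transform has been estimated, applying it to a feature frame x is an affine map x̂ = A x + b, commonly stored as a single d × (d+1) matrix W = [A | b]. The sketch below assumes W is already given (estimating it, by maximizing the likelihood of the speaker's data under the GMM-HMM, is not shown). The context-splicing step that stacks neighbouring adapted frames into one DNN input vector is a common practice assumed here for illustration, not a detail stated in the abstract, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def apply_fmllr(frame: Sequence[float], w: Matrix) -> Vector:
    """Apply a d x (d+1) FMLLR matrix W = [A | b] to one d-dim frame, i.e. A @ x + b."""
    d = len(frame)
    if len(w) != d or any(len(row) != d + 1 for row in w):
        raise ValueError("W must have shape d x (d+1)")
    extended = list(frame) + [1.0]  # appending 1 makes the last column act as the bias b
    return [sum(r * x for r, x in zip(row, extended)) for row in w]


def splice(frames: Sequence[Vector], context: int) -> List[Vector]:
    """Stack each frame with its +/- `context` neighbours (hypothetical DNN input step).

    Frames beyond the utterance edges are padded by repeating the first/last frame.
    """
    n = len(frames)
    out: List[Vector] = []
    for t in range(n):
        window: Vector = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        out.append(window)
    return out


def speaker_dnn_inputs(frames: Sequence[Vector], w: Matrix, context: int = 1) -> List[Vector]:
    """Speaker-adapt every frame with that speaker's W, then splice for the DNN."""
    adapted = [apply_fmllr(f, w) for f in frames]
    return splice(adapted, context)


if __name__ == "__main__":
    # Toy 2-dim features and a made-up transform: A = diag(2, 1), b = (1, -1).
    W = [[2.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
    feats = [[1.0, 2.0], [3.0, 4.0]]
    print(speaker_dnn_inputs(feats, W, context=1))
```

With the toy values above, the adapted frames are [3.0, 1.0] and [7.0, 3.0], and each spliced input has 3 × 2 = 6 dimensions. Because W is per-speaker, the same DNN sees inputs that have been normalized toward the speaker-independent model, which is how the speaker information reaches the network in this kind of pipeline.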
About the journal
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
ISSN: 2308-457X
Open Access: No
Concepts (19)
  •  Database systems
  •  Deep neural networks
  •  Gaussian distribution
  •  Hidden Markov models
  •  Markov processes
  •  Maximum likelihood
  •  Principal component analysis
  •  Speech
  •  Speech communication
  •  Trellis codes
  •  Automatic speech recognition system
  •  Discriminative training
  •  Dysarthria
  •  Gaussian mixture model
  •  GMM-HMM
  •  Maximum likelihood linear regression
  •  Principal components
  •  TORGO
  •  Speech recognition