Improving deep neural networks using state projection vectors of subspace Gaussian mixture model as features
Published in Institute of Electrical and Electronics Engineers Inc.
2014
Pages: 129 - 134
Abstract
With recent advances, deep neural networks (DNNs) have surpassed the conventional hidden Markov model-Gaussian mixture model (HMM-GMM) framework, owing to their efficient training procedure. Providing better phonetic context information in the input improves DNN performance. The state projection vectors (state-specific vectors) of the subspace Gaussian mixture model (SGMM) capture phonetic information in a low-dimensional vector space. In this paper, we propose using the state-specific vectors of an SGMM as features, thereby providing additional phonetic information to the DNN framework. Each observation vector in the training data is aligned with its corresponding SGMM state-specific vector to form the state-specific vector feature set. The linear discriminant analysis (LDA) feature set is formed by applying LDA to the training data. Because bottleneck features are effective at extracting discriminative information about phonemes, both the LDA feature set and the state-specific vector feature set are converted to bottleneck features. The bottleneck features of both feature sets then serve as input features for training a single DNN. Relative improvements of 8.8% on the TIMIT database (core test set) and 9.7% on the WSJ corpus are obtained by using the state-specific vector bottleneck feature set, compared with a DNN trained only on the LDA bottleneck feature set. In addition, training a deep belief network-DNN (DBN-DNN) with the proposed feature set attains a WER of 20.46% on the TIMIT core test set, demonstrating the effectiveness of our method. When used as features, the state-specific vectors provide additional useful information related to phoneme variation. Combining them with LDA bottleneck features therefore yields improved performance in the DNN framework. © 2014 IEEE.
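The frame-level alignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the array shapes, the random stand-in data, the helper names `align_state_vectors` and `combine_feature_sets`, and the use of simple per-frame concatenation to join the two feature sets are all assumptions made for illustration, and the bottleneck conversion step is omitted.

```python
import numpy as np

def align_state_vectors(state_ids, state_vectors):
    """Return one state-specific vector per frame.

    state_ids:     (num_frames,) integer state index for each frame,
                   e.g. taken from a forced alignment (assumed given).
    state_vectors: (num_states, sv_dim) one vector per SGMM state.
    """
    return state_vectors[state_ids]

def combine_feature_sets(lda_feats, sv_feats):
    """Join two frame-synchronous feature sets by concatenation (an assumption)."""
    return np.concatenate([lda_feats, sv_feats], axis=1)

# Hypothetical toy data standing in for real SGMM and LDA outputs.
rng = np.random.default_rng(0)
num_states, sv_dim, num_frames, lda_dim = 5, 4, 6, 3
state_vectors = rng.normal(size=(num_states, sv_dim))
state_ids = np.array([0, 0, 1, 1, 1, 4])
lda_feats = rng.normal(size=(num_frames, lda_dim))

sv_feats = align_state_vectors(state_ids, state_vectors)
combined = combine_feature_sets(lda_feats, sv_feats)
print(combined.shape)  # (6, 7)
```

Frames that share a state receive identical state-specific vectors, so this feature stream changes only at state boundaries, while the LDA stream varies frame by frame.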
About the journal
Journal: 2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Open Access: No
Concepts (16)
  •  Discriminant analysis
  •  Gaussian distribution
  •  Hidden Markov models
  •  Linguistics
  •  Markov processes
  •  Trellis codes
  •  Vector spaces
  •  BOTTLENECK FEATURES
  •  DEEP BELIEF NETWORKS
  •  Deep neural networks
  •  GAUSSIAN MIXTURE MODEL
  •  Linear discriminant analysis
  •  PHONETIC INFORMATION
  •  SGMM
  •  SUBSPACE GAUSSIAN MIXTURE MODELS
  •  Vectors