FMLLR Speaker Normalization with i-Vector: In Pseudo-FMLLR and Distillation Framework
Published by the Institute of Electrical and Electronics Engineers Inc.
2018
Volume: 26
Issue: 4
Pages: 797-805
Abstract
When an automatic speech recognition (ASR) system is deployed in real-world applications, it often receives only one utterance at a time for decoding, and depending on the ASR task this utterance may be short. In such cases, robust estimation of speaker normalization parameters such as feature-space maximum likelihood linear regression (FMLLR) transforms and i-vectors may not be feasible. In this paper, we propose two unsupervised speaker normalization techniques, one at the feature level and the other at the model level of acoustic modeling, to overcome the drawbacks of FMLLR and i-vectors in real-time scenarios. At the feature level, we propose using deep neural networks (DNNs) to generate pseudo-FMLLR features from time-synchronous pairs of filterbank and FMLLR features. These pseudo-FMLLR features can then be used for DNN acoustic model training and decoding. At the model level, we propose a generalized distillation framework in which a teacher DNN trained on FMLLR features guides the training and optimization of a student DNN trained on filterbank features. In both proposed methods, the ambiguity in choosing the speaker-specific FMLLR transform can be reduced by augmenting the input filterbank features with i-vectors. Experiments conducted on 33-h and 110-h subsets of the Switchboard corpus show that, in a real-time scenario, the proposed methods provide significant gains over DNNs trained on FMLLR, i-vector-appended FMLLR, filterbank, and i-vector-appended filterbank features. © 2014 IEEE.
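The feature-level idea (predicting FMLLR-like features from filterbank frames augmented with an i-vector) can be illustrated with a minimal sketch. It assumes features are plain Python lists, one list per frame. The names `append_ivector`, `linear_map` and `mse_loss` are hypothetical; the single linear layer is only a stand-in for the regression DNN, and mean squared error is an assumed training criterion for matching each predicted frame to its time-synchronous FMLLR frame, not necessarily the one used in the paper.

```python
from typing import List, Sequence

Frame = List[float]


def append_ivector(fbank: Sequence[Sequence[float]], ivector: Sequence[float]) -> List[Frame]:
    """Append the same utterance-level i-vector to every filterbank frame."""
    return [list(frame) + list(ivector) for frame in fbank]


def linear_map(weights: Sequence[Sequence[float]], bias: Sequence[float],
               x: Sequence[float]) -> Frame:
    """Hypothetical stand-in for the regression DNN: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mse_loss(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Assumed regression criterion: mean squared error over all frames and dims.

    `pred` holds pseudo-FMLLR frames, `target` the real FMLLR frames of the
    same utterance; the two streams must be time-synchronous (same frame count).
    """
    if len(pred) != len(target):
        raise ValueError("pseudo-FMLLR and FMLLR streams must have the same number of frames")
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        if len(p) != len(t):
            raise ValueError("frame dimensions differ")
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    return total / count
```

For example, two 2-dimensional filterbank frames with a 1-dimensional i-vector become 3-dimensional inputs, and the toy map produces 2-dimensional pseudo-FMLLR frames that are scored against the FMLLR targets. FMLLR targets are only needed during training; at decoding time the network would consume filterbank frames plus an i-vector alone.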
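The model-level teacher/student setup can be sketched with a per-frame loss in the style of generalized distillation. This is a minimal sketch under stated assumptions: the student sees filterbank (optionally i-vector-appended) input, the teacher sees the matching FMLLR frame, the hard label is a frame-level state target (assumed to come from a forced alignment), and the loss interpolates hard-label cross-entropy with cross-entropy against temperature-softened teacher posteriors. The names `distillation_loss`, `imitation` and `temperature` are hypothetical, and the exact weighting used in the paper may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(target: Sequence[float], pred: Sequence[float]) -> float:
    """H(target, pred) = -sum_k target_k * log(pred_k), skipping zero-probability targets."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      hard_label: int, imitation: float = 0.5,
                      temperature: float = 2.0) -> float:
    """(1 - imitation) * CE(hard label, student) + imitation * CE(teacher_T, student_T).

    student_logits: output of the student DNN for a filterbank(+i-vector) frame.
    teacher_logits: output of the FMLLR-trained teacher DNN for the same frame.
    """
    n = len(student_logits)
    hard = [1.0 if k == hard_label else 0.0 for k in range(n)]
    hard_term = cross_entropy(hard, softmax(student_logits))
    # Soft targets: teacher posteriors softened with the same temperature as the student.
    soft_targets = softmax(teacher_logits, temperature)
    soft_term = cross_entropy(soft_targets, softmax(student_logits, temperature))
    return (1.0 - imitation) * hard_term + imitation * soft_term
```

Setting `imitation=0` reduces the loss to ordinary cross-entropy training of the student on filterbank features; raising it lets the FMLLR-trained teacher's posteriors guide the student, which is the role the abstract assigns to the teacher.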
About the journal
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISSN: 2329-9290
Open Access: No
Concepts (24)
  •  Acoustics
  •  Decoding
  •  Deep neural networks
  •  Distillation
  •  Electric switchboards
  •  Feature extraction
  •  Filter banks
  •  Hidden Markov models
  •  Markov processes
  •  Mathematical transformations
  •  Maximum likelihood
  •  Maximum likelihood estimation
  •  Personnel training
  •  Speech
  •  Speech processing
  •  Teaching
  •  Vector spaces
  •  Vectors
  •  I-vectors
  •  Pseudo-FMLLR
  •  Speaker normalization
  •  Switchboard
  •  Unsupervised
  •  Speech recognition