DNNs for unsupervised extraction of pseudo FMLLR features without explicit adaptation data
Published in International Speech Communication Association
2016
Volume: 08-12-September-2016
Pages: 3479-3483
Abstract
In this paper, we propose the use of deep neural networks (DNNs) as a regression model to estimate feature-space maximum likelihood linear regression (FMLLR) features from unnormalized features. During training, pairs of unnormalized features (input) and the corresponding FMLLR features (target) are provided, and the network is optimized to reduce the mean-square error between its output and the target FMLLR features. At test time, the unnormalized features are passed through this DNN feature extractor to obtain FMLLR-like features without any supervision or first-pass decode. Further, the FMLLR-like features are generated frame by frame, so no explicit adaptation data is required to extract them, unlike FMLLR or i-vectors. The proposed approach is therefore suitable for scenarios where there is little adaptation data. It provides sizable improvements over basis-FMLLR and conventional FMLLR when normalization is done at the utterance level on the TIMIT and Switchboard-33hour data sets. Copyright © 2016 ISCA.
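The training and extraction steps described in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single tanh hidden layer, its size, the learning rate, full-batch gradient descent, and the synthetic data, in which the "FMLLR" targets are simply a fixed affine transform of random input frames. The helper names (`init_params`, `forward`, `train_step`, `make_toy_data`, `train`) are hypothetical.

```python
import numpy as np

def init_params(rng, dim_in, dim_hidden, dim_out):
    """Small random weights for a one-hidden-layer regression network."""
    return {
        "W1": rng.normal(0.0, 0.1, (dim_in, dim_hidden)),
        "b1": np.zeros(dim_hidden),
        "W2": rng.normal(0.0, 0.1, (dim_hidden, dim_out)),
        "b2": np.zeros(dim_out),
    }

def forward(params, x):
    """Map unnormalized frames (N, dim_in) to FMLLR-like frames (N, dim_out)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"], h

def mse(pred, target):
    """Mean-square error averaged over frames and feature dimensions."""
    return float(np.mean((pred - target) ** 2))

def train_step(params, x, y, lr):
    """One full-batch gradient-descent step; returns the loss before the update."""
    pred, h = forward(params, x)
    d_out = 2.0 * (pred - y) / pred.size          # gradient of MSE w.r.t. pred
    d_h = (d_out @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
    grads = {
        "W1": x.T @ d_h,
        "b1": d_h.sum(axis=0),
        "W2": h.T @ d_out,
        "b2": d_out.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
    return mse(pred, y)

def make_toy_data(rng, n_frames=512, dim=13):
    """Synthetic stand-in (assumption): targets are an affine transform of inputs."""
    x = rng.normal(size=(n_frames, dim))
    A = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
    b = 0.1 * rng.normal(size=dim)
    return x, x @ A.T + b

def train(seed=0, steps=500, lr=0.2, dim_hidden=64):
    """Train on the toy pairs and return the parameters, loss curve, and data."""
    rng = np.random.default_rng(seed)
    x, y = make_toy_data(rng)
    params = init_params(rng, x.shape[1], dim_hidden, y.shape[1])
    losses = [train_step(params, x, y, lr) for _ in range(steps)]
    return params, losses, x, y
```

At test time, `forward(params, frame[None, :])[0]` maps a single unnormalized frame to an FMLLR-like frame. Because each output depends only on its own input frame, no adaptation statistics need to be accumulated, which is the property the abstract highlights.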
About the journal
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
ISSN: 2308-457X
Open Access: Yes
Concepts (11)
  •  Maximum likelihood
  •  Mean square error
  •  Regression analysis
  •  Speech processing
  •  Speech recognition
  •  ADAPTATION DATA
  •  BASIS-FMLLR
  •  FMLLR
  •  I VECTORS
  •  SPEAKER NORMALIZATION
  •  Speech communication