Header menu link for other important links
Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM
, Karthick B.M., Kolhar P.
Published in International Speech and Communication Association
Volume: 2015-January
Pages: 1096 - 1100
The recent success of convolutional neural network (CNN) in speech recognition is due to its ability to capture translational variance in spectral features while performing discrimination. The CNN architecture requires correlated features as input and thus fMLLR transform which is estimated in de-correlated feature space fails to give significant improvement. In this paper, we propose two methods for extracting speaker adapted features in a correlated space using SGMMs. First, we estimate fMLLR transforms for correlated features by full covariance Gaussians using SGMM approach. Second, we augment speaker specific subspace vectors with acoustic features to provide speaker information in CNN models. Finally we propose a bottleneck - joint CNN/DNN framework to exploit the effects of both (fMLLR+ ivectors) and (SGMM-fMLLR+speaker vectors) features. Experiments on TIMIT task show that our proposed features give 5.7 % relative improvement over the log-mel features. Furthermore experiments on switchboard task show that the bottleneck - joint CNN/DNN model achieves 12.2 % relative improvement over baseline joint CNN/DNN framework. Copyright © 2015 ISCA.
About the journal
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
PublisherInternational Speech and Communication Association
Open AccessNo