Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM

Srinivasan Umesh; Karthick B.M.; Kolhar P.





Published in International Speech and Communication Association

2015

Volume: 2015-January

Pages: 1096 - 1100

Abstract

The recent success of convolutional neural networks (CNNs) in speech recognition is due to their ability to capture translational variance in spectral features while performing discrimination. The CNN architecture requires correlated features as input, and thus the fMLLR transform, which is estimated in a de-correlated feature space, fails to give significant improvement. In this paper, we propose two methods for extracting speaker-adapted features in a correlated space using SGMMs. First, we estimate fMLLR transforms for correlated features with full-covariance Gaussians using the SGMM approach. Second, we augment the acoustic features with speaker-specific subspace vectors to provide speaker information to CNN models. Finally, we propose a bottleneck joint CNN/DNN framework to exploit the effects of both (fMLLR + i-vectors) and (SGMM-fMLLR + speaker vectors) features. Experiments on the TIMIT task show that our proposed features give a 5.7% relative improvement over the log-mel features. Furthermore, experiments on the Switchboard task show that the bottleneck joint CNN/DNN model achieves a 12.2% relative improvement over the baseline joint CNN/DNN framework. Copyright © 2015 ISCA.
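The second method in the abstract, augmenting acoustic features with a speaker-specific vector, can be illustrated with a minimal sketch. It assumes the simplest reading, in which the same per-speaker vector is appended to every frame of an utterance; the abstract does not say how the vectors are actually presented to the CNN, so this may differ from the authors' setup. The function name, the toy dimensions, and the values are all hypothetical.

```python
from typing import List, Sequence


def augment_with_speaker_vector(
    frames: Sequence[Sequence[float]],
    speaker_vector: Sequence[float],
) -> List[List[float]]:
    """Append one fixed speaker vector to every acoustic frame.

    Assumption: plain per-frame concatenation. This is an illustrative
    reading of "augmenting" features, not the paper's confirmed method.
    """
    spk = [float(v) for v in speaker_vector]
    return [[float(x) for x in frame] + spk for frame in frames]


# Toy example: 2 frames of 3-dimensional features (real log-mel
# features would have many more frames and dimensions).
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
speaker_vector = [1.0, -1.0]  # hypothetical 2-dimensional speaker vector

augmented = augment_with_speaker_vector(frames, speaker_vector)
print(len(augmented), len(augmented[0]))  # prints "2 5"
```

Each output frame keeps its original acoustic values and gains the speaker vector as trailing dimensions, so the frame count is unchanged and the feature dimension grows by the length of the speaker vector.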

About the journal

Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher	International Speech and Communication Association
ISSN	2308-457X
Open Access	No

Authors (1)

Srinivasan Umesh
- Department of Electrical Engineering
