Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework

Srinivasan Umesh

Published in

2010

Pages: 2738 - 2741

Abstract

The anchor modeling technique has been shown to be useful in reducing computational complexity for speaker identification and indexing of large audio databases. In this technique, speakers are projected onto a talker space spanned by a set of predefined anchor models, which are usually represented by Gaussian Mixture Models (GMMs). The characterization of each speaker involves calculating the likelihood with each of the anchor models, and is therefore expensive even in the GMM Universal Background Model (GMM-UBM) framework using top-C mixtures per feature vector. In this paper, we propose a computationally efficient (fast) method to calculate the likelihood of the speech utterances using anchor speaker-specific Maximum Likelihood Linear Regression (MLLR) matrices and sufficient statistics estimated from the utterance. We show that the proposed method is faster by an order of magnitude for evaluating the speaker characterization vector. Since anchor models use simple distance measures to identify speakers, they are used as a first stage to select N probable speakers and are then cascaded with a conventional GMM-UBM stage, which finally identifies the speaker from this reduced set. We show that the proposed method in cascade combination performs 4.21× faster than the conventional cascade anchor model system with comparable performance. The experiments are performed on the core condition of the NIST 2004 SRE. © 2010 ISCA.
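
As a rough illustration of the conventional anchor-model pipeline the abstract describes (not the proposed MLLR-based fast method, whose details are not given here), the sketch below computes a speaker characterization vector as one average log-likelihood ratio per anchor against a UBM, then shortlists the N nearest enrolled speakers by Euclidean distance. The function names, the `(weights, means, variances)` tuple layout, the diagonal covariances, and the choice of Euclidean distance are all assumptions made for illustration. The top-C mixture pruning and the second GMM-UBM rescoring stage are omitted.

```python
import numpy as np

def gmm_avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (T, D) under a diagonal GMM.

    weights: (M,), means: (M, D), variances: (M, D). Layout is an assumption.
    """
    diff = X[:, None, :] - means[None, :, :]               # (T, M, D)
    # Log of each diagonal Gaussian density, for every frame/mixture pair.
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None]
    )                                                      # (T, M)
    log_joint = log_gauss + np.log(weights)[None]
    # Numerically stable log-sum-exp over the mixture axis.
    peak = log_joint.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.sum(np.exp(log_joint - peak), axis=1))
    return float(np.mean(frame_ll))

def characterization_vector(X, anchors, ubm):
    """One log-likelihood ratio per anchor model, normalized by the UBM."""
    ubm_ll = gmm_avg_loglik(X, *ubm)
    return np.array([gmm_avg_loglik(X, *a) - ubm_ll for a in anchors])

def shortlist(scv, enrolled_scvs, n):
    """Indices of the n enrolled speakers closest to scv (Euclidean distance)."""
    distances = np.linalg.norm(enrolled_scvs - scv, axis=1)
    return np.argsort(distances)[:n]
```

The cost that the abstract targets is visible in `characterization_vector`: every anchor requires a full pass over all frames of the utterance, so the work grows with the number of anchors times the utterance length.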