Speaker verification requires a background model against which the system makes its decision. In most cases, a single large speaker-independent Gaussian Mixture Model based Universal Background Model (GMM-UBM) is used. In this paper, we propose to use a Speaker Cluster-wise UBM (SC-UBM) for each group of target speakers. In this method, the target speakers are clustered into groups based on the similarity of their Vocal Tract Length Normalization (VTLN) parameters. The VTLN parameter depends on the physiological structure of the human speech production system. Hence, each group of speakers sharing the same VTLN factor represents a distinct speaker characteristic. The SC-UBMs are derived from the GMM-UBM with Maximum Likelihood Linear Regression (MLLR) by pooling data from the specific group of target speakers. The speaker-dependent models are then adapted from their respective SC-UBM using Maximum a Posteriori (MAP) adaptation. During verification, the log likelihood ratio for the claimant is calculated with respect to the corresponding group-specific UBM. Comparative experiments are performed on the NIST 2004 SRE core condition. The SC-UBM system reduced the equal error rate (EER) by 9% compared with the GMM-UBM system. ©2010 IEEE.
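The enrollment and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: diagonal-covariance GMMs, mean-only MAP adaptation with a relevance factor of 16 (a common GMM-UBM choice, assumed here), and SC-UBMs that already exist; the MLLR step that derives each SC-UBM from the GMM-UBM is not shown. The names `map_adapt_means`, `verify`, `sc_ubms` and `warp_factor`, and every model parameter, are hypothetical toy values.

```python
import math

# Minimal sketch (not the authors' code): diagonal-covariance GMMs stored as dicts
# with "weights" [K], "means" [K][D] and "vars" [K][D]. All numbers are hypothetical.

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_log_probs(x, gmm):
    """log w_k + log N(x; mu_k, Sigma_k) for every mixture component k."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])]

def log_sum_exp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of a list of feature vectors."""
    return sum(log_sum_exp(component_log_probs(x, gmm)) for x in frames) / len(frames)

def map_adapt_means(frames, ubm, relevance=16.0):
    """Mean-only MAP adaptation of a (cluster) UBM towards enrollment frames (assumed variant)."""
    K, D = len(ubm["weights"]), len(ubm["means"][0])
    n = [0.0] * K                          # soft occupation counts per component
    first = [[0.0] * D for _ in range(K)]  # posterior-weighted sums of frames
    for x in frames:
        lp = component_log_probs(x, ubm)
        total = log_sum_exp(lp)
        for k in range(K):
            post = math.exp(lp[k] - total)  # responsibility of component k for frame x
            n[k] += post
            for d in range(D):
                first[k][d] += post * x[d]
    means = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # data-dependent adaptation coefficient
        ex = [f / n[k] if n[k] > 0 else 0.0 for f in first[k]]
        means.append([alpha * e + (1 - alpha) * m for e, m in zip(ex, ubm["means"][k])])
    return {"weights": ubm["weights"], "means": means, "vars": ubm["vars"]}

def verify(test_frames, speaker_model, cluster_ubm, threshold=0.0):
    """Log likelihood ratio of the claimant model against its cluster-specific UBM."""
    llr = (avg_log_likelihood(test_frames, speaker_model)
           - avg_log_likelihood(test_frames, cluster_ubm))
    return llr, llr > threshold

# Hypothetical SC-UBMs keyed by a quantised VTLN warp factor (1-D features, 2 components).
sc_ubms = {
    0.92: {"weights": [0.5, 0.5], "means": [[-1.0], [1.0]], "vars": [[1.0], [1.0]]},
    1.08: {"weights": [0.5, 0.5], "means": [[-0.5], [1.5]], "vars": [[1.0], [1.0]]},
}
warp_factor = {"spk_A": 0.92, "spk_B": 1.08}  # hypothetical per-speaker VTLN factors

# Enrollment: adapt speaker A from the SC-UBM of its own VTLN cluster.
cluster_ubm_A = sc_ubms[warp_factor["spk_A"]]
speaker_A = map_adapt_means([[1.8], [2.1], [1.9], [2.2], [2.0]] * 10, cluster_ubm_A)

# Verification: score a genuine-like and an impostor-like trial against the same SC-UBM.
target_llr, target_accept = verify([[2.0], [1.9], [2.1]], speaker_A, cluster_ubm_A)
impostor_llr, impostor_accept = verify([[-1.2], [-0.8], [-1.0]], speaker_A, cluster_ubm_A)
print(f"target LLR {target_llr:.3f} accept={target_accept}")
print(f"impostor LLR {impostor_llr:.3f} accept={impostor_accept}")
```

Because each claimant is scored against the SC-UBM of its own VTLN cluster, the denominator of the ratio depends on the claimed identity: in this toy setup, `spk_B` would be enrolled from and scored against `sc_ubms[1.08]` instead.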

Srinivasan Umesh

Department of Electrical Engineering

Background model

Comparative studies

EQUAL ERROR RATE

Gaussians

GROUP-BASED

HUMAN SPEECH

IN-CORE

LOG LIKELIHOOD RATIO

Maximum a posteriori

MAXIMUM LIKELIHOOD LINEAR REGRESSION

PHYSIOLOGICAL STRUCTURES

SPEAKER DEPENDENTS

SPEAKER VERIFICATION

TARGET SPEAKER

UBM SYSTEMS

UNIVERSAL BACKGROUND MODEL

VOCAL TRACT LENGTH NORMALIZATION

MAGNETOSTRICTIVE DEVICES

Maximum likelihood estimation

Physiological models

Targets

Speech Recognition

IIT Madras

Vocal Tract Length Normalization factor based speaker-cluster UBM for speaker verification

Proceedings of 16th National Conference on Communications, NCC 2010

Recently, Multiple Background Models (M-BMs) [1, 2] have been shown to be useful in speaker verification, where the M-BMs are formed based on the different Vocal Tract Lengths (VTLs) in the population. The speaker models are adapted from the particular Background Model (BM) corresponding to their VTL. During testing, the log likelihood ratio of the test utterance is calculated between the claimant model and the corresponding BM. In this paper, instead of using a different BM for each speaker, we propose the use of a single gender-, channel- and VTL-independent UBM (root-UBM) using the concept of a VTL-dependent mapping function. The proposed concept is inspired by the Feature Mapping (FM) technique used in speaker verification to overcome channel variability. In our proposed method, VTL-specific, gender-independent Gaussian Mixture Models (GMMs) are derived from the root-UBM using Maximum a Posteriori (MAP) adaptation. The mapping relation is then learned between the root-UBM and each VTL-specific GMM. During the training and testing phases, feature vectors are mapped into the root-UBM using the best VTL-specific model, and the speaker models are then adapted from the root-UBM using the mapped features. During testing, the log likelihood ratio is calculated between the target model and the root-UBM. Therefore, unlike the M-BM system, there is no need to switch between BMs depending on the claimant. Another advantage of the proposed method is that additional normalization/compensation techniques can easily be applied, since it operates within a single-UBM framework. The experiments are performed on the NIST 2004 SRE core condition, and we show that the performance of the proposed method is close to that of the M-BM system, both with and without score normalization. © 2011 IEEE.
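The VTL-dependent mapping described above can be sketched in the style of the standard Feature Mapping technique. This is a hypothetical illustration, not the published method; the following are assumptions: diagonal-covariance GMMs; the "best" VTL-specific model is chosen per utterance by average log-likelihood; and each frame x is mapped through the top-scoring component i of that model as y = (x - mu_i^VTL) * sigma_i^root / sigma_i^VTL + mu_i^root, which relies on component i of the MAP-derived VTL model corresponding to component i of the root-UBM. The function `map_to_root`, the model names and all parameters are hypothetical.

```python
import math

# Sketch of VTL-dependent feature mapping into a root-UBM (assumed FM-style formulation).
# GMMs are dicts with "weights" [K], "means" [K][D], "vars" [K][D]; values are hypothetical.

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_log_probs(x, gmm):
    """log w_k + log N(x; mu_k, Sigma_k) for every mixture component k."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])]

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood, using a stable log-sum-exp over components."""
    total = 0.0
    for x in frames:
        lp = component_log_probs(x, gmm)
        top = max(lp)
        total += top + math.log(sum(math.exp(v - top) for v in lp))
    return total / len(frames)

def map_to_root(frames, root_ubm, vtl_models):
    """Pick the best VTL-specific GMM for the utterance, then map each frame to the root-UBM."""
    best = max(vtl_models, key=lambda key: avg_log_likelihood(frames, vtl_models[key]))
    vtl = vtl_models[best]
    mapped = []
    for x in frames:
        lp = component_log_probs(x, vtl)
        i = max(range(len(lp)), key=lambda k: lp[k])  # top-1 component in the VTL model
        mapped.append([
            (xd - mv) * math.sqrt(rv / vv) + mr  # shift and rescale through component i
            for xd, mv, vv, mr, rv in zip(x, vtl["means"][i], vtl["vars"][i],
                                          root_ubm["means"][i], root_ubm["vars"][i])
        ])
    return best, mapped

root_ubm = {"weights": [0.5, 0.5], "means": [[-1.0], [1.0]], "vars": [[1.0], [1.0]]}
# Hypothetical VTL-specific GMMs, as if MAP-adapted (means only) from the root-UBM.
vtl_models = {
    "short_vtl": {"weights": [0.5, 0.5], "means": [[-0.5], [1.5]], "vars": [[1.0], [1.0]]},
    "long_vtl": {"weights": [0.5, 0.5], "means": [[-1.5], [0.5]], "vars": [[1.0], [1.0]]},
}

best, mapped = map_to_root([[1.6], [1.4], [-0.4]], root_ubm, vtl_models)
print(best, mapped)
```

In this toy setup the variances are left unchanged, so the mapping reduces to a per-component shift; the `sqrt(rv / vv)` factor only matters if variances were adapted as well. Speaker models would then be adapted from the root-UBM on the mapped frames, and every trial scored against that same root-UBM.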

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Use of VTL-wise models in feature-mapping framework to achieve performance of multiple-background models in speaker verification

Odyssey 2020 The Speaker and Language Recognition Workshop

Speaker Characterization Using TDNN, TDNN-LSTM, TDNN-LSTM-Attention based Speaker Embeddings for NIST SRE 2019

Journal	Proceedings of 16th National Conference on Communications, NCC 2010
Open Access	No