A shift-based approach to speaker normalization using non-linear frequency-scaling model
Published in Speech Communication (Elsevier)
2008
Volume: 50
Issue: 3
Pages: 191 - 202
Abstract
In this work, we present a speaker-normalization method based on the idea that the speaker-dependent scale factor can be separated out as a fixed translation factor in an alternate domain. We also introduce a non-linear frequency-scaling model motivated by the analysis of speech data. The proposed shift-based normalization approach is implemented using a maximum-likelihood (ML) search for the translation factor in the alternate domain. The advantage of our approach is that it shows the relationship between conventional frequency-warping-based vocal-tract length normalization (VTLN) methods and methods based on shifts in a psycho-acoustic scale, thus providing a unifying framework for speaker normalization. Additionally, in our approach it is simple to show that the shifting required for normalization can be expressed as a linear transformation in the cepstral domain. This is important for computational efficiency, since the features need not be recomputed by redoing the signal processing for each scale or translation factor, as is usually done in conventional normalization. We present recognition results for the proposed approach on a digit-recognition task and show that the non-linear scaling model provides a relative improvement of 4% for adults and 7.5% for children compared with the linear-scaling model. © 2007 Elsevier B.V. All rights reserved.
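The two central ideas of the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: a purely linear scaling model (for which the logarithm serves as the alternate domain), cepstra defined as an orthonormal DCT of a log-spectral vector with no truncation, and a linear-interpolation shift with clamped edges. The function names `dct_matrix`, `shift_matrix`, `cepstral_shift_transform`, and `log_domain_shift` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (assumed cepstral transform)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    D[0, :] /= np.sqrt(2.0)  # scale first row so that D @ D.T is the identity
    return D

def shift_matrix(n, shift):
    """Matrix S with (S @ x)[i] ~ x[i + shift], using linear interpolation.

    Positions outside [0, n - 1] are clamped to the nearest edge
    (an assumption made only for this sketch).
    """
    S = np.zeros((n, n))
    for i in range(n):
        pos = min(max(i + shift, 0.0), n - 1.0)
        lo = int(np.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        S[i, lo] += 1.0 - frac
        S[i, hi] += frac
    return S

def cepstral_shift_transform(n, shift):
    """Linear map T such that T @ (D @ x) equals D @ (S @ x).

    The shift is applied directly to cepstra, so the log-spectral
    vector x never has to be recomputed for each candidate shift.
    """
    D = dct_matrix(n)
    return D @ shift_matrix(n, shift) @ D.T

def log_domain_shift(freqs, alpha):
    """Under linear scaling f -> alpha * f, return the shift seen in log-frequency."""
    freqs = np.asarray(freqs, dtype=float)
    return np.log(alpha * freqs) - np.log(freqs)
```

Under these assumptions, `log_domain_shift` returns the same value, log(alpha), for every frequency, which is the sense in which a scale factor becomes a fixed translation. An ML search could then precompute `cepstral_shift_transform(n, s)` for each candidate shift `s` and apply it to existing cepstral vectors rather than repeating the front-end processing.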
About the journal
Journal: Speech Communication
Publisher: Elsevier
Open Access: No