Speaker recognition using pyramid match kernel based support vector machines
C. Chandra Sekhar
Published in: International Journal of Speech Technology (2012)
Volume: 15
Issue: 3
Pages: 365-379
Abstract
Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimating the parameters of GMMs include the expectation-maximization method, which is a non-discriminative learning method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as the generalized linear discriminant sequence kernel, the probabilistic sequence kernel, the GMM supervector kernel, the GMM-UBM mean interval (GUMI) kernel and the intermediate matching kernel. Recently, the pyramid match kernel (PMK), which uses grids in the feature space as histogram bins, and the vocabulary-guided PMK (VGPMK), which uses clusters in the feature space as histogram bins, have been proposed for recognizing objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel between a pair of examples is computed by comparing their pyramids using a weighted histogram intersection function at each level of the pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is the construction of the pyramid of histograms. We first propose to form hard clusters using the k-means clustering method, with an increasing number of clusters at different levels of the pyramid, to design the codebook-based PMK (CBPMK). We then propose the GMM-based PMK (GMMPMK), which uses soft clustering. We compare the performance of the GMM-based approaches with that of the PMK-based and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used to evaluate the different approaches to speaker identification and verification.
Results of our studies show that the dynamic kernel SVM-based approaches perform significantly better than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM performs better than SVMs using many other dynamic kernels and comparably to SVMs using the state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel. © 2012 Springer Science+Business Media, LLC.
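The pyramid construction and weighted histogram intersection described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The helper names (`kmeans`, `sq_dists`, `hard_histogram`, `soft_histogram`, `pyramid_match`) are hypothetical. The codebook sizes and the level weights (1, 1/2, 1/4, ..., borrowed from the original grid-based PMK) are illustrative choices. The soft-assignment function is a simplified stand-in for a trained GMM: it uses equal-weight isotropic Gaussians with a hypothetical shared variance `var` and does not fit a GMM by EM. Each utterance is a set of local feature vectors. At each level, the kernel counts only the matches that first appear at that level.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids (one codebook)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = sq_dists(data, centroids).argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def hard_histogram(X, C):
    """CBPMK-style bin counts: each vector votes for its nearest codeword."""
    labels = sq_dists(X, C).argmin(axis=1)
    return np.bincount(labels, minlength=len(C)).astype(float)


def soft_histogram(X, C, var=1.0):
    """GMMPMK-style soft counts: summed posteriors under equal-weight isotropic
    Gaussians centred on the codewords (a simplification, not an EM-trained GMM)."""
    logp = -sq_dists(X, C) / (2.0 * var)
    logp -= logp.max(axis=1, keepdims=True)  # numerical stability before exp
    post = np.exp(logp)
    post /= post.sum(axis=1, keepdims=True)  # each vector's posteriors sum to 1
    return post.sum(axis=0)


def pyramid_match(X, Y, codebooks, hist=hard_histogram):
    """Weighted sum of the new histogram-intersection matches at each level.
    `codebooks` is ordered from finest (most clusters) to coarsest (fewest)."""
    kernel, prev = 0.0, 0.0
    for level, C in enumerate(codebooks):
        inter = np.minimum(hist(X, C), hist(Y, C)).sum()  # histogram intersection
        kernel += (inter - prev) / 2 ** level  # assumed weights 1, 1/2, 1/4, ...
        prev = inter
    return kernel
```

For usage, one could train the codebooks on feature vectors pooled from the training utterances, for example `[kmeans(pooled, k) for k in (16, 8, 4, 2)]`. The next step is to fill a Gram matrix by calling `pyramid_match` on every pair of utterances and pass it to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Independently trained k-means codebooks are not nested, so the "new matches" at a level can be negative. However, expanding the sum shows that the kernel is a non-negative combination of per-level histogram intersection kernels whenever the weights decrease with level, so it remains a valid kernel.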
About the journal
Journal: International Journal of Speech Technology
ISSN: 1381-2416
Open Access: No
Concepts (31)
  •  Codebooks
  •  Discriminative classifiers
  •  Expectation-maximization method
  •  Feature space
  •  Feature vectors
  •  Gaussian mixture model
  •  In-buildings
  •  K-means clustering method
  •  Kernel methods
  •  Learning-based methods
  •  Linear discriminants
  •  Local feature vectors
  •  Multi-resolutions
  •  Number of clusters
  •  Probabilistic sequences
  •  Pyramid match kernel
  •  Soft clustering
  •  Speaker identification
  •  Speaker recognition
  •  Speaker verification
  •  Speech signals
  •  Storage requirements
  •  Supervector
  •  SVM classifiers
  •  Weighted histogram
  •  Graphic methods
  •  Image retrieval
  •  Loudspeakers
  •  Speech recognition
  •  Statistical methods
  •  Support vector machines