Improved chirp group delay based algorithms with applications to vocal tract estimation and speech recognition
M. K. Jayesh,
Published in Elsevier B.V.
2016
Volume: 81
Pages: 72 - 89
Abstract
In this paper we propose two algorithms for estimating the vocal tract from the Fourier transform phase of a given speech segment. In the first approach, we find the zeros of the z-transform, reflect all zeros lying outside the unit circle to the inside, and then compute the chirp group delay spectrum. This method eliminates many of the drawbacks of Bozkurt's CGDGCI method and models the spectral valleys well. For high-pitched sounds, however, the vocal tract estimate from this method is corrupted by source oscillations. In the second approach, by casting the problem within the framework of Independent Component Analysis, we propose a method in which these effects are considerably suppressed. ASR results on the TIMIT database using features derived from the first method are comparable to those obtained using MFCC features. Further improvement in recognition accuracy (compared with the baseline MFCC) was obtained by using a lattice combining technique, resulting in a Phone Error Rate of 17%. Also, by exploiting the ability to model spectral valleys well, we propose additional features that are able to distinguish the nasals /m/, /n/, and /ng/, which in turn leads to an increase in their recognition accuracy. © 2016
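The first approach described in the abstract (root finding, reflection of outside zeros, chirp group delay) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the analysis-circle radius of 1.05, and the frequency grid are assumptions chosen for illustration, and the abstract does not specify windowing or frame selection. The sketch uses the fact that a factor (1 - z_k/z) of the z-transform contributes Re(z_k / (z_k - z)) to the group delay at a point z on the analysis circle.

```python
import numpy as np

def reflect_zeros_inside(frame):
    """Return the zeros of X(z) = sum_n x[n] z^{-n}, with every zero that
    lies outside the unit circle moved to its conjugate reciprocal."""
    # np.roots expects the highest-power coefficient first, which for
    # z^(N-1) X(z) is simply the frame in its natural order.
    zeros = np.roots(np.asarray(frame, dtype=float)).astype(complex)
    outside = np.abs(zeros) > 1.0
    zeros[outside] = 1.0 / np.conj(zeros[outside])
    return zeros

def chirp_group_delay(zeros, radius=1.05, n_freq=256):
    """Group delay of prod_k (1 - z_k / z) evaluated on the circle
    z = radius * exp(j * omega). The radius value is an assumption."""
    omega = np.linspace(0.0, np.pi, n_freq)
    z = radius * np.exp(1j * omega)
    # Sum the per-zero contributions Re(z_k / (z_k - z)) over all zeros.
    tau = np.real(zeros[:, None] / (zeros[:, None] - z[None, :])).sum(axis=0)
    return omega, tau

# Hypothetical three-sample frame with zeros at 2.0 and 0.5.
frame = np.array([1.0, -2.5, 1.0])
zeros = reflect_zeros_inside(frame)
omega, cgd = chirp_group_delay(zeros)
```

Because all zeros lie on or inside the unit circle after reflection, a radius greater than 1 keeps the analysis circle away from every zero, so the summed terms stay finite.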
About the journal
Journal: Speech Communication
Publisher: Elsevier B.V.
ISSN: 0167-6393
Open Access: No
Concepts (12)
  •  Group delay
  •  Independent component analysis
  •  Z transforms
  •  Combining techniques
  •  Fourier transform phase
  •  Group delay spectrums
  •  Phase processing
  •  Phone error rate
  •  Recognition accuracy
  •  Speech segments
  •  Vocal-tracts
  •  Speech recognition