Using VTLN for broadcast news transcription
Kim D.Y., Gales M.J.F., Hain T., Woodland P.C.
Published in
2004
Pages: 1953 - 1956
Abstract
Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes as it is typically dependent on only a single parameter, allowing the warp factors to be robustly calculated on little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible to approximate the Jacobian associated with the VTLN transformation. A new, simple, linear approximation to VTLN is described in this paper. This linear approximation allows the Jacobian to be exactly computed. It can also be highly efficient in terms of warp factor estimation and application of the warp factors. Both the linear and standard CUED VTLN schemes were evaluated in the 2003 BNE evaluation framework and found to yield similar performance. When used in system combination both VTLN schemes yielded slight gains over the baseline system.
About the journal
Journal: 8th International Conference on Spoken Language Processing, ICSLP 2004
Open Access: No