Phone-cluster adaptive training (Phone-CAT) is a subspace-based acoustic modeling technique inspired by cluster adaptive training (CAT) and the subspace Gaussian mixture model (SGMM). This paper explores three extensions to the basic Phone-CAT model to improve its recognition performance: increasing the phonetic subspace dimension, including sub-states, and including a speaker subspace. The latter two extensions are implemented much as in the SGMM, since both acoustic models share a similar subspace framework. However, because the phonetic subspace dimension of Phone-CAT is constrained to equal the number of monophones, the first extension is not straightforward to implement. We propose a Two-stage Phone-CAT model in which the phonetic subspace dimension is increased to the number of monophone states. This model still retains the center-phone-capturing property of the state-specific vectors in basic Phone-CAT. Experiments on a 33-hour training subset of the Switchboard database show that the proposed extensions improve the recognition performance of the basic Phone-CAT model. © 2016 IEEE.

Srinivasan Umesh

Department of Electrical Engineering

Gaussian distribution

Linguistics

signal processing

Vectors

Acoustic model

Cluster adaptive training

Monophones

Subspace based

Subspace Gaussian mixture models

Telephone sets

IIT Madras

Improved phone-cluster adaptive training acoustic model

2016 International Conference on Signal Processing and Communications, SPCOM 2016

In this paper, we propose a new acoustic modeling technique called Phone-Cluster Adaptive Training. In this approach, the parameters of context-dependent states are obtained by linear interpolation of several monophone cluster models, which are themselves obtained by adapting a canonical Gaussian Mixture Model (GMM) with linear transformations. This approach is inspired by Cluster Adaptive Training (CAT) for speaker adaptation and the Subspace Gaussian Mixture Model (SGMM). The parameters of the model are updated in an adaptive training framework. The interpolation vectors implicitly capture the phonetic context information. The proposed approach shows substantial improvement over the Continuous Density Hidden Markov Model (CDHMM) and performance similar to that of the SGMM, while using significantly fewer parameters than both the CDHMM and the SGMM. © 2013 IEEE.
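The interpolation described in this abstract can be illustrated with a minimal sketch. It assumes that each monophone cluster mean is an affine transform of one canonical GMM component mean, and that a context-dependent state mean is a weighted sum of those cluster means. The function names, the `[A | b]` transform layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def affine_transform(w: Matrix, mean: Vector) -> Vector:
    """Apply an affine transform W = [A | b] to a mean vector: A @ mean + b."""
    extended = mean + [1.0]  # append 1 so the last column of W acts as the bias
    return [sum(w_rc * x for w_rc, x in zip(row, extended)) for row in w]


def cluster_means(transforms: List[Matrix], canonical_mean: Vector) -> List[Vector]:
    """One adapted mean per monophone cluster (assumed: one transform per monophone)."""
    return [affine_transform(w, canonical_mean) for w in transforms]


def state_mean(clusters: List[Vector], weights: Vector) -> Vector:
    """Linearly interpolate the cluster means with a state-specific weight vector."""
    if len(clusters) != len(weights):
        raise ValueError("need exactly one interpolation weight per cluster")
    dim = len(clusters[0])
    return [sum(v * c[d] for v, c in zip(weights, clusters)) for d in range(dim)]


# Toy setup: a 2-dimensional canonical mean and two monophone clusters.
canonical = [1.0, 2.0]
transforms = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # identity transform, zero bias
    [[2.0, 0.0, 1.0], [0.0, 2.0, -1.0]],  # scale by 2, bias (1, -1)
]
clusters = cluster_means(transforms, canonical)   # [[1.0, 2.0], [3.0, 3.0]]
mean = state_mean(clusters, [0.25, 0.75])         # [2.5, 2.75]
```

In this sketch the interpolation vector has one entry per monophone cluster, which mirrors the constraint that the phonetic subspace dimension equals the number of monophones.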

2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings

Acoustic modeling using transform-based phone-cluster adaptive training

One of the major problems in acoustic modeling for a low-resource language is data sparsity. In recent years, cross-lingual acoustic modeling techniques have been employed to overcome this problem. In this paper, we propose multiple cross-lingual techniques to address the problem of data insufficiency. The first method, which we call cross-lingual phone-CAT, uses the principles of phone-cluster adaptive training (phone-CAT), where the parameters of context-dependent states are obtained by linear interpolation of monophone cluster models. The second method uses the interpolation vectors of phone-CAT, which are known to capture phonetic context information, to map phonemes between two languages. Finally, the data-driven phoneme-mapping technique is incorporated into the cross-lingual phone-CAT to obtain what we call the phoneme-mapped cross-lingual phone-CAT. The proposed techniques are employed in acoustic modeling of three Indian languages, namely Bengali, Hindi and Tamil. The phoneme-mapped cross-lingual phone-CAT gave relative improvements of 15.14% for Bengali, 16.4% for Hindi and 11.3% for Tamil over the conventional cross-lingual subspace Gaussian mixture model (SGMM) in a low-resource scenario. © 2014 IEEE.

2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings

A data-driven phoneme mapping technique using interpolation vectors of phone-cluster adaptive training

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Journal	2016 International Conference on Signal Processing and Communications, SPCOM 2016
Publisher	Institute of Electrical and Electronics Engineers Inc.
Open Access	No