Articulatory feature extraction using CTC to build articulatory classifiers without forced frame alignments for speech recognition

Srinivasan Umesh

doi:10.21437/Interspeech.2016-925

Conferences

Open Access

Published in International Speech Communication Association

2016

DOI: 10.21437/Interspeech.2016-925

Volume: 08-12-September-2016

Pages: 798 - 802

Abstract

Articulatory features provide robustness to speaker and environment variability by incorporating speech production knowledge. Pseudo-articulatory features are articulatory features extracted with articulatory classifiers trained on speech data. A major problem in building articulatory classifiers is the need for speech data aligned with articulatory feature values at the frame level. Manually aligning data at the frame level is tedious, and alignments derived from phone alignments through a phone-to-articulatory-feature mapping are prone to errors. This paper proposes a technique that uses the connectionist temporal classification (CTC) criterion to train an articulatory classifier based on a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The CTC criterion eliminates the need for forced frame-level alignments. Articulatory classifiers were also built with frame-level alignments using other neural network architectures, namely deep neural networks (DNN), convolutional neural networks (CNN), and BLSTM, and were compared with the proposed CTC approach. Among these architectures, articulatory features extracted with BLSTM-based classifiers gave better recognition performance. Further, the proposed approach of BLSTM with CTC gave the best overall performance on both the SVitchboard (6 hours) and Switchboard (33 hours) data sets. Copyright ©2016 ISCA.
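
The claim that the CTC criterion removes the need for frame-level alignments can be illustrated with a small sketch. CTC scores a label sequence by summing the probability of every frame-level path that collapses to it, so no single alignment has to be supplied. The code below is not the paper's implementation: the symbol indices (0 as blank, 1 and 2 as two hypothetical articulatory feature values), the toy per-frame posteriors, and all function names are assumptions made for illustration only. It computes the CTC likelihood with the standard forward recursion and checks it against explicit enumeration of all paths.

```python
import itertools
import math

BLANK = 0  # index reserved for the CTC blank symbol (an assumption of this sketch)


def collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    merged = [k for k, _ in itertools.groupby(path)]
    return [k for k in merged if k != BLANK]


def ctc_likelihood(probs, target):
    """Forward recursion: total probability of all frame paths that collapse to target.

    probs[t][k] is the posterior of symbol k at frame t (each row sums to 1).
    """
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]  # target interleaved with blanks
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def brute_force(probs, target):
    """Enumerate every frame path explicitly (feasible only for tiny inputs)."""
    n_symbols = len(probs[0])
    total = 0.0
    for path in itertools.product(range(n_symbols), repeat=len(probs)):
        if collapse(path) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


# Toy posteriors for 4 frames over {blank, value 1, value 2}; made-up numbers.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
]
target = [1, 2]  # unaligned label sequence; no frame boundaries are given
nll = -math.log(ctc_likelihood(probs, target))  # the quantity a CTC loss minimizes
```

In practice a framework-provided CTC loss computed in the log domain would be used with the BLSTM outputs; the enumeration here only serves to show that the recursion accounts for all admissible alignments.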

About the journal

Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher	International Speech Communication Association
ISSN	2308457X
Open Access	Yes

Authors (1)

Srinivasan Umesh
- Department of Electrical Engineering

Concepts (19)

Alignment
Feature extraction
Network architecture
Neural networks
Recurrent neural networks
Speech
Speech communication
Speech processing
Speech recognition
Telephone sets
Articulatory features
BLSTM
Convolutional neural network
Deep neural networks
Long short-term memory
Recurrent neural network (RNN)
Speech production
Temporal classification
Classification (of information)
