Articulatory feature extraction using CTC to build articulatory classifiers without forced frame alignments for speech recognition
Published in International Speech and Communication Association
2016
Volume: 08-12-September-2016
Pages: 798-802
Abstract
Articulatory features provide robustness to speaker and environment variability by incorporating speech production knowledge. Pseudo-articulatory features are articulatory features extracted by articulatory classifiers trained on speech data. A major problem in building articulatory classifiers is the requirement for speech data aligned with articulatory feature values at the frame level. Manually aligning data at the frame level is tedious, and alignments derived from phone alignments through a phone-to-articulatory-feature mapping are prone to errors. This paper proposes a technique that uses the connectionist temporal classification (CTC) criterion to train an articulatory classifier based on a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The CTC criterion eliminates the need for forced frame-level alignments. Articulatory classifiers were also built with frame-level alignments using different neural network architectures, namely deep neural networks (DNN), convolutional neural networks (CNN) and BLSTM, and were compared with the proposed CTC approach. Among these architectures, articulatory features extracted using classifiers built with BLSTM gave better recognition performance. Further, the proposed approach of BLSTM with CTC gave the best overall performance on both the SVitchboard (6 hours) and Switchboard (33 hours) data sets. Copyright ©2016 ISCA.
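To illustrate why the CTC criterion can do without frame-level alignments, the following is a minimal sketch of the standard CTC forward recursion. It is not the authors' implementation. The function names, the choice of index 0 for the blank symbol, and the toy three-frame posteriors are all assumptions made for illustration. The sketch computes the probability of an unaligned label sequence by summing over every frame-level path that collapses to it, and compares the result against a brute-force enumeration of those paths.

```python
import math
from itertools import product

BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def collapse(path):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def ctc_label_probability(probs, labels):
    """P(labels | probs) summed over all alignments (forward recursion).

    probs[t][k] is the network posterior for symbol k at frame t.
    labels is the target sequence without blanks, e.g. [1, 2].
    """
    # Interleave blanks: [1, 2] -> [blank, 1, blank, 2, blank]
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    num_states = len(ext)

    # A path may start in the leading blank or in the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skipping the blank is allowed only between different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A path may end in the final label or in the trailing blank.
    return alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)


def brute_force(probs, labels):
    """Same quantity, computed by enumerating every frame-level path."""
    num_symbols = len(probs[0])
    total = 0.0
    for path in product(range(num_symbols), repeat=len(probs)):
        if collapse(path) == list(labels):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


if __name__ == "__main__":
    # Toy posteriors: 3 frames, symbols {0: blank, 1, 2} (hypothetical classes).
    toy = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]
    print(ctc_label_probability(toy, [1, 2]), brute_force(toy, [1, 2]))
```

The negative logarithm of this probability is the CTC training loss. Because it marginalizes over all alignments, only the unaligned label sequence is needed for each utterance.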
About the journal
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech and Communication Association
ISSN: 2308-457X
Open Access: Yes
Concepts (19)
  •  Alignment
  •  Feature extraction
  •  Network architecture
  •  Neural networks
  •  Recurrent neural networks
  •  Speech
  •  Speech communication
  •  Speech processing
  •  Speech recognition
  •  Telephone sets
  •  Articulatory features
  •  BLSTM
  •  Convolutional neural network
  •  Deep neural networks
  •  Long short-term memory
  •  Recurrent neural network (RNN)
  •  Speech production
  •  Temporal classification
  •  Classification (of information)