Discovering language in marmoset vocalization
, Nauman Dawalatabad, Sakshi Verma, K.L. Prateek, Karthik Pandia, Rogier Landman, Jitendra Sharma, Mriganka Sur
Published in International Speech Communication Association
2017
Volume: 2017-August
Pages: 2426 - 2430
Abstract
Various studies suggest that marmosets (Callithrix jacchus) behave similarly to humans in many respects. Analyzing their calls would not only help us better understand this species but would also give insights into the evolution of human language and the vocal tract. This paper describes a technique for discovering patterns in marmoset vocalization in an unsupervised fashion. The proposed unsupervised clustering approach operates in two stages. First, voice activity detection (VAD) is applied to remove silences and non-voiced regions from the audio; group-delay-based segmentation is then applied to the voiced regions to obtain smaller segments. In the second stage, two-tier clustering is performed on the resulting segments. An individual hidden Markov model (HMM) is built for each segment using multiple frame sizes and multiple frame rates, and the HMMs are clustered until each cluster contains a large number of segments. Once every cluster has enough segments, one Gaussian mixture model (GMM) is built per cluster, and these clusters are then merged using Kullback-Leibler (KL) divergence. The algorithm converges to the total number of distinct sounds in the audio, as evidenced by listening tests. Copyright © 2017 ISCA.
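The first stage removes silence before any segmentation is done. The abstract does not say which VAD the authors used, so the following is only a minimal sketch assuming a simple short-time energy threshold, a swapped-in technique rather than the paper's method. The function names, frame length, hop, and dB thresholds are hypothetical choices. Note that a pure energy rule separates sound from silence but does not by itself distinguish voiced from unvoiced sound.

```python
import math
from typing import List, Tuple


def frame_energies(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Short-time log energy (dB) of each full frame of the signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mean_power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(mean_power + 1e-12))  # avoid log(0)
    return energies


def energy_vad(signal: List[float], frame_len: int = 400, hop: int = 160,
               margin_db: float = 15.0, floor_db: float = -60.0) -> List[Tuple[int, int]]:
    """Hypothetical energy-based VAD: keep frames within margin_db of the
    loudest frame and above an absolute floor; return (start, end) sample
    ranges of the kept regions."""
    energies = frame_energies(signal, frame_len, hop)
    if not energies:
        return []
    threshold = max(max(energies) - margin_db, floor_db)
    regions: List[Tuple[int, int]] = []
    start = None  # index of the first frame in the current active region
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i
        elif e < threshold and start is not None:
            regions.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:  # region still open at the end of the signal
        regions.append((start * hop, (len(energies) - 1) * hop + frame_len))
    return regions
```

The kept regions would then be passed to the segmentation step; the frame sizes here (400 samples, 160-sample hop) correspond to 25 ms / 10 ms at 16 kHz and are assumptions, not values from the paper.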
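The final step merges clusters whose models are close in KL divergence. KL divergence between two GMMs has no closed form, so this minimal sketch assumes each cluster is summarized by a single diagonal-covariance Gaussian, for which KL is exact. The symmetric KL, the greedy closest-pair merging, and the stopping `threshold` are hypothetical choices for illustration, not the authors' procedure; `DiagGaussian`, `pool`, and `merge_clusters` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGaussian:
    count: int          # number of frames pooled into this cluster
    mean: List[float]
    var: List[float]    # diagonal covariance entries


def kl_diag(p: DiagGaussian, q: DiagGaussian) -> float:
    """Closed-form KL(p || q) for diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def sym_kl(p: DiagGaussian, q: DiagGaussian) -> float:
    """Symmetrized KL, so the merge order does not depend on argument order."""
    return kl_diag(p, q) + kl_diag(q, p)


def pool(p: DiagGaussian, q: DiagGaussian) -> DiagGaussian:
    """Moment-match the union of two clusters with one diagonal Gaussian."""
    n = p.count + q.count
    mean = [(p.count * a + q.count * b) / n for a, b in zip(p.mean, q.mean)]
    var = [
        (p.count * (va + a * a) + q.count * (vb + b * b)) / n - m * m
        for a, va, b, vb, m in zip(p.mean, p.var, q.mean, q.var, mean)
    ]
    return DiagGaussian(n, mean, var)


def merge_clusters(clusters: List[DiagGaussian], threshold: float) -> List[DiagGaussian]:
    """Repeatedly merge the closest pair until every pair is >= threshold apart."""
    clusters = list(clusters)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sym_kl(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= threshold:
            break
        merged = pool(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

With two nearly identical clusters and one distant cluster, a small threshold merges only the close pair, leaving two clusters; the number of surviving clusters plays the role of the discovered sound inventory.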
About the journal
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
ISSN: 2308-457X
Open Access: Yes
Concepts (17)
  •  Audio acoustics
  •  Gaussian distribution
  •  Group delay
  •  Hidden Markov models
  •  Image segmentation
  •  Markov processes
  •  Speech recognition
  •  Trellis codes
  •  Clustering
  •  Gaussian mixture model
  •  Hidden Markov models (HMMs)
  •  Kullback-Leibler divergence
  •  Marmoset vocalization
  •  Multiple frame sizes
  •  Unsupervised clustering
  •  Voice activity detection
  •  Speech communication