Discovering language in marmoset vocalization

Hema Murthy; Nauman Dawalatabad; Sakshi Verma; K.L. Prateek; Karthik Pandia; Rogier Landman; Jitendra Sharma; Mriganka Sur

doi:10.21437/Interspeech.2017-842

Published in International Speech Communication Association

2017

DOI: 10.21437/Interspeech.2017-842

Volume: 2017-August

Pages: 2426 - 2430

Abstract

Various studies suggest that marmosets (Callithrix jacchus) show behavior similar to that of humans in many respects. Analyzing their calls would not only help us better understand this species but would also give insight into the evolution of human languages and the vocal tract. This paper describes a technique for discovering patterns in marmoset vocalization in an unsupervised fashion. The proposed unsupervised clustering approach operates in two stages. In the first stage, voice activity detection (VAD) is applied to remove silences and non-voiced regions from the audio, followed by group-delay-based segmentation of the voiced regions to obtain smaller segments. In the second stage, a two-tier clustering is performed on the resulting segments. An individual hidden Markov model (HMM) is built for each segment using multiple frame sizes and multiple frame rates. The HMMs are then clustered until each cluster contains a large number of segments. Once every cluster has enough segments, one Gaussian mixture model (GMM) is built per cluster, and the clusters are merged using Kullback-Leibler (KL) divergence. The algorithm converges to the total number of distinct sounds in the audio, as evidenced by listening tests. Copyright © 2017 ISCA.
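
The final merging step in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the distance formula, the stopping rule, or the features, so everything below is an assumption. In particular, each cluster is modeled by a single diagonal Gaussian rather than a GMM (KL divergence between two GMMs has no closed form), the distance is a symmetrized KL, and `fit_gaussian`, `merge_clusters`, and the `threshold` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (mean, variance) to frames of shape (n_frames, n_dims)."""
    frames = np.asarray(frames, dtype=float)
    # Small variance floor (an assumption) keeps the log and division well defined.
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def kl_diag(p, q):
    """Closed-form KL(p || q) for two diagonal Gaussians given as (mean, variance)."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * np.sum(np.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)

def symmetric_kl(p, q):
    """Symmetrized KL, used here as the cluster-to-cluster distance."""
    return kl_diag(p, q) + kl_diag(q, p)

def merge_clusters(clusters, threshold):
    """Greedily merge the closest pair of clusters until none is closer than threshold."""
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    while len(clusters) > 1:
        models = [fit_gaussian(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = symmetric_kl(models[i], models[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= threshold:
            break  # remaining clusters are treated as distinct sounds
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for per-cluster feature frames (not real marmoset data).
    a = rng.normal(0.0, 1.0, size=(200, 4))
    b = rng.normal(0.0, 1.0, size=(200, 4))
    c = rng.normal(8.0, 1.0, size=(200, 4))
    print(len(merge_clusters([a, b, c], threshold=5.0)))
```

With the synthetic data above, the two clusters drawn from the same distribution are merged while the well-separated third one is kept apart. In a real pipeline the threshold would need tuning against listening tests or held-out labels.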


About the journal

Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher	International Speech Communication Association
ISSN	2308-457X
Open Access	Yes

Authors (1)

Hema Murthy
- Department of Computer Science and Engineering

Concepts (17)

Audio acoustics
Gaussian distribution
Group delay
Hidden Markov models
Image segmentation
Markov processes
Speech recognition
Trellis codes
Clustering
Gaussian mixture model
Hidden Markov models (HMMs)
Kullback-Leibler divergence
Marmoset vocalization
Multiple frame sizes
Unsupervised clustering
Voice activity detection
Speech communication
