CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-Supervised Learning of Speech Representations
Lodagala V.S., Ghosh S.,
Published in Institute of Electrical and Electronics Engineers Inc.
Pages: 1 - 8
While Self-Supervised Learning has helped reap the benefits of scale from the available unlabeled data, the learning paradigms are continuously being improved. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The cross-contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation, and vice versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets of LibriSpeech respectively, without the use of any language model. The proposed method also achieves up to 14.9% relative WER improvement over the baseline wav2vec 2.0 when fine-tuned on Switchboard data. © 2023 IEEE.
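The abstract describes a cross-contrastive objective: an InfoNCE-style loss computed between the encoder output of the original sample and the quantized targets of its augmentation, and vice versa, alongside the standard same-view term. Below is a minimal NumPy sketch of that idea; the function names, the use of cosine similarity, the temperature value, and the 0.5 weighting on the cross terms are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def contrastive_loss(context, positive, negatives, temperature=0.1):
    # InfoNCE-style loss: the context (encoder) vector should score higher
    # against the positive (quantized) target than against any negative.
    logits = [cosine_sim(context, positive) / temperature]
    logits += [cosine_sim(context, n) / temperature for n in negatives]
    logits = np.array(logits)
    # Negative log-softmax of the positive entry (index 0).
    return float(-logits[0] + np.log(np.exp(logits).sum()))

def cross_contrastive_loss(c_orig, q_orig, c_aug, q_aug,
                           negs_orig, negs_aug):
    # Standard term: original encoder output vs. its own quantized target.
    l_same = contrastive_loss(c_orig, q_orig, negs_orig)
    # Cross terms from the abstract: original encoder output vs. the
    # augmentation's quantizer output, and vice versa.
    l_cross1 = contrastive_loss(c_orig, q_aug, negs_aug)
    l_cross2 = contrastive_loss(c_aug, q_orig, negs_orig)
    # The 0.5 weighting of the cross terms is an illustrative choice.
    return l_same + 0.5 * (l_cross1 + l_cross2)
```

As a sanity check, the loss is small when the context vector aligns with its quantized target and large when it aligns with a negative instead; the clustering module described above would further down-weight negatives that are highly similar to the positive before they enter `negatives`.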
About the journal
Journal: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Open Access: No