ISCA Archive Interspeech 2021

Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021

Pablo Gimeno, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

In this paper, we describe the ViVoLab speech activity detection (SAD) system submitted to the Fearless Steps Challenge Phase III. Over the last few years, this series of challenges has proposed a number of speech processing tasks dealing with audio from the Apollo space missions. The focus of this edition is on the generalisation capabilities of the systems, with new evaluation data from different channels. Our submission is based on the unsupervised representation learning paradigm, seeking to obtain a new audio representation that is more discriminative than traditional perceptual features such as log Mel-filterbank energies. These new features are used to train different variations of a convolutional recurrent neural network (CRNN). Experimental results show that features learned via unsupervised learning provide a much more robust representation, significantly reducing the mismatch observed between results on the development and evaluation partitions. The results obtained largely outperform the organisation baseline, achieving a DCF metric of 2.98% on the evaluation set and ranking third among all the participating teams.
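
The DCF (detection cost function) figure quoted above combines the miss and false-alarm rates of a SAD system into a single cost. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: frame-level binary labels, no scoring collar, and weights of 0.75 on missed speech and 0.25 on false alarms, which is our understanding of the challenge's definition rather than something confirmed by this text. The function name `detection_cost` and its parameters are hypothetical.

```python
def detection_cost(reference, hypothesis, w_fn=0.75, w_fp=0.25):
    """Weighted detection cost for frame-level SAD labels (1 = speech, 0 = non-speech).

    The default weights are an assumption about the challenge metric,
    not taken from the abstract. No scoring collar is applied.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    n_speech = sum(1 for r in reference if r == 1)
    n_nonspeech = len(reference) - n_speech

    # False negatives: speech frames labelled as non-speech.
    fn = sum(1 for r, h in zip(reference, hypothesis) if r == 1 and h == 0)
    # False positives: non-speech frames labelled as speech.
    fp = sum(1 for r, h in zip(reference, hypothesis) if r == 0 and h == 1)

    # Guard against partitions that contain only one class.
    p_fn = fn / n_speech if n_speech else 0.0
    p_fp = fp / n_nonspeech if n_nonspeech else 0.0

    return w_fn * p_fn + w_fp * p_fp


# Toy example: one missed speech frame and one false alarm out of four each.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
hypothesis = [1, 1, 1, 0, 1, 0, 0, 0]
cost = detection_cost(reference, hypothesis)
```

The function returns a fraction; multiplying by 100 gives a percentage comparable in form to the value reported above. Weighting misses more heavily than false alarms means that a system which discards speech is penalised more than one which lets some non-speech through.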


doi: 10.21437/Interspeech.2021-309

Cite as: Gimeno, P., Ortega, A., Miguel, A., Lleida, E. (2021) Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021. Proc. Interspeech 2021, 4359-4363, doi: 10.21437/Interspeech.2021-309

@inproceedings{gimeno21_interspeech,
  author={Pablo Gimeno and Alfonso Ortega and Antonio Miguel and Eduardo Lleida},
  title={{Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4359--4363},
  doi={10.21437/Interspeech.2021-309}
}