Published October 27, 2020 | Version v1
Dataset Open

WinoST: Evaluating Gender Bias in Speech Translation

  • 1. Universitat Politècnica de Catalunya

Description

WinoST is a challenge set for evaluating gender bias in speech translation. It is the speech version of WinoMT (Stanovsky et al., 2019), a challenge set for machine translation (MT), and both follow an evaluation protocol that measures gender accuracy. WinoST consists of 3888 English audio recordings plus a text file with the text of these recordings. For further details, please refer to the publication "Evaluating Gender Bias in Speech Translation" (Costa-jussà et al., 2020). Please also cite this publication if you use this corpus.

Notes

IMPORTANT: The corpus is distributed under the MIT License, but the recordings may not be used for speech synthesis, text-to-speech, voice conversion, or any other application in which the speaker's voice is imitated or reproduced. This work is supported in part by the Spanish Ministerio de Ciencia e Innovación through the postdoctoral senior grant Ramón y Cajal.

Files

WinoST.zip (2.2 GB)
md5:9c4acd530794fb0b3fc069b04573e190
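
Since the archive is large, it may be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch, assuming the file was saved as WinoST.zip in the current directory; the function names md5_of_file and verify_archive are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for WinoST.zip.
EXPECTED_MD5 = "9c4acd530794fb0b3fc069b04573e190"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local file name; adjust to wherever the archive was saved.
    archive = Path("WinoST.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.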

Additional details

References

  • (Costa-jussà et al., 2020) Evaluating Gender Bias in Speech Translation, arXiv, 2020
  • (Stanovsky et al., 2019) Evaluating Gender Bias in Machine Translation, ACL, 2019