Gender and Age Bias in Commercial Machine Translation

Towards Responsible Machine Translation

Part of the book series: Machine Translation: Technologies and Applications (MATRA, volume 4)

Abstract

The main goal of Machine Translation (MT) has been to correctly convey the content of the source text in the target language. Stylistic considerations have been secondary at best. However, style carries information about the author’s identity. Because they largely overlook this aspect, three commercial MT systems (Bing, DeepL, Google) produce output that makes demographically diverse samples from five languages “sound” older and more male than the original texts. Our findings suggest that translation models reflect demographic bias in their training data. This bias can cause misunderstandings about unspoken assumptions and communication goals, which normally differ across demographic categories. These results open up new research avenues in MT that take stylistic considerations into account. We explore whether this bias can be used as a feature, by correcting skewed initial samples, and compute fairness scores for the different demographics.

All authors contributed equally and are listed alphabetically.

Notes

  1. https://www.theguardian.com/technology/2017/oct/24/facebook-palestine-israel-translates-good-morning-attack-them-arrest

  2. n-grams are sequences of n consecutive words in a text. They are frequently used as linguistic features in NLP; the most common are bigrams (n = 2) and trigrams (n = 3).

  3. https://translate.google.com/

  4. https://www.bing.com/translator

  5. https://www.deepl.com/en/translator

  6. https://www.deepl.com/quality.html

  7. Note that the KL is a divergence and not a distance measure, because it is not symmetric: in general, KL(P‖Q) ≠ KL(Q‖P). This difference is not important for our objective, but it is worth keeping in mind. Moreover, the KL divergence has no upper bound, while its lower bound is 0 (reached when the two distributions are equal).

  8. https://github.com/saffsd/langid.py

  9. https://www.cia.gov/library/publications/the-world-factbook/

  10. https://github.com/google-research/bert/blob/master/multilingual.md

  11. We use LS-BERT to refer to the language-specific BERT model for whichever language is being discussed.

  12. https://github.com/dbmdz/berts

  13. https://github.com/idb-ita/GilBERTo
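
The notes on n-grams and on the KL divergence can be made concrete with a small sketch. The code below is illustrative only and is not taken from the chapter: the helper names `ngrams` and `kl_divergence`, the example sentence, and the two toy gender distributions are all assumptions chosen for demonstration. It uses the standard definition KL(P‖Q) = Σ P(x) log(P(x)/Q(x)) with the natural logarithm.

```python
import math


def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as dicts.

    Assumes q[x] > 0 wherever p[x] > 0; terms with p[x] == 0 contribute 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


# Hypothetical example sentence, split on whitespace.
tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)   # n = 2
trigrams = ngrams(tokens, 3)  # n = 3

# Toy class distributions (made-up values, not results from the chapter).
original = {"female": 0.5, "male": 0.5}
translated = {"female": 0.3, "male": 0.7}

forward = kl_divergence(original, translated)
backward = kl_divergence(translated, original)

# Identical distributions give the lower bound of 0.
same = kl_divergence(original, original)

# As Q puts less and less mass on an outcome that P supports,
# the divergence grows without bound.
extreme = kl_divergence(original, {"female": 1 - 1e-9, "male": 1e-9})

print(bigrams[:2], trigrams[:1])
print(forward, backward, same, extreme)
```

Here `forward` and `backward` differ, which is the asymmetry the note describes, and `extreme` shows why no upper bound exists.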

Author information

Corresponding author

Correspondence to Dirk Hovy.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bianchi, F., Fornaciari, T., Hovy, D., Nozza, D. (2023). Gender and Age Bias in Commercial Machine Translation. In: Moniz, H., Parra Escartín, C. (eds) Towards Responsible Machine Translation. Machine Translation: Technologies and Applications, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-031-14689-3_9

  • DOI: https://doi.org/10.1007/978-3-031-14689-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14688-6

  • Online ISBN: 978-3-031-14689-3

  • eBook Packages: Computer Science, Computer Science (R0)
