Gender and Age Bias in Commercial Machine Translation

Bianchi, Federico; Fornaciari, Tommaso; Hovy, Dirk; Nozza, Debora

doi:10.1007/978-3-031-14689-3_9

Part of the book series: Machine Translation: Technologies and Applications (MATRA, volume 4)

Abstract

The main goal of Machine Translation (MT) has been to convey the content of the source text correctly in the target language. Stylistic considerations have been secondary at best. However, style carries information about the author’s identity. Because commercial MT systems largely overlook this aspect, the output of three of them (Bing, DeepL, Google) makes demographically diverse samples from five languages “sound” older and more male than the original texts. Our findings suggest that translation models reflect demographic bias in the training data. This bias can cause misunderstandings about unspoken assumptions and communication goals, which normally differ across demographic categories. These results open up new research avenues in MT that take stylistic considerations into account. We also explore whether this bias can be used as a feature, by correcting skewed initial samples, and we compute fairness scores for the different demographics.

All authors contributed equally and are listed alphabetically.

Notes

1. https://www.theguardian.com/technology/2017/oct/24/facebook-palestine-israel-translates-good-morning-attack-them-arrest.
2. n-grams are sequences of n consecutive words in a text. In NLP, they are frequently used as linguistic features; the most common are bigrams (n = 2) and trigrams (n = 3).
3. https://translate.google.com/.
4. https://www.bing.com/translator.
5. https://www.deepl.com/en/translator.
6. https://www.deepl.com/quality.html.
7. Note that the KL is a divergence and not a distance measure, because it is not symmetric: in general, KL(P‖Q) ≠ KL(Q‖P). This difference is not important for our objective, but it is worth remembering. Moreover, the KL divergence has no upper bound, while its lower bound is 0, which is reached when the two distributions are equal.
8. https://github.com/saffsd/langid.py.
9. https://www.cia.gov/library/publications/the-world-factbook/.
10. https://github.com/google-research/bert/blob/master/multilingual.md.
11. We will use LS-BERT to refer to the corresponding language-specific BERT applicable to the language being discussed.
12. https://github.com/dbmdz/berts.
13. https://github.com/idb-ita/GilBERTo.
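
Notes 2 and 7 can be made concrete with a small sketch. The Python snippet below is illustrative only and is not the chapter's implementation: the helper names `ngrams` and `kl_divergence`, the example sentence, and the two toy distributions are all assumptions introduced here. It extracts bigrams and trigrams from a tokenized sentence and shows that the KL divergence gives different values when its arguments are swapped.

```python
import math


def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions given as dicts.

    Assumes q[x] > 0 wherever p[x] > 0; outcomes with p[x] == 0 are skipped
    because they contribute nothing to the sum.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


# Whitespace tokenization is a simplification chosen for this example.
tokens = "the cat sat on the mat".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Two toy distributions over the same outcomes (made-up values).
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

kl_pq = kl_divergence(p, q)  # divergence of Q from P
kl_qp = kl_divergence(q, p)  # divergence of P from Q

print(bigrams)
print(trigrams)
print(kl_pq, kl_qp, kl_divergence(p, p))
```

Swapping the arguments changes the result, which is why the KL is called a divergence and not a distance, and comparing a distribution with itself gives the lower bound of 0. In practice the compared distributions would need smoothing or a shared support, since the sum is undefined when Q assigns zero probability to an outcome that P does not.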

Author information

Authors and Affiliations

Computing Sciences Department, Bocconi University, Milan, Italy
Dirk Hovy & Debora Nozza
Computer Science Department, Stanford University, Stanford, USA
Federico Bianchi
Italian National Police, Rome, Italy
Tommaso Fornaciari

Corresponding author

Correspondence to Dirk Hovy.

Editor information

Editors and Affiliations

School of Arts and Humanities, University of Lisbon, Lisbon, Portugal
Helena Moniz
Research & Development, RWS Language Weaver, Dublin, Ireland
Carla Parra Escartín

About this chapter

Cite this chapter

Bianchi, F., Fornaciari, T., Hovy, D., Nozza, D. (2023). Gender and Age Bias in Commercial Machine Translation. In: Moniz, H., Parra Escartín, C. (eds) Towards Responsible Machine Translation. Machine Translation: Technologies and Applications, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-031-14689-3_9

DOI: https://doi.org/10.1007/978-3-031-14689-3_9
Published: 25 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14688-6
Online ISBN: 978-3-031-14689-3
eBook Packages: Computer Science, Computer Science (R0)
