Abstract
The main goal of Machine Translation (MT) has been to correctly convey the content of the source text in the target language. Stylistic considerations have been at best secondary. However, style carries information about the author’s identity. Because this aspect is mostly overlooked, the output of three commercial MT systems (Bing, DeepL, Google) makes demographically diverse samples from five languages “sound” older and more male than the original texts. Our findings suggest that translation models reflect demographic bias in the training data. This bias can cause misunderstandings about unspoken assumptions and communication goals, which normally differ across demographic categories. These results open up new research avenues in MT that take stylistic considerations into account. We explore whether this bias can be used as a feature, by correcting skewed initial samples, and compute fairness scores for the different demographics.
All authors contributed equally and are listed alphabetically.
Notes
- 2. N-grams are sequences of n consecutive words in a text. They are frequently used as linguistic features in NLP; the most common are bigrams (n = 2) and trigrams (n = 3).
- 7. Note that the KL is a divergence and not a distance measure, because it is not symmetric: in general, KL(P‖Q) ≠ KL(Q‖P). This difference is not important for our objective, but it is worth remembering. Moreover, the KL divergence has no upper bound, while its lower bound is 0, attained when the two distributions are equal.
- 11. We will use LS-BERT to refer to the corresponding language-specific BERT applicable to the language being discussed.
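The n-gram definition given in the notes can be illustrated with a minimal sketch. The helper name `ngrams`, the whitespace tokenization, and the example sentence are assumptions made for illustration only; they are not taken from the chapter.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens, each as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical example sentence, split on whitespace for simplicity.
tokens = "the cat sat on the mat".split()

bigrams = ngrams(tokens, 2)    # n = 2
trigrams = ngrams(tokens, 3)   # n = 3

# Counting n-grams turns them into simple frequency features.
bigram_counts = Counter(bigrams)
```

A text with fewer than n tokens yields an empty list, so the helper needs no special case for short inputs.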
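The properties of the KL divergence mentioned in the notes (asymmetry, a lower bound of 0, no upper bound) can be checked numerically. The following is a minimal sketch for discrete distributions, using the natural logarithm; the function name `kl_divergence` and the example distributions are illustrative assumptions and do not reproduce the chapter's data or implementation.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions on the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over two categories.
p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kl_divergence(p, q)    # KL(P || Q)
kl_qp = kl_divergence(q, p)    # KL(Q || P), a different value
kl_pp = kl_divergence(p, p)    # identical distributions give 0

# As Q puts less and less mass on an outcome that P considers likely,
# the divergence grows without bound.
kl_extreme = kl_divergence(p, [1 - 1e-12, 1e-12])
```

Swapping the arguments changes the result, which is why the KL is called a divergence rather than a distance.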
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bianchi, F., Fornaciari, T., Hovy, D., Nozza, D. (2023). Gender and Age Bias in Commercial Machine Translation. In: Moniz, H., Parra Escartín, C. (eds) Towards Responsible Machine Translation. Machine Translation: Technologies and Applications, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-031-14689-3_9
DOI: https://doi.org/10.1007/978-3-031-14689-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14688-6
Online ISBN: 978-3-031-14689-3
eBook Packages: Computer Science, Computer Science (R0)