ABSTRACT
As virtual assistants continue to be adopted globally, there is an ever-greater need for these speech-based systems to communicate naturally in a variety of languages. Crowdsourcing initiatives have focused on multilingual translation of large, open data sets for use in natural language processing (NLP). Yet, language translation is often not one-to-one, and biases can trickle in. In this late-breaking work, we focus on the case of pronouns translated between English and Japanese in the crowdsourced Tatoeba database. We found that masculine pronoun biases were present overall, even though plurality in language was accounted for in other ways. Importantly, we detected biases in the translation process that reflect nuanced reactions to the presence of feminine, neutral, and/or non-binary pronouns. We raise the issue of translation bias for pronouns and offer a practical solution to embed plurality in NLP data sets.
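The abstract describes comparing pronouns across English-Japanese sentence pairs. As a rough illustration of that kind of analysis, here is a minimal Python sketch that tallies which Japanese pronoun forms appear alongside each English pronoun category. It is not the authors' pipeline, which this page does not describe. The lexicons, category labels, and function names are hypothetical. The example sentences are invented, not drawn from Tatoeba. The gender associations (for example, 僕 and 俺 as conventionally masculine-coded and 私 as relatively neutral) are deliberate simplifications. Because Japanese is written without spaces, the sketch uses greedy longest-match string scanning. A real pipeline would more likely use a morphological analyzer such as Janome, since plain substring matching also fires inside compounds (for example, 私 in 私立, "private").

```python
import re
from collections import Counter

# HYPOTHETICAL, simplified lexicons for illustration only; these are not
# the coding scheme used in the paper.
EN_PRONOUNS = {
    "i": "first_person", "me": "first_person", "my": "first_person",
    "he": "masculine", "him": "masculine", "his": "masculine",
    "she": "feminine", "her": "feminine", "hers": "feminine",
    "they": "neutral", "them": "neutral", "their": "neutral",
}

# Japanese forms with a rough, conventional gender association
# (a simplification). Sorted longest-first so that 彼女 (kanojo) and
# 彼ら (karera) are matched before the shorter 彼 (kare).
JA_PRONOUNS = sorted(
    [
        ("私", "neutral"),      # watashi
        ("わたし", "neutral"),
        ("あたし", "feminine"),
        ("僕", "masculine"),    # boku
        ("俺", "masculine"),    # ore
        ("彼", "masculine"),    # kare
        ("彼女", "feminine"),   # kanojo
        ("彼ら", "plural"),     # karera
    ],
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def english_pronouns(sentence):
    """Return the category of each English pronoun token, in order."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [EN_PRONOUNS[t] for t in tokens if t in EN_PRONOUNS]


def japanese_pronouns(sentence):
    """Greedy longest-match scan (Japanese has no spaces between words)."""
    found = []
    i = 0
    while i < len(sentence):
        for form, category in JA_PRONOUNS:
            if sentence.startswith(form, i):
                found.append(category)
                i += len(form)
                break
        else:  # no pronoun starts here; move one character forward
            i += 1
    return found


def cross_tabulate(pairs):
    """Count (English category, Japanese category) co-occurrences per pair.

    No word alignment is done: every English category in a sentence is
    paired with every Japanese category in its translation. Pairs whose
    Japanese side has no pronoun are recorded as "none".
    """
    counts = Counter()
    for en, ja in pairs:
        ja_categories = set(japanese_pronouns(ja)) or {"none"}
        for e in set(english_pronouns(en)):
            for j in ja_categories:
                counts[(e, j)] += 1
    return counts


# Invented toy pairs, NOT taken from Tatoeba.
PAIRS = [
    ("I'm a student.", "僕は学生です。"),
    ("I'm a student.", "私は学生です。"),
    ("She is a doctor.", "彼女は医者です。"),
    ("He is tall.", "彼は背が高い。"),
    ("I am hungry.", "お腹が空いた。"),
]

if __name__ == "__main__":
    for (en_cat, ja_cat), n in sorted(cross_tabulate(PAIRS).items()):
        print(f"{en_cat:>12} -> {ja_cat:<9} {n}")
```

Two design choices matter here. First, forms are sorted longest-first, so 彼女 is not also counted as 彼. Second, translations with no Japanese pronoun are recorded as "none", because Japanese frequently omits pronouns altogether. The English "I" in "I'm" illustrates the title's point: one gender-neutral English form can surface as several differently gendered Japanese forms, or as none at all.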
Footnotes
1 The Massively Multilingual NLU 2022 workshop (MMNLU-22 at EMNLP 2022: https://mmnlu-22.github.io)
3 https://arxiv.org/abs/2204.08582
5 https://tatoeba.org/en/stats/native_speakers
6 https://tatoeba.org/en/stats/users_languages
7 https://huggingface.co/datasets/tatoeba
8 https://pypi.org/project/tatoebatools
9 https://mocobeta.github.io/janome
11 https://www.nltk.org/book/ch05.html
“I'm” Lost in Translation: Pronoun Missteps in Crowdsourced Data Sets