Work in Progress

“I'm” Lost in Translation: Pronoun Missteps in Crowdsourced Data Sets

Authors:
Katie Seaborn
Department of Industrial Engineering and Economics, Tokyo Institute of Technology, Japan
ORCID: 0000-0002-7812-9096

Yeongdae Kim
Department of Industrial Engineering and Economics, Tokyo Institute of Technology, Japan
ORCID: 0000-0001-5346-0041
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, April 2023, Article No.: 168, Pages 1–6. https://doi.org/10.1145/3544549.3585667

Published: 19 April 2023

ABSTRACT

As virtual assistants continue to be taken up globally, there is an ever-greater need for these speech-based systems to communicate naturally in a variety of languages. Crowdsourcing initiatives have focused on multilingual translation of big, open data sets for use in natural language processing (NLP). Yet, language translation is often not one-to-one, and biases can trickle in. In this late-breaking work, we focus on the case of pronouns translated between English and Japanese in the crowdsourced Tatoeba database. We found that masculine pronoun biases were present overall, even though plurality in language was accounted for in other ways. Importantly, we detected biases in the translation process that reflect nuanced reactions to the presence of feminine, neutral, and/or non-binary pronouns. We raise the issue of translation bias for pronouns and offer a practical solution to embed plurality in NLP data sets.
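
The kind of check described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the pronoun lists, the three category labels, the function names, and the three sentence pairs are all hypothetical and chosen only for illustration, and the Japanese matching is naive substring matching rather than morphological analysis. The sketch tallies, for each English-Japanese pair, which pronoun category appears on the English side and which appears on the Japanese side, so that shifts such as neutral to masculine become countable.

```python
import re
from collections import Counter

# Hypothetical lexicons for illustration only; not the paper's word lists.
EN_PRONOUNS = {
    "masculine": {"he", "him", "his"},
    "feminine": {"she", "her", "hers"},
    "neutral": {"i", "me", "my", "they", "them", "their"},
}
# Naive patterns: "彼" is only counted when not followed by "女",
# so that "彼女" (she) is not also counted as "彼" (he).
JA_PATTERNS = {
    "masculine": re.compile(r"彼(?!女)|僕|俺"),
    "feminine": re.compile(r"彼女|あたし"),
    "neutral": re.compile(r"私"),
}


def english_categories(sentence):
    """Count pronoun categories among lowercase alphabetic tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(
        cat
        for tok in tokens
        for cat, words in EN_PRONOUNS.items()
        if tok in words
    )


def japanese_categories(sentence):
    """Count pronoun categories by pattern matches in the raw string."""
    counts = Counter()
    for cat, pattern in JA_PATTERNS.items():
        hits = len(pattern.findall(sentence))
        if hits:
            counts[cat] = hits
    return counts


def pair_shifts(pairs):
    """Tally (English category, Japanese category) co-occurrences per pair."""
    shifts = Counter()
    for en, ja in pairs:
        for src in english_categories(en):
            for tgt in japanese_categories(ja):
                shifts[(src, tgt)] += 1
    return shifts


# Made-up sentence pairs, not drawn from Tatoeba.
PAIRS = [
    ("I am a student.", "僕は学生です。"),
    ("She is a doctor.", "彼女は医者です。"),
    ("They left early.", "彼は早く出発した。"),
]

if __name__ == "__main__":
    for (src, tgt), n in sorted(pair_shifts(PAIRS).items()):
        print(f"{src} -> {tgt}: {n}")
```

A real analysis would need a Japanese tokenizer to avoid false matches inside longer words, and a much richer treatment of first-person forms, which this sketch reduces to a handful of characters.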

Footnotes

¹ The Massively Multilingual NLU 2022 workshop (MMNLU-22) at EMNLP 2022: https://mmnlu-22.github.io
² https://tatoeba.org
³ https://arxiv.org/abs/2204.08582
⁴ https://osf.io/8jmyc
⁵ https://tatoeba.org/en/stats/native_speakers
⁶ https://tatoeba.org/en/stats/users_languages
⁷ https://huggingface.co/datasets/tatoeba
⁸ https://pypi.org/project/tatoebatools
⁹ https://mocobeta.github.io/janome
¹⁰ https://jisho.org
¹¹ https://www.nltk.org/book/ch05.html

Supplemental Material

3544549.3585667-talk-video.mp4



Published in
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems
April 2023
3914 pages
ISBN: 9781450394222
DOI: 10.1145/3544549
Editors:
Albrecht Schmidt, LMU Munich, Germany
Kaisa Väänänen, Tampere University, Finland
Tesh Goyal, Google Research, USA
Per Ola Kristensson, University of Cambridge, UK
Anicia Peters, University of Namibia, Namibia
Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
Data sets
Feminist HCI
Gender bias
Natural language processing
Translation
Qualifiers
- Work in Progress
- Research
- Refereed limited