ABSTRACT
We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes in people's lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators (such as first names and pronouns) in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are "scrubbed," and describe proxy behavior that occurs in the absence of explicit gender indicators. As we demonstrate, differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances.
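The abstract's central quantitative claim is that between-gender differences in true positive rates (TPR) track each occupation's existing gender imbalance. The sketch below shows one plausible way to measure that relationship, under these assumptions: each prediction is a (true occupation, predicted occupation, gender) tuple with a binary "F"/"M" gender label; the per-occupation gap is TPR for women minus TPR for men; imbalance is the share of women in that occupation; and the two are compared with a Pearson correlation. The helper names (`tpr_gaps`, `female_share`, `pearson`) and the toy records are hypothetical illustrations. They are not the authors' code or data.

```python
from collections import defaultdict
from math import sqrt


def tpr_gaps(records):
    """Per-occupation TPR gap, TPR(women) - TPR(men).

    Hypothetical input format: iterable of (true_occ, pred_occ, gender)
    tuples with gender encoded as "F" or "M".
    """
    hits = defaultdict(int)    # (occupation, gender) -> correct predictions
    totals = defaultdict(int)  # (occupation, gender) -> people truly in occupation
    for true_occ, pred_occ, gender in records:
        totals[(true_occ, gender)] += 1
        if pred_occ == true_occ:
            hits[(true_occ, gender)] += 1
    gaps = {}
    for occ in {occ for occ, _ in totals}:
        # The gap is only defined when both groups appear in the occupation.
        if totals[(occ, "F")] and totals[(occ, "M")]:
            tpr_f = hits[(occ, "F")] / totals[(occ, "F")]
            tpr_m = hits[(occ, "M")] / totals[(occ, "M")]
            gaps[occ] = tpr_f - tpr_m
    return gaps


def female_share(records):
    """Fraction of people in each true occupation labeled "F"."""
    counts = defaultdict(lambda: [0, 0])  # occupation -> [women, total]
    for true_occ, _, gender in records:
        counts[true_occ][1] += 1
        if gender == "F":
            counts[true_occ][0] += 1
    return {occ: f / n for occ, (f, n) in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy records invented purely for illustration (not results from the study).
records = (
    [("nurse", "nurse", "F")] * 4
    + [("nurse", "physician", "M")]
    + [("surgeon", "nurse", "F")]
    + [("surgeon", "surgeon", "M")] * 4
    + [("professor", "professor", "F"), ("professor", "teacher", "F")]
    + [("professor", "professor", "M"), ("professor", "teacher", "M")]
)

gaps = tpr_gaps(records)
shares = female_share(records)
occs = sorted(gaps)
r = pearson([shares[o] for o in occs], [gaps[o] for o in occs])
print(f"correlation between share of women and TPR gap: {r:.3f}")
```

In this toy data the gap grows with the share of women, so the correlation is positive. A positive correlation means the classifier is most accurate for women in occupations where women already dominate, and least accurate for them where they are scarce. That is the mechanism by which such gaps could compound existing imbalances.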
Index Terms
- Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting