ABSTRACT
Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.
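Factor (3), inequality of representation within a concept, can be made concrete with a small measurement. The sketch below is illustrative only and is not the procedure described in this paper. It assumes that each image in a concept carries a hypothetical demographic `group` label. It reports each group's share of the concept, then downsamples every group to the size of the smallest one so that all groups are equally represented. The function names and the uniform target are assumptions for illustration; other target distributions are equally possible.

```python
import random
from collections import Counter, defaultdict


def group_proportions(labels):
    """Return the fraction of images in each (hypothetical) demographic group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def rebalance_uniform(images, seed=0):
    """Downsample so every group in a concept contributes the same number of images.

    `images` is a list of (image_id, group) pairs. The group labels are assumed
    to come from some prior annotation step and are hypothetical here.
    """
    by_group = defaultdict(list)
    for image_id, group in images:
        by_group[group].append(image_id)
    if not by_group:
        return []
    # The smallest group sets the per-group budget for a uniform distribution.
    k = min(len(ids) for ids in by_group.values())
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    balanced = []
    for group in sorted(by_group):
        for image_id in rng.sample(by_group[group], k):
            balanced.append((image_id, group))
    return balanced


if __name__ == "__main__":
    concept = [("img1", "a"), ("img2", "a"), ("img3", "a"), ("img4", "b")]
    print(group_proportions([g for _, g in concept]))  # {'a': 0.75, 'b': 0.25}
    print(len(rebalance_uniform(concept)))  # 2
```

Downsampling to the smallest group is the simplest way to equalize counts, but it discards data; when a group is very small, the balanced concept may become too small to be useful, which is one reason a configurable target distribution can be preferable.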
Index Terms
- Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy