ABSTRACT
Thanks to the growth of computational power and data availability, machine learning research has advanced rapidly, and the majority of automated decision-making systems are now based on data. However, it is well known that machine learning systems can produce problematic results if they are built on partial or incomplete data. In recent years, several studies have traced issues with the ethics and transparency of these systems to the way data are collected and recorded. Although rigorous data collection and analysis are fundamental to model design, this step is still largely overlooked by the machine learning community. For this reason, we propose a data annotation method based on Bayesian statistical inference that aims to warn about the risk of discriminatory results from a given data set. In particular, our method aims to deepen knowledge and promote awareness of the sampling practices used to create the training set, highlighting that the probability of success or failure conditioned on minority membership is determined by the structure of the available data. We empirically test our system on three datasets commonly used by the machine learning community and investigate the risk of racial discrimination.
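To make the idea of a probability of success conditioned on group membership concrete, the following minimal sketch estimates it from labeled records. It is not the authors' annotation method: the Beta prior, the function name `conditional_outcome_rates`, and the toy records are assumptions introduced purely for illustration.

```python
from collections import Counter


def conditional_outcome_rates(records, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(success | group) under a Beta(prior_a, prior_b) prior.

    `records` is an iterable of (group, outcome) pairs, with outcome 1 for a
    favourable result and 0 otherwise. The prior is an illustrative assumption.
    """
    successes = Counter()
    totals = Counter()
    for group, outcome in records:
        totals[group] += 1
        successes[group] += int(outcome)
    # Beta-Binomial update: posterior mean = (successes + a) / (n + a + b).
    return {
        group: (successes[group] + prior_a) / (totals[group] + prior_a + prior_b)
        for group in totals
    }


# Made-up toy data: group "B" is both under-represented and less often successful.
records = [("A", 1)] * 60 + [("A", 0)] * 40 + [("B", 1)] * 3 + [("B", 0)] * 7
rates = conditional_outcome_rates(records)
for group, rate in sorted(rates.items()):
    print(f"P(success | group={group}) ~ {rate:.3f}")
```

In this toy sample the estimate for the smaller group rests on only ten records, which illustrates why the sampling practices behind a training set matter: the conditional probabilities a model can learn are fixed by the structure of the data it is given.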
Index Terms
- Detecting discriminatory risk through data annotation based on Bayesian inferences