ABSTRACT
A learning algorithm A trained on a dataset D is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible: perhaps training A on a more representative dataset D' would have improved the performance. But it can equally be argued that A itself is at fault: perhaps training a different variant A' on the same dataset D would have improved performance. As ML becomes widespread and such failure cases more common, these questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm A and a dataset D. We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
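The abstract builds on the classical Shapley value, which splits a jointly produced outcome among the contributors by averaging each contributor's marginal contribution over all join orders. The sketch below is not the paper's Extended Shapley (defined in the full text); it is a minimal illustration of the underlying two-player decomposition, with the algorithm A and the dataset D as the players and entirely hypothetical performance numbers as the value function.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values for a small player set: average each
    player's marginal contribution over all join orders."""
    contrib = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = []
        prev = value(frozenset())
        for p in order:
            coalition.append(p)
            cur = value(frozenset(coalition))
            contrib[p] += cur - prev
            prev = cur
    return {p: c / len(perms) for p, c in contrib.items()}

# Hypothetical accuracies: chance level with neither player, partial
# credit for the algorithm or the data alone, 0.9 for both together.
perf = {
    frozenset(): 0.5,
    frozenset({"A"}): 0.6,
    frozenset({"D"}): 0.7,
    frozenset({"A", "D"}): 0.9,
}
phi = shapley_values(["A", "D"], perf.__getitem__)
# phi["A"] + phi["D"] equals perf gain over the empty coalition (0.4).
```

Under these made-up numbers the decomposition credits D more than A, which is exactly the kind of quantitative answer to "who is responsible?" that the paper's framework generalizes.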
Supplemental Material
Files: supp.pdf — a PDF containing supplementary material.
Index Terms
- Who's Responsible? Jointly Quantifying the Contribution of the Learning Algorithm and Data