ABSTRACT
A learning algorithm A trained on a dataset D is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible: perhaps training A on a more representative dataset D' would have improved the performance. But it can equally be argued that A itself is at fault: perhaps training a different variant A' on the same dataset D would have improved performance. As ML becomes widespread and such failure cases more common, these questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm A and a dataset D. We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.
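The abstract builds on the classical Shapley value, which splits a jointly produced outcome among the contributors by averaging each contributor's marginal contribution over all join orders. The sketch below is not the paper's Extended Shapley (defined in the full text); it is a minimal illustration of the underlying two-player decomposition, with the algorithm A and the dataset D as the players and entirely hypothetical performance numbers as the value function.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values for a small player set: average each
    player's marginal contribution over all join orders."""
    contrib = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = []
        prev = value(frozenset())
        for p in order:
            coalition.append(p)
            cur = value(frozenset(coalition))
            contrib[p] += cur - prev
            prev = cur
    return {p: c / len(perms) for p, c in contrib.items()}

# Hypothetical accuracies: chance level with neither player, partial
# credit for the algorithm or the data alone, 0.9 for both together.
perf = {
    frozenset(): 0.5,
    frozenset({"A"}): 0.6,
    frozenset({"D"}): 0.7,
    frozenset({"A", "D"}): 0.9,
}
phi = shapley_values(["A", "D"], perf.__getitem__)
# phi["A"] + phi["D"] equals perf gain over the empty coalition (0.4).
```

Under these made-up numbers the decomposition credits D more than A, which is exactly the kind of quantitative answer to "who is responsible?" that the paper's framework generalizes.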
Supplemental Material
Files: supp.pdf — a PDF containing supplementary material.
Index Terms
- Who's Responsible? Jointly Quantifying the Contribution of the Learning Algorithm and Data