DOI: 10.1145/3461702.3462574
research-article

Who's Responsible? Jointly Quantifying the Contribution of the Learning Algorithm and Data

Published: 30 July 2021

ABSTRACT

A learning algorithm A trained on a dataset D is revealed to have poor performance on some subpopulation at test time. Where should the responsibility for this lie? It can be argued that the data is responsible if, for example, training A on a more representative dataset D' would have improved the performance. But it can similarly be argued that A itself is at fault if training a different variant A' on the same dataset D would have improved performance. As ML becomes widespread and such failure cases become more common, these types of questions are proving to be far from hypothetical. With this motivation in mind, in this work we provide a rigorous formulation of the joint credit assignment problem between a learning algorithm A and a dataset D. We propose Extended Shapley as a principled framework for this problem and experiment with how it can be used to address questions of ML accountability.
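
The abstract names Extended Shapley but does not define it. As background only, the sketch below computes the classic Shapley value (Shapley, 1953) for a small cooperative game in which the players are training points and the learning algorithm is held fixed. It is not the paper's Extended Shapley, which per the abstract assigns credit jointly to the algorithm and the data. The function names `shapley_values` and `toy_performance`, and the toy scores, are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact classic Shapley values for a small cooperative game.

    players: list of hashable player ids (here: training points).
    value:   function mapping a frozenset of players to a real number
             (here: a stand-in for test performance).
    Cost grows exponentially with len(players); use only for tiny games.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_performance(coalition):
    """Hypothetical performance score, not taken from the paper.

    Point "a" is useful on its own; "b" and "c" only help together.
    """
    score = 0.0
    if "a" in coalition:
        score += 0.5
    if "b" in coalition and "c" in coalition:
        score += 0.2
    return score


if __name__ == "__main__":
    phi = shapley_values(["a", "b", "c"], toy_performance)
    for player, credit in phi.items():
        print(player, round(credit, 3))
```

In this toy game the credits sum to the performance of the full dataset (the efficiency property of the Shapley value), and the two points that are only useful together receive equal credit (symmetry).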



      Published in

        AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
        July 2021
        1077 pages
        ISBN: 9781450384735
        DOI: 10.1145/3461702

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 61 of 162 submissions, 38%
