DOI: 10.1145/3514094.3534157

FairCanary: Rapid Continuous Explainable Fairness

Published: 27 July 2022

ABSTRACT

Systems that offer continuous model monitoring have emerged in response to (1) well-documented failures of deployed Machine Learning (ML) and Artificial Intelligence (AI) models and (2) new regulatory requirements impacting these models. Existing monitoring systems continuously track the performance of deployed ML models and compute feature importance (a.k.a. explanations) for each prediction to help developers identify the root causes of emergent model performance problems.

We present Quantile Demographic Drift (QDD), a novel model bias quantification metric that uses quantile binning to measure differences in the overall prediction distributions over subgroups. QDD is ideal for continuous monitoring scenarios, does not suffer from the statistical limitations of conventional threshold-based bias metrics, and does not require outcome labels (which may not be available at runtime). We incorporate QDD into a continuous model monitoring system, called FairCanary, that reuses existing explanations computed for each individual prediction to quickly compute explanations for the QDD bias metrics. This optimization makes FairCanary an order of magnitude faster than previous work that has tried to generate feature-level bias explanations.
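The abstract does not spell out how the quantile comparison is computed, so the sketch below is a rough illustration only, not the authors' reference implementation: it compares two subgroups' prediction score distributions at a fixed set of quantile levels and averages the absolute per-quantile gaps. The function name, bin count, and synthetic data are all hypothetical.

import numpy as np

def quantile_demographic_drift(scores_a, scores_b, n_bins=10):
    """Hypothetical sketch of a quantile-binned comparison of two subgroups'
    prediction distributions (not the FairCanary reference implementation).
    For each of n_bins quantile levels, compare the corresponding quantile
    of each group's score distribution and average the absolute gaps."""
    qs = (np.arange(n_bins) + 0.5) / n_bins    # bin-center quantile levels, e.g. 0.05 ... 0.95
    quantiles_a = np.quantile(scores_a, qs)    # per-bin quantiles for group A
    quantiles_b = np.quantile(scores_b, qs)    # per-bin quantiles for group B
    per_bin_gap = quantiles_a - quantiles_b    # signed gap in each quantile bin
    return per_bin_gap, float(np.mean(np.abs(per_bin_gap)))

# Usage example on synthetic scores drawn from slightly shifted distributions.
rng = np.random.default_rng(0)
scores_group_a = rng.beta(2, 5, size=5_000)
scores_group_b = rng.beta(2, 4, size=5_000)
gaps, qdd = quantile_demographic_drift(scores_group_a, scores_group_b)
print(f"QDD-style score: {qdd:.4f}")

Because such a comparison operates on the full score distributions rather than on thresholded decisions, it needs neither a decision threshold nor outcome labels, which is consistent with the abstract's claim that QDD can be computed at runtime without labels.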


Supplemental Material

AIES22-fp098.mp4 (video, 55.2 MB)


Published in

AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society
July 2022, 939 pages
ISBN: 9781450392471
DOI: 10.1145/3514094
Copyright © 2022 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 61 of 162 submissions, 38%
