ABSTRACT
Systems that offer continuous model monitoring have emerged in response to (1) well-documented failures of deployed Machine Learning (ML) and Artificial Intelligence (AI) models and (2) new regulatory requirements impacting these models. Existing monitoring systems continuously track the performance of deployed ML models and compute feature importance (a.k.a. explanations) for each prediction to help developers identify the root causes of emergent model performance problems.
We present Quantile Demographic Drift (QDD), a novel model bias quantification metric that uses quantile binning to measure differences in the overall prediction distributions over subgroups. QDD is ideal for continuous monitoring scenarios, does not suffer from the statistical limitations of conventional threshold-based bias metrics, and does not require outcome labels (which may not be available at runtime). We incorporate QDD into a continuous model monitoring system, called FairCanary, that reuses existing explanations computed for each individual prediction to quickly compute explanations for the QDD bias metrics. This optimization makes FairCanary an order of magnitude faster than previous work that has tried to generate feature-level bias explanations.
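To make the quantile-binning idea concrete, below is a minimal Python sketch of one plausible reading of QDD. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `qdd`, the choice of 10 bins, and the per-bin comparison of mean scores are our own illustrative choices. The sketch only needs the model's raw prediction scores for two subgroups, which reflects the abstract's point that QDD requires no outcome labels.

```python
import numpy as np

def qdd(scores_a, scores_b, n_bins=10):
    """Hypothetical sketch of Quantile Demographic Drift (QDD).

    Splits each subgroup's prediction scores into n_bins quantile bins
    and returns the per-bin difference between mean scores, so a gap in
    any region of the two prediction distributions is surfaced without
    needing ground-truth outcome labels.
    """
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)
    edges_a = np.quantile(scores_a, quantiles)  # each group is binned by
    edges_b = np.quantile(scores_b, quantiles)  # its own quantile edges
    diffs = []
    for i in range(n_bins):
        bin_a = scores_a[(scores_a >= edges_a[i]) & (scores_a <= edges_a[i + 1])]
        bin_b = scores_b[(scores_b >= edges_b[i]) & (scores_b <= edges_b[i + 1])]
        diffs.append(bin_a.mean() - bin_b.mean())
    return np.array(diffs)

# Usage: compare model scores for two demographic subgroups (synthetic data).
rng = np.random.default_rng(0)
group_a = rng.beta(2.0, 5.0, size=1000)  # scores for subgroup A
group_b = rng.beta(2.5, 5.0, size=1000)  # scores for subgroup B
print(qdd(group_a, group_b))             # one drift value per quantile bin
```

Because it compares whole score distributions bin by bin rather than counting outcomes above a single decision threshold, a metric of this shape avoids the statistical brittleness of threshold-based bias measures that the abstract alludes to, and it can be recomputed on each monitoring window as new predictions stream in.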