ABSTRACT
To address the growing concern of unfairness in Artificial Intelligence (AI), prior research has introduced several bias mitigation algorithms. Their capabilities, however, are often evaluated on a handful of over-used benchmark datasets without rigorously stress-testing them under simultaneous shifts in the training and test distributions. To address this, we investigate the fairness vulnerabilities of these algorithms across several distribution shift scenarios using synthetic data, highlighting the scenarios in which these algorithms do and do not work in order to encourage their trustworthy use. The paper makes three contributions. Firstly, we propose a flexible pipeline, the Fairness Auditor, for systematically stress-testing bias mitigation algorithms using multiple synthetic datasets with shifts. Secondly, we introduce the Deviation Metric for measuring the fairness and utility performance of these algorithms under such shifts. Thirdly, we propose the Fairness Report, an interactive reporting tool for comparing algorithmic performance across synthetic datasets, mitigation algorithms, and metrics.
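The abstract does not define the Deviation Metric or the shift scenarios, so the following is only a hypothetical sketch of the general idea: generate a synthetic baseline dataset and a shifted variant (here the score distribution of one group drifts), evaluate a fairness metric (demographic parity difference) on both, and report the absolute change as a "deviation". All names and parameters are illustrative assumptions, not the paper's actual pipeline.

```python
import random

def make_synthetic(n, p_group, g0_shift, seed):
    """Toy synthetic dataset of (group, score) pairs.
    p_group sets the demographic mix; g0_shift drifts group 0's
    score distribution, simulating a distribution shift at test time."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g = 1 if rng.random() < p_group else 0
        # scores differ by group to create a fairness gap
        score = rng.random() + (0.1 if g == 1 else g0_shift)
        data.append((g, score))
    return data

def positive_rate(data, group, threshold=0.5):
    """Fraction of a group classified positive by a fixed threshold."""
    scores = [s for g, s in data if g == group]
    return sum(s > threshold for s in scores) / len(scores)

def demographic_parity_diff(data):
    """Absolute gap in positive rates between the two groups."""
    return abs(positive_rate(data, 1) - positive_rate(data, 0))

def deviation(metric, baseline, shifted):
    """Hypothetical deviation score: absolute change of a fairness
    (or utility) metric between unshifted and shifted test sets."""
    return abs(metric(baseline) - metric(shifted))

baseline = make_synthetic(5000, p_group=0.5, g0_shift=0.0, seed=0)
shifted = make_synthetic(5000, p_group=0.5, g0_shift=0.3, seed=1)

dev = deviation(demographic_parity_diff, baseline, shifted)
print(f"fairness deviation under shift: {dev:.3f}")
```

A mitigation algorithm that looks fair on the baseline but shows a large deviation under shift would be flagged as vulnerable in that scenario.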
Stress-Testing Bias Mitigation Algorithms to Understand Fairness Vulnerabilities