research-article
Open Access

Stress-Testing Bias Mitigation Algorithms to Understand Fairness Vulnerabilities

Published: 29 August 2023

ABSTRACT

To address growing concern about unfairness in Artificial Intelligence (AI), prior research has introduced several bias mitigation algorithms. Their capabilities are typically evaluated on a handful of overused datasets, without rigorous stress-testing under simultaneous shifts in the training and test distributions. To address this gap, we investigate the fairness vulnerabilities of these algorithms across several distribution shift scenarios using synthetic data, highlighting where the algorithms do and do not work so that they can be used with appropriate trust. The paper makes three contributions. First, we propose the Fairness Auditor, a flexible pipeline that systematically stress-tests bias mitigation algorithms on multiple synthetic datasets with distribution shifts. Second, we introduce the Deviation Metric, which measures the fairness and utility performance of these algorithms under such shifts. Third, we propose the Fairness Report, an interactive reporting tool for comparing algorithmic performance across synthetic datasets, mitigation algorithms, and metrics.
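The stress-testing idea in the abstract can be sketched in a few lines: fit a model on unshifted synthetic data, record a baseline fairness measure, then re-evaluate the same model on synthetic test sets with progressively larger covariate shift and track how far fairness drifts from the baseline. The data generator, threshold model, and deviation computation below are illustrative assumptions for this sketch; they are not the paper's actual Fairness Auditor pipeline or its formal Deviation Metric.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    # Protected attribute A in {0, 1}; feature X depends on A plus a
    # controllable mean shift; label Y is a noisy threshold on X.
    a = rng.integers(0, 2, n)
    x = rng.normal(loc=a * 0.5 + shift, scale=1.0, size=n)
    y = (x + rng.normal(0.0, 0.5, n) > 0.5).astype(int)
    return x, a, y

def dem_parity_diff(y_pred, a):
    # Demographic parity difference: |P(Yhat=1 | A=1) - P(Yhat=1 | A=0)|.
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

# "Train" a one-feature threshold classifier on unshifted data.
x_tr, a_tr, y_tr = make_data(5000)
thresholds = np.linspace(-2, 2, 81)
accs = [((x_tr > t).astype(int) == y_tr).mean() for t in thresholds]
t_star = thresholds[int(np.argmax(accs))]

# Baseline fairness on fresh in-distribution test data.
x_te, a_te, y_te = make_data(5000)
base_dpd = dem_parity_diff((x_te > t_star).astype(int), a_te)

# Stress test: evaluate the same fixed model under increasing covariate
# shift and record the deviation of fairness from the baseline.
deviations = []
for shift in [0.0, 0.5, 1.0]:
    x_s, a_s, _ = make_data(5000, shift=shift)
    dpd = dem_parity_diff((x_s > t_star).astype(int), a_s)
    deviations.append(abs(dpd - base_dpd))
```

Under this setup, the same model can look fair in distribution yet drift under shift, which is the kind of vulnerability the stress tests are designed to surface.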



Published in

AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society
August 2023, 1026 pages
ISBN: 9798400702310
DOI: 10.1145/3600211

Copyright © 2023 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article; refereed limited

Overall acceptance rate: 61 of 162 submissions, 38%