DOI: 10.1145/3468264.3468536 (ESEC/FSE Conference Proceedings)

Fair preprocessing: towards understanding compositional fairness of data transformers in machine learning pipeline

Published: 18 August 2021

ABSTRACT

In recent years, many incidents have been reported in which machine learning models discriminated against people based on race, sex, age, etc. Research has been conducted to measure and mitigate unfairness in machine learning models. For a machine learning task, it is common practice to build a pipeline that consists of an ordered set of data preprocessing stages followed by a classifier. However, most fairness research has considered prediction tasks based on a single classifier. What are the fairness impacts of the preprocessing stages in a machine learning pipeline? Furthermore, studies have shown that the root cause of unfairness is often ingrained in the data itself, rather than in the model; yet no research has measured the unfairness caused by a specific transformation made in the data preprocessing stage. In this paper, we introduced a causal method of fairness to reason about the fairness impact of data preprocessing stages in the ML pipeline. We leveraged existing metrics to define fairness measures of the stages. We then conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources. Our results show that certain data transformers cause the model to exhibit unfairness. We identified a number of fairness patterns in several categories of data transformers. Finally, we showed how the local fairness of a preprocessing stage composes into the global fairness of the pipeline, and we used this fairness composition to choose appropriate downstream transformers that mitigate unfairness in the machine learning pipeline.
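The abstract mentions leveraging existing metrics to measure the fairness of pipeline stages. As a rough illustration of one such standard metric (not the paper's own method), the sketch below computes disparate impact, the ratio of favorable-outcome rates between the unprivileged and privileged groups; the data and the 0.8 threshold convention are illustrative assumptions.

```python
import numpy as np

def disparate_impact(y_pred, protected):
    """Ratio of favorable-outcome rates:
    P(y_pred = 1 | unprivileged) / P(y_pred = 1 | privileged).
    `protected` is 1 for the privileged group, 0 otherwise."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    rate_unpriv = y_pred[protected == 0].mean()
    rate_priv = y_pred[protected == 1].mean()
    return rate_unpriv / rate_priv

# Hypothetical predictions for eight individuals.
y_pred    = np.array([1, 0, 1, 0, 1, 1, 1, 1])
protected = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(disparate_impact(y_pred, protected))  # 0.5, below the common 0.8 threshold
```

Computing such a metric on model predictions with and without a given preprocessing stage is one way to attribute a fairness change to that stage.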

