DOI: 10.1145/3593013.3594008
research-article

Domain Adaptive Decision Trees: Implications for Accuracy and Fairness

Published: 12 June 2023

ABSTRACT

In uses of pre-trained machine learning models, it is a known issue that the target population in which the model is being deployed may not have been reflected in the source population with which the model was trained. This can result in a biased model when deployed, leading to a reduction in model performance. One risk is that, as the population changes, certain demographic groups will be under-served or otherwise disadvantaged by the model, even as they become more represented in the target population. The field of domain adaptation proposes techniques for a situation where label data for the target population does not exist, but some information about the target distribution does exist. In this paper we contribute to the domain adaptation literature by introducing domain-adaptive decision trees (DADT). We focus on decision trees given their growing popularity due to their interpretability and performance relative to other more complex models. With DADT we aim to improve the accuracy of models trained in a source domain (or training data) that differs from the target domain (or test data). We propose an in-processing step that adjusts the information gain split criterion with outside information corresponding to the distribution of the target population. We demonstrate DADT on real data and find that it improves accuracy over a standard decision tree when testing in a shifted target population. We also study the change in fairness under demographic parity and equal opportunity. Results show an improvement in fairness with the use of DADT.
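The abstract describes adjusting the information gain split criterion with outside information about the target distribution. The sketch below illustrates one simple way this idea could look, and it is not the paper's estimator: it assumes the only target knowledge available is the marginal distribution of a categorical candidate attribute, and it swaps in plain importance reweighting (each source sample weighted by P_T(x) / P_S(x)) before computing entropies. The names `entropy`, `information_gain`, and `target_marginal` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter, defaultdict


def entropy(weight_by_label):
    """Shannon entropy (base 2) of a label distribution given as weights."""
    total = sum(weight_by_label.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for w in weight_by_label.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def information_gain(xs, ys, target_marginal=None):
    """Information gain of splitting on categorical attribute xs for labels ys.

    Assumption (not from the paper): if target_marginal is given as a dict
    {value: P_T(value)}, each source sample is reweighted by
    P_T(x) / P_S(x), where P_S is the empirical source marginal.
    With target_marginal=None this is the standard information gain.
    """
    n = len(xs)
    source_counts = Counter(xs)

    def weight(x):
        if target_marginal is None:
            return 1.0
        p_source = source_counts[x] / n
        return target_marginal.get(x, 0.0) / p_source

    parent = defaultdict(float)                          # label -> total weight
    children = defaultdict(lambda: defaultdict(float))   # value -> label -> weight
    for x, y in zip(xs, ys):
        w = weight(x)
        parent[y] += w
        children[x][y] += w

    total = sum(parent.values())
    conditional = sum(
        sum(child.values()) / total * entropy(child)
        for child in children.values()
    )
    return entropy(parent) - conditional


# Toy source sample: value "a" is pure, value "b" is evenly mixed.
xs = ["a", "a", "a", "a", "b", "b", "b", "b"]
ys = [1, 1, 1, 1, 1, 0, 1, 0]

ig_source = information_gain(xs, ys)
ig_target = information_gain(xs, ys, target_marginal={"a": 0.25, "b": 0.75})
print(f"source IG: {ig_source:.4f}, target-adjusted IG: {ig_target:.4f}")
```

In this toy case the target population contains more of the mixed group "b", so the adjusted gain for the attribute is lower than the source-only gain. A tree learner that ranks candidate splits by the adjusted value could therefore choose different splits than a standard decision tree trained on the source data alone.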

        • Published in

          FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency
          June 2023
          1929 pages
          ISBN: 9798400701924
          DOI: 10.1145/3593013

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited
