ABSTRACT
What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender) and an explicit description of the process.
When computers are involved, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the process, we propose making inferences based on the data it uses.
We present four contributions. First, we link disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on how well the protected class can be predicted from the other attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and of our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
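Two quantities the abstract alludes to can be sketched concretely: the EEOC "four-fifths" (80%) rule that operationalizes disparate impact in U.S. guidelines, and the balanced error rate (BER) of predicting the protected class from the remaining attributes, which underlies the proposed test. The following is a minimal illustration with hypothetical function names and toy data, not the paper's implementation:

```python
# Hypothetical sketch of two quantities related to the paper's test:
# the "80% rule" selection-rate ratio and the balanced error rate (BER).
# Inputs are parallel 0/1 lists; names and data are illustrative only.

def disparate_impact_ratio(protected, selected):
    """P(selected=1 | protected=1) / P(selected=1 | protected=0)."""
    sel_p = [s for p, s in zip(protected, selected) if p == 1]
    sel_u = [s for p, s in zip(protected, selected) if p == 0]
    return (sum(sel_p) / len(sel_p)) / (sum(sel_u) / len(sel_u))

def balanced_error_rate(y_true, y_pred):
    """Average of the per-class error rates. A low BER when predicting
    the protected class from the other attributes indicates that the
    data could support a disparately impactful decision."""
    errs = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        errs.append(sum(1 for i in idx if y_pred[i] != cls) / len(idx))
    return sum(errs) / len(errs)

# Toy example: the protected group is selected at one third the rate
# of the unprotected group, well below the 0.8 threshold.
protected = [1, 1, 1, 1, 0, 0, 0, 0]
selected  = [1, 0, 0, 0, 1, 1, 1, 0]
print(disparate_impact_ratio(protected, selected))  # prints 0.333...
```

Under the EEOC guideline, a ratio below 0.8 is taken as evidence of disparate impact; the paper's contribution is to certify this property from the data alone, via the predictability of the protected attribute.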