Towards Causal Benchmarking of Bias in Face Analysis Algorithms

Balakrishnan, Guha; Xiong, Yuanjun; Xia, Wei; Perona, Pietro

doi:10.1007/978-3-030-58523-5_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12363)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Measuring algorithmic bias is crucial both to assess algorithmic fairness, and to guide the improvement of algorithms. Current bias measurement methods in computer vision are based on observational datasets, and so conflate algorithmic bias with dataset bias. To address this problem we develop an experimental method for measuring algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change. Our method is based on generating synthetic image grids that differ along specific attributes while leaving other attributes constant. Crucially, we rely on the perception of human observers to control for synthesis inaccuracies when measuring algorithmic bias. We validate our method by comparing it to a traditional observational bias analysis study in gender classification algorithms. The two methods reach different conclusions. While the observational method reports gender and skin color biases, the experimental method reveals biases due to gender, hair length, age, and facial hair. We also show that our synthetic transects allow for more straightforward bias analysis on minority and intersectional groups.
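The contrast between observational and experimental measurement can be illustrated with a toy simulation. This is a minimal sketch and not the authors' pipeline: the classifier, the attribute names (`skin`, `hair`), the sample sizes, and the helper `error_rate_by` are all hypothetical. The toy classifier errs only on long hair, yet an observational sample in which hair length correlates with skin tone makes it look biased by skin tone. A balanced grid that varies one attribute while holding the other fixed removes that artifact.

```python
from itertools import product


def toy_classifier_is_wrong(face):
    # Hypothetical classifier: it errs only on long-haired faces,
    # so skin tone has no causal effect on its errors.
    return face["hair"] == "long"


def error_rate_by(faces, attribute):
    # Error rate of the toy classifier for each value of one attribute.
    totals, errors = {}, {}
    for face in faces:
        key = face[attribute]
        totals[key] = totals.get(key, 0) + 1
        errors[key] = errors.get(key, 0) + int(toy_classifier_is_wrong(face))
    return {key: errors[key] / totals[key] for key in totals}


# Observational sample (assumed): hair length is correlated with skin tone.
observational = (
    [{"skin": "light", "hair": "short"}] * 80
    + [{"skin": "light", "hair": "long"}] * 20
    + [{"skin": "dark", "hair": "short"}] * 20
    + [{"skin": "dark", "hair": "long"}] * 80
)

# Experimental grid (assumed): every attribute combination appears equally
# often, mimicking synthetic images that differ along one attribute only.
experimental = [
    {"skin": skin, "hair": hair}
    for skin, hair in product(["light", "dark"], ["short", "long"])
    for _ in range(50)
]

obs_by_skin = error_rate_by(observational, "skin")
exp_by_skin = error_rate_by(experimental, "skin")
exp_by_hair = error_rate_by(experimental, "hair")

print("observational, by skin:", obs_by_skin)
print("experimental, by skin: ", exp_by_skin)
print("experimental, by hair: ", exp_by_hair)
```

In this toy setting the observational error rates differ by skin tone, while the balanced grid attributes the whole difference to hair length. Real synthesized images would additionally need human annotation to confirm that each image shows the intended attributes, as the abstract notes.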

Acknowledgments

We are grateful to Frederick Eberhardt, Bill Freeman, Lei Jin, Michael Kearns, R. Manmatha, Tristan McKinney, Sendhil Mullainathan, and Chandan Singh for insights and suggestions.

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, USA
Guha Balakrishnan
California Institute of Technology, Pasadena, USA
Pietro Perona
Amazon Web Services, Seattle, USA
Guha Balakrishnan, Yuanjun Xiong, Wei Xia & Pietro Perona

Corresponding author

Correspondence to Guha Balakrishnan.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Balakrishnan, G., Xiong, Y., Xia, W., Perona, P. (2020). Towards Causal Benchmarking of Bias in Face Analysis Algorithms. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_32

DOI: https://doi.org/10.1007/978-3-030-58523-5_32
Published: 04 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58522-8
Online ISBN: 978-3-030-58523-5
eBook Packages: Computer Science, Computer Science (R0)
