Towards Causal Benchmarking of Bias in Face Analysis Algorithms

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12363)

Abstract

Measuring algorithmic bias is crucial both to assess algorithmic fairness, and to guide the improvement of algorithms. Current bias measurement methods in computer vision are based on observational datasets, and so conflate algorithmic bias with dataset bias. To address this problem we develop an experimental method for measuring algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change. Our method is based on generating synthetic image grids that differ along specific attributes while leaving other attributes constant. Crucially, we rely on the perception of human observers to control for synthesis inaccuracies when measuring algorithmic bias. We validate our method by comparing it to a traditional observational bias analysis study in gender classification algorithms. The two methods reach different conclusions. While the observational method reports gender and skin color biases, the experimental method reveals biases due to gender, hair length, age, and facial hair. We also show that our synthetic transects allow for more straightforward bias analysis on minority and intersectional groups.
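The distinction the abstract draws between observational and experimental measurement can be illustrated with a toy calculation. The sketch below is not the authors' code or data: the error probabilities, the hair-length distributions, and every name in it are hypothetical, chosen only to show how a classifier whose errors depend solely on hair length can appear gender-biased when measured on a dataset where hair length and gender are correlated.

```python
# Toy illustration with made-up numbers; not the paper's data or method.

# Hypothetical classifier: its error probability depends on hair length only.
ERROR_PROB = {"short": 0.05, "long": 0.20}

# Hypothetical observational dataset: hair length is correlated with gender.
OBSERVED_HAIR_DIST = {
    "female": {"short": 0.2, "long": 0.8},
    "male": {"short": 0.9, "long": 0.1},
}

# Hypothetical experimental grid: each gender is paired with each hair
# length equally often, so hair length is held balanced across genders.
GRID_HAIR_DIST = {
    "female": {"short": 0.5, "long": 0.5},
    "male": {"short": 0.5, "long": 0.5},
}


def error_rate(hair_dist):
    """Expected error rate for a group with the given hair-length mix."""
    return sum(p * ERROR_PROB[hair] for hair, p in hair_dist.items())


def gender_gap(dist_by_gender):
    """Difference in expected error rate, female minus male."""
    return error_rate(dist_by_gender["female"]) - error_rate(dist_by_gender["male"])


observational_gap = gender_gap(OBSERVED_HAIR_DIST)
experimental_gap = gender_gap(GRID_HAIR_DIST)

print(f"observational gender gap: {observational_gap:.3f}")
print(f"experimental gender gap:  {experimental_gap:.3f}")
```

Under these assumed numbers the observational comparison attributes a gap to gender, while the balanced grid attributes none, because the only causal factor in the toy classifier is hair length. The paper's actual method additionally relies on synthesized images and human annotations, which this sketch does not model.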


Acknowledgments

We are grateful to Frederick Eberhardt, Bill Freeman, Lei Jin, Michael Kearns, R. Manmatha, Tristan McKinney, Sendhil Mullainathan, and Chandan Singh for insights and suggestions.

Corresponding author

Correspondence to Guha Balakrishnan.



© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Balakrishnan, G., Xiong, Y., Xia, W., Perona, P. (2020). Towards Causal Benchmarking of Bias in Face Analysis Algorithms. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_32

  • DOI: https://doi.org/10.1007/978-3-030-58523-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58522-8

  • Online ISBN: 978-3-030-58523-5

  • eBook Packages: Computer Science, Computer Science (R0)
