Abstract
Measuring algorithmic bias is crucial both to assess algorithmic fairness and to guide the improvement of algorithms. Current bias measurement methods in computer vision are based on observational datasets, and so conflate algorithmic bias with dataset bias. To address this problem we develop an experimental method for measuring the algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change. Our method is based on generating synthetic image grids that differ along specific attributes while holding other attributes constant. Crucially, we rely on the perception of human observers to control for synthesis inaccuracies when measuring algorithmic bias. We validate our method by comparing it to a traditional observational bias analysis study of gender classification algorithms. The two methods reach different conclusions: while the observational method reports gender and skin color biases, the experimental method reveals biases due to gender, hair length, age, and facial hair. We also show that our synthetic transects allow for more straightforward bias analysis on minority and intersectional groups.
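The core idea of the experimental method can be illustrated with a minimal sketch. Given matched pairs of synthetic images that differ only in the attribute of interest, the attribute's causal effect on a classifier can be estimated as the average difference in error between the two arms of each pair. The function below is purely illustrative and is not the authors' code; all names and the toy data are hypothetical.

```python
def attribute_effect(errors_baseline, errors_modified):
    """Estimate the average effect of an attribute on classification error.

    errors_baseline[i] and errors_modified[i] are 0/1 error indicators for the
    two matched images in transect i (same synthetic face, one attribute such
    as hair length toggled between the arms).
    """
    assert len(errors_baseline) == len(errors_modified)
    # Within-pair differencing cancels out everything the two images share,
    # isolating the effect of the manipulated attribute.
    diffs = [m - b for b, m in zip(errors_baseline, errors_modified)]
    return sum(diffs) / len(diffs)

# Toy example with 6 matched pairs: toggling the attribute flips two
# previously correct predictions into errors.
baseline = [0, 0, 1, 0, 0, 0]
modified = [1, 0, 1, 1, 0, 0]
print(attribute_effect(baseline, modified))  # → 0.3333333333333333
```

Because each pair is matched on all other attributes by construction, this within-pair difference is a causal estimate rather than an observational correlation, which is the key contrast with dataset-based bias audits.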
Acknowledgments
We are grateful to Frederick Eberhardt, Bill Freeman, Lei Jin, Michael Kearns, R. Manmatha, Tristan McKinney, Sendhil Mullainathan, and Chandan Singh for insights and suggestions.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Balakrishnan, G., Xiong, Y., Xia, W., Perona, P. (2020). Towards Causal Benchmarking of Bias in Face Analysis Algorithms. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12363. Springer, Cham. https://doi.org/10.1007/978-3-030-58523-5_32
DOI: https://doi.org/10.1007/978-3-030-58523-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58522-8
Online ISBN: 978-3-030-58523-5