Abstract
Ranking systems built from historical data are central to our societies. Given a set of applicants, together with information on whether each past applicant should have been selected, the task of fairly ranking the applicants (whether by humans or by computers) is critical to the success of any institution. These tasks are typically carried out using regression methods, and considering the impact of these selection processes on our lives, it is natural to expect various fairness guarantees. In this article, we assume that affirmative action is enforced and that the number of candidates to admit from each protected group is predetermined. We demonstrate that even with this safety net, classical linear regression methods may increase discrimination in the selection process, reinforcing implicit biases against minorities, in particular by poorly ranking the top minority applicants. We show that this phenomenon is intrinsic to linear regression methods and may occur even if the sensitive attribute is explicitly part of the input, or if a separate linear regression is computed on each minority group. We show that to better rank applicants, it may be necessary to adapt the choice of regression method (linear, polynomial, etc.) to each minority group individually.
Notes
We use the term minority group here to include essentially all subgroups that have been historically discriminated against on the basis of type T; the term does not reflect the quantitative representation of the subgroup in the total population.
Recall for a race is defined as the ratio of successful students admitted of that race to the total number of successful students of that race in the pool of applicants.
In [49], the order of the grades is reversed, i.e., Grade 3 is the highest possible grade and Grade 10 is the lowest possible grade. We have, however, reversed this order for clarity of presentation.
Given that the grade set is \(\{3,\ldots ,10\}\) and that we have four subjects, it is easy to verify that the ranking given at \(p_0=15\) is the same as the ranking given at any \(p\ge p_0\). In other words, the ranking at \(p_0\) is the same as the ranking at \(p=\infty \).
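As a quick numerical check (our own sketch, not part of the original analysis), one can enumerate all grade vectors in \(\{3,\ldots ,10\}^4\) and confirm that the ranking induced by the \(\ell _{15}\)-norm already agrees with the limiting \(p=\infty \) ranking, i.e., comparing grade vectors sorted in decreasing order lexicographically:

```python
from itertools import product

# All possible grade vectors: four subjects, grades 3..10 (8^4 = 4096 vectors).
vectors = list(product(range(3, 11), repeat=4))

# Limiting (p = infinity) ranking: compare coordinates sorted in decreasing
# order, lexicographically (largest grade first, ties broken by the next one).
lex_key = lambda v: tuple(sorted(v, reverse=True))

# l_p ranking at p = 15, using exact integer arithmetic.
p15_key = lambda v: sum(g ** 15 for g in v)

# Sort by the limiting ranking; along this order the l_15 scores must be
# strictly increasing whenever the sorted multiset of grades changes, and
# equal otherwise -- which means the two rankings coincide.
vectors.sort(key=lex_key)
agrees = all(
    (p15_key(u) < p15_key(v)) if lex_key(u) != lex_key(v) else (p15_key(u) == p15_key(v))
    for u, v in zip(vectors, vectors[1:])
)
print(agrees)
```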
References
Bridgeman, B., Pollack, J., Burton, N.: Predicting grades in college courses: a comparison of multiple regression and percent succeeding approaches. J. Coll. Admiss. 199, 19–25 (2008)
Noble, J., Sawyer, R.: Predicting different levels of academic success in college using high school GPA and ACT composite score. ACT Research Report Series (2002)
Nguyen, A., Hays, B., Wetstein, M.: Showing incoming students the campus ropes: predicting student persistence using a logistic regression model. J. Appl. Res. Community Coll. 18(1), 11–16 (2010)
Dey, E.L., Astin, A.W.: Statistical alternatives for studying college student retention: a comparative analysis of logit, probit, and linear regression. Res. High. Educ. 34(5), 569–581 (1993)
Goldman, R.D., Hewitt, B.N.: Predicting the success of black, chicano, oriental and white college students. J. Educ. Meas. 13(2), 107–117 (1976)
Angrist, J.D., Rokkanen, M.: Wanna get away? regression discontinuity estimation of exam school effects away from the cutoff. J. Am. Stat. Assoc. 110(512), 1331–1344 (2015)
Corrente, S., Greco, S., Słowiński, R.: Multiple criteria hierarchy process in robust ordinal regression. Decis. Support Syst. 53(3), 660–674 (2012)
Wesman, A.G., Bennett, G.K.: Multiple regression vs. simple addition of scores in prediction of college grades. Educ. Psychol. Meas. 19(2), 243–246 (1959)
Jacob, B., Rockoff, J.E., Taylor, E.S., Lindy, B., Rosen, R.: Teacher applicant hiring and teacher performance: evidence from DC public schools. Technical report, National Bureau of Economic Research (2016)
Borman, W.C., White, L.A., Pulakos, E.D., Oppler, S.H.: Models of supervisory job performance ratings. J. Appl. Psychol. 76(6), 863 (1991)
McHenry, J.J., Hough, L.M., Toquam, J.L., Hanson, M.A., Ashworth, S.: Project a validity results: the relationship between predictor and criterion domains. Pers. Psychol. 43(2), 335–354 (1990)
Ree, M.J., Earles, J.A.: Predicting training success: not much more than g. Pers. Psychol. 44(2), 321–332 (1991)
Raju, N.S., Steinhaus, S.D., Edwards, J.E., DeLessio, J.: A logistic regression model for personnel selection. Appl. Psychol. Meas. 15(2), 139–152 (1991)
Agbemava, E., Nyarko, I.K., Adade, T.C., Bediako, A.K.: Logistic regression analysis of predictors of loan defaults by customers of non-traditional banks in Ghana. Eur. Sci. J. 12(1), 175–189 (2016)
Wiginton, J.C.: A note on the comparison of logit and discriminant models of consumer credit behavior. J. Financ. Quant. Anal. 15(3), 757–770 (1980)
Leonard, K.J.: Empirical bayes analysis of the commercial loan evaluation process. Stat. Prob. Lett. 18(4), 289–296 (1993)
Gilbert, L.R., Menon, K., Schwartz, K.B.: Predicting bankruptcy for firms in financial distress. J. Bus. Financ. Account. 17(1), 161–171 (1990)
Zaghdoudi, T.: Bank failure prediction with logistic regression. Int. J. Econ. Financ. Issues 3(2), 537 (2013)
Srinivasan, B.V., Gnanasambandam, N., Zhao, S., Minhas, R.: Domain-specific adaptation of a partial least squares regression model for loan defaults prediction. In 2011 IEEE 11th International Conference on Data Mining Workshops, pages 474–479. IEEE, (2011)
Zaghdoudi, K., Djebali, N., Mezni, M.: Credit scoring and default risk prediction: a comparative study between discriminant analysis and logistic regression. Int. J. Econ. Financ. 8(4), 39 (2016)
Thompson, E.D., Bowling, B.V., Markle, R.E.: Predicting student success in a major’s introductory biology course via logistic regression analysis of scientific reasoning ability and mathematics scores. Res. Sci. Educ. 48(1), 151–163 (2018)
Cleary, T.A.: Test bias: prediction of grades of Negro and white students in integrated colleges. J. Educ. Meas. 5(2), 115–124 (1968)
Cleary, T.A., Hilton, T.L.: An investigation of item bias. Educ. Psychol. Meas. 28(1), 61–75 (1968)
Guion, R.M.: Employment tests and discriminatory hiring. Ind. Relat. J. Econ. Soc. 5(2), 20–37 (1966)
Thorndike, R.L.: Concepts of culture-fairness. J. Educ. Meas. 8(2), 63–70 (1971)
Kleinberg, J., Ludwig, J., Mullainathan, S., Rambachan, A.: Algorithmic fairness. AEA Papers and Proceedings 108, 22–27 (2018)
Darlington, R.B.: Another look at cultural fairness 1. J. Educ. Meas. 8(2), 71–82 (1971)
Cole, N.S.: Bias in selection. J. Educ. Measur. 10(4), 237–255 (1973)
Hutchinson, B., Mitchell, M.: 50 years of test (un)fairness: lessons for machine learning. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 49–58 (2019)
Einhorn, H.J., Bass, A.R.: Methodological considerations relevant to discrimination in employment testing. Psychol. Bull. 75(4), 261 (1971)
Flaugher, R.L.: Bias in testing: a review and discussion. TM Report No. 36. Educational Testing Service (1974)
Flaugher, R.L.: The many definitions of test bias. Am. Psychol. 33(7), 671 (1978)
Jones, M.B.: Moderated regression and equal opportunity. Educ. Psychol. Meas. 33(3), 591–602 (1973)
Linn, R.L.: Fair test use in selection. Rev. Educ. Res. 43(2), 139–161 (1973)
Linn, R.L.: In search of fair selection procedures. J. Educ. Meas. 13(1), 53–58 (1976)
Petersen, N.S., Novick, M.R.: An evaluation of some models for culture-fair selection. J. Educ. Meas. 13(1), 3–29 (1976)
Zwick, R., Dorans, N.J.: Philosophical perspectives on assessment fairness. In: Fairness in Educational Assessment and Measurement, pp. 267–282 (2016)
Rice, M.F., Baptiste, B.: Race norming, validity generalization, and employment testing. Handb. Public Pers. Admin. 58, 451 (1994)
Hartigan, J.A., Wigdor, A.K.: Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. National Academy Press, Washington (1989)
West-Faulcon, K.: Fairness feuds: competing conceptions of title vii discriminatory testing. Wake Forest L. Rev. 46, 1035 (2011)
Larson, J., Mattu, S., Kirchner, L., Angwin, J.: How we analyzed the COMPAS recidivism algorithm. ProPublica (2016)
Dieterich, W., Mendoza, C., Brennan, T.: COMPAS risk scales: demonstrating accuracy equity and predictive parity. Northpointe Inc. (2016)
Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine bias. ProPublica (2016)
Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., Mullainathan, S.: Human decisions and machine predictions. Quart. J. Econ. 133(1), 237–293 (2018)
Barocas, S., Selbst, A.D.: Big data’s disparate impact. Calif. L. Rev. 104, 671 (2016)
Barocas, S., Hardt, M., Narayanan, A.: Fairness and Machine Learning. fairmlbook.org, (2019). http://www.fairmlbook.org
Fryer, R.G., Jr., Loury, G.C.: Valuing diversity. J. Polit. Econ. 121(4), 747–774 (2013)
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2017)
National Education Longitudinal Study of 1988, 1988. http://nces.ed.gov/surveys/nels88
Banaji, M.R., Greenwald, A.G.: Implicit gender stereotyping in judgments of fame. J. Person. Soc. Psychol. 68(2), 181–198 (1995)
Banaji, M.R., Hardin, C., Rothman, A.J.: Implicit stereotyping in person judgment. J. Person. Soc. Psychol. 65, 272–281 (1993)
Banaji, M.R., Hardin, C.: Automatic stereotyping. Psychol. Sci. 7, 136–141 (1996)
Bargh, J.A., Pratto, F.: Individual construct accessibility and perceptual selection. J. Exp. Soc. Psychol. 22, 293–311 (1986)
Bodenhausen, G.V.: Stereotypes as judgmental heuristics: evidence of circadian variations in discrimination. Psychol. Sci. 1, 319–322 (1990)
Darley, J.M., Gross, P.H.: A hypothesis-confirming bias in labeling effects. J. Pers. Soc. Psychol. 44, 20–33 (1983)
Devine, P.G.: Stereotypes and prejudice: their automatic and controlled components. J. Pers. Soc. Psychol. 56, 5–18 (1989)
Dovidio, J.F., Evans, N., Tyler, R.B.: Racial stereotypes: the contents of their cognitive representations. J. Exp. Soc. Psychol. 22, 22–37 (1986)
Dovidio, J.F., Kawakami, K., Johnson, C., Johnson, B., Howard, A.: On the nature of prejudice: automatic and controlled processes. J. Exp. Soc. Psychol. 33, 510–540 (1997)
Fazio, R.H., Jackson, J.R., Dunton, B.C., Williams, C.J.: Variability in automatic activation as an unobtrusive measure of racial attitudes. a bona fide pipeline? J. Pers. Soc. Psychol. 69, 1013–1027 (1995)
Fazio, R.H., Sanbonmatsu, D.M., Powell, M.C., Kardes, F.R.: On the automatic activation of attitudes. J. Pers. Soc. Psychol. 50, 229–323 (1986)
Gaertner, S.L., McLaughlin, J.P.: Racial stereotypes: associations and ascriptions of positive and negative characteristics. Soc. Psychol. Quart. 46, 23–30 (1983)
Macrae, C.N., Bodenhausen, G.V., Milne, A.B., Jetten, J.: Out of mind but back in sight: Stereotypes on the rebound. J. Pers. Soc. Psychol. 67, 808–817 (1994)
Perdue, C.W., Gurtman, M.B.: Evidence for the automaticity of ageism. J. Exp. Soc. Psychol. 26, 199–216 (1990)
Rudman, L.A., Borgida, E.: The afterglow of construct accessibility: the behavioral consequences of priming men to view women as sexual objects. J. Exp. Soc. Psychol. 31, 493–517 (1995)
Stangor, C., Sullivan, L.A., Ford, T.E.: Affective and cognitive determinants of prejudice. Soc. Cognit. 9, 359–380 (1991)
Forscher, P.S., Lai, C.K., Axt, J.R., Ebersole, C.R., Herman, M., Devine, P.G., Nosek, B.A.: A meta-analysis of procedures to change implicit measures. J. Pers. Soc. Psychol. 117(3), 522 (2019)
Meissner, F., Grigutsch, L.A., Koranyi, N., Müller, F., Rothermund, K.: Predicting behavior with implicit measures: disillusioning findings, reasonable explanations, and sophisticated solutions. Front. Psychol. 10, 2483 (2019)
Corneille, O., Hütter, M.: Implicit? what do you mean? a comprehensive review of the delusive implicitness construct in attitude research. Pers. Soc. Psychol. Rev. 24(3), 212–232 (2020)
Sue, D.W., Capodilupo, C.M., Torino, G.C., Bucceri, J.M., Holder, A., Nadal, K.L., Esquilin, M.: Racial microaggressions in everyday life: implications for clinical practice. Am. Psychol. 62(4), 271 (2007)
Sue, D.W.: Microaggressions in everyday life: race, gender, and sexual orientation. Wiley, New York (2010)
Paludi, M.A.: Managing Diversity in Today’s Workplace: Strategies for Employees and Employers [4 volumes]. ABC-CLIO (2012)
Lukianoff, G., Haidt, J.: The coddling of the American mind: How good intentions and bad ideas are setting up a generation for failure. Penguin Books, Baltimore (2019)
Cantu, E., Jussim, L.: Microaggressions, questionable science, and free speech. Texas Review of Law & Politics, Forthcoming, (2021)
Hirschman, D., Bosk, E.A.: Standardizing biases: selection devices and the quantification of race. Sociol. Race Ethnic. 6(3), 348–364 (2020)
Funding
C. S. Karthik was supported by a grant from the Simons Foundation, Grant Number 825876, Awardee Thu D. Nguyen, the Israel Science Foundation (grant number 552/16), the Len Blavatnik and the Blavatnik Family foundation, and Subhash Khot’s Simons Investigator Award. Claire Mathieu was partially funded by the grant ANR-19-CE48-0016 from the French National Research Agency (ANR).
Author information
Authors and Affiliations
Contributions
All authors contributed equally.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Proof of Theorem 1
Suppose the linear regression predictor tries to fit the population to the equation \(A\mathbf {\beta } = \textbf{y}\), where each row of A corresponds to the non-sensitive attributes of an applicant (i.e., each row of A is a uniformly random vector in \([0,1]^d\)) and the \(i^{\text {th}}\) coordinate of \(\textbf{y}\) is given by the \(\ell _p\)-norm of the \(i^{\text {th}}\) row of A. Note that \(\mathbf {\beta }\) is the vector that minimizes the least-squares error \(\Vert A\mathbf {\beta } - \textbf{y}\Vert _2\).
First, we show below that \(\mathbf {\beta }\) must have (almost) the same entry in all coordinates.
Theorem 2
As \(n\rightarrow \infty \), with high probability \(1-o(1)\), the optimal vector \(\mathbf {\beta }=(b,\ldots ,b)+o(1)\) for some \(b\in {\mathbb {R}}\).
Proof
The error \(\Vert A\mathbf {\beta } -\textbf{y}\Vert _2^2\) is equal to \(\underset{i=1}{\overset{n}{\sum }} |A_i \cdot \mathbf {\beta } -y_i |^2\), where \(y_i\) is equal to the (scaled) \(\ell _p\)-norm of the row vector \(A_i\). So we have a sum of n independent, identically distributed terms, which by the law of large numbers tends to \(n \cdot \mathbb {E}(|A_i \cdot \mathbf {\beta } -y_i |^2)\). The expected value \(\mathbb {E}(|A_i \cdot \mathbf {\beta } -y_i |^2)\) is a quadratic form in \(\mathbf {\beta }\) given by:
Because the form \(Q(\mathbf {\beta }) =\mathbb {E}(|A_i \cdot \mathbf {\beta } -y_i |^2)= \mathbb {E}(\mathbf {\beta }^{T} A_i^T A_i \mathbf {\beta }) -2\mathbb {E}(y_iA_i\cdot \mathbf {\beta })+\mathbb {E}(y_i^2)\) is non-negative, it has a unique minimizer \(\mathbf {\beta }\), which satisfies the gradient condition \(\frac{dQ(\mathbf {\beta } +t \textbf{v})}{dt}=0\) for every vector \(\textbf{v}\).
Therefore, for every \(\textbf{v}\), we have
Hence, the minimizer \(\mathbf {\beta }\) satisfies
If \(y_i\) is any symmetric function of the coordinates of \(A_i\) (such as the \(\ell _p\)-norm here), then \(y_iA_i\) is a vector with identically distributed coordinates, so the vector \(\mathbb {E}(y_iA_i)\) has all coordinates equal.
Since \(A_i\) is a vector with i.i.d. coordinates drawn uniformly from [0, 1], the matrix \(M := \mathbb {E}(A_i^T A_i)\) has (r, s) entry given by \(M_{r,s} = \mathbb {E}(X_rX_s)\), where \(X_r, X_s\) are i.i.d. uniform variables on [0, 1].
So we have \( M = \frac{1}{3} I + \frac{1}{4}(J - I) = \frac{1}{12} I + \frac{1}{4}J\), where J is the all-ones matrix and I is the identity matrix. Thus, we see that \(\mathbf {\beta }\) satisfies \(\mathbf {\beta } = (b, b, \cdots , b)\), where b solves \(\frac{b}{12} + \frac{db}{4} = \mathbb {E}(y_i X_r)\).
Thus, the minimizer of the expected quadratic form is of the form \(\mathbf {\beta }= (b, b, \cdots , b)\) for some \(b\in {\mathbb {R}}\). Since, with high probability, the empirical quadratic form is very close to its expectation, and the minimizer does not change under slight perturbations of the form, with probability \(1-o(1)\) the minimizer is \(\mathbf {\beta } = (b, b, \cdots , b) + o(1)\). \(\square \)
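As an illustration (our own numerical sketch with illustrative parameter choices, not part of the original proof), fitting an ordinary least-squares model to \(\ell _p\)-norm targets on uniform data shows the fitted coefficients to be nearly equal across coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 50_000, 5, 2  # sample size, attributes, norm exponent (illustrative)

# Each row of A is uniform in [0,1]^d; the target is the l_p norm of the row.
A = rng.random((n, d))
y = np.linalg.norm(A, ord=p, axis=1)

# Least-squares fit A @ beta ~ y (no intercept, as in the proof).
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# The coefficients should be (almost) identical across coordinates.
spread_ratio = np.ptp(beta) / beta.mean()
print(beta.round(3), spread_ratio)
```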
Therefore, informally, we may conclude that linear regression simply selects the top \(50\%\) of the applicants based on their \(\ell _1\) norm.
Thus, ranking applicants using regression is equivalent to ranking according to \(A_i \cdot \mathbf {\beta } = b\sum _{j=1}^d A_{i}(j) + o(1)\), which is proportional to the \(\ell _1\)-norm of \(A_i\). (Since \(\mathbf {\beta } =(b, b, \cdots , b) + o(1)\), the ranking, which is determined by the volumes of the regions \(\mathbf {\beta } \cdot X > \tau \), is essentially equal to that given by the volumes of the regions \((b, b, \cdots , b) \cdot X >\tau \); for vectors with non-negative entries, \(\Vert X\Vert _1 = (1, 1, \cdots , 1) \cdot X\), so this corresponds exactly to ranking by \(\ell _1\)-norm.)
Let \(S_p\) be a ranking of vectors in \([0,1]^{d}\) based on their \(\ell _p\)-norm. Let \(S_p^\tau \) be the restriction of the ranking to the top \(\tau \) fraction of applicants. The value of \(|S_1^\tau {\setminus } S_p^\tau |\) gives us the recall of the theorem statement, and this is calculated below.
Theorem 3
Let S be a uniformly random sample of N points in \([0,1]^d\). Let \(p\in {\mathbb {R}}_{\ge 1}\cup \{\infty \}\). After ranking the points in S by their \(\ell _1\)-norm (resp. \(\ell _p\)-norm), let \(S_1\subset S\) (resp. \(S_p\subset S\)) be all points in S ranked in the top half (breaking ties randomly). Then, for large enough d and n, we have that
Proof
By taking n large enough, it suffices to consider the case where we pick a point \({{\textbf {x}}}\) uniformly at random from \([0,1]^d\) and compute the following probability:
where \(m_1\) (resp. \(m_p\)) is the median of the distribution of \(\Vert {{\textbf {x}}}\Vert _1\) (resp. \(\Vert {{\textbf {x}}}\Vert _p\)). Asymptotically, for large d, we have \(m_p\sim \root p \of {\frac{d}{p+1}}\). If \({{\textbf {x}}}:=(x_1,\ldots , x_d)\), then the above probability is essentially the following:
For every \(i\in [d]\), let \(\mathbf {y_i}:=(x_i,x_i^p)\). Then, the probability can be seen as:
where \({\mathcal {R}}:=[m_1,\infty )\times [m_p^p,\infty )\). Also, note that for every \(i\in [d]\), we have
Thus, applying central limit theorem to all the \(\mathbf {y_i}\)s, as \(d\rightarrow \infty \), we have:
where \(\Sigma \) is the covariance matrix of \(\mathbf {y_i}\)s given by \({\mathbb {E}}[\mathbf {y_i}^T\mathbf {y_i}]\). We can thus compute \(\Sigma \) to be:
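For reference, a direct computation from the moments of the uniform distribution on [0, 1] (our sketch, assuming the \(\mathbf {y_i}\)s are centered before applying the central limit theorem) gives:

```latex
% Moments of x ~ Uniform[0,1]:
%   E[x] = 1/2,  E[x^p] = 1/(p+1),  E[x^2] = 1/3,
%   E[x^{p+1}] = 1/(p+2),  E[x^{2p}] = 1/(2p+1).
\Sigma =
\begin{pmatrix}
\operatorname{Var}(x_i) & \operatorname{Cov}(x_i, x_i^p)\\
\operatorname{Cov}(x_i, x_i^p) & \operatorname{Var}(x_i^p)
\end{pmatrix}
=
\begin{pmatrix}
\frac{1}{12} & \frac{p}{2(p+1)(p+2)}\\[1ex]
\frac{p}{2(p+1)(p+2)} & \frac{p^2}{(2p+1)(p+1)^2}
\end{pmatrix}
```

For instance, \(\operatorname{Cov}(x_i, x_i^p) = \mathbb {E}[x_i^{p+1}] - \mathbb {E}[x_i]\,\mathbb {E}[x_i^p] = \frac{1}{p+2} - \frac{1}{2(p+1)} = \frac{p}{2(p+1)(p+2)}\).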
So the probability in (1) converges to
Note that the distribution of \({\mathcal {N}}(0, \varvec{\Sigma })\) is given by
Moreover, we have the inverse of \(\Sigma \) is:
Thus, using the integral
we can compute the probability to be the expression given in the theorem statement. \(\square \)
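The overlap in Theorem 3 is easy to probe numerically; the following sketch (our own, with illustrative parameter choices) estimates the fraction of the \(\ell _1\) top half that is also in the \(\ell _p\) top half:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, p = 20_000, 20, 4  # sample size, dimension, norm exponent (illustrative)

S = rng.random((N, d))
l1 = S.sum(axis=1)          # l_1 norms
lp = (S ** p).sum(axis=1)   # l_p norms raised to the power p (same ranking)

# Indices of the top half under each ranking.
top1 = set(np.argsort(-l1)[: N // 2])
topp = set(np.argsort(-lp)[: N // 2])

# Fraction of the l_1 top half recovered by the l_p ranking (the "recall").
overlap = len(top1 & topp) / (N // 2)
print(round(overlap, 3))
```

The two rankings agree on most, but not all, of the top half, in line with the theorem.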
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cohen-Addad, V., Gavva, S.T., Karthik, C.S. et al. Fairness of linear regression in decision making. Int J Data Sci Anal 18, 337–347 (2024). https://doi.org/10.1007/s41060-023-00423-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s41060-023-00423-7