
Abstract

Ranking systems built on historical data are central to our societies. Given a set of applicants and information as to whether each past applicant should have been selected, the task of fairly ranking the applicants (whether by humans or by computers) is critical to the success of any institution. These tasks are typically carried out using regression methods, and considering the impact of these selection processes on our lives, it is natural to expect various fairness guarantees. In this article, we assume that affirmative action is enforced and that the number of candidates to admit from each protected group is predetermined. We demonstrate that even with this safety net, classical linear regression methods may increase discrimination in the selection process, reinforcing implicit biases against minorities, in particular by poorly ranking the top minority applicants. We show that this phenomenon is intrinsic to linear regression methods and may happen even if the sensitive attribute is explicitly part of the input, or if a separate linear regression is computed on each minority group. We show that, to better rank applicants, it may be necessary to adapt the choice of regression method (linear, polynomial, etc.) to each minority group individually.


Notes

  1. We use the term minority group here to refer broadly to all subgroups that have been historically discriminated against on the basis of type T; the term does not reflect the quantitative representation of the subgroup in the total population.

  2. Recall for a race is defined as the ratio of the number of admitted successful students of that race to the total number of successful students of that race in the pool of applicants (a small sketch follows these notes).

  3. In [49], the order of the grades is reversed, i.e., Grade 3 is the highest possible grade and Grade 10 is the lowest possible grade. We have, however, reversed this order for clarity of presentation.

  4. Given that the grade set is \(\{3,\ldots ,10\}\) and that we have four subjects, it is easy to verify that the ranking given at \(p_0=15\) is the same as the ranking given at any \(p\ge p_0\). In other words, the ranking at \(p_0\) is the same as the ranking at \(p=\infty \).
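To make the recall of note 2 concrete, here is a minimal sketch (our own; the toy data, group labels, and function name are illustrative and not from the paper's experiments):

```python
# Minimal sketch of the per-group recall defined in note 2.
def recall_by_group(admitted, successful, group):
    """Fraction of truly successful applicants of each group
    that the selection actually admitted."""
    out = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        succ = [i for i in idx if successful[i]]
        hit = [i for i in succ if admitted[i]]
        out[g] = len(hit) / len(succ) if succ else float("nan")
    return out

# Toy example: six applicants, two groups.
admitted   = [True, True, False, True, False, False]
successful = [True, False, True, True, True, False]
group      = ["A", "A", "A", "B", "B", "B"]
print(recall_by_group(admitted, successful, group))  # {'A': 0.5, 'B': 0.5}
```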

References

  1. Bridgeman, B., Pollack, J., Burton, N.: Predicting grades in college courses: A comparison of multiple regression and percent succeeding approaches. J. Coll. Admiss. 199, 19–25 (2008)

  2. Noble, J., Sawyer, R.: Predicting different levels of academic success in college using high school GPA and ACT composite score. ACT Research Report Series (2002)

  3. Nguyen, A., Hays, B., Wetstein, M.: Showing incoming students the campus ropes: Predicting student persistence using a logistic regression model. J. Appl. Res. Community Coll. 18(1), 11–16 (2010)

  4. Dey, E.L., Astin, A.W.: Statistical alternatives for studying college student retention: A comparative analysis of logit, probit, and linear regression. Res. High. Educ. 34(5), 569–581 (1993)

  5. Goldman, R.D., Hewitt, B.N.: Predicting the success of Black, Chicano, Oriental, and White college students. J. Educ. Meas. 13(2), 107–117 (1976)

  6. Angrist, J.D., Rokkanen, M.: Wanna get away? Regression discontinuity estimation of exam school effects away from the cutoff. J. Am. Stat. Assoc. 110(512), 1331–1344 (2015)

  7. Corrente, S., Greco, S., Słowiński, R.: Multiple criteria hierarchy process in robust ordinal regression. Decis. Support Syst. 53(3), 660–674 (2012)

  8. Wesman, A.G., Bennett, G.K.: Multiple regression vs. simple addition of scores in prediction of college grades. Educ. Psychol. Meas. 19(2), 243–246 (1959)

  9. Jacob, B.A., Rockoff, J.E., Taylor, E.S., Lindy, B., Rosen, R.: Teacher applicant hiring and teacher performance: Evidence from DC public schools. Technical report, National Bureau of Economic Research (2016)

  10. Borman, W.C., White, L.A., Pulakos, E.D., Oppler, S.H.: Models of supervisory job performance ratings. J. Appl. Psychol. 76(6), 863 (1991)

  11. McHenry, J.J., Hough, L.M., Toquam, J.L., Hanson, M.A., Ashworth, S.: Project A validity results: The relationship between predictor and criterion domains. Pers. Psychol. 43(2), 335–354 (1990)

  12. Ree, M.J., Earles, J.A.: Predicting training success: Not much more than g. Pers. Psychol. 44(2), 321–332 (1991)

  13. Raju, N.S., Steinhaus, S.D., Edwards, J.E., DeLessio, J.: A logistic regression model for personnel selection. Appl. Psychol. Meas. 15(2), 139–152 (1991)

  14. Agbemava, E., Nyarko, I.K., Adade, T.C., Bediako, A.K.: Logistic regression analysis of predictors of loan defaults by customers of non-traditional banks in Ghana. Eur. Sci. J. 12(1), 175–189 (2016)

  15. Wiginton, J.C.: A note on the comparison of logit and discriminant models of consumer credit behavior. J. Financ. Quant. Anal. 15(3), 757–770 (1980)

  16. Leonard, K.J.: Empirical Bayes analysis of the commercial loan evaluation process. Stat. Probab. Lett. 18(4), 289–296 (1993)

  17. Gilbert, L.R., Menon, K., Schwartz, K.B.: Predicting bankruptcy for firms in financial distress. J. Bus. Financ. Account. 17(1), 161–171 (1990)

  18. Zaghdoudi, T.: Bank failure prediction with logistic regression. Int. J. Econ. Financ. Issues 3(2), 537 (2013)

  19. Srinivasan, B.V., Gnanasambandam, N., Zhao, S., Minhas, R.: Domain-specific adaptation of a partial least squares regression model for loan defaults prediction. In: 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 474–479. IEEE (2011)

  20. Zaghdoudi, K., Djebali, N., Mezni, M., et al.: Credit scoring and default risk prediction: A comparative study between discriminant analysis and logistic regression. Int. J. Econ. Financ. 8(4), 39 (2016)

  21. Thompson, E.D., Bowling, B.V., Markle, R.E.: Predicting student success in a major's introductory biology course via logistic regression analysis of scientific reasoning ability and mathematics scores. Res. Sci. Educ. 48(1), 151–163 (2018)

  22. Cleary, T.A.: Test bias: Prediction of grades of Negro and white students in integrated colleges. J. Educ. Meas. 5(2), 115–124 (1968)

  23. Cleary, T.A., Hilton, T.L.: An investigation of item bias. Educ. Psychol. Meas. 28(1), 61–75 (1968)

  24. Guion, R.M.: Employment tests and discriminatory hiring. Ind. Relat. J. Econ. Soc. 5(2), 20–37 (1966)

  25. Thorndike, R.L.: Concepts of culture-fairness. J. Educ. Meas. 8(2), 63–70 (1971)

  26. Kleinberg, J., Ludwig, J., Mullainathan, S., Rambachan, A.: Algorithmic fairness. AEA Papers and Proceedings 108, 22–27 (2018)

  27. Darlington, R.B.: Another look at "cultural fairness". J. Educ. Meas. 8(2), 71–82 (1971)

  28. Cole, N.S.: Bias in selection. J. Educ. Meas. 10(4), 237–255 (1973)

  29. Hutchinson, B., Mitchell, M.: 50 years of test (un)fairness: Lessons for machine learning. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 49–58 (2019)

  30. Einhorn, H.J., Bass, A.R.: Methodological considerations relevant to discrimination in employment testing. Psychol. Bull. 75(4), 261 (1971)

  31. Flaugher, R.L.: Bias in testing: A review and discussion. TM Report No. 36, Educational Testing Service (1974)

  32. Flaugher, R.L.: The many definitions of test bias. Am. Psychol. 33(7), 671 (1978)

  33. Jones, M.B.: Moderated regression and equal opportunity. Educ. Psychol. Meas. 33(3), 591–602 (1973)

  34. Linn, R.L.: Fair test use in selection. Rev. Educ. Res. 43(2), 139–161 (1973)

  35. Linn, R.L.: In search of fair selection procedures. J. Educ. Meas. 13(1), 53–58 (1976)

  36. Petersen, N.S., Novick, M.R.: An evaluation of some models for culture-fair selection. J. Educ. Meas. 13(1), 3–29 (1976)

  37. Zwick, R., Dorans, N.J.: Philosophical perspectives on assessment fairness. In: Fairness in Educational Assessment and Measurement, pp. 267–282 (2016)

  38. Rice, M.F., Baptiste, B.: Race norming, validity generalization, and employment testing. Handb. Public Pers. Admin. 58, 451 (1994)

  39. Hartigan, J.A., Wigdor, A.K.: Fairness in Employment Testing: Validity Generalization, Minority Issues, and the General Aptitude Test Battery. National Academy Press, Washington (1989)

  40. West-Faulcon, K.: Fairness feuds: Competing conceptions of Title VII discriminatory testing. Wake Forest L. Rev. 46, 1035 (2011)

  41. Larson, J., Mattu, S., Kirchner, L., Angwin, J.: How we analyzed the COMPAS recidivism algorithm. ProPublica (2016)

  42. Dieterich, W., Mendoza, C., Brennan, T.: COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe Inc. (2016)

  43. Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine bias. ProPublica (2016)

  44. Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., Mullainathan, S.: Human decisions and machine predictions. Quart. J. Econ. 133(1), 237–293 (2018)

  45. Barocas, S., Selbst, A.D.: Big data's disparate impact. Calif. L. Rev. 104, 671 (2016)

  46. Barocas, S., Hardt, M., Narayanan, A.: Fairness and Machine Learning. fairmlbook.org (2019). http://www.fairmlbook.org

  47. Fryer, R.G., Jr., Loury, G.C.: Valuing diversity. J. Polit. Econ. 121(4), 747–774 (2013)

  48. Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2017)

  49. National Education Longitudinal Study of 1988 (1988). http://nces.ed.gov/surveys/nels88

  50. Banaji, M.R., Greenwald, A.G.: Implicit gender stereotyping in judgments of fame. J. Pers. Soc. Psychol. 68(2), 181–198 (1995)

  51. Banaji, M.R., Hardin, C., Rothman, A.J.: Implicit stereotyping in person judgment. J. Pers. Soc. Psychol. 65, 272–281 (1993)

  52. Banaji, M.R., Hardin, C.: Automatic stereotyping. Psychol. Sci. 7, 136–141 (1996)

  53. Bargh, J.A., Pratto, F.: Individual construct accessibility and perceptual selection. J. Exp. Soc. Psychol. 22, 293–311 (1986)

  54. Bodenhausen, G.V.: Stereotypes as judgmental heuristics: Evidence of circadian variations in discrimination. Psychol. Sci. 1, 319–322 (1990)

  55. Darley, J.M., Gross, P.H.: A hypothesis-confirming bias in labeling effects. J. Pers. Soc. Psychol. 44, 20–33 (1983)

  56. Devine, P.G.: Stereotypes and prejudice: Their automatic and controlled components. J. Pers. Soc. Psychol. 56, 5–18 (1989)

  57. Dovidio, J.F., Evans, N., Tyler, R.B.: Racial stereotypes: The contents of their cognitive representations. J. Exp. Soc. Psychol. 22, 22–37 (1986)

  58. Dovidio, J.F., Kawakami, K., Johnson, C., Johnson, B., Howard, A.: On the nature of prejudice: Automatic and controlled processes. J. Exp. Soc. Psychol. 33, 510–540 (1997)

  59. Fazio, R.H., Jackson, J.R., Dunton, B.C., Williams, C.J.: Variability in automatic activation as an unobtrusive measure of racial attitudes: A bona fide pipeline? J. Pers. Soc. Psychol. 69, 1013–1027 (1995)

  60. Fazio, R.H., Sanbonmatsu, D.M., Powell, M.C., Kardes, F.R.: On the automatic activation of attitudes. J. Pers. Soc. Psychol. 50, 229–238 (1986)

  61. Gaertner, S.L., McLaughlin, J.P.: Racial stereotypes: Associations and ascriptions of positive and negative characteristics. Soc. Psychol. Q. 46, 23–30 (1983)

  62. Macrae, C.N., Bodenhausen, G.V., Milne, A.B., Jetten, J.: Out of mind but back in sight: Stereotypes on the rebound. J. Pers. Soc. Psychol. 67, 808–817 (1994)

  63. Perdue, C.W., Gurtman, M.B.: Evidence for the automaticity of ageism. J. Exp. Soc. Psychol. 26, 199–216 (1990)

  64. Rudman, L.A., Borgida, E.: The afterglow of construct accessibility: The behavioral consequences of priming men to view women as sexual objects. J. Exp. Soc. Psychol. 31, 493–517 (1995)

  65. Stangor, C., Sullivan, L.A., Ford, T.E.: Affective and cognitive determinants of prejudice. Soc. Cogn. 9, 359–380 (1991)

  66. Forscher, P.S., Lai, C.K., Axt, J.R., Ebersole, C.R., Herman, M., Devine, P.G., Nosek, B.A.: A meta-analysis of procedures to change implicit measures. J. Pers. Soc. Psychol. 117(3), 522 (2019)

  67. Meissner, F., Grigutsch, L.A., Koranyi, N., Müller, F., Rothermund, K.: Predicting behavior with implicit measures: Disillusioning findings, reasonable explanations, and sophisticated solutions. Front. Psychol. 10, 2483 (2019)

  68. Corneille, O., Hütter, M.: Implicit? What do you mean? A comprehensive review of the delusive implicitness construct in attitude research. Pers. Soc. Psychol. Rev. 24(3), 212–232 (2020)

  69. Sue, D.W., Capodilupo, C.M., Torino, G.C., Bucceri, J.M., Holder, A., Nadal, K.L., Esquilin, M.: Racial microaggressions in everyday life: Implications for clinical practice. Am. Psychol. 62(4), 271 (2007)

  70. Sue, D.W.: Microaggressions in Everyday Life: Race, Gender, and Sexual Orientation. Wiley, New York (2010)

  71. Paludi, M.A.: Managing Diversity in Today's Workplace: Strategies for Employees and Employers (4 volumes). ABC-CLIO (2012)

  72. Lukianoff, G., Haidt, J.: The Coddling of the American Mind: How Good Intentions and Bad Ideas Are Setting Up a Generation for Failure. Penguin Books (2019)

  73. Cantu, E., Jussim, L.: Microaggressions, questionable science, and free speech. Texas Review of Law & Politics, Forthcoming (2021)

  74. Hirschman, D., Bosk, E.A.: Standardizing biases: Selection devices and the quantification of race. Sociol. Race Ethn. 6(3), 348–364 (2020)

  75. http://karthikcs.org/files/Code.zip


Funding

C. S. Karthik was supported by a grant from the Simons Foundation (Grant Number 825876, Awardee Thu D. Nguyen), the Israel Science Foundation (grant number 552/16), the Len Blavatnik and the Blavatnik Family Foundation, and Subhash Khot's Simons Investigator Award. Claire Mathieu was partially funded by grant ANR-19-CE48-0016 from the French National Research Agency (ANR).

Author information


Contributions

All authors contributed equally.

Corresponding author

Correspondence to C. S. Karthik.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Proof of Theorem 1

Suppose the linear regression predictor tries to fit the population to the equation \(A\mathbf {\beta } = \textbf{y}\), where each row of A corresponds to the non-sensitive attributes of an applicant (i.e., each row of A is a uniformly random vector in \([0,1]^d\)) and the \(i^{\text {th}}\) coordinate of \(\textbf{y}\) is given by the \(\ell _p\)-norm of the \(i^{\text {th}}\) row of A. Note that \(\mathbf {\beta }\) is the vector that minimizes the least-squares error \(\Vert A\mathbf {\beta } - \textbf{y}\Vert _2\).

First, we show below that \(\mathbf {\beta }\) must have (almost) the same entry in all coordinates.
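Before the formal proof, here is a quick numerical illustration of this setup (our own sketch; the values of n, d, and p are chosen arbitrarily):

```python
import numpy as np

# Fit A @ beta ~ y by least squares, where y holds the l_p-norms of the rows.
rng = np.random.default_rng(0)
n, d, p = 100_000, 4, 15

A = rng.uniform(0.0, 1.0, size=(n, d))   # rows: non-sensitive attributes
y = np.linalg.norm(A, ord=p, axis=1)     # i-th target: l_p norm of row i

beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # the coordinates come out nearly equal, as Theorem 2 predicts
```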

Theorem 2

As \(n\rightarrow \infty \), with probability \(1-o(1)\), the optimal vector satisfies \(\mathbf {\beta }=(b,\ldots ,b)+o(1)\) for some \(b\in {\mathbb {R}}\).

Proof

The error \(\Vert A\mathbf {\beta } -\textbf{y}\Vert _2^2\) is equal to \(\underset{i=1}{\overset{n}{\sum }} |A_i \cdot \mathbf {\beta } -y_i |^2\), where \(y_i\) is equal to the (scaled) \(\ell _p\)-norm of the row vector \(A_i\). So we have a sum of n independent, identically distributed terms, which by the law of large numbers tends to \(n \cdot \mathbb {E}(|A_i \cdot \mathbf {\beta } -y_i |^2)\). The expected value \(\mathbb {E}(|A_i \cdot \mathbf {\beta } -y_i |^2)\) is a quadratic form in \(\mathbf {\beta }\) given by:

$$\begin{aligned} \mathbb {E}(|A_i \cdot \mathbf {\beta } -y_i |^2)&= \mathbb {E}\left( \left( \mathbf {\beta }^{T} A_i^T -y_i\right) \left( A_i\mathbf {\beta }-y_i\right) \right) \\&= \mathbb {E}(\mathbf {\beta }^{T} A_i^T A_i \mathbf {\beta }) -2\mathbb {E}(y_iA_i\cdot \mathbf {\beta })+\mathbb {E}(y_i^2). \end{aligned}$$

Because the form \(Q(\mathbf {\beta }) =\mathbb {E}(|A_i \cdot \mathbf {\beta } -y_i |^2)= \mathbb {E}(\mathbf {\beta }^{T} A_i^T A_i \mathbf {\beta }) -2\mathbb {E}(y_iA_i\cdot \mathbf {\beta })+\mathbb {E}(y_i^2)\) is non-negative, it has a unique minimum \(\mathbf {\beta }\), which satisfies the gradient condition \(\left. \frac{dQ(\mathbf {\beta } +t \textbf{v})}{dt}\right| _{t=0}=0\) for every vector \(\textbf{v}\).

$$\begin{aligned} Q(\mathbf {\beta } + t\textbf{v})&= \mathbb {E}\left( (\mathbf {\beta }+t \textbf{v})^{T} A_i^T A_i ( \mathbf {\beta }+ t\textbf{v})\right) -2\mathbb {E}(y_iA_i\cdot (\mathbf {\beta } + t\textbf{v}))+\mathbb {E}(y_i^2)\\&= Q(\mathbf {\beta }) + t \left( \textbf{v}^{T} \mathbb {E}(A_i^T A_i)\mathbf {\beta } + \mathbf {\beta }^T \mathbb {E}(A_i^T A_i) \textbf{v}- 2\mathbb {E}(y_iA_i)\cdot \textbf{v}\right) +O(t^2). \end{aligned}$$

Therefore, for every \(\textbf{v}\), we have

$$\begin{aligned} \textbf{v}^{T} \mathbb {E}(A_i^T A_i)\mathbf {\beta } + \mathbf {\beta }^T \mathbb {E}(A_i^T A_i) \textbf{v} -2\mathbb {E}(y_iA_i)\cdot \textbf{v}=0\\ \Rightarrow 2 \mathbb {E}(A_i^T A_i) \mathbf {\beta } - 2\mathbb {E}(y_iA_i)=\textbf{0}. \end{aligned}$$

Hence, the minimizer \(\mathbf {\beta }\) satisfies

$$\begin{aligned} \mathbb {E}(A_i^T A_i) \mathbf {\beta } =\mathbb {E}(y_iA_i). \end{aligned}$$

If \(y_i\) is any symmetric function of the coordinates of \(A_i\) (such as the \(\ell _p\)-norm here), then \(y_iA_i\) is a vector with identically distributed coordinates, so the vector \(\mathbb {E}(y_iA_i)\) has all coordinates equal.

Since \(A_i\) is a vector with i.i.d. coordinates uniform on [0, 1], \(\mathbb {E}(A_i^T A_i) \) is the matrix M with (r, s) entry given by \(M_{r,s} = \mathbb {E}(X_rX_s)\), where the \(X_i\) are i.i.d. uniform variables on [0, 1].

So we have \( M = \frac{1}{3} I + (\frac{1}{4} J - \frac{1}{4} I) = \frac{1}{12} I + \frac{1}{4}J\), where J is the all-ones matrix, and thus we see that \(\mathbf {\beta }\) satisfies \(\mathbf {\beta } = (b, b, \cdots , b)\), where \(\frac{b}{12} + \frac{db}{4} = \mathbb {E}(y_i X_r)\).

Thus, the minimizer of the expected quadratic form is of the form \(\mathbf {\beta }= (b, b, \cdots , b)\) for some \(b\in {\mathbb {R}}\). Therefore, with probability \(1-o(1)\), the minimizer of the empirical form is \(\mathbf {\beta } = (b, b, \cdots , b) + o(1)\), because the empirical quadratic form is, with high probability, very close to this expected quadratic form, and the minimizer changes only slightly under slight perturbations of the form. \(\square \)

Therefore, informally, we may conclude that linear regression simply selects the top \(50\%\) of the applicants based on their \(\ell _1\) norm.

Thus, ranking applicants using regression is equivalent to ranking according to \(A_i \cdot \mathbf {\beta }\), which is proportional to \(\sum _{j=1}^d A_{i}(j)\), which is essentially the \(\ell _1\)-norm of \(A_i\). (We rank according to \(\mathbf {\beta } =(b, b, \cdots , b) + o(1)\), so the ranking, which is determined by the volumes of the regions \(\mathbf {\beta } \cdot X > \tau \), is essentially the one given by the volumes of the regions \((b, b, \cdots , b) \cdot X >\tau \); this corresponds to ranking by \(\ell _1\)-norm because, for non-negative entries, \(\Vert X\Vert _1 = (1, 1, \cdots , 1) \cdot X\).)
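The following minimal sketch (our own; parameter values arbitrary) illustrates this equivalence numerically: the top half selected by the fitted regression score almost coincides with the top half selected by the \(\ell _1\)-norm.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 50_000, 4, 15

A = rng.uniform(0.0, 1.0, size=(n, d))
y = np.linalg.norm(A, ord=p, axis=1)
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

top_reg = (A @ beta) >= np.median(A @ beta)          # top half by regression
top_l1 = A.sum(axis=1) >= np.median(A.sum(axis=1))   # top half by l_1-norm
print(np.mean(top_reg == top_l1))                    # close to 1.0
```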

Let \(S_p\) be the ranking of vectors in \([0,1]^{d}\) by their \(\ell _p\)-norm, and let \(S_p^\tau \) be the restriction of the ranking to the top \(\tau \) fraction of applicants. The value of \(|S_1^\tau {\setminus } S_p^\tau |\) determines the recall in the theorem statement (since \(|S_1^\tau \cap S_p^\tau | = |S_1^\tau | - |S_1^\tau {\setminus } S_p^\tau |\)), and this is calculated below.

Theorem 3

Let S be a uniformly random sample of n points in \([0,1]^d\). Let \(p\in {\mathbb {R}}_{\ge 1}\cup \{\infty \}\). After ranking the points in S by their \(\ell _1\)-norm (resp. \(\ell _p\)-norm), let \(S_1\subset S\) (resp. \(S_p\subset S\)) be the points of S ranked in the top half (breaking ties randomly). Then, for large enough d and n, we have that

$$\begin{aligned} \frac{ |S_p\cap S_1|}{|S_1|}\sim 1-\frac{\tan ^{-1}\left( \sqrt{\frac{(p+2)^2}{6p+3}-1}\right) }{\pi }. \end{aligned}$$

Proof

By taking n large enough, it suffices to consider a point \({{\textbf {x}}}\) drawn uniformly at random from \([0,1]^d\) and compute the following probability:

$$\begin{aligned} \Pr _{{{\textbf {x}}}\sim [0,1]^d}\left[ \Vert {{\textbf {x}}}\Vert _1\ge m_1\text { and } \Vert {{\textbf {x}}}\Vert _p\ge m_p\right] , \end{aligned}$$

where \(m_1\) (resp. \(m_p\)) is the median of the distribution of \(\Vert {{\textbf {x}}}\Vert _1\) (resp. \(\Vert {{\textbf {x}}}\Vert _p\)). Asymptotically, for large d, we have \(m_p\sim \root p \of {\frac{d}{p+1}}\). If \({{\textbf {x}}}:=(x_1,\ldots , x_d)\), then the above probability is essentially the following:

$$\begin{aligned} \Pr _{{{\textbf {x}}}\sim [0,1]^d}\left[ x_1+\cdots +x_d\ge m_1\text { and } x_1^p+\cdots +x_d^p\ge m_p^p\right] . \end{aligned}$$

For every \(i\in [d]\), let \(\mathbf {y_i}:=(x_i,x_i^p)\). Then, the probability can be seen as:

$$\begin{aligned} \Pr _{{{\textbf {x}}}\sim [0,1]^d}\left[ \mathbf {y_1}+\cdots +\mathbf {y_d}\in {\mathcal {R}}\right] , \end{aligned}$$
(1)

where \({\mathcal {R}}:=[m_1,\infty )\times [m_p^p,\infty )\). Also, note that

$$\begin{aligned} \underset{{{{\textbf {x}}}\sim [0,1]^d}}{{\mathbb {E}}}\left[ \,\underset{i\in [d]}{\sum }\mathbf {y_i}\right] = \left( \frac{d}{2},\frac{d}{p+1}\right) \sim (m_1,m_p^p). \end{aligned}$$

Thus, applying the central limit theorem to the \(\mathbf {y_i}\)s, as \(d\rightarrow \infty \), we have:

$$\begin{aligned} \frac{1}{\sqrt{d}}\cdot \underset{i\in [d]}{\sum }\left( \mathbf {y_i}-{\mathbb E}\left[ \mathbf {y_i}\right] \right) \rightarrow {\mathcal {N}}(0,\Sigma ), \end{aligned}$$

where \(\Sigma \) is the covariance matrix of the \(\mathbf {y_i}\)s. (Below we compute the matrix of second moments \({\mathbb {E}}[\mathbf {y_i}^T\mathbf {y_i}]\); centering changes its entries but, as a direct computation shows, not the resulting correlation, which is all that the orthant probability below depends on.) We can thus compute \(\Sigma \) to be:

$$\begin{aligned} \Sigma ={\mathbb {E}}\left[ \begin{array}{cc} x_i^2 & x_i^{p+1}\\ x_i^{p+1} & x_i^{2p} \end{array}\right] =\left[ \begin{array}{cc} \frac{1}{3} & \frac{1}{p+2}\\ \frac{1}{p+2} & \frac{1}{2p+1} \end{array}\right] . \end{aligned}$$
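As a sanity check of these entries (our own aside, not part of the proof), the closed-form moments can be compared against Monte Carlo estimates; p = 3 below is an arbitrary choice:

```python
import numpy as np

# Second moments of (x_i, x_i^p) for x_i uniform on [0, 1], versus the
# closed forms 1/3, 1/(p+2), 1/(2p+1) used for Sigma above.
rng = np.random.default_rng(2)
p = 3
x = rng.uniform(0.0, 1.0, size=2_000_000)

print(np.mean(x**2), 1 / 3)                   # E[x^2]     = 1/3
print(np.mean(x**(p + 1)), 1 / (p + 2))       # E[x^(p+1)] = 1/(p+2)
print(np.mean(x**(2 * p)), 1 / (2 * p + 1))   # E[x^(2p)]  = 1/(2p+1)
```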

So the probability in (1) converges to

$$\begin{aligned} \Pr _{{\textbf{Y}}\sim {\mathcal {N}}(0,\Sigma )}[{\textbf{Y}}\in [0,\infty )\times [0,\infty )]. \end{aligned}$$

Note that the density of \({\mathcal {N}}(0, \varvec{\Sigma })\) is given by

$$\begin{aligned} \frac{\exp \left( -{\frac{1}{2}}\,{{\textbf{Y}}^T}{\varvec{\Sigma }}^{-1}{{\textbf{Y}}}\right) }{2\pi \sqrt{\det {\varvec{\Sigma }}}}. \end{aligned}$$

Moreover, the inverse of \(\Sigma \) is:

$$\begin{aligned} \varvec{\Sigma }^{-1} = \frac{1}{\frac{1}{3(2p+1)} -\frac{1}{(p+2)^2}}\left[ \begin{array}{cc} \frac{1}{2p+1} & -\frac{1}{p+2}\\ -\frac{1}{p+2} & \frac{1}{3}\end{array}\right] . \end{aligned}$$

Thus, using the integral identity (valid when \(a,c<0\) and \(4ac>b^2\))

$$\begin{aligned} \int _0^{\infty }\int _0^{\infty } \exp (a x^2 + b x y + c y^2) \, dx \, dy = \frac{1}{2\sqrt{4ac-b^2}} \left( \pi + 2 \arctan \left( \frac{b}{\sqrt{4ac-b^2}}\right) \right) , \end{aligned}$$

we can compute the probability to be the expression given in the theorem statement. \(\square \)
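As a final sanity check (our own, not part of the paper), the closed-form recall of Theorem 3 can be compared against a direct Monte Carlo simulation; the parameter choices below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, p = 100_000, 100, 4

X = rng.uniform(0.0, 1.0, size=(n, d))
# S_1 and S_p: top halves by l_1-norm and by l_p-norm (ranking by the
# sum of p-th powers is the same as ranking by the l_p-norm itself).
top1 = X.sum(axis=1) >= np.median(X.sum(axis=1))
topp = (X**p).sum(axis=1) >= np.median((X**p).sum(axis=1))

empirical = (top1 & topp).sum() / top1.sum()
closed = 1 - np.arctan(np.sqrt((p + 2)**2 / (6 * p + 3) - 1)) / np.pi
print(empirical, closed)   # the two agree for large d (here both ~0.83)
```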

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cohen-Addad, V., Gavva, S.T., Karthik, C.S. et al. Fairness of linear regression in decision making. Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00423-7
