Algorithm Fairness
(sketch)
(Jan. 4, 2012)
This page is a placeholder (sketch) for a page that will discuss various issues about so-called algorithm fairness or unfairness. Perceptions about algorithm fairness or unfairness involve the following pattern. When one group (Group A) has a higher likelihood of experiencing outcome X than another group (Group B), an imperfect predictor of outcome X will tend to underpredict the outcome for Group A and overpredict it for Group B. As a result, (a) among persons who experience outcome X after being evaluated by the predictor, a higher proportion of Group B than Group A will have been identified by the predictor as unlikely to experience outcome X (so-called false negatives), and (b) among persons who do not experience outcome X after being evaluated by the predictor, a higher proportion of Group A than Group B will have been identified by the predictor as likely to experience the outcome (so-called false positives). The issue, which is currently much discussed with regard to the fairness of algorithms used to make decisions about arrested or convicted persons, is the same as that much discussed with regard to employment tests in Ability Testing: Uses, Consequences and Controversies, Part I, National Academies Press (1982) (Committee on Ability Testing, Assembly of Behavioral and Social Sciences, National Research Council, Alexandra K. Wigdor & Wendell R. Garner (eds.)) and various other places in the 1980s and early 1990s.
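A minimal simulation can make this pattern concrete. The sketch below uses a simple liability model in Python; the group means, noise level, and cutoff are illustrative assumptions, not figures from any actual test or algorithm. Both groups receive the same imperfect, group-blind score, yet the higher-base-rate group shows the higher false positive rate and the lower-base-rate group the higher false negative rate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate_group(mean_risk):
    """One group under a liability model: a latent risk drives the outcome,
    and the predictor sees that risk only through noise (an imperfect score)."""
    risk = rng.normal(mean_risk, 1.0, n)               # true underlying risk
    outcome = rng.random(n) < 1 / (1 + np.exp(-risk))  # X occurs with prob. sigmoid(risk)
    score = risk + rng.normal(0.0, 1.0, n)             # imperfect, group-blind score
    return outcome, score

cutoff = 0.0  # the same cutoff is applied to both groups
for name, mean_risk in [("Group A (higher base rate)", 0.5),
                        ("Group B (lower base rate)", -0.5)]:
    outcome, score = simulate_group(mean_risk)
    predicted = score > cutoff
    fnr = np.mean(~predicted[outcome])   # among those who experience X: share flagged unlikely
    fpr = np.mean(predicted[~outcome])   # among those who do not: share flagged likely
    print(f"{name}: base rate {outcome.mean():.2f}, FNR {fnr:.2f}, FPR {fpr:.2f}")
```

The point of the sketch is that the divergence in error rates arises from the difference in base rates alone; the score itself treats both groups identically.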
Whether outcome X is the favorable or the corresponding adverse outcome in a particular context, as well as whether a group's misclassifications are deemed false positives or false negatives, is entirely arbitrary. Thus, in the case of employment tests where whites have higher scores than blacks, successful performance of the job is commonly regarded as outcome X. Perceived test unfairness will accordingly be described in terms of higher false negative rates for blacks (higher rates of test failure among blacks than among whites who performed well on the job) and higher false positive rates for whites (i.e., higher pass rates among whites than among blacks who did not perform well on the job).
In the case of algorithms used to predict recidivism, recidivism is commonly treated as outcome X. Thus, in situations where black defendants have higher recidivism risk scores than white defendants, perceived algorithm unfairness will commonly be cast in terms of higher false positive rates for blacks and higher false negative rates for whites. Commonly the racial differences in false positives and false negatives will be cast in relative terms.
With regard to both employment tests and recidivism algorithms, the characterization of the issue is often misleading. In the case of employment tests, the matter has at times been characterized in terms of underprediction of successful performance for blacks and overprediction for whites, when in fact the opposite is the case.
In the case of recidivism algorithms, a May 23, 2016 ProPublica article titled “How We Analyzed the COMPAS Recidivism Algorithm,” which is the subject of the Recidivism Illustration page, stated that “black defendants were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism, while white defendants were more likely than black defendants to be incorrectly flagged as low risk.” Such statements, even if semantically accurate, must be interpreted with an understanding that, among all defendants, black defendants were in fact less likely than white defendants to be incorrectly identified as likely to recidivate and more likely than white defendants to be incorrectly identified as unlikely to recidivate. It is only among defendants who did not recidivate that blacks were more likely than whites to have been incorrectly identified as highly likely to recidivate, and only among defendants who did recidivate that whites were more likely than blacks to have been incorrectly identified as highly unlikely to recidivate.
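The arithmetic behind that distinction can be shown with a small sketch. The confusion-table counts below are purely hypothetical (they are not ProPublica's figures) and are chosen only to show that a group can have the higher false positive rate among non-recidivists while being less likely, among all defendants, to be incorrectly flagged as high risk, with the mirror-image reversal for false negatives.

```python
# Hypothetical confusion counts per group (illustrative only, not ProPublica's data).
groups = {
    "Group A (higher base rate)": dict(tp=560, fn=240, fp=90, tn=110),
    "Group B (lower base rate)":  dict(tp=220, fn=180, fp=150, tn=450),
}

for name, c in groups.items():
    n = sum(c.values())
    fpr = c["fp"] / (c["fp"] + c["tn"])  # among non-recidivists: share flagged high risk
    fnr = c["fn"] / (c["fn"] + c["tp"])  # among recidivists: share flagged low risk
    fp_all = c["fp"] / n                 # among ALL defendants: incorrectly flagged high risk
    fn_all = c["fn"] / n                 # among ALL defendants: incorrectly flagged low risk
    print(f"{name}: FPR {fpr:.0%}, FNR {fnr:.0%}, "
          f"overall false-positive share {fp_all:.0%}, "
          f"overall false-negative share {fn_all:.0%}")
```

Here Group A's higher false positive rate (45% vs. 25%) coexists with a lower overall false-positive share (9% vs. 15%); which conditioning one reports, on the outcome or on the whole population, determines which group appears disadvantaged.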
This page will eventually address several issues regarding perceptions about algorithm fairness. One will involve the failure to understand the way cutoffs affect the size of relative differences in false positives and relative differences in false negatives. Observers speak about reducing the unfairness of a predictor. But just as such observers universally fail to understand the way altering a cutoff will tend to affect relative differences in favorable and adverse outcomes pursuant to the predictor (e.g., altering the cutoff in a way that increases favorable outcomes tends to reduce relative differences in favorable outcomes while increasing relative differences in the corresponding adverse outcome), they fail to understand the way cutoffs affect the size of relative differences in false positives or false negatives.
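The cutoff pattern described in the parenthetical above can be illustrated with two normal score distributions. In this sketch the half-standard-deviation gap between the groups is an illustrative assumption; lowering the cutoff (which increases favorable outcomes) drives the pass-rate ratio toward 1 while widening the fail-rate ratio, and raising it does the reverse.

```python
from statistics import NormalDist

# Two groups' score distributions, half a standard deviation apart (assumed).
lower = NormalDist(mu=0.0, sigma=1.0)   # lower-scoring group
upper = NormalDist(mu=0.5, sigma=1.0)   # higher-scoring group

for cutoff in (-1.0, -0.5, 0.0, 0.5, 1.0):
    pass_lo, pass_hi = 1 - lower.cdf(cutoff), 1 - upper.cdf(cutoff)
    fail_lo, fail_hi = lower.cdf(cutoff), upper.cdf(cutoff)
    print(f"cutoff {cutoff:+.1f}: pass-rate ratio {pass_hi / pass_lo:.2f}, "
          f"fail-rate ratio {fail_lo / fail_hi:.2f}")
```

Because false positive and false negative rates are simply pass and fail rates computed within outcome groups, the same mechanics govern how a cutoff change affects relative differences in those error rates.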
Another may involve the way the validity of the predictor affects relative differences in false positives and relative differences in false negatives. That is, the more valid the predictor, the smaller the numbers of false positives and false negatives. But the effect on relative differences in false positives and false negatives is another matter.
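A variant of the earlier liability-model sketch suggests how this might be explored; the noise standard deviation below is a stand-in for (lack of) predictor validity, an assumption of the sketch rather than a measured quantity. The error rates themselves shrink as the noise shrinks, while the ratios between the groups' error rates follow their own course.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

def error_rates(mean_risk, noise_sd, cutoff=0.0):
    """False positive and false negative rates for one group; smaller
    noise_sd means a more valid predictor of the latent risk."""
    risk = rng.normal(mean_risk, 1.0, n)
    outcome = rng.random(n) < 1 / (1 + np.exp(-risk))         # outcome is probabilistic given risk,
    predicted = risk + rng.normal(0.0, noise_sd, n) > cutoff  # so some error is irreducible
    return np.mean(predicted[~outcome]), np.mean(~predicted[outcome])

for noise_sd in (2.0, 1.0, 0.5, 0.1):
    fpr_a, fnr_a = error_rates(0.5, noise_sd)    # higher-base-rate group
    fpr_b, fnr_b = error_rates(-0.5, noise_sd)   # lower-base-rate group
    print(f"noise sd {noise_sd}: FPR A/B = {fpr_a:.3f}/{fpr_b:.3f} "
          f"(ratio {fpr_a / fpr_b:.2f}); FNR B/A = {fnr_b:.3f}/{fnr_a:.3f} "
          f"(ratio {fnr_b / fnr_a:.2f})")
```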