
ML Fairness

Fairness in machine learning refers to the various attempts at correcting algorithmic bias in automated decision processes based on machine learning models. Decisions made by computers after a machine-learning process may be considered unfair if they were based on variables considered sensitive, such as gender, ethnicity, sexual orientation, or disability. As is the case with many ethical concepts, definitions of fairness and bias are controversial. In general, fairness and bias are considered relevant when the decision process impacts people's lives. In machine learning, the problem of algorithmic bias is well known and well studied. Outcomes may be skewed by a range of factors and thus might be considered unfair with respect to certain groups or individuals. An example would be the way social media sites deliver personalized news to consumers.

Context

Discussion of fairness in machine learning is a relatively recent topic. Since 2016 there has been a sharp increase in research into the topic.[1] This increase can be partly attributed to an influential report by ProPublica claiming that the COMPAS software, widely used in US courts to predict recidivism, was racially biased.[2] One topic of research and discussion is the definition of fairness: there is no universal definition, and different definitions can contradict each other, which makes it difficult to judge machine learning models.[3] Other research topics include the origins of bias, the types of bias, and methods to reduce bias.[4]

In recent years tech companies have produced tools and manuals on how to detect and reduce bias in machine learning. IBM has tools for Python and R with several algorithms to reduce software bias and increase its fairness.[5][6] Google has published guidelines and tools to study and combat bias in machine learning.[7][8] Facebook has reported its use of a tool, Fairness Flow, to detect bias in its AI.[9] However, critics have argued that the company's efforts are insufficient: employees reportedly make little use of the tool, as it cannot be applied to all of their programs, and even where it can, its use is optional.[10]

The discussion of quantitative ways to test fairness and unjust discrimination in decision-making predates the recent debate on fairness in machine learning by several decades.[11] A vivid discussion of this topic flourished in the scientific community during the mid-1960s and 1970s, mostly as a result of the American civil rights movement and, in particular, of the passage of the U.S. Civil Rights Act of 1964. However, by the end of the 1970s the debate had largely disappeared, as the different and sometimes competing notions of fairness left little room for clarity on when one notion may be preferable to another.

Language Bias

Language bias refers to a type of statistical sampling bias tied to the language of a query that leads to "a systematic deviation in sampling information that prevents it from accurately representing the true coverage of topics and views available in their repository."[better source needed][12] Luo et al.[12] show that current large language models, as they are predominantly trained on English-language data, often present Anglo-American views as truth, while systematically downplaying non-English perspectives as irrelevant, wrong, or noise. When queried with political ideologies like "What is liberalism?", ChatGPT, as it was trained on English-centric data, describes liberalism from the Anglo-American perspective, emphasizing aspects of human rights and equality, while equally valid aspects like "opposes state intervention in personal and economic life" from the dominant Vietnamese perspective and "limitation of government power" from the prevalent Chinese perspective are absent. Similarly, other political perspectives embedded in Japanese, Korean, French, and German corpora are absent in ChatGPT's responses. Although ChatGPT presents itself as a multilingual chatbot, it is in fact mostly "blind" to non-English perspectives.[12]

Gender Bias

Gender bias refers to the tendency of such models to produce outputs that are unfairly prejudiced towards one gender over another. This bias typically arises from the data on which the models are trained. For example, large language models often assign roles and characteristics based on traditional gender norms; they might associate nurses or secretaries predominantly with women and engineers or CEOs with men.[13]

Political Bias

Political bias refers to the tendency of algorithms to systematically favor certain political viewpoints, ideologies, or outcomes over others. Language models may also exhibit political biases. Since the training data includes a wide range of political opinions and coverage, the models might generate responses that lean towards particular political ideologies or viewpoints, depending on the prevalence of those views in the data.[14]

Controversies

The use of algorithmic decision making in the legal system has been a notable area under scrutiny. In 2014, then U.S. Attorney General Eric Holder raised concerns that "risk assessment" methods may be putting undue focus on factors not under a defendant's control, such as their education level or socio-economic background.[15] The 2016 report by ProPublica on COMPAS claimed that black defendants were almost twice as likely as white defendants to be incorrectly labelled as higher risk, while white defendants were more often incorrectly labelled as lower risk.[2] The creator of COMPAS, Northpointe Inc., disputed the report, claiming that its tool is fair and that ProPublica had made statistical errors,[16] a claim that was in turn rebutted by ProPublica.[17]

Racial and gender bias has also been noted in image recognition algorithms. Facial and movement detection in cameras has been found to ignore or mislabel the facial expressions of non-white subjects.[18] In 2015, the automatic tagging feature in both Flickr and Google Photos was found to label black people with tags such as "animal" and "gorilla".[19] A 2016 international beauty contest judged by an AI algorithm was found to be biased towards individuals with lighter skin, likely due to bias in the training data.[20] A 2018 study of three commercial gender classification algorithms found that all three were most accurate when classifying light-skinned males and least accurate when classifying dark-skinned females.[21] In 2020, an image cropping tool from Twitter was shown to prefer lighter-skinned faces.[22] DALL-E, a machine learning text-to-image model released in 2021, has been prone to creating racist and sexist images that reinforce societal stereotypes, something its creators have acknowledged.[23]

Other areas where machine learning algorithms are in use and have been shown to be biased include job and loan applications. Amazon used software to review job applications that discriminated against women, for example by penalizing resumes that included the word "women".[24] In 2019, Apple's algorithm for determining credit limits for its new Apple Card gave significantly higher limits to men than to women, even for couples who shared their finances.[25] A 2021 report by The Markup showed that mortgage-approval algorithms in use in the U.S. were more likely to reject non-white applicants.[26]

Limitations

Recent works underline the presence of several limitations to the current landscape of fairness in machine learning, particularly regarding what is realistically achievable in the ever-increasing real-world applications of AI.[27][28][29] For instance, the mathematical and quantitative approach to formalizing fairness, and the related "de-biasing" approaches, may rely on overly simplistic and easily overlooked assumptions, such as the categorization of individuals into pre-defined social groups. Other delicate aspects include the interaction among several sensitive characteristics[21] and the lack of a clear and shared philosophical and/or legal notion of non-discrimination.

Group fairness criteria

In classification problems, an algorithm learns a function to predict a discrete characteristic $Y$, the target variable, from known characteristics $X$. We model $A$ as a discrete random variable which encodes some characteristics contained or implicitly encoded in $X$ that we consider as sensitive characteristics (gender, ethnicity, sexual orientation, etc.). We finally denote by $R$ the prediction of the classifier. Now let us define three main criteria to evaluate if a given classifier is fair, that is, if its predictions are not influenced by some of these sensitive variables.[30]
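
The following sketch (not part of the original article) sets up these quantities on synthetic data; the arrays X, A, Y and the prediction R are illustrative names, and the data-generating process is purely hypothetical.

    # Minimal illustrative setup: features X, a binary sensitive attribute A,
    # a binary target Y, and a classifier whose predictions R are evaluated
    # by the criteria defined below. All values are synthetic.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000
    A = rng.integers(0, 2, size=n)                   # sensitive attribute (two groups)
    X = rng.normal(size=(n, 3)) + 0.5 * A[:, None]   # features correlated with A
    Y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0.25).astype(int)  # target variable

    clf = LogisticRegression().fit(X, Y)
    R = clf.predict(X)                               # prediction of the classifier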

Independence

We say the random variables $(R, A)$ satisfy independence if the sensitive characteristics $A$ are statistically independent of the prediction $R$, and we write $R \perp A$.

We can also express this notion with the following formula:

$$P(R = r \mid A = a) = P(R = r \mid A = b) \quad \forall r, \; \forall a, b$$

This means that the classification rate for each target class is equal for people belonging to different groups with respect to the sensitive characteristics $A$.
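
As a rough empirical check (reusing the synthetic A and R from the sketch above, which are assumed rather than taken from the article), independence can be estimated by comparing the rate of each predicted class across the groups defined by A:

    # Demographic-parity style check of independence: P(R = r | A = a) should be
    # (approximately) equal for all groups a and every predicted class r.
    import numpy as np

    def selection_rates(R, A):
        return {a: {r: float(np.mean(R[A == a] == r)) for r in np.unique(R)}
                for a in np.unique(A)}

    print(selection_rates(R, A))
    # Independence holds (approximately) if the rates for each class r agree across groups.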

Yet another equivalent expression for independence can be given using the concept of mutual information between random variables, defined as

$$I(A, R) = H(A) + H(R) - H(A, R)$$

In this formula, $H(\cdot)$ is the entropy of a random variable. Then $(R, A)$ satisfy independence if $I(A, R) = 0$.
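
Under the same illustrative assumptions, the mutual information $I(A, R)$ can be estimated from the empirical joint distribution of $(A, R)$; this is a sketch, not a reference implementation:

    # Empirical mutual information I(A, R) = H(A) + H(R) - H(A, R);
    # independence corresponds to I(A, R) = 0.
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def mutual_information(A, R):
        groups, classes = np.unique(A), np.unique(R)
        joint = np.array([[np.mean((A == a) & (R == r)) for r in classes] for a in groups])
        return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

    print(mutual_information(A, R))   # close to 0 only if R is (nearly) independent of A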

A possible relaxation of the independence definition includes introducing a positive slack $\epsilon > 0$ and is given by the formula:

$$P(R = r \mid A = a) \geq P(R = r \mid A = b) - \epsilon \quad \forall r, \; \forall a, b$$

Finally, another possible relaxation is to require $I(A, R) \leq \epsilon$.
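
A small sketch of the $\epsilon$-relaxed criterion, again using the illustrative R and A from above (the threshold eps = 0.05 is an arbitrary choice):

    # Relaxed independence: P(R = r | A = a) >= P(R = r | A = b) - eps
    # for all groups a, b and every predicted class r.
    import numpy as np
    from itertools import product

    def satisfies_relaxed_independence(R, A, eps=0.05):
        groups, classes = np.unique(A), np.unique(R)
        for r in classes:
            for a, b in product(groups, repeat=2):
                if np.mean(R[A == a] == r) < np.mean(R[A == b] == r) - eps:
                    return False
        return True

    print(satisfies_relaxed_independence(R, A, eps=0.05))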

Separation

We say the random variables $(R, A, Y)$ satisfy separation if the sensitive characteristics $A$ are statistically independent of the prediction $R$ given the target value $Y$, and we write $R \perp A \mid Y$.

We can also express this notion with the following formula:

$$P(R = r \mid Y = q, A = a) = P(R = r \mid Y = q, A = b) \quad \forall r, q, \; \forall a, b$$

This means that all the dependence of the decision $R$ on the sensitive attribute $A$ must be justified by the actual dependence of the true target variable $Y$.

Another equivalent expression, in the case of a binary target, is that the true positive rate and the false positive rate are equal (and therefore the false negative rate and the true negative rate are equal) for every value of the sensitive characteristics:

$$P(R = 1 \mid Y = 1, A = a) = P(R = 1 \mid Y = 1, A = b) \quad \forall a, b$$
$$P(R = 1 \mid Y = 0, A = a) = P(R = 1 \mid Y = 0, A = b) \quad \forall a, b$$
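
For the binary case, separation can be checked empirically by comparing the true positive rate and false positive rate across groups (a sketch under the same synthetic-data assumptions as above):

    # Equalized-odds style check of separation: TPR = P(R=1 | Y=1, A=a) and
    # FPR = P(R=1 | Y=0, A=a) should match across the groups of A.
    import numpy as np

    def rates_by_group(R, Y, A):
        out = {}
        for a in np.unique(A):
            g = A == a
            out[a] = {"TPR": float(np.mean(R[g & (Y == 1)] == 1)),
                      "FPR": float(np.mean(R[g & (Y == 0)] == 1))}
        return out

    print(rates_by_group(R, Y, A))
    # Separation holds (approximately) if TPR and FPR agree across groups.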






