
Navigating the Complex Landscape of AI - Understanding AI Bias

In today's era of rapid technological advancement, Artificial Intelligence (AI) stands out as a transformative force and one of the most talked-about topics of our time.

While AI offers immense transformative potential and productivity gains, it's accompanied by significant challenges that need careful consideration.

In this blog series, we'll take a closer look at each of these considerations, starting with a first set of posts on AI bias.

How OpenAI's DALL-E visualizes AI bias.

Bias in AI

AI models learn and behave based on the training data fed into them and how they are fine-tuned. If the training data is incomplete, skewed, or biased, the model can inadvertently perpetuate and amplify human biases. These biases usually mirror inherited human biases around features such as gender, ethnicity, race, language, and age.

Additionally, biases can be introduced through algorithm design choices, as well as through the intentional or unintentional biases of the people who prepare the data or develop these AI systems.

This is dangerous because AI is increasingly embedded in day-to-day applications, ranging from recruitment and healthcare to law enforcement.

Many people assume that machines and AI are emotionless, neutral, and unbiased, often overlooking algorithmic bias until an eye-opening incident occurs.

Algorithmic Bias

According to Wikipedia, algorithmic bias describes systematic and repeatable errors in a computer system that create "unfair" outcomes, such as "privileging" one category over another in ways different from the intended function of the algorithm.

Types of AI bias

1. Historical Bias - The classic case of garbage in, garbage out. Historical data is full of inherited societal biases. For example, company CEOs have been predominantly male over the last century, so if CEO profiles are used as a dataset and gender is included as a feature, the model will be biased towards predicting a male CEO from a candidate pool. Another example is word embeddings: does a female image come to mind with the word "nurse"? That's because, historically, we have tended to picture nurses as female. Try it yourself: open DALL-E and prompt "image of a nurse", and the majority of the images will be of females; try "image of a CEO", and the majority will be of males. Note that all major AI vendors take precautions to reduce biased outputs and provide balanced predictions, so you may well see mixed results.
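The garbage-in, garbage-out effect can be sketched with a toy example: a "model" that simply learns label frequencies from skewed historical records will reproduce the skew in its predictions. The dataset below is purely hypothetical, chosen only to illustrate the mechanism.

```python
# Toy illustration: a frequency-based "model" trained on skewed
# historical data reproduces the skew in its predictions.
from collections import Counter

# Hypothetical historical CEO records (illustrative, not real data).
historical_ceos = ["male"] * 92 + ["female"] * 8

counts = Counter(historical_ceos)
total = sum(counts.values())

# The "model" simply predicts the most frequent label seen in training.
prediction = counts.most_common(1)[0][0]
p_male = counts["male"] / total

print(prediction)  # "male" -> the majority class from the biased data
print(p_male)      # 0.92  -> the historical skew becomes the model's prior
```

Real models are far more sophisticated, but the principle is the same: whatever regularities exist in the historical data, including unfair ones, become the model's defaults.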

2. Representational Bias - This happens when the dataset is not representative of the population the model serves. For example, a healthcare diagnostic model trained to identify cancer biomarkers on datasets from the US may not work well in Asia. Another example is ImageNet, a popular image dataset commonly used in computer vision projects. Of its 14 million images, the majority were collected in the US or Europe, so a computer vision program trained on this dataset may have trouble identifying some images from Asia or Africa.
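One simple way to surface representational bias is to compare the regional make-up of the training set against the population the model will serve. The percentages below are hypothetical, for illustration only:

```python
# Sketch of representational bias: compare the share of each region in a
# training set against the population the model will serve.
# All percentages are hypothetical, chosen only to illustrate.
train_share = {"US/Europe": 0.80, "Asia": 0.12, "Africa": 0.08}
serve_share = {"US/Europe": 0.20, "Asia": 0.50, "Africa": 0.30}

# Simple mismatch score: total absolute difference between the two
# distributions. 0 means perfectly representative; larger means more skew.
mismatch = sum(abs(train_share[r] - serve_share[r]) for r in train_share)
print(mismatch)
```

A check like this doesn't fix the bias, but it makes the train/serve gap visible before the model ships.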

3. Measurement Bias - Measurement bias arises when selecting, gathering, or processing the features and labels in a dataset. Often a proxy such as "CreditScore" or "GPA" is used to stand in for a more complex construct. As an example, suppose a city employs a predictive model to identify patients at high risk of developing serious health conditions. The model considers factors like previous medical history, current medications, and demographics to forecast healthcare expenses, on the assumption that patients incurring higher healthcare costs (the proxy) are at higher risk. The developers are careful not to include whether a patient lives in a rural or urban area, yet predictions may still skew towards the urban population. Rural patients may be poorer and seek less medical care than wealthier, insured urban patients, so the model predicts that urban patients are more prone to serious health issues, even though rural patients may have the same level of underlying health conditions but simply couldn't afford care.
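The proxy problem can be made concrete with a toy dataset: two groups with identical underlying health risk, where one group spends far less on care. A model trained on spending (the proxy) sees a large gap that simply isn't there in the true construct. All numbers below are hypothetical:

```python
# Sketch of measurement bias: using healthcare *cost* as a proxy for
# health *risk*. Numbers are hypothetical, chosen only to illustrate.

# (true_risk, annual_spending) pairs: both groups have the same underlying
# risk, but the rural group spends less because care is less accessible.
urban = [(0.6, 9000), (0.6, 8500), (0.6, 9200)]
rural = [(0.6, 3000), (0.6, 2800), (0.6, 3100)]

def mean(xs):
    return sum(xs) / len(xs)

true_risk_gap = mean([r for r, _ in urban]) - mean([r for r, _ in rural])
proxy_gap = mean([s for _, s in urban]) - mean([s for _, s in rural])

print(true_risk_gap)  # 0.0 -> identical underlying risk
print(proxy_gap)      # large -> a cost-trained model ranks urban higher
```

The gap in the proxy, not in the true target, is what the model learns, and that gap tracks access to care rather than health.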

4. Aggregation Bias - This occurs when data from different groups or categories are combined, and the resulting model fails to recognize the variations within those groups. For example, suppose researchers are studying a new medication's effectiveness against a disease. They gather data from patients across different age groups, ethnicities, and genders but analyze it as one large pool, without considering the variation. The medication might be very effective for some groups and far less so for others, but because of the aggregated analysis, the conclusion might be that it is a moderately effective treatment for the whole population.
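The medication example above can be sketched in a few lines: two groups with very different recovery rates produce a pooled rate that describes neither. The counts are hypothetical, for illustration only:

```python
# Sketch of aggregation bias: pooled analysis hides that the medication
# works well for one group and poorly for another (hypothetical counts).

# Recovery outcomes by group: (recovered, total)
group_a = (90, 100)   # 90% recovery rate
group_b = (30, 100)   # 30% recovery rate

pooled = (group_a[0] + group_b[0]) / (group_a[1] + group_b[1])

print(group_a[0] / group_a[1])  # 0.9 -> very effective for group A
print(group_b[0] / group_b[1])  # 0.3 -> poor for group B
print(pooled)                   # 0.6 -> "moderately effective" overall
```

The pooled 60% figure is arithmetically correct yet misleading for both groups, which is exactly why per-group (disaggregated) evaluation matters.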

5. Evaluation Bias - This arises when the benchmark data used to evaluate and compare models does not represent the population the model will serve. For example, say you're evaluating a few computer vision algorithms for a project to identify animals. You pick a model that accurately identifies a diverse range of land animals on the benchmark/test data, but once deployed, most of the images it sees are of fish and birds, and its accuracy suffers.
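The animal-classifier scenario can be sketched with hypothetical per-class accuracies: the same model scores very differently depending on the class mix it is evaluated against. All numbers below are made up to illustrate the effect:

```python
# Sketch of evaluation bias: a model looks accurate on a benchmark whose
# class mix differs from the deployment population (hypothetical numbers).

# Per-class accuracy of an imagined animal classifier.
per_class_acc = {"land": 0.95, "fish": 0.50, "bird": 0.55}

def expected_accuracy(class_mix):
    # Accuracy weighted by how often each class appears.
    return sum(per_class_acc[c] * w for c, w in class_mix.items())

benchmark_mix = {"land": 0.90, "fish": 0.05, "bird": 0.05}  # mostly land animals
deployed_mix = {"land": 0.10, "fish": 0.50, "bird": 0.40}   # mostly fish and birds

print(expected_accuracy(benchmark_mix))  # high score on the benchmark
print(expected_accuracy(deployed_mix))   # much lower in production
```

The model hasn't changed between the two numbers; only the evaluation population has, which is the essence of evaluation bias.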

6. Deployment Bias - These biases arise not from the training data or the algorithm itself, but from how and where the AI system is deployed and used in the real world.

For example, take a predictive policing system developed to identify criminal-activity hotspots. Trained on a large and diverse dataset, it works well in a cosmopolitan city, flagging areas more prone to criminal activity. Apply the same model to a small suburban area with different dynamics, however, and it may start flagging the wrong areas as hotspots while missing the actual ones that need more surveillance.

Another example is an algorithm trained to flag explicit comments. It may do a good job on a social media platform, but if you use the same algorithm to moderate text comments on a platform like YouTube, it might inaccurately flag some comments, including parts of rap lyrics, as inappropriate.

Next post:

In the next blog post, we'll look at some real-life AI bias scenarios from the past few years. Do you feel you've been a victim of AI bias recently? Comment to share your experience.

Additional tools on AI Bias:


