“Imagine a scenario in which self-driving cars fail to recognize people of color as people—and are thus more likely to hit them—because the computers were trained on data sets of photos in which such people were absent or underrepresented,” Joy Buolamwini, a computer scientist and researcher at MIT, told Fortune in a recent interview.
Buolamwini’s research revealed that facial recognition software from tech giants including Microsoft, IBM, and Amazon was far more accurate at identifying lighter-skinned men than darker-skinned women.
How does this happen? The answer lies in a problem of mounting concern: algorithmic bias.
Algorithms are the foundation of machine learning: they are what drive intelligent machines to make decisions. Increasingly, those decisions have real-world implications and consequences.
Algorithms are fed data. Say you want to teach a machine to recognize a cat. The machine needs to know what a cat looks like, so you feed the algorithm thousands of images of cats until it can recognize one better than a human can. This is how IBM’s Watson became adept at recognizing cancer: trained on enormous numbers of medical images, Watson can now spot certain cancers better than most trained doctors.
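The train-on-examples loop described above can be sketched in miniature. Below is a toy nearest-centroid classifier over made-up two-number “image features” (all names and data are invented for illustration; this is not Watson’s actual pipeline):

```python
# Toy supervised learning: each "image" is reduced to two invented
# feature numbers, and the model learns one centroid per label.
def train(examples):
    # examples: list of ((feature_x, feature_y), label) pairs
    sums, counts = {}, {}
    for (x, y), label in examples:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(model, features):
    # Pick the label whose centroid is closest to the features.
    x, y = features
    return min(model, key=lambda lbl: (model[lbl][0] - x) ** 2
                                      + (model[lbl][1] - y) ** 2)

# Hypothetical training data: (ear_pointiness, whisker_density) -> label
data = [((0.9, 0.8), "cat"), ((0.8, 0.9), "cat"),
        ((0.2, 0.3), "dog"), ((0.3, 0.1), "dog")]
model = train(data)
print(predict(model, (0.85, 0.75)))  # prints "cat"
```

The key point for the rest of the article: the model knows only what the examples tell it, so whatever skew exists in the examples becomes the model’s view of the world.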
Machine learning algorithms are all around us. They help us choose the next Netflix show to binge-watch and decide which posts appear at the top of our social media feeds. They help detect spam in our email inboxes and recommend new outfits to purchase.
Yet they also make decisions for us with far-reaching ramifications. An algorithm may help decide whom you go out with on Saturday night. It may help police decide where to send resources. It may guide the length of individual prison sentences. It may also decide who gets into a university, and who gets the job.
Hence, the surging concern. Any bias in these algorithms could negatively impact people or groups of people, including traditionally marginalized groups.
To dive deeper into this issue, I spoke with Corey White, Senior Vice President at Future Point of View, a technology strategy firm. FPOV is engaged by organizations to help them look out into the future and anticipate trends that will disrupt their industries. Because of its potential, AI/ML has become a major focus for the organization.
“These technologies have the potential to be so transformative, but because of their power, they must be thoughtfully developed and applied,” White states.
White began studying algorithmic bias intensely several years ago because he realized the enormous harm biased algorithms could cause. Today, he speaks on the topic to audiences of all backgrounds in order to raise awareness about the issue.
He explains, “Machines, like humans, are guided by data and experience.” If that data or experience is mistaken or crooked, a biased decision can be made, whether that decision is made by a human or a machine.
In White’s presentation on algorithmic bias, he uses simple yet powerful examples of how algorithmic bias impacts our everyday lives and can reinforce existing societal stereotypes. One example in particular stood out to me as someone who is interested in gender equality.
Above is a Google Image search for Robert Downey Jr. that White performed in mid-2019. The most related searches include Iron Man, Sherlock Holmes, Avengers, Tropic Thunder, and Civil War.
In 2019, Robert Downey Jr. was the third-highest-paid actor in Hollywood, and Avengers: Endgame dominated the box office for much of the year. His co-star in that film, Scarlett Johansson, was the highest-paid actress of 2019 for the second year in a row.
Below is a Google image search for Scarlett Johansson that White conducted on the same day as the above search for Robert Downey Jr.
The related searches here include “body,” “cute,” “bed,” “photoshoot,” “makeup,” and “Vanity Fair.”
Although Robert Downey Jr. and Scarlett Johansson are actors of comparable accomplishment who shared the screen in one of the highest-grossing films in history, their related searches returned vastly different content. As White’s example shows, Robert Downey Jr.’s related searches focus mostly on his work, while Scarlett Johansson’s focus primarily on her appearance.
To understand why and how this happens, White suggests examining the types of bias most commonly found in datasets.
An unfortunately common example of interaction bias is facial recognition algorithms trained on datasets containing more Caucasian faces than African American faces. Buolamwini’s groundbreaking study, mentioned above, highlights this very problem.
“People who are misidentified or not identified by these algorithms could face increased harassment or detainment by law enforcement,” White states, elaborating on Buolamwini’s research, “especially considering facial recognition technologies are being used in cameras at airports and national borders.”
In latent bias, an algorithm may incorrectly identify something based on historical data or because of a stereotype that already exists in society.
As White explains, “it may recognize a doctor to be male and not female, because the data tells it that doctors are primarily men. This could also be called prejudice bias. In it, our data characterizes our preconceived notions or historical predispositions and that unfavorably slants a dataset.”
Famously, Amazon had to scrap an automated recruiting tool that favored male candidates over female candidates because the algorithm had been trained on historical hiring data in which most hires were men.
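The Amazon case illustrates the mechanism behind latent bias: a model fit to skewed historical outcomes simply reproduces them. A minimal sketch with made-up hiring records (the groups, counts, and rates are all invented for illustration):

```python
from collections import Counter

# Hypothetical historical hiring records: (gender, was_hired)
history = ([("male", True)] * 80 + [("male", False)] * 20 +
           [("female", True)] * 20 + [("female", False)] * 80)

# A naive "model" that scores candidates by the historical hire
# rate of their group -- it directly encodes the past bias.
hired = Counter(g for g, h in history if h)
total = Counter(g for g, h in history)
score = {g: hired[g] / total[g] for g in total}

print(score)  # {'male': 0.8, 'female': 0.2}
```

Nothing in the code mentions gender as a criterion; the disparity comes entirely from the historical data the “model” was fit to, which is exactly the trap the Amazon tool fell into.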
With selection bias, a dataset overrepresents one group and underrepresents another. “Selection bias occurs when a data set contains vastly more information on one subgroup than on another,” says White.
For instance, many machine learning algorithms are trained by scraping the Internet for information, and the major search engines and their algorithms were developed in the West. As a result, an algorithm is often more likely to recognize a bride and groom at a Western-style wedding than at, say, an African wedding.
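A first, crude check for selection bias is simply counting subgroup representation before training. A sketch using an invented dataset of labeled wedding photos (the regions and counts are hypothetical):

```python
from collections import Counter

# Hypothetical dataset of wedding photos, tagged by region of origin.
dataset = ["western"] * 940 + ["african"] * 35 + ["south_asian"] * 25

counts = Counter(dataset)
shares = {region: n / len(dataset) for region, n in counts.items()}
for region, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{region}: {share:.1%}")
# A model trained on this set sees ~94% Western examples, so it will
# likely perform worst on the underrepresented groups.
```

Counting is not a full bias audit, but skews this large are visible before a single model is trained, which is the cheapest place to catch them.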
The real-world implications of biased algorithms are clearly vast. “Algorithms are now a part of everything we do,” says White. “In many ways they’re judging us at every stage of life, and we might not even know it.”
So what can we do to ensure AI becomes more accurate, ethical and inclusive?
White believes the first step is awareness: “People need to understand that you can’t blindly trust an algorithm.” He also notes that progress on this front is promising.
“Government leaders from across this country and the world are beginning to ask important questions about these technologies and how they are being deployed and even if they should be deployed,” White states.
Furthermore, in the development of algorithms, we need to ensure that datasets are unbiased.
IBM’s AI Fairness 360 is an open source toolkit that helps developers test for bias in their datasets. Just this week, Facebook AI announced a new technique that marks the images in a dataset so that researchers can understand if a machine learning model was trained using those images.
This verification method, called “radioactive” data, allows for greater transparency when it comes to the data that a model is trained on.
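Toolkits like AI Fairness 360 work by computing fairness metrics over a model’s outcomes. One of the simplest is disparate impact, the ratio of favorable-outcome rates between an unprivileged and a privileged group. Below is a plain-Python sketch of that metric (the decision data is invented, and this is not the aif360 API itself):

```python
def favorable_rate(outcomes, group):
    # outcomes: list of (group_label, got_favorable_outcome) pairs
    members = [ok for g, ok in outcomes if g == group]
    return sum(members) / len(members)

def disparate_impact(outcomes, unprivileged, privileged):
    # Ratio of favorable-outcome rates; a common rule of thumb
    # (the "four-fifths rule") flags values below 0.8.
    return (favorable_rate(outcomes, unprivileged)
            / favorable_rate(outcomes, privileged))

# Hypothetical loan decisions: (group, approved)
decisions = ([("A", True)] * 60 + [("A", False)] * 40 +
             [("B", True)] * 30 + [("B", False)] * 70)

di = disparate_impact(decisions, unprivileged="B", privileged="A")
print(round(di, 2))  # 0.5 -> well below 0.8, worth auditing
```

A metric like this is what makes White’s call for “constantly testing algorithms” concrete: it turns a vague worry about unfairness into a number that can be tracked over time.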
“We also need to get better at constantly testing algorithms to ensure they are not unfairly marginalizing communities,” White adds. This means creating formal assessments for the algorithms already in use in communities.
“External review processes should be enabled to track the influence of the algorithm over time, and the communities that these algorithms will impact should be offered a public space to voice their concerns.”
Finally, we need to improve the diversity of data and of the people who are developing, deploying, and overseeing these algorithms.
White emphasizes this point, stating “We must have diverse voices at every stage of this journey or these algorithms may continue to disaffect certain communities to the benefit of others.”
One root of algorithmic inequality is the chronic lack of diversity in the technology field.
“When I say diversity,” White adds, “I mean people of different ethnicities, gender identities, religious and socio-economic backgrounds. I mean people in the LGBTQIA community. I mean people with physical disabilities and mental health disorders. An algorithm shouldn’t be built for one type of person. It should be built for all types of people.”
You can learn more about Corey White and his work on algorithmic bias at FPOV’s speaking page.