As developers release new AI tools, the risk of reproducing harmful prejudices grows, especially in the aftermath of a year like 2020, which reshaped many of the societal and cultural conventions that AI algorithms have long been trained on.
A few foundational models are emerging that rely on vast amounts of training data, which makes them inherently powerful. But they are not free of harmful biases, and that is something we must all acknowledge.
Acknowledging a bias is simple. Understanding it is far more difficult, as is preventing future problems. In other words, to better comprehend the risks associated with building AI models, we must first understand the causes of these biases.
Today’s AI models are frequently pre-trained and open source, allowing researchers and businesses alike to adopt AI quickly and tailor it to their specific needs.
While this approach makes AI more commercially accessible, it also has a significant drawback: a small number of models now underpin the majority of AI applications across industries and continents. Developers who adapt these models for their own applications are building on a shaky foundation because of undetected or unknown biases.
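To see how little friction there is in that workflow, here is a minimal sketch of pulling a widely used pre-trained model and preparing it for fine-tuning, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint; the downstream sentiment task is a hypothetical example, and nothing in this process surfaces information about the biases the base model may carry.

```python
# Minimal sketch: adapting a pre-trained, open-source model to a new task.
# Assumes the Hugging Face `transformers` library and the public
# `bert-base-uncased` checkpoint; the downstream task (binary sentiment
# classification) is a hypothetical example.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # a foundation-style model trained on large web corpora
    num_labels=2,         # e.g., positive / negative for a sentiment task
)

# From here, a developer would fine-tune on their own labeled data and ship
# the result, inheriting whatever biases the base model already encodes.
inputs = tokenizer("The service at this clinic was excellent.", return_tensors="pt")
outputs = model(**inputs)  # raw logits; the new head is untrained, so scores are meaningless so far
print(outputs.logits)
```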
According to new research by Stanford’s Center for Research on Foundation Models, any biases present in these foundational models or the data on which they’re built are passed down to everyone who uses them, potentially amplifying their effects.
YFCC100M, for example, is a freely available data set from Flickr that is frequently used to train models. Look at the photographs of people in this data set and you’ll see that the distribution of images around the world is heavily skewed toward the United States, meaning that people from other countries and cultures are underrepresented. These kinds of skew in training data produce AI models with under- or overrepresentation biases in their output, i.e., output that favors white or Western cultures.
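A first-pass audit of this kind of geographic skew is not hard to run. The rough sketch below assumes the dataset’s metadata has been exported to a CSV with a per-image country field derived from geotags; the file name and column name are hypothetical and not part of any official YFCC100M tooling.

```python
# Rough sketch of a geographic-representation audit for an image dataset.
# Assumes the metadata has been exported to a CSV with one row per image and
# a "country" column derived from geotags; the file name and column name are
# hypothetical, not part of the official YFCC100M tooling.
import pandas as pd

metadata = pd.read_csv("yfcc100m_metadata_sample.csv")  # hypothetical export

# Share of geotagged images per country, largest first.
country_share = (
    metadata["country"]
    .value_counts(normalize=True)
    .mul(100)
    .round(2)
)

print(country_share.head(10))  # a heavy US share signals underrepresentation elsewhere
```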
When several data sets are joined to form an enormous training set, transparency suffers, and it becomes increasingly difficult to tell whether you have a fair mix of individuals, countries, and cultures. It should come as no surprise that the resulting AI models are published with blatant biases. Furthermore, when foundational AI models are released, little to no information about their limitations is usually provided. Testing for potential issues is left to the end user, a step that is frequently skipped.
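Even a lightweight end-user test can surface some of these issues before deployment. The sketch below assumes a hypothetical score_sentiment function wrapping whichever pre-trained model is being adopted, and probes it with template sentences that differ only in a nationality term; large score gaps are a warning sign worth investigating, not proof of bias on their own.

```python
# Lightweight end-user bias probe: compare model scores on inputs that differ
# only in a single demographic term. `score_sentiment` is a hypothetical
# wrapper around whatever pre-trained model is being adopted; replace the
# placeholder below with the real inference call.
from typing import Callable

def probe_nationality_gap(score_sentiment: Callable[[str], float]) -> dict[str, float]:
    template = "The food made by the {} chef was memorable."
    groups = ["American", "Nigerian", "Indian", "Mexican", "Chinese"]
    return {group: score_sentiment(template.format(group)) for group in groups}

if __name__ == "__main__":
    # Placeholder scorer so the sketch runs on its own; a real audit would
    # call the adopted model here.
    fake_scorer = lambda text: float(len(text)) / 100.0
    scores = probe_nationality_gap(fake_scorer)
    gap = max(scores.values()) - min(scores.values())
    print(scores)
    print(f"max score gap across groups: {gap:.3f}")  # large gaps warrant a closer look
```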