
IBM comes up with new dataset to address prejudice in AI


Recent times have shown that artificial intelligence can be biased. This has to do with the datasets used in the development of AI. IBM hopes that its new database of more than a million faces will better reflect the real world and help to reduce that bias.

Face recognition is used for a variety of purposes: unlocking your phone, but sometimes also opening your front door or assessing your state of mind. Yet even the best-developed facial recognition systems fail the simplest tests when it comes to recognizing people with dark skin tones. The problem is broader than just the datasets being used, but the data certainly plays a part.

Million faces

That’s why, according to TechCrunch, IBM is now introducing a new dataset with a million faces. The set is made as diverse as possible and covers people from different backgrounds as well as different ages. Facial recognition also sometimes struggles to recognize older people.

For facial recognition to work as desired – to be both accurate and fair – the training data must be sufficiently balanced. The datasets with which we train the AI must be large and diverse enough to capture the many ways in which faces differ. The images should reflect the diversity of facial features that we see around the world.

IBM extracted the faces from a dataset of a hundred million Creative Commons images from Flickr. To do so, IBM built another AI that simply searched the database for faces. Those faces were cropped out and then analyzed, with each face precisely measured: the distance between the eyes, the size of the forehead, and more. That information was used to create a facial print that the system can use, on the basis of which faces are linked to each other.
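The idea of a measurement-based facial print can be sketched roughly as follows. IBM has not published its exact pipeline, so the measurement names and the ratio-based normalisation below are assumptions for illustration only: raw distances are divided by face width so the print does not depend on image scale, and prints are compared by Euclidean distance.

```python
import math

# Hypothetical landmark measurements; names are illustrative, not IBM's actual schema.
def facial_print(eye_distance, forehead_height, face_width):
    """Normalise raw pixel measurements into scale-independent ratios."""
    return (eye_distance / face_width, forehead_height / face_width)

def print_distance(print_a, print_b):
    """Euclidean distance between two facial prints; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(print_a, print_b)))

# Two photos of the same person at different scales yield near-identical prints...
p1 = facial_print(eye_distance=62, forehead_height=55, face_width=140)
p2 = facial_print(eye_distance=31, forehead_height=28, face_width=70)
# ...while a different person's print sits further away.
p3 = facial_print(eye_distance=70, forehead_height=40, face_width=120)

assert print_distance(p1, p2) < print_distance(p1, p3)
```

Linking faces then reduces to a nearest-neighbour search over these vectors; real systems use far richer embeddings, but the matching principle is the same.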

To maximize this diversity, the IBM team also took factors other than facial dimensions into account, such as skin colour and gender. Because gender is not binary, it decided to place individuals on a scale from feminine to masculine: everyone receives a rating between 0 and 1, through which IBM hopes to represent non-binary people as well. Finally, the set takes age into account, for which IBM relies on human judgement.
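Such annotations could be represented as a per-face record with a continuous gender score alongside the other attributes. The field names below are assumptions, not IBM's published schema; the sketch only shows how a 0-to-1 scale replaces a binary label and is validated on construction.

```python
from dataclasses import dataclass

# Illustrative annotation record; field names are assumed, not IBM's actual schema.
@dataclass
class FaceAnnotation:
    skin_tone: str        # e.g. a category on a skin-tone scale
    gender_score: float   # continuous: 0.0 (feminine) .. 1.0 (masculine)
    age_estimate: int     # based on human judgement rather than an automated guess

    def __post_init__(self):
        # Enforce the 0-1 scale instead of a binary male/female label.
        if not 0.0 <= self.gender_score <= 1.0:
            raise ValueError("gender_score must lie between 0 and 1")

ann = FaceAnnotation(skin_tone="type IV", gender_score=0.35, age_estimate=42)
```

Keeping the score continuous means downstream balance checks can look at the full distribution of scores rather than counts of two categories.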

Incidentally, IBM cannot guarantee that its dataset is fully representative. It does, however, consider it a good starting point.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.