
Several leading artificial intelligence companies pledged recently to remove nude images from the data sources they use to train their AI products, and committed to other safeguards to curb the spread of harmful sexual deepfake imagery.
In a deal brokered by the Biden administration, tech companies Adobe, Anthropic, Cohere, Microsoft and OpenAI said they would voluntarily commit to removing nude images from AI training datasets “when appropriate and depending on the purpose of the model.”
The White House announcement was part of a broader campaign against image-based sexual abuse of children as well as the creation of intimate AI deepfake images of adults without their consent.
Such images have “skyrocketed, disproportionately targeting women, children, and LGBTQI+ people, and emerging as one of the fastest growing harmful uses of AI to date,” said a statement from the White House’s Office of Science and Technology Policy.
Deepfakes, especially sexual ones, have drawn widespread criticism because they can severely violate privacy and facilitate harassment. These AI-generated images or videos, which often superimpose someone’s face onto explicit material without consent, can cause significant emotional and reputational damage. Victims lose control over their own image and identity and can face ongoing harassment once the content spreads online.
Deepfakes are created with AI models trained on vast amounts of data, much of it drawn from publicly available images on the internet. When these datasets contain explicit material, AI systems can learn patterns that allow them to replicate or create harmful content. The sheer scale of the datasets makes it difficult to control what they include, so models trained on them can be exploited to generate damaging, non-consensual sexual imagery.
By removing explicit content from these training datasets and improving how AI systems identify and prevent misuse, companies are attempting to reduce the risk that AI-generated sexual content is used for harassment or privacy violations.
Joining the tech companies for part of the pledge was Common Crawl, a repository of data constantly trawled from the open internet that’s a key source used to train AI chatbots and image-generators. It committed more broadly to responsibly sourcing its datasets and safeguarding them from image-based sexual abuse.
In a separate pledge, another group of companies — among them Bumble, Discord, Match Group, Meta, Microsoft, and TikTok — announced a set of voluntary principles to prevent image-based sexual abuse. The announcements were tied to the 30th anniversary of the Violence Against Women Act.