Thorn’s ‘Safety by Design’: A Summary
June 11, 2024 | UGC

AI has supercharged productivity and expression, but with it comes the risk that these technologies will be abused. That’s why, together with the industry, we’re welcoming the arrival of the whitepaper ‘Safety by Design for Generative AI: Preventing Child Sexual Abuse.’
Written by Thorn, a nonprofit dedicated to defending children from sexual abuse, and All Tech Is Human, an organization committed to collectively tackling tech and society’s complex problems, the paper outlines principles that guard against the creation and spread of AI-generated child sexual abuse material (AIG-CSAM) and other sexual harms against children.
“We find ourselves in a rare moment — a window of opportunity — to still go down the right path with generative AI and ensure children are protected as the technology is built,” says Thorn.
Amazon, Anthropic, Google, Meta, Microsoft, OpenAI, and more have publicly committed to these principles.
“This misuse [of Gen AI], and its associated downstream harm, is already occurring, and warrants collective action, today,” the report says, in a call to action. “The prevalence of AIG-CSAM is small, but growing. Now is the time to act, and put child safety at the center of this technology as it emerges. Now is the time for Safety by Design.”
To effectively combat the misuse of generative AI, companies must embed safety measures from the very beginning of the AI lifecycle. This involves integrating safeguards during the design, development, and deployment stages, ensuring that child safety is a core consideration at every step. Proactive measures can prevent potential harms before they arise, fostering a safer environment for all users.
“Whether you are an AI Developer, AI Provider, Data Hosting Platform, Social Platform or Search Engine, you can join the effort to minimize the possibility of generative AI being misused to further sexual harms against children,” says the report.
Here is our 11-point summary of the actions and principles laid out in the report: guidelines teams can adopt and build upon in proactive, actionable ways. And for more on combating the misuse of generative AI, see our blog on our dual approach to generative AI content moderation.
DEVELOP: Develop, build, and train generative AI models that proactively address child safety risks
1. Responsible Data Sourcing
A cornerstone of safe AI development is the responsible sourcing of training data. Developers must ensure their datasets are free from child sexual abuse material (CSAM) and child sexual exploitation material (CSEM) by implementing rigorous detection and removal processes. Reporting any confirmed CSAM to the relevant authorities is also crucial. By maintaining clean datasets, developers can mitigate the risk of generative models producing harmful content.
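The report leaves implementation open. As a minimal sketch, dataset screening can partition files by exact hash match against a vetted blocklist; the blocklist source and what happens to flagged files are assumptions here, and production pipelines pair exact hashing with perceptual matching (such as Microsoft’s PhotoDNA or Meta’s PDQ) to catch re-encoded copies that exact hashing misses.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Exact-match digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def partition_dataset(image_dir: Path,
                      blocklist: set[str]) -> tuple[list[Path], list[Path]]:
    """Split files into (kept, flagged) by exact hash match.

    `blocklist` stands in for digests obtained from a vetted
    hash-sharing program. Flagged files should be quarantined and
    reported through the operator's legal reporting channel,
    never silently dropped.
    """
    kept: list[Path] = []
    flagged: list[Path] = []
    for path in sorted(p for p in image_dir.rglob("*") if p.is_file()):
        (flagged if sha256_of(path) in blocklist else kept).append(path)
    return kept, flagged
```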
2. Structured Stress Testing
Continuous learning and iterative stress testing are essential for understanding and mitigating a model’s ability to generate abusive content. Regularly testing models throughout the development process helps identify vulnerabilities and integrate safety improvements, reducing the likelihood of misuse by bad actors.
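As an illustration, a stress-testing harness can be as simple as replaying a fixed adversarial prompt suite after every mitigation and tracking the failure rate. The `generate` and `is_violating` interfaces below are assumptions, not part of the whitepaper.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StressResult:
    prompt: str
    output: str
    violating: bool

def stress_test(prompts: list[str],
                generate: Callable[[str], str],
                is_violating: Callable[[str], bool]) -> list[StressResult]:
    """Replay an adversarial prompt suite and record failures.

    Re-running the same suite after each change (fine-tune, filter
    update) turns safety testing into a regression signal over time.
    """
    results = []
    for prompt in prompts:
        output = generate(prompt)
        results.append(StressResult(prompt, output, is_violating(output)))
    failures = sum(r.violating for r in results)
    print(f"{failures}/{len(results)} prompts produced violating output")
    return results
```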
3. Content Provenance and Watermarking
Effective content provenance solutions are critical for distinguishing AI-generated content. Techniques like watermarking can embed imperceptible signals within generated content, aiding in the identification and mitigation of AIG-CSAM. These measures help law enforcement and other entities respond quickly to harmful content.
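Production watermarks (for example, Google DeepMind’s SynthID) are designed to survive cropping and re-encoding. The toy least-significant-bit sketch below is far weaker than that, but it illustrates the basic idea of hiding a machine-readable signal in pixel data; the `MARK` tag is hypothetical.

```python
MARK = b"AIGEN"  # hypothetical provenance tag

def embed_lsb(pixels: bytes, payload: bytes) -> bytearray:
    """Write payload bits into the least significant bit of each pixel byte."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return out

def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the least significant bits."""
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (pixels[b * 8 + i] & 1)
        out.append(byte)
    return bytes(out)

# Round trip: extract_lsb(embed_lsb(raw_pixels, MARK), len(MARK)) == MARK
```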
DEPLOY: Release and distribute generative AI models after they have been trained and evaluated for child safety, providing protections throughout the process
4. Responding to Abusive Content
It is crucial to combat and respond to abusive content (CSAM, AIG-CSAM, and CSEM) across generative AI systems, and to incorporate prevention efforts. Users’ voices are key: platforms should empower users to report content that may violate child safety policies through intuitive and accessible reporting mechanisms. Real-time reporting options and user feedback pathways help build a comprehensive understanding of a model’s limitations and inform the adjustments needed to improve safety.
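The paper mandates the mechanism but not a schema, so the intake record sketched below is purely illustrative; the field names and the triage rule are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import uuid

class ReportCategory(Enum):
    CSAM = "csam"
    AIG_CSAM = "aig_csam"
    CSEM = "csem"
    OTHER = "other_child_safety"

@dataclass
class UserReport:
    content_id: str
    category: ReportCategory
    reporter_note: str = ""
    report_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    received_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def triage(report: UserReport) -> str:
    """Child-safety categories bypass the ordinary moderation queue."""
    urgent = {ReportCategory.CSAM, ReportCategory.AIG_CSAM,
              ReportCategory.CSEM}
    return ("escalate_immediately" if report.category in urgent
            else "standard_queue")
```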
5. Responsible Hosting
Safety by design must encompass not just how a model is trained but also how it is hosted: assessing models (e.g., via red teaming or phased deployment) for their potential to generate AIG-CSAM and CSEM, and implementing mitigations before hosting them.
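One way to operationalize this is a release gate keyed to red-team results. The threshold below is a placeholder: the whitepaper calls for assessment and mitigation but does not fix numeric criteria.

```python
from dataclasses import dataclass

@dataclass
class RedTeamSummary:
    prompts_run: int
    violations: int  # outputs confirmed as AIG-CSAM/CSEM by human review

def hosting_decision(summary: RedTeamSummary,
                     max_violation_rate: float = 0.0) -> str:
    """Gate hosting on red-team results; default tolerance is zero."""
    rate = summary.violations / max(summary.prompts_run, 1)
    if rate > max_violation_rate:
        return "block: apply mitigations and re-test before hosting"
    return "proceed: phased rollout with continued monitoring"
```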
6. Encouraging Developer Ownership
Developer creativity is the lifeblood of progress. Platforms must encourage developer ownership in safety by design: for example, by providing information about models, including a child safety section detailing the steps taken to avoid downstream misuse of the model to further sexual harms against children.
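The whitepaper does not define a documentation schema, but a child safety section of a model card might, hypothetically, record entries like these:

```python
# Illustrative schema only; field names and values are assumptions.
MODEL_CARD_CHILD_SAFETY = {
    "training_data_screening": "hash-matched against vetted CSAM hash lists",
    "red_teaming": "adversarial prompt suite replayed before each release",
    "output_provenance": "generations carry an imperceptible watermark",
    "known_limitations": ["not yet evaluated on multilingual prompts"],
    "misuse_reporting_contact": "safety@example.org",
}
```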
MAINTAIN: Maintain model and platform safety by continuing to actively understand and respond to child safety risks
7. Enforcement Mechanisms
Robust enforcement mechanisms are necessary to address violations of child safety policies. Companies should establish clear procedures for dealing with violations, including preserving information to meet legal requirements and taking appropriate action against offenders. These mechanisms help prevent repeat violations and ensure accountability.
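A minimal sketch of the ordering this implies: preserve the record first, then act, so legally required information is not lost when content comes down. Retention periods and the exact action ladder are policy choices outside the code.

```python
import json
import time
from pathlib import Path

def enforce(user_id: str, violation: dict, evidence_dir: Path) -> Path:
    """Preserve the violation record before acting on the account.

    Fixing preservation-before-removal in code means takedowns cannot
    race ahead of the evidence needed to meet legal requirements.
    """
    evidence_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "user_id": user_id,
        "violation": violation,
        "preserved_at_unix": int(time.time()),
    }
    path = evidence_dir / f"{user_id}_{record['preserved_at_unix']}.json"
    path.write_text(json.dumps(record, indent=2))
    # ...only then suspend the account and remove the content.
    return path
```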
8. Combating AIG-CSAM
Before hosting generative models, AI providers must assess their potential to generate AIG-CSAM. High-risk models should be updated with mitigations or restricted to prevent the creation of harmful content; this proactive assessment limits the availability of tools that could be misused.
9. Removing Harmful Tools and Services
Search engines and platforms must take decisive action to remove links to services that violate children’s rights, such as those being used to “nudify” images of children or create AIG-CSAM. By doing so, platforms prevent their services from amplifying access to these harmful tools.
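For a search engine, the mechanical part of this is straightforward; the domains below are hypothetical, and a real blocklist would be maintained with input from child-safety organizations.

```python
from urllib.parse import urlparse

# Hypothetical domains; real lists come from vetted child-safety sources.
BLOCKED_DOMAINS = {"nudify-service.example", "aig-csam-tool.example"}

def filter_results(urls: list[str]) -> list[str]:
    """Drop results whose host matches or falls under a blocked domain."""
    def blocked(url: str) -> bool:
        host = (urlparse(url).hostname or "").lower()
        return any(host == d or host.endswith("." + d)
                   for d in BLOCKED_DOMAINS)
    return [u for u in urls if not blocked(u)]
```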
10. Investing in Research and Future Technology Solutions
Continued research is vital for staying ahead of new harm vectors and threats. Investing in technology to protect user content from AI manipulation is essential to prevent online sexual abuse and exploitation. Ongoing innovation in this area will help maintain a robust defense against emerging risks.
11. Collaborative Efforts and Standardized Assessments
Collaboration across the industry and the development of standardized safety assessments are crucial for consistent and transparent evaluation of AI models. Building a shared dataset of known prompts that generate AIG-CSAM and partnering with organizations like the National Institute of Standards and Technology (NIST) at the US Department of Commerce can support the development of effective safety measures and foster a collaborative approach to child protection.
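One plausible design for such a shared dataset, assumed here rather than specified in the report, is to circulate digests of known-bad prompts instead of the raw text, so participants can block them without redistributing harmful content.

```python
import hashlib

def normalize(prompt: str) -> str:
    """Case-fold and collapse whitespace so trivial variants still match."""
    return " ".join(prompt.lower().split())

def prompt_digest(prompt: str) -> str:
    return hashlib.sha256(normalize(prompt).encode("utf-8")).hexdigest()

def is_known_bad(prompt: str, shared_digests: set[str]) -> bool:
    """Check an incoming prompt against the industry-shared digest set.

    Exact hashing only catches near-verbatim reuse; classifiers are
    still needed to catch paraphrases.
    """
    return prompt_digest(prompt) in shared_digests
```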