Detecting Deepfakes & AI Content: where does the responsibility lie?

November 15, 2023 | UGC

The prevalence of AI-generated content has surged in recent years. With harmful deepfakes and synthetic media becoming increasingly common on platforms across the web, the question on everyone’s mind is: how can I recognize manipulated media and protect myself online?

While many people believe the responsibility for generative AI content moderation lies with the platforms – to automatically recognize and remove AI-generated content as it crops up – distinguishing between human-generated and AI-generated media remains difficult for people and machines alike.

In part 2 of their conversation, Sam Gregory, Executive Director at human rights organization WITNESS, speaks to WebPurify’s VP of Trust & Safety, Alexandra Popken, about where the responsibility lies when it comes to detecting and removing AI-generated content.

In a recent US consumer survey conducted by WebPurify, 70% of respondents believe it is a platform’s responsibility to detect and remove harmful AI-generated content such as deepfakes, while 75% of respondents believe more should be done to protect users from potential risks of AI-generated content. What are your thoughts on this? Do you think there’s a shared responsibility here between AI developers, platforms and users?

Sam Gregory: At WITNESS, we think there’s a pipeline of responsibility, meaning there’s a responsibility that goes even further back than we’ve usually described it. For example, we often talk about platform responsibility as a concept to place an onus on platforms to act in line with core human rights, risk prevention values, or to inform users. But in the case of AI-generated content, we might go further back to the foundation model, particularly if we’re trying to understand that something was made with AI.

For the 75% of respondents who think there’s more to be done, I think there’s a pipeline of responsibility that includes platforms. I’m cautious about saying platforms have all the responsibility. It’s clear at the moment that neither provenance signals for AI-generated content nor detection are reliable at internet scale, so we have to be very careful about placing all the onus there. I don’t think platforms will do it completely effectively.

When we look globally at these challenges, there is a concern that placing all the content moderation emphasis on platforms is a dangerous step, given these systems are unreliable and given the history of lack of equitable application of resources and tools at a global level. I’ve engaged with content moderation from communities that have been frustrated with the way that their own content hasn’t been understood – either contextualized content that should have stayed up, or content that should have been taken down but stayed up.

Instead, it’s a necessary step for platforms to be better at helping people understand when something incorporates AI-generation by providing the signals. They should be providing information to users about what they identify about the use of AI in content as a signal to help users understand and then apply media literacy. A big problem at the moment is we’re asking people to apply media literacy to AI content with no signals from the models, tools, platform or creators, which makes it really hard.

That’s separate from a platform’s responsibility to detect content that shouldn’t be up on a platform, like CSAM, whether they’re AI-generated or not.

Two-thirds of respondents in our survey would feel more comfortable using platforms that have measures to control or limit AI-generated content, or that require it to be clearly labeled. Last month, the White House announced that prominent AI companies have committed to developing measures to ensure that users know what is AI-generated. How effective do you think measures, like watermarking, will be?

Sam Gregory: At WITNESS, we’ve been a strong voice around authenticity and provenance for a decade and have engaged in initiatives around the infrastructure of this for five years. Many of the measures being discussed in the White House describe something around labeling, watermarking or disclosure. However, the first challenge is that we’re not really clear on what that means. It’s really important to recognize that very binary visible labeling, where we say it’s AI-generated or place a little watermark, is not going to be effective in the long run because of the ways we know people easily remove those watermarks as they circulate.

Also, AI-based production is not going to be binary. It will be a part of media production, rather than simply being AI or not. A lot of the public perception around watermarking and labeling is that it’s a very visible and straightforward yes or no situation. We have to be careful of the messaging we give to the public around what this process needs to be.

We’ve supported initiatives like C2PA and the Content Authenticity Initiative because they understand media production as a process. This includes AI-based media, camera media and digital media of other formats. It also includes editing, which might take place to protect identity for example, like to blur a face. It’s not about an immutable idea that something is AI or not AI at a certain point. It’s about everything as a process in an increasingly complex media and personal communication world.

I see people’s support for labeling and I think people want these signals and I appreciate that. I think the devil is in the details of how to do that. For example, in the watermarking commitments coming out of the White House, one thing we’ve said strongly and very publicly, including in briefings to the National AI Advisory Council and other settings, is that for AI you need to focus on how it was made, not the who and the why. So, you need to know if this was made by a particular AI model, or that AI was used in a process, but not who used the tool.

We have to be very careful from a privacy and freedom of expression point of view about making a correlation to identity, particularly when placing these tools in a global context of surveillance and authoritarianism. Although there may be a context where people would like to know the identity of a media maker, that can’t be the general rule. Also, it’s not necessary if we’re trying to understand media as a process.

At the same time, I deeply believe we need to get better at providing this labeling and disclosure to people experiencing media, and some of that labeling might be visible, like with a watermark. However, a significant amount of it is likely to be what we might describe as invisible disclosure, which is machine data that you could easily access in a platform, search engine or app.

Half of our survey respondents are confident in the ability of platforms to detect and handle AI-generated content, such as deepfakes, with a quarter who are very confident and 41% who are not confident at all. What are your thoughts on this?

Sam Gregory: As this is a US survey, there’s skepticism about platform power in the United States. There’s also perhaps an overconfidence of people in the platform’s ability – detection is not yet adequate. There’s not a shared standard on provenance and authenticity the platforms can use. Even now, you can see the platforms are still struggling to write their policies.

Right now, TikTok has been furthest ahead, but a number of platforms have not yet revised their policies, so platforms are not confident from a technical perspective, particularly when you generalize this across a whole range of modes of synthesis and media.

The ray of light in this is the hope that there’s a chance for a reset on some of the content moderation approaches that people are taking now, and we should take this opportunity. We’re going to have more AI-generated media and must consider how to do this in a way that works well globally. That means getting the right combination of using AI to detect AI, which we’re going to have to do, and improving our use of provenance tools, but also really bringing in contextualized human experience to understand the way in which a more complex media ecosystem is going to use AI and create more volume and more content, including creative, satirical, political and harmful content. We need to work out how we navigate that together.

What are your thoughts on how we mitigate some of these harms? Is it using AI to police AI? Tell me more about those potential future solutions for detecting AI-generated content.

Sam Gregory: In the real world, there’s going to be a combination of solutions. Right now, a lot of where I look is: what can platforms as well as other companies in that pipeline do? Also, for someone who’s experiencing this content in the world, what are the options available to them, either as a professional or someone in an intermediary position of influence, like a journalist, or as an ordinary consumer?

As it stands, the technical solutions have stark limitations to them. On the authenticity and provenance, there’s not a shared standard. If we develop a shared standard, it has to be one that protects privacy, that is accessible, that works across a range of platforms and search engines and that works in a way that can easily be done globally. And we are not there yet.

On the detection side, most tools tend to work well for specific types of media, for example, detecting a particular way something was synthesized with a particular underlying model. However, these don’t generalize well to new ways to generate media, and they often don’t detect existing manual manipulations of media – so they might detect machine learning or AI, but not that someone did a different type of crop or edit. There are obviously tools to look for that, but an AI tool won’t do that.

It’s interesting, we’ve been running a Deepfakes Rapid Response Taskforce that brings together around 30 of the leading media forensics and detection experts. So, we have a very firsthand experience of the challenges of detection when they play out around real-world cases.

It still requires a lot of expertise to make judgments between a suite of tools. Like, if I suspect this is the case, then how do I apply it? How do I develop a personalized model for cases when I’m really trying to confirm that the audio of someone is actually that person? There’s still a lot of expert knowledge required to do this work.

It’s about saying, we’re going to have to triangulate between multiple tools and have to know what the tool is. We may need to develop a specific way of doing it or feed particular training data. So, there’s a real challenge. And, of course, most of those are not internet-scalable.

When I look at the platform side, it’s hard to imagine where the detection solution is really going to work, given what we know. I don’t think it’s possible for a platform to promise detection across a range of audio-visual material, at the level of avoidance of false positives and false negatives. So, even on the platform side, it’s a signal among others.

So, we need to have better detection capacity globally. We need to have better authenticity and provenance standards, which are going to provide signals to users and to platforms. But they’re not going to be a confirmation. They’re going to need to sit within a human infrastructure of trust and review in order to be effective.

This needs to be one that applies globally. We often talk about what we describe as a detection equity gap globally. This means, even when we come to the existing tools, are they being deployed and made available to journalists and civil society in a global way? And are they available in the same way for different language groups, say at the platform level? There’s a detection equity gap in both the content moderation side and the civil society and journalism side.
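Editor’s illustration: the point above about false positives and false negatives at platform scale can be made concrete with some simple arithmetic. The sketch below is not from the interview, and every number in it (daily volume, share of synthetic content, detector error rates) is a hypothetical assumption chosen only to show the effect.

```python
def expected_flags(daily_items, prevalence, true_positive_rate, false_positive_rate):
    """Estimate one day of detector output.

    All inputs are hypothetical assumptions, not measurements:
      daily_items          - items uploaded per day
      prevalence           - fraction of items that are actually AI-generated
      true_positive_rate   - fraction of AI-generated items the detector flags
      false_positive_rate  - fraction of authentic items the detector flags
    Returns (true_flags, false_flags, precision).
    """
    synthetic = daily_items * prevalence
    authentic = daily_items - synthetic
    true_flags = synthetic * true_positive_rate
    false_flags = authentic * false_positive_rate
    total_flags = true_flags + false_flags
    # Precision: share of flagged items that really are AI-generated.
    precision = true_flags / total_flags if total_flags else 0.0
    return true_flags, false_flags, precision


# Assumed scenario: 10 million uploads a day, 0.1% of them synthetic,
# and a detector that catches 95% of synthetic items while wrongly
# flagging 1% of authentic ones.
true_flags, false_flags, precision = expected_flags(
    daily_items=10_000_000,
    prevalence=0.001,
    true_positive_rate=0.95,
    false_positive_rate=0.01,
)
print(f"correct flags: {true_flags:,.0f}")
print(f"false flags:   {false_flags:,.0f}")
print(f"precision:     {precision:.1%}")
```

Under these assumed inputs, fewer than one in ten flags would point to content that is actually synthetic, which is one way to see why a detector score works better as a signal for review than as a confirmation.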

Speaking of human infrastructure, some say these increasingly sophisticated generative AI or large language models are going to negate the need for human moderators. Ironically, we’re seeing an influx of work to moderate AI-generated content with our humans at WebPurify. What are your thoughts on this? How will this disrupt the way in which content moderation is done today?

Sam Gregory: I’m not as deep inside the content moderation industry, so take this with a grain of salt. For many years I’ve been thinking about this, but I’m not an insider. I’m always skeptical of silver bullet solutions, such as AI to detect AI, as we know it doesn’t work with generative content. We’ve not reached a point where we can do that with any of the forms of generative content, be it from image to audio, to video to text. The current broadly available commercial tools are easily fooled by cropping, by making images low-res, by a bunch of the things that just naturally happen in a social media or content ecosystem, or could be done as counter forensics or anti-forensics by a malicious actor.

Knowing the state of play at the moment, I think you can use AI as a triage system that gives signals. There’s a very key role for informed content moderation alongside that, which I don’t see going away and I don’t think it would be ethical to take it away. Now, of course, how do you do that in content moderation?

I know you talked with Alexa Koenig, a peer in my field who I’ve worked with many times, about how we structure content moderation in a way that’s protective of the labor rights and integrity of those individuals. So, I don’t see a way away from human moderation, knowing where we’re going on the detection side of generative AI. We’re not at a point where there’s a replacement there. And, I’m not sure that’s optimal, given what we know about the biases in specific generative AI systems. Instead, we need to double down on responsible, well-resourced contextualized content moderation.
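Editor’s illustration: the idea of using AI as a triage system that gives signals to human moderators, rather than making final decisions, might look something like the sketch below. It is a minimal example under stated assumptions. The detector names, thresholds and queue names are all hypothetical, and it does not describe how WITNESS, WebPurify or any platform actually routes content.

```python
from dataclasses import dataclass, field


@dataclass
class Signals:
    """Signals attached to one piece of content (all fields hypothetical)."""
    # Scores from several detectors, each between 0.0 and 1.0.
    detector_scores: dict = field(default_factory=dict)
    # True if provenance metadata attached to the file discloses AI use.
    provenance_declares_ai: bool = False


def triage(signals, high=0.8, low=0.2):
    """Route content to a queue. Nothing is removed automatically:
    every outcome either adds a label or sends the item to people."""
    if signals.provenance_declares_ai:
        # Disclosure came from the creation pipeline, so surface it to users.
        return "show_ai_label"
    scores = list(signals.detector_scores.values())
    if not scores:
        # No detector output at all: fall back to the normal process.
        return "standard_queue"
    if max(scores) >= high:
        # At least one tool is strongly suspicious: reviewers look first.
        return "priority_human_review"
    if max(scores) <= low:
        # Every tool sees little evidence of synthesis.
        return "standard_queue"
    # Ambiguous middle ground: a person should weigh the context.
    return "human_review"


example = Signals(detector_scores={"image_model_a": 0.91, "image_model_b": 0.35})
print(triage(example))
```

The design choice here is that disagreement between tools never cancels out: a single strong score is enough to bring in a human reviewer, which reflects the need to triangulate between multiple tools rather than trust any one of them.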

Learn more about the work WITNESS is doing within the human rights and technology sectors.