
Content moderation glossary: key trust and safety terms explained

June 3, 2024 | Careers

Content moderation might seem like a simple idea in principle. In practice, it’s anything but. In fact, it’s a multifaceted process often involving a combination of human review, automated systems and community self-regulation. And as user-generated content (UGC) continues to proliferate across platforms, things are only getting more complex.

Consequently, content moderation, and the broader Trust and Safety industry around it, uses terms that people working outside this specialized sphere may not fully understand. To help you navigate the jargon of this rapidly growing space, our content moderation glossary explains the key terms and concepts involved in keeping online communities safe.

By understanding the key trust and safety terms and concepts outlined below, you’ll be better equipped to make informed decisions, implement effective strategies, and stay up-to-date with the latest developments in this increasingly important field.

General content moderation terms

Brand Safety: Brand safety refers to the strategies and measures taken by companies to protect their brand’s reputation when advertising online. It involves ensuring that ads do not appear alongside inappropriate, offensive, or harmful content that could damage consumer perception of the brand. This includes avoiding content that is NSFW, NSFA (see these definitions below), or otherwise controversial. Effective brand safety practices involve the use of content moderation, keyword blacklisting, and placement monitoring to maintain a positive and secure environment for brand advertisements. See more in our ebook The Unseen Side of Advertising.

Chat moderation: The moderation of real-time text-based communication channels, such as chat rooms or messaging platforms. For more information, see our blog on chat moderation and how we block abusive speech in real-time.

Community guidelines: A set of rules and policies established by online platforms or communities to govern user behavior, content creation and interactions. These guidelines help maintain a safe and respectful environment for all users.

Cyberbullying: The use of digital technologies to intentionally harass, threaten, humiliate or otherwise target a person or group. Also known as online bullying, online harassment or cyberharassment, cyberbullying typically involves sending, posting, or sharing negative, harmful or false content about someone with the intent to cause harm or distress. For more information, see our guide on cyberbullying statistics and strategies for parents.

Deepfakes: Artificially generated or manipulated audio, video, or images that appear real, but are actually fake. Deepfakes are typically used to create explicit or non-consensual content (such as nude pictures of celebrities), to spread misinformation (such as fake photos showing things that never happened), or to impersonate individuals (such as politicians or religious leaders). For more information, read our blog on Detecting Deepfakes and who is responsible.

Disinformation: False information that is deliberately spread to deceive. This distinguishes it from misinformation, which may be spread for innocent reasons. For more information, read our ebook on Misinformation and Disinformation.

Doxxing: The act of publicly revealing private or identifying information about someone online with malicious intent.

Explicit content: Any form of content that contains graphic depictions of nudity, sexual acts or extreme violence. Explicit content is typically restricted or prohibited on most online platforms to protect users, especially minors. Perhaps the most egregious type of explicit content is CSAM, or “child sexual abuse material”, something WebPurify expertly detects, removes and reports. Sadly, a large volume of CSAM remains online, so much so that we have a specialist team with dedicated spaces in-office and robust training tailored to the unique challenges of tackling this type of content. To learn more, read our ebook.

Flagging: The process of identifying and reporting potentially harmful or inappropriate content to moderators or platform administrators for review and potential removal or moderation.

Hate speech: Any form of communication that attacks, threatens, or insults individuals or groups based on protected characteristics such as race, ethnicity, religion, gender, sexual orientation or disability. Hate speech is typically prohibited on most online platforms. However, drawing the line between strong but legitimate opinions and hate speech is notoriously tricky. This means clear guidelines must be drawn up by the platform. Cultural nuances should also be taken into account: for example, describing a person as ‘colored’ is considered highly offensive in the United States, but not in South Africa where it has a more specific historical meaning.

Image moderation: The process of reviewing and moderating visual content such as images and videos to ensure compliance with community guidelines and prevent the spread of harmful or inappropriate content. To be effective, especially on sites with large numbers of images, this typically needs to be carried out by a mixture of automated and human-driven moderation. For more information, visit our Image Moderation service page.

Misinformation: False or misleading information that is spread without the intent to deceive, typically by people who believe it to be true. It’s often shared to influence people’s opinions, but the motivation may simply be to drive clicks and make money. The difference between misinformation and disinformation is intent – the people spreading disinformation know it is false and are trying to deceive. Content moderators play a crucial role in identifying and combating the spread of misinformation on online platforms.

NSFA (Not Safe for Ads): NSFA refers to content that is deemed unsuitable for monetization and ad placement. This includes material that advertisers typically avoid due to its potential to harm their brand image or violate advertising policies. NSFA content can include, but is not limited to, explicit language, graphic violence, adult themes, and controversial subjects. Platforms and advertisers use this designation to ensure that ads do not appear alongside content that could negatively impact their reputation or alienate their audience.

NSFW: Short for ‘Not Safe For Work’, this term describes content that would be considered inappropriate or offensive in a professional or public setting, such as explicit or mature content. This does not necessarily imply disapproval of the content itself; indeed, the term is most often used by the person posting it, as a warning to others.

Profanity filter: As the name suggests, this is a tool or system designed to detect and filter out profane, obscene or offensive language from user-generated content, to create a more family-friendly environment. For more information, see our Profanity Filter service page.

QC: In the context of content moderation, QC (short for Quality Control) describes the process of reviewing and evaluating the accuracy, consistency and effectiveness of content moderation decisions.
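One common way to put a number on QC is to have a second reviewer audit a sample of decisions and measure how often the two agree. The sketch below is illustrative only: the function name and the decision labels ("remove", "allow") are assumptions, not part of any particular product.

```python
def agreement_rate(moderator_decisions, audit_decisions):
    """Return the fraction of sampled items where the auditor agreed with the moderator.

    Both arguments are equal-length lists of decision labels for the same
    items, in the same order. The labels themselves are hypothetical.
    """
    if len(moderator_decisions) != len(audit_decisions):
        raise ValueError("both lists must cover the same items")
    if not moderator_decisions:
        raise ValueError("at least one audited decision is required")
    # Count the positions where both reviewers reached the same decision.
    matches = sum(m == a for m, a in zip(moderator_decisions, audit_decisions))
    return matches / len(moderator_decisions)


# Example: the auditor disagreed on one of four sampled decisions.
sample_rate = agreement_rate(
    ["remove", "allow", "allow", "remove"],
    ["remove", "allow", "remove", "remove"],
)
```

A falling agreement rate is usually a prompt to revisit guidelines or training rather than a verdict on any single moderator.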

Revenge porn: The non-consensual distribution of explicit or intimate images or videos, often with the intent to harm or humiliate the person shown. Revenge porn is illegal in many jurisdictions and is prohibited on most online platforms.

Sextortion: Sextortion is a form of online blackmail where an individual is tricked or coerced into sharing explicit images or videos of themselves. The perpetrator then uses these materials to extort the victim, demanding money, additional explicit content, or other favors under the threat of releasing the compromising images or videos publicly. This crime can occur through various online platforms, including social media, dating apps, and email. Sextortion not only invades personal privacy but also causes significant emotional and psychological distress to the victims.

Text moderation: The process of reviewing and moderating text content, such as comments, posts, or messages. Learn more about text moderation and how it differs from a profanity filter.

UGC: Short for User-Generated Content, UGC is any form of content that is created and shared by users on online platforms or social media sites, including text, images, videos or audio.


Technology terms

AI: A field of computer science focused on developing systems capable of performing tasks that typically require human intelligence – such as visual perception, speech recognition, decision-making, and language translation. AI is increasingly used in content moderation for tasks such as identifying hate speech patterns and filtering spam.

API: Short for Application Programming Interface, an API is a set of protocols, routines and tools that specifies how software components should interact with each other. For example, if you wished to load TikTok videos with a particular hashtag onto your website automatically, you’d use the TikTok Display API to do so. APIs are also essential for integrating content moderation solutions with various platforms and services. Through a straightforward API integration, our clients can quickly connect their platforms to WebPurify’s moderation tools, which are continually updated to handle new challenges and content types. For more information, see our blog on Moderation as a Service.

Generative AI: Generative AI, aka ‘Gen AI’, is a subset of artificial intelligence that focuses on creating new content, such as text, images, audio or video, based on learning patterns from existing data. Popular generative AI platforms include DALL-E 2, Midjourney and Stable Diffusion. Generative AI models can also be used for content moderation tasks, such as detecting synthetic media or answering user appeal queries. Find out more about our generative AI content moderation strategies.

Keyword List / Keyword Blocking: Keyword list or keyword blocking is a content moderation technique used by platforms to filter out or restrict access to content based on specific words or phrases. This process involves creating a list of terms that are deemed inappropriate, harmful, or otherwise unsuitable according to the platform’s policies. When content containing any of these flagged keywords is detected, it can be automatically blocked, flagged for review, or restricted from being displayed. This method helps maintain a safer and more controlled online environment by preventing the spread of offensive or harmful material.
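To make the idea concrete, here is a minimal sketch of keyword blocking using Python's standard re module. The blocklist terms and function names are placeholders invented for illustration, and real systems typically also handle misspellings, spacing tricks and multiple languages, which this sketch does not.

```python
import re

# Hypothetical blocklist; a real platform would maintain its own terms.
BLOCKED_TERMS = ["spamword", "badterm", "buy followers"]


def build_pattern(terms):
    """Compile one case-insensitive regex matching any term as a whole word or phrase."""
    # Longest terms first so longer phrases win over shorter overlapping ones.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(BLOCKED_TERMS)


def find_blocked_terms(text):
    """Return the blocked terms found in text, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in PATTERN.finditer(text)]


def moderate(text):
    """Return 'block' if any listed term appears, otherwise 'allow'."""
    return "block" if find_blocked_terms(text) else "allow"
```

The word-boundary markers (\b) mean a term only matches as a whole word, which avoids blocking innocent words that happen to contain a listed term, at the cost of missing deliberate variations.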

Machine learning: A field of artificial intelligence (AI) that focuses on developing computer algorithms that can improve automatically through experience and by the use of data.

Synthetic & manipulated media: This term describes any form of media, such as images, videos, or audio, which has been artificially created or manipulated using advanced technologies like generative AI models or deep learning techniques.

Trust and Safety terms

Account moderation: The practice of monitoring and moderating user accounts on online platforms, to ensure compliance with community guidelines and prevent the spread of harmful or inappropriate content and conduct.

Age-gating: The process of restricting access to certain types of content based on the user’s age or age rating, typically implemented to protect minors from exposure to mature or explicit content.
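At its simplest, an age gate compares a user's age with the minimum age set for a piece of content. The sketch below assumes the platform already holds a date of birth it trusts, which is the hard part in practice; the function names are illustrative.

```python
from datetime import date


def age_on(birth_date, today):
    """Return a person's age in whole years on the given day."""
    years = today.year - birth_date.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def can_view(birth_date, min_age, today=None):
    """Return True if the user meets the minimum age for the content."""
    today = today or date.today()
    return age_on(birth_date, today) >= min_age
```

For example, can_view(date(2006, 6, 4), 18, today=date(2024, 6, 3)) is False, because that user turns 18 the following day.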

Content filtering: The use of automated systems or human review processes to identify and remove or restrict access to specific types of content based on predefined criteria or community guidelines.

Content takedown: The process of removing content from a platform, often due to copyright infringement or violation of community guidelines.

Content scoring: The process of assigning a numerical score or rating to user-generated content based on its potential for harm, offensiveness or violation of community guidelines. This score can be used to prioritize content for moderation or to automatically take action on high-risk content.
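As a rough illustration of how a score can drive action, the sketch below maps a score between 0 and 1 to one of three outcomes. The thresholds, category names and the choice to take the highest category score are all assumptions made for the example; each platform would tune these to its own policies.

```python
def overall_score(category_scores):
    """Combine per-category scores by taking the highest one (0.0 if there are none)."""
    return max(category_scores.values(), default=0.0)


def route_by_score(score, review_threshold=0.5, remove_threshold=0.9):
    """Map a 0-1 risk score to an action using two illustrative thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= remove_threshold:
        return "remove"  # high risk: act automatically
    if score >= review_threshold:
        return "review"  # uncertain: send to a human moderator
    return "allow"       # low risk: publish


# Example with made-up category scores from a hypothetical classifier.
action = route_by_score(overall_score({"hate": 0.1, "nudity": 0.7}))
```

Here the middle band is what feeds human review, so moving the two thresholds trades moderator workload against the risk of automated mistakes.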

Downranking: Lowering the visibility of content in search results or feeds, often used for content that might be misleading but doesn’t necessarily warrant removal.
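One simple way to picture downranking is as a penalty applied to an item's relevance before the feed is sorted. The sketch below is a toy model: the field names ("relevance", "downranked") and the 0.5 penalty are hypothetical, and real ranking systems weigh many more signals.

```python
def rank_feed(items, penalty=0.5):
    """Sort feed items by relevance, multiplying downranked items by a penalty factor.

    Each item is a dict with 'id', 'relevance' (a number) and 'downranked' (a bool).
    """
    def effective_relevance(item):
        factor = penalty if item["downranked"] else 1.0
        return item["relevance"] * factor

    # Highest effective relevance first; the content itself is never removed.
    return sorted(items, key=effective_relevance, reverse=True)


feed = rank_feed([
    {"id": "a", "relevance": 0.9, "downranked": True},
    {"id": "b", "relevance": 0.6, "downranked": False},
    {"id": "c", "relevance": 0.3, "downranked": False},
])
ranked_ids = [item["id"] for item in feed]
```

Item "a" would normally appear first, but the penalty drops it below "b" while keeping it available to anyone who scrolls further.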

Fraud detection: The practice of identifying and preventing online scams, such as phishing, to protect users and maintain the integrity of online platforms.

Moderation queue: A system or interface used by content moderators to review and process user-generated content that has been flagged or identified as potentially problematic.
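Many moderation queues are prioritized so that the riskiest items are reviewed first. The sketch below shows one possible design using Python's standard heapq module; the class and method names are illustrative, and the risk score is assumed to come from an upstream scoring step.

```python
import heapq
import itertools


class ModerationQueue:
    """A priority queue that hands moderators the highest-risk item first."""

    def __init__(self):
        self._heap = []
        # Increasing counter so items with equal risk come out in arrival order.
        self._counter = itertools.count()

    def add(self, content_id, risk_score):
        """Add a flagged item; a higher risk_score means it is reviewed sooner."""
        # heapq is a min-heap, so the score is negated to pop the highest risk first.
        heapq.heappush(self._heap, (-risk_score, next(self._counter), content_id))

    def next_item(self):
        """Return the next content id to review, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, content_id = heapq.heappop(self._heap)
        return content_id

    def __len__(self):
        return len(self._heap)
```

With this design, two items flagged at the same risk level are reviewed in the order they arrived, and a late-arriving high-risk item still jumps ahead of older low-risk ones.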

Proactive moderation: The practice of actively monitoring and moderating content before it is reported or flagged by users, using automated systems or human review processes (or a mixture of both) to identify potential issues.

Reactive moderation: The process of responding to user reports or flags by reviewing and taking appropriate action on potentially harmful or inappropriate content.

Trust and Safety team: A dedicated group of professionals responsible for developing and implementing policies, processes, and technologies to ensure the safety and well-being of users on online platforms. WebPurify’s Trust and Safety Consultancy can help you establish your own in-house team and develop solutions to your most pressing moderation challenges.

User privacy: The protection of personal information and data of users on online platforms, including the implementation of measures to prevent unauthorized access, misuse, or disclosure of sensitive information.

User reporting: The process of allowing users to report instances of harassment, hate speech or other forms of abuse to platform moderators for review and potential action.

Verification systems: Processes and technologies used to verify the age, authenticity and legitimacy of user accounts, content, or information shared on online platforms, helping to combat fraud, impersonation, and the spread of misinformation.