
Measuring the effectiveness of content moderation efforts

July 7, 2023 | UGC

Effective content moderation, a topic of great discussion among trust and safety professionals, is crucial for maintaining a positive and safe environment on platforms ranging from social media to e-commerce websites to community forums. Defining ‘effective’ and measuring the success of content moderation efforts can be complex, as the goals can significantly differ based on business models, community needs, and risk tolerance.

Alex Popken, WebPurify’s VP of Trust & Safety, champions an approach to content moderation that is sensitive to the unique needs and risks of each platform. She recommends defining objective-based metrics that align with an organization’s trust and safety objectives, and then establishing corresponding goals or targets. A data-driven approach is fundamental to this process, allowing companies to understand their baseline condition and track progress over time.

Alex underscores the importance of monitoring not just the volume of violations – the total quantity – but also their prevalence and reach. Prevalence indicates the proportion of violative content within the platform’s overall ecosystem, while reach (measured in views or impressions) provides a proxy for how many users may have been exposed to it. User reports also serve as an important safety net when violative content evades machine and human detection, and they are worth tracking in their own right: they indicate how many users had to report a violation before it was acted upon. Finally, ad-supported platforms should measure the adjacency of ads to unsafe content, an important metric for brand safety.
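
To make these definitions concrete, here is a minimal sketch of how such metrics might be computed from a moderation log. The record structure and field names (views, user_reported, and so on) are illustrative assumptions, not a real schema.

```python
from dataclasses import dataclass

@dataclass
class ContentItem:
    content_id: str
    is_violative: bool   # final moderation verdict
    views: int           # impressions accrued before removal (or to date)
    user_reported: bool  # surfaced via a user report rather than proactive detection

def volume(items: list[ContentItem]) -> int:
    """Total quantity of violations."""
    return sum(item.is_violative for item in items)

def prevalence(items: list[ContentItem]) -> float:
    """Share of violative content within the overall ecosystem."""
    return volume(items) / len(items) if items else 0.0

def reach(items: list[ContentItem]) -> int:
    """Views/impressions of violative content: a proxy for user exposure."""
    return sum(item.views for item in items if item.is_violative)

def report_dependency_rate(items: list[ContentItem]) -> float:
    """Share of violations that were only caught because a user reported them."""
    violations = [item for item in items if item.is_violative]
    if not violations:
        return 0.0
    return sum(item.user_reported for item in violations) / len(violations)
```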

Other metrics that help a trust and safety team understand the efficacy of their community guidelines and enforcement measures include the appeal, overturn, and escalation rates of moderation decisions, as well as customer satisfaction with support.
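
As a rough illustration of how those rates could be operationalized, the sketch below assumes a hypothetical decision log with appealed, overturned, and escalated flags; the overturn rate is expressed here as a share of appealed decisions.

```python
from dataclasses import dataclass

@dataclass
class ModerationDecision:
    decision_id: str
    appealed: bool
    overturned: bool  # reversed on appeal
    escalated: bool   # routed to a subject matter expert or senior reviewer

def enforcement_rates(decisions: list[ModerationDecision]) -> dict[str, float]:
    total = len(decisions)
    appealed = [d for d in decisions if d.appealed]
    return {
        "appeal_rate": len(appealed) / total if total else 0.0,
        # Overturn rate as a share of appealed decisions.
        "overturn_rate": (sum(d.overturned for d in appealed) / len(appealed)) if appealed else 0.0,
        "escalation_rate": sum(d.escalated for d in decisions) / total if total else 0.0,
    }
```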

Alex also cautions that different platforms grapple with different types of policy violations based on their unique use cases. For instance, a stock photo platform might deal predominantly with intellectual property violations, whereas offensive language might be more prevalent on a gaming platform. Moreover, emerging technologies and digital spaces, such as the metaverse, present their own distinct challenges, necessitating partnerships with content moderation providers having extensive industry and multimedia experience.

In this journey towards effective content moderation, the struggle for many teams lies in measuring policy effectiveness, which is often steeped in ambiguity. It’s not only about how diligently a policy is enforced but also how well it resonates with the platform’s needs and its user community. As such, Alex encourages a nuanced approach tailored to the unique needs of each platform, whether that involves creating a strategy from scratch or recalibrating an existing one.


Evaluating Platform Health

As part of measuring the overall health of a company’s user-generated content ecosystem, Alex recommends analyzing violations across different variables to identify gaps and improve content moderation. “It’s important to look at the volume of violations broken out by many variables such as policy, media type and region, as this will give you insight into gaps or risk vectors in your ecosystem,” she advises. She adds that the prevalence of violations is an essential indicator of the scale of a problem.
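
A simple way to produce that breakdown, assuming a hypothetical violation log with policy, media_type, and region columns, is a group-by over the dimensions of interest:

```python
import pandas as pd

# Hypothetical violation log; column names and values are illustrative, not a real schema.
violations = pd.DataFrame(
    [
        {"policy": "hate_speech", "media_type": "image", "region": "NA"},
        {"policy": "spam", "media_type": "text", "region": "EU"},
        {"policy": "hate_speech", "media_type": "video", "region": "NA"},
        {"policy": "nudity", "media_type": "image", "region": "APAC"},
    ]
)

# Volume of violations broken out by policy, media type, and region,
# to surface gaps or risk vectors in the ecosystem.
breakdown = (
    violations.groupby(["policy", "media_type", "region"])
    .size()
    .rename("violation_count")
    .sort_values(ascending=False)
    .reset_index()
)
print(breakdown)
```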

“In addition to the absolute number of violations, the prevalence of violations is a table-stakes metric to track. If 80% of your content is violative, that’s significantly more concerning than 1% and likely warrants an overhaul of your content moderation set-up,” she explains.

A company’s content moderation success is typically determined by an objective-based approach. “Success may look like the effective removal of harmful content, consistent moderation decisions, swift resolution on issues, transparent policies and enforcement practices,” she adds.

As for adjusting moderation efforts in response to changes in user behavior, platform usage, or societal norms, Alex emphasizes a balance of qualitative and quantitative approaches.

Accuracy of Content Moderation

Accuracy in content moderation is paramount, and it’s recommended to look at both machine and human decisions when determining accuracy. WebPurify regularly audits a representative sample of its moderation team’s decisions to ensure quality and consistency in decision-making.

“Errors are typically addressed one-on-one or via team feedback, and if we’re observing a trend of errors on the same community guideline, for example, we might evaluate our training and workflow to ensure there aren’t gaps,” Alex says. “Really, we’re looking at instances in which our moderators incorrectly flagged compliant content (false positives), or instances in which they did not flag violative content (false negatives).”

Improvement in the accuracy of human moderation teams can be achieved through robust training and workflows, regular auditing, ongoing testing, and feedback loops.
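
One possible shape for such an audit, sketched here under the assumption of a decision log carrying a moderator label, a QA reviewer label, and the guideline involved, is a random sample whose disagreements are tallied by guideline:

```python
import random
from collections import Counter

def sample_for_audit(decisions: list[dict], sample_size: int = 200, seed: int = 42) -> list[dict]:
    """Draw a simple random sample of moderator decisions for QA review."""
    rng = random.Random(seed)
    return rng.sample(decisions, min(sample_size, len(decisions)))

def audit_summary(audited: list[dict]) -> dict:
    """
    Compare the moderator's label with the QA reviewer's label.
    Each record is assumed to have 'moderator_label', 'qa_label', and 'guideline'.
    """
    false_positives = sum(
        1 for d in audited if d["moderator_label"] == "violation" and d["qa_label"] == "compliant"
    )
    false_negatives = sum(
        1 for d in audited if d["moderator_label"] == "compliant" and d["qa_label"] == "violation"
    )
    errors_by_guideline = Counter(
        d["guideline"] for d in audited if d["moderator_label"] != d["qa_label"]
    )
    agreement = 1 - (false_positives + false_negatives) / len(audited) if audited else 1.0
    return {
        "agreement_rate": agreement,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # A cluster of errors on one guideline can point to a training or workflow gap.
        "errors_by_guideline": errors_by_guideline,
    }
```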

For machine moderation, the standard accuracy metrics are precision and recall. Precision is the share of flagged content that is truly violative (so low precision means many false positives), while recall is the share of violative content that gets flagged (so low recall means many false negatives).

Measuring the precision and recall of automated content moderation tools involves comparing the tool’s decisions to a ground truth: a reference set of content that has been manually reviewed and labeled by human moderators.

Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)

Alex elaborates: “A high precision indicates a low rate of false positives, meaning that your AI is accurately identifying problematic content. A high recall indicates a low rate of false negatives, meaning that your AI is effectively capturing a significant portion of the problematic content. I highly recommend creating an automated dashboard that allows you to evaluate these metrics easily on an ongoing basis.”
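
A minimal sketch of that comparison against a human-labeled reference set might look like the following; the data here is toy data, and a production dashboard would compute these figures continuously over sampled traffic.

```python
def precision_recall(predictions: list[bool], ground_truth: list[bool]) -> tuple[float, float]:
    """
    predictions: True where the automated tool flagged content as violative.
    ground_truth: True where human review confirmed the content is violative.
    """
    tp = sum(p and t for p, t in zip(predictions, ground_truth))
    fp = sum(p and not t for p, t in zip(predictions, ground_truth))
    fn = sum(t and not p for p, t in zip(predictions, ground_truth))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: model flags compared against human labels.
model_flags  = [True, True, False, True, False, True]
human_labels = [True, False, False, True, True, True]
p, r = precision_recall(model_flags, human_labels)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.75, recall=0.75
```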

If discrepancies arise between automated and human moderation accuracy, Alex suggests performing a root cause analysis to understand what caused the error and taking the necessary steps to prevent it from happening again.

What’s more, WebPurify works with its clients to tighten up and more clearly define community guidelines so they can be more effectively enforced and measured. “Oftentimes we’ll see clients hand us really vague policies, like, ‘no hate speech.’ We work with them to create an actionable workflow that defines hate speech, provides examples, and makes it really clear-cut for our moderators.”


Speed of Content Moderation

The issue of productivity within content moderation is a complex one. There are many intricacies involved in defining productivity parameters and attaining a balance between speed and accuracy.

“Defining productivity involves more than a simplistic understanding of performance metrics,” Alex says. She explains that productivity is measured against service level agreements (SLAs) or review times, both of which are determined by a client’s specific business needs.

“We work closely with our clients to align upon these expectations. Whether we are reviewing content before it goes live on a platform requiring a tight turnaround time, or handling sensitive issues such as CSAM (child sexual abuse material) needing an even faster response, our productivity scales to meet these demands,” she says. To maintain this, WebPurify relies on a system of “tooling and reporting” which helps monitor productivity, while also identifying any shortfalls that need to be addressed via increased staffing or expectation setting.
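
As an illustration only, the sketch below measures SLA compliance per review queue; the queue names and targets are hypothetical placeholders, since real targets are agreed with each client.

```python
from datetime import timedelta

# Hypothetical SLA targets per review queue; actual targets are set with each client.
SLA_TARGETS = {
    "pre_publication": timedelta(minutes=5),
    "csam_escalation": timedelta(minutes=1),
    "standard_review": timedelta(hours=1),
}

def sla_compliance(reviews: list[dict]) -> dict[str, float]:
    """
    Each review record is assumed to have 'queue', 'submitted_at', and 'reviewed_at'
    (datetime values). Returns the share of reviews in each queue completed within
    its SLA target.
    """
    within, totals = {}, {}
    for r in reviews:
        queue = r["queue"]
        turnaround = r["reviewed_at"] - r["submitted_at"]
        totals[queue] = totals.get(queue, 0) + 1
        if turnaround <= SLA_TARGETS[queue]:
            within[queue] = within.get(queue, 0) + 1
    return {q: within.get(q, 0) / totals[q] for q in totals}
```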

Productivity, however, is not an end in itself. A delicate equilibrium exists between the need for speed and the prerequisite for accurate moderation. “We firmly believe in not sacrificing quality for productivity,” Alex asserts. “Erroneous moderation decisions can inflict harm in multiple ways, from serving inappropriate content to users, to damaging a brand’s reputation.”

Alex emphasizes that productivity and accuracy are not opposing factors, but complementary metrics. “Well-trained teams, equipped with efficient workflows and tools, are not only more productive but also more confident in their decision-making. Moreover, the capacity to escalate unclear cases to a subject matter expert is a critical component of our process, ensuring that moderation does not become a bottleneck.”

When it comes to enhancing productivity within content moderation teams, tools are imperative. Alex underscores the importance of minimizing the number of actions a moderator needs to perform during a single review. “Avoid burdening moderators with multiple tools or excessive clicks for a single action. A carefully thought-out user flow can enhance efficiency and cut down on unnecessary manual effort,” she says. WebPurify also deploys tools such as video storyboarding, which expedites the content review process while giving moderators a comprehensive view.

Thus, the key to better productivity lies in the optimal blending of tools, training, and thoughtful workflow design.

Customer Satisfaction

Understanding customer satisfaction in relation to content moderation is also a nuanced process. “Surveys and feedback forms are essential when it comes to assessing customer satisfaction,” Alex notes. These tools are indispensable for gaining insight into how users perceive and respond to moderation decisions.

Interestingly, Alex acknowledges that trust and safety teams often exhibit a degree of reluctance to measure a Customer Satisfaction Score (CSAT). “This hesitancy stems from the fact that the enforcement of community guidelines doesn’t always bear positive news,” she explains. “However, CSAT serves as a critical signal, offering a glimpse into users’ perception of your policies, their clarity, and ease of adherence.”

In Alex’s view, CSAT is crucial for ensuring users comprehend and perceive the “rules of the road” as being fair and transparent. “CSAT is one signal, but an instructive one,” she says.

Consistently poor CSAT scores related to a specific policy should trigger a reevaluation of that policy. “The objective should be to ascertain whether the policy is appropriately framed and enforced. At times, guidelines may be overly restrictive, and the only way platforms become aware of this is through user feedback.”
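
One lightweight way to surface that signal, assuming survey responses tagged with the policy that was enforced and an illustrative review threshold, is to aggregate CSAT by policy:

```python
from statistics import mean

# Hypothetical survey responses: CSAT score (1-5) tied to the policy that was enforced.
responses = [
    {"policy": "hate_speech", "csat": 4},
    {"policy": "hate_speech", "csat": 5},
    {"policy": "profanity_filter", "csat": 2},
    {"policy": "profanity_filter", "csat": 1},
    {"policy": "profanity_filter", "csat": 2},
]

REVIEW_THRESHOLD = 3.0  # illustrative cut-off for flagging a policy for reevaluation

by_policy: dict[str, list[int]] = {}
for r in responses:
    by_policy.setdefault(r["policy"], []).append(r["csat"])

for policy, scores in by_policy.items():
    avg = mean(scores)
    flag = " <- consistently low, revisit framing/enforcement" if avg < REVIEW_THRESHOLD else ""
    print(f"{policy}: avg CSAT {avg:.1f} over {len(scores)} responses{flag}")
```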

To address issues of low satisfaction scores, Alex advises going straight to the source – the users. “Focus groups with users can prove to be a highly effective way of getting feedback on a multitude of aspects, from product usability to your policies,” she says. In this manner, customer satisfaction serves as an invaluable compass, guiding platforms towards more user-centric and effective policies.

Brand Safety for Advertisers

While user safety is vital for any UGC platform, those hosting ads have additional safety considerations. “In our current economic climate, where marketers are increasingly selective about where to invest, only platforms that offer brand safety will be successful,” Alex says, highlighting three pivotal aspects for ensuring brand safety:

  1. Policies and Enforcement: Platforms need clear standards dictating where ads should not appear, and these should be “layered on top of platform policies.” These platforms also need robust content moderation capabilities, involving both AI and human efforts, to enforce these standards at scale.
  2. Advertiser Controls: Recognizing that brand safety can be subjective due to differing risk tolerances across brands, Alex recommends offering advertisers product controls to establish their suitability preferences.
  3. Partnerships: There’s real value in joining alliances, such as the Global Alliance for Responsible Media, that bring platforms together to align on common frameworks and best practices for brand safety.

To measure brand safety, your advertisers typically want to know when their ads have appeared adjacent to brand-unsafe content. To this end, brands often prefer an independent body to audit your platform rather than platforms grading their own efforts. Firms like Integral Ad Science and DoubleVerify are among the many providers offering this kind of verification.

In dealing with situations where ads are placed next to brand-unsafe content, it again comes back to the importance of working with an experienced content moderation provider. WebPurify works closely with clients to establish suitability preferences regarding content categories they do not want ads to serve adjacent to, and a system for enforcing them at scale.
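
As a toy illustration of the adjacency metric advertisers ask about, the sketch below joins hypothetical ad-impression logs to content safety labels; in practice, an independent verification partner would typically produce this measurement.

```python
# Hypothetical logs: ad impressions joined to safety labels of the adjacent content slot.
ad_impressions = [
    {"ad_id": "a1", "adjacent_content_id": "c10"},
    {"ad_id": "a2", "adjacent_content_id": "c11"},
    {"ad_id": "a3", "adjacent_content_id": "c12"},
]
content_labels = {"c10": "safe", "c11": "unsafe", "c12": "safe"}

# Share of ad impressions that served next to brand-unsafe content.
unsafe_adjacencies = [
    imp for imp in ad_impressions
    if content_labels.get(imp["adjacent_content_id"]) == "unsafe"
]
adjacency_rate = len(unsafe_adjacencies) / len(ad_impressions)
print(f"Brand-unsafe adjacency rate: {adjacency_rate:.1%}")  # 33.3% in this toy example
```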

WebPurify also offers a deep understanding of the intricate layers that make up successful content moderation and understands the importance of customizing these strategies to fit the unique needs and risks each platform faces. Whether you’re starting from scratch or looking to refine your existing strategy, our experienced team can help guide your approach and set you on the path to achieving more effective content moderation.