Twitter and Instagram Update Policies to Combat Bullying

August 28, 2019 | By WebPurify | Image Moderation, UGC

In previous posts, we’ve discussed the content moderation policies of large social media platforms and the factors that are catalyzing their changes. Recently, Twitter and Instagram announced updates to how they monitor content in an effort to combat bullying on their respective sites. Both companies are working toward the same goal, but each is taking a different approach to the problem.

Instagram’s Approach

In October of 2018, Adam Mosseri took over as the head of Instagram, just in time for an increase in outside pressure on social media companies. It should come as little surprise that the company is working to show that it is determined to rid Instagram of bullying. One ambition that Mosseri brought to the position is not only to improve Instagram’s AI at spotting bullying in text (a complex concept for non-humans), but also to have it spot bullying in photos and videos.

DeepText, the same AI tool used by Facebook, served as Instagram’s base for content moderation with the original purpose of spotting spam. Only a year later, DeepText was trained to spot racial slurs and other kinds of offensive comments. Now, with the daunting task of identifying problematic content in photos and videos, the AI needs further training.

“My take is that people and technology can and should work in tandem,” says Mosseri. Put into practice for content moderation, this starts with human moderators looking at hundreds of thousands of pieces of user-generated content and making a call on which ones are examples of bullying. They then begin the process of teaching the AI to recognize bullying based on the patterns in the human-curated content. AI “classifiers,” which are already flagging content, attempt to flag bullying, and then the engineers at Instagram judge whether or not a classifier got it right. There are other signals that engineers look for that may suggest harassment, such as whether a user has blocked someone in the past or whether the username of a suspect profile is close to a previously booted username.
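The two extra signals mentioned above can be sketched in a few lines of Python. This is only an illustration, not Instagram’s implementation: the function names, the signal labels, and the 0.85 similarity threshold are all assumptions, and the standard-library `difflib.SequenceMatcher` stands in for whatever username matching the company actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff; a real system would tune this on labeled data.
SIMILARITY_THRESHOLD = 0.85


def username_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two usernames, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def harassment_signals(author, blocked_by_target, banned_usernames):
    """Collect the side signals that may suggest harassment.

    author: username of the account that posted the content
    blocked_by_target: set of usernames the targeted user has blocked before
    banned_usernames: usernames of previously booted accounts
    """
    signals = []
    if author in blocked_by_target:
        signals.append("previously_blocked")
    if any(
        username_similarity(author, banned) >= SIMILARITY_THRESHOLD
        for banned in banned_usernames
    ):
        signals.append("similar_to_banned_username")
    return signals
```

In a setup like this, the signals would be fed to the classifier or shown to a reviewer alongside the content rather than triggering removal on their own.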

Through trial and error, the AI is expected to improve with the ultimate goal of the technology being to place content deemed harmful into one of seven subcategories – insults, shaming, disrespect, threats, betrayals, unwanted contact, and identity attacks.

Policy Updates

In the meantime, the company introduced two new features. The first, the “comment warning,” pops up when content is flagged for bullying in an effort to have the user reconsider before posting potentially offensive material.

The second is called Restrict. This feature is essentially a more under-the-radar way of blocking bullies. It allows users to preview comments from restricted users and decide whether or not to approve, delete, or leave them pending. Only the restricted user will be able to see their comment if it’s deleted, so they’ll be none the wiser.
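One way to picture Restrict is as a visibility rule applied to each comment. The sketch below is a guess at that logic based only on the description above. The `Status` values, the `Comment` class, and the `can_see` function are hypothetical names, and the rule that other users see a restricted comment only after approval is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DELETED = "deleted"


@dataclass
class Comment:
    author: str  # the restricted user who wrote the comment
    text: str
    status: Status = Status.PENDING


def can_see(comment: Comment, viewer: str, account_owner: str) -> bool:
    """Decide whether `viewer` sees a comment left by a restricted user."""
    if viewer == comment.author:
        # The restricted user always sees their own comment, even if deleted.
        return True
    if viewer == account_owner:
        # The owner can preview pending comments until they delete them.
        return comment.status is not Status.DELETED
    # Assumption: everyone else sees the comment only once it is approved.
    return comment.status is Status.APPROVED
```

Because the author’s view never changes, nothing on their side reveals that the comment was held back or deleted.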

Twitter’s Approach

In contrast with Instagram, Twitter is narrowing its focus in its effort to identify harassment online. The initial scope of Twitter’s policy, as it was announced last year, included all language that degraded users based on race, sexual orientation, or political opinion. However, Twitter’s announcement on July 9th highlighted bullying with regard to religion in particular. The company has decided to direct its focus in this way as a result of research showing an increased risk of physical harm for social media users targeted for their religion.

Like Instagram, Twitter leverages AI along with user reports to find speech that qualifies as hateful. Certain terms suggest to the AI that the derogatory content is religion-based, for example, terminology related to vermin, plagues, or uncleanliness. Once a piece of content is flagged by the AI, a human moderator decides if it’s inappropriate and needs to be taken down. Users are asked to take down the posts deemed inappropriate and will have their accounts locked if they refuse.
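The flag-then-review flow described above can be sketched as a two-stage filter. This is a simplified illustration, not Twitter’s system: a plain keyword match stands in for the AI, the `FLAG_TERMS` list is drawn loosely from the examples in this post, and `ai_flag`, `moderate`, and the `human_review` callback are hypothetical names.

```python
# Illustrative terms only; a real classifier would be a trained model.
FLAG_TERMS = {"vermin", "plague", "unclean"}


def ai_flag(text: str) -> bool:
    """Stage 1: cheap automated check that marks a post for review."""
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    return bool(words & FLAG_TERMS)


def moderate(posts, human_review):
    """Stage 2: a human makes the final call on every flagged post.

    human_review: callable that takes a post and returns True if the
    post is inappropriate and should come down.
    """
    to_remove = []
    for post in posts:
        if ai_flag(post) and human_review(post):
            to_remove.append(post)
    return to_remove
```

Posts that the first stage never flags do not reach a moderator in this sketch, which is why user reports matter as a second route into the review queue.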

Context is Everything

At WebPurify we have always believed that AI and humans need to work together to most effectively moderate user-generated content. Both Twitter and Instagram are embracing this notion, but only Twitter’s moderators make the final judgment call on all content that their AI flags. Instagram, on the other hand, appears to be aiming to create more sophisticated AI that lightens the load of humans.

The tricky part of all of this is context. How can you teach a machine to understand satire? This is especially important on social media, where satire and irony run rampant. How can a machine grasp the context of each religion and all its practices? Even when no strong language is used, certain religious references can do harm depending on the intended target’s religion (e.g., pork for both Muslims and Jews). This in particular will be a challenge for Twitter’s team. What’s more, many subcultures on social media bring their own context and language, both of which change at an alarming rate online.

Will These Changes Work?

Content moderation on this scale is undeniably a daunting task. First, there’s no telling how effective these methods will be. Second, changes in policy potentially threaten these companies’ revenue, since users who disagree with the amount of control placed on what they post tend to leave.

Perhaps for both reasons, neither company is willing to go into detail about how many moderators it is using or how successful these endeavors have been so far. Whether these social media giants will continue to build on these efforts or pivot entirely will likely become clear in the not-so-distant future.
