Twitter and Instagram Unveil New Ways to Combat Hate—Again

Twitter and Instagram would like us all to be a little bit nicer to each other. To that end, this week both companies announced new content moderation policies that will, maybe, shield users from the unbridled harassment and hate speech we wreak on each other. Instagram’s anti-bullying initiative will rely on artificial intelligence, while Twitter will use human moderators to determine when language “dehumanizes others on the basis of religion.” In the end, both platforms face the same problem: In the blurry world of content moderation, context is everything and our technology isn’t up to the task.

In September, Twitter initially proposed a more ambitious policy targeting dehumanizing language aimed at a variety of groups, including people of different races, sexual orientations, or political beliefs. The platform then asked users for help developing guidelines to implement that policy. After 10 months and 8,000 responses, Twitter finally put a narrower version of the policy into action on Tuesday. Users can report tweets that compare religions to plagues or viruses or that describe certain groups as insects or rodents. Twitter’s AI will also search out those derogatory terms, but the suspect tweets will always be reviewed by a human, who will make the final call. If the reviewer decides the tweet is inappropriate, Twitter will alert the offending user and ask them to take down the post; if the user refuses, Twitter will lock the account.

Twitter says the more focused policy will allow it to test how to moderate potentially offensive content where language can be more ambiguous than personal threats, which are banned. However, some critics see the narrower scope as a retreat. “Dehumanization is a great start, but if dehumanization starts and stops at religious categories alone, that does not encapsulate all the ways people have been dehumanized,” Rashad Robinson, president of civil rights nonprofit Color of Change, told The New York Times.

Instagram is taking a different tack to police bullying, which spokesperson Stephanie Otway identified as the platform’s top priority. In addition to human moderators, the platform is using an AI feature that identifies bullying language like “stupid” and “ugly” before an item is posted and asks users “Are you sure you want to post this?” Otway says the feature gives users a moment to pause, take a breath, and decide if they really want to send that message.

If you feel like you’ve read these promises before, that’s because these issues aren’t new. Bullying and harassment have existed on social media for as long as humans have put fingertip to keyboard. Instagram has been fighting off negative content since the platform opened in 2010, when the founders would personally delete offensive comments. Twitter has been wrangling trolls and hate speech for years. But as platforms grow, policing content gets harder. Platforms need AI tools to sort through the incredible volume of content they publish. At the same time, those AI tools are ill-equipped to handle nuanced decisions about what counts as offensive or unacceptable. For example, YouTube has spent nearly two years trying to find an effective way to get white-supremacist content off the platform while still preserving important historical content about the Nazis and their role in World War II.

Deciding what counts as “dehumanizing” or “bullying” content is equally complicated. Jessa Lingel, a professor at the University of Pennsylvania who studies digital culture, points out that language isn’t automatically good or bad: “Context matters,” she says. Lingel points to labels like dyke, which were once considered offensive but have now been reclaimed by some communities. The same could be said for other terms like ho, fatty, and pussy. Who gets to decide when a term is appropriate and who can use it? When does a term cross over from offensive to permitted, or vice versa? Such decisions rely on a level of cultural awareness and sensitivity.

The same problem emerges for hateful terms. For some religions, specific language can take on coded meanings. References to pork, for example, could be highly offensive to Jews or Muslims even though no words in the post would violate Twitter’s rule against dehumanizing content. Groups also can evolve new language that avoids censorship. In China, internet users have developed a host of alternate spellings, special phrases, and coded terms that criticize the Chinese government while evading government censorship. “People will always adapt,” says Kat Lo, a researcher and consultant who specializes in content moderation and online harassment.

Twitter acknowledged these problems in its blog post, saying the company needs to do more to protect marginalized communities and to understand the context behind different terms. Lo says it’s good the company recognizes those shortcomings, but it should explain how it will pursue solutions.

Twitter works with a Trust and Safety Council of outside experts who advise the platform on how to curb harassment and abusive behavior, but a Twitter spokesperson declined to give specifics about how the company or the council plans to answer these complicated questions.

Of course, the new policies themselves are just words at this point. “The crucial part is the operational side,” Lo says. “Policy doesn’t matter if you can’t enforce it well.” Twitter and Instagram operate in dozens of countries and are home to myriad subcultures, from Black Twitter to Earthquake Twitter. Because those context problems are so complicated and often regional, neither platform can rely on AI systems alone to judge content. “We need humans,” says Lingel. “The tech just isn’t there yet.”

But humans are expensive. They require training, salaries, offices, and equipment. Lo describes this as an “iceberg of operational work” beneath the policies. To police content this sensitive and context-specific, across different cultures and countries, Lo says, you also need local experts and long-term partnerships with organizations that understand those groups. “I’m not confident Twitter has those resources,” she says.

In 2018 Facebook, which owns Instagram, announced it had more than doubled its safety and security team to 30,000 people, half of whom review content on both Facebook and Instagram. To keep Instagram’s AI current with trends in modern language, the company relies on feedback from those moderators to update the language the system looks out for.

In January, Facebook CEO Mark Zuckerberg told investors Facebook was investing “billions of dollars in security.” Otway says the company is heavily focused on hiring more engineers and building AI models that can more precisely target bullying behavior. For the time being, though, she says the platforms “still very much rely on content moderators.”

Twitter declined to comment on how many moderators it employs and how many more, if any, it would add. Twitter also did not comment on what kinds of investments it is planning to make in technologies that could help monitor user behavior.