Online content moderation: Can AI help clean up social media?

Monday, 20 December 2021 00:00 GMT

People use mobile phones on a boat sailing on Yangtze River following the coronavirus disease (COVID-19) outbreak, in Wuhan, Hubei province, China, September 3, 2020. REUTERS/Aly Song


* Any views expressed in this article are those of the author and not of Thomson Reuters Foundation.

From Facebook to Twitter, machine learning algorithms are policing online content to decide what's banned and what's not. But how do they work? And are they effective?

By Umberto Bacchi

Dec 20 (Openly) - Two days after it was sued by Rohingya refugees from Myanmar over allegations that it did not take action against hate speech, social media company Meta, formerly known as Facebook, announced a new artificial intelligence system to tackle harmful content.

Machine learning tools have increasingly become the go-to solution for tech firms to police their platforms, but questions have been raised about their accuracy and their potential threat to freedom of speech.

Here is all you need to know about AI and content moderation:


The $150 billion Rohingya class-action lawsuit filed this month came at the end of a tumultuous period for social media giants, which have been criticised for failing to effectively tackle hate speech online and increasing polarization.

The complaint argues that calls for violence shared on Facebook contributed to real-world violence against the Rohingya community, which suffered a military crackdown in 2017 that refugees said included mass killings and rape.

The lawsuit followed a series of incidents that have put social media giants under intense scrutiny over their practices, including the killing of 51 people at two mosques in Christchurch, New Zealand in 2019, which was live-streamed by the attacker on Facebook.

In the wake of the deadly Jan. 6 assault on the U.S. Capitol, Meta's CEO Mark Zuckerberg and his counterparts at Google and Twitter appeared before the U.S. Congress in March to answer questions about extremism and misinformation on their services.


Social media companies have long relied on human moderators and user reports to police their platforms. Meta, for example, has said it has 15,000 content moderators reviewing material from its global users in more than 70 languages.

But the mammoth size of the task and regulatory pressure to remove harmful content quickly have pushed firms to automate the process, said Eliska Pirkova, freedom of expression lead at digital rights group Access Now.

There are "good reasons" to use AI for content moderation, said Mitchell Gordon, a computer science PhD at Stanford University. 

"Platforms rarely have enough human moderators to review all, or even most, content. And when it comes to problematic content, it's often better for everyone's well-being if no human ever has to look at it," Gordon said in emailed comments.


Like other machine learning tools, AI moderation systems learn to recognise different types of content after being trained on large datasets that have been previously categorised by humans.

Researchers collecting these datasets typically ask several people to look at each piece of content, said Gordon. 
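
Those several judgments per item then have to be reduced to a single training label. The snippet below is a minimal sketch of one way to do that, assuming a simple majority rule. The function name, the toy posts and labels, and the choice to send ties to review are illustrative assumptions, not any platform's actual pipeline.

```python
from collections import Counter


def majority_label(votes):
    """Return the label most annotators chose; ties go to 'needs_review'."""
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    # A tie between the top two labels means there is no majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "needs_review"
    return ranked[0][0]


# Hypothetical annotations: each post was shown to several people.
annotations = {
    "post_1": ["toxic", "toxic", "ok"],
    "post_2": ["ok", "ok", "ok"],
    "post_3": ["toxic", "ok"],
}

labels = {post: majority_label(votes) for post, votes in annotations.items()}
print(labels)
```

Note that for "post_1" the single "ok" vote leaves no trace in the final label, which is how a dissenting view can drop out of the training data.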

"What they tend to do is take a majority vote and say, 'Well, if most people say this is toxic, we're gonna view it as toxic'," he said.

From Twitter to YouTube to TikTok, AI content moderation has become pervasive in the industry in recent years.

In March, Zuckerberg told Congress AI was responsible for taking down more than 90% of content deemed to be against Facebook guidelines.

And earlier this month, the company announced a new tool that requires fewer examples for each dataset, meaning it can be trained to take action on new or evolving types of harmful content in weeks instead of months. 


Tech experts say one problem with these tools is that algorithms struggle to grasp the context and subtleties needed to tell, for example, satire from hate speech.

"Computers, no matter how sophisticated the algorithm they use, are always essentially stupid," said David Berry, a professor of digital humanities at the University of Sussex in Britain.

"(An algorithm) can only really process what it's been taught and it does so in a very simplistic fashion. So the nuances of human communication ... (are) very rarely captured."

This can result in harmless content being censored and harmful posts remaining online, which has deep ramifications for freedom of expression, said Pirkova of Access Now.

Earlier this year, Instagram and Twitter faced backlash for deleting posts mentioning the possible eviction of Palestinians from East Jerusalem, something the companies blamed on technical errors by their automated moderation systems.

Language variety is another issue.

Documents leaked in October suggested that in 2020 Meta lacked screening algorithms for languages used in some of the countries the firm deemed most "at-risk" for potential real-world harm, including Myanmar and Ethiopia.

Finally, since algorithms are largely trained based on how a majority feels about a certain type of content, minorities holding a different view risk having their voices automatically erased, said Gordon.


No matter how good AI systems become, deciding what is okay to say and what isn't will always be a matter of opinion.

A 2017 study by researchers in New York and Doha on hate speech detection found human coders reached a unanimous verdict in only 1.3% of cases. 
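
A figure like this is the share of items on which every coder gave the same label. As a rough sketch of how such a rate can be computed, assuming each item carries a list of coder labels (the data below is invented for illustration and is not taken from the study):

```python
def unanimity_rate(items):
    """Fraction of items on which all coders chose the same label."""
    if not items:
        raise ValueError("at least one item is required")
    unanimous = sum(1 for votes in items if len(set(votes)) == 1)
    return unanimous / len(items)


# Hypothetical coder labels for four posts (toy data only).
coded_items = [
    ["hate", "hate", "hate"],
    ["hate", "ok", "ok"],
    ["ok", "hate", "ok"],
    ["ok", "ok", "hate"],
]

rate = unanimity_rate(coded_items)
print(f"{rate:.0%} of items had a unanimous verdict")
```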

"So long as people disagree about what crosses the line, no AI will be able to come up with a decision that all people view as legitimate and correct," said Gordon.

He and his team are working on a solution: training AI to take on different perspectives and building interfaces that allow moderators to choose which views they would like the system's decisions to reflect.

Given the power automated monitoring systems have to shape public discourse, firms should be more transparent about the tools they deploy, how they operate and how they are trained, said Pirkova at Access Now.

Legislators should also make due diligence safeguards mandatory, such as human rights impact assessments and independent audits that take into account how the algorithms affect minorities, she added.

"This is not to say that we should completely get rid of automated decision-making processes, but we need to understand them better," she said.


(Reporting by Umberto Bacchi @UmbertoBacchi, Editing by Jumana Farouky. Please credit the Thomson Reuters Foundation, the charitable arm of Thomson Reuters, that covers the lives of people around the world who struggle to live freely or fairly. Visit
