Illustration by Alex Castro / The Verge
Google's Jigsaw unit is releasing the code for an open source anti-harassment tool called Harassment Manager. The tool, intended for journalists and other public figures, employs Jigsaw's Perspective API to let users sort through potentially abusive comments on social media platforms, starting with Twitter. It's debuting as source code for developers to build on, then launching as a functional application for Thomson Reuters Foundation journalists in June.
Harassment Manager can currently work with Twitter's API to combine moderation options, like hiding tweet replies and muting or blocking accounts, with a bulk filtering and reporting system. Perspective checks messages' language for levels of "toxicity" based on elements like threats, insults, and profanity. It sorts messages into queues on a dashboard, where users can address them in batches rather than individually through Twitter's default moderation tools. They can choose to blur the text of the messages while they're doing it, so they don't need to read each one, and they can search for keywords in addition to using the automatically generated queues.
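To make the workflow concrete, here is a minimal, illustrative sketch of the queue-sorting idea described above. This is not Harassment Manager's actual code; in the real tool the per-message scores come from Jigsaw's Perspective API, while here they are mock values, and the threshold numbers are assumptions chosen for the example.

```python
# Illustrative sketch of Perspective-style queue sorting (hypothetical, not
# Harassment Manager's real implementation). Each message carries a toxicity
# score between 0 and 1; messages are bucketed into review queues so a user
# can act on them in batches rather than one at a time.

def sort_into_queues(scored_messages, high=0.8, possible=0.5):
    """Bucket (text, toxicity_score) pairs into batch-review queues.

    The 0.8 and 0.5 thresholds are arbitrary example values, not the
    tool's documented defaults.
    """
    queues = {"likely_toxic": [], "possibly_toxic": [], "unlikely": []}
    for text, score in scored_messages:
        if score >= high:
            queues["likely_toxic"].append(text)
        elif score >= possible:
            queues["possibly_toxic"].append(text)
        else:
            queues["unlikely"].append(text)
    return queues

# Mock replies with made-up toxicity scores (in practice, Perspective
# would return these for each message's text).
mock_replies = [
    ("You are a disgrace", 0.91),
    ("I disagree with this article", 0.12),
    ("What an idiotic take", 0.63),
]

queues = sort_into_queues(mock_replies)
```

A user would then review each queue as a batch, e.g. bulk-reporting everything in `likely_toxic`, mirroring the dashboard behavior described above.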
Harassment Manager also lets users download a standalone report containing abusive messages; this creates a paper trail for their employer or, in the case of illegal content like direct threats, law enforcement. For now, however, there's no standalone application that users can download. Instead, developers can freely build apps that incorporate its functionality, and services using it will be launched by partners like the Thomson Reuters Foundation.
Jigsaw announced Harassment Manager on International Women's Day, and it framed the tool as particularly relevant to female journalists who face gender-based abuse, highlighting input from "journalists and activists with large Twitter presences" as well as nonprofits like the International Women's Media Foundation and the Committee to Protect Journalists. In a Medium post, the team says it's hoping developers can tailor it for other at-risk social media users. "Our hope is that this technology provides a resource for people who are facing harassment online, especially female journalists, activists, politicians and other public figures, who deal with disproportionately high toxicity online," the post reads.
Google has harnessed Perspective for automated moderation before. In 2019, it released a browser extension called Tune that let social media users avoid seeing messages with a high chance of being toxic, and Perspective has been used by many commenting platforms (including Vox Media's Coral) to supplement human moderation. But as we noted around the release of Perspective and Tune, the language analysis model has historically been far from perfect. It sometimes misclassifies satirical content or fails to detect abusive messages, and Jigsaw-style AI can inadvertently associate terms like "blind" or "deaf" (which aren't necessarily negative) with toxicity. Jigsaw itself has also been criticized for a toxic workplace culture, although Google has disputed the claims.
Unlike AI-powered moderation on services like Twitter and Instagram, however, Harassment Manager isn't a platform-side moderation feature. It's apparently a sorting tool for helping manage the sometimes overwhelming scale of social media feedback, something that could be relevant for people far outside the realm of journalism, even if they can't use it for now.