Perspective API is a collaborative research effort exploring Machine Learning as a tool for better discussions online

Perspective API is the product of a collaborative research effort by Jigsaw and Google’s Counter Abuse Technology team exploring machine learning as a tool for better discussions online. The team routinely publishes datasets, academic research, and open source code as part of their commitment to transparency and innovation in natural language processing and machine learning.

The challenges of maintaining healthy conversations online are significant, and we know we cannot solve them alone. To enable academic and industry research in the field, we create public datasets whenever possible.

A public Kaggle competition, based on ~2 million comments from the Civil Comments platform, which shut down in 2017. This data is annotated for toxicity, toxicity sub-types, and mentions of identities, which enables evaluation of unintended bias with respect to identity mentions. See the Kaggle page as well as our academic paper Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification for a detailed description of the data source and annotation schema. This dataset is also available on TensorFlow Datasets.

A public Kaggle competition, based on a crowdsourced dataset that includes 4 toxicity sub-types and approximately 160k human-labelled comments from Wikipedia Talk pages. The annotations were collected by asking 5000 crowd-workers to rate Wikipedia comments according to their toxicity. This dataset is also available on Figshare as the Wikipedia Human Annotations of Toxicity on Talk Pages.

A public Kaggle competition that challenges participants to use the data from the previous two Kaggle competitions to build a multilingual toxicity model.

100k comments from Wikipedia, each with 10 annotations collected from the 4000 annotators who contributed to the effort. Each annotation notes whether the annotator considers the comment to be a personal attack.

Machine-labelled annotations for every English Wikipedia talk page comment from 2001 to 2015, approximately 95 million comments, to support large-scale data analysis.

A collection of 12,000 news comments that have been annotated for positive contributions to online conversations. This is a collaboration between Simon Fraser University and Jigsaw, and is soon to appear in a First Monday special issue on abusive language online.

A collection of 44,000 comments that have been annotated for a variety of subtle aspects of unhealthiness, including sarcasm, antagonism, and condescension. This dataset was a collaboration between the University of Oxford and Jigsaw and will be published at the Workshop on Online Abuse and Harms.

A context-aware dataset derived from the Unintended Bias Kaggle competition data and annotated by raters who could see the previous comment, as part of a study measuring the importance of context for moderation. This collaboration between Athens University of Economics and Business and Jigsaw appeared at ACL 2020.

Our open source repositories provide a range of examples using Perspective, from fully-fledged tools to experimental demos, as well as examples of tools we leverage to build our machine learning models.

Tools built using Perspective

A moderation tool to support using machine learning models to assist a human review process (used by the New York Times).

Code to build an authorship experience that gives feedback to people as they type. This is used in our public demo of Perspective API, but the code repository includes many additional features and ways to create other authorship experiences.

An experimental Chrome extension that lets people customize how much toxicity they want to see in comments across the internet. Tune uses Perspective to let people set the “volume” of conversations on a number of popular platforms, including YouTube, Facebook, Twitter, Reddit, and Disqus. The extension is available for download in the Chrome Web Store.

A collection of concepts and demos built using Perspective API.

Example code for calling Perspective

A simple JavaScript client library for calling the Perspective API.

A simple Express-based proxy server that holds your API key and calls the Perspective API.

A simple Express-based proxy server that can be used to provide restricted access to your Perspective API cloud project.

Example code using Perspective API with Google Apps Script.
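
For orientation, the request that these clients and proxies send can be sketched in a few lines of Python. This is a minimal sketch, not code from any of the repositories above: the helper names are hypothetical, and the endpoint URL and JSON field names reflect our understanding of the public API and should be checked against the developers site before use.

```python
import json
from urllib import request as urlrequest

# Assumed endpoint of the comments:analyze method; verify on the developers site.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_analyze_request(text, attributes=("TOXICITY",), languages=None):
    """Build the JSON body for one comments:analyze call (hypothetical helper)."""
    body = {
        "comment": {"text": text},
        # Each requested attribute maps to an (optionally empty) config object.
        "requestedAttributes": {name: {} for name in attributes},
    }
    if languages:
        body["languages"] = list(languages)
    return body


def extract_summary_scores(response):
    """Map each returned attribute name to its summary score value."""
    return {
        name: scores["summaryScore"]["value"]
        for name, scores in response.get("attributeScores", {}).items()
    }


def analyze(text, api_key, attributes=("TOXICITY",)):
    """Send one comment to the API; requires network access and a valid key."""
    payload = json.dumps(build_analyze_request(text, attributes)).encode("utf-8")
    req = urlrequest.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return extract_summary_scores(json.load(resp))
```

Keeping the request builder and the response parser separate from the network call makes both easy to test offline. A proxy server like the ones listed above would wrap the equivalent of `analyze` so that the API key never reaches the browser.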

Model building tools

Our repository for tools to measure and mitigate unintended bias in our models.

Collaborative work with Wikimedia to create a useful corpus of Talk Page conversations on Wikipedia.

Example code to train machine learning models for text.

The team behind Perspective API regularly publishes research in academic forums.

Demonstrates that rater identity plays a statistically significant role in how raters annotate toxicity for identity-related annotations, and compares models trained on annotations from several different identity-based rater pools.

Introduces a novel framework for dataset developers to facilitate transparent documentation of key decision points at various stages of the ML data pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.

Demonstrates that models distilled from large language models often have hidden performance costs, especially in terms of identity-based bias.

Introduces a research framework to highlight the documentation and reporting needs of female journalists and activists undergoing significant harassment on social media platforms, and validates those needs by designing a prototype tool called Harassment Manager.

Presents the Charformer multilingual text classification model that is used in Perspective API and the techniques used to minimize bias and maximize the benefits of cross-lingual classification. This model shows across-the-board improvements, especially for the emoji and code-switching data common in user-generated content.

Expands upon the work that resulted in the SemEval-2021 Toxic Spans evaluation, presenting a range of techniques for identifying the spans associated with comments receiving toxic ratings, as well as a method of suggesting alternative content that conveys the same ideas in a civil fashion when this is possible.

Surveys an array of literature on human computation, with a focus on ethical considerations around crowdsourcing, and lays out challenges associated with who the annotator is, how the annotators’ lived experiences can impact their annotations, and the relationship between the annotators and the crowdsourcing platforms. Puts forth a concrete set of recommendations and considerations for dataset developers at various stages of the ML data pipeline.

Constructs and releases a dataset of posts with two kinds of toxicity labels, depending on whether annotators considered the post with the previous one as additional context or without additional context, and based on this introduces context sensitivity estimation, a new task which aims to identify posts whose perceived toxicity changes if the context is also considered.

Introduces new metrics enabling the rigorous study of content moderation as a human-AI collaborative process, and demonstrates that state-of-the-art uncertainty models enable new collaborative review strategies improving the overall collaborative moderator-model system's performance.

Examines incitements and calls to harass posted by members of certain online communities as a lens through which to holistically measure and understand a broad range of harassment strategies, including developing a taxonomy to categorize the preferred approaches of coordinated attackers and providing suggestions for actions and future research that could be performed by researchers, platforms, authorities, and anti-harassment groups.

Describes the Toxic Spans Detection task of SemEval-2021, which required participants to predict the spans of toxic posts that were responsible for the toxic label of the posts. Summarizes the results of the participants and their major strategies for this competition.

Develops a new model, CAE-T5, that can help suggest rephrasings of toxic comments in a more civil manner, inspired by recent progress in unpaired sequence-to-sequence tasks.

Studies the task of labeling covert or veiled toxicity in online conversations, including introducing a dataset categorizing different types of covert toxicity, and evaluating models on the task.

Presents a new dataset of comments annotated for their impact on the overall health of a conversation, including annotating for a new typology of potentially unhealthy sub-attributes.

Finds that context can affect human judgments of toxicity, either amplifying or mitigating the perceived toxicity of posts, and that a significant subset of annotations can be flipped if annotators are not provided with context, but that context surprisingly does not appear to improve the performance of toxicity classifiers.

Introduces the Constructive Comments Corpus, a new dataset intended to help build new tools for online communities to improve the quality of their discussions, including a taxonomy of sub-characteristics of constructiveness. Together with new machine learning models for constructiveness, this paves the way for moderation tools focused on promoting comments that contribute to a discussion rather than only filtering out undesirable content.

Describes our submissions for two of the EVALITA (Evaluation of NLP and Speech Tools for Italian) 2020 shared tasks, based in part on the technology that powers Perspective, and reviews the types of errors our system made in the shared tasks.

Presents the application of two strong baseline systems for toxicity detection, and evaluates their performance in identifying and categorizing offensive language in social media.

Demonstrates how traditional techniques for debiasing word embeddings can actually increase model bias on downstream tasks and proposes novel debiasing methods to ameliorate the issue.

Proposes a framework to encourage transparent reporting of the context, use-cases, and performance characteristics of machine learning models across domains.

Introduces a suite of threshold-agnostic metrics that provide a nuanced view of unintended bias in text classification, by exploring the various ways that a classifier’s score distribution can vary across designated groups.
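
To make the idea concrete, here is a minimal sketch of three AUC-based metrics of this kind: Subgroup AUC, BPSN AUC (background positive, subgroup negative), and BNSP AUC (background negative, subgroup positive). It assumes binary labels and a single identity subgroup, covers only the AUC-based metrics, and uses hypothetical function names and toy data; it is not the team's implementation.

```python
def auc(labels, scores):
    """ROC AUC: chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bias_aucs(labels, scores, in_subgroup):
    """Compute Subgroup, BPSN, and BNSP AUC for one identity subgroup."""
    rows = list(zip(labels, scores, in_subgroup))

    def select(keep):
        # Restrict the evaluation set, then compute an ordinary AUC on it.
        chosen = [(y, s) for y, s, g in rows if keep(y, g)]
        return auc([y for y, _ in chosen], [s for _, s in chosen])

    return {
        # Only examples that mention the subgroup.
        "subgroup_auc": select(lambda y, g: g),
        # Toxic examples from the background vs. non-toxic subgroup examples.
        "bpsn_auc": select(lambda y, g: (g and y == 0) or (not g and y == 1)),
        # Non-toxic examples from the background vs. toxic subgroup examples.
        "bnsp_auc": select(lambda y, g: (g and y == 1) or (not g and y == 0)),
    }


# Toy data: the model ranks well within each group but scores every
# subgroup comment higher than every background comment.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
in_subgroup = [True, True, True, True, False, False, False, False]
scores = [0.9, 0.8, 0.95, 0.7, 0.6, 0.1, 0.5, 0.2]
metrics = bias_aucs(labels, scores, in_subgroup)
```

On this toy data the Subgroup AUC and BNSP AUC are both 1.0, while the BPSN AUC is 0.0: non-toxic comments that mention the subgroup all score above toxic comments that do not. A single overall AUC would hide this pattern, and no choice of threshold is needed to reveal it.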

Discusses open questions and research challenges toward the goal of effective crowdsourcing of online toxicity as well as presenting a survey of recent work that addresses these.

Presents a novel data visualization and moderation tool for Wikipedia that is built on top of the Perspective API.

Introduces the task of predicting whether a given conversation is on the verge of being derailed by the antisocial actions of one of its participants and demonstrates that a simple model using conversational and linguistic features can achieve performance close to that of humans for this task.

Develops methods for measuring the unintended bias in a text classifier according to terms that appear in the text, as well as approaches to help mitigate it. The limitations of these methods are expanded on in the follow-up paper Limitations of Pinned AUC for Measuring Unintended Bias.

Connects trace data and machine learning classifiers to self-reported survey information about users’ online behaviour, demonstrating the correlation between the two.

Presents an unprecedented view of the complete history of conversations between contributors of English Wikipedia by recording the intermediate states of conversations—including not only comments and replies, but also their modifications, deletions and restorations.

Outlines how crowdsourcing and machine learning can be used to scale our understanding of online personal attacks and applies these methods to the challenge on Wikipedia.

Surveys approaches that use machine learning to obfuscate network traffic to circumvent censorship.

Looking to learn more? Visit our Developers site for more technical information.
