Event Title

Automatic Detection of Toxic Online Comments

Research Mentor(s)

Yudong Liu

Description

Social media sites and online forums have struggled with harassment and hateful speech on their platforms since their inception. On most of these platforms, identifying and reporting such behavior has been the responsibility of the targets of harassment or of human moderators. The massive volume of comments posted every day creates a need for automatic detection of toxic online comments. I implemented a deep-neural-network-based system that automatically classifies different types of toxic comments, which could allow websites to filter out content they deem inappropriate or in violation of their policies. The network is first trained on thousands of comments previously labeled as toxic, severely toxic, obscene, threatening, insulting, or hateful toward a group's or individual's identity, and is then used to predict labels for new comments it has not seen before. My experiments show that the system classifies these six categories with over 98% accuracy. While most prior work on classifying hateful and offensive speech has suffered from low precision (the tendency to falsely flag a benign comment as hateful), this system achieves high precision, making it a viable solution for dependably flagging toxic comments with little human oversight required.
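The pipeline described above (train on comments carrying any of six toxicity labels, then predict labels for unseen comments) can be sketched as a multi-label text classifier. This is a minimal illustration, not the author's implementation: it uses scikit-learn's small `MLPClassifier` as a stand-in for the deep neural network, TF-IDF features in place of whatever representation the original system used, and toy comments invented for the example; only the six label names come from the abstract.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# The six toxicity categories named in the abstract; a comment may carry several.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Toy training comments standing in for the thousands of labeled examples.
comments = [
    "you are a wonderful person",
    "I will hurt you, watch out",
    "what a stupid, worthless idiot",
    "have a great day everyone",
]
# Multi-label targets: one row per comment, one column per label.
y = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

# TF-IDF features feeding a small feed-forward network. MLPClassifier
# trains as a multi-label classifier when y is a binary indicator matrix.
clf = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
clf.fit(comments, y)

# Predict a 0/1 label vector for each unseen comment.
preds = clf.predict(["you worthless idiot", "thanks for the help"])
print(dict(zip(LABELS, preds[0].tolist())))
```

In a full system, precision for each label would be measured on a held-out set, since (as the abstract notes) false positives on benign comments are the main failure mode to guard against.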

Document Type

Event

Start Date

May 2018

End Date

May 2018

Department

Computer Science

Rights

Copying of this document in whole or in part is allowable only for scholarly purposes. It is understood, however, that any copying or publication of this document for commercial purposes, or for financial gain, shall not be allowed without the author’s written permission.

Language

English

Format

application/pdf


May 17th, 12:00 PM to 3:00 PM
