Automatic Detection of Toxic Online Comments
Research Mentor(s)
Liu, YuDong
Description
Social media sites and online forums have struggled with harassment and hateful speech on their platforms since their inception. On most of these platforms, identifying and reporting such behavior has fallen to the targets of the harassment or to human moderators. The sheer volume of comments posted every day creates a need for automatic detection of toxic online comments. I implemented a deep-neural-network-based system that automatically classifies different types of toxic comments, which could allow websites to filter out content they deem inappropriate or in violation of their policies. The network is first trained on thousands of comments previously labeled as toxic, severely toxic, obscene, threatening, insulting, or hateful toward a group's or individual's identity, and is then applied to predict labels for new comments it has not seen before. My experiments show that the system correctly classifies these six categories with over 98% accuracy. Much of the prior work on classifying hateful and offensive speech has suffered from low precision, that is, a tendency to falsely flag a comment as hateful when it is in fact benign. This system achieves high precision, making it a viable solution for dependably flagging toxic comments with little human oversight required.
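The pipeline the abstract describes, training a neural network on comments labeled with six toxicity categories and then predicting labels for unseen comments, can be sketched in miniature as below. This is an illustrative toy, not the author's actual system: the six label names, the toy corpus, the bag-of-words features, and the single hidden layer are all assumptions chosen to keep the example self-contained; a real implementation would train a much deeper network on thousands of labeled comments.

```python
# Toy sketch of a multi-label toxic-comment classifier: bag-of-words
# features feed a one-hidden-layer network with one sigmoid output per
# category. Corpus, labels, and hyperparameters are illustrative only.
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Tiny hand-labeled corpus (a real system trains on thousands of comments).
comments = [
    "you are awful and stupid",
    "i will hurt you",
    "have a nice day",
    "thanks for the helpful answer",
    "you stupid idiot",
    "great post thank you",
]
# One binary vector per comment: 1 where the comment carries that label.
y = np.array([
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=float)

vocab = sorted({w for c in comments for w in c.split()})
index = {w: i for i, w in enumerate(vocab)}

def featurize(text):
    """Bag-of-words count vector over the toy vocabulary."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

X = np.array([featurize(c) for c in comments])

rng = np.random.default_rng(0)
H = 8  # hidden units
W1 = rng.normal(0.0, 0.5, (X.shape[1], H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (H, len(LABELS))); b2 = np.zeros(len(LABELS))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(2000):  # full-batch gradient descent on binary cross-entropy
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    grad_out = (p - y) / len(X)            # dL/dz for sigmoid + BCE
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)
    W2 -= lr * (h.T @ grad_out); b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * (X.T @ grad_h);   b1 -= lr * grad_h.sum(axis=0)

def predict(text, threshold=0.5):
    """Return every category whose sigmoid probability exceeds threshold."""
    h = np.tanh(featurize(text) @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    return [label for label, prob in zip(LABELS, p) if prob > threshold]
```

Because each category gets its own sigmoid output, a comment can receive several labels at once (e.g. both "toxic" and "insult"), which matches the multi-label framing in the abstract; a softmax over the six categories would instead force exactly one label per comment.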
Document Type
Event
Start Date
17-5-2018 12:00 AM
End Date
17-5-2018 12:00 AM
Department
Computer Science
Genre/Form
student projects, posters
Subjects – Topical (LCSH)
Internet--Access control; Internet--Censorship; Computer networks--Management; Computational intelligence
Type
Image
Rights
Copying of this document in whole or in part is allowable only for scholarly purposes. It is understood, however, that any copying or publication of this document for commercial purposes, or for financial gain, shall not be allowed without the author’s written permission.
Language
English
Format
application/pdf