Transfer learning for hate speech detection in social media - UTU Research Portal

A1 Refereed original research article in a scientific journal

Transfer learning for hate speech detection in social media

Authors: Yuan Lanqin, Wang Tianyu, Ferraro Gabriela, Suominen Hanna, Rizoiu Marian-Andrei

Publisher: Springer

Publication year: 2023

Journal: Journal of Computational Social Science

Journal name in source: JOURNAL OF COMPUTATIONAL SOCIAL SCIENCE

Journal acronym: J Comput Soc Sc

Volume: 6

Issue: 2

First page: 1081

Last page: 1101

ISSN: 2432-2717

eISSN: 2432-2725

DOI: https://doi.org/10.1007/s42001-023-00224-9

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Partially Open Access publication channel

Web address: https://doi.org/10.1007/s42001-023-00224-9

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/181952431

Abstract

Today, the internet is an integral part of our daily lives, enabling people to be more connected than ever before. However, this greater connectivity and access to information increase exposure to harmful content, such as cyber-bullying and cyber-hatred. Models based on machine learning and natural language processing offer a way to make online platforms safer by autonomously identifying hate speech in web text. The main difficulty is annotating a sufficiently large number of examples to train these models. This paper uses a transfer learning technique to leverage two independent datasets jointly and build a single representation of hate speech. We build an interpretable two-dimensional visualization tool of the constructed hate speech representation, dubbed the Map of Hate, in which multiple datasets can be projected and comparatively analyzed. The hateful content is annotated differently across the two datasets (racist and sexist in one dataset, hateful and offensive in another). Nevertheless, the common representation successfully projects the harmless class of both datasets into the same space and can be used to uncover labeling errors (false positives). We also show that the joint representation boosts prediction performance when only a limited amount of supervision is available. These methods and insights hold the potential to make social media safer and to reduce the need to expose human moderators and annotators to distressing online messaging.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

s42001-023-00224-9.pdf