A1 Peer-reviewed original article in a scientific journal

Developing an online hate classifier for multiple social media platforms




Authors: Joni Salminen, Maximilian Hopf, Shammur A. Chowdhury, Soon-gyo Jung, Hind Almerekhi, Bernard Jansen

Publisher: Springer

Publication year: 2020

Journal: Human-Centric Computing and Information Sciences

Journal name in the database: Human-centric Computing and Information Sciences

Article number: 1

Volume: 10

Issue: 1

ISSN: 2192-1962

eISSN: 2192-1962

DOI: https://doi.org/10.1186/s13673-019-0205-6

Self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/45063523


Abstract

The proliferation of social media enables people to express their
opinions widely online. However, at the same time, this has resulted in
the emergence of conflict and hate, making online environments
uninviting for users. Although researchers have found that hate is a
problem across multiple platforms, there is a lack of models for online
hate detection using multi-platform data. To address this research gap,
we collect a total of 197,566 comments from four platforms: YouTube,
Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as
non-hateful and the remaining 20% labeled as hateful. We then experiment
with several classification algorithms (Logistic Regression, Naïve
Bayes, Support Vector Machines, XGBoost, and Neural Networks) and
feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their
combination). While all the models significantly outperform the
keyword-based baseline classifier, XGBoost using all features performs
the best (F1 = 0.92). Feature importance analysis indicates that BERT
features are the most impactful for the predictions. Findings support
the generalizability of the best model, as the platform-specific results
from Twitter and Wikipedia are comparable to their respective source
papers. We make our code publicly available for application in real
software systems as well as for further development by online hate
researchers.
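
To make the evaluation setup in the abstract concrete, the following is a minimal sketch of a keyword-based baseline classifier scored with F1, the metric the abstract reports. It is not the authors' code. The keyword list, the toy comments, and the function names `keyword_baseline` and `f1_score` are all hypothetical. The abstract also does not say which F1 averaging was used, so the sketch assumes F1 for the hateful class only.

```python
import re

# Hypothetical keyword list; the lexicon used in the paper is not given in the abstract.
HATE_KEYWORDS = {"idiot", "stupid", "scum"}


def keyword_baseline(comment: str) -> int:
    """Return 1 (hateful) if any keyword occurs as a whole word, else 0."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return int(any(tok in HATE_KEYWORDS for tok in tokens))


def f1_score(y_true, y_pred) -> float:
    """F1 for the positive (hateful) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up comments with made-up labels (1 = hateful, 0 = non-hateful).
COMMENTS = [
    ("Thanks for sharing, great video", 0),
    ("You are an idiot", 1),
    ("That was a stupid mistake on my part", 0),  # keyword present, not hateful
    ("People like you should disappear", 1),      # hateful, no keyword present
    ("I disagree with this edit", 0),
]

y_true = [label for _, label in COMMENTS]
y_pred = [keyword_baseline(text) for text, _ in COMMENTS]
print(y_pred, f1_score(y_true, y_pred))
```

The third and fourth toy comments show the two ways keyword matching fails: it flags a harmless use of a listed word and misses hostility expressed without one. This is one plausible reason the learned models in the abstract outperform such a baseline.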


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.





Last updated on 2024-11-26 at 10:35