A1 Peer-reviewed original article in a scientific journal
Developing an online hate classifier for multiple social media platforms
Authors: Joni Salminen, Maximilian Hopf, Shammur A. Chowdhury, Soon-gyo Jung, Hind Almerekhi, Bernard Jansen
Publisher: Springer
Publication year: 2020
Journal: Human-Centric Computing and Information Sciences
Journal name in the database: Human-centric Computing and Information Sciences
Article number: 1
Volume: 10
Issue: 1
ISSN: 2192-1962
eISSN: 2192-1962
DOI: https://doi.org/10.1186/s13673-019-0205-6
Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/45063523
The proliferation of social media enables people to express their
opinions widely online. However, at the same time, this has resulted in
the emergence of conflict and hate, making online environments
uninviting for users. Although researchers have found that hate is a
problem across multiple platforms, there is a lack of models for online
hate detection using multi-platform data. To address this research gap,
we collect a total of 197,566 comments from four platforms: YouTube,
Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as
non-hateful and the remaining 20% labeled as hateful. We then experiment
with several classification algorithms (Logistic Regression, Naïve
Bayes, Support Vector Machines, XGBoost, and Neural Networks) and
feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their
combination). While all the models significantly outperform the
keyword-based baseline classifier, XGBoost using all features performs
the best (F1 = 0.92). Feature importance analysis indicates that BERT
features are the most impactful for the predictions. Findings support
the generalizability of the best model, as the platform-specific results
from Twitter and Wikipedia are comparable to their respective source
papers. We make our code publicly available for application in real
software systems as well as for further development by online hate
researchers.
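
As a rough illustration of the two ideas the abstract relies on, a keyword-based baseline and the F1 score, the following is a minimal sketch and not the authors' implementation. The keyword list, the toy comments, and the function names `keyword_baseline` and `f1_score` are all assumptions made for this example. The abstract does not say which F1 averaging the paper reports, so the sketch computes F1 for the hateful class only.

```python
import re

# Hypothetical keyword list; the lexicon used by the paper's baseline is not given in this record.
HATE_KEYWORDS = {"idiot", "stupid", "scum"}


def keyword_baseline(comment: str) -> int:
    """Return 1 (hateful) if any keyword appears as a token, else 0 (non-hateful)."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return int(any(tok in HATE_KEYWORDS for tok in tokens))


def f1_score(y_true, y_pred) -> float:
    """F1 for the positive (hateful) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data invented for illustration only (1 = hateful, 0 = non-hateful).
comments = [
    "You are a stupid idiot",
    "Great video, thanks",
    "People like you are scum",
    "Get lost, nobody wants you here",       # hateful, but contains no keyword
    "That was a stupid mistake on my part",  # non-hateful, but contains a keyword
]
labels = [1, 0, 1, 1, 0]

predictions = [keyword_baseline(c) for c in comments]
baseline_f1 = f1_score(labels, predictions)
print(predictions, round(baseline_f1, 3))
```

The last two toy comments show the typical failure modes of keyword matching: hostile text that contains no listed word is missed, and a harmless use of a listed word is flagged. Learned classifiers over richer features are meant to reduce both kinds of error.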
Downloadable publication: This is an electronic reprint of the original article.