A4 Peer-reviewed article in conference proceedings

Inter-rater agreement for social computing studies




Authors: Joni O. Salminen, Hind A. Al-Merekhi, Partha Dey, Bernard J. Jansen

Established conference name: International Conference on Social Networks Analysis, Management and Security

Publication year: 2018

Proceedings title: 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS)

ISBN: 978-1-5386-9588-3

DOI: https://doi.org/10.1109/SNAMS.2018.8554744


Abstract

Different agreement scores are widely used in social computing studies
to evaluate the reliability of crowdsourced ratings. In this research,
we argue that the concept of agreement is problematic for many rating
tasks in computational social science because they are characterized by
subjectivity. We demonstrate this claim by analyzing four social
computing datasets rated by crowd workers, showing that agreement
scores are low despite the use of proper instructions and
platform settings. Findings indicate that the more subjective the rating
task, the lower the agreement, suggesting that tasks differ by their
inherent subjectivity and that measuring the agreement of social
computing tasks might not be the optimal way to ensure data quality.
For subjective tasks, agreement metrics can give a false picture of
the consistency of crowd workers, as these metrics over-simplify the
reality of obtaining quality labels. We also provide
empirical evidence on the stability of crowd ratings under varying
numbers of raters, items, and categories, finding that the reliability
scores are most sensitive to the number of categories, somewhat less
sensitive to the number of raters, and the least sensitive to the number
of items. Our findings have implications for computational social
scientists using crowdsourcing for data collection.
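
The abstract refers to agreement scores for crowdsourced ratings and to how reliability changes with the number of raters, items, and categories. As a minimal illustrative sketch only (not the paper's code, data, or chosen metric), the following Python snippet computes one common agreement measure, Fleiss' kappa, on simulated ratings via statsmodels and recomputes it while varying the number of raters and categories; the simulation function and its noise parameter are hypothetical.

# Minimal sketch only: simulated data, not the paper's datasets or method.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)

def simulated_kappa(n_items, n_raters, n_categories, noise=0.3):
    """Give each item a 'true' label, let raters deviate from it with
    probability `noise`, and return Fleiss' kappa for the resulting ratings."""
    true_labels = rng.integers(0, n_categories, size=n_items)
    ratings = np.repeat(true_labels[:, None], n_raters, axis=1)
    flip = rng.random((n_items, n_raters)) < noise
    ratings[flip] = rng.integers(0, n_categories, size=int(flip.sum()))
    counts, _ = aggregate_raters(ratings, n_cat=n_categories)  # items x categories counts
    return fleiss_kappa(counts)

# Vary one factor at a time, loosely mirroring a stability analysis.
for n_raters in (3, 5, 10):
    print(f"raters={n_raters:2d}  kappa={simulated_kappa(200, n_raters, 4):.3f}")
for n_categories in (2, 4, 8):
    print(f"categories={n_categories}  kappa={simulated_kappa(200, 5, n_categories):.3f}")

The direction and size of the changes produced by this sketch depend entirely on the assumed noise model; the paper's conclusions rest on its four real crowd-rated datasets.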


