A4 Peer-reviewed article in conference proceedings
Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4
Authors: Tarkka, Otto; Koljonen, Jaakko; Korhonen, Markus; Laine, Juuso; Martiskainen, Kristian; Elo, Kimmo; Laippala, Veronika
Editors: Fišer, Darja; Eskevich, Maria; Bordon, David
Established conference name: ParlaCLARIN Workshop
Publication year: 2024
Journal: LREC Proceedings
Title of the host publication: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) : ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora
First page: 70
Last page: 76
eISBN: 978-2-493814-24-1
eISSN: 2522-2686
Web address: https://aclanthology.org/2024.parlaclarin-1.11.pdf
Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/457172276
Annotating datasets can often be prohibitively expensive and laborious. Emotion annotation specifically has been shown to be a difficult task in which even trained annotators rarely reach high agreement. With the introduction of ChatGPT, GPT-4 and other Large Language Models (LLMs), however, a new line of research has emerged that explores the possibilities of automated data annotation. In this paper, we apply GPT-4 to the task of annotating a dataset, which is subsequently used to train a BERT model for emotion analysis of Finnish parliamentary speeches. In our experiment, GPT-4 performs on par with trained annotators, and the annotations it produces can be used to train a classifier that reaches a micro F1 score of 0.690. We compare this model to two other models that are trained on machine-translated datasets and find that the model trained on GPT-4-annotated data outperforms them. Our paper offers new insight into the possibilities that LLMs have to offer for the analysis of parliamentary corpora.
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
This research was funded by the Research Council of Finland [grant number 353569].