A4 Refereed article in a conference publication

Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4

Authors: Tarkka, Otto; Koljonen, Jaakko; Korhonen, Markus; Laine, Juuso; Martiskainen, Kristian; Elo, Kimmo; Laippala, Veronika

Editors: Fišer, Darja; Eskevich, Maria; Bordon, David

Conference name: ParlaCLARIN Workshop

Publication year: 2024

Journal: LREC Proceedings

Book title: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora

First page: 70

Last page: 76

eISBN: 978-2-493814-24-1

eISSN: 2522-2686

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://aclanthology.org/2024.parlaclarin-1.11.pdf

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/457172276

Self-archived copy's licence: CC BY NC

Self-archived copy's version: Publisher's PDF

Abstract

Annotating datasets can often be prohibitively expensive and laborious. Emotion annotation specifically has been shown to be a difficult task in which even trained annotators rarely reach high agreement. With the introduction of ChatGPT, GPT-4 and other Large Language Models (LLMs), however, a new line of research has emerged that explores the possibilities of automated data annotation. In this paper, we apply GPT-4 to the task of annotating a dataset, which is subsequently used to train a BERT model for emotion analysis of Finnish parliamentary speeches. In our experiment, GPT-4 performs on par with trained annotators, and the annotations it produces can be used to train a classifier that reaches a micro F1 of 0.690. We compare this model to two other models that are trained on machine-translated datasets and find that the model trained on GPT-4-annotated data outperforms them. Our paper offers new insight into the possibilities that LLMs have to offer for the analysis of parliamentary corpora.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

2024.parlaclarin-1.11_CC-BY-NC.pdf

Funding information in the publication:
This research was funded by the Research Council of Finland [grant number 353569].