A4 Refereed article in a conference publication

Finnish SQuAD: A Simple Approach to Machine Translation of Span Annotations




Authors: Nuutinen, Emil; Rastas, Iiro; Ginter, Filip

Editors: Johansson, Richard; Stymne, Sara

Conference name: Nordic Conference on Computational Linguistics and Baltic Conference on Human Language Technologies

Publisher: University of Tartu Library

Publication year: 2025

Journal: NEALT proceedings series

Book title: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

Volume: 57

First page: 424

Last page: 432

ISBN: 978-9908-53-109-0

ISSN: 1736-8197

eISSN: 1736-6305

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://aclanthology.org/2025.nodalida-1.46/

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/506499977


Abstract

We apply a simple method to machine-translate datasets with span-level annotations, using the DeepL MT service and its ability to translate formatted documents. Using this method, we produce a Finnish version of the SQuAD2.0 question answering dataset and train QA retriever models on this new dataset. We evaluate the quality of the dataset, and more generally of the MT method, through direct evaluation, indirect comparison to other similar datasets, a backtranslation experiment, and the performance of downstream trained QA models. In all these evaluations, we find that the method of transfer is not only simple to use but also produces consistently better translated data. Given its good performance on the SQuAD dataset, it is likely that the method can be used to translate other similar span-annotated datasets for other tasks and languages as well. All code and data are available under an open license: the data at HuggingFace (TurkuNLP/squad_v2_fi), the code on GitHub (TurkuNLP/squad2-fi), and the model at HuggingFace (TurkuNLP/bert-base-finnish-cased-squad2).
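
The general idea of carrying a span annotation through translation as inline formatting can be illustrated with a minimal Python sketch. It is not the authors' implementation: the marker tags `<a>`/`</a>`, the function names, and the `translate` callable are all hypothetical stand-ins, and the sketch assumes only that the translation step preserves the markers around the translated answer text. The actual pipeline is in the GitHub repository named above.

```python
import re
from typing import Callable, Tuple

# Hypothetical marker tags (an assumption for illustration; the formatting
# used in the paper's pipeline is not described in this record).
OPEN, CLOSE = "<a>", "</a>"


def mark_span(context: str, start: int, answer: str) -> str:
    """Wrap the answer span that begins at character offset `start` in markers."""
    end = start + len(answer)
    if context[start:end] != answer:
        raise ValueError("answer text does not match the context at the given offset")
    return context[:start] + OPEN + answer + CLOSE + context[end:]


def recover_span(marked: str) -> Tuple[str, int, str]:
    """Strip the markers and return (clean_context, answer_start, answer_text)."""
    pattern = re.escape(OPEN) + r"(.*?)" + re.escape(CLOSE)
    match = re.search(pattern, marked, flags=re.DOTALL)
    if match is None:
        raise ValueError("span markers were not found in the translated text")
    answer = match.group(1)
    # Removing the markers leaves the answer starting where the opening tag was.
    clean = marked[:match.start()] + answer + marked[match.end():]
    return clean, match.start(), answer


def translate_with_span(
    context: str,
    start: int,
    answer: str,
    translate: Callable[[str], str],
) -> Tuple[str, int, str]:
    """Mark the span, translate the marked text, and recover the new offset.

    `translate` is a placeholder for any MT call that keeps the markers intact.
    """
    return recover_span(translate(mark_span(context, start, answer)))
```

With a toy `translate` function that rewrites the text around the markers, `translate_with_span` returns the translated context together with the answer's new character offset, which is the form SQuAD-style datasets store.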


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
The research was supported by funding from the Research Council of Finland.


Last updated on 08/01/2026 08:05:33 AM