Refereed article in conference proceedings (A4)

Finnish Paraphrase Corpus

List of Authors: Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto

Conference name: Nordic Conference on Computational Linguistics

Publication year: 2021

Journal: Linköping Electronic Conference Proceedings

Book title: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)

Title of series: Linköping Electronic Conference Proceedings

Number in series: 178

ISBN: 978-91-7929-614-8

ISSN: 1650-3686


Self-archived copy’s web address:


In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high-quality paraphrase selection in terms of both cost and quality.

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Last updated on 2021-06-24 at 09:33