Finnish Paraphrase Corpus




Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto

Editors: Simon Dobnik, Lilja Øvrelid

Nordic Conference on Computational Linguistics

2021

Linköping Electronic Conference Proceedings

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)

Volume 178

Pages 288-298

ISBN 978-91-7929-614-8

ISSN 1650-3686

https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29

https://research.utu.fi/converis/portal/Publication/53727016



In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Of all paraphrase pairs in our corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility for high-quality paraphrase selection in terms of both cost and quality.

