A4 Refereed article in a conference publication

Finnish Paraphrase Corpus

Authors: Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto

Editors: Simon Dobnik, Lilja Øvrelid

Conference name: Nordic Conference on Computational Linguistics

Publication year: 2021

Journal: Linköping Electronic Conference Proceedings

Book title: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)

Series title: Linköping Electronic Conference Proceedings

Number in series: 178

First page: 288

Last page: 298

ISBN: 978-91-7929-614-8

ISSN: 1650-3686

Web address: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29

Self-archived copy’s web address: https://research.utu.fi/converis/portal/Publication/53727016


Abstract

In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Of all paraphrase pairs in the corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate that it is a feasible way to select high-quality paraphrases, in terms of both cost and quality.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Last updated on 2024-11-26 at 22:55