Finnish Paraphrase Corpus
: Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto
: Simon Dobnik, Lilja Øvrelid
: Nordic Conference on Computational Linguistics
: 2021
: Linköping Electronic Conference Proceedings
: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)
: Linköping Electronic Conference Proceedings
: 178
: 288
: 298
: 978-91-7929-614-8
: 1650-3686
: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29
: https://research.utu.fi/converis/portal/Publication/53727016
In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high-quality paraphrase selection in terms of both cost and quality.