A4 Refereed article in a conference publication
Finnish Paraphrase Corpus
Authors: Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto
Editors: Simon Dobnik, Lilja Øvrelid
Conference name: Nordic Conference on Computational Linguistics
Publication year: 2021
Journal: Linköping Electronic Conference Proceedings
Book title: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)
Series title: Linköping Electronic Conference Proceedings
Number in series: 178
First page: 288
Last page: 298
ISBN: 978-91-7929-614-8
ISSN: 1650-3686
Web address: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29
Self-archived copy’s web address: https://research.utu.fi/converis/portal/Publication/53727016
In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Of all paraphrase pairs in our corpus, 98% are manually classified as paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility for high-quality paraphrase selection in terms of both cost and quality.