A4 Peer-reviewed article in conference proceedings
Finnish Paraphrase Corpus
Authors: Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto
Editors: Simon Dobnik, Lilja Øvrelid
Established conference name: Nordic Conference on Computational Linguistics
Publication year: 2021
Journal: Linköping Electronic Conference Proceedings
Title of the proceedings volume: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)
Series name: Linköping Electronic Conference Proceedings
Number in series: 178
First page: 288
Last page: 298
ISBN: 978-91-7929-614-8
ISSN: 1650-3686
Web address: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29
Self-archived copy address: https://research.utu.fi/converis/portal/Publication/53727016
In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish, containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high-quality paraphrase selection in terms of both cost and quality.