Refereed article in conference proceedings (A4)
Finnish Paraphrase Corpus
Authors: Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto
Established conference name: Nordic Conference on Computational Linguistics
Publication year: 2021
Journal: Linköping Electronic Conference Proceedings
Book title: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)
Series name: Linköping Electronic Conference Proceedings
Number in series: 178
ISBN: 978-91-7929-614-8
ISSN: 1650-3686
Web address: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29
Self-archived copy's address: https://research.utu.fi/converis/portal/Publication/53727016
In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high-quality paraphrase selection in terms of both cost and quality.
Downloadable publication: This is an electronic reprint of the original article.