A4 Peer-reviewed article in conference proceedings

Finnish Paraphrase Corpus




Authors: Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Rastas Iiro, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Saarni Jenna, Sevón Maija, Tarkka Otto

Editors: Simon Dobnik, Lilja Øvrelid

Established conference name: Nordic Conference on Computational Linguistics

Publication year: 2021

Journal: Linköping Electronic Conference Proceedings

Title of parent publication: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)

Series name: Linköping Electronic Conference Proceedings

Number in series: 178

First page: 288

Last page: 298

ISBN: 978-91-7929-614-8

ISSN: 1650-3686

Web address: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=29

Self-archived copy's web address: https://research.utu.fi/converis/portal/Publication/53727016


Abstract

In this paper, we introduce the first fully manually annotated paraphrase corpus for Finnish containing 53,572 paraphrase pairs harvested from alternative subtitles and news headings. Out of all paraphrase pairs in our corpus, 98% are manually classified to be paraphrases at least in their given context, if not in all contexts. Additionally, we establish a manual candidate selection method and demonstrate its feasibility in high quality paraphrase selection in terms of both cost and quality.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.





Last updated on 2024-26-11 at 22:55