A4 Peer-reviewed article in conference proceedings
Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction
Authors: Bassignana Elisa, Ginter Filip, Pyysalo Sampo, Rob van der Goot, Plank Barbara
Editors: Tanel Alumäe, Mark Fishel
Established conference name: Nordic Conference on Computational Linguistics
Publication year: 2023
Journal: NEALT proceedings series
Title of the compilation: Proceedings of The 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Series name: NEALT proceedings series
Number in series: 52
First page: 80
Last page: 85
ISBN: 978-99-1621-999-7
ISSN: 1736-8197
eISSN: 1736-6305
Web address: https://aclanthology.org/2023.nodalida-1.9
Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/380758650
Most research in Relation Extraction (RE) involves the English language, mainly due to the lack of multi-lingual resources. We propose MULTI-CROSSRE, the broadest multi-lingual dataset for RE, including 26 languages in addition to English, and covering six text domains. MULTI-CROSSRE is a machine-translated version of CrossRE (Bassignana and Plank, 2022a), with a sub-portion including more than 200 sentences in seven diverse languages checked by native speakers. We run a baseline model over the 26 new datasets and, as a sanity check, over the 26 back-translations to English. Results on the back-translated data are consistent with those on the original English CrossRE, indicating high quality of the translation and the resulting dataset.