A4 Peer-reviewed article in conference proceedings

Beyond the English web: Zero-shot cross-lingual and lightweight monolingual classification of registers




Authors: Repo Liina, Skantsi Valtteri, Rönnqvist Samuel, Hellström Saara, Oinonen Miika, Salmela Anna, Biber Douglas, Egbert Jesse, Pyysalo Sampo, Laippala Veronika

Editors: Ionut-Teodor Sorodoc, Madhumita Sushil, Ece Takmaz, Eneko Agirre

Established conference name: European Chapter of the Association for Computational Linguistics

Publisher: Association for Computational Linguistics (ACL)

Publication year: 2021

Book title: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Journal name in source: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Student Research Workshop

First page: 183

Last page: 191

ISBN: 978-1-954085-04-6

Web address: https://aclanthology.org/2021.eacl-srw.24.pdf

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/66505697


Abstract

We explore cross-lingual transfer of register classification for web documents. Registers, that is, text varieties such as blogs or news, are one of the primary predictors of linguistic variation and thus affect the automatic processing of language.

We introduce two new register-annotated corpora, FreCORE and SweCORE, for French and Swedish. We demonstrate that deep pre-trained language models perform strongly in these languages and outperform the previous state of the art in English and Finnish.

Specifically, we show 1) that zero-shot cross-lingual transfer from the large English CORE corpus can match or surpass previously published monolingual models, and 2) that lightweight monolingual classification requiring very little training data can reach or surpass our zero-shot performance. We further analyse the classification results, finding that certain registers continue to pose challenges, in particular for cross-lingual transfer.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.





Last updated on 2024-26-11 at 18:56