Refereed article in conference proceedings (A4)

Fine-grained Named Entity Annotation for Finnish




List of Authors: Luoma Jouni, Chang Li-Hsin, Ginter Filip, Pyysalo Sampo

Editors: Simon Dobnik, Lilja Øvrelid

Conference name: Nordic Conference on Computational Linguistics

Publication year: 2021

Journal: Linköping Electronic Conference Proceedings

Book title: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Title of series: Linköping Electronic Conference Proceedings

Number in series: 178

Start page: 135

End page: 144

ISBN: 978-91-7929-614-8

ISSN: 1650-3686

URL: https://ep.liu.se/en/conference-article.aspx?series=ecp&issue=178&Article_No=14

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/56909867


Abstract

We introduce a corpus with fine-grained named entity annotation for Finnish, following the OntoNotes guidelines to create a resource that is cross-lingually compatible with existing annotations for other languages. We combine and extend two NER corpora recently introduced for Finnish and revise their custom annotation scheme through a combination of automatic and manual processing steps. The resulting corpus consists of nearly 500,000 tokens annotated for over 50,000 mentions categorized into the 18 OntoNotes name and numeric entity types. We evaluate this resource and demonstrate its compatibility with the English OntoNotes annotations by training state-of-the-art mono-, bi- and multilingual deep learning models, finding both that the corpus allows highly accurate recognition of OntoNotes types at 93% F-score and that a comparable level of tagging accuracy can be achieved by a bilingual Finnish-English NER model.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Last updated on 2022-07-04 at 18:30