A4 Peer-reviewed article in conference proceedings
To Compress or not to Compress? A Finite-State Approach to Nen Verbal Morphology
Authors: Saliha Muradoglu, Nicholas Evans, Hanna Suominen
Conference (established name): Annual Meeting of the Association for Computational Linguistics
Year of publication: 2020
Title of host publication: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
First page: 207
Last page: 213
Number of pages: 7
ISBN: 978-1-952148-03-3
DOI: https://doi.org/10.18653/v1/2020.acl-srw.28
This paper describes the development of a verbal morphological parser
for an under-resourced Papuan language, Nen. Nen verbal morphology is
particularly complex, with a transitive verb taking up to 1,740 unique
features. The structural properties exhibited by Nen verbs raise
interesting analytical choices. Here we compare two possible methods
of analysis: ‘Chunking’ and decomposition. ‘Chunking’ refers to
combining several morphological segments into a single unit, whereas
the decomposition model follows a more classical linguistic approach,
treating each segment separately. Both models are built using the
Finite-State Transducer toolkit foma. The resulting architectures differ
in size and structural clarity: while the ‘Chunking’ model is less than
half the size of its fully decomposed counterpart, the decomposition
model displays a clearer structural organisation. In this paper, we
describe the challenges encountered when modelling a language
exhibiting distributed exponence and present the first morphological
analyser for Nen, with an overall accuracy of 80.3%.
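The contrast between the two models can be illustrated with a small toy, sketched below under loudly stated assumptions. The stems (`pak`, `tim`), suffixes (`e`, `o`, `n`), glosses and number values are hypothetical placeholders, not real Nen morphology, and plain Python dictionaries stand in for the lexicons; this is not foma code and not the authors' implementation. Distributed exponence is modelled by letting each suffix slot be underspecified for number, so that only the combination of both slots identifies a single value. The decomposed analyser keeps the slots separate and intersects their feature sets; the "chunked" analyser precomputes every valid suffix combination as one unit.

```python
# Toy illustration only: all stems, suffixes and feature values below are
# hypothetical placeholders, not real Nen morphology, and Python dicts stand
# in for finite-state lexicons (this is not foma syntax).
from itertools import product

STEMS = {"pak": "VERB_A", "tim": "VERB_B"}

# Decomposed model: each suffix slot has its own lexicon, and each suffix is
# underspecified for number (a simple stand-in for distributed exponence).
SLOT1 = {"e": {"SG", "DU"}, "o": {"PL"}}
SLOT2 = {"": {"SG", "PL"}, "n": {"DU"}}


def analyse_decomposed(word):
    """Analyse slot by slot, keeping only the number value both slots license."""
    results = []
    for stem, s1, s2 in product(STEMS, SLOT1, SLOT2):
        if stem + s1 + s2 == word:
            number = SLOT1[s1] & SLOT2[s2]
            if len(number) == 1:
                results.append((STEMS[stem], next(iter(number))))
    return results


def build_chunks():
    """Collapse every valid SLOT1 + SLOT2 combination into a single unit."""
    chunks = {}
    for s1, s2 in product(SLOT1, SLOT2):
        number = SLOT1[s1] & SLOT2[s2]
        if len(number) == 1:
            chunks[s1 + s2] = next(iter(number))
    return chunks


CHUNKS = build_chunks()  # {"e": "SG", "en": "DU", "o": "PL"}


def analyse_chunked(word):
    """Analyse as stem + one precomputed suffix chunk."""
    return [
        (STEMS[stem], CHUNKS[chunk])
        for stem, chunk in product(STEMS, CHUNKS)
        if stem + chunk == word
    ]


if __name__ == "__main__":
    for w in ["pake", "paken", "pako", "pakon", "timo"]:
        print(w, analyse_decomposed(w), analyse_chunked(w))
```

Both analysers return the same analyses for this toy (for example, `paken` is analysed as `("VERB_A", "DU")` and the ill-formed `pakon` gets no analysis). The difference lies in where the combination happens: the decomposed version resolves number at analysis time from separate slot entries, while the chunked version bakes each valid combination into a single lexicon entry. The toy says nothing about the relative sizes of the real foma transducers reported above.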