Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words - UTU Research Portal

A1 Refereed original research article in a scientific journal

Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words

Authors: Hakala, Tero; Lindh-Knuutila, Tiina; Hultén, Annika; Lehtonen, Minna; Salmelin, Riitta

Publishing place: CAMBRIDGE

Publication year: 2024

Journal: Neurobiology of Language

Journal name in source: NEUROBIOLOGY OF LANGUAGE

Journal acronym: NEUROBIOL LANG

Volume: 5

Issue: 4

First page: 844

Last page: 863

Number of pages: 20

eISSN: 2641-4368

DOI: https://doi.org/10.1162/nol_a_00149

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://doi.org/10.1162/nol_a_00149

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/458252538

Self-archived copy's licence: CC BY

Self-archived copy's version: Publisher's PDF

Abstract

This study extends the idea of decoding word-evoked brain activations using a corpus-semantic vector space to multimorphemic words in the agglutinative Finnish language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2-, and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350–500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only those segmentations that were morphologically aware reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain and show that neural decoding using corpus-semantic word representations derived from compositional subword segments is also applicable to multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.
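As a rough illustration of the idea of composing a word vector from segment vectors, the following Python sketch splits a word into character n-grams and averages the vectors of its segments. It is not the authors' implementation: the abstract does not state how segments are combined or whether the n-grams overlap, so the averaging step, the overlapping n-grams, the function names `char_ngrams` and `compose_word_vector`, and the toy vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def char_ngrams(word: str, n: int) -> List[str]:
    """Split a word into overlapping character n-grams (assumed scheme)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(word) < n:
        # A word shorter than n is kept as a single segment.
        return [word] if word else []
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def compose_word_vector(
    segments: Sequence[str],
    segment_vectors: Dict[str, List[float]],
) -> List[float]:
    """Average the vectors of the known segments (assumed composition rule)."""
    known = [segment_vectors[s] for s in segments if s in segment_vectors]
    if not known:
        raise KeyError("none of the segments has a vector")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Toy, made-up 2-dimensional segment vectors for the Finnish word
# "talossa" ("in the house"), segmented as talo + ssa.
toy_vectors = {"talo": [1.0, 0.0], "ssa": [0.0, 1.0]}
word_vector = compose_word_vector(["talo", "ssa"], toy_vectors)
bigrams = char_ngrams("talo", 2)
```

In a real setting the segment vectors would come from a corpus-trained model rather than a hand-written dictionary, and the same composition function could be reused across the different segmentations being compared.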

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

nol_a_00149.pdf

Funding information in the publication:
Riitta Salmelin, Academy of Finland (https://dx.doi.org/10.13039/501100002341), Award ID: LASTU, 256887. Riitta Salmelin, Academy of Finland (https://dx.doi.org/10.13039/501100002341), Award ID: 255349. Riitta Salmelin, Academy of Finland (https://dx.doi.org/10.13039/501100002341), Award ID: 315553. Minna Lehtonen, Academy of Finland (https://dx.doi.org/10.13039/501100002341), Award ID: 288880. Annika Hultén, Academy of Finland (https://dx.doi.org/10.13039/501100002341), Award ID: 287474. Tiina Lindh-Knuutila, Aalto Brain Center. Riitta Salmelin, Sigrid Juséliuksen Säätiö (https://dx.doi.org/10.13039/501100006306). Riitta Salmelin, Academy of Finland, Award ID: 355407.