A1 Peer-reviewed original article in a scientific journal

TCBLex - A lexical database of Finnish literary texts for children




Authors: Nojonen, Tapio; Korsu, Kiia; Ginter, Filip; Laippala, Veronika; Kanerva, Jenna

Publisher: Springer Science and Business Media LLC

Publication year: 2025

Journal: Behavior Research Methods

Article number: 312

Volume: 57

ISSN: 1554-351X

eISSN: 1554-3528

DOI: https://doi.org/10.3758/s13428-025-02832-x

URL: https://doi.org/10.3758/s13428-025-02832-x

Self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/504652992


Abstract

This work introduces TCBLex, a lexical database of Finnish literary works read by children between the ages of 7 and 15. We explain in detail the work done to build the corpus TCBLex is based on, including how books were sampled and collected, turned into text files, and finally processed. We also touch on legal considerations and how it is possible to build such a corpus in the EU. TCBLex contains over 11 million tokens that are annotated with part-of-speech tags and lemmatized. We provide 14 different sub-lexicons in total, covering individual intended reading ages, age groups, as well as different genres. We also provide versions with additional morphological information, such as the cases and tenses of words. TCBLex provides various psycholinguistically interesting lexical statistics for both word types and lemmas, such as different frequency metrics, distributions, word lengths, numbers of syllables, morphological paradigm sizes, and for the first time in a Finnish lexicon, ages when words and lemmas are first encountered in books. TCBLex is freely available at https://doi.org/10.5281/zenodo.15655580.
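The abstract names frequency metrics and the age at which a word is first encountered but does not define them in this record. The sketch below shows one plausible reading under stated assumptions. It assumes that per-million frequency is the raw count divided by the total token count, multiplied by one million. It also assumes that "first encountered" means the lowest intended reading age of any book in which the word type appears. The toy book list and the function name `lexical_stats` are hypothetical. This is not the TCBLex pipeline, and it skips the lemmatization and part-of-speech tagging the corpus actually uses.

```python
from collections import Counter

# Hypothetical toy input: (intended reading age, pre-tokenized word types).
# Illustration only; these are not TCBLex data.
books = [
    (7, ["kissa", "istuu", "puussa"]),
    (8, ["kissa", "juoksee"]),
    (10, ["koira", "juoksee", "kissa"]),
]


def lexical_stats(books):
    """Per word type: raw frequency, frequency per million tokens,
    lowest intended reading age it occurs at, and length in characters."""
    counts = Counter()
    first_age = {}
    total = 0
    for age, tokens in books:
        for tok in tokens:
            counts[tok] += 1
            total += 1
            # Keep the youngest intended reading age seen for this type,
            # so the result does not depend on the order of the books.
            if tok not in first_age or age < first_age[tok]:
                first_age[tok] = age
    return {
        tok: {
            "freq": n,
            "per_million": n / total * 1_000_000,
            "first_age": first_age[tok],
            "length": len(tok),
        }
        for tok, n in counts.items()
    }


stats = lexical_stats(books)
print(stats["kissa"])
# → {'freq': 3, 'per_million': 375000.0, 'first_age': 7, 'length': 5}
```

Taking the minimum age rather than the age of the first book processed keeps the measure independent of input order. A lemma-level version would count lemmas produced by a lemmatizer instead of surface word types.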


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information stated in the publication
Open Access funding provided by University of Turku (including Turku University Central Hospital). The present study is a part of the EDUCA Flagship funded by the Research Council of Finland (#358924, #358947) and the EDUCA-Doc Doctoral Education pilot funded by the Ministry of Education and Culture (Doctoral school pilot #VN/3137/2024-OKM-4).


Last updated on 2025-10-16 at 08:55