A1 Peer-reviewed original article in a scientific journal
TCBLex - A lexical database of Finnish literary texts for children
Authors: Nojonen, Tapio; Korsu, Kiia; Ginter, Filip; Laippala, Veronika; Kanerva, Jenna
Publisher: Springer Science and Business Media LLC
Publication year: 2025
Journal: Behavior Research Methods
Article number: 312
Volume: 57
ISSN: 1554-351X
eISSN: 1554-3528
DOI: https://doi.org/10.3758/s13428-025-02832-x
Web address: https://doi.org/10.3758/s13428-025-02832-x
Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/504652992
This work introduces TCBLex, a lexical database of Finnish literary works read by children between the ages of 7 and 15. We explain in detail the work done to build the corpus TCBLex is based on, including how books were sampled and collected, turned into text files, and finally processed. We also touch on legal considerations and how it is possible to build such a corpus in the EU. TCBLex contains over 11 million tokens that are annotated with parts-of-speech tags and lemmatized. We provide 14 different sub-lexicons in total, covering individual intended reading ages, age groups, as well as different genres. We also provide versions with additional morphological information, such as the cases and tenses of words. TCBLex provides various psycholinguistically interesting lexical statistics for both word types and lemmas, such as different frequency metrics, distributions, word lengths, numbers of syllables, morphological paradigm sizes, and for the first time in a Finnish lexicon, ages when words and lemmas are first encountered in books. TCBLex is freely available at https://doi.org/10.5281/zenodo.15655580.
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
Open Access funding provided by University of Turku (including Turku University Central Hospital). The present study is a part of the EDUCA Flagship funded by the Research Council of Finland (#358924, #358947) and the EDUCA-Doc Doctoral Education pilot funded by the Ministry of Education and Culture (Doctoral school pilot #VN/3137/2024-OKM-4).