A1 Peer-reviewed original article in a scientific journal
In search of founding era registers: automatic modeling of registers from the corpus of Founding Era American English
Authors: Repo Liina, Hashimoto Brett, Laippala Veronika
Publication year: 2023
Journal: Digital Scholarship in the Humanities
DOI: https://doi.org/10.1093/llc/fqad049
Web address: https://doi.org/10.1093/llc/fqad049
Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/181743412
Registers are situationally defined text varieties, such as letters, essays, or news articles, and are considered one of the most important predictors of linguistic variation. Historical language databases (e.g. Early English Books Online) often lack register information, which could greatly enhance their usability. This article examines register variation in Late Modern English and automatic register identification in historical corpora. We model register variation in the corpus of Founding Era American English (COFEA) and develop machine-learning methods for automatic register identification in COFEA. We also extract and analyze the most significant grammatical characteristics estimated by the classifier for the best-predicted registers and find that letters and journals in the 1700s were characterized by informational density. The chosen method enables us to learn more about registers in the Founding Era. We show that some registers can be reliably identified from COFEA, with the best overall performance achieved by the deep learning model Bidirectional Encoder Representations from Transformers (BERT), which reaches an F1-score of 97 per cent. This suggests that deep learning models could be utilized in other studies concerned with historical language and its automatic classification.
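As an illustration of the F1-score mentioned in the abstract, the following minimal sketch computes per-register F1 from gold and predicted labels. The function name `f1_per_label`, the register labels, and the toy data are hypothetical and not taken from the article. The abstract does not state which averaging scheme the reported score uses, so the macro average shown here is only one possible choice.

```python
from collections import Counter


def f1_per_label(gold, pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for this register
        else:
            fp[p] += 1  # predicted register was wrong
            fn[g] += 1  # gold register was missed
    scores = {}
    for label in set(gold) | set(pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy data (invented for illustration, not from COFEA)
gold = ["letter", "letter", "journal", "essay", "journal", "letter"]
pred = ["letter", "journal", "journal", "essay", "journal", "letter"]

scores = f1_per_label(gold, pred)
macro_f1 = sum(scores.values()) / len(scores)
```

In this toy example one letter is misclassified as a journal, which lowers recall for letters and precision for journals, while essays are predicted perfectly.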
Downloadable publication: This is an electronic reprint of the original article.