A1 Refereed original research article in a scientific journal

In search of founding era registers: automatic modeling of registers from the corpus of Founding Era American English




Authors: Repo Liina, Hashimoto Brett, Laippala Veronika

Publication year: 2023

Journal: Digital Scholarship in the Humanities

DOI: https://doi.org/10.1093/llc/fqad049

Web address: https://doi.org/10.1093/llc/fqad049

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/181743412


Abstract

Registers are situationally defined text varieties, such as letters, essays, or news articles, that are considered to be one of the most important predictors of linguistic variation. Historical language databases (e.g. Early English Books Online) often lack register information, even though it could greatly enhance their usability. This article examines register variation in Late Modern English and automatic register identification in historical corpora. We model register variation in the corpus of Founding Era American English (COFEA) and develop machine-learning methods for automatic register identification in COFEA. We also extract and analyze the most significant grammatical characteristics estimated by the classifier for the best-predicted registers, and find that letters and journals in the 1700s were characterized by informational density. The chosen method enables us to learn more about registers in the Founding Era. We show that some registers can be reliably identified from COFEA, with the best overall performance achieved by the deep learning model Bidirectional Encoder Representations from Transformers (BERT), with an F1-score of 97 per cent. This suggests that deep learning models could be utilized in other studies concerned with historical language and its automatic classification.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.





Last updated on 2025-03-27 at 21:57