Corpus Linguistics and Eighteenth Century Collections Online (ECCO) - UTU Research Information System

A1 Peer-reviewed original article in a scientific journal

Corpus Linguistics and Eighteenth Century Collections Online (ECCO)

Authors: Tolonen Mikko, Mäkelä Eetu, Ijaz Ali, Lahti Leo

Publisher: Asociación Española de Lingüística de Corpus

Publication year: 2021

Journal: Research in Corpus Linguistics

Volume: 9

Issue: 1

First page: 19

Last page: 34

eISSN: 2243-4712

DOI: https://doi.org/10.32714/ricl.09.01.03

Web address: https://ricl.aelinco.es/index.php/ricl/article/view/161

Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/66578597

Abstract

Eighteenth Century Collections Online (ECCO) is the most comprehensive dataset available in machine-readable form for eighteenth-century printed texts. It plays a crucial role in studies of eighteenth-century language and it has vast potential for corpus linguistics. At the same time, it is an unbalanced corpus that poses a series of different problems. The aim of this paper is to offer a general overview of ECCO for corpus linguistics by analysing, for example, its publication countries and languages. We will also analyse the role of the substantial number of reprints and new editions in the data, and discuss genres and estimates of Optical Character Recognition (OCR) quality. Our conclusion is that whereas ECCO provides a valuable source for corpus linguistics, scholars need to pay attention to historical source criticism. We have highlighted key aspects that need to be taken into account when considering its possible uses.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

161-Article Text-1107-1-10-20210427.pdf