A1 Peer-reviewed original article in a scientific journal

Analyzing the unrestricted web: The Finnish corpus of online registers




Authors: Skantsi Valtteri, Laippala Veronika

Publisher: CAMBRIDGE UNIV PRESS

Publication year: 2023

Journal: Nordic Journal of Linguistics

Journal name in source database: NORDIC JOURNAL OF LINGUISTICS

Journal acronym: NORD J LINGUIST

Article number: PII S0332586523000021

Number of pages: 31

ISSN: 0332-5865

eISSN: 1502-4717

DOI: https://doi.org/10.1017/S0332586523000021

Web address: https://doi.org/10.1017/S0332586523000021

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/179300015


Abstract
This article introduces the Finnish Corpus of Online Registers (FinCORE), which represents the full range of registers (situationally defined text varieties such as news and blogs) on the Finnish Internet. The extreme range of language use found online has challenged the study of registers. It has been unclear what registers the entire Internet includes and whether they can be defined well enough to allow for their analysis or classification, as previous studies have focused on restricted sets of registers and on English. FinCORE features 10,754 texts from the unrestricted web, manually annotated for their register using a scheme originally established for the Corpus of Online Registers of English (CORE). We present the FinCORE registers and compare them to CORE. Finally, we show that the FinCORE registers are sufficiently well-defined to allow for their automatic identification, thus opening novel possibilities for both linguistics and web-as-corpus research. FinCORE is published under an open license.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.





Last updated on 2024-11-26 at 23:22