A3 Peer-reviewed part of a book or other compilation

Från dialektinspelning till talspråkskorpus – beskrivning av ett korpusbygge




Subtitle: beskrivning av ett korpusbygge

Authors: Lisa Södergård, Therese Leinonen

Editor: J.-O. Östman et al.

Place of publication: Helsinki

Publication year: 2017

Title of compilation: Ideologi, identitet, intervention. Tionde nordiska dialektologkonferensen

Series name: Nordica Helsingiensia

Number in series: 48

ISBN: 978-951-51-2996-3

ISSN: 1795-4428

Self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/2315952


Abstract

The Talko corpus of Swedish spoken in Finland is a new research tool consisting of audio files linked to annotation, i.e., transcriptions on two parallel levels and part-of-speech tagging. The corpus is searchable through a web-based interface. The recordings were made in 2005–2008 in all parts of Swedish-language Finland. They have been transcribed in a broad phonetic transcription as well as in a standard orthographic transcription. The part-of-speech tagging is done with TreeTagger, trained on the Stockholm-Umeå Corpus of written Swedish. The automatically produced part-of-speech tags are manually corrected for subsets of the data, and the manually corrected data are subsequently added to the training data. This will gradually improve the result of the automatic tagging and compensate for differences between spoken and written Swedish and between Finland-Swedish and Sweden-Swedish.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.





Last updated on 2024-26-11 at 11:12