A4 Peer-reviewed article in conference proceedings
Distinguishing translations from non-translations and identifying (in)direct translations’ source languages
Authors: Ivaska Laura
Editors: Jarmo Harri Jantunen, Sisko Brunni, Niina Kunnas, Santeri Palviainen, Katja Västi
Established conference name: Research Data and Humanities
Place of publication: Oulu
Publication year: 2019
Journal: Studia Humaniora Ouluensia
Title of the edited volume: Proceedings of the Research Data and Humanities (RDHUM) 2019 Conference: Data, Methods and Tools
Series name: Studia humaniora ouluensia
Number in series: 17
First page: 125
Last page: 138
ISBN: 978-952-62-2320-9
eISBN: 978-952-62-2321-6
ISSN: 1796-4725
Web address: https://www.oulu.fi/sites/default/files/content/ProceedingsStudiaHumanioraOuluensia17.pdf
Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/43722199
The scope of this study is threefold. First, machine learning will be applied to
distinguish translated from non-translated Finnish texts. Second, it will be used in an
attempt to identify the source languages of the translated Finnish texts. Finally, the source
language identification will be tested with indirect translations, that is, with
translations made from translations. The three underlying research questions are: 1)
Can translated Finnish be distinguished from non-translated Finnish? 2) Can the
source languages of Finnish translations be identified? 3) If the answer to question
2 is yes, what happens when the method is applied to indirect translations: does
the analysis identify the ultimate source language, the mediating language, or
neither?
This study is based on the hypothesis that translated language contains traces
of the source language (Toury 1995). The corpus of the study consists of non-translated
Finnish prose, Finnish prose literature translations made from English,
German, French, Modern Greek, and Swedish, as well as indirect translations from
Modern Greek into Finnish via English, German, French, and Swedish. The
analyses are based on cluster analysis and support vector machines using the
frequencies of the most frequent lemmatized words.
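As a rough illustration of the feature representation described above, the following Python sketch turns lemmatized texts into vectors of relative frequencies over the most frequent lemmas and compares two texts with cosine distance. It is a minimal sketch under stated assumptions, not the study's implementation: the toy lemma sequences, the function names, the vocabulary size `n=4`, and the choice of cosine distance are all hypothetical. In practice, vectors of this kind would be passed to a cluster analysis or a support vector machine.

```python
from collections import Counter
from math import sqrt


def mfw_vocabulary(docs, n):
    """Return the n most frequent lemmas across all documents.

    Ties in frequency are broken alphabetically so the result is deterministic.
    """
    totals = Counter()
    for lemmas in docs.values():
        totals.update(lemmas)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lemma for lemma, _ in ranked[:n]]


def relative_frequencies(lemmas, vocabulary):
    """Map one document to relative frequencies of the vocabulary lemmas."""
    counts = Counter(lemmas)
    total = len(lemmas)
    return [counts[w] / total for w in vocabulary]


def cosine_distance(u, v):
    """Cosine distance between two vectors (1.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


# Invented toy data: each text is a list of lemmas (hypothetical, not from the corpus).
docs = {
    "text_a": "olla ja se olla ei ja olla".split(),
    "text_b": "olla ja se ja ei olla olla".split(),
    "text_c": "se se se ei ei ja hän".split(),
}

vocabulary = mfw_vocabulary(docs, n=4)
vectors = {
    name: relative_frequencies(lemmas, vocabulary)
    for name, lemmas in docs.items()
}

print(vocabulary)
print(cosine_distance(vectors["text_a"], vectors["text_c"]))
```

Here `text_a` and `text_b` contain the same lemmas in a different order, so their vectors coincide, while `text_c` has a different frequency profile and lies further away.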
Results show that translated and non-translated Finnish can be distinguished
by using machine learning techniques. Support vector machine-based source
language identification, however, was only partially successful, while a cluster
analysis suggested that there is coherence within a group of texts translated from
the same source language and variation between the groups of texts with different
source languages. Clustering was further tested with indirect translations, and the
results were mixed: six of the thirteen tested indirect translations clustered with
direct translations from the ultimate source language, two with translations from
their mediating languages, and five with neither.
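The three outcomes reported above (ultimate source language, mediating language, or neither) can be illustrated with a simplified sketch. It assumes, hypothetically, that each direct-translation group is summarized by the centroid of its feature vectors and that an indirect translation is assigned to the nearest centroid. This nearest-centroid rule is a stand-in chosen for brevity and is not the cluster analysis used in the study; the two-dimensional vectors and all names are invented.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_group(vector, groups):
    """Return the label of the group whose centroid is closest (Euclidean)."""
    centroids = {label: centroid(vs) for label, vs in groups.items()}
    return min(centroids, key=lambda label: dist(vector, centroids[label]))


def categorize(vector, groups, ultimate, mediating):
    """Classify an indirect translation as 'ultimate', 'mediating', or 'neither'."""
    nearest = nearest_group(vector, groups)
    if nearest == ultimate:
        return "ultimate"
    if nearest == mediating:
        return "mediating"
    return "neither"


# Invented feature vectors for direct translations, keyed by source language.
direct = {
    "Modern Greek": [[0.9, 0.1], [0.8, 0.2]],
    "English": [[0.1, 0.9], [0.2, 0.8]],
    "Swedish": [[0.5, 0.5], [0.6, 0.6]],
}

# A hypothetical indirect translation: Modern Greek into Finnish via English.
indirect = [0.75, 0.2]
print(categorize(indirect, direct, ultimate="Modern Greek", mediating="English"))
```

Applied to each tested indirect translation in turn, a rule of this kind would yield a tally of the same shape as the one reported here: counts for the ultimate source language, the mediating language, and neither.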
Downloadable publication: This is an electronic reprint of the original article.