Distinguishing translations from non-translations and identifying (in)direct translations’ source languages - UTU Research Information System

A4 Peer-reviewed article in conference proceedings

Distinguishing translations from non-translations and identifying (in)direct translations’ source languages

Authors: Ivaska Laura

Editors: Jarmo Harri Jantunen, Sisko Brunni, Niina Kunnas, Santeri Palviainen, Katja Västi

Established conference name: Research Data and Humanities

Place of publication: Oulu

Publication year: 2019

Journal: Studia Humaniora Ouluensia

Book title: Proceedings of the Research Data and Humanities (RDHUM) 2019 Conference: Data, Methods and Tools

Series name: Studia humaniora ouluensia

Number in series: 17

First page: 125

Last page: 138

ISBN: 978-952-62-2320-9

eISBN: 978-952-62-2321-6

ISSN: 1796-4725

Web address: https://www.oulu.fi/sites/default/files/content/ProceedingsStudiaHumanioraOuluensia17.pdf

Self-archived copy's address: https://research.utu.fi/converis/portal/detail/Publication/43722199

Abstract

The scope of this study is threefold. First, machine learning will be applied to
distinguish translated from non-translated Finnish texts. Then, the study will attempt to
identify the source languages of the translated Finnish texts. Finally, the source
language identification will be tested with indirect translations, that is, with
translations made from translations. The three underlying research questions are: 1)
Can translated Finnish be distinguished from non-translated Finnish? 2) Can the
source languages of Finnish translations be identified? 3) If the answer to question
2 is yes, then what happens when the method is applied to indirect translations; will
the analysis identify the ultimate source language, the mediating language, or
neither?

This study is based on the hypothesis that translated language contains traces
of the source language (Toury 1995). The corpus of the study consists of non-translated
Finnish prose, Finnish prose literature translations made from English,
German, French, Modern Greek, and Swedish, as well as indirect translations from
Modern Greek into Finnish via English, German, French, and Swedish. The
analyses are based on cluster analysis and support vector machines using the
frequencies of the most frequent lemmatized words.

Results show that translated and non-translated Finnish can be distinguished
by using machine learning techniques. Support vector machine-based source
language identification, however, was only partially successful, while a cluster
analysis suggested that there is coherence within a group of texts translated from
the same source language and variation between the groups of texts with different
source languages. Clustering was further tested with indirect translations, and the
results were mixed: six of the thirteen tested indirect translations clustered with
direct translations from the ultimate source language, two with translations from
their mediating languages, and five with neither.
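
As a rough illustration of the kind of feature the abstract describes (frequencies of the most frequent lemmatized words), the following minimal Python sketch builds standardized frequency vectors and finds the text closest to a given one. It is not the author's implementation: the abstract does not specify the clustering algorithm or distance measure, so a simple nearest-neighbour comparison with a mean absolute difference of z-scores is assumed here, and the support vector machine step is not shown. The text labels, the toy token lists, and all function names are hypothetical, and the tokens are assumed to be already lemmatized.

```python
from collections import Counter
from statistics import mean, pstdev


def most_frequent_words(docs, n):
    """Return the n most frequent lemmas over all texts (ties broken alphabetically)."""
    counts = Counter()
    for tokens in docs.values():
        counts.update(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary lemma in one text."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def build_vectors(docs, n):
    """Map each text name to a z-scored frequency vector over the top-n lemmas."""
    vocab = most_frequent_words(docs, n)
    names = list(docs)
    rows = [relative_freqs(docs[name], vocab) for name in names]
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a lemma has identical frequency in every text.
    sds = [pstdev(col) or 1.0 for col in columns]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    return dict(zip(names, scaled))


def distance(a, b):
    """Mean absolute difference between two z-scored vectors (an assumed measure)."""
    return mean(abs(x - y) for x, y in zip(a, b))


def nearest_text(name, vectors):
    """Name of the other text whose vector is closest to the given text."""
    others = (other for other in vectors if other != name)
    return min(others, key=lambda other: distance(vectors[name], vectors[other]))


# Hypothetical toy corpus: each value stands in for a lemmatized text.
docs = {
    "direct_en_1": ["olla"] * 6 + ["ja"] * 2 + ["se"] * 2,
    "direct_en_2": ["olla"] * 5 + ["ja"] * 3 + ["se"] * 2,
    "direct_sv_1": ["olla"] * 2 + ["ja"] * 2 + ["se"] * 6,
    "direct_sv_2": ["olla"] * 2 + ["ja"] * 3 + ["se"] * 5,
    "indirect": ["olla"] * 5 + ["ja"] * 2 + ["se"] * 3,
}

vectors = build_vectors(docs, n=3)
closest = nearest_text("indirect", vectors)
```

In a setup like this, the name returned for an indirect translation would hint at which group of direct translations it most resembles; a real study would use far more texts and lemmas than this toy corpus.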

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

ivaska_distinguishing_translations.pdf