A4 Peer-reviewed article in conference proceedings

Distinguishing translations from non-translations and identifying (in)direct translations’ source languages




Author: Laura Ivaska
Editors: Jarmo Harri Jantunen, Sisko Brunni, Niina Kunnas, Santeri Palviainen, Katja Västi
Conference: Research Data and Humanities
Place of publication: Oulu
Publication year: 2019
Journal: Studia Humaniora Ouluensia
Host publication: Proceedings of the Research Data and Humanities (RDHUM) 2019 Conference: Data, Methods and Tools
Series: Studia Humaniora Ouluensia
Number in series: 17
First page: 125
Last page: 138
ISBN: 978-952-62-2320-9
eISBN: 978-952-62-2321-6
ISSN: 1796-4725
URL: https://www.oulu.fi/sites/default/files/content/ProceedingsStudiaHumanioraOuluensia17.pdf
Self-archived version: https://research.utu.fi/converis/portal/detail/Publication/43722199


Abstract

The scope of this study is threefold. First, machine learning is applied to
distinguish translated from non-translated Finnish texts. Second, the study attempts
to identify the source languages of the translated Finnish texts. Finally, the source
language identification is tested on indirect translations, that is, on
translations made from translations. The three underlying research questions are: 1)
Can translated Finnish be distinguished from non-translated Finnish? 2) Can the
source languages of Finnish translations be identified? 3) If the answer to question
2 is yes, what happens when the method is applied to indirect translations: will
the analysis identify the ultimate source language, the mediating language, or
neither?

This study is based on the hypothesis that translated language contains traces
of the source language (Toury 1995). The corpus of the study consists of non-translated
Finnish prose, Finnish prose literature translated from English,
German, French, Modern Greek, and Swedish, as well as indirect translations from
Modern Greek into Finnish via English, German, French, and Swedish. The
analyses are based on cluster analysis and support vector machines, using the
frequencies of the most frequent lemmatized words as features.
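
The feature extraction and clustering steps summarized above can be illustrated with a minimal Python sketch. It is not the study's implementation: the texts are assumed to be already lemmatized by an external tool, and the number of most frequent words (`n_mfw`), the z-scored mean-absolute-difference distance (in the style of Burrows's Delta) and average linkage are illustrative assumptions not specified in the abstract. All function names and the toy data are hypothetical, and the support vector machine classification is not shown.

```python
from collections import Counter
from itertools import combinations
import statistics


def mfw_features(docs, n_mfw=100):
    """Relative frequencies of the n_mfw most frequent lemmas in the whole corpus.

    `docs` maps a text label to a non-empty list of lemmas; lemmatization is
    assumed to have been done beforehand with an external tool.
    """
    corpus_counts = Counter()
    for lemmas in docs.values():
        corpus_counts.update(lemmas)
    vocab = [word for word, _ in corpus_counts.most_common(n_mfw)]
    features = {}
    for label, lemmas in docs.items():
        counts = Counter(lemmas)
        features[label] = [counts[word] / len(lemmas) for word in vocab]
    return vocab, features


def zscore(features):
    """Standardize each MFW column across all texts so no single word dominates."""
    labels = list(features)
    n_feat = len(features[labels[0]])
    scaled = {label: [] for label in labels}
    for j in range(n_feat):
        column = [features[label][j] for label in labels]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column) or 1.0  # constant column: avoid division by zero
        for label in labels:
            scaled[label].append((features[label][j] - mean) / sd)
    return scaled


def delta(a, b):
    """Mean absolute difference of z-scores (a Burrows's Delta-style distance)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until n_clusters remain."""
    clusters = [[label] for label in vectors]

    def cluster_distance(c1, c2):
        return statistics.mean(delta(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Toy data: two hypothetical groups of already-lemmatized Finnish texts.
docs = {
    "group_a_1": "ja ja ja olla olla se".split(),
    "group_a_2": "ja ja olla olla olla se".split(),
    "group_b_1": "että että kun kun kun se".split(),
    "group_b_2": "että että että kun kun se".split(),
}
vocab, features = mfw_features(docs, n_mfw=5)
clusters = average_linkage(zscore(features), n_clusters=2)
print(clusters)
```

On this toy data the two "a" texts and the two "b" texts end up in separate clusters. With real data, each entry of `docs` would be one novel, and whether a text groups with others sharing its source language is exactly the question the cluster analysis addresses.
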

Results show that translated and non-translated Finnish can be distinguished
using machine learning techniques. Support vector machine-based source
language identification, however, was only partially successful, whereas cluster
analysis suggested coherence within groups of texts translated from the same
source language and variation between groups with different source languages.
Clustering was further tested with indirect translations, with mixed results: of
the thirteen indirect translations tested, six clustered with direct translations
from the ultimate source language, two with translations from their mediating
languages, and five with neither.


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.