Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771–1910

Aleksi Vesanto, Asko Nivala, Heli Rantala, Tapio Salakoski, Hannu Salmi, Filip Ginter

Edited by Gerlof Bouma, Yvonne Adesam

Workshop on Processing Historical Language

Gothenburg

2017

Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language

NEALT Proceedings Series

133

32

Pages 54–58

ISBN 978-91-7685-503-4

ISSN 1650-3686

http://www.ep.liu.se/ecp/133/010/ecp17133010.pdf

https://research.utu.fi/converis/portal/detail/Publication/20562472

We present the results of text reuse detection on a corpus of scanned and OCR-recognized Finnish newspapers and journals from 1771 to 1910. Our study draws on BLAST, a software package created for comparing and aligning biological sequences. We show different types of text reuse in this corpus and also present a comparison with Passim, a text reuse detection tool developed at Northeastern University in Boston.
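The abstract does not describe how the newspaper text was fed to BLAST. The following is therefore only a minimal illustrative sketch, under stated assumptions, of the general "seed and extend" local-alignment idea that BLAST is built on, applied directly to characters. It finds short exact matches (seeds) shared by two texts and then extends them without gaps until the score drops off. Real BLAST adds substitution matrices, neighbourhood words and gapped extension. The function names (`find_seeds`, `extend_seed`, `best_local_hit`) and the scoring values (match +1, mismatch -1, X-drop 3, seed length 5) are hypothetical choices made for illustration. This is not the authors' pipeline.

```python
# Minimal sketch of BLAST-style seed-and-extend on plain characters.
# Hypothetical helper names and scores; not the authors' pipeline.

def find_seeds(query, subject, k):
    """Return (query_pos, subject_pos) pairs where both texts share an exact k-gram."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, k, match=1, mismatch=-1, x_drop=3):
    """Extend a seed without gaps in both directions, stopping once the running
    score falls more than x_drop below the best score seen (X-drop).
    Returns (score, q_start, q_end, s_start, s_end), end positions exclusive."""
    # Rightward extension, starting just after the seed.
    best_right, best_r, run, r = 0, 0, 0, 0
    while i + k + r < len(query) and j + k + r < len(subject):
        run += match if query[i + k + r] == subject[j + k + r] else mismatch
        r += 1
        if run > best_right:
            best_right, best_r = run, r
        if best_right - run > x_drop:
            break
    # Leftward extension, starting just before the seed.
    best_left, best_l, run, l = 0, 0, 0, 0
    while i - l - 1 >= 0 and j - l - 1 >= 0:
        run += match if query[i - l - 1] == subject[j - l - 1] else mismatch
        l += 1
        if run > best_left:
            best_left, best_l = run, l
        if best_left - run > x_drop:
            break
    # The seed itself is an exact match of length k.
    score = k * match + best_right + best_left
    return (score, i - best_l, i + k + best_r, j - best_l, j + k + best_r)


def best_local_hit(query, subject, k=5, **scoring):
    """Return the highest-scoring extended hit, or None if no k-gram is shared."""
    best = None
    for i, j in find_seeds(query, subject, k):
        hit = extend_seed(query, subject, i, j, k, **scoring)
        if best is None or hit[0] > best[0]:
            best = hit
    return best
```

For example, `best_local_hit("hello world", "hel1o world")` returns `(9, 0, 11, 0, 11)`. The shared seed "o wor" is extended leftwards across the OCR-style substitution of "1" for "l", so the hit covers both strings in full. This tolerance of isolated character errors is what makes alignment-based matching attractive for noisy OCR text. An exact string search would miss such a passage.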