Hanna-Mari Kupari
filosofian maisteri - Master of Arts
hmknie@utu.fi
Arcanuminkuja 1, Turku
Office: A390
ORCID identifier: https://orcid.org/0000-0003-2515-5861
digital linguistics; medieval Latin; corpus linguistics; TEI XML; automatic morpho-syntactic parsing
Corpus linguistics methods for the study of Medieval Latin
I am a doctoral researcher in digital language studies at the University of Turku, funded by the Emil Aaltonen Foundation. My research combines medieval Latin textual data with state-of-the-art machine learning methods. I hold a Master’s degree in Classical Philology with a specialization in Latin, focusing on medieval Latin. My primary interests include the study of grammar, quantitative methodologies, and local history.
In addition to my academic work, I am passionate about science communication and have published numerous popularized articles. I have been a member of the school’s Working Group on Societal Interaction and Communications for several semesters. I have also been an active member of the Tohtoriverkosto society, contributing regularly to its activities.
Modern Methods for Medieval Texts
In my digital humanities doctoral dissertation, I investigate medieval apostolic penitentiary documents and the Registrum Ecclesiae Aboensis copybook using corpus linguistics methods. My research focuses on language use and linguistic variation (register analysis) in Medieval Latin, utilizing metadata-enriched and morpho-syntactically annotated corpora. Committed to promoting open-access research, I openly publish all my code, data, and results alongside my publications.
I am a member of the TurkuNLP and TUCEMEMS research groups.
Grants
My work is supported by several grants, including the Emil Aaltonen Foundation grant (2022–2024), Turku University Foundation travel grant (2023), University of Turku research grants (2022, 2021), Finnish Cultural Foundation Varsinais-Suomi Regional Fund grant (2021), and Uskelan Opintorahastosäätiö grant (2020). I have also received Turku University Foundation Villa Tammekann grants for research visits to Tartu, Estonia (2023, 2024, 2025).
In 2024, I was awarded the Otto A. Malm mobility grant and the Kordelin Foundation full-time working grant. In January and December 2024, I conducted research at the Finnish Institute in Rome, visiting the penitentiary archive and libraries. For 2025, I was awarded the Villa Lanten ystävät – Villa Lantes vänner ry. grant. I have also received a travel grant from COST Action CA21167 for Corpus Linguistics 2025.
Research Visits
In fall 2024, I visited the History Department at Harvard University to discuss their current research in digital methods and to present my own work on parsers. In winter 2024, I presented my work at the American Academy in Rome as part of the Circolo Gianicolense seminar. In spring 2025, I was invited to present my work at the Junge Zürcher Mediävistik seminar at the University of Zürich.
Teaching Experience
University of Tartu, Estonia
- Lecture: From Manuscripts and Edited Publications to XML, for BA and MA students as part of the Paberilt arvutisse lecture course.
- Practical Workshop: Automatic morpho-syntactic annotation of large language corpora using the Universal Dependencies framework (spring 2024). This five-session workshop for PhD students and staff covered theory, terminology, parsing tools, and practical treebank creation.
- Lecture for the Digital Resources course in Classical Philology: Treebanks and automatic linguistic annotation for Classical Languages (spring 2024).
University of Turku, Finland
- Digital Interaction Lecture Course (spring 2024): One lecture: Using computer-assisted methods for parsing grammar.
- Corpus Linguistics and Language Technology (fall 2023, five lectures; 2024, six lectures): Topics included student projects, ethics and large language models, named-entity recognition, sentiment analysis, automatic morpho-syntactic parsing, representing language as vectors, and supervised and unsupervised machine learning.
- Linguistic Landscapes Course: A lecture titled Historiallisten kirjallisten lähteiden näkökulmia kielimaisemiin Turussa, co-taught with Professor Marko Lamberg (2023-03-15).
Publications
- Building the Penitentiary Document Corpus (PeDoCo) for NLP: Balancing Data Complexity and Uniform Data Structure (2025). Digital Humanities in the Nordic and Baltic Countries Publications. (A4 Refereed article in a conference publication)
- Ad fontes - digitaalisten resurssien ääreltä Vatikaanin arkiston alkuperäislähteiden pariin (2024). Fenestra Finnorum - Näköaloja Villa Lantesta. (E1 Popularised article)
- Avointa tiedettä Vatikaanin arkistoissa osa 1: Pohdintoja keskiaikaisten kirkollisten anomusten avoimuudesta (2024). Avointiede.fi. (D1 Article in a professional journal)
- Avointa tiedettä Vatikaanin arkistoissa osa 2: Muinaiset aineistot – nykyajan tekijänoikeudet (2024). Avointiede.fi. (E1 Popularised article)
- Avoin tiede ja tutkimusinfra (2024). Hiiskuttua: Turun yliopiston humanistisen tiedekunnan verkkolehti. (D1 Article in a professional journal)
- Improving Latin Dependency Parsing by Combining Treebanks and Predictions (2024). Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities. Kupari, Hanna-Mari Kristiina; Henriksson, Erik; Laippala, Veronika; Kanerva, Jenna. (A4 Refereed article in a conference publication)
- Pääkirjoitus: Hiiskutun teemakokonaisuudet käynnistyvät kielen oppimisen teemalla (2024). Hiiskuttua: Turun yliopiston humanistisen tiedekunnan verkkolehti. (D1 Article in a professional journal)
- Pääkirjoitus: Kielen opettamisen ajankohtaiset ilmiöt (2024). Hiiskuttua: Turun yliopiston humanistisen tiedekunnan verkkolehti. (D1 Article in a professional journal)
- FinGPT: Large Generative Models for a Small Language (2023). Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Luukkonen Risto, Komulainen Ville, Luoma Jouni, Eskelinen Anni, Kanerva Jenna, Kupari Hanna-Mari, Ginter Filip, Laippala Veronika, Muennighoff Niklas, Piktus Aleksandra, Wang Thomas, Tazi Nouamane, Scao Le Teven, Wolf Thomas, Suominen Osma, Sairanen Samuli, Merioksa Mikko, Heinonen Jyrki, Vahtola Aija, Antao Samuel, Pyysalo Sampo. (A4 Refereed article in a conference publication)
- Hiiskuttua-lehden uudet päätoimittajat esittäytyvät (2023). Hiiskuttua: Turun yliopiston humanistisen tiedekunnan verkkolehti. (D1 Article in a professional journal)
- Keskiajan myytit erilaisten linssien läpi tarkasteltuna – populaarikulttuuri kohtaa penitentiariaattiasiakirjat (2023). Kulttuurihistorian seura : blogi. (E1 Popularised article)
- Kohti suomenkielisiä keskustelumalleja: tule kehittämään tekoälyä (2023). Hiiskuttua: Turun yliopiston humanistisen tiedekunnan verkkolehti. (D1 Article in a professional journal)
- Our everyday surroundings in Turku brought to life with narratives from the Middle Ages (2023). Elävää tiedettä. (E1 Popularised article)
- Towards diverse and contextually anchored paraphrase modeling: A dataset and baselines for Finnish (2023). Natural Language Engineering. (A1 Refereed original research article in a scientific journal)
- Salolaiset myöhäiskeskiaikaiset anomukset Vatikaanin arkistossa (2022). Hakastarolainen. (E1 Popularised article)
- Textual Paraphrase Dataset for Deep Language Modelling (2022). European Language Grid: A Language Technology Platform for Multilingual Europe. Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Piirto Aurora, Saarni Jenna, Sevón Maija, Tarkka Otto. (A3 Refereed book chapter or chapter in a compilation book)
- Väkivaltakuolemien sanoittaminen Turun hiippakunnan asiakirjoissa 1450–1517 (2022). Kalmistopiiri. (D1 Article in a professional journal)
- Vertaisarvioidun artikkelin kirjoittamisen ensimmäiset askeleet seurana Belcherin opas (2022). Kielingua. (D1 Article in a professional journal)
- Finnish Paraphrase Corpus (2021). Linköping Electronic Conference Proceedings. (A4 Refereed article in a conference publication)
- Väkivallan ja kuoleman ilmaukset keskiaikaisessa penitentiariaattiaineistossa (2020). Kalmistopiiri. (D1 Article in a professional journal)