Jenna Kanerva
jmnybl@utu.fi | ORCID identifier: https://orcid.org/0000-0003-4580-5366
language technology, natural language processing, machine learning, corpus annotation
University Lecturer in Language Technology and Digital Language Studies. I work in the TurkuNLP research group, which focuses on language technology and natural language processing (NLP). I received my MSc degree in 2014 and my PhD in 2024 (computer science, University of Turku).
My research focuses on language technology, with a particular interest in machine-learning-based methods for Finnish language processing. I also greatly enjoy and value elementary corpus work and data annotation.
I teach language technology courses at the Department of Computing and Digital Language Studies. I completed a 25 ECTS study module in university pedagogy between 2019 and 2021.
- A Deep Dive into Multi-Head Attention and Multi-Aspect Embedding (2025)
- Recent Advances in Natural Language Processing
(A4 Refereed article in a conference publication)
- Creating a Historical Migration Dataset from Finnish Church Records, 1800–1920 (2025)
- Journal of Open Humanities Data
(A1 Refereed original research article in a scientific journal)
- Muuttoluetteloista digitaaliseen tietokantaan – muuttoluettelot sukututkijan avuksi (2025)
- Suku: lehti sukututkijoille
(E1 Popularised article)
- OCR Error Post-Correction with LLMs in Historical Documents: No Free Lunches (2025)
- Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
- Kanerva, Jenna; Ledins, Cassadra; Käpyaho, Siiri; Ginter, Filip
(A4 Refereed article in a conference publication)
- TCBLex - A lexical database of Finnish literary texts for children (2025)
- Behavior Research Methods
(A1 Refereed original research article in a scientific journal)
- Extracting Social Connections from Finnish Karelian Refugee Interviews Using LLMs (2024)
- CEUR Workshop Proceedings
(A4 Refereed article in a conference publication)
- Improving Latin Dependency Parsing by Combining Treebanks and Predictions (2024)
- Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
- Kupari, Hanna-Mari Kristiina; Henriksson, Erik; Laippala, Veronika; Kanerva, Jenna
(A4 Refereed article in a conference publication)
- Semantic search as extractive paraphrase span detection (2024)
- Language Resources and Evaluation
(A1 Refereed original research article in a scientific journal)
- Understanding the structure and meaning of Finnish texts: From corpus creation to deep language modelling (2024)
- Kanerva, Jenna
(G5 Article dissertation)
- FinGPT: Large Generative Models for a Small Language (2023)
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Luukkonen Risto, Komulainen Ville, Luoma Jouni, Eskelinen Anni, Kanerva Jenna, Kupari Hanna-Mari, Ginter Filip, Laippala Veronika, Muennighoff Niklas, Piktus Aleksandra, Wang Thomas, Tazi Nouamane, Scao Le Teven, Wolf Thomas, Suominen Osma, Sairanen Samuli, Merioksa Mikko, Heinonen Jyrki, Vahtola Aija, Antao Samuel, Pyysalo Sampo
(A4 Refereed article in a conference publication)
- Towards diverse and contextually anchored paraphrase modeling: A dataset and baselines for Finnish (2023)
- Natural Language Engineering
(A1 Refereed original research article in a scientific journal)
- Deep Learning and Film History: Model Explanation Techniques in the Analysis of Temporality in Finnish Fiction Film Metadata (2022)
- CEUR Workshop Proceedings
(A4 Refereed article in a conference publication)
- GEMv2: Multilingual NLG Benchmarking in a Single Line of Code (2022)
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Gehrmann Sebastian, Bhattacharjee Abhik, Mahendiran Abinaya, Wang Alex, Papangelis Alexandros, Madaan Aman, McMillan-Major Angelina, Shvets Anna, Upadhyay Ashish, Bohnet Bernd, Yao Bingsheng, Wilie Bryan, Bhagavatula Chandra, You Chaobin, Thomson Craig, Garbacea Cristina, Wang Dakuo, Deutsch Daniel, Xiong Deyi, Jin Di, Gkatzia Dimitra, Radev Dragomir, Clark Elizabeth, Durmus Esin, Ladhak Faisal, Ginter Filip, Winata Genta Indra, Strobelt Hendrik, Hayashi Hiroaki, Novikova Jekaterina, Kanerva Jenna, Chim Jenny, Zhou Jiawei, Clive Jordan, Maynez Joshua, Sedoc João, Juraska Juraj, Dhole Kaustubh, Chandu Khyathi Raghavi, Perez-Beltrachini Laura, Ribeiro Leonardo F.R., Tunstall Lewis, Zhang Li, Pushkarna Mahima, Creutz Mathias, White Michael, Kale Mihir Sanjay, Eddine Moussa Kamal, Daheim Nico, Subramani Nishant, Dusek Ondrej, Liang Paul Pu, Ammanamanchi Pawan Sasanka, Zhu Qi, Puduppully Ratish, Kriz Reno, Shahriyar Rifat, Cardenas Ronald, Mahamood Saad, Osei Salomey, Cahyawijaya Samuel, Štajner Sanja, Montella Sebastien, Jolly Shailza, Mille Simon, Hasan Tahmid, Shen Tianhao, Adewumi Tosin, Raunak Vikas, Raheja Vipul, Nikolaev Vitaly, Tsai Vivian, Jernite Yacine, Xu Ying, Sang Yisi, Liu Yixin, Hou Yufang
(A4 Refereed article in a conference publication)
- Out-of-Domain Evaluation of Finnish Dependency Parsing (2022)
- LREC Proceedings
(A4 Refereed article in a conference publication)
- Paimen, piika ja emäntä. Arvot ja ammatit suomalaisessa näytelmäelokuvassa 1907–2017 (2022)
- Lähikuva
(A1 Refereed original research article in a scientific journal)
- Textual Paraphrase Dataset for Deep Language Modelling (2022)
- European Language Grid: A Language Technology Platform for Multilingual Europe
- Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Piirto Aurora, Saarni Jenna, Sevón Maija, Tarkka Otto
(A3 Refereed book chapter or chapter in a compilation book)
- Towards Automatic Short Answer Assessment for Finnish as a Paraphrase Retrieval Task (2022)
- Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
- Chang Li-Hsin, Kanerva Jenna, Ginter Filip
(A4 Refereed article in a conference publication)
- Finnish Paraphrase Corpus (2021)
- Linköping Electronic Conference Proceedings
(A4 Refereed article in a conference publication)
- Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases (2021)
- Proceedings for the First Workshop on Modelling Translation: Translatology in the Digital Age
- Chang Li-Hsin, Pyysalo Sampo, Kanerva Jenna, Ginter Filip
(A4 Refereed article in a conference publication)
- Universal Lemmatizer: A sequence-to-sequence model for lemmatizing Universal Dependencies treebanks (2021)
- Natural Language Engineering
(A1 Refereed original research article in a scientific journal)



