Filip Ginter
figint@utu.fi Office: 4th floor, 451A ORCID iD: https://orcid.org/0000-0002-5484-6103
natural language processing; human language technology; machine learning; deep learning; resource development
human language technology, natural language processing, machine learning applied to human language, both methodological and resource creation research
I am a researcher at the Department of Computing, University of Turku. My research is in the area of natural language processing. I belong to the TurkuNLP (turkunlp.org) research group.
I was born in 1978 in Ostrava, Czech Republic (Czechoslovakia back then). In 2001, I received an M.Sc. (tech) in computer science from the computer science department of VSB - Technical University Ostrava. My major subject was artificial intelligence. I received a PhD in computer science in 2007. The title of my thesis is Towards Information Extraction in the Biomedical Domain: Methods and Resources.
As of 2022, I am a professor of language technology and as of 2021 the deputy director of the Department of Computing.
My primary field of research is language technology / natural language processing. In my post-PhD career, I have focused on the development of NLP tools and resources, primarily for Finnish but later also for numerous other languages via the Universal Dependencies project. My work is heavy on resource development, both in terms of data and machine learning pipelines. Open science and open resources play an important role in my research: much of it is carried out in the open on GitHub and, as a rule, all resources are openly available for unrestricted use. I work collaboratively, especially with my younger colleagues, rather than striving for deeper, primary author inquiries.
I have been actively teaching since early in my PhD studies. I independently prepared my first advanced level NLP course in 2004, and since ca. 2008 I have been teaching at least one course every year, substantially more during my bioinformatics lecturer appointment. While a lecturer in the bioinformatics MSc degree programme, I was lecturing international students in two cities. In 2016, I was tasked with developing and coordinating the introduction of a new 20 ECTS study module on natural language processing. This module is, with modifications, still in use and is shared between the departments of Languages and Computing, both in terms of teaching and in terms of students. In 2019-2020 and 2020-2021 I was also co-lecturing, upon invitation, two courses in natural language processing at the Arcada University of Applied Sciences in Helsinki.
- A Deep Dive into Multi-Head Attention and Multi-Aspect Embedding
- Finnish SQuAD: A Simple Approach to Machine Translation of Span Annotations (2025). Recent Advances in Natural Language Processing. (A4 Peer-reviewed article in conference proceedings)
- Annotated textual dataset PV600 of perovskite bandgaps for information extraction from literature (2025). Scientific Data. (A1 Peer-reviewed data article in a scientific journal)
- A RAG-based LLM Approach for Data Validation and Harmonization in Ship Design (2025). 23rd Conference on Computer and IT Applications in the Maritime Industries: COMPIT’25. Bronson, Janica; Teimouri, Maryam; Gaspar, Henrique; Fonseca, Icaro; Bierkowska, Karolina; Ginter, Filip; Koelman, Herbert. (D3 Article in professional conference proceedings)
- (2025). Proceedings of the 29th International Symposium on Logistics (2025): Embedding Circularity in Supply Chains. Davoodi, Laleh; Salimi, Sima; Jyote, Abul Khair; Mezei, Jozsef; Ginter, Filip. (A4 Peer-reviewed article in conference proceedings)
- Creating a Historical Migration Dataset from Finnish Church Records, 1800–1920 (2025). Journal of Open Humanities Data; Lecture Notes in Computer Science. (A1 Peer-reviewed original article in a scientific journal)
- FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering (2025). NEALT proceedings series. (A4 Peer-reviewed article in conference proceedings)
- (2025). NEALT proceedings series. (A4 Peer-reviewed article in conference proceedings)
- Interaction Analysis by Humans and AI: A Comparative Perspective (2025). IDC '25: Proceedings of the 24th Interaction Design and Children. Teimouri, Maryam; Ginter, Filip; Suovuo, Tomi. (Abstract)
- OCR Error Post-Correction with LLMs in Historical Documents: No Free Lunches (2025). Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025). Kanerva, Jenna; Ledins, Cassadra; Käpyaho, Siiri; Ginter, Filip. (A4 Peer-reviewed article in conference proceedings)
- Question Answering models for information extraction from perovskite materials science literature (2025). Communications Materials. (A1 Peer-reviewed original article in a scientific journal)
- Risk Detection in E-commerce with LLMs: Annotation Challenges and Lessons from Real-World Business News (2025). (A4 Peer-reviewed article in conference proceedings)
- TCBLex - A lexical database of Finnish literary texts for children (2025). Behavior Research Methods. (A1 Peer-reviewed original article in a scientific journal)
- Application of the Question Answering method to extract information from materials science literature (2024). Sipilä, Matilda; Mehryary, Farrokh; Pyysalo, Sampo; Ginter, Filip; Todorović, Milica. (Abstract)
- Automatic Short Answer Grading for Finnish with ChatGPT (2024). Proceedings of the AAAI Conference on Artificial Intelligence. (A4 Peer-reviewed article in conference proceedings)
- Breakpoints in Iterative Development and Interdisciplinary Collaboration of AI-Driven Automated Assessment (2024). International Conference on Information Technology Based Higher Education and Training. (A4 Peer-reviewed article in conference proceedings)
- Extracting Social Connections from Finnish Karelian Refugee Interviews Using LLMs (2024). CEUR Workshop Proceedings. (A4 Peer-reviewed article in conference proceedings)
- Question Answering models for information extraction from perovskite materials science literature (2024). 2024 MRS Fall Meeting and Exhibit. Sipilä, Matilda; Mehryary, Farrokh; Pyysalo, Sampo; Ginter, Filip; Todorović, Milica. (Abstract)
- Semantic search as extractive paraphrase span detection (2024). Language Resources and Evaluation. (A1 Peer-reviewed original article in a scientific journal)
- FinGPT: Large Generative Models for a Small Language (2023). Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Luukkonen Risto, Komulainen Ville, Luoma Jouni, Eskelinen Anni, Kanerva Jenna, Kupari Hanna-Mari, Ginter Filip, Laippala Veronika, Muennighoff Niklas, Piktus Aleksandra, Wang Thomas, Tazi Nouamane, Scao Le Teven, Wolf Thomas, Suominen Osma, Sairanen Samuli, Merioksa Mikko, Heinonen Jyrki, Vahtola Aija, Antao Samuel, Pyysalo Sampo. (A4 Peer-reviewed article in conference proceedings)
- Identifying gender bias in blockbuster movies through the lens of machine learning (2023). Humanities & Social Sciences Communications. (A1 Peer-reviewed original article in a scientific journal)



