Veronika Laippala
mavela@utu.fi +358 29 450 3330 +358 50 328 9739 Arcanuminkuja 1 Turku
Areas of expertise
Computational linguistics; text linguistics; corpus linguistics; digital discourse analysis.
Biography
I am a linguist who likes computers. My main research topics include language variation across different communicative situations and the development of automatic tools for making better use of large, web-crawled corpora.
My ongoing projects include "A piece of news, an opinion or something else? Different texts and their detection from the multilingual Internet", funded by the Emil Aaltonen Foundation, and "Massively multilingual modeling of registers in web-scale data", funded by the Academy of Finland.
For more information, please have a look at our lab website at https://turkunlp.github.io/
Publications
- Analyzing register variation in web texts through automatic segmentation (2025) Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities Henriksson, Erik; Hellström, Saara; Laippala, Veronika
(A4 Refereed article in a conference publication)
- An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT) (2025) Annual Meeting of the Association for Computational Linguistics
(A4 Refereed article in a conference publication)
- Aurora-M: Open Source Continual Pre-training for Multilingual Language and Code (2025) Proceedings of the 31st International Conference on Computational Linguistics : Industry Track Nakamura, Taishi; Mishra, Mayank; Tedeschi, Simone; Chai, Yekun; Stillerman, Jason T.; Friedrich, Felix; Yadav, Prateek; Laud, Tanmay; Chien, Vu Minh; Zhuo, Terry Yue; Misra, Diganta; Bogin, Ben; Vu, Xuan-Son; Karpinska, Marzena; Dantuluri, Arnav Varma; Kusa, Wojciech; Furlanello, Tommaso; Yokota, Rio; Muennighoff, Niklas; Pai, Suhas; Adewumi, Tosin; Laippala, Veronika; Yao, Xiaozhe; Junior, Adalberto Barbosa; Drozd, Aleksandr; Clive, Jordan; Gupta, Kshitij; Chen, Liangyu; Sun, Qi; Tsui, Ken; Moustafa-Fahmy, Nour; Monti, Nicolo; Dang, Tai; Luo, Ziyang; Bui, Tien-Tung; Navigli, Roberto; Mehta, Virendra; Blumberg, Matthew; May, Victor; Nguyen, Hiep; Pyysalo, Sampo
(A4 Refereed article in a conference publication)
- Building the Penitentiary Document Corpus (PeDoCo) for NLP: Balancing Data Complexity and Uniform Data Structure (2025) Digital Humanities in the Nordic and Baltic Countries Publications
(A4 Refereed article in a conference publication)
- Evaluation in social media discourse: A corpus-assisted discourse study of evaluative images of the Covid-19 pandemic on the Finnish Twitter-sphere (2025) Finnish journal of linguistics
(A1 Refereed original research article in a scientific journal)
- From keywords to key embeddings – contrasting French and Swedish web registers using multilingual deep learning (2025) Corpus Linguistics and Linguistic Theory
(A1 Refereed original research article in a scientific journal)
- L. Onervan varhaisen proosatuotannon tyylipiirteitä korpusstilistiikan avainsanamenetelmän valossa (2025) Kirjallisuudentutkimuksen aikakauslehti Avain
(A1 Refereed original research article in a scientific journal)
- Perspectives on Forests and Forestry in Finnish Online Discussions - A Topic Modeling Approach to Suomi24 (2025) Proceedings of the 1st Workshop on Ecology, Environment, and Natural Language Processing (NLP4Ecology2025) Peura, Telma; Krizsán, Attila; Kuusalu, Salla-Riikka; Laippala, Veronika
(A4 Refereed article in a conference publication)
- Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation (2025) Proceedings of the Second Conference on Language Modeling, COLM 2025 Myntti, Amanda; Henriksson, Erik; Laippala, Veronika; Pyysalo, Sampo
(D3 Article in a professional conference publication)
- TCBLex - A lexical database of Finnish literary texts for children (2025) Behavior Research Methods
(A1 Refereed original research article in a scientific journal)
- Utilizing Text Dispersion Keyness on Turkish web registers: The case of Informational Description and Opinion (2025) Exploring digitally-mediated communication with corpora: Methods, analyses, and corpus construction. Erten-Johansson, Selcen; Laippala, Veronika
(A3 Refereed book chapter or chapter in a compilation book)
- Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4 (2024) LREC Proceedings
(A4 Refereed article in a conference publication)
- Building Question-Answer Data Using Web Register Identification (2024) LREC Proceedings
(A4 Refereed article in a conference publication)
- From Discrete to Continuous Classes: A Situational Analysis of Multilingual Web Registers with LLM Annotations (2024) Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities Henriksson, Erik; Myntti, Amanda; Hellström, Saara; Erten-Johansson, Selcen; Eskelinen, Anni; Repo, Liina; Laippala, Veronika
(A4 Refereed article in a conference publication)
- Health crisis communication in Finnish news media: Evaluative images of the Covid-19 pandemic in digital news headlines (2024) Nordicom Review
(A1 Refereed original research article in a scientific journal)
- Improving Latin Dependency Parsing by Combining Treebanks and Predictions (2024) Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities Kupari, Hanna-Mari Kristiina; Henriksson, Erik; Laippala, Veronika; Kanerva, Jenna
(A4 Refereed article in a conference publication)
- Intersecting Register and Genre: Understanding the Contents of Web-Crawled Corpora (2024) Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities Myntti, Amanda; Repo, Liina; Freyermuth, Elian; Kanner, Antti; Laippala, Veronika; Henriksson, Erik
(A4 Refereed article in a conference publication)
- Introduction (2024) Linguistics across Disciplinary Borders : the March of Data Coats, Steven; Laippala, Veronika
(A3 Refereed book chapter or chapter in a compilation book)
- Linguistics across Disciplinary Borders : the March of Data (2024) Coats, Steven; Laippala, Veronika
(C2 Editorial work for a scientific compilation book)
- Linguistic variation beyond the Indo-European web: Analyzing Turkish web registers in TurCORE (2024) Register studies
(A1 Refereed original research article in a scientific journal)



