Valtteri Skantsi
- From keywords to key embeddings – contrasting French and Swedish web registers using multilingual deep learning (2025)
- Corpus Linguistics and Linguistic Theory
- Analyzing the unrestricted web: The Finnish corpus of online registers (2023)
- Nordic Journal of Linguistics
- Towards diverse and contextually anchored paraphrase modeling: A dataset and baselines for Finnish (2023)
- Natural Language Engineering
- Textual Paraphrase Dataset for Deep Language Modelling (2022)
- European Language Grid: A Language Technology Platform for Multilingual Europe
- Kanerva Jenna, Ginter Filip, Chang Li-Hsin, Skantsi Valtteri, Kilpeläinen Jemina, Kupari Hanna-Mari, Piirto Aurora, Saarni Jenna, Sevón Maija, Tarkka Otto
- Towards better structured and less noisy Web data: Oscar with Register annotations (2022)
- International Conference on Computational Linguistics
- Beyond the English web: Zero-shot cross-lingual and lightweight monolingual classification of registers (2021)
- Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
- Repo Liina, Skantsi Valtteri, Rönnqvist Samuel, Hellström Saara, Oinonen Miika, Salmela Anna, Biber Douglas, Egbert Jesse, Pyysalo Sampo, Laippala Veronika
- Finnish Paraphrase Corpus (2021)
- Linköping Electronic Conference Proceedings
- Multilingual and Zero-Shot is Closing in on Monolingual Web Register Classification (2021)
- Linköping Electronic Conference Proceedings
- From Web Crawl to Clean Register-Annotated Corpora (2020)
- Proceedings of the 12th Web as Corpus Workshop
- Laippala Veronika, Rönnqvist Samuel, Hellström Saara, Luotolahti Juhani, Repo Liina, Salmela Anna, Skantsi Valtteri, Pyysalo Sampo