A global and interoperable dataset of linguistic distributions derived from the Atlas of the World’s Languages - UTU Research Information System

A1 Peer-reviewed data article in a scientific journal

A global and interoperable dataset of linguistic distributions derived from the Atlas of the World’s Languages

Authors: Ranacher, Peter; Forkel, Robert; Efrat-Kowalsky, Nour; Urban, Matthias; Hehli, Antonia; Franz, Micha; Biland, Gregory; Kreienbühl, Aaron; Hermida Rodríguez, Alba; Azevedo, Matheus; Romar, Martijn; Klaussova, Andrea; Takahashi, Takuya; Neureiter, Nico; van Gijn, Rik; Roose, Meeli; Vesakoski, Outi; Weibel, Robert; Kaiping, Gereon; Norder, Sietze

Publisher: Springer Nature

Publication year: 2025

Journal: Scientific Data

Article number: 1466

Volume: 12

eISSN: 2052-4463

DOI: https://doi.org/10.1038/s41597-025-05828-6

Open access status at the time of recording: Openly available

Open access status of the publication channel: Fully open access publication channel

Web address: https://doi.org/10.1038/s41597-025-05828-6

Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/500015565

Self-archived copy licence: CC BY

Version of the self-archived publication: Publisher's version

Abstract

Asher and Moseley’s Atlas of the World’s Languages illustrates the past and present spatial distribution of human languages across more than 100 maps. While the Atlas is an impressive resource, its data are not readily accessible for research. Language areas are presented as printed maps and referenced by name, rather than as digital spatial objects linked to a standardised language catalogue. To address these limitations, we present a digital dataset derived from the Atlas. We georeferenced the map images, digitised the language polygons in a Geographic Information System (GIS), and linked each polygon to a Glottocode — a unique identifier for languages and language varieties. Following the FAIR principles, we provide the data as a faithful digital replication of the Atlas (comprising 6,992 distinct language areas) and in enriched, aggregated versions for contemporary and traditional languages. The datasets capture the spatial distribution of human languages as depicted in the Atlas, with each polygon linked to an unambiguous identifier, enabling computational analyses of the origins, distribution, and drivers of global linguistic diversity.
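As a rough illustration of what linking each polygon to a Glottocode makes possible, the sketch below groups language areas by identifier, in the spirit of the aggregated versions mentioned in the abstract. It is not the authors' pipeline: the GeoJSON-like feature layout, the property names `name` and `glottocode`, the coordinates, and the sample records are all assumptions made for the example. The only external fact relied on is the Glottocode format, four lowercase alphanumeric characters followed by four digits.

```python
import re
from collections import defaultdict

# A Glottocode is four lowercase alphanumeric characters followed by four digits.
GLOTTOCODE_RE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def is_valid_glottocode(code):
    """Return True if `code` is a string with the Glottocode shape."""
    return isinstance(code, str) and GLOTTOCODE_RE.fullmatch(code) is not None


def group_by_glottocode(features):
    """Collect polygon coordinates per Glottocode.

    `features` is a list of GeoJSON-like dicts (hypothetical layout).
    Returns (grouped, unlinked): a dict mapping each Glottocode to its list
    of polygons, and the names of areas lacking a well-formed Glottocode.
    """
    grouped = defaultdict(list)
    unlinked = []
    for feature in features:
        props = feature["properties"]
        code = props.get("glottocode")
        if is_valid_glottocode(code):
            grouped[code].append(feature["geometry"]["coordinates"])
        else:
            unlinked.append(props.get("name"))
    return dict(grouped), unlinked


def square(x, y):
    """Build a closed unit-square polygon ring (made-up geometry)."""
    return [[[x, y], [x + 1, y], [x + 1, y + 1], [x, y + 1], [x, y]]]


# Invented sample records; names, codes and coordinates are placeholders.
features = [
    {"properties": {"name": "Area A", "glottocode": "finn1318"},
     "geometry": {"type": "Polygon", "coordinates": square(24, 60)}},
    {"properties": {"name": "Area B", "glottocode": "finn1318"},
     "geometry": {"type": "Polygon", "coordinates": square(27, 62)}},
    {"properties": {"name": "Area C", "glottocode": "stan1293"},
     "geometry": {"type": "Polygon", "coordinates": square(-2, 52)}},
    {"properties": {"name": "Unlabelled area", "glottocode": None},
     "geometry": {"type": "Polygon", "coordinates": square(0, 0)}},
]

grouped, unlinked = group_by_glottocode(features)
for code in sorted(grouped):
    print(code, len(grouped[code]), "polygon(s)")
print("unlinked:", unlinked)
```

Grouping on the identifier rather than on the printed language name is the point of the design: two areas with differently spelled names still aggregate correctly once both carry the same Glottocode.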

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

s41597-025-05828-6.pdf

Funding information in the publication:
PR was funded by the URPP ‘Language and Space’, University of Zurich and partially funded by the NCCR Evolving Language, Swiss NSF Agreement No. 51NF40_180888. AH, MF, AKR and GB were funded by the URPP ‘Language and Space’, University of Zurich. GK and TT were funded by the project ‘Out of Asia’, Swiss NSF agreement No. CRSII5_183578. MU was funded by the European Union (ERC, LANGUAGE REDUX, 101124345). AKL was funded by the BMA ‘Mapping global biocultural diversity’, Utrecht University. RG, AHR, MA, SN and MAR were funded by the European Union (ERC, SAPPHIRE, 818854). MER and OV were partially funded by the Kone Foundation and the Finnish Research Council.
Open Access funding enabled and organized by Projekt DEAL.