A1 Refereed original research article in a scientific journal
Glottography: An Open-Source Geolinguistic Data Platform for Mapping the World's Languages
Authors: Ranacher, Peter; Forkel, Robert; Efrat-Kowalsky, Nour; Urban, Matthias; Hehli, Antonia; Franz, Micha; Biland, Gregory; Kreienbuhl, Aaron; Rodriguez, Alba Hermida; Azevedo, Matheus C. B. C.; Giebler, James; Takahashi, Takuya; Neureiter, Nico; Van Gijn, Rik; Roose, Meeli; Vesakoski, Outi; Weibel, Robert; Kaiping, Gereon; Norder, Sietze
Publisher: Ubiquity Press
Publication year: 2026
Journal: Journal of Open Humanities Data
Article number: 47
Volume: 12
eISSN: 2059-481X
DOI: https://doi.org/10.5334/johd.459
Open access status at the time of recording: Openly available
Open access status of the publication channel: Fully open publication channel
Web address: https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.459
Address of the self-archived copy: https://research.utu.fi/converis/portal/detail/Publication/523525197
License of the self-archived copy: CC BY
Version of the self-archived publication: Publisher's version
Maps depicting the geographic location of languages are essential tools for linguistic research. Although many language maps are available in the scientific literature, most encode spatial information as static images, often on paper. In contrast, geographic databases store languages as georeferenced digital data, allowing integration with other datasets, quantitative geographic analyses, and mapping. At present, there is no open-access platform providing digital language areas. To address this limitation, we introduce Glottography, a free and open geolinguistic data platform for mapping the world’s languages. Glottography represents the speaker areas of the world’s languages as georeferenced spatial polygons, enriched with relevant metadata, including Glottocodes that link each polygon to a unique identifier in Glottolog, a database cataloguing the world’s dialects, languages, and language families. Glottography currently includes more than 13,000 language areas of 5,300 distinct languages, digitised from 29 source publications. For each source, the platform provides the data in its raw, unmodified form and aggregated at the levels of languages and language families, according to the classification in Glottolog. Glottography is accessible through Rglottography, an R package, and is accompanied by detailed tutorials for usage and data acquisition that encourage users to contribute new geodata to the platform. Being the first open data source of its kind, Glottography enables computational analyses that explore the origins, distribution, and drivers of global linguistic diversity.
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
PR, AH, NEK and MF were funded by the URPP ‘Language and Space’, University of Zurich. PR and MF were partially funded by the NCCR Evolving Language, Swiss NSF Agreement No. 51NF40_180888. GK and TT were funded by the project ‘Out of Asia’, Swiss NSF Agreement No. CRSII5_183578. MU and JG were funded by the European Union (ERC, LANGUAGE REDUX, 101124345). RVG was funded by the European Union (ERC, SAPPHIRE, 818854) and the Dutch Scientific Organization (NWO Open Competition-L, Disentangling the roles of social and biophysical factors in the evolution of linguistic diversity in South America). MR was funded by the Finnish Society of Sciences and Letters (grant no. 87), the Finnish Cultural Foundation (grant no. 00220881), and the Human Diversity consortium (HuDi) under the Profi7 programme of the Research Council of Finland (grant no. 352727).