A1 Refereed original research article in a scientific journal

Glottography: An Open-Source Geolinguistic Data Platform for Mapping the World's Languages




Authors: Ranacher, Peter; Forkel, Robert; Efrat-Kowalsky, Nour; Urban, Matthias; Hehli, Antonia; Franz, Micha; Biland, Gregory; Kreienbuhl, Aaron; Rodriguez, Alba Hermida; Azevedo, Matheus C. B. C.; Giebler, James; Takahashi, Takuya; Neureiter, Nico; Van Gijn, Rik; Roose, Meeli; Vesakoski, Outi; Weibel, Robert; Kaiping, Gereon; Norder, Sietze

Publisher: Ubiquity Press

Publication year: 2026

Journal: Journal of Open Humanities Data

Article number: 47

Volume: 12

eISSN: 2059-481X

DOI: https://doi.org/10.5334/johd.459

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.459

Self-archived copy's web address: https://research.utu.fi/converis/portal/detail/Publication/523525197

Self-archived copy's licence: CC BY

Self-archived copy's version: Publisher's PDF


Abstract

Maps depicting the geographic location of languages are essential tools for linguistic research. Although many language maps are available in the scientific literature, most encode spatial information as static images, often on paper. In contrast, geographic databases store languages as georeferenced digital data, allowing integration with other datasets, quantitative geographic analyses, and mapping. At present, there is no open-access platform providing digital language areas. To address this limitation, we introduce Glottography, a free and open geolinguistic data platform for mapping the world’s languages. Glottography represents the speaker areas of the world’s languages as georeferenced spatial polygons, enriched with relevant metadata, including Glottocodes that link each polygon to a unique identifier in Glottolog, a database cataloguing the world’s dialects, languages, and language families. Glottography currently includes more than 13,000 language areas of 5,300 distinct languages, digitised from 29 source publications. For each source, the platform provides the data in its raw, unmodified form and aggregated at the levels of languages and language families, according to the classification in Glottolog. Glottography is accessible through Rglottography, an R package, and is accompanied by detailed tutorials for usage and data acquisition that encourage users to contribute new geodata to the platform. Being the first open data source of its kind, Glottography enables computational analyses that explore the origins, distribution, and drivers of global linguistic diversity.
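The abstract describes raw polygons tagged with Glottocodes and then aggregated to the language and language-family levels. The Python sketch below illustrates only that grouping step, under stated assumptions: the tuple layout, the `aggregate` function, the `parent_of` lookup, and all Glottocodes shown are hypothetical and do not reflect the Rglottography API or real Glottolog entries. It collects polygons into one multi-part area per target Glottocode and does not dissolve shared boundaries, which would require a geometric union from a GIS library.

```python
from collections import defaultdict

# Hypothetical record layout: (polygon id, Glottocode, outer ring as (lon, lat) pairs).
Ring = list[tuple[float, float]]
Feature = tuple[str, str, Ring]


def aggregate(features: list[Feature], parent_of: dict[str, str]) -> dict[str, list[Ring]]:
    """Group raw polygons under the Glottocode of a coarser level.

    parent_of is a hypothetical lookup from a Glottocode (e.g. a dialect) to the
    Glottocode of its language or family; codes missing from it are kept as-is.
    Each result value is a multi-part area: a list of rings whose shared
    boundaries are NOT dissolved (a GIS union step would be needed for that).
    """
    grouped: dict[str, list[Ring]] = defaultdict(list)
    for _polygon_id, code, ring in features:
        grouped[parent_of.get(code, code)].append(ring)
    return dict(grouped)


# Placeholder Glottocodes, not real Glottolog entries.
raw = [
    ("p1", "dial0001", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]),
    ("p2", "dial0002", [(2.0, 0.0), (3.0, 0.0), (3.0, 1.0)]),
    ("p3", "lang0002", [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0)]),
]
to_language = {"dial0001": "lang0001", "dial0002": "lang0001"}
by_language = aggregate(raw, to_language)
print({code: len(parts) for code, parts in by_language.items()})
# prints {'lang0001': 2, 'lang0002': 1}
```

Passing a lookup that maps dialect and language codes to family codes instead would yield the family-level aggregation in the same way.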






Funding information in the publication
PR, AH, NEK and MF were funded by the URPP ‘Language and Space’, University of Zurich. PR and MF were partially funded by the NCCR Evolving Language, Swiss NSF Agreement No. 51NF40_180888. GK and TT were funded by the project ‘Out of Asia’, Swiss NSF agreement No. CRSII5_183578. MU and JG were funded by the European Union (ERC, LANGUAGE REDUX, 101124345). RVG was funded by the European Union (ERC, SAPPHIRE, 818854) and the Dutch Scientific Organization (NWO Open Competition-L, Disentangling the roles of social and biophysical factors in the evolution of linguistic diversity in South America). MR was funded by the Finnish Society of Sciences and Letters (grant no. 87), the Finnish Cultural Foundation (grant no. 00220881), and the Human Diversity consortium (HuDi) under the Profi7 programme of the Research Council of Finland (grant no. 352727).


Last updated on 25/05/2026 09:26:54 AM