This vignette demonstrates how to create language maps using
Rglottography. An introduction to the package’s basic
functionality is available in getting_started.Rmd.
The vignette requires the following geospatial and data visualisation packages:
sf: package for working with spatial data; it allows you to handle, manipulate, and analyse the geometries of the Rglottography speaker area polygons.
dplyr: data manipulation package, useful for filtering, selecting, and preparing the language datasets before mapping.
ggplot2: data visualisation package; when combined with sf, it helps you create layered and customisable maps.
rnaturalearth: provides access to Natural Earth datasets, including landmasses, oceans, rivers, and populated places; used to add contextual layers and geographical reference to the language maps.
Note that the code used to generate the map is shown but not executed during package checks.
We begin by loading the Glottography dataset
matsumae2021exploring and extracting the language-level
polygons. Using dplyr::filter() and the relevant
Glottocodes, we select the Even, Koryak, and Chukchi languages. The
resulting polygons are then reprojected to a custom Albers Equal Area
conic projection (+proj=aea), centred over northeastern Siberia
(arctic_proj). This projection was identified using the Projection Wizard, an online
tool for selecting map projections tailored to specific geographic
regions. As a projected, equal-area coordinate system with units in
metres, it preserves area and lets metric spatial operations such as
buffering be carried out in planar coordinates in high-latitude
regions.
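As a quick aside, a projection string of this kind can be sanity-checked before it is used on real data. The minimal sketch below reuses the PROJ string from this vignette and transforms the projection origin (lon_0, lat_0), which should land at approximately (0, 0) in projected metres. The object names centre, centre_proj, xy, and is_longlat are illustrative and not part of Rglottography.

```r
library(sf)

# Same PROJ string as in the vignette code
arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

# The projection origin (lon_0, lat_0), given in geographic coordinates
centre <- sf::st_sfc(sf::st_point(c(159.609375, 63.2245302)), crs = 4326)

# Reproject and extract the planar coordinates (metres)
centre_proj <- sf::st_transform(centre, arctic_proj)
xy <- sf::st_coordinates(centre_proj)

# Check whether the result is still in longitude/latitude (it should not be)
is_longlat <- sf::st_is_longlat(centre_proj)
```

If xy is far from zero, or is_longlat is TRUE, the string is probably not the projection we intended. The vignette's own code for this step follows.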
library(Rglottography)
glottography <- load_datasets("matsumae2021exploring")
languages <- glottography$languages
arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")
even_kory_chuk <- languages |>
  dplyr::filter(glottocode %in% c("even1260", "kory1246", "chuk1273")) |>
  sf::st_transform(arctic_proj)
We load Natural Earth data for landmasses, oceans, lakes (polygons),
rivers (lines), and populated places (points). Because these datasets
are global, we reproject them to our custom Albers Equal Area
projection (arctic_proj) and then clip them to the data extent.
Reprojecting is necessary because working in a polar region can cause
certain spatial operations, such as intersection and buffering, to
behave unpredictably in geographic coordinates (EPSG:4326).
In a projected, area-preserving coordinate system chosen for this
high-latitude region, these spatial operations become reliable.
The data extent is defined as a buffered bounding box around the Even, Koryak, and Chukchi polygons:
clip_extent <- even_kory_chuk |>
  sf::st_bbox() |>
  sf::st_as_sfc() |>
  sf::st_buffer(dist = 700000) # buffer in metres
We then load the Natural Earth data. Each dataset is first
reprojected, then validated with st_make_valid() to fix
potential geometry issues arising from reprojection, and finally clipped
to the buffered map extent. This ensures that only the relevant features
appear on the map, reduces computational load, and prevents errors when
performing spatial operations in high-latitude regions.
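The reproject, validate, and clip sequence can be tried without any download. The following is a minimal sketch on toy geometries, assuming only the sf package: toy_area stands in for the language polygons and toy_river for a global context layer. Both are invented for illustration and do not come from Natural Earth or Glottography.

```r
library(sf)

arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

# Toy "language area" (hypothetical), defined in lon/lat and reprojected
toy_area <- sf::st_sfc(
  sf::st_polygon(list(rbind(
    c(158, 62), c(162, 62), c(162, 65), c(158, 65), c(158, 62)))),
  crs = 4326) |>
  sf::st_transform(arctic_proj)

# Buffered bounding box, built in the same way as clip_extent in the vignette
clip_extent <- toy_area |>
  sf::st_bbox() |>
  sf::st_as_sfc() |>
  sf::st_buffer(dist = 700000) # buffer in metres

# Toy context layer (hypothetical): a line along 63°N from 120°E to 175°E,
# which starts well outside the buffered extent
toy_river <- sf::st_sfc(
  sf::st_linestring(cbind(seq(120, 175, by = 5), 63)),
  crs = 4326)

# Reproject first, then validate, then clip to the buffered extent
river_clipped <- toy_river |>
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)
```

Comparing sf::st_length(river_clipped) with the length of the unclipped, reprojected line shows how much of the layer falls outside the map area. The vignette applies the same three steps to each downloaded layer: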
land <- rnaturalearth::ne_download(scale = "large", type = "land",
                                   category = "physical") |>
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)
ocean <- rnaturalearth::ne_download(scale = "large", type = "ocean",
                                    category = "physical") |>
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)
rivers <- rnaturalearth::ne_download(scale = "large",
                                     type = "rivers_lake_centerlines",
                                     category = "physical") |>
  dplyr::filter(scalerank < 7) |> # retain only relatively large rivers
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)
lakes <- rnaturalearth::ne_download(scale = "large",
                                    type = "lakes",
                                    category = "physical") |>
  dplyr::filter(scalerank < 7) |> # retain only relatively large lakes
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)
places <- rnaturalearth::ne_download(scale = "large",
                                     type = "populated_places",
                                     category = "cultural") |>
  dplyr::filter(SCALERANK < 4) |> # retain only relatively large places
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)
First, we create the base map using the land, ocean, and lake polygons, specifying fill colours for each layer and omitting borders for a clean look.
language_map <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = land, fill = "white", color = NA) +
  ggplot2::geom_sf(data = ocean, fill = "#D0E1F2", color = NA) +
  ggplot2::geom_sf(data = lakes, fill = "#7FB4D6", color = NA)
Next, we add graticules to the map, which provide a subtle reference for latitude and longitude.
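The graticule code is not shown in this text. One possible approach, sketched here under assumptions, is to build the lines with sf::st_graticule() and draw them as an ordinary geom_sf() layer. The toy extent, the 10° and 5° spacing, and the line styling are illustrative choices, not values taken from the original map.

```r
library(sf)
library(ggplot2)

arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

# Toy map extent (hypothetical), defined in lon/lat and reprojected
toy_extent <- sf::st_sfc(
  sf::st_polygon(list(rbind(
    c(140, 55), c(178, 55), c(178, 72), c(140, 72), c(140, 55)))),
  crs = 4326) |>
  sf::st_transform(arctic_proj)

# Meridians and parallels at the chosen degrees, returned in the CRS of
# toy_extent and cropped to its bounding box
graticule <- sf::st_graticule(
  toy_extent,
  lon = seq(145, 175, by = 10),
  lat = seq(55, 70, by = 5))

# Draw the graticule as thin grey lines on top of a plain background
toy_map <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = toy_extent, fill = "white", color = NA) +
  ggplot2::geom_sf(data = graticule, color = "grey70", linewidth = 0.2)
```

Drawing the graticule as its own layer keeps its colour and line width under direct control, and it can be placed below the language polygons in the layer order.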
We now overlay the Even, Koryak, and Chukchi language polygons on the base map, using semi-transparent fills to allow the underlying geographic context to remain visible.
language_map <- language_map +
  ggplot2::geom_sf(data = even_kory_chuk,
                   ggplot2::aes(fill = name), color = NA, alpha = 0.5)
Next, we add populated places to the map, along with their labels.
Labels are slightly offset vertically using nudge_y to
improve readability.
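A minimal sketch of this step on toy data follows. The two points use approximate coordinates for Magadan and Anadyr, and the name column is an assumption made for the example; the label column in the downloaded Natural Earth data may be named differently. Because the layer is in a projected CRS, nudge_y is given in metres.

```r
library(sf)
library(ggplot2)

arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

# Toy populated places (approximate coordinates, for illustration only)
toy_places <- sf::st_sf(
  name = c("Magadan", "Anadyr"),
  geometry = sf::st_sfc(
    sf::st_point(c(150.80, 59.56)),
    sf::st_point(c(177.51, 64.73)),
    crs = 4326)) |>
  sf::st_transform(arctic_proj)

# Points plus labels; labels are shifted 40 km north of each point
toy_map <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = toy_places, size = 1, color = "grey20") +
  ggplot2::geom_sf_text(data = toy_places,
                        ggplot2::aes(label = name),
                        size = 3, nudge_y = 40000)
```

The offset of 40000 is a hypothetical value; a suitable distance depends on the map scale and the label size.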
We now add a legend to the map, assigning custom colours to each language. The legend title is removed for a cleaner appearance.
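One way to do this is with scale_fill_manual(), sketched below on three toy squares. The colour values and the assumption that the name column holds exactly "Even", "Koryak", and "Chukchi" are illustrative; the names in the vector must match the values actually present in the data.

```r
library(sf)
library(ggplot2)

# Helper for the toy data: a unit square with its lower-left corner at (x0, y0)
make_square <- function(x0, y0) {
  sf::st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0))))
}

# Toy language polygons (hypothetical, no CRS needed for this illustration)
toy_languages <- sf::st_sf(
  name = c("Even", "Koryak", "Chukchi"),
  geometry = sf::st_sfc(
    make_square(0, 0), make_square(2, 0), make_square(4, 0)))

# Named vector: names must match the values of the column mapped to fill
language_colours <- c(
  Even = "#1B9E77", Koryak = "#D95F02", Chukchi = "#7570B3")

toy_map <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = toy_languages,
                   ggplot2::aes(fill = name), color = NA, alpha = 0.5) +
  # name = NULL removes the legend title
  ggplot2::scale_fill_manual(values = language_colours, name = NULL)
```

Using a named vector keeps each colour tied to its language regardless of the order in which the polygons appear in the data.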
The visible extent of the final map is a slightly smaller
rectangle than the clipped data extent. This ensures that features near
the edges are not truncated and gives the map some margin for a cleaner
appearance. We calculate the bounding box of the
clipped data (clip_extent) and apply an offset to create
the visible map extent. We set expand = FALSE so that the map is drawn
exactly within the specified bounds, without padding, and
datum = NA to suppress the graticule that coord_sf() would otherwise
generate automatically.
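The extent calculation can be sketched as follows, using a toy rectangle in place of the real clip_extent. The offset of 300000 metres is a hypothetical value chosen for the example, not the one used for the original map. Note that datum = NA also turns off the graticule that coord_sf() would otherwise draw.

```r
library(sf)
library(ggplot2)

arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

# Toy stand-in for the buffered data extent (coordinates in metres)
clip_extent <- sf::st_sfc(
  sf::st_polygon(list(rbind(
    c(-1000000, -750000), c(1000000, -750000), c(1000000, 750000),
    c(-1000000, 750000), c(-1000000, -750000)))),
  crs = arctic_proj)

# Shrink the bounding box by a fixed offset on every side
bb <- sf::st_bbox(clip_extent)
offset <- 300000 # hypothetical margin in metres
xlim <- c(bb[["xmin"]] + offset, bb[["xmax"]] - offset)
ylim <- c(bb[["ymin"]] + offset, bb[["ymax"]] - offset)

# Limits are interpreted in the CRS of the plotted layers
toy_map <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = clip_extent, fill = "#D0E1F2", color = NA) +
  ggplot2::coord_sf(xlim = xlim, ylim = ylim,
                    expand = FALSE, datum = NA)
```

Because the data extend beyond the visible rectangle on every side, layers fill the panel right up to the border.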
We apply a minimal theme with theme_void() to remove
axes and background elements, and add a grey border around the map for
clarity. We also move the legend inside the plot and give it a white
background.
language_map <- language_map +
  ggplot2::theme_void() +
  ggplot2::theme(
    panel.border = ggplot2::element_rect(
      color = "grey", fill = NA, linewidth = 0.5),
    legend.position = "inside",
    legend.position.inside = c(0.95, 0.05),
    legend.justification = c(1, 0),
    legend.background = ggplot2::element_rect(fill = "white", color = NA),
    legend.box.background = ggplot2::element_rect(fill = "white", color = NA),
    legend.margin = ggplot2::margin(10, 10, 10, 10)
  )
The map below illustrates the speaker areas of Even, Chukchi, and Koryak in eastern Siberia. While it was generated using the code shown above, it is included here as a pre-rendered figure.
To acknowledge the primary sources used to construct the map, we can
collect the scientific references associated with all language polygons
included in the data. For any Glottography object, such as a complete
collection or a set of languages, collect_sources() returns
the corresponding bibliographic entries in BibTeX format. These
references can be printed to the console or written to a BibTeX file
for inclusion in a publication or supplementary materials.
Primary Citation: Ranacher et al. (2026). Glottography: an open-source geolinguistic data platform for mapping the world’s languages. Journal of Open Humanities Data. doi:10.5334/johd.459
Advanced Mapping: An advanced vignette on
creating language maps with Rglottography using optional
third-party packages is provided in
vignette("mapping_languages", package = "Rglottography").
Documentation: Consult the package documentation
for each function for detailed usage information (e.g.,
?load_datasets or ?Rglottography).