Mapping languages with Rglottography

This vignette demonstrates how to create language maps using Rglottography. An introduction to the package’s basic functionality is available in getting_started.Rmd.

Note that the code used to generate the map is shown but not executed during package checks.

Loading Glottography Language Polygons

We begin by loading the Glottography dataset matsumae2021exploring and extracting the language-level polygons. Using dplyr::filter() and the relevant Glottocodes, we select the Even, Koryak, and Chukchi languages. The resulting polygons are then reprojected to a custom Albers equal-area conic projection centred over northeastern Siberia (arctic_proj). This projection was identified using the Projection Wizard, an online tool for selecting map projections tailored to specific geographic regions. As a planar, equal-area projection with units in metres, it preserves area and allows buffering and clipping to be carried out in metres. It does not preserve distances exactly, however, so buffer distances should be read as approximate.


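library(Rglottography)

glottography <- load_datasets("matsumae2021exploring")
languages <- glottography$languages

# Albers equal-area conic projection centred on northeastern Siberia,
# as suggested by the Projection Wizard
arctic_proj <- paste0(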
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

even_kory_chuk <- languages |> 
  dplyr::filter(glottocode %in% c("even1260", "kory1246", "chuk1273")) |>
  sf::st_transform(arctic_proj)
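To see what "equal-area" means in practice, the following sketch compares the areas of two hypothetical 1 by 1 degree cells, computed once from geographic coordinates and once after projecting them with the arctic_proj string. It assumes only the sf package; the cells are made up and are not Glottography data.

```r
library(sf)

# Same PROJ string as arctic_proj above (Albers equal-area conic)
arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

# Hypothetical 1 x 1 degree cells at two latitudes (not Glottography data)
cell <- function(lon, lat) {
  sf::st_polygon(list(rbind(c(lon, lat), c(lon + 1, lat), c(lon + 1, lat + 1),
                            c(lon, lat + 1), c(lon, lat))))
}
cells <- sf::st_sfc(cell(160, 55), cell(160, 70), crs = 4326)

# Area from geographic coordinates (sf uses a spherical model via s2 by default)
area_geo <- as.numeric(sf::st_area(cells))
# Planar area of the same cells after projection
area_proj <- as.numeric(sf::st_area(sf::st_transform(cells, arctic_proj)))

area_geo / 1e6                  # km^2: the northern cell is much smaller
round(area_proj / area_geo, 3)  # both ratios close to 1
```

Both cells cover the same number of square degrees, yet the northern one is far smaller on the ground, which is why buffering or measuring in degrees is misleading this far north. The small remaining gap between the two estimates is expected: by default sf computes areas of longitude/latitude data on a sphere, whereas the projection uses the WGS84 ellipsoid.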

Loading Natural Earth Data

We load Natural Earth data for landmasses, oceans, and lakes (polygons), rivers (lines), and populated places (points). Because these datasets are global, we reproject them to our custom Albers equal-area projection (arctic_proj) and then clip them to the data extent. Reprojecting first matters here: in geographic coordinates (EPSG:4326), spatial operations such as intersection and buffering can behave unexpectedly at high latitudes and near the 180° meridian, which runs through the study region. In a planar projection tailored to the region, these operations work in metres and behave predictably.

The data extent is defined as a buffered bounding box around the Even, Koryak, and Chukchi polygons:

clip_extent <- even_kory_chuk |> 
  sf::st_bbox() |>
  sf::st_as_sfc() |>
  sf::st_buffer(dist = 700000)  # buffer in metres
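Buffering a bounding box behaves in a specific way: every edge is offset outward by dist, so the bounding box of clip_extent is the original box grown by exactly 700 km on each side, while the default round joins turn the corners into quarter circles. The sketch below checks both properties on a hypothetical rectangle; it needs only sf.

```r
library(sf)

# Hypothetical rectangle standing in for the bounding box of the language
# polygons; planar coordinates in metres, no CRS attached
rect <- sf::st_sfc(sf::st_polygon(list(rbind(
  c(0, 0), c(1e6, 0), c(1e6, 5e5), c(0, 5e5), c(0, 0)))))

clip_extent <- rect |>
  sf::st_bbox() |>
  sf::st_as_sfc() |>
  sf::st_buffer(dist = 700000)

# Growth of the bounding box on each side (xmin, ymin, xmax, ymax)
grow <- as.numeric(sf::st_bbox(clip_extent)) - as.numeric(sf::st_bbox(rect))
grow

# Round joins cut the corners: the buffer is smaller than its own bounding box
area_buffer <- as.numeric(sf::st_area(clip_extent))
area_box <- as.numeric(sf::st_area(sf::st_as_sfc(sf::st_bbox(clip_extent))))
area_buffer < area_box
```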

We then load the Natural Earth data; ne_download() fetches it from the Natural Earth website, so an internet connection is required. Each dataset is first reprojected, then repaired with st_make_valid() to fix geometry problems that reprojection can introduce, and finally clipped to the buffered data extent. Clipping keeps only the features relevant to the map, reduces the amount of data handled in later steps, and helps avoid errors in spatial operations at high latitudes.

land <- rnaturalearth::ne_download(scale = "large", type = "land", 
                                   category = "physical") |>
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)

ocean <- rnaturalearth::ne_download(scale = "large", type = "ocean",
                                    category = "physical") |>
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)

rivers <- rnaturalearth::ne_download(scale = "large", 
                                     type = "rivers_lake_centerlines",
                                     category = "physical") |> 
  dplyr::filter(scalerank < 7) |>  # retain only relatively large rivers
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)

lakes <- rnaturalearth::ne_download(scale = "large",
                                    type = "lakes", 
                                    category = "physical") |>
  dplyr::filter(scalerank < 7) |>  # retain only relatively large lakes
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)

places <- rnaturalearth::ne_download(scale = "large", 
                                     type = "populated_places", 
                                     category = "cultural") |>
  dplyr::filter(SCALERANK < 4) |>  # retain only relatively large places
  sf::st_transform(arctic_proj) |>
  sf::st_make_valid() |>
  sf::st_intersection(clip_extent)
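Because all five layers pass through the same three steps, the pipeline could be wrapped in a small function. The sketch below uses a hypothetical helper, clip_to_extent(), which is not part of Rglottography or sf, and applies it to two toy points instead of downloaded data, so it runs offline.

```r
library(sf)

arctic_proj <- paste0(
  "+proj=aea +lon_0=159.609375 +lat_1=51.7903305 ",
  "+lat_2=74.6587299 +lat_0=63.2245302 +datum=WGS84 ",
  "+units=m +no_defs")

# Hypothetical helper (not part of Rglottography or sf): reproject, repair,
# then clip a layer to a polygon extent in the same projected CRS
clip_to_extent <- function(x, crs, extent) {
  x |>
    sf::st_transform(crs) |>
    sf::st_make_valid() |>
    sf::st_intersection(extent)
}

# Toy point layer in WGS84: one point near the projection centre, one far away
pts <- sf::st_sf(
  NAME = c("near_centre", "far_away"),
  geometry = sf::st_sfc(sf::st_point(c(160, 63)),
                        sf::st_point(c(37.6, 55.8)), crs = 4326))

# Stand-in clip extent: a 700 km disc around the projection origin (0, 0)
extent <- sf::st_buffer(sf::st_sfc(sf::st_point(c(0, 0)), crs = arctic_proj),
                        dist = 700000)

# st_intersection() warns that attributes are assumed spatially constant;
# that is expected here, so the warning is silenced
clipped <- suppressWarnings(clip_to_extent(pts, arctic_proj, extent))
clipped$NAME
```

Filtering by rank before reprojection, as in the code above, means fewer features go through the more expensive reprojection, repair, and intersection steps.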

Creating the base map

First, we create the base map using the land, ocean, and lake polygons, specifying fill colours for each layer and omitting borders for a clean look.

language_map <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = land, fill = "white", color = NA) +
  ggplot2::geom_sf(data = ocean, fill = "#D0E1F2", color = NA) +
  ggplot2::geom_sf(data = lakes, fill = "#7FB4D6", color = NA)
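The rivers layer loaded earlier is not drawn anywhere in the code shown. If you want it on the map, it can be added like any other sf layer; the sketch below uses toy geometries in place of the Natural Earth data, and the colour and line width are only suggestions.

```r
library(sf)
library(ggplot2)

# Toy stand-ins for the land polygons and river lines (planar units, no CRS)
land <- sf::st_sf(geometry = sf::st_sfc(sf::st_polygon(list(rbind(
  c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))))))
rivers <- sf::st_sf(geometry = sf::st_sfc(sf::st_linestring(rbind(
  c(1, 1), c(5, 6), c(9, 8)))))

base_map <- ggplot2::ggplot() +
  ggplot2::geom_sf(data = land, fill = "white", color = NA) +
  # Rivers as thin lines in the lake colour; added after the polygons so
  # they are drawn on top of them
  ggplot2::geom_sf(data = rivers, color = "#7FB4D6", linewidth = 0.3)

length(base_map$layers)
```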

Next, we add graticules to the map, which provide a subtle reference for latitude and longitude.

graticules <- sf::st_graticule(lon = c(150, 180),  # meridians (degrees east)
                               lat = c(50, 80)) |>  # parallels (degrees north)
  sf::st_transform(arctic_proj) |>
  sf::st_geometry() |>
  sf::st_intersection(clip_extent)
  
language_map <- language_map +
  ggplot2::geom_sf(data = graticules, color = "grey70", linetype = "dashed", linewidth = 0.3)
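sf::st_graticule() takes the longitudes of the meridians through its lon argument and the latitudes of the parallels through lat. The quick check below, independent of the map data, lists the lines it returns. Because the function also has a ... argument, a misspelt argument name may be silently ignored instead of raising an error, so such a check can be worthwhile.

```r
library(sf)

# Meridians at 150 E and 180 plus parallels at 50 N and 80 N (WGS84)
grat <- sf::st_graticule(lon = c(150, 180), lat = c(50, 80))

# Column 'type' is "E" for meridians and "N" for parallels;
# 'degree' holds the corresponding longitude or latitude
sf::st_drop_geometry(grat)[, c("type", "degree")]
```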

Overlaying Languages and Geographic Features

We now overlay the Even, Koryak, and Chukchi language polygons on the base map, using semi-transparent fills to allow the underlying geographic context to remain visible.

language_map <- language_map +
  ggplot2::geom_sf(data = even_kory_chuk, 
                   ggplot2::aes(fill = name), color = NA, alpha = 0.5)

Next, we add populated places to the map, along with their labels. Labels are offset slightly upwards with nudge_y to improve readability; the offset is given in projected map units, here metres.

language_map <- language_map +
    ggplot2::geom_sf(data = places) +
    ggplot2::geom_sf_text(
      data = places,
      ggplot2::aes(label = NAME),
      size = 3,
      nudge_y = 70000  # vertical offset in metres
    )

Adding a legend

We now add a legend to the map, assigning custom colours to each language. The legend title is removed for a cleaner appearance.

color <- c("#CC79A7", "#E69F00", "#009E73")
names(color) <- even_kory_chuk$name

language_map <- language_map +
    ggplot2::scale_fill_manual(
      values = color,
      labels = names(color),
      guide = ggplot2::guide_legend(title = NULL) 
    )
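Because color is a named vector, scale_fill_manual() matches colours to languages by name, not by the order in which the rows appear in the data. The sketch below demonstrates this on a toy data frame whose rows are deliberately in a different order from the colour vector; it uses ggplot2 only, and the language names serve merely as labels.

```r
library(ggplot2)

# Toy data: rows deliberately not in the same order as the colour vector
df <- data.frame(x = 1:3, y = 1, name = c("Koryak", "Even", "Chukchi"))
color <- c(Even = "#CC79A7", Koryak = "#E69F00", Chukchi = "#009E73")

p <- ggplot2::ggplot(df, ggplot2::aes(x, y, fill = name)) +
  ggplot2::geom_col() +
  ggplot2::scale_fill_manual(values = color)

# Fill colour actually assigned to each bar, ordered by x position
ld <- ggplot2::layer_data(p)
ld$fill[order(ld$x)]
```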

Map extent

The visible map extent is a rectangle slightly smaller than the clipped data extent. This keeps the edges where the Natural Earth layers were cut off outside the frame and leaves a margin around the language areas. In the code below, we calculate the bounding box of the clipped data extent (clip_extent) and shrink it by an offset on each side to obtain the visible extent; the limits are given in the projected coordinates (metres) of arctic_proj. Setting expand = FALSE draws the map exactly within these limits, without ggplot2's default padding, and datum = NA turns off the graticule and axis labels that coord_sf() would otherwise add, since we draw our own graticules.

bbox <- sf::st_bbox(clip_extent)
offset <- 200000  # inset in metres on each side

language_map <- language_map +
  ggplot2::coord_sf(crs = arctic_proj,
                    xlim = c(bbox["xmin"] + offset, bbox["xmax"] - offset),
                    ylim = c(bbox["ymin"] + offset, bbox["ymax"] - offset),
                    expand = FALSE, datum = NA)
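One subtlety: clip_extent was created with st_buffer(), whose default round joins give it rounded corners, while xlim and ylim describe a rectangle. The corners of the visible window are covered by data only if the offset is at least dist * (1 - 1/sqrt(2)), about 205 km for a 700 km buffer, and slightly more in practice because the rounded corners are approximated by straight segments. With offset = 200000, each corner of the window therefore falls about 7 km outside the clipped area. The sketch below checks this on a hypothetical rectangle; whether such slivers are noticeable at map scale is another matter.

```r
library(sf)

# Hypothetical bounding box of the language polygons (planar metres, no CRS)
core <- sf::st_as_sfc(sf::st_bbox(c(xmin = 0, ymin = 0, xmax = 1e6, ymax = 8e5)))
clip_extent <- sf::st_buffer(core, dist = 700000)
bbox <- sf::st_bbox(clip_extent)

# Visible window for a given inset, as passed to coord_sf() via xlim/ylim
view_window <- function(offset) {
  sf::st_as_sfc(sf::st_bbox(c(
    xmin = bbox[["xmin"]] + offset, ymin = bbox[["ymin"]] + offset,
    xmax = bbox[["xmax"]] - offset, ymax = bbox[["ymax"]] - offset)))
}

# TRUE if the clipped data area covers the whole visible window
covers_view <- function(offset) {
  sf::st_covers(clip_extent, view_window(offset), sparse = FALSE)[1, 1]
}

covers_view(200000)  # corners lie about 7 km outside the rounded buffer
covers_view(210000)
```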

Setting the map theme

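We apply a minimal theme with theme_void() to remove axes and background elements, and add a thin grey border around the panel. We also move the legend inside the plot, placing its bottom-right corner (legend.justification = c(1, 0)) at the relative position (0.95, 0.05), and give it a white background. The "inside" legend position and the legend.position.inside setting require ggplot2 3.5.0 or later.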

language_map <- language_map + 
  ggplot2::theme_void() +
  ggplot2::theme(
    panel.border = ggplot2::element_rect(
      color = "grey", fill = NA, linewidth = 0.5),
    legend.position = "inside",
    legend.position.inside = c(0.95, 0.05),
    legend.justification = c(1, 0),
    legend.background = ggplot2::element_rect(fill = "white", color = NA),
    legend.box.background = ggplot2::element_rect(fill = "white", color = NA),
    legend.margin = ggplot2::margin(10, 10, 10, 10)
  )

Displaying the map

The map below illustrates the speaker areas of Even, Chukchi, and Koryak in eastern Siberia. While it was generated using the code shown above, it is included here as a pre-rendered figure.
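To display the map yourself, print language_map in an interactive session; to write it to a file, ggplot2::ggsave() can be used. The sketch below saves a trivial stand-in plot so that it runs on its own; the file name and the 8 by 6 inch size are arbitrary choices, and language_map would be passed in the same way.

```r
library(ggplot2)

# Trivial stand-in plot; language_map from the steps above works the same way
p <- ggplot2::ggplot(data.frame(x = 1:3, y = 1:3), ggplot2::aes(x, y)) +
  ggplot2::geom_point()

# The .pdf extension selects the PDF device
out <- file.path(tempdir(), "language_map.pdf")
ggplot2::ggsave(out, plot = p, width = 8, height = 6, units = "in")
file.exists(out)
```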

Collecting references

To acknowledge the primary sources used to construct the map, we can collect the scientific references associated with all language polygons included in the data. For any Glottography object, such as a complete collection or a set of languages, collect_sources() returns the corresponding bibliographic entries in BibTeX format. These references can be printed to the console or written to a BibTeX file for inclusion in publications or supplementary materials.

sources <- collect_sources(even_kory_chuk)

References and Further Reading

Primary Citation: Ranacher et al. (2026). Glottography: an open-source geolinguistic data platform for mapping the world’s languages. Journal of Open Humanities Data. doi:10.5334/johd.459
Advanced Mapping: An advanced vignette on creating language maps with Rglottography using optional third-party packages is provided in vignette("mapping_languages", package = "Rglottography").
Documentation: Consult the package documentation for each function for detailed usage information (e.g., ?load_datasets or ?Rglottography).