Visualization with ggplot2

library(bayesiansurpriser)
library(sf)
library(ggplot2)

Overview

The bayesiansurpriser package provides seamless ggplot2 integration through custom scales and computed surprise values that can be mapped to aesthetics.

Loading Example Data

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

Basic Workflow: Compute then Plot

The recommended workflow is to compute surprise first, then use ggplot2:

# Compute surprise
result <- surprise(nc, observed = SID74, expected = BIR74)

# Plot with ggplot2 using geom_sf
ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise() +
  labs(title = "Bayesian Surprise Map")

Example ggplot2 visualization of Bayesian surprise values.

Color Scales

Sequential Scale: scale_fill_surprise()

For absolute surprise values (always positive):

ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(option = "inferno") +
  labs(title = "Inferno Palette")

Example ggplot2 visualization of Bayesian surprise values.

Available viridis options: “viridis”, “magma”, “plasma”, “inferno”, “cividis”, “rocket”, “mako”, “turbo”

p1 <- ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(option = "viridis") +
  labs(title = "Viridis")

p2 <- ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(option = "plasma") +
  labs(title = "Plasma")

p1
p2

Example ggplot2 visualization of Bayesian surprise values.Example ggplot2 visualization of Bayesian surprise values.

Diverging Scale: scale_fill_surprise_diverging()

For signed surprise (positive = over-representation, negative = under-representation):

ggplot(result) +
  geom_sf(aes(fill = signed_surprise)) +
  scale_fill_surprise_diverging() +
  labs(title = "Diverging Scale for Signed Surprise")

Example ggplot2 visualization of Bayesian surprise values.

Custom colors:

ggplot(result) +
  geom_sf(aes(fill = signed_surprise)) +
  scale_fill_surprise_diverging(
    low = "#2166AC",   # Blue
    mid = "#F7F7F7",   # Light gray
    high = "#B2182B"   # Red
  ) +
  labs(title = "Custom Diverging Colors")

Example ggplot2 visualization of Bayesian surprise values.

Binned Scale: scale_fill_surprise_binned()

For discrete categories:

ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise_binned(n.breaks = 5) +
  labs(title = "Binned Surprise Scale")

Example ggplot2 visualization of Bayesian surprise values.

Combining with Other ggplot2 Elements

Adding Labels

# Top 5 most surprising counties
top5 <- result[order(-result$surprise), ][1:5, ]

ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  geom_sf_text(data = top5, aes(label = NAME), size = 3) +
  scale_fill_surprise() +
  labs(title = "Top 5 Most Surprising Counties Labeled")
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
#> give correct results for longitude/latitude data

Example ggplot2 visualization of Bayesian surprise values.

Faceting

# Compare two time periods
result74 <- surprise(nc, observed = SID74, expected = BIR74)
result79 <- surprise(nc, observed = SID79, expected = BIR79)

result74$period <- "1974-78"
result79$period <- "1979-84"

combined <- rbind(result74, result79)

ggplot(combined) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise() +
  facet_wrap(~period) +
  labs(title = "Surprise by Time Period")

Example ggplot2 visualization of Bayesian surprise values.

Theme Customization

ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise(name = "Surprise\n(bits)") +
  labs(
    title = "Bayesian Surprise: NC SIDS Data",
    subtitle = "Identifying unexpectedly high/low SIDS rates",
    caption = "Data: NC SIDS 1974-78"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.key.width = unit(2, "cm")
  )

Example ggplot2 visualization of Bayesian surprise values.

Non-Spatial Data

For non-spatial data, use standard ggplot2 geoms after computing surprise:

# Create example data
df <- data.frame(
  region = LETTERS[1:10],
  observed = c(50, 120, 80, 200, 45, 150, 90, 180, 60, 110),
  expected = c(100, 100, 100, 100, 100, 100, 100, 100, 100, 100) * 10
)

result_df <- surprise(df, observed = observed, expected = expected)

ggplot(result_df, aes(x = reorder(region, -surprise), y = surprise)) +
  geom_col(aes(fill = surprise)) +
  scale_fill_surprise() +
  labs(x = "Region", y = "Surprise (bits)",
       title = "Surprise by Region") +
  theme_minimal()

Example ggplot2 visualization of Bayesian surprise values.

Best Practices

  1. Use diverging scales for signed surprise: Makes interpretation intuitive
  2. Consider binned scales for communication: Discrete categories are easier to read
  3. Label notable regions: Help viewers identify specific areas
  4. Include a legend title with units: “Surprise (bits)” clarifies the measure
  5. Use minimal themes for maps: Reduce visual clutter