# Bayesian Surprise for De-Biasing Thematic Maps in R

**bayesiansurpriser** implements Bayesian Surprise calculations for thematic maps, inspired by Correll & Heer's "Surprise! Bayesian Weighting for De-Biasing Thematic Maps" (IEEE InfoVis 2016). The default calculation normalizes posterior model probabilities and measures how much each observation updates beliefs about a specified model space.

The package integrates with:

- **sf**: Simple Features for spatial data
- **ggplot2**: grammar of graphics for visualization
- Temporal/streaming data analysis
## Installation

```r
# Install from GitHub (development version)
# install.packages("devtools")
devtools::install_github("dshkol/bayesiansurpriser")
```

## Quick start

```r
library(bayesiansurpriser)
library(sf)
library(ggplot2)
```
```r
# Load sample spatial data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Compute Bayesian surprise
result <- surprise(nc, observed = SID74, expected = BIR74)

# View results
print(result)

# Plot with ggplot2
ggplot(result) +
  geom_sf(aes(fill = signed_surprise)) +
  scale_fill_surprise_diverging() +
  labs(title = "Bayesian Surprise: NC SIDS Data") +
  theme_minimal()
```

## Models

- `bs_model_uniform()`: Assumes equiprobable events
- `bs_model_baserate()`: Compares to expected rates (e.g., population)
- `bs_model_gaussian()`: Parametric model for outlier detection
- `bs_model_sampled()`: Non-parametric KDE model
- `bs_model_funnel()`: Accounts for sampling variation
## sf integration

```r
# Works directly with sf objects
result <- st_surprise(nc, observed = SID74, expected = BIR74)
plot(result)
```

## ggplot2 integration
```r
# Custom geom and scales
ggplot(nc) +
  geom_surprise(aes(observed = SID74, expected = BIR74)) +
  scale_fill_surprise()

# Signed surprise with diverging colors
ggplot(nc) +
  geom_surprise(aes(observed = SID74, expected = BIR74), fill_type = "signed") +
  scale_fill_surprise_diverging()
```

## Temporal and streaming data
```r
# Compute surprise over time
result <- surprise_temporal(
  data,
  time_col = year,
  observed = events,
  expected = population
)

# Update with streaming data
result <- update_surprise(result, new_data)
```

## Why Bayesian Surprise?

Traditional thematic maps suffer from several well-known biases. Bayesian Surprise helps address them by comparing observations against explicit models, such as population base rates and sampling-variation models.
## How it works

The default method uses KL divergence to measure "surprise":

```
Surprise = KL(P(M|D) || P(M))
         = Σ_i P(M_i|D) * log(P(M_i|D) / P(M_i))
```

where:

- `P(M)` is the prior probability of model `M`
- `P(M|D)` is the posterior probability after observing data `D`

High surprise means the data substantially updates our beliefs.
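As a worked instance of this formula in plain base R (illustrative only, no package functions involved): with two candidate models, a uniform prior, and a posterior that shifts belief toward the first model, the surprise is:

```r
# Two models with a uniform prior P(M)
prior     <- c(0.5, 0.5)
# Posterior P(M|D) after observing the data
posterior <- c(0.8, 0.2)

# KL(P(M|D) || P(M)); base-2 log gives surprise in bits
surprise_bits <- sum(posterior * log2(posterior / prior))
surprise_bits
#> [1] 0.2780719
```

If the posterior equalled the prior, the surprise would be exactly 0: the data taught us nothing about the model space.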
The original JavaScript demo associated with the paper used an unnormalized per-region score for some map outputs. This package keeps that behavior only as an explicit legacy comparison option (`normalize_posterior = FALSE`); new analyses should use the normalized default.
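For instance, to compare the default against the legacy behavior side by side (assuming, as the note above implies, that `normalize_posterior` is an argument to `surprise()`):

```r
# Default: normalized posterior probabilities (recommended)
result <- surprise(nc, observed = SID74, expected = BIR74)

# Legacy unnormalized score, kept only for comparison with the original JS demo
legacy <- surprise(nc, observed = SID74, expected = BIR74,
                   normalize_posterior = FALSE)
```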
## Reference

Correll, M., & Heer, J. (2017). Surprise! Bayesian Weighting for De-Biasing Thematic Maps. *IEEE Transactions on Visualization and Computer Graphics*, 23(1), 651–660. https://doi.org/10.1109/TVCG.2016.2598839
## License

MIT