The bayesiansurpriser package provides five model types
that represent different hypotheses about how data might be distributed.
Bayesian Surprise measures how much our beliefs about which model is
correct change after observing data.
bs_model_uniform)The uniform model assumes all observations are equally likely regardless of any contextual factors.
model_uniform <- bs_model_uniform()
print(model_uniform)
#> <bs_model: uniform >
#> Name: Uniform
#> Parameters:
#> n_regions :Use when: You want to detect any deviation from uniform distribution.
Likelihood: Each observation has probability 1/n where n is the number of observations.
bs_model_baserate)The base rate model assumes observations follow a known baseline distribution (e.g., population proportions).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# Create base rate model from birth counts
model_baserate <- bs_model_baserate(nc$BIR74)
print(model_baserate)
#> <bs_model: baserate >
#> Name: Base Rate
#> Parameters:
#> expected : [ 100 values]
#> expected_prop : [ 100 values]
#> normalize : TRUEUse when: You have expected values (population, baseline rates) and want to detect deviations from them.
Likelihood: Probability proportional to expected values (normalized to sum to 1).
bs_model_gaussian)The Gaussian model assumes observations follow a normal distribution, either with fixed parameters or fitted from the data.
# Fit from data
model_gaussian_fit <- bs_model_gaussian()
# Fixed parameters
model_gaussian_fixed <- bs_model_gaussian(mu = 100, sigma = 20, fit_from_data = FALSE)
print(model_gaussian_fixed)
#> <bs_model: gaussian >
#> Name: Gaussian
#> Parameters:
#> mu : 100
#> sigma : 20
#> fit_from_data : FALSEUse when: You expect data to follow a normal distribution and want to detect outliers.
Likelihood: Based on normal PDF with given (or estimated) mean and standard deviation.
bs_model_sampled)The sampled model uses kernel density estimation (KDE) to create an empirical distribution from the data itself.
# Sample 20% of data for KDE
model_sampled <- bs_model_sampled(sample_frac = 0.2, kernel = "gaussian")
print(model_sampled)
#> <bs_model: sampled >
#> Name: Sampled (KDE)
#> Parameters:
#> sample_frac : 0.2
#> kernel : gaussian
#> bandwidth : nrd0
#> sample_indices :Use when: You want a data-driven baseline without parametric assumptions.
Likelihood: Evaluated from kernel density estimate of sampled data.
bs_model_funnel)The funnel model accounts for sampling variance, recognizing that smaller samples have higher natural variability.
# Create funnel model with sample sizes
model_funnel <- bs_model_funnel(nc$BIR74, type = "count")
print(model_funnel)
#> <bs_model: funnel >
#> Name: de Moivre Funnel (paper)
#> Parameters:
#> sample_size : [ 100 values]
#> target_rate :
#> type : count
#> formula : paper
#> control_limits : 2, 3Use when: Observations come from samples of different sizes and you want to account for sampling error.
Likelihood: Based on z-scores computed from expected standard error given sample size.
Models are combined into a “model space” with prior probabilities:
# Default model space (uniform + baserate + funnel)
space_default <- default_model_space(nc$BIR74)
print(space_default)
#> <bs_model_space>
#> Models: 3
#> 1. Uniform (prior: 0.3333)
#> 2. Base Rate (prior: 0.3333)
#> 3. de Moivre Funnel (paper) (prior: 0.3333)# Custom model space with explicit prior
space_custom <- model_space(
bs_model_uniform(),
bs_model_baserate(nc$BIR74),
bs_model_gaussian(),
bs_model_funnel(nc$BIR74),
prior = c(0.1, 0.4, 0.2, 0.3),
names = c("Uniform", "Population", "Normal", "Funnel")
)
print(space_custom)
#> <bs_model_space>
#> Models: 4
#> 1. Uniform (prior: 0.1)
#> 2. Population (prior: 0.4)
#> 3. Normal (prior: 0.2)
#> 4. Funnel (prior: 0.3)Different model combinations highlight different aspects of the data:
# Only uniform model - highlights any non-uniform distribution
result_uniform <- surprise(nc, observed = SID74,
models = model_space(bs_model_uniform()))
# Uniform + baserate - highlights deviation from population proportion
result_pop <- surprise(nc, observed = SID74, expected = BIR74,
models = model_space(
bs_model_uniform(),
bs_model_baserate(nc$BIR74)
))
# Full default - accounts for sampling variance too
result_full <- surprise(nc, observed = SID74, expected = BIR74)
# Compare max surprise values
cat("Uniform only, max surprise:", max(result_uniform$surprise), "\n")
#> Uniform only, max surprise: 0
cat("With base rate, max surprise:", max(result_pop$surprise), "\n")
#> With base rate, max surprise: 0.9999773
cat("Full model space, max surprise:", max(result_full$surprise), "\n")
#> Full model space, max surprise: 0.6008295Per-region surprise scores use the model-space priors for each
observation. If you want a single global posterior over models after
observing the full dataset, call bayesian_update()
explicitly:
| Scenario | Recommended Models |
|---|---|
| Raw counts, no baseline | Uniform, Gaussian |
| Counts with population data | Uniform, Base Rate, Funnel |
| Rates/proportions | Uniform, Base Rate |
| Exploratory count/exposure analysis | Uniform, Base Rate, Funnel |
| Small samples prevalent | Include Funnel |
| Unknown distribution | Include Sampled/KDE |