The saeHB.Spatial.Beta package provides several
functions to estimate small area proportions using Hierarchical Bayesian
(HB) methods under spatial and non-spatial models. This package is
specifically designed to accommodate survey design effects (DEFF) and
handle non-sampled areas.
In this vignette, we will demonstrate a complete analytical workflow:
First, load the package along with the provided synthetic dataset
(databeta). We will also load ggplot2 for
visualization. After loading the data, we calculate the variance and the
Relative Standard Error (RSE) of the direct estimates to serve as our
baseline for comparison.
library(saeHB.Spatial.Beta)
library(ggplot2)
# Load data
data("databeta")
# Calculate Variance of Direct Estimator for proportion data considering DEFF
# var(y) = [y * (1 - y) / n_i] * deff
var_direct <- (databeta$y * (1 - databeta$y) / databeta$n_i) * databeta$deff
# Calculate Relative Standard Error (RSE) of Direct Estimation
databeta$rse_direct <- (sqrt(var_direct) / databeta$y) * 100We begin by fitting a baseline non-spatial model that accommodates the survey design effect. Here, we use the default Markov Chain Monte Carlo (MCMC) iterations.
mod_ns_deff <- betadeff_nonspatial(
formula = y ~ x1 + x2,
deff = "deff",
n_i = "n_i",
data = databeta
)Extract the RSE for the non-spatial estimates:
To determine if a spatial model is necessary, we evaluate the spatial autocorrelation of the random effects (\(v\)) from the non-spatial model.
We use the Monte Carlo Permutation approach. This
method is highly recommended because it computes the p-value empirically
by shuffling the spatial units, making it robust. The Analytical
approach (randomization) can be used as an alternative
(mc = FALSE) for very large datasets where computational
speed is a priority.
# Load the spatial weight matrix for the diagnostic test
data("weight_mat")
W_listw <- spdep::mat2listw(weight_mat, style = "W")
# Extract the mean of the random effects (v)
v_ns_deff <- as.numeric(mod_ns_deff$randeff$Estimate)
# Monte Carlo Permutation Approach
set.seed(123)
moran_result <- moran_test(x = v_ns_deff, listw = W_listw, mc = TRUE, nsim = 999)
print(moran_result)
#>
#> Moran's I test under Monte Carlo permutation
#>
#> data: v_ns_deff
#> statistic = 0.35803, observed rank = 1000, p-value = 0.001
#> alternative hypothesis: greaterA significant p-value confirms that the non-spatial model left unexplained spatial structure, heavily justifying the use of spatial models.
We will now fit two spatial models: the Simultaneous Autoregressive (SAR) model and the Leroux Conditional Autoregressive (CAR) model. The SAR model requires a row-standardized weight matrix, while the Leroux CAR model requires a binary adjacency matrix.
# Load the binary adjacency matrix for the Leroux CAR model
data("adjacency_mat")
# 1. Fit Spatial SAR Model
mod_sar_deff <- betadeff_sar(
formula = y ~ x1 + x2,
deff = "deff",
n_i = "n_i",
proxmat = weight_mat,
data = databeta
)
# 2. Fit Spatial Leroux CAR Model
mod_leroux_deff <- betadeff_lerouxcar(
formula = y ~ x1 + x2,
deff = "deff",
n_i = "n_i",
proxmat = adjacency_mat,
data = databeta
)Extract the RSE for both spatial models:
We compare the performance of the Direct Estimator, the Non-Spatial Model, and the Spatial Models. A lower RSE indicates a more reliable and precise estimate.
# Combine RSEs into a single data frame for plotting
df_rse <- data.frame(
Area = seq_along(databeta$y),
Direct = databeta$rse_direct,
Non_Spatial = rse_ns_deff,
Spatial_SAR = rse_sar_deff,
Spatial_Leroux = rse_leroux_deff
)
# Order by Direct RSE for better visualization
df_rse <- df_rse[order(df_rse$Direct), ]
df_rse$Area_Index <- seq_len(nrow(df_rse))
# Plotting the RSE Comparison
ggplot(df_rse, aes(x = Area_Index)) +
# Direct Estimation
geom_line(aes(y = Direct, color = "Direct"), linewidth = 0.8, alpha = 0.6) +
geom_point(aes(y = Direct, color = "Direct"), size = 2, alpha = 0.6) +
# Non-Spatial
geom_line(aes(y = Non_Spatial, color = "HB Beta Deff Non-Spatial"), linewidth = 0.8, alpha = 0.8) +
geom_point(aes(y = Non_Spatial, color = "HB Beta Deff Non-Spatial"), size = 2, alpha = 0.8) +
# Spatial SAR
geom_line(aes(y = Spatial_SAR, color = "HB Beta Deff Spatial SAR"), linewidth = 1) +
geom_point(aes(y = Spatial_SAR, color = "HB Beta Deff Spatial SAR"), size = 2) +
# Spatial Leroux CAR
geom_line(aes(y = Spatial_Leroux, color = "HB Beta Deff Spatial Leroux"), linewidth = 1) +
geom_point(aes(y = Spatial_Leroux, color = "HB Beta Deff Spatial Leroux"), size = 2) +
scale_color_manual(
name = "Estimator",
values = c("Direct" = "#E69F00",
"HB Beta Deff Non-Spatial" = "#56B4E9",
"HB Beta Deff Spatial SAR" = "#009E73",
"HB Beta Deff Spatial Leroux" = "#D55E00")
) +
labs(
title = "Comparison of Relative Standard Error (RSE)",
subtitle = "Lower RSE indicates higher precision",
x = "Area (Ordered by Direct RSE)",
y = "RSE (%)"
) +
theme_minimal() +
theme(legend.position = "bottom")