Getting Started with saeHB.Spatial.Beta

Introduction

The saeHB.Spatial.Beta package provides several functions to estimate small area proportions using Hierarchical Bayesian (HB) methods under spatial and non-spatial models. This package is specifically designed to accommodate survey design effects (DEFF) and handle non-sampled areas.

In this vignette, we will demonstrate a complete analytical workflow:

Calculating Direct Estimates.
Fitting a Non-Spatial Beta Model.
Checking spatial autocorrelation on the random effects using Moran’s I.
Fitting Spatial SAR and Leroux CAR Beta Models.
Comparing the precision using Relative Standard Error (RSE).

Step 1: Preparation and Data Loading

First, load the package along with the provided synthetic dataset (databeta). We will also load ggplot2 for visualization. After loading the data, we calculate the variance and the Relative Standard Error (RSE) of the direct estimates to serve as our baseline for comparison.

library(saeHB.Spatial.Beta)
library(ggplot2)

# Load data
data("databeta")

# Calculate Variance of Direct Estimator for proportion data considering DEFF
# var(y) = [y * (1 - y) / n_i] * deff
var_direct <- (databeta$y * (1 - databeta$y) / databeta$n_i) * databeta$deff

# Calculate Relative Standard Error (RSE) of Direct Estimation
databeta$rse_direct <- (sqrt(var_direct) / databeta$y) * 100

Step 2: Fitting the Non-Spatial Model

We begin by fitting a baseline non-spatial model that accommodates the survey design effect. Here, we use the default Markov Chain Monte Carlo (MCMC) iterations.

mod_ns_deff <- betadeff_nonspatial(
  formula = y ~ x1 + x2,
  deff = "deff",
  n_i = "n_i",
  data = databeta
)

Extract the RSE for the non-spatial estimates:

rse_ns_deff <- (mod_ns_deff$est$Est.Error / mod_ns_deff$est$Estimate) * 100

Step 3: Spatial Autocorrelation Diagnostic (Moran’s I)

To determine if a spatial model is necessary, we evaluate the spatial autocorrelation of the random effects (\(v\)) from the non-spatial model.

We use the Monte Carlo Permutation approach. This method is highly recommended because it computes the p-value empirically by shuffling the spatial units, making it robust. The Analytical approach (randomization) can be used as an alternative (mc = FALSE) for very large datasets where computational speed is a priority.

# Load the spatial weight matrix for the diagnostic test
data("weight_mat")
W_listw <- spdep::mat2listw(weight_mat, style = "W")

# Extract the mean of the random effects (v)
v_ns_deff <- as.numeric(mod_ns_deff$randeff$Estimate)

# Monte Carlo Permutation Approach
set.seed(123) 
moran_result <- moran_test(x = v_ns_deff, listw = W_listw, mc = TRUE, nsim = 999)
print(moran_result)
#> 
#>  Moran's I test under Monte Carlo permutation
#> 
#> data:  v_ns_deff
#> statistic = 0.35803, observed rank = 1000, p-value = 0.001
#> alternative hypothesis: greater

A significant p-value confirms that the non-spatial model left unexplained spatial structure, heavily justifying the use of spatial models.

Step 4: Fitting the Spatial Models

We will now fit two spatial models: the Simultaneous Autoregressive (SAR) model and the Leroux Conditional Autoregressive (CAR) model. The SAR model requires a row-standardized weight matrix, while the Leroux CAR model requires a binary adjacency matrix.

# Load the binary adjacency matrix for the Leroux CAR model
data("adjacency_mat")

# 1. Fit Spatial SAR Model
mod_sar_deff <- betadeff_sar(
  formula = y ~ x1 + x2,
  deff = "deff",
  n_i = "n_i",
  proxmat = weight_mat,
  data = databeta
)


# 2. Fit Spatial Leroux CAR Model
mod_leroux_deff <- betadeff_lerouxcar(
  formula = y ~ x1 + x2,
  deff = "deff",
  n_i = "n_i",
  proxmat = adjacency_mat,
  data = databeta
)

Extract the RSE for both spatial models:

rse_sar_deff <- (mod_sar_deff$est$Est.Error / mod_sar_deff$est$Estimate) * 100
rse_leroux_deff <- (mod_leroux_deff$est$Est.Error / mod_leroux_deff$est$Estimate) * 100

Step 5: Evaluation and Comparison

We compare the performance of the Direct Estimator, the Non-Spatial Model, and the Spatial Models. A lower RSE indicates a more reliable and precise estimate.

# Combine RSEs into a single data frame for plotting
df_rse <- data.frame(
  Area = seq_along(databeta$y),
  Direct = databeta$rse_direct,
  Non_Spatial = rse_ns_deff,
  Spatial_SAR = rse_sar_deff,
  Spatial_Leroux = rse_leroux_deff
)

# Order by Direct RSE for better visualization
df_rse <- df_rse[order(df_rse$Direct), ]
df_rse$Area_Index <- seq_len(nrow(df_rse))

# Plotting the RSE Comparison
ggplot(df_rse, aes(x = Area_Index)) +
  # Direct Estimation
  geom_line(aes(y = Direct, color = "Direct"), linewidth = 0.8, alpha = 0.6) +
  geom_point(aes(y = Direct, color = "Direct"), size = 2, alpha = 0.6) +
  
  # Non-Spatial
  geom_line(aes(y = Non_Spatial, color = "HB Beta Deff Non-Spatial"), linewidth = 0.8, alpha = 0.8) +
  geom_point(aes(y = Non_Spatial, color = "HB Beta Deff Non-Spatial"), size = 2, alpha = 0.8) +
  
  # Spatial SAR
  geom_line(aes(y = Spatial_SAR, color = "HB Beta Deff Spatial SAR"), linewidth = 1) +
  geom_point(aes(y = Spatial_SAR, color = "HB Beta Deff Spatial SAR"), size = 2) +
  
  # Spatial Leroux CAR
  geom_line(aes(y = Spatial_Leroux, color = "HB Beta Deff Spatial Leroux"), linewidth = 1) +
  geom_point(aes(y = Spatial_Leroux, color = "HB Beta Deff Spatial Leroux"), size = 2) +
  
  scale_color_manual(
    name = "Estimator",
    values = c("Direct" = "#E69F00", 
               "HB Beta Deff Non-Spatial" = "#56B4E9", 
               "HB Beta Deff Spatial SAR" = "#009E73",
               "HB Beta Deff Spatial Leroux" = "#D55E00")
  ) +
  labs(
    title = "Comparison of Relative Standard Error (RSE)",
    subtitle = "Lower RSE indicates higher precision",
    x = "Area (Ordered by Direct RSE)",
    y = "RSE (%)"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")