This vignette introduces spatial prediction, multiscale pattern extraction, and uncertainty quantification using the spCF package. Throughout the vignette, we employ the coarse-to-fine spatial modeling (CFSM) framework. CFSM sequentially learns spatial patterns from coarser to finer scales and terminates the learning process when the validation score no longer improves. CFSM is especially suitable for moderate-to-large samples. See Murakami et al. (2026) for further details.
Let us load the required packages
The meuse dataset included in the sp package is used in this vignette. This dataset comprises measurements of heavy metal concentrations in topsoil collected from a floodplain along the River Meuse, along with several covariates. In this example, zinc concentrations observed at 155 sample sites are used to predict concentrations over 3,103 regularly spaced grid cells.
The response variable is the logarithm of zinc concentration. The covariates include the distance to the river and dummy variables indicating flood frequency (ffreq2 and ffreq3):
### Data at samples sites
data(meuse)
y <- log(meuse[,"zinc"]) # Response variable
coords <- meuse[,c("x","y")] # Coordinates
x <- data.frame(dist = meuse[,"dist"],
ffreq2 = as.integer(meuse$ffreq == 2),
ffreq3 = as.integer(meuse$ffreq == 3)) # Covariates
### Data at prediction sites
data(meuse.grid)
coords0 <- meuse.grid[,c("x","y")] # Coordinates
x0 <- data.frame(dist = meuse.grid[,"dist"],
ffreq2 = as.integer(meuse.grid$ffreq == 2),
ffreq3 = as.integer(meuse.grid$ffreq == 3)) # CovariatesZinc concentration is spatially plotted as follows:
obs_s <- st_as_sf( data.frame(coords, zinc=y), coords= c("x","y"), crs=28992)
plot(obs_s[,"zinc"], cex=0.6,pch = 20, nbreaks = 20, key.pos=4, axes=TRUE)In CFSM, the number of spatial scales, R, is optimized through
holdout validation. A smaller R, corresponding to early stopping, allows
the spatial process to capture only coarse-scale patterns, whereas a
larger R enables the spatial process to represent finer-scale patterns.
The cf_lm_hv function performs holdout validation as
follows:
mod_hv <- cf_lm_hv(y = y, x = x, coords = coords)
#> [1] --- SSE: Linear regression ---
#> [1] 7.456608
#> [1] --- SSE: Learning multi-scale spatial process ---
#> [1] 6.722849 (Scale 1)
#> [1] 6.277134 (Scale 2)
#> [1] 6.027553 (Scale 3)
#> [1] 5.835912 (Scale 4)
#> [1] 5.675355 (Scale 5)
#> [1] 5.493684 (Scale 6)
#> [1] 5.289569 (Scale 7)
#> [1] 5.105127 (Scale 8)
#> [1] 4.956676 (Scale 9)
#> [1] 4.714796 (Scale 10)
#> [1] 4.515861 (Scale 11)
#> [1] 4.328215 (Scale 12)
#> [1] 4.158959 (Scale 13)
#> [1] 4.004512 (Scale 14)
#> [1] 4.004512 (Scale 15) no improvement
#> [1] 3.84613 (Scale 16)
#> [1] 3.84613 (Scale 17) no improvement
#> [1] 3.696813 (Scale 18)
#> [1] 3.696813 (Scale 19) no improvement
#> [1] 3.696813 (Scale 20) no improvement
#> [1] 3.501784 (Scale 21)
#> [1] 3.501784 (Scale 22) no improvement
#> [1] 3.501784 (Scale 23) no improvement
#> [1] 3.501784 (Scale 24) no improvement
#> [1] 3.501784 (Scale 25) no improvement
#> [1]
#> [1] -> Selected finest scale: 21 (bandwidth: 174.7008)
#> [1]
#> [1] --- SSE: After coefficient adjustment ---
#> [1] 3.331767In this example, 75% of the samples are randomly selected for
training, and the remaining samples are used for validation
(train_rat = 0.75). Alternatively, the training samples can
be specified explicitly using the id_train argument. As
shown in the output, the sum-of-squares error (SSE) for the validation
samples gradually decreases as learning proceeds from the coarsest scale
(Scale 1) to the finest scale (Scale 21 in this case).
Optionally, an additional learning can be applied to further reduce
the validation SSE. Currently, only random forest is available; it is
implemented by specifying add_learn = "rf" in the
cf_lm_hv function, to capture non-linear patterns and
higher-order interactions among covariates and spatial coordinates.
After holdout validation, the full model is trained using the
cf_lm function:
mod <- cf_lm(y = y, x = x, x0 = x0, coords = coords, coords0 = coords0, mod_hv = mod_hv)
#> [1] --- Learning multi-scale spatial process ---
#> [1] Scale 1 (bandwidth:1436.96)
#> [1] Scale 2 (bandwidth:1293.264)
#> [1] Scale 3 (bandwidth:1163.938)
#> [1] Scale 4 (bandwidth:1047.544)
#> [1] Scale 5 (bandwidth:942.7897)
#> [1] Scale 6 (bandwidth:848.5107)
#> [1] Scale 7 (bandwidth:763.6596)
#> [1] Scale 8 (bandwidth:687.2937)
#> [1] Scale 9 (bandwidth:618.5643)
#> [1] Scale 10 (bandwidth:556.7079)
#> [1] Scale 11 (bandwidth:501.0371)
#> [1] Scale 12 (bandwidth:450.9334)
#> [1] Scale 13 (bandwidth:405.84)
#> [1] Scale 14 (bandwidth:365.256)
#> [1] Scale 15 (bandwidth:328.7304) no improvement (skipped)
#> [1] Scale 16 (bandwidth:295.8574)
#> [1] Scale 17 (bandwidth:266.2717) no improvement (skipped)
#> [1] Scale 18 (bandwidth:239.6445)
#> [1] Scale 19 (bandwidth:215.68) no improvement (skipped)
#> [1] Scale 20 (bandwidth:194.112) no improvement (skipped)
#> [1] Scale 21 (bandwidth:174.7008)In addition to the covariates x and spatial coordinates
coords at the sample sites, the corresponding covariates
(x0) and coordinates (coords0) at the
prediction sites are also specified to enable spatial prediction.
The estimated regression coefficients, standard deviations of the scale-wise spatial processes, and error statistics are displayed as follows:
mod
#> Call:
#> cf_lm(y = y, x = x, coords = coords, x0 = x0, coords0 = coords0,
#> mod_hv = mod_hv)
#>
#> ----Coefficients---------------------------------------
#> coef coef_se lower_95CI upper_95CI
#> Intercept 6.7894420 0.06261576 6.6667151 6.9121688
#> dist -2.5665560 0.20291509 -2.9642696 -2.1688425
#> ffreq2 -0.6275473 0.08986892 -0.8036904 -0.4514043
#> ffreq3 -0.6288418 0.11186986 -0.8481067 -0.4095768
#>
#> ----Standard deviations (influential elements only)----
#> elements standard_deviation
#> 1 xb 0.68082890
#> 2 spatial_scale1 0.09866884
#> 3 spatial_scale2 0.05187771
#> 4 spatial_scale3 0.02740153
#> 5 spatial_scale4 0.01837173
#> 6 spatial_scale5 0.01802175
#> 7 spatial_scale6 0.01810347
#> 8 spatial_scale7 0.01920265
#> 9 spatial_scale8 0.02250761
#> 10 spatial_scale9 0.02380506
#> 11 spatial_scale10 0.02604527
#> 12 spatial_scale11 0.02789054
#> 13 spatial_scale12 0.02746576
#> 14 spatial_scale13 0.02726837
#> 15 spatial_scale14 0.02630008
#> 16 spatial_scale16 0.03005826
#> 17 spatial_scale18 0.03796247
#> 18 spatial_scale21 0.05913625
#> 19 residuals 0.17479384
#>
#> ----Error statistics ----------------------------------
#> stat value
#> 1 validation_R2 0.9507567
#> 2 validation_RMSE 0.2922840
#> 3 residual_SD 0.1747938The predictive values and their standard deviations at the grid cells are mapped as follows:
### Convert gridded points to gridded polygons (for clear visualization)
meuse.grid_sp <- meuse.grid
coordinates(meuse.grid_sp)<- c("x", "y")
gridded(meuse.grid_sp) <- TRUE
meuse.grid_sf <- st_as_sf(as(meuse.grid_sp, "SpatialPolygons"))
st_crs(meuse.grid_sf) <- 28992
### Mapping predictive mean and standard deviations
meuse.grid_sf$pred <- mod$pred0$pred # Predictive mean
meuse.grid_sf$pred_sd <- mod$pred0$pred_sd# Predictive standard deviations
plot(meuse.grid_sf[,"pred"], border = NA, nbreaks = 20, key.pos=4,axes=TRUE)plot(meuse.grid_sf[,"pred_sd"], pal = hcl.colors(9, "Viridis"), border = NA,
nbreaks = 9, key.pos = 4,axes = TRUE)As expected, the standard deviation, which quantifies prediction uncertainty, increases in the east-central region where sample sites are relatively sparse.
The sp_scalewise function extracts scalewise spatial
processes with bandwidth values falling within a pre-specified range.
For example, the following commands extract the large-, moderate-, and
small-scale processes, corresponding to bandwidth ranges of 1000+,
500–1000, and 0–500, respectively.
mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth)
mod_s2<- sp_scalewise(mod,bw_range=c(500,1000)) # Moderate scale (500 <= bandwidth < 1000)
mod_s3<- sp_scalewise(mod,bw_range=c(0,500)) # Small scale (bandwidth < 500)The extracted scalewise processes are mapped as follows:
meuse.grid_sf$z1 <- mod_s1$pred0$pred # Large-scale process
meuse.grid_sf$z2 <- mod_s2$pred0$pred # Moderate-scale process
meuse.grid_sf$z3 <- mod_s3$pred0$pred # Small-scale process
plot(meuse.grid_sf[,c("z1","z2","z3")],border = NA,key.pos=4,nbreaks=40,axes=TRUE)Their standard deviations are displayed as follows:
meuse.grid_sf$z1_sd <- mod_s1$pred0$pred_sd # Large-scale process
meuse.grid_sf$z2_sd <- mod_s2$pred0$pred_sd # Moderate-scale process
meuse.grid_sf$z3_sd <- mod_s3$pred0$pred_sd # Small-scale process
plot(meuse.grid_sf[,c("z1_sd","z2_sd","z3_sd")],
pal = function(n) hcl.colors(n, "Viridis"),
border = NA,key.pos=4,axes=TRUE)The sp_scalewise function is useful for multiscale
spatial pattern analysis (or feature extraction), which is commonly
conducted in ecological, epidemiological, and environmental studies.