Coarse-to-fine spatial modeling (CFSM) using the spCF package

This vignette introduces spatial prediction, multiscale pattern extraction, and uncertainty quantification using the spCF package. Throughout the vignette, we employ the coarse-to-fine spatial modeling (CFSM) framework. CFSM sequentially learns spatial patterns from coarser to finer scales and terminates the learning process when the validation score no longer improves. CFSM is especially suitable for moderate-to-large samples. See Murakami et al. (2026) for further details.

Data and setup

Let us load the required packages

library(spCF)
library(sp)
library(sf)

The meuse dataset included in the sp package is used in this vignette. This dataset comprises measurements of heavy metal concentrations in topsoil collected from a floodplain along the River Meuse, along with several covariates. In this example, zinc concentrations observed at 155 sample sites are used to predict concentrations over 3,103 regularly spaced grid cells.

The response variable is the logarithm of zinc concentration. The covariates include the distance to the river and dummy variables indicating flood frequency (ffreq2 and ffreq3):

### Data at samples sites
data(meuse)
y        <- log(meuse[,"zinc"])     # Response variable
coords   <- meuse[,c("x","y")]      # Coordinates
x        <- data.frame(dist   = meuse[,"dist"],
                       ffreq2 = as.integer(meuse$ffreq == 2),
                       ffreq3 = as.integer(meuse$ffreq == 3)) # Covariates

### Data at prediction sites
data(meuse.grid)
coords0  <- meuse.grid[,c("x","y")] # Coordinates
x0       <- data.frame(dist   = meuse.grid[,"dist"],
                       ffreq2 = as.integer(meuse.grid$ffreq == 2),
                       ffreq3 = as.integer(meuse.grid$ffreq == 3)) # Covariates

Zinc concentration is spatially plotted as follows:

obs_s    <- st_as_sf( data.frame(coords, zinc=y), coords= c("x","y"), crs=28992)
plot(obs_s[,"zinc"], cex=0.6,pch = 20, nbreaks = 20, key.pos=4, axes=TRUE)

Coarse-to-fine spatial modeling

In CFSM, the number of spatial scales, R, is optimized through holdout validation. A smaller R, corresponding to early stopping, allows the spatial process to capture only coarse-scale patterns, whereas a larger R enables the spatial process to represent finer-scale patterns. The cf_lm_hv function performs holdout validation as follows:

mod_hv   <- cf_lm_hv(y = y, x = x, coords = coords)
#> [1] --- SSE: Linear regression ---
#> [1] 7.456608
#> [1] --- SSE: Learning multi-scale spatial process ---
#> [1] 6.722849 (Scale  1)
#> [1] 6.277134 (Scale  2)
#> [1] 6.027553 (Scale  3)
#> [1] 5.835912 (Scale  4)
#> [1] 5.675355 (Scale  5)
#> [1] 5.493684 (Scale  6)
#> [1] 5.289569 (Scale  7)
#> [1] 5.105127 (Scale  8)
#> [1] 4.956676 (Scale  9)
#> [1] 4.714796 (Scale 10)
#> [1] 4.515861 (Scale 11)
#> [1] 4.328215 (Scale 12)
#> [1] 4.158959 (Scale 13)
#> [1] 4.004512 (Scale 14)
#> [1] 4.004512 (Scale 15) no improvement
#> [1]  3.84613 (Scale 16)
#> [1]  3.84613 (Scale 17) no improvement
#> [1] 3.696813 (Scale 18)
#> [1] 3.696813 (Scale 19) no improvement
#> [1] 3.696813 (Scale 20) no improvement
#> [1] 3.501784 (Scale 21)
#> [1] 3.501784 (Scale 22) no improvement
#> [1] 3.501784 (Scale 23) no improvement
#> [1] 3.501784 (Scale 24) no improvement
#> [1] 3.501784 (Scale 25) no improvement
#> [1] 
#> [1] -> Selected finest scale: 21 (bandwidth: 174.7008)
#> [1] 
#> [1] --- SSE: After coefficient adjustment ---
#> [1] 3.331767

In this example, 75% of the samples are randomly selected for training, and the remaining samples are used for validation (train_rat = 0.75). Alternatively, the training samples can be specified explicitly using the id_train argument. As shown in the output, the sum-of-squares error (SSE) for the validation samples gradually decreases as learning proceeds from the coarsest scale (Scale 1) to the finest scale (Scale 21 in this case).

Optionally, an additional learning can be applied to further reduce the validation SSE. Currently, only random forest is available; it is implemented by specifying add_learn = "rf" in the cf_lm_hv function, to capture non-linear patterns and higher-order interactions among covariates and spatial coordinates.

After holdout validation, the full model is trained using the cf_lm function:

mod      <- cf_lm(y = y, x = x, x0 = x0, coords = coords, coords0 = coords0, mod_hv = mod_hv)
#> [1] --- Learning multi-scale spatial process ---
#> [1]  Scale  1 (bandwidth:1436.96)
#> [1]  Scale  2 (bandwidth:1293.264)
#> [1]  Scale  3 (bandwidth:1163.938)
#> [1]  Scale  4 (bandwidth:1047.544)
#> [1]  Scale  5 (bandwidth:942.7897)
#> [1]  Scale  6 (bandwidth:848.5107)
#> [1]  Scale  7 (bandwidth:763.6596)
#> [1]  Scale  8 (bandwidth:687.2937)
#> [1]  Scale  9 (bandwidth:618.5643)
#> [1]  Scale 10 (bandwidth:556.7079)
#> [1]  Scale 11 (bandwidth:501.0371)
#> [1]  Scale 12 (bandwidth:450.9334)
#> [1]  Scale 13 (bandwidth:405.84)
#> [1]  Scale 14 (bandwidth:365.256)
#> [1]  Scale 15 (bandwidth:328.7304) no improvement (skipped)
#> [1]  Scale 16 (bandwidth:295.8574)
#> [1]  Scale 17 (bandwidth:266.2717) no improvement (skipped)
#> [1]  Scale 18 (bandwidth:239.6445)
#> [1]  Scale 19 (bandwidth:215.68) no improvement (skipped)
#> [1]  Scale 20 (bandwidth:194.112) no improvement (skipped)
#> [1]  Scale 21 (bandwidth:174.7008)

In addition to the covariates x and spatial coordinates coords at the sample sites, the corresponding covariates (x0) and coordinates (coords0) at the prediction sites are also specified to enable spatial prediction.

The estimated regression coefficients, standard deviations of the scale-wise spatial processes, and error statistics are displayed as follows:

mod
#> Call:
#> cf_lm(y = y, x = x, coords = coords, x0 = x0, coords0 = coords0, 
#>     mod_hv = mod_hv)
#> 
#> ----Coefficients---------------------------------------
#>                 coef    coef_se lower_95CI upper_95CI
#> Intercept  6.7894420 0.06261576  6.6667151  6.9121688
#> dist      -2.5665560 0.20291509 -2.9642696 -2.1688425
#> ffreq2    -0.6275473 0.08986892 -0.8036904 -0.4514043
#> ffreq3    -0.6288418 0.11186986 -0.8481067 -0.4095768
#> 
#> ----Standard deviations (influential elements only)----
#>           elements standard_deviation
#> 1               xb         0.68082890
#> 2   spatial_scale1         0.09866884
#> 3   spatial_scale2         0.05187771
#> 4   spatial_scale3         0.02740153
#> 5   spatial_scale4         0.01837173
#> 6   spatial_scale5         0.01802175
#> 7   spatial_scale6         0.01810347
#> 8   spatial_scale7         0.01920265
#> 9   spatial_scale8         0.02250761
#> 10  spatial_scale9         0.02380506
#> 11 spatial_scale10         0.02604527
#> 12 spatial_scale11         0.02789054
#> 13 spatial_scale12         0.02746576
#> 14 spatial_scale13         0.02726837
#> 15 spatial_scale14         0.02630008
#> 16 spatial_scale16         0.03005826
#> 17 spatial_scale18         0.03796247
#> 18 spatial_scale21         0.05913625
#> 19       residuals         0.17479384
#> 
#> ----Error statistics ----------------------------------
#>              stat     value
#> 1   validation_R2 0.9507567
#> 2 validation_RMSE 0.2922840
#> 3     residual_SD 0.1747938

Mapping

Predictive values

The predictive values and their standard deviations at the grid cells are mapped as follows:

### Convert gridded points to gridded polygons (for clear visualization)
meuse.grid_sp             <- meuse.grid
coordinates(meuse.grid_sp)<- c("x", "y")
gridded(meuse.grid_sp)    <- TRUE
meuse.grid_sf             <- st_as_sf(as(meuse.grid_sp, "SpatialPolygons"))
st_crs(meuse.grid_sf)     <- 28992

### Mapping predictive mean and standard deviations
meuse.grid_sf$pred        <- mod$pred0$pred   # Predictive mean
meuse.grid_sf$pred_sd     <- mod$pred0$pred_sd# Predictive standard deviations
plot(meuse.grid_sf[,"pred"], border = NA, nbreaks = 20, key.pos=4,axes=TRUE)

plot(meuse.grid_sf[,"pred_sd"], pal = hcl.colors(9, "Viridis"), border = NA, 
     nbreaks = 9, key.pos = 4,axes = TRUE)

As expected, the standard deviation, which quantifies prediction uncertainty, increases in the east-central region where sample sites are relatively sparse.

Multiscale spatial pattern/feature extraction

The sp_scalewise function extracts scalewise spatial processes with bandwidth values falling within a pre-specified range. For example, the following commands extract the large-, moderate-, and small-scale processes, corresponding to bandwidth ranges of 1000+, 500–1000, and 0–500, respectively.

mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth)
mod_s2<- sp_scalewise(mod,bw_range=c(500,1000)) # Moderate scale (500 <= bandwidth < 1000)
mod_s3<- sp_scalewise(mod,bw_range=c(0,500))    # Small scale (bandwidth < 500)

The extracted scalewise processes are mapped as follows:

meuse.grid_sf$z1    <- mod_s1$pred0$pred   # Large-scale process
meuse.grid_sf$z2    <- mod_s2$pred0$pred   # Moderate-scale process
meuse.grid_sf$z3    <- mod_s3$pred0$pred   # Small-scale process
plot(meuse.grid_sf[,c("z1","z2","z3")],border = NA,key.pos=4,nbreaks=40,axes=TRUE)

Their standard deviations are displayed as follows:

meuse.grid_sf$z1_sd <- mod_s1$pred0$pred_sd   # Large-scale process
meuse.grid_sf$z2_sd <- mod_s2$pred0$pred_sd   # Moderate-scale process
meuse.grid_sf$z3_sd <- mod_s3$pred0$pred_sd   # Small-scale process
plot(meuse.grid_sf[,c("z1_sd","z2_sd","z3_sd")],
     pal = function(n) hcl.colors(n, "Viridis"),
     border = NA,key.pos=4,axes=TRUE)

The sp_scalewise function is useful for multiscale spatial pattern analysis (or feature extraction), which is commonly conducted in ecological, epidemiological, and environmental studies.

Reference

Murakami, Daisuke, Alexis Comber, Takahiro Yoshida, Narumasa Tsutsumida, Chris Brunsdon, and Tomoki Nakaya. 2026. “Coarse-to-Fine Spatial Modeling: A Scalable, Machine-Learning-Compatible Framework.” Geographical Analysis 58 (2): e70034.