| Type: | Package |
| Title: | Coarse-to-Fine Spatial Modeling |
| Version: | 0.1.1 |
| Imports: | FNN, fields, nloptr, dbscan, ranger, withr, Rcpp |
| LinkingTo: | Rcpp |
| Suggests: | sp, sf, knitr, rmarkdown, CARBayesdata |
| Description: | Provides functions for coarse-to-fine spatial modeling (CFSM), enabling fast spatial prediction, regression, and uncertainty quantification. This method is suitable for moderate to large samples. For further details, see Murakami et al. (2026) <doi:10.1111/gean.70034>. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2026-05-01 16:12:02 UTC; dmuraka |
| Author: | Daisuke Murakami [aut, cre], Alexis Comber [aut], Takahiro Yoshida [aut], Narumasa Tsutsumida [aut], Chris Brunsdon [aut], Tomoki Nakaya [aut] |
| Maintainer: | Daisuke Murakami <dmuraka@ism.ac.jp> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-01 16:30:09 UTC |
spCF: Coarse-to-Fine Spatial Modeling
Description
Provides functions for coarse-to-fine spatial modeling (CFSM), enabling fast spatial prediction, regression, and uncertainty quantification. Suitable for moderate to large samples.
Author(s)
Maintainer: Daisuke Murakami dmuraka@ism.ac.jp
Authors:
Alexis Comber
Takahiro Yoshida
Narumasa Tsutsumida
Chris Brunsdon
Tomoki Nakaya
Coarse-to-fine spatial generalized linear mixed models (CF-GLMMs)
Description
Prediction and regression via CF-GLMMs.
Usage
cf_glm(
y,
x = NULL,
coords,
offset = NULL,
x0 = NULL,
coords0 = NULL,
offset0 = NULL,
mod_hv
)
Arguments
y |
Vector of response variables (N x 1) including continuous, count, and binary responses, following an exponential family distribution. |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
offset |
Optional. Vector of offset variable (N x 1) to be included
in the linear predictor. It is consistent with that of |
x0 |
Optional. Matrix of covariates at prediction sites (N0 x K). |
coords0 |
Optional. Matrix of 2-dimensional point coordinates at prediction sites (N0 x 2). |
offset0 |
Optional. Vector of offset variables at prediction sites (N0 x 1) |
mod_hv |
Output object of the |
Value
A list with the following elements:
- beta
Regression coefficients, their standard errors, and the lower and upper limits of the 95 percent confidence intervals.
- sd_summary
Standard deviation of the regression term (xb), spatial process (spatial_scale1, spatial_scale2,...), additional learning, and residuals.
- e_summary
Error statistics for the validation samples: pseudo R-squared, root mean squared error (RMSE), and mean absolute error (MAE).
- pred
Predictive means and standard deviations (sample sites).
- pred0
Predictive means and standard deviations (prediction sites).
- pred_q
Predictive quantiles on the response scale at the sample sites. A data frame whose columns
q0.005,q0.025,q0.05,q0.1, ...,q0.9,q0.95,q0.975,q0.995give the corresponding quantile levels, obtained by Gaussian approximation on the link scale followed by inverse-link transformation.- pred0_q
Predictive quantiles on the response scale at the prediction sites. Column structure is identical to
pred_q.NULLwhen prediction sites are not supplied.- bands
Bandwidth values for each scale. The i-th bandwidth is used for the spatial process corresponding to the i-th column of the Z matrix.
- Z
Predictive mean of the spatial process in each scale (sample sites; list).
- Z_sd
Predictive standard deviation of the spatial process in each scale (sample sites; list).
- Z0
Predictive mean of the spatial process in each scale (prediction sites; list).
- Z0_sd
Predictive standard deviation of the spatial process in each scale (prediction sites; list).
- Other
Other internal output objects.
Author(s)
Daisuke Murakami
References
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2025). Coarse-to-fine spatial GLMMs for scalable prediction and multiscale analysis. *ArXiv*.
See Also
Examples
################ Example 1: Count data modeling/Disease mapping/smoothing
set.seed(1234)
require( CARBayesdata )
require( sf )
data(pollutionhealthdata)
data(GGHB.IZ)
### Data
dat <- pollutionhealthdata[pollutionhealthdata$year==2011,]
y <- dat[,"observed"] # count data
x <- dat[,c("pm10","jsa","price")]
offset <- log(dat[,"expected"])
coords <- st_coordinates(st_centroid(GGHB.IZ))
### Holdout validation optimizing the number of spatial scales
mod_hv <- cf_glm_hv(y = y, x = x, offset=offset, coords = coords, family=poisson())
### Spatial modeling and prediction
mod <- cf_glm(y = y, x = x, coords = coords, mod_hv = mod_hv)
mod
### Mapping predictive mean and standard deviations (SD)
GGHB.IZ$y <- y
GGHB.IZ$pred <- mod$pred$pred
GGHB.IZ$pred_sd<- mod$pred$pred_sd
plot(GGHB.IZ[,c("pred")],lwd=0.2,axes=TRUE, key.pos=4,nbreaks=50) # Predictive mean
plot(GGHB.IZ[,c("pred_sd")],lwd=0.2,axes=TRUE, key.pos=4,nbreaks=50)# Predictive SD
### Multiscale spatial pattern/feature extraction
mod_s1 <- sp_scalewise(mod,bw_range=c(4000,Inf)) # Large scale (4000 <= bandwidth)
mod_s2 <- sp_scalewise(mod,bw_range=c(0,4000)) # Small scale (bandwidth <= 4000)
GGHB.IZ$z1 <- mod_s1$pred$pred
GGHB.IZ$z2 <- mod_s2$pred$pred
plot(GGHB.IZ[,c("z1","z2")],lwd=0.2,axes=TRUE,key.pos=4, nbreaks=50)# Extracted features
################ Example 2: Binary data modeling/spatial prediction
set.seed(1234)
require(sp); require(sf)
data(meuse)
data(meuse.grid)
### Data
y <- ifelse(meuse$ffreq==1, 1, 0 )# binary data
coords <- meuse[,c("x","y")]
x <- meuse[,"dist"]
### Data at prediction sites
coords0 <- meuse.grid[,c("x","y")]
x0 <- meuse.grid[,"dist"]
### Holdout validation optimizing the number of spatial scales
mod_hv <- cf_glm_hv(y = y, x = x, coords = coords, family=binomial())
### Spatial modeling and prediction
mod <- cf_glm(y = y, x=x, coords = coords, x0=x0, coords0 = coords0,
mod_hv = mod_hv)
mod
### Mapping predictive mean and standard deviations (SD)
meuse.grid$pred <- mod$pred0$pred
meuse.grid$pred_sd<- mod$pred0$pred_sd
meuse.grid_sf <- st_as_sf(meuse.grid, coords = c("x","y"))
plot(meuse.grid_sf[,"pred"], pch = 15, cex = 0.8, nbreaks = 20) # Predictive mean
plot(meuse.grid_sf[,"pred_sd"], pch = 15, cex = 0.8, nbreaks = 20)# Predictive SD
### Multiscale spatial pattern/feature extraction
mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth)
mod_s2<- sp_scalewise(mod,bw_range=c(0,1000)) # Small scale (0 <= bandwidth <= 1000)
meuse.grid_sf$z1 <- mod_s1$pred0$pred
meuse.grid_sf$z2 <- mod_s2$pred0$pred
plot(meuse.grid_sf[,c("z1","z2")], pch = 15,
cex = 0.5, nbreaks = 20,axes=TRUE) # Predictive means
Holdout validation for coarse-to-fine training of spatial generalized linear mixed models (GLMMs)
Description
Trains a coarse-to-fine spatial GLMMs (CF-GLMMs) and optimizes the spatial scale through progressive holdout validation.
Usage
cf_glm_hv(
y,
x = NULL,
coords,
offset = NULL,
train_rat = 0.75,
id_train = NULL,
alpha = 0.9,
kernel = "exp",
family = gaussian(),
seed = 1234
)
Arguments
y |
Vector of response variables (N x 1) including continuous, count, and binary responses, following an exponential family distribution. |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
offset |
Optional. Vector of offset variable (N x 1) to be included
in the linear predictor. It is consistent with that of |
train_rat |
Training sample ratio (default: 0.75). For small to moderate samples (N <= 30000), samples closest to the k-means centers are used for validation samples. For larger samples, training samples are drawn at random. |
id_train |
Optional. If specified, the corresponding samples are used as training samples. Otherwise, training samples are chosen based on 'train_rat'. |
alpha |
Decay ratio of the kernel bandwidth in the coarse-to-fine training (default: 0.9). As it approaches one, the optimization becomes more stringent but requires longer computation time. |
kernel |
Kernel type for modeling spatial dependence. '"exp"' for the exponential kernel (default) and '"gau"' for the Gaussian kernel. |
family |
Description of the error distribution and link function
consistent with the 'family' argument in the |
seed |
Random seed used for the training/validation split when 'id_train' is not supplied. Defaults to '1234', which makes the split reproducible across calls. Set to 'NULL' to allow each call to draw a different split (useful for assessing sensitivity to the split). |
Value
A list with the following elements:
- loss_hv
Deviance loss for validation samples.
- loss_hv_all
All the deviance losses obtained in each learning step.
- id_train
ID of training samples.
- other
List of other outcomes, which are internally used.
Author(s)
Daisuke Murakami
References
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2025). Coarse-to-fine spatial GLMMs for scalable prediction and multiscale analysis. *ArXiv*.
See Also
Coarse-to-fine spatial modeling (CFSM) for Gaussian response
Description
Prediction and regression via coarse-to-fine spatial modeling.
Usage
cf_lm(y, x = NULL, coords, x0 = NULL, coords0 = NULL, mod_hv)
Arguments
y |
Vector of response variables (N x 1). |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
x0 |
Optional. Matrix of covariates at prediction sites (N0 x K). |
coords0 |
Optional. Matrix of 2-dimensional point coordinates at prediction sites (N0 x 2). |
mod_hv |
Output object of the |
Value
A list with the following elements:
- beta
Regression coefficients, their standard errors, and the lower and upper limits of the 95 percent confidence intervals.
- sd_summary
Standard deviation of the regression term (xb), spatial process (spatial_scale1, spatial_scale2,...), additionally learned components (effective if 'cf_lm_hv/add_learn' is not 'none'), and residuals.
- e_summary
R-squared for the validation samples (validation_R2), root mean squared error for the validation samples (validation_RMSE), and the residual standard deviation (residual_SD).
- pred
Predictive means and standard deviations (sample sites).
- pred0
Predictive means and standard deviations (prediction sites).
- bands
Bandwidth values for each scale. The i-th bandwidth is used to describe the spatial process corresponding to the i-th column of the Z matrix.
- Z
Predictive means of the single-scale processes at each scale, corresponding to each bandwidth value (sample sites; list).
- Z_sd
Predictive standard deviation of the spatial processes corresponding to in each bandwidth (sample sites; list).
- Z0
Predictive mean of the spatial process corresponding to each bandwidth (prediction sites; list).
- Z0_sd
Predictive standard deviation of the spatial process corresponding to in each bandwidth (prediction sites; list).
- Other
Other internal output objects.
Author(s)
Daisuke Murakami
References
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2026). Coarse-to-fine spatial modeling: A scalable, machine-learning-compatible framework. *Geographical Analysis*, 58(2), e70034. https://onlinelibrary.wiley.com/doi/10.1111/gean.70034
See Also
cf_glm, cf_lm_hv, sp_scalewise
Examples
set.seed(123)
require(sp); require(sf)
data(meuse)
data(meuse.grid)
### Data
y <- log(meuse[,"zinc"])
coords <- meuse[,c("x","y")]
x <- data.frame(dist = meuse[,"dist"],
ffreq2 = as.integer(meuse$ffreq == 2),
ffreq3 = as.integer(meuse$ffreq == 3))
### Data at prediction sites
coords0 <- meuse.grid[,c("x","y")]
x0 <- data.frame(dist = meuse.grid[,"dist"],
ffreq2 = as.integer(meuse.grid$ffreq == 2),
ffreq3 = as.integer(meuse.grid$ffreq == 3))
### Holdout validation optimizing the number of spatial scales
mod_hv <- cf_lm_hv(y = y, x = x, coords = coords, add_learn = "none")
### Spatial modeling and prediction
mod <- cf_lm(y = y, x = x, x0 = x0, coords = coords, coords0 = coords0,
mod_hv = mod_hv)
mod
### Mapping predictive mean and standard deviations (SD)
meuse.grid$pred <- mod$pred0$pred
meuse.grid$pred_sd<- mod$pred0$pred_sd
meuse.grid_sf <- st_as_sf(meuse.grid, coords = c("x","y"))
plot(meuse.grid_sf[,"pred"], pch = 15, cex = 0.5, nbreaks = 20) # Predictive mean
plot(meuse.grid_sf[,"pred_sd"], pch = 15, cex = 0.5, nbreaks = 20)# Predictive SD
### Multiscale spatial pattern/feature extraction
mod_s1<- sp_scalewise(mod,bw_range=c(1000,Inf)) # Large scale (1000 <= bandwidth)
mod_s2<- sp_scalewise(mod,bw_range=c(500,1000)) # Middle scale (500 <= bandwidth <= 1000)
mod_s3<- sp_scalewise(mod,bw_range=c(0,500)) # Small scale (bandwidth <= 500)
z1 <- mod_s1$pred0$pred # Predictive mean
z2 <- mod_s2$pred0$pred
z3 <- mod_s3$pred0$pred
z1_sd <- mod_s1$pred0$pred_sd # Predictive SD
z2_sd <- mod_s2$pred0$pred_sd
z3_sd <- mod_s3$pred0$pred_sd
meuse.grid_sf3 <- cbind(meuse.grid_sf, z1, z2, z3, z1_sd, z2_sd, z3_sd)
plot(meuse.grid_sf3[,c("z1","z2","z3")], pch = 15,
cex = 0.5, nbreaks = 20,key.pos=4,axes=TRUE) # Predictive means
plot(meuse.grid_sf3[,c("z1_sd","z2_sd","z3_sd")], pch = 15,
cex = 0.5, nbreaks = 20,key.pos=4,axes=TRUE) # Predictive SD
Holdout validation for the Gaussian coarse-to-fine spatial modeling (CFSM)
Description
Trains the CFSM-based Gaussian spatial regression and optimizes the number of spatial scales through sequential holdout validation.
Usage
cf_lm_hv(
y,
x = NULL,
coords,
train_rat = 0.75,
id_train = NULL,
alpha = 0.9,
kernel = "exp",
add_learn = "none",
seed = 123
)
Arguments
y |
Vector of response variables (N x 1). |
x |
Matrix of covariates (N x K). |
coords |
Matrix of 2-dimensional point coordinates (N x 2). |
train_rat |
Training sample ratio (default: 0.75). For small to moderate samples (N <= 30000), samples closest to the k-means centers are used for validation samples. For larger samples, training samples are drawn at random. |
id_train |
Optional. If specified, the corresponding samples are used as training samples. Otherwise, training samples are chosen based on 'train_rat'. |
alpha |
Decay ratio of the kernel bandwidth in the coarse-to-fine training (default: 0.9). As it approaches one, the optimization becomes more stringent but requires longer computation time. |
kernel |
Kernel type for modeling spatial dependence. '"exp"' for the exponential kernel (default) and '"gau"' for the Gaussian kernel. |
add_learn |
If '"rf"', random forest is additionally trained to capture non-linear patterns and/or higher-order interactions. Default is '"none"', meaning no additional training. |
seed |
Random seed used for the training/validation split when 'id_train' is not supplied. Defaults to '123', which makes the split reproducible across calls. Set to 'NULL' to allow each call to draw a different split (useful for assessing sensitivity to the split). |
Value
A list with the following elements:
- sse_hv
Sum-of-squared error (SSE) for validation samples.
- sse_hv_all
All the SSEs obtained in each learning step.
- id_train
ID of training samples.
- other
List of other outcomes, which are internally used.
Author(s)
Daisuke Murakami
References
Murakami, D., Comber, A., Yoshida, T., Tsutsumida, N., Brunsdon, C., & Nakaya, T. (2025). Coarse-to-fine spatial GLMMs for scalable prediction and multiscale analysis. *ArXiv*.
See Also
Extract scale-wise spatial processes
Description
Evaluate mean and variance of the spatial process with bandwidth values within a pre-specified range
Usage
sp_scalewise(mod, bw_range = c(0, Inf))
Arguments
mod |
|
bw_range |
Range of bandwidth values of the simulated spatial processes. For example, if bw_range = c(10, 20), spatial processes with bandwidths between 10 and 20 are synthesized and simulated. The default is c(0, Inf), which synthesizes all scales. |
Value
A list with the following elements:
- pred
Means and standard deviations of the spatial process (sample sites).
- pred0
Means and standard deviations of the spatial process (prediction sites).
NULLwhenmodwas fitted without prediction sites.
Author(s)
Daisuke Murakami