| Type: | Package |
| Title: | Fast and Scalable Cellwise-Robust Ensemble |
| Version: | 3.0.0 |
| Date: | 2026-06-12 |
| Maintainer: | Anthony Christidis <anthony.christidis@stat.ubc.ca> |
| Description: | Functions to perform robust variable selection and regression using the Fast and Scalable Cellwise-Robust Ensemble (FSCRE) algorithm. The approach establishes a robust foundation using the Detect Deviating Cells (DDC) algorithm and robust correlation estimates. It then employs a competitive ensemble architecture where a robust Least Angle Regression (LARS) engine proposes candidate variables and cross-validation arbitrates their assignment. A final robust MM-estimator is applied to the selected predictors. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Encoding: | UTF-8 |
| Biarch: | true |
| Imports: | cellWise, robustbase, mvnfast |
| Suggests: | testthat |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-12 05:01:05 UTC; anthony |
| Author: | Anthony Christidis [aut, cre], Gabriela Cohen-Freue [aut] |
| Repository: | CRAN |
| Date/Publication: | 2026-06-12 07:10:02 UTC |
Check Input Data for srlars Function
Description
Internal helper function to validate arguments passed to srlars.
Checks types, dimensions, and logical constraints.
Usage
checkInputData(
x,
y,
n_models,
tolerance,
max_predictors,
x_preprocess,
y_preprocess,
cor_estimator,
cv_preprocess,
cv_loss,
cv_fit,
cv_folds,
compute_coef
)
Arguments
x |
Design matrix. |
y |
Response vector. |
n_models |
Number of models in the ensemble. |
tolerance |
Relative improvement tolerance for stopping. |
max_predictors |
Maximum total number of variables to select. |
x_preprocess |
X cleaning method. |
y_preprocess |
y cleaning method. |
cor_estimator |
Correlation method. |
cv_preprocess |
Foldwise or global CV. |
cv_loss |
Arbiter loss function. |
cv_fit |
Arbiter fit function. |
cv_folds |
Number of CV folds. |
compute_coef |
Logical. |
Value
NULL. Stops execution with an error message if invalid inputs are detected.
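The kind of checks described above can be sketched in a few lines of base R. This is a hypothetical illustration only: the function name `validate_inputs_sketch`, the subset of arguments it covers, and the error messages are assumptions, not the package's actual `checkInputData` code.

```r
# Hypothetical validator (illustration only, not the package's checkInputData).
validate_inputs_sketch <- function(x, y, n_models, tolerance, cv_folds) {
  # Type and dimension checks
  if (!is.matrix(x) || !is.numeric(x))
    stop("x must be a numeric matrix.")
  if (!is.numeric(y) || length(y) != nrow(x))
    stop("y must be numeric with one entry per row of x.")
  # Logical constraints on scalar tuning parameters
  if (length(n_models) != 1 || n_models < 1 || n_models != round(n_models))
    stop("n_models must be a positive integer.")
  if (length(tolerance) != 1 || tolerance <= 0)
    stop("tolerance must be a positive number.")
  if (length(cv_folds) != 1 || cv_folds < 2 || cv_folds > nrow(x))
    stop("cv_folds must be between 2 and nrow(x).")
  invisible(NULL)
}

x_ok <- matrix(rnorm(20), nrow = 10, ncol = 2)
y_ok <- rnorm(10)
ok_result <- validate_inputs_sketch(x_ok, y_ok, n_models = 5,
                                    tolerance = 1e-8, cv_folds = 5)
bad_result <- try(validate_inputs_sketch(x_ok, y_ok[-1], n_models = 5,
                                         tolerance = 1e-8, cv_folds = 5),
                  silent = TRUE)
```

Failing early with a specific message keeps errors out of the later selection stages.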
Coefficients for srlars Object
Description
coef.srlars returns the averaged coefficients for a srlars object.
Usage
## S3 method for class 'srlars'
coef(object, model_index = NULL, ...)
Arguments
object |
An object of class srlars. |
model_index |
Indices of the sub-models to include in the ensemble average. Default is NULL, which includes all models. |
... |
Additional arguments for compatibility. |
Value
A numeric vector containing the averaged intercept (first element) and slope coefficients.
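As a rough illustration of what averaging over sub-models means, the sketch below combines per-model intercepts and slope vectors in base R. It assumes each sub-model stores a full-length slope vector with zeros for unselected variables; that layout and the name `average_coefs_sketch` are assumptions made for this example, not a description of the package internals.

```r
# Hypothetical helper: average intercepts and slopes over chosen sub-models.
average_coefs_sketch <- function(coefficients, intercepts, model_index = NULL) {
  # NULL means "use every sub-model", mirroring the documented default
  if (is.null(model_index)) model_index <- seq_along(coefficients)
  slopes <- Reduce(`+`, coefficients[model_index]) / length(model_index)
  c(mean(intercepts[model_index]), slopes)
}

# Two toy sub-models on three predictors
toy_coefs <- list(c(2, 0, 0), c(0, 4, 0))
toy_ints  <- c(1, 3)

avg_all   <- average_coefs_sketch(toy_coefs, toy_ints)      # both models
avg_first <- average_coefs_sketch(toy_coefs, toy_ints, 1)   # first model only
```

Under this layout, an averaged slope is nonzero whenever at least one included sub-model selected that variable.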
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Required libraries
library(mvnfast)
library(cellWise)
library(robustbase)
# Simulation parameters
n <- 50
p <- 100
rho.within <- 0.8
rho.between <- 0.2
p.active <- 20
group.size <- 5
snr <- 3
contamination.prop <- 0.1
# Setting the seed
set.seed(0)
# Block correlation structure
sigma.mat <- matrix(0, p, p)
sigma.mat[1:p.active, 1:p.active] <- rho.between
for (group in 0:(p.active/group.size - 1)) {
  idx <- (group*group.size + 1):(group*group.size + group.size)
  sigma.mat[idx, idx] <- rho.within
}
diag(sigma.mat) <- 1
# Simulation of beta vector
true.beta <- c(runif(p.active, 0, 5)*(-1)^rbinom(p.active, 1, 0.7), rep(0, p - p.active))
# Noise SD implied by the target signal-to-noise ratio
sigma <- as.numeric(sqrt(t(true.beta) %*% sigma.mat %*% true.beta)/sqrt(snr))
# Simulation of uncontaminated data
x <- mvnfast::rmvn(n, mu = rep(0, p), sigma = sigma.mat)
colnames(x) <- paste0("V", 1:p)
y <- x %*% true.beta + rnorm(n, 0, sigma)
# Cellwise contamination
contamination_indices <- sample(1:(n * p), round(n * p * contamination.prop))
x_train <- x
x_train[contamination_indices] <- runif(length(contamination_indices), -10, 10)
# FSCRE Ensemble model
ensemble_fit <- srlars(x_train, y,
                       n_models = 5,
                       tolerance = 1e-4,
                       x_preprocess = "ddc",
                       y_preprocess = "wrap",
                       cor_estimator = "wrap",
                       cv_preprocess = "global",
                       cv_fit = "ls",
                       cv_loss = "huber",
                       compute_coef = TRUE)
# Ensemble coefficients
# Default: Average over all models
ensemble_coefs <- coef(ensemble_fit)
# Sensitivity (Recall)
active_selected <- which(ensemble_coefs[-1] != 0)
true_active <- which(true.beta != 0)
recall <- length(intersect(active_selected, true_active)) / length(true_active)
print(paste("Recall:", recall))
# Precision
if (length(active_selected) > 0) {
  precision <- length(intersect(active_selected, true_active)) / length(active_selected)
} else {
  precision <- 0
}
print(paste("Precision:", precision))
Compute CV Error (Internal)
Description
Evaluates the cross-validation error of a given active set. Each fold is fitted with a fast Cholesky-based least-squares solver (optionally with Huber IRLS for a robust fit) and scored with a robust validation loss.
Usage
computeCVError(cv_data, active_set, cv_fit, cv_loss)
Arguments
cv_data |
List of fold data (x_train, y_train, x_val, y_val). |
active_set |
Integer vector of active predictors. |
cv_fit |
Character. Fitting method: "ls" or "huber" (IRLS). |
cv_loss |
Character. Loss function: "huber", "trimmed", or "mse". |
Value
Numeric. The averaged cross-validation error.
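The two ingredients named above, a Cholesky-based least-squares fit and a choice of validation loss, can be sketched in base R as follows. The function names, the Huber constant `k = 1.345`, the trimming fraction, and the use of unscaled residuals are all assumptions made for illustration; the package may differ on each point.

```r
# Least squares via the Cholesky factor of the normal equations (sketch).
chol_ls_sketch <- function(x, y) {
  xi <- cbind(1, x)                           # prepend intercept column
  R  <- chol(crossprod(xi))                   # X'X = R'R, R upper triangular
  z  <- forwardsolve(t(R), crossprod(xi, y))  # solve R'z = X'y
  as.numeric(backsolve(R, z))                 # solve R b = z
}

# Validation losses on a vector of residuals (sketch).
loss_sketch <- function(res, type = c("huber", "trimmed", "mse"),
                        k = 1.345, trim = 0.1) {
  type <- match.arg(type)
  switch(type,
    mse     = mean(res^2),
    huber   = mean(ifelse(abs(res) <= k, 0.5 * res^2, k * (abs(res) - 0.5 * k))),
    trimmed = {
      r2 <- sort(res^2)                       # drop the largest squared residuals
      mean(r2[seq_len(ceiling((1 - trim) * length(r2)))])
    })
}

set.seed(1)
x_demo <- matrix(rnorm(60), nrow = 30, ncol = 2)
y_demo <- 1 + x_demo %*% c(2, -1) + rnorm(30, sd = 0.1)
b_chol <- chol_ls_sketch(x_demo, y_demo)
res_demo <- c(0, 1, 10)                       # one gross outlier
```

With one gross outlier among the residuals, the Huber and trimmed losses grow far more slowly than the mean squared error, which is the reason for using them to score folds that may contain contaminated cells.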
FSCRE Final Robust Fitting (Internal)
Description
Implements Stage 3: Fitting robust MM-estimators to the selected variable sets.
Usage
computeFinalFit(x.imp, y.imp, active.sets, compute_coef)
Arguments
x.imp |
Imputed design matrix. |
y.imp |
Imputed response vector. |
active.sets |
List of integer vectors (indices of selected variables). |
compute_coef |
Logical. |
Value
A list containing 'coefficients' (list of vectors) and 'intercepts' (vector).
Compute Robust Foundation (Internal)
Description
Implements Stage 1 of the FSCRE algorithm: separate data cleaning for X and y, followed by robust estimation of a positive semidefinite (PSD) correlation structure.
Usage
computeRobustFoundation(x, y, x_preprocess, y_preprocess, cor_estimator)
Arguments
x |
Design matrix. |
y |
Response vector. |
x_preprocess |
Method for X cleaning ("ddc", "none"). |
y_preprocess |
Method for y cleaning ("wrap", "robust_z", "none"). |
cor_estimator |
Method for correlations ("wrap", "pearson"). |
Value
A list containing the imputed data (x.imp, y.imp), the correlation structures (Rx, ry), and the DDC object for prediction.
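To make the returned correlation structures concrete, the sketch below builds `Rx` and `ry` from already-cleaned data using plain Pearson correlations (the `cor_estimator = "pearson"` case) and checks positive semidefiniteness through the eigenvalues. The helper `is_psd_sketch` and the toy data are assumptions for illustration; the robust "wrap" estimator is not reproduced here.

```r
# Hypothetical PSD check based on the smallest eigenvalue.
is_psd_sketch <- function(R, tol = 1e-8) {
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol)
}

set.seed(1)
x.imp <- matrix(rnorm(200), nrow = 50, ncol = 4)   # stand-in for cleaned X
y.imp <- x.imp[, 1] - 0.5 * x.imp[, 2] + rnorm(50) # stand-in for cleaned y

Rx <- cor(x.imp)                    # p x p predictor correlations
ry <- as.numeric(cor(x.imp, y.imp)) # length-p predictor-response correlations

# A symmetric matrix with unit diagonal that is NOT a valid correlation matrix
bad <- matrix(c( 1.0,  0.9,  0.9,
                 0.9,  1.0, -0.9,
                 0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)
```

A correlation matrix assembled entry by entry from pairwise estimates need not be PSD, as the `bad` matrix shows, so a check of this kind is a natural safeguard before the matrix is used to solve linear systems.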
Get Robust LARS Proposal (Internal)
Description
Calculates the next LARS step analytically using correlation matrices.
Usage
getLarsProposal(
Rx,
active.set,
sign.vector,
current.correlations,
available.vars
)
Arguments
Rx |
Global correlation matrix (p x p). |
active.set |
Integer vector of active indices. |
sign.vector |
Integer vector of signs for active set. |
current.correlations |
Numeric vector of current correlations with residual. |
available.vars |
Integer vector of candidate indices to search. |
Value
A list containing next_var, next_sign, gamma, and a_vec (used for the correlation update), or NULL.
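A LARS step computed purely from correlations can be sketched with the standard equiangular-direction formulas of Efron et al. (2004). The code below is an illustration under assumptions: the function name `lars_step_sketch` is hypothetical, and whether `getLarsProposal` follows exactly these formulas or tie-breaking rules is not confirmed by this manual.

```r
# One LARS step from correlations only (sketch).
lars_step_sketch <- function(Rx, active, signs, cors, available) {
  if (length(available) == 0) return(NULL)
  C <- max(abs(cors[active]))                        # common absolute correlation
  G <- Rx[active, active, drop = FALSE] * outer(signs, signs)
  Ginv1 <- solve(G, rep(1, length(active)))
  A <- 1 / sqrt(sum(Ginv1))                          # normalizing constant
  w <- A * Ginv1                                     # equiangular weights
  a_vec <- as.numeric(Rx[, active, drop = FALSE] %*% (signs * w))
  best <- list(gamma = Inf)
  for (j in available) {
    cand <- c((C - cors[j]) / (A - a_vec[j]), (C + cors[j]) / (A + a_vec[j]))
    cand <- cand[is.finite(cand) & cand > 1e-12]     # smallest positive step
    if (length(cand) > 0 && min(cand) < best$gamma)
      best <- list(next_var = j, gamma = min(cand))
  }
  if (!is.finite(best$gamma)) return(NULL)
  best$next_sign <- sign(cors[best$next_var] - best$gamma * a_vec[best$next_var])
  best$a_vec <- a_vec
  best
}

Rx_toy <- matrix(c(1.0, 0.5, 0.2,
                   0.5, 1.0, 0.3,
                   0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)
ry_toy <- c(0.8, 0.5, 0.1)
step <- lars_step_sketch(Rx_toy, active = 1L, signs = 1,
                         cors = ry_toy, available = c(2L, 3L))
new_cors <- ry_toy - step$gamma * step$a_vec         # correlation update
```

After the update, the entering variable ties with the active set in absolute correlation, which is the defining property of a LARS step.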
FSCRE Competitive Selection Loop (Internal)
Description
Implements Stage 2 of the FSCRE algorithm: the iterative competitive selection using a proposer–arbiter mechanism. Candidate moves are proposed by a robust LARS step computed from the (robust) correlation inputs, and accepted by an arbiter based on cross-validated predictive improvement.
The function supports two CV preprocessing modes:
- cv_preprocess = "global": CV is computed on the globally preprocessed training data (x.imp, y.imp).
- cv_preprocess = "foldwise": preprocessing is fitted on each CV fold's training subset and applied to that fold's validation subset (via cellWise::DDCpredict for x_preprocess = "ddc" and by reusing the training fold's location/scale for y_preprocess = "wrap").
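The foldwise idea for the response can be sketched in base R: estimate location and scale on the training part of a fold only, then reuse them to clean the validation part. The sketch uses median/MAD with hard clipping as a simple stand-in; it is not the wrapping transform itself, and the helper names and the cutoff of 3 are assumptions.

```r
# Fit robust location/scale on the training fold only (sketch).
fit_loc_scale_sketch <- function(y_train) {
  list(loc = median(y_train), scale = mad(y_train))
}

# Apply the training-fold parameters to any data (stand-in for wrapping).
apply_clip_sketch <- function(y, pars, cutoff = 3) {
  z <- (y - pars$loc) / pars$scale
  pars$loc + pars$scale * pmax(pmin(z, cutoff), -cutoff)
}

set.seed(2)
y_raw <- c(rnorm(49), 100)               # last observation is an outlier
folds <- rep(1:5, length.out = 50)       # observation 50 falls in fold 5

k <- 5
pars_k      <- fit_loc_scale_sketch(y_raw[folds != k])  # training fold only
y_val_clean <- apply_clip_sketch(y_raw[folds == k], pars_k)
```

Because the validation observations never enter the location and scale estimates, the CV error is not influenced by information from the held-out fold.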
The selection step terminates when no candidate yields a strictly positive CV
improvement, or when the best relative improvement falls below tolerance.
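The termination rule can be written as a small predicate. The sketch assumes that relative improvement is measured against the current CV error and that this error is strictly positive; the manual does not spell out the exact definition, and the name `should_stop_sketch` is hypothetical.

```r
# Stopping predicate for the selection loop (sketch).
should_stop_sketch <- function(current_error, candidate_errors, tolerance) {
  improvement <- current_error - min(candidate_errors)
  if (improvement <= 0) return(TRUE)           # no strictly positive improvement
  (improvement / current_error) < tolerance    # relative gain too small
}

stop_no_gain    <- should_stop_sketch(1.0, c(1.2, 1.0), tolerance = 1e-8)
stop_good_gain  <- should_stop_sketch(1.0, c(0.9, 1.1), tolerance = 1e-4)
stop_small_gain <- should_stop_sketch(1.0, c(1 - 1e-6), tolerance = 1e-4)
```

A larger tolerance ends the loop earlier and tends to give smaller active sets.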
Usage
performSelectionLoop(
Rx,
ry,
x,
y,
x.imp,
y.imp,
n_models,
max_predictors,
tolerance,
x_preprocess,
y_preprocess,
cv_preprocess,
cv_fit,
cv_loss,
cv_folds
)
Arguments
Rx |
Global predictor correlation matrix used by the LARS proposer (p x p). |
ry |
Global predictor-response correlation vector used by the LARS proposer (length p). |
x |
Raw design matrix (n x p). Used for foldwise preprocessing if requested. |
y |
Raw response vector (length n). Used for foldwise preprocessing if requested. |
x.imp |
Globally preprocessed design matrix (n x p), used when cv_preprocess = "global". |
y.imp |
Globally preprocessed response vector (length n), used when cv_preprocess = "global". |
n_models |
Number of models in the ensemble (K). |
max_predictors |
Maximum total number of predictors to select across all models. |
tolerance |
Relative improvement tolerance for stopping (tau). |
x_preprocess |
Character. Preprocessing method for predictors in the CV loop (e.g., "ddc"). |
y_preprocess |
Character. Preprocessing method for response in the CV loop (e.g., "wrap"). |
cv_preprocess |
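Character. CV preprocessing mode: "global" or "foldwise". |
cv_fit |
Character. Inner CV fitting method used by computeCVError ("ls" or "huber"). |
cv_loss |
Character. CV scoring loss used by computeCVError ("huber", "trimmed", or "mse"). |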
cv_folds |
Integer. Number of CV folds. |
Value
A list with component:
active.sets: A list of length n_models, where each element is an integer vector of selected variable indices for that sub-model.
See Also
getLarsProposal, computeCVError, computeRobustFoundation
Predictions for srlars Object
Description
predict.srlars returns the predictions for a srlars object.
Usage
## S3 method for class 'srlars'
predict(object, newx, model_index = NULL, dynamic = TRUE, ...)
Arguments
object |
An object of class srlars. |
newx |
New data matrix for predictions. |
model_index |
Indices of the sub-models to include in the ensemble. Default is NULL (all models). |
dynamic |
Logical. If TRUE, and the model was trained robustly, the new data |
... |
Additional arguments for compatibility. |
Value
A numeric vector of predictions.
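An ensemble prediction of this kind can be illustrated as the average of the sub-model predictions. The sketch assumes full-length slope vectors per sub-model and ignores the `dynamic` cleaning of new data; the name `predict_ensemble_sketch` and the data layout are assumptions, not the package's implementation.

```r
# Average the linear predictions of the chosen sub-models (sketch).
predict_ensemble_sketch <- function(newx, coefficients, intercepts,
                                    model_index = NULL) {
  if (is.null(model_index)) model_index <- seq_along(coefficients)
  per_model <- sapply(model_index, function(k) {
    intercepts[k] + newx %*% coefficients[[k]]
  })
  # Force an n x K layout so a single-row newx is handled as well
  rowMeans(matrix(per_model, nrow = nrow(newx)))
}

toy_coefs <- list(c(2, 0, 0), c(0, 4, 0))
toy_ints  <- c(1, 3)
newx_toy  <- rbind(c(1, 1, 1),
                   c(0, 2, 5))

preds_all   <- predict_ensemble_sketch(newx_toy, toy_coefs, toy_ints)
preds_first <- predict_ensemble_sketch(newx_toy, toy_coefs, toy_ints, 1)
```

Since each sub-model is linear, averaging the predictions gives the same result as predicting with the averaged intercept and slopes.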
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Required libraries
library(mvnfast)
library(cellWise)
library(robustbase)
# Simulation parameters
n <- 50
p <- 100
rho.within <- 0.8
rho.between <- 0.2
p.active <- 20
group.size <- 5
snr <- 3
contamination.prop <- 0.1
# Setting the seed
set.seed(0)
# Block correlation structure
sigma.mat <- matrix(0, p, p)
sigma.mat[1:p.active, 1:p.active] <- rho.between
for (group in 0:(p.active/group.size - 1)) {
  idx <- (group*group.size + 1):(group*group.size + group.size)
  sigma.mat[idx, idx] <- rho.within
}
diag(sigma.mat) <- 1
# Simulation of beta vector
true.beta <- c(runif(p.active, 0, 5)*(-1)^rbinom(p.active, 1, 0.7), rep(0, p - p.active))
# Noise SD implied by the target signal-to-noise ratio
sigma <- as.numeric(sqrt(t(true.beta) %*% sigma.mat %*% true.beta)/sqrt(snr))
# Simulation of uncontaminated data
x <- mvnfast::rmvn(n, mu = rep(0, p), sigma = sigma.mat)
colnames(x) <- paste0("V", 1:p)
y <- x %*% true.beta + rnorm(n, 0, sigma)
# Cellwise contamination
contamination_indices <- sample(1:(n * p), round(n * p * contamination.prop))
x_train <- x
x_train[contamination_indices] <- runif(length(contamination_indices), -10, 10)
# FSCRE Ensemble model
ensemble_fit <- srlars(x_train, y,
                       n_models = 5,
                       tolerance = 1e-4,
                       x_preprocess = "ddc",
                       y_preprocess = "wrap",
                       cor_estimator = "wrap",
                       cv_preprocess = "global",
                       cv_fit = "ls",
                       cv_loss = "huber",
                       compute_coef = TRUE)
# Generate Test Data
x_test <- mvnfast::rmvn(50, mu = rep(0, p), sigma = sigma.mat)
colnames(x_test) <- paste0("V", 1:p)
y_test <- x_test %*% true.beta + rnorm(50, 0, sigma)
# Predict on Test Data
preds <- predict(ensemble_fit, x_test)
# Calculate MSPE
mspe <- mean((y_test - preds)^2)
print(paste("MSPE:", mspe))
Fast and Scalable Cellwise-Robust Ensemble (FSCRE)
Description
srlars performs the FSCRE algorithm for robust variable selection and regression.
Usage
srlars(
x,
y,
n_models = 5,
tolerance = 1e-08,
max_predictors = NULL,
x_preprocess = c("ddc", "none"),
y_preprocess = c("wrap", "robust_z", "none"),
cor_estimator = c("wrap", "pearson"),
cv_preprocess = c("global", "foldwise"),
cv_fit = c("ls", "huber"),
cv_loss = c("huber", "trimmed", "mse"),
cv_folds = 5,
compute_coef = TRUE
)
Arguments
x |
Design matrix (n x p). |
y |
Response vector (n x 1). |
n_models |
Number of models in the ensemble (K). Default is 5. |
tolerance |
Relative improvement tolerance for stopping (tau). Default is 1e-8. |
max_predictors |
Maximum total number of variables to select across all models. Default is NULL, in which case n * n_models is used. |
x_preprocess |
Character. "ddc" (default) for cellwise cleaning, or "none". |
y_preprocess |
Character. "wrap" (default) for univariate robustification, "robust_z", or "none". |
cor_estimator |
Character. "wrap" (default) for robust PSD correlation, or "pearson". |
cv_preprocess |
Character. "global" (default) or "foldwise" (to prevent data leakage). |
cv_fit |
Character. "ls" (default) or "huber" for the inner arbiter fitting method. |
cv_loss |
Character. "huber" (default), "trimmed", or "mse" for arbiter scoring. |
cv_folds |
Integer. Number of cross-validation folds. Default is 5. |
compute_coef |
Logical. If TRUE, fits the final robust MM-models. Default is TRUE. |
Value
An object of class srlars containing the selected variables and coefficients.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Required libraries
library(mvnfast)
library(cellWise)
library(robustbase)
# Simulation parameters
n <- 50
p <- 100
rho.within <- 0.8
rho.between <- 0.2
p.active <- 20
group.size <- 5
snr <- 3
contamination.prop <- 0.1
# Setting the seed
set.seed(0)
# Block correlation structure
sigma.mat <- matrix(0, p, p)
sigma.mat[1:p.active, 1:p.active] <- rho.between
for (group in 0:(p.active/group.size - 1)) {
  idx <- (group*group.size + 1):(group*group.size + group.size)
  sigma.mat[idx, idx] <- rho.within
}
diag(sigma.mat) <- 1
# Simulation of beta vector
true.beta <- c(runif(p.active, 0, 5)*(-1)^rbinom(p.active, 1, 0.7), rep(0, p - p.active))
# Noise SD implied by the target signal-to-noise ratio
sigma <- as.numeric(sqrt(t(true.beta) %*% sigma.mat %*% true.beta)/sqrt(snr))
# Simulation of uncontaminated data
x <- mvnfast::rmvn(n, mu = rep(0, p), sigma = sigma.mat)
colnames(x) <- paste0("V", 1:p)
y <- x %*% true.beta + rnorm(n, 0, sigma)
# Cellwise contamination
contamination_indices <- sample(1:(n * p), round(n * p * contamination.prop))
x_train <- x
x_train[contamination_indices] <- runif(length(contamination_indices), -10, 10)
# FSCRE Ensemble model
ensemble_fit <- srlars(x_train, y,
                       n_models = 5,
                       tolerance = 1e-4,
                       x_preprocess = "ddc",
                       y_preprocess = "wrap",
                       cor_estimator = "wrap",
                       cv_preprocess = "global",
                       cv_fit = "ls",
                       cv_loss = "huber",
                       compute_coef = TRUE)
# Check selected variables
print(ensemble_fit$active.sets)