Package {srlars}


Type: Package
Title: Fast and Scalable Cellwise-Robust Ensemble
Version: 3.0.0
Date: 2026-06-12
Maintainer: Anthony Christidis <anthony.christidis@stat.ubc.ca>
Description: Functions to perform robust variable selection and regression using the Fast and Scalable Cellwise-Robust Ensemble (FSCRE) algorithm. The approach establishes a robust foundation using the Detect Deviating Cells (DDC) algorithm and robust correlation estimates. It then employs a competitive ensemble architecture where a robust Least Angle Regression (LARS) engine proposes candidate variables and cross-validation arbitrates their assignment. A final robust MM-estimator is applied to the selected predictors.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
Biarch: true
Imports: cellWise, robustbase, mvnfast
Suggests: testthat
RoxygenNote: 7.3.3
NeedsCompilation: no
Packaged: 2026-06-12 05:01:05 UTC; anthony
Author: Anthony Christidis [aut, cre], Gabriela Cohen-Freue [aut]
Repository: CRAN
Date/Publication: 2026-06-12 07:10:02 UTC

Check Input Data for srlars Function

Description

Internal helper function to validate arguments passed to srlars. Checks types, dimensions, and logical constraints.

Usage

checkInputData(
  x,
  y,
  n_models,
  tolerance,
  max_predictors,
  x_preprocess,
  y_preprocess,
  cor_estimator,
  cv_preprocess,
  cv_loss,
  cv_fit,
  cv_folds,
  compute_coef
)

Arguments

x

Design matrix.

y

Response vector.

n_models

Number of models in the ensemble.

tolerance

Relative improvement tolerance for stopping.

max_predictors

Maximum total number of variables to select.

x_preprocess

Method for cleaning X ("ddc" or "none").

y_preprocess

Method for cleaning y ("wrap", "robust_z", or "none").

cor_estimator

Correlation estimation method ("wrap" or "pearson").

cv_preprocess

CV preprocessing mode ("global" or "foldwise").

cv_loss

Arbiter loss function ("huber", "trimmed", or "mse").

cv_fit

Arbiter fitting method ("ls" or "huber").

cv_folds

Number of CV folds.

compute_coef

Logical. If TRUE, the final robust MM-models are fitted.

Value

NULL. Stops execution with an error message if invalid inputs are detected.
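
As a rough illustration of the kind of checks described above, the following sketch validates a handful of the arguments. The helper name check_inputs_sketch, the specific conditions, and the error messages are hypothetical; the package's actual checks may differ.

```r
# Hypothetical sketch of argument validation; not the package's checkInputData.
check_inputs_sketch <- function(x, y, n_models, tolerance, cv_folds, compute_coef) {
  if (!is.matrix(x) || !is.numeric(x))
    stop("x must be a numeric matrix.")
  if (!is.numeric(y) || length(y) != nrow(x))
    stop("y must be a numeric vector with one entry per row of x.")
  if (!is.numeric(n_models) || length(n_models) != 1 || n_models < 1 ||
      n_models != round(n_models))
    stop("n_models must be a positive integer.")
  if (!is.numeric(tolerance) || length(tolerance) != 1 || tolerance < 0)
    stop("tolerance must be a non-negative number.")
  if (!is.numeric(cv_folds) || length(cv_folds) != 1 || cv_folds < 2 ||
      cv_folds > nrow(x))
    stop("cv_folds must be an integer between 2 and nrow(x).")
  if (!is.logical(compute_coef) || length(compute_coef) != 1 || is.na(compute_coef))
    stop("compute_coef must be TRUE or FALSE.")
  invisible(NULL)
}

set.seed(0)
x_ok <- matrix(rnorm(20), nrow = 10)
y_ok <- rnorm(10)

# Valid inputs pass silently and return NULL
ok <- check_inputs_sketch(x_ok, y_ok, n_models = 2, tolerance = 1e-8,
                          cv_folds = 5, compute_coef = TRUE)

# A response of the wrong length triggers an error, captured here as a string
bad <- tryCatch(
  check_inputs_sketch(x_ok, y_ok[-1], 2, 1e-8, 5, TRUE),
  error = function(e) conditionMessage(e)
)
```

Failing fast with stop() keeps invalid inputs from reaching the later, more expensive stages.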


Coefficients for srlars Object

Description

coef.srlars returns the averaged coefficients for a srlars object.

Usage

## S3 method for class 'srlars'
coef(object, model_index = NULL, ...)

Arguments

object

An object of class srlars.

model_index

Indices of the sub-models to include in the ensemble average. Default is NULL, which includes all models.

...

Additional arguments for compatibility.

Value

A numeric vector containing the averaged intercept (first element) and slope coefficients.
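
To make the averaging concrete, here is a minimal sketch under the assumption that each sub-model contributes an intercept and a length-p slope vector with zeros for unselected variables, and that the ensemble coefficient is a simple mean over the chosen sub-models. The names average_coef_sketch, intercepts, and slopes are hypothetical.

```r
# Hypothetical per-model fits over p = 4 predictors; unselected variables are 0.
intercepts <- c(1.0, 2.0, 3.0)
slopes <- list(
  c(2, 0, 0, 0),
  c(0, 4, 0, 0),
  c(0, 0, 6, 0)
)

# Average intercepts and slopes over the requested sub-models (all if NULL)
average_coef_sketch <- function(intercepts, slopes, model_index = NULL) {
  if (is.null(model_index)) model_index <- seq_along(slopes)
  slope_mat <- do.call(cbind, slopes[model_index])
  c(mean(intercepts[model_index]), rowMeans(slope_mat))
}

all_models <- average_coef_sketch(intercepts, slopes)
first_two <- average_coef_sketch(intercepts, slopes, model_index = 1:2)
```

Under this assumption a variable selected by only one of K sub-models has its slope shrunk by a factor of 1/K in the averaged vector, while remaining nonzero.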

Author(s)

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

See Also

srlars, predict.srlars

Examples

# Required libraries
library(mvnfast)
library(cellWise)
library(robustbase)

# Simulation parameters
n <- 50
p <- 100
rho.within <- 0.8
rho.between <- 0.2
p.active <- 20
group.size <- 5
snr <- 3
contamination.prop <- 0.1

# Setting the seed
set.seed(0)

# Block correlation structure
sigma.mat <- matrix(0, p, p)
sigma.mat[1:p.active, 1:p.active] <- rho.between
for (group in 0:(p.active/group.size - 1)) {
  idx <- (group*group.size + 1):(group*group.size + group.size)
  sigma.mat[idx, idx] <- rho.within
}
diag(sigma.mat) <- 1

# Simulation of beta vector
true.beta <- c(runif(p.active, 0, 5)*(-1)^rbinom(p.active, 1, 0.7), rep(0, p - p.active))

# Noise SD implied by the target signal-to-noise ratio
sigma <- as.numeric(sqrt(t(true.beta) %*% sigma.mat %*% true.beta)/sqrt(snr))

# Simulation of uncontaminated data
x <- mvnfast::rmvn(n, mu = rep(0, p), sigma = sigma.mat)
colnames(x) <- paste0("V", 1:p)
y <- x %*% true.beta + rnorm(n, 0, sigma)

# Cellwise contamination
contamination_indices <- sample(1:(n * p), round(n * p * contamination.prop))
x_train <- x
x_train[contamination_indices] <- runif(length(contamination_indices), -10, 10)

# FSCRE Ensemble model
ensemble_fit <- srlars(x_train, y,
                       n_models = 5,
                       tolerance = 1e-4,
                       x_preprocess = "ddc",
                       y_preprocess = "wrap",
                       cor_estimator = "wrap",
                       cv_preprocess = "global",
                       cv_fit = "ls",
                       cv_loss = "huber",
                       compute_coef = TRUE)

# Ensemble coefficients
# Default: Average over all models
ensemble_coefs <- coef(ensemble_fit)

# Sensitivity (Recall)
active_selected <- which(ensemble_coefs[-1] != 0)
true_active <- which(true.beta != 0)
recall <- length(intersect(active_selected, true_active)) / length(true_active)
print(paste("Recall:", recall))

# Precision
if(length(active_selected) > 0){
  precision <- length(intersect(active_selected, true_active)) / length(active_selected)
} else {
  precision <- 0
}
print(paste("Precision:", precision))


Compute CV Error (Internal)

Description

Evaluates the cross-validation error of a given active set using a fast Cholesky-based solver (with optional Huber IRLS robust fitting) and a robust validation loss.

Usage

computeCVError(cv_data, active_set, cv_fit, cv_loss)

Arguments

cv_data

List of fold data (x_train, y_train, x_val, y_val).

active_set

Integer vector of active predictors.

cv_fit

Character. Fitting method: "ls" or "huber" (IRLS).

cv_loss

Character. Loss function: "huber", "trimmed", or "mse".

Value

Numeric. The averaged cross-validation error.
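
The following sketch shows one way such an evaluation could look for the "ls" fitting method: least squares is solved on each training fold through a Cholesky factor of the normal equations, and the validation residuals are scored with the chosen loss. It assumes cv_data is a list with one element per fold, each holding x_train, y_train, x_val, and y_val. The function name cv_error_sketch, the Huber constant 1.345, and the 90% trimming fraction are illustrative assumptions, not values taken from the package.

```r
# Hypothetical sketch: per-fold least squares via Cholesky, then a validation loss.
cv_error_sketch <- function(cv_data, active_set,
                            cv_loss = c("huber", "trimmed", "mse")) {
  cv_loss <- match.arg(cv_loss)
  loss_fun <- switch(cv_loss,
    mse = function(r) mean(r^2),
    # Huber loss on raw residuals; the threshold k is an assumed constant
    huber = function(r, k = 1.345)
      mean(ifelse(abs(r) <= k, 0.5 * r^2, k * abs(r) - 0.5 * k^2)),
    # Mean of the smallest squared residuals; the kept fraction is assumed
    trimmed = function(r, keep = 0.9) {
      r2 <- sort(r^2)
      mean(r2[seq_len(ceiling(keep * length(r2)))])
    }
  )
  fold_errors <- vapply(cv_data, function(fold) {
    xa <- cbind(1, fold$x_train[, active_set, drop = FALSE])
    # Solve (X'X) b = X'y using the upper-triangular factor R with R'R = X'X
    R <- chol(crossprod(xa))
    b <- backsolve(R, forwardsolve(t(R), crossprod(xa, fold$y_train)))
    xv <- cbind(1, fold$x_val[, active_set, drop = FALSE])
    loss_fun(as.numeric(fold$y_val - xv %*% b))
  }, numeric(1))
  mean(fold_errors)
}

# Toy data: only the first predictor carries signal
set.seed(1)
n <- 60
x <- matrix(rnorm(n * 3), n, 3)
y <- 1 + 2 * x[, 1] + rnorm(n, sd = 0.5)
fold_id <- rep(1:5, length.out = n)
cv_data <- lapply(1:5, function(k) list(
  x_train = x[fold_id != k, , drop = FALSE], y_train = y[fold_id != k],
  x_val = x[fold_id == k, , drop = FALSE], y_val = y[fold_id == k]
))

err_signal <- cv_error_sketch(cv_data, active_set = 1, cv_loss = "mse")
err_noise <- cv_error_sketch(cv_data, active_set = 3, cv_loss = "mse")
err_huber <- cv_error_sketch(cv_data, active_set = 1, cv_loss = "huber")
```

On this toy data the active set containing the true predictor should give a clearly smaller error than the one containing a noise variable.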


FSCRE Final Robust Fitting (Internal)

Description

Implements Stage 3: Fitting robust MM-estimators to the selected variable sets.

Usage

computeFinalFit(x.imp, y.imp, active.sets, compute_coef)

Arguments

x.imp

Imputed design matrix.

y.imp

Imputed response vector.

active.sets

List of integer vectors (indices of selected variables).

compute_coef

Logical. If TRUE, the final robust MM-models are fitted.

Value

A list containing 'coefficients' (list of vectors) and 'intercepts' (vector).
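
A possible shape for this stage is sketched below, assuming the robustbase package is installed and using robustbase::lmrob, which computes an MM-type estimator by default. The function name final_fit_sketch, the zero-padding of slopes to length p, and the median fallback for an empty active set are assumptions made for illustration only.

```r
# Hypothetical sketch: one MM-type fit per active set, slopes padded to length p.
final_fit_sketch <- function(x.imp, y.imp, active.sets) {
  p <- ncol(x.imp)
  coefficients <- vector("list", length(active.sets))
  intercepts <- numeric(length(active.sets))
  for (k in seq_along(active.sets)) {
    idx <- active.sets[[k]]
    beta <- numeric(p)
    if (length(idx) > 0) {
      fit <- robustbase::lmrob(y.imp ~ x.imp[, idx, drop = FALSE])
      intercepts[k] <- unname(coef(fit)[1])
      beta[idx] <- unname(coef(fit)[-1])
    } else {
      # Assumed fallback for an empty active set: intercept-only robust location
      intercepts[k] <- median(y.imp)
    }
    coefficients[[k]] <- beta
  }
  list(coefficients = coefficients, intercepts = intercepts)
}

# The demonstration only runs if robustbase is available
if (requireNamespace("robustbase", quietly = TRUE)) {
  set.seed(2)
  x.imp <- matrix(rnorm(80 * 4), 80, 4)
  y.imp <- 1 + 3 * x.imp[, 2] + rnorm(80, sd = 0.3)
  fits <- final_fit_sketch(x.imp, y.imp, list(2, c(1, 4)))
}
```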


Compute Robust Foundation (Internal)

Description

Implements Stage 1 of the FSCRE algorithm: separate data cleaning for X and y, and robust, PSD correlation estimation.

Usage

computeRobustFoundation(x, y, x_preprocess, y_preprocess, cor_estimator)

Arguments

x

Design matrix.

y

Response vector.

x_preprocess

Method for X cleaning ("ddc", "none").

y_preprocess

Method for y cleaning ("wrap", "robust_z", "none").

cor_estimator

Method for correlations ("wrap", "pearson").

Value

A list containing the imputed data (x.imp, y.imp), the correlation structures (Rx, ry), and the DDC object for prediction.
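
The sketch below illustrates the general idea of Stage 1 with base R only: clean the data columnwise, then compute the correlation structures from the cleaned data. Clipping robust z-scores (median and MAD) at a cutoff of 3 is an assumption chosen for simplicity and is not necessarily what "ddc", "wrap", or "robust_z" do in the package; the names clip_robust_z and foundation_sketch are hypothetical.

```r
# Hypothetical cleaning step: pull values with extreme robust z-scores back
# to the cutoff, using the median and MAD as location and scale.
clip_robust_z <- function(v, cutoff = 3) {
  center <- median(v)
  scale <- mad(v)
  if (scale == 0) return(v)
  z <- (v - center) / scale
  center + scale * pmax(pmin(z, cutoff), -cutoff)
}

# Clean X and y separately, then compute Pearson correlations on cleaned data
foundation_sketch <- function(x, y) {
  x.imp <- apply(x, 2, clip_robust_z)
  y.imp <- clip_robust_z(as.numeric(y))
  list(
    x.imp = x.imp,
    y.imp = y.imp,
    Rx = cor(x.imp),                     # p x p predictor correlations
    ry = as.numeric(cor(x.imp, y.imp))   # length-p predictor-response correlations
  )
}

set.seed(3)
x <- matrix(rnorm(100 * 3), 100, 3)
y <- x[, 1] + rnorm(100)
x[5, 2] <- 50   # a single outlying cell
found <- foundation_sketch(x, y)

# A correlation matrix computed from complete data is positive semidefinite
min_eig <- min(eigen(found$Rx, symmetric = TRUE, only.values = TRUE)$values)
```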


Get Robust LARS Proposal (Internal)

Description

Calculates the next LARS step analytically using correlation matrices.

Usage

getLarsProposal(
  Rx,
  active.set,
  sign.vector,
  current.correlations,
  available.vars
)

Arguments

Rx

Global correlation matrix (p x p).

active.set

Integer vector of active indices.

sign.vector

Integer vector of signs for active set.

current.correlations

Numeric vector of current correlations with residual.

available.vars

Integer vector of candidate indices to search.

Value

A list containing next_var, next_sign, gamma, and a_vec (used for the update), or NULL.
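
For intuition, a LARS step can be computed from correlations alone using the standard equiangular-direction formulas. The sketch below follows those textbook formulas and is not the package's internal code; the function name lars_step_sketch and the numerical guard 1e-12 are assumptions.

```r
# Hypothetical sketch of one LARS step from a correlation matrix.
lars_step_sketch <- function(Rx, active.set, sign.vector,
                             current.correlations, available.vars) {
  if (length(available.vars) == 0) return(NULL)
  s <- sign.vector
  # Signed Gram matrix of the active set
  G <- Rx[active.set, active.set, drop = FALSE] * tcrossprod(s)
  G_inv_one <- solve(G, rep(1, length(active.set)))
  A <- 1 / sqrt(sum(G_inv_one))
  w <- A * G_inv_one
  # Correlation of every predictor with the equiangular direction
  a_vec <- as.numeric(Rx[, active.set, drop = FALSE] %*% (s * w))
  C <- max(abs(current.correlations[active.set]))
  cj <- current.correlations[available.vars]
  aj <- a_vec[available.vars]
  # Step lengths at which a candidate ties with the active set (either sign)
  g_plus <- (C - cj) / (A - aj)
  g_minus <- (C + cj) / (A + aj)
  # Only finite, strictly positive step lengths are admissible
  g_plus[!is.finite(g_plus) | g_plus <= 1e-12] <- Inf
  g_minus[!is.finite(g_minus) | g_minus <= 1e-12] <- Inf
  g_all <- pmin(g_plus, g_minus)
  if (all(is.infinite(g_all))) return(NULL)
  best <- which.min(g_all)
  list(
    next_var = available.vars[best],
    next_sign = if (g_plus[best] <= g_minus[best]) 1L else -1L,
    gamma = g_all[best],
    a_vec = a_vec
  )
}

# Uncorrelated predictors; variable 1 is active with positive sign
Rx <- diag(3)
cors <- c(0.9, 0.5, -0.3)
step <- lars_step_sketch(Rx, active.set = 1, sign.vector = 1,
                         current.correlations = cors, available.vars = 2:3)

# Moving gamma along the direction shrinks the active correlation until it
# ties with the entering variable
updated <- cors - step$gamma * step$a_vec
```

In this toy case variable 2 enters with a positive sign after a step of 0.4, at which point variables 1 and 2 both have correlation 0.5 with the residual.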


FSCRE Competitive Selection Loop (Internal)

Description

Implements Stage 2 of the FSCRE algorithm: the iterative competitive selection using a proposer–arbiter mechanism. Candidate moves are proposed by a robust LARS step computed from the (robust) correlation inputs, and accepted by an arbiter based on cross-validated predictive improvement.

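The function supports two CV preprocessing modes:

"global": the CV folds are formed from the globally preprocessed data (x.imp, y.imp).

"foldwise": the raw data (x, y) are preprocessed separately within each fold using x_preprocess and y_preprocess, which prevents data leakage.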

The selection step terminates when no candidate yields a strictly positive CV improvement, or when the best relative improvement falls below tolerance.
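
The termination rule can be sketched as follows. The sketch assumes that, at each iteration, every sub-model has one proposed variable with an associated candidate CV error, and that relative improvement is measured against the sub-model's current CV error. The function name arbitrate_sketch and this exact definition of relative improvement are assumptions.

```r
# Hypothetical sketch of the arbiter's decision for one iteration.
# current_err[k]:   CV error of sub-model k as it stands
# candidate_err[k]: CV error if sub-model k accepted its proposed variable
arbitrate_sketch <- function(current_err, candidate_err, tolerance) {
  rel_improvement <- (current_err - candidate_err) / current_err
  best <- which.max(rel_improvement)
  # Stop if nothing improves, or if the best improvement is below tolerance
  if (rel_improvement[best] <= 0 || rel_improvement[best] < tolerance) {
    return(list(stop = TRUE, model = NA_integer_))
  }
  list(stop = FALSE, model = best, improvement = rel_improvement[best])
}

# Sub-model 1 gains the most, so it wins the proposed variable
step1 <- arbitrate_sketch(c(4.0, 5.0, 3.0), c(3.0, 4.9, 3.2), tolerance = 1e-8)

# No candidate strictly improves the CV error, so selection stops
step2 <- arbitrate_sketch(c(3.0, 5.0, 3.0), c(3.1, 5.0, 3.0), tolerance = 1e-8)
```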

Usage

performSelectionLoop(
  Rx,
  ry,
  x,
  y,
  x.imp,
  y.imp,
  n_models,
  max_predictors,
  tolerance,
  x_preprocess,
  y_preprocess,
  cv_preprocess,
  cv_fit,
  cv_loss,
  cv_folds
)

Arguments

Rx

Global predictor correlation matrix used by the LARS proposer (p x p).

ry

Global predictor-response correlation vector used by the LARS proposer (length p).

x

Raw design matrix (n x p). Used for foldwise preprocessing if requested.

y

Raw response vector (length n). Used for foldwise preprocessing if requested.

x.imp

Globally preprocessed design matrix (n x p), used when cv_preprocess="global".

y.imp

Globally preprocessed response vector (length n), used when cv_preprocess="global".

n_models

Number of models in the ensemble (K).

max_predictors

Maximum total number of predictors to select across all models.

tolerance

Relative improvement tolerance for stopping (tau).

x_preprocess

Character. Preprocessing method for predictors in the CV loop (e.g., "ddc" or "none").

y_preprocess

Character. Preprocessing method for response in the CV loop (e.g., "wrap", "robust_z", or "none").

cv_preprocess

Character. CV preprocessing mode: "global" or "foldwise".

cv_fit

Character. Inner CV fitting method used by computeCVError (e.g., "ls" or "huber").

cv_loss

Character. CV scoring loss used by computeCVError (e.g., "mse", "trimmed", or "huber").

cv_folds

Integer. Number of CV folds.

Value

A list with component:

active.sets

A list of length n_models, where each element is an integer vector of selected variable indices for that sub-model.

See Also

getLarsProposal, computeCVError, computeRobustFoundation


Predictions for srlars Object

Description

predict.srlars returns the predictions for a srlars object.

Usage

## S3 method for class 'srlars'
predict(object, newx, model_index = NULL, dynamic = TRUE, ...)

Arguments

object

An object of class srlars.

newx

New data matrix for predictions.

model_index

Indices of the sub-models to include in the ensemble. Default is NULL (all models).

dynamic

Logical. If TRUE, and the model was trained robustly, the new data newx is cleaned using DDCpredict before prediction. This ensures consistency with the robust training phase. Default is TRUE.

...

Additional arguments for compatibility.

Value

A numeric vector of predictions.
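
Assuming the ensemble prediction is a simple average of the sub-model predictions, it coincides with the prediction from the averaged coefficients, because each sub-model is linear. The toy numbers and the names intercepts, slopes, and newx below are hypothetical, and the sketch ignores any cleaning of newx.

```r
# Hypothetical sub-model coefficients (intercepts and length-p slope vectors)
intercepts <- c(0.5, 1.5)
slopes <- list(c(1, 0, 2), c(0, 3, 0))
newx <- matrix(c(1, 2, 3,
                 0, 1, 0), nrow = 2, byrow = TRUE)

# Average of the per-model predictions (one column per sub-model)
per_model <- sapply(seq_along(slopes),
                    function(k) intercepts[k] + as.numeric(newx %*% slopes[[k]]))
pred_avg <- rowMeans(per_model)

# Prediction from the averaged coefficients
beta_bar <- Reduce(`+`, slopes) / length(slopes)
pred_coef <- mean(intercepts) + as.numeric(newx %*% beta_bar)
```

Both routes give the same predictions here, 7.5 and 2.5 for the two rows of newx.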

Author(s)

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

See Also

srlars

Examples

# Required libraries
library(mvnfast)
library(cellWise)
library(robustbase)

# Simulation parameters
n <- 50
p <- 100
rho.within <- 0.8
rho.between <- 0.2
p.active <- 20
group.size <- 5
snr <- 3
contamination.prop <- 0.1

# Setting the seed
set.seed(0)

# Block correlation structure
sigma.mat <- matrix(0, p, p)
sigma.mat[1:p.active, 1:p.active] <- rho.between
for (group in 0:(p.active/group.size - 1)) {
  idx <- (group*group.size + 1):(group*group.size + group.size)
  sigma.mat[idx, idx] <- rho.within
}
diag(sigma.mat) <- 1

# Simulation of beta vector
true.beta <- c(runif(p.active, 0, 5)*(-1)^rbinom(p.active, 1, 0.7), rep(0, p - p.active))

# Noise SD implied by the target signal-to-noise ratio
sigma <- as.numeric(sqrt(t(true.beta) %*% sigma.mat %*% true.beta)/sqrt(snr))

# Simulation of uncontaminated data
x <- mvnfast::rmvn(n, mu = rep(0, p), sigma = sigma.mat)
colnames(x) <- paste0("V", 1:p)
y <- x %*% true.beta + rnorm(n, 0, sigma)

# Cellwise contamination
contamination_indices <- sample(1:(n * p), round(n * p * contamination.prop))
x_train <- x
x_train[contamination_indices] <- runif(length(contamination_indices), -10, 10)

# FSCRE Ensemble model
ensemble_fit <- srlars(x_train, y,
                       n_models = 5,
                       tolerance = 1e-4,
                       x_preprocess = "ddc",
                       y_preprocess = "wrap",
                       cor_estimator = "wrap",
                       cv_preprocess = "global",
                       cv_fit = "ls",
                       cv_loss = "huber",
                       compute_coef = TRUE)

# Generate Test Data
x_test <- mvnfast::rmvn(50, mu = rep(0, p), sigma = sigma.mat)
colnames(x_test) <- paste0("V", 1:p)
y_test <- x_test %*% true.beta + rnorm(50, 0, sigma)

# Predict on Test Data
preds <- predict(ensemble_fit, x_test)

# Calculate MSPE
mspe <- mean((y_test - preds)^2)
print(paste("MSPE:", mspe))


Fast and Scalable Cellwise-Robust Ensemble (FSCRE)

Description

srlars performs the FSCRE algorithm for robust variable selection and regression.

Usage

srlars(
  x,
  y,
  n_models = 5,
  tolerance = 1e-08,
  max_predictors = NULL,
  x_preprocess = c("ddc", "none"),
  y_preprocess = c("wrap", "robust_z", "none"),
  cor_estimator = c("wrap", "pearson"),
  cv_preprocess = c("global", "foldwise"),
  cv_fit = c("ls", "huber"),
  cv_loss = c("huber", "trimmed", "mse"),
  cv_folds = 5,
  compute_coef = TRUE
)

Arguments

x

Design matrix (n x p).

y

Response vector (n x 1).

n_models

Number of models in the ensemble (K). Default is 5.

tolerance

Relative improvement tolerance for stopping (tau). Default is 1e-8.

max_predictors

Maximum total number of variables to select across all models. Default is NULL, in which case n * n_models is used.

x_preprocess

Character. "ddc" (default) for cellwise cleaning, or "none".

y_preprocess

Character. "wrap" (default) for univariate robustification, "robust_z", or "none".

cor_estimator

Character. "wrap" (default) for robust PSD correlation, or "pearson".

cv_preprocess

Character. "global" (default) or "foldwise" (to prevent data leakage).

cv_fit

Character. "ls" (default) or "huber" for the inner arbiter fitting method.

cv_loss

Character. "huber" (default), "trimmed", or "mse" for arbiter scoring.

cv_folds

Integer. Number of cross-validation folds. Default is 5.

compute_coef

Logical. If TRUE, fits the final robust MM-models. Default is TRUE.
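
The character arguments in the Usage list all admissible values, with the first one acting as the default. This is the usual R idiom handled by match.arg. The sketch below assumes srlars resolves its options this way and fills in max_predictors as n * n_models when it is NULL; the function name resolve_defaults_sketch is hypothetical.

```r
# Hypothetical sketch of how the defaults in the Usage could be resolved.
resolve_defaults_sketch <- function(n, n_models = 5, max_predictors = NULL,
                                    x_preprocess = c("ddc", "none"),
                                    cv_loss = c("huber", "trimmed", "mse")) {
  # match.arg() picks the first choice when the argument is left at its default
  x_preprocess <- match.arg(x_preprocess)
  cv_loss <- match.arg(cv_loss)
  if (is.null(max_predictors)) max_predictors <- n * n_models
  list(max_predictors = max_predictors,
       x_preprocess = x_preprocess,
       cv_loss = cv_loss)
}

defaults <- resolve_defaults_sketch(n = 50)
custom <- resolve_defaults_sketch(n = 50, max_predictors = 30,
                                  x_preprocess = "none", cv_loss = "mse")

# An unsupported value is rejected by match.arg() with an error
invalid <- tryCatch(resolve_defaults_sketch(n = 50, cv_loss = "mae"),
                    error = function(e) "error")
```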

Value

An object of class srlars containing the selected variables and coefficients.

Author(s)

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

See Also

coef.srlars, predict.srlars

Examples

# Required libraries
library(mvnfast)
library(cellWise)
library(robustbase)

# Simulation parameters
n <- 50
p <- 100
rho.within <- 0.8
rho.between <- 0.2
p.active <- 20
group.size <- 5
snr <- 3
contamination.prop <- 0.1

# Setting the seed
set.seed(0)

# Block correlation structure
sigma.mat <- matrix(0, p, p)
sigma.mat[1:p.active, 1:p.active] <- rho.between
for (group in 0:(p.active/group.size - 1)) {
  idx <- (group*group.size + 1):(group*group.size + group.size)
  sigma.mat[idx, idx] <- rho.within
}
diag(sigma.mat) <- 1

# Simulation of beta vector
true.beta <- c(runif(p.active, 0, 5)*(-1)^rbinom(p.active, 1, 0.7), rep(0, p - p.active))

# Noise SD implied by the target signal-to-noise ratio
sigma <- as.numeric(sqrt(t(true.beta) %*% sigma.mat %*% true.beta)/sqrt(snr))

# Simulation of uncontaminated data
x <- mvnfast::rmvn(n, mu = rep(0, p), sigma = sigma.mat)
colnames(x) <- paste0("V", 1:p)
y <- x %*% true.beta + rnorm(n, 0, sigma)

# Cellwise contamination
contamination_indices <- sample(1:(n * p), round(n * p * contamination.prop))
x_train <- x
x_train[contamination_indices] <- runif(length(contamination_indices), -10, 10)

# FSCRE Ensemble model
ensemble_fit <- srlars(x_train, y,
                       n_models = 5,
                       tolerance = 1e-4,
                       x_preprocess = "ddc",
                       y_preprocess = "wrap",
                       cor_estimator = "wrap",
                       cv_preprocess = "global",
                       cv_fit = "ls",
                       cv_loss = "huber",
                       compute_coef = TRUE)

# Check selected variables
print(ensemble_fit$active.sets)