| Title: | Immune Oncology Biological Research |
| Version: | 2.2.0 |
| Description: | Provides six modules for tumor microenvironment (TME) analysis based on multi-omics data. These modules cover data preprocessing, TME estimation, TME infiltrating patterns, cellular interactions, genome and TME interaction, and visualization for TME relevant features, as well as modelling based on key features. It integrates multiple microenvironmental analysis algorithms and signature estimation methods, simplifying the analysis and downstream visualization of the TME. In addition to providing a quick and easy way to construct gene signatures from single-cell RNA-seq data, it also provides a way to construct a reference matrix for TME deconvolution from single-cell RNA-seq data. The analysis pipeline and feature visualization are user-friendly and provide a comprehensive description of the complex TME, offering insights into tumour-immune interactions (Zeng D, et al. (2024) <doi:10.1016/j.crmeth.2024.100910>. Fang Y, et al. (2025) <doi:10.1002/mdr2.70001>). |
| License: | GPL-3 |
| URL: | https://doi.org/10.3389/fimmu.2021.687975 (paper), https://iobr.github.io/book/ (docs), https://iobr.github.io/IOBR/ |
| BugReports: | https://github.com/IOBR/IOBR/issues |
| Depends: | R (≥ 3.6.0) |
| Imports: | cli, dplyr, ggplot2, glmnet, GSVA, methods, purrr, rlang, stringr, survival, survminer, tibble, tidyr |
| Suggests: | BiocParallel, biomaRt, circlize, clusterProfiler, ComplexHeatmap, corrplot, DESeq2, doParallel, DOSE, e1071, easier, enrichplot, factoextra, FactoMineR, foreach, ggdensity, ggpp, ggpubr, ggsci, gridExtra, Hmisc, knitr, limma, limSolve, maftools, MASS, Matrix, msigdbr, NbClust, org.Hs.eg.db, org.Mm.eg.db, patchwork, PMCMRplus, pracma, preprocessCore, prettydoc, pROC, psych, RColorBrewer, reshape2, rmarkdown, ROCR, sampling, scales, Seurat, SeuratObject, sva, testthat (≥ 3.0.0), tidyHeatmap, timeROC, webr, WGCNA |
| VignetteBuilder: | knitr |
| biocViews: | GeneExpression, DifferentialExpression, ImmunoOncology, Transcriptomics, Clustering, Survival, Visualization |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| LazyData: | FALSE |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-04-22 13:01:08 UTC; wsx |
| Author: | Dongqiang Zeng [aut],
Yiran Fang [aut],
Shixiang Wang |
| Maintainer: | Shixiang Wang <w_shixiang@163.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-04-22 14:00:18 UTC |
Append Signatures to Expression Matrix
Description
Calculates mean expression for each marker feature set.
Usage
.appendSignatures(xp, markers)
Arguments
xp |
Expression matrix with features in rows and samples in columns. |
markers |
List of marker gene vectors for each cell population. |
Value
Matrix with summarized expression values.
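The summarization described above can be sketched in base R. This is only an illustration under stated assumptions: the helper name `append_signatures_sketch` is hypothetical, and the sketch returns just the per-signature mean rows; whether the internal function also binds them onto `xp` is not stated here.

```r
# Hypothetical helper (not the package's internal code): for each marker set,
# average the expression of the marker genes that are present in `xp`.
append_signatures_sketch <- function(xp, markers) {
  res <- t(vapply(markers, function(genes) {
    present <- intersect(genes, rownames(xp))
    if (length(present) == 0) {
      return(rep(NA_real_, ncol(xp)))  # no marker found: fill with NA
    }
    colMeans(xp[present, , drop = FALSE], na.rm = TRUE)
  }, numeric(ncol(xp))))
  colnames(res) <- colnames(xp)
  res
}

# Toy expression matrix: features in rows, samples in columns
xp <- matrix(
  1:12,
  nrow = 4,
  dimnames = list(c("CD3E", "CD8A", "MS4A1", "CD19"), c("S1", "S2", "S3"))
)
markers <- list(T_cells = c("CD3E", "CD8A"), B_cells = c("MS4A1", "CD19"))
sig_scores <- append_signatures_sketch(xp, markers)
print(sig_scores)
```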
Build Mutation Matrices from MAF Data
Description
Build Mutation Matrices from MAF Data
Usage
.build_mut_matrices(mut, category)
Arguments
mut |
Data frame with mutation data. |
category |
Character. Mutation category. |
Value
List of mutation matrices.
Create Single Mutation Matrix
Description
Create Single Mutation Matrix
Usage
.create_mut_matrix(mut)
Arguments
mut |
Data frame with mutation data. |
Value
Mutation matrix.
Calculate AUC for Binomial Model
Description
Computes Area Under the ROC Curve (AUC) for model predictions using the ROCR package. Handles binary classification models from glmnet.
Usage
BinomialAUC(model, newx, s, acture.y)
Arguments
model |
Fitted glmnet model object. |
newx |
New data matrix for prediction. |
s |
Lambda value for prediction (e.g., "lambda.min" or numeric). |
acture.y |
Actual binary outcomes (numeric 0/1 or factor). |
Value
Numeric AUC value between 0 and 1.
Examples
if (requireNamespace("glmnet", quietly = TRUE) &&
requireNamespace("ROCR", quietly = TRUE)) {
set.seed(123)
train_data <- matrix(rnorm(100 * 5), ncol = 5)
train_outcome <- rbinom(100, 1, 0.5)
test_data <- matrix(rnorm(50 * 5), ncol = 5)
test_outcome <- rbinom(50, 1, 0.5)
fitted_model <- glmnet::cv.glmnet(train_data, train_outcome, family = "binomial", nfolds = 5)
auc_value <- BinomialAUC(fitted_model, test_data, fitted_model$lambda.min, test_outcome)
print(auc_value)
}
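For intuition about the returned value, the AUC of a binary classifier can also be computed without ROCR through the rank-sum (Mann-Whitney) identity. The sketch below is a swapped-in technique, not the package's implementation; `auc_by_ranks` is a hypothetical helper name, and the labels are assumed to be coded 0/1.

```r
# Hypothetical helper: AUC as the probability that a randomly chosen positive
# sample receives a higher score than a randomly chosen negative sample.
auc_by_ranks <- function(scores, labels) {
  labels <- as.integer(as.character(labels))  # accepts numeric 0/1 or factor
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # average ranks, so ties count as one half
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0, 0, 1, 1)
auc_toy <- auc_by_ranks(scores, labels)
print(auc_toy)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```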
Binomial Model Construction
Description
Constructs and evaluates binomial logistic regression models using Lasso and Ridge regularization. Processes input data, scales features if specified, splits data into training/testing sets, and fits both Lasso and Ridge models. Optionally generates AUC plots for model evaluation.
Usage
BinomialModel(
x,
y,
seed = 123456,
scale = TRUE,
train_ratio = 0.7,
nfold = 10,
plot = TRUE,
palette = "jama",
cols = NULL
)
Arguments
x |
A data frame containing sample ID and features. First column must be sample ID. |
y |
A data frame where first column is sample ID and second column is outcome (numeric or factor). |
seed |
Integer for random seed. Default is '123456'. |
scale |
Logical indicating whether to scale features. Default is 'TRUE'. |
train_ratio |
Numeric between 0 and 1 for training proportion. Default is '0.7'. |
nfold |
Integer for cross-validation folds. Default is '10'. |
plot |
Logical indicating whether to generate AUC plots. Default is 'TRUE'. |
palette |
Character string for color palette. Default is '"jama"'. |
cols |
Optional color vector for ROC curves. Default is 'NULL'. |
Value
List containing:
- lasso_result
Lasso model results
- ridge_result
Ridge model results
- train.x
Training data with IDs
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
x <- data.frame(
ID = paste0("Sample", 1:50),
Feature1 = rnorm(50),
Feature2 = rnorm(50),
Feature3 = rnorm(50)
)
y <- data.frame(
ID = x$ID,
Outcome = factor(rbinom(50, 1, 0.5))
)
result <- BinomialModel(x = x, y = y, plot = FALSE, nfold = 5)
str(result, max.level = 1)
CIBERSORT Deconvolution Algorithm
Description
An analytical tool to estimate cell type abundances in mixed cell populations using gene expression data.
Usage
CIBERSORT(
sig_matrix = NULL,
mixture_file,
perm,
QN = TRUE,
absolute = FALSE,
abs_method = "sig.score",
parallel = FALSE,
num_cores = 2,
seed = NULL
)
Arguments
sig_matrix |
Cell type GEP barcode matrix: row 1 = sample labels; column 1 = gene symbols; no missing values; default = LM22.txt download from CIBERSORT (https://cibersortx.stanford.edu/runcibersort.php) |
mixture_file |
GEP matrix: row 1 = sample labels; column 1 = gene symbols; no missing values |
perm |
Set permutations for statistical analysis (>=100 permutations recommended). |
QN |
Quantile normalization of input mixture (default = TRUE) |
absolute |
Run CIBERSORT in absolute mode (default = FALSE) - note that cell subsets will be scaled by their absolute levels and will not be represented as fractions (to derive the default output, normalize absolute levels such that they sum to 1 for each mixture sample) - the sum of all cell subsets in each mixture sample will be added to the output ('Absolute score'). If LM22 is used, this score will capture total immune content. |
abs_method |
If absolute is set to TRUE, choose method: 'no.sumto1' or 'sig.score' - sig.score = for each mixture sample, define S as the median expression level of all genes in the signature matrix divided by the median expression level of all genes in the mixture. Multiply cell subset fractions by S. - no.sumto1 = remove sum to 1 constraint |
parallel |
Logical. Enable parallel execution? (default = FALSE) |
num_cores |
Integer. Number of cores to use when parallel execution is enabled. |
seed |
Integer. Random seed for reproducible permutation testing.
If |
Value
A matrix object containing the estimated cibersort-cell fractions, p-values, correlation coefficients, and RMSE values.
Author(s)
Aaron M. Newman, Stanford University (amnewman@stanford.edu)
Examples
lm22 <- load_data("lm22")
common_genes <- rownames(lm22)[1:500]
sim_mixture <- as.data.frame(matrix(
rnorm(length(common_genes) * 10, mean = 5, sd = 2),
nrow = length(common_genes), ncol = 10
))
rownames(sim_mixture) <- common_genes
colnames(sim_mixture) <- paste0("Sample", 1:10)
result <- CIBERSORT(
sig_matrix = lm22,
mixture_file = sim_mixture,
perm = 10, QN = FALSE, absolute = FALSE,
parallel = FALSE
)
head(result)
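The note on absolute mode says that the default (relative) output can be recovered by normalizing the absolute levels so that they sum to 1 within each mixture sample. A minimal base R sketch of that normalization follows; the matrix `abs_scores` and its values are made up for illustration and are not CIBERSORT output.

```r
# Made-up absolute-mode scores: samples in rows, cell subsets in columns
abs_scores <- matrix(
  c(0.6, 0.3, 0.1,
    1.2, 0.4, 0.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Sample1", "Sample2"), c("B_cells", "T_cells", "NK_cells"))
)

# Sum of all cell subsets per sample (what the argument text calls the 'Absolute score')
absolute_score <- rowSums(abs_scores)

# Divide each row by its own sum to obtain relative fractions
fractions <- abs_scores / absolute_score
print(fractions)
print(rowSums(fractions))
```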
Calculate Performance Metrics
Description
Computes True Positive Rate (TPR) and False Positive Rate (FPR) for ROC analysis using the ROCR package. Used internally for ROC curve generation.
Usage
CalculatePref(model, newx, s, acture.y)
Arguments
model |
Fitted glmnet model. |
newx |
New data matrix for prediction. |
s |
Lambda value for prediction. |
acture.y |
Actual binary outcomes. |
Value
ROCR performance object containing TPR and FPR values.
Examples
if (requireNamespace("glmnet", quietly = TRUE) &&
requireNamespace("ROCR", quietly = TRUE)) {
fitted_model <- glmnet::cv.glmnet(matrix(rnorm(100), ncol = 2), rbinom(50, 1, 0.5), family = "binomial", nfolds = 3)
perf <- CalculatePref(fitted_model, matrix(rnorm(20), ncol = 2), "lambda.min", rbinom(10, 1, 0.5))
}
Calculate Time-Dependent ROC Curve
Description
Computes time-dependent ROC curve for survival models using the 'timeROC' package. Evaluates predictive accuracy at a specified time quantile.
Usage
CalculateTimeROC(model, newx, s, acture.y, modelname, time_prob = 0.9)
Arguments
model |
A fitted survival model object. |
newx |
A matrix or data frame of new data for prediction. |
s |
Lambda value for prediction. |
acture.y |
Data frame with 'time' and 'status' columns. |
modelname |
Character string for model identification. |
time_prob |
Numeric quantile for ROC calculation. Default is '0.9'. |
Value
An object of class 'timeROC' containing ROC curve information.
Author(s)
Dongqiang Zeng
Examples
if (requireNamespace("glmnet", quietly = TRUE) &&
requireNamespace("survival", quietly = TRUE) &&
requireNamespace("timeROC", quietly = TRUE)) {
library(survival)
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
dat$status <- dat$status - 1
x <- as.matrix(dat[, c("age", "sex", "ph.ecog")])
y <- Surv(dat$time, dat$status)
fit <- glmnet::glmnet(x, y, family = "cox")
actual_outcome <- data.frame(time = dat$time, status = dat$status)
roc_info <- CalculateTimeROC(
model = fit, newx = x, s = 0.01, acture.y = actual_outcome,
modelname = "glmnet Cox Model", time_prob = 0.5
)
print(roc_info$AUC)
}
Construct Contrast Matrix
Description
Creates a contrast matrix for differential analysis, where each phenotype level is contrasted against all other levels combined.
Usage
Construct_con(pheno)
Arguments
pheno |
Factor with different levels representing groups to contrast. |
Value
Square matrix with dimensions equal to the number of levels in 'pheno'. Each row represents a contrast where the corresponding level is compared against the average of others.
Examples
pheno <- factor(c("A", "B", "C", "D"))
contrast_matrix <- Construct_con(pheno)
print(contrast_matrix)
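One way to picture a "each level versus the average of the others" contrast is a square matrix with 1 on the diagonal and -1/(k - 1) elsewhere, so that every row sums to zero. The sketch below builds such a matrix in base R. The helper name is hypothetical and the exact coefficients used by Construct_con are not stated in this page, so treat the values as an assumption.

```r
# Hypothetical helper: one contrast per level, comparing that level (weight 1)
# against the mean of the remaining k - 1 levels (weight -1 / (k - 1) each).
construct_contrasts_sketch <- function(pheno) {
  lev <- levels(pheno)
  k <- length(lev)
  con <- matrix(-1 / (k - 1), nrow = k, ncol = k, dimnames = list(lev, lev))
  diag(con) <- 1
  con
}

con <- construct_contrasts_sketch(factor(c("A", "B", "C", "D")))
print(con)
print(rowSums(con))  # each contrast sums to zero
```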
Convert Rowname To Loci
Description
Processes a gene expression data matrix by modifying its row names. Extracts the gene identifier from row names formatted as 'GENE|ID', simplifying them to 'GENE'.
Usage
ConvertRownameToLoci(cancerGeneExpression)
Arguments
cancerGeneExpression |
Matrix or data frame. Gene expression data with row names in the format 'GENE|ID'. |
Value
Matrix with modified gene expression data with updated row names. Rows without a valid identifier are removed.
Examples
example_data <- matrix(runif(20), ncol = 5)
rownames(example_data) <- c("GENE1|101", "GENE2|102", "GENE3|103", "GENE4|104")
processed_data <- ConvertRownameToLoci(example_data)
print(processed_data)
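The row-name handling described above can be sketched in base R as follows. The helper `strip_id_suffix` is hypothetical, and "valid identifier" is assumed here to mean a non-empty gene part before the `|`; the package function may apply a different rule.

```r
# Hypothetical helper: keep the part before "|" as the gene name and
# drop rows whose gene part is empty or missing.
strip_id_suffix <- function(expr) {
  parts <- strsplit(rownames(expr), "|", fixed = TRUE)
  genes <- vapply(parts, function(p) p[1], character(1))
  keep <- !is.na(genes) & nzchar(genes)
  out <- expr[keep, , drop = FALSE]
  rownames(out) <- genes[keep]
  out
}

expr <- matrix(
  seq_len(8),
  nrow = 4,
  dimnames = list(c("TP53|7157", "EGFR|1956", "|100", "MYC|4609"), c("S1", "S2"))
)
cleaned <- strip_id_suffix(expr)
print(cleaned)
```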
Core Algorithm for CIBERSORT Deconvolution
Description
Performs nu-regression using support vector machines (SVM) to estimate cell type proportions. This is the core computational engine of CIBERSORT, using nu-SVR with linear kernel to decompose mixed gene expression signals.
Usage
CoreAlg(X, y, absolute, abs_method)
Arguments
X |
Matrix or data frame containing signature matrix (predictor variables). Rows are genes, columns are cell types. |
y |
Numeric vector containing the mixture sample expression (response variable). |
absolute |
Logical indicating whether to use absolute space for weights. Default is FALSE (relative proportions). |
abs_method |
String specifying the method for absolute space weights: '"sig.score"' or '"no.sumto1"'. |
Value
List containing:
- w
Estimated cell type weights/proportions
- mix_rmse
Root mean squared error of the deconvolution
- mix_r
Correlation coefficient between observed and predicted mixture
Examples
X <- matrix(rnorm(100), nrow = 10)
y <- rnorm(10)
result <- CoreAlg(X, y, absolute = FALSE, abs_method = "sig.score")
Constrained Least Squares Deconvolution
Description
Constrained Least Squares Deconvolution
Usage
DClsei(b, A, G, H, scaling)
Arguments
b |
Observation vector. |
A |
Signature matrix. |
G |
Constraint matrix. |
H |
Constraint vector. |
scaling |
Scaling factor. |
Value
Estimated cell fractions.
Robust Regression Deconvolution
Description
Robust Regression Deconvolution
Usage
DCrr(b, A, method, scaling)
Arguments
b |
Observation vector. |
A |
Signature matrix. |
method |
Robust regression method. |
scaling |
Scaling factor. |
Value
Estimated cell fractions.
Draw QQ Plot Comparing Cancer and Immune Expression
Description
Creates a quantile-quantile (QQ) plot to compare gene expression distributions between cancer and immune samples. Points along the diagonal indicate similar distributions.
Usage
DrawQQPlot(cancer.exp, immune.exp, name = "")
Arguments
cancer.exp |
Vector. Gene expression data for cancer samples. |
immune.exp |
Vector. Gene expression data for immune samples. |
name |
Character. Optional subtitle with additional information. |
Value
Generates a QQ plot.
Examples
cancer_exp <- rnorm(100, mean = 5, sd = 1.5)
immune_exp <- rnorm(100, mean = 5, sd = 1.5)
DrawQQPlot(
cancer.exp = cancer_exp,
immune.exp = immune_exp,
name = "Comparison of Gene Expression"
)
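The same kind of comparison can be reproduced with base R's `qqplot()`, which pairs the sorted quantiles of two samples. This is a sketch of the general technique, not the plotting code used by DrawQQPlot; the simulated vectors are illustrative only.

```r
set.seed(1)
cancer_exp <- rnorm(200, mean = 5, sd = 1.5)
immune_exp <- rnorm(200, mean = 5, sd = 1.5)

# plot.it = FALSE returns the matched quantiles without drawing
qq <- qqplot(cancer_exp, immune_exp, plot.it = FALSE)

# Draw the quantile pairs and the y = x reference line; points close to
# the line indicate similar distributions
plot(qq$x, qq$y,
  xlab = "Cancer expression quantiles",
  ylab = "Immune expression quantiles"
)
abline(0, 1, lty = 2)
```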
Estimate the proportion of immune and cancer cells.
Description
EPIC takes as input bulk gene expression data (RNA-seq) and returns
the proportion of mRNA and cells composing the various samples.
Usage
EPIC(
bulk,
reference = NULL,
mRNA_cell = NULL,
mRNA_cell_sub = NULL,
sigGenes = NULL,
scaleExprs = TRUE,
withOtherCells = TRUE,
constrainedSum = TRUE,
rangeBasedOptim = FALSE
)
Arguments
bulk |
A matrix ( |
reference |
(optional): A string or a list defining the reference cells.
It can take multiple formats:
- 'NULL': to use the default reference profiles and genes signature
|
mRNA_cell |
(optional): A named numeric vector giving (in arbitrary units) the amount of mRNA for each of the reference cells and for the other, uncharacterized (cancer) cells. Two names have a special meaning: "otherCells", used for the mRNA/cell value of the "other cells" from the sample (i.e. the cell types that don't have any reference gene expression profile); and "default", used for the mRNA/cell value of the cells from the reference profiles for which no specific value is given in mRNA_cell. For example, with mRNA_cell=c(Bcells=2, NKcells=2.1, otherCells=3.5, default=1) and refProfiles describing Bcells, NKcells and Tcells, a value of 3.5 is used for the "otherCells" that have no reference profile and the default value of 1 is used for the Tcells when computing the cell fractions. Note: if the data is in TPM, this mRNA per cell would ideally correspond to some number of transcripts per cell. |
mRNA_cell_sub |
(optional): This can be given instead of
|
sigGenes |
(optional): a character vector of the gene names to use as signature for the deconvolution. In principle this is given with the reference as the "reference$sigGenes" but if we give a value for this input variable, it is these signature genes that will be used instead of the ones given with the reference profile. |
scaleExprs |
(optional, default is TRUE): boolean telling whether the bulk samples and reference gene expression profiles should be rescaled based on the list of genes in common between them (such a rescaling is recommended). |
withOtherCells |
(optional, default is TRUE): if EPIC should allow for an additional cell type for which no gene expression reference profile is available or if the bulk is assumed to be composed only of the cells with reference profiles. |
constrainedSum |
(optional, default is TRUE): tells if the sum of all
cell types should be constrained to be < 1. When
|
rangeBasedOptim |
(optional): when this is FALSE (the default), the least square optimization is performed as described in Racle et al., 2017, eLife, which is recommended. When this variable is TRUE, EPIC uses the variability of each gene from the reference profiles in another way: instead of defining weights (based on the variability) for the fit of each gene, we define a range of values accessible for each gene (based on the gene expression value in the reference profile +/- the variability values). The error that the optimization tries to minimize is by how much the predicted gene expression is outside of this allowed range of values. |
Details
This function uses a constrained least square minimization to estimate the proportion of each cell type with a reference profile and another uncharacterized cell type in bulk gene expression samples.
The names of the genes in the bulk samples, the reference samples and the gene signature list need to be in the same format (gene symbols are used in the predefined reference profiles). The full list of gene names doesn't need to be exactly the same between the reference and bulk samples: EPIC will use the intersection of the genes. In case of duplicate gene names, EPIC will use the median value per duplicate; if you want to handle these cases differently, you can remove the duplicates before calling EPIC.
Value
A list of 3 matrices:
- 'mRNAProportions': (nSamples x (nCellTypes+1)) the
proportion of mRNA coming from all cell types with a ref profile + the
uncharacterized other cell
- 'cellFractions': (nSamples x (nCellTypes+1)) this gives the
proportion of cells from each cell type after accounting for the mRNA /
cell value
- 'fit.gof': (nSamples x 12) a matrix telling the quality for the
fit of the signature genes in each sample. It tells if the minimization
converged, and other info about this fit comparing the measured gene
expression in the sigGenes vs predicted gene expression in the sigGenes
Examples
melanoma_data <- load_data("melanoma_data")
TRef <- load_data("TRef")
res1 <- EPIC(melanoma_data$counts)
res1$cellFractions
res2 <- EPIC(melanoma_data$counts, TRef)
res3 <- EPIC(bulk = melanoma_data$counts, reference = TRef)
res4 <- EPIC(melanoma_data$counts, reference = "TRef")
res5 <- EPIC(melanoma_data$counts, mRNA_cell_sub = c(
Bcells = 1,
otherCells = 5
))
# Various possible ways of calling EPIC function. res 1 to 4 should
# give exactly the same outputs, and the elements res1$cellFractions
# should be equal to the example predictions found in
# melanoma_data$cellFractions.pred for these first 4 results.
# The values of cellFraction for res5 will be different due to the use of
# other mRNA per cell values for the B and other cells.
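The relationship between 'mRNAProportions' and 'cellFractions' can be illustrated with a short base R sketch that reuses the mRNA_cell values from the argument description. The helper name is hypothetical, and the final renormalization to a row sum of 1 is an assumption about how "accounting for the mRNA / cell value" is carried out, not a statement of EPIC's internal code.

```r
# Hypothetical helper: divide each mRNA proportion by the mRNA content per cell
# of that cell type, then renormalize so the cell fractions sum to 1 per sample.
mrna_to_cell_fractions <- function(mRNAProportions, mRNA_cell) {
  cell_types <- colnames(mRNAProportions)
  per_cell <- mRNA_cell[cell_types]
  # Cell types without a specific value fall back to the "default" entry
  per_cell[is.na(per_cell)] <- mRNA_cell["default"]
  names(per_cell) <- cell_types
  cells <- sweep(mRNAProportions, 2, per_cell, "/")
  cells / rowSums(cells)
}

mRNA_cell <- c(Bcells = 2, NKcells = 2.1, otherCells = 3.5, default = 1)
mRNAProportions <- matrix(
  c(0.2, 0.1, 0.3, 0.4),
  nrow = 1,
  dimnames = list("Sample1", c("Bcells", "NKcells", "Tcells", "otherCells"))
)
cell_fractions <- mrna_to_cell_fractions(mRNAProportions, mRNA_cell)
print(round(cell_fractions, 3))
```

In this toy case the Tcells share rises after conversion, because Tcells use the default value of 1 and therefore carry less mRNA per cell than the other types.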
Elastic Net Model Fitting
Description
Fits elastic net model with cross-validation to find optimal alpha and lambda. Searches across a grid of alpha values (0 to 1) and lambda values to minimize cross-validation error.
Usage
Enet(train.x, train.y, lambdamax, nfold = 10)
Arguments
train.x |
Training predictors matrix. |
train.y |
Training binary outcomes (0/1 or factor). |
lambdamax |
Maximum lambda value for the grid search. |
nfold |
Number of CV folds. Default is '10'. |
Value
List containing:
- chose_alpha
Optimal alpha value (0-1)
- chose_lambda
Optimal lambda value
Examples
if (requireNamespace("glmnet", quietly = TRUE)) {
set.seed(123)
train_data <- matrix(rnorm(50 * 5), ncol = 5)
train_outcome <- rbinom(50, 1, 0.5)
result <- Enet(train.x = train_data, train.y = train_outcome, lambdamax = 1, nfold = 5)
}
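The alpha/lambda search described above can be approximated with `glmnet::cv.glmnet()` by cross-validating each alpha on the same folds and keeping the combination with the lowest error. This is a rough sketch under assumptions: the alpha grid, the shared `foldid`, and the simulated data are illustrative choices, and the `lambdamax` argument of Enet is not used here.

```r
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(123)
  x <- matrix(rnorm(100 * 5), ncol = 5)
  y <- rbinom(100, 1, plogis(x[, 1]))  # outcome depends on the first feature

  # Use the same folds for every alpha so the CV errors are comparable
  foldid <- sample(rep(1:5, length.out = nrow(x)))
  alphas <- seq(0, 1, by = 0.25)

  fits <- lapply(alphas, function(a) {
    glmnet::cv.glmnet(x, y, family = "binomial", alpha = a, foldid = foldid)
  })

  # Lowest cross-validated deviance reached along each lambda path
  cv_err <- vapply(fits, function(f) min(f$cvm), numeric(1))
  best <- which.min(cv_err)

  chosen <- list(
    chose_alpha = alphas[best],
    chose_lambda = fits[[best]]$lambda.min
  )
  print(chosen)
}
```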
Constrained Regression Method (Abbas et al., 2009)
Description
Implements a constrained regression approach described by Abbas et al. (2009). Estimates proportions of immune cell types within mixed cancer tissue samples based on gene expression data. Iteratively adjusts regression coefficients to ensure non-negative values.
Usage
GetFractions.Abbas(XX, YY, w = NA)
Arguments
XX |
Matrix. Immune expression data with genes as rows and cell types as columns. |
YY |
Vector. Cancer expression data with gene expression levels. |
w |
Vector or NA. Weights for regression. Default is NA (no weights). |
Value
Vector with non-negative coefficients representing proportions of each cell type.
Examples
XX <- matrix(runif(100), nrow = 10, ncol = 10)
colnames(XX) <- paste("CellType", 1:10, sep = "")
YY <- runif(10)
results <- GetFractions.Abbas(XX, YY)
print(results)
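The iterative adjustment described above can be sketched as a loop that fits an unweighted least-squares model without intercept, removes the cell type with the most negative coefficient, and refits until no negative coefficient remains. The helper `abbas_sketch` is hypothetical and omits the optional weights; it illustrates the idea and does not reproduce the package code.

```r
# Hypothetical helper: non-negative coefficients by iterative removal
abbas_sketch <- function(XX, YY) {
  active <- colnames(XX)
  while (length(active) > 0) {
    fit <- lsfit(XX[, active, drop = FALSE], YY, intercept = FALSE)
    beta <- fit$coefficients
    names(beta) <- active
    if (all(beta >= 0)) break
    # Drop the cell type with the most negative coefficient and refit
    active <- active[-which.min(beta)]
  }
  # Removed cell types are reported with a coefficient of zero
  out <- setNames(numeric(ncol(XX)), colnames(XX))
  if (length(active) > 0) out[active] <- beta
  out
}

set.seed(42)
XX <- matrix(runif(60), nrow = 20, ncol = 3,
  dimnames = list(NULL, paste0("CellType", 1:3))
)
truth <- c(0.6, 0.4, 0)
YY <- as.vector(XX %*% truth) + rnorm(20, sd = 0.01)
est <- abbas_sketch(XX, YY)
print(est)
```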
Get Outlier Genes
Description
Identifies outlier genes from multiple cancer datasets. Treats the top 5 expressed genes in each sample as outliers and returns unique outlier genes.
Usage
GetOutlierGenes(cancers)
Arguments
cancers |
Data frame. One column containing paths to gene expression files. |
Value
Vector of unique gene names identified as outliers.
Examples
tf1 <- tempfile(fileext = ".tsv")
tf2 <- tempfile(fileext = ".tsv")
expr1 <- data.frame(
gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"),
Sample1 = c(10, 50, 30, 80, 60, 20),
Sample2 = c(15, 40, 25, 90, 55, 10)
)
expr2 <- data.frame(
gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"),
Sample3 = c(100, 20, 10, 60, 30, 80),
Sample4 = c(95, 25, 15, 70, 35, 85)
)
write.table(expr1, tf1, sep = "\t", row.names = FALSE, quote = FALSE)
write.table(expr2, tf2, sep = "\t", row.names = FALSE, quote = FALSE)
cancers <- data.frame(ExpressionFiles = c(tf1, tf2))
outlier_genes <- GetOutlierGenes(cancers)
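The selection rule (top expressed genes per sample, then the union across samples) can be sketched on an in-memory matrix. The helper name `top_expressed_genes` is hypothetical, and `n = 2` is used below only because the toy matrix has six genes; the function documented above uses the top 5.

```r
# Hypothetical helper: collect the n most highly expressed genes of every
# sample and return the unique set.
top_expressed_genes <- function(expr, n = 5) {
  n <- min(n, nrow(expr))
  per_sample <- lapply(seq_len(ncol(expr)), function(j) {
    rownames(expr)[order(expr[, j], decreasing = TRUE)[seq_len(n)]]
  })
  unique(unlist(per_sample))
}

expr <- cbind(
  Sample1 = c(10, 50, 30, 80, 60, 20),
  Sample2 = c(15, 40, 25, 90, 55, 10),
  Sample3 = c(100, 20, 10, 60, 30, 80),
  Sample4 = c(95, 25, 15, 70, 35, 85)
)
rownames(expr) <- paste0("Gene", LETTERS[1:6])
outliers <- top_expressed_genes(expr, n = 2)
print(outliers)
```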
Calculate Immunophenoscore (IPS)
Description
Calculates Immunophenoscore (IPS) from gene expression data. IPS is a composite score measuring immunophenotype based on four major categories: MHC molecules, immunomodulators, effector cells, and suppressor cells.
Usage
IPS_calculation(project = NULL, eset, plot = FALSE)
Arguments
project |
Character string for project identifier. Default is NULL. |
eset |
Gene expression matrix with official human gene symbols (HGNC) as rownames. Expression values should be log2(TPM+1) or will be transformed if max value > 100. |
plot |
Logical. Whether to generate immunophenogram plots. Default is FALSE. |
Value
Data frame containing:
- MHC
MHC molecules score
- EC
Effector cells score
- SC
Suppressor cells score
- CP
Checkpoints/Immunomodulators score
- AZ
Aggregate score (sum of MHC, CP, EC, SC)
- IPS
Immunophenoscore (0-10 scale)
Examples
# IPS requires gene symbols as rownames
# Create a simple example with gene symbols
example_genes <- c(
"HLA-A", "HLA-B", "HLA-C", "CD274", "PDCD1", "CTLA4",
"CD8A", "CD8B", "GZMB", "PRF1", "FOXP3", "IL10"
)
sim_eset <- as.data.frame(matrix(
rnorm(length(example_genes) * 10, mean = 5, sd = 2),
nrow = length(example_genes), ncol = 10
))
rownames(sim_eset) <- example_genes
colnames(sim_eset) <- paste0("Sample", 1:10)
ips_result <- IPS_calculation(eset = sim_eset, project = "Example", plot = FALSE)
head(ips_result)
Calculate Ligand-Receptor Interaction Scores
Description
Quantifies ligand-receptor interactions in the tumor microenvironment from bulk gene expression data using the easier package. This function processes raw counts or TPM data and computes interaction scores for each sample.
Usage
LR_cal(
eset,
data_type = c("count", "tpm"),
id_type = "ensembl",
cancer_type = "pancan"
)
Arguments
eset |
Gene expression matrix with genes as rows and samples as columns. |
data_type |
Type of input data. Options are '"count"' or '"tpm"'. If '"count"', data will be converted to TPM before analysis. |
id_type |
Type of gene identifier. Default is '"ensembl"'. |
cancer_type |
Character string specifying the cancer type for easier. Default is '"pancan"' for pan-cancer analysis. |
Value
Data frame containing ligand-receptor interaction scores with sample IDs as row names.
References
Lapuente-Santana, van Genderen, M., Hilbers, P., Finotello, F., & Eduati, F. (2021). Interpretable systems biomarkers predict response to immune-checkpoint inhibitors. Patterns (New York, N.Y.), 2(8), 100293. https://doi.org/10.1016/j.patter.2021.100293
Examples
# LR_cal requires HGNC gene symbols as rownames
# Create a simple example with gene symbols
example_genes <- c(
"TGFB1", "EGFR", "VEGFA", "PDGFB", "FGF2", "CXCL12",
"CXCR4", "IL6", "IL6R", "TNF", "TNFRSF1A", "IFNG"
)
sim_eset <- as.data.frame(matrix(
rnorm(length(example_genes) * 10, mean = 5, sd = 2),
nrow = length(example_genes), ncol = 10
))
rownames(sim_eset) <- example_genes
colnames(sim_eset) <- paste0("Sample", 1:10)
if (requireNamespace("easier", quietly = TRUE)) {
lr <- LR_cal(eset = sim_eset, data_type = "tpm")
head(lr)
}
MCP-counter Cell Population Abundance Estimation
Description
Estimates the abundance of different immune and stromal cell populations using the MCP-counter method. Works with various gene identifiers including Affymetrix probesets, HUGO gene symbols, Entrez IDs, and Ensembl IDs.
Usage
MCPcounter.estimate(
expression,
featuresType = c("affy133P2_probesets", "HUGO_symbols", "ENTREZ_ID", "ENSEMBL_ID"),
probesets = read.table(system.file("extdata/probesets.txt", package = "IOBR"), sep =
"\t", stringsAsFactors = FALSE, colClasses = "character"),
genes = read.table(system.file("extdata/genes.txt", package = "IOBR"), sep = "\t",
stringsAsFactors = FALSE, header = TRUE, colClasses = "character", check.names =
FALSE)
)
Arguments
expression |
Matrix or data.frame with features in rows and samples in columns. |
featuresType |
Type of identifiers for expression features. Options: "affy133P2_probesets", "HUGO_symbols", "ENTREZ_ID", "ENSEMBL_ID". Default is "affy133P2_probesets". |
probesets |
Probesets data table. Default is read from 'extdata/probesets.txt' in the installed IOBR package. |
genes |
Genes data table. Default is read from 'extdata/genes.txt' in the installed IOBR package. |
Value
Matrix with cell populations in rows and samples in columns.
Author(s)
Etienne Becht
Examples
expr <- matrix(runif(1000), nrow = 100, ncol = 10)
rownames(expr) <- paste0("Gene", 1:100)
estimates <- MCPcounter.estimate(expr, featuresType = "HUGO_symbols")
Parse Input Gene Expression Data
Description
Reads gene expression data from a tab-delimited text file, using the first column as row names. Converts data into a numeric matrix for analysis.
Usage
ParseInputExpression(path)
Arguments
path |
Character. Path to a tab-delimited gene expression file. First column should contain gene identifiers. |
Value
Numeric matrix of gene expression values with genes as rows and samples as columns.
Examples
tf <- tempfile(fileext = ".tsv")
expr <- data.frame(
gene = c("GeneA", "GeneB", "GeneC"),
Sample1 = c(10, 20, 30),
Sample2 = c(15, 25, 35)
)
write.table(expr, tf, sep = "\t", row.names = FALSE, quote = FALSE)
gene_expression_data <- ParseInputExpression(tf)
print(gene_expression_data)
Plot AUC ROC Curves
Description
Generates ROC curves for model evaluation comparing training and testing performance at both lambda.min and lambda.1se. Creates a ggplot visualization with AUC values in the legend.
Usage
PlotAUC(
train.x,
train.y,
test.x,
test.y,
model,
modelname,
cols = NULL,
palette = "jama"
)
Arguments
train.x |
Training predictors matrix. |
train.y |
Training outcomes (binary factor). |
test.x |
Testing predictors matrix. |
test.y |
Testing outcomes (binary factor). |
model |
Fitted cv.glmnet model. |
modelname |
Character string for plot title. |
cols |
Optional color vector for ROC curves. |
palette |
Color palette name from IOBR palettes. Default is '"jama"'. |
Value
ggplot object of ROC curves.
Examples
if (requireNamespace("glmnet", quietly = TRUE)) {
set.seed(123)
train_data <- matrix(rnorm(100 * 5), ncol = 5)
train_outcome <- rbinom(100, 1, 0.5)
test_data <- matrix(rnorm(50 * 5), ncol = 5)
test_outcome <- rbinom(50, 1, 0.5)
fitted_model <- glmnet::cv.glmnet(train_data, train_outcome, family = "binomial", nfolds = 5)
p <- PlotAUC(train_data, train_outcome, test_data, test_outcome, fitted_model, "MyModel")
print(p)
}
Plot Time-Dependent ROC Curves
Description
Generates time-dependent ROC curves for evaluating prognostic accuracy of survival models. Plots training and testing ROC curves at the 90th percentile survival time.
Usage
PlotTimeROC(
train.x,
train.y,
test.x,
test.y,
model,
modelname,
cols = NULL,
palette = "jama"
)
Arguments
train.x |
Matrix or data frame of training predictors. |
train.y |
Training survival outcomes (time and status). |
test.x |
Matrix or data frame of testing predictors. |
test.y |
Testing survival outcomes (time and status). |
model |
Fitted survival model object. |
modelname |
Character string for model identification. |
cols |
Optional vector of colors for plotting. |
palette |
Character string specifying color palette. Default is '"jama"'. |
Value
A 'ggplot' object representing the ROC curve plot.
Author(s)
Dongqiang Zeng
Examples
if (requireNamespace("glmnet", quietly = TRUE) &&
requireNamespace("survival", quietly = TRUE) &&
requireNamespace("timeROC", quietly = TRUE)) {
library(survival)
set.seed(123)
train_x <- matrix(rnorm(100 * 5), ncol = 5)
train_y <- data.frame(time = rexp(100), status = rbinom(100, 1, 0.5))
test_x <- matrix(rnorm(50 * 5), ncol = 5)
test_y <- data.frame(time = rexp(50), status = rbinom(50, 1, 0.5))
fit <- glmnet::cv.glmnet(train_x, Surv(train_y$time, train_y$status), family = "cox")
p <- PlotTimeROC(train_x, train_y, test_x, test_y, fit, "Cox Model")
print(p)
}
Process Data for Model Construction
Description
Preprocesses data for binomial or survival analysis. Aligns and filters data based on sample IDs, optionally scales data, and ensures appropriate data types. Handles missing values by removing columns with NA values.
Usage
ProcessingData(x, y, scale, type = c("binomial", "survival"))
Arguments
x |
Data frame of predictors with first column as IDs. |
y |
Data frame of outcomes with first column as IDs. For survival, expects two additional columns for time and status. |
scale |
Logical indicating whether to scale predictors. |
type |
Character string: '"binomial"' or '"survival"'. |
Value
List containing:
- x_scale
Processed predictor matrix
- y
Processed outcome variable
- x_ID
Sample IDs
Examples
x <- data.frame(ID = 1:10, predictor1 = rnorm(10), predictor2 = rnorm(10))
y <- data.frame(ID = 1:10, outcome = sample(c(0, 1), 10, replace = TRUE))
result <- ProcessingData(x, y, scale = TRUE, type = "binomial")
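The preprocessing steps listed in the description (align by sample ID, drop predictors with missing values, optionally scale) can be sketched in base R for the binomial case. The helper `prepare_xy_sketch` is hypothetical; the order of operations and the handling of edge cases in ProcessingData may differ.

```r
# Hypothetical helper: align x and y on shared IDs, remove predictor columns
# containing NA, and optionally scale the remaining predictors.
prepare_xy_sketch <- function(x, y, scale = TRUE) {
  ids <- intersect(x[[1]], y[[1]])
  x <- x[match(ids, x[[1]]), , drop = FALSE]
  y <- y[match(ids, y[[1]]), , drop = FALSE]
  feats <- as.matrix(x[, -1, drop = FALSE])
  feats <- feats[, colSums(is.na(feats)) == 0, drop = FALSE]
  if (scale) feats <- base::scale(feats)
  list(x_scale = feats, y = y[[2]], x_ID = ids)
}

x <- data.frame(
  ID = c("s1", "s2", "s3", "s4"),
  f1 = c(1, 2, 3, 4),
  f2 = c(NA, 1, 2, 3),   # contains NA, so this column is dropped
  f3 = c(2, 4, 6, 9),
  stringsAsFactors = FALSE
)
y <- data.frame(
  ID = c("s4", "s2", "s1", "s5"),
  outcome = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)
prepared <- prepare_xy_sketch(x, y, scale = TRUE)
str(prepared)
```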
Calculate Time-Dependent AUC for Survival Models
Description
Evaluates prognostic ability of a survival model by calculating time-dependent AUC at the 30th and 90th percentiles of survival time. These thresholds assess short-term and long-term predictive accuracy.
Usage
PrognosticAUC(model, newx, s, acture.y)
Arguments
model |
A fitted survival model object capable of generating risk scores. |
newx |
A matrix or data frame of new data for prediction. |
s |
Lambda value for prediction. Can be numeric or '"lambda.min"'/'"lambda.1se"'. |
acture.y |
Data frame with 'time' and 'status' columns. |
Value
A data frame with AUC values at 30th ('probs.3') and 90th ('probs.9') percentiles.
Author(s)
Dongqiang Zeng
Examples
if (requireNamespace("glmnet", quietly = TRUE) &&
requireNamespace("survival", quietly = TRUE) &&
requireNamespace("timeROC", quietly = TRUE)) {
library(survival)
set.seed(123)
x <- matrix(rnorm(100 * 5), ncol = 5)
y <- Surv(rexp(100), rbinom(100, 1, 0.5))
fit <- glmnet::cv.glmnet(x, y, family = "cox")
acture_y <- data.frame(time = y[, 1], status = y[, 2])
auc_results <- PrognosticAUC(fit, newx = x, s = "lambda.min", acture.y = acture_y)
}
Build Prognostic Models Using LASSO and Ridge Regression
Description
Prepares data, splits it into training and testing sets, and fits LASSO and Ridge regression models for survival analysis. Evaluates model performance using cross-validation and optionally generates time-dependent ROC curves for visual assessment of predictive accuracy.
Usage
PrognosticModel(
x,
y,
scale = FALSE,
seed = 123456,
train_ratio = 0.7,
nfold = 10,
plot = TRUE,
palette = "jama",
cols = NULL
)
Arguments
x |
A matrix or data frame of predictor variables (features). |
y |
A data frame of survival outcomes with two columns: survival time and event status. |
scale |
Logical indicating whether to scale predictor variables. Default is 'FALSE'. |
seed |
Integer seed for random number generation to ensure reproducibility. Default is '123456'. |
train_ratio |
Numeric proportion of data for training (e.g., 0.7). Default is '0.7'. |
nfold |
Integer number of folds for cross-validation. Default is '10'. |
plot |
Logical indicating whether to plot ROC curves. Default is 'TRUE'. |
palette |
String specifying color palette. Default is '"jama"'. |
cols |
Optional vector of colors for ROC curves. If 'NULL', uses default palette. |
Value
A list containing:
- lasso_result
Results from LASSO model including coefficients and AUC
- ridge_result
Results from Ridge model including coefficients and AUC
- train.x
Training data with sample IDs
Author(s)
Dongqiang Zeng
Examples
if (requireNamespace("glmnet", quietly = TRUE) &&
  requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  imvigor210_sig <- load_data("imvigor210_sig")
  imvigor210_pdata <- load_data("imvigor210_pdata")
  pdata_prog <- data.frame(
    ID = imvigor210_pdata$ID,
    OS_days = as.numeric(imvigor210_pdata$OS_days),
    OS_status = as.numeric(imvigor210_pdata$OS_status)
  )
  prognostic_result <- PrognosticModel(
    x = imvigor210_sig, y = pdata_prog,
    scale = TRUE, seed = 123456,
    train_ratio = 0.7, nfold = 10, plot = FALSE
  )
}
Compute Prognostic Results for Survival Models
Description
Computes and compiles prognostic results from a survival model fitted with 'glmnet'. Extracts model coefficients at optimal lambda values ('lambda.min' and 'lambda.1se') and calculates time-dependent AUC metrics for both training and testing datasets.
Usage
PrognosticResult(model, train.x, train.y, test.x, test.y)
Arguments
model |
A fitted survival model object (e.g., from 'glmnet::cv.glmnet'). |
train.x |
Matrix or data frame of training predictors. |
train.y |
Training dataset survival outcomes (time and status). |
test.x |
Matrix or data frame of testing predictors. |
test.y |
Testing dataset survival outcomes (time and status). |
Value
A list containing:
- model
The fitted model object
- coefs
Data frame of coefficients at 'lambda.min' and 'lambda.1se'
- AUC
Data frame with AUC values for train/test at both lambda values
Author(s)
Dongqiang Zeng
Examples
if (requireNamespace("glmnet", quietly = TRUE) &&
  requireNamespace("survival", quietly = TRUE) &&
  requireNamespace("timeROC", quietly = TRUE)) {
  library(survival)
  set.seed(123)
  train_x <- matrix(rnorm(100 * 10), ncol = 10)
  train_y <- data.frame(time = rexp(100), status = rbinom(100, 1, 0.5))
  test_x <- matrix(rnorm(50 * 10), ncol = 10)
  test_y <- data.frame(time = rexp(50), status = rbinom(50, 1, 0.5))
  fit <- glmnet::cv.glmnet(train_x, Surv(train_y$time, train_y$status), family = "cox")
  results <- PrognosticResult(
    model = fit, train.x = train_x, train.y = train_y,
    test.x = test_x, test.y = test_y
  )
}
Regression Result Computation
Description
Computes regression results with coefficients at lambda.min and lambda.1se, and evaluates AUC for binomial outcomes. Returns a comprehensive summary of model performance on both training and testing datasets.
Usage
RegressionResult(train.x, train.y, test.x, test.y, model)
Arguments
train.x |
Training predictors matrix. |
train.y |
Training outcomes (binary factor). |
test.x |
Testing predictors matrix. |
test.y |
Testing outcomes (binary factor). |
model |
Fitted cv.glmnet model object. |
Value
List containing:
- model
The fitted cv.glmnet model
- coefs
Data frame with feature names and coefficients at lambda.min and lambda.1se
- AUC
Matrix of AUC values for train/test sets at both lambda values
Examples
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(123)
  train_data <- matrix(rnorm(100 * 10), ncol = 10)
  train_outcome <- rbinom(100, 1, 0.5)
  test_data <- matrix(rnorm(50 * 10), ncol = 10)
  test_outcome <- rbinom(50, 1, 0.5)
  fitted_model <- glmnet::cv.glmnet(train_data, train_outcome, family = "binomial", nfolds = 5)
  results <- RegressionResult(
    train.x = train_data, train.y = train_outcome,
    test.x = test_data, test.y = test_outcome, model = fitted_model
  )
}
Remove Batch Effect of Expression Set
Description
Removes batch effects between two gene expression datasets, typically representing different sample types such as cancer cells and immune cells. Uses ComBat from the sva package for batch correction.
Usage
RemoveBatchEffect(cancer.exp, immune.exp, immune.cellType)
Arguments
cancer.exp |
Matrix or data frame. Cancer cell expression data with genes as rows and samples as columns. |
immune.exp |
Matrix or data frame. Immune cell expression data with genes as rows and samples as columns. |
immune.cellType |
Vector. Cell type for each column in 'immune.exp'. |
Value
A list containing:
- 1
Batch effect corrected cancer expression data
- 2
Batch effect corrected immune expression data
- 3
Aggregated immune expression data (median per cell type)
Author(s)
Bo Li
Examples
set.seed(123)
gene_names <- paste0("Gene", 1:100)
sample_names_cancer <- paste0("CancerSample", 1:10)
cancer.exp <- matrix(runif(1000, 1, 1000),
nrow = 100, ncol = 10,
dimnames = list(gene_names, sample_names_cancer)
)
sample_names_immune <- paste0("ImmuneSample", 1:5)
immune.exp <- matrix(runif(500, 1, 1000),
nrow = 100, ncol = 5,
dimnames = list(gene_names, sample_names_immune)
)
immune.cellType <- c("T-cell", "B-cell", "T-cell", "NK-cell", "B-cell")
names(immune.cellType) <- sample_names_immune
result <- RemoveBatchEffect(cancer.exp, immune.exp, immune.cellType)
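The third element of the returned list is described as a per-cell-type median. A minimal base R sketch of that aggregation step follows; it does not perform the ComBat correction, and the object names 'idx_by_type' and 'immune_median' are illustrative rather than taken from the package.

```r
# Sketch: collapse immune samples to one median profile per cell type.
# This covers only the aggregation step, not batch correction.
set.seed(123)
immune.exp <- matrix(runif(500, 1, 1000),
  nrow = 100, ncol = 5,
  dimnames = list(paste0("Gene", 1:100), paste0("ImmuneSample", 1:5))
)
immune.cellType <- c("T-cell", "B-cell", "T-cell", "NK-cell", "B-cell")

# Column indices grouped by cell type
idx_by_type <- split(seq_along(immune.cellType), immune.cellType)

# Row-wise median within each cell type (genes x cell types)
immune_median <- sapply(idx_by_type, function(idx) {
  apply(immune.exp[, idx, drop = FALSE], 1, median)
})
dim(immune_median)
```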
Split Data into Training and Testing Sets
Description
Divides dataset into training and testing sets using random sampling. Maintains data integrity for both binomial and survival analysis types.
Usage
SplitTrainTest(x, y, train_ratio, type = c("binomial", "survival"), seed)
Arguments
x |
Predictor matrix or data frame. |
y |
Outcome vector (binomial) or matrix with time/status (survival). |
train_ratio |
Proportion of samples assigned to the training set (between 0 and 1), e.g. '0.7'. |
type |
Analysis type: '"binomial"' or '"survival"'. |
seed |
Random seed for reproducibility. |
Value
List containing:
- train.x
Training predictors matrix
- train.y
Training outcomes
- test.x
Testing predictors matrix
- test.y
Testing outcomes
- train_sample
Indices of training samples
Examples
data_matrix <- matrix(rnorm(200), ncol = 2)
outcome_vector <- rbinom(100, 1, 0.5)
split_data <- SplitTrainTest(
data_matrix, outcome_vector,
train_ratio = 0.7,
type = "binomial", seed = 123
)
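For orientation, a seeded random split of the kind described above can be written in a few lines of base R. This is a sketch for the binomial case only, under the assumption that samples are drawn without stratification; the package function may differ in detail, and 'split_sketch' is an illustrative name.

```r
# Sketch: seeded random split into training and testing sets.
# Assumption: simple random sampling without stratification.
set.seed(123)
x <- matrix(rnorm(200), ncol = 2)
y <- rbinom(100, 1, 0.5)
train_ratio <- 0.7

n_train <- floor(nrow(x) * train_ratio)
train_sample <- sample(seq_len(nrow(x)), size = n_train)

split_sketch <- list(
  train.x = x[train_sample, , drop = FALSE],
  train.y = y[train_sample],
  test.x = x[-train_sample, , drop = FALSE],
  test.y = y[-train_sample],
  train_sample = train_sample
)
sapply(split_sketch[c("train.x", "test.x")], nrow)
```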
Top Probe Selector
Description
Extracts the top 'i' probes based on their ordering in the provided data frame. If the number of rows is less than or equal to 'i', returns all probes.
Usage
Top_probe(dat, i)
Arguments
dat |
Data frame containing a column named "probe". |
i |
Integer. Number of top probes to return. |
Value
Character vector containing the names of the top 'i' probes.
Examples
dat <- data.frame(
probe = c("Probe1", "Probe2", "Probe3", "Probe4", "Probe5"),
value = c(5, 3, 2, 4, 1)
)
top_probes <- Top_probe(dat, 3)
print(top_probes)
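The behaviour described (first 'i' probes in row order, or all probes when fewer rows exist) matches what base R 'head()' does on a vector. The helper below is a hypothetical stand-in written for illustration, not the package source.

```r
# Sketch: hypothetical equivalent of the described behaviour.
# head() returns all elements when the vector is shorter than i.
top_probe_sketch <- function(dat, i) {
  head(as.character(dat$probe), i)
}

dat <- data.frame(
  probe = c("Probe1", "Probe2", "Probe3", "Probe4", "Probe5"),
  value = c(5, 3, 2, 4, 1)
)
top_probe_sketch(dat, 3)
top_probe_sketch(dat, 10)
```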
Add Custom Download Mirror
Description
Adds a custom mirror URL to the default mirrors for the current session. The mirror URL should be a base URL that will be prepended to GitHub paths.
Usage
add_iobr_mirror(url, position = c("first", "last", "before_github"))
Arguments
url |
Character string. The mirror URL to add. |
position |
Character. Where to add the mirror: "first", "last", or "before_github". Default: "first". |
Value
Invisibly returns the updated mirror list.
Examples
# Add a custom mirror to try first
add_iobr_mirror("https://my-mirror.com/https://github.com")
# Add mirror to try before default GitHub
add_iobr_mirror("https://fast-mirror.org", position = "before_github")
# Download with the new mirror
data <- download_iobr_data("BRef")
Add Risk Score to Dataset
Description
Computes a risk score for each observation based on Cox proportional hazards regression or binary logistic regression. The function fits the specified model and returns the dataset with an added risk score column.
Usage
add_riskscore(
input,
family = c("cox", "binary"),
target = NULL,
time = NULL,
status = NULL,
vars,
new_var_name = "riskscore"
)
Arguments
input |
Data frame containing the variables for analysis. |
family |
Character string specifying the model family: '"cox"' for Cox proportional hazards regression or '"binary"' for logistic regression. Default is '"cox"'. |
target |
Character string specifying the target variable name. Required when 'family = "binary"'. |
time |
Character string specifying the time-to-event variable name. Required when 'family = "cox"'. |
status |
Character string specifying the event status variable name. Required when 'family = "cox"'. |
vars |
Character vector of variable names to include in the model. |
new_var_name |
Character string specifying the name for the new risk score column. Default is '"riskscore"'. |
Value
Data frame identical to 'input' with an additional column containing risk scores (linear predictors for Cox models or predicted probabilities for logistic models).
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
input_data <- data.frame(
time = rexp(100),
status = rbinom(100, 1, 0.5),
age = rnorm(100, 60, 10),
score1 = rnorm(100),
score2 = rnorm(100)
)
result <- add_riskscore(
input_data,
time = "time", status = "status",
vars = c("age", "score1", "score2")
)
head(result$riskscore)
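The two kinds of risk score named in the Value section (linear predictor for Cox, predicted probability for logistic regression) can be reproduced with the 'survival' and 'stats' packages. The sketch below is one plausible way to obtain them and is not taken from the package source; the 'response' column and the 'riskscore_bin' name are made up for the example.

```r
# Sketch: risk scores from a Cox model and from a logistic model.
library(survival)

set.seed(123)
input_data <- data.frame(
  time = rexp(100),
  status = rbinom(100, 1, 0.5),
  age = rnorm(100, 60, 10),
  score1 = rnorm(100),
  score2 = rnorm(100)
)
vars <- c("age", "score1", "score2")

# Cox model: the risk score is the linear predictor
cox_formula <- as.formula(
  paste("Surv(time, status) ~", paste(vars, collapse = " + "))
)
cox_fit <- coxph(cox_formula, data = input_data)
input_data$riskscore <- predict(cox_fit, type = "lp")

# Logistic model: the risk score is the predicted probability
# ('response' is a simulated binary outcome, for illustration only)
input_data$response <- rbinom(100, 1, 0.5)
bin_fit <- glm(reformulate(vars, response = "response"),
  data = input_data, family = binomial()
)
input_data$riskscore_bin <- predict(bin_fit, type = "response")

head(input_data[, c("riskscore", "riskscore_bin")])
```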
Annotate Gene Expression Matrix and Remove Duplicated Genes
Description
Annotates an expression matrix with gene symbols using provided annotation data, filters out missing or invalid symbols, handles duplicate gene entries, and removes uninformative rows. The function supports multiple aggregation methods for resolving duplicate gene symbols.
Usage
anno_eset(
eset,
annotation,
symbol = "symbol",
probe = "probe_id",
method = "mean"
)
Arguments
eset |
Expression matrix or ExpressionSet object containing gene expression data. |
annotation |
Data frame containing annotation information for probes. Built-in options include 'anno_hug133plus2', 'anno_rnaseq', and 'anno_illumina'. |
symbol |
Character string specifying the column name in 'annotation' that represents gene symbols. Default is '"symbol"'. |
probe |
Character string specifying the column name in 'annotation' that represents probe identifiers. Default is '"probe_id"'. |
method |
Character string specifying the aggregation method for duplicate gene symbols. Options are '"mean"', '"sum"', or '"sd"'. Default is '"mean"'. |
Details
The function performs the following operations:
- Filters probes with missing symbols or labeled as '"NA_NA"'
- Matches probes between expression set and annotation data
- Merges annotation with expression data
- Handles duplicate gene symbols using specified aggregation method
- Removes rows with all zeros, all NAs, or missing values in the first column
Value
Annotated and cleaned gene expression matrix with gene symbols as row names.
Author(s)
Dongqiang Zeng
Examples
# Annotate Affymetrix microarray data
eset_gse62254 <- load_data("eset_gse62254")
anno_hug133plus2 <- load_data("anno_hug133plus2")
eset <- anno_eset(eset = eset_gse62254, annotation = anno_hug133plus2)
head(eset)
# Annotate RNA-seq data with Ensembl IDs
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
head(eset)
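The duplicate-handling step listed in Details can be illustrated on a toy matrix. This is a base R sketch of the '"mean"' aggregation only, assuming probes without a symbol are dropped first; the toy objects are invented for the example and the package may implement the step differently.

```r
# Sketch: map probes to symbols and average duplicated symbols.
eset_toy <- matrix(
  c(
    1, 2, 3,
    3, 4, 5,
    10, 20, 30,
    7, 8, 9
  ),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("p", 1:4), paste0("S", 1:3))
)
annotation <- data.frame(
  probe_id = paste0("p", 1:4),
  symbol = c("A", "A", "B", NA),
  stringsAsFactors = FALSE
)

# Keep probes that have a symbol and are present in the matrix
anno <- annotation[!is.na(annotation$symbol) &
  annotation$probe_id %in% rownames(eset_toy), ]
mat <- eset_toy[anno$probe_id, , drop = FALSE]

# Mean per symbol: sum rows by symbol, then divide by probe counts
sums <- rowsum(mat, group = anno$symbol)
counts <- as.vector(table(anno$symbol)[rownames(sums)])
eset_symbol <- sums / counts
eset_symbol
```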
Harmonize Two Data Frames by Column Structure
Description
Adds missing columns (filled with 'NA') to a secondary data frame so that its column set and order match a reference data frame. This is useful when combining data frames from different sources that should have the same structure but may be missing some columns.
Usage
assimilate_data(data_a, data_b)
Arguments
data_a |
Data frame. Reference data frame whose column structure should be matched. |
data_b |
Data frame. Data frame to be conformed to 'data_a'. |
Value
Data frame 'data_b' with added missing columns (NA-filled) and reordered to match 'data_a'.
Examples
# Create reference data frame
pdata_a <- data.frame(
A = 1:5, B = 2:6, C = 3:7, D = 4:8, E = 5:9
)
# Create data frame with subset of columns
pdata_b <- data.frame(A = 1:3, C = 4:6, E = 7:9)
# Harmonize pdata_b to match pdata_a structure
pdata_b_harmonized <- assimilate_data(data_a = pdata_a, data_b = pdata_b)
print(names(pdata_b_harmonized)) # Now has A, B, C, D, E
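The harmonization described above amounts to two base R operations: add the missing columns as 'NA', then reorder. A minimal sketch, with 'pdata_b_sketch' as an illustrative name and no claim that the package does it exactly this way:

```r
# Sketch: conform one data frame to the column layout of another.
pdata_a <- data.frame(A = 1:5, B = 2:6, C = 3:7, D = 4:8, E = 5:9)
pdata_b <- data.frame(A = 1:3, C = 4:6, E = 7:9)

# Columns present in the reference but absent from pdata_b
missing_cols <- setdiff(names(pdata_a), names(pdata_b))

# Add them as NA, then reorder to match the reference
pdata_b[missing_cols] <- NA
pdata_b_sketch <- pdata_b[, names(pdata_a), drop = FALSE]
names(pdata_b_sketch)
```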
Batch Correlation Analysis
Description
Performs correlation analysis between a target variable and multiple feature variables. Computes correlation coefficients, p-values, and adjusts for multiple testing using the Benjamini-Hochberg method.
Usage
batch_cor(data, target, feature, method = c("spearman", "pearson", "kendall"))
Arguments
data |
Data frame containing the target and feature variables. |
target |
Character string specifying the name of the target variable. |
feature |
Character vector specifying the names of feature variables to correlate with the target. |
method |
Character string specifying the correlation method. Options are '"spearman"', '"pearson"', or '"kendall"'. Default is '"spearman"'. |
Value
Tibble containing the following columns for each feature:
- sig_names
Feature name
- p.value
Raw p-value
- statistic
Correlation coefficient
- p.adj
Adjusted p-value (Benjamini-Hochberg)
- log10pvalue
Negative log10-transformed p-value
- stars
Significance stars: **** p<0.0001, *** p<0.001, ** p<0.01, * p<0.05, + p<0.5
Author(s)
Dongqiang Zeng
Examples
# Load TCGA-STAD signature data
sig_stad <- load_data("sig_stad")
# Perform batch correlation
results <- batch_cor(
data = sig_stad,
target = "CD_8_T_effector",
feature = colnames(sig_stad)[69:ncol(sig_stad)]
)
head(results)
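The columns listed under Value can be assembled with 'cor.test()' and 'p.adjust()' from the 'stats' package. The following sketch uses simulated data and returns a plain data frame rather than a tibble; it illustrates the general approach and is not the package implementation.

```r
# Sketch: correlate several features with one target, then adjust p-values.
set.seed(123)
n <- 50
target <- rnorm(n)
dat <- data.frame(
  target = target,
  f1 = target + rnorm(n, sd = 0.5), # correlated with target by construction
  f2 = rnorm(n),
  f3 = rnorm(n)
)
features <- c("f1", "f2", "f3")

res <- do.call(rbind, lapply(features, function(f) {
  ct <- cor.test(dat[[f]], dat[["target"]], method = "spearman", exact = FALSE)
  data.frame(
    sig_names = f,
    p.value = ct$p.value,
    statistic = unname(ct$estimate),
    stringsAsFactors = FALSE
  )
}))

res$p.adj <- p.adjust(res$p.value, method = "BH")
res$log10pvalue <- -log10(res$p.value)
res$stars <- as.character(cut(res$p.value,
  breaks = c(-Inf, 0.0001, 0.001, 0.01, 0.05, 0.5, Inf),
  labels = c("****", "***", "**", "*", "+", ""),
  right = FALSE
))
res <- res[order(res$p.value), ]
res
```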
Batch Kruskal-Wallis Test
Description
Performs Kruskal-Wallis rank sum tests on multiple continuous features across different groups. Computes p-values, adjusts for multiple testing, and ranks features by significance.
Usage
batch_kruskal(data, group, feature = NULL, feature_manipulation = FALSE)
Arguments
data |
Data frame containing the dataset for analysis. |
group |
Character string specifying the name of the grouping variable. |
feature |
Character vector specifying the names of feature variables to test. If 'NULL', the user is prompted to select features (interactive mode only). Default is 'NULL'. |
feature_manipulation |
Logical indicating whether to apply feature manipulation to filter valid features. Default is 'FALSE'. |
Value
Tibble containing:
- sig_names
Feature name
- p.value
Raw p-value from Kruskal-Wallis test
- statistic
Test statistic (chi-squared)
- p.adj
Adjusted p-value (Benjamini-Hochberg)
- log10pvalue
Negative log10-transformed p-value
- stars
Significance stars: **** p<0.0001, *** p<0.001, ** p<0.01, * p<0.05, + p<0.5
- group columns
Mean-centered values for each group
Author(s)
Dongqiang Zeng
Examples
# Load TCGA-STAD signature data
sig_stad <- load_data("sig_stad")
# Test features by gender (if available in your dataset)
if ("Gender" %in% colnames(sig_stad)) {
  res <- batch_kruskal(
    data = sig_stad,
    group = "Gender",
    feature = colnames(sig_stad)[69:ncol(sig_stad)]
  )
  head(res)
}
Batch Calculation of Partial Correlation Coefficients
Description
Computes partial correlation coefficients between multiple features and a target variable while controlling for an interference (confounding) variable. Adjusts p-values for multiple testing using the Benjamini-Hochberg method.
Usage
batch_pcc(
input,
interferenceid,
target,
features,
method = c("pearson", "spearman", "kendall")
)
Arguments
input |
Data frame containing feature variables, target variable, and interference variable. |
interferenceid |
Character string specifying the column name of the interference (confounding) variable to control for. |
target |
Character string specifying the column name of the target variable. |
features |
Character vector specifying the column names of feature variables to correlate with the target. |
method |
Character string specifying the correlation method. Options are '"pearson"', '"spearman"', or '"kendall"'. Default is '"pearson"'. |
Value
Tibble containing the following columns for each feature:
- sig_names
Feature name
- p.value
Raw p-value
- statistic
Partial correlation coefficient
- p.adj
Adjusted p-value (Benjamini-Hochberg method)
- log10pvalue
Negative log10-transformed p-value
- stars
Significance stars: **** p.adj<0.0001, *** p.adj<0.001, ** p.adj<0.01, * p.adj<0.05, + p.adj<0.5
Author(s)
Rongfang Shen
Examples
# Load TCGA-STAD signature data
sig_stad <- load_data("sig_stad")
# Calculate partial correlations controlling for tumor purity
res <- batch_pcc(
input = sig_stad,
interferenceid = "TumorPurity_estimate",
target = "Pan_F_TBRs",
method = "pearson",
features = colnames(sig_stad)[70:ncol(sig_stad)]
)
head(res)
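For the Pearson case, a first-order partial correlation has a closed form that can be computed with base R. The sketch below uses that formula with a t-test on n - 3 degrees of freedom; 'partial_cor' is a hypothetical helper and the simulated column names are placeholders, so treat it as an illustration rather than the package's method.

```r
# Sketch: first-order partial correlation of x and y given z (Pearson).
partial_cor <- function(x, y, z) {
  r_xy <- cor(x, y)
  r_xz <- cor(x, z)
  r_yz <- cor(y, z)
  r <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
  n <- length(x)
  t_stat <- r * sqrt((n - 3) / (1 - r^2))
  p <- 2 * pt(-abs(t_stat), df = n - 3)
  c(statistic = r, p.value = p)
}

set.seed(123)
z <- rnorm(80)
dat <- data.frame(
  purity = z, # confounder
  target = z + rnorm(80),
  f1 = z + rnorm(80),
  f2 = rnorm(80)
)

res <- t(sapply(c("f1", "f2"), function(f) {
  partial_cor(dat[[f]], dat$target, dat$purity)
}))
res <- data.frame(sig_names = rownames(res), res, row.names = NULL)
res$p.adj <- p.adjust(res$p.value, method = "BH")
res
```

The coefficient equals the Pearson correlation between the residuals of each variable regressed on the confounder, which gives a quick way to sanity-check the result.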
Batch Signature Survival Plot
Description
Generates Kaplan-Meier survival plots for multiple projects or cohorts based on signature scores. Automatically determines optimal cutpoints for signature stratification and creates publication-ready survival curves.
Usage
batch_sig_surv_plot(
input_pdata,
signature,
id = "ID",
column_of_project = "ProjectID",
project = NULL,
time = "time",
status = "status",
time_type = "day",
break_month = "auto",
palette = "jama",
cols = NULL,
mini_sig = "score",
save_path = NULL,
show_col = TRUE,
fig_type = "pdf"
)
Arguments
input_pdata |
Data frame containing survival data and signature scores. |
signature |
Character string specifying the column name of the target signature for survival analysis. |
id |
Character string specifying the column name containing unique identifiers. Default is '"ID"'. |
column_of_project |
Character string specifying the column name containing project identifiers. Default is '"ProjectID"'. |
project |
Character string or vector specifying project name(s) to analyze. If 'NULL', all projects are analyzed. Default is 'NULL'. |
time |
Character string specifying the column name containing time-to-event data. Default is '"time"'. |
status |
Character string specifying the column name containing event status. Default is '"status"'. |
time_type |
Character string specifying the time unit. Options are '"day"' or '"month"'. Default is '"day"'. |
break_month |
Numeric value or '"auto"' specifying the interval for time axis breaks in months. Default is '"auto"'. |
palette |
Character string specifying the color palette. Default is '"jama"'. |
cols |
Character vector of custom colors. If 'NULL', palette is used. Default is 'NULL'. |
mini_sig |
Character string for the signature label in the legend. Default is '"score"'. |
save_path |
Character string or 'NULL'. Directory path for saving plots. If 'NULL', plots are not saved. Default is 'NULL'. |
show_col |
Logical indicating whether to display color information. Default is 'TRUE'. |
fig_type |
Character string specifying the output file format ('"pdf"', '"png"', etc.). Default is '"pdf"'. |
Value
Data frame containing combined survival analysis results from all projects.
Author(s)
Dongqiang Zeng
Examples
sig_stad <- load_data("sig_stad")
result <- batch_sig_surv_plot(
input_pdata = sig_stad,
signature = "T.cells.CD8",
id = "ID",
column_of_project = "ProjectID",
project = NULL,
time = "OS_time",
status = "OS_status",
time_type = "month",
break_month = "auto",
palette = "jama",
cols = NULL,
mini_sig = "score",
show_col = TRUE,
fig_type = "pdf"
)
Batch Survival Analysis
Description
Performs Cox proportional hazards regression analysis on multiple variables. Optionally determines optimal cutoffs to dichotomize continuous predictors before modeling. Returns hazard ratios, confidence intervals, and p-values for each variable.
Usage
batch_surv(
pdata,
variable,
time = "time",
status = "status",
best_cutoff = FALSE
)
Arguments
pdata |
Data frame containing survival time, event status, and predictor variables. |
variable |
Character vector specifying the names of predictor variables to analyze. |
time |
Character string specifying the column name containing follow-up time. Default is '"time"'. |
status |
Character string specifying the column name containing event status (1 = event occurred, 0 = censored). Default is '"status"'. |
best_cutoff |
Logical indicating whether to compute optimal cutoffs for continuous variables and analyze dichotomized versions. Default is 'FALSE'. |
Value
Data frame containing hazard ratios (HR), 95% confidence intervals, and p-values for each variable, sorted by p-value.
Author(s)
Dongqiang Zeng
Examples
sig_stad <- load_data("sig_stad")
batch_surv(
pdata = sig_stad,
variable = colnames(sig_stad)[69:ncol(sig_stad)],
time = "OS_time",
status = "OS_status"
)
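A univariate Cox loop of this kind can be sketched with the 'survival' package. The example below fits one model per variable and collects the hazard ratio, its 95% confidence interval, and the p-value; the output column names ('ID', 'HR', 'CI_low', 'CI_up', 'P') are assumptions made for this sketch and may not match what 'batch_surv' returns.

```r
# Sketch: one Cox model per variable, results sorted by p-value.
library(survival)

set.seed(123)
pdata <- data.frame(
  time = rexp(100),
  status = rbinom(100, 1, 0.5),
  sig1 = rnorm(100),
  sig2 = rnorm(100)
)
variables <- c("sig1", "sig2")

res <- do.call(rbind, lapply(variables, function(v) {
  fit <- coxph(as.formula(paste("Surv(time, status) ~", v)), data = pdata)
  s <- summary(fit)
  data.frame(
    ID = v,
    HR = s$conf.int[1, "exp(coef)"],
    CI_low = s$conf.int[1, "lower .95"],
    CI_up = s$conf.int[1, "upper .95"],
    P = s$coefficients[1, "Pr(>|z|)"],
    stringsAsFactors = FALSE
  )
}))
res <- res[order(res$P), ]
res
```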
Batch Wilcoxon Rank-Sum Test Between Two Groups
Description
Performs Wilcoxon rank-sum tests (Mann-Whitney U tests) to compare the distribution of specified features between two groups. Computes p-values, adjusts for multiple testing, and ranks features by significance.
Usage
batch_wilcoxon(
data,
target = "group",
feature = NULL,
feature_manipulation = FALSE
)
Arguments
data |
Data frame containing the dataset for analysis. |
target |
Character string specifying the column name of the grouping variable. Default is '"group"'. |
feature |
Character vector specifying feature names to analyze. If 'NULL', prompts for selection (interactive mode only). Default is 'NULL'. |
feature_manipulation |
Logical indicating whether to apply feature manipulation filtering. Default is 'FALSE'. |
Value
Tibble with columns:
- sig_names
Feature name
- p.value
Raw p-value
- statistic
Difference in means between groups
- p.adj
Adjusted p-value (Benjamini-Hochberg)
- log10pvalue
Negative log10-transformed p-value
- stars
Significance stars: **** p<0.0001, *** p<0.001, ** p<0.01, * p<0.05, + p<0.5
- group1, group2
Mean values for each group
Author(s)
Dongqiang Zeng
Examples
# Load TCGA-STAD signature data
sig_stad <- load_data("sig_stad")
# Compare features by gender
res <- batch_wilcoxon(
data = sig_stad,
target = "Gender",
feature = colnames(sig_stad)[69:ncol(sig_stad)]
)
head(res)
Extract Best Cutoff and Add Binary Variable to Data Frame
Description
Determines the optimal cutoff point for a continuous variable in survival analysis using the maximally selected rank statistics method. Creates a binary variable based on the identified cutoff and adds it to the input data frame.
Usage
best_cutoff(
pdata,
variable,
time = "time",
status = "status",
print_result = TRUE
)
Arguments
pdata |
Data frame containing survival information and the continuous variable. |
variable |
Character string specifying the name of the continuous variable for which the optimal cutoff should be determined. |
time |
Character string specifying the column name containing time-to-event data. Default is '"time"'. |
status |
Character string specifying the column name containing event status (censoring information). Default is '"status"'. |
print_result |
Logical indicating whether to print detailed results including cutoff value and Cox model summaries. Default is 'TRUE'. |
Value
Data frame identical to 'pdata' with an additional binary column named '<variable>_binary' containing "High" and "Low" categories based on the optimal cutoff.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
pdata <- data.frame(
time = rexp(100),
status = rbinom(100, 1, 0.5),
score = rnorm(100, mean = 50, sd = 10)
)
result <- best_cutoff(pdata, variable = "score", print_result = FALSE)
table(result$score_binary)
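To give a feel for cutoff selection, the sketch below scans candidate cutoffs and keeps the one with the largest log-rank statistic from 'survival::survdiff()'. This is a deliberately simplified stand-in: it is not the maximally selected rank statistics procedure itself, it applies no correction for the multiple cutoffs tried, and restricting candidates to the 10th to 90th percentile range is an assumption made here.

```r
# Sketch: pick the cutoff that maximizes the log-rank chi-square.
# Simplified illustration, not maximally selected rank statistics.
library(survival)

set.seed(123)
pdata <- data.frame(
  time = rexp(100),
  status = rbinom(100, 1, 0.5),
  score = rnorm(100, mean = 50, sd = 10)
)

# Candidate cutoffs, avoiding very small groups (assumed 10%-90% range)
cands <- sort(unique(pdata$score))
cands <- cands[cands >= quantile(pdata$score, 0.1) &
  cands <= quantile(pdata$score, 0.9)]

chisq <- sapply(cands, function(cp) {
  d <- pdata
  d$grp <- d$score > cp
  survdiff(Surv(time, status) ~ grp, data = d)$chisq
})

cutoff <- cands[which.max(chisq)]
pdata$score_binary <- ifelse(pdata$score > cutoff, "High", "Low")
table(pdata$score_binary)
```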
Extract Best Cutoff and Add Binary Variable to Data Frame
Description
Finds the best cutoff point for a continuous variable in survival analysis. Takes input data containing a continuous variable and survival information (time and status). Returns the modified input data with a new binary variable created based on the best cutoff point.
Usage
best_cutoff2(
pdata,
variable,
time = "time",
status = "status",
print_result = TRUE
)
Arguments
pdata |
Data frame containing survival information and features. |
variable |
Character string specifying the continuous variable name. |
time |
Character string specifying the time-to-event column name. Default is '"time"'. |
status |
Character string specifying the event status column name. Default is '"status"'. |
print_result |
Logical indicating whether to print results. Default is 'TRUE'. |
Value
List containing:
- pdata
Data frame with binary variable added
- best_cutoff
Numeric cutoff value
- cox_continuous_object
Cox model summary for continuous variable
- summary_binary_variable
Summary of binary variable
- cox_binary_object
Cox model summary for binary variable
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
pdata <- data.frame(
time = rexp(100),
status = rbinom(100, 1, 0.5),
score = rnorm(100, mean = 50, sd = 10)
)
result <- best_cutoff2(pdata, variable = "score", print_result = FALSE)
result$best_cutoff
Break Time Into Blocks
Description
Divides time duration into specified blocks for analysis.
Usage
calculate_break_month(input, block = 6, time_type = c("month", "day"))
Arguments
input |
Numeric vector of time durations. |
block |
Number of blocks. Default is 6. |
time_type |
Units: "month" or "day". Default is "month". |
Value
Numeric vector of breakpoints, rounded to nearest multiple of 5.
Author(s)
Dongqiang Zeng
Examples
time_data <- c(24, 36, 12, 48)
blocks <- calculate_break_month(input = time_data)
Calculate Signature Score
Description
Main interface for calculating signature scores from gene expression data. Supports PCA, z-score, ssGSEA, and integration methods.
Usage
calculate_sig_score(
pdata = NULL,
eset,
signature = NULL,
method = c("pca", "ssgsea", "zscore", "integration"),
mini_gene_count = 3,
column_of_sample = "ID",
print_gene_proportion = FALSE,
print_filtered_signatures = FALSE,
adjust_eset = FALSE,
parallel.size = 1L,
...
)
Arguments
pdata |
Phenotype data. If 'NULL', created from 'eset' column names. |
eset |
Expression matrix (CPM, TPM, RPKM, FPKM, etc.). |
signature |
List of gene signatures. Can also be a character string naming a built-in signature collection (e.g., '"signature_collection"', '"signature_tme"', '"go_bp"', '"kegg"', '"hallmark"'). |
method |
Scoring method: '"pca"', '"ssgsea"', '"zscore"', or '"integration"'. Default is '"pca"'. |
mini_gene_count |
Minimum genes required per signature. Default is 3 (or 5 for ssGSEA). |
column_of_sample |
Column with sample IDs in 'pdata'. Default is '"ID"'. |
print_gene_proportion |
Logical: print gene coverage. Default is 'FALSE'. |
print_filtered_signatures |
Logical: print filtered signatures. Default is 'FALSE'. |
adjust_eset |
Logical: clean problematic features. Default is 'FALSE'. |
parallel.size |
Parallel workers for ssGSEA. Default is 1. |
... |
Additional arguments passed to specific methods. |
Value
Tibble with phenotype data and signature scores.
Author(s)
Dongqiang Zeng
References
Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis. BMC Bioinformatics. 2013;14:7.
Mariathasan S, et al. TGFβ attenuates tumour response to PD-L1 blockade. Nature. 2018;554:544-548.
Examples
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
signature <- list(
Signature1 = paste0("Gene", 1:10),
Signature2 = paste0("Gene", 11:20)
)
result <- calculate_sig_score(eset = eset, signature = signature, method = "pca")
Calculate Signature Score Using Integration Method
Description
Computes signature scores using PCA, z-score, and ssGSEA methods combined.
Usage
calculate_sig_score_integration(
pdata = NULL,
eset,
signature,
mini_gene_count = 2,
column_of_sample = "ID",
adjust_eset = FALSE,
parallel.size = 1L
)
Arguments
pdata |
Data frame with phenotype data. If 'NULL', created from 'eset' column names. |
eset |
Expression matrix (genes as rows, samples as columns). |
signature |
List of gene signatures. |
mini_gene_count |
Minimum genes required per signature. Default is 2. |
column_of_sample |
Column in 'pdata' with sample IDs. Default is '"ID"'. |
adjust_eset |
Logical: remove problematic features. Default is 'FALSE'. |
parallel.size |
Number of parallel workers. Default is 1. |
Value
Tibble with signature scores from all three methods.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
signature <- list(
Signature1 = paste0("Gene", 1:15),
Signature2 = paste0("Gene", 16:30)
)
result <- calculate_sig_score_integration(eset = eset, signature = signature)
Calculate Signature Score Using PCA Method
Description
Computes signature scores using Principal Component Analysis. The first principal component is used as the signature score.
Usage
calculate_sig_score_pca(
pdata = NULL,
eset,
signature,
mini_gene_count = 3,
column_of_sample = "ID",
adjust_eset = FALSE
)
Arguments
pdata |
Data frame with phenotype data. If 'NULL', created from 'eset' column names. |
eset |
Expression matrix (genes as rows, samples as columns). |
signature |
List of gene signatures. |
mini_gene_count |
Minimum genes required per signature. Default is 3. |
column_of_sample |
Column in 'pdata' with sample IDs. Default is '"ID"'. |
adjust_eset |
Logical: remove problematic features. Default is 'FALSE'. |
Value
Tibble with signature scores.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
signature <- list(
Signature1 = paste0("Gene", 1:10),
Signature2 = paste0("Gene", 11:20)
)
result <- calculate_sig_score_pca(eset = eset, signature = signature)
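The idea of using the first principal component as a score can be sketched with 'stats::prcomp()'. Because the sign of a principal component is arbitrary, the sketch orients PC1 so that it correlates positively with mean expression of the signature genes; that orientation step, like the scaling choice, is an assumption of this example and not a statement about the package internals.

```r
# Sketch: PC1 of the signature genes as a per-sample score.
set.seed(123)
eset <- matrix(rnorm(1000),
  nrow = 100, ncol = 10,
  dimnames = list(paste0("Gene", 1:100), paste0("Sample", 1:10))
)
signature <- list(
  Signature1 = paste0("Gene", 1:10),
  Signature2 = paste0("Gene", 11:20)
)
mini_gene_count <- 3

scores <- lapply(signature, function(genes) {
  genes <- intersect(genes, rownames(eset))
  if (length(genes) < mini_gene_count) {
    return(NULL) # too few genes available for this signature
  }
  sub <- eset[genes, , drop = FALSE]
  # prcomp expects observations (samples) in rows
  pc <- prcomp(t(sub), center = TRUE, scale. = TRUE)
  pc1 <- pc$x[, 1]
  # Assumed convention: orient PC1 along mean expression
  pc1 * sign(cor(pc1, colMeans(sub)))
})
scores <- Filter(Negate(is.null), scores)

score_df <- data.frame(ID = colnames(eset), do.call(cbind, scores))
head(score_df)
```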
Calculate Signature Score Using ssGSEA Method
Description
Computes signature scores using single-sample Gene Set Enrichment Analysis.
Usage
calculate_sig_score_ssgsea(
pdata = NULL,
eset,
signature,
mini_gene_count = 5,
column_of_sample = "ID",
adjust_eset = FALSE,
parallel.size = 1L
)
Arguments
pdata |
Data frame with phenotype data. If 'NULL', created from 'eset' column names. |
eset |
Expression matrix (genes as rows, samples as columns). |
signature |
List of gene signatures. |
mini_gene_count |
Minimum genes required per signature. Default is 5. |
column_of_sample |
Column in 'pdata' with sample IDs. Default is '"ID"'. |
adjust_eset |
Logical: remove problematic features. Default is 'FALSE'. |
parallel.size |
Number of parallel workers. Default is 1. |
Value
Tibble with signature scores.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
signature <- list(
Signature1 = paste0("Gene", 1:15),
Signature2 = paste0("Gene", 16:30)
)
result <- calculate_sig_score_ssgsea(eset = eset, signature = signature)
Calculate Signature Score Using Z-Score Method
Description
Computes signature scores using z-score transformation.
Usage
calculate_sig_score_zscore(
pdata = NULL,
eset,
signature,
mini_gene_count = 3,
column_of_sample = "ID",
adjust_eset = FALSE
)
Arguments
pdata |
Data frame with phenotype data. If 'NULL', created from 'eset' column names. |
eset |
Expression matrix (genes as rows, samples as columns). |
signature |
List of gene signatures. |
mini_gene_count |
Minimum genes required per signature. Default is 3. |
column_of_sample |
Column in 'pdata' with sample IDs. Default is '"ID"'. |
adjust_eset |
Logical: remove problematic features. Default is 'FALSE'. |
Value
Tibble with signature scores.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
signature <- list(
Signature1 = paste0("Gene", 1:10),
Signature2 = paste0("Gene", 11:20)
)
result <- calculate_sig_score_zscore(eset = eset, signature = signature)
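One simple reading of a z-score signature is: standardize each gene across samples, then average the standardized values over the genes in the signature. The base R sketch below follows that reading; the package may combine genes differently, so this is an assumption-laden illustration only.

```r
# Sketch: mean of gene-wise z-scores per signature and sample.
set.seed(123)
eset <- matrix(rnorm(1000),
  nrow = 100, ncol = 10,
  dimnames = list(paste0("Gene", 1:100), paste0("Sample", 1:10))
)
signature <- list(
  Signature1 = paste0("Gene", 1:10),
  Signature2 = paste0("Gene", 11:20)
)

# Standardize each gene (row) across samples
z <- t(scale(t(eset)))

# Average z-scores over the genes of each signature (samples x signatures)
scores <- sapply(signature, function(genes) {
  genes <- intersect(genes, rownames(z))
  colMeans(z[genes, , drop = FALSE])
})
head(scores)
```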
Visualize Cell Fractions as Stacked Bar Chart
Description
Creates stacked bar charts to visualize tumor microenvironment (TME) cell fractions. Supports batch visualization of deconvolution results from methods such as CIBERSORT, EPIC, and quanTIseq.
Usage
cell_bar_plot(
input,
id = "ID",
title = "Cell Fraction",
features = NULL,
pattern = NULL,
legend.position = "bottom",
coord_flip = TRUE,
palette = 3,
show_col = FALSE,
cols = NULL
)
Arguments
input |
Data frame containing deconvolution results. |
id |
Character string specifying the column name containing sample identifiers. Default is "ID". |
title |
Character string specifying the plot title. Default is "Cell Fraction". |
features |
Character vector specifying column names representing cell types to plot. If NULL, columns are selected based on 'pattern'. Default is NULL. |
pattern |
Character string or regular expression to match column names for automatic feature selection. Used when 'features' is NULL. Default is NULL. |
legend.position |
Character string specifying legend position ("bottom", "top", "left", "right"). Default is "bottom". |
coord_flip |
Logical indicating whether to flip plot coordinates using 'coord_flip()'. Default is TRUE. |
palette |
Integer specifying the color palette to use. Default is 3. |
show_col |
Logical indicating whether to display color information. Default is FALSE. |
cols |
Character vector of custom colors. If NULL, palette is used. Default is NULL. |
Value
A ggplot2 object representing the stacked bar chart.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
input_data <- data.frame(
ID = paste0("Sample", 1:10),
Cell_A = runif(10, 0, 0.4),
Cell_B = runif(10, 0, 0.3),
Cell_C = runif(10, 0, 0.3)
)
cell_bar_plot(input = input_data, id = "ID", features = c("Cell_A", "Cell_B", "Cell_C"))
Process Batch Table and Validate Cancer Types
Description
Processes input data containing cancer types and validates each category against a predefined list of supported cancer types ('timer_available_cancers').
Usage
check_cancer_types(args)
Arguments
args |
A list of input parameters. The example below passes the elements 'expression', 'category', and 'batch'. |
Value
A character matrix with two columns:
- Column 1: Expression identifiers
- Column 2: Cancer categories
Examples
args <- list(
expression = c("exp1", "exp2"),
category = c("luad", "brca"),
batch = NULL
)
result <- check_cancer_types(args)
Check Integrity and Outliers of Expression Set
Description
Performs quality checks on an expression matrix to identify missing values, infinite values, and features with zero variance. Issues warnings when potential problems are detected that may affect downstream analyses.
Usage
check_eset(eset, print_result = FALSE, estimate_sd = FALSE)
Arguments
eset |
Expression matrix or data frame with genes/features in rows and samples in columns. |
print_result |
Logical indicating whether to print detailed check results to the console. Default is 'FALSE'. |
estimate_sd |
Logical indicating whether to check for features with zero standard deviation. Default is 'FALSE'. |
Value
Invisibly returns 'NULL'. Function is called for its side effects (printing messages and issuing warnings).
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
check_eset(eset, print_result = TRUE)
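The three checks named in the description (missing values, infinite values, zero-variance features) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper name 'check_eset_sketch' is hypothetical, and unlike the documented function it returns the counts so they can be inspected.

```r
# Hypothetical helper (not part of IOBR): count problems and warn about them.
check_eset_sketch <- function(eset, estimate_sd = FALSE) {
  m <- as.matrix(eset)
  n_na  <- sum(is.na(m))
  n_inf <- sum(is.infinite(m))
  if (n_na > 0)  warning("There are ", n_na, " missing values.")
  if (n_inf > 0) warning("There are ", n_inf, " infinite values.")
  n_zero_sd <- NA_integer_
  if (estimate_sd) {
    # Per-feature standard deviation across samples (rows are features).
    sds <- apply(m, 1, sd, na.rm = TRUE)
    n_zero_sd <- sum(sds == 0, na.rm = TRUE)
    if (n_zero_sd > 0) warning(n_zero_sd, " features have zero variance.")
  }
  invisible(c(na = n_na, inf = n_inf, zero_sd = n_zero_sd))
}

# Toy matrix: row 2 is constant, row 3 holds one NA and one Inf.
toy <- matrix(c(1, 2, 3,
                5, 5, 5,
                NA, 1, Inf), nrow = 3, byrow = TRUE)
counts <- suppressWarnings(check_eset_sketch(toy, estimate_sd = TRUE))
print(counts)
```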
Clear IOBR Data Cache
Description
Removes all cached data files downloaded from GitHub.
Usage
clear_iobr_cache()
Value
Invisible NULL. Called for side effects of clearing the cache.
Combine Phenotype Data and Expression Set
Description
Merges phenotype data with an expression matrix by matching sample IDs. Optionally filters features, applies feature manipulation, and scales expression data before combining.
Usage
combine_pd_eset(
eset,
pdata,
id_pdata = "ID",
feas = NULL,
feature_manipulation = TRUE,
scale = TRUE,
choose_who_when_duplicate = c("eset", "pdata")
)
Arguments
eset |
Expression matrix with genes/features in rows and samples in columns. |
pdata |
Data frame containing phenotype/clinical data. |
id_pdata |
Character string specifying the column name in 'pdata' containing sample identifiers. Default is '"ID"'. |
feas |
Character vector specifying features to include from 'eset'. If 'NULL', all features are used. Default is 'NULL'. |
feature_manipulation |
Logical indicating whether to apply feature manipulation to filter valid features. Default is 'TRUE'. |
scale |
Logical indicating whether to scale (standardize) expression data. Default is 'TRUE'. |
choose_who_when_duplicate |
Character string specifying which data to prefer when duplicate columns exist. Options are '"eset"' or '"pdata"'. Default is '"eset"'. |
Value
Data frame combining phenotype data and (transposed) expression data, with samples in rows and features/phenotypes in columns.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
pdata <- data.frame(
ID = colnames(eset),
group = rep(c("A", "B"), each = 5),
age = rnorm(10, 50, 10)
)
result <- combine_pd_eset(eset = eset, pdata = pdata, scale = FALSE)
dim(result)
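The merge described above (transpose the expression matrix, optionally standardize, then match on sample ID) can be sketched in base R. This is a simplified illustration that assumes a plain inner join on the 'ID' column; the package function may handle duplicates and feature filtering differently.

```r
set.seed(123)
eset <- matrix(rnorm(40), nrow = 4, ncol = 10,
               dimnames = list(paste0("Gene", 1:4), paste0("Sample", 1:10)))
pdata <- data.frame(ID = colnames(eset), group = rep(c("A", "B"), each = 5))

# Transpose so that samples are rows; scale() then standardizes each feature.
expr <- as.data.frame(scale(t(eset)))
expr$ID <- rownames(expr)

# Join phenotype and expression data on the sample identifier.
combined <- merge(pdata, expr, by = "ID")
dim(combined)
```

The result has one row per sample, with phenotype columns followed by one column per feature.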
Convert Read Counts to Transcripts Per Million (TPM)
Description
Transforms gene expression count data into Transcripts Per Million (TPM) values, normalizing for gene length and library size. Supports multiple gene ID types and can retrieve gene length information from BioMart or use local datasets.
Usage
count2tpm(
countMat,
idType = "Ensembl",
org = c("hsa", "mmus"),
source = c("local", "biomart"),
effLength = NULL,
id = "id",
gene_symbol = "symbol",
length = "eff_length",
check_data = FALSE
)
Arguments
countMat |
Numeric matrix of raw read counts with genes in rows and samples in columns. |
idType |
Character string specifying the gene identifier type. Options are '"Ensembl"', '"Entrez"', or '"Symbol"'. Default is '"Ensembl"'. |
org |
Character string specifying the organism. Options include '"hsa"' (human) or '"mmus"' (mouse). Default is '"hsa"'. |
source |
Character string specifying the source for gene length information. Options are '"biomart"' (retrieve from Ensembl BioMart) or '"local"' (use local dataset). Default is '"local"'. |
effLength |
Data frame containing effective gene length information. If 'NULL', lengths are retrieved based on 'source'. Default is 'NULL'. |
id |
Character string specifying the column name in 'effLength' containing gene identifiers. Default is '"id"'. |
gene_symbol |
Character string specifying the column name in 'effLength' containing gene symbols. Default is '"symbol"'. |
length |
Character string specifying the column name in 'effLength' containing gene lengths. Default is '"eff_length"'. |
check_data |
Logical indicating whether to check for missing values in the count matrix. Default is 'FALSE'. |
Value
Data frame of TPM-normalized expression values with genes in rows and samples in columns. Gene identifiers are converted to gene symbols in the output, regardless of the input 'idType'.
Author(s)
Wubing Zhang, Dongqiang Zeng, Yiran Fang
Examples
# Load TCGA count data
eset_stad <- load_data("eset_stad")
# Transform to TPM using local gene annotation
eset <- count2tpm(countMat = eset_stad, source = "local", idType = "Ensembl")
head(eset)
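The normalization itself follows the standard TPM definition: divide counts by gene length, then rescale each sample so that its values sum to one million. The base R sketch below uses made-up counts and lengths for illustration only and omits the identifier conversion that the package function performs.

```r
# Toy counts (genes x samples) and gene lengths in bases; values are invented.
counts <- matrix(c(100, 200,
                   300, 400,
                    50,  50), nrow = 3, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), c("S1", "S2")))
gene_length <- c(GeneA = 1000, GeneB = 2000, GeneC = 500)

# Step 1: length-normalized rate (reads per base) for each gene.
rate <- counts / gene_length[rownames(counts)]

# Step 2: rescale each sample (column) to sum to one million.
tpm <- t(t(rate) / colSums(rate)) * 1e6
colSums(tpm)
```

Because each column is rescaled independently, TPM values are comparable across samples with different library sizes.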
Create Nested Output Folders
Description
Creates one to three nested folders (if not existing) under the current working directory and returns their names and absolute paths.
Usage
creat_folder(f1, f2 = NULL, f3 = NULL, return = NULL)
Arguments
f1 |
Character. First-level folder name. |
f2 |
Character or 'NULL'. Second-level folder name. Default is 'NULL'. |
f3 |
Character or 'NULL'. Third-level folder name. Default is 'NULL'. |
return |
Deprecated (not used). Kept for backward compatibility. |
Value
List with elements:
- folder_name: Relative path to the created folder
- abspath: Absolute path ending with '/'
Examples
creat_folder(file.path(tempdir(), "1-result"))
creat_folder(file.path(tempdir(), "1-result"), "figures", "correlation")
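The behaviour described above can be approximated with base R file utilities. The sketch below is an assumption about how such a helper might work, not the package's source; the name 'make_nested_sketch' is hypothetical, and it uses the path exactly as given rather than resolving it against the working directory.

```r
# Hypothetical helper (not part of IOBR): create up to three nested folders.
make_nested_sketch <- function(f1, f2 = NULL, f3 = NULL) {
  parts <- c(f1, f2, f3)                       # c() silently drops NULL levels
  folder <- do.call(file.path, as.list(parts))
  if (!dir.exists(folder)) dir.create(folder, recursive = TRUE)
  list(
    folder_name = folder,
    abspath = paste0(normalizePath(folder, winslash = "/"), "/")
  )
}

out <- make_nested_sketch(file.path(tempdir(), "1-result"), "figures", "correlation")
out$abspath
```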
Deconvolve Using CIBERSORT
Description
Estimates immune cell fractions using the CIBERSORT algorithm. CIBERSORT is freely available to academic users; the license and binary can be obtained from https://cibersortx.stanford.edu.
Usage
deconvo_cibersort(
eset,
project = NULL,
arrays = FALSE,
perm = 1000,
absolute = FALSE,
abs_method = "sig.score",
parallel = FALSE,
num_cores = 2,
seed = NULL
)
Arguments
eset |
Expression matrix with gene symbols as row names. |
project |
Optional project name. Default is 'NULL'. |
arrays |
Logical: optimized for microarray data. Default is 'FALSE'. |
perm |
Permutations for statistical analysis. Default is 1000. |
absolute |
Logical: run in absolute mode. Default is 'FALSE'. |
abs_method |
Method for absolute mode: '"sig.score"' or '"no.sumto1"'. Default is '"sig.score"'. |
parallel |
Enable parallel execution. Default is 'FALSE'. |
num_cores |
Number of cores for parallel mode. Default is 2. |
seed |
Random seed for reproducibility. Default is 'NULL'. |
Value
Data frame with CIBERSORT cell fractions. Columns suffixed with '_CIBERSORT'.
Author(s)
Dongqiang Zeng
Examples
eset_tme_stad <- load_data("eset_tme_stad")
lm22 <- load_data("lm22")
cibersort_result <- deconvo_cibersort(
eset = eset_tme_stad,
project = "TCGA-STAD",
perm = 100
)
Deconvolve Immune Microenvironment Using EPIC
Description
Estimates immune cell fractions using the EPIC algorithm.
Usage
deconvo_epic(eset, project = NULL, tumor = TRUE)
Arguments
eset |
Gene expression matrix with genes as row names. |
project |
Optional project name. Default is 'NULL'. |
tumor |
Logical indicating tumor ('TRUE') or normal ('FALSE') samples. Default is 'TRUE'. |
Value
Data frame with EPIC cell fraction estimates. Columns suffixed with '_EPIC'.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
eset <- eset[1:500, 1:5]
epic_result <- deconvo_epic(eset = eset, project = "Example", tumor = TRUE)
head(epic_result)
Calculate ESTIMATE Scores
Description
Calculates stromal, immune, and ESTIMATE scores from gene expression.
Usage
deconvo_estimate(eset, project = NULL, platform = "affymetrix")
Arguments
eset |
Gene expression matrix with gene symbols. |
project |
Optional project name. Default is 'NULL'. |
platform |
Platform type: '"affymetrix"' or '"illumina"'. Default is '"affymetrix"'. |
Value
Data frame with ESTIMATE scores. Columns suffixed with '_estimate'.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
estimate_result <- deconvo_estimate(eset, project = "TCGA-STAD")
Calculate Immunophenoscore (IPS)
Description
Calculates immune phenotype scores from gene expression data.
Usage
deconvo_ips(eset, project = NULL, plot = FALSE)
Arguments
eset |
Gene expression matrix. |
project |
Optional project name. Default is 'NULL'. |
plot |
Logical: generate visualization. Default is 'FALSE'. |
Value
Data frame with IPS scores. Columns suffixed with '_IPS'.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
ips_result <- deconvo_ips(eset = eset, project = "TCGA-STAD")
Deconvolve Immune Microenvironment Using MCP-Counter
Description
Estimates immune cell abundances using MCP-counter.
Usage
deconvo_mcpcounter(eset, project = NULL)
Arguments
eset |
Gene expression matrix with HGNC symbols as row names. |
project |
Optional project name. Default is 'NULL'. |
Value
Data frame with MCP-counter scores. Columns suffixed with '_MCPcounter'.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
eset <- eset[1:500, 1:3]
mcp_result <- deconvo_mcpcounter(eset = eset, project = "TCGA-STAD")
head(mcp_result)
Deconvolve Using quanTIseq
Description
Estimates immune cell fractions from RNA-seq data using quanTIseq deconvolution.
Usage
deconvo_quantiseq(eset, project = NULL, tumor, arrays, scale_mrna)
Arguments
eset |
Gene expression matrix. |
project |
Optional project name. Default is 'NULL'. |
tumor |
Logical: tumor samples. Must be specified. |
arrays |
Logical: microarray data. Must be specified. |
scale_mrna |
Logical: correct for mRNA content. Must be specified. |
Value
Data frame with quanTIseq cell fractions. Columns suffixed with '_quantiseq'.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
eset <- eset[1:500, 1:3]
res <- deconvo_quantiseq(
eset = eset, project = "stad", tumor = TRUE,
arrays = FALSE, scale_mrna = FALSE
)
head(res)
Deconvolve Using Custom Reference
Description
Estimates cell fractions using the SVR or lsei method with a custom reference matrix.
Usage
deconvo_ref(
eset,
project = NULL,
arrays = TRUE,
method = c("svr", "lsei"),
perm = 100,
reference,
scale_reference = TRUE,
absolute.mode = FALSE,
abs.method = "sig.score"
)
Arguments
eset |
Gene expression matrix. |
project |
Optional project name. Default is 'NULL'. |
arrays |
Logical: use quantile normalization. Default is 'TRUE'. |
method |
Method: '"svr"' or '"lsei"'. Default is '"svr"'. |
perm |
Permutations for SVR. Default is 100. |
reference |
Custom reference matrix (e.g., lm22, lm6). |
scale_reference |
Logical: scale reference. Default is 'TRUE'. |
absolute.mode |
Logical: absolute mode for SVR. Default is 'FALSE'. |
abs.method |
Method for absolute mode. Default is '"sig.score"'. |
Value
Data frame with cell fractions. Columns suffixed with '_CIBERSORT'.
Author(s)
Dongqiang Zeng, Rongfang Shen
Examples
lm22 <- load_data("lm22")
common_genes <- rownames(lm22)[1:500]
sim_eset <- as.data.frame(matrix(
rnorm(length(common_genes) * 5, mean = 5, sd = 2),
nrow = length(common_genes), ncol = 5
))
rownames(sim_eset) <- common_genes
colnames(sim_eset) <- paste0("Sample", 1:5)
deconvo_ref(eset = sim_eset, reference = lm22, method = "lsei")
Deconvolve Using TIMER
Description
Estimates cancer-type-specific immune cell fractions using TIMER deconvolution.
Usage
deconvo_timer(eset, project = NULL, indications = NULL)
Arguments
eset |
Gene expression matrix. |
project |
Optional project name. Default is 'NULL'. |
indications |
Cancer type for each sample (e.g., '"brca"', '"stad"'). Must match number of columns in 'eset'. |
Value
Data frame with TIMER cell fractions. Columns suffixed with '_TIMER'.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
res <- deconvo_timer(
eset = eset, project = "stad",
indications = rep("stad", ncol(eset))
)
head(res)
Main TME Deconvolution Function
Description
Unified interface for multiple TME deconvolution methods.
Usage
deconvo_tme(
eset,
project = NULL,
method = tme_deconvolution_methods,
arrays = FALSE,
tumor = TRUE,
perm = 1000,
reference,
scale_reference = TRUE,
plot = FALSE,
scale_mrna = TRUE,
group_list = NULL,
platform = "affymetrix",
absolute.mode = FALSE,
abs.method = "sig.score",
...
)
Arguments
eset |
Gene expression matrix with HGNC symbols as row names. |
project |
Optional project name. Default is 'NULL'. |
method |
Deconvolution method. See 'tme_deconvolution_methods' for the available options. |
arrays |
Logical: microarray-optimized mode. Default is 'FALSE'. |
tumor |
Logical: tumor-optimized mode (EPIC). Default is 'TRUE'. |
perm |
Permutations (CIBERSORT/SVR). Default is 1000. |
reference |
Custom reference matrix (SVR/lsei). |
scale_reference |
Logical: scale reference (SVR/lsei). Default is 'TRUE'. |
plot |
Logical: generate plots (IPS). Default is 'FALSE'. |
scale_mrna |
Logical: mRNA correction (quanTIseq/EPIC). Default is 'TRUE'. |
group_list |
Vector of cancer types for TIMER. Default is 'NULL'. |
platform |
Platform for ESTIMATE. Default is '"affymetrix"'. |
absolute.mode |
Logical: absolute mode (CIBERSORT/SVR). Default is 'FALSE'. |
abs.method |
Absolute mode method. Default is '"sig.score"'. |
... |
Additional arguments passed to method. |
Value
Tibble with cell fractions and 'ID' column.
Author(s)
Dongqiang Zeng, Rongfang Shen
References
Newman et al. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nature Methods.
Vegesna et al. (2013). Inferring tumour purity and stromal/immune cell admixture. Nature Communications.
Finotello et al. (2019). Molecular and pharmacological modulators of the tumor immune contexture. Genome Medicine.
Li et al. (2016). Comprehensive analyses of tumor immunity. Genome Biology.
Charoentong et al. (2017). Pan-cancer Immunogenomic Analyses. Cell Reports.
Becht et al. (2016). Estimating population abundance of tissue-infiltrating immune cells. Genome Biology.
Aran et al. (2017). xCell: digitally portraying tissue cellular heterogeneity. Genome Biology.
Racle et al. (2017). Simultaneous enumeration of cancer and immune cell types. eLife.
Examples
lm22 <- load_data("lm22")
common_genes <- rownames(lm22)[1:500]
sim_eset <- as.data.frame(matrix(
rnorm(length(common_genes) * 5, mean = 5, sd = 2),
nrow = length(common_genes), ncol = 5
))
rownames(sim_eset) <- common_genes
colnames(sim_eset) <- paste0("Sample", 1:5)
res <- deconvo_tme(eset = sim_eset, method = "cibersort", perm = 10)
Deconvolve Immune Microenvironment Using xCell
Description
Estimates immune cell fractions using the xCell algorithm. xCell provides cell type enrichment scores for 64 immune and stromal cell types from gene expression data.
Usage
deconvo_xcell(eset, project = NULL, arrays = FALSE)
Arguments
eset |
Gene expression matrix with HGNC gene symbols as row names and samples as columns. |
project |
Optional project name to add as 'ProjectID' column. Default is 'NULL'. |
arrays |
Logical indicating microarray data ('TRUE') or RNA-seq ('FALSE'). Default is 'FALSE'. |
Value
Data frame with xCell enrichment scores. Cell type columns are suffixed with '_xCell'.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
xcell_result <- deconvo_xcell(eset = eset[, 1:3], project = "TCGA-STAD")
head(xcell_result)[, 1:5]
Use quanTIseq to Deconvolute a Gene Expression Matrix
Description
Deconvolutes gene expression data to estimate immune cell fractions using the quanTIseq method. Source code from https://github.com/FFinotello/quanTIseq.
Usage
deconvolute_quantiseq.default(
mix.mat,
arrays = FALSE,
signame = "TIL10",
tumor = FALSE,
mRNAscale = TRUE,
method = c("lsei", "hampel", "huber", "bisquare"),
rmgenes = "unassigned"
)
Arguments
mix.mat |
Data frame or matrix. Gene expression matrix with gene symbols in the first column and sample IDs in the first row. Expression data must be on a non-log scale (TPM for RNA-seq or expression values for microarrays). |
arrays |
Logical. Whether expression data are from microarrays. Default is FALSE. If TRUE, the rmgenes parameter is set to "none". |
signame |
Character. Name of the signature matrix. Currently only "TIL10" is available. Default is "TIL10". |
tumor |
Logical. Whether expression data are from tumor samples. If TRUE, signature genes with high expression in tumor samples are removed. Default is FALSE. |
mRNAscale |
Logical. Whether cell fractions must be scaled to account for cell-type-specific mRNA content. Default is TRUE. |
method |
Character. Deconvolution method: "hampel", "huber", "bisquare" for robust regression, or "lsei" for constrained least squares. Default is "lsei". |
rmgenes |
Character. Genes to remove: "unassigned" (default), "default", "none", or "path". |
Value
Data frame with cell fractions for each sample.
Author(s)
Finotello F, et al. (adapted for IOBR)
References
F. Finotello, C. Mayer, C. Plattner, G. Laschober, D. Rieder, H. Hackl, A. Krogsdam, W. Posch, D. Wilflingseder, S. Sopper, M. IJsselsteijn, D. Johnson, Y. Xu, Y. Wang, M. E. Sanders, M. V. Estrada, P. Ericsson-Gonzalez, J. Balko, N. F. de Miranda, Z. Trajanoski. "quanTIseq: quantifying immune contexture of human tumors". bioRxiv 223180. https://doi.org/10.1101/223180.
Examples
lm22 <- load_data("lm22")
common_genes <- rownames(lm22)[1:500]
tpm_matrix <- as.data.frame(matrix(
rnorm(length(common_genes) * 5, mean = 5, sd = 2),
nrow = length(common_genes), ncol = 5
))
rownames(tpm_matrix) <- common_genes
colnames(tpm_matrix) <- paste0("Sample", 1:5)
results <- deconvolute_quantiseq.default(mix.mat = tpm_matrix)
Deconvolute Tumor Microenvironment Using TIMER
Description
Performs deconvolution of the tumor microenvironment using the TIMER algorithm. Processes multiple cancer datasets, removes batch effects, and estimates immune cell type abundances.
Usage
deconvolute_timer.default(args)
Arguments
args |
List or environment containing parameters. The example below passes the elements 'outdir' and 'batch'. |
Value
Matrix of abundance scores for different immune cell types across multiple cancer samples.
Examples
## Not run:
# This example requires actual expression data files
# Create a batch file with paths to expression data and cancer types
batch_file <- "batch.csv"
# batch.csv format: each row contains expression_file_path,cancer_type
# Example content:
# /path/to/exp1.txt,luad
# /path/to/exp2.txt,brca
outdir <- tempdir()
args <- list(outdir = outdir, batch = batch_file)
results <- deconvolute_timer.default(args)
## End(Not run)
Design Custom Theme for ggplot2 Plots
Description
Creates a customized ggplot2 theme based on user-specified parameters for plot elements such as title size, axis sizes, legend settings, and theme style. Supports various base themes and allows fine-tuning of visual aspects.
Usage
design_mytheme(
theme = c("light", "bw", "classic", "classic2"),
plot_title_size = 2,
axis_title_size = 2,
axis_text_size = 12,
axis_angle = 60,
hjust = 1,
legend.position = "bottom",
legend.direction = "horizontal",
legend.size = 0.25,
legend.key.height = 0.5,
legend.key.width = 0.5,
legend.size.text = 10,
legend.box = "horizontal"
)
Arguments
theme |
Base theme: "light", "bw", "classic", "classic2". Default is "light". |
plot_title_size |
Relative size of plot title. Default is 2. |
axis_title_size |
Relative size of axis titles. Default is 2. |
axis_text_size |
Size of axis tick labels. Default is 12. |
axis_angle |
Angle of x-axis tick labels. Default is 60. |
hjust |
Horizontal justification for x-axis text. Default is 1. |
legend.position |
Legend position: "none", "left", "right", "bottom", "top". Default is "bottom". |
legend.direction |
Direction of legend items: "horizontal" or "vertical". Default is "horizontal". |
legend.size |
Size of legend key. Default is 0.25. |
legend.key.height |
Height of legend key in cm. Default is 0.5. |
legend.key.width |
Width of legend key in cm. Default is 0.5. |
legend.size.text |
Size of legend text labels. Default is 10. |
legend.box |
Orientation of legend box: "horizontal" or "vertical". Default is "horizontal". |
Value
A ggplot2 theme object.
Author(s)
Dongqiang Zeng
Examples
library(ggplot2)
p <- ggplot(mtcars, aes(wt, mpg)) +
geom_point()
mytheme <- design_mytheme(theme = "bw", plot_title_size = 1.5, axis_text_size = 14)
p + mytheme + ggtitle("Example Plot")
Permutation Test for CIBERSORT
Description
Performs permutation-based sampling to generate an empirical null distribution of correlation coefficients for p-value calculation in CIBERSORT analysis. Randomly samples from the mixture data to create null distributions.
Usage
doPerm(perm, X, Y, absolute, abs_method, seed = NULL)
Arguments
perm |
Integer. Number of permutations to perform. |
X |
Matrix or data frame containing signature matrix (predictor variables). |
Y |
Numeric vector containing the mixture sample expression. |
absolute |
Logical indicating whether to use absolute space for weights. |
abs_method |
String specifying the method for absolute space weights: '"sig.score"' or '"no.sumto1"'. |
seed |
Integer. Random seed for reproducibility. If NULL (default), uses current random state. |
Value
List containing:
- dist: Numeric vector of correlation coefficients from permutations
Examples
X <- matrix(rnorm(100), nrow = 10)
Y <- rnorm(10)
result <- doPerm(100, X, Y, absolute = FALSE, abs_method = "sig.score")
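The idea of an empirical null distribution can be sketched in base R. The sketch below is a simplified stand-in, not the CIBERSORT procedure: it replaces the support vector regression fit with ordinary least squares ('lm.fit'), and the helper name 'perm_null_sketch' is hypothetical. Each iteration draws a random pseudo-mixture from the observed mixture values, standardizes it, fits it against the signature matrix, and records the correlation between fitted and sampled values.

```r
# Hypothetical helper (not part of IOBR); least squares replaces the SVR fit.
perm_null_sketch <- function(perm, X, Y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  pool <- as.numeric(as.matrix(Y))       # sampling pool of mixture values
  dist <- numeric(perm)
  for (i in seq_len(perm)) {
    yr <- sample(pool, nrow(X))          # random pseudo-mixture
    yr <- (yr - mean(yr)) / sd(yr)       # standardize
    fit <- lm.fit(x = X, y = yr)         # stand-in for the SVR model
    dist[i] <- cor(fit$fitted.values, yr)
  }
  list(dist = dist)
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 50)       # 50 genes x 4 reference cell types
Y <- rnorm(50)                           # one mixture sample
null <- perm_null_sketch(100, X, Y, seed = 42)

# Empirical p-value: share of null correlations at least as large as observed.
obs_r <- cor(lm.fit(X, Y)$fitted.values, Y)
p_value <- mean(null$dist >= obs_r)
```

More permutations give a finer-grained p-value at the cost of run time, which is why the examples in this manual use small 'perm' values.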
Download IOBR Data from GitHub with Mirror Support
Description
Downloads large datasets from GitHub releases to avoid CRAN size limits. Supports multiple download mirrors for users in different regions. Data is cached locally after first download.
Usage
download_iobr_data(
name,
force = FALSE,
verbose = TRUE,
mirrors = get_default_mirrors()
)
Arguments
name |
Character string. Name of the dataset to download. |
force |
Logical. Whether to force re-download even if cached. Default: FALSE. |
verbose |
Logical. Whether to print progress messages. Default: TRUE. |
mirrors |
Character vector. URLs of mirrors to try. Default uses get_default_mirrors(). |
Value
The requested dataset.
Examples
# Download TCGA STAD signature data
tcga_sig <- download_iobr_data("tcga_stad_sig")
# Download with custom mirrors
eset <- download_iobr_data("eset_stad",
mirrors = c(
"https://ghproxy.vip/https://github.com",
"https://gh-proxy.org/https://github.com"
)
)
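The mirror fallback and local caching described above follow a common pattern that can be sketched without network access. Everything in this sketch is an assumption for illustration: the helper 'fetch_with_mirrors_sketch', the toy downloader 'toy_fetch', the placeholder mirror URLs, and the '.rds' cache format are all hypothetical and do not describe the package internals.

```r
# Hypothetical helper: try each mirror in turn and cache the first success.
fetch_with_mirrors_sketch <- function(name, mirrors, fetch,
                                      cache_dir = tempdir(), force = FALSE) {
  dest <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(dest) && !force) return(readRDS(dest))    # cache hit
  for (m in mirrors) {
    ok <- tryCatch({ fetch(m, name, dest); TRUE }, error = function(e) FALSE)
    if (ok && file.exists(dest)) return(readRDS(dest))
  }
  stop("All mirrors failed for dataset '", name, "'.")
}

# Toy downloader standing in for a real HTTP request: the "bad" mirror fails.
toy_fetch <- function(mirror, name, dest) {
  if (grepl("bad", mirror)) stop("unreachable")
  saveRDS(data.frame(id = 1:3), dest)
}

toy <- fetch_with_mirrors_sketch(
  "toy_dataset",
  mirrors = c("https://bad.example", "https://good.example"),
  fetch = toy_fetch
)
```

A second call with the same name returns the cached copy without contacting any mirror unless 'force = TRUE' is set.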
Enrichment Bar Plot with Two Directions
Description
Creates a bar plot visualizing enrichment results for up-regulated and down-regulated terms, using -log10(p-values) to indicate significance.
Usage
enrichment_barplot(
up_terms,
down_terms,
terms = "Description",
pvalue = "pvalue",
group = "group",
palette = "jama",
cols = NULL,
title = "Gene Ontology Enrichment",
width_wrap = 30,
font_terms = 15
)
Arguments
up_terms |
Data frame for up-regulated terms. |
down_terms |
Data frame for down-regulated terms. |
terms |
Column name for term descriptions. Default is "Description". |
pvalue |
Column name for p-values. Default is "pvalue". |
group |
Column name for group indicator. Default is "group". |
palette |
Color palette. Default is "jama". |
cols |
Character vector. Custom colors for bars. If NULL, uses palette. Default is NULL. |
title |
Plot title. Default is "Gene Ontology Enrichment". |
width_wrap |
Maximum width for wrapping pathway names. Default is 30. |
font_terms |
Font size for axis labels. Default is 15. |
Value
A ggplot object of the enrichment bar plot.
Author(s)
Dongqiang Zeng
Examples
up_terms <- data.frame(
Description = c("Pathway1", "Pathway2"),
pvalue = c(0.001, 0.01)
)
down_terms <- data.frame(
Description = c("Pathway4", "Pathway5"),
pvalue = c(0.005, 0.02)
)
p <- enrichment_barplot(
up_terms = up_terms,
down_terms = down_terms,
title = "Custom Enrichment Plot"
)
p
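The data preparation behind a two-direction enrichment plot can be sketched in base R. The sketch assumes that down-regulated terms are mirrored to the negative side of the axis; that layout is an illustration choice here and may differ from what the package draws.

```r
up_terms <- data.frame(Description = c("Pathway1", "Pathway2"),
                       pvalue = c(0.001, 0.01))
down_terms <- data.frame(Description = c("Pathway4", "Pathway5"),
                         pvalue = c(0.005, 0.02))

# Label the direction, stack both tables, and compute a signed significance.
up_terms$group <- "up"
down_terms$group <- "down"
dat <- rbind(up_terms, down_terms)
dat$score <- -log10(dat$pvalue) * ifelse(dat$group == "up", 1, -1)

# Order terms so that bars run from most negative to most positive.
dat <- dat[order(dat$score), ]
dat
```

Smaller p-values give longer bars, because -log10(0.001) is 3 while -log10(0.01) is 2.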
Visualize Expression Set Distribution
Description
Generates boxplots and density plots to analyze the distribution of expression values in an expression set. Useful for quality control and assessing data normalization.
Usage
eset_distribution(eset, quantile = 3, log = TRUE, project = NULL)
Arguments
eset |
Expression matrix or data frame with genes in rows and samples in columns. |
quantile |
Integer specifying the divisor for sampling columns. Default is 3 (samples 1/3 of columns). |
log |
Logical indicating whether to perform log2 transformation. Default is 'TRUE'. |
project |
Optional output directory path for saving files. If 'NULL', no files are saved. Default is 'NULL'. |
Value
Invisibly returns 'NULL'. If 'project' is provided, saves PNG files to disk.
Examples
eset_stad <- load_data("eset_stad")
anno_rnaseq <- load_data("anno_rnaseq")
eset <- anno_eset(eset = eset_stad, annotation = anno_rnaseq)
eset_distribution(eset)
eset_distribution(eset, project = file.path(tempdir(), "ESET"))
estimateScore
Description
This function reads a gene expression dataset in GCT format, calculates enrichment scores for specific gene sets, and writes the computed scores to an output file. It supports multiple platform types and performs platform-specific calculations if necessary.
Usage
estimateScore(
input.ds,
output.ds,
platform = c("affymetrix", "agilent", "illumina")
)
Arguments
input.ds |
A character string specifying the path to the input dataset file in GCT format. The file should have gene expression data with appropriate headers. |
output.ds |
A character string specifying the path to the output dataset file, where the calculated scores will be written. |
platform |
A character vector indicating the platform type. Must be one of "affymetrix", "agilent", or "illumina". Platform-specific calculations are performed based on this parameter. |
Value
This function does not return a value but writes the computed scores to the specified output file in GCT format.
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
eset <- as.data.frame(eset)
eset <- tibble::rownames_to_column(eset, var = "symbol")
input_file <- tempfile(pattern = "estimate_", fileext = ".gct")
output_file <- tempfile(pattern = "estimate_score_", fileext = ".gct")
writeLines(c("#1.2", paste(nrow(eset), ncol(eset) - 1, sep = "\t")), input_file)
utils::write.table(
eset,
input_file,
sep = "\t", row.names = FALSE, col.names = TRUE, append = TRUE, quote = FALSE
)
estimateScore(input.ds = input_file, output.ds = output_file, platform = "affymetrix")
Calculate Exact P-Value for Correlation
Description
Computes the exact p-value for the correlation between two numeric variables using a specified correlation method.
Usage
exact_pvalue(x, y, method)
Arguments
x |
Numeric vector representing the first variable. |
y |
Numeric vector representing the second variable. |
method |
Character string specifying the correlation method: '"spearman"', '"pearson"', or '"kendall"'. |
Value
Numeric value representing the exact p-value.
Author(s)
Dongqiang Zeng
Examples
sig_stad <- load_data("sig_stad")
p_val <- exact_pvalue(
x = sig_stad$CD8.T.cells,
y = sig_stad$CD_8_T_effector,
method = "spearman"
)
print(p_val)
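For comparison, base R can compute an exact correlation p-value through 'stats::cor.test'. Whether the documented function relies on 'cor.test' internally is an assumption; the sketch only shows the base R call on simulated data.

```r
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)   # continuous values, so ties are not expected

# exact = TRUE requests the exact null distribution of Spearman's rho.
res <- cor.test(x, y, method = "spearman", exact = TRUE)
res$p.value
```

With tied values 'cor.test' cannot compute an exact Spearman p-value and falls back to an approximation with a warning.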
Extract Data Frame from Seurat Object
Description
Extracts and combines a data frame with cells as rows and features as columns from Seurat assay data. Supports multiple assays and optional metadata integration.
Usage
extract_sc_data(
sce,
vars = NULL,
assay,
slot = "scale.data",
combine_meta_data = TRUE
)
Arguments
sce |
Seurat object. |
vars |
Character vector of feature names to extract. If 'NULL', all features are extracted. |
assay |
Character vector specifying assay(s) to pull data from. |
slot |
Character string specifying the assay data slot. Default is '"scale.data"'. |
combine_meta_data |
Logical indicating whether to combine metadata with the extracted data frame. Default is 'TRUE'. |
Value
Data frame with cells as rows and features as columns.
Author(s)
Dongqiang Zeng
Examples
if (requireNamespace("Seurat", quietly = TRUE)) {
pbmc <- SeuratObject::pbmc_small
vars <- c("PPBP", "IGLL5", "VDAC3", "CD1C", "AKR1C3")
eset <- extract_sc_data(sce = pbmc, vars = vars, assay = "RNA")
}
Feature Quality Control and Filtering
Description
Filters features (variables) in a matrix or data frame by removing those with missing values, non-numeric types, infinite values, or zero variance. This is useful for preparing data for downstream statistical analyses.
Usage
feature_manipulation(
data,
feature = NULL,
is_matrix = FALSE,
print_result = FALSE
)
Arguments
data |
A matrix or data frame containing features to filter. |
feature |
Character vector of feature names to check. If 'is_matrix = TRUE', features are extracted from row names of the matrix. |
is_matrix |
Logical indicating whether 'data' is a gene expression matrix (features as rows, samples as columns). If 'TRUE', the matrix is transposed for processing. Default is 'FALSE'. |
print_result |
Logical indicating whether to print filtering statistics. Default is 'FALSE'. |
Value
Character vector of feature names that pass all quality checks.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
feas <- feature_manipulation(
data = eset_stad,
feature = rownames(eset_stad),
is_matrix = TRUE,
print_result = TRUE
)
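The four filters named in the description can be sketched in base R on a small data frame with features in columns. This is an illustration of the filtering rules only; the toy column names are invented and the package function may apply the checks in a different order.

```r
dat <- data.frame(
  ok       = c(1.2, 3.4, 5.6, 7.8),
  has_na   = c(1, NA, 3, 4),
  has_inf  = c(1, 2, Inf, 4),
  constant = c(2, 2, 2, 2),
  label    = c("a", "b", "c", "d")
)

# Keep a feature only if it is numeric, complete, finite, and not constant.
keep <- vapply(dat, function(v) {
  is.numeric(v) && !anyNA(v) && all(is.finite(v)) && sd(v) > 0
}, logical(1))

names(dat)[keep]
```

Only the column 'ok' passes all four checks in this toy data set.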
Feature Selection via Correlation or Differential Expression
Description
Selects informative features using either correlation with a quantitative response or differential expression (limma) for binary/continuous responses.
Usage
feature_select(
x,
y,
method = c("cor", "dif"),
family = c("spearman", "pearson"),
cutoff = NULL,
padjcut = NULL
)
Arguments
x |
Numeric matrix. Features (rows) by samples (columns). |
y |
Numeric or factor. Response vector (quantitative or binary). |
method |
Character. "cor" (correlation) or "dif" (differential expression). Default c("cor","dif"). |
family |
Character. Correlation method if method = "cor": "spearman" or "pearson". |
cutoff |
Numeric. Absolute correlation (for cor) or |log2FC| (for dif) threshold. |
padjcut |
Numeric. Adjusted p-value cutoff. |
Value
Character vector of selected feature names.
Examples
imvigor210_eset <- load_data("imvigor210_eset")
gene_mad <- apply(imvigor210_eset, 1, mad)
imvigor210_eset <- imvigor210_eset[gene_mad > 0.5, ]
pd1 <- as.numeric(imvigor210_eset["PDCD1", ])
group <- ifelse(pd1 > mean(pd1), "high", "low")
pd1_cor <- feature_select(
x = imvigor210_eset, y = pd1, method = "cor",
family = "pearson", padjcut = 0.05, cutoff = 0.5
)
pd1_dif <- feature_select(
x = imvigor210_eset, y = pd1, method = "dif",
padjcut = 0.05, cutoff = 2
)
pd1_dif_2 <- feature_select(
x = imvigor210_eset, y = group,
method = "dif", padjcut = 0.05, cutoff = 2
)
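The correlation branch (method = "cor") can be pictured as a per-feature correlation test followed by thresholding on the absolute coefficient and the adjusted p-value. The sketch below uses simulated data and base R only. The Benjamini-Hochberg adjustment and the exact comparison operators are assumptions, since the help page does not state them.

```r
# Illustrative sketch with simulated data; not the package's implementation.
set.seed(1)
n_samples <- 40
y <- rnorm(n_samples)
x <- matrix(rnorm(5 * n_samples), nrow = 5,
            dimnames = list(paste0("gene", 1:5), NULL))
x["gene1", ] <- y + rnorm(n_samples, sd = 0.1)  # strongly correlated by construction

# Correlate every feature (row) with the response
stats_tab <- t(apply(x, 1, function(g) {
  ct <- stats::cor.test(g, y, method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}))

# Assumed adjustment method: Benjamini-Hochberg
padj <- stats::p.adjust(stats_tab[, "p"], method = "BH")

# Keep features passing both thresholds (cutoff = 0.5, padjcut = 0.05)
selected <- rownames(x)[abs(stats_tab[, "r"]) >= 0.5 & padj < 0.05]
print(selected)
```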
filterCommonGenes
Description
This function filters and merges a dataset with a set of common genes.
Usage
filterCommonGenes(input.f, output.f, id = c("GeneSymbol", "EntrezID"))
Arguments
input.f |
A character string specifying the path to the input file or a connection object. The file should be a tab-separated table with row names. |
output.f |
A character string specifying the path to the output file. |
id |
A character string indicating the type of gene identifier to use. Can be either "GeneSymbol" or "EntrezID". |
Value
No return value. The function writes the merged dataset to the specified output file.
Examples
# Create a sample input dataframe
input_data <- data.frame(
GeneSymbol = c("BRCA1", "TP53", "EGFR", "NOTCH1"),
Value = c(10, 15, 8, 12),
stringsAsFactors = FALSE
)
# Write the input data to temporary file
input_file <- tempfile(fileext = ".txt")
output_file <- tempfile(fileext = ".txt")
write.table(input_data,
file = input_file, sep = "\t", row.names = TRUE,
quote = FALSE
)
# Call the filterCommonGenes function
filterCommonGenes(input_file, output_file, id = "GeneSymbol")
Identify Marker Features in Bulk Expression Data
Description
Identifies informative marker features across groups from bulk gene expression or signature score matrices using Seurat workflows. Performs feature selection, scaling, PCA, clustering, and marker discovery.
Usage
find_markers_in_bulk(
pdata,
eset,
group,
id_pdata = "ID",
nfeatures = 2000,
top_n = 20,
thresh.use = 0.25,
only.pos = TRUE,
min.pct = 0.25,
npcs = 30
)
Arguments
pdata |
Data frame. Sample metadata. |
eset |
Matrix. Gene expression or signature score matrix. |
group |
Character. Column name in pdata specifying grouping variable. |
id_pdata |
Character. Column name for sample IDs. Default is "ID". |
nfeatures |
Integer. Number of top variable features to select. Default is 2000. |
top_n |
Integer. Number of top markers to retain per cluster. Default is 20. |
thresh.use |
Numeric. Threshold for marker selection. Default is 0.25. |
only.pos |
Logical. Whether to retain only positive markers. Default is TRUE. |
min.pct |
Numeric. Minimum expression percentage threshold. Default is 0.25. |
npcs |
Integer. Number of principal components to use. Default is 30. |
Value
List with components: 'sce' (Seurat object), 'markers' (all markers), 'top_markers' (top markers per group).
Examples
eset_tme_stad <- load_data("eset_tme_stad")
colnames(eset_tme_stad) <- substring(colnames(eset_tme_stad), 1, 12)
pdata_sig_tme <- load_data("pdata_sig_tme")
res <- find_markers_in_bulk(
pdata = pdata_sig_tme, eset = eset_tme_stad,
group = "TMEcluster"
)
# Extract top markers per cluster using base R
top_markers <- res$top_markers
Analyze Mutations Related to Signature Scores
Description
This function identifies mutations associated with a specific signature score, performs statistical tests for significance, and generates oncoprints and box plots to visualize relationships.
Usage
find_mutations(
mutation_matrix,
signature_matrix,
id_signature_matrix = "ID",
signature,
min_mut_freq = 0.05,
plot = TRUE,
method = "multi",
point_alpha = 0.1,
save_path = NULL,
palette = "jco",
cols = NULL,
show_plot = TRUE,
show_col = FALSE,
width = 8,
height = 4,
oncoprint_group_by = "mean",
oncoprint_col = "#224444",
gene_counts = 10,
jitter = FALSE,
genes = NULL,
point_size = 4.5
)
Arguments
mutation_matrix |
A matrix of mutation data with samples in rows and genes in columns. |
signature_matrix |
A data frame with sample identifiers and signature scores. |
id_signature_matrix |
Column name in 'signature_matrix' for sample identifiers. |
signature |
Name of the target signature for analysis. |
min_mut_freq |
Minimum mutation frequency required for gene inclusion. Default is 0.05. |
plot |
Logical indicating whether to generate and save plots. Default is TRUE. |
method |
Statistical test method: "multi" for both Cuzick and Wilcoxon, or "Wilcoxon" only. Default is "multi". |
point_alpha |
Transparency of points in box plot. Default is 0.1. |
save_path |
Directory to save plots and results. If NULL, no files are saved. |
palette |
Color palette for box plots (used when 'cols' is NULL). Default is "jco". |
cols |
Character vector. Custom colors for box plots. If NULL, uses palette. Default is NULL. |
show_plot |
Logical indicating whether to display plots. Default is TRUE. |
show_col |
Logical indicating whether to show color codes. Default is FALSE. |
width |
Width of oncoprint plot. Default is 8. |
height |
Height of oncoprint plot. Default is 4. |
oncoprint_group_by |
Grouping method for oncoprint: "mean" or "quantile". Default is "mean". |
oncoprint_col |
Color for mutations in oncoprint. Default is "#224444". |
gene_counts |
Number of genes to display in oncoprint. Default is 10. |
jitter |
Logical indicating whether to add jitter to box plot points. Default is FALSE. |
genes |
Optional vector of gene names; if NULL, selects based on frequency. |
point_size |
Size of points in box plot. Default is 4.5. |
Value
A list containing statistical test results, oncoprint plots, and box plots.
Author(s)
Dongqiang Zeng
Examples
## Not run:
# This example requires a MAF file from TCGA or maftools
# See maftools or TCGAbiolinks documentation for obtaining MAF files
mut_list <- make_mut_matrix(
maf = "path_to_maf_file", isTCGA = TRUE,
category = "multi"
)
mut <- mut_list$snp
results <- find_mutations(
mutation_matrix = mut, signature_matrix = signature_data,
id_signature_matrix = "ID", signature = "CD_8_T_effector",
min_mut_freq = 0.01, plot = TRUE, method = "multi"
)
## End(Not run)
Identify Outlier Samples in Gene Expression Data
Description
Analyzes gene expression data to identify potential outlier samples using connectivity analysis via the WGCNA package. Calculates normalized adjacency and connectivity z-scores for each sample, generates connectivity plots, and optionally performs hierarchical clustering.
Usage
find_outlier_samples(
eset,
yinter = -3,
project = NULL,
plot_hculst = FALSE,
show_plot = TRUE,
index = NULL,
save = FALSE
)
Arguments
eset |
Numeric matrix. Gene expression data with genes as rows and samples as columns. |
yinter |
Numeric. Z-score threshold for identifying outliers. Default is -3. |
project |
Character or 'NULL'. Output directory path for saving plots. Required if 'save = TRUE'. Default is 'NULL'. |
plot_hculst |
Logical. Whether to plot hierarchical clustering. Default is 'FALSE'. |
show_plot |
Logical. Whether to display the connectivity plot. Default is 'TRUE'. |
index |
Integer or 'NULL'. Index for output file naming. Default is 'NULL'. |
save |
Logical. Whether to save plots to files. Default is 'FALSE'. |
Value
Character vector of sample names identified as potential outliers.
Author(s)
Dongqiang Zeng
Examples
eset_tme_stad <- load_data("eset_tme_stad")
outs <- find_outlier_samples(eset = eset_tme_stad)
print(outs)
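The idea of a connectivity z-score can be sketched in base R without WGCNA. The adjacency formula used here, ((1 + cor) / 2)^2, is an assumption chosen for illustration and may differ from what the function computes internally; the simulated matrix `expr` is hypothetical.

```r
# Illustrative sketch; the adjacency definition is an assumption.
set.seed(10)
expr <- matrix(rnorm(200 * 15), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:15)))
signal <- rnorm(200, sd = 2)
expr <- expr + signal                 # all samples share a common signal
expr[, "s15"] <- rnorm(200, sd = 3)   # one sample loses that signal

adj <- ((1 + stats::cor(expr)) / 2)^2  # sample-by-sample adjacency in [0, 1]
k <- colSums(adj) - 1                  # connectivity, excluding self-adjacency
z_k <- as.numeric(scale(k))            # standardized connectivity
names(z_k) <- colnames(expr)

# Samples below the threshold (compare with yinter = -3) are flagged
outliers <- names(z_k)[z_k < -3]
print(outliers)
```

A sample that correlates poorly with all others has low connectivity and therefore a strongly negative z-score, which is why the default threshold is negative.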
Identify Variable Genes in Expression Data
Description
Identifies variable genes from a gene expression dataset using specified selection criteria. Supports multiple methods, including expression thresholding and variability estimation via median absolute deviation (MAD).
Usage
find_variable_genes(
eset,
data_type = c("count", "normalized"),
methods = c("low", "mad"),
prop = 0.7,
quantile = c(0.75, 0.5, 0.25),
min.mad = 0.1,
feas = NULL
)
Arguments
eset |
Numeric matrix. Gene expression data (genes as rows, samples as columns). |
data_type |
Character. Type of data: '"count"' or '"normalized"'. Default is '"count"'. |
methods |
Character vector. Methods for gene selection: '"low"', '"mad"'. Default is 'c("low", "mad")'. |
prop |
Numeric. Proportion of samples in which a gene must be expressed. Default is 0.7. |
quantile |
Numeric. Quantile threshold for minimum MAD (0.25, 0.5, 0.75). Default is 0.75. |
min.mad |
Numeric. Minimum allowable MAD value. Default is 0.1. |
feas |
Character vector or 'NULL'. Additional features to include. Default is 'NULL'. |
Value
Matrix subset of 'eset' containing variable genes.
Author(s)
Dongqiang Zeng
Examples
eset_tme_stad <- load_data("eset_tme_stad")
eset <- find_variable_genes(
eset = eset_tme_stad,
data_type = "normalized",
methods = "mad",
quantile = 0.25
)
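MAD-based filtering can be sketched in base R as follows. How the quantile threshold and 'min.mad' are combined inside the function is not stated here, so taking the larger of the two as the cutoff is an assumption; `eset_toy` is a hypothetical simulated matrix.

```r
# Illustrative sketch; combining the two thresholds with max() is an assumption.
set.seed(42)
eset_toy <- matrix(rnorm(6 * 20), nrow = 6,
                   dimnames = list(paste0("gene", 1:6), paste0("s", 1:20)))
eset_toy["gene6", ] <- 5   # constant gene, so its MAD is 0

gene_mad <- apply(eset_toy, 1, stats::mad)

# Cutoff: the 25% quantile of MAD values, but never below 0.1
cutoff <- max(stats::quantile(gene_mad, probs = 0.25), 0.1)
eset_var <- eset_toy[gene_mad > cutoff, , drop = FALSE]
dim(eset_var)
```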
Fix Expression Mixture Matrix
Description
Processes an expression matrix by mapping genes, converting log-transformed values, applying quantile normalization (for array data), and performing TPM normalization.
Usage
fixMixture(mix.mat, arrays = FALSE)
Arguments
mix.mat |
Expression matrix with genes as rows. |
arrays |
Logical indicating if data is from arrays. Default is FALSE. |
Value
Processed expression matrix.
Format Input Signatures from MSigDB
Description
Reads a GMT file from MSigDB and converts it into a named list of gene sets suitable for IOBR functions.
Usage
format_msigdb(gmt, ont = "term", gene = "gene")
Arguments
gmt |
Character string. Path to a GMT file. |
ont |
Character string. Name of the signature/set column in the parsed GMT table. Default is '"term"'. |
gene |
Character string. Name of the gene column in the parsed GMT table. Default is '"gene"'. |
Value
Named list of character vectors, where each element contains the genes belonging to one signature.
Examples
tf <- tempfile(fileext = ".gmt")
writeLines(
c(
"HALLMARK_TNFA_SIGNALING_VIA_NFKB\tNA\tTNF\tNFKB1\tNFKB2",
"HALLMARK_P53_PATHWAY\tNA\tTP53\tMDM2\tCDKN1A"
),
con = tf
)
sig_list <- format_msigdb(tf, ont = "term", gene = "gene")
names(sig_list)
sig_list[[1]]
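For readers unfamiliar with the GMT layout, each line holds a set name, a description field, and then the member genes, all tab-separated. A minimal base-R sketch of turning such lines into a named list is shown here; it illustrates the file format only and is not how 'format_msigdb' is implemented.

```r
# Illustrative sketch of the GMT layout; not the package's parser.
gmt_lines <- c(
  "HALLMARK_TNFA_SIGNALING_VIA_NFKB\tNA\tTNF\tNFKB1\tNFKB2",
  "HALLMARK_P53_PATHWAY\tNA\tTP53\tMDM2\tCDKN1A"
)
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)

# Field 1 = set name, field 2 = description, remaining fields = genes
gmt_list <- lapply(fields, function(f) f[-(1:2)])
names(gmt_list) <- vapply(fields, `[`, character(1), 1)
str(gmt_list)
```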
Transform Signature Data into List Format
Description
Converts signature data from a data frame (with signatures as columns and genes as rows) into a list format suitable for IOBR functions. Handles NA values appropriately.
Usage
format_signatures(sig_data, save_signature = FALSE, output_path = NULL)
Arguments
sig_data |
Data frame with signature names as columns and genes in rows. Use 'NA' for missing values. |
save_signature |
Logical. Whether to save the signature list as RData. Default is 'FALSE'. |
output_path |
Character. Full path (without extension) for output file. Required if 'save_signature = TRUE'. Default is 'NULL'. |
Value
List of signatures.
Author(s)
Dongqiang Zeng
Examples
sig_data <- data.frame(
Signature1 = c("Gene1", "Gene2", "Gene3", NA),
Signature2 = c("Gene4", "Gene5", NA, NA)
)
format_signatures(sig_data)
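The conversion amounts to dropping the 'NA' padding from each column. A base-R sketch of that idea, which is not necessarily how the function is written internally, looks like this; `sig_df` and `sig_list_sketch` are names used only for the example.

```r
# Illustrative sketch; not the package's implementation.
sig_df <- data.frame(
  Signature1 = c("Gene1", "Gene2", "Gene3", NA),
  Signature2 = c("Gene4", "Gene5", NA, NA),
  stringsAsFactors = FALSE
)

# One list element per column, with NA padding removed
sig_list_sketch <- lapply(sig_df, function(col) col[!is.na(col)])
lengths(sig_list_sketch)
```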
Generate Reference Signature Matrix
Description
Generates a reference signature matrix for cell types based on differential expression analysis. Supports both limma for normalized data and DESeq2 for raw count data.
Usage
generateRef(dds, pheno, FDR = 0.05, dat, method = "limma")
Arguments
dds |
Matrix. Raw count data from RNA-seq. Required if 'method = "DESeq2"'. |
pheno |
Character vector. Cell type class of the samples. |
FDR |
Numeric. Genes with BH adjusted p-value < FDR are considered significant. Default is 0.05. |
dat |
Matrix or data frame. Normalized transcript quantification data (e.g., FPKM, TPM). |
method |
Character. Method for differential expression: '"limma"' or '"DESeq2"'. Default is '"limma"'. |
Value
List containing:
- 'reference_matrix': Data frame of median expression for significant genes across cell types.
- 'G': Optimal number of probes minimizing condition number.
- 'condition_number': Minimum condition number.
- 'whole_matrix': Full median expression matrix.
Examples
expressionData <- matrix(runif(1000 * 4, min = 0, max = 10), ncol = 4)
rownames(expressionData) <- paste("Gene", 1:1000, sep = "_")
colnames(expressionData) <- paste("Sample", 1:4, sep = "_")
phenotype <- c("celltype1", "celltype2", "celltype1", "celltype2")
rawCountData <- matrix(sample(1:100, 1000 * 4, replace = TRUE), ncol = 4)
rownames(rawCountData) <- paste("Gene", 1:1000, sep = "_")
colnames(rawCountData) <- paste("Sample", 1:4, sep = "_")
result <- generateRef(
dds = rawCountData, pheno = phenotype,
FDR = 0.05, dat = expressionData, method = "DESeq2"
)
Generate Reference Signature Matrix Using DESeq2
Description
Uses DESeq2 to perform differential expression analysis across cell types, identifies significantly expressed genes, and creates a reference signature matrix from median expression levels.
Usage
generateRef_DEseq2(dds, pheno, FDR = 0.05, dat)
Arguments
dds |
Matrix. Raw count data from RNA-seq. |
pheno |
Character vector. Cell type classes for samples. |
FDR |
Numeric. Threshold for adjusted p-values. Default is 0.05. |
dat |
Matrix. Normalized expression data (e.g., FPKM, TPM) for calculating median expression. |
Value
List containing:
- 'reference_matrix': Data frame of median expression for significant genes across cell types.
- 'G': Optimal number of probes minimizing condition number.
- 'condition_number': Minimum condition number.
- 'whole_matrix': Full median expression matrix.
Examples
set.seed(123)
dds <- matrix(sample(0:1000, 2000, replace = TRUE), nrow = 100, ncol = 20)
colnames(dds) <- paste("Sample", 1:20, sep = "_")
rownames(dds) <- paste("Gene", 1:100, sep = "_")
pheno <- rep(c("Type1", "Type2"), each = 10)
dat <- matrix(runif(2000), nrow = 100, ncol = 20)
rownames(dat) <- rownames(dds)
colnames(dat) <- colnames(dds)
result <- generateRef_DEseq2(dds = dds, pheno = pheno, FDR = 0.05, dat = dat)
print(result$reference_matrix)
Generate Reference Signature Matrix Using Limma
Description
Performs differential expression analysis using the limma package to identify significantly expressed genes across different cell types. Computes median expression levels of these significant genes to create a reference signature matrix.
Usage
generateRef_limma(dat, pheno, FDR = 0.05)
Arguments
dat |
Matrix or data frame. Gene probes in rows and samples in columns. |
pheno |
Character vector. Cell type class of the samples. |
FDR |
Numeric. Genes with BH adjusted p-value < FDR are considered significant. Default is 0.05. |
Value
List containing:
- 'reference_matrix': Data frame of median expression values for significantly expressed genes.
- 'G': Number of probes used that resulted in the lowest condition number.
- 'condition_number': Minimum condition number obtained.
- 'whole_matrix': Matrix of median values across all samples.
Examples
dat <- matrix(rnorm(2000), nrow = 100)
rownames(dat) <- paste("Gene", 1:100, sep = "_")
colnames(dat) <- paste("Sample", 1:20, sep = "_")
pheno <- sample(c("Type1", "Type2", "Type3"), 20, replace = TRUE)
results <- generateRef_limma(dat, pheno)
print(results)
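The 'G' and 'condition_number' elements refer to choosing the number of probes that gives the best-conditioned reference matrix. The sketch below shows the general idea with base R's `kappa()`. The gene ranking (spread across cell types) and the candidate grid are assumptions made for illustration, and `med` is a hypothetical median-expression matrix.

```r
# Illustrative sketch; ranking rule and candidate grid are assumptions.
set.seed(7)
med <- matrix(rexp(60 * 3, rate = 0.2), nrow = 60,
              dimnames = list(paste0("Gene_", 1:60),
                              c("Type1", "Type2", "Type3")))

# Assumed ranking: genes ordered by their spread across cell types
spread <- apply(med, 1, function(v) max(v) - min(v))
ranked <- names(sort(spread, decreasing = TRUE))

# Condition number (2-norm) of the top-g submatrix for each candidate g
candidate_G <- seq(5, 60, by = 5)
cond <- vapply(candidate_G, function(g) {
  kappa(med[ranked[seq_len(g)], , drop = FALSE], exact = TRUE)
}, numeric(1))

best_G <- candidate_G[which.min(cond)]
c(G = best_G, condition_number = min(cond))
```

A lower condition number indicates that the cell-type columns are easier to separate numerically, which is the usual motivation for this selection step in deconvolution.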
Generate Reference Gene Matrix from RNA-seq DEGs
Description
Uses DESeq2 to identify differentially expressed genes and create a reference matrix from median expression levels across cell types.
Usage
generateRef_rnaseq(dds, pheno, mode = "oneVSothers", FDR = 0.05, dat)
Arguments
dds |
Matrix. Raw count data from RNA-seq. |
pheno |
Character vector. Cell type classes. |
mode |
Character. DEG identification mode: '"oneVSothers"' or '"pairs"'. Default is '"oneVSothers"'. |
FDR |
Numeric. Threshold for adjusted p-values. Default is 0.05. |
dat |
Matrix. Normalized expression data (e.g., FPKM, TPM). |
Value
List containing:
- 'reference_matrix': Data frame of median expression for significant genes across cell types.
- 'G': Optimal number of probes minimizing condition number.
- 'condition_number': Minimum condition number.
- 'whole_matrix': Full median expression matrix.
Author(s)
Rongfang Shen
Examples
dds <- matrix(rpois(200 * 10, lambda = 10), ncol = 10)
rownames(dds) <- paste("Gene", 1:200, sep = "_")
colnames(dds) <- paste("Sample", 1:10, sep = "_")
pheno <- sample(c("Type1", "Type2", "Type3"), 10, replace = TRUE)
dat <- matrix(rnorm(200 * 10), ncol = 10)
rownames(dat) <- rownames(dds)
colnames(dat) <- colnames(dds)
results <- generateRef_rnaseq(dds = dds, pheno = pheno, FDR = 0.05, dat = dat)
Generate Reference Matrix from Seurat Object
Description
Generates reference gene expression data from a Seurat object by identifying marker genes for each cell type and aggregating expression data.
Usage
generateRef_seurat(
sce,
celltype = NULL,
proportion = NULL,
assay_deg = "RNA",
slot_deg = "data",
adjust_assay = FALSE,
assay_out = "RNA",
slot_out = "data",
verbose = FALSE,
only.pos = TRUE,
n_ref_genes = 50,
logfc.threshold = 0.15,
test.use = "wilcox"
)
Arguments
sce |
Seurat object containing single-cell RNA-seq data. |
celltype |
Character. Cell type column name in metadata. Default is 'NULL' (uses default identity). |
proportion |
Numeric. Proportion of cells to randomly select for analysis. Default is 'NULL' (use all cells). |
assay_deg |
Character. Assay for finding markers. Default is '"RNA"'. |
slot_deg |
Character. Slot for finding markers. Default is '"data"'. |
adjust_assay |
Logical. Whether to adjust assay for SCT. Default is 'FALSE'. |
assay_out |
Character. Assay for output. Default is '"RNA"'. |
slot_out |
Character. Slot for output. Default is '"data"'. |
verbose |
Logical. Print verbose messages. Default is 'FALSE'. |
only.pos |
Logical. Return only positive markers. Default is 'TRUE'. |
n_ref_genes |
Integer. Number of reference genes per cell type. Default is 50. |
logfc.threshold |
Numeric. Log fold change threshold. Default is 0.15. |
test.use |
Character. Statistical test for marker identification. Default is '"wilcox"'. |
Value
Matrix containing aggregated expression data for reference genes.
Author(s)
Dongqiang Zeng
Examples
## Not run:
if (requireNamespace("Seurat", quietly = TRUE)) {
# Requires a Seurat object with sufficient cells and markers
sm <- generateRef_seurat(sce = seurat_obj, celltype = "cell_type", slot_out = "data")
}
## End(Not run)
Extract Hazard Ratio and Confidence Intervals from Cox Model
Description
Extracts the hazard ratio (HR) and 95% confidence intervals from a fitted Cox proportional hazards model.
Usage
getHRandCIfromCoxph(coxphData)
Arguments
coxphData |
Fitted Cox model object from 'survival::coxph()'. |
Value
Data frame with p-values, HR, and confidence intervals.
Author(s)
Dorothee Nickles
Dongqiang Zeng
Examples
library(survival)
set.seed(123)
df <- data.frame(
TTE = rexp(200, rate = 0.1),
Cens = rbinom(200, size = 1, prob = 0.7),
group = sample(c("Treatment", "Control"), 200, replace = TRUE)
)
coxphData <- survival::coxph(survival::Surv(TTE, Cens) ~ group, data = df)
results <- getHRandCIfromCoxph(coxphData)
print(results)
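The same quantities can be read directly from `summary()` of a 'coxph' fit, which may help when checking results by hand. The sketch below uses only the survival package; the output column names (`p_value`, `HR`, `CI_low`, `CI_up`) are chosen for this example and are not claimed to match the function's output.

```r
# Illustrative sketch using survival only; output column names are hypothetical.
library(survival)
set.seed(123)
df <- data.frame(
  TTE   = rexp(200, rate = 0.1),
  Cens  = rbinom(200, size = 1, prob = 0.7),
  group = sample(c("Treatment", "Control"), 200, replace = TRUE)
)
fit <- coxph(Surv(TTE, Cens) ~ group, data = df)
s <- summary(fit)

hr_tab <- data.frame(
  term    = rownames(s$coefficients),
  p_value = s$coefficients[, "Pr(>|z|)"],   # Wald test p-value
  HR      = s$conf.int[, "exp(coef)"],      # hazard ratio
  CI_low  = s$conf.int[, "lower .95"],      # lower 95% bound
  CI_up   = s$conf.int[, "upper .95"],      # upper 95% bound
  row.names = NULL
)
print(hr_tab)
```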
Set and View Color Palettes
Description
Retrieves color palettes from the IOBR package with options for randomization and visualization. Users can specify predefined palettes or provide custom colors.
Usage
get_cols(cols = "normal", palette = 1, show_col = TRUE, seed = 123)
Arguments
cols |
Character vector of colors, or one of:
- '"normal"': Use standard palette
- '"random"': Randomly shuffle the palette
Default is '"normal"'. |
palette |
Numeric or character specifying the palette. Options are 1, 2, 3, 4, or palette name. Default is 1. |
show_col |
Logical indicating whether to display the color palette. Default is 'TRUE'. |
seed |
Integer seed for random number generator when 'cols = "random"'. Default is 123. |
Value
Character vector of colors.
Examples
# Get default palette
mycols <- get_cols()
# Get random palette
mycols <- get_cols(cols = "random", seed = 456)
# Use custom colors
mycols <- get_cols(cols = c("red", "blue", "green"))
Calculate and Visualize Correlation Between Two Variables
Description
Calculates and visualizes the correlation between two variables with options for scaling, handling missing values, and incorporating grouping data.
Usage
get_cor(
eset,
pdata = NULL,
var1,
var2,
is.matrix = FALSE,
id_eset = "ID",
id_pdata = "ID",
scale = TRUE,
subtype = NULL,
na.subtype.rm = FALSE,
color_subtype = NULL,
palette = "jama",
index = NULL,
method = c("spearman", "pearson", "kendall"),
show_cor_result = TRUE,
col_line = NULL,
id = NULL,
show_label = FALSE,
point_size = 4,
title = NULL,
alpha = 0.5,
title_size = 1.5,
text_size = 10,
axis_angle = 0,
hjust = 0,
show_plot = TRUE,
save_plot = FALSE,
path = NULL,
fig.format = "png",
fig.width = 7,
fig.height = 7.3,
add.hdr.line = FALSE
)
Arguments
eset |
Dataset containing the variables (data frame or matrix). |
pdata |
Optional phenotype data frame. Default is 'NULL'. |
var1 |
Name of the first variable. |
var2 |
Name of the second variable. |
is.matrix |
Logical indicating if 'eset' is a matrix with features as rows. Default is 'FALSE'. |
id_eset |
ID column in 'eset'. Default is '"ID"'. |
id_pdata |
ID column in 'pdata'. Default is '"ID"'. |
scale |
Logical indicating whether to scale data. Default is 'TRUE'. |
subtype |
Optional grouping variable for coloring points. Default is 'NULL'. |
na.subtype.rm |
Logical indicating whether to remove NA in subtype. Default is 'FALSE'. |
color_subtype |
Colors for subtypes. Default is 'NULL'. |
palette |
Color palette name. Default is '"jama"'. |
index |
Plot index for filename. Default is 'NULL' (uses 1). |
method |
Correlation method: '"spearman"', '"pearson"', or '"kendall"'. Default is '"spearman"'. |
show_cor_result |
Logical indicating whether to print correlation result. Default is 'TRUE'. |
col_line |
Color of regression line. Default is 'NULL' (auto-determine). |
id |
Column for point labels. Default is 'NULL'. |
show_label |
Logical indicating whether to show labels. Default is 'FALSE'. |
point_size |
Size of points. Default is 4. |
title |
Plot title. Default is 'NULL'. |
alpha |
Transparency of points. Default is 0.5. |
title_size |
Title font size. Default is 1.5. |
text_size |
Text font size. Default is 10. |
axis_angle |
Axis label angle. Default is 0. |
hjust |
Horizontal justification. Default is 0. |
show_plot |
Logical indicating whether to display plot. Default is 'TRUE'. |
save_plot |
Logical indicating whether to save plot. Default is 'FALSE'. |
path |
Save path. Default is 'NULL'. |
fig.format |
Figure format: '"png"' or '"pdf"'. Default is '"png"'. |
fig.width |
Figure width in inches. Default is 7. |
fig.height |
Figure height in inches. Default is 7.3. |
add.hdr.line |
Logical for adding HDR (high density region) lines. Default is 'FALSE'. |
Value
A ggplot object of the correlation plot.
Author(s)
Dongqiang Zeng
Examples
eset_tme_stad <- load_data("eset_tme_stad")
get_cor(
eset = eset_tme_stad,
is.matrix = TRUE,
var1 = "GZMB",
var2 = "CD274"
)
Calculate and Visualize Correlation Matrix Between Two Variable Sets
Description
Calculates and visualizes the correlation matrix between two sets of variables. Supports Pearson, Spearman, and Kendall correlation methods. The function generates a customizable heatmap with significance stars.
Usage
get_cor_matrix(
data,
feas1,
feas2,
method = c("pearson", "spearman", "kendall"),
path = NULL,
index = 1,
fig.type = "pdf",
width = NULL,
height = NULL,
project = NULL,
is.matrix = FALSE,
scale = TRUE,
font.size = 15,
fill_by_cor = FALSE,
round.num = 1,
font.size.star = 8,
cols = NULL
)
Arguments
data |
Input data frame or matrix. Variables should be in columns. |
feas1 |
Character vector of variable names for the first set. |
feas2 |
Character vector of variable names for the second set. |
method |
Correlation method: '"pearson"', '"spearman"', or '"kendall"'. Default is '"pearson"'. |
path |
Directory to save the plot. If 'NULL', plot is not saved. Default is 'NULL'. |
index |
Numeric prefix for output filename. Default is 1. |
fig.type |
File format: '"pdf"', '"png"', etc. Default is '"pdf"'. |
width |
Plot width in inches. Auto-calculated if 'NULL'. |
height |
Plot height in inches. Auto-calculated if 'NULL'. |
project |
Project name for plot title. Default is 'NULL'. |
is.matrix |
Logical: if 'TRUE', data is transposed. Default is 'FALSE'. |
scale |
Logical: scale variables before correlation. Default is 'TRUE'. |
font.size |
Font size for axis labels. Default is 15. |
fill_by_cor |
Logical: show correlation values instead of stars. Default is 'FALSE'. |
round.num |
Decimal places for correlation values. Default is 1. |
font.size.star |
Font size for significance stars. Default is 8. |
cols |
Custom colors for gradient (low, mid, high). If 'NULL', uses blue-white-red. Default is 'NULL'. |
Value
ggplot object displaying the correlation matrix heatmap.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
data <- as.data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
colnames(data) <- paste0("Gene_", 1:10)
feas1 <- c("Gene_1", "Gene_2", "Gene_3")
feas2 <- c("Gene_4", "Gene_5", "Gene_6")
cor_plot <- get_cor_matrix(
data = data,
feas1 = feas1,
feas2 = feas2,
method = "spearman",
project = "Example Correlation"
)
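The numbers behind such a heatmap are a rectangular correlation matrix between the two feature sets plus a matching matrix of p-values. A base-R sketch of that computation is given here; the star thresholds (0.05, 0.01, 0.001) are a common convention assumed for illustration, and the object names are hypothetical.

```r
# Illustrative sketch; star thresholds are an assumed convention.
set.seed(123)
dat <- as.data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
colnames(dat) <- paste0("Gene_", 1:10)
feas1 <- c("Gene_1", "Gene_2", "Gene_3")
feas2 <- c("Gene_4", "Gene_5", "Gene_6")

# Rows follow feas1, columns follow feas2
cor_mat <- stats::cor(dat[, feas1], dat[, feas2], method = "spearman")

# Matching matrix of p-values, one correlation test per pair
p_mat <- sapply(feas2, function(b) {
  sapply(feas1, function(a) {
    stats::cor.test(dat[[a]], dat[[b]], method = "spearman")$p.value
  })
})

stars <- ifelse(p_mat < 0.001, "***",
         ifelse(p_mat < 0.01, "**",
         ifelse(p_mat < 0.05, "*", "")))
round(cor_mat, 2)
```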
Get Default Download Mirrors
Description
Returns the default list of download mirrors.
Usage
get_default_mirrors()
Value
Character vector of mirror URLs.
Extract Top Marker Genes from Single-Cell Differential Results
Description
Selects the top N marker genes per cluster from a ranked differential expression result table.
Usage
get_sig_sc(
deg,
cluster = "cluster",
gene = "gene",
avg_log2FC = "avg_log2FC",
n = 100
)
Arguments
deg |
Data frame or matrix. Ranked marker statistics. |
cluster |
Character. Column name containing cluster identifiers. Default is '"cluster"'. |
gene |
Character. Column name containing gene identifiers. Default is '"gene"'. |
avg_log2FC |
Character. Column name for average log2 fold change. Default is '"avg_log2FC"'. |
n |
Integer. Number of top markers per cluster. Default is 100. |
Value
List of character vectors; each element contains the top N genes for a cluster.
Examples
deg <- load_data("deg")
get_sig_sc(deg, cluster = "cluster", gene = "gene", avg_log2FC = "avg_log2FC", n = 100)
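Selecting the top N genes per cluster can be sketched in base R with `split()` and `order()`. Ranking by descending 'avg_log2FC' is an assumption based on the argument names; `deg_toy` is a hypothetical table standing in for real marker results.

```r
# Illustrative sketch; ranking by descending avg_log2FC is an assumption.
deg_toy <- data.frame(
  cluster    = rep(c("c1", "c2"), each = 4),
  gene       = paste0("g", 1:8),
  avg_log2FC = c(0.5, 2.0, 1.2, 0.1, 3.0, 0.2, 1.1, 2.5),
  stringsAsFactors = FALSE
)
top_n <- 2

# One character vector of top genes per cluster
top_by_cluster <- lapply(split(deg_toy, deg_toy$cluster), function(d) {
  d <- d[order(d$avg_log2FC, decreasing = TRUE), ]
  head(d$gene, top_n)
})
print(top_by_cluster)
```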
GSVA API Version Detection
Description
Detects whether the installed GSVA package supports the new parameter-based API (gsvaParam/ssgseaParam) or the old direct argument API. This function is used internally to ensure compatibility across different GSVA package versions (1.50.0+ vs older versions).
Usage
gsva_use_new_api()
Value
A list with two elements:
- 'use_new_api': Logical indicating whether to use the new API ('TRUE') or old API ('FALSE')
- 'gsva_version': Character string of the installed GSVA version, or '"not installed"' if not available
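A version check of this kind can be written with `requireNamespace()` and `utils::packageVersion()`. The sketch below is a hypothetical re-creation, not the package's code; `check_gsva_api()` is an invented name, and returning 'FALSE' when GSVA is missing is an assumption.

```r
# Illustrative sketch; `check_gsva_api()` is a hypothetical helper.
check_gsva_api <- function(threshold = "1.50.0") {
  if (!requireNamespace("GSVA", quietly = TRUE)) {
    # Assumed behaviour when GSVA is absent
    return(list(use_new_api = FALSE, gsva_version = "not installed"))
  }
  v <- utils::packageVersion("GSVA")
  list(use_new_api = v >= threshold, gsva_version = as.character(v))
}

res <- check_gsva_api()
str(res)
```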
Identify High-Variance Features from Statistical Results
Description
Selects top variable (up- and down-regulated) features based on adjusted p-value and log fold-change thresholds.
Usage
high_var_fea(
result,
target,
name_padj = "padj",
padj_cutoff = 1,
name_logfc,
logfc_cutoff = 0,
n = 10,
data_type = NULL
)
Arguments
result |
Data frame or tibble. Statistical results containing feature, adjusted p-value, and logFC columns. |
target |
Character. Column name of feature identifiers. |
name_padj |
Character. Adjusted p-value column name. Default is '"padj"'. |
padj_cutoff |
Numeric. Adjusted p-value threshold. Default is 1. |
name_logfc |
Character. log2 fold-change column name. |
logfc_cutoff |
Numeric. Absolute log2 fold-change threshold. Default is 0. |
n |
Integer. Number of top up and top down features to select. Default is 10. |
data_type |
Character or 'NULL'. If '"survival"', adjusts logFC interpretation. Default is 'NULL'. |
Value
Character vector of selected feature names (combined up and down sets).
Author(s)
Dongqiang Zeng
Examples
result_data <- data.frame(
gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
padj = c(0.01, 0.02, 0.05, 0.001, 0.03),
logfc = c(-2, 1.5, -3, 2.5, 0.5)
)
high_var_fea(
result = result_data,
target = "gene",
name_padj = "padj",
name_logfc = "logfc",
n = 2,
padj_cutoff = 0.05,
logfc_cutoff = 1.5
)
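The selection logic (filter by adjusted p-value and absolute log fold-change, then take the top N in each direction) can be sketched in base R. Whether the thresholds are applied strictly or inclusively is not stated, so the strict comparisons below are an assumption, and the cutoffs are chosen to avoid boundary cases.

```r
# Illustrative sketch; strict comparisons are an assumption.
res_tab <- data.frame(
  gene  = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
  padj  = c(0.01, 0.02, 0.05, 0.001, 0.03),
  logfc = c(-2, 1.5, -3, 2.5, 0.5),
  stringsAsFactors = FALSE
)
n_top <- 2

# Filter on significance and effect size
sig <- res_tab[res_tab$padj < 0.05 & abs(res_tab$logfc) > 1, ]

# Rank up- and down-regulated features separately
up_tab   <- sig[sig$logfc > 0, ]
down_tab <- sig[sig$logfc < 0, ]
up   <- head(up_tab$gene[order(up_tab$logfc, decreasing = TRUE)], n_top)
down <- head(down_tab$gene[order(down_tab$logfc)], n_top)

selected <- c(up, down)
print(selected)
```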
IMvigor210 Bladder Cancer Immunotherapy Cohort Data
Description
Clinical and biomarker data from the IMvigor210 clinical trial cohort. Includes treatment response, survival outcomes, and immune biomarker measurements for bladder cancer patients treated with atezolizumab.
Usage
data(imvigor210_pdata)
Format
A data frame with patients as rows and variables as columns:
- ID
Patient sample identifier
- BOR
Best overall response (CR, PR, SD, PD, NA)
- BOR_binary
Binary response classification (R=responder, NR=non-responder)
- OS_days
Overall survival time in days
- OS_status
Overall survival status (0=alive, 1=dead)
- Mutation_Load
Tumor mutation burden
- Neo_antigen_Load
Neoantigen load
- CD_8_T_effector
CD8+ T effector signature score
- Immune_Checkpoint
Immune checkpoint signature score
- Pan_F_TBRs
Pan-fibroblast TGF-β response signature
- Mismatch_Repair
Mismatch repair status or signature
- TumorPurity
Estimated tumor purity
Source
IMvigor210 clinical trial (NCT02108652)
References
Mariathasan S et al. TGFβ attenuates tumour response to PD-L1 blockade
by contributing to exclusion of T cells. Nature 554, 544-548 (2018).
doi:10.1038/nature25501
Examples
data(imvigor210_pdata)
head(imvigor210_pdata)
Integrative Correlation Analysis Between Phenotype and Features
Description
Performs comprehensive correlation analysis between phenotype data and feature data, supporting both continuous and categorical phenotypes. Filters features based on statistical significance and generates publication-ready visualizations including box plots, heatmaps, and correlation plots.
Usage
iobr_cor_plot(
pdata_group,
id1 = "ID",
feature_data,
id2 = "ID",
target = NULL,
group = "group3",
is_target_continuous = TRUE,
padj_cutoff = 1,
index = 1,
category = "signature",
signature_group = NULL,
ProjectID = "TCGA",
palette_box = "nrc",
cols_box = NULL,
palette_corplot = "pheatmap",
palette_heatmap = 2,
feature_limit = 26,
character_limit = 60,
show_heatmap_col_name = FALSE,
show_col = FALSE,
show_plot = FALSE,
path = NULL,
discrete_x = 20,
discrete_width = 20,
show_palettes = FALSE,
fig.type = "pdf"
)
Arguments
pdata_group |
Data frame containing phenotype data with an identifier column. |
id1 |
Character string specifying the column name in 'pdata_group' serving as the sample identifier. Default is '"ID"'. |
feature_data |
Data frame containing feature data with corresponding identifiers. |
id2 |
Character string specifying the column name in 'feature_data' serving as the sample identifier. Default is '"ID"'. |
target |
Character string specifying the target variable column name for continuous analysis. Default is 'NULL'. |
group |
Character string specifying the grouping variable name for categorical analysis. Default is '"group3"'. |
is_target_continuous |
Logical indicating whether the target variable is continuous, which affects grouping strategy. Default is 'TRUE'. |
padj_cutoff |
Numeric value specifying the adjusted p-value cutoff for filtering features. Default is '1'. |
index |
Numeric index used for ordering output file names. Default is '1'. |
category |
Character string specifying the data category: '"signature"' or '"gene"'. |
signature_group |
List specifying the grouping variable for signatures. Options include '"sig_group"' for signature grouping or '"signature_collection"'/'"signature_tme"' for gene grouping. |
ProjectID |
Character string specifying the project identifier for file naming. |
palette_box |
Character string or integer specifying the color palette for box plots. Default is '"nrc"'. |
cols_box |
Character vector of specific colors for box plots. Default is 'NULL'. |
palette_corplot |
Character string or integer specifying the color palette for correlation plots. Default is '"pheatmap"'. |
palette_heatmap |
Integer specifying the color palette index for heatmaps. Default is '2'. |
feature_limit |
Integer specifying the maximum number of features to display. Default is '26'. |
character_limit |
Integer specifying the maximum number of characters for variable labels. Default is '60'. |
show_heatmap_col_name |
Logical indicating whether to display column names on heatmaps. Default is 'FALSE'. |
show_col |
Logical indicating whether to display color codes for palettes. Default is 'FALSE'. |
show_plot |
Logical indicating whether to display plots. Default is 'FALSE'. |
path |
Character string specifying the directory path for saving output files. Default is 'NULL'. |
discrete_x |
Numeric threshold for character length beyond which labels are discretized. Default is '20'. |
discrete_width |
Numeric value specifying the width for label wrapping in plots. Default is '20'. |
show_palettes |
Logical indicating whether to display color palettes. Default is 'FALSE'. |
fig.type |
Character string specifying the format for saving figures ('"pdf"', '"png"', etc.). Default is '"pdf"'. |
Value
Depending on configuration, returns ggplot2 objects (box plots, heatmaps, correlation plots) and/or a data frame containing statistical analysis results.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
pdata_group <- data.frame(
ID = 1:100,
phenotype_score = rnorm(100)
)
feature_data <- data.frame(
ID = 1:100,
Feature1 = rnorm(100),
Feature2 = rnorm(100),
Feature3 = rnorm(100)
)
sig_group_example <- list(
signature = c("Feature1", "Feature2", "Feature3")
)
results <- iobr_cor_plot(
pdata_group = pdata_group,
feature_data = feature_data,
id1 = "ID",
id2 = "ID",
target = "phenotype_score",
is_target_continuous = TRUE,
category = "signature",
signature_group = sig_group_example,
show_plot = FALSE,
path = tempdir()
)
print(results)
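The significance filter controlled by padj_cutoff can be illustrated without the package. The sketch below is not the internal implementation of iobr_cor_plot: it assumes Spearman correlation and Benjamini-Hochberg adjustment (both are assumptions for illustration), and the object names target, features, and stats are hypothetical.

```r
# Toy data: Feature1 is built to correlate with the target; the others are noise
set.seed(123)
n <- 100
target <- rnorm(n)
features <- data.frame(
  Feature1 = target + rnorm(n, sd = 0.5),
  Feature2 = rnorm(n),
  Feature3 = rnorm(n)
)

# One correlation test per feature (Spearman is an assumption here)
stats <- do.call(rbind, lapply(names(features), function(f) {
  ct <- cor.test(features[[f]], target, method = "spearman")
  data.frame(feature = f, rho = unname(ct$estimate), p.value = ct$p.value)
}))

# Adjust for multiple testing (BH is an assumption) and apply a cutoff
stats$padj <- p.adjust(stats$p.value, method = "BH")
padj_cutoff <- 0.05
significant <- stats[stats$padj < padj_cutoff, ]
print(significant)
```

With a cutoff of 1, the documented default, a filter of this kind would retain essentially every feature, so a stricter value is needed to restrict the plots to significant features.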
Tumor Microenvironment (TME) Deconvolution Pipeline
Description
Executes an integrated TME analysis on a gene expression matrix: performs immune/stromal cell deconvolution using multiple algorithms, computes signature scores, and aggregates results. Designed for exploratory immunogenomic profiling.
Usage
iobr_deconvo_pipeline(
eset,
project,
array,
tumor_type,
path = NULL,
permutation = 1000
)
Arguments
eset |
Numeric matrix. Gene expression (TPM/log scale) with genes in rows. |
project |
Character. Project name (used in output naming). |
array |
Logical. Whether data originated from an array platform. Affects deconvolution choices. |
tumor_type |
Character. Tumor type code (e.g., "stad") used by certain methods. |
path |
Character. Output directory. Default is NULL (uses tempdir()). |
permutation |
Integer. Number of permutations for CIBERSORT (and similar). Default is 1000. |
Value
Data frame integrating cell fractions and signature scores (also writes intermediate outputs to disk).
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
res <- iobr_deconvo_pipeline(
eset = eset, project = "STAD",
array = FALSE, tumor_type = "stad",
path = tempdir(), permutation = 10
)
Differential Expression Analysis
Description
Performs differential expression analysis on gene expression data using either DESeq2 or limma. Includes pre-processing steps like filtering low count data, and calculates fold changes and adjusted p-values. Optionally generates volcano plots and heatmaps.
Usage
iobr_deg(
eset,
annotation = NULL,
id_anno = NULL,
pdata,
group_id = "group",
pdata_id = "ID",
array = FALSE,
method = c("DESeq2", "limma"),
contrast = c("High", "Low"),
path = NULL,
padj_cutoff = 0.01,
logfc_cutoff = 0.5,
volcano_plot = FALSE,
col_volcano = 1,
heatmap = TRUE,
col_heatmap = 1,
parallel = FALSE
)
Arguments
eset |
A matrix of gene expression data where rows represent genes and columns represent samples. |
annotation |
Optional data frame for mapping gene IDs to gene names. Default is 'NULL'. |
id_anno |
Character string specifying the identifier column in annotation. Default is 'NULL'. |
pdata |
A data frame containing sample information and grouping labels. |
group_id |
Character string specifying the column name in 'pdata' containing grouping labels. Default is '"group"'. |
pdata_id |
Character string specifying the column name in 'pdata' for sample IDs. Default is '"ID"'. |
array |
Logical indicating whether to perform quantile normalization. Default is 'FALSE'. |
method |
Character string specifying the method: '"DESeq2"' or '"limma"'. Default is '"DESeq2"'. |
contrast |
Character vector of length 2 specifying contrast groups. Default is 'c("High", "Low")'. |
path |
Character string for output directory. Default is 'NULL'. |
padj_cutoff |
Numeric cutoff for adjusted p-values. Default is '0.01'. |
logfc_cutoff |
Numeric log2 fold change cutoff. Default is '0.5'. |
volcano_plot |
Logical indicating whether to generate a volcano plot. Default is 'FALSE'. |
col_volcano |
Integer specifying color index for volcano plot. Default is '1'. |
heatmap |
Logical indicating whether to generate a heatmap. Default is 'TRUE'. |
col_heatmap |
Integer specifying color index for heatmap. Default is '1'. |
parallel |
Logical indicating whether to run in parallel. Default is 'FALSE'. |
Value
Data frame containing differentially expressed genes with statistics including log2 fold changes and adjusted p-values.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
stad_group <- load_data("stad_group")
deg <- iobr_deg(
eset = eset_stad, pdata = stad_group,
group_id = "subtype", pdata_id = "ID", array = FALSE,
method = "DESeq2", contrast = c("EBV", "GS"),
path = file.path(tempdir(), "STAD")
)
head(deg)
Principal Component Analysis (PCA) Visualization
Description
This function performs Principal Component Analysis (PCA) on gene expression data, reduces dimensionality while preserving variance, and generates a scatter plot visualization.
Usage
iobr_pca(
data,
is.matrix = TRUE,
scale = TRUE,
is.log = FALSE,
pdata,
id_pdata = "ID",
group = NULL,
geom.ind = "point",
cols = "normal",
palette = "jama",
repel = FALSE,
ncp = 5,
axes = c(1, 2),
addEllipses = TRUE
)
Arguments
data |
Input data for PCA: matrix or data frame. |
is.matrix |
Logical indicating if input is a matrix. Default is TRUE. |
scale |
Logical indicating whether to scale the data. Default is TRUE. |
is.log |
Logical indicating whether to log-transform the data. Default is FALSE. |
pdata |
Data frame with sample IDs and grouping information. |
id_pdata |
Column name in 'pdata' for sample IDs. Default is "ID". |
group |
Column name in 'pdata' for grouping variable. Default is NULL. |
geom.ind |
Type of geometric representation for points. Default is "point". |
cols |
Color scheme for groups. Default is "normal". |
palette |
Color palette for groups. Default is "jama". |
repel |
Logical indicating whether to repel overlapping points. Default is FALSE. |
ncp |
Number of principal components to retain. Default is 5. |
axes |
Principal components to plot (e.g., c(1, 2)). Default is c(1, 2). |
addEllipses |
Logical indicating whether to add concentration ellipses. Default is TRUE. |
Value
A ggplot object of the PCA plot.
Author(s)
Dongqiang Zeng
Examples
if (requireNamespace("FactoMineR", quietly = TRUE) &&
requireNamespace("factoextra", quietly = TRUE)) {
set.seed(123)
eset <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
pdata <- data.frame(
ID = colnames(eset),
group = rep(c("A", "B"), each = 5)
)
iobr_pca(eset, pdata = pdata, id_pdata = "ID", group = "group", addEllipses = FALSE)
}
Map Score to Immunophenoscore
Description
Maps an input score to the Immunophenoscore (IPS) on a 0-10 scale. Scores ≤ 0 map to 0,
scores ≥ 3 map to 10, and intermediate scores are linearly scaled.
Usage
ipsmap(x)
Arguments
x |
Numeric value representing the aggregate z-score. |
Value
Integer value between 0 and 10 representing the Immunophenoscore.
Examples
ips <- ipsmap(2.5)
ips <- ipsmap(-1)
ips <- ipsmap(5)
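The piecewise mapping described for ipsmap can be written out as a short base R sketch. The function name ips_sketch is hypothetical, and rounding the scaled value to the nearest integer is an assumption based on the documented integer return value, not a statement about the package source.

```r
# Hypothetical re-statement of the documented mapping:
# x <= 0 -> 0, x >= 3 -> 10, otherwise linear scaling of (0, 3) onto (0, 10)
ips_sketch <- function(x) {
  if (x <= 0) return(0)
  if (x >= 3) return(10)
  round(x * 10 / 3)  # rounding to an integer is an assumption
}

sapply(c(-1, 1.5, 2.5, 5), ips_sketch)
# 0 5 8 10
```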
Feature Selection for Predictive or Prognostic Models Using LASSO Regression
Description
Applies LASSO (Least Absolute Shrinkage and Selection Operator) regression to construct predictive or prognostic models. Supports both binary and survival response variables, utilizing cross-validation for optimal model selection.
Usage
lasso_select(
x,
y,
type = c("binary", "survival"),
nfold = 10,
lambda = c("lambda.min", "lambda.1se")
)
Arguments
x |
A numeric matrix. Features (e.g., gene symbols or CGI) as row names and samples as column names. |
y |
A response variable vector. Can be binary (0/1) or survival data (e.g., survival time and event status). |
type |
Character. Model type: "binary" for binary response or "survival" for survival analysis. Default is "binary". |
nfold |
Integer. Number of folds for cross-validation. Default is 10. |
lambda |
Character. Regularization parameter selection: "lambda.min" (minimum mean cross-validated error) or "lambda.1se" (one standard error from minimum). Default is "lambda.min". |
Value
Character vector of selected feature names with non-zero coefficients in the optimal LASSO model.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
gene_expression <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
rownames(gene_expression) <- paste0("Gene", 1:100)
colnames(gene_expression) <- paste0("Sample", 1:20)
# Binary response example
binary_outcome <- sample(c(0, 1), 20, replace = TRUE)
lasso_select(
x = gene_expression,
y = binary_outcome,
type = "binary",
nfold = 5
)
Differential Expression Analysis Using Limma
Description
Performs differential expression analysis using the limma package on a given gene expression dataset. Constructs a design matrix from phenotype data, fits a linear model, applies contrasts, and computes statistics for differential expression.
Usage
limma.dif(exprdata, pdata, contrastfml)
Arguments
exprdata |
A matrix with rownames as features like gene symbols or cgi, and colnames as samples. |
pdata |
A two-column dataframe where the first column matches the colnames of exprdata and the second column contains the grouping variable. |
contrastfml |
A character vector for contrasts to be tested (see ?makeContrasts for more details). |
Value
Returns a dataframe from limma::topTable, which includes genes as rows and columns like genelist, logFC, AveExpr, etc.
Examples
# Toy example with 100 genes and 6 samples
set.seed(123)
exprdata <- matrix(
rnorm(100 * 6),
nrow = 100,
ncol = 6,
dimnames = list(
paste0("gene", 1:100),
paste0("sample", 1:6)
)
)
# Phenotype data: 3 vs 3
pdata <- data.frame(
sample = colnames(exprdata),
group = rep(c("group1", "group2"), each = 3),
stringsAsFactors = FALSE
)
# Differential expression: group1 vs group2
res <- limma.dif(
exprdata = exprdata,
pdata = pdata,
contrastfml = "group1 - group2"
)
head(res)
List Available GitHub Datasets
Description
List Available GitHub Datasets
Usage
list_github_datasets()
Value
Character vector of available dataset names.
Examples
list_github_datasets()
List Current Download Mirrors
Description
Returns the current list of download mirrors.
Usage
list_iobr_mirrors()
Value
Character vector of mirror URLs.
Examples
list_iobr_mirrors()
Load IOBR Datasets
Description
Loads internal datasets from the IOBR package. Supports both sysdata (internal) and exported data files included in the package.
Usage
load_data(name)
Arguments
name |
Character string. Name of the dataset to load. Must be a single value. Available datasets include:
- Expression data: '"eset_stad"', '"imvigor210_eset"', '"melanoma_data"'
- Signatures: '"signature_tme"', '"signature_metabolism"', '"signature_collection"'
- Gene sets: '"hallmark"', '"kegg"', '"go_bp"', '"go_cc"', '"go_mf"'
- Cell markers: '"cellmarkers"', '"mcp_genes"'
- Phenotype data: '"pdata_stad"', '"pdata_sig_tme"', '"pdata_acrg"'
- Reference data: '"xCell.data"', '"quantiseq_data"', '"TRef"', '"BRef"'
- Color palettes: '"palette1"', '"palette2"', '"palette3"', '"palette4"' |
Value
Dataset object, typically a 'list', 'data.frame', or 'matrix'. The exact type depends on the requested dataset.
Examples
# Load signature collection
sig_tme <- load_data("signature_tme")
# Load expression data
eset <- load_data("eset_stad")
# Load color palette
colors <- load_data("palette1")
# Error handling with suggestions for similar names
try(load_data("sign_tme")) # Will suggest "signature_tme"
Log2 Transformation of Gene Expression Matrix
Description
Determines whether a gene expression matrix requires log2 transformation based on the distribution of values, and applies it if necessary. This is useful for automatically detecting raw counts or linear-scale data that should be log-transformed for downstream analysis.
Usage
log2eset(eset)
Arguments
eset |
Numeric matrix. Gene expression data with genes as rows and samples in columns. |
Value
Numeric matrix. Log2-transformed gene expression data (if transformation was needed), or the original data otherwise.
Examples
set.seed(123)
eset <- matrix(rnorm(1000, mean = 10, sd = 2), nrow = 100, ncol = 10)
rownames(eset) <- paste0("Gene", 1:100)
colnames(eset) <- paste0("Sample", 1:10)
eset_transformed <- log2eset(eset)
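One common way to decide whether a matrix is still on a linear scale is a quantile heuristic of the kind used in GEO2R-generated scripts. The sketch below shows that heuristic only as an illustration; whether log2eset applies exactly these thresholds is an assumption, and needs_log2 is a hypothetical helper name.

```r
# Quantile-based check (GEO2R-style heuristic, assumed for illustration):
# a very large 99th percentile, or a wide range with a positive lower
# quartile, suggests linear-scale data.
needs_log2 <- function(eset) {
  qx <- as.numeric(quantile(eset, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  qx[5] > 100 || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

set.seed(123)
log_scale <- matrix(rnorm(1000, mean = 10, sd = 2), nrow = 100)  # already log-like
linear_scale <- 2^log_scale                                       # back on linear scale

needs_log2(log_scale)     # FALSE
needs_log2(linear_scale)  # TRUE
```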
Quantile Normalization
Description
Quantile Normalization
Usage
makeQN(mix.mat)
Arguments
mix.mat |
Expression matrix. |
Value
Quantile normalized matrix.
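As an illustration of what quantile normalization does, the following base R sketch forces every column of a matrix to share the same distribution. It is a simplified version (ties are broken by order of appearance) and is not taken from makeQN, whose internals may differ; quantile_normalize is a hypothetical name.

```r
# Simplified quantile normalization: replace each value by the mean of the
# values that share its rank across columns.
quantile_normalize <- function(mat) {
  ranks  <- apply(mat, 2, rank, ties.method = "first")  # per-column ranks
  sorted <- apply(mat, 2, sort)                         # per-column sorted values
  ref    <- rowMeans(sorted)                            # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(mat)
  out
}

set.seed(1)
m <- matrix(rexp(30), nrow = 10,
            dimnames = list(paste0("Gene", 1:10), paste0("Sample", 1:3)))
qn <- quantile_normalize(m)

# After normalization every column has the same sorted values
apply(qn, 2, sort)[1:3, ]
```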
Construct Mutation Matrices from MAF Data
Description
Builds mutation presence/absence matrices from MAF input (file path or MAF object). Supports multiple categories: all mutations, SNPs, indels, and frameshift mutations. When category = "multi", returns a list of matrices for each category. Compatible with TCGA-formatted data.
Usage
make_mut_matrix(
maf = NULL,
mut_data = NULL,
isTCGA = TRUE,
category = c("multi", "all", "snp", "indel", "frameshift"),
Tumor_Sample_Barcode = "Tumor_Sample_Barcode",
Hugo_Symbol = "Hugo_Symbol",
Variant_Classification = "Variant_Classification",
Variant_Type = "Variant_Type"
)
Arguments
maf |
Character or MAF object. Path to MAF file or an already loaded MAF object. |
mut_data |
Data frame or NULL. Preloaded MAF-like data (used if 'maf' is NULL). |
isTCGA |
Logical. Whether the MAF follows TCGA conventions. Default is TRUE. |
category |
Character. Mutation category: "all", "snp", "indel", "frameshift", or "multi". Default is "multi". |
Tumor_Sample_Barcode |
Character. Column name for tumor sample IDs. Default is "Tumor_Sample_Barcode". |
Hugo_Symbol |
Character. Column name for gene symbols. Default is "Hugo_Symbol". |
Variant_Classification |
Character. Column name for variant classification (e.g., Frame_Shift_Del). Default is "Variant_Classification". |
Variant_Type |
Character. Column name for variant type (e.g., SNP, INS, DEL). Default is "Variant_Type". |
Value
List of mutation matrices (if category = "multi") or a single matrix for the specified category.
Note
Some users may encounter errors from upstream data import (e.g. "Can't combine ..$Tumor_Seq_Allele2" when using TCGAbiolinks or TCGAmutations). This is due to inconsistent column types in the source MAF tables, not an issue of this function. Please ensure your MAF or merged data frame uses consistent column types (e.g. convert allele columns to character before input).
Author(s)
Dongqiang Zeng
Shixiang Huang
Examples
## Not run:
# See maftools or TCGAbiolinks documentation for obtaining MAF input
# Example: Download MAF file from TCGA portal
mut_list <- make_mut_matrix(maf = "path_to_maf_file.maf", isTCGA = TRUE, category = "multi")
## End(Not run)
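Because the example above needs an external MAF file, the idea of a presence/absence matrix per mutation category can be shown on a toy MAF-like data frame. This is a sketch in base R, not the function's implementation: the helper presence() is hypothetical, and the genes-by-samples orientation is an assumption.

```r
# Toy MAF-like table using the default column names from the usage above
maf_toy <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3"),
  Hugo_Symbol          = c("TP53", "KRAS", "TP53", "EGFR", "TP53"),
  Variant_Type         = c("SNP", "SNP", "DEL", "SNP", "INS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if a gene is mutated in a sample, 0 otherwise
presence <- function(d) {
  tab <- table(d$Hugo_Symbol, d$Tumor_Sample_Barcode)
  (unclass(tab) > 0) * 1L
}

all_mut <- presence(maf_toy)
snp     <- presence(maf_toy[maf_toy$Variant_Type == "SNP", ])
indel   <- presence(maf_toy[maf_toy$Variant_Type %in% c("INS", "DEL"), ])

all_mut
```

Note that subsetting by category can drop samples that carry no mutation of that type (S2 is absent from snp here), so matrices for different categories may need to be aligned before they are compared.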
Map Gene Names to Approved Symbols
Description
Maps gene symbols to approved HGNC symbols, handling withdrawn and previous symbols.
Usage
mapGenes(mydata)
Arguments
mydata |
Expression matrix with gene symbols as rownames. |
Value
Matrix with mapped gene symbols.
Map Score to Black and White Color
Description
Maps a numeric input value to a color from a black-white gradient palette. Values are mapped to a 1001-color palette where -2 maps to black and +2 maps to white.
Usage
mapbw(x, my_palette2 = NULL)
Arguments
x |
Numeric value to be mapped to a color (typically between -2 and 2). |
my_palette2 |
Color palette vector (should have 1001 colors). Default uses black-white gradient. |
Value
A color from the black-white palette as a hex code.
Examples
my_palette2 <- grDevices::colorRampPalette(c("black", "white"))(1001)
color <- mapbw(1.5, my_palette2)
color <- mapbw(-1, my_palette2)
Map Score to Color
Description
Maps a numeric input value to a color from a blue-white-red gradient palette. Values are mapped to a 1001-color palette where -3 maps to blue, 0 maps to white, and +3 maps to red.
Usage
mapcolors(x, my_palette = NULL)
Arguments
x |
Numeric value to be mapped to a color (typically between -3 and 3). |
my_palette |
Color palette vector (should have 1001 colors). Default uses blue-white-red gradient. |
Value
A color from the palette as a hex code.
Examples
my_palette <- grDevices::colorRampPalette(c("blue", "white", "red"))(1001)
color <- mapcolors(2, my_palette)
color <- mapcolors(-2, my_palette)
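The value-to-color mapping described for mapcolors (and, with a range of -2 to 2, for mapbw) amounts to converting a number into an index of the 1001-color palette. The sketch below assumes that out-of-range values are clamped and that the index is rounded to the nearest position; both are assumptions, and map_to_palette is a hypothetical helper.

```r
# Map x in [lower, upper] onto palette positions 1..length(palette)
map_to_palette <- function(x, palette, lower = -3, upper = 3) {
  x <- min(max(x, lower), upper)  # clamp out-of-range values (assumption)
  idx <- round((x - lower) / (upper - lower) * (length(palette) - 1)) + 1
  palette[idx]
}

pal <- grDevices::colorRampPalette(c("blue", "white", "red"))(1001)
map_to_palette(-3, pal)  # first palette entry (blue end)
map_to_palette(0, pal)   # middle entry, position 501
map_to_palette(10, pal)  # clamped to the last entry (red end)
```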
Merge Data Frames with Duplicated Column Names
Description
Merges two data frames, resolving duplicated column names according to user preference. Allows selection of which data frame's duplicated columns to retain, ensuring data integrity during merging.
Usage
merge_duplicate(
x,
y,
by.x,
by.y,
all.x = FALSE,
all.y = FALSE,
all = NULL,
choose = c("x", "y")
)
Arguments
x |
Data frame. First data frame to merge. |
y |
Data frame. Second data frame to merge. |
by.x |
Character. Column name(s) in 'x' used for merging. |
by.y |
Character. Column name(s) in 'y' used for merging. |
all.x |
Logical. Include all rows from 'x' in output. Default is 'FALSE'. |
all.y |
Logical. Include all rows from 'y' in output. Default is 'FALSE'. |
all |
Logical or 'NULL'. If not 'NULL', include all rows from both 'x' and 'y', overriding 'all.x' and 'all.y'. |
choose |
Character. Which data frame's duplicated non-joining columns to retain: '"x"' or '"y"'. Default is '"x"'. |
Value
Data frame resulting from merging 'x' and 'y' according to specified parameters.
Examples
df1 <- data.frame(ID = 1:3, Name = c("A", "B", "C"), Value = 1:3)
df2 <- data.frame(ID = 1:3, Name = c("X", "Y", "Z"), Score = 4:6)
# Merge keeping duplicated columns from x
merged_df <- merge_duplicate(df1, df2,
by.x = "ID", by.y = "ID",
all.x = TRUE, choose = "x"
)
print(merged_df)
# Merge keeping duplicated columns from y
merged_df2 <- merge_duplicate(df1, df2,
by.x = "ID", by.y = "ID",
all = TRUE, choose = "y"
)
Merging the duplicates from the input matrix.
Description
In case there are some duplicate rownames in the input matrix (mat), this function will return a similar matrix with unique rownames and the duplicate cases will be merged together based on the median values. When warn is true a warning message will be written in case there are duplicates, this warning message will include the in_type string to indicate the current type of the input matrix.
Usage
merge_duplicates(mat, warn = TRUE, in_type = NULL)
Arguments
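mat |
Matrix whose row names may contain duplicates. |
warn |
Logical. If TRUE, a warning is issued when duplicated row names are found. Default is TRUE. |
in_type |
Character string describing the type of the input matrix; it is included in the warning message. Default is NULL. |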
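The median-based merge described above can be sketched in base R as follows. This is an illustration of the technique rather than the package code; merge_dups_median is a hypothetical name, and the output rows come back in alphabetical order of the row names, which may differ from the real function.

```r
# Collapse rows that share a row name by taking the column-wise median
merge_dups_median <- function(mat) {
  groups <- split(seq_len(nrow(mat)), rownames(mat))
  do.call(rbind, lapply(groups, function(idx) {
    apply(mat[idx, , drop = FALSE], 2, median)
  }))
}

m <- matrix(c(1, 3, 5,
              2, 4, 6),
            nrow = 3,
            dimnames = list(c("A", "A", "B"), c("S1", "S2")))
merge_dups_median(m)
#   S1 S2
# A  2  3
# B  5  6
```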
Merge Expression Sets by Row Names
Description
Merges two or three expression sets (matrices or data frames) by row names (gene symbols), removing duplicates. The function ensures common genes across all input expression sets are retained.
Usage
merge_eset(eset1, eset2, eset3 = NULL)
Arguments
eset1 |
First expression set (matrix or data frame with row names). |
eset2 |
Second expression set (matrix or data frame with row names). |
eset3 |
Optional third expression set. Default is 'NULL'. |
Value
Merged expression set (data frame) with duplicates removed. Row names correspond to gene symbols.
Author(s)
Dongqiang Zeng
Examples
# Load example data
eset_stad <- load_data("eset_stad")
# Create mock expression sets with common genes
common_genes <- c("TP53", "BRCA1", "EGFR", "MYC")
eset1 <- matrix(rnorm(12),
nrow = 4,
dimnames = list(common_genes, paste0("S", 1:3))
)
eset2 <- matrix(rnorm(16),
nrow = 4,
dimnames = list(common_genes, paste0("S", 4:7))
)
# Merge two expression sets
merged_eset <- merge_eset(eset1, eset2)
print(dim(merged_eset))
Calculate Microenvironment Scores
Description
Calculates combined immune and stroma scores from adjusted xCell scores.
Usage
microenvironmentScores(adjustedScores)
Arguments
adjustedScores |
Adjusted xCell scores matrix. |
Value
Matrix with additional microenvironment score rows.
Convert Mouse Gene Symbols to Human Gene Symbols
Description
Converts mouse gene symbols to human gene symbols in an expression dataset. Supports using either an online resource (Ensembl) or a local dataset for conversion.
Usage
mouse2human_eset(
eset,
source = c("local", "ensembl"),
is_matrix = TRUE,
column_of_symbol = NULL,
verbose = FALSE
)
Arguments
eset |
Matrix or data frame. Expression matrix with genes in rows. |
source |
Character. Data source for conversion: "local" (uses the internal 'mus_human_gene_symbol' dataset) or "ensembl" (online). "local" is listed first in the usage signature and is therefore the default under standard argument matching. If the Ensembl query fails, use "local". |
is_matrix |
Logical. Whether 'eset' is a matrix with gene symbols as row names. Default is 'TRUE'. If 'FALSE', 'column_of_symbol' must be specified. |
column_of_symbol |
Character or 'NULL'. Column name containing gene symbols if 'eset' is not a matrix. Default is 'NULL'. |
verbose |
Logical. If 'TRUE', prints available Ensembl datasets. Default is 'FALSE'. |
Value
Expression set with human gene symbols.
Author(s)
Dongqiang Zeng
Examples
# Create example mouse expression data
anno_gc_vm32 <- load_data("anno_gc_vm32")
num_rows <- 200
num_cols <- 10
sample_names <- paste0("Sample", 1:num_cols)
data <- matrix(runif(num_rows * num_cols), nrow = num_rows, ncol = num_cols)
rownames(data) <- anno_gc_vm32$symbol[1:200]
colnames(data) <- sample_names
# Convert using local database
human_data <- mouse2human_eset(data, source = "local", is_matrix = TRUE)
NULL Model Coefficients for MCPcounter
Description
NULL Model Coefficients for MCPcounter
Usage
data(null_models)
Format
A 'data.frame' with cell types in rows and coefficients in columns.
Examples
data(null_models)
head(null_models)
outputGCT
Description
This function converts a gene expression dataset to a GCT format file.
Usage
outputGCT(input.f, output.f)
Arguments
input.f |
A data frame or a character string specifying the path to the input file. If a character string, the file should be a tab-separated table with row names. |
output.f |
A character string specifying the path to the output file. |
Value
No return value. The function writes the dataset to the specified output file in GCT format.
Examples
# Create a sample input data frame
sample_data <- data.frame(
Gene = c("BRCA1", "TP53", "EGFR"),
Sample1 = c(10, 15, 8),
Sample2 = c(12, 18, 7),
stringsAsFactors = FALSE
)
rownames(sample_data) <- sample_data$Gene
sample_data <- sample_data[, -1]
# Convert the input data frame to GCT format and save to temporary file
output_file <- tempfile(fileext = ".gct")
outputGCT(sample_data, output_file)
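For orientation, a GCT (version 1.2) file starts with a "#1.2" line, followed by a line holding the number of rows and columns, and then a tab-separated table whose first two columns are NAME and Description. The sketch below writes that layout by hand; it is not the implementation of outputGCT, write_gct_sketch is a hypothetical name, and reusing the gene name as the Description is an assumption.

```r
# Write a data frame in GCT v1.2 layout
write_gct_sketch <- function(df, path) {
  con <- file(path, "w")
  on.exit(close(con))
  writeLines("#1.2", con)                                   # version line
  writeLines(paste(nrow(df), ncol(df), sep = "\t"), con)    # dimensions line
  body <- cbind(NAME = rownames(df), Description = rownames(df), df)
  write.table(body, con, sep = "\t", quote = FALSE, row.names = FALSE)
}

expr <- data.frame(
  Sample1 = c(10, 15, 8),
  Sample2 = c(12, 18, 7),
  row.names = c("BRCA1", "TP53", "EGFR")
)
gct_path <- tempfile(fileext = ".gct")
write_gct_sketch(expr, gct_path)
readLines(gct_path)
```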
Save Signature Data to File
Description
Saves signature data to a specified file format, supporting CSV or RData. Handles single signatures or lists of signatures, converting them to a data frame for storage.
Usage
output_sig(signatures, format = c("csv", "rdata"), file.name)
Arguments
signatures |
Signature data: a list or single string of signatures. |
format |
Character. Output format: "csv" or "rdata". Default is "csv". |
file.name |
Character. Name of the output file without extension. |
Value
Data frame containing the processed signature data, also saved to the specified file.
Author(s)
Dongqiang Zeng
Examples
signature_collection <- load_data("signature_collection")
tmpfile <- tempfile(fileext = ".csv")
output_sig(
signatures = signature_collection, format = "csv",
file.name = tools::file_path_sans_ext(tmpfile)
)
Select Color Palettes for Visualization
Description
Provides curated qualitative, sequential, and diverging palettes for multiple plot types. Supports intensity adjustment and preview.
Usage
palettes(
category = "box",
palette = "nrc",
alpha = 1,
counts = 50,
show_col = TRUE,
show_message = FALSE
)
Arguments
category |
Character. Plot/palette category: one of 'box', 'continue2', 'continue', 'random', 'heatmap', 'heatmap3', 'tidyheatmap'. |
palette |
Character or numeric. Palette name or index (varies by category). |
alpha |
Numeric. Alpha (transparency) scaling factor. Default is 1. |
counts |
Integer. Number of colors (for continuous palettes). Default is 50. |
show_col |
Logical. If TRUE, prints the palette. Default is TRUE. |
show_message |
Logical. If TRUE, prints available options. Default is FALSE. |
Value
Character vector of hex color codes.
Author(s)
Dongqiang Zeng
Examples
colors <- palettes(category = "box", palette = "nrc", show_col = FALSE)
heatmap_colors <- palettes(
category = "heatmap", palette = 1, counts = 10, show_col = FALSE
)
Parallel Permutation Test for CIBERSORT
Description
Parallel version of doPerm. Performs permutation-based sampling and runs the CoreAlg function iteratively using multiple CPU cores to accelerate computation. This function generates an empirical null distribution of correlation coefficients for p-value calculation in CIBERSORT analysis.
Usage
parallel_doperm(
perm1,
X1,
Y1,
absolute1,
abs_method1,
num_cores1 = 2,
seed = NULL
)
Arguments
perm1 |
Integer. Number of permutations to perform. |
X1 |
Matrix or data frame. Signature matrix (cell type GEP barcode). |
Y1 |
Matrix. Mixture file containing gene expression profiles. |
absolute1 |
Logical. Whether to run in absolute mode (default: FALSE). |
abs_method1 |
String. Method for absolute mode: 'sig.score' or 'no.sumto1'. |
num_cores1 |
Integer. Number of CPU cores for parallel computation (default: 2). |
seed |
Integer. Random seed for reproducibility. If NULL (default), uses current random state. Set to a specific value (e.g., 123) for reproducible results across runs. |
Details
This function utilizes the foreach and doParallel packages
to distribute permutation iterations across multiple cores. It automatically
handles cluster setup/teardown via on.exit() to prevent resource leaks.
Note: Windows users may experience slower performance due to socket-based parallelization (PSOCK) versus forking on Unix systems.
Value
List containing:
- dist
Numeric vector of correlation coefficients from permutations, representing the empirical null distribution.
See Also
doPerm for the sequential version, CoreAlg, CIBERSORT
Examples
X <- matrix(rnorm(1000), nrow = 100)
Y <- matrix(rnorm(500), nrow = 100)
rownames(X) <- rownames(Y) <- paste0("Gene", 1:100)
result <- parallel_doperm(
perm1 = 100, X1 = X, Y1 = Y,
absolute1 = FALSE, abs_method1 = "sig.score", num_cores1 = 2
)
str(result$dist)
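To show how a null distribution such as result$dist might feed into a p-value, the sketch below computes an empirical p-value for one observed correlation. The estimator used here (the proportion of null values at least as large as the observed one, with a +1 correction) is a generic choice and an assumption; it is not claimed to be the exact formula used inside CIBERSORT. The simulated null_dist is a stand-in so the snippet runs without the package.

```r
set.seed(1)
null_dist  <- rnorm(1000, mean = 0, sd = 0.1)  # stand-in for result$dist
observed_r <- 0.25                             # hypothetical observed correlation

# Empirical one-sided p-value with +1 correction to avoid p = 0
p_emp <- (sum(null_dist >= observed_r) + 1) / (length(null_dist) + 1)
p_emp
```

The number of permutations bounds the resolution: with 1000 permutations the smallest p-value this estimator can return is 1/1001.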
Default Pattern List for Name Cleaning
Description
A character vector of common substrings to remove from feature names. Used in remove_names() and other helper functions.
Usage
patterns_to_na
Format
A character vector of length 12.
Value
Character vector of patterns to remove.
Examples
# View default patterns
patterns_to_na
Toy STAD Phenotype Data
Description
A data frame containing clinical and pathological annotations for the
TCGA stomach adenocarcinoma (STAD) cohort. Each row corresponds to one
tumour sample and can be matched to the columns of eset_stad via
the ID column. This dataset is typically used together with
eset_stad in examples of survival analysis, subgroup comparison
and immune deconvolution in the IOBR package.
Usage
data(pdata_stad)
Format
A data frame with one row per TCGA-STAD sample and 8 variables:
- ID
Character. TCGA sample barcode, matching the column names of eset_stad.
- stage
Factor. Pathological stage (e.g. "Stage_I", "Stage_II", "Stage_III", "Stage_IV").
- status
Factor. Vital status at last follow-up ("Alive" or "Dead").
- Lauren
Factor. Lauren classification of gastric cancer ("Intestinal", "Diffuse", "Mixed" or NA).
- subtype
Factor. Molecular subtype (e.g. "CIN", "EBV", "GS", "MSI").
- EBV
Factor. EBV status of the tumour ("Positive" or "Negative").
- time
Numeric. Overall survival or follow-up time, typically measured in months.
- OS_status
Integer/binary. Overall survival status indicator (1 = death, 0 = censored).
Examples
data(pdata_stad)
head(pdata_stad)
Create a Percent Bar Plot
Description
Generates a bar plot visualizing the percentage distribution of a variable grouped by another variable.
Usage
percent_bar_plot(
input,
x,
y,
subset.x = NULL,
color = NULL,
palette = NULL,
title = NULL,
axis_angle = 0,
coord_flip = FALSE,
add_Freq = TRUE,
Freq = NULL,
size_freq = 8,
legend.size = 0.5,
legend.size.text = 10,
add_sum = TRUE,
print_result = TRUE,
round.num = 2
)
Arguments
input |
Input data frame. |
x |
Name of the x-axis variable. |
y |
Name of the y-axis (grouping) variable. |
subset.x |
Optional subset of x-axis values. |
color |
Optional color palette. |
palette |
Optional palette type. |
title |
Optional plot title. |
axis_angle |
Angle for axis labels (0-90). Default is 0. |
coord_flip |
Logical to flip coordinates. Default is FALSE. |
add_Freq |
Logical to add frequency count. Default is TRUE. |
Freq |
Name of frequency column. |
size_freq |
Size of frequency labels. Default is 8. |
legend.size |
Size of legend. Default is 0.5. |
legend.size.text |
Size of legend text. Default is 10. |
add_sum |
Logical to add sum to x-axis labels. Default is TRUE. |
print_result |
Logical to print result data frame. Default is TRUE. |
round.num |
Decimal places for proportion. Default is 2. |
Value
A ggplot object.
Author(s)
Dongqiang Zeng
Examples
sig_stad <- load_data("sig_stad")
percent_bar_plot(
input = sig_stad, x = "Subtype", y = "Lauren",
axis_angle = 60
)
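The percentages behind such a plot can be computed in base R with table() and prop.table(). The sketch below uses a small made-up data frame and assumes that percentages are calculated within each level of the x variable, which is an assumption about the plot's convention rather than a documented fact.

```r
# Toy annotation table (values are made up for illustration)
toy <- data.frame(
  Subtype = c("EBV", "EBV", "GS", "GS", "GS", "CIN"),
  Lauren  = c("Diffuse", "Intestinal", "Diffuse", "Diffuse", "Intestinal", "Intestinal")
)

counts <- table(toy$Subtype, toy$Lauren)            # x levels in rows
pct    <- round(prop.table(counts, margin = 1), 2)  # row-wise proportions
pct
```

Each row of pct sums to 1 (up to rounding), which corresponds to one full-height bar per x level.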
Create Pie or Donut Charts
Description
Generates a pie chart or donut chart from input data.
Usage
pie_chart(
input,
var,
var2 = NULL,
type = 2,
show_freq = FALSE,
color = NULL,
palette = "jama",
title = NULL,
text_size = 10,
title_size = 20,
add_sum = FALSE
)
Arguments
input |
Input dataframe. |
var |
Variable for the chart. |
var2 |
Secondary variable for donut chart (type = 3). |
type |
Chart type: 1 (pie), 2 (donut), 3 (PieDonut via webr). |
show_freq |
Logical to show frequencies. Default is FALSE. |
color |
Optional color palette. |
palette |
Color palette name. Default is "jama". |
title |
Plot title. Default is NULL. |
text_size |
Text size. Default is 10. |
title_size |
Title size. Default is 20. |
add_sum |
Logical to add sum to labels. Default is FALSE. |
Value
A ggplot object.
Author(s)
Dongqiang Zeng
Examples
sig_stad <- load_data("sig_stad")
pie_chart(input = sig_stad, var = "Subtype", palette = "jama")
pie_chart(input = sig_stad, var = "Subtype", type = 2)
plotPurity
Description
This function generates scatterplots of tumor purity based on ESTIMATE scores for given samples.
Usage
plotPurity(
scores,
samples = "all_samples",
platform = c("affymetrix", "agilent", "illumina"),
output.dir = NULL
)
Arguments
scores |
A character string specifying the path to the input file containing ESTIMATE scores. The file should be a tab-separated table with appropriate headers. |
samples |
A character vector specifying the sample names to plot. The default is "all_samples", which plots all samples in the input file. |
platform |
A character string specifying the platform used for data collection. Can be "affymetrix", "agilent", or "illumina". Currently, only "affymetrix" is implemented. |
output.dir |
A character string specifying the directory to save the output plots. If 'NULL', plots are not saved. Default is 'NULL'. |
Value
No return value. The function generates and saves scatterplots in the specified output directory.
Examples
# Create a sample ESTIMATE score matrix
scores_data <- data.frame(
Sample1 = c(100, 200, 500, 0.80),
Sample2 = c(120, 220, 450, 0.70),
Sample3 = c(140, 240, 600, 0.90),
row.names = c(
"StromalScore", "ImmuneScore", "ESTIMATEScore",
"TumorPurity"
),
check.names = FALSE
)
# Write to a temporary GCT file
scores_file <- tempfile(fileext = ".gct")
outputGCT(scores_data, scores_file)
Run quanTIseq Deconvolution
Description
Run quanTIseq Deconvolution
Usage
quanTIseq(currsig, currmix, scaling, method)
Arguments
currsig |
Signature matrix. |
currmix |
Mixture matrix. |
scaling |
Scaling factors. |
method |
Deconvolution method: "lsei", "hampel", "huber", or "bisquare". |
Value
Deconvolution results matrix.
Helper Functions for quanTIseq
Description
Helper functions for quanTIseq deconvolution method. Source code adapted from https://github.com/FFinotello/quanTIseq.
Value
Internal functions return processed data for quanTIseq deconvolution.
Stratified Random Sampling of Cells
Description
Performs stratified random sampling of cells from single-cell data, ensuring proportional representation of each cell type while respecting minimum and maximum count constraints.
Usage
random_strata_cells(
input,
group,
proportion = 0.1,
minimum_count_include = 300,
minimum_count = 200,
maximum_count = 1000,
sub_cluster = NULL,
cell_type = NULL
)
Arguments
input |
A data frame or Seurat object containing cell annotations. |
group |
Character string specifying the column name for cell type grouping. |
proportion |
Numeric value between 0 and 1 specifying the sampling proportion. Default is 0.1. |
minimum_count_include |
Integer specifying the minimum count threshold for a cell type to be included in sampling. Default is 300. |
minimum_count |
Integer specifying the minimum number of cells to sample per cell type. Default is 200. |
maximum_count |
Integer specifying the maximum number of cells to sample per cell type. Default is 1000. |
sub_cluster |
Optional character string specifying a sub-cluster column for filtering. Default is NULL. |
cell_type |
Optional character string specifying the cell type value to filter when 'sub_cluster' is provided. Default is NULL. |
Value
A data frame containing the sampled cells with preserved cell type proportions.
Examples
## Not run:
# Sample cells from a data frame
sampled_cells <- random_strata_cells(
input = cell_annotations,
group = "cell_type",
proportion = 0.1,
minimum_count_include = 300,
minimum_count = 200,
maximum_count = 1000
)
# Sample cells from a Seurat object
sampled_cells <- random_strata_cells(
input = seurat_object,
group = "seurat_clusters",
proportion = 0.2
)
## End(Not run)
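The sampling rule described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `sample_strata` is hypothetical, and the clamping rule (proportional target, raised to `minimum_count`, capped at `maximum_count` and at the number of available cells) is an assumption about how the arguments interact.

```r
# Hypothetical helper: stratified sampling with count constraints.
# The clamping rule below is an assumption, not IOBR's actual code.
sample_strata <- function(df, group, proportion = 0.1,
                          minimum_count_include = 300,
                          minimum_count = 200,
                          maximum_count = 1000) {
  pieces <- lapply(split(df, df[[group]]), function(cells) {
    n <- nrow(cells)
    # Skip cell types below the inclusion threshold
    if (n < minimum_count_include) return(NULL)
    # Proportional target, clamped to [minimum_count, maximum_count]
    # and never larger than the number of available cells
    target <- round(n * proportion)
    target <- min(max(target, minimum_count), maximum_count, n)
    cells[sample.int(n, target), , drop = FALSE]
  })
  pieces <- Filter(Negate(is.null), pieces)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(1)
cell_annotations <- data.frame(
  cell_id = sprintf("cell_%05d", 1:16100),
  cell_type = rep(c("T_cell", "B_cell", "Mast"), times = c(15000, 1000, 100))
)
sampled <- sample_strata(cell_annotations, group = "cell_type")
table(sampled$cell_type)
```

With these toy counts, the abundant type is capped at `maximum_count`, the mid-sized type is raised to `minimum_count`, and the rare type is dropped because it falls below `minimum_count_include`.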
Calculate Raw xCell Enrichment Scores
Description
Returns the raw xCell cell types enrichment scores using ssGSEA.
Usage
rawEnrichmentAnalysis(expr, signatures, genes, file.name = NULL)
Arguments
expr |
Gene expression data matrix with row names as gene symbols. |
signatures |
GMT object of signatures. |
genes |
Character vector of genes to use. |
file.name |
Character string for saving scores. Default is 'NULL'. |
Value
Matrix of raw xCell scores.
Row Bind Multiple Data Sets
Description
Combines two or three data frames or matrices vertically using 'rbind'. Ensures compatibility of input data before binding by aligning columns.
Usage
rbind_iobr(data1, data2, data3 = NULL)
Arguments
data1 |
Data frame or matrix. First dataset to combine. |
data2 |
Data frame or matrix. Second dataset to combine. |
data3 |
Data frame or matrix or 'NULL'. Optional third dataset. Default is 'NULL'. |
Value
Combined data frame resulting from row binding the input datasets.
Author(s)
Dongqiang Zeng
Examples
data1 <- data.frame(A = 1:5, B = letters[1:5])
data2 <- data.frame(A = 6:10, B = letters[6:10])
combined_data <- rbind_iobr(data1, data2)
# With three datasets
data3 <- data.frame(A = 11:15, B = letters[11:15])
combined_data <- rbind_iobr(data1, data2, data3)
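One way to read "aligning columns" is to keep only the columns shared by all inputs before binding. The base R sketch below illustrates that idea; the helper `rbind_aligned` is hypothetical and may differ from what rbind_iobr does internally.

```r
# Hypothetical helper: bind rows after restricting every input to the
# columns they all share (in the order of the first input).
rbind_aligned <- function(...) {
  datasets <- lapply(Filter(Negate(is.null), list(...)), as.data.frame)
  shared <- Reduce(intersect, lapply(datasets, colnames))
  do.call(rbind, lapply(datasets, function(d) d[, shared, drop = FALSE]))
}

data1 <- data.frame(A = 1:3, B = letters[1:3], C = c(TRUE, FALSE, TRUE))
data2 <- data.frame(B = letters[4:5], A = 4:5)  # different order, no column C
combined <- rbind_aligned(data1, data2)
dim(combined)
```

Column C is dropped because it is missing from the second input; a stricter variant could stop with an error instead.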
Removing Batch Effect from Expression Sets
Description
Removes batch effects from expression datasets using sva::ComBat (for microarray/TPM data) or sva::ComBat_seq (for RNA-seq count data). Generates PCA plots to visualize data before and after correction.
Usage
remove_batcheffect(
eset1,
eset2,
eset3 = NULL,
id_type,
data_type = c("array", "count", "tpm"),
cols = "normal",
palette = "jama",
log2 = TRUE,
check_eset = TRUE,
adjust_eset = TRUE,
repel = FALSE,
path = NULL
)
Arguments
eset1 |
First expression set (matrix or data frame with genes as rows). |
eset2 |
Second expression set. |
eset3 |
Optional third expression set. Use 'NULL' if not available. |
id_type |
Type of gene ID in expression sets (e.g., '"ensembl"', '"symbol"'). Required for count data normalization. |
data_type |
Type of data: '"array"', '"count"', or '"tpm"'. Default is '"array"'. |
cols |
Color scale for PCA plot. Default is '"normal"'. |
palette |
Color palette for PCA plot. Default is '"jama"'. |
log2 |
Whether to perform log2 transformation. Default is 'TRUE'. Ignored for count data. |
check_eset |
Whether to check expression sets for errors. Default is 'TRUE'. |
adjust_eset |
Whether to adjust expression sets by removing problematic features. Default is 'TRUE'. |
repel |
Whether to add repelling labels to PCA plot. Default is 'FALSE'. |
path |
Directory where results should be saved. Default is 'NULL' (display only). |
Value
Expression matrix after batch correction.
Author(s)
Dongqiang Zeng
References
Zhang Y, et al. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genomics and Bioinformatics. 2020;2(3):lqaa078. doi:10.1093/nargab/lqaa078
Leek JT, et al. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882-883.
Examples
eset_stad <- load_data("eset_stad")
eset_blca <- load_data("eset_blca")
eset_corrected <- remove_batcheffect(
eset_stad[1:1000, 1:5], eset_blca[1:1000, 1:5],
id_type = "ensembl",
data_type = "count"
)
Remove Duplicate Gene Symbols in Gene Expression Data
Description
This function addresses duplicate gene symbols in a gene expression dataset by keeping, for each symbol, the single highest-ranking row. Rows can be ranked by mean, standard deviation, or sum of expression values. This is useful for preparing data where duplicate symbols would cause problems in downstream analyses.
Usage
remove_duplicate_genes(eset, column_of_symbol, method = c("mean", "sd", "sum"))
Arguments
eset |
A data frame or matrix representing gene expression data, with gene symbols as one of the columns. |
column_of_symbol |
The name of the column containing gene symbols in 'eset'. |
method |
The ranking method to use for selecting among duplicate gene symbols: '"mean"' for mean expression, '"sd"' for standard deviation, or '"sum"' for sum of expression values. Default is '"mean"'. |
Value
A modified version of 'eset' where duplicate gene symbols have been reduced to a single entry (the highest-ranking one). The gene symbols are set as row names in the returned data frame.
Note
Important: This function performs selection, not aggregation. For duplicate genes, it retains only the highest-ranking instance (based on the specified method) and discards others.
Author(s)
Dongqiang Zeng
Examples
# Load and annotate expression data
eset_stad <- load_data("eset_stad")
anno_rnaseq <- load_data("anno_rnaseq")
eset_stad <- anno_eset(eset = eset_stad, annotation = anno_rnaseq)
eset_stad <- tibble::rownames_to_column(as.data.frame(eset_stad), var = "id")
# Create duplicate gene names for demonstration
eset_stad[2:3, "id"] <- "MT-CO1"
# Check duplicates before
sum(duplicated(eset_stad$id))
# Remove duplicates using mean expression as ranking criterion
eset_stad <- remove_duplicate_genes(
eset = eset_stad,
column_of_symbol = "id",
method = "mean"
)
# Check duplicates after
sum(duplicated(rownames(eset_stad)))
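The selection behaviour described in the Note (keep the top-ranked row, discard the others) can be illustrated with base R alone. The helper `keep_top_duplicate` is hypothetical and only approximates the documented behaviour on a toy table.

```r
# Hypothetical helper: selection (not aggregation) among duplicated symbols.
keep_top_duplicate <- function(eset, column_of_symbol, rank_fun = rowMeans) {
  values <- eset[, setdiff(colnames(eset), column_of_symbol), drop = FALSE]
  score <- rank_fun(as.matrix(values))
  # Sort rows by decreasing score, then drop later occurrences of a symbol
  eset <- eset[order(score, decreasing = TRUE), , drop = FALSE]
  eset <- eset[!duplicated(eset[[column_of_symbol]]), , drop = FALSE]
  # Move the symbols into the row names and drop the symbol column
  rownames(eset) <- eset[[column_of_symbol]]
  eset[, setdiff(colnames(eset), column_of_symbol), drop = FALSE]
}

toy <- data.frame(
  id = c("TP53", "MT-CO1", "MT-CO1", "EGFR"),
  s1 = c(5, 10, 2, 7),
  s2 = c(6, 12, 1, 8)
)
res <- keep_top_duplicate(toy, "id")
res["MT-CO1", ]
```

Passing `rank_fun = rowSums` or `rank_fun = function(m) apply(m, 1, sd)` would mimic the "sum" and "sd" ranking criteria.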
Remove Patterns from Column Names or Variables
Description
Modifies column names or specified variables in a data frame by replacing specified patterns with empty strings or spaces.
Usage
remove_names(
input_df,
variable = "colnames",
patterns_to_na = patterns_to_na,
patterns_space = NULL
)
Arguments
input_df |
Data frame. Input data to modify. |
variable |
Character. Column to modify: "colnames" for column names, or a specific column name. Default is "colnames". |
patterns_to_na |
Character vector. Patterns to replace with empty string. Default uses [patterns_to_na]. |
patterns_space |
Character vector or 'NULL'. Patterns to replace with spaces. Default is 'NULL'. |
Value
Modified data frame with patterns replaced.
Author(s)
Dongqiang Zeng
Examples
df <- data.frame(
"CellA_cibersort" = 1:5,
"CellB_xCell" = 6:10,
"CellC_TIMER" = 11:15
)
result <- remove_names(df, variable = "colnames", patterns_to_na = patterns_to_na)
colnames(result)
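The replacement logic can be sketched with gsub(). The pattern vectors below are made up for illustration (the contents of the package's patterns_to_na object are not reproduced here), and `strip_patterns` is a hypothetical helper.

```r
# Hypothetical helper: replace some patterns with "" and others with " ".
strip_patterns <- function(x, to_empty = character(), to_space = character()) {
  for (p in to_empty) x <- gsub(p, "", x, fixed = TRUE)
  for (p in to_space) x <- gsub(p, " ", x, fixed = TRUE)
  x
}

df <- data.frame(CellA_cibersort = 1:3, CellB_xCell = 4:6)
colnames(df) <- strip_patterns(
  colnames(df),
  to_empty = c("_cibersort", "_xCell")  # illustrative patterns only
)
colnames(df)

# Patterns replaced with spaces, e.g. to make axis labels readable
strip_patterns("T_cells_CD8", to_space = "_")
```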
Reset Download Mirrors to Default
Description
Resets the download mirror list to the default values.
Usage
reset_iobr_mirrors()
Value
Invisibly returns the default mirror list.
Examples
reset_iobr_mirrors()
Time-dependent ROC Curve for Survival Analysis
Description
Generates a time-dependent Receiver Operating Characteristic (ROC) plot to evaluate the predictive performance of one or more variables in survival analysis. Calculates the Area Under the Curve (AUC) for each specified time point and variable, and creates a multi-line ROC plot with annotated AUC values.
Usage
roc_time(
input,
vars,
time = "time",
status = "status",
time_point = 12,
time_type = "month",
palette = "jama",
cols = "normal",
seed = 1234,
show_col = FALSE,
path = NULL,
main = "PFS",
index = 1,
fig.type = "pdf",
width = 5,
height = 5.2
)
Arguments
input |
Data frame containing variables for analysis. |
vars |
Character vector. Variable(s) to be evaluated. |
time |
Character string. Name of the time variable. Default is '"time"'. |
status |
Character string. Name of the status variable. Default is '"status"'. |
time_point |
Integer or vector. Time point(s) for ROC analysis. Default is '12'. |
time_type |
Character string. Time unit ('"day"' or '"month"'). Default is '"month"'. |
palette |
Character string. Color palette for the plot. Default is '"jama"'. |
cols |
Character vector or string. Color specification: '"normal"', '"random"', or custom color vector. Default is '"normal"'. |
seed |
Integer. Random seed for reproducibility. Default is '1234'. |
show_col |
Logical indicating whether to display the color palette. Default is 'FALSE'. |
path |
Character string or 'NULL'. Path to save the plot. Default is 'NULL'. |
main |
Character string. Main title of the plot. Default is '"PFS"'. |
index |
Integer. Index for plot filename. Default is '1'. |
fig.type |
Character string. Output file type (e.g., '"pdf"', '"png"'). Default is '"pdf"'. |
width |
Numeric. Width of the plot. Default is '5'. |
height |
Numeric. Height of the plot. Default is '5.2'. |
Value
A ggplot object representing the time-dependent ROC plot.
Author(s)
Dongqiang Zeng
Examples
tcga_stad_sig <- load_data("tcga_stad_sig")
pdata_stad <- load_data("pdata_stad")
input <- merge(pdata_stad, tcga_stad_sig, by = "ID")
roc_time(
input = input, vars = c("Pan_F_TBRs", "CD_8_T_effector", "Immune_Checkpoint"),
time = "time", status = "OS_status", time_point = 12, path = NULL, main = "OS"
)
Scaling raw counts from each sample.
Description
Normalizes the sum of counts in each sample to 1e6.
Usage
scaleCounts(counts, sigGenes = NULL, renormGenes = NULL, normFact = NULL)
Details
Takes a matrix of counts (genes x samples) as input and returns the scaled
counts for the signature genes in 'sigGenes' (or for all genes if 'sigGenes'
is NULL). The scaling factor is computed from the genes in 'renormGenes'
(or from all genes if 'renormGenes' is NULL). The renormalization is done
independently for each sample, so that each column sums to 1e6 over the
'renormGenes'.
If 'normFact' is not NULL, it is used as the normalization factor instead of the column sums (for example, to renormalize refProfiles.var by the same amount as refProfiles).
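The renormalization described in the Details can be written in a few lines of base R. This is a sketch under stated assumptions: the argument names mirror the Usage line, but the body and the returned list (scaled counts plus the factors used) are choices made for this illustration, not the package source.

```r
# Sketch of per-sample scaling to 1e6; not the package implementation.
scale_counts_sketch <- function(counts, sigGenes = NULL, renormGenes = NULL,
                                normFact = NULL) {
  if (is.null(sigGenes)) sigGenes <- rownames(counts)
  if (is.null(renormGenes)) renormGenes <- rownames(counts)
  if (is.null(normFact)) {
    # One factor per sample: the column sum over the renormalization genes
    normFact <- colSums(counts[renormGenes, , drop = FALSE])
  }
  # Divide each column by its factor, then rescale to 1e6
  scaled <- sweep(counts[sigGenes, , drop = FALSE], 2, normFact, "/") * 1e6
  list(counts = scaled, normFact = normFact)
}

counts <- matrix(
  c(10, 30, 60, 5, 15, 80),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)
res <- scale_counts_sketch(counts)
colSums(res$counts)

# Scale only two signature genes, still normalizing over all genes
sub <- scale_counts_sketch(counts, sigGenes = c("g1", "g2"))
sub$counts
```

Returning the factors makes it possible to pass them back in as normFact when a second matrix has to be scaled by the same amount.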
Scale and Manipulate a Matrix
Description
Scales a gene expression matrix, optionally applies logarithmic transformation, and performs feature manipulation to remove problematic variables (e.g., genes with zero variance or missing values).
Usage
scale_matrix(matrix, log2matrix = TRUE, manipulate = TRUE)
Arguments
matrix |
Numeric matrix with genes as rows and samples as columns. |
log2matrix |
Logical indicating whether to apply log2 transformation using [log2eset()]. Default is 'TRUE'. |
manipulate |
Logical indicating whether to perform feature manipulation to remove problematic features. Default is 'TRUE'. |
Value
A scaled matrix (genes as rows, samples as columns).
Author(s)
Dongqiang Zeng
Examples
eset_gse62254 <- load_data("eset_gse62254")
eset2 <- scale_matrix(eset_gse62254, log2matrix = FALSE, manipulate = TRUE)
Select a Signature Scoring Method Subset
Description
Filters an integrated signature score matrix to retain results from a specified method (PCA, ssGSEA, or zscore) and strips method suffixes from column names.
Usage
select_method(data, method = c("ssGSEA", "PCA", "zscore"))
Arguments
data |
Data frame or matrix. Integrated signature score matrix. |
method |
Character. One of "PCA", "ssGSEA", or "zscore" (case-insensitive). Default is "ssGSEA". |
Value
Matrix or data frame containing only the selected method's scores.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
anno_grch38 <- load_data("anno_grch38")
hallmark <- load_data("hallmark")
eset <- anno_eset(eset = eset_stad, annotation = anno_grch38, probe = "id")
eset <- eset[1:5000, 1:10]
res <- calculate_sig_score(
eset = eset,
signature = hallmark[1:4],
method = "integration"
)
select_method(res, method = "PCA")
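The filtering step can be approximated with a regular expression on the column names. The sketch assumes that the integrated matrix names its columns "<signature>_<method>"; that layout, the toy scores, and the helper `pick_method` are assumptions for illustration.

```r
# Hypothetical helper: keep columns ending in "_<method>" and strip the suffix.
pick_method <- function(data, method = "ssGSEA") {
  suffix <- paste0("_", method, "$")
  keep <- grepl(suffix, colnames(data), ignore.case = TRUE)
  out <- data[, keep, drop = FALSE]
  colnames(out) <- sub(suffix, "", colnames(out), ignore.case = TRUE)
  out
}

scores <- data.frame(
  ID = c("s1", "s2"),
  Hypoxia_PCA = c(0.2, -0.1),
  Hypoxia_ssGSEA = c(0.5, 0.4),
  Hypoxia_zscore = c(1.1, -0.9)
)
pca_scores <- pick_method(scores, method = "pca")  # matching is case-insensitive
pca_scores
```

Note that this sketch returns only the score columns; an identifier column such as ID would need to be carried over separately.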
Calculate Signature Score Using PCA, Mean, or Z-score Methods
Description
Computes signature scores from gene expression data using Principal Component Analysis (PCA), mean-based, or z-score approaches.
Usage
sigScore(eset, methods = c("PCA", "mean", "zscore"))
Arguments
eset |
Normalized expression matrix with genes (signature) as rows and samples as columns. |
methods |
Scoring method: '"PCA"' (default) for principal component 1, '"mean"' for mean expression, or '"zscore"' for z-score normalized mean. |
Value
Numeric vector of length 'ncol(eset)'; a score summarizing the rows of 'eset'.
Author(s)
Dorothee Nickles, Dongqiang Zeng
Examples
# Load example data
eset_stad <- load_data("eset_stad")
eset <- count2tpm(eset_stad, idType = "ensembl")
# Get signature genes
signature_tme <- load_data("signature_tme")
genes <- signature_tme[["CD_8_T_effector"]]
genes <- genes[genes %in% rownames(eset)]
# Calculate scores (only if enough genes are available)
if (length(genes) >= 2) {
score_pca <- sigScore(eset = eset[genes, ], methods = "PCA")
score_mean <- sigScore(eset = eset[genes, ], methods = "mean")
score_zscore <- sigScore(eset = eset[genes, ], methods = "zscore")
}
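For readers who want to see what the three scoring options amount to, here is a base R sketch on simulated data. It is not the sigScore source: the helper `score_sketch`, the simulated matrix, and the sign orientation of PC1 are assumptions made for this illustration.

```r
# Sketch of three per-sample signature scores (genes in rows, samples in columns).
score_sketch <- function(eset, method = c("PCA", "mean", "zscore")) {
  method <- match.arg(method)
  eset <- as.matrix(eset)
  switch(method,
    PCA = {
      # Samples become observations; PC1 summarizes the signature genes
      pc1 <- prcomp(t(eset), center = TRUE, scale. = TRUE)$x[, 1]
      # The sign of PC1 is arbitrary, so orient it along mean expression
      pc1 * sign(cor(pc1, colMeans(eset)))
    },
    mean = colMeans(eset),
    # z-score each gene across samples, then average per sample
    zscore = colMeans(t(scale(t(eset))))
  )
}

set.seed(42)
sample_effect <- rnorm(8)
eset <- t(sapply(1:5, function(i) 5 + sample_effect + rnorm(8, sd = 0.3)))
dimnames(eset) <- list(paste0("gene", 1:5), paste0("s", 1:8))

scores <- sapply(c("PCA", "mean", "zscore"), function(m) score_sketch(eset, m))
round(cor(scores), 2)
```

Each column of `scores` has one value per sample, matching the documented return length of 'ncol(eset)'.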
Signature Box Plot with Statistical Comparisons
Description
Creates box plots to visualize signature distributions across groups with optional statistical pairwise comparisons. Supports both data frames and Seurat objects for single-cell data visualization.
Usage
sig_box(
data,
signature,
variable,
palette = "nrc",
cols = NULL,
jitter = FALSE,
point_size = 5,
angle_x_text = 0,
hjust = 0.5,
show_pairwise_p = TRUE,
show_overall_p = FALSE,
return_stat_res = FALSE,
size_of_pvalue = 6,
size_of_font = 10,
assay = NULL,
slot = "scale.data",
scale = FALSE
)
Arguments
data |
Data frame or Seurat object containing the signature and grouping variable. |
signature |
Character string specifying the column name (or feature name in Seurat) for the signature values to plot on the y-axis. |
variable |
Character string specifying the grouping variable column name for the x-axis. |
palette |
Character string specifying the color palette name. Default is '"nrc"'. |
cols |
Character vector of custom fill colors. If 'NULL', palette is used. Default is 'NULL'. |
jitter |
Logical indicating whether to add jittered points to the box plot. Default is 'FALSE'. |
point_size |
Numeric value specifying the size of jittered points. Default is 5. |
angle_x_text |
Numeric value specifying the rotation angle for x-axis labels (in degrees). Default is 0. |
hjust |
Numeric value specifying the horizontal justification of x-axis labels. Default is 0.5. |
show_pairwise_p |
Logical indicating whether to display pairwise comparison p-values between groups. Default is 'TRUE'. |
show_overall_p |
Logical indicating whether to display the overall group difference p-value. Default is 'FALSE'. |
return_stat_res |
Logical indicating whether to return statistical test results instead of the plot. Default is 'FALSE'. |
size_of_pvalue |
Numeric value specifying the font size for p-values. Default is 6. |
size_of_font |
Numeric value specifying the base font size. Default is 10. |
assay |
Character string specifying the assay name (for Seurat objects). Default is 'NULL'. |
slot |
Character string specifying the slot name (for Seurat objects). Default is '"scale.data"'. |
scale |
Logical indicating whether to scale signature values (z-score transformation). Default is 'FALSE'. |
Value
If 'return_stat_res = FALSE', returns a ggplot2 object. If 'return_stat_res = TRUE', returns a data frame containing statistical test results.
Author(s)
Dongqiang Zeng
Examples
tcga_stad_pdata <- load_data("tcga_stad_pdata")
sig_box(
data = tcga_stad_pdata,
signature = "TMEscore_plus",
variable = "subtype",
jitter = TRUE,
palette = "jco"
)
Batch Signature Box Plots for Group Comparisons
Description
Generates multiple box plots for specified features (signatures) across groups in the input data. Supports customization of plot appearance, output path, statistical annotation, and compatibility with Seurat objects. Plots are saved to the specified directory or a default folder.
Usage
sig_box_batch(
input,
vars,
groups,
pattern_vars = FALSE,
path = NULL,
index = 0,
angle_x_text = 0,
hjust = 0.5,
palette = "jama",
cols = NULL,
jitter = FALSE,
point_size = 5,
size_of_font = 8,
size_of_pvalue = 4.5,
show_pvalue = TRUE,
return_stat_res = FALSE,
assay = NULL,
slot = "scale.data",
scale = FALSE,
height = 5,
width = 3.5,
fig_type = "pdf",
max_count_feas = 30
)
Arguments
input |
Data frame or Seurat object containing the data for analysis. |
vars |
Character vector. Features or variables to analyze. When 'pattern_vars = TRUE', these are treated as regular expression patterns. |
groups |
Character vector. Grouping variable(s) for comparison. |
pattern_vars |
Logical indicating whether to treat 'vars' as regular expression patterns for matching column names. Default is 'FALSE'. |
path |
Character string or 'NULL'. Directory to save plots. If 'NULL', plots are not saved. Default is 'NULL'. |
index |
Integer. Starting index for plot filenames. Default is '0'. |
angle_x_text |
Numeric. Angle of x-axis labels in degrees. Default is '0'. |
hjust |
Numeric. Horizontal justification of x-axis labels. Default is '0.5'. |
palette |
Character. Color palette for plots. Default is '"jama"'. |
cols |
Character vector or 'NULL'. Custom colors for plot elements. |
jitter |
Logical indicating whether to add jittered points. Default is 'FALSE'. |
point_size |
Numeric. Size of points. Default is '5'. |
size_of_font |
Numeric. Base font size. Default is '8'. |
size_of_pvalue |
Numeric. Size of p-value text. Default is '4.5'. |
show_pvalue |
Logical indicating whether to display p-values. Default is 'TRUE'. |
return_stat_res |
Logical indicating whether to return statistical results instead of saving plots. Default is 'FALSE'. |
assay |
Character string or 'NULL'. Assay type for Seurat objects. |
slot |
Character. Data slot for Seurat objects. Default is '"scale.data"'. |
scale |
Logical indicating whether to scale data before analysis. Default is 'FALSE'. |
height |
Numeric. Height of plots in inches. Default is '5'. |
width |
Numeric. Width of plots in inches. Default is '3.5'. |
fig_type |
Character. File format for plots (e.g., '"pdf"', '"png"'). Default is '"pdf"'. |
max_count_feas |
Integer. Maximum number of features to analyze when 'pattern_vars = TRUE'. If matched variables exceed this limit, only the first 'max_count_feas' features are used. Default is '30'. |
Value
If 'return_stat_res = TRUE', returns a data frame of statistical results; otherwise, invisibly returns the path to saved plots.
Author(s)
Dongqiang Zeng
Examples
tcga_stad_pdata <- load_data("tcga_stad_pdata")
sig_box_batch(
input = tcga_stad_pdata,
vars = c("TMEscore_plus", "GZMB"),
groups = "subtype",
jitter = TRUE,
palette = "jco",
path = tempdir()
)
Forest Plot for Survival Analysis Results
Description
Generates a forest plot to visualize hazard ratios, confidence intervals, and p-values for gene signatures or features from survival analysis.
Usage
sig_forest(
data,
signature,
pvalue = "P",
HR = "HR",
CI_low_0.95 = "CI_low_0.95",
CI_up_0.95 = "CI_up_0.95",
n = 10,
max_character = 25,
discrete_width = 35,
color_option = 1,
cols = NULL,
text.size = 13
)
Arguments
data |
Data frame with survival analysis results including p-values, hazard ratios, and confidence intervals. |
signature |
Character string. Column name for signatures or feature names. |
pvalue |
Character string. Column name for p-values. Default is '"P"'. |
HR |
Character string. Column name for hazard ratios. Default is '"HR"'. |
CI_low_0.95 |
Character string. Column name for lower CI bound. Default is '"CI_low_0.95"'. |
CI_up_0.95 |
Character string. Column name for upper CI bound. Default is '"CI_up_0.95"'. |
n |
Integer. Maximum number of signatures to display. Default is '10'. |
max_character |
Integer. Maximum characters for labels before wrapping. Default is '25'. |
discrete_width |
Integer. Width for discretizing long labels. Default is '35'. |
color_option |
Integer. Color option for p-value gradient (1, 2, or 3). Default is '1'. |
cols |
Character vector. Custom colors for p-value gradient (low to high). Default is 'NULL'. |
text.size |
Numeric. Text size for y-axis labels. Default is '13'. |
Value
A ggplot2 object of the forest plot.
Author(s)
Dongqiang Zeng
Examples
# Example with sample survival results
sample_results <- data.frame(
ID = c("Sig1", "Sig2", "Sig3"),
HR = c(1.5, 0.8, 2.0),
P = c(0.01, 0.05, 0.001),
CI_low_0.95 = c(1.1, 0.6, 1.5),
CI_up_0.95 = c(2.0, 1.0, 2.8)
)
sig_forest(data = sample_results, signature = "ID")
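If survival results are available only as Cox coefficients and standard errors, the columns expected by the default arguments can be derived with standard Wald formulas. The coefficient values below are invented for illustration.

```r
# Hypothetical coefficients and standard errors from per-signature Cox models
cox_fit <- data.frame(
  ID = c("Sig1", "Sig2", "Sig3"),
  coef = c(0.405, -0.223, 0.693),
  se = c(0.150, 0.130, 0.160)
)

z <- cox_fit$coef / cox_fit$se
forest_input <- data.frame(
  ID = cox_fit$ID,
  HR = exp(cox_fit$coef),                 # hazard ratio
  P = 2 * pnorm(-abs(z)),                 # two-sided Wald p-value
  CI_low_0.95 = exp(cox_fit$coef - qnorm(0.975) * cox_fit$se),
  CI_up_0.95 = exp(cox_fit$coef + qnorm(0.975) * cox_fit$se)
)
forest_input
```

The resulting column names match the defaults of the pvalue, HR, CI_low_0.95 and CI_up_0.95 arguments, so the table could be passed with only 'signature = "ID"' specified.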
Grouped gene signatures for IOBR analysis
Description
A named list that organizes gene signatures into functional or biological
categories. Each element of the list is a character vector containing the
names of gene signatures defined in signature_collection. A total of
43 signature groups are included, covering tumour intrinsic pathways,
immune-related processes, stromal activity, TME characteristics and
immuno-oncology biomarkers. These groups are used in IOBR to conveniently
select sets of signatures for scoring and visualization.
Usage
data(sig_group)
Format
A named list of length 43. Each element is a character vector of signature names. Representative groups include:
- tumor_signature
Signatures related to intrinsic tumour biology such as cell cycle, DNA damage repair and histone regulation.
- EMT
Epithelial–mesenchymal transition (EMT)–associated signatures.
- io_biomarkers
Immuno-oncology biomarker–related signatures.
- immu_microenvironment
Immune microenvironment–related signatures.
- immu_suppression
Immune suppression–related signatures.
- immu_exclusion
Signatures associated with immune exclusion and stromal barriers.
- TCR_BCR
T-cell and B-cell receptor pathway signatures.
- tme_signatures1
Tumour microenvironment signature panel (set 1).
- tme_signatures2
Tumour microenvironment signature panel (set 2).
- ...
Additional groups are included (43 total), but not listed individually here; all groups follow the same structure.
Examples
data(sig_group)
head(sig_group)
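The relationship between a group list and the signature collection can be shown with toy objects. Both lists below are hypothetical stand-ins with made-up signature names; they only mimic the structure described in Format.

```r
# Toy stand-ins: a group list holding signature names, and a collection
# mapping each signature name to its genes (all names are made up).
sig_group_toy <- list(
  io_biomarkers = c("SigA", "SigB"),
  EMT = "SigC"
)
signature_collection_toy <- list(
  SigA = c("CD8A", "GZMB"),
  SigB = c("CD274", "PDCD1"),
  SigC = c("VIM", "ZEB1")
)

# Select all signatures that belong to one group
selected <- signature_collection_toy[sig_group_toy[["io_biomarkers"]]]
names(selected)
```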
Perform Gene Set Enrichment Analysis (GSEA)
Description
Conducts Gene Set Enrichment Analysis to identify significantly enriched gene sets from differential gene expression data. Supports MSigDB gene sets or custom gene signatures, and generates comprehensive visualizations and statistical results.
Usage
sig_gsea(
deg,
genesets = NULL,
path = NULL,
gene_symbol = "symbol",
logfc = "log2FoldChange",
org = c("hsa", "mus"),
msigdb = TRUE,
category = "H",
subcategory = NULL,
palette_bar = "jama",
palette_gsea = 2,
cols_gsea = NULL,
cols_bar = NULL,
show_bar = 10,
show_col = FALSE,
show_plot = FALSE,
show_gsea = 8,
show_path_n = 20,
plot_single_sig = FALSE,
project = "custom_sig",
minGSSize = 10,
maxGSSize = 500,
verbose = TRUE,
seed = FALSE,
fig.type = "pdf",
print_bar = TRUE
)
Arguments
deg |
Data frame containing differential expression results with gene symbols and log fold changes. |
genesets |
List of custom gene sets for enrichment analysis. If 'NULL', MSigDB gene sets are used based on 'org' and 'category'. Default is 'NULL'. |
path |
Character string specifying the directory path for saving results. Default is 'NULL'. |
gene_symbol |
Character string specifying the column name in 'deg' containing gene symbols. Default is '"symbol"'. |
logfc |
Character string specifying the column name in 'deg' containing log fold change values. Default is '"log2FoldChange"'. |
org |
Character string specifying the organism. Options are '"hsa"' (Homo sapiens) or '"mus"' (Mus musculus). Default is '"hsa"'. |
msigdb |
Logical indicating whether to use MSigDB gene sets. Default is 'TRUE'. |
category |
Character string specifying the MSigDB category (e.g., '"H"' for Hallmark, '"C2"' for curated gene sets). Default is '"H"'. |
subcategory |
Character string specifying the MSigDB subcategory to filter gene sets. Default is 'NULL'. |
palette_bar |
Character string or integer specifying the color palette for bar plots. Default is '"jama"'. |
palette_gsea |
Integer specifying the color palette for GSEA plots. Default is '2'. |
cols_gsea |
Character vector specifying custom colors for GSEA enrichment plots. If 'NULL', colors are automatically generated. Default is 'NULL'. |
cols_bar |
Character vector specifying custom colors for the enrichment bar plot. If 'NULL', colors are automatically generated. Default is 'NULL'. |
show_bar |
Integer specifying the number of top enriched gene sets to display in the bar plot. Default is '10'. |
show_col |
Logical indicating whether to display color names in the bar plot. Default is 'FALSE'. |
show_plot |
Logical indicating whether to display GSEA enrichment plots. Default is 'FALSE'. |
show_gsea |
Integer specifying the number of top significant gene sets for which to generate GSEA plots. Default is '8'. |
show_path_n |
Integer specifying the number of pathways to display in GSEA plots. Default is '20'. |
plot_single_sig |
Logical indicating whether to generate separate plots for each significant gene set. Default is 'FALSE'. |
project |
Character string specifying the project name for output files. Default is '"custom_sig"'. |
minGSSize |
Integer specifying the minimum gene set size for analysis. Default is '10'. |
maxGSSize |
Integer specifying the maximum gene set size for analysis. Default is '500'. |
verbose |
Logical indicating whether to display progress messages. Default is 'TRUE'. |
seed |
Logical indicating whether to set a random seed for reproducibility. Default is 'FALSE'. |
fig.type |
Character string specifying the file format for saving plots (e.g., '"pdf"', '"png"'). Default is '"pdf"'. |
print_bar |
Logical indicating whether to save and print the bar plot. Default is 'TRUE'. |
Value
List containing:
- up
Data frame of up-regulated enriched gene sets
- down
Data frame of down-regulated enriched gene sets
- all
Complete GSEA results
- plot_top
GSEA enrichment plot for top gene sets
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
genes <- c(
"TP53", "BRCA1", "EGFR", "MYC", "KRAS", "PTEN", "APC", "RB1",
"CDKN2A", "VHL", "ATM", "ATR", "CHEK2", "PALB2", "RAD51", "MDM2",
"CDK4", "CDK6", "CCND1", "CCNE1", "CDK2", "E2F1", "E2F2", "E2F3",
"ARF1", "ARF3", "ARF4", "ARF5", "ARF6", "GSK3B", "AKT1", "AKT2",
"PIK3CA", "PIK3CB", "PIK3CD", "PIK3CG", "PIK3R1", "PIK3R2", "PIK3R3"
)
deg <- data.frame(
symbol = genes,
log2FoldChange = rnorm(length(genes), mean = 0, sd = 2),
padj = runif(length(genes), 0, 0.1)
)
signature <- list(
DNA_Repair = c(
"TP53", "BRCA1", "ATM",
"ATR", "CHEK2", "PALB2", "RAD51"
),
Cell_Cycle = c(
"TP53", "MYC",
"RB1", "CDKN2A", "CDK4",
"CDK6", "CCND1", "CCNE1",
"CDK2", "E2F1", "E2F2", "E2F3"
),
PI3K_AKT = c(
"AKT1", "AKT2",
"PIK3CA", "PIK3CB", "PIK3CD",
"PIK3CG", "PIK3R1", "PIK3R2", "PIK3R3"
)
)
res <- sig_gsea(
deg = deg,
genesets = signature,
path = tempdir(),
show_plot = FALSE,
print_bar = FALSE
)
print(names(res))
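GSEA tools generally work on a ranked gene list: a named numeric vector sorted in decreasing order. The sketch below builds such a vector from a small differential expression table; whether sig_gsea prepares its input in exactly this way is an assumption, and the handling of repeated symbols is a choice made for this example.

```r
deg_toy <- data.frame(
  symbol = c("TP53", "MYC", "EGFR", "MYC"),
  log2FoldChange = c(-1.2, 2.5, 0.4, 1.0)
)

# For repeated symbols keep the row with the largest absolute change
# (a choice for this sketch)
deg_toy <- deg_toy[order(abs(deg_toy$log2FoldChange), decreasing = TRUE), ]
deg_toy <- deg_toy[!duplicated(deg_toy$symbol), ]

# Named vector of log fold changes, sorted from most up- to most down-regulated
gene_list <- sort(
  setNames(deg_toy$log2FoldChange, deg_toy$symbol),
  decreasing = TRUE
)
gene_list
```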
Signature Heatmap with Optional Annotations
Description
Generates a heatmap of selected features grouped by a categorical variable, with optional conditional (annotation) bars. Supports palette customization, scaling, size controls, and output saving.
Usage
sig_heatmap(
input,
id = "ID",
features,
group,
condition = NULL,
id_condition = "vars",
col_condition = "condition",
cols_condition = NULL,
scale = FALSE,
palette = 2,
cols_heatmap = NULL,
palette_group = "jama",
show_col = FALSE,
show_palettes = FALSE,
cols_group = NULL,
show_plot = TRUE,
width = 8,
height = NULL,
size_col = 10,
size_row = 8,
angle_col = 90,
column_title = NULL,
row_title = NULL,
show_heatmap_col_name = FALSE,
path = NULL,
index = NULL
)
Arguments
input |
Data frame containing ID, grouping variable, and feature columns. |
id |
Character string. Column name for sample identifier. Default is '"ID"'. |
features |
Character vector. Feature (column) names to include in the heatmap. |
group |
Character string. Grouping variable column name. |
condition |
Data frame or 'NULL'. Optional annotation table with variable-condition mapping. Default is 'NULL'. |
id_condition |
Character string. Column name in 'condition' for feature IDs. Default is '"vars"'. |
col_condition |
Character string. Column name in 'condition' for condition labels. Default is '"condition"'. |
cols_condition |
Character vector. Colors for conditions. |
scale |
Logical indicating whether to scale values by row. Default is 'FALSE'. |
palette |
Integer or character. Palette index/name for heatmap colors. Default is '2'. |
cols_heatmap |
Character vector. Custom colors for heatmap gradient. |
palette_group |
Character string. Palette name for group colors. Default is '"jama"'. |
show_col |
Logical indicating whether to display the color vector. Default is 'FALSE'. |
show_palettes |
Logical indicating whether to print palette options. Default is 'FALSE'. |
cols_group |
Character vector. Custom colors for groups. |
show_plot |
Logical indicating whether to print the heatmap. Default is 'TRUE'. |
width |
Numeric. Plot width in inches. Default is '8'. |
height |
Numeric or 'NULL'. Plot height in inches. Auto-calculated if 'NULL'. |
size_col |
Numeric. Font size for column labels. Default is '10'. |
size_row |
Numeric. Font size for row labels. Default is '8'. |
angle_col |
Numeric. Rotation angle for column labels in degrees. Default is '90'. |
column_title |
Character string or 'NULL'. Title for column annotation. |
row_title |
Character string or 'NULL'. Title for row annotation. |
show_heatmap_col_name |
Logical indicating whether to show column names. Default is 'FALSE'. |
path |
Character string or 'NULL'. Output directory for saving the heatmap. |
index |
Integer or 'NULL'. Index appended to filename. Default is 'NULL'. |
Value
A tidyHeatmap object. Saves PDF only when 'path' is provided.
Author(s)
Dongqiang Zeng
Examples
tcga_stad_sig <- load_data("tcga_stad_sig")
tcga_stad_pdata <- load_data("tcga_stad_pdata")
input <- merge(tcga_stad_pdata, tcga_stad_sig, by = "ID")
feas <- grep("MCPcounter", colnames(input), value = TRUE)
sig_heatmap(input = input, features = feas, group = "subtype", scale = TRUE)
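The optional 'condition' table maps each feature to an annotation label. A minimal sketch of its shape, using the default column names from 'id_condition' and 'col_condition', is shown below; the feature names and labels are invented for illustration.

```r
# Hypothetical feature names; in practice these would be columns of 'input'
feas_toy <- c("T_cells_MCPcounter", "B_lineage_MCPcounter", "Fibroblasts_MCPcounter")

# One row per feature: 'vars' holds the feature ID, 'condition' its label
condition <- data.frame(
  vars = feas_toy,
  condition = c("Lymphoid", "Lymphoid", "Stromal")
)
condition

# The table would then be supplied as, for example:
# sig_heatmap(input = input, features = feas_toy, group = "subtype",
#             condition = condition)
```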
Generate Heatmap for Signature Data
Description
Creates a heatmap from signature data with grouping variables, offering flexible options for colors, clustering, and output formats using ComplexHeatmap.
Usage
sig_pheatmap(
input,
feas,
group,
group2 = NULL,
group3 = NULL,
ID = "ID",
path = NULL,
cols1 = "random",
cols2 = "random",
cols3 = "random",
seed = 54321,
show_col = FALSE,
palette1 = 1,
palette2 = 2,
palette3 = 3,
cluster_cols = TRUE,
palette_for_heatmape = 6,
scale.matrix = TRUE,
cellwidth = 1,
cellheight = 9,
show_colnames = FALSE,
fig.type = "pdf",
width = 6,
height = NULL,
file_name_prefix = 1
)
Arguments
input |
Data frame with variables in columns. |
feas |
Character vector. Feature names (columns) to include in heatmap. |
group |
Character string. Column name for primary grouping variable. |
group2 |
Character string or 'NULL'. Optional secondary grouping variable. |
group3 |
Character string or 'NULL'. Optional tertiary grouping variable. |
ID |
Character string. Column name for sample identifiers. Default is '"ID"'. |
path |
Character string or 'NULL'. Directory to save output files. If 'NULL', the heatmap is not saved. Default is 'NULL'. |
cols1 |
Character vector or '"random"' or '"normal"'. Colors for primary group. Default is '"random"'. |
cols2 |
Character vector or '"random"' or '"normal"'. Colors for secondary group. Default is '"random"'. |
cols3 |
Character vector or '"random"' or '"normal"'. Colors for tertiary group. Default is '"random"'. |
seed |
Integer. Random seed for color generation. Default is '54321'. |
show_col |
Logical indicating whether to display colors. Default is 'FALSE'. |
palette1 |
Integer. Palette for primary group. Default is '1'. |
palette2 |
Integer. Palette for secondary group. Default is '2'. |
palette3 |
Integer. Palette for tertiary group. Default is '3'. |
cluster_cols |
Logical indicating whether to cluster columns. Default is 'TRUE'. |
palette_for_heatmape |
Integer. Palette number for heatmap. Default is '6'. |
scale.matrix |
Logical indicating whether to scale the matrix. Default is 'TRUE'. |
cellwidth |
Numeric. Width of each cell in points. Default is '1'. |
cellheight |
Numeric. Height of each cell in points. Default is '9'. |
show_colnames |
Logical indicating whether to show column names. Default is 'FALSE'. |
fig.type |
Character string. File format for saving. Default is '"pdf"'. |
width |
Numeric. Width of saved figure in inches. Default is '6'. |
height |
Numeric or 'NULL'. Height of saved figure in inches. Calculated if 'NULL'. |
file_name_prefix |
Character or numeric. Prefix for saved file name. Default is '1'. |
Value
A list containing:
- p_anno
Annotation data frame
- p_cols
List of cluster colors
- plot
ComplexHeatmap object
- eset
Transformed expression matrix
Author(s)
Dongqiang Zeng
Examples
tcga_stad_sig <- load_data("tcga_stad_sig")
tcga_stad_pdata <- load_data("tcga_stad_pdata")
input <- merge(tcga_stad_pdata, tcga_stad_sig, by = "ID")
feas <- grep("MCPcounter", colnames(input), value = TRUE)
sig_pheatmap(
input = input, feas = feas, group = "subtype",
scale.matrix = TRUE, path = tempdir()
)
Plot ROC Curves and Compare Them
Description
Generates Receiver Operating Characteristic (ROC) curves for multiple predictors and optionally performs statistical comparisons between them.
Usage
sig_roc(
data,
response,
variables,
fig.path = NULL,
main = NULL,
file.name = NULL,
palette = "jama",
cols = NULL,
alpha = 1,
compare = FALSE,
smooth = TRUE,
compare_method = "bootstrap",
boot.n = 100
)
Arguments
data |
Data frame containing the predictor variables and binary outcome. |
response |
Character. Name of the binary outcome variable in 'data'. |
variables |
Character vector. Names of predictor variables for ROC curves. |
fig.path |
Character or 'NULL'. Directory path to save output PDF. Default is 'NULL'. |
main |
Character or 'NULL'. Main title for the ROC plot. Default is 'NULL'. |
file.name |
Character or 'NULL'. Output PDF filename without extension. If 'NULL', '"0-ROC of multiple variables"' is used. |
palette |
Character. Color palette for ROC curves. Default is '"jama"'. |
cols |
Character vector or 'NULL'. Custom colors for ROC curves. Default is 'NULL'. |
alpha |
Numeric. Transparency level (1 = opaque, 0 = transparent). Default is '1'. |
compare |
Logical. Whether to perform statistical comparison of AUCs. Default is 'FALSE'. |
smooth |
Logical. Whether to smooth ROC curves. Default is 'TRUE'. |
compare_method |
Character. Method for comparing ROC curves. Default is '"bootstrap"'. |
boot.n |
Integer. Number of bootstrap replications. Default is '100'. |
Value
A list containing:
- auc.out
Data frame with AUC values and confidence intervals
- legend.name
Vector of legend entries for the plot
- p.out
If 'compare = TRUE', data frame with p-values from comparisons
Author(s)
Dongqiang Zeng
Examples
tcga_stad_pdata <- load_data("tcga_stad_pdata")
sig_roc(
data = tcga_stad_pdata, response = "OS_status",
variables = c("TMEscore_plus", "GZMB", "GNLY")
)
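The AUC values reported in 'auc.out' can be sanity-checked by hand. The following is a minimal base-R sketch, not the implementation used by 'sig_roc': it computes the empirical (unsmoothed) AUC through the rank-sum identity, so it will differ slightly from a smoothed curve. The helper 'auc_rank' and the toy columns 'marker_a' and 'marker_b' are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not part of IOBR): empirical AUC via the rank-sum identity.
auc_rank <- function(score, outcome) {
  stopifnot(all(outcome %in% c(0, 1)))
  r <- rank(score)            # average ranks handle ties
  n1 <- sum(outcome == 1)     # number of events
  n0 <- sum(outcome == 0)     # number of non-events
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Simulated data: marker_a is informative, marker_b is pure noise.
set.seed(1)
toy <- data.frame(
  outcome = rep(c(0, 1), each = 50),
  marker_a = c(rnorm(50, 0), rnorm(50, 1)),
  marker_b = rnorm(100)
)
aucs <- sapply(c("marker_a", "marker_b"), function(v) auc_rank(toy[[v]], toy$outcome))
aucs
```

An AUC near 0.5 indicates a predictor with no discriminative value; values below 0.5 suggest the predictor is inversely related to the outcome.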
Generate Kaplan-Meier Survival Plot for Signature
Description
Creates Kaplan-Meier survival plots for a given signature or gene, with automatic cutoff determination. Generates three types of plots: optimal cutoff, tertile-based (3 groups), and median split (2 groups).
Usage
sig_surv_plot(
input_pdata,
signature,
project = "KM",
ID = "ID",
time = "time",
status = "status",
time_type = "month",
break_month = "auto",
cols = NULL,
palette = "jama",
show_col = TRUE,
mini_sig = "score",
fig.type = "png",
save_path = NULL,
index = 1
)
Arguments
input_pdata |
Data frame with survival data and signature scores. |
signature |
Character string. Column name of the target signature. |
project |
Character string. Project name for output. Default is '"KM"'. |
ID |
Character string. Column name for sample IDs. Default is '"ID"'. |
time |
Character string. Column name for survival time. Default is '"time"'. |
status |
Character string. Column name for survival status. Default is '"status"'. |
time_type |
Character string. Time unit ('"month"' or '"day"'). Default is '"month"'. |
break_month |
Numeric or '"auto"'. Time axis breaks. Default is '"auto"'. |
cols |
Character vector. Optional custom colors. |
palette |
Character string. Color palette if 'cols' not provided. Default is '"jama"'. |
show_col |
Logical indicating whether to show colors. Default is 'TRUE'. |
mini_sig |
Character string. Label for low score group. Default is '"score"'. |
fig.type |
Character string. File format. Default is '"png"'. |
save_path |
Character string or 'NULL'. Directory for saving plots. If 'NULL', plots are not saved. Default is 'NULL'. |
index |
Integer. Index for multiple plots. Default is '1'. |
Value
A list containing:
- data
Processed input data with group assignments
- plots
Combined survival plots
Author(s)
Dongqiang Zeng
Examples
tcga_stad_pdata <- load_data("tcga_stad_pdata")
sig_surv_plot(
input_pdata = tcga_stad_pdata,
signature = "TMEscore_plus",
time = "time",
status = "OS_status"
)
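The median and tertile groupings described above can be reproduced with base R. This is a sketch under stated assumptions rather than the code 'sig_surv_plot' runs internally: the group labels ("Low", "Middle", "High") are illustrative, and the optimal-cutoff search is not shown.

```r
set.seed(42)
score <- rnorm(90)   # simulated continuous signature score

# Median split (2 groups)
grp2 <- ifelse(score > median(score), "High", "Low")

# Tertile split (3 groups); labels are assumptions made for this sketch
cuts <- quantile(score, probs = c(0, 1 / 3, 2 / 3, 1))
grp3 <- cut(score, breaks = cuts,
            labels = c("Low", "Middle", "High"), include.lowest = TRUE)

table(grp2)
table(grp3)
```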
Gene signature collection for pathway and immune analysis
Description
A named list of gene signatures used in the IOBR package for immune deconvolution, pathway scoring, functional annotation, and tumour microenvironment (TME) characterization. Each element corresponds to a predefined biological signature and contains a character vector of HGNC gene symbols.
Usage
data(signature_collection)
Format
A named list of length 323. Each element is a character vector of gene symbols. Representative entries include:
- CD_8_T_effector
Markers of CD8+ effector T cells.
- DDR
DNA damage response and repair genes.
- Immune_Checkpoint
Immune checkpoint molecules.
- CellCycle_Reg
Core regulators of cell-cycle progression.
- Mismatch_Repair
Mismatch-repair pathway genes.
- TMEscoreA_CIR
TME-related signature used in TMEscore analysis.
- ...
Additional signatures are included in the list but are not individually listed here; all follow the same structure.
Examples
data(signature_collection)
head(signature_collection)
Signature Score Calculation Methods
Description
A named vector of supported methods for calculating signature scores.
Usage
signature_score_calculation_methods
Format
Named character vector:
- PCA
Principal Component Analysis method ("pca")
- ssGSEA
Single-sample Gene Set Enrichment Analysis ("ssgsea")
- z-score
Z-score transformation method ("zscore")
- Integration
Integration of multiple methods ("integration")
Examples
signature_score_calculation_methods
signature_score_calculation_methods["PCA"]
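To make the "zscore" and "pca" options concrete, the following base-R sketch shows one common form of each score on simulated data. It is an illustration under assumptions, not the package's implementation: the gene names, the signature 'sig_genes', and the exact scoring formulas are hypothetical.

```r
set.seed(7)
# Simulated expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(5 * 6, mean = 5), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("S", 1:6)))
sig_genes <- c("gene1", "gene3", "gene4")   # hypothetical signature

# z-score style: standardise each gene across samples, then average per sample
z <- t(scale(t(expr)))
zscore <- colMeans(z[sig_genes, , drop = FALSE])

# PCA style: first principal component of the signature genes
pc1 <- prcomp(t(expr[sig_genes, ]), scale. = TRUE)$x[, 1]

zscore
pc1
```

Note that the sign of a principal component is arbitrary, so a PCA-based score may need to be oriented against a known marker before interpretation.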
Adjust Scores Using Spillover Compensation
Description
Adjusts xCell scores using a spillover compensation matrix.
Usage
spillOver(transformedScores, K, alpha = 0.5, file.name = NULL)
Arguments
transformedScores |
Transformed scores from transformScores. |
K |
Spillover matrix. |
alpha |
Spillover alpha parameter. Default is '0.5'. |
file.name |
Character string for saving scores. Default is 'NULL'. |
Value
Adjusted xCell scores matrix.
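The idea behind spillover compensation can be sketched as a linear system: if observed scores are a mixture K %*% true, then solving for 'true' removes the cross-talk between cell types. The sketch below is an assumption-laden illustration, not the 'spillOver' implementation: the helper 'compensate' is hypothetical, it uses a plain 'solve()' followed by clamping at zero in place of any constrained fit, and the way 'alpha' scales the off-diagonal entries is an assumption.

```r
# Hypothetical helper: damp off-diagonal spillover by alpha, then invert the mixing.
compensate <- function(scores, K, alpha = 0.5) {
  K <- K * alpha
  diag(K) <- 1
  adj <- solve(K, scores)   # unconstrained solve, a stand-in for a constrained fit
  adj[adj < 0] <- 0         # cell-type scores cannot be negative
  adj
}

# Toy spillover matrix for two cell types and two samples
K <- matrix(c(1, 0.4, 0.2, 1), nrow = 2,
            dimnames = list(c("cellA", "cellB"), c("cellA", "cellB")))
true_scores <- matrix(c(0.5, 0.1, 0.0, 0.3), nrow = 2,
                      dimnames = list(c("cellA", "cellB"), c("S1", "S2")))
observed <- K %*% true_scores

# With alpha = 1 the full spillover is removed and the true scores are recovered
recovered <- compensate(observed, K, alpha = 1)
recovered
```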
Example Clinical Data for TCGA-STAD Gastric Cancer Analysis
Description
A small example dataset demonstrating the clinical data structure required for TCGA Stomach Adenocarcinoma (STAD) analysis using IOBR package functions. Contains simulated clinical variables, molecular subtypes, and survival data.
Usage
data(stad_group)
Format
A data frame with patient samples as rows and clinical variables as columns:
- ID
Unique patient identifier (TCGA barcode format)
- stage
AJCC pathological stage (Stage_II, Stage_III, Stage_IV)
- status
Patient vital status (Alive, Dead, NA)
- Lauren
Lauren histological classification (Intestinal, Diffuse, Mixed)
- subtype
Molecular subtype classification (EBV, GS)
- EBV
Epstein-Barr virus status (Positive, Negative)
- time
Overall survival time in months
- OS_status
Overall survival status (0=alive, 1=dead)
Examples
data(stad_group)
head(stad_group)
Example Dataset for Subgroup Survival Analysis
Description
An example dataset demonstrating the data structure required for subgroup survival analysis using the subgroup_survival function. Contains simulated clinical and biomarker data with survival outcomes.
Usage
data(subgroup_data)
Format
A data frame with clinical variables and biomarker scores:
- Patient_ID
Unique patient identifier
- ProjectID
Study or dataset identifier (e.g., "Dataset1")
- AJCC_stage
AJCC pathological stage (2, 3, 4)
- status
Event status (0=censored, 1=event)
- time
Follow-up time in months
- score
Continuous biomarker score (numeric values)
- score_binary
Binary biomarker classification ("High", "Low")
Examples
data(subgroup_data)
head(subgroup_data)
Subgroup Survival Analysis Using Cox Proportional Hazards Models
Description
Extracts hazard ratios (HR) and 95% confidence intervals (CI) from Cox proportional hazards models across specified subgroups.
Usage
subgroup_survival(
pdata,
time_name = "time",
status_name = "status",
variables,
object
)
Arguments
pdata |
Data frame containing variables, follow-up time, and outcome. |
time_name |
Character. Column name of follow-up time. Default is '"time"'. |
status_name |
Character. Column name of event status (1/0). Default is '"status"'. |
variables |
Character vector. Subgrouping variables (each processed independently). |
object |
Character. Variable of interest used in Cox model. If it has levels "High" and "Low", recode "High" to 0 and "Low" to 1 before calling. |
Value
Data frame summarizing subgroup Cox results (HR, CI, p-value).
Author(s)
Dongqiang Zeng
Examples
subgroup_data <- load_data("subgroup_data")
input <- subset(subgroup_data, time > 0 & !is.na(status) & !is.na(AJCC_stage))
# Binary variable example
res_bin <- subgroup_survival(
pdata = input, time_name = "time", status_name = "status",
variables = c("ProjectID", "AJCC_stage"), object = "score_binary"
)
# Continuous variable example
res_cont <- subgroup_survival(
pdata = input, time_name = "time", status_name = "status",
variables = c("ProjectID", "AJCC_stage"), object = "score"
)
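The per-subgroup estimates can be understood as one Cox model fitted within each level of a subgrouping variable. The following sketch uses the 'survival' package directly on simulated data; it is not the code of 'subgroup_survival', and the helper 'cox_by_level' and the column layout of its result are hypothetical.

```r
library(survival)

set.seed(10)
n <- 200
toy <- data.frame(
  time = rexp(n, rate = 0.05),
  status = rbinom(n, 1, 0.7),
  score = rnorm(n),
  stage = sample(c("2", "3", "4"), n, replace = TRUE)
)

# Hypothetical helper: fit one Cox model per level of `by`, collect HR and 95% CI.
cox_by_level <- function(d, by, object) {
  do.call(rbind, lapply(split(d, d[[by]]), function(sub) {
    fit <- coxph(as.formula(paste("Surv(time, status) ~", object)), data = sub)
    s <- summary(fit)
    data.frame(
      subgroup = paste(by, sub[[by]][1], sep = "="),
      HR = s$conf.int[1, "exp(coef)"],
      lower95 = s$conf.int[1, "lower .95"],
      upper95 = s$conf.int[1, "upper .95"],
      p = s$coefficients[1, "Pr(>|z|)"]
    )
  }))
}

res <- cox_by_level(toy, by = "stage", object = "score")
res
```

Small subgroups yield wide confidence intervals, so subgroup results are best read as exploratory.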
Generate Kaplan-Meier Survival Plots for Categorical Groups
Description
Creates Kaplan-Meier survival plots for data grouped by a categorical variable. Handles both binary and multi-level categorical groups with customizable plot aesthetics.
Usage
surv_group(
input_pdata,
target_group,
ID = "ID",
levels = c("High", "Low"),
reference_group = NULL,
project = NULL,
time = "time",
status = "status",
time_type = "month",
break_month = "auto",
cols = NULL,
palette = "jama",
mini_sig = "score",
save_path = NULL,
fig.type = "pdf",
index = 1,
width = 6,
height = 6.5,
font.size.table = 3
)
Arguments
input_pdata |
Data frame containing survival data and grouping variables. |
target_group |
Name of column containing the grouping variable. |
ID |
Name of column with unique identifiers. Default is "ID". |
levels |
Names for levels of target_group (for binary groups). Default is c("High", "Low"). |
reference_group |
Reference level for binary comparison. Default is NULL. |
project |
Optional title for plot. Default is NULL. |
time |
Name of column with follow-up times. Default is "time". |
status |
Name of column with event indicators. Default is "status". |
time_type |
Units: "month" or "day". Default is "month". |
break_month |
X-axis break interval. If "auto", calculated automatically. Default is "auto". |
cols |
Color vector for plot lines. Default is NULL. |
palette |
Color palette name. Default is "jama". |
mini_sig |
Prefix label for variables. Default is "score". |
save_path |
Directory for saving plot. Default is NULL. |
fig.type |
File format: "pdf" or "png". Default is "pdf". |
index |
Identifier for file naming. Default is 1. |
width |
Plot width. Default is 6. |
height |
Plot height. Default is 6.5. |
font.size.table |
Font size for risk table. Default is 3. |
Value
Kaplan-Meier plot object.
Author(s)
Dongqiang Zeng
Examples
tcga_stad_pdata <- load_data("tcga_stad_pdata")
surv_group(
input_pdata = tcga_stad_pdata,
target_group = "Lauren",
time = "time",
status = "OS_status"
)
Preprocess TCGA RNA-seq Data
Description
Preprocesses TCGA RNA-seq data by modifying sample types, transforming data, and annotating genes based on specified parameters.
Usage
tcga_rna_preps(
eset,
id_type = c("ensembl", "symbol"),
input_type = c("log2count", "count"),
output = c("tumor", "tumor_normal"),
output_type = c("tpm", "log2tpm", "count"),
annotation = TRUE
)
Arguments
eset |
Matrix or data frame. RNA-seq gene expression matrix from TCGA. |
id_type |
Character. Gene identifier type: "ensembl" or "symbol". Default is "ensembl". |
input_type |
Character. Input data type: "log2count" or "count". Default is "log2count". |
output |
Character. Sample type: "tumor" or "tumor_normal". Default is "tumor". |
output_type |
Character. Output data type: "tpm", "log2tpm", or "count". Default is "tpm". |
annotation |
Logical. Whether to perform gene annotation. Default is TRUE. |
Value
Preprocessed gene expression matrix.
Author(s)
Dongqiang Zeng
Examples
eset_stad <- load_data("eset_stad")
eset <- tcga_rna_preps(
eset = eset_stad, id_type = "ensembl", input_type = "count",
output = "tumor", output_type = "tpm", annotation = TRUE
)
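For readers unfamiliar with the count-to-TPM conversion, the arithmetic can be sketched in base R as follows. This is an illustration only: the helper 'count_to_tpm' and the gene lengths are hypothetical, the function above obtains lengths from its own annotation, and the assumption that "log2count" input means log2(count + 1) is ours.

```r
# Hypothetical helper: convert raw counts to TPM given gene lengths in base pairs.
count_to_tpm <- function(counts, gene_length_bp) {
  rate <- counts / (gene_length_bp / 1000)   # reads per kilobase, gene by gene
  t(t(rate) / colSums(rate)) * 1e6           # rescale each sample to one million
}

counts <- matrix(c(100, 200, 300, 50, 0, 150), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("S1", "S2")))
len <- c(g1 = 1000, g2 = 2000, g3 = 3000)    # made-up gene lengths

tpm <- count_to_tpm(counts, len)
log2tpm <- log2(tpm + 1)

# Assumed back-transformation for log2(count + 1) input
log2count <- log2(counts + 1)
back <- 2^log2count - 1
```

Each column of 'tpm' sums to one million, which is what makes TPM values comparable across samples.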
TCGA-STAD Clinical and Molecular Annotation Data
Description
Clinical, molecular, and signature score data for TCGA stomach adenocarcinoma (STAD) samples. Includes patient demographics, tumor characteristics, molecular subtypes, and pre-computed signature scores.
Usage
data(tcga_stad_pdata)
Format
A data frame with samples as rows and variables as columns:
- ID
TCGA sample barcode
- stage
Tumor stage (Stage_I, Stage_II, etc.)
- status
Vital status (Alive/Dead)
- Lauren
Histological classification
- subtype
Molecular subtype (EBV, MSI, GS, CN, ...)
- EBV
EBV infection status
- TMEscore_plus
Continuous tumor microenvironment score
- TMEscore_plus_binary
High/Low TME classification
- time
Follow-up time (months)
- OS_status
Overall survival status (0=alive, 1=dead)
- ARID1A, PIK3CA
Driver-gene mutation status
- MALAT1
lncRNA expression
- ...
Remaining columns hold gene-expression values and additional clinical/molecular annotations.
Source
The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD)
References
Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 513, 202-209 (2014). doi:10.1038/nature13480
Examples
data(tcga_stad_pdata)
head(tcga_stad_pdata)
Test for Cell Population Infiltration
Description
Returns p-values for the null hypothesis that samples are not infiltrated by the corresponding cell population.
Usage
test_for_infiltration(MCPcounterMatrix, platform = c("133P2", "133A", "HG1"))
Arguments
MCPcounterMatrix |
Matrix, usually output from MCPcounter.estimate. |
platform |
Expression platform: "133P2", "133A", or "HG1". Default is "133P2". |
Value
Matrix with samples in rows and cell populations in columns. Elements are p-values.
Author(s)
Etienne Becht
Examples
# This function requires null_models data which is loaded internally
# Create example data
scores <- matrix(runif(30), nrow = 3, ncol = 10)
rownames(scores) <- c("T cells", "B cells", "NK cells")
pvals <- test_for_infiltration(scores, platform = "133P2")
TIMER Available Cancer Types
Description
Character vector of cancer types supported by TIMER deconvolution. TIMER signatures are cancer-specific.
Usage
timer_available_cancers
Format
An object of class character of length 32.
Value
Character vector of available cancer type abbreviations.
Examples
# List all available cancer types for TIMER
timer_available_cancers
# Check if a cancer type is supported
"brca" %in% timer_available_cancers
Display Timer Information Messages
Description
Formats and displays informational messages for timing or logging purposes. Useful for tracking progress or stages of execution within scripts.
Usage
timer_info(string)
Arguments
string |
Character. Message to be displayed. |
Details
This function is part of the source code for the TIMER deconvolution method. The code is adapted from https://github.com/hanfeisun/TIMER, which is itself adapted from the original TIMER source code at http://cistrome.org/TIMER/download.html.
The method is described in Li et al., Genome Biology 2016;17(1):174, PMID 27549193.
Value
None; used for its side effect of printing a message.
Author(s)
Bo Li
Examples
timer_info("Data processing started.")
Identification of TME Cluster
Description
Performs TME (Tumor Microenvironment) clustering analysis using various clustering methods. Supports feature selection, scaling, and automatic determination of optimal cluster number.
Usage
tme_cluster(
input,
features = NULL,
pattern = NULL,
id = NULL,
scale = TRUE,
method = "kmeans",
min_nc = 2,
max.nc = 6
)
Arguments
input |
Data frame containing the input dataset. |
features |
Vector of features to use for clustering. Default is NULL (uses all columns or pattern-selected columns). |
pattern |
Regular expression pattern for selecting features. Default is NULL. |
id |
Column name for identifiers. Default is NULL (uses row names). |
scale |
Logical indicating whether to scale features. Default is TRUE. |
method |
Clustering method. Default is "kmeans". |
min_nc |
Minimum number of clusters to evaluate. Default is 2. |
max.nc |
Maximum number of clusters to evaluate. Default is 6. |
Value
Data frame with cluster assignments appended.
Author(s)
Dongqiang Zeng
Examples
set.seed(123)
input_data <- data.frame(
ID = paste0("Sample", 1:20),
xCell_Tcells = rnorm(20),
xCell_Bcells = rnorm(20),
xCell_Macrophages = rnorm(20),
Other_feature = rnorm(20)
)
result <- tme_cluster(
input = input_data,
pattern = "xCell",
id = "ID",
method = "kmeans"
)
table(result$cluster)
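The core steps (select features by pattern, scale, cluster, append assignments) can be sketched with base R alone. This is a simplified illustration, not the body of 'tme_cluster': the number of clusters is fixed at 2 here instead of being chosen between 'min_nc' and 'max.nc', and the simulated feature names are hypothetical.

```r
set.seed(123)
# Two simulated groups of 10 samples with shifted means
input_data <- data.frame(
  ID = paste0("Sample", 1:20),
  xCell_Tcells = c(rnorm(10, -2), rnorm(10, 2)),
  xCell_Bcells = c(rnorm(10, -2), rnorm(10, 2)),
  xCell_Macrophages = c(rnorm(10, -2), rnorm(10, 2)),
  Other_feature = rnorm(20)
)

# 1. Select features whose names match the pattern
feas <- grep("xCell", colnames(input_data), value = TRUE)

# 2. Scale each feature to mean 0 and unit variance
mat <- scale(input_data[, feas])

# 3. k-means with a fixed k (an assumption made for this sketch)
km <- kmeans(mat, centers = 2, nstart = 10)

# 4. Append cluster assignments to the input
out <- cbind(input_data, cluster = factor(km$cluster))
table(out$cluster)
```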
TME Deconvolution Methods
Description
A named vector of supported tumor microenvironment (TME) deconvolution methods in the IOBR package.
Usage
tme_deconvolution_methods
Format
A named character vector where names are display names and values are internal method names.
Details
The methods currently supported are:
'mcpcounter': MCP-counter for immune and stromal cell populations
'epic': EPIC for immune, stromal, and cancer cell fractions
'xcell': xCell for 64 immune and stromal cell types
'cibersort': CIBERSORT for 22 immune cell types
'cibersort_abs': CIBERSORT in absolute mode
'ips': Immunophenoscore calculation
'estimate': ESTIMATE for stromal/immune/estimate scores
'svr': Support Vector Regression (custom reference)
'lsei': Least Squares with Equality/Inequality constraints
'timer': TIMER for cancer-specific immune estimation
'quantiseq': quanTIseq for RNA-seq immune deconvolution
Value
Named character vector of available deconvolution methods.
Examples
# List all available TME deconvolution methods
tme_deconvolution_methods
# Get internal method name for a specific method
tme_deconvolution_methods["MCPcounter"]
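Because names are display names and values are internal names, a named vector of this kind can be used to normalise user input. The sketch below works on a toy stand-in vector so that it runs without the package; 'toy_methods' and 'resolve_method' are hypothetical, and only the "MCPcounter" display name is taken from the example above.

```r
# Toy stand-in for a named method vector; display names other than
# "MCPcounter" are assumptions.
toy_methods <- c(MCPcounter = "mcpcounter", EPIC = "epic", xCell = "xcell")

# Hypothetical helper: accept either a display name or an internal name.
resolve_method <- function(x, methods = toy_methods) {
  if (x %in% names(methods)) return(unname(methods[x]))
  if (x %in% methods) return(x)
  stop("Unknown method: ", x)
}

resolve_method("MCPcounter")
resolve_method("epic")
```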
Transform Raw Scores to Fractions
Description
Transforms raw xCell scores to estimated cell fractions.
Usage
transformScores(scores, fit.vals, scale = TRUE, fn = NULL)
Arguments
scores |
Raw scores from rawEnrichmentAnalysis. |
fit.vals |
Calibration values from spill object. |
scale |
Logical indicating whether to use scaling. Default is 'TRUE'. |
fn |
Character string for saving scores. Default is 'NULL'. |
Value
Transformed xCell scores matrix.
Transform NA, Inf, or Zero Values in Data
Description
Replaces NA, Inf, or zero values in specified columns of a data frame with a user-defined value or the column mean.
Usage
transform_data(data, feature, data_type = c("NA", "Inf", "zero"), into = 0)
Arguments
data |
Data frame. Input data to be transformed. |
feature |
Character vector. Column names in 'data' to apply transformation. |
data_type |
Character. Type of value to replace: '"NA"', '"Inf"', or '"zero"'. |
into |
Value to replace specified type with. Default is 0. If '"mean"', replaces with column mean (excluding NA/Inf values as appropriate). |
Value
Data frame with specified transformations applied to selected features.
Author(s)
Dongqiang Zeng
Examples
data_matrix <- data.frame(
A = c(1, 2, NA, 4, Inf),
B = c(Inf, 2, 3, 4, 5),
C = c(0, 0, 0, 1, 2)
)
# Replace NAs with 0
transform_data(data_matrix, feature = c("A", "B"), data_type = "NA")
# Replace Inf values with the mean of the column
transform_data(data_matrix,
feature = c("A", "B"),
data_type = "Inf", into = "mean"
)
# Replace zeros with -1 in column C
transform_data(data_matrix, feature = "C", data_type = "zero", into = -1)
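The replacement logic for a single column can be sketched in base R as below. This is an illustrative re-implementation under assumptions, not the source of 'transform_data': the helper 'replace_values' is hypothetical, and it computes the mean over finite values that are not themselves being replaced.

```r
# Hypothetical helper: replace NA, Inf, or zero entries of a numeric vector.
replace_values <- function(x, type = c("NA", "Inf", "zero"), into = 0) {
  type <- match.arg(type)
  hit <- switch(type,
    "NA" = is.na(x),
    "Inf" = is.infinite(x),
    "zero" = !is.na(x) & x == 0
  )
  # "mean" uses only finite values that are not being replaced
  if (identical(into, "mean")) into <- mean(x[is.finite(x) & !hit])
  x[hit] <- into
  x
}

a <- c(1, 2, NA, 4, Inf)
replace_values(a, "NA")            # NA becomes 0, Inf is untouched
replace_values(a, "Inf", "mean")   # Inf becomes mean(c(1, 2, 4))
replace_values(c(0, 0, 0, 1, 2), "zero", into = -1)
```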
The xCell Analysis Pipeline
Description
Returns xCell cell-type enrichment scores for tumor microenvironment deconvolution. Uses ssGSEA-based enrichment analysis with spillover compensation to estimate cell type proportions from gene expression data.
Usage
xCellAnalysis(
expr,
signatures = NULL,
genes = NULL,
spill = NULL,
rnaseq = TRUE,
file.name = NULL,
scale = TRUE,
alpha = 0.5,
save.raw = FALSE,
cell.types.use = NULL
)
Arguments
expr |
Gene expression data matrix with row names as gene symbols and columns as samples. |
signatures |
GMT object of signatures. If 'NULL', uses xCell defaults. |
genes |
Character vector of genes to use in the analysis. If 'NULL', uses xCell defaults. |
spill |
Spillover object for adjusting scores. If 'NULL', uses xCell defaults. |
rnaseq |
Logical indicating whether to use RNA-seq (TRUE) or array (FALSE) spillover parameters. Default is 'TRUE'. |
file.name |
Character string for saving scores. Default is 'NULL'. |
scale |
Logical indicating whether to use scaling. Default is 'TRUE'. |
alpha |
Numeric value to override spillover alpha parameter. Default is '0.5'. |
save.raw |
Logical indicating whether to save raw scores. Default is 'FALSE'. |
cell.types.use |
Character vector of cell types to use. If 'NULL', uses all available cell types. Default is 'NULL'. |
Value
A matrix of adjusted xCell scores.
Calculate Significance P-values Using Beta Distribution
Description
Calculates FDR-adjusted p-values for the null hypothesis that a cell type is not present in the mixture.
Usage
xCellSignifcanceBetaDist(
scores,
beta_params = NULL,
rnaseq = TRUE,
file.name = NULL
)
Arguments
scores |
xCell scores matrix. |
beta_params |
Pre-calculated beta distribution parameters. |
rnaseq |
Logical for RNA-seq vs array parameters. Default is 'TRUE'. |
file.name |
Character string for saving p-values. Default is 'NULL'. |
Value
Matrix of p-values.
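The general idea of testing scores against a per-cell-type beta null distribution can be sketched in base R. Everything specific here is an assumption made for illustration: the shape parameters, the cell-type names, and the choice to apply the Benjamini-Hochberg adjustment across the whole matrix are not taken from the function above.

```r
set.seed(3)
# Simulated scores: cell types in rows, samples in columns
scores <- matrix(rbeta(12, 1, 8), nrow = 3,
                 dimnames = list(c("cellA", "cellB", "cellC"), paste0("S", 1:4)))

# Hypothetical null parameters, one pair per cell type
shape1 <- c(cellA = 1, cellB = 1, cellC = 1.5)
shape2 <- c(cellA = 8, cellB = 10, cellC = 12)

# Upper-tail probability of each score under its cell type's null
pvals <- t(sapply(rownames(scores), function(ct) {
  pbeta(scores[ct, ], shape1[ct], shape2[ct], lower.tail = FALSE)
}))
dimnames(pvals) <- dimnames(scores)

# FDR adjustment (Benjamini-Hochberg) over all tests
padj <- matrix(p.adjust(pvals, method = "BH"),
               nrow = nrow(pvals), dimnames = dimnames(pvals))
padj
```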
Calculate Significance Using Random Matrix
Description
Calculates FDR-adjusted p-values using a random shuffled matrix.
Usage
xCellSignifcanceRandomMatrix(
scores,
expr,
spill,
alpha = 0.5,
nperm = 250,
file.name = NULL
)
Arguments
scores |
xCell scores matrix. |
expr |
Input expression matrix. |
spill |
Spillover object. |
alpha |
Spillover alpha parameter. Default is '0.5'. |
nperm |
Number of permutations. Default is '250'. |
file.name |
Character string for saving p-values. Default is 'NULL'. |
Value
List containing p-values, shuffled xCell scores, shuffled expression, and beta distributions.