CAFT

Rank-Based Compositional Analysis using Log-Linear Models for Microbiome Data with Zero Cells.

Authors: Glen Satten (GSatten@emory.edu), Mo Li (mo.li@louisiana.edu), Ni Zhao (nzhao10@jhu.edu)

Overview

We introduce the Compositional Accelerated Failure Time (CAFT) model, a statistical framework for differential abundance analysis of microbiome data. CAFT handles zero read counts by treating them as censored observations below the detection limit, analogous to the censoring mechanisms used in survival analysis. This approach is inherently resistant to multiplicative bias, eliminates the need for pseudocounts, and addresses compositional bias through appropriate score test procedures. To control the false discovery rate (FDR), we adopt and extend Efron’s empirical null distribution.
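
As a rough illustration of the censoring idea (not the package's internal code), the sketch below represents one taxon's read counts as left-censored observations. The detection limit of one read per sample and the helper name `as_left_censored` are assumptions made only for this example.

```r
# Illustrative only: represent zero counts as left-censored observations.
# Assumption: the detection limit is one read, so a zero count in a sample
# with library size N is read as "relative abundance below 1 / N".
as_left_censored <- function(counts, lib.size) {
  observed <- counts > 0
  # Log relative abundance where observed; log detection limit where censored
  value <- ifelse(observed, log(counts / lib.size), log(1 / lib.size))
  data.frame(value = value, observed = observed)
}

counts   <- c(0, 12, 3, 0, 40)              # reads for one taxon in five samples
lib.size <- c(1000, 2000, 1500, 800, 5000)  # total reads per sample

cens <- as_left_censored(counts, lib.size)
print(cens)
```

Rows with `observed = FALSE` carry only an upper bound on the log relative abundance, which is why no pseudocount is needed to take logarithms.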

Package download and installation

You can install CAFT from GitHub:

# install.packages("remotes")
remotes::install_github("mli171/CAFT", build_vignettes = TRUE, dependencies = TRUE)

Open the vignette in R for more details:

browseVignettes("CAFT")

CAFT: Compositional Rank-Based Analysis using AFT models

The main function in the CAFT package is:

caft()

An example of using the caft function

Apply ‘caft’ to a gut microbiome dataset from a study of adult colorectal cancer based on stool samples (Pasolli et al., 2017).

library(CAFT)
library(phyloseq)

data(Colon)

count.tab = t(as.data.frame(as.matrix(otu_table(Colon))))
sample.tab = as.data.frame(as.matrix(sample_data(Colon)))
tax.tab = as.data.frame(as.matrix(tax_table(Colon)))

dim(count.tab)

# Drop samples with missing age
pNA = which(is.na(sample.tab$age))
if(length(pNA) > 0){
  count.tab = count.tab[-pNA, ]
  sample.tab = sample.tab[-pNA, ]
}
# No missing values in gender

## OTU presence filtering: keep taxa observed in more than one sample
p_otu = which(colSums(count.tab > 0) > 1)
count.tab = count.tab[, p_otu]
tax.tab = tax.tab[p_otu, ]

dim(count.tab)

# Proportion of zero (censored) counts per taxon, and its average over taxa
cens.prop = colMeans(count.tab == 0, na.rm = TRUE)
mean(cens.prop)

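# Indicator variables for disease status; samples with both set to 0 are healthy
Disease1 = Disease2 = rep(0, NROW(sample.tab)) # healthy
Disease1[sample.tab$disease == "CRC"] = 1
Disease2[sample.tab$disease == "adenoma"] = 1

# Numeric age; gender coded as integers starting at 0 (factor levels in sorted order)
Age = as.numeric(sample.tab$age)
Gender = as.numeric(factor(sample.tab$gender)) - 1

# Covariates to test and covariates to adjust for
x.test = cbind(Disease1, Disease2)
x.adj  = cbind(Age, Gender)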

# Default call
res.CAFT = caft(otu.table=count.tab, x.test=x.test, x.adj=x.adj)

# Same call with n.cores=4 (adjust to the number of cores available)
res.CAFT = caft(otu.table=count.tab, x.test=x.test, x.adj=x.adj, n.cores=4)
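
To see what the preprocessing steps above do without loading the Colon data, here is a small self-contained sketch on a made-up count table. The toy matrix, sample labels, and object names are invented for illustration; only base R is used, and caft() is not called.

```r
# Toy count table: 4 samples (rows) x 3 taxa (columns); values are made up
toy.counts <- matrix(c(5, 0, 0,
                       0, 0, 2,
                       3, 0, 0,
                       0, 0, 7),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("s", 1:4), paste0("otu", 1:3)))

# Keep taxa observed in more than one sample (same rule as the example above)
keep <- which(colSums(toy.counts > 0) > 1)
toy.counts <- toy.counts[, keep, drop = FALSE]

# Proportion of zero (censored) cells per retained taxon
toy.cens.prop <- colMeans(toy.counts == 0)

# Indicator coding of a three-level disease variable, with healthy as reference
toy.disease <- c("healthy", "CRC", "adenoma", "CRC")
toy.x.test <- cbind(Disease1 = as.numeric(toy.disease == "CRC"),
                    Disease2 = as.numeric(toy.disease == "adenoma"))

print(toy.counts)
print(toy.cens.prop)
print(toy.x.test)
```

In this toy table, otu2 is never observed and is dropped by the filter, so only otu1 and otu3 remain. The resulting objects have the same layout (samples in rows) as count.tab and x.test in the example above.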

How to cite

If you use CAFT in your work, please cite:

Satten, G. A., Li, M., & Zhao, N. (2025). CAFT: A Compositional Log-Linear Model for Microbiome Data with Zero Cells. bioRxiv, 2025.11.26.690468. https://doi.org/10.1101/2025.11.26.690468

BibTeX:

@article{satten2025caft,
  title   = {CAFT: A Compositional Log-Linear Model for Microbiome Data with Zero Cells},
  author  = {Satten, Glen A. and Li, Mo and Zhao, Ni},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.11.26.690468},
  note    = {Preprint}
}

References

Pasolli E, Schiffer L, Manghi P, Renson A, Obenchain V, Truong D, Beghini F, Malik F, Ramos M, Dowd J, Huttenhower C, Morgan M, Segata N, Waldron L (2017). “Accessible, curated metagenomic data through ExperimentHub.” Nat. Methods, 14(11), 1023–1024. ISSN 1548-7091, 1548-7105, doi:10.1038/nmeth.4468.