Running diffdriver requires about 16 GB of memory. The most memory-intensive step is parameter estimation for the background mutation rate model.
Diffdriver requires three datasets from the user: the phenotype or context of each tumor sample, the somatic mutations identified in the tumor samples, and a list of driver gene names. Diffdriver tests the association between the phenotype or context and the selection strength of each provided driver gene.
Diffdriver needs the phenotype or context of each tumor sample. A data frame should be provided. In this data frame, the first column is the sample ID with the column name “SampleID”, and the second column is the phenotype or context, with the phenotype or context name as its column name (note that spaces and tabs are not allowed in column names). See an example below:
phenof = system.file("extdata/", "example_phenotypes.txt", package = "diffdriver")
pheno <- read.table(phenof, header = T)
head(pheno)
#> SampleID SmokingCessation
#> 1 TCGA-N5-A4R8-01A-11D-A28R-08 0.5319630
#> 2 TCGA-N5-A4RD-01A-11D-A28R-08 0.0448991
#> 3 TCGA-N5-A4RF-01A-11D-A28R-08 -0.3140750
#> 4 TCGA-N5-A4RJ-01A-11D-A28R-08 0.4229920
#> 5 TCGA-N5-A4RM-01A-11D-A28R-08 -0.2830070
#> 6 TCGA-N5-A4RN-01A-12D-A28R-08 0.7874080
Diffdriver needs a vector of driver gene names. It will test the association for each gene.
genef = system.file("extdata", "example_gene.txt", package = "diffdriver")
gene <- read.table(genef, header = F)
head(gene)
#> V1
#> 1 CHD4
#> 2 PIK3CA
Diffdriver needs the somatic mutations identified in each tumor sample. Note that this should include all somatic mutations identified, not just those in the selected driver genes. The somatic mutations are used to estimate the background mutation rate and the selection strength for the selected genes. A data frame should be provided; see below for its column names and example mutations.
mutf = system.file("extdata/", "example_mutations.txt", package = "diffdriver")
mut <- read.table(mutf, header = T)
head(mut)
#> Chromosome Position Ref Alt SampleID
#> 1 19 55653236 C T TCGA-N6-A4VE-01A-11D-A28R-08
#> 2 17 65134211 C T TCGA-NA-A4R1-01A-11D-A28R-08
#> 3 20 30354424 G T TCGA-N8-A4PM-01A-11D-A28R-08
#> 4 6 18215312 G C TCGA-N8-A4PO-01A-11D-A28R-08
#> 5 1 154186393 C G TCGA-NA-A4R0-01A-11D-A28R-08
#> 6 10 23003128 C A TCGA-NF-A4X2-01A-11D-A28R-08
In addition to these datasets provided by the user, diffdriver also needs annotation files. See the package installation page for download links to these annotation files. Unless the number of tumor samples or the number of mutations is very small, we suggest using the 96-annotation files. Please download these files to a folder and provide the folder path to diffdriver.
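Before starting a long run, it can help to confirm that the three inputs match the layout described above. The following is a minimal sketch in base R; `check_diffdriver_inputs` is a hypothetical helper that is not part of diffdriver, and it only checks the column conventions stated in this tutorial, not everything diffdriver may require.

```r
# Hypothetical helper, not part of diffdriver: checks the user inputs against
# the layout described in this tutorial and returns a character vector of
# problems (empty when nothing was found).
check_diffdriver_inputs <- function(gene, mut, pheno) {
  problems <- character(0)

  # Phenotype table: first column must be "SampleID".
  if (!identical(colnames(pheno)[1], "SampleID")) {
    problems <- c(problems, "first column of pheno must be named 'SampleID'")
  }
  # Phenotype table: no spaces or tabs in column names.
  if (any(grepl("[[:space:]]", colnames(pheno)))) {
    problems <- c(problems, "pheno column names must not contain spaces or tabs")
  }

  # Mutation table: the five columns shown in the example mutations.
  mut_cols <- c("Chromosome", "Position", "Ref", "Alt", "SampleID")
  missing_cols <- setdiff(mut_cols, colnames(mut))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("mut is missing columns:",
                        paste(missing_cols, collapse = ", ")))
  }

  # Gene list: must contain at least one gene name.
  if (NROW(gene) == 0) {
    problems <- c(problems, "gene list is empty")
  }

  problems
}
```

Calling `check_diffdriver_inputs(gene, mut, pheno)` on the example data loaded above should return an empty character vector if the files follow the described layout.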
To ensure that the numerical results are reproducible, this tutorial uses a fixed random seed.
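The seed and the output folder are not shown in this tutorial. A minimal sketch, assuming that any fixed seed and any writable folder will do; the seed value and the folder name below are placeholders, not the values used to produce the results here:

```r
# Placeholder seed: any fixed integer makes repeated runs comparable.
set.seed(2024)

# Hypothetical output folder inside the session's temporary directory.
# The name output_dir matches the variable passed to diffdriver() below.
output_dir <- file.path(tempdir(), "diffdriver_output")
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
```

For results that should persist after the R session ends, point `output_dir` at a permanent folder instead of `tempdir()`.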
With signature adjustment (BMRmode = "signature"), which is the default mode, estimating the parameters of the background model takes around 15 min. Please use the provided annotation folder with the 96-annotation files when running in “signature mode”.
library(diffdriver)
# output_dir must be set beforehand to the folder where results should be written.
res <- diffdriver(gene = gene, mut = mut, pheno = pheno,
                  anno_dir = "/Volumes/Szhao/library/diffdriver_anno/annodir96",
                  k = 6, totalnttype = 96, BMRmode = "signature",
                  output_dir = output_dir, output_prefix = "testdiffdriver_sig")
res
Without signature adjustment (BMRmode = "regular"), estimating the parameters of the background model takes around 15 min. You can use the provided annotation folder with either the 9-annotation files or the 96-annotation files. The example below uses 96-annotation files. When the total number of mutations is low, one should use the 9-annotation files.
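A regular-mode run could look like the sketch below. It assumes that the call takes the same arguments as the signature-mode call shown earlier, with only BMRmode changed. The wrapper `run_diffdriver_regular` and the output prefix are hypothetical names introduced for illustration; they are not part of diffdriver.

```r
# Hypothetical wrapper, not part of diffdriver. It assumes the regular-mode
# call accepts the same arguments as the signature-mode call shown earlier.
run_diffdriver_regular <- function(gene, mut, pheno, anno_dir, output_dir) {
  diffdriver::diffdriver(
    gene = gene, mut = mut, pheno = pheno,
    anno_dir = anno_dir,        # folder holding the 96-annotation files
    k = 6, totalnttype = 96,    # values copied from the signature-mode call
    BMRmode = "regular",
    output_dir = output_dir,
    output_prefix = "testdiffdriver_regular"  # placeholder prefix
  )
}
```

It would then be called as `res <- run_diffdriver_regular(gene, mut, pheno, anno_dir = "/Volumes/Szhao/library/diffdriver_anno/annodir96", output_dir = output_dir)`. To use the 9-annotation files instead, both the folder and `totalnttype` would need to change together.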
To visualize the data for specific genes, diffdriver has a plotting function:
diffdriver::plot_mut(gene_name = "PIK3CA", mut = mut, pheno = pheno, totalnttype = 9,
                     anno_dir = "/Volumes/Szhao/library/diffdriver_anno/annodir9")
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS 15.7.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] C/UTF-8/C/C/C/C
#>
#> time zone: Asia/Shanghai
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 xfun_0.52
#> [5] cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.29
#> [9] lifecycle_1.0.4 cli_3.6.5 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.3.1 tools_4.3.1 evaluate_1.0.4 bslib_0.9.0
#> [17] yaml_2.3.10 rlang_1.1.6 jsonlite_2.0.0