| Title: | Tidy Presentation of Clinical Reporting |
| Version: | 0.1.3 |
| Date: | 2026-03-20 |
| Description: | Streamlined statistical reporting in 'Rmarkdown' environments. Facilitates the automated reporting of descriptive statistics, multiple univariate models, multivariable models and tables combining these outputs. Plotting functions include customisable survival curves, forest plots from logistic and ordinal regression and bivariate comparison plots. |
| License: | MIT + file LICENSE |
| Suggests: | geepack, lme4, lmerTest, MASS, mice, nlme, tidycmprsk, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Imports: | aod, boot, cmprsk, cowplot, dplyr, ggplot2, ggpubr, kableExtra, knitr, lifecycle, pander, rlang, rstudioapi, scales, stats, survival, tidyr, tidyselect |
| Collate: | 'imports.R' 'helper.R' 'model_registry.R' 'main.R' 'globals.R' 'data.R' 'lblCode.R' 'rm_compactsum.R' 'rm_uvsum2.R' 'autoreg.R' 'autosum.R' 'getVarLevels.R' 'ggkmcif3.R' 'rm_mvsum2.R' 'deprecated.R' |
| Depends: | R (≥ 4.2) |
| LazyData: | no |
| URL: | https://biostatspmh.github.io/reportRmd/ |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-03-20 15:44:17 UTC; lisaavery |
| Author: | Lisa Avery |
| Maintainer: | Lisa Avery <lisa.avery@uhn.ca> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-20 16:10:03 UTC |
Compute Generalized Variance Inflation Factor (GVIF)
Description
Calculates GVIF for model terms to detect multicollinearity. Adapted from the car package.
Usage
GVIF(model)
Arguments
model |
Fitted model object (lm, glm, coxph, etc.) |
Details
For terms with degrees of freedom > 1 (e.g., factor variables), returns GVIF^(1/(2*df)) for comparability with standard VIF.
Value
Data frame with columns: Covariate (term name) and VIF (value)
References
Fox, J. and Weisberg, S. (2019). An R Companion to Applied Regression, Third Edition. Thousand Oaks CA: Sage.
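As a rough illustration of the calculation described in Details, the sketch below computes GVIF and GVIF^(1/(2*df)) by hand for an lm fit to the built-in mtcars data. It follows the determinant formula from Fox and Weisberg and is not necessarily how GVIF() is implemented in this package; all object names (fit, R, gvif, out) are chosen for illustration only.

```r
# Illustrative model: two 1-df terms and one 2-df factor term
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

X <- model.matrix(fit)
term_id <- attr(X, "assign")[-1]            # term index per column, intercept dropped
term_lab <- attr(terms(fit), "term.labels")

# Correlation matrix of the coefficient estimates (intercept removed)
R <- cov2cor(vcov(fit)[-1, -1])

# GVIF = det(R11) * det(R22) / det(R), where R11 holds the columns of one term
gvif <- vapply(seq_along(term_lab), function(j) {
  cols <- which(term_id == j)
  det(R[cols, cols, drop = FALSE]) *
    det(R[-cols, -cols, drop = FALSE]) / det(R)
}, numeric(1))

# Degrees of freedom per term, then the adjusted value GVIF^(1/(2*df))
df <- tabulate(term_id, nbins = length(term_lab))
out <- data.frame(Covariate = term_lab, GVIF = gvif, df = df,
                  GVIF_adj = gvif^(1 / (2 * df)))
out
```

For terms with one degree of freedom the GVIF equals the classical VIF, 1/(1 - R^2), so the adjustment only changes the scale on which multi-df terms are reported.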
Add censor marks to KM plot
Description
Add censor marks to KM plot
Usage
add_censor_marks(p, df, censor.size = 0.5, censor.stroke = 1.5, shape = "|")
Arguments
p |
ggplot object |
df |
Plotting dataframe |
censor.size |
Size of censor marks |
censor.stroke |
Stroke of censor marks |
shape |
Shape for censor marks, defaults to "|", but can use any character or standard geom_point shapes (0-24) |
Add CIF Hazard Ratios to Strata Labels
Description
Add CIF Hazard Ratios to Strata Labels
Usage
add_cif_hazard_ratios(
stratalabs,
data,
response,
cov,
plot.event = 1,
HR = FALSE,
HR_pval = FALSE,
HR.digits = 2,
HR.pval.digits = 3
)
Arguments
stratalabs |
Original strata labels |
data |
Input data |
response |
Time and status variables |
cov |
Covariate |
plot.event |
Event for CIF (must be 1 for HR calculation) |
HR |
Whether to include HR |
HR_pval |
Whether to include HR p-value |
HR.digits |
Number of digits for HR |
HR.pval.digits |
Number of digits for HR p-value |
Value
Updated strata labels
Add confidence bands to plot
Description
Add confidence bands to plot
Usage
add_confidence_bands(p, df, type, event = "col", plot.event = 1)
Arguments
p |
ggplot object |
df |
Plotting dataframe |
type |
Plot type |
event |
Event distinction method |
plot.event |
Events to plot |
Add hazard ratios to strata labels
Description
Add hazard ratios to strata labels
Usage
add_km_hazard_ratios(
stratalabs,
data,
response,
cov,
type,
plot.event = 1,
HR = FALSE,
HR_pval = FALSE,
HR.digits = 2,
HR.pval.digits = 3
)
Arguments
stratalabs |
Original strata labels |
data |
Input data |
response |
Time and status variables |
cov |
Covariate |
type |
Model type ("KM" or "CIF") |
plot.event |
Event for CIF |
HR |
Whether to include HR |
HR_pval |
Whether to include HR p-value |
HR.digits |
Number of digits for HR |
HR.pval.digits |
Number of digits for HR p-value |
Add median survival lines to plot
Description
Add median survival lines to plot
Usage
add_median_lines(p, median_vals, median.lsize = 1)
Arguments
p |
ggplot object |
median_vals |
Median values |
median.lsize |
Line size for median lines |
Add median survival text to plot
Description
Add median survival text to plot
Usage
add_median_text(
p,
type,
multiple_lines,
median_txt,
median.pos = NULL,
times,
ylim,
median.size = 3,
plot.event = 1,
eventlabs = NULL
)
Arguments
p |
ggplot object |
type |
Plot type |
multiple_lines |
Whether multiple strata |
median_txt |
Median text |
median.pos |
Position for median text |
times |
Time breaks |
ylim |
Y-axis limits |
median.size |
Text size |
plot.event |
Events being plotted |
eventlabs |
Event labels |
Add survival at set time lines to plot
Description
Add survival at set time lines to plot
Usage
add_set_time_lines(p, set.surv, set.lsize = 1)
Arguments
p |
ggplot object |
set.surv |
Data frame with survival at set times |
set.lsize |
Line size |
Add survival at set time text to plot
Description
Add survival at set time text to plot
Usage
add_set_time_text(
p,
type,
multiple_lines,
set.surv.text,
set.pos = NULL,
times,
ylim,
set.size = 3,
plot.event = 1,
eventlabs = NULL
)
Arguments
p |
ggplot object |
type |
Plot type |
multiple_lines |
Whether multiple strata |
set.surv.text |
Survival text |
set.pos |
Position for survival text |
times |
Time breaks |
ylim |
Y-axis limits |
set.size |
Text size |
plot.event |
Events being plotted |
eventlabs |
Event labels |
Add statistical test results to plot
Description
Add statistical test results to plot
Usage
add_statistical_tests(
p,
type,
multiple_lines,
pval_result,
pval.pos = NULL,
times,
xlim,
ylim,
psize = 3.5,
pval.digits = 3,
plot.event = 1,
eventlabs = NULL
)
Arguments
p |
ggplot object |
type |
Plot type |
multiple_lines |
Whether multiple strata |
pval_result |
P-value |
pval.pos |
Position for p-value text |
times |
Time breaks |
xlim |
X-axis limits |
ylim |
Y-axis limits |
psize |
Text size for p-value |
pval.digits |
Number of digits for p-value |
plot.event |
Events being plotted |
eventlabs |
Event labels |
Apply colour and linetype scales
Description
Apply colour and linetype scales
Usage
apply_scales_and_guides(
p,
col,
linetype = NULL,
stratalabs,
eventlabs = NULL,
multiple_lines,
plot.event = 1,
event = "col"
)
Arguments
p |
ggplot object |
col |
colours vector |
linetype |
Line types vector |
stratalabs |
Strata labels |
eventlabs |
Event labels |
multiple_lines |
Whether multiple strata |
plot.event |
Events being plotted |
event |
How events are distinguished |
Fit a regression model based on response type
Description
S3 generic that dispatches to the appropriate model fitting function based on the class of the response variable.
Usage
autoreg(
response,
data,
x_var,
id = NULL,
strata = "",
family = NULL,
offset = NULL,
corstr = "independence"
)
Arguments
response |
The response variable (used for S3 dispatch). |
data |
A data frame containing the model data. |
x_var |
Character string of the predictor variable name. |
id |
Optional subject ID for GEE models. |
strata |
Optional stratification variable. |
family |
Optional family for GLM models. |
offset |
Optional offset term. |
corstr |
Correlation structure for GEE models (default "independence"). |
Value
A fitted model object.
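To show what dispatching on the class of the response can look like, here is a minimal sketch of an S3 generic with two methods. The generic fit_by_response and its methods are hypothetical names invented for this illustration; they are not the methods that autoreg provides, and the models chosen for each class are assumptions.

```r
# Hypothetical generic: dispatch happens on the class of `response`
fit_by_response <- function(response, data, x_var) UseMethod("fit_by_response")

# Numeric response: ordinary linear regression
fit_by_response.numeric <- function(response, data, x_var) {
  data$.y <- response
  lm(as.formula(paste(".y ~", x_var)), data = data)
}

# Factor response (two levels assumed): logistic regression
fit_by_response.factor <- function(response, data, x_var) {
  data$.y <- response
  glm(as.formula(paste(".y ~", x_var)), data = data, family = binomial)
}

# Fallback for unsupported response classes
fit_by_response.default <- function(response, data, x_var) {
  stop("No method for response of class ", class(response)[1])
}

m1 <- fit_by_response(mtcars$mpg, mtcars, "wt")         # numeric method
m2 <- fit_by_response(factor(mtcars$am), mtcars, "wt")  # factor method
```

The caller never names the model type; changing the class of the response is enough to change which fitting function runs.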
Fit Box-Cox transformed linear model
Description
Wrapper function to fit a linear model to a Box-Cox transformed response
Usage
boxcoxfitRx(f, data, lambda = FALSE)
Arguments
f |
formula for the model. Currently the formula only works by using the name of the column in a dataframe. It does not work by using $ or [] notation. |
data |
dataframe containing data |
lambda |
boolean indicating if you want to output the lambda used in the Box-Cox transformation. If so the function will return a list of length 2 with the model as the first element and a vector of length 2 as the second. |
Value
a list containing the linear model (lm) object and, if requested, lambda
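The general idea of a Box-Cox transformed linear model can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper names (bc, boxcox_loglik), the search range of [-2, 2] and the use of optimize() are all choices made for this example, and the shape of the returned lambda element may differ from what boxcoxfitRx returns.

```r
# Box-Cox transform of a strictly positive response
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for design matrix X
boxcox_loglik <- function(lambda, y, X) {
  rss <- sum(lm.fit(X, bc(y, lambda))$residuals^2)
  -length(y) / 2 * log(rss / length(y)) + (lambda - 1) * sum(log(y))
}

X <- model.matrix(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg

# Maximise the profile likelihood over an assumed range for lambda
lambda_hat <- optimize(boxcox_loglik, c(-2, 2), y = y, X = X,
                       maximum = TRUE)$maximum

# Refit the linear model on the transformed scale
mtcars_bc <- transform(mtcars, mpg_bc = bc(mpg, lambda_hat))
fit <- lm(mpg_bc ~ wt + hp, data = mtcars_bc)
result <- list(fit, lambda_hat)
```

Coefficients from the refitted model are on the transformed scale, so they need back-transformation before being interpreted in the original units.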
Standard break function using pretty()
Description
Creates nice axis breaks using R's built-in pretty() function. This is the standard approach used by ggkmcif functions.
Usage
break_function(x, n = 5)
Arguments
x |
Maximum value for axis |
n |
Desired number of intervals (default 5) |
Value
Numeric vector of break positions
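A short sketch of how pretty() produces axis breaks for a given maximum. The call below assumes the axis starts at zero, which may not match exactly how break_function builds its range; xmax and breaks are illustrative names.

```r
xmax <- 47

# pretty() picks round, equally spaced values covering the range;
# n is a target number of intervals, not a guarantee
breaks <- pretty(c(0, xmax), n = 5)
breaks
```

Because n is only a target, the number of breaks returned can differ slightly from the value requested.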
Custom break function for axis scaling
Description
Creates axis breaks based on maximum value with custom logic for determining break intervals based on magnitude.
Usage
break_function_custom(xmax)
Arguments
xmax |
Maximum value for axis |
Value
Numeric vector of break positions
Build arrow segments for CIs extending beyond xlim
Description
Returns a list of ggplot layers for left and right arrows.
Usage
build_arrow_segments(tab, xlim, use_linetype = FALSE)
Arguments
tab |
forest plot data with ci_low_arrow and ci_high_arrow columns |
xlim |
numeric vector of length 2 |
use_linetype |
logical, include linetype aesthetic? |
Value
list of ggplot layers
Build a forest plot from prepared data
Description
Internal function that handles all ggplot construction for forest plots. Called by forestplotMV and forestplotUV after data preparation.
Usage
build_forest_ggplot(
tab,
x_lab,
show_linetype = FALSE,
colours = "default",
showEst = TRUE,
showRef = TRUE,
logScale = TRUE,
nxTicks = 5,
showN = TRUE,
showEvent = TRUE,
xlim = NULL
)
Arguments
tab |
data frame prepared by prepare_forest_data and order_forest_data |
x_lab |
character label for the x-axis |
show_linetype |
logical, should adjusted/unadjusted be distinguished by linetype and shape? TRUE when showing both adjusted and unadjusted. |
colours |
colour specification: "default" or a vector of 3 colours for risks less than, equal to, and greater than 1.0 |
showEst |
logical, should estimates be shown in labels? |
showRef |
logical, should reference levels be shown? |
logScale |
logical, should x-axis be log-scaled? |
nxTicks |
number of x-axis tick marks |
showN |
logical, show N on secondary axis? |
showEvent |
logical, show events on secondary axis? |
xlim |
numeric vector of length 2 for x-axis limits, or NULL |
Build log scale for forest plot x-axis
Description
Build log scale for forest plot x-axis
Usage
build_log_scale(tab, xlim, nxTicks)
Arguments
tab |
forest plot data |
xlim |
numeric vector of length 2, or NULL |
nxTicks |
number of tick marks |
Value
a scale_x_log10 layer
Build secondary axis specification
Description
Build secondary axis specification
Usage
build_secondary_axis(tab, yLabels, showN, showEvent)
Arguments
tab |
forest plot data |
yLabels |
data frame with y.pos and labels |
showN |
logical |
showEvent |
logical |
Value
list with axis (scale_y_continuous) and theme elements
Calculate median survival times and add to labels
Description
Calculate median survival times and add to labels
Usage
calculate_and_add_median_times(
sfit = NULL,
fit = NULL,
stratalabs,
type = "KM",
plot.event = 1,
median.text = FALSE,
median.CI = FALSE,
median.digits = 3
)
Arguments
sfit |
Survival fit object (for KM) |
fit |
CIF fit object |
stratalabs |
Strata labels |
type |
Model type |
plot.event |
Events to plot (for CIF) |
median.text |
Whether to add median text |
median.CI |
Whether to include CI |
median.digits |
Number of digits |
Calculate survival/CIF at specific time points and add to labels
Description
Calculate survival/CIF at specific time points and add to labels
Usage
calculate_and_add_time_specific_estimates(
sfit = NULL,
fit = NULL,
stratalabs,
type = "KM",
plot.event = 1,
set.time.text = NULL,
set.time = NULL,
set.time.line = FALSE,
set.time.CI = FALSE,
set.time.digits = 3
)
Arguments
sfit |
Survival fit object (for KM) |
fit |
CIF fit object |
stratalabs |
Strata labels |
type |
Model type |
plot.event |
Events to plot (for CIF) |
set.time.text |
Text label for time points |
set.time |
Time points to evaluate |
set.time.line |
boolean to specify if you want the survival added as lines to the plot at a specified point |
set.time.CI |
Whether to include CI |
set.time.digits |
Number of digits |
Calculate Median Time to Event for CIF
Description
Calculate Median Time to Event for CIF
Usage
calculate_cif_median(fit, event_name)
Arguments
fit |
CIF fit object |
event_name |
Name of the event in fit object |
Value
Median time to event
Calculate CIF Estimates at Specific Time Points
Description
Calculate CIF Estimates at Specific Time Points
Usage
calculate_cif_timepoints(
fit,
plot.event,
set.time,
set.time.CI = FALSE,
set.time.digits = 3,
multiple_lines = FALSE
)
Arguments
fit |
CIF fit object |
plot.event |
Events to plot |
set.time |
Time points to evaluate |
set.time.CI |
Whether to include confidence intervals |
set.time.digits |
Number of digits |
multiple_lines |
Whether there are multiple strata |
Value
Data frame with time-specific estimates
Clear variable labels from a data frame
Description
This function will remove all label attributes from variables in the data.
Usage
clear_labels(data)
Arguments
data |
the data frame to remove labels from |
Details
To change or remove individual labels use set_labels or set_var_labels
Examples
# Set a few variable labels for ctDNA
data("ctDNA")
ctDNA <- ctDNA |> set_var_labels(
ctdna_status="detectable ctDNA",
cohort="A cohort label")
# Clear all variable data frames and check
clear_labels(ctDNA)
Extract model coefficient summary
Description
S3 generic to extract formatted coefficient summaries from fitted models.
Usage
coeffSum(model, CIwidth = 0.95, digits = 2)
Arguments
model |
A fitted model object. |
CIwidth |
Confidence interval width (default 0.95). |
digits |
Number of digits for rounding (default 2). |
Value
A data frame of formatted model coefficients.
Match coefficient names to covariate names
Description
Matches model coefficient names (with factor levels) back to original covariate names from the model call. Handles cases where one factor name is a subset of another.
Usage
covnm(betanames, call)
Arguments
betanames |
Vector of coefficient names from model |
call |
Vector of covariate names from model formula |
Value
Vector of matched covariate names
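One way to handle the case where one covariate name is a prefix of another is to prefer the longest matching name. The sketch below illustrates that idea; match_covariates is a hypothetical helper written for this example and is not the logic used by covnm.

```r
# Hypothetical helper: map each coefficient name to the longest covariate
# name that it starts with
match_covariates <- function(betanames, covnames) {
  vapply(betanames, function(b) {
    hits <- covnames[startsWith(b, covnames)]
    if (length(hits) == 0) return(NA_character_)
    hits[which.max(nchar(hits))]
  }, character(1), USE.NAMES = FALSE)
}

matched <- match_covariates(
  betanames = c("stageII", "stage_groupB", "age"),
  covnames  = c("stage", "stage_group", "age")
)
matched
```

Without the longest-match rule, "stage_groupB" would also match the shorter covariate name "stage".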
Get covariate summary dataframe
Description
Returns a dataframe corresponding to a descriptive table.
Usage
covsum(
data,
covs,
maincov = NULL,
digits = 1,
numobs = NULL,
markup = FALSE,
sanitize = FALSE,
nicenames = TRUE,
IQR = FALSE,
all.stats = FALSE,
pvalue = TRUE,
effSize = FALSE,
show.tests = FALSE,
dropLevels = TRUE,
excludeLevels = NULL,
full = TRUE,
digits.cat = 0,
testcont = c("rank-sum test", "ANOVA"),
testcat = c("Chi-squared", "Fisher"),
include_missing = FALSE,
percentage = c("column", "row")
)
Arguments
data |
dataframe containing data |
covs |
character vector with the names of columns to include in table |
maincov |
covariate to stratify table by |
digits |
number of digits for summarizing mean data, does not affect p-values |
numobs |
named list overriding the number of people you expect to have the covariate |
markup |
boolean indicating if you want latex markup |
sanitize |
boolean indicating if you want to sanitize all strings to not break LaTeX |
nicenames |
boolean indicating if you want to replace . and _ in strings with a space |
IQR |
boolean indicating if you want to display the interquartile range (Q1,Q3) as opposed to (min,max) in the summary for continuous variables |
all.stats |
boolean indicating if all summary statistics (Q1,Q3 + min,max on a separate line) should be displayed. Overrides IQR. |
pvalue |
boolean indicating if you want p-values included in the table |
effSize |
boolean indicating if you want effect sizes included in the table. Can only be obtained if pvalue is also requested. Effect sizes calculated include Cramer's V for categorical variables, Cohen's d, Wilcoxon r, or Eta-squared for numeric/continuous variables. |
show.tests |
boolean indicating if the type of statistical test and effect size used should be shown in a column beside the pvalues. Ignored if pvalue=FALSE. |
dropLevels |
logical, indicating if empty factor levels should be dropped from the output, default is TRUE. |
excludeLevels |
a named list of covariate levels to exclude from statistical tests in the form list(varname = c('level1','level2')). These levels will be excluded from association tests, but not the table. This can be useful for levels where there is a logical skip (i.e. not missing, but not presented). Ignored if pvalue=FALSE. |
full |
boolean indicating if you want the full sample included in the table, ignored if maincov is NULL |
digits.cat |
number of digits for the proportions when summarizing categorical data (default: 0) |
testcont |
test of choice for continuous variables, one of rank-sum (default) or ANOVA |
testcat |
test of choice for categorical variables, one of Chi-squared (default) or Fisher |
include_missing |
Option to include NA values of maincov. NAs will not be included in statistical tests |
percentage |
choice of how percentages are presented, one of column (default) or row |
Details
Comparisons for categorical variables default to chi-square tests, but if there are counts of <5 then the Fisher Exact test will be used and if this is unsuccessful then a second attempt will be made computing p-values using MC simulation. If testcont='ANOVA' then the t-test with unequal variance will be used for two groups and an ANOVA will be used for three or more. The statistical test used can be displayed by specifying show.tests=TRUE.
The number of decimal places used to display the statistics can be changed with digits, but this will not change the display of p-values. If more significant digits are required for p-values then use tableOnly=TRUE and format as desired.
References
Ellis, P.D. (2010) The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511761676
Lakens, D. (2013) Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4; 863:1-12. doi:10.3389/fpsyg.2013.00863
See Also
fisher.test, chisq.test, wilcox.test, kruskal.test, and anova
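The fallback sequence for categorical comparisons described in Details (chi-square, then Fisher's exact test, then a simulated p-value) can be sketched in base R. This is an illustration only: choose_cat_test is a hypothetical helper, the rule is applied here to observed cell counts, and the number of Monte Carlo replicates is an assumption rather than a value taken from the package.

```r
# Hypothetical helper illustrating the fallback order of tests
choose_cat_test <- function(x, g) {
  tab <- table(x, g)
  if (all(tab >= 5)) {
    return(list(test = "Chi-squared", p = chisq.test(tab)$p.value))
  }
  # Small counts: try the exact test first
  p <- tryCatch(fisher.test(tab)$p.value, error = function(e) NA_real_)
  if (!is.na(p)) return(list(test = "Fisher", p = p))
  # Exact test failed: fall back to a simulated p-value
  list(test = "Fisher (simulated)",
       p = fisher.test(tab, simulate.p.value = TRUE, B = 2000)$p.value)
}

small <- choose_cat_test(mtcars$cyl, mtcars$am)   # some cells have counts < 5
large <- choose_cat_test(rep(c("a", "b"), each = 20), rep(c("u", "v"), 20))
```

Returning the name of the test alongside the p-value mirrors what show.tests=TRUE reports in the table.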
Create base ggplot for survival curves
Description
Create base ggplot for survival curves
Usage
create_base_plot(
df,
type,
xlab = "Time",
ylab = "Survival Probability",
multiple_lines,
plot.event = 1,
event = "col",
lsize = 0.5,
fsize,
col,
linetype = NULL,
legend.pos = "bottom",
legend.title = NULL,
times,
ylim = c(0, 1),
xlim = NULL,
main = NULL
)
Arguments
df |
Plotting dataframe |
type |
Plot type ("KM" or "CIF") |
xlab |
x axis label |
ylab |
y axis label |
multiple_lines |
Whether multiple strata |
plot.event |
Events to plot |
event |
How to distinguish events ("col" or "linetype") |
lsize |
Line size |
fsize |
Font size |
col |
colours vector |
linetype |
Line types vector |
legend.pos |
Legend position |
legend.title |
Legend title |
times |
Time breaks |
ylim |
Y-axis limits |
xlim |
X-axis limits |
main |
Plot title |
Create CIF Plotting Data Frame
Description
Create CIF Plotting Data Frame
Usage
create_cif_dataframe(
fit,
gsep,
plot.event,
stratalabs,
conf.type = "log",
flip.CIF = FALSE,
eventlabs = NULL,
cov = NULL,
data = NULL
)
Arguments
fit |
CIF fit object |
gsep |
Group separator from fit_cif_model |
plot.event |
Events to plot |
stratalabs |
Strata labels |
conf.type |
Confidence interval type |
flip.CIF |
Whether to flip the CIF curve |
eventlabs |
Event labels |
cov |
Covariate for proper factor levels |
data |
Original data (for factor levels) |
Value
Data frame for plotting
Create Survival Fit for Risk Table in CIF
Description
Create Survival Fit for Risk Table in CIF
Usage
create_cif_risk_table_sfit(data, response, cov = NULL)
Arguments
data |
Input data |
response |
Time and status variables |
cov |
Covariate (optional) |
Value
Survival fit object for risk table
Create plotting dataframe for KM curves
Description
Create plotting dataframe for KM curves
Usage
create_km_dataframe(sfit, stratalabs, conf.curves = FALSE, conf.type = "log")
Arguments
sfit |
Survival fit object |
stratalabs |
Strata labels |
conf.curves |
Whether to include confidence intervals |
conf.type |
Confidence interval type |
Create risk table for survival plot
Description
Create risk table for survival plot
Usage
create_risk_table(
sfit,
times,
xlim,
stratalabs,
stratalabs.table = NULL,
strataname.table = "",
Numbers_at_risk_text = "At Risk",
multiple_lines = TRUE,
col = NULL,
fsize = 12,
nsize = 3
)
Arguments
sfit |
Survival fit object |
times |
Time points for risk table |
xlim |
X-axis limits |
stratalabs |
Strata labels |
stratalabs.table |
Table-specific strata labels |
strataname.table |
Table strata name |
Numbers_at_risk_text |
Text for numbers at risk |
multiple_lines |
Whether multiple strata |
col |
colours for strata |
fsize |
Font size |
nsize |
Number size in table |
fit crr model
Description
Wrapper function to fit a Fine and Gray competing risks model using function crr from package cmprsk
Usage
crrRx(f, data)
Arguments
f |
formula for the model. Currently the formula only works by using the name of the column in a dataframe. It does not work by using $ or [] notation. |
data |
dataframe containing data |
Value
a competing risk model with the call appended to the list
Examples
# From the crr help file:
set.seed(10)
ftime <- rexp(200)
fstatus <- sample(0:2,200,replace=TRUE)
cov <- matrix(runif(600),nrow=200)
dimnames(cov)[[2]] <- c('x1','x2','x3')
df <- data.frame(ftime,fstatus,cov)
m1 <- crrRx(as.formula('ftime+fstatus~x1+x2+x3'),df)
# Nicely output to report:
rm_mvsum(m1,data=df,showN = TRUE,vif=TRUE)
Column separator for paste operations
Description
Column separator for paste operations
Usage
csep()
Value
Character string ", "
Tumour size change over time
Description
Tumour size change over time
Longitudinal changes in tumour size since baseline for patients by changes in ctDNA status (clearance, decrease or increase) since baseline.
Usage
data('ctDNA')
Format
A data frame with 270 rows and 5 variables:
- id
Patient ID
- cohort
Study Cohort
- ctdna_status
Change in ctDNA since baseline
- time
Number of weeks on treatment
- size_change
Percentage change in tumour measurement
Source
https://www.nature.com/articles/s43018-020-0096-5
Remove duplicate reference level rows
Description
When combining adjusted and unadjusted data, reference levels appear twice. This keeps only the first occurrence per variable + level combination.
Usage
deduplicate_refs(tab)
Arguments
tab |
combined data frame from prepare_forest_data |
Value
data frame with duplicate reference rows removed
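A minimal sketch of keeping only the first reference row per variable and level, using duplicated(). The column names (variable, level, type, is_ref) are assumptions made for this example and may not match the columns produced by prepare_forest_data.

```r
# Toy combined table: each reference level appears once per estimate type
tab <- data.frame(
  variable = c("sex", "sex", "sex", "sex"),
  level    = c("Female", "Male", "Female", "Male"),
  type     = c("Adjusted", "Adjusted", "Unadjusted", "Unadjusted"),
  is_ref   = c(TRUE, FALSE, TRUE, FALSE)
)

# Flag reference rows whose variable + level combination was already seen
dup_ref <- tab$is_ref & duplicated(tab[c("variable", "level")])
tab_dedup <- tab[!dup_ref, ]
tab_dedup
```

Non-reference rows are left untouched, so adjusted and unadjusted estimates for the same level both remain in the table.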
Lower bound of non-centrality parameter confidence interval
Description
S3 generic to compute the lower bound of the confidence interval for the non-centrality parameter of a test statistic.
Usage
delta_l(htest, CIwidth)
Arguments
htest |
A hypothesis test object (with class indicating test type). |
CIwidth |
Confidence interval width. |
Value
A numeric lower bound.
Upper bound of non-centrality parameter confidence interval
Description
S3 generic to compute the upper bound of the confidence interval for the non-centrality parameter of a test statistic.
Usage
delta_u(htest, CIwidth)
Arguments
htest |
A hypothesis test object (with class indicating test type). |
CIwidth |
Confidence interval width. |
Value
A numeric upper bound.
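Confidence bounds for a non-centrality parameter are commonly found by inverting the non-central distribution function. The sketch below does this for a chi-squared statistic using uniroot(); it illustrates the general technique under that assumption and is not the package's delta_l or delta_u methods. The function name ncp_ci is hypothetical.

```r
# Hypothetical helper: CI for the ncp of a chi-squared statistic
ncp_ci <- function(stat, df, CIwidth = 0.95) {
  alpha <- 1 - CIwidth
  # P(X <= stat) under a non-central chi-squared; decreasing in ncp
  cdf_at <- function(ncp) pchisq(stat, df, ncp = ncp)
  solve_for <- function(target) {
    if (cdf_at(0) <= target) return(0)   # no positive solution: bound is 0
    uniroot(function(ncp) cdf_at(ncp) - target,
            lower = 0, upper = stat + 10,
            extendInt = "downX", tol = 1e-8)$root
  }
  c(lower = solve_for(1 - alpha / 2), upper = solve_for(alpha / 2))
}

ci <- ncp_ci(stat = 20, df = 2)
ci
```

The lower bound is the ncp at which the observed statistic sits at the upper tail quantile, and the upper bound is the ncp at which it sits at the lower tail quantile.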
Retrieve column numbers from spreadsheet columns specified as unquoted letters
Description
Retrieve column numbers from spreadsheet columns specified as unquoted letters
Usage
excelCol(...)
Arguments
... |
unquoted excel column headers (e.g. excelCol(A,CG,AA)) separated by commas |
Value
a numeric vector corresponding to columns in a spreadsheet
Examples
## Find the column numbers for excel columns AB, CE and BB
excelCol(AB,CE,bb)
## Get columns A through K, plus Z
excelCol(A-K,Z)
Retrieve spreadsheet column letter-names from column indices
Description
Creates a vector of spreadsheet-style letter-names corresponding to column numbers
Usage
excelColLetters(columnIndices)
Arguments
columnIndices |
vector of integer column indices |
Details
This is the inverse function of excelCol
Value
a character vector corresponding to the spreadsheet column headings
Examples
## Find the column numbers for excel columns AB, CE and BB
colIndices <- excelCol(AB,CE,bb)
## Go back to the column names
excelColLetters(colIndices)
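Spreadsheet column letters form a bijective base-26 numbering (A = 1, ..., Z = 26, AA = 27). The two helpers below sketch the conversion in each direction using quoted strings; letters_to_index and index_to_letters are hypothetical names for this illustration and do not reproduce the unquoted-argument or range handling of excelCol.

```r
# Letters to column number: treat each letter as a base-26 digit (A = 1)
letters_to_index <- function(x) {
  vapply(toupper(x), function(s) {
    digits <- match(strsplit(s, "")[[1]], LETTERS)
    sum(digits * 26^(rev(seq_along(digits)) - 1))
  }, numeric(1), USE.NAMES = FALSE)
}

# Column number to letters: peel off the last letter, then shift down
index_to_letters <- function(n) {
  vapply(n, function(i) {
    out <- character(0)
    while (i > 0) {
      r <- (i - 1) %% 26
      out <- c(LETTERS[r + 1], out)
      i <- (i - 1) %/% 26
    }
    paste(out, collapse = "")
  }, character(1))
}

idx <- letters_to_index(c("AB", "CE", "bb"))
index_to_letters(idx)
```

Subtracting one before taking the remainder is what makes the scheme bijective: there is no zero digit, so Z is followed by AA rather than by a two-letter code starting with a zero.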
Extract Gray's test results from CIF fit
Description
Extract Gray's test results from CIF fit
Usage
extract_grays_test(fit, plot.event = 1)
Arguments
fit |
CIF fit object or dataframe with test attribute |
plot.event |
Events to test |
Extract variable labels from labelled data frame
Description
Extract variable labels from data and return a data frame with labels
Usage
extract_labels(data, sep = "_")
Arguments
data |
the data frame to extract labels from |
sep |
character used to separate multiple labels, defaults to "_" |
Details
All variable names will be returned, even those with no labels. If the label attribute has length greater than one the values will be concatenated and returned as a single string separated by sep
Examples
# Set a few variable labels for ctDNA
data("ctDNA")
ctDNA <- ctDNA |> set_var_labels(
ctdna_status="detectable ctDNA",
cohort="A cohort label")
# Extract labels
extract_labels(ctDNA)
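The behaviour described in Details, where a label of length greater than one is collapsed with sep, can be sketched in base R. This is an illustration only; the toy data frame, the use of NA for unlabelled variables and the output column names are assumptions, not the package's implementation.

```r
df <- data.frame(id = 1:3, dose = c(10, 20, 30))
attr(df$dose, "label") <- c("Dose", "mg")   # a label of length two

# Collapse each label to a single string; unlabelled columns get NA here
labels <- vapply(df, function(col) {
  lab <- attr(col, "label")
  if (is.null(lab)) NA_character_ else paste(lab, collapse = "_")
}, character(1))

label_table <- data.frame(variable = names(df), label = unname(labels))
label_table
```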
Extract Function and Package Information from Current Document
Description
The function automatically detects the current R script file (works best in RStudio), parses the code to identify function calls, determines which packages they belong to, and creates a summary of all non-base R packages used in the script. It handles both namespace-qualified function calls (e.g., dplyr::filter) and regular function calls, while filtering out base R functions and control structures.
Usage
extract_package_details(ignore_comments = TRUE)
Arguments
ignore_comments |
Logical. If TRUE (default), ignores function calls within commented code (both R comments starting with # and HTML/XML comments). If FALSE, extracts functions from all code including commented sections. |
Details
This function analyses the current file (an R script, Rmd or qmd file) to extract information about all functions called within the code, identifies their associated packages, and returns a summary of packages used with version and citation information.
Value
A data frame with the following columns:
- package_name
Character. Name of the package
- functions_called
Character. Comma-separated list of functions called from this package
- package_version
Character. Version number of the installed package
- package_citation
Character. Formatted citation for the package
Note
Works best when run from RStudio with an active source file.
Requires that referenced packages are already loaded/installed.
Will not detect functions called through indirect methods (e.g., do.call()).
See Also
getAnywhere, packageVersion, citation
Examples
## Not run:
# Run this function from within an R script to analyze its dependencies
package_info <- extract_package_details()
# Include functions from commented code
package_info_all <- extract_package_details(ignore_comments = FALSE)
print(package_info)
## End(Not run)
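Identifying function calls in source code can be done with R's own parser. The sketch below pulls call names out of a code string via utils::getParseData and looks up where each is found on the search path; it illustrates the general approach and is not the implementation of extract_package_details. The code string and object names are made up for the example.

```r
code <- "x <- stats::median(c(1, 2, 3))\ny <- nchar('abc')  # toupper('ignored')"

# Parse with source references kept so token data are available
parsed <- parse(text = code, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Function calls are tagged SYMBOL_FUNCTION_CALL; comments are separate tokens
fn_calls <- unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"])

# First location on the search path for each function (NA if not found)
pkgs <- vapply(fn_calls, function(f) utils::find(f, mode = "function")[1],
               character(1))
pkgs
```

Because the parser treats comments as their own tokens, calls inside ordinary R comments are skipped without any extra filtering.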
Forward fill NA values (vectorized implementation)
Description
Efficiently fills NA values by carrying forward the last non-NA value. Uses vectorized operations for better performance than loop-based approaches.
Usage
fillNAs(x)
Arguments
x |
Vector with NAs to fill |
Value
Vector with NAs filled
Examples
## Not run:
fillNAs(c(1, NA, NA, 2, NA, 3)) # Returns: c(1, 1, 1, 2, 2, 3)
## End(Not run)
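One common vectorized way to carry the last non-NA value forward uses cumsum() to index into the non-missing values. The sketch below shows that idiom; fill_forward is a hypothetical name and the package's fillNAs may be written differently.

```r
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))         # position of the most recent non-NA value
  c(NA, x[!is.na(x)])[idx + 1]     # leading NAs map to the prepended NA
}

filled <- fill_forward(c(1, NA, NA, 2, NA, 3))
filled
```

Values before the first non-NA entry have nothing to carry forward and stay NA.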
Fit Competing Risks Model
Description
Fit Competing Risks Model
Usage
fit_cif_model(data, response, cov = NULL)
Arguments
data |
Input dataframe |
response |
Character vector with time and status column names |
cov |
Covariate column name (optional) |
Value
List containing fit object and group separator
Fit Kaplan-Meier survival curves
Description
Fit Kaplan-Meier survival curves
Usage
fit_km_model(data, response, cov = NULL, conf.type = "log")
Arguments
data |
Input data |
response |
Time and status variables |
cov |
Covariate (optional) |
conf.type |
Confidence interval type |
Create a forest plot using ggplot2 (DEPRECATED)
Description
Deprecated: Please use forestplotMV() instead.
Usage
forestplot2(
model,
conf.level = 0.95,
orderByRisk = TRUE,
colours = "default",
showEst = TRUE,
rmRef = FALSE,
logScale = getOption("reportRmd.logScale", TRUE),
nxTicks = 5
)
Arguments
model |
an object output from the glm or geeglm function, must be from a logistic regression |
conf.level |
controls the width of the confidence interval |
orderByRisk |
logical, should the plot be ordered by risk |
colours |
can specify colours for risks less than, equal to, and greater than 1.0. Default is red, black, green |
showEst |
logical, should the risks be displayed on the plot in text |
rmRef |
logical, should the reference levels be removed for the plot? |
logScale |
logical, should OR/RR be shown on log scale, defaults to TRUE, or reportRmd.logScale if set. See https://doi.org/10.1093/aje/kwr156 for why you may prefer a linear scale. |
nxTicks |
Number of tick marks supplied to the log_breaks function to produce |
Details
This function will be removed in a future version.
This function will accept a log or logistic regression fit from glm or geeglm, and display the OR or RR for each variable on the appropriate log scale.
Value
a plot object
Create a multivariable forest plot using ggplot2
Description
This function creates forest plots from fitted regression models, with optional inclusion of unadjusted estimates. It uses m_summary for robust data extraction and properly handles factor level ordering and reference levels.
Usage
forestplotMV(
model,
data = NULL,
include_unadjusted = FALSE,
conf.level = 0.95,
colours = "default",
showEst = TRUE,
showRef = TRUE,
digits = getOption("reportRmd.digits", 2),
logScale = getOption("reportRmd.logScale", TRUE),
nxTicks = 5,
showN = TRUE,
showEvent = TRUE,
xlim = NULL
)
Arguments
model |
an object output from the glm or geeglm function, must be from a logistic or log-link regression |
data |
dataframe containing your data (required if include_unadjusted = TRUE) |
include_unadjusted |
logical, should unadjusted estimates be included? Default is FALSE |
conf.level |
controls the width of the confidence interval (default 0.95) |
colours |
can specify colours for risks less than, equal to, and greater than 1.0. Default is green, black, red |
showEst |
logical, should the risks be displayed on the plot in text? Default is TRUE |
showRef |
logical, should reference levels be shown? Default is TRUE |
digits |
number of digits to use displaying estimates (default 2) |
logScale |
logical, should OR/RR be shown on log scale? Defaults to TRUE. See https://doi.org/10.1093/aje/kwr156 for why you may prefer a linear scale |
nxTicks |
Number of tick marks for x-axis (default 5) |
showN |
Show number of observations per variable and category (default TRUE) |
showEvent |
Show number of events per variable and category (default TRUE) |
xlim |
Numeric vector of length 2 specifying x-axis limits (e.g. c(0.2, 5)) |
Value
a ggplot object
Examples
data("pembrolizumab")
glm_fit <- glm(orr ~ change_ctdna_group + sex + age + l_size,
data = pembrolizumab, family = 'binomial')
# Adjusted only
forestplotMV(glm_fit, data = pembrolizumab)
# Both adjusted and unadjusted
forestplotMV(glm_fit, data = pembrolizumab, include_unadjusted = TRUE)
Create a univariable forest plot using ggplot2
Description
This function creates forest plots from univariable regression models. For new code, consider using forestplotMV() which can handle both adjusted and unadjusted estimates.
Usage
forestplotUV(
response,
covs,
data,
model = "glm",
id = NULL,
corstr = NULL,
family = NULL,
digits = getOption("reportRmd.digits", 2),
conf.level = 0.95,
colours = "default",
showEst = TRUE,
showRef = TRUE,
logScale = getOption("reportRmd.logScale", TRUE),
nxTicks = 5,
showN = TRUE,
showEvent = TRUE,
xlim = NULL
)
Arguments
response |
character vector with names of columns to use for response |
covs |
character vector with names of columns to use for covariates |
data |
dataframe containing your data |
model |
character string naming the type of model to fit (default "glm") |
id |
character vector which identifies clusters. Only used for geeglm |
corstr |
character string specifying the correlation structure. Only used for geeglm |
family |
description of the error distribution and link function to be used in the model |
digits |
number of digits to round to (default 2) |
conf.level |
controls the width of the confidence interval (default 0.95) |
colours |
can specify colours for risks less than, equal to, and greater than 1.0. Default is green, black, red |
showEst |
logical, should the risks be displayed on the plot in text? Default is TRUE |
showRef |
logical, should reference levels be shown? Default is TRUE |
logScale |
logical, should OR/RR be shown on log scale? Defaults to TRUE |
nxTicks |
Number of tick marks for x-axis (default 5) |
showN |
Show number of observations per variable and category (default TRUE) |
showEvent |
Show number of events per variable and category (default TRUE) |
xlim |
numeric vector of length 2 specifying x-axis limits (e.g., c(0.2, 5)). Confidence intervals extending beyond these limits will be shown with arrows. |
Value
a ggplot object
Examples
data("pembrolizumab")
forestplotUV(response = "orr",
covs = c("change_ctdna_group", "sex", "age", "l_size"),
data = pembrolizumab, family = 'binomial')
Combine univariable and multivariable forest plot (DEPRECATED)
Description
This function is deprecated. Please use forestplotMV() with include_unadjusted = TRUE instead.
Usage
forestplotUVMV(UVmodel, MVmodel, ...)
Arguments
UVmodel |
a UV model object output from the forestplotUV function |
MVmodel |
a MV model object output from the forestplotMV function |
... |
additional arguments (ignored) |
Parameter Estimation for the Box-Cox Transformation
Description
This function is copied from the geoR package which has been removed from the CRAN repository.
Usage
geoR_boxcoxfit(object, xmat, lambda, lambda2 = NULL, add.to.data = 0)
Arguments
object |
a vector with the data |
xmat |
a matrix with covariate values. Defaults to rep(1, length(y)). |
lambda |
numerical value(s) for the transformation parameter lambda. Used as the initial value in the function for parameter estimation. If not provided default values are assumed. If multiple values are passed the one with highest likelihood is used as initial value. |
lambda2 |
logical or numerical value(s) of the additional transformation (see Details below). Defaults to NULL. If TRUE this parameter is also estimated and the initial value is set to the absolute value of the minimum data. If a numerical value is provided it is used as the initial value. Multiple values are allowed as for lambda. |
add.to.data |
a constant value to be added to the data. |
Details
For more information see: https://cran.r-project.org/web/packages/geoR/index.html
Get beta label for model type
Description
Returns the appropriate coefficient label (OR, HR, RR, Estimate) for a given model class.
Usage
get_beta_label(model_class)
Arguments
model_class |
S3 class name (e.g., "rm_glm", "rm_coxph") |
Value
Character string for beta label
Extract cluster IDs from a fitted model
Description
S3 generic to extract the cluster/group identifier vector from a fitted model. Returns NULL for models that do not have a clustering structure.
Usage
get_cluster_ids(model)
Arguments
model |
A fitted model object. |
Value
A vector of cluster identifiers (one per observation), or NULL.
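The dispatch pattern described here (a generic whose fallback returns NULL when a model has no clustering structure) can be sketched as follows. All names in the block are hypothetical: cluster_ids_sketch and the class my_clustered_fit are invented for illustration and are not part of the package.

```r
# Hypothetical S3 generic with a NULL-returning default method.
cluster_ids_sketch <- function(model) UseMethod("cluster_ids_sketch")

# Fallback: models without a clustering structure give NULL
cluster_ids_sketch.default <- function(model) NULL

# Method for an invented class that stores its cluster ids in $id
cluster_ids_sketch.my_clustered_fit <- function(model) model$id

plain_fit     <- lm(mpg ~ wt, data = mtcars)
clustered_fit <- structure(list(id = c(1, 1, 2, 2)),
                           class = "my_clustered_fit")

cluster_ids_sketch(plain_fit)      # dispatches to the default method
cluster_ids_sketch(clustered_fit)  # dispatches to the class method
```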
Extract event counts from a fitted model
Description
S3 generic to extract event and sample size counts from a fitted model.
Usage
get_event_counts(model)
Arguments
model |
A fitted model object. |
Value
A named list with event count information, or NULL.
Get model class from model specifications
Description
Internal function that maps model type, family, and GEE usage to the appropriate S3 class for autoreg() dispatch.
Usage
get_model_class(type, family = NULL, gee = FALSE)
Arguments
type |
Model type (linear, logistic, poisson, negbin, ordinal, boxcox, coxph, crr) |
family |
Model family (gaussian, binomial, poisson, or NULL) |
gee |
Logical indicating if GEE model |
Value
Character string of S3 class name
Extract model coefficients
Description
S3 generic to extract raw coefficients from a fitted model.
Usage
get_model_coef(model)
Arguments
model |
A fitted model object. |
Value
A named numeric vector of model coefficients.
Extract data from a fitted model
Description
S3 generic to extract the data frame used to fit a model.
Usage
get_model_data(model)
Arguments
model |
A fitted model object. |
Value
A data frame, or NULL if data cannot be extracted.
Create Kaplan-Meier or cumulative incidence plots
Description
ggkmcif() was deprecated in version 0.1.2 and will be removed in a future version.
Please use ggkmcif2() instead.
Usage
ggkmcif(
response,
cov = NULL,
data,
type = NULL,
pval = TRUE,
HR = FALSE,
HR_pval = FALSE,
conf.curves = FALSE,
conf.type = "log",
table = TRUE,
times = NULL,
xlab = "Time",
ylab = NULL,
main = NULL,
stratalabs = NULL,
strataname = nicename(cov),
stratalabs.table = NULL,
strataname.table = strataname,
median.text = FALSE,
median.lines = FALSE,
median.CI = FALSE,
set.time.text = NULL,
set.time.line = FALSE,
set.time = 5,
set.time.CI = FALSE,
censor.marks = TRUE,
censor.size = 3,
censor.stroke = 1.5,
fsize = 10,
nsize = 3,
lsize = 1,
psize = 3.5,
median.size = 3,
median.pos = NULL,
median.lsize = 1,
set.size = 3,
set.pos = NULL,
set.lsize = 1,
ylim = c(0, 1),
col = NULL,
linetype = NULL,
xlim = NULL,
legend.pos = NULL,
pval.pos = NULL,
plot.event = 1,
event = c("col", "linetype"),
flip.CIF = FALSE,
cut = NULL,
eventlabs = NULL,
event.name = NULL,
Numbers_at_risk_text = "Numbers at risk",
HR.digits = 2,
HR.pval.digits = 3,
pval.digits = 3,
median.digits = 3,
set.time.digits = 3,
returns = FALSE,
print.n.missing = TRUE
)
Arguments
response |
Character vector with time and status column names |
cov |
Covariate column name (optional) |
data |
Input dataframe |
type |
Plot type ("KM" or "CIF", auto-detected if NULL) |
pval |
Whether to show p-values |
conf.curves |
Whether to show confidence bands |
table |
Whether to include risk table |
times |
Numeric vector of times for the x-axis |
xlab |
X-axis label |
ylab |
Y-axis label |
col |
colours vector |
plot.event |
Events to plot |
returns |
Whether to return list with plot and at risk table |
Value
See ggkmcif2() for return value details
Plot KM and CIF curves with ggplot
Description
This function will plot a KM or CIF curve with option to add the number at risk. You can specify if you want confidence bands, the hazard ratio, and pvalues, as well as the units of time used.
Usage
ggkmcif2(
response,
cov = NULL,
data,
pval = TRUE,
conf.curves = FALSE,
table = TRUE,
xlab = "Time",
ylab = NULL,
col = NULL,
times = NULL,
type = NULL,
plot.event = 1,
returns = FALSE,
...
)
Arguments
response |
Character vector with time and status column names |
cov |
Covariate column name (optional) |
data |
Input dataframe |
pval |
Whether to show p-values |
conf.curves |
Whether to show confidence bands |
table |
Whether to include risk table |
xlab |
X-axis label |
ylab |
Y-axis label |
col |
colours vector |
times |
Numeric vector of times for the x-axis |
type |
Plot type ("KM" or "CIF", auto-detected if NULL) |
plot.event |
Events to plot |
returns |
Whether to return list with plot and at risk table |
... |
Additional arguments; see ggkmcif2Parameters |
Details
Note that for proper PDF output of special characters the following code needs to be included in the first chunk of the Rmd: knitr::opts_chunk$set(dev="cairo_pdf")
Additional parameters passed to ggkmcif2
Description
This section documents the additional parameters for ggkmcif2.
Usage
ggkmcif2Parameters(
HR = FALSE,
HR_pval = FALSE,
conf.type = "log",
main = NULL,
stratalabs = NULL,
strataname,
stratalabs.table = NULL,
strataname.table = strataname,
median.text = FALSE,
median.lines = FALSE,
median.CI = FALSE,
set.time.text = NULL,
set.time.line = FALSE,
set.time = 5,
set.time.CI = FALSE,
censor.marks = TRUE,
censor.size = 2,
censor.stroke = 1.5,
censor.symbol = "|",
fsize,
nsize = 3,
lsize = 0.7,
psize = 3.5,
median.size = 3,
median.pos = NULL,
median.lsize = 1,
set.size = 3,
set.pos = NULL,
set.lsize = 1,
ylim = c(0, 1),
linetype = NULL,
xlim = NULL,
legend.pos,
legend.title = strataname,
pval.pos = NULL,
event = c("col", "linetype"),
flip.CIF = FALSE,
cut = NULL,
eventlabs = NULL,
event.name = NULL,
Numbers_at_risk_text = "At risk",
tbl.height = NULL,
HR.digits = 2,
HR.pval.digits = 3,
pval.digits = 3,
median.digits = 3,
set.time.digits = 3,
print.n.missing = TRUE,
returns = FALSE
)
Arguments
HR |
boolean to specify if you want hazard ratios included in the plot |
HR_pval |
boolean to specify if you want HR p-values in the plot |
conf.type |
One of "none", "plain", "log" (the default), "log-log" or "logit". Only enough of the string to uniquely identify it is necessary. The first option causes confidence intervals not to be generated. The second causes the standard intervals curve +- k *se(curve), where k is determined from conf.int. The log option calculates intervals based on the cumulative hazard or log(survival). The log-log option bases the intervals on the log hazard or log(-log(survival)), and the logit option on log(survival/(1-survival)). |
main |
String corresponding to main title. When NULL, uses "Kaplan-Meier Plot" for KM and "Cumulative Incidence Plot" for CIF |
stratalabs |
string corresponding to the labels of the covariate, when NULL will use the levels of the covariate |
strataname |
String of the covariate name; default is nicename(cov) |
stratalabs.table |
String corresponding to the levels of the covariate for the number at risk table, when NULL will use the levels of the covariate. Can use a string of "-" when the labels are long |
strataname.table |
String of the covariate name for the number at risk table; default is nicename(cov) |
median.text |
boolean to specify if you want the median values added to the legend (or as added text if there are no covariates), for KM only |
median.lines |
boolean to specify if you want the median values added as lines to the plot, for KM only |
median.CI |
boolean to specify if you want the 95% confidence interval with the median text (only for KM) |
set.time.text |
string for the text to add survival at a specified time (eg. year OS) |
set.time.line |
boolean to specify if you want the survival added as lines to the plot at a specified point |
set.time |
Numeric values of the specific time of interest, default is 5 (Multiple values can be entered) |
set.time.CI |
boolean to specify if you want the 95% confidence interval with the set time text |
censor.marks |
logical value. If TRUE, includes censor marks (only for KM curves) |
censor.size |
size of censor marks, default is 2 |
censor.stroke |
stroke of censor marks, default is 1.5 |
censor.symbol |
either a character or a number 0-24 specifying the ggplot shape to be used as the censor symbol |
fsize |
font size |
nsize |
font size for numbers in the numbers at risk table |
lsize |
line size |
psize |
size of the pvalue |
median.size |
size of the median text (Only when there are no covariates) |
median.pos |
vector of length 2 corresponding to the median position (Only when there are no covariates) |
median.lsize |
line size of the median lines |
set.size |
size of the survival at a set time text (Only when there are no covariates) |
set.pos |
vector of length 2 corresponding to the survival at a set point position (Only when there are no covariates) |
set.lsize |
line size of the survival at set points |
ylim |
vector of length 2 corresponding to limits of y-axis. Defaults to c(0, 1) |
linetype |
vector of line types; default is solid for all lines |
xlim |
vector of length 2 corresponding to limits of x-axis. Defaults to NULL |
legend.pos |
A string corresponding to the legend position ("left", "top", "right", "bottom", "none") or a numeric vector specifying the internal coordinates of the plot, e.g. c(0.5, 0.5) for the centre of the plot. |
legend.title |
a string for the title of the legend, defaults to strataname |
pval.pos |
vector of length 2 corresponding to the p-value position |
event |
String specifying whether the event should be mapped to colour ("col") or line type ("linetype") when plotting both events |
flip.CIF |
boolean to flip the CIF curve to start at 1 |
cut |
numeric value indicating where to divide a continuous covariate (default is the median) |
eventlabs |
String corresponding to the event type names |
event.name |
String corresponding to the label of the event types |
Numbers_at_risk_text |
String for the label of the number at risk, set Numbers_at_risk_text=NULL to remove |
tbl.height |
Height of the at risk table, relative to plot. To set the table to half the height of the plot use tbl.height = 0.5 |
HR.digits |
Number of digits printed of the hazard ratio |
HR.pval.digits |
Number of digits printed of the hazard ratio pvalue |
pval.digits |
Number of digits printed of the Gray's/log rank pvalue |
median.digits |
Number of digits printed for the median survival |
set.time.digits |
Number of digits printed of the probability at a specified time |
print.n.missing |
Logical, should the number of missing observations be shown |
returns |
Logical, if TRUE a list containing the plot and at risk table is returned |
Additional parameters passed to ggkmcif2
Description
This section documents the additional parameters for ggkmcif2.
Arguments
HR |
boolean to specify if you want hazard ratios included in the plot |
HR_pval |
boolean to specify if you want HR p-values in the plot |
conf.type |
One of "none", "plain", "log" (the default), "log-log" or "logit". Only enough of the string to uniquely identify it is necessary. The first option causes confidence intervals not to be generated. The second causes the standard intervals curve +- k *se(curve), where k is determined from conf.int. The log option calculates intervals based on the cumulative hazard or log(survival). The log-log option bases the intervals on the log hazard or log(-log(survival)), and the logit option on log(survival/(1-survival)). |
table.height |
Relative height of risk table (0-1) |
main |
String corresponding to main title. When NULL, uses "Kaplan-Meier Plot" for KM and "Cumulative Incidence Plot" for CIF |
stratalabs |
string corresponding to the labels of the covariate, when NULL will use the levels of the covariate |
strataname |
String of the covariate name; default is nicename(cov) |
stratalabs.table |
String corresponding to the levels of the covariate for the number at risk table, when NULL will use the levels of the covariate. Can use a string of "-" when the labels are long |
strataname.table |
String of the covariate name for the number at risk table; default is nicename(cov) |
median.text |
boolean to specify if you want the median values added to the legend (or as added text if there are no covariates), for KM only |
median.lines |
boolean to specify if you want the median values added as lines to the plot, for KM only |
median.CI |
boolean to specify if you want the 95% confidence interval with the median text (only for KM) |
set.time.text |
string for the text to add survival at a specified time (eg. year OS) |
set.time.line |
boolean to specify if you want the survival added as lines to the plot at a specified point |
set.time |
Numeric values of the specific time of interest, default is 5 (Multiple values can be entered) |
set.time.CI |
boolean to specify if you want the 95% confidence interval with the set time text |
censor.marks |
logical value. If TRUE, includes censor marks (only for KM curves) |
censor.size |
size of censor marks, default is 3 |
censor.stroke |
stroke of censor marks, default is 1.5 |
fsize |
font size |
nsize |
font size for numbers in the numbers at risk table |
lsize |
line size |
psize |
size of the pvalue |
median.size |
size of the median text (Only when there are no covariates) |
median.pos |
vector of length 2 corresponding to the median position (Only when there are no covariates) |
median.lsize |
line size of the median lines |
set.size |
size of the survival at a set time text (Only when there are no covariates) |
set.pos |
vector of length 2 corresponding to the survival at a set point position (Only when there are no covariates) |
set.lsize |
line size of the survival at set points |
ylim |
vector of length 2 corresponding to limits of y-axis. Defaults to NULL |
linetype |
vector of line types; default is solid for all lines |
xlim |
vector of length 2 corresponding to limits of x-axis. Defaults to NULL |
legend.pos |
Can be either a string corresponding to the legend position ("left","top", "right", "bottom", "none") or a vector of length 2 corresponding to the legend position (uses normalized units (ie the c(0.5,0.5) is the middle of the plot)) |
pval.pos |
vector of length 2 corresponding to the p-value position |
event |
String specifying whether the event should be mapped to colour ("col") or line type ("linetype") when plotting both events |
flip.CIF |
boolean to flip the CIF curve to start at 1 |
cut |
numeric value indicating where to divide a continuous covariate (default is the median) |
eventlabs |
String corresponding to the event type names |
event.name |
String corresponding to the label of the event types |
Numbers_at_risk_text |
String for the label of the number at risk |
HR.digits |
Number of digits printed of the hazard ratio |
HR.pval.digits |
Number of digits printed of the hazard ratio pvalue |
pval.digits |
Number of digits printed of the Gray's/log rank pvalue |
median.digits |
Number of digits printed for the median survival |
set.time.digits |
Number of digits printed of the probability at a specified time |
print.n.missing |
Logical, should the number of missing observations be shown |
returns |
Logical, if TRUE a list containing the plot and at risk table is returned |
Plot KM and CIF curves with ggplot
Description
This function will plot a KM or CIF curve with option to add the number at risk. You can specify if you want confidence bands, the hazard ratio, and pvalues, as well as the units of time used.
Arguments
response |
character vector with names of columns to use for response |
cov |
String specifying the column name of stratification variable |
data |
dataframe containing your data |
pval |
boolean to specify if you want p-values in the plot (Log Rank test for KM and Gray's test for CIF) |
conf.curves |
boolean to specify if you want confidence interval bands |
table |
Logical value. If TRUE, includes the number at risk table |
xlab |
String corresponding to xlabel. By default is "Time" |
ylab |
String corresponding to ylabel. When NULL uses "Survival probability" for KM curves, and "Probability of an event" for CIF |
col |
vector of colours |
times |
Numeric vector of times for the x-axis |
type |
string indicating the type of univariate model to fit. The function will try to guess what type you want based on your response. If you want to override this you can manually specify the type. Options include "KM" and "CIF" |
plot.event |
Which event(s) to plot (1,2, or c(1,2)) |
returns |
boolean indicating if a list with the objects should be returned. Default is FALSE and plot will be printed |
... |
for additional plotting arguments see ggkmcif2Parameters_2025 |
Details
Note that for proper PDF output of special characters the following code needs to be included in the first chunk of the Rmd: knitr::opts_chunk$set(dev="cairo_pdf")
Value
ggplot object; if table = FALSE then only curves are output; if table = TRUE then curves and risk table are output together
Examples
# Simple plot without confidence intervals
data("pembrolizumab")
ggkmcif2(response = c('os_time','os_status'),
cov='cohort',
data=pembrolizumab)
# Plot with median survival time
ggkmcif2(response = c('os_time','os_status'),
cov='sex',
data=pembrolizumab,
median.text = TRUE,median.lines=TRUE,conf.curves=TRUE)
# Plot with specified survival times and log-log CI
ggkmcif2(response = c('os_time','os_status'),
cov='sex',
data=pembrolizumab,
median.text = FALSE,set.time.text = 'mo OS',
set.time = c(12,24),conf.type = 'log-log',conf.curves=TRUE)
# KM plot with 95% CI and censor marks
ggkmcif2(c('os_time','os_status'),'sex',data = pembrolizumab, type = 'KM',
HR=TRUE, HR_pval = TRUE, conf.curves = TRUE,conf.type='log-log',
set.time.CI = TRUE, censor.marks=TRUE)
Combine components of a call to ggkmcif
Description
ggkmcif() was deprecated in version 0.1.2 and will be removed in a future version.
Usage
ggkmcif_paste(list_gg)
Arguments
list_gg |
A list of ggplot objects from |
Calculate global p-values for categorical variables
Description
S3 generic to compute global (Type II/III) p-values for categorical predictors in a fitted model.
Usage
gp(model, ...)
Arguments
model |
A fitted model object. |
... |
Additional arguments passed to methods. |
Value
A data frame with columns var and global_p.
Bold strings for HTML output
Description
Wraps strings in HTML bold formatting using inline CSS.
Usage
hbld(strings)
Arguments
strings |
Vector of strings to bold |
Value
Vector of strings wrapped in HTML bold span
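The idea of wrapping strings in an inline-CSS bold span can be sketched in base R as below. The function name hbld_sketch and the exact style attribute are assumptions made for illustration; the package's actual markup may differ.

```r
# Hypothetical sketch: wrap each string in an HTML span with inline CSS.
# The style string is an assumption, not taken from the package source.
hbld_sketch <- function(strings) {
  paste0("<span style=\"font-weight: bold;\">", strings, "</span>")
}

bolded <- hbld_sketch(c("Age", "Sex"))
bolded
```

Because paste0() is vectorised, the result has one wrapped element per input string.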
Format p-values for plot annotations
Description
Formats p-values specifically for display in plots (e.g., survival curves). Returns formatted string with "p = " or "p < " prefix.
Usage
lpvalue2(x, digits)
Arguments
x |
Numeric p-value |
digits |
Number of decimal places to display (default from context) |
Details
Formatting rules:
p < 10^-digits: returns "p < threshold" (e.g., "p < 0.001")
p >= threshold: returns "p = value" rounded to specified digits
Used by: ggkmcif2() for survival curve annotations in main.R and ggkmcif3.R
Value
Character string with "p = " or "p < " prefix
Examples
## Not run:
lpvalue2(0.0001, 3) # Returns: "p < 0.001"
lpvalue2(0.0456, 3) # Returns: "p = 0.046"
## End(Not run)
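The formatting rules listed under Details can be sketched in base R. This is an illustrative re-implementation under the stated rules only; format_p_sketch is a hypothetical name and the package function may handle edge cases differently.

```r
# Hypothetical re-implementation of the two formatting rules:
#   p <  10^-digits  ->  "p < threshold"
#   p >= threshold   ->  "p = value" rounded to `digits` decimals
format_p_sketch <- function(x, digits = 3) {
  threshold <- 10^(-digits)
  if (x < threshold) {
    paste0("p < ", format(threshold, scientific = FALSE))
  } else {
    paste0("p = ", formatC(x, digits = digits, format = "f"))
  }
}

small_p <- format_p_sketch(0.0001, 3)
usual_p <- format_p_sketch(0.0456, 3)
c(small_p, usual_p)
```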
Output a table for multivariate or univariate regression models
Description
A dataframe corresponding to a univariate or multivariate regression table. If for_plot = TRUE, estimates and confidence interval bounds will also be displayed separately for easy plotting.
Usage
m_summary(
model,
CIwidth = 0.95,
digits = 2,
vif = FALSE,
whichp = "levels",
for_plot = FALSE
)
Arguments
model |
model fit |
CIwidth |
width for confidence intervals, defaults to 0.95 |
digits |
number of digits to round estimates to, does not affect p-values |
vif |
boolean indicating if the variance inflation factor should be included. See details |
whichp |
string indicating whether you want to display p-values for levels within categorical data ("levels"), global p values ("global"), or both ("both"). Irrelevant for continuous predictors. When for_plot = TRUE, global p values will be displayed in a separate column from p values. If whichp = "levels", global p values will not be included in the outputted table. |
for_plot |
boolean indicating whether or not the function will be used for plotting. Default is FALSE |
Details
Global p-values are likelihood ratio tests for lm, glm and polr models. For lme models an attempt is made to re-fit the model using ML and, if successful, an LRT is used to obtain a global p-value. For coxph models the model is re-run without robust variances with and without each variable and an LRT is presented. If unsuccessful a Wald p-value is returned. For GEE and CRR models Wald global p-values are returned. For negative binomial models a deviance test is used.
If the variance inflation factor is requested (vif = TRUE) then a generalised VIF will be calculated in the same manner as the car package.
As of R 4.4.0 the likelihood profiles are included in base R.
The number of decimal places used to display the statistics can be changed with digits, but this will not change the display of p-values. If more significant digits are required for p-values then use tableOnly=TRUE and format as desired.
Examples
## Not run:
data("pembrolizumab")
uv_lm <- lm(age~sex,data=pembrolizumab)
m_summary(uv_lm, digits = 3, for_plot = FALSE)
mv_binom <- glm(orr~age+sex+cohort,family = 'binomial',data = pembrolizumab)
m_summary(mv_binom, whichp = "both", for_plot = TRUE)
## End(Not run)
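A global likelihood ratio p-value for a multi-level factor, as described in Details for glm models, can be sketched in base R by comparing the fitted model with one that omits the factor. The built-in warpbreaks data and the Poisson model below are assumptions chosen for illustration; this is not the package's own code.

```r
# Sketch: global LRT for a three-level factor (tension) in a glm.
full    <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
reduced <- update(full, . ~ . - tension)

# One test for the whole factor (2 df) rather than one per level
lrt      <- anova(reduced, full, test = "Chisq")
global_p <- lrt[["Pr(>Chi)"]][2]
global_p
```

The same value follows from the drop in deviance referred to a chi-squared distribution with degrees of freedom equal to the number of factor levels minus one.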
Match coefficient names to covariate indices
Description
Matches model coefficient names (including interactions) to original covariate indices from the model call. Handles centered variables and interaction terms. Returns a numeric encoding for sorting.
Usage
matchcovariate(betanames, ucall)
Arguments
betanames |
Vector of coefficient names from model |
ucall |
Vector of unique covariate names from model call |
Details
Changes:
Feb 21, 2019: Changed from charmatch to grepl for more reliable matching
Dec 14, 2020: Added space removal to support centered variables
Value
Numeric vector of encoded covariate indices, or -1 if matching fails
Extract data frame name from model call argument
Description
Attempts to identify the data frame object used in a model call by searching the global environment.
Usage
matchdata(dataArg)
Arguments
dataArg |
Data argument from model call |
Value
Character string of data frame name, or NULL if not found
Create model matrix from formula
Description
Extracts model matrix from formula, optionally separating response variables from predictors. Removes intercept column and cleans column names.
Usage
modelmatrix(f, data = NULL)
Arguments
f |
Formula object |
data |
Data frame (optional) |
Value
Model matrix or list of matrices (y and x) if response present
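The predictor side of this operation (build a design matrix from a formula, then drop the intercept column) can be sketched with base R's model.matrix(). The name mm_sketch and the use of mtcars are assumptions for illustration; the sketch does not separate the response or clean column names.

```r
# Hypothetical sketch: design matrix without the intercept column.
mm_sketch <- function(f, data) {
  x <- model.matrix(f, data = data)
  x[, colnames(x) != "(Intercept)", drop = FALSE]
}

x <- mm_sketch(mpg ~ wt + factor(cyl), mtcars)
colnames(x)
```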
Extract model term names
Description
S3 generic to extract predictor term names from a fitted model, excluding the intercept.
Usage
mterms(model)
Arguments
model |
A fitted model object. |
Value
A character vector of term names.
Get multivariate summary dataframe
Description
Returns a dataframe with the model summary and global p-value for multi-level variables.
Usage
mvsum(
model,
data,
digits = getOption("reportRmd.digits", 2),
showN = TRUE,
showEvent = TRUE,
markup = TRUE,
sanitize = TRUE,
nicenames = TRUE,
CIwidth = 0.95,
vif = TRUE
)
Arguments
model |
fitted model object |
data |
dataframe containing data |
digits |
number of digits to round to |
showN |
boolean indicating sample sizes should be shown for each comparison, can be useful for interactions |
showEvent |
boolean indicating if number of events should be shown. Only available for logistic. |
markup |
boolean indicating if you want latex markup |
sanitize |
boolean indicating if you want to sanitize all strings to not break LaTeX |
nicenames |
boolean indicating if you want to replace . and _ in strings with a space. |
CIwidth |
width for confidence intervals, defaults to 0.95 |
vif |
boolean indicating if the variance inflation factor should be included. See details |
Details
Global p-values are likelihood ratio tests for lm, glm and polr models. For lme models an attempt is made to re-fit the model using ML and, if successful, an LRT is used to obtain a global p-value. For coxph models the model is re-run without robust variances with and without each variable and an LRT is presented. If unsuccessful a Wald p-value is returned. For GEE and CRR models Wald global p-values are returned.
If the variance inflation factor is requested (vif = TRUE) then a generalised VIF will be calculated in the same manner as the car package.
VIF for competing risk models is computed by fitting a linear model with a dependent variable comprised of the sum of the model independent variables and then calculating VIF from this linear model.
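A generalised VIF of the kind described by Fox and Monette can be sketched in base R from the correlation matrix of the coefficient estimates. The function name gvif_sketch and the mtcars model are assumptions for illustration; this is a sketch of the general technique, not the package's code.

```r
# Hypothetical sketch of a generalised VIF computed from the correlation
# matrix of the coefficient estimates (intercept removed).
gvif_sketch <- function(model) {
  term_index  <- attr(model.matrix(model), "assign")
  keep        <- term_index != 0               # drop the intercept column
  R           <- cov2cor(vcov(model)[keep, keep, drop = FALSE])
  term_index  <- term_index[keep]
  term_labels <- attr(terms(model), "term.labels")
  det_R       <- det(R)

  rows <- lapply(seq_along(term_labels), function(i) {
    idx  <- which(term_index == i)             # columns belonging to term i
    gvif <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det_R
    data.frame(Covariate = term_labels[i], GVIF = gvif, Df = length(idx),
               Adjusted = gvif^(1 / (2 * length(idx))))
  })
  do.call(rbind, rows)
}

fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
gvif_sketch(fit)
```

For a term with a single coefficient the GVIF column reduces to the usual VIF; the Adjusted column applies the exponent 1/(2*Df) so that multi-coefficient terms can be compared across terms.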
References
John Fox & Georges Monette (1992) Generalized Collinearity Diagnostics, Journal of the American Statistical Association, 87:417, 178-183, DOI: 10.1080/01621459.1992.10475190
John Fox and Sanford Weisberg (2019). An R Companion to Applied Regression, Third Edition. Thousand Oaks CA: Sage.
Combine two table columns into a single column with levels of one nested within levels of the other.
Description
This function accepts a data frame (via the data argument) and combines two columns into a single column, with values from head_col serving as headers and values of to_col displayed underneath each header. The resulting table is then passed to outTable for printing and output. To use the grouped table as a data frame, specify tableOnly=TRUE. By default the headers will be bolded and the remaining values indented.
Usage
nestTable(
data,
head_col,
to_col,
colHeader = "",
caption = NULL,
indent = TRUE,
boldheaders = TRUE,
hdr_prefix = "",
hdr_suffix = "",
digits = getOption("reportRmd.digits", 2),
tableOnly = FALSE,
fontsize
)
Arguments
data |
dataframe |
head_col |
character value specifying the column name with the headers |
to_col |
character value specifying the column name to add the headers into |
colHeader |
character with the desired name of the first column. The default is to leave this empty for output or, for table only output to use the column name 'col1'. |
caption |
table caption |
indent |
Boolean should the original values in the to_col be indented |
boldheaders |
Boolean should the header column values be bolded |
hdr_prefix |
character value that will prefix headers |
hdr_suffix |
character value that will suffix headers |
digits |
number of digits to round numeric columns to, either a single number or a vector corresponding to the number of numeric columns |
tableOnly |
boolean indicating if the table should be formatted for printing or returned as a data frame |
fontsize |
PDF/HTML output only, manually set the table fontsize |
Details
Note that it is possible to combine multiple tables (more than two) with this function.
Value
A character vector of the table source code, unless tableOnly=TRUE in which case a data frame is returned
Examples
## Investigate models to predict baseline ctDNA and tumour size and display together
## (not clinically useful!)
data(pembrolizumab)
fit1 <- lm(baseline_ctdna~age+l_size+pdl1,data=pembrolizumab)
m1 <- rm_mvsum(fit1,tableOnly=TRUE)
m1$Response = 'ctDNA'
fit2 <- lm(l_size~age+baseline_ctdna+pdl1,data=pembrolizumab)
m2 <- rm_mvsum(fit2,tableOnly=TRUE)
m2$Response = 'Tumour Size'
nestTable(rbind(m1,m2),head_col='Response',to_col='Covariate')
Format model call as clean string
Description
Converts model call to string with single quotes instead of double quotes.
Usage
nicecall(model_call)
Arguments
model_call |
Call object from model |
Value
Character string of formatted call
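The conversion described here can be sketched in base R by deparsing the call and replacing double quotes. The name nicecall_sketch is hypothetical, and the call below is only quoted, never evaluated, so the data it names need not be loaded.

```r
# Hypothetical sketch: turn a call into one string with single quotes.
nicecall_sketch <- function(model_call) {
  txt <- paste(deparse(model_call), collapse = "")
  gsub("\"", "'", txt, fixed = TRUE)
}

# A quoted (unevaluated) call used purely as input
example_call <- quote(glm(orr ~ age, family = "binomial", data = pembrolizumab))
call_text <- nicecall_sketch(example_call)
call_text
```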
Order forest plot data by risk and factor levels
Description
Orders variables by their maximum estimate (descending) and levels within variables by their original order, with reference levels first.
Usage
order_forest_data(tab)
Arguments
tab |
data frame prepared by prepare_forest_data |
Details
When both Adjusted and Unadjusted rows are present, ordering is determined by the Adjusted estimates. When only Unadjusted rows are present (e.g., from forestplotUV), ordering uses those estimates directly.
Print tables to PDF/LaTeX, HTML or Word
Description
Output the table nicely to whatever format is appropriate. This is the output function used by the rm_* printing functions.
Usage
outTable(
tab,
row.names = NULL,
to_indent = numeric(0),
bold_headers = TRUE,
rows_bold = numeric(0),
bold_cells = NULL,
caption = NULL,
digits = getOption("reportRmd.digits", 2),
align,
applyAttributes = TRUE,
keep.rownames = FALSE,
nicenames = TRUE,
fontsize,
chunk_label,
format = NULL,
header_above = NULL
)
Arguments
tab |
a table to format |
row.names |
a string specifying the column name to assign to the rownames. If NULL (the default) then rownames are removed. |
to_indent |
numeric vector indicating which rows to indent in the first column. |
bold_headers |
boolean indicating if the column headers should be bolded |
rows_bold |
numeric vector indicating which rows to bold |
bold_cells |
array indices indicating which cells to bold. These will be in addition to rows bolded by rows_bold. |
caption |
table caption |
digits |
number of digits to round numeric columns to, either a single number or a vector corresponding to the number of numeric columns in tab |
align |
string specifying column alignment, defaults to left alignment of the first column and right alignment of all other columns. The align argument accepts a single string with 'l' for left, 'c' for centre and 'r' for right, with no separations. For example, to set the left column to be centred, the middle column right-aligned and the right column left aligned use: align='crl' |
applyAttributes |
boolean indicating if the function should use to_indent and bold_cells formatting attributes. This will only work properly if the dimensions of the table output from rm_covsum, rm_uvsum etc haven't changed. |
keep.rownames |
should the row names be included in the output |
nicenames |
boolean indicating if you want to replace . and _ in strings with a space |
fontsize |
PDF/HTML output only, manually set the table fontsize |
chunk_label |
only used when knitting to Word docs, to allow cross-referencing |
format |
if specified ('html','latex') will override the global pandoc setting |
header_above |
a named numeric vector specifying an extra header row above the column names, where the names are the labels and the values are the number of columns each label should span. |
Details
Entire rows can be bolded, or specific cells. Currently indentation refers to the first column only. By default, underscores in column names are converted to spaces. To disable this, set nicenames to FALSE.
Value
A character vector of the table source code, unless tableOnly=TRUE in which case a data frame is returned
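As a rough sketch of how the `align`, `header_above` and `nicenames` arguments are described above, the following base R code builds suitable values for a three-column table. The example table is made up, and the check that the header spans sum to the number of columns is an assumption about sensible input rather than documented behaviour.

```r
# A made-up results table with three columns
tab <- data.frame(
  Covariate = c("age", "sex"),
  Estimate  = c(1.2, 0.8),
  p_value   = c(0.03, 0.40)
)

# Alignment string: first column left, the rest right (the documented default)
align <- paste0("l", strrep("r", ncol(tab) - 1))

# Extra header row: names are the labels, values are the columns spanned
header_above <- c(" " = 1, "Model results" = 2)

# Assumption: the spans should cover every column exactly once
stopifnot(sum(header_above) == ncol(tab))

# What nicenames = TRUE is described as doing to column names
nice <- gsub("[._]", " ", names(tab))
```

These objects could then be passed as `outTable(tab, align = align, header_above = header_above)` inside an Rmarkdown chunk.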
Survival data: survival status and ctDNA levels for patients receiving pembrolizumab
Description
Survival data
Survival status and ctDNA levels for patients receiving pembrolizumab
Usage
data('pembrolizumab')
Format
A data frame with 94 rows and 15 variables:
- id
Patient ID
- age
Age at study entry
- sex
Patient Sex
- cohort
Study Cohort
- l_size
Target lesion size at baseline
- pdl1
PD-L1 percent
- tmb
log of TMB
- baseline_ctdna
Baseline ctDNA
- change_ctdna_group
Did ctDNA increase or decrease from baseline to cycle 3
- orr
Objective Response
- cbr
Clinical Beneficial Response
- os_status
Overall survival status
- os_time
Overall survival time in months
- pfs_status
Progression free survival status
- pfs_time
Progression free survival time in months
Source
https://www.nature.com/articles/s43018-020-0096-5
Plot multiple bivariate relationships in a single plot
Description
This function is designed to accompany rm_uvsum as a means of
visualising the results, and uses similar syntax.
Usage
plotuv(
response,
covs,
data,
showN = FALSE,
showPoints = TRUE,
na.rm = TRUE,
response_title = NULL,
return_plotlist = FALSE,
ncol = 2,
p_margins = c(0, 0.2, 1, 0.2),
bpThreshold = 20,
mixed = TRUE,
violin = FALSE,
position = c("dodge", "stack", "fill"),
use_labels = TRUE
)
Arguments
response |
character vector with names of columns to use for response |
covs |
character vector with names of columns to use for covariates |
data |
dataframe containing your data |
showN |
boolean indicating whether sample sizes should be shown on the plots |
showPoints |
boolean indicating whether individual data points should be shown when n>20 in a category |
na.rm |
boolean indicating whether NA values should be shown or removed |
response_title |
character value with title of the plot |
return_plotlist |
boolean indicating that the list of plots should be returned instead of a plot, useful for applying changes to the plot, see details |
ncol |
the number of columns of plots to be displayed in the ggarrange call, defaults to 2 |
p_margins |
sets the TRBL margins of the individual plots, defaults to c(0,0.2,1,.2) |
bpThreshold |
Default is 20, if there are fewer than 20 observations in a category then dotplots, as opposed to boxplots are shown. |
mixed |
should a mix of dotplots and boxplots be shown based on sample size? If FALSE then all categories will be shown as either dotplots or boxplots, according to bpThreshold and the smallest category size |
violin |
Show violin plots instead of boxplots. This will override bpThreshold and mixed. |
position |
for categorical variables, how barplots should be presented. Default is "dodge". If "stack" is chosen then n will not be shown. |
use_labels |
boolean, default is TRUE. If the variables have label attributes these will be shown in the plot instead of the variable names; if there are no labels then tidy versions of the variable names will be used. If use_labels=FALSE the variable names will be used. |
Details
Plots are displayed as follows:
If response is continuous:
- For a numeric predictor: scatterplot
- For a categorical predictor: boxplot if 20+ observations are available, otherwise dotplot with median line
If response is a factor:
- For a numeric predictor: boxplot if 20+ observations are available, otherwise dotplot with median line
- For a categorical predictor: barplot
Response variables are shown on the ordinate (y-axis) and covariates on the abscissa (x-axis).
Variable names are replaced by their labels if available, or by tidy versions if not. Set use_labels=FALSE to use the variable names.
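The plot-type rules above can be written as a small decision function. This is a sketch under stated assumptions: `choose_plot_type` is a hypothetical helper, and it applies the threshold to the smallest category (the behaviour described for mixed=FALSE) rather than deciding per category.

```r
# Hypothetical helper illustrating the documented rules; not part of the package
choose_plot_type <- function(response, covariate, bpThreshold = 20) {
  resp_num <- is.numeric(response)
  cov_num  <- is.numeric(covariate)
  if (resp_num && cov_num)   return("scatterplot")
  if (!resp_num && !cov_num) return("barplot")
  # One numeric and one categorical variable: the category sizes decide
  groups <- if (resp_num) covariate else response
  n_min  <- min(table(factor(groups)))  # factor() drops unused levels
  if (n_min >= bpThreshold) "boxplot" else "dotplot with median line"
}

choose_plot_type(iris$Sepal.Length, iris$Petal.Length)  # numeric vs numeric
choose_plot_type(iris$Sepal.Length, iris$Species)       # 50 per category
choose_plot_type(mtcars$mpg, factor(mtcars$cyl))        # smallest category is 7
choose_plot_type(factor(mtcars$am), factor(mtcars$cyl)) # both categorical
```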
Value
a list containing plots for each variable in covs
See Also
ggplot2::ggplot and
ggpubr::ggarrange
replace_plot_labels
Examples
## Run multiple univariate analyses on the pembrolizumab dataset to predict cbr and
## then visualise the relationships.
data("pembrolizumab")
rm_uvsum(data=pembrolizumab,
response='cbr',covs=c('age','sex','l_size','baseline_ctdna'))
plotuv(data=pembrolizumab, response='cbr',
covs=c('age','sex','l_size','baseline_ctdna'),showN=TRUE)
Extract and prepare forest plot data from m_summary output
Description
Extract and prepare forest plot data from m_summary output
Usage
prepare_forest_data(summary_output, model_type = "Adjusted", digits = 2)
Arguments
summary_output |
output from m_summary with for_plot = TRUE |
model_type |
character, "Adjusted" or "Unadjusted" |
digits |
number of digits for rounding |
Process CIF Median Values
Description
Process CIF Median Values
Usage
process_cif_medians(
fit,
plot.event,
stratalabs,
median.lines = FALSE,
median.text = FALSE,
median.digits = 3,
multiple_lines = FALSE
)
Arguments
fit |
CIF fit object |
plot.event |
Events to plot |
stratalabs |
Strata labels |
median.lines |
Whether to calculate for median lines |
median.text |
Whether to add median text |
median.digits |
Number of digits for median |
multiple_lines |
Whether there are multiple strata |
Value
List with updated stratalabs and median values
Process CIF Time-Specific Estimates
Description
Process CIF Time-Specific Estimates
Usage
process_cif_timepoints(
fit,
plot.event,
stratalabs,
set.time.text = NULL,
set.time = NULL,
set.time.line = FALSE,
set.time.CI = FALSE,
set.time.digits = 3,
multiple_lines = FALSE
)
Arguments
fit |
CIF fit object |
plot.event |
Events to plot |
stratalabs |
Strata labels |
set.time.text |
Text label for time points |
set.time |
Time points to evaluate |
set.time.line |
Whether to add lines |
set.time.CI |
Whether to include confidence intervals |
set.time.digits |
Number of digits |
multiple_lines |
Whether there are multiple strata |
Value
List with updated stratalabs and time-specific estimates
Process covariate variable (factor conversion, numeric cutoffs)
Description
Process covariate variable (factor conversion, numeric cutoffs)
Usage
process_covariate(data, cov, cut = NULL, stratalabs = NULL)
Arguments
data |
Input data |
cov |
Covariate column name |
cut |
Numeric cutoff for continuous variables |
stratalabs |
Custom strata labels |
Round and paste with parentheses (smart formatting)
Description
Rounds numeric values and formats as "value (lower, upper)" with intelligent formatting:
Values with |x| < 0.01 or |x| > 1000: scientific notation
Other values: standard rounding with trailing zeros
Usage
psthr(x, y = 2, compact = FALSE)
psthr0(x, digits = 2)
Arguments
x |
Numeric vector to round and format |
y |
Number of digits/significant figures (default 2) |
compact |
If TRUE, omit spaces for compact display (e.g. plots) |
Value
Character string with first element followed by remaining elements in parentheses
Functions
- psthr0(): Compact version without spaces (for plots)
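The formatting rule described above (scientific notation for very small or very large magnitudes, fixed rounding with trailing zeros otherwise) might look like the following in base R. `fmt_value` and `fmt_est_ci` are hypothetical names, the treatment of exact zeros is an assumption, and `digits` is interpreted here as decimal places.

```r
# Hypothetical helpers; a sketch of the described behaviour, not the package code
fmt_value <- function(v, digits = 2) {
  if (is.na(v)) return(NA_character_)
  if (v != 0 && (abs(v) < 0.01 || abs(v) > 1000)) {
    formatC(v, format = "e", digits = digits)   # scientific notation
  } else {
    formatC(v, format = "f", digits = digits)   # fixed, keeps trailing zeros
  }
}

fmt_est_ci <- function(x, digits = 2, compact = FALSE) {
  vals <- vapply(x, fmt_value, character(1), digits = digits)
  sep  <- if (compact) "," else ", "
  open <- if (compact) "(" else " ("
  paste0(vals[1], open, paste(vals[-1], collapse = sep), ")")
}

fmt_est_ci(c(1.5, 0.8, 2.3))
# "1.50 (0.80, 2.30)"
fmt_est_ci(c(0.0012, 0.0004, 2500), compact = TRUE)
# "1.20e-03(4.00e-04,2.50e+03)"
```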
Paste vector elements with parentheses
Description
Formats a vector as "first (second, third, ...)" where remaining elements are comma-separated inside parentheses.
Usage
pstprn(x, compact = FALSE)
pstprn0(x)
Arguments
x |
Vector of values (first element shown separately) |
Value
Character string with first element followed by remaining elements in parentheses
Functions
- pstprn0(): Compact version without spaces (for plots)
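A minimal base R sketch of the "first (second, third, ...)" layout follows. `paste_parens` is a hypothetical stand-in used only to show the expected shape of the output.

```r
# Hypothetical stand-in for the described behaviour
paste_parens <- function(x, compact = FALSE) {
  sep  <- if (compact) "," else ", "
  open <- if (compact) "(" else " ("
  paste0(x[1], open, paste(x[-1], collapse = sep), ")")
}

paste_parens(c(12, 5, 20))
# "12 (5, 20)"
paste_parens(c("45", "30%"), compact = TRUE)
# "45(30%)"
```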
Remove dollar sign prefix from column names
Description
Removes common prefix (before $) from interaction term column names. Used to clean up model matrix column names.
Usage
removedollar(x)
Arguments
x |
Character vector of column names |
Value
Character vector with dollar sign prefixes removed
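One way to strip such prefixes is a regular expression that removes every "name$" fragment. The helper below is a hypothetical illustration; the pattern is an assumption about what counts as a prefix.

```r
# Hypothetical illustration: drop every "<name>$" prefix, including those
# that appear on both sides of an interaction term
strip_dollar_prefix <- function(x) gsub("[[:alnum:]_.]+\\$", "", x)

cleaned <- strip_dollar_prefix(c("df$age", "df$sex:df$age", "l_size"))
cleaned
# "age" "sex:age" "l_size"
```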
Replace variable names with labels in ggplot
Description
If the data stored in a ggplot object has variable labels then this will replace the variable names with the variable labels. If no labels are set then the variable names will be tidied and a nicer version used.
Usage
replace_plot_labels(plot)
Arguments
plot |
output from a call to ggplot2 |
See Also
set_var_labels() for setting individual variable labels,
set_labels() for setting variable labels using a data frame,
extract_labels() for creating a data frame of all variable labels,
clear_labels() for removing variable labels
Examples
## Not run:
library(ggplot2)
data("pembrolizumab")
p <- ggplot(pembrolizumab,aes(x=change_ctdna_group,y=baseline_ctdna)) +
geom_boxplot()
replace_plot_labels(p)
pembrolizumab <- set_var_labels(pembrolizumab,
change_ctdna_group="Change in ctDNA group")
p <- ggplot(pembrolizumab,aes(x=change_ctdna_group,y=baseline_ctdna)) +
geom_boxplot()
replace_plot_labels(p)
# Can also be used with a pipe, but the expression needs to be wrapped in parentheses
(ggplot(pembrolizumab,aes(x=change_ctdna_group,y=baseline_ctdna)) +
geom_boxplot()) |> replace_plot_labels()
## End(Not run)
Summarize cumulative incidence by group
Description
Displays event counts and event rates at specified time points for the entire cohort and by group. Gray's test of differences in cumulative incidence is displayed.
Usage
rm_cifsum(
data,
time,
status,
group = NULL,
eventcode = 1,
cencode = 0,
eventtimes,
eventtimeunit,
eventtimeLbls = NULL,
CIwidth = 0.95,
unformattedp = FALSE,
na.action = "na.omit",
showCounts = TRUE,
showGraystest = TRUE,
digits = 2,
caption = NULL,
tableOnly = FALSE
)
Arguments
data |
data frame containing survival data |
time |
string indicating survival time variable |
status |
string indicating event status variable; must have at least 3 levels, e.g. 0 = censor, 1 = event, 2 = competing risk |
group |
string or character vector indicating the variable to group observations by |
eventcode |
numerical variable indicating event, default is 1 |
cencode |
numerical variable indicating censored observation, default is 0 |
eventtimes |
numeric vector specifying when event probabilities should be calculated |
eventtimeunit |
unit of time to suffix to the time column label if event probabilities are requested, should be plural |
eventtimeLbls |
if supplied, a vector the same length as eventtimes with descriptions (useful for displaying years with data provided in months) |
CIwidth |
width of the confidence interval for the event probabilities, default is 95% |
unformattedp |
boolean indicating if you would like the p-value to be returned unformatted (ie not rounded or prefixed with '<'). Should be used in conjunction with the digits argument. |
na.action |
default is to omit missing values, but can be set to throw an error using na.action='na.fail' |
showCounts |
boolean indicating if the at risk, events and censored columns should be output, default is TRUE |
showGraystest |
boolean indicating Gray's test should be included in the final table, default is TRUE |
digits |
the number of digits to report in the event probabilities, default is 2. |
caption |
table caption for markdown output |
tableOnly |
should a dataframe or a formatted object be returned |
Value
A character vector of the event table source code, unless tableOnly=TRUE in which case a data frame is returned
Examples
library(survival)
data(pbc)
# Event probabilities at various time points with replacement time labels
rm_cifsum(data=pbc,time='time',status='status',
eventtimes=c(1825,3650),eventtimeLbls=c(5,10),eventtimeunit='yr')
# Event probabilities by one group
rm_cifsum(data=pbc,time='time',status='status',group='trt',
eventtimes=c(1825,3650),eventtimeunit='day')
# Event probabilities by multiple groups
rm_cifsum(data=pbc,time='time',status='status',group=c('trt','sex'),
eventtimes=c(1825,3650),eventtimeunit='day')
Output a compact summary table
Description
Outputs a table formatted for PDF, Word or HTML output with summary statistics
Usage
rm_compactsum(
data,
xvars,
grp,
use_mean,
caption = NULL,
tableOnly = FALSE,
covTitle = "",
digits = 1,
digits.cat = 0,
nicenames = TRUE,
iqr = TRUE,
all.stats = FALSE,
pvalue = TRUE,
effSize = FALSE,
p.adjust = "none",
unformattedp = FALSE,
show.sumstats = FALSE,
show.tests = FALSE,
full = TRUE,
percentage = "col"
)
Arguments
data |
dataframe containing data |
xvars |
character vector with the names of covariates to include in table |
grp |
character with the name of the grouping variable |
use_mean |
logical indicating whether mean and standard deviation will be returned for continuous variables instead of median. Otherwise, can specify for individual variables using a character vector containing the names of covariates to return mean and sd for (if use_mean is not supplied, all covariates will have median summaries). See examples. |
caption |
character containing table caption (default is no caption) |
tableOnly |
logical, if TRUE then a dataframe is returned, otherwise a formatted printed object is returned (default is FALSE) |
covTitle |
character with the name of the covariate (predictor) column. The default is to leave this empty for output or, for table only output to use the column name 'Covariate' |
digits |
numeric specifying the number of digits for summarizing mean data. Digits can be specified for individual variables using a named vector in the format digits=c("var1"=2,"var2"=3). If a variable is not in the vector the default will be used for it (default is 1). See examples |
digits.cat |
numeric specifying the number of digits for the proportions when summarizing categorical data (default is 0) |
nicenames |
logical indicating if you want to replace . and _ in strings with a space |
iqr |
logical indicating if you want to display the interquartile range (Q1-Q3) as opposed to (min-max) in the summary for continuous variables |
all.stats |
logical indicating if all summary statistics (Q1, Q3 + min, max on a separate line) should be displayed. Overrides iqr |
pvalue |
logical indicating if you want p-values included in the table |
effSize |
logical indicating if you want effect sizes and their 95% confidence intervals included in the table. Effect sizes calculated include Cramer's V for categorical variables, and Cohen's d, Wilcoxon r, Epsilon-squared, or Omega-squared for numeric/continuous variables |
p.adjust |
p-adjustments to be performed |
unformattedp |
logical indicating if you would like the p-value to be returned unformatted (ie. not rounded or prefixed with '<'). Best used with tableOnly = T and outTable function. See examples |
show.sumstats |
logical indicating if the type of statistical summary (mean, median, etc) used should be shown. |
show.tests |
logical indicating if the type of statistical test and effect size (if effSize = TRUE) used should be shown in a column beside the p-values. |
full |
logical indicating if you want the full sample included in the table, ignored if grp is not specified |
percentage |
choice of how percentages are presented, either column (default) or row |
Details
Comparisons for categorical variables default to chi-square tests, but if there are counts of <5 then the Fisher Exact test will be used. For grouping variables with two levels, either t-tests (mean) or Wilcoxon tests (median) will be used for numerical variables. Otherwise, ANOVA (mean) or Kruskal-Wallis tests (median) will be used. The statistical test used can be displayed by specifying show.tests = TRUE. Statistical tests and effect sizes for grp and/or xvars with less than 2 counts in any level will not be shown.
Effect sizes are calculated as Cohen's d for between-group differences if the variable is summarised with the mean, otherwise Wilcoxon r if summarised with a median. Cramer's V is used for categorical variables, omega-squared is used for differences in means among more than two groups and epsilon-squared for differences in medians among more than two groups. Confidence intervals are calculated using bootstrapping.
tidyselect can only be used for xvars and grp arguments. Additional arguments (digits, use_mean) must be passed in using characters if variable names are used.
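The choice of test for numeric variables described above can be sketched with base R functions from the stats package. `choose_numeric_test` is a hypothetical helper; the specific variants used here (Welch t-test, classical one-way ANOVA via oneway.test, normal approximation for the rank-sum test) are assumptions and may differ from what the package runs internally.

```r
# Hypothetical helper mirroring the documented decision rule
choose_numeric_test <- function(x, grp, use_mean = FALSE) {
  grp <- factor(grp)
  if (nlevels(grp) == 2) {
    # Two groups: t-test for means, Wilcoxon rank-sum test for medians
    if (use_mean) t.test(x ~ grp) else wilcox.test(x ~ grp, exact = FALSE)
  } else {
    # More than two groups: ANOVA for means, Kruskal-Wallis for medians
    if (use_mean) oneway.test(x ~ grp, var.equal = TRUE) else kruskal.test(x ~ grp)
  }
}

choose_numeric_test(mtcars$mpg, mtcars$am, use_mean = TRUE)$method
choose_numeric_test(mtcars$mpg, mtcars$am)$method
choose_numeric_test(mtcars$mpg, mtcars$cyl, use_mean = TRUE)$method
choose_numeric_test(mtcars$mpg, mtcars$cyl)$method
```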
Value
A character vector of the table source code, unless tableOnly = TRUE in which case a data frame is returned. The output has the following attribute:
"description", which describes what is included in the output table and the type of statistical summary for each covariate. When applicable, the types of statistical tests used will be included. If effSize = TRUE, the effect sizes for each covariate will also be mentioned.
References
Smithson, M. (2002). Noncentral Confidence Intervals for Standardized Effect Sizes. (07/140 ed., Vol. 140). SAGE Publications. doi:10.4135/9781412983761.n4
Steiger, J. H. (2004). Beyond the F Test: Effect Size Confidence Intervals and Tests of Close Fit in the Analysis of Variance and Contrast Analysis. Psychological Methods, 9(2), 164–182. doi:10.1037/1082-989X.9.2.164
Kelley, T. L. (1935). An Unbiased Correlation Ratio Measure. Proceedings of the National Academy of Sciences - PNAS, 21(9), 554–559. doi:10.1073/pnas.21.9.554
Okada, K. (2013). Is Omega Squared Less Biased? A Comparison of Three Major Effect Size Indices in One-Way ANOVA. Behavior Research Methods, 40(2), 129-147.
Breslow, N. (1970). A generalized Kruskal-Wallis test for comparing K samples subject to unequal patterns of censorship. Biometrika, 57(3), 579-594.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect Size Estimates: Current Use, Calculations, and Interpretation. Journal of Experimental Psychology. General, 141(1), 2–18. doi:10.1037/a0024338
Examples
data("pembrolizumab")
rm_compactsum(data = pembrolizumab, xvars = c("age",
"change_ctdna_group", "l_size", "pdl1"), grp = "sex", use_mean = "age",
digits = c("age" = 2, "l_size" = 3), digits.cat = 1, iqr = TRUE,
show.tests = TRUE)
# Other Examples (not run)
## Include the summary statistic in the variable column
#rm_compactsum(data = pembrolizumab, xvars = c("age",
#"change_ctdna_group"), grp = "sex", use_mean = "age", show.sumstats=TRUE)
## To show effect sizes
#rm_compactsum(data = pembrolizumab, xvars = c("age",
#"change_ctdna_group"), grp = "sex", use_mean = "age", digits = 2,
#effSize = TRUE, show.tests = TRUE)
## To return unformatted p-values
#rm_compactsum(data = pembrolizumab, xvars = c("l_size",
#"change_ctdna_group"), grp = "cohort", effSize = TRUE, unformattedp = TRUE)
## Using tidyselect
#pembrolizumab |> rm_compactsum(xvars = c(age, sex, pdl1), grp = cohort,
#effSize = TRUE)
Outputs a descriptive covariate table
Description
Returns a data frame corresponding to a descriptive table.
Usage
rm_covsum(
data,
covs = NULL,
maincov = NULL,
caption = NULL,
tableOnly = FALSE,
covTitle = "",
digits = 1,
digits.cat = 0,
nicenames = TRUE,
IQR = FALSE,
all.stats = FALSE,
pvalue = TRUE,
effSize = FALSE,
p.adjust = "none",
unformattedp = FALSE,
show.tests = FALSE,
testcont = c("rank-sum test", "ANOVA"),
testcat = c("Chi-squared", "Fisher"),
full = TRUE,
include_missing = FALSE,
percentage = c("column", "row"),
dropLevels = TRUE,
excludeLevels = NULL,
numobs = NULL,
fontsize,
chunk_label,
xvars = NULL,
grp = NULL
)
Arguments
data |
dataframe containing data |
covs |
Covariate names to summarize. Accepts either a character vector or bare column names (tidyselect); see Details. |
maincov |
Grouping variable. Accepts either a character string or a bare column name (tidyselect); see Details. |
caption |
character containing table caption (default is no caption) |
tableOnly |
Logical, if TRUE then a dataframe is returned, otherwise a formatted printed object is returned (default). |
covTitle |
character with the names of the covariate (predictor) column. The default is to leave this empty for output or, for table only output to use the column name 'Covariate'. |
digits |
number of digits for summarizing mean data |
digits.cat |
number of digits for the proportions when summarizing categorical data (default: 0) |
nicenames |
boolean indicating if you want to replace . and _ in strings with a space |
IQR |
boolean indicating if you want to display the interquartile range (Q1,Q3) as opposed to (min,max) in the summary for continuous variables |
all.stats |
boolean indicating if all summary statistics (Q1,Q3 + min,max on a separate line) should be displayed. Overrides IQR. |
pvalue |
boolean indicating if you want p-values included in the table |
effSize |
boolean indicating if you want effect sizes included in the table. Can only be obtained if pvalue is also requested. Effect sizes calculated include Cramer's V for categorical variables, Cohen's d, Wilcoxon r, or Eta-squared for numeric/continuous variables. |
p.adjust |
p-adjustments to be performed. Uses the p.adjust function from base R |
unformattedp |
boolean indicating if you would like the p-value to be returned unformatted (ie not rounded or prefixed with '<'). Best used with tableOnly = T and outTable function. See examples. |
show.tests |
boolean indicating if the type of statistical test and effect size used should be shown in a column beside the pvalues. Ignored if pvalue=FALSE. |
testcont |
test of choice for continuous variables, one of rank-sum (default) or ANOVA |
testcat |
test of choice for categorical variables, one of Chi-squared (default) or Fisher |
full |
boolean indicating if you want the full sample included in the table, ignored if maincov is NULL |
include_missing |
Option to include NA values of maincov. NAs will not be included in statistical tests |
percentage |
choice of how percentages are presented, one of column (default) or row |
dropLevels |
logical, indicating if empty factor levels should be dropped from the output, default is TRUE. |
excludeLevels |
a named list of covariate levels to exclude from statistical tests in the form list(varname =c('level1','level2')). These levels will be excluded from association tests, but not the table. This can be useful for levels where there is a logical skip (ie not missing, but not presented). Ignored if pvalue=FALSE. |
numobs |
named list overriding the number of people you expect to have the covariate |
fontsize |
PDF/HTML output only, manually set the table fontsize |
chunk_label |
only used if output is to Word to allow cross-referencing |
xvars |
Alias for |
grp |
Alias for |
Details
Comparisons for categorical variables default to chi-square tests, but if there are counts of <5 then the Fisher Exact test will be used and if this is unsuccessful then a second attempt will be made computing p-values using MC simulation. If testcont='ANOVA' then the t-test with unequal variance will be used for two groups and an ANOVA will be used for three or more. The statistical test used can be displayed by specifying show.tests=TRUE.
Effect size can be obtained when p-value is requested.
Further formatting options are available using tableOnly=TRUE and outputting the table with a call to outTable.
A newer version of this function is rm_compactsum which is more flexible and displays fewer rows of output.
Tidyselect can be used for covs, maincov, xvars, and
grp arguments, allowing bare column names (e.g., c(age, sex))
in addition to character strings (e.g., c("age", "sex")).
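The fallback sequence for categorical comparisons (chi-square, then Fisher's exact test when counts are small, then Monte Carlo simulation if the exact test fails) can be sketched as follows. `categorical_p` is a hypothetical helper, and applying the <5 rule to the observed cell counts is an assumption about how "counts" is meant.

```r
# Hypothetical helper showing the documented fallback order
categorical_p <- function(x, g, n_sim = 2000) {
  tab <- table(x, g)
  if (all(tab >= 5)) {
    return(chisq.test(tab)$p.value)
  }
  tryCatch(
    fisher.test(tab)$p.value,
    # Second attempt: simulated p-value if the exact computation fails
    error = function(e) {
      fisher.test(tab, simulate.p.value = TRUE, B = n_sim)$p.value
    }
  )
}

p_small <- categorical_p(mtcars$cyl, mtcars$am)  # some cells < 5: Fisher
p_large <- categorical_p(mtcars$vs, mtcars$am)   # all cells >= 5: chi-square
```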
Value
A character vector of the table source code, unless tableOnly=TRUE in which case a data frame is returned
References
Ellis, P.D. (2010) The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511761676
Lakens, D. (2013) Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4; 863:1-12. doi:10.3389/fpsyg.2013.00863
See Also
covsum, fisher.test, chisq.test, wilcox.test, kruskal.test, anova, and outTable
Examples
data("pembrolizumab")
rm_covsum(data=pembrolizumab, maincov = 'orr',
covs=c('age','sex','pdl1','tmb','l_size','change_ctdna_group'),
show.tests=TRUE)
# To Show Effect Sizes
rm_covsum(data=pembrolizumab, maincov = 'orr',
covs=c('age','sex'),
effSize=TRUE)
# To make custom changes or change the fontsize in PDF/HTML
tab <- rm_covsum(data=pembrolizumab,maincov = 'change_ctdna_group',
covs=c('age','sex','pdl1','tmb','l_size'),show.tests=TRUE,tableOnly = TRUE)
outTable(tab, fontsize=7)
# To return unformatted p-values
tab <- rm_covsum(data=pembrolizumab, maincov = 'orr',
covs=c('age','sex','pdl1','tmb','l_size','change_ctdna_group'),
show.tests=TRUE,unformattedp=TRUE,tableOnly=TRUE)
outTable(tab,digits=5)
outTable(tab,digits=5, applyAttributes=FALSE) # remove bold/indent
Format a regression model nicely for 'Rmarkdown'
Description
Multivariable (or univariate) regression models are re-formatted for reporting and a global p-value is added for the evaluation of factor variables.
Usage
rm_mvsum(
model,
data,
digits = getOption("reportRmd.digits", 2),
covTitle = "",
showN = TRUE,
showEvent = TRUE,
CIwidth = 0.95,
vif = TRUE,
whichp = c("levels", "global", "both"),
caption = NULL,
tableOnly = FALSE,
p.adjust = "none",
unformattedp = FALSE,
nicenames = TRUE,
include_unadjusted = FALSE,
chunk_label,
fontsize
)
Arguments
model |
model fit |
data |
|
digits |
number of digits to round estimates to, does not affect p-values |
covTitle |
character with the names of the covariate (predictor) column. The default is to leave this empty for output or, for table only output to use the column name 'Covariate'. |
showN |
boolean indicating sample sizes should be shown for each comparison, can be useful for interactions |
showEvent |
boolean indicating if number of events should be shown. Only available for logistic. |
CIwidth |
width for confidence intervals, defaults to 0.95 |
vif |
boolean indicating if the variance inflation factor should be included. See details |
whichp |
string indicating whether you want to display p-values for levels within categorical data ("levels"), global p values ("global"), or both ("both"). Irrelevant for continuous predictors. |
caption |
table caption |
tableOnly |
boolean indicating if unformatted table should be returned |
p.adjust |
p-adjustments to be performed. Uses the p.adjust function from base R |
unformattedp |
boolean indicating if you would like the p-value to be returned unformatted (ie not rounded or prefixed with '<'). Should be used in conjunction with the digits argument. |
nicenames |
boolean indicating if you want to replace . and _ in strings with a space |
include_unadjusted |
Logical. If TRUE, includes univariate estimates alongside multivariable estimates. Default is FALSE. |
chunk_label |
Deprecated, previously used in Word to allow cross-referencing, this should now be done at the chunk level. |
fontsize |
PDF/HTML output only, manually set the table fontsize |
Details
Global p-values are likelihood ratio tests for lm, glm and polr models. For lme models an attempt is made to re-fit the model using ML and if successful LRT is used to obtain a global p-value. For lmer models (lme4), if the lmerTest package is installed, Satterthwaite-based p-values and F-test global p-values are used; otherwise Wald z-based p-values and chi-squared LRT global p-values are returned. For glmer models (lme4), Wald z-based p-values are used with chi-squared LRT global p-values. Estimates are exponentiated for binomial (OR) and poisson/negative binomial (RR) families. For coxph models the model is re-run without robust variances with and without each variable and a LRT is presented. If unsuccessful a Wald p-value is returned. For GEE and CRR models Wald global p-values are returned. For negative binomial models a deviance test is used.
If the variance inflation factor is requested (vif=TRUE, default) then a generalised VIF will be calculated in the same manner as the car package.
As of version 0.1.1 if global p-values are requested they will be included in the p-value column.
As of R 4.4.0 profile likelihood confidence intervals will be calculated automatically and there is no longer an option to force Wald tests.
The number of decimal places used to display the statistics can be changed with digits, but this will not change the display of p-values. If more significant digits are required for p-values then use tableOnly=TRUE and format as desired.
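To make the idea of a global likelihood ratio p-value for a factor concrete, here is a base R sketch that compares a model with and without the factor. It illustrates the general technique on the built-in iris data and is an assumption about the approach, not the package's internal code.

```r
# Full model and a reduced model that drops the factor of interest
full    <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
reduced <- update(full, . ~ . - Species)

# Likelihood ratio statistic and its degrees of freedom
lr_stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))
df_diff <- attr(logLik(full), "df") - attr(logLik(reduced), "df")

# Global p-value for Species (one test for all of its levels)
p_global <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
```

Here `df_diff` is 2 because Species has three levels, so the single global test replaces two level-specific comparisons.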
Value
A character vector of the table source code, unless tableOnly=TRUE in which case a data frame is returned
References
John Fox & Georges Monette (1992) Generalized Collinearity Diagnostics, Journal of the American Statistical Association, 87:417, 178-183, doi:10.1080/01621459.1992.10475190
John Fox and Sanford Weisberg (2019). An R Companion to Applied Regression, Third Edition. Thousand Oaks CA: Sage.
Examples
data("pembrolizumab")
glm_fit = glm(change_ctdna_group~sex:age+baseline_ctdna+l_size,
data=pembrolizumab,family = 'binomial')
rm_mvsum(glm_fit)
#linear model with p-value adjustment
lm_fit=lm(baseline_ctdna~age+sex+l_size+tmb,data=pembrolizumab)
rm_mvsum(lm_fit,p.adjust = "bonferroni")
#Coxph
require(survival)
res.cox <- coxph(Surv(os_time, os_status) ~ sex+age+l_size+tmb, data = pembrolizumab)
rm_mvsum(res.cox, vif=TRUE)
# lmer (lme4 mixed effects model) - single random intercept
if (require(lme4)){
lmer_fit <- lme4::lmer(age ~ sex + pdl1 + (1|cohort), data = pembrolizumab)
rm_mvsum(lmer_fit)
}
# lmer with multiple random effects and global p-values
if (require(lme4) && require(geepack)){
data(dietox, package = "geepack")
dietox$Cu <- as.factor(dietox$Cu)
lmer_fit2 <- lme4::lmer(Weight ~ Cu + Time + (1|Pig) + (1|Litter), data = dietox)
rm_mvsum(lmer_fit2, whichp = "both")
}
# glmer (binomial mixed effects model) - odds ratios
if (require(lme4)){
data(cbpp, package = "lme4")
glmer_fit <- lme4::glmer(cbind(incidence, size - incidence) ~ period + (1|herd),
data = cbpp, family = binomial)
rm_mvsum(glmer_fit)
}
# glmer.nb (negative binomial mixed effects model) - rate ratios
if (require(lme4) && require(geepack)){
data(dietox, package = "geepack")
dietox$Cu <- as.factor(dietox$Cu)
nb_fit <- lme4::glmer.nb(Weight ~ Cu + Time + (1|Pig), data = dietox)
rm_mvsum(nb_fit, whichp = "both")
}
Display event counts, expected event counts and logrank test of differences
Description
This is a wrapper function around the survdiff function to display overall event rates and group-specific rates along with the log-rank test of a difference in survival between groups in a single table suitable for markdown output. Median survival times are included by default but can be removed by setting median=FALSE.
Usage
rm_survdiff(
data,
time,
status,
covs,
strata,
includeVarNames = FALSE,
digits = 1,
showCols = c("N", "Observed", "Expected"),
CIwidth = 0.95,
conf.type = "log",
caption = NULL,
tableOnly = FALSE,
fontsize,
unformattedp = FALSE
)
Arguments
data |
data frame containing survival data |
time |
string indicating survival time variable |
status |
string indicating event status variable |
covs |
character vector indicating variables to group observations by |
strata |
string indicating the variable to stratify observations by |
includeVarNames |
boolean indicating if the variable names should be included in the output table, default is FALSE |
digits |
the number of digits in the survival rate |
showCols |
character vector indicating which of the optional columns to display, defaults to c('N','Observed','Expected') |
CIwidth |
width of the confidence interval for the median survival estimates, default is 0.95 (95%) |
conf.type |
type of confidence interval; default is "log" |
caption |
table caption |
tableOnly |
should a dataframe or a formatted object be returned |
fontsize |
PDF/HTML output only, manually set the table fontsize |
unformattedp |
boolean indicating if you would like the p-value to be returned unformatted (ie not rounded or prefixed with '<'). Should be used in conjunction with the digits argument. |
Value
A character vector of the survival table source code, unless tableOnly=TRUE in which case a data frame is returned
See Also
Examples
# Differences between sex
data("pembrolizumab")
rm_survdiff(data=pembrolizumab,time='os_time',status='os_status',
covs='sex',digits=1)
# Differences between sex, stratified by cohort
rm_survdiff(data=pembrolizumab,time='os_time',status='os_status',
covs='sex',strata='cohort',digits=1)
# Differences between sex/cohort groups
rm_survdiff(data=pembrolizumab,time='os_time',status='os_status',
covs=c('sex','cohort'),digits=1)
Summarise survival data by group
Description
Displays event counts, median survival time and survival rates at specified time points for the entire cohort and by group. The log-rank test of differences in survival curves is displayed.
Usage
rm_survsum(
data,
time,
status,
group = NULL,
survtimes = NULL,
survtimeunit,
survtimesLbls = NULL,
CIwidth = 0.95,
unformattedp = FALSE,
conf.type = "log",
na.action = "na.omit",
showCounts = TRUE,
showLogrank = TRUE,
eventProb = FALSE,
digits = getOption("reportRmd.digits", 2),
caption = NULL,
tableOnly = FALSE,
fontsize
)
Arguments
data |
data frame containing survival data |
time |
string indicating survival time variable |
status |
string indicating event status variable |
group |
string or character vector indicating the variable(s) to group observations by. If this is left as NULL (the default) then summaries are provided for the entire cohort. |
survtimes |
numeric vector specifying when survival probabilities should be calculated. |
survtimeunit |
unit of time to suffix to the time column label if survival probabilities are requested, should be plural |
survtimesLbls |
if supplied, a vector the same length as survtimes with descriptions (useful for displaying years with data provided in months) |
CIwidth |
width of the confidence interval for the survival probabilities, default is 0.95 (95%) |
unformattedp |
boolean indicating if you would like the p-value to be returned unformatted (ie not rounded or prefixed with '<'). Should be used in conjunction with the digits argument. |
conf.type |
type of confidence interval; default is "log" |
na.action |
default is to omit missing values, but can be set to throw an error using na.action='na.fail' |
showCounts |
boolean indicating if the at risk, events and censored columns should be output; default is TRUE |
showLogrank |
boolean indicating if the log-rank test statistic and p-value should be output; default is TRUE |
eventProb |
boolean indicating if event probabilities, rather than survival probabilities, should be displayed; default is FALSE |
digits |
the number of digits in the survival rate, default is 2, unless the reportRmd.digits option is set |
caption |
table caption for markdown output |
tableOnly |
should a dataframe or a formatted object be returned |
fontsize |
PDF/HTML output only, manually set the table fontsize |
Details
This summary table is supplied for simple group comparisons only. To examine
differences in groups with stratification see rm_survdiff. To
summarise differences in survival rates controlling for covariates see
rm_survtime.
Value
A character vector of the survival table source code, unless tableOnly=TRUE in which case a data frame is returned
See Also
Examples
# Simple median survival table
data("pembrolizumab")
rm_survsum(data=pembrolizumab,time='os_time',status='os_status')
# Survival table with yearly survival rates
rm_survsum(data=pembrolizumab,time='os_time',status='os_status',
survtimes=c(12,24),survtimesLbls=1:2, survtimeunit='yr')
#Median survival by group
rm_survsum(data=pembrolizumab,time='os_time',status='os_status',group='sex')
# Survival Summary by cohort, displayed in years
rm_survsum(data=pembrolizumab,time='os_time',status='os_status',
group="cohort",survtimes=seq(12,72,12),
survtimesLbls=seq(1,6,1),
survtimeunit='years')
# Survival Summary by Sex and ctDNA group
rm_survsum(data=pembrolizumab,time='os_time',status='os_status',
group=c('sex','change_ctdna_group'),survtimes=c(12,24),survtimeunit='mo')
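The quantities reported by rm_survsum (survival probabilities at chosen times, their confidence limits and, with eventProb=TRUE, event probabilities) can be reproduced by hand from a Kaplan-Meier fit. This is a sketch under stated assumptions: it uses the lung data from the survival package, times in days, and illustrative object names (km, s, km_tab); it is not the package's own code.

```r
library(survival)

# Kaplan-Meier curves by sex with 95% log-type confidence intervals
km <- survfit(Surv(time, status) ~ sex, data = lung,
              conf.int = 0.95, conf.type = "log")

# Survival estimates at one and two years (time is recorded in days)
s <- summary(km, times = c(365, 730))

km_tab <- data.frame(
  group      = as.character(s$strata),
  days       = s$time,
  survival   = round(s$surv, 2),
  lower      = round(s$lower, 2),
  upper      = round(s$upper, 2),
  event_prob = round(1 - s$surv, 2)  # complement of the survival probability
)
print(km_tab)
```

The event probability is simply one minus the survival probability, so its confidence limits are the complements of the survival limits with lower and upper swapped.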
Display survival rates and events for specified times
Description
This is a wrapper for the survfit function to output a tidy display for reporting. Either Kaplan Meier or Cox Proportional Hazards models may be used to estimate the survival probabilities.
Usage
rm_survtime(
data,
time,
status,
covs = NULL,
strata = NULL,
type = "KM",
survtimes,
survtimeunit,
strata.prefix = NULL,
survtimesLbls = NULL,
showCols = c("At Risk", "Events", "Censored"),
CIwidth = 0.95,
conf.type = "log",
na.action = "na.omit",
showCounts = TRUE,
digits = getOption("reportRmd.digits", 2),
caption = NULL,
tableOnly = FALSE,
fontsize
)
Arguments
data |
data frame containing survival data |
time |
string indicating survival time variable |
status |
string indicating event status variable |
covs |
character vector with the names of variables to adjust for in coxph fit |
strata |
string indicating the variable to group observations by. If this is left as NULL (the default) then event counts and survival rates are provided for the entire cohort. |
type |
survival function, if no covs are specified defaults to Kaplan-Meier, otherwise the Cox PH model is fit. Use type='PH' to fit a Cox PH model with no covariates. |
survtimes |
numeric vector specifying when survival probabilities should be calculated. |
survtimeunit |
unit of time to suffix to the time column label if survival probabilities are requested, should be plural |
strata.prefix |
character value describing the grouping variable |
survtimesLbls |
if supplied, a vector the same length as survtimes with descriptions (useful for displaying years with data provided in months) |
showCols |
character vector specifying which of the optional columns to display, defaults to c('At Risk','Events','Censored') |
CIwidth |
width of the confidence interval for the survival probabilities, default is 0.95 (95%) |
conf.type |
type of confidence interval; default is "log" |
na.action |
default is to omit missing values, but can be set to throw an error using na.action='na.fail' |
showCounts |
boolean indicating if the at risk, events and censored columns should be output, default is TRUE |
digits |
the number of digits in the survival rate, default is 2. |
caption |
table caption for markdown output |
tableOnly |
should a dataframe or a formatted object be returned |
fontsize |
PDF/HTML output only, manually set the table fontsize |
Details
If covariates are supplied then a Cox proportional hazards model is fit for the entire cohort and for each stratum. Otherwise the default is to use Kaplan-Meier estimates. Setting type = 'PH' will force a proportional hazards model.
Value
A character vector of the survival table source code, unless tableOnly=TRUE in which case a data frame is returned
See Also
Examples
# Kaplan-Meier survival probabilities with time displayed in years
data("pembrolizumab")
rm_survtime(data=pembrolizumab,time='os_time',status='os_status',
strata="cohort",type='KM',survtimes=seq(12,72,12),
survtimesLbls=seq(1,6,1),
survtimeunit='years')
# Cox Proportional Hazards survival probabilities
rm_survtime(data=pembrolizumab,time='os_time',status='os_status',
strata="cohort",type='PH',survtimes=seq(12,72,12),survtimeunit='months')
# Cox Proportional Hazards survival probabilities controlling for age
rm_survtime(data=pembrolizumab,time='os_time',status='os_status',
covs='age',strata="cohort",survtimes=seq(12,72,12),survtimeunit='months')
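For readers who want to see what a covariate-adjusted survival estimate looks like outside the wrapper, the sketch below fits a Cox model with survival::coxph and reads survival probabilities off survfit at fixed times. It is only one way to obtain such estimates: the choice of the lung data, the reference age of 60 and the object names are assumptions made for illustration, and the covariate values that rm_survtime itself uses are not stated in this manual.

```r
library(survival)

# Cox proportional hazards model adjusting for age
cox <- coxph(Surv(time, status) ~ age, data = lung)

# Predicted survival curve for a hypothetical 60-year-old
sf <- survfit(cox, newdata = data.frame(age = 60), conf.type = "log")

# Survival probabilities at one and two years (time is in days)
s <- summary(sf, times = c(365, 730))

cox_tab <- data.frame(
  days  = s$time,
  surv  = as.vector(s$surv),
  lower = as.vector(s$lower),
  upper = as.vector(s$upper)
)
print(cox_tab)
```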
Combine univariate and multivariable regression tables
Description
This function will combine rm_uvsum and rm_mvsum outputs into a single table. The tableOnly argument must be set to TRUE when tables to be combined are created. The resulting table will be in the same order as the uvsum table and will contain the same columns as the uvsum and mvsum tables, but the p-values will be combined into a single column. There must be a variable overlapping between the uvsum and mvsum tables and all variables in the mvsum table must also appear in the uvsum table.
Usage
rm_uv_mv(
uvsumTable,
mvsumTable,
covTitle = "",
vif = FALSE,
showN = FALSE,
showEvent = FALSE,
caption = NULL,
tableOnly = FALSE,
chunk_label,
fontsize
)
Arguments
uvsumTable |
Output from rm_uvsum, with tableOnly=TRUE |
mvsumTable |
Output from rm_mvsum, with tableOnly=TRUE |
covTitle |
character with the names of the covariate (predictor) column. The default is to leave this empty for output or, for table only output to use the column name 'Covariate'. |
vif |
boolean indicating if the variance inflation factor should be shown if present in the mvsumTable. Default is FALSE. |
showN |
boolean indicating if sample sizes should be displayed. |
showEvent |
boolean indicating if number of events (dichotomous outcomes) should be displayed. |
caption |
table caption |
tableOnly |
boolean indicating if unformatted table should be returned |
chunk_label |
only used if output is to Word to allow cross-referencing |
fontsize |
PDF/HTML output only, manually set the table fontsize |
Value
A character vector of the table source code, unless tableOnly=TRUE in which case a data frame is returned
See Also
Examples
require(survival)
data("pembrolizumab")
uvTab <- rm_uvsum(response = c('os_time','os_status'),
covs=c('age','sex','baseline_ctdna','l_size','change_ctdna_group'),
data=pembrolizumab,tableOnly=TRUE)
mv_surv_fit <- coxph(Surv(os_time,os_status)~age+sex+
baseline_ctdna+l_size+change_ctdna_group, data=pembrolizumab)
mvTab <- rm_mvsum(mv_surv_fit,tableOnly=TRUE)
rm_uv_mv(uvTab,mvTab,tableOnly=TRUE)
#linear model
uvtab<-rm_uvsum(response = 'baseline_ctdna',
covs=c('age','sex','l_size','pdl1','tmb'),
data=pembrolizumab,tableOnly=TRUE)
lm_fit=lm(baseline_ctdna~age+sex+l_size+tmb,data=pembrolizumab)
mvtab<-rm_mvsum(lm_fit,tableOnly = TRUE)
rm_uv_mv(uvtab,mvtab,tableOnly=TRUE)
#logistic model
uvtab<-rm_uvsum(response = 'os_status',
covs=c('age','sex','l_size','pdl1','tmb'),
data=pembrolizumab,family = binomial,tableOnly=TRUE)
logis_fit<-glm(os_status~age+sex+l_size+pdl1+tmb,data = pembrolizumab,family = 'binomial')
mvtab<-rm_mvsum(logis_fit,tableOnly = TRUE)
rm_uv_mv(uvtab,mvtab,tableOnly=TRUE)
Output several univariate models nicely in a single table
Description
A table with the model parameters from running separate univariate models on each covariate. For factors with more than two levels a global p-value is returned.
Usage
rm_uvsum(
response,
covs,
data,
digits = getOption("reportRmd.digits", 2),
covTitle = "",
caption = NULL,
tableOnly = FALSE,
removeInf = FALSE,
p.adjust = "none",
unformattedp = FALSE,
whichp = c("levels", "global", "both"),
chunk_label,
gee = FALSE,
id = NULL,
corstr = NULL,
family = NULL,
type = NULL,
offset = NULL,
strata = 1,
nicenames = TRUE,
showN = TRUE,
showEvent = TRUE,
CIwidth = 0.95,
reflevel = NULL,
returnModels = FALSE,
fontsize,
forceWald = FALSE
)
Arguments
response |
string vector with name of response |
covs |
character vector with the names of columns to fit univariate models to |
data |
dataframe containing data |
digits |
number of digits to round estimates and CI to. Does not affect p-values. |
covTitle |
character with the names of the covariate (predictor) column. The default is to leave this empty for output or, for table only output to use the column name 'Covariate'. |
caption |
character containing table caption (default is no caption) |
tableOnly |
boolean indicating if unformatted table should be returned |
removeInf |
boolean indicating if infinite estimates should be removed from the table |
p.adjust |
p-adjustments to be performed. Uses the p.adjust function from base R |
unformattedp |
boolean indicating if you would like the p-value to be returned unformatted (ie not rounded or prefixed with '<'). Should be used in conjunction with the digits argument. |
whichp |
string indicating whether you want to display p-values for levels within categorical data ("levels"), global p values ("global"), or both ("both"). Irrelevant for continuous predictors. |
chunk_label |
only used if output is to Word to allow cross-referencing |
gee |
boolean indicating if gee models should be fit to account for correlated observations. If TRUE then the id argument must specify the column in the data which indicates the correlated clusters. |
id |
character vector which identifies clusters. Only used for geeglm |
corstr |
character string specifying the correlation structure. Only used for geeglm. The following are permitted: '"independence"', '"exchangeable"', '"ar1"', '"unstructured"' and '"userdefined"' |
family |
description of the error distribution and link function to be used in the model. Only used for geeglm |
type |
string indicating the type of univariate model to fit. The function will try to guess what type you want based on your response. If you want to override this you can manually specify the type. Options include "linear", "logistic", "poisson", "coxph", "crr", "boxcox", "ordinal", "geeglm" |
offset |
string specifying the offset term to be used for Poisson or negative binomial regression. Example: offset="log(follow_up)" |
strata |
character vector of covariates to stratify by. Only used for coxph and crr |
nicenames |
boolean indicating if you want to replace . and _ in strings with a space |
showN |
boolean indicating if you want to show sample sizes |
showEvent |
boolean indicating if you want to show number of events. Only available for logistic. |
CIwidth |
width of confidence interval, default is 0.95 |
reflevel |
manual specification of the reference level. Only used for ordinal regression. This will allow you to see which model is not fitting if the function throws an error |
returnModels |
boolean indicating if a list of fitted models should be returned. If this is TRUE then the models will be returned, but the output will be suppressed. In addition to the model elements a data element will be appended to each model so that the fitted data can be examined, if necessary. See Details |
fontsize |
PDF/HTML output only, manually set the table fontsize |
forceWald |
|
Details
Global p-values are likelihood ratio tests (LRT) for lm, glm and polr models. For lme models an attempt is made to re-fit the model using ML and, if successful, an LRT is used to obtain a global p-value. For coxph models the model is re-run without robust variances, with and without each variable, and an LRT is presented. If unsuccessful, a Wald p-value is returned. For GEE and CRR models Wald global p-values are returned.
As of version 0.1.1 if global p-values are requested they will be included in the p-value column.
The number of decimal places used to display the statistics can be changed with digits, but this will not change the display of p-values. If more significant digits are required for p-values then use tableOnly=TRUE and format as desired.
tidyselect can only be used for the response and covs arguments. Additional arguments must be passed in as character strings.
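The likelihood ratio test behind a global p-value, as described in the Details above for glm models, can be sketched in base R by comparing the model with and without the factor. This is an illustration only: it uses the built-in mtcars data, and the object names (full, reduced, p_global) are not part of reportRmd.

```r
# Logistic model with a three-level factor as the only covariate
full <- glm(am ~ factor(cyl), data = mtcars, family = binomial)

# The same model with the factor removed (intercept only)
reduced <- update(full, . ~ . - factor(cyl))

# Likelihood ratio test comparing the two nested models
lrt <- anova(reduced, full, test = "LRT")
p_global <- lrt[["Pr(>Chi)"]][2]

# Equivalent calculation from the change in deviance
p_manual <- pchisq(deviance(reduced) - deviance(full),
                   df = df.residual(reduced) - df.residual(full),
                   lower.tail = FALSE)

print(c(p_global = p_global, p_manual = p_manual))
```

A single global p-value tests all levels of the factor at once, whereas the level-wise p-values each compare one level against the reference level.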
Value
A character vector of the table source code, unless tableOnly=TRUE in which case a data frame is returned
See Also
lm,glm,
cmprsk::crr,
survival::coxph,
nlme::lme,
geepack::geeglm,
MASS::glm.nb
Examples
# Examples are for demonstration and are not meaningful
# Coxph model with 90% CI
data("pembrolizumab")
rm_uvsum(response = c('os_time','os_status'),
covs=c('age','sex','baseline_ctdna','l_size','change_ctdna_group'),
data=pembrolizumab,CIwidth=.9)
# Linear model with default 95% CI
rm_uvsum(response = 'baseline_ctdna',
covs=c('age','sex','l_size','pdl1','tmb'),
data=pembrolizumab)
# Logistic model with default 95% CI
rm_uvsum(response = 'os_status',
covs=c('age','sex','l_size','pdl1','tmb'),
data=pembrolizumab,family = binomial)
# Fitted models returned as a model list (the model type is guessed from the response)
mList <- rm_uvsum(response = 'baseline_ctdna',
covs=c('age','sex','l_size','pdl1','tmb'),
data=pembrolizumab, returnModels=TRUE)
# GEE on correlated outcomes
data("ctDNA")
rm_uvsum(response = 'size_change',
covs=c('time','ctdna_status'),
gee=TRUE,
id='id', corstr="exchangeable",
family=gaussian("identity"),
data=ctDNA,showN=TRUE)
# Using tidyselect
pembrolizumab |> rm_uvsum(response = sex,
covs = c(age, cohort))
Round with sprintf formatting
Description
Rounds values using sprintf for precise decimal formatting. Used internally by psthr0().
Usage
round_sprintf(value, digits)
Arguments
value |
Numeric value to round |
digits |
Number of decimal places |
Value
Character string with exactly 'digits' decimal places
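The difference between numeric rounding and sprintf-based formatting is easiest to see side by side. The helper below, round_sprintf_sketch, is a hypothetical stand-in written for this illustration and is not the package's source; it assumes the behaviour described above, namely a character result with a fixed number of decimal places.

```r
# Hypothetical stand-in: format a number with a fixed number of decimals
round_sprintf_sketch <- function(value, digits) {
  sprintf(paste0("%.", digits, "f"), value)
}

fixed_a <- round_sprintf_sketch(2, 2)        # "2.00"
fixed_b <- round_sprintf_sketch(3.14159, 3)  # "3.142"

# round() returns a number, so trailing zeros are lost when printed
plain_a <- as.character(round(2, 2))         # "2"

print(c(fixed_a, fixed_b, plain_a))
```

Keeping trailing zeros matters in tables, where every cell in a column should show the same number of decimal places.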
Output a scrollable table
Description
This function accepts the output of a call to knitr::kable or reportRmd::outTable and, if the output format is HTML, will produce a scrollable table. Otherwise a regular table will be output for pandoc/LaTeX.
Usage
scrolling_table(knitrTable, pixelHeight = 500)
Arguments
knitrTable |
output from a call to knitr::kable or outTable |
pixelHeight |
the height of the scroll box in pixels, default is 500 |
Examples
data("pembrolizumab")
tab <- rm_covsum(data=pembrolizumab,maincov = 'change_ctdna_group',
covs=c('age','cohort','sex','pdl1','tmb','l_size'),full=FALSE)
scrolling_table(tab,pixelHeight=300)
Set variable labels
Description
Assign variable labels to a data.frame from a lookup table.
Usage
set_labels(data, names_labels)
Arguments
data |
data frame to be labelled |
names_labels |
data frame with column 1 containing variable names from data and column 2 containing variable labels. Other columns will be ignored. |
Details
Useful if variable labels have been imported from a data dictionary. The first column in names_labels must contain the variable name and the second column the variable label. The column names are not used.
If no label is provided then the existing label will not be changed. To remove a label set the label to NA.
See Also
set_var_labels() for setting individual variable labels,
extract_labels() for creating a data frame of all variable labels,
clear_labels() for removing variable labels
Examples
data("ctDNA")
# create data frame with labels
lbls <- data.frame(c1=c('cohort','size_change'),
c2=c('Cancer cohort','Change in tumour size'))
# set labels and return labelled data frame
set_labels(ctDNA,lbls)
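The lookup-table rules in the Details (first column is the variable name, second is the label, a missing entry leaves the label unchanged, NA removes it) can be sketched in base R. This sketch assumes labels are stored in the conventional "label" attribute of each column, which this manual does not confirm; apply_labels_sketch and the toy data are hypothetical and are not the package's implementation.

```r
# Hypothetical helper: apply labels from a two-column lookup table
apply_labels_sketch <- function(data, names_labels) {
  for (i in seq_len(nrow(names_labels))) {
    v   <- as.character(names_labels[[1]][i])
    lbl <- names_labels[[2]][i]
    if (!v %in% names(data)) next          # ignore names not in the data
    # NA removes the label, anything else sets it
    attr(data[[v]], "label") <- if (is.na(lbl)) NULL else as.character(lbl)
  }
  data
}

toy <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(0.1, 0.2, 0.3))
attr(toy$b, "label") <- "Old label for b"
attr(toy$c, "label") <- "Label for c"

lookup <- data.frame(c1 = c("a", "b"), c2 = c("Alpha", NA))
labelled <- apply_labels_sketch(toy, lookup)

# a gains a label, b loses its label, c is untouched
str(lapply(labelled, attr, "label"))
```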
Set variable labels
Description
Set variable labels for a data frame using name-label pairs.
Usage
set_var_labels(data, ...)
Arguments
data |
data frame containing variables to be labelled |
... |
Name-label pairs: the name gives the name of the column in the output and the label is a character vector of length one. |
Details
If no label is provided for a variable then the existing label will not be changed. To remove a label set the label to NA.
See Also
set_labels() for setting variable labels using a data frame,
extract_labels() for creating a data frame of all variable labels,
clear_labels() for removing variable labels
Examples
# set labels using name-label pairs
# and return labelled data frame
data("ctDNA")
ctDNA |> set_var_labels(
ctdna_status="detectable ctDNA",
cohort="A cohort label")
Validate and prepare input data
Description
Validate and prepare input data
Usage
validate_and_prepare_data(data, response, cov = NULL, print.n.missing = TRUE)
Arguments
data |
Input dataframe |
response |
Character vector with time and status column names |
cov |
Covariate column name (optional) |
print.n.missing |
Whether to print missing data message |
Create a summary table for an individual covariate
Description
Create a summary table for an individual covariate
Usage
xvar_function(
xvar,
data,
grp,
covTitle = "",
digits = 1,
digits.cat = 0,
iqr = TRUE,
all.stats = FALSE,
pvalue = TRUE,
effSize = FALSE,
show.tests = FALSE,
percentage = "col"
)
Arguments
xvar |
character with the name of covariate to include in table |
data |
dataframe containing data |
grp |
character with the name of the grouping variable |
covTitle |
character with the name of the covariate (predictor) column. The default is to leave this empty for output or, for table only output to use the column name 'Covariate' |
digits |
numeric specifying the number of digits for summarizing mean data. Otherwise, can specify for individual covariates using a vector of digits where each element is named using the covariate name. If a covariate is not in the vector the default will be used for it (default is 1). See examples |
digits.cat |
numeric specifying the number of digits for the proportions when summarizing categorical data (default is 0) |
iqr |
logical indicating if you want to display the interquartile range (Q1, Q3) as opposed to (min, max) in the summary for continuous variables |
all.stats |
logical indicating if all summary statistics (Q1, Q3 + min, max on a separate line) should be displayed. Overrides iqr |
pvalue |
logical indicating if you want p-values included in the table |
effSize |
logical indicating if you want effect sizes and their 95% confidence intervals included in the table. Effect sizes calculated include Cramer's V for categorical variables, and Cohen's d, Wilcoxon r, Epsilon-squared, or Omega-squared for numeric/continuous variables |
show.tests |
logical indicating if the type of statistical test and effect size (if effSize = TRUE) used should be shown in a column beside the p-values. Ignored if pvalue = FALSE |
percentage |
choice of how percentages are presented, either column (default) or row |
Value
A data frame is returned