Most subsequent examples build on this simple rifttable:
library(rifttable)
library(dplyr) # for data management, e.g., mutate()
library(tibble) # for constructing a tibble, e.g. via tribble()
data(breastcancer, package = "risks")
design <- tribble(
~label, ~type, ~stratum,
"**Overall**", "", "",
" Deaths/N", "outcomes/total", c("Low", "High"),
" Risk", "risk", c("Low", "High"),
" Risk ratio (95% CI)", "rr", c("Low", "High"),
" Risk difference (95% CI)", "rd", c("Low", "High"),
"", "", "",
"**Low hormone receptor**", "", "",
" Deaths/N (Risk)", "outcomes/total (risk)", "Low",
" Risk difference (95% CI)", "rd", "Low",
"**High hormone receptor**", "", "",
" Deaths/N (Risk)", "outcomes/total (risk)", "High",
" Risk difference (95% CI)", "rd", "High"
) |>
mutate(
exposure = "stage",
outcome = "death",
effect_modifier = "receptor"
)
rifttable(
design = design,
data = breastcancer
) |>
rt_gt() # obtain formatted output
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
Overall | |||
Deaths/N | 7/67 | 26/96 | 21/29 |
Risk | 0.10 | 0.27 | 0.72 |
Risk ratio (95% CI) | 1 (reference) | 2.59 (1.20, 5.6) | 6.9 (3.3, 14) |
Risk difference (95% CI) | 0 (reference) | 0.17 (0.05, 0.28) | 0.62 (0.44, 0.80) |
Low hormone receptor | |||
Deaths/N (Risk) | 2/12 (0.17) | 9/22 (0.41) | 12/14 (0.86) |
Risk difference (95% CI) | 0 (reference) | 0.24 (-0.05, 0.54) | 0.69 (0.41, 0.97) |
High hormone receptor | |||
Deaths/N (Risk) | 5/55 (0.09) | 17/74 (0.23) | 9/15 (0.60) |
Risk difference (95% CI) | 0 (reference) | 0.14 (0.02, 0.26) | 0.51 (0.25, 0.77) |
R’s error messages can be frustrating. When using rifttable, these are the typical sources of errors:
type
) and the data. For
example, type = "mean"
will not work on a categorical
(factor) variable.To identify where an error is coming from, start simple. Comment out
all but one line of the design
, putting #
at
the beginning of the line. Start with a line that gives basic
descriptive data, such as type = "total"
,
type = "outcomes"
or type = "events/time"
, and
re-run rifttable()
. Then add more lines with descriptive
estimators, one by one. At the end, add lines that fit models, such as
type = "hr"
.
design
?The design
that the rifttable()
function
takes as input is simply a dataset that defines how the table should
look like when rifttable()
has processed the
data
.
The design
can be constructed in many different ways.
All lead to the same table:
A dataset (tibble) defined using
tribble()
design1 <- tribble(
~label, ~exposure, ~outcome, ~type,
"N", "stage", "death", "total",
"Deaths", "stage", "death", "outcomes"
)
design1
label | exposure | outcome | type |
---|---|---|---|
N | stage | death | total |
Deaths | stage | death | outcomes |
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
N | 67 | 96 | 29 |
Deaths | 7 | 26 | 21 |
A dataset (tibble) defined using
tibble()
design2 <- tibble(
label = c("N", "Deaths"),
exposure = "stage",
outcome = "death",
type = c("total", "outcomes")
)
design2
label | exposure | outcome | type |
---|---|---|---|
N | stage | death | total |
Deaths | stage | death | outcomes |
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
N | 67 | 96 | 29 |
Deaths | 7 | 26 | 21 |
Concatenating tibbles, then editing with
mutate()
design3 <- bind_rows(
tibble( # row 1
label = "N",
type = "total"
),
tibble( # row 2
label = "Deaths",
type = "outcomes"
)
) |>
mutate( # elements that are the same for all rows
exposure = "stage",
outcome = "death"
)
design3
label | type | exposure | outcome |
---|---|---|---|
N | total | stage | death |
Deaths | outcomes | stage | death |
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
N | 67 | 96 | 29 |
Deaths | 7 | 26 | 21 |
For descriptive tables: Use
table1_design()
design4 <- breastcancer |>
table1_design(
death, # the total count will automatically be included
by = stage
)
design4
label | outcome | type | exposure |
---|---|---|---|
N | total | stage | |
Death | death | outcomes (risk) | stage |
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
N | 67 | 96 | 29 |
Death | 7 (10%) | 26 (27%) | 21 (72%) |
See a separate overview about a descriptive Table 1.
External datasets
The design
could even be written out in an external
dataset that can be loaded with readr::read_csv()
(for CSV
files) or readxl::read_excel()
(for Excel sheets).
rifttable tries to make as few assumptions as possible about how the user wants to treat missing data.
NA
) in the exposure are displayed as a separate
exposure category for descriptive statistics (e.g.,
type = "total"
or type = "mean"
). They are
omitted by comparative estimators (e.g., regression models). To change
this behavior, call rifttable()
with the argument
exposure_levels = "nona"
.--
or
NA
), and regression models will use the non-missing
observations. To exclude observations with missing outcome values
altogether, add na_rm = TRUE
to (specific rows of) the
design
.stratum
of the effect_modifier
. To include observations with the
missing effect modifier, add NA
to the requested stratum in
the design
, e.g.,
effect_modifier = "bmi", stratum = c("<25", NA)
.Use the overall
argument to show descriptive data for
the entire data
set. Inferential estimators showing
comparisons between exposure categories will be blank there.
rifttable(
design = design,
data = breastcancer,
overall = TRUE
) |>
rt_gt() # obtain formatted output
Summary | Overall | Stage I | Stage II | Stage III |
---|---|---|---|---|
Overall | ||||
Deaths/N | 54/192 | 7/67 | 26/96 | 21/29 |
Risk | 0.28 | 0.10 | 0.27 | 0.72 |
Risk ratio (95% CI) | 1 (reference) | 2.59 (1.20, 5.6) | 6.9 (3.3, 14) | |
Risk difference (95% CI) | 0 (reference) | 0.17 (0.05, 0.28) | 0.62 (0.44, 0.80) | |
Low hormone receptor | ||||
Deaths/N (Risk) | 23/48 (0.48) | 2/12 (0.17) | 9/22 (0.41) | 12/14 (0.86) |
Risk difference (95% CI) | 0 (reference) | 0.24 (-0.05, 0.54) | 0.69 (0.41, 0.97) | |
High hormone receptor | ||||
Deaths/N (Risk) | 31/144 (0.22) | 5/55 (0.09) | 17/74 (0.23) | 9/15 (0.60) |
Risk difference (95% CI) | 0 (reference) | 0.14 (0.02, 0.26) | 0.51 (0.25, 0.77) |
Instead of testing a null hypothesis about a trend, rifttable proposes estimating the difference in the outcome for a one-unit higher exposure. This is also called a linear slope. Here, we estimate the risk associated for stage that is one category higher.
rifttable(
design = design |>
mutate(trend = "stage_numeric"),
data = breastcancer |>
mutate(stage_numeric = as.numeric(stage))
) |>
rt_gt() # obtain formatted output
Stage | Stage I | Stage II | Stage III | Trend |
---|---|---|---|---|
Overall | ||||
Deaths/N | 7/67 | 26/96 | 21/29 | |
Risk | 0.10 | 0.27 | 0.72 | |
Risk ratio (95% CI) | 1 (reference) | 2.59 (1.20, 5.6) | 6.9 (3.3, 14) | 2.50 (1.97, 3.2) |
Risk difference (95% CI) | 0 (reference) | 0.17 (0.05, 0.28) | 0.62 (0.44, 0.80) | 0.26 (0.19, 0.33) |
Low hormone receptor | ||||
Deaths/N (Risk) | 2/12 (0.17) | 9/22 (0.41) | 12/14 (0.86) | |
Risk difference (95% CI) | 0 (reference) | 0.24 (-0.05, 0.54) | 0.69 (0.41, 0.97) | 0.32 (0.21, 0.43) |
High hormone receptor | ||||
Deaths/N (Risk) | 5/55 (0.09) | 17/74 (0.23) | 9/15 (0.60) | |
Risk difference (95% CI) | 0 (reference) | 0.14 (0.02, 0.26) | 0.51 (0.25, 0.77) | 0.20 (0.11, 0.29) |
Our simple toy dataset just has one exposure variable. For
demonstration, we just create a second variable, with two categories,
“Level 1” and “Level 2,” which is a simplified combination of the
stage
and receptor
variables.
We will flip the table layout
from "rows"
(the default) to "cols"
and concatenate two rifttables. We
also need to give our new exposure2
variable the same label
as stage
to make sure results appear in the same
column.
breastcancer_2exposures <- breastcancer |>
mutate(
exposure2 = case_when(
stage == "Stage I" |
(stage == "Stage II" & receptor == "High") ~
"Level 1",
stage == "Stage III" |
(stage == "Stage II" & receptor == "Low") ~
"Level 2"
)
)
attr(breastcancer_2exposures$exposure2, which = "label") <- "Exposure"
attr(breastcancer_2exposures$stage, which = "label") <- "Exposure"
bind_rows(
design |>
mutate(exposure = "exposure2") |>
slice(2:5) |>
rifttable(
data = breastcancer_2exposures,
layout = "cols"
),
design |>
slice(2:5) |>
rifttable(
data = breastcancer_2exposures,
layout = "cols"
)
) |>
rt_gt() # obtain formatted output
Exposure | Deaths/N | Risk | Risk ratio (95% CI) | Risk difference (95% CI) |
---|---|---|---|---|
Level 1 | 24/141 | 0.17 | 1 (reference) | 0 (reference) |
Level 2 | 30/51 | 0.59 | 3.5 (2.25, 5.3) | 0.42 (0.27, 0.57) |
Stage I | 7/67 | 0.10 | 1 (reference) | 0 (reference) |
Stage II | 26/96 | 0.27 | 2.59 (1.20, 5.6) | 0.17 (0.05, 0.28) |
Stage III | 21/29 | 0.72 | 6.9 (3.3, 14) | 0.62 (0.44, 0.80) |
By default, difference measures are being rounded to 2 decimal digits
(0.01
), such as type = "diff"
, the mean
difference, or type = "quantreg"
, the median difference.
The same goes for risk measures, such as type = "risk"
,
unless shown as percentage points. Ratio measures are also shown with 2
decimal digits, such as type = "hr"
, the hazard ratio, or
type = "fold"
, a ratio of arithmetic means.
Rounding can be changed by setting the rifttable()
arguments diff_digits
, risk_digits
, and
ratio_digits
globally for the entire table.
design <- tribble(
~label, ~type,
"Deaths/N", "outcomes/total",
"Risk", "risk",
"Risk ratio (95% CI)", "rr",
"Odds ratio (95% CI)", "or",
"Risk difference (95% CI)", "rd"
) |>
mutate(
exposure = "stage",
outcome = "death"
)
rifttable(
design = design,
data = breastcancer,
ratio_digits = 3, # Many digits for ratios
risk_digits = 1
) |> # Fewer digits for risks
rt_gt() # obtain formatted output
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
Deaths/N | 7/67 | 26/96 | 21/29 |
Risk | 0.1 | 0.3 | 0.7 |
Risk ratio (95% CI) | 1 (reference) | 2.592 (1.195, 5.62) | 6.93 (3.32, 14.5) |
Odds ratio (95% CI) | 1 (reference) | 3.18 (1.351, 8.43) | 22.5 (7.69, 75.0) |
Risk difference (95% CI) | 0 (reference) | 0.2 (0.1, 0.3) | 0.6 (0.4, 0.8) |
As can be seen, ratios > 3 are still shown with 1 fewer decimal, and ratios > 10 are shown with 2 fewer decimals (Wilcox, Epidemiology 2004 motivates why). To disable such additional rounding of extremely high ratios:
rifttable(
design = design,
data = breastcancer,
ratio_digits = 3,
ratio_digits_decrease = NULL, # Do not round high ratios more
risk_digits = 1
) |>
rt_gt() # obtain formatted output
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
Deaths/N | 7/67 | 26/96 | 21/29 |
Risk | 0.1 | 0.3 | 0.7 |
Risk ratio (95% CI) | 1 (reference) | 2.592 (1.195, 5.621) | 6.931 (3.320, 14.471) |
Odds ratio (95% CI) | 1 (reference) | 3.184 (1.351, 8.427) | 22.500 (7.691, 74.976) |
Risk difference (95% CI) | 0 (reference) | 0.2 (0.1, 0.3) | 0.6 (0.4, 0.8) |
Additionally, rounding can be changed for each row, adding a column
digits
to the rifttable design
:
tribble(
~label, ~type, ~digits,
"Deaths/N", "outcomes/total", NA, # Uses rifttable default
"Risk", "risk", NA, # Uses risk_digits below
"Risk ratio (95% CI)", "", NA,
" Rounded to 1 digit", "rr", 1,
" Rounded to 2 digits", "rr", 2,
"Risk difference (95% CI)", "rd", 3
) |> # Overrides risk_digits
mutate(
exposure = "stage",
outcome = "death"
) |>
rifttable(
data = breastcancer,
risk_digits = 1
) |> # Fewer digits for risks, unless specified by "digits"
rt_gt() # obtain formatted output
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
Deaths/N | 7/67 | 26/96 | 21/29 |
Risk | 0.1 | 0.3 | 0.7 |
Risk ratio (95% CI) | |||
Rounded to 1 digit | 1 (reference) | 2.6 (1.2, 6) | 7 (3, 14) |
Rounded to 2 digits | 1 (reference) | 2.59 (1.20, 5.6) | 6.9 (3.3, 14) |
Risk difference (95% CI) | 0 (reference) | 0.166 (0.051, 0.282) | 0.620 (0.441, 0.798) |
By default, regression models will be fit separately for each stratum
of the effect_modifier
.
Append "_joint"
to "hr"
, "rr"
,
"rd"
, "irr"
, "irrrob"
,
"diff"
, "fold"
, "foldlog"
,
"quantreg"
, or "or"
to obtain “joint” models
for exposure and effect modifier that have a single reference
category.
Note that the joint model will be fit across all non-missing
(NA
) strata of the effect modifier, even if the
design
table does not request all strata be shown.
Compare stratified models to joint models for risk differences (for simplicity of presentation, count data are omitted):
tribble(
~label, ~type, ~stratum,
"**Overall**", "rd", c("Low", "High"),
"", "", "",
"**Stratified models**", "", "",
" Low hormone receptor", "rd", "Low",
" High hormone receptor", "rd", "High",
"", "", "",
"**Joint models**", "", "",
" Low hormone receptor", "rd_joint", "Low",
" High hormone receptor", "rd_joint", "High"
) |>
mutate(
exposure = "stage",
outcome = "death",
effect_modifier = "receptor"
) |>
rifttable(data = breastcancer) |>
rt_gt()
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
Overall | 0 (reference) | 0.17 (0.05, 0.28) | 0.62 (0.44, 0.80) |
Stratified models | |||
Low hormone receptor | 0 (reference) | 0.24 (-0.05, 0.54) | 0.69 (0.41, 0.97) |
High hormone receptor | 0 (reference) | 0.14 (0.02, 0.26) | 0.51 (0.25, 0.77) |
Joint models | |||
Low hormone receptor | 0.08 (-0.15, 0.30) | 0.32 (0.10, 0.54) | 0.77 (0.57, 0.96) |
High hormone receptor | 0 (reference) | 0.14 (0.02, 0.26) | 0.51 (0.25, 0.77) |
The reference categories for exposure and effect modifier are always
their first factor levels. Compare the preceding example:
"High"
is, alphabetically, before "Low"
. To
change the reference category, use forcats::fct_relevel()
or the base R alternative relevel()
on variables in the
data
provided to rifttable()
:
tribble(
~label, ~type, ~stratum,
"**Joint models**", "", "",
" Low hormone receptor", "rd_joint", "Low",
" High hormone receptor", "rd_joint", "High"
) |>
mutate(
exposure = "stage",
outcome = "death",
effect_modifier = "receptor"
) |>
rifttable(
data = breastcancer |>
mutate(
receptor = relevel(
factor(receptor), # Make "receptor" a factor in the first place
ref = "Low"
)
)
) |> # Set new reference category
rt_gt()
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
Joint models | |||
Low hormone receptor | 0 (reference) | 0.24 (-0.05, 0.54) | 0.69 (0.41, 0.97) |
High hormone receptor | -0.08 (-0.30, 0.15) | 0.06 (-0.17, 0.29) | 0.43 (0.11, 0.76) |
If a middle category of the exposure stage
is desired as
the reference:
result_reordered <- tibble(
label = "**RD (95% CI)**",
type = "rd",
exposure = "stage",
outcome = "death"
) |>
rifttable(
data = breastcancer |>
mutate(
stage = relevel(
stage,
ref = "Stage II"
)
)
)
result_reordered |>
rt_gt()
stage | Stage II | Stage I | Stage III |
---|---|---|---|
RD (95% CI) | 0 (reference) | -0.17 (-0.28, -0.05) | 0.45 (0.27, 0.64) |
Using forcats::fct_relevel()
may be preferable over
relevel()
, as it preserves the variable label: here, the
variable stage
lost its label, "Stage"
starting with upper case S.
That results for “Stage II” are now listed first is probably
undesirable. Reorder the columns of the table that
rifttable()
produced to print results for “Stage I”
first:
stage | Stage I | Stage II | Stage III |
---|---|---|---|
RD (95% CI) | -0.17 (-0.28, -0.05) | 0 (reference) | 0.45 (0.27, 0.64) |
Add a ci
column to the design
:
tribble(
~label, ~type, ~ci,
"Deaths/N (Risk)", "outcomes/total (risk)", NA,
"Risk ratio", "", NA,
" 80% CI", "rr", 0.8,
" 95% CI", "rr", NA, # Defaults to 0.95
" 99% CI", "rr", 0.99
) |>
mutate(
exposure = "stage",
outcome = "death"
) |>
rifttable(
data = breastcancer,
risk_percent = TRUE
) |>
rt_gt() # obtain formatted output
Stage | Stage I | Stage II | Stage III |
---|---|---|---|
Deaths/N (Risk) | 7/67 (10%) | 26/96 (27%) | 21/29 (72%) |
Risk ratio | |||
80% CI | 1 (reference) | 2.59 (1.56, 4.3) | 6.9 (4.3, 11) |
95% CI | 1 (reference) | 2.59 (1.20, 5.6) | 6.9 (3.3, 14) |
99% CI | 1 (reference) | 2.59 (0.94, 7.2) | 6.9 (2.63, 18) |
While the package provides a number of estimators commonly used in epidemiology, it will never be able to include all possible estimators. However, any custom estimate can be integrated into a rifttable by a defining custom estimation function.
The subsequent example will reproduce the following basic rifttable,
which shows the mean age by sex, stratified by ECOG performance status,
in the cancer
data set:
data(cancer, package = "survival")
cancer <- cancer |>
tibble::as_tibble() |>
mutate(
sex = factor(
sex,
levels = 1:2,
labels = c("Male", "Female")
)
)
design <- tibble::tibble(
type = "mean",
exposure = "sex",
outcome = "age",
effect_modifier = "ph.ecog",
stratum = 1:2,
label = paste0("ECOG PS ", stratum, ": mean age")
)
design |>
rifttable(
data = cancer,
overall = TRUE
) |>
rt_gt()
Summary | Overall | Male | Female |
---|---|---|---|
ECOG PS 1: mean age | 61.45 | 62.79 | 59.19 |
ECOG PS 2: mean age | 66.22 | 65.00 | 67.90 |
Instead of relying on rifttable’s built-in estimator
type = "mean"
, we will define a custom function that
calculates the mean:
estimate_my_mean <- function(data, ...) {
data |>
group_by(.exposure) |>
summarize(
res = paste(
round(
mean(.outcome),
digits = 3
),
"yrs"
)
)
}
Use the custom function my_mean
instead of the built-in
mean
:
design |> # Edit the previous design
mutate(
type = "my_mean", # Replace built-in "mean" by custom "my_mean"
label = paste0(label, " (custom)")
) |>
rifttable(
data = cancer,
overall = TRUE
) |>
rt_gt()
Summary | Overall | Male | Female |
---|---|---|---|
ECOG PS 1: mean age (custom) | 61.451 yrs | 62.789 yrs | 59.19 yrs |
ECOG PS 2: mean age (custom) | 66.22 yrs | 65 yrs | 67.905 yrs |
Specifications for custom functions:
estimate_
; this
prefix is to be omitted when later calling the custom function within
rifttable()
.data
provided to rifttable()
will be
available to the custom function as an argument named
data
.data
are already subsetted to the
stratum
of the effect_modifier
, if
applicable..exposure
, .outcome
, .event
,
.time
, and .time2
, as applicable.design
(e.g., confounders
,
digits
, na_rm
, etc.) and of the
rifttable()
function (e.g., reference
,
risk_percent
) as arguments. Many may not be relevant and
can be captured in the argument list, ...
(see
example)..exposure
, with one row per exposure category, and one
string column res
for the estimate.