Package {seqwrap}


Type: Package
Title: Item-by-Item Iterative Model Fitting
Version: 0.7.0
Description: Models high-dimensional data, such as RNA-seq or proteomic data using an item-by-item strategy. The package contains functions to wrap high-dimensional data and iterate over them using established R packages for regression modelling (e.g., 'glmmTMB' or 'mgcv').
License: GPL-3
Encoding: UTF-8
LazyData: true
LazyDataCompression: xz
URL: https://github.com/trainome/seqwrap
BugReports: https://github.com/trainome/seqwrap/issues
Depends: R (≥ 4.1)
Imports: cli, S7, tibble, pbapply, parallel, stats, DHARMa, broom.mixed
Suggests: testthat (≥ 3.0.0), dplyr, knitr, purrr, quarto, glmmTMB, lme4, nlme, MASS, rmarkdown, gt, edgeR, tidyselect, ggplot2, cowplot, ggtext, R.rsp
Config/testthat/edition: 3
Collate: 'seqwrap-global.R' 'aaa-.R' 'data-helper.R' 'data.R' 'generic-summaries.R' 'seqwrap.R' 'seqwrap_mtf.R' 'simcounts.R' 'simcounts2.R'
VignetteBuilder: quarto, R.rsp
Config/roxygen2/version: 8.0.0
NeedsCompilation: no
Packaged: 2026-05-15 14:17:25 UTC; 706194
Author: Daniel Hammarström ORCID iD [aut, cre, cph], Chidimma Echebiri ORCID iD [ctb]
Maintainer: Daniel Hammarström <daniel.hammarstrom@inn.no>
Repository: CRAN
Date/Publication: 2026-05-19 10:00:02 UTC

Dungan et al 2022 counts

Description

Raw count data from Dungan et al. 2022 was downloaded from NCBI Gene Expression Omnibus (GEO) and prepared for use in the seqwrap package. Preparation included adding gene symbols to the first column of the count data frame (counts), and sorting relevant variables to metadata data frame (metadata).

Usage

dungan_counts

Format

A list with two data frames:

counts

Raw gene expression count matrix (genes x samples).

genesymbol

HGNC gene symbol

OS.. OV..

Raw counts, one column for each sample containing integer read counts, named by experimental id.

metadata

Sample-level metadata.

seq_sample_id

Experimental unit id (mouse identifier)

treatment

Senolytic or control (Vehicle) treatment

surgery

Surgery inducing overload (synergist ablation) or sham surgury

Source

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE195707

References

Dungan, C. M. et al. Senolytic treatment rescues blunted muscle hypertrophy in old mice. GeroScience 44, 1925–1940 (2022).


Generic evaluation for model fits

Description

When no evaluation function is provided to seqwrap (eval_fun), this function uses DHARMa to simulate residuals and test for uniformity, dispersion and outliers. These tests could guide troubleshooting and detection of model misspecification.

Usage

generic_evaluation(x)

Arguments

x

A model fitted in seqwrap.

Value

A tidy data frame possible to combine using seqwrap_summarise

Examples

# The generic summary works on model objects
library(seqwrap)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)

  dat <- simcounts(n_genes = 1000,
                   n_samples = 30,
                   beta_0 = 5,
                   overdispersion_min_max = c(1, 10))

  counts <- dat$data
  metadata <- dat$metadata
  metadata$ln_libsize <- log(colSums(counts[,-1]))

  # Save counts for gene i in the same data frame
  metadata$y <- as.integer(counts[1, -1])

  m <- glmmTMB(y ~ as.factor(x) + offset(ln_libsize),
               data = metadata,
               family = nbinom2)

  generic_summary(m)
  generic_evaluation(m)
}


Generic summaries for model parameter estimates.

Description

When no summary function is provided to seqwrap (summary_fun), this function uses broom(.mixed)::tidy to give a table of model parameter estimates.

Usage

generic_summary(x)

Arguments

x

A model fitted in seqwrap.

Value

A tidy data frame possible to bind using seqwrap_summarise

Examples

# The generic summary works on model objects
library(seqwrap)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)

  dat <- simcounts(n_genes = 1000,
                   n_samples = 30,
                   beta_0 = 5,
                   overdispersion_min_max = c(1, 10))

  counts <- dat$data
  metadata <- dat$metadata
  metadata$ln_libsize <- log(colSums(counts[,-1]))

  # Save counts for gene i in the same data frame
  metadata$y <- as.integer(counts[1, -1])

  m <- glmmTMB(y ~ as.factor(x) + offset(ln_libsize),
               data = metadata,
               family = nbinom2)

  generic_summary(m)
  generic_evaluation(m)
}


Print method for objects of class seqwrapResults

Description

Invoking the print method on seqwrapResults gives a summary of the fitted objects.

Arguments

x

A seqwrapResults object

...

Currently unused; included for compatibility with the print generic.

Value

Invisibly returns the object

Examples

# Load packages and prepare data for examples --------------------------------

library(seqwrap)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
library(glmmTMB)

dat <- simcounts2()

# Save simulated data as separate objects
counts <- dat$counts
metadata <- dat$metadata

# Prepare library size for use as offset
metadata$ln_libsize <- log(metadata$library_size)


# A mixed effects negative binomial model of RNA-seq counts ------------------

# Populate the seqwrap container
container <- seqwrap_compose(
  modelfun = glmmTMB::glmmTMB,
  arguments = list(
    formula = y ~ time * condition + (1|id) + offset(ln_libsize),
    family = glmmTMB::nbinom2()
  ),
  data = counts,
  metadata = metadata,
  samplename = "seq_sample_id"
)

# Run seqwrap using the container
results <- seqwrap(container,
                   cores = 1)
print(results)
}


A flexible upper-level wrapper for iterative modelling using any available fitting algorithm

Description

A flexible upper-level wrapper for iterative modelling using any available fitting algorithm

Usage

seqwrap(
  y = NULL,
  modelfun = NULL,
  arguments = NULL,
  data = NULL,
  metadata = NULL,
  samplename = NULL,
  additional_vars = NULL,
  summary_fun = NULL,
  eval_fun = NULL,
  exported = list(),
  return_models = TRUE,
  save_models = FALSE,
  model_path = NULL,
  subset = NULL,
  cores = 1,
  verbose = TRUE
)

Arguments

y

An swcontainer object created with seqwrap_compose, a named list or a DGEList object.

modelfun

A model fitting function like stats::lm, glmmTMB::glmmTMB or lme4::lmer

arguments

An alist or list of arguments to be passed to the fitting function, this should not contain data. Note that the formula must have y as the dependent variable.

data

A data frame or a list of data frames with targets (e.g. genes, transcripts) as rows and sample names as columns. If rownames = FALSE (default), each data frame should have target identifications as the first column in the data frame(s). If rownames = TRUE row names will be converted to target identifications. If data is provided as a list, each element of the list should be named. The corresponding names be available as variables for the fitting function.

metadata

A data frame with sample names (corresponding to column names in the target matrix) and design variables.

samplename

A character value indicating the variable by which metadata can merge with the target data. This defaults to "seq_sample_id" as this is used in the trainomeMetaData package.

additional_vars

A vector of additional variables that is contained in the metadata data set that is needed to fit the model. By default the metadata is reduced to variables contained in the slots formula/model/fixed/random in additional arguments. More variables may be needed for offsets, weights etc.

summary_fun

A custom (user-created) function for evaluating/summarizing models. If NULL, a list of fitted models are returned

eval_fun

A custom (user-created) function for model diagnostics/evaluation. If NULL, no evaluation/diagnostics of models are returned

exported

A list of functions, values etc. to be passed to summary_fun and eval_fun. This list must contain any functions that should be used in model summarise or evaluations.

return_models

Logical, should models be returned as part of the output? Save models during development on subsets of the data. If used on large data sets, this will result in large memory usage.

save_models

Logical, should models be saved? Models may be saved on disk to save working memory.

model_path

A character. The path to saved models.

subset

A sequence, random samples or integers to indicate which rows to keep in data. This is useful if you want to test the model in a subset of targets. If left to the default (NULL), all rows will be used.

cores

An integer indicating the number of cores to be used in parallel computations. If NULL, a sequential for loop is used. If "max", all available cores are used.

verbose

Logical, should the function print diagnostics after checking the data container?

Details

This function provides a flexible wrapper to fit, summarize and evaluate statistical models fitted to high dimensional omics-type data. Models are fitted and passed to user defined functions to summarize and evaluate models.

Value

A nested list with three upper levels slots: models, a list of fitted objects; summaries, a list of summaries created from the summary_fun function; evaluations, a list of diagnostics created from eval_fun.

Examples

library(seqwrap)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
library(glmmTMB)

# Simulate n targets
dat <- simcounts(n_genes = 10000)

# Save simulated data as separate objects
counts <- dat$data
metadata <- dat$metadata

# Prepare library size for use as offset
metadata$ln_libsize <- log(colSums(counts[,-1]))


# A mixed effects negative binomial model of RNA-seq counts -----------------

# Populate the seqwrap container
container <- seqwrap_compose(
  modelfun = glmmTMB::glmmTMB,
  arguments = list(
    formula = y ~ x + (1|cluster) + offset(ln_libsize),
    family = glmmTMB::nbinom2()
  ),
  data = counts,
  metadata = metadata,
  samplename = "sample"
)

# Run seqwrap using the container on a subset of targets
results <- seqwrap(container,
                   subset = 1:5,
                   cores = 1)


# Summarise results only contains the subset
summaries <- seqwrap_summarise(results)
}


seqwrapResults class

Description

seqwrapResults constructor function.

Arguments

.data

A list containing initial data (inherited from S7::class_list)

models

A list of fitted model objects

summaries

A list of model summaries

evaluations

A list of model evaluations

errors

A data frame of errors and warnings. defaults to an empty dataframe

n

Number of samples

k

Number of targets

call_arguments

Character string of function arguments used

call_engine

Character string of modeling engine used

elapsed_time

An proc.time() object giving the time needed to complete iterative model fitting.

Value

An S7 object of class seqwrap_results storing fitted models, summaries, evaluations, and diagnostic information from a seqwrap() run.

Examples

# seqwrapResults is the S7 class returned by seqwrap(). End users do
# not normally call the constructor directly -- it is invoked
# internally once iterative model fitting has finished.

library(seqwrap)

dat <- simcounts(n_genes = 5)

container <- seqwrap_compose(
  modelfun = stats::lm,
  arguments = list(formula = y ~ x),
  data = dat$data,
  metadata = dat$metadata,
  samplename = "sample"
)

results <- seqwrap(container, subset = 1:2, cores = 1)

# The object returned by seqwrap() is a seqwrapResults instance
S7::S7_inherits(results, seqwrapResults)


Compose a swcontainer object for use in the seqwrap function.

Description

This function makes it possible to compose and run checks on combined data sets (meta data and target data) and fitting functions to avoid issues in iterative modelling. See examples and vignettes for details.

Usage

seqwrap_compose(
  x = NULL,
  modelfun,
  arguments,
  data,
  rownames = FALSE,
  metadata,
  targetdata = NULL,
  samplename = "seq_sample_id",
  additional_vars = NULL,
  summary_fun = NULL,
  eval_fun = NULL,
  exported = list(),
  update = list()
)

Arguments

x

An optional named list or DGEList object (DESeqDataSet not yet implemented).

modelfun

A model fitting function like stats::lm, glmmTMB::glmmTMB or lme4::lmer

arguments

An alist or list of arguments to be passed to the fitting function, this should not contain data. Note that the formula must have y as the dependent variable.

data

A data frame or a list of data frames with targets (e.g. genes, transcripts) as rows and sample names as columns. If rownames = FALSE (default), each data frame should have target identifications as the first column in the data frame(s). If rownames = TRUE row names will be converted to target identifications. If data is provided as a list, each element of the list should be named. The corresponding names be available as variables for the fitting function.

rownames

should row names in data be used as target identifications? Defaults to FALSE.

metadata

A data frame with sample names (corresponding to column names in the target matrix) and design variables.

targetdata

A data frame or a list with target-wise values (e.g. dispersion or start values) for each target. This data is made available for the model fitting function and can be used to specify target specific data in each iteration of seqwrap. When a data frame is provided each row corresponds to the target specific value. When a list is provided, each column of the data frame is available and can be called by name.

samplename

A character value indicating the variable by which metadata can merge with the target data. This defaults to "seq_sample_id" as this is used in the trainomeMetaData package.

additional_vars

A vector of additional variables that is contained in the metadata data set that is needed to fit the model. By default the metadata is reduced to variables contained in the slots formula/model/fixed/random in additional arguments. More variables may be needed for offsets, weights etc.

summary_fun

A custom (user-created) function for evaluating/summarizing models. If NULL, a list of fitted models are returned

eval_fun

A custom (user-created) function for model diagnostics/evaluation. If NULL, no evaluation/diagnostics of models are returned

exported

A list of functions, values etc. to be passed to summary_fun and eval_fun. This list must contain any functions that should be used in model summarise or evaluations.

update

A list of named parameters to update a swcontainer object.

Value

A swcontainer object for direct use in seqwrap.

Examples

# Load packages and prepare data for examples -------------------------------

library(seqwrap)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
library(glmmTMB)

dat <- simcounts2()

# Save simulated data as separate objects
counts <- dat$counts
metadata <- dat$metadata

# Prepare library size for use as offset
metadata$ln_libsize <- log(metadata$library_size)


# A mixed effects negative binomial model of RNA-seq counts -----------------

# Populate the seqwrap container
container <- seqwrap_compose(
  modelfun = glmmTMB::glmmTMB,
  arguments = list(
    formula = y ~ time * condition + (1|id) + offset(ln_libsize),
    family = glmmTMB::nbinom2()
  ),
  data = counts,
  metadata = metadata,
  samplename = "seq_sample_id"
)

# Run seqwrap using the container
results <- seqwrap(container,
                   cores = 1)




# Summarise results
summaries <- seqwrap_summarise(results)


# Including target-specific data --------------------------------------------

# Target specific data can be supplied to seqwrap_compose to enable, e.g.,
# the use of priors for empirical Bayes shrinkage. In this example we are
# setting a dummy-prior on the `condition` parameter and target-specific
# prior on the dispersion parameter.

fixef_priors <- data.frame(
  prior = "normal(0, 1)",
  class = "fixef",
  coef = "conditionB"
)

dips_priors <- data.frame(
  prior = c(
    "normal(0.5, 0.25)",
    "normal(1, 0.25)"),
  class = "fixef_disp",
  coef = 1
)


# Combine information in target-specific list
prior_list <- list()
for(i in 1:2) {

  prior_list[[i]] <- rbind(
    fixef_priors,
    dips_priors[i,]
  )


}



container <- seqwrap_compose(
  modelfun = glmmTMB::glmmTMB,
  # NOTE: The use of `alist` prevents evaluation of list components
  arguments = alist(
    formula = y ~ time * condition + (1|id) + offset(ln_libsize),
    family = glmmTMB::nbinom2(),
    priors = data.frame(
      prior = prior,
      class = class,
      coef = coef
    )
  ),
  data = counts,
  metadata = metadata,
  targetdata = prior_list,
  samplename = "seq_sample_id"
)

# Run seqwrap using the container
results_prior <- seqwrap(container,
                         # Return models to confirm use of prior information
                         # The use of prior information is not recommended
                         return_models = TRUE,
                         cores = 1)

# Confirm prior information
summary(results_prior@models[[1]])
# Compare to naive model
summary(results@models[[1]])
}


Summarise seqwrapResults objects

Description

Summarise seqwrapResults objects

Usage

seqwrap_summarise(
  x,
  summaries = TRUE,
  evaluations = TRUE,
  errors = TRUE,
  verbose = TRUE
)

Arguments

x

A seqwrapResults object

summaries

Logical, should summaries be combined?

evaluations

Logical, should evaluations be combined?

errors

Logical, should errors be combined?

verbose

Logical should progress be printed? Default TRUE

Details

This functions attempts to summarize results from the summary and evaluation functions applied in each iteration during modelling. The function expects that the summary and evaluation functions return data frames.

Value

A list (invisibly) with up to two data frames: summaries, combined parameter summaries from each model, and evaluations, combined diagnostics from each model. Entries are omitted when the corresponding slot of x is empty or the user disables them via the summaries / evaluations arguments.

Examples

# Load packages and prepare data for examples -------------------------------

library(seqwrap)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
library(glmmTMB)

# Simulate n targets
dat <- simcounts(n_genes = 10000)

# Save simulated data as separate objects
counts <- dat$data
metadata <- dat$metadata

# Prepare library size for use as offset
metadata$ln_libsize <- log(colSums(counts[,-1]))


# A mixed effects negative binomial model of RNA-seq counts -----------------

# Populate the seqwrap container
container <- seqwrap_compose(
  modelfun = glmmTMB::glmmTMB,
  arguments = list(
    formula = y ~ x + (1|cluster) + offset(ln_libsize),
    family = glmmTMB::nbinom2()
  ),
  data = counts,
  metadata = metadata,
  samplename = "sample"
)

# Run seqwrap using the container on a subset of targets
results <- seqwrap(container,
                   subset = 1:5,
                   cores = 1)


# Summarise results only contains the subset
summaries <- seqwrap_summarise(results)

# Get summaries
summaries$summaries

# Get model evaluations
summaries$evaluations
}



Simulate counts from a simple experiment (paired or unpaired) using Poisson or negative binomial distributions.

Description

This function is used for internal testing and tutorials.

Usage

simcounts(
  n_genes = 10,
  n_samples = 16,
  beta_0 = 1,
  sigma_0 = 0.5,
  beta_1 = 1,
  sigma_1 = 0.5,
  b_0 = 0.5,
  clusters = 8,
  sample_sd = 0.5,
  overdispersion_min_max = c(1, 10),
  seed = 123
)

Arguments

n_genes

Number of genes to simulate.

n_samples

Number of samples to simulate.

beta_0

Gene intercepts.

sigma_0

SD of gene intercepts

beta_1

Gene slopes.

sigma_1

SD of gene slopes

b_0

Cluster-specific intercept. If NULL, non-paired data is simulated.

clusters

Number of clusters. If NULL, non-paired data is simulated.

sample_sd

SD of sample-specific effects.

overdispersion_min_max

Range of overdispersion parameter. If NULL, the Poisson distribution is used to simulate data.

seed

Random seed for reproducibility.

Value

A list with three elements: data, a data frame of simulated counts with one row per gene and one column per sample (first column identifies the gene); parameters, a data frame of true gene-wise coefficients and overdispersion values; and metadata, a data frame of sample-level covariates (sample, cluster, x).

Examples

# Simulate n targets from the negative binomial distribution
dat <- simcounts(n_genes = 10,
                 overdispersion_min_max = c(1, 5))

# Simulate n targets from the Poisson distribution
dat <- simcounts(n_genes = 10,
                 overdispersion_min_max = NULL)


# Data are organized in counts
dat$data
# .. and meta data
dat$metadata


Simulate counts from a parallel groups design with three time-points

Description

Simulate gene counts from a Negative Binomial distribution conditional on a two-condition, repeated measures experiment with three time points. The function is used for internal testing and tutorials.

Usage

simcounts2(
  n1 = 5,
  n2 = 5,
  beta0 = c(2.3, 3),
  conditionB = c(0.2, 0.1),
  timet2 = c(0, 0),
  timet3 = c(0.5, 0.25),
  conditionB_timet2 = c(0.1, 0.2),
  conditionB_timet3 = c(0.5, 0.6),
  b0 = c(1, 1),
  b1 = c(0, 0),
  b2 = c(0, 0),
  phi_model = NULL,
  library_size = NULL,
  lib_size_mean = 1e+06,
  lib_size_cv = 0.3,
  max_prop = 0.02
)

Arguments

n1

Number clusters in group 1 (control)

n2

Number of clusters in group 2 (treatment)

beta0

The intercept parameter on the log scale.

conditionB

Baseline differences between conditions

timet2

Changes from baseline to time-point 2 in the reference group

timet3

Changes from baseline to time-point 3 in the reference group

conditionB_timet2

Difference from reference group in changes from baseline to time-point 2

conditionB_timet3

Difference from reference group in changes from baseline to time-point 3

b0

Vector of SD of the between cluster variation in intercept

b1

Vector of SD of the between cluster variation in timet1 effects

b2

Vector of SD of the between cluster variation in timet2 effects

phi_model

A model (lm or loess) of a dispersion ~ log_mu relationship

library_size

A vector of library sizes with the length (n1 + n2) * 3 or NULL if library sizes are to be simulated

lib_size_mean

Mean of the distribution of library sizes.

lib_size_cv

Coefficient of variation for the distribution of library sizes.

max_prop

The maximum count for a single observation as a proportion of the library size.

Details

Fixed effects parameter values are specified as vectors (beta0, conditionB, timet2, etc.) with the common vector length being the number of genes simulated.

The function simulation of counts from a conditional Negative Binomial distribution:

y_{i[g]} \sim \text{NB}(\mu_{i[g]}, \phi_{[g]})

\text{log}(\mu_{i[g]}) = \beta_0 + \beta1 \text{condition}_B + \\ \beta_2 \text{time}_{t2} + \beta_3 \text{time}_{t3} + \\ \beta_4 \text{condition}_B \times \text{time}_{t2} + \\ \beta_5 \text{condition}_B \times \text{time}_{t3} + \\ \text{offset}(\text{library size}) + \\ \text{b}_0 + \text{b}_1 \times \text{time}_{t2} + \text{b}_2 \times \text{time}_{t3}

Varying effects are added as:

\text{b}_l \sim \text{Normal}(0, \sigma_l)

The library size is used as an offset for simulating data. Library sizes are simulated from a log-normal distribution or provided in library_size. In the simulation of counts, the library size is entered as offset after scaling to the median library size. Raw library sizes are included in the meta data. The dispersion parameter can be simulated from a model (lm or loess) provided to the function where the predictor variable should be \text{log}(\mu) and the outcome variable (dispersion) should be \text{log}(\phi). If no model is provided, values are simulated from a log-normal distribution based on hard coded parameter values from an observed mean-dispersion relationship.

Value

A list of (1) simulated counts, (2) the eta, and (3) phi values used for simulating data, and (4) the meta data data frame.

Examples

# Simulate n_genes number of genes
n_genes <- 1000

dat <- simcounts2(
  n1 = 5,
  n2 = 5,
  beta0 = rnorm(n_genes, mean = 5),
  conditionB =rnorm(n_genes),
  timet2 = rnorm(n_genes),
  timet3 = rnorm(n_genes),
  conditionB_timet2 = rnorm(n_genes),
  conditionB_timet3 = rnorm(n_genes),
  b0 = abs(rnorm(n_genes)),
  b1 = abs(rnorm(n_genes)),
  b2 = abs(rnorm(n_genes))
)

# Data are organized in counts
dat$counts
# .. and meta data
dat$metadata



seqwrap container class

Description

The seqwrap container is used to store and validate data input to the to seqwrap.

Arguments

.data

A list containing initial data (inherited from S7::class_list)

modelfun

A function used to fit models

arguments

A list of arguments for the fitting function

data

A data frame or list of data frames with target data

rownames

Logical, should row names be used as target IDs?

metadata

A data frame with sample information

targetdata

A data frame with target-wise information

samplename

Character for sample name identification

additional_vars

Character vector of additional variables

summary_fun

A function for summarizing models

eval_fun

A function for evaluating models

exported

A list of objects to export to workers

model_print

Character representation of model function

arguments_print

Character representation of arguments

Value

An S7 object of class swcontainer bundling data, metadata, and the modelling function plus arguments to be consumed by seqwrap().

Examples

# swcontainer is the S7 class produced by seqwrap_compose(). End users
# normally build one via seqwrap_compose() rather than calling this
# constructor directly.

library(seqwrap)

dat <- simcounts(n_genes = 5)

container <- seqwrap_compose(
  modelfun = stats::lm,
  arguments = list(formula = y ~ x),
  data = dat$data,
  metadata = dat$metadata,
  samplename = "sample"
)

# The object returned by seqwrap_compose() is a swcontainer instance
S7::S7_inherits(container, swcontainer)

mirror server hosted at Truenetwork, Russian Federation.