jamba

The goal of jamba is to provide useful custom functions for R data analysis and visualization.

jamba version 1.0.2

Package Reference

A full online function reference is available via the pkgdown documentation:

Full jamba command reference

Functions are grouped into categories; some examples are listed below:

Installation

The production version will soon be available from CRAN:

install.packages("jamba")

The development version can be installed:

remotes::install_github("jmw86069/jamba")

Additional R Packages in “Suggests”

Additional R Packages in “Enhances”

Bioconductor packages are invaluable for bioinformatics work, but can be a bit “heavy” to install if not absolutely necessary. Therefore, Bioconductor packages are listed in “Enhances”, so they are installed only when someone chooses to install them.

Background

The R functions in jamba have been built up, used, tested, and revised over several years. They are immediately useful for day-to-day work, and efficient and robust enough for production pipelines.

Many were inspired by discussions on Stack Overflow, R-help, or Bioconductor, with citations crediting the principal author(s). Many thanks to the original authors! The R community is built upon the collective greatness of its contributors!

Most of the functions are designed around bioinformatics workflows, where functions need to be efficient when operating on 10,000 to 100,000 elements. (They also work well with millions.) Speed gains are usually apparent at about 100 elements, then scale linearly (or worse) as the number of elements increases. Others and I use these functions all the time.

One example, writeOpenxlsx(), is a simple wrapper around the very useful openxlsx::write.xlsx() that also formats columns by type: P-values, fold changes, log2 fold changes, numeric, and integer values. Each column type uses Excel conditional formatting to color-shade its cells.

Similarly, readOpenxlsx() is a wrapper around openxlsx::read.xlsx() that reads each worksheet and returns a list of data.frame objects. It can detect multi-row column headers, in which case it returns combined column names. It also applies the equivalent of check.names=FALSE, so column names are returned unchanged.
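The multi-row header idea can be sketched in base R. This is a hypothetical helper, combine_header_rows(), not readOpenxlsx() itself: it assumes the worksheet was already read with no header, so the header rows arrive as ordinary data rows, and that blank cells in upper header rows come from merged cells whose label should carry over from the left.

```r
# Hypothetical helper, not jamba::readOpenxlsx(): combine multi-row
# column headers from a sheet that was read with no header row.
combine_header_rows <- function(raw, n_header = 2, sep = " ") {
  hdr <- as.matrix(raw[seq_len(n_header), , drop = FALSE])
  hdr[is.na(hdr)] <- ""
  # assumption: blank cells in upper header rows belong to a merged
  # cell, so they inherit the nearest non-blank label to their left
  for (i in seq_len(n_header - 1)) {
    for (j in seq_len(ncol(hdr))[-1]) {
      if (!nzchar(hdr[i, j])) hdr[i, j] <- hdr[i, j - 1]
    }
  }
  # join the non-blank header cells of each column into one name
  new_names <- unname(apply(hdr, 2, function(h) {
    paste(h[nzchar(h)], collapse = sep)
  }))
  body <- raw[-seq_len(n_header), , drop = FALSE]
  rownames(body) <- NULL
  # re-detect column types now that the header text is gone
  body[] <- lapply(body, type.convert, as.is = TRUE)
  names(body) <- new_names  # used as-is, in the spirit of check.names=FALSE
  body
}

raw <- data.frame(
  V1 = c("", "Gene", "ABCA2", "ABCA12"),
  V2 = c("Group A", "log2FC", "1.5", "-0.2"),
  V3 = c("", "P-value", "0.01", "0.2"),
  stringsAsFactors = FALSE)
combine_header_rows(raw)
```

Here the merged “Group A” label is prefixed onto both columns beneath it, producing names such as “Group A log2FC”, and the numeric columns are converted back to numbers.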

Small and large efficiencies are used wherever possible. The mixedSort() functions are based upon gtools::mixedsort(), with additional optimizations for speed and custom needs. They sort chromosome names, gene names, micro-RNA names, etc.

Alphanumeric sort

Example comparing the rank from base R sort() (sort_rank) with the rank from mixedSort() (mixedSort_rank):

     miRNA sort_rank mixedSort_rank
2    ABCA2         2              1
1   ABCA12         1              2
3    miR-1         3              3
6   miR-1a         6              4
7   miR-1b         7              5
8    miR-2         8              6
4   miR-12         4              7
9   miR-22         9              8
5  miR-122         5              9
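A rough base R approximation of this ordering pads every run of digits with leading zeros, so numeric parts compare by value rather than character by character. The natural_sort() helper below is hypothetical and much simpler than mixedSort(): it ignores decimals and negative numbers, and assumes no digit run is longer than width.

```r
# Hypothetical helper, not jamba::mixedSort(): a base R sketch of
# alphanumeric ("natural") sorting by zero-padding each run of digits.
natural_sort <- function(x, width = 12) {
  key <- tolower(x)                          # case-insensitive comparison
  m <- gregexpr("[0-9]+", key)
  regmatches(key, m) <- lapply(regmatches(key, m), function(d) {
    # left-pad every digit run so "2" sorts before "12"
    paste0(strrep("0", pmax(0, width - nchar(d))), d)
  })
  # radix ordering compares keys in the C locale, independent of system locale
  x[order(key, method = "radix")]
}

x <- c("miR-122", "ABCA12", "miR-1", "ABCA2", "miR-1a",
  "miR-1b", "miR-2", "miR-12", "miR-22")
natural_sort(x)
# "ABCA2" "ABCA12" "miR-1" "miR-1a" "miR-1b" "miR-2" "miR-12" "miR-22" "miR-122"
```

This reproduces the mixedSort_rank column for these nine names, while base sort() places “ABCA12” before “ABCA2” and “miR-12” before “miR-1a”.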

Base R plotting

These functions help with base R plots, in all those little cases when the amazing ggplot2 package is not a smooth fit.

Excel export

Every bioinformatician and statistician needs to write data to Excel. The writeOpenxlsx() function is consistent and makes the output look pretty. You can save numerous worksheets in a single Excel file without having to go back and custom-format everything.

Colors

Almost everything uses color somewhere, especially the R console and every R plot.

Figure: a series of color palettes, adjusting contrast with lens, and expanding palettes with color2gradient().

List Functions

Efficient methods to operate on lists in one call, avoiding loops through the list with for(), lapply(), or map() functions. They are driven by speed with 10k-100k rows, typical of biological datasets.

Compared to convenient alternatives such as apply() or the tidyverse, these functions are typically an order of magnitude faster (YMMV). Notable exceptions are data.table and Bioconductor S4Vectors: both are amazing, but both are fairly heavy installations. S4Vectors is used when available.
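The core trick behind operating on a list “in one call” can be illustrated with a hypothetical list_sums() helper (not a jamba function): flatten the list once with unlist(), record each value's parent element with rep() and lengths(), then let one vectorized grouped function, here base R rowsum(), do the work instead of a per-element loop.

```r
# Hypothetical helper (not a jamba function): sum every element of a
# numeric list in one vectorized call instead of lapply() or for().
list_sums <- function(x) {
  # flatten once, recording which list element each value came from
  idx <- rep(seq_along(x), lengths(x))
  vals <- unlist(x, use.names = FALSE)
  out <- numeric(length(x))            # empty elements keep a sum of 0
  if (length(vals) > 0) {
    s <- rowsum(vals, idx)             # grouped sums computed in C
    out[as.integer(rownames(s))] <- s[, 1]
  }
  names(out) <- names(x)
  out
}

x_list <- list(a = 1:3, b = numeric(0), c = c(10, 5))
list_sums(x_list)
```

The same flatten-then-group pattern applies to other per-element summaries, since the cost of unlist() and one grouped call grows with the total number of values rather than with the number of R-level function calls.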

Unique names with versions

R object names provide an additional way to confirm that data are kept in the proper order. Duplicated names may be silently ignored (for example, subsetting by name returns only the first match), which motivated an easy approach to “make unique names”.
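One way to version duplicated names is sketched below with a hypothetical version_names() helper; the “_v” suffix and numbering scheme are assumptions for illustration. Base R make.unique() is a simpler alternative that leaves the first occurrence unchanged, for example “A”, “B”, “A.1”.

```r
# Hypothetical helper: append a version number to every value that
# occurs more than once, leaving values that occur once unchanged.
version_names <- function(x, suffix = "_v") {
  # running occurrence count within each value: 1, 2, 3, ...
  n <- ave(seq_along(x), x, FUN = seq_along)
  # only values that are duplicated somewhere receive a suffix
  is_dup <- x %in% x[duplicated(x)]
  ifelse(is_dup, paste0(x, suffix, n), x)
}

vals <- c(10, 20, 30, 40)
names(vals) <- version_names(c("A", "B", "A", "A"))
vals[["A_v2"]]   # 30, no longer hidden behind the first "A"
```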

data.frame/matrix/tibble

String / grep

Numeric

Common usage

noiseFloor(0:10, minimum=1e-20, newValue=NA)
#>  [1] NA  1  2  3  4  5  6  7  8  9 10
noiseFloor(0:10, minimum=3)
#>  [1]  3  3  3  3  4  5  6  7  8  9 10
noiseFloor(c(0:10, NA), minimum=3, adjustNA=TRUE)
#>  [1]  3  3  3  3  4  5  6  7  8  9 10  3
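The floor logic shown in these examples amounts to a few vectorized assignments. floor_values() below is a hypothetical base R sketch that reproduces the three outputs above; it is not noiseFloor() itself.

```r
# Hypothetical sketch, not jamba::noiseFloor(): replace values below a
# minimum (and optionally NA values) with a replacement value.
floor_values <- function(x, minimum = 0, newValue = minimum, adjustNA = FALSE) {
  below <- !is.na(x) & x < minimum   # NA values are left alone here
  x[below] <- newValue
  if (adjustNA) {
    x[is.na(x)] <- newValue          # optionally treat NA as below the floor
  }
  x
}

floor_values(0:10, minimum = 1e-20, newValue = NA)
floor_values(0:10, minimum = 3)
floor_values(c(0:10, NA), minimum = 3, adjustNA = TRUE)
```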

Practical / helpful

jargs(plotSmoothScatter)
#>                 x = ,
#>                 y = NULL,
#>              bwpi = 50,
#>             binpi = 50,
#>        bandwidthN = NULL,
#>              nbin = NULL,
#>            expand = c(0.04, 0.04),
#>       transFactor = 0.25,
#>    transformation = function( x ) x^transFactor,
#>              xlim = NULL,
#>              ylim = NULL,
#>              xlab = NULL,
#>              ylab = NULL,
#>          nrpoints = 0,
#>           colramp = c("white", "lightblue", "blue", "orange", "orangered2"),
#>               col = "black",
#>            doTest = FALSE,
#>    fillBackground = TRUE,
#>          naAction = c("remove", "floor0", "floor1"),
#>              xaxt = "s",
#>              yaxt = "s",
#>               add = FALSE,
#>               asp = NULL,
#> applyRangeCeiling = TRUE,
#>         useRaster = TRUE,
#>           verbose = FALSE,
#>               ... =

R console

RMarkdown

printDebugHtml("printDebugHtml(): ",
  "Output is colorized: ",
  head(LETTERS, 8))

(12:05:41) 07Mar2025: printDebugHtml(): Output is colorized: A,B,C,D,E,F,G,H


withr::with_options(list(jam.htmlOut=TRUE, jam.comment=FALSE), {
  printDebug(c("printDebug() using withr::with_options(): "),
    c("Output should be colorized: "),
    head(LETTERS, 8))
})

(12:05:41) 07Mar2025: printDebug() using withr::with_options(): Output should be colorized: A,B,C,D,E,F,G,H

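library(jamba)

# experiment design: 2 treatments x 2 genotypes x 3 replicates
expt_df <- data.frame(
  Sample_ID="",
  Treatment=rep(c("Vehicle", "Dex"), each=6),
  Genotype=rep(c("Wildtype", "Knockout"), each=3),
  Rep=paste0("rep", c(1:3)))
# combine Treatment, Genotype, and Rep into one sample label per row
expt_df$Sample_ID <- pasteByRow(expt_df[, 2:4])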

# define colors
colorSub <- c(Vehicle="palegoldenrod",
  Dex="navy",
  Wildtype="gold",
  Knockout="firebrick",
  nameVector(color2gradient("grey48", n=3, dex=10), rep("rep", 3), suffix=""),
  nameVector(
    color2gradient(n=3,
      c("goldenrod1", "indianred3", "royalblue3", "darkorchid4")),
    expt_df$Sample_ID))
kbl <- kable_coloring(
  expt_df,
  caption="Experiment design table showing categorical color assignment.",
  colorSub)

Jam GitHub R packages are being transitioned to CRAN/Bioconductor.
