Chapter 04: The nmath OpenCL Library

Kjell Nygren

2026-06-11

What is inst/cl/nmath/?

The nmath/ subdirectory under inst/cl/ contains a complete OpenCL C port of R’s nmath (Mathlib) library — approximately 180 .cl files, one per source file in R’s src/nmath/. Together they implement the full suite of statistical distribution functions (densities, CDFs, quantiles, random-variate generators) and supporting mathematical helpers that make up R’s statistical computing core.

The port is designed so that any downstream package can load a subset of this library — just the functions it actually needs — without pulling in the entire collection. The annotation scheme and dependency resolver (described in Chapter 08) make this automatic.

Annotation scheme

Every .cl file in nmath/ begins with a standard metadata block:

// @source_type: c
// @source_origin: dnorm.c
// @includes: nmath.h, dpq.h
// @depends: nmath, dpq
// @provides: dnorm4
// @all_depends_count: 3
// @all_depends: dpq, Rmath, nmath
// @load_order: 36
Tag Meaning
@source_type Always c for nmath ports
@source_origin The original C source file in R’s src/nmath/
@includes Headers included in the original C file
@depends Direct file dependencies (.cl filenames without extension)
@provides Functions defined in this file
@all_depends Full transitive closure of dependencies (union over the DAG)
@all_depends_count Number of entries in @all_depends
@load_order Pre-computed position in the topologically sorted order

The @depends tag is what opencltools::load_kernel_library() uses to sort files. The @all_depends tag is informational — it documents the complete dependency set so that a developer extracting a minimal subset knows exactly which files to include.

File families

The ~180 files are organized by function family. The table below shows representative entries for the main families.

Infrastructure / header files

File Provides
nmath.cl Library header: constants, macros, error codes
Rmath.cl Forward declarations for all Mathlib functions
dpq.cl R_D__0, R_D__1, R_DT_0, R_DT_1, R_D_Lval, … (log-prob helpers)
refactored.cl Forward declarations for cycle-broken functions

Core math helpers

File Provides
chebyshev.cl chebyshev_init, chebyshev_eval
cospi.cl cospi, sinpi, tanpi
fmax2.cl fmax2
fmin2.cl fmin2
gammalims.cl gammalims
lgammacor.cl lgammacor
lgamma.cl lgammafn, lgammafn_sign
gamma.cl gammafn
stirlerr.cl stirlerr
bd0.cl bd0, ebd0
pgamma_utils.cl log1pmx, lgamma1p
mlutils.cl log1pexp, log1mexp, logspace_add, logspace_sub
expm1.cl expm1 (if not an OpenCL built-in)
log1p.cl log1p (if not an OpenCL built-in)

Density functions (d*)

File Provides
dnorm.cl dnorm4
dgamma.cl dgamma
dbinom.cl dbinom_raw, dbinom
dpois.cl dpois_raw, dpois
dbeta.cl dbeta
dt.cl dt
df.cl df
dnbinom.cl dnbinom, dnbinom_mu
dchisq.cl dchisq
dexp.cl dexp
dweibull.cl dweibull
dlnorm.cl dlnorm
dlogis.cl dlogis
dcauchy.cl dcauchy
dunif.cl dunif
dhyper.cl dhyper
dgeom.cl dgeom
dnt.cl dnt (noncentral t)
dnf.cl dnf (noncentral F)
dnchisq.cl dnchisq (noncentral chi-sq)
dnbeta.cl dnbeta (noncentral beta)

CDF functions (p*)

Similar coverage for pnorm.cl, pgamma.cl, pbinom.cl, ppois.cl, pbeta.cl, pt.cl, pf.cl, etc. Each file ports the corresponding p<dist>.c from R’s source.

Quantile functions (q*)

qnorm.cl, qgamma.cl, qbinom.cl, qpois.cl, qbeta.cl, qt.cl, qf.cl, etc. Discrete quantile functions use a shared binary-search helper in qDiscrete_search.cl.

Random-variate generators (r*)

rnorm.cl, rgamma.cl, rbinom.cl, rpois.cl, rbeta.cl, rt.cl, runif.cl, etc. These files define the device-side generator logic; the actual RNG seed and state management is handled by the host-side kernel runner.

Special functions

File Provides
beta.cl beta (beta function)
lbeta.cl lbeta
choose.cl choose, lchoose
polygamma.cl psigamma, digamma, trigamma, …
toms708.cl Incomplete beta (TOMS 708 algorithm)
bessel.cl bessel_i, bessel_j, bessel_k, bessel_y
signrank.cl dsignrank, psignrank, qsignrank, rsignrank
wilcox.cl dwilcox, pwilcox, qwilcox, rwilcox

Additional utilities

File Provides
fprec.cl, fround.cl, ftrunc.cl, fsign.cl Rounding/truncation
imax2.cl, imin2.cl Integer min/max
sign.cl sign
d1mach.cl, i1mach.cl Machine constants (BLAS-era helpers)
r_check_user_interrupt.cl Stub for interrupt checking
sexp.cl, snorm.cl, sunif.cl Alternative RNG implementations

Cycle-breaking artifacts

OpenCL compiles a program as a single translation unit: every function must be defined before it is called (no forward-declaration headers, no separate compilation). A handful of the nmath C files are mutually recursive in ways that C handles via separate compilation but OpenCL cannot.

nmathopencl resolves these cycles by splitting the offending source file into two .cl files:

stirlerr.c  ->  stirlerr_cycle_free.cl      (no forward references)
               stirlerr_cycle_dependent.cl  (calls the forward-declared part)
               refactored.cl               (forward declarations only)

refactored.cl contains extern-style forward declarations so that the cycle-dependent portion can reference symbols that appear later in the concatenated source. The @depends tags ensure the three files are loaded in the correct order: refactored -> stirlerr_cycle_free -> stirlerr_cycle_dependent -> stirlerr.

Similarly, bessel_j and bessel_y are each split into *_cycle_free.cl and *_cycle_dependent.cl for the same reason.

The ex_glmbayes_nmath/ subset

Because the full nmath/ library is large (~180 files), the package also ships a pre-curated minimal subset in inst/cl/ex_glmbayes_nmath/ — exactly the 20 files needed by the six glmbayes kernels:

Rmath.cl, nmath.cl, refactored.cl,
chebyshev.cl, cospi.cl, fmax2.cl, gammalims.cl, lgammacor.cl,
gamma.cl, lgamma.cl,
stirlerr_cycle_free.cl, pgamma_utils.cl, stirlerr_cycle_dependent.cl,
stirlerr.cl, bd0.cl,
dpois.cl, dgamma.cl, dnorm.cl, pnorm.cl, dbinom.cl

This subset is determined by the @all_depends metadata in the six kernel files. Chapter 10 shows how to perform this extraction for a new downstream package.

Porting fidelity

The port aims for bit-for-bit equivalence with the CPU nmath results on IEEE 754 hardware. The main sources of deviation are:

  1. Compiler flags: the OpenCL compiler may reorder floating-point operations unless -cl-fp64-round-to-nearest is specified. The OpenCLConfig struct in openclPort.h probes the device and sets appropriate build flags.
  2. Built-in precision: some OpenCL devices implement sqrt, exp, and log with 1 ULP error (IEEE 754 guarantees exactly rounded); others allow slightly more. In practice, deviations are below numerical tolerance for all statistical applications.
  3. OpenCL built-ins: functions like lgamma, tgamma, expm1, and log1p are part of the OpenCL 1.1+ specification and do not need to be ported. The corresponding .cl files include #ifndef guards so that the ported version is only compiled if the built-in is absent.