---
title: "Chapter 04: The nmath OpenCL Library"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Chapter 04: The nmath OpenCL Library}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## What is `inst/cl/nmath/`?

The `nmath/` subdirectory under `inst/cl/` contains a complete OpenCL C
port of R's `nmath` (Mathlib) library --- approximately 180 `.cl` files,
one per source file in R's `src/nmath/`. Together they implement the full
suite of statistical distribution functions (densities, CDFs, quantiles,
random-variate generators) and supporting mathematical helpers that make up
R's statistical computing core.

The port is designed so that any downstream package can load a **subset** of
this library --- just the functions it actually needs --- without pulling in the
entire collection. The annotation scheme and dependency resolver (described
in Chapter 08) make this automatic.

## Annotation scheme

Every `.cl` file in `nmath/` begins with a standard metadata block:

```c
// @source_type: c
// @source_origin: dnorm.c
// @includes: nmath.h, dpq.h
// @depends: nmath, dpq
// @provides: dnorm4
// @all_depends_count: 3
// @all_depends: dpq, Rmath, nmath
// @load_order: 36
```

| Tag | Meaning |
|-----|---------|
| `@source_type` | Always `c` for nmath ports |
| `@source_origin` | The original C source file in R's `src/nmath/` |
| `@includes` | Headers included in the original C file |
| `@depends` | **Direct** file dependencies (`.cl` filenames without extension) |
| `@provides` | Functions defined in this file |
| `@all_depends` | Full transitive closure of dependencies (union over the DAG) |
| `@all_depends_count` | Number of entries in `@all_depends` |
| `@load_order` | Pre-computed position in the topologically sorted order |

The `@depends` tag is what `opencltools::load_kernel_library()` uses to sort files. The
`@all_depends` tag is informational --- it documents the complete dependency
set so that a developer extracting a minimal subset knows exactly which files
to include.

## File families

The ~180 files are organized by function family. The table below shows
representative entries for the main families.

### Infrastructure / header files

| File | Provides |
|------|----------|
| `nmath.cl` | Library header: constants, macros, error codes |
| `Rmath.cl` | Forward declarations for all Mathlib functions |
| `dpq.cl` | `R_D__0`, `R_D__1`, `R_DT_0`, `R_DT_1`, `R_D_Lval`, ... (log-prob helpers) |
| `refactored.cl` | Forward declarations for cycle-broken functions |

### Core math helpers

| File | Provides |
|------|----------|
| `chebyshev.cl` | `chebyshev_init`, `chebyshev_eval` |
| `cospi.cl` | `cospi`, `sinpi`, `tanpi` |
| `fmax2.cl` | `fmax2` |
| `fmin2.cl` | `fmin2` |
| `gammalims.cl` | `gammalims` |
| `lgammacor.cl` | `lgammacor` |
| `lgamma.cl` | `lgammafn`, `lgammafn_sign` |
| `gamma.cl` | `gammafn` |
| `stirlerr.cl` | `stirlerr` |
| `bd0.cl` | `bd0`, `ebd0` |
| `pgamma_utils.cl` | `log1pmx`, `lgamma1p` |
| `mlutils.cl` | `log1pexp`, `log1mexp`, `logspace_add`, `logspace_sub` |
| `expm1.cl` | `expm1` (if not an OpenCL built-in) |
| `log1p.cl` | `log1p` (if not an OpenCL built-in) |

### Density functions (`d*`)

| File | Provides |
|------|----------|
| `dnorm.cl` | `dnorm4` |
| `dgamma.cl` | `dgamma` |
| `dbinom.cl` | `dbinom_raw`, `dbinom` |
| `dpois.cl` | `dpois_raw`, `dpois` |
| `dbeta.cl` | `dbeta` |
| `dt.cl` | `dt` |
| `df.cl` | `df` |
| `dnbinom.cl` | `dnbinom`, `dnbinom_mu` |
| `dchisq.cl` | `dchisq` |
| `dexp.cl` | `dexp` |
| `dweibull.cl` | `dweibull` |
| `dlnorm.cl` | `dlnorm` |
| `dlogis.cl` | `dlogis` |
| `dcauchy.cl` | `dcauchy` |
| `dunif.cl` | `dunif` |
| `dhyper.cl` | `dhyper` |
| `dgeom.cl` | `dgeom` |
| `dnt.cl` | `dnt` (noncentral t) |
| `dnf.cl` | `dnf` (noncentral F) |
| `dnchisq.cl` | `dnchisq` (noncentral chi-sq) |
| `dnbeta.cl` | `dnbeta` (noncentral beta) |

### CDF functions (`p*`)

Similar coverage for `pnorm.cl`, `pgamma.cl`, `pbinom.cl`, `ppois.cl`,
`pbeta.cl`, `pt.cl`, `pf.cl`, etc. Each file ports the corresponding
`p<dist>.c` from R's source.

### Quantile functions (`q*`)

`qnorm.cl`, `qgamma.cl`, `qbinom.cl`, `qpois.cl`, `qbeta.cl`, `qt.cl`,
`qf.cl`, etc. Discrete quantile functions use a shared binary-search
helper in `qDiscrete_search.cl`.

### Random-variate generators (`r*`)

`rnorm.cl`, `rgamma.cl`, `rbinom.cl`, `rpois.cl`, `rbeta.cl`, `rt.cl`,
`runif.cl`, etc. These files define the device-side generator logic; the
actual RNG seed and state management is handled by the host-side kernel
runner.

### Special functions

| File | Provides |
|------|----------|
| `beta.cl` | `beta` (beta function) |
| `lbeta.cl` | `lbeta` |
| `choose.cl` | `choose`, `lchoose` |
| `polygamma.cl` | `psigamma`, `digamma`, `trigamma`, ... |
| `toms708.cl` | Incomplete beta (TOMS 708 algorithm) |
| `bessel.cl` | `bessel_i`, `bessel_j`, `bessel_k`, `bessel_y` |
| `signrank.cl` | `dsignrank`, `psignrank`, `qsignrank`, `rsignrank` |
| `wilcox.cl` | `dwilcox`, `pwilcox`, `qwilcox`, `rwilcox` |

### Additional utilities

| File | Provides |
|------|----------|
| `fprec.cl`, `fround.cl`, `ftrunc.cl`, `fsign.cl` | Rounding/truncation |
| `imax2.cl`, `imin2.cl` | Integer min/max |
| `sign.cl` | `sign` |
| `d1mach.cl`, `i1mach.cl` | Machine constants (BLAS-era helpers) |
| `r_check_user_interrupt.cl` | Stub for interrupt checking |
| `sexp.cl`, `snorm.cl`, `sunif.cl` | Alternative RNG implementations |

## Cycle-breaking artifacts

OpenCL compiles a program as a **single translation unit**: every function
must be defined before it is called (no forward-declaration headers, no
separate compilation). A handful of the nmath C files are mutually
recursive in ways that C handles via separate compilation but OpenCL cannot.

`nmathopencl` resolves these cycles by splitting the offending source file
into two `.cl` files:

```
stirlerr.c  ->  stirlerr_cycle_free.cl      (no forward references)
               stirlerr_cycle_dependent.cl  (calls the forward-declared part)
               refactored.cl               (forward declarations only)
```

`refactored.cl` contains `extern`-style forward declarations so that the
cycle-dependent portion can reference symbols that appear later in the
concatenated source. The `@depends` tags ensure the three files are loaded
in the correct order: `refactored` -> `stirlerr_cycle_free` ->
`stirlerr_cycle_dependent` -> `stirlerr`.

Similarly, `bessel_j` and `bessel_y` are each split into `*_cycle_free.cl`
and `*_cycle_dependent.cl` for the same reason.

## The `ex_glmbayes_nmath/` subset

Because the full `nmath/` library is large (~180 files), the package also
ships a pre-curated minimal subset in `inst/cl/ex_glmbayes_nmath/` ---
exactly the 20 files needed by the six `glmbayes` kernels:

```
Rmath.cl, nmath.cl, refactored.cl,
chebyshev.cl, cospi.cl, fmax2.cl, gammalims.cl, lgammacor.cl,
gamma.cl, lgamma.cl,
stirlerr_cycle_free.cl, pgamma_utils.cl, stirlerr_cycle_dependent.cl,
stirlerr.cl, bd0.cl,
dpois.cl, dgamma.cl, dnorm.cl, pnorm.cl, dbinom.cl
```

This subset is determined by the `@all_depends` metadata in the six kernel
files. Chapter 10 shows how to perform this extraction for a new downstream
package.

## Porting fidelity

The port aims for bit-for-bit equivalence with the CPU nmath results on
IEEE 754 hardware. The main sources of deviation are:

1. **Compiler flags**: the OpenCL compiler may reorder floating-point
   operations unless `-cl-fp64-round-to-nearest` is specified. The
   `OpenCLConfig` struct in `openclPort.h` probes the device and sets
   appropriate build flags.
2. **Built-in precision**: some OpenCL devices implement `sqrt`, `exp`, and
   `log` with 1 ULP error (IEEE 754 guarantees exactly rounded); others
   allow slightly more. In practice, deviations are below numerical
   tolerance for all statistical applications.
3. **OpenCL built-ins**: functions like `lgamma`, `tgamma`, `expm1`, and
   `log1p` are part of the OpenCL 1.1+ specification and do **not** need to
   be ported. The corresponding `.cl` files include `#ifndef` guards so
   that the ported version is only compiled if the built-in is absent.
