Chapter 12: The nmathopencl R API — Distribution Functions on the GPU

Kjell Nygren

2026-06-11

Overview

nmathopencl exports a family of R functions that mirror R’s stats:: distribution functions but dispatch computation to an OpenCL device when one is available. The calling convention is intentionally close to the base-R equivalents so that substitution is straightforward.

All exported functions share three common arguments:

Argument Type Default Meaning
fallback logical FALSE If TRUE while nmathopencl_has_opencl() is TRUE, fall back to the stats:: analogue when the OpenCL call fails; ignored when OpenCL is unavailable (CPU is used anyway)
verbose logical FALSE Extra diagnostics (including when picking the CPU path with no OpenCL runtime)
log / lower.tail logical distribution-specific Mirror the stats:: convention for the same function

Checking availability

library(nmathopencl)

# TRUE if this nmathopencl build was compiled with USE_OPENCL
nmathopencl_has_opencl()

# GPU device names on the host (opencltools --- system inventory)
opencltools::gpu_names()

If OpenCL is not available (nmathopencl_has_opencl() returns FALSE), calls use the CPU equivalent unconditionally so examples and CI keep working.

By default fallback = FALSE: when OpenCL is available but a dispatch fails, the error propagates—useful while debugging kernels. Pass fallback = TRUE only when you deliberately want GPU failures masked with the CPU analogue.

Normal distribution

x  <- rnorm(1e6)

# Density
d  <- dnorm_opencl(x, mean = 0, sd = 1, log = FALSE)

# CDF --- same core arguments as stats::pnorm(q, mean, sd, lower.tail, log.p);
#          plus opencl_parallel, fallback, verbose. Long outputs use recycling length.
p <- pnorm_opencl(
  rep(1.96, 1e6),
  mean = 0,
  sd = 1,
  lower.tail = TRUE,
  log.p = FALSE
)

# Quantile (`qnorm_opencl` retains leading `n` + scalar `p`, unlike `stats::qnorm`-style vectors)
q  <- qnorm_opencl(n = 1e6, p = 0.975, mean = 0, sd = 1)

# Random draws
r  <- rnorm_opencl(n = 1e6, mean = 0, sd = 1)

Note the signature difference between dnorm_opencl (takes a vector x) and qnorm_opencl / rnorm_opencl, which retain a leading n argument for replicate-many quantiles (qnorm) or draws (rnorm). For the CDF, pnorm_opencl parallels stats::pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) and adds only opencl_parallel, fallback, and verbose. Output length is the usual recycled length (max(length(q), ...), as in stats). On GPU (when used), each output element runs pnorm_kernel once in sequence (high overhead until a batched kernel is added).

Distribution families

Gamma

dgamma_opencl(n, x, shape = 2, scale = 1)
pgamma_opencl(q, shape = 2, rate = 1)
rgamma_opencl(n, shape = 2, scale = 1)
# qgamma: use stats::qgamma(); the OpenCL kernel is internal-only
# (device compile failure, see inst/OPENCL_KERNEL_KNOWN_FAILURES.md)

Binomial

dbinom_opencl(x, size = 10, prob = 0.3)
pbinom_opencl(q, size = 10, prob = 0.3)
qbinom_opencl(p, size = 10, prob = 0.3)
rbinom_opencl(n, size = 10, prob = 0.3)

Poisson

dpois_opencl(x, lambda = 3)
ppois_opencl(q, lambda = 3)
qpois_opencl(p, lambda = 3)
rpois_opencl(n, lambda = 3)

Beta

dbeta_opencl(x, shape1 = 2, shape2 = 5)
pbeta_opencl(q, shape1 = 2, shape2 = 5)
rbeta_opencl(n, shape1 = 2, shape2 = 5)
# qbeta: use stats::qbeta(); the OpenCL kernel is internal-only
# (device link failure, see inst/OPENCL_KERNEL_KNOWN_FAILURES.md)

Additional families

All families below follow the same four-function pattern (d*, p*, q*, r*) with the same fallback / verbose arguments:

Family R file
Cauchy cauchy_opencl.R
Chi-squared chisq_opencl.R
Exponential exponential_opencl.R
F f_opencl.R
Geometric geometric_opencl.R
Hypergeometric hypergeometric_opencl.R
Log-normal lnorm_opencl.R
Logistic logistic_opencl.R
Negative binomial negative_binomial_opencl.R
t t_opencl.R
Uniform uniform_opencl.R
Weibull weibull_opencl.R
Multivariate discrete multinomial_opencl.R
Tukey tukey_opencl.R

The Wilcoxon rank-sum (rwilcox_opencl.R), Wilcoxon signed-rank (signrank_opencl.R), and Bessel (bessel_opencl.R) wrappers are internal-only: their OpenCL ports have documented allocation/runtime failures (see inst/OPENCL_KERNEL_KNOWN_FAILURES.md and the notes in inst/cl/). Use the stats/base equivalents instead.

Noncentral distributions

Noncentral variants (dnt_opencl, dnchisq_opencl, dnf_opencl, dnbeta_opencl, and their p*, q* counterparts) are available with the noncentrality parameter ncp.

Special functions and math support

# Log-gamma
lgammafn_opencl(n, x)
lgammafn_sign_opencl(n, x)

# Gamma function
gammafn_opencl(n, x)

# Log-gamma at x+1
lgamma1p_opencl(n, x)

# Digamma / polygamma
digamma_opencl(n, x)
trigamma_opencl(n, x)
psigamma_opencl(n, x, deriv)

# Beta and log-beta
beta_special_opencl(n, a, b)
lbeta_special_opencl(n, a, b)

# Binomial coefficients
choose_special_opencl(n_out, n, k)
lchoose_special_opencl(n_out, n, k)

# Logspace arithmetic
logspace_add_opencl(n, logx, logy)
logspace_sub_opencl(n, logx, logy)
logspace_sum_opencl(n, logx, logy)

# Math support
fmax2_opencl(n, x, y)
fmin2_opencl(n, x, y)

These are exported via special_opencl.R and math_support_opencl.R.

R extension utilities (R_ext)

# R-compatible power
r_pow_opencl(n, x, y)
r_pow_di_opencl(n, x, i)

# Miscellaneous math helpers
log1pmx_opencl(n, x)
log1pexp_opencl(n, x)
log1mexp_opencl(n, x)
pow1p_opencl(n, x, y)

These are exported via rext_utils_opencl.R.

RNG core

# Base RNG draws
norm_rand_opencl(n)
unif_rand_opencl(n)
exp_rand_opencl(n)
r_unif_index_opencl(n, dt)

These generate random samples using device-side implementations of the same Kinderman-Ramage / Ahrens-Dieter algorithms used by R’s internal RNG. They are exported via rng_core_opencl.R.

Fallback behavior in detail

  1. nmathopencl_has_opencl() is FALSE (no runtime device / not compiled with OpenCL): the wrapper always uses the CPU stats::/base analogue. The fallback argument does not gate this; it is ignored for this branch.

  2. nmathopencl_has_opencl() is TRUE, but OpenCL enqueue/compile/bind fails inside the .Call: the outer R shim catches the failure. fallback = FALSE (default): propagate the error. fallback = TRUE: return the CPU analogue instead (optional masking for fragile stacks).

Runnable examples under inst/examples/ mostly use fallback = FALSE so R CMD check surfaces real OpenCL breakage when a device is present.

# Default --- surface GPU/build errors during development / local check with OpenCL enabled:
dnorm_opencl(x)

# Permit CPU masking while OpenCL is flaky (not the development default):
dnorm_opencl(x, fallback = TRUE, verbose = TRUE)

Performance notes

GPU acceleration is most beneficial for large vectors (>= ~10,000 elements) and for functions with non-trivial per-element computation (e.g., pgamma, pt, qnorm). For very small inputs the kernel compilation overhead and data transfer latency dominate.

The first call in an R session may be slightly slower because OpenCL JIT- compiles the kernel source for the specific device. Subsequent calls with the same kernel reuse the compiled program from the driver’s cache.

mirror server hosted at Truenetwork, Russian Federation.