nmathopencl exports a family of R functions that mirror
R’s stats:: distribution functions but dispatch computation
to an OpenCL device when one is available. The calling convention is
intentionally close to the base-R equivalents so that substitution is
straightforward.
All exported functions share three common arguments:
| Argument | Type | Default | Meaning |
|---|---|---|---|
fallback |
logical | FALSE |
If TRUE while nmathopencl_has_opencl() is
TRUE, fall back to the stats:: analogue when
the OpenCL call fails; ignored when OpenCL is unavailable (CPU is used
anyway) |
verbose |
logical | FALSE |
Extra diagnostics (including when picking the CPU path with no OpenCL runtime) |
log / lower.tail |
logical | distribution-specific | Mirror the stats:: convention for the same
function |
library(nmathopencl)
# TRUE if this nmathopencl build was compiled with USE_OPENCL
nmathopencl_has_opencl()
# GPU device names on the host (opencltools --- system inventory)
opencltools::gpu_names()If OpenCL is not available (nmathopencl_has_opencl()
returns FALSE), calls use the CPU equivalent
unconditionally so examples and CI keep working.
By default fallback = FALSE: when OpenCL
is available but a dispatch fails, the error
propagates—useful while debugging kernels. Pass
fallback = TRUE only when you deliberately want GPU
failures masked with the CPU analogue.
x <- rnorm(1e6)
# Density
d <- dnorm_opencl(x, mean = 0, sd = 1, log = FALSE)
# CDF --- same core arguments as stats::pnorm(q, mean, sd, lower.tail, log.p);
# plus opencl_parallel, fallback, verbose. Long outputs use recycling length.
p <- pnorm_opencl(
rep(1.96, 1e6),
mean = 0,
sd = 1,
lower.tail = TRUE,
log.p = FALSE
)
# Quantile (`qnorm_opencl` retains leading `n` + scalar `p`, unlike `stats::qnorm`-style vectors)
q <- qnorm_opencl(n = 1e6, p = 0.975, mean = 0, sd = 1)
# Random draws
r <- rnorm_opencl(n = 1e6, mean = 0, sd = 1)Note the signature difference between dnorm_opencl
(takes a vector x) and qnorm_opencl /
rnorm_opencl, which retain a leading n
argument for replicate-many quantiles (qnorm) or draws
(rnorm). For the CDF, pnorm_opencl parallels
stats::pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
and adds only opencl_parallel, fallback, and
verbose. Output length is the usual recycled length
(max(length(q), ...), as in stats). On GPU (when used),
each output element runs pnorm_kernel once in sequence
(high overhead until a batched kernel is added).
All families below follow the same four-function pattern
(d*, p*, q*, r*)
with the same fallback / verbose
arguments:
| Family | R file |
|---|---|
| Cauchy | cauchy_opencl.R |
| Chi-squared | chisq_opencl.R |
| Exponential | exponential_opencl.R |
| F | f_opencl.R |
| Geometric | geometric_opencl.R |
| Hypergeometric | hypergeometric_opencl.R |
| Log-normal | lnorm_opencl.R |
| Logistic | logistic_opencl.R |
| Negative binomial | negative_binomial_opencl.R |
| t | t_opencl.R |
| Uniform | uniform_opencl.R |
| Weibull | weibull_opencl.R |
| Multivariate discrete | multinomial_opencl.R |
| Tukey | tukey_opencl.R |
The Wilcoxon rank-sum (rwilcox_opencl.R), Wilcoxon
signed-rank (signrank_opencl.R), and Bessel
(bessel_opencl.R) wrappers are internal-only: their OpenCL
ports have documented allocation/runtime failures (see
inst/OPENCL_KERNEL_KNOWN_FAILURES.md and the notes in
inst/cl/). Use the stats/base
equivalents instead.
Noncentral variants (dnt_opencl,
dnchisq_opencl, dnf_opencl,
dnbeta_opencl, and their p*, q*
counterparts) are available with the noncentrality parameter
ncp.
# Log-gamma
lgammafn_opencl(n, x)
lgammafn_sign_opencl(n, x)
# Gamma function
gammafn_opencl(n, x)
# Log-gamma at x+1
lgamma1p_opencl(n, x)
# Digamma / polygamma
digamma_opencl(n, x)
trigamma_opencl(n, x)
psigamma_opencl(n, x, deriv)
# Beta and log-beta
beta_special_opencl(n, a, b)
lbeta_special_opencl(n, a, b)
# Binomial coefficients
choose_special_opencl(n_out, n, k)
lchoose_special_opencl(n_out, n, k)
# Logspace arithmetic
logspace_add_opencl(n, logx, logy)
logspace_sub_opencl(n, logx, logy)
logspace_sum_opencl(n, logx, logy)
# Math support
fmax2_opencl(n, x, y)
fmin2_opencl(n, x, y)These are exported via special_opencl.R and
math_support_opencl.R.
R_ext)# R-compatible power
r_pow_opencl(n, x, y)
r_pow_di_opencl(n, x, i)
# Miscellaneous math helpers
log1pmx_opencl(n, x)
log1pexp_opencl(n, x)
log1mexp_opencl(n, x)
pow1p_opencl(n, x, y)These are exported via rext_utils_opencl.R.
# Base RNG draws
norm_rand_opencl(n)
unif_rand_opencl(n)
exp_rand_opencl(n)
r_unif_index_opencl(n, dt)These generate random samples using device-side implementations of
the same Kinderman-Ramage / Ahrens-Dieter algorithms used by R’s
internal RNG. They are exported via rng_core_opencl.R.
nmathopencl_has_opencl() is
FALSE (no runtime device / not compiled with
OpenCL): the wrapper always uses the CPU stats::/base
analogue. The fallback argument does not gate
this; it is ignored for this branch.
nmathopencl_has_opencl() is
TRUE, but OpenCL enqueue/compile/bind fails inside
the .Call: the outer R shim catches the failure.
fallback = FALSE (default): propagate the
error. fallback = TRUE: return the CPU
analogue instead (optional masking for fragile stacks).
Runnable examples under inst/examples/ mostly use
fallback = FALSE so
R CMD check surfaces real OpenCL breakage when a device is
present.
GPU acceleration is most beneficial for large
vectors (>= ~10,000 elements) and for functions with
non-trivial per-element computation (e.g., pgamma,
pt, qnorm). For very small inputs the kernel
compilation overhead and data transfer latency dominate.
The first call in an R session may be slightly slower because OpenCL JIT- compiles the kernel source for the specific device. Subsequent calls with the same kernel reuse the compiled program from the driver’s cache.