nmathopencl is a developer library: it
ports R’s internal nmath (Mathlib) statistical math
functions to OpenCL so that downstream R packages can embed those
functions inside their own custom GPU kernels. The primary audience is
package authors who want GPU-accelerated computation
and need statistical math functions available on the device side —
without having to port the underlying nmath sources themselves.
A secondary audience is end users who want to call
distribution functions (dnorm, pgamma,
rbinom, …) directly on GPU hardware. The package exports
*_opencl wrappers for the full nmath family, but their main
role is validation: running them on large vectors
confirms that the OpenCL pipeline and GPU hardware are working before a
downstream package is built. For modest vector sizes the GPU often
performs no better than the CPU, because the cost of kernel compilation
and host-to-device data transfer dominates. Meaningful GPU acceleration
of individual nmath calls requires very large workloads.
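The size at which the GPU starts to pay off can be located empirically. The following is a minimal timing harness in base R, offered as a sketch rather than as part of the package: `compare_timings` is a hypothetical helper, and it takes both implementations as arguments so that no `*_opencl` signature is assumed. The usage lines pass `stats::dnorm` on both sides so the block runs on any machine; on a configured system the second argument would be a closure around the relevant `*_opencl` wrapper.

```r
# Hypothetical helper (not part of nmathopencl): time two implementations
# of the same vectorised function over a range of input sizes.
compare_timings <- function(cpu_fn, gpu_fn, sizes, reps = 3L) {
  rows <- lapply(sizes, function(n) {
    x <- seq(-3, 3, length.out = n)
    # Median elapsed time over a few repetitions smooths out timer noise.
    t_cpu <- median(replicate(reps, system.time(cpu_fn(x))[["elapsed"]]))
    t_gpu <- median(replicate(reps, system.time(gpu_fn(x))[["elapsed"]]))
    data.frame(n = n, cpu_sec = t_cpu, gpu_sec = t_gpu)
  })
  do.call(rbind, rows)
}

# Stand-in usage: both sides are stats::dnorm, so this only exercises
# the harness itself and says nothing about real GPU performance.
timings <- compare_timings(stats::dnorm, stats::dnorm, sizes = c(1e3, 1e5))
print(timings)
```

When a real GPU wrapper is timed this way, the first call at each size may include kernel compilation, so it is worth inspecting individual repetitions as well as the median.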
The real performance story is at the downstream package level. When
nmath calls are embedded inside larger GPU kernels — alongside
other expensive device-side operations such as the gradient and envelope
calculations in glmbayes — the GPU does the computation
without the round-trip transfer penalty, and substantial gains become
possible. The design here supports that pattern; the exported
*_opencl functions demonstrate it works.
OpenCL is vendor-neutral: the same kernels run on NVIDIA, AMD, and Intel hardware. CPU-only execution is always supported when no OpenCL stack is present, so the package is safe to list as a dependency even in environments that lack a GPU.
The package is organized in three layers, each corresponding to a set of vignettes:
+----------------------------------------------+
| Layer 3 --- Kernels (inst/cl/src/)           |
| __kernel functions for the R-callable API    |
+----------------------------------------------+
| Layer 2 --- nmath library (inst/cl/nmath/)   |
| Ported nmath/Rmath functions as device-side  |
| OpenCL C functions                           |
+----------------------------------------------+
| Layer 1 --- Upstream shims                   |
| (inst/cl/R_shims/, R_ext/, System/, ...)     |
| Type definitions, macros, and constants      |
| that replace C headers unavailable in        |
| OpenCL C                                     |
+----------------------------------------------+
Layer 1 is the foundation: it makes the rest of the ported code
compile under OpenCL’s restricted C99 dialect without modification to
the nmath sources. Layer 2 is the library: ~180 .cl files
implementing the full suite of Mathlib functions. Layer 3 is the API
surface: thin wrapper kernels that map a GPU work-item index to an
element of an input vector and call the appropriate Layer 2
function.
Downstream packages locate the Layer 2 sources at runtime with
system.file("cl", package = "nmathopencl") and assemble
them into their own OpenCL programs using
opencltools::load_kernel_library(..., package = "nmathopencl").
They own the kernel runners, R wrappers, and compilation lifecycle;
nmathopencl simply provides the portable math library they
build on.
See Chapter 03 for the detailed assembly model, including how the four components of a complete kernel program (global configuration header, shims, nmath subset, and kernel function) are concatenated and compiled at runtime.
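To make the four-component concatenation concrete, here is a minimal sketch in base R. It does not use nmathopencl or opencltools: it writes four tiny placeholder files into a temporary directory and joins them in dependency order. Every file name, the `demo_square` function, and the `demo_square_kernel` kernel are hypothetical, chosen only for illustration. In a real package the root directory would come from `system.file("cl", package = "nmathopencl")` and the assembly would be delegated to `opencltools::load_kernel_library()`.

```r
# Build a throwaway directory that mimics the layout described above.
# All file names and contents here are hypothetical placeholders.
cl_root <- file.path(tempdir(), "cl_demo")
dir.create(file.path(cl_root, "R_shims"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(cl_root, "nmath"), showWarnings = FALSE)
dir.create(file.path(cl_root, "src"), showWarnings = FALSE)

# 1. Global configuration header (double precision is an OpenCL extension).
writeLines("#pragma OPENCL EXTENSION cl_khr_fp64 : enable",
           file.path(cl_root, "config.h"))
# 2. Shim: a type definition the ported code relies on.
writeLines("typedef double demo_real;",
           file.path(cl_root, "R_shims", "shim.h"))
# 3. nmath subset: a device-side function.
writeLines("demo_real demo_square(demo_real x) { return x * x; }",
           file.path(cl_root, "nmath", "demo_square.cl"))
# 4. Kernel: maps one work-item index to one element of the input vector.
writeLines(c(
  "__kernel void demo_square_kernel(__global const double *x,",
  "                                 __global double *out, const int n) {",
  "  int i = get_global_id(0);",
  "  if (i < n) out[i] = demo_square(x[i]);",
  "}"
), file.path(cl_root, "src", "demo_square_kernel.cl"))

# Concatenate the four components in dependency order.
read_part <- function(...) {
  paste(readLines(file.path(cl_root, ...)), collapse = "\n")
}
program_source <- paste(
  read_part("config.h"),
  read_part("R_shims", "shim.h"),
  read_part("nmath", "demo_square.cl"),
  read_part("src", "demo_square_kernel.cl"),
  sep = "\n"
)
cat(program_source, "\n")
```

The order matters because OpenCL C, like C, needs each type and function declared before its first use: shims before the nmath code that uses them, and nmath code before the kernel that calls it.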
| Layer | Location | Purpose |
|---|---|---|
| nmathopencl | nmathopencl.h, kernel_runners.cpp, kernel_wrappers.cpp | Distribution-specific kernel runners and R-facing wrappers for all nmath functions |
| Internal OpenCL infrastructure | openclPort.h, opencl_kernel_runners.cpp | Generic kernel runner, error helpers, device probing, and kernel loading inside the DLL (see Chapter 09) |
| ex_glmbayes | ex_glmbayes_*.cpp/.h | Self-contained example showing how a downstream package (glmbayes) builds custom GLM kernels on top of the layers above |
Kernel authors whose packages declare LinkingTo: nmathopencl may include
openclPort.h directly; the internal runner layer is
documented in Chapter 09.
The exported *_opencl functions cover the full nmath
family and mirror the structure of base R’s stats
package:
| R file | Functions |
|---|---|
| normal_opencl.R | dnorm_opencl, pnorm_opencl, qnorm_opencl, rnorm_opencl |
| gamma_opencl.R | dgamma_opencl, pgamma_opencl, … |
| binomial_opencl.R | dbinom_opencl, pbinom_opencl, … |
| poisson_opencl.R | dpois_opencl, ppois_opencl, … |
| beta_opencl.R | dbeta_opencl, … |
| … | (and so on for all families) |
| special_opencl.R | lgammafn_opencl, gammafn_opencl, … |
| math_support_opencl.R | fmax2_opencl, fmin2_opencl, … |
Every function accepts a scalar parameter set, dispatches to the GPU
via the kernel infrastructure, and falls back to the corresponding
stats:: or base-R function if OpenCL is unavailable or if
the call fails. As noted above, these wrappers serve primarily as a
working demonstration of the GPU pipeline; they can show speedups at
very large vector sizes but are not the primary mechanism through which
downstream packages obtain GPU acceleration.
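Using the wrappers as a pipeline check amounts to comparing their output with the CPU reference on a large vector. The sketch below shows one way to do that in base R. `check_against_reference` is a hypothetical helper, not an nmathopencl function, and the candidate used here is a hand-written normal density standing in for a GPU wrapper so that the block runs without OpenCL. On a configured system the candidate would be a closure around an exported `*_opencl` function.

```r
# Hypothetical helper: compare a candidate implementation with a reference.
check_against_reference <- function(candidate_fn, reference_fn, x,
                                    tolerance = 1e-12) {
  got <- candidate_fn(x)
  want <- reference_fn(x)
  list(
    ok = isTRUE(all.equal(got, want, tolerance = tolerance)),
    max_abs_diff = max(abs(got - want))
  )
}

x <- seq(-5, 5, length.out = 1e4)
# Stand-in candidate: the standard normal density written out by hand.
candidate <- function(x) exp(-x^2 / 2) / sqrt(2 * pi)
result <- check_against_reference(candidate, stats::dnorm, x)
print(result)
```

The tolerance is a judgment call: device math libraries are not guaranteed to round identically to the host, so a real comparison may need a looser bound than the one used for this stand-in.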
library(nmathopencl)
# Compile-time OpenCL support in this nmathopencl build
nmathopencl_has_opencl()
# Same check for the imported opencltools dependency
opencltools::has_opencl()
# Host/runtime diagnostic report (opencltools)
opencltools::diagnose_glmbayes()

- nmathopencl_has_opencl() (nmathopencl): was this package built with OpenCL (-DUSE_OPENCL)?
- opencltools::has_opencl(): was the imported dependency built with OpenCL?
- opencltools::diagnose_glmbayes(): host/runtime report from opencltools.

Host and driver inventory functions (detect_environment_and_gpus(),
verify_opencl_runtime(), and related probes) live in
opencltools; use opencltools::… when calling them directly. All exported
*_opencl wrappers branch on nmathopencl_has_opencl() first; the fallback
argument then controls whether a failed OpenCL call is replaced with the
CPU path (it is ignored when OpenCL is absent at compile time).
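The control flow just described can be sketched as follows. This is an illustration of the pattern, not the package's actual wrapper code: `dnorm_dispatch` is a hypothetical name, and both the compile-time check and the GPU call are passed in as function arguments so the example runs without any OpenCL stack. In the package itself the check would be `nmathopencl_has_opencl()`.

```r
# Hypothetical wrapper skeleton. `has_opencl` and `gpu_fn` are injected so
# the sketch runs anywhere; the default `gpu_fn` simulates a failing device.
dnorm_dispatch <- function(x, mean = 0, sd = 1, fallback = TRUE,
                           has_opencl = function() FALSE,
                           gpu_fn = function(x, mean, sd) stop("no device")) {
  cpu <- function() stats::dnorm(x, mean = mean, sd = sd)
  # Built without OpenCL: always take the CPU path; `fallback` is ignored.
  if (!has_opencl()) return(cpu())
  tryCatch(
    gpu_fn(x, mean, sd),
    error = function(e) {
      # Built with OpenCL but the call failed: `fallback` decides the outcome.
      if (fallback) cpu() else stop(e)
    }
  )
}

x <- c(-1, 0, 1)
# No OpenCL at build time: CPU result even though fallback = FALSE.
no_opencl <- dnorm_dispatch(x, fallback = FALSE)
# OpenCL present but the device call fails: CPU result via fallback.
recovered <- dnorm_dispatch(x, has_opencl = function() TRUE)
```

With `has_opencl = function() TRUE` and `fallback = FALSE`, the same call re-raises the device error instead of silently switching to the CPU, which is usually what a test suite wants.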
See Chapter 01 for the step-by-step enablement path (attach messages, opencltools first, then source reinstall of nmathopencl).
Part 0: Overview
| Vignette | Topic |
|---|---|
| Chapter 00 (this document) | Package overview and architecture |
Part I: Getting Started
| Vignette | Topic |
|---|---|
| Chapter 01 | OpenCL enablement for nmathopencl (attach messages, opencltools dependency, source reinstall) |
| Chapter 02 | Adding USE_OPENCL and has_opencl() to your package: configure scripts, opencltools runtime relationship |
Part II: The Library and Program Model
| Vignette | Topic |
|---|---|
| Chapter 03 | Structure of nmath kernel programs: the four-layer assembly model |
| Chapter 04 | The nmath OpenCL library (inst/cl/nmath/): cycles, shims, and annotation |
Part III: Developer Guide
| Vignette | Topic |
|---|---|
| Chapter 05 | Kernels, kernel runners, and kernel wrappers: roles and interaction |
| Chapter 06 | Integrating kernel wrappers into your codebase: CPU fallbacks and R interfaces |
| Chapter 07 | Writing and annotating __kernel functions |
| Chapter 08 | Kernel loading: load_kernel_source and load_kernel_library |
| Chapter 09 | Generic OpenCL kernel runners: the openclPort C++ infrastructure |
| Chapter 10 | Case study: building custom GLM kernels (ex_glmbayes) |
| Chapter 11 | Testing, debugging, and benchmarking GPU kernels |
Part IV: The R API
| Vignette | Topic |
|---|---|
| Chapter 12 | The nmathopencl R API: distribution functions on the GPU |