Chapter 00: nmathopencl — Package Overview

What is `nmathopencl`?

nmathopencl is a developer library: it ports R’s internal nmath (Mathlib) statistical math functions to OpenCL so that downstream R packages can embed those functions inside their own custom GPU kernels. The primary audience is package authors who want GPU-accelerated computation and need statistical math functions available on the device side — without having to port the underlying nmath sources themselves.

A secondary audience is end users who want to call distribution functions (dnorm, pgamma, rbinom, …) directly on GPU hardware. The package exports *_opencl wrappers for the full nmath family, but their main role is validation: running them on large vectors confirms that the OpenCL pipeline and GPU hardware are working before a downstream package is built. For modest vector sizes the GPU often performs no better than the CPU, because the cost of kernel compilation and host-to-device data transfer dominates. Meaningful GPU acceleration of individual nmath calls requires very large workloads.

The real performance story is at the downstream package level. When nmath calls are embedded inside larger GPU kernels — alongside other expensive device-side operations such as the gradient and envelope calculations in glmbayes — the GPU does the computation without the round-trip transfer penalty, and substantial gains become possible. The design here supports that pattern; the exported *_opencl functions demonstrate it works.

OpenCL is vendor-neutral: the same kernels run on NVIDIA, AMD, and Intel hardware. When no OpenCL stack is present, the package still works through its CPU-only paths, so it is safe to list as a dependency even in environments that lack a GPU.

Three-layer architecture

The package is organized in three layers, each corresponding to a set of vignettes:

+------------------------------------------------+
|  Layer 3: Kernels  (inst/cl/src/)              |
|  __kernel functions for the R-callable API     |
+------------------------------------------------+
|  Layer 2: nmath library  (inst/cl/nmath/)      |
|  Ported nmath/Rmath functions as device-side   |
|  OpenCL C functions                            |
+------------------------------------------------+
|  Layer 1: Upstream shims                       |
|  (inst/cl/R_shims/, R_ext/, System/, ...)      |
|  Type definitions, macros, and constants       |
|  that replace C headers unavailable in         |
|  OpenCL C                                      |
+------------------------------------------------+

Layer 1 is the foundation: it makes the rest of the ported code compile under OpenCL’s restricted C99 dialect without modification to the nmath sources. Layer 2 is the library: ~180 .cl files implementing the full suite of Mathlib functions. Layer 3 is the API surface: thin wrapper kernels that map a GPU work-item index to an element of an input vector and call the appropriate Layer 2 function.

Downstream packages locate the Layer 2 sources at runtime with system.file("cl", package = "nmathopencl") and assemble them into their own OpenCL programs using opencltools::load_kernel_library(..., package = "nmathopencl"). They own the kernel runners, R wrappers, and compilation lifecycle; nmathopencl simply provides the portable math library they build on.

See Chapter 03 for the detailed assembly model, including how the four components of a complete kernel program (global configuration header, shims, nmath subset, and kernel function) are concatenated and compiled at runtime.
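As a rough illustration of that assembly step, the following C sketch joins four source fragments in dependency order into a single program string. It is an assumption-laden stand-in, not the loader the package ships: the function name `assemble_program` is hypothetical, and the real loaders may resolve dependencies, deduplicate files, or pass fragments separately. A host would typically hand the resulting string to the OpenCL API calls `clCreateProgramWithSource` and `clBuildProgram`.

```c
#include <stdlib.h>
#include <string.h>

/* Concatenate the four program components in dependency order:
 * configuration header, shims, nmath subset, kernel function.
 * Each fragment is followed by a newline so that a trailing
 * preprocessor line cannot run into the next fragment.
 * Returns a malloc'd NUL-terminated string (caller frees),
 * or NULL on a missing fragment or allocation failure. */
char *assemble_program(const char *config_header, const char *shims,
                       const char *nmath_subset, const char *kernel_fn)
{
    const char *parts[4] = { config_header, shims, nmath_subset, kernel_fn };
    size_t total = 1; /* room for the terminating NUL */

    for (int i = 0; i < 4; ++i) {
        if (parts[i] == NULL)
            return NULL;
        total += strlen(parts[i]) + 1; /* +1 for the separating newline */
    }

    char *src = malloc(total);
    if (src == NULL)
        return NULL;

    char *p = src;
    for (int i = 0; i < 4; ++i) {
        size_t len = strlen(parts[i]);
        memcpy(p, parts[i], len);
        p += len;
        *p++ = '\n';
    }
    *p = '\0';
    return src;
}
```

Order matters because OpenCL C, like C, requires a name to be declared before use: the kernel must come after the nmath functions it calls, which in turn must come after the shims they rely on.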

C++ layout inside the package DLL

| Layer | Location | Purpose |
| --- | --- | --- |
| `nmathopencl` | `nmathopencl.h`, `kernel_runners.cpp`, `kernel_wrappers.cpp` | Distribution-specific kernel runners and R-facing wrappers for all nmath functions |
| Internal OpenCL infrastructure | `openclPort.h`, `opencl_kernel_runners.cpp` | Generic kernel runner, error helpers, device probing, and kernel loading inside the DLL (see Chapter 09) |
| `ex_glmbayes` | `ex_glmbayes_*.cpp/.h` | Self-contained example showing how a downstream package (`glmbayes`) builds custom GLM kernels on top of the layers above |

Kernel authors whose packages declare LinkingTo: nmathopencl may include openclPort.h directly; the internal runner layer is documented in Chapter 09.

Related packages

nmathopencl is part of a small suite of cooperating packages:

| Package | Role | Typical entry points |
| --- | --- | --- |
| `nmathopencl` (this package) | OpenCL-ported Mathlib, `*_opencl` validation API, kernel loaders, package-local device selection | `nmathopencl_has_opencl()`, `load_kernel_*`, `dnorm_opencl()` |
| `opencltools` (CRAN) | Host/runtime diagnostics and kernel-library authoring tools | `detect_environment_and_gpus()`, `verify_opencl_runtime()`, `load_library_for_kernel()`, `diagnose_glmbayes()` (opencltools-only report) |
| `glmbayes` (CRAN) | End-user Bayesian GLMs with optional GPU paths | `glmb()`, `use_opencl = TRUE` |

nmathopencl Imports opencltools (>= 0.8.0). Host inventory, driver/ICD checks, and PATH validation are delegated to opencltools; compile-time OpenCL status for this package’s DLL stays local via nmathopencl_has_opencl(). Host/runtime probes (detect_*, PATH helpers, gpu_names, and related functions) are not re-exported from nmathopencl — call opencltools::… directly. Kernel-library authoring helpers (load_library_for_kernel, extract_library_subset, and related tagging tools) are re-exported for downstream kernel authors.

For OpenCL setup and enablement, start with Chapter 01 (attach messages and the nmathopencl-specific enablement path) and opencltools vignette Chapter 01 (platform install details).

R-side API families

The exported *_opencl functions cover the full nmath family and mirror the structure of base R’s stats package:

| R file | Functions |
| --- | --- |
| `normal_opencl.R` | `dnorm_opencl`, `pnorm_opencl`, `qnorm_opencl`, `rnorm_opencl` |
| `gamma_opencl.R` | `dgamma_opencl`, `pgamma_opencl`, … |
| `binomial_opencl.R` | `dbinom_opencl`, `pbinom_opencl`, … |
| `poisson_opencl.R` | `dpois_opencl`, `ppois_opencl`, … |
| `beta_opencl.R` | `dbeta_opencl`, … |
| … | (and so on for all families) |
| `special_opencl.R` | `lgammafn_opencl`, `gammafn_opencl`, … |
| `math_support_opencl.R` | `fmax2_opencl`, `fmin2_opencl`, … |

Every function accepts a scalar parameter set, dispatches to the GPU via the kernel infrastructure, and uses the corresponding stats:: or base-R function when OpenCL is unavailable. When an OpenCL call fails, the fallback argument (see Checking OpenCL availability) controls whether the CPU path takes over. As noted above, these wrappers serve primarily as a working demonstration of the GPU pipeline; they can show speedups at very large vector sizes but are not the primary mechanism through which downstream packages obtain GPU acceleration.

Checking OpenCL availability

library(nmathopencl)

# Compile-time OpenCL support in this nmathopencl build
nmathopencl_has_opencl()

# Same check for the imported opencltools dependency
opencltools::has_opencl()

# Host/runtime diagnostic report (opencltools)
opencltools::diagnose_glmbayes()

nmathopencl_has_opencl() (nmathopencl) — was this package built with OpenCL (-DUSE_OPENCL)?
opencltools::has_opencl() — was the imported dependency built with OpenCL?
opencltools::diagnose_glmbayes() — host/runtime report from opencltools.

Host and driver inventory functions (detect_environment_and_gpus(), verify_opencl_runtime(), and related probes) live in opencltools; use opencltools::… when calling them directly. All exported *_opencl wrappers branch on nmathopencl_has_opencl() first. When the package was built with OpenCL, the fallback argument controls whether a failed OpenCL call is replaced with the CPU path. When OpenCL is absent at compile time, the CPU path is used and fallback is ignored.

See Chapter 01 for the step-by-step enablement path (attach messages, opencltools first, then source reinstall of nmathopencl).

Vignette guide

Part 0: Overview

| Vignette | Topic |
| --- | --- |
| Chapter 00 (this document) | Package overview and architecture |

Part I: Getting Started

| Vignette | Topic |
| --- | --- |
| Chapter 01 | OpenCL enablement for `nmathopencl` (attach messages, `opencltools` dependency, source reinstall) |
| Chapter 02 | Adding `USE_OPENCL` and `has_opencl()` to your package: `configure` scripts, `opencltools` runtime relationship |

Part II: The Library and Program Model

| Vignette | Topic |
| --- | --- |
| Chapter 03 | Structure of `nmath` kernel programs: the four-layer assembly model |
| Chapter 04 | The `nmath` OpenCL library (`inst/cl/nmath/`): cycles, shims, and annotation |

Part III: Developer Guide

| Vignette | Topic |
| --- | --- |
| Chapter 05 | Kernels, kernel runners, and kernel wrappers: roles and interaction |
| Chapter 06 | Integrating kernel wrappers into your codebase: CPU fallbacks and R interfaces |
| Chapter 07 | Writing and annotating `__kernel` functions |
| Chapter 08 | Kernel loading: `load_kernel_source` and `load_kernel_library` |
| Chapter 09 | Generic OpenCL kernel runners: the `openclPort` C++ infrastructure |
| Chapter 10 | Case study: building custom GLM kernels (`ex_glmbayes`) |
| Chapter 11 | Testing, debugging, and benchmarking GPU kernels |

Part IV: The R API

| Vignette | Topic |
| --- | --- |
| Chapter 12 | The `nmathopencl` R API: distribution functions on the GPU |

Kjell Nygren

2026-07-15
