Chapter 00: nmathopencl — Package Overview

Kjell Nygren

2026-06-11

What is nmathopencl?

nmathopencl is a developer library: it ports R’s internal nmath (Mathlib) statistical math functions to OpenCL so that downstream R packages can embed those functions inside their own custom GPU kernels. The primary audience is package authors who want GPU-accelerated computation and need statistical math functions available on the device side — without having to port the underlying nmath sources themselves.

A secondary audience is end users who want to call distribution functions (dnorm, pgamma, rbinom, …) directly on GPU hardware. The package exports *_opencl wrappers for the full nmath family, but their main role is validation: running them on large vectors confirms that the OpenCL pipeline and GPU hardware are working before a downstream package is built. For modest vector sizes the GPU often performs no better than the CPU, because the cost of kernel compilation and host-to-device data transfer dominates. Meaningful GPU acceleration of individual nmath calls requires very large workloads.
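The crossover point is easy to probe empirically. The sketch below times stats::dnorm on a large vector and, only when nmathopencl is installed with OpenCL support, times the GPU wrapper as well. The helper time_call() is hypothetical, and the call dnorm_opencl(x) assumes the wrapper takes the same leading argument as stats::dnorm; treat it as a starting point rather than a definitive benchmark.

```r
# Hypothetical helper: median elapsed seconds over a few repetitions.
time_call <- function(f, reps = 5L) {
  stats::median(vapply(seq_len(reps), function(i) {
    system.time(f())[["elapsed"]]
  }, numeric(1)))
}

x <- seq(-4, 4, length.out = 1e6)

# CPU reference timing using base R.
cpu_time <- time_call(function() stats::dnorm(x))

# GPU timing is attempted only when the package and OpenCL support are present.
gpu_time <- NA_real_
if (requireNamespace("nmathopencl", quietly = TRUE) &&
    nmathopencl::nmathopencl_has_opencl()) {
  # Assumption: dnorm_opencl(x) mirrors the leading argument of stats::dnorm(x).
  gpu_time <- time_call(function() nmathopencl::dnorm_opencl(x))
}

print(c(cpu = cpu_time, gpu = gpu_time))
```

Repeating the comparison over a range of vector lengths shows where, if anywhere, the GPU path overtakes the CPU on a given machine.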

The real performance story is at the downstream package level. When nmath calls are embedded inside larger GPU kernels — alongside other expensive device-side operations such as the gradient and envelope calculations in glmbayes — the GPU does the computation without the round-trip transfer penalty, and substantial gains become possible. The design here supports that pattern; the exported *_opencl functions demonstrate it works.

OpenCL is vendor-neutral: the same kernels run on NVIDIA, AMD, and Intel hardware. When no OpenCL stack is present, the package falls back to CPU-only execution, so it is safe to list as a dependency even in environments that lack a GPU.

Three-layer architecture

The package is organized in three layers, each corresponding to a set of vignettes:

+------------------------------------------------+
|  Layer 3: Kernels  (inst/cl/src/)              |
|  __kernel functions for the R-callable API     |
+------------------------------------------------+
|  Layer 2: nmath library  (inst/cl/nmath/)      |
|  Ported nmath/Rmath functions as device-side   |
|  OpenCL C functions                            |
+------------------------------------------------+
|  Layer 1: Upstream shims                       |
|  (inst/cl/R_shims/, R_ext/, System/, ...)      |
|  Type definitions, macros, and constants       |
|  that replace C headers unavailable in         |
|  OpenCL C                                      |
+------------------------------------------------+

Layer 1 is the foundation: it makes the rest of the ported code compile under OpenCL’s restricted C99 dialect without modification to the nmath sources. Layer 2 is the library: ~180 .cl files implementing the full suite of Mathlib functions. Layer 3 is the API surface: thin wrapper kernels that map a GPU work-item index to an element of an input vector and call the appropriate Layer 2 function.

Downstream packages locate the Layer 2 sources at runtime with system.file("cl", package = "nmathopencl") and assemble them into their own OpenCL programs using opencltools::load_kernel_library(..., package = "nmathopencl"). They own the kernel runners, R wrappers, and compilation lifecycle; nmathopencl simply provides the portable math library they build on.
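As a minimal sketch of the lookup step, the following uses only base R to find the installed sources. It assumes the nmath files sit in a nmath subdirectory of the installed cl directory, matching the inst/cl/nmath/ layout described above, and it does not call opencltools::load_kernel_library(), whose full argument list is not shown in this chapter.

```r
# Root of the installed OpenCL sources. system.file() returns "" when the
# package (or the requested directory) is not installed.
cl_root <- system.file("cl", package = "nmathopencl")

if (nzchar(cl_root)) {
  # Assumption: Layer 2 sources live under <cl_root>/nmath as .cl files.
  nmath_files <- list.files(file.path(cl_root, "nmath"),
                            pattern = "\\.cl$", full.names = TRUE)
  cat("Found", length(nmath_files), "nmath .cl files under", cl_root, "\n")
} else {
  nmath_files <- character(0)
  cat("nmathopencl sources not found; is the package installed?\n")
}
```

Checking nzchar() on the result keeps the downstream package usable on machines where nmathopencl is absent.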

See Chapter 03 for the detailed assembly model, including how the four components of a complete kernel program (global configuration header, shims, nmath subset, and kernel function) are concatenated and compiled at runtime.
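The concatenation idea can be illustrated with plain strings. In the sketch below every name (assemble_program, demo_dnorm, demo_dnorm_kernel) is hypothetical and the OpenCL C fragments are toy stand-ins written for this example, not files shipped in inst/cl/. It shows only the ordering of the four components and how a kernel maps one work-item to one vector element.

```r
# Hypothetical helper: join the four components in dependency order.
assemble_program <- function(config, shims, nmath, kernel) {
  paste(c(config, shims, nmath, kernel), collapse = "\n")
}

# 1. Global configuration header (toy example: enable double precision).
config <- "#pragma OPENCL EXTENSION cl_khr_fp64 : enable"

# 2. Shim: a constant that a C header would normally supply.
shims <- "#define M_1_SQRT_2PI 0.398942280401432677939946059934"

# 3. nmath subset: a toy device-side standard normal density.
nmath <- paste(
  "double demo_dnorm(double x) {",
  "  return M_1_SQRT_2PI * exp(-0.5 * x * x);",
  "}",
  sep = "\n"
)

# 4. Kernel: each work-item handles the element at its global index.
kernel <- paste(
  "__kernel void demo_dnorm_kernel(__global const double *x,",
  "                                __global double *out,",
  "                                const int n) {",
  "  size_t i = get_global_id(0);",
  "  if (i < (size_t)n) out[i] = demo_dnorm(x[i]);",
  "}",
  sep = "\n"
)

program_src <- assemble_program(config, shims, nmath, kernel)
cat(program_src, "\n")
```

The order matters because OpenCL C, like C, requires macros and functions to be defined before the code that uses them.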

C++ layout inside the package DLL

Layer | Location | Purpose
nmathopencl | nmathopencl.h, kernel_runners.cpp, kernel_wrappers.cpp | Distribution-specific kernel runners and R-facing wrappers for all nmath functions
Internal OpenCL infrastructure | openclPort.h, opencl_kernel_runners.cpp | Generic kernel runner, error helpers, device probing, and kernel loading inside the DLL (see Chapter 09)
ex_glmbayes | ex_glmbayes_*.cpp/.h | Self-contained example showing how a downstream package (glmbayes) builds custom GLM kernels on top of the layers above

Kernel authors whose packages declare LinkingTo: nmathopencl may include openclPort.h directly; the internal runner layer is documented in Chapter 09.

R-side API families

The exported *_opencl functions cover the full nmath family and mirror the structure of base R’s stats package:

R file | Functions
normal_opencl.R | dnorm_opencl, pnorm_opencl, qnorm_opencl, rnorm_opencl
gamma_opencl.R | dgamma_opencl, pgamma_opencl, …
binomial_opencl.R | dbinom_opencl, pbinom_opencl, …
poisson_opencl.R | dpois_opencl, ppois_opencl, …
beta_opencl.R | dbeta_opencl, …
(and so on for all families)
special_opencl.R | lgammafn_opencl, gammafn_opencl, …
math_support_opencl.R | fmax2_opencl, fmin2_opencl, …

Every function accepts a scalar parameter set, dispatches to the GPU via the kernel infrastructure, and falls back to the corresponding stats:: or base-R function if OpenCL is unavailable or if the call fails. As noted above, these wrappers serve primarily as a working demonstration of the GPU pipeline; they can show speedups at very large vector sizes but are not the primary mechanism through which downstream packages obtain GPU acceleration.

Checking OpenCL availability

library(nmathopencl)

# Compile-time OpenCL support in this nmathopencl build
nmathopencl_has_opencl()

# Same check for the imported opencltools dependency
opencltools::has_opencl()

# Host/runtime diagnostic report (opencltools)
opencltools::diagnose_glmbayes()

Host and driver inventory functions (detect_environment_and_gpus(), verify_opencl_runtime(), and related probes) live in opencltools; use opencltools::… when calling them directly. All exported *_opencl wrappers branch on nmathopencl_has_opencl() first. The fallback argument then controls whether a failed OpenCL call is replaced with the CPU path; it is ignored when OpenCL is absent at compile time, because the CPU path is then the only one available.
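The branching just described can be sketched in a few lines of base R. The helper call_with_fallback() and its arguments are hypothetical and written for illustration; this is not the package's internal code, only one way to express the same control flow.

```r
# Hypothetical helper mirroring the described branching.
call_with_fallback <- function(gpu_fn, cpu_fn, has_opencl, fallback = TRUE) {
  if (!has_opencl) {
    # No compile-time OpenCL support: the CPU path is the only option,
    # so the fallback argument is ignored.
    return(cpu_fn())
  }
  tryCatch(
    gpu_fn(),
    error = function(e) {
      # With fallback disabled, the OpenCL failure is re-signalled.
      if (!fallback) stop(e)
      warning("OpenCL call failed, using CPU path: ", conditionMessage(e))
      cpu_fn()
    }
  )
}

x <- c(-1, 0, 1)
cpu_dnorm <- function() stats::dnorm(x)
failing_gpu <- function() stop("device not available")  # simulated failure

# OpenCL present but the call fails: the CPU result is returned with a warning.
res <- suppressWarnings(
  call_with_fallback(failing_gpu, cpu_dnorm, has_opencl = TRUE)
)
print(res)
```

Passing fallback = FALSE in the same call would surface the simulated error instead, which is useful when testing that the GPU path itself works.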

See Chapter 01 for the step-by-step enablement path (attach messages, opencltools first, then source reinstall of nmathopencl).

Vignette guide

Part 0: Overview

Vignette | Topic
Chapter 00 (this document) | Package overview and architecture

Part I: Getting Started

Vignette | Topic
Chapter 01 | OpenCL enablement for nmathopencl (attach messages, opencltools dependency, source reinstall)
Chapter 02 | Adding USE_OPENCL and has_opencl() to your package: configure scripts, opencltools runtime relationship

Part II: The Library and Program Model

Vignette | Topic
Chapter 03 | Structure of nmath kernel programs: the four-layer assembly model
Chapter 04 | The nmath OpenCL library (inst/cl/nmath/): cycles, shims, and annotation

Part III: Developer Guide

Vignette | Topic
Chapter 05 | Kernels, kernel runners, and kernel wrappers: roles and interaction
Chapter 06 | Integrating kernel wrappers into your codebase: CPU fallbacks and R interfaces
Chapter 07 | Writing and annotating __kernel functions
Chapter 08 | Kernel loading: load_kernel_source and load_kernel_library
Chapter 09 | Generic OpenCL kernel runners: the openclPort C++ infrastructure
Chapter 10 | Case study: building custom GLM kernels (ex_glmbayes)
Chapter 11 | Testing, debugging, and benchmarking GPU kernels

Part IV: The R API

Vignette | Topic
Chapter 12 | The nmathopencl R API: distribution functions on the GPU
