Chapter 07: Kernels — Writing and Using OpenCL Kernel Files

Kjell Nygren

2026-06-11

What is a kernel?

An OpenCL kernel is the entry point that executes on the GPU. It is declared with the __kernel qualifier and is compiled at runtime (by the OpenCL driver) from a source string assembled by the host program. Each invocation of the kernel processes one work-item — typically one element of an output array — in parallel with all other work-items.

In nmathopencl, kernels live in inst/cl/src/. Each file defines exactly one __kernel void function. The file name follows the pattern <function>_kernel.cl; for example, dnorm_kernel.cl implements the vectorized density computation for the normal distribution.

Anatomy of a simple kernel

// dnorm_kernel.cl
// Vectorized wrapper kernel for the public Mathlib dnorm interface.

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void dnorm_kernel(
    __global const double* x,
    const double mu,
    const double sigma,
    const int give_log,
    __global double* out,
    const int n
) {
    int i = get_global_id(0);
    if (i >= n) return;

    out[i] = dnorm4(x[i], mu, sigma, give_log);
}

Key elements:

Element Purpose
#pragma OPENCL EXTENSION cl_khr_fp64 : enable Activates double-precision arithmetic (required for all nmathopencl kernels)
__kernel void Marks this function as a GPU entry point
__global const double* x Input buffer in device (global) memory
const double mu Scalar parameter passed by value
__global double* out Output buffer in device (global) memory
const int n Work-item count guard
get_global_id(0) Returns this work-item’s index in the first dimension
dnorm4(...) Calls the nmath library function, defined in the concatenated library source

The if (i >= n) return; guard handles the case where the global work size is rounded up to a multiple of the local work-group size, leaving excess work-items that must not write out-of-bounds.

Standard argument layout

All kernels in inst/cl/src/ follow a consistent argument layout designed to work with openclPort::opencl_dbl_scalar_kernel_runner:

kernel(double arg0, double arg1, ..., double argN,
       __global double* out, int n_out)

Scalar double parameters come first (in any order needed by the underlying nmath function), followed by the output buffer and its length. Input vectors are also represented as __global const double* and precede the scalar arguments when present.

The *_ex_kernel.cl variants

Some distributions expose a secondary interface. The bessel_*_ex_kernel.cl files, for example, provide a version that returns both the function value and a sign/exponent component simultaneously. These exist to mirror the extended interfaces in nmath (bessel_i vs bessel_i with expo argument) and are distinguished by the _ex_ infix.

Kernel file index

The ~130 kernel files in inst/cl/src/ cover every function exposed by the nmathopencl R API:

Density kernels

dbeta_kernel.cl, dbinom_kernel.cl, dbinom_raw_kernel.cl, dcauchy_kernel.cl, dchisq_kernel.cl, dexp_kernel.cl, df_kernel.cl, dgamma_kernel.cl, dgeom_kernel.cl, dhyper_kernel.cl, dlnorm_kernel.cl, dlogis_kernel.cl, dnbeta_kernel.cl, dnbinom_kernel.cl, dnbinom_mu_kernel.cl, dnchisq_kernel.cl, dnf_kernel.cl, dnorm_kernel.cl, dnt_kernel.cl, dpois_kernel.cl, dpois_raw_kernel.cl, dsignrank_kernel.cl, dt_kernel.cl, dunif_kernel.cl, dweibull_kernel.cl, dwilcox_kernel.cl

CDF kernels

pbeta_kernel.cl, pbinom_kernel.cl, pcauchy_kernel.cl, pchisq_kernel.cl, pexp_kernel.cl, pf_kernel.cl, pgamma_kernel.cl, pgeom_kernel.cl, phyper_kernel.cl, plnorm_kernel.cl, plogis_kernel.cl, pnbeta_kernel.cl, pnbinom_kernel.cl, pnbinom_mu_kernel.cl, pnchisq_kernel.cl, pnf_kernel.cl, pnorm_kernel.cl, pnt_kernel.cl, ppois_kernel.cl, psignrank_kernel.cl, pt_kernel.cl, ptukey_kernel.cl, punif_kernel.cl, pweibull_kernel.cl, pwilcox_kernel.cl

Quantile kernels

qbeta_kernel.cl, qbinom_kernel.cl, qcauchy_kernel.cl, qchisq_kernel.cl, qexp_kernel.cl, qf_kernel.cl, qgamma_kernel.cl, qgeom_kernel.cl, qhyper_kernel.cl, qlnorm_kernel.cl, qlogis_kernel.cl, qnbeta_kernel.cl, qnbinom_kernel.cl, qnbinom_mu_kernel.cl, qnchisq_kernel.cl, qnf_kernel.cl, qnorm_kernel.cl, qnt_kernel.cl, qpois_kernel.cl, qsignrank_kernel.cl, qt_kernel.cl, qtukey_kernel.cl, qunif_kernel.cl, qweibull_kernel.cl, qwilcox_kernel.cl

Random-variate kernels

rbeta_kernel.cl, rbinom_kernel.cl, rcauchy_kernel.cl, rchisq_kernel.cl, rexp_kernel.cl, rf_kernel.cl, rgamma_kernel.cl, rgeom_kernel.cl, rhyper_kernel.cl, rlnorm_kernel.cl, rlogis_kernel.cl, rmultinom_kernel.cl, rnbinom_kernel.cl, rnbinom_mu_kernel.cl, rnchisq_kernel.cl, rnorm_kernel.cl, rpois_kernel.cl, rsignrank_kernel.cl, rt_kernel.cl, runif_kernel.cl, rweibull_kernel.cl, rwilcox_kernel.cl

Special function kernels

beta_special_kernel.cl, choose_special_kernel.cl, lbeta_special_kernel.cl, lchoose_special_kernel.cl, gammafn_kernel.cl, lgammafn_kernel.cl, lgammafn_sign_kernel.cl, lgamma1p_kernel.cl, digamma_kernel.cl, trigamma_kernel.cl, psigamma_kernel.cl, pentagamma_kernel.cl, tetragamma_kernel.cl, dpsifn_kernel.cl

Math support kernels

fmax2_kernel.cl, fmin2_kernel.cl, fprec_kernel.cl, fround_kernel.cl, fsign_kernel.cl, ftrunc_kernel.cl, imax2_kernel.cl, imin2_kernel.cl, sign_kernel.cl, r_pow_kernel.cl, r_pow_di_kernel.cl, pow1p_kernel.cl, log1pmx_kernel.cl, log1pexp_kernel.cl, log1mexp_kernel.cl, logspace_add_kernel.cl, logspace_sub_kernel.cl, logspace_sum_kernel.cl

RNG core kernels

norm_rand_kernel.cl, unif_rand_kernel.cl, exp_rand_kernel.cl, r_unif_index_kernel.cl

Utility kernels

r_check_user_interrupt_kernel.cl, r_check_stack_kernel.cl

Bessel kernels (standard and extended)

bessel_i_kernel.cl, bessel_i_ex_kernel.cl, bessel_j_kernel.cl, bessel_j_ex_kernel.cl, bessel_k_kernel.cl, bessel_k_ex_kernel.cl, bessel_y_kernel.cl, bessel_y_ex_kernel.cl

Program assembly

Kernels are never compiled in isolation. The host-side runner assembles a complete OpenCL program string by concatenating:

  1. The nmath library source (via opencltools::load_kernel_library("nmath", package = "nmathopencl"))
  2. The kernel file (via opencltools::load_kernel_source("src/<name>_kernel.cl", package = "nmathopencl"))

This concatenated string is then passed to clBuildProgram. Because opencltools::load_kernel_library() performs topological sorting, the library functions are always defined before the kernel refers to them. Chapter 08 covers this assembly process in detail.

Writing a custom kernel

To write a kernel that calls an nmathopencl nmath function:

  1. Identify the nmath function you need (e.g., dgamma).
  2. Check its @provides tag in the corresponding .cl file to confirm the exact C signature.
  3. Write a __kernel void function following the standard layout.
  4. Load the nmath library subset your function needs using the @all_depends metadata as a guide (see Chapter 08).
  5. Append your kernel source after the library source in the assembled program string.

Chapter 10 works through this process end-to-end for the glmbayes GLM log-posterior kernels.

mirror server hosted at Truenetwork, Russian Federation.