An OpenCL kernel is the entry point that executes on
the GPU. It is declared with the __kernel qualifier and is
compiled at runtime (by the OpenCL driver) from a source string
assembled by the host program. Each invocation of the kernel processes
one work-item — typically one element of an output array — in parallel
with all other work-items.
In nmathopencl, kernels live in
inst/cl/src/. Each file defines exactly one
__kernel void function. The file name follows the pattern
<function>_kernel.cl; for example,
dnorm_kernel.cl implements the vectorized density
computation for the normal distribution.
// dnorm_kernel.cl
// Vectorized wrapper kernel for the public Mathlib dnorm interface.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void dnorm_kernel(
__global const double* x,
const double mu,
const double sigma,
const int give_log,
__global double* out,
const int n
) {
int i = get_global_id(0);
if (i >= n) return;
out[i] = dnorm4(x[i], mu, sigma, give_log);
}Key elements:
| Element | Purpose |
|---|---|
#pragma OPENCL EXTENSION cl_khr_fp64 : enable |
Activates double-precision arithmetic (required for all
nmathopencl kernels) |
__kernel void |
Marks this function as a GPU entry point |
__global const double* x |
Input buffer in device (global) memory |
const double mu |
Scalar parameter passed by value |
__global double* out |
Output buffer in device (global) memory |
const int n |
Work-item count guard |
get_global_id(0) |
Returns this work-item’s index in the first dimension |
dnorm4(...) |
Calls the nmath library function, defined in the concatenated library source |
The if (i >= n) return; guard handles the case where
the global work size is rounded up to a multiple of the local work-group
size, leaving excess work-items that must not write out-of-bounds.
All kernels in inst/cl/src/ follow a consistent argument
layout designed to work with
openclPort::opencl_dbl_scalar_kernel_runner:
kernel(double arg0, double arg1, ..., double argN,
__global double* out, int n_out)
Scalar double parameters come first (in any order needed
by the underlying nmath function), followed by the output buffer and its
length. Input vectors are also represented as
__global const double* and precede the scalar arguments
when present.
*_ex_kernel.cl variantsSome distributions expose a secondary interface. The
bessel_*_ex_kernel.cl files, for example, provide a version
that returns both the function value and a sign/exponent component
simultaneously. These exist to mirror the extended interfaces in nmath
(bessel_i vs bessel_i with expo
argument) and are distinguished by the _ex_ infix.
The ~130 kernel files in inst/cl/src/ cover every
function exposed by the nmathopencl R API:
dbeta_kernel.cl, dbinom_kernel.cl,
dbinom_raw_kernel.cl, dcauchy_kernel.cl,
dchisq_kernel.cl, dexp_kernel.cl,
df_kernel.cl, dgamma_kernel.cl,
dgeom_kernel.cl, dhyper_kernel.cl,
dlnorm_kernel.cl, dlogis_kernel.cl,
dnbeta_kernel.cl, dnbinom_kernel.cl,
dnbinom_mu_kernel.cl, dnchisq_kernel.cl,
dnf_kernel.cl, dnorm_kernel.cl,
dnt_kernel.cl, dpois_kernel.cl,
dpois_raw_kernel.cl, dsignrank_kernel.cl,
dt_kernel.cl, dunif_kernel.cl,
dweibull_kernel.cl, dwilcox_kernel.cl
pbeta_kernel.cl, pbinom_kernel.cl,
pcauchy_kernel.cl, pchisq_kernel.cl,
pexp_kernel.cl, pf_kernel.cl,
pgamma_kernel.cl, pgeom_kernel.cl,
phyper_kernel.cl, plnorm_kernel.cl,
plogis_kernel.cl, pnbeta_kernel.cl,
pnbinom_kernel.cl, pnbinom_mu_kernel.cl,
pnchisq_kernel.cl, pnf_kernel.cl,
pnorm_kernel.cl, pnt_kernel.cl,
ppois_kernel.cl, psignrank_kernel.cl,
pt_kernel.cl, ptukey_kernel.cl,
punif_kernel.cl, pweibull_kernel.cl,
pwilcox_kernel.cl
qbeta_kernel.cl, qbinom_kernel.cl,
qcauchy_kernel.cl, qchisq_kernel.cl,
qexp_kernel.cl, qf_kernel.cl,
qgamma_kernel.cl, qgeom_kernel.cl,
qhyper_kernel.cl, qlnorm_kernel.cl,
qlogis_kernel.cl, qnbeta_kernel.cl,
qnbinom_kernel.cl, qnbinom_mu_kernel.cl,
qnchisq_kernel.cl, qnf_kernel.cl,
qnorm_kernel.cl, qnt_kernel.cl,
qpois_kernel.cl, qsignrank_kernel.cl,
qt_kernel.cl, qtukey_kernel.cl,
qunif_kernel.cl, qweibull_kernel.cl,
qwilcox_kernel.cl
rbeta_kernel.cl, rbinom_kernel.cl,
rcauchy_kernel.cl, rchisq_kernel.cl,
rexp_kernel.cl, rf_kernel.cl,
rgamma_kernel.cl, rgeom_kernel.cl,
rhyper_kernel.cl, rlnorm_kernel.cl,
rlogis_kernel.cl, rmultinom_kernel.cl,
rnbinom_kernel.cl, rnbinom_mu_kernel.cl,
rnchisq_kernel.cl, rnorm_kernel.cl,
rpois_kernel.cl, rsignrank_kernel.cl,
rt_kernel.cl, runif_kernel.cl,
rweibull_kernel.cl, rwilcox_kernel.cl
beta_special_kernel.cl,
choose_special_kernel.cl,
lbeta_special_kernel.cl,
lchoose_special_kernel.cl, gammafn_kernel.cl,
lgammafn_kernel.cl, lgammafn_sign_kernel.cl,
lgamma1p_kernel.cl, digamma_kernel.cl,
trigamma_kernel.cl, psigamma_kernel.cl,
pentagamma_kernel.cl, tetragamma_kernel.cl,
dpsifn_kernel.cl
fmax2_kernel.cl, fmin2_kernel.cl,
fprec_kernel.cl, fround_kernel.cl,
fsign_kernel.cl, ftrunc_kernel.cl,
imax2_kernel.cl, imin2_kernel.cl,
sign_kernel.cl, r_pow_kernel.cl,
r_pow_di_kernel.cl, pow1p_kernel.cl,
log1pmx_kernel.cl, log1pexp_kernel.cl,
log1mexp_kernel.cl, logspace_add_kernel.cl,
logspace_sub_kernel.cl,
logspace_sum_kernel.cl
norm_rand_kernel.cl, unif_rand_kernel.cl,
exp_rand_kernel.cl, r_unif_index_kernel.cl
r_check_user_interrupt_kernel.cl,
r_check_stack_kernel.cl
bessel_i_kernel.cl, bessel_i_ex_kernel.cl,
bessel_j_kernel.cl, bessel_j_ex_kernel.cl,
bessel_k_kernel.cl, bessel_k_ex_kernel.cl,
bessel_y_kernel.cl, bessel_y_ex_kernel.cl
Kernels are never compiled in isolation. The host-side runner assembles a complete OpenCL program string by concatenating:
opencltools::load_kernel_library("nmath", package = "nmathopencl"))opencltools::load_kernel_source("src/<name>_kernel.cl", package = "nmathopencl"))This concatenated string is then passed to
clBuildProgram. Because
opencltools::load_kernel_library() performs topological
sorting, the library functions are always defined before the kernel
refers to them. Chapter 08 covers this assembly process in detail.
To write a kernel that calls an nmathopencl nmath
function:
dgamma).@provides tag in the corresponding
.cl file to confirm the exact C signature.__kernel void function following the standard
layout.@all_depends metadata as a guide (see Chapter 08).Chapter 10 works through this process end-to-end for the
glmbayes GLM log-posterior kernels.