Chapter 08: Kernel Loading — load_kernel_source and load_kernel_library

Kjell Nygren

2026-06-11

The assembly problem

OpenCL compiles a single string of source code for each program. The string must be self-contained: all functions used by the kernel must be defined in the same string, in order, before they are called. There are no separate compilation units and no linker.

When a kernel needs dnorm4, and dnorm4 itself needs R_D__0 and R_finite, those definitions must precede dnorm4 in the string, which must precede the kernel body. opencltools provides two R functions that assemble and sort source components; use package = "nmathopencl" for this package’s inst/cl/ tree.

opencltools::load_kernel_source()

opencltools::load_kernel_source(relative_path, package = "nmathopencl")

Reads and returns the contents of a single .cl file from inst/cl/ inside the named package. The relative_path is relative to inst/cl/:

# Load the dnorm kernel entry-point file
kernel_src <- opencltools::load_kernel_source("src/dnorm_kernel.cl",
                                              package = "nmathopencl")

# Load a custom kernel from another package's inst/cl/
my_kernel   <- opencltools::load_kernel_source("ex_glmbayes_src/f2_f3_binomial_logit.cl",
                                   package = "nmathopencl")

The returned character string can be concatenated with a library string and passed to the C++ runner.

opencltools::load_kernel_library()

opencltools::load_kernel_library(subdir, package = "nmathopencl", verbose = FALSE)

Reads all .cl files from inst/cl/<subdir>/, parses the @provides / @depends annotations in each file, performs a topological sort, and returns a single concatenated source string with files in dependency-correct order.

# Load the complete nmath library (all ~180 files)
nmath_src <- opencltools::load_kernel_library("nmath", package = "nmathopencl")

# Load a smaller subset pre-curated for glmbayes kernels
nmath_glm <- opencltools::load_kernel_library("ex_glmbayes_nmath",
                                              package = "nmathopencl")

# Load with diagnostic output to trace the sort order
nmath_src <- opencltools::load_kernel_library("nmath",
                                              package = "nmathopencl",
                                              verbose = TRUE)

The annotation format

For opencltools::load_kernel_library() to sort correctly, every .cl file in the subdirectory must begin with an annotation block:

// @provides: symbol1, symbol2
// @depends:  fileA, fileB
// @includes: library_name
Tag Required? Meaning
@provides Yes Comma-separated list of function (or symbol) names defined in this file
@depends Yes (may be empty) Comma-separated list of .cl filenames (without extension) that must appear earlier in the output
@includes Optional Library grouping label; used for diagnostics only

Rules for @depends:

Example for dgamma.cl:

// @provides: dgamma
// @depends:  nmath, dpois, lgamma
// @includes: nmath

This tells the resolver: emit nmath.cl, dpois.cl, and lgamma.cl before dgamma.cl. The resolver handles those files’ own dependencies recursively.

The topological sort algorithm

opencltools::load_kernel_library() uses a straightforward iterative topological sort:

  1. Parse all .cl files and extract their @provides and @depends tags.
  2. Build a set of “unsorted” files. Initialize the “sorted” set to empty.
  3. In each pass, find every unsorted file whose @depends entries are all already in the sorted set. Emit those files and add them to the sorted set.
  4. Repeat until all files are sorted or no progress is made.
  5. If no progress is made but unsorted files remain, throw: "Dependency sort failed: unresolved dependencies remain."

Files with an empty @depends list are emitted first (pass 1). These are always the library header files.

Verbose output

When verbose = TRUE, the function prints each pass of the sort:

Pass 1: emitting nmath.cl, Rmath.cl, dpq.cl
Pass 2: emitting chebyshev.cl, cospi.cl, fmax2.cl, gammalims.cl
Pass 3: emitting lgammacor.cl
Pass 4: emitting lgamma.cl, gamma.cl
...

This is invaluable for debugging missing or circular @depends tags.

Cycle detection and breaking

If file A lists file B in its @depends and file B lists file A, the resolver will detect this on the first pass where neither can be promoted, and throw an error. If verbose = TRUE, it prints the stuck files before throwing.

The standard fix is to split one of the mutually dependent files into two:

Chapter 04 describes the three specific cycles in the nmath library and how they are resolved.

Assembling a complete program

A complete OpenCL program string is assembled by concatenating library and kernel sources in order:

library(nmathopencl)

# 1. Load the nmath library (sorted automatically)
nmath_src  <- opencltools::load_kernel_library("nmath", package = "nmathopencl")

# 2. Load the kernel entry-point
kernel_src <- opencltools::load_kernel_source("src/dnorm_kernel.cl",
                                              package = "nmathopencl")

# 3. Concatenate into a single program string
program    <- paste(nmath_src, kernel_src, sep = "\n")

This program string is then passed to the C++ runner (see Chapter 09).

Using libraries from another package

The package argument tells opencltools::load_kernel_source() and opencltools::load_kernel_library() which package’s inst/cl/ to look in:

# Load the curated nmath subset shipped inside nmathopencl
# for use by a downstream package's kernels
nmath_sub <- opencltools::load_kernel_library("ex_glmbayes_nmath",
                                  package = "nmathopencl")

# Load a custom kernel from the same downstream package
my_kernel <- opencltools::load_kernel_source("ex_glmbayes_src/f2_f3_binomial_logit.cl",
                                 package = "nmathopencl")

program <- paste(nmath_sub, my_kernel, sep = "\n")

This is exactly the pattern used by f2_f3_opencl() in the ex_glmbayes example (see Chapter 10).

Dependency index — faster, kernel-specific loading

opencltools::load_kernel_library() always sorts and loads all files in the subdirectory. For a large library like nmath (137 files) this is thorough but wasteful when only a small subset is needed for a specific kernel.

nmathopencl therefore ships a pre-built dependency index alongside every curated library subdirectory. The index encodes the full transitive dependency map and the global load order, enabling two important optimizations:

  1. load_library_for_kernel() (R) — loads only the files required by one specific kernel, with no sort at runtime.
  2. load_library_for_kernel() (C++) — the same function implemented entirely in C++, suitable for use inside Rcpp kernel wrappers.

Building the index — write_kernel_dependency_index()

write_kernel_dependency_index(library_dir, verbose = TRUE)

This function analyzes a library directory, computes the full transitive dependency closure for every stem, and writes two companion files next to the .cl sources:

File Format Consumer
kernel_dependency_index.rds Binary R object R functions
kernel_dependency_index.tsv Tab-separated text C++ code

The .tsv has a header row and one row per stem in global load order:

stem    all_depends
refactored  
Rmath   refactored
nmath   refactored, Rmath
chebyshev   refactored, Rmath, nmath
dbinom  dpq, refactored, Rmath, nmath, stirlerr_cycle_free, ...

Both files are shipped with the package so end users never need to regenerate them. Run write_kernel_dependency_index() only after adding or removing .cl files from a library directory.

# Regenerate after changing the nmath library
nmath_dir <- system.file("cl", "nmath", package = "nmathopencl")
write_kernel_dependency_index(nmath_dir, verbose = TRUE)

Loading with the index — load_library_for_kernel() (R)

load_library_for_kernel(kernel_path, library_dir,
                        depends_tag = "all_depends",
                        index = NULL)

Reads the annotation tag @{depends_tag} from the kernel file to get the list of needed stems, looks them up in the pre-loaded index, and returns their source concatenated in correct dependency order. Returns "" when the kernel has no @{depends_tag} annotation (e.g. a kernel using only OpenCL built-ins).

The index argument should always be supplied. Load it once and reuse it across all calls to avoid reading from disk on every invocation:

nmath_dir <- system.file("cl", "nmath", package = "nmathopencl")

# Load the index once
idx <- readRDS(file.path(nmath_dir, "kernel_dependency_index.rds"))

# Re-use across multiple kernels --- only the subset for each kernel is read
src_gaussian <- load_library_for_kernel(
    kernel_path = system.file("cl", "ex_glmbayes_src",
                              "f2_f3_gaussian.cl", package = "nmathopencl"),
    library_dir = nmath_dir,
    depends_tag = "depends_nmath",
    index       = idx
)
src_gamma <- load_library_for_kernel(
    kernel_path = system.file("cl", "ex_glmbayes_src",
                              "f2_f3_gamma.cl", package = "nmathopencl"),
    library_dir = nmath_dir,
    depends_tag = "depends_nmath",
    index       = idx
)

The annotation tag in the kernel file must follow the same format used throughout the library:

// @depends_nmath: dbinom, pnorm, dnorm

Extracting a subset — extract_library_subset()

extract_library_subset(kernel_paths, library_dir, dest_dir,
                       depends_tag = "all_depends",
                       index = NULL,
                       overwrite = FALSE)

Given a set of kernel files, reads their @{depends_tag} annotations, takes the union of all transitive dependencies, and copies exactly those .cl files (plus both index files) from library_dir to dest_dir in dependency order. The destination directory must already exist.

nmath_dir    <- system.file("cl", "nmath",           package = "nmathopencl")
src_dir      <- system.file("cl", "ex_glmbayes_src", package = "nmathopencl")
dest_dir     <- "inst/cl/mypkg_nmath"   # must exist

idx <- readRDS(file.path(nmath_dir, "kernel_dependency_index.rds"))

result <- extract_library_subset(
    kernel_paths = list.files(src_dir, pattern = "\\.cl$", full.names = TRUE),
    library_dir  = nmath_dir,
    dest_dir     = dest_dir,
    depends_tag  = "depends_nmath",
    index        = idx
)
print(result)  # data frame: stem, source, dest, copied

The result includes a row for kernel_dependency_index.rds and kernel_dependency_index.tsv so the extracted subset is immediately usable by both R and C++.

C++ equivalent — openclPort::load_library_for_kernel()

For use inside Rcpp kernel wrappers (where calling back into R is not permitted), kernel_loader.cpp provides a C++ implementation with the same semantics:

#include "openclPort.h"   // declares load_library_for_kernel

// Inside a kernel wrapper, after the family/link dispatch has set kernel_file:
std::string nmath_source = openclPort::load_library_for_kernel(
    kernel_file,          // e.g. "ex_glmbayes_src/f2_f3_binomial_logit.cl"
    "ex_glmbayes_nmath",  // library subdirectory
    "nmathopencl",        // package
    "depends_nmath"       // annotation tag
);

The C++ function reads kernel_dependency_index.tsv from the library directory once per call, parses the annotation tag from the kernel file, and loads only the required subset in the pre-computed order. No topological sort is performed at runtime.

The @depends_nmath annotation must list the direct nmath entry-point stems called by the kernel (not the full transitive list):

// @depends_nmath: dbinom, pnorm, dnorm

The transitive expansion is performed automatically using the TSV index.

Error messages

Error Cause
"OpenCL support is not available" Package built without USE_OPENCL
"Dependency sort failed: unresolved dependencies remain." A @depends entry names a file that does not exist in the directory, or there is a genuine circular dependency
"File not found: inst/cl/<path>" relative_path does not match any file in the package

mirror server hosted at Truenetwork, Russian Federation.