openclPort namespaceThe openclPort namespace (declared in
openclPort.h and implemented in
opencl_kernel_runners.cpp) provides the generic OpenCL
infrastructure that underlies all of nmathopencl’s GPU
execution. It is designed to be completely independent of any particular
mathematical library: it knows only about OpenCL C types and standard
C++ types, not about nmath, distributions, or R data
structures.
This namespace is currently shipped inside nmathopencl
to make the package self-contained. It will migrate to a standalone
openclPort CRAN package in a future release (see Chapter 00
for the release roadmap). The design described here will remain stable
during that migration; downstream packages using these utilities via
#include "openclPort.h" will require only a change in their
LinkingTo field.
opencl_dbl_scalar_kernel_runnerThis is the single generic runner that powers every kernel in
nmathopencl.
| Parameter | Type | Description |
|---|---|---|
kernel_source |
const std::string& |
Complete, concatenated OpenCL program source (shims + library + kernel). Must be syntactically valid OpenCL C. |
kernel_name |
const char* |
Name of the __kernel void function to execute, e.g.,
"dnorm_kernel". |
dargs |
const std::vector<double>& |
Scalar double arguments to pass to the kernel, in the
order the kernel declares them. |
n_out |
int |
Number of double values to read back from the output
buffer. Also becomes the global work size. |
out_flat |
std::vector<double>& |
Output buffer; resized to n_out on entry and filled
with the kernel’s results. |
The runner performs the complete OpenCL lifecycle for a single kernel invocation:
clGetPlatformIDs and clGetDeviceIDs to select
the first available GPU device.cl_context and a cl_command_queue for the
selected device.clCreateProgramWithSource followed by
clBuildProgram. On failure, it retrieves the build log via
CL_PROGRAM_BUILD_LOG and throws a
std::runtime_error containing the compiler output.clCreateKernel
with the supplied kernel_name.CL_MEM_WRITE_ONLY device buffer of size
n_out * sizeof(double).dargs as a scalar kernel argument
(clSetKernelArg) in index order, then sets the output
buffer as the next argument, and finally n_out as the last
integer argument.n_out via
clEnqueueNDRangeKernel.clFinish
then reads the output buffer back to host memory via
clEnqueueReadBuffer.Error checking is performed after every OpenCL API call. Any
non-CL_SUCCESS status causes the runner to clean up all
already-created resources and throw a std::runtime_error
with a message that includes the status code, its symbolic name, and the
platform/device information.
The runner assumes the kernel is declared with the following argument signature:
__kernel void my_kernel(
double arg0, double arg1, ..., double argN,
__global double* out,
int n_out
)The dargs vector maps element-by-element to
arg0 through argN. The runner always appends
the output buffer and n_out as the last two arguments
automatically. Kernels in inst/cl/src/ that accept vector
inputs (e.g., dnorm_kernel) do not fit this layout and use
a separate, more specialized runner in kernel_runners.cpp;
the double-scalar runner handles the scalar-parameter kernels that
produce replicated output.
// Inside a kernel_wrappers.cpp function (nmathopencl namespace):
namespace nmathopencl {
Rcpp::NumericVector dnorm_opencl_impl(
int n, double mu, double sigma, bool give_log, bool verbose)
{
std::string src = openclPort::load_kernel_library("nmath") + "\n"
+ openclPort::load_kernel_source("src/dnorm_scalar_kernel.cl");
std::vector<double> out(n);
openclPort::opencl_dbl_scalar_kernel_runner(
src,
"dnorm_scalar_kernel",
{ mu, sigma, static_cast<double>(give_log) }, // dargs
n, // n_out
out
);
return Rcpp::NumericVector(out.begin(), out.end());
}
} // namespace nmathopenclFive inline functions in openclPort.h
provide consistent error reporting across all runners. Because they are
inline, they are available to any translation unit that
includes openclPort.h; downstream packages that
LinkingTo: nmathopencl get them for free.
opencl_status_nameReturns the symbolic name of an OpenCL error code as a C string:
CL_SUCCESS, CL_DEVICE_NOT_FOUND, CL_BUILD_PROGRAM_FAILURE, ...
Returns "UNKNOWN_OR_VENDOR_SPECIFIC" for codes not in
the table. Covers all error codes defined by the OpenCL 3.0
specification that are commonly encountered in practice.
opencl_status_hintReturns a plain-English diagnostic hint for the most actionable error codes:
| Status | Hint |
|---|---|
CL_OUT_OF_RESOURCES |
Device/runtime resource limit exceeded (watchdog, register pressure, …) |
CL_OUT_OF_HOST_MEMORY |
Host memory allocation failed |
CL_MEM_OBJECT_ALLOCATION_FAILURE |
Device memory allocation failed |
CL_BUILD_PROGRAM_FAILURE |
Kernel compilation failed; inspect build log |
CL_INVALID_CONTEXT |
OpenCL context is invalid |
CL_DEVICE_NOT_AVAILABLE |
Device present but temporarily unavailable |
Returns "No additional hint available." for all other
codes.
opencl_read_platform_info_str /
opencl_read_device_info_strinline std::string opencl_read_platform_info_str(
cl_platform_id platform, cl_platform_info param);
inline std::string opencl_read_device_info_str(
cl_device_id device, cl_device_info param);Query string-valued platform or device info (e.g.,
CL_PLATFORM_NAME, CL_DEVICE_NAME,
CL_DRIVER_VERSION) safely: they handle null handles, query
failures, and trailing null terminators, returning
"unknown" on any error.
Typical usage:
opencl_make_context_errorinline std::runtime_error opencl_make_context_error(
cl_int status, cl_platform_id platform, cl_device_id device);Constructs a std::runtime_error with a message of the
form:
OpenCL error at clCreateContext (status=-6, name=CL_OUT_OF_HOST_MEMORY).
platform_name=NVIDIA CUDA, platform_vendor=NVIDIA Corporation,
device_name=NVIDIA GeForce RTX 4090, driver_version=560.94.
This may indicate a transient driver/runtime context failure.
Used inside opencl_dbl_scalar_kernel_runner and
f2_f3_kernel_runner immediately after a
clCreateContext failure.
openclPortstd::vector conversionstd::vector<double> openclPort::flattenMatrix(const Rcpp::NumericMatrix& mat);
std::vector<double> openclPort::copyVector(const Rcpp::NumericVector& vec);flattenMatrix converts a column-major R matrix to a
row-major (or simply contiguous) std::vector<double>
suitable for passing as a device buffer. copyVector does
the same for a numeric vector. Both are implemented in
OpenCL_helper.cpp.
openclPort)At the R prompt, host GPU inventory uses
opencltools::gpu_names() (see Chapters
00–01). Inside the package DLL, openclPort
exposes related C++ helpers for kernel runners:
Rcpp::CharacterVector openclPort::gpu_names();
int openclPort::detect_num_gpus_internal();
bool openclPort::has_opencl();
int openclPort::get_opencl_core_count();openclPort::gpu_names() — OpenCL device names from the
runtime linked into this package (distinct from
opencltools::gpu_names() R helper).detect_num_gpus_internal() — GPU device count for
envelope-style runners.nmathopencl_has_opencl() — compile-time
USE_OPENCL flag in this DLL.openclPort::get_opencl_core_count() — internal C++
helper (delegates to opencltools); at the R prompt use
opencltools::get_opencl_core_count().struct openclPort::OpenCLConfig {
bool have_expm1;
bool have_log1p;
std::string buildOptions;
};
openclPort::OpenCLConfig openclPort::configureOpenCL(
cl_context context, cl_device_id device);configureOpenCL probes the device to determine which
standard math functions are available as OpenCL built-ins (specifically
expm1 and log1p). It returns a set of
buildOptions (e.g., -DHAVE_EXPM1 -DHAVE_LOG1P)
that are passed to clBuildProgram so that the nmath
.cl files can conditionally use the built-in or the ported
version.
openclPort from a downstream packageA package that LinkingTo: nmathopencl can include
openclPort.h directly:
// In a downstream package's source file:
#include "openclPort.h"
void my_kernel_runner(const std::string& program_source, int n) {
std::vector<double> out;
openclPort::opencl_dbl_scalar_kernel_runner(
program_source,
"my_kernel",
{ 1.0, 2.5 }, // scalar args
n,
out
);
}No additional CMake or autoconf work is needed beyond the
LinkingTo field, because openclPort.h is
installed to inst/include/ and the exported symbols are
visible in the shared library.