This chapter covers strategies for verifying correctness, diagnosing
failures, and measuring performance of OpenCL kernel code developed on
top of nmathopencl.

Because every kernel wrapper contains a CPU fallback path, the most
reliable testing strategy compares the OpenCL output against the CPU
reference output on the same inputs. Standard R unit-test frameworks
(testthat, tinytest) work directly: write tests that call the wrapper
function and assert numerical agreement within an appropriate tolerance
(typically .Machine$double.eps^0.5 for double-precision kernels).
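
The comparison described above can be sketched in base R as follows.
This is a minimal illustration, not nmathopencl code: my_dnorm_wrapper()
is a hypothetical stand-in for a kernel wrapper, written out by hand so
that it is computed independently of the stats::dnorm() reference.

```r
# Hypothetical stand-in for a kernel wrapper; NOT part of nmathopencl.
# A real wrapper would dispatch to an OpenCL kernel when a device is
# available and use its CPU fallback otherwise.
my_dnorm_wrapper <- function(x, mean = 0, sd = 1) {
  exp(-0.5 * ((x - mean) / sd)^2) / (sd * sqrt(2 * pi))
}

# Tolerance suggested in the text for double-precision kernels.
tol <- .Machine$double.eps^0.5

set.seed(42)
x <- rnorm(1000)

result    <- my_dnorm_wrapper(x, mean = 1, sd = 2)
reference <- stats::dnorm(x, mean = 1, sd = 2)

# all.equal() compares the mean relative difference against `tolerance`
# and returns TRUE or a character description of the mismatch.
agreement <- all.equal(result, reference, tolerance = tol)
stopifnot(isTRUE(agreement))
```

Inside a testthat test, the same assertion would typically be written
as expect_equal(result, reference, tolerance = tol).
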
Key points:

- Run the tests with OpenCL unavailable
  (nmathopencl_has_opencl() == FALSE) as well as with it enabled. This
  ensures the fallback path is also covered.
- Use opencltools::verify_opencl_runtime() as a pre-condition guard in
  any test that requires an active OpenCL device.
- Choose tolerances that account for float vs double precision.
  Document your tolerance assumptions.

When a kernel fails to compile or execute, the OpenCL runtime reports
an error code. nmathopencl propagates these as R errors via stop().
Common causes:

- Compile errors in the .cl source. Inspect the build log returned by
  clGetProgramBuildInfo; nmathopencl includes it in the error message.
- Device selection problems. Use opencltools::gpu_names() to list
  available devices.
- Results computed in float instead of double. Verify that the
  cl_khr_fp64 pragma is present and that all literals are written as
  1.0 (not 1.0f).

Use bench::mark() or microbenchmark::microbenchmark() to compare the
GPU path against the CPU fallback. A few guidelines:

- The first call includes kernel compilation (clBuildProgram). Exclude
  the first iteration or run a warm-up call before timing.
- Host-device transfers (clEnqueueWriteBuffer / clEnqueueReadBuffer)
  are included in the wrapper timing. For latency-sensitive use cases,
  consider whether the data can remain on the device between calls.
- Benchmark against both the nmathopencl CPU fallback and the upstream
  stats:: function to understand relative overheads.
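
A benchmark that follows these guidelines might look like the sketch
below. The functions gpu_path() and cpu_path() are hypothetical
stand-ins, not nmathopencl functions; substitute the real wrapper and
its reference. bench is used when installed, with a base-R fallback
based on system.time() otherwise.

```r
# Hypothetical stand-ins; NOT part of nmathopencl.
gpu_path <- function(x) exp(-0.5 * x^2) / sqrt(2 * pi)  # pretend OpenCL path
cpu_path <- function(x) stats::dnorm(x)                 # reference path

x <- seq(-4, 4, length.out = 1e5)

# Warm-up call: with a real wrapper, this is where the one-off kernel
# compilation cost would be paid, keeping it out of the timings below.
invisible(gpu_path(x))

if (requireNamespace("bench", quietly = TRUE)) {
  # check = FALSE: correctness is asserted separately in the unit tests.
  timings <- bench::mark(
    gpu = gpu_path(x),
    cpu = cpu_path(x),
    check = FALSE,
    iterations = 20
  )
} else {
  # Base-R fallback: mean elapsed seconds per call over n repetitions.
  time_it <- function(f, n = 20) {
    system.time(for (i in seq_len(n)) f(x))[["elapsed"]] / n
  }
  timings <- c(gpu = time_it(gpu_path), cpu = time_it(cpu_path))
}

print(timings)
```

Because the timed expression is the whole wrapper call, any
host-device transfer performed by a real wrapper is part of the
measured time.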