ggmlR includes a built-in zero-dependency ONNX loader (hand-written protobuf parser in C). Load any compatible ONNX model and run inference on CPU or Vulkan GPU — no Python, no TensorFlow, no ONNX Runtime required.
Note: The examples below require a valid `.onnx` model file. Replace `"path/to/model.onnx"` with the actual path on your system.
```r
model <- onnx_load("path/to/model.onnx")

# Model summary (layers, ops, parameters)
onnx_summary(model)

# Input tensor info (name, shape, dtype)
onnx_inputs(model)
```

Inputs are named R arrays in NCHW order (matching the ONNX model's expected layout).
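For illustration, an image read in the usual height × width × channel (HWC) order can be rearranged into NCHW with base R's `aperm()`; this is plain R with no ggmlR required, and the 224×224 size is just an example:

```r
# A single 224x224 RGB image in HWC order (as produced by most image readers)
img_hwc <- array(runif(224 * 224 * 3), dim = c(224L, 224L, 3L))

# Reorder to CHW, then add a leading batch dimension -> NCHW
img_chw <- aperm(img_hwc, c(3L, 1L, 2L))
input   <- array(img_chw, dim = c(1L, 3L, 224L, 224L))

dim(input)  # 1 3 224 224
```

After the reorder, the pixel at row `h`, column `w`, channel `c` of the original image sits at `input[1, c, h, w]`.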
```r
# Random image batch — replace with real data
input <- array(runif(1 * 3 * 224 * 224), dim = c(1L, 3L, 224L, 224L))

result <- onnx_run(model, list(input_name = input))
cat("Output shape:", paste(dim(result[[1]]), collapse = " x "), "\n")
```

For models with multiple inputs, pass a named list:
```r
result <- onnx_run(model, list(
  input_ids      = array(as.integer(tokens), dim = c(1L, length(tokens))),
  attention_mask = array(1L, dim = c(1L, length(tokens)))
))
```

By default ggmlR tries Vulkan first and falls back to CPU automatically. To force a specific backend:
```r
# Check what's available
if (ggml_vulkan_available()) {
  cat("Vulkan GPU ready\n")
  ggml_vulkan_status()
}

# Load with explicit device
model_gpu <- onnx_load("path/to/model.onnx", device = "vulkan")
model_cpu <- onnx_load("path/to/model.onnx", device = "cpu")
```

Weights are transferred to the GPU once at load time; repeated calls to `onnx_run()` do not re-transfer them.
Some models accept variable-length inputs. Override shapes at load time:
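A minimal sketch of what an override might look like. The `input_shapes` argument name here is an assumption, not confirmed ggmlR API; check `?onnx_load` in your installed version for the actual parameter:

```r
# Hypothetical sketch: pin a dynamic sequence-length axis to 128 at load time.
# NOTE: `input_shapes` is an assumed argument name -- consult ?onnx_load.
model <- onnx_load(
  "path/to/model.onnx",
  input_shapes = list(input_ids = c(1L, 128L))
)
```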
Run in half-precision for faster GPU inference:
```r
model_fp16 <- onnx_load("path/to/model.onnx", dtype = "f16")
result <- onnx_run(model_fp16, list(input = input))
```

ggmlR supports 50+ ONNX operators, including:
- Custom fused ops: RelPosBias2D (BoTNet).
For full working examples with real ONNX Zoo models see:
- GPU vs CPU benchmark across multiple models: `inst/examples/benchmark_onnx.R`
- FP16 inference benchmark: `inst/examples/benchmark_onnx_fp16.R`
- Run all supported ONNX Zoo models: `inst/examples/test_all_onnx.R`
- BERT sentence similarity: `inst/examples/bert_similarity.R`

If a model fails to load or produces wrong results:
- Check operator support — print the model's op list with Python's `onnx` package and compare against the table above.
- Verify protobuf field numbers — the built-in parser is hand-written; an unexpected field can cause silent mis-parsing.
- NaN tracing — use the eval callback for per-node inspection rather than a post-compute scan (which aliases buffers and gives false readings).
- Repeated-run aliasing — `ggml_backend_sched` aliases intermediate buffers over weight buffers. ggmlR calls `sched_alloc_and_load()` before each compute to reset allocation; if you see correct results on the first run but garbage on subsequent runs, this is the cause.