ONNX Model Import

ggmlR includes a built-in zero-dependency ONNX loader (hand-written protobuf parser in C). Load any compatible ONNX model and run inference on CPU or Vulkan GPU — no Python, no TensorFlow, no ONNX Runtime required.

Note: The examples below require a valid .onnx model file. Replace "path/to/model.onnx" with the actual path on your system.

library(ggmlR)

1. Load and inspect a model

model <- onnx_load("path/to/model.onnx")

# Model summary (layers, ops, parameters)
onnx_summary(model)

# Input tensor info (name, shape, dtype)
onnx_inputs(model)

2. Run inference

Inputs are named R arrays in NCHW order (matching the ONNX model’s expected layout).

# Random image batch — replace with real data
input <- array(runif(1 * 3 * 224 * 224), dim = c(1L, 3L, 224L, 224L))

result <- onnx_run(model, list(input_name = input))

cat("Output shape:", paste(dim(result[[1]]), collapse = " x "), "\n")
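For a classification model, the raw output is typically a matrix of logits. A minimal post-processing sketch (assuming a single-output ImageNet-style classifier returning a 1 x N logits matrix; the class count and labels depend on your model):

```r
# Softmax over the logits, then pick the top-5 classes
logits <- as.numeric(result[[1]])
probs  <- exp(logits - max(logits))   # subtract max for numerical stability
probs  <- probs / sum(probs)
top5   <- order(probs, decreasing = TRUE)[1:5]
data.frame(class_index = top5, probability = probs[top5])
```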

For models with multiple inputs, pass a named list:

result <- onnx_run(model, list(
  input_ids      = array(as.integer(tokens), dim = c(1L, length(tokens))),
  attention_mask = array(1L, dim = c(1L, length(tokens)))
))

3. GPU inference

By default ggmlR tries Vulkan first and falls back to CPU automatically. To force a specific backend:

# Check what's available
if (ggml_vulkan_available()) {
  cat("Vulkan GPU ready\n")
  ggml_vulkan_status()
}

# Load with explicit device
model_gpu <- onnx_load("path/to/model.onnx", device = "vulkan")
model_cpu <- onnx_load("path/to/model.onnx", device = "cpu")

Weights are transferred to the GPU once at load time. Repeated calls to onnx_run() do not re-transfer weights.
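A rough timing sketch to see this in practice ("input_name" and the input array are placeholders; only the first onnx_run() after loading pays any warm-up cost, not a weight re-upload):

```r
model_gpu <- onnx_load("path/to/model.onnx", device = "vulkan")  # weights uploaded here, once

input <- array(runif(1 * 3 * 224 * 224), dim = c(1L, 3L, 224L, 224L))

system.time(onnx_run(model_gpu, list(input_name = input)))  # first compute
system.time(onnx_run(model_gpu, list(input_name = input)))  # subsequent runs: no re-transfer
```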


4. Dynamic input shapes

Some models accept variable-length inputs. Override shapes at load time:

model <- onnx_load("path/to/bert.onnx",
                    input_shapes = list(input_ids = c(1L, 128L)))
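Once a fixed shape is set, inputs must match it exactly. A sketch of padding a shorter token sequence up to the fixed length (the token ids here are random placeholders, and the input names assume a BERT-style model):

```r
tokens  <- sample(0:30521, 40)   # placeholder token ids; use a real tokenizer
max_len <- 128L

# Right-pad ids with 0 and mask out the padding positions
n_real <- min(length(tokens), max_len)
padded <- c(tokens[seq_len(n_real)], rep(0L, max_len - n_real))
mask   <- c(rep(1L, n_real),         rep(0L, max_len - n_real))

result <- onnx_run(model, list(
  input_ids      = array(as.integer(padded), dim = c(1L, max_len)),
  attention_mask = array(mask,               dim = c(1L, max_len))
))
```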

5. FP16 inference

Run in half-precision for faster GPU inference:

model_fp16 <- onnx_load("path/to/model.onnx", dtype = "f16")
result <- onnx_run(model_fp16, list(input = input))
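Half precision trades a small amount of accuracy for speed. A quick sanity-check sketch comparing FP16 output against the default-precision model (assumes a single input named "input"):

```r
model_f32 <- onnx_load("path/to/model.onnx")

out_f32 <- onnx_run(model_f32,  list(input = input))[[1]]
out_f16 <- onnx_run(model_fp16, list(input = input))[[1]]

# Maximum elementwise deviation; for most models this should be small
max(abs(out_f32 - out_f16))
```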

6. Supported operators

ggmlR supports 50+ ONNX operators, including custom fused ops such as RelPosBias2D (BoTNet). Use onnx_summary() to see which ops a given model requires.


7. Examples

For full working examples with real ONNX Zoo models see:

# GPU vs CPU benchmark across multiple models
# inst/examples/benchmark_onnx.R

# FP16 inference benchmark
# inst/examples/benchmark_onnx_fp16.R

# Run all supported ONNX Zoo models
# inst/examples/test_all_onnx.R

# BERT sentence similarity
# inst/examples/bert_similarity.R

8. Debugging tips

If a model fails to load or produces wrong results:

  1. Check operator support — print the model’s op list with Python’s onnx package and compare it against ggmlR’s supported operators (see section 6).

  2. Verify protobuf field numbers — the built-in parser is hand-written; an unexpected field can cause silent mis-parsing.

  3. NaN tracing — use the eval callback for per-node inspection rather than a post-compute scan (which aliases buffers and gives false readings).

  4. Repeated-run aliasing — ggml_backend_sched aliases intermediate buffers over weight buffers, so ggmlR calls sched_alloc_and_load() before each compute to reset the allocation. If the first run is correct but subsequent runs return garbage, this is the cause.
