r4subtrace is the traceability engine in the
R4SUB ecosystem. It quantifies and explains
end-to-end traceability between clinical submission
artifacts – primarily ADaM outputs <-> derivations
<-> SDTM sources <-> specs <-> code – and
converts trace evidence into standardized R4SUB Evidence
Table rows (from r4subcore).
It focuses on answering one question:
Can we prove where each analysis variable/value came from, and can a reviewer follow it?
In real submissions, issues are rarely “a single failed rule.” Many are trace failures:
- Missing or ambiguous derivation documentation
- ADaM variable not linkable to SDTM sources
- Mismatch between spec and what code produces
- Inconsistent naming across specs, define.xml, and datasets
- Reviewer cannot reproduce or validate lineage
r4subtrace formalizes traceability as evidence +
measurable indicators.
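As a rough illustration of what "measurable indicators" can mean, the sketch below counts orphan and ambiguously mapped ADaM variables from a tiny mapping table using base R only. It is not r4subtrace's implementation: the data are invented, the helper names (adam_key, map_key, n_sources) are hypothetical, and it assumes that "orphan" means no SDTM mapping row and "ambiguous" means more than one distinct SDTM source. The column names follow the metadata and mapping layouts described in this README.

```r
# Hypothetical ADaM metadata (columns: dataset, variable, label, type)
adam_meta <- data.frame(
  dataset  = c("ADSL", "ADSL", "ADSL"),
  variable = c("AGE", "TRTSDT", "SAFFL"),
  label    = c("Age", "Date of First Exposure to Treatment",
               "Safety Population Flag"),
  type     = c("num", "num", "char"),
  stringsAsFactors = FALSE
)

# Hypothetical ADaM -> SDTM mapping using the recommended column names
map <- data.frame(
  adam_dataset = c("ADSL", "ADSL", "ADSL"),
  adam_var     = c("AGE", "TRTSDT", "TRTSDT"),
  sdtm_domain  = c("DM", "EX", "DM"),
  sdtm_var     = c("AGE", "EXSTDTC", "RFXSTDTC"),
  stringsAsFactors = FALSE
)

# Drop duplicate mapping rows so each SDTM source is counted once
map <- unique(map)

# One key per ADaM variable, e.g. "ADSL.AGE"
adam_key <- paste(adam_meta$dataset, adam_meta$variable, sep = ".")
map_key  <- paste(map$adam_dataset, map$adam_var, sep = ".")

# Number of distinct SDTM sources mapped to each ADaM variable
n_sources <- vapply(adam_key, function(k) sum(map_key == k), integer(1))

orphan_count    <- sum(n_sources == 0L)  # variables with no SDTM mapping
ambiguous_count <- sum(n_sources > 1L)   # variables with several SDTM sources
orphans         <- names(n_sources)[n_sources == 0L]

print(n_sources)
print(orphans)
```

In this toy data, SAFFL has no mapping row and TRTSDT maps to two SDTM sources, so each count is 1. Counts like these are the kind of quantity an evidence row can carry. How the package itself computes them may differ.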
pak::pak(c("R4SUB/r4subcore", "R4SUB/r4subtrace"))
library(r4subcore)
library(r4subtrace)
ctx <- r4sub_run_context(study_id = "ABC123", environment = "DEV")
adam_meta <- read.csv("adam_metadata.csv") # columns: dataset, variable, label, type
sdtm_meta <- read.csv("sdtm_metadata.csv") # same structure
map <- read.csv("trace_map.csv")
# recommended columns:
# adam_dataset, adam_var, sdtm_domain, sdtm_var, derivation_text (optional), confidence (optional)
tm <- build_trace_model(
  adam_meta = adam_meta,
  sdtm_meta = sdtm_meta,
  mapping = map
)
ev <- trace_model_to_evidence(tm, ctx = ctx, source_name = "r4subtrace", source_version = "0.1.0")
validate_evidence(ev)
evidence_summary(ev)
ind <- trace_indicator_scores(ev)
ind

The trace model returned by build_trace_model() is a list with:
- nodes: tidy table of assets (dataset/variable/spec/program)
- edges: tidy table of relationships + confidence
- diagnostics: issues found (orphans, ambiguities, conflicts)

Evidence rows are emitted for:
- TRACE_VAR_COVERAGE_L2PLUS: proportion of ADaM variables with L2+ trace
- TRACE_VAR_COVERAGE_L3PLUS: proportion with L3+ trace
- TRACE_ORPHAN_VAR_COUNT: orphan ADaM vars with no SDTM mapping
- TRACE_AMBIGUOUS_MAPPING_COUNT: vars mapped to multiple SDTM sources
- TRACE_MEAN_TRACE_LEVEL: mean trace level across all ADaM variables

License: MIT