| Title: | Pure-R Core Engine for Optimal Data Analysis (ODA / MultiODA) |
| Version: | 0.1.2 |
| Description: | Pure-R implementation of univariate binary-class ODA (UniODA), univariate multiclass ODA (MultiODA), and binary Classification Tree Analysis (CTA). Supports ordered and categorical attributes, priors-on inverse-frequency weighting, MAXSENS / SAMPLEREP / first-identified tie-breaking, true leave-one-out cross-validation, and Monte Carlo Fisher-randomization p-values. Covered UniODA, MultiODA, and binary CTA fixtures are tested for parity against MegaODA.exe and CTA.exe outputs. |
| License: | GPL-3 |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | graphics, grDevices, stats, utils |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, pkgdown, ggplot2 (≥ 3.4.0), patchwork (≥ 1.1.0) |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| RoxygenNote: | 7.3.3 |
| URL: | https://njrhodes.github.io/oda_r/, https://github.com/njrhodes/oda_r |
| BugReports: | https://github.com/njrhodes/oda_r/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-06-09 20:39:01 UTC; nrhode |
| Author: | Nathaniel Rhodes [aut, cre], Paul Yarnold [ctb, cph] |
| Maintainer: | Nathaniel Rhodes <nrhode@midwestern.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-17 14:00:02 UTC |
Infer attribute types for SDA candidate columns. User-declared types in attr_types take precedence.
Description
Infer attribute types for SDA candidate columns. User-declared types in attr_types take precedence.
Usage
.auto_sda_infer_types(data, pool, attr_types)
Build parent map and endpoint-index map for LORT nodes (internal helper used by lort_index_path and lort_path_table)
Description
Build parent map and endpoint-index map for LORT nodes (internal helper used by lort_index_path and lort_path_table)
Usage
.lort_parent_maps(ort_nodes)
Arguments
ort_nodes |
Named list of LORT node objects from a |
Check that an object is an sda_fit.
Description
Check that an object is an sda_fit.
Usage
.sda_check_class(fit)
Identify duplicate (collinear) columns in X. Returns character vector of column names to drop (keeps first occurrence).
Description
Identify duplicate (collinear) columns in X. Returns character vector of column names to drop (keeps first occurrence).
Usage
.sda_find_collinear(X)
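The keep-first rule described above can be sketched in base R. This is only an illustration that assumes "duplicate" means exactly identical column contents; the toy frame X and the name drop_cols are hypothetical, and the internal helper may apply a different test.

```r
# Toy candidate frame: "b" is an exact copy of "a"; "c" is distinct.
X <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3), c = c(3, 2, 1))

# duplicated() on a list flags every element that is identical to an
# earlier element, so the first occurrence is always kept.
drop_cols <- names(X)[duplicated(as.list(X))]
drop_cols
```

Because duplicated() scans left to right, the column order of X determines which member of a duplicate pair survives.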
Remove correctly classified observations and return global index vectors.
Description
Remove correctly classified observations and return global index vectors.
Usage
.sda_remove_correctly_classified(y_true, y_pred, active_rows)
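A rough sketch of the idea, assuming active_rows holds the global (full-data) row indices of the current working sample and that y_true and y_pred are aligned with it. The returned element names (removed_rows, remaining_rows) are hypothetical and are not taken from the package.

```r
# Toy working sample of five observations.
y_true      <- c(0L, 1L, 1L, 0L, 1L)
y_pred      <- c(0L, 1L, 0L, 1L, 1L)
active_rows <- c(3L, 8L, 11L, 20L, 42L)  # global row indices

# An observation counts as correctly classified only when a prediction
# exists and matches the actual class.
correct <- !is.na(y_pred) & y_true == y_pred

result <- list(
  removed_rows   = active_rows[correct],   # dropped from the working sample
  remaining_rows = active_rows[!correct]   # carried into the next SDA step
)
result
```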
Resolve attr_types for SDA candidates (fill in "auto" for unspecified).
Description
Resolve attr_types for SDA candidates (fill in "auto" for unspecified).
Usage
.sda_resolve_attr_types(X, cand_names, attr_types)
Run one novometric_min_d SDA step via per-attribute MDSA.
Description
For each candidate attribute, runs cta_descendant_family() on the
active working sample starting at settings$mindenom. Applies the
MINDENOM gate (Axiom 1) then the p gate, and selects the eligible candidate
with minimum D, using the tie-breaking hierarchy from the SDA-4A contract.
Usage
.sda_step_novometric_min_d(
X,
y,
candidates,
active_rows,
class_levels,
settings
)
Value
Named list: winner_attr, winner_result,
candidate_table, stop_reason_hint.
Run one unioda_max_ess SDA step: evaluate all candidates, select max ESS.
Description
Run one unioda_max_ess SDA step: evaluate all candidates, select max ESS.
Usage
.sda_step_unioda_max_ess(
X,
y,
candidates,
active_rows,
class_levels,
attr_types,
settings
)
Validate the candidate frame for sda_fit.
Description
Validate the candidate frame for sda_fit.
Usage
.sda_validate_candidate_frame(X, y, min_class_n = NULL)
Convert a tidy confusion data frame to a 2x2 integer matrix
Description
Converts the data.frame returned by cta_confusion_table
(columns actual, predicted, n) to a 2x2 integer matrix
suitable for novo_boot_ci.
Usage
as_confusion_matrix(df)
Arguments
df |
A |
Value
A 2x2 integer matrix with rows = actual class (0/1) and columns =
predicted class (0/1), matching the training_confusion convention
used throughout oda. Row and column names are "0" and
"1".
See Also
cta_confusion_table, novo_boot_ci
Examples
# From raw data frame:
df <- data.frame(
actual = c(0L, 0L, 1L, 1L),
predicted = c(0L, 1L, 0L, 1L),
n = c(146L, 40L, 36L, 33L)
)
m <- as_confusion_matrix(df)
novo_boot_ci(m, nboot = 200L, seed = 1L)
# From a fitted tree:
fit <- cta_fit(data.frame(x = seq_len(8L)),
c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L),
mindenom = 2L, mc_iter = 100L, loo = "off")
ct <- cta_confusion_table(fit)
m <- as_confusion_matrix(ct)
novo_boot_ci(m, nboot = 200L, seed = 42L)
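For readers who want to see the shape of the conversion without the package, the following base-R sketch builds the same kind of 2x2 integer matrix from a tidy frame. It is an illustration under the column convention documented above (actual, predicted, n), not the package's implementation; the object names are hypothetical.

```r
# Tidy confusion frame in the documented column layout.
df <- data.frame(
  actual    = c(0L, 0L, 1L, 1L),
  predicted = c(0L, 1L, 0L, 1L),
  n         = c(146L, 40L, 36L, 33L)
)

# Start from an all-zero matrix so class pairs missing from df stay at 0.
lev <- c("0", "1")
m_sketch <- matrix(0L, nrow = 2, ncol = 2,
                   dimnames = list(actual = lev, predicted = lev))

# A two-column character matrix indexes cells by (row name, column name).
idx <- cbind(as.character(df$actual), as.character(df$predicted))
m_sketch[idx] <- df$n
m_sketch
```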
Subset a data frame to the SDA-selected candidate columns
Description
Returns X restricted to the columns identified by
sda_selected_attributes(fit). Intended to produce the constrained
candidate frame for cta_fit or
cta_descendant_family.
Usage
as_cta_candidates(fit, X)
Arguments
fit |
An |
X |
Data frame or matrix containing at least all selected attribute columns. Extra columns are dropped silently. |
Value
Data frame with columns matching sda_selected_attributes(fit),
in SDA step order.
Convert an object to an sda_anchor
Description
Generic converter. Methods are provided for sda_fit and
data.frame. Use sda_anchor for direct construction.
Usage
as_sda_anchor(x, ...)
## S3 method for class 'sda_fit'
as_sda_anchor(x, ...)
## S3 method for class 'data.frame'
as_sda_anchor(
x,
selected_attributes,
candidate_universe = NULL,
group_levels = NULL,
canon_notes = c("Explicit / manual anchor - user-declared stage table",
"Not derived from sda_fit", "This anchor is for future SORT / staged CTA workflows",
"SORT is not implemented", "GORT is not implemented"),
...
)
Arguments
x |
A data frame with at least columns |
... |
Additional arguments passed to methods. |
selected_attributes |
Character vector of attribute names in stage
order. Must match |
candidate_universe |
Character vector of all candidate attributes, or
|
group_levels |
Integer vector, or |
canon_notes |
Character vector describing the source. |
Value
Object of class c("sda_anchor", "list").
See Also
sda_anchor, validate_sda_anchor
Dry-run planning and validation layer for SDA
Description
Validates and constructs a candidate set for sda_fit without
fitting. Returns an auditable plan object that records which columns were
accepted, which were excluded and why, and what settings would be passed to
sda_fit().
Usage
auto_sda_plan(
data,
outcome,
candidates = NULL,
exclude = NULL,
role_map = NULL,
time_map = NULL,
stage_map = NULL,
attr_types = NULL,
collinearity_threshold = 1,
min_n = NULL,
min_class_n = NULL,
mode = c("unioda_max_ess", "novometric_min_d"),
dry_run = TRUE
)
Arguments
data |
A data frame. |
outcome |
Character scalar: name of the binary outcome column in
|
candidates |
Character vector of candidate column names, or |
exclude |
Character vector of column names to force-exclude from candidates regardless of other checks. |
role_map |
Named list mapping column names to declared roles:
|
time_map |
Named numeric/integer vector mapping column names to time
indices. Columns with |
stage_map |
Named integer vector mapping column names to stage assignments. Stored for downstream use; not used for exclusions. |
attr_types |
Named character vector mapping column names to declared
attribute types ( |
collinearity_threshold |
Numeric threshold for collinearity detection.
Default |
min_n |
Passed through to |
min_class_n |
Passed through to |
mode |
SDA mode: |
dry_run |
Logical. Must be |
Details
Agent principle: auto_sda_plan() proposes and validates.
It does not silently decide causal validity, temporal ordering, exposure
roles, or outcome roles. If temporal or causal structure is required,
declare it via role_map, time_map, or stage_map.
Value
Object of class c("auto_sda_plan", "odacore_plan").
Assign observations to CTA terminal endpoints
Description
Traverses the fitted cta_tree for each row of newdata and
returns the terminal leaf reached, expressed as both its stored node
identifier (endpoint_node_id) and its sequential endpoint index
(endpoint_id) matching cta_endpoint_summary.
No endpoint membership is stored at fit time. This function performs the
traversal on demand so the cta_tree object remains lean. The
returned endpoint_id can be joined with the output of
cta_propensity_weights to assign endpoint-level stabilized
weights to individual observations.
Column order requirement: newdata must have the same
attribute column order as the X matrix passed to
oda_cta_fit. Traversal uses the stored integer column
positions (attr_col) from the fit, not column names. If both
names(newdata) and tree$attr_names are non-NULL, a warning is
issued when they disagree at the split attribute positions.
Missingness:
- "na" (default)
Canonical path-local behaviour: when a split attribute value is NA or a stored miss-code on the observation's actual traversal path, the row returns NA for both output columns. This matches the canonical missing_action = "na" semantics of predict.
- "majority"
Routes the observation to the child subtree with the larger n_obs, then continues traversal to a terminal leaf. Ties are resolved by selecting the first child.
Usage
cta_assign_endpoints(tree, newdata, missing_action = c("na", "majority"))
Arguments
tree |
A |
newdata |
A |
missing_action |
Character; one of |
Details
Observation-level propensity weights (workflow sketch):
ep <- cta_assign_endpoints(tree, X_train, missing_action = "na")
pw <- cta_propensity_weights(tree, target_class = 1L, adjusted = TRUE)
# One row per classified training observation with its weight:
obs <- merge(
data.frame(row_id = seq_len(nrow(X_train)),
class = as.character(y_train)),
merge(ep, pw[, c("endpoint_id", "class", "adjusted_propensity_weight")],
by = "endpoint_id"),
by = c("row_id", "class")
)
# Rows with NA endpoint_id (missing root attribute) drop naturally.
Observation-level propensity weight expansion is intentionally left to the
caller so that the cta_tree object stores no observation indices.
Value
A data.frame with one row per row of newdata and columns:
- row_id
Integer; positional row index in newdata (1 to nrow(newdata)).
- endpoint_node_id
Integer; node_id of the terminal leaf reached by traversal. NA_integer_ when the observation cannot be routed to a terminal leaf (missing split attribute with missing_action = "na", or no-tree fit).
- endpoint_id
Integer; sequential endpoint index matching cta_endpoint_summary. NA_integer_ under the same conditions as endpoint_node_id.
For no-tree fits all rows have endpoint_node_id = NA_integer_ and
endpoint_id = NA_integer_.
See Also
oda_cta_fit, cta_endpoint_summary,
cta_propensity_weights, predict.cta_tree
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
ep <- cta_assign_endpoints(tree, X)
head(ep)
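The two missing_action behaviours can be illustrated on a toy tree. The nested-list layout below (attr_col, cut, left, right, leaf) is a hypothetical stand-in and not the real cta_tree structure; only the routing rules follow the description above.

```r
# Hypothetical one-split tree: column 1, cut at 4.5.
toy_tree <- list(
  leaf = FALSE, attr_col = 1L, cut = 4.5,
  left  = list(leaf = TRUE, node_id = 2L, n_obs = 10L),
  right = list(leaf = TRUE, node_id = 3L, n_obs = 6L)
)

route_row <- function(node, row, missing_action = "na") {
  while (!isTRUE(node$leaf)) {
    v <- row[[node$attr_col]]           # positional lookup, not by name
    if (is.na(v)) {
      if (missing_action == "na") return(NA_integer_)
      # "majority": follow the child with the larger n_obs; ties go to
      # the first (left) child.
      node <- if (node$right$n_obs > node$left$n_obs) node$right else node$left
    } else {
      node <- if (v <= node$cut) node$left else node$right
    }
  }
  node$node_id
}

newdata <- data.frame(x = c(2, 7, NA))
rows <- seq_len(nrow(newdata))
ids_na  <- vapply(rows, function(i) route_row(toy_tree, newdata[i, , drop = FALSE], "na"), integer(1))
ids_maj <- vapply(rows, function(i) route_row(toy_tree, newdata[i, , drop = FALSE], "majority"), integer(1))
```

The lookup by position in route_row mirrors the column order requirement: reordering the columns of newdata would silently change which attribute is tested.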
CTA covariate balance evidence-interval summary
Description
Builds one row per analysis scale (multivariate CTA) containing the
observed full-tree ESS/WESS, a bootstrap confidence interval, and a chance
interval. This is the multivariate analogue of
oda_balance_effect_table: a single CTA ENUMERATE run per
bootstrap or permutation iteration classifies all covariates jointly.
Usage
cta_balance_effect_summary(
group,
X,
w = NULL,
compare_weights = FALSE,
mindenom = 1L,
nboot = 200L,
chance_iter = 200L,
ci = 0.95,
mc_seed = NULL,
mc_iter = 5000L,
...
)
Arguments
group |
Integer (or coercible) binary group indicator. |
X |
Data frame of baseline covariate columns. |
w |
Optional numeric case-weight vector. |
compare_weights |
Logical; when |
mindenom |
Integer minimum endpoint denominator. Default |
nboot |
Integer bootstrap resamples. Default |
chance_iter |
Integer group-label permutations. Default |
ci |
Numeric nominal coverage. Default |
mc_seed |
Integer RNG seed set once at function entry. |
mc_iter |
Integer CTA MC iterations per node for the observed fit.
Default |
... |
Additional arguments forwarded to |
Details
Three passes are run:
- Observed: full cta_fit() with mc_iter – point estimate and tree metadata.
- Bootstrap: nboot row-resamples, loo = "off" – ESS/WESS percentile CI. no_tree results contribute 0.
- Chance: chance_iter group-label permutations – null percentile interval. no_tree results contribute 0.
no_tree convention: when CTA finds no admissible tree on a
bootstrap or chance iteration, ESS = 0 (no discrimination above chance).
The observed no_tree result is also recorded as estimate = 0.
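The no_tree convention and the percentile interval can be sketched as follows. The resampled ESS values are made up for illustration, NA is used here as a hypothetical marker for a no_tree resample, and the default quantile type is an assumption; the package may compute its percentiles differently.

```r
# Hypothetical per-resample ESS values (%); NA marks a resample where
# no admissible tree was found.
boot_ess <- c(22.4, 31.0, NA, 18.7, 27.5, NA, 35.2, 24.9)

# no_tree convention: such resamples contribute ESS = 0.
boot_ess[is.na(boot_ess)] <- 0

# Percentile interval at nominal coverage ci.
ci <- 0.95
tail_prob <- (1 - ci) / 2
boot_interval <- quantile(boot_ess, probs = c(tail_prob, 1 - tail_prob),
                          names = FALSE)
boot_interval
```

Counting no_tree resamples as 0 rather than discarding them pulls the lower bound toward chance, which is the conservative direction when the goal is to show that groups cannot be discriminated.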
Value
A list of class "cta_balance_effect_summary" with:
- rows
Data frame; one row per analysis scale. Columns: analysis, metric, estimate, boot_lo, boot_hi, chance_lo, chance_hi, d_stat, n_endpoints, root_attribute, status, balance_interpretation.
- meta
List: n_obs, has_weights, compare_weights, analyses, mindenom, nboot, chance_iter, ci, mc_iter, mc_seed.
References
Linden A, Yarnold PR (2016). Using machine learning to assess covariate balance in matching studies. Journal of Evaluation in Clinical Practice, 22(6), 861-867.
See Also
cta_balance_table, plot_cta_balance_effects
Examples
X <- data.frame(
A = c(rep(0L, 20), rep(1L, 20), rep(1L, 20)),
B = c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
)
group <- c(rep(0L, 40), rep(1L, 20))
ces <- cta_balance_effect_summary(group, X, mindenom = 5L,
mc_iter = 200L, mc_seed = 42L,
nboot = 20L, chance_iter = 20L)
ces$rows[, c("analysis", "estimate", "boot_lo", "boot_hi",
"chance_lo", "chance_hi", "status")]
Renderer-ready plot data for CTA covariate balance
Description
Transforms a cta_balance_table result into a
renderer-independent data structure suitable for Graphics v3 plotting.
For no_tree results, populates no_tree_message with the
favorable-balance interpretation.
Usage
cta_balance_plot_data(cta_balance, target_class = 1L, digits = 1L)
Arguments
cta_balance |
A |
target_class |
Integer; target class for endpoint coloring in the
embedded tree diagram. Default |
digits |
Integer; decimal digits passed to |
Details
This function does not fit any CTA models. It is a pure
transformation of the pre-computed cta_balance_table result.
Value
A list of class "cta_balance_plot_data" with elements:
- status
Character; "valid_tree", "stump", "no_tree", or "fit_error".
- balance_interpretation
Character.
- no_tree_message
Character; human-readable no-tree annotation for renderers; NA when status is not "no_tree".
- cta_pd
List from cta_plot_data when a valid tree or stump was found; NULL for no_tree or fit_error.
- ess_display
Numeric; full-tree ESS/WESS (%); NA for no_tree.
- d_stat
Numeric; NA for no_tree.
- has_weights
Logical.
- ess_label
Character; "WESS" or "ESS".
See Also
cta_balance_table, cta_plot_data
Examples
X <- data.frame(
A = c(rep(0L, 20), rep(1L, 20), rep(1L, 20)),
B = c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
)
group <- c(rep(0L, 40), rep(1L, 20))
ct <- cta_balance_table(group, X, mindenom = 5L,
mc_iter = 200L, mc_seed = 42L)
cpd <- cta_balance_plot_data(ct)
cpd$status
Multivariate CTA covariate balance diagnostics
Description
Fits a single cta_fit model with group as the class
variable and all columns of X as candidate predictors. Returns a
structured summary of the CTA balance result.
Usage
cta_balance_table(
group,
X,
w = NULL,
mindenom = 1L,
alpha = 0.05,
loo = "off",
mc_iter = 5000L,
mc_seed = NULL,
...
)
Arguments
group |
Integer (or coercible) binary group indicator. Must have exactly two distinct non-missing values. |
X |
Data frame of baseline covariate columns. |
w |
Optional numeric case-weight vector. When supplied, CTA uses case
weights and |
mindenom |
Integer minimum endpoint denominator passed to
|
alpha |
Numeric significance threshold stored in the result and used
in the |
loo |
LOO gate mode passed to |
mc_iter |
Integer MC iterations per CTA node. Default |
mc_seed |
Integer RNG seed; |
... |
Additional arguments forwarded to |
Details
A status = "no_tree" result means no combination of baseline
covariates in X predicted group membership at the declared
significance level, LOO constraint, and minimum endpoint denominator.
This is favorable evidence of multivariable covariate balance
under the declared analytic constraints. It must not be interpreted as
a model failure; in balance analysis, inability to discriminate groups is
the goal.
group vs. outcome: group is the binary class variable.
The scientific outcome is strictly out of scope.
Implementation constraint: this function calls cta_fit
once; it does not reimplement ENUMERATE or node-growth logic.
Value
A list of class "cta_balance_table" with fields:
- status
Character: "valid_tree", "stump", "no_tree", or "fit_error".
- balance_interpretation
Character: "discriminating" or "no_discriminating_combinations" (when no_tree); NA on fit error.
- root_attribute
Character; root split variable name; NA when no_tree.
- n_endpoints
Integer; number of terminal endpoints; NA when no_tree.
- overall_ess
Numeric; full-tree ESS (%) when weights not active; NA otherwise.
- overall_wess
Numeric; full-tree WESS (%) when weights active; NA otherwise.
- ess_display
Numeric; operative measure (overall_wess when weights active, else overall_ess); NA for no_tree.
- d_stat
Numeric; parsimony-adjusted D statistic; NA for no_tree.
- mindenom
Integer; MINDENOM used.
- alpha
Numeric; significance threshold stored for downstream use.
- has_weights
Logical; whether case weights were active.
- tree
The raw cta_tree object; NULL on fit error.
- endpoint_table
Data frame from cta_endpoint_table; zero-row for no_tree.
- node_table
Data frame from cta_node_table.
- fit_error
Logical; TRUE when cta_fit threw.
- fit_reason
Character; error message when fit_error; NA otherwise.
References
Linden A, Yarnold PR (2016). Using machine learning to assess covariate balance in matching studies. Journal of Evaluation in Clinical Practice, 22(6), 861-867.
See Also
cta_balance_plot_data, oda_balance_table,
cta_fit
Examples
X <- data.frame(
A = c(rep(0L, 20), rep(1L, 20), rep(1L, 20)),
B = c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
)
group <- c(rep(0L, 40), rep(1L, 20))
ct <- cta_balance_table(group, X, mindenom = 5L,
mc_iter = 200L, mc_seed = 42L)
ct$status
ct$balance_interpretation
Extract training confusion matrix from a fitted CTA tree
Description
Convenience wrapper: returns the 2x2 integer training confusion matrix for a
binary oda_cta_fit result directly, without the intermediate
tidy long-format step required by cta_confusion_table and
as_confusion_matrix.
Usage
cta_confusion_matrix(tree)
Arguments
tree |
A |
Details
Rows are actual class (0/1), columns are predicted class (0/1).
Returns NULL invisibly when tree$no_tree is TRUE.
Value
A 2x2 integer matrix (actual x predicted) or NULL when no
tree was found.
See Also
cta_confusion_table, as_confusion_matrix
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
if (!isTRUE(tree$no_tree)) cta_confusion_matrix(tree)
Final selected tree training confusion table
Description
Returns the stored full-tree training confusion matrix for the final selected CTA model in tidy long format (one row per actual x predicted class pair).
The confusion matrix is captured at fit time at the exact moment the winning candidate is selected, using the same scoring predictions. For the expanded ENUMERATE phase, predictions use majority-fallback for missing attributes. For the root-only stump phase, predictions are path-local (observations whose root attribute is missing are excluded).
This function does not report split-node local confusion. Split-node confusion reflects all observations at a node classified by that node's rule alone; it is not the same as full-tree confusion for trees with more than one split. The two coincide incidentally for stumps but the semantics here are always final-tree.
Usage
cta_confusion_table(tree)
Arguments
tree |
A |
Value
A data.frame with columns:
- actual
Integer actual class label.
- predicted
Integer predicted class label.
- n
Integer raw count of observations with this actual x predicted combination in the final selected tree.
Rows are sorted by actual then predicted.
For a no-tree fit (or if training_confusion is absent), the returned
data frame has zero rows but the correct column structure.
See Also
oda_cta_fit, summary.cta_tree,
cta_endpoint_table, cta_node_table
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_confusion_table(tree)
D statistic for a fitted CTA tree
Description
Computes the parsimony-normalized classification criterion:
Usage
cta_d_stat(tree)
Arguments
tree |
A |
Details
D = \frac{100}{\text{ESS} / \text{strata}} - \text{strata}
where strata is the number of terminal leaf endpoints and
ESS is tree$overall_ess (WESS when case weights are active,
ESS otherwise).
Returns NA_real_ when:
- tree$no_tree is TRUE;
- tree$overall_ess is missing, non-finite, or ≤ 0;
- strata < 2.
Value
Numeric scalar D, or NA_real_.
See Also
cta_strata, cta_min_terminal_denom
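The formula and its NA guards can be written out as a small standalone function. This is a sketch that works on plain numbers rather than on a cta_tree; the name d_stat_sketch is hypothetical.

```r
# D = 100 / (ESS / strata) - strata, with the documented NA conditions.
d_stat_sketch <- function(ess, strata) {
  if (is.null(ess) || length(ess) != 1L) return(NA_real_)
  if (!is.finite(ess) || ess <= 0)       return(NA_real_)  # missing, non-finite, or <= 0
  if (is.na(strata) || strata < 2)       return(NA_real_)  # fewer than two endpoints
  100 / (ess / strata) - strata
}

d_two   <- d_stat_sketch(ess = 50, strata = 2)   # 100 / 25 - 2 = 2
d_three <- d_stat_sketch(ess = 50, strata = 3)   # 100 / (50 / 3) - 3 = 3
d_bad   <- d_stat_sketch(ess = 0,  strata = 2)   # NA: ESS not positive
```

At a fixed ESS, adding an endpoint raises D (here from 2 to 3), which is how the criterion penalizes less parsimonious trees.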
CTA demonstration dataset
Description
A simulated data frame with 200 observations and 6 variables, designed to
illustrate Classification Tree Analysis with cta_fit.
This is the dataset used in the CTA.exe demonstration program.
Format
A data frame with 200 rows and 6 columns:
- V1
Class label (integer; 1 or 2).
- V2
Ordered attribute (root in MINDENOM = 1 solution).
- V3
Ordered attribute.
- V4
Binary attribute (0/1).
- V5
Ordered attribute.
- V6
Ordered attribute.
Details
The CTA.exe golden output for MINDENOM = 1 selects V2 as root
(cut = 4.5, ESS = 52.63%). MINDENOM = 8 requires mc_iter = 25000
for parity.
Simulated dataset; no real subjects or PHI. Used as the primary
introductory CTA example in the oda package vignettes and in the
CTA.exe demonstration program (CTA_DEMO.pgm).
MDSA descendant family for CTA
Description
Traces the MDSA descendant family by fitting CTA models starting at
start_mindenom and stepping according to the novometric MDSA rule:
next MINDENOM = minimum terminal endpoint denominator + 1. The family
terminates when a no-tree fit is produced or max_steps is reached.
Usage
cta_descendant_family(
X,
y,
w = NULL,
...,
start_mindenom = 1L,
max_steps = 20L
)
Arguments
X |
Data frame of predictor attributes; passed to
|
y |
Integer class vector; passed to |
w |
Optional numeric case-weight vector; passed to
|
... |
Additional arguments forwarded to |
start_mindenom |
Integer MINDENOM for the first family member.
Defaults to |
max_steps |
Integer safety cap on the number of CTA fits; prevents
unbounded loops. Defaults to |
Value
A list of class cta_family with fields:
- members
List of new_cta_family_member objects in order, including the terminal no-tree member.
- mindenoms
Integer vector of MINDENOM values tried.
- summary
Data frame with one row per member: mindenom, status ("valid_tree", "stump", or "no_tree"), strata, min_terminal_denom, overall_ess, d, no_tree.
- min_d_idx
Integer index of the feasible (non-no-tree) member with minimum D; NA_integer_ if no feasible member exists.
- terminated
Logical; always TRUE.
- termination_reason
Character: one of "no_tree", "max_steps", "no_next_mindenom".
See Also
oda_cta_fit, cta_d_stat,
cta_min_terminal_denom, cta_strata
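The stepping rule (next MINDENOM = minimum terminal endpoint denominator + 1) can be sketched with a stand-in for the CTA fit. Both trace_family_sketch and toy_fit are hypothetical; toy_fit simply pretends that trees with smallest leaves of size 5, 12, and 30 exist, and the "no_next_mindenom" termination reason is not modelled.

```r
trace_family_sketch <- function(fit_fn, start_mindenom = 1L, max_steps = 20L) {
  mindenoms <- integer(0)
  reason <- "max_steps"                 # default if the loop runs out
  md <- start_mindenom
  for (step in seq_len(max_steps)) {
    mindenoms <- c(mindenoms, md)
    res <- fit_fn(md)
    if (isTRUE(res$no_tree)) {          # terminal no-tree member reached
      reason <- "no_tree"
      break
    }
    md <- res$min_terminal_denom + 1L   # MDSA stepping rule
  }
  list(mindenoms = mindenoms, termination_reason = reason)
}

# Stand-in for a CTA fit: returns the smallest admissible leaf size.
toy_fit <- function(mindenom) {
  leaf_sizes <- c(5L, 12L, 30L)
  ok <- leaf_sizes[leaf_sizes >= mindenom]
  if (length(ok) == 0L) list(no_tree = TRUE)
  else list(no_tree = FALSE, min_terminal_denom = min(ok))
}

fam_sketch <- trace_family_sketch(toy_fit)
fam_sketch$mindenoms
```

Stepping to min_terminal_denom + 1 forces each fit to differ from the previous one, since the previous tree's smallest endpoint is no longer admissible.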
Per-endpoint class count table for a fitted CTA tree
Description
Returns one row per terminal endpoint (leaf) per actual class, read directly from stored leaf node fields. No refitting, no prediction, and no recomputation from training data is performed.
Class counts are stored at fit time by oda_cta_fit on
every terminal leaf. Row order within each endpoint follows the order
of names(leaf$class_counts_raw), which is ascending by class
label. Endpoints are ordered by node_id, matching
cta_endpoint_summary.
Scope: This function exposes stored raw and weighted class
counts only. It does not include target-class proportions,
event rates, odds, or staging order. Staging-table and event-rate
summaries are available via cta_staging_table.
If any terminal leaf is missing the stored class counts (i.e., the
cta_tree was fitted by an earlier version of oda that did
not store endpoint counts), the function stops with a clear error.
Usage
cta_endpoint_counts(tree)
Arguments
tree |
A |
Value
A data.frame with one row per terminal endpoint per actual class
and columns:
- endpoint_id
Integer sequential endpoint index 1..n in node order, matching cta_endpoint_summary.
- endpoint_node_id
Integer tree node identifier for this endpoint leaf.
- path
Character; AND-joined branch labels from root to this leaf (e.g. "V14<=0.5 AND V15>0.5").
- terminal_prediction
Integer class label assigned to this endpoint (stored leaf majority_class).
- class
Character; actual class label for this row (e.g. "0", "1").
- n_raw
Integer raw count of observations of this actual class reaching this endpoint.
- n_weighted
Numeric weighted total for this actual class reaching this endpoint. Equals n_raw when case weights are not active.
For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
See Also
oda_cta_fit, cta_endpoint_summary,
cta_confusion_table, cta_endpoint_table,
cta_staging_table, cta_propensity_weights
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_endpoint_counts(tree)
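Since this table deliberately omits target-class proportions, a caller who needs them can derive them from the long-format counts. The sketch below uses a hand-made frame with the documented columns endpoint_id, class, and n_raw; the values and the choice of "1" as the target class are illustrative assumptions.

```r
# Hand-made long-format counts: two endpoints, two classes each.
counts <- data.frame(
  endpoint_id = c(1L, 1L, 2L, 2L),
  class       = c("0", "1", "0", "1"),
  n_raw       = c(12L, 3L, 2L, 15L)
)

# Endpoint totals, named by endpoint_id.
totals <- tapply(counts$n_raw, counts$endpoint_id, sum)

# Keep the target-class rows and divide by the matching endpoint total.
target <- counts[counts$class == "1", c("endpoint_id", "n_raw")]
names(target)[2] <- "target_n"
target$n <- as.integer(totals[as.character(target$endpoint_id)])
target$target_prop <- target$target_n / target$n
target
```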
Terminal endpoint denominators of a CTA tree
Description
Returns the observation counts (n_obs) for each terminal leaf node,
named by node ID. These are the raw row counts stored at fit time - they
are not recomputed from training data or predictions.
Usage
cta_endpoint_denominators(tree)
Arguments
tree |
A |
Details
Returns integer(0) for no-tree fits.
Value
Named integer vector of leaf n_obs values, named by node ID
(as character); integer(0) for no-tree fits.
See Also
cta_strata, cta_min_terminal_denom
Endpoint reporting summary for a fitted CTA tree
Description
Returns one row per terminal leaf (endpoint) with stable endpoint identifiers and stored node fields suitable for downstream reporting. All values are read directly from stored node fields; no refitting, no prediction, and no recomputation of tree metrics is performed.
Scope: This function reports structural endpoint fields only.
It does not include endpoint class counts, target-class proportions,
event rates, odds, or staging order. Per-endpoint class counts are available
via cta_endpoint_counts. Staging-table and event-rate summaries
are available via cta_staging_table.
Usage
cta_endpoint_summary(tree)
Arguments
tree |
A |
Value
A data.frame with one row per terminal leaf and columns:
- endpoint_id
Integer sequential index 1..n in node order.
- endpoint_node_id
Integer tree node identifier for this leaf, corresponding to node_id in cta_endpoint_table.
- path
Character; AND-joined branch labels from root to this leaf (e.g. "V14<=0.5 AND V15>0.5").
- depth
Integer depth from root (root = 1).
- terminal_prediction
Integer class label assigned to this endpoint (stored leaf majority_class).
- n_obs
Integer raw observation count at this endpoint.
- n_weighted
Numeric weighted observation count. Equals n_obs when case weights are not active (not NA).
- denominator
Integer endpoint denominator (equal to n_obs); included to align with MPE/MDSA terminology.
For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
See Also
oda_cta_fit, cta_endpoint_table,
cta_strata, cta_endpoint_denominators,
cta_endpoint_counts, cta_staging_table
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_endpoint_summary(tree)
Canonical terminal endpoint map for a fitted CTA tree
Description
Returns one row per terminal leaf (endpoint) of a cta_tree. All
values are read directly from stored node fields; no refitting or prediction
is performed. This is the canonical endpoint map for reporting, translation,
ORT, and staged workflows.
Leaf class counts are stored on every terminal node at fit time
(class_counts_raw, class_counts_weighted). target_n
and target_prop are derived from the stored counts.
ESS, WESS, p, LOO status, LOO ESS/WESSL, and LOOp are canonical split-node
report metrics (see cta_node_table). Terminal endpoints are
connected to those metrics through their parent split-node lineage. The
parent_split_* columns expose the immediate parent split's canonical
metrics for auditability. They are not recomputed ESS at the leaf.
Usage
cta_endpoint_table(tree, target_class = NULL)
Arguments
tree |
A |
target_class |
Integer class label to use as the target (positive)
class for |
Value
A data.frame with one row per terminal leaf and columns:
- endpoint_id
Integer sequential endpoint index 1..n.
- leaf_node_id
Integer tree node identifier for this leaf.
- terminal_marker
Character "*" on every row.
- terminal
Logical TRUE on every row.
- depth
Integer depth from root (root = 1).
- parent_split_node_id
Integer parent split node identifier.
- path
Character; AND-joined branch labels from root to this leaf (e.g. "V14<=0.5 AND V15>0.5").
- n
Integer raw observation count at this endpoint.
- class_counts_raw
List column; each element is a named integer vector of raw per-class counts, or NULL.
- class_counts_weighted
List column; each element is a named numeric vector of weighted per-class counts, or NULL.
- predicted_class
Integer class label assigned to this endpoint (stored leaf majority class).
- target_n
Integer count of target_class observations at this endpoint (NA when not resolvable).
- target_prop
Numeric proportion target_n / n (NA when not resolvable).
- parent_split_attribute
Attribute name of the parent split.
- parent_split_ess
ESS of the parent split node.
- parent_split_wess
WESS of the parent split node.
- parent_split_loo_status
LOO status of the parent split node.
- parent_split_loo_ess
LOO ESS/WESSL of the parent split node.
- parent_split_p_mc
MC p-value of the parent split node.
For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
See Also
oda_cta_fit, cta_node_table,
summary.cta_tree, cta_strata,
cta_endpoint_denominators, cta_endpoint_summary,
cta_endpoint_counts
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_endpoint_table(tree)
Tidy table of a CTA descendant family
Description
Returns a data.frame with one row per family member, reading all values
from the stored cta_family object. No refitting or recomputation is
performed.
Usage
cta_family_table(family)
Arguments
family |
A |
Value
A data.frame with columns:
- index
Integer position of the member in the chain.
- mindenom
Integer MINDENOM used for this fit.
- status
Character: "valid_tree", "stump", or "no_tree".
- no_tree
Logical; TRUE for the terminal no-tree member.
- strata
Integer number of terminal leaf endpoints; NA for no-tree members.
- min_terminal_denom
Integer minimum leaf n_obs; NA for no-tree members.
- next_mindenom
Integer MINDENOM for the next chain step (min_terminal_denom + 1); NA for no-tree members.
- overall_ess
Numeric overall ESS or WESS stored at fit time; NA for no-tree members.
- has_weights
Logical; TRUE when case weights were active for this fit.
- d
Numeric D statistic (100 / (ESS / strata) - strata); NA for no-tree members.
- selected_min_d
Logical; TRUE for the feasible member with minimum D (index family$min_d_idx). All FALSE when no feasible member exists.
See Also
cta_descendant_family, summary.cta_family
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
fam <- suppressMessages(
cta_descendant_family(X, y, start_mindenom = 1L, mc_iter = 200L,
mc_seed = 42L, loo = "off")
)
cta_family_table(fam)
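How the d, next_mindenom, and selected_min_d columns relate to the others can be sketched on a hand-made family table. The numbers are invented, and breaking ties in D by taking the first member is an assumption of this sketch rather than documented behaviour.

```r
# Hand-made family: two feasible members and a terminal no-tree member.
fam_tbl <- data.frame(
  mindenom           = c(1L, 6L, 13L),
  strata             = c(4L, 3L, NA),
  min_terminal_denom = c(5L, 12L, NA),
  overall_ess        = c(60, 50, NA),
  no_tree            = c(FALSE, FALSE, TRUE)
)

# Derived columns; NA inputs for the no-tree member propagate to NA.
fam_tbl$next_mindenom <- fam_tbl$min_terminal_denom + 1L
fam_tbl$d <- 100 / (fam_tbl$overall_ess / fam_tbl$strata) - fam_tbl$strata

# Minimum D among feasible members only.
feasible  <- which(!fam_tbl$no_tree & !is.na(fam_tbl$d))
min_d_idx <- if (length(feasible) > 0L) {
  feasible[which.min(fam_tbl$d[feasible])]
} else {
  NA_integer_
}
fam_tbl$selected_min_d <- !is.na(min_d_idx) & seq_len(nrow(fam_tbl)) == min_d_idx
fam_tbl
```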
Fit a Classification Tree Analysis (CTA) model (public wrapper)
Description
Public entry point for CTA. Currently supports binary (two-class) outcome variables only.
When recursive = FALSE (default), validates the class variable and
delegates to oda_cta_fit. When recursive = TRUE,
runs the Locally Optimal Recursive Tree (LORT) engine: at each endpoint a full MDSA
family scan (cta_descendant_family) is performed, the min-D
member is selected, and recursion continues until no further structure is
found or a compute guard fires. Returns a dual-tagged
cta_ort / cta_tree object.
Usage
cta_fit(X, y, verbose = FALSE,
        recursive = FALSE,
        min_n = 30L,
        max_depth = 8L,
        max_nodes = 31L,
        family_max_steps = 20L,
        ...)
Arguments
X |
Data frame or matrix of attribute columns. For recursive CTA,
|
y |
Integer class variable vector. Must have exactly two distinct values. |
verbose |
Logical; if |
recursive |
Logical; if |
min_n |
Integer; minimum endpoint n to attempt recursion. Endpoints
smaller than |
max_depth |
Integer; safety cap on recursion depth. Nodes at
|
max_nodes |
Integer; safety cap on total ORT nodes allocated. When
the node count exceeds |
family_max_steps |
Integer or |
... |
Additional arguments passed to |
Value
Non-recursive: a cta_tree object.
Recursive: a dual-tagged cta_ort / cta_tree object.
All existing cta_tree S3 methods (predict, print,
summary, plot) operate on the root-level model.
cta_ort-aware methods (predict.cta_ort,
print.cta_ort, summary.cta_ort, plot.cta_ort) operate
on the full composite tree. Use predict(obj, newdata, type="all")
to retrieve stratum assignments.
Note
oda_cta_fit() is the internal engine name; cta_fit() is the
preferred public entry point for non-recursive CTA. Both are exported and
functionally equivalent for non-recursive use.
cta_fit(..., recursive = TRUE) is a legacy-compatible interface for
the LORT workflow layer. Prefer lort_fit() for new code.
SORT and GORT are reserved and not implemented.
See Also
oda_fit, cta_descendant_family,
cta_node_table, cta_staging_table,
plot.cta_tree, plot.cta_ort,
ort_plot_data
Examples
# Small synthetic two-class example (non-recursive)
X <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L)
)
y <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)
tree <- cta_fit(X, y,
  priors_on = TRUE,
  mindenom = 1L,
  mc_iter = 500L,
  mc_seed = 42L,
  loo = "off",
  attr_names = c("x1", "x2")
)
print(tree)
# Recursive ORT - two-level synthetic dataset
X2 <- data.frame(
  A = c(rep(0, 20), rep(1, 20), rep(1, 20)),
  B = c(rep(0, 20), rep(0, 20), rep(1, 20))
)
y2 <- c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
ort <- cta_fit(X2, y2, recursive = TRUE,
               mc_iter = 100L, mc_seed = 42L, loo = "off",
               min_n = 5L)
print(ort)
Minimum terminal endpoint denominator of a CTA tree
Description
Returns the smallest leaf n_obs across all terminal endpoints.
This value drives the next MINDENOM step in the MDSA descendant family:
next_mindenom = cta_min_terminal_denom(tree) + 1L.
Usage
cta_min_terminal_denom(tree)
Arguments
tree |
A |
Details
Returns NA_integer_ for no-tree fits.
Value
Minimum leaf n_obs as an integer, or NA_integer_ for
no-tree fits.
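The stepping rule next_mindenom = cta_min_terminal_denom(tree) + 1L can be illustrated without a fitted tree. This is a small sketch under stated assumptions: next_step() is a hypothetical helper, and the vector of leaf n_obs values is invented. An empty vector stands in for a no-tree fit.

```r
# Hypothetical helper: next MINDENOM from a vector of terminal-leaf n_obs values.
next_step <- function(leaf_n_obs) {
  if (length(leaf_n_obs) == 0L) return(NA_integer_)  # no-tree fit: nothing to step from
  min(leaf_n_obs) + 1L
}

next_step(c(12L, 7L, 13L))  # smallest leaf has 7 observations
next_step(integer(0))       # no terminal endpoints
```

Stepping MINDENOM to one more than the smallest leaf means that leaf no longer meets the minimum-denominator requirement in the next fit of the chain.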
See Also
cta_strata, cta_endpoint_denominators
Canonical CTA node report table
Description
Returns a data frame with one row per node, mirroring the CTA.exe-style node report (ATTRIBUTE, NODE, LEV, OBS, p, ESS/WESS, LOO, WESSL/LOO ESS, LOOp, TYP, MODEL columns).
Split nodes carry canonical split metrics (ESS, WESS, p, LOO status, LOO
ESS/WESSL, LOOp) and a MODEL field with branch strings and terminal-leaf
* markers. Leaf rows have NA for all split metrics and MODEL.
Usage
cta_node_table(tree)
Arguments
tree |
A |
Value
Data frame with columns:
- node_id
Integer node identifier.
- parent_id
Integer parent node identifier (0 for root).
- level
Integer level from root (root = 1); alias for depth.
- depth
Integer depth from root (root = 1).
- leaf
Logical; TRUE for terminal leaf nodes.
- attribute
Character attribute name (NA for leaves).
- attr_type
Character attribute type (NA for leaves).
- n_obs
Integer observation count at this node.
- n_weighted
Numeric weighted observation count.
- p_mc
Numeric Monte Carlo p-value (NA for leaves).
- ess
Numeric ESS at this split (NA for leaves).
- ess_weighted
Numeric WESS at this split (NA for leaves); equals ess when case weights are not active.
- loo_status
Character LOO status, e.g. "STABLE" (NA for leaves).
- loo_ess
Numeric LOO ESS/WESSL (NA for leaves).
- loo_p
Numeric LOO p-value (NA for leaves).
- model
Character CTA.exe-style branch string with terminal-leaf * markers, e.g. "<=0.5-->0,101/131,77.10%*; >0.5-->1,21/55,38.18%*". NA for leaf nodes.
See Also
oda_cta_fit, cta_endpoint_table,
summary.cta_tree
Assign per-observation CTA propensity weights
Description
Convenience wrapper that calls cta_assign_endpoints and
cta_propensity_weights and returns a joined observation-level
data frame. The cta_tree object is not mutated; all computation is
on demand.
Column order requirement: newdata must have the same attribute
column order as the X matrix passed to oda_cta_fit.
Traversal uses the stored integer column positions (attr_col) from the
fit, not column names.
Unroutable observations: Observations with an NA endpoint
(missing root split attribute under missing_action = "na") or an
NA class label receive assigned = FALSE and NA for all
weight columns. The output always contains nrow(newdata) rows.
Unmatched classified observations: When an observation reaches a non-NA
endpoint but its class is not present in the propensity weight table (e.g.,
a class unseen at fit time), a warning is issued and assigned = FALSE is returned for that row.
Usage
cta_observation_weights(tree, newdata, y, target_class = NULL,
                        adjusted = TRUE,
                        missing_action = c("na", "majority"))
Arguments
tree |
A |
newdata |
A |
y |
Class labels for each row of |
target_class |
Passed to |
adjusted |
Logical; passed to |
missing_action |
Character; one of |
Details
No observation-level data are stored in the cta_tree object at fit
time. This function performs traversal and weight lookup on demand.
No-tree fits: When the tree has no splits (leaf-only), all rows have
endpoint_id = NA_integer_ and assigned = FALSE.
Join semantics: The join key is
paste(endpoint_id, actual_class). Each observation is matched to the
propensity weight row whose class equals its actual_class.
The target_class parameter annotates all rows with the resolved design
target class but does not affect which rows participate in the join.
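The join semantics described above can be sketched in base R. This is an illustration only, not the package's implementation: the obs and wt data frames and their weight values are hypothetical and stand in for the traversal result and the cta_propensity_weights output.

```r
# Hypothetical traversal result: one row per observation.
obs <- data.frame(
  row_id       = 1:5,
  endpoint_id  = c(1L, 1L, 2L, NA, 2L),      # row 4 is unroutable
  actual_class = c("0", "1", "1", "0", "2"), # class "2" was unseen at fit time
  stringsAsFactors = FALSE
)

# Hypothetical endpoint-by-class weight table.
wt <- data.frame(
  endpoint_id       = c(1L, 1L, 2L, 2L),
  class             = c("0", "1", "0", "1"),
  propensity_weight = c(0.5, 3, 4, 0.6),
  stringsAsFactors  = FALSE
)

# Join key: paste(endpoint_id, actual_class); NA when either part is missing.
key_obs <- ifelse(is.na(obs$endpoint_id) | is.na(obs$actual_class),
                  NA_character_,
                  paste(obs$endpoint_id, obs$actual_class))
hit <- match(key_obs, paste(wt$endpoint_id, wt$class))

obs$propensity_weight <- wt$propensity_weight[hit]
obs$assigned <- !is.na(hit)
obs
```

Rows 4 (no endpoint) and 5 (class absent from the weight table) stay in the output with NA weights and assigned = FALSE, so the result keeps one row per input observation.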
Value
A data.frame with nrow(newdata) rows and columns:
- row_id
Integer; positional row index (1 to nrow(newdata)).
- actual_class
Character; class label from y, coerced to character.
- endpoint_node_id
Integer; node ID of the terminal leaf reached by traversal, or NA_integer_ when unroutable.
- endpoint_id
Integer; sequential endpoint index matching cta_endpoint_summary, or NA_integer_.
- target_class
Integer; resolved design target class annotation from cta_propensity_weights, or NA_integer_ when unassigned.
- propensity_weight
Numeric; unadjusted propensity weight for the observation's endpoint–class cell, or NA when unassigned.
- adjusted_propensity_weight
Numeric; adjusted propensity weight (Yarnold-Linden correction for perfectly predicted endpoints), or NA when unassigned.
- undefined_empirical
Logical; TRUE when the endpoint–class cell has zero observed frequency, or NA when unassigned.
- perfectly_predicted_endpoint
Logical; TRUE when all observations at the endpoint belong to one class, or NA when unassigned.
- adjusted
Logical; TRUE when the adjusted weight was applied at this endpoint, or NA when unassigned.
- assigned
Logical; TRUE when a propensity weight was successfully matched for this observation.
See Also
cta_assign_endpoints, cta_propensity_weights,
oda_cta_fit, cta_endpoint_summary
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
ow <- cta_observation_weights(tree, X, y)
head(ow)
Node-level summary table for a fitted LORT (legacy name: cta_ort)
Description
Returns one row per LORT node from a cta_ort (LORT) object. Each row
exposes the embedded CTA member selected at that node (MINDENOM, ESS, D,
root attribute, split/leaf counts, endpoint count) plus the LORT method
taxonomy metadata (method, selection_scope,
global_optimization, sda_anchored).
Terminal nodes have NA for all selected-model columns. Non-terminal
nodes have NA for stop_reason and non-empty child_ids.
Naming note: The function name cta_ort_node_table and the class
cta_ort are legacy compatibility names for the implemented LORT method.
They are retained for backward compatibility; new code and documentation should
refer to the method as LORT.
Usage
cta_ort_node_table(object)
Arguments
object |
A |
Value
A data.frame with one row per ORT node and columns:
- ort_node_id
Integer ORT node identifier.
- parent_ort_node_id
Integer parent ORT node id; NA for root.
- depth
Integer recursion depth (root = 0).
- n
Integer observations at this ORT node.
- class_counts
Character; named class counts, e.g. "0=60, 1=40".
- terminal
Logical; TRUE for terminal leaf ORT nodes.
- stop_reason
Character stop reason for terminal nodes; NA for non-terminal.
- selected_mindenom
Integer MINDENOM of the embedded CTA member.
- selected_ess
Numeric ESS of the embedded CTA member (%).
- selected_d
Numeric D-statistic of the embedded CTA member.
- selected_root_attribute
Character root attribute of the embedded CTA member.
- selected_tree_nodes
Integer split-node count in the embedded CTA member.
- selected_tree_leaves
Integer leaf count in the embedded CTA member.
- selected_endpoint_count
Integer endpoint (terminal leaf) count of the embedded CTA member; equals number of ORT child nodes.
- child_ids
Character comma-separated child ORT node ids; empty string for terminal nodes.
- method
Character; always "lort" for current fits.
- selection_scope
Character; always "local_node" for LORT.
- global_optimization
Logical; always FALSE for LORT.
- sda_anchored
Logical; always FALSE for LORT.
See Also
cta_fit, predict.cta_ort,
summary.cta_ort
Examples
X <- data.frame(A = c(rep(0L, 20), rep(1L, 20), rep(1L, 20)),
                B = c(rep(0L, 20), rep(0L, 20), rep(1L, 20)))
y <- c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
ort <- cta_fit(X, y, recursive = TRUE, mc_iter = 100L, mc_seed = 42L,
               loo = "off", min_n = 5L)
cta_ort_node_table(ort)
Extract layout data for plotting a CTA tree
Description
Returns a pure-data list describing tree topology and layout coordinates.
No graphics are produced. Use this as input to plot.cta_tree
or to custom rendering code.
Layout algorithm: leaves receive sequential integer x-positions in
depth-first (left-to-right) order; internal nodes are centred over their
children. y = -depth so the root sits at the top.
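The layout rule above can be sketched on a toy tree. This is a minimal illustration, not the package's code: the children list and layout_tree() are hypothetical, root depth is taken as 1, and "centred" is assumed to mean the mean of the children's x-positions.

```r
# Hypothetical toy tree: node 1 splits into 2 and 3; node 3 splits into 4 and 5.
children <- list(`1` = c(2L, 3L), `2` = integer(0), `3` = c(4L, 5L),
                 `4` = integer(0), `5` = integer(0))

layout_tree <- function(children, root = 1L) {
  n <- length(children)
  x <- numeric(n)
  y <- numeric(n)
  next_x <- 0
  place <- function(id, depth) {
    kids <- children[[as.character(id)]]
    y[id] <<- -depth                    # root sits at the top
    if (length(kids) == 0L) {
      next_x <<- next_x + 1             # leaves: sequential x, left to right
      x[id] <<- next_x
    } else {
      for (k in kids) place(k, depth + 1L)
      x[id] <<- mean(x[kids])           # internal node: centred over its children
    }
    invisible(NULL)
  }
  place(root, 1L)
  data.frame(node_id = seq_len(n), x = x, y = y)
}

lay <- layout_tree(children)
lay
```

Leaves 2, 4, and 5 receive x = 1, 2, 3 in depth-first order; node 3 sits at 2.5 and the root at 1.75.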
Target-class enrichment: when target_class is supplied,
each terminal leaf is joined to cta_staging_table and
annotated with target-class counts, proportions, and a continuous display
color derived from the endpoint's rank among all endpoints by ascending
target-class proportion. Colors encode relative position within this
tree's endpoint distribution and do not imply clinical thresholds
or categories.
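The rank-based coloring can be illustrated as follows. This is a sketch under assumptions: the proportions and the three-color ramp are invented, and the palette the package actually uses by default is not shown here. Only the ranking rule (ascending, ties broken by ties.method = "first") comes from the documentation.

```r
# Hypothetical target-class proportions for four endpoints (two are tied).
target_proportion <- c(0.10, 0.55, 0.55, 0.90)

# Ascending rank; ties are broken by order of appearance.
target_rank <- rank(target_proportion, ties.method = "first")

# Illustrative palette: one color per rank position.
pal <- grDevices::colorRampPalette(c("#2166AC", "#F7F7F7", "#B2182B"))(length(target_rank))
endpoint_fill_color <- pal[target_rank]

data.frame(target_proportion, target_rank, endpoint_fill_color)
```

Because color follows rank rather than the raw proportion, the two tied endpoints receive adjacent but different colors, and the scale says nothing about absolute risk.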
Usage
cta_plot_data(tree, target_class = NULL, class_labels = NULL,
              digits = 1, endpoint_palette = NULL)
Arguments
tree |
A |
target_class |
Integer (or |
class_labels |
Optional character vector of display names for class
labels. Supply as a named vector, e.g.
|
digits |
Integer number of decimal places for percentage formatting
in |
endpoint_palette |
Palette for endpoint fill colors, used only when
|
Value
When target_class = NULL: a list with elements
nodes, edges, no_tree, has_weights.
When target_class is supplied: the same list plus
endpoints (a staging data.frame with layout coordinates) and
target_class_used (the integer target class used).
- nodes
A data.frame with one row per node. Always-present columns: node_id (integer), parent_id (integer), depth (integer), x (numeric), y (numeric), leaf (logical), attribute (character; NA for leaves), n_obs (integer), majority_class (integer), ess (numeric; NA for leaves), label (character multi-line display text).
Additional columns present when target_class is supplied (values are NA on split nodes): endpoint_id (integer), stage (integer), target_class (integer), target_n (numeric), denominator (numeric), target_proportion (numeric; raw continuous proportion, not binned), target_rank (integer; ascending rank of proportion, ties broken by ties.method = "first"), endpoint_fill_color (character hex color assigned by rank within this tree; does not imply clinical thresholds or categories), predicted_label (character), target_label (character), endpoint_label (character multi-line display text).
- edges
A data.frame with one row per parent-to-child edge and columns from_node_id (integer), to_node_id (integer), x0, y0, x1, y1 (numeric centre-to-centre coordinates), label (character branch condition, e.g. "V14<=0.5").
- endpoints
(target_class only) A data.frame with one row per endpoint, ordered by ascending stage. Columns include all staging fields from cta_staging_table plus layout coordinates x, y, display columns predicted_label, target_label, endpoint_fill_color, and integer target_rank.
- target_class_used
(target_class only) The integer target_class argument used for enrichment.
- no_tree
Logical; TRUE for leaf-only fits.
- has_weights
Logical; TRUE when case weights are active.
See Also
plot.cta_tree, cta_staging_table,
oda_cta_fit
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- suppressMessages(
  oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L,
              loo = "off")
)
# Structural layout only
pd <- cta_plot_data(tree)
head(pd$nodes)
pd$edges
# Target-class enrichment
pd2 <- cta_plot_data(tree, target_class = 1L,
                     class_labels = c("0" = "Manual", "1" = "Auto"))
pd2$endpoints[, c("stage", "target_proportion", "endpoint_fill_color")]
Endpoint-level propensity-score weights for a fitted CTA tree
Description
Returns one row per terminal endpoint per actual class, containing the CTA-derived stabilized propensity-style weights described in Yarnold and Linden (2017). All values are computed on demand from the stored leaf class counts; no refitting, no prediction, and no training-data recomputation is performed.
Formula: For endpoint s and actual class z,
w_{s,z} = \frac{n_s \cdot \Pr(Z=z)}{n_{s,z}}
where n_s is the endpoint denominator, n_{s,z} is the raw
count of class z observations at endpoint s, and
\Pr(Z=z) is the marginal class probability across the full
classified analytic sample.
Perfect endpoints: When n_{s,z} = 0 for some class, the
empirical weight is undefined (Inf). When adjusted = TRUE
(default), one hypothetical misclassified observation is added to the
absent class profile - and to the global marginal totals - so that all
endpoint x class cells yield finite adjusted weights. This is the canon
remedy from Yarnold and Linden (2017).
Scope: Raw observation counts (n_raw) are used exclusively.
The function does not return observation-level weights; those require
endpoint membership per training observation, which is not stored on the
fitted tree.
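The weight formula and the perfect-endpoint adjustment can be worked through on a small table of leaf class counts. This is a sketch under assumptions, not the package's implementation: the counts are invented, stab_weights() is a hypothetical helper, and the adjustment is modeled simply as adding one observation to every empty cell before recomputing all margins. How the package treats cells at endpoints that are not perfectly predicted may differ.

```r
# Hypothetical leaf class counts: rows are endpoints s, columns are classes z.
counts <- matrix(c(8,  2,
                   0, 10), nrow = 2, byrow = TRUE,
                 dimnames = list(endpoint = c("1", "2"), class = c("0", "1")))

# w_{s,z} = n_s * Pr(Z = z) / n_{s,z}
stab_weights <- function(m) {
  n_s <- rowSums(m)            # endpoint denominators
  p_z <- colSums(m) / sum(m)   # marginal class probabilities
  outer(n_s, p_z) / m
}

w <- stab_weights(counts)      # cell (2, "0") is empty, so its weight is Inf

# One hypothetical observation in each empty cell; the margins follow automatically.
adj_counts <- counts
adj_counts[adj_counts == 0] <- 1
w_adj <- stab_weights(adj_counts)

w
w_adj
```

With these counts, endpoint 2 is perfectly predicted: the empirical weight for class "0" is Inf, while the adjusted weight is 11 * (9/21) / 1, which is finite.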
Usage
cta_propensity_weights(tree, target_class = NULL, adjusted = TRUE)
Arguments
tree |
A |
target_class |
Integer (or coercible); annotation column only -
does not filter output rows. |
adjusted |
Logical. |
Value
A data.frame with one row per terminal endpoint per actual class,
with columns:
- endpoint_id
Integer sequential endpoint index.
- endpoint_node_id
Integer tree node identifier.
- path
Character; AND-joined branch labels from root.
- terminal_prediction
Integer majority-class prediction.
- class
Character; actual class label for this row.
- target_class
Integer; design-annotation class label.
- class_n
Integer; raw count of this class at this endpoint (empirical n_{s,z}).
- endpoint_n
Integer; total raw observations at this endpoint (empirical n_s).
- marginal_class_n
Integer; total raw observations of this class across all endpoints (empirical N_z).
- marginal_total_n
Integer; total classified observations across all endpoints (empirical N).
- marginal_class_probability
Numeric; empirical marginal class probability \Pr(Z=z) = N_z / N.
- propensity_weight
Numeric; empirical stabilized weight n_s \cdot \Pr(Z=z) / n_{s,z}. Inf when class_n == 0.
- undefined_empirical
Logical; TRUE when class_n == 0.
- perfectly_predicted_endpoint
Logical; TRUE when any class has class_n == 0 at this endpoint.
- adjusted
Logical; TRUE when the one-hypothetical-observation adjustment was applied to this row.
- adjusted_class_n
Numeric; class_n + 1 where adjusted, otherwise class_n.
- adjusted_endpoint_n
Numeric; endpoint denominator after adjustment.
- adjusted_marginal_class_n
Numeric; global class count after all hypothetical additions.
- adjusted_marginal_total_n
Numeric; global total after all hypothetical additions.
- adjusted_marginal_class_probability
Numeric; adjusted marginal class probability.
- adjusted_propensity_weight
Numeric; adjusted weight. Finite whenever adjusted_class_n > 0.
For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
References
Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.
See Also
oda_cta_fit, cta_endpoint_counts,
cta_staging_table
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_propensity_weights(tree)
Staging table for a fitted CTA tree
Description
Returns one row per terminal endpoint ordered by ascending target-class
propensity (lowest to highest risk stratum). Empirical counts,
proportions, and odds are computed from the stored leaf class counts.
When an endpoint is perfectly predicted (100 percent one class), the
empirical odds and proportion are undefined; the adjust_perfect
option adds one hypothetical misclassified observation to the undefined
profile so all endpoints can be ranked and compared - a canon remedy
anchored in Yarnold and Linden (2017).
Scope: The two-class case is handled automatically when
target_class = NULL (defaults to the numerically larger class
label, typically 1). For trees with three or more classes
target_class must be supplied explicitly.
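The staging logic can be sketched on hypothetical endpoint counts. This is an illustration under assumptions, not the package's code: the ep data frame is invented, the adjustment is modeled as adding one observation to whichever of the target or non-target counts is zero, and stages are ordered here by the adjusted proportion (in this example the empirical proportion gives the same order).

```r
# Hypothetical endpoints: target-class count and total per endpoint.
ep <- data.frame(endpoint_id = 1:3,
                 target_n    = c(6, 0, 10),
                 denominator = c(10, 8, 10))

ep$non_target_n        <- ep$denominator - ep$target_n
ep$perfectly_predicted <- ep$target_n == 0 | ep$non_target_n == 0
ep$odds <- ifelse(ep$perfectly_predicted, NA_real_, ep$target_n / ep$non_target_n)

# One hypothetical misclassified observation on the empty side of perfect endpoints.
ep$adjusted                   <- ep$perfectly_predicted
ep$adjusted_target_n          <- ep$target_n + (ep$target_n == 0)
ep$adjusted_non_target_n      <- ep$non_target_n + (ep$non_target_n == 0)
ep$adjusted_denominator       <- ep$adjusted_target_n + ep$adjusted_non_target_n
ep$adjusted_target_proportion <- ep$adjusted_target_n / ep$adjusted_denominator
ep$adjusted_odds              <- ep$adjusted_target_n / ep$adjusted_non_target_n

# Stage = rank from lowest to highest target-class propensity.
ep <- ep[order(ep$adjusted_target_proportion), ]
ep$stage <- seq_len(nrow(ep))
ep[, c("stage", "endpoint_id", "odds", "adjusted_odds")]
```

Endpoint 2 (no target cases) becomes stage 1 with adjusted odds 1/8, and endpoint 3 (all target cases) becomes stage 3 with adjusted odds 10/1, so every endpoint has a finite value to compare.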
Usage
cta_staging_table(tree, target_class = NULL, weighted = FALSE,
                  adjust_perfect = TRUE)
Arguments
tree |
A |
target_class |
Integer (or coercible); the class label treated as
the target (positive / high-risk) class. |
weighted |
Logical. |
adjust_perfect |
Logical. |
Value
A data.frame with one row per terminal endpoint, ordered by
ascending target-class propensity (lowest to highest risk stratum),
with columns:
- stage
Integer rank 1..n, ascending by target proportion.
- endpoint_id
Integer sequential endpoint index, matching cta_endpoint_summary.
- endpoint_node_id
Integer tree node identifier.
- path
Character; AND-joined branch labels from root.
- terminal_prediction
Integer majority-class prediction.
- target_class
Integer; the target class used for this table.
- target_n
Numeric; raw (or weighted) count of target-class observations at this endpoint.
- denominator
Numeric; total raw (or weighted) observations at this endpoint.
- target_proportion
Numeric; empirical target-class proportion (target_n / denominator).
- non_target_n
Numeric; denominator minus target_n.
- odds
Numeric; empirical odds (target_n / non_target_n); NA when perfectly_predicted is TRUE.
- perfectly_predicted
Logical; TRUE when the endpoint is 100 percent one class (target_n == 0 or non_target_n == 0).
- adjusted
Logical; TRUE when the one-hypothetical-misclassification adjustment has been applied. Always FALSE when adjust_perfect = FALSE.
- adjusted_target_n
Numeric; target_n after adjustment. Equal to target_n when adjusted is FALSE.
- adjusted_denominator
Numeric; denominator after adjustment.
- adjusted_target_proportion
Numeric; adjusted proportion.
- adjusted_non_target_n
Numeric; adjusted non-target count.
- adjusted_odds
Numeric; adjusted odds.
- weighted
Logical; the value of the weighted argument.
- n_obs
Integer; raw observation count at this endpoint (from cta_endpoint_summary).
- n_weighted
Numeric; weighted observation count.
For a no-tree fit the returned data frame has zero rows but the correct column structure and types.
References
Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.
See Also
oda_cta_fit, cta_endpoint_summary,
cta_endpoint_counts, cta_propensity_weights
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
cta_staging_table(tree)
Number of terminal leaf endpoints in a CTA tree
Description
Returns the count of terminal leaf nodes in a fitted
cta_tree (an integer scalar, not a table). Returns
NA_integer_ for no-tree fits (where tree$no_tree is
TRUE).
Usage
cta_strata(tree)
Arguments
tree |
A |
Details
To obtain endpoint details (node IDs, path labels, class counts,
predicted class), use cta_endpoint_table.
Value
Integer scalar: number of terminal leaf nodes, or
NA_integer_ for no-tree fits. This is a count, not a
data frame - use cta_endpoint_table for per-endpoint rows.
See Also
cta_endpoint_table,
cta_endpoint_denominators,
cta_min_terminal_denom
Fit a Locally Optimal Recursive Tree (LORT)
Description
Preferred explicit entry point for the LORT workflow layer. LORT is a
non-canonical workflow composition: at each recursive endpoint it runs a
full MDSA family scan (cta_descendant_family), selects the
min-D member, and recurses until no further structure is found or a compute
guard fires. It uses canon CTA/MDSA components but is not itself a canon
CTA.exe behavior.
Usage
lort_fit(
  X,
  y,
  w = NULL,
  mc_iter = 5000L,
  mc_seed = 42L,
  mc_stop = 99.9,
  mc_stopup = NA,
  alpha_split = 0.05,
  prune_alpha = 0.05,
  loo = "stable",
  min_n = 30L,
  max_depth = 8L,
  max_nodes = 31L,
  family_max_steps = 20L,
  verbose = FALSE
)
Arguments
X |
Data frame or matrix of candidate predictor columns. |
y |
Integer class variable vector. Must have exactly two distinct values. |
w |
Optional numeric case-weight vector. Default |
mc_iter |
Integer; maximum Monte Carlo iterations per node. Default |
mc_seed |
Integer or |
mc_stop |
Numeric; confidence bound for lower-tail early MC stopping
(percent). Default |
mc_stopup |
Numeric; confidence bound for upper-tail early MC stopping
(percent). Default |
alpha_split |
Numeric; node-level significance threshold. Default |
prune_alpha |
Numeric; pruning significance threshold. Default |
loo |
LOO gate mode per node: |
min_n |
Integer; minimum endpoint n to attempt recursion. Endpoints
smaller than |
max_depth |
Integer; safety cap on recursion depth. Nodes at
|
max_nodes |
Integer; safety cap on total ORT nodes. When node count
exceeds |
family_max_steps |
Integer; maximum MDSA family members evaluated at
each recursive node. Default |
verbose |
Logical; emit |
Details
lort_fit() is functionally equivalent to
cta_fit(..., recursive = TRUE). cta_fit(..., recursive = TRUE)
is retained as a legacy-compatible alias and will continue to work; prefer
lort_fit() for new code. SORT and GORT are reserved and not
implemented.
Value
A dual-tagged cta_ort / cta_tree object.
cta_ort-aware methods (predict.cta_ort,
print.cta_ort, summary.cta_ort, plot.cta_ort,
cta_ort_node_table) operate on the full composite tree.
ort_settings$method is always "lort".
See Also
cta_fit, predict.cta_ort,
cta_ort_node_table, ort_plot_data
Examples
X <- data.frame(
  A = c(rep(0L, 20), rep(1L, 20), rep(1L, 20)),
  B = c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
)
y <- c(rep(0L, 20), rep(0L, 20), rep(1L, 20))
fit <- lort_fit(X, y, mc_iter = 100L, mc_seed = 42L, loo = "off", min_n = 5L)
print(fit)
LORT path from root to a given node index
Description
Returns a data frame tracing the LORT recursion path from the root node (index 1) to the requested node, one row per LORT node on the path.
Usage
lort_index_path(x, index)
Arguments
x |
A |
index |
Integer; target LORT node index. |
Value
A data frame with columns:
lort_index, parent_lort_index, depth, n,
stop_reason, is_terminal,
incoming_endpoint_id (which endpoint of the parent led here),
incoming_path_condition (condition string for that endpoint),
incoming_path_label (human-readable label),
local_status, local_ess, local_d,
local_n_endpoints.
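The root-to-node trace can be sketched with a parent map. This is a minimal illustration, not the package's code: the nodes data frame and trace_path() are hypothetical, and only the lort_index, parent_lort_index, and depth columns are reproduced, with root depth taken as 0.

```r
# Hypothetical LORT node list: node 1 is the root; nodes 4 and 5 hang off node 3.
nodes <- data.frame(lort_index        = 1:5,
                    parent_lort_index = c(NA, 1L, 1L, 3L, 3L))

trace_path <- function(nodes, index) {
  path <- integer(0)
  # Walk upward through the parent map until the root (parent NA) is passed.
  while (!is.na(index)) {
    path  <- c(index, path)
    index <- nodes$parent_lort_index[match(index, nodes$lort_index)]
  }
  out <- nodes[match(path, nodes$lort_index), ]
  out$depth <- seq_along(path) - 1L
  rownames(out) <- NULL
  out
}

p <- trace_path(nodes, 5L)
p
```

The result has one row per node on the path (1, 3, 5), ordered from the root down to the requested index.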
See Also
lort_local_tree, lort_path_table,
plot_lort_path
Extract the local CTA model embedded at a LORT node
Description
Returns the full cta_tree object selected at LORT node index.
This is the complete CTA/MDSA family member fitted on the observations that
reached that node – not a summary, not a stump approximation, not
reconstructed from plot-data.
Usage
lort_local_tree(x, index)
Arguments
x |
A |
index |
Integer; LORT node index. |
Details
Returns NULL when the node is terminal due to min_n,
max_depth, max_nodes, or a pure-class guard (no fit was
attempted). A no_tree result at a non-forced-terminal node yields
a cta_tree with $no_tree = TRUE.
Value
A cta_tree object or NULL (with a message for
forced-terminal nodes).
See Also
lort_index_path, plot_lort_path
Formatted path table for a LORT recursion path
Description
Prints and returns (invisibly) a summary of the LORT recursion path from the root node to the requested index. For each node on the path it shows the local CTA model's key metrics and the endpoint condition that led to the next recursive call.
Usage
lort_path_table(x, index)
Arguments
x |
A |
index |
Integer; target LORT node index. |
Value
Invisibly, the data frame from lort_index_path.
Printed output goes to stdout().
See Also
lort_index_path, lort_local_tree,
plot_lort_path
LORT terminal strata propensity weights
Description
Computes propensity weights from the terminal strata of a fitted LORT
(Locally Optimal Recursive Tree) model. Uses stored
class_counts per terminal node. Implements the Yarnold/Linden
stratum-weight formula (same as cta_propensity_weights):
w_{s,z} = n_s \cdot \Pr(Z=z) / n_{s,z}
Usage
lort_propensity_weights(ort, target_class = NULL, adjusted = TRUE)
Arguments
ort |
A |
target_class |
Integer target class for annotation (optional; if
|
adjusted |
Logical; if |
Details
The fitted model must have been trained with the treatment/exposure/group membership as the class variable, not a clinical outcome. The user is responsible for this labeling decision.
Value
Data frame with one row per (stratum, class) combination.
Columns: stratum_id (integer), path (character),
depth (integer), stratum_n (integer),
terminal_class (integer), class (character),
class_n (integer), target_class (integer),
marginal_class_n (integer), marginal_total_n (integer),
marginal_class_probability (numeric),
propensity_weight (numeric), undefined_empirical
(logical), adjusted (logical),
adjusted_propensity_weight (numeric), model_family
("lort"), global_optimization (FALSE),
sda_anchored (FALSE).
See Also
cta_propensity_weights,
oda_propensity_weights, lort_fit
Myeloma gene-expression dataset (CTA benchmark)
Description
A data frame with 256 observations and 19 variables, formatted for use
with cta_fit and oda_fit. Derived from the
publicly available myeloma gene-expression dataset (GEO accession GSE4581),
as distributed in the survminer package.
Format
A data frame with 256 rows and 19 columns:
- V1
Survival event indicator (0 = censored, 1 = event). Used as the class variable y in CTA/ODA.
- V2
Case weight (observation time in months). Use as w in cta_fit; rows with V2 == 0 should be excluded.
- V3
CCND1 gene expression.
- V4
CRIM1 gene expression.
- V5
DEPDC1 gene expression.
- V6
IRF4 gene expression.
- V7
TP53 expression / mutation burden.
- V8
WHSC1 gene expression.
- V9
Molecular group: Cyclin D-1 (binary).
- V10
Molecular group: Cyclin D-2 (binary).
- V11
Molecular group: Hyperdiploid (binary).
- V12
Molecular group: Low bone disease (binary).
- V13
Molecular group: MAF (binary).
- V14
Molecular group: MMSET (binary).
- V15
Molecular group: Proliferation (binary).
- V16
Chr1q21 status: 2 copies (binary).
- V17
Chr1q21 status: 3 copies (binary).
- V18
Chr1q21 status: 4+ copies (binary).
- V19
Chr1q21 status: NA-coded (binary). Missing values are coded as -9 (miss_codes = -9).
Details
This dataset is used throughout the oda documentation and vignettes to illustrate weighted CTA, MINDENOM constraints, LOO STABLE validation, and missing-code handling. Reference CTA.exe golden outputs for MINDENOM = 1, 30, and 56 are used as regression anchors.
Use miss_codes = -9 and w = myeloma$V2 when calling
cta_fit. With mindenom = 1, the enumerated CTA tree roots
at V14 with a V15 child (OVERALL ESS = 26.32%, WEIGHTED ESS = 27.69%).
With mindenom = 30, the selected tree is a V17 stump
(WEIGHTED ESS = 16.51%). With mindenom = 56, no admissible
tree exists.
Source
Derived from the myeloma dataset in the survminer package.
Original data: NCBI GEO accession GSE4581. No PHI; no institutional data.
See tests/testthat/fixtures/myeloma/README.md in the source tree.
Construct a CTA descendant family object (internal)
Description
Container for MDSA descendant family members built by
new_cta_family_member. Used internally by
cta_descendant_family.
Usage
new_cta_family(
  members = list(),
  mindenoms = integer(0L),
  summary = data.frame(),
  min_d_idx = NA_integer_,
  terminated = FALSE,
  termination_reason = NA_character_
)
Arguments
members |
A list of objects from |
mindenoms |
Integer vector of MINDENOM values, same length as
|
summary |
Data frame summary with one row per member. |
min_d_idx |
Integer index of the feasible member with minimum D;
|
terminated |
Logical; |
termination_reason |
Character: one of |
Value
A list of class cta_family.
Construct a single MDSA descendant family member (internal)
Description
Collects tree-level metadata for one fitted CTA tree in an MDSA chain.
d and overall_ess are read directly from the supplied
tree object; both will be NA for no-tree fits.
Usage
new_cta_family_member(mindenom, tree)
Arguments
mindenom |
Integer MINDENOM used to fit |
tree |
A |
Value
A named list with fields: mindenom, no_tree,
tree, strata, endpoint_denominators,
min_terminal_denom, has_weights, overall_ess,
d, next_mindenom.
Novometric bootstrap CI from a fixed 2x2 confusion matrix
Description
Estimates the precision of an observed binary classification effect by comparing model and chance distributions via permutation/resampling bootstrap. Based on the NOVOboot methodology (Yarnold 2020; Yarnold & Soltysik 2016).
Fixed-confusion bootstrap: This function samples from the observed confusion matrix structure. It does not refit ODA or CTA models and does not estimate model-selection variability. The model distribution is generated by resampling paired (actual, predicted) rows from the expanded confusion table; the chance distribution is generated by independently resampling actual and predicted labels, breaking their association. Novometric significance (Axiom 1) is declared when the 95% confidence intervals for model and chance ESS do not overlap.
Usage
novo_boot_ci(x, ...)
## Default S3 method:
novo_boot_ci(x,
             nboot = 5000L,
             seed = NULL,
             sample_frac = 0.5,
             probs = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             ...)
## S3 method for class 'oda_fit'
novo_boot_ci(x,
             nboot = 5000L,
             seed = NULL,
             sample_frac = 0.5,
             probs = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             ...)
## S3 method for class 'cta_tree'
novo_boot_ci(x,
             nboot = 5000L,
             seed = NULL,
             sample_frac = 0.5,
             probs = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             node_id = NULL,
             weighted = FALSE,
             ...)
## S3 method for class 'cta_ort'
novo_boot_ci(x,
             nboot = 5000L,
             seed = NULL,
             sample_frac = 0.5,
             probs = c(0, .025, .05, .25, .5, .75, .95, .975, 1),
             alternative = c("two.sided", "greater", "less"),
             stratum_id = NULL,
             weighted = FALSE,
             ...)
## S3 method for class 'novo_boot_ci'
print(x, ...)
Arguments
x |
For the |
nboot |
Number of bootstrap replicates. Default 5000. |
seed |
Integer seed passed to |
sample_frac |
Fraction of |
probs |
Quantile probability levels for the summary table. |
alternative |
Direction for exact Fisher p-values:
|
node_id |
Integer node id of a terminal (leaf) node in a
|
stratum_id |
Integer stratum id from |
weighted |
Logical. When |
... |
For the generic and S3 fit methods: additional arguments
passed to |
Details
Model distribution: The input confusion matrix is expanded to
n paired (actual, predicted) observation rows. For each replicate,
k row indices are drawn with replacement, preserving the observed
(actual, predicted) joint distribution. This mirrors the NOVOboot
row-resampling approach.
Chance distribution: Actual and predicted labels are resampled independently for each replicate, breaking any association between them. This generates the null distribution against which the model effect is compared.
p-values: An exact 2x2 Fisher p-value is computed for every replicate confusion matrix for both model and chance distributions. These form precision distributions and complement the CI non-overlap criterion; they are not a substitute for it.
Novometric Axiom 1: A statistically significant effect exists when
the exact discrete confidence intervals for model and chance performance do
not overlap. significant = TRUE indicates the ESS model 95% CI lies
entirely above the ESS chance 95% CI.
ESS formula: ESS(%) = 100 * (mean_PAC - 0.5) / 0.5,
consistent with oda_ess_from_meanpac.
OR: Diagnostic odds ratio (TP * TN) / (FP * FN). NA
when FP = 0 or FN = 0 in a replicate.
RR: Positive predictive value / false omission rate
[TP / (TP+FP)] / [FN / (FN+TN)]. NA when undefined.
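The model and chance resampling schemes described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `ess_pct` and the object names are hypothetical, the confusion matrix is assumed to have actual classes in rows and predicted classes in columns in the same order, and the Fisher p-values and OR/RR metrics are omitted.

```r
# Sketch only: helper and object names are hypothetical, not the package API.
# Assumes rows = actual, columns = predicted, classes in the same order.
conf <- matrix(c(146, 40,
                  36, 33), nrow = 2, byrow = TRUE)

# Expand the table to one (actual, predicted) row per observation.
# expand.grid() and as.vector() both run in column-major order.
cells <- expand.grid(actual = 1:2, predicted = 1:2)
obs   <- cells[rep(seq_len(nrow(cells)), times = as.vector(conf)), ]

# ESS(%) = 100 * (mean_PAC - 0.5) / 0.5 for a two-class table.
ess_pct <- function(actual, predicted) {
  pac <- vapply(1:2, function(cl) mean(predicted[actual == cl] == cl),
                numeric(1))
  100 * (mean(pac) - 0.5) / 0.5
}

set.seed(42)
n     <- nrow(obs)
k     <- round(0.5 * n)   # sample_frac = 0.5
nboot <- 500L

# Model: resample paired rows, preserving the joint distribution.
model_ess <- replicate(nboot, {
  i <- sample.int(n, k, replace = TRUE)
  ess_pct(obs$actual[i], obs$predicted[i])
})

# Chance: resample each label vector independently, breaking the pairing.
chance_ess <- replicate(nboot, {
  ess_pct(sample(obs$actual, k, replace = TRUE),
          sample(obs$predicted, k, replace = TRUE))
})

model_ci  <- quantile(model_ess,  c(0.025, 0.975), na.rm = TRUE)
chance_ci <- quantile(chance_ess, c(0.025, 0.975), na.rm = TRUE)

# Axiom 1 style check: model lower bound above chance upper bound.
significant <- model_ci[[1]] > chance_ci[[2]]
```

Comparing `model_ci` with `chance_ci` shows why the two distributions are kept separate: the chance ESS values centre near zero, while the model values centre near the observed ESS.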
Value
An object of class novo_boot_ci, a list with:
call |
The matched call. |
confusion |
Input confusion matrix (integer, 2x2). |
n |
Total observations (sum(x)). |
k |
Observations sampled per replicate (round(sample_frac * n)). |
nboot, sample_frac, probs, alternative |
Input parameters. |
has_zero_cells |
Logical; TRUE if any cell of x is zero. Does not stop computation; NA propagates for affected metrics in affected replicates. |
observed |
Data frame with one row per metric. Columns: metric, value. Reports the observed (not bootstrapped) sensitivity, specificity, mean_pac, ess, odds_ratio, and risk_ratio computed directly from the input confusion matrix. |
model |
Data frame (nboot rows). Per-replicate model bootstrap distributions: sensitivity, specificity, mean_pac, ess (all in %), odds_ratio, risk_ratio, p_value. NA for undefined OR/RR. |
chance |
Data frame (nboot rows). Same columns as model. Generated by independently resampling actual and predicted labels (null of no classification association). |
quantiles |
Data frame (length(probs) rows). Quantiles of each metric for model and chance across all replicates, including p_value_model and p_value_chance. |
ci |
Data frame (one row per metric). Fixed 95% CI bounds (2.5th and 97.5th percentiles) for model and chance. Columns: metric, model_lower, model_upper, chance_lower, chance_upper, overlap. |
significant |
Logical scalar. TRUE if the ESS model 95% CI lower bound exceeds the ESS chance 95% CI upper bound (novometric Axiom 1 CI non-overlap criterion). |
source_type |
Character. Evidence provenance tag: "matrix", "oda_fit", "cta_tree", "cta_tree_node", "cta_ort", or "cta_ort_stratum". |
source_id |
Integer or NA. Node or stratum id when evidence came from a specific sub-unit; NA for full-tree paths. |
weighted |
Logical or NA. TRUE when weighted class counts were used; FALSE for raw counts; NA for the default matrix path. |
References
Yarnold PR (2020). Reformulating the First Axiom of Novometric Theory: Assessing Minimum Sample Size in Experimental Design. Optimal Data Analysis 9, 7–8.
Yarnold PR, Soltysik RC (2016). Maximizing Predictive Accuracy. ODA Books.
Examples
# Myeloma MINDENOM=1 confusion (actual x predicted, byrow = TRUE)
conf <- matrix(c(146, 40,
36, 33), nrow = 2, byrow = TRUE)
ci <- novo_boot_ci(conf, nboot = 200L, seed = 42L)
ci$significant
print(ci)
Apply primary/secondary tie-breaking to candidates
Description
Apply primary/secondary tie-breaking to candidates
Usage
oda_apply_primary_secondary(cand_df, primary, secondary, y, w, preds_list)
ODA covariate balance evidence-interval table
Description
Builds one row per covariate × analysis scale containing the
observed ESS/WESS, a bootstrap confidence interval (model sampling
variability), and a chance interval (null distribution from group-label
permutation). The resulting table answers whether each covariate's model
confidence interval clears the chance interval.
Usage
oda_balance_effect_table(
group,
X,
w = NULL,
compare_weights = FALSE,
covariate_types = NULL,
nboot = 2000L,
chance_iter = 2000L,
ci = 0.95,
mc_seed = NULL,
mc_iter = 1000L,
...
)
Arguments
group |
Integer (or coercible) binary group indicator with exactly two
distinct non-missing values. Plays |
X |
Data frame of baseline covariate columns ( |
w |
Optional numeric case-weight vector. When supplied, weighted ODA
is used and |
compare_weights |
Logical; when |
covariate_types |
Optional named character vector mapping column names
to ODA attribute types. Unmapped columns use |
nboot |
Integer; number of bootstrap resamples. Default |
chance_iter |
Integer; number of group-label permutations for the
null interval. Default |
ci |
Numeric; nominal coverage for both intervals. Default
|
mc_seed |
Integer RNG seed set once at function entry. Controls all
bootstrap and permutation sampling deterministically. |
mc_iter |
Integer; MC iterations passed to the observed |
... |
Additional arguments forwarded to each |
Details
Three passes are run per covariate:
- Observed: oda_fit(mcarlo = TRUE); point estimate and Monte Carlo p-value.
- Bootstrap: nboot resamples (rows with replacement), mcarlo = FALSE; percentile confidence interval.
- Chance: chance_iter group-label permutations, mcarlo = FALSE; null percentile interval.
When compare_weights = TRUE and w is supplied, both an
"unweighted" and a "weighted" row are produced per covariate.
Multiplicity corrections (Sidak, Bonferroni) are applied within each
analysis scale across covariates.
Interpretation:
- balanced_by_interval = TRUE: model bootstrap CI overlaps the chance interval (boot_lo <= chance_hi); no evidence of residual imbalance for this covariate.
- residual_imbalance = TRUE: model CI clears chance (boot_lo > chance_hi); residual imbalance detected.
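The interval rule above amounts to one comparison per row. The sketch below applies it to a hand-made data frame; the three covariates and all interval bounds are hypothetical values chosen for illustration, not output from the package.

```r
# Hypothetical interval bounds for three covariates (illustrative values only).
rows <- data.frame(
  attribute = c("age", "score", "bmi"),
  boot_lo   = c(22.0, -4.0,  6.0),
  boot_hi   = c(61.0, 18.0, 30.0),
  chance_lo = c(-12.0, -11.0, -13.0),
  chance_hi = c(14.0, 13.0, 12.0)
)

# Residual imbalance: the model CI lower bound clears the chance upper bound.
rows$residual_imbalance   <- rows$boot_lo >  rows$chance_hi
# Balanced by interval: the model CI still overlaps the chance interval.
rows$balanced_by_interval <- rows$boot_lo <= rows$chance_hi

rows[, c("attribute", "balanced_by_interval", "residual_imbalance")]
```

The two flags are complements by construction, so exactly one of them is TRUE for every row with non-missing bounds.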
Value
A list of class "oda_balance_effect_table" with:
rows |
Data frame; one row per covariate × analysis scale. Columns: attribute, analysis, metric, estimate, boot_lo, boot_hi, chance_lo, chance_hi, p_mc, p_sidak, p_bonferroni, rule_summary, sensitivity, specificity, n_total, balanced_by_interval, residual_imbalance. |
meta |
List of metadata: n_covariates, n_obs, has_weights, compare_weights, analyses, nboot, chance_iter, ci, mc_iter, mc_seed. |
References
Linden A, Yarnold PR (2016). Using machine learning to assess covariate balance in matching studies. Journal of Evaluation in Clinical Practice, 22(6), 861-867.
See Also
oda_balance_table, plot_oda_balance_effects
Examples
set.seed(1)
group <- c(rep(0L, 30), rep(1L, 30))
X <- data.frame(
age = c(rnorm(30, 45, 8), rnorm(30, 55, 8)),
score = rnorm(60, 50, 10)
)
et <- oda_balance_effect_table(group, X,
nboot = 50L, chance_iter = 50L,
mc_iter = 200L, mc_seed = 1L)
et$rows[, c("attribute", "estimate", "boot_lo", "boot_hi",
"chance_lo", "chance_hi", "balanced_by_interval")]
Renderer-ready plot data for univariate ODA covariate balance
Description
Transforms an oda_balance_table result (and optionally an
smd_balance_table result) into a renderer-independent data
structure suitable for Graphics v3 plotting.
Usage
oda_balance_plot_data(
balance_table,
smd_table = NULL,
p_col = c("p_mc", "p_sidak", "p_bonferroni"),
rank_by = c("abs_ess", "p", "abs_smd")
)
Arguments
balance_table |
An |
smd_table |
Optional |
p_col |
Character; which p-value column to use for the |
rank_by |
Character; how to rank covariates for display order.
|
Details
This function does not fit any ODA models and does not accept
group or X arguments. It is a pure transformation of
pre-computed balance tables.
Value
A list of class "oda_balance_plot_data" with elements:
rows |
Data frame; one row per covariate, sorted by rank_by. Columns: attribute, attr_type, ess_display, ess_display_bar (clipped to [0, 100]), p_plot (selected p column), significant, significance_label ("*" or ""), rule_summary, abs_smd, wsmd_available, abs_smd_display (weighted if active), fit_ok, rank. |
has_weights |
Logical. |
ess_label |
Character; "WESS" or "ESS". |
p_col_used |
Character; selected p column name. |
alpha |
Numeric; significance threshold from metadata. |
n_covariates |
Integer. |
n_significant |
Integer; covariates significant on p_col_used. |
rank_by |
Character. |
See Also
oda_balance_table, smd_balance_table
Examples
set.seed(1)
group <- c(rep(0L, 30), rep(1L, 30))
X <- data.frame(age = c(rnorm(30, 45, 8), rnorm(30, 55, 8)),
score = rnorm(60, 50, 10))
bt <- oda_balance_table(group, X, mcarlo = TRUE, mc_iter = 500L)
smd <- smd_balance_table(group, X)
pd <- oda_balance_plot_data(bt, smd_table = smd)
pd$rows[, c("attribute", "ess_display", "p_plot", "significant", "abs_smd")]
Univariate ODA covariate balance diagnostics
Description
Fits a univariate ODA model for each covariate in X with
group as the class variable. Returns one row per covariate
summarizing ODA-based balance diagnostics: rule, sensitivity, specificity,
Mean PAC, ESS/WESS, and permutation p-value with Sidak and Bonferroni
multiplicity corrections.
Usage
oda_balance_table(
group,
X,
w = NULL,
covariate_types = NULL,
loo = "off",
mcarlo = TRUE,
mc_iter = 1000L,
alpha = 0.05,
adjust = c("none", "sidak", "bonferroni"),
...
)
Arguments
group |
Integer (or coercible) binary group indicator. Must have
exactly two distinct non-missing values. Plays the role of the class
variable ( |
X |
Data frame of baseline covariate columns. Plays the role of
attributes ( |
w |
Optional numeric case-weight vector (length |
covariate_types |
Optional named character vector mapping column names
to ODA attribute types ( |
loo |
LOO gate mode passed to each |
mcarlo |
Logical; run Monte Carlo permutation p-value? Default
|
mc_iter |
Integer; maximum MC iterations per covariate. Default
|
alpha |
Numeric significance threshold for the |
adjust |
Character; which p-value drives the primary |
... |
Additional arguments forwarded to each |
Details
Balance asks whether group membership (treatment, exposure, or study arm) can be predicted from observed baseline covariates. When no covariate predicts group membership above chance, the groups are considered balanced on those covariates under the declared analytic constraints.
group vs. outcome: group is the binary class variable in
every ODA call. The scientific outcome of interest is strictly out of
scope; do not pass the outcome as group or as a column of X.
SMD: conventional standardized mean difference is a companion
diagnostic, not the ODA balance objective. Use
smd_balance_table for the conventional companion table.
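The Sidak and Bonferroni corrections reported alongside p_mc can be illustrated with the textbook formulas. This is a sketch under assumptions: the p-values are hypothetical, the package's exact rounding and its handling of missing p-values are not reproduced, and whether significance uses a strict or non-strict comparison against alpha is assumed here.

```r
# Hypothetical Monte Carlo p-values for three covariates.
p_mc <- c(age = 0.004, score = 0.410, sex = 0.030)

# Number of covariates with a valid p-value (cf. k_valid in the metadata).
k_valid <- sum(!is.na(p_mc))

# Textbook adjustments across the k_valid tests.
p_sidak      <- 1 - (1 - p_mc)^k_valid
p_bonferroni <- pmin(1, p_mc * k_valid)

alpha <- 0.05
data.frame(
  p_mc, p_sidak, p_bonferroni,
  significant_raw        = p_mc         < alpha,  # strict "<" is an assumption
  significant_sidak      = p_sidak      < alpha,
  significant_bonferroni = p_bonferroni < alpha
)
```

In this made-up example the third covariate is significant on the raw p-value but not after either correction, which is the situation the `adjust` argument is meant to arbitrate.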
Value
A list of class "oda_balance_table" with elements:
rows |
Data frame; one row per covariate. Key columns: attribute, attr_type, n_total, n_group_0, n_group_1, sensitivity, specificity, mean_pac, ess, wess, ess_display (operative measure), p_mc, p_sidak, p_bonferroni, significant_raw, significant_sidak, significant_bonferroni, significant (driven by adjust), rule_type, rule_summary, loo_status, ess_loo, has_weights, fit_ok, fit_reason. |
meta |
List of metadata: n_covariates, n_obs, has_weights, ess_label, alpha, adjust, k_valid (number of covariates with valid p_mc used for multiplicity correction), loo_mode, mcarlo, mc_iter. |
References
Linden A, Yarnold PR (2016). Using machine learning to assess covariate balance in matching studies. Journal of Evaluation in Clinical Practice, 22(6), 861-867.
See Also
smd_balance_table, oda_balance_plot_data,
oda_fit
Examples
set.seed(1)
n <- 60
group <- c(rep(0L, 30), rep(1L, 30))
X <- data.frame(
age = c(rnorm(30, 45, 8), rnorm(30, 55, 8)), # imbalanced
score = rnorm(60, 50, 10) # balanced
)
bt <- oda_balance_table(group, X, mcarlo = TRUE, mc_iter = 500L)
bt$rows[, c("attribute", "ess_display", "p_mc", "significant_raw")]
Select the best K-segment ordered partition by MegaODA spec: PRIMARY -> SECONDARY -> FIRST IDENTIFIED (enum order via tick()).
Description
Select the best K-segment ordered partition by MegaODA spec: PRIMARY -> SECONDARY -> FIRST IDENTIFIED (enum order via tick()).
Usage
oda_best_ordered_multiclass_partition(
x_rep,
counts_obj,
counts_raw,
K,
priors_on_eff = TRUE,
degen = FALSE,
primary = NULL,
secondary = NULL,
cut_value_mode = c("midpoint", "lower", "upper"),
debug_return_ties = FALSE,
debug_max_ties = 200L,
direction = "off"
)
Arguments
x_rep |
Representative x value per unique block. |
counts_obj |
m x C count matrix in objective (priors-weighted) space. |
counts_raw |
m x C count matrix in raw (case-weight) space. |
K |
Number of segments (cuts = K-1). |
priors_on_eff |
Logical. |
degen |
Allow degenerate solutions? |
primary, secondary |
Heuristic strings (NULL = spec defaults). |
cut_value_mode |
"midpoint", "lower", or "upper". |
debug_return_ties |
Return all primary-tied candidates for diagnostics. |
debug_max_ties |
Cap on number of ties stored. |
direction |
Directional constraint (MPE Chapter 4 ordered DIRECTIONAL). "ascending" forces segment s to map to class s; "descending" forces class C+1-s. Default "off" (nondirectional; all assignments evaluated). |
Value
List with ok, cuts_idx, cut_values, seg_cls_idx, primary_obj, secondary_obj, best_enum_id, ties, classes.
Replace missing-code values with NA
Description
Replaces all values in miss_codes with replacement (default
NA). Accepts a numeric vector or a data frame. Does not modify
the class variable or weight vector; pass those separately if needed.
Usage
oda_clean_missing_codes(X, miss_codes, replacement = NA)
Arguments
X |
Numeric vector or data frame of predictors. |
miss_codes |
Numeric vector of values to treat as missing (e.g.
|
replacement |
Replacement value (default |
Value
Object of the same class and dimensions as X with
miss_codes values replaced.
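The replacement behaviour described here can be approximated in a few lines of base R. The function `clean_codes` below is a hypothetical stand-in written for illustration; it is not the package's implementation and does not reproduce any input validation the package may perform.

```r
# Hypothetical stand-in for the documented behaviour; not the package code.
clean_codes <- function(X, miss_codes, replacement = NA) {
  swap <- function(v) {
    v[v %in% miss_codes] <- replacement
    v
  }
  if (is.data.frame(X)) {
    X[] <- lapply(X, swap)   # X[] keeps the data-frame class and dimensions
    X
  } else {
    swap(X)
  }
}

# Numeric vector: -9 and 999 are treated as missing codes.
clean_codes(c(1, -9, 3, 999), miss_codes = c(-9, 999))

# Data frame: every column is scanned for the same codes.
df <- data.frame(a = c(1, -9), b = c(999, 2))
clean_codes(df, miss_codes = c(-9, 999))
```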
Retrieve a confusion matrix from a fitted ODA model
Description
Retrieve a confusion matrix from a fitted ODA model
Usage
oda_confusion(fit, split = c("train", "loo"), weighted = FALSE)
Arguments
fit |
An |
split |
One of |
weighted |
Logical; if |
Value
The confusion object stored on the fit, or NULL.
Binary confusion table
Description
Compute a weighted binary confusion table from actual and predicted labels.
Usage
oda_confusion_binary(y, y_pred, w = NULL)
Arguments
y |
Actual class labels (0/1 integer). |
y_pred |
Predicted class labels (0/1 integer). |
w |
Optional numeric weights. Default: unit weights. |
Value
Named list with integer count fields TP, TN, FP,
FN (weighted sums), and rate fields sensitivity,
specificity (proportions in [0, 1]), and mean_pac
(proportion in [0, 1]).
Note: With unit weights these are raw integer counts. With
prior-odds weights (from oda_univariate_core with
priors_on = TRUE) they are weighted counts, not raw integers.
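The distinction between raw and weighted counts in the note can be seen in a small base R sketch. The function `confusion_binary` is a hypothetical re-implementation for illustration only, and the weights used in the second call are assumed to be inverse class frequencies; the package's actual prior-odds weights may be scaled differently.

```r
# Hypothetical re-implementation for illustration; not the package code.
confusion_binary <- function(y, y_pred, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))   # unit weights by default
  TP <- sum(w[y == 1 & y_pred == 1])
  TN <- sum(w[y == 0 & y_pred == 0])
  FP <- sum(w[y == 0 & y_pred == 1])
  FN <- sum(w[y == 1 & y_pred == 0])
  sens <- TP / (TP + FN)
  spec <- TN / (TN + FP)
  list(TP = TP, TN = TN, FP = FP, FN = FN,
       sensitivity = sens, specificity = spec,
       mean_pac = (sens + spec) / 2)
}

y      <- c(0, 0, 0, 0, 0, 0, 1, 1)
y_pred <- c(0, 0, 0, 0, 1, 1, 1, 0)

# Unit weights: the cells are raw integer counts.
u <- confusion_binary(y, y_pred)

# Assumed inverse-class-frequency weights: cells become fractional,
# but sensitivity and specificity are unchanged.
w  <- ifelse(y == 1, 1 / sum(y == 1), 1 / sum(y == 0))
wt <- confusion_binary(y, y_pred, w)
```

Because these weights are constant within each class, they rescale the cells of each row without changing the row-wise rates.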
Multiclass confusion matrix
Description
Compute a weighted multiclass confusion matrix with PAC and PV summaries.
Usage
oda_confusion_multiclass(y, y_pred, w = NULL)
Arguments
y |
Actual integer class labels. |
y_pred |
Predicted integer class labels. |
w |
Optional numeric weights. Default: unit weights. |
Value
Named list:
confusion |
C x C numeric matrix of weighted counts. With unit weights these are raw integer observation counts. Rows are actual classes; columns are predicted classes. |
correct |
Total weighted count of correct classifications. |
overall_acc |
Overall accuracy as a proportion [0, 1]. |
pac_by_class |
Per-class sensitivity as proportions [0, 1]. |
mean_pac |
Mean sensitivity across classes, proportion [0, 1]. |
pv_by_class |
Per-class predictive value, proportions [0, 1]. |
mean_pv |
Mean predictive value, proportion [0, 1]. |
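The PAC and PV summaries are row-wise and column-wise ratios of the same matrix. The sketch below builds a weighted confusion matrix by hand; `confusion_multiclass` is a hypothetical function written for illustration and is not the package's code.

```r
# Hypothetical re-implementation for illustration; not the package code.
confusion_multiclass <- function(y, y_pred, w = NULL) {
  if (is.null(w)) w <- rep(1, length(y))
  classes <- sort(unique(c(y, y_pred)))
  conf <- matrix(0, length(classes), length(classes),
                 dimnames = list(actual    = as.character(classes),
                                 predicted = as.character(classes)))
  # Accumulate each observation's weight into its (actual, predicted) cell.
  for (i in seq_along(y)) {
    r <- match(y[i], classes)
    p <- match(y_pred[i], classes)
    conf[r, p] <- conf[r, p] + w[i]
  }
  pac <- diag(conf) / rowSums(conf)   # per-class sensitivity
  pv  <- diag(conf) / colSums(conf)   # per-class predictive value
  list(confusion = conf,
       correct = sum(diag(conf)),
       overall_acc = sum(diag(conf)) / sum(conf),
       pac_by_class = pac, mean_pac = mean(pac),
       pv_by_class = pv, mean_pv = mean(pv))
}

y      <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
y_pred <- c(1, 1, 2, 2, 2, 3, 3, 3, 3)
cm <- confusion_multiclass(y, y_pred)
cm$confusion
```

A class that is never predicted has a zero column sum, so its predictive value is undefined (NaN) in this sketch; how the package reports that case is not specified here.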
Fit a Classification Tree Analysis (CTA) model (internal engine)
Description
Internal CTA engine name retained for backward compatibility.
Users should prefer cta_fit() as the public entry point.
Builds a classification tree by recursively applying ODA at each node. At each split, all attributes are evaluated and the attribute with the highest ESS passing the significance threshold is selected. Matches MegaODA CTA behaviour including MINDENOM, PRUNE, ENUMERATE, LOO STABLE, and WEIGHT parameters.
Usage
oda_cta_fit(X, y, w = NULL, priors_on = TRUE, miss_codes = NULL,
alpha_split = 0.05, mindenom = 5L, prune_alpha = 1.0,
max_depth = 10L, ess_min = 0,
mc_iter = 25000L, mc_target = 0.05, mc_stop = 99.9, mc_stopup = NULL,
mc_seed = NULL, loo = "off", attr_names = NULL, K_segments = NULL,
verbose = FALSE, diag_env = NULL)
Arguments
X |
Data frame or matrix of attribute columns. |
y |
Class variable vector. |
w |
Optional numeric case weights (MegaODA WEIGHT). Same length as y. |
priors_on |
Use prior-odds weighting at each node. Default TRUE. |
miss_codes |
Numeric vector of missing-value codes (MegaODA MISSING). |
alpha_split |
Significance threshold to split a node (MegaODA MC CUTOFF). Default 0.05. |
mindenom |
Minimum weighted node size to attempt a split (MegaODA MINDENOM). Default 5. |
prune_alpha |
Branches with p >= prune_alpha are not grown (MegaODA PRUNE). Default 1.0 = no pruning (unpruned tree). |
max_depth |
Maximum tree depth. Default 10. |
ess_min |
Minimum ESS required to split. Default 0. |
mc_iter |
Maximum MC iterations per node. Default 25000. |
mc_target, mc_stop, mc_stopup |
MC stopping parameters. |
mc_seed |
Base RNG seed; each node uses mc_seed + node_id * 1000 + attr_j. |
loo |
LOO mode per node: |
attr_names |
Attribute names. Defaults to column names of X. |
K_segments |
Segments for multiclass ordered splits. Default = C. |
verbose |
Logical; if |
diag_env |
Internal diagnostic environment used to collect CTA timing
and Monte Carlo instrumentation. Intended for development diagnostics only;
leave as |
Value
An object of class cta_tree containing:
nodes |
Named list of node objects, each with fields: node_id, parent_id, depth, n_obs, n_weighted, attribute, rule, ess, p_mc, loo_status, loo_ess, confusion, child_ids, split_labels, majority_class, leaf. |
root_id |
Integer ID of the root node. |
n_nodes |
Total number of nodes grown. |
Use predict.cta_tree to classify new data and
cta_node_table to extract the node summary table.
See Also
predict.cta_tree, cta_node_table
Examples
## Binary CTA on mtcars
data(mtcars)
mt <- mtcars
X <- mt[, c("cyl","disp","hp","wt")]
y <- as.integer(mt$am)
tree <- oda_cta_fit(X, y, alpha_split = 0.05, mindenom = 5L,
mc_iter = 500L, mc_seed = 42L)
print(tree)
preds <- predict(tree, X)
mean(preds == y) # training accuracy
Compute the D statistic for a fitted ODA model
Description
D measures the distance between a model's classification accuracy (ESS) and
chance, expressed relative to the number of terminal prediction strata.
Formula: D = \frac{100}{ESS / strata} - strata, where strata
counts terminal prediction endpoints only.
Usage
oda_d_stat(fit)
Arguments
fit |
An |
Details
Supported rule types and strata definitions:
- Binary (oda_fit_binary): strata = 2, ESS = fit$ess.
- Multiclass ordered (multiclass_ordered rule): strata = length(fit$rule$seg_classes), ESS = fit$ess.
- Multiclass nominal/categorical: returns NA_real_ (strata count is ambiguous without additional canon specification).
- Failed fit (ok = FALSE): returns NA_real_.
Value
A scalar numeric D value, or NA_real_ when the fit
failed or the rule type does not have an unambiguous strata count.
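The D formula given in the Description is easy to check by hand. The helper `d_stat` below is a hypothetical one-liner that applies the formula to an ESS value (in percent) and a strata count; it does not inspect a fitted model object the way oda_d_stat does.

```r
# Hypothetical helper applying the documented formula directly:
# D = 100 / (ESS / strata) - strata
d_stat <- function(ess, strata) 100 / (ess / strata) - strata

d_stat(ess = 50,  strata = 2)   # 100 / 25 - 2
d_stat(ess = 100, strata = 2)   # perfect binary model: 100 / 50 - 2
d_stat(ess = 60,  strata = 3)   # 100 / 20 - 3
```

D shrinks toward zero as ESS approaches 100, and grows without bound as ESS approaches zero.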
Enforce weighting policy for multiclass ODA
Description
Enforce weighting policy for multiclass ODA
Usage
oda_enforce_weighting_policy(
attr_type_res,
priors_on,
loo,
w_case,
reason_prefix = ""
)
ESS from mean metric for a C-class problem
Description
Compute Effect Strength for Sensitivity from mean PAC or mean PV for a problem with C classes.
Usage
oda_ess_from_mean(mean_metric, C)
Arguments
mean_metric |
Mean PAC or mean PV as a proportion [0, 1]. |
C |
Number of classes. |
Value
ESS as a percentage [0, 100]. Chance baseline is 1/C.
Effect Strength for Sensitivity from mean PAC
Description
Compute ESS (Effect Strength for Sensitivity) in percent, scaled against a chance baseline.
Usage
oda_ess_from_meanpac(mean_pac, chance)
Arguments
mean_pac |
Mean PAC as a proportion [0, 1]. |
chance |
Chance baseline as a proportion (e.g. 0.5 for 2-class). |
Value
ESS as a percentage [0, 100].
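Both ESS helpers rescale a mean accuracy so that chance maps to 0 and perfect classification maps to 100. The sketch below assumes the general form 100 * (mean - chance) / (1 - chance), which reduces to the binary formula 100 * (mean_PAC - 0.5) / 0.5 stated elsewhere in this manual; the function names are hypothetical and the package's own code may differ in detail.

```r
# Assumed general form; reduces to 100 * (mean_pac - 0.5) / 0.5 when chance = 0.5.
ess_from_meanpac <- function(mean_pac, chance) {
  100 * (mean_pac - chance) / (1 - chance)
}

# C-class version: chance baseline is 1 / C.
ess_from_mean <- function(mean_metric, C) {
  ess_from_meanpac(mean_metric, chance = 1 / C)
}

ess_from_meanpac(0.75, chance = 0.5)   # binary: halfway between chance and perfect
ess_from_mean(2 / 3, C = 3)            # three classes: also halfway
ess_from_mean(1, C = 4)                # perfect classification
```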
Fit an ODA model
Description
Unified entry point for Optimal Data Analysis. Dispatches to the binary-class engine when the outcome has exactly two distinct values, or the multiclass engine for three or more classes. This is the function CTA nodes call at each split candidate.
Usage
oda_fit(x, y, w = NULL,
attr_type = c("auto","ordered","categorical","binary"),
priors_on = TRUE, K_segments = NULL, degen = FALSE,
miss_codes = NULL, missing_code = NULL,
mcarlo = TRUE, mc_iter = 25000L, mc_target = 0.05,
mc_stop = 99.9, mc_stopup = NA, mc_seed = NULL,
loo = "off",
boundary_mode = c("megaoda_halfopen","right_closed"),
eval_order = c("mc_then_loo","loo_then_mc"),
mindenom = 1L,
direction = c("both","off","greater","less","ascending","descending"),
direction_map = NULL)
Arguments
x |
Attribute values (numeric, factor, character, or logical). |
y |
Class labels; must have at least two distinct values (two for the binary engine, three or more for the multiclass engine). |
w |
Optional numeric case weights. Default: unit weights. These are
economic or importance weights, distinct from prior-odds weighting which
is controlled by |
attr_type |
Attribute measurement type: |
priors_on |
Logical; if |
K_segments |
Number of segments for multiclass ordered models.
Default equals the number of classes |
degen |
Logical; if |
miss_codes |
Numeric vector of values to treat as missing (excluded from analysis). |
missing_code |
Scalar alias for |
mcarlo |
Logical; run Monte Carlo Fisher-randomization p-value?
Default |
mc_iter |
Maximum Monte Carlo iterations. Default 25000. |
mc_target |
Significance threshold for STOP early stopping. Default 0.05. |
mc_stop |
Confidence level (percent) for lower-tail STOP. Default 99.9. |
mc_stopup |
Confidence level (percent) for upper-tail STOPUP. Default NA (disabled; matches MegaODA behavior). |
mc_seed |
Optional integer RNG seed for reproducibility. |
loo |
LOO mode. |
boundary_mode |
Boundary convention for multiclass ordered rules.
Default |
eval_order |
Controls whether Monte Carlo testing is run before LOO
validation or whether eligible ordered-cut LOO stability is checked
before Monte Carlo. The default |
mindenom |
Minimum raw observation count required in each child node for a candidate cut to be evaluated. Default 1 (no enforcement). |
direction |
Directional hypothesis control.
|
direction_map |
Named integer vector for categorical fixed-partition
DIRECTIONAL (MPE Chapter 4). Names are attribute levels (character); values
are predicted class labels. All attribute levels must be covered exactly once
with at least two distinct target classes. When supplied, ODA evaluates only
the specified mapping and skips the partition search. For binary class, values
should be the original class labels (recoded to 0/1 internally). For
multiclass, values should be 1..C class labels. Compatible with
|
Value
A named list with components:
ok |
Logical; TRUE if a valid model was found. |
reason |
Character reason string if ok = FALSE. |
rule |
The fitted rule (list; structure depends on attr_type and engine). |
n_eff |
Number of observations used (after missing removal). |
ess |
Effect Strength for Sensitivity (percent), scaled 0–100. |
pac |
Percentage Accuracy in Classification (training). |
p_mc |
Monte Carlo p-value, or NA if mcarlo = FALSE. |
loo |
LOO results list, or NULL if loo = "off". |
engine |
Character; "binary" or "multiclass". |
confusion |
Confusion table. For the binary engine this is a list with integer counts TP, TN, FP, FN plus sensitivity and specificity as proportions [0, 1]. For the multiclass engine this is a numeric matrix of (possibly weighted) counts. |
Examples
## Binary (C = 2)
x <- c(1,2,3,4,5,6,7,8)
y <- c(0L,0L,0L,0L,1L,1L,1L,1L)
fit <- oda_fit(x, y, mcarlo = FALSE)
fit$ok
fit$rule$cut_value
## Multiclass (C = 3)
x3 <- c(1,2,3,4,5,6,7,8,9)
y3 <- c(1L,1L,1L,2L,2L,2L,3L,3L,3L)
fit3 <- oda_fit(x3, y3, mcarlo = FALSE)
fit3$rule$cut_values
fit3$rule$seg_classes
Infer attribute types from a predictor data frame
Description
Uses the same type-inference logic as oda_fit() (“auto”
mode) to report the likely ODA attribute type for each column.
Usage
oda_infer_attr_types(X, miss_codes = NULL)
Arguments
X |
Data frame of predictors. |
miss_codes |
Numeric vector of missing-code values to exclude when
counting unique levels (default |
Value
Data frame with one row per column in X:
attribute (character), inferred_type (one of
"ordered", "categorical", "binary"),
n_unique (integer, excluding miss_codes and NA),
n_missing (integer, NA count),
n_miss_code (integer, miss_code hit count).
See Also
oda_fit, oda_clean_missing_codes
Leave-one-out cross-validation for ordered multiclass ODA.
Description
Leave-one-out cross-validation for ordered multiclass ODA.
Usage
oda_loo_multiclass_ordered(
x,
y,
w0,
priors_on_eff,
degen,
K_segments,
miss_codes = NULL,
cut_value_mode = c("midpoint", "lower", "upper"),
grid_mode = c("fixed", "refit"),
boundary_mode = c("megaoda_halfopen", "right_closed"),
loo_use_samplerep = FALSE,
loo_return_folds = FALSE,
loo_priors_mode = c("fold", "global")
)
Arguments
x, y |
Attribute and class vectors. |
w0 |
Raw case weights. |
priors_on_eff |
Logical. |
degen |
Logical. |
K_segments |
Number of segments. |
miss_codes |
Optional missing codes. |
cut_value_mode |
"midpoint","lower","upper". |
grid_mode |
"refit" (true per-fold rebuild) or "fixed" (global grid). |
boundary_mode |
"megaoda_halfopen" or "right_closed". |
loo_use_samplerep |
Include samplerep in fold selection. |
loo_return_folds |
Return per-fold rules and debug info. |
loo_priors_mode |
"fold" (renorm each fold) or "global" (global wts). |
Value
List with allowed, confusion_raw, confusion_weighted, y_pred, and optional fold_rule, fold_debug, fold_best_enum_id.
Monte Carlo Fisher-randomization p-value with Clopper-Pearson early stopping.
Description
Monte Carlo Fisher-randomization p-value with Clopper-Pearson early stopping.
Usage
oda_mc_p_value(
x,
y,
w = NULL,
attr_type,
priors_on,
primary,
secondary,
miss_codes = NULL,
chance_model = c("class", "attribute"),
mc_iter = 25000L,
mc_target = 0.05,
mc_stop = 99.9,
mc_stopup = NA,
mc_adjust = FALSE,
seed = NULL,
ess_obs = NULL,
direction = c("both", "off", "greater", "less"),
direction_map = NULL
)
Arguments
x, y, w |
Data for the current attribute (already cleaned). |
attr_type |
"ordered", "categorical", or "binary". |
priors_on |
Logical. |
primary, secondary |
Tie-break heuristic strings. |
miss_codes |
Optional numeric vector of additional missing codes. |
chance_model |
"class" (1/2) or "attribute" (1/k_attr). |
mc_iter |
Maximum iterations. |
mc_target |
Significance threshold (e.g. 0.05). |
mc_stop |
Confidence level for lower-tail stop (e.g. 99.9). |
mc_stopup |
Confidence level for upper-tail stop (e.g. 20 -> 0.20). Default NA (disabled). |
mc_adjust |
Kept for API compatibility; not used. |
seed |
Optional RNG seed. |
ess_obs |
Observed ESS (must be supplied). |
direction |
Directional constraint forwarded from oda_univariate_core(): "both" (canonical non-directional default), "off" (synonym for "both"), "greater", or "less". Each permutation refit uses the same constraint. |
direction_map |
Named integer vector for categorical fixed-partition DIRECTIONAL. When supplied, each permutation evaluates the SAME fixed mapping on permuted y labels. Default NULL. |
Value
List with p_mc, ge_count, iter_used, ess_obs.
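The idea of a permutation p-value with Clopper-Pearson early stopping can be sketched for the simplest case: a binary class and an ordered attribute. Everything below is an assumption-laden illustration rather than the package's algorithm: `best_ess` and `mc_p` are hypothetical names, the model search is a plain scan over midpoint cuts with no priors weighting or tie-breaking, the bound is the two-sided interval from `binom.test`, the check runs every 100 iterations, and only the lower-tail stop is implemented.

```r
# Best ESS over all midpoint cuts, either direction (unit weights, no priors).
best_ess <- function(x, y) {
  ux   <- sort(unique(x))
  cuts <- (head(ux, -1) + tail(ux, -1)) / 2
  ess  <- vapply(cuts, function(ct) {
    pred <- as.integer(x > ct)
    sens <- mean(pred[y == 1] == 1)
    spec <- mean(pred[y == 0] == 0)
    abs(100 * (sens + spec - 1))   # abs() covers the reversed rule
  }, numeric(1))
  max(ess)
}

mc_p <- function(x, y, mc_iter = 2000L, mc_target = 0.05,
                 mc_stop = 99.9, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ess_obs <- best_ess(x, y)
  ge <- 0L
  for (i in seq_len(mc_iter)) {
    # Permute class labels and refit; count refits at least as strong.
    ge <- ge + (best_ess(x, sample(y)) >= ess_obs)
    if (i %% 100L == 0L) {
      # Clopper-Pearson upper bound on the p-value at mc_stop% confidence.
      upper <- binom.test(ge, i, conf.level = mc_stop / 100)$conf.int[2]
      if (upper < mc_target) break   # confidently below the threshold
    }
  }
  list(p_mc = ge / i, ge_count = ge, iter_used = i, ess_obs = ess_obs)
}

x <- 1:20
y <- rep(0:1, each = 10)   # perfectly separable toy data
res <- mc_p(x, y, seed = 1L)
res$iter_used < 2000L      # stops well before the iteration cap
```

On strongly separated data the loop ends after a few hundred permutations, which is the point of the stopping rule: iterations are spent only where the decision relative to mc_target is still uncertain.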
Monte Carlo p-value for multiclass ODA
Description
Monte Carlo p-value for multiclass ODA
Usage
oda_mc_p_value_multiclass(
x,
y,
w,
attr_type,
priors_on,
degen,
K_segments,
mc_iter = 25000L,
mc_target = 0.05,
mc_stop = 99.9,
mc_stopup = NA,
mc_adjust = FALSE,
seed = NULL,
observed_mean_pac,
direction = "off",
direction_map = NULL
)
Mean PAC from sensitivity and specificity
Description
Compute mean Percentage Accuracy in Classification.
Usage
oda_mean_pac(sens, spec)
Arguments
sens |
Sensitivity (proportion [0, 1]). |
spec |
Specificity (proportion [0, 1]). |
Value
Mean PAC as a proportion [0, 1].
Retrieve scalar performance metrics from a fitted ODA model
Description
Returns a list of scalar metrics present on the fit. No quantities are
recomputed; absent fields appear as NA_real_. LOO p-value uses 2x2
Fisher exact when stored and available (p_status = "computed"); if
the value is absent or NA the status is "not_computed" with an
explicit reason. Multiclass/polychotomous LOO always returns
p_status = "not_computed".
Usage
oda_metrics(fit, split = c("train", "loo"))
Arguments
fit |
An |
split |
One of |
Value
Named list of scalar metrics.
Compute PAC, SAMPLEREP, and other metrics for a partition
Description
Compute PAC, SAMPLEREP, and other metrics for a partition
Usage
oda_metrics_candidate(
NP_obj,
NP_raw,
NA_raw,
x_rep,
cuts_idx,
priors_on_eff = TRUE
)
Fit a univariate multiclass ODA model
Description
Low-level engine for multiclass (C >= 3) Optimal Data Analysis. Handles
ordered and categorical attributes. Most users should call
oda_fit instead.
Usage
oda_multiclass_unioda_core(x, y, w = NULL,
attr_type = c("auto","ordered","categorical","binary"),
priors_on = TRUE, miss_codes = NULL, missing_code = NULL,
K_segments = NULL, degen = FALSE,
mcarlo = TRUE, mc_iter = 25000L, mc_target = 0.05,
mc_stop = 99.9, mc_stopup = NA, mc_adjust = FALSE, mc_seed = NULL,
loo = c("off","on"),
boundary_mode = c("megaoda_halfopen","right_closed"),
loo_opts = list(),
direction = "off", direction_map = NULL)
Arguments
x |
Attribute values (numeric or factor). |
y |
Integer class labels (will be re-coded to 1..C internally). |
w |
Optional numeric case weights. |
attr_type |
Attribute type. |
priors_on |
Inverse-frequency weighting. |
miss_codes |
Additional missing codes (scalar or vector). |
missing_code |
Alias for |
K_segments |
Number of segments; default = C. |
degen |
Allow degenerate solutions? |
mcarlo |
Run Monte Carlo p-value? |
mc_iter, mc_target, mc_stop, mc_stopup, mc_adjust, mc_seed |
MC parameters. |
loo |
|
boundary_mode |
Boundary convention for ordered cut values. |
loo_opts |
Named list of LOO options passed to the LOO engine. |
direction |
Directional constraint (MPE Chapter 4). |
direction_map |
Named integer vector for categorical fixed-partition
DIRECTIONAL. Names are attribute levels; values are class labels 1..C.
When supplied, bypasses the partition search and evaluates only the
specified mapping. Default |
Value
Named list. Key fields: ok, rule (with cut_values
and seg_classes), confusion (weighted count matrix),
pac, mean_pac, ess_pac, p_mc, loo,
n_eff.
Note on confusion matrix: confusion contains weighted
counts (priors-adjusted when priors_on = TRUE). For raw integer
counts use loo$confusion_raw.
See Also
ODA power analysis via simulation
Description
Estimates planning power for unit-weighted binary 2×2 ODA-equivalent
designs. The implemented design assumes fixed group sizes n1/n2
and binomial outcome probabilities p1/p2, then evaluates whether
the resulting 2×2 table is significant by Fisher's exact test at the
(optionally Sidak-adjusted) alpha.
Usage
oda_power(
n1,
n2 = n1,
p1 = NULL,
p2 = NULL,
ess = NULL,
alpha = 0.05,
comp = 1L,
nsim = 10000L,
mc_seed = NULL
)
Arguments
n1 |
Integer (or integer vector) giving the per-group sample size for class 0. When a vector is supplied, power is estimated at each element. |
n2 |
Integer (or integer vector) giving the per-group sample size for
class 1. Defaults to |
p1 |
Probability of the event in class 0. Ignored when |
p2 |
Probability of the event in class 1. Ignored when |
ess |
Effect Strength for Sensitivity (percent, |
alpha |
Nominal significance level. Default 0.05. May be a vector to evaluate power at multiple alpha levels simultaneously. |
comp |
Number of comparisons for Sidak multiple-comparison correction. Default 1 (no correction). Must be a single positive integer. |
nsim |
Number of Monte Carlo replications per cell. Default 10000. |
mc_seed |
Integer seed passed to |
Details
This is the binary lowest-measurement planning case discussed by Rhodes (2020). See also Yarnold and Soltysik (2005) for the underlying ODA/Fisher isomorphism. Scope: unit-weighted, binary class, binary (2-level) attribute only. This is not a general CTA, LORT, SDA, weighted, or multiclass power method.
Method: For each Monte Carlo replicate, binomial draws are generated
under (p1, p2) with fixed group sizes (n1, n2).
The resulting 2×2 table is tested by Fisher's exact test; power is the
proportion of replicates in which the null is rejected. The prospective
sampling treats group sizes as fixed and outcomes as binomial within each
group; the Fisher test is then applied to the generated table with its
realized marginals. This is the standard simulation-based power approach
for 2×2 contingency analyses.
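The Monte Carlo procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `sim_power_2x2` is a hypothetical helper, and the rejection rule `p <= alpha` is an assumption about how ties at the threshold are treated.

```r
# Hypothetical helper (not part of the package): simulated power for a
# 2x2 design with fixed group sizes and binomial outcomes within group.
sim_power_2x2 <- function(n1, n2, p1, p2, alpha = 0.05,
                          nsim = 500L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # One binomial event count per group and replicate
  e1 <- rbinom(nsim, size = n1, prob = p1)
  e2 <- rbinom(nsim, size = n2, prob = p2)
  reject <- vapply(seq_len(nsim), function(i) {
    # Rows: event / non-event; columns: group 1 / group 2
    tab <- matrix(c(e1[i], n1 - e1[i], e2[i], n2 - e2[i]), nrow = 2L)
    # Assumption: reject when the Fisher exact p-value is <= alpha
    fisher.test(tab)$p.value <= alpha
  }, logical(1))
  mean(reject)  # proportion of replicates in which the null is rejected
}

pw <- sim_power_2x2(n1 = 50, n2 = 50, p1 = 0.26, p2 = 0.74, seed = 42L)
```

Fixing the seed makes repeated calls return the same estimate, which is convenient when comparing designs.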
Effect-size input:
Specify the effect either as per-group proportions p1 and p2
directly, or as ess (Effect Strength for Sensitivity, percent) under
the symmetric balanced convention:
p_2 = \text{ESS}/200 + 0.5, p_1 = 1 - p_2.
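As a worked instance of the conversion above, the following base-R sketch maps an ESS value to the two proportions. `ess_to_props` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: symmetric balanced convention, ESS given in percent.
ess_to_props <- function(ess) {
  stopifnot(is.numeric(ess))
  p2 <- ess / 200 + 0.5
  list(p1 = 1 - p2, p2 = p2)
}

props <- ess_to_props(48)
# props$p1 is 0.26 and props$p2 is 0.74, up to floating-point rounding
```

Under this convention, ESS = 48 corresponds to the direct-proportion inputs p1 = 0.26 and p2 = 0.74.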
Sidak correction:
When comp > 1, the working \alpha is Sidak-adjusted:
\alpha_{\text{adj}} = 1 - (1 - \alpha)^{1/\text{comp}}.
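The adjustment above is a one-line computation. The sketch below uses a hypothetical helper name, `sidak_alpha`, and base R only.

```r
# Hypothetical helper: Sidak-adjusted working alpha for `comp` comparisons.
sidak_alpha <- function(alpha, comp = 1L) {
  stopifnot(comp >= 1)
  1 - (1 - alpha)^(1 / comp)
}

a1 <- sidak_alpha(0.05, 1L)  # no correction: 0.05 up to rounding
a3 <- sidak_alpha(0.05, 3L)  # roughly 0.01695
```

By construction, the chance of at least one false rejection across `comp` independent tests at the adjusted level equals the nominal alpha: `1 - (1 - a3)^3` recovers 0.05.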
Value
An object of class "oda_power", a list with elements:
power |
Numeric matrix (rows = n1, cols = alpha_adj) of estimated power. Simplified to a named vector if one dimension is scalar, or to a scalar if both are. |
n1, n2 |
Per-group sample sizes. |
p1, p2 |
Per-group event rates used. |
ess_input |
ESS supplied, or NA if p1/p2 used. |
alpha, alpha_adj |
Input and Sidak-adjusted alpha. |
comp, nsim, mc_seed |
Input parameters. |
References
Rhodes, N. J. (2020). Statistical power analysis in ODA, CTA and Novometrics. Optimal Data Analysis, 9. https://odajournal.files.wordpress.com/2020/02/v9a5.pdf
Yarnold PR, Soltysik RC (2005). Optimal Data Analysis: A Guidebook with Software for Windows. Washington, DC: APA Books.
Examples
# Power for ESS = 48%, n = 50 per group (CRAN-safe nsim; use 10000L for publication)
oda_power(n1 = 50, ess = 48, nsim = 500L, mc_seed = 42L)
# Power curve across a range of n
oda_power(n1 = c(30, 50, 80), ess = 48, nsim = 500L, mc_seed = 42L)
# Direct proportions (p1 = 0.26, p2 = 0.74)
oda_power(n1 = 50, p1 = 0.26, p2 = 0.74, nsim = 500L, mc_seed = 42L)
# Sidak correction for 3 comparisons
oda_power(n1 = 80, ess = 48, comp = 3L, nsim = 500L, mc_seed = 42L)
Retrieve predictions from a fitted ODA model
Description
Returns stored LOO predictions when available, or calls
predict.oda_fit() on supplied newdata. Training predictions
are not stored by the engine; supply newdata to obtain them.
Usage
oda_predictions(fit, split = c("train", "loo"), newdata = NULL, ...)
Arguments
fit |
An |
split |
One of |
newdata |
For |
... |
Passed to |
Value
Integer vector of predictions or NULL.
ODA rule strata propensity weights
Description
Computes propensity weights from the two rule strata (left and right of the ODA cutpoint) using stored training confusion counts. Implements the Yarnold/Linden stratum-weight formula:
w = n_s \times \Pr(Z=z) / n_{z,s}
Usage
oda_propensity_weights(fit, adjusted = TRUE)
Arguments
fit |
An |
adjusted |
Logical; if |
Details
Currently implemented for binary (C=2) ODA fits only.
The fitted model must have been trained with the treatment/exposure/group
membership as the class variable (y), not a clinical outcome.
The user is responsible for this labeling decision.
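The stratum-weight formula in the Description can be illustrated on a small table of counts. This is a sketch with made-up counts, not output from the package, and it does not reproduce the `adjusted` variant.

```r
# Hypothetical counts: rows are the two rule strata, columns are the classes
# (treatment/exposure groups).
tab <- matrix(c(30, 20, 10, 40), nrow = 2L,
              dimnames = list(stratum = c("1", "2"), class = c("0", "1")))

n_s  <- rowSums(tab)             # stratum totals n_s
pr_z <- colSums(tab) / sum(tab)  # marginal class probabilities Pr(Z = z)

# w[s, z] = n_s * Pr(Z = z) / n_{z,s}
w <- outer(n_s, pr_z) / tab
```

With these counts the weights are 2/3 and 2 in stratum 1, and 1.5 and 0.75 in stratum 2. The weighted counts `w * tab` reproduce the marginal class split within each stratum, and their sum equals the total sample size.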
Value
Data frame with one row per (stratum, class) combination:
stratum_id (1L = rule predicts class 0, 2L = rule predicts
class 1), predicted_class (integer), class (character),
class_n (integer), stratum_n (integer),
marginal_class_n (integer), marginal_total_n (integer),
marginal_class_probability (numeric),
propensity_weight (numeric), undefined_empirical
(logical), adjusted (logical),
adjusted_propensity_weight (numeric),
model_family ("oda").
See Also
cta_propensity_weights,
lort_propensity_weights
Preflight readiness check for ODA / CTA analysis
Description
Validates a predictor frame, class vector, and optional weight vector before fitting. Returns a structured report. Does not modify inputs.
Usage
oda_readiness_check(
X,
y,
w = NULL,
miss_codes = NULL,
binary_only = FALSE,
min_class_n = 5L
)
Arguments
X |
Data frame of predictors. |
y |
Integer class/group vector. |
w |
Optional numeric weight vector. |
miss_codes |
Numeric vector of missing-code values (default
|
binary_only |
Logical; flag > 2 classes as an issue (default
|
min_class_n |
Minimum observations per class; flags if any class is
below this threshold (default |
Details
Flags:
Missing class/group variable.
Non-binary group when binary_only = TRUE.
Non-numeric weights, wrong-length weights, NA/Inf/zero weights.
Missing-code patterns in predictors (if miss_codes supplied).
Constant attributes (zero variance after miss-code removal).
Insufficient class counts (< min_class_n).
Attribute-type uncertainty (logical/factor columns).
Value
Named list with:
ok (logical, TRUE if no issues),
issues (character vector),
warnings (character vector, non-fatal),
n_obs (integer),
group_report (from oda_validate_group()),
weight_report (from oda_validate_weights()),
attr_types (from oda_infer_attr_types()),
constant_attrs (character vector of constant columns).
See Also
oda_validate_group, oda_validate_weights,
oda_infer_attr_types, oda_clean_missing_codes
Apply a binary ODA rule to new data
Description
Predict class labels (0 or 1) for new attribute values using a fitted binary ODA rule.
Usage
oda_rule_predict(x, rule)
Arguments
x |
Numeric or character attribute values. |
rule |
A rule list returned in |
Value
Integer vector of predicted class labels (0 or 1).
Apply a multiclass ODA rule to new data
Description
Predict class labels for new attribute values using a fitted multiclass ODA rule.
Usage
oda_rule_predict_multiclass(x, rule,
boundary = c("megaoda_halfopen","right_closed"))
Arguments
x |
Numeric attribute values. |
rule |
A rule list from |
boundary |
Boundary convention. Default |
Value
Integer vector of predicted class labels.
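The effect of a boundary convention on an ordered multiclass rule can be sketched with `findInterval()`. The rule layout used here (ascending `cut_values` plus one class per segment in `seg_classes`) follows the field names listed for the multiclass fit, but the helper `apply_cut_rule` is hypothetical. Which of the two generic conventions shown corresponds to `"megaoda_halfopen"` is not stated in this section, so no mapping is assumed.

```r
# Hypothetical helper: assign each x to a segment and return that segment's class.
# right_closed = TRUE  -> segments (a, b]: a value equal to a cut stays below it
# right_closed = FALSE -> segments [a, b): a value equal to a cut moves above it
apply_cut_rule <- function(x, cut_values, seg_classes, right_closed = TRUE) {
  stopifnot(!is.unsorted(cut_values),
            length(seg_classes) == length(cut_values) + 1L)
  seg <- findInterval(x, cut_values, left.open = right_closed) + 1L
  as.integer(seg_classes[seg])
}

x   <- c(1, 2, 3, 5, 6)
rc  <- apply_cut_rule(x, cut_values = c(2, 5), seg_classes = 1:3)
lc  <- apply_cut_rule(x, cut_values = c(2, 5), seg_classes = 1:3,
                      right_closed = FALSE)
```

The two results differ only for observations that sit exactly on a cut value (here 2 and 5).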
ODA minimum sample size via bisection
Description
Finds the minimum per-group sample size n (balanced design) at which
power reaches or exceeds power_target. Uses bisection over
oda_power() with a fixed RNG seed for stable search.
Usage
oda_sample_size(
power_target = 0.8,
p1 = NULL,
p2 = NULL,
ess = NULL,
alpha = 0.05,
comp = 1L,
nsim = 10000L,
mc_seed = 42L,
n_min = 2L,
n_max = 2000L
)
Arguments
power_target |
Target power. Default 0.80. |
p1 |
Probability of the event in class 0. Ignored when |
p2 |
Probability of the event in class 1. Ignored when |
ess |
Effect Strength for Sensitivity (percent, |
alpha |
Nominal significance level. Default 0.05. |
comp |
Number of comparisons for Sidak correction. Default 1. |
nsim |
Number of Monte Carlo replications per candidate |
mc_seed |
Integer seed used for every |
n_min |
Minimum |
n_max |
Maximum |
Details
Scope: unit-weighted, binary class, binary (2-level) attribute only.
This is not a general CTA, LORT, SDA, weighted, or multiclass sample-size
method. For unbalanced designs, call oda_power() directly across a
candidate grid.
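The bisection search can be sketched in base R as follows. Both helpers are hypothetical stand-ins: `power_at` is a small simulated Fisher-test power routine for a balanced design, not `oda_power()` itself, and `min_n_bisect` is an illustrative search loop. Because simulated power is not guaranteed to be monotone in n, the search returns a crossing point for the fixed seed rather than a proven minimum.

```r
# Hypothetical stand-in for a power routine (balanced design, n per group).
power_at <- function(n, p1, p2, alpha = 0.05, nsim = 300L, seed = 42L) {
  set.seed(seed)  # same seed at every candidate n keeps the search stable
  e1 <- rbinom(nsim, n, p1)
  e2 <- rbinom(nsim, n, p2)
  mean(vapply(seq_len(nsim), function(i) {
    tab <- matrix(c(e1[i], n - e1[i], e2[i], n - e2[i]), nrow = 2L)
    fisher.test(tab)$p.value <= alpha
  }, logical(1)))
}

# Hypothetical search: smallest n in [n_min, n_max] found by bisection.
min_n_bisect <- function(target, p1, p2, n_min = 2L, n_max = 200L, ...) {
  if (power_at(n_max, p1, p2, ...) < target) return(NA_integer_)
  if (power_at(n_min, p1, p2, ...) >= target) return(n_min)
  lo <- n_min
  hi <- n_max
  # Invariant: power_at(lo) < target <= power_at(hi)
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (power_at(mid, p1, p2, ...) >= target) hi <- mid else lo <- mid
  }
  hi
}

n_star <- min_n_bisect(0.80, p1 = 0.26, p2 = 0.74)
```

The loop needs roughly log2(n_max - n_min) power evaluations, which is why a fixed seed and a moderate `nsim` keep the search inexpensive.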
Value
An object of class "oda_sample_size", a list with elements:
n |
Minimum per-group sample size achieving power_target. |
power_achieved |
Estimated power at n. |
power_target |
Input target power. |
p1, p2, ess_input |
Effect-size inputs. |
alpha, alpha_adj, comp |
Alpha parameters. |
nsim, mc_seed |
Simulation parameters. |
References
Rhodes, N. J. (2020). Statistical power analysis in ODA, CTA and Novometrics. Optimal Data Analysis, 9. https://odajournal.files.wordpress.com/2020/02/v9a5.pdf
Yarnold PR, Soltysik RC (2005). Optimal Data Analysis: A Guidebook with Software for Windows. Washington, DC: APA Books.
Examples
# Minimum n for ESS = 48%, 80% power (small nsim for speed; the default is 10000L)
oda_sample_size(ess = 48, nsim = 200L, mc_seed = 42L)
# 90% power target
oda_sample_size(ess = 48, power_target = 0.90, nsim = 500L, mc_seed = 42L)
Fit a univariate binary-class ODA model
Description
Low-level engine for binary-class Optimal Data Analysis. Handles ordered,
categorical, and binary attributes with optional prior-odds weighting,
Monte Carlo p-value, and leave-one-out validity analysis. Most users should
call oda_fit instead.
Usage
oda_univariate_core(x, y, w = NULL,
attr_type = c("auto","ordered","categorical","binary"),
priors_on = TRUE, primary = NULL, secondary = NULL,
miss_codes = NULL, missing_code = NULL,
loo = c("off","stable","pvalue"), loo_alpha = 0.05,
mcarlo = TRUE, mc_iter = 25000L, mc_target = 0.05,
mc_stop = 99.9, mc_stopup = NA_real_, mc_adjust = FALSE,
mc_seed = NULL, chance_model = c("class","attribute"),
eval_order = c("mc_then_loo","loo_then_mc"),
mindenom = 1L,
direction = c("both","off","greater","less"),
direction_map = NULL)
Arguments
x |
Attribute values. |
y |
Binary class labels, coercible to 0/1 integers. |
w |
Optional numeric case weights. |
attr_type |
Attribute type. |
priors_on |
If |
primary |
Primary tie-break heuristic. |
secondary |
Secondary tie-break. |
miss_codes |
Additional missing-value codes. |
missing_code |
Scalar alias for |
loo |
|
loo_alpha |
Alpha threshold for |
mcarlo |
Run Monte Carlo p-value? |
mc_iter |
Maximum MC iterations. |
mc_target |
Significance threshold. |
mc_stop |
Confidence level (percent) for STOP early stopping. |
mc_stopup |
Confidence level (percent) for STOPUP. |
mc_adjust |
Legacy parameter; unused. |
mc_seed |
RNG seed. |
chance_model |
|
eval_order |
Controls whether Monte Carlo testing is run before LOO
validation or whether eligible ordered-cut LOO stability is checked
before Monte Carlo. The default |
mindenom |
Minimum raw observation count required in each child node for a candidate cut to be evaluated. Default 1 (no enforcement). |
direction |
Directional hypothesis (MPE Chapter 2 scope):
|
direction_map |
Named integer vector for categorical fixed-partition
DIRECTIONAL (MPE Chapter 4). Names are attribute levels (character);
values are 0/1 coded class labels. All attribute levels must be covered.
When supplied for a categorical attribute, the specified partition is
evaluated without searching alternatives; LOO predictions are trivially
stable. Default |
Value
Named list. Key fields: ok, rule, confusion (list
with integer counts TP, TN, FP, FN and rate
fields sensitivity, specificity as proportions in [0,1]),
ess, pac, p_mc, loo, n_eff.
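The relationship between the confusion counts and the rate fields can be sketched as follows. The helper `confusion_summary` is hypothetical, and the ESS line assumes the usual unit-weighted binary definition (mean sensitivity rescaled so that 0 is chance and 100 is perfect classification). The engine's own value may differ when weights or priors are in effect.

```r
# Hypothetical helper: rates and ESS from raw binary confusion counts.
confusion_summary <- function(TP, TN, FP, FN) {
  sensitivity <- TP / (TP + FN)  # proportion in [0, 1]
  specificity <- TN / (TN + FP)  # proportion in [0, 1]
  # Assumed binary, unit-weighted definition, expressed in percent
  ess <- 100 * (sensitivity + specificity - 1)
  list(sensitivity = sensitivity, specificity = specificity, ess = ess)
}

cs <- confusion_summary(TP = 37, TN = 37, FP = 13, FN = 13)
```

With sensitivity and specificity both equal to 0.74, the sketch gives an ESS of 48 percent.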
See Also
oda_fit, oda_multiclass_unioda_core
Validate a class / group variable
Description
Returns a structured report list rather than erroring. Useful as a
preflight check before passing y to oda_fit() or
cta_fit().
Usage
oda_validate_group(y, binary_only = FALSE)
Arguments
y |
Integer (or coercible to integer) class vector. |
binary_only |
Logical; if |
Value
Named list with: ok (logical), n_classes (integer),
class_levels (integer vector), class_counts (named integer
table), issues (character vector, empty if ok).
Validate a case weight vector
Description
Returns a structured report rather than throwing an error. NULL
weights are valid (interpreted as unit weights) and return
ok = TRUE.
Usage
oda_validate_weights(w, n)
Arguments
w |
Numeric weight vector or |
n |
Expected length of |
Value
Named list with: ok (logical), issues (character
vector, empty if ok), n_weights (integer or NA),
range (numeric(2) or NULL).
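A report-style validator of this kind can be sketched in base R. The function below is a hypothetical illustration of the documented return shape, not the package's code. In particular, the exact issue messages, and the choice to report `n_weights = NA` for NULL weights, are assumptions.

```r
# Hypothetical sketch: collect issues in a list instead of calling stop().
validate_weights_sketch <- function(w, n) {
  if (is.null(w)) {
    # NULL weights are treated as unit weights and are valid
    return(list(ok = TRUE, issues = character(0),
                n_weights = NA_integer_, range = NULL))
  }
  if (!is.numeric(w)) {
    return(list(ok = FALSE, issues = "weights are not numeric",
                n_weights = length(w), range = NULL))
  }
  issues <- character(0)
  if (length(w) != n) {
    issues <- c(issues, sprintf("expected %d weights, got %d",
                                as.integer(n), length(w)))
  }
  if (anyNA(w)) issues <- c(issues, "weights contain NA")
  if (any(is.infinite(w))) issues <- c(issues, "weights contain Inf")
  if (any(w == 0, na.rm = TRUE)) issues <- c(issues, "weights contain zero")
  finite <- w[is.finite(w)]
  list(ok = length(issues) == 0L, issues = issues, n_weights = length(w),
       range = if (length(finite) > 0L) range(finite) else NULL)
}

r_null <- validate_weights_sketch(NULL, 5)
r_bad  <- validate_weights_sketch(c(1, 2, NA), 3)
r_good <- validate_weights_sketch(c(1, 2, 3), 3)
```

Returning a list lets a caller gather every problem in one pass and decide how to respond, rather than stopping at the first failure.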
Renderer-independent layout data for a LORT composite tree
Description
Computes node positions and edge metadata for plot.cta_ort.
Terminal nodes receive integer x-slot positions (left-to-right in DFS
right-first order); internal nodes are centered over their children.
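The layout idea (integer slots for terminal nodes, internal nodes centered over their children) can be sketched with a short recursive walk. The nested-list tree and the helper `layout_tree` are hypothetical, and the child visiting order used here is an assumption that may differ from the package's traversal.

```r
# Hypothetical nested-list tree: internal nodes carry a `children` list.
tree <- list(id = 1L, children = list(
  list(id = 2L),
  list(id = 3L, children = list(list(id = 4L), list(id = 5L)))
))

layout_tree <- function(root) {
  next_slot <- 0
  rows <- list()
  walk <- function(node, depth) {
    terminal <- is.null(node$children)
    if (terminal) {
      next_slot <<- next_slot + 1  # next free integer x slot
      x <- next_slot
    } else {
      child_x <- vapply(node$children, walk, numeric(1), depth = depth + 1)
      x <- mean(range(child_x))    # center over the children
    }
    rows[[length(rows) + 1L]] <<- data.frame(
      node_id = node$id, depth = depth, x = x, is_terminal = terminal)
    x
  }
  walk(root, 0)
  out <- do.call(rbind, rows)
  out[order(out$node_id), ]
}

pd <- layout_tree(tree)
```

In this example the three terminal nodes receive x = 1, 2, 3, node 3 sits at 2.5, and the root sits at 1.75.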
Usage
ort_plot_data(object, target_class = NULL, class_labels = NULL, digits = 1L)
Arguments
object |
A |
target_class |
Integer target class for terminal node annotation, or
|
class_labels |
Optional named character vector of class display names. |
digits |
Integer decimal places for proportion labels. Default 1. |
Value
A list with elements:
nodes |
data.frame: node_id, depth, x, y, is_terminal, label, n, stop_reason. |
edges |
data.frame: from_id, to_id, x0, y0, x1, y1, label. |
strata |
The strata table from the LORT object. |
Note
ort_plot_data is a legacy compatibility name for the LORT method.
See print.cta_ort for the naming note.
Plot method for Locally Optimal Recursive Tree (LORT)
Description
Renders the composite LORT using G1 base-R conventions: ellipses for split nodes, rectangles for terminal nodes, directed arrows for edges.
Usage
## S3 method for class 'cta_ort'
plot(
x,
target_class = NULL,
class_labels = NULL,
digits = 1L,
main = "LORT",
split_fill = "#D9EAF7",
endpoint_fill = "#D9F7E6",
endpoint_palette = NULL,
border_col = "grey30",
text_col = "black",
edge_col = "grey40",
arrow_col = NULL,
show_caption = FALSE,
cex = 0.75,
...
)
Arguments
x |
A |
target_class |
Integer target class for terminal node annotation;
|
class_labels |
Optional named character vector of class display names. |
digits |
Decimal places for proportion labels. Default |
main |
Plot title. Default |
split_fill |
Fill color for split (internal) ellipse nodes. |
endpoint_fill |
Default fill for terminal rectangle nodes. |
endpoint_palette |
Palette for terminal nodes when |
border_col |
Border color for all nodes. Default |
text_col |
Text color for node labels. Default |
edge_col |
Color for directed edge arrows. Default |
arrow_col |
Arrow color; |
show_caption |
Logical; add color-encoding caption when
|
cex |
Text expansion factor. Default |
... |
Unused. |
Value
invisible(pd), the layout list from ort_plot_data.
Note
plot.cta_ort and ort_plot_data are legacy compatibility
names for the LORT method. See print.cta_ort for the naming
note.
Plot a fitted CTA tree
Description
Native base-R CTA visualization. Calls cta_plot_data for
layout; uses only base graphics - no external package dependencies.
Split (internal) nodes are drawn as ellipses; terminal endpoint
nodes are drawn as rectangles; edges are directed arrows.
Split nodes show the split attribute, node-level ESS or WESS, and
observation count. Without target_class, leaf nodes show the
majority-class prediction and observation count. With target_class,
leaf nodes show the target-class count, percentage, predicted class, and
stage from cta_staging_table. Edge labels show the branch
condition (e.g. "V14<=0.5").
Color note: when target_class is supplied, endpoint fill
colors are assigned by ascending rank of each endpoint's target-class
proportion within this tree. Colors encode relative position in the
endpoint distribution and do not imply clinical thresholds or
categories. Supply a custom palette via endpoint_palette to change
the color encoding. Use show_caption = TRUE to render an explicit
note on the plot.
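Rank-based color assignment of this kind can be sketched with `grDevices::colorRampPalette()`. The helper `endpoint_colors` and the tie-handling rule are assumptions for illustration, and the two-color palette is simply an example, not the package default.

```r
# Hypothetical helper: one fill color per endpoint, by ascending rank of the
# endpoint's target-class proportion within the tree.
endpoint_colors <- function(target_prop, palette = c("#ffffff", "#c62828")) {
  ramp <- grDevices::colorRampPalette(palette)(length(target_prop))
  # Assumption: tied endpoints share the lower rank
  ramp[rank(target_prop, ties.method = "min")]
}

cols <- endpoint_colors(c(0.10, 0.80, 0.40))
```

The endpoint with the lowest proportion gets the first palette color and the highest gets the last, regardless of how far apart the proportions are. This is why the colors indicate relative position only.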
cta_plot_data is the renderer-independent data contract.
This function (plot.cta_tree) is the current native base-R renderer.
Usage
## S3 method for class 'cta_tree'
plot(x,
target_class = NULL, class_labels = NULL, digits = 1,
main = "CTA Tree", show_counts = TRUE, show_stage = TRUE,
endpoint_palette = NULL, endpoint_fill = "#D9F7E6",
split_fill = "#D9EAF7", node_col_split = NULL,
node_col_leaf = NULL, edge_col = "grey40",
border_col = "grey30", text_col = "black",
arrow_col = NULL, show_caption = FALSE,
cex = 0.75, ...)
Arguments
x |
A |
target_class |
Integer target class for endpoint annotation; passed
to |
class_labels |
Optional display names for class labels; passed to
|
digits |
Decimal places for percentage labels in enriched endpoint
nodes; passed to |
main |
Character plot title. Default |
show_counts |
Logical; include |
show_stage |
Logical; include |
endpoint_palette |
Palette for endpoint fill colors when
|
endpoint_fill |
Default fill colour for leaf (terminal) nodes when
|
split_fill |
Fill colour for split (internal) ellipse nodes.
Default |
node_col_split |
Legacy alias for |
node_col_leaf |
Legacy alias for |
edge_col |
Colour for directed edge arrows. Default |
border_col |
Border colour for all nodes. Default |
text_col |
Text colour for node labels. Default |
arrow_col |
Arrow colour for directed edges. |
show_caption |
Logical; if |
cex |
Text expansion factor for node labels. Default |
... |
Unused; included for S3 compatibility. |
Value
invisible(pd), where pd is the cta_plot_data
list used to render the plot. The caller can inspect layout coordinates,
enrichment columns, and endpoint annotations from the returned object.
See Also
cta_plot_data, cta_staging_table,
oda_cta_fit
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- suppressMessages(
oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L,
loo = "off")
)
# Structural plot
plot(tree)
# Target-class enriched plot with custom labels
plot(tree, target_class = 1L,
class_labels = c("0" = "Manual", "1" = "Auto"))
# Custom palette (white to dark red)
plot(tree, target_class = 1L,
endpoint_palette = c("#ffffff", "#c62828"))
Love plot for covariate balance (SMD)
Description
A direct alias for plot_smd_balance. Produces a
Cleveland-style Love plot of absolute SMD with conventional threshold
reference lines.
Usage
plot_balance_love(x, ...)
Arguments
x |
A |
... |
Arguments forwarded to |
Value
A ggplot object.
See Also
plot_smd_balance, smd_balance_table
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
group <- c(rep(0L, 20), rep(1L, 20))
X <- data.frame(A = c(rep(0L,20), rep(1L,20)),
B = rnorm(40))
smd <- smd_balance_table(group, X)
p <- plot_balance_love(smd)
print(p)
}
Plot CTA multivariate covariate balance
Description
Renders the CTA covariate balance result. When no discriminating tree was
found (status = "no_tree"), a message panel reports this outcome, which is
favorable evidence of multivariable balance under the declared constraints.
When a valid tree or stump was found, the tree diagram is rendered via
plot_cta_tree.
Usage
plot_cta_balance(
x,
target_class = 1L,
color_by = c("target_rate", "prediction", "none"),
main = NULL,
subtitle = NULL,
...
)
Arguments
x |
A |
target_class |
Integer; target class for leaf-node coloring.
Default |
color_by |
Character; leaf-node fill: |
main |
Character; plot title. Default: auto-generated from ESS/WESS. |
subtitle |
Character; plot subtitle. |
... |
Additional arguments forwarded to |
Details
This function is a pure renderer. It does not fit any CTA models and does
not accept group or X arguments.
Value
A ggplot object.
See Also
cta_balance_plot_data, cta_balance_table,
plot_cta_tree
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
X <- data.frame(
A = c(rep(0L,20), rep(1L,20), rep(1L,20)),
B = c(rep(0L,20), rep(0L,20), rep(1L,20))
)
group <- c(rep(0L, 40), rep(1L, 20))
ct <- cta_balance_table(group, X, mindenom = 5L,
mc_iter = 200L, mc_seed = 42L)
cpd <- cta_balance_plot_data(ct)
p <- plot_cta_balance(cpd)
print(p)
}
Evidence card for CTA multivariate covariate balance
Description
Renders an evidence-interval card from a
cta_balance_effect_summary object. Each row of the card
corresponds to one analysis scale. The plot uses the same interval
encoding as plot_oda_balance_effects: thick black = bootstrap
CI, thin gray = chance CI, open circle = observed ESS/WESS.
Usage
plot_cta_balance_effects(x, main = NULL, subtitle = NULL, xlim = NULL, ...)
Arguments
x |
A |
main |
Optional character; plot title. |
subtitle |
Optional character; plot subtitle. |
xlim |
Optional numeric(2); x-axis limits. |
... |
Ignored; reserved for future use. |
Details
When status = "no_tree" for all rows, a favorable-balance message
panel is returned instead of an interval plot.
This function does not fit any models.
Value
A ggplot object.
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
group <- c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)
X <- data.frame(v1 = c(1, 2, 3, 4, 5, 6, 7, 8),
v2 = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L))
ces <- cta_balance_effect_summary(group, X, mindenom = 5L,
mc_iter = 200L, mc_seed = 42L,
nboot = 20L, chance_iter = 20L)
plot_cta_balance_effects(ces)
}
Plot a CTA descendant family member using ggplot2
Description
Renders a publication-quality CTA tree diagram for a single member of a
cta_family object (indexed inspection), or a named list of plots for
all members (show_all = TRUE). Requires the ggplot2 package.
Usage
plot_cta_family(
family,
index = 1L,
min_d = FALSE,
show_all = FALSE,
layout = c("multipanel", "list"),
ncol = 1L,
target_class = 1L,
color_by = c("none", "target_rate", "prediction"),
label_detail = c("simple", "full"),
show_node_ess = FALSE,
show_p = TRUE,
show_loo = TRUE,
main = NULL,
subtitle = NULL,
show_rule = TRUE,
show_metrics = FALSE,
short_edge_labels = TRUE,
node_text_size = 3.5,
edge_text_size = 3.2,
palette = NULL
)
Arguments
family |
A |
index |
Integer or |
min_d |
Logical; convenience shorthand for |
show_all |
Logical; if |
layout |
Character; |
ncol |
Integer; number of columns in the multipanel grid. Default
|
target_class |
Integer; target class for endpoint coloring (default
|
color_by |
Character; leaf-node fill. |
label_detail |
Character; |
show_node_ess |
Logical; append node ESS to split labels.
Default |
show_p |
Logical; append |
show_loo |
Logical; append LOO status/p to split-node labels.
Default |
main |
Character; plot title. Default: auto-generated with MINDENOM and D. |
subtitle |
Character; plot subtitle. |
show_rule |
Logical; show edge condition labels. Default |
show_metrics |
Logical; append ESS/D to subtitle. Default |
short_edge_labels |
Logical; strip attribute prefix from edge labels.
Default |
node_text_size |
Numeric; text size for node labels. Default |
edge_text_size |
Numeric; text size for edge labels. Default |
palette |
Named list for color overrides. |
Value
A ggplot object (single member or multipanel),
or (when show_all = TRUE and layout = "list") a named list
of ggplot objects.
See Also
cta_descendant_family, plot_cta_tree,
plot_lort_tree, ggsave
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
X <- data.frame(x1 = c(rep(0L,20), rep(1L,20)),
x2 = c(rep(0L,10), rep(1L,10), rep(0L,10), rep(1L,10)))
y <- c(rep(0L,30), rep(1L,10))
fam <- cta_descendant_family(X, y, mc_iter=200L, mc_seed=42L, loo="off")
p <- plot_cta_family(fam, index=1L)
print(p)
}
Plot a CTA tree using ggplot2
Description
Renders a publication-quality tree diagram for a fitted CTA tree. Requires
the ggplot2 package (listed in Suggests); if unavailable, a
clear error is raised.
Usage
plot_cta_tree(
x,
target_class = 1L,
color_by = c("none", "target_rate", "prediction"),
label_detail = c("simple", "full"),
show_node_ess = FALSE,
show_p = TRUE,
show_loo = TRUE,
main = NULL,
subtitle = NULL,
show_rule = TRUE,
show_metrics = FALSE,
short_edge_labels = TRUE,
node_text_size = 3.5,
edge_text_size = 3.2,
palette = NULL
)
Arguments
x |
A |
target_class |
Integer; target class for endpoint coloring and
target-rate annotation (default |
color_by |
Character; controls leaf-node fill color.
|
label_detail |
Character; node label verbosity. |
show_node_ess |
Logical; if |
show_p |
Logical; if |
show_loo |
Logical; if |
main |
Character; plot title. Default: auto-generated from tree structure (n, endpoints, ESS/D). |
subtitle |
Character; plot subtitle. |
show_rule |
Logical; show branch condition labels on edges.
Default |
show_metrics |
Logical; if |
short_edge_labels |
Logical; if |
node_text_size |
Numeric; ggplot text size for node labels.
Default |
edge_text_size |
Numeric; ggplot text size for edge labels.
Default |
palette |
Named list for color overrides: |
Value
A ggplot object. Print it, modify it, or
save with ggplot2::ggsave().
See Also
cta_fit, cta_plot_data,
plot_lort_tree, plot_cta_family,
ggsave
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
X <- data.frame(x1 = c(1,2,3,4,5,6,7,8),
x2 = c(0L,0L,1L,0L,1L,1L,0L,1L))
y <- c(1L,1L,1L,1L,2L,2L,2L,2L)
tree <- cta_fit(X, y, mindenom=1L, mc_iter=500L, mc_seed=42L, loo="off")
p <- plot_cta_tree(tree)
print(p)
}
Plot the full local CTA models along a LORT recursion path
Description
Returns a named list of ggplot objects, one per LORT node on the path from
the root to the requested index. Each panel shows the full
local CTA model embedded at that LORT node – not a stump summary.
Usage
plot_lort_path(
x,
index = 1L,
layout = c("multipanel", "list"),
ncol = 1L,
target_class = 1L,
color_by = c("none", "target_rate", "prediction"),
label_detail = c("simple", "full"),
show_node_ess = FALSE,
show_p = TRUE,
show_loo = TRUE,
show_rule = TRUE,
show_metrics = FALSE,
short_edge_labels = TRUE,
node_text_size = 3.5,
edge_text_size = 3.2,
palette = NULL,
...
)
Arguments
x |
A |
index |
Integer; target LORT node index (end of path). |
layout |
Character; |
ncol |
Integer; number of columns in the multipanel layout. Default
|
target_class |
Integer; target class for node coloring. Default
|
color_by |
Character; leaf fill mode. Default |
label_detail |
Character; |
show_node_ess |
Logical. Default |
show_p |
Logical; append |
show_loo |
Logical; append LOO status/p to split-node labels.
Default |
show_rule |
Logical. Default |
show_metrics |
Logical. Default |
short_edge_labels |
Logical. Default |
node_text_size |
Numeric. Default |
edge_text_size |
Numeric. Default |
palette |
Named list; color overrides. |
... |
Ignored; reserved. |
Details
The list is named index_1, index_2, etc. (one name per LORT
node on the path). Terminal nodes with no model get a message panel.
Value
With layout = "multipanel": a single patchwork/ggplot
object containing all path panels. With layout = "list": a named
list of ggplot objects.
See Also
lort_index_path, lort_local_tree,
lort_path_table, plot_lort_tree
Plot a LORT (Locally Optimal Recursive Tree) using ggplot2
Description
Renders a publication-quality CTA tree diagram for a single sub-tree within
a LORT object (indexed inspection), or a named list of plots for all sub-trees
(show_all = TRUE). Requires the ggplot2 package.
Usage
plot_lort_tree(
x,
index = 1L,
show_all = FALSE,
show_path = FALSE,
target_class = 1L,
color_by = c("none", "target_rate", "prediction"),
label_detail = c("simple", "full"),
show_node_ess = FALSE,
show_p = TRUE,
show_loo = TRUE,
main = NULL,
subtitle = NULL,
show_rule = TRUE,
show_metrics = FALSE,
short_edge_labels = TRUE,
node_text_size = 3.5,
edge_text_size = 3.2,
palette = NULL,
...
)
Arguments
x |
A |
index |
Integer or character; which LORT node (sub-tree) to render.
Default |
show_all |
Logical; if |
show_path |
Logical; if |
target_class |
Integer; target class for endpoint coloring and
target-rate annotation (default |
color_by |
Character; controls leaf-node fill color.
|
label_detail |
Character; |
show_node_ess |
Logical; append node-level ESS to split labels.
Default |
show_p |
Logical; append |
show_loo |
Logical; append |
main |
Character; plot title. Default: auto-generated. When
|
subtitle |
Character; plot subtitle. |
show_rule |
Logical; show branch condition labels on edges. |
show_metrics |
Logical; append ESS/D to subtitle. Default |
short_edge_labels |
Logical; strip attribute-name prefix from edge labels.
Default |
node_text_size |
Numeric; text size for node labels. Default |
edge_text_size |
Numeric; text size for edge labels. Default |
palette |
Named list for color overrides. |
... |
Additional arguments passed to |
Value
A ggplot object, or (when show_all =
TRUE) a named list of ggplot objects.
See Also
lort_fit, plot_cta_tree,
plot_cta_family, ggsave
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
X <- data.frame(
A = c(rep(0L,20), rep(1L,20), rep(1L,20)),
B = c(rep(0L,20), rep(0L,20), rep(1L,20))
)
y <- c(rep(0L,40), rep(1L,20))
lort <- lort_fit(X, y, mc_iter=100L, mc_seed=42L, loo="off", min_n=5L)
p <- plot_lort_tree(lort, index=1L)
print(p)
}
Plot ODA covariate balance
Description
Renders a horizontal dot-plot of ODA-based covariate balance diagnostics.
Each covariate is shown as a point; the x-axis is ESS or WESS (0-100 %),
and point color reflects significance status. The function is a pure
renderer: it does not fit any ODA models and does not accept group
or X arguments. If abs_smd is absent from the plot data, it
is not plotted.
Usage
plot_oda_balance(
x,
p_col = "p_mc",
rank_by = "abs_ess",
main = NULL,
subtitle = NULL,
show_significance = TRUE,
palette = NULL,
theme = c("clean", "minimal")
)
Arguments
x |
An |
p_col |
Character; which p-value column drives significance colour when
coercing from an |
rank_by |
Character; sort order when coercing from
|
main |
Character; plot title. Default: auto-generated summary. |
subtitle |
Character; plot subtitle. |
show_significance |
Logical; annotate significantly imbalanced
covariates with a |
palette |
Named list for color overrides: |
theme |
Character; |
Value
A ggplot object.
See Also
oda_balance_plot_data, oda_balance_table
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
group <- c(rep(0L, 20), rep(1L, 20))
X <- data.frame(A = c(rep(0L,20), rep(1L,20)),
B = rnorm(40))
bt <- oda_balance_table(group, X, mcarlo = FALSE, mc_iter = 100L)
pd <- oda_balance_plot_data(bt)
p <- plot_oda_balance(pd)
print(p)
}
Forest plot of ODA covariate balance evidence intervals
Description
Renders a forest plot from an oda_balance_effect_table object.
Each covariate is displayed as one row. A thin gray segment shows the
chance (null) confidence interval; a thick black segment shows the
bootstrap model CI; a point shows the observed ESS/WESS. A vertical
dashed line marks the chance upper bound (chance_hi) as a visual reference.
Usage
plot_oda_balance_effects(
x,
main = NULL,
subtitle = NULL,
x_label = NULL,
xlim = NULL,
...
)
Arguments
x |
An |
main |
Optional character; plot title. Defaults to
|
subtitle |
Optional character; plot subtitle. |
x_label |
Optional character; x-axis label. Defaults to the metric
label from the data ( |
xlim |
Optional numeric(2); x-axis limits. Auto-computed when
|
... |
Ignored; reserved for future use. |
Details
When the object contains multiple analysis scales (e.g.,
compare_weights = TRUE), the plot is faceted by analysis.
This function does not fit any models. Pass a pre-computed
oda_balance_effect_table from oda_balance_effect_table.
Value
A ggplot object.
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
group <- c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L)
X <- data.frame(v1 = c(1, 2, 3, 4, 5, 6, 7, 8),
v2 = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L))
et <- oda_balance_effect_table(group, X,
nboot = 50L, chance_iter = 50L,
mc_iter = 200L, mc_seed = 1L)
plot_oda_balance_effects(et)
}
Plot SMD covariate balance
Description
Renders a horizontal dot-plot of absolute standardized mean differences (|SMD|) for each covariate. Vertical reference lines at 0.10 (and optionally 0.20) mark conventional balance thresholds. Points are colored by whether |SMD| < 0.10.
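For orientation, the quantity on the x-axis can be sketched with a conventional definition of the absolute standardized mean difference for a numeric covariate. The helper `abs_smd` is hypothetical, and the pooled-SD form used here is an assumption. The values produced by `smd_balance_table` may be computed differently, for example for binary covariates.

```r
# Hypothetical helper: |SMD| with the simple pooled standard deviation
# sqrt((var0 + var1) / 2), for groups coded 0 and 1.
abs_smd <- function(x, group) {
  x0 <- x[group == 0]
  x1 <- x[group == 1]
  pooled_sd <- sqrt((var(x0) + var(x1)) / 2)
  abs(mean(x1) - mean(x0)) / pooled_sd
}

d <- abs_smd(x = c(1, 2, 3, 3, 4, 5), group = c(0, 0, 0, 1, 1, 1))
```

Here the group means differ by 2 and both group variances are 1, so the sketch returns 2, far above the 0.10 reference line.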
Usage
plot_smd_balance(
x,
ref_010 = TRUE,
ref_020 = FALSE,
main = NULL,
subtitle = NULL,
palette = NULL,
theme = c("clean", "minimal")
)
Arguments
x |
A |
ref_010 |
Logical; draw a dashed reference line at |SMD| = 0.10.
Default |
ref_020 |
Logical; draw a dotted reference line at |SMD| = 0.20.
Default |
main |
Character; plot title. Default |
subtitle |
Character; plot subtitle. |
palette |
Named list for color overrides: |
theme |
Character; |
Value
A ggplot object.
See Also
smd_balance_table, plot_balance_love
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
group <- c(rep(0L, 20), rep(1L, 20))
X <- data.frame(A = c(rep(0L,20), rep(1L,20)),
B = rnorm(40))
smd <- smd_balance_table(group, X)
p <- plot_smd_balance(smd)
print(p)
}
Predict method for Locally Optimal Recursive Tree (LORT)
Description
Routes each row of newdata down the composite LORT by recursively
applying each node's cta_tree model via
cta_assign_endpoints.
Usage
## S3 method for class 'cta_ort'
predict(
object,
newdata,
type = c("class", "stratum", "path", "all"),
missing_action = c("na", "majority"),
...
)
Arguments
object |
A |
newdata |
Data frame or matrix matching the training X column layout. |
type |
Character; one of |
missing_action |
Passed to each node-level
|
... |
Unused. |
Value
For type = "class": integer vector of predicted class labels
(length nrow(newdata)). For type = "stratum": integer
stratum_id vector. For type = "path": character path vector.
For type = "all": data.frame with columns
predicted_class, stratum_id, path,
prop_class1, stop_reason.
Note
predict.cta_ort is a legacy compatibility name; the class
cta_ort and all *.cta_ort methods refer to the implemented
LORT method. New docs and APIs should use LORT terminology.
Classify new observations using a CTA tree
Description
Applies a fitted cta_tree to new data by routing each observation
through the tree until it reaches a leaf node.
Usage
## S3 method for class 'cta_tree'
predict(object, newdata,
missing_action = c("majority", "na"), ...)
Arguments
object |
A |
newdata |
Data frame or matrix with the same columns as training X. |
missing_action |
How to handle observations whose split attribute is
missing on their traversal path. |
... |
Unused. |
Value
Integer vector of predicted class labels, length nrow(newdata).
When missing_action = "na", observations missing a split attribute
on their path receive NA_integer_.
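The routing described above, and the effect of the two missing_action settings, can be illustrated with a minimal base-R sketch. The nested-list node layout (is_leaf, attribute, cut, majority, left, right) and the helper route_one() are hypothetical and do not reflect the internal structure of a cta_tree. The sketch also assumes that "majority" means falling back to the majority class of the node where routing stalls.

```r
# Hypothetical toy tree: NOT the internal structure of a cta_tree object.
leaf <- function(class) list(is_leaf = TRUE, class = class)
toy_tree <- list(
  is_leaf = FALSE, attribute = "wt", cut = 3.0, majority = 0L,
  left  = leaf(1L),
  right = list(is_leaf = FALSE, attribute = "hp", cut = 150, majority = 0L,
               left = leaf(1L), right = leaf(0L))
)

# Route one observation (a one-row data frame) down to a leaf.
route_one <- function(node, obs, missing_action = c("majority", "na")) {
  missing_action <- match.arg(missing_action)
  while (!node$is_leaf) {
    value <- obs[[node$attribute]]
    if (is.null(value) || is.na(value)) {
      # The split attribute is missing on this traversal path.
      if (missing_action == "na") return(NA_integer_)
      return(node$majority)  # assumed fallback: majority class at this node
    }
    node <- if (value <= node$cut) node$left else node$right
  }
  node$class
}

newdata <- data.frame(wt = c(2.5, 3.5, 3.5, NA), hp = c(110, 120, NA, 200))
rows <- seq_len(nrow(newdata))
pred_majority <- vapply(rows, function(i)
  route_one(toy_tree, newdata[i, ], "majority"), integer(1))
pred_na <- vapply(rows, function(i)
  route_one(toy_tree, newdata[i, ], "na"), integer(1))
```

Rows 3 and 4 are each missing a split attribute on their path, so they receive the fallback class under "majority" and NA_integer_ under "na".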
Predict class labels from a fitted ODA model
Description
Applies the fitted ODA rule to new attribute values, returning predicted
class labels in the original label space. Missing values and miss-coded
values return NA_integer_. Failed fits return all NA_integer_
with a warning.
Usage
## S3 method for class 'oda_fit'
predict(object, newdata, ...)
Arguments
object |
An |
newdata |
Numeric vector or single-column data frame of attribute values. |
... |
Unused. |
Value
Integer vector of predicted class labels, length length(newdata)
or nrow(newdata).
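As a rough illustration of how a fitted cut-point rule maps attribute values to labels while propagating NA, consider the base-R sketch below. The function apply_rule(), its arguments, and the rule form (predict the second label when x exceeds the cut) are hypothetical; the rule stored in an oda_fit object may be structured differently.

```r
# Hypothetical rule: predict labels[2] when x > cut, otherwise labels[1].
# Missing values and values listed in miss_codes map to NA_integer_.
apply_rule <- function(x, cut, labels = c(0L, 1L), miss_codes = numeric(0)) {
  x[x %in% miss_codes] <- NA
  ifelse(is.na(x), NA_integer_, ifelse(x > cut, labels[2], labels[1]))
}

pred <- apply_rule(c(1.2, 3.4, NA, -9), cut = 2, miss_codes = -9)
```

Here the third value is missing and the fourth matches the miss code -9, so both are returned as NA_integer_ rather than being forced into a class.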
Predict from an SDA procedure result
Description
Applies the learned selected-step sequence to newdata. For each
observation, steps are applied in order; the first step whose rule
classifies the observation is authoritative. Observations not classified
by any step are returned as NA (resolved = FALSE).
Usage
## S3 method for class 'sda_fit'
predict(object, newdata, type = "class", ...)
Arguments
object |
A |
newdata |
Data frame or matrix. Must contain columns with names
matching all selected attributes in |
type |
Output type. One of |
... |
Unused. |
Details
Prediction applies the selected steps sequentially: it follows the learned SDA
structure and does not re-scan X. It does not select a "first attribute" from
newdata; it replays object$steps[[1]], object$steps[[2]], ...
in the order established at fit time.
Value
- "class"
Integer vector of predicted class labels; NA for unresolved observations.
- "stage"
Integer vector of step_id at which each observation was classified; NA for unresolved.
- "rule"
Character vector of the selected attribute name at the classifying step; NA for unresolved.
- "trace"
Data frame with one row per observation x step: obs_id, step_id, attribute, classified, class_pred.
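The "first classifying step is authoritative" logic can be sketched in base R as follows. The step layout (one attribute, one cut, and a class for each side, with NA meaning the step leaves that side unclassified) and the helper apply_steps() are assumptions made for illustration; they are not the structure of object$steps.

```r
# Hypothetical step layout: a side set to NA leaves those cases unresolved.
steps <- list(
  list(step_id = 1L, attribute = "a", cut = 0, below = 0L, above = NA_integer_),
  list(step_id = 2L, attribute = "b", cut = 5, below = NA_integer_, above = 1L)
)

apply_steps <- function(steps, newdata) {
  n <- nrow(newdata)
  class_pred <- rep(NA_integer_, n)
  stage <- rep(NA_integer_, n)
  rule <- rep(NA_character_, n)
  for (s in steps) {                      # replay steps in fit-time order
    x <- newdata[[s$attribute]]
    pred <- ifelse(x <= s$cut, s$below, s$above)
    # Only cases not yet classified may be claimed by this step.
    hit <- is.na(class_pred) & !is.na(pred)
    class_pred[hit] <- pred[hit]
    stage[hit] <- s$step_id
    rule[hit] <- s$attribute
  }
  data.frame(class = class_pred, stage = stage, rule = rule,
             resolved = !is.na(class_pred))
}

newdata <- data.frame(a = c(-1, 2, 3), b = c(9, 7, 1))
out <- apply_steps(steps, newdata)
```

In this toy input the first row is classified at step 1, the second at step 2, and the third by no step, so it stays NA with resolved = FALSE.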
Print an auto_sda_plan object
Description
Print an auto_sda_plan object
Usage
## S3 method for class 'auto_sda_plan'
print(x, ...)
Arguments
x |
An |
... |
Unused. |
Value
Invisibly returns x. Called primarily for its side effect of
printing a human-readable summary of the SDA plan to the console.
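The "print for side effect, return x invisibly" convention used by the print methods in this manual can be illustrated with a toy S3 class. The class name toy_plan and its attributes field are hypothetical and are not part of the package.

```r
# Hypothetical class "toy_plan"; shown only to illustrate the convention.
print.toy_plan <- function(x, ...) {
  cat("<toy_plan> with", length(x$attributes), "candidate attribute(s)\n")
  cat("  attributes:", paste(x$attributes, collapse = ", "), "\n")
  invisible(x)  # return the object unchanged without auto-printing it
}

plan <- structure(list(attributes = c("age", "score")), class = "toy_plan")
res <- withVisible(print(plan))
```

res$visible is FALSE and res$value is the original object, so the result of print() can be assigned or piped without triggering a second printout.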
Print a CTA descendant family
Description
Calls summary.cta_family and prints the result.
Usage
## S3 method for class 'cta_family'
print(x, ...)
Arguments
x |
A |
... |
Passed to |
Value
invisible(x).
See Also
summary.cta_family, cta_family_table
Print a CTA family summary
Description
Compact display of the cta_family_summary object returned by
summary.cta_family.
Usage
## S3 method for class 'cta_family_summary'
print(x, ...)
Arguments
x |
A |
... |
Unused; included for S3 compatibility. |
Value
invisible(x).
See Also
summary.cta_family, cta_family_table
Print method for Locally Optimal Recursive Tree (LORT)
Description
Print method for Locally Optimal Recursive Tree (LORT)
Usage
## S3 method for class 'cta_ort'
print(x, ...)
Arguments
x |
A |
... |
Unused. |
Value
invisible(x).
Note
print.cta_ort is a legacy compatibility name for the LORT
method. The class cta_ort and all *.cta_ort methods refer
to LORT; do not introduce new bare-ort public names.
Print method for cta_ort_summary
Description
Print method for cta_ort_summary
Usage
## S3 method for class 'cta_ort_summary'
print(x, ...)
Arguments
x |
A |
... |
Unused. |
Value
invisible(x).
Print a CTA tree in MegaODA node table format
Description
Displays each split node with its attribute, depth, n, p-value, ESS, LOO status, and rule string, followed by the node confusion matrix.
Usage
## S3 method for class 'cta_tree'
print(x, ...)
Arguments
x |
A |
... |
Unused. |
Value
Invisibly returns x.
Print a CTA tree summary
Description
Compact display of a cta_tree_summary object produced by
summary.cta_tree.
Usage
## S3 method for class 'cta_tree_summary'
print(x, ...)
Arguments
x |
A |
... |
Unused. |
Value
Invisibly returns x.
Print a fitted ODA model
Description
Compact display of rule, ESS/Mean PAC, and available MC/LOO metadata. Does not recompute any quantities.
Usage
## S3 method for class 'oda_fit'
print(x, ...)
Arguments
x |
An |
... |
Unused. |
Value
Invisibly returns x.
Print an ODA fit summary
Description
Print an ODA fit summary
Usage
## S3 method for class 'oda_fit_summary'
print(x, ...)
Arguments
x |
An |
... |
Unused. |
Value
Invisibly returns x.
Print an sda_anchor
Description
Prints a concise summary: anchor type, number of stages, selected attributes, implementation status. Does not claim SORT or GORT are implemented.
Usage
## S3 method for class 'sda_anchor'
print(x, ...)
Arguments
x |
An |
... |
Ignored. |
Value
x invisibly.
Print an sda_fit object
Description
Print an sda_fit object
Usage
## S3 method for class 'sda_fit'
print(x, ...)
Arguments
x |
An |
... |
Unused. |
Value
Invisibly returns x. Called primarily for its side effect of
printing a human-readable summary of the SDA fit to the console.
Print an sda_fit_summary object
Description
Print an sda_fit_summary object
Usage
## S3 method for class 'sda_fit_summary'
print(x, ...)
Arguments
x |
An |
... |
Unused. |
Value
Invisibly returns x. Called primarily for its side effect of
printing a human-readable summary of the SDA results to the console.
Propensity-weighted ESS balance diagnostic
Description
For each covariate in X_balance, computes the unweighted and
propensity-weighted ODA ESS association with group, the delta ESS
(weighted minus unweighted), and a bootstrap confidence interval on the
delta.
Usage
propensity_ess_balance(
propensity_fit,
group,
X_balance,
x_prop = NULL,
newdata = NULL,
target_class = NULL,
adjusted = TRUE,
n_boot = 500L,
boot_alpha = 0.05,
seed = NULL
)
Arguments
propensity_fit |
An |
group |
Integer (or coercible) binary group/treatment vector of length
|
X_balance |
Data frame of baseline covariates. Must have |
x_prop |
Numeric vector of length |
newdata |
Data frame with |
target_class |
Integer. Passed to
|
adjusted |
Logical. If |
n_boot |
Integer. Number of bootstrap resamples. Default 500L. |
boot_alpha |
Numeric in (0, 1). CI level is |
seed |
Integer or |
Details
If propensity weighting controls confounding, the weighted ODA ESS should
move toward 0 (the chance/null boundary). A negative delta_ess
means the ODA association was attenuated by weighting (improved balance).
crosses_null = TRUE means the bootstrap CI for the delta includes 0.
LORT (cta_ort) propensity models are not supported in this version.
Use a single cta_tree via cta_fit() instead.
The bootstrap uses plug-in propensity weights: weights computed on the full data are reused in each resample rather than re-estimating the propensity model. This is appropriate for assessing sampling variability in the balance diagnostic given a fixed propensity model.
oda_balance_table is called with mcarlo = FALSE; MC
p-values are not computed during bootstrap iterations.
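The plug-in bootstrap described above can be sketched in base R under several loudly stated assumptions: best_ess() is a simplified single-cut stand-in for a binary ODA fit, the weights are ordinary inverse-probability weights from a logistic regression (not the output of oda_propensity_weights), and the interval is a plain percentile interval. None of this is the package's implementation.

```r
# Best weighted ESS (%) over all midpoint cuts and both rule directions.
# For a binary outcome, 100 * (mean sensitivity - 0.5) / 0.5 reduces to
# 100 * (s0 + s1 - 1); abs() covers the reversed direction.
best_ess <- function(x, g, w = rep(1, length(x))) {
  cuts <- sort(unique(x))
  cuts <- (head(cuts, -1) + tail(cuts, -1)) / 2
  ess <- vapply(cuts, function(ct) {
    s1 <- sum(w[g == 1 & x > ct]) / sum(w[g == 1])
    s0 <- sum(w[g == 0 & x <= ct]) / sum(w[g == 0])
    100 * abs(s0 + s1 - 1)
  }, numeric(1))
  max(ess)
}

set.seed(1)
n <- 200
x <- rnorm(n)                          # covariate that drives group assignment
g <- rbinom(n, 1, plogis(1.2 * x))

# Plug-in weights: estimated once on the full data, then reused (assumed IPW).
p_hat <- fitted(glm(g ~ x, family = binomial))
w <- ifelse(g == 1, 1 / p_hat, 1 / (1 - p_hat))

delta_hat <- best_ess(x, g, w) - best_ess(x, g)

n_boot <- 200
boot_alpha <- 0.05
delta_boot <- replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)  # resample rows; weights travel along
  best_ess(x[idx], g[idx], w[idx]) - best_ess(x[idx], g[idx])
})
ci <- quantile(delta_boot, c(boot_alpha / 2, 1 - boot_alpha / 2), names = FALSE)
crosses_null <- ci[1] <= 0 && ci[2] >= 0
```

Because the weights are indexed by idx rather than re-estimated inside the loop, the interval reflects sampling variability of the diagnostic for a fixed propensity model, which is the design choice the Details section describes.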
Value
A data.frame of class
c("propensity_ess_balance", "data.frame") with one row per
covariate and columns:
- variable
Covariate name.
- n
Effective sample size from the unweighted ODA fit.
- unweighted_ess
Unweighted ODA ESS (%).
- weighted_ess
Propensity-weighted ODA ESS / WESS (%).
- delta_ess
weighted_ess - unweighted_ess. Negative values indicate attenuation (improved balance).
- boot_low
Lower bound of the bootstrap CI on delta_ess.
- boot_high
Upper bound of the bootstrap CI on delta_ess.
- crosses_null
Logical. TRUE when the CI includes 0.
- status
"ok", "inadmissible_unweighted", "inadmissible_weighted", or "inadmissible_both".
See Also
oda_propensity_weights,
cta_propensity_weights,
oda_balance_table
Examples
set.seed(1L)
n <- 80L
group <- c(rep(0L, 40L), rep(1L, 40L))
x_pv <- c(rnorm(40, 0), rnorm(40, 3))
prop_fit <- oda_fit(x = x_pv, y = group)
X_bal <- data.frame(age = c(rnorm(40, 45), rnorm(40, 55)),
score = rnorm(80))
peb <- propensity_ess_balance(prop_fit, group, X_bal,
x_prop = x_pv, n_boot = 50L, seed = 1L)
print(peb[, c("variable", "unweighted_ess", "weighted_ess",
"delta_ess", "crosses_null")])
Construct an sda_anchor object
Description
Low-level constructor. Prefer as_sda_anchor when converting
from an sda_fit. Use this constructor when building an explicit /
manual anchor from pre-specified fields (e.g. from a published attribute
ordering).
Usage
sda_anchor(
anchor_type = "explicit",
source_class = NULL,
source_call = NULL,
group_levels = NULL,
selected_attributes,
candidate_universe = NULL,
stage_table,
branch_candidate_map = NULL,
removal_history = NULL,
weights_used = FALSE,
weight_summary = NULL,
loo_mode = NULL,
mc_iter = NULL,
mc_seed = NULL,
mindenom = NULL,
alpha = NULL,
stop_reason = NA_character_,
reproducibility_notes = character(0),
canon_notes = character(0),
task_hook = .sda_anchor_task_hook()
)
Arguments
anchor_type |
Character scalar: |
source_class |
Character vector: class of the source object, or
|
source_call |
Language object or |
group_levels |
Integer vector of class/group levels, or |
selected_attributes |
Non-empty character vector of selected attribute names in stage order. |
candidate_universe |
Character vector of all attributes evaluated, or
|
stage_table |
Data frame with at least columns |
branch_candidate_map |
Named list for SORT branch-level candidates, or
|
removal_history |
List of per-step removal records, or |
weights_used |
Logical. |
weight_summary |
List or |
loo_mode |
Character scalar or |
mc_iter |
Integer or |
mc_seed |
Integer or |
mindenom |
Integer or |
alpha |
Numeric or |
stop_reason |
Character scalar or |
reproducibility_notes |
Character vector. |
canon_notes |
Character vector. |
task_hook |
List. Machine-readable metadata for future agent/pipeline
consumers. Defaults to the standard anchor task hook (see
|
Details
An sda_anchor is a typed structural object that carries SDA
selection history for future SORT (staged CTA) workflows. It is not a
fitting object and does not estimate propensity scores.
What an SDA anchor is not:
It is not a propensity-score estimator. SDA produces stage order and selected attributes, not a propensity stratification.
It is not an implementation of SORT or GORT. Both remain future reserved workflows.
Explicit / manual anchors are not SDA-derived and must be labeled
anchor_type = "explicit".
Task hook:
The default task_hook marks implementation_status =
"anchor_only_no_sort", lists prohibited_downstream =
c("propensity_weighting", "fraud_demo"), and requires human review.
Value
Object of class c("sda_anchor", "list").
See Also
as_sda_anchor, validate_sda_anchor,
sda_fit
Return the candidate table from one or all SDA steps
Description
The candidate table is the primary auditability record: one row per candidate attribute evaluated at a step, showing ESS, p-value, eligibility, and why a candidate was rejected or selected.
Usage
sda_candidate_table(fit, step = NULL)
Arguments
fit |
An |
step |
Integer step index, or |
Value
If step is an integer: the candidate table data frame for
that step (with an added step_id column). If step = NULL:
a named list of candidate table data frames, one per step.
Run a Structural Decomposition Analysis (SDA) procedure
Description
Executes staged attribute-set identification on binary class data. Traverses the attribute space by class, selecting the best eligible attribute at each step, removing correctly classified observations, and repeating on the unresolved sample until a stopping condition is met. The result identifies which attributes to pass to downstream CTA or MDSA.
Usage
sda_fit(
X,
y,
mode = c("novometric_min_d", "unioda_max_ess"),
attr_types = NULL,
weights = NULL,
mindenom = NULL,
mc_iter = 5000L,
mc_seed = 42L,
mc_stop = 99.9,
mc_stopup = NA,
alpha = 0.05,
loo = "off",
max_steps = NULL,
min_n = NULL,
min_class_n = NULL,
remove_correct = TRUE,
collinearity = c("skip", "warn", "allow"),
verbose = FALSE
)
Arguments
X |
Data frame of candidate attribute columns. |
y |
Integer class vector. Must have exactly two distinct values. |
mode |
SDA mode. |
attr_types |
Named character vector of attribute types
( |
weights |
Case weights. Must be |
mindenom |
Integer MINDENOM (novometric mode only; ignored with warning in unioda_max_ess mode). |
mc_iter |
Maximum Monte Carlo iterations per attribute fit. Default 5000L. |
mc_seed |
RNG seed set once before the SDA run. Default 42L. |
mc_stop |
Lower-tail early-stop confidence (percent). Default 99.9. |
mc_stopup |
Upper-tail early-stop confidence (percent). Default NA (disabled; matches MegaODA behavior). |
alpha |
Significance threshold for p-value gate. Default 0.05. |
loo |
LOO mode passed to |
max_steps |
Maximum number of SDA steps (safety cap). Default |
min_n |
Minimum working-sample size. If unresolved n drops below this,
stop with |
min_class_n |
Minimum per-class count. Stop with |
remove_correct |
Logical. If |
collinearity |
How to handle duplicate candidate columns:
|
verbose |
Logical. Emit |
Value
Object of class c("sda_fit", "odacore_sda").
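The select, remove, and repeat loop described for sda_fit can be sketched in base R. This is a deliberately simplified illustration: best_rule() is a single-cut stand-in for a UniODA fit, selection is by maximum ESS only, and there is no Monte Carlo p-value gate, MINDENOM, LOO, or collinearity handling. The names best_rule() and toy_sda() are hypothetical.

```r
# Best single-cut rule for one attribute (stand-in for a UniODA fit).
best_rule <- function(x, y) {
  cuts <- sort(unique(x))
  cuts <- (head(cuts, -1) + tail(cuts, -1)) / 2
  best <- list(ess = -Inf)
  for (ct in cuts) for (hi in 0:1) {
    pred <- ifelse(x > ct, hi, 1L - hi)
    sens <- c(mean(pred[y == 0] == 0), mean(pred[y == 1] == 1))
    ess <- 100 * (mean(sens) - 0.5) / 0.5
    if (ess > best$ess) best <- list(ess = ess, cut = ct, hi = hi, pred = pred)
  }
  best
}

toy_sda <- function(X, y, max_steps = ncol(X), min_class_n = 2L) {
  keep <- rep(TRUE, nrow(X))     # TRUE = still unresolved
  pool <- names(X)
  selected <- character(0)
  while (length(selected) < max_steps && length(pool) > 0 &&
         all(table(factor(y[keep], levels = 0:1)) >= min_class_n)) {
    fits <- lapply(pool, function(a) best_rule(X[[a]][keep], y[keep]))
    ess <- vapply(fits, function(f) f$ess, numeric(1))
    if (max(ess) <= 0) break                 # no attribute beats chance
    win <- which.max(ess)
    selected <- c(selected, pool[win])
    correct <- fits[[win]]$pred == y[keep]
    keep[which(keep)[correct]] <- FALSE      # remove correctly classified cases
    pool <- pool[-win]                       # each attribute is used at most once
  }
  list(selected_attributes = selected, n_unresolved = sum(keep))
}

set.seed(42)
y <- rep(0:1, each = 30)
X <- data.frame(strong = y * 2 + rnorm(60),
                weak   = y * 0.5 + rnorm(60),
                noise  = rnorm(60))
res <- toy_sda(X, y)
```

Each pass works only on the cases the previous steps misclassified, which is why the working sample shrinks and the loop needs stopping conditions such as a per-class minimum count.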
Return the selected attribute names from an SDA procedure result
Description
Returns the names of attributes selected across all SDA steps, in step order. This is the constrained candidate set to pass to MDSA/CTA.
Usage
sda_selected_attributes(fit)
Arguments
fit |
An |
Value
Character vector of selected attribute names (length = number of completed SDA steps). Empty character vector if no steps completed.
Return a summary table of SDA steps
Description
One row per completed SDA step. Columns cover the key auditability fields needed to review what was selected, why, and how the working sample changed.
Usage
sda_step_table(fit)
Arguments
fit |
An |
Value
Data frame with columns: step_id, attribute,
n_in, n_correct, n_incorrect, ess, d,
p_mc, mindenom.
Prepare X and y for CTA using SDA-selected attributes
Description
Returns a named list with elements X_cta and y_cta, where X_cta contains
only the SDA-selected attribute columns and y_cta is the full
outcome vector (all observations, not just unresolved).
Usage
sda_to_cta_data(fit, X, y)
Arguments
fit |
An |
X |
Data frame of predictors (all observations). |
y |
Integer class vector (all observations). |
Details
This matches the Path B workflow from MPE Chapter 12: SDA identifies the attribute subset; MDSA/CTA receives the full sample with a constrained candidate frame. SDA resolution does not restrict which observations CTA sees.
Value
Named list with elements X_cta (data frame, selected columns
only) and y_cta (integer vector, full length).
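The shape of the returned list can be reproduced by hand in base R. In this sketch the character vector selected is a hard-coded stand-in for the result of sda_selected_attributes(fit); no SDA fit is actually run.

```r
# Stand-in for sda_selected_attributes(fit); chosen by hand for illustration.
selected <- c("wt", "hp")

X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)

# Constrain the columns, keep every row: CTA still sees the full sample.
cta_data <- list(X_cta = X[, selected, drop = FALSE], y_cta = y)
```

Using drop = FALSE keeps X_cta a data frame even when only one attribute was selected.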
Conventional SMD companion table for covariate balance
Description
Computes standardized mean differences (SMD) between two groups for each
covariate in X. Returns one row per covariate with group means,
standard deviations, raw and absolute SMD, and conventional balance
thresholds.
Usage
smd_balance_table(group, X, w = NULL)
Arguments
group |
Integer (or coercible) binary group indicator. Must have exactly two distinct non-missing values. |
X |
Data frame of baseline covariate columns. |
w |
Optional numeric case-weight vector. When supplied, weighted
group means ( |
Details
SMD is a conventional companion diagnostic, not the oda
balance objective. The primary oda balance assessment uses
oda_balance_table. This function is intended for comparison
with non-ODA balance reports.
No p-values are computed. SMD is a descriptive statistic. For a variable
with zero within-group variance in both groups, smd is NA.
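For orientation, a common SMD convention can be written in a few lines of base R. The sketch assumes the pooled-SD form SMD = (mean_1 - mean_0) / sqrt((sd_0^2 + sd_1^2) / 2); whether smd_balance_table uses exactly this denominator is not stated in this manual, and smd_one() is a hypothetical helper.

```r
# Assumed convention: difference in means over the root mean of the two variances.
smd_one <- function(x, group) {
  x0 <- x[group == 0]
  x1 <- x[group == 1]
  pooled <- sqrt((var(x0) + var(x1)) / 2)
  if (is.na(pooled) || pooled == 0) return(NA_real_)  # zero variance in both groups
  (mean(x1) - mean(x0)) / pooled
}

group <- c(rep(0L, 4), rep(1L, 4))
age <- c(40, 42, 44, 46, 50, 52, 54, 56)

smd <- smd_one(age, group)
balanced_010 <- abs(smd) < 0.10
balanced_020 <- abs(smd) < 0.20
constant_smd <- smd_one(rep(1, 8), group)   # NA: no within-group variance
```

For this toy data the group means differ by 10 and each group has variance 20/3, so the SMD equals sqrt(15), far above both conventional thresholds.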
Value
A data.frame of class c("smd_balance_table",
"data.frame") with one row per covariate and columns:
attribute, n_group_0, n_group_1,
mean_0, sd_0, mean_1, sd_1,
smd, abs_smd,
balanced_020 (abs_smd < 0.20),
balanced_010 (abs_smd < 0.10),
wmean_0, wmean_1, wsmd, wabs_smd,
wbalanced_020, wbalanced_010
(weighted variants; NA when w = NULL).
See Also
oda_balance_table, oda_balance_plot_data
Examples
group <- c(rep(0L, 30), rep(1L, 30))
X <- data.frame(age = c(rep(45, 30), rep(55, 30)),
score = rnorm(60, 50, 10))
smd_balance_table(group, X)
Summarise a CTA descendant family
Description
Returns a structured S3 object summarising the CTA descendant family. All values are read from stored fields - no refitting or recomputation is performed.
Usage
## S3 method for class 'cta_family'
summary(object, ...)
Arguments
object |
A |
... |
Unused; included for S3 compatibility. |
Value
summary.cta_family returns a list of class
c("cta_family_summary", "list") with fields:
- n_members
Integer number of family members.
- min_d_idx
Integer index of the feasible member with minimum D; NA_integer_ if no feasible member exists.
- terminated
Logical; always TRUE for a completed chain.
- termination_reason
Character: one of "no_tree", "max_steps", "no_next_mindenom".
- has_weights
Logical; TRUE when any family member used case weights.
- table
A data.frame from cta_family_table.
See Also
cta_descendant_family, cta_family_table
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
fam <- suppressMessages(
cta_descendant_family(X, y, start_mindenom = 1L, mc_iter = 200L,
mc_seed = 42L, loo = "off")
)
s <- summary(fam)
print(s)
print(fam)
Summary method for Locally Optimal Recursive Tree (LORT)
Description
Returns a structured list of class "cta_ort_summary" capturing
tree-level metadata for the composite LORT.
Usage
## S3 method for class 'cta_ort'
summary(object, ...)
Arguments
object |
A |
... |
Unused. |
Value
A list of class "cta_ort_summary".
Note
summary.cta_ort is a legacy compatibility name for the LORT
method. See print.cta_ort for the naming note.
Summarize a fitted CTA tree
Description
Returns a structured list with class "cta_tree_summary" capturing
tree-level metadata. All fields are read directly from stored objects;
no refitting or prediction is performed.
Usage
## S3 method for class 'cta_tree'
summary(object, ...)
Arguments
object |
A |
... |
Unused. |
Value
A list of class "cta_tree_summary" with fields:
- status
Character: "valid_tree", "stump", or "no_tree".
- no_tree
Logical; TRUE for leaf-only fits.
- root_attribute
Character attribute name at the root split; NA_character_ for no-tree fits.
- n_nodes
Total number of nodes including leaves.
- n_splits
Number of non-leaf (split) nodes.
- n_leaves
Number of terminal leaf endpoints (= strata).
- strata
Alias for n_leaves; NA_integer_ for no-tree fits.
- overall_ess
WESS when weights are active, ESS otherwise; NA_real_ when absent.
- d
D statistic (NA_real_ for no-tree or ESS <= 0).
- min_terminal_denom
Smallest leaf n_obs; NA_integer_ for no-tree fits.
- endpoint_denominators
Named integer vector of leaf n_obs; integer(0) for no-tree fits.
- has_weights
Logical; TRUE when case weights are active.
- mindenom
MINDENOM used when fitting.
- alpha_split
Significance threshold used when fitting.
- prune_alpha
Pruning threshold used when fitting.
- loo
LOO mode string used when fitting.
See Also
oda_cta_fit, cta_node_table,
cta_strata, cta_d_stat,
print.cta_tree_summary
Examples
data(mtcars)
X <- mtcars[, c("cyl", "disp", "hp", "wt")]
y <- as.integer(mtcars$am)
tree <- oda_cta_fit(X, y, mindenom = 5L, mc_iter = 500L, mc_seed = 42L)
s <- summary(tree)
print(s)
Summarize a fitted ODA model
Description
Returns a structured list with class "oda_fit_summary" exposing train
and LOO sections. Does not recompute any quantities; fields absent from the
fit appear as NA or NULL.
Usage
## S3 method for class 'oda_fit'
summary(object, ...)
Arguments
object |
An |
... |
Unused. |
Value
A list of class "oda_fit_summary".
Summarise an sda_anchor
Description
Returns a named list with the key structural fields needed to audit the anchor or pass it to future SORT / staged-CTA pipelines.
Usage
## S3 method for class 'sda_anchor'
summary(object, ...)
Arguments
object |
An |
... |
Ignored. |
Value
Named list with fields: anchor_type, n_stages,
selected_attributes, candidate_universe,
group_levels, stop_reason, weights_used,
loo_mode, mc_iter, mc_seed, mindenom,
alpha, stage_table, canon_notes,
implementation_status, safety_notes.
Summarise an sda_fit object
Description
Summarise an sda_fit object
Usage
## S3 method for class 'sda_fit'
summary(object, ...)
Arguments
object |
An |
... |
Unused. |
Value
An object of class "sda_fit_summary" (a list) with elements:
mode, n_initial, n_final_unresolved,
stop_reason, selected_attributes, step_table
(data.frame), and settings.
Validate an sda_anchor object
Description
Checks that all required fields are present and well-formed. Errors clearly on any violation so that downstream SORT / staged-CTA code can rely on the contract.
Usage
validate_sda_anchor(anchor, strict = TRUE)
Arguments
anchor |
An object to validate. |
strict |
Logical (default |
Value
anchor invisibly (on success).
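The "error clearly on violation, return invisibly on success" contract can be illustrated with a small base-R validator. check_toy_anchor() is hypothetical, and it checks only three fields named in the sda_anchor constructor (anchor_type, selected_attributes, stage_table); the checks performed by validate_sda_anchor, and the stage_table column names used here, are assumptions.

```r
# Hypothetical mini-validator; the real checks may differ.
check_toy_anchor <- function(anchor) {
  stopifnot(is.list(anchor))
  required <- c("anchor_type", "selected_attributes", "stage_table")
  missing_fields <- setdiff(required, names(anchor))
  if (length(missing_fields) > 0) {
    stop("anchor is missing field(s): ", paste(missing_fields, collapse = ", "))
  }
  if (!is.character(anchor$selected_attributes) ||
      length(anchor$selected_attributes) == 0) {
    stop("selected_attributes must be a non-empty character vector")
  }
  if (!is.data.frame(anchor$stage_table)) {
    stop("stage_table must be a data frame")
  }
  invisible(anchor)  # success: hand the object back for chaining
}

# stage_table columns below are illustrative only.
good <- list(anchor_type = "explicit",
             selected_attributes = c("age", "score"),
             stage_table = data.frame(stage = 1:2,
                                      attribute = c("age", "score")))
ok <- check_toy_anchor(good)

bad <- good
bad$selected_attributes <- character(0)
msg <- tryCatch(check_toy_anchor(bad), error = function(e) conditionMessage(e))
```

Failing fast with a specific message lets downstream code assume the fields exist and are well-formed without re-checking them.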