Appleton (1995) reported a clinical trial in which 67 patients experiencing migraine were randomised to one of two treatments, and the number of migraine attacks was recorded.1 Various conventional methods - Student’s t-test (including square-root and log transformations), the Mann-Whitney U-test, and a Poisson normal test - either failed to reach conventional significance or violated their underlying assumptions. The analyst ultimately discretised the count at 0 vs. >=1 and validated the split with a one-tailed Fisher’s exact test (p < 0.022).
Because ODA is invariant under any monotonic transformation and requires no distributional assumptions, it can analyse the raw count directly. ODA is used here to determine whether number of migraine attacks discriminates treatment arm, and to quantify the strength of the association.
Treatment arm (0 = Treatment 1, 1 = Treatment 2) is the class variable; number of migraine attacks (0-7, ordered) is the attribute. Published cell frequencies are reconstructed directly into observation-level vectors - no external data file is required.
library(oda)
# Cross-classification: rows = attacks (0-7), cols = treatment arm.
# T1 (0) T2 (1)
# 0 att: 13 5
# 1 att: 9 13
# 2 att: 4 6
# 3 att: 2 1
# 4 att: 1 2
# 5 att: 1 3
# 6 att: 3 3
# 7 att: 0 1
treatment <- c(
rep(0L, 13), rep(1L, 5), # attacks = 0
rep(0L, 9), rep(1L, 13), # attacks = 1
rep(0L, 4), rep(1L, 6), # attacks = 2
rep(0L, 2), rep(1L, 1), # attacks = 3
rep(0L, 1), rep(1L, 2), # attacks = 4
rep(0L, 1), rep(1L, 3), # attacks = 5
rep(0L, 3), rep(1L, 3), # attacks = 6
rep(0L, 0), rep(1L, 1) # attacks = 7
)
attacks <- c(
rep(0L, 18), rep(1L, 22), rep(2L, 10),
rep(3L, 3), rep(4L, 3), rep(5L, 4),
rep(6L, 6), rep(7L, 1)
)
table(attacks, treatment,
dnn = c("Migraine Attacks (0-7)", "Treatment (0=T1, 1=T2)"))
#> Treatment (0=T1, 1=T2)
#> Migraine Attacks (0-7) 0 1
#> 0 13 5
#> 1 9 13
#> 2 4 6
#> 3 2 1
#> 4 1 2
#> 5 1 3
#> 6 3 3
#> 7 0 1Number of attacks is an ordered integer; ODA scans it as an ordered
attribute (no categorical flag), consistent with the MegaODA reference
analysis. No directional hypothesis was specified a priori, so
the default nondirectional search (direction = "both") is
used. Leave-one-out (LOO) jackknife validity analysis is included.
print(fit)
#>
#> ODA (binary) attr_type=ordered priors=TRUE n=67
#>
#> Rule: <= 0.5 --> 0 | > 0.5 --> 1
#>
#> CLASS n PAC
#> 0 33 39.4%
#> 1 34 85.3%
#>
#> Mean PAC: 62.34% ESS: 24.69% p(MC): 0.096
#>
#> -- LOO --
#> CLASS n PAC
#> 0 33 39.4%
#> 1 34 85.3%
#>
#> LOO ESS: 24.69% p(LOO): 0.022ODA identified a single cut at 0.5, consistent with Appleton’s hand-chosen spline:
# Confusion matrix: actual treatment (rows) x predicted treatment (cols)
conf_mat <- matrix(
c(fit$confusion$TN, fit$confusion$FP,
fit$confusion$FN, fit$confusion$TP),
nrow = 2L, byrow = TRUE,
dimnames = list(Actual = c("T1(0)", "T2(1)"),
Predicted = c("T1(0)", "T2(1)"))
)
print(conf_mat)
#> Predicted
#> Actual T1(0) T2(1)
#> T1(0) 13 20
#> T2(1) 5 29summary(fit)
#>
#> ODA Summary (binary) status=valid n=67
#> attr_type=ordered priors=TRUE weights=FALSE
#> Rule: <= 0.5 --> 0 | > 0.5 --> 1
#>
#> -- Train --
#> Mean PAC (wt): 62.34% ESS: 24.69%
#> Sensitivity: 0.853 Specificity: 0.394
#> p(MC): 0.096 [MC permutation, two-tailed]
#> -- LOO --
#> CLASS n PAC
#> 0 33 39.4%
#> 1 34 85.3%
#> LOO ESS: 24.69%
#> LOO Mean PAC: 62.34%
#> p(LOO): 0.022 [Fisher exact (2x2), one-tailed]# Predictive value: accuracy when the model makes a prediction into each class
pv_t1 <- fit$confusion$TN / (fit$confusion$TN + fit$confusion$FN)
pv_t2 <- fit$confusion$TP / (fit$confusion$TP + fit$confusion$FP)
cat("PV Treatment 1 (0):", round(pv_t1 * 100, 1), "%\n")
#> PV Treatment 1 (0): 72.2 %
cat("PV Treatment 2 (1):", round(pv_t2 * 100, 1), "%\n")
#> PV Treatment 2 (1): 59.2 %The MC p-value and LOO result are shown in the summary
output above.
Why MC p and LOO p diverge for non-directional analyses: MC permutation p is more conservative than LOO Fisher p when the analysis is non-directional. The MC test accounts for the fact that training optimized over both directions (making the permutation baseline harder to beat); the LOO Fisher test does not apply that adjustment. Both values are valid for their respective purposes: MC p assesses training model significance with direction-search adjustment; LOO Fisher p assesses replication of the fixed training rule in held-out data. The divergence narrows or disappears when a directional hypothesis is declared a priori (see Notes).
Fixture parity. The training rule, confusion matrix,
and ESS are verified against MegaODA.exe output in the package test
suite (tests/testthat/test-fixture-vignettes.R, Example
4).
MC p-value calibration. The MC p shown here reflects
mc_iter = 500L for CRAN build speed and will differ from
MegaODA’s reported value (p = 0.086 at 25000 iterations). With only 500
permutations the estimate is noisy (Monte Carlo standard error ~1-2%).
Use the canonical run with mc_iter = 25000L (chunk
fit-canonical, eval=FALSE) for
publication-quality results. Training ESS and confusion matrix are
unaffected by mc_iter.
Directional analysis. The original analysis did not
specify a directional hypothesis a priori; the nondirectional
default (direction = "both") is therefore appropriate. If a
directional ordered hypothesis had been specified in advance (e.g., more
attacks predicts Treatment 2), direction = "greater" or
direction = "less" could be used to enforce MPE Chapter 2
binary ordered directional ODA and obtain a one-tailed p-value.