---
title: "Binary ODA: Gully Erosion Adjustment and Motivation"
author: "oda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Binary ODA: Gully Erosion Adjustment and Motivation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Research question

A study of gully erosion in rural southeast Nigeria documented the type of
adjustment made by community members in response to erosion hazards, and asked
whether adjustment type could distinguish individually-motivated from
community-motivated responses.^[Okuh D, Osumgborogwu IE (2019). Adjustments to
hazards of gully erosion in rural southeast Nigeria: A case of Amacha
communities. *Applied Ecology and Environmental Sciences*, 7, 11-20.]

Four adjustment types were recorded: (1) Use of Ridges, (2) Shifting
Habitation, (3) Relocation, and (4) Intensified Cultivation. Because Ridges
and Shifting Habitation require collective action, whereas Relocation and
Intensified Cultivation can be undertaken individually, the hypothesis is that
Ridges and Shifting predict Community motivation and that Relocation and
Intensified Cultivation predict Individual motivation.
Optimal Data Analysis (UniODA) tests whether adjustment type discriminates
motivation and quantifies the strength of the association.

## Data

Motivation (0 = Individual, 1 = Community) is the class variable; adjustment
type (1-4) is the attribute. Published cell frequencies are reconstructed
directly into observation-level vectors, so no external data file is required.

```{r data}
library(oda)

# Cross-classification: rows = adjustment type, cols = motivation.
#                    Indiv (0)  Comm (1)   total
#  Ridges       (1)      85       173       258
#  Shifting     (2)      65       170       235
#  Relocation   (3)     172        10       182
#  Intensified  (4)      45         0        45
#                       367       353       720

motivation  <- c(rep(0L,  85), rep(1L, 173),   # adjustment = 1
                 rep(0L,  65), rep(1L, 170),   # adjustment = 2
                 rep(0L, 172), rep(1L,  10),   # adjustment = 3
                 rep(0L,  45), rep(1L,   0))   # adjustment = 4
adjustment  <- c(rep(1L, 258), rep(2L, 235),
                 rep(3L, 182), rep(4L,  45))

table(adjustment, motivation,
      dnn = c("Adjustment (1=Ridges,2=Shifting,3=Relocation,4=Intensified)",
              "Motivation (0=Individual, 1=Community)"))
```
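
As an optional cross-check, the sketch below rebuilds the observation-level data
a second way, by expanding a count matrix, and confirms that the margins match
the published table. It uses base R only; the names `counts`, `cells`, `obs`,
and `rebuilt` are illustrative and are not part of the `oda` package.

```{r data-check}
# Published cell counts: rows = adjustment type (1-4), cols = motivation (0/1)
counts <- matrix(c( 85L, 173L,
                    65L, 170L,
                   172L,  10L,
                    45L,   0L),
                 nrow = 4L, byrow = TRUE,
                 dimnames = list(adjustment = 1:4, motivation = 0:1))

# One row per cell, then repeat each row as many times as its count
cells <- as.data.frame(as.table(counts), responseName = "n")
obs   <- cells[rep(seq_len(nrow(cells)), times = cells$n),
               c("adjustment", "motivation")]

# The expanded data should reproduce the published table and its margins
rebuilt <- table(obs$adjustment, obs$motivation)
stopifnot(all(rebuilt == counts))
rowSums(counts)   # adjustment totals: 258 235 182 45
colSums(counts)   # motivation totals: 367 353
sum(counts)       # 720 observations
```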

## Fit the ODA model

Adjustment type is a four-category nominal variable. ODA searches all possible
binary partitions of the four categories (seven distinct splits) and selects the
partition that maximises ESS. Although the research question implies a
direction, no *a priori* direction is supplied to the software; the search is
nondirectional (`Hypothesis: NONDIRECTIONAL` in MegaODA output). Leave-one-out
(LOO) jackknife validity analysis is included.

```{r fit-canonical, eval=FALSE}
# Canonical reference run (mc_iter = 25000L; not evaluated in CRAN vignette)
fit <- oda_fit(
  x         = adjustment,
  y         = motivation,
  attr_type = "categorical",
  mc_iter   = 25000L,
  loo       = "on"
)
```

```{r fit}
# CRAN-safe run: mc_iter = 500L for vignette rendering speed.
# Training rule, ESS, and confusion matrix are identical to the canonical run.
fit <- oda_fit(
  x         = adjustment,
  y         = motivation,
  attr_type = "categorical",
  mc_iter   = 500L,
  mc_seed   = 42L,
  loo       = "on"
)
```
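
The exhaustive search described above can be sketched in base R by brute force.
This is an illustration, not the package's implementation: it enumerates every
non-constant assignment of the four categories to a predicted class and scores
each with the binary ESS formula. The names `counts`, `rules`, `ess_for_rule`,
`ess`, and `best` are hypothetical helpers introduced for this sketch.

```{r partition-search}
counts <- matrix(c(85, 173, 65, 170, 172, 10, 45, 0), nrow = 4, byrow = TRUE,
                 dimnames = list(adjustment = 1:4, motivation = c("0", "1")))

# All assignments of the four categories to a predicted class (0 or 1),
# minus the two constant rules: 2^4 - 2 = 14 rules (7 partitions x 2 labelings)
rules <- as.matrix(expand.grid(rep(list(0:1), 4)))
rules <- rules[rowSums(rules) %in% 1:3, , drop = FALSE]
colnames(rules) <- paste0("adj", 1:4)

# ESS for a binary class variable: mean PAC rescaled so that chance = 0 and
# perfect classification = 100, which simplifies to 100 * (PAC_0 + PAC_1 - 1)
ess_for_rule <- function(pred) {
  pac0 <- sum(counts[pred == 0, "0"]) / sum(counts[, "0"])
  pac1 <- sum(counts[pred == 1, "1"]) / sum(counts[, "1"])
  100 * (pac0 + pac1 - 1)
}

ess  <- apply(rules, 1, ess_for_rule)
best <- rules[which.max(ess), ]
best                 # predicted class for adjustment types 1-4
round(max(ess), 2)   # ESS of the best rule
```

If the sketch agrees with `oda_fit()`, `best` assigns types 1 and 2 to class 1
and types 3 and 4 to class 0.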

## Rule and confusion matrix

```{r print-fit}
print(fit)
```

ODA's nondirectional search identified the optimal binary partition:

- If adjustment in {1, 2} (Ridges or Shifting) -> predict motivation = Community (1)
- If adjustment in {3, 4} (Relocation or Intensified) -> predict motivation = Individual (0)

This recovered mapping is substantively consistent with the adjustment/motivation
hypothesis: adjustments requiring collective action (Ridges, Shifting) predict
Community motivation; adjustments undertaken individually (Relocation, Intensified
Cultivation) predict Individual motivation.

```{r confusion}
# Confusion matrix: actual motivation (rows) x predicted motivation (cols)
conf_mat <- matrix(
  c(fit$confusion$TN, fit$confusion$FP,
    fit$confusion$FN, fit$confusion$TP),
  nrow = 2L, byrow = TRUE,
  dimnames = list(Actual    = c("Indiv(0)", "Comm(1)"),
                  Predicted = c("Indiv(0)", "Comm(1)"))
)
print(conf_mat)
```

## ESS / PAC / PV interpretation

```{r metrics}
summary(fit)
```

```{r pv}
# Predictive value: accuracy when the model makes a prediction into each class
pv_indiv <- fit$confusion$TN / (fit$confusion$TN + fit$confusion$FN)
pv_comm  <- fit$confusion$TP / (fit$confusion$TP + fit$confusion$FP)
cat("PV Individual (0):", round(pv_indiv * 100, 1), "%\n")
cat("PV Community  (1):", round(pv_comm  * 100, 1), "%\n")
```

- **PAC (sensitivity per class):** 59.1% for Individual-motivated cases (217 of
  367) and 97.2% for Community-motivated cases (343 of 353). Because 50% correct
  per class is expected by chance, the model classifies Community-motivated
  adjustments at nearly twice the chance rate, whereas Individual classification
  exceeds chance by only about 9 percentage points (18% in relative terms).
- **ESS = 56.30%** indicates a relatively strong effect.^[Yarnold, P.R., &
  Soltysik, R.C. (2005). *Optimal Data Analysis: A Guidebook with Software for
  Windows.* Washington, D.C.: APA Books.] The pronounced asymmetry (59% vs. 97%)
  arises because Ridges and Shifting were adopted by nearly all
  Community-motivated cases but also by 150 of the 367 Individual-motivated
  cases.
- **PV:** When the model predicts Individual motivation it is correct ~95.6% of
  the time; when it predicts Community motivation it is correct ~69.6% of the
  time.
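
The figures in the list above can be reproduced by hand from the published cell
counts. The sketch below assumes the rule {1,2} -> Community, {3,4} ->
Individual and uses base R only; the `*_hand` names are illustrative and are
chosen so that they do not overwrite objects created earlier.

```{r metrics-by-hand}
# Confusion counts implied by the rule, taken from the published table
TN <- 172 + 45    # Individual, predicted Individual (Relocation + Intensified)
FP <-  85 + 65    # Individual, predicted Community  (Ridges + Shifting)
FN <-  10 +  0    # Community,  predicted Individual (Relocation + Intensified)
TP <- 173 + 170   # Community,  predicted Community  (Ridges + Shifting)

# PAC: share of each actual class that is classified correctly
pac_hand <- c(Individual = TN / (TN + FP), Community = TP / (TP + FN))

# ESS: mean PAC rescaled so that chance (50%) = 0 and perfect = 100
ess_hand <- (mean(pac_hand) - 0.5) / 0.5 * 100

# PV: share of each predicted class that is actually correct
pv_hand <- c(Individual = TN / (TN + FN), Community = TP / (TP + FP))

round(100 * pac_hand, 1)   # 59.1 97.2
round(ess_hand, 2)         # 56.3
round(100 * pv_hand, 1)    # 95.6 69.6
```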

## Monte Carlo and LOO validity

The MC p-value and LOO result are shown in the `summary` output above.

- **LOO stability:** Each LOO fold independently searches for the optimal
  nondirectional categorical partition on n - 1 observations. Because the
  {1,2} -> 1 / {3,4} -> 0 mapping is optimal by a wide margin, removing a single
  observation does not change it, so every fold recovers the same rule. LOO ESS
  therefore equals training ESS (56.30%) and the LOO confusion matrix equals the
  training confusion matrix.
- **LOO Fisher exact p < .001:** Statistical significance confirmed in holdout
  analysis.
- **MC p-value:** Each permutation evaluates the best nondirectional partition
  of the permuted labels, yielding a nondirectional Fisher-randomization p-value.
  The `p(MC)` shown is expected to be very small (MegaODA reports p = 0.000000
  at 25000 iterations); the 500-iteration CRAN run may show p = 0.0, which
  should be read as p < 0.002 at best, because 500 permutations cannot resolve
  smaller values. Use `mc_iter = 25000L` for publication results.
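
The LOO stability claim can be checked without the package. The sketch below is
an illustration under two assumptions: observations in the same cell are
interchangeable, so one refit per non-empty cell covers all 720 folds, and the
best rule assigns each category to the class in which it holds the larger share
(valid because binary ESS is a sum of per-category terms). `best_rule`,
`train_rule`, `fold_tab`, and `loo_conf` are hypothetical names.

```{r loo-check}
counts <- matrix(c(85, 173, 65, 170, 172, 10, 45, 0), nrow = 4, byrow = TRUE,
                 dimnames = list(adjustment = 1:4, motivation = c("0", "1")))

# Predicted class per category: 1 if the category's share of class 1
# exceeds its share of class 0, otherwise 0
best_rule <- function(tab) {
  as.integer(tab[, "1"] / sum(tab[, "1"]) > tab[, "0"] / sum(tab[, "0"]))
}

train_rule <- best_rule(counts)
loo_conf   <- matrix(0, 2, 2, dimnames = list(Actual    = c("0", "1"),
                                              Predicted = c("0", "1")))
same_rule  <- TRUE

for (a in 1:4) {
  for (m in c("0", "1")) {
    if (counts[a, m] == 0) next
    fold_tab       <- counts
    fold_tab[a, m] <- fold_tab[a, m] - 1        # leave one observation out
    fold_rule      <- best_rule(fold_tab)       # refit on n - 1 observations
    same_rule      <- same_rule && identical(fold_rule, train_rule)
    pred           <- as.character(fold_rule[a])
    # Every observation in this cell yields the same fold result
    loo_conf[m, pred] <- loo_conf[m, pred] + counts[a, m]
  }
}

same_rule   # TRUE if every fold recovers the training rule
loo_conf    # LOO confusion matrix (actual x predicted)
```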

## Notes on reproducibility

**Fixture parity.** The training rule, confusion matrix, and ESS are verified
against MegaODA.exe output in the package test suite
(`tests/testthat/test-fixture-vignettes.R`, Example 2).

**MC p-value calibration.** The MC p shown here reflects `mc_iter = 500L`
for CRAN build speed. MegaODA reports p = 0.000000 (exact zero) at 25000
iterations; with 500 iterations a near-zero p will still be reported
accurately (STOP fires early). Use the canonical run with `mc_iter = 25000L`
(chunk `fit-canonical`, `eval=FALSE`) for publication-quality results.
Training ESS and confusion matrix are unaffected by `mc_iter`.

**Nondirectional search.** No `direction_map` is supplied. ODA evaluates all
possible binary partitions of the four adjustment categories and selects the
one that maximises ESS. The MC permutation test is nondirectional: each
permutation selects the best partition for the permuted labels. This matches
the MegaODA.exe gold run (`Hypothesis: NONDIRECTIONAL`).
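
A minimal base-R sketch of such a nondirectional permutation test follows. It
is not the package's algorithm: `n_iter` merely plays the role of `mc_iter`,
the seed is arbitrary, the p-value is the plain proportion of permutations that
reach the observed ESS, and no early stopping is applied. `adj_obs`, `mot_obs`,
`best_ess`, `null_ess`, and `p_mc` are hypothetical names.

```{r mc-sketch}
# Observation-level data rebuilt from the published cell counts
adj_obs <- rep(1:4, times = c(258, 235, 182, 45))
mot_obs <- rep(rep(0:1, 4), times = c(85, 173, 65, 170, 172, 10, 45, 0))

# Best nondirectional ESS: binary ESS is a sum of per-category terms, so the
# optimum assigns each category to the class in which its share is larger
best_ess <- function(adj, mot) {
  tab <- table(factor(adj, levels = 1:4), factor(mot, levels = 0:1))
  100 * (sum(pmax(tab[, 1] / sum(tab[, 1]), tab[, 2] / sum(tab[, 2]))) - 1)
}

observed <- best_ess(adj_obs, mot_obs)

set.seed(42)       # arbitrary seed, for a reproducible illustration
n_iter   <- 500L
# Shuffle the class labels and re-run the full partition search each time
null_ess <- replicate(n_iter, best_ess(adj_obs, sample(mot_obs)))

p_mc <- mean(null_ess >= observed)
round(observed, 2)   # observed ESS
p_mc                 # Monte Carlo p-value; resolution is 1 / n_iter
```

With `n_iter = 500` the smallest nonzero value this estimator can return is
0.002, so a result of 0 only indicates that the p-value lies below that
resolution.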

**Optional constrained analysis.** A researcher with an *a priori* hypothesis
specifying exactly which categories predict which class can supply
`direction_map = c("1"=1L, "2"=1L, "3"=0L, "4"=0L)` for a fixed-partition
directional analysis (MPE Chapter 4 Phase 6C). For this dataset the two
analyses yield identical ESS and confusion matrices because the *a priori*
mapping happens to be the global optimum; they differ only in how the MC
p-value is interpreted (directional vs. nondirectional).
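
The difference between the two null distributions can be illustrated in base R
without calling the package. In the sketch below the directional statistic
scores the fixed rule {1,2} -> 1, {3,4} -> 0 on each permutation, while the
nondirectional statistic re-optimises the partition each time. All names
(`fixed_pred`, `ess_fixed`, `ess_best`, `perms`) are hypothetical, and the 200
permutations and seed are arbitrary choices for speed.

```{r mc-directional-sketch}
adj_obs <- rep(1:4, times = c(258, 235, 182, 45))
mot_obs <- rep(rep(0:1, 4), times = c(85, 173, 65, 170, 172, 10, 45, 0))

# A priori rule: types 1 and 2 predict class 1, types 3 and 4 predict class 0
fixed_pred <- as.integer(adj_obs %in% 1:2)

# ESS of the fixed rule for a given label vector
ess_fixed <- function(mot) {
  100 * (mean(fixed_pred[mot == 0] == 0) + mean(fixed_pred[mot == 1] == 1) - 1)
}

# ESS of the best partition for a given label vector
ess_best <- function(mot) {
  tab <- table(factor(adj_obs, levels = 1:4), factor(mot, levels = 0:1))
  100 * (sum(pmax(tab[, 1] / sum(tab[, 1]), tab[, 2] / sum(tab[, 2]))) - 1)
}

set.seed(42)
perms      <- replicate(200L, sample(mot_obs))   # 720 x 200 label matrix
null_fixed <- apply(perms, 2, ess_fixed)
null_best  <- apply(perms, 2, ess_best)

# Observed ESS is the same under both analyses for these data
c(fixed = ess_fixed(mot_obs), best = ess_best(mot_obs))

# The re-optimised statistic is never smaller than the fixed-rule statistic
all(null_best >= null_fixed - 1e-9)
summary(null_fixed)
summary(null_best)
```

Because the re-optimised ESS is at least as large as the fixed-rule ESS on
every permutation, the directional p-value can never exceed the nondirectional
one for the same observed ESS.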
