---
title: "Getting Started with NRMstatsML"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Getting Started with NRMstatsML}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 7,
  fig.height = 4,
  message   = FALSE,
  warning   = FALSE
)
```

## Introduction

**NRMstatsML** is an R package for statistical and machine learning-based
analysis of long-term Natural Resource Management (NRM) datasets. It is
designed for datasets spanning 10–20 years covering soil health, water,
crop yield, and climate variables.

The package is organised into seven core modules:

| Module | Functions | Purpose |
|--------|-----------|---------|
| `trendML` | `nrm_trend`, `nrm_mann_kendall`, `nrm_sens_slope`, `nrm_structural_break` | Trend detection and structural change |
| `multiSysML` | `nrm_multivariate`, `nrm_pls`, `nrm_sem` | Multivariate system modelling |
| `responseML` | `nrm_response_curve`, `nrm_optimize_input` | Yield-response and optimisation |
| `tsML` | `nrm_forecast`, `nrm_arima` | Time series forecasting |
| `panelML` | `nrm_panel`, `nrm_did` | Panel data and treatment effects |
| `uncertaintyML` | `nrm_uncertainty`, `nrm_bootstrap`, `nrm_monte_carlo` | Uncertainty quantification |
| `autoML` | `nrm_automl`, `nrm_benchmark` | Automated model selection |

---

## Installation

```{r install, eval = FALSE}
# From CRAN (once published)
install.packages("NRMstatsML")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("yourorg/NRMstatsML")
```

---

## Example Data

The package ships with `nrm_example`, a synthetic 20-year dataset that
mimics a long-term fertiliser experiment.

```{r load-data}
library(NRMstatsML)

data(nrm_example)
head(nrm_example)
```

Always start by validating the data:

```{r data-check}
nrm_data_check(nrm_example)
```
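
The checks that `nrm_data_check()` runs are not listed in this vignette. As a rough base-R illustration of the kind of validation that helps before a long-term analysis, the hypothetical helper `check_long_term()` below (not part of NRMstatsML) counts missing values per column and flags duplicated or missing years. It assumes one row per year, which may not hold for multi-treatment or multi-site data.

```{r sketch-data-check}
# Hypothetical helper (not part of NRMstatsML): basic sanity checks
# for a yearly dataset with one row per year.
check_long_term <- function(df, time_var = "year") {
  yrs <- df[[time_var]]
  full_range <- seq(min(yrs, na.rm = TRUE), max(yrs, na.rm = TRUE))
  list(
    n_missing       = colSums(is.na(df)),           # NA count per column
    duplicate_years = unique(yrs[duplicated(yrs)]), # years that appear more than once
    missing_years   = setdiff(full_range, yrs)      # gaps in the yearly sequence
  )
}

# Toy data: 2003 is absent, 2005 is duplicated, one yield is NA
toy <- data.frame(
  year       = c(2001, 2002, 2004, 2005, 2005),
  crop_yield = c(3.1, NA, 3.4, 3.6, 3.5)
)
chk <- check_long_term(toy)
chk
```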

---

## Module 1 — Trend Analysis (trendML)

### Mann-Kendall test and Sen's slope

```{r trend}
trend_res <- nrm_trend(nrm_example,
                       time_var  = "year",
                       value_var = "crop_yield",
                       breaks    = TRUE)
print(trend_res)
```

### Visualise the trend

```{r trend-plot}
nrm_plot(trend_res)
```

### Individual components

```{r mk-only}
mk <- nrm_mann_kendall(nrm_example, time_var = "year",
                       value_var = "soil_OC")
print(mk)

ss <- nrm_sens_slope(nrm_example, time_var = "year",
                     value_var = "soil_OC")
print(ss)
```
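
For readers who want to see what these two statistics measure, here is a minimal base-R sketch. The helpers `mk_sketch()` and `sen_sketch()` are hypothetical names and are not the package's implementation; they assume the series is already ordered by time and omit the tie correction in the variance.

```{r sketch-mk-sen}
# Hypothetical helpers (not part of NRMstatsML), base R only.
# Mann-Kendall: S sums the signs of all pairwise differences y[j] - y[i], i < j.
mk_sketch <- function(y) {
  n     <- length(y)
  pairs <- combn(n, 2)                      # all index pairs with i < j
  S     <- sum(sign(y[pairs[2, ]] - y[pairs[1, ]]))
  var_S <- n * (n - 1) * (2 * n + 5) / 18   # variance without tie correction
  Z     <- if (S == 0) 0 else (S - sign(S)) / sqrt(var_S)  # continuity correction
  list(S = S, Z = Z, p_value = 2 * pnorm(-abs(Z)))
}

# Sen's slope: median of all pairwise slopes.
sen_sketch <- function(t, y) {
  pairs <- combn(length(y), 2)
  median((y[pairs[2, ]] - y[pairs[1, ]]) / (t[pairs[2, ]] - t[pairs[1, ]]))
}

# Synthetic 20-year series with a known slope of 0.05 per year
set.seed(1)
year  <- 2001:2020
yield <- 3 + 0.05 * (year - 2001) + rnorm(20, sd = 0.2)

mk_sketch(yield)
sen_sketch(year, yield)
```

Because both statistics depend only on pairwise comparisons, they are less sensitive to outliers than an ordinary least-squares slope.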

---

## Module 2 — Multivariate System Modelling (multiSysML)

### Scaled OLS regression

```{r multivariate}
mv <- nrm_multivariate(nrm_example,
        formula = crop_yield ~ N + P + K + rainfall,
        scale   = TRUE)
print(mv)
```
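
How `scale = TRUE` is applied inside `nrm_multivariate()` is not described here. A common convention, assumed in the sketch below, is to standardise only the predictors to zero mean and unit variance so that coefficients are comparable across inputs measured in different units. The data are synthetic and the variable names simply mirror the example above.

```{r sketch-scaled-ols}
set.seed(42)
toy <- data.frame(
  N        = runif(40, 0, 200),    # synthetic nitrogen rate
  rainfall = runif(40, 400, 1200)  # synthetic seasonal rainfall
)
toy$crop_yield <- 2 + 0.010 * toy$N + 0.002 * toy$rainfall + rnorm(40, sd = 0.3)

fit_raw <- lm(crop_yield ~ N + rainfall, data = toy)

# Standardise predictors only; the response stays in its original units
toy_scaled <- toy
toy_scaled[c("N", "rainfall")] <- scale(toy[c("N", "rainfall")])
fit_scaled <- lm(crop_yield ~ N + rainfall, data = toy_scaled)

# Each scaled coefficient equals the raw coefficient times the predictor's SD
cbind(raw = coef(fit_raw)[-1], scaled = coef(fit_scaled)[-1])
```

A scaled coefficient reads as the change in yield per one standard deviation of the predictor, which makes the relative importance of inputs easier to compare.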

### Partial Least Squares (PLS)

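PLS is often preferred when predictors are highly collinear, because it
projects them onto a small number of uncorrelated components before
regressing. Collinearity is common in NRM data, where N, P, K, and rainfall
are often correlated.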

```{r pls}
pl <- nrm_pls(nrm_example,
              formula = crop_yield ~ N + P + K + rainfall + soil_OC,
              ncomp   = 3)
print(pl)
```
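
Whether collinearity is strong enough to motivate PLS can be checked with variance inflation factors (VIFs). The sketch below uses a hypothetical base-R helper, `vif_sketch()`, on synthetic data in which P is deliberately tied to N. It is an illustration and not a function of NRMstatsML.

```{r sketch-vif}
# Hypothetical helper (not part of NRMstatsML): VIF_j = 1 / (1 - R^2_j),
# where R^2_j comes from regressing predictor j on all other predictors.
vif_sketch <- function(X) {
  vapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

set.seed(7)
N <- runif(60, 0, 200)
toy <- data.frame(
  N        = N,
  P        = 0.4 * N + rnorm(60, sd = 5),  # deliberately tied to N
  rainfall = runif(60, 400, 1200)          # independent of N and P
)
round(vif_sketch(toy), 1)
```

A VIF close to 1 indicates little overlap with the other predictors; values well above roughly 5 to 10 are often read as a sign that ordinary regression coefficients will be unstable.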

---

## Module 3 — Response Curve & Optimisation (responseML)

```{r response-curve}
rc <- nrm_response_curve(nrm_example,
        input_var    = "N",
        response_var = "crop_yield",
        type         = "quadratic")
print(rc)
nrm_plot(rc)
```

### Economic optimum

Find the nitrogen rate that maximises profit for a given cost-to-price ratio,
i.e. the unit cost of nitrogen divided by the unit price of the crop:

```{r optimise}
opt <- nrm_optimize_input(rc, price_ratio = 0.6)
print(opt)
```
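
The logic behind an economic optimum on a quadratic curve can be worked through by hand. For a fitted curve `yield = b0 + b1 * N + b2 * N^2` with `b2 < 0`, profit peaks where the marginal yield `b1 + 2 * b2 * N` equals the cost-to-price ratio. The sketch below uses noise-free synthetic data and a made-up ratio; it assumes the ratio is expressed in yield units per unit of N, which may differ from how `nrm_optimize_input()` interprets `price_ratio`.

```{r sketch-econ-optimum}
# Noise-free synthetic response so the optimum can be verified by hand:
# yield = 2 + 0.04 * N - 0.0001 * N^2
toy <- data.frame(N = seq(0, 300, by = 25))
toy$crop_yield <- 2 + 0.04 * toy$N - 0.0001 * toy$N^2

fit <- lm(crop_yield ~ N + I(N^2), data = toy)
b1 <- coef(fit)[["N"]]
b2 <- coef(fit)[["I(N^2)"]]

ratio <- 0.01   # hypothetical: cost of one unit of N / price of one unit of yield

n_max  <- -b1 / (2 * b2)           # agronomic optimum: marginal yield = 0
n_econ <- (ratio - b1) / (2 * b2)  # economic optimum: marginal yield = ratio
c(agronomic = n_max, economic = n_econ)
```

With these coefficients the agronomic optimum is about 200 and the economic optimum about 150: once the input has a cost, the most profitable rate lies below the yield-maximising rate.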

---

## Module 4 — Time Series Forecasting (tsML)

```{r arima}
ar <- nrm_arima(nrm_example, value_var = "crop_yield")
print(ar)
```

```{r forecast}
fc <- nrm_forecast(nrm_example,
                   value_var = "crop_yield",
                   horizon   = 5)
print(fc)
nrm_plot(fc)
```
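
To show what sits behind an ARIMA forecast, the following sketch uses `stats::arima()` and `predict()` from base R on a synthetic series. The model order is chosen by hand for illustration; whether and how `nrm_arima()` selects the order is not covered here, so this should not be read as the package's procedure.

```{r sketch-arima}
set.seed(123)
# Synthetic 20-year series: upward drift plus autocorrelated noise
noise <- as.numeric(arima.sim(list(ar = 0.5), n = 20, sd = 0.2))
y     <- ts(3 + 0.05 * (0:19) + noise, start = 2001)

fit <- arima(y, order = c(0, 1, 1))   # order chosen by hand for illustration
fc  <- predict(fit, n.ahead = 5)      # list with $pred (means) and $se

# Point forecasts with approximate 95% intervals
data.frame(
  year  = 2021:2025,
  mean  = as.numeric(fc$pred),
  lower = as.numeric(fc$pred - 1.96 * fc$se),
  upper = as.numeric(fc$pred + 1.96 * fc$se)
)
```

With only 20 annual observations the intervals widen quickly, so a five-year horizon is best read as an indication rather than a precise prediction.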

---

## Module 5 — Panel Data & Treatment Effects (panelML)

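Panel and difference-in-differences (DiD) analyses require a multi-site,
multi-year dataset, so the chunk below is shown but not evaluated;
`panel_data` is a placeholder for your own data. See `?nrm_panel` and
`?nrm_did` for worked examples with appropriate data.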

```{r panel, eval = FALSE}
# Requires a panel dataset with site and year identifiers
pm <- nrm_panel(panel_data,
                formula = crop_yield ~ N + P + K + rainfall,
                index   = c("site", "year"),
                model   = "within")
nrm_summary(pm)

did <- nrm_did(panel_data,
               outcome   = "crop_yield",
               treat_var = "treatment_binary",
               time_var  = "post_period")
print(did)
```
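
The idea behind both estimators can be tried on a small synthetic panel with base R alone. The sketch below is an illustration under stated assumptions (a balanced panel, a single treatment date, a known effect of 0.8) and does not reproduce the internals of `nrm_panel()` or `nrm_did()`. The data frame `panel_toy` and its columns are made up for this example.

```{r sketch-did}
set.seed(2024)
# Hypothetical balanced panel: 6 sites x 10 years, sites 1 to 3 treated from 2016
panel_toy <- expand.grid(site = 1:6, year = 2011:2020)
panel_toy$treated <- as.integer(panel_toy$site <= 3)
panel_toy$post    <- as.integer(panel_toy$year >= 2016)
panel_toy$crop_yield <- 3 + 0.5 * panel_toy$treated + 0.3 * panel_toy$post +
  0.8 * panel_toy$treated * panel_toy$post +   # true treatment effect = 0.8
  rnorm(nrow(panel_toy), sd = 0.1)

# Difference-in-differences: the interaction term estimates the effect
did_fit <- lm(crop_yield ~ treated * post, data = panel_toy)
coef(did_fit)[["treated:post"]]

# Fixed-effects version: site and year dummies absorb the main effects
fe_fit <- lm(crop_yield ~ treated:post + factor(site) + factor(year),
             data = panel_toy)
coef(fe_fit)[["treated:post"]]
```

Both estimates should land close to the simulated effect of 0.8. The DiD interpretation rests on the parallel-trends assumption, which holds here by construction but needs checking in real data.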

---

## Module 6 — Uncertainty & Sensitivity Analysis (uncertaintyML)

```{r bootstrap}
bs <- nrm_bootstrap(nrm_example,
        stat_fn = function(d) mean(d$crop_yield),
        n_iter  = 500)
print(bs)
```

```{r monte-carlo}
mc <- nrm_monte_carlo(nrm_example,
        stat_fn   = function(d) mean(d$crop_yield),
        n_iter    = 500,
        noise_sd  = 0.1)
print(mc)
```

Use `nrm_uncertainty()` as a unified entry point:

```{r uncertainty}
unc <- nrm_uncertainty(nrm_example,
         stat_fn = function(d) mean(d$crop_yield),
         method  = "bootstrap",
         n_iter  = 500)
print(unc)
```
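
A non-parametric bootstrap of the kind suggested by the `stat_fn` and `n_iter` arguments can be sketched in a few lines of base R. This is an illustration on synthetic data, not the package's implementation. It resamples rows independently, which ignores serial correlation, so for strongly autocorrelated series a block bootstrap would be the safer choice.

```{r sketch-bootstrap}
set.seed(99)
toy <- data.frame(crop_yield = rnorm(20, mean = 3.5, sd = 0.4))  # synthetic yields
stat_fn <- function(d) mean(d$crop_yield)

# Resample rows with replacement and recompute the statistic each time
boot_stats <- replicate(500, {
  idx <- sample(nrow(toy), replace = TRUE)
  stat_fn(toy[idx, , drop = FALSE])
})

c(estimate = stat_fn(toy),
  se       = sd(boot_stats),                     # bootstrap standard error
  quantile(boot_stats, c(0.025, 0.975)))         # 95% percentile interval
```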

---

## Module 7 — AutoML & Model Benchmarking (autoML)

```{r automl, eval = FALSE}
# Requires caret + method-specific packages (e.g. randomForest, gbm)
am <- nrm_automl(nrm_example,
                 formula  = crop_yield ~ N + P + K + rainfall + soil_OC,
                 methods  = c("lm", "rf", "gbm"),
                 cv_folds = 5,
                 seed     = 42)
nrm_summary(am)
```
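
The role of `cv_folds` can be illustrated without caret. The sketch below runs a plain 5-fold cross-validation in base R to compare two candidate formulas on synthetic data with a curved nitrogen response. It is a simplified stand-in for automated model selection and makes no claim about how `nrm_automl()` is implemented.

```{r sketch-cv}
set.seed(42)
toy <- data.frame(N = runif(60, 0, 200), rainfall = runif(60, 400, 1200))
toy$crop_yield <- 2 + 0.04 * toy$N - 0.0001 * toy$N^2 +
  0.001 * toy$rainfall + rnorm(60, sd = 0.2)

candidates <- list(
  linear    = crop_yield ~ N + rainfall,
  quadratic = crop_yield ~ N + I(N^2) + rainfall
)

k     <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(toy)))  # random fold labels

# For each candidate: fit on k - 1 folds, predict the held-out fold, pool errors
cv_rmse <- sapply(candidates, function(f) {
  sq_err <- unlist(lapply(seq_len(k), function(i) {
    fit <- lm(f, data = toy[folds != i, ])
    (toy$crop_yield[folds == i] - predict(fit, newdata = toy[folds == i, ]))^2
  }))
  sqrt(mean(sq_err))
})
cv_rmse   # lower is better
```

Random folds suit this cross-sectional toy example. For time-ordered data, folds that respect chronology give a more honest estimate of forecasting error.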

### Benchmarking on a hold-out set

```{r benchmark, eval = FALSE}
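# Split by row order: first 80% of rows for training, the rest for testing.
# If the rows are sorted by year, this is a chronological hold-out.
n     <- nrow(nrm_example)
train <- nrm_example[seq_len(floor(0.8 * n)), ]
test  <- nrm_example[seq(floor(0.8 * n) + 1, n), ]

# Baseline model, fitted on the training rows only
m_ols <- lm(crop_yield ~ N + P + K, data = train)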

bm <- nrm_benchmark(
  models       = list(ols = m_ols),
  test_data    = test,
  response_var = "crop_yield"
)
print(bm)
```
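
Which metrics `nrm_benchmark()` reports is not shown in this vignette. The sketch below computes three common hold-out metrics (RMSE, MAE, and out-of-sample R²) in base R, using a hypothetical helper `holdout_metrics()` and synthetic data, to make clear what a benchmark table of this kind typically contains.

```{r sketch-holdout-metrics}
# Hypothetical helper (not part of NRMstatsML): common hold-out error metrics
holdout_metrics <- function(model, test_data, response_var) {
  obs  <- test_data[[response_var]]
  pred <- predict(model, newdata = test_data)
  c(RMSE = sqrt(mean((obs - pred)^2)),
    MAE  = mean(abs(obs - pred)),
    R2   = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2))
}

set.seed(1)
toy <- data.frame(N = runif(50, 0, 200))
toy$crop_yield <- 2 + 0.01 * toy$N + rnorm(50, sd = 0.3)
train_toy <- toy[1:40, ]
test_toy  <- toy[41:50, ]

models <- list(
  mean_only = lm(crop_yield ~ 1, data = train_toy),  # naive baseline
  ols       = lm(crop_yield ~ N, data = train_toy)
)
metrics <- sapply(models, holdout_metrics,
                  test_data = test_toy, response_var = "crop_yield")
round(metrics, 3)
```

Including a naive baseline such as the training mean gives the other models a reference point: a candidate that cannot beat it on the hold-out set is not adding predictive value.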

---

## Recommended Workflow

```
nrm_data_check()          # 1. Validate data
    ↓
nrm_trend()               # 2. Detect trends and breaks
    ↓
nrm_multivariate()        # 3. Multivariate modelling
nrm_pls()                 #    → or PLS for collinear predictors
    ↓
nrm_response_curve()      # 4. Fit yield-response
nrm_optimize_input()      #    → Identify optimal inputs
    ↓
nrm_forecast()            # 5. Forecast future values
    ↓
nrm_uncertainty()         # 6. Quantify uncertainty
    ↓
nrm_automl()              # 7. Compare candidate models
nrm_benchmark()           #    → Select best for deployment
```

---

## Session Info

```{r session-info}
sessionInfo()
```