The bayesDiagnostics package provides a
comprehensive suite of diagnostic tools for Bayesian models fitted with
brms, rstan, and other popular Bayesian
modeling packages. This vignette demonstrates all major functions
organized by diagnostic category.
library(bayesDiagnostics)
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.23.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:stats':
#>
#> ar
library(ggplot2)

The package contains 18 main functions organized into 5 categories: convergence diagnostics, posterior predictive checks, prior analysis, model comparison, and utilities.
The mcmc_diagnostics_summary() function provides a
comprehensive overview of MCMC convergence, including R-hat, ESS, and
NUTS-specific diagnostics.
# Fit a simple model
data <- data.frame(
y = rnorm(100, mean = 5, sd = 2),
x1 = rnorm(100),
x2 = rnorm(100)
)
fit <- brm(y ~ x1 + x2, data = data, chains = 4, iter = 2000, refresh = 0)
# Run comprehensive diagnostics
diag <- mcmc_diagnostics_summary(fit, rhat_threshold = 1.01, ess_threshold = 400)
print(diag)
# Check results
diag$converged # TRUE/FALSE
diag$summary_table # Full diagnostic table
diag$divergences # Number of divergent transitions

What it checks:

- R-hat values (should be < 1.01)
- Bulk ESS and Tail ESS (should be > 400)
- Divergent transitions (should be 0)
- Tree depth saturation (NUTS-specific)
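As a cross-check outside bayesDiagnostics, the same quantities can be computed directly from the brms fit with the posterior package and nuts_params(); a minimal sketch:

library(posterior)
# R-hat, bulk ESS, and tail ESS straight from the posterior draws
summarise_draws(as_draws_array(fit), "rhat", "ess_bulk", "ess_tail")
# Sampler-level NUTS diagnostics
np <- nuts_params(fit)  # long-format data frame: Chain, Iteration, Parameter, Value
sum(np$Value[np$Parameter == "divergent__"])   # total divergent transitions
max(np$Value[np$Parameter == "treedepth__"])   # largest tree depth reached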
The effective_sample_size_diagnostics() function
provides detailed ESS analysis with visual diagnostics.
# Detailed ESS analysis
ess_diag <- effective_sample_size_diagnostics(
model = fit,
parameters = c("b_x1", "b_x2", "sigma"),
min_ess = 400,
by_chain = TRUE,
plot = TRUE
)
print(ess_diag)
plot(ess_diag)
# Access specific components
ess_diag$ess_summary # Summary statistics
ess_diag$bulk_ess # Bulk ESS per parameter
ess_diag$tail_ess # Tail ESS per parameter
ess_diag$by_chain_ess # Per-chain ESS breakdown
ess_diag$problematic_params # Parameters with low ESS
ess_diag$recommendations # Actionable suggestions

Interpretation:

- Bulk ESS: Measures precision of posterior mean/median estimates
- Tail ESS: Measures precision of credible interval bounds
- Per-chain ESS: Identifies which specific chains have issues
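For reference, bulk and tail ESS for an individual parameter can be reproduced directly with the posterior package; this sketch follows the standard definitions and is independent of effective_sample_size_diagnostics():

library(posterior)
draws <- as_draws_array(fit)
# iterations x chains matrix for a single parameter
b_x1 <- extract_variable_matrix(draws, "b_x1")
ess_bulk(b_x1)            # precision of central-tendency estimates
ess_tail(b_x1)            # precision of the 5% and 95% quantile estimates
apply(b_x1, 2, ess_bulk)  # per-chain bulk ESS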
For hierarchical/multilevel models, use
hierarchical_convergence() to check group-level and
population-level parameters separately.
# Fit hierarchical model
data_hier <- data.frame(
y = rnorm(200),
x = rnorm(200),
group = rep(1:10, each = 20)
)
fit_hier <- brm(
y ~ x + (1 + x | group),
data = data_hier,
chains = 4,
refresh = 0
)
# Check hierarchical convergence
hier_conv <- hierarchical_convergence(
model = fit_hier,
group_vars = "group",
plot = TRUE
)
print(hier_conv)
plot(hier_conv)

What it provides:

- Group-level vs. population-level convergence breakdown
- Variance component diagnostics
- Shrinkage assessment
- Visual comparison of group-level parameters
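Under the hood, the group-level quantities in a brms fit are ordinary named parameters (prefixed sd_, cor_, and r_), so the population-level versus group-level split can also be inspected directly with the posterior package; a minimal sketch:

library(posterior)
draws <- as_draws_array(fit_hier)
# Population-level effects and residual SD
summarise_draws(subset_draws(draws, variable = c("b_Intercept", "b_x", "sigma")))
# Group-level standard deviations and correlations
summarise_draws(subset_draws(draws, variable = "^sd_|^cor_", regex = TRUE))
# Individual group-level offsets, r_group[...]
summarise_draws(subset_draws(draws, variable = "^r_group", regex = TRUE))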
The posterior_predictive_check() function is the
workhorse for model validation.
# Basic PPC with multiple test statistics
ppc_result <- posterior_predictive_check(
model = fit,
observed_data = data$y,
n_samples = 1000,
test_statistics = c("mean", "sd", "median", "min", "max", "skewness", "kurtosis"),
plot = TRUE
)
print(ppc_result)
plot(ppc_result)
# Access results
ppc_result$observed_stats # Observed test statistics
ppc_result$replicated_stats # Posterior predictive statistics
ppc_result$p_values # Bayesian p-values (should be ~0.5)

Interpretation Guide:

- p-value ≈ 0.5: Good fit (the model captures the statistic well)
- p-value < 0.05 or > 0.95: Likely model misspecification
- p-value near 0: Model underestimates the statistic
- p-value near 1: Model overestimates the statistic
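These p-values are tail probabilities of the posterior predictive distribution of each statistic. For a single statistic they can be reproduced in a few lines with posterior_predict(); a minimal sketch, independent of posterior_predictive_check():

# Posterior predictive draws: one simulated data set per row
yrep <- posterior_predict(fit, ndraws = 1000)
# Bayesian p-value for the standard deviation: P(sd(yrep) >= sd(y))
sd_rep <- apply(yrep, 1, sd)
mean(sd_rep >= sd(data$y))
# Values near 0 mean the replicated sds almost never reach the observed sd
# (the model underestimates the spread); values near 1 mean the opposite.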
For quick diagnostic checks across multiple statistics, use
automated_ppc():
# Automated checks with flagging
auto_ppc <- automated_ppc(
model = fit,
observed_data = data$y,
n_samples = 1000,
p_value_threshold = 0.05
)
print(auto_ppc)
# Check for issues
auto_ppc$diagnostics # Table with all statistics and flags
auto_ppc$flags # Text warnings for problematic statistics

When to use:

- Quick model validation
- Screening for gross misspecification
- Automated reporting workflows
Create publication-quality PPC visualizations:
# Density overlay
p1 <- graphical_ppc(fit, data$y, type = "density", n_draws = 50)
print(p1)
# Prediction intervals
p2 <- graphical_ppc(fit, data$y, type = "intervals", n_draws = 50)
print(p2)
# Ribbon plot (useful for ordered/time-series data)
p3 <- graphical_ppc(fit, data$y, type = "ribbon", n_draws = 50)
print(p3)

The ppc_crossvalidation() function performs
Leave-One-Out Probability Integral Transform (LOO-PIT) checks:
# LOO-PIT check
loo_ppc <- ppc_crossvalidation(
model = fit,
observed_y = data$y,
n_draws = NULL # Use all draws for accuracy
)
print(loo_ppc)
plot(loo_ppc)
# Inspect results
loo_ppc$pit_values # Should be ~Uniform(0,1) if well-calibrated
loo_ppc$loo_result # LOO object
loo_ppc$weighted # Whether weighted PIT was used

Interpretation:

- Uniform PIT distribution: Model is well-calibrated
- U-shaped: Under-dispersed predictions (predictive intervals too narrow)
- Inverse-U (hump-shaped): Over-dispersed predictions (predictive intervals too wide)
- Skewed: Systematic bias in predictions
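The same LOO-PIT check can be assembled from the loo and bayesplot packages; a minimal sketch for the Gaussian model fitted above, using the default PSIS-LOO importance weights:

library(loo)
library(bayesplot)
loo_res <- loo(fit, save_psis = TRUE)    # PSIS-LOO, keeping the smoothed weights
yrep    <- posterior_predict(fit)        # posterior predictive draws
lw      <- weights(loo_res$psis_object)  # smoothed log importance weights
# Overlay of LOO-PIT values against Uniform(0, 1)
ppc_loo_pit_overlay(y = data$y, yrep = yrep, lw = lw)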
For custom test statistics, use the flexible
bayesian_p_values() utility:
# Generate posterior predictive samples
yrep <- posterior_predict(fit, ndraws = 1000)
# Define custom test statistic
custom_stat <- function(x) max(x) - min(x) # Range
# Calculate Bayesian p-value
result <- bayesian_p_values(
yrep = yrep,
y = data$y,
statistic = custom_stat
)
result$observed_value # Observed range
result$replicated_values # Posterior predictive ranges
result$p_value # P(replicated ≥ observed)

The prior_elicitation_helper() translates expert
knowledge into statistical priors:
# Define expert beliefs
expert_input <- list(
parameter_name = "treatment_effect",
plausible_range = c(-0.5, 1.5), # Plausible values
most_likely_value = 0.3, # Best guess
confidence = 0.8 # How confident (0-1)
)
# Get prior recommendations
prior_rec <- prior_elicitation_helper(
expert_beliefs = expert_input,
parameter_type = "continuous",
method = "quantile",
data_sample = rnorm(100, 0.3, 0.5), # Optional: existing data
visualize = TRUE
)
print(prior_rec)
# Use recommended prior
prior_rec$recommended_prior # brms::prior object
prior_rec$alternatives # Alternative priors for sensitivity
prior_rec$sensitivity_note # Guidance

Parameter types:

- "continuous": Normal, Student-t, log-normal
- "discrete": Poisson, negative binomial
- "proportion": Beta distribution
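To make the quantile method concrete, here is how a normal prior could be matched to the expert input by hand: center it at the most likely value and pick the scale so that the stated confidence mass roughly covers the plausible range. This is an illustrative calculation only, not necessarily the algorithm used by prior_elicitation_helper().

# Expert input from above (range treated as roughly symmetric about the best guess)
range_lo    <- -0.5
range_hi    <- 1.5
most_likely <- 0.3
confidence  <- 0.8
# Scale such that the central 80% of a normal prior spans the plausible range
prior_sd <- (range_hi - range_lo) / (2 * qnorm(0.5 + confidence / 2))
prior_sd  # roughly 0.78
# Express it as a brms prior on the regression coefficients
set_prior(sprintf("normal(%.2f, %.2f)", most_likely, prior_sd), class = "b")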
The prior_sensitivity() function assesses robustness to
prior choice:
# Define alternative priors
prior_grid <- list(
weak = set_prior("normal(0, 10)", class = "b"),
moderate = set_prior("normal(0, 2)", class = "b"),
strong = set_prior("normal(0, 0.5)", class = "b")
)
# Run sensitivity analysis
sens_result <- prior_sensitivity(
model = fit,
parameters = c("b_x1", "b_x2"),
prior_grid = prior_grid,
comparison_metric = "KL", # or "Wasserstein", "overlap"
plot = TRUE,
n_draws = 2000
)
print(sens_result)
plot(sens_result)
# Check sensitivity
sens_result$sensitivity_metrics # How much posteriors differ

Metrics:

- KL divergence: Information-theoretic difference between posteriors
- Wasserstein distance: Earth mover's (transport) distance between distributions
- Overlap coefficient: Proportion of overlapping density
Interpretation:

- Low sensitivity: Conclusions are robust to prior choice ✓
- High sensitivity: The data are weak; the prior strongly influences results
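To see what an overlap-style metric measures, here is a simple hand-rolled version based on kernel density estimates of two posterior samples; the exact metric computed by prior_sensitivity() may differ, and refit_strong below is a hypothetical refit under an alternative prior.

# Overlap coefficient between two posterior samples (1 = identical, 0 = disjoint)
overlap_coef <- function(a, b, n = 512) {
  rng <- range(c(a, b))
  da <- density(a, from = rng[1], to = rng[2], n = n)
  db <- density(b, from = rng[1], to = rng[2], n = n)
  sum(pmin(da$y, db$y)) * diff(da$x[1:2])
}
draws_weak <- as.data.frame(fit)$b_x1
# overlap_coef(draws_weak, as.data.frame(refit_strong)$b_x1)  # refit_strong is hypothetical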
For comprehensive multi-dimensional sensitivity assessment:
robust_result <- prior_robustness(
model = fit,
prior_specifications = prior_grid,
parameters = c("b_x1", "b_x2"),
perturbation_direction = "expand", # or "contract", "shift"
dimensions = c(0.5, 1, 2, 4), # Scaling factors
comparison_metric = "KL",
plot = TRUE
)
print(robust_result)
# Check robustness
robust_result$robustness_index # Composite score (higher = more robust)
robust_result$concerning_parameters # Parameters with low robustness
robust_result$recommendations # What to do

The model_comparison_suite() compares multiple models
using information criteria:
# Fit competing models
fit1 <- brm(y ~ x1, data = data, refresh = 0)
fit2 <- brm(y ~ x1 + x2, data = data, refresh = 0)
fit3 <- brm(y ~ x1 * x2, data = data, refresh = 0)
# Compare models
comparison <- model_comparison_suite(
Model_1 = fit1,
Model_2 = fit2,
Model_3 = fit3,
criterion = "all", # LOO, WAIC, Bayes R²
plot = TRUE,
detailed = TRUE
)
print(comparison)
plot(comparison)
# Results
comparison$comparison_table # All metrics
comparison$ic_differences # ΔLOO, ΔWAIC, model weights
comparison$plots # Visualizations

Metrics explained:

- LOO-ELPD: Leave-one-out expected log predictive density (higher is better)
- WAIC: Widely Applicable Information Criterion (lower is better)
- Bayes R²: Bayesian coefficient of determination
- Model weights: Probability that each model is best
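The LOO-based part of this comparison can be cross-checked with the loo methods that brms provides; a minimal sketch (loo_model_weights() uses stacking by default):

# PSIS-LOO for each candidate model
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)
# ELPD differences with standard errors (best model in the first row)
loo_compare(loo1, loo2, loo3)
# Stacking-based model weights
loo_model_weights(list(loo1, loo2, loo3))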
For direct model comparison via Bayes Factors:
# Compare two models
bf_result <- bayes_factor_comparison(
Model_A = fit1,
Model_B = fit2,
method = "bridge_sampling", # or "waic"
repetitions = 5,
silent = TRUE
)
print(bf_result)
# Interpretation
bf_result$bayes_factor # BF_{1,2}
bf_result$log_bf # Log Bayes Factor
bf_result$interpretation # Text interpretation
# For 3+ models: pairwise comparisons
bf_multi <- bayes_factor_comparison(fit1, fit2, fit3, method = "bridge_sampling")
bf_multi$pairwise_comparisons

Bayes Factor Interpretation (Kass & Raftery, 1995):

- BF < 1: Evidence for Model 2
- 1-3: Weak evidence for Model 1
- 3-10: Moderate evidence
- 10-30: Strong evidence
- > 30: Very strong evidence
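For reference, brms exposes the same bridge-sampling computation through bayes_factor(); note that bridge sampling requires all parameters to be saved at fitting time, which the earlier fits did not do.

# Refit with all parameters stored so bridge sampling can be applied
fit1_bs <- brm(y ~ x1, data = data,
               save_pars = save_pars(all = TRUE), refresh = 0)
fit2_bs <- brm(y ~ x1 + x2, data = data,
               save_pars = save_pars(all = TRUE), refresh = 0)
# Bayes factor of model 2 over model 1 via bridge sampling
bayes_factor(fit2_bs, fit1_bs)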
The predictive_performance() function evaluates
out-of-sample predictions:
# In-sample performance
perf_in <- predictive_performance(
model = fit,
newdata = NULL, # NULL = use training data
observed_y = data$y,
metrics = "all", # RMSE, MAE, Coverage, CRPS
credible_level = 0.95,
n_draws = NULL
)
print(perf_in)
plot(perf_in)
# Out-of-sample performance
test_data <- data.frame(x1 = rnorm(50), x2 = rnorm(50), y = rnorm(50))
perf_out <- predictive_performance(
model = fit,
newdata = test_data,
observed_y = test_data$y,
metrics = "all"
)
# Compare metrics
perf_in$point_metrics # RMSE, MAE, Correlation
perf_in$interval_metrics # Coverage, Interval Width
perf_in$proper_scores # CRPS (lower is better)
perf_in$prediction_summary # Per-observation predictions

Key metrics:

- RMSE/MAE: Prediction error (lower is better)
- Coverage: Proportion of observations falling within the credible intervals (should be close to the nominal level, e.g. 0.95)
- CRPS: Continuous Ranked Probability Score (a proper scoring rule; lower is better)
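The point and interval metrics have simple definitions and can be reproduced by hand from posterior predictive draws; a short sketch for the hold-out data defined above (CRPS is omitted, since it needs a proper-scoring-rule implementation such as the one in the scoringRules package):

# Posterior predictive draws for the test set: draws x observations
yrep_test <- posterior_predict(fit, newdata = test_data)
# Point predictions and point metrics
point_pred <- colMeans(yrep_test)
rmse <- sqrt(mean((test_data$y - point_pred)^2))
mae  <- mean(abs(test_data$y - point_pred))
# 95% predictive intervals and empirical coverage
ci <- apply(yrep_test, 2, quantile, probs = c(0.025, 0.975))
coverage <- mean(test_data$y >= ci[1, ] & test_data$y <= ci[2, ])
c(RMSE = rmse, MAE = mae, Coverage = coverage)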
The extract_posterior_unified() function provides consistent
posterior extraction across packages:
# Extract as data frame
draws_df <- extract_posterior_unified(
model = fit,
parameters = c("b_x1", "b_x2", "sigma"),
format = "draws_df",
ndraws = 1000,
include_warmup = FALSE,
chains = NULL # All chains
)
# Extract as matrix
draws_matrix <- extract_posterior_unified(fit, format = "draws_matrix")
# Extract as array (iterations × chains × parameters)
draws_array <- extract_posterior_unified(fit, format = "draws_array")
# Extract as named list
draws_list <- extract_posterior_unified(fit, format = "list")

Supported model types:

- brmsfit (brms)
- stanfit (rstan)
- stanreg (rstanarm)
- CmdStanMCMC (cmdstanr)
- mcmc.list (coda)
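If you only need format conversion for a brms or Stan fit, the posterior package already provides the standard converters that a unified extractor would typically wrap; a minimal sketch:

library(posterior)
draws_df     <- as_draws_df(fit)      # one row per draw, one column per variable
draws_matrix <- as_draws_matrix(fit)  # draws x variables
draws_array  <- as_draws_array(fit)   # iterations x chains x variables
# Restrict to selected parameters
subset_draws(draws_df, variable = c("b_x1", "b_x2", "sigma"))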
The diagnostic_report() function creates comprehensive
HTML/PDF reports:
# Generate comprehensive report
diagnostic_report(
model = fit,
output_file = "bayesian_model_diagnostics.pdf",
output_format = "pdf", # or "html", "docx"
include_sections = c(
"model_summary",
"convergence",
"posterior_summary",
"recommendations"
),
rhat_threshold = 1.01,
ess_threshold = 0.1,
open_report = TRUE # Open automatically
)

Sections included:

1. Model Summary: Formula, family, data info
2. Convergence Diagnostics: R-hat, ESS, divergences
3. Posterior Summary: Parameter estimates with credible intervals
4. Recommendations: Actionable suggestions for model improvement
Here’s a complete Bayesian workflow using all major functions:
# ===== STEP 1: FIT MODEL =====
library(bayesDiagnostics)
library(brms)
# Simulate data where y truly depends on x1 and x2
set.seed(123)
N <- 200
x1 <- rnorm(N)
x2 <- rnorm(N)
data <- data.frame(
y = 3 + 2 * x1 - 0.5 * x2 + rnorm(N, sd = 1.5),
x1 = x1,
x2 = x2
)
# Fit Bayesian model
fit <- brm(
y ~ x1 + x2,
data = data,
prior = c(
prior(normal(0, 5), class = "b"),
prior(student_t(3, 0, 2.5), class = "sigma")
),
chains = 4,
iter = 2000,
warmup = 1000,
refresh = 0
)
# ===== STEP 2: CONVERGENCE DIAGNOSTICS =====
# Quick check
diag <- mcmc_diagnostics_summary(fit)
print(diag)
# Detailed ESS analysis
ess_diag <- effective_sample_size_diagnostics(fit, plot = TRUE)
plot(ess_diag)
# ===== STEP 3: POSTERIOR PREDICTIVE CHECKS =====
# Comprehensive PPC
ppc <- posterior_predictive_check(fit, observed_data = data$y, plot = TRUE)
print(ppc)
# Automated screening
auto_ppc <- automated_ppc(fit, data$y)
print(auto_ppc)
# LOO cross-validation
loo_ppc <- ppc_crossvalidation(fit, data$y)
plot(loo_ppc)
# ===== STEP 4: PRIOR SENSITIVITY =====
# Define alternative priors
prior_grid <- list(
weak = set_prior("normal(0, 10)", class = "b"),
moderate = set_prior("normal(0, 5)", class = "b"),
strong = set_prior("normal(0, 1)", class = "b")
)
# Test sensitivity
sens <- prior_sensitivity(
fit,
parameters = c("b_x1", "b_x2"),
prior_grid = prior_grid,
plot = TRUE
)
print(sens)
# ===== STEP 5: MODEL COMPARISON =====
# Fit alternative models
fit_x1 <- brm(y ~ x1, data = data, refresh = 0)
fit_x1x2 <- fit
fit_interaction <- brm(y ~ x1 * x2, data = data, refresh = 0)
# Compare
comp <- model_comparison_suite(
Linear = fit_x1,
Additive = fit_x1x2,
Interaction = fit_interaction,
criterion = "all",
plot = TRUE
)
print(comp)
# ===== STEP 6: PREDICTIVE PERFORMANCE =====
# Hold-out validation
test_idx <- sample(1:N, 40)
test_data <- data[test_idx, ]
train_data <- data[-test_idx, ]
fit_train <- update(fit, newdata = train_data, refresh = 0)
perf <- predictive_performance(
fit_train,
newdata = test_data,
observed_y = test_data$y,
metrics = "all"
)
print(perf)
plot(perf)
# ===== STEP 7: GENERATE REPORT =====
diagnostic_report(
fit,
output_file = "full_diagnostics.html",
output_format = "html",
include_sections = c("model_summary", "convergence",
"posterior_summary", "recommendations")
)

If convergence diagnostics flag divergent transitions, increase adapt_delta or reparameterize the model.

| Function | Category | Purpose |
|---|---|---|
| mcmc_diagnostics_summary() | Convergence | Quick MCMC diagnostic overview |
| effective_sample_size_diagnostics() | Convergence | Detailed ESS analysis with plots |
| hierarchical_convergence() | Convergence | Hierarchical model convergence |
| posterior_predictive_check() | PPC | Comprehensive PPC with test statistics |
| automated_ppc() | PPC | Automated screening across statistics |
| graphical_ppc() | PPC | Publication-quality PPC plots |
| ppc_crossvalidation() | PPC | LOO-PIT cross-validation checks |
| bayesian_p_values() | PPC | Custom test statistic p-values |
| prior_elicitation_helper() | Prior | Translate expert knowledge to priors |
| prior_sensitivity() | Prior | Assess robustness to prior choice |
| prior_robustness() | Prior | Multi-dimensional sensitivity analysis |
| model_comparison_suite() | Comparison | Compare models via LOO/WAIC/R² |
| bayes_factor_comparison() | Comparison | Bayes Factor model comparison |
| predictive_performance() | Comparison | Evaluate predictive accuracy |
| extract_posterior_unified() | Utility | Unified posterior extraction |
| diagnostic_report() | Utility | Generate comprehensive reports |