This vignette shows a compact workflow for fitting several Fay-Herriot models, comparing them with DIC, and extracting posterior summaries. The example uses a small subset of the included ACS data so that the vignette builds quickly. For a substantive analysis, use more MCMC iterations and check convergence before interpreting model differences.
library(nlfh)
data(acs_dat)
acs_small <- as.data.frame(acs_dat[1:80, ])
fh_formula <- MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian
sampling_variance <- acs_small$MedIncSE^2
fit_control <- list(
n_iter = 120,
burn_in = 60,
progress = FALSE
)The linear model is the standard area-level Fay-Herriot model. The
RNN and BART models use the same predictors, but estimate a nonlinear
mean function. The RNN fit follows Parker (2024), and the BART-FH fit
follows Parker and Eideh (2026). For nonlinear workflows, inspect the
scale and coding of the covariates before fitting. Formula inputs are
expanded with model.matrix(), so factors are converted to
contrast columns. The RNN method centers and scales non-intercept
covariates by default; use control = list(scale = FALSE) to
disable this. It also standardizes the response and sampling variances
internally, with posterior summaries returned on the original response
scale. For matrix inputs, apply any desired factor encoding before
calling fit_fh().
set.seed(1)
fit_linear <- fit_fh(
formula = fh_formula,
sampling_variance = sampling_variance,
data = acs_small,
method = "linear",
control = fit_control
)
fit_rnn <- fit_fh(
formula = fh_formula,
sampling_variance = sampling_variance,
data = acs_small,
method = "rnn",
control = c(fit_control, list(n_hidden = 20))
)
fit_bart <- fit_fh(
formula = fh_formula,
sampling_variance = sampling_variance,
data = acs_small,
method = "bart",
control = c(fit_control, list(n_bart_samples = 3, n_trees = 10))
)Smaller DIC values indicate better expected out-of-sample fit after accounting for model complexity. DIC is most useful as a relative comparison among models fit to the same response, areas, predictors, and sampling variances. Note that in practice DIC may not be the best tool for model comparison, but is used here for illustration.
fits <- list(
linear = fit_linear,
rnn = fit_rnn,
bart = fit_bart
)
dic_table <- data.frame(
model = names(fits),
dic = vapply(fits, function(x) x$dic, numeric(1)),
row.names = NULL
)
dic_table <- dic_table[order(dic_table$dic), ]
dic_table$delta_dic <- dic_table$dic - min(dic_table$dic)
knitr::kable(dic_table, digits = 2)| model | dic | delta_dic | |
|---|---|---|---|
| 2 | rnn | 1702.02 | 0.00 |
| 3 | bart | 1821.42 | 119.40 |
| 1 | linear | 2231.68 | 529.66 |
summary() returns a structured object with posterior
summaries for available parameters. The printed summary is compact,
while the list components can be used directly in downstream tables or
plots.
summary(fit_linear)
#> <summary.nlfh_fit>
#> Model type: linear Fay-Herriot
#> Formula: MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian
#> DIC: 2232
#>
#> MCMC:
#> n_iter burn_in posterior_draws areas
#> 120 60 60 80
#>
#> Details:
#> retained_draws burn_in_fraction progress scale
#> 60 0.5 FALSE FALSE
#>
#> Variance parameters:
#> parameter mean sd median q2.5 q97.5
#> random_effect_variance 1.319 0.5562 1.229 0.3696 2.477
#>
#> Coefficients:
#> parameter mean sd median q2.5 q97.5
#> (Intercept) 72920 5060 72740 64490 84360
#> SNAPRate -18580 7260 -18770 -30900 -6575
#> PovRate -121100 4880 -121400 -129900 -112200
#> White 7032 4704 7093 -1936 15910
#> Black 6105 5086 6323 -3338 15360
#> Hispanic 14730 6612 14210 3977 27260
#> ... 1 more rows
#>
#> Area-level estimates theta_i:
#> area mean sd median q2.5 q97.5
#> 1 47400 812.1 47450 45930 48830
#> 2 34600 783.1 34780 33100 36150
#> 3 52040 1849.0 51980 48790 55110
#> 4 58320 700.3 58320 56880 59560
#> 5 60390 718.7 60400 58850 61570
#> 6 43690 727.0 43740 42150 44930
#> ... 74 more rowsVariance parameter summaries are available for all three models. The RNN fit also has a coefficient-variance parameter for the random hidden-layer weights.
variance_table <- do.call(
rbind,
lapply(names(fits), function(model) {
out <- summary(fits[[model]])$variance
out$model <- model
out[, c("model", "parameter", "mean", "sd", "median", "q2.5", "q97.5")]
})
)
knitr::kable(variance_table, digits = 2)| model | parameter | mean | sd | median | q2.5 | q97.5 |
|---|---|---|---|---|---|---|
| linear | random_effect_variance | 1.32 | 0.56 | 1.23 | 0.37 | 2.48 |
| rnn | random_effect_variance | 166246557.05 | 32350312.59 | 160876288.93 | 108442252.76 | 228951726.23 |
| rnn | coefficient_variance | 492949564.88 | 98112834.74 | 473792976.75 | 362786763.62 | 705765503.46 |
| bart | random_effect_variance | 0.87 | 0.47 | 0.68 | 0.42 | 2.06 |
For BART fits, the first model-matrix column is treated as a
baseline/intercept column and is excluded from variable importance. With
the formula interface used here, that first column is the default
(Intercept), so the importance values correspond to the
remaining covariates.
knitr::kable(
data.frame(
variable = names(fit_bart$variable_importance),
importance = as.numeric(fit_bart$variable_importance)
),
digits = 3
)| variable | importance |
|---|---|
| SNAPRate | 0.229 |
| PovRate | 0.466 |
| White | 0.060 |
| Black | 0.036 |
| Hispanic | 0.107 |
| Asian | 0.103 |
For the linear model, coefficient summaries are stored in the
coefficients component.
linear_summary <- summary(fit_linear)
knitr::kable(linear_summary$coefficients, digits = 2)| parameter | mean | sd | median | q2.5 | q97.5 |
|---|---|---|---|---|---|
| (Intercept) | 72920.66 | 5059.94 | 72738.84 | 64490.05 | 84359.53 |
| SNAPRate | -18584.81 | 7260.30 | -18774.42 | -30904.73 | -6574.68 |
| PovRate | -121115.62 | 4880.47 | -121397.17 | -129938.16 | -112188.47 |
| White | 7032.40 | 4704.19 | 7093.13 | -1936.36 | 15905.85 |
| Black | 6104.68 | 5085.65 | 6323.47 | -3337.78 | 15362.60 |
| Hispanic | 14726.09 | 6611.59 | 14206.88 | 3976.51 | 27256.55 |
| Asian | 46438.69 | 8088.54 | 46532.66 | 31864.53 | 60178.38 |
Area-level posterior summaries for theta_i are stored in
the areas component. The same information is available from
fitted(..., full = TRUE).
area_summary <- fitted(fit_bart, full = TRUE)
knitr::kable(head(area_summary, 8), digits = 2)| area | mean | sd | median | q2.5 | q97.5 |
|---|---|---|---|---|---|
| 1 | 41371.57 | 2316.76 | 41722.96 | 35605.60 | 45634.65 |
| 2 | 33490.96 | 1034.63 | 33335.48 | 31726.01 | 35366.83 |
| 3 | 44047.69 | 5078.05 | 42549.48 | 36283.79 | 56078.06 |
| 4 | 53412.97 | 1578.18 | 53418.67 | 49775.66 | 56338.41 |
| 5 | 53959.99 | 2020.46 | 53476.87 | 51782.80 | 59158.67 |
| 6 | 42280.93 | 1015.50 | 42277.57 | 40342.77 | 44117.58 |
| 7 | 88237.67 | 4520.50 | 88718.98 | 73765.66 | 94155.71 |
| 8 | 64915.83 | 4239.81 | 65167.43 | 57658.48 | 71585.73 |
The retained posterior draws also support direct visual comparison of area-level estimates across models. The plot below shows posterior means and 95% credible intervals for the first 12 areas.
interval_fits <- list(
Linear = fit_linear,
RNN = fit_rnn,
BART = fit_bart
)
plot_areas <- seq_len(12)
model_offsets <- c(-0.22, 0, 0.22)
model_cols <- c("#1f5f99", "#7c3aed", "#b45309")
interval_summaries <- lapply(interval_fits, fitted, full = TRUE)
mean_range <- range(
unlist(lapply(interval_summaries, function(x) {
c(x$q2.5[plot_areas], x$q97.5[plot_areas])
})),
na.rm = TRUE
)
plot(
NA,
xlim = c(0.5, length(plot_areas) + 0.5),
ylim = mean_range,
xaxt = "n",
xlab = "Area",
ylab = "Posterior estimate of theta"
)
axis(1, at = plot_areas, labels = plot_areas)
for (i in seq_along(interval_summaries)) {
out <- interval_summaries[[i]][plot_areas, ]
x_pos <- plot_areas + model_offsets[i]
segments(
x0 = x_pos,
y0 = out$q2.5,
x1 = x_pos,
y1 = out$q97.5,
col = model_cols[i],
lwd = 2
)
points(
x_pos,
out$mean,
pch = 19,
col = model_cols[i]
)
}
legend(
"topright",
legend = names(interval_fits),
col = model_cols,
pch = 19,
lwd = 2,
bty = "n"
)The posterior mean estimates can be compared with the direct estimates and their standard errors.
plot(
acs_small$MedInc,
fitted(fit_bart),
xlab = "Direct estimate",
ylab = "Posterior mean of theta",
pch = 19,
col = "#2f6f8f"
)
abline(0, 1, col = "#9a3412", lwd = 2)Use posterior_draws() when simulations, custom
summaries, or uncertainty propagation require the retained MCMC draws
instead of summaries.
theta_draws <- posterior_draws(fit_bart, variable = "theta")
A_draws <- posterior_draws(fit_bart, variable = "A")
dim(theta_draws)
#> [1] 60 83head(theta_draws[, 1:4])
#> Warning: Dropping 'draws_df' class as required metadata was removed.
#> # A tibble: 6 × 4
#> `theta[1]` `theta[2]` `theta[3]` `theta[4]`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 39041. 33835. 39041. 51807.
#> 2 41727. 34045. 41725. 56089.
#> 3 41607. 33337. 41606. 53976.
#> 4 40933. 33023. 40934. 55576.
#> 5 42105. 33435. 42105. 54133.
#> 6 39169. 33971. 39171. 56453.head(A_draws)
#> # A draws_df: 6 iterations, 1 chains, and 1 variables
#> A
#> 1 0.94
#> 2 0.98
#> 3 0.78
#> 4 0.87
#> 5 1.02
#> 6 1.47
#> # ... hidden reserved variables {'.chain', '.iteration', '.draw'}For model reporting, combine DIC with posterior summaries and model diagnostics. DIC ranks the fitted candidates in this example; the posterior summaries show the magnitude and uncertainty of the area-level estimates and variance components that drive those comparisons.
Parker, P. A. (2024). Nonlinear Fay-Herriot Models for Small Area Estimation Using Random Weight Neural Networks. Journal of Official Statistics, 40(2), 317-332. doi:10.1177/0282423X241244671
Parker, P. A. and Eideh, A. (2026). BART-FH: Flexible Nonlinear Modeling for Small Area Estimation. Journal of Survey Statistics and Methodology, 00, 1-18. doi:10.1093/jssam/smaf050