nlfh

nlfh fits linear and nonlinear Bayesian Fay-Herriot models for small area estimation with area-level direct estimates and known sampling variances.

The main entry point is fit_fh(). Use method = "linear" for the standard linear Fay-Herriot model, method = "rnn" for the random-weight neural network extension described by Parker (2024), and method = "bart" for the BART-FH model described by Parker and Eideh (2026).

Installation

You can install the development version of nlfh from GitHub with:

# install.packages("pak")
pak::pak("paparker/nlfh")

Example

The included acs_dat data set contains 2021 direct estimates of median income (MedInc) for the state of Missouri, its standard error (MedIncSE), and area-level covariates. Pass the sampling variance separately as MedIncSE^2.

library(nlfh)

data(acs_dat)
acs_small <- as.data.frame(acs_dat[1:500, ])
example_control <- list(n_iter = 500, burn_in = 250, progress = FALSE)

fit_linear <- fit_fh(
  MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian,
  sampling_variance = MedIncSE^2,
  data = acs_small,
  method = "linear",
  control = example_control
)

summary(fit_linear)
#> <summary.nlfh_fit>
#> Model type: linear Fay-Herriot
#> Formula: MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian
#> DIC: 14156
#> 
#> MCMC:
#>  n_iter burn_in posterior_draws areas
#>     500     250             250   500
#> 
#> Details:
#>  retained_draws burn_in_fraction progress scale
#>             250              0.5    FALSE FALSE
#> 
#> Variance parameters:
#>               parameter mean     sd median q2.5 q97.5
#>  random_effect_variance 1.97 0.3603  1.911 1.35 2.731
#> 
#> Coefficients:
#>    parameter    mean   sd  median    q2.5   q97.5
#>  (Intercept)  104700 4459  104600   96920  113600
#>     SNAPRate  -66270 3246  -66140  -72070  -59540
#>      PovRate -139700 2156 -139700 -143400 -135000
#>        White   -1647 4544   -1523  -10740    6418
#>        Black  -10860 4496  -10800  -20080   -2823
#>     Hispanic    5126 5913    4918   -6132   16730
#> ... 1 more rows
#> 
#> Area-level estimates theta_i:
#>  area  mean    sd median  q2.5 q97.5
#>     1 43730 274.0  43730 43170 44310
#>     2 30240 442.7  30270 29370 31070
#>     3 44360 642.6  44360 43110 45660
#>     4 61810 266.7  61810 61260 62340
#>     5 64280 272.1  64290 63670 64780
#>     6 40650 359.5  40640 39940 41340
#> ... 494 more rows

The formula interface specifies the available predictors. For nonlinear methods, it does not impose an additive linear mean structure; the model estimates an unknown function of the model matrix. By default, the RNN model centers and scales non-intercept covariates before fitting; use control = list(scale = FALSE) to disable this covariate preprocessing. The RNN response and sampling variances are standardized internally and returned on the original response scale.

fit_rnn <- fit_fh(
  MedInc ~ .,
  sampling_variance = MedIncSE^2,
  data = acs_small,
  method = "rnn",
  control = example_control
)

fit_bart <- fit_fh(
  MedInc ~ SNAPRate + PovRate + White,
  sampling_variance = MedIncSE^2,
  data = acs_small,
  method = "bart",
  control = example_control
)

c(linear = fit_linear$dic, rnn = fit_rnn$dic, bart = fit_bart$dic)
#>   linear      rnn     bart 
#> 14155.66 10535.90 11782.53
fit_bart$variable_importance
#>  SNAPRate   PovRate     White 
#> 0.3155422 0.3835334 0.3009243

For BART fits, the first model-matrix column is treated as a baseline/intercept column and is excluded from variable_importance. With the formula interface this is usually the default (Intercept) column; with the matrix interface, put the baseline or intercept column first.

You can also use the matrix interface directly.

X <- model.matrix(
  MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian,
  data = acs_small
)

fit_matrix <- fit_fh(
  y = acs_small$MedInc,
  X = X,
  sampling_variance = acs_small$MedIncSE^2,
  method = "linear",
  control = example_control
)

dim(fit_matrix$predictions)
#> [1] 500 250

References

Parker, P. A. (2024). Nonlinear Fay-Herriot Models for Small Area Estimation Using Random Weight Neural Networks. Journal of Official Statistics, 40(2), 317-332. doi:10.1177/0282423X241244671

Parker, P. A. and Eideh, A. (2026). BART-FH: Flexible Nonlinear Modeling for Small Area Estimation. Journal of Survey Statistics and Methodology, 00, 1-18. doi:10.1093/jssam/smaf050

mirror server hosted at Truenetwork, Russian Federation.