| Title: | Nonlinear Fay-Herriot Models for Small Area Estimation |
| Version: | 0.1.0 |
| Description: | Fits nonlinear Bayesian extensions of the Fay-Herriot model for small area estimation using area-level direct estimates and corresponding sampling variances. The package provides model fitting, prediction, uncertainty summaries, and diagnostic tools for nonlinear small area estimation workflows. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | dbarts |
| Suggests: | knitr, posterior, rmarkdown, spelling, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| URL: | https://github.com/paparker/nlfh |
| BugReports: | https://github.com/paparker/nlfh/issues |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 3.5) |
| LazyData: | true |
| NeedsCompilation: | no |
| Packaged: | 2026-06-05 23:09:00 UTC; paparker |
| Author: | Paul Parker [aut, cre] |
| Maintainer: | Paul Parker <paulparker@ucsc.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-12 10:00:14 UTC |
nlfh: Nonlinear Fay-Herriot Models for Small Area Estimation
Description
nlfh fits Bayesian Fay-Herriot models for area-level small area estimation.
The package works with direct estimates, known sampling variances, and
area-level covariates, returning posterior draws and summaries for latent
area quantities and model parameters.
Details
The main entry point is fit_fh(). It supports a formula interface for
specifying the response and covariates, and a matrix interface for workflows
that already construct design matrices. Fitted model objects provide compact
printing, posterior summaries via summary(), fitted area estimates via
fitted(), and retained MCMC draws via posterior_draws().
Available model families are:
-
method = "linear": the standard linear Fay-Herriot model. -
method = "rnn": a random-weight neural network Fay-Herriot extension for nonlinear mean functions. Non-intercept covariates are centered and scaled by default, and the response is standardized internally before returned quantities are transformed back to the original scale. -
method = "bart": a BART Fay-Herriot model for flexible nonlinear and interaction effects. The first model-matrix column is treated as a baseline/intercept column and is excluded from BART variable importance.
Author(s)
Maintainer: Paul Parker paulparker@ucsc.edu
See Also
Useful links:
Example ACS Small Area Data
Description
A small example data set for fitting nonlinear Fay-Herriot models. MedInc
is the direct estimate response and MedIncSE is its standard error. Use
MedIncSE^2 as the sampling variance when fitting models.
Usage
acs_dat
Format
A data frame with 1,617 rows and 8 variables:
- MedInc
Area-level direct estimate of median income.
- MedIncSE
Standard error of
MedInc.- SNAPRate
Area-level SNAP participation rate covariate.
- PovRate
Area-level poverty rate covariate.
- White
Area-level proportion identifying as White.
- Black
Area-level proportion identifying as Black.
- Hispanic
Area-level proportion identifying as Hispanic.
- Asian
Area-level proportion identifying as Asian.
Examples
data(acs_dat)
head(acs_dat)
Fit a Fay-Herriot Model
Description
Primary user-facing model fitting function for linear and nonlinear
Fay-Herriot models. Use method to choose the mean-function model and
control to pass method-specific tuning parameters.
Usage
fit_fh(
y = NULL,
x = NULL,
sampling_variance = NULL,
method = c("linear", "rnn", "bart"),
control = list(),
data = NULL,
formula = NULL,
X = NULL
)
Arguments
y |
Numeric vector of area-level direct estimates for the matrix interface. |
x, X |
Numeric matrix or data frame of area-level covariates for the
matrix interface. Rows must correspond to entries of |
sampling_variance |
Numeric vector of known sampling variances for |
method |
Character string selecting the model. Options are |
control |
Named list of control parameters. Common controls are
|
data |
Optional data frame containing variables used by |
formula |
Optional model formula such as |
Details
Formula inputs are parsed with stats::model.frame() and
stats::model.matrix(). Factors are expanded using R's standard contrast and
dummy-variable rules. An intercept is included when the formula includes one,
which is the default for formulas such as y ~ x1 + x2; use 0 + or - 1
in the formula to omit it. Matrix inputs are used as supplied, so include an
intercept column manually if one is desired.
When scale = TRUE, non-intercept covariates are centered and scaled before
fitting. The default is TRUE for method = "rnn" and FALSE for the
linear and BART methods. For BART, the first baseline/intercept column is
never scaled. The RNN method also standardizes the response and sampling
variances internally, then transforms returned posterior quantities back to
the original response scale.
For method = "rnn" and method = "bart", the formula identifies the
predictors available to the nonlinear mean function. It does not impose an
additive linear model. These methods estimate an unknown function f(X), and
nonlinearities or interactions may be learned implicitly by the model.
sampling_variance is always supplied separately from the formula. When
sampling_variance names a column in data, that column is excluded from
y ~ . expansion.
For method = "bart", the first column of the model matrix is treated as a
baseline/intercept column. With the formula interface, this must be the
default (Intercept) column; formulas that omit the intercept with 0 + or
- 1 are rejected. With the matrix interface, put the baseline or intercept
column first. BART variable importance is computed only for the remaining
columns, so fit$variable_importance does not include the first column.
Value
An object inheriting from nlfh_fit. The first class identifies the
fitted method: nlfh_linear_fit, nlfh_rnn_fit, or nlfh_bart_fit.
References
Parker, P. A. (2024). Nonlinear Fay-Herriot Models for Small Area Estimation Using Random Weight Neural Networks. Journal of Official Statistics, 40(2), 317-332. doi:10.1177/0282423X241244671
Parker, P. A. and Eideh, A. (2026). BART-FH: Flexible Nonlinear Modeling for Small Area Estimation. Journal of Survey Statistics and Methodology, 00, 1-18. doi:10.1093/jssam/smaf050
Examples
data(acs_dat)
acs_small <- as.data.frame(acs_dat[1:500, ])
example_control <- list(n_iter = 50, burn_in = 25, progress = FALSE)
fit_linear <- fit_fh(
MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian,
sampling_variance = MedIncSE^2,
data = acs_small,
method = "linear",
control = example_control
)
X <- model.matrix(
MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian,
data = acs_small
)
fit_matrix <- fit_fh(
y = acs_small$MedInc,
X = X,
sampling_variance = acs_small$MedIncSE^2,
method = "linear",
control = example_control
)
fit_rnn <- fit_fh(
MedInc ~ .,
sampling_variance = MedIncSE^2,
data = acs_small,
method = "rnn",
control = example_control
)
fit_bart <- fit_fh(
MedInc ~ SNAPRate + PovRate + White,
sampling_variance = MedIncSE^2,
data = acs_small,
method = "bart",
control = example_control
)
# The default formula intercept is the first model-matrix column and is not
# included in BART variable importance.
fit_bart$variable_importance
Fit a BART Fay-Herriot Model
Description
Fits a Bayesian Fay-Herriot model whose mean function is represented with
Bayesian additive regression trees via the dbarts package.
Usage
fit_fh_bart(
y = NULL,
x = NULL,
sampling_variance = NULL,
formula = NULL,
data = NULL,
X = NULL,
prior_shape = 0.01,
prior_rate = 0.01,
n_iter = 1000,
burn_in = 500,
n_bart_samples = 10,
n_trees = 50,
scale = FALSE,
progress = TRUE
)
Arguments
y |
Numeric vector of area-level direct estimates for the matrix
interface. If the first argument is a formula, it is treated as |
x, X |
Numeric matrix or data frame of area-level covariates for the
matrix interface. Rows must correspond to entries of |
sampling_variance |
Numeric vector of known sampling variances for |
formula |
Optional model formula such as |
data |
Optional data frame containing variables used by |
prior_shape |
Non-negative scalar shape parameter for the inverse-gamma prior on the random-effect variance. |
prior_rate |
Non-negative scalar rate parameter for the inverse-gamma prior on the random-effect variance. |
n_iter |
Positive integer number of MCMC iterations. |
burn_in |
Positive integer number of initial MCMC iterations to discard. |
n_bart_samples |
Positive integer number of BART samples to draw per outer MCMC iteration. |
n_trees |
Positive integer number of trees used by |
scale |
Logical; if |
progress |
Logical; if |
Details
Formula inputs are parsed with stats::model.frame() and
stats::model.matrix(). Factors are expanded using R's standard contrast and
dummy-variable rules. Formula inputs must include an intercept, which is the
default. For this nonlinear method, the formula specifies the available
predictors and does not impose an additive linear mean structure. The BART
mean component estimates an unknown function f(X).
The first model-matrix column is treated as a baseline/intercept column and
is excluded from BART splitting variables. With the formula interface this is
the default (Intercept) column; formulas that omit the intercept with 0 +
or - 1 are rejected. With the matrix interface, put the baseline or
intercept column first. BART variable importance is computed only for the
remaining columns.
Value
An object of class nlfh_bart_fit and nlfh_fit, a list with
posterior draws for predictions, the BART mean function mean, random effects
random_effects, random-effect variance random_effect_variance,
variable_importance, the scalar dic, and MCMC metadata.
References
Parker, P. A. and Eideh, A. (2026). BART-FH: Flexible Nonlinear Modeling for Small Area Estimation. Journal of Survey Statistics and Methodology, 00, 1-18. doi:10.1093/jssam/smaf050
Examples
data(acs_dat)
acs_small <- as.data.frame(acs_dat[1:500, ])
fit <- fit_fh_bart(
MedInc ~ SNAPRate + PovRate + White,
sampling_variance = MedIncSE^2,
data = acs_small,
n_iter = 50,
burn_in = 25,
progress = FALSE
)
summary(fit)
fit$variable_importance
Fit a Linear Fay-Herriot Model
Description
Fits the basic Bayesian Fay-Herriot model with a linear mean function and area-level random effects.
Usage
fit_fh_linear(
y = NULL,
x = NULL,
sampling_variance = NULL,
formula = NULL,
data = NULL,
X = NULL,
prior_beta_variance = 10000^2,
prior_shape = 0.1,
prior_rate = 0.1,
n_iter = 1000,
burn_in = 500,
scale = FALSE,
progress = TRUE
)
Arguments
y |
Numeric vector of area-level direct estimates for the matrix
interface. If the first argument is a formula, it is treated as |
x, X |
Numeric matrix or data frame of area-level covariates for the
matrix interface. Rows must correspond to entries of |
sampling_variance |
Numeric vector of known sampling variances for |
formula |
Optional model formula such as |
data |
Optional data frame containing variables used by |
prior_beta_variance |
Positive scalar prior variance for the regression coefficients. |
prior_shape |
Non-negative scalar shape parameter for the inverse-gamma prior on the random-effect variance. |
prior_rate |
Non-negative scalar rate parameter for the inverse-gamma prior on the random-effect variance. |
n_iter |
Positive integer number of MCMC iterations. |
burn_in |
Positive integer number of initial MCMC iterations to discard. |
scale |
Logical; if |
progress |
Logical; if |
Details
Formula inputs are parsed with stats::model.frame() and
stats::model.matrix(). Factors are expanded using R's standard contrast and
dummy-variable rules. An intercept is included when the formula includes one,
which is the default; matrix inputs are used as supplied.
Value
An object of class nlfh_linear_fit and nlfh_fit, a list with
posterior draws for predictions, random_effect_variance, coefficients,
mean, the scalar dic, and MCMC metadata.
Examples
data(acs_dat)
acs_small <- as.data.frame(acs_dat[1:500, ])
fit <- fit_fh_linear(
MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian,
sampling_variance = MedIncSE^2,
data = acs_small,
n_iter = 50,
burn_in = 25,
progress = FALSE
)
summary(fit)
Fit a Random-Weight Neural Network Fay-Herriot Model
Description
Fits a Bayesian Fay-Herriot model whose mean function is represented by a fixed random hidden layer with logistic activation and sampled output-layer coefficients.
Usage
fit_fh_rnn(
y = NULL,
x = NULL,
sampling_variance = NULL,
formula = NULL,
data = NULL,
X = NULL,
n_hidden = 200,
prior_beta_variance = NULL,
prior_shape = 0.1,
prior_rate = 0.1,
n_iter = 1000,
burn_in = 500,
scale = TRUE,
progress = TRUE
)
Arguments
y |
Numeric vector of area-level direct estimates for the matrix
interface. If the first argument is a formula, it is treated as |
x, X |
Numeric matrix or data frame of area-level covariates for the
matrix interface. Rows must correspond to entries of |
sampling_variance |
Numeric vector of known sampling variances for |
formula |
Optional model formula such as |
data |
Optional data frame containing variables used by |
|
Positive integer number of hidden nodes in the random-weight neural network. | |
prior_beta_variance |
Optional positive scalar prior variance for the
output-layer coefficients. When |
prior_shape |
Non-negative scalar shape parameter for the inverse-gamma prior on the random-effect variance. |
prior_rate |
Non-negative scalar rate parameter for the inverse-gamma prior on the random-effect variance. |
n_iter |
Positive integer number of MCMC iterations. |
burn_in |
Positive integer number of initial MCMC iterations to discard. |
scale |
Logical; if |
progress |
Logical; if |
Details
Formula inputs are parsed with stats::model.frame() and
stats::model.matrix(). Factors are expanded using R's standard contrast and
dummy-variable rules. An intercept is included when the formula includes one,
which is the default; matrix inputs are used as supplied. For this nonlinear
method, the formula specifies the available predictors and does not impose an
additive linear mean structure. The model estimates an unknown function
f(X).
The response and sampling variances are standardized internally before fitting the RNN. Posterior predictions, mean function draws, coefficients, random-effect variances, and DIC are transformed back to the original response scale before being returned.
Value
An object of class nlfh_rnn_fit and nlfh_fit, a list with
posterior draws for predictions, random_effect_variance,
coefficient_variance, hidden-layer coefficients, mean, the scalar
dic, and MCMC metadata.
References
Parker, P. A. (2024). Nonlinear Fay-Herriot Models for Small Area Estimation Using Random Weight Neural Networks. Journal of Official Statistics, 40(2), 317-332. doi:10.1177/0282423X241244671
Examples
data(acs_dat)
acs_small <- as.data.frame(acs_dat[1:500, ])
fit <- fit_fh_rnn(
MedInc ~ .,
sampling_variance = MedIncSE^2,
data = acs_small,
n_iter = 50,
burn_in = 25,
progress = FALSE
)
summary(fit)
Extract Fitted Values from a Nonlinear Fay-Herriot Model
Description
For fitted nlfh_fit objects, fitted values are posterior summaries of the
area-level quantities theta_i. The posterior draws are stored in the fitted
object as an n_areas by n_draws matrix.
Usage
## S3 method for class 'nlfh_fit'
fitted(
object,
statistic = c("mean", "median"),
summary = TRUE,
full = FALSE,
...
)
Arguments
object |
An object inheriting from class |
statistic |
Character string selecting the posterior point summary to
return when |
summary |
Logical. If |
full |
Logical. If |
... |
Additional arguments passed to methods. Currently unused. |
Value
If summary = TRUE and full = FALSE, a named numeric vector of
posterior means or medians. If summary = TRUE and full = TRUE, a data
frame of posterior summaries. If summary = FALSE, the posterior draw
matrix for theta_i.
Extract Posterior Draws from a Fay-Herriot Model
Description
Extract posterior draws stored in fitted Bayesian Fay-Herriot model objects. Draws are returned with one row per retained MCMC draw and one column per requested parameter or area-level quantity.
Usage
posterior_draws(object, variable = c("theta", "beta", "u", "A"), ...)
## S3 method for class 'nlfh_fit'
posterior_draws(object, variable = c("theta", "beta", "u", "A"), ...)
Arguments
object |
A fitted model object. |
variable |
Character string naming the posterior quantity to extract.
Supported values are |
... |
Additional arguments passed to methods. |
Value
A posterior::draws_df object when the posterior package is
installed. Otherwise, a data frame with one row per posterior draw.
Examples
data(acs_dat)
acs_small <- as.data.frame(acs_dat[1:500, ])
fit <- fit_fh(
MedInc ~ SNAPRate + PovRate + White + Black + Hispanic + Asian,
sampling_variance = MedIncSE^2,
data = acs_small,
method = "linear",
control = list(n_iter = 50, burn_in = 25, progress = FALSE)
)
theta_draws <- posterior_draws(fit, variable = "theta")
beta_draws <- posterior_draws(fit, variable = "beta")
Print a Fitted Nonlinear Fay-Herriot Model
Description
Provides a compact overview of a fitted nlfh_fit object. The print method
reports the model type, formula or input interface, number of areas, number
of posterior draws, MCMC settings, a short variance-component summary, and
DIC when available. It intentionally avoids printing posterior draw matrices,
coefficient tables, or other large objects.
Usage
## S3 method for class 'nlfh_fit'
print(x, ...)
Arguments
x |
An object inheriting from class |
... |
Additional arguments passed to methods. Currently unused. |
Value
The input object x, invisibly.
Print a Summary of a Fitted Nonlinear Fay-Herriot Model
Description
Prints a compact view of a summary.nlfh_fit object. Large area-level and
coefficient summaries are truncated to their first rows.
Usage
## S3 method for class 'summary.nlfh_fit'
print(x, digits = 4, max_rows = 6, ...)
Arguments
x |
An object returned by |
digits |
Number of significant digits to print. |
max_rows |
Maximum number of rows to print from coefficient and area-level summaries. |
... |
Additional arguments passed to methods. Currently unused. |
Value
The input object x, invisibly.
Summarize a Fitted Nonlinear Fay-Herriot Model
Description
Computes posterior summaries from the MCMC draws stored in a fitted
nlfh_fit object. The returned object is structured for downstream use and
is not printed directly by summary().
Usage
## S3 method for class 'nlfh_fit'
summary(object, ...)
Arguments
object |
An object inheriting from class |
... |
Additional arguments passed to methods. Currently unused. |
Details
Posterior summaries include the mean, standard deviation, median, 2.5%
quantile, and 97.5% quantile. Summaries are computed for area-level
estimates theta_i, variance parameters, and regression or hidden-layer
coefficients when those draws are available.
Value
An object with classes summary.nlfh_fit and summary, containing:
- call
Original model call.
- method
Model fitting method.
- formula
Formula, if the formula interface was used.
- model_type
Human-readable model type.
- mcmc
Stored MCMC settings and draw counts.
- diagnostics
Available MCMC diagnostics and metadata.
- coefficients
Posterior summaries of coefficient draws, if present.
- variance
Posterior summaries of variance parameters.
- areas
Posterior summaries of area-level estimates
theta_i.- dic
DIC, if available.
- variable_importance
BART variable-importance proportions, if available. The first model-matrix column is treated as a baseline/intercept column and is excluded.