Application of the BLE to categorical data

(From Section 4 of Gonçalves, Moura and Migon, “Bayes linear estimation for finite population with emphasis on categorical data”)

When the population can be divided into mutually exclusive categories, the BLE_Categorical() function computes the Bayes Linear Estimator for the proportion of individuals in each category. It takes the following parameters:

\(y_s\) - \(k\)-vector with the sample proportion of each category;
\(n\) - sample size;
\(N\) - total size of the population;
\(m\) - \(k\)-vector with the prior proportion of each category. If NULL, the sample proportion of each category will be used (non-informative prior);
\(rho\) - matrix with the prior correlation coefficients between two different units within categories. It must be a symmetric square matrix of dimension \(k\) (or \(k-1\)). If NULL, a non-informative prior will be used (see below).
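
The constraints in the parameter list above can be checked before calling the function. The following is a minimal sketch, not part of the package: the helper name check_ble_inputs() is hypothetical, and the checks (proportions summing to one, n not exceeding N) are assumptions read off the parameter descriptions rather than the package's own validation.

```r
# Hypothetical helper (not part of the package): basic sanity checks on the
# arguments described above, intended to run before BLE_Categorical().
check_ble_inputs <- function(ys, n, N, m = NULL, rho = NULL) {
  k <- length(ys)
  stopifnot(
    all(ys >= 0),
    isTRUE(all.equal(sum(ys), 1)),  # assumed: sample proportions sum to one
    n > 0, n <= N                   # assumed: sample cannot exceed population
  )
  if (!is.null(m)) {
    stopifnot(length(m) == k, all(m >= 0), isTRUE(all.equal(sum(m), 1)))
  }
  if (!is.null(rho)) {
    stopifnot(
      is.matrix(rho),
      nrow(rho) == ncol(rho),       # square
      nrow(rho) %in% c(k, k - 1),   # dimension k or k - 1
      isSymmetric(rho)              # symmetric
    )
  }
  invisible(TRUE)
}

# Inputs of the three-category example shown later in this document
check_ble_inputs(
  ys  = c(0.2, 0.5, 0.3), n = 100, N = 10000,
  m   = c(0.4, 0.1, 0.5),
  rho = matrix(c(0.4, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.6), 3, 3)
)
```

The call stops with an error if any condition fails and returns TRUE invisibly otherwise.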

Vague Prior Distribution

Letting \(\rho_{ii} \to 1\), that is, assuming prior ignorance, the resulting point estimate coincides with the design-based estimate for categorical data.

This can be achieved with the BLE_Categorical() function by omitting the prior proportions, the parameter rho, or both, that is:

\(m =\) NULL - sample proportions in each category will be used
\(rho =\) NULL - \(\rho_{ii} \to 1\) and \(\rho_{ij} = 0, i \neq j\)

R and Vs Matrices

If the calculation of matrices R and Vs results in non-positive definite matrices, a warning will be displayed. In general this does not affect the proportion estimates, but the associated variance may be incorrect or inconsistent. In that case, review the prior correlation coefficients (parameter rho).
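
As a rough illustration of what positive definiteness means here, the sketch below tests a symmetric matrix through its eigenvalues. The helper is_positive_definite() is hypothetical and is applied to a prior correlation matrix only for demonstration; the package performs its own check on the internally computed R and Vs, which this sketch does not reproduce.

```r
# Hypothetical helper: a symmetric matrix is positive definite when all of
# its eigenvalues are strictly positive (up to a small numerical tolerance).
is_positive_definite <- function(mat, tol = 1e-8) {
  if (!isSymmetric(mat)) return(FALSE)
  eigenvalues <- eigen(mat, symmetric = TRUE, only.values = TRUE)$values
  all(eigenvalues > tol)
}

# Prior correlation matrix used in the three-category example of this document
rho <- matrix(c(0.4, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.6), 3, 3)
is_positive_definite(rho)
#> [1] TRUE

# A symmetric matrix with a negative eigenvalue (3 and -1) fails the check
is_positive_definite(matrix(c(1, 2, 2, 1), 2, 2))
#> [1] FALSE
```

A positive definite rho does not guarantee that R and Vs are positive definite, so the warning issued by the function remains the authoritative signal.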

Examples

Example presented in the article cited above (2 categories)

ys <- c(0.2614, 0.7386)
n <- 153
N <- 15288
m <- c(0.7, 0.3)
rho <- matrix(0.1, 1)
Estimator <- BLE_Categorical(ys,n,N,m,rho)

Estimator$est.prop
#> [1] 0.2855228 0.7144772
Estimator$Vest.prop
#>              [,1]         [,2]
#> [1,]  0.001155671 -0.001155671
#> [2,] -0.001155671  0.001155671

Below we can see that the greater the correlation coefficient, the closer the estimate gets to the sample proportions.

ys <- c(0.2614, 0.7386)
n <- 153
N <- 15288
m <- c(0.7, 0.3)
rho <- matrix(0.5, 1)
Estimator <- BLE_Categorical(ys,n,N,m,rho)

Estimator$est.prop
#> [1] 0.2642195 0.7357805
Estimator$Vest.prop
#>               [,1]          [,2]
#> [1,]  0.0006750388 -0.0006750388
#> [2,] -0.0006750388  0.0006750388
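
The effect of the correlation coefficient can be quantified directly from the two outputs above. This short sketch only reuses the printed estimates (copied by hand, so they carry the rounding of the printout); the variable names est_rho_01 and est_rho_05 are introduced here for illustration.

```r
ys <- c(0.2614, 0.7386)

# Estimates copied from the printed outputs above
est_rho_01 <- c(0.2855228, 0.7144772)  # rho = 0.1
est_rho_05 <- c(0.2642195, 0.7357805)  # rho = 0.5

# Absolute distance between each estimate and the sample proportions
dist_rho_01 <- abs(est_rho_01 - ys)
dist_rho_05 <- abs(est_rho_05 - ys)

dist_rho_01
dist_rho_05

# TRUE when the rho = 0.5 estimate is closer to ys in every category
all(dist_rho_05 < dist_rho_01)
```

With rho = 0.1 the estimate sits roughly 0.024 away from the sample proportions, while with rho = 0.5 the gap shrinks to roughly 0.003. The estimated variance also drops, from about 0.00116 to about 0.00068.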

Example from the help page (3 categories)

ys <- c(0.2, 0.5, 0.3)
n <- 100
N <- 10000
m <- c(0.4, 0.1, 0.5)
mat <- c(0.4, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.6)
rho <- matrix(mat, 3, 3)

Estimator <- BLE_Categorical(ys,n,N,m,rho)

Estimator$est.prop
#> [1] 0.2221967 0.4785131 0.2992902
Estimator$Vest.prop
#>               [,1]          [,2]          [,3]
#> [1,]  0.0013711226 -0.0004980297 -0.0008730929
#> [2,] -0.0004980297  0.0006722052 -0.0001741755
#> [3,] -0.0008730929 -0.0001741755  0.0010472684
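
A few consistency checks can be run on this output. The sketch below copies the printed values by hand (the names est_prop and Vest_prop are chosen here for illustration) and relies on one assumption: since the estimated proportions sum to one, each row of their covariance matrix is expected to sum to zero.

```r
# Values copied from the printed output above
est_prop <- c(0.2221967, 0.4785131, 0.2992902)
Vest_prop <- matrix(
  c( 0.0013711226, -0.0004980297, -0.0008730929,
    -0.0004980297,  0.0006722052, -0.0001741755,
    -0.0008730929, -0.0001741755,  0.0010472684),
  nrow = 3, byrow = TRUE
)

# Estimated proportions should sum to one
isTRUE(all.equal(sum(est_prop), 1, tolerance = 1e-6))

# Each row of the covariance matrix should sum to (numerically) zero
all(abs(rowSums(Vest_prop)) < 1e-9)

# Variances on the diagonal should be strictly positive
all(diag(Vest_prop) > 0)
```

Each of the three checks should return TRUE for the values printed above.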

Same example, but with no prior correlation coefficients supplied (non-informative prior)

ys <- c(0.2, 0.5, 0.3)
n <- 100
N <- 10000
m <- c(0.4, 0.1, 0.5)

Estimator <- BLE_Categorical(ys,n,N,m,rho=NULL)
#> parameter 'rho' not informed, non informative prior correlation coefficients used in estimations
#> Warning in BLE_Categorical(ys, n, N, m, rho = NULL): 'Vest.prop' should have
#> only positive diagonal values. Review prior specification and verify calculated
#> matrices 'R' and 'Vs'.

Estimator$est.prop
#> [1] 0.2017585 0.4996729 0.2985685