nonprobsvy News and Updates
pop.size, controlSel, controlOut and controlInf were renamed to pop_size, control_sel, control_out and control_inf, respectively.
genSimData was removed completely, as it is not used anywhere in the package.
maxLik_method was renamed to maxlik_method in the control_sel function.
control_out function:
  predictive_match was renamed to pmm_match_type to align with the PMM (Predictive Mean Matching) estimator naming convention, where all related parameters start with pmm_.
control_sel function:
  method was removed, as it was not used.
  est_method_sel was renamed to est_method.
  h was renamed to gee_h_fun to make it more readable to the user.
  start_type now accepts only zero and mle (for gee models only).
control_inf function:
  bias_inf was renamed to vars_combine and its type changed to logical: TRUE if variables (their levels) should be combined after the variable selection algorithm for the doubly robust approach.
  pi_ij – argument removed, as it is not used.
nonprobsvy class was renamed to nonprob, and all related methods were adjusted to this change.
logit_model_nonprobsvy, probit_model_nonprobsvy and cloglog_model_nonprobsvy were removed in favour of the more readable method_ps function, which specifies the propensity score model.
control_inference=control_inf(vars_combine=TRUE) allows the doubly robust estimator to combine variables prior to estimation, i.e. if selection=~x1+x2 and y~x1+x3, then the following models are fitted: selection=~x1+x2+x3 and y~x1+x2+x3. By default we set control_inference=control_inf(vars_combine=FALSE). Note that this behaviour is assumed independently of variable selection.
nonprob(weights=NULL) was replaced with nonprob(case_weights=NULL) to stress that this refers to case weights, not sampling or other weights, in the non-probability sample.
jvs (Job Vacancy Survey; a probability sample survey) and admin (Central Job Offers Database; a non-probability sample survey). The units and auxiliary variables have been aligned in a way that allows the data to be integrated using the methods implemented in this package.
check_balance function was added to check the balance in the weighted totals of the variables between the non-probability and probability samples.
na_action with default na.omit
weights – returns IPW weights
update – allows updating the nonprob class object
method_ps – for modelling the propensity score
method_glm – for modelling y using the glm function
method_nn – for the NN method
method_pmm – for the PMM method
method_npar – for the non-parametric method
print.nonprob, summary.nonprob and print.nonprob_summary methods
> result_mi
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.1817
- variable y2: 1.8087
- selected estimators:
- variable y1: 2.9498 (se=0.0420, ci=(2.8674, 3.0322))
- variable y2: 1.5760 (se=0.0326, ci=(1.5122, 1.6399))
The number of digits can be changed using print(x, digits), as shown below:
> print(result_mi,2)
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.18
- variable y2: 1.81
- selected estimators:
- variable y1: 2.95 (se=0.04, ci=(2.87, 3.03))
- variable y2: 1.58 (se=0.03, ci=(1.51, 1.64))
> summary(result_mi) |> print(digits=2)
A nonprob_summary object
- call: nonprob(data = subset(population, flag_bd1 == 1), outcome = y1 +
    y2 ~ x1 + x2, svydesign = sample_prob)
- estimator type: mass imputation
- nonprob sample size: 693011 (69.3%)
- prob sample size: 1000 (0.1%)
- population size: 1000000 (fixed: false)
- detailed information about models are stored in list element(s): "outcome"
----------------------------------------------------------------
- distribution of outcome residuals:
- y1: min: -4.79; mean: 0.00; median: 0.00; max: 4.54
- y2: min: -4.96; mean: -0.00; median: -0.07; max: 12.25
- distribution of outcome predictions (nonprob sample):
- y1: min: -2.72; mean: 3.18; median: 3.04; max: 16.28
- y2: min: -1.55; mean: 1.81; median: 1.58; max: 13.92
- distribution of outcome predictions (prob sample):
- y1: min: -0.46; mean: 2.95; median: 2.84; max: 10.31
- y2: min: -0.58; mean: 1.58; median: 1.39; max: 7.87
----------------------------------------------------------------
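The variable-combining rule noted earlier for vars_combine=TRUE (selection=~x1+x2 and y~x1+x3 both becoming models on x1, x2 and x3) can be illustrated with base R formula tools. This is only a sketch of the rule as stated in these notes, not the package's internal code: the helper name combine_formulas is hypothetical, and it works on variable names only, so interactions or transformations in the formulas are not preserved.

```r
# Hypothetical helper (not part of nonprobsvy): give the selection and the
# outcome formula the union of their right-hand-side variables.
combine_formulas <- function(selection, outcome) {
  # selection is one-sided (~ x1 + x2); outcome is two-sided (y ~ x1 + x3),
  # so outcome[[3]] is its right-hand side and outcome[[2]] its response.
  rhs_vars <- union(all.vars(selection), all.vars(outcome[[3]]))
  list(
    selection = reformulate(rhs_vars),
    outcome = reformulate(rhs_vars, response = deparse(outcome[[2]]))
  )
}

combined <- combine_formulas(selection = ~ x1 + x2, outcome = y ~ x1 + x3)

deparse(combined$selection)  # "~x1 + x2 + x3"
deparse(combined$outcome)    # "y ~ x1 + x2 + x3"
```

With vars_combine=FALSE (the default stated above), the two formulas would simply be used as supplied.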
formula.tools
strata is not supported for the time being.
maxit argument from controlSel function to the internally used nleqslv function
vector in model_frame when predicting y_hat in the mass imputation glm model when X is based on one auxiliary variable only - fix provided by converting it to a data.frame object.
summary about quality of estimation based on the difference between estimated and known total values of auxiliary variables
controlOut function by switching values for the predictive_match argument. From now on, predictive_match = 1 means \(\hat{y}-\hat{y}\) matching in predictive mean matching imputation and predictive_match = 2 corresponds to \(\hat{y}-y\) matching.
div option when variable selection (more in documentation) for doubly robust estimation.
nonprob output such as gradient, hessian and jacobian derived from IPW estimation for mle and gee methods when an IPW or DR model is executed.
nonprob output when an IPW or DR model is executed.
model_frame matrix data from probability sample used for mass imputation to nonprob when an MI or DR model is executed.
logit, complementary log-log and probit link functions.
generalized linear models, nearest neighbours and predictive mean matching methods for Mass Imputation
SCAD, LASSO and MCP penalization equations
analytic and bootstrap (with parallel computation - doSNOW package) variance for described estimators
nonprob class such as
  nobs for sample size
  pop.size for population size estimation
  residuals for residuals of the inverse probability weighting model
  cooks.distance for identifying influential observations that have a significant impact on the parameter estimates
  hatvalues for measuring the leverage of individual observations
  logLik for computing the log-likelihood of the model
  AIC (Akaike Information Criterion) for evaluating the model based on the trade-off between goodness of fit and complexity, helping in model selection
  BIC (Bayesian Information Criterion) for a similar purpose as AIC but with a stronger penalty for model complexity
  confint for calculating confidence intervals around parameter estimates
  vcov for obtaining the variance-covariance matrix of the parameter estimates
  deviance for assessing the goodness of fit of the model
R-cmd check
nonprob function.
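The two predictive_match conventions mentioned above for controlOut (1 for \(\hat{y}-\hat{y}\) matching, 2 for \(\hat{y}-y\) matching) can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: the simulated data, the helper nearest_donor, the use of a single nearest donor, and the unweighted means. It does not reproduce the package's implementation, which is not shown in these notes.

```r
set.seed(123)

# Hypothetical toy data: y is observed only in the non-probability sample
# (donors); the probability sample (recipients) carries x only.
nonprob_df <- data.frame(x = rnorm(200))
nonprob_df$y <- 1 + 2 * nonprob_df$x + rnorm(200)
prob_df <- data.frame(x = rnorm(50))

# Outcome model fitted on the non-probability sample.
fit <- lm(y ~ x, data = nonprob_df)
y_hat_nonprob <- unname(fitted(fit))
y_hat_prob <- unname(predict(fit, newdata = prob_df))

# Hypothetical helper: index of the single closest donor for each recipient.
nearest_donor <- function(recipient, donor) {
  vapply(recipient, function(r) which.min(abs(donor - r)), integer(1))
}

# predictive_match = 1: predicted values matched to predicted values.
donors_1 <- nearest_donor(y_hat_prob, y_hat_nonprob)
# predictive_match = 2: predicted values matched to observed values.
donors_2 <- nearest_donor(y_hat_prob, nonprob_df$y)

# In both cases the imputed value is the observed y of the matched donor.
y_imp_1 <- nonprob_df$y[donors_1]
y_imp_2 <- nonprob_df$y[donors_2]

c(type_1 = mean(y_imp_1), type_2 = mean(y_imp_2))
```

The only difference between the two variants is which donor quantity enters the distance; the imputed values themselves always come from observed outcomes.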