Testing the CAR assumption

All estimation in seine rests on the Conditional Average Representativeness (CAR) assumption: that individual outcomes are mean-independent of predictor group membership, conditional on the observed covariates. ei_test_car() provides a formal test of this assumption. However, the test has important limitations that users should understand before interpreting its results.

What the test does

The CAR assumption implies that the conditional expectation function (CEF) of the aggregate outcome takes a specific partially linear form. ei_test_car() tests this implication by comparing a fully nonparametric estimate of the CEF to one constrained to that form, and evaluating the goodness-of-fit difference via a Wald statistic. A significant result indicates that the data are inconsistent with the partially linear structure implied by CAR.

By default, the p-value is computed by a permutation test on the Wald statistic (Kennedy and Cade 1996). For large samples (2,000 or more observations), the default switches to the asymptotic chi-squared reference distribution, which is faster to compute.
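
To make the two kinds of p-value concrete, here is a minimal base-R sketch on a toy linear model. It is not the procedure used inside ei_test_car(): the helper wald_stat(), the residual-permutation scheme, and the add-one p-value convention are all assumptions chosen for illustration, and the package's exact scheme may differ.

```r
set.seed(1)
n = 100
z = rnorm(n)
x = rnorm(n)
y = 1 + z + rnorm(n) # the null is true: x has no effect given z

# Hypothetical helper (not part of seine): Wald-type statistic for
# adding x to a model that already contains z
wald_stat = function(y, x, z) {
    fit0 = lm(y ~ z)
    fit1 = lm(y ~ z + x)
    rss0 = sum(resid(fit0)^2)
    rss1 = sum(resid(fit1)^2)
    (rss0 - rss1) / (rss1 / fit1$df.residual)
}

w_obs = wald_stat(y, x, z)

# Permute the residuals of the reduced model, rebuild the outcome,
# and recompute the statistic (an assumed scheme, for illustration only)
fit0 = lm(y ~ z)
w_perm = replicate(200, {
    y_star = fitted(fit0) + sample(resid(fit0))
    wald_stat(y_star, x, z)
})

# Permutation p-value with the add-one convention (assumption)
p_perm = (1 + sum(w_perm >= w_obs)) / (1 + length(w_perm))

# Asymptotic p-value from the chi-squared reference distribution
p_asym = pchisq(w_obs, df = 1, lower.tail = FALSE)

c(permutation = p_perm, asymptotic = p_asym)
```

Under this convention the smallest attainable permutation p-value is 1 / (iterations + 1), which is one reason to use a large number of iterations in practice.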

library(seine)
data(elec_1968)

spec = ei_spec(
    elec_1968,
    predictors = vap_white:vap_other,
    outcome = pres_dem_hum:pres_abs,
    total = pres_total,
    covariates = c(state, pop_city:pop_rural, farm:educ_coll, inc_00_03k:inc_25_99k),
    preproc = function(x) {
        x = model.matrix(~ 0 + ., x) # convert factors to dummies
        bases::b_bart(x, trees = 200)
    }
)

ei_test_car(spec, iter = 200) # use iter = 1000 or more in practice
#> # A tibble: 4 × 4
#>   outcome          W    df p.value
#>   <chr>        <dbl> <int>   <dbl>
#> 1 pres_dem_hum  388.   157   0.005
#> 2 pres_rep_nix  253.   157   0.005
#> 3 pres_ind_wal  443.   157   0.005
#> 4 pres_abs      142.   157   0.665

The output is a data frame with one row per outcome variable. The W column contains the Wald statistic, df its degrees of freedom, and p.value the p-value for each outcome. P-values are not adjusted for multiple testing by default; pass them to p.adjust() if a correction is desired.
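
As a minimal sketch of the multiple-testing correction mentioned above, the p-values from the printed output can be passed to p.adjust(). The data frame below is typed in by hand from that output so that the snippet runs without fitting anything; in practice the same call would be applied to the p.value column of the object returned by ei_test_car(). The choice of the Holm method is an assumption made for illustration.

```r
# p-values copied by hand from the printed output above
res = data.frame(
    outcome = c("pres_dem_hum", "pres_rep_nix", "pres_ind_wal", "pres_abs"),
    p.value = c(0.005, 0.005, 0.005, 0.665)
)

# Holm correction controls the family-wise error rate across outcomes
res$p.adj = p.adjust(res$p.value, method = "holm")
res
```

With these values, the Holm-adjusted p-values for the first three outcomes are 0.02, so they remain below 0.05, while the value for pres_abs is unchanged at 0.665.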

Limitations

ei_test_car() is a useful diagnostic, but its limitations are substantial and should be kept in mind when interpreting the results.

The test only checks a necessary implication of CAR, not CAR itself. CAR is a condition on individual-level data, but only aggregate-level data are observed. The test asks whether the aggregate CEF is inconsistent with CAR; a failure to reject does not mean CAR holds, only that the data are not in conflict with one of its implications. There may be many forms of individual-level confounding that leave the aggregate CEF approximately in the partially linear form, and which the test will not detect.

The test requires a rich basis expansion to have power. If the preproc argument to ei_spec() does not include a flexible basis expansion of the covariates and predictors, the test will have little power to detect violations of CAR. An interaction between the predictors and covariates that is not captured by the basis will not be flagged. A warning is issued if preproc is absent. In general, the richer the basis expansion, the better the test can detect violations, but also the more data are needed for the test statistic to be well-calibrated.
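
The dependence of power on the basis can be illustrated with a toy linear-model analogue in base R. This is not the package's procedure: the simulated data, the two candidate bases, and the use of anova() as the comparison are all assumptions made for illustration. The violation is an x-by-z interaction; only the basis that contains that interaction can detect it.

```r
set.seed(1)
n = 500
x = rnorm(n) # stands in for a predictor
z = rnorm(n) # stands in for a covariate
y = x + z + 0.5 * x * z + rnorm(n) # violation: x-by-z interaction

# Restricted model, analogous to the constrained fit
fit0 = lm(y ~ x + z)

# "Poor" basis: flexible in z only, so the interaction is not captured
fit_poor = lm(y ~ x + z + I(z^2))

# "Rich" basis: also includes the interaction
fit_rich = lm(y ~ x + z + I(z^2) + x:z)

# F-test p-values comparing each enlarged model to the restricted one
p_poor = anova(fit0, fit_poor)[["Pr(>F)"]][2]
p_rich = anova(fit0, fit_rich)[["Pr(>F)"]][2]

c(poor_basis = p_poor, rich_basis = p_rich)
```

The comparison based on the rich basis rejects decisively, while the one based on the poor basis has no systematic ability to see the interaction, because the extra term it adds is unrelated to the violation.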

The test may be anti-conservative in small samples. The Wald statistic is only asymptotically chi-squared, and the permutation approximation of the null distribution may also be imperfect when the dimensionality of the basis expansion is large relative to the sample size. In practice, this means the test may reject too often in small samples. The undersmooth argument controls how aggressively the partially linear component is estimated, and increasing it can improve Type I error control at the cost of power.
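
The small-sample problem with an asymptotic chi-squared reference can be seen in a generic Gaussian linear model, sketched below in base R. This is an illustration of the general phenomenon, not of the statistic computed by ei_test_car(); the sample size n and the number of tested coefficients p are hypothetical. With an estimated error variance, the Wald statistic equals p times an F statistic, so the true rejection rate of the chi-squared test can be computed exactly and checked by simulation.

```r
n = 40 # observations (hypothetical)
p = 20 # number of tested coefficients (hypothetical)
alpha = 0.05

# Critical value from the asymptotic chi-squared distribution
crit = qchisq(1 - alpha, df = p)

# Under the null with Gaussian errors, W / p ~ F(p, n - p - 1),
# so the actual rejection probability is available in closed form
actual_size = pf(crit / p, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

# Monte Carlo check of the same quantity
set.seed(1)
reject = replicate(2000, {
    X = matrix(rnorm(n * p), n, p)
    y = rnorm(n) # the null is true: y is unrelated to X
    fit = lm(y ~ X)
    rss1 = sum(resid(fit)^2)
    rss0 = sum((y - mean(y))^2)
    W = (rss0 - rss1) / (rss1 / fit$df.residual)
    W > crit
})

c(exact = actual_size, simulated = mean(reject))
```

Both numbers exceed the nominal 0.05 level, and the gap shrinks as n grows relative to p.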

A significant result does not prevent estimation. Rejecting the null means the data suggest CAR does not hold exactly. It does not mean that estimation with ei_est() is impossible or useless, only that the estimates may be biased. In that case, the sensitivity analysis tools in vignette("sensitivity") are important for assessing how much the conclusions depend on the assumption. Conversely, a non-significant result is only weak evidence that the assumption holds, and it does not substitute for careful subject-matter reasoning about which confounders might be present.

References

Helwig, N. E. (2022). Robust permutation tests for penalized splines. Stats, 5(3), 916-933.

Kennedy, P. E., & Cade, B. S. (1996). Randomization tests for multiple regression. Communications in Statistics - Simulation and Computation, 25(4), 923-936.

McCartan, C., & Kuriwaki, S. (2025+). Identification and semiparametric estimation of conditional means from aggregate data. Working paper arXiv:2509.20194.

This vignette was originally produced by a large language model, and then reviewed and edited by the package authors.
