Getting Started with citestR

Overview

citestR is a lightweight R client for the citest Python package. It lets you run the conditional-independence-of-missingness test without using reticulate at runtime: all communication happens over a local HTTP connection to a FastAPI server that wraps the Python package.
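To make the HTTP hand-off concrete, the sketch below shows one way a data frame with missing values could be serialized to JSON before being sent to a local server. It is an illustration only: this vignette does not show how citestR encodes its payloads, so the column-wise layout and the `na = "null"` choice are assumptions, and `toy`, `payload`, and `decoded` are hypothetical names.

```r
library(jsonlite)

# Small data frame with a missing value, similar to the examples in this vignette
toy <- data.frame(
  Y  = c(1.2, -0.4, 0.7),
  X1 = c(NA, 0.5, -1.1)
)

# Assumption: column-wise JSON with NA encoded as null
payload <- toJSON(toy, dataframe = "columns", na = "null")

# Round-trip to confirm the missing value survives the encoding
decoded <- fromJSON(payload)
is.na(decoded$X1[1])
```

Encoding `NA` as JSON `null` keeps missingness explicit on the wire, which matters for a test whose subject is the missingness pattern itself.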

1. A small executable example

The chunk below runs when the vignette is built. It uses mocked HTTP responses, so it exercises the package interface without requiring a live Python backend.

# Build a mock handler that ignores the request and returns `body` as a JSON response
mock_json_response <- function(body, status = 200L) {
  function(req) {
    httr2::response(
      status_code = status,
      headers = list("Content-Type" = "application/json"),
      body = charToRaw(jsonlite::toJSON(body, auto_unbox = TRUE))
    )
  }
}

# Save the package's internal connection state so it can be restored afterwards
pkg_env <- citestR:::.pkg_env
old_process <- pkg_env$process
old_port <- pkg_env$port
old_base_url <- pkg_env$base_url
on.exit({
  pkg_env$process <- old_process
  pkg_env$port <- old_port
  pkg_env$base_url <- old_base_url
}, add = TRUE)

# Pretend a server is already running on port 9999
pkg_env$process <- list(is_alive = function() TRUE)
pkg_env$port <- 9999L
pkg_env$base_url <- "http://127.0.0.1:9999"

# Mock a ci_test response
mock_fit <- mock_json_response(list(
  model_id = "test-001",
  dataset_id = "ds-001",
  results = list(
    m = 0.12,
    B = 0.04,
    W_bar = 0.01,
    T = 0.05,
    t_k = 2.5,
    p_k = 0.017,
    p_2s = 0.034,
    df = 9
  )
))

example_data <- data.frame(
  Y  = c(1.2, -0.4, 0.7, 0.3, -1.1),
  X1 = c(NA, 0.5, -1.1, 0.8, NA),
  X2 = c(0.3, 1.4, -0.2, 0.6, 0.9)
)

# Run ci_test() with every HTTP request answered by the mock
result <- httr2::with_mocked_responses(mock_fit, {
  ci_test(example_data, y = "Y", m = 2L, n_folds = 2L)
})

result
#> 
#>   Conditional Independence of Missingness Test
#> 
#>   Mean diff in BCE: 0.12 
#>   t-statistic:     2.5 
#>   df:              9 
#>   p-value:         0.034 (two-sided)
#> 
#>   Result: Evidence against conditional independence (p < 0.05)
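
As a sanity check on the printed output, the p-value can be reproduced in base R from the t-statistic and degrees of freedom. This assumes the reported value is a two-sided Student-t tail probability, which the mocked numbers above are consistent with; the variable names are illustrative.

```r
# Values taken from the mocked response above
t_stat <- 2.5
dof    <- 9

# One-sided and two-sided tail probabilities under a t distribution
p_one_sided <- pt(abs(t_stat), df = dof, lower.tail = FALSE)
p_two_sided <- 2 * p_one_sided

round(c(one_sided = p_one_sided, two_sided = p_two_sided), 3)
```

Rounded to three decimals, these agree with the `p_k = 0.017` and `p_2s = 0.034` fields in the mock.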

2. Install the Python backend

If you don’t already have a Python environment with citest installed, the package provides a helper:

library(citestR)

# Creates a virtualenv called "citest_env" and installs the citest API backend
install_backend(method = "pip")

You only need to do this once.

3. Run a test

library(citestR)

# Example data frame with some missing values
set.seed(42)
n <- 500
df <- data.frame(
  Y  = rnorm(n),
  X1 = rnorm(n),
  X2 = rnorm(n),
  X3 = rnorm(n)
)
# Introduce MAR missingness on X2
df$X2[df$X1 > 0.5] <- NA

# Run the CI test (server starts automatically)
result <- ci_test(
  data       = df,
  y          = "Y",
  imputer    = "iterative",
  classifier = "rf",
  m          = 5L,
  n_folds    = 5L
)

result$results

The first call starts the Python server in the background; subsequent calls reuse the running process.
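
The start-once, reuse-afterwards behaviour can be pictured with a small base R sketch. This is not citestR's internal code: `state`, `fake_process()`, and `ensure_server()` are hypothetical stand-ins, and a real client would launch an actual background process instead of the fake object used here.

```r
# Hypothetical package-level state, similar in spirit to an internal environment
state <- new.env()
state$process <- NULL
state$starts  <- 0L

# Stand-in for launching a background server; returns an object with is_alive()
fake_process <- function() {
  state$starts <- state$starts + 1L
  list(is_alive = function() TRUE)
}

# Start the server only if there is no live process yet
ensure_server <- function() {
  if (is.null(state$process) || !state$process$is_alive()) {
    state$process <- fake_process()
  }
  invisible(state$process)
}

ensure_server()  # first call: starts the (fake) server
ensure_server()  # second call: reuses the existing process
state$starts     # counts how many times a start was needed
```

Checking `is_alive()` on every call also covers the case where the background process has died, since the next call would then start a fresh one.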

4. Retrieve a summary

summary_info <- get_summary(result$model_id)
summary_info

5. Imputer diagnostics

r2 <- imputer_r2(result$model_id, mask_frac = 0.2, m_eval = 1L)
r2$mean_r2
r2$per_variable
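
A masked-value R² diagnostic is commonly computed by hiding a fraction of observed entries, imputing them, and comparing the imputations with the hidden truth. The base R sketch below illustrates that general idea with a simple regression imputer. It is an assumption about what `mask_frac` controls, not a description of how `imputer_r2()` is implemented, and all names in it are illustrative.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)   # x2 is partly predictable from x1

# Hide 20% of the fully observed x2 values
mask_frac <- 0.2
masked    <- sample(n, size = round(mask_frac * n))
x2_obs    <- x2
x2_obs[masked] <- NA

# Impute the hidden values from a regression fitted on the remaining rows
fit     <- lm(x2_obs ~ x1)            # lm() drops rows with NA by default
imputed <- predict(fit, newdata = data.frame(x1 = x1[masked]))

# R^2 of imputed versus true values on the masked entries only
truth     <- x2[masked]
r2_masked <- 1 - sum((truth - imputed)^2) / sum((truth - mean(truth))^2)
r2_masked
```

A value near 1 would indicate that the imputer recovers hidden entries well; a value near 0 would mean it does little better than predicting the mean.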

6. Sensitivity calibration

# Single kappa value
compute_kappa(r2_x_z = 0.5, beta_yx = 0.3, gamma_x = 0.2)

# Full calibration table
cal <- kappa_calibration_table()
head(cal)

# Pivot for a fixed beta
calibration_pivot(beta_yx = 0.3)

7. Simulated datasets

sim <- simulate_data("single_mar", n = 300, ci = TRUE)
sim$dataset_id
sim$pct_missing

8. Stopping the server

The server shuts down automatically when the R session ends. To stop it manually:

stop_server()
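
Automatic shutdown at session end is typically arranged in R by registering a finalizer on a long-lived environment. The sketch below shows that base R mechanism with a fake server handle. Whether citestR uses `reg.finalizer()` or another hook is not stated in this vignette, and `srv` and `stop_fake_server()` are hypothetical names.

```r
# Hypothetical handle for a running server
srv <- new.env()
srv$running <- TRUE

# Idempotent stop function: safe to call manually or from the finalizer
stop_fake_server <- function(env) {
  if (isTRUE(env$running)) {
    env$running <- FALSE
    message("server stopped")
  }
  invisible(NULL)
}

# onexit = TRUE asks R to run the finalizer when the session ends as well
reg.finalizer(srv, stop_fake_server, onexit = TRUE)

# Manual stop; the later finalizer call then has nothing left to do
stop_fake_server(srv)
srv$running
```

Making the stop function idempotent means a manual stop followed by the automatic one at exit does no harm.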