Type: Package
Title: Fast Calculation of Feature Contributions in Boosting Trees
Version: 1.0
Date: 2026-03-02
Description: Computes feature-specific R-squared (R2) contributions for boosting tree models using a Shapley-value-based decomposition of the total R-squared in polynomial time. Supports models fitted with 'XGBoost' and 'LightGBM', and provides efficient parallel implementations suitable for large-scale problems. Multiple visualization tools are included for interpreting and communicating feature contributions. The methodology is described in Jiang, Zhang, and Zhang (2025) <doi:10.48550/arXiv.2407.03515>.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
URL: https://github.com/catstats/Q-SHAP_R
BugReports: https://github.com/catstats/Q-SHAP_R/issues
Imports: Rcpp (≥ 1.0.14), xgboost (≥ 3.1.3.1), parallel, lightgbm, viridisLite, ggplot2, jsonlite, methods, progress
Suggests: shiny
LinkingTo: Rcpp, RcppEigen
RoxygenNote: 7.3.2
Encoding: UTF-8
NeedsCompilation: yes
Packaged: 2026-03-09 19:50:27 UTC; jiangzhongli
Author: Steven He [aut], Zhongli Jiang [aut, cre], Dabao Zhang [aut]
Maintainer: Zhongli Jiang <zhongli.jiang.stats@gmail.com>
Repository: CRAN
Date/Publication: 2026-03-16 16:00:07 UTC

Calculating Feature-Specific R-Squared Values for Boosting Trees

Description

The qshap package computes feature-specific R-squared values for boosting trees fitted with xgboost or lightgbm, using a Shapley decomposition of the total R-squared. It supports parallel computing.

Details

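The package provides fast computation of feature importance through Shapley values for tree ensemble models. Main functions include gazer(), which creates an explainer from a fitted model; rsq() and its alias qshap(), which compute feature-specific R-squared values; qshap_loss() and its alias loss(), which compute feature-specific loss contributions; and plot_qshap() together with the vis environment, which visualize the results.

The Shapley values are computed in polynomial time, and multi-core processing is built in.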
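
To make the idea of a Shapley decomposition of R-squared concrete, the sketch below enumerates every feature subset by brute force on a three-feature toy problem. It is an illustration only and not the package's algorithm: it refits a least-squares model for each subset and takes exponential time, whereas the package works on a fitted boosting model in polynomial time. The helper r2_subset() and all variable names are hypothetical.

```r
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 3), nrow = n,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)

# Value function: R-squared of a least-squares fit on a feature subset
r2_subset <- function(idx) {
  if (length(idx) == 0) return(0)
  fit <- lm.fit(cbind(1, X[, idx, drop = FALSE]), y)
  1 - sum(fit$residuals^2) / sum((y - mean(y))^2)
}

p <- ncol(X)
phi <- setNames(numeric(p), colnames(X))
for (j in seq_len(p)) {
  others <- setdiff(seq_len(p), j)
  # Each mask encodes one subset S of the remaining features
  for (mask in 0:(2^length(others) - 1)) {
    in_S <- (mask %/% 2^(seq_along(others) - 1)) %% 2 == 1
    S <- others[in_S]
    # Shapley weight |S|! (p - |S| - 1)! / p!
    w <- factorial(length(S)) * factorial(p - length(S) - 1) / factorial(p)
    phi[j] <- phi[j] + w * (r2_subset(c(S, j)) - r2_subset(S))
  }
}

round(phi, 3)
# The contributions add up to the R-squared of the full model
all.equal(sum(phi), r2_subset(seq_len(p)))
```

The final check shows the property that motivates the decomposition: the per-feature values sum to the total R-squared.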

Author(s)

Steven He, Zhongli Jiang, Min Zhang, Dabao Zhang

References

Zhongli Jiang, Min Zhang, and Dabao Zhang. 2025. Fast calculation of feature contributions in boosting trees. In Proceedings of the Forty-First Conference on Uncertainty in Artificial Intelligence (UAI '25), Vol. 286. JMLR.org, Article 82, 1859–1875.

See Also

Useful links: https://github.com/catstats/Q-SHAP_R. Report bugs at https://github.com/catstats/Q-SHAP_R/issues.

Examples

library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
phi_rsq <- rsq(explainer, X, y)


Coercion method to data.frame for qshap_result

Description

Converts a qshap_result object to a data.frame with one row per feature.

Usage

## S3 method for class 'qshap_result'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

Arguments

x

A qshap_result object

row.names

Not used

optional

Not used

...

Additional arguments (currently unused)

Value

A data.frame with columns feature (character) and rsq (numeric), sorted by rsq in decreasing order.
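
The shape of the returned data.frame can be sketched with a stand-in class. The class name toy_result and its fields rsq and feature_names are hypothetical and chosen only for illustration; the internal layout of a real qshap_result object is not documented here.

```r
# Hypothetical stand-in object; not the package's qshap_result
toy <- structure(
  list(rsq = c(0.10, 0.55, 0.02), feature_names = c("a", "b", "c")),
  class = "toy_result"
)

as.data.frame.toy_result <- function(x, row.names = NULL, optional = FALSE, ...) {
  df <- data.frame(feature = x$feature_names, rsq = x$rsq,
                   stringsAsFactors = FALSE)
  df <- df[order(df$rsq, decreasing = TRUE), ]  # largest contribution first
  rownames(df) <- NULL
  df
}

df <- as.data.frame(toy)
df
```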


Create a QSHAP Tree Explainer

Description

Creates an explainer object for computing feature-specific Shapley values from a trained tree ensemble model. Supports XGBoost and LightGBM models.

Usage

gazer(model, max_depth = NULL, base_score = NULL, ...)

Arguments

model

A model object of class xgboost or xgb.Booster from xgboost, or class lgb.Booster from lightgbm

max_depth

Maximum depth of trees, extracted from model by default.

base_score

Base score for predictions, extracted from model by default.

...

Additional arguments, for future use

Value

An object of class qshap_tree_explainer containing the model information and preprocessed tree structures for fast Shapley value computation.

Examples

library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)


Alias for qshap_loss

Description

This is a convenience alias for qshap_loss() that provides a shorter function name for calculating feature-specific loss contributions.

Usage

loss(explainer, x, y, y_mean_ori = NULL)

Arguments

explainer

A qshap_tree_explainer object created by gazer()

x

Feature matrix or data frame

y

Response vector

y_mean_ori

Optional pre-computed mean of y (for efficiency)

Value

A matrix of loss contributions with dimensions (n_samples, n_features)
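
One way to read such a matrix is sketched below on made-up numbers. It assumes that each row sums to the change in squared error relative to predicting mean(y), and that feature-specific R-squared is the negated column sum divided by the total sum of squares. Both conventions are assumptions for illustration and are not confirmed by this manual; the weights w are arbitrary.

```r
set.seed(7)
n <- 50
y <- rnorm(n)
pred <- y + rnorm(n, sd = 0.3)   # stand-in for model predictions
sst <- sum((y - mean(y))^2)      # total sum of squares

# Per-sample change in squared error relative to predicting mean(y)
delta <- (y - pred)^2 - (y - mean(y))^2

# Hypothetical split of each sample's change across three features
w <- c(f1 = 0.6, f2 = 0.3, f3 = 0.1)
loss_matrix <- outer(delta, w)   # n x 3 matrix whose rows sum to delta

rsq_by_feature <- -colSums(loss_matrix) / sst
total_rsq <- 1 - sum((y - pred)^2) / sst

dim(loss_matrix)
all.equal(sum(rsq_by_feature), total_rsq)
```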

See Also

qshap_loss

Examples

library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
loss_matrix <- loss(explainer, X, y)
dim(loss_matrix)


Constructor for qshap_result class

Description

Creates a qshap_result object to store Q-SHAP R-squared results

Usage

new_qshap_result(
  rsq,
  feature_names = NULL,
  total_rsq = NULL,
  n_samples = NULL,
  n_features = NULL,
  loss = NULL
)

Arguments

rsq

Numeric vector of feature-specific R-squared values

feature_names

Character vector of feature names (optional)

total_rsq

Numeric total R-squared (sum of feature-specific values)

n_samples

Integer number of samples used

n_features

Integer number of features

loss

Optional loss matrix (n_samples x n_features)

Value

An object of class qshap_result


Constructor for qshap_tree_explainer class

Description

Creates a qshap_tree_explainer object

Usage

new_qshap_tree_explainer(
  model,
  model_type,
  max_depth,
  base_score = NULL,
  trees,
  store_v_invc,
  store_z
)

Arguments

model

The original tree model object

model_type

Character string indicating model type ("xgboost" or "lightgbm")

max_depth

Integer maximum tree depth

base_score

Numeric base score (for XGBoost)

trees

List of tree objects

store_v_invc

Precomputed complex values for SHAP computation

store_z

Precomputed root values for SHAP computation

Value

An object of class qshap_tree_explainer


Constructor for simple_tree class

Description

Creates a simple_tree object with validation

Usage

new_simple_tree(
  children_left,
  children_right,
  feature,
  threshold,
  max_depth,
  n_node_samples,
  value,
  node_count
)

Arguments

children_left

Integer vector of left child indices (-1 for leaf nodes)

children_right

Integer vector of right child indices (-1 for leaf nodes)

feature

Integer vector of feature indices used for splitting (-1 for leaf nodes)

threshold

Numeric vector of threshold values for splits

max_depth

Integer maximum depth of the tree

n_node_samples

Integer vector of sample counts at each node

value

Numeric vector of node values

node_count

Integer total number of nodes in the tree

Value

An object of class simple_tree
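
The array layout described by these arguments can be illustrated with a plain list and a small traversal function. The sketch assumes 0-based child and feature indices with -1 marking leaves, and that observations with a value less than or equal to the threshold go left; both are assumptions, and predict_one() is a hypothetical helper rather than part of the package.

```r
# A depth-1 tree: node 0 splits on feature 0 at 0.5; nodes 1 and 2 are leaves
tree <- list(
  children_left  = c(1L, -1L, -1L),
  children_right = c(2L, -1L, -1L),
  feature        = c(0L, -1L, -1L),
  threshold      = c(0.5, NA, NA),
  value          = c(0.0, -1.0, 1.0)
)

predict_one <- function(tree, x) {
  node <- 0L
  # R vectors are 1-based, so add 1 when indexing with a 0-based node id
  while (tree$children_left[node + 1L] != -1L) {
    f <- tree$feature[node + 1L] + 1L
    if (x[f] <= tree$threshold[node + 1L]) {
      node <- tree$children_left[node + 1L]
    } else {
      node <- tree$children_right[node + 1L]
    }
  }
  tree$value[node + 1L]
}

predict_one(tree, c(0.2))
predict_one(tree, c(0.9))
```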


Constructor for tree_summary class

Description

Creates a tree_summary object with validation

Usage

new_tree_summary(
  children_left,
  children_right,
  feature,
  feature_uniq,
  threshold,
  max_depth,
  sample_weight,
  init_prediction,
  node_count
)

Arguments

children_left

Integer vector of left child indices

children_right

Integer vector of right child indices

feature

Integer vector of feature indices

feature_uniq

Integer vector of unique feature indices used in tree

threshold

Numeric vector of threshold values

max_depth

Integer maximum depth

sample_weight

Numeric vector of sample weights per node

init_prediction

Numeric vector of initial predictions per node

node_count

Integer total number of nodes

Value

An object of class tree_summary
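
As a rough illustration of how summary quantities such as feature_uniq and max_depth relate to the node arrays, the sketch below derives them for a five-node tree. It assumes 0-based indices, -1 for missing children and leaf features, and a node order in which parents precede their children; none of this is confirmed for the package's internal representation.

```r
children_left  <- c(1L, 3L, -1L, -1L, -1L)
children_right <- c(2L, 4L, -1L, -1L, -1L)
feature        <- c(2L, 0L, -1L, -1L, -1L)

# Unique features that appear in at least one split
feature_uniq <- sort(unique(feature[feature >= 0L]))

# Depth of every node, filled in from the root downwards
depth <- integer(length(feature))
for (node in seq_along(feature)) {
  for (child in c(children_left[node], children_right[node])) {
    if (child != -1L) depth[child + 1L] <- depth[node] + 1L
  }
}
max_depth <- max(depth)

feature_uniq
max_depth
```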


Plot method for qshap_rsq objects

Description

This S3 method enables 'plot(x, ...)' where 'x' is a 'qshap_rsq' object. It dispatches to the visualization functions in 'vis'.

Usage

## S3 method for class 'qshap_rsq'
plot(
  x,
  y = NULL,
  type = c("rsq", "elbow", "cumu", "gcorr", "hist", "density", "loss"),
  ...
)

Arguments

x

A 'qshap_rsq' object.

y

Not used.

type

Plot type: one of "rsq", "elbow", "cumu", "gcorr", "hist", "density", or "loss".

...

Passed to the underlying visualization function.

Value

A ggplot2 object (invisibly).
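
The dispatch described above, where a type string selects a function stored in an environment, can be sketched in base R. The names toy_vis and toy_plot are hypothetical, and the stand-in functions return labels instead of drawing anything.

```r
# Hypothetical stand-ins for the plotting functions; they only return labels
toy_vis <- new.env()
toy_vis$rsq   <- function(x, ...) "bar plot"
toy_vis$elbow <- function(x, ...) "elbow plot"

toy_plot <- function(x, type = c("rsq", "elbow"), ...) {
  type <- match.arg(type)          # first choice is the default; others error
  out <- toy_vis[[type]](x, ...)   # look up the function by name and call it
  invisible(out)
}

res_default <- toy_plot(1)
res_elbow <- toy_plot(1, type = "elbow")
res_bad <- tryCatch(toy_plot(1, type = "pie"), error = function(e) "error")
```

Because match.arg() validates the argument against the choices in the function signature, an unsupported type fails early with an informative error.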


Plot Q-SHAP R-squared contributions

Description

Convenience wrapper that works for both a 'qshap_rsq' object and a plain numeric vector of contributions. Use this if you have a numeric vector and still want to pass arguments like 'color_map_name'.

Usage

plot_qshap(
  x,
  type = c("rsq", "elbow", "cumu", "gcorr", "hist", "density", "loss"),
  ...
)

Arguments

x

A 'qshap_rsq' object (recommended) or a numeric vector.

type

Plot type; see 'plot.qshap_rsq'. Use '"loss"' to launch the interactive explorer (requires a loss matrix).

...

Additional arguments passed to the underlying visualization function (e.g., 'label', 'rotation', 'color_map_name', 'max_feature').

Value

The ggplot2 plot object (invisibly)

Examples

library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15L, max_depth = 2L, verbosity = 0L, nthreads = 1L)
explainer <- gazer(model)
phi_rsq <- rsq(explainer, X, y)
plot(phi_rsq)


Print method for qshap_result

Description

Prints a summary of a qshap_result object, showing the top n features.

Usage

## S3 method for class 'qshap_result'
print(x, n = 10, ...)

Arguments

x

A qshap_result object

n

Integer number of top features to display (default: 10)

...

Additional arguments (currently unused)

Value

The input x is returned invisibly. Called primarily for its side effect of printing a summary of the qshap_result object to the console.
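
The convention of printing for the side effect and returning the input invisibly can be sketched with a stand-in class. The class toy_result, its rsq field, and the printed layout are hypothetical and do not reproduce the package's actual output.

```r
print.toy_result <- function(x, n = 10, ...) {
  top <- head(sort(x$rsq, decreasing = TRUE), n)
  cat("Top", length(top), "of", length(x$rsq), "features by R-squared\n")
  print(round(top, 4))
  invisible(x)   # lets callers keep piping or assigning the object
}

obj <- structure(list(rsq = c(a = 0.10, b = 0.55, c = 0.02)),
                 class = "toy_result")
res <- print(obj, n = 2)
```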


Print method for qshap_tree_explainer

Description

Print method for qshap_tree_explainer

Usage

## S3 method for class 'qshap_tree_explainer'
print(x, ...)

Arguments

x

A qshap_tree_explainer object

...

Additional arguments (currently unused)

Value

The input x is returned invisibly. Called primarily for its side effect of printing a summary of the qshap_tree_explainer object to the console.


Print method for simple_tree

Description

Print method for simple_tree

Usage

## S3 method for class 'simple_tree'
print(x, ...)

Arguments

x

A simple_tree object

...

Additional arguments (currently unused)

Value

The input x is returned invisibly. Called primarily for its side effect of printing a summary of the simple_tree object to the console.


Print method for tree_summary

Description

Print method for tree_summary

Usage

## S3 method for class 'tree_summary'
print(x, ...)

Arguments

x

A tree_summary object

...

Additional arguments (currently unused)

Value

The input x is returned invisibly. Called primarily for its side effect of printing a summary of the tree_summary object to the console.


Alias for rsq

Description

This is a convenience alias for rsq() that provides a shorter function name for calculating feature-specific R-squared values.

Usage

qshap(
  explainer,
  x,
  y,
  feature_names = NULL,
  local = FALSE,
  nsample = NULL,
  sd_out = TRUE,
  ci_out = TRUE,
  level = 0.95,
  nfrac = NULL,
  random_state = 42,
  ncore = 1L
)

Arguments

explainer

A qshap_tree_explainer object created by gazer()

x

Feature matrix or data frame with n samples and p features

y

Response vector of length n

feature_names

Character vector of feature names. If NULL, uses column names from x.

local

Logical; if TRUE, returns both R-squared values and loss matrix

nsample

Optional integer; number of samples to use (random subsample if less than nrow(x))

sd_out

Logical; if TRUE, returns standard deviations of R-squared estimates

ci_out

Logical; if TRUE, returns Wald-style confidence intervals for each feature's R-squared (normal approximation using sd_rsq)

level

Confidence level for the intervals (default 0.95)

nfrac

Optional numeric in (0,1); fraction of samples to use (alternative to nsample)

random_state

Integer seed for reproducible sampling

ncore

Number of cores for parallel processing. Use -1 for all available cores, or a positive integer. Default is 1 (no parallelization)

Value

A qshap_result object; see rsq for details.
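
How nsample, nfrac, and random_state might interact can be sketched as follows. The helper resolve_rows() is hypothetical, and its precedence rule (nfrac is converted to a row count first) is an assumption, not the package's documented behavior.

```r
resolve_rows <- function(n, nsample = NULL, nfrac = NULL, random_state = 42) {
  # Convert a fraction into a row count (assumed precedence)
  if (!is.null(nfrac)) nsample <- max(1, floor(nfrac * n))
  # No subsampling requested, or the request covers all rows
  if (is.null(nsample) || nsample >= n) return(seq_len(n))
  set.seed(random_state)   # note: this resets the global RNG state
  sort(sample.int(n, nsample))
}

rows_a <- resolve_rows(100, nsample = 10)
rows_b <- resolve_rows(100, nsample = 10)
rows_frac <- resolve_rows(100, nfrac = 0.25)
rows_all <- resolve_rows(100)
```

Fixing the seed makes repeated calls select the same rows, which is the purpose of a random_state argument.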

See Also

rsq

Examples

library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
phi_rsq <- qshap(explainer, X, y)
print(phi_rsq)


S3 Class Constructors and Methods for qshap

Description

This file contains formal S3 class definitions, constructors, validators, and methods for the qshap package objects.
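
The constructor, validator, and user-friendly helper trio that recurs throughout this manual follows a common S3 pattern, sketched here for a hypothetical toy_result class. The checks shown are illustrative; the package's own validators may test different conditions.

```r
# Low-level constructor: builds the object without checking it
new_toy_result <- function(rsq, feature_names = NULL) {
  structure(list(rsq = rsq, feature_names = feature_names),
            class = "toy_result")
}

# Validator: returns the object invisibly or stops with an error
validate_toy_result <- function(x) {
  if (!is.numeric(x$rsq)) {
    stop("rsq must be numeric", call. = FALSE)
  }
  if (!is.null(x$feature_names) && length(x$feature_names) != length(x$rsq)) {
    stop("feature_names must have one entry per rsq value", call. = FALSE)
  }
  invisible(x)
}

# User-facing helper: construct, validate, and return visibly
toy_result <- function(rsq, feature_names = NULL) {
  x <- new_toy_result(rsq, feature_names)
  validate_toy_result(x)
  x
}

ok <- toy_result(c(0.5, 0.2), c("a", "b"))
bad <- tryCatch(toy_result(c(0.5, 0.2), "a"),
                error = function(e) conditionMessage(e))
```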


Calculate Q-SHAP Loss Contributions

Description

Computes the feature-specific loss contributions using Q-SHAP decomposition. This is an internal function typically called by rsq().

Usage

qshap_loss(explainer, x, y, y_mean_ori = NULL)

Arguments

explainer

A qshap_tree_explainer object created by gazer()

x

Feature matrix or data frame

y

Response vector

y_mean_ori

Optional pre-computed mean of y (for efficiency)

Value

A matrix of loss contributions with dimensions (n_samples, n_features)


User-friendly constructor for qshap_result

Description

Creates a qshap_result object and validates it before returning.

Usage

qshap_result(
  rsq,
  feature_names = NULL,
  total_rsq = NULL,
  n_samples = NULL,
  n_features = NULL,
  loss = NULL
)

Arguments

rsq

Numeric vector of feature-specific R-squared values

feature_names

Character vector of feature names (optional)

total_rsq

Numeric total R-squared (sum of feature-specific values)

n_samples

Integer number of samples used

n_features

Integer number of features

loss

Optional loss matrix (n_samples x n_features)

Value

A validated qshap_result object


Calculate Feature-Specific R-Squared Values

Description

Computes feature-specific R-squared values using Q-SHAP decomposition. Supports parallel processing and sampling for large datasets.

Usage

qshap_rsq(
  explainer,
  x,
  y,
  local = FALSE,
  nsample = NULL,
  sd_out = TRUE,
  ci_out = TRUE,
  level = 0.95,
  nfrac = NULL,
  random_state = 42,
  ncore = 1L
)

Arguments

explainer

A qshap_tree_explainer object created by gazer()

x

Feature matrix or data frame with n samples and p features

y

Response vector of length n

local

Logical; if TRUE, returns both R-squared values and loss matrix

nsample

Optional integer; number of samples to use (random subsample if less than nrow(x))

sd_out

Logical; if TRUE, returns standard deviations of R-squared estimates

ci_out

Logical; if TRUE, returns Wald-style confidence intervals for each feature's R-squared (normal approximation using sd_rsq)

level

Confidence level for the intervals (default 0.95)

nfrac

Optional numeric in (0,1); fraction of samples to use (alternative to nsample)

random_state

Integer seed for reproducible sampling

ncore

Number of cores for parallel processing. Use -1 for all available cores, or a positive integer. Default is 1 (no parallelization)

Value

If local=FALSE (default), returns a numeric vector of length p containing feature-specific R-squared values. If local=TRUE, returns a list with components rsq (the R-squared vector) and loss (an n x p matrix of loss contributions). When ci_out=TRUE, the returned list also contains ci_lower and ci_upper vectors representing Wald-style confidence intervals.
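
A Wald-style interval under a normal approximation has the form estimate plus or minus a normal quantile times the standard deviation. The sketch below uses made-up numbers and treats sd_rsq as the standard error of each estimate, which is an assumption about how the package defines it.

```r
rsq_hat <- c(x1 = 0.45, x2 = 0.40, x3 = 0.01)   # made-up estimates
sd_rsq  <- c(x1 = 0.03, x2 = 0.04, x3 = 0.005)  # made-up standard deviations
level <- 0.95

# Two-sided normal quantile for the requested confidence level
z <- qnorm(1 - (1 - level) / 2)

ci_lower <- rsq_hat - z * sd_rsq
ci_upper <- rsq_hat + z * sd_rsq

round(cbind(rsq = rsq_hat, ci_lower, ci_upper), 3)
```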

Examples

library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
phi_rsq <- qshap(explainer, X, y)
print(phi_rsq)


Calculate Feature-Specific R-Squared Values

Description

Computes feature-specific R-squared values using Q-SHAP decomposition and returns them as a qshap_result object instead of the plain vector or list returned by qshap_rsq(). The qshap_result object stores feature names, the total R², and sample counts alongside the values, and has print(), summary(), and as.data.frame() methods for easier analysis.

Usage

rsq(
  explainer,
  x,
  y,
  feature_names = NULL,
  local = FALSE,
  nsample = NULL,
  sd_out = TRUE,
  ci_out = TRUE,
  level = 0.95,
  nfrac = NULL,
  random_state = 42,
  ncore = 1L
)

Arguments

explainer

A qshap_tree_explainer object created by gazer()

x

Feature matrix or data frame with n samples and p features

y

Response vector of length n

feature_names

Character vector of feature names. If NULL, uses column names from x.

local

Logical; if TRUE, returns both R-squared values and loss matrix

nsample

Optional integer; number of samples to use (random subsample if less than nrow(x))

sd_out

Logical; if TRUE, returns standard deviations of R-squared estimates

ci_out

Logical; if TRUE, returns Wald-style confidence intervals for each feature's R-squared (normal approximation using sd_rsq)

level

Confidence level for the intervals (default 0.95)

nfrac

Optional numeric in (0,1); fraction of samples to use (alternative to nsample)

random_state

Integer seed for reproducible sampling

ncore

Number of cores for parallel processing. Use -1 for all available cores, or a positive integer. Default is 1 (no parallelization)
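
The ncore convention, where -1 means all available cores, could be resolved along the following lines using parallel::detectCores(). The helper resolve_ncore() is hypothetical, and capping the request at the number of detected cores is an assumption.

```r
resolve_ncore <- function(ncore = 1L) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L   # detectCores() may return NA
  if (ncore == -1L) return(as.integer(available))
  if (ncore < 1L) stop("ncore must be -1 or a positive integer", call. = FALSE)
  as.integer(min(ncore, available))       # never ask for more than exist
}

one_core <- resolve_ncore(1L)
all_cores <- resolve_ncore(-1L)
invalid <- tryCatch(resolve_ncore(0L), error = function(e) "error")
```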

Details

This function provides a user-friendly interface for Q-SHAP R² computation.

Value

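A qshap_result object containing the feature-specific R-squared values together with the feature names, the total R-squared, and the numbers of samples and features used. When local = TRUE, it also carries the loss matrix.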

See Also

qshap_result

Examples

library(xgboost)
set.seed(42)
n <- 100
p <- 100
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.2)
model <- xgboost(X, y, nrounds = 15, max_depth = 2, verbose = 0)
explainer <- gazer(model)
result <- rsq(explainer, X, y)
print(result)


User-friendly constructor for simple_tree

Description

Creates a simple_tree object and validates it before returning.

Usage

simple_tree(
  children_left,
  children_right,
  feature,
  threshold,
  max_depth,
  n_node_samples,
  value,
  node_count
)

Arguments

children_left

Integer vector of left child indices (-1 for leaf nodes)

children_right

Integer vector of right child indices (-1 for leaf nodes)

feature

Integer vector of feature indices used for splitting (-1 for leaf nodes)

threshold

Numeric vector of threshold values for splits

max_depth

Integer maximum depth of the tree

n_node_samples

Integer vector of sample counts at each node

value

Numeric vector of node values

node_count

Integer total number of nodes in the tree

Value

A validated simple_tree object


Summary method for qshap_result

Description

Summary method for qshap_result

Usage

## S3 method for class 'qshap_result'
summary(object, ...)

Arguments

object

A qshap_result object

...

Additional arguments (currently unused)

Value

The input object is returned invisibly. Called primarily for its side effect of printing a detailed summary of the qshap_result object to the console.


Summary method for qshap_rsq objects

Description

Provides a summary of the qshap_rsq object, showing the top features by R-squared contribution

Usage

## S3 method for class 'qshap_rsq'
summary(object, n = 10, ...)

Arguments

object

A qshap_rsq object

n

Integer number of top features to display (default: 10)

...

Additional arguments (currently unused)

Value

The input object is returned invisibly. Called primarily for its side effect of printing a summary of the qshap_rsq object to the console.


Summary method for qshap_tree_explainer

Description

Provides detailed summary information about the explainer

Usage

## S3 method for class 'qshap_tree_explainer'
summary(object, ...)

Arguments

object

A qshap_tree_explainer object

...

Additional arguments (currently unused)

Value

The input object is returned invisibly. Called primarily for its side effect of printing a detailed summary of the qshap_tree_explainer object to the console.


User-friendly constructor for tree_summary

Description

Creates a tree_summary object and validates it before returning.

Usage

tree_summary(
  children_left,
  children_right,
  feature,
  feature_uniq,
  threshold,
  max_depth,
  sample_weight,
  init_prediction,
  node_count
)

Arguments

children_left

Integer vector of left child indices

children_right

Integer vector of right child indices

feature

Integer vector of feature indices

feature_uniq

Integer vector of unique feature indices used in tree

threshold

Numeric vector of threshold values

max_depth

Integer maximum depth

sample_weight

Numeric vector of sample weights per node

init_prediction

Numeric vector of initial predictions per node

node_count

Integer total number of nodes

Value

A validated tree_summary object


Validator for qshap_result

Description

Checks that a qshap_result object is well formed and stops with an error if it is not.

Usage

validate_qshap_result(x)

Arguments

x

A qshap_result object

Value

The validated object (invisibly) or stops with an error


Validator for qshap_tree_explainer

Description

Validator for qshap_tree_explainer

Usage

validate_qshap_tree_explainer(x)

Arguments

x

A qshap_tree_explainer object

Value

The validated object (invisibly) or stops with an error


Validator for simple_tree class

Description

Validator for simple_tree class

Usage

validate_simple_tree(x)

Arguments

x

A simple_tree object

Value

The validated object (invisibly) or stops with an error


Validator for tree_summary class

Description

Validator for tree_summary class

Usage

validate_tree_summary(x)

Arguments

x

A tree_summary object

Value

The validated object (invisibly) or stops with an error


Visualization Module for Q-SHAP Results

Description

An environment containing visualization functions for Q-SHAP results. Access functions using vis$rsq(), vis$elbow(), etc.

Usage

vis

Format

An environment with visualization functions:

rsq

Bar plot of feature-specific R-squared values

elbow

Elbow plot showing top contributing features

cumu

Cumulative explained variance plot

gcorr

Generalized correlation plot (square root of R-squared)

hist

Histogram of feature-specific R-squared contributions

density

Density plot of feature-specific R-squared contributions

loss

Interactive loss explorer (requires shiny)
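
The quantities behind several of these plots are simple transformations of the R-squared vector, as the base R sketch below shows on made-up values. It computes the data only and draws nothing; clipping negative values before taking the square root is an assumption, not documented behavior.

```r
rsq_values <- c(x1 = 0.45, x2 = 0.40, x3 = 0.01, x4 = 0.03)  # made-up values

ranked <- sort(rsq_values, decreasing = TRUE)  # order for a bar or elbow plot
cumulative <- cumsum(ranked)                   # cumulative explained variance
gcorr <- sqrt(pmax(ranked, 0))                 # square root of R-squared

round(rbind(ranked, cumulative, gcorr), 3)
```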
