| Type: | Package |
| Title: | Linear SVM-Based Recursive Decision Trees |
| Version: | 0.1.0 |
| Description: | Implements Support Vector Machine Oblique Decision Trees (SVMODT). Recursively builds classification trees using linear Support Vector Machines (SVM) hyperplanes at each node instead of axis-parallel splits, creating oblique decision boundaries. Features include multiple feature selection methods, dynamic feature subset strategies, class weight support for imbalanced datasets, pruning and feature penalization. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| LazyData: | true |
| Suggests: | knitr, rmarkdown, bookdown, testthat (≥ 3.0.0), rpart, rsample, gridExtra, tidyr, kableExtra, palmerpenguins, dplyr |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 3.5) |
| Imports: | rlang, e1071, FSelectorRcpp, ggplot2 |
| RoxygenNote: | 7.3.3 |
| Config/testthat/edition: | 3 |
| URL: | https://github.com/AneeshAgarwala/svmodt |
| BugReports: | https://github.com/AneeshAgarwala/svmodt/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-06-24 09:10:26 UTC; AneeshAG |
| Author: | Aneesh Agarwal [aut, cre, cph], Jack Jewson [aut, ths], Erik Sverdrup [aut, ths] |
| Maintainer: | Aneesh Agarwal <aaga0022@student.monash.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-30 11:10:02 UTC |
Apply a scaler transformation to a data frame
Description
This internal helper function applies a scaling transformation to a data frame using a provided scaler object. It returns the unscaled data in case of failure.
Usage
apply_scaler(df, scaler)
Arguments
df |
A data frame containing numeric features to be scaled. |
scaler |
A scaler object with a 'transform' method or function used to scale the data. |
Details
This function is intended for internal use within the package and is not exported. It wraps the scaler's 'transform()' call in error handling to prevent failures from interrupting higher-level processes.
Value
A scaled data frame. If scaling fails or invalid inputs are provided, the original (unscaled) data frame is returned.
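The fallback behaviour described above can be illustrated with a minimal sketch. The function name apply_scaler_sketch and the two toy scaler objects are hypothetical and are not taken from the package source; the only assumption carried over from the text is that the scaler exposes a transform function and that failures return the input unchanged.

```r
# Hypothetical stand-in for the internal helper; not the package's source.
apply_scaler_sketch <- function(df, scaler) {
  # Return the input untouched when there is nothing usable to apply.
  if (!is.data.frame(df) || !is.list(scaler) || !is.function(scaler$transform)) {
    return(df)
  }
  tryCatch(
    scaler$transform(df),
    error = function(e) df  # fall back to the unscaled data
  )
}

# Toy scalers: one that works, one that always fails.
ok_scaler  <- list(transform = function(d) as.data.frame(scale(d)))
bad_scaler <- list(transform = function(d) stop("scaling failed"))
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 40))

scaled   <- apply_scaler_sketch(df, ok_scaler)   # standardized columns
fallback <- apply_scaler_sketch(df, bad_scaler)  # identical to df
```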
Check whether a decision-value vector crosses zero
Description
Check whether a decision-value vector crosses zero
Usage
boundary_in_grid(dec_values)
Dynamically determine the number of features to consider at a node
Description
Computes the number of features to be used for splitting at a given tree depth based on the specified strategy. Supports constant, decreasing, and random feature selection strategies.
Usage
calculate_dynamic_max_features(
data,
response,
base_max_features,
depth,
strategy = "constant",
decrease_rate = 0.8,
random_range = c(0.3, 1),
verbose = FALSE
)
Arguments
data |
A data frame containing the predictor variables and the response variable. |
response |
A character string specifying the name of the response variable to exclude from the feature set. |
base_max_features |
Integer; the base number of features to consider. If 'NULL', all available features (excluding the response) are used. |
depth |
Integer; the current depth of the node in the tree (used for depth-dependent strategies). |
strategy |
Character string specifying how to determine the number of features. One of "constant", "decrease", or "random". |
decrease_rate |
Numeric; factor (0–1] controlling how fast the number of features decreases with depth when 'strategy = "decrease"'. Default is 0.8. |
random_range |
Numeric vector of length 2 specifying the lower and upper bounds (as proportions of total features) for random selection when 'strategy = "random"'. Default is 'c(0.3, 1.0)'. |
verbose |
Logical; if 'TRUE', prints details about the chosen strategy and resulting feature count. |
Details
This function helps control model complexity and randomness by varying the number of features used at each split.
Input parameters are validated to ensure sensible defaults. The result is capped to avoid exceeding the total number of available features.
Value
Integer; the number of features to consider at the current node. The value is always constrained between 1 and the total number of available features.
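A rough sketch of how the three strategies could behave is given below. The exact formulas are not stated in this manual, so the geometric decay for "decrease" and the uniform draw for "random" are assumptions, and dynamic_max_features_sketch is a hypothetical name that takes the feature count directly rather than a data frame.

```r
# Hypothetical sketch; the package's actual formulas may differ.
dynamic_max_features_sketch <- function(n_features, base_max_features = NULL,
                                        depth = 1, strategy = "constant",
                                        decrease_rate = 0.8,
                                        random_range = c(0.3, 1)) {
  base <- if (is.null(base_max_features)) n_features else base_max_features
  k <- switch(strategy,
    constant = base,
    # Assumed: shrink by decrease_rate for each level below the root.
    decrease = floor(base * decrease_rate^(depth - 1)),
    # Assumed: draw a proportion of all features uniformly from random_range.
    random = floor(n_features * runif(1, random_range[1], random_range[2])),
    stop("unknown strategy: ", strategy)
  )
  # Constrain to [1, n_features], as the Value section states.
  as.integer(max(1, min(k, n_features)))
}

k_decrease <- dynamic_max_features_sketch(10, base_max_features = 6,
                                          depth = 3, strategy = "decrease")
k_constant <- dynamic_max_features_sketch(10)
k_deep     <- dynamic_max_features_sketch(10, depth = 50, strategy = "decrease")
```

Under these assumptions, a base of 6 features at depth 3 with a rate of 0.8 gives floor(6 * 0.64) = 3, and very deep nodes bottom out at 1.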
Calculate feature associations with a response variable
Description
Computes the association strength between each predictor and the response variable.
For numeric predictors, the absolute Pearson correlation is used. For categorical
predictors, association is estimated using an ANOVA-based pseudo-R^2 measure.
Usage
calculate_feature_associations(data, response, predictors)
Arguments
data |
A data frame containing the response and predictor variables. |
response |
A string specifying the response variable name. |
predictors |
A character vector of predictor names to evaluate. |
Details
- **Numeric predictors:** Computed using the absolute Pearson correlation.
- **Categorical predictors:** Uses the square root of the ratio of between-group sum of squares to total sum of squares from an ANOVA model.
Value
A named numeric vector of association values (0 to 1) for each predictor.
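The two measures in Details can be reproduced on toy data with base R. This is only a sketch: anova_association is a hypothetical helper, the data are made up, and coding the class labels as integers is an assumption, since the manual does not say how the response is coded internally.

```r
# sqrt(between-group SS / total SS) for a numeric vector split by groups.
anova_association <- function(values, groups) {
  groups <- droplevels(as.factor(groups))
  grand_mean  <- mean(values)
  group_means <- tapply(values, groups, mean)
  group_sizes <- tapply(values, groups, length)
  ss_between  <- sum(group_sizes * (group_means - grand_mean)^2)
  ss_total    <- sum((values - grand_mean)^2)
  if (ss_total == 0) return(0)
  sqrt(ss_between / ss_total)
}

y     <- factor(c("a", "a", "a", "b", "b", "b"))
y_num <- as.numeric(y)  # assumed numeric coding of the response
x_num <- c(1.0, 1.2, 0.9, 3.1, 2.8, 3.3)
x_cat <- factor(c("u", "u", "v", "v", "w", "w"))

assoc <- c(
  x_num = abs(cor(x_num, y_num)),          # numeric predictor
  x_cat = anova_association(y_num, x_cat)  # categorical predictor
)
```

Both values fall between 0 and 1, matching the range given in the Value section.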
Calculate node impurity
Description
Computes the impurity of a node using either Gini impurity or entropy.
Usage
calculate_impurity(y, method = c("gini", "entropy"))
Arguments
y |
A vector of class labels for the node. |
method |
A string specifying the impurity measure: either "gini" or "entropy". |
Details
If method = "gini", the impurity is calculated as:
G = 1 - \sum_i p_i^2
where p_i is the proportion of samples in class i in the node.
If method = "entropy", the impurity is calculated as:
H = - \sum_i p_i \log(p_i)
Value
A numeric value representing the impurity of the node.
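Both formulas are short enough to write out directly. The sketch below uses hypothetical names (gini_sketch, entropy_sketch) and assumes the natural logarithm, since the base of the logarithm is not stated above.

```r
gini_sketch <- function(y) {
  p <- as.numeric(table(y)) / length(y)  # class proportions p_i
  1 - sum(p^2)
}

entropy_sketch <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]      # treat 0 * log(0) as 0
  -sum(p * log(p))   # natural log assumed
}

y <- c("M", "M", "B", "B")
g <- gini_sketch(y)     # 0.5 for a balanced two-class node
h <- entropy_sketch(y)  # log(2) for a balanced two-class node
```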
Calculate class weights for a node
Description
Computes class weights for a given set of target values based on the chosen weighting strategy. Supports unweighted, balanced, balanced subsample, and custom weighting schemes, with optional verbosity for diagnostic output.
Usage
calculate_node_class_weights(
y,
class_weights = "none",
custom_class_weights = NULL,
verbose = FALSE
)
Arguments
y |
A vector of class labels at the current node. |
class_weights |
Character string specifying the weighting strategy. Options are:
|
custom_class_weights |
Named numeric vector of custom class weights (used only if 'class_weights = "custom"'). Names must match the unique class labels in 'y'. |
verbose |
Logical; if 'TRUE', prints detailed information about computed weights. |
Details
The function caps computed class weights at 10 to avoid excessively large scaling factors.
Value
A named numeric vector of class weights for each unique class in 'y', or 'NULL' if equal weights are used ('class_weights = "none"') or if the custom weights are invalid.
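As an illustration of the cap, the sketch below computes inverse-frequency weights and clips them at 10. The formula n / (k * n_c) is an assumption borrowed from the common "balanced" convention and is not confirmed by this manual; balanced_weights_sketch is a hypothetical name and the labels are made up.

```r
# Hypothetical sketch of "balanced" weights with the cap described in Details.
balanced_weights_sketch <- function(y, cap = 10) {
  counts <- table(y)
  counts <- counts[counts > 0]
  # Assumed formula: n_samples / (n_classes * n_samples_in_class).
  w <- length(y) / (length(counts) * as.numeric(counts))
  w <- pmin(w, cap)  # cap large weights
  names(w) <- names(counts)
  w
}

y <- c(rep("benign", 97), rep("malignant", 3))
w <- balanced_weights_sketch(y)
# benign is about 0.52; malignant would be about 16.7 but is capped at 10
```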
Select a subset of features based on correlation, mutual information, or randomness
Description
Chooses up to a specified number of features from a dataset using one of three methods: random sampling, correlation with the response, or mutual information ranking.
Usage
choose_features(
data,
response,
max_features,
method = c("random", "mutual", "cor"),
n_subsets = 1
)
Arguments
data |
A data frame containing the response and predictor variables. |
response |
A string specifying the response variable name. |
max_features |
Integer specifying the maximum number of features to select. |
method |
Selection strategy. One of "random", "mutual", or "cor". |
Details
- If the number of predictors is less than or equal to max_features, all are returned.
- If method = "mutual" and FSelectorRcpp is not installed or fails,
the function gracefully falls back to the correlation-based method.
- The correlation method internally calls
calculate_feature_associations.
Value
A character vector of selected feature names.
Select features with optional penalty for previously used features
Description
Internal helper function to select a subset of features while optionally penalizing features that have been used in ancestor nodes. Supports random selection, mutual information, or correlation-based ranking.
Usage
choose_features_with_penalty(
data,
response,
max_features,
method = c("random", "mutual", "cor"),
penalize_used = FALSE,
penalty_weight = 0.5,
used_features = character(0),
n_subsets = 1,
verbose = FALSE
)
Arguments
data |
A data frame containing predictors and the response. |
response |
Name of the response variable. |
max_features |
Maximum number of features to select. |
method |
Feature selection method; one of "random", "mutual", or "cor". |
penalize_used |
Logical; if |
penalty_weight |
Numeric (0–0.99); fraction by which to reduce the score/weight of used features. |
used_features |
Character vector of features previously used in the tree. |
verbose |
Logical; if |
Details
- Penalized features have their selection weight or score reduced by multiplying by (1 - penalty_weight).
- For method = "random", the penalty reduces the probability of sampling a feature.
- For method = "mutual" or "cor", the penalty reduces feature importance or correlation.
- If no valid features are available for correlation, the function falls back to random selection with penalty.
- Ensures that no feature is entirely excluded; penalty_weight is capped below 1.
Value
Character vector of selected feature names.
See Also
choose_features, calculate_feature_associations
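For method = "random", the penalty amounts to weighted sampling. A minimal sketch, assuming every unused feature starts with weight 1; sample_with_penalty_sketch and the feature names are hypothetical:

```r
sample_with_penalty_sketch <- function(features, used_features, max_features,
                                       penalty_weight = 0.5) {
  # Cap below 1 so that no feature is excluded outright.
  penalty_weight <- min(max(penalty_weight, 0), 0.99)
  w <- rep(1, length(features))
  w[features %in% used_features] <- 1 - penalty_weight
  k <- min(max_features, length(features))
  sample(features, size = k, prob = w)  # weighted, without replacement
}

set.seed(1)
features <- c("radius", "texture", "area", "smoothness")
picks <- replicate(2000, sample_with_penalty_sketch(features, "radius", 1, 0.9))
freq <- prop.table(table(picks))
```

With a penalty of 0.9, the used feature keeps weight 0.1 against 1 for each of the others, so it is still drawn occasionally but far less often.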
Convert SVM decision values to probabilities
Description
Converts numeric SVM decision values into probabilities using a logistic/sigmoid transformation. Optionally uses the model's training decision values for calibration. Intended for internal use within the SVM tree prediction workflow.
Usage
convert_decision_to_probs(decision_values, model = NULL)
Arguments
decision_values |
Numeric vector of decision values. |
model |
Optional |
Value
Numeric vector of probabilities, clipped between 0.001 and 0.999.
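The uncalibrated case can be sketched as a plain sigmoid followed by clipping. The optional model-based calibration is not reproduced here because its details are not given in this manual; decision_to_prob_sketch is a hypothetical name.

```r
decision_to_prob_sketch <- function(decision_values,
                                    lower = 0.001, upper = 0.999) {
  p <- 1 / (1 + exp(-decision_values))  # plain sigmoid, no fitted calibration
  pmin(pmax(p, lower), upper)           # clip to [0.001, 0.999]
}

probs <- decision_to_prob_sketch(c(-20, -1, 0, 1, 20))
# extreme values are clipped to 0.001 and 0.999; 0 maps to 0.5
```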
Build a 2-D prediction grid in ORIGINAL (unscaled) feature space
Description
The two plot features vary over their observed range plus padding; every other node feature is fixed at its median. Returned unscaled so axis labels stay readable; callers scale it themselves before predicting.
Usage
create_decision_grid(
data,
plot_features,
all_node_features,
resolution = 100,
pad_factor = 0.5
)
Calculate Entropy
Description
Computes the entropy for a vector of class labels.
Usage
entropy(y)
Arguments
y |
A vector of class labels. |
Value
Numeric value representing entropy (0 = pure, higher = more impure).
Evaluate Multiple Random Feature Subsets Using SVM Information Gain
Description
Generates and evaluates multiple random feature subsets, ranking them by the information gain achieved through SVM-based splits.
Usage
evaluate_random_subsets(
data,
predictors,
response,
n_subsets = 5,
subset_size = 4,
metric = c("entropy", "gini"),
verbose = FALSE
)
Arguments
data |
A data frame containing predictors and the response variable. |
predictors |
Character vector of available predictor names. |
response |
Character string specifying the response variable name. |
n_subsets |
Integer; number of random feature subsets to evaluate. |
subset_size |
Integer; number of features in each subset. |
metric |
Impurity measure for information gain. One of "entropy" or "gini". |
verbose |
Logical; if |
Details
This function randomly samples n_subsets different combinations of
subset_size features from the predictor pool, evaluates each subset
using svm_info_gain, and returns them ranked by performance.
If subset_size is greater than the number of available predictors,
it is automatically reduced to match the predictor count.
Value
A data frame with two columns:
- features
List column containing character vectors of feature names.
- info_gain
Numeric vector of information gain values.
The data frame is sorted in descending order by information gain.
Fit a linear SVM model with optional class weights
Description
Fits a linear Support Vector Machine (SVM) classifier using the e1071 package, with optional class-specific weights to handle class imbalance.
Usage
fit_svm_with_weights(X_scaled, y, class_weights_vec, verbose = FALSE, ...)
Arguments
X_scaled |
A data frame or matrix of predictor variables. |
y |
A vector of class labels corresponding to the rows of |
class_weights_vec |
Optional named numeric vector of class weights. Names must match
the unique class labels in |
verbose |
Logical; if |
... |
Additional arguments passed to |
Details
- Uses a **linear kernel** by default.
- Enables decision values and probability estimates.
- Scaling is disabled (scale = FALSE).
- When class_weights_vec is supplied, weights are capped at 10 and passed to
svm via its class.weights parameter.
- Returns NULL if data is empty or model fitting fails.
Value
A fitted svm model object (of class "svm") on success, or
NULL if fitting fails.
Retrieve all class labels from a decision tree
Description
Recursively extracts all unique class labels stored in a decision tree's leaf nodes.
Usage
get_all_classes(tree)
Arguments
tree |
A decision tree object, where each node may contain:
|
Value
A character vector of all unique class labels present in the tree.
Fallback predictions for SVM decision tree nodes
Description
Generates class predictions and probabilities when SVM predictions are unavailable or insufficient. This function is intended for internal use within the SVM tree.
Usage
get_fallback_predictions(
model,
X_scaled,
decision_values,
svm_probs = NULL,
all_classes,
calibrate = TRUE
)
Arguments
model |
An |
X_scaled |
Scaled predictor matrix for the current node. |
decision_values |
Numeric vector of SVM decision values. |
svm_probs |
Optional SVM probability matrix (from |
all_classes |
Character vector of all possible classes. |
calibrate |
Logical; if |
Value
A list with elements:
- predictions: Character vector of predicted classes.
- probabilities: Matrix of class probabilities (rows = samples, columns = classes).
Collect every feature name used anywhere in the tree (depth-first)
Description
Collect every feature name used anywhere in the tree (depth-first)
Usage
get_tree_features(tree)
Calculate Gini Impurity
Description
Computes the Gini impurity for a vector of class labels.
Usage
gini(y)
Arguments
y |
A vector of class labels. |
Value
Numeric value representing Gini impurity (0 = pure, higher = more impure).
Handle small child nodes in tree splitting
Description
Internal helper function to handle situations where one or both child nodes
resulting from a split have fewer samples than min_samples. Depending on
which child is too small, it may stop splitting, create only one child, or
return a flag to continue normal processing.
Usage
handle_small_children(
left_idx,
right_idx,
min_samples,
data,
response,
depth,
max_depth,
max_features,
feature_method,
impurity_measure,
max_features_strategy,
max_features_decrease_rate,
max_features_random_range,
penalize_used_features,
feature_penalty_weight,
n_subsets,
used_features,
class_weights,
custom_class_weights,
min_impurity_decrease = 0.001,
features,
scaler,
all_classes,
verbose,
...
)
Arguments
left_idx |
Indices of samples assigned to the left child. |
right_idx |
Indices of samples assigned to the right child. |
min_samples |
Minimum number of samples required for a node to be valid. |
data |
The full dataset being split. |
response |
Name of the response variable. |
depth |
Current depth of the node. |
max_depth |
Maximum allowed depth for the tree. |
max_features |
Maximum number of features to consider at each split. |
feature_method |
Feature selection method (e.g., "random", "cor", "mutual"). |
max_features_strategy |
Strategy for dynamic feature selection ("constant", "decrease", "random"). |
max_features_decrease_rate |
Numeric; factor controlling feature decrease with depth. |
max_features_random_range |
Numeric vector of length 2 specifying min/max proportion for random features. |
penalize_used_features |
Logical; whether to penalize previously used features. |
feature_penalty_weight |
Numeric weight for penalizing used features. |
used_features |
Character vector of features used in ancestor nodes. |
class_weights |
Named numeric vector of class weights. |
custom_class_weights |
Optional custom class weights. |
features |
Character vector of features used at this node. |
scaler |
Optional scaler applied to features at this node. |
all_classes |
Character vector of all possible classes. |
verbose |
Logical; if TRUE, prints messages for debugging. |
... |
Additional arguments passed to |
Details
- If both children are smaller than min_samples, a leaf node is created.
- If only one child is too small, the other child is recursively split.
- This function ensures that tree nodes respect the minimum sample requirement,
avoiding invalid splits that could destabilize the SVM-based tree.
Value
A list with components:
- stop: Logical; TRUE if splitting should stop at this node.
- node: Either a leaf node object (if stopping) or a partially built internal node with only one child (if one child is too small).
Calculate Information Gain for a Feature Split
Description
Computes the reduction in impurity (information gain) when splitting a target variable by a categorical feature.
Usage
info_gain(feature, target, metric = c("entropy", "gini"))
Arguments
feature |
A vector representing the splitting feature (categorical or factor). |
target |
A vector of class labels for the target variable. |
metric |
The impurity measure to use: either "entropy" or "gini". |
Details
Information gain is computed as:
IG = H(parent) - \sum_{v \in Values} \frac{n_v}{n} H(child_v)
where:
- H(parent) is the impurity of the original target vector,
- H(child_v) is the impurity of the subset of target where feature = v,
- n_v is the number of samples where feature = v,
- n is the total number of samples.
Value
A numeric value representing the information gain.
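The formula in Details translates almost line for line into base R. In this sketch the impurity measure is passed as a function rather than as a metric string, which differs from the documented signature; info_gain_sketch and entropy_nat are hypothetical names, and the natural logarithm is assumed.

```r
entropy_nat <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]
  -sum(p * log(p))
}

info_gain_sketch <- function(feature, target, impurity = entropy_nat) {
  n <- length(target)
  children <- split(target, feature, drop = TRUE)  # one subset per value v
  weighted <- sum(vapply(
    children,
    function(child) length(child) / n * impurity(child),  # (n_v / n) * H(child_v)
    numeric(1)
  ))
  impurity(target) - weighted
}

target <- c("a", "a", "b", "b")
ig_perfect <- info_gain_sketch(c("L", "L", "R", "R"), target)  # log(2)
ig_useless <- info_gain_sketch(c("L", "R", "L", "R"), target)  # 0
```

A split that separates the classes completely recovers the full parent entropy, while a split that leaves both children as mixed as the parent gains nothing.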
Create a leaf node for a decision tree
Description
Constructs a leaf node object containing class probabilities, predicted class, and metadata.
Usage
leaf_node(y, n, all_classes = NULL, features = character(0), scaler = NULL)
Arguments
y |
Vector of class labels for the samples in the node. |
n |
Number of samples in the node. |
all_classes |
Optional character vector of all possible classes. If NULL,
classes are inferred from |
features |
Character vector of features used at this node (default empty). |
scaler |
Optional scaler object applied to the features at this node. |
Details
- If some classes are missing in y, probabilities for those classes are set to 0.
- If all probabilities are 0 or NA, a uniform probability distribution is used.
- Probabilities are normalized to sum to 1.
Value
A list representing a leaf node with components:
- is_leaf: Logical; TRUE.
- prediction: Predicted class (majority class in the node).
- n: Number of samples in the node.
- features: Features used at this node.
- scaler: Optional scaler applied to the node features.
- class_prob: Named numeric vector of class probabilities (sums to 1).
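The probability rules listed in Details (zeros for absent classes, a uniform fallback, normalization) can be sketched as follows. leaf_probs_sketch is a hypothetical helper that returns only the probability vector, not a full leaf node object.

```r
leaf_probs_sketch <- function(y, all_classes = NULL) {
  y <- as.character(y)
  if (is.null(all_classes)) all_classes <- sort(unique(y))
  # Counting against fixed levels gives 0 for classes missing from y.
  probs <- as.numeric(table(factor(y, levels = all_classes)))
  names(probs) <- all_classes
  if (sum(probs) == 0) probs[] <- 1  # uniform fallback when nothing is observed
  probs / sum(probs)                 # normalize to sum to 1
}

p_leaf  <- leaf_probs_sketch(c("a", "a", "b"), all_classes = c("a", "b", "c"))
p_empty <- leaf_probs_sketch(character(0), all_classes = c("a", "b", "c"))
majority <- names(p_leaf)[which.max(p_leaf)]  # "a"
```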
Plot method for svmodt_node objects
Description
Thin S3 wrapper that dispatches to plot_boundary or
plot_surface depending on plot.type.
Usage
## S3 method for class 'svmodt_node'
plot(
x,
y = NULL,
...,
data = NULL,
response = NULL,
plot.type = c("surface", "boundary"),
features = NULL,
max_depth = NULL,
check_accuracy = TRUE,
resolution = NULL
)
Arguments
x |
An |
y |
Ignored; present only to satisfy the |
... |
Currently unused. |
data |
The original training data frame (required). |
response |
Character string naming the response column (required). |
plot.type |
One of "surface" or "boundary". |
features |
Length-2 character vector of axis features
( |
max_depth |
Maximum depth to visualize
( |
check_accuracy |
Logical; show per-node accuracy
( |
resolution |
Grid resolution per axis.
Default |
Value
- "boundary": invisibly returns the list from plot_boundary.
- "surface": invisibly returns the ggplot2 object from plot_surface.
Examples
tree <- svm_split(wdbc, response = "diagnosis", max_depth = 3)
# All-node boundary panels - prints first, returns list
viz <- plot(tree,
data = wdbc, response = "diagnosis",
plot.type = "boundary"
)
viz$plots[[2]] # second node
# Global decision surface
plot(tree,
data = wdbc, response = "diagnosis",
plot.type = "surface"
)
# Surface with explicit feature axes
plot(tree,
data = wdbc, response = "diagnosis",
plot.type = "surface",
features = c("radius_mean", "concavity_mean")
)
Plot SVM decision boundaries for every node in the tree
Description
Traverses the tree recursively and produces one plot per internal node, showing the SVM hyperplane for that node's binary split, the background region colouring, and the actual data points (coloured by true class). Each node receives only the subset of data that reaches it during training.
Usage
plot_boundary(
tree,
data,
response_col = NULL,
max_depth = NULL,
check_accuracy = TRUE,
resolution = 100
)
Arguments
tree |
An |
data |
The original training data frame. |
response_col |
Character string naming the response column in
|
max_depth |
Maximum tree depth to visualize. |
check_accuracy |
Logical; if |
resolution |
Integer; grid resolution per axis (default |
Value
Invisibly returns a list with four elements:
- plots: Named list of ggplot2 objects, one per node. Names encode depth and path, e.g. "depth_1_Root", "depth_2_Root_L".
- grid_data: Named list of data frames (full expanded grid used for each node's contour calculation).
- accuracy_info: Named list of per-node metadata: depth, path, sample count, accuracy, features, whether the boundary was visible, and the pad factor that was needed.
- response_col: The response column name used.
Plot the SVM decision boundary for a single internal node
Description
Internal workhorse called by plot_boundary for each node during
tree traversal. Builds the grid in original space, scales it with the
node's own scaler, predicts decision values, and returns a ggplot2 object
together with metadata. The grid is expanded automatically (up to
pad_factor = 3) if the hyperplane falls outside the data range.
Usage
plot_node_boundary(
data,
node_features,
svm_model,
scaler,
response_col,
title = "SVM Decision Boundary",
resolution = 100
)
Plot the global decision surface of the full tree
Description
Predicts class labels across a 2-D grid using the complete tree (not
individual node SVMs), then overlays the original data points. Because
predictions come from svm_predict_tree, multiclass trees are
handled correctly - each grid cell receives the final leaf prediction which
respects all OVR splits along the path.
Usage
plot_surface(tree, data, response, features = NULL, resolution = 200)
Arguments
tree |
An |
data |
The original training data frame. |
response |
Character string naming the response column in |
features |
Character vector of length 2 giving the two features to plot on the x and y axes. Defaults to the first two features used at the root. |
resolution |
Integer; grid resolution per axis (default |
Details
All features not used as plot axes are held fixed at their in-sample median
(numeric) or mode (categorical). You choose which two features to plot via
features; if omitted the first two features used at the root node are
used.
Value
A ggplot2 object. The background tiles show the predicted class for each grid cell; points show true class labels.
Predict method for svmodt_node objects
Description
Predict method for svmodt_node objects
Usage
## S3 method for class 'svmodt_node'
predict(object, newdata, return_probs = FALSE, calibrate_probs = TRUE, ...)
Arguments
object |
An object of class |
newdata |
A data frame of new predictor values. |
return_probs |
Logical; if |
calibrate_probs |
Logical; if |
... |
Currently unused. |
Value
If return_probs = FALSE (the default), a character vector of predicted
class labels, one element per row of newdata.
If return_probs = TRUE, a named list with two elements:
- predictions
Character vector of predicted class labels (length = nrow(newdata)).
- probabilities
Numeric matrix of class probabilities with nrow(newdata) rows and one column per class. Column names are the class labels; each row sums to 1. When calibrate_probs = TRUE, probabilities are derived from the SVM decision value via logistic calibration; otherwise empirical class frequencies at the leaf node are used.
Examples
# Train an SVMODT tree
tree <- svm_split(
data = wdbc,
response = "diagnosis",
max_depth = 3,
max_features = 2,
feature_method = "cor"
)
# Predict on WDBC data (returns a character vector of class labels)
preds <- predict(tree, newdata = wdbc)
# Predict with probabilities and logistic calibration
result <- predict(tree, newdata = wdbc,
return_probs = TRUE, calibrate_probs = TRUE
)
head(result$predictions)
head(result$probabilities)
Print method for svmodt_node objects
Description
Print method for svmodt_node objects
Usage
## S3 method for class 'svmodt_node'
print(x, ...)
Arguments
x |
An object of class |
... |
Further arguments passed to |
Value
Invisibly returns x (the svmodt_node object), called
for its side effect of printing a human-readable summary of the tree
structure to the console.
Examples
tree <- svm_split(
data = wdbc,
response = "diagnosis",
max_features = 2,
max_depth = 3,
min_samples = 5,
feature_method = "random",
verbose = TRUE
)
print(tree)
Print an SVM Decision Tree
Description
Recursively prints the structure of an SVM-based decision tree.
Usage
print_svm_tree(
tree,
indent = "",
show_probabilities = FALSE,
show_feature_info = TRUE,
show_penalties = TRUE
)
Arguments
tree |
An object of class |
indent |
String used for indentation (for recursive calls). |
show_probabilities |
Logical; whether to display class probabilities at leaf nodes. |
show_feature_info |
Logical; whether to show features used at nodes. |
show_penalties |
Logical; whether to show penalty flags at nodes. |
Value
Invisibly returns NULL. Prints to console.
Scale Numeric Features for Tree Nodes
Description
Internal utility function to standardize numeric features (zero mean, unit variance) and remove constant columns. Returns both the scaled training data and a transformer function for applying the same scaling to new data.
Usage
scale_node(df)
Arguments
df |
A data frame containing numeric (or factor) features to be scaled. |
Details
- Constant features (zero variance or only one unique value) are automatically removed.
- Standard deviation of zero is replaced with 1 to prevent division by zero.
- Designed for internal use in SVM tree building and prediction pipelines.
Value
A list with two elements:
- train
The scaled training data frame.
- transform
A function that applies the same scaling to a new data frame.
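The train/transform pair can be sketched with a closure that remembers the training means and standard deviations. This sketch handles numeric columns only (factor handling is not reproduced), and scale_node_sketch is a hypothetical name rather than the package's implementation.

```r
scale_node_sketch <- function(df) {
  num  <- df[vapply(df, is.numeric, logical(1))]
  # Drop constant columns (only one unique value).
  keep <- names(num)[vapply(num, function(x) length(unique(x)) > 1, logical(1))]
  mu   <- vapply(num[keep], mean, numeric(1))
  sdev <- vapply(num[keep], sd, numeric(1))
  sdev[is.na(sdev) | sdev == 0] <- 1  # guard against division by zero
  # The closure reuses the training statistics on any new data frame.
  transform <- function(newdf) {
    as.data.frame(scale(as.matrix(newdf[keep]), center = mu, scale = sdev))
  }
  list(train = transform(df), transform = transform)
}

train <- data.frame(x = c(1, 2, 3), const = c(5, 5, 5), z = c(10, 20, 60))
s <- scale_node_sketch(train)
new_scaled <- s$transform(data.frame(x = 2, z = 30, const = 5))
```

Because the new row sits exactly at the training means, both of its scaled values are 0, and the constant column is absent from the output.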
Check Stopping Conditions for Tree Splitting
Description
Internal utility function to determine if a node in a tree should stop splitting based on depth, purity, or minimum sample size.
Usage
stop_conditions_met(data, y, depth, max_depth, min_samples, verbose)
Arguments
data |
A data frame of predictor features at the current node. |
y |
A vector of target values corresponding to |
depth |
Current depth of the node in the tree. |
max_depth |
Maximum allowed depth for the tree. |
min_samples |
Minimum number of samples required to split a node. |
verbose |
Logical; if |
Details
- Stops if the node reaches max_depth.
- Stops if all target values in the node are identical (pure node).
- Stops if the number of samples is less than min_samples.
Value
Logical; TRUE if the node meets any stopping condition, FALSE otherwise.
Calculate Information Gain Using SVM-based Splits
Description
Computes the information gain achieved by splitting data using a linear SVM trained on a subset of features. The SVM's decision values determine the split, and information gain is calculated based on the resulting partitions.
Usage
svm_info_gain(
feature_subset,
data,
response,
metric = c("entropy", "gini"),
verbose = FALSE
)
Arguments
feature_subset |
Character vector of feature names to use for the SVM split. |
data |
A data frame containing predictors and the response variable. |
response |
Character string specifying the response variable name. |
metric |
Impurity measure for information gain calculation. One of "entropy" or "gini". |
verbose |
Logical; if |
Details
This function:
- Fits a linear SVM using the specified feature subset.
- Extracts decision values (distances from the hyperplane).
- Creates a binary split: samples with negative decision values go left, positive values go right.
- Calculates information gain using the info_gain function.
The SVM split creates an oblique (non-axis-aligned) partition, potentially capturing more complex decision boundaries than single-feature splits.
Value
Numeric value representing the information gain achieved by the SVM split.
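To show why an oblique split can gain more than an axis-parallel one, the sketch below replaces the fitted SVM with a hand-picked hyperplane (w, b); the sign of w . x + b plays the role of the SVM decision value. All names and data are hypothetical, and no SVM is actually trained here.

```r
entropy_nat <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]
  -sum(p * log(p))  # natural log assumed
}

# Information gain of the split induced by the hyperplane w . x + b = 0.
hyperplane_gain_sketch <- function(X, y, w, b = 0) {
  dec  <- as.numeric(as.matrix(X) %*% w + b)  # stand-in for decision values
  side <- ifelse(dec < 0, "left", "right")    # negative goes left
  children <- split(y, side)
  weighted <- sum(vapply(
    children,
    function(child) length(child) / length(y) * entropy_nat(child),
    numeric(1)
  ))
  entropy_nat(y) - weighted
}

X <- data.frame(x1 = c(-2, -1, 1, 2, -1.5, 1.5),
                x2 = c(1, -1, -2, 1, 2, -1))
y <- ifelse(X$x1 + X$x2 > 0, "pos", "neg")  # classes separated by x1 + x2 = 0

gain_axis    <- hyperplane_gain_sketch(X, y, w = c(1, 0))  # split on x1 only
gain_oblique <- hyperplane_gain_sketch(X, y, w = c(1, 1))  # oblique split
```

On this toy data the oblique direction separates the classes completely, while the split on x1 alone leaves both children mixed.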
Predict Using a Support Vector Machine Oblique Decision Tree
Description
Predicts class labels or class probabilities for new data using a tree constructed with SVM splits. Handles leaf nodes, internal nodes, recursive traversal, and fallback mechanisms when SVM predictions or scaling fail.
Usage
svm_predict_tree(tree, newdata, return_probs = FALSE, calibrate_probs = TRUE)
Arguments
tree |
A tree node object (leaf or internal) created by |
newdata |
A data frame of new predictor values. **Must contain the same features** as those used to fit the tree. Any additional columns (including responses) are ignored. |
return_probs |
Logical; if |
calibrate_probs |
Logical; if |
Details
The function traverses the SVM-based oblique decision tree recursively and predicts class labels or probabilities. Key behaviors:
- Leaf nodes: Return the majority class stored in the node, along with class probabilities.
- Internal nodes:
  - Scale features according to the node's scaling parameters.
  - Compute SVM decision values.
  - Recursively traverse left and right children depending on the sign of the decision value.
- Binary support: Binary SVMs produce a single decision value per node.
- Fallback predictions: If scaling fails, SVM predictions are unavailable, or child nodes are missing, predictions are generated in this order:
  - SVM-provided probabilities (if available).
  - Calibrated decision values using a logistic/sigmoid function (if calibrate_probs = TRUE).
  - Leaf node class distribution (empirical frequencies) or uniform probabilities as a last resort.
- Probability normalization: All returned probabilities are normalized so that each row sums to 1.
- Feature requirement: newdata must contain every feature used to train the tree; any extra columns, including responses, are ignored.
- Calibration behavior:
  - calibrate_probs = FALSE returns class frequencies at the leaf node.
  - calibrate_probs = TRUE uses the distance from the hyperplane for logistic post-processing into probabilities.
Value
If return_probs = FALSE, a character vector of predicted class labels.
If return_probs = TRUE, a list with elements:
- predictions: Character vector of predicted class labels.
- probabilities: Numeric matrix of class probabilities (rows = samples, columns = classes).
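The recursive traversal can be sketched on a toy nested list. Each internal node here stores a hand-written hyperplane (w, b) instead of a fitted SVM and scaler, and the field names w, b, left and right are hypothetical; fallback handling and probabilities are omitted.

```r
predict_toy_tree <- function(node, newdata) {
  # Leaf: every row reaching this node gets the stored majority class.
  if (isTRUE(node$is_leaf)) {
    return(rep(node$prediction, nrow(newdata)))
  }
  # Internal node: the sign of the decision value picks the child.
  dec <- as.numeric(as.matrix(newdata[node$features]) %*% node$w + node$b)
  go_left <- dec < 0
  out <- character(nrow(newdata))
  if (any(go_left)) {
    out[go_left] <- predict_toy_tree(node$left, newdata[go_left, , drop = FALSE])
  }
  if (any(!go_left)) {
    out[!go_left] <- predict_toy_tree(node$right, newdata[!go_left, , drop = FALSE])
  }
  out
}

leaf <- function(label) list(is_leaf = TRUE, prediction = label)
toy_tree <- list(
  is_leaf = FALSE, features = c("x1", "x2"), w = c(1, 1), b = 0,
  left  = leaf("neg"),
  right = list(is_leaf = FALSE, features = "x1", w = 1, b = -2,
               left = leaf("pos_small"), right = leaf("pos_large"))
)

newdata <- data.frame(x1 = c(-1, 1, 3), x2 = c(0, 0.5, 1), extra = c(9, 9, 9))
toy_preds <- predict_toy_tree(toy_tree, newdata)
# "neg" "pos_small" "pos_large"; the unused column is ignored
```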
Build an Oblique Decision Tree Using SVM Splits
Description
Constructs a decision tree where each internal node uses a Support Vector Machine (SVM) to determine the split. Supports dynamic feature selection, feature penalization, scaling, and class weighting.
Usage
svm_split(
data,
response,
depth = 1,
max_depth = 10,
min_samples = 5,
max_features = NULL,
feature_method = c("random", "mutual", "cor"),
impurity_measure = c("entropy", "gini"),
max_features_strategy = c("constant", "random", "decrease"),
max_features_decrease_rate = 0.8,
max_features_random_range = c(0.3, 1),
penalize_used_features = FALSE,
feature_penalty_weight = 0.5,
n_subsets = 1,
used_features = character(0),
class_weights = c("none", "balanced", "custom"),
custom_class_weights = NULL,
min_impurity_decrease = 0.001,
verbose = FALSE,
all_classes = NULL,
...
)
Arguments
data |
A data frame containing predictors and the response variable. |
response |
Character string specifying the response column in 'data'. All other columns are treated as predictors. |
depth |
Integer indicating the current recursion depth (used internally; default is 1). |
max_depth |
Maximum depth of the tree. |
min_samples |
Minimum number of samples required to attempt a split. |
max_features |
Maximum number of features to consider at each split. |
feature_method |
Feature selection method at each node. One of "random", "mutual", or "cor". |
impurity_measure |
Impurity criterion used to evaluate information gain. One of "entropy" or "gini". |
max_features_strategy |
Strategy to adjust the number of features per node. One of "constant", "random", or "decrease". |
max_features_decrease_rate |
Numeric fraction for decreasing features if 'max_features_strategy = "decrease"'. |
max_features_random_range |
Numeric vector of length 2 specifying min and max fraction of features if 'max_features_strategy = "random"'. |
penalize_used_features |
Logical; if TRUE, features used in ancestor nodes are penalized to encourage diversity. |
feature_penalty_weight |
Numeric (0–1) weight for penalizing previously used features. |
n_subsets |
Number of random feature combinations evaluated at each node when 'feature_method = "random"'. |
used_features |
Character vector of features already used in ancestor nodes (used internally). |
class_weights |
Character string specifying how to handle class imbalance. One of "none", "balanced", or "custom". |
custom_class_weights |
Optional named numeric vector specifying custom weights per class. |
min_impurity_decrease |
Minimum decrease in impurity required for a split to be considered valid. |
verbose |
Logical; if TRUE, prints information about each node during tree construction. |
all_classes |
Optional character vector of all possible response classes (used internally). |
... |
Additional arguments passed to the underlying SVM fitting function. |
Details
This function recursively splits the dataset using an SVM at each node. Splitting stops when maximum depth is reached, the node contains fewer than 'min_samples', or all samples belong to the same class. Features are scaled and selected dynamically at each node, and previously used features can be penalized to promote diversity. Class weighting schemes support handling imbalanced datasets. This approach allows construction of an oblique decision tree, where splits are linear hyperplanes rather than axis-aligned.
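How 'impurity_measure' and 'min_impurity_decrease' might interact can be sketched in base R. This is an illustration under assumptions, not the package's code: impurity() and impurity_decrease() are hypothetical helpers, the decision values are made up, and the weighted-child formula is the standard information-gain definition, which the package may or may not follow exactly.

```r
# Impurity of a vector of class labels (hypothetical helper).
impurity <- function(y, measure = c("entropy", "gini")) {
  measure <- match.arg(measure)
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]  # drop empty classes so log2(0) is never evaluated
  if (measure == "entropy") -sum(p * log2(p)) else 1 - sum(p^2)
}

# Weighted impurity decrease for a split (hypothetical helper);
# `go_left` is a logical vector marking samples sent to the left child.
impurity_decrease <- function(y, go_left, measure = "entropy") {
  n <- length(y)
  n_left <- sum(go_left)
  n_right <- n - n_left
  if (n_left == 0 || n_right == 0) return(0)  # degenerate split
  impurity(y, measure) -
    (n_left / n) * impurity(y[go_left], measure) -
    (n_right / n) * impurity(y[!go_left], measure)
}

y <- c("B", "B", "B", "M", "M", "M")
decision_value <- c(1.2, 0.7, 0.1, -0.4, -0.9, -1.5)  # made-up SVM outputs

# Samples with decision value > 0 go left, the rest go right.
gain <- impurity_decrease(y, decision_value > 0, "entropy")
gain >= 0.001  # compare against the default min_impurity_decrease
```

Under this reading, a candidate hyperplane whose gain falls below 'min_impurity_decrease' would be rejected and the node would become a leaf.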
Value
A nested list representing the decision tree. Each node contains:
- is_leaf
Logical; TRUE if the node is a leaf.
- model
Fitted SVM model at this node (for internal nodes).
- features
Vector of features selected for this node.
- scaler
Scaling information used at this node.
- left
Left child node (decision value > 0).
- right
Right child node (decision value ≤ 0).
- depth
Depth of this node in the tree.
- n
Number of samples at this node.
- max_features_used
Number of features considered at this node.
- penalty_applied
Logical; TRUE if feature penalization was applied.
- class_weights_used
Class weights applied at this node.
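Because the result is a plain nested list, it can be walked with ordinary recursion. The sketch below uses a hand-built toy tree rather than real svm_split() output; it relies only on the documented 'is_leaf', 'left', 'right', 'depth', and 'n' fields, and the 'prediction' field on leaves is a hypothetical addition for illustration.

```r
# Build a toy leaf node; `prediction` is a hypothetical field.
leaf <- function(label, depth, n) {
  list(is_leaf = TRUE, prediction = label, depth = depth, n = n)
}

# Hand-built toy tree that mimics the documented node layout.
toy_tree <- list(
  is_leaf = FALSE, depth = 1, n = 10,
  left  = leaf("M", depth = 2, n = 4),
  right = list(
    is_leaf = FALSE, depth = 2, n = 6,
    left  = leaf("B", depth = 3, n = 5),
    right = leaf("M", depth = 3, n = 1)
  )
)

# Recursively count leaves by following `left` and `right`.
count_leaves <- function(node) {
  if (isTRUE(node$is_leaf)) return(1L)
  count_leaves(node$left) + count_leaves(node$right)
}

# Deepest level reached by any leaf.
tree_depth <- function(node) {
  if (isTRUE(node$is_leaf)) return(node$depth)
  max(tree_depth(node$left), tree_depth(node$right))
}

count_leaves(toy_tree)
tree_depth(toy_tree)
```

The same pattern could be used to collect the 'features' or 'n' values of every node when inspecting a fitted tree.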
Examples
data(wdbc)
tree <- svm_split(
data = wdbc,
response = "diagnosis",
max_depth = 3,
min_samples = 5,
feature_method = "random",
verbose = TRUE
)
Trace the prediction path of a sample through an svmodt tree
Description
Generic function that walks the tree for a single row of new data, printing the SVM decision value and chosen branch at every internal node and the final predicted class at the leaf.
Usage
trace_path(object, ...)
## S3 method for class 'svmodt_node'
trace_path(object, sample_data, sample_idx = 1, ...)
Arguments
object |
An svmodt_node object. |
... |
Currently unused. |
sample_data |
A data frame of new predictor values (one or more rows). |
sample_idx |
Integer; which row to trace (default 1). |
Value
Invisibly returns the predicted class label (character string).
Methods (by class)
- trace_path(svmodt_node): Method for svmodt_node objects.
Examples
tree <- svm_split(wdbc, response = "diagnosis", max_depth = 3)
trace_path(tree, wdbc, sample_idx = 5)
Trace Prediction Path for a Sample
Description
Shows the path taken by a single sample through the SVM tree, including decision values, branches, and final prediction.
Usage
trace_prediction_path(tree, sample_data, sample_idx = 1)
Arguments
tree |
The tree object. |
sample_data |
Data frame containing the sample(s). |
sample_idx |
Index of the sample to trace (default 1). |
Value
The predicted class for the sample (a character string). Called primarily for its side effect of printing the full decision path to the console, including node features, SVM decision values, branch directions, and the final predicted class label.
Wisconsin Diagnostic Breast Cancer Dataset
Description
The WDBC dataset contains quantitative measurements from digitized images of fine needle aspirates (FNA) of breast masses. It is commonly used for classification tasks to distinguish between benign and malignant tumors.
Usage
wdbc
Format
A data frame with 569 rows and 32 columns:
- radius_mean
Mean of radius
- radius_se
Standard error of radius
- radius_worst
Worst (largest) radius
- texture_mean
Mean of texture
- texture_se
Standard error of texture
- texture_worst
Worst texture
- perimeter_mean
Mean of perimeter
- perimeter_se
Standard error of perimeter
- perimeter_worst
Worst perimeter
- area_mean
Mean area
- area_se
Standard error of area
- area_worst
Worst area
- smoothness_mean
Mean smoothness
- smoothness_se
Standard error of smoothness
- smoothness_worst
Worst smoothness
- compactness_mean
Mean compactness
- compactness_se
Standard error of compactness
- compactness_worst
Worst compactness
- concavity_mean
Mean concavity
- concavity_se
Standard error of concavity
- concavity_worst
Worst concavity
- concave.points_mean
Mean concave points
- concave.points_se
Standard error of concave points
- concave.points_worst
Worst concave points
- symmetry_mean
Mean symmetry
- symmetry_se
Standard error of symmetry
- symmetry_worst
Worst symmetry
- fractal_dimension_mean
Mean fractal dimension
- fractal_dimension_se
Standard error of fractal dimension
- fractal_dimension_worst
Worst fractal dimension
- diagnosis
Factor with levels 'B' and 'M'
Source
Dr. William H. Wolberg, W. Nick Street, and Olvi L. Mangasarian, University of Wisconsin–Madison. Original dataset available at: <https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic>
Wine Dataset
Description
The Wine dataset contains the results of a chemical analysis of wines derived from three different cultivars grown in the same region of Italy. The dataset is commonly used for multiclass classification tasks, where the objective is to identify the cultivar of origin based on physicochemical properties.
Usage
wine
Format
A data frame with 178 rows and 14 columns:
- class
Factor with levels 1, 2, and 3 indicating cultivar
- alcohol
Alcohol content
- malic_acid
Malic acid concentration
- ash
Ash content
- alcalinity_of_ash
Alcalinity of ash
- magnesium
Magnesium content
- total_phenols
Total phenols
- flavanoids
Flavonoid content
- nonflavanoid_phenols
Nonflavanoid phenols
- proanthocyanins
Proanthocyanin content
- color_intensity
Color intensity
- hue
Hue
- od280_od315
OD280/OD315 of diluted wines
- proline
Proline concentration
Source
Aeberhard, S. & Forina, M. (1992). Wine Dataset. UCI Machine Learning Repository. Original dataset available at: <https://archive.ics.uci.edu/dataset/109/wine>