SVMODT: Support Vector Machine based Oblique Decision Trees

The svmodt package for R builds oblique decision trees recursively, fitting a linear Support Vector Machine (SVM) at each node to define the split. Traditional decision trees use axis-aligned splits, each testing a single feature against a threshold, which keeps them easy to interpret. Oblique trees instead split on linear combinations of features; this makes them more expressive but makes finding a good split harder. A linear SVM offers a principled way to choose that split: it finds the hyperplane that maximizes the margin between classes.
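
To make the contrast concrete, the following base R sketch compares the best single-feature threshold with a linear-combination split on toy data whose true boundary is diagonal. It does not use svmodt and does not fit an SVM; the weights `w` and offset `b` are hand-picked assumptions standing in for the hyperplane an SVM would estimate.

```r
# Toy data: two features whose class boundary is the diagonal x1 + x2 = 1
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- factor(ifelse(x1 + x2 > 1, "A", "B"))

# Axis-aligned split: try thresholds on x1 alone and keep the best accuracy
thresholds <- seq(0.05, 0.95, by = 0.05)
axis_acc <- max(sapply(thresholds, function(t) mean((x1 > t) == (y == "A"))))

# Oblique split: threshold the linear combination w[1] * x1 + w[2] * x2 + b
w <- c(1, 1)   # hand-picked weights (an SVM would estimate these)
b <- -1        # hand-picked offset
score <- as.vector(cbind(x1, x2) %*% w + b)
oblique_acc <- mean((score > 0) == (y == "A"))

cat("Best axis-aligned accuracy:", round(axis_acc, 3), "\n")
cat("Oblique split accuracy:    ", round(oblique_acc, 3), "\n")
```

A single threshold on one feature cannot follow a diagonal boundary, whereas one oblique split can; this is the gap that an SVM-defined split is meant to close.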

Installation

You can install the development version of svmodt from GitHub with:

# install.packages("devtools")
devtools::install_github("AneeshAgarwala/svmodt")

Key Features

Examples

library(svmodt)

# Load data
data(wdbc)  # This dataset ships with the package
wdbc$diagnosis <- factor(wdbc$diagnosis)

# Split
set.seed(123)
train_idx <- sample(nrow(wdbc), floor(0.8 * nrow(wdbc)))
train_data <- wdbc[train_idx, ]
test_data <- wdbc[-train_idx, ]

SVMODT Tree Workflow

# Train with class weights
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 4,
  max_features = 2,
  feature_method = "mutual",
  class_weights = "balanced",
  verbose = TRUE
)

# Predict
predictions <- predict(tree, test_data)

# Visualize Split Boundary at Individual Node(s)
viz <- plot(
  tree = tree,
  original_data = train_data,
  response_col = "diagnosis",
  plot.type = "boundary"
)

# Visualize Overall Surface Split(s)
viz <- plot_surface(
  tree = tree,
  data = train_data,
  response = "diagnosis",
  plot.type = "surface"
)
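
Predictions can then be compared with the held-out labels. The helper below is a base R sketch and not part of svmodt: `summarize_predictions` is a hypothetical name, and it assumes `predict()` returns one class label per test row. Toy vectors stand in for the real objects so the block runs on its own.

```r
# Hypothetical helper (not part of svmodt): confusion matrix and accuracy
summarize_predictions <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  # Shared levels keep the table square even if a class is never predicted
  lv <- union(levels(factor(actual)), levels(factor(predicted)))
  confusion <- table(
    Predicted = factor(predicted, levels = lv),
    Actual    = factor(actual, levels = lv)
  )
  list(
    confusion = confusion,
    accuracy  = sum(diag(confusion)) / sum(confusion)
  )
}

# Toy labels standing in for predict(tree, test_data) and test_data$diagnosis
actual    <- factor(c("B", "B", "M", "M", "B", "M"))
predicted <- factor(c("B", "M", "M", "M", "B", "B"))

res <- summarize_predictions(predicted, actual)
print(res$confusion)
cat("Accuracy:", round(res$accuracy, 3), "\n")
```

With the objects from the workflow, the call would be `summarize_predictions(predictions, test_data$diagnosis)`, provided `predictions` holds class labels.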

Advanced Usage

Feature Selection with Penalties

# Penalize previously used features to promote diversity
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 4,
  max_features = 3,
  feature_method = "mutual",
  penalize_used_features = TRUE,
  feature_penalty_weight = 0.5
)

Dynamic Feature Selection

set.seed(123)
# Decrease number of features at deeper levels
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 5,
  max_features = 10,
  max_features_strategy = "decrease",
  max_features_decrease_rate = 0.8
)

# Random feature selection at each node
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_features_strategy = "random",
  max_features_random_range = c(0.3, 0.8)
)
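
To get a feel for what a decreasing strategy implies, the sketch below computes a geometric schedule in base R. It rests on an assumption about how `max_features_decrease_rate` might be applied: the feature budget is multiplied by the rate at each level and rounded down, with a floor of one. The package's actual rule may differ, and `feature_budget` is a hypothetical helper.

```r
# Hypothetical helper: feature budget per depth under an assumed geometric decay
feature_budget <- function(max_features, rate, max_depth) {
  depths <- seq_len(max_depth)
  # Depth 1 uses max_features; each deeper level scales it by `rate` once more
  pmax(1L, as.integer(floor(max_features * rate^(depths - 1))))
}

budget <- feature_budget(max_features = 10, rate = 0.8, max_depth = 5)
print(data.frame(depth = seq_along(budget), features = budget))
```

Under this assumed schedule, a rate closer to 1 keeps the budget nearly constant, while a smaller rate narrows the feature set quickly at deeper nodes.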

Handle Imbalanced Data

# Balanced class weights
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  class_weights = "balanced"
)

set.seed(123)
# Custom class weights
custom_weights <- c("B" = 1, "M" = 3)
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  class_weights = "custom",
  custom_class_weights = custom_weights
)
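
For reference, "balanced" weights are commonly computed as inverse class frequencies, n / (k * n_c) for n rows, k classes, and n_c rows in class c. Whether svmodt uses exactly this formula is an assumption. The base R sketch below shows the convention on toy labels; `balanced_weights` is a hypothetical helper.

```r
# Hypothetical helper: inverse-frequency class weights, n / (k * n_c)
balanced_weights <- function(y) {
  counts <- table(y)
  w <- as.numeric(length(y) / (length(counts) * counts))
  names(w) <- names(counts)
  w
}

# Toy labels: six benign ("B") and two malignant ("M") cases
y <- factor(c(rep("B", 6), rep("M", 2)))
weights <- balanced_weights(y)
print(weights)
```

The rarer class receives the larger weight, and the result is a named numeric vector of the same shape as `custom_weights` above.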
