Get started

Introduction

The fmeffects package computes, aggregates, and visualizes forward marginal effects (FMEs) for any supervised machine learning model. Read about how they are computed, or consult the research paper, for a more in-depth understanding. There are three main functions: fme() to compute FMEs for a given feature and step size, ame() for a model overview with average marginal effects, and came() for semi-global interpretations.

Example

For demonstration purposes, we consider usage data from the Capital Bike Sharing scheme (Fanaee-T and Gama, 2014). It contains information about bike sharing usage in Washington, D.C. for the years 2011-2012 during the period from 7 to 8 a.m. We are interested in predicting count (the total number of bikes lent out to users).

library(fmeffects)
data(bikes, package = "fmeffects")
str(bikes)
## Classes 'data.table' and 'data.frame':   727 obs. of  11 variables:
##  $ season    : Factor w/ 4 levels "fall","spring",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ year      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ month     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : Factor w/ 2 levels "True","False": 2 2 2 2 2 2 2 2 2 2 ...
##  $ weekday   : Factor w/ 7 levels "Sun","Mon","Tue",..: 7 1 2 3 4 5 6 7 1 2 ...
##  $ workingday: Factor w/ 2 levels "True","False": 2 2 1 1 1 1 1 2 2 1 ...
##  $ weather   : Factor w/ 3 levels "clear","misty",..: 1 2 1 1 1 2 1 2 1 1 ...
##  $ temp      : num  8.2 16.4 5.74 4.92 7.38 6.56 8.2 6.56 3.28 4.92 ...
##  $ humidity  : num  0.86 0.76 0.5 0.74 0.43 0.59 0.69 0.74 0.53 0.5 ...
##  $ windspeed : num  0 13 13 9 13 ...
##  $ count     : num  3 1 64 94 88 95 84 9 6 77 ...
##  - attr(*, ".internal.selfref")=<externalptr>

FMEs are a model-agnostic interpretation method, i.e., they can be applied to any regression or (binary) classification model. Before we can compute FMEs, we need a trained model. The fmeffects package supports models from the caret and mlr3 libraries. Let’s try it with a random forest using the ranger algorithm:

library(mlr3verse)
library(ranger)
task = as_task_regr(x = bikes, id = "bikes", target = "count")
forest = lrn("regr.ranger")$train(task)

Compute FMEs

FMEs can be used to compute feature effects for both numerical and categorical features. This can be done with the fme() function.

Numerical Features

The most common application is to compute the FME for a single numerical feature, i.e., a univariate feature effect. The variable of interest must be specified with the feature argument. In this case, step.size can be any number deemed most useful for the purpose of interpretation. Most of the time, this will be a unit change, e.g., step.size = 1. As the concept of numerical FMEs extends to multivariate feature effects, fme() can also compute a bivariate feature effect. In this case, feature needs to be supplied with the names of two numerical features, and step.size requires a vector, e.g., step.size = c(1, 1).

Univariate Feature Effects

Assume we are interested in the effect of temperature on bike sharing usage. Specifically, we set step.size = 1 to investigate the FME of an increase in temperature by 1 degree Celsius (°C). Thus, we compute FMEs for feature = "temp" and step.size = 1.

effects = fme(model = forest,
               data = bikes,
               target = "count",
               feature = "temp",
               step.size = 1,
               ep.method = "envelope")

Note that we have specified ep.method = "envelope". This means we exclude observations for which adding 1°C to the temperature results in the temperature value falling outside the range of temp in the overall data. Thereby, we reduce the risk of asking the model to extrapolate.
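The envelope criterion can be sketched by hand. The following is an illustration of the idea, not the package's internal implementation: an observation is flagged when adding the step to temp would push it outside the observed range of temp.

```r
# Illustration of the "envelope" idea (not the package's internals):
# flag observations whose stepped temp value would leave the observed range.
step <- 1
stepped <- bikes$temp + step
out_of_range <- stepped > max(bikes$temp) | stepped < min(bikes$temp)
sum(out_of_range)  # observations that would be excluded as extrapolation points
```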

plot(effects, jitter = c(0.2, 0))
## `geom_smooth()` using formula = 'y ~ s(x, bs = "cs")'

The black arrow indicates the direction and magnitude of step.size. The horizontal line is the average marginal effect (AME). The AME is computed as a simple mean over all observation-wise FMEs. Therefore, on average, the FME of a temperature increase of 1°C on bike sharing usage is roughly 2.5. As can be seen, the observation-wise effects seem to vary for different values of temp. While the FME tends to be positive for lower temperature values (0-17°C), it turns negative for higher temperature values (>17°C).

Also, we can extract all relevant aggregate information from the effects object:

effects$ame
## [1] 2.528326

For a more in-depth analysis, we can inspect the FME for each observation in the data set:

head(effects$results)
##    obs.id       fme
## 1:      1  2.200590
## 2:      2  2.152114
## 3:      3  6.002730
## 4:      4 -0.408150
## 5:      5  1.455048
## 6:      6  4.514581
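Since the AME is a simple mean over the observation-wise FMEs, it can be recovered directly from the results table:

```r
# The AME is the simple mean over all observation-wise FMEs,
# so this should match effects$ame (up to numerical precision):
mean(effects$results$fme)
```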

Bivariate Feature Effects

Bivariate feature effects can be considered when one is interested in the combined effect of two features on the target variable. Let’s assume we want to estimate the effect of a decrease in temperature by 3°C, combined with a decrease in humidity by 10 percentage points, i.e., the FME for feature = c("temp", "humidity") and step.size = c(-3, -0.1):

effects2 = fme(model = forest,
               data = bikes,
               target = "count",
               feature = c("temp", "humidity"),
               step.size = c(-3, -0.1),
               ep.method = "envelope")

plot(effects2, jitter = c(0.1, 0.02))

The plot for bivariate FMEs uses a color scale to indicate direction and magnitude of the estimated effect. Let’s check the AME:

effects2$ame
## [1] -2.655111

A combined decrease in temperature by 3°C and humidity by 10 percentage points seems to result in slightly lower bike sharing usage (on average). However, a quick check of the variance of the FMEs implies that effects are highly heterogeneous:

var(effects2$results$fme)
## [1] 591.5306

Therefore, it could be interesting to move the interpretation of feature effects from a global to a semi-global perspective via the came() function.

Categorical Features

For a categorical feature, the FME of an observation is simply the difference in predictions when changing the observed category of the feature to the category specified in step.size. For instance, one could be interested in the effect of rainy weather on the bike sharing demand, i.e., the FME of changing the feature value of weather to rain for observations where weather is either clear or misty:

effects3 = fme(model = forest,
              data = bikes,
              target = "count",
              feature = "weather",
              step.size = "rain")
summary(effects3)
## 
## Forward Marginal Effects Object
## 
## Step type:
##   categorical
## 
## Feature & reference category:
##   weather, rain
## 
## Extrapolation point detection:
##   none, EPs: 0 of 657 obs. (0 %)
## 
## Average Marginal Effect (AME):
##   -54.0502

Here, the AME of rain is approximately -54. Therefore, while holding all other features constant, a change to rainy weather can be expected to reduce bike sharing usage by roughly 54 bikes.
For categorical feature effects, we can plot the empirical distribution of the FMEs:

plot(effects3)
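To make the definition above concrete, the categorical FME can also be computed by hand with mlr3 predictions. This is a sketch mirroring the definition, not the package's implementation:

```r
# Manual sketch of the categorical FME for weather = "rain":
# difference in predictions after switching weather to "rain",
# for all observations not already in that category.
idx <- which(bikes$weather != "rain")
original <- as.data.frame(bikes)[idx, ]
stepped <- original
stepped$weather <- factor("rain", levels = levels(bikes$weather))
fme_manual <- forest$predict_newdata(stepped)$response -
  forest$predict_newdata(original)$response
mean(fme_manual)  # should be close to the AME reported above
```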

Model Overview with AMEs

For an informative overview of all feature effects in a model, we can use the ame() function:

overview = ame(model = forest,
           data = bikes,
           target = "count")
overview$results
##       Feature step.size      AME      SD SD/AME      0.25      0.75   n
## 1      season    spring -29.6837 31.1549     -1  -39.9484   -5.5179 548
## 2      season    summer   0.7415 22.5845   30.5   -7.7228   11.7662 543
## 3      season      fall  11.1538 27.8616    2.5   -2.1901   35.0825 539
## 4      season    winter   16.188 25.2741    1.6    1.3835   27.7128 551
## 5        year         0 -99.7876 67.7172   -0.7 -158.8407  -21.0627 364
## 6        year         1  97.5794 60.4782    0.6   23.3479  147.1796 363
## 7       month         1   3.9432 13.0966    3.3   -1.2417    6.7622 727
## 8     holiday     False   1.0013 20.6724   20.6   -12.846    11.705  21
## 9     holiday      True -13.6425 25.2035   -1.8  -31.3081    5.9933 706
## 10    weekday       Sat  -57.653 50.8795   -0.9  -90.9633  -18.3733 622
## 11    weekday       Sun  -84.077 56.8256   -0.7 -120.3941  -32.8961 622
## 12    weekday       Mon  12.1655 29.4005    2.4   -8.8125   31.1464 623
## 13    weekday       Tue  18.7743 25.7642    1.4    0.8087   34.1345 625
## 14    weekday       Wed  21.0901 23.7813    1.1    1.3478   35.3659 623
## 15    weekday       Thu  20.4049 25.2487    1.2   -0.3521   36.7181 624
## 16    weekday       Fri   2.4915 37.4019     15  -24.8527   32.6963 623
## 17 workingday     False  -201.14 87.8633   -0.4 -256.2987 -140.2111 496
## 18 workingday      True 160.2681 62.3581    0.4  120.1648  208.9375 231
## 19    weather     clear  25.0977 40.3313    1.6    3.4668   22.3393 284
## 20    weather     misty   3.1518 31.6788   10.1   -8.8715    1.0506 513
## 21    weather      rain -54.0502 49.8181   -0.9  -91.2714   -4.9366 657
## 22       temp         1   2.5041  7.4765      3   -0.4936     4.982 727
## 23   humidity      0.01  -0.2075  2.4971    -12   -0.2783    0.4728 727
## 24  windspeed         1  -0.0351  2.4748  -70.4   -0.2486      0.29 727

This computes the AME for each feature included in the model, with a default step size of 1 for numerical features (or 0.01 if their range is smaller than 1). For categorical features, AMEs are computed for all available categories.
We can specify a custom subset of features and step sizes using the features argument:

overview = ame(model = forest,
               data = bikes,
               target = "count",
               features = list(weather = c("rain", "clear"), temp = -1, humidity = 0.1),
               ep.method = "envelope")
overview$results

Again, note that it is often advisable to set ep.method = "envelope" to avoid model extrapolation.


Semi-global Interpretations

We can use came() on a specific FME object to compute subspaces of the feature space where FMEs are more homogeneous. Let’s take the effect of a decrease in temperature by 3°C combined with a decrease in humidity by 10 percentage points, and see if we can find three appropriate subspaces.

subspaces = came(effects = effects2, number.partitions = 3)
summary(subspaces)
## 
## PartitioningCtree of an FME object
## 
## Method:  partitions = 3
## 
##    n      cAME  SD(fME)  
##  718 -2.655111 24.32140 *
##  649 -4.891792 21.98467  
##   49  7.893951 22.35667  
##   20 44.079989 42.64779  
## ---
## * root node (non-partitioned)
## 
## AME (Global): -2.6551

As can be seen, the CTREE algorithm was used to partition the feature space into three subspaces. The coefficient of variation (CoV) is used as a criterion to measure homogeneity in each subspace. We can see that the CoV is substantially smaller in each of the subspaces than in the root node, i.e., the global feature space. The conditional AME (cAME) can be used to interpret how the expected FME varies across the subspaces. Let’s visualize our results:
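The CoV of the root node can be recomputed by hand from the effects object (a sketch; its sign follows the AME):

```r
# Coefficient of variation (CoV) of the FMEs in the root node,
# i.e., SD of the observation-wise FMEs divided by the (c)AME:
sd(effects2$results$fme) / effects2$ame
```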

plot(subspaces)

In this case, we get a decision tree that assigns observations to a feature subspace according to the weather situation (weather) and season (season). The information contained in the boxes below the terminal nodes is equivalent to the summary output and can be extracted from subspaces$results. With cAMEs of -4.89, 7.89, and 44.08, respectively, the expected FME is estimated to vary substantially in direction and magnitude across the subspaces. For example, the cAME is highest on rainy days. It turns negative on non-rainy days in spring, summer and winter.


References

Fanaee-T, H. and Gama, J. (2014). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence 2(2): 113–127

Vanschoren, J., van Rijn, J. N., Bischl, B. and Torgo, L. (2013). Openml: networked science in machine learning. SIGKDD Explorations 15(2): 49–60