Classification Metrics

library(SLmetrics)

This vignette provides a brief overview of the classification metrics in {SLmetrics}. The classification interface is broadly divided into two methods: foo.cmatrix() and foo.factor(). The former calculates a given metric from a confusion matrix, while the latter calculates the same metric from two vectors: a vector of actual values and a vector of predicted values, both of class factor.

Throughout this vignette, the following data will be used:

# 1) seed
set.seed(1903)

# 2) actual values
actual <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)

# 3) predicted values
predicted <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)

# 4) sample weights
weights <- runif(
    n = length(actual)
)

Assume that the predicted values come from a trained machine learning model. This vignette introduces a subset of the metrics available in {SLmetrics}; see the online documentation for more details and other metrics.

Computing classification metrics

The accuracy of the model can be evaluated with the accuracy() function as follows:

# 1) calculate accuracy
accuracy(
    actual    = actual,
    predicted = predicted
)
#> [1] 0.3
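
Accuracy is simply the proportion of correct predictions, so the same value can be reproduced with base R alone. The following sanity check is not part of the {SLmetrics} interface, but it verifies the result above:

# 1) proportion of correct predictions
mean(actual == predicted)
#> [1] 0.3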

Many classification metrics have different names yet compute the same underlying value. For example, recall is also known as the true positive rate or sensitivity. These metrics can be calculated as follows:

# 1) calculate recall
recall(
    actual    = actual,
    predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333

# 2) calculate sensitivity
sensitivity(
    actual    = actual,
    predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333

# 3) calculate true positive rate
tpr(
    actual    = actual,
    predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333

By default, all classification functions calculate class-wise performance metrics where possible. These metrics can also be aggregated into micro or macro averages using the micro parameter:

# 1) macro average
recall(
    actual    = actual,
    predicted = predicted,
    micro     = FALSE 
)
#> [1] 0.3055556

# 2) micro average
recall(
    actual    = actual,
    predicted = predicted,
    micro     = TRUE
)
#> [1] 0.3
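
The two averages can be reasoned about directly: the macro average is the unweighted mean of the class-wise values, while the micro average pools the true positives and false negatives of all classes before dividing (which is why the micro-averaged recall coincides with the overall accuracy here). A small base R sketch of the macro case:

# 1) macro average as the mean of class-wise recall
mean(
    recall(
        actual    = actual,
        predicted = predicted
    )
)
#> [1] 0.3055556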

Calculating multiple performance metrics using separate calls to foo.factor() can be inefficient because each function reconstructs the underlying confusion matrix. A more efficient approach is to construct the confusion matrix once and then pass it to your chosen metric function. To do this, you can use the cmatrix() function:

# 1) confusion matrix
confusion_matrix <- cmatrix(
    actual    = actual,
    predicted = predicted
)

# 2) summarise confusion matrix
summary(
    confusion_matrix
)
#> Confusion Matrix (3 x 3) 
#> ================================================================================
#>   A B C
#> A 1 0 2
#> B 1 1 2
#> C 1 1 1
#> ================================================================================
#> Overall Statistics (micro average)
#>  - Accuracy:          0.30
#>  - Balanced Accuracy: 0.31
#>  - Sensitivity:       0.30
#>  - Specificity:       0.65
#>  - Precision:         0.30

Now you can pass the confusion matrix directly into the metric functions:

# 1) calculate accuracy
accuracy(
    confusion_matrix
)
#> [1] 0.3

# 2) calculate false positive rate
fpr(
    confusion_matrix
)
#>         A         B         C 
#> 0.2857143 0.1666667 0.5714286
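
To illustrate the efficiency argument above, a rough timing comparison can be made with base R. The sketch below assumes that recall(), like accuracy(), accepts a confusion matrix as well as a pair of factors, as described earlier; the exact timings will vary by machine:

# 1) repeated calls via the factor interface
#    (the confusion matrix is rebuilt on every call)
system.time(
    replicate(1e4, {
        accuracy(actual = actual, predicted = predicted)
        recall(actual = actual, predicted = predicted)
    })
)

# 2) repeated calls via the confusion matrix interface
#    (the confusion matrix is constructed once, up front)
system.time(
    replicate(1e4, {
        accuracy(confusion_matrix)
        recall(confusion_matrix)
    })
)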

Computing weighted classification metrics

Weighted classification metrics can be calculated using the weighted.foo() methods, which have an interface similar to the unweighted versions above. Below is an example showing how to compute a weighted version of recall:

# 1) calculate recall
weighted.recall(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202

# 2) calculate sensitivity
weighted.sensitivity(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202

# 3) calculate true positive rate
weighted.tpr(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202

A small disclaimer applies to weighted metrics: it is not possible to pass a weighted confusion matrix directly into a weighted.foo() method. Consider the following example:

# 1) calculate weighted confusion matrix
weighted_confusion_matrix <- weighted.cmatrix(
    actual = actual,
    predicted = predicted,
    w = weights
)

# 2) calculate weighted accuracy
try(
    weighted.accuracy(weighted_confusion_matrix)
)
#> Error in UseMethod(generic = "weighted.accuracy", object = ..1) : 
#>   no applicable method for 'weighted.accuracy' applied to an object of class "cmatrix"

This approach throws an error. Instead, pass the weighted confusion matrix into the unweighted function that uses a confusion matrix interface (i.e., foo.cmatrix()). For example:

accuracy(weighted_confusion_matrix)
#> [1] 0.3490507

This returns the same weighted accuracy as if it were calculated directly:

all.equal(
    accuracy(weighted_confusion_matrix),
    weighted.accuracy(actual, predicted, w = weights)
)
#> [1] TRUE
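
The same equivalence is expected to hold for the other metrics as well; as a sketch (again assuming recall() accepts a confusion matrix, as described earlier), the weighted recall can be checked in the same way:

# 1) compare cmatrix- and factor-based weighted recall
all.equal(
    recall(weighted_confusion_matrix),
    weighted.recall(actual, predicted, w = weights)
)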
