This vignette provides a brief overview of the classification metrics in {SLmetrics}. The classification interface is broadly divided into two methods: foo.cmatrix() and foo.factor(). The former calculates the metric from a confusion matrix, while the latter calculates the same metric from two vectors: a vector of actual values and a vector of predicted values. Both are vectors of [factor] values.
Throughout this vignette, the following data will be used:
# 1) seed
set.seed(1903)
# 2) actual values
actual <- factor(
x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
# 3) predicted values
predicted <- factor(
x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
# 4) sample weights
weights <- runif(
n = length(actual)
)
Assume that the predicted
values come from a trained
machine learning model. This vignette introduces a subset of the metrics
available in {SLmetrics}; see the online documentation for
more details and other metrics.
The accuracy of the model can be evaluated using the accuracy() function as follows:
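A minimal sketch of that call, assuming accuracy() follows the foo.factor() interface described above and uses the actual and predicted vectors defined earlier:

```r
# load the package
library(SLmetrics)

# calculate the overall accuracy from the
# two factor vectors
accuracy(
  actual    = actual,
  predicted = predicted
)
```

The result is a single numeric value: the proportion of predictions that match the actual class labels.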
Many classification metrics have different names yet compute the same underlying value. For example, recall is also known as the true positive rate or sensitivity. These metrics can be calculated as follows:
# 1) calculate recall
recall(
actual = actual,
predicted = predicted
)
#> A B C
#> 0.3333333 0.2500000 0.3333333
# 2) calculate sensitivity
sensitivity(
actual = actual,
predicted = predicted
)
#> A B C
#> 0.3333333 0.2500000 0.3333333
# 3) calculate true positive rate
tpr(
actual = actual,
predicted = predicted
)
#> A B C
#> 0.3333333 0.2500000 0.3333333
By default, all classification functions calculate the class-wise performance metrics where possible. The performance metrics can also be aggregated into micro and macro averages using the micro parameter:
# 1) macro average
recall(
actual = actual,
predicted = predicted,
micro = FALSE
)
#> [1] 0.3055556
# 2) micro average
recall(
actual = actual,
predicted = predicted,
micro = TRUE
)
#> [1] 0.3
Calculating multiple performance metrics using separate calls to
foo.factor()
can be inefficient because each function
reconstructs the underlying confusion matrix. A more efficient approach
is to construct the confusion matrix once and then pass it to your
chosen metric function. To do this, you can use the
cmatrix()
function:
# 1) confusion matrix
confusion_matrix <- cmatrix(
actual = actual,
predicted = predicted
)
# 2) summarise confusion matrix
summary(
confusion_matrix
)
#> Confusion Matrix (3 x 3)
#> ================================================================================
#> A B C
#> A 1 0 2
#> B 1 1 2
#> C 1 1 1
#> ================================================================================
#> Overall Statistics (micro average)
#> - Accuracy: 0.30
#> - Balanced Accuracy: 0.31
#> - Sensitivity: 0.30
#> - Specificity: 0.65
#> - Precision: 0.30
Now you can pass the confusion matrix directly into the metric functions:
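For instance, a sketch assuming each metric's foo.cmatrix() method accepts the confusion matrix as its first argument, reusing the confusion_matrix object constructed above:

```r
# 1) calculate accuracy from the
#    precomputed confusion matrix
accuracy(confusion_matrix)

# 2) calculate recall from the same object;
#    the confusion matrix is not reconstructed
recall(confusion_matrix)
```

Because the confusion matrix is built once, each subsequent metric call avoids re-tabulating the actual and predicted vectors.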
Weighted classification metrics can be calculated using the weighted.foo methods, which have a similar interface to the unweighted versions above. Below is an example showing how to compute a weighted version of recall:
# 1) calculate recall
weighted.recall(
actual = actual,
predicted = predicted,
w = weights
)
#> A B C
#> 0.3359073 0.3027334 0.4245202
# 2) calculate sensitivity
weighted.sensitivity(
actual = actual,
predicted = predicted,
w = weights
)
#> A B C
#> 0.3359073 0.3027334 0.4245202
# 3) calculate true positive rate
weighted.tpr(
actual = actual,
predicted = predicted,
w = weights
)
#> A B C
#> 0.3359073 0.3027334 0.4245202
A small disclaimer applies to weighted metrics: it is
not possible to pass a weighted confusion matrix
directly into a weighted.foo()
method. Consider the
following example:
# 1) calculate weighted confusion matrix
weighted_confusion_matrix <- weighted.cmatrix(
actual = actual,
predicted = predicted,
w = weights
)
# 2) calculate weighted accuracy
try(
weighted.accuracy(weighted_confusion_matrix)
)
#> Error in UseMethod(generic = "weighted.accuracy", object = ..1) :
#> no applicable method for 'weighted.accuracy' applied to an object of class "cmatrix"
This approach throws an error. Instead, pass the weighted confusion
matrix into the unweighted function that uses a confusion matrix
interface (i.e., foo.cmatrix()
). For example:
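A sketch of that workaround, reusing the weighted_confusion_matrix object computed above and assuming the unweighted foo.cmatrix() methods accept it like any other confusion matrix:

```r
# 1) calculate weighted accuracy by passing the
#    weighted confusion matrix into the unweighted
#    foo.cmatrix() interface
accuracy(weighted_confusion_matrix)
```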
This returns the same weighted accuracy
as if it were
calculated directly:
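The direct calculation would look like this, a sketch assuming weighted.accuracy() accepts the w argument like the other weighted.foo() methods shown above:

```r
# calculate weighted accuracy directly
# from the two factor vectors and the weights
weighted.accuracy(
  actual    = actual,
  predicted = predicted,
  w         = weights
)
```

Both routes weight each observation by w; the confusion-matrix route is simply more efficient when several weighted metrics are needed.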