library(biplotEZ)
#>
#> Attaching package: 'biplotEZ'
#> The following object is masked from 'package:stats':
#>
#> biplot
This vignette deals with biplots for separating classes. Topics discussed are the construction of canonical variate analysis (CVA) biplots, the CVA() function and the means() function.
Consider a data matrix \(\mathbf{X}:n \times p\) containing data on \(n\) objects and \(p\) variables. In addition, a vector \(\mathbf{g}:n \times 1\) contains information on class membership of each observation. Let \(G\) indicate the total number of classes. Canonical variate analysis (CVA) is closely related to linear discriminant analysis, in that the \(p\) variables are transformed to \(p\) new variables, called canonical variates, such that the classes are optimally separated in the canonical space. By optimally separated, we mean maximising the between class variance relative to the within class variance. This can be formulated as follows:
Let \(\mathbf{G}:n \times G\) be an indicator matrix with \(g_{ij} = 1\) if observation \(i\) belongs to class \(j\) and \(g_{ij} = 0\) otherwise. The matrix \(\mathbf{G'G}\) is a diagonal matrix containing the number of observations per class on the diagonal. We can form the matrix of class means \(\bar{\mathbf{X}}:G \times p = (\mathbf{G'G})^{-1} \mathbf{G'X}\). As in the usual analysis of variance, the total variance can be decomposed into a between class variance and a within class variance:
\[ \mathbf{T} = \mathbf{B} + \mathbf{W} \]
\[ \mathbf{X'X} = \bar{\mathbf{X}}'\mathbf{G'G} \bar{\mathbf{X}} + \mathbf{X' [I - G(G'G)^{-1}G'] X} \]
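To make the construction concrete, here is a minimal sketch in plain R (not using biplotEZ's internal functions, and assuming the state.x77 data are only centred, not scaled) of the indicator matrix, the class means and the decomposition above:
X <- scale(state.x77, center = TRUE, scale = FALSE)  # centred data matrix (n x p)
G <- model.matrix(~ state.region - 1)                # indicator matrix (n x G)
Ng <- t(G) %*% G                                     # diagonal matrix of class sizes
Xbar <- solve(Ng) %*% t(G) %*% X                     # class means (G x p)
B <- t(Xbar) %*% Ng %*% Xbar                         # between class matrix
W <- t(X) %*% X - B                                  # within class matrix
all.equal(t(X) %*% X, B + W)                         # confirms T = B + W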
To find the canonical variates we want to maximise the ratio
\[ \frac{\mathbf{m'Bm}}{\mathbf{m'Wm}} \]
subject to \(\mathbf{m'Wm} = 1\). It can be shown that this leads to the following equivalent eigen equations:
\[ \mathbf{W}^{-1}\mathbf{BM} = \mathbf{M \Lambda} \]
\[ \mathbf{BM} = \mathbf{WM \Lambda} \]
\[ (\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}) (\mathbf{W}^{\frac{1}{2}}\mathbf{M}) = (\mathbf{W}^{\frac{1}{2}} \mathbf{M}) \mathbf{\Lambda} \]
with \(\mathbf{M'BM}= \mathbf{\Lambda}\) and \(\mathbf{M'WM}= \mathbf{I}\).
Since the matrix \(\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}\) is symmetric and positive semi-definite, the eigenvalues in the matrix \(\mathbf{\Lambda}\) are non-negative and ordered from largest to smallest. The rank of \(\mathbf{B}\) is \(\min(p, G-1)\) so that only the first \(\text{rank}(\mathbf{B})\) eigenvalues are non-zero. We form the canonical variates with the transformation
\[ \bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{M}. \]
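Continuing the sketch above (again plain R rather than the biplotEZ implementation), the eigen equations can be solved through the symmetric matrix \(\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}\):
Weig <- eigen(W, symmetric = TRUE)
W.inv.sqrt <- Weig$vectors %*% diag(1 / sqrt(Weig$values)) %*% t(Weig$vectors)
sym.eig <- eigen(W.inv.sqrt %*% B %*% W.inv.sqrt, symmetric = TRUE)
M <- W.inv.sqrt %*% sym.eig$vectors    # satisfies M'WM = I and M'BM = Lambda
Lambda <- diag(sym.eig$values)         # ordered eigenvalues; only min(p, G-1) are non-zero
Ybar <- Xbar %*% M                     # canonical variates of the class means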
To construct a 2D biplot, we plot the first two canonical variates \(\bar{\mathbf{Z}} = \bar{\mathbf{X}}\mathbf{MJ}_2\) where \(\mathbf{J}_2' = \begin{bmatrix} \mathbf{I}_2 & \mathbf{0} \end{bmatrix}\). We add the individual sample points with the same transformation
\[ \mathbf{Z} = \mathbf{X}\mathbf{MJ}_2. \] Interpolation of a new sample \(\mathbf{x}^*:p \times 1\) follows as \({\mathbf{z}^*}' = {\mathbf{x}^*}' \mathbf{MJ}_2\) with \(\mathbf{z}^*:2 \times 1\). Using the inverse transformation \(\mathbf{x}' = \mathbf{y}'\mathbf{M}^{-1}\), all the points that will predict \(\mu\) for variable \(j\) will have the form
\[ \mu = \mathbf{y}'\mathbf{M}^{-1} \mathbf{e}_j \]
where \(\mathbf{e}_j\) is a vector of zeros with a one in position \(j\). All the points in the 2D biplot that predict the value \(\mu\) will have
\[ \mu = \begin{bmatrix} z_1 & z_2 & 0 & \dots & 0\end{bmatrix}\mathbf{M}^{-1} \mathbf{e}_j \]
defining the prediction line as
\[ \mu = \mathbf{z}_{\mu}' \mathbf{J}_2' \mathbf{M}^{-1} \mathbf{e}_j. \]
Writing \(\mathbf{h}_{(j)} = \mathbf{J}_2' \mathbf{M}^{-1} \mathbf{e}_j\), the construction of biplot axes is similar to the discussion in the biplotEZ vignette for PCA biplots. The direction of the axis is given by \(\mathbf{h}_{(j)}\). To find the intersection of the prediction line with \(\mathbf{h}_{(j)}\) we note that \[ \mathbf{z}'_{(\mu)}\mathbf{h}_{(j)} = \| \mathbf{z}_{(\mu)} \| \| \mathbf{h}_{(j)} \| \cos(\mathbf{z}_{(\mu)},\mathbf{h}_{(j)}) = \| \mathbf{p} \| \| \mathbf{h}_{(j)} \| \] where \(\mathbf{p}\) is the orthogonal projection of \(\mathbf{z}_{(\mu)}\) on \(\mathbf{h}_{(j)}\).
Since \(\mathbf{p}\) lies along \(\mathbf{h}_{(j)}\) we can write \(\mathbf{p} = c\mathbf{h}_{(j)}\), and all points on the prediction line \(\mu = \mathbf{z}'_{\mu}\mathbf{h}_{(j)}\) project on the same point \(c_{\mu}\mathbf{h}_{(j)}\). We solve for \(c_{\mu}\) from \[ \mu = \mathbf{z}'_{\mu}\mathbf{h}_{(j)} = \| \mathbf{p} \| \| \mathbf{h}_{(j)} \| = \| c_{\mu}\mathbf{h}_{(j)} \| \| \mathbf{h}_{(j)} \| = c_{\mu}\mathbf{h}_{(j)}'\mathbf{h}_{(j)} \]
\[ c_{\mu} = \frac{\mu}{\mathbf{h}_{(j)}'\mathbf{h}_{(j)}}. \] If we select ‘nice’ scale markers \(\tau_{1}, \tau_{2}, \cdots \tau_{k}\) for variable \(j\), then \(\tau_{h}-\bar{x}_j = \mu_{h}\) and positions of these scale markers on \(\mathbf{h}_{(j)}\) are given by \(p_{\mu_{1}}, p_{\mu_{2}}, \cdots p_{\mu_{k}}\) with \[ p_{\mu_h} = c_{\mu_h}\mathbf{h}_{(j)} = \frac{\mu_h}{\mathbf{h}_{(j)}'\mathbf{h}_{(j)}}\mathbf{h}_{(j)} \]
\[ = \frac{\mu_h}{\mathbf{e}_{j}' (\mathbf{M}^{-1})' \mathbf{J} \mathbf{M}^{-1} \mathbf{e}_{j}}\ \mathbf{J}_2' \mathbf{M}^{-1} \mathbf{e}_{j} \]
with \[ \mathbf{J} = \begin{bmatrix} \mathbf{I}_2 & \mathbf{0}\\ \mathbf{0} & \mathbf{0} \end{bmatrix}. \]
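As a plain R sketch continuing the code above (the choice of variable \(j\) and the pretty() scale markers are arbitrary and purely illustrative), the axis direction \(\mathbf{h}_{(j)}\) and the positions of the scale markers can be computed as:
Minv <- solve(M)
j <- 1                                  # first variable, chosen arbitrarily
h.j <- Minv[1:2, j]                     # axis direction h_(j) = J_2' M^{-1} e_j
tau <- pretty(state.x77[, j])           # 'nice' scale markers on the original scale
mu <- tau - mean(state.x77[, j])        # centred values mu_h = tau_h - xbar_j
markers <- outer(mu / sum(h.j^2), h.j)  # rows are the marker positions p_mu = c_mu h_(j)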
CVA()
To obtain a CVA biplot of the state.x77 data set, optimally separating the classes according to state.region, we call CVA(). Fitting \(\alpha\)-bags to the classes makes it easier to compare class overlap and separation. For a detailed discussion on \(\alpha\)-bags, see the biplotEZ vignette.
biplot(state.x77) |> CVA(classes = state.region) |> alpha.bags() |>
  legend.type(bags = TRUE) |> plot()
#> Computing 0.95 -bag for Northeast
#> Computing 0.95 -bag for South
#> Computing 0.95 -bag for North Central
#> Computing 0.95 -bag for West
means()
This function controls the aesthetics of the class means in the biplot. The function accepts as its first argument an object of class biplot to which the aesthetics should be applied. Let us first construct a CVA biplot of the state.x77 data with samples optimally separated according to state.division.
biplot(state.x77, scaled = TRUE) |>
CVA(classes = state.division) |>
legend.type(means = TRUE) |> plot()
Instead of adding a legend, we can choose to label the class means. By default, each class mean is plotted in the same colour as its samples; here we select a different colour and plotting character for the class means.
biplot(state.x77, scaled = TRUE) |>
CVA(classes = state.division) |>
means(label = TRUE, col = "olivedrab", pch = 15) |> plot()
If we choose to show only the class means for the central states, the argument which is used, either indicating the number(s) in the sequence of levels (which = 4:7) or, as shown below, the levels themselves:
biplot(state.x77, scaled = TRUE) |>
CVA(classes = state.division) |>
means(which = c("West North Central", "West South Central", "East South Central",
"East North Central"), label = TRUE) |>
plot()
The size of the labels is controlled with label.cex which can be specified either as a single value (for all class means) or a vector indicating size values for each individual class mean. The colour of the labels defaults to the colour(s) of the class means. However, individual label colours can be specified with label.col, similar to label.cex, as either a single value or a vector of length equal to the number of classes.
biplot(state.x77, scaled = TRUE) |>
CVA(classes = state.division) |>
means(col = "olivedrab", pch = 15, cex = 1.5,
label = TRUE, label.col = c("blue","green","gold","cyan","magenta",
"black","red","grey","purple")) |>
plot()
We can also make use of the functionality of the ggrepel package to place the labels.
biplot(state.x77, scaled = TRUE) |>
CVA(classes = state.division) |>
samples(label = "ggrepel", label.cex = 0.65) |>
  means(label = "ggrepel", label.cex = 0.8) |> plot()