The Concept¹

In the context of supervised machine learning, an interaction (i.e. a statistically relevant dependence) between two attributes \(X\) and \(Y\), in the presence of the context (i.e. class) attribute \(C\), is called a 3-way interaction. The strength of such an interaction is measured with the 3-way interaction gain: \(I(X;Y;C) = I(X;Y|C) - I(X;Y) = I(X,Y;C) - I(X;C) - I(Y;C)\). Here, \(I(X;Y|C) = H(X|C) + H(Y|C) - H(X,Y|C)\) is the conditional information gain (i.e. conditional mutual information) between \(X\) and \(Y\) in the context \(C\), and \(I(X;Y) = H(X) + H(Y) - H(X,Y)\) is a measure of dependence (i.e. “correlation”) between \(X\) and \(Y\) regardless of context, where \(H(X) = -\sum_{i} P_i \log_{2} P_i\) is Shannon’s entropy measured in bits, and \(P_i\) is the probability of the \(i\)-th value of \(X\). \(I(X,Y;C)\) is the information gain of the pair \((X,Y)\) taken jointly, and the 2-way interaction gains of the single attributes \(X\) and \(Y\) are \(I(X;C) = \mathrm{InfoGain}_{C}(X) = \sum_{x}\sum_{c}P(x,c)\log_{2}\frac{P(x,c)}{P(x)P(c)}\) and \(I(Y;C) = \mathrm{InfoGain}_{C}(Y) = \sum_{y}\sum_{c}P(y,c)\log_{2}\frac{P(y,c)}{P(y)P(c)}\), respectively.
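The interaction gain can be written either in terms of information gains with respect to the class or in terms of the dependence between \(X\) and \(Y\) with and without the context. A short derivation of why the two forms agree, assuming only the standard identities \(I(X,Y;C) = H(X,Y) - H(X,Y|C)\), \(I(X;C) = H(X) - H(X|C)\) and \(I(Y;C) = H(Y) - H(Y|C)\):

\[
\begin{aligned}
I(X,Y;C) - I(X;C) - I(Y;C)
  &= \bigl[H(X,Y) - H(X,Y|C)\bigr] - \bigl[H(X) - H(X|C)\bigr] - \bigl[H(Y) - H(Y|C)\bigr] \\
  &= \bigl[H(X|C) + H(Y|C) - H(X,Y|C)\bigr] - \bigl[H(X) + H(Y) - H(X,Y)\bigr] \\
  &= I(X;Y|C) - I(X;Y)
\end{aligned}
\]

Read this way, the gain is positive when knowing the class makes \(X\) and \(Y\) more dependent than they are on their own (synergy), and negative when part of what they say about the class is shared (redundancy).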

Interaction graphs (Figure 1) are a graphical representation of the \(k\) most significant 3-way interactions (\(2 \leq k \leq 20\)). The graph consists of nodes, which represent the interacting attributes (with their 2-way interaction gains indicated below the name), and weighted edges, which represent the strength of the 3-way interaction. There are two types of edges:

The positively interacting (i.e. green) edges indicate that the observed pair of attributes provides more information for making a decision if observed together, rather than observed alone. E.g. (Figure 1): Outlook alone explains 24.69% of the entropy, Windy alone explains 4.79% of the entropy, whilst combined, they additionally explain 30.59% of the entropy. Thus, if observed together, they explain 24.69% + 4.79% + 30.59% = 60.07% of the entropy of the dataset.
The negatively interacting (i.e. red) edges indicate that the observed pair of attributes repeats the same information and should not be combined. E.g. (Figure 1): Outlook alone explains 24.69% of the entropy, Others alone explains 94.02% of the entropy, whilst combined, they repeat 24.69% of the previous information (this is why the edge Outlook - Others is negative: -24.69%). Thus, if observed together, they explain 24.69% + 94.02% - 24.69% = 94.02% of the entropy of the dataset.

Figure 1: Interaction graph based on the toy-dataset ‘Golf’

Hence, interaction graphs can be used as a tool for understanding the most important interactions and for selecting attributes suitable for grouping or inclusion in a machine learning model.

Outlook	Temperature	Humidity	Windy	Others	Play
overcast	hot	high	FALSE	yes	yes
overcast	cool	normal	TRUE	yes	yes
overcast	mild	high	TRUE	yes	yes
overcast	hot	normal	FALSE	yes	yes
rainy	mild	high	FALSE	yes	yes
rainy	cool	normal	FALSE	yes	yes
rainy	cool	normal	TRUE	no	no
rainy	mild	normal	FALSE	yes	yes
rainy	mild	high	TRUE	no	no
sunny	hot	high	FALSE	no	no
sunny	hot	high	TRUE	no	no
sunny	mild	high	FALSE	no	no
sunny	cool	normal	FALSE	yes	yes
sunny	mild	normal	TRUE	yes	yes
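The gains discussed for Figure 1 can be approximately reproduced from the table above with a few lines of base R. This is a minimal sketch, not the integr implementation: the helpers `entropy`, `info_gain` and `interaction_gain` are hypothetical names introduced here, the data frame is typed in by hand, and the Temperature and Humidity columns are left out for brevity.

```r
# Golf toy dataset typed in from the table above (Temperature/Humidity omitted).
play <- c("yes", "yes", "yes", "yes", "yes", "yes", "no",
          "yes", "no", "no", "no", "no", "yes", "yes")
golf <- data.frame(
  Outlook = c(rep("overcast", 4), rep("rainy", 5), rep("sunny", 5)),
  Windy   = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,
              FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  Others  = play,  # in this table, Others is identical to Play
  Play    = play
)

# Shannon entropy (bits) of the joint distribution of one or more vectors.
entropy <- function(...) {
  counts <- as.vector(table(...))
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# 2-way gain: I(X;C) = H(X) + H(C) - H(X,C)
info_gain <- function(x, cls) entropy(x) + entropy(cls) - entropy(x, cls)

# 3-way gain: I(X;Y;C) = I(X,Y;C) - I(X;C) - I(Y;C)
interaction_gain <- function(x, y, cls) {
  joint_gain <- entropy(x, y) + entropy(cls) - entropy(x, y, cls)
  joint_gain - info_gain(x, cls) - info_gain(y, cls)
}

ig_outlook <- info_gain(golf$Outlook, golf$Play)
ig_windy   <- info_gain(golf$Windy, golf$Play)
ig_others  <- info_gain(golf$Others, golf$Play)

int_outlook_windy  <- interaction_gain(golf$Outlook, golf$Windy, golf$Play)
int_outlook_others <- interaction_gain(golf$Outlook, golf$Others, golf$Play)

# All values are in bits, rounded to four decimals for display.
print(round(c(Outlook = ig_outlook, Windy = ig_windy, Others = ig_others), 4))
print(round(c(OutlookWindy = int_outlook_windy,
              OutlookOthers = int_outlook_others), 4))
```

The results are in bits. Multiplied by 100 they land close to the figures quoted for Figure 1 (small differences in the last digits are possible): a positive gain for Outlook and Windy, and a negative gain for Outlook and Others equal in size to the gain of Outlook alone.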

Step-by-step tutorial

Reading the data

First, the ‘integr’ package and a dataset need to be loaded. The dataset needs to be discrete and to have a class attribute. Here the ‘Golf’ toy-dataset will be used:

#load integr package (needs to be installed first!)
library("integr")

#read Golf toy-dataset
data("golf")

Generating the interaction graph object

When the data is loaded, an interaction graph object needs to be created. A data.frame containing the data needs to be provided, as well as the name of the class attribute as a string:

#create an Interaction graph object
g <- interactionGraph(golf, classAtt = "Play", intNo = 10, speedUp = FALSE)

The additional parameters intNo (integer) and speedUp (boolean) are optional. The first sets the desired number of interactions to be displayed on the interaction graph (2 <= intNo <= 20, default 16). The second controls whether attributes whose 2-way interaction gain equals zero (when rounded to four decimal places) are pruned before the interactions are computed. Pruning speeds up computation on larger datasets, but it can lead to less precise results, so it is turned off (i.e. set to FALSE) by default.
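Why pruning can cost precision is easiest to see on an XOR-style example, where neither attribute is informative alone but the pair determines the class. The sketch below is an illustration of the pruning rule as described above, written in base R with hypothetical helper names (`entropy`, `info_gain`) and an invented toy data frame; it is not the package's internal code.

```r
# Hypothetical helpers for illustration; not part of the integr API.
entropy <- function(...) {
  counts <- as.vector(table(...))
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}
info_gain <- function(x, cls) entropy(x) + entropy(cls) - entropy(x, cls)

# Invented XOR-style toy data: the class is "yes" exactly when A and B differ.
xor_data <- data.frame(A = c(0, 0, 1, 1), B = c(0, 1, 0, 1))
xor_data$Class <- ifelse(xor_data$A != xor_data$B, "yes", "no")

# 2-way gain of each attribute, compared with zero after rounding to 4 decimals.
gains  <- sapply(xor_data[c("A", "B")], info_gain, cls = xor_data$Class)
pruned <- names(gains)[round(gains, 4) == 0]

# 3-way interaction gain of the pair: I(A,B;Class) - I(A;Class) - I(B;Class)
joint_gain  <- entropy(xor_data$A, xor_data$B) + entropy(xor_data$Class) -
  entropy(xor_data$A, xor_data$B, xor_data$Class)
interaction <- joint_gain - sum(gains)

print(pruned)       # attributes a zero-gain pruning rule would drop
print(interaction)  # interaction gain (bits) that would then go unnoticed
```

In this toy case both attributes would be pruned even though together they fully determine the class, which is the kind of loss the default speedUp = FALSE avoids.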

If the intNo parameter is set to an inappropriate value (i.e. <2, >20, or larger than the theoretically possible number of interactions for the given dataset), it is automatically adjusted to fit and a warning message is printed.
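The adjustment can be pictured as a simple clamp. The sketch below is an assumption-laden illustration, not the package's code: `adjust_int_no` is a hypothetical helper, and it assumes that the "theoretically possible number of interactions" is the number of unordered pairs of non-class attributes, i.e. choose(n, 2).

```r
# Hypothetical helper, not part of integr: clamp intNo into the allowed range.
# Assumption: the maximum number of 3-way interactions with the class equals
# the number of unordered pairs of non-class attributes.
adjust_int_no <- function(intNo, nAttributes) {
  maxPairs <- choose(nAttributes, 2)
  adjusted <- min(max(intNo, 2), 20, maxPairs)
  if (adjusted != intNo) {
    warning(sprintf("intNo adjusted from %d to %d", intNo, adjusted),
            call. = FALSE)
  }
  adjusted
}

# Golf has 5 non-class attributes, so at most choose(5, 2) pairs are available.
ok      <- adjust_int_no(10, 5)                    # already within range
too_big <- suppressWarnings(adjust_int_no(50, 5))  # clamped down
too_low <- suppressWarnings(adjust_int_no(1, 5))   # raised to the minimum
print(c(ok = ok, too_big = too_big, too_low = too_low))
```

Under this assumption, intNo = 10 in the earlier call is exactly the number of attribute pairs available in the Golf data.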

Plotting the interaction graph object

After the interaction graph object has been obtained, it can be plotted using plotIntGraph():

#plot an Interaction graph object (in RStudio!)
plotIntGraph(g)

It only requires an interaction graph object as an input. Here the result of the previous step is used.

The result of this command is Figure 1.

Exporting the interaction graph object

The integr package allows interaction graphs to be exported to a file. The supported formats are: a Graphviz graph, SVG image, PNG image, PostScript (PS) file, or PDF. The code for exporting to each format is provided below.