# Load required packages only if installed
if (requireNamespace("rosetta", quietly = TRUE)) library(rosetta)
if (requireNamespace("dplyr", quietly = TRUE)) library(dplyr)
if (requireNamespace("psych", quietly = TRUE)) library(psych)
if (requireNamespace("psychTools", quietly = TRUE)) library(psychTools)
if (requireNamespace("knitr", quietly = TRUE)) library(knitr)
if (requireNamespace("kableExtra", quietly = TRUE)) library(kableExtra)
library(LikertMakeR)
LikertMakeR is a package for the R statistical programming language [@winzar2022]. It’s purpose is to synthesise and correlate Likert-scale and similar rating-scale data with predefined first & second moments (mean and standard deviation), Cronbach’s Alpha, Factor Loadings, and other summary statistics.
The makeCorrLoadings()
function generates a correlation
matrix from factor loadings and factor correlations as might be
published in the results of Exploratory Factor Analysis
(EFA) or a Structural Equation Model
(SEM). The resulting correlation matrix then can be
applied to the makeItems()
function to generate a synthetic
data set of rating-scale items that closely resemble the original data
that created the factor loadings summary table.
This paper tests how well the makeCorrLoadings()
function achieves that goal.
A valid makeCorrLoadings()
function should be able to
produce a correlation matrix that is identical to the original
correlation matrix. Further, subsequent treatment with
makeItems()
should produce a dataframe that appears to come
from the same population as the original sample.
So we need an original, True, dataframe to test the function. Preferably, we should have several different original dataframes to ensure that results are generalisable.
We use the pp15 dataset from the rosetta
package [@rosetta]. This is a subset of the Party Panel
2015 dataset. Party Panel is an annual semi-panel study among Dutch
nightlife patrons, where every year, the determinants of another
nightlife-related risk behavior is mapped. In 2015, determinants were
measured of behaviours related to using highly dosed ecstasy pills.
Nine items are relevant to this study. Each is a 7-point likert-style question scored in the range -3 to +3.
The questions and the item labels are presented as follows:
Questions | Labels |
---|---|
Expectation that a high dose results in a longer trip | long |
Expectation that a high dose results in a more intense trip | intensity |
Expectation that a high dose makes you more intoxicated | intoxicated |
Expectation that a high dose provides more energy | energy |
Expectation that a high dose produces more euphoria | euphoria |
Expectation that a high dose yields more insight | insight |
Expectation that a high dose strengthens your connection with others | connection |
Expectation that a high dose facilitates making contact with others | contact |
Expectation that a high dose improves sex | sex |
## variable names
item_list <- c(
"highDose_AttBeliefs_long",
"highDose_AttBeliefs_intensity",
"highDose_AttBeliefs_intoxicated",
"highDose_AttBeliefs_energy",
"highDose_AttBeliefs_euphoria",
"highDose_AttBeliefs_insight",
"highDose_AttBeliefs_connection",
"highDose_AttBeliefs_contact",
"highDose_AttBeliefs_sex"
)
## read the data/ select desired variables/ remove obs with missing values
dat <- read.csv2(file = "data/pp15.csv") |>
select(all_of(item_list)) |>
na.omit()
## give variables shorter names
names(dat) <- itemLabels
sampleSize <- nrow(dat)
The correlations among these nine items should be reproducable by the
makeCorrLoadings()
function.
## correlation matrix
pp15_cor <- cor(dat)
long | intensity | intoxicated | energy | euphoria | insight | connection | contact | sex | |
---|---|---|---|---|---|---|---|---|---|
long | 1.00 | 0.33 | 0.34 | 0.19 | 0.24 | 0.17 | 0.17 | 0.10 | 0.01 |
intensity | 0.33 | 1.00 | 0.69 | 0.12 | 0.26 | -0.01 | 0.14 | 0.12 | 0.00 |
intoxicated | 0.34 | 0.69 | 1.00 | 0.15 | 0.13 | -0.03 | -0.01 | -0.02 | -0.11 |
energy | 0.19 | 0.12 | 0.15 | 1.00 | 0.32 | 0.29 | 0.28 | 0.31 | 0.10 |
euphoria | 0.24 | 0.26 | 0.13 | 0.32 | 1.00 | 0.50 | 0.61 | 0.56 | 0.29 |
insight | 0.17 | -0.01 | -0.03 | 0.29 | 0.50 | 1.00 | 0.57 | 0.54 | 0.30 |
connection | 0.17 | 0.14 | -0.01 | 0.28 | 0.61 | 0.57 | 1.00 | 0.78 | 0.30 |
contact | 0.10 | 0.12 | -0.02 | 0.31 | 0.56 | 0.54 | 0.78 | 1.00 | 0.35 |
sex | 0.01 | 0.00 | -0.11 | 0.10 | 0.29 | 0.30 | 0.30 | 0.35 | 1.00 |
We shall produce the following options for factor correlation reporting:
Full information: factor loadings and factor correlations to five decimal places, plus uniquenesses.
Full information - No uniquenesses: factor loadings and factor correlations to 5 decimal places but without uniquenesses.
Rounded loadings: factor loadings and factor correlations to two decimal places, plus uniquenesses.
Rounded loadings - No uniquenesses: factor loadings and factor correlations to two decimal places but without uniquenesses.
Censored loadings: factor loadings to two decimal places, with loadings less than some arbitrary value removed for clarity of presentation, with uniquenesses.
Censored loadings - No uniqueness: factor loadings to two decimal places, with loadings less some arbitrary value removed for clarity, but without uniquenesses.
Censored loadings, no uniqueness, no factor correlations: factor loadings to two decimal places, with loadings less than some arbitrary value removed for clarity, no uniquenesses or factor correlations. Functionally, this is the equivalent of claiming orthogonal factors even with factor loadings from non-orthogonal rotation.
To compare the True correlation matrix with a
Synthetic matrix, we employ the cortest.jennrich()
function from the psych
package [@psych]. This is a Chi-square test of
whether a pair of matrices are equal [@Jennrich1970]. We report the raw \(\chi^2\) statistic and corresponding
p-value for each test.
Some pretesting suggest that two factors are appropriate for this sample. And we’re confident that the factors will be correlated, so we use promax rotation.
## factor analysis from `rosetta` package
rfaDose <- rosetta::factorAnalysis(
data = dat,
nfactors = 2,
rotate = "promax"
)
factorLoadings <- rfaDose$output$loadings
factorCorrs <- rfaDose$output$correlations
Factor 1 | Factor 2 | Uniqueness | |
---|---|---|---|
long | 0.11 | 0.41 | 0.80 |
intensity | -0.04 | 0.79 | 0.39 |
intoxicated | -0.21 | 0.91 | 0.22 |
energy | 0.34 | 0.15 | 0.83 |
euphoria | 0.68 | 0.17 | 0.44 |
insight | 0.70 | -0.07 | 0.53 |
connection | 0.86 | -0.02 | 0.26 |
contact | 0.85 | -0.05 | 0.30 |
sex | 0.42 | -0.13 | 0.83 |
Factor 1 | Factor 2 | |
---|---|---|
Factor 1 | 1.00 | 0.26 |
Factor 2 | 0.26 | 1.00 |
## round input values to 5 decimal places
# factor loadings
fl1 <- factorLoadings[, 1:2] |>
round(5) |>
as.matrix()
# item uniquenesses
un1 <- factorLoadings[, 3] |> round(5)
# factor correlations
fc1 <- round(factorCorrs, 5) |> as.matrix()
# run makeCorrLoadings() function
itemCors_1 <- makeCorrLoadings(
loadings = fl1,
factorCor = fc1,
uniquenesses = un1
)
## Compare the two matrices
chiSq_1 <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_1,
n1 = sampleSize, n2 = sampleSize
)
factor loadings and factor correlations to 5 decimal places but without uniquenesses
## round input values to 2 decimal places
# factor loadings
fl2 <- factorLoadings[, 1:2] |>
round(5) |>
as.matrix()
# factor correlations
fc2 <- factorCorrs |>
round(5) |>
as.matrix()
itemCors_2 <- makeCorrLoadings(
loadings = fl2,
factorCor = fc2,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_2 <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_2,
n1 = sampleSize, n2 = sampleSize
)
factor loadings and factor correlations to two decimal places.
## round input values to 2 decimal places
# factor loadings
fl3 <- factorLoadings[, 1:2] |>
round(2) |>
as.matrix()
# item uniquenesses
un3 <- factorLoadings[, 3] |>
round(2)
## factor correlations
fc3 <- factorCorrs |>
round(2) |>
as.matrix()
## Compare the two matrices
itemCors_3 <- makeCorrLoadings(
loadings = fl3,
factorCor = fc3,
uniquenesses = un3
)
## Compare the two matrices
chiSq_3 <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_3,
n1 = sampleSize, n2 = sampleSize
)
Factor loadings and factor correlations to two decimal places, and no uniquenesses
## round input values to 2 decimal places
# factor loadings
fl4 <- factorLoadings[, 1:2] |>
round(2) |>
as.matrix()
## factor correlations
fc4 <- factorCorrs |>
round(2) |>
as.matrix()
# apply the function
itemCors_4 <- makeCorrLoadings(
loadings = fl4,
factorCor = fc4,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_4 <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_4,
n1 = sampleSize, n2 = sampleSize
)
Often, item-factor loadings are presented with lower values removed
for ease of reading. I’m calling these “Censored” loadings. This is
usually acceptable, because the purpose is to show the reader where the
larger loadings are. But such missing information may affect the results
of reverse-engineering that we’re employing with the
makeCorrLoadings()
function.
In this study, we set the level of hidden loadings at values less than ‘0.1’, ‘0.2’ and ‘0.3’.
f1 | f2 | f1 | f2 | f1 | f2 | f1 | f2 | |
---|---|---|---|---|---|---|---|---|
long | 0.11 | 0.41 | 0.11 | 0.41 | 0.41 | 0.41 | ||
intensity | -0.04 | 0.79 | 0.79 | 0.79 | 0.79 | |||
intoxicated | -0.21 | 0.91 | -0.21 | 0.91 | -0.21 | 0.91 | 0.91 | |
energy | 0.34 | 0.15 | 0.34 | 0.15 | 0.34 | 0.34 | ||
euphoria | 0.68 | 0.17 | 0.68 | 0.17 | 0.68 | 0.68 | ||
insight | 0.70 | -0.07 | 0.7 | 0.7 | 0.7 | |||
connection | 0.86 | -0.02 | 0.86 | 0.86 | 0.86 | |||
contact | 0.85 | -0.05 | 0.85 | 0.85 | 0.85 | |||
sex | 0.42 | -0.13 | 0.42 | -0.13 | 0.42 | 0.42 |
Factor loadings less than ‘0.1’, ‘0.2’, ‘0.3’ are removed for clarity, presented to two decimal places.
## round input values to 2 decimal places
# factor loadings
fl5a <- factorLoadings[, 1:2] |>
round(2)
# convert factor loadings < '0.1' to '0'
fl5a[abs(fl5a) < 0.1] <- 0
fl5a <- as.matrix(fl5a)
# item uniquenesses
un5 <- factorLoadings[, 3] |>
round(2)
# factor correlations
fc5 <- factorCorrs |>
round(2) |>
as.matrix()
# apply the function
itemCors_5a <- makeCorrLoadings(
loadings = fl5a,
factorCor = fc5,
uniquenesses = un5
)
## Compare the two matrices
chiSq_5a <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_5a,
n1 = sampleSize, n2 = sampleSize
)
# factor loadings
fl5b <- factorLoadings[, 1:2] |>
round(2)
# convert factor loadings < '0.2' to '0'
fl5b[abs(fl5b) < 0.2] <- 0
fl5b <- as.matrix(fl5b)
# apply the function
itemCors_5b <- makeCorrLoadings(
loadings = fl5b,
factorCor = fc5,
uniquenesses = un5
)
## Compare the two matrices
chiSq_5b <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_5b,
n1 = sampleSize, n2 = sampleSize
)
# factor loadings
fl5c <- factorLoadings[, 1:2] |>
round(2)
# convert factor loadings < '0.2' to '0'
fl5c[abs(fl5c) < 0.3] <- 0
fl5c <- as.matrix(fl5c)
# apply the function
itemCors_5c <- makeCorrLoadings(
loadings = fl5c,
factorCor = fc5,
uniquenesses = un5
)
## Compare the two matrices
chiSq_5c <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_5c,
n1 = sampleSize, n2 = sampleSize
)
# kable(itemCors_5, digits = 2)
With no declared uniquenesses, they are inferred from estimated communalities.
itemCors_6a <- makeCorrLoadings(
loadings = fl5a,
factorCor = fc5,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_6a <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_6a,
n1 = sampleSize, n2 = sampleSize
)
# apply the function
itemCors_6b <- makeCorrLoadings(
loadings = fl5b,
factorCor = fc5,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_6b <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_6b,
n1 = sampleSize, n2 = sampleSize
)
# apply the function
itemCors_6c <- makeCorrLoadings(
loadings = fl5c,
factorCor = fc5,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_6c <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_6c,
n1 = sampleSize, n2 = sampleSize
)
No uniquenesses. That is, Uniquenesses are estimated from 1-communalities, where communalities = sum(factor-loadings^2). And no factor correlations. That is, we assume orthogonal factors.
itemCors_7a <- makeCorrLoadings(
loadings = fl5a,
factorCor = NULL,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_7a <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_7a,
n1 = sampleSize, n2 = sampleSize
)
# apply the function
itemCors_7b <- makeCorrLoadings(
loadings = fl5b,
factorCor = NULL,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_7b <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_7b,
n1 = sampleSize, n2 = sampleSize
)
# apply the function
itemCors_7c <- makeCorrLoadings(
loadings = fl5c,
factorCor = NULL,
uniquenesses = NULL
)
## Compare the two matrices
chiSq_7c <- cortest.jennrich(
R1 = pp15_cor, R2 = itemCors_7c,
n1 = sampleSize, n2 = sampleSize
)
Treatment | chi2 | p |
---|---|---|
Full information | 21.08 | 0.977 |
Full information - No uniqueness | 22.22 | 0.965 |
Rounded loadings | 21.07 | 0.978 |
Rounded loadings - No uniqueness | 22.20 | 0.965 |
Censored loadings <0.1 | 24.53 | 0.926 |
Censored loadings <0.2 | 44.08 | 0.167 |
Censored loadings <0.3 | 74.23 | 0.000 |
Censored loadings <0.1 - no uniqueness | 26.18 | 0.886 |
Censored loadings <0.2 - no uniqueness | 46.00 | 0.123 |
Censored loadings <0.3 - no uniqueness | 78.88 | 0.000 |
Censored loadings <0.1 - no uniqueness, factor cors | 30.51 | 0.727 |
Censored loadings <0.2 - no uniqueness, factor cors | 49.63 | 0.065 |
Censored loadings <0.3 - no uniqueness, factor cors | 65.82 | 0.002 |
The makeCorrLoadings
function works quite well when it
has information on factor loadings, but much less well when “summary”
(censored) factor loadings are given.
25 personality self report items taken from the International Personality Item Pool
(ipip.ori.org) were included as part of the Synthetic Aperture
Personality Assessment (SAPA) web based personality
assessment project. The data are available through the
psychTools
package [@psychTools].
2800 subjects are included as a demonstration set for scale construction, factor analysis, and Item Response Theory analysis. Three additional demographic variables (sex, education, and age) are also included.
For purposes of this study, we confine ourselves to women (sex==2) with postgraduate qualifications (education==5), giving us a sample size of 229 subjects.
The dataset contains the following 28 variables. (The q numbers are the SAPA item numbers).
The correlations among these 25 items should be reproducable by the
makeCorrLoadings()
function.
## download data
data(bfi)
## filter for highly-educated women
bfi_short <- bfi |>
filter(education == 5 & gender == 2) |>
na.omit()
## keep just the 25 items
bfi_short <- bfi_short[, 1:25]
sampleSize <- nrow(bfi_short)
## derive correlation matrix
bfi_cor <- cor(bfi_short)
As in the first study, we shall produce the following options for factor correlation reporting:
And, as in the first study, we compare the True correlation matrix with a Synthetic matrix, employing a Jennrich test.
Five correlated factors are appropriate for this sample, so we use promax rotation.
## factor analysis from `rosetta` package is a less messy version of the `psych::fa()` function
fa_bfi <- rosetta::factorAnalysis(
data = bfi_short,
nfactors = 5,
rotate = "promax"
)
bfiLoadings <- fa_bfi$output$loadings
bfiCorrs <- fa_bfi$output$correlations
Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Uniqueness | |
---|---|---|---|---|---|---|
A1 | 0.07 | 0.14 | 0.03 | -0.55 | -0.15 | 0.70 |
A2 | 0.06 | 0.13 | 0.06 | 0.55 | 0.13 | 0.59 |
A3 | 0.16 | 0.12 | 0.17 | 0.63 | 0.05 | 0.50 |
A4 | -0.11 | 0.06 | 0.07 | 0.37 | -0.08 | 0.81 |
A5 | -0.08 | 0.21 | -0.05 | 0.53 | 0.01 | 0.57 |
C1 | 0.03 | -0.06 | 0.64 | 0.01 | 0.12 | 0.55 |
C2 | 0.18 | 0.00 | 0.79 | 0.01 | -0.07 | 0.44 |
C3 | 0.08 | -0.11 | 0.65 | 0.10 | -0.07 | 0.63 |
C4 | 0.15 | 0.06 | -0.60 | -0.04 | -0.12 | 0.54 |
C5 | 0.27 | -0.04 | -0.53 | -0.02 | 0.08 | 0.59 |
E1 | 0.01 | -0.56 | 0.11 | -0.07 | 0.13 | 0.67 |
E2 | 0.18 | -0.75 | 0.09 | -0.11 | 0.14 | 0.35 |
E3 | 0.14 | 0.49 | -0.04 | 0.17 | 0.14 | 0.62 |
E4 | -0.05 | 0.54 | -0.09 | 0.38 | -0.18 | 0.44 |
E5 | 0.07 | 0.54 | 0.25 | -0.17 | 0.14 | 0.56 |
N1 | 0.73 | 0.16 | 0.11 | -0.14 | -0.15 | 0.47 |
N2 | 0.80 | 0.14 | 0.18 | -0.16 | -0.11 | 0.36 |
N3 | 0.78 | -0.01 | -0.07 | 0.03 | 0.10 | 0.34 |
N4 | 0.60 | -0.17 | -0.11 | -0.05 | 0.16 | 0.51 |
N5 | 0.58 | -0.17 | -0.04 | 0.27 | -0.16 | 0.58 |
O1 | 0.02 | 0.42 | -0.10 | -0.10 | 0.44 | 0.59 |
O2 | 0.21 | -0.02 | -0.10 | 0.02 | -0.57 | 0.60 |
O3 | 0.12 | 0.46 | 0.01 | 0.05 | 0.35 | 0.55 |
O4 | 0.07 | -0.13 | -0.01 | 0.11 | 0.48 | 0.76 |
O5 | 0.08 | 0.01 | 0.02 | -0.10 | -0.74 | 0.45 |
Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | |
---|---|---|---|---|---|
Factor 1 | 1.00 | -0.14 | -0.24 | -0.19 | 0.11 |
Factor 2 | -0.14 | 1.00 | 0.21 | 0.47 | 0.32 |
Factor 3 | -0.24 | 0.21 | 1.00 | 0.01 | 0.35 |
Factor 4 | -0.19 | 0.47 | 0.01 | 1.00 | 0.12 |
Factor 5 | 0.11 | 0.32 | 0.35 | 0.12 | 1.00 |
## round input values to 5 decimal places
# factor loadings
fl1 <- bfiLoadings[, 1:5] |>
round(5) |>
as.matrix()
# item uniquenesses
un1 <- bfiLoadings[, 6] |> round(5)
# factor correlations
fc1 <- round(bfiCorrs, 5) |> as.matrix()
# run makeCorrLoadings() function
itemCors_1 <- makeCorrLoadings(
loadings = fl1,
factorCor = fc1,
uniquenesses = un1
)
## Compare the two matrices
chiSq_1 <- cortest.jennrich(
R1 = bfi_cor, R2 = itemCors_1,
n1 = sampleSize, n2 = sampleSize
)
The remaining test cases are as for the first study and this last test case. Changes are made to the number of decimal places considered, the presence of uniquenesses, presence of factor correlation matrix, and level of item-factor loading that is included.
Treatment | chi2 | p |
---|---|---|
Full information | 178.4 | 1.000 |
Full information - No uniquenesses | 183.7 | 1.000 |
Rounded loadings | 178.8 | 1.000 |
Rounded loadings - No uniquenesses | 183.7 | 1.000 |
Censored loadings <0.1 | 223.8 | 1.000 |
Censored loadings <0.2 | 454.5 | 0.000 |
Censored loadings <0.3 | 490.1 | 0.000 |
Censored loadings <0.1 - no uniqueness | 232.6 | 0.998 |
Censored loadings <0.2 - no uniqueness | 447.9 | 0.000 |
Censored loadings <0.3 - no uniqueness | 478.1 | 0.000 |
Censored loadings <0.1 - no uniqueness, no factor cors | 241.2 | 0.995 |
Censored loadings <0.2 - no uniqueness, no factor cors | 440.1 | 0.000 |
Censored loadings <0.3 - no uniqueness, no factor cors | 484.4 | 0.000 |
Both studies show consistent results.
Party panel
|
Big 5 (bfi)
|
|||
---|---|---|---|---|
treatment | chi2 | p | chi2 | p |
Full information | 21.1 | 0.977 | 178.4 | 1.000 |
Full information - No uniquenesses | 22.2 | 0.965 | 183.7 | 1.000 |
Rounded loadings | 21.1 | 0.978 | 178.8 | 1.000 |
Rounded loadings - No uniquenesses | 22.2 | 0.965 | 183.7 | 1.000 |
Censored loadings <0.1 | 24.5 | 0.926 | 223.8 | 1.000 |
Censored loadings <0.2 | 44.1 | 0.167 | 454.5 | 0.000 |
Censored loadings <0.3 | 74.2 | 0.000 | 490.1 | 0.000 |
Censored loadings <0.1 - no uniqueness | 26.2 | 0.886 | 232.6 | 0.998 |
Censored loadings <0.2 - no uniqueness | 46.0 | 0.123 | 447.9 | 0.000 |
Censored loadings <0.3 - no uniqueness | 78.9 | 0.000 | 478.1 | 0.000 |
Censored loadings <0.1 - no uniqueness, no factor cors | 30.5 | 0.727 | 241.2 | 0.995 |
Censored loadings <0.2 - no uniqueness, no factor cors | 49.6 | 0.065 | 440.1 | 0.000 |
Censored loadings <0.3 - no uniqueness, no factor cors | 65.8 | 0.002 | 484.4 | 0.000 |
The makeCorrLoadings()
function is designed to produce
an item correlation matrix based on item-factor loadings and factor
correlations as one might see in the results of Exploratory Factor
Analysis (EFA) or Structural Equation Modelling (SEM).
Results of this study suggest that the
makeCorrLoadings()
function does a surprisingly good job of
reproducing the target correlation matrix when all item-factor loadings
are present.
A correlation matrix created with makeCorrLoadings()
seems to be robust even with the absence of specified
uniquenesses, or even without factor correlations. But
a valid reproduction of a correlation matrix should have complete
item-factor loadings.