SVD wrapper, PCA and the bi_projector

1. Why wrap SVD at all?

There are six popular SVD engines in R (base::svd, corpcor, RSpectra, irlba, rsvd, svd (PROPACK)) – each with its own argument list, naming conventions and edge-cases (some refuse to return the full rank, others crash on tall-skinny matrices).

svd_wrapper() smooths that out:

That means immediate access to verbs such as project(), reconstruct(), truncate(), partial_project().

set.seed(1)
X <- matrix(rnorm(35*10), 35, 10)   # 35 obs × 10 vars

sv_fast <- svd_wrapper(X, ncomp = 5, preproc = center(), method = "fast")

# irlba backend (if installed) gives identical results
sv_irlba <- if (requireNamespace("irlba", quietly = TRUE)) {
  suppressWarnings(svd_wrapper(X, ncomp = 5, preproc = center(), method = "irlba"))
}

# Same downstream code works for both objects:
head(scores(sv_fast)) # 35 × 5
#>            [,1]       [,2]       [,3]        [,4]        [,5]
#> [1,] -2.9415181 -1.6140167  0.2117456  0.12109736 -0.46419317
#> [2,]  0.4743086  0.3458298 -0.8467096 -1.21167498  0.02074819
#> [3,] -1.6999172 -1.1535717 -1.0276227 -0.33535843  0.37155930
#> [4,]  0.1131790  0.7789166 -0.7394153  0.43625966  2.24260205
#> [5,]  0.8437314 -1.7600608 -0.8939140  0.77861595  0.81936957
#> [6,]  0.6063990 -1.8810077  1.2246519  0.03652504 -1.40433408

if (!is.null(sv_irlba)) {
  all.equal(scores(sv_fast), scores(sv_irlba))
}
#> [1] "Mean relative difference: 0.7832173"

2. A one-liner pca()

Most people really want PCA, so pca() is a thin wrapper that

  1. calls svd_wrapper() with sane defaults,
  2. adds the S3 class “pca” (printing, screeplot, biplot, permutation test, …).
data(iris)
X_iris <- as.matrix(iris[, 1:4])

pca_fit <- pca(X_iris, ncomp = 4)    # defaults to method = "fast", preproc=center()
print(pca_fit)
#> PCA object  -- derived from SVD
#> 
#> Data: 150 observations x 4 variables
#> Components retained: 4
#> 
#> Variance explained (per component):
#>  1 2 3 4  92.46  5.31  1.71  0.52%  (cumulative:  92.46 97.77 99.48   100%)

2.1 Scree-plot and cumulative variance

screeplot(pca_fit, type = "lines", main = "Iris PCA – scree plot")

2.2 Quick biplot

# Requires ggrepel for repulsion, but works without it
biplot(pca_fit, repel_points = TRUE, repel_vars = TRUE)

(If you do not have ggrepel installed the text is placed without repulsion.)

3. What is a bi_projector?

Think bidirectional mapping:

data space  (p variables)  ↔  component space  (d ≤ p)
        new samples:  project()        ← scores
       new variables: project_vars()   ← loadings
                     reconstruction ↔  (scores %*% t(loadings))

A bi_projector therefore carries

slot shape description
v p × d component loadings (columns)
s n × d score matrix (rows = observations)
sdev d singular values (or SDs related to components)
preproc fitted transformer so you never leak training stats

Because pca() returns a bi_projector, you get other methods for free:

# rank-2 reconstruction of the iris data
Xhat2 <- reconstruct(pca_fit, comp = 1:2)
print(paste("MSE (rank 2):", round(mean((X_iris - Xhat2)^2), 4))) # MSE ~ 0.076
#> [1] "MSE (rank 2): 0.0253"

# drop to 2 PCs everywhere
pca2 <- truncate(pca_fit, 2)
shape(pca2)            # 4 vars × 2 comps
#> [1] 4 2

4. Fast code-coverage cameo

The next chunk quietly touches a few more branches used in the unit tests (std_scores(), perm_test(), rotate()), but keeps printing to a minimum:

# std scores
head(std_scores(svd_wrapper(X, ncomp = 3))) # Use the earlier X data
#>            [,1]       [,2]         [,3]
#> [1,] -2.1517996 -1.2898029 -0.009068656
#> [2,]  0.2706758  0.3540074 -0.658705409
#> [3,] -1.3315759 -0.7788579 -1.206524820
#> [4,] -0.0595748  0.7971995 -0.971493443
#> [5,]  0.5035052 -1.2105838 -1.291893170
#> [6,]  0.4441909 -1.5304768  0.866038002

# tiny permutation test (10 perms; obviously too few for science)
# This requires perm_test.pca method
# Make sure X_iris is centered if perm_test needs centered data
perm_res <- perm_test(pca_fit, X_iris, nperm = 10, comps = 2)
#> Pre-calculating reconstructions for stepwise testing...
#> Running 10 permutations sequentially for up to 2 PCA components (alpha=0.050, serial)...
#>   Testing Component 1/2...
#>   Component 1 p-value (0.09091) > alpha (0.050). Stopping sequential testing.
print(perm_res$component_results)
#> # A tibble: 1 × 5
#>    comp observed   pval lower_ci upper_ci
#>   <int>    <dbl>  <dbl>    <dbl>    <dbl>
#> 1     1    0.925 0.0909    0.682    0.689

# quick varimax rotation
if (requireNamespace("GPArotation", quietly = TRUE)) {
  pca_rotated <- rotate(pca_fit, ncomp = 3, type = "varimax")
  print(pca_rotated)
} else {
  cat("GPArotation not installed, skipping rotation example.\n")
}
#> PCA object  -- derived from SVD
#> 
#> Data: 150 observations x 4 variables
#> Components retained: 4
#> 
#> Variance explained (per component):
#>  1 2 3 4  46.56  3.92  5.68 43.84%  (cumulative:  46.56 50.48 56.16   100%)
#> 
#> Explained variance from rotation:
#> 82.9 %
#>  6.98 %
#>  10.12 %
#>  
#> 
#> Rotation details:
#>   Type: varimax 
#>   Loadings type: N/A (orthogonal)

(Running these once in the vignette means they are also executed by R CMD check, bumping test-coverage without extra scaffolding.)

5. Take-aways


Session info

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.3
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
#> 
#> time zone: America/Toronto
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] glmnet_4.1-10      Matrix_1.7-3       knitr_1.51         tibble_3.3.1      
#> [5] dplyr_1.1.4        ggplot2_4.0.1      multivarious_0.3.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] GPArotation_2025.3-1 utf8_1.2.6           sass_0.4.10         
#>  [4] future_1.68.0        generics_0.1.4       shape_1.4.6.1       
#>  [7] lattice_0.22-7       listenv_0.10.0       digest_0.6.39       
#> [10] magrittr_2.0.4       evaluate_1.0.5       grid_4.5.1          
#> [13] RColorBrewer_1.1-3   iterators_1.0.14     fastmap_1.2.0       
#> [16] foreach_1.5.2        jsonlite_2.0.0       ggrepel_0.9.6       
#> [19] RSpectra_0.16-2      survival_3.8-3       mgcv_1.9-3          
#> [22] scales_1.4.0         pls_2.8-5            codetools_0.2-20    
#> [25] jquerylib_0.1.4      cli_3.6.5            crayon_1.5.3        
#> [28] rlang_1.1.7          chk_0.10.0           parallelly_1.45.1   
#> [31] future.apply_1.20.0  splines_4.5.1        withr_3.0.2         
#> [34] cachem_1.1.0         yaml_2.3.12          otel_0.2.0          
#> [37] tools_4.5.1          parallel_4.5.1       corpcor_1.6.10      
#> [40] globals_0.18.0       rsvd_1.0.5           assertthat_0.2.1    
#> [43] vctrs_0.7.0          R6_2.6.1             matrixStats_1.5.0   
#> [46] proxy_0.4-27         lifecycle_1.0.5      MASS_7.3-65         
#> [49] irlba_2.3.5.1        pkgconfig_2.0.3      pillar_1.11.1       
#> [52] bslib_0.9.0          geigen_2.3           gtable_0.3.6        
#> [55] glue_1.8.0           Rcpp_1.1.1           xfun_0.55           
#> [58] tidyselect_1.2.1     svd_0.5.8            farver_2.1.2        
#> [61] nlme_3.1-168         htmltools_0.5.9      labeling_0.4.3      
#> [64] rmarkdown_2.30       compiler_4.5.1       S7_0.2.1

mirror server hosted at Truenetwork, Russian Federation.