Choosing an Initialization Method for Archetypal Analysis

Quick Start

For a first Euclidean AA fit, start with "furthest_sum" or "kmeans_pp". "furthest_sum" is a strong boundary-seeking default for convex data, while "kmeans_pp" is a softer alternative when you want some randomness without falling back to uniform sampling. If the geometry is unknown and you can afford several starts, use "random" with a larger nrep. If you only have a few starts and runtime is less important, use "aa_pp" because each added point is chosen against the current AA reconstruction error.

library(yaap)

toy <- read.csv(system.file("extdata", "toy.csv", package = "yaap"))

fit <- run_aa(toy, K = 3, nrep = 5,  init = "furthest_sum")
fit <- run_aa(toy, K = 3, nrep = 5,  init = "kmeans_pp")
fit <- run_aa(toy, K = 3, nrep = 20, init = "random")
fit <- run_aa(toy, K = 3, nrep = 3,  init = "aa_pp")

Background

Archetypal analysis (AA) minimizes a reconstruction loss, written in the standard Euclidean formulation as

\[\mathcal{L}(X, SA) = \|X - S A\|_F^2.\]

As with many matrix decomposition problems, the objective is non-convex when optimized jointly over $S$ and $A$ (it is convex in each factor with the other held fixed), so alternating or gradient-based solvers can converge to a local minimum. As a result, the quality of the final solution depends substantially on the starting point.
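To make the notation concrete, the sketch below (plain base R, independent of yaap's internals) builds archetypes as $A = BX$ from a row-stochastic $B$, reconstructs the data as $SA$ with a row-stochastic $S$, and evaluates the loss. The random matrices are arbitrary illustrative values.

```r
set.seed(1)
N <- 6; M <- 2; K <- 2
X <- matrix(rnorm(N * M), nrow = N, ncol = M)

# Rescale nonnegative entries so every row sums to one (row-stochastic).
row_stochastic <- function(W) W / rowSums(W)

B <- row_stochastic(matrix(runif(K * N), nrow = K))  # K x N: archetypes from data
S <- row_stochastic(matrix(runif(N * K), nrow = N))  # N x K: data from archetypes

A <- B %*% X                  # K x M archetype coordinates
R <- X - S %*% A              # N x M residuals
loss <- sum(R^2)              # ||X - S A||_F^2

# Base R's Frobenius norm gives the same value.
loss_norm <- norm(R, type = "F")^2
```

Fixing $S$ and minimizing over $B$ (or the reverse) gives a convex problem; the non-convexity only appears when both factors move together.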

yaap implements several initialization strategies for the archetype coordinates that have been proposed in the literature. They are available as a standalone step through the aa_init() function and inside a fit through the init argument of run_aa().

Methods at a Glance

The table below summarizes the seven methods by their strategy, whether they involve random sampling, their relative speed, how strongly they depend on the observed sampling density, how strongly they impose geometric assumptions, and any mandatory extra arguments.

Method	Strategy	Stochastic?	Speed	Density sensitivity	Geometry sensitivity	Mandatory extras
`"random"`	Sample $K$ rows uniformly at random	Yes	Fast	High	Low	—
`"dirichlet"`	Construct each archetype as a random convex combination of rows	Yes	Fast	High	Low	—
`"kmeans_pp"`	Sample proportional to squared distance to the nearest current archetype	Yes	Medium	Medium	Medium	—
`"aa_pp"`	Sample using residual reconstruction error from the current hull	Yes	Slow	Medium	Medium	—
`"furthest_first"`	Select the point whose distance to its nearest current archetype is largest	Seed only	Medium	Low-medium	High	—
`"furthest_sum"`	Select the point that maximizes the sum of distances to all current archetypes	Seed only	Medium	Low	High	—
`"hull_outmost"`	Select frequently outlying convex-hull candidates	Partitioned only	Medium	Low	High	`hull_method`

Density and geometry sensitivity trade off against each other. Uniform methods such as random and diffuse dirichlet starts are strongly affected by where the data are dense: a sparsely sampled corner may need many restarts before it is selected. Their advantage is that independent restarts genuinely explore different parts of the empirical distribution, which can help avoid repeatedly converging to the same local minimum when many fits are affordable.

More biased boundary-seeking methods such as furthest_sum are less affected by interior density because, once a corner is represented by at least one observed point, the greedy distance criterion can still discover it. The cost of that bias is geometric sensitivity and lower restart diversity: repeated starts often return the same, or nearly the same, boundary configuration. These methods therefore make sense when the assumed geometry is credible or when running many full AA optimizations is too expensive.

Large-data approximations are controlled independently of the method choice with batch_size, batch_type, and batch_replace. By default, batch_type = "distal" samples candidate rows with probability proportional to their squared distance from the data center, preserving the coreset idea of focusing computation on likely boundary points (Mair and Brefeld 2019). Use batch_type = "uniform" when you want an unbiased mini-batch approximation, for example with method = "aa_pp".

The `aa_init()` Function

aa_init() runs any initialization method as a standalone step, returning a named list with two elements:

A: $K \times M$ matrix of initial archetype coordinates, where $M$ is the number of columns of X.
B: $K \times N$ row-stochastic matrix expressing each archetype as a convex combination of the $N$ data points ($A = B X$).

toy <- read.csv(system.file("extdata", "toy.csv", package = "yaap"))
X <- as.matrix(toy) # 250 × 2
K <- 3

init <- aa_init(X, K = K, method = "furthest_sum")
str(init)
#> List of 2
#>  $ A: num [1:3, 1:2] 2.86 16.08 19.89 9.87 2.51 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "x" "y"
#>  $ B: num [1:3, 1:250] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : NULL

The A matrix holds the initial archetype coordinates that seed the optimization loop; B is the convex-hull certificate for those coordinates. If run_aa() receives a method name or a custom function as init, the B produced by aa_init() (or by your function) is carried into the solver's initialization. If you instead pass a pre-computed coordinate matrix such as init$A, run_aa() checks that the coordinates still admit a feasible B. Rows outside the allowed data hull are projected back into the hull with a warning.
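The three conditions that make B a certificate (non-negative entries, rows summing to one, and A = B X) are easy to check directly. The helper is_hull_certificate() below is hypothetical, not a yaap function, and only sketches what such a check involves; the tolerance is an arbitrary choice.

```r
# Hypothetical helper (not part of yaap): TRUE when B certifies that the rows
# of A are convex combinations of the rows of X.
is_hull_certificate <- function(A, B, X, tol = 1e-8) {
    A <- as.matrix(A); B <- as.matrix(B); X <- as.matrix(X)
    if (nrow(B) != nrow(A) || ncol(B) != nrow(X) || ncol(A) != ncol(X)) {
        return(FALSE)
    }
    nonneg    <- all(B >= -tol)                      # weights are non-negative
    convex    <- all(abs(rowSums(B) - 1) <= tol)     # each row sums to one
    reproduce <- max(abs(A - B %*% X)) <= tol        # A is recovered as B X
    nonneg && convex && reproduce
}

X <- matrix(c(0, 4, 0, 2,
              0, 0, 3, 1), ncol = 2)
B <- rbind(c(1, 0, 0, 0),        # archetype 1 = row 1
           c(0, 0.5, 0.5, 0))    # archetype 2 = midpoint of rows 2 and 3
A <- B %*% X
ok  <- is_hull_certificate(A, B, X)
bad <- is_hull_certificate(A + 10, B, X)   # shifted coordinates no longer match B
```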

Supplying Your Own Initialization

The built-in method names cover the common cases, but run_aa() also accepts either a precomputed matrix of archetype coordinates or a custom initialization function.

If the starting coordinates are already available, pass them directly as a matrix in the same units as X. run_aa() preprocesses the matrix the same way it preprocesses X, runs the required checks, and constructs B.

X <- as.matrix(toy)
K <- 3
tol <- 0.01
eps <- 0

A0 <- X[1:K, , drop = FALSE]

fit_matrix <- run_aa(X, K = K, init = A0, scale = FALSE, tol = tol, eps = eps)

A custom function is useful when initialization itself needs code. The function must take X and K, and return the same two objects as aa_init(): the archetype coordinates A and the data weights B. Additional arguments can be passed through init_args. Below is a deliberately simple initializer that chooses the first K rows of the data.

X <- as.matrix(toy)
K <- 3
tol <- 0.01

first_k_init <- function(X, K, ...) {
    A <- X[1:K, , drop = FALSE]
    B <- diag(1, nrow = K, ncol = nrow(X))
    list(A = A, B = B)
}

fit_custom <- run_aa(X, K = K, init = first_k_init, scale = FALSE, tol = tol)
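When an initializer naturally returns row indices, building B is mechanical: one row per archetype with a single 1 in the selected column. The sketch below uses a hypothetical helper rows_to_init() and an illustrative initializer spread_init(); neither is part of yaap, and the returned list only mirrors the A/B contract described above.

```r
# Hypothetical helper (not part of yaap): turn row indices into list(A, B).
rows_to_init <- function(X, idx) {
    X <- as.matrix(X)
    B <- matrix(0, nrow = length(idx), ncol = nrow(X))
    B[cbind(seq_along(idx), idx)] <- 1   # row k of B selects data row idx[k]
    list(A = B %*% X, B = B)
}

# Illustrative initializer: K rows spread evenly along the first column's order.
spread_init <- function(X, K, ...) {
    ord <- order(X[, 1])
    idx <- ord[round(seq(1, nrow(X), length.out = K))]
    rows_to_init(X, idx)
}

X <- cbind(x = c(5, 1, 9, 3, 7), y = c(0, 2, 1, 4, 3))
init <- spread_init(X, K = 3)
```

A function with this signature could then be passed as init in the same way as first_k_init.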

If your custom initializer naturally produces only archetype coordinates, fit_simplex() exposes the same simplex mapping that run_aa() uses internally to construct a feasible B. Reconstructing A_init as B %*% X guarantees that the coordinates passed to AA lie inside the data hull:

X <- as.matrix(toy)
K <- 3

# some_initializer() is a placeholder for your own function returning a K x M matrix
A_user <- some_initializer(X, K = K)
B <- fit_simplex(A_user, X)
A_init <- B %*% X
init <- list(A = A_init, B = B)

These two forms are useful when the starting point comes from a previous analysis, a domain-specific rule, or an external clustering workflow. For most workflows, passing a method name remains the shortest form.

X <- as.matrix(toy)
K <- 3
tol <- 0.01

fit <- run_aa(X, K = K, init = "furthest_sum", tol = tol)
fit <- run_aa(X, K = K, init = "aa_pp", tol = tol)
fit <- run_aa(X,
    K = K, init = "aa_pp",
    init_args = list(batch_size = 200, batch_type = "uniform"),
    tol = tol
)

Comparing Initializer Biases

X <- as.matrix(toy)
K <- 3

method_labels <- c(
    random         = "Random",
    kmeans_pp      = "k-means++",
    aa_pp          = "AA++",
    furthest_first = "Furthest First",
    furthest_sum   = "Furthest Sum",
    hull_outmost   = "Hull-outmost"
)
N_expected <- 10  # Runs per data point; uniform sampling selects each point about K * N_expected times
N_runs <- nrow(X) * N_expected
# "significance" cutoff for coloring points red/blue
color_cutoff <- -log10(1 / nrow(X))

selected_counts <- list()
for (method in names(method_labels)) {
    counts <- rep(0, nrow(X))
    for (run in 1:N_runs) {
        if (method == "hull_outmost") {
            init_obj <- aa_init(X, K = K, method = method,
                                hull_method = "partitioned")
        } else {
            init_obj <- aa_init(X, K = K, method = method)
        }
        selected <- which(init_obj$B > 0, arr.ind = TRUE)[, 2]
        counts[selected] <- counts[selected] + 1
    }
    selected_counts[[method]] <- counts
}

The first practical difference between initializers is not whether they are “good” or “bad”; it is where they tend to look. The toy data below are simple: 250 two-dimensional points spread near the vertices of a triangle. The figure below repeats each row-selecting initializer many times and colors each point by how often it was selected compared with uniform random sampling. Red points are selected more often than random, blue points are selected less often, and white points are close to the random baseline.
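The comparison against uniform sampling can be made explicit: drawing K distinct rows uniformly gives each of the N rows an expected count of n_runs × K / N. The helper below computes one possible enrichment score, a log2 ratio of observed to expected counts with a pseudocount of one. It is a hypothetical sketch; the exact transformation behind the figure is not shown here.

```r
# Hypothetical helper: log2 ratio of observed selection counts to the count
# expected under uniform sampling of K distinct rows per run.
selection_enrichment <- function(counts, K, n_runs) {
    expected <- n_runs * K / length(counts)
    log2((counts + 1) / (expected + 1))   # +1 avoids log(0) for unselected rows
}

counts <- c(0, 30, 30, 90)                # hypothetical counts for 4 rows
enr <- selection_enrichment(counts, K = 1, n_runs = 150)
```

Negative values correspond to blue (selected less often than uniform), positive values to red.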

As expected, random selects archetypes uniformly from the observed rows, so selection frequencies stay close to the uniform baseline. This is probably not the best strategy for a convex geometry like the toy data, but it is unbiased and thus robust to unknown data structure. With enough restarts this strategy will eventually visit every region, but any single run may miss the corners.

At the other extreme, furthest_first, furthest_sum, and hull_outmost explore the boundary of the data far more aggressively. Most of them converge to the same set of candidate points and never pick a point in the middle of the cloud. furthest_first is the only one that shows some variation in the selected points, but it still has a strong corner preference. For a simple convex shape that bias is often useful: the corners are exactly where good archetypes should start.

Clustering-inspired methods such as kmeans_pp and aa_pp sit between these two extremes. They adapt seeding ideas developed for clustering algorithms to the AA setting. Because they draw candidates with distance- or residual-based probabilities instead of taking a greedy maximum, they behave as softer versions of their boundary-seeking counterparts. They also rarely select points in the center of the cloud, but they explore boundary regions more diffusely and do not explicitly maximize extremeness.

More details about the methodology employed by each method can be found in the Method Reference section at the end of this vignette. The rest of the vignette narrows to three representatives: random as an exploratory baseline, kmeans_pp as the softer outward-biased initializer, and furthest_sum for the strong greedy boundary bias.

Increasing the Number of Archetypes

With $K = 3$, the toy data roughly match the three-corner shape. When $K$ increases, the question changes from “did we find the corners?” to “how do the extra archetypes cover the cloud?” The following chunk is the user-facing call: only the method and K values change.

X <- as.matrix(toy)
k_grid <- c(3, 7, 15)
k_methods <- c("random", "kmeans_pp", "furthest_sum")

k_inits <- list()
for (method in k_methods) {
    method_inits <- list()
    for (K in k_grid) {
        k_lab <- paste("K =", K)
        method_inits[[k_lab]] <- aa_init(X, K = K, method = method)
    }
    k_inits[[method]] <- method_inits
}

As $K$ grows, the less aggressive methods start to waste extra archetypes in the interior of the convex hull. This is visible for random and, eventually, also for kmeans_pp: once the main corners have been sampled, additional points are often placed in interior regions, where the optimal archetypes do not lie. Those starts can still optimize successfully, but the initialization budget is no longer being used to approximate the boundary.

furthest_sum behaves differently. It keeps adding points on the outside of the cloud and, as $K$ increases, the selected polygon becomes a finer approximation to the true convex hull boundary. In fact, points generated by furthest_sum have a useful geometric property: they lie in the minimal convex set of the unselected data points [Theorem 2 in (Mørup and Hansen 2012)].

When Initialization Matters

The toy triangle is intentionally simple. If we start from one initialization and then let run_aa() optimize, most methods reach essentially the same final fit. The red dashed triangle below is the starting point; the blue solid triangle is the fitted solution.

X <- as.matrix(toy)
K <- 3
focused_methods <- c("random", "kmeans_pp", "furthest_sum")
fits <- list()
for (method in focused_methods) {
    fits[[method]] <- run_aa(
        x        = X,
        K        = K,
        init     = method,
        scale    = FALSE,
        max_iter = 60,
        tol      = 0.01
    )
}

This is the reassuring case: the data geometry is simple enough that a mediocre start can still be corrected by the optimizer. Initialization matters more when the geometry is less well described by straight-line Euclidean distances.

Nonlinear Geometries

The previous examples still live in a simple Euclidean geometry: distances in the plotted coordinates describe the structure we care about. Some datasets are not like that. In concentric circles, the inner circle is important even though the outer circle contains the most distant points. In a Swiss roll, points that look close in a two-dimensional projection may be far apart along the rolled surface.

K <- 8
nonlinear_data <- list(
    circles    = make_concentric_circles(),
    swiss_roll = make_swiss_roll()
)
nonlinear_methods <- c("random", "kmeans_pp", "furthest_sum")

nonlinear_inits <- list()
for (data_shape in names(nonlinear_data)) {
    data_obj <- nonlinear_data[[data_shape]]
    out <- list()
    for (method in nonlinear_methods) {
        out[[method]] <- aa_init(data_obj$X, K = K, method = method)
    }
    nonlinear_inits[[data_shape]] <- out
}

First inspect the Euclidean initializations. The biased methods are doing what they were designed to do: they seek far-away points. On these shapes, that can over-emphasize the outside and leave the center or inner structure under-represented. random, on the other hand, follows the observed sampling density rather than the outer geometry. On these uniformly sampled shapes, that keeps it from concentrating only on the outside, but it also means performance would change if important regions were sampled sparsely.

For these cases, the better answer is often to change the space in which AA is fit. A kernel method lets nearby points influence each other according to a smooth similarity rule rather than only through straight-line distance in the input coordinates.

X_circles <- nonlinear_data$circles$X  # concentric circles from the previous chunk
fit_kernel <- archetypes_kernel_pgd(
    x      = X_circles,
    K      = K,
    kernel = "laplace",
    init   = "furthest_sum",
    tol    = 0.01
)

Most of the same initialization names still work; the exceptions are listed under Initialization Caveats. The important difference is that the kernel version measures spread in the similarity space used by the model. This can rescue boundary-seeking methods on shapes where the raw coordinates make the outside look more important than the rest of the structure.

Conclusions and Recommendations

There is no universal best initializer. The right choice depends on how much you know about the geometry, whether useful regions are sampled evenly, and how many full optimization restarts you can afford.

Situation	Recommended `init`
Simple convex-ish data	`"furthest_sum"`, `"kmeans_pp"`
Uneven sampling density, corners represented	`"furthest_sum"`, `"furthest_first"`, `"hull_outmost"`
Unknown structure, enough restarts	`"random"` or `"kmeans_pp"` with a larger `nrep`
Unknown structure, few restarts	`"aa_pp"`
Very large dataset, speed critical	`"furthest_sum"` with `batch_size`, or `"hull_outmost"` with `hull_method = "partitioned"`
Reproducing legacy results	`"furthest_sum"` (PCHA), `"random"` (archetypes package), `"dirichlet"` (directional solver)

Choose along two main axes: how much you trust the geometry, and how much restart diversity you can afford. Strongly biased initializers such as furthest_sum and hull_outmost are useful when the convex geometry is trusted or full optimization runs are expensive, but repeated starts often return the same boundary configuration. random is the opposite extreme: it makes the fewest geometric assumptions and explores different basins across restarts, but it follows the empirical sampling density, so sparse corners may require a larger nrep.

kmeans_pp and aa_pp are the practical middle ground. They keep stochastic variation across restarts while biasing the draw toward points that improve coverage or reconstruction. Use them when you want a better first guess than random without committing as strongly to one boundary-seeking geometry. If an extreme corner is absent from the sample entirely, no row-selecting initializer can recover it without changing the model, the data collection, or the candidate set.

If the geometry is visibly nonlinear, changing the model is usually more important than fine-tuning the Euclidean initializer. Use archetypes_kernel_pgd() with a suitable kernel, then choose one of the kernel-compatible initializers such as "kmeans_pp" or "furthest_sum".

For memory efficiency and speed, any initializer can be restricted to a candidate batch with batch_size. The default batch_type = "distal" samples candidate rows in proportion to their squared distance from the data center, which is the coreset idea of Mair and Brefeld (2019): if archetypes are expected near the boundary, it is wasteful to spend most candidate memory on central points. Use batch_type = "uniform" when the goal is an unbiased mini-batch approximation, especially for method = "aa_pp".

Initialization Caveats

The default scale = FALSE in run_aa() preserves the data on its supplied feature scale. Setting scale = TRUE applies column-wise standardization (z-scoring) before fitting; this often improves the conditioning of Gaussian Euclidean fits, but it interacts differently with each initialization strategy.

Mair and Sjölund (2023) show that FurthestSum is sensitive to preprocessing: standardization can distort the Euclidean distances it uses to choose boundary points. AA++ is less affected because it samples from residuals tied directly to the AA objective (Mair and Sjölund 2023).

The cost of this objective-aware sampling is runtime. Among the built-in initializers, "aa_pp" is usually the slowest because, after the first two points, each step projects every candidate row onto the convex hull of the current archetypes, which amounts to a small greedy AA fit. In return, adding an archetype can never increase the reconstruction cost, because the current hull only grows, and the cost strictly decreases whenever the new point lies outside the current hull. The AA++ rule ensures this, since rows already inside the hull have zero residual and therefore zero sampling probability.

Initializer names are not supported identically by every model family. The Euclidean solvers (method = "pgd" and "nnls") call aa_init() on the preprocessed data, so they accept the full catalogue of initializer strings described above, as well as custom initializer functions and coordinate matrices. Probabilistic AA (PAA) uses family-specific parameter profiles and does not support the Euclidean scale argument.

Kernel AA is the main exception. With method = "kernel", initializer strings are limited to "random", "dirichlet", "furthest_first", "kmeans_pp", and "furthest_sum". The row-selection methods can be expressed through selected row indices and pairwise distances derived from the kernel Gram matrix; "dirichlet" is also available because it constructs the coefficient matrix $B$ directly and does not require input-space archetype coordinates. Candidate batching is supported for these kernel initializers, with distal batches computed from kernel-space distances to the center. The remaining strings are not accepted as kernel initializers: "aa_pp" requires convex-hull residual projections and "hull_outmost" requires explicit hull candidates, and both would need separate kernel-space implementations. If you need one of those starts for a kernel fit, pass an explicit $K \times N$ non-negative coefficient matrix instead; row indices or row names are also accepted when you want to select observed rows directly.
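Kernel-space distances need only the Gram matrix $G$, because $\|\phi(x_i) - \phi(x_j)\|^2 = G_{ii} + G_{jj} - 2G_{ij}$. The sketch below uses this identity to run a furthest-first selection on row indices alone. The Laplace kernel form $\exp(-\|x - y\| / \sigma)$, its bandwidth, and the helper names are assumptions for illustration, not yaap's kernel implementation.

```r
# Assumed Laplace kernel k(x, y) = exp(-||x - y|| / sigma).
laplace_gram <- function(X, sigma = 1) {
    exp(-as.matrix(dist(X)) / sigma)
}

# Squared feature-space distances from the Gram matrix alone.
kernel_sq_dist <- function(G) {
    d <- diag(G)
    outer(d, d, "+") - 2 * G
}

# Furthest-first on row indices: no input-space archetype coordinates needed.
kernel_furthest_first <- function(G, K, first = 1) {
    D <- kernel_sq_dist(G)
    idx <- first
    while (length(idx) < K) {
        nearest <- apply(D[, idx, drop = FALSE], 1, min)  # distance to nearest pick
        nearest[idx] <- -Inf                              # never re-pick a row
        idx <- c(idx, unname(which.max(nearest)))
    }
    idx
}

X <- cbind(c(0, 1, 0, 5), c(0, 0, 1, 5))
G <- laplace_gram(X)
sel <- kernel_furthest_first(G, K = 2)
```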

The directional solver also accepts the aa_init() method strings, but it defaults to "dirichlet" for historical reasons.

Method Reference

Random (`"random"`)

Selects $K$ rows of $X$ uniformly at random without replacement.

X <- as.matrix(toy)
K <- 3
init <- aa_init(X, K = K, method = "random")

Uniform sampling is the simplest possible initialization and requires no distance computations. Every observed row has the same chance of being selected, so regions are explored in proportion to how densely they are sampled, and the method makes few assumptions about where useful archetypes should lie. When combined with many random restarts via the nrep argument in run_aa(), even simple uniform initialization can find good solutions, as pointed out by Krohne et al. (2019).

Dirichlet (`"dirichlet"`)

Instead of selecting archetypes from the data rows, dirichlet constructs the coefficient matrix $B$ directly by sampling each row from a symmetric Dirichlet distribution $\text{Dirichlet}(\alpha \mathbf{1}_N)$. The resulting archetypes are therefore random convex combinations of all data points.

The alpha parameter controls the shape of the distribution:

alpha = 1 (default): all possible combinations are equally likely
alpha < 1: sparse combinations are more likely, so only a few points contribute to each archetype.
alpha > 1: near uniform combinations are more likely, so many points contribute to each archetype.

As $\alpha \to 0$ the behavior approaches random (archetypes snap to data points), while for $\alpha \gg 1$ the archetypes are more likely to lie near the center of the data hull. The plot below compares initial archetype placements across three variants to demonstrate the convergence of dirichlet to random.
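A symmetric Dirichlet draw can be sketched with the standard construction of normalizing independent Gamma(alpha, 1) variables. The helper rdirichlet_rows() is hypothetical, and yaap's internal sampler may be implemented differently; the sketch only illustrates how alpha moves archetypes between the data mean and individual rows.

```r
# Each row of B ~ Dirichlet(alpha * 1_N), built by normalizing Gamma draws.
rdirichlet_rows <- function(K, N, alpha = 1) {
    G <- matrix(rgamma(K * N, shape = alpha, rate = 1), nrow = K)
    G / rowSums(G)
}

set.seed(42)
X <- matrix(rnorm(200 * 2), ncol = 2)
B_dense  <- rdirichlet_rows(K = 3, N = nrow(X), alpha = 1)
B_sparse <- rdirichlet_rows(K = 3, N = nrow(X), alpha = 0.01)
A_dense  <- B_dense %*% X    # weights spread over many rows: near the data mean
A_sparse <- B_sparse %*% X   # a few rows carry most weight: near individual points

# Largest weight per archetype: much larger when alpha is small.
max_w_dense  <- apply(B_dense, 1, max)
max_w_sparse <- apply(B_sparse, 1, max)
```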

"random" archetypes (left) always coincide with data points and cluster near high-density regions. "dirichlet" with $\alpha = 1$ (center) samples weights from all data points and often produces archetypes near the center of the hull. Lowering $\alpha$ (right) makes weights sparse, which pushes the random mixtures toward specific data points and toward the behavior of "random".

Furthest First (`"furthest_first"`)

Selects the first archetype at random, with bias toward points far from the mean, and then iteratively adds the point that is furthest from the already-selected set, that is, the point whose distance to its nearest selected archetype is largest.

X <- as.matrix(toy)
K <- 3
init <- aa_init(X, K = K, method = "furthest_first")

Furthest First is the greedy farthest-point heuristic for the $k$-center problem, adapted to AA. It produces well-separated archetypes and is deterministic after the first random draw. Because it targets the globally farthest point at each step, it tends to spread archetypes evenly across the data hull, although this spread is not tuned to the AA reconstruction objective.
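The farthest-point rule can be sketched in a few lines of base R. In this hypothetical furthest_first() the seed row is passed in explicitly, whereas yaap draws it at random with a bias toward points far from the mean; the toy coordinates are illustrative.

```r
# Farthest-point traversal: after a seed, repeatedly add the row whose
# distance to its nearest selected row is largest.
furthest_first <- function(X, K, seed_idx) {
    X <- as.matrix(X)
    idx <- seed_idx
    nearest <- sqrt(colSums((t(X) - X[seed_idx, ])^2))  # distance to nearest pick
    while (length(idx) < K) {
        nxt <- unname(which.max(nearest))
        idx <- c(idx, nxt)
        nearest <- pmin(nearest, sqrt(colSums((t(X) - X[nxt, ])^2)))
    }
    idx
}

X <- rbind(c(0, 0), c(10, 0), c(0, 8), c(1, 1), c(5, 4))
sel <- furthest_first(X, K = 3, seed_idx = 1)
```

Starting from the interior row 4 instead, the same function returns c(4, 2, 3): the greedy rule never revisits the seed, so an unlucky first draw stays in the set.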

k-means++ (`"kmeans_pp"`)

Selects the first archetype at random, with bias toward points far from the mean, and then samples each subsequent archetype with probability proportional to the squared distance to the nearest already-selected archetype.

X <- as.matrix(toy)
K <- 3
init <- aa_init(X, K = K, method = "kmeans_pp")

The k-means++ sampling rule is a softer version of Furthest First. Instead of always picking the single farthest point, it gives proportionally more chances to points that are far from the already-selected archetypes. It can also be viewed as an approximation of AA++ that uses the squared distance to the nearest archetype as a proxy for the squared distance to the convex hull of the already-selected archetypes (distance to corners instead of distance to faces).
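A base-R sketch of the k-means++ seeding rule follows. As with the furthest-first sketch, kmeans_pp_rows() is hypothetical and takes the seed row explicitly; only the subsequent draws use squared-distance weights.

```r
kmeans_pp_rows <- function(X, K, seed_idx) {
    X <- as.matrix(X)
    idx <- seed_idx
    d2 <- colSums((t(X) - X[seed_idx, ])^2)  # squared distance to nearest pick
    while (length(idx) < K) {
        # Selected rows have d2 == 0, so they are never drawn again.
        if (!any(d2 > 0)) stop("all remaining rows coincide with selected ones")
        nxt <- sample.int(nrow(X), size = 1, prob = d2)
        idx <- c(idx, nxt)
        d2 <- pmin(d2, colSums((t(X) - X[nxt, ])^2))
    }
    idx
}

set.seed(11)
X <- matrix(rnorm(100 * 2), ncol = 2)
sel <- kmeans_pp_rows(X, K = 4, seed_idx = 1)
```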

Furthest Sum (`"furthest_sum"`)

Selects archetypes greedily, each time adding the point that maximizes the sum of distances to all current archetypes (Mørup and Hansen 2012). An optional refinement phase, controlled by refinement_steps (default 10), then cycles through the selected set and replaces each archetype in turn with the candidate that maximizes the summed distance to the others, which removes the effect of the initial random selection.

X <- as.matrix(toy)
K <- 3
# Default refinement
init <- aa_init(X, K = K, method = "furthest_sum")

# More aggressive refinement
init <- aa_init(X, K = K, method = "furthest_sum", refinement_steps = 30)

# No refinement
init <- aa_init(X, K = K, method = "furthest_sum", refinement_steps = 0)

FurthestSum is the historical default initialization for AA (Mørup and Hansen 2012) and remains the default when init = NULL is passed to run_aa().
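The greedy rule and the refinement phase can be sketched as follows. This furthest_sum() is a hypothetical base-R illustration, not yaap's code: it takes the seed explicitly, stores the full N × N distance matrix (exactly the cost that batch_size is meant to avoid), and implements refinement as dropping the oldest archetype and re-selecting by summed distance.

```r
furthest_sum <- function(X, K, seed_idx, refinement_steps = 10) {
    stopifnot(K >= 2)
    D <- as.matrix(dist(as.matrix(X)))   # N x N Euclidean distances
    dimnames(D) <- NULL
    pick_next <- function(idx) {
        score <- rowSums(D[, idx, drop = FALSE])  # summed distance to the picks
        score[idx] <- -Inf                        # never re-pick a selected row
        which.max(score)
    }
    idx <- seed_idx
    while (length(idx) < K) idx <- c(idx, pick_next(idx))
    # Refinement: drop the oldest archetype and re-select given the others.
    for (step in seq_len(refinement_steps)) {
        idx <- idx[-1]
        idx <- c(idx, pick_next(idx))
    }
    idx
}

X <- rbind(c(0, 0), c(10, 0), c(0, 8), c(1, 1), c(5, 4))
raw     <- furthest_sum(X, K = 3, seed_idx = 4, refinement_steps = 0)
refined <- furthest_sum(X, K = 3, seed_idx = 4)
```

Starting from the interior row 4, the unrefined result keeps that row, while the refinement steps swap it for the corner at row 1.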

Batched Coreset-Style Initialization

Supplying batch_size restricts initialization to a candidate batch before selecting archetypes. With the default batch_type = "distal", candidates are sampled with probability proportional to each point's squared distance from the data mean. This is the coreset idea of Mair and Brefeld (2019): reduce the effective problem size while preserving the extremal structure that boundary-seeking methods such as furthest_sum rely on.

X <- as.matrix(toy)
K <- 3
batch_size <- 60

# batch_size must be at least K
init <- aa_init(X, K = K, method = "furthest_sum", batch_size = batch_size)

For large datasets where computing all pairwise distances is expensive, the batch reduces memory use and runtime. Because distal batching downweights the center, it avoids spending most candidate slots in regions where archetypes are unlikely to be useful.
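A distal candidate batch can be sketched with weighted sampling without replacement. distal_batch() is a hypothetical helper, not yaap's implementation. Note that R's sample.int() draws sequentially when prob is given without replacement, so inclusion probabilities are only approximately proportional to the weights.

```r
# Sample candidate rows with weights proportional to squared distance
# from the column means.
distal_batch <- function(X, batch_size) {
    X <- as.matrix(X)
    stopifnot(batch_size <= nrow(X))
    center <- colMeans(X)
    d2 <- colSums((t(X) - center)^2)
    sample.int(nrow(X), size = batch_size, replace = FALSE, prob = d2)
}

X <- rbind(c(-1, 0), c(1, 0), c(0, 0), c(0, 2), c(0, -2))
set.seed(5)
b <- distal_batch(X, batch_size = 4)
```

Row 3 sits exactly at the center, so it has zero weight and never enters the batch.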

AA++ (`"aa_pp"`)

AA++ (Mair and Sjölund 2023) selects each archetype by sampling with probability proportional to the squared residual distance from the convex hull of the already-selected archetypes. Concretely, after selecting $k - 1$ archetypes it solves a small NNLS projection to find, for each data point, its best approximation as a convex combination of the current archetypes; the squared reconstruction error becomes the sampling weight for the $k$-th archetype.

X <- as.matrix(toy)
K <- 3
init <- aa_init(X, K = K, method = "aa_pp")

Because the sampling probability is tied to the AA objective itself rather than to a surrogate (such as nearest-archetype distance), AA++ has a formal guarantee: each new archetype decreases the expected objective function¹.

Compared to the other initialization methods, AA++ is more computationally intensive: after the first two archetypes (selected via kmeans_pp), each of the remaining $K-2$ steps solves one small NNLS projection per data point to compute the sampling probabilities. To mitigate this cost, AA++ can be used with batch_size to give a Monte Carlo approximation of the exact method. See the next section for details.
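The sketch below illustrates the AA++ sampling loop under several simplifying assumptions. The helper names are hypothetical; the projection onto the current hull uses projected gradient descent with a sort-based Euclidean projection onto the simplex rather than the NNLS solver yaap uses; the first row is drawn uniformly; and the optional batch_size argument draws a uniform candidate batch, mirroring the Monte Carlo variant. A fixed iteration count means residuals are approximate.

```r
# Euclidean projection of a vector onto the probability simplex (sort-based).
proj_simplex <- function(v) {
    u <- sort(v, decreasing = TRUE)
    css <- cumsum(u)
    rho <- max(which(u + (1 - css) / seq_along(u) > 0))
    theta <- (css[rho] - 1) / rho
    pmax(v - theta, 0)
}

# Squared distance from each row of X to the convex hull of the rows of Z.
hull_sq_residuals <- function(X, Z, iters = 200) {
    X <- as.matrix(X); Z <- as.matrix(Z)
    k <- nrow(Z)
    if (k == 1) return(colSums((t(X) - Z[1, ])^2))   # hull of a single point
    L <- 2 * max(eigen(tcrossprod(Z), symmetric = TRUE, only.values = TRUE)$values)
    L <- max(L, 1e-12)                                 # guard against a degenerate Z
    W <- matrix(1 / k, nrow(X), k)                     # start at the barycenter
    for (it in seq_len(iters)) {
        G <- 2 * (W %*% Z - X) %*% t(Z)                # gradient of the squared error
        W <- t(apply(W - G / L, 1, proj_simplex))      # keep each row on the simplex
    }
    rowSums((X - W %*% Z)^2)
}

# Draw each new row with probability proportional to its squared residual.
aa_pp_rows <- function(X, K, batch_size = NULL) {
    X <- as.matrix(X)
    N <- nrow(X)
    idx <- sample.int(N, 1)                            # simplification: uniform first row
    while (length(idx) < K) {
        cand <- if (is.null(batch_size)) seq_len(N) else sample.int(N, batch_size)
        r2 <- hull_sq_residuals(X[cand, , drop = FALSE], X[idx, , drop = FALSE])
        r2[cand %in% idx] <- 0                         # never re-draw a selected row
        if (!any(r2 > 0)) stop("no candidate lies outside the current hull")
        idx <- c(idx, cand[sample.int(length(cand), 1, prob = r2)])
    }
    idx
}

X <- rbind(c(0, 0), c(10, 0), c(0, 10), c(2, 2), c(3, 3))
set.seed(1)
sel <- aa_pp_rows(X, K = 3)
r <- hull_sq_residuals(rbind(c(2, 2), c(10, 10)), X[1:3, ])
```

The point (2, 2) lies inside the triangle and has residual close to 0, while (10, 10) projects onto the edge at (5, 5) with squared residual 50.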

Batched AA++

AA++ can also use batch_size, giving a variant of the Monte Carlo AA++ approximation. Instead of projecting the full dataset at each step, it draws a candidate batch, projects only those rows, and samples the next archetype from their residuals.

X <- as.matrix(toy)
K <- 3
batch_size <- 100
batch_type <- "uniform"

# Larger batch_size -> closer approximation to exact AA++
init <- aa_init(X,
    K = K,
    method = "aa_pp",
    batch_size = batch_size,
    batch_type = batch_type
)

The batch_size parameter trades approximation accuracy for speed and must satisfy batch_size >= K. When batch_size == nrow(X) the method reduces to exact AA++. When batch_size << nrow(X) only a small fraction of the data is evaluated at each step, making batched AA++ useful for large datasets where exact AA++ becomes memory-bound. Use batch_type = "uniform" for the least biased approximation, or keep the default batch_type = "distal" if you want the candidate batches to emphasize likely boundary points.

Hull-outmost (`"hull_outmost"`)

Builds a pool of candidate hull vertices, then ranks them by an outmost-vote criterion: each candidate votes for the vertex farthest from it, and the $K$ most-voted candidates are selected as archetypes. The hull_method argument (required) controls how the candidate pool is constructed:

"full" — exact convex hull of $X$ in the original feature space. Uses grDevices::chull for 2-D data and the geometry package for higher dimensions.
"projected" — union of convex hulls computed across random low-dimensional projections of $X$. Controlled by projected_dim (default 2) and n_projection_max.
"partitioned" — union of hull vertices from random partitions of the data rows. Controlled by n_partitions (default 10). The fastest strategy.

X <- as.matrix(toy)
K <- 3
# Full hull — exact but requires 'geometry' for D > 2
init <- aa_init(X, K = K, method = "hull_outmost", hull_method = "full")

# Projected hull — works in any dimension without extra packages
init <- aa_init(X,
    K = K, method = "hull_outmost",
    hull_method = "projected", projected_dim = 2
)

# Partitioned hull — fastest, suitable for large n
init <- aa_init(X,
    K = K, method = "hull_outmost",
    hull_method = "partitioned", n_partitions = 15
)

The partitioned and projected strategies are designed for datasets where computing the exact hull is intractable. The use_unique_candidates argument (default FALSE) controls how duplicate hull candidates, that is, points appearing in multiple projections or partitions, are counted: when TRUE, each such point casts only one vote.
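For 2-D data the partitioned voting scheme can be sketched with grDevices::chull(). hull_outmost_partitioned() is a hypothetical illustration, not yaap's implementation: it splits rows into random disjoint blocks, pools the hull vertices of each block, lets every candidate vote for the candidate farthest from it, and keeps the K most-voted rows. Ties in the vote count are broken arbitrarily.

```r
hull_outmost_partitioned <- function(X, K, n_partitions = 10) {
    X <- as.matrix(X)
    stopifnot(ncol(X) == 2)                            # chull() handles 2-D data only
    # Random disjoint partition of the rows into n_partitions blocks.
    part <- rep_len(seq_len(n_partitions), nrow(X))[sample.int(nrow(X))]
    blocks <- split(seq_len(nrow(X)), part)
    # Pool the convex-hull vertices found within each block.
    cand <- unname(unlist(lapply(blocks, function(rows) {
        rows[grDevices::chull(X[rows, , drop = FALSE])]
    })))
    stopifnot(length(cand) >= K)
    # Each candidate votes for the candidate farthest from it.
    D <- as.matrix(dist(X[cand, , drop = FALSE]))
    winners <- cand[apply(D, 1, which.max)]
    votes <- table(factor(winners, levels = cand))
    # Keep the K most-voted rows.
    as.integer(names(sort(votes, decreasing = TRUE))[seq_len(K)])
}

X <- rbind(c(0, 0), c(10, 0), c(0, 8), c(2, 2), c(3, 3), c(1, 4))
set.seed(2)
top2 <- hull_outmost_partitioned(X, K = 2, n_partitions = 1)
```

With a single partition the candidates are the exact hull vertices (rows 1, 2, 3); row 2 receives two votes and row 3 one.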

Session Information

#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Madrid
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] yaap_1.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     R6_2.6.1          fastmap_1.2.0     Matrix_1.7-5     
#>  [5] xfun_0.57         lattice_0.22-9    matrixStats_1.5.0 cachem_1.1.0     
#>  [9] knitr_1.51        nnls_1.6          htmltools_0.5.9   generics_0.1.4   
#> [13] rmarkdown_2.31    lifecycle_1.0.5   cli_3.6.6         grid_4.6.0       
#> [17] sass_0.4.10       jquerylib_0.1.4   compiler_4.6.0    tools_4.6.0      
#> [21] evaluate_1.0.5    bslib_0.11.0      yaml_2.3.12       otel_0.2.0       
#> [25] quadprog_1.5-8    rlang_1.2.0       jsonlite_2.0.0

References

Krohne, Laerke Gebser, Yi Wang, Jesper L. Hinrich, Morten Moerup, Raymond C. K. Chan, and Kristoffer H. Madsen. 2019. "Classification of Social Anhedonia Using Temporal and Spatial Network Features from a Social Cognition fMRI Task." Human Brain Mapping 40 (17): 4965–81. https://doi.org/10.1002/hbm.24751.

Mair, Sebastian, and Ulf Brefeld. 2019. “Coresets for Archetypal Analysis.” In Advances in Neural Information Processing Systems. Vol. 32. https://proceedings.neurips.cc/paper_files/paper/2019/file/7f278ad602c7f47aa76d1bfc90f20263-Paper.pdf.

Mair, Sebastian, and Jens Sjölund. 2023. “Archetypal Analysis++: Rethinking the Initialization Strategy.” https://arxiv.org/abs/2301.13748.

Mørup, Morten, and Lars Kai Hansen. 2012. "Archetypal Analysis for Machine Learning and Data Mining." Neurocomputing 80: 54–63. https://doi.org/10.1016/j.neucom.2011.06.033.
