Model continuous trait evolution

Maël Doré

2026-01-13


This vignette presents the different options available to model continuous trait evolution within deepSTRAPP.

It builds mainly upon functions from R packages [geiger] and [phytools] to offer a simplified framework to model and visualize continuous trait evolution on a time-calibrated phylogeny.
Please, cite also the initial R packages if you are using deepSTRAPP for this purpose.


For an example with categorical data, see this vignette: vignette("model_categorical_trait_evolution")

For an example with biogeographic data, see this vignette: vignette("model_biogeographic_range_evolution")



# ------ Step 0: Load data ------ #

## Load phylogeny and tip data

library(phytools)
data(eel.tree)
data(eel.data)
# Dataset of feeding mode and maximum total length from 61 species of elopomorph eels.
# Source: Collar, D. C., P. C. Wainwright, M. E. Alfaro, L. J. Revell, and R. S. Mehta (2014)
# Biting disrupts integration to spur skull evolution in eels. Nature Communications, 5, 5505.

# Extract body size
eel_tip_data <- stats::setNames(eel.data$Max_TL_cm,
                                rownames(eel.data))

plot(eel.tree)
 ape::Ntip(eel.tree) == length(eel_tip_data)

## Check that trait tip data and phylogeny are named and ordered similarly
all(names(eel_tip_data) == eel.tree$tip.label)

# Reorder tip_data as in phylogeny
eel_tip_data <- eel_tip_data[eel.tree$tip.label]

# ------ Step 1: Prepare continuous trait data ------ #

## Goal: Map continuous trait evolution on the time-calibrated phylogeny

# 1.1/ Fit evolutionary models to trait data using Maximum Likelihood.
# 1.2/ Select the best fitting model comparing AICc.
# 1.3/ Infer ancestral characters estimates (ACE) at nodes.
# 1.4/ Infer ancestral states along branches using interpolation to produce a `contMap`.

library(deepSTRAPP)

# All these actions are performed by a single function: deepSTRAPP::prepare_trait_data()
?deepSTRAPP::prepare_trait_data()
# Model selection is performed internally with deepSTRAPP::select_best_model_from_geiger()
?deepSTRAPP::select_best_model_from_geiger()
# Plotting of the contMap is carried out with deepSTRAPP::plot_contMap()
?deepSTRAPP::plot_contMap()
  
## Macroevolutionary models for continuous traits

?geiger::fitContinuous() # For more in-depth information on the models available

## 7 models from geiger::fitContinuous() are available
 # "BM": Brownian Motion model. 
    # Default model that assumes a Brownian random walk in the trait space. 
    # No trend. No time-dependence. 
    # Correlation structure is proportional to the extent of shared ancestry for pairs of species.
    # sigma² ('sigsq') is the evolutionary rate that represents
    # the expected variance in traits in proportion to time.
    # 'z0' is the ancestral character estimate (trait value) at the root.
 # "OU": Ornstein-Uhlenbeck model. 
    # Random walk with a central tendency (= an optimum). 
    # Attraction toward the central tendency is controlled by parameter 'alpha'. 
 # "EB": Early-burst model. 
    # Time-dependent model where rate of evolution increases or decreases exponentially 
    # through time under the model r[t] = r[0] * exp(a * t),  where r[0] is the initial rate,
    # 'a' is the rate change parameter, and t is time. By default, 'a' is set to be negative,
    # therefore the model represents a decelerating rate of evolution.
    # Parameter estimate boundaries can be change to allow accelerating evolution 
    # with positive values of 'a' (as in the ACDC model).
 # "rate_trend": Linear trend model.
    # Time dependent model where rate of evolution varies linearly with time 
    # (i.e., following a slope). If the 'slope' parameter is positive,
    # rates are increasing, and conversely.
 # "lambda": Pagel's lambda model
    # Based on branch length transformation. Modulates the extent to which the phylogeny
    # predicts covariance among trait values for species (i.e., the degree of phylogenetic signal).
    # The model multiplies all internal branch lengths by 'lambda'.
    # 'lambda' close to zero indicates no phylogenetic signal. 
    # 'lambda' close to one approximate the 'BM' model and indicates strong phylogenetic signal.
 # "kappa": Pagel's kappa model.
    # Based on branch length transformation. Punctuational (speciational) model where
    # trait divergence is related to the number of speciation events between pairs of species.
    # Assumes that speciation events are responsible for trait divergence.
    # The model raises all branch lengths to an estimated power 'kappa'.
    # 'kappa' close to zero indicates a strong dependency of trait evolution on speciation events.
    # 'kappa' close to one approximate the 'BM' model and indicates strong phylogenetic signal.
 # "delta": Pagel's delta model.
    # Based on branch length transformation. Time-dependent model that modulates 
    # the relative contributions of early vs. late evolution in the tree.
    # The model raises all node depths to an estimated power 'delta'.
    # 'delta' close to one approximate the 'BM' model and indicates strong phylogenetic signal.
    # 'delta' lower than one gives more weight to early evolution, thus represents decelerating rates.
    # 'delta' higher than one gives more weight to late evolution, thus represents accelerating rates.

## Model trait data evolution
eel_cont_data <- prepare_trait_data(
    tip_data = eel_tip_data,
    trait_data_type = "continuous",
    phylo = eel.tree,
    seed = 1234, # Set seed for reproducibility
    # All possible models
    evolutionary_models = c("BM", "OU",  "EB", "rate_trend", "lambda", "kappa", "delta"), 
    control = list(niter = 200), # Example of additional parameters that can be pass down
    # to geiger::fitContinuous() to control parameter optimization.
    res = 100, # To set the reoslution of the continuous mapping of trait value on the contMap
    color_scale = c("darkgreen", "limegreen", "orange", "red"),
    plot_map = FALSE,
    # PDF_file_path = "./eel_contMap.pdf", # To export in PDF the contMap generated
    return_ace = TRUE, # To include Ancestral Character Estimates (ACE) at nodes in the output
    return_best_model_fit = TRUE, # To include the best model fit in the output
    return_model_selection_df = TRUE, # To include the df for model selection in the output
    verbose = TRUE) # To display progress


# ------ Step 2: Explore output ------ #

## Explore output
str(eel_cont_data, 1)

## Extract the contMap showing interpolated continuous trait evolution
## on the phylogeny as estimated from the model
eel_contMap <- eel_cont_data$contMap

# Plot with initial color_scale
plot_contMap(eel_contMap,
             fsize = c(0.6, 1)) # Adjust tip label size
# Plot with updated color_scale
plot_contMap(contMap = eel_contMap,
             color_scale = c("purple", "violet", "cyan", "blue"),
             fsize = c(0.6, 1)) # Adjust tip label size
# The contMap is the main input needed to perform a deepSTRAPP run on continuous trait data.

## Extract the Ancestral Character Estimates (ACE) = trait values at nodes
eel_ACE <- eel_cont_data$ace
head(eel_ACE)
# This is a named numerical vector with names = internal node ID and values = ACE.
# It can be used as an optional input in deepSTRAPP run to provide
# perfectly accurate estimates for trait values at internal nodes. 

## Explore summary of model selection
eel_cont_data$model_selection_df # Summary of model selection
# Models are compared using the corrected Akaike's Information Criterion (AICc)
# Akaike's weights represent the probability that a given model is the best among 
# the set of candidate models, given the data.
# Here, the best model is Pagel's lambda

## Explore best model fit (Pagel's lambda)
eel_cont_data$best_model_fit # Summary of best model optimization by geiger::fitContinuous()
eel_cont_data$best_model_fit$opt # Parameter estimates and goodness-of-fit information
# 'lambda' = 0.636. The best model detects a moderate degree of phylogenetic signal.


## Inputs needed to run deepSTRAPP are the contMap (eel_contMap), and optionally,
## the tip_data (eel_tip_data), and the ACE (eel_ACE)
#>                 model      logL k      AIC     AICc delta_AICc Akaike_weights
#> BM                 BM -338.9352 2 681.8704 682.0773 17.7682886            0.0
#> OU                 OU -329.2538 3 664.5076 664.9287  0.6196536           28.0
#> EB                 EB -338.9355 3 683.8710 684.2921 19.9830415            0.0
#> rate_trend rate_trend -335.1346 3 676.2692 676.6902 12.3811866            0.1
#> lambda         lambda -328.9440 3 663.8880 664.3090  0.0000000           38.2
#> kappa           kappa -329.0984 3 664.1969 664.6179  0.3088913           32.7
#> delta           delta -332.6586 3 671.3173 671.7383  7.4293004            0.9
#>            rank
#> BM            6
#> OU            3
#> EB            7
#> rate_trend    5
#> lambda        1
#> kappa         2
#> delta         4

mirror server hosted at Truenetwork, Russian Federation.