Phylogenetic trees represent hypotheses about the natural history of clades. No phylogeny can be taken as ground truth, since both the ages of common ancestors and the actual tree topology may differ substantially from reality, and usually differ across studies. Hence, accounting for phylogenetic uncertainty is advisable in comparative studies.
The function `swapONE` provides a fast and effective way to produce alternative tree topologies by swapping a specified proportion of the tree tips and changing the ages of a specified proportion of common ancestors (nodes). The user can, however, indicate whether and which clades must be kept monophyletic, for instance because they are recognized as well supported. Tips within monophyletic clades can still be swapped. The function also returns the Kuhner-Felsenstein distance (Kuhner & Felsenstein 1994) between the original and the 'swapped' tree (`$Kuhner-Felsenstein distance`). Optionally, the swapped tree can be plotted, highlighting the species and nodes whose positions changed by coloring their branches and labels.
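To make the distance concrete, the sketch below computes the Kuhner-Felsenstein (branch score) distance in base R. It assumes each tree has already been summarised as a named vector of branch lengths keyed by split (bipartition of the tips); `kf_distance` and this input format are hypothetical and are not the code `swapONE` runs. Squared branch-length differences are summed over every split found in either tree, with a split missing from one tree counted as length zero there, and the square root is taken; some implementations report the sum of squares without the root.

```r
# Hypothetical helper: Kuhner-Felsenstein (branch score) distance.
# Assumed input: named numeric vectors where each name identifies a split
# and each value is the length of the branch inducing that split.
kf_distance <- function(b1, b2) {
  splits <- union(names(b1), names(b2))
  l1 <- setNames(numeric(length(splits)), splits)  # zero for absent splits
  l2 <- l1
  l1[names(b1)] <- b1
  l2[names(b2)] <- b2
  sqrt(sum((l1 - l2)^2))
}

# Two unrooted four-tip trees with identical terminal branches that differ
# only in their internal split
t1 <- c(A = 1, B = 1, C = 1, D = 1, "AB|CD" = 0.5)
t2 <- c(A = 1, B = 1, C = 1, D = 1, "AC|BD" = 0.5)
kf_distance(t1, t2)  # sqrt(0.5^2 + 0.5^2), about 0.707
```

Conflicting splits each contribute their full length, so two trees with the same topology and branch lengths are at distance zero.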
When species positions are swapped, the function selects exchangeable species pairs based on phylogenetic covariance and proximity. To be swapped, a species pair must share a certain amount of phylogenetic time (which depends on the phylogenetic structure) and be less than 3 nodes apart. Then, a given proportion of these pairs (set through the argument `si`) is randomly sampled to switch positions. In some cases, also depending on tree topology, the proportion of species actually swapped is less than the imposed `si` value. This happens when the age (meant as the distance from the youngest species in the tree) of one species in the pair is older than the ancestor (i.e. the node) of the other species, which makes it impossible to swap them (see t3 and t1 in the figure below). In any case, swapping never changes the distance of the species from the tree root.
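The selection rules above can be sketched in base R on a toy five-tip tree. Everything here is an assumption for illustration: `candidate_pairs`, `can_swap` and `sample_swaps` are hypothetical helpers, the matrices are written by hand (`shared` is the kind of shared-time matrix `ape::vcv()` returns), and the fixed `min_shared` threshold stands in for the structure-dependent threshold that `swapONE` derives itself. The feasibility check is our reading of the age rule: each tip must be younger than the parent node of the other tip.

```r
# Pairs sharing at least min_shared time and lying fewer than 3 nodes apart
candidate_pairs <- function(shared, nodedist, min_shared) {
  ok <- shared >= min_shared & nodedist < 3
  ok[lower.tri(ok, diag = TRUE)] <- FALSE   # keep each unordered pair once
  which(ok, arr.ind = TRUE)                 # one row per pair of tip indices
}

# A swap is feasible only if both terminal branches stay positive; tips keep
# their ages, so their distance from the root does not change
can_swap <- function(i, j, tip_age, parent_age) {
  tip_age[i] < parent_age[j] && tip_age[j] < parent_age[i]
}

# Sample a proportion si of the pairs, then drop infeasible ones
# (which is why fewer species than si may actually be swapped)
sample_swaps <- function(pairs, si, tip_age, parent_age) {
  picked <- pairs[sample.int(nrow(pairs), ceiling(si * nrow(pairs))), , drop = FALSE]
  feasible <- apply(picked, 1, function(p) can_swap(p[1], p[2], tip_age, parent_age))
  picked[feasible, , drop = FALSE]
}

# Toy tree ((t1,t2)A,(t3,(t4,t5)C)B)R, root age 4, t3 a fossil of age 1.5
tips <- paste0("t", 1:5)
shared <- matrix(c(4, 2, 0,   0, 0,
                   2, 4, 0,   0, 0,
                   0, 0, 2.5, 1, 1,
                   0, 0, 1,   4, 3,
                   0, 0, 1,   3, 4), 5, byrow = TRUE, dimnames = list(tips, tips))
nodedist <- matrix(c(0, 1, 3, 4, 4,
                     1, 0, 3, 4, 4,
                     3, 3, 0, 2, 2,
                     4, 4, 2, 0, 1,
                     4, 4, 2, 1, 0), 5, byrow = TRUE, dimnames = list(tips, tips))
tip_age    <- c(0, 0, 1.5, 0, 0)
parent_age <- c(2, 2, 3, 1, 1)   # ages of nodes A, A, B, C, C

pairs <- candidate_pairs(shared, nodedist, min_shared = 0.5)
set.seed(1)
sample_swaps(pairs, si = 1, tip_age, parent_age)
```

Four pairs qualify, (t1,t2), (t3,t4), (t3,t5) and (t4,t5), but only (t1,t2) and (t4,t5) survive: t3 (age 1.5) is older than node C (age 1), so it cannot take the place of t4 or t5.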
The argument `si2` specifies the proportion of internal nodes whose age should be shifted. Nodes are randomly sampled within the tree, excluding the tree root. For each of them, the new age is drawn from a uniform distribution ranging between the age of its ancestor and the age of its closest descendant.
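A minimal base-R sketch of this node-age shift follows, assuming a tree stored as an ape-style edge matrix (tips numbered 1..n, root n+1) with ages indexed by node number. `shift_node_ages` is a hypothetical helper, not the `swapONE` code, and it reads "closest descendant" as the oldest direct descendant, which is the lower bound the new age must stay above.

```r
# Hypothetical sketch of the node-age shift controlled by si2
shift_node_ages <- function(edge, age, si2) {
  root <- setdiff(edge[, 1], edge[, 2])          # the only parent never a child
  internal <- setdiff(unique(edge[, 1]), root)   # candidate nodes, root excluded
  picked <- internal[sample.int(length(internal), round(si2 * length(internal)))]
  for (nd in picked) {
    anc  <- edge[edge[, 2] == nd, 1]             # direct ancestor
    desc <- edge[edge[, 1] == nd, 2]             # direct descendants
    # uniform draw between the oldest descendant and the ancestor; nodes are
    # updated in turn, so later draws respect ages already shifted
    age[nd] <- runif(1, min = max(age[desc]), max = age[anc])
  }
  age
}

# Toy tree ((t1,t2)A,(t3,(t4,t5)C)B)R: tips 1-5, R = 6, A = 7, B = 8, C = 9
edge <- rbind(c(6, 7), c(7, 1), c(7, 2), c(6, 8),
              c(8, 3), c(8, 9), c(9, 4), c(9, 5))
age <- c(0, 0, 1.5, 0, 0, 4, 2, 3, 1)
set.seed(42)
shift_node_ages(edge, age, si2 = 1)
```

Tip ages and the root age are left untouched, and every node stays older than its descendants, so branch lengths remain positive.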
The function `resampleTree` accounts for phylogenetic and sampling uncertainty at once. It first performs `swapONE` to change the topology and then removes a user-specified proportion of species from the swapped tree. Species can be removed at random, or the removal probability can be conditioned by the user through the argument `sdata`. This argument can contain a sampling probability for each species (meant as the probability of being removed from the tree) or, in the case of stratified random sampling, the strata. In addition, if the species on the tree belong to some kind of category whose integrity should be maintained (e.g. at least 5 species must remain in each of them), the `categories` argument is used to indicate the groups to be preserved.
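The three removal schemes can be illustrated with a base-R sketch working on tip labels only. `pick_removed` is hypothetical and is not how `resampleTree` is implemented; the `min_size = 5` default mirrors the example above, and resolving category conflicts by simply putting species back (so fewer than `s` may be removed) is an assumption of this sketch.

```r
# Hypothetical sketch: choose which tips to remove
pick_removed <- function(tips, s, prob = NULL, strata = NULL,
                         category = NULL, min_size = 5) {
  if (!is.null(strata)) {
    # Stratified random sampling: remove a proportion s within each stratum
    # (tips without a stratum are never removed here)
    removed <- unlist(lapply(split(tips, strata[tips]), function(g)
      g[sample.int(length(g), round(s * length(g)))]), use.names = FALSE)
  } else {
    # Random sampling, optionally weighted by named removal weights
    w <- if (is.null(prob)) NULL else prob[tips]
    removed <- tips[sample.int(length(tips), round(s * length(tips)), prob = w)]
  }
  if (!is.null(category)) {
    # Put species back until every category keeps at least min_size members
    for (cl in unique(category)) {
      members <- intersect(names(category)[category == cl], tips)
      gone <- intersect(members, removed)
      shortfall <- min_size - (length(members) - length(gone))
      if (shortfall > 0)
        removed <- setdiff(removed, gone[seq_len(min(shortfall, length(gone)))])
    }
  }
  removed
}

tips <- letters[1:20]
set.seed(3)
pick_removed(tips, s = 0.25)                       # 5 random tips
strata <- setNames(rep(c("x", "y"), c(12, 8)), tips)
pick_removed(tips, s = 0.25, strata = strata)      # 3 from "x", 2 from "y"
category <- setNames(rep(c("k", "m"), c(6, 14)), tips)
pick_removed(tips, s = 0.5, category = category)   # each category keeps >= 5
```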
```r
library(ape)      # plot() for phylo objects, nodelabels(), Ntip()
library(geiger)   # tips(): assumed source of this helper
library(RRphylo)  # DataCetaceans, resampleTree()

DataCetaceans$treecet->tree
plot(tree,show.tip.label = FALSE,no.margin = TRUE)
nodelabels(frame="n",col="red")
# Select two clades for stratified random sampling
clanods=c("crown_Odo"=150,"crown_Mysti"=131)
sdata1<-do.call(rbind,lapply(1:length(clanods),function(w)
  data.frame(species=tips(tree,clanods[w]),group=names(clanods)[w])))
# generate removal weights based on body mass (larger species are less likely to be removed)
prdata<-max(DataCetaceans$masscet)-DataCetaceans$masscet
# select two nodes to be preserved
nn=c(180,159)
# generate two fictional categorical vectors to be preserved
cat1<-sample(rep(c("a","b","c"),each=39),Ntip(tree))
names(cat1)<-tree$tip.label
# 100 values, one for each of the 100 sampled tip labels
cat2<-rep(c("d","e"),each=50)
names(cat2)<-sample(tree$tip.label,100)
# 1. Random sampling
resampleTree(tree,s=0.25,swap.si=0.3)->tree1
# 1.1 Random sampling preserving clades
resampleTree(tree,s=0.25,nodes=nn)->tree2
# 2. Stratified random sampling
resampleTree(tree,sdata = sdata1,s=0.25)->tree3
# 2.1 Stratified random sampling preserving clades and categories
resampleTree(tree,sdata = sdata1,s=0.25,nodes=nn,categories = list(cat1,cat2))->tree4
# 3. Sampling conditioned on probability
resampleTree(tree,sdata = prdata,s=0.25,nsim=5)->tree5
```
Kuhner, M. K., & Felsenstein, J. (1994). A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution, 11: 459-468.