Pedigrees play an important role in animal selective breeding
programs. On the one hand, pedigree information can improve the accuracy
of estimated breeding values. On the other hand, it helps control
inbreeding and avoid inbreeding depression. Therefore, reliable and
accurate pedigree records are essential for any selective breeding
program. However, pedigrees are typically stored in a three-column
format (individual, sire, and dam), which makes it difficult to
visualize ancestors and descendants. Consequently, visualizing
individual pedigrees is highly beneficial. For the Windows platform,
Professor Yang Da’s team at the University of Minnesota developed
pedigraph, a program for displaying individual pedigrees. It can handle
pedigrees containing many individuals. While powerful, it requires
configuration via a parameter file. Professor Brian Kinghorn at the
University of New England developed pedigree viewer,
which can trim, prune, and visually display pedigrees in a windowed
interface. However, if the number of individuals is very large, they may
overlap. Thus, pedigree visualization functions require further
optimization. In the R environment, packages like pedigree,
nadiv, and optiSel provide pedigree
preparation functions. Packages such as kinship2 and
synbreed can also be used to draw pedigree trees. However,
these trees often suffer from significant overlapping when the number of
individuals is large.
To address this, we developed the visPedigree package.
Built on data.table and igraph, it offers
robust data cleaning and social network visualization capabilities,
significantly enhancing pedigree tidying and visualization. With this
package, users can trace and prune ancestors and descendants across
multiple generations. It automatically optimizes the pedigree tree
layout and can quickly display pedigrees with over 10,000 individuals
per generation by compacting full-sib groups and using outlined
displays.
The visPedigree package can be installed from CRAN with
install.packages("visPedigree"), or from its GitHub repository.
The first three columns of pedigree data must be in the order of
individual, sire, and dam IDs. The column names can be customized, but
their order must remain unchanged. Individual IDs must not be coded as
"", " ", "0", "*", or NA; otherwise, those records will be removed from
the pedigree. Missing parents should be denoted by NA, "0", or "*".
Spaces and empty strings ("") are also treated as missing parents,
though this is not recommended. Additional
columns, such as sex and generation, can also be included.
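The coding rules above can be illustrated with a minimal sketch that normalises the accepted missing-parent codes to NA before any further processing. The toy table `raw` and the helper vector `missing_codes` are hypothetical names introduced here; tidyped() performs its own cleaning, so this only shows what the convention means.

```r
library(data.table)

# Toy three-column pedigree (hypothetical data) using several missing-parent codes
raw <- data.table(
  ID   = c("C", "D", "E"),
  Sire = c("A", "0", "*"),
  Dam  = c("B", "", " ")
)

# Codes treated as "parent unknown" in the text above
missing_codes <- c("", " ", "0", "*")

# Work on a copy so the raw input stays untouched
clean <- copy(raw)
for (col in c("Sire", "Dam")) {
  set(clean,
      i = which(clean[[col]] %in% missing_codes),
      j = col,
      value = NA_character_)
}

print(clean)
```

After this step every unknown parent is a true NA, which is also how the tidied pedigree represents missing parents.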
The pedigree can be checked and tidied through the
tidyped() function.
This function takes a pedigree, checks for duplicates and bisexual individuals, detects loops, adds missing founders, sorts the pedigree, and traces candidate pedigrees.
If the cand parameter is provided, only those
individuals and their ancestors or descendants are retained.
Tracing direction and the number of generations can be specified
using the trace and tracegen parameters.
Virtual generations are inferred and assigned when
addgen = TRUE.
A numeric pedigree is generated when addnum = TRUE.
If a Sex column is present, values should be coded
as 'male', 'female', or NA
(unknown). Missing sex information, including a missing Sex column, is
inferred from the pedigree structure where possible.
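As a rough illustration of how sex can be inferred from pedigree structure, the base R sketch below labels anyone used as a sire as male and anyone used as a dam as female, and flags individuals that appear in both roles. This is an assumption about the general idea, not the package's internal code; the toy data and variable names are hypothetical.

```r
# Toy pedigree (hypothetical data); founders A and B are not listed as individuals
ped <- data.frame(
  ID   = c("C", "D", "E"),
  Sire = c("A", "A", "C"),
  Dam  = c("B", "B", "D"),
  stringsAsFactors = FALSE
)

sires <- unique(ped$Sire[!is.na(ped$Sire)])
dams  <- unique(ped$Dam[!is.na(ped$Dam)])

# Individuals recorded as both sire and dam are conflicts ("bisexual")
bisexual <- intersect(sires, dams)

# Every ID that occurs anywhere in the pedigree
ids <- unique(c(ped$ID, sires, dams))

# Infer sex from the parental role; individuals never used as a parent stay NA
sex <- ifelse(ids %in% sires, "male",
              ifelse(ids %in% dams, "female", NA_character_))
names(sex) <- ids

print(sex)
print(bisexual)
```

Individual E never appears as a parent, so its sex cannot be inferred from structure alone and remains NA in this sketch.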
The visPedigree package ships with several example datasets. You can list them with data(package = "visPedigree").
The following code displays the simple_ped dataset,
which contains four columns: individual, sire, dam, and sex. Missing
parents are denoted by 'NA', '0', or
*. Founders are not explicitly listed, and some parents
appear after their offspring in the original data.
head(simple_ped)
#> ID Sire Dam Sex
#> <char> <char> <char> <char>
#> 1: J4Y326 J3Y620 J3Y771 male
#> 2: J1H419 J0Z938 J0Z167 female
#> 3: J2F588 NA J1Z417 female
#> 4: J1J576 J0Z938 J0Z843 male
#> 5: J1C802 J0Z333 J0C355 male
#> 6: J2Z411 J1X971 J1J134 female
tail(simple_ped)
#> ID Sire Dam Sex
#> <char> <char> <char> <char>
#> 1: J1E852 J0Z848 J0Z624 female
#> 2: J1H604 J0C583 J0Z380 female
#> 3: J5X804 J4Y326 J4E185 female
#> 4: J1I438 J0Z990 J0Z808 male
#> 5: J2C808 J1I975 J1F266 male
#> 6: J1K462 J0C317 J0C450 female
# The number of individuals in the pedigree dataset
nrow(simple_ped)
#> [1] 31
# Individual records with missing parents
simple_ped[Sire %in% c("0", "*", "NA", NA) |
           Dam %in% c("0", "*", "NA", NA)]
#> ID Sire Dam Sex
#> <char> <char> <char> <char>
#> 1: J2F588 NA J1Z417 female
#> 2: J1J858 J0Z060 * female
#> 3: J3X697 J2Z903 0 female
Example: If we incorrectly set the female J0Z167 as the
sire of J2F588, tidyped() will detect this
bisexual conflict.
x <- data.table::copy(simple_ped)
x[ID == "J2F588", Sire := "J0Z167"]
y <- tidyped(x)
#> Warning in validate_and_prepare_ped(ped): Bisexual individuals found (both Sire
#> and Dam): J0Z167
The tidyped() function sorts the pedigree, replaces
missing parents with NA, ensures parents precede their
offspring, and adds missing founders.
tidy_simple_ped <- tidyped(simple_ped)
head(tidy_simple_ped)
#> Tidy Pedigree Object
#> Ind Sire Dam Sex Gen IndNum SireNum DamNum
#> <char> <char> <char> <char> <int> <int> <int> <int>
#> 1: J0C032 <NA> <NA> female 1 1 0 0
#> 2: J0C185 <NA> <NA> female 1 2 0 0
#> 3: J0C231 <NA> <NA> female 1 3 0 0
#> 4: J0C317 <NA> <NA> male 1 4 0 0
#> 5: J0C450 <NA> <NA> female 1 5 0 0
#> 6: J0C561 <NA> <NA> male 1 6 0 0
tail(tidy_simple_ped)
#> Tidy Pedigree Object
#> Ind Sire Dam Sex Gen IndNum SireNum DamNum
#> <char> <char> <char> <char> <int> <int> <int> <int>
#> 1: J1C802 J0Z333 J0C355 male 5 54 47 46
#> 2: J4E185 J3L886 J3X697 female 5 55 48 49
#> 3: J4Y326 J3Y620 J3Y771 male 5 56 50 51
#> 4: J1C929 J0Z511 J0Z444 male 6 57 53 52
#> 5: J2Y434 J1C802 J1H419 female 6 58 54 28
#> 6: J5X804 J4Y326 J4E185 female 6 59 56 55
nrow(tidy_simple_ped)
#> [1] 59
In the resulting tidy_simple_ped, founders are added
with their inferred sex, and parents are sorted before their offspring.
The number of individuals increases from 31 to 59. The columns are
renamed to Ind, Sire, and Dam.
Missing parents are uniformly replaced with NA, and
tidyped() provides informative messages during processing.
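The founder-adding step described above can be sketched in base R: any parent that never appears in the individual column is appended as a new record with unknown parents. This is a simplified illustration with hypothetical toy data, not the package's implementation (which also infers sex and sorts the result).

```r
# Toy pedigree (hypothetical data): parents A and B have no records of their own
ped <- data.frame(
  Ind  = c("C", "D"),
  Sire = c("A", "C"),
  Dam  = c("B", NA),
  stringsAsFactors = FALSE
)

# All known parents, ignoring missing values
parents <- unique(c(ped$Sire, ped$Dam))
parents <- parents[!is.na(parents)]

# Parents without a record of their own are treated as founders
missing_founders <- setdiff(parents, ped$Ind)

founders <- data.frame(
  Ind  = missing_founders,
  Sire = NA_character_,
  Dam  = NA_character_,
  stringsAsFactors = FALSE
)

# Founders go first so that parents precede their offspring
full <- rbind(founders, ped)
print(full)
```

The same mechanism explains why simple_ped grows from 31 to 59 records: the extra rows are parents that had no record in the original file.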
By default, tidy_simple_ped includes new columns:
Gen, IndNum, SireNum, and
DamNum. These can be disabled by setting
addgen = FALSE and addnum = FALSE.
If the input dataset lacks a Sex column, it will be
automatically added to the tidied output.
tidy_simple_ped_no_gen_num <-
  tidyped(simple_ped, addgen = FALSE, addnum = FALSE)
head(tidy_simple_ped_no_gen_num)
#> Tidy Pedigree Object
#> Ind Sire Dam Sex
#> <char> <char> <char> <char>
#> 1: J0Z938 <NA> <NA> male
#> 2: J0Z333 <NA> <NA> male
#> 3: J0C561 <NA> <NA> male
#> 4: J0Z475 <NA> <NA> male
#> 5: J0Z511 <NA> <NA> male
#> 6: J0Z664 <NA> <NA> male
Once tidied, you can use data.table::fwrite() to export
the pedigree for genetic evaluation software like ASReml.
A pedigree loop occurs when an individual is its own ancestor (e.g.,
A is the parent of B, B is the parent of C, and C is the parent of A).
This is a biological impossibility and a serious error in pedigree
records. The tidyped() function automatically detects these
cycles using graph theory algorithms. If a loop is detected, the
function will stop and provide information about the individuals
involved in the loop.
The following code demonstrates what happens when a pedigree with loops is processed:
# loop_ped contains cycles (e.g., V -> T -> R -> P -> M -> V)
# Attempting to tidy it will result in an error
try(tidyped(loop_ped))
#> Error : Pedigree error! Pedigree loops detected:
#> M -> P -> R -> T -> V -> M
#> F -> E -> C -> A -> F
Detecting loops early is crucial for ensuring the integrity of genetic evaluations.
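To make the idea of loop detection concrete, here is a small base R sketch that repeatedly "peels off" individuals whose parents have already been placed. It is a swapped-in illustration (a simple topological ordering), not necessarily the graph algorithm tidyped() uses; the function name `order_pedigree` is hypothetical, and the sketch assumes every known parent also has its own record.

```r
# Order a pedigree so that parents precede offspring.
# Anything that can never be placed lies on, or downstream of, a loop.
order_pedigree <- function(ind, sire, dam) {
  s <- setNames(sire, ind)
  d <- setNames(dam, ind)
  placed <- character(0)
  remaining <- ind
  while (length(remaining) > 0) {
    # An individual is ready when each parent is unknown or already placed
    ok <- (is.na(s[remaining]) | s[remaining] %in% placed) &
          (is.na(d[remaining]) | d[remaining] %in% placed)
    ready <- remaining[ok]
    if (length(ready) == 0) {
      # No progress is possible: the remaining records contain a loop
      return(list(order = placed, loop = remaining))
    }
    placed <- c(placed, ready)
    remaining <- setdiff(remaining, ready)
  }
  list(order = placed, loop = character(0))
}

# Valid pedigree: C is the offspring of A and B
good <- order_pedigree(c("C", "A", "B"),
                       c("A", NA, NA),
                       c("B", NA, NA))

# Invalid pedigree: C is the sire of A, A of B, and B of C
bad <- order_pedigree(c("A", "B", "C"),
                      c("C", "A", "B"),
                      rep(NA_character_, 3))

print(good$order)
print(bad$loop)
```

In the valid case the function returns a parents-first ordering; in the looped case nothing can be placed, so all three individuals are reported.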
When saving the pedigree, missing parents should typically be
replaced with 0.
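A minimal sketch of such an export is shown below, assuming a small stand-in table rather than a real tidied pedigree. The table `ped`, the copy `out`, and the temporary file path are illustrative choices; the column layout that a given evaluation program expects should be checked against its own documentation.

```r
library(data.table)

# Stand-in for a tidied pedigree (hypothetical data)
ped <- data.table(
  Ind  = c("A", "B", "C"),
  Sire = c(NA, NA, "A"),
  Dam  = c(NA, NA, "B")
)

# Recode missing parents as "0" on a copy, leaving the original untouched
out <- copy(ped)
for (col in c("Sire", "Dam")) {
  set(out, i = which(is.na(out[[col]])), j = col, value = "0")
}

# Write to a temporary CSV file; replace with a real path as needed
path <- tempfile(fileext = ".csv")
fwrite(out, path)
```

Working on a copy matters here because data.table modifies objects by reference, and the NA coding is still needed for further analysis in R.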
To trace the pedigree of specific individuals, use the
cand parameter. This adds a Cand column where
TRUE identifies the specified candidates. If
cand is provided, only the candidates and their
ancestors/descendants are retained.
tidy_simple_ped_J5X804_ancestors <-
  tidyped(ped = tidy_simple_ped_no_gen_num, cand = "J5X804")
tail(tidy_simple_ped_J5X804_ancestors)
#> Tidy Pedigree Object
#> Ind Sire Dam Sex Gen IndNum SireNum DamNum Cand
#> <char> <char> <char> <char> <int> <int> <int> <int> <lgcl>
#> 1: J3X697 J2Z903 <NA> female 4 45 43 0 FALSE
#> 2: J3Y620 J2C161 J2Z411 male 4 46 37 42 FALSE
#> 3: J3Y771 J2G465 J2X544 female 4 47 40 41 FALSE
#> 4: J4E185 J3L886 J3X697 female 5 48 44 45 FALSE
#> 5: J4Y326 J3Y620 J3Y771 male 5 49 46 47 FALSE
#> 6: J5X804 J4Y326 J4E185 female 6 50 49 48 TRUE
By default, the function traces ancestors. You can limit the number
of generations using tracegen. If tracegen is
NULL, all available generations are traced.
tidy_simple_ped_J5X804_ancestors_2 <-
  tidyped(ped = tidy_simple_ped_no_gen_num,
          cand = "J5X804",
          tracegen = 2)
print(tidy_simple_ped_J5X804_ancestors_2)
#> Tidy Pedigree Object
#> Ind Sire Dam Sex Gen IndNum SireNum DamNum Cand
#> <char> <char> <char> <char> <int> <int> <int> <int> <lgcl>
#> 1: J3L886 <NA> <NA> male 1 1 0 0 FALSE
#> 2: J3X697 <NA> <NA> female 1 2 0 0 FALSE
#> 3: J3Y620 <NA> <NA> male 1 3 0 0 FALSE
#> 4: J3Y771 <NA> <NA> female 1 4 0 0 FALSE
#> 5: J4E185 J3L886 J3X697 female 2 5 1 2 FALSE
#> 6: J4Y326 J3Y620 J3Y771 male 2 6 3 4 FALSE
#> 7: J5X804 J4Y326 J4E185 female 3 7 6 5 TRUE
The code above traces the ancestors of J5X804 back two
generations.
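The effect of a generation limit can be sketched with a short base R function that walks up the pedigree one generation at a time. The function `trace_up` is a hypothetical helper written for illustration, not part of visPedigree; the parent records are copied from the outputs shown above.

```r
# Collect the candidate plus its ancestors, optionally limited to
# `tracegen` generations (NULL means no limit).
trace_up <- function(ind, sire, dam, cand, tracegen = NULL) {
  keep <- cand
  current <- cand
  gen <- 0
  while (length(current) > 0 && (is.null(tracegen) || gen < tracegen)) {
    rows <- match(current, ind)              # NA for IDs without a record
    parents <- unique(c(sire[rows], dam[rows]))
    parents <- setdiff(parents[!is.na(parents)], keep)
    keep <- c(keep, parents)
    current <- parents
    gen <- gen + 1
  }
  keep
}

# Parent records taken from the tidied outputs above
ind  <- c("J5X804", "J4Y326", "J4E185", "J3Y620", "J3Y771", "J3X697")
sire <- c("J4Y326", "J3Y620", "J3L886", "J2C161", "J2G465", "J2Z903")
dam  <- c("J4E185", "J3Y771", "J3X697", "J2Z411", "J2X544", NA)

two_gen <- trace_up(ind, sire, dam, cand = "J5X804", tracegen = 2)
print(two_gen)
```

With tracegen = 2 the sketch returns the candidate, its two parents, and its four grandparents, the same seven individuals that appear in the tidyped() output above.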
To trace descendants, set trace = 'down'. The trace parameter accepts
three options in total; by default, ancestors are traced.
tidy_simple_ped_J0Z990_offspring <-
  tidyped(ped = tidy_simple_ped_no_gen_num, cand = "J0Z990", trace = "down")
print(tidy_simple_ped_J0Z990_offspring)
#> Tidy Pedigree Object
#> Index: <Sex>
#> Ind Sire Dam Sex Gen IndNum SireNum DamNum Cand
#> <char> <char> <char> <char> <int> <int> <int> <int> <lgcl>
#> 1: J0Z990 <NA> <NA> male 1 1 0 0 TRUE
#> 2: J1I438 J0Z990 <NA> male 2 2 1 0 FALSE
#> 3: J2G465 J1I438 <NA> male 3 3 2 0 FALSE
#> 4: J3Y771 J2G465 <NA> female 4 4 3 0 FALSE
#> 5: J4Y326 <NA> J3Y771 male 5 5 0 4 FALSE
#> 6: J5X804 J4Y326 <NA> female 6 6 5 0 FALSE
Tracing the descendants of J0Z990 reveals a total of 5
individuals.
Certain genetic evaluation programs require integer-coded pedigrees, where individuals are numbered consecutively to facilitate the calculation of the additive genetic relationship matrix.
By default, tidyped() adds IndNum,
SireNum, and DamNum columns. This can be
disabled with addnum = FALSE.
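The integer coding itself can be sketched in base R with match(), assuming the pedigree is already sorted so that parents precede offspring. The vectors below are hypothetical toy data, and the variable names only mirror the IndNum, SireNum, and DamNum columns for readability.

```r
# Toy pedigree (hypothetical data), already sorted parents-first
ind  <- c("A", "B", "C", "D", "E")
sire <- c(NA, NA, "A", "C", "C")
dam  <- c(NA, NA, "B", "B", "D")

# Consecutive integer codes; unknown parents are coded as 0
ind_num  <- seq_along(ind)
sire_num <- match(sire, ind, nomatch = 0L)
dam_num  <- match(dam, ind, nomatch = 0L)

print(data.frame(ind, ind_num, sire_num, dam_num))
```

Because the records are sorted parents-first, every parent code is smaller than the code of its offspring, which is what recursive relationship-matrix algorithms rely on.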
tidy_simple_ped_with_int <-
  tidyped(ped = tidy_simple_ped_no_gen_num, addnum = TRUE)
head(tidy_simple_ped_with_int)
#> Tidy Pedigree Object
#> Ind Sire Dam Sex Gen IndNum SireNum DamNum
#> <char> <char> <char> <char> <int> <int> <int> <int>
#> 1: J0C032 <NA> <NA> female 1 1 0 0
#> 2: J0C185 <NA> <NA> female 1 2 0 0
#> 3: J0C231 <NA> <NA> female 1 3 0 0
#> 4: J0C317 <NA> <NA> male 1 4 0 0
#> 5: J0C450 <NA> <NA> female 1 5 0 0
#> 6: J0C561 <NA> <NA> male 1 6 0 0
The inbreeding coefficient (F) of each individual can be calculated using the tidyped() or inbreed() function. There are two options for adding inbreeding coefficients to a tidied pedigree:
1. Set inbreed = TRUE in the tidyped()
function. This calculates the inbreeding coefficients using the
nadiv package and adds an f column to the
tidied pedigree.
2. Call inbreed() directly on a tidied pedigree to add
the f column.
# Create a simple inbred pedigree
library(data.table)
test_ped <- data.table(
Ind = c("A", "B", "C", "D", "E"),
Sire = c(NA, NA, "A", "C", "C"),
Dam = c(NA, NA, "B", "B", "D"),
Sex = c("male", "female", "male", "female", "male")
)
# Option 1: Calculate during tidying
tidy_test <- tidyped(test_ped, inbreed = TRUE)
head(tidy_test)
#> Tidy Pedigree Object
#> Ind Sire Dam Sex Gen IndNum SireNum DamNum f
#> <char> <char> <char> <char> <int> <int> <int> <int> <num>
#> 1: A <NA> <NA> male 1 1 0 0 0.000
#> 2: B <NA> <NA> female 1 2 0 0 0.000
#> 3: C A B male 2 3 1 2 0.000
#> 4: D C B female 3 4 3 2 0.250
#> 5: E C D male 4 5 3 4 0.375
# Option 2: Calculate after tidying
tidy_test <- inbreed(tidyped(test_ped))
The summary() method provides a quick overview of the
pedigree statistics, including the number of individuals, sex
distribution, founders, and isolated individuals. If inbreeding
coefficients have been calculated (column f), the summary
will also include descriptive statistics of inbreeding.
# Summarize the tidied pedigree
summary(tidy_simple_ped)
#> Pedigree Summary:
#> -----------------
#> Total Individuals: 59
#> - Males: 29
#> - Females: 30
#>
#> Founders (parents unknown): 28
#> Maximum Generation: 6
# Summarize with inbreeding info
summary(tidy_test)
#> Pedigree Summary:
#> -----------------
#> Total Individuals: 5
#> - Males: 3
#> - Females: 2
#>
#> Founders (parents unknown): 2
#> Maximum Generation: 4
#>
#> Inbreeding coefficients:
#> - All: Mean=0.1250, Min=0.0000, Max=0.3750
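The inbreeding coefficients reported for test_ped can be cross-checked by hand with the classical tabular method for the additive relationship matrix. The base R sketch below is an independent illustration, not the nadiv routine that the package calls; it uses the integer codes of individuals A to E (1 to 5), with 0 for unknown parents.

```r
# Integer-coded test pedigree: A = 1, B = 2, C = 3, D = 4, E = 5
sire <- c(0, 0, 1, 3, 3)
dam  <- c(0, 0, 2, 2, 4)
n <- length(sire)

# Additive relationship matrix built row by row (tabular method);
# this relies on parents having smaller codes than their offspring
A <- matrix(0, n, n)
for (i in seq_len(n)) {
  s <- sire[i]
  d <- dam[i]
  # Diagonal: 1 plus half the relationship between the two parents
  parent_rel <- if (s > 0 && d > 0) A[s, d] else 0
  A[i, i] <- 1 + 0.5 * parent_rel
  # Off-diagonal: average relationship of j with the parents of i
  for (j in seq_len(i - 1)) {
    a_js <- if (s > 0) A[j, s] else 0
    a_jd <- if (d > 0) A[j, d] else 0
    A[i, j] <- A[j, i] <- 0.5 * (a_js + a_jd)
  }
}

# Inbreeding coefficient = diagonal element minus 1
f <- diag(A) - 1
names(f) <- c("A", "B", "C", "D", "E")
print(f)
print(mean(f))
```

The sketch gives 0.25 for D and 0.375 for E, with a mean of 0.125 across the five individuals, in agreement with the f column and the summary shown above.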