The software TRACE is a pedigree analysis tool that was developed and implemented as part of a bioinformatics master's thesis at Leipzig University in 2023. It is a console application written in C++, designed to calculate dyadic relatedness coefficients from a given pedigree without being limited by the number of considered generations, the number of individuals, or the incompleteness of the pedigree itself. Additionally, TRACE provides further information about the respective relatedness paths between the focal individuals, such as the name and sex of the ancestors along the path, the most recent common ancestors (LCA = lowest common ancestor), the kin class (e.g. siblings or cousins), and the minimal detectable inbreeding value for each individual. Its functionality and accuracy were tested with multiple simulated populations as well as with an existing multi-generational pedigree of the free-ranging rhesus macaque (Macaca mulatta) population on the island of Cayo Santiago (Puerto Rico, USA), which covers a time span of over 60 years and consists of a total of 12 049 individuals 1. In contrast to other pedigree analyses, which are often limited in the number of considered generations (e.g. 1), the graph-theoretical approach allows the dyadic relatedness coefficients to be calculated without any generational restriction. However, TRACE might not be suitable for highly inbred populations, since it does not include the inbreeding coefficients of common ancestors (see the section Implementation/Relatedness coefficient for more information).
Since scientists working on wild populations often have to deal with partial pedigrees (mainly due to unknown sires), the second part of the programme implements an adapted simulated annealing algorithm to find the best solution for a fully reconstructed pedigree based on "true" relatedness values. In this context, true relatedness means dyadic relatedness values that provide more information about the individuals with incomplete ancestry than can be obtained from the partial pedigree, such as relatedness values calculated from a fully reconstructed pedigree or, closer to a realistic application, realised relatedness values. Realised relatedness (the proportion of DNA two individuals actually share, rather than the expected average) can be estimated from the length and number of identity-by-descent (IBD) segments. These segments are identified through dense, genome-wide sets of SNPs (single nucleotide polymorphisms) in whole genome sequencing data; short IBD segments indicate rather distant kin, because the larger number of meioses in between is responsible for the length reduction 2 3. Eventually, the algorithm aims to provide a pedigree without gaps for which the difference between the given realised relatedness values and the simultaneously calculated pedigree-derived relatedness coefficients is minimal over all dyads (see the section Implementation/Simulated annealing for more information). While patterns of relatedness in group-living animals with promiscuous mating can be very complex, assessing dyadic relatedness from sequencing data provides the most accurate way to do so. However, at the behavioural level it will still be important to consider kin classes (e.g. maternal half-siblings, paternal cousins etc.) instead of using a global measure of IBD.
Installation guide
- download (and don't forget to unzip) the repository to your local filesystem
- after downloading the source code, open a terminal and navigate to the folder pedigree_programme/source/
- you can use
ls
to check whether you are in the correct folder and whether all necessary files were downloaded: multiple header files (.h), the respective source code files (.cpp), main.cpp and the makefile makefile_pedigree_programme
- run in the command line
make -f makefile_pedigree_programme
- this programme is written using C++17 features and relies on the C++ Standard Library, which is typically shipped with the C++ compiler. Therefore, no additional library installations are necessary. However, please ensure that your compiler supports the C++17 standard.
- if you have trouble with the make command on Windows (e.g. 'make' is not recognized as an internal or external command, operable program or batch file.):
- either download Cygwin, use setup.exe to install make and gcc/g++, move the programme folder into Cygwin, and run the command in the Cygwin terminal
- or install MinGW, set a new environment variable pointing to the bin folder of MinGW, and install make with
mingw-get install mingw32-make
or via the MinGW interface (started by mingw-get); then use the command
mingw32-make -f makefile_pedigree_programme
instead
- now you can use the command
./pedigree_programme
to start TRACE. For general information, you can type
./pedigree_programme -h
to list all possible command line arguments, or
./pedigree_programme -v
to get the current version
Command line arguments
TRACE provides three different functionalities: "relatedness", "simulation", and "annealing", which can be selected with the command line argument -f <functionality>.
- relatedness: calculates the dyadic relatedness (+ path characteristics) from a given (partial or complete) pedigree
- simulation: simulates a random population and returns a complete pedigree
- annealing: starts a simulated annealing algorithm to fill the parental gaps within a partial pedigree using dyadic values of realised relatedness (IBD)
- if no argument is given, TRACE starts without a task, gives a short warning, and terminates
For each mode, further required and optional arguments are listed below:
functionality == relatedness
required arguments
- -p <input_pedigree> [string]: path to a pedigree file, e.g. pedigree.txt
optional arguments
- -c <cores> [int]
  - options: number of cores for multiprocessing
  - default: 1 (no multiprocessing)
- -d <input_dyadlist> [string]
  - options: path to a file with selected dyads, e.g. dyad_selection.txt
  - default: [empty] (all dyads within the pedigree will be analysed)
- -e <output_extend> [string]
  - options:
    - full: returns the full dyadlist output, including path characteristics
    - reduced: returns only the dyadlist with dyadic relatedness coefficients
  - default: full
- -l <generation_limit> [int]
  - options: restricts the distance to potential lowest common ancestors, e.g. if generation_limit == 3, only paths up to the grandparent generation will be returned, and great-grandparents will be considered unrelated
  - default: [empty] (no limitation; all ancestors of a focal are considered as potential lowest common ancestors)
- -o <output> [string]
  - options: custom output name (prefix), e.g. if output == programme_output, the resulting output files will be named "programme_output_dyadlist.txt" and "programme_output_info.txt"
  - default: [empty] (the input file name is used as prefix)
- -r <reduce_node_space> [bool]
  - options:
    - T: [true] before calculating the dyadic relatedness, the number of individuals is reduced, i.e. only descendants of the focals' common ancestors are considered in the analysis (this effectively reduces the search space without affecting the result, but due to the extra computational cost it might only be beneficial in almost completely known pedigrees with a long history)
    - F: [false] no prior narrowing of the search space
  - default: false
Example
./pedigree_programme -f relatedness -p pedigree.txt -e reduced -c 5
functionality == simulation
required arguments
- -n <start_individual> [int]: number of individuals at the start of the simulation
- -s <simulation_duration> [int]: number of years covered by the pedigree, which restricts the duration of the simulation
optional arguments
- -a <max_age> [int]
  - options: species-/population-specific maximum age (individuals who reach the maximum age will die in the following year)
  - default: 30
- -b <birth_rate> [double]
  - options: specifies the annual increment in the number of offspring born each year during the population simulation
  - default: 4.0
- -q <death_rate> [double]
  - options: specifies the annual increment in the number of deaths each year during the population simulation
  - default: 3.0
- -y <default_year> [int]
  - options: start year of the population simulation
  - default: 1900
Example
./pedigree_programme -f simulation -n 20 -s 10 -y 1938
functionality == annealing
required arguments
- -d <input_dyads_complete> [string]: path to the dyadlist with realised relatedness values, e.g. true_dyads.txt
- -p <input_pedigree> [string]: path to the pedigree file (with gaps), e.g. pedigree.txt
optional arguments
- -c <cores> [int]
  - options: number of cores for multiprocessing
  - default: 1 (no multiprocessing)
- -i <init_temp> [double]
  - options: start temperature
  - default: [empty] (calculated automatically as $t_{init} = f_{init} \cdot n_{nodes} \cdot 1.5$, where the init factor $f_{init}$ is the highest mean relatedness of an individual)
- -k <visualization> [bool]
  - options:
    - T: [true] keep track of the simulated annealing steps (the respective relatedness variance and whether they were rejected)
    - F: [false] simulated annealing steps are not recorded/returned
  - default: true
- -t <stop_temp> [double]
  - options: stop temperature; if the current temperature falls below the stop temperature, the algorithm terminates
  - default: 1.0
- -x <temp_decay> [double]
  - options: the temperature multiplication factor, which determines the number of iterations (if a number of iterations $n$ is desired, the decay factor can be calculated as temp_decay = $\sqrt[n]{\frac{t_{stop}}{t_{init}}}$)
  - default: 0.99
- -z <complete_pedigree> [string]
  - options: path to the complete pedigree, if a fully known pedigree exists (with all gaps correctly filled) and should be used to evaluate the accuracy of the simulated annealing output
  - default: [empty] (no check of whether gaps are correctly filled after the simulated annealing)
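The relationship between start temperature, stop temperature and decay factor can be sketched as follows. This is a minimal C++ sketch, not TRACE's code: it assumes geometric cooling (the temperature is multiplied by temp_decay once per iteration), treats $n_{nodes}$ simply as a count supplied by the caller, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Default start temperature as described for -i:
// t_init = f_init * n_nodes * 1.5 (f_init = highest mean relatedness of an individual).
double default_start_temperature(double init_factor, std::size_t n_nodes) {
    return init_factor * static_cast<double>(n_nodes) * 1.5;
}

// Decay factor that cools t_init down to t_stop in exactly n multiplications:
// temp_decay = (t_stop / t_init)^(1/n)
double decay_for_iterations(double t_init, double t_stop, unsigned n) {
    assert(t_init > t_stop && t_stop > 0.0 && n > 0);
    return std::pow(t_stop / t_init, 1.0 / static_cast<double>(n));
}

// Inverse direction: number of multiplications by `decay` until
// t_init * decay^n <= t_stop (rounded up to a whole iteration).
unsigned iterations_for_decay(double t_init, double t_stop, double decay) {
    assert(t_init > t_stop && t_stop > 0.0 && decay > 0.0 && decay < 1.0);
    return static_cast<unsigned>(std::ceil(std::log(t_stop / t_init) / std::log(decay)));
}
```

For example, a hypothetical start temperature of 100 with the defaults -t 1.0 and -x 0.99 needs ceil(ln 0.01 / ln 0.99) = 459 iterations, whereas -x 0.999 raises this to roughly ten times as many.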
Example
./pedigree_programme -f annealing -p pedigree_with_gaps.txt -d realized_dyadic_relatedness.txt -x 0.995 -c 5 -m 1000 -w 1000
general optional arguments
- -g <gestation_length> [int]
  - options: gestation length in days
  - default: 200
- -j <twins> [bool]
  - options:
    - T: [true] twins are possible
    - F: [false] twins are not possible, or so rare that potential mother candidates can be excluded if they already have an offspring in the respective birth cohort
  - default: false
- -m <maturation_age_m> [int]
  - options: maturation age of males in days
  - default: 1250
- -w <maturation_age_f> [int]
  - options: maturation age of females in days
  - default: 1095
The content of the following section consists of adapted excerpts from the Master's thesis by Hendrikje Westphal, submitted in December 2023 at Leipzig University
Relatedness Coefficient
Relatedness coefficient calculation
To calculate the dyadic relatedness coefficient, the (partial) pedigree G is conceived as a directed acyclic graph, consisting of two distinct classes of vertices,
Generally, the relatedness coefficient of an individual
Please note that, based on the formulas above, TRACE may provide slightly underestimated relatedness coefficients in the case of inbred common ancestors (for instance, as shown in Figure A). This is because the algorithm stops as soon as the lowest common ancestor in the respective path is found. Inbreeding due to multiple relatedness paths (Figure B), however, is included in the estimation.
For the individuals F and G in Figure A, TRACE would provide an r of 0.25 (the inbreeding coefficient of the lowest common ancestor E remains unconsidered), while the relatedness coefficient in Figure B is 0.265625. To allow a manual estimate of the reliability, TRACE additionally reports the inbreeding coefficient of each individual, estimated as half of the parental relatedness coefficient. This means that the inbreeding coefficient of E in Figure A would be 0.25.
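The path-based calculation can be illustrated with a naive C++ sketch. It is a reconstruction under stated assumptions, not TRACE's implementation: it enumerates every pair of upward paths from the two focals that end in the same ancestor and share no other individual (in the spirit of Wright's path-counting rule, without any correction for an inbred common ancestor), adds (1/2)^(number of edges) per path, and derives the minimal inbreeding value as half the parental relatedness. The `Pedigree` type and the function names are hypothetical, and the exhaustive enumeration is only meant for small examples. On the example pedigree in the section Relatedness calculation it reproduces the listed coefficients, e.g. r(K, L) = 0.25 + 3 · (1/2)^6 = 0.296875 and min_f(K) = r(H, G) / 2 = 0.03125.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal pedigree: ID -> (mom_ID, sire_ID); "unknown" marks a gap.
using Pedigree = std::map<std::string, std::pair<std::string, std::string>>;
// A path starts at a focal individual and walks upwards to one of its ancestors.
using Path = std::vector<std::string>;

// All upward paths starting at `id`, including the zero-length path {id}.
std::vector<Path> upward_paths(const Pedigree& ped, const std::string& id) {
    std::vector<Path> result;
    result.push_back(Path{id});
    auto it = ped.find(id);
    if (it == ped.end()) return result;
    for (const std::string& parent : {it->second.first, it->second.second}) {
        if (parent == "unknown") continue;
        for (Path p : upward_paths(ped, parent)) {
            p.insert(p.begin(), id);
            result.push_back(std::move(p));
        }
    }
    return result;
}

// r_xy: sum of (1/2)^(edges) over all path pairs that end in the same ancestor
// and share no other individual. The common ancestor's own inbreeding is ignored.
double relatedness(const Pedigree& ped, const std::string& x, const std::string& y) {
    assert(x != y);
    const std::vector<Path> paths_y = upward_paths(ped, y);
    double r = 0.0;
    for (const Path& px : upward_paths(ped, x)) {
        const std::set<std::string> below_x(px.begin(), px.end() - 1);  // px without its end
        for (const Path& py : paths_y) {
            if (px.back() != py.back()) continue;  // different ancestors
            bool disjoint = true;
            for (std::size_t k = 0; k + 1 < py.size(); ++k)
                if (below_x.count(py[k])) { disjoint = false; break; }
            if (disjoint)
                r += std::pow(0.5, static_cast<double>(px.size() + py.size() - 2));
        }
    }
    return r;
}

// Minimal inbreeding value: half the relatedness of the parents (0 if a parent is unknown).
double min_inbreeding(const Pedigree& ped, const std::string& id) {
    auto it = ped.find(id);
    if (it == ped.end()) return 0.0;
    const auto& [mom, sire] = it->second;
    if (mom == "unknown" || sire == "unknown") return 0.0;
    return 0.5 * relatedness(ped, mom, sire);
}
```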
Simulated Annealing
Adapted Simulated Annealing Algorithm
Within the programme, a simulated annealing algorithm is implemented to fill any gaps within a given pedigree. It uses the discrepancy between user-provided realised relatedness values (e.g. obtained from whole genome sequencing) and the calculated pedigree-derived relatedness values as cost function. By minimizing this cost/discrepancy through simulated annealing, the aim is to find the pedigree solution that best explains the variance.
This is highly relevant, for instance, for identifying the ID of an originally unknown sire based on whole genome sequencing data of his descendants, as illustrated in the following example. Assume that a DNA sample of a male is missing, but he sired two offspring that are otherwise unrelated. Hence, the realised relatedness of these paternal half-siblings is around 0.25, while the pedigree-derived relatedness states them as nonkin with
Simulated annealing in general requires a start and a stop temperature, as well as a factor by which the current temperature is decreased until it reaches the stop temperature. The term temperature refers to the origin of the idea: it was adopted from a gradual cooling process (annealing) in thermodynamics, which is used instead of rapid cooling to allow molecules to arrange themselves in an optimal energetic state. In the simulated annealing algorithm, this mirrors the possibility of escaping local minima and ideally ending in the global minimum 4. The general concept of simulated annealing starts with a random solution; in each iteration, the current (last accepted) solution is compared to a new neighbouring solution, which is either accepted or rejected. The acceptance depends strongly on the current temperature, the total discrepancy/cost function and the acceptance criterion used 5.
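The general scheme can be sketched in C++ on a toy problem. This is a generic illustration under stated assumptions, not TRACE's adapted algorithm: the state is a single integer, the caller supplies hypothetical cost and neighbour functions, cooling is geometric, and the Metropolis criterion is written for a cost that is minimised (equivalent to maximising a fitness F = -cost).

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Metropolis criterion for a cost (discrepancy) that is minimised: improvements are
// always accepted; a worse neighbour is accepted if exp(-delta_cost / T) > u, u ~ U[0, 1).
// With fitness F = -cost this is the same test as exp((F_n - F_c) / T) > u.
bool metropolis_accept(double delta_cost, double temperature, double u) {
    assert(temperature > 0.0);
    if (delta_cost <= 0.0) return true;
    return std::exp(-delta_cost / temperature) > u;
}

struct AnnealResult {
    int best_state;
    double best_cost;
    unsigned iterations;
};

// Generic loop with geometric cooling: propose a neighbour, accept or reject it,
// remember the best solution seen so far, then multiply the temperature by `decay`.
AnnealResult anneal(int start, const std::function<double(int)>& cost,
                    const std::function<int(int, std::mt19937&)>& neighbour,
                    double t_init, double t_stop, double decay, unsigned seed) {
    assert(decay > 0.0 && decay < 1.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    int current = start;
    double current_cost = cost(current);
    int best_state = current;
    double best_cost = current_cost;
    unsigned iterations = 0;
    for (double t = t_init; t > t_stop; t *= decay, ++iterations) {
        const int candidate = neighbour(current, rng);
        const double candidate_cost = cost(candidate);
        if (metropolis_accept(candidate_cost - current_cost, t, uniform(rng))) {
            current = candidate;
            current_cost = candidate_cost;
        }
        if (current_cost < best_cost) {
            best_state = current;
            best_cost = current_cost;
        }
    }
    return {best_state, best_cost, iterations};
}
```

As a usage example, one could minimise (x - 7)^2 over the integers 0 to 20 with a neighbour move of ±1. In TRACE's adapted version, a neighbour instead changes a single parental gap, and only the affected dyads are recalculated, as outlined below.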
To fit our specific problem, the general simulated annealing algorithm is adapted as explained in the following outline:
- At first, all pedigree gaps need to be identified.
- Create a start solution by randomly assigning parents from a pool of suitable candidates for each gap. Suitable candidates are individuals who were alive and mature at the time of conception (sire) or birth (mother) and were not excluded as potential parents (listed as nonsire/nondam in the input file; usually, individuals are labelled as nonparents if they were previously excluded by genetic analysis). Additionally, if the parameter twins is set to false, females that already have an offspring in the respective cohort are excluded as well, since twins are very rare.
- Calculate the relatedness coefficient for each relevant dyad (those for which realised relatedness values are available).
- Evaluate the difference between the realised and pedigree-derived relatedness values of the start solution for each relevant dyad
- Save the current difference as the best-known difference, and the start solution as the best pedigree.
- Iteration: While the current temperature is above the (given) stop temperature:
- Create a new solution by exchanging one potential parent with another suitable candidate (= neighbour solution, since only one gap is modified in comparison to the current solution)
- Calculate the changed relatedness values for dyads affected by this alteration (all relevant dyads which include the offspring, the previous and the new parent candidate).
- Compare the previous (from current solution) and the new relatedness values (from neighbour solution) to determine the discrepancy between both solutions.
- If the neighbour solution is worse, apply the Metropolis acceptance criterion 6 to decide whether to accept it:
$$e^{\frac{F_n - F_c}{T}} > X, \qquad X \sim \mathcal{U}(0, 1)$$
(with $F_n$ as the fitness function of the neighbour solution and $F_c$ of the current solution, where a higher fitness corresponds to a lower discrepancy; $T$ as the temperature and $X$ as a random number in the range between 0 and 1)
- If accepted (or if the neighbour solution is better in the first place), the neighbour solution becomes the new current solution; otherwise, it is rejected, and the previous solution (non-updated current solution) remains in place.
- If necessary, update the best difference and pedigree.
- Finally, save the last pedigree solution in a file.
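Building the pool of suitable parent candidates (the second step of the outline) can be sketched as follows. This C++ sketch makes several assumptions not stated in the source: dates are simplified to integer day numbers, a missing date of death is encoded as -1, the caller supplies the set of females that already have an offspring in the cohort, and the struct and function names are hypothetical. The default values mirror -g 200, -m 1250 and -w 1095.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical individual record; dates are day numbers, death_day == -1 means unknown/alive.
struct Individual {
    std::string id;
    char sex;  // 'f', 'm' or 'u'
    int birth_day;
    int death_day;
};

bool alive_on(const Individual& ind, int day) {
    return ind.birth_day <= day && (ind.death_day < 0 || ind.death_day >= day);
}

// Sires: male, alive and mature at conception (birth minus gestation length), not a nonsire.
std::vector<std::string> candidate_sires(const std::vector<Individual>& pop, const Individual& offspring,
                                         const std::set<std::string>& nonsire,
                                         int gestation_length = 200, int maturation_age_m = 1250) {
    const int conception = offspring.birth_day - gestation_length;
    std::vector<std::string> pool;
    for (const Individual& c : pop)
        if (c.sex == 'm' && c.id != offspring.id && nonsire.count(c.id) == 0 &&
            alive_on(c, conception) && conception - c.birth_day >= maturation_age_m)
            pool.push_back(c.id);
    return pool;
}

// Dams: female, alive and mature at birth, not a nondam; without twins, females that
// already have an offspring in this birth cohort are excluded as well.
std::vector<std::string> candidate_dams(const std::vector<Individual>& pop, const Individual& offspring,
                                        const std::set<std::string>& nondam,
                                        const std::set<std::string>& mothers_in_cohort,
                                        bool twins = false, int maturation_age_f = 1095) {
    const int birth = offspring.birth_day;
    std::vector<std::string> pool;
    for (const Individual& c : pop)
        if (c.sex == 'f' && c.id != offspring.id && nondam.count(c.id) == 0 &&
            (twins || mothers_in_cohort.count(c.id) == 0) &&
            alive_on(c, birth) && birth - c.birth_day >= maturation_age_f)
            pool.push_back(c.id);
    return pool;
}
```

A neighbour solution would then swap the currently assigned parent of one gap for another member of the corresponding pool.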
Pedigree files
Pedigree file in this context refers to a file containing a table with information on one individual of the population per row. Since TRACE is able to handle gaps (missing parental data), both a complete and a partial pedigree can be passed as an argument to calculate relatedness coefficients.
- Input file format: .txt (tab-separated)
- no header
- empty NA values (like "") lead to adverse behaviour or programme abort
- columns (order and format are mandatory): ID, sex, birth season/year, mom_ID, sire_ID, day of birth (DOB), day of death (DOD), nonsire, nondam (see the explanation for each column in the following table)
- please refer to the column "missing value" of the following table to ascertain the correct NA format for each attribute
column | data type | missing value | explanation | comment |
---|---|---|---|---|
ID | string | cannot be supported; no NA values possible | unique name for the individual | ID names have to be unique and have to be unambiguously assignable to parent IDs; every parent ID from mom_ID or sire_ID has to be listed in the pedigree file separately; ID names like UNK, NA, unknown, unkn_f, and unkn_m have to be avoided |
sex | char | u | sex of the individual | usage of the following options only f = female, m = male, or u = unknown sex |
birthseason | int | 0 | year or respective birth season the individual is born in | |
mom_ID | string | unknown | ID name of the mother | have to be relatable to exactly one ID, respectively one female individual in the pedigree file |
sire_ID | string | unknown | ID name of the sire | have to be relatable to exactly one ID, respectively one male individual in the pedigree file |
DOB | string (dateformat) | NA | day of birth | in the format: 01-01-1900 |
DOD | string (dateformat) | NA | day of death | in the format: 01-01-1900 |
nonsire | string | NA | all sires that are excluded as potential candidates for instance due to genetic analysis (important if sire_ID is missing for the individual) | IDs of previously excluded sires strung together (have to be relatable to exactly one ID of the respective sex in the pedigree); separated by @ e.g. indiv1@indiv2@indiv3; ensure that each individual has at least one remaining potential sire within the pedigree, else an individual without potential parent candidates will be assumed to be a founder individual, which means that the paternal gap will not be considered in the further analysis |
nondam | string | NA | all females that are excluded as potential maternal candidates for instance due to genetic analysis (important if mom_ID is missing for the individual) | IDs of previously excluded moms strung together (have to be relatable to exactly one ID of the respective sex in the pedigree); separated by @ e.g. indiv1@indiv2@indiv3; ensure that each individual has at least one remaining potential mother within the pedigree, else an individual without potential parent candidates will be assumed to be a founder individual, which means that the maternal gap will not be considered in the further analysis |
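Reading one row of this format could look like the following C++ sketch. It is a hypothetical helper, not TRACE's parser: it only checks the column count, non-empty fields, the sex code and a numeric birth season, keeps dates as strings, and splits the @-separated nonsire/nondam lists.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct PedigreeRow {
    std::string id;
    char sex = 'u';
    int birthseason = 0;                    // 0 = missing
    std::string mom_id, sire_id, dob, dod;  // "unknown" / "NA" mark missing values
    std::vector<std::string> nonsire, nondam;
};

// Split `s` at every occurrence of `delim`.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::istringstream stream(s);
    std::string item;
    while (std::getline(stream, item, delim)) parts.push_back(item);
    return parts;
}

// Parse one tab-separated line with the nine mandatory columns; returns false if malformed.
bool parse_row(const std::string& line, PedigreeRow& row) {
    const std::vector<std::string> cols = split(line, '\t');
    if (cols.size() != 9) return false;
    for (const std::string& c : cols)
        if (c.empty()) return false;  // empty values are not allowed
    if (cols[1] != "f" && cols[1] != "m" && cols[1] != "u") return false;
    try {
        row.birthseason = std::stoi(cols[2]);
    } catch (...) {
        return false;  // birth season must be an integer
    }
    row.id = cols[0];
    row.sex = cols[1][0];
    row.mom_id = cols[3];
    row.sire_id = cols[4];
    row.dob = cols[5];
    row.dod = cols[6];
    row.nonsire = cols[7] == "NA" ? std::vector<std::string>{} : split(cols[7], '@');
    row.nondam = cols[8] == "NA" ? std::vector<std::string>{} : split(cols[8], '@');
    return true;
}
```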
Dyadic files
- Input file format: .txt (tab-separated)
- no header
- empty NA values (like "") lead to adverse behaviour or programme abort
- columns (order and format are mandatory): ID_1, ID_2
- ID names have to be unique and have to be unambiguously assignable to pedigree IDs; every focal ID has to be listed in the pedigree separately; ID names like UNK, NA, unknown, unkn_f, and unkn_m have to be avoided
- example
Dyadic relatedness information (Simulated Annealing: realised and pedigree-derived relatedness values)
- Input file format: .txt (tab-separated)
- no header
- empty NA values (like "") lead to adverse behaviour or programme abort
- only dyads listed within this file will be considered as relevant for minimizing the variance between the pedigree-derived relatedness coefficient and the realised relatedness value
- columns (order and format are mandatory): ID_1, ID_2, pedigree_r, real_r
- ID names have to be unique and have to be unambiguously assignable to pedigree IDs; every focal ID has to be listed in the pedigree separately; ID names like UNK, NA, unknown, unkn_f, and unkn_m have to be avoided
- pedigree_r: dyadic relatedness coefficient from the incomplete pedigree; no NA values possible
- real_r: realised relatedness values of the dyad, obtained for instance from shared IBD segments; no NA values possible
- example
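A minimal C++ sketch of reading this file and summing the discrepancy over all listed dyads is shown below. The exact form of TRACE's cost function is not specified in this section; the sketch assumes the sum of absolute differences between pedigree_r and real_r, uses hypothetical names, and relies on whitespace-based extraction, which assumes that IDs contain no spaces.

```cpp
#include <cassert>
#include <cmath>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct DyadRecord {
    std::string id1, id2;
    double pedigree_r = 0.0;
    double real_r = 0.0;
};

// Read lines "ID_1<TAB>ID_2<TAB>pedigree_r<TAB>real_r" (no header); malformed lines are skipped.
std::vector<DyadRecord> read_dyads(std::istream& in) {
    std::vector<DyadRecord> dyads;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        DyadRecord d;
        if (fields >> d.id1 >> d.id2 >> d.pedigree_r >> d.real_r) dyads.push_back(d);
    }
    return dyads;
}

// Cost of a pedigree solution, assumed here to be the sum of absolute differences.
double total_discrepancy(const std::vector<DyadRecord>& dyads) {
    double sum = 0.0;
    for (const DyadRecord& d : dyads) sum += std::abs(d.pedigree_r - d.real_r);
    return sum;
}
```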
Relatedness calculation
To calculate the dyadic relatedness for some selected dyads of this partial pedigree, two input files are required: the pedigree file itself (one individual per row) and the preselected set of dyads to consider. The files used for that example are listed in the subsection Input files, while the resulting output (relatedness coefficients, path characteristics for the selected dyads, minimal inbreeding value and number of completely known generations per individual) can be viewed in the second section Output files.
I. Input files
ID | sex | birthseason | mom | sire | DOB | DOD | nonsire | nondam |
---|---|---|---|---|---|---|---|---|
A | f | 1905 | unknown | unknown | 01-01-1900 | NA | NA | NA |
B | f | 1911 | A | unknown | 01-01-1911 | NA | NA | NA |
C | m | 1912 | unknown | unknown | 01-01-1912 | NA | NA | NA |
D | f | 1913 | A | unknown | 01-01-1913 | NA | NA | NA |
E | f | 1914 | A | unknown | 01-01-1914 | NA | NA | NA |
F | m | 1915 | unknown | unknown | 01-01-1915 | NA | NA | NA |
G | m | 1920 | B | unknown | 01-01-1920 | NA | NA | NA |
H | f | 1921 | D | C | 01-01-1921 | NA | NA | NA |
I | m | 1922 | E | F | 01-01-1922 | NA | NA | NA |
J | m | 1923 | E | F | 01-01-1923 | NA | NA | NA |
K | m | 1928 | H | G | 01-01-1928 | NA | NA | NA |
L | f | 1929 | H | I | 01-01-1929 | NA | NA | NA |
ID_1 | ID_2 |
---|---|
C | F |
H | L |
I | J |
K | L |
C | G |
D | G |
D | J |
II. Output files/explanation
During the analysis, the following path characteristics were computed along the relatedness calculation:
The following table is taken from the Master's thesis by Hendrikje Westphal, submitted in December 2023 at Leipzig University, Germany
name | explanation | example |
---|---|---|
path | consecutive list of nodes along the relatedness path (edge directions are left unregarded) | E@A@B@G |
lca | lowest common ancestor within the path, that is the most recent ancestor both individuals share | A |
pathline | sequence of sexes (f/m/u) along the path | fffm |
kinline | whether the path consists solely of maternal ("mat") or paternal ancestors ("pat"); "mixed" if the path includes both maternal and paternal ancestors | mat |
depth | path length from LCA to each focal | 1/2 |
kin_class | kin class label based on the table of consanguinity (see below) | nephew-aunt |
fullhalf | whether two identical paths exist with different lowest common ancestors, e.g. to differentiate between full- and half-siblings | half |
min_DGD | minimal dyadic genealogical depth, which states the pedigree completeness for the dyad, i.e. the minimal number of fully resolved generations starting from both focals | 1 |
Consanguinity table (Wikipedia)
For instance, consider the dyad (E_G) from the pedigree example above. The focal individuals E (circle = female) and G (square = male) are related only through maternal ancestors (kinline = mat), because the individuals along the path (E-A-B-G) are female-female-female-male (pathline = fffm), whereby the first and the last sex belong to the focals. Therefore, E and G are purely maternally related. Furthermore, the lowest common ancestor A is one edge away from E and two edges away from G (depth = 1/2), which, in combination with the sexes, codes for the kin class nephew/aunt. Because they are related by exactly one path, they have to be a half nephew/aunt pair. Also, each focal has at least one unknown parent; therefore, the min_DGD is 1.
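The derivation of pathline, kinline and depth from a single path can be sketched in C++. The rules here are inferred from the example table and output rather than taken from TRACE's source: depth is the position of the LCA counted from each end of the path, and kinline looks at the ancestors on the path (every node except a focal that is not itself the LCA), returning "mat" if all are female, "pat" if all are male and "mixed" otherwise; treating unknown sex as "mixed" is an additional assumption. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct PathCharacteristics {
    std::string pathline;  // sexes along the path, e.g. "fffm"
    std::string kinline;   // "mat", "pat" or "mixed"
    std::string depth;     // edges from the LCA to each focal, e.g. "1/2"
};

PathCharacteristics describe_path(const std::vector<std::string>& path, const std::string& lca,
                                  const std::map<std::string, char>& sex) {
    assert(path.size() >= 2);
    const auto pos = std::find(path.begin(), path.end(), lca);
    assert(pos != path.end());
    const std::size_t i = static_cast<std::size_t>(pos - path.begin());
    PathCharacteristics pc;
    bool all_f = true, all_m = true;
    for (std::size_t k = 0; k < path.size(); ++k) {
        const char s = sex.at(path[k]);
        pc.pathline += s;
        const bool focal = (k == 0 || k + 1 == path.size());
        if (focal && k != i) continue;  // a focal that is not the LCA is not an ancestor
        all_f = all_f && s == 'f';
        all_m = all_m && s == 'm';
    }
    pc.kinline = all_f ? "mat" : (all_m ? "pat" : "mixed");
    pc.depth = std::to_string(i) + "/" + std::to_string(path.size() - 1 - i);
    return pc;
}
```

Applied to the path E@A@B@G with LCA A, this yields pathline fffm, kinline mat and depth 1/2, matching the example above.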
The full returned output file would look like this:
ID 1 | ID 2 | dyad | relatedness coefficient | paths | pathline | kinline | LCA | depth | kin_class | fullhalf | min_DGD |
---|---|---|---|---|---|---|---|---|---|---|---|
C | F | C_F | 0 | NA | NA | NA | NA | NA | nonkin | NA | 1 |
H | L | H_L | 0.531250000000000 | H@L/@/H@D@A@E@I@L | ff/@/ffffmf | mat/@/mixed | H/@/A | 0/1/@/2/3 | daughter&mother/@/1st-cousins-once-removed | half/@/half | 2 |
I | J | I_J | 0.500000000000000 | I@E@J/@/I@F@J | mfm/@/mmm | mat/@/pat | E/@/F | 1/1/@/1/1 | brothers/@/brothers | full/@/full | 2 |
K | L | K_L | 0.296875000000000 | K@H@L/@/K@H@D@A@E@I@L/@/K@G@B@A@D@H@L/@/K@G@B@A@E@I@L | mff/@/mffffmf/@/mmfffff/@/mmfffmf | mat/@/mixed/@/mixed/@/mixed | H/@/A/@/A/@/A | 1/1/@/3/3/@/3/3/@/3/3 | siblings/@/2nd-cousins/@/2nd-cousins/@/2nd-cousins | half/@/half/@/half/@/half | 2 |
C | G | C_G | 0 | NA | NA | NA | NA | NA | nonkin | NA | 1 |
D | G | D_G | 0.125000000000000 | D@A@B@G | fffm | mat | A | 1/2 | nephew&aunt | half | 1 |
D | J | D_J | 0.125000000000000 | D@A@E@J | fffm | mat | A | 1/2 | nephew&aunt | half | 1 |
Additionally, a second output file is generated, containing the pedigree with some additional information, such as the generational depth (column "full_generations", equivalent to min_DGD, but this time the exact value for the respective individual is returned instead of the minimum over both focals), the minimal inbreeding value, and a string listing the individuals that are potential mothers/sires of the individual concerned in case of unknown parents.
ID | sex | BS | mom | sire | DOB | DOD | pot_sire | pot_mom | full_generations | min_f |
---|---|---|---|---|---|---|---|---|---|---|
A | f | 1905 | unkn_f | unkn_m | 1-1-1900 | 0-0-0 | NA | NA | 1 | 0.000000000000000 |
B | f | 1911 | A | unkn_m | 1-1-1911 | 0-0-0 | NA | NA | 1 | 0.000000000000000 |
C | m | 1912 | unkn_f | unkn_m | 1-1-1912 | 0-0-0 | NA | NA | 1 | 0.000000000000000 |
D | f | 1913 | A | unkn_m | 1-1-1913 | 0-0-0 | NA | NA | 1 | 0.000000000000000 |
E | f | 1914 | A | unkn_m | 1-1-1914 | 0-0-0 | NA | NA | 1 | 0.000000000000000 |
F | m | 1915 | unkn_f | unkn_m | 1-1-1915 | 0-0-0 | NA | NA | 1 | 0.000000000000000 |
G | m | 1920 | B | unkn_m | 1-1-1920 | 0-0-0 | NA | NA | 1 | 0.000000000000000 |
H | f | 1921 | D | C | 1-1-1921 | 0-0-0 | NA | NA | 2 | 0.000000000000000 |
I | m | 1922 | E | F | 1-1-1922 | 0-0-0 | NA | NA | 2 | 0.000000000000000 |
J | m | 1923 | E | F | 1-1-1923 | 0-0-0 | NA | NA | 2 | 0.000000000000000 |
K | m | 1928 | H | G | 1-1-1928 | 0-0-0 | NA | NA | 2 | 0.031250000000000 |
L | f | 1929 | H | I | 1-1-1929 | 0-0-0 | NA | NA | 3 | 0.031250000000000 |
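The full_generations column (and min_DGD in the dyad list) is consistent with a simple recursion: an unknown parent contributes 0 generations, and each known individual adds one generation on top of its less resolved parent. This C++ sketch is inferred from the example above rather than taken from TRACE's source; the type and function names are hypothetical, and the recursion is not memoised, so it is only meant for small pedigrees.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical minimal pedigree: ID -> (mom_ID, sire_ID); "unknown" marks a gap.
using Parents = std::map<std::string, std::pair<std::string, std::string>>;

// Completely known generations of `id`: 0 for an unknown parent, otherwise
// 1 + the value of the less resolved parent.
int full_generations(const Parents& ped, const std::string& id) {
    const auto it = ped.find(id);
    if (it == ped.end()) return 0;  // "unknown" (or an ID missing from the file)
    return 1 + std::min(full_generations(ped, it->second.first),
                        full_generations(ped, it->second.second));
}

// min_DGD of a dyad: the smaller full_generations value of both focals.
int min_dgd(const Parents& ped, const std::string& a, const std::string& b) {
    assert(a != b);
    return std::min(full_generations(ped, a), full_generations(ped, b));
}
```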
Population Simulation
exemplary output of a simulated pedigree with 20 founder individuals born/started in 1950, simulated for 10 years: simulated pedigree and the respective list of dyadic relatedness coefficients. In total, 117 individuals were simulated (20 founders + 97 descendants with a complete ancestry, i.e. no parental gaps), which results in 1442 dyads.
- created with:
./pedigree_programme -f simulation -n 20 -s 10 -y 1950 -o ../example/population_simulation/example_simulation
Simulated Annealing
exemplary simulated annealing based on the simulated pedigree above (please refer to section Implementation/Simulated Annealing if you are unfamiliar with the idea behind the implemented algorithm)
- partial pedigree: randomly added paternal gap with a probability of 50% in all descendants of the simulated population
- complete pedigree: file from population simulation
- dyads: combined list of relatedness coefficients for each dyad, (1) from incomplete/partial pedigree and (2) realised relatedness. In this example, I could not use existing realised relatedness values from whole genome sequencing since the pedigree itself was simulated. Therefore, realised relatedness values are in this case the calculated pedigree-derived relatedness coefficients from the complete pedigree with added recombination noise. That means, instead of using the average relatedness for each kin class (like 0.25 for half-siblings, or 0.0625 for half first cousins), a bit more variance was added to these values (like 0.22 instead of 0.25), whereby the range, from which the added variance was randomly chosen, is based on simulated IBD values by Freudiger et al. in prep.
- simulated annealing started with
.\pedigree_programme -f annealing -p ..\example\simulated_annealing\example_simulation_incomplete.txt -d ..\example\simulated_annealing\example_simulation_dyads.txt -o ..\example\simulated_annealing\example_annealing_output -z ..\example\population_simulation\example_simulation.txt -x 0.999
- output files: final pedigree solution after simulated annealing, start solution pedigree (randomly filled pedigree) and visualization data
- simulated annealing assigned 39/43 gaps (90.7%) correctly (time: 1 minute, iterations: 2665, falsely assigned sires: 4) and thereby reduced the total discrepancy in relatedness (= cost function, i.e. the sum of all differences between pedigree-derived and realised relatedness values) from approximately 321 to 96 (minimization of the cost function: -70%), see the simulated annealing graph below (plotted visualization data). Minimizing the discrepancy towards 0 is highly unlikely due to the variance in the realised relatedness values compared to the statistical average of the pedigree-derived relatedness values.
v0.1.0
- TRACE v0.1.0 requires the birth season to assign individuals to generations. While the kinship calculation is still correct, the path characteristics may be computed incorrectly if the birth season is missing. The value can be fictional as long as it is numeric and provides a time reference for assigning individuals to generations (e.g. birth season of mother = 1, birth season of offspring = 2).
v0.1.1
- improve cross-platform compatibility (using LF (line feed) instead of CRLF (carriage return + line feed) to ensure universally recognizable line endings, including on Unix-based systems)
- fix the birth season dependency so that path characteristics are calculated correctly (see identified issues v0.1.0; the birth season is no longer necessary)
At this time, no identified issues have been reported. However, if any issues are discovered, please notify us by email at hendrikje.westphal@gmx.de
We would like to thank the Caribbean Primate Research Center (CPRC), especially Melween Martinez, Carlos A. Sariol Curbelo, Angelina Ruiz-Lambides, and all field staff, for their support of our work. We thank Richard McElreath and Peter Fröhlich for providing data storage and comprehensive IT support. We are grateful to Donald F. Conrad, Brian Miller, Noah Snyder-Mackler, Vladimir Jovanovic, Harald Ringbauer and Yilei Huang for their contribution to the preparation of the IBD data. We would like to thank Stefanie Bley for data management and Lars Kulik and Lydia Schmidt for their thoughtful input into this programme.
Please use the BibTeX format provided by GitHub, or cite this programme, referencing the specific version used, as follows:
Westphal, H., Freudiger, A., Gatter, T., Stadler, P., & Widdig, A. (2023). TRACE - Tool for pedigree Relatedness Analysis and Coefficient Estimation. (Version 0.1.0) [Computer software].
https://github.com/Hendrikjen/pedigree_programme
Contact email: hendrikje.westphal@gmx.de
Footnotes
1. Widdig, A., Muniz, L., Minkner, M., Barth, Y., Bley, S., Ruiz-Lambides, A., ... & Kulik, L. (2017). Low incidence of inbreeding in a long-lived primate population isolated for 75 years. Behavioral Ecology and Sociobiology, 71, 1-15. https://doi.org/10.1007/s00265-016-2236-6
2. Wang, B., Sverdlov, S., & Thompson, E. (2017). Efficient estimation of realized kinship from single nucleotide polymorphism genotypes. Genetics, 205(3), 1063-1078. https://doi.org/10.1534/genetics.116.197004
3. Li, H., Glusman, G., Hu, H., Caballero, J., Hubley, R., Witherspoon, D., ... & Huff, C. D. (2014). Relationship estimation from whole-genome sequence data. PLoS Genetics, 10(1), e1004144. https://doi.org/10.1371/journal.pgen.1004144
4. Brooks, S. P., & Morgan, B. J. (1995). Optimization using simulated annealing. Journal of the Royal Statistical Society Series D: The Statistician, 44(2), 241-257. https://doi.org/10.2307/2348448
5. Bertsimas, D., & Tsitsiklis, J. (1993). Simulated annealing. Statistical Science, 8(1), 10-15. https://doi.org/10.1214/ss/1177011077
6. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087-1092. https://doi.org/10.1063/1.1699114