Calculation of expected and observed heterozygosity #703

hammer · 2021-10-02T21:30:33Z · Maintainer

I wanted to start some discussion on the calculation (and API) of expected and observed heterozygosity, both of which are used in estimating the inbreeding coefficient (F) and fixation indices ($F_{ST}$). These metrics are implemented in scikit-allel and could be generalized further to support polyploid data and, in the case of expected heterozygosity, related samples.

hammer · 2021-10-02T21:31:22Z · Maintainer Author

(Posted by @timothymillar)

The observed heterozygosity is a measure of heterozygosity in the real population and is typically estimated from a sample of multiple individuals of that population.
In diploid populations the observed heterozygosity is simply the proportion of heterozygous individuals (see scikit-allel function).

In polyploids, heterozygosity is less binary because the additional allele copies allow a spectrum of heterozygosity.
For example, the tetraploid genotypes AAAB, AABB, AABC and ABCD are all heterozygous, but each to a different extent.

Hardy (2016) defined the individual heterozygosity $h_i$ as the probability that two alleles sampled at random (without replacement) at a locus in individual $i$ are not identical by state (IBS):

$$h_i = \frac{k_i}{k_i - 1}\left(1 - \sum_a p_{ia}^2\right)$$

where $p_{ia}$ is the frequency of allele $a$ in individual $i$ and $k_i$ is the ploidy of individual $i$.

In diploids this metric will be 1 for a het call (AB) and 0 for a hom call (AA).
In a tetraploid it will be 1 for a fully het call (ABCD), 0 for a fully hom call (AAAA) and strictly between 0 and 1 for any other call.
Hardy (2016) provides a table of values for tetraploids, hexaploids and octoploids which are useful for testing.

In terms of API, the individual heterozygosity can easily be calculated from `call_allele_counts`, which conveys both the ploidy and the allele frequencies of each call.
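
A minimal NumPy sketch of $h_i$, assuming per-call allele counts are stored as an array of shape `(..., n_alleles)` whose rows sum to the ploidy of each call. This layout and the name `individual_heterozygosity` are illustrative only and are not an existing sgkit API. The sketch uses the equivalent counting form $h_i = 1 - \sum_a c_{ia}(c_{ia}-1) / (k_i(k_i-1))$ with $c_{ia} = k_i p_{ia}$, which makes the "sampling without replacement" interpretation explicit:

```python
import numpy as np


def individual_heterozygosity(allele_counts):
    """Hardy (2016) individual heterozygosity for each call (sketch).

    allele_counts: array of shape (..., n_alleles); entry c_ia is the number of
    copies of allele a in call i, so each row sums to the ploidy k_i.
    Assumes complete calls with ploidy >= 2 (k_i = 1 would divide by zero).
    """
    c = np.asarray(allele_counts, dtype=float)
    k = c.sum(axis=-1)
    # Probability that two alleles drawn without replacement are identical by state
    p_same = (c * (c - 1)).sum(axis=-1) / (k * (k - 1))
    return 1.0 - p_same


# Tetraploid calls AAAA, AAAB, AABB, AABC, ABCD as counts of alleles A, B, C, D
tetraploid_calls = np.array([
    [4, 0, 0, 0],
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [2, 1, 1, 0],
    [1, 1, 1, 1],
])
print(individual_heterozygosity(tetraploid_calls))
```

For these five calls the sketch gives 0, 1/2, 2/3, 5/6 and 1. These values can be checked against the tables in Hardy (2016).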

The observed heterozygosity of a sample is simply the mean of the individual heterozygosities:

$$H_o = \frac{1}{N}\sum_{i=1}^{N} h_i$$

where $N$ is the number of individuals in the sample.
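
A sketch of $H_o$ per variant, assuming a hypothetical `(n_variants, n_samples, n_alleles)` allele-count array. Each call's $h_i$ uses that call's own ploidy, so mixed-ploidy samples are handled, but missing calls are not. The function name is illustrative:

```python
import numpy as np


def observed_heterozygosity(call_allele_counts):
    """H_o per variant as the mean Hardy (2016) h_i over the N samples (sketch).

    call_allele_counts: array of shape (n_variants, n_samples, n_alleles)
    (an assumed layout). Calls may differ in ploidy, but each must be
    complete with ploidy >= 2.
    """
    c = np.asarray(call_allele_counts, dtype=float)
    k = c.sum(axis=-1)                                    # ploidy of each call
    h = 1.0 - (c * (c - 1)).sum(axis=-1) / (k * (k - 1))  # h_i for each call
    return h.mean(axis=-1)                                # average over samples


# One diploid variant with calls AA, AB, AB, BB
diploid = np.array([[[2, 0], [1, 1], [1, 1], [0, 2]]])
# One tetraploid variant with calls AAAA, AAAB, AABB
tetraploid = np.array([[[4, 0], [3, 1], [2, 2]]])
print(observed_heterozygosity(diploid))
print(observed_heterozygosity(tetraploid))
```

For the diploid variant the result is 0.5, which is the proportion of heterozygous individuals, as in the usual diploid definition. For the tetraploid variant it is $(0 + 1/2 + 2/3)/3 = 7/18$.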

hammer · 2021-10-02T21:33:02Z · Maintainer Author

(Posted by @timothymillar)

The expected heterozygosity is the expected rate of heterozygosity if a given population is in Hardy-Weinberg equilibrium (HWE).
It is typically estimated using the allele frequencies observed in a sample of that population.

Nei and Roychoudhury (1974) give a formula for expected heterozygosity when the true allele frequencies of a population are known:

$$H_e = 1 - \sum_a p_a^2$$

where $p_a$ is the frequency of allele $a$ in the population.
For a sample of a population, the estimate is typically corrected for the number of allele copies in the sample:

$$\hat{H}_e = \frac{M}{M-1}\left(1 - \sum_a \hat{p}_a^2\right)$$

where $\hat{p}_a$ is the frequency of allele $a$ in the sample and $M$ is the total number of allele copies in the sample, e.g. $2N$ for a diploid sample.
This formula can be applied as-is to polyploid populations (see Meirmans et al. (2018)).
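
A sketch of the sample-corrected estimate for a single variant. It assumes an `(n_samples, n_alleles)` allele-count array, which is a hypothetical layout. Because $M$ is simply the total number of allele copies, the same code covers diploid, polyploid and mixed-ploidy samples:

```python
import numpy as np


def expected_heterozygosity(call_allele_counts):
    """Sample-corrected expected heterozygosity M/(M-1) * (1 - sum_a p_a^2).

    call_allele_counts: array of shape (n_samples, n_alleles) for a single
    variant (an assumed layout); works for any ploidy, including mixtures.
    """
    c = np.asarray(call_allele_counts, dtype=float)
    allele_totals = c.sum(axis=0)          # copies of each allele in the sample
    m = allele_totals.sum()                # M: total allele copies, e.g. 2N
    p = allele_totals / m                  # sample allele frequencies
    return m / (m - 1) * (1.0 - np.sum(p ** 2))


calls = np.array([[2, 0], [1, 1], [1, 1], [0, 2]])   # AA, AB, AB, BB
print(expected_heterozygosity(calls))
```

Here $M = 8$ and $\hat{p} = (0.5, 0.5)$, so the estimate is $\tfrac{8}{7} \times 0.5 = 4/7$ rather than the uncorrected 0.5.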

A limitation of this method is that it assumes all individuals in the sample are unrelated and non-inbred.

Nei and Chesser (1983) introduced a method of calculating expected heterozygosity that attempts to correct for the bias due to multinomial sampling of genotypes:

$$\hat{H}_e = \frac{N}{N-1}\left(1 - \sum_a \hat{p}_a^2 - \frac{H_o}{2N}\right)$$

where $H_o$ is the observed heterozygosity.
This method only differs substantially from the method above when the sample size $N$ is small.
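
A sketch of the Nei and Chesser (1983) diploid estimator for one variant. It assumes complete diploid calls in a hypothetical `(n_samples, n_alleles)` count array. $H_o$ is taken as the proportion of calls that carry more than one distinct allele:

```python
import numpy as np


def nei_chesser_expected_heterozygosity(call_allele_counts):
    """Nei and Chesser (1983) expected heterozygosity for one diploid variant.

    call_allele_counts: array of shape (n_samples, n_alleles) where every row
    sums to 2 (assumed layout; complete diploid calls only).
    """
    c = np.asarray(call_allele_counts, dtype=float)
    n = c.shape[0]                              # N individuals
    p = c.sum(axis=0) / c.sum()                 # sample allele frequencies
    h_obs = np.mean((c > 0).sum(axis=-1) > 1)   # proportion of het calls
    return n / (n - 1) * (1.0 - np.sum(p ** 2) - h_obs / (2 * n))


calls = np.array([[2, 0], [1, 1], [1, 1], [0, 2]])   # AA, AB, AB, BB
print(nei_chesser_expected_heterozygosity(calls))
```

With only four individuals this gives $7/12 \approx 0.583$. The $M/(M-1)$ sample-size correction gives $4/7 \approx 0.571$ for the same calls, and both approach $1 - \sum_a \hat{p}_a^2$ as $N$ grows.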

Hardy (2016) adapted this method to polyploids:

$$\hat{H}_e = \frac{N}{N-1}\left(1 - \sum_a \hat{p}_a^2 - \frac{(k-1)\,H_o}{kN}\right)$$

where $k$ is the population ploidy.
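
A sketch of the polyploid form, with $H_o$ computed as the mean of Hardy's $h_i$. The term $(k-1)H_o/(kN)$ used here is our reading of Hardy (2016), so please check it against the paper before relying on it. One consistency check is that it reduces to the Nei and Chesser $H_o/(2N)$ term at $k = 2$. The function name and the `(n_samples, n_alleles)` layout with a single shared ploidy are assumptions:

```python
import numpy as np


def hardy_expected_heterozygosity(call_allele_counts):
    """Nei-Chesser style expected heterozygosity generalised to ploidy k (sketch).

    call_allele_counts: array of shape (n_samples, n_alleles) for a single
    variant where every row sums to the same ploidy k (assumed layout).
    """
    c = np.asarray(call_allele_counts, dtype=float)
    n = c.shape[0]                                     # N individuals
    k = c.sum(axis=-1)[0]                              # population ploidy
    p = c.sum(axis=0) / c.sum()                        # sample allele frequencies
    h_i = 1.0 - (c * (c - 1)).sum(axis=-1) / (k * (k - 1))
    h_obs = h_i.mean()                                 # H_o as mean of h_i
    return n / (n - 1) * (1.0 - np.sum(p ** 2) - (k - 1) * h_obs / (k * n))


tetraploid = np.array([[4, 0], [3, 1], [2, 2]])        # AAAA, AAAB, AABB
diploid = np.array([[2, 0], [1, 1], [1, 1], [0, 2]])   # AA, AB, AB, BB
print(hardy_expected_heterozygosity(tetraploid))
print(hardy_expected_heterozygosity(diploid))
```

The tetraploid example gives $5/12$. The diploid example gives $7/12$, which is the same value the diploid Nei and Chesser formula yields for those calls.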

Harris and DeGiorgio (2016) provide an alternative method, heterozygosity BLUE, that can calculate expected heterozygosity in the presence of related and/or inbred individuals of any ploidy. Their method takes a kinship matrix, which is used to weight allele frequencies according to the relatedness among individuals:

$$\tilde{H}_e = \frac{1 - \sum_a \tilde{p}_a^2}{1 - \kappa_2}$$

where $\tilde{p}_{a}$ is the BLUE (best linear unbiased estimator) of the frequency of allele $a$ in the population and $\kappa_2$ is a weighted mean kinship coefficient over all pairs of individuals.

Heterozygosity BLUE is identical to the sample-corrected Nei and Roychoudhury estimate when the kinship matrix indicates unrelated and non-inbred individuals (i.e. the kinship matrix is the identity matrix divided by ploidy).
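
A minimal sketch of heterozygosity BLUE for one variant. It assumes the BLUE allele frequencies are weighted means of the within-individual frequencies, using generalized-least-squares weights $w \propto \Phi^{-1}\mathbf{1}$ normalised to sum to 1, and that $\kappa_2 = w^\top \Phi w$. These weight definitions are our assumption about how the kinship matrix $\Phi$ enters, not code taken from Harris and DeGiorgio. The function name and array layout are also hypothetical:

```python
import numpy as np


def heterozygosity_blue(call_allele_counts, kinship):
    """Sketch of heterozygosity BLUE for a single variant.

    call_allele_counts: (n_samples, n_alleles) allele counts per call.
    kinship: (n_samples, n_samples) kinship matrix with self-kinship on the
    diagonal (1/k for a non-inbred k-ploid); assumed to be invertible.
    """
    c = np.asarray(call_allele_counts, dtype=float)
    phi = np.asarray(kinship, dtype=float)
    p_ind = c / c.sum(axis=-1, keepdims=True)   # within-individual frequencies p_ia
    # Assumed GLS weights: w proportional to phi^{-1} 1, normalised to sum to 1
    u = np.linalg.solve(phi, np.ones(len(phi)))
    w = u / u.sum()
    p_blue = w @ p_ind                          # weighted allele frequencies
    kappa2 = w @ phi @ w                        # weighted mean kinship over all pairs
    return (1.0 - np.sum(p_blue ** 2)) / (1.0 - kappa2)


calls = np.array([[2, 0], [1, 1], [1, 1], [0, 2]])   # AA, AB, AB, BB
unrelated = np.eye(4) / 2                           # non-inbred, unrelated diploids
siblings = unrelated.copy()
siblings[0, 1] = siblings[1, 0] = 0.25              # first two samples as full sibs
print(heterozygosity_blue(calls, unrelated))
print(heterozygosity_blue(calls, siblings))
```

With the unrelated kinship matrix, every weight is $1/N$ and $\kappa_2 = 1/M$, so the result equals the $M/(M-1)$ corrected estimate ($4/7$ here). This matches the equivalence described above. With mixed ploidy, the weights become proportional to each individual's ploidy, which reproduces the pooled allele frequencies.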

hammer · 2021-10-02T21:33:25Z · Maintainer Author

Thank you, @timothymillar! This detail is very useful for those of us without a strong background in population genetics!

On a related note, I collected some links based on an email that @alimanfoo sent me and crafted a post on https://discourse.pystatgen.org/t/genome-wide-selection-scans/90 to help us come up to speed on this topic.

Please do pass along any additional details specific to your work that we can read up on!

hammer · 2021-10-02T21:33:39Z · Maintainer Author

(Posted by @timothymillar)

Thanks @hammer

> Please do pass along any additional details specific to your work that we can read up on!

Will do, but I'm still figuring out the specifics!
