Software package for BalLeRMix and scripts used in the study "Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection" (Cheng & DeGiorgio 2020)

bioXiaoheng/BalLeRMix

BalLeRMix---Balancing selection Likelihood Ratio Mixture models

This repository hosts the software package for BalLeRMix and scripts used in the study "Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection" (Cheng & DeGiorgio 2020).

  • For the software, go to BalLeRMix/software/
  • For scripts used for SLiM simulations, go to BalLeRMix/Simulation_scripts/
  • For scripts used in empirical analyses, go to BalLeRMix/Empirical_analysis/

Please cite the following manuscript if using this software:

Xiaoheng Cheng, Michael DeGiorgio (2020) Flexible mixture model approaches that accommodate footprint size variability for robust detection of balancing selection. Molecular Biology and Evolution, 37(11): 3267--3291


In BalLeRMix v2, we introduce the -m <m> argument to customize the presumed number of alleles being balanced at the selected sites, in case you want to look for multi-allelic balancing selection. The default value is 2.

2020.6.22-Update: Updated the model for multi-allelic balancing selection in v2.2.

2020.2.5-Update: Fixed a minor bug in the initialization module.


Quick Guide

usage: BalLeRMix.py [-h] -i INFILE --spect SPECTFILE [-o OUTFILE] [-m M]
                      [--getSpect] [--getConfig] [--nofreq] [--nosub] [--MAF]
                      [--physPos] [--rec RRATE] [--fixSize] [-w R]
                      [--noCenter] [-s STEP] [--fixX X] [--rangeA SEQA]
                      [--listA LISTA]

Run python BalLeRMix.py -h to see the full help page.

1. Input format

For B0 and B2 statistics, the user should first generate the tab-delimited site frequency spectrum file, without header, e.g.:

<k> <sample size n> <proportion in the genome>
1 50 0.03572
2 50 0.02024
...

or, for the B1 statistic, the configuration file with the polymorphism/substitution ratio, without header, e.g.:

<sample size n> <proportion of substitutions> <proportion of polymorphisms>
50 0.7346 0.2654

Each input file should have four columns: physical position, genetic position, number of derived (or minor) alleles observed, and total number of alleles observed (i.e., sample size). The file should be tab-delimited and should have a header, e.g.:

physPos genPos x n
16 0.000016 50 50
35 0.000035 12 50
...
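
As an illustration of the layout described above, here is a minimal Python sketch that parses a four-column input table and sanity-checks each row. The function name read_sites and the specific checks are assumptions made for this example; they are not part of BalLeRMix.

```python
import csv
import io


def read_sites(handle):
    """Parse a tab-delimited table with a header and four columns.

    Returns a list of (physPos, genPos, x, n) tuples.
    Hypothetical helper for illustration only.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    sites = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        phys, gen, x, n = int(row[0]), float(row[1]), int(row[2]), int(row[3])
        if not 0 <= x <= n:
            raise ValueError(f"line {line_no}: allele count {x} outside 0..{n}")
        sites.append((phys, gen, x, n))
    return sites


# The two example rows shown above, joined with tabs.
example = "physPos\tgenPos\tx\tn\n16\t0.000016\t50\t50\n35\t0.000035\t12\t50\n"
sites = read_sites(io.StringIO(example))
print(sites)
```

To check a file on disk, open it and pass the handle, e.g. read_sites(open("my_input.txt")), where my_input.txt is a placeholder name.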

2. Running the B statistics

To perform B2 scans on your input data, use

python BalLeRMix.py -i <input> --spect <derived allele frequency spectrum> -o <output>

To perform B2,MAF scans on your input data, use

python BalLeRMix.py -i <input> --spect <minor allele frequency spectrum> -o <output> --MAF

To perform B1 scans on your input data, use

python BalLeRMix.py -i <input> --spect <sub/poly configuration file> -o <output> --nofreq

To perform B0 scans on your input data, use

python BalLeRMix.py -i <input> --spect <derived allele frequency spectrum> -o <output> --nosub

To perform B0,MAF scans on your input data, use

python BalLeRMix.py -i <input> --spect <minor allele frequency spectrum> -o <output> --nosub --MAF
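
If you run several statistics on the same data, it may help to assemble the command lines programmatically. The sketch below maps each statistic to the extra flags listed in this section and passes the helper file with --spect, following the usage line in the Quick Guide (that choice is an assumption of this example). The labels such as "B2maf" and all file names are hypothetical.

```python
import shlex

# Extra flags per statistic, taken from the commands in this section.
# The dictionary keys are labels invented for this example.
STAT_FLAGS = {
    "B2": [],
    "B2maf": ["--MAF"],
    "B1": ["--nofreq"],
    "B0": ["--nosub"],
    "B0maf": ["--nosub", "--MAF"],
}


def build_command(stat, infile, helper, outfile, script="BalLeRMix.py"):
    """Return the argument list for one scan (hypothetical helper)."""
    if stat not in STAT_FLAGS:
        raise KeyError(f"unknown statistic label: {stat}")
    return ["python", script, "-i", infile, "--spect", helper,
            "-o", outfile, *STAT_FLAGS[stat]]


cmd = build_command("B0maf", "chr1.txt", "maf_spect.txt", "chr1_B0maf.txt")
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run(cmd, check=True) to launch the scan.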

3. Generate helper files

To generate the spectrum file for B2:

python BalLeRMix.py -i <concatenated input> --getSpect --spect <spectrum file name>

To generate the spectrum file for B2,MAF:

python BalLeRMix.py -i <concatenated input> --getSpect --MAF --spect <spectrum file name>

To generate the configuration file for B1:

python BalLeRMix.py -i <concatenated input> --getConfig --spect <config file name>

To generate the spectrum file for B0:

python BalLeRMix.py -i <concatenated input> --getSpect --nosub --spect <spectrum file name>

To generate the spectrum file for B0,MAF:

python BalLeRMix.py -i <concatenated input> --getSpect --nosub --MAF --spect <spectrum file name>
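
To make the spectrum file format (<k> <sample size n> <proportion in the genome>) more concrete, here is a rough Python sketch that tallies derived allele counts into such rows. It is not the implementation behind --getSpect: the function name, the choice to normalize proportions within each sample size, and the treatment of sites with x equal to n as substitutions are all assumptions of this example. Use the commands above to produce the files BalLeRMix actually expects.

```python
from collections import Counter


def derived_spectrum(sites, include_substitutions=True):
    """Tally (x, n) pairs into (k, n, proportion) rows.

    Assumptions of this sketch: sites with x == n count as substitutions,
    sites with x == 0 are uninformative and skipped, and proportions are
    normalized separately for each sample size n.
    """
    counts = Counter()
    totals = Counter()
    for x, n in sites:
        if x <= 0 or x > n:
            continue  # skip uninformative or malformed rows
        if x == n and not include_substitutions:
            continue  # polymorphism-only spectrum
        counts[(x, n)] += 1
        totals[n] += 1
    ordered = sorted(counts, key=lambda kn: (kn[1], kn[0]))
    return [(k, n, counts[(k, n)] / totals[n]) for k, n in ordered]


# Toy data: (derived allele count, sample size) per site.
toy_sites = [(50, 50), (12, 50), (12, 50), (1, 50)]
rows = derived_spectrum(toy_sites)
for k, n, p in rows:
    print(f"{k}\t{n}\t{p:.5f}")
```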

4. Customizing the scan

All arguments besides the aforementioned ones are for customizing the scan.

  • [--physPos] [--rec RRATE]:

    By default, BalLeRMix uses genetic distances (in cM) to compute likelihoods. To direct the software to use physical positions instead, use --physPos and indicate the uniform recombination rate (cM/nt) in your species of interest with --rec. The default value is 10^-6 cM/nt.

    This option is invoked automatically if you choose to fix the window size (e.g., 1000 bp or 5 kb), in which case you should make sure the software is given the correct recombination rate. Using physical positions will also change how you define window sizes and step sizes, if you were to customize the scanning window.

  • [--fixX X] [--rangeA SEQA] [--listA LISTA]:

    These arguments allow you to specify the parameter space that the software optimizes over. The presumed equilibrium frequency is x, and the rate of decay in linkage disequilibrium is A. If you choose to look for multi-allelic balancing selection, where more than two alleles are being balanced, x should be a vector of descending equilibrium frequencies whose length matches the number of balanced alleles you chose to scan for (via -m).

  • [--fixSize] [-w R] [--noCenter] [-s STEP] [--physPos]:

    These arguments are for customizing the scanning window. You probably won't need them because BalLeRMix is robust to window sizes. For more details on how these arguments work, check the v1 software manual.
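
As a worked example of the uniform recombination rate used with --physPos and --rec, the short sketch below converts physical positions and a window length into cM by multiplying by the rate. The function name is hypothetical, and the conversion is the simple uniform-rate arithmetic implied by the description above rather than code taken from BalLeRMix.

```python
DEFAULT_RATE_CM_PER_NT = 1e-6  # default rate stated for --rec


def phys_to_cm(length_nt, rate_cm_per_nt=DEFAULT_RATE_CM_PER_NT):
    """Convert a physical position or length (nt) to cM under a uniform rate."""
    if rate_cm_per_nt <= 0:
        raise ValueError("recombination rate must be positive")
    return length_nt * rate_cm_per_nt


# Physical positions from the example input, and a 5 kb window.
positions_cm = [phys_to_cm(p) for p in (16, 35)]
window_cm = phys_to_cm(5000)
print(positions_cm, window_cm)
```

With the default rate, positions 16 and 35 map to about 0.000016 and 0.000035 cM, matching the genPos column in the example input, and a 5 kb window spans about 0.005 cM.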
