- refresh for new R version
- work on `assigner::fst_WC84`
- Version bump because it updates numerous packages: `tidyr`, `readr`, using `future`, `carrier`.
- GitHub Actions to run the R-CMD-check on the 3 OSes.
- Bug fix in the heatmap: `digits` was hard-coded to 5, and when `pop.levels = NULL` the heatmap was all mixed up. Thanks to @siberianhigh for reporting the bug.
- included 2 simulated datasets
- updated documentation of `assignment_ngs`
- vignette to get started with assigner, finally!
- this is really starting to smell like a CRAN release
- work on Travis CI
- work on pkgdown
- cosmetic changes to the package: using `pkgdown`
- updated documentation of `assignment_ngs`
- `fst_WC84`: works faster
- continue to integrate `assigner` with `SeqArray` and GDS objects/files
- `fst_WC84`: works with `radiator` v.1.0
- will continue updating functions to work with the latest `radiator` release and work toward releasing the official v.1.0 of assigner.
- Imputation module was removed from `assigner` and now lives exclusively in package `grur`.
- working to make assigner work correctly with ggplot2 v.3.0.0
- assigner ready for R 3.5.1 "Feather Spray" released on 2018/07/05
- bug in `assignment_mixture` generated by `purrr::df`, recently replaced by `purrr::dfr`. Changed the `DESCRIPTION` field accordingly.
- `subsample` argument in `assignment_ngs` and `assignment_mixture` can now automatically detect the smallest sample size among the data's groups, so you can use `subsample = "min"` to let the function decide (if you're not sure).
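
  A minimal base-R sketch of what `subsample = "min"` means conceptually, assuming one sample name per individual and a parallel vector of group labels: find the smallest group size, then draw that many individuals from every group. `subsample_min()` is a hypothetical helper for illustration, not `assigner`'s internal code.

  ```r
  # Hypothetical helper: subsample every group down to the smallest group size.
  subsample_min <- function(ind, group) {
    sizes <- table(group)                      # number of individuals per group
    n_min <- min(sizes)                        # what subsample = "min" resolves to
    keep <- unlist(lapply(split(ind, group), function(x) {
      x[sample.int(length(x), n_min)]          # draw n_min individuals, no replacement
    }), use.names = FALSE)
    list(n_min = n_min, individuals = keep)
  }

  set.seed(42)
  ind <- sprintf("ind%02d", 1:10)
  pop <- rep(c("pop1", "pop2", "pop3"), times = c(5, 3, 2))
  res <- subsample_min(ind, pop)
  res$n_min        # 2: pop3 is the smallest group
  res$individuals  # 2 individuals from each of the 3 groups
  ```

  Sampling without replacement keeps each retained individual unique, so every group contributes exactly `n_min` distinct samples.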
- migration of `assigner` from `stackr` to `radiator`
- restored the progress bar when using parallel computing by installing the new dev version of the `pbmcapply` package.
- bug fix: removed the progress bar when using parallel computing. This is temporary, while waiting for a fix in the `pbmcapply` package.
- `assigner` works with `dplyr` v.0.7.0
- `dlr`: simplified arguments, faster function, and it now creates the Dlr plots
- dependencies on package `SNPRelate` are removed until the bugs with the Fst calculation are resolved.
- bug fix in `assignment_ngs`, introduced in the last commit that was supposed to be a fix. The problem was introduced by `stackr::change_pop_names`.
- it's now official, `assigner` has a logo
- faster `fst_NEI87`
- the impact of an unbalanced design on estimates can be tested with `subsample` and `iteration.subsample` in `fst_NEI87` and `fst_WC84`
- until the `SNPRelate` bias issue is resolved, the option is unavailable
- better use of `pbmcapply` for Windows
- imputation is being reworked and will be buggy until the next update. The code is being completely rewritten and arguments will change (for the better).
- debug code to work in parallel with Windows
- code cleaning to prep for CRAN
- `assignment_ngs` and `assignment_mixture`: code cleaning to prep for CRAN and make them easier to debug.
- I'm pleased to announce that `assigner` now works in parallel with Windows
- fixed a bug introduced in the last commit in `write_gsi_sim`, where the file was not created properly by an internal module.
- `assigner::fst_WC84` can now use `SNPRelate` to compute Fst. Confidence intervals are not implemented yet. The speed increase left me speechless: datasets with 30K SNPs are computed in less than 15 sec!
- `assigner::fst_WC84` is 40% faster!
- bug fix in `assignment_ngs` during imputation: the imputation module could not recognise that REF/ALT alleles are not necessary or useful for assignment analysis.
- enhancement to `assignment_ngs` and `assignment_mixture`: when `marker.number` includes `"all"`, `iteration.method` is automatically set to `1` when conducting the assignment with all the markers. Iterations at this point are useless and a waste of time.
- the random seed number is now stored in the appropriate files.
- `assignment_mixture`: with `assignment.analysis = "gsi_sim"`, the unknown/mixture samples are compared with baseline populations using the markers common to each pair. The tables now include the number of markers used, and the summary provides the mean number of markers. This number will change each time randomness is used.
- bug fix: populations were not recognised properly
- `fst_NEI87`: very fast function that can compute the overall and pairwise Nei's (1987) Fst and F'st (prime). Bootstrap resampling of markers is available to build confidence intervals. The estimates are available as a data frame and as a matrix with the upper diagonal filled with Fst values and the lower diagonal filled with the confidence intervals. Jost's D is also given ;)
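
  For a single locus, the textbook forms of these statistics follow from expected heterozygosities: within-population H_S, total H_T, Nei's G_ST = (H_T - H_S) / H_T, and Jost's D = [(H_T - H_S) / (1 - H_S)] * [k / (k - 1)] for k populations. The base-R sketch below applies these uncorrected formulas to made-up allele frequencies. `fst_NEI87` works from genotype data and may apply Nei's (1987) sample-size corrections, so its estimates are not expected to match this simplified version.

  ```r
  # Single-locus sketch without sample-size correction.
  # freq: rows = populations, columns = alleles; each row sums to 1.
  nei_fst_jost_d <- function(freq) {
    k <- nrow(freq)                          # number of populations
    hs <- mean(1 - rowSums(freq^2))          # mean within-population heterozygosity (H_S)
    p_bar <- colMeans(freq)                  # pooled allele frequencies, equal pop weights
    ht <- 1 - sum(p_bar^2)                   # total heterozygosity (H_T)
    c(
      hs = hs,
      ht = ht,
      fst = (ht - hs) / ht,                  # Nei's G_ST
      jost_d = ((ht - hs) / (1 - hs)) * (k / (k - 1))
    )
  }

  freq <- rbind(pop1 = c(0.9, 0.1), pop2 = c(0.5, 0.5))
  res <- nei_fst_jost_d(freq)
  round(res, 4)  # hs = 0.34, ht = 0.42, fst ~ 0.1905, jost_d ~ 0.2424
  ```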
- `fst_WC84`: bug fix, the function was not properly configured for multi-allelic markers (e.g. microsatellites and the haplotype format from STACKS). Thanks to Craig McDougall for catching this.
- `assignment_mixture`: added a check to throw an error when `pop.levels` != the `pop.id` in strata
- `assignment_mixture`:
    - updated with the latest modules from `stackr`
    - simplified the identification of mixture or unknown samples. See documentation.
- updated vignettes
- major bug fix involving the new `dplyr` version (0.5.0), mostly with the use of `dplyr::distinct`
- updated vignettes
- bug fix in `fst_WC84`
- bug fix between assigner -> devtools -> GitHub -> Travis, [this page helped](http://itsalocke.com/using-travis-make-sure-use-github-pat/)
- While replacing some lines using `tidyr::spread` and `tidyr::gather` with `data.table::dcast.data.table` and `data.table::melt.data.table` to make the code faster, I forgot to split genotypes into alleles for `gsi_sim`.
- please update both `stackr` and `assigner`
- the build error from Travis will be fixed soon. It should not affect the package "experience" in any way.
- you need to update [stackr](https://github.com/thierrygosselin/stackr) to v.0.2.7 to appreciate this new version of assigner.
- updated `assignment_ngs` with the separate `stackr` modules to simplify the function.
- new data file available for `assignment_ngs`: `genepop` and `genind` objects.
- `assignment_ngs` now accepts any VCF input file, i.e. it's no longer limited to STACKS VCF files.
- new arguments in `assignment_ngs`. The assignment using DAPC can now use the optimized alpha-score (`adegenet.dapc.opt = "optim.a.score"`) or cross-validation (`adegenet.dapc.opt = "xval"`). This is useful for fine-tuning the trade-off between power of discrimination and over-fitting (for stability of group membership probabilities). Cross-validation with `adegenet.dapc.opt = "xval"` doesn't work with missing data, so it's only available with imputed data (i.e. `imputation.method = "rf"` or `"max"`). With non-imputed data, or by default, the optimized alpha-score is used (`adegenet.dapc.opt = "optim.a.score"`). When using `adegenet.dapc.opt = "xval"`, 2 new arguments are available: (1) `adegenet.n.rep` and (2) `adegenet.training`. See documentation for details.
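
  The rule for choosing the DAPC optimisation can be summarised with a small base-R helper. `choose_dapc_opt()` is hypothetical and only encodes what is stated above: cross-validation needs imputed data, otherwise the optimized alpha-score is used. It is not how `assignment_ngs` is actually written.

  ```r
  # Hypothetical helper mirroring the documented fallback rule.
  choose_dapc_opt <- function(adegenet.dapc.opt = c("optim.a.score", "xval"),
                              imputation.method = NULL) {
    opt <- match.arg(adegenet.dapc.opt)      # default resolves to "optim.a.score"
    if (opt == "xval" && is.null(imputation.method)) {
      opt <- "optim.a.score"                 # xval cannot handle missing genotypes
    }
    opt
  }

  choose_dapc_opt()                                  # "optim.a.score"
  choose_dapc_opt("xval")                            # "optim.a.score" (no imputation)
  choose_dapc_opt("xval", imputation.method = "rf")  # "xval"
  ```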
- removed arguments in `assignment_ngs`: the `pop.id.start` and `pop.id.end` arguments, which were confusing people. For those used to these arguments, they are now recycled in the new function `individuals2strata` in [stackr](https://github.com/thierrygosselin/stackr). The strata file created by this function can be used with the `strata` argument in `assignment_ngs`.
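
  The idea behind `pop.id.start` and `pop.id.end` (taking the population id from fixed character positions in each individual's name) can be sketched in base R. The sample names, the column names `INDIVIDUALS` and `STRATA`, and the tab-separated layout are assumptions for illustration; check the `individuals2strata` documentation for its actual arguments and output.

  ```r
  # Illustration only: population id = characters 1 to 3 of each sample name.
  ind <- c("ABC-001", "ABC-002", "XYZ-001")
  pop.id.start <- 1
  pop.id.end <- 3
  strata <- data.frame(
    INDIVIDUALS = ind,
    STRATA = substr(ind, pop.id.start, pop.id.end),  # "ABC" "ABC" "XYZ"
    stringsAsFactors = FALSE
  )
  # Write a tab-separated strata file to a temporary location
  strata.file <- tempfile(fileext = ".tsv")
  write.table(strata, strata.file, sep = "\t", quote = FALSE, row.names = FALSE)
  ```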
- 2 modified arguments in `assignment_ngs`: (1) `gsi_sim.filename` is now `filename`; and (2) if you didn't use the imputation argument, replace `imputation.method = FALSE` with `imputation.method = NULL` or leave the argument missing.
- simplified sections of code in `assignment_ngs` that dealt with `strata`, `pop.levels` and `pop.labels`.
- new function: `write_gsi_sim`. Writes a gsi_sim file from a data frame (wide or long/tidy). Used internally in [assigner](https://github.com/thierrygosselin/assigner) and might be of interest to users.
- Added a `NEWS.md` file to track changes to the package.
- `fst_WC84` is now a separate and very fast function that can compute the overall and pairwise Weir and Cockerham (1984) Theta/Fst. Bootstrap resampling of markers is available to build confidence intervals (for Louis Bernatchez and his students ;). The estimates are available as a data frame and as a matrix with the upper diagonal filled with Fst values and the lower diagonal filled with the confidence intervals.
- cleaner code for strata section
- bug fix restricted to `assignment_ngs` + `assignment.analysis = "adegenet"` + `sampling.method = "ranked"`. A line at the beginning of a gsi_sim code section was deleted, making the assignment with adegenet go through that chunk of code and causing 100% assignment! Wrapping the chunk in `if (assignment.analysis == "gsi_sim") {code}` prevents this problem.
- fixed a bug with adegenet that was introduced in v.0.2.3
- introducing a new function `import_subsamples_fst` to import the Fst ranking results from all the subsample runs inside an assignment folder.
- bug fixed in the compilation results section when `pop.id.start` and `pop.id.end` are not used.
- updated the function `assignment_mixture` with `sampling.method = "ranked"` and `assignment.analysis = "adegenet"`.
- new function: `assignment_mixture` for mixture analysis.
- Simplified gsi_sim install
- You can now choose between [gsi_sim](https://github.com/eriqande/gsi_sim) and [adegenet](https://github.com/thibautjombart/adegenet), an R package developed by Thibaut Jombart, to conduct the assignment analysis
- New input file: Re-introduced the haplotype data frame file from stacks.
- Argument name change: `imputations` is now `impute.method`.
- New argument: `impute` with 2 options: `impute = "genotype"` or `impute = "allele"`.
- Input file argument is now `data` and covers the three types of files the function can use: VCF file, PLINK tped/tfam or data frame of genotypes file.
- Huge numbers of markers (> 50 000 markers) can now be imported in PLINK tped/tfam format. The first 2 columns of the `tfam` file will be used for the `strata` argument, unless a new one is provided. Columns 1, 3 and 4 of the `tped` are discarded. The remaining columns correspond to the genotypes in the format `01/04`, where `A = 01, C = 02, G = 03 and T = 04`. For the `A/T` format, use PLINK or bash to convert. Use [VCFTOOLS](http://vcftools.sourceforge.net/) with `--plink-tped` to convert very large VCF files. For `.ped` file conversion to `.tped`, use [PLINK](http://pngu.mgh.harvard.edu/~purcell/plink/) with `--recode transpose`.
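
  The `tped` layout described above can be illustrated with a small base-R parser for one line. It assumes whitespace-separated fields, two allele columns per individual, alleles coded 1 to 4 (zero-padded as `01` to `04`) for A, C, G, T, and `0` for missing data. `decode_tped_line()` is a hypothetical helper, not an `assigner` function.

  ```r
  # Hypothetical helper: decode one PLINK tped line with 1-4 allele coding.
  decode_tped_line <- function(line) {
    fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
    marker <- fields[2]                      # column 2: marker id; columns 1, 3, 4 are dropped
    alleles <- sprintf("%02d", as.integer(fields[-(1:4)]))  # "1" or "01" -> "01"
    code <- c("01" = "A", "02" = "C", "03" = "G", "04" = "T")
    nuc <- unname(code[alleles])             # "00" (missing) has no match -> NA
    pairs <- matrix(nuc, ncol = 2, byrow = TRUE)  # one row per individual
    genotypes <- paste(pairs[, 1], pairs[, 2], sep = "/")
    genotypes[is.na(pairs[, 1]) | is.na(pairs[, 2])] <- NA
    list(marker = marker, genotypes = genotypes)
  }

  res <- decode_tped_line("1 snp_1 0 1234 01 03 03 03 00 00")
  res$marker     # "snp_1"
  res$genotypes  # "A/G" "G/G" NA
  ```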
- bug fix in `method = "random"` and `imputation`
- Changed function name from `GBS_assignment` to `assignment_ngs`. It stands for assignment with next-generation sequencing data.
- New argument `df.file` if you don't have a VCF file. See documentation.
- New argument `strata` if you don't have the population id or other metadata info in the individual name. See documentation.
- Changed arguments `THL` to `thl` and `snp.LD` to `snp.ld` to follow convention.
- `iterations.subsample` changed to `iteration.subsample`.
- `iterations` changed to `iteration.method` to avoid confusion with other iteration arguments.
- Removed `baseline` and `mixture` arguments from the function `GBS_assignment`. These options will be re-introduced later in a separate function.
- Using a `marker.number` higher than the number of markers in the data set was causing problems. This could arise when using arguments that removed markers from the dataset (e.g. `snp.ld`, `common.markers` and `maf` filters).
- new version to follow the new gsi_sim install instructions for Linux and Mac. After re-installing the assigner package, follow the instructions to re-install the new [gsi_sim](https://github.com/eriqande/gsi_sim), and delete the old binary `gsisim` in the `/usr/local/bin` folder with the following Terminal command: `sudo rm /usr/local/bin/gsisim`