- for files with very long names, I think R is truncating the file names of the pdf tree plot outputs. See [README.md](README.md) for details
- docker container's R installation failed to install the ape package, so I got an error when trying to plot the trees. I think the newer ape package version available at CRAN probably fails to install on the very old version of R I have in the container. I fixed it by specifying an older version of ape, and by manually installing the Rcpp package rather than doing it as a dependency.
- more fixes to the `pw_parsedPAMLconvertToWideFormat.pl` script. Now assuming I'll mostly run in sbatch mode, on one gene at a time. If I need to combine results for pairs of CpG-masked and unmasked alignments, I'll make a different script to do that.
- tiny changes in pw_changenamesinphyliptreefile.pl to make it more useful outside the pipeline
- changed `pw_makeTreeAndRunPAML.pl` to use the `--cpg=1` option when it calls the `pw_parsedPAMLconvertToWideFormat.pl` script. This fixes a situation where I was running PAML on a single CpG-masked alignment and it produced an empty "wide" format output, because it didn't have the unmasked alignment to match the results up to.
- PAML 4.10.6 now requires tree files that contain an extra first line, but in paml_wrapper-v1.3.7, the `--usertree` option was broken for tree files containing that extra first line. I've fixed that now.
- added an exit with error if there's only 1 seq in the alignment
- added a test for alignments containing only 2 sequences (in which case PHYML tree building fails). I make a fake tree file (`(seq1,seq2);`) and proceed with PAML. Results are pretty much meaningless for the more complex models, but I think M0 versus M0fixed might still be useful
- added the ability to run the scripts on input files that are not in the current directory. Output files go where the input files are.
- no longer build the bad 4.10.6 PAML versions in the docker container
- remove the tarballs after compiling within the docker container, aiming for a smaller image
- version 4.10.6 is now the default (compiled from github source, commit af30c37) in all scripts and in rhino/gizmo path
- now not switching to --strict=loose for v4.10.6 (because it's not needed)
- now adding a first line (" numSeqs numTrees") to the tree input file for PAML, because newer codeml versions are more picky about tree format than the older versions were
- docker container now compiles a true v4.10.6 of PAML using PAML git repo commit af30c37 (Dec 1, 2022) (before I was using an out-of-date tarball that seemed like it was v4.10.6 but actually it wasn't)
- added smallDiff option to `pw_makeTreeAndRunPAML.pl` and to the sbatch and singularity wrappers
- Fixed tsv output so that every line has the same number of columns (displays better on github)
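
The tree-file changes in the entries above (the extra " numSeqs numTrees" first line for newer codeml versions, and the fake tree for two-sequence alignments) can be illustrated with a short sketch. This is Python for illustration only, not the code used in the pipeline's Perl scripts; the function names are hypothetical, and the exact whitespace of the header line is an assumption based on the description above.

```python
def fake_two_seq_tree(name1: str, name2: str) -> str:
    """Build a trivial Newick tree for an alignment with only two sequences."""
    return f"({name1},{name2});"


def add_paml_tree_header(newick: str, num_seqs: int, num_trees: int = 1) -> str:
    """Prepend a ' numSeqs numTrees' first line to Newick tree text.

    Assumption: a leading space and single spaces between the two numbers,
    following the format described in the changelog entry.
    """
    return f" {num_seqs} {num_trees}\n{newick.strip()}\n"


if __name__ == "__main__":
    tree = fake_two_seq_tree("seq1", "seq2")
    print(add_paml_tree_header(tree, num_seqs=2), end="")
```
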
Feb 10 2023, commit 61d7eee
- minor changes, mostly cosmetic
Feb 9 2023
- added more PAML versions to the Docker container (4.9a, 4.9g, 4.9h, 4.9j, 4.10.6), to help me compare outputs from different versions. Specify which version to use with the option `--version=4.9a` (4.9a is the default)
- added a `--verbose=1` option to give me a bit more information in the `PAMLsummary.tsv` output file, to help with troubleshooting different results from different PAML versions
Nov 23 2022, commit 34f462f
- minor changes, mostly cosmetic
Nov 23 2022, commit 33aa2c4
- added --strict=loose option so that I can parse output of paml v4.10.6, because it does NOT print the 'Time used' message for M2 and M8 (the models where BEB is run)
Nov 23 2022, commit d889f42
- now capturing elapsed time to the parsed output
Nov 23 2022, commit c1a9927
- add --codeml option so that we can specify a different codeml executable. Useful for testing different versions of PAML. Not applicable when I'm running within singularity/docker: there I only installed a single version of PAML (currently 4.9j).
Nov 22 2022, commit 5f404a0
- now capturing PAML version in parsed output
Nov 22 2022, commit 41b5282
- now capturing tree file name in parsed output
Nov 22 2022, commit 2354751
- use PAML version 4.9j instead of 4.9a
- total rebuild of the Docker/singularity container using bioperl-Ubuntu-trusty as a base rather than the original miniconda base. Couldn't install bioperl any more using conda. Wanted to rebuild so I could use PAML version 4.9j instead of 4.9a, and couldn't do that without also fixing the conda-bioperl problem
Nov 16 2022, commit c0f6507
- when using the `--usertree` option, we now check that the seqnames match up between the alignment and the tree, and offer some hints on what to do if they don't.
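
A minimal sketch of this kind of name check, in Python for illustration only (the actual check lives in the pipeline's Perl scripts and may work differently). The function names are hypothetical, and the leaf-name regex assumes simple unquoted Newick labels.

```python
import re


def newick_leaf_names(newick: str) -> list[str]:
    """Extract leaf names from a Newick string.

    Leaf labels directly follow '(' or ','; internal node labels follow ')'
    and are therefore skipped. Quoted labels are not handled.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def compare_names(alignment_names: list[str], newick: str) -> tuple[list[str], list[str]]:
    """Return (names only in the alignment, names only in the tree), both sorted."""
    aln = set(alignment_names)
    tree = set(newick_leaf_names(newick))
    return sorted(aln - tree), sorted(tree - aln)


if __name__ == "__main__":
    only_aln, only_tree = compare_names(["a", "b", "d"], "((a:0.1,b:0.2):0.3,c:0.4);")
    if only_aln or only_tree:
        print("names only in alignment:", only_aln)
        print("names only in tree:", only_tree)
```
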
Nov 15 2022, commit 2c3f9d0
- added `--usertree` option to allow use of a user-specified tree (e.g. known species tree) rather than the default behavior of creating a tree from the alignment. For now I am not doing any checks on the tree, so there are ways this might break.
Nov 1 2022, commit 974bf91
- fixed a bug in `pw_plottree.R` that messed up taxon names when making pdf plots of the trees (does not affect PAML output, only the tree plots)
- made the omega plots look a tiny bit nicer, and added a clear indication on the plot that we're not showing the full range of omega in some cases
- changed a column header in the tsv outputs: was 'seqToWhichSiteCoordsRefer', now 'seqToWhichAminoAcidsRefer'
- running via singularity wrapper now records container version number in the log.txt file
May 31 2022, commit f9e4bad
has a known bug in `pw_plottree.R` that messed up taxon names when making pdf plots of the trees. Don't use the tree pdf files!
May 5 2022
May 4 2022
May 3 2022
April 28 2022
April 28 2022