elsasserlib on uppmax #18

Open
simonelsasser opened this issue Jun 6, 2020 · 2 comments
Comments

@simonelsasser

@cnluzon I guess it is possible to use the lib on uppmax as well? It is now really simple and convenient to make a ChromHMM plot, violin plot, etc. from bigwig files, so I think we could add some R scripts to the pipeline that run once per group, as defined in 'controls.tsv'.

You mentioned wanting to keep the pipeline in Python as much as possible, but I think for this kind of downstream analysis it is very useful to use code that people can easily adjust. E.g. if a few plotting scripts were executed in the final stage of the pipeline, anyone with R knowledge could rerun them separately, include or exclude data, or modify the visualization by adapting the R script.
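To make the "rerun them separately" idea concrete, here is a minimal sketch of how a pipeline step could build the same `Rscript` call a person would type by hand. The script name `plot_violin.R` and the order of its positional arguments are assumptions for illustration only; nothing like this exists in the pipeline yet. On the R side such arguments would typically be read with `commandArgs(trailingOnly = TRUE)`.

```python
import shlex


def rscript_command(script, controls, group, outdir):
    """Build (but do not run) an Rscript call for one group.

    The positional argument order is a hypothetical convention,
    not something the pipeline or elsasserlib defines.
    """
    return ["Rscript", script, controls, group, outdir]


# Hypothetical script and paths, for illustration only.
cmd = rscript_command("plot_violin.R", "controls.tsv", "WT", "plots")

# Print the command in a form that can be pasted into a shell to rerun it.
print(shlex.join(cmd))
```

Because the command is plain text, the same line works from the pipeline and from a terminal, which is what would let someone rerun a single plot after editing the R script.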

@simonelsasser
Author

...and basically we could agree on a lab-wide convention for how R scripts take their input, so that everyone's scripts could be added to the pipeline or run manually on the same config files, e.g. 'controls.tsv'.
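As a rough sketch of what "the same config file" could mean in practice, the snippet below reads a tab-separated table and collects sample names per group. The column names `sample`, `control` and `group` and the example rows are made up for illustration; the actual layout of 'controls.tsv' is not described in this thread and may differ.

```python
import csv
import io
from collections import defaultdict


def read_groups(handle, group_column="group", sample_column="sample"):
    """Return {group name: [sample names]} from a tab-separated table."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row[group_column]].append(row[sample_column])
    return dict(groups)


# Made-up contents for illustration; the real controls.tsv may look different.
EXAMPLE_TSV = (
    "sample\tcontrol\tgroup\n"
    "H3K27me3_rep1\tinput_rep1\tWT\n"
    "H3K27me3_rep2\tinput_rep2\tWT\n"
    "H3K27me3_KO_rep1\tinput_KO_rep1\tKO\n"
)

example_groups = read_groups(io.StringIO(EXAMPLE_TSV))
for group, samples in example_groups.items():
    print(group, samples)
```

Any script, in R or Python, that accepts this one file plus a group name would then fit the convention, whatever it plots.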

@cnluzon
Collaborator

cnluzon commented Jun 6, 2020

Well, the preference for sticking to Python is not an obligation, really. The underlying logic to me is that Snakemake is a Python-based tool and imports Python source. So if there is a command-line tool that does exactly what we need, that's the best option, but if there is not, Python may fit more nicely with the rest of the code. It also reduces the number of dependencies. That said, this is open for discussion, and I'd love to hear further considerations on what would be best in the context of a Snakemake (or any kind of) production pipeline.

But of course it is possible to include R code in downstream analysis. There is even rpy2, which interfaces with R from Python, if that ever became an issue. I need to think a little bit more about how this could be done in a way that is useful, but I do have some immediate feelings about this:

  1. I'm not a big fan of making command-line scripts with R; it feels a bit cumbersome to handle command-line parameters and so on (or maybe I have not mastered those yet ;) ).

  2. If people in the lab eventually need to tweak the code, that defeats the purpose of passing parameters through the command line. It then makes more sense to load a package like elsasserlib, run the functions you need, play with the code and eventually document what was done. If the functionality in the package is good enough, the code needed should end up being a handful of function calls.

  3. We can still provide functionality to handle pipeline inputs and outputs: controls.tsv and the like. But I would rather put that in a package than in standalone scripts, and only put in scripts what is strictly standard (something that needs no tweaking and can be run directly from the pipeline).
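The split described in the points above (logic in a package, scripts kept thin) could look roughly like the following sketch. Everything here is hypothetical: `output_paths` stands in for a package-level helper, and the `--groups` and `--outdir` options are invented for the example. The script part only parses arguments and calls the helper, so anyone who wants to tweak things can import the helper directly instead of editing a script.

```python
import argparse
from pathlib import PurePosixPath


def output_paths(groups, outdir, suffix=".pdf"):
    """Package-level helper (hypothetical): one output file path per group."""
    return {g: str(PurePosixPath(outdir) / f"{g}{suffix}") for g in sorted(groups)}


def main(argv=None):
    """Thin command-line wrapper: parse arguments, delegate to the helper."""
    parser = argparse.ArgumentParser(description="List plot outputs per group.")
    parser.add_argument("--groups", nargs="+", required=True)
    parser.add_argument("--outdir", default="plots")
    args = parser.parse_args(argv)
    for group, path in output_paths(args.groups, args.outdir).items():
        print(f"{group}\t{path}")
    return 0


# Example call with explicit arguments instead of reading sys.argv.
exit_code = main(["--groups", "WT", "KO", "--outdir", "plots"])
```

In interactive use the wrapper is skipped entirely and `output_paths(["WT", "KO"], "plots")` is called directly, which is the "handful of function calls" scenario from point 2.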


2 participants