Comparing Treatments with Multiple Paired End Replicates #125
Comments
Run each treatment separately, with the biological replicates for that treatment together. Then use a differential analysis package to identify differential peaks. You can use the union of the naive overlap peaks across all conditions as your complete peak set. Quantify read counts in each peak for every replicate and treatment, then run those counts through DESeq2, edgeR, or another differential count analysis method.
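The peak-union and counting steps described above can be sketched in a few lines. This is only an illustrative sketch, not the maintainers' code: the function names `union_peaks` and `count_in_peaks`, the tuple layout `(chrom, start, end)` with half-open coordinates, and the toy data are all assumptions. In practice you would read the naive overlap peak files produced by each treatment's run and count reads from the aligned BAM files with an established tool, then pass the resulting count matrix to DESeq2 or edgeR.

```python
from collections import defaultdict


def union_peaks(peak_sets):
    """Merge peaks from several treatments into one non-overlapping union set.

    peak_sets: iterable of lists of (chrom, start, end) tuples (half-open).
    Returns a list of merged (chrom, start, end) tuples sorted by position.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching intervals: extend the current peak.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def count_in_peaks(peaks, reads):
    """Count reads falling in each peak.

    reads: list of (chrom, position) tuples for one replicate (hypothetical
    stand-in for positions extracted from an alignment file).
    Returns one count per peak, in the same order as `peaks`.
    """
    counts = [0] * len(peaks)
    for chrom, pos in reads:
        for i, (p_chrom, start, end) in enumerate(peaks):
            if chrom == p_chrom and start <= pos < end:
                counts[i] += 1
                break  # union peaks do not overlap, so stop at first hit
    return counts
```

Calling `count_in_peaks` once per replicate against the same union set yields the columns of the count matrix; every replicate of every treatment must be counted against the identical peak list so that rows are comparable.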
Thank you. But now I have another question: I see that the pipeline version I used (downloaded sometime in April) is now deprecated. Should I abandon my previous results and go with the new pipeline?
No, you don't have to re-run the pipeline. It's the same pipeline, just dockerized so that it installs more easily on several platforms. We will improve the installation and usage instructions for the new version (@leepc12: note that we need to improve the documentation for the new version of the pipelines). I would suggest switching to it when you can, because we will only be developing the Docker version going forward.
@raboul101 Could you give us specifics on which part of the installation process with the new pipeline you found confusing? We are starting to improve the documentation, so it is best to get specific feedback from users. Thanks!
https://encode-dcc.github.io/wdl-pipelines/install.html#local-computer-with-docker @raboul101: We are sorry about that. We wanted unified documentation for all pipelines, but that confused users. We will update the documentation. Until then, please let me know which step confused you. Also, please feel free to post issues on the new pipeline GitHub repo (or here). [MINICONDA3_INSTALL_DIR]: where you installed miniconda3
The new pipeline takes a JSON file instead of parameters defined as command-line arguments. You can find examples on
Sorry for the late reply. Since I already have results with the old pipeline, I haven't proceeded with installing the Docker-based pipeline. However, what was confusing was the input.json file. As I understand the new process, you ... So: where does one obtain a template input.json, or, if it needs to be created de novo, what is the proper format? What is the backend.conf file, and where is it? My main hang-up is where to get or how to create the .json; I think clearing that up will help greatly. And thank you for putting these pipelines together, they are a great resource.
There are many template input JSON files in
We strongly recommend that users need to ... Sorry, I am still working on the documentation and will update it soon.
What is the full path for those JSON examples? I don't see them on GitHub: kundajelab/atac_dnase_pipelines/examples
The new pipeline repo is https://github.com/ENCODE-DCC/atac-seq-pipeline/
Aha! That clears it up. Thank you again.
Hi llz-hiv, please repost this as a separate issue, as this is not related to the above thread. Also please consider subscribing to our pipelines Google group, which may have additional useful information as you consider downstream analyses :) https://groups.google.com/forum/#!forum/klab_genomic_pipelines_discuss
I have an ATACseq data set that includes three different treatments, each with three biological replicates. I have paired-end fastq files for each replicate. The question is: Can the fastqs for each treatment be run through the pipeline simultaneously, or must they be run separately and then compared through post-processing?
If treatments can be run simultaneously, could you provide an example of how to properly phrase the BDS command? For more clarification, see below --
The Usage section of "https://github.com/kundajelab/atac_dnase_pipelines" states the following:
"For multiple replicates (PE), specify fastqs with -fastq[REP_ID]_[PAIR_ID]. Add -fastq[]_[] for each replicate and pair to the command line:
-fastq1_1 [READ_REP1_PAIR1] -fastq1_2 [READ_REP1_PAIR2] -fastq2_1 [READ_REP2_PAIR1] -fastq2_2 [READ_REP2_PAIR2] .."
This seems to suggest that one can only enter bioreps for one treatment, e.g. -fastq1_1 trt1_rep1_R1.fastq.gz -fastq1_2 trt1_rep1_R2.fastq.gz -fastq2_1 trt1_rep2_R1.fastq.gz and so on. I don't see any clear way to denote treatment. An example of this would be very helpful, if possible.
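Since the maintainers' reply in the comments says to run each treatment separately, the flag pattern quoted above never needs a treatment label; one set of flags is built per treatment. A minimal sketch of generating those flags follows. The helper name `fastq_args` and all file names other than `trt1_rep1_R1.fastq.gz` and `trt1_rep1_R2.fastq.gz` are hypothetical, and only the `-fastq[REP_ID]_[PAIR_ID]` flags quoted from the README are produced; the rest of the BDS command (script name, species, output directory, and so on) is deliberately left out because it is not shown in this thread.

```python
def fastq_args(replicates):
    """Build -fastq[REP_ID]_[PAIR_ID] arguments for ONE treatment.

    replicates: list of (read1_path, read2_path) tuples, one tuple per
    biological replicate of that treatment, in replicate order.
    """
    args = []
    for rep_id, pair in enumerate(replicates, start=1):
        for pair_id, path in enumerate(pair, start=1):
            args += [f"-fastq{rep_id}_{pair_id}", path]
    return args


# Hypothetical layout: each treatment gets its own, separate pipeline run.
treatments = {
    "trt1": [
        ("trt1_rep1_R1.fastq.gz", "trt1_rep1_R2.fastq.gz"),
        ("trt1_rep2_R1.fastq.gz", "trt1_rep2_R2.fastq.gz"),
    ],
    "trt2": [
        ("trt2_rep1_R1.fastq.gz", "trt2_rep1_R2.fastq.gz"),
        ("trt2_rep2_R1.fastq.gz", "trt2_rep2_R2.fastq.gz"),
    ],
}

if __name__ == "__main__":
    for name, reps in treatments.items():
        # One argument string per treatment, to append to that run's command.
        print(name, " ".join(fastq_args(reps)))
```

Replicate numbering restarts at 1 for every treatment because each treatment is a separate run; the comparison between treatments happens afterwards in the differential analysis.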