-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add start from bam #193
base: dev
Are you sure you want to change the base?
Add start from bam #193
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, starting from BAM sounds like an improvement! Do you think adding a BAM start as a CI test is necessary?
@@ -58,6 +58,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d | |||
#### Salmon | |||
|
|||
[`Salmon`](https://salmon.readthedocs.io/en/latest/) quantifies reads. | |||
Note that as Salmon has been setup to start from fastq files, it will not run if the pipeline starts from bam files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know Salmon very well. Would it make sense to convert bam to fastq in order to run Salmon when starting from BAM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would, however, I think that this should not be done in this PR, perhaps on the next one.
def id = "${ids}".replace("[","").replace("]","").replace(",","") | ||
def single_end = "${single_ends}".replace("[","").replace("]","").replace(",","") | ||
def sex_drop = "${sex}".replace("[","").replace("]","").replace(",","").replace("1","M").replace("2","F").replace("0","NA").replace("other","NA") | ||
def strandedness = "${strandednesses}".replace("[","").replace("]","").replace(",","") | ||
def id = ids.join(' ') | ||
def single_end = single_ends.join(' ') | ||
def sex_drop = sex.collect { it.replace("1","M").replace("2","F").replace("0","NA").replace("other","NA") }.join(' ') | ||
def strandedness = strandednesses.join(' ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more concise, just to prettify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so the .replace(...)
are no longer necessary?
SINGLE_ENDS=(${single_end}) | ||
BAMS=(${bam.join(' ')}) | ||
|
||
# Check if single_end values are provided | ||
updated_single_ends=() | ||
for ((i=0; i<\${#SINGLE_ENDS[@]}; i++)); do | ||
if [[ "\${SINGLE_ENDS[i]}" == "null" ]]; then | ||
result=\$(samtools view -c -f 1 "\${BAMS[i]}" | awk '{print \$1 == 0 ? "true" : "false"}') | ||
updated_single_ends+=("\$result") | ||
else | ||
updated_single_ends+=("\${SINGLE_ENDS[i]}") | ||
fi | ||
done | ||
|
||
# Convert updated_single_ends array to space-separated string and save to file | ||
echo "\${updated_single_ends[*]}" > updated_single_ends.txt | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard for me to understand what's happening here. Are you checking if there are any paired ends in the BAM-file and then printing a false or true? Is this information not already in the meta/single_end
input to this process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be there if one starts from fastq, but it won't if you start from bam, that's why I added it. I am very open to any suggestion on how to make it more readable, because I totally agree 😄
ch_fastq_reads // channel: [optional] [ val(meta), [path(reads)] ] | ||
ch_bam_bai_reads // channel: [optional] [ val(meta), [path(bam) path(bai)] ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ch_fastq_reads // channel: [optional] [ val(meta), [path(reads)] ] | |
ch_bam_bai_reads // channel: [optional] [ val(meta), [path(bam) path(bai)] ] | |
ch_fastq_reads // channel: [optional] [ val(meta), [path(reads)] ] | |
ch_bam_bai_reads // channel: [optional] [ val(meta), [path(bam) path(bai)] ] |
ch_bam_reads = ch_bam_bai_reads.map { meta, bambai -> [ meta, bambai[0] ] } | ||
ch_bai_reads = ch_bam_bai_reads.map { meta, bambai -> [ meta, bambai[1] ] } | ||
|
||
ch_bam_aligned=ch_bam_reads.mix(STAR_ALIGN.out.bam_sorted_aligned) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ch_bam_aligned=ch_bam_reads.mix(STAR_ALIGN.out.bam_sorted_aligned) | |
ch_bam_aligned = ch_bam_reads.mix(STAR_ALIGN.out.bam_sorted_aligned) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can now also be aligned reads in ch_bam_reads
right? ch_bam_aligned
and ch_bai
are "equivalents"?
I think I follow, but perhaps there could be more expressive names. It's a bit hard for me to see the difference between ch_bam_aligned
+ ch_bai
, ch_bam_bai
and ch_bam_bai_out
together with the different ifs.
|
||
ch_samplesheet | ||
.map { meta, files -> | ||
[meta.sample, groupKey(meta + [id: meta.sample], meta.fq_pairs ?: 1), files] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does meta.fq_pairs ?: 1
work for the bam branch?
versions = ch_versions | ||
emit: | ||
samplesheet = ch_samplesheet | ||
versions = ch_versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indent everything (from line 75 to here)?
def expectedNumber = meta[0].single_end ? 1 : 2 | ||
def sampleNumber = files.flatten().size() | ||
if (expectedNumber != sampleNumber) { | ||
error("Samplesheet contains incorrect number of fastq files for sample ${sample}. Expected ${expectedNumber}, got ${sampleNumber}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
| `bam` | Full path to BAM file. | Provide either fastq_1 or bam | | ||
| `bai` | Full path to BAM index file. | Provide either fastq_2 or bai | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the descriptions match here? "Provide either fastq_2 or bai"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part is whether the file is mandatory or not, its the third column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was a copy paste error. Should bai not always be mandatory when you have bam then, or will it work without bai?
@@ -98,17 +98,19 @@ Running the pipeline involves three steps: | |||
|
|||
#### Samplesheet | |||
|
|||
A samplesheet is used to pass the information about the sample(s), such as the path to the FASTQ files and other meta data (sex, phenotype, etc.,) to the pipeline in csv format. | |||
A samplesheet is used to pass the information about the sample(s), such as the path to the FASTQ/BAM files and other meta data (sex, phenotype, etc.,) to the pipeline in csv format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In long read unaligned reads are often stored as BAM files rather than fastq. Is it necessary to specify aligned reads for BAM, or would it be obvious to the user that BAM equals aligned reads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure to be honest, I even wonder if it would work uBAM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess for most people BAM = aligned reads, so I think you can ignore my question.
"bam": { | ||
"errorMessage": "BAM file cannot contain spaces and must have extension '.bam'", | ||
"type": "string", | ||
"pattern": "^\\S+\\.bam$", | ||
"format": "file-path", | ||
"exists": true | ||
}, | ||
"bai": { | ||
"errorMessage": "BAM index file cannot contain spaces and must have extension '.bai'", | ||
"type": "string", | ||
"pattern": "^\\S+\\.bai$", | ||
"format": "file-path", | ||
"exists": true | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add it later but it would be nice with the possibility to start from cram as well.
PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).