Trimming with custom sequences #1174
Comments
Hi @sarah-buddle, there's nothing obvious in your command that should stop this from working. Are you able to share the test data file with us?
Sorry for the delayed reply, here are my test input and output files.
Hi @sarah-buddle, There are a few things that could be affecting this:
I think no. 3 is what is occurring during your […]. You may also want to try dorado 0.9.0, which appears to give better results. I suspect this is because it's possible to specify which end the primer sequence relates to, so the reverse-complement (RC) search can be skipped. Note that this will require some tweaking to your custom primer file:
and then calling with:
See the docs here for details.
Thanks, that's very helpful! I suspect it may be mainly down to point 2, possibly in combination with a non-standard adapter pattern. In answer to each of your points:
I tried running dorado v0.9.0. This didn't trim any of the primers at all (with v0.8.3 it trims most but not all of them), so perhaps I misunderstood the documentation. The command I used was:
and my custom_primers.fastq looked like this:
Assuming your barcodes are arranged in the "normal" way:
then I would expect barcode trimming to remove your adapters as well. If you have something else (like primers and barcodes the other way around), then anything inward of the barcode will remain in place […].

Re. point 2: you could try running with the […]

Could you share the commands for your entire pipeline, from start to end?
Issue Report
I have run dorado demux and noticed that some primer/adapter sequences are left at the start of the reads, so I tried running dorado trim to remove them. I've tried providing the primer sequences with --primer-sequences and also running without this parameter. In both cases some reads are trimmed, but many reads in the output file (hundreds out of 3195 in total) still contain these sequences at the start. Is there a setting I should change so that trimming works for all reads?
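To put a number on "hundreds of reads still contain these sequences", it can help to count, before and after trimming, how many reads have a primer near their start. The sketch below is a hypothetical diagnostic, not part of dorado: the script name `count_primers.py`, the functions, and the 100-base window are all assumptions chosen for illustration. It only looks for exact matches of each primer and its reverse complement, so unlike a real trimmer it ignores sequencing errors and will undercount.

```python
import sys
from typing import Iterable, Iterator, List, Tuple

# Complement table for plain DNA (A/C/G/T/N), upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records (no wrapped lines)."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").strip()
        next(it, None)  # '+' separator line
        next(it, None)  # quality line
        yield header[1:], seq


def count_reads_with_primer(
    fastq_lines: Iterable[str], primers: List[str], window: int = 100
) -> Tuple[int, int]:
    """Count reads whose first `window` bases contain any primer exactly,
    in either orientation. Returns (reads_with_primer, total_reads)."""
    patterns = set()
    for primer in primers:
        primer = primer.upper()
        patterns.add(primer)
        patterns.add(reverse_complement(primer))
    hits = total = 0
    for _read_id, seq in read_fastq(fastq_lines):
        total += 1
        head = seq[:window].upper()
        if any(p in head for p in patterns):
            hits += 1
    return hits, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_primers.py PRIMER [PRIMER ...] < output.fastq
    hits, total = count_reads_with_primer(sys.stdin, sys.argv[1:])
    print(f"{hits}/{total} reads contain a primer in the first 100 bases")
```

Running it on both the input and output of dorado trim, with the primers from primers.fasta passed as arguments, gives a before/after count that is easier to compare than spot-checking reads by eye.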
Steps to reproduce the issue:
The command I am using is:
dorado trim input.fastq --emit-fastq --primer-sequences primers.fasta > output.fastq
The input file was the output of the dorado demux command (with trimming enabled) run on a previously basecalled fastq file.
Custom primer sequences that are still found in output:
Run environment:
Logs
[2024-12-13 16:18:57.672] [info] Running: "trim" "input.fastq" "--primer-sequences" "primers.fasta" "--emit-fastq" "-v"
[2024-12-13 16:18:57.672] [debug] > adapter/primer trimming threads 58, writer threads 6
[2024-12-13 16:18:57.673] [info] - Note: FASTQ output is not recommended as not all data can be preserved.
[2024-12-13 16:18:57.677] [info] > starting adapter/primer trimming
[2024-12-13 16:18:57.727] [debug] Total reads processed: 3195
[2024-12-13 16:18:57.781] [info] > Simplex reads basecalled: 3195
[2024-12-13 16:18:57.781] [info] > finished adapter/primer trimming