What do the filtN files do? #2072
The fastq files in the |
Oh, in my case those files were much smaller. Were those the ones I should have used for further analysis? I ended up using the original files and got pretty decent results from them. For example, one sample has 12154 seqs, but its filtered file has only 2. Almost all samples are like this. |
I think I am misunderstanding something, because the primerHits values I get don't quite match the number of sequences in my files.
The above will return
I continue with my script in R and then in terminal as follows:
Above returns
If I check how many sequences are in the original file for sample 100 and compare that to the filtN file, I get: Original = 12,586 seqs. But the first primerHits command returned 121,169 RevComp primer hits. Does that mean the primer is matching several places within the same sequence? And if most of my sequences do contain ambiguous bases, is it all right to continue using them? Otherwise my samples go from thousands of reads to 10 or fewer. |
A couple of things: You appear to have single-end data, not paired-end data, so there are no Forward reads/Reverse reads here. A fixed version of the
The value was 12,169, which is close to the input number of reads (12,586). This indicates that in almost all reads the reverse-complement orientation of the REV primer is being detected, which is expected if the sequenced amplicon is usually shorter than your read length. So that is fairly normal. The main issue seems to be that almost all your reads contain N(s). While this can be worked around at the primer removal stage, the main dada2 denoising method does not allow for Ns. What does |
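To illustrate why a hit count close to the read count is plausible, here is a minimal Python sketch of per-read primer counting. It is not the dada2 code: the function names and the toy primer and reads are made up for illustration, and it assumes the reads contain no Ns (as after the filtN step). Each read is counted once if the primer occurs anywhere in it, however many times it matches, so the count cannot exceed the number of reads.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table, including ambiguity codes (R<->Y, K<->M, B<->V, D<->H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly ambiguous) primer."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reads_with_primer(primer, reads):
    """Count reads that contain the primer at least once (one count per read)."""
    pattern = re.compile("".join(IUPAC[base] for base in primer.upper()))
    return sum(1 for read in reads if pattern.search(read))


# Hypothetical toy data: the first read contains the reverse-complemented
# primer twice, but still contributes only one hit.
toy_primer = "ACGTY"
toy_reads = ["AACGTTTGACGT", "CCCCCCCC", "TTGACGTAA"]
revcomp_hits = reads_with_primer(reverse_complement(toy_primer), toy_reads)
print(revcomp_hits, "of", len(toy_reads), "reads contain the RevComp primer")
```

Counting this way, a value such as 12,169 hits against 12,586 reads simply means that nearly every read contains the primer in that orientation.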
That looks fine. My guess is that the Ns are happening in your reads after the reverse primer is detected, and so are being truncated off the reads by cutadapt, in which case you are fine to move forward. |
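A minimal Python sketch of that guess, using a made-up read and a made-up primer sequence: when everything from the reverse-complemented primer onward is cut off the 3' end, any Ns that sit after the primer go with it. This uses exact string matching only, whereas cutadapt tolerates a configurable error rate, so treat it as an illustration of the idea rather than of cutadapt itself.

```python
def trim_3prime(read, adapter):
    """Keep only the part of the read before the first exact adapter match."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


# Hypothetical read: amplicon, then the reverse-complemented primer,
# then low-quality read-through containing Ns.
amplicon = "ACCTGGTTAGC"
rc_primer = "GGATCCA"
read = amplicon + rc_primer + "TTNNANNN"

trimmed = trim_3prime(read, rc_primer)
print(trimmed)            # the amplicon only
print("N" in trimmed)     # no Ns remain after trimming
```

If the Ns in the real reads fall inside the amplicon instead, trimming will not remove them and those reads would still be a problem for denoising.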
Amazing, thanks so much for your help! |
I am confused about the pre-filtering step. After filtering the files for ambiguous sequences, these files are placed in the filtN folder. Can I assume that these are sequences removed from the original fastq files, or are they just flagged? That is, when running primerHits, is it just looking at the filtered files to see where those ambiguous sequences are in the original files?
I am just confused because I run cutadapt in the terminal on my fastq files, but I don't see how it is using the filtered files in filtN.
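As a rough illustration of what a pre-filtering step of this kind does, here is a minimal Python sketch (not the dada2 implementation; the function names and the toy FASTQ text are made up). It assumes the filter produces a new, smaller set of records holding only the reads without Ns, and leaves the original records untouched. Later steps only use the filtered copy if they are explicitly given it as input.

```python
# Hypothetical three-read FASTQ file; read2 contains an ambiguous base (N).
FASTQ_TEXT = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTNCGT
+
IIIIIIII
@read3
TTGACGTA
+
IIIIIIII
"""


def parse_fastq(text):
    """Split FASTQ text into (header, sequence, plus, quality) records."""
    lines = text.strip().splitlines()
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def drop_reads_with_n(records):
    """Return a new list holding only records whose sequence has no N."""
    return [rec for rec in records if "N" not in rec[1].upper()]


original = parse_fastq(FASTQ_TEXT)
filt_n = drop_reads_with_n(original)

print(len(original), "reads in the original")      # unchanged by filtering
print(len(filt_n), "reads in the filtered copy")   # reads with Ns left out
```

In this picture, running a trimming tool on the original files bypasses the filter entirely, which would explain why the filtN files appear unused.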