Error tolerance of cutadapt should also be a parameter #6

Open
claraqin opened this issue Mar 27, 2020 · 3 comments

@claraqin
Owner

Currently, cutadapt is run with its default error tolerance, -e 0.1. Raising the error tolerance could allow cutadapt to identify (and thus remove) more primer sequences.

See https://cutadapt.readthedocs.io/en/stable/guide.html#error-tolerance

To be fair, this might not be the most consequential parameter. An error tolerance of 0.1 corresponds to a maximum of 2 mismatched bases. Allowing up to 3 mismatched bases would have removed an additional 138 reverse-primer matches from the reverse reads, which is approximately 10% of all the matches (138 of 1342, with max.mismatch=3) in the pre-cutadapt reverse reads.
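
As a rough illustration of how an error rate maps to a mismatch count: cutadapt's guide describes the number of allowed errors as the error rate times the length of the matched adapter, rounded down. The sketch below assumes a 20-nt primer, which is a hypothetical length (the actual primer lengths are not stated in this issue), and the function name is made up for illustration. Note also that cutadapt counts insertions and deletions as errors, so this is only an approximate analogue of max.mismatch.

```python
import math


def max_allowed_errors(error_rate: float, primer_length: int) -> int:
    """Errors tolerated for a full-length primer match: floor(rate * length)."""
    return math.floor(error_rate * primer_length)


# Hypothetical primer length; not taken from this issue.
PRIMER_LENGTH = 20

for rate in (0.1, 0.2):
    print(f"-e {rate}: up to {max_allowed_errors(rate, PRIMER_LENGTH)} errors")
```

Under that assumed length, -e 0.1 allows 2 errors, so reaching 3 would require a higher error rate (for example 0.15).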

runB69PP, first R1 sample
                   Forward Complement Reverse RevComp
FWDPrimer.R1.reads     423          0       0       3
FWDPrimer.R2.reads      26          0       0      13
REVPrimer.R1.reads     647          0       0     298
REVPrimer.R2.reads     737          0       0       4

runB69PP, first R1 sample, setting max.mismatch = 2
                   Forward Complement Reverse RevComp
FWDPrimer.R1.reads     731          0       0       4
FWDPrimer.R2.reads      31          0       0     314
REVPrimer.R1.reads    1518          0       0     807
REVPrimer.R2.reads    1196          0       0      13

runB69PP, first R1 sample, setting max.mismatch = 3
                   Forward Complement Reverse RevComp
FWDPrimer.R1.reads     757          0       0       4
FWDPrimer.R2.reads      31          0       0     343
REVPrimer.R1.reads    1528          0       0     944
REVPrimer.R2.reads    1342          0       0      19

runB69PP, post cutadapt (1 of 2), first R1 sample
                   Forward Complement Reverse RevComp
FWDPrimer.R1.reads       0          0       0       3
FWDPrimer.R2.reads      26          0       0       0
REVPrimer.R1.reads     646          0       0       0
REVPrimer.R2.reads       0          0       0       4

runB69PP, post cutadapt (1 of 2), first R1 sample, setting max.mismatch = 2
                   Forward Complement Reverse RevComp
FWDPrimer.R1.reads       0          0       0       4
FWDPrimer.R2.reads      31          0       0       0
REVPrimer.R1.reads    1509          0       0       0
REVPrimer.R2.reads       0          0       0      13

runB69PP, post cutadapt (1 of 2), first R1 sample, setting max.mismatch = 3
                   Forward Complement Reverse RevComp
FWDPrimer.R1.reads       1          0       0       4
FWDPrimer.R2.reads      31          0       0       5
REVPrimer.R1.reads    1519          0       0     132
REVPrimer.R2.reads     138          0       0      19
@mykophile
Collaborator

mykophile commented Mar 27, 2020 via email

@claraqin
Owner Author

Hi Kabir,

If I'm not mistaken, I think you're bringing up a separate issue, so I've just opened up a new issue (#7) and tagged you in it.

Best,
Clara

@claraqin claraqin self-assigned this Oct 5, 2020
@claraqin
Owner Author

Just a note that while I haven't made the cutadapt error tolerance into another parameter, I did modify utils.R in commit f3c1119 so that the hard-coded error tolerance -e is now 0.2 instead of the default 0.1. This allows up to 4 mismatched bases instead of the default 2, which is necessary because primer identification rates are low under the default error tolerance.

I'm actually not sure whether it would be a good idea to make this into a parameter, so I'll leave this issue open until we can reach more of a consensus.
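
For discussion, here is a minimal sketch of what exposing the error tolerance as a parameter could look like. It is not the code in utils.R: the function name, argument names, file names, and primer strings are all hypothetical, and the choice of -g/-G for the primers is an assumption about how cutadapt is invoked here. Only the flags themselves (-e, -g, -G, -o, -p) are standard cutadapt options. It is written in Python purely for illustration.

```python
def build_cutadapt_args(fwd_primer, rev_primer, in_r1, in_r2,
                        out_r1, out_r2, error_rate=0.1):
    """Assemble a paired-end cutadapt command with a configurable -e value."""
    # This sketch only handles -e given as a fraction of the primer length.
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    return [
        "cutadapt",
        "-e", str(error_rate),  # maximum error rate (cutadapt's default is 0.1)
        "-g", fwd_primer,       # 5' primer expected on R1 (assumed layout)
        "-G", rev_primer,       # 5' primer expected on R2 (assumed layout)
        "-o", out_r1,
        "-p", out_r2,
        in_r1, in_r2,
    ]


# Placeholder primers and file names, for illustration only.
args = build_cutadapt_args("ACGTACGTACGTACGTACGT", "TGCATGCATGCATGCATGCA",
                           "in_R1.fastq.gz", "in_R2.fastq.gz",
                           "out_R1.fastq.gz", "out_R2.fastq.gz",
                           error_rate=0.2)
print(" ".join(args))
```

Keeping 0.2 as the default value of such a parameter would preserve the current behavior while still letting users tighten or loosen it.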
