Improve typing of functions in 'crispr' module #215

Open
dgruano opened this issue Mar 21, 2024 · 12 comments
@dgruano
Contributor

dgruano commented Mar 21, 2024

I was playing around with the crispr module and came across a weird error where the cut coordinates of a cas9 object were way larger than the target sequence.

from pydna.dseqrecord import Dseqrecord
from pydna.crispr import cas9

guide = Dseqrecord("GTTACTTTACCCGACGTCCC")
target = Dseqrecord("GTTACTTTACCCGACGTCCCaGG")

# Create an enzyme object with the guide RNA
enzyme = cas9(str(guide.seq))

# Search for a cutsite in the target sequence
print(enzyme.search(target))  # prints [148] (should be 18)
print(len(target))  # prints 23

The problem was that I was passing a Dseqrecord object and not a string. I am not very familiar yet with the rest of pydna so do most functions require a string or a Dseq / Dseqrecord object? Should we check the input type within the functions or add type hinting?
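As a rough illustration of what "checking the input type within the functions" plus type hints could look like, here is a minimal, self-contained sketch. It does not use pydna at all: `ensure_str` and `find_protospacer` are hypothetical names made up for this example, and the search is a plain regex for protospacer + NGG rather than the actual `cas9` implementation.

```python
import re
from typing import List


def ensure_str(value: object, name: str) -> str:
    """Return value unchanged if it is a str, otherwise raise a clear TypeError."""
    if not isinstance(value, str):
        raise TypeError(
            f"{name} must be a str, got {type(value).__name__}; "
            "try passing str(record.seq) instead"
        )
    return value


def find_protospacer(protospacer: str, target: str) -> List[int]:
    """Return 0-based start positions of protospacer + NGG PAM in target.

    Hypothetical helper for illustration only, not pydna's API.
    """
    protospacer = ensure_str(protospacer, "protospacer").upper()
    target = ensure_str(target, "target").upper()
    # The lookahead keeps overlapping matches; [ACGT]GG stands for the NGG PAM.
    pattern = re.compile(f"(?={re.escape(protospacer)}[ACGT]GG)")
    return [m.start() for m in pattern.finditer(target)]
```

With this kind of guard, passing a record object fails immediately with a readable message instead of silently returning odd coordinates.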

Let me know if I can help.

@BjornFJohansson
Collaborator

Hi and thanks for your interest in pydna. I have been busy with this year's round of grant proposals; normally I try to respond quicker.

The crispr module right now is a minimally working example.
I think the way to go here is to specify something that intuitively describes a linear ssDNA molecule.
In pydna, Dseq and Dseqrecords are used for dsDNA.
I think better type hinting at the least and perhaps accepting pydna.seqrecord.SeqRecord would make sense?

@manulera
Collaborator

manulera commented Sep 5, 2024

Hi @dgruano, maybe you want to give this one a go at the Hackathon?

@manulera
Collaborator

manulera commented Sep 5, 2024

Related to #257

@dgruano
Contributor Author

dgruano commented Sep 5, 2024

Yes, I was counting on doing that!

(actually I would swear I had tagged this issue on #257 yesterday...)

@manulera
Collaborator

manulera commented Sep 5, 2024

A nice followup to this is the documentation: #259

@hiyama341
Collaborator

I also have some ideas that would be cool to implement if you wanna team up for the hackathon @dgruano :)

@dgruano
Contributor Author

dgruano commented Sep 5, 2024

I'm all ears!

@hiyama341
Collaborator

Hi @dgruano, so some of the things I was thinking of incorporating are:

  • Off-target counter as a method. I have a script that does this, and it is usually the first thing people ask for when they do CRISPR experiments. Here we could add seed length as an argument. Incorporating something like https://github.com/secondarymetabolites/nearmiss would also be nice for finding substitutions that give even fewer off-target effects.
  • Other Cas systems would be nice to have, e.g. Cas12a, Cas3, Cas13. There are common themes in how they work, but they still differ in where the PAM is, etc. (I also have some scripts for this.)
  • CRISPR-BEST integration (I have some scripts for this, but check out this cool method: https://pubs.acs.org/doi/full/10.1021/acssynbio.3c00188). Sequence context is quite important, i.e. what comes before a cytosine etc., if you want successful experiments every time, and hardcoding this into pydna would be amazing (check it out here: https://www.nature.com/articles/nbt.4199).
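To make the off-target counter idea a bit more concrete, here is a rough sketch on plain strings. Everything in it is an assumption for illustration: the function names are hypothetical, it assumes an SpCas9-like enzyme (PAM-proximal seed, NGG PAM on the 3' side), it only counts exact seed matches, and the default seed length of 12 is just a placeholder value.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_seed_hits(
    protospacer: str,
    genome: str,
    seed_length: int = 12,          # illustrative default, not a recommendation
    pam_regex: str = "[ACGT]GG",    # NGG, assumed to sit 3' of the seed
) -> int:
    """Count exact seed + PAM matches on both strands of genome.

    The count includes the on-target site if it is present in genome.
    """
    if not 0 < seed_length <= len(protospacer):
        raise ValueError("seed_length must be between 1 and len(protospacer)")
    seed = protospacer.upper()[-seed_length:]  # PAM-proximal end
    # Lookahead so that overlapping sites are all counted.
    pattern = re.compile(f"(?={re.escape(seed)}{pam_regex})")
    strands = (genome.upper(), reverse_complement(genome))
    return sum(1 for strand in strands for _ in pattern.finditer(strand))
```

A shorter seed makes the search stricter about what counts as "unique": the same guide can have one hit with a 20 nt seed and several with a 10 nt seed.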

These were just some preliminary thoughts. Looking forward to hearing what you think. :)

@dgruano
Contributor Author

dgruano commented Sep 24, 2024

Those are really good suggestions! Maybe we could compile a list of enzymes and methods with appropriate references and then detail the needed steps (e.g. Cas12 is just creating a new enzyme class, but CRISPR-BEST may need new functions). Something like:

| Feature | Type | Reference |
| --- | --- | --- |
| Cas12 / Cpf1 | New enzyme | https://www.cell.com/cell/fulltext/S0092-8674(15)01200-3 |
| Alternative Cas9 | New enzyme | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393360 |
| Analyze sequence context | New feature | https://www.nature.com/articles/nbt.4199 |
| Genome editing | New feature | https://pubs.acs.org/doi/full/10.1021/acssynbio.3c00188 and here |

I am unsure how you would use nearmiss to limit off-targets; can you elaborate on what you were thinking? I will certainly give it a look for my other suggestion in #267!

@dgruano
Contributor Author

dgruano commented Sep 24, 2024

Other possible features:

Near PAM-less / PAM-flexible enzymes

The CRISPR module should also support Cas enzymes that have more than one PAM. For this, we have to:

  • Support ambiguous nucleotide notation (IUPAC notation) in the PAM sequence.
  • Convert this into all the compatible PAMs.
  • Change the way the search regexp is compiled to support multiple PAMs, or allow the cas object to return several objects.
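The three steps above could be sketched roughly like this. The IUPAC table is the standard nucleotide ambiguity code; the function names (`expand_pam`, `pam_to_regex`, `compile_pam_search`) are hypothetical and not part of pydna, and how this would plug into the existing cas classes is left open.

```python
import re
from itertools import product
from typing import Iterable, List, Pattern

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_pam(pam: str) -> List[str]:
    """List every concrete PAM compatible with an IUPAC-coded PAM."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in pam.upper()))]


def pam_to_regex(pam: str) -> str:
    """Translate an IUPAC-coded PAM into a regex fragment, e.g. NGG -> [ACGT]GG."""
    parts = []
    for base in pam.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return "".join(parts)


def compile_pam_search(pams: Iterable[str]) -> Pattern[str]:
    """Compile one regex that matches any of several IUPAC-coded PAMs."""
    return re.compile("|".join(pam_to_regex(p) for p in pams))
```

Using character classes instead of listing every expanded PAM keeps the compiled pattern small even for very flexible PAMs, while `expand_pam` is still handy for display or testing.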

PAM site search

Taking advantage of Dseq.get_cutsites() we could check all possible PAMs with the currently implemented Cas enzymes (or those enzymes in the collection of the user). We could add a constant crispr.CAS_ENZYMES in the module.
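A very rough sketch of the PAM scan on a plain string, without touching Dseq.get_cutsites(). The `CAS_PAMS` dictionary is a hypothetical stand-in for the proposed crispr.CAS_ENZYMES constant, with two commonly cited PAMs written directly as regex fragments; only the forward strand is scanned here.

```python
import re
from typing import Dict, List

# Hypothetical registry standing in for the proposed crispr.CAS_ENZYMES.
CAS_PAMS: Dict[str, str] = {
    "SpCas9": "[ACGT]GG",  # NGG, PAM located 3' of the protospacer
    "Cas12a": "TTT[ACG]",  # TTTV, PAM located 5' of the protospacer
}


def find_pam_sites(target: str, enzymes: Dict[str, str] = CAS_PAMS) -> Dict[str, List[int]]:
    """Return 0-based PAM start positions on the forward strand, per enzyme."""
    target = target.upper()
    return {
        # Lookahead so that overlapping PAMs are all reported.
        name: [m.start() for m in re.finditer(f"(?={pam})", target)]
        for name, pam in enzymes.items()
    }
```

For the target from the first comment, `find_pam_sites("GTTACTTTACCCGACGTCCCaGG")` gives `{"SpCas9": [20], "Cas12a": [5]}`.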

On-target and off-target scores

I'm not very knowledgeable in this respect, but these could be a nice addition for the designed guides. Some references are:
On-Target

Off-Target

@dgruano
Contributor Author

dgruano commented Sep 24, 2024

I totally missed this one:

Support for base editors

This is related to something we want to do in ShareYourCloning. We could achieve this like:

  • Create a subclass of the cas enzyme that cannot cut (only target). We could add the base editing functionality inside it or attach a BaseEditor object. I don't know how modular the base editors are (i.e. if we can combine different cas enzymes with distinct PAMs and scaffolds together with different editing enzymes).
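As a starting point for discussion, the "attach a BaseEditor object" option could look something like the sketch below. The class name, its fields, and the window values are all hypothetical placeholders (the window is counted 1-based from the PAM-distal end of the protospacer), and the model is deliberately naive: it edits every target base inside the window and ignores sequence context.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BaseEditorSketch:
    """Hypothetical editing component that a non-cutting cas object could carry."""

    target_base: str = "C"
    edited_base: str = "T"
    window_start: int = 4  # placeholder value, 1-based from the PAM-distal end
    window_end: int = 8    # placeholder value, inclusive

    def editable_positions(self, protospacer: str) -> List[int]:
        """1-based protospacer positions inside the window holding the target base."""
        seq = protospacer.upper()
        last = min(self.window_end, len(seq))
        return [
            pos
            for pos in range(self.window_start, last + 1)
            if seq[pos - 1] == self.target_base
        ]

    def edit(self, protospacer: str) -> str:
        """Return the protospacer with every editable base converted (naive model)."""
        seq = list(protospacer.upper())
        for pos in self.editable_positions(protospacer):
            seq[pos - 1] = self.edited_base
        return "".join(seq)
```

Keeping the editor separate from the targeting enzyme would let different PAMs and different editing enzymes be combined freely, if it turns out they are that modular in practice.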

@hiyama341
Collaborator

Cool suggestions @dgruano!

  • Regarding base editing, this is something I worked with in StreptoCAD and that CRISPYweb also does. We could make a subclass like you suggest since they work almost exactly like Cas9 just with an editing window.
  • For the On-target, I think it is something we can add. I found this tool that could be used for inspiration: https://academic.oup.com/bioinformatics/article/38/24/5437/6769890?login=false . In terms of CRISPR efficacy I think it is not needed - most tools are simply not accurate enough - and all the wet lab scientists I know don't really believe in them and follow the approach of trying a few guides instead, which works super well.

For the nearmiss, I think it is a bit of an overkill since the computational load is pretty heavy.
