Reaction SMILES lossily handles enhanced stereochemistry #2720

Open
dan2097 opened this issue Jan 2, 2025 · 2 comments


dan2097 commented Jan 2, 2025

Summary
Indigo's reaction SMILES output does not serialize certain stereochemistry that cannot be expressed in vanilla SMILES, i.e. converting to and from reaction SMILES can be lossy. The expected behaviour is that this round trip is lossless, given that it is lossless for the individual molecules.

Steps to Reproduce
This script compares the InChI of a structure loaded directly with the InChI of the same structure after it has been added to a reaction as a reactant and then serialized to and deserialized from reaction SMILES. Whenever the two InChIs differ, information was lost in the round trip.

Indigo indigo = new Indigo();
IndigoInchi inchiGen = new IndigoInchi(indigo);
List<String> inputSmiles = new ArrayList<>();
inputSmiles.add("CCN([C@H]1CO[C@H](C[C@@H]1OC)O[C@@H]2[C@H]([C@@H]([C@H](O[C@H]2O[C@H]3C#C/C=C\\C#C[C@@]\\4(CC(=O)C(=C3/C4=C\\CSSC(C)(C)CC(=O)N/N=C(/C)\\C5=CC=C(C=C5)OCCCC(=O)N)NC(=O)OC)O)C)NO[C@H]6C[C@@H]([C@@H]([C@H](O6)C)SC(=O)C7=C(C(=C(C(=C7OC)OC)O[C@H]8[C@@H]([C@@H]([C@H]([C@@H](O8)C)O)OC)O)I)C)O)O)C(=O)C");
inputSmiles.add("OC[C@@](C(=O)O[C@H]1CN2CCC1CC2)(C2=CC=CC=C2)CSC |o1:2|");
inputSmiles.add("CO[C@H]1C[C@H](O[C@H]2[C@H](C)O[C@@H](O[C@@H]3/C(C)=C/C[C@@H]4C[C@@H](C[C@]5(C=C[C@H](C)[C@@H](C6CCCCC6)O5)O4)OC(=O)[C@@H]4C=C(C)[C@@H](O)[C@H]5OC/C(=C\\C=C\\[C@@H]3C)[C@@]45O)C[C@@H]2OC)O[C@@H](C)[C@@H]1O");

for (String smiles : inputSmiles) {
  IndigoObject mol = indigo.loadMolecule(smiles);
  String expectedInchi = inchiGen.getInchi(mol);
  IndigoObject reaction = indigo.createReaction();
  reaction.addReactant(mol);
  reaction = indigo.loadReaction(reaction.smiles()); // This step is lossy!
  IndigoObject reactant = reaction.iterateReactants().next();
  String incorrectInchi = inchiGen.getInchi(reactant);
  if (!expectedInchi.equals(incorrectInchi)) {
    System.out.println(smiles);
    System.out.println(expectedInchi);
    System.out.println(incorrectInchi);
    System.out.println("-----------");
  }
}

Environment details:

  • Tested on Indigo 1.24 and 1.28.0-rc.1

@AlexanderSavelyev (Collaborator)

Hi @dan2097. Thanks for the bug report. I didn't execute the code, but I can see that one SMILES contains ... |o1:2|, which is not Daylight SMILES (which I guess is what you mean by vanilla) but a Chemaxon extension defining OR enhanced stereo for the atom. I believe that Indigo does not save CXSMILES for reactions, which is why there is a difference. On Chemaxon docs I also didn't find any specification how it should be saved for reactions or any reaction support.
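
To make the `|o1:2|` suffix concrete, here is a minimal Java sketch that pulls the enhanced stereo groups out of a CXSMILES string. It assumes the usual CXSMILES conventions (`a:` for absolute, `o<n>:` for OR group n, `&<n>:` for AND group n, each followed by comma-separated atom indices) and ignores every other field. The class and method names are hypothetical and are not part of Indigo or any Chemaxon API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CxStereoGroups {
    /** Returns the text between the trailing pair of '|' characters, or "" if there is none. */
    static String extension(String cxsmiles) {
        String s = cxsmiles.trim();
        if (s.length() < 2 || !s.endsWith("|")) {
            return "";
        }
        int open = s.lastIndexOf('|', s.length() - 2);
        return open < 0 ? "" : s.substring(open + 1, s.length() - 1);
    }

    /** Maps each stereo group key ("a", "o1", "&2", ...) to its atom indices. */
    static Map<String, List<Integer>> stereoGroups(String cxsmiles) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        List<Integer> current = null; // index list of the field being read, null if not a stereo field
        for (String raw : extension(cxsmiles).split(",")) {
            String token = raw.trim();
            String number = token;
            int colon = token.indexOf(':');
            if (colon >= 0) {
                // A token with a colon starts a new field; later bare tokens continue it.
                String key = token.substring(0, colon);
                current = key.matches("a|o\\d+|&\\d+")
                        ? groups.computeIfAbsent(key, k -> new ArrayList<>())
                        : null;
                number = token.substring(colon + 1);
            }
            if (current != null && !number.isEmpty()) {
                current.add(Integer.parseInt(number));
            }
        }
        return groups;
    }
}
```

For the second SMILES in the report, `CxStereoGroups.stereoGroups(...)` yields a single entry, `o1` with atom index 2, which is exactly the information that goes missing when the suffix is not written. Fields with richer syntax (coordinates, atom labels and so on) are outside the scope of this sketch.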

dan2097 (Author) commented Jan 6, 2025

> On Chemaxon docs I also didn't find any specification how it should be saved for reactions or any reaction support.

I think that's because the rules for reactions (with the exception of fragment-level grouping, which is unique to reactions) are no different from those for molecules. Atom indexes are still assigned to the atoms in the order they're written in the reaction SMILES. At a glance, I agree that the official documentation isn't explicit about this, though.
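
As an illustration of that indexing rule, the following sketch renumbers the atom indices in a molecule's stereo group fields so they stay valid when the molecule is written later in a reaction SMILES. It assumes the offset is simply the number of atoms written before the molecule, and it only touches `a:`, `o<n>:` and `&<n>:` fields. The class name, method name and the 20-atom reactant in the usage note are hypothetical.

```java
final class CxIndexShift {
    /**
     * Adds offset to every atom index in the a:, o<n>: and &<n>: fields of a
     * CXSMILES extension body (the text between the '|' characters).
     * All other fields are copied through unchanged.
     */
    static String shiftStereoGroups(String extensionBody, int offset) {
        StringBuilder out = new StringBuilder();
        boolean inStereoField = false;
        for (String token : extensionBody.split(",")) {
            String prefix = "";
            String value = token;
            int colon = token.indexOf(':');
            if (colon >= 0) {
                // A token with a colon starts a new field; later bare tokens continue it.
                prefix = token.substring(0, colon + 1);
                value = token.substring(colon + 1);
                inStereoField = prefix.matches("(a|o\\d+|&\\d+):");
            }
            if (inStereoField && !value.isEmpty()) {
                value = String.valueOf(Integer.parseInt(value) + offset);
            }
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(prefix).append(value);
        }
        return out.toString();
    }
}
```

If the molecule carrying `|o1:2|` were written after a reactant with 20 atoms, `CxIndexShift.shiftStereoGroups("o1:2", 20)` would give `o1:22`. A writer would then concatenate the shifted fields of all components into one suffix at the end of the reaction SMILES.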

Indigo does currently include some CXSMILES functionality when writing reaction SMILES. Specifically, it includes fragment grouping, which is required to avoid a significant shortcoming of Daylight reaction SMILES.

MarvinSketch, given the first example as both a reactant and a product, outputs:
CCN([C@H]1CO[C@H](C[C@@H]1OC)O[C@@H]1[C@@H](O)[C@H](NO[C@H]2C[C@H](O)[C@H](SC(=O)C3=C(C)C(I)=C(O[C@@H]4O[C@@H](C)[C@H](O)[C@@H](OC)[C@H]4O)C(OC)=C3OC)[C@@H](C)O2)[C@@H](C)O[C@H]1O[C@H]1C#C\C=C/C#C[C@]2(O)CC(=O)C(NC(=O)OC)=C1\C2=C/CSSC(C)(C)CC(=O)N\N=C(\C)C1=CC=C(OCCCC(N)=O)C=C1)C(C)=O>>CCN([C@H]1CO[C@H](C[C@@H]1OC)O[C@@H]1[C@@H](O)[C@H](NO[C@H]2C[C@H](O)[C@H](SC(=O)C3=C(C)C(I)=C(O[C@@H]4O[C@@H](C)[C@H](O)[C@@H](OC)[C@H]4O)C(OC)=C3OC)[C@@H](C)O2)[C@@H](C)O[C@H]1O[C@H]1C#C\C=C/C#C[C@]2(O)CC(=O)C(NC(=O)OC)=C1\C2=C/CSSC(C)(C)CC(=O)N\N=C(\C)C1=CC=C(OCCCC(N)=O)C=C1)C(C)=O |c:27,48,66,110,142,163,181,225,t:31,80,99,101,146,195,214,216|
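
The `c:` and `t:` lists in that output each hold eight indices, four for the reactant followed by four for the identical product. A small sketch, with a hypothetical helper name, checks that the product indices are the reactant indices plus one constant, which is what one would expect if numbering simply continues across the reaction:

```java
import java.util.Arrays;

final class OffsetCheck {
    /**
     * Splits a comma-separated index list into two halves and returns the
     * constant difference between them, or -1 if the difference is not
     * constant or the list has an odd number of entries.
     */
    static int constantOffset(String indexList) {
        int[] v = Arrays.stream(indexList.split(","))
                .mapToInt(Integer::parseInt)
                .toArray();
        if (v.length % 2 != 0) {
            return -1;
        }
        int half = v.length / 2;
        int offset = v[half] - v[0];
        for (int i = 1; i < half; i++) {
            if (v[half + i] - v[i] != offset) {
                return -1;
            }
        }
        return offset;
    }
}
```

Both `constantOffset("27,48,66,110,142,163,181,225")` and `constantOffset("31,80,99,101,146,195,214,216")` return 115, so the product's entries are the reactant's entries shifted by a single fixed amount.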
