Summary
Indigo's reaction SMILES output does not serialize certain stereochemistry that cannot be expressed in vanilla SMILES, i.e. converting to and from reaction SMILES can be lossy. The expected behaviour is that such serialization is lossless, given that it is lossless for the individual molecules.
Steps to Reproduce
This script compares the InChI of a structure loaded directly with the InChI of the same structure after it has been added as a reactant and then serialized to, and deserialized from, reaction SMILES. The two InChIs differ, which shows that information is lost in the round trip.
Hi @dan2097. Thanks for the bug report. I didn't execute the code, but I can see that one SMILES contains ... |o1:2|. That is not Daylight SMILES (which I guess is what you mean by vanilla); it is a Chemaxon extension defining OR enhanced stereo for the atom. I believe that Indigo does not save CXSMILES for reactions, which is why there is a difference. On Chemaxon docs I also didn't find any specification how it should be saved for reactions or any reaction support.
> On Chemaxon docs I also didn't find any specification how it should be saved for reactions or any reaction support.
I think that's because the rules for reactions (with the exception of fragment-level grouping, which is unique to reactions) are no different from those for molecules. The atom indexes are still assigned to the atoms in the order they're written in the reaction SMILES. At a glance, I agree that the official documentation isn't explicit about this though.
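To illustrate the indexing rule, here is a rough sketch in plain Python (not Indigo code) of how an atom-indexed CXSMILES field such as `o1:1` could be carried over when molecules are joined into a reaction SMILES: each index is shifted by the number of atoms written before that molecule. The names `count_atoms`, `shift_indices` and `build_reaction` are made up for this example, the atom counter is a simplified regex rather than a real SMILES parser, and group labels are not renumbered between molecules, which a real writer would presumably need to handle.

```python
import re

# Simplified atom tokenizer (an assumption of this sketch, not a full SMILES
# parser): bracket atoms, two-letter halogens, then single-letter
# organic-subset and aromatic atoms.
ATOM_RE = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI*]|[bcnops]")


def count_atoms(smiles: str) -> int:
    """Count the atoms in a SMILES string in the order they are written."""
    return len(ATOM_RE.findall(smiles))


def shift_indices(field: str, offset: int) -> str:
    """Shift the atom indices of one atom-indexed field, e.g. 'o1:2,5'."""
    label, _, indices = field.partition(":")
    shifted = ",".join(str(int(i) + offset) for i in indices.split(","))
    return f"{label}:{shifted}"


def build_reaction(reactants, products) -> str:
    """Join (smiles, fields) pairs into a reaction SMILES with extensions.

    'fields' holds atom-indexed CXSMILES entries such as ['o1:1'].
    Bond-indexed entries (e.g. c: and t:) would need a bond offset instead
    and are not handled here.
    """
    offset = 0
    merged = []
    for smiles, fields in reactants + products:
        merged += [shift_indices(f, offset) for f in fields]
        offset += count_atoms(smiles)  # later molecules start after these atoms
    core = ".".join(s for s, _ in reactants) + ">>" + ".".join(s for s, _ in products)
    return f"{core} |{','.join(merged)}|" if merged else core


rxn = build_reaction(
    [("CC(=O)Cl", []), ("C[C@H](O)CC", ["o1:1"])],
    [("C[C@H](OC(C)=O)CC", [])],
)
print(rxn)
```

In this made-up example the stereocentre is atom 1 of the second reactant, and four atoms are written before it, so the field becomes `o1:5` in the reaction.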
Indigo does currently include some CxSMILES functionality when writing reaction SMILES. Specifically it includes fragment grouping, which is required to avoid a significant shortcoming of Daylight reaction SMILES.
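For anyone unfamiliar with fragment grouping, a small sketch of what the `f:` field conveys, again in plain Python rather than Indigo: the dot-separated components are numbered across the whole reaction in the order written, and each comma-separated entry of `f:` lists the components that belong to one molecule. The function name `group_components` and the example reaction (which is not balanced) are made up for illustration, and the parsing assumes the `f:` field is well-formed.

```python
import re


def group_components(rxn_cxsmiles: str) -> list[str]:
    """Regroup the components of a reaction CXSMILES using its f: field."""
    core, _, ext = rxn_cxsmiles.partition(" ")
    # Components are numbered across the whole reaction, in written order.
    components = [c for side in core.split(">") for c in side.split(".") if c]

    # Assumed layout: f:0.1,2.3 -> groups [0, 1] and [2, 3].
    match = re.search(r"f:([\d.,]+)", ext)
    groups = []
    if match:
        groups = [
            [int(i) for i in g.split(".")]
            for g in match.group(1).split(",")
            if g
        ]
    owner = {i: g for g in groups for i in g}  # component index -> its group

    molecules, seen = [], set()
    for i in range(len(components)):
        if i in seen:
            continue
        members = owner.get(i, [i])  # ungrouped components stand alone
        seen.update(members)
        molecules.append(".".join(components[j] for j in members))
    return molecules


print(group_components("CCO.[Na+].[OH-]>>CC[O-].[Na+] |f:1.2,3.4|"))
```

Without the `f:` field, nothing in the Daylight string says that `[Na+]` and `[OH-]` form a single reagent rather than two separate ones.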
The first example as both a reactant and a product outputs from MarvinSketch: CCN([C@H]1CO[C@H](C[C@@H]1OC)O[C@@H]1[C@@H](O)[C@H](NO[C@H]2C[C@H](O)[C@H](SC(=O)C3=C(C)C(I)=C(O[C@@H]4O[C@@H](C)[C@H](O)[C@@H](OC)[C@H]4O)C(OC)=C3OC)[C@@H](C)O2)[C@@H](C)O[C@H]1O[C@H]1C#C\C=C/C#C[C@]2(O)CC(=O)C(NC(=O)OC)=C1\C2=C/CSSC(C)(C)CC(=O)N\N=C(\C)C1=CC=C(OCCCC(N)=O)C=C1)C(C)=O>>CCN([C@H]1CO[C@H](C[C@@H]1OC)O[C@@H]1[C@@H](O)[C@H](NO[C@H]2C[C@H](O)[C@H](SC(=O)C3=C(C)C(I)=C(O[C@@H]4O[C@@H](C)[C@H](O)[C@@H](OC)[C@H]4O)C(OC)=C3OC)[C@@H](C)O2)[C@@H](C)O[C@H]1O[C@H]1C#C\C=C/C#C[C@]2(O)CC(=O)C(NC(=O)OC)=C1\C2=C/CSSC(C)(C)CC(=O)N\N=C(\C)C1=CC=C(OCCCC(N)=O)C=C1)C(C)=O |c:27,48,66,110,142,163,181,225,t:31,80,99,101,146,195,214,216|
Environment details: