Identification list for species: #7

Open
DetlevCM opened this issue Jun 1, 2018 · 3 comments


DetlevCM commented Jun 1, 2018

Chemical names are nice for humans, but less practical for computers.
It would be nice if the cavities were accompanied by a list of CAS numbers (asked for by many journals nowadays too), to uniquely identify each compound.

There is an online tool available, and I have been running the names through it. (This is the one I used: http://cts.fiehnlab.ucdavis.edu/ )
I did not (!) verify every match of name to CAS number, but given the choice between no identifiers and identifiers with some errors, I consider some errors the lesser evil. In addition, I determined that using the cavities with the "regular sigma range" is not possible for ions. Radicals as well as some heavier atoms also seem to cause issues.

Here is the 'result' of matching names to CAS numbers for the MOPAC cavities:
POA1_working_compounds.txt
POA1_ions_radicals_CAS_not_found.txt
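
Since the matches were not checked by hand, one cheap sanity check is the CAS check digit: the last digit of a CAS number equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the right) modulo 10. The sketch below is only an illustration; the function name `is_valid_cas` is hypothetical and not part of this repository. It catches malformed or mistyped numbers, but it cannot tell whether a well-formed number belongs to the intended compound.

```python
import re

# Hypothetical helper, not part of the repository.
# A CAS number has the form NNNNNNN-NN-N (2 to 7 digits, 2 digits, 1 check digit).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit is consistent."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    for candidate in ("7732-18-5", "7732-18-4", "not a CAS number"):
        print(candidate, is_valid_cas(candidate))
```

A number can pass this check and still be the wrong compound, so it complements manual verification rather than replacing it.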

Author

DetlevCM commented Jun 1, 2018

In the same effort, I just filtered the list for the GAMESS cavities:

list_HF.txt
removed_HF.txt

Side note:

I have also identified an issue with a CAS number: the tool gave me '13670-17-2' for water, which is heavy water. Normal water is '7732-18-5'.

There is also at least one duplicate in the GAMESS database: tetrachloroethylene and tetrachloroethene.
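
Duplicates of this kind are easy to find once every name has a CAS number, because synonyms collapse onto the same identifier. The following is a minimal sketch, assuming the list is available as (name, CAS) pairs; the function `find_duplicates` and the record layout are hypothetical, and the CAS number 127-18-4 for tetrachloroethylene was filled in for the example rather than taken from the attached lists.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group compound names by CAS number and keep groups with more than one name.

    `records` is an iterable of (name, cas) pairs (assumed layout).
    """
    names_by_cas = defaultdict(list)
    for name, cas in records:
        names_by_cas[cas].append(name)
    return {cas: names for cas, names in names_by_cas.items() if len(names) > 1}


# Illustrative records only; not the content of the attached files.
records = [
    ("water", "7732-18-5"),
    ("tetrachloroethylene", "127-18-4"),
    ("tetrachloroethene", "127-18-4"),
]
duplicates = find_duplicates(records)

if __name__ == "__main__":
    for cas, names in duplicates.items():
        print(cas, names)
```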

Contributor

rpseng commented Jun 1, 2018

Thanks for the lists.
We could try to improve on this in the future. Some points to keep in mind:

  • How to handle cations, anions and other possible intermediate radicals without CAS
  • How to handle multiple conformers of the same molecule (currently we are providing only one conformer)

Regarding the usual range of -0.025 to 0.025: exceeding it can happen. We are providing the 'raw' apparent surface charges here; after 'averaging' the surface charges, this is less likely to happen.
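
To illustrate why averaging narrows the range, here is a rough sketch of a distance-weighted average over surface segments. The weighting function follows a form commonly quoted in the COSMO-SAC literature, but it is an assumption here: the function name, the averaging radius `r_av`, and all numbers are hypothetical, and the scheme actually used for these cavities may differ. Because every averaged value is a weighted mean with positive weights, it cannot lie outside the span of the raw values.

```python
import math


def average_surface_charges(positions, raw_sigma, radii, r_av):
    """Distance-weighted average of raw segment charge densities (assumed form).

    positions: list of (x, y, z) segment centres
    raw_sigma: raw apparent charge density of each segment
    radii:     effective radius of each segment
    r_av:      averaging radius (hypothetical parameter)
    """
    averaged = []
    for pos_m in positions:
        numerator = 0.0
        denominator = 0.0
        for pos_n, sigma_n, r_n in zip(positions, raw_sigma, radii):
            dist_sq = sum((a - b) ** 2 for a, b in zip(pos_m, pos_n))
            r_sq = r_n ** 2 + r_av ** 2
            # Positive weight that decays with distance between segments.
            weight = (r_n ** 2 * r_av ** 2 / r_sq) * math.exp(-dist_sq / r_sq)
            numerator += sigma_n * weight
            denominator += weight
        averaged.append(numerator / denominator)
    return averaged


# Made-up example: three segments on a line, one raw value outside +/-0.025.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
raw_sigma = [0.030, 0.010, -0.005]
radii = [0.5, 0.5, 0.5]
averaged_sigma = average_surface_charges(positions, raw_sigma, radii, r_av=1.0)

if __name__ == "__main__":
    print(averaged_sigma)
```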

Author

DetlevCM commented Jun 1, 2018

Well, there is an extension to COSMO-SAC for electrolytes: https://pubs.acs.org/doi/abs/10.1021/ie100689g
That isn't of interest to me at present, though, so I cannot comment on it any further.
(And one more paper: https://www.sciencedirect.com/science/article/pii/S0378381218300347 )

In the case of the sigma profiles where the range was exceeded, this was for the averaged charge density. Expanding the range takes care of the problem, as would changing the parameterisation.
But as mentioned above, I am at present not interested in ions, so it is easier for me to just remove them. (And place them in a dedicated list.)
For all stable species, the range of the averaged sigma profile does not exceed the parameterisation range.
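
The filtering described here can be sketched as a simple range check. This is not the script I used, just an illustration: the function `split_by_sigma_range`, the dictionary layout, and the species names and values are all made up, and only the limit of 0.025 comes from the range discussed above.

```python
SIGMA_LIMIT = 0.025  # edge of the usual range, -0.025 to 0.025


def split_by_sigma_range(profiles, limit=SIGMA_LIMIT):
    """Split species into those within the sigma range and those exceeding it.

    `profiles` maps a species name to the averaged sigma values at which its
    profile is non-zero (assumed layout).
    """
    kept, removed = [], []
    for name, sigmas in profiles.items():
        if all(abs(sigma) <= limit for sigma in sigmas):
            kept.append(name)
        else:
            removed.append(name)
    return kept, removed


# Made-up values for illustration only.
profiles = {
    "species_a": [-0.015, 0.000, 0.017],
    "ion_b": [0.018, 0.027],
}
kept, removed = split_by_sigma_range(profiles)

if __name__ == "__main__":
    print("kept:", kept)
    print("removed:", removed)
```

Widening `limit` instead of removing species corresponds to the "expanding the range" option.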

As to how to properly handle ions and conformers: I don't know. I do know that other people are also interested in conformers. Whether I will work with them in the future, I do not know.
I would possibly suggest a slightly more complex naming pattern: either a "fake CAS number" with appended information, say "-c00001", "-c00002" etc., leaving the treatment to code, or an additional column that provides an integer conformer counter for entries with identical CAS numbers. I'm sure that different people will have different favoured approaches.
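
As a rough sketch of the first option, an identifier with a conformer suffix could be split back into its parts as below. The names `SpeciesId`, `parse_species_id` and `format_species_id` are hypothetical, and treating a plain CAS number as conformer 1 is an assumption made only for this example.

```python
import re
from typing import NamedTuple


class SpeciesId(NamedTuple):
    cas: str
    conformer: int


# CAS number, optionally followed by a conformer suffix such as "-c00002".
ID_PATTERN = re.compile(r"^(\d{2,7}-\d{2}-\d)(?:-c(\d{5}))?$")


def parse_species_id(identifier: str) -> SpeciesId:
    """Split an identifier into the CAS number and the conformer counter."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    cas, conformer = match.groups()
    # Assumption: an identifier without a suffix refers to conformer 1.
    return SpeciesId(cas, int(conformer) if conformer else 1)


def format_species_id(cas: str, conformer: int) -> str:
    """Build the suffixed identifier for a given conformer counter."""
    return f"{cas}-c{conformer:05d}"


if __name__ == "__main__":
    print(parse_species_id("7732-18-5-c00002"))
    print(format_species_id("7732-18-5", 2))
```

The second option, a separate integer column, carries the same information as the `(cas, conformer)` pair returned here, so converting between the two schemes is straightforward.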
