add support for standard machine-learning-type descriptors? #15

Open
jkshenton opened this issue Dec 8, 2021 · 4 comments

Comments

@jkshenton
Member

It might be useful to have a SOAP gene, i.e. have Soprano calculate standard machine-learning descriptors such as SOAP, possibly via another package such as dscribe (https://github.com/SINGROUP/dscribe).

@stur86
Contributor

stur86 commented Dec 8, 2021

That looks a bit similar to what BondOrder is doing (https://github.com/CCP-NC/soprano/blob/master/soprano/properties/order/order.py, and there's also a gene version). I agree these sorts of descriptors could be useful; given they're already implemented in dscribe, we could add it as a dependency (even just an optional one) and then have some kind of general interface to it.

About the SOAP descriptor in particular, it seems to me that the general idea is that for each atom it produces one vector, with one parameter per species and angular momentum channel, representing its neighbourhood. Thinking about turning these into Soprano genes, you could linearise the entire array (so for example a (32, 422) descriptor array would become a (13504,) one), but the problem I see is that the result would then depend on atom order: two almost identical structures would not compare as similar if they happened to have atomic positions swapped. It would work if we either standardised the order of atoms somehow or did something else, like reducing the descriptors to species averages. Or we could possibly combine this with spglib, find the inequivalent sites in the crystal and only compare atoms in those, with the order again standardised on the basis of the symmetry (anything goes and we can figure it out later, as long as it's a consistent convention).
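To make the ordering problem concrete, here is a minimal sketch using a random array as a stand-in for a real per-atom descriptor. The function names `flatten_gene` and `species_average_gene` are hypothetical, not existing Soprano genes, and the 16 C + 16 H composition is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a per-atom descriptor array: 32 atoms, 422 features each.
# Random numbers are used purely for illustration, not a real SOAP output.
desc = rng.normal(size=(32, 422))
species = np.array(["C"] * 16 + ["H"] * 16)

# The same "structure" with its atoms listed in a different (cyclically shifted) order.
perm = np.roll(np.arange(len(species)), 1)
desc_p, species_p = desc[perm], species[perm]


def flatten_gene(d):
    """Linearise the whole array: (32, 422) -> (13504,). Depends on atom order."""
    return d.ravel()


def species_average_gene(d, s):
    """Average the rows of each species, with species in sorted order.

    The result does not depend on the order in which atoms are listed.
    """
    return np.concatenate([d[s == el].mean(axis=0) for el in np.unique(s)])


flat_a, flat_b = flatten_gene(desc), flatten_gene(desc_p)
avg_a, avg_b = species_average_gene(desc, species), species_average_gene(desc_p, species_p)

flat_matches = np.allclose(flat_a, flat_b)  # flattened vectors disagree after reordering
avg_matches = np.allclose(avg_a, avg_b)     # species averages agree
```

The species average throws away a lot of information (all atoms of one element collapse into a single vector), which is the trade-off against the symmetry-based ordering suggested above.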

@jkshenton
Member Author

An optional dependency could be the way to go, though when I tried to pip install dscribe alongside soprano just now, things didn't go very well (the numba required by dscribe wanted a slightly older version of numpy).
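For reference, an optional dependency is usually handled with a guarded, lazy import. The sketch below is only an assumption about how it could look: `has_dscribe` and `soap_descriptor` are hypothetical names, and the SOAP keyword arguments are passed straight through because their names differ between dscribe versions.

```python
import importlib.util


def has_dscribe():
    """Return True if the optional dscribe package can be imported."""
    return importlib.util.find_spec("dscribe") is not None


def soap_descriptor(atoms, **soap_kwargs):
    """Hypothetical wrapper: per-atom SOAP vectors for an ASE Atoms object.

    soap_kwargs are forwarded unchanged to dscribe's SOAP class (cutoff,
    basis sizes, periodicity, ...); check the installed version's docs
    for the exact parameter names.
    """
    if not has_dscribe():
        raise ImportError(
            "This feature needs the optional dependency 'dscribe' "
            "(pip install dscribe)."
        )
    # Imported lazily so that Soprano itself never requires dscribe.
    from dscribe.descriptors import SOAP

    species = sorted(set(atoms.get_chemical_symbols()))
    soap = SOAP(species=species, **soap_kwargs)
    return soap.create(atoms)
```

This keeps the conflicting numba/numpy pins out of a default install; only users who ask for the SOAP functionality would have to resolve them.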

Anyway, regarding SOAP, I can see how the re-ordering etc. could get very messy! Luckily we're not the first to face such questions. I came across this paper from the Ceriotti group: https://doi.org/10.1039/C6CP00415F. See section 2.2 for various approaches, including their "regularized-entropy match" (REMatch) kernel (eqn 12).
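As I understand it, REMatch compares two structures through the matrix of similarities between all pairs of their atomic environments, and finds a soft matching between them with an entropy-regularised optimal-transport (Sinkhorn) iteration. The sketch below is my own reading of that idea, not code from the paper; the function name, the default `gamma` and the random test data are assumptions for illustration.

```python
import numpy as np


def rematch_kernel(C, gamma=0.1, n_iter=200):
    """Entropy-regularised match between two sets of environments.

    C[i, j] is the similarity (in [0, 1]) between environment i of
    structure A and environment j of structure B. gamma sets how soft
    the matching is: small gamma approaches a best one-to-one match,
    large gamma approaches a plain average over all pairs.
    """
    n, m = C.shape
    K = np.exp(-(1.0 - C) / gamma)
    rows = np.full(n, 1.0 / n)  # target row sums of the matching matrix
    cols = np.full(m, 1.0 / m)  # target column sums
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):  # Sinkhorn scaling iterations
        u = rows / (K @ v)
        v = cols / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * C))


# Illustrative data: random non-negative "descriptors", rows normalised to unit
# length so that the dot products lie in [0, 1].
rng = np.random.default_rng(1)
A = rng.random((4, 6))
A /= np.linalg.norm(A, axis=1)[:, None]
B = rng.random((5, 6))
B /= np.linalg.norm(B, axis=1)[:, None]

C = A @ B.T
k = rematch_kernel(C)
# Reordering the atoms of either structure leaves the kernel unchanged.
k_reordered = rematch_kernel(C[[2, 0, 3, 1]][:, [4, 3, 2, 1, 0]])
```

The appeal for our purposes is that no atom ordering convention is needed at all, and the two structures don't even have to contain the same number of atoms.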

@stur86
Contributor

stur86 commented Dec 8, 2021

Ah, good, I'll look into it! Right now I haven't made the phylogenetic stuff much of a priority since I'm not aware of anyone that's using it at the moment, but that could change if we did have some expressions of interest (and of course revamping or adding some functionality could generate interest too, so there's that).

The installation failing is a problem, and at the very least a good reason to make it optional. But if it's stuck on an old NumPy version, that doesn't bode well. Is it possible that we could just do SOAP with the machinery we already have in Soprano anyway? As I mentioned, the BondOrder functionality is very similar and uses spherical harmonics too. Spherical harmonics are defined in SciPy, and Clebsch-Gordan and Wigner 3j coefficients are in Soprano's utils module. Or there could be other libraries supporting it.
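As a rough idea of how little is needed for the angular part, here is a Steinhardt-type bond-order parameter built only on NumPy and SciPy. This is not SOAP (there is no radial basis and no species resolution) and not necessarily how Soprano's BondOrder is implemented; `ylm` and `bond_order` are hypothetical names. The import is guarded because newer SciPy releases provide `sph_harm_y` in place of the older `sph_harm`, with a different argument order.

```python
import numpy as np

try:  # newer SciPy: sph_harm_y(l, m, polar, azimuth)
    from scipy.special import sph_harm_y

    def ylm(l, m, polar, azimuth):
        return sph_harm_y(l, m, polar, azimuth)

except ImportError:  # older SciPy: sph_harm(m, l, azimuth, polar)
    from scipy.special import sph_harm

    def ylm(l, m, polar, azimuth):
        return sph_harm(m, l, azimuth, polar)


def bond_order(vectors, l):
    """Rotationally invariant order parameter Q_l for one atom.

    vectors: (N, 3) array of vectors from the atom to its N neighbours.
    """
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1)[:, None]
    polar = np.arccos(np.clip(v[:, 2], -1.0, 1.0))
    azimuth = np.arctan2(v[:, 1], v[:, 0])
    # Average each Y_lm over the neighbours, then sum |q_lm|^2 over m.
    q_lm = np.array([ylm(l, m, polar, azimuth).mean() for m in range(-l, l + 1)])
    return float(np.sqrt(4 * np.pi / (2 * l + 1) * np.sum(np.abs(q_lm) ** 2)))


# Six neighbours at the corners of an octahedron (simple-cubic coordination).
octahedron = np.array(
    [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
)
q4 = bond_order(octahedron, 4)

# Rotating the whole neighbourhood should not change the result.
a, b = 0.3, 1.1
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
q4_rotated = bond_order(octahedron @ (Rz @ Rx).T, 4)
```

A full SOAP power spectrum would add a radial expansion and a sum over species pairs on top of this, so it is more work than the snippet suggests, but the spherical-harmonic machinery itself is not the obstacle.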

@dch0ph

dch0ph commented Sep 30, 2022

We're experimenting with the Ceriotti group's ShiftML2, which uses SOAP descriptors for machine-learned chemical shifts. We'll have a dig to see how "accessible" this functionality is.
