Skip to content

Latest commit

 

History

History
250 lines (188 loc) · 11.9 KB

README.md

File metadata and controls

250 lines (188 loc) · 11.9 KB

GitHub GitHub GitHub

Research Publication

Achintha Ihalage and Yang Hao. Analogical discovery of disordered perovskite oxides by crystal structure information hidden in unsupervised material fingerprints. npj Computational Materials 7, 75 (2021). https://doi.org/10.1038/s41524-021-00536-2

alt text

Installation

The main project requires python3.6. Make sure you have the pip3 module installed. The web scraping tool requires python2.7.

  1. Inside the root directory (analogmat), execute pip install --ignore-installed -r requirements.txt to install the dependencies. (Note that installing pymatgen and tensorflow using pip may produce errors. In this case, please follow the installation documentations of pymatgen and tensorflow for clean installation)
  2. If you wish to use the web scraper, execute pip2 install -r requirements2.txt command as well.
  3. This should install all packages required to run analogmat on your machine. Please open an issue if installation errors occur.

Usage

Evaluation of machine learning classification models

from ML.classification import PVClassifier
clf = PVClassifier()
clf.train_and_test(algo='gradient_boosting')  # algo`: {‘gradient_boosting’, ‘random_forest’, ‘decision_tree’, '`svm`}, default=’gradient_boosting’
clf.plot_confusion_matrix(algo='gradient_boosting') # 10-fold CV confusion matrix
clf.plot_roc_curve()  # 10-fold CV ROC curve
Confusion matrix ROC curve
Alt text Alt text

Plotting the perovskite likelihood of selected A(B1-xB'x)O3 compositions

from ML.plot_results import ABBO3_Viz
viz = ABBO3_Viz()
viz.plot_Bdoped()

alt text

Screening potential perovskites from the composition space

The database of total possible compositions is too large for github (332Mb). You can download this database here. Place this file inside ICSD_data directory and execute the following code.

clf.get_perovskite_candidates(prob_threshold=0.95, no_iterations=100)

100%|████████████████████████████████████████████████████████████| 100/100 [03:43<00:00,  2.23s/it]
 
##################### Classification Results ###############################

46228 new perovskite candidates were found out of 591129 hypothetical compounds!
92.18 % of total compounds were discarded!

############################################################################


Autoencoders (VAE & vanilla)

AutoEncoder class implements the unsupervised material fingerprinting model. Materials analogies can be investigated in a bi-directional manner. That is, "What are the analogous experimental materials to an arbitary composition?" (enabling crystal structure prediction) and "What are the analogous unstudied perovskites to a target experimental material?" (enabling analogical materials discovery).

Following is the code snippet to find 5 experimental analogies (nearest neighbours-NNs) to the composition (K0.5Bi0.5)ZrO3. It is important to write the chemical formula in standard notation with brackets to identify the disordered site - (A1-xA'x)BO3 or A(B1-xB'x)O3. Note that RA should be greater than RB.

from autoencoder import AutoEncoder
ae = AutoEncoder()
model = ae.build_AE(vae=True)  # vae=False for vanilla autoencoder
model.load_weights('saved_models/best_model_VAE.h5')  # best_model_AE.h5 for vanilla autoencoder
exp_analogs = ae.most_similar(model, '(K0.5Bi0.5)ZrO3',  n=5, vae=True)
print (exp_analogs)
   CollectionCode        HMS CrystalSystem      StructuredFormula  Euclidean Distance
0           92640  P 4/m m m    tetragonal  (K0.667Th0.333)(TiO3)            0.064202
1           28621  P 4/m m m    tetragonal     (Ba0.8Pb0.2)(TiO3)            0.076766
2          291164    P 4 m m    tetragonal     (Ba0.95Pb0.05)TiO3            0.080477
3          157807    P 4 m m    tetragonal   (Ba0.67Pb0.33)(TiO3)            0.083409
4            5513    P 4 m m    tetragonal        (K0.5Bi0.5)TiO3            0.116703

Unstudied compositions to a target material, for example (Ba0.5Nd0.5)MnO3 can be obtained as follows.

cand_analogs = ae.most_similar_cand_perovskites(model, '(Ba0.5Nd0.5)MnO3',  n=5)
print (cand_analogs)
    StructuredFormula  Mean_classification_prob  Euclidean Distance
0  (Ba0.35Gd0.65)MnO3                  0.990954            0.018537
1  (Sr0.95Pb0.05)TcO3                  0.985223            0.025213
2     (K0.5Yb0.5)HfO3                  0.984748            0.027341
3  (Ba0.45Eu0.55)MnO3                  0.984248            0.028650
4    (Sr0.9Pb0.1)TcO3                  0.985134            0.030750

We can bypass the new compositions having toxic or expensive elements with except_elems argument. A probability threshold can also be set. For example, relaxor ferroelectric Pb(Mg0.33Nb0.67)O3 as the target material;

cand_analogs = ae.most_similar_cand_perovskites(model, 'Pb(Mg0.33Nb0.67)O3', except_elems=[ 'Tl', 'Pb', 'Hg',  'Cd'], prob_threshold=0.80, n=5, vae=True)
print (cand_analogs)
     StructuredFormula  Mean_classification_prob  Euclidean Distance
0     Bi(Sc0.2Ni0.8)O3                  0.841687            0.013217
1     Bi(Sc0.2Co0.8)O3                  0.837163            0.017750
9   Bi(Ti0.55Cr0.45)O3                  0.896630            0.071381
12   Bi(Ti0.95V0.05)O3                  0.873715            0.073236
13  Bi(Ti0.75Cr0.25)O3                  0.873786            0.075084

Following snippet can be used to predict the crystal system and space group of 2104 experimental compositions based on the plurality vote of 5 nearest neighbours in the fingerprint space.

from validate_fingerprints import CrystalSystem
cc = CrystalSystem()
cc.validate()  # prediction of crystal system and space group
cc.get_confusion_matrix()
cc.get_spg_conf_mat()
Crystal system Space group
Alt text Alt text

Next, we assess the capability of supervised machine learning algorithms to classify crystal system and space group of 2104 experimental compositions with leave-one-out cross validation (LOOCV).

from ML.crystal_system_clf import StructureClf
sclf = StructureClf()
sclf.crystal_system_clf(algo='svm')  # LOOCV   # algo`: {‘gradient_boosting’, ‘random_forest’, ‘decision_tree’, '`svm`}, default=’gradient_boosting’
sclf.cross_val_conf_mat(algo='svm')
sclf.spg_clf(algo='gradient_boosting')    # space group classification

The fingerprint spaces obtained by VAE and vanilla autoencoder for the experimental database can be visualized with;

from autoencoder import AutoEncoder
from plot_df import Fingerprints
ae = AutoEncoder()
fprints = Fingerprints(ae)
fprints.plot_fingerprints(model='vae')    # model='ae' for vanilla autoencoder
Variational autoencoder Vanilla autoencoder
Alt text Alt text

T-SNE and PCA are widely used dimensionality reduction algorithms. High dimensional discrete material features can be projected to two-dimensions (2D) using these algorithms and visualized as follows.

fprints.plot_pca_tsne(algo='tsne')  # algo = 'pca' to visualize with PCA algorithm
t-SNE PCA
Alt text Alt text

We can retrain the autoencoders as follows. This will overwrite the existing model. The parameters can be changed from the code.

from autoencoder import AutoEncoder
ae = AutoEncoder()
ae.train(vae=True)

Web scraper

The tool is implemented in python2.7 to scrape the Bing search engine. This would require numpy, scipy, pandas, monty and pymatgen versions compatible with python2.7 as listed in requirements2.txt file. The usage is as follows.

First, navigate to the web_scraper directory. Next, run the following python2 program.

from bing_scraper import BingScraper
bs = BingScraper()
result = bs.scrape_compound('(Ba0.5Sr0.5)TiO3')
print result
########################################
(Ba0.5Sr0.5)TiO3  is found on web!!! 
See below for results


# TITLE: <h2>(PDF) Dielectric properties of (Ba0.5Sr0.5)TiO3 thin films ...</h2>
#
# DESCRIPTION: <p>Dielectric properties of (Ba0.5Sr0.5)TiO3 thin films</p>
# ___________________________________________________________
#
# TITLE: <h2>Dielectric properties of (Ba0.5Sr0.5)TiO3 thin films - CORE</h2>
#
# DESCRIPTION: <p>The dielectric properties of (Ba0.5Sr0.5)TiO3 (BST) thin films with high electrical resistivity were investigated. BST films are deposited on Pt/TiO2/SiO2/Si substrates by a metal-organic deposition (MOD) method. The dielectric permittivity and ac conductivity of the films are measured in the frequency range 102-105 Hz. The dielectric permittivity εr decreases slightly with frequency f ...</p>
# ___________________________________________________________
#
# TITLE: <h2>Leakage current of (Ba0.5Sr0.5)TiO3 thin film ... - CORE</h2>
#
# DESCRIPTION: <p>The leakage current and relative permittivity of (Ba0.5Sr0.5)TiO3 (BST) thin films prepared by pulsed-laser deposition (PLD) were investigated. It was found that the leakage current for positive bias voltage was higher than that for negative bias voltage, which was attributed to the lattice mismatch between the bottom Pt electrode and the BST thin film. A time-dependent breakdown process under ...</p>
# ___________________________________________________________
#
# TITLE: <h2>Dielectric Properties and Leakage Current Characteristics ...</h2>
#
# DESCRIPTION: <p>The X-ray studies indicated that both MT and Ba0.5Sr0.5TiO3 are highly oriented and remain as two distinct individual entities in the composite films and a considerable reduction in the dielectric loss and leakage currents has been observed.</p>
# ___________________________________________________________

...
...
...

Citation

@Article{Ihalage2021,
author={Ihalage, Achintha
and Hao, Yang},
title={Analogical discovery of disordered perovskite oxides by crystal structure information hidden in unsupervised material fingerprints},
journal={npj Computational Materials},
year={2021},
month={May},
day={21},
volume={7},
number={1},
pages={75},
issn={2057-3960},
doi={10.1038/s41524-021-00536-2},
url={https://doi.org/10.1038/s41524-021-00536-2}
}

Questions and comments

Please contact a.a.ihalage@qmul.ac.uk or y.hao@qmul.ac.uk.

Funding

We acknowledge funding received by The Institution of Engineering and Technology (IET) under the AF Harvey Research Prize. This work is supported in part by EPSRC Software Defined Materials for Dynamic Control of Electromagnetic Waves (ANIMATE) grant (No. EP/R035393/1)