Skip to content

Latest commit

 

History

History
117 lines (89 loc) · 3.67 KB

README.md

File metadata and controls

117 lines (89 loc) · 3.67 KB

DMPmetal

DMPmetal is a deep learning-based method for predicting metal binding sites from amino acid sequences. This uses a pretrained protein language model (pLM) to embed the target sequences and to provide the features for simple feed-forward classifier. From a user perspective, the input to the model is a single amino acid sequence, and the output probabilities relate to each of the 29 CHEBI metal codes. This model was ranked 1st in the UniProt Metal Binding Site Machine Learning Challenge held in 2022, and was trained on the organizers’ provided NEG_TRAIN and POS_TRAIN_FULL datasets, based on curated UniProt annotations (http://insideuniprot.blogspot.com/2022/02/the-uniprot-metal-binding-site-machine.html).

Ansible installation

First ensure that ansible is installeded on your system, then clone the github repo.

pip install ansible
git clone https://github.com/psipred/DMPmetal.git
cd DMPmetal/ansible_installer

Next edit the the config_vars.yml to reflect where you would like DMPmetal and its underlying data to be installed.

You can now run ansible as per

ansible-playbook -i hosts install.yml

You can edit the hosts file to install s4pred on one or more machines. Ansible installation creates a python virtualenv called dmpmetal_env. You activate this with

source [app path]/dmpmetal_env/bin/activate

If you're using a virtualenv to install Torch you may find you need to add the paths to virtualenv versions of cudnn/lib/ and nccl/lib/ to your LD_LIBRARY_PATH

Installation

DMPmetal requires pytorch and the flash-attn packages. At the time of writing flash-attn most easily installs with pytorch 2.0.1 and cuda 11.8. Though you should be able to compile it against later versions of both. Python dependencies can be installed with

pip install -r requirements/requirements_torch.txt --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements/requirements.txt

The weights file should be downloaded and unpackaged from:

http://bioinfadmin.cs.ucl.ac.uk/downloads/DMPmetal/metalpred.pt.gz

Usage

The standard usage is:

python pytorch_dmp2e2e_pred.py -i /path/to/file.fasta

Example

python pytorch_dmp2e2e_pred.py -i 5pcy.fasta

Options

  -i INPUT   Specify path to fasta file input. 
  -d DEVICE  Hardware to run on. Options: 'cpu', 'cuda'. default is 'cpu'

Outputs

String output to stdout

By default, DMPmetal returns output to stdout. You can capture this to a file with a typical redirection. A typical output might be:

FASTA HEADER ID, CHEBI CODE, RESIUDE NUMBER, PROBABILITY
PDB|5PCY	CHEBI:25213	11	0.01
PDB|5PCY	CHEBI:23378	37	0.99
PDB|5PCY	CHEBI:29036	37	0.02
PDB|5PCY	CHEBI:60240	37	0.03
PDB|5PCY	CHEBI:24875	37	0.02

Only binding residues that pass a 0.01 cutoff are returned. Metal binding residues are predicted for the following CHEBI codes:

CHEBI:48775   : Cd2+
CHEBI:29108   : Ca2+
CHEBI:48828   : Co2+
CHEBI:49415   : Co3+
CHEBI:23378   : Cu cation
CHEBI:49552   : Cu+
CHEBI:29036   : Cu2+
CHEBI:60240   : divalent metal cation
CHEBI:190135  : di-μ-sulfido-diiron
CHEBI:24875   : iron cation
CHEBI:29033   : Fe2+
CHEBI:29034   : Fe3+
CHEBI:30408   : iron-sulfur cluster
CHEBI:49713   : Li+
CHEBI:18420   : Mg2+
CHEBI:29035   : Mn2+
CHEBI:16793   : Hg2+
CHEBI:49786   : Ni2+
CHEBI:60400   : nickel-iron-sulfur cluster
CHEBI:47739   : NiFe4S4 cluster
CHEBI:29103   : K+
CHEBI:29101   : Na+
CHEBI:49883   : tetra-μ3-sulfido-tetrairon
CHEBI:21137   : tri-μ-sulfido-μ3-sulfido-triiron
CHEBI:29105   : Zn2+
CHEBI:177874  : NiFe4S5 cluster
CHEBI:21143   : Fe8S7 iron-sulfur cluster
CHEBI:60504   : iron-sulfur-iron cofactor
CHEBI:25213   : metal cation