METAL DETECTOR v2.0 - Cysteines and Histidines Bonding State Predictor

What's new

MetalDetector v1.0 [5] was only able to predict the bonding state of CYS and HIS. MetalDetector v2.0 [1] takes a protein chain sequence as input and predicts the number of bound metal ions and, for each ion, its CYS and HIS ligands in the sequence. This is not a full 3D characterization of the site geometry, with angles and distances, but rather a prediction of the coordination relationship between ions and their ligands. The bonding state of each CYS and HIS is predicted in two states (metal-bound or not). When the chain is not known to belong to a metalloprotein, the bonding state of CYS is predicted in three states (disulfide-bound/metal-bound/free). Chains with both disulfide bridges and metal binding sites are very rare (less than 3%); in these cases, disulfide-bound CYS are not predicted if the option « known metalloprotein » is selected.

Limitations

MetalDetector 2.0 predicts binding to transition metals [6], which make up about 66% of the PDB metallo-chains and include iron and zinc, the two most fundamental ions for cellular functioning. Transition metals usually form coordinate covalent bonds with the protein ligands, showing much higher binding affinities than the electrostatic interactions typical of alkali and alkaline earth metals. Asp and Glu occur in metal binding sites (MBS) rarely, relative to their natural frequency of occurrence in proteins. By focusing on CYS and HIS only, we cover about 74% of transition metal ligands. For reasons of computational efficiency, it is assumed that each ligand binds exactly one ion. This is almost always the case for CYS and HIS.
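The restriction to CYS and HIS and the one-ion-per-ligand assumption can be illustrated with a minimal Python sketch. The function names, the example sequence and the ion indices are hypothetical and chosen only for illustration; this is not the MetalDetector code or its output format. Storing the prediction as a mapping from ligand position to ion index makes the "exactly one ion per ligand" assumption hold by construction.

```python
from collections import defaultdict


def candidate_ligands(sequence):
    """Return the 1-based positions of CYS (C) and HIS (H) residues."""
    return [i for i, aa in enumerate(sequence, start=1) if aa in "CH"]


def group_into_sites(ligand_to_ion):
    """Group ligand positions by ion index.

    The input maps each ligand position to a single ion index, so no
    ligand can be attached to more than one ion (hypothetical format).
    """
    sites = defaultdict(list)
    for position, ion in sorted(ligand_to_ion.items()):
        sites[ion].append(position)
    return dict(sites)


if __name__ == "__main__":
    seq = "MKCPHCGAHTC"                     # made-up sequence
    print(candidate_ligands(seq))           # positions of C and H
    # Made-up assignment: residue 5 is left out, i.e. predicted free.
    print(group_into_sites({3: 0, 6: 0, 9: 1, 11: 1}))
```

Residues absent from the mapping are simply treated as not metal-bound, which mirrors the two-state (metal-bound or not) prediction described above.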

Method

The underlying methods consist of supervised machine learning algorithms for structured-output prediction. In the first stage, SVM-HMM is used to predict the metal bonding state of each CYS and HIS. The algorithm uses dynamic programming to find the best overall (collective) bonding state assignment to all CYS and HIS in the input chain. In the second stage, residues predicted to be ligands are grouped together to form binding sites. This is again achieved by structured-output learning, but using an ad-hoc algorithm that exploits the assumption that each ligand binds exactly one ion. Thanks to this assumption, the maximum-score collective assignment can be obtained exactly using a computationally fast greedy algorithm. An overview of the method can be found in [1] while the gory algorithmic details are explained in [2,3].
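The dynamic programming step of the first stage can be sketched as a generic Viterbi-style decoder over the candidate residues. This is only an illustration under stated assumptions: the per-residue scores and the transition scores below are made up, whereas in SVM-HMM they would come from learned weights, and the function name viterbi is hypothetical rather than part of the MetalDetector software.

```python
def viterbi(emission, transition):
    """Return the label sequence with the maximum total score.

    emission[t][s]   : score of giving state s to the t-th candidate residue
    transition[p][s] : score of moving from state p to state s
    States here are 0 = free and 1 = metal-bound (illustrative encoding).
    """
    if not emission:
        return []
    n_states = len(transition)
    best = [list(emission[0])]   # best[t][s]: best score of a path ending in s
    back = []                    # back-pointers for t = 1 .. n-1
    for t in range(1, len(emission)):
        row, ptr = [], []
        for cur in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: best[-1][p] + transition[p][cur])
            row.append(best[-1][prev] + transition[prev][cur] + emission[t][cur])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: best[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Made-up scores for three candidate residues.
    emission = [[0.2, 1.0], [0.6, 0.5], [0.1, 0.9]]
    transition = [[0.0, -0.5], [-0.5, 0.3]]
    independent = [max(range(2), key=lambda s: e[s]) for e in emission]
    print(independent)                    # [1, 0, 1]
    print(viterbi(emission, transition))  # [1, 1, 1]
```

With these scores, labelling each residue independently leaves the middle residue free, while the collective assignment marks all three as metal-bound because the transition scores favour consecutive bound residues. That difference is what "collective" refers to in the paragraph above.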

Accuracy

MetalDetector 2.0 was evaluated using a rather stringent criterion: training and testing chains belong to different SCOP folds, to ensure a very low redundancy between training and testing data. Prediction performance was evaluated using several measures:

In the table below, m* is the true number of metal binding sites and N the number of chains in the data set.

The following table gives a breakdown of results on most frequent metal ions. N is the number of chains containing at least one ion of that type.

Authors

MetalDetector 2.0 was developed by Andrea Passerini, Marco Lippi and Paolo Frasconi.

Software

A standalone version of the MetalDetector 2.0 prediction software can be downloaded here. The software has only been tested on Linux.

Data set

The complete list of the data sets used for MetalDetector 2.0 evaluation can be downloaded here.

References

For MetalDetector v2.0 please cite
[1] A. Passerini, M. Lippi, P. Frasconi (2011). MetalDetector v2.0: Predicting the Geometry of Metal Binding Sites from Protein Sequence. Nucleic Acids Research Web Server Issue. Vol. 39 (suppl 2): W288-W292.
The binding site prediction algorithm used by MetalDetector v2.0 is described in
[2] A. Passerini, M. Lippi, P. Frasconi (2011). Predicting Metal Binding Sites from Protein Sequence. IEEE/ACM Transactions on Computational Biology and Bioinformatics. PrePrint.
The initial version of the algorithm is described in
[3] P. Frasconi and A. Passerini (2009). Predicting the Geometry of Metal Binding Sites from Protein Sequence. In Advances in Neural Information Processing Systems 21, pp. 465-472.
The use of MetalDetector v1.0 for leveraging X-ray Absorption Spectroscopy experimental metalloprotein annotations is described in
[4] Shi W, Punta M, Bohon J, Sauder JM, D'Mello R, Sullivan M, Toomey J, Abel D, Lippi M, Passerini A, Frasconi P, Burley SK, Rost B, Chance MR. (2011). Characterization of metalloproteins by high-throughput X-ray absorption spectroscopy. Genome Research 21:898-907.
The MetalDetector v1.0 paper:
[5] M. Lippi, A. Passerini, M. Punta, B. Rost, P. Frasconi (2008). MetalDetector: a web server for predicting metal binding sites and disulfide bridges in proteins from sequence. Bioinformatics. 24(18):2094-2095.
Metal Ligand Predictor:
[6] A. Passerini, M. Punta, A. Ceroni, B. Rost, and P. Frasconi (2006). Identifying Cysteines and Histidines in Transition-Metal-Binding Sites Using Support Vector Machines and Neural Networks. Proteins: Structure, Function, and Bioinformatics 65(2):305-316.
For DISULFIND see also:
[7] A. Ceroni, A. Passerini, A. Vullo and P. Frasconi (2006). DISULFIND: a Disulfide Bonding State and Cysteine Connectivity Prediction Server. Nucleic Acids Research, 34(Web Server issue):W177-W181.