MetalDetector v1.0 was only able to predict CYS and HIS bonding state. MetalDetector v2.0 takes a protein chain sequence as input and predicts the number of bound metal ions and, for each ion, its CYS and HIS ligands in the sequence. This is not a full 3D characterization of the site geometry, with angles and distances, but rather a prediction of the coordination relationship between ions and their ligands. The bonding state of each CYS and HIS is predicted in two states (metal-bound or not). When the chain is not known to belong to a metalloprotein, the bonding state of CYS is predicted in three states (disulfide-bound/metal-bound/free). Chains with both disulfide bridges and metal binding sites are very rare (less than 3%); in these cases, disulfide-bound CYS are not predicted if the option « known metalloprotein » is selected.
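As a small illustration of the input side, the sketch below lists the candidate ligands (every CYS and HIS) in a chain given as a one-letter amino acid string. The function name `candidate_ligands` and the example sequence are made up for illustration and are not part of MetalDetector.

```python
from typing import List, Tuple

# One-letter codes of the two residue types considered as candidate ligands.
CANDIDATE_RESIDUES = {"C": "CYS", "H": "HIS"}


def candidate_ligands(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, residue name) for every CYS and HIS."""
    return [
        (position, CANDIDATE_RESIDUES[aa])
        for position, aa in enumerate(sequence.upper(), start=1)
        if aa in CANDIDATE_RESIDUES
    ]


# Hypothetical sequence, used only to show the shape of the result.
example_sequence = "MKCPHAACGH"
print(candidate_ligands(example_sequence))
# [(3, 'CYS'), (5, 'HIS'), (8, 'CYS'), (10, 'HIS')]
```

A prediction can then be thought of as a bonding-state label for each of these positions plus a grouping of the metal-bound positions into sites, one group per ion.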
MetalDetector 2.0 predicts binding to transition metals, which make up about 66% of the PDB metallo-chains and include iron and zinc, the two most fundamental ions for cellular functioning. Transition metals usually form coordinate covalent bonds with the protein ligands, showing much higher binding affinities than the electrostatic interactions typical of alkali and alkaline earth metals. Asp and Glu are rarely found in metal binding sites (MBS) when compared to their natural frequency of occurrence in proteins. By focusing on CYS and HIS only, we cover about 74% of transition metal ligands. For reasons of computational efficiency, it is assumed that each ligand binds exactly one ion. This is almost always the case for CYS and HIS.
The underlying methods consist of supervised machine learning algorithms for structured-output prediction. In the first stage, SVM-HMM is used to predict the metal bonding state of each CYS and HIS. The algorithm uses dynamic programming to find the best overall (collective) bonding state assignment for all CYS and HIS in the input chain. In the second stage, residues predicted to be ligands are grouped together to form binding sites. This is again achieved by structured-output learning, but using an ad-hoc algorithm that exploits the assumption that each ligand binds exactly one ion. Thanks to this assumption, the maximum-score collective assignment can be obtained exactly using a computationally fast greedy algorithm. An overview of the method can be found in  while the gory algorithmic details are explained in [2,3].
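To give a feel for what "collective" assignment by dynamic programming means in the first stage, here is a minimal, generic Viterbi sketch over the candidate residues. It is not the SVM-HMM implementation: the function name `viterbi`, the two-label encoding (0 = not metal-bound, 1 = metal-bound) and all scores are invented for illustration, whereas in the real system the scores would come from the trained model.

```python
from typing import List, Sequence


def viterbi(
    emission: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
) -> List[int]:
    """Return the label sequence with the highest total score.

    emission[t][s]   : score of giving label s to the t-th candidate residue
    transition[p][s] : score of label p being followed by label s
    """
    n = len(emission)
    if n == 0:
        return []
    k = len(emission[0])

    best = [list(emission[0])]      # best[t][s]: best score of a path ending in s at t
    back: List[List[int]] = []      # back[t-1][s]: best previous label for s at t
    for t in range(1, n):
        row, ptr = [], []
        for cur in range(k):
            prev = max(range(k), key=lambda p: best[-1][p] + transition[p][cur])
            row.append(best[-1][prev] + transition[prev][cur] + emission[t][cur])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)

    # Trace back from the best final label.
    state = max(range(k), key=lambda s: best[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Made-up scores for four candidate residues (columns: label 0, label 1).
emission = [[0.2, 1.0], [0.6, 0.5], [0.1, 0.9], [1.2, 0.0]]
# Made-up transition scores that mildly favour neighbours sharing a label.
transition = [[0.3, 0.0], [0.0, 0.3]]

print(viterbi(emission, transition))  # [1, 1, 1, 0]
```

With these toy scores, labelling each residue independently would give `[1, 0, 1, 0]`, because the second residue slightly prefers label 0 on its own. The collective decoding instead returns `[1, 1, 1, 0]`, since agreeing with its neighbours yields a higher total score.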
MetalDetector 2.0 was evaluated using a rather stringent criterion: training and testing chains belong to different SCOP folds, to ensure a very low redundancy between training and testing data. Prediction performance was evaluated using several measures:
In the table below, m* is the true number of metal binding sites and N the number of chains in the data set.
The following table gives a breakdown of results on most frequent metal ions. N is the number of chains containing at least one ion of that type.
A standalone version of the MetalDetector 2.0 prediction software can be downloaded here. The software has only been tested on Linux.
The complete list of the data sets used for MetalDetector 2.0 evaluation can be downloaded here.