1. Ferraro E, Ausiello G, Panni S, Cesareni G, Helmer-Citterich M
Definition of a neural strategy for the prediction of protein interaction specificity
Meeting: BITS 2004 - Year: 2004
Abstract: We are developing a neural network strategy for the prediction of peptide recognition specificity by SH3 domains. As a training set we use the results of a large number of SH3-peptide binding experiments obtained by the SPOT synthesis technique (PepSPOT). As input for the neural network, we consider the sequences of both the domain and the hypothetical ligand peptide, in order to infer, for each domain-peptide combination, the likelihood that they form a complex in a binding reaction. The method will be applied to predict the affinity of any peptide for domains of unknown specificity. We analyzed data from PepSPOT experiments for nine SH3 domains, each tested against several hundred peptides, and built a dataset in which each data point includes the domain and peptide sequences and a figure in arbitrary BLU units that correlates with binding affinity. In order to translate this information into a format that can be easily processed by a neural network, we focused on three main problems: i) the information coding; ii) the dimension of the input space; iii) the correct identification of the two classes (binding and non-binding). We decided to use the orthogonal representation of the sequences and, in order to reduce the very high dimensionality of the input, we considered only those domain residues that make contact with the ligand peptide. The contact positions are identified from the analysis of SH3-peptide complexes of known structure and extended to other SH3 domains of known sequence by multiple alignment. For the peptide sequences we restricted our representation to the most significant positions, excluding the two consensus prolines from the input. Finally, we defined the binding class as all the peptides that show a spot intensity higher than 10000 BLU units.
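The coding scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example sequences and the position indices are all hypothetical, chosen only to show how orthogonal (one-hot) coding of selected domain and peptide positions and the 10000 BLU threshold could fit together.

```python
# Minimal sketch of orthogonal (one-hot) coding for a domain-peptide pair.
# All names, sequences and position indices are hypothetical illustrations.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

BINDING_THRESHOLD_BLU = 10000  # threshold stated in the abstract


def one_hot(sequence):
    """Encode each residue as 20 bits with exactly one bit set to 1."""
    vector = []
    for residue in sequence:
        bits = [0] * len(AMINO_ACIDS)
        bits[AA_INDEX[residue]] = 1
        vector.extend(bits)
    return vector


def select_positions(sequence, positions):
    """Keep only the residues at the given 0-based positions."""
    return "".join(sequence[p] for p in positions)


def encode_pair(domain_seq, contact_positions, peptide_seq, peptide_positions):
    """Concatenate the one-hot codes of the selected domain and peptide residues."""
    domain_part = select_positions(domain_seq, contact_positions)
    peptide_part = select_positions(peptide_seq, peptide_positions)
    return one_hot(domain_part + peptide_part)


def label(intensity_blu):
    """Return 1 (binding) if the spot intensity exceeds the threshold, else 0."""
    return 1 if intensity_blu > BINDING_THRESHOLD_BLU else 0


# Hypothetical example: 3 contact positions and 4 peptide positions.
features = encode_pair("ALYDYEARTE", [2, 4, 7], "APPLPPRNRP", [0, 3, 6, 8])
```

With 3 domain positions and 4 peptide positions the input vector has 7 x 20 = 140 components, which shows why restricting the domain to its contact residues keeps the input space manageable.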
The resulting dataset was strongly unbalanced, which calls for different methodological strategies: standard feed-forward neural networks require a balanced training set, whereas kernel methods (support vector machines) can classify even unbalanced sets, provided that a suitable non-linear kernel is chosen. We will compare the performance of the neural strategy with that of regular expressions, position weight matrices, position-specific scoring matrices (PSSMs) and the SPOT procedure.
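The abstract does not say how the training set is balanced for the feed-forward network. One common option, shown here purely as an assumption, is random undersampling of the majority class; the function name and the toy data are hypothetical.

```python
# Sketch of balancing a two-class training set by random undersampling.
# This is one possible technique, not necessarily the one used by the authors.
import random


def undersample(samples, labels, seed=0):
    """Return a balanced subset: all minority examples plus an equally
    sized random draw from the majority class, in shuffled order."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 3 binders and 17 non-binders (hypothetical counts).
toy_samples = [f"peptide_{i}" for i in range(20)]
toy_labels = [1] * 3 + [0] * 17
balanced_samples, balanced_labels = undersample(toy_samples, toy_labels)
```

Undersampling discards most of the non-binding examples, which is the cost that motivates trying methods able to learn from the unbalanced set directly.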