BITS Meetings' Virtual Library

Abstracts from Italian Bioinformatics Meetings from 1999 to 2013

766 abstracts overall from 11 distinct proceedings

1. Ceroni A, Frasconi P
On the Role of Long-Range Dependencies in Learning Protein Secondary Structure
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: Prediction of protein secondary structure (SS) is a classic problem in computational molecular biology and one of the first successful applications of machine learning to bioinformatics. Most available prediction methods use feedforward neural networks whose input is the multiple-alignment profile in a sliding window of residues centered on the target position. By construction, predictions obtained with these methods are local. Long-range dependencies, on the other hand, clearly play an important role in this problem. Earlier work proposed bidirectional recurrent neural networks (BRNNs) for SS prediction. This architecture processes the sequence as a whole and “translates” the input profile at each position into a corresponding output prediction for that position. In theory, the output at any position of a BRNN depends on the entire input sequence, so a BRNN could exploit long-range information. Unfortunately, the well-known vanishing-gradient problem prevents these dependencies from being learned in practice. In this paper, we are interested in developing an architecture that can effectively exploit long-range dependencies, assuming some additional information is available to the learner. We start from a simple, intuitive argument: if the learner had access to information about which position pairs are expected to interact, its task would be greatly simplified and it could plausibly succeed. In the case of SS prediction, a reasonable source of information about long-range interactions is the contact map (CM), a graphical representation of the spatial neighborhood relation among amino acids. Of course, obtaining a CM requires knowing the protein structure. In addition, it is well known that the coordinates of backbone atoms can be reconstructed from CMs.
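The contrast between a sliding-window predictor and a BRNN can be illustrated with a toy, pure-Python sketch. It is not the model used in the paper. It assumes scalar inputs, a window mean standing in for a feedforward network, and hypothetical fixed weights `w_in` and `w_rec` with no training. The function names are illustrative only. A perturbation at the far end of the sequence cannot reach the windowed prediction, but it does reach the BRNN output. Its influence shrinks with distance because each recurrent step multiplies it by a factor below one, and the same product of per-step factors is what makes gradients vanish during training.

```python
import math
from typing import List, Sequence


def sliding_window_predict(x: Sequence[float], t: int, half_width: int = 2) -> float:
    """Toy local predictor: the mean of the inputs in a window centered on t.

    Inputs outside [t - half_width, t + half_width] cannot affect the result.
    """
    lo, hi = max(0, t - half_width), min(len(x), t + half_width + 1)
    window = x[lo:hi]
    return sum(window) / len(window)


def brnn_outputs(x: Sequence[float], w_in: float = 0.5, w_rec: float = 0.9) -> List[float]:
    """Toy bidirectional recurrence with scalar states and fixed (untrained) weights.

    The forward state at t summarizes x[0..t] and the backward state summarizes
    x[t..n-1], so their combination depends on the whole sequence.
    """
    n = len(x)
    forward = [0.0] * n
    h = 0.0
    for t in range(n):  # left-to-right pass
        h = math.tanh(w_in * x[t] + w_rec * h)
        forward[t] = h
    backward = [0.0] * n
    h = 0.0
    for t in reversed(range(n)):  # right-to-left pass
        h = math.tanh(w_in * x[t] + w_rec * h)
        backward[t] = h
    # Combine both contexts by summation (a real BRNN uses a learned output layer).
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    base = [0.0] * 20
    far = base.copy()
    far[19] = 1.0   # perturb the last position
    near = base.copy()
    near[1] = 1.0   # perturb a position next to position 0
    print(sliding_window_predict(far, 0) - sliding_window_predict(base, 0))  # 0.0
    print(brnn_outputs(far)[0] - brnn_outputs(base)[0])   # small but non-zero
    print(brnn_outputs(near)[0] - brnn_outputs(base)[0])  # noticeably larger
```

With `w_rec` below one, the far perturbation reaches position 0 only after being damped at every one of the intervening steps. This illustrates why long-range information is hard for a plain BRNN to exploit.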
Thus, in a sense, using CM information to predict SS might appear foolish, since most of the information about the 3D structure of the protein is already contained in the map. However, the following considerations suggest that this setting is worth investigating:
• Algorithms that reconstruct structure from CMs are based on a potential energy function with many local minima, whose optimization is not straightforward. Thus it is not clear that a supervised learning algorithm can actually learn to recover SS from CMs.
• CMs can be predicted from sequence or obtained from structures predicted by ab initio methods such as Rosetta. Although the accuracy of present methods is certainly not sufficient to provide a satisfactory solution to the folding problem, predicted maps may still contain useful information for improving the prediction of lower-order properties such as SS.
• Even if CMs are given, designing a learning algorithm that fully exploits their information content is not straightforward. For example, Meiler and Baker have shown that SS prediction can be improved by using information about inter-residue distances. Their architecture is a feedforward network fed with average property profiles of the amino acids that are near in space to the target position. In this way, the relative ordering among neighbors in the CM is discarded.

The solution proposed in this paper is based on an extended architecture that receives, as an additional input, a graphical description of the pairwise interactions between sequence positions. We call this architecture the interaction-enriched BRNN (IEBRNN). Its details are presented in a longer version of this paper.
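Two ideas from the abstract can be sketched in pure Python: a contact map, and a neighbor-averaged input of the kind attributed to Meiler and Baker. The sketch makes several assumptions that the abstract does not specify:

- one 3D coordinate per residue, for instance the C-alpha atom;
- an 8 Å distance cutoff, which is a common convention;
- a simplified averaging scheme, not Meiler and Baker's actual feature set.

`contact_map` and `neighbor_average_profile` are hypothetical names. The example also shows concretely why ordering is lost: swapping the profiles of two neighbors leaves the averaged input unchanged.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], threshold: float = 8.0) -> List[List[bool]]:
    """Boolean contact map: residues i != j are in contact when their
    (assumed C-alpha) coordinates are closer than `threshold` angstroms."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) < threshold
             for j in range(n)] for i in range(n)]


def neighbor_average_profile(profiles: Sequence[Sequence[float]],
                             cmap: Sequence[Sequence[bool]], t: int) -> List[float]:
    """Hypothetical helper: average the profile vectors of all residues in contact with t.

    Averaging is symmetric in its arguments, so the relative ordering of the
    neighbors along the sequence is lost.
    """
    dim = len(profiles[t])
    neighbors = [j for j, in_contact in enumerate(cmap[t]) if in_contact]
    if not neighbors:
        return [0.0] * dim
    return [sum(profiles[j][k] for j in neighbors) / len(neighbors) for k in range(dim)]


if __name__ == "__main__":
    # Toy straight chain with 3.8 angstrom spacing between consecutive residues.
    coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    cmap = contact_map(coords)
    print(cmap[0])  # [False, True, True, False, False]
    profiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
    swapped = [profiles[0], profiles[2], profiles[1], profiles[3], profiles[4]]
    print(neighbor_average_profile(profiles, cmap, 0))  # [0.5, 1.0]
    print(neighbor_average_profile(swapped, cmap, 0))   # [0.5, 1.0], same despite the swap
```

An architecture that keeps track of *which* position each contact refers to, as the IEBRNN is described as doing, avoids this loss of ordering. The details of that architecture are not given in the abstract and are not reproduced here.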

Driven by Librarian 1.3 in a PHP, MySQL™ and Apache environment.

For information, email paolo.dm.romano@gmail.com.