BITS Meetings' Virtual Library

Abstracts from Italian Bioinformatics Meetings from 1999 to 2013

766 abstracts overall from 11 distinct proceedings

1. Toppo S, Fontana P, Velasco R, Valle G, Tosatto SCE
FOX (FOld eXtractor): A novel protein fold recognition method using iterative PSI-BLAST searches and structural alignments
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: We present a novel fold recognition method based on the combination of detailed sequence searches and structural information. The protocol currently implements two different approaches to assign the correct fold to the target protein sequence: the first is based on a database search of secondary structures, the second on an iterative database sequence search. In the first phase, a secondary structure prediction of the target is performed with the ConSSPred protocol. This prediction is used to search for hits against a database of known secondary structures extracted from the PDB (using DSSP). The search follows a two-step strategy. The first step is a Smith-Waterman local secondary structure similarity search with a substitution matrix optimized for secondary structure alignment. The second step is a global alignment based on SSEA (Secondary Structure Element Alignment), as implemented in our program MANIFOLD, which refines the score and the alignment itself in the region extracted by the first step. At the end of the first phase, a list of hits that share a similar secondary structure topology with the target sequence is extracted. The second phase is based on a modified version of the SENSER protocol for scanning the sequence database. At the beginning of the second phase, BLASTP is used to scan the target sequence against the NR database. These initial hits are clustered to reduce sequence bias, and a seed alignment of 20 or fewer sequences is generated. This step allows PSI-BLAST to be jump-started with a more sensitive initial profile, increasing its sequence diversity. PSI-BLAST is run for four iterations (e-value inclusion threshold 10e-3) on the NR60 database of known sequences. NR60 is produced by applying the CD-HIT algorithm to cluster the NR database at 60% sequence identity.
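The first step of the secondary structure search is a Smith-Waterman local alignment over secondary structure strings. The minimal Python sketch below illustrates that idea on three-state strings (H = helix, E = strand, C = coil), assuming a linear gap penalty. The substitution scores, the gap value and the function name `smith_waterman_ss` are hypothetical placeholders, not the optimized matrix or code used in FOX, and the SSEA global refinement performed by MANIFOLD is not shown.

```python
# Hypothetical 3-state substitution scores (H = helix, E = strand, C = coil).
# Illustrative placeholders only, NOT the optimized matrix used by FOX.
SS_MATRIX = {
    ("H", "H"): 2, ("E", "E"): 3, ("C", "C"): 1,
    ("H", "E"): -3, ("E", "H"): -3,
    ("H", "C"): -1, ("C", "H"): -1,
    ("E", "C"): -1, ("C", "E"): -1,
}
GAP = -2  # hypothetical linear gap penalty


def smith_waterman_ss(a: str, b: str) -> tuple[int, str, str]:
    """Best local alignment of two secondary structure strings.

    Returns (score, aligned_a, aligned_b); gaps are written as '-'.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    # Fill the dynamic programming matrix, clamping negative scores to 0 (local alignment).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + SS_MATRIX[(a[i - 1], b[j - 1])],
                H[i - 1][j] + GAP,
                H[i][j - 1] + GAP,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + SS_MATRIX[(a[i - 1], b[j - 1])]:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman_ss("CCCHHHCCC", "EEEHHHEEE"))  # (6, 'HHH', 'HHH')
```

In this example only the shared helical segment is reported, because extending the alignment into the mismatched coil/strand flanks would lower the score; this is the behaviour that lets a local search pick out a matching topology region before a global refinement step.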
Sequences from NR60 that hit the query are assigned either to the significant sequence space (e-value <= 10e-3) or to the trailing end (e-value <= 10) for further use. The profile is then used to search the PDBAA database of sequences with known structure. If a significant PDBAA hit (e-value <= 10) is found, the protocol proceeds to the back-validation step (see below). If no significant hit is found, or the hit does not back-validate, a new PSI-BLAST search, using the above "4+1" protocol (four iterations on the sequence database followed by one PDBAA search) on NR and PDBAA, is started from the highest-ranking sequence (i.e. the one with the lowest e-value) in the significant sequence space. Sequences from NR60 matching this query are also assigned to either the significant sequence space or the trailing end, and significant PDBAA hits are again submitted to back-validation. If no significant PDBAA hit is recorded and the significant sequence space has been exhausted, the protocol uses the trailing-end sequences as additional starting points for PSI-BLAST searches. Unlike the previous sequences, which were assumed to be similar enough to the target to imply homology, these sequences are submitted to back-validation before the "4+1" PSI-BLAST protocol is run on them. The back-validation step consists of using PSI-BLAST to find the target starting from a different query sequence, found as described above. This is necessary because PSI-BLAST is asymmetric: if sequence A finds sequence B, B does not necessarily find A. Sequences that back-validate are more likely to be correct hits. Once a sequence from PDBAA back-validates and its secondary structure is compatible with that of the target sequence as found in the first phase, the protocol builds a target-to-template alignment and stops. The procedure described so far serves to identify a template structure for the target sequence. To produce an accurate alignment, HMMER is used to build a hidden Markov model (HMM) based on the HOMSTRAD sequence alignment.
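The control flow of the second phase (significant sequence space, trailing end and back-validation) can be sketched as a loop over candidate queries. In this minimal Python sketch, `search(query)` is a hypothetical stand-in for one "4+1" PSI-BLAST run that returns two dictionaries mapping sequence IDs to e-values (NR60 hits and PDBAA hits); `find_template`, `back_validates` and the toy data are illustrative names only. It assumes that a sequence "finds" the target when the target appears among its hits within the significant threshold, and it omits the secondary structure compatibility check against the first phase.

```python
import heapq
from typing import Callable, Optional

# Thresholds from the abstract. "10e-3" is copied verbatim; Python reads it as 0.01.
SIGNIFICANT_EVALUE = 10e-3
TRAILING_EVALUE = 10.0
PDB_HIT_EVALUE = 10.0

# Hypothetical search interface: query -> (NR60 hits, PDBAA hits), each {seq_id: e-value}.
Hits = dict[str, float]
SearchFn = Callable[[str], tuple[Hits, Hits]]


def back_validates(candidate: str, target: str, search: SearchFn) -> bool:
    """True if a search started from `candidate` finds `target` (assumed: within the significant threshold)."""
    nr_hits, _ = search(candidate)
    evalue = nr_hits.get(target)
    return evalue is not None and evalue <= SIGNIFICANT_EVALUE


def find_template(target: str, search: SearchFn) -> Optional[str]:
    """Return a back-validated PDBAA template ID, or None once the search space is exhausted."""
    seen = {target}
    significant: list[tuple[float, str]] = []  # min-heap: lowest e-value first
    trailing: list[tuple[float, str]] = []
    query = target
    while True:
        nr_hits, pdb_hits = search(query)
        # Assign new NR60 hits to the significant sequence space or the trailing end.
        for seq, evalue in nr_hits.items():
            if seq in seen:
                continue
            if evalue <= SIGNIFICANT_EVALUE:
                heapq.heappush(significant, (evalue, seq))
                seen.add(seq)
            elif evalue <= TRAILING_EVALUE:
                heapq.heappush(trailing, (evalue, seq))
                seen.add(seq)
        # Accept the best-scoring PDBAA hit that back-validates.
        for pdb_id, evalue in sorted(pdb_hits.items(), key=lambda kv: kv[1]):
            if evalue <= PDB_HIT_EVALUE and back_validates(pdb_id, target, search):
                return pdb_id
        # Next query: best significant sequence, else a back-validated trailing-end sequence.
        if significant:
            query = heapq.heappop(significant)[1]
            continue
        next_query = None
        while trailing and next_query is None:
            candidate = heapq.heappop(trailing)[1]
            if back_validates(candidate, target, search):
                next_query = candidate
        if next_query is None:
            return None
        query = next_query


# Toy search results with made-up IDs and e-values.
toy = {
    "target": ({"homA": 1e-20, "remoteB": 0.5}, {"1abcA": 5.0}),
    "homA": ({"target": 1e-18}, {}),
    "remoteB": ({"target": 1e-4}, {"2xyzB": 1e-6}),
    "2xyzB": ({"target": 2e-3}, {}),
}
print(find_template("target", lambda q: toy.get(q, ({}, {}))))  # 2xyzB
```

In the toy run, `1abcA` is rejected because it does not find the target back, `homA` exhausts the significant space without a structural hit, and the trailing-end sequence `remoteB` is back-validated before being used as a query, which is where the accepted template `2xyzB` appears. Keeping every candidate in a single seen-set guarantees that the loop terminates.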
The target is then aligned to the template using this HMM. Preliminary results indicate a clear increase in both detection rate and alignment accuracy for distantly homologous sequences. So far, FOX has been tested on the Fischer-68 test set to compare its performance with standard PSI-BLAST searches, GenTHREADER and the original SENSER protocol. As expected, introducing the secondary structure prediction of the target protein and the database secondary structure searches in the first phase has increased the detection sensitivity of the method compared to profile-based searches such as PSI-BLAST and the SENSER protocol (Fig. 1). The performance is comparable to GenTHREADER: the correct template structure is always found among the top 50 hits (Fig. 1). Further score optimization and development are required to conclusively test the entire protocol and to make the program available as a web server from our group's web site (http://protein.cribi.unipd.it/).
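The top-50 criterion used in the benchmark can be expressed as a simple ranking check. The sketch below, with hypothetical names `fraction_in_top_n`, `ranked_hits` and `correct_templates`, computes the fraction of targets whose correct template appears among the first n ranked hits. It is one plausible way to score such a test, not necessarily the evaluation the authors used, and no Fischer-68 data are included.

```python
def fraction_in_top_n(
    ranked_hits: dict[str, list[str]],
    correct_templates: dict[str, set[str]],
    n: int = 50,
) -> float:
    """Fraction of targets with at least one correct template in their top-n hits.

    ranked_hits maps each target ID to its hit IDs, best hit first.
    correct_templates maps each target ID to the set of acceptable templates.
    """
    if not ranked_hits:
        return 0.0
    found = sum(
        1
        for target, hits in ranked_hits.items()
        if correct_templates.get(target, set()) & set(hits[:n])
    )
    return found / len(ranked_hits)


# Toy example with made-up IDs (not Fischer-68 data).
ranked = {"t1": ["a", "b", "c"], "t2": ["x", "y"]}
correct = {"t1": {"c"}, "t2": {"z"}}
print(fraction_in_top_n(ranked, correct, n=2))  # 0.0
print(fraction_in_top_n(ranked, correct, n=3))  # 0.5
```

Under this measure, "always found in the top 50 hits" corresponds to `fraction_in_top_n(..., n=50) == 1.0`.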

BITS Meetings' Virtual Library
driven by Librarian 1.3 in PHP, MySQL™ and Apache environment.

For information, email to paolo.dm.romano@gmail.com .