BITS Meetings' Virtual Library:
Abstracts from Italian Bioinformatics Meetings from 1999 to 2013

766 abstracts overall from 11 distinct proceedings

1. Milanesi L, Rogozin I, Rizzi R
Application of ESTs mapping to improve gene prediction methods
Meeting: BIOCOMP 2000 - Year: 2000
Topic: Sequence analysis

Abstract: Prediction of protein-coding genes in newly sequenced DNA has become very important in large genome sequencing projects. The problem is complicated by the exon-intron structure of eukaryotic genes. Existing collections of expressed sequence tags (ESTs) are very large and thus very useful for gene mapping.

Gene identification in newly discovered DNA sequences is an important problem in current molecular biology. A number of programs have been developed for predicting protein-coding genes. The most common approach combines potential functional signals with global statistical properties of protein-coding regions. Another approach to gene structure prediction is based on homology detection against databases of nucleotide or amino acid sequences. Using the information available on homologous protein sequences, it is possible to significantly improve the accuracy of gene structure prediction. Existing EST collections can likewise be used for gene mapping, through homology searches against the EST Division of GenBank (dbEST) and the Unigene database.

ESTs offer a rapid route to gene identification (Adams et al., 1991; Adams et al., 1992) and to the analysis of expression and regulation data, and can highlight multigene family diversity and alternative splicing. EST matches may identify more than half of the known human genes (Hillier et al., 1996). The price of the high-volume, high-throughput nature of the data, however, is that ESTs contain high error rates (Aaronson et al., 1996), do not have a defined protein product, are not well annotated, and present only a raw substrate for sequence matching.

The ESTMAP system involves the following procedures:

1) Repeat masking. Repeated elements (for example, the human Alu elements) can be automatically masked in a query sequence before the homology search. Homology searches against a collection of repeated elements (Jurka et al., 1992) are used to detect repeats. We implemented a program, REPEAT, for that purpose. REPEAT automatically produces a censored sequence (with 'N's in place of repeated elements).

2) Homology searches. BLASTN (Altschul et al., 1990) is used for homology searches of the censored query sequence against the EST Division of GenBank (dbEST) and the Unigene database of sequences (www. This step is the most time-consuming, since these EST datasets are very large.

3) EST mapping. The BLASTN output is used as input by the EST_GENE program. Information about an EST sequence is used only when the similarity between the EST sequence and the query sequence is greater than 95%. The EST_GENE module can also predict introns in DNA by comparing ESTs with a query sequence, based on the alignment method suggested by Huang (1994) (a linear-space divide-and-conquer strategy). EST_GENE uses the GT/AG splice-site rule; however, non-canonical splicing signals (Milanesi and Rogozin, 1998) can also be predicted in cases of unambiguous alignment.

4) Output of results. Graphical visualization of the results is particularly important for the analysis of alternative splicing in a query sequence. Using a Java-based graphical interface, the user can visualize the EST maps and the sequence pattern of predicted features.

Homology searches are very important for functional mapping: homology with a known functional region can suggest the function of a query sequence. In particular, when the homologous protein sequence is already known and EST matches are detected, the gene structure can be reconstructed with high accuracy. Information about EST matches is automatically used by the GeneBuilder system (Milanesi et al., 1999).

Acknowledgment: This work was supported by the Italian CNR Genetic Engineering Project.
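
As an illustration only, the masking and EST-mapping steps of the ESTMAP procedure can be sketched in a few lines of Python. This is not the REPEAT or EST_GENE code, which the abstract does not show. The names `mask_repeats`, `EstHit`, `filter_hits` and `is_canonical_intron` are hypothetical, and the sketch assumes that repeat positions and BLASTN percent identities have already been computed elsewhere.

```python
from typing import List, NamedTuple, Tuple


def mask_repeats(seq: str, repeats: List[Tuple[int, int]]) -> str:
    """Return a censored copy of seq with each repeat replaced by 'N's.

    Each repeat is a half-open (start, end) interval in 0-based
    coordinates. Assumption: the intervals come from a prior search
    against a repeat collection.
    """
    masked = list(seq)
    for start, end in repeats:
        # Clamp to the sequence bounds so bad intervals cannot raise.
        for i in range(max(start, 0), min(end, len(seq))):
            masked[i] = "N"
    return "".join(masked)


class EstHit(NamedTuple):
    """Hypothetical record for one EST match taken from BLASTN output."""
    est_id: str
    identity: float  # percent identity between EST and query


def filter_hits(hits: List[EstHit], min_identity: float = 95.0) -> List[EstHit]:
    """Keep only hits whose identity is strictly greater than the cutoff."""
    return [h for h in hits if h.identity > min_identity]


def is_canonical_intron(seq: str, start: int, end: int) -> bool:
    """Check the GT/AG rule for the candidate intron seq[start:end]."""
    intron = seq[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


if __name__ == "__main__":
    # Toy query: two short exons around one intron (positions 6 to 17).
    query = "ATGGCC" + "GTAAGTTTTCAG" + "GCTTAA"
    print(mask_repeats(query, [(0, 3)]))
    print(filter_hits([EstHit("est1", 98.2), EstHit("est2", 90.1)]))
    print(is_canonical_intron(query, 6, 18))
```

The strict `>` comparison mirrors the abstract's wording that an EST is used only when similarity is greater than 95%. Non-canonical splice signals, which the abstract says EST_GENE can also predict, are outside this sketch.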

