1. Guffanti A, Banfi S, Borsani G, Simon G
Strategies and Tools for EST Data Mining
Meeting: BIOCOMP 1999 - Year: 1999
Abstract: Expressed Sequence Tags (ESTs) are an important source of information for laboratories interested in identifying novel gene sequences. We developed bioinformatic strategies and tools that rely on dbEST sequence data mining to support disease gene identification both at TIGEM and worldwide. One example of systematic human EST analysis is the DRES (Drosophila Related Expressed Sequences) project. As a starting strategy, we applied the power of Drosophila genetics to identify novel human genes of high biological interest. Sixty-six human cDNAs (called DRES clones) showing significant homology to Drosophila mutant genes were identified by screening dbEST with keywords, and their map positions were determined experimentally. Based on this approach we developed the "DRES Search Engine", a tool for the systematic identification of human cDNAs homologous to Drosophila genes through an automated sequence database searching procedure. The homepage of the DRES project is at the WWW address http://www.tigem.it/LOCAL/drosophila/dros.html.
Other tools for researchers who want to maximize the information associated with a single cDNA sequence are freely available at the WWW address http://www.tigem.it/LOCAL/sequtils.html :
- the "In Situ Blast" server performs a library-specific (and consequently tissue-specific) Blast search against one or more given cDNA libraries belonging to the UniGene EST cluster database;
- the "UniBlast" server performs a local Blast search against the UniGene database or against UniNewGene, a locally generated version of UniGene devoid of all the clusters containing an already known mRNA or coding sequence;
- the "EST Assembly Machine" and the "EST Extractor" build sequence contigs (corresponding to "virtual transcripts") from the UniGene EST cluster database or from dbEST respectively, starting from a sequence Accession Number or a plain DNA/Protein sequence.
This contig-building procedure extends the information available for a given cDNA sequence through repeated cycles of sequence comparison, ideally providing the sequence of a full-length transcript starting from a single query sequence.
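The repeated extension cycles described above can be illustrated with a minimal sketch. This is not the code of the EST Assembly Machine or the EST Extractor: it assumes an in-memory list of EST strings in place of a database, replaces the Blast comparison with exact suffix/prefix matching, and extends only in the 3' direction. The names `find_overlap`, `extend_contig`, `min_overlap` and `max_cycles` are hypothetical and chosen for illustration.

```python
def find_overlap(contig: str, est: str, min_overlap: int) -> int:
    """Length of the longest suffix of `contig` that equals a prefix of `est`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    Exact matching is an assumption; a real pipeline would use an
    alignment tool that tolerates sequencing errors.
    """
    max_len = min(len(contig), len(est))
    for k in range(max_len, min_overlap - 1, -1):
        if contig.endswith(est[:k]):
            return k
    return 0


def extend_contig(seed: str, ests: list[str],
                  min_overlap: int = 8, max_cycles: int = 50) -> str:
    """Grow a contig from `seed` by repeated cycles of comparison.

    In each cycle the EST with the longest overlap that also adds new
    bases is appended; the loop stops when no EST extends the contig
    or when `max_cycles` is reached.
    """
    contig = seed
    remaining = list(ests)
    for _ in range(max_cycles):
        best_est, best_k = None, 0
        for est in remaining:
            k = find_overlap(contig, est, min_overlap)
            # Keep only ESTs that overlap and contribute at least one new base.
            if k > best_k and len(est) > k:
                best_est, best_k = est, k
        if best_est is None:
            break  # no further extension possible
        contig += best_est[best_k:]
        remaining.remove(best_est)
    return contig
```

Calling `extend_contig(seed, ests, min_overlap=6)` with a short seed and a handful of overlapping fragments returns the seed extended by every fragment that chains onto it, while unrelated fragments are ignored. The stopping condition mirrors the "ideally full-length" caveat: the result is only as long as the available overlapping ESTs allow.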