1. Pavesi G, Stefani M, Mauri G, Pesole G
An algorithm for finding regulatory sequences of homologous genes
Meeting: BITS 2005 - Year: 2005
Topic: Computer algorithms and applications
Abstract: One of the greatest challenges in modern molecular biology is the identification and characterization of the functional elements that regulate gene expression. Two of the most important elements are transcription factors (TFs) and the sites of the genome where they can bind (TFBSs). TF-DNA interactions, which are responsible for the modulation of gene transcription, underlie many critical cellular processes, and their malfunction is often involved in the onset of genetic diseases. TFBSs are located either near the transcription start site of a gene (usually within 500-1000 bp) or at a very large distance from it (often several kilobases), either upstream or downstream. When the regulation of a single gene is investigated, the idea is to increase the signal-to-noise ratio by comparing its flanking regions (upstream and/or downstream) with homologous genomic regions of the same or other organisms at different evolutionary distances. The parts of these regions that are most conserved across the different species are more likely to have been preserved by evolution because of their function, and thus could be (or contain) TFBSs. Most of the methods introduced so far first build a global alignment of the sequences (some pairwise, some multiple) and then report the most conserved parts of the alignment, with or without further processing (for example, by looking for instances of known TFBSs in them). This approach can produce good results, since a highly conserved region is a good candidate for regulatory activity. However, some experiments have shown that real TFBSs are often misaligned and fall outside the “best regions” of the alignment, and building the alignment itself becomes computationally problematic for long regions, especially in the case of multiple comparisons.
In this work we present an algorithm that requires neither a global alignment of the sequences nor the support of matrices or instances of known TFBSs in order to detect potential regulatory motifs.
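The abstract does not describe how the algorithm works, so the following Python sketch is only an illustration of the general idea of alignment-free conservation search, not the authors' method. It assumes a deliberately simple criterion: a k-mer from the first sequence is reported if every other homologous sequence contains it somewhere with at most a given number of mismatches. The function names (`hamming`, `occurs_approx`, `conserved_kmers`) and the parameters `k` and `max_mismatches` are hypothetical and chosen for this example.

```python
from typing import List, Set


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approx(kmer: str, seq: str, max_mismatches: int) -> bool:
    """True if some window of seq matches kmer with at most max_mismatches substitutions."""
    k = len(kmer)
    return any(
        hamming(kmer, seq[i:i + k]) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def conserved_kmers(seqs: List[str], k: int, max_mismatches: int = 0) -> Set[str]:
    """Return k-mers of the first sequence that appear, approximately, in all others.

    No alignment is built: each candidate is searched for at any position
    of every other sequence. This brute-force scan is an assumption made
    for illustration and is only practical for short inputs.
    """
    if not seqs:
        return set()
    reference, others = seqs[0], seqs[1:]
    candidates = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return {
        c for c in candidates
        if all(occurs_approx(c, s, max_mismatches) for s in others)
    }


# Toy homologous regions (made-up sequences, not real data).
regions = ["ACGTTGACA", "TTTTGACAGG", "CCTGACATT"]
print(conserved_kmers(regions, k=5))  # {'TGACA'}
```

Because candidates are matched at any position, a conserved word is found even when it sits at different offsets in each region, which is the situation where a global alignment can misplace a real site.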