1. Bultrini E, Pizzi E
Linguistic analysis of promoter regions in eukaryotic genomes
Meeting: BITS 2004 - Year: 2004
Topic: Computer algorithms and applications

Abstract: Promoter recognition is one of the most difficult tasks in annotating eukaryotic genomes. Binding sites for transcription factors are very short (5-15 bp) and poorly conserved in sequence. In addition, other signals can be associated with a regulatory region. For instance, in vertebrates some classes of promoters are associated with compositionally characterised regions (CpG islands), and there is also evidence that the molecular conformation of human promoters is involved in transcriptional activity [1, 2]. Following a previous investigation [3, 4], in the present work we propose a new procedure, based on well-established statistical methods, to extract a set of oligonucleotides that specifically characterise intron sequences. Partitioning genomic sequences according to their agreement with the extracted “introns’ vocabulary” reveals that intergenic DNA appears as a patchwork of different elements. The majority of them adopt the “introns’ vocabulary”, whereas a small percentage do not. We hypothesise that the identified linguistic property is a sort of “background noise” of a genome; in this perspective, regions that play a functional and/or structural role probably have to emerge from the background by adopting specific compositional properties. The analysis of promoter sequences for the four examined genomes (C. elegans, D. melanogaster, M. musculus, H. sapiens) appears to confirm our hypothesis, as the regions immediately surrounding the transcription start site deviate from the usage of the introns’ vocabulary. Furthermore, analyses of the C+G composition, bendability and torsional rigidity of promoter sequences are presented.
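
The abstract does not specify the statistical procedure used to extract the vocabulary, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes, hypothetically, that an oligonucleotide belongs to the vocabulary when its smoothed frequency in introns exceeds its frequency in a background set by a chosen log2 ratio, and that a genomic sequence is partitioned into fixed windows scored by the fraction of their oligonucleotides found in the vocabulary. The function names, the threshold and the window parameters are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count overlapping k-mers made only of A, C, G, T."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def build_vocabulary(introns, background, k=3, min_log_ratio=1.0, pseudo=1.0):
    """Hypothetical criterion: keep k-mers enriched in introns vs background.

    Frequencies are smoothed with a pseudocount so that k-mers absent
    from the background do not cause a division by zero.
    """
    fg = kmer_counts(introns, k)
    bg = kmer_counts(background, k)
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    n_kmers = 4 ** k
    vocab = set()
    for kmer in fg:
        p_fg = (fg[kmer] + pseudo) / (n_fg + pseudo * n_kmers)
        p_bg = (bg[kmer] + pseudo) / (n_bg + pseudo * n_kmers)
        if log2(p_fg / p_bg) >= min_log_ratio:
            vocab.add(kmer)
    return vocab


def vocabulary_usage(seq, vocab, k, window, step):
    """Return (start, fraction of k-mers in vocab) for each window."""
    seq = seq.upper()
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        total = len(w) - k + 1
        hits = sum(1 for i in range(total) if w[i:i + k] in vocab)
        scores.append((start, hits / total))
    return scores


# Toy data, invented purely for illustration.
toy_vocab = build_vocabulary(["ATATATATATAT"], ["GCGCGCGCGCGC"], k=2)
toy_profile = vocabulary_usage("ATATATGCGCGC", toy_vocab, k=2, window=6, step=6)
```

On the toy data the first window consists entirely of vocabulary dinucleotides and the second contains none, which mimics a region that "emerges from the background". A real analysis would need annotated intron and background sets, a justified choice of oligonucleotide length, and a proper significance test in place of the fixed ratio threshold.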

