1. Corà D, Herrmann C, Dieterich C, Di Cunto F, Provero P, Caselle M
Identification of human transcription factor binding sites by comparative genomics.
Meeting: BITS 2004 - Year: 2004
Topic: Comparative genomics
Abstract: Understanding the transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors (TFs), which typically bind to specific, short DNA sequence motifs usually located in the upstream regions of the regulated genes. We discuss here a simple and powerful approach for identifying these cis-regulatory motifs, based on human-mouse genomic comparison. Using the catalogue of conserved upstream sequences collected in the CORG database, we construct sets of genes that share the same overrepresented motif in their upstream regions in both human and mouse. We perform this construction for all possible words from 5 to 8 nucleotides in length and then filter the resulting sets, looking for two types of evidence for coregulation. First, we analyse the Gene Ontology annotation of the genes in each set, looking for statistically significant common annotation. Second, we analyse the expression profiles of the genes in each set, as measured by microarray experiments, looking for evidence of coexpression. The sets that pass one or both of these filters are conjectured to contain a significant fraction of coregulated genes, and the upstream motifs characterizing these sets are thus good candidates for the binding sites of the TFs involved in such regulation. In this way we find various known motifs (which we use to validate our approach) as well as some new candidate binding sites.
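As a rough illustration of the set construction and the Gene Ontology filter described in the abstract, the following is a minimal Python sketch. The abstract does not say which statistics were used, so the binomial test for overrepresentation, the uniform background frequency, the hypergeometric test for shared annotation, the `alpha` threshold, and all function names are assumptions made for this example, not the authors' method. The coexpression filter is not shown.

```python
from itertools import product
from math import comb


def all_words(min_len=5, max_len=8, alphabet="ACGT"):
    """Yield every word of length min_len..max_len over the alphabet."""
    for k in range(min_len, max_len + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)


def count_occurrences(word, seq):
    """Count (possibly overlapping) occurrences of word in seq."""
    count, start = 0, seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed via the complement."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - below)


def is_overrepresented(word, seq, alpha=0.01):
    """Assumed criterion: binomial test against a uniform background."""
    positions = len(seq) - len(word) + 1
    hits = count_occurrences(word, seq)
    if positions <= 0 or hits == 0:
        return False
    background = 0.25 ** len(word)  # assumption: equiprobable nucleotides
    return binomial_tail(hits, positions, background) < alpha


def genes_with_word(word, upstream, alpha=0.01):
    """Genes whose human AND mouse upstream regions overrepresent word.

    upstream maps a gene name to a (human_seq, mouse_seq) pair.
    """
    return {
        gene
        for gene, seqs in upstream.items()
        if all(is_overrepresented(word, s, alpha) for s in seqs)
    }


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, K of which carry a GO term."""
    top = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return top / comb(N, n)


# Toy data (hypothetical sequences, not taken from CORG).
upstream = {
    "g1": ("ACGTATTTTTACGTATTTTTACGTATTTTT", "GGACGTAGGGGACGTAGGGGGACGTAGG"),
    "g2": ("ACGTATTTTTACGTATTTTTACGTATTTTT", "GGGGGGGGGGGGGGGGGGGGGGGGGGGG"),
}
motif_set = genes_with_word("ACGTA", upstream)
```

Here only `g1` enters the set for `ACGTA`, because `g2` carries the word in the human sequence but not in the mouse one. In a full run, `genes_with_word` would be called for each word produced by `all_words()`, and `hypergeom_tail` would then score how many genes in each set share a given annotation relative to all annotated genes.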