1. Puntervoll P, Linding R, Gemund C, Chabanis-Davidson S, Mattingsdal M, Cameron S, Martin DMA, Ausiello G, Brannetti B, Costantini A, Zanzoni A, Maselli V, Via A, Cesareni G, Diella F, Superti-Furga G, Wyrwicz L, Ramu C, McGuigan C, Gudavalli R, Letunic I, Bork P, Rychlewski L, Kuster B, Helmer-Citterich M, Hunter WN, Aasland R, Gibson TJ
Eukaryotic Linear Motifs in the ELM Web Tool
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: Reflecting the modular nature of eukaryotic proteins, several WWW servers (e.g. Pfam, SMART, PROSITE) are dedicated to revealing domains in protein sequences. However, no resource focuses specifically on short functional motifs (targeting peptides, docking modules, glycosylation sites, phosphorylation sites, etc.), yet these modules are just as important for function as the larger protein domains. Domains are identified by conventional methods such as patterns (regular expressions), profiles or hidden Markov models (HMMs). Statistically robust methods, however, cannot usually be applied to small motifs, while pattern-based methods over-predict so heavily that the few true motifs are lost among the many false positives. ELM (Eukaryotic Linear Motifs) [1] is a new web-based tool for predicting these small motifs in eukaryotic protein sequences. At the moment, the ELM database contains manually curated information about 114 known linear motifs in the form of regular expressions, profiles or hidden Markov models that identify the motifs in the sequence. ELM addresses the over-prediction problem of other methods by using context-based rules and logical filters that exclude false positives. The current version of the ELM server provides core functionality including filtering by cell compartment, phylogeny, globular domain clash (using the SMART/Pfam databases), secondary structure, and solvent accessibility. The current set of motifs is by no means exhaustive. The filters work by comparing the information stored in the database for each motif (taxonomic, structural and cellular context) with the information the user submits together with the sequence.
The structural filter works by automatically modeling the submitted protein sequences whenever a good template is found in the SCOP database, and then comparing the predicted solvent accessibility values and secondary structure features with the corresponding values associated with ELM matches in true-positive structures. The ELM server was launched in November 2002 and has been regularly enhanced since then. For several months, server activity has been running at > 45,000 hits from > 1700 unique internet sites.
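
The combination of pattern matching and context filtering described in this abstract can be illustrated with a minimal Python sketch. It is not the ELM implementation: the `Motif` class, the function names, the compartment labels and the domain coordinates are all hypothetical. The regular expression is only an approximation of the well-known N-glycosylation sequon, chosen as an example of a short pattern that over-predicts when used alone.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """Hypothetical motif record: a regex plus the context it is valid in."""
    name: str
    pattern: str
    compartments: frozenset


def find_matches(sequence, motif):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead with a capturing group lets the scan report overlapping hits.
    regex = re.compile(f"(?=({motif.pattern}))")
    return [(m.start(1), m.group(1)) for m in regex.finditer(sequence)]


def filter_matches(sequence, motif, compartment, globular_regions):
    """Keep only matches that are consistent with the user-supplied context.

    compartment      -- cell compartment given by the user (assumed label)
    globular_regions -- list of (start, end) half-open intervals covered by
                        globular domains (assumed to come from a domain search)
    """
    # Compartment filter: discard everything if the motif is not expected here.
    if compartment not in motif.compartments:
        return []
    kept = []
    for start, text in find_matches(sequence, motif):
        end = start + len(text)
        # Domain-clash filter: drop matches overlapping a globular region.
        clash = any(start < d_end and end > d_start
                    for d_start, d_end in globular_regions)
        if not clash:
            kept.append((start, text))
    return kept


# Illustrative motif; the pattern and compartment labels are assumptions.
N_GLYC = Motif(
    name="N-glycosylation (illustrative)",
    pattern=r"N[^P][ST][^P]",
    compartments=frozenset({"extracellular", "endoplasmic reticulum", "Golgi"}),
)

if __name__ == "__main__":
    seq = "MKNGSAANPTQNVTLL"
    print(find_matches(seq, N_GLYC))
    print(filter_matches(seq, N_GLYC, "extracellular", [(10, 16)]))
```

In this sketch the unfiltered scan reports every pattern hit, while the filtered call discards hits in an incompatible compartment or inside an assumed globular region, which mirrors the idea of removing likely false positives by context.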

