BITS Meetings' Virtual Library:
Abstracts from Italian Bioinformatics Meetings from 1999 to 2013


766 abstracts overall from 11 distinct proceedings





1. Burgarella S, Cattaneo D, Pinciroli F, Masseroli M
MicroGen, a Web based system for microarray experiment management
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Improvements in bio-nano-technologies and biomolecular techniques have led to increasing production of high-throughput experimental data. Spotted cDNA microarrays are one of the most widespread technologies, used both in single research laboratories and in biotechnology service facilities. Although they are routinely performed, spotted microarray experiments are complex procedures entailing several experimental steps and actors with different technical skills and roles. During an experiment, the actors involved, who may also be located at a distance from one another, need to access and share specific experiment information according to their roles. Furthermore, complete information describing all experimental steps must be collected in an orderly manner to allow subsequent correct interpretation of experimental results. To satisfy these requirements, we developed MicroGen, a Web based system for managing information and workflow in the production pipeline of spotted microarray experiments. Our aim was to build a multi-database system able to store all data completely characterizing different spotted microarray experiments according to the Minimum Information About Microarray Experiments (MIAME) standard, and to support the collaborative work required among the multidisciplinary actors and roles involved in microarray experiment production.

2. Ciccarelli FD, Von Mering C, Suyama M, Harrington ED, Izaurralde E, Bork P
Formation and evolution of primate-specific gene functions
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Orthologous genes that maintain a single copy status in a broad range of species may indicate a selection against gene duplication. If this is the case, then duplicates of such genes which do survive may have escaped the dosage control by rapid and sizable changes in their function.

3. Chiappori F, Ferrario MG, Gaiji N, Fantucci P
Docking of estrogen and genistein like molecular library on Estrogen Receptor alpha and beta
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Estrogen replacement therapy is one of the most efficient treatments for menopausal symptoms. Unfortunately, despite the benefits of estrogenic therapy, there is evidence of an increasing number of cases of reproductive tissue cancer. This has led to a strong interest in discovering new compounds that display the benefits of estrogens while avoiding such risks. We decided to apply virtual high-throughput screening, based on docking simulations, to identify new possible selective receptor compounds.

4. Catalano D, Licciulli F, Turi A, Grillo G, D'Elia D
MitoRes: a bio-sequences resource of nuclearly encoded mitochondrial genes and products in metazoa
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The incredible explosion of “knowledge production” in Biology in the past two decades has created a critical need for bioinformatic instruments able to manage data and facilitate their retrieval and analysis. Molecular sequences and biological data on nuclear mitochondrial genes and their products are publicly available from a wide variety of specialized mitochondrial databases. Some are species-specific, mainly dedicated to human; others cover only a few species, and for most of them the only sequence data reported concern proteins. We have developed the MitoRes database to collect and integrate information on nuclearly encoded proteins and genes targeting the mitochondrion for all metazoan species, and to provide a flexible and efficient tool for the export of bio-sequences in support of researchers interested in the functional characterization of gene, transcript and amino acid sequences related to the biogenesis, metabolism and pathological dysfunctions of mitochondria.

5. Castrignanò T, Talamo IG, Grillo G, Licciulli F, Gisel A, Liuni S, Mignone F, Pesole G
CSTgrid: a high performance environment for searching "Conserved Sequence Tags"
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The explosive growth of biological data, stimulated by genome projects, has generated a parallel development of efficient computational approaches suitable for several biological research projects. In this area the need for high performance computing is growing, though it usually cannot be met by the computational resources of a single research laboratory. Grid computing addresses this problem by coordinating and unifying several computational resources. For the problem of searching "Conserved Sequence Tags" (CSTs) between an input DNA sequence and several whole model genomes, a grid framework can provide high performance and high availability, and can fairly handle hundreds of concurrent requests. Because the size of several whole genomes now exceeds the memory capacity of a single machine, it is necessary to spread the search across multiple distributed working hosts to achieve high performance. This also improves availability, since the redundancy of the services increases tolerance to both machine and network failures. The system also guarantees that the same services can be completed by many machines, making it possible to serve more requests than a single machine can handle.

6. Casadio R, Fariselli P, Martelli PL
How many membrane proteins in the Human Genome?
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Within the Biosapiens network of excellence (EC Framework VI), the Biocomputing Group of the Bologna University installed a DAS server in a pipeline connected to the EBI. Our task, in collaboration with Gunnar von Heijne (Stockholm Bioinformatics Center, SCFAB, Stockholm University, Sweden), Gert Vriend (CMBI University of Nijmegen, the Netherlands) and David Jones (Bioinformatics Unit, University College London, United Kingdom), is the large scale screening of the human genome in order to annotate membrane proteins based on topology prediction of chains.

7. Caprera A, Lazzari B, Vecchietti A, Stella A, Milanesi L, Pozzi C
The ESTree db as an engine for peach EST related information retrieval
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The ESTree db is a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. With this aim, the db has been structured to be a repository of information and links related to the sequences and to provide a user-friendly interface that allows easy querying of all the db fields. Within the month of March 2005 the third release of ESTree will be online and will include 18630 sequences, encompassing 8 libraries at different peach fruit developmental stages. A second version of the db, including only the 6155 sequences produced by the FPTP-CERSA group, will also be online. The major data resources included in the ESTree db are: annotation both with BLASTx versus the NCBI nr database and with BLASTx versus the GO viridiplantae subset of sp-trembl; contig assembly and display; SNP analysis with AutoSNP; and links to the KEGG metabolic pathways and the enzyme entries of NiceZyme (Expasy). Gene Ontology statistics are also presented, both for the whole set of sequences included in the db and for library-specific subsets.

8. Campagna D, Romualdi C, Vitulo N, Favero M, Lexa M, Cannata N, Valle G
RapH and RapD: two indexes designed for de novo identification of repeats in whole
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The identification of repeats is an essential step for genome analysis and annotation, but it is not easy because repeats tend to be poorly conserved during evolution. This makes it very difficult to identify homologous sequences that have diverged significantly, both within the same genome and between genomes of different organisms.

9. Callegaro A, Spinelli R, Battaglia C, Caristina L, Cenzuales S, Beltrame L, Bicciato S
Novel algorithm for automated genotyping of real time data
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: A set of over 3 million putative single nucleotide polymorphisms (SNPs) is now available for disease association studies, eventually replacing the current RFLP and microsatellite (STRP) linkage analysis screening sets. Many different technological platforms are available for allelic discrimination and, among them, TaqMan chemistry employing 5’ nuclease allelic discrimination is one of the most widely used. Each TaqMan assay employs two allele-specific probes for each SNP, commonly labeled with VIC or FAM dyes. Although technological advancements in assay design and application allow SNPs to be monitored at a high-throughput rate with this technology, the allele calling procedure for any single sample still requires the manual intervention of an expert operator to adjust SNP-dependent thresholds, select signals, and review quality. Usually the genotype is calculated at the end of the amplification process, resulting in an end point analysis. To enable the genotyping of multiple SNPs in several different samples, we developed an algorithm for automatic allele calling from real time data.

10. Luchini A, Callegaro A, Bicciato S
Analysis of unreplicated time-course microarray experiments
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Since transcriptional control is the result of complex networks that interpret a variety of inputs, analyzing dynamical states of gene expression is of paramount importance to detect the multivariate nature of complex biological mechanisms. Although hundreds of studies have fully demonstrated the relevance of microarrays in describing the transcriptional status of different physiological conditions, to access and reconstruct complex interaction pathways it is necessary to analyze the temporal evolution of transcriptional states. However, an appropriate experimental design to accurately identify differentially expressed genes over a meaningful temporal window would require large numbers of microarrays and computational procedures able to assess the correlation structure among data at different time points. Unfortunately, replicates for each time point and experimental condition are not always available, because of cost limitations and/or biological sample scarcity, while common data analysis tools, e.g., ANOVA, do require replicates.

11. Bilardi A, Campagna D, Campanaro S, Cestaro A, Levorin F, Vitulo N, Vezzi A, Valle G, Cannata N
Quest for rho dependent terminators in prokaryotic genomes
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Two kinds of transcription terminators are known in prokaryotes, distinguished by their mechanisms and DNA sequences. When RNA polymerase encounters an intrinsic terminator (RIT), it can release the nascent RNA spontaneously, but when it encounters a Rho dependent terminator (RDT), the release of the RNA depends on the action of a protein factor called Rho. RDTs are involved in gene expression as attenuators in leader or intra-operon regions and as terminators at the end of operons. An RDT consists of three distinct parts, which together extend over 150-200 bp of DNA (figure 1). The upstream part, called the Rho utilization (rut) site, encodes a segment of the nascent transcript to which Rho can bind and is essential for starting termination. The central part, which we call the Rho activity (rac) sequence, is the second mRNA binding site to which Rho can bind and is essential for helicase/translocation activity. The downstream part, called the transcription stop point (tsp) region, is where RNA polymerase pauses during elongation in the absence of Rho. Only a small number of studies of single RDTs are present in the literature; very little is known about their structure and sequence, and no in silico predictive method exists.

12. Berardi M, Malerba D, Marinelli C, Leo P, Loglisci C, Scioscia G
A text-mining application able to mine association rules from biomedical texts
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Collecting, analyzing and extracting useful information from a very large amount of biomedical texts is a difficult task for researchers in biomedicine who need to keep up with scientific advances. Nowadays several domains in medical practice, drug development, and health care require support for such activities, including bioinformatics, medical informatics, clinical genomics, and many other sectors. Moreover, the data to be examined (i.e. textual data) are generally unstructured, as in the case of Medline abstracts and many other textual resources such as medical records and patents, and the available resources (e.g. PubMed) still do not provide adequate mechanisms for retrieving the required information or for helping humans to “deeply analyse” very large amounts of content. In this work we present a Text-Mining framework that aims to support biomedical researchers in identifying disease-gene relationships from scientific abstracts retrieved by querying Medline.

13. Berardi M, Attimonelli M, Cascione I, Santamaria M, Accetturo M, Lascaro D, Berardi M, Ceci M, Loglisci C, Malerba D
A data mining approach to retrieve mitochondrial variability data associated to clinical phenotypes
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The maintenance of biological databases is at present a problem of great interest, since the progress made in many experimental procedures has led to an ever increasing amount of data. These data need to be structured and stored in databases and made accessible to the biological community in user-friendly ways. Although both the interest in and the need for access to biological databases are high, the mechanisms to fund their maintenance are unclear. Funding agencies cannot support data annotation in terms of labour costs, and hence the development of new tools based on “data mining” technologies could greatly contribute to keeping biological databases updated. Here we present a new approach aimed at contributing to the annotation in the HmtDB resource (http://www.hmdb.uniba.it/) of variability data associated with clinical phenotypes [1]. These data are mainly available in the literature, where they are reported in a completely free style. Thus, we suggest the construction of a knowledge base derived from browsing papers on the web, to be used in the retrieval phase. Nevertheless, problems in extracting data from the literature come not only from the heterogeneity of presentation styles but mainly from the unstructured format (i.e. natural language) in which they are represented. In this scenario, the goal is to feed a knowledge base by identifying occurrences of specific biological entities and their features, as well as the particular method and experimental setting of the scientific study adopted in the publication. In this work, we describe some solutions to the problem of structuring information contained in scientific literature in digital (i.e., pdf) or paper format.

14. Baldacci L, Capozzi F, Golfarelli M, Lumini A, Rizzi S, Turano M
Protein classification by surface analysis
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Divining the functional role of proteins in life processes, starting from their structural features, is the challenging purpose of structural biology. In this context, the SCOP and CATH databases are the starting point for scientists aiming to discover the relationships between function and molecular structure. The classification criteria adopted are based on the topological architecture of the molecule; thus the proteins are clustered by finding local folding motifs which are repeatedly encountered in the protein data bank. It is widely accepted that the overall structure serves to distribute chemical properties appropriately on the surface that the protein presents to its molecular target. Depending on its “appearance”, the protein may or may not approach the target correctly. However, two proteins with similar structures may be divergent in their sequence, and thus perform different functions. Structural classification as a tool for identifying a common physiological role is misleading in this case, and a surface classification is increasingly considered necessary. Up to now, a successful strategy to reach such a general goal has not yet been developed: the most exploited approach consists in adopting a surface patch already recognized to play a key role in a certain function and using it as a probe to explore the surfaces of all the proteins with known structure. This method, based on local features, will fail in all cases characterized by highly adaptable surfaces and when portions of the surface far from the active site have a dominant allosteric effect on the functional region of the protein. In this paper we propose an original approach to the classification of proteins based on their surface characteristics, which has the advantage of being based neither on local surface features nor on already known functional meanings.

15. Armano G, Mancosu G, Orro A, Saba M, Vargiu E
MASSP: A hybrid genetic-neural system for predicting protein secondary structure
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Since the prediction of protein structure is a very complex task, most methodologies concentrate on the simplified task of predicting secondary structures. In this paper, we illustrate a technique based on multiple experts, aimed at predicting protein secondary structures. The prediction activity results from the interaction of a population of experts, each integrating genetic and neural technologies. Roughly speaking, an expert of this kind embodies a genetic classifier designed to control the activation of a feedforward artificial neural network that performs a locally-scoped prediction activity. Genetic and neural components (i.e., guard and embedded predictor, respectively) are devoted to performing different tasks and are supplied with different information: each guard is aimed at (soft-)partitioning the input space, thereby ensuring both the diversity and the specialization of the corresponding embedded predictor, which in turn is devoted to performing the actual prediction.

16. Arisi I, Roncaglia P, Cascante M, Cattaneo A
The SYMBIONIC project: coordinating a neuronal cell simulation initiative with ongoing EU-wide Systems Biology programs
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The neuronal cell represents a very fascinating and complex system. In addition to basic processes common to all types of eukaryotic cells such as gene transcription, protein synthesis, and metabolism, neurons are electrically excitable and able to receive and propagate excitation via thousands of synaptic contacts. Investigating neuronal functioning requires a crossdisciplinary approach, involving on the one hand quantitative experimental methods to study excitatory processes, large scale molecular networks and the kinetics of protein-protein interactions, and on the other hand computational modeling of intra-cellular processes and, as far as the synaptic transmission is concerned, inter-cellular communication. Thus, an integration of different experimental and modeling approaches is crucial for a comprehensive description of the cell and for a complete biological understanding of neuronal behaviour. Several kinds of expertise are required in order to cope with the great heterogeneity of cellular events that must be investigated and described by computational models in such a comprehensive view: molecular biology, biophysics, mathematics. But it is bioinformatics that plays a pivotal role in extracting information from the huge amounts of data stemming from recent “-omics” research, and in devising ways to integrate such diverse bits of knowledge in order to attain a truly systemic view of cells. Several Systems Biology initiatives are underway worldwide. These are usually large consortia based in the U.S.A. or Japan. A few projects have recently been established in Europe, but none devoted to the study of neurons. The European project SYMBIONIC is a Specific Support Action aimed at establishing a European-wide initiative in the field of the Systems Biology of the neuronal cell. The long-term aim of the project is to contribute to exhaustive in silico models of the neuron. 
Currently, the activity of SYMBIONIC is mainly focused on the training and dissemination area and on the coordination of this project with other European initiatives in the field of Systems Biology. The project partners are Lay Line Genomics (a biotech company based in Rome that is also the project coordinator), SISSA in Trieste and the University of Barcelona. About 20 other European institutions and industries actively collaborate with the project. The main objectives of SYMBIONIC are the following: 1) To disseminate ideas and techniques among young scientists, also through a training program, in order to create a new generation of specialists in the area of Neuronal Systems Biology; 2) To consolidate among researchers a view of the neuronal cell as a complex ensemble in terms of Systems Biology; 3) To raise awareness in the European scientific community that there is a significant gap to be filled, both in terms of scientific topics and available crossdisciplinary expertise, in the study of neuronal cells as a complex system, and in the development of relevant research strategies; 4) To establish contacts and collaborations with other European groups which are carrying out projects in the field of Systems Biology, even if related to other cellular systems. Currently, the main collaboration is with EUSYSBIO, another EU-funded initiative on Systems Biology.

17. Andersen C, Magnoni L, Roncarati R, Diodato E, Raggiaschi R, Kremer A, Terstappen GC
The Amyloid-beta toxicity core pathway
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Pathway models or protein-protein interaction networks are excellent tools for the drug discovery process. They can be used to identify and select relevant targets to test a disease hypothesis. Combining information from diverse sources (in house experiments as well as literature) potentially transforms protein-protein interaction networks into detailed descriptions of cellular pathways. Interactive diagrams allow data to be linked directly onto the pathway and in this way can be used to integrate all relevant data regarding a protein or pathway entry into one framework. This not only leads to a better understanding of the biological mechanisms of normal and disease processes but also enables scientists to compare data from diverse experimental areas.

18. Ancona N, Maglietta R, D'Addabbo A, Liuni S, Pesole G
Models for cancer classification by gene expression data
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The advent of DNA microarray technology constitutes an epochal change in the study, treatment, analysis, classification and discovery of different types of cancer. The information provided by DNA microarrays allows the problem of cancer diagnosis and treatment to be approached from a quantitative rather than a qualitative point of view.

19. D'Agostino N, Aversano M, Chiusano ML
ESTCLASS: a pipeline for EST data analysis using parallel computing
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: The vast amount of expressed sequence tag (EST) data in the public databases provides an important resource for comparative and functional genomic studies and, moreover, represents useful information for a reliable annotation of genomic sequences. ESTs are short and error-prone sequences often including artifactual contamination; nevertheless they represent the framework from which fundamental information for both bioinformatic and experimental analysis is derived. Therefore, EST data need suitable preprocessing to provide high quality information. Because of the advances in biotechnologies, large EST datasets are released daily to the scientific community, and fast and efficient bioinformatic approaches are the first step in supporting the management and analysis of these data. This kind of analysis therefore requires suitable computational tools based on advanced technologies. We describe here an analysis pipeline for EST clustering, assembly and annotation by parallel computing that optimizes execution time for the processing of large data sets.

20. Lavorgna G, Triunfo R, Santoni F, Orfanelli U, Noci S, Bulfone A, Zanetti G, Casari G
AntiHunter 2.0: increased speed and sensitivity in searching BLAST output for EST antisense transcripts
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: An increasing number of eukaryotic and prokaryotic genes are being found to have natural antisense transcripts (NATs). Also, there is growing evidence to suggest that antisense transcription might have a key role in a range of human diseases. Consequently, there have been several recent attempts to set up computational procedures aimed at identifying novel NATs. Our group has developed the AntiHunter program for the identification of expressed sequence tag (EST) antisense transcripts from BLAST output.

21. Kovaleva G, Bazykin G, Brudno M, Gelfand M
Conservation rate of transcription factor binding sites in Saccharomyces genomes
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Extracting the complete functional information encoded in a genome, including genic, regulatory and structural elements, is a central challenge in biological research. Prediction of non-protein-coding functional regions, such as regulatory elements, is especially difficult because they are usually short (6-15 bp for S. cerevisiae and many other eukaryotic genomes), often degenerate, and can reside on either strand of DNA at variable distances from the genes they control. Since functional sequences tend to be conserved through evolution, they can appear as ‘phylogenetic footprints’ in alignments of genome sequences of different species. Recently, two groups sequenced several Saccharomyces genomes. The main goal of these studies was to identify the regulatory sites in Saccharomyces spp. using multiple whole-genome alignment or multiple alignments of gene upstream regions. Results were presented as two lists of predicted binding motifs. Our comparison of these lists shows only a moderate intersection. This prompted us to analyze the conservation rate for known and predicted binding sites in Saccharomyces genomes in more detail.

22. Horner DS, Pirovano W, Pesole G
Using phylogenetic information in the detection of correlated amino acid substitutions
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Much effort has been devoted to the detection, from multiple sequence alignments of homologous proteins, of pairs or groups of amino acid positions that evolve in a non-independent (or compensatory) manner. It is expected that such clusters of positions might tend either to be proximal in the mature, folded protein, or to be involved in similar aspects of protein function. Several such methods have been shown to be reasonably effective in the detection of intra-protein contacts. However, all of the most successful published algorithms rely on pairwise comparisons between aligned sequences. We wished to investigate whether evolutionary information (the topology and branch lengths of the phylogenetic tree describing relationships between the sequences under study) can allow an improvement in the prediction of intra-protein contacts from correlated substitutions.

23. Greco C, Sacco E, Vanoni M, De Gioia L
Identification and in silico characterization of double histone fold domains in Cca3 and “Similar to Cca3” proteins
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Histone folds are structural elements that are able to form dimers by means of tight interactions between hydrophobic surfaces. Normally, a histone fold is composed of a long alpha-helix flanked by two or three shorter helices. In the nucleosome core particle, two pairs of H2a-H2b and H3-H4 histone heterodimers assemble together, giving rise to a disk-like octamer around which DNA wraps. The publication of the X-ray structure of the prokaryotic histone from Methanopyrus kandleri highlighted a novel protein fold, which originates from the assembly of two consecutive histone folds included in the same peptide chain. More recently, the publication of the X-ray structure of the amino-terminal domain of hSos1 showed that this protein module also assumes a similar fold, dubbed the histone pseudodimer and here also referred to as “double histone fold”. In fact, the evolutionary relationship between the H2a histone and the domain spanning the protein sequence 96-190 of hSos1 had already been established, owing to the high sequence similarity between the two domains. However, the first histone-like domain of hSos1, spanning the protein portion 6-95, does not show evident sequence similarity with histones (Sondermann et al, 2003). Moreover, it is unknown whether the double histone fold can be found in other protein families. In view of this, we initiated an in silico study aiming at the identification of other proteins characterized by a sequence that is compatible with the double histone fold.

24. Gallimbeni R, Di Marino D, D'Annessa I, Morozzo della Rocca B, Desideri A, Falconi M
Membranome: an active web site
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: Membrane proteins play key roles in cell biology, e.g. as ion channels, drug receptors, and solute transporters. It has been estimated that ~25% of genes code for membrane proteins, and that ca. 50% of potential new drug targets are membrane proteins. Despite the central importance of membrane proteins, the number of high resolution structures (from X-ray diffraction and more recently from NMR) remains small, while the literature on the available experimental data is huge. The literature gives a large amount of disjointed information about this essential group of proteins, which needs to be organized to give researchers direct access. In order to ease the browsing of experimental data we are preparing the “membranome” site. The Membranome site will select, store and efficiently organize literature data about: - classification; - genomic and protein sequences; - expression, purification, crystallization and structure determination; - structure and function; - transmembrane region predictions (if the structure is not available); - interactions between membrane proteins and the rest of the cell components: ions, lipids, sugars, ligands, substrates, solvent, a variety of molecules and other proteins; - mutants, mutation technique, altered functionality and pathological consequences of mutations; - publication references.

25. Ferraro E, Via A, Ausiello G, Helmer-Citterich M
A new neural network approach for the inference of SH3 domains specificity
Meeting: BITS 2005 - Year: 2005
Topic: Unspecified

Abstract: SH3 domains bind polyproline II peptides characterized by the PxxP consensus (P is proline and x is any amino acid). The specificity of each single domain shows a preference for peptides that vary, within a certain range, around this common structural theme, and different domains may interact with common peptides. We defined a new neural network strategy that extracts information from the sequences of interacting partners to improve the identification of SH3 domain specificity.
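As an illustration of the PxxP consensus mentioned above, the following minimal Python sketch lists every occurrence of the motif in a peptide or protein sequence, including overlapping ones. It only shows the consensus itself; it is not the neural network approach of the abstract, and the function name `find_pxxp` is a hypothetical choice made for this example.

```python
import re

# Lookahead so that overlapping PxxP occurrences are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (start, motif) pairs for every PxxP occurrence (0-based start)."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence)]


if __name__ == "__main__":
    for start, motif in find_pxxp("AAPPLPRAA"):
        print(start, motif)
```

A real specificity predictor would need much more than the bare consensus, for example the flanking residues and the sequence of the domain itself.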

26. Ferrario MG, Chiappori F, Ferrario MG, Gaiji N, Fantucci P
Molecular dynamic study of the ligand binding domain of estrogen receptor alfa and beta
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: The estrogen receptor belongs to the nuclear receptor family; two subtypes are currently known, ERα and ERβ. The levels and proportions of the two subtypes differ among target cells. We can therefore hypothesize a different pharmacological activity upon ligand binding. The aim of this work is to investigate, through molecular dynamics simulations of both the α and β subtypes of the estrogen receptor, the possible differences between them and to study their dynamical behaviour.

27. Faccioli P, Provero P, Herrmann C, Stanca AM, Terzi V
A co-expression network for gene function characterization in barley
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: The recent advent of high-throughput technology and the exponential increase in computer power have moved biology into a revolutionary mode, shifting the focus of molecular biologists from single genes to whole genomes. The possibility of exploring gene function is extremely attractive in such a context of high-throughput data generation, and computational inference based on similarities in gene expression has proved to be a valuable tool for functional characterization. The modern theory of networks offers a new conceptual framework for the analysis of gene expression at both the transcriptomic and proteomic levels: genome-scale data sets can in fact be conveniently visualized as networks of gene/protein co-occurrences where genes/proteins are represented by nodes and the relationships between them are represented by connections. This paper reports an “in silico” approach to gene expression analysis based on a gene co-expression network.

28. Fabbri M, Guffanti A, Cocito A, Furia L, Fallini R, McBlane F
RSSDB: A database of cryptic recombination signal sequences involved in V(D)J recombination.
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: The antigen receptor repertoire is generated through rearrangements of Immunoglobulin (Ig) and T-cell receptor (TCR) gene segments into functional genes by a mechanism named V(D)J recombination. This is a two-step process: a cleavage step in which double-strand DNA cuts are made at specific sequences, followed by a joining step to repair the breaks. The coding segments of Ig and TCR genes are flanked by short recombination signal sequences (RSS). The RSS are recognized by a complex of the lymphocyte specific recombination proteins RAG1 and RAG2, which cleave the DNA between the coding sequence and the RSS. The broken coding strands are then rejoined to produce a rearranged gene. Mistakes in this process can generate chromosomal translocations that are involved in acute lymphoid leukemias and non-Hodgkin lymphomas. These errors include cleavage of DNA sequences, termed cryptic RSS, similar to functional RSS but located outside the Ig or TCR loci. We have screened the human and mouse genomes for the presence of putative cryptic RSS. To provide an initial list enriched in cryptic RSS, we used an original search algorithm. This primary set was then further filtered for biologically functional sequences using a published method. We have created a web-accessible database containing these putative recombination signals in the genome context. This is the first repository containing a genome-wide collection of RSS sequences. These sequence tags can be retrieved from a number of starting points including RSS type (with 12- or 23 bp spacer), chromosomal region, cytoband and gene identity. For visualization of our RSS search results, we have chosen to rely on an existing genome annotation knowledgebase and correlate our results with the gene structure, analysis, annotation and browsing features of the UCSC Genome Browser. Sequences of interest may also be searched for the occurrence of RSS and the corresponding tracks searched from within the genome database.
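To make the structure of an RSS concrete (a heptamer, a 12- or 23-bp spacer, and a nonamer), here is a minimal Python sketch of a naive scanner. It is not the original search algorithm or the published filtering method mentioned in the abstract, neither of which is described here. The consensus heptamer CACAGTG and nonamer ACAAAAACC are the commonly cited consensus sequences rather than values taken from the abstract, and the mismatch thresholds and the name `scan_rss` are assumptions made for illustration.

```python
# Commonly cited RSS consensus elements (assumption: not given in the abstract).
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"
SPACERS = (12, 23)


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_rss(seq, max_hept_mm=1, max_non_mm=2):
    """Return (start, spacer, heptamer, nonamer) for each candidate RSS.

    A candidate is a heptamer-like word followed, after a 12- or 23-bp
    spacer, by a nonamer-like word. Thresholds are illustrative only.
    """
    seq = seq.upper()
    hits = []
    for spacer in SPACERS:
        total = len(HEPTAMER) + spacer + len(NONAMER)
        for i in range(len(seq) - total + 1):
            hept = seq[i:i + len(HEPTAMER)]
            non = seq[i + len(HEPTAMER) + spacer:i + total]
            if (mismatches(hept, HEPTAMER) <= max_hept_mm
                    and mismatches(non, NONAMER) <= max_non_mm):
                hits.append((i, spacer, hept, non))
    return hits


if __name__ == "__main__":
    demo = "GG" + HEPTAMER + "T" * 12 + NONAMER + "GG"
    print(scan_rss(demo))
```

This sketch scans one strand only; a genome-wide search would also need to scan the reverse complement and apply a scoring scheme less crude than a mismatch count.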

29. Emerson A, Rossi E, Giuliani S
A database infrastructure for microarray data
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: The usual procedure employed in the analysis of microarray data typically involves little more than a desktop workstation and a spreadsheet program. However, managing data in the form of spreadsheet files is not convenient, particularly when submitting results to public databases prior to publication. In addition, many researchers are discovering that data coming from the latest generation of microarray chips, which may contain many tens of thousands of gene probes, cannot be processed even with the most powerful personal computer. A further deficiency is that normally there is no provision for the systematic recording of information related to the experiments themselves, e.g. platform design, sample hybridization or protocols; such data are critical in checking reproducibility and for comparing with other experiments. Motivation for a more sophisticated and rigorous approach to microarray data analysis has come from researchers in the Hormone Responsive Breast Cancer Network (HRBC, http://www.hrbcgenomics.net/), where the analysis and sharing of microarray data with other members of the project, as well as comparison with relevant data in public repositories, are essential requirements.

30. Di Vincenzo L, Grgurina I, Pascarella S
In silico analysis of the adenylation domains of the freestanding enzymes
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: This work presents a computational analysis of the molecular characteristics shared by the A domains from traditional nonribosomal peptide synthetases (NRPSs) and the group of the freestanding homologous enzymes: α-aminoadipate semialdehyde dehydrogenase, α-aminoadipate reductase and the protein Ebony.

31. Di Camillo B, Nair KS, Toffolo G, Cobelli C
Identification of gene regulatory modules using entropy and mutual information
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: A crucial issue in microarray studies is the elucidation of how genes change expression and interact as a consequence of external/internal stimuli such as illness, drug intake or hormone stimulation. To do so, one has to reconstruct the regulatory network by describing activation/inhibition and cause-effect relationships among expression profiles. Different approaches are available in the literature, but the small number of available samples with respect to the number of genes is a major obstacle to applying these methods to real microarray data. At present, a realistic aim is the identification of modules of gene regulation, i.e. sets of genes that are possibly regulated by the same transcription factors, or potential inhibitors or activators of a group of co-expressed genes.
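The two quantities named in the title of this abstract can be illustrated with a short Python sketch. It computes the Shannon entropy of a discretized expression profile and the mutual information between two profiles, using I(X;Y) = H(X) + H(Y) - H(X,Y). The abstract does not describe how the authors discretize the data or combine these measures into modules, so the discretization into levels such as -1/0/+1 and the function names are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of samples")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


if __name__ == "__main__":
    # Hypothetical discretized profiles: -1 = down, 0 = unchanged, 1 = up.
    gene_a = [1, 1, 0, -1, -1, 0]
    gene_b = [1, 1, 0, -1, -1, 0]
    gene_c = [0, 1, -1, 0, 1, -1]
    print(mutual_information(gene_a, gene_b))
    print(mutual_information(gene_a, gene_c))
```

With few samples, as the abstract points out, such plug-in estimates are noisy, so any threshold on mutual information would need to be assessed against a suitable null distribution.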

32. D'Elia D, Turi A, Catalano D, Licciulli F, Tripoli G, Porcelli D, Saccone C, Caggese C
MitoDrome2: a database of OXPHOS nuclear genes in Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: Mitochondrial disorders are clinical phenotypes associated with abnormalities of oxidative phosphorylation (OXPHOS), the primary energy-producing process of all aerobic organisms. Disorders of OXPHOS are recognized as the most common inborn errors of metabolism, affecting at least one in 5000 newborn children. Except for complex II, which is composed of proteins all encoded by nuclear genes, the OXPHOS complexes are built up of both mitochondrial and nuclear DNA encoded proteins; assembling the OXPHOS complexes and fine tuning their activity therefore require specialized regulatory mechanisms to optimize the cross-talk between the two genomes and ensure the coordinated expression of their relevant products. In this context, the characterization of nuclear genes encoding mitochondrial proteins, and of the functional elements regulating their expression, is of crucial importance to clarify the real genetic causes of mitochondrial diseases, to reach a correct diagnosis and to set up new and effective therapies. Despite the long evolutionary divergence time, many key pathways that control development and cellular physiology are conserved between Drosophila and humans, and about 70% of the genes associated with human diseases have a direct counterpart in the Drosophila genome. To investigate the functional constraints acting on the evolution of OXPHOS genes and the regulatory mechanisms coordinating their expression, we have identified and characterized the sequence and structure of these genes in three species of Diptera, D. melanogaster, D. pseudoobscura and A. gambiae, and compared them with their human counterparts. Data obtained from this study have been annotated in the MitoDrome2 database. The availability of these data in MitoDrome2 is expected to be particularly useful for biologists and clinicians interested in functional genomics studies related to mitochondrial biogenesis and metabolism and to their pathological dysfunctions.

33. D'Ursi P, Salvi E, Fossa P, Milanesi L, Rovida E
Modelling the interaction of steroid receptors with organic polychlorinated compounds
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: Organic polychlorinated compounds such as dichlorodiphenyltrichloroethane (DDT), with its metabolites, and polychlorinated biphenyls (PCBs) are present in atmospheric particulate as persistent contaminants. They have been recognized to have detrimental health effects on both wildlife and humans, acting as endocrine disrupters (EDC) because of their ability to mimic the action of steroid hormones and thus interfere with the hormone response. There is considerable experimental evidence that they bind and activate human steroid receptors. Despite the growing concern about the toxicological activity of EDC, molecular data on the interaction of these compounds with biological targets are still lacking. In order to better understand the ability of EDC to bind in the receptor hormone binding pocket, we have used a docking approach to build molecular models of the complexes of estrogen, progesterone and androgen receptors with DDT and PCB family compounds.

34. Falconi M, Chillemi G, Di Marino D, D'Annessa I, Ceruso MA, Morozzo della Rocca B, Desideri A
Molecular dynamics simulation of mitochondrial ADP/ATP carrier in absence and in presence of its natural inhibitor carboxyatractyloside
Meeting: BITS 2005 - Year: 2005
Full text in a new tab
Topic: Unspecified

Abstract: The transport of various metabolites across the mitochondrial membranes is essential for eukaryotic metabolism. Specific transport through the inner mitochondrial membrane is achieved by nuclear encoded carriers which form a large transport family, the mitochondrial carrier family. The structure of the ADP/ATP carrier in complex with its inhibitor carboxyatractyloside (CATR) has recently been solved by X-ray crystallography, providing for the first time an insight into one conformation of the protein. In order to shed light on the possible conformations sampled by the protein and on the effect of CATR in constraining a definite configuration, we have carried out two 10 ns molecular dynamics simulations of the protein embedded in a lipid bilayer of palmitoyl-oleoyl-phosphatidyl-choline (POPC), with and without its co-crystallized inhibitor CATR.

35. Ceol A, Montecchi-Palazzi L, Persico M, Gavrila C, Castagnoli L, Cesareni G
The (new) MINT Database.
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Scientists recognize that a complete description of cell physiology requires an understanding of the “global” protein interaction network. Thus, a database that collects this information, which is presently dispersed in the scientific literature (or accumulated by high throughput experiments), is an essential post genomic tool. MINT was conceived a couple of years ago as a collaborative effort between the group of Molecular Genetics and the students of the PhD program of Molecular and Cellular Biology of the University of Rome Tor Vergata. MINT is a relational database designed to store data on functional interactions between proteins, and aims at being exhaustive in the description of the interaction, including information, whenever available, about kinetic and binding constants and about the domains participating in the interaction. Presently MINT focuses on experimentally verified interactions extracted from the scientific literature by curators, with special emphasis on mammalian organisms. The MINT protein interaction database offers the scientific community a unique bioinformatic tool for designing and interpreting experiments.

36. Cannata N, Forcato C, Fabbro G, Pasin A, Balen J, Valle G
Searching for discriminating degenerated patterns between two populations of sequences
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: In this work we present the development of a bioinformatics tool aimed at identifying sequence patterns that discriminate between two populations of sequences. Examples in which it could be used are easy to find in genomics and proteomics: introns/exons in gene sequences, coding/non-coding regions in transcript sequences, proteins that are transported to a given subcellular localization and those that are not. Once the patterns are detected, they can be searched for in non-annotated sequences by a program specifically developed to find degenerate patterns. We expect that such a method, used jointly with other more traditional methods, could lead to better predictive power in annotation processes.

37. Vitulo N, Cestaro A, Vezzi A, Campanaro S, Simonato F, Lauro F, Malacrida G, Simionati B, Cannata N, Bartlett D, Valle G
Development of tools based on UCSC and KEGG for the annotation of the Photobacterium profundum genome
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: One of the critical steps in a genome sequencing project is the efficient storage and retrieval of the large amount of information produced, which represents the starting point for data analysis and interpretation. We have recently completed the genome sequence of Photobacterium profundum strain SS9 and the data have been loaded into a genome browser under the UCSC environment. The UCSC genome browser has been developed at the University of California, Santa Cruz, and CRIBI hosts one of its official mirror sites at http://genome.cribi.unipd.it. The sequence and annotation information is stored in a MySQL relational database and a web-based tool performs fast visualization and querying of the data. The records are displayed as a series of tracks aligned with the genomic sequence. The Photobacterium profundum genome browser contains the ORF predictions obtained by two different programs (Orpheus and Glimmer) and the related non-redundant ORF consensus, the ribosome, tRNA, operons, the clones spotted on the microarray chips, the differentially expressed clones derived from microarray experiments, the orthologous genes in other bacteria, the phage and a prediction of the repeated elements in the genome.

38. Attimonelli M, Accetturo M, Scioscia G, Marinelli C, Leo P, Santamaria M, Mona S, Lascaro D, Cascione I, Tommaseo-Ponzetta M
HMDB, the Human Mitochondrial Data Base, a genomic resource supporting population genetics studies and biomedical research on mitochondrial diseases
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Population genetics studies based on the analysis of mtDNA and mitochondrial disease studies have produced a huge quantity of sequence data and related information. These data, classified as RFLPs, mtDNA SNPs, pathogenic mutations, HVS1 and HVS2 sequences, and complete mtDNA sequences, are at present distributed worldwide in differently organised databases and web sites that are not well integrated with one another. Several specialised mitochondrial databases and databases of variability data have been designed and implemented, but generally they are structured as simple repositories where data are stored, without the possibility of performing any analysis. Moreover, it is not generally possible for users to submit their own data and at the same time analyse them by comparison with the content of a given database; this holds both for population genetics data and for mitochondrial disease data. As for population genetics data, for example, the problem of classifying sequences into haplogroups is becoming more and more important as the improvement of sequencing technologies is increasing the availability of new complete mitochondrial genomes. Indeed, up to now the only way to establish the haplogroup to which a given mitochondrial sequence belongs has been to inspect its variant sites manually with respect to a reference sequence, referring to the literature in order to define its haplogroup-specific polymorphisms. As for mitochondrial disease data, in addition to the large number of disease-associated mutations already discovered in the last few years, the sequencing of the complete human mt genome is allowing the discovery of new pathogenic mutations. Indeed, up to now, the pathogenicity of mtDNA mutations has been, in most cases, validated mainly by their segregation with the disease and by the consequent loss of function when the mutation involves a structural gene. However, no systematic statistical analysis of the mtDNA SNPs has been performed until now.
Here we present the design of a Human Mitochondrial genome DataBase (HMDB) that will collect the publicly available complete human mitochondrial genomes, interfaced to analysis programs, allowing the classification of newly sequenced human mitochondrial genomes and the prediction, through site-specific nucleotide and amino acid analysis, of the pathogenic potential of mitochondrial polymorphisms.

39. Attimonelli M, Accetturo M, Lascaro D
Statistical prediction of pathogenic variant sites in human mitochondrial genomes
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Mitochondrial DNA disorders, i.e. disorders associated with dysfunctions of the oxidative phosphorylation system (OXPHOS), are caused by inborn errors of metabolism and have an estimated frequency of 1 in 10000 live births. Because of the important role played by the OXPHOS system in ATP production, the causes and effects of mitochondrial disorders are highly heterogeneous and complex. Mitochondrial disorders originate mainly from mutations in both nuclear and mitochondrial DNA. Although prenatal diagnosis is routine for nuclear DNA mutations, cases of prenatal diagnosis of mtDNA mutations are rare, even though such diagnosis is urgently needed, as no real therapies exist. However, thanks to bioinformatics support, the gap may be reduced in a short time. Indeed, up to now, the pathogenicity of mtDNA mutations has been, in most cases, validated mainly by their segregation with the disease and by the consequent loss of function when the mutation involves a structural gene, but no systematic statistical analysis of the mtDNA SNPs has been performed. Moreover, the criteria commonly followed to associate a mutation with a given pathology are: an amino acid change in a strictly conserved site; presence in patients only; heteroplasmy; presence in phenotypically similar but ethnically different families. However, a strict mutation-phenotype correlation is not always verified in patients. Here we propose a statistical approach intended to contribute to the estimation of pathogenic variant sites. The analysis is based on the estimation of site-specific relative variability in a set of homologous sequences, through the application of the SiteVarProt and SiteVariability software, in order to infer a correlation between site variability and the pathogenicity of a given mutation.

40. Di Vincenzo L, Grgurina I, Pascarella S
Computational analysis of structural properties of classical and novel non ribosomal aminoacyladenylate forming domains.
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Nonribosomal peptide synthetases (NRPSs) are multidomain, multifunctional enzymes involved in the biosynthesis of many bioactive microbial peptides such as phytotoxins, siderophores, biosurfactants, and anticancer agents. The minimal module required for a single monomer addition consists of a condensation domain (C), an adenylation domain (A) and a peptidyl carrier protein (PCP) domain also denoted as thiolation (T) domain. Systematic comparative analyses identified 8 or 10 sequence positions lining the active site pocket which are held responsible for substrate recognition and selection in A domain. Recently, it has been pointed out that several enzymes possibly involved in lysine metabolism in eucaryotes display a 3-domain architecture where the two N-terminal domains are homologous to the A and T domains from NRPS systems. The third C-terminal section may contain a PQQ, a NADPH or a functionally uncharacterized domain. Our work is aimed at the structural characterization and the study of common molecular features of the family of the aminoacyladenylate-forming enzymes from NRPS and from the recently discovered homologous enzymes. Psi-BLAST searches were applied over the GeAll and Non-Redundant databanks using query sequences Ebony (gi:3286766) from Drosophila melanogaster, 5-aminoadipic acid synthase (gi:30348962) from Mus musculus and aminoadipate-semialdehyde dehydrogenase from yeast (swissprot:LYS2_YEAST). Thirty-two sequences were identified from different eucaryotic species and the domain assignments were confirmed by CDD and Pfam queries. The sequence subsets containing the A-T domains were aligned utilizing the HMMER package. On the basis of the structural homology encoded in this multiple alignment, the potential occurrence of a “specificity code” similar to that described for the NRPS systems has been tested. 
The residues which interact with the α-amino and α-carboxy groups of the amino acid substrates [2], Asp235 and Lys517 respectively, are conserved, the only exceptions being Ebony protein (gi:3286766) from Drosophila melanogaster and (gi:21291643) from Anopheles gambiae, where Asp235 is replaced by valine. Homology molecular modelling has been used to map the conserved residues onto a hypothetical active site structure of the 5-aminoadipic acid synthase from Homo sapiens (gi:32261239) and of Ebony (gi:3286766) from Drosophila melanogaster, in order to understand the role of the conserved residues and to predict their interaction with the putative substrates. In the case of the Ebony proteins, Asp235 is replaced by Val, while Pro236, conserved in all 5-aminoadipic acid synthases and aminoadipate-semialdehyde dehydrogenases, is substituted by Asp, which can form a hydrogen bond with the β-amino group of the β-alanine substrate. The β-amino group also interacts via hydrogen bonds with Ser301 and Asp331. The other residues line and shape the active site pocket. Characterization of the α-aminoadipate synthase is under way.

41. Ceroni A, Frasconi P
On the Role of Long-Range Dependencies in Learning Protein Secondary Structure
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Prediction of protein secondary structure (SS) is a classic problem in computational molecular biology and one of the first successful applications of machine learning to bioinformatics. Most available prediction methods use feedforward neural networks whose input is the multiple alignment profile in a sliding window of residues centered around the target position. By construction, predictions obtained with these methods are local. Long-range dependencies, on the other hand, clearly play an important role in this problem. The use of bidirectional recurrent neural networks (BRNN) has previously been proposed for the prediction of SS. The architecture in this case allows us to process the sequence as a whole and to “translate” the input profile at each position into a corresponding output prediction for that position. Theoretically, the output at any position in a BRNN depends on the entire input sequence and thus a BRNN might actually exploit long-range information. Unfortunately, the well known problem of vanishing gradients does not allow us to learn these dependencies. In this paper, we are interested in developing an architecture that can effectively exploit long-range dependencies, assuming some additional information is available to the learner. We start from a rather simple intuitive argument: if the learner had access to information about which pairs of positions are expected to interact, its task would be greatly simplified and it could possibly succeed. In the case of SS prediction, a reasonable source of information about long-range interactions can be obtained from contact maps (CM), a graphical representation of the spatial neighborhood relation among amino acids. Of course, in order to obtain a CM the protein structure must be known. In addition, it is well known that the coordinates of backbone atoms can be reconstructed starting from CMs.
Thus, in a sense, using CM information in order to predict SS might appear foolish since most of the information about the 3D structure of the protein is already contained in the map. However, the following considerations suggest that this setting is worth investigation:
• Algorithms that reconstruct structure from CMs are based on a potential energy function with many local minima whose optimization is not straightforward. Thus it is not clear that a supervised learning algorithm can actually learn to recover SS from CMs.
• CMs can be predicted from sequence or can be obtained from structures predicted by ab-initio methods such as Rosetta. Although the accuracy of present methods is certainly not sufficient to provide a satisfactory solution to the folding problem, predicted maps may still contain useful information to improve the prediction of lower order properties such as the SS.
• Even if CMs are given, the design of a learning algorithm that can fully exploit their information content is not straightforward. For example, Meiler and Baker have shown that SS prediction can be improved by using information about inter-residue distances. Their architecture is a feedforward network fed by average property profiles associated with amino acids that are near in space to the target position. In this way, relative ordering among neighbors in the CM is discarded.
The solution proposed in this paper is based on an extended architecture that receives as an additional input a graphical description of the pairwise interactions between sequence positions. We call this architecture interaction enriched BRNN (IEBRNN). Its details are presented in a longer version of this paper.
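The contact maps that the abstract uses as the source of long-range information can be sketched in a few lines of Python: a binary matrix in which entry (i, j) is 1 when residues i and j are close in space. The sketch below is only an illustration of the data structure, not the IEBRNN architecture. The 8 Å cutoff, the use of one representative atom per residue, the minimum sequence separation and the function names are all assumptions made for this example.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map from one 3D point per residue.

    coords: list of (x, y, z) tuples, e.g. C-alpha positions in angstroms.
    threshold: distance cutoff (assumed value, not from the abstract).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def long_range_pairs(cmap, min_separation=6):
    """Contacting pairs (i, j) that are far apart along the sequence."""
    n = len(cmap)
    return [(i, j) for i in range(n) for j in range(i + min_separation, n)
            if cmap[i][j]]


if __name__ == "__main__":
    # Hypothetical coordinates: a short chain that folds back on itself.
    coords = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0),
              (11.4, 5, 0), (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
    cmap = contact_map(coords)
    print(long_range_pairs(cmap, min_separation=6))
```

The pairs returned by `long_range_pairs` are the kind of pairwise interactions that an architecture like the one described could receive as additional input.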

42. Accardo MC, Giordano E, Riccardo S, Digilio FA, Iazzetti G, Calogero RA, Furia M
RNomics: a computational search for box C/D snoRNA genes in the D.melanogaster genome
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Genes producing functional RNAs rather than protein products form a large and variegated class in all genomes, from bacteria to mammals. In higher organisms, non-coding RNA (ncRNA) appears to dominate the whole genomic output, and it is not surprising that the range of known RNA-induced phenomena is rapidly expanding. The central importance of RNA signaling to the eukaryotic cell has become apparent in the last few years, when a large body of evidence has pointed out novel roles for ncRNA molecules in both genetic and epigenetic processes. The family of ncRNA genes comprises many small nucleolar RNAs (snoRNAs) that guide the maturation or post-transcriptional modification of target RNA molecules. Most snoRNAs fall into two classes called box C/D and box H/ACA snoRNAs, with each class defined by the presence of common sequence motifs and common associated proteins. A few snoRNAs in either class are required for specific pre-rRNA cleavages and are essential for viability, whereas most are responsible for the 2’-O-ribose methylation (C/D) or pseudouridylation (H/ACA) of target RNA molecules, respectively. The C/D class guides site-specific 2’-O-ribose methylation by base-pairing of the 10-21 nt-long sequence positioned upstream from a D (or an internal D’) box to the target RNA, with the nucleotide positioned 5 base pairs (bp) upstream from the D/D’ box selected for methylation. Although most of the C/D and H/ACA box snoRNAs are involved in modifications of ribosomal RNA (rRNA), other types of RNA molecules, such as tRNAs, snRNAs, and possibly mRNAs, might be recognised as targets. Despite the importance of their functional roles, most snoRNAs have not yet been identified, even in organisms whose genome has been completely sequenced.
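The targeting rule stated above (the guide sequence upstream from the D or D’ box base-pairs with the target, and the target nucleotide paired 5 positions upstream from the box is methylated) can be expressed as a small Python sketch. This is a simplified illustration, not the search procedure of the abstract: it assumes perfect Watson-Crick complementarity (no G-U wobble pairs or mismatches), a fixed guide length supplied by the caller, and a D box position that is already known. All names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def predicted_methylation_sites(snorna, d_box_start, target, guide_len=10):
    """0-based target positions predicted to be 2'-O-methylated.

    snorna:      snoRNA sequence, 5' to 3'.
    d_box_start: 0-based index at which the D (or D') box begins.
    target:      target RNA sequence, 5' to 3'.
    guide_len:   length of the antisense element taken immediately
                 upstream of the box (the abstract gives 10-21 nt).
    """
    if d_box_start < guide_len:
        raise ValueError("not enough sequence upstream of the D box")
    guide = snorna[d_box_start - guide_len:d_box_start]
    site = reverse_complement(guide)  # what the target must look like
    positions = []
    start = target.find(site)
    while start != -1:
        # The 5th guide nucleotide upstream of the box pairs with the
        # 5th nucleotide of the complementary site in the target.
        positions.append(start + 4)
        start = target.find(site, start + 1)
    return positions


if __name__ == "__main__":
    guide = "AGCUAGGCAU"
    snorna = "GG" + guide + "CUGA"  # toy snoRNA ending with a D-like box
    target = "CC" + reverse_complement(guide) + "GG"
    print(predicted_methylation_sites(snorna, 12, target))
```

A genome-wide search for new box C/D snoRNAs would additionally have to locate the box motifs themselves and tolerate imperfect duplexes, which this sketch does not attempt.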

43. Pozzoli U, Menozzi G, Riva L, Sironi M
COBITIS: COmputational BIology Tools Interoperability Schema
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: The programmatic use of algorithms in computational biology almost always presents considerable difficulties. Authors often choose to publish their algorithms through web interfaces, which make them easy for a human user to employ but make it impractical for any software system to use them within a more complex processing workflow. The problem is even more limiting when the algorithm has to be used repeatedly. Even the availability of compiled versions, or indeed of the source code, does not completely solve the problem. Regardless of the difficulties of installation and integration, there always remains the problem of the format in which data are required and results are returned. A partial and intuitively practicable solution is the standardization of data formats. Many attempts have been made in this direction, but none has succeeded in defining a generally accepted and used format, except in specific domains or within single organizations. Using formats defined by means of XML schemas allows algorithms to identify the type of the data supplied. Using an XML schema can be quite efficient if, for example, the algorithms can communicate via SOAP. We have developed a set of tools, written in C++ in a platform-independent way, that allow the implementation of algorithms able to exchange data according to COBITIS, a simple XML schema. These tools allow data to be converted from various formats to COBITIS and allow the implementation of client and server applications that communicate via SOAP, enabling the remote and distributed use of algorithms.
In particular, we have developed a server accessible through web services and two clients: a web client that uses XSLT for data visualization, which solves many problems in the implementation of interfaces, and one that allows the server to be accessed from Matlab. We believe that, while refraining from imposing any ontology on the data, this model can solve many of the problems related to the programmatic use of algorithms in computational biology.

44. Papaleo E, Vai M, Popolo L, Fantucci P, De Gioia L
Structural models of the catalytic domain of the yeast β-(1,3)-glucan transferase Gas1 by combined threading and secondary structure prediction methods
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Gas1p is an exocellular glycoprotein of Saccharomyces cerevisiae and plays a crucial role in cell wall assembly, due to its β-(1,3)-glucan transferase activity. The identification of Gas1p homologues in other yeast species and fungi allowed the definition of a new family of glycosyl hydrolases, family GH72, on the basis of sequence similarity. Hydrophobic cluster analysis of the catalytic domain (C-domain) of some GH72 members suggests a (β/α)8 barrel fold, also supported by our recent study on the structural and functional characteristics of the C-domain of Gas1p. Standard homology modelling approaches cannot be used to infer the structure of the C-domain of Gas1p and related proteins, due to the lack of suitable homologues of known 3D structure. Threading and fold recognition approaches have been shown to predict the fold of novel proteins with relatively high accuracy. However, it should be noted that the detection of possible remote homologues is only the first step of successful modelling. In fact, alignments to the same scaffold produced by different threading methods can be significantly dissimilar and affected by local errors, making the derivation of a good structural model difficult. With the aim of unraveling the key molecular characteristics of the C-domain of Gas1p and related proteins, in the present work a procedure has been worked out in which data derived from threading methods, multiple sequence alignments and secondary structure predictions were merged and compared to experimental results in order to obtain reliable and detailed three-dimensional models.

45. Mapelli V, Accardo E, Fantinato S, Sacco E, De Gioia L, Vanoni M
Structure-based hypothesis on active role of RasGEF αG-helix
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Ras proteins are small GTPases involved in signaling pathways controlling cell growth and differentiation. They act as molecular switches by cycling between an active GTP-bound and an inactive GDP-bound state. Following the activation of specific cell-surface receptors, Ras proteins switch from the inactive to the active state through the catalytic action of specific Guanine nucleotide Exchange Factors (GEFs), which promote the dissociation of GDP from Ras, allowing GTP to enter the Ras nucleotide pocket. The Saccharomyces cerevisiae Ras-GEF Cdc25 (Cdc25Sc) was the first Ras exchanger to be identified. In higher eukaryotes there are two different classes of Ras-specific Cdc25Sc homologs, Sos proteins and Cdc25Mm, also referred to as Ras GRF. Ras-specific GEFs are made of several functional and structural domains; Ras GEF activity is contained within a domain showing very high similarity to the Cdc25Sc catalytic domain and called, for this reason, the Cdc25 homology domain. Structural studies on Ras crystallized in complex with nucleotide (GDP or GTP analogs) and with the human exchange factor Sos, respectively, have made it possible both to identify conformational differences between the active and inactive states of Ras and to formulate hypotheses on the molecular determinants of interaction and catalytic activity of human Sos. Mutational and structural studies on the Ras GEF catalytic domain have pointed to a major role for the helical hairpin formed by the αH and αI helices (catalytic hairpin) in the catalytic mechanism of Ras-specific GEFs. In the present work we investigate the role of the Ras GEF αG-helix in Ras GDP-to-GTP exchange.

46. Manzoni R, Sacco E, De Gioia L, Vanoni M
Hydrophobic network between AB and HI hairpins suggests a new role for AB hairpin in GEF action mechanism
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: The analysis of protein 3D structures is an important step towards understanding their mechanism of action, regulation, function and family membership. Experimental methods for protein structure determination do not keep up with the increasing number of genomic sequences available: this has led to an increase in computational methods that predict a three-dimensional model for a protein of unknown structure (target) on the basis of sequence similarity to proteins of known structure (templates). There are different kinds of homology modelling methods, but none of them can recover from an incorrect target-template alignment: a good alignment is the first thing to be considered when assessing confidence in a model. SWISS-MODEL, an automated comparative protein modelling server, starts with the analysis of the structurally conserved regions in the target-template alignment. Ras proteins are highly conserved GTPases playing a pivotal role in different important cellular events: cell proliferation, differentiation, cellular traffic and cytoskeleton organization. Within cells, Ras proteins exist in either a GTP-bound form (“on” state) or a GDP-bound form (“off” state). The level of the GTP-bound state derives from the balance of the activity of the GTPase Activating Proteins (GAPs) and Guanine nucleotide Exchange Factors (GEFs). A common feature of all Ras GEFs is the presence of a domain, the RasGEF domain, carrying all the main structural features needed to interact with Ras and to exchange the nucleotide. A notable feature of this catalytic domain is the protrusion of a hairpin, formed by helices αH and αI, out of the core of the domain. It has been proposed that helix αH plays an important role in the nucleotide-exchange mechanism by opening up the nucleotide-binding site.

47. Gaiji N, Mazzitello R, Beringhelli T, Fantucci P
Bovine β-lactoglobulin: Interaction studies with Norfloxacin
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Molecular docking is an efficient computational tool to predict the structures of protein-ligand complexes. This kind of simulation is of fundamental importance for the interpretation of numerous biochemical phenomena, providing useful information on the preferred binding sites of ligands, and therefore for rational drug design. Bovine β-lactoglobulin (BLG) is a small extracellular protein belonging to the lipocalin superfamily. Lipocalins have been classified as transport proteins with the remarkable ability of binding small hydrophobic molecules within the central cavity, also known as the calyx. Because of its stability, abundance and ease of preparation, BLG has been frequently studied to clarify its structural and binding features. Several studies suggest that more than one binding site exists; the aim of this work is therefore to investigate the existence of other sites, in addition to the calyx, and to verify whether BLG can interact with drugs and act as a drug carrier. We considered the particular case of Norfloxacin, a broad-spectrum antibiotic used in the treatment of urinary tract infections.

48. Rossi V, Picco R, Vacca M, D'Urso M, D'Esposito M, Galli T, Filippini F
Novel sequence patterns specific to VAMP subfamilies
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: In eukaryotic cells, SNARE proteins of the vesicle or target membrane (v- or t-SNAREs) play a central role in the control of membrane fusion and protein and lipid traffic. SNAREs’ coiled-coil domains (CCDs) have probably evolved from a common ancestor with a hydrophobic heptad register, interrupted by a conserved polar residue at the ionic “zero” layer. Depending on the nature of this residue, SNAREs have been reclassified as either Q- or R-SNAREs. R-SNAREs consist of two subfamilies: (i) short VAMPs or brevins (from the Latin word “brevis” = short), and (ii) long VAMPs or longins, sharing a conserved N-terminal Longin Domain. Distinct amino acid patches are likely to determine the specificity of SNARE pairing by reducing structural integrity when mismatched SNAREs interact. When considering pairing of the Q- and R-SNARE CCDs, an asymmetric “complementarity” is found in layers -3, -2, and +6, where bulky side chains are packed together with smaller ones, possibly enforcing the correct register between the CCDs of the fusion complex. Sequence variation in the SNARE domains, by altering local charges at the interaction layers, is likely to mediate a fine modulation of the interaction specificity and/or kinetics, regulating intramolecular binding as well as binding to a growing family of SNARE-interacting factors. Although the structure of the SNARE complex is evolutionarily conserved, biological specificity is probably mediated mainly by accessory proteins recognizing CCD surface patterns of charged, polar and nonpolar side chains that differ between the endosomal and neuronal complexes. Recently, it has been reported that the interaction between acidic surface residues of the SNAREs and basic residues on the concave surface of α-SNAP is crucial to the disassembly of the complex.

49. Antoniol G, Ceccarelli M
A Computational Intelligence Approach to Unsupervised Microarray Image Gridding
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Image analysis is an essential aspect of microarray experiments: measures over the scanned image can substantially affect successive steps such as clustering and identification of differentially expressed genes. Scanned microarray image processing has three main tasks: (i) gridding, which is the process of assigning coordinates to the spots, (ii) segmentation, which separates foreground from background pixels, and (iii) intensity extraction. Most of the available gridding approaches require human intervention, for example to specify some points in the spot grid or even to register individual spots. Automating this part of the process will allow high-throughput analysis. The paper reports a novel approach to the problem of automatic gridding in microarray images. The method uses a two-step process. First, a regular rectangular grid is superimposed on the image by interpolating a set of guide spots; this is done by solving a non-linear optimization problem with an evolutionary approach. Second, the interpolating grid is adapted to local deformations with a Markov Chain Monte Carlo method. This is done by modeling the solution as a Markov Random Field with a Gibbs prior possibly containing first order cliques (1-clique). The algorithm is completely automatic and requires no human intervention; it efficiently accounts for grid rotations and irregularities.

50. Staiano A, Tagliaferri R, De Vinco L, Longo G
Advanced Data Mining Methodology Based on Latent Variable Models
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: The aim of this paper is to present a powerful tool for data mining activities based on a nonlinear latent variable model, i.e. Probabilistic Principal Surfaces (PPS). PPS builds a probability density function of a given data set of patterns, lying in a D-dimensional space, which can be expressed in terms of a limited number of latent variables lying in a Q-dimensional space. Usually, Q is 2 or 3 and thus the density function is used to visualize the data in the latent space. PPS have been fruitfully exploited for classification as well as visualization and clustering of complex real high-D data and represent a promising data mining tool for researchers in genetics and bioinformatics.

51. Fogolari F, Tosatto SCE
Loop predictions using molecular mechanics/Poisson-Boltzmann solvent accessible surface area (MM/PBSA)
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: In many predictive tasks accurate free energy estimation is needed. The molecular mechanics/Poisson-Boltzmann solvent accessible surface area (MM/PBSA) approach has proven to be one of the most accurate. However, the correlation between the estimated free energy and the distance (e.g. root mean square deviation (RMSD)) from the most stable conformation is hindered by the strong free energy dependence on minor conformational variations. In the present paper a protocol for MM/PBSA free energy estimation is designed and tested successfully on several loop decoy sets. Further integration of MM/PBSA free energy estimator with the "colony energy" approach makes the correlation between free energy and RMSD from the native structure apparent, thus making the method both accurate and robust.

52. Roasio R, Fu L-M, Botta M, Medico E
MulCom: a novel program for the statistical analysis of genomic data obtained on multiple microarray platforms
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: The increasing pace at which DNA microarray-based genomic expression profiles are generated and published poses the issue of efficient and reliable comparison between datasets obtained by different laboratories and on different microarray platforms. Statistical analysis of microarray data is in continuous evolution, and several procedures have been described for the detection and weighing of systematic and random errors arising from highly parallel but poorly replicated microarray expression data. However, data obtained from different microarray platforms may be of a substantially different nature. This is particularly evident when comparing two commonly used platforms, spotted cDNA microarrays and High-Density Oligonucleotide (HDO) microarrays of the Affymetrix type. cDNA microarrays yield a reproducible ratio between two signals, deriving respectively from the reference and from the sample. Conversely, absolute signals tend to vary across microarrays. Therefore, cDNA microarray data have to be analyzed with statistics handling repeated measurements or paired data, such as the paired t-test. In the case of HDO microarrays, an absolute signal level is obtained from each single mRNA sample. As a consequence, non-paired statistics have to be applied to this type of data. Given the intrinsic differences between cDNA and HDO microarrays, data analysis procedures have generally been developed on one of the two platforms and only in some cases adapted to the other, without a specific focus on systematic comparison and validation across platforms. It is still unclear whether data obtained in the two systems can be treated, compared and eventually merged under a common analysis framework. We addressed these issues by generating expression profiles from the same RNAs with both microarray platforms and by developing an analysis procedure in which inter-platform differences in data treatment are reduced to the minimum essential.
We then developed a novel statistical test specifically designed to handle multiple comparisons against the same reference condition (e.g. many points of stimulation against one unstimulated control). In the Multiple Comparison (MulCom) test, regulated genes are identified by a ‘tunable’ statistical test weighing the expression change at each stimulation point against replicate variability calculated across the whole set of stimulation points.
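
The distinction drawn above between paired and non-paired statistics can be illustrated with a minimal Python sketch. It is not the MulCom test: it only contrasts a textbook paired t statistic with Welch's unpaired t statistic, and the intensity values are made up for the example.

```python
import math
from statistics import mean, stdev


def paired_t(sample, reference):
    """t statistic for paired data: per-pair differences tested against zero."""
    diffs = [s - r for s, r in zip(sample, reference)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(group_a, group_b):
    """t statistic for two independent groups (Welch's unequal-variance form)."""
    var_a = stdev(group_a) ** 2 / len(group_a)
    var_b = stdev(group_b) ** 2 / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)


# Made-up log-intensities: absolute levels vary a lot from array to array,
# but the sample-minus-reference shift on each array is nearly constant.
reference = [8.0, 9.0, 10.0, 11.0]
sample = [8.5, 9.6, 10.4, 11.5]

if __name__ == "__main__":
    print("paired t:  ", round(paired_t(sample, reference), 3))
    print("unpaired t:", round(welch_t(sample, reference), 3))
```

With these toy values the paired statistic is far larger than the unpaired one, because pairing removes the array-to-array variation in absolute signal that the abstract describes for cDNA microarrays.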

53. Santarossa G, Roggia L, De Gioia L, Fantucci P
A Molecular Dynamics Study of the Double Dominant Negative Mutation W809E/T935E in Ras-GEF Complex
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Ras proteins are guanine nucleotide binding enzymes, with low intrinsic GTPase activity, involved in the control of cell growth and cell differentiation. They act as molecular switches, cycling between an active GTP-bound state and an inactive GDP-bound state. The Ras activation state is regulated by the competing activity of GTPase activating proteins (GAPs) and guanine nucleotide exchange factors (GEFs), the latter promoting the activation of Ras by catalysing the exchange of GDP with GTP. In most tumors the activity of Ras proteins is altered, resulting in hyperactive GTP-bound forms of Ras, either because of a reduced GTPase activity or because of an increased GDP/GTP exchange. The GEF mutant W809E/T935E (GEFmut) is a dominant negative, catalytically inactive GEF, which binds to Ras with great affinity and forms a stable complex in the presence of excess nucleotide. By means of Molecular Dynamics (MD) simulations we compared different trajectories of the Ras-GEFwt and Ras-GEFmut systems and analyzed them in terms of both energetic and structural parameters, to correlate the conformational differences of wt and mutant GEFs during their interaction with Ras with the observed modifications in Ras biological activity.

54. Pasa S, Kohn KW, Aladjem MI, Consiglieri C, Cocozza S, Bordo D, Parodi S
In silico model of Molecular Interaction Maps: c-Myc and cell cycle control
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Cell behaviour is largely determined by protein-protein interactions. In particular, it has become increasingly evident that cell cycle control, differentiation and death are governed by networks of molecular interactions involving both proteins and DNA. The concomitant rapid increase of gene expression data from large-scale experiments has made evident the need to represent biochemical effectors (proteins and DNA) and their mutual interactions in an integrated way, in the form of a Molecular Interaction Map (MIM). To describe MIMs in a coherent graphical notation, the use of “wiring diagrams” similar to those adopted in electronics is proposed. In this work we describe the main features of a MIM focused on the oncogene c-Myc and on its role in cell cycle control.

55. D'Ursi P, Rovida E, Merati G, Biguzzi E, Caprera A, Milanesi L, Faioni E
Computational analysis of naturally occurring protein C mutants: electrostatic properties implications.
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Activated Protein C (APC) is a vitamin K-dependent anticoagulant plasma serine protease that exerts its action through the inactivation of factors Va and VIIIa in the presence of Ca++ and phospholipids. Deficiency of protein C is associated with the risk of developing venous thrombosis. APC shares homologies with other vitamin K-dependent coagulation proteins as a result of a common evolutionary pathway. The chymotrypsin-like serine proteases maintain a strictly conserved active site geometry among their catalytic Ser, His and Asp residues. The fact that this core is highly conserved both in sequence and structure among members of the serine protease family suggests that its shape has been finely tuned during evolution. Thirty-three mutations (18 novel) in the promoter and coding regions of the PC gene were identified by PCR and sequencing in 46 patients reporting venous thromboembolic events. Here we present a computational analysis of three selected mutants (G43E, D194N, G216D) that are localized in the catalytic domain and cause a charge modification in the vicinity of the catalytic triad.

56. Trovato A, Seno F
A new perspective on Analysis of Helix-Helix Packing Preferences in Globular Proteins
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: For many years it had been believed that steric compatibility of helix interfaces could be the source of the observed preference for particular angles between neighbouring helices, as emerging from statistical analysis of protein databanks. Several elegant models describing how side chains on helices can interdigitate without steric clashes were able to account quite reasonably for the observed distributions. However, it was later recognized that the “bare” measured angle distribution should be corrected to avoid statistical bias. Disappointingly, the rescaled distributions dramatically lost their similarity with theoretical predictions, casting many doubts on the validity of the geometrical assumptions and models. In this report we elucidate a few points concerning the proper choice of the random reference distribution. In particular, we show the existence of crucial corrections induced by unavoidable uncertainties in determining whether or not two helices are in face-to-face contact and in determining their relative orientations. By using this new rescaling, we show that “true” packing angle preferences are well described by regular packing models, thus proving that preferential angles between contacting helices do actually exist.

57. Toppo S, Fontana P, Velasco R, Valle G, Tosatto SCE
FOX (FOld eXtractor): A novel protein fold recognition method using iterative PSI-BLAST searches and structural alignments
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: We present a novel fold recognition method based on the combination of detailed sequence searches and structural information. Presently the protocol implements two different approaches to assign the correct fold to the target protein sequence: the first is based on a database secondary structure search and the second on an iterative database sequence search. In the first phase a secondary structure prediction of the target is performed, based on the ConSSPred protocol. This prediction is used to search for hits against a database of known secondary structures extracted from PDB (using DSSP). The search follows a two-step strategy: the first step is a Smith-Waterman local secondary structure similarity search with a specific substitution matrix optimized for secondary structure alignment. The second is a global alignment based on SSEA (Secondary Structure Element Alignment), as implemented in our program MANIFOLD, to refine the score and the alignment itself in the region extracted from the first step. At the end of the first phase a list of hits that share a similar secondary structure topology with the target sequence is extracted. The second phase is based on a modified protocol for scanning the sequence database called SENSER. At the beginning of the second phase, BLASTP is used to scan the target sequence against the NR database. These initial hits are clustered to reduce sequence bias and a seed alignment with 20 or fewer sequences is generated. This step ensures that PSI-BLAST can be jump-started with a more sensitive initial profile, increasing its sequence diversity. PSI-BLAST is run for four iterations (e-value inclusion threshold 10e-3) on the NR60 database of known sequences. NR60 is produced by applying the CD-HIT algorithm to cluster the NR database at 60% sequence identity.
Sequences producing NR60 hits with the query are assigned either to the significant sequence space (e-value <= 10e-3) or the trailing end (e-value <= 10) for further use. The profile is used to search the PDBAA database of sequences with known structure. If a significant PDBAA hit (e-value <= 10) is found, the protocol proceeds to the back-validation step (see below). If no significant hit is found, or the hit does not back-validate, a new PSI-BLAST search, using the above "4+1" protocol on NR and PDBAA, is started for the highest ranking sequence (i.e. lowest e-value) in the significant sequence space. Sequences from NR60 matching the query are also assigned to either the significant sequence space or the trailing end. Significant PDBAA hits are again submitted to back-validation. If no significant PDBAA hit is recorded and the significant sequence space has been exhausted, then the protocol uses the trailing end sequences as additional starting points for PSI-BLAST searches. In contrast to previous sequences, which were assumed to be similar enough to the target to imply homology, these sequences are submitted to back-validation before proceeding to the "4+1" PSI-BLAST protocol. The back-validation step consists in using PSI-BLAST to find the target starting from a different query sequence, found as described above. That is, because PSI-BLAST is asymmetric, if sequence A finds sequence B it is not always the case that B also finds A. Sequences that back-validate are more likely to be correct hits. Once a sequence from PDBAA back-validates and its secondary structure is compatible with that of the target sequence as found in the first phase, the protocol builds a target-to-template alignment and stops. The procedure described so far serves to identify a template structure for the target sequence. In order to produce an accurate alignment, HMMER is used to build a hidden Markov model (HMM) based on the HOMSTRAD sequence alignment.
The target is then aligned to the template using this HMM. Preliminary results for the method indicate a clear increase in both detection rate and alignment accuracy for distantly homologous sequences. Presently FOX has been tested on the Fischer-68 test set to compare its performance with standard PSI-BLAST searches, GenTHREADER and the original SENSER protocol. As expected, the introduction of the secondary structure prediction of the protein target and the database secondary structure searches in the first phase has increased detection sensitivity and sensibility of the method compared to profile-based searches such as PSI-BLAST and the SENSER protocol (Fig. 1). The performance is comparable to GenTHREADER, showing that the right template structure is always found in the top 50 hits, as shown in Fig. 1. Further score optimization and development are required to definitively test the entire protocol and make the program available as a web-based server from our group's web site (http://protein.cribi.unipd.it/).
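
The back-validation idea described above (a hit is trusted only if a search started from the hit finds the original target again) can be sketched in a few lines of Python. This is a toy illustration, not the FOX implementation: the `search` function and the `toy_hits` table are hypothetical stand-ins for PSI-BLAST runs, and e-value thresholds are ignored.

```python
def back_validates(target, candidate, search):
    """True if a search started from `candidate` recovers `target`."""
    return target in search(candidate)


def first_validated_hit(target, search):
    """Return the first hit of `target` that also finds `target`, else None."""
    for hit in search(target):
        if hit != target and back_validates(target, hit, search):
            return hit
    return None


# Hypothetical, deliberately asymmetric hit table: T finds A and B,
# but only B finds T when used as the query.
toy_hits = {"T": ["A", "B"], "A": ["C"], "B": ["T", "A"]}


def search(query):
    """Stand-in for a sequence search; returns the hits listed for `query`."""
    return toy_hits.get(query, [])


if __name__ == "__main__":
    print(first_validated_hit("T", search))
```

In this toy table, A is reported as a hit for T but does not find T in return, so it is skipped in favour of B, mirroring the asymmetry the abstract attributes to PSI-BLAST.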

58. Marra D, Malusa F, Piersigilli F, Manniello MA, Romano P
The CABRI website: integrating biological resources information in the bioinformatics network environment
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Biological resources are essential tools in modern biomedical research. It is therefore essential that information on quality biological resources is well known in the scientific community. Web sites distributing this information are more and more widely available, but access and retrieval of the information through a single system is highly desirable. The CABRI (Common Access to Biological Resources and Information) project was funded by the European Union (EU) from 1996 to 1999. It aimed at setting up a “one-stop-shop” for biological materials and related information. This project led to the setting up of the CABRI web site (http://www.cabri.org/), where catalogues of participating culture collections could be queried, either individually or collectively, and the Guidelines for the Collection Quality Management that were adopted by partners could be examined. It includes information on more than 120,000 items from 28 collections including bacteria, filamentous fungi and yeast strains, human and animal cell lines, plasmids, phages, DNA probes, plant cells and plant viruses from nine centers (BCCM, CABI, CBS, CIP, DSMZ, ECACC, ICLC, NCCB, NCIMB). This wealth of information has been made searchable through an implementation of SRS (Sequence Retrieval Software). In 2001, a new project was launched, the European Biological Resource Centers Network (EBRCN). This project has been funded by the EU for the period 2001 - 2004. Among its objectives is the extension of the CABRI on-line services, with special emphasis on the achievement of a better integration with molecular biology and literature databanks (see http://www.ebrcn.org/).

59. Cozzini P, Fornabaio M, Mozzarelli A, Spyrakis F, Kellogg GE, Abraham DJ
HIV-1 protease: a good system to evaluate protein-ligand interactions, water role and protonation state, using an empirical approach
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: A set of 23 protein-ligand complexes of HIV-1 protease and inhibitors has been used as a validation test of an empirical approach to the study of protein-ligand interactions, considering the role of the water molecules involved and the protonation state of protein and ligand ionizable groups. It is well known that the protein-ligand binding process is a concerted sum of single events, so many aspects have to be considered. We have demonstrated that an empirical approach based on experimental LogP values and structural information could be used to design new ligands and to understand biomolecular association from several points of view. It is also known that the presence and behaviour of water can affect binding. Furthermore, modeling the exact protonation state of several ionizable groups leads to a more realistic in silico model design. HIV-1 protease-ligand complexes represent a good system on which to test the empirical approach of the HINT scoring software, because of the good resolution of the crystal data, the well known behaviour of the most important water molecule, WAT301, the presence of a set of water molecules in the cavity surrounding the ligands and, moreover, because a more exact treatment of protein and inhibitor ionizable groups could affect the correctness of the models. We have first analysed the role played by five water molecules placed in the active site and well determined both by X-ray crystallographic analyses and GRID simulations. In addition, we have considered the contributions of another twelve waters surrounding the binding cavity. The different values of the HINT scores, calculated for ligand-water and protein-water interactions, could thus be used to define a water importance scale and to understand the role played by each molecule in the stabilisation of binding.
We have pointed out, in agreement with data reported in the literature, the significance of water 301, whose presence is necessary for complex formation, and the lesser relevance of waters 313, 313’, 313bis and 313bis’, which do not really affect the binding process but contribute to defining the cavity shape. Finally, analyses of the environment surrounding the external ligand extremities, performed for one single HIV-1 protease-inhibitor complex, confirmed our supposition that protein and ligand solvation waters could make strong interactions with one of the two entities or with both but, nevertheless, are not essential for the binding process. Finally, the exact setting of the protonation state was analysed on a protein-ligand complex (pdb code 1A30) for which experimental Ki values at different pH values were available.

60. Eleuteri A, Tagliaferri R, Acernese F, Milano L, De Laurentiis M
Information Geometry for Survival Analysis and Feature Selection
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: In this paper an information geometric approach to survival analysis is described. A neural network is designed to model the probability of failure of a system, and it is trained by minimising a suitable divergence functional in a Bayesian framework. By using the trained network, minimisation of the same divergence functional allows for fast, efficient and exact feature selection.

61. Sboner A, Barbareschi M, Dell'Anna R, Demichelis F
Large scale TMA experiments: automation and data management
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Characterization of gene-expression profiles with DNA microarrays provides a powerful means to discover disease-related genes, particularly in cancer. It is well known that clinical validation of disease-related genes through standard molecular analysis on individual tissue sections requires enormous effort in terms of time and costs. To overcome this problem, the Tissue Microarray (TMA) methodology has recently been developed: a high-throughput technology enabling “genome-scale” molecular pathology studies. In this paper we briefly present our technological platform designed and optimized for the complete management of Tissue Microarray experiments. Our comprehensive system is very flexible regarding the management of data and it allows a wide range of microarray experiments on different diseases. We also obtained promising new results on biomarker expression in ovarian and breast cancer, in terms of discrimination of patients’ overall survival and relapse-free survival.

62. Marabotti A, D'Auria S, Rossi M, Facchiano AM
Modelling the Three-Dimensional Structure of a Sugar Binding Protein from a Thermophilic Organism: Analysis on Stability and Sugar Binding Simulations.
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: The characterization of proteins from thermophilic organisms is becoming more and more interesting for possible biotechnological applications. Recently, the complete genome of a hyper-thermophilic archaebacterium, P. horikoshii, was sequenced [1] and a sugar binding protein (Ph-SBP) was identified by analysis of its sequence similarity. Some preliminary experimental information is available on its binding properties and on its structural features; however, the lack of information about its 3D structure hampers a complete understanding of its conformational properties and interactions with its ligands. Here, we present the results of the homology modelling strategy we used to predict the 3D structure of Ph-SBP, and the analysis we made of the resulting model in order to assess its reliability, with particular attention to its expected thermostability features and sugar binding properties.

63. Barberis M, De Gioia L, Ruzzene M, Sarno S, Coccetti P, Pinna LA, Vanoni M, Alberghina L
The Cyclin-Dependent Kinase Inhibitor Sic1 of Saccharomyces cerevisiae Is a Functional and Structural Homologous to the Mammalian p27Kip1
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: In budding yeast, Sic1, an inhibitor of cyclin-dependent kinase (Cki), blocks the activity of the Cdk1-Clb5/6 (S-Cdk1) kinase required for the initiation of DNA replication, which takes place only when Sic1 is removed. Deletion of Sic1 causes premature DNA replication from fewer origins, extension of the S-phase and inefficient separation of sister chromatids during anaphase, whereas delaying S-Cdk1 activation rescues both S and M phase defects. Despite the well documented relevance of Sic1 inhibition of S-Cdk1 for cell cycle control and genome instability, the mechanism by which Sic1 inhibits S-Cdk1 activity remains obscure. Sic1 has been proposed to be a functional homologue of the mammalian Cki p21Cip1, which is characterized by a significant sequence similarity with Cki p27Kip1, an inhibitor of the Cdk2/Cyclin A kinase activity during S-phase.

64. Di Camillo B, Toffolo G, Cobelli C, Nair KS
Selection of Insulin Regulated Gene Expression Profiles Based on Intensity-Dependent Noise Distribution of Microarray Data
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Insulin resistance in skeletal muscle plays a key role in the development of Type 2 diabetes. To define the molecular mechanisms underlying insulin-induced changes in gene expression, recent studies performed using microarray techniques identified genes involved in insulin resistance in control vs diabetic subjects, or before vs after insulin treatment, i.e. exploiting only steady state information. Although extremely useful for identifying candidate genes involved in the analyzed processes and for developing new physiological hypotheses, these data can tell little about the interactions among genes. To infer gene regulation, it is of paramount importance to monitor dynamic expression profiles, i.e. time series of expression data collected during the transition from one physiological state to another. A first necessary step, in order to limit the analysis to those genes that actually change expression over time, is to select differentially expressed genes. Methods proposed in the literature usually deal with comparison of static conditions rather than time-course experiment data, and are based on modified t-tests and ANOVA tests, which assume a Gaussian distribution of the analyzed variables. These methods test the significance of the differential expression gene by gene, and their application requires at least two replicated experiments for each condition. In time-course experiments, a number of samples are monitored across time and complete replicates of the experiment are seldom available, mainly for cost reasons. Therefore, differentially expressed genes are often selected using an empirical fold change (FC) threshold. This is a far-from-ideal situation, since it is based on an arbitrary choice (e.g. FC=2). In the case of Affymetrix chips, this choice is even more questionable since a constant threshold does not take into account the intensity dependence of the measurement errors, which is a well-known feature of this technology.
Here, we propose a novel method for gene selection, to be applied to dynamic gene expression profiles, which explicitly accounts for the properties of the measurement errors and addresses the situation where a relatively small number of replicates is available.
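
The objection to a constant fold-change threshold can be illustrated with a minimal sketch. It contrasts the fixed FC=2 rule with a rule whose threshold depends on signal intensity. The error model in `noise_sd` (an additive term `a` plus a multiplicative term `b`), the cut-off `z` and all numeric values are assumptions chosen for illustration; this is not the selection method proposed in the abstract.

```python
import math

def noise_sd(intensity, a=30.0, b=0.1):
    """Hypothetical error model: SD = sqrt(a^2 + (b * intensity)^2)."""
    return math.sqrt(a * a + (b * intensity) ** 2)

def selected_by_fold_change(x0, x1, fc=2.0):
    """Fixed rule: the ratio must be at least fc in either direction."""
    ratio = x1 / x0
    return ratio >= fc or ratio <= 1.0 / fc

def selected_by_noise_model(x0, x1, z=3.0):
    """Select when the difference exceeds z times the SD of the difference."""
    sd_diff = math.sqrt(noise_sd(x0) ** 2 + noise_sd(x1) ** 2)
    return abs(x1 - x0) > z * sd_diff

# Two illustrative (before, after) intensity pairs, invented for this sketch.
low = (20.0, 45.0)        # weak signal, large ratio
high = (5000.0, 8000.0)   # strong signal, moderate ratio

for name, (x0, x1) in (("low", low), ("high", high)):
    print(name, selected_by_fold_change(x0, x1), selected_by_noise_model(x0, x1))
```

Under these assumed parameters the low-intensity pair passes the FC=2 rule but not the noise-aware rule, while the high-intensity pair does the opposite, which is the kind of disagreement the abstract attributes to intensity-dependent measurement error.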

65. Muselli M, Ruffino F, Valentini G
An Artificial Model for Validating Gene Selection Methods
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Every DNA microarray experiment provides thousands of real values that correspond to the gene expression levels of a tissue. This technology can offer a valuable new tool for medical diagnosis, since it can yield a reliable way to determine the state of a patient (e.g. healthy or ill) by measuring the gene expression levels of the patient's cells. The dataset obtained through several microarray experiments can be represented by a table with m rows and n columns: each row is associated with an examined tissue and each column corresponds to one of the considered genes. To specify a particular state for each tissue, a final column must be added to the table. Typically m ~ 100, while n ~ 10000. When analyzing this table to retrieve a model for diagnosis, we have two different targets: besides finding a method that recognizes the state pertaining to a specific tissue (discrimination), we wish to determine the genes involved in this prediction (gene selection). The quality of the discrimination task can be simply estimated through a measure of accuracy, obtained by proper methods (hold-out, cross validation, etc.). In contrast, it is very difficult to evaluate the results of the gene selection process, since the genes really involved in the onset of a state are actually unknown. A possible way of validating gene selection could be to analyze the performance of the considered method on a diagnosis problem where the significant genes are known. Unfortunately, at present no problem of this kind is available. An alternative approach consists in building an artificial model, starting from proper biological motivations, that generates data having the same statistical characteristics as the gene expression levels produced by microarray experiments. As proposed in [1], the behavior of a biological system can be described through regulatory networks that represent the interaction between different genes. 
The nodes and the edges of these networks are ruled by dynamic equations that involve the concentration of products encoded by genes and consequently the gene expression levels. Each concentration is expressed through a real variable that changes with time and can determine the transition of the system from a state to another. When the organism is in a particular state some concentrations are lower than a given threshold (specific for each gene), while others exceed a proper value. Thus, if we select a definite state, we can say that a gene is in the active state, if its expression level has a value consistent (lower or greater than a specific threshold) with that state. With this definition each gene can be described by a binary variable, assuming value 1 if the gene is active and 0 otherwise. Also the presence of the considered state can be expressed through a Boolean variable, which takes the value 1, if the tissue is in that state, and 0 otherwise. Consequently, the whole biological system can be described by a Boolean function f with n inputs. Each of the m available microarray experiments corresponds to a particular entry of the truth table for the function f; it is formed by an input-output pair (x,y), where x is a vector of n binary values associated with the examined genes and y is a binary value asserting if the corresponding tissue is in the considered state or not. According to this setting, a technique to generate artificial data for validating gene selection methods consists in building a proper Boolean function f, whose truth table entries share the same statistical characteristics of gene expression levels produced by microarray experiments. Then, the quality of the gene selection method is measured by the percentage of significant genes retrieved. Although each Boolean function can be described by a logical expression containing only AND, OR and NOT operations, in our case it is more convenient to obtain f in a different way. 
In fact, it can be observed that in biological systems genes can be assembled into groups of expression signatures, i.e. subsets of coordinately expressed genes related to specific biological functions. These groups of genes are, in some sense, equivalent with respect to the state determination. Thus, the Boolean function f can be viewed as a combination of several groups of genes. Each group is considered active if a sufficiently large number of its genes is active. Then, the function f assumes value 1 if the number of active groups exceeds a given threshold. A proper algorithm for constructing Boolean functions with these characteristics has been implemented. It is able to generate data resembling those produced by several microarray experiments for diagnostic purpose. In these cases two or more different states are analyzed and the algorithm constructs a specific Boolean function (adopting the above approach) for each state. Then, to allow the application of the gene selection method, a set of input-output pairs is produced for each Boolean function built. The algorithm includes several parameters that can be tuned to achieve a good agreement between the resulting collection of input-output pairs and the dataset produced by microarray experiments for a specific problem. An evaluation of this agreement can be obtained by looking at the accuracy values scored by a discriminant method for different numbers of considered genes. In this contribution, the Leukemia dataset has been considered and a proper artificial model has been generated by constructing a specific Boolean function for each of the two variants of leukemia examined. Figure 1 shows the accuracy values obtained through the leave-one-out approach by applying the SVM-RFE method described in and the technique proposed in. As one can note, the agreement between the success rate curves is excellent in both situations.
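
The group-based construction of the Boolean function f described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the names `min_active` and `group_threshold`, the uniform random sampling of inputs and the tiny example groups are all assumptions, and the sketch makes no attempt to reproduce the statistical characteristics of real microarray data.

```python
import random

def group_active(x, group, min_active):
    """A group is active if at least min_active of its genes are active."""
    return sum(x[g] for g in group) >= min_active

def boolean_state(x, groups, min_active, group_threshold):
    """f(x) = 1 if the number of active groups exceeds group_threshold."""
    n_active = sum(group_active(x, grp, min_active) for grp in groups)
    return 1 if n_active > group_threshold else 0

def generate_pairs(m, n, groups, min_active, group_threshold, seed=0):
    """Produce m input-output pairs (x, y) with x a vector of n binary genes."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(m):
        x = [rng.randint(0, 1) for _ in range(n)]
        pairs.append((x, boolean_state(x, groups, min_active, group_threshold)))
    return pairs

# Hypothetical setting: 9 relevant genes in 3 groups, plus 3 irrelevant genes.
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
pairs = generate_pairs(m=5, n=12, groups=groups, min_active=2, group_threshold=1)
for x, y in pairs:
    print(x, y)
```

Because the significant genes (those listed in `groups`) are known by construction, a gene selection method run on such pairs can be scored by the fraction of those genes it retrieves.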

66. Malerba G, Trabetti E, Sandri M, Xumerle L, Cavallari U, Galavotti R, Biscuola M, Patuzzo C, Pignatti PF
Single and multilocus analyses for the identification of at risk genotypes in cardiovascular disease
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Parental history of coronary heart disease (CHD) has long been recognized as a risk factor for CHD. Death from coronary heart disease is influenced by genetic factors in both women and men. Several epidemiological studies have described a number of underlying risk factors for cardiovascular disease (diabetes mellitus, hypercholesterolemia, plasma lipids, hypertension) which are also under a moderate degree of genetic control. In searching for genetic susceptibility factors associated with coronary artery disease (CAD), we determined the genotypes for 35 candidate genes (63 polymorphisms) in a sample of 757 individuals with angiographically documented coronary artery disease (CAD+, cases) and 320 individuals with angiographically documented normal coronary arteries (CAD-, controls). It is very hard to discover true combinations of multiple factors contributing to the disease. Recent publications show a growing number of genes being studied and correlated with phenotypic variations. The difficulties in handling the increasing amount of available data indicate the need for new tools able to retrieve the relevant information. We propose the classification tree procedure combined with backward elimination as an explorative tool to screen for genetic factors that may be associated with the CAD phenotype.

67. Galfrè S, Morandin F, Cozza A, Pellegrini S, Marangoni R
A method to improve microarray-based identification of SNPs
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: A Single Nucleotide Polymorphism (SNP) is a variation in sequence (polymorphism) between individuals caused by a change in a single nucleotide. SNPs are responsible for most of the genetic variation between individuals. Furthermore, the identification of distinct SNPs may play a crucial role in assessing a potential genetic influence for those disorders that do not appear to have a simple genetic transmission. In turn, the identification of genetic risk factors may help to determine biological markers of disease that can be used for the preclinical diagnosis of a pathological condition. Early diagnosis is important for enacting successful therapeutic strategies. In order to obtain more informative data, multiple SNPs should be tested simultaneously in the same individuals. A common protocol used in SNP investigations is based on Single Base Extension (SBE) followed by microarray hybridization, in which each DNA sample is hybridized on two arrays: one used to explore the existence of “A” and “T” in the SNP locus, the other array for “C” and “T”. To obtain a global evaluation of the frequency with which each SNP is represented in the population, it is necessary to make a quantitative comparison of the signals recorded from the two arrays. For many technical reasons, a large quantity of noise is introduced during this step, thus compromising the reliability of the final data. Here we present a simple approach, based on the use of three arrays instead of only two, which can address this problem. We also give a statistical method for data processing to be used with the proposed experimental protocol.

68. Riva L, Menozzi G, Sironi M, Cerutti S, Pozzoli U
A Wavelet Based Method to Predict Nucleosome Positions
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: The nucleosome core particle is the fundamental repeating subunit of chromatin. It consists of two molecules each of the four ‘core histone’ proteins, H2A, H2B, H3 and H4, and a 147 bp stretch of DNA. A better knowledge of the nucleosomal organization of chromatin is crucial to understanding many important phenomena occurring in chromosomes. Regulatory mechanisms of gene expression are partially influenced by nucleosome positioning, and regions with exposed chromatin (i.e. where nucleosomes are more distant) can be more prone than others to double strand breaks. Analysis of nucleosomal DNA has demonstrated the existence of a weak sequence-dependent signal for nucleosome positioning; this makes classical computational biology methods, like alignment and consensus sequences, poorly applicable here. The ability of DNA to assume certain conformations in certain positions can considerably enhance its binding potential to nucleosomes. According to recent X-ray structure studies, the 147 bp nucleosomal DNA has detectable bends symmetrically displaced around the central position; this suggests the presence of localized periodicities in DNA bendability. The wavelet transform can be used to evaluate periodicities locally, allowing the detection of positions with a bend distribution similar to that of known nucleosomal DNA.

69. Menozzi G, Riva L, Sironi M, Pozzoli U
Intron and exon lengths influence on splicing
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Splice site consensus values (CVs) are usually calculated using previously described matrices [1], which were obtained through the analysis of a relatively small number of splice sites (1500) from different organisms. Now that genome annotation is nearing completion, a much more accurate definition is possible. Furthermore, recent studies [2, 3, 4] indicate that the consensus value itself is not sufficient to define splice site strength and that other parameters must be considered to improve splice site definition. To investigate how intron and exon lengths might be exploited by the splicing machinery to ensure proper splicing control and regulation, a human intron database has been developed and analyzed.

70. D'Alessandro L, Felice B, Montemurro F, Medico E
Meta-analysis of multiple microarray datasets reveals a novel genomic signature associated to invasive growth of epithelial cells and early breast cancer metastasis.
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: HGF, also known as “Scatter Factor”, is a mesenchymal cytokine that acts on epithelial and endothelial cells by promoting a highly integrated biological program, hereafter referred to as “invasive growth”. This program involves coordinated control of basic cellular functions including dissociation and migration (“scattering”), invasion of extracellular matrix, proliferation, prevention of apoptosis and polarization. As a consequence, complex developmental processes take place, such as branched morphogenesis of epithelia and angiogenesis. Oncogenic activation by overexpression or point mutation of the gene encoding the tyrosine kinase receptor for HGF, c-MET, is involved in the progression of tumors towards the invasive-metastatic phenotype. To identify genes involved in Met-driven invasive growth, we explored the transcriptional response of mouse liver cells to HGF at different time points. Two different microarray platforms were adopted, consisting respectively of high-density spotted cDNAs (Incyte) and in-situ synthesized oligonucleotides (Affymetrix). Global exploration of 25,000 gene transcripts yielded over 1500 transcriptionally regulated sequences, corresponding to genes involved in the control of the basic biological functions underlying the invasive growth program: transcription, signal transduction, apoptosis, proliferation, cytoskeleton organization, motility and adhesion. Joint analysis of the data obtained by the two platforms allowed identification of genes with more consistent and reproducible regulation. Meta-analysis of genomic expression datasets obtained from breast carcinoma showed that expression of genes belonging to the HGF signature is correlated with cancer progression.

71. Capriotti E, Fariselli P, Rossi I, Casadio R
Improving the Detection of Protein Remote Homologues Using Shannon Entropy Information
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: We analyze the quality of the alignments generated by the profile-profile alignment algorithm known as BASIC and compare the results with those obtained with a structural alignment code. From this we find that a Shannon entropy value > 0.5 gives a sequence-to-sequence alignment of the target/template couple comparable to that obtained with the structural alignment performed with CE. In our fold recognition/threading code Tangram, the BASIC profile-profile alignment is implemented as follows: 1. The composition profiles P_A and P_B for the target and template are generated by multiple alignment of the sequences obtained from a three-iteration PSI-BLAST search on the Non-Redundant database (the inclusion threshold is E = 10^-3). 2. The dot matrix D for the profile comparison of two protein sequences, D = P_A^T S P_B (with S the BLOSUM62 substitution matrix), is computed using linear algebra routines. 3. The D matrix is searched for high-scoring alignments by means of the local Smith-Waterman dynamic programming algorithm. The test set used for the evaluation is composed of 185 template/target couples of PDB structures that share the same SCOP label but have less than 30% sequence identity. When the top-scoring alignment for each target protein in the test set is considered, our BASIC implementation detects the full SCOP label for 125 couples (68%) and generates 114 (62%) alignments with a MaxSub score >= 1. Interestingly, nearly all of the high-quality alignments share a common feature: the average Shannon entropy for the profile sections aligned together is greater than 0.5 for both the template and the target. 
If only the top-scoring alignments for which this condition holds are considered, a subset of 119 alignments is selected, and for 116 of them (97%) the full SCOP label can be assigned to the target, while 108 (91%) get a nonzero MaxSub score, with an average MaxSub score of 4.6 on the subset. On the same 119 couples, the structural alignment program CE computes a nonzero MaxSub score for 116 of them, with an average of 5.7 points. These results indicate that the Shannon entropy value can be used to discriminate a subset of sequence profile-profile alignments of quality comparable to that obtained by means of a structural alignment program.
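
The three computational ingredients named in the abstract (the dot matrix, the local Smith-Waterman search over it, and the Shannon entropy of a profile column) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Tangram implementation: profiles are stored position by position (so `pa[i]` is column i of P_A), the alphabet has three letters instead of twenty, the substitution matrix `s` is invented rather than BLOSUM62, the gap penalty is linear with an arbitrary value, and entropy is measured in bits because the abstract does not state the units.

```python
import math

def dot_matrix(pa, pb, s):
    """D[i][j] = sum over a, b of pa[i][a] * s[a][b] * pb[j][b]."""
    k = len(s)
    return [[sum(pa_i[a] * s[a][b] * pb_j[b]
                 for a in range(k) for b in range(k))
             for pb_j in pb] for pa_i in pa]

def shannon_entropy(column):
    """Entropy in bits of one profile column (zero frequencies are skipped)."""
    return -sum(p * math.log2(p) for p in column if p > 0)

def smith_waterman_score(d, gap=1.0):
    """Best local alignment score over the score matrix d, linear gap penalty."""
    rows, cols = len(d), len(d[0])
    h = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + d[i - 1][j - 1],  # align i with j
                          h[i - 1][j] - gap,                  # gap in B
                          h[i][j - 1] - gap)                  # gap in A
            best = max(best, h[i][j])
    return best

# Toy data: 3-letter alphabet, invented substitution scores and profiles.
s = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
pa = [[1, 0, 0], [0, 1, 0], [0.5, 0.5, 0]]
pb = [[1, 0, 0], [0, 1, 0]]

d = dot_matrix(pa, pb, s)
print(d)
print(smith_waterman_score(d))
print([shannon_entropy(col) for col in pa])
```

Averaging `shannon_entropy` over the columns that end up aligned would give the quantity the abstract compares against 0.5; recovering those columns requires a traceback, which this sketch omits.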

72. Di Dato V, Di Lauro R, Chiusano ML
Comparative genomics to identify regulatory regions: an example from the PAX8 gene
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Comparisons between human and rodent non-coding sequences are widely used for the identification of highly conserved sequences that could suggest functional implications. In particular, intergenomic comparisons are rapidly evolving for investigations on regulatory regions involved in promoter activity. Moreover, the efficacy of such comparisons in identifying functional regulatory elements can also help in studying the evolutionary dynamics of promoter sequences. We are conducting computational analyses, based on comparative genomics between Homo sapiens and Mus musculus, on regions of at least 200 kb spanning the entire genomic locus of genes involved in thyroid differentiation, to understand their expression mechanisms and regulation. A preliminary study on the PAX8 gene was supported by experimental analysis. The analysis resulted in the identification of 91 conserved regions, of which 35 located at the 5’ end of the gene were chosen to start the experimental analysis. They were tested for functional implications in PAX8 promoter activity, leading to the identification of thyroid-specific regulatory regions. The results of the current analysis provide experimental evidence that in turn serves three fundamental purposes: to help clarify the mechanisms of regulation and expression of the genes investigated; to improve the computational methodology proposed and strengthen its predictive power; and to validate the computational approaches for the analysis of transcription factor binding sites, giving more hints for understanding their organization and the pattern of evolution in regulatory sequences.

73. Ceroni A, Frasconi P
Using Constraints on Beta Partners to Reconstruct Mainly Beta Proteins
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Knowledge of the spatial conformation of a protein can help the study of its function, but the number of resolved structures is still limited by the low throughput of the methods used. Structure prediction could bridge the sequence-structure gap, but no reliable and general methods have yet been proposed. An attempt to simplify the problem has been made by trying to predict the contact map of a protein instead of its atom positions. It has been demonstrated that the protein structure can be reconstructed with sufficient precision even if the contact map contains errors. Unfortunately, the prediction of contact maps is still very unreliable and it is not clear whether the type of errors made by the predictor can be corrected by the reconstruction method. A low-detail representation of the protein conformation could extract the relevant information to train more efficient predictors. The coarse-grain contact map is defined using contacts between secondary structure segments. The prediction of this type of contact has been tried, but no results exist about the feasibility of a reliable method that uses only this type of information to reconstruct the protein structure. In this work we concentrate on contacts defined by beta partners. The geometry and connectivity of beta strands impose strong constraints on the overall structure of the protein, especially for those chains that are formed mainly by residues in beta conformation. The reconstruction of the structure of this kind of protein would be enhanced by knowledge of the secondary structure and an indication of which strands are partners. We propose here an efficient procedure to find a structure that matches the aforementioned characteristics of a given protein in its native conformation.

74. Puntervoll P, Linding R, Gemund C, Chabanis-Davidson S, Mattingsdal M, Cameron S, Martin DMA, Ausiello G, Brannetti B, Costantini A, Zanzoni A, Maselli V, Via A, Cesareni G, Diella F, Superti-Furga G, Wyrwicz L, Ramu C, McGuigan C, Gudavalli R, Letunic I, Bork P, Rychlewski L, Kuster B, Helmer-Citterich M, Hunter WN, Aasland R, Gibson TJ
Eukaryotic Linear Motifs in the ELM Web Tool
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Reflecting the modular nature of eukaryotic proteins, several WWW servers (e.g. PFAM, SMART, PROSITE) are dedicated to revealing domains in protein sequences. However, there is no resource that specifically focuses on short functional motifs (targeting peptides, docking modules, glycosylation sites, phosphorylation sites, etc.), yet these modules are just as important for function as the larger protein domains. Domains are identified by conventional methods, such as patterns (regular expressions), profiles or HMMs. But statistically robust methods cannot usually be applied to small motifs, while pattern-based methods over-predict enormously, so that the few true motifs are lost amongst the many false positives. ELM (Eukaryotic Linear Motifs - http://elm.eu.org) [1] is a new web-based tool for the prediction of these small motifs in eukaryotic protein sequences. At the moment, the ELM database contains manually curated information about 114 known linear motifs in the form of regular expressions, profiles or hidden Markov models that identify the motifs in the sequence. ELM addresses the over-prediction deficiency of other methods by the use of context-based rules and logical filters that exclude false positives. The current version of the ELM server provides core functionality including filtering by cell compartment, phylogeny, globular domain clash (using the SMART/Pfam databases), secondary structure, and solvent accessibility. The current set of motifs is not at all exhaustive. Filters work by comparing the information on the motifs stored in the database (taxonomic, structural and cellular context) with the information submitted by the user together with the sequence. 
The structural filter works by automatically modeling the submitted protein sequences, whenever a good template is found in the SCOP database, and comparing predicted solvent accessibility values and secondary structure features with the corresponding values associated with ELM matches on true positive structures. The ELM server was launched in November 2002 and has been regularly enhanced since then. The server activity has been running for several months at > 45,000 hits from > 1700 unique internet sites.
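
The combination of pattern matching with a context filter described above can be sketched with Python's `re` module. Everything in the motif table is hypothetical: the names, the regular expressions and the compartment annotations are invented for illustration and are not entries from the ELM database, and only the cell compartment filter is shown.

```python
import re

# Hypothetical motif table: name -> (regular expression, allowed compartments).
MOTIFS = {
    "EXAMPLE_MOTIF_1": (r"P..P", {"cytosol"}),
    "EXAMPLE_MOTIF_2": (r"[ST]P.[KR]", {"nucleus", "cytosol"}),
}

def find_motifs(sequence, compartment):
    """Return (motif name, start index, matched text) for motifs allowed
    in the given cell compartment."""
    hits = []
    for name, (pattern, compartments) in MOTIFS.items():
        if compartment not in compartments:
            continue  # context filter: motif not expected in this compartment
        # The lookahead lets overlapping matches be reported as well.
        for m in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, m.start(), m.group(1)))
    return hits

seq = "AAPLPPRSPAKAA"  # invented test sequence
print(find_motifs(seq, "cytosol"))
print(find_motifs(seq, "nucleus"))
```

The same sequence yields fewer hits when the declared compartment rules out a motif, which is the sense in which a logical filter reduces the over-prediction of bare pattern matching.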

75. Amici R, Bartocci E, Merelli E
A virtual laboratory for simulating metabolic pathways
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Generally, a biological system consists of interconnected processes cooperating to carry out the global behaviour of the system, by defining functional rules and relationships between the subunits. This kind of process organization leads to a dynamic system model based on the temporal evolution of its parameters. The difficulty of establishing a priori the response to a new stimulus from the environment increases the complexity of this kind of system. Among the great number of biological systems that we can find in nature, we consider metabolic pathways, which are collections of enzymatic processes involved in the transformation of several substances. The available pathways can be viewed on the KEGG web site; we chose to study the citric acid cycle drawn in Figure 1 and we propose a virtual laboratory for simulating the behaviour of the selected pathway.

76. Di Bernardo D, Gardner TS, Collins JJ
Drug Target Identification from Inferred Gene Networks: a computational and experimental approach
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Genome-wide gene expression profiles provide a means to discover the direct mediators of biologically active compounds. We have already shown that it is possible to infer a predictive model of a genetic network by overexpressing each gene of the network and measuring the resulting steady-state expression of all the genes in the network. This approach, however, requires the perturbation of each gene and the measurement of the perturbation magnitude. In this work we explored the possibility of inferring predictive models of large genetic networks without requiring knowledge of which genes have been perturbed and by what amount. The network identification algorithm described here makes it possible to infer a model of a genetic network from perturbation experiments for which the perturbed genes are not known. This model can be used to identify the target gene, or genes, of a given drug.

77. Cordero F, Lazzarato F, De Bortoli M, Weisz A, Cicatiello L, Scafoglio C, Basile W, Calogero RA
Putative Estrogen-Responsive Genes database (PERG)
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Estrogens are known to regulate the proliferation of breast cancer cells and to alter their cytoarchitectural and phenotypic properties, but the gene networks and pathways by which estrogenic hormones regulate these events are only partially understood. As a starting point for obtaining a genome-wide picture of the genes modulated by estrogens, we have built a database of the genes that have an Estrogen-Responsive Element (ERE) in their putative promoter region.

78. Bansal M, Di Bernardo D
Inferring gene regulatory networks from time expression profiles
Meeting: BITS 2004 - Year: 2004
Full text in a new tab
Topic: Unspecified

Abstract: Recent developments in large-scale genomic technologies, such as DNA microarrays and mass spectroscopy, have made the analysis of gene networks more feasible. However, it is not obvious how the data acquired through such methods can be assembled into unambiguous and predictive models of these networks. In a recent study our group developed an algorithm (Network Identification by multiple regression – NIR) that used a series of steady-state RNA expression measurements, following transcriptional perturbations, to construct a model of a 9-gene network that is part of the larger SOS network in E. coli. Though the NIR method proved highly effective in inferring small microbial gene networks, its practical utility is limited because it requires: (i) prior knowledge of which genes are involved in the network of interest; (ii) the perturbation of all the genes in the network via the construction of appropriate episomal plasmids; (iii) the measurement of gene expression at steady state (i.e., constant physiological conditions after the perturbation). This experimental setup is impractical for large networks, it is not easily applied to higher organisms, and, most importantly, it is not applicable if there is no prior knowledge of the genes belonging to the network. Here we propose a new algorithm that can infer the network of gene-gene interactions to which a gene of interest belongs and identify its direct targets, using the perturbation of only one of the genes in the network. To this end, we need to measure gene expression profiles at multiple time points following perturbation of only the known gene, or genes, without the need for the steady-state assumption.
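
The steady-state setting that NIR relies on can be illustrated with a small sketch. It assumes a linear model in which, at steady state, A x + u = 0 for each experiment, where x is the vector of expression changes, u the applied perturbation and A the matrix of gene-gene influences; this model, the function names and the two-gene example are assumptions made for illustration. The sketch handles only the exactly determined case (as many experiments as genes) by solving one linear system per gene, whereas a regression approach would also cope with noise and more general designs.

```python
def solve(matrix, rhs):
    """Solve matrix * v = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: experiments are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def infer_network(x, u):
    """x[k][j]: steady-state expression change of gene j in experiment k.
    u[k][i]: perturbation applied to gene i in experiment k.
    Returns a, where a[i][j] is the inferred influence of gene j on gene i."""
    n_genes = len(x[0])
    n_exp = len(x)
    # Row i of A satisfies sum_j x[k][j] * a[i][j] = -u[k][i] for every k.
    return [solve(x, [-u[k][i] for k in range(n_exp)]) for i in range(n_genes)]

# Invented two-gene example: each experiment perturbs one gene by one unit.
x = [[1.0, 0.0], [0.25, 0.5]]
u = [[1.0, 0.0], [0.0, 1.0]]
for row in infer_network(x, u):
    print(row)
```

The requirement that `u` be known for every experiment, and that every gene be perturbed, is visible directly in the inputs of `infer_network`; these are the limitations (i) to (iii) that the abstract sets out to remove.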

79. Ferraro E, Ausiello G, Panni S, Cesareni G, Helmer-Citterich M
Definition of a neural strategy for the prediction of protein interaction specificity
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: We are working on the development of a neural network strategy for the prediction of peptide recognition specificity by SH3 domains. As a training set we use the results of a large number of SH3-peptide binding experiments obtained by the SPOT synthesis technique (PepSPOT). As input for the neural network, we consider the sequence of both the domain and the hypothetical ligand peptide, in order to infer, for each domain-peptide combination, the likelihood that they form a complex in a binding reaction. The method will be applied to predict the affinity of any peptide for domains of unknown specificity. We analyzed data from PepSPOT experiments for nine SH3 domains, each tested against several hundred peptides, and constructed a dataset in which each data point includes the domain and peptide sequences and a figure in arbitrary BLU units that correlates with binding affinity. In order to translate this information into a format that can be easily handled by a neural network, we focused on three main problems: i) the information coding; ii) the dimension of the input space; iii) the correct identification of the two classes (binding and not binding). We decided to use the orthogonal representation of the sequences and, in order to reduce the huge dimensionality, we considered only those domain residues at positions that make contact with the ligand peptide. The contact positions are identified from the analysis of the SH3-peptide complexes of known structure and extended to other SH3 domains of known sequence by multiple alignment. For the peptide sequences we restricted our representation to the most significant positions, excluding the two consensus prolines from the input. Finally, we defined the binding class as all the peptides that show a spot intensity higher than 10000 BLU units.
The resulting dataset was strongly unbalanced, which calls for different methodological strategies: the usual feed-forward neural networks require balancing of the training set, while kernel methods (support vector machines) can perform classification even on unbalanced sets, provided that a suitable non-linear kernel is chosen. We will verify the performance of the neural strategy with respect to regular expressions, position weight matrices, position specific scoring matrices (PSSMs) and the SPOT procedure.
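
The encoding described in this abstract can be illustrated with a minimal sketch. It assumes a 20-letter amino acid alphabet and applies the 10000 BLU threshold quoted above; the function names, the example sequences and the position lists are hypothetical and are not taken from the authors' implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def one_hot(sequence, positions=None):
    """Orthogonal encoding: 20 binary inputs per residue.

    If `positions` is given, only those (0-based) positions are kept,
    e.g. the domain positions that contact the ligand peptide.
    Raises ValueError for residues outside AMINO_ACIDS.
    """
    if positions is not None:
        sequence = "".join(sequence[i] for i in positions)
    vector = []
    for residue in sequence:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


def encode_pair(domain_seq, contact_positions, peptide_seq,
                peptide_positions, intensity, threshold=10000):
    """Build one data point: input vector plus binding / not-binding label."""
    features = (one_hot(domain_seq, contact_positions)
                + one_hot(peptide_seq, peptide_positions))
    label = 1 if intensity > threshold else 0  # binding class: above threshold
    return features, label


# Hypothetical example: three contact positions and three peptide positions.
features, label = encode_pair("ALYDYEA", [1, 3, 5], "RPLPPLP", [0, 2, 5], 12500)
```

Restricting both sequences to selected positions keeps the input space at 20 inputs per retained residue, which is the dimensionality reduction the abstract refers to.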

80. Cappadona S, Diestellhorst L, Kemp G, Cerutti S
Analysis of β-helix proteins using the STACK toolkit
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: β-helix proteins contain a solenoid domain of parallel β-strands folded into a large prism. Each turn of the solenoid, called a β-coil, consists of a succession of a few (usually three) β-strands. β-strands from adjacent coils stack to form parallel β-sheets that make up the faces of the prism. These faces are linked by loop regions that protrude from the helix and, in many cases, form the binding site of the helix. The cross section of this prism is typically L-shaped in right-handed parallel β-helices and triangular in left-handed parallel β-helices. The stability of the domain is mainly obtained by the stacking of similar residue side-chains at equivalent positions in successive coils, both inside and outside the helix. The inward side chains are mainly hydrophobic and, when they are not, maximal hydrogen bonding or electrostatic interactions neutralise their polar or charged groups. We have formalised the intuitive notion of a β-helix in a set of objective algorithms that automatically recognize the basic structural elements of β-helices: residue stacks, β-coils, cores and β-helices. We define the core of a β-helix as the helical domain of the protein, as distinguished from the protruding loop regions.

81. Carrara GE, Stella A, Pinciroli F, Alcalay M, Masseroli M
Automatic extraction of gene annotations from data-rich HTML pages
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: High-throughput technologies create the need to integrate the resulting gene expression data with information mined from large amounts of gene annotations held in several different biomolecular databanks. Most of these databanks can be queried only via the web, for a single gene at a time, and query results are generally available in HTML format. Although some databanks provide batch retrieval of data via FTP, this requires expertise and resources for locally re-implementing the databank. Web wrappers can automate the extraction of information on numerous genes from different web-based databanks. As the content of a dynamic web page can change from one query to another (e.g. tables with extra rows or missing fields), such wrappers should be able to locate and extract data of interest inside different HTML pages. Unfortunately, HTML tags describe the visual formatting of data, not their semantics. Thus, human-readability and machine-readability are often not equivalent. Wrapper generation tools help create a wrapper for a specific source, i.e. a web-based biomolecular databank with its own HTML layout. First, the user is invited, via a graphical user interface, to select data of interest inside one or more sample HTML pages. Then, the system saves this information as an extraction template for that specific source. The long-term goal is to generate wrappers that scale well with the number of processed web pages.
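
A minimal sketch of the idea of an extraction template follows. It is not the authors' system: it assumes that each field of interest is a table cell that directly follows a cell holding a known label, and the template format, the function names and the sample pages are all hypothetical. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td>/<th> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def extract(html, template):
    """Apply a template {field name: label text} to one HTML page.

    A field is located by its label rather than by its row number, so extra
    rows do not shift the result; a missing field yields None.
    """
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [text.strip() for text in parser.cells]
    record = {}
    for field, label in template.items():
        record[field] = None
        for i, text in enumerate(cells[:-1]):
            if text == label:
                record[field] = cells[i + 1]
                break
    return record


# Hypothetical pages from the same source: one with an extra row, one with a missing field.
template = {"symbol": "Symbol", "chromosome": "Chromosome"}
page_a = ("<table><tr><td>Symbol</td><td>TP53</td></tr>"
          "<tr><td>Alias</td><td>p53</td></tr>"
          "<tr><td>Chromosome</td><td>17</td></tr></table>")
page_b = "<table><tr><th>Symbol</th><td>BRCA1</td></tr></table>"
records = [extract(page, template) for page in (page_a, page_b)]
```

Anchoring on label text is one simple way to cope with pages whose layout changes from query to query; a real wrapper would need richer templates for nested or repeated structures.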

82. Marangoni R
Simulating genes families
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: After complete sequencing, the genes of a genome are clustered into gene families, whose members share significant similarity in their sequences (and often in the structures of their protein products) but often play different biological roles. When two genes are related in this way, they are called paralogs. It is generally believed that paralogs arise through an iterated mechanism of gene duplication followed by modification of the copies. In a previous work describing a method to reconstruct the history of gene families, a simulator of gene families was introduced in order to bypass the lack of experimental data about gene family history. Working with these simulated data, some interesting features concerning real biological families were found. Nevertheless, they were not explored, since they were too far from the main subject of that paper. In the present work, a simulator similar to the one used in the above-cited paper has been developed, and many different synthetic data sets have been generated. The simulation strategy, its biological foundation and the comparison between simulated and real sequences are discussed in detail in the poster.
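
The duplication-and-modification mechanism mentioned above can be sketched as a toy simulation. This is only an illustration under simple assumptions (a uniformly chosen parent at each duplication and independent point substitutions at a fixed rate); the abstract does not specify the author's simulator, and all names and parameters below are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`."""
    return "".join(
        rng.choice(BASES.replace(base, "")) if rng.random() < rate else base
        for base in seq
    )


def simulate_family(ancestor, duplications, rate, seed=0):
    """Grow a gene family by repeated duplication followed by divergence.

    Returns the final sequences and the duplication history as a list of
    (parent index, child index) pairs, so that a reconstruction method can
    be checked against the known history.
    """
    rng = random.Random(seed)
    family = [ancestor]
    history = []
    for _ in range(duplications):
        parent = rng.randrange(len(family))
        family.append(family[parent])                     # duplication event
        history.append((parent, len(family) - 1))
        family = [mutate(gene, rate, rng) for gene in family]  # all copies diverge
    return family, history


family, history = simulate_family("ACGTACGTAC", duplications=4, rate=0.1, seed=1)
```

Because the true history is recorded alongside the sequences, simulated families of this kind can stand in for the experimental history data that are not available for real families.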

83. Bonizzoni P, Dondi R, Rizzi R, Pesole G
ASPIC: a Novel Method to Predict Alternative Splicing
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: In this paper a new method for detecting splicing sites is proposed. It is based on a combined analysis of all available transcript data in order to produce all transcript alignments to the genomic sequence. The algorithm requires that all transcript-genome alignments are fully compatible with a plausible common exon-intron structure within the genomic sequence. The algorithm was implemented in the ASPIC (Alternative Splicing PredICtion) software.

84. Lazzari B, Milanesi L, Stella A, Caprera A, Bianchi F, Vecchietti A, Pozzi C
ESTree DB: a Tool for Peach Functional Genomics
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: A collection of about 8000 Expressed Sequence Tag (EST) sequences has been prepared starting from clones belonging to four peach cDNA libraries. The libraries were prepared from Prunus persica mesocarps at four different developmental stages, with the aim of collecting data for in-depth investigation of the maturation process at the molecular level. A fully automated pipeline (ESTree DB) has been prepared to process EST sequences using public software supplemented by in-house Perl scripts, and the data have been collected in a MySQL database called ESTree, available at this URL: http://www.itb.cnr.it/ESTree. These data are produced within the activities of the National Consortium for Peach Genomics (ESTree), which also involves the Universities of Padova and Udine and other research institutions.

85. Catalano D, Licciulli F, Grillo G, Liuni S, Pesole G, Saccone C, D'Elia D
MitoNuc: a database of nuclear genes encoding for mitochondrial proteins
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: Mitochondria are sub-cellular organelles, present in the majority of eukaryotic organisms, which play a central role in the energy metabolism of cells. They are also involved in many other cellular processes, such as apoptosis and aging, and in a number of different human diseases, including Parkinson’s, diabetes mellitus and Alzheimer’s. Despite the importance of mitochondria in the maintenance of cell life, about 95% of the proteins contributing to mitochondrial biogenesis and functional activities are nuclear encoded, synthesized in the cytosol and targeted to mitochondria. The expression and assembly of these proteins are strictly dependent on the coordinated activities of the two genomes, mitochondrial and nuclear, but the molecular mechanisms and co-evolutionary processes of the cross-talk between these two genomes are still largely unknown. MitoNuc is a specialized database of nuclear encoded mitochondrial proteins in Metazoa. It provides comprehensive data on genes and proteins, consolidating information from external databases. These data include: gene sequence, structure and information from ENSEMBL; protein sequence and information from SWISSPROT; transcript sequence and structure from RefSeq and UTRdb; disease information from OMIM. Each database entry consists of a nuclear gene coding for a mitochondrial protein in a given species, and reports information on: species name and taxonomic classification; gene name; functional product; sub-cellular mitochondrial localization; protein tissue specificity; Enzyme Classification (EC) code for enzymes; and disease data related to protein dysfunction. For each gene and gene product, the Gene Ontology (GO) classification with regard to molecular function, biological process and cellular component is also reported. Links to external database resources are also provided. As far as the gene and transcript sequence data are concerned, in the previous MitoNuc releases they were extracted from the related EMBL entries.
Due to the high level of sequence redundancy in the primary database, the majority of MitoNuc entries contained more than one transcript and coding gene sequence for the same gene, thus introducing considerable redundancy that reduces the usefulness of the database for sequence analysis. In order to remove redundancy we generated a MitoNuc section of gene and transcript sequences derived from those organisms whose draft genome sequence has been completed and annotated in ENSEMBL. These MitoNuc entries are available in the database section called “MitoNuc Genomics”, which at present includes the following species: Homo sapiens, Rattus norvegicus and Mus musculus. MitoNuc can be queried using the SRS Retrieval System (http://www.ba.itb.cnr.it/srs/); the present release contains a total of 1344 entries, of which 662 are collected in the MitoNuc Genomics section. The total number of species included in MitoNuc is about 64.

86. Mutarelli M, Basile W, Cicatiello L, Scafoglio C, Colonna G, Weisz A, Facchiano AM
Comparative analysis with three different microarray platforms of the oestrogen-responsive transcriptome from breast cancer cells
Meeting: BITS 2004 - Year: 2004
Topic: Unspecified

Abstract: The DNA microarray technique makes it possible to analyze the expression patterns of tens of thousands of genes in a short time. The wide use of this technique, and the rapidly improving technologies offered by several commercial and academic providers, have led to the publication of thousands of results that are extremely heterogeneous with respect to the type of technology used, the kind of normalization and analysis subsequently applied to the data, and so on. This makes it difficult for groups with common research interests to collaborate and exchange data, whereas collaborations would be extremely useful, both because of the high cost of these techniques and because a carefully designed experiment could produce results relevant to different groups, each focusing on a different aspect of a main biological problem. Awareness of the need for common standards or, at least, comparable technologies is therefore emerging in the scientific community, as shown by the effort of the Microarray Gene Expression Data (MGED) Society, which is trying to set up at least experimental methodology, ontology and data format standards. In addition, it is important to be able to compare newly produced data with earlier experiments, so as to preserve the value of results produced with older-generation equipment. Otherwise, a large amount of the work produced before the release of a new technology would be lost. Considering that the huge amount of data produced is largely underexploited, this would be a great loss for the scientific community. In fact, as analysis algorithms improve, existing data can be re-analyzed to give more precise results, thus helping to adjust the planning of future experiments.
We thus started this work with the aim of evaluating the technical variability between three commonly used microarray platforms, so as to adapt the first part of the analysis to the peculiarities of each technique, and of assessing the feasibility of a common subsequent analysis path, thus taking advantage of the different data-extraction abilities of the three. For this purpose, we used three different commercial chips to study the gene expression profiles of hormone-responsive breast cancer cells with and without stimulation with estradiol: i) the Incyte ‘UniGEM V 2.0’ microarrays, containing over 14,000 PCR-amplified cDNAs, corresponding to 8,286 unique genes, spotted in a high-density pattern onto glass slides; ii) the Affymetrix technology, based on 25-nucleotide-long oligonucleotides directly synthesized on a GeneChip® array, representing more than 39,000 transcripts derived from approximately 33,000 unique human genes; iii) the Agilent ‘Human 1A Oligo’ Microarray, consisting of 60-mer, in situ synthesized oligonucleotide probes for a total of about 18,000 different genes. The RNA was derived from human breast cancer cells (ZR-75.1) stimulated for 72 h with 17beta-estradiol after starvation in steroid-free medium for 4 days; the reference sample was derived from synchronized cells grown in a steroid-free environment. The same samples were used to generate fluorescent targets to be hybridized on the different slides. Hybridization reactions were performed with four (for the Agilent slides) and two or three (for the other platforms) technical replicates, with a single (Incyte) or double (Agilent) balanced dye swap for competitive hybridizations. A combined total of 18,823 unique UniGene clusters were represented among the three platforms used. Within the subset of 5,733 genes that were present on all the chips, about 50% appeared to be significantly expressed and 25% were significantly regulated by 17beta-estradiol treatment in our experiment.
A fairly low overlap was observed between the lists of regulated genes obtained by the three systems. We are working on understanding the conflicting results for some of the genes. The majority of genes were detected only by the Affymetrix platform, probably as a consequence of the higher sensitivity of this system, which allows the detection of some gene expression levels that are not identified with the other platforms. However, a number of genes were identified only by the cDNA and/or oligonucleotide systems. Another possible experimental explanation is that the DNA sequences spotted on the arrays show different affinities for the target, so each slide has a particular pattern of probe-target annealing, although the same genes are represented on all the platforms. Finally, we are improving the data processing with statistical methods in order to allow a better understanding of the experimental results.
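
The comparison of regulated-gene lists across platforms can be expressed with simple set operations. The sketch below is only an illustration: the function name is hypothetical, the gene identifiers are placeholders rather than results from this study, and it assumes that genes have already been mapped to a common identifier (for example a UniGene cluster).

```python
def overlap_summary(lists):
    """Summarize agreement between platforms.

    `lists` maps a platform name to the set of genes it reports as regulated.
    Returns the genes found by all platforms, the genes found by any platform,
    and the genes found by each platform alone.
    """
    sets = {name: set(genes) for name, genes in lists.items()}
    common = set.intersection(*sets.values())
    union = set().union(*sets.values())
    unique = {
        name: genes - set().union(
            *(other for other_name, other in sets.items() if other_name != name)
        )
        for name, genes in sets.items()
    }
    return {"common": common, "union": union, "unique": unique}


# Placeholder identifiers only; these are not data from the experiment.
regulated = {
    "incyte": {"g1", "g2", "g3"},
    "affymetrix": {"g1", "g2", "g4", "g5"},
    "agilent": {"g1", "g4"},
}
summary = overlap_summary(regulated)
```

Restricting each input set to the genes represented on all three chips before calling the function would separate disagreement between platforms from differences in chip content.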



BITS Meetings' Virtual Library
driven by Librarian 1.3 in PHP, MySQL™ and Apache environment.

For information, email to paolo.dm.romano@gmail.com .