1. Masulli F, Rovetta S
Ensembling and Clustering Approach to Gene Selection
Meeting: BITS 2004 - Year: 2004
Topic: Microarray algorithms and data analysis
Abstract: In pattern recognition, the problem of input variable selection has traditionally focused on technological issues, e.g., improving performance, lowering computational requirements, and reducing data acquisition costs. In the last few years, however, it has found many applications in basic science as a model selection and discovery technique, as a rich literature on the subject shows, and interest in the topic has been especially strong in bioinformatics. A clear example is DNA microarray technology, which produces large volumes of data in each experiment, yielding measurements for hundreds of genes simultaneously. In this paper, we propose a flexible method for analyzing the relevance of input variables in high-dimensional problems with respect to a given dichotomic (two-class) classification problem. Both linear and non-linear cases are considered. In the linear case, derivative-based saliency yields a commonly adopted ranking criterion. In the non-linear case, the approach is extended by introducing a resampling technique and by clustering the results to stabilize the estimate. The proposed method (see Tab. 1) is termed Random Voronoi Ensemble: it is based on random Voronoi partitions, and these partitions are replicated by resampling, so the method actually uses an ensemble of random Voronoi partitions. Within each Voronoi region, a linear classification is performed using Support Vector Machines (SVMs) with a linear kernel, while the outcomes of the ensemble are integrated with the Graded Possibilistic Clustering technique to ensure an appropriate level of outlier insensitivity.
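The following is a minimal Python sketch of the idea, not the authors' implementation. For a linear decision function f(x) = w·x + b, the derivative-based saliency of input i is |∂f/∂x_i| = |w_i|, which gives the linear-case ranking. The non-linear case is approximated by bootstrap resampling, random Voronoi partitions, and one linear SVM per region. Several details are assumptions not stated in the abstract. The SVM is trained with a Pegasos-style stochastic subgradient method. Voronoi sites are drawn at random from the data. The per-region saliency vectors are combined with a coordinate-wise median, a crude robust stand-in for the Graded Possibilistic Clustering step, which is not reproduced here. All function names and parameter values (`train_linear_svm`, `random_voronoi_saliency`, `n_regions`, `min_size`, ...) are hypothetical.

```python
import random
from statistics import median


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained by Pegasos-style
    stochastic subgradient descent. Labels must be -1/+1. Returns (w, b)."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if y[i] * score < 1.0:  # point violates the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]  # bias left unregularized
    return w, b


def saliency(w):
    """Derivative-based saliency of f(x) = w.x + b: |df/dx_i| = |w_i|,
    normalized to sum to 1 so that different models are comparable."""
    total = sum(abs(wj) for wj in w) or 1.0
    return [abs(wj) / total for wj in w]


def nearest(x, sites):
    """Index of the closest Voronoi site (squared Euclidean distance)."""
    return min(range(len(sites)),
               key=lambda k: sum((a - c) ** 2 for a, c in zip(x, sites[k])))


def random_voronoi_saliency(X, y, n_partitions=10, n_regions=3,
                            min_size=10, seed=0):
    """Hypothetical sketch of an ensemble of random Voronoi partitions with
    one linear SVM per region. Bootstrap resampling and random site choice
    are assumptions, not details taken from the paper."""
    rng = random.Random(seed)
    vectors = []
    for r in range(n_partitions):
        # Resample the data and draw random Voronoi sites from it.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        sites = [X[rng.randrange(len(X))] for _ in range(n_regions)]
        regions = {}
        for i in idx:
            regions.setdefault(nearest(X[i], sites), []).append(i)
        for members in regions.values():
            if len(members) < min_size or len({y[i] for i in members}) < 2:
                continue  # too small, or a single class: nothing to separate
            w, _ = train_linear_svm([X[i] for i in members],
                                    [y[i] for i in members], seed=r)
            vectors.append(saliency(w))
    if not vectors:
        raise ValueError("no Voronoi region contained both classes")
    # Stand-in for Graded Possibilistic Clustering: coordinate-wise median.
    return [median(v[j] for v in vectors) for j in range(len(X[0]))]
```

Normalizing each saliency vector before aggregation keeps regions whose weight vectors have different scales comparable, and skipping single-class regions avoids fitting a classifier where no class boundary exists.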