bathrest.blogg.se

Rank these systems in order of decreasing entropy.




At each model building step, a pair (classifier, ranked gene set) is constructed from samples in a training set and evaluated on a test set, where training and test are subsets of the data available for development at this step. The contribution of each variable is defined through a function of the corresponding weight coefficient that appears in the formula defining the SVM model. Eliminating a single variable at each step, as in the basic RFE procedure, is inefficient.
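The basic one-variable-per-step elimination can be sketched as follows. This is a minimal illustration, not the authors' implementation: `fit_weights` is a hypothetical callable standing in for "train a linear SVM on the active features and return its weight vector", and the squared weight is assumed as the ranking criterion (the text only says the contribution is "a function of" the weight).

```python
from typing import Callable, List, Sequence


def rfe_rank(n_features: int,
             fit_weights: Callable[[List[int]], Sequence[float]]) -> List[int]:
    """Return feature indices ordered from most to least important.

    fit_weights(active) is a hypothetical stand-in for retraining the
    classifier on the active features; it must return one weight per
    active feature, in the same order.
    """
    active = list(range(n_features))
    eliminated: List[int] = []  # filled from least to most important
    while active:
        weights = fit_weights(active)
        scores = [w * w for w in weights]  # assumed criterion: squared weight
        worst = min(range(len(active)), key=lambda i: scores[i])
        eliminated.append(active.pop(worst))  # drop exactly one feature per step
    return eliminated[::-1]


# Toy stand-in for model fitting: the weights are fixed and never change.
toy_weights = [0.1, -2.0, 0.5, 0.0]
ranking = rfe_rank(4, lambda active: [toy_weights[i] for i in active])
print(ranking)
```

The loop calls `fit_weights` once per feature. With thousands of genes this means thousands of model fits, which is the inefficiency noted above.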


However, RFE for SVM has high computational costs.


The study of gene expression patterns is expected to enable significant advances in disease diagnosis and prognosis. The main objectives of a discovery process based on microarray data are the understanding of the molecular pathways of diseases, their early detection, and the development of measures of individual responsiveness to existing or new therapies. In particular, the prospect of providing new targets for therapy and of developing clinical biomarkers has given a strong impulse to methods for ranking genes in terms of their importance as predictor variables in the construction of classification models from arrays. For example, recent results have shown that the clinical outcomes of high grade gliomas and of cutaneous T cell lymphoma may be better identified by gene expression-based classification than by histological classification or measures of tumor burden.

In this paper, we address the problem of developing a practical methodology for gene ranking based on the support vector machine classifier (SVM), a machine learning method that is considered particularly suitable for the classification of microarray data. The methodology described in this paper is designed to obtain a list of candidate genes, ranked for importance in discriminating between classes, and the corresponding SVM classification model. The method also provides an honest estimate of the model accuracy on novel cases (predictive accuracy). A typical prediction task for the methodology would be the identification of patients resistant to a therapy or the definition of a 'terminal signature', a set of genes and a decision rule identifying short-term survivors who might benefit from specific therapies.

The RFE procedure for SVM has been evaluated in experimental analyses and is considered a relevant method for gene selection and classification on microarrays. We have developed the entropy-based recursive feature elimination (E-RFE) as a non-parametric procedure for gene ranking, which accelerates the standard recursive feature elimination (RFE) method for SVMs without reducing accuracy.


We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
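To make the idea of eliminating chunks from an entropy measure of the weights distribution concrete, here is a rough sketch. The text does not give the actual E-RFE decision rule, so everything specific here is an assumption for illustration: the histogram with `n_bins` bins, the `entropy_threshold` value, and the rule "if the entropy is low, drop every feature in the lowest bin, otherwise drop one feature".

```python
import math
from typing import List, Sequence


def weight_entropy(scores: Sequence[float], n_bins: int = 10) -> float:
    """Shannon entropy (in bits) of a histogram of scores rescaled to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return 0.0  # all scores identical: a single occupied bin
    counts = [0] * n_bins
    for s in scores:
        b = min(int((s - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[b] += 1
    n = len(scores)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def select_chunk(scores: Sequence[float], n_bins: int = 10,
                 entropy_threshold: float = 1.0) -> List[int]:
    """Pick positions to eliminate in one step (illustrative rule only)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0]  # no information to separate features: drop just one
    if weight_entropy(scores, n_bins) < entropy_threshold:
        # Scores are concentrated in few bins: drop the whole lowest bin.
        cutoff = lo + (hi - lo) / n_bins
        return [i for i, s in enumerate(scores) if s < cutoff]
    # Scores are spread out: fall back to removing the single lowest one.
    return [min(range(len(scores)), key=lambda i: scores[i])]


concentrated = [0.0, 0.01, 0.02, 0.01, 1.0]  # most scores near zero
spread = [0.0, 0.25, 0.5, 0.75, 1.0]         # scores evenly spread
print(select_chunk(concentrated), select_chunk(spread))
```

When most scores pile up near zero, a single step removes many features at once, which is where a speed-up over one-at-a-time elimination would come from. The thresholds would need tuning against real data.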





