is the number of tumor types in the dataset, which depends on the miRNA's relative position, yielding the result displayed in Figure 1.

As the objective is to find and validate a reduced list of miRNAs to be used as a signature, feature selection is to be performed on the dataset. Popular approaches to feature selection range from univariate statistical considerations to iterated runs of the same classifier with a progressively reduced number of features, in order to assess the contribution of each feature to the overall result. As the considered case study is particularly complex, however, relying upon simple statistical analyses or a single classifier might not suffice. Following the idea behind ensemble feature selection [31-33], we use multiple algorithms to obtain a more robust predictive performance. For this purpose, we train a set of classifiers and then extract a sorted list of the most relevant features from each. As, intuitively, a feature considered important by the majority of classifiers in the set is likely to be relevant for our aim, the information from all classifiers is then compiled to find the most common relevant features.

Starting from a thorough comparison of 22 different state-of-the-art classifiers on the considered dataset, presented in [34], in this work a subset of those classifiers is selected considering both (i) high accuracy and (ii) the availability of a way to extract the relative importance of the features from the trained classifier. After preliminary tests to set the algorithms' hyperparameters, 8 classifiers are chosen, all featuring an average accuracy higher than 90% on a 10-fold cross-validation:

• Bagging
• GradientBoosting
• RandomForest
• LogisticRegression
• PassiveAggressive
• Ridge
• SGD
• SVC (Support Vector Machines Classifier with a linear kernel) [42]

All considered classifiers are implemented in the scikit-learn Python toolbox [43].
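The selected set can be instantiated directly from scikit-learn; a minimal sketch follows, assuming default hyperparameters (the tuned settings from the preliminary tests are not reproduced here):

```python
# Sketch: instantiating the eight selected scikit-learn classifiers.
# Hyperparameters shown are illustrative defaults, not the tuned values
# used in the study.
from sklearn.ensemble import (BaggingClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import (LogisticRegression,
                                  PassiveAggressiveClassifier,
                                  RidgeClassifier,
                                  SGDClassifier)
from sklearn.svm import SVC

classifiers = {
    "Bagging": BaggingClassifier(),
    "GradientBoosting": GradientBoostingClassifier(),
    "RandomForest": RandomForestClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "PassiveAggressive": PassiveAggressiveClassifier(),
    "Ridge": RidgeClassifier(),
    "SGD": SGDClassifier(),
    "SVC": SVC(kernel="linear"),  # linear kernel, as stated in the text
}
```

Keeping the classifiers in a named dictionary makes it straightforward to iterate over them uniformly during cross-validation and feature extraction.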
Overall, the selected classifiers fall into two broad typologies: those exploiting ensembles of classification trees [44] (Bagging, GradientBoosting, RandomForest), and those optimizing the coefficients of linear models to separate classes (LogisticRegression, PassiveAggressive, Ridge, SGD, SVC). Depending on the classifier typology, there are two different ways of extracting relative feature importance. For classifiers based on classification trees, the features used in the splits are counted and sorted by frequency, from the most to the least common. For classifiers based on linear models, the values of the coefficients associated with each feature can be used as a proxy for their relative importance, sorting coefficients from the largest to the smallest in absolute value. As the two feature extraction methods return heterogeneous numeric values, only the relative ranking of features provided by each classifier is considered. We arbitrarily decide to extract the top 100 most relevant features, assigning to each feature f a simple score S_f = N_t / N_c, where N_t is the number of times that specific feature appears among the top 100 of a specific classifier instance, while N_c is the total number of classifier instances used; for instance, a feature appearing among the 100 most relevant in 73% of the classifier instances would obtain a score S_f = 0.73. In order to increase the generalizability of our results, each selected classifier is run 10 times, using a 10-fold stratified cross-validation, so that each fold preserves the percentage of samples of each class. Table 2 comp
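The scoring procedure above can be sketched as follows. This is a simplified illustration on synthetic data, not the study's pipeline: it uses only two of the eight classifiers, a single repetition of the cross-validation, and scikit-learn's impurity-based `feature_importances_` as a stand-in for the split-frequency count described in the text; for multi-class linear models, per-class coefficients are aggregated by summed absolute value, which is one possible convention:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import StratifiedKFold

TOP_K = 100  # number of top-ranked features kept per classifier instance


def top_features(clf, k=TOP_K):
    """Return the indices of the k features the fitted classifier ranks highest."""
    if hasattr(clf, "feature_importances_"):
        # Tree ensembles: impurity-based importance (proxy for split frequency)
        importance = clf.feature_importances_
    else:
        # Linear models: absolute coefficient magnitude, summed over classes
        importance = np.abs(clf.coef_).sum(axis=0)
    return set(np.argsort(importance)[::-1][:k])


# Synthetic stand-in for the miRNA expression matrix (samples x features)
X, y = make_classification(n_samples=300, n_features=200, n_informative=30,
                           n_classes=3, random_state=0)

counts = np.zeros(X.shape[1])  # N_t: appearances in a top-100 list, per feature
n_instances = 0                # N_c: total classifier instances trained

for clf in (RandomForestClassifier(random_state=0), RidgeClassifier()):
    # Stratified folds preserve the class proportions of the full dataset
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, _ in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        for f in top_features(clf):
            counts[f] += 1
        n_instances += 1

scores = counts / n_instances  # S_f = N_t / N_c, a value in [0, 1] per feature
```

A feature with S_f close to 1 was ranked in the top 100 by nearly every classifier instance, which is exactly the agreement criterion the ensemble feature selection relies on.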