Author: Tristan de Jong; Victor Guryev; Yury M. Moshkin
Title: Discovery of pharmaceutically-targetable pathways and prediction of survivorship for pneumonia and sepsis patients from the view point of ensemble gene noise
Document date: 2020_4_11
ID: f5w05rc2_18
Document: [bioRxiv preprint (not peer-reviewed), doi: https://doi.org/10.1101/2020.04.10.035717]
Overall, class imbalance, noise due to inter-individual heterogeneity, and the high dimensionality of model features are among the major problems of machine learning [37]. In part, ensemble gene noise leads to a reduction in inter-individual variability (Figure S3). [...] by t-test feature selection. For this, we compared ensemble gene noise between survived and deceased patients in the discovery cohorts. The p-value cut-offs for the model features were selected to maximize the models' training accuracy (see Methods). XGBoost hyperparameters (learning rate, complexity, depth, etc.) were optimized by cross-validation. To avoid overfitting, we used early stopping, with the stopping epoch estimated from the test fold of the discovery cohort (see Methods). Because of the class imbalance, the area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate model performance. The validation cohorts were hidden from feature selection and training. AUCs for the discovery and validation cohorts were 0.871 and 0.707 respectively, suggesting a reasonable accuracy of the model. However, the model scores and the model's specificity/sensitivity indicate that the model is biased towards predicting the majority class (survived) (Figure 3A, Table 2 and Table S2A). Accordingly, the balanced accuracies of class prediction (bACC = Specificity/2 + Sensitivity/2) were 0.799 and 0.701 for the discovery and validation cohorts respectively. Nonetheless, the survival probability of patients predicted to have a high risk of mortality was significantly lower than that of patients predicted to have a low risk of mortality in both the discovery and validation cohorts.
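The two metrics named above can be illustrated with a minimal Python sketch. It is not the authors' code: the toy labels, the scores, the 0.5 decision threshold, and the choice of "deceased" as the positive class are all assumptions made for illustration. It computes bACC exactly as defined in the text and AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def balanced_accuracy(y_true, y_pred):
    """bACC = Specificity/2 + Sensitivity/2, with label 1 as the positive class (assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of positives correctly identified
    specificity = tn / (tn + fp)  # fraction of negatives correctly identified
    return specificity / 2 + sensitivity / 2


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy cohort: 8 survivors (label 0), 2 deceased (label 1).
y_true = [0] * 8 + [1] * 2
# Hypothetical predicted mortality risks, in the same order as y_true.
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60, 0.40, 0.80]
# Hypothetical decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)

print(accuracy)  # 0.8
print(bacc)      # 0.6875
print(auc)       # 0.9375
```

In this toy example plain accuracy (0.8) looks better than balanced accuracy (0.6875) because most patients belong to the majority class, which is the kind of bias the text describes. AUC does not depend on the decision threshold, so it is less affected by the class ratio.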
In addition, our model predicts the risk of mortality better than the Mars1 endotype, which was inferred by unsupervised learning from log gene expression (Figure 3C) [8]. Potentially, this is due to the lower inter-individual variability of ensemble gene noise compared with log gene expression (Figure S3).