Author: Tristan de Jong; Victor Guryev; Yury M. Moshkin
Title: Discovery of pharmaceutically-targetable pathways and prediction of survivorship for pneumonia and sepsis patients from the view point of ensemble gene noise
Document date: 2020_4_11
ID: f5w05rc2_18
Source: bioRxiv preprint (not peer-reviewed), doi: https://doi.org/10.1101/2020.04.10.035717
Document: Overall, class imbalance, noise due to inter-individual heterogeneity, and high dimensionality of model features are among the major problems of machine learning [37]. In part, ensemble gene noise leads to a reduction in inter-individual variability (Figure S3) […] by t-test feature selection. For this, we compared gene ensemble noise between surviving and deceased patients in the discovery cohorts. The p-value cut-offs for the model features were selected to maximize the models' training accuracy (see Methods). XGBoost hyperparameters (learning rate, complexity, depth, etc.) were optimized by cross-validation. To avoid overfitting, we used early stopping, with the stopping epoch (boosting round) estimated from the test fold of the discovery cohort (see Methods). Because of the class imbalance, model performance was evaluated by the AUC (area under the receiver operating characteristic (ROC) curve). The validation cohorts were held out from feature selection and training. AUCs for the discovery and validation cohorts were 0.871 and 0.707, respectively, suggesting reasonable model accuracy. However, the model scores and the model's specificity/sensitivity indicate that the model is biased towards predicting the majority class (survived) (Figure 3A, Table 2 and Table S2A). Accordingly, the balanced accuracies of class prediction, bACC = (Sensitivity + Specificity)/2, were 0.799 and 0.701 for the discovery and validation cohorts, respectively. Nonetheless, in both the discovery and validation cohorts, the survival probability of patients predicted to be at high risk of mortality was significantly lower than that of patients predicted to be at low risk.
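The t-test feature-selection step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a Welch (unequal-variance) two-sample t-test per feature between survived (label 0) and deceased (label 1) patients, because the excerpt does not say which t-test variant was used, and all function names are hypothetical. To stay dependency-free, it computes the two-sided p-value from the regularized incomplete beta function rather than calling a statistics library.

```python
import math
import statistics
from typing import List, Sequence


def _clamp(v: float, fpmin: float = 1e-300) -> float:
    """Keep a Lentz-method term away from zero to avoid division by zero."""
    return v if abs(v) > fpmin else fpmin


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz method)."""
    max_iter, eps = 200, 3e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _clamp(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    # Use the continued fraction directly or via the symmetry I_x(a,b) = 1 - I_{1-x}(b,a).
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """Two-sided p-value of Student's t: P(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def welch_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch two-sample t-test p-value (Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    s1, s2 = statistics.variance(a) / n1, statistics.variance(b) / n2
    se2 = s1 + s2
    if se2 == 0.0:  # both groups constant: identical means -> p = 1, otherwise p = 0
        return 1.0 if m1 == m2 else 0.0
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    return t_two_sided_p(t, df)


def select_features(X: Sequence[Sequence[float]], y: Sequence[int], p_cutoff: float) -> List[int]:
    """Indices of features whose survived-vs-deceased Welch p-value is below p_cutoff."""
    selected = []
    for j in range(len(X[0])):
        survived = [row[j] for row, label in zip(X, y) if label == 0]
        deceased = [row[j] for row, label in zip(X, y) if label == 1]
        if welch_p(survived, deceased) < p_cutoff:
            selected.append(j)
    return selected
```

Per the text, the cut-off itself would then be chosen by refitting the model for each candidate `p_cutoff` and keeping the one with the highest training accuracy; that outer loop is omitted here.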
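Why AUC and bACC are reported under class imbalance can be illustrated with a small sketch (hypothetical helper names; deceased is assumed to be the positive class). The AUC is computed as the Mann-Whitney probability that a randomly chosen deceased patient receives a higher score than a randomly chosen survivor, which equals the area under the ROC curve. bACC averages sensitivity and specificity, so a classifier that always predicts the majority class scores 0.5 no matter how imbalanced the cohort is.

```python
from typing import Sequence


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """bACC = (sensitivity + specificity) / 2, with class 1 (deceased) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy imbalanced cohort: 8 survivors (0) and 2 deaths (1); the predictor always says "survived".
y = [0] * 8 + [1] * 2
always_survived = [0] * 10
accuracy = sum(t == p for t, p in zip(y, always_survived)) / len(y)
print(accuracy, balanced_accuracy(y, always_survived))  # prints "0.8 0.5"
print(auc_mann_whitney([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # prints "0.75"
```

In the toy cohort, the always-survived predictor reaches 80% accuracy but only 0.5 bACC, the kind of gap that a bias towards the majority class produces.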
In addition, our model predicts mortality risk better than the Mars1 endotype, which was inferred by unsupervised learning on log gene expression (Figure 3C) [8]. This may be due to the lower inter-individual variability of gene ensemble noise compared with log gene expression (Figure S3).
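The excerpt does not state how survival probabilities for the predicted risk groups were estimated or which test established significance. Assuming the standard Kaplan-Meier estimator, S(t) = product over event times t_i <= t of (1 - d_i/n_i), where d_i is the number of deaths and n_i the number still at risk at t_i, the high- versus low-risk comparison could be sketched as below. The function name and the toy follow-up data are hypothetical, and a significance test (for example, a log-rank test) is not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time; events[i] is 1 for death, 0 for censoring."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (deaths and censored) that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both deaths and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (days) and death indicators for two predicted risk groups.
high_risk = kaplan_meier([3, 5, 5, 9, 12], [1, 1, 0, 1, 0])
low_risk = kaplan_meier([7, 14, 21, 28, 28], [1, 0, 0, 0, 0])
print("high risk:", high_risk)
print("low risk:", low_risk)
```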