Results

Author: Tristan de Jong; Victor Guryev; Yury M. Moshkin

Title: Discovery of pharmaceutically-targetable pathways and prediction of survivorship for pneumonia and sepsis patients from the view point of ensemble gene noise

Document date: 2020-04-11

ID: f5w05rc2_18

Complete Snippet

Document: [bioRxiv preprint, not peer-reviewed; doi: https://doi.org/10.1101/2020.04.10.035717] Overall, class imbalance, noise due to inter-individual heterogeneity, and high dimensionality of model features are among the major problems of machine learning [37]. In part, ensemble gene noise leads to a reduction in inter-individual variability (Figure S3). [...] by t-test feature selection. For this, we compared ensemble gene noise between survived and deceased patients in the discovery cohorts. The p-value cut-offs for the model features were selected to maximize the models' training accuracy (see Methods). XGBoost hyperparameters (learning rate, complexity, depth, etc.) were optimized by cross-validation. To avoid overfitting, we used early stopping, with the stopping epoch estimated from the test fold of the discovery cohort (see Methods). Because of the class imbalance, the area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate model performance. The validation cohorts were hidden from feature selection and training. AUCs for the discovery and validation cohorts were 0.871 and 0.707 respectively, suggesting a reasonable accuracy of the model. However, the model scores and the evaluation of model specificity and sensitivity indicate that the model is biased towards predicting the majority class (survived) (Figure 3A, Table 2 and Table S2A). Thus, the balanced accuracies of class prediction (bACC = Specificity/2 + Sensitivity/2) were 0.799 and 0.701 for the discovery and validation cohorts respectively. Nonetheless, the survival probability of patients predicted to have a high risk of mortality was significantly lower than that of patients predicted to have a low risk of mortality in both the discovery and validation cohorts.
In addition, our model predicts the risk of mortality better than the Mars1 endotype inferred by unsupervised learning from log gene expression (Figure 3C) [8]. Potentially, this is due to the lower inter-individual variability of ensemble gene noise compared to log gene expression (Figure S3).
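
The snippet above reports AUC and balanced accuracy (bACC = Specificity/2 + Sensitivity/2) because plain accuracy can look good on imbalanced classes even when the minority class is poorly predicted. The following is a minimal sketch of those two metrics on made-up toy data, not the authors' code or results. The label coding (1 = deceased, 0 = survived), the 0.5 decision threshold, the scores, and all function names are assumptions chosen for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn); label 1 is treated as the positive class (assumed: deceased)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """bACC = Specificity/2 + Sensitivity/2, as defined in the text."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # fraction of positives correctly predicted
    specificity = tn / (tn + fp)  # fraction of negatives correctly predicted
    return specificity / 2 + sensitivity / 2


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 8 survivors (0) and 2 deceased (1), with invented risk scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.60, 0.40, 0.80]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.4f}  bACC={bacc:.4f}  AUC={auc:.4f}")
```

On this toy data, plain accuracy is 0.8 while bACC is 0.6875, because one of the two positive cases is missed at the chosen threshold. AUC (0.9375 here) is threshold-independent, which is why a model can have a reasonable AUC and still be biased towards the majority class once a threshold is applied.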
