Selected article for: "decision tree and logistic regression"

Authors: Nicholas Fountain-Jones; Gustavo Machado; Scott Carver; Craig Packer; Mariana Mendoza; Meggan E Craft
Title: How to make more from exposure data? An integrated machine learning pipeline to predict pathogen exposure
  • Document date: 2019-03-06
  • ID: jc5c87b9_1
    Snippet: An individual's risk of infection by a pathogen is dependent upon a wide variety of host and […] 2.2 Pre-processing: It is important to account for missing data either by imputation or removal prior to model construction (Fig. 2). Some machine learning algorithms, such as gradient boosting, bin missing data as a separate node in the decision tree (Friedman, 2002; Fig. S1); however, other algorithms, such as SVM, are less flexible. I…
    Document: An individual's risk of infection by a pathogen is dependent upon a wide variety of host and […]

    2.2 Pre-processing
    It is important to account for missing data either by imputation or removal prior to model construction (Fig. 2). Some machine learning algorithms, such as gradient boosting, bin missing data as a separate node in the decision tree (Friedman, 2002; Fig. S1); however, other algorithms, such as SVM, are less flexible. In order to compare predictive performance across models, missing data can either be imputed or removed from the dataset. Although providing specific advice on whether or not to include missing data is outside the scope of this paper (see Nakagawa & Freckleton, 2008), we provide an option if imputation is suitable for the study problem. We integrated the 'missForest' (Stekhoven & Bühlmann, 2012) machine-learning imputation routine (using the RF algorithm) into our pipeline, as it has been found to have low imputation error. […]

    [Figure caption fragment: Yellow boxes indicate which data split is being tested in that particular 'fold'.]

    We incorporated an internal repeated 10-fold cross-validation (CV) process to estimate model performance. CV can help prevent overfitting and artificial inflation of accuracy due to use of the same data to train and test models. […] sensitivity and specificity for classification models). Another advantage of this package is that it can perform classification or regression using 237 different types of models, from generalized linear models (GLMs such as logistic regression) to complex machine learning and Bayesian models, using a standardized approach (see Kuhn, 2008 for a complete list of models).

    (bioRxiv preprint, not peer-reviewed: https://doi.org/10.1101/569012)
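    Example (R; illustrative sketch, not the authors' pipeline code): the excerpt notes that gradient boosting can treat missing data as a separate node in each decision tree, whereas algorithms such as SVM are less flexible. With a hypothetical data.frame 'exposure_data' (predictors containing NAs, plus a complete 0/1 outcome 'exposed'), the gbm implementation of stochastic gradient boosting (Friedman, 2002) can be fitted directly on the incomplete data:

        # gbm routes NA predictor values to a dedicated "missing" node at each
        # split, so it can be trained without prior imputation.
        library(gbm)

        set.seed(123)
        gbm_direct <- gbm(exposed ~ .,                 # 'exposed' is a hypothetical 0/1 outcome
                          data = exposure_data,        # predictors may contain NAs
                          distribution = "bernoulli",  # binary classification
                          n.trees = 500,
                          interaction.depth = 3)

    An SVM fit (e.g., with e1071::svm) would drop or reject those incomplete rows by default, which is why the excerpt recommends imputing or removing missing data before comparing models.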
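    Example (R; illustrative sketch): the pipeline's imputation option uses the 'missForest' routine (Stekhoven & Bühlmann, 2012), which fits random forests to fill in missing values. Continuing with the hypothetical 'exposure_data':

        library(missForest)

        # Iterative random-forest imputation of every variable with NAs;
        # categorical predictors must already be encoded as factors.
        set.seed(123)                       # imputation is stochastic
        imp <- missForest(exposure_data)
        exposure_imputed <- imp$ximp        # completed data set
        imp$OOBerror                        # out-of-bag imputation error
                                            # (NRMSE for numeric, PFC for factor columns)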
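    Example (R; illustrative sketch): the repeated 10-fold cross-validation and the framework offering 237 model types cited as Kuhn, 2008 correspond to the 'caret' package. Assumptions beyond the excerpt: the repeat count (10), the ROC metric, and the hypothetical 'exposure_imputed' data with its 0/1 outcome 'exposed' from the previous sketch.

        library(caret)

        # twoClassSummary reports ROC, sensitivity and specificity per resample,
        # matching the classification metrics mentioned in the excerpt; it
        # requires a factor outcome with valid level names.
        exposure_imputed$exposed <- factor(exposure_imputed$exposed,
                                           labels = c("unexposed", "exposed"))

        ctrl <- trainControl(method          = "repeatedcv",
                             number          = 10,   # 10 folds
                             repeats         = 10,   # assumed repeat count
                             classProbs      = TRUE,
                             summaryFunction = twoClassSummary)

        set.seed(123)
        # Logistic regression (a GLM) through caret's standardized interface
        glm_fit <- train(exposed ~ ., data = exposure_imputed,
                         method = "glm", family = binomial,
                         metric = "ROC", trControl = ctrl)

        set.seed(123)
        # Stochastic gradient boosting through the same interface
        gbm_fit <- train(exposed ~ ., data = exposure_imputed,
                         method = "gbm", metric = "ROC",
                         trControl = ctrl, verbose = FALSE)

        summary(resamples(list(logistic = glm_fit, gbm = gbm_fit)))

    Reusing one trainControl specification for every algorithm is what makes the cross-validated ROC, sensitivity and specificity directly comparable across models.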
