Selected article for: "logistic regression and machine learning performance"

Author: Nicholas Fountain-Jones; Gustavo Machado; Scott Carver; Craig Packer; Mariana Mendoza; Meggan E Craft
Title: How to make more from exposure data? An integrated machine learning pipeline to predict pathogen exposure
  • Document date: 2019_3_6
  • ID: jc5c87b9_3_0
    Snippet: In our pipeline, we compare supervised machine learning algorithms (RF, SVM, and GBM) as well as logistic regression. These models are among the most popular and best-tested machine learning methods, but all operate in different ways, and this can in turn impact predictive [...] For both CDV and parvovirus, machine learning models had higher predictive performance (higher AUC) compared to logistic regression models (Table 1) .....
    Document: In our pipeline, we compare supervised machine learning algorithms (RF, SVM, and GBM) as well as logistic regression. These models are among the most popular and best-tested machine learning methods, but all operate in different ways, and this can in turn impact predictive [...] For both CDV and parvovirus, machine learning models had higher predictive performance (higher AUC) compared to logistic regression models (Table 1). The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. https://doi.org/10.1101/569012 (bioRxiv preprint). Using our calibration approach further improved the overall predictive performance for each pathogen by increasing the sensitivity of the models (i.e., they were more able to correctly identify positives); however, there was a trade-off with reduced specificity. For example, our calibrated CDV model had a 7% increase in AUC with a 23% increase in sensitivity but an 18% decrease in specificity compared to the uncalibrated model [...] Age and rainfall were the most important features predicting CDV exposure, but both features were relatively less important in the calibrated models (Fig. 3). Even though the features associated with exposure risk in each model were broadly similar for both pathogens, the relationships between each feature and exposure risk varied. Partial dependency plots showed that risk of CDV increased relatively linearly across age classes in the uncalibrated model (Fig. 3b), whereas in the calibrated model exposure risk was much more constant across age classes, with an increase in risk in individuals between 1-2 years old (Fig. 3f).
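
The performance measures reported above (AUC, sensitivity, specificity) can be illustrated with a small sketch. The excerpt does not describe the calibration procedure itself, so this example does not reproduce it; it only shows, on hypothetical toy data, how sensitivity and specificity are computed and how lowering a decision threshold trades one for the other. The function names `confusion_rates` and `auc`, the labels, and the scores are all assumptions made for illustration.

```python
def confusion_rates(y_true, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = exposed (seropositive), 0 = not exposed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.45, 0.3, 0.7, 0.4, 0.2, 0.1]

default_rates = confusion_rates(y_true, scores, 0.5)   # stricter threshold
lowered_rates = confusion_rates(y_true, scores, 0.35)  # more permissive threshold
```

In this toy case the lower threshold raises sensitivity and lowers specificity, while the AUC, which depends only on how the scores are ranked, is unaffected by the threshold. The AUC gain reported in the text must therefore come from the calibration approach changing the model's predictions, which this sketch does not attempt to model.
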
Rainfall also showed different relationships in each model, with reduced exposure risk when the average monthly rainfall was > 40 mm in the age-calibrated model (Fig. 3c). There was a much shallower decline in CDV risk associated with rainfall in the calibrated model (Fig. 3e) compared to the uncalibrated model (Fig. 3c). [Fig. 3: Plots showing the differences in model predictions and the features that contribute to ...] Similar to CDV, age sampled, followed by rainfall, were the most important features associated with parvovirus exposure risk in the uncalibrated models (Fig. S4a). Parvovirus exposure risk increased slightly across age classes in the uncalibrated models; however, in the calibrated models exposure risk increased rapidly at early ages (0-1) but was then relatively constant across ages >3 (Fig. S4b). The signature of rainfall on parvovirus risk in the uncalibrated model was remarkably like that of CDV, with a large drop in risk when the monthly rainfall was > 40 mm (Fig. S4c). However, rainfall was much less important in the calibrated model (Fig. S4d). Strikingly, epidemic year was important in the calibrated model, with exposure risk much higher for animals likely exposed in the 1992 epidemic (Fig. S4f). We further interrogated the calibrated models to visualize how interactions between features could be important for exposure risk of both pathogens. We focussed on interactions with epidemic year, as we were interested to
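
The partial dependency plots discussed above rest on a simple calculation: fix one feature at a grid value for every individual, average the model's predictions, and repeat across the grid. The following is a minimal sketch of that calculation, not the authors' pipeline. The stand-in model `toy_predict`, its coefficients, the feature names, and the four rows are hypothetical; the step at 40 mm only mimics the pattern described in the text.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy each row, overriding only the feature of interest.
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a fitted model's predicted exposure risk."""
    risk = 0.2 + 0.1 * row["age"]
    if row["rainfall"] > 40:  # assumed step, for illustration only
        risk -= 0.15
    return min(max(risk, 0.0), 1.0)


# Hypothetical individuals: age in years, average monthly rainfall in mm.
rows = [
    {"age": 1, "rainfall": 20},
    {"age": 3, "rainfall": 60},
    {"age": 5, "rainfall": 35},
    {"age": 2, "rainfall": 50},
]

rain_curve = partial_dependence(toy_predict, rows, "rainfall", [20, 40, 60])
age_curve = partial_dependence(toy_predict, rows, "age", [1, 2, 3])
```

Plotting `rain_curve` against its grid would give the kind of curve the text describes for rainfall. Comparing curves from two different fitted models (for example, calibrated and uncalibrated) is then a matter of calling the function once per model.
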
