Results

Selected article for: "perform model and training data"

Author: Robert C. Cope; Joshua V. Ross

Title: Identification of the relative timing of infectiousness and symptom onset for outbreak control

Document date: 2019_3_8

ID: 8r0vfzeu_18

Complete Snippet

Document: Once a design has been chosen, to employ this process when an outbreak is observed it would be [...]

To more effectively use the household data in training the random forest, we summarize raw household data as daily histograms of incidence, as in Figure 1c. That is, we count the proportion of households that, on day d, observed an incidence of i, and then use the resultant (design [...]

Conducting a First Few Hundred-style study can be extremely labour intensive. Consequently, we wish to assess the potential for model discrimination when sampling is only performed on a subset of days, rather than every day. If we choose to only sample on D < 14 days, within the first 14 days following the first symptomatic case in each household, we must necessarily also choose the optimal days on which to sample. We choose those days that produce the highest [...]

[bioRxiv preprint, not peer-reviewed. doi: https://doi.org/10.1101/571547]

Rather than evaluating the full set of possible designs, or applying an optimisation algorithm, we propose a heuristic for efficiently finding high-quality designs of a given size. This heuristic is to perform random forest model selection on the largest possible design, extract the random forest feature importance (Figure 1b), and use this random forest feature importance to rank design points. Specifically, days are ranked on their maximum feature importance; the sum of the importance of features from a day was also tested, but had inferior performance. A design of size d uses the highest-ranked d design points. The random forest feature importance metric we use is the mean decrease in Gini impurity (24) of a feature across the trees in the random forest (this metric is easily extracted from the Python scikit-learn random forest algorithm (23)).
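The histogram summary and the day-ranking heuristic described in the document above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`daily_histogram`, `rank_days`, `design`), the zero-based day index, the input layout (one list of daily case counts per household), and the clipping of counts at `max_incidence` are all hypothetical choices made for the example. The importance values are passed in as a plain list; with scikit-learn they would typically be read from the `feature_importances_` attribute of a fitted `RandomForestClassifier`, which reports impurity-based importance.

```python
from collections import Counter


def daily_histogram(households, n_days, max_incidence):
    """Summarise raw household data as per-day histograms of incidence.

    households[h][d] is the number of cases household h observed on day d
    (hypothetical layout). Entry d * n_bins + i of the result is the
    proportion of households with incidence i on day d.
    """
    n_bins = max_incidence + 1
    features = [0.0] * (n_days * n_bins)
    for d in range(n_days):
        # Assumption: counts above max_incidence are clipped into the top bin.
        counts = Counter(min(h[d], max_incidence) for h in households)
        for i, c in counts.items():
            features[d * n_bins + i] = c / len(households)
    return features


def rank_days(importances, n_days, n_bins):
    """Rank days by the maximum importance among each day's features."""
    day_scores = [
        max(importances[d * n_bins:(d + 1) * n_bins]) for d in range(n_days)
    ]
    # Highest-scoring day first; ties keep the earlier day first.
    return sorted(range(n_days), key=lambda d: day_scores[d], reverse=True)


def design(importances, n_days, n_bins, size):
    """Return the `size` highest-ranked days, in chronological order."""
    return sorted(rank_days(importances, n_days, n_bins)[:size])
```

Ranking on the per-day maximum follows the text, which reports that summing the importances within a day performed worse. Swapping `max` for `sum` in `rank_days` would give that alternative for comparison.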

Search related documents:

Co phrase search for related documents

daily histogram and feature importance: 1
daily histogram and forest feature: 1
daily histogram and forest feature importance: 1
daily histogram and model discrimination: 1
daily histogram and optimal day: 1
daily histogram and random forest: 1
daily histogram and random forest feature importance: 1
