Author: Robert C. Cope; Joshua V. Ross
Title: Identification of the relative timing of infectiousness and symptom onset for outbreak control
Document date: 2019_3_8
ID: 8r0vfzeu_18
Document: Once a design has been chosen, to employ this process when an outbreak is observed it would be [...]

To more effectively use the household data in training the random forest, we summarize raw household data as daily histograms of incidence, as in Figure 1c. That is, we count the proportion of households that, on day d, observed an incidence of i, and then use the resultant (design [...]

Conducting a First Few Hundred-style study can be extremely labour intensive. Consequently, we wish to assess the potential for model discrimination when sampling is only performed on a subset of days, rather than every day. If we choose to only sample on D < 14 days, within the first 14 days following the first symptomatic case in each household, we must necessarily also choose the optimal days on which to sample. We choose those days that produce the highest [...]

Rather than evaluating the full set of possible designs, or applying an optimisation algorithm, we propose a heuristic for efficiently finding high-quality designs of a given size. This heuristic is to perform random forest model selection on the largest possible design, extract the random forest feature importance (Figure 1b), and use this random forest feature importance to rank design points. Specifically, days are ranked on their maximum feature importance; the sum of the importance of features from a day was also tested, but had inferior performance. A design of size d uses the highest-ranked d design points. The random forest feature importance metric we use is the mean decrease in Gini impurity (24) of a feature across the trees in the random forest (this metric is easily extracted from the Python scikit-learn random forest algorithm (23)).

bioRxiv preprint (not peer-reviewed). doi: https://doi.org/10.1101/571547
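
The histogram summary and the importance-based ranking of days described in this excerpt can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The flat feature layout (one block of incidence bins per day) and the clipping of large incidences into the last bin are assumptions. The forest settings in `forest_importances` are scikit-learn defaults rather than values from the source.

```python
def daily_incidence_histograms(households, n_days, max_incidence):
    """Summarise raw household data as daily histograms of incidence.

    households[h][d] is the number of new cases in household h on day d.
    Returns a flat list where entry day * n_bins + i is the proportion of
    households that observed incidence i on that day.
    Assumption: incidences above max_incidence are clipped into the last bin.
    """
    n_bins = max_incidence + 1
    counts = [0] * (n_days * n_bins)
    for series in households:
        for day in range(n_days):
            i = min(series[day], max_incidence)
            counts[day * n_bins + i] += 1
    return [c / len(households) for c in counts]


def forest_importances(X, y, n_trees=100, seed=0):
    """Fit a random forest and return its impurity-based feature importances.

    X holds one histogram feature vector per simulated outbreak and y the
    label of the model that generated it. Hyperparameters are assumptions.
    """
    # Imported here so the pure-Python helpers work without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    # With the default Gini criterion this is the mean decrease in Gini impurity.
    return forest.feature_importances_


def rank_days(importances, n_days, n_bins):
    """Rank days by the maximum importance among each day's features."""
    day_scores = [
        max(importances[day * n_bins:(day + 1) * n_bins])
        for day in range(n_days)
    ]
    # Highest score first; ties keep chronological order (sorted is stable).
    return sorted(range(n_days), key=lambda day: day_scores[day], reverse=True)


def design_of_size(ranked_days, d):
    """A design of size d uses the d highest-ranked days."""
    return sorted(ranked_days[:d])
```

In use, one would build one feature vector per simulated outbreak with `daily_incidence_histograms`, pass the stacked vectors and model labels to `forest_importances` for the full 14-day design, and then call `rank_days` and `design_of_size` to obtain a reduced sampling design. Replacing `max` with `sum` in `rank_days` gives the alternative ranking that the text reports as performing worse.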