Author: Tanujit Chakraborty; Indrajit Ghosh
Title: Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis
Document date: 2020_4_14
ID: ba6mdgq3_43
Snippet: For the risk assessment with the CFR dataset for 50 countries, we apply the regression tree (RT) [7], which has a built-in feature selection mechanism, is easy to interpret, and provides better visualization. RT, as a widely used simple machine learning algorithm, can model arbitrary decision boundaries. The methodology outlined in [7] can be summarized into three stages. The first stage involves growing the tree using a recursive partitioning techn.....
Document: For the risk assessment with the CFR dataset for 50 countries, we apply the regression tree (RT) [7], which has a built-in feature selection mechanism, is easy to interpret, and provides better visualization. RT, as a widely used simple machine learning algorithm, can model arbitrary decision boundaries. The methodology outlined in [7] can be summarized into three stages. The first stage grows the tree using a recursive partitioning technique that selects essential variables from a set of possible causal variables, together with their split points, according to a splitting criterion. The standard splitting criterion for RT is the mean squared error (MSE). After a large tree is identified, the second stage uses a pruning procedure that produces a nested sequence of subtrees, starting from the largest tree grown and continuing until only one node remains. Cross-validation is commonly used to estimate the future prediction error of each subtree. The last stage selects the optimal tree, that is, the tree yielding the lowest cross-validated or test set error rate. To avoid instability at this stage, smaller trees with comparable accuracy are chosen instead. This process can be tuned to obtain trees of varying size and complexity. A measure of variable importance can be obtained by observing the drop in the error rate when another variable is used instead of the primary split. In general, the more frequently a variable appears as a primary split, the higher its importance score. A detailed description of the tree-building process is available in [17].
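As an illustration of the first and last stages described above, the following is a minimal sketch rather than the implementation used in [7]. It assumes a single numeric predictor, candidate thresholds at midpoints between consecutive distinct values, and a user-chosen `tolerance` for what counts as "comparable" accuracy. The function names `sse`, `best_split`, and `select_subtree` are hypothetical and introduced only for this example.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (n times the MSE)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split on one feature.

    Minimizing the summed squared error of the two child nodes is
    equivalent to minimizing their size-weighted MSE.
    Returns (None, sse(y)) when no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_error = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        error = sse(left) + sse(right)
        if error < best_error:
            # Assumption: place the threshold midway between neighbors.
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_error = error
    return best_threshold, best_error


def select_subtree(candidates, tolerance=0.0):
    """Pick the smallest subtree whose CV error is within `tolerance` of the best.

    `candidates` is a list of (n_leaves, cv_error) pairs, one per pruned
    subtree. `tolerance` is an assumed knob, not a value from the paper.
    """
    lowest = min(error for _, error in candidates)
    eligible = [c for c in candidates if c[1] <= lowest + tolerance]
    return min(eligible)  # tuples compare by n_leaves first
```

With toy values that are not taken from the CFR dataset, `best_split([1, 2, 3, 10, 11, 12], [1.0, 1.2, 0.8, 5.0, 5.2, 4.8])` places the threshold at 6.5, between the two clusters of responses. A full tree would apply the same search recursively to each child node and across all candidate variables, and the pruning stage would supply the `(n_leaves, cv_error)` pairs that `select_subtree` consumes.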