Author: Chen, Chen; Lu, Shan; Du, Pengcheng; Wang, Haiyin; Yu, Weiwen; Song, Huawen; Xu, Jianguo
Title: Silent geographical spread of the H7N9 virus by online knowledge analysis of the live bird trade with a distributed focused crawler
Document date: 2013_12_18
ID: t2zlhamq_7
Document: The IC of two cities was calculated as the number of web page records containing both cities. Using CC data, we estimated the error rate of the IC data under the assumption that the CC data were complete. The distributions of IC with and without CC support were plotted, revealing that connections supported by CC tended to have more IC records (Supplementary Figures S1A and S1B). This result suggests that our IC data were reasonable for use in the analysis of trading connections in provinces and cities. Compared with the connections between provinces, the connections between cities show sharper peaks and fewer connections because of insufficient information. Accurate analysis of trading links was therefore difficult for these cities, especially in Western China. We defined noise as information that was collected but was not relevant to constructing bird trade connections. We used the following rules to calculate the noise rate: (i) we obtained a rough dataset by using general, relevant keywords about the infections; (ii) we used more specific, relevant keywords to search and generate a more accurate dataset; and (iii) we calculated the noise rate as the difference between the first and second datasets. To exclude noise from web sites, a cutoff value was chosen and the number of connections was recorded (Supplementary Figure S1C). The false positive ratio (FPR) and false negative ratio (FNR) of IC were estimated based on CC because we lacked complete and accurate datasets for information analysis (Supplementary Figure S1D). The cutoff curves for provinces and cities indicate cutoffs of 10 and 20 as the thresholds for cities and provinces, respectively. The correlation between each pair of provinces was calculated and compared using the number of their supporting web sites (Supplementary Figure S2).
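The IC count and the CC-based error estimate described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `count_ic` and `error_rates`, the input layout (one list of place names per web page record), the use of `>= cutoff` as the threshold rule, and the conventional definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP) over all unordered pairs of places are assumptions made here for illustration; the excerpt does not state the exact formulas.

```python
from collections import Counter
from itertools import combinations


def count_ic(pages):
    """Count, for each unordered pair of places, the page records naming both.

    `pages` is an iterable of lists of place names, one list per record
    (a hypothetical input layout chosen for this sketch).
    """
    ic = Counter()
    for places in pages:
        # Deduplicate within a record so each record counts once per pair.
        for a, b in combinations(sorted(set(places)), 2):
            ic[(a, b)] += 1
    return ic


def error_rates(ic, cc_pairs, places, cutoff):
    """Estimate FPR and FNR of thresholded IC, treating CC as complete.

    Assumption: a pair is predicted as connected when its IC count is
    at least `cutoff`; negatives are all other pairs among `places`.
    """
    predicted = {pair for pair, n in ic.items() if n >= cutoff}
    truth = {tuple(sorted(pair)) for pair in cc_pairs}
    universe = set(combinations(sorted(places), 2))

    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe) - tp - fp - fn

    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy records with placeholder place names (not real data).
pages = [["A", "B"], ["A", "B"], ["A", "C"], ["B", "A", "C"]]
ic = count_ic(pages)
fpr, fnr = error_rates(ic, cc_pairs=[("B", "A"), ("B", "C")],
                       places={"A", "B", "C", "D"}, cutoff=2)
```

Sweeping `cutoff` over a range of values and recording `fpr`, `fnr`, and the number of retained pairs would give curves of the kind the text attributes to Supplementary Figures S1C and S1D, under the assumptions stated above.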
Infection risk was estimated from the connections between cities and provinces using the following two parameters: (i) the number of provinces or cities potentially associated with the outbreak region and (ii) the number of queries that contain the keywords and the names of two provinces or cities.
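As a hedged illustration of the two parameters, the sketch below derives both from a table of pairwise record counts. The function name `risk_parameters`, the dictionary layout keyed by place pairs, and the optional `cutoff` filter are hypothetical. The excerpt does not say how the two parameters are combined into a risk estimate, so the sketch only reports them.

```python
def risk_parameters(pair_counts, outbreak, cutoff=1):
    """Return the two parameters for one outbreak region.

    pair_counts: dict mapping (place_a, place_b) to the number of
        records or queries naming both places (assumed layout).
    outbreak: name of the outbreak region.
    cutoff: minimum count for a pair to be treated as connected
        (an assumption; pairs below it are ignored).

    Returns (number of linked places, dict of linked place -> count).
    """
    linked = {}
    for (a, b), n in pair_counts.items():
        if n < cutoff:
            continue
        if a == outbreak:
            linked[b] = linked.get(b, 0) + n
        elif b == outbreak:
            linked[a] = linked.get(a, 0) + n
    return len(linked), linked


# Placeholder names and made-up counts, for illustration only.
counts = {("X", "Y"): 30, ("W", "X"): 25, ("W", "Y"): 40, ("X", "Z"): 3}
n_linked, per_place = risk_parameters(counts, outbreak="X", cutoff=20)
```

Here `n_linked` corresponds to parameter (i) and the values in `per_place` correspond to parameter (ii) for each place linked to the outbreak region.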