Author: Miano, J.; Hilton, C.; Gangrade, V.; Pomeroy, M.; Siven, J.; Flynn, M.; Tilashalski, F.

Title: Using Event-Based Web-Scraping Methods and Bidirectional Transformers to Characterize COVID-19 Outbreaks in Food Production and Retail Settings

Cord-id: vxan9i8e

Document date: 2021-01-01

Document: Current surveillance methods may not capture the full extent of COVID-19 spread in high-risk settings like food establishments. Thus, we propose a new method for surveillance that identifies COVID-19 cases among food establishment workers from news reports via web-scraping and natural language processing (NLP). First, we used web-scraping to identify a broader set of articles (n = 67,078) related to COVID-19 based on keyword mentions. In this dataset, we used an open-source NLP platform (ClarityNLP) to extract location, industry, case, and death counts automatically. These articles were vetted and validated by CDC subject matter experts (SMEs) to identify those containing COVID-19 outbreaks in food establishments. CDC and Georgia Tech Research Institute SMEs provided a human-labeled test dataset containing 388 articles to validate our algorithms. Then, to improve quality, we fine-tuned a pretrained RoBERTa instance, a bidirectional transformer language model, to classify articles containing ≥ 1 positive COVID-19 cases in food establishments. The application of RoBERTa decreased the number of articles from 67,078 to 1,112 and classified articles (those containing ≥ 1 positive COVID-19 cases in food establishments) with 88% accuracy in the human-labeled test dataset. Therefore, by automating the pipeline of web-scraping and COVID-19 case prediction using RoBERTa, we enable an efficient human-in-the-loop process by which COVID-19 data could be manually collected from articles flagged by our model, thus reducing the human labor requirements. Furthermore, our approach could be used to predict and monitor locations of COVID-19 development by geography and could also be extended to other industries and news article datasets of interest. © 2021, Springer Nature Switzerland AG.
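
The two-stage flow described in the abstract (keyword filtering, then a classifier that flags articles for human review, then an accuracy check against human labels) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the keyword lists, the function names, and the rule-based `toy_classifier` are all hypothetical stand-ins, and the rule-based classifier takes the place of the fine-tuned RoBERTa model, which is not reproduced here.

```python
import re
from typing import Callable, Sequence

# Hypothetical keyword list; the source does not state the actual search terms.
COVID_TERMS = ("covid-19", "coronavirus", "sars-cov-2")

# Hypothetical cues used only by the stand-in classifier below.
FOOD_TERMS = ("restaurant", "grocery", "meatpacking", "food processing")
CASE_PATTERN = re.compile(r"\b(tested positive|positive cases?|outbreak)\b")


def mentions_covid(text: str) -> bool:
    """Stage 1: keep an article if it mentions any COVID-19 keyword."""
    lowered = text.lower()
    return any(term in lowered for term in COVID_TERMS)


def toy_classifier(text: str) -> bool:
    """Stand-in for the fine-tuned model (assumption, not RoBERTa).

    Returns True when the text names a food setting and a case cue.
    """
    lowered = text.lower()
    has_food_setting = any(term in lowered for term in FOOD_TERMS)
    return has_food_setting and bool(CASE_PATTERN.search(lowered))


def flag_for_review(
    articles: Sequence[str], classifier: Callable[[str], bool]
) -> list[str]:
    """Stage 2: classify keyword-matched articles; return those flagged
    for manual data collection by a human reviewer."""
    return [a for a in articles if mentions_covid(a) and classifier(a)]


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of predictions that agree with the human labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled example is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Passing the classifier in as a callable keeps the filtering and evaluation steps independent of the model, so a real fine-tuned transformer could replace `toy_classifier` without changing the rest of the sketch.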