Author: Richard J. Medford; Sameh N. Saleh; Andrew Sumarsono; Trish M. Perl; Christoph U. Lehmann
Title: An "Infodemic": Leveraging High-Volume Twitter Data to Understand Public Sentiment for the COVID-19 Outbreak
Document date: 2020_4_7
ID: a6p6ka8w_9
Document: We performed all data processing and analysis using Python version 3.6.1 (Python Software Foundation) and RStudio version 1.2.1335 (R Foundation for Statistical Computing). We compared the number of COVID-19-related tweets per hour with the number of newly confirmed cases over each 24-hour period and computed descriptive statistics for the collected metadata. To analyze tweets, we extracted the plain text from the original message and stripped out web addresses, Twitter hyperlinks, and punctuation. For all analyses except the sentiment analysis, we removed stop words (common words of little analytical value, e.g., "for", "the", "is"), converted text to lowercase, and lemmatized words (reducing different forms of a word to its root form, e.g., "viruses" to "virus" or "went" to "go"). We transformed each tweet into a vector of individual words and two-word phrases (i.e., unigrams and bigrams, respectively). We removed terms present in fewer than five tweets, along with two words found in more than ten percent of tweets ("case" and "people"), decreasing the dictionary from 626,614 terms to 38,823 terms.
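The preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the stop-word set, the lemma table, the URL pattern, and all function names (`clean_tweet`, `tokenize`, `ngrams`, `build_vocabulary`) are hypothetical stand-ins chosen for illustration, and a real pipeline would likely rely on a full stop-word list and a proper lemmatizer. The paper's removal of two very frequent words is approximated here by a general document-frequency cutoff.

```python
import re
import string
from collections import Counter

# Tiny illustrative resources (assumptions, not the authors' actual lists).
STOP_WORDS = {"for", "the", "is", "a", "an", "of", "to", "and", "in"}
LEMMAS = {"viruses": "virus", "went": "go", "cases": "case"}

# Assumed pattern for web addresses and Twitter media links.
URL_RE = re.compile(r"https?://\S+|www\.\S+|pic\.twitter\.com/\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Strip web addresses first, then punctuation, leaving plain text."""
    text = URL_RE.sub(" ", text)
    return text.translate(PUNCT_TABLE)


def tokenize(text):
    """Lowercase, drop stop words, and map each word to a root form."""
    words = clean_tweet(text).lower().split()
    return [LEMMAS.get(w, w) for w in words if w not in STOP_WORDS]


def ngrams(tokens):
    """Return the unigrams followed by the bigrams (adjacent word pairs)."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def build_vocabulary(tweets, min_tweets=5, max_fraction=0.10):
    """Keep terms found in at least min_tweets tweets and in no more
    than max_fraction of all tweets (document-frequency filtering)."""
    doc_freq = Counter()
    for tweet in tweets:
        # set() so each term counts once per tweet, not once per occurrence.
        doc_freq.update(set(ngrams(tokenize(tweet))))
    limit = max_fraction * len(tweets)
    return {t for t, n in doc_freq.items() if min_tweets <= n <= limit}
```

For instance, `tokenize("The viruses went to https://t.co/xyz, for sure!")` yields `["virus", "go", "sure"]`. Counting each term once per tweet is what makes the thresholds match the "fewer than five tweets" and "more than ten percent of tweets" criteria rather than raw word counts.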