Author: Gupta, Raj Kumar; Vishwanath, Ajay; Yang, Yinping
Title: COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes
  • Cord-id: w61lsdml
  • Document date: 2020_7_14
  • ID: w61lsdml
    Document: This paper describes a large global dataset on people's social media responses to the COVID-19 pandemic over the Twitter platform. From 28 January 2020 to 1 September 2021, we collected over 198 million Twitter posts from more than 25 million unique users using four keywords: "corona", "wuhan", "nCov" and "covid". Leveraging topic modeling techniques and pre-trained machine learning-based emotion analytic algorithms, we labeled each tweet with seventeen semantic attributes, including a) ten binary attributes indicating the tweet's relevance or irrelevance to the top ten detected topics, b) five quantitative emotion attributes indicating the degree of intensity of the valence or sentiment (from 0: very negative to 1: very positive), and the degree of intensity of fear, anger, happiness and sadness emotions (from 0: not at all to 1: extremely intense), and c) two qualitative attributes indicating the sentiment category (very negative, negative, neutral or mixed, positive, very positive) and the dominant emotion category (fear, anger, happiness, sadness, no specific emotion) the tweet is mainly expressing. We report the descriptive statistics around these new attributes, their temporal distributions, and the overall geographic representation of the dataset. The paper concludes with an outline of the dataset's possible usage in communication, psychology, public health, economics, and epidemiology.
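
    The seventeen attributes described above can be pictured as one record per tweet. The following is a minimal sketch of such a record, not the dataset's actual file layout: the class name, field names, and validation rules are hypothetical and chosen only for illustration. Only the attribute counts, the 0 to 1 ranges, and the category labels are taken from the abstract.

```python
from dataclasses import dataclass
from typing import Tuple

# Category labels as listed in the abstract.
SENTIMENT_CATEGORIES = (
    "very negative", "negative", "neutral or mixed", "positive", "very positive",
)
EMOTION_CATEGORIES = (
    "fear", "anger", "happiness", "sadness", "no specific emotion",
)

# Hypothetical field names for the five quantitative attributes.
INTENSITY_FIELDS = ("valence", "fear", "anger", "happiness", "sadness")


@dataclass(frozen=True)
class TweetAttributes:
    """Hypothetical container for the seventeen per-tweet attributes."""

    # a) ten binary flags: relevance to each of the top ten detected topics
    topic_flags: Tuple[bool, ...]
    # b) five intensities, each in [0, 1]
    valence: float      # 0: very negative, 1: very positive
    fear: float         # 0: not at all, 1: extremely intense
    anger: float
    happiness: float
    sadness: float
    # c) two qualitative labels
    sentiment_category: str
    dominant_emotion: str

    def __post_init__(self) -> None:
        if len(self.topic_flags) != 10:
            raise ValueError("expected exactly ten topic flags")
        for name in INTENSITY_FIELDS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        if self.sentiment_category not in SENTIMENT_CATEGORIES:
            raise ValueError(f"unknown sentiment category: {self.sentiment_category}")
        if self.dominant_emotion not in EMOTION_CATEGORIES:
            raise ValueError(f"unknown emotion category: {self.dominant_emotion}")

    def attribute_count(self) -> int:
        """Ten topic flags + five intensities + two categories."""
        return len(self.topic_flags) + len(INTENSITY_FIELDS) + 2


# Illustrative values only; these are not drawn from the dataset.
example = TweetAttributes(
    topic_flags=(True,) + (False,) * 9,
    valence=0.2, fear=0.7, anger=0.3, happiness=0.05, sadness=0.4,
    sentiment_category="negative",
    dominant_emotion="fear",
)
```

    Validating ranges and labels at construction time is a design choice of this sketch; how the published files encode each attribute should be checked against the dataset's own documentation.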
