Author: Ari Klein; Arjun Magge; Karen O'Connor; Haitao Cai; Davy Weissenbacher; Graciela Gonzalez-Hernandez
Title: A Chronological and Geographical Analysis of Personal Reports of COVID-19 on Twitter
Document date: 2020-04-22
ID: 8f1arjw1_20
Snippet: ... blocks, 768 units for each hidden layer, and 12 self-attention heads. We used a maximum sequence length of 100 tokens to encode. After feeding the sequence of token IDs to BERT, the encoded representation is passed to a dropout layer (dropout rate of 0.1) and, then, a dense layer with 2 units and a softm.....
Document: ... blocks, 768 units for each hidden layer, and 12 self-attention heads. We used a maximum sequence length of 100 tokens to encode. After feeding the sequence of token IDs to BERT, the encoded representation is passed to a dropout layer (dropout rate of 0.1) and then a dense layer with 2 units and a softmax activation, which predicts the class for each tweet. For training, we used Adam optimization with rate decay and warm-up. We used a batch size of 64, training runs of 3 epochs, and a maximum learning rate of 1e-4 during the first 10% of training steps, with the learning rate decaying to 0 over the remaining 90% of training steps. Prior to automatic classification, we pre-processed the text by normalizing user names (i.e., strings beginning with "@") and ...
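The excerpt above specifies a complete fine-tuning setup: a BERT-base encoder (12 blocks, 768 hidden units, 12 heads), 100-token inputs, dropout of 0.1, a 2-unit softmax output, and Adam with warm-up and learning-rate decay (batch size 64, 3 epochs, maximum learning rate 1e-4). The sketch below shows one way to realize this, assuming the Hugging Face transformers / PyTorch stack; the excerpt does not name the framework, and every identifier here (the checkpoint name, helper names, the "@USER" placeholder) is an illustrative assumption rather than the authors' code.

```python
import re
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast, get_linear_schedule_with_warmup

MAX_LEN = 100          # maximum sequence length (tokens), per the excerpt
BATCH_SIZE = 64        # assumed batch size of the DataLoader passed to train()
EPOCHS = 3
MAX_LR = 1e-4
WARMUP_FRACTION = 0.1  # warm-up over the first 10% of training steps

def normalize(text: str) -> str:
    """Pre-processing per the excerpt: normalize user names (strings beginning with "@").
    The "@USER" replacement token is an assumption."""
    return re.sub(r"@\w+", "@USER", text)

class TweetClassifier(nn.Module):
    """BERT-base encoder -> dropout(0.1) -> dense layer with 2 units."""
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")  # 12 blocks, 768 hidden, 12 heads
        self.dropout = nn.Dropout(p=0.1)
        self.dense = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.dense(self.dropout(out.pooler_output))  # logits for the 2 classes

    def predict(self, input_ids, attention_mask):
        # Softmax over the 2-unit dense layer predicts the class for each tweet.
        return torch.softmax(self.forward(input_ids, attention_mask), dim=-1).argmax(dim=-1)

def train(model, train_loader):
    num_training_steps = EPOCHS * len(train_loader)
    optimizer = torch.optim.Adam(model.parameters(), lr=MAX_LR)
    # Linear warm-up to MAX_LR over the first 10% of steps, then linear decay to 0.
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(WARMUP_FRACTION * num_training_steps),
        num_training_steps=num_training_steps,
    )
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(EPOCHS):
        for input_ids, attention_mask, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(input_ids, attention_mask), labels)
            loss.backward()
            optimizer.step()
            scheduler.step()

# Example: encode one pre-processed tweet to a fixed 100-token sequence of IDs.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    normalize("@someone tested positive for COVID-19 today"),
    truncation=True, padding="max_length", max_length=MAX_LEN, return_tensors="pt",
)
```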
Search related documents:
Co-phrase search for related documents:
- batch size and learning rate: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
- batch size and maximum learning rate: 1
- dense layer and dropout layer: 1, 2, 3, 4
- learning rate and maximum learning rate: 1
- learning rate and rate decay: 1
- learning rate and rate drop: 1
Co-phrase search for related documents, hyperlinks ordered by date