Author: Chen, Qufei; Sokolova, Marina
Title: Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts
Cord-id: a9ab1c4s
Document date: 2021-08-15
Document: We analyze the performance of unsupervised embedding algorithms in sentiment analysis of knowledge-rich data sets. We apply the state-of-the-art embedding algorithms Word2Vec and Doc2Vec as the learning techniques. The algorithms build word and document embeddings in an unsupervised manner. To assess the algorithms’ performance, we define sentiment metrics and use the semantic lexicon SentiWordNet (SWN) to establish benchmark measures. Our empirical results are obtained on the i2b2 Obesity data set of clinical discharge summaries and on the Reuters Science data set. We use Welch’s test to analyze the resulting sentiment evaluations. On the Obesity data, Welch’s test found a significant difference between the SWN evaluations of the most positive and the most negative texts. On the same data, the Word2Vec results support the SWN results, whereas the Doc2Vec results only partially correspond to the Word2Vec and SWN results. On the Reuters data, Welch’s test did not find a significant difference between the SWN evaluations of the most positive and the most negative texts. On the same data, the Word2Vec and Doc2Vec results only partially correspond to the SWN results. In unsupervised sentiment analysis of medical and scientific texts, Word2Vec sentiment analysis was more consistent with the SentiWordNet assessment than Doc2Vec sentiment analysis. The outcome of Welch’s test on the SentiWordNet results was a strong indicator of whether the Word2Vec results would subsequently correspond to the SentiWordNet results.
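Welch’s test compares the means of two independent samples (here, sentiment scores of the most positive and the most negative texts) without assuming that the two samples share a variance. The abstract does not say how the authors computed it, so the following is only a minimal sketch of the standard statistic, t = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), together with the Welch-Satterthwaite degrees of freedom. The function name `welch_t_test` and the sample scores are hypothetical and purely illustrative; they are not data from the paper.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Welch's unequal-variance t-test for two independent samples.

    Returns (t, df): the t statistic and the Welch-Satterthwaite
    degrees of freedom. Equal variances are NOT assumed.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each mean; statistics.variance divides by n - 1.
    se1_sq = variance(a) / n1
    se2_sq = variance(b) / n2
    se_sum_sq = se1_sq + se2_sq
    if se_sum_sq == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se_sum_sq)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se_sum_sq ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical SentiWordNet-style scores, made up for illustration only.
    most_positive = [0.12, 0.08, 0.15, 0.10, 0.09]
    most_negative = [-0.05, 0.01, -0.02, 0.00, -0.04]
    t, df = welch_t_test(most_positive, most_negative)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

Turning t and df into a p-value requires the Student t distribution, which the Python standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and also returns the p-value.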