Author: Hozumi, Yuta; Wang, Rui; Yin, Changchuan; Wei, Guo-Wei
Title: UMAP-assisted $K$-means clustering of large-scale SARS-CoV-2 mutation datasets
Cord-id: csfn5jy6
Document date: 2020_12_30
ID: csfn5jy6
Document: Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for controlling, combating, and preventing COVID-19. Because both the number of SARS-CoV-2 genome sequences and the number of unique mutations are growing rapidly, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emerging large-data challenge. We introduce a dimension-reduced $k$-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique because of its stable, reliable, and efficient performance; its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets; and its superior clustering visualization. UMAP-assisted $k$-means clustering enables us to shed light on increasingly large datasets of SARS-CoV-2 genome isolates.
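To make the clustering idea in the abstract more concrete, here is a minimal sketch in pure Python. It is not the authors' implementation. It assumes each genome isolate is represented as a set of mutation positions, that a sample's feature vector is its row of pairwise Jaccard distances, and that $k$-means is seeded with a deterministic farthest-first rule; all names (`jaccard_features`, `farthest_first_init`, `kmeans`) and the toy data are hypothetical. The dimension-reduction step is left out to keep the block dependency-free; in practice a reducer such as PCA, t-SNE, or UMAP would be applied to `features` before calling `kmeans`.

```python
from typing import List, Sequence, Set


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def jaccard_features(samples: Sequence[Set[int]]) -> List[List[float]]:
    """Row i holds the Jaccard distances from sample i to every sample."""
    return [[jaccard_distance(s, t) for t in samples] for s in samples]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def farthest_first_init(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Assumed seeding rule: start at point 0, then repeatedly pick the
    point whose nearest already-chosen seed is farthest away."""
    chosen = [0]
    while len(chosen) < k:
        nxt = max(
            range(len(points)),
            key=lambda i: min(sq_dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd iterations; returns one cluster label per point."""
    centroids = farthest_first_init(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (illustrative only): each set lists hypothetical mutation positions.
samples = [
    {1, 2, 3, 4},
    {1, 2, 3, 4, 5},
    {1, 2, 4},
    {10, 11},
    {10, 11, 12},
    {10, 11, 13, 14},
]

features = jaccard_features(samples)
# A dimension-reduction step would normally transform `features` here.
labels = kmeans(features, k=2)
print(labels)
```

On this toy input the first three samples share most of their mutations and the last three share a different group, so the two groups end up in separate clusters. The pairwise feature matrix grows quadratically with the number of samples, which is one reason a dimension-reduction step becomes attractive for large datasets.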