
Author: Chan, Joseph M.; Rabadan, Raul
Title: Quantifying Pathogen Surveillance Using Temporal Genomic Data
  • Document date: 2013-01-29
  • ID: u2t1x89m_25
    Document: Comparison to clustering methods. Another possible surveillance measurement characterizes the cluster structure of isolates. In an ideal situation, a well-sampled population of sequences separated by genetic distance would be represented by points densely and homogeneously spread across a continuum. Therefore, clustering techniques such as hierarchical, k-means, or expectation-maximization clustering can be used to ascertain how poorly sampled a pathogen is on the basis of the number of clusters in a data set. Bar coding is an alternative strategy, based on the field of persistent homology, that identifies topologically invariant clusters in point cloud data; in particular, it calculates the b_0 Betti number, the number of connected components in a set of simplicial complexes constructed from sequences at different filtration Hamming distances (see Materials and Methods) (40). A lower b_0 would indicate better sampling.
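
    The b_0 count described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the sequences are already aligned to equal length, and that two sequences are joined whenever their Hamming distance is at most the filtration threshold (the inclusive comparison is an assumption). The names hamming, betti0, and b0_barcode are hypothetical. Counting connected components only requires the edges of the complex, so a union-find over sequence pairs is enough.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def betti0(seqs, threshold):
    """Connected components when sequences within `threshold` are linked.

    Assumption: a pair is linked when its Hamming distance is <= threshold.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = len(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merge the two components
                components -= 1
    return components


def b0_barcode(seqs, thresholds):
    """Map each filtration threshold to its b_0 value."""
    return {t: betti0(seqs, t) for t in thresholds}


if __name__ == "__main__":
    # Toy, made-up sequences for illustration only.
    toy = ["AAAA", "AAAT", "AATT", "GGGG", "GGGC"]
    print(b0_barcode(toy, [0, 1, 2, 3, 4]))
```

    In this toy input the two groups of sequences stay separate until the threshold reaches the distance between them, which is the kind of persistent gap that a high b_0 at moderate thresholds would reflect.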
