Author: Enrico Lavezzo; Michele Berselli; Ilaria Frasson; Rosalba Perrone; Giorgio Palù; Alessandra R. Brazzale; Sara N. Richter; Stefano Toppo
Title: G-quadruplex forming sequences in the genome of all known human viruses: a comprehensive guide
Document date: 2018_6_11
ID: c3lmmll6_69
Document: To determine whether the presence of G4 patterns in a virus is a conserved feature or only a consequence of its nucleotide composition, simulated viral genomes were generated and compared with real data. Two different strategies were adopted to generate simulated data: i) Single nucleotide assembling (SN). A computational approach was adopted where, in analogy to Huppert and Balasubramanian (Huppert and Balasubramanian 2005), the viral genome was modelled as a multinomial stream based on the assumption that each DNA base is independent. These authors give an explicit solution for the prevalence of G4s in the human genome as a function of p(G), the probability of any base being G. In our approach, we also accounted for the probability of cytosines (p(C)) and additionally assumed that adenine (A) and thymine (T) bases were equally likely to occur. As all four probabilities must sum to one, the statistical reference model is a multinomial distribution with probability vector (p(G), p(C), p(A), p(T)), where p(A) = p(T) = (1 - p(G) - p(C))/2. We hence took as many independent draws from this multinomial distribution as the number of nucleotides in the reference viral genome (Supplemental_Material.pdf, Table S1). The probabilities p(G) and p(C) vary for each virus and reflect the prevalence of G and C bases present in that virus, while the remaining proportion is split equally between p(A) and p(T). For each virus, 10,000 independent sequences were produced in silico with this method; the 'sample' R command with ...
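The single nucleotide assembling (SN) procedure described above can be sketched in a few lines. The source reports using the 'sample' R command; the version below is a Python stand-in built on `random.choices`, not the authors' code. The function names `base_probabilities` and `simulate_sn`, the toy genome, and the assumption that the input contains only A, C, G and T are illustrative choices that do not come from the source.

```python
import random


def base_probabilities(genome: str) -> dict:
    """Estimate the multinomial probability vector for one genome.

    p(G) and p(C) are the observed frequencies; the remaining mass is
    split equally between A and T, as the text describes.
    Assumption: the genome string contains only A, C, G and T.
    """
    genome = genome.upper()
    n = len(genome)
    if n == 0:
        raise ValueError("genome must not be empty")
    p_g = genome.count("G") / n
    p_c = genome.count("C") / n
    p_at = (1.0 - p_g - p_c) / 2.0
    return {"G": p_g, "C": p_c, "A": p_at, "T": p_at}


def simulate_sn(genome: str, n_sequences: int = 10, seed=None) -> list:
    """Draw simulated genomes of the same length as the reference.

    Each base is an independent draw from the multinomial distribution
    returned by base_probabilities (hypothetical helper defined above).
    """
    rng = random.Random(seed)
    probs = base_probabilities(genome)
    bases = list(probs)
    weights = [probs[b] for b in bases]
    length = len(genome)
    return [
        "".join(rng.choices(bases, weights=weights, k=length))
        for _ in range(n_sequences)
    ]


if __name__ == "__main__":
    toy_genome = "GGGCCATTAG"  # made-up sequence, for illustration only
    print(base_probabilities(toy_genome))
    for seq in simulate_sn(toy_genome, n_sequences=3, seed=1):
        print(seq)
```

Under these assumptions, setting `n_sequences=10000` would mirror the number of simulated sequences per virus reported in the text. Each simulated sequence could then be scanned for G4 patterns with the same procedure applied to the real genome.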