Author: Chengxin Zhang; Wei Zheng; Xiaoqiang Huang; Eric W. Bell; Xiaogen Zhou; Yang Zhang
Title: Protein structure and sequence re-analysis of 2019-nCoV genome does not indicate snakes as its intermediate host or the unique similarity between its spike protein insertions and HIV-1
  • Document date: 2020_2_8
  • ID: mtv80pjo_3_0
    Document: In a recent manuscript entitled "Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag" 3, Pradhan et al. presented a discovery of four novel insertions unique to 2019-nCoV spike protein (Figure 1). They further concluded that these four insertions are part of the receptor binding site of 2019-nCoV, and that these insertions shared "uncanny similarity" to Human Immunodeficiency Virus 1 (HIV-1) proteins but not to other coronaviruses. These claims have resulted in considerable controversy in the community. To investigate whether the conclusions by Pradhan et al. are scientifically precise, we re-analyzed the structural location and sequence homology of the four spike protein insertions discussed therein.
Figure 1. Sequence alignment of 2019-nCoV spike protein (NCBI accession: QHD43416) and SARS-CoV spike protein (UniProt ID: P59594). The four "novel" insertions "GTNGTKR" (IS1), "YYHKNNKS" (IS2), "GDSSSG" (IS3) and "QTNSPRRA" (IS4) by Pradhan et al. are highlighted in dashed rectangles. We noted that these fragments are not bona fide "insertions"; in fact, at least three of the four fragments are also shared with Bat Coronavirus RaTG13 spike glycoprotein (NCBI accession: QHR63300.1), as shown in Tables 1-3. Nevertheless, we still refer to these fragments as "insertions" in this manuscript for consistency with the original report. The receptor binding domain of spike is marked within the solid box, which corresponds to residue positions 323 to 545 in the above alignment.
assembled with the human ACE2 structure (PDB ID: 6ACJ) 7 by DEMO 8 to form a spike-ACE2 complex. As shown in Figure 2, all four insertions are located outside the Receptor Binding Domain (RBD) of spike, contrary to the original conclusion of Pradhan et al., who stated that the insertions are located on the interface with ACE2.
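The location check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names `find_spans` and `overlaps` are hypothetical, and the toy sequence is a placeholder rather than real spike sequence. Note also that the 323 to 545 range quoted above refers to alignment coordinates, so a real check would first have to map ungapped sequence positions onto the alignment.

```python
from typing import Dict, List, Tuple

# Fragment labels and sequences as quoted in the Figure 1 caption.
INSERTIONS: Dict[str, str] = {
    "IS1": "GTNGTKR",
    "IS2": "YYHKNNKS",
    "IS3": "GDSSSG",
    "IS4": "QTNSPRRA",
}

Span = Tuple[int, int]  # 1-based, inclusive (start, end)


def find_spans(sequence: str, fragments: Dict[str, str]) -> Dict[str, List[Span]]:
    """Return every exact occurrence of each fragment as 1-based inclusive spans."""
    spans: Dict[str, List[Span]] = {}
    for label, fragment in fragments.items():
        hits: List[Span] = []
        start = sequence.find(fragment)
        while start != -1:
            hits.append((start + 1, start + len(fragment)))
            start = sequence.find(fragment, start + 1)
        spans[label] = hits
    return spans


def overlaps(span: Span, region: Span) -> bool:
    """True if two 1-based inclusive intervals share at least one position."""
    return span[0] <= region[1] and region[0] <= span[1]


# Placeholder sequence for illustration only (NOT real spike protein sequence).
toy_sequence = "AAAA" + "GTNGTKR" + "CCCC" + "QTNSPRRA"
toy_region: Span = (12, 15)  # hypothetical region of interest

for label, hits in find_spans(toy_sequence, INSERTIONS).items():
    for span in hits:
        print(label, span, "overlaps region:", overlaps(span, toy_region))
```

With a real sequence and a region given in the same coordinate system, the same two helpers would report whether any fragment falls inside the region.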
To investigate viral homologs of the four insertions, we further performed a BLAST sequence search of these four insertions against the non-redundant (NR) sequence database, restricting the search results to viruses (taxid:10239) but leaving other search parameters at default values. We chose BLAST rather than the more sensitive PSI-BLAST algorithm 9 to emulate the original report by Pradhan et al., which mainly aimed to identify near-identical sequences. The top 5 sequence homologs (including the query itself) identified for each insertion are listed in Tables 1-4. In contrast to the previous claim that the four insertions are unique to 2019-nCoV and HIV-1, all four insertion fragments can be found in other viruses. In fact, an HIV-1 protein is among the top BLAST hits for only one of the four insertion fragments, while three of the four insertion fragments are found in bat coronavirus RaTG13. Moreover, partly because these insertions are very short (6 to 8 amino acids), the E-values of the BLAST hits are all greater than 4, except for a bat coronavirus hit for IS2. The E-value is the parameter BLAST uses to assess the statistical significance of an alignment, and it usually needs to be below 0.01 for a hit to be considered significant 9. These high E-values suggest that the majority of these similarities are likely to be coincidental.
Tables 1 to 4: if there are multiple redundant hits for the same gene from different strains of the same species, only one hit is shown. Residues non-identical to the query are highlighted in bold. Sequence identity is calculated as the nu
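
The E-value argument above can be made concrete with the standard relation between a BLAST bit score S' and its expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the total database length. The sketch below is a simplified illustration under stated assumptions: it ignores the effective-length corrections BLAST applies, the database size `ASSUMED_DB_LEN` is a hypothetical value rather than the actual size of NR, and the function names are illustrative.

```python
import math


def expect_value(query_len: int, db_len: int, bit_score: float) -> float:
    """E = m * n * 2**(-S'), without BLAST's effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bit_score(query_len: int, db_len: int, e_threshold: float) -> float:
    """Bit score at which E equals e_threshold under the same formula."""
    return math.log2(query_len * db_len / e_threshold)


# Hypothetical database size in residues (an assumption, not the size of NR).
ASSUMED_DB_LEN = 2 ** 30

for m in (6, 8):
    needed = min_bit_score(m, ASSUMED_DB_LEN, 0.01)
    print(f"query length {m}: about {needed:.1f} bits needed for E <= 0.01")
```

The required bit score is set by the size of the search space, while the score an alignment can accumulate is bounded by its length. A 6 to 8 residue fragment therefore has little room to reach a significant score against a large database, which is consistent with the high E-values reported above.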
