Author: Bikash K. Bhandari; Paul P. Gardner; Chun Shen Lim
Title: Solubility-Weighted Index: fast and accurate prediction of protein solubility
  • Document date: 2020_2_16
  • ID: 2rpr7aph_12
    Document: bioRxiv preprint (not peer-reviewed), doi: https://doi.org/10.1101/2020.02.15.951012, made available under a CC-BY 4.0 International license. To validate the cross-validation results, we used eSOL (Niwa et al. 2009), a dataset independent of the PSI:Biology data. This dataset consists of the solubility percentages of E. coli proteins determined using an E. coli cell-free system (N = 3,198). Our solubility scoring using the final weights showed a significantly improved correlation with E. coli protein solubility over the initial weights (Smith et al.'s normalised B-factors) [Spearman's rho of 0.50 (P = 9.46 × 10^-206) versus 0.40 (P = 4.57 × 10^-120)]. We repeated the correlation analysis after removing extra amino acid residues, including His-tags, from the eSOL sequences (MRGSHHHHHHTDPALRA and GLCGR at the N- and C-termini, respectively). This artificial dataset was created based on the assumption that His-tags have little effect on solubility. We observed a slight decrease in correlation for this artificial dataset (Spearman's rho = 0.47, P = 3.67 × 10^-176), which may be due to the effect of the His-tag on solubility and/or the limitation(s) of our approach, which may overfit to His-tag fusion proteins.
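
    The tag-removal and rank-correlation steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names (strip_tags, average_ranks, spearman_rho) are hypothetical, the tag sequences are the two quoted in the text, and Spearman's rho is computed as the Pearson correlation of tie-averaged ranks using only the Python standard library.

```python
from statistics import mean

# Extra residues quoted in the text (N- and C-terminal, respectively).
N_TAG = "MRGSHHHHHHTDPALRA"
C_TAG = "GLCGR"


def strip_tags(seq, n_tag=N_TAG, c_tag=C_TAG):
    """Remove the N- and C-terminal extensions if they are present."""
    if seq.startswith(n_tag):
        seq = seq[len(n_tag):]
    if seq.endswith(c_tag):
        seq = seq[:-len(c_tag)]
    return seq


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy usage with made-up values (not eSOL data): predicted scores
# versus measured solubility percentages.
scores = [0.21, 0.35, 0.30, 0.48]
solubility = [12.0, 55.0, 40.0, 90.0]
rho = spearman_rho(scores, solubility)
```

    In practice, scipy.stats.spearmanr would return both rho and the P value; the pure-Python version here only shows what the statistic measures.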
