Author: Bikash K. Bhandari; Paul P. Gardner; Chun Shen Lim
Title: Solubility-Weighted Index: fast and accurate prediction of protein solubility
Document date: 2020_2_16
ID: 2rpr7aph_12
Document: bioRxiv preprint, doi: https://doi.org/10.1101/2020.02.15.951012. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. To validate the cross-validation results, we used eSOL (Niwa et al. 2009), a dataset independent of the PSI:Biology data. This dataset consists of the solubility percentages of E. coli proteins determined using an E. coli cell-free system (N = 3,198). Our solubility scoring using the final weights showed a significantly improved correlation with E. coli protein solubility over the initial weights (Smith et al.'s normalised B-factors) [Spearman's rho of 0.50 (P = 9.46 × 10^-206) versus 0.40 (P = 4.57 × 10^-120)]. We repeated the correlation analysis after removing extra amino acid residues, including His-tags, from the eSOL sequences (MRGSHHHHHHTDPALRA and GLCGR at the N- and C-termini, respectively). This artificial dataset was created on the assumption that His-tags have little effect on solubility. We observed a slight decrease in correlation for this artificial dataset (Spearman's rho = 0.47, P = 3.67 × 10^-176), which may be due to an effect of the His-tag on solubility and/or a limitation of our approach, which may overfit to His-tag fusion proteins.
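The two steps described above (removing the terminal tag residues, then rank-correlating scores with measured solubility) can be illustrated with a minimal sketch. It is not the authors' code: the function names `strip_tags`, `average_ranks` and `spearman_rho` are hypothetical, the solubility scoring itself is not shown, and P-values are not computed. Only the two tag sequences are taken from the text.

```python
from statistics import mean

# Extra residues reported in the text for the eSOL sequences.
N_TAG = "MRGSHHHHHHTDPALRA"  # N-terminus (includes the His-tag)
C_TAG = "GLCGR"              # C-terminus


def strip_tags(seq, n_tag=N_TAG, c_tag=C_TAG):
    """Remove the N- and C-terminal extra residues if they are present."""
    if seq.startswith(n_tag):
        seq = seq[len(n_tag):]
    if c_tag and seq.endswith(c_tag):
        seq = seq[:-len(c_tag)]
    return seq


def average_ranks(values):
    """Rank values from 1 to n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

In use, one would apply `strip_tags` to each eSOL sequence, score the original and the stripped sequences with the same weights, and compare `spearman_rho(scores, solubility_percentages)` between the two sets. The function is undefined (division by zero) when either input is constant.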