Selected article for: "secondary structure and structure information"

Author: Villegas-Morcillo, Amelia; Makrodimitris, Stavros; van Ham, Roeland C.H.J.; Gomez, Angel M.; Sanchez, Victoria; Reinders, Marcel J.T.
Title: Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function
  • Cord-id: rs7hnksh
  • Document date: 2020-04-08
    Document:
    Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require large amounts of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available.
    Results: We applied an existing deep sequence model, pre-trained in an unsupervised setting, to the supervised task of protein function prediction. We found that this complex feature representation is effective for the task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure, and backbone angles. It also partly negates the need for deep prediction models, as a two-layer perceptron was enough to achieve state-of-the-art performance in the third Critical Assessment of Functional Annotation benchmark. We further show that combining this sequence representation with protein 3D structure information does not lead to a performance improvement, hinting that three-dimensional structure is also potentially learned during the unsupervised pre-training.
    Availability: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function.
    Contact: [email protected]
    Supplementary information: Supplementary data are available online.
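
    The Results paragraph describes the core architecture: frozen embeddings from an unsupervised pre-trained sequence model, fed to a two-layer perceptron for multi-label GO-term prediction. Below is a minimal PyTorch sketch of that idea, not the authors' implementation (which is at the GitHub link above); the embedding size, hidden width, and number of GO terms are illustrative placeholders.

    import torch
    import torch.nn as nn

    EMB_DIM = 1024    # assumed size of the pre-trained sequence embeddings
    HIDDEN = 512      # assumed hidden width of the perceptron
    N_GO_TERMS = 256  # assumed number of GO terms to predict

    class TwoLayerMLP(nn.Module):
        """Two-layer perceptron mapping fixed embeddings to GO-term logits."""
        def __init__(self, emb_dim=EMB_DIM, hidden=HIDDEN, n_labels=N_GO_TERMS):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(emb_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_labels),
            )

        def forward(self, x):
            # Raw logits; apply a sigmoid at inference time, since GO-term
            # prediction is a multi-label problem (one binary task per term).
            return self.net(x)

    # Training-step sketch: the embeddings are pre-computed and frozen, so
    # only the small perceptron is optimized.
    model = TwoLayerMLP()
    criterion = nn.BCEWithLogitsLoss()  # independent binary loss per GO term
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    embeddings = torch.randn(32, EMB_DIM)                   # dummy batch
    labels = torch.randint(0, 2, (32, N_GO_TERMS)).float()  # dummy labels

    optimizer.zero_grad()
    loss = criterion(model(embeddings), labels)
    loss.backward()
    optimizer.step()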

    Search related documents:
    Co-phrase search for related documents (counts of matching documents):
    • logistic regression baseline and lr logistic regression: 1 match
    • logistic regression baseline and lstm short term memory: 1 match
    • logistic regression baseline and machine learning: 3 matches
    • logistic regression classifier and long lstm short term memory: 1 match
    • logistic regression classifier and lr logistic regression: 8 matches
    • logistic regression classifier and lstm short term memory: 1 match
    • logistic regression classifier and machine learning: 20 matches
    • long lstm short term memory network and lr logistic regression: 1 match
    • long lstm short term memory network and lstm layer: 2 matches
    • lower dimensional and machine learning: 1 match
    • lr linear model and machine learning: 1 match
    • lr logistic regression and lstm network: 1 match
    • lr logistic regression and lstm short term memory: 6 matches
    • lr logistic regression and machine learning: 46 matches
    • lstm layer and machine learning: 4 matches
    • lstm network and machine learning: 24 matches
    • lstm network and machine learning advance: 1 match
    • lstm short term memory and machine learning: 46 matches
    • lstm short term memory and machine learning advance: 1 match