Author: Li, Chun; Zhao, Jialing; Wang, Changzhong; Yao, Yuhua
Title: Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation
Document date: 2018_2_23
ID: u1imic5l_3
Document: Motivated by the work mentioned above, we propose a generalized PseAAC grounded on a three-letter model and a 2-D graphical representation of a protein sequence. The main work of this paper is organized as follows. In Section 2, we briefly introduce the five datasets used in this study. In Section 3, on the basis of two important physicochemical properties of amino acids, we cluster the 20 standard amino acids into three groups. By assigning a representative symbol to each group, we transform a protein sequence into a three-letter sequence. A 2-D graph without loops or multiple edges is then obtained, together with its geometric line adjacency matrix. From these, a sequence-derived feature vector of dimension (25+ ) is constructed to characterize a protein sequence. Our scheme is similar to, but clearly distinct from, that of PseAAC. In Section 4, we apply the presented feature vector to compare the -globin proteins of 17 species and, separately, 72 coronavirus spike proteins. We also develop an SVM (support vector machine) model that uses the generalized PseAAC to distinguish DNA-binding from non-binding proteins on three datasets. Experimental results show that the presented method outperforms existing methods, including DNAbinder [1], DNA-Prot [2], iDNA-Prot [3], and enDNA-Prot [4]. Finally, conclusions are given in Section 5.
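The reduction step described above, mapping each of the 20 standard amino acids to one of three group symbols, can be sketched as follows. This is only an illustration: the text here does not state which two physicochemical properties are used or which residues fall in each group, so the partition in `GROUPS`, the symbols `a`, `b`, `c`, and the helper names are all hypothetical placeholders rather than the authors' actual clustering.

```python
# Hypothetical three-group partition of the 20 standard amino acids.
# ASSUMPTION: this grouping is a placeholder, NOT the clustering from the paper.
GROUPS = {
    "a": "RKEDQN",   # placeholder group 1
    "b": "GASTPHY",  # placeholder group 2
    "c": "CLVIMFW",  # placeholder group 3
}

# Invert the partition into a residue -> group-symbol lookup table.
AA_TO_SYMBOL = {aa: sym for sym, members in GROUPS.items() for aa in members}


def to_three_letter(sequence):
    """Rewrite a protein sequence over the three group symbols."""
    try:
        return "".join(AA_TO_SYMBOL[aa] for aa in sequence.upper())
    except KeyError as err:
        raise ValueError(f"non-standard residue: {err.args[0]}") from None


def symbol_frequencies(reduced):
    """Relative frequency of each group symbol in a reduced sequence."""
    n = len(reduced)
    if n == 0:
        return {sym: 0.0 for sym in GROUPS}
    return {sym: reduced.count(sym) / n for sym in GROUPS}


if __name__ == "__main__":
    reduced = to_three_letter("MKV")
    print(reduced)
    print(symbol_frequencies(reduced))
```

The symbol frequencies are included only as the simplest kind of composition feature one could read off the reduced sequence; the graph construction and the geometric line adjacency matrix mentioned in the text are not reproduced here because their definitions are not given in this passage.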