Author: Li, Chun; Zhao, Jialing; Wang, Changzhong; Yao, Yuhua
Title: Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation
Document date: 2018_2_23
ID: u1imic5l_59
Document: In order to analyze the influence of the number of negative samples in a benchmark dataset on the predictive performance of the current method, we construct a series of subsets of DNAeSet and use each in turn as the training set, while DNAiSet is always used as the testing set. Each subset contains all 146 DNA-BPs and a part of the NBPs in DNAeSet. In detail, the NBP set of the first subset consists of 250 NBPs randomly selected from DNAeSet, and the NBP set of each subsequent subset is obtained by adding 50 NBPs to the previous one, until it contains 1700 NBPs. For each of the resulting 30 subsets (k = 1, 2, ..., 30), we develop the SVM model by 5-fold cross-validation (5CV) with 3 runs. The results, averaged over the three runs, are given in Fig. (5). From Fig. (5) we can see that the curves of ACC and acc visibly diverge as n grows, that is, as more NBPs are included. As n increases, ACC increases rapidly, while acc tends to be steady. On the surface the value of ACC keeps rising, but this rise does not reflect a real gain in performance; it is an artifact of the growing number of negative samples.
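The subset construction and the ACC effect described above can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the authors' method: the NBP identifiers are placeholder integers, the helper names are hypothetical, and the sensitivity and specificity values are invented and held fixed purely for illustration (a retrained SVM would not keep them constant). The excerpt does not define acc, so a balanced accuracy (mean of sensitivity and specificity) is used only as a stand-in for a measure that does not depend on class sizes.

```python
import random

N_POS = 146                         # DNA-BPs kept in every subset (from the text)
FIRST, STEP, LAST = 250, 50, 1700   # NBP counts stated in the text


def nested_nbp_subsets(nbp_ids, seed=0):
    """Return nested NBP sets: 250 random NBPs, then 50 more per step, up to 1700."""
    rng = random.Random(seed)
    order = list(nbp_ids)
    rng.shuffle(order)  # every prefix of a shuffled list is a random sample
    return [order[:n] for n in range(FIRST, LAST + 1, STEP)]


def overall_accuracy(sens, spec, n_pos, n_neg):
    """Fraction of all samples classified correctly (depends on class sizes)."""
    return (sens * n_pos + spec * n_neg) / (n_pos + n_neg)


def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity (independent of class sizes)."""
    return (sens + spec) / 2


# Placeholder pool of 1700 NBP identifiers (assumption, not the real DNAeSet).
subsets = nested_nbp_subsets(range(1700))
print(len(subsets), len(subsets[0]), len(subsets[-1]))

# Hypothetical classifier quality, held fixed while only the NBP count changes.
SENS, SPEC = 0.6, 0.9
for negatives in (subsets[0], subsets[-1]):
    n = len(negatives)
    print(n, round(overall_accuracy(SENS, SPEC, N_POS, n), 3),
          balanced_accuracy(SENS, SPEC))
```

Under these assumptions the overall accuracy rises as NBPs are added even though the classifier itself is unchanged, while the class-size-independent measure stays flat, which mirrors the qualitative pattern the text reports for Fig. (5).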