Author: Kirillova, Svetlana; Kumar, Suresh; Carugo, Oliviero
Title: Protein Domain Boundary Predictions: A Structural Biology Perspective
  • Document date: 2009_1_21
  • ID: qrnhp1ek_27
    Document: It is interesting to compare the results of this extremely simple prediction strategy with those obtained in the CASP7 experiment, where several prediction methods were applied to about 100 proteins. Table 3 shows the mcc values computed from the predictions deposited by the participants in the CASP7 experiment. The same classification into tp, fp, fn, and tn described in the Methods section was used. That is, if protein P contains more than a single domain and prediction method M predicted it to contain more than a single domain, the prediction was considered a true positive (tp). Conversely, if method M predicted it to contain only one domain, the prediction was considered a false negative (fn), and so on. The data of Table 3 clearly show that most of the prediction methods are less reliable than predictions based on the very simple assumption that a small protein has a high probability of containing a single domain and that a large protein is likely to contain two or more domains. In fact, only four methods (baker, foldpro, maopus and robetta) predict multidomain proteins better than the simple predictor (Matthews correlation coefficient larger than 0.628). What does this mean? Are these bioinformatics tools useless in structural biology? The answer is no. First, some of them seem to be rather accurate. Second, these computational techniques were not specifically trained to identify multidomain proteins, and it is thus not surprising that some of them are poorly suited to discriminating between mono- and multidomain proteins. Nevertheless, it is reasonable to suppose that these bioinformatics tools are still immature and that progress should be expected in the future. Table 4 shows the average values of the J, R, and FM indices computed by comparing predicted and real partitions [see equations (2)-(4)]. All the values tend to be large, quite close to their maximal value of 1.
However, the probabilities (pJ, pR, and pFM) of observing higher values by chance are quite large, ranging from about 30% to about 70%. Baker, foldpro, maopus and robetta are better at predicting a partition that is close to the real one, with J, R, and FM values that are larger and have a lower probability of being observed by chance. Not surprisingly, they are the same methods that work better at identifying multidomain proteins (see the mcc values of Table 3).
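
The two kinds of score discussed above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names are hypothetical, and because equations (2)-(4) are not reproduced in this excerpt, the sketch assumes that J, R, and FM are the standard pair-counting Jaccard, Rand, and Fowlkes-Mallows indices. A "positive" is a protein with more than one domain, as in the text.

```python
from itertools import combinations
from math import sqrt


def confusion(actual_multi, predicted_multi):
    """Count (tp, fp, fn, tn); 'positive' means 'more than one domain'."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(actual_multi, predicted_multi):
        if actual and predicted:
            tp += 1  # multidomain protein predicted as multidomain
        elif predicted:
            fp += 1  # single-domain protein predicted as multidomain
        elif actual:
            fn += 1  # multidomain protein predicted as single-domain
        else:
            tn += 1  # single-domain protein predicted as single-domain
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 if the denominator is 0."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def partition_indices(real, predicted):
    """Pair-counting Jaccard, Rand and Fowlkes-Mallows indices.

    `real` and `predicted` give one domain label per residue.
    Assumption: standard definitions, not necessarily the paper's eqs. (2)-(4).
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(real)), 2):
        same_real = real[i] == real[j]
        same_pred = predicted[i] == predicted[j]
        if same_real and same_pred:
            a += 1  # residue pair together in both partitions
        elif same_real:
            b += 1  # together only in the real partition
        elif same_pred:
            c += 1  # together only in the predicted partition
        else:
            d += 1  # separated in both partitions
    total = a + b + c + d
    jaccard = a / (a + b + c) if (a + b + c) else 1.0
    rand = (a + d) / total if total else 1.0
    fm_denom = sqrt((a + b) * (a + c))
    fowlkes_mallows = a / fm_denom if fm_denom else 1.0
    return jaccard, rand, fowlkes_mallows
```

For example, `mcc(*confusion(actual, predicted))` scores a method over a set of proteins given two lists of booleans, while `partition_indices` compares the residue-level domain assignment of a single protein. All three indices equal 1 when the predicted partition matches the real one exactly.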
