Author: Kirillova, Svetlana; Kumar, Suresh; Carugo, Oliviero
Title: Protein Domain Boundary Predictions: A Structural Biology Perspective
Document date: 2009_1_21
ID: qrnhp1ek_20
Document: The computation of the values of J, R, and FM is elementary. The estimation of their statistical significance is less obvious [67]. For example, it is difficult to estimate the probability that a certain value of the index J was obtained by chance. From another point of view, if J_CK > J_DL, where J_CK measures the similarity between the classifications C and K and J_DL that between the classifications D and L, it is clear that C and K are more similar to each other than D and L. However, it is more difficult to estimate the statistical significance of the inequality J_CK > J_DL, that is, the probability that C and K are really more similar to each other than D and L. This is because the probability density functions of the indices J, R, and FM are unknown and must therefore be estimated numerically on the basis of adequate simulations. We therefore generated a series of simulated partitions, using a Metropolis-Monte Carlo approach, by means of the following procedure. Each partition is characterized by a series of boundaries, each of which separates a domain from a loop and which can also be located at the N- or C-terminus. Given a protein containing N residues, a boundary can be any integer k with 1 ≤ k ≤ N. A series of boundaries was generated iteratively. The first (k_0) was randomly selected in the range (1, N); the second (k_1) was randomly selected in the range (1, m_0), where m_0 = N - k_0; the third (k_2) was randomly selected in the range (1, m_1), where m_1 = m_0 - k_1; and so on, so that the i-th boundary (k_i) was randomly selected in the range (1, m_(i-1)), where m_(i-1) = m_(i-2) - k_(i-1). Two constraints were imposed during the generation of random domain boundaries within a protein: a domain must contain more than 30 residues, and a loop must contain fewer than 30 residues.
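The iterative drawing of boundaries described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the ranges are read as inclusive, each k_i is treated as the length of the next segment (so absolute boundary positions are cumulative sums), segments are assumed to alternate between domains and loops, and plain rejection sampling stands in for the Metropolis-Monte Carlo scheme, whose details are not given here. All function names are hypothetical.

```python
import random
from itertools import accumulate

MIN_DOMAIN = 31  # "a domain must contain more than 30 residues"
MAX_LOOP = 29    # "a loop must contain fewer than 30 residues"


def draw_segment_lengths(n_residues, rng):
    """Draw k_0, k_1, ...; k_i is uniform on [1, m_(i-1)], with m_(-1) = N."""
    lengths = []
    remaining = n_residues
    while remaining > 0:
        k = rng.randint(1, remaining)  # inclusive on both ends (assumption)
        lengths.append(k)
        remaining -= k                 # m_i = m_(i-1) - k_i
    return lengths


def is_valid(lengths):
    """Assumed reading of the constraints: every segment is a domain or a
    loop by its length, and neighbouring segments alternate in kind."""
    kinds = []
    for k in lengths:
        if k >= MIN_DOMAIN:
            kinds.append("domain")
        elif k <= MAX_LOOP:
            kinds.append("loop")
        else:
            return False               # exactly 30 residues fits neither class
    return all(a != b for a, b in zip(kinds, kinds[1:]))


def sample_partition(n_residues, rng, max_tries=100_000):
    """Rejection sampling (a stand-in, not Metropolis): redraw until valid."""
    for _ in range(max_tries):
        lengths = draw_segment_lengths(n_residues, rng)
        if is_valid(lengths):
            return lengths
    raise RuntimeError("no valid partition found within max_tries")


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = sample_partition(100, rng)
    print("segment lengths:", lengths)
    print("boundary positions:", list(accumulate(lengths)))
```

Because the segment lengths always sum to N, the last boundary falls on the C-terminus, in line with the statement that boundaries may be located at the termini.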
10,000 random partitions into domains were generated for proteins containing 75, 100, 125, ..., 550, 575, 600 residues. It was then possible to make 49,995,000 pairwise comparisons between partitions (10,000 × 9,999 / 2), and the resulting 49,995,000 values of the coefficients J, R, and FM were retained in order to determine their distributions.
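The pairwise comparison step can be illustrated with a short Python sketch. It assumes that J, R, and FM denote the Jaccard, Rand, and Fowlkes-Mallows indices in their standard pair-counting form, which this section does not spell out. The helper names, the conventions for undefined cases, and the empirical p-value at the end are likewise assumptions, shown only as one way the simulated distributions might be used.

```python
from itertools import combinations
from math import comb, sqrt


def labels_from_lengths(lengths):
    """Turn segment lengths into one segment label per residue."""
    return [seg for seg, k in enumerate(lengths) for _ in range(k)]


def pair_counts(x, y):
    """Count residue pairs by whether they share a segment in x and in y."""
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        same_x = x[i] == x[j]
        same_y = y[i] == y[j]
        if same_x and same_y:
            a += 1   # together in both partitions
        elif same_x:
            b += 1   # together in x only
        elif same_y:
            c += 1   # together in y only
        else:
            d += 1   # separated in both partitions
    return a, b, c, d


def similarity_indices(x, y):
    """Return (J, R, FM) for two labelings of at least two residues."""
    a, b, c, d = pair_counts(x, y)
    # Conventions for undefined cases are assumptions of this sketch.
    jaccard = a / (a + b + c) if (a + b + c) else 1.0
    rand = (a + d) / (a + b + c + d)
    fm = a / sqrt((a + b) * (a + c)) if (a + b) and (a + c) else 0.0
    return jaccard, rand, fm


def empirical_p_value(observed, null_values):
    """Fraction of simulated values at least as large as the observed one."""
    return sum(v >= observed for v in null_values) / len(null_values)


if __name__ == "__main__":
    # Number of unordered pairs among 10,000 partitions.
    print(comb(10_000, 2))
    x = labels_from_lengths([2, 2])
    print(similarity_indices(x, [0, 1, 0, 1]))
```

The quadratic loop over residue pairs is kept for readability; for tens of millions of comparisons a contingency-table formulation would be the more practical choice.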