Author: Guo, Feng-Biao; Dong, Chuan; Hua, Hong-Li; Liu, Shuo; Luo, Hao; Zhang, Hong-Wan; Jin, Yan-Ting; Zhang, Kai-Yue
Title: Accurate prediction of human essential genes using only nucleotide composition and association information
Cord-id: vj9tv3or
Document date: 2017-06-15
Document: MOTIVATION: Previously constructed classifiers for predicting eukaryotic essential genes integrated a variety of features, including experimental ones. Satisfactory prediction from nucleotide (sequence) information alone would be more promising. Three groups recently identified essential genes in human cancer cell lines through wet-lab experiments, which provided an excellent opportunity to test this idea. Here we extended the Z curve method into a λ-interval form to represent nucleotide composition and association information, and used it to construct an SVM classification model.
RESULTS: Our model accurately predicted human gene essentiality, with an AUC higher than 0.88 in both 5-fold cross-validation and jackknife tests. These results demonstrate that the essentiality of human genes can be reliably reflected by sequence information alone. We re-predicted the negative dataset with our Pheg server, and 118 additional genes were predicted as essential. Of these, 20 were found to be homologous to mouse essential genes, indicating that some of the 118 genes are indeed essential but were overlooked by previous experiments. As the first available server, Pheg can predict essentiality for anonymous human gene sequences. We also hope that the λ-interval Z curve method can be effectively extended to classification problems involving other DNA elements.
AVAILABILITY AND IMPLEMENTATION: http://cefg.uestc.edu.cn/Pheg
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
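To make the feature idea concrete, the following is a minimal sketch of how λ-interval Z curve features could be computed. It assumes the standard Z curve transform (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) applied to joint frequencies of nucleotide pairs separated by λ positions: 3 composition parameters plus 12 association parameters per λ. The function names and the `max_lam` cutoff are hypothetical, and Pheg's exact parameterization (for example, phase-specific counts or the range of λ) may differ.

```python
from itertools import product

BASES = "ACGT"


def z_transform(freq):
    """Map frequencies of A, C, G, T to the three Z curve components.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak H-bonding (A, T) minus strong H-bonding (G, C)
    """
    a, c, g, t = (freq[b] for b in BASES)
    return [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]


def composition_z(seq):
    """3 composition parameters from single-nucleotide frequencies."""
    seq = seq.upper()
    counts = {b: seq.count(b) for b in BASES}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T")
    return z_transform({b: counts[b] / total for b in BASES})


def lambda_interval_z(seq, lam):
    """12 association parameters from nucleotide pairs (seq[i], seq[i + lam]).

    lam = 1 gives ordinary adjacent dinucleotides; larger lam captures
    longer-range association. Joint pair frequencies are an assumption here.
    """
    if lam < 1:
        raise ValueError("lam must be >= 1")
    seq = seq.upper()
    counts = {x + y: 0 for x, y in product(BASES, repeat=2)}
    for i in range(len(seq) - lam):
        pair = seq[i] + seq[i + lam]
        if pair in counts:  # skip pairs containing N or other ambiguity codes
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence too short for this lam")
    features = []
    for x in BASES:  # one (x, y, z) triple per leading nucleotide
        features.extend(z_transform({y: counts[x + y] / total for y in BASES}))
    return features


def feature_vector(seq, max_lam=3):
    """Concatenate composition and lambda = 1..max_lam association features.

    max_lam=3 is a hypothetical default, not the value used by Pheg.
    """
    features = composition_z(seq)
    for lam in range(1, max_lam + 1):
        features.extend(lambda_interval_z(seq, lam))
    return features


if __name__ == "__main__":
    vec = feature_vector("ATGGCGTACGTTAGCCTAA", max_lam=3)
    print(len(vec))  # 3 + 12 * 3 = 39
```

Because the vector length depends only on `max_lam` and not on gene length, genes of very different sizes map to inputs of the same dimension, which is what a classifier such as an SVM (for example scikit-learn's `SVC`) needs before its AUC can be estimated with 5-fold cross-validation.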