Author: Silva, José Cleydson F; Carvalho, Thales F M; Fontes, Elizabeth P B; Cerqueira, Fabio R
                    Title: Fangorn Forest (F2): a machine learning approach to classify genes and genera in the family Geminiviridae.  Cord-id: 9vocdgd6  Document date: 2017_1_1
                    ID: 9vocdgd6
                    
                    Snippet: BACKGROUND Geminiviruses infect a broad range of cultivated and non-cultivated plants, causing significant economic losses worldwide. The studies of the diversity of species, taxonomy, mechanisms of evolution, geographic distribution, and mechanisms of interaction of these pathogens with the host have greatly increased in recent years. Furthermore, the use of rolling circle amplification (RCA) and advanced metagenomics approaches have enabled the elucidation of viromes and the identification of 
                    
                    
                    
                     
                    
                    
                    
                    
                        
                            
                                Document: BACKGROUND Geminiviruses infect a broad range of cultivated and non-cultivated plants, causing significant economic losses worldwide. The studies of the diversity of species, taxonomy, mechanisms of evolution, geographic distribution, and mechanisms of interaction of these pathogens with the host have greatly increased in recent years. Furthermore, the use of rolling circle amplification (RCA) and advanced metagenomics approaches have enabled the elucidation of viromes and the identification of many viral agents in a large number of plant species. As a result, determining the nomenclature and taxonomically classifying geminiviruses turned into complex tasks. In addition, the gene responsible for viral replication (particularly, the viruses belonging to the genus Mastrevirus) may be spliced due to the use of the transcriptional/splicing machinery in the host cells. However, the current tools have limitations concerning the identification of introns. RESULTS This study proposes a new method, designated Fangorn Forest (F2), based on machine learning approaches to classify genera using an ab initio approach, i.e., using only the genomic sequence, as well as to predict and classify genes in the family Geminiviridae. In this investigation, nine genera of the family Geminiviridae and their related satellite DNAs were selected. We obtained two training sets, one for genus classification, containing attributes extracted from the complete genome of geminiviruses, while the other was made up to classify geminivirus genes, containing attributes extracted from ORFs taken from the complete genomes cited above. Three ML algorithms were applied on those datasets to build the predictive models: support vector machines, using the sequential minimal optimization training approach, random forest (RF), and multilayer perceptron. RF demonstrated a very high predictive power, achieving 0.966, 0.964, and 0.995 of precision, recall, and area under the curve (AUC), respectively, for genus classification. For gene classification, RF could reach 0.983, 0.983, and 0.998 of precision, recall, and AUC, respectively. CONCLUSIONS Therefore, Fangorn Forest is proven to be an efficient method for classifying genera of the family Geminiviridae with high precision and effective gene prediction and classification. The method is freely accessible at www.geminivirus.org:8080/geminivirusdw/discoveryGeminivirus.jsp .
 
  Search related documents: 
                                Co phrase  search for related documents- Try single phrases listed below for: 1
 
                                Co phrase  search for related documents, hyperlinks ordered by date