Author: Bonidia, Robson Parmezan; Sampaio, Lucas Dias Hiera; Domingues, Douglas Silva; Paschoal, Alexandre Rossi; Lopes, FabrÃcio Martins; de Leon Ferreira de Carvalho, André Carlos Ponce; Sanches, Danilo Sipoli
Title: Feature Extraction Approaches for Biological Sequences: A Comparative Study of Mathematical Models Cord-id: p62rxpdx Document date: 2020_8_6
ID: p62rxpdx
Snippet: The number of available biological sequences has increased significantly in recent years due to various genomic sequencing projects, creating a huge volume of data. Consequently, new computational methods are needed to analyze and extract information from these sequences. Machine learning methods have shown broad applicability in computational biology and bioinformatics. The utilization of machine learning methods has helped to extract relevant information from various biological datasets. Howev
Document: The number of available biological sequences has increased significantly in recent years due to various genomic sequencing projects, creating a huge volume of data. Consequently, new computational methods are needed to analyze and extract information from these sequences. Machine learning methods have shown broad applicability in computational biology and bioinformatics. The utilization of machine learning methods has helped to extract relevant information from various biological datasets. However, there are still several obstacles that motivate new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes to study and analyze a feature extraction pipeline based on mathematical models (Numerical Mapping, Fourier, Entropy, and Complex Networks). As a case study, we analyze Long Non-Coding RNA sequences. Moreover, we divided this work into two studies, e.g., (I) we assessed our proposal with the most addressed problem in our review, e.g., lncRNA vs. mRNA; (II) we tested its generalization on different classification problems, e.g., circRNA vs. lncRNA. The experimental results demonstrated three main contributions: (1) An in-depth study of several mathematical models; (2) a new feature extraction pipeline and (3) its generalization and robustness for distinct biological sequence classification.
Search related documents:
Co phrase search for related documents, hyperlinks ordered by date