Author: Karthi Balasubramanian; Nithin Nagaraj
Title: Automatic Identification of SARS Coronavirus using Compression-Complexity Measures
Document date: 2020_3_27
ID: ljli6a2z_62
Document: Compression-complexity measures such as LZ and ETC, which are based on lossless compression algorithms, are good candidates for developing fast alignment-free methods for genome sequence analysis, comparison and identification. The main reason is their ability to characterize and analyze the information in biological sequences using very short contiguous segments. As demonstrated in this study, our preliminary results suggest that ETC could be very useful for identifying an unknown sequence from a large database of nucleotide sequences, since the measure can be computed quickly on the candidate sequences using only a small number of nucleotide bases. LZ complexity requires slightly longer nucleotide sequences and therefore more computation. Other information-theoretic methods in the literature, which employ Shannon entropy, mutual information and similar quantities, would also need longer nucleotide sequences for computation and are not robust to noise. Some areas for further research are:
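To make the two measures compared in the paragraph above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that ETC is computed as the number of Non-Sequential Recursive Pair Substitution steps needed to reduce a sequence to a constant sequence, and that LZ refers to the Lempel-Ziv (1976) exhaustive-history phrase count. The function names `etc_complexity`, `normalized_etc` and `lz_complexity`, the tie-breaking rule (first most frequent pair wins) and the left-to-right, non-overlapping pair counting are illustrative assumptions.

```python
def _count_pairs(seq):
    """Count adjacent symbol pairs, skipping self-overlaps (e.g. 'aaa' holds 'aa' once)."""
    counts = {}
    last_end = {}  # index of the last symbol used by the previous counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_end.get(pair, -1) >= i:
            continue  # overlaps the previous occurrence of the same pair
        counts[pair] = counts.get(pair, 0) + 1
        last_end[pair] = i + 1
    return counts


def _substitute(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def etc_complexity(sequence):
    """Number of pair-substitution steps until the sequence is constant or has length 1."""
    seq = list(sequence)
    steps = 0
    while len(seq) > 1 and len(set(seq)) > 1:
        counts = _count_pairs(seq)
        # Assumption: ties are broken in favour of the pair seen first.
        best = max(counts, key=counts.get)
        # A tuple is used as the new symbol so it cannot collide with the input alphabet.
        seq = _substitute(seq, best, ("sym", steps))
        steps += 1
    return steps


def normalized_etc(sequence):
    """ETC divided by its maximum possible value, len(sequence) - 1."""
    n = len(sequence)
    return etc_complexity(sequence) / (n - 1) if n > 1 else 0.0


def lz_complexity(s):
    """Phrase count of the Lempel-Ziv (1976) exhaustive-history parsing of string `s`."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from the preceding text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


if __name__ == "__main__":
    segment = "ATGGCGTACGCTTAGC"  # hypothetical short nucleotide segment
    print(etc_complexity(segment), normalized_etc(segment), lz_complexity(segment))
```

Under these assumptions, the binary sequence `11010010` is reduced in five substitution steps, so `etc_complexity("11010010")` returns 5, and `lz_complexity("0001101001000101")` returns 6, corresponding to the parsing 0|001|10|100|1000|101. Both functions accept any string over a finite alphabet, so the same code applies to nucleotide segments over A, C, G and T.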