lncRNAs. Long ncRNAs were annotated using a high-confidence data set $H$ from the LNCipedia [56] (v5.2) database, comprising 107,039 transcripts of potential human lncRNAs. The transcripts were used as input for a BLASTn (2.7.1+, E-value cutoff 1e-10) search against each of the 16 bat assemblies (compiled as BLAST databases). The BLASTn result for each bat assembly was further processed to group single hits into potential transcripts as follows. First, for each query sequence $q \in H$, the hits of $q$ found on the same contig $c$ and strand $s$ were selected ($\mathrm{hits}_{c,s,q}$) and the longest one, $q_1$, was chosen as a starting point, so that $\mathrm{trscp}_{c,s,q} = (q_1)$. Second, all hits $q_i \in \mathrm{hits}_{c,s,q}$ with $q_i \notin \mathrm{trscp}_{c,s,q}$ that overlap neither in the query $q$ nor in the target sequence, and that do not exceed a maximum range of 500,000 nt from the most upstream to the most downstream target sequence position of all $q_j \in \mathrm{trscp}_{c,s,q} \cup \{q_i\}$, were added iteratively to $\mathrm{trscp}_{c,s,q}$. In this way, we introduced a simple model of the exon-intron structures that naturally arise when transcript sequences are used as queries against a target genome assembly. We defined the 500,000 nt search range based on an estimation of lncRNA gene sizes derived from the human Ensembl [45] annotation. If the sum of the lengths of all $q_i \in \mathrm{trscp}_{c,s,q}$ covered at least 70% of the query transcript length $\mathrm{length}_q$, $\mathrm{trscp}_{c,s,q}$ was considered a transcript and its elements $q_i$ exons; otherwise, all $q_i \in \mathrm{trscp}_{c,s,q}$ were withdrawn. This procedure was repeated until all hits in $\mathrm{hits}_{c,s,q}$ had been either used or withdrawn. Each group of non-overlapping hits defined in this way, derived from the same query sequence and found on the same contig and strand, should therefore represent a lncRNA transcript with its (rough) exon structure. The defined transcripts were saved as BLAST-like output and transformed into GTF file format.
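The grouping procedure can be illustrated with a minimal Python sketch. It is not the original implementation: the `Hit` class, the function names, and the coordinate conventions are hypothetical. The text does not specify in which order candidate hits are tested or how hit length is measured, so the sketch assumes hits are processed by decreasing length and that length is measured on the query. Target coordinates are assumed to be normalised so that start <= end on either strand.

```python
from dataclasses import dataclass

MAX_RANGE = 500_000   # nt, maximum target span of one transcript (from the text)
MIN_COVERAGE = 0.70   # minimum fraction of the query length that must be covered


@dataclass(frozen=True)
class Hit:
    """One BLASTn hit of a query on a given contig and strand (hypothetical type)."""
    q_start: int  # 1-based inclusive query coordinates
    q_end: int
    t_start: int  # 1-based inclusive target coordinates, t_start <= t_end (assumed)
    t_end: int

    @property
    def q_len(self) -> int:
        return self.q_end - self.q_start + 1


def overlaps(a_start, a_end, b_start, b_end):
    """True if the two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def compatible(hit, group):
    """Check the non-overlap and maximum-range conditions for adding a hit."""
    for g in group:
        if overlaps(hit.q_start, hit.q_end, g.q_start, g.q_end):
            return False
        if overlaps(hit.t_start, hit.t_end, g.t_start, g.t_end):
            return False
    members = group + [hit]
    span = max(m.t_end for m in members) - min(m.t_start for m in members) + 1
    return span <= MAX_RANGE


def group_hits(hits, query_length):
    """Group the hits of one query on one contig and strand into transcripts."""
    # Assumption: longest hits are considered first, both as seed and as candidates.
    remaining = sorted(hits, key=lambda h: h.q_len, reverse=True)
    transcripts = []
    while remaining:
        group = [remaining.pop(0)]          # longest remaining hit is the seed
        for hit in list(remaining):
            if compatible(hit, group):
                group.append(hit)
                remaining.remove(hit)
        covered = sum(h.q_len for h in group)
        if covered >= MIN_COVERAGE * query_length:
            # Accepted: report exons ordered by target position.
            transcripts.append(sorted(group, key=lambda h: h.t_start))
        # Otherwise the group is withdrawn; its hits are not reused.
    return transcripts
```

In this sketch every hit ends up either in an accepted transcript or in a withdrawn group, so the loop terminates once all hits have been used or withdrawn.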
To follow the GTF annotation format and to harmonize our lncRNA annotations with the other ncRNA annotations, each lncRNA transcript was also saved as a gene annotation and consists of at least one exon.
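As an illustration of this convention, the following sketch writes one grouped transcript as GTF records, using the same span for the gene and the transcript. The function name `gtf_lines`, the `source` value, and the identifier scheme are hypothetical and not taken from the original pipeline. Only the nine tab-separated GTF columns and the `gene_id`/`transcript_id` attributes follow the general GTF format.

```python
def gtf_lines(contig, strand, exons, gene_id, source="blastn"):
    """Return GTF records (gene, transcript, exons) for one grouped transcript.

    exons: list of (start, end) tuples, 1-based inclusive, on the contig.
    """
    if not exons:
        raise ValueError("a transcript needs at least one exon")
    exons = sorted(exons)
    start = exons[0][0]
    end = max(e for _, e in exons)
    transcript_id = f"{gene_id}.t1"  # hypothetical naming scheme
    gene_attrs = f'gene_id "{gene_id}";'
    tx_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'

    def record(feature, s, e, attrs):
        # Columns: seqname, source, feature, start, end, score, strand, frame, attributes
        return "\t".join([contig, source, feature, str(s), str(e), ".", strand, ".", attrs])

    lines = [
        record("gene", start, end, gene_attrs),
        record("transcript", start, end, tx_attrs),
    ]
    for s, e in exons:
        lines.append(record("exon", s, e, tx_attrs))
    return lines
```

For a transcript with two exons, this yields one gene line, one transcript line, and two exon lines, all on the same contig and strand.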