mir-seek
Patch: updating steps for novel miR identification and quantification
About
By default, miRDeep2 produces identifiers for novel miRs that are uninformative and not human-readable. Internally, miRDeep2 generates a string that combines the chromosome on which the miR was found with an internal miRDeep2 counter. For example, the first miR would have an identifier of the form chrN_1.
Here is an example identifier from the v0.3.0 version of the pipeline:
chr20__AC:CM000682.2__gi:568336004__LN:64444167__rl:Chromosome__M5:b18e6c531b0bd70e949a7fc20859cb01__AS:GRCh38_43554
Note that the extra metadata from the sequence identifier in the genomic FASTA file is also included. This patch renames the novel identifiers produced by miRDeep2 into a more human-readable format. The renamed identifiers contain the following information: chr, start, stop, strand. This patch also accounts for any 1:M relationships between novel mature and precursor miRs, in a manner similar to how they are handled for known miRs.
Here is an example of a new identifier produced in v0.3.1:
novel_mir_chr19_29605226_29605283_reverse_strand
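The renaming can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function names `clean_chrom` and `novel_mir_id` are hypothetical, the layout is inferred from the single v0.3.1 example above, and the `forward_strand` label for the plus strand is an assumption (only `reverse_strand` appears in these notes). Dropping everything after the first `__` to recover the chromosome name is likewise an assumption based on the v0.3.0 example.

```python
def clean_chrom(raw_id: str) -> str:
    """Keep only the text before the first '__' (assumed to be the chromosome name).

    Hypothetical helper: drops the FASTA header metadata seen in v0.3.0 identifiers.
    """
    return raw_id.split("__", 1)[0]


def novel_mir_id(chrom: str, start: int, stop: int, strand: str) -> str:
    """Build an identifier in the style of the v0.3.1 example.

    'forward_strand' is an assumed label; only 'reverse_strand' is shown in the notes.
    """
    strand_labels = {"+": "forward_strand", "-": "reverse_strand"}
    if strand not in strand_labels:
        raise ValueError(f"unexpected strand: {strand!r}")
    return f"novel_mir_{clean_chrom(chrom)}_{start}_{stop}_{strand_labels[strand]}"


if __name__ == "__main__":
    print(novel_mir_id("chr19", 29605226, 29605283, "-"))
```

Because the coordinates and strand are embedded in the name, two novel miRs at different loci cannot collide, which is not guaranteed by a bare per-run counter.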
Full Changelog: v0.3.0...v0.3.1