Hi,
I ran Flux Capacitor on a human BAM file with 9902168 aligned paired-end reads (19804336 aligned reads, which equals the number of SAM records, as I consider primary alignments only). I used the parameter file:
ANNOTATION_FILE ensembl_human_71.gtf
COUNT_ELEMENTS [SPLICE_JUNCTIONS, INTRONS]
ANNOTATION_MAPPING PAIRED
Now, if I only consider the entries of the output GTF file with feature = "transcript" (ignoring intron and junction entries) and add up the counts-per-million (CPM) values:
RPKM * length / 1000
over all these entries, then I obtain a sum of 1568177.63124, although by definition the sum of the CPM values should be 1000000.
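
For reference, here is a minimal sketch of how I compute that sum (the file name and the "RPKM"/"length" attribute keys are placeholders and may need adjusting to the actual keys in the output):

import re

def gtf_attrs(attr_field):
    # parse a GTF attribute column of the form: key "value"; key "value"; ...
    return dict(re.findall(r'(\S+) "([^"]*)"', attr_field))

cpm_sum = 0.0
with open("flux_output.gtf") as gtf:
    for line in gtf:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "transcript":        # ignore intron and junction entries
            continue
        attrs = gtf_attrs(fields[8])
        rpkm = float(attrs["RPKM"])
        length = float(attrs["length"])
        cpm_sum += rpkm * length / 1000.0    # CPM = RPKM * length / 1000

print(cpm_sum)   # I would expect 1000000, but I get 1568177.63124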
Can you comment on this? Why do I get significantly more normalized read counts than I should?
One way to deal with this would be to renormalize the data so that the CPM values add up to 1000000, as sketched below. A further question is whether this renormalization would also need to be applied to the intron and junction counts.
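
A minimal sketch of the renormalization I have in mind (simply rescaling so that the transcript CPMs sum to exactly 1000000):

def renormalize(cpm_values, target=1000000.0):
    # rescale so that the CPM values sum exactly to the target
    scale = target / sum(cpm_values)
    return [c * scale for c in cpm_values]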
Best regards,
Sven