Hello Michael,
Here is a modified extract from my *.pro file. Since my reference transcript is 1688 bp long and the covered fraction is as shown below, the actual covered length of the transcript comes to 588 bp (am I right there?).
locus | transcript_ID | length | expressed fraction | expressed number | sequenced fraction | sequenced number | covered fraction |
---|---|---|---|---|---|---|---|
Chr1:3631-5899W | AT1G01010.1 | 1688 | 2.0002340273812036E-6 | 10 | 2.6374959613343093E-6 | 8 | 0.348341226577759 |
My read headers from the given transcript are:
@Chr1:3631-5899W:AT1G01010.1:2:1688:10:276/1
@Chr1:3631-5899W:AT1G01010.1:2:1688:10:276/2
@Chr1:3631-5899W:AT1G01010.1:3:1688:1134:1342/1
@Chr1:3631-5899W:AT1G01010.1:3:1688:1134:1342/2
@Chr1:3631-5899W:AT1G01010.1:4:1688:886:1144/1
@Chr1:3631-5899W:AT1G01010.1:4:1688:886:1144/2
@Chr1:3631-5899W:AT1G01010.1:5:1688:1340:1686/1
@Chr1:3631-5899W:AT1G01010.1:5:1688:1340:1686/2
My questions are:
Thanks for your time and help.
2 Comments
Micha Sammeth
Hi!
Sorry for the delay in responding. Yes, in your example the covered nucleotides would add up to a total length of 588nt.
However, these 588nt do not have to be, and are unlikely to be, consecutive on the transcript sequence. More formally, the covered fraction is computed as

$$\text{covered fraction} = \frac{1}{L} \sum_{i=1}^{L} I(i)$$

where $L$ would be the length of the transcript, and $I(i)$ is the indicator function of whether position $i$ is covered by at least one read ($I(i) = 1$), or not ($I(i) = 0$). Therefore, the distribution of the covered positions can be discontinuous, as for instance
where X would mark covered positions.
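As a minimal sketch of the indicator-function definition above, the following Python snippet marks every position touched by at least one read and divides the number of marked positions by the transcript length. The function name `covered_fraction`, the inclusive 0-based `(start, end)` convention, and the toy coordinates are assumptions made for illustration; they are not taken from the simulator's code or from the reads in this thread.

```python
def covered_fraction(length, reads):
    """Return (fraction, indicator) for a transcript of `length` positions.

    `reads` holds (start, end) pairs, 0-based with both ends inclusive
    (an assumption made for this sketch).
    """
    covered = [False] * length  # indicator: one entry per transcript position
    for start, end in reads:
        # Clip to the transcript so out-of-range coordinates are ignored.
        for pos in range(max(start, 0), min(end, length - 1) + 1):
            covered[pos] = True
    return sum(covered) / length, covered


# Toy data: a 20 nt transcript with three reads, two of which overlap.
fraction, covered = covered_fraction(20, [(2, 5), (4, 7), (14, 16)])
picture = "".join("X" if c else "-" for c in covered)
print(picture)   # X marks covered positions
print(fraction)
```

In this toy case nine positions are covered, but they fall into two separate blocks, which is the kind of discontinuous coverage described above. Overlapping reads count each shared position only once.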
To your other question, the coordinates in all library files and also in the read output are 0-based. This goes back to their initialization in Fragmenter.processInitial(), which currently reads
Note that, when using variation in transcription start or poly-A tails, you may find negative coordinates or coordinates larger than the transcript length.
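For readers who want to inspect these coordinates in their own output, here is a hedged sketch that splits a read identifier of the form shown in this thread into its fields. The field meanings (fragment number, transcript length, start, end, mate) are inferred from the example headers and are assumptions, as are the names `ReadId` and `parse_read_id`. The final check also assumes that the end coordinate is inclusive.

```python
from typing import NamedTuple


class ReadId(NamedTuple):
    locus: str
    transcript: str
    fragment: int   # assumed: running fragment number
    length: int     # assumed: transcript length
    start: int      # assumed: 0-based fragment start on the transcript
    end: int        # assumed: fragment end on the transcript
    mate: int       # 1 or 2 for paired-end reads


def parse_read_id(header: str) -> ReadId:
    body, mate = header.lstrip("@").rsplit("/", 1)
    # The locus itself contains a colon, so split from the right.
    locus, transcript, fragment, length, start, end = body.rsplit(":", 5)
    return ReadId(locus, transcript, int(fragment), int(length),
                  int(start), int(end), int(mate))


read = parse_read_id("@Chr1:3631-5899W:AT1G01010.1:2:1688:10:276/1")
# Flags fragments that start before or run past the annotated transcript
# (assumes an inclusive 0-based end coordinate).
outside = read.start < 0 or read.end >= read.length
print(read.start, read.end, read.length, outside)
```

Splitting from the right keeps the locus intact even though it contains a colon, and `int()` accepts a leading minus sign, so negative start coordinates parse without special handling.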
Best,
Micha
Prachi
Hi Micha,
Thanks for the detailed reply.
I have a few more clarifications:
0.348341226577759 * 1688 = 587.999990463 exactly. Do the trailing decimal figures have any significance, or can they be safely rounded off?
@Chr3:13579593-13580782C:AT3G33045.1:2:1190:383:689/1
AGGATTTGACAGTACATTTAGGCAGAGAAGTTCGGTTAGGTGGACCAGTTCATTTCAGATGGATGTATCCGTTTGA
@Chr3:13579593-13580782C:AT3G33045.1:2:1190:383:689/2
AGACTGCCATATTTTGGATGACAACCATATGGGCTATTTTTGTCTCTAcTgCcnTcgagaagAccTcnncngCTnC
The covered length comes to 1190 * 0.1260504275560379 = 150. But the reads are 76 bp each and non-overlapping, so the covered length should come to 152. I have observed this in all my reads, including the case above (the 588 bp one). Could you please help me figure out what's amiss here?
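The two products quoted in this thread can be checked with a few lines of Python. This is only an arithmetic sketch: the helper name `covered_length` is hypothetical, and rounding to the nearest integer is an assumption about how the fraction should be read, not something confirmed by the simulator.

```python
def covered_length(transcript_length, fraction):
    """Covered positions implied by a covered fraction, to the nearest integer."""
    return round(transcript_length * fraction)


first = covered_length(1688, 0.348341226577759)     # AT1G01010.1
second = covered_length(1190, 0.1260504275560379)   # AT3G33045.1
expected_second = 2 * 76   # two non-overlapping 76 bp reads

print(first, second, expected_second)
```

This reproduces the covered lengths stated in the thread, but it does not by itself explain the 2 nt difference between the covered length and the summed read lengths for AT3G33045.1.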
Thanks a lot for your help.
Prachi