Covered fraction of a simulated transcript

Hello Michael,

Here is a modified extract from my *.pro file. Since my reference transcript length is 1688 bp and the covered fraction is as shown, the actual covered length of the transcript comes to 588 bp (am I right there?).

| locus | transcript_ID | length | expressed fraction | expressed number | sequenced fraction | sequenced number | covered fraction |
|---|---|---|---|---|---|---|---|
| Chr1:3631-5899W | AT1G01010.1 | 1688 | 2.0002340273812036E-6 | 10 | 2.6374959613343093E-6 | 8 | 0.348341226577759 |

```
@Chr1:3631-5899W:AT1G01010.1:2:1688:10:276/1
@Chr1:3631-5899W:AT1G01010.1:2:1688:10:276/2
@Chr1:3631-5899W:AT1G01010.1:3:1688:1134:1342/1
@Chr1:3631-5899W:AT1G01010.1:3:1688:1134:1342/2
@Chr1:3631-5899W:AT1G01010.1:4:1688:886:1144/1
@Chr1:3631-5899W:AT1G01010.1:4:1688:886:1144/2
@Chr1:3631-5899W:AT1G01010.1:5:1688:1340:1686/1
@Chr1:3631-5899W:AT1G01010.1:5:1688:1340:1686/2
```

My questions are:

• Are the coordinates 1-based or 0-based? Simply put, are the first fragment coordinates [10,276] or (10,276] in the first pair of reads?

• What does the length 588bp refer to?

Thanks for your time and help.


1. Hi!

Sorry for the delay in responding. Yes, in your example the covered nucleotides would add up to a total length of 588nt.

`0.348341226577759 * 1688 = 588`
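As a small sketch of this arithmetic in Python: the product is not exactly an integer in floating point, so rounding recovers the count of covered positions. The helper names `covered_length` and `as_float32` are made up for this illustration. The last line round-trips 588/1688 through single precision, which reproduces the reported fraction to the printed digits; that is only consistent with the value having been stored as a 32-bit float, which this thread does not confirm.

```python
import struct

def covered_length(covered_fraction, transcript_length):
    """Covered positions as an integer count, rounding away float noise."""
    return round(covered_fraction * transcript_length)

def as_float32(value):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", value))[0]

fraction = 0.348341226577759   # covered fraction from the .pro extract
length = 1688                  # transcript length from the same line

print(fraction * length)                 # slightly below 588
print(covered_length(fraction, length))  # 588
print(as_float32(588 / length))          # compare with the reported fraction
```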

However, these 588 nt do not have to be, and are unlikely to be, consecutive on the transcript sequence. More formally, the covered fraction $f$ is computed as

$$f = \frac{1}{L} \sum_{p=0}^{L-1} I(p)$$

where $L$ is the length of the transcript, and $I(p)$ is the indicator function of whether a position $p$ is covered by at least one read ($I(p) = 1$) or not ($I(p) = 0$). Therefore, the distribution of the covered positions can be discontinuous, as for instance

X X  XXX   XX X

where X marks the covered positions ($I(p) = 1$).
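The definition above can be sketched in a few lines of Python. This is an illustration only, not the simulator's own code: the function name `covered_fraction` and the toy intervals are made up, and the intervals are assumed to be 0-based with both ends inclusive. The toy intervals reproduce the pattern shown above on a transcript of length 15, with one overlapping interval added to show that a position counts only once.

```python
def covered_fraction(length, intervals):
    """Fraction of positions 0..length-1 covered by at least one interval.

    `intervals` holds (start, end) pairs, 0-based, both ends inclusive
    (an assumption made for this sketch).
    """
    covered = [False] * length
    for start, end in intervals:
        # Clip to the transcript so out-of-range coordinates are ignored.
        for pos in range(max(start, 0), min(end, length - 1) + 1):
            covered[pos] = True
    # Render covered positions as X and uncovered ones as dots.
    mask = "".join("X" if c else "." for c in covered)
    return sum(covered) / length, mask

# Hypothetical read intervals; (6, 7) overlaps (5, 7) on purpose.
toy_reads = [(0, 0), (2, 2), (5, 7), (11, 12), (14, 14), (6, 7)]
fraction, mask = covered_fraction(15, toy_reads)
print(mask)      # X.X..XXX...XX.X
print(fraction)  # 8 covered positions out of 15
```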

To your other question, the coordinates in all library files and also in the read output are 0-based. This goes back to their initialization in Fragmenter.processInitial(), which currently reads

```
// 0-based tx coordinates
int start = 0;
int end = origLen - 1;
```

Note that, when using variation in transcription start or poly-A tails, you may find negative coordinates or coordinates beyond the annotated transcript length.
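To make the convention concrete, here is a minimal Python sketch that splits one of the read identifiers quoted earlier in this thread and derives the fragment length. The field names are guesses read off the examples (in particular `molecule` for the third field), and `parse_read_id` and `fragment_length` are hypothetical helpers. The end coordinate is treated as inclusive, in line with `end = origLen - 1` in the snippet above.

```python
from typing import NamedTuple

class ReadId(NamedTuple):
    locus: str
    transcript: str
    molecule: int   # assumed meaning of the third field
    length: int     # transcript length
    start: int      # 0-based fragment start
    end: int        # 0-based fragment end, assumed inclusive
    mate: int

def parse_read_id(header):
    name, mate = header.lstrip("@").rsplit("/", 1)
    # The locus itself contains a colon, so split from the right.
    locus, transcript, molecule, length, start, end = name.rsplit(":", 5)
    return ReadId(locus, transcript, int(molecule), int(length),
                  int(start), int(end), int(mate))

def fragment_length(read):
    # 0-based with inclusive end: the span is end - start + 1.
    return read.end - read.start + 1

read = parse_read_id("@Chr1:3631-5899W:AT1G01010.1:2:1688:10:276/1")
print(read.start, read.end, fragment_length(read))  # 10 276 267
```

Because the numeric fields go through `int()`, negative start coordinates would parse as well.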

Best,

Micha

1. Hi Micha,

I have a few more clarifications:

1. `0.348341226577759 * 1688 = 587.999990463`, to be exact. Do the trailing decimal figures have any significance, or can they safely be rounded off?
2. Following are the pair of reads for a transcript whose *.pro entry is listed:

@Chr3:13579593-13580782C:AT3G33045.1:2:1190:383:689/1
AGGATTTGACAGTACATTTAGGCAGAGAAGTTCGGTTAGGTGGACCAGTTCATTTCAGATGGATGTATCCGTTTGA
@Chr3:13579593-13580782C:AT3G33045.1:2:1190:383:689/2
AGACTGCCATATTTTGGATGACAACCATATGGGCTATTTTTGTCTCTAcTgCcnTcgagaagAccTcnncngCTnC
Chr3:13579593-13580782C AT3G33045.1 NC 1190 6.000702082143611E-7 3 0.0 0 6.593739903335773E-7 2 0.1260504275560379 0 NaN

The covered length comes to 1190 * 0.1260504275560379 = 150. But the reads are 76 bp each and non-overlapping, so the covered length should come to 152. I have observed this in all my reads, including the case above (the 588 bp one). Could you please help me figure out what's amiss here?

Thanks a lot for your help.

Prachi