4.5.5 - Uniformity Measurements

The Flux Simulator provides some statistical indicators to measure the uniformity of the read distribution along a transcript produced by in silico sequencing.

Fraction Covered

The fraction of a transcript that is covered by reads reflects its expression level and the degree of coverage fluctuation caused by biases.

\sum \limits_{i=1}^L \frac{sgn(cov(i))}{L} where L is the length of the transcript,

and the sign function

sgn(cov(i))=\begin{cases} 0 & \textrm{if } cov(i)=0\\ 1 & \textrm{if } cov(i)>0 \end{cases}

indicates whether position i is covered by at least one sequenced read.
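
As a minimal sketch of the formula above (not the Flux Simulator's own code), the fraction covered can be computed from a list of per-position coverage values. The function name fraction_covered and the list representation of cov(i) are assumptions made for illustration.

```python
def fraction_covered(cov):
    """Fraction of positions with at least one read.

    cov is a hypothetical list holding cov(i) for i = 1..L.
    """
    L = len(cov)
    if L == 0:
        raise ValueError("transcript length must be positive")
    # sgn(cov(i)) is 1 for covered positions and 0 otherwise
    return sum(1 for c in cov if c > 0) / L


# Two of four positions are covered by a read
print(fraction_covered([0, 3, 1, 0]))
```

A value of 1 means that every position of the transcript is covered by at least one read.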

Chi-square statistic (χ²)

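Pearson’s chi-square can be used to test the goodness of fit of a given sample to a theoretical distribution. Here the observed coverage is compared against a uniform expectation, in which every position carries the average coverage. Given a transcript of length L and coverage cov(i) at position i, the test statistic is defined as follows: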

\chi^2 = \sum\limits_{i=1}^L \frac{(cov(i)-\mu)^2}{\mu}

where \mu = \sum\limits_{i=1}^L \frac{cov(i)}{L} is the average coverage along the molecule.
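
The statistic above can be sketched in a few lines, assuming again that the coverage is available as a plain list. The function name chi_square is hypothetical, and the guard against a transcript without reads is an addition for this sketch, since the formula divides by \mu.

```python
def chi_square(cov):
    """Chi-square statistic of coverage against its own mean.

    cov is a hypothetical list holding cov(i) for i = 1..L.
    """
    L = len(cov)
    mu = sum(cov) / L  # average coverage along the molecule
    if mu == 0:
        raise ValueError("statistic is undefined for a transcript without reads")
    return sum((c - mu) ** 2 / mu for c in cov)


print(chi_square([2, 2, 2, 2]))  # perfectly uniform coverage
print(chi_square([0, 4]))        # all reads on one of two positions
```

Perfectly uniform coverage yields 0, and the value grows as the coverage deviates from its mean.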

Coefficient of variation (CV)

A CV is defined as the ratio of the standard deviation to the mean of a probability distribution:

\textrm{CV}= \frac{\sigma}{\mu}, with standard deviation \sigma= \sqrt{\sum\limits_{i=1}^L \frac{(\overline{cov}(i))^2}{L}} and mean  \mu= \sum\limits_{i=1}^L \frac{\overline{cov}(i)}{L}

of transformed coverage values \overline{cov}(i)= \frac{3}{2} \left(\frac{cov(i)^{\frac{2}{3}}-\mu^{\frac{2}{3}}}{\mu^{\frac{1}{6}}}\right)

This Anscombe transformation of the coverage values was proposed [Hansen et al. 2010] under the assumption that the reads along a transcript follow a Poisson distribution, which the transformation converts to an approximately Gaussian one.
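
The following sketch transcribes the formulas of this section literally; it is not taken from the Flux Simulator source. It assumes that the \mu inside the transformation is the mean of the raw coverage, whereas the \mu in the CV ratio is the mean of the transformed values. The function names are hypothetical, and returning None when the transformed mean is zero is a choice made for this sketch.

```python
import math


def transformed_coverage(cov):
    """Transformed coverage values, following the formula as written above.

    Assumption: mu inside the transform is the mean of the raw coverage.
    """
    L = len(cov)
    mu = sum(cov) / L
    if mu == 0:
        raise ValueError("transform is undefined for a transcript without reads")
    return [1.5 * (c ** (2 / 3) - mu ** (2 / 3)) / mu ** (1 / 6) for c in cov]


def coefficient_of_variation(cov):
    """Literal reading of CV = sigma / mu over the transformed values."""
    t = transformed_coverage(cov)
    L = len(t)
    sigma = math.sqrt(sum(x * x for x in t) / L)  # as written, no mean subtraction
    mean_t = sum(t) / L
    # Uniform coverage transforms to all zeros, so the ratio is undefined
    return sigma / mean_t if mean_t != 0 else None


print(transformed_coverage([0, 128]))
print(coefficient_of_variation([0, 128]))
```

Note that uniform coverage maps to transformed values of zero, for which the ratio is undefined; the sketch returns None in that case.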
