The Flux Simulator provides some statistical indicators to measure the uniformity of the read distribution along a transcript produced by in silico sequencing.
The fraction of a transcript that is covered by reads reflects both its expression level and the degree of coverage fluctuation caused by biases. It is computed as
\sum\limits_{i=1}^L \frac{sgn(cov(i))}{L}, where L is the length of the transcript, cov(i) is the coverage at position i,
and the sign function
sgn(cov(i))=\begin{cases} 0 & \textrm{if } cov(i)=0\\ 1 & \textrm{if } cov(i)>0 \end{cases}
indicates whether a position is covered by at least one sequenced read.
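As a minimal sketch of the covered fraction defined above, assuming the per-position coverage is available as a plain list of read counts: the function name `covered_fraction` and the sample values are hypothetical and are not part of the Flux Simulator.

```python
def covered_fraction(cov):
    """Return sum(sgn(cov(i))) / L for a list of per-position read counts."""
    L = len(cov)
    if L == 0:
        raise ValueError("transcript length L must be positive")
    # sgn(cov(i)) is 1 for positions covered by at least one read, 0 otherwise
    return sum(1 for c in cov if c > 0) / L


# Hypothetical coverage profile of a transcript of length 8
coverage = [0, 2, 5, 0, 1, 3, 0, 0]
print(covered_fraction(coverage))  # 4 of 8 positions are covered -> 0.5
```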
Pearson’s chi-square can be used to test the goodness of fit of a given sample to a theoretical distribution. Given a transcript of length L and coverage cov(i) at position i, the test statistic is defined as follows:
\chi^2 = \sum\limits_{i=1}^L \frac{(cov(i)-\mu)^2}{\mu}
where \mu = \sum\limits_{i=1}^L \frac{cov(i)}{L} is the average coverage along the molecule, which serves as the expected coverage at every position under a uniform read distribution.
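The chi-square statistic can be sketched in a few lines, again assuming coverage is given as a list of per-position read counts. The function name `coverage_chi_square` is hypothetical. The statistic is undefined when no read covers the transcript, so this sketch raises an error in that case.

```python
def coverage_chi_square(cov):
    """Return sum((cov(i) - mu)^2 / mu), with mu the average coverage."""
    L = len(cov)
    if L == 0:
        raise ValueError("transcript length L must be positive")
    mu = sum(cov) / L  # average coverage along the molecule
    if mu == 0:
        raise ValueError("chi-square is undefined for a transcript without reads")
    return sum((c - mu) ** 2 for c in cov) / mu


print(coverage_chi_square([3, 3, 3, 3]))  # perfectly uniform coverage -> 0.0
print(coverage_chi_square([0, 6, 0, 6]))  # mu = 3, four deviations of 3 -> 12.0
```

Larger values indicate a stronger departure from uniform coverage; note that the statistic also grows with transcript length and sequencing depth.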
The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean of a probability distribution:
\textrm{CV}= \frac{\sigma}{\mu}, with standard deviation \sigma= \sqrt{\sum\limits_{i=1}^L \frac{(\overline{cov}(i))^2}{L}} and mean \mu= \sum\limits_{i=1}^L \frac{\overline{cov}(i)}{L}
of transformed coverage values \overline{cov}(i)= \frac{3}{2} \left(\frac{cov(i)^{\frac{2}{3}}-\mu^{\frac{2}{3}}}{\mu^{\frac{1}{6}}}\right)
This Anscombe transformation of coverage values has been proposed [Hansen et al. 2010] under the assumption that the distribution of reads along a transcript follows a Poisson distribution, which the transformation converts to an approximately Gaussian distribution.
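The following sketch applies the formulas above literally. Two points are assumptions rather than statements from the text: the \mu inside the transformation is taken to be the average raw coverage, because the symbol is used for both the raw and the transformed mean, and the function names are hypothetical. Since the mean of the transformed values can be zero, for example under perfectly uniform coverage, the sketch returns the standard deviation and mean alongside the ratio and reports the ratio as NaN in that case.

```python
import math


def anscombe_transform(cov):
    """Transform coverage: 3/2 * (cov(i)^(2/3) - mu^(2/3)) / mu^(1/6)."""
    L = len(cov)
    if L == 0:
        raise ValueError("transcript length L must be positive")
    mu = sum(cov) / L  # assumption: mu here is the average raw coverage
    if mu <= 0:
        raise ValueError("transformation is undefined without reads")
    return [1.5 * (c ** (2 / 3) - mu ** (2 / 3)) / mu ** (1 / 6) for c in cov]


def transformed_cv(cov):
    """Return (sigma, mean, CV) of the transformed coverage values."""
    t = anscombe_transform(cov)
    L = len(t)
    sigma = math.sqrt(sum(x * x for x in t) / L)  # sigma as written in the text
    mean_t = sum(t) / L                           # mean of transformed values
    cv = sigma / mean_t if mean_t != 0 else math.nan
    return sigma, mean_t, cv


sigma, mean_t, cv = transformed_cv([0, 2, 5, 0, 1, 3, 0, 0])
print(sigma, mean_t, cv)
```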