The Flux Simulator provides several statistical indicators that measure how uniformly the reads produced by in silico sequencing are distributed along a transcript.
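The fraction of a transcript that is covered by reads reflects both its expression level and the degree of coverage fluctuation caused by biases. It is computed as

$$ f = \frac{1}{L} \sum_{i=1}^{L} \operatorname{sgn}(c_i) $$

where $L$ is the length of the transcript, $c_i$ is the coverage at position $i$, and the sign function

$$ \operatorname{sgn}(c_i) = \begin{cases} 1 & \text{if } c_i > 0 \\ 0 & \text{otherwise} \end{cases} $$

indicates whether a position is covered by at least one sequenced read.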
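The covered fraction is simply the share of positions with non-zero coverage. A minimal Python sketch, assuming the per-base coverage of one transcript is available as a list of non-negative integers; the function name `coverage_fraction` is hypothetical and not part of the Flux Simulator:

```python
def coverage_fraction(coverage: list[int]) -> float:
    """Fraction of positions covered by at least one read: (1/L) * sum(sgn(c_i))."""
    if not coverage:
        raise ValueError("coverage profile must not be empty")
    # sgn(c_i) contributes 1 for a covered position (c_i > 0) and 0 otherwise
    covered = sum(1 for c in coverage if c > 0)
    return covered / len(coverage)


# Hypothetical coverage profile of a transcript of length L = 8
profile = [0, 2, 3, 0, 1, 4, 0, 1]
print(coverage_fraction(profile))  # 0.625 (5 of 8 positions covered)
```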
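Pearson's chi-square statistic can be used to test the goodness of fit of a given sample to a theoretical distribution. Given a transcript of length $L$ with coverage $c_i$ at position $i$, the test statistic is defined as follows:

$$ \chi^2 = \sum_{i=1}^{L} \frac{(c_i - \bar{c})^2}{\bar{c}} $$

where $\bar{c} = \frac{1}{L} \sum_{i=1}^{L} c_i$ is the average coverage along the molecule.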
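Here the theoretical distribution is uniform coverage, so the expected value at every position is the mean coverage. A minimal Python sketch, assuming a per-base coverage list as input; `chi_square_uniformity` is a hypothetical name:

```python
def chi_square_uniformity(coverage: list[int]) -> float:
    """Pearson chi-square of a coverage profile against uniform (mean) coverage."""
    length = len(coverage)
    if length == 0:
        raise ValueError("coverage profile must not be empty")
    mean = sum(coverage) / length  # average coverage c-bar
    if mean == 0:
        raise ValueError("statistic is undefined for a transcript without reads")
    # Sum of squared deviations from the mean, each divided by the expected value
    return sum((c - mean) ** 2 for c in coverage) / mean


print(chi_square_uniformity([3, 3, 3]))  # 0.0 (perfectly uniform)
print(chi_square_uniformity([2, 4, 6]))  # 2.0
```

Larger values indicate stronger deviation from uniform coverage. If coverage is assumed to be Poisson-distributed with a constant rate, the statistic can be compared against a chi-square distribution with $L - 1$ degrees of freedom; that reading is an assumption of this sketch, not something the Flux Simulator is stated to do.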
The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean of a probability distribution:

$$ \mathrm{CV} = \frac{\sigma}{\mu} $$

with standard deviation $\sigma$ and mean $\mu$ of the transformed coverage values.
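The CV itself is a scale-free dispersion measure. A minimal Python sketch, assuming the population standard deviation (division by $L$ rather than $L - 1$), which may differ from the simulator's choice; `coefficient_of_variation` is a hypothetical name:

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """CV = sigma / mu, using the population standard deviation (an assumption)."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("CV is undefined for a zero mean")
    sigma = statistics.pstdev(values, mu)  # population standard deviation
    return sigma / mu


print(coefficient_of_variation([5.0, 5.0, 5.0]))  # 0.0
print(coefficient_of_variation([2.0, 4.0, 6.0]))  # about 0.408
```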
The Anscombe transformation of the coverage $c_i$ at position $i$, $c'_i = 2\sqrt{c_i + 3/8}$, has been proposed for this purpose [Hansen et al. 2010]. It assumes that reads are distributed along a transcript according to a Poisson distribution and transforms that distribution into an approximately Gaussian one.
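Combining the two steps, each coverage value is first passed through the Anscombe transformation and the CV is then taken over the transformed values. A minimal Python sketch, again assuming the population standard deviation; `anscombe` and `anscombe_cv` are hypothetical names:

```python
import math
import statistics


def anscombe(c: float) -> float:
    """Anscombe transformation 2 * sqrt(c + 3/8) for a Poisson-distributed count."""
    return 2.0 * math.sqrt(c + 3.0 / 8.0)


def anscombe_cv(coverage: list[int]) -> float:
    """CV of Anscombe-transformed coverage values (population std, an assumption)."""
    if not coverage:
        raise ValueError("coverage profile must not be empty")
    transformed = [anscombe(c) for c in coverage]
    # The transformed mean is always > 0, since anscombe(0) = 2 * sqrt(3/8)
    mu = statistics.fmean(transformed)
    return statistics.pstdev(transformed, mu) / mu


print(anscombe_cv([3, 3, 3]))        # 0.0
print(anscombe_cv([0, 1, 10, 2]))    # positive for uneven coverage
```

Because the transformed mean is strictly positive, this variant, unlike the plain CV, stays defined even for a transcript with no reads at all.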