Some work in the field assumes that fragment lengths are distributed according to the observed insert size distribution, and that such fragments are sampled uniformly at random from the original transcript sequences. What were the reasons to choose a Weibull distribution for modeling the intermediary result of fragmentation?
1 Comment
Micha Sammeth
The motivation for the fragmentation model of hydrolysis we propose is based on observations from spike-in experiments carried out by the Wold lab, where the spiked-in sequences differ substantially in length, from <400 nt to >10,000 nt. These spike-ins produce fragment size distributions that clearly differ from each other (Fig. 2A of the paper). On the other hand, previously published biophysics literature has shown that, when mixtures of molecules of different lengths are fragmented, uniform random fragmentation processes produce Weibull distributions whose parameters depend on the length of the individual molecule.
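As a purely illustrative sketch of that effect, the toy simulation below cuts molecules of different lengths at uniformly random positions and fits a Weibull to the resulting fragment lengths. The cut rate, the Poisson choice for the number of cuts, and the fitting setup are assumptions made for this example only; they are not parameters taken from the paper or from the Flux Simulator code.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)

def fragment_uniformly(length, cut_rate=1 / 600, rng=rng):
    """Cut one molecule at uniformly random positions.

    The number of cuts is Poisson with mean proportional to the molecule
    length (cut_rate is an illustrative value, not taken from the paper).
    Returns the lengths of the resulting fragments.
    """
    n_cuts = rng.poisson(cut_rate * length)
    cuts = np.sort(rng.uniform(0, length, size=n_cuts))
    bounds = np.concatenate(([0.0], cuts, [float(length)]))
    return np.diff(bounds)

# Spike-in-like molecules spanning a wide length range (cf. <400 nt to >10,000 nt).
for transcript_len in (400, 2000, 10000):
    frags = np.concatenate(
        [fragment_uniformly(transcript_len) for _ in range(5000)]
    )
    # Fit a two-parameter Weibull (location fixed at 0) to the fragment lengths;
    # the fitted parameters differ with the length of the starting molecule.
    shape, _, scale = weibull_min.fit(frags, floc=0)
    print(f"L={transcript_len:>6} nt  Weibull shape={shape:.2f}  scale={scale:.0f}")
```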
The physics behind the model is that longer molecules are less likely to be completely linearized when random fragmentation is carried out, and the corresponding geometric factor ("delta" in the model) leads to fragment size distributions as in Fig. 2C. We carried out in silico size selection on these simulated distributions, employing empirically measured insert sizes (Fig. 2B), and we found that these models reproduce our observations from RNA-Seq fairly well (Fig. 2A). Sampling fragments uniformly at random from the transcripts obviously would not reproduce these length-dependent differences.
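For readers who want a concrete picture of such an in silico size selection, here is a minimal sketch that retains each simulated fragment with probability proportional to an empirical insert-size histogram. The Weibull shape standing in for "delta", the toy scale, and the synthetic insert sizes are all illustrative assumptions, not the distributions or values used in the paper or in the Flux Simulator.

```python
import numpy as np

rng = np.random.default_rng(1)

def size_select(fragment_lengths, observed_insert_sizes, rng=rng):
    """In silico size selection by rejection sampling.

    Each simulated fragment is kept with probability proportional to the
    empirical insert-size density at its length, built here as a simple
    histogram of observed insert sizes (e.g. from a sequenced library).
    """
    density, edges = np.histogram(observed_insert_sizes, bins=50, density=True)
    # Acceptance probability, scaled so the most frequent insert size is always kept.
    idx = np.clip(np.digitize(fragment_lengths, edges) - 1, 0, len(density) - 1)
    accept_p = density[idx] / density.max()
    # Fragments outside the observed insert-size range are never retained.
    in_range = (fragment_lengths >= edges[0]) & (fragment_lengths <= edges[-1])
    keep = in_range & (rng.uniform(size=len(fragment_lengths)) < accept_p)
    return fragment_lengths[keep]

# Illustrative inputs: Weibull-distributed fragments and a made-up insert-size
# sample centred around ~200 nt (stand-ins for the empirical data in Fig. 2B).
fragments = rng.weibull(a=1.5, size=100000) * 400   # "delta"-like shape, toy scale
inserts = rng.normal(loc=200, scale=40, size=20000)
selected = size_select(fragments, inserts)
print(f"kept {len(selected)} of {len(fragments)} fragments, "
      f"mean length {selected.mean():.0f} nt")
```

Under this kind of filter, starting distributions that differ by molecule length yield different post-selection fragment populations, whereas a single length-independent sampling scheme would not.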