Child pages
  • 4.3.1 - RNA Hydrolysis

Parameters

Parameter Name | Variable | Default Value | Parameter Range | Description
--- | --- | --- | --- | ---
FRAG_UR_D0 | D_0 | 1 | >0 | minimum length of fragments produced by hydrolysis
FRAG_UR_DELTA | \delta | NaN¹ | \delta \in \{\textrm{NaN}, \mathbb{R}^+\} | geometry of the fragmentation process (1=linear, 2=surface-diameter, 3=volume-diameter, etc.); if not explicitly specified (NaN), the geometry of breakage depends logarithmically on the molecule length
FRAG_UR_ETA | \eta | NaN¹ | \eta \in \{\textrm{NaN}, \mathbb{R}^+\} | intensity of fragmentation, determining the number of breaks per unit length; if not explicitly specified (NaN), \eta is determined by the corresponding \delta value and an expected fragment length of 200 nt (or the mean filtered fragment size, if size selection is used)

1 NaN stands for "Not a Number" and marks the uninitialized state of a parameter

Algorithm

Frequencies f(d) of fragment sizes d produced by a uniform random fragmentation process have been shown to follow Weibull distributions with shape \delta and scale \eta if the fragmentation thermodynamics depends on the molecule size:

f(d)= \frac{\delta}{\eta} \left(\frac{d}{\eta}\right)^{\delta- 1} \exp\left(-\left(\frac{d}{\eta}\right)^{\delta}\right)

Scale parameter \eta represents the intensity of fragmentation (i.e., breaks per unit length). As a determinant of the mean expected fragment size, it is assumed to be constant across molecules of different lengths for fragmentation protocols where the number of produced fragments depends on the molecule length. Shape parameter \delta reflects the geometric relation in which random fragmentation breaks a molecule (e.g., \delta= 1 corresponds to uniform fragmentation on the linear chain of nucleotides, \delta= 2 splits the surface uniformly, and \delta= 3 the volume).
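To make the roles of the two parameters concrete, the density above can be evaluated directly. The following is a minimal sketch, not code from the Flux Simulator; the function name `weibull_pdf` and its argument names are chosen here for illustration only.

```python
import math

def weibull_pdf(d, delta, eta):
    """Weibull density f(d) with shape delta and scale eta.

    Hypothetical helper for illustration. It is defined here for d > 0
    and returns 0.0 for all other values of d.
    """
    if d <= 0:
        return 0.0
    z = d / eta
    return (delta / eta) * z ** (delta - 1) * math.exp(-(z ** delta))

# Compare the density at a few fragment sizes for two geometries,
# both with scale eta = 200.
for size in (50.0, 200.0, 400.0):
    print(size, weibull_pdf(size, 1.0, 200.0), weibull_pdf(size, 2.0, 200.0))
```

With \delta= 1 the expression reduces to an exponential density with mean \eta, which matches the description of uniform breakage along the linear chain.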

The Flux Simulator uses a 3-step algorithm to split a molecule into fragments. First, the geometry \delta and the number n of fragments obtained from the molecule are determined. We found empirically that parameter \delta depends logarithmically on len, the length of the molecule that is fragmented: \delta= log(len). The number of fragments produced from a specific RNA molecule is determined by n= \frac{len}{E(d_{max})}, where E(d_{max}) is the expectancy of the most abundant fragment size, computed from \eta and the gamma function \Gamma of \delta:

E(d_{max})= \eta \Gamma(\frac{1}{\delta}+ 1)
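The first step can be sketched in a few lines. The expression \eta \Gamma(\frac{1}{\delta}+ 1) is also the mean of a Weibull distribution with shape \delta and scale \eta. The sketch below rests on assumptions that the text does not state: the logarithm is taken as the natural logarithm, n is rounded to the nearest integer with a minimum of 1, and the function names are hypothetical.

```python
import math

def expected_fragment_size(eta, delta):
    """E(d_max) = eta * Gamma(1/delta + 1)."""
    return eta * math.gamma(1.0 / delta + 1.0)

def fragment_count(length, eta, delta=None):
    """Number n of fragments for a molecule of the given length.

    If delta is None (the NaN case), it is derived from the length.
    Assumptions: natural logarithm, rounding to the nearest integer,
    and at least one fragment per molecule.
    """
    if delta is None:
        delta = math.log(length)  # requires length > 1
    if delta <= 0:
        raise ValueError("delta must be positive")
    return max(1, round(length / expected_fragment_size(eta, delta)))

print(fragment_count(1000, 200.0, 1.0))  # linear geometry
print(fragment_count(1000, 200.0))       # delta derived from the length
```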

Second, (n-1) breakpoints are sampled uniformly from the interval [0;1[, resulting in relative length fractions x_1, \ldots, x_n for all n fragments. Third, the relative fragment sizes x_i are transformed from unit space to sizes d_i that follow a Weibull distribution of shape \delta by:

d_i= \left(\frac{x_i}{C}\right)^{\frac{1}{\delta}}

where C= \left(\frac{len}{\sum_{j}(x_j^{1/\delta})}\right)^{-\delta} is a constant of the transformation that ensures that the sizes of the n fragments sum exactly to the given molecule length len. This transformation produces a slightly distorted Weibull distribution for the sizes d_i; however, the deviation is sufficiently small to be neglected in our applications.
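The three steps described above can be put together as follows. This is a sketch under stated assumptions rather than the Flux Simulator implementation: the function name `fragment_molecule` is hypothetical, the logarithm is taken as natural, n is rounded to the nearest integer, and fragment sizes are left as real numbers (rounding to whole nucleotides and the minimum length D_0 are not handled).

```python
import math
import random

def fragment_molecule(length, eta, delta=None, rng=random):
    """Return fragment sizes d_1..d_n that sum to `length`."""
    # Step 1: geometry delta and number of fragments n.
    if delta is None:
        delta = math.log(length)  # assumption: natural log, length > 1
    mean_size = eta * math.gamma(1.0 / delta + 1.0)
    n = max(1, round(length / mean_size))  # assumption: nearest integer

    # Step 2: (n - 1) uniform breakpoints in [0, 1) give n relative
    # length fractions x_i that sum to 1.
    cuts = sorted(rng.random() for _ in range(n - 1))
    bounds = [0.0] + cuts + [1.0]
    x = [b - a for a, b in zip(bounds, bounds[1:])]

    # Step 3: d_i = (x_i / C)^(1/delta). Because
    # C^(-1/delta) = length / sum_j x_j^(1/delta), this is the same as
    # rescaling the powered fractions so that they sum to `length`.
    powered = [xi ** (1.0 / delta) for xi in x]
    scale = length / sum(powered)
    return [p * scale for p in powered]

sizes = fragment_molecule(1000, 200.0, 1.0, random.Random(42))
print(len(sizes), sum(sizes))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing fragment size distributions across parameter settings.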
