Name | Variable | Default Value | Parameter Range | Description |
---|---|---|---|---|
REF_FILE | | | | file from which the reference annotation (GTF format) is read |
LOAD_CODING | | true | {true,false} | flag to consider or disregard transcripts that have an annotated coding sequence |
LOAD_NONCODING | | true | {true,false} | flag to consider or disregard transcripts that are annotated as non-coding |
PRO_FILE | | | | file to which the simulated expression values are written |
LIB_FILE | | | | file to which the expressed transcript molecules are written |
NB_MOLECULES | | 5,000,000 | >0 | number of expressed RNA molecules simulated |
EXPRESSION_K | | -0.6 | | exponent of the expression power law ("Pareto coefficient") |
EXPRESSION_X0 | | 9,500 | | controls the exponential decay |
EXPRESSION_X1 | | 9,500² | | controls the exponential decay |

At the start, the Flux Simulator reads the transcripts of the reference annotation and clusters genomically overlapping ones into loci. It then assigns a random expression profile in which not necessarily all transcripts of the reference are expressed. Expression levels follow
\[y=x^{k}\exp\left(-\frac{x}{a}-\left(\frac{x}{b}\right)^{2}\right)\]
where |
Output: The first 6 columns of the PRO_FILE
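
As a rough illustration of the expression law above, the following Python sketch evaluates the formula for every rank and scales the result to a fixed number of molecules. It rests on several assumptions that the text does not confirm: the power-law term is read as the rank x raised to k, the symbols a and b are taken to correspond to EXPRESSION_X0 and EXPRESSION_X1, and relative abundances are converted to molecule counts by simple rounding. The function name `expression_profile` is hypothetical and not part of the Flux Simulator.

```python
import math


def expression_profile(n_transcripts, nb_molecules=5_000_000,
                       k=-0.6, a=9_500.0, b=9_500.0 ** 2):
    """Return (relative abundances, molecule counts) for ranks 1..n_transcripts.

    Assumption: a and b play the roles of EXPRESSION_X0 and EXPRESSION_X1.
    """
    # Unnormalised expression level for each expression rank x.
    weights = [
        x ** k * math.exp(-x / a - (x / b) ** 2)
        for x in range(1, n_transcripts + 1)
    ]
    total = sum(weights)
    # Relative abundance of each rank (fractions sum to 1).
    fractions = [w / total for w in weights]
    # Assumption: counts are obtained by rounding the expected number of
    # molecules; low ranks may round to 0, i.e. remain unexpressed.
    counts = [round(f * nb_molecules) for f in fractions]
    return fractions, counts


if __name__ == "__main__":
    fractions, counts = expression_profile(10_000)
    print("top-ranked transcript:", fractions[0], counts[0])
    print("lowest-ranked transcript:", fractions[-1], counts[-1])
```

Because rounding can yield zero molecules for low-abundance ranks, the sketch also shows how a profile can leave some transcripts of the reference unexpressed.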
After the number of RNA molecules has been determined for each transcript, the in silico expressed transcripts are assigned individual variations in transcription start and in the length of the attached poly-A tail. The Flux Simulator models differences in transcription start by random variables under an exponential model with a mean of around 10 nt. During polyadenylation in the nucleus, usually 200-250 adenine residues are added to the primary transcript. Disregarding other polyadenylation mechanisms, such as cytoplasmic polyadenylation, and the exact mechanisms of degradation by exo- and endonucleases, our model describes poly-A lengths by random sampling from a Gaussian distribution with a mean of 125 nt and a shape adapted such that >99.5% of the random variables fall in the interval [0;250].
Requires: PRO_FILE_NAME Columns 1-4
Outputs: PRO_FILE_NAME Columns 5 (relative abundance) and 6 (molecule count), both after gene expression
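
The per-molecule variation in transcription start and poly-A length described above can be sketched in Python as follows. This is a minimal sketch, not the Flux Simulator's own code: the standard deviation is derived here as the largest value for which 99.5% of a Gaussian centred at 125 nt lies in [0;250], and both the rounding to whole nucleotides and the clamping of the rare out-of-range samples are assumptions. All names (`sample_molecule`, `POLYA_SD`, and so on) are hypothetical.

```python
import random
from statistics import NormalDist

TSS_MEAN = 10.0      # mean transcription start offset in nt (from the text)
POLYA_MEAN = 125.0   # mean poly-A tail length in nt (from the text)
POLYA_MAX = 250.0    # upper end of the interval [0;250]
COVERAGE = 0.995     # share of samples that should fall in [0;250]

# Two-sided coverage of 99.5% leaves 0.25% in each tail, so the interval
# half-width (125 nt) must span z = inv_cdf(0.9975) standard deviations.
Z = NormalDist().inv_cdf(1.0 - (1.0 - COVERAGE) / 2.0)
POLYA_SD = (POLYA_MAX - POLYA_MEAN) / Z


def sample_molecule(rng):
    """Return (transcription start offset, poly-A length) in whole nt."""
    # Exponential model: expovariate takes the rate, i.e. 1 / mean.
    tss_offset = round(rng.expovariate(1.0 / TSS_MEAN))
    # Gaussian model; clamping the tails to [0;250] is an assumption.
    polya = rng.gauss(POLYA_MEAN, POLYA_SD)
    polya_len = round(min(max(polya, 0.0), POLYA_MAX))
    return tss_offset, polya_len


if __name__ == "__main__":
    rng = random.Random(42)
    print("poly-A standard deviation:", POLYA_SD)
    for _ in range(5):
        print(sample_molecule(rng))
```

Any smaller standard deviation would also satisfy the >99.5% condition, so the value computed here should be read as an upper bound on the spread rather than as the simulator's actual setting.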