
| Parameter Name | Variable | Default Value | Parameter Range | Description |
|---|---|---|---|---|
| REF_FILE | | | | file from which the reference annotation (GTF format) is read |
| LOAD_CODING | | true | {true,false} | flag to consider or disregard transcripts that have an annotated coding sequence |
| LOAD_NONCODING | | true | {true,false} | flag to consider or disregard transcripts that are annotated as non-coding |
| PRO_FILE | | | | file to which the simulated expression values are written |
| LIB_FILE | | | | file to which the expressed transcript molecules are written |
| NB_MOLECULES | | 5,000,000 | >0 | number of expressed RNA molecules simulated |
| EXPRESSION_K | k | -0.6 | | exponent of the expression power law ("Pareto coefficient") |
| EXPRESSION_X0 | x_0 | 9,500 | | parameter of the exponential decay |
| EXPRESSION_X1 | x_1 | 9,500^2 | | parameter of the exponential decay |
| TSS_MEAN | | | | |
| POLYA_SCALE | | | | |
| POLYA_SHAPE | | | | |

The Distribution of Gene Expression Levels

Input: reference annotation (REF_FILE), transcript filtering parameters (LOAD_CODING, LOAD_NONCODING), expression parameters (NB_MOLECULES, EXPRESSION_K, EXPRESSION_X0, EXPRESSION_X1)

The Flux Simulator first reads the transcripts of the reference annotation (REF_FILE) and clusters those that overlap on the genome into loci. Transcripts annotated as coding or as non-coding can be selectively disregarded (LOAD_CODING, LOAD_NONCODING). It then assigns a random expression profile in which not necessarily all transcripts of the reference are expressed. Expression levels are related to the relative expression rank x by a mixed power and exponential law (a modified Zipf's law), where x denotes the rank number of a gene, k (EXPRESSION_K) is the exponent of the intrinsic power law, and x_0 and x_1 (EXPRESSION_X0, EXPRESSION_X1) control the exponential decay.

The Flux Simulator randomly assigns expression ranks to the transcripts in the reference annotation. The modified Zipf's law turns these ranks into relative expression levels, and multiplying each level by the total number of molecules (NB_MOLECULES) gives the initial number of molecules per transcript. Default values for these parameters have been estimated for mammalian cells by non-linear fitting to expression levels observed in experimental results.
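The rank-to-expression step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Flux Simulator's own code: the functional form (x/x_0)^k * exp(-x/x_0 - x^2/x_1) is an assumed instance of a mixed power and exponential law that matches the parameter descriptions, and the function names `relative_expression` and `molecule_counts` are hypothetical.

```python
import math
import random


def relative_expression(n_transcripts, k=-0.6, x0=9500.0, x1=9500.0 ** 2, seed=0):
    """Assign random ranks and return (ranks, relative fractions).

    ASSUMPTION: the law is taken to be (x/x0)^k * exp(-x/x0 - x^2/x1);
    the exact form used by the Flux Simulator may differ.
    """
    rng = random.Random(seed)
    ranks = list(range(1, n_transcripts + 1))
    rng.shuffle(ranks)  # each transcript gets a random expression rank

    weights = [(x / x0) ** k * math.exp(-x / x0 - x * x / x1) for x in ranks]
    total = sum(weights)
    # Normalise so that the relative fractions sum to 1.
    return ranks, [w / total for w in weights]


def molecule_counts(fractions, nb_molecules=5_000_000):
    """Scale relative fractions to absolute molecule numbers (NB_MOLECULES)."""
    return [round(f * nb_molecules) for f in fractions]


if __name__ == "__main__":
    ranks, fractions = relative_expression(1000)
    counts = molecule_counts(fractions)
    for rank, frac, count in sorted(zip(ranks, fractions, counts))[:3]:
        print(rank, f"{frac:.6f}", count)
```

Because the counts are rounded, transcripts with a very low rank-derived fraction can end up with zero molecules, which is one way a profile arises in which not all transcripts are expressed.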

Output: Columns 1-6 of the PRO_FILE, i.e., (1) locus name, (2) transcript identifier, (3) coding flag, (4) length of the processed transcript, (5) relative fraction and (6) absolute number of the transcript species in the initial RNA extraction.

Transcript Modifications during Expression

Input: Column 6 of the PRO_FILE, i.e., the absolute number of RNA molecules simulated for a given transcript in the experiment, and the parameters for variation in transcription start (TSS_MEAN) and poly-A tail length (POLYA_SCALE, POLYA_SHAPE).

Once the number of RNA copies has been determined for each transcript, these in silico expressed transcripts are assigned individual variations in transcription start and in the length of the attached poly-A tail. The Flux Simulator models deviations from the annotated transcription start by an exponential distribution with an adjustable mean value (TSS_MEAN). During poly-adenylation in the nucleus, usually 200-250 adenine residues are added to the primary transcript. Disregarding other poly-adenylation mechanisms (e.g., cytoplasmic polyadenylation), the Flux Simulator describes poly-A lengths by a flexible Weibull distribution (POLYA_SCALE, POLYA_SHAPE).
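The two per-molecule variations can be sketched with Python's standard `random` module, which provides both distributions. The numeric values below are illustrative placeholders only, since this page lists no defaults for TSS_MEAN, POLYA_SCALE or POLYA_SHAPE. The function names are hypothetical, and how the sampled offset is applied to the annotated start is left open.

```python
import random

# ASSUMPTION: illustrative values, not Flux Simulator defaults.
TSS_MEAN = 25.0      # mean of the exponential distribution (nucleotides)
POLYA_SCALE = 250.0  # Weibull scale parameter
POLYA_SHAPE = 2.0    # Weibull shape parameter


def simulate_molecule(rng, tss_mean=TSS_MEAN,
                      polya_scale=POLYA_SCALE, polya_shape=POLYA_SHAPE):
    """Sample one (TSS offset, poly-A length) pair for a single molecule."""
    # expovariate takes the rate, i.e. 1 / mean.
    tss_offset = int(rng.expovariate(1.0 / tss_mean))
    # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    polya_length = int(round(rng.weibullvariate(polya_scale, polya_shape)))
    return tss_offset, polya_length


def simulate_transcript(n_copies, seed=0):
    """Sample variations for every copy of one transcript species."""
    rng = random.Random(seed)
    return [simulate_molecule(rng) for _ in range(n_copies)]


if __name__ == "__main__":
    for offset, tail in simulate_transcript(5):
        print(f"TSS offset: {offset:4d} nt   poly-A length: {tail:4d} nt")
```

Here `n_copies` corresponds to the absolute number of molecules from column 6 of the PRO_FILE, so each transcript species yields one sampled pair per simulated molecule.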

Output: The LIB_FILE, with one line per simulated transcript molecule.
