The PAR format in the Flux Simulator holds all parameters of a run. It is a simple format of key-value pairs (one per line) with the following parameter names (i.e., keys):
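As a rough illustration of how such a file could be read, the Python sketch below collects the key-value lines into a dictionary. It assumes that key and value are separated by whitespace and that lines starting with `#` are comments; neither detail is stated on this page. `parse_par` and the example file content are hypothetical and not part of the Flux Simulator.

```python
def parse_par(text):
    """Parse PAR-style text into a dict mapping keys to value strings.

    Assumptions (not confirmed by this page): key and value are separated
    by whitespace, and lines starting with '#' are comments.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed comment lines
        parts = line.split(None, 1)  # split at the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[key] = value
    return params


# Hypothetical example content, for illustration only
example = """\
# minimal run description
REF_FILE_NAME  annotation.gtf
GEN_DIR        genome/
NB_MOLECULES   5000000
"""
params = parse_par(example)
print(params["NB_MOLECULES"])  # prints 5000000
```

Values are kept as strings here; converting them to the types listed in the tables below (Boolean, Long, Double, ...) would be a separate step.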

File Locations

Key | Type | Default Value | Description
REF_FILE_NAME | String | | Path to the GTF reference annotation, either absolute or relative to the location of the parameter file.
PRO_FILE_NAME | String | {REF_FILE_NAME}.PRO | Path to the profile of the run, either absolute or relative to the location of the parameter file; the default profile uses the name of the parameter file with the extension .pro.
LIB_FILE_NAME | String | {REF_FILE_NAME}.LIB | Path to the library file of the run, either absolute or relative to the location of the parameter file; the default library file uses the name of the parameter file with the extension .lib.
SEQ_FILE_NAME | String | {REF_FILE_NAME}.BED | Path to the sequencing file of the run, either absolute or relative to the location of the parameter file; the default sequencing file uses the name of the parameter file with the extension .bed.
GEN_DIR | String | | Path to the directory with the genomic sequences, i.e., one FASTA file per chromosome/scaffold/contig, with a file name corresponding to the identifiers in the first column of the GTF annotation.
TMP_DIR | String | $TMP_DIR | Temporary directory; can also be specified by the environment variable $TMP_DIR.
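The rule that paths are "either absolute or relative to the location of the parameter file" can be sketched as follows. `resolve_par_path` is a hypothetical helper written for this illustration; how the Flux Simulator resolves paths internally is not shown on this page.

```python
from pathlib import Path


def resolve_par_path(par_file, value):
    """Resolve a path value taken from a parameter file.

    Absolute values are returned unchanged; relative values are interpreted
    relative to the directory that contains the parameter file.
    """
    value_path = Path(value)
    if value_path.is_absolute():
        return value_path
    return Path(par_file).parent / value_path


# A relative annotation path next to a parameter file in the folder "run1"
print(resolve_par_path("run1/simulation.par", "annotation.gtf"))
```

With this rule, a parameter file and its input files can be moved together to another directory without editing the paths.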

Expression

Key | Type | Default Value | Description
LOAD_CODING | Boolean | YES | Coding messengers, i.e., transcripts that have an annotated CDS, are extracted from the cell.
LOAD_NONCODING | Boolean | YES | Non-coding RNAs, i.e., transcripts without an annotated ORF, are extracted from the cell.
NB_MOLECULES | Long | 5,000,000 | Number of RNA molecules initially in the experiment.
EXPRESSION_K | Double | -0.6 | Exponent of the power law underlying the expression profile [-1;0].
EXPRESSION_X0 | Double | 9,500 | Linear parameter of the exponential decay.
EXPRESSION_X1 | Double | 90,250,000 | Quadratic parameter of the exponential decay.

Transcript Modifications

Key | Type | Default Value | Description
TSS_MEAN | Double | 25 | Rate of the exponential distribution for the deviation of simulated transcription starts from the annotated transcription start point; set to NaN (i.e., "not a number") to deactivate simulated transcription start variability.
POLYA_SCALE | Double | 300 | Scale parameter of the Weibull distribution describing poly-A tail lengths; set to NaN (i.e., "not a number") to deactivate simulated poly-A tails.
POLYA_SHAPE | Double | 2 | Shape parameter of the Weibull distribution describing poly-A tail lengths; set to NaN (i.e., "not a number") to deactivate simulated poly-A tails.
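To give a feeling for what the poly-A parameters mean, the following Python sketch draws tail lengths from a Weibull distribution with the default scale and shape. It only illustrates the distribution named above: `sample_polya_length` is a hypothetical helper, and rounding to whole nucleotides as well as returning 0 for deactivated tails are assumptions of this sketch, not documented simulator behaviour.

```python
import math
import random


def sample_polya_length(scale=300.0, shape=2.0, rng=random):
    """Draw one poly-A tail length (in nt) from a Weibull distribution.

    Returns 0 if scale or shape is NaN, mirroring the "set to NaN to
    deactivate" convention (the return value 0 is an assumption).
    """
    if math.isnan(scale) or math.isnan(shape):
        return 0
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
    return int(round(rng.weibullvariate(scale, shape)))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
lengths = [sample_polya_length(rng=rng) for _ in range(20000)]
print(sum(lengths) / len(lengths))  # sample mean of the simulated tail lengths
```

For scale 300 and shape 2 the theoretical mean is 300 · Γ(1.5) ≈ 266 nt, so the sample mean printed above should be close to that value.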

Library preparation

Fragmentation

Key | Type | Default Value | Description
FRAGMENTATION | Boolean | YES | Turn fragmentation on/off.
FRAG_SUBSTRATE | {DNA,RNA} | RNA (DNA in Simulator v1.2 and earlier) | Substrate of fragmentation; determines the order of fragmentation and reverse transcription (RT): for substrate DNA, fragmentation is carried out after RT; substrate RNA triggers fragmentation before RT.
FRAG_METHOD | {EZ,NB,UR} | UR | Fragmentation method employed: [EZ] fragmentation by enzymatic digestion; [NB] fragmentation by nebulization; [UR] uniform random fragmentation.


Enzymatic Digestion

FRAG_EZ_MOTIF | String | | Sequence motif caused by selective restriction with an enzyme; choose the pre-defined NlaIII or DpnII, or a file with a custom position weight matrix.

Nebulization

FRAG_NB_LAMBDA | Double | 900.0 | Threshold on molecule length that cannot be broken by the shearfield of nebulization.
FRAG_NB_THOLD | Double | 0.1 | Threshold on the fraction of the molecule population; if fewer molecules break per time unit, convergence to steady state is assumed.
FRAG_NB_M | Double | 1.0 | Strength of the nebulization shearfield (i.e., rotor speed).

Uniform Random (UR) Fragmentation

FRAG_UR_ETA | Double | NaN | Average expected fragment size after fragmentation, i.e., number of breaks per unit length (exhaustiveness of fragmentation); NaN optimizes the fragmentation process with respect to the size filtering.
FRAG_UR_DELTA | Double | NaN | Geometry of molecules in the UR process: NaN = depends logarithmically on molecule length; 1 = always linear; 2 = always surface-diameter; 3 = volume-diameter, ...
FRAG_UR_D0 | Double | 1.0 | Minimum length of fragments produced by UR fragmentation.

Reverse Transcription (RT)

...

Key | Type | Default Value | Description
RTRANSCRIPTION | Boolean | YES | Switch reverse transcription on/off.
RT_PRIMER | {RH,PDT} | RH | Primers used for first strand synthesis of the reverse transcription: [RH] for random hexamers or [PDT] for poly-dT primers.
RT_MIN | Integer | 500 | Minimum length (in nt) of the expected reverse-transcribed cDNA molecules, i.e., the minimum fragment length observed after reverse transcription of full-length transcripts.
RT_MAX | Integer | 5,500 | Maximum length (in nt) of the expected reverse transcription products, i.e., the maximum fragment length observed after reverse transcription of full-length transcripts.

Filtering

Key | Type | Default Value | Description
FILTERING | Boolean | NO | Switches size selection on/off.
SIZE_DISTRIBUTION | String | default | Size distribution of fragments after filtering, either specified by the fully qualified path of a file with an empirical distribution where each line represents the length of a read (no ordering required), or attributes of a Gaussian distribution (mean and standard deviation) in the form , for example . If no size distribution is provided, an empirical Illumina fragment size distribution is employed.

Amplification

Key | Type | Default Value | Description
PCR_DISTRIBUTION | String | default | PCR distribution file; 'default' to use a distribution with 15 rounds and 20 bins, 'none' to disable amplification.
PCR_PROBABILITY | Float | 0.1 | PCR duplication probability when GC filtering is disabled by setting GC_MEAN to NaN.
GC_MEAN | Float | 0.5 | Mean value of a Gaussian distribution that reflects GC bias amplification probability; set this to 'NaN' to disable GC biases.
GC_SD | Float | 0.1 | Standard deviation of a Gaussian distribution that reflects GC bias amplification probability; inactive if GC_MEAN is set to NaN.
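How GC_MEAN, GC_SD and PCR_PROBABILITY interact can be pictured with the Python sketch below. It is only one possible reading of the descriptions above: the Gaussian is assumed to be scaled so that a fragment whose GC fraction equals GC_MEAN gets probability 1, and the helpers `gc_fraction` and `amplification_probability` are hypothetical. The model actually used by the simulator may differ.

```python
import math


def gc_fraction(sequence):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def amplification_probability(sequence, gc_mean=0.5, gc_sd=0.1,
                              pcr_probability=0.1):
    """Illustrative per-fragment amplification probability.

    If gc_mean is NaN, the GC bias is disabled and the constant
    pcr_probability is returned. Otherwise an unnormalized Gaussian of the
    GC fraction is used (peak value 1 at gc_mean; this scaling is an
    assumption of the sketch).
    """
    if math.isnan(gc_mean):
        return pcr_probability
    z = (gc_fraction(sequence) - gc_mean) / gc_sd
    return math.exp(-0.5 * z * z)


print(amplification_probability("GCAT"))                          # GC fraction 0.5
print(amplification_probability("AAAA"))                          # GC fraction 0.0
print(amplification_probability("AAAA", gc_mean=float("nan")))    # bias disabled
```

Under these assumptions, fragments with a GC content far from GC_MEAN are amplified much less often, while setting GC_MEAN to NaN makes every fragment equally likely to be duplicated.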

Sequencing

Key | Type | Default Value | Description
READ_NUMBER | Integer | 5,000,000 | Number of reads to produce. Note: this number is an upper bound and is adapted to the actual size of the intermediate generated library.
READ_LENGTH | Integer | 36 | Length of the generated reads, depends on filtering settings.
PAIRED_END | Boolean | | Switch on/off paired-end reads, i.e., whether read pairs are produced.
FASTA | Boolean | NO | Creates .fasta/.fastq output. Requires the genome sequences in a folder specified by GEN_DIR. If a quality model is provided by parameter ERR_FILE, a .fastq file is produced. Otherwise read sequences are given as .fasta.
ERR_FILE | String | | Path to the file with the error model. With the values '35' or '76', default error models are provided for the corresponding read lengths; otherwise the path to a custom error model file is expected.
UNIQUE_IDS | Boolean | NO | Create unique read identifiers for paired reads. Information about the relative orientation is left out of the read id and encoded in the pairing information. All /1 reads are sense reads, all /2 reads are anti-sense reads. This option is useful if you want to identify paired reads based on the read ids.
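The dependency between FASTA, GEN_DIR and ERR_FILE described in the table can be summarized in a few lines of Python. `read_output_format` is a hypothetical helper that works on a dictionary of key-value strings; it restates the rule from the FASTA row and is not the simulator's own code.

```python
def read_output_format(params):
    """Return 'fastq', 'fasta', or None for a dict of PAR key-value strings.

    Rule taken from the FASTA row: sequence output needs GEN_DIR, and a
    quality model given by ERR_FILE turns .fasta output into .fastq output.
    """
    if params.get("FASTA", "NO").upper() != "YES":
        return None  # no sequence output requested (default is NO)
    if not params.get("GEN_DIR"):
        raise ValueError("FASTA output requires the genome folder GEN_DIR")
    return "fastq" if params.get("ERR_FILE") else "fasta"


print(read_output_format({"FASTA": "YES", "GEN_DIR": "genome/"}))
print(read_output_format({"FASTA": "YES", "GEN_DIR": "genome/", "ERR_FILE": "76"}))
```

The first call describes a run without an error model and yields "fasta"; the second adds the built-in error model '76' and yields "fastq".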