A simple answer to this question would be: 'That depends on what you want to simulate.' The basic operation of the Flux Simulator requires transcript annotations in the form of a GTF file. If read sequences (and potential biases) are to be simulated, a set of genomic reference sequences (one per chromosome) is also required. Finally, some parameters allow you to specify empirical data deduced from experimental evidence, e.g., insert size distributions or sequence biases.
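To illustrate what the GTF annotation contributes, here is a minimal Python sketch that groups exon records by transcript. It is not the Flux Simulator's own parser, and the helper name `parse_gtf_exons` is hypothetical. It relies only on the standard GTF layout: nine tab-separated columns, the feature type in column 3, 1-based inclusive start/end in columns 4 and 5, the strand in column 7, and a `transcript_id` attribute in column 9.

```python
from collections import defaultdict


def parse_gtf_exons(lines):
    """Group the exon records of a GTF file by transcript_id (hypothetical helper).

    Returns {transcript_id: (chrom, strand, [(start, end), ...])}, keeping
    GTF's 1-based, inclusive coordinates and sorting exons by start.
    """
    location = {}              # transcript_id -> (chrom, strand)
    exons = defaultdict(list)  # transcript_id -> list of (start, end)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features define the spliced structure
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # column 9 looks like: gene_id "G1"; transcript_id "T1";
        tid = None
        for attr in fields[8].split(";"):
            key_value = attr.strip().split(" ", 1)
            if len(key_value) == 2 and key_value[0] == "transcript_id":
                tid = key_value[1].strip('"')
        if tid is None:
            continue  # exon without a transcript_id: ignore it
        location[tid] = (chrom, strand)
        exons[tid].append((start, end))
    return {tid: (chrom, strand, sorted(exons[tid]))
            for tid, (chrom, strand) in location.items()}


example = [
    'chr1\tsrc\ttranscript\t100\t350\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t350\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(parse_gtf_exons(example))
# {'T1': ('chr1', '+', [(100, 200), (300, 350)])}
```

Only `exon` lines are used here because they alone describe which genomic stretches end up in the mature transcript; the `transcript` line in the example is skipped.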
The Flux Simulator obtains transcript sequences from the genomic sequence and a transcriptome annotation. Although these two inputs together seem to carry no more information than a single file of transcript sequences, the following practical advantages led us to prefer the genomic sequence over the transcribed sequences:
Therefore, to use transcript sequences with the Flux Simulator, you first have to map them to the corresponding genome. Several programs are available to align transcribed sequences to the genome they originate from; popular ones include Blat and Exonizer.
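Once transcripts are mapped, their exon coordinates are enough to recover the transcript sequence from the genome. The following minimal Python sketch shows that reconstruction: concatenate the exon slices in genomic order and, for minus-strand transcripts, reverse-complement the result. It is an illustration only, not the Flux Simulator's implementation; the function name `transcript_sequence` and the plain string used as a chromosome are assumptions, and coordinates are assumed to be 1-based and inclusive as in GTF.

```python
# Complement table for DNA bases (upper and lower case; N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def transcript_sequence(chrom_seq, strand, exons):
    """Splice one transcript out of a chromosome sequence (hypothetical helper).

    `exons` holds (start, end) pairs with 1-based, inclusive coordinates.
    For the minus strand the spliced sequence is reverse-complemented so the
    result reads 5' to 3' in transcript orientation.
    """
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    spliced = "".join(chrom_seq[start - 1:end] for start, end in sorted(exons))
    if strand == "-":
        spliced = spliced.translate(COMPLEMENT)[::-1]
    return spliced


chromosome = "AAACCCGGGTTT"
print(transcript_sequence(chromosome, "+", [(1, 3), (7, 9)]))  # AAAGGG
print(transcript_sequence(chromosome, "-", [(1, 3), (7, 9)]))  # CCCTTT
```

Sorting the exons first makes the result independent of the order in which exon records appear in the annotation.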
The Flux Simulator uses the system's temporary folder (on Linux, usually /tmp) to store intermediate files. These files can be quite big, and you may run out of space in your temporary folder. There are two ways to change the folder used for temporary files:
Specify TMP_DIR in your parameters file
TMP_DIR my_tmp_folder/
If you specify a relative path here, it is resolved relative to the current working directory from which you start the run.
Set the $TMPDIR environment variable (on Linux)
$> export TMPDIR=$HOME/my_tmp_dir
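Because the intermediate files can be large, it can help to check how much space the chosen folder offers before starting a run. The minimal Python sketch below mimics the two options above; it is not the simulator's own logic. The helper names are hypothetical, and the precedence (a TMP_DIR value wins over $TMPDIR) is an assumption, since the text above does not state which option takes priority. The fallback uses Python's `tempfile.gettempdir()`, which consults $TMPDIR, $TEMP and $TMP before trying platform defaults such as /tmp.

```python
import os
import shutil
import tempfile


def resolve_tmp_dir(tmp_dir_param=None):
    """Pick the folder for temporary files (hypothetical helper).

    ASSUMPTION: a TMP_DIR value from the parameters file takes precedence;
    otherwise fall back to Python's default, which honours $TMPDIR.
    """
    if tmp_dir_param:
        # A relative TMP_DIR is resolved against the current working directory.
        return os.path.abspath(tmp_dir_param)
    return tempfile.gettempdir()


def free_gib(path):
    """Free space, in GiB, on the file system that holds `path` (must exist)."""
    return shutil.disk_usage(path).free / 1024**3


tmp = resolve_tmp_dir()  # no TMP_DIR given: fall back to the system default
print(f"{tmp}: {free_gib(tmp):.1f} GiB free")
```

If the reported free space is low, pointing TMP_DIR or $TMPDIR at a larger volume, as described above, avoids running out of space in the middle of a simulation.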