
B.1 What do I need for Simulating an RNA-Seq Experiment?

A simple answer to this question would be: 'That depends on what you want to simulate.' The basic operation of the Flux Simulator requires transcript annotations in the form of a GTF file. If read sequences, and potential sequence biases, are to be simulated, a set of genomic reference sequences (one per chromosome) is also required. Finally, some parameters allow you to specify empirical data deduced from experimental evidence, e.g., insert size distributions or sequence biases.
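As a quick sanity check before a run, one can verify that every chromosome named in the GTF annotation has a matching sequence file. The following is only a minimal sketch: the function names are hypothetical, and it assumes that each chromosome is stored as a FASTA file named after the identifier in the first GTF column plus a `.fa` suffix. Adjust the suffix if your files are named differently.

```python
from pathlib import Path


def chromosomes_in_gtf(gtf_text):
    """Collect the sequence names found in column 1 of GTF content."""
    names = set()
    for line in gtf_text.splitlines():
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_chromosome_files(gtf_text, genome_dir, suffix=".fa"):
    """Return GTF chromosomes that have no '<name><suffix>' file in genome_dir.

    Assumption: one FASTA file per chromosome, named after the GTF identifier.
    """
    present = {
        p.name[: -len(suffix)] for p in Path(genome_dir).glob("*" + suffix)
    }
    return sorted(chromosomes_in_gtf(gtf_text) - present)
```

An empty result means that, under the naming assumption above, every annotated chromosome has a sequence file available.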

B.2 Does the Flux Simulator allow Transcript Sequences as Input?

The Flux Simulator obtains transcript sequences from the genomic sequence and a transcriptome annotation. Although these two inputs together appear to reproduce the information that a single file of transcript sequences could provide, the following practical advantages led us to prefer the genomic sequence over the transcribed sequence:

The genome naturally contains more elements than just its transcribed portion, which allows for the simulation of biological variation in the transcription and processing of transcripts, for instance variable transcription start sites or the spontaneous failure to splice out an intron (so-called intron retention events).

For quantitative applications of RNA-Seq, reads are usually mapped to a genomic reference in a first step. Therefore, genomic sequences are often readily available, and genomic coordinates of simulated reads allow for benchmarking the process of genomic mapping.

Therefore, in order to employ transcript sequences with the Flux Simulator, they must first be mapped to the corresponding genome. Several programs are available to align transcribed sequences to the genome they originate from; some popular ones are Blat or Exonizer.
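To illustrate why a genomic sequence plus an annotation carries at least as much information as a transcript file, the sketch below assembles a transcript sequence from exon coordinates. It is not the Flux Simulator's own implementation: the function names are hypothetical, and it assumes GTF-style coordinates (1-based, inclusive) on an in-memory chromosome string.

```python
# Translation table for complementing DNA bases (case is preserved).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def transcript_sequence(chrom_seq, exons, strand="+"):
    """Concatenate exon sequences into a spliced transcript.

    exons: iterable of (start, end) pairs, 1-based inclusive as in GTF.
    strand: '+' or '-'; minus-strand transcripts are reverse-complemented.
    """
    # Convert each 1-based inclusive interval to a 0-based Python slice.
    parts = [chrom_seq[start - 1:end] for start, end in sorted(exons)]
    spliced = "".join(parts)
    return reverse_complement(spliced) if strand == "-" else spliced


chrom = "AAACCCGGGTTT"
spliced = transcript_sequence(chrom, [(1, 3), (7, 9)])   # intron 4-6 removed
retained = transcript_sequence(chrom, [(1, 9)])          # intron 4-6 retained
```

Because the intron bases are present in the genomic sequence, an intron retention event can be modelled simply by merging two exons into one interval, which would not be possible from the spliced transcript sequence alone.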