My question is: how does one parameter affect the other? More specifically,

should I set the same value for both to ensure that the number of molecules (column 6 in the .PRO file) is correctly reflected as the number of reads in the final fasta/fastq files?

Thanks

1 Comment

  1. Dear Ruolin,

    the Flux Simulator pipeline simulates each step of a modeled RNA-Seq experiment according to the provided parameters. The process starts with a transcript population of NB_MOLECULES size and then carries out each of the experimental processes (fragmentation, RT, amplification, size selection etc.) on all of the simulated molecules. READ_NUMBER is the number of reads that are requested to be sequenced in the end.

    Therefore, the number of fragments in the final library depends on the initial transcript population (NB_MOLECULES) through the combined effect of the models and corresponding parameters of each simulated step. To prevent oversampling of the library, the number of fragments in the final library is an upper bound on the number of sequenced reads, so the actual number of reads may be lower than the requested READ_NUMBER.
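The cap described above can be written as a one-line rule. This is only a minimal sketch of the relationship, not Flux Simulator code: the function name `sequenced_reads` and the example counts are hypothetical, and the library size is assumed to be known from a finished run.

```python
def sequenced_reads(read_number: int, library_fragments: int) -> int:
    """Reads actually produced: the request, capped by the library size.

    read_number       -- the requested READ_NUMBER
    library_fragments -- fragments left in the final library (assumed known)
    """
    if read_number < 0 or library_fragments < 0:
        raise ValueError("counts must be non-negative")
    # The library is never oversampled, so its size is an upper bound.
    return min(read_number, library_fragments)


# Made-up numbers: a library smaller than the request limits the output,
# a larger library leaves the request unchanged.
print(sequenced_reads(1_000_000, 600_000))
print(sequenced_reads(1_000_000, 5_000_000))
```

So setting NB_MOLECULES equal to READ_NUMBER does not by itself guarantee READ_NUMBER reads; what matters is how many fragments survive the simulated protocol.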

    In practice, the easiest way to determine the library yield of a given protocol may be to run it on a toy dataset, and to use the result to estimate how many initial transcript molecules are required to simulate a library large enough for the desired sequencing depth.
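A toy run can be turned into a rough estimate as follows. This sketch assumes, as a simplification, that the library yield (fragments per initial molecule) scales linearly with NB_MOLECULES; the simulator's models may not behave that way, so the result is a starting point to verify with another run. The function name `molecules_needed` and all numbers are hypothetical.

```python
def molecules_needed(target_reads: int,
                     pilot_molecules: int,
                     pilot_fragments: int) -> int:
    """Estimate the NB_MOLECULES required for a library of target_reads fragments.

    pilot_molecules -- NB_MOLECULES used in the toy run
    pilot_fragments -- fragments in the final library of the toy run
    ASSUMPTION: yield (fragments per molecule) is constant across scales.
    """
    if pilot_molecules <= 0 or pilot_fragments <= 0:
        raise ValueError("pilot counts must be positive")
    if target_reads < 0:
        raise ValueError("target_reads must be non-negative")
    # Integer ceiling of target_reads * pilot_molecules / pilot_fragments.
    return -(-(target_reads * pilot_molecules) // pilot_fragments)


# Made-up pilot: 100,000 molecules gave a library of 250,000 fragments,
# and 10 million reads are wanted.
print(molecules_needed(10_000_000, 100_000, 250_000))
```

Rounding up keeps the estimated library at or above the requested depth under the linear assumption.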

    Best