I'm using the FluxSimulator to generate synthetic data for comparing several RNA-Seq expression estimation tools. A number of existing methods test their quantification procedures on data simulated from the same generative model that the method itself assumes, which begs the question and can produce misleading accuracy estimates. The FluxSimulator, developed independently and with the goal of generating realistic synthetic data, seems like a great tool for testing these methods.
Right now, however, I'm getting some very strange results when using eXpress to estimate expression levels on a synthetically generated dataset (.par file: fluxSim_Hsapien_defaultExp_76bpWithError.par). Specifically, eXpress is producing a number of warnings of the form:
"WARNING: The observed alignments appear disporportionately in the forward-reverse order (270392482 vs. 855441). If your library is strand-specific, you should use the --fr-stranded option to avoid incorrect results."
and the resulting quantification estimates are very poor (Spearman correlation ~ 0.64 against the true expression values). RSEM, however, performs well on the same data (Spearman correlation ~ 0.94). Does the FluxSimulator, run with the parameters included above, produce a strand-specific library? I found nothing about this in the documentation, and I assumed that the resulting library would not be strand-specific. Could you tell me what the default behavior of the FluxSimulator is regarding the strandedness of the library? If it is strand-specific by default, is there a way to produce a library that is not, and which parameter controls this? Thanks for your help!
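In case it helps to interpret the correlations quoted above, here is a minimal sketch of how a Spearman correlation between true and estimated expression values can be computed. The function names and the toy numbers in the usage note are illustrative assumptions only; they are not the actual evaluation script or data behind the figures above.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(truth, estimate):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(truth) != len(estimate):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(truth), average_ranks(estimate))
```

For example, `spearman([1, 2, 3, 4], [10, 20, 30, 40])` gives 1.0 for any monotonically increasing relationship, regardless of scale. This is why a rank-based measure is convenient for comparing tools that report abundances in different units.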