Is there a way to get around the random expression assignment? It would be a nice addition if users could provide an empirical distribution (a vector of RPKMs or something like that) obtained from a real experiment. This would help with the validation and comparison of observed and simulated profiles at the genome level.
The question about custom expression values was also raised recently by Moritz from the DKFZ. Yes, there is a way: you need to provide the Simulator with a .PRO file that has 6 columns and contains valid information in the 4 columns marked in bold below.
Do not worry about column 3 and column 5; they are informational (i.e., "output") rather than used in subsequent steps. If you provide such a custom .PRO file in the parameters, and if you do not request re-generation of expression values (flag -x), then the program will use your values in the subsequent steps of the simulation pipeline. For instance, the command
will "eat" the expression values you provided in the custom profile.
Column 6 of your custom .PRO file has to be filled with integer values derived from your expression values. The numbers represent initial molecule counts. As a rule of thumb, you may want to start with values of about (10 * RPKM), which should come close to the default settings of the Flux Simulator, and which are certainly far fewer molecules than there are in real cells.
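To make the rule of thumb concrete, here is a minimal Python sketch of the conversion. The function name `rpkm_to_molecules`, the `factor` parameter, and the example transcript IDs and RPKM values are made up for illustration; only the factor of 10 comes from the rule of thumb above.

```python
def rpkm_to_molecules(rpkm, factor=10.0):
    """Turn an RPKM value into an integer initial molecule count.

    The default factor of 10 is the rule of thumb from this thread,
    not a value read from the Flux Simulator itself.
    """
    if rpkm < 0:
        raise ValueError("RPKM must be non-negative")
    # Column 6 needs integers, so round to the nearest whole molecule.
    return int(round(rpkm * factor))


# Hypothetical RPKM values keyed by transcript ID.
rpkms = {"tx1": 0.0, "tx2": 3.26, "tx3": 120.5}
molecules = {tid: rpkm_to_molecules(value) for tid, value in rpkms.items()}
print(molecules)
```

Transcripts with very low RPKM round down to 0 molecules here; whether you want that or a minimum of 1 depends on your experiment.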
Columns 1, 2, and 4 you obtain from the transcript annotation (GTF file), either with your preferred scripting language or by running the Simulator and "hitchhiking" the corresponding values from a generated expression profile. They are invariant transcript attributes.
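The "hitchhiking" route could look roughly like the Python sketch below: take the lines of a profile the Simulator generated, keep columns 1 to 5 as they are, and overwrite column 6 with your own integer counts. This assumes the .PRO file is tab-separated and that column 2 holds the transcript ID; please check both against your own file. The function name `set_molecule_counts` and the example rows are placeholders, not real Simulator output.

```python
def set_molecule_counts(pro_lines, counts, default=0):
    """Return .PRO lines with column 6 replaced by custom molecule counts.

    Assumptions (not confirmed by the thread): fields are tab-separated
    and column 2 is the transcript ID used as the lookup key.
    """
    updated = []
    for line in pro_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            raise ValueError("expected 6 columns, got %d" % len(fields))
        transcript_id = fields[1]
        # Transcripts without a custom value fall back to `default`.
        fields[5] = str(int(counts.get(transcript_id, default)))
        updated.append("\t".join(fields))
    return updated


# Placeholder rows standing in for a generated profile.
generated = [
    "locusA\ttx1\tc3\t800\tc5\t17\n",
    "locusB\ttx2\tc3\t1500\tc5\t4\n",
]
custom_counts = {"tx1": 33}  # e.g. 10 * RPKM, already rounded
for row in set_molecule_counts(generated, custom_counts):
    print(row)
```

Columns 3 and 5 are passed through untouched, in line with the note above that they are not used in subsequent steps.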
I created a ticket to continue the discussion of whether and how we can improve the program to make these steps more automated.
Please feel free to add yourself to the watchlist of the ticket to get notified when there are updates.
Thanks Micha, makes sense to me.