Hi Micha,

I am working on discovering isoforms and estimating isoform-specific expression from RNA-seq data. I would therefore like to get reads for a single gene, for several individuals sharing a given number of isoforms. So I was wondering whether flux-simulator could do that, and how to achieve it.
My first intuition was to provide a shortened input reference to simulate from (basically the gene sequence in FASTA format and the correspondingly shortened GFF file). However, that doesn't solve the issue of simulating several individuals with distinct isoform proportions.
My second thought was to simulate for each individual separately and to directly provide a LIB file, but I'm not sure how to simulate a good library and it would be redundant with what flux simulator is already doing.
Do you have any insight on this?
Thanks

1 Comment

  1. Hi,

    To do what you are after, one needs a model of the inter-individual variability of isoforms within a gene, and probably also of the inter-individual fluctuation of promoter activity/basal gene expression. Re-running the complete simulation pipeline (starting from simulated expression) multiple times would not help, because in such independent runs the simulator would generate a new, unrelated expression profile (in effect a new cell type) every time.
    An ad hoc solution could be to generate one expression profile, for instance by carrying out only the first step of the simulation pipeline, or to take data from a real experiment (e.g., expression levels of some reference annotation estimated by the Flux). These expression levels, stored in a .PRO file, then represent your population prototype, from which individual variations are produced by some model (i.e., several derived .PRO files). Each individual profile obtained this way is then processed by the rest of the simulation pipeline (library preparation, sequencing).
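    To make the derivation step concrete, here is a minimal Python sketch of one possible variation model. It is only an illustration, not something the simulator provides: the function name derive_individual, the log-normal scaling of total gene expression, and the Dirichlet perturbation of isoform proportions are all assumptions, as are the parameter values. It works on an in-memory mapping of transcript ID to expressed molecule count for a single gene; reading and writing the actual .PRO files is left out, since the column layout should be checked against the simulator documentation.

```python
import random


def derive_individual(prototype, gene_sd=0.3, concentration=50.0, rng=None):
    """Derive one individual's isoform counts from a population prototype.

    prototype     -- dict: transcript ID -> expressed molecule count (one gene)
    gene_sd       -- sigma of the log-normal factor applied to total gene
                     expression (assumed model, hypothetical default)
    concentration -- Dirichlet concentration; larger values keep individuals
                     closer to the prototype's isoform proportions
    """
    rng = rng or random.Random()
    total = sum(prototype.values())
    if total <= 0:
        raise ValueError("prototype must contain at least one expressed isoform")

    # Gene-level variation: scale the total expression by a log-normal factor.
    gene_total = total * rng.lognormvariate(0.0, gene_sd)

    # Isoform-level variation: sample proportions from a Dirichlet centred on
    # the prototype proportions, using normalised gamma draws.
    draws = {
        tx: rng.gammavariate(concentration * n / total, 1.0) if n > 0 else 0.0
        for tx, n in prototype.items()
    }
    norm = sum(draws.values())

    return {tx: round(gene_total * d / norm) for tx, d in draws.items()}


# Hypothetical prototype: three isoforms of one gene.
prototype = {"tx1": 600, "tx2": 300, "tx3": 100}
rng = random.Random(42)  # fixed seed so the derived profiles are reproducible
individuals = [derive_individual(prototype, rng=rng) for _ in range(5)]
```

    Each entry of individuals would then be written out as its own derived .PRO file and passed through the remaining pipeline steps. Isoforms with zero expression in the prototype stay at zero, so all individuals share the same set of expressed isoforms.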
    Do you have some model for inter-individual variation of gene expression and splicing?
    Cheers