I've been playing with the flux-simulator program for a few days.  I've been able to generate simulated FASTQ files for my purposes, but the reads are too nice – after alignment with TopHat or MapSplice there doesn't appear to be any noise to speak of within intron regions.

What parameter settings could I try to increase the amount of noise in non-transcriptomic regions?  For example, occasional reads within introns or intergenic regions, as we often see with real RNA-Seq data?  These might arise from incompletely spliced transcripts in the sample or from rare intron retention events that are not yet part of the annotated transcriptome.  Is there any way to get flux-simulator to model this?

Many thanks!

4 Comments

  1. Hi Mark,

    Which error model did you use to simulate the reads? Currently the default error models are limited when it comes to simulating errors in specific regions. We are working on a model generator that can create custom models from SAM files (BARNA-185), but I think this will not solve the issue of creating "region specific" errors. Maybe we can come up with a more advanced error model that takes the annotation into account. Feel free to raise a JIRA feature request.

    1. Actually I think we've worked around the issue.  However, to answer your question I am using "ERR_FILE 76" with read length 76nt.

      I've also noticed that the simulator creates some large files in the /tmp directory.  Since our /tmp directory has limited space, I'd like to change it to a different directory.  Is there an easy way to do that?

      Many thanks!

       

      1. Sure, I'll create a section in the FAQ. Either set the TMP_DIR parameter in your parameter file or, on Linux, export the TMPDIR environment variable before you start the simulator.

        1. TMP_DIR worked just fine.  Thanks!
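
For anyone who finds this thread with the same /tmp problem, here is a minimal sketch of the two options mentioned above. It assumes the parameter file holds one whitespace-separated KEY VALUE pair per line (as in the "ERR_FILE 76" setting quoted earlier). The helper names `set_tmp_dir` and `env_with_tmpdir` and the directory `/scratch/flux_tmp` are hypothetical and only serve as illustration; they are not part of flux-simulator.

```python
import os


def set_tmp_dir(par_text: str, tmp_dir: str) -> str:
    """Return parameter-file text with TMP_DIR set to tmp_dir.

    Assumption: one whitespace-separated KEY VALUE pair per line.
    An existing TMP_DIR line is replaced; otherwise one is appended.
    """
    out = []
    replaced = False
    for line in par_text.splitlines():
        fields = line.split(None, 1)
        if fields and fields[0] == "TMP_DIR":
            out.append(f"TMP_DIR {tmp_dir}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"TMP_DIR {tmp_dir}")
    return "\n".join(out) + "\n"


def env_with_tmpdir(tmp_dir: str) -> dict:
    """Return a copy of the current environment with TMPDIR set.

    This mirrors exporting TMPDIR in the shell before starting the
    simulator; pass the result as env= to subprocess.run.
    """
    env = dict(os.environ)
    env["TMPDIR"] = tmp_dir
    return env


if __name__ == "__main__":
    scratch = "/scratch/flux_tmp"  # hypothetical directory with enough space
    # Option 1: put TMP_DIR into the parameter file contents.
    print(set_tmp_dir("ERR_FILE 76\n", scratch), end="")
    # Option 2: set TMPDIR in the environment handed to the simulator.
    print(env_with_tmpdir(scratch)["TMPDIR"])
```

Only one of the two options should be needed. The parameter-file route keeps the setting together with the rest of the run configuration, which was the option that worked in the reply above.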