I am trying to simulate a full-scale experiment of ~30M 100-nt reads for Arabidopsis, but I keep getting the same error (also with 16M and 8M reads):

[INFO] Loading default PCR distribution
    preparing transcript sequences *******Problems reading 3: 23459833, 74> 23459834 into 100: null
check for the right species/genome version!

I looked at the GTF and FASTA files. The maximum position for chromosome 3 in the GTF file is 23,459,804; the actual chromosome sequence length is 23,459,831. For some reason, the simulator seems to be trying to read beyond the end of the last gene on chromosome 3, and even past the end of the sequence itself (the error message mentions position 23,459,833). The full command I'm using is:

flux-simulator -t simulator -x -l -s -p a_thaliana_flux.par

I have attached the .par file I'm using for 8M reads in case this will help: a_thaliana_flux.par
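
For anyone who wants to repeat the GTF/FASTA comparison described above on their own files, here is a minimal sketch in Python. It is not part of Flux Simulator; the function names are made up for illustration, and it assumes a plain tab-separated GTF (end coordinate in column 5) whose sequence names match the first word of each FASTA header.

```python
import io

def fasta_lengths(handle):
    """Return {sequence_id: length} for an open FASTA file."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""  # ID = first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def gtf_max_ends(handle):
    """Return {sequence_id: largest end coordinate} for an open GTF file."""
    max_end = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, end = fields[0], int(fields[4])
        max_end[chrom] = max(max_end.get(chrom, 0), end)
    return max_end

def margins(lengths, max_end):
    """Bases between the last annotated position and the end of each sequence."""
    return {c: lengths[c] - e for c, e in max_end.items() if c in lengths}

# Tiny made-up example; replace the StringIO objects with open(...) on real files.
demo_fasta = io.StringIO(">3 demo\nACGT\nACG\n")
demo_gtf = io.StringIO('3\tsrc\texon\t1\t6\t.\t+\t.\tgene_id "a";\n')
print(margins(fasta_lengths(demo_fasta), gtf_max_ends(demo_gtf)))
```

With the numbers quoted above, the margin for chromosome 3 would be 23,459,831 - 23,459,804 = 27 bases, which is shorter than the 100-nt read length. Whether that is what actually triggers the error is only a guess on my part.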

Thanks!


2 Comments

  1. As a temporary hack, I managed to work around this problem by adding "N" characters to the end of the FASTA sequence for chromosome 3.  I have generated ~32M reads this way without any errors.

  2. Is there a fix for this in the new versions?

    I am having the same problem:

    Problems reading IWGSC_CSS_1AL_scaff_1157499: 2010, 97> 2011 into 100: null

    Is adding "N" characters to the end of all my scaffolds the best way to overcome the problem?

    I would then have to remove all the simulated reads that contain "N" characters.
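
The padding workaround from the first comment, plus the clean-up step mentioned in the second, could be sketched roughly as follows. This is only an illustration and not anything shipped with Flux Simulator: `pad_fasta` and `drop_reads_with_n` are hypothetical helper names, and the default pad of 100 N characters (one read length) is an assumption, since the thread does not say how many were actually added.

```python
def pad_fasta(lines, pad=100, width=60):
    """Yield FASTA lines with `pad` N characters appended to every record."""
    def flush(header, chunks):
        seq = "".join(chunks) + "N" * pad
        yield header
        for i in range(0, len(seq), width):
            yield seq[i:i + width]  # re-wrap the sequence at `width` columns

    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield from flush(header, chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield from flush(header, chunks)

def drop_reads_with_n(records):
    """Keep only (name, sequence) pairs whose sequence contains no N."""
    return [(name, seq) for name, seq in records if "N" not in seq.upper()]

# Tiny made-up example with a short pad so the result is easy to inspect.
padded = list(pad_fasta([">scaff_1", "ACGT", "AC"], pad=3, width=5))
kept = drop_reads_with_n([("r1", "ACGT"), ("r2", "ACNT")])
```

Note that the filter also removes reads that overlap N bases already present in the original assembly, not just the padding, so it may discard more reads than strictly necessary.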