If I am simulating reads in FASTQ format using the 76nt error model, are the lowercase bases in the resulting FASTQ file the bases that were 'mutated'?

 

Thanks, Matt

3 Comments

  1. Hi Matt,

    Yes, the lowercase characters are the ones that were altered compared to the genomic sequence.

    Best

  2. Hi Micha,

    Is there a list of 'mutated' SNPs somewhere with genomic locations? I'm trying to separate the SNPs called by mpileup based on whether they were added as errors, or added (by me) as "real" SNPs (as we discussed in BARNA-324).

     

    1. Hi Maayan,

      I'm not sure I fully understand your question, but I assume you mean the errors introduced in the read sequences. In the simulation, these changes are made on a per-read basis, and you can track them by the upper- / lower-case letters in the FASTA output.
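
      For what it's worth, here is a minimal sketch of that kind of tracking for FASTQ output. It is not part of the simulator. It assumes plain 4-line FASTQ records, and the function names `lowercase_positions` and `altered_bases_per_read` are made up for illustration. It only reports offsets within each read, not genomic coordinates; turning them into genomic positions would additionally require knowing where each read originates, which this sketch does not attempt.

```python
from typing import Iterable, Iterator, List, Tuple


def lowercase_positions(seq: str) -> List[int]:
    """Return 0-based offsets of lowercase (altered) bases within a read."""
    return [i for i, base in enumerate(seq) if base.islower()]


def altered_bases_per_read(fastq_lines: Iterable[str]) -> Iterator[Tuple[str, List[int]]]:
    """Yield (read_id, offsets) per record, assuming 4 lines per FASTQ record."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        # Drop the leading '@' of the header line to get the read identifier.
        yield header[1:], lowercase_positions(seq)


# Hypothetical two-read example; real input would come from open("reads.fastq").
example = [
    "@read1", "ACGTaCGT", "+", "IIIIIIII",
    "@read2", "ACGT", "+", "IIII",
]
for read_id, offsets in altered_bases_per_read(example):
    print(read_id, offsets)
```

      With the example above, `read1` reports offset 4 (the lowercase `a`) and `read2` reports no altered bases.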