If I simulate reads in FASTQ format using the 76nt error model, are the lowercase bases in the resulting FASTQ file the bases that were 'mutated'?
3 Comments
Micha Sammeth
Hi Matt,
Yes, the lowercase characters are the ones that are altered compared to the genomic sequence.
Best
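To illustrate the point above, here is a minimal sketch (not part of the simulator) that lists the lowercase positions in each read. It assumes plain 4-line FASTQ records; the function names `parse_fastq`, `lowercase_positions`, and `altered_bases` are made up for this example.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence) pairs, assuming 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        next(it)  # quality line (not needed here)
        yield header[1:], seq  # drop the leading '@'


def lowercase_positions(seq):
    """Return 0-based positions within the read where the base is lowercase."""
    return [i for i, base in enumerate(seq) if base.islower()]


def altered_bases(lines):
    """Yield (read_id, positions) for every read with at least one lowercase base."""
    for read_id, seq in parse_fastq(lines):
        positions = lowercase_positions(seq)
        if positions:
            yield read_id, positions
```

The positions are offsets within each read, not genomic coordinates. To use it on a file, pass an open file handle, e.g. `altered_bases(open("reads.fastq"))`, where the file name is just a placeholder.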
Maayan
Hi Micha,
Is there a list of 'mutated' SNPs somewhere with genomic locations? I'm trying to separate the SNPs called by mpileup based on whether they were added as errors or added (by me) as "real" SNPs (as we discussed in BARNA-324).
Micha Sammeth
Hi Maayan,
I'm not sure I fully understand your question, but I assume you mean the errors introduced in the read sequences. In the simulation, these changes are made on a per-read basis, and you can track them by the upper-/lowercase letters in the FASTA output.
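Since the errors are tracked per read, one possible way to do the separation asked about is to start from your own list of SNPs you added and split the called SNPs against it. The sketch below assumes you already have both sets as `(chromosome, position)` tuples; parsing the mpileup output is left out, and the function name `split_calls` is hypothetical.

```python
def split_calls(called, planted):
    """Split called SNPs into those matching the planted list and the rest.

    Both arguments are iterables of (chromosome, position) tuples.
    This is an assumed representation, not a format produced by any tool.
    """
    planted_set = set(planted)
    matched = [snp for snp in called if snp in planted_set]
    unmatched = [snp for snp in called if snp not in planted_set]
    return matched, unmatched
```

For example, `split_calls([("chr1", 100), ("chr1", 250)], [("chr1", 100)])` puts `("chr1", 100)` in the first list and `("chr1", 250)` in the second. Calls in the second list are only candidates for simulated errors; they may have other causes as well.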