If I simulate reads in FASTQ format using the 76nt error model, are the lower-case bases in the resulting FASTQ file the bases that were 'mutated'?
Yes, the lower-case characters are the ones that were altered compared to the genomic sequence.
Is there a list of 'mutated' SNPs somewhere with genomic locations? I'm trying to separate the SNPs called by mpileup based on whether they were added as errors, or added (by me) as "real" SNPs (as we discussed in
Not sure whether I understand your question fully, but I assume you mean the errors introduced in the read sequences. In the simulation, these changes are made on a per-read basis, and you can track them by the upper- / lower-case letters in the FASTA output.
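In case it helps, here is a minimal sketch of how the lower-case letters could be collected per read. It is not part of the simulator; the function names (`lowercase_positions`, `iter_fastq`, `simulated_errors`) are made up for illustration, and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import io


def lowercase_positions(seq):
    """Return the 0-based offsets of lower-case bases within one read."""
    return [i for i, base in enumerate(seq) if base.islower()]


def iter_fastq(handle):
    """Yield (read_id, sequence) pairs from a four-line-per-record FASTQ stream.

    Assumption: each record is exactly four lines (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality string, ignored
        yield header.strip()[1:], seq  # drop the leading '@'


def simulated_errors(handle):
    """Map each read id to the read offsets that are written in lower case."""
    return {read_id: lowercase_positions(seq) for read_id, seq in iter_fastq(handle)}


# Tiny in-memory example instead of a real simulator output file.
example = io.StringIO(
    "@read1\nACGTaCGTTg\n+\nIIIIIIIIII\n"
    "@read2\nACGTACGT\n+\nIIIIIIII\n"
)
errors = simulated_errors(example)
print(errors)  # {'read1': [4, 9], 'read2': []}
```

Note that these are offsets within each read, not genomic coordinates. To compare them with mpileup calls you would still have to combine them with each read's mapped position and strand, and it is probably safest to collect them from the simulator output before mapping, since an aligner may not preserve the letter case.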