What happens to the reads if there are missing data in the Genome?

2 Comments

  1. Hi,

    in the simulation, the read sequences are extracted from the genomic sequence in the regions where transcripts are annotated. One border case is variation of the transcription start site (parameter TSS_mean), which may extend the region that is taken from the genome.

    I'm not sure I understood correctly what you mean by "missing regions", but if those regions show up as Ns in the genomic sequence of the transcribed areas, then they would also appear as Ns in the simulated reads. If part of a transcribed region, or a complete chromosome/scaffold, is unavailable, then trying to generate reads from that region will lead to an error. In such cases it would still be possible to simulate reads that are characterized only by their location within the genome, without extracting the read sequences (FASTA NO).

    Best,

    M.

    1. Yes, thanks, that answers my question. I just wanted to know what happens with 'crappy' genomes.
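
The three cases described in the answer above (Ns carried into reads, an error when the sequence is unavailable, and coordinates-only reads) can be illustrated with a small Python sketch. This is not the simulator's own code: the function `extract_read`, the `with_sequence` flag, and the toy genome are hypothetical names chosen for illustration, and the flag only loosely mirrors the effect of setting FASTA to NO.

```python
# Illustrative sketch only; hypothetical names, not the simulator's implementation.

def extract_read(genome, chrom, start, length, with_sequence=True):
    """Return (chrom, start, end, sequence) for one read.

    Coordinates are 0-based and end-exclusive. `genome` maps a
    chromosome/scaffold name to its sequence string.
    """
    end = start + length
    if not with_sequence:
        # Coordinates-only read: the genome is never consulted,
        # so a missing sequence cannot cause an error.
        return (chrom, start, end, None)
    if chrom not in genome:
        # Whole chromosome/scaffold unavailable.
        raise KeyError(f"no sequence available for {chrom!r}")
    seq = genome[chrom]
    if start < 0 or end > len(seq):
        # Part of the requested region lies outside the available sequence.
        raise ValueError(f"region {chrom}:{start}-{end} is not fully available")
    # Any Ns in the genomic region are copied into the read unchanged.
    return (chrom, start, end, seq[start:end])


# Toy genome with a stretch of Ns; "chr2" is deliberately absent.
genome = {"chr1": "ACGTNNNNACGT"}

print(extract_read(genome, "chr1", 2, 6))                        # ('chr1', 2, 8, 'GTNNNN')
print(extract_read(genome, "chr2", 0, 5, with_sequence=False))   # ('chr2', 0, 5, None)
```

Calling `extract_read(genome, "chr2", 0, 5)` with sequence extraction enabled raises a `KeyError` in this sketch, which corresponds to the error case mentioned in the answer.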