Section |
---|
The Flux Simulator uses FASTA/FASTQ sequences at different points: it reads an (optional) genomic sequence as input in order to (optionally) produce read sequences as output. Genomic references are expected to provide a single FASTA file per reference sequence (i.e., chromosome, scaffold, etc.), as described in the Sequencing Section. |
FASTA formats are very commonly used because they provide simple (descriptor, sequence) tuples. Generally, one distinguishes between single-FASTA files, which contain a single sequence, and multi-FASTA files, which correspondingly contain more than one sequence. The Flux Capacitor and Simulator programs usually output multi-FASTA files. The exception is the genomic sequence files, which must be located in a common directory, with one file chr.fa for each chr annotated in the corresponding GTF annotation file.
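
The directory layout described above can be checked before a run. The following is a minimal sketch, not part of the Flux tools: the function names `chromosomes_in_gtf` and `missing_genome_files` are hypothetical, and it assumes that the sequence name in the first GTF column is exactly the file name stem (so `chr1` maps to `chr1.fa`).

```python
from pathlib import Path


def chromosomes_in_gtf(gtf_lines):
    """Collect the distinct values of the first GTF column (the sequence name)."""
    names = set()
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        names.add(line.split("\t", 1)[0])
    return names


def missing_genome_files(gtf_lines, genome_dir, extension=".fa"):
    """Return the sequence names that have no <name><extension> file in genome_dir.

    Assumption: file names are the GTF sequence name plus ".fa".
    """
    genome_dir = Path(genome_dir)
    return sorted(
        name
        for name in chromosomes_in_gtf(gtf_lines)
        if not (genome_dir / f"{name}{extension}").is_file()
    )
```

Calling `missing_genome_files(open("annotation.gtf"), "genome/")` (both paths are placeholders) would return an empty list when every annotated sequence has its own file.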
Section |
---|
The read sequence output is a multi-FASTA file, where each FASTA block contains a description line that starts with a ">" ("greater than") symbol, followed by one or more lines containing the read sequence. If a quality/error model is provided, the closely related FASTQ file format is produced instead, where the ">" identifier is replaced by the "@" symbol and a quality block follows the sequence block: a "+" separator line, and then the qualities of the read sequence. The description line contains the read identifier as described in the Sequencing Section. |
FASTQ
Code Block |
---|
@chr1:4847775-4887990W:NM_001159750:1:2668:917:1137/1
AAGAGATGAGGAAAAACCTGACCAAAGAAGCCATCAGGGAGCATCAGATGGCCAAGACTGGTGGGACCCAGACTGA
+
IEEIIGIIIIIIF<GGEEHHHD4<D@147=;7*+BDBGACDGGHIIIIIHHDGGDB@@FEGGD9DGIHHHIH@BDG
@chr1:4847775-4887990W:NM_001159750:1:2668:917:1137/2
CCAATTCTTCCAAACTCAACAGAACTTCCACCGATTTCCACATTCATTACATACAACAAATGTTGTCATTGGTTCA
+
G:GB78??:9>>;?EGGGGHIDGDD=EBFGIIIHHGGGIIIIHHIIIIHHIGEIIIIIHIFCBFIHGD@@@BBEIC |
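
A record layout like the one above can be read four lines at a time. This is a minimal sketch under the assumption that every record occupies exactly four lines (identifier, sequence, "+" separator, qualities), as in the example; the function name `read_fastq` is hypothetical and not part of the Flux tools. Reading fixed groups of four lines avoids misinterpreting a quality line that happens to start with "@".

```python
def read_fastq(lines):
    """Yield (identifier, sequence, qualities) from four-line FASTQ records."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("incomplete FASTQ record")
    for i in range(0, len(rows), 4):
        header, sequence, separator, qualities = rows[i:i + 4]
        record_number = i // 4 + 1
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record {record_number}")
        # One quality symbol is expected per base of the read.
        if len(sequence) != len(qualities):
            raise ValueError(f"length mismatch in record {record_number}")
        yield header[1:], sequence, qualities
```

The sketch returns the quality string unchanged; converting the symbols to numeric scores depends on the quality encoding of the error model, which is not covered here.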
FASTA
Code Block |
---|
>chr1:4847775-4887990W:NM_001159750:1:2668:917:1137/1
AAGAGATGAGGAAAAACCTGACCAAAGAAGCCATCAGGGAGCATCAGATGGCCAAGACTGGTGGGACCCAGACTGA
>chr1:4847775-4887990W:NM_001159750:1:2668:917:1137/2
CCAATTCTTCCAAACTCAACAGAACTTCCACCGATTTCCACATTCATTACATACAACAAATGTTGTCATTGGTTCA |
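
The (descriptor, sequence) tuples of a multi-FASTA file can be recovered with a few lines of code. The sketch below is illustrative only: `read_fasta` and `pair_mates` are hypothetical names, and `pair_mates` assumes that a trailing "/1" or "/2" in the identifier marks the two mates of a pair, as the examples suggest. The authoritative description of the identifier is in the Sequencing Section.

```python
def read_fasta(lines):
    """Yield (descriptor, sequence) tuples from single- or multi-FASTA lines."""
    descriptor, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if descriptor is not None:
                yield descriptor, "".join(chunks)
            descriptor, chunks = line[1:], []
        elif descriptor is None:
            raise ValueError("sequence data before the first '>' line")
        else:
            chunks.append(line)  # a sequence may span several lines
    if descriptor is not None:
        yield descriptor, "".join(chunks)


def pair_mates(records):
    """Group sequences by identifier, assuming a '/1' or '/2' suffix marks the mate."""
    pairs = {}
    for descriptor, sequence in records:
        fragment, separator, mate = descriptor.rpartition("/")
        if not separator:  # no mate suffix present
            fragment, mate = descriptor, ""
        pairs.setdefault(fragment, {})[mate] = sequence
    return pairs
```

Because `read_fasta` joins continuation lines, it handles both one-line reads like those above and wrapped sequences such as genomic chr.fa files.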