.FASTA/FASTQ sequences

The Flux Simulator uses FASTA/FASTQ sequences at different points, such as the (optional) input of a genomic sequence used to (optionally) produce read sequences. Genomic references are expected to provide one single FASTA file per reference sequence (i.e., chromosome, scaffold, etc.), as described below.

The FASTA format is very commonly used because it provides simple (descriptor, sequence) tuples. In general, one distinguishes between single-FASTA files, which contain a single sequence, and multi-FASTA files, which contain more than one sequence. The Flux Capacitor and Simulator programs usually output multi-FASTA files. An exception is the genomic sequence files, which must be located in a common directory, with one file chr.fa for each chr annotated in the corresponding GTF annotation file.
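The genome directory layout described above can be checked before a run. The following is a minimal sketch, not part of the Flux tools: the function names and the extension parameter are hypothetical, and it assumes that the reference sequence name is the first tab-separated column of each GTF line and that lines starting with "#" are comments.

```python
import os


def chromosomes_in_gtf(gtf_lines):
    """Collect distinct sequence names (GTF column 1) in order of first appearance."""
    seen = []
    for line in gtf_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        name = line.split("\t", 1)[0]
        if name not in seen:
            seen.append(name)
    return seen


def missing_genome_files(gtf_lines, genome_dir, extension=".fa"):
    """Return the expected <chr>.fa file names that are absent from genome_dir.

    Hypothetical helper: assumes one file per annotated sequence,
    named after the sequence plus the given extension.
    """
    missing = []
    for name in chromosomes_in_gtf(gtf_lines):
        file_name = name + extension
        if not os.path.isfile(os.path.join(genome_dir, file_name)):
            missing.append(file_name)
    return missing
```

Calling missing_genome_files with the lines of an annotation file and the path of the genome directory returns an empty list when every annotated sequence has its own file.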

FA, FASTA format

The original FASTA (FA) file format is rather simple. Each FASTA block consists of a description line that starts with a ">" ("greater than") symbol, followed by one or more lines containing the sequence itself. Further examples of the FASTA format can be found, for instance, here.
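As an illustration of the block structure, the sketch below reads FASTA lines into (descriptor, sequence) tuples. It is a simplified reader written for this page, not the parser used by the Flux tools; the function name parse_fasta and the two example records are made up.

```python
def parse_fasta(lines):
    """Yield (descriptor, sequence) tuples from an iterable of FASTA lines."""
    descriptor = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous block, if any.
            if descriptor is not None:
                yield descriptor, "".join(chunks)
            descriptor = line[1:].strip()
            chunks = []
        elif descriptor is not None:
            # Sequence lines of one block are concatenated.
            chunks.append(line)
    if descriptor is not None:
        yield descriptor, "".join(chunks)


# Made-up multi-FASTA content with two blocks.
example = """>seq1 hypothetical example record
ACGTACGTAC
GTACGT
>seq2
TTGACA
"""

records = list(parse_fasta(example.splitlines()))
```

A single-FASTA file yields exactly one tuple; a multi-FASTA file yields one tuple per block.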

Often, the description line is tokenized into different tags, separated by either "|" ("pipe", as in the NCBI standard) or ";" ("semicolon", as in the Pearson FASTA format). The Flux Capacitor and the Flux Simulator use these separators to divide the descriptor line into the fields of the Flux Mapped Read Descriptor.
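The tokenization can be sketched as follows. This is only an assumption about how such a split might look: split_descriptor is a hypothetical helper, the sample tags are placeholders, and the actual field layout of the Flux Mapped Read Descriptor is not reproduced here.

```python
import re


def split_descriptor(descriptor):
    """Split a FASTA description line into tags on "|" or ";".

    Hypothetical helper: strips a leading ">" and drops empty tokens,
    for example those produced by a trailing separator.
    """
    text = descriptor.strip().lstrip(">").strip()
    tokens = (token.strip() for token in re.split(r"[|;]", text))
    return [token for token in tokens if token]


pipe_tags = split_descriptor(">tagA|tagB|tagC")
semicolon_tags = split_descriptor("tagA;tagB;")
```

Both calls return a plain list of strings, so the same downstream code can handle either separator style.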
