...

where <annotation.gtf> is the transcriptome annotation (see GTF format) that contains the splice sites to be scored, and <genome-folder> is the path to the directory holding the genomic sequences, one FASTA file per reference sequence (e.g., chromosome, contig, or scaffold).
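
If the genome is only available as a single multi-FASTA file, it has to be split into one file per sequence first. The Python sketch below shows one way to do this. The function names `split_fasta` and `write_genome_folder` are hypothetical, and naming each file after the first word of its header line plus a `.fa` suffix is an assumption; check which file names astalavista expects (presumably names matching the sequence names in column 1 of the GTF).

```python
import os
from typing import Dict, Iterable


def split_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Group a multi-FASTA into {sequence name: FASTA text of that record}.

    The name is the first whitespace-delimited token of the '>' header,
    e.g. '>chr22 Homo sapiens' -> 'chr22'. Lines before the first header
    are ignored.
    """
    records: Dict[str, list] = {}
    current = None
    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # normalise a missing final newline
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without a sequence name")
            current = tokens[0]
            if current in records:
                raise ValueError(f"duplicate sequence name: {current}")
            records[current] = [line]
        elif current is not None:
            records[current].append(line)
    return {name: "".join(chunk) for name, chunk in records.items()}


def write_genome_folder(records: Dict[str, str], out_dir: str,
                        suffix: str = ".fa") -> None:
    """Write one file per sequence as <out_dir>/<name><suffix>.

    ASSUMPTION: this naming scheme is illustrative, not astalavista's documented one.
    """
    os.makedirs(out_dir, exist_ok=True)
    for name, text in records.items():
        with open(os.path.join(out_dir, name + suffix), "w") as fh:
            fh.write(text)
```

For a genome file whose headers are `>chr1`, `>chr2`, ..., passing an open handle of it to `split_fasta` and the result to `write_genome_folder(records, "genome-folder")` yields `genome-folder/chr1.fa`, `genome-folder/chr2.fa`, and so on.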

Scoring Splice Sites with Polymorphisms
astalavista -t scorer -i <annotation.gtf> -c <genome-folder> --vcf <vcf-file>

where <vcf-file> is a file in VCF (Variant Call Format) describing polymorphisms in the genomic sequence.
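
To score a splice site against polymorphisms, one needs the variants that fall inside the sequence window around that site. The following sketch is not astalavista's code; the `Variant` and `VariantIndex` names are hypothetical. It reads the fixed VCF columns CHROM, POS, REF and ALT (ignoring genotypes), keeps each chromosome's variants sorted by position in memory, and answers window queries by binary search.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Variant(NamedTuple):
    pos: int          # 1-based, as in the VCF POS column
    ref: str
    alts: List[str]   # ALT may list several comma-separated alleles


class VariantIndex:
    """Per-chromosome variants sorted by position, held in memory."""

    def __init__(self, vcf_lines: Iterable[str]) -> None:
        grouped: Dict[str, List[Variant]] = defaultdict(list)
        for line in vcf_lines:
            if line.startswith("#") or not line.strip():
                continue  # skip ## meta lines, the #CHROM header, blank lines
            fields = line.rstrip("\n").split("\t")
            grouped[fields[0]].append(
                Variant(int(fields[1]), fields[3], fields[4].split(",")))
        self._variants: Dict[str, List[Variant]] = {}
        self._positions: Dict[str, List[int]] = {}
        for chrom, variants in grouped.items():
            variants.sort(key=lambda v: v.pos)
            self._variants[chrom] = variants
            self._positions[chrom] = [v.pos for v in variants]

    def in_window(self, chrom: str, start: int, end: int) -> List[Variant]:
        """Variants with start <= POS <= end (1-based, inclusive).

        Only POS is compared, so a deletion starting before `start`
        is not reported even if it overlaps the window.
        """
        positions = self._positions.get(chrom, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        return self._variants.get(chrom, [])[lo:hi]
```

Keeping a separate sorted list of positions per chromosome makes each window lookup logarithmic in the number of variants, at the cost of holding every position in RAM.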

Scoring Splice Sites with a Custom geneid Profile
astalavista -t scorer -i <annotation.gtf> --gid <profile>

where <profile> is a geneid profile supplying the parameters of the HMM.
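
Splice-site models of this kind are commonly expressed as position weight matrices. As a rough illustration of how such a profile turns a sequence into a score, the sketch below computes a log-odds score for a candidate donor site. `TOY_DONOR` and `log_odds_score` are hypothetical, the probabilities are invented, and the code neither reads geneid's parameter-file format nor reproduces geneid's scoring.

```python
import math
from typing import Dict, List

# Invented toy matrix (NOT from a geneid profile): one probability table per
# position, covering the last exonic base, the GT dinucleotide and intron base +3.
TOY_DONOR: List[Dict[str, float]] = [
    {"A": 0.3, "C": 0.1, "G": 0.5, "T": 0.1},
    {"G": 1.0},
    {"T": 1.0},
    {"A": 0.6, "C": 0.05, "G": 0.3, "T": 0.05},
]


def log_odds_score(site: str, pwm: List[Dict[str, float]],
                   background: float = 0.25) -> float:
    """Sum over positions of log(P(base | site model) / P(base | background))."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the matrix length")
    score = 0.0
    for base, probs in zip(site.upper(), pwm):
        p = probs.get(base, 0.0)
        if p <= 0.0:
            return float("-inf")  # base never allowed at this position
        score += math.log(p / background)
    return score
```

In this toy model, a variant that changes the GT dinucleotide (for example GGTA to GATA) drives the score to minus infinity, which is the kind of effect the polymorphism-aware scoring described above is meant to capture.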

Requirements

Hardware

For speed, the positions of all genetic variants are loaded into main memory (RAM), so make sure the Java Virtual Machine is given enough memory. As a guideline, the variants from phases 1 and 2 of the 1000 Genomes Project for chr22 alone occupy 6.4 GB on disk and require about 1.5 GB of RAM for splice site scoring.
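
A quick way to size the JVM heap for other data is to extrapolate from the chr22 figures. The sketch below assumes that memory grows linearly with VCF size on disk, which is only a rough proxy: disk size also grows with the number of genotype columns, while the scorer keeps variant positions in memory. The function names and the 1.5x safety factor are hypothetical choices. `-Xmx` is the standard HotSpot option for the maximum heap size; how it reaches astalavista's JVM depends on the launcher script.

```python
import math

# Reference point from the text above: chr22, 1000 Genomes phases 1 and 2.
REF_VCF_GB = 6.4    # VCF size on disk
REF_HEAP_GB = 1.5   # memory needed for splice site scoring


def estimate_heap_gb(vcf_size_gb: float, safety_factor: float = 1.5) -> float:
    """Scale the chr22 data point linearly (ASSUMPTION) and add headroom."""
    if vcf_size_gb < 0:
        raise ValueError("VCF size must be non-negative")
    return vcf_size_gb / REF_VCF_GB * REF_HEAP_GB * safety_factor


def xmx_flag(heap_gb: float) -> str:
    """Round up to whole gigabytes and format as a JVM maximum-heap option."""
    return f"-Xmx{max(1, math.ceil(heap_gb))}g"
```

With the default safety factor, the chr22 data set itself (6.4 GB) yields an estimate of 2.25 GB, rounded up to `-Xmx3g`.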

...