...
where <annotation.gtf> is the transcriptome annotation (see GTF format) containing the splice sites to be scored, and <genome-folder> is the path to the directory containing the genomic sequences, one FASTA file per reference sequence (e.g., chromosome, contig, or scaffold).
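Before a long run it can help to check that every reference sequence named in the annotation has a FASTA file in the genome folder. The following is a minimal sketch, not part of AStalavista: it assumes that each FASTA file is named after the sequence name in the first GTF column (for example `chr22.fa`), and the accepted suffixes in `FASTA_SUFFIXES` are likewise an assumption. The function names are hypothetical.

```python
from pathlib import Path

# Assumed FASTA file extensions; adjust to match your genome folder.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def gtf_seqnames(gtf_path):
    """Return the set of reference sequence names (GTF column 1)."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            # Skip blank lines and comment lines.
            if not line.strip() or line.startswith("#"):
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_fasta(gtf_path, genome_folder):
    """List seqnames with no '<seqname><suffix>' file in genome_folder.

    Assumes files are named after the GTF sequence name (hypothetical
    convention; verify against your own setup).
    """
    present = {
        p.name[: -len(suffix)]
        for p in Path(genome_folder).iterdir()
        for suffix in FASTA_SUFFIXES
        if p.is_file() and p.name.endswith(suffix)
    }
    return sorted(gtf_seqnames(gtf_path) - present)
```

Calling `missing_fasta("annotation.gtf", "genome/")` returns a sorted list of sequence names without a matching file; an empty list means every annotated sequence was found under the assumed naming scheme.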
Code Block |
---|
astalavista -t scorer -i <annotation.gtf> -c <genome-folder> --vcf <vcf-file> |
where <vcf-file> is a file in VCF (Variant Call Format) that describes polymorphisms of the genomic sequence.
Code Block |
---|
astalavista -t scorer -i <annotation.gtf> --gid <profile> |
where <profile> is a geneid profile for the HMM.
For time efficiency, the positions of all genetic variants are loaded into memory (RAM), so make sure the Java Virtual Machine is given enough memory. As a rough guide, the variants from the 1000 Genomes Project phase 1 and phase 2 for chr22 alone occupy 6.4 GB on disk and require ~1.5 GB of memory for splice site scoring.
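To get a feel for how many variant positions would be loaded, the records in a VCF file can be counted per reference sequence beforehand. The sketch below is a standalone helper, not part of AStalavista, and `count_variants` is a hypothetical name. It relies only on the VCF convention that meta-information and header lines start with `#` and that the first tab-separated column of each record is the sequence name. The record count is only a rough proxy: how memory use scales with the number of variants is not stated here.

```python
import gzip
from collections import Counter


def count_variants(vcf_path):
    """Count VCF data records per reference sequence (CHROM column)."""
    # Read gzip-compressed files transparently, based on the file name.
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Lines starting with '#' are meta-information or the header.
            if line.startswith("#") or not line.strip():
                continue
            counts[line.split("\t", 1)[0]] += 1
    return counts
```

For example, `count_variants("variants.vcf.gz")` returns a `Counter` mapping each sequence name to its number of records, which can be compared across chromosomes before choosing how much memory to give the JVM.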
...