
...

  1. R. Guigó, "Assembling genes from predicted exons in linear time with dynamic programming", Journal of Computational Biology, 5:681-702 (1998).
  2. R. Guigó, S. Knudsen, N. Drake, and T. F. Smith, "Prediction of gene structure", Journal of Molecular Biology, 226:141-157 (1992).
  3. E. Blanco, G. Parra, and R. Guigó, "Using geneid to Identify Genes", in Current Protocols in Bioinformatics, Unit 4.3 (A. Baxevanis, editor), John Wiley & Sons Inc., New York (2002).
  4. G. Parra, E. Blanco, and R. Guigó, "Geneid in Drosophila", Genome Research 10(4):511-515 (2000).

...

Examples

Scoring Splice Sites of a given annotation employing the default model (i.e., human)

...

where <annotation.gtf> is the transcriptome annotation (see GTF format) that contains the splice sites to be scored, and <genome-folder> is the path to the directory containing the genomic sequences, with one FASTA file per reference sequence (e.g., chromosome, contig, or scaffold).
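
The folder layout described above can be checked before starting a run. The following is a minimal sketch and not part of astalavista: the helper name missing_fasta is hypothetical, and the file naming <seqname>.fa is an assumption, since the naming convention expected for the FASTA files is not stated here.

```shell
# Hypothetical helper: print every reference sequence named in a GTF file
# for which no FASTA file is found in the genome folder.
# ASSUMPTION: FASTA files are named <seqname>.fa; change the extension
# below if your genome folder uses a different convention.
missing_fasta() {
    local gtf="$1"
    local genome_dir="$2"
    # Column 1 of a GTF record holds the reference sequence name;
    # lines starting with '#' are comments and are skipped.
    grep -v '^#' "$gtf" | cut -f1 | sort -u | while IFS= read -r seq; do
        [ -n "$seq" ] || continue
        [ -f "$genome_dir/$seq.fa" ] || printf '%s\n' "$seq"
    done
}
```

Calling missing_fasta <annotation.gtf> <genome-folder> prints one sequence name per line; no output means a file was found for every sequence in the annotation.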

Scoring Splice Sites with polymorphisms
Code Block
astalavista -t scorer -i <annotation.gtf> -c <genome-folder> --vcf <vcf-file>

where <vcf-file> is a file in VCF (Variant Call Format) that describes polymorphisms of the genomic sequence.

Scoring Splice Sites with a custom geneid profile
Code Block
astalavista -t scorer -i <annotation.gtf> --gid <profile>

where <profile> is a geneid profile for the HMM.

By default, the program writes its output in VCF format to a file "<annotation>_sites.vcf" in the same directory as the provided transcriptome annotation <annotation.gtf>. The output file can be changed with the command line flag -f.
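
To locate the default output from a script, the naming rule above can be sketched as follows. This is an illustration only: the helper name default_sites_path is hypothetical, and it assumes that <annotation> stands for the annotation path with its .gtf suffix removed, which is an interpretation of the naming pattern rather than something stated explicitly.

```shell
# Hypothetical helper: derive the default output path from the annotation path.
# ASSUMPTION: "<annotation>" is the input path without its ".gtf" suffix.
default_sites_path() {
    local gtf="$1"
    # ${gtf%.gtf} strips a trailing ".gtf" and keeps the directory part,
    # so the result lies next to the annotation file.
    printf '%s_sites.vcf\n' "${gtf%.gtf}"
}
```

For example, default_sites_path /data/hg19/annot.gtf prints /data/hg19/annot_sites.vcf. Other suffixes, such as compressed annotations, are not handled by this sketch.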

Requirements

Hardware

For time efficiency, the positions of all genetic variants are loaded into the computer's memory (RAM), so make sure that enough memory is available to the Java Virtual Machine. As a rough guide, the variants from the 1000 Genomes Project phase 1 and phase 2 for chr22 alone occupy 6.4 GB on disk and require ~1.5 GB of memory for splice site scoring.
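
The chr22 figures above can be turned into a rough first guess for other variant files. The sketch below is not part of astalavista, and the helper name estimate_heap_mb is hypothetical. It assumes that memory use grows linearly with the size of the variant file and that the file is stored in the same form as the one behind the chr22 figures; neither assumption is confirmed here, so treat the result as a starting point only.

```shell
# Hypothetical helper: rough memory estimate (in MiB) for a variant file.
# ASSUMPTION: memory scales linearly with file size, using the chr22 ratio
# quoted above (about 1.5 GB of memory per 6.4 GB on disk, i.e. 15/64).
estimate_heap_mb() {
    local vcf="$1"
    local bytes
    bytes=$(( $(wc -c < "$vcf") ))
    # Integer arithmetic: bytes * 15/64, then converted from bytes to MiB.
    echo $(( bytes * 15 / 64 / 1048576 ))
}
```

Running estimate_heap_mb <vcf-file> prints a single integer that can serve as a lower bound when choosing the memory given to the Java Virtual Machine.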

...