Summary

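In the command lines below, parameters without brackets are mandatory for the respective mode, and parameters in square brackets are optional. A pipe ("|") separates alternatives: either mutually exclusive options, as in [--all | -g], or the long and short forms of the same option, as in --reference|-r.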

Search Alternatively Spliced Domains

astalavista -t astafunk [--verbose] [--cpu <INT>] [--all | -g] [--local] [-o <INT>] --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE> --reference|-r <REFERENCE_FILE>

Search Constitutive Domains

astalavista -t astafunk [--verbose] [--cpu <INT>] [--local] [-o <INT>] --const --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE> --reference|-r <REFERENCE_FILE>

Note: for alternatively spliced (AS) genes, the current version of this mode searches for constitutive domains only on the reference transcript (the transcript with the longest ORF).

Search AS domains exhaustively

Searches the whole HMM database exhaustively against the variant sequences, i.e., without a reference domain file.

astalavista -t astafunk [--verbose] [--cpu <INT>] [--all | -g] [--local] [-o <INT>] -e|--exh --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE>

Naïve Search

astalavista -t astafunk [--verbose] [--cpu <INT>] [--local] [-o <INT>] --naive --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE> --reference|-r <REFERENCE_FILE>

Print the reference transcript sequences

astalavista -t astafunk --tref --genome <GENOME_DIR> --gtf <GTF_FILE>

Search HMM database against FASTA sequences

astalavista -t astafunk [--local] [-o <INT>] --test --hmm <HMM_FILE> --fa <SEQUENCE_FILE>

How to Create a Reference File

<REFERENCE_FILE> is computed with hmmsearch, from the HMMER package, using the command line below:

$> hmmsearch --cut_ga --domtblout <REFERENCE_FILE> <HMM_FILE> <REFERENCE_TRANSCRIPTS.fasta>

hmmsearch is the HMMER program (hmmer.org) that searches one or more profile HMMs, here from the Pfam-A.hmm database, against a set of sequences, here the amino acid sequences of the reference transcripts in <REFERENCE_TRANSCRIPTS.fasta> (see the next section). The --cut_ga option tells hmmsearch to use the gathering (GA) thresholds stored in each profile to decide which hits are reported. The --domtblout option saves a parseable table of per-domain hits to <REFERENCE_FILE>. The reference transcript of a gene is its transcript with the longest ORF.
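To inspect <REFERENCE_FILE> by eye, the sketch below prints one line per domain hit. It assumes the standard HMMER3 --domtblout layout (whitespace-separated columns, comment lines starting with "#"; column 1 is the target sequence, 4 the profile name, 5 the profile accession, 13 the independent E-value, 20 and 21 the envelope start and end on the target). The function name summarize_domtbl is hypothetical and not part of AstaFunk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a hmmsearch --domtblout table.
# Assumes the HMMER3 per-domain table layout described above.
summarize_domtbl() {
    local domtbl="$1"
    # Skip comment lines; for each data line print transcript, profile name,
    # profile accession, i-Evalue and envelope coordinates (cols 1, 4, 5, 13, 20-21).
    awk 'BEGIN { OFS = "\t" }
         /^#/ { next }
         NF >= 22 { print $1, $4, $5, $13, $20 "-" $21 }' "$domtbl"
}

# Usage: summarize_domtbl <REFERENCE_FILE>
```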

Using AstaFunk to Generate a Multi-fasta File with the Reference Transcripts

AstaFunk includes a feature to generate a multi-fasta file with the amino acid sequences of reference transcripts for a given annotation.

First, run AstaFunk to print the amino acid sequences of the reference transcripts to standard output, redirecting the output to the file <REFERENCE_TRANSCRIPTS.fasta>. A reference transcript is the transcript with the longest open reading frame (ORF) of an alternatively spliced gene.

Obtain the reference transcript FASTA file with the command:

$> astalavista -t astafunk --tref --genome <GENOME_DIR> --gtf <GTF_FILE.gtf> > <REFERENCE_TRANSCRIPTS.fasta>
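Before running hmmsearch, a quick sanity check is to confirm that <REFERENCE_TRANSCRIPTS.fasta> is not empty and that every record has a sequence. The awk-based sketch below prints each record's identifier (the first word of its header) and its sequence length; the function name fasta_lengths is hypothetical, and stripping a "*" stop symbol is only a precaution, not a statement about AstaFunk's output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<id><TAB><length>" for each FASTA record.
fasta_lengths() {
    awk 'BEGIN { OFS = "\t" }
         /^>/ {
             # Flush the previous record before starting a new one.
             if (id != "") print id, len
             id = substr($1, 2); len = 0
             next
         }
         {
             # Ignore spaces, tabs, carriage returns and "*" stop symbols.
             gsub(/[ \t\r*]/, "")
             len += length($0)
         }
         END { if (id != "") print id, len }' "$1"
}

# Usage: fasta_lengths <REFERENCE_TRANSCRIPTS.fasta>
```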

Getting Started

Searching for protein domains in alternatively spliced regions of the human gene TNNT1

According to RefSeq (NM_003283):

This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration.

Input

Annotation of eight alternative transcripts from GENCODE Basic v24
Chromosome 19 FASTA file from GRCh38/hg38
Reference file
HMM file

Command line

Obtaining the reference transcript sequences

$> astalavista -t astafunk --tref --gtf tnnt1.gtf --genome ~/example/genome/ > reference_tx.fasta

Creating the reference file

$> hmmsearch --cut_ga --domtblout reference_file ~/Databases/Pfam/Pfam-A.hmm reference_tx.fasta

Obtaining a reduced HMM file

$> grep -v "^#" reference_file | awk '{print $5}' | sort | uniq > list-hmm-tnnt1
$> hmmfetch -f ~/Databases/Pfam/Pfam-A.hmm list-hmm-tnnt1 > database.hmm

Alternatively, skip these commands and pass the whole Pfam-A.hmm database directly to the --hmm option.
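If you build the reduced file, one way to confirm that database.hmm contains every profile listed in list-hmm-tnnt1 is to compare the list against the ACC header lines of the HMMER3 file. The helper missing_profiles below is hypothetical; it assumes the list holds Pfam accessions (column 5 of the --domtblout table) and that each profile carries an "ACC" line, and it prints nothing when all profiles were fetched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print list entries that have no matching "ACC" line
# in the HMMER3 database. Assumes the list contains Pfam accessions.
missing_profiles() {
    local list="$1" hmmdb="$2"
    # First file: collect accessions present in the HMM database.
    # Second file: print non-empty list entries not collected above.
    awk 'FILENAME == ARGV[1] { if ($1 == "ACC") have[$2] = 1; next }
         NF && !($1 in have) { print $1 }' "$hmmdb" "$list"
}

# Usage: missing_profiles list-hmm-tnnt1 database.hmm
```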

Running AstaFunk to obtain alternatively spliced domains

$> astalavista -t astafunk --genome ~/example/genome/ --gtf tnnt1.gtf --reference reference_file --hmm database.hmm
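The steps of this example can be chained into one script that stops at the first failing command. This is a minimal sketch: the paths follow the TNNT1 commands, the hmmsearch call uses --cut_ga as in "How to Create a Reference File", and both the function name run_tnnt1_example and the output file tnnt1_as_domains.txt are hypothetical; capturing AstaFunk's domain results from standard output is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: run the TNNT1 Getting Started steps end to end.
# Abort on errors, unset variables and failures inside pipelines.
set -euo pipefail

run_tnnt1_example() {
    local genome="$HOME/example/genome/"
    local pfam="$HOME/Databases/Pfam/Pfam-A.hmm"

    # 1. Reference transcript sequences (longest ORF per gene).
    astalavista -t astafunk --tref --gtf tnnt1.gtf --genome "$genome" > reference_tx.fasta

    # 2. Per-domain hits; the human-readable report is discarded,
    #    the parseable table goes to reference_file.
    hmmsearch --cut_ga --domtblout reference_file "$pfam" reference_tx.fasta > /dev/null

    # 3. Reduced HMM database with only the profiles hit in step 2.
    grep -v '^#' reference_file | awk '{print $5}' | sort -u > list-hmm-tnnt1
    hmmfetch -f "$pfam" list-hmm-tnnt1 > database.hmm

    # 4. Alternatively spliced domains (output file name is hypothetical).
    astalavista -t astafunk --genome "$genome" --gtf tnnt1.gtf \
        --reference reference_file --hmm database.hmm > tnnt1_as_domains.txt
}
```

Running run_tnnt1_example in the directory containing tnnt1.gtf produces the intermediate files with the same names used in the commands above.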

Extra: Constitutive Domains

Obtaining constitutive domains

$> astalavista -t astafunk --const --genome ~/example/genome/ --gtf tnnt1.gtf --reference reference_file --hmm database.hmm

Documentation of the Java source code

The complete Javadoc of barna is available at http://sammeth.net/jenkins/job/barna-devel/javadoc/. The AstaFunk documentation is in the barna.astafunk.* packages.
