Parameters without brackets are mandatory for the respective mode in the boxes below. Parameters separated by a pipe ("|") are mutually exclusive. Parameters in brackets are optional.
astalavista -t astafunk [--verbose] [--cpu <INT>] [--all | -g] [--local] [-o <INT>] --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE> --reference|-r <REFERENCE_FILE>
astalavista -t astafunk [--verbose] [--cpu <INT>] [--local] [-o <INT>] --const --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE> --reference|-r <REFERENCE_FILE>
Note: For alternatively spliced (AS) genes, the current version of this mode searches for constitutive domains only on the reference transcript (the one with the longest ORF).
Exhaustively searches the HMM database against the variant sequences, i.e., without a reference domain file.
astalavista -t astafunk [--verbose] [--cpu <INT>] [--all | -g] [--local] [-o <INT>] -e|--exh --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE>
astalavista -t astafunk [--verbose] [--cpu <INT>] [--local] [-o <INT>] --naive --genome <GENOME_DIR> --gtf <GTF_FILE> --hmm <HMM_FILE> --reference|-r <REFERENCE_FILE>
astalavista -t astafunk --tref --genome <GENOME_DIR> --gtf <GTF_FILE>
astalavista -t astafunk [--local] [-o <INT>] --test --hmm <HMM_FILE> --fa <SEQUENCE_FILE>
<REFERENCE_FILE> is produced by hmmsearch, from the HMMER package, using the command line below:
$> hmmsearch --cut_ga --domtblout <REFERENCE_FILE> <HMM_FILE> <REFERENCE_TRANSCRIPTS.fasta>
hmmsearch is the HMMER program (hmmer.org) that searches one or more profiles (here, from the Pfam-A.hmm database) against the amino acid sequences of the reference transcripts (in <REFERENCE_TRANSCRIPTS.fasta>, see help below). The --cut_ga option makes hmmsearch apply the gathering thresholds stored in the HMM profiles when reporting domains. The --domtblout option saves a parseable table of per-domain hits to <REFERENCE_FILE>. The reference transcript is the transcript with the longest ORF of a gene.
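To inspect the per-domain table before passing it to AstaFunk, a small awk filter can help. The sketch below assumes the standard whitespace-separated --domtblout layout of HMMER 3 (target name in column 1, query name and accession in columns 4 and 5, alignment start and end on the target in columns 18 and 19). The function name list_domain_hits is hypothetical and not part of AstaFunk or HMMER.

```shell
# Hypothetical helper: summarize a hmmsearch --domtblout table.
# Prints, tab-separated: transcript, Pfam accession, domain name,
# alignment start and alignment end on the transcript.
list_domain_hits() {
    # Skip comment lines (starting with '#') and truncated rows.
    awk '!/^#/ && NF >= 22 {
        print $1 "\t" $5 "\t" $4 "\t" $18 "\t" $19
    }' "$1"
}
```

For example, `list_domain_hits <REFERENCE_FILE>` prints one line per domain hit found on a reference transcript.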
AstaFunk includes a feature to generate a multi-fasta file with the amino acid sequences of reference transcripts for a given annotation.
First, run AstaFunk to print the amino acid sequences of the reference transcripts to standard output, redirected to the file <REFERENCE_TRANSCRIPTS.fasta>. A reference transcript is the transcript with the longest Open Reading Frame (ORF) of an alternatively spliced gene.
Obtain the reference transcript FASTA file with the command:
$> astalavista -t astafunk --tref --genome <GENOME_DIR> --gtf <GTF_FILE.gtf> > <REFERENCE_TRANSCRIPTS.fasta>
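Before running hmmsearch, it can be worth checking that the FASTA file is not empty and that each record carries a sequence. The following is a minimal sketch that lists every record with its sequence length. It assumes that the first whitespace-delimited token after ">" is the transcript identifier, which is a guess about the header format, and the function name fasta_lengths is hypothetical.

```shell
# Hypothetical helper: print "<identifier><TAB><sequence length>" for each
# record of a (multi-)FASTA file. Sequences may span several lines.
fasta_lengths() {
    awk '
        /^>/ {
            if (seen) print id "\t" len   # flush the previous record
            seen = 1
            id = substr($1, 2)            # header token without ">"
            len = 0
            next
        }
        seen { len += length($0) }        # accumulate residues
        END  { if (seen) print id "\t" len }
    ' "$1"
}
```

For example, `fasta_lengths <REFERENCE_TRANSCRIPTS.fasta>` prints one line per reference transcript. A length of 0 points to a record without sequence.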
According to RefSeq (NM_003283),
This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration.
$> astalavista -t astafunk --tref --gtf tnnt1.gtf --genome ~/example/genome/ > reference_tx.fasta
$> hmmsearch --domtblout reference_file ~/Databases/Pfam/Pfam-A.hmm reference_tx.fasta
Obtaining a reduced HMM file
$> grep -v "#" reference_file | awk '{print $5}' | sort | uniq > list-hmm-tnnt1
$> hmmfetch -f Pfam-A.hmm list-hmm-tnnt1 > database.hmm
or skip these commands and use the whole Pfam-A.hmm database directly as the argument of the --hmm option.
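If you build the reduced database, a quick consistency check is to compare the number of keys in the list with the number of profiles that hmmfetch wrote. The sketch below relies on the HMMER3 text format, in which every profile has exactly one header line starting with "NAME". The function name check_hmm_subset is hypothetical.

```shell
# Hypothetical helper: compare the number of requested keys with the
# number of profiles in the fetched HMM file.
# Usage: check_hmm_subset <key list> <HMM file>
check_hmm_subset() {
    # grep -c exits non-zero on a count of 0, hence the "|| true".
    n_keys=$(grep -c . "$1" || true)          # non-empty lines in the list
    n_models=$(grep -c '^NAME ' "$2" || true) # one NAME line per profile
    if [ "$n_keys" -eq "$n_models" ]; then
        echo "ok: $n_models models"
    else
        echo "mismatch: $n_keys keys, $n_models models"
        return 1
    fi
}
```

For the example above, this would be called as `check_hmm_subset list-hmm-tnnt1 database.hmm`. A mismatch suggests that some keys were not found in Pfam-A.hmm.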
astalavista -t astafunk --genome ~/example/genome/ --gtf tnnt1.gtf --reference reference_file --hmm database.hmm
astalavista -t astafunk --const --genome ~/example/genome/ --gtf tnnt1.gtf --reference reference_file --hmm database.hmm
The complete javadoc of barna is available at http://sammeth.net/jenkins/job/barna-devel/javadoc/. The AstaFunk documentation is in the packages barna.astafunk.*