How to Create a Reference File

<REFERENCE_FILE> is produced by hmmsearch, a program of the HMMER package, using the command line below:

 

$> hmmsearch --cut_ga --domtblout <REFERENCE_FILE> <HMM_FILE> <REFERENCE_TRANSCRIPTS.fasta>

 

hmmsearch is the HMMER program (hmmer.org) that searches one or more profiles (here, from the Pfam-A.hmm database) against a set of sequences, in this case the amino acid sequences of the reference transcripts (in <REFERENCE_TRANSCRIPTS.fasta>, see help below). The --cut_ga option makes hmmsearch apply the gathering thresholds stored in each HMM profile when reporting hits. The --domtblout option saves a parseable table of per-domain hits to <REFERENCE_FILE>. The reference transcript is the transcript with the longest ORF of a gene.
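To inspect the per-domain table, a minimal sketch such as the following can list the hits in compact form. It assumes the standard HMMER 3 --domtblout column order (target name in column 1, profile accession in column 5, domain score in column 14, envelope start and end in columns 20 and 21). The function name summarize_domtblout is made up for illustration and is not part of HMMER or AstaFunk.

```shell
# Hypothetical helper (not part of HMMER or AstaFunk): list per-domain hits.
# Assumes the HMMER 3 --domtblout column layout.
summarize_domtblout() {
    # Skip comment lines, then print tab-separated:
    # transcript id, profile accession, domain score, envelope start, envelope end
    grep -v '^#' "$1" | awk '{ print $1 "\t" $5 "\t" $14 "\t" $20 "\t" $21 }'
}

# Usage, with the table written by the hmmsearch command above:
# summarize_domtblout <REFERENCE_FILE>
```

Each output line corresponds to one domain hit on one reference transcript, so a transcript with several domains appears several times.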


Using AstaFunk to Generate a Multi-fasta File with the Reference Transcripts

AstaFunk includes a feature to generate a multi-fasta file with the amino acid sequences of reference transcripts for a given annotation.

First, run AstaFunk so that it prints the amino acid sequences of the reference transcripts to standard output, and redirect that output to the file <REFERENCE_TRANSCRIPTS.fasta>. A reference transcript is the transcript with the longest Open Reading Frame (ORF) of an alternatively spliced gene.

Obtain the reference transcript FASTA file with the command:

 

$> astalavista -t astafunk --tref --genome <GENOME_DIR> --gtf <GTF_FILE.gtf> > <REFERENCE_TRANSCRIPTS.fasta>
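Because the sequences arrive through a shell redirect, it can be worth checking that the resulting file actually contains FASTA records before running hmmsearch on it. The sketch below counts header lines; the function name count_fasta_records is hypothetical and not an AstaFunk feature.

```shell
# Hypothetical helper (not part of AstaFunk): count the sequences in a FASTA file.
# Every FASTA record starts with a header line beginning with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage, on the file written by the command above:
# count_fasta_records <REFERENCE_TRANSCRIPTS.fasta>
```

A count of 0 suggests that the run failed or that the output went somewhere other than the redirected file.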

Getting Started

Searching protein domains on alternatively spliced regions of human gene TNNT1

According to RefSeq (NM_003283):

This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration.


Input

  • Annotation of eight alternative transcripts from GENCODE Basic v24 (Download)
  • Chromosome 19 FASTA file from GRCh38/hg38 (Download)
  • Reference file (Download)
  • HMM file (Download)

Command line

 

Obtaining reference transcript sequence
$> astalavista -t astafunk --tref --gtf tnnt1.gtf --genome ~/example/genome/ > reference_tx.fasta
Creating reference file
$> hmmsearch --cut_ga --domtblout reference_file ~/Databases/Pfam/Pfam-A.hmm reference_tx.fasta

Obtaining a reduced HMM file

$> grep -v "#" reference_file | awk '{print $5}' | sort | uniq > list-hmm-tnnt1
$> hmmfetch -f ~/Databases/Pfam/Pfam-A.hmm list-hmm-tnnt1 > database.hmm

 

Alternatively, skip these commands and pass the whole Pfam-A.hmm database directly to the --hmm option.
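If the reduced HMM file is built, one simple check is to compare the number of accessions requested with the number of profiles actually fetched. The sketch below assumes HMMER 3 text-format profiles, in which each profile carries exactly one line starting with NAME. The function name check_hmm_subset is hypothetical.

```shell
# Hypothetical helper (not part of HMMER or AstaFunk): compare the key list
# given to hmmfetch with the profiles present in the fetched HMM file.
# Assumes HMMER 3 text format, with one "NAME" line per profile.
check_hmm_subset() {
    wanted=$(grep -c . "$1")          # non-empty lines in the key list
    fetched=$(grep -c '^NAME' "$2")   # profiles in the HMM file
    echo "requested=$wanted fetched=$fetched"
    # Succeed only when every requested key yielded a profile
    [ "$wanted" -eq "$fetched" ]
}

# Usage, with the files created above:
# check_hmm_subset list-hmm-tnnt1 database.hmm
```

A mismatch usually means that some keys in the list were not found in the source database.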

Running AstaFunk to obtain alternatively spliced domains
$> astalavista -t astafunk --genome ~/example/genome/ --gtf tnnt1.gtf --reference reference_file --hmm database.hmm

Extra: Constitutive Domains

 

Obtaining constitutive domains
$> astalavista -t astafunk --const --genome ~/example/genome/ --gtf tnnt1.gtf --reference reference_file --hmm database.hmm