
Searching for protein domains in alternatively spliced regions of the human gene TNNT1

According to RefSeq (NM_003283):

This gene encodes a protein that is a subunit of troponin, which is a regulatory complex located on the thin filament of the sarcomere. This complex regulates striated muscle contraction in response to fluctuations in intracellular calcium concentration.


Input

  • Annotation of eight alternative transcripts from GENCODE Basic v24 (Download)
  • Chromosome 19 FASTA file from GRCh38/hg38 (Download)
  • Reference file (Download)
  • HMM file (Download)
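A quick way to confirm that the annotation really holds eight transcripts is to count the distinct transcript_id values in the GTF file. This is a minimal sketch. It assumes the annotation is saved as tnnt1.gtf (the name used in the commands below) and carries a transcript_id attribute in column 9, as GENCODE files do. The function name count_transcripts is hypothetical.

```shell
# count_transcripts (hypothetical helper): print the number of distinct
# transcript_id values found anywhere in a GTF file.
count_transcripts() {
  local gtf="$1"
  grep -o 'transcript_id "[^"]*"' "$gtf" \
    | sort -u \
    | wc -l \
    | tr -d ' '   # BSD wc pads the count with spaces
}
```

Usage: count_transcripts tnnt1.gtf. For the annotation described above, it should print 8.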

Command lines

 

Obtaining reference transcript sequence
$> astalavista -t astafunk --tref --gtf tnnt1.gtf --genome ~/example/genome/ > reference_tx.fasta
Creating reference file
$> hmmsearch --domtblout reference_file ~/Databases/Pfam/Pfam-A.hmm reference_tx.fasta
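The --domtblout file written by hmmsearch is a whitespace-delimited table. It has one line per domain hit, and its comment lines start with #. In HMMER 3 the first five columns are target name, target accession, target length, query name and query accession. Because the query of hmmsearch is the profile HMM, column 5 holds the Pfam accession (e.g. PF00992.17), and that is the column the next step extracts.

The sketch below prints a compact summary of each hit. The column positions follow the HMMER 3 domain table layout: 14 is the domain score, 16 and 17 the HMM coordinates, and 18 and 19 the alignment coordinates. The helper name summarize_domtbl is hypothetical.

```shell
# summarize_domtbl (hypothetical helper): one line per domain hit from an
# HMMER 3 --domtblout file: target, Pfam accession, domain score,
# HMM coordinates and alignment coordinates on the target sequence.
summarize_domtbl() {
  local domtbl="$1"
  awk 'BEGIN { OFS = "\t"; print "target", "pfam_acc", "dom_score", "hmm_from", "hmm_to", "ali_from", "ali_to" }
       /^#/ { next }   # skip comment lines
       { print $1, $5, $14, $16, $17, $18, $19 }' "$domtbl"
}
```

Usage: summarize_domtbl reference_file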

Obtaining a reduced HMM file

$> grep -v "^#" reference_file | awk '{print $5}' | sort -u | hmmfetch -f ~/Databases/Pfam/Pfam-A.hmm - > database.hmm

 

Alternatively, skip the reduced-HMM step and pass the whole Pfam-A.hmm database directly to the --hmm option.
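If the reduced file is used, check that every Pfam accession listed in reference_file was actually fetched into database.hmm. Models can go missing if, for instance, hmmfetch was pointed at a different Pfam-A.hmm release than hmmsearch. HMMER 3 profile files store each model's accession on a line beginning with ACC, so the sketch below compares the two accession lists with comm. It needs bash for process substitution. The helper name check_hmm_subset is hypothetical.

```shell
# check_hmm_subset (hypothetical helper): report Pfam accessions listed in a
# --domtblout file (column 5) that are missing from an HMMER 3 profile file.
# Prints nothing and returns 0 when every accession is present.
check_hmm_subset() {
  local domtbl="$1" hmmfile="$2"
  local missing
  # comm -23 keeps lines that appear only in the first (sorted) list
  missing=$(comm -23 \
    <(grep -v '^#' "$domtbl" | awk '{print $5}' | sort -u) \
    <(awk '$1 == "ACC" {print $2}' "$hmmfile" | sort -u))
  if [ -n "$missing" ]; then
    printf 'missing: %s\n' $missing   # unquoted on purpose: one line per accession
    return 1
  fi
}
```

Usage: check_hmm_subset reference_file database.hmm. It prints any missing accessions and returns a non-zero status when something is missing.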

Running AstaFunk to obtain alternatively spliced domains
$> astalavista -t astafunk --genome ~/example/genome/ --gtf tnnt1.gtf --reference reference_file --hmm database.hmm
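The steps above can be chained into a single function, so that a failing step stops the run instead of feeding an empty file to the next one. This sketch only reuses the commands shown on this page. The function name, its default arguments, and the file name output for the final table (the name read by the BED conversion further down) are assumptions. It further assumes that AstaFunk writes its result table to standard output, as the --tref step does.

```shell
# run_tnnt1_astafunk (hypothetical wrapper): runs the four commands of this page
# in order and stops at the first failing step.
# Assumptions: astalavista, hmmsearch and hmmfetch are on PATH, and AstaFunk
# prints its result table to standard output.
run_tnnt1_astafunk() (
  set -o pipefail   # a pipeline fails if any of its stages fails
  gtf="${1:-tnnt1.gtf}"
  genome="${2:-$HOME/example/genome/}"
  pfam="${3:-$HOME/Databases/Pfam/Pfam-A.hmm}"

  # 1. Reference transcript sequences
  astalavista -t astafunk --tref --gtf "$gtf" --genome "$genome" \
    > reference_tx.fasta || exit 1
  # 2. Reference file: per-domain hits in --domtblout format
  hmmsearch --domtblout reference_file "$pfam" reference_tx.fasta \
    > /dev/null || exit 1
  # 3. Reduced HMM file with only the Pfam families hit in step 2
  grep -v '^#' reference_file | awk '{print $5}' | sort -u \
    | hmmfetch -f "$pfam" - > database.hmm || exit 1
  # 4. Domains affected by alternative splicing
  astalavista -t astafunk --genome "$genome" --gtf "$gtf" \
    --reference reference_file --hmm database.hmm > output || exit 1
)
```

Usage: run_tnnt1_astafunk tnnt1.gtf ~/example/genome/ ~/Databases/Pfam/Pfam-A.hmm. Run it from the directory where the intermediate files should be written. The body runs in a subshell, so pipefail does not leak into the calling shell.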

Output

Download Excel Sheet

chr	gene_cluster	variant	acc	bitscore	start_seq	end_seq	start_genomic	end_genomic	first_source	last_sink	start_model	end_model	length_model	events
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000588426.5	PF00992.17	35,23096706	1	102	-55147129	-55134152	-55140883	-55149354	1	134	134	1^3-4^,2^|55147126^55147021-55147008^,55147107^ 1[4^7-8^10-11^14-,2[4^7-8^10-11^14-,2[4^7-9^14-,3[4^5-6^7-8^10-11^14-,12[,13[|55149354[55149161^55147168-55147126^55147021-55147008^55146707-,55149206[55149161^55147168-55147126^55147021-55147008^55146707-,55149206[55149161^55147168-55147107^55146707-,55149193[55149161^55148093-55147999^55147168-55147126^55147021-55147008^55146707-,55146949[,55146942[ 0,1-2^|,55146466-55146434^ 1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000588981.5	PF00992.17	98,64654795	85	205	-55141239	-55134200	-55140883	-55149354	1	134	134	1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000587465.6	PF00992.17	139,0575467	1	135	-55141281	-55134152	-55140883	-55149354	1	134	134	1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000585321.6	PF00992.17	139,0575467	1	135	-55141281	-55134152	-55140883	-55149354	1	134	134	1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000536926.5	PF00992.17	139,0575467	1	135	-55141281	-55134152	-55140883	-55149354	1	134	134	1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000291901.12	PF00992.17	158,3576408	69	205	-55141287	-55134152	-55140883	-55149354	1	134	134	1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000356783.9	PF00992.17	158,3576408	58	194	-55141287	-55134152	-55140883	-55149354	1	134	134	1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-
chr19	ENST00000291901.12,ENST00000356783.9,ENST00000587465.6,ENST00000587758.5,ENST00000585321.6,ENST00000588981.5,ENST00000536926.5,ENST00000588426.5	ENST00000587758.5	PF00992.17	158,3576408	58	194	-55141287	-55134152	-55140883	-55149354	1	134	134	1-2^3-4^5-6^7-8^9-,3-4^5-6^7-8^9-,10-|55146466-55146434^55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55145565-55145544^55141920-55141857^55141302-55141186^55140960-,55140912-

 

 

A description of the output columns can be found in 3.4 - Tool ASTAFUNK (Prediction of functional domains impacted by AS).
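For a quick per-variant overview of the table, the sketch below lists the Pfam accession, bitscore and domain coordinates on the protein (start_seq to end_seq) for each variant. It assumes a tab-separated file named output whose first line is the header and whose columns follow the order shown above, so variant, acc, bitscore, start_seq and end_seq are columns 3 to 7. It also assumes start_seq and end_seq are 1-based inclusive positions, so the length is end_seq - start_seq + 1. The helper name summarize_astafunk is hypothetical.

```shell
# summarize_astafunk (hypothetical helper): per-variant view of AstaFunk output:
# variant, Pfam accession, bitscore, start_seq, end_seq and the domain length
# in residues (end_seq - start_seq + 1, assuming 1-based inclusive positions).
summarize_astafunk() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print "variant", "acc", "bitscore", "start_seq", "end_seq", "length"; next }
    { print $3, $4, $5, $6, $7, $7 - $6 + 1 }' "$1"
}
```

Usage: summarize_astafunk output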

To view the results in the UCSC Genome Browser, convert the AstaFunk output (saved here in a file named output) to a BED track:

$> awk -v FS="\t" -v OFS="\t" 'NR == 1 {print "track name=afunk_predictions description=\"AS protein domains\" useScore=1"; next} {if ($9 < 0) {s1 = -$9; s2 = -$8; st = "-"} else {s1 = $8; s2 = $9; st = "+"}; sc = int($5 * 10); if (sc > 1000) sc = 1000; print $1, s1 - 1, s2, $3 "-" $4, sc, st}' output > myBed.bed
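The conversion handles minus-strand hits, which the table above reports with negative genomic coordinates. It flips their sign, swaps start and end, and subtracts 1 from the start to obtain BED's 0-based chromStart. For example, start_genomic -55147129 and end_genomic -55134152 become chromStart 55134151 and chromEnd 55147129. The BED score column must lie between 0 and 1000, so a bitscore multiplied by 10 can overflow it.

The sketch below is a hypothetical sanity check for the resulting file. It skips track lines and reports lines with the wrong field count, chromStart not below chromEnd, a score outside 0 to 1000, or a strand other than + or -.

```shell
# check_bed6 (hypothetical helper): sanity-check a BED6 file such as myBed.bed.
# Skips "track" lines; returns 1 if any line has a problem.
check_bed6() {
  awk -F'\t' '
    /^track/ { next }
    NF != 6 { printf "line %d: expected 6 fields, got %d\n", NR, NF; bad = 1; next }
    $2 + 0 >= $3 + 0 { printf "line %d: chromStart >= chromEnd\n", NR; bad = 1 }
    $5 < 0 || $5 > 1000 { printf "line %d: score %s outside 0-1000\n", NR, $5; bad = 1 }
    $6 != "+" && $6 != "-" { printf "line %d: bad strand %s\n", NR, $6; bad = 1 }
    END { exit bad }' "$1"
}
```

Usage: check_bed6 myBed.bed && echo "BED looks fine"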

 
