The VCF (Variant Call Format) has been developed to standardize the storage of genetic variation, as employed in the 1000 Genomes Project. Here we collect some conventions and extensions to the VCF standard definitions for describing splice site variation.
The basic idea of our VCF adaptation for describing splice site heterogeneity is to collect all information on the variants that affect a given splice site on a single line, similar to multi-variant descriptions. We then propose to extend the standard VCF characterization with specific splice site attributes:
|Modality||constitutive or alternative splice site|
|Sequence||splice site sequence (and possibly derived sequences as altered by variants)|
|Score||thermodynamic affinity of the splice site, as expressed by its agreement with the species consensus; in our implementation we apply logarithmic measures of splice site calling probabilities, as proposed by the gene-finding approach GeneID|
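As an illustration only, the three attributes above could be carried in the INFO column of a VCF line as semicolon-separated `key=value` pairs, which is the standard VCF INFO layout. The key names `SSMOD`, `SSSEQ` and `SSSCORE` in the following Python sketch are hypothetical and chosen for this example; they are not defined by the VCF standard or taken from our implementation. The sequences and scores in the usage lines are made-up placeholder values.

```python
# Minimal sketch: encode and decode splice site attributes in a VCF INFO string.
# ASSUMPTION: the keys SSMOD, SSSEQ and SSSCORE are hypothetical names
# invented for this example; they are not part of the VCF specification.

MODALITIES = ("constitutive", "alternative")


def format_splice_info(modality, sequences, scores):
    """Render the splice site attributes as a semicolon-separated INFO string.

    sequences: the splice site sequence followed by any variant-derived sequences
    scores:    one score per sequence, in the same order
    """
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    if len(sequences) != len(scores):
        raise ValueError("need exactly one score per sequence")
    return ";".join([
        f"SSMOD={modality}",
        "SSSEQ=" + ",".join(sequences),
        "SSSCORE=" + ",".join(f"{s:.2f}" for s in scores),
    ])


def parse_splice_info(info):
    """Extract the splice site attributes from an INFO string."""
    # Flag-style INFO entries without '=' are skipped.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return {
        "modality": fields["SSMOD"],
        "sequences": fields["SSSEQ"].split(","),
        "scores": [float(x) for x in fields["SSSCORE"].split(",")],
    }


# Usage with placeholder values (not real data):
info = format_splice_info("alternative", ["GTAAGT", "GTAAAT"], [3.5, 1.25])
record = parse_splice_info(info)
```

Keeping sequences and scores as parallel comma-separated lists mirrors the way VCF already handles per-allele values, so the reference splice site and each variant-derived sequence stay aligned with their scores.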