GTF

GTF is essentially the GFF2 spec with extra conventions, and is sometimes called GFF2.5. However, GTF has its own versioning, and the current version is GTF2.2 (that's really clear now, isn't...). The format is used primarily for encoding the locations of protein-coding genes. There are essentially four feature types when annotating genes: CDS (coding sequence exons), exon (not necessarily coding, from which UTRs can be inferred), and start_codon and stop_codon, which mark the beginning and end of the reading frame. The most critical part of the GTF format is the pair of key-value attributes in the last column: gene_id and transcript_id. Some programs also encode an exontype field with one of the values initial, internal, terminal, or single, although these can be inferred from the placement of the start and stop codons with respect to the exon and CDS features. The transcript_id field allows alternatively spliced isoforms of the same gene to be encoded. The order of the key/value pairs is not specified, although some programs may expect them in gene/transcript/exontype order.
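Because the order of the key/value pairs is not fixed, it is safer to read the last column into a dictionary than to rely on position. The following is a minimal Python sketch, not a reference parser: the function name parse_gtf_line, the regular expression, and the example record (chr1, g1, g1.t1) are all made up for illustration, and it assumes nine tab-separated columns with attributes written as key "value"; pairs.

```python
import re

# One `key "value";` (or unquoted `key value;`) pair from the last column.
# This pattern is an assumption about typical GTF output, not part of any spec text.
_ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^";\s]+))\s*;')


def parse_gtf_line(line):
    """Parse one GTF record into a dict (hypothetical helper for illustration)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Collect attributes by key so the order in the file does not matter.
    attributes = {}
    for match in _ATTR_RE.finditer(attr_text):
        key = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        attributes[key] = quoted if quoted is not None else bare

    # gene_id and transcript_id are the two pairs the format depends on.
    for required in ("gene_id", "transcript_id"):
        if required not in attributes:
            raise ValueError("missing required attribute: " + required)

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# Invented example record; transcript_id deliberately comes before gene_id.
example = "\t".join([
    "chr1", "example", "CDS", "380", "401", ".", "+", "0",
    'transcript_id "g1.t1"; gene_id "g1"; exontype "initial";',
])
record = parse_gtf_line(example)
print(record["feature"], record["attributes"]["gene_id"],
      record["attributes"]["transcript_id"])
```

Grouping the parsed records by attributes["transcript_id"] would then collect the exon and CDS features of each isoform, which is the role the transcript_id field plays above.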