The fun of GFF parsing

Lately I have been working a lot with genome annotation, especially eukaryotic (-sigh- prokaryotes are sooo much easier), and have to rely on tools that read GFF- or GTF-formatted annotations.

Once you get into the trenches of elaborate exon structures across different isoforms, you will notice that neither GFF nor GTF was ever a good idea for quick parsing or any analysis. To make matters worse, both formats adhere only sometimes, and then very loosely, to some vague (mostly optional) conventions -sigh-.
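
To make the pain a bit more concrete, here is a minimal sketch (standard library only) of what even the simplest reader has to cope with: the ninth column is written as `key=value` pairs in GFF3 but as `key "value"` pairs in GTF. The function names `parse_attributes` and `parse_line` are made up for this post, and the code assumes that no attribute value contains a semicolon, which real files do not always respect.

```python
def parse_attributes(column9):
    """Parse the attribute column of a GFF3 or GTF line into a dict of lists.

    Assumption: values never contain ';' (a naive split would break on them).
    """
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # trailing ';' as commonly seen in GTF
        key_gff, eq, value_gff = field.partition("=")
        if eq and " " not in key_gff:
            # GFF3 style: ID=exon1
            key, value = key_gff, value_gff
        else:
            # GTF style: gene_id "g1"
            key, _, value = field.partition(" ")
            value = value.strip().strip('"')
        # Keys may repeat (e.g. several 'tag' entries in GTF), so keep lists.
        attrs.setdefault(key, []).append(value)
    return attrs


def parse_line(line):
    """Split one tab-separated annotation line; return None for anything else."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9:
        return None  # comment, directive, or malformed line
    seqid, source, ftype, start, end, score, strand, phase, attributes = cols
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive in both formats
        "end": int(end),
        "strand": strand,
        "attributes": parse_attributes(attributes),
    }


gff3_line = "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=tx1"
gtf_line = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";'

print(parse_line(gff3_line)["attributes"])
print(parse_line(gtf_line)["attributes"])
```

And this only covers the syntax. Working out which exon belongs to which transcript still depends on whether the file bothers with `Parent` (GFF3) or `transcript_id` (GTF) in a consistent way.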

Some explanations or rather recommendations can be found here:

The last is part of a widely used GFF utility suite including gffread:

A glimmer of light seems to be the Python package gffutils:

And the following site tries to validate any of the arbitrarily pieced-together GFFs:

How come all these will-do “standards” have become so widely adopted, and why does every improvement lead to even more confusion and more unparsable laissez-faire data junk? </rant>