I recently downloaded gene annotations for Homo sapiens from Ensembl for some bioinformatic analysis. The vast majority of the gene annotations have 20 exons or fewer, although some have as many as 250. I know enough about gene annotation to take these predictions with a grain of salt, but it got me thinking: what are the biologically relevant factors that might limit the length, exon number, etc. of a gene? Is there any real possibility for a gene to have 50 exons? 100 exons? 250 exons? From a biological standpoint, where is the line drawn and why?

3 Answers

This question drops firmly into the lap of molecular evolution and the constraints that are placed upon genes by the forces of mutation, selection, drift and recombination.

There are numerous situations, particularly gene duplication, that can result in a gene that is free from the selective constraints of its parent. Many such genes will accumulate so many deleterious mutations through stochastic processes that they become non-functional, e.g. pseudogenes. Others can be altered and rearranged, accumulating exons and introns, and if they confer a fitness benefit on the organism, they may be driven to fixation within a population.

Evolution is a population genetics process, and there are many variables which can affect the outcome, not least differences in population size. Organisms with larger populations (such as bacteria) appear to have much smaller genomes and, of course, no introns (at least no spliceosomal ones), perhaps as a result of increased fitness due to the decreased generation time of an organism with a more slender genome. It would be a good idea to read The Origins of Genome Architecture by Michael Lynch, as I think he answers your questions better than I can.

Many of the genes you retrieve from Ensembl will of course have experimental evidence to support them. The genes that are predicted by the pipeline can be viewed with less confidence, but you can look at the alignments with closely related species to judge whether the introns/exons are plausible. An example of a gene with 79 exons is the Dystrophin (DMD) gene, the longest annotated gene at 2,217,347 bp (see Roberts et al., 1993 and Nishio et al., 1994).
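If you want to see which annotations sit in the long tail of exon counts before checking their supporting evidence, something like the following rough sketch may help. It assumes the annotations are in GTF format (nine tab-separated columns, with a `transcript_id` attribute on each `exon` record); the function name `exon_counts` and the toy records are made up for illustration and are not real Ensembl entries.

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def exon_counts(gtf_lines):
    """Count 'exon' records per transcript_id in GTF-formatted lines."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        transcript = attrs.get("transcript_id")
        if transcript is not None:
            counts[transcript] += 1
    return counts


# Hypothetical toy records, only to show the expected input shape.
toy = [
    "#!genome-build toy",
    'chr1\ttoy\tgene\t1\t900\t.\t+\t.\tgene_id "G1";',
    'chr1\ttoy\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";',
    'chr1\ttoy\texon\t200\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "2";',
    'chr1\ttoy\texon\t400\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "3";',
    'chr1\ttoy\texon\t1\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; exon_number "1";',
]

# Transcripts with the most exons come first.
for transcript, n in exon_counts(toy).most_common():
    print(transcript, n)
```

For a real file you could pass an open file handle instead of the list, since the function only iterates over lines. Counting per transcript rather than per gene avoids inflating the number for genes with many alternative isoforms.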

Talking about annotation quality control, one needs to think about the molecular constraints on the size of an exon, or an intron for that matter. The (theoretical) minimum length of an exon would have to be 1 bp, although one would also need to consider the binding of the molecular machinery involved in exon-intron boundary recognition and the splicing of adjacent exons. I would think exons shorter than 6 bp would probably not be considered functional. See jbc.org/content/270/6/2411.full and mbe.oxfordjournals.org/content/23/12/2392.full
– gawbul, Dec 14 '11 at 23:08

Agree with mbq: titin is the longest gene I know of, and it has well over 100 exons. Titin and dystrophin are well characterized genetically and are not predictions. Titin is the champion exoner, with 363 exons.

It's only examples like this that allow the gene predictors to run on as long as they do, I think, as the predictions are trimmed heuristically to resemble the known gene structures, lengths, junctions, etc.