I'm currently working with a Sanger-generated 10-gene dataset, which includes a few fast-evolving genes. During the initial data exploration phase I used saturation plots to check for potential decay of phylogenetic signal caused by multiple substitutions (saturated nucleotide variation).

There are a lot of ways to visualize saturation. One method, which I believe was originally used by Philippe et al. (1994), is to plot the raw (uncorrected) pairwise genetic distances in an alignment against model-corrected genetic distances. If the relationship is approximately linear, then the gene is not saturated; if the line curves or plateaus, there is evidence of saturation, because repeated substitutions at the same site stop adding to the raw distance while the corrected distance keeps increasing.
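To see why a plateau appears, it can help to look at the simplest correction. Under the Jukes-Cantor (JC69) model, the expected raw distance p for a true distance d is p = 0.75 * (1 - exp(-4d/3)), so p can never exceed 0.75 no matter how large d gets. The short sketch below only draws that theoretical curve; JC69 is my choice for illustration, and real genes will follow messier models.

```r
# Expected raw (p) distance under JC69 as a function of the true distance d.
d <- seq(0, 3, by = 0.01)
p <- 0.75 * (1 - exp(-4 * d / 3))

plot(d, p, type = "l",
     xlab = "True (corrected) distance",
     ylab = "Expected raw distance",
     ylim = c(0, 1))
abline(0, 1, lty = 2)     # 1:1 line: what we would see with no multiple hits
abline(h = 0.75, lty = 3) # upper limit of the raw distance under JC69
```

At small distances the curve hugs the 1:1 line, which is the "approximately linear" case; at large distances it bends toward the 0.75 ceiling.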

Here is an example of an unsaturated gene:

And an example of a saturated gene:

This is a really rough method, which should probably only be used as a preliminary exploration of your data. As far as I know, there is no established slope value that says definitively, "yes, this gene is saturated." However, I do think it's a useful thing to look at, and it's really easy to do in R. You may want to look at the documentation for the dist.dna function in the ape package to see all of the available models. Here is the R code I used to make these simple plots:
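The snippet below is a minimal sketch of this kind of plot rather than the exact script behind the figures above. It assumes the ape package is installed, uses the woodmouse example alignment that ships with ape so that it runs as-is, and picks TN93 as the correction model purely for illustration. To try it on your own gene, replace the alignment with something like read.dna("my_gene.fasta", format = "fasta"), where the file name is a placeholder.

```r
library(ape)

# Example alignment bundled with ape (15 cytochrome b sequences).
# Assumption: swap this for your own aligned gene.
data(woodmouse)
aln <- woodmouse

# Raw (uncorrected) and model-corrected pairwise distances.
# Both calls return "dist" objects with the pairs in the same order.
d_raw <- dist.dna(aln, model = "raw",  pairwise.deletion = TRUE)
d_cor <- dist.dna(aln, model = "TN93", pairwise.deletion = TRUE)

# Raw distance on the y-axis, corrected distance on the x-axis.
plot(as.vector(d_cor), as.vector(d_raw),
     xlab = "TN93 distance",
     ylab = "Uncorrected (p) distance",
     main = "Saturation plot")
abline(0, 1, lty = 2)  # 1:1 reference line
```

Points that fall close to the dashed line suggest little saturation; points that drift below it and level off suggest that multiple substitutions are hiding variation.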

Thanks for posting! I have been searching for methods to assess substitution saturation in R and came across this post. As you note, this is great as a descriptive approach, and it is similar to what is currently implemented in DAMBE. DAMBE, however, plots transitions and transversions against corrected distance, which you can do separately for codon positions 1+2 and for position 3. I am sure your code can be modified to do the same. I was wondering if you have explored this further. Have you seen any R implementations of the Xia test (implemented in DAMBE) or of likelihood mapping (implemented in TREE-PUZZLE)? I am working with bacterial genomes and would like to test for saturation gene by gene or by gene cluster. It would be far too time-intensive to do this in the available programs, but relatively easy if it were implemented in R. Thanks!
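For the transition/transversion variant mentioned here, a rough sketch with ape might look like the following. It is not DAMBE's implementation. The helper plot_ts_tv is a hypothetical name, TN93 is again an arbitrary model choice, and the codon split assumes that the first alignment column is codon position 1, which you would need to check for any real gene (including the woodmouse example used here).

```r
library(ape)

# Hypothetical helper: plot transition and transversion counts against a
# corrected distance for a chosen set of alignment columns.
plot_ts_tv <- function(aln, cols, model = "TN93", main = "") {
  sub <- aln[, cols]
  ts <- as.vector(dist.dna(sub, model = "TS"))   # number of transitions
  tv <- as.vector(dist.dna(sub, model = "TV"))   # number of transversions
  d  <- as.vector(dist.dna(sub, model = model))  # corrected distance
  plot(d, ts, pch = 1, col = "blue",
       ylim = range(c(ts, tv), na.rm = TRUE),
       xlab = paste(model, "distance"),
       ylab = "Number of substitutions", main = main)
  points(d, tv, pch = 2, col = "red")
  legend("topleft", legend = c("transitions", "transversions"),
         pch = c(1, 2), col = c("blue", "red"))
  invisible(data.frame(dist = d, ts = ts, tv = tv))
}

data(woodmouse)
n <- ncol(woodmouse)

# ASSUMPTION: column 1 of the alignment is codon position 1.
pos3  <- seq(3, n, by = 3)
pos12 <- setdiff(seq_len(n), pos3)

par(mfrow = c(1, 2))
res12 <- plot_ts_tv(woodmouse, pos12, main = "Positions 1+2")
res3  <- plot_ts_tv(woodmouse, pos3,  main = "Position 3")
```

Because the helper takes any alignment and any set of columns, the same call could be wrapped in a loop over many gene alignments, which is one way to approach the gene-by-gene screening described above.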