Let $L$ be a language over a finite alphabet and let $L_n$ be the number of words of length $n$ in $L$. Let $f_L(x) = \sum_{n \ge 0} L_n x^n$ be its generating function. The following are well known:

If $L$ is regular, then $f_L$ is rational.

If $L$ is unambiguous and context-free, then $f_L$ is algebraic.
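As a small sanity check of the second statement, here is a minimal Python sketch (not part of the original question). It assumes the Dyck language over $\{a, b\}$ as the example, generated by the unambiguous grammar $S \to \varepsilon \mid a S b S$, which translates to the algebraic equation $D(x) = 1 + x^2 D(x)^2$. The function names are illustrative only.

```python
from itertools import product


def is_dyck(word):
    """Return True if word (a sequence of 'a'/'b') is balanced, reading a as '(' and b as ')'."""
    depth = 0
    for ch in word:
        depth += 1 if ch == "a" else -1
        if depth < 0:
            return False
    return depth == 0


def brute_force_counts(max_len):
    """Count Dyck words of each length 0..max_len by enumerating all words over {a, b}."""
    return [
        sum(is_dyck(w) for w in product("ab", repeat=n))
        for n in range(max_len + 1)
    ]


def counts_from_equation(max_len):
    """Extract coefficients of D(x) from D = 1 + x^2 * D^2 by comparing coefficients."""
    d = [0] * (max_len + 1)
    d[0] = 1
    for n in range(2, max_len + 1):
        # [x^n] x^2 D^2 = sum over i + j = n - 2 of d[i] * d[j]
        d[n] = sum(d[i] * d[n - 2 - i] for i in range(n - 1))
    return d


if __name__ == "__main__":
    print(brute_force_counts(8))
    print(counts_from_equation(8))
```

Both lists agree term by term, which is what the unambiguity of the grammar guarantees: each word is counted by exactly one derivation.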

Does there exist a natural family of languages $\mathcal{L}$ containing the context-free languages such that if $L \in \mathcal{L}$, then $f_L$ is holonomic? Is that class of languages also associated to a natural class of automata?

This question is prompted by a remark in Flajolet and Sedgewick where they assert that there is no meaningful generating function formalism associated to context-sensitive languages because of the significant undecidability issues. However, holonomic functions have proven a robust and incredibly useful framework in combinatorics, so I think this is a natural question to ask.

I have considered this problem for a while now. I agree with Greg that the parallels with complexity theory seem to end at unambiguous context-free languages. What makes a word difficult to recognize diverges from what makes a language difficult to enumerate: e.g. $\{a^n b^n c^n : n \ge 1\}$ is not context-free but is easy to count. On the other hand, D-finite generating functions are unable to handle a certain notion of sparseness (e.g. $\{a^{2^n} : n \ge 1\}$), because such languages tend to give rise to natural boundaries in their generating functions.
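To make the two examples concrete, here is a short worked computation, with $n$ ranging over the positive integers:

$$\sum_{n \ge 1} x^{3n} = \frac{x^3}{1 - x^3}, \qquad g(x) = \sum_{n \ge 1} x^{2^n} \quad \text{satisfies} \quad g(x) = x^2 + g(x^2).$$

The first series is rational even though the language is not context-free. The second is a lacunary series with the unit circle as a natural boundary, whereas a D-finite function can only be singular at the finitely many zeros of the leading coefficient of its differential equation, so $g$ is not D-finite.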

As for Jacques' comment, it is true that there is a differential operator in species, but that does not mean that you can model solutions to differential equations easily. If you take an iterative approach à la Chomsky-Schützenberger to generate the combinatorial objects, you need to ensure convergence of your language. (The paper cited above by Martin actually needs a partial retraction on this point.) It is easy to show convergence if you can use $\Theta$, which is actually $x \, d/dx$, but you cannot use $\Theta$ to build all linear ODEs with polynomial coefficients.

Along this line, if you restrict yourself to solutions of smaller families of differential equations, there are several combinatorial interpretations, often in terms of rooted trees.

Welcome to MO, Marni! I like your two examples, and I agree that they demonstrate the point you made in the beginning of your paper. And I will have to look at the paper you linked to as well.
– Qiaochu Yuan, Feb 24 '10 at 23:48

I don't know the topic well enough to give you a complete answer, but two things. First, Chomsky and Schützenberger proved that unambiguous context-free languages have algebraic generating functions. These are languages for which there is a context-free grammar that accepts each word in only one way. Or, more generally and more weakly, they showed that the number of acceptances, as distinct from the number of words, has an algebraic generating function. There is a paper by Philippe Flajolet in which he proves that certain CFLs have no unambiguous grammar exactly by showing that the generating function is transcendental. Flajolet is also interested in holonomic generating functions, but I did not see any clear statement as to whether the g.f. of an ambiguous CFL can be non-holonomic. Flajolet is surely a good person to ask about the entire question.
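To illustrate the difference between counting acceptances and counting words, here is a minimal Python sketch. It assumes a toy ambiguous grammar $S \to S S \mid a$ (my choice, not taken from any of the papers mentioned): its language is $\{a^n : n \ge 1\}$, with one word per length, while its parse trees satisfy $T(x) = x + T(x)^2$. The function names are hypothetical.

```python
def derivation_counts(max_len):
    """Number of parse trees of a^n under S -> S S | a, for n = 0..max_len.

    Obtained by comparing coefficients in T(x) = x + T(x)^2.
    """
    t = [0] * (max_len + 1)
    if max_len >= 1:
        t[1] = 1  # the single derivation S -> a
    for n in range(2, max_len + 1):
        # S -> S S splits a^n into a^i and a^(n - i) with both parts nonempty
        t[n] = sum(t[i] * t[n - i] for i in range(1, n))
    return t


def word_counts(max_len):
    """Number of words of each length in {a^n : n >= 1}."""
    return [0] + [1] * max_len


if __name__ == "__main__":
    print(derivation_counts(5))  # acceptances
    print(word_counts(5))        # words
```

The derivation counts grow like the Catalan numbers, while the word counts stay at 1; both generating functions happen to be algebraic here, but only the first one is what the grammar computes directly.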

Second, I can't resist mentioning an unfinished project from a while ago that might help you as a survey. Complexity Zoology is a computer-assisted compilation of inclusions and separations of language classes, to go with the much better known Complexity Zoo living survey started by Scott Aaronson. The Complexity Zoo moved to Stanford, so for now all of my links are broken, but Zoology knows things that would be difficult to add to the Zoo without its help. You can learn pretty quickly that there are not very many language classes between CFL and things that are clearly #P-hard to count, or harder. One candidate is GCSL, which means CSL with the restriction that all replacement rules make the words grow. Although in light of the first comment, looking at the next things after CFL is a little off the mark; DCFL or deterministic CFL is fairly standard and is contained in unambiguous CFL.

My guess is that holonomic sequences are an enumerative intermediate that doesn't show up easily in complexity theory.

I think it depends on your notion of "natural". I would expect that if you allow grammar definitions to use the $\Theta$ operator (as well as the usual sums and products of CFGs), then you should get exactly the holonomic class. The $\Theta$ operator is essentially the derivative: on GFs it acts as $x \, d/dx$.
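For what it is worth, the action of $\Theta$ on coefficients is easy to see in a minimal Python sketch. On the combinatorial side, $\Theta$ corresponds to marking one position of a word, so $[x^n]\,\Theta f_L = n L_n$. The example language (words over $\{a, b\}$ with no two adjacent $b$'s) and all function names are assumptions made for illustration.

```python
from itertools import product


def words(n):
    """Words of length n over {a, b} with no factor 'bb' (an illustrative regular language)."""
    all_words = ("".join(w) for w in product("ab", repeat=n))
    return [w for w in all_words if "bb" not in w]


def theta(coeffs):
    """Coefficients of x * f'(x), given the coefficients of f(x)."""
    return [n * c for n, c in enumerate(coeffs)]


def pointed_counts(max_len):
    """Count pairs (word, marked position) for each length 0..max_len."""
    return [
        sum(1 for w in words(n) for _pos in range(n))
        for n in range(max_len + 1)
    ]


if __name__ == "__main__":
    counts = [len(words(n)) for n in range(6)]
    print(counts)
    print(theta(counts))
    print(pointed_counts(5))
```

The last two lists coincide, which is the sense in which $\Theta$ is the "pointing" construction on words.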