It's well-known that a regular expression can be recognized by a nondeterministic finite automaton (NFA) of size proportional to the regular expression, or by a deterministic finite automaton (DFA) which is potentially exponentially larger. Furthermore, given a string $s$ and a regular expression $r$, the NFA can test membership in time proportional to $|s| \cdot |r|$, and the DFA can test membership in time proportional to $|s|$. The slowdown for the NFA arises from the fact that essentially we need to track the sets of possible states the automaton could be in, and the exponential blowup for the DFA arises from the fact that its states are elements of the powerset of the states of the NFA.
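For concreteness, the simulation I have in mind is the standard state-set one, sketched below in Python (the dictionary encoding of the NFA is purely illustrative, and epsilon transitions are assumed to have been eliminated):

```python
def nfa_accepts(delta, start_states, accept_states, s):
    """Return True iff the NFA accepts s.

    delta maps (state, symbol) to a set of successor states. Each of the
    |s| steps touches at most |delta| transitions, giving the
    O(|s| * |r|)-style bound discussed above.
    """
    current = set(start_states)
    for c in s:
        current = {q for p in current for q in delta.get((p, c), ())}
        if not current:
            return False  # no live states left; reject early
    return bool(current & set(accept_states))

# Example NFA for (a|b)*abb:
delta = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}
print(nfa_accepts(delta, {0}, {3}, "aabb"))  # True
print(nfa_accepts(delta, {0}, {3}, "abab"))  # False
```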

Is it possible to efficiently (i.e., in time better than $O(|r| \cdot |s|)$ and space better than $O(2^{|r|})$) recognize regular expressions, if we allow using more powerful machines than finite automata? (For example, are there succinctness gains to recognizing regular languages with pushdown automata, or counter machines?)

$\begingroup$When you say that the "NFA can test membership in time proportional to $|s|\cdot|r|$" you mean that a (deterministic) RAM machine that simulates the NFA in the obvious way takes so much time? Or is there some other way to define the "run time of an NFA" that does not refer to another computational model? (Apart from the sensible but not very useful definition that says that the runtime of any NFA for the string $s$ is $|s|$.)$\endgroup$
– Radu GRIGore, Sep 10 '10 at 11:57

$\begingroup$Then it seems to me more natural to simply ask this: Is there an algorithm (on a RAM machine) that decides if a string $s$ is in the language defined by the regular expression $r$ that works in $o(|s|\cdot|r|)$ time and $o(2^{|r|})$ space? (Especially if you define the runtime of a pushdown automaton also in terms of a RAM machine.)$\endgroup$
– Radu GRIGore, Sep 10 '10 at 13:01


$\begingroup$I don't understand the problem exactly. Is the input a string $s$ and a regular expression $r$, and is the problem to decide whether $s$ is in the language defined by $r$?$\endgroup$
– Robin Kothari, Sep 12 '10 at 22:55

$\begingroup$@Robin: yes, that's it. I would like to know if you can match regular expressions more efficiently than finite automata can by using more computational power, or if extra features (e.g. stack, RAM) simply don't help.$\endgroup$
– Neel Krishnaswami, Sep 13 '10 at 8:01

4 Answers

Convert the regular expression to an NFA. For concreteness in comparing algorithms, we'll assume that $r$ is the number of NFA states, so that your $O(rs)$ time bound for directly simulating the NFA is valid, and your $O(2^r)$ space bound for running the converted DFA is also valid, whenever you're working in a RAM that can address that much memory.

Now, partition the states of the NFA (arbitrarily) into $k$ subsets $S_i$ of at most $\lceil r/k\rceil$ states each. Within each subset $S_i$, we can index subsets $A_i$ of $S_i$ by numbers from $0$ to $2^{\lceil r/k\rceil}-1$.

Build a table $T[i,j,c,A_i]$ where $i$ and $j$ are in the range from 0 to $k-1$, $c$ is an input symbol, and $A_i$ is (the numerical index of) a subset of $S_i$. The value stored in the table is (the numerical index of) a subset of $S_j$: a state $y$ is in $T[i,j,c,A_i]$ if and only if $y$ belongs to $S_j$ and there is a state in $A_i$ that transitions to $y$ on input symbol $c$.

To simulate the NFA, maintain $k$ indices, one for each $S_i$, specifying the subset $A_i$ of the states in $S_i$ that can be reached by some prefix of the input. For each input symbol $c$, use the tables to look up, for each pair $i,j$, the set of states in $S_j$ that can be reached from a state in $A_i$ by a transition on $c$, and then take the bitwise OR of the numerical indices of these sets of states to combine them into a single subset of states of $S_j$. So, each step of the simulation takes time $O(k^2)$, and the total time for the simulation is $O(sk^2)$.

The space required is the space for all the tables, which is $O(k^2 2^{r/k})$. The time and space analysis is valid on any RAM that can address that much memory and that can do binary operations on words that are large enough to address that much memory.
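For concreteness, here is a rough Python sketch of the construction. The encoding of the NFA and all names are invented here; it is meant only to illustrate the bookkeeping, not to be an optimized implementation:

```python
# States 0..r-1 are split into k consecutive blocks of at most ceil(r/k)
# states, and each block's active subset is a small bitmask.
from math import ceil

def build_tables(r, k, alphabet, delta):
    """T[i, j, c] is a list indexed by bitmasks over block i; entry A_i is
    the bitmask of states in block j reachable from A_i on symbol c.
    Total table space is O(k^2 * 2^(r/k)) entries."""
    b = ceil(r / k)  # block size
    T = {}
    for i in range(k):
        size_i = max(0, min(b, r - i * b))  # last block may be smaller
        for j in range(k):
            for c in alphabet:
                row = [0] * (1 << size_i)
                for mask in range(1 << size_i):
                    out = 0
                    for bit in range(size_i):
                        if mask >> bit & 1:
                            for q in delta.get((i * b + bit, c), ()):
                                if j * b <= q < (j + 1) * b:
                                    out |= 1 << (q - j * b)
                    row[mask] = out
                T[i, j, c] = row
    return T

def simulate(r, k, T, start, accepting, s):
    """Each input symbol costs O(k^2) table lookups and bitwise ORs."""
    b = ceil(r / k)
    masks = [0] * k  # masks[i] = currently reachable subset of block i
    for q in start:
        masks[q // b] |= 1 << (q % b)
    for c in s:
        new = [0] * k
        for i in range(k):
            if masks[i]:
                for j in range(k):
                    new[j] |= T[i, j, c][masks[i]]
        masks = new
    return any(masks[q // b] >> (q % b) & 1 for q in accepting)

# The (a|b)*abb example again, with r = 4 states and k = 2 blocks:
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}, (2, 'b'): {3}}
T = build_tables(4, 2, 'ab', delta)
print(simulate(4, 2, T, {0}, {3}, "aabb"))  # True
```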

The time-space tradeoff you get from this doesn't perfectly match the NFA simulation, because of the quadratic dependence on $k$. But then, I'm skeptical that $O(rs)$ is the right time bound for the NFA simulation: how do you simulate a single step of the NFA faster than looking at all of the (possibly quadratically many) transitions allowed from a currently active state to another state? Shouldn't it be $O(r^2 s)$?

In any case, by letting $k$ vary, you can get time bounds on a continuum between the DFA and NFA bounds, with less space than the DFA.

$\begingroup$I think your correction is correct, and your answer does answer the question I asked. However, the question I wanted to ask is how much additional computational power helps. (E.g., with a counter you can match the string $a^k$ in $O(1)$ space.) If you don't mind, I'll leave the question open for a little while longer to see if anyone knows the answer to that....$\endgroup$
– Neel Krishnaswami, Sep 14 '10 at 10:33

$\begingroup$@Neel: If David's solution is the best a RAM machine can do, then stacks, counters, etc. won't help. (But, of course, he only gave upper bounds, not lower bounds.)$\endgroup$
– Radu GRIGore, Sep 14 '10 at 15:18


$\begingroup$As far as I can tell, my solution does use "additional power": it is based on table lookups and integer indices, something that is not available in the DFA or NFA models. So I don't really understand how it's not answering that part of the question.$\endgroup$
– David Eppstein, Sep 15 '10 at 20:03

$\begingroup$Here is an alternative way to parametrize this. Suppose we are on a RAM machine with word width $w$, where $w \ge \lg r$. Then the NFA simulation takes $O(s r^2)$ time and $O(r/w)$ space. The DFA simulation isn't possible if $r \ge w$ (not enough space available). The construction in this answer sets $k \approx \lceil r/w \rceil$ and takes $O(sr^2/w^2)$ time and uses all the space available (i.e., something in the vicinity of $2^w$ space). It is basically exploiting the bit-parallelism available in a RAM machine to do the NFA simulation faster.$\endgroup$
– D.W., Apr 20 '17 at 20:19

This is not an answer, but too long for a comment. I'm trying to explain why the question, as posed, may be hard to understand.

There are two ways to define computational complexity for a device X.

The first and most natural way is intrinsic. One needs to say how the device X uses the input, so that we may later look at how the size $n$ of the input affects the run time of the device. One also needs to say what counts as an operation (or step). Then we simply let the device run on the input and count operations.

The second is extrinsic. We define computational complexity for another device Y and then we program Y to act as a simulator for X. Since there may be multiple ways for Y to simulate X, we need to add that we are supposed to use the best one. Let me say the same in other words: we say that X takes $O(f(n))$ time on an input of size $n$ if there exists a simulator of X implemented on machine Y that takes $O(f(n))$ time.

For example, an intrinsic definition for an NFA says that it takes $n$ steps to process a string of length $n$; an extrinsic definition that uses a RAM machine as device Y says that the best known upper bound is probably what David Eppstein answered. (Otherwise it would be strange that (1) the best practical implementation pointed to in the other answer does not use the better alternative and (2) no one here indicated a better alternative.) Note also that, strictly speaking, your device X is the regular expression, but since the NFA has the same size it is safe to take it as being the device X you are looking at.

Now, when you use the second kind of definition it makes little sense to ask how restricting the features of device X affects the running time. It does however make sense to ask how restricting the features of device Y affects the running time. Obviously, allowing more powerful machines Y might allow us to simulate X faster. So, if we assume one of the most powerful machines that could be implemented (this rules out nondeterministic machines, for example) and come up with a lower bound $\Omega(f(n))$, then we know that no less powerful machine could do better.

So, in a sense, the best answer you could hope for is a proof in something like the cell probe model that simulating an NFA needs a certain amount of time. (Note that if you take into account the conversion NFA to DFA you need time to write down the big DFA, so memory isn't the only issue there.)

Even if you believe that there's nothing new or old to be learned about regular expression matching, check out one of the most beautiful papers I've come across in a long time: A play on regular expressions by S. Fischer, F. Huch, and T. Wilke, ICFP 2010.

(MMT Chakravarty deserves the credit for recommending this paper.)

EDIT: The reason why this paper is relevant is that it describes a new technique (based on Glushkov's construction from the 1960s) that avoids building the full NFA (let alone the DFA) corresponding to the RE. Instead, it runs something resembling a marking algorithm, similar to the well-known one for deciding acceptance of a word by an NFA, directly on the syntax tree of the RE. Performance measurements suggest that this is competitive, even with Google's recently published RE2 library.
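To give a flavor of the technique: the paper's code is in Haskell and computes weights over arbitrary semirings; the following is a stripped-down Python transcription of the marking idea using plain Booleans, so any infelicities are mine rather than the paper's:

```python
# Boolean marks sit on the symbol leaves of the regex syntax tree, and
# shift() pushes them one input symbol forward. No NFA or DFA is built.

class Eps:                         # matches the empty string
    empty, final = True, False
    def shift(self, mark, c): pass

class Sym:                         # matches one symbol
    empty = False
    def __init__(self, a): self.a, self.mark = a, False
    @property
    def final(self): return self.mark
    def shift(self, mark, c): self.mark = mark and self.a == c

class Alt:                         # p | q
    def __init__(self, p, q): self.p, self.q = p, q
    @property
    def empty(self): return self.p.empty or self.q.empty
    @property
    def final(self): return self.p.final or self.q.final
    def shift(self, mark, c):
        self.p.shift(mark, c); self.q.shift(mark, c)

class Seq:                         # p q
    def __init__(self, p, q): self.p, self.q = p, q
    @property
    def empty(self): return self.p.empty and self.q.empty
    @property
    def final(self): return (self.p.final and self.q.empty) or self.q.final
    def shift(self, mark, c):
        # a mark enters q if one enters p and p can match empty,
        # or if p could finish a match *before* this shift
        into_q = (mark and self.p.empty) or self.p.final
        self.p.shift(mark, c)
        self.q.shift(into_q, c)

class Rep:                         # p*
    empty = True
    def __init__(self, p): self.p = p
    @property
    def final(self): return self.p.final
    def shift(self, mark, c):
        self.p.shift(mark or self.p.final, c)  # old final: loop back

def match(r, s):
    """One left-to-right pass; marks are mutated in place, so build a
    fresh tree per call."""
    if not s:
        return r.empty
    r.shift(True, s[0])
    for c in s[1:]:
        r.shift(False, c)
    return r.final

# (a|b)*c
print(match(Seq(Rep(Alt(Sym('a'), Sym('b'))), Sym('c')), "abac"))  # True
```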

Take a look at this article by Russ Cox. It describes an NFA-based approach, first employed by Ken Thompson, by which an input string $s$ can be matched against a regular expression $r$ in time $O(|s| \cdot c)$ and space $O(|r| \cdot d)$, where $c$ and $d$ are constants. The article also details a C implementation of the technique.
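As a rough illustration of one ingredient the article describes, DFA states (i.e., sets of NFA states) can be computed lazily and their transitions memoized. The sketch below is in Python with invented names; the article's actual implementation is in C:

```python
# Minimal sketch of on-the-fly determinization with a transition cache.
# The dict-based NFA encoding is purely illustrative.

def make_lazy_matcher(delta, start_states, accept_states):
    cache = {}  # (frozenset of NFA states, symbol) -> frozenset

    def step(states, c):
        key = (states, c)
        if key not in cache:  # compute this DFA transition only once
            cache[key] = frozenset(
                q for p in states for q in delta.get((p, c), ()))
        return cache[key]

    def accepts(s):
        states = frozenset(start_states)
        for c in s:
            states = step(states, c)
        return bool(states & accept_states)

    return accepts

# Reusing the (a|b)*abb NFA from the earlier sketch:
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}, (2, 'b'): {3}}
accepts = make_lazy_matcher(delta, {0}, frozenset({3}))
print(accepts("aabb"))  # True
```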

$\begingroup$I'm not convinced that's an accurate description of the article. It appears to be building the DFA from the NFA on an as-needed basis and caching the results. But the cache size could be exponential in $r$.$\endgroup$
– David Eppstein, Sep 13 '10 at 21:17