Jocelyn Paine

Dr. Dobb's Bloggers

Snobol Patterns in Prolog IV: bal, and the Use of Failure to Diagnose Patterns

February 06, 2010

It's a pun between logic and control flow: a pun where you hear "A or B" with one ear but "Do A; until something fails, then try B" with the other. In most languages, there are Boolean expressions that return a TRUE or a FALSE, and conditions that jump to a THEN or an ELSE. But in Snobol, there are pattern matches that succeed or they fail; and if a match fails, the matcher will backtrack to already-matched subpatterns, seeking alternative matches that make the current match succeed. And in Prolog, there are calls to predicates that succeed or they fail; and if a call fails, Prolog will backtrack to already-called predicates, seeking alternative solutions that make the current call succeed. "It is not enough", as Gore Vidal wrote, "to succeed. Others must fail". I am going to show you how I implemented Snobol's "bal" pattern, which matches a string that's balanced in round brackets, with Prolog Definite Clause Grammar rules; and how to use failure to probe "bal"'s behaviour.

It's a pun between logic and control flow: a pun where you hear "A or B" with one ear but "Do A; until something fails, then try B" with the other. In most languages, there are Boolean expressions that return a TRUE or a FALSE, and conditions that jump to a THEN or an ELSE. But in Snobol, there are pattern matches that succeed or they fail; and if a match fails, the matcher will backtrack to already-matched subpatterns, seeking alternative matches that make the current match succeed. And in Prolog, there are calls to predicates that succeed or they fail; and if a call fails, Prolog will backtrack to already-called predicates, seeking alternative solutions that make the current call succeed. "It is not enough", as Gore Vidal wrote, "to succeed. Others must fail". I am going to show you how I implemented Snobol's "bal" pattern, which matches a string that's balanced in round brackets, with Prolog Definite Clause Grammar rules; and how to use failure to probe "bal"'s behaviour.

In effect, "dexp" is a macro-expander. It takes a string containing a function defined by an expression, such as

dexp( "ave(x,y) = (x + y) / 2.0" )

and expands it into a Snobol statement containing the function's body. For "ave", that statement is

"ave ave = (x + y) / 2.0 :s(return)f(freturn)"

Then it compiles the body by calling the built-in "code" function. As I demonstrated in Programs That Transform Their Own Source Code, one can then enter the code by jumping to the label that starts it, "ave". Finally, "dexp" calls the built-in "define" function to make "ave" a function having this label as entry point. It calls "define" as follows, passing it "ave"'s name and local parameters:

define( "ave(x,y)" )

The original string — that is, "dexp"'s argument — contains a bracketed list of local parameters, "(x,y)". This list needs to be removed in order to make "ave"'s body. Also, the combination of function name and local parameters, "ave(x,y)", must be recognisable in order that "dexp" can split it away from the expression following and pass it to "define". To recognise it, "dexp" uses "bal".

It uses "bal" in the following pattern:

break('(') . Name bal . Args

I explain this and Snobol in more detail in Programs That Transform Their Own Source Code. But the gist is that the "break('(') . Name" matches everything up to but not including the opening bracket and assigns it to the variable Name. Then "bal . Args" matches the opening bracket up to and including the closing bracket, and assigns it to the variable Args.

Unlike the other patterns I've written about in these essays, I myself haven't used my Prolog "bal" for real-world tasks. Perhaps that's because most of what I've applied Snobol and Prolog to recently has been processing files of drug-trial data: numerical measurements to be fed to the statistical language R. Perhaps it's because when I do manipulate source code, I write Definite Clause Grammars from scratch, following the syntax of the programming language. But some Prolog programmers may find "bal" useful. And I also want to use it in demonstrating how "fail" can make both languages enumerate all possible matches of a pattern, thereby helping us check the pattern's specification.

In these calls, I'm using the built-in phrase/3 predicate to invoke "bal". Its second argument is the complete string to be matched — what Snobol calls the subject. For example "x(a)(b)y". Its third argument, Rest, is what remains unmatched. The argument to "bal", Matched, will be unified with the substring matched by it. Double-quoted string constants are, as usual, represented as lists of character codes: codes 40 and 41 are '(' and ')', while 97 is 'a' and 120 is 'x'. I'm typing the calls into SWI-Prolog's top-level interpreter: the semicolon makes the interpreter backtrack and seek an alternative solution.

Now here's my definition of "bal" in Definite Clause Grammar (DCG) rules. The second rule is the "bal" that returns the substring it matches; the first rule discards this argument:

I hope the code is straightforward, though it would be nice to do away with the two "append"'s. One reason I'm publishing it is that it took me a little bit of work to discover "bal"'s specification. Namely, and in terms of the above listing, what to define as a primitive component in bal_prim, and how exactly to code the recursive call to "bal".

The longest possible substring composed only of round brackets; OR: an opening round bracket followed by "bal" followed by a closing round bracket; OR: any number, including none, of repetitions of "bal".

But the "span" seemed wrong, because it could match an unbalanced string such as "))(". And the third alternative also seemed wrong, because "arbno" can match any number, including none, of repetitions. So it could match the empty string, which "bal" doesn't.

So — and this is where failure and backtracking enter — I checked "bal"'s behaviour with this little Snobol program:

&anchor = 1 &fullscan = 1

"x(a)()xxx(" bal $ output fail

end

This matches the string "x(a)()xxx(" against the pattern. The "bal $ output" displays the matched substring on a line by itself, by assigning it to "output", a variable that Snobol associates with the standard output stream. And the "fail" backtracks into "bal", making it try an alternative match. Because "fail" can never succeed, the pattern will keep backtracking until "bal" finds no more matches. But in doing so, it will display all these matches. This tells me exactly which substrings "bal" will match, from which I can see whether any of them is empty.

This is the program's output, and you can see that none of the alternatives is empty. The output shows how "bal" steps neatly along, matching successively longer balanced portions of the subject:

For comparison, here is corresponding output from my Prolog "bal". I have called it from the built-in predicate forall/2, which finds all solutions to the first argument, invoking the second for each one. Here, that's all solutions to "bal(M)"; the call to "format" displays M:

In fact, "forall" merely packages Prolog "fail" into a cleaner control structure. I could have got the same effect with:

7 ?- phrase( bal(M), "x(a)()xxx(", _ ), format('~s~n', [M] ), fail.

Another example where failure lists all alternative matches is in the next program. I think I saw this example, or one similar, in that standard reference book The SNOBOL4 Programming Language by R. E. Griswold, J. F. Poage, and I. P. Polonsky.

The program matches a string against a pattern made by concatenating two sets of alternative strings and a single string. As before, "fail" causes all matches to be generated. One slight change, not significant, is that now I've stored the main part of the pattern in a variable, "bread". I've concatenated "fail" to this in the pattern-match statement:

Here is my Prolog equivalent. My "bread" pattern calls the immediate-assignment operator "$" that I defined in Arbno, the Cursor, and Snobol Patterns in Prolog. Apart from that, it uses only standard DCG primitives: the semicolon is a DCG alternation operator, analogous to Snobol "|" and translated by Prolog's DCG pre-processor into the ";/2" predicate for disjunction of goals:

And here is its output, which I'll run once with an explicit "fail", and once from "forall". The "{fail}"'s insert a call to fail/0 into the first argument to "phrase". Protecting goals from the DCG pre-processor with curly brackets in this way is a standard feature of Prolog:

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!