The beauty of math; the humor of stupidity.

Friday Pathological Programming: Programming Through Grammars in Thue

Today’s treat: [Thue][thue], pronounced two-ay, after a mathematician named Axel Thue.
Thue is a language based on [semi-Thue][semi-thue] grammars. semi-Thue grammars are equivalent to Chosmky-0 grammars, except that they’re even more confusing: a semi-Thue grammar doesn’t distinguish between terminal and non-terminal symbols. (Since Chomsky-0 is equivalent to a turing machine, we know that Thue is turing complete.) The basic idea of it is that a program is a set of rewrite rules that tell you how to convert one string into another. The language itself is incredibly simple; programs written in it are positively mind-warping.
There are two parts to a Thue program. First is a set of grammar rules, which are written in what looks like pretty standard BNF: “*<lhs> ::= <rhs>*”. After that, there is a section which is the initial input string for the program. The two parts are separated by the “::=” symbol by itself on a line.
As I said, the rules are pretty much basic BNF: whatever is on the left-hand side can be replaced with whatever is on the right-hand side of the rule, whenever a matching substring is found. There are two exceptions, for input and output. Any rule whose right hand side is “:::” reads a line of input from the user, and replaces its left-hand-side with whatever it read from the input. And any rule that has “~” after the “::=” prints whatever follows the “~” to standard out.
So, a hello world program:
x ::=~ Hello world
::=
x
Pretty simple: the input string is “x”; there’s a replacement for “x” which replaces “x” with nothing, and prints out “Hello world”.
How about something much more interesting. Adding one in binary. This assumes that the binary number starts off surrounded by underscore characters to mark the beginning and end of a line.
1_::=1++
0_::=1
01++::=10
11++::=1++0
_0::=_
_1++::=10
//::=:::
::=
_//_
Let’s tease that apart a bit.
1. Start at the right hand side:
1. “1_::=1++” – If you see a “1” on the right-hand end of the number,
replace it with “1++”, removing the “_” on the end of the line. The rest
of the computation is going to use the “++” to mark the point in
string where the increment is going to alter the string next.
2. “0_::=1” – if you see a “0” on the right-hand-end, then replace it with 1. This doesn’t insert a “++”, so none of the other rules are going to do anything: it’s going to stop. That’s what you want: if the string ended in a zero, then adding one to it just changes that 0 to a 1.
2. Now, we get to something interesting: doing the addition:
1. If the string by the “++” is “01”, then adding one will change it to “10”, and that’s the end of the addition. So we just change it to 10, and get rid of the “++. ”
2. If the string by the “++” is “11”, then we change the last digit to “0”, and move the “++” over to the left – that’s basically like adding 1 to 9 in addition: write down a zero, carry the one to the left.
3. Finally, some termination cleanup. If we get to the left edge, and there’s a zero, after the “_”, we can get rid of it. If we get to the left edge, and there’s a “1++”, then we replace it with 10 – basically adding a new digit to the far left; and get rid of the “++”, because we’re done.
3. Finally, inputting the number “//::=:::” says “If you see “//”, then read some input from the user, and replace the “//” with whatever you read.
4. Now we have the marker for the end of the rules section, and the string to start the program is “_//_”. So that triggers the input rule, which inputs a binary number, and puts it between the “_”s.
Let’s trace this on a couple of smallish numbers.
* Input: “111”
1. “_111_” → “_111++”
2. “_111++_” → “_11++0”
3. “_11++0” → “_1++00”
4. “_1++00” → “1000”
* Input: “1011”
1. “_1011_” → “_1011++”
2. “_1011++” → “_101++0”
3. “_101++0” → “_1100”.
Not so hard, eh?
Decrement is the same sort of thing; here’s decrement with an argument hardcoded instead of reading from input:
0_::=0–
1_::=0
10–::=01
00–::=0–1
_1–::=@
_0–::=1
_0::=
::=
_1000000000000_
Now, something more complicated. From [here][vogel], a very nicely documented program that computes fibonacci numbers. It’s an interesting mess – it converts things into a unary form, does the computation in unary, then converts back. This program uses a fairly neat trick to get comments. Thue doesn’t have any comment syntax. But they use “#” on the left hand side as a comment marker, and then make sure that it never appears anywhere else – so none of the comment rules can ever be invoked.
—-
#::= fibnth.t – Print the nth fibonacci number, n input in decimal
#::= (C) 2003 Laurent Vogel
#::= GPL version 2 or later (http://www.gnu.org/copyleft/gpl.html)
#::= Thue info at: http://www.catseye.mb.ca/esoteric/thue/
#::= This is modified thue where only foo::=~ outputs a newline.
#::= ‘n’ converts from decimal to unary-coded decimal (UCD).
n9::=*********,n
n8::=********,n
n7::=*******,n
n6::=******,n
n5::=*****,n
n4::=****,n
n3::=***,n
n2::=**,n
n1::=*,n
n0::=,n
n-::=-
#::= ‘.’ prints an UCD number and eats it.
.*********,::=.~9
.********,::=.~8
.*******,::=.~7
.******,::=.~6
.*****,::=.~5
.****,::=.~4
.***,::=.~3
.**,::=.~2
.*,::=.~1
.,::=.~0
~9::=~9
~8::=~8
~7::=~7
~6::=~6
~5::=~5
~4::=~4
~3::=~3
~2::=~2
~1::=~1
~0::=~0
#::= messages moving over ‘,’, ‘*’, ‘|’
*<::=<*
,<::=<,
|<::=*::=*0>
0>,::=,0>
0>|::=|0>
1>*::=*1>
1>,::=,1>
1>|::=|1>
2>*::=*2>
2>,::=,2>
2>|::=|2>
#::= Decrement n. if n is >= 0, send message ‘2>’;
#::= when n becomes negative, we delete the garbage using ‘z’ and
#::= print the left number using ‘.’.
*,-::=,2>
,,-::=,-*********,
|,-::=z
z*::=z
z,::=z
z|::=.
#::= move the left number to the right, reversing it (0 becomes 9, …)
#::= reversal is performed to help detect the need for carry. As the
#::= order of rule triggering is undetermined in Thue, a rule matching
#::= nine consecutive * would not work.
*c<::=c
,c<::=c
|c
0>d*::=d
1>d::=d*********,
#::= when the copy is done, ‘d’ is at the right place to begin the sum.
2>d::=f<|
#::= add left to right. 'f' moves left char by char when prompted by a '<'.
#::= it tells 'g' to its right what the next char is.
*f
,f
#::= when done for this sum, decrement nth.
|f’ ‘g’ drops an ‘i’ that increments the current digit.
0>g::=ig
*i::=<
,i::=i,*********
|i::=’ tells ‘g’ to move left to the next decimal position,
#::= adding digit 0 if needed (i.e. nine ‘*’ in reversed UCD)
*1>g::=1>g*
,1>g::=g::=|’ tells ‘g’ that the sum is done. We then prepare for copy (moving
#::= a copy cursor ‘c’ to the correct place at the beginning of the left
#::= number) and reverse (using ‘j’) the right number back.
*2>g::=2>g*
,2>g::=2>g,
|2>g::=c|*********j
#::= ‘j’ reverses the right number back, then leaves a copy receiver ‘d’
#::= and behind it a sum receiver ‘g’.
*j*::=j
j,*::=,********j
j,,::=,*********j,
j,|::=’ sent by ‘-‘
#::= will start when hitting ‘g’ the first swap + sum cycle
#::= for numbers 0 (left) and 1 (reversed, right).
::=
|n?-|,|********g,|
——
Ok. Now, here’s where we go *really* pathological. Here, in all it’s wonderful glory, is a [Brainfuck][brainfuck] interpreter written in Thue, written by Mr. Frederic van der Plancke. (You can get the original version, with documentation and example programs, along with a Thue interpreter written in Python from [here][vdp-thue])
——
#::=# BRAINFUNCT interpreter in THUE
#::=# by Frederic van der Plancke; released to the Public Domain.
o0::=0o
i0::=0i
o1::=1o
i1::=1i
o]::=]o
i]::=]i
o[::=[o
i[::=[i
oz::=zo
oZ::=Zo
iz::=zi
iZ::=Zi
0z::=z0
1z::=z1
]z::=z]
[z::=z[
_z::=z_
0Z::=Z0
1Z::=Z1
]Z::=Z]
[Z::=Z[
_Z::=Z_
}}]::=}]}
&}]::=]&}
&]::=]^
}[::=[{
&[::=[0::=0}
{1::=1{
?Z[::=[&
?z[::=[^
?z]::=]
?Z]::=]^
[{{::={[{
[{::=[
[::=[^
]{::={]
]::={]
0::=
1::=1
0{::={0
1{::={1
^[::=?[iii
^]::=?]iii
}|::=}]|WARN]WARN
&|::=&]|WARN]WARN
WARN]WARN::=~ERROR: missing ‘]’ in brainfunct program; inserted at end
^000::=000^ooo
^001::=001^ooi
^010::=010^oio
^011::=011^oii
^100::=100^ioo
^101::=101^ioi
ooo|::=|ooo
ooi|::=|ooi
oio|::=iio|oio
oii|::=|oii
ioo|::=|ioo
ioi|::=|ioi
iii|::=|iii
iio|!::=|
|z::=z|
|Z::=Z|
o_::=_o
i_::=_i
0!::=!0
1!::=!1
_!::=!_
/!::=!/
oooX!::=Xooo
oooY::=$Y
0$::=!1
1$::=$0
X$::=X!1
ooiX!::=Xooi
ooiY::=@1Y
1@1::=@00
0@1::=@11
1@0::=@01
0@0::=@00
X@1::=X!WARNDEC0WARN
WARNDEC0WARN::=~WARNING: attempt at decrementing zero (result is still zero)
X@00::=X@0
X@01::=X!1
X@0Y::=X!0Y
oioX!::=LLYLR
0LL::=LL0
1LL::=LL1
_LL::=!X!
|LL::=|!0X!
LR0::=0LR
LR1::=1LR
LRY::=_
oiiX!::=_oii
oiiY::=X!%
%0::=0%
%1::=1%
%_0::=Y0
%_1::=Y1
%_/::=Y0_/
iooX!::=X(in)
(in)0::=(in)
(in)1::=(in)
(in)Y::=Yi
i/0::=Zi/
i/1::=zi/
i/_::=!/
XY!::=X!0Y
0Y!::=0!Y
1Y!::=1!Y
YZ::=0Y
Yz::=1Y
ioiX!::=X(out)
(out)0::=0(out)O0
(out)1::=1(out)O1
(out)Y::=OO!Y
O0::=~0
O1::=~1
OO::=~_
iiiX!0::=ZX!0
iiiX!1::=zX!1
+::=000
-::=001
::=011
,::=100
.::=101
:::=|X!0Y0_/
::=
^*
[vdp-thue]: http://fvdp.homestead.com/files/eso_index.html#BFThue
[thue]: http://esoteric.voxelperfect.net/wiki/Thue
[semi-thue]: http://en.wikipedia.org/wiki/Semi-Thue_grammar
[brainfuck]: http://scienceblogs.com/goodmath/2006/07/gmbm_friday_pathological_progr.php
[vogel]: http://lvogel.free.fr/thue.htm

I just ran across an old article describing a joke pathological language called Babbage, still in the design phase.
A sample:

The Babbage Language Design Group is continuously evaluating new features that will keep its users from reaching any level of effectiveness. For instance, Babbage’s designers are now considering the ALMOST EQUALS sign, used for comparing two floating-point numbers. This new feature “takes the worry out of being close.”

The binary incrementing script occasionally leaves ‘_’ at the very beginning of a result. I think, the following would be a more adequate version:
1b::=1++
0b::=1
01++::=10
11++::=1++0
a0::=a
a1++::=a10
0_::=_0
1_::=_1
a_0::=a_
a_1::=1
//::=:::
::=
a//b_