The (range 99 -1 -1) expression produces a lazy list of integers from 99 down to -1. The mapcar* function lazily maps these numbers to strings, and the rest of the code treats this lazy list as a text stream to process, extracting the numbers with pattern-matching cases and interpolating them into the song's text. Functional programming with lazy semantics meets text processing, pattern matching and here documents.
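For comparison, the lazy pipeline at the heart of this solution can be sketched in Python (a rough analogue, not TXR; the helper name is invented here):

```python
# Rough Python analogue of the TXR pipeline: a lazy sequence of integers
# from 99 down to -1, lazily mapped to strings. Nothing is realized until
# the stream is consumed.
def countdown_lines():
    return map(str, range(99, -2, -1))

lines = countdown_lines()
assert next(lines) == "99"
assert next(lines) == "98"
```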

@(next :list @(mapcar* (fun tostring) (range 99 -1 -1)))
@(collect)
@number
@ (trailer)
@number_less_1
@ (cases)
@  (bind number "1")
@  (output)
1 bottle of beer on the wall
1 bottle of beer
@  (end)
@ (or)
@  (output)
@number bottles of beer on the wall
@number bottles of beer
@  (end)
@ (end)
@ (cases)
@  (bind number "0")
@  (output)
Go to the store and get some more,
99 bottles of beer on the wall!
@  (end)
@ (or)
@  (output)
Take one down and pass it around
@number_less_1 bottles of beer on the wall
@  (end)
@ (end)
@(end)

$ txr align-columns.txr align-columns.dat
Given a text file of many lines, where fields within a line
are delineated by a single 'dollar' character, write a program
that aligns each column of fields by ensuring that words in each
column are separated by at least one space.
Further, allow for each word in a column to be either left
justified, right justified, or center justified within its column.
Given a text file of many lines, where fields within a line
are delineated by a single 'dollar' character, write a program
that aligns each column of fields by ensuring that words in each
column are separated by at least one space.
Further, allow for each word in a column to be either left
justified, right justified, or center justified within its column.
Given a text file of many lines, where fields within a line
are delineated by a single 'dollar' character, write a program
that aligns each column of fields by ensuring that words in each
column are separated by at least one space.
Further, allow for each word in a column to be either left
justified, right justified, or center justified within its column.

This is not exactly the implementation of an operator, but a solution worth presenting. The language has built-in pattern matching and backtracking behavior suited to this type of text-mining task.
For convenience, we prepare the data in four files:

As you can see, this has the "nondeterministic flavor" of Amb. The @(skip) directives "magically" skip over the lines of input that do not succeed. This example naturally handles empty strings, since the first_last function simply does not match such inputs.
Here is how to embed the task's specific data in the code:

For the Y combinator approach in TXR, see the Y combinator task.
The following easy transliteration of one of the Common Lisp solutions shows the conceptual and cultural compatibility between TXR Lisp macros and CL macros:

Print 1 through 10 out of a vector, using prinl as the callback, right from the system shell command prompt:

$ txr -e '[mapdo prinl #(1 2 3 4 5 6 7 8 9 10)]'
1
2
3
4
5
6
7
8
9
10

mapdo is like mapcar but doesn't accumulate a list; it is suitable for imperative programming situations when the function is invoked to perform a side effect. TXR extends Lisp list-processing primitives to work with vectors and strings also, which is why mapdo cheerfully traverses a vector.

TXR has two kinds of aggregate objects for sequences: lists and arrays. There is some syntactic sugar to manipulate them in the same way.

Literals

In the pattern matching language, there are no list literals. A list like ("a" "b" "c") is actually being evaluated, as can be seen in a directive such as @(bind (a b) (c "d")), where (c "d") is a list consisting of the value of the variable c and the string "d". This is subject to destructuring, and the two values are assigned to the variables a and b.
In TXR Lisp, there are literal lists introduced by a quote: '(1 2 3 4). Vectors look like this: #(1 2 3 4).

Construction

Lists can be implicitly produced using pattern matching. Lists and vectors can be constructed using the functions of TXR Lisp. (vector 3) creates a vector of length three, whose elements are initialized to nil. (list 1 2 3) constructs the list (1 2 3).

Array Indexing Notation

The [] notation performs positional indexing on lists and arrays, which are both zero-based (element zero is the first element). Negative indices work from the tail of the list, whereby -1 denotes the last element of a sequence which has at least one element. Out-of-bounds access to arrays throws exceptions, but out-of-bounds access to lists produces nil. Out-of-bounds assignments are not permitted for either data type.
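Python shares most of these conventions, which makes for a quick analogue (noting that Python lists raise on out-of-bounds reads, where TXR lists yield nil):

```python
# Zero-based and negative indexing, as described above.
xs = ["a", "b", "c"]
assert xs[0] == "a"       # element zero is the first element
assert xs[-1] == "c"      # -1 denotes the last element
try:
    xs[5]                 # out of bounds raises here (TXR lists give nil)
    raise RuntimeError("unreachable")
except IndexError:
    pass
```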

Array Range Notation

Array range notation (slices) is supported, for both arrays and lists. An array range is a pair object denoted a .. b, which is syntactic sugar for (cons a b). Therefore, a range constitutes a single argument in the bracket notation (allowing for straightforward future extension to multi-dimensional array indexing and slicing).
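The pair-as-slice idea has a direct Python analogue, where a range is likewise a single object passed through the bracket notation:

```python
# A slice object is one argument to [], analogous to TXR's a .. b pair.
xs = [10, 20, 30, 40, 50]
r = slice(1, 3)               # one object denoting the range 1 .. 3
assert xs[r] == [20, 30]
assert xs[1:3] == [20, 30]    # sugar for the same thing
```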

Other Kinds of Objects

The [] notation also works with strings, including ranges and assignment to ranges. Hash tables can be indexed also, and the notation is meaningful for functions: [fun args ...] means the same thing as (call fun args ...), providing a Lisp-1 flavor within a Lisp-2 dialect.

The strategy here, one of many possible ones, is to build, at run time, the arguments to be passed to deffilter to construct a pair of filters enc and dec for encoding and decoding. Filters are specified as tuples of strings.

Output:

TXR has repeating and non-repeating permutation and combination functions that produce lazy lists. They are generic over lists, strings and vectors. In addition, the combinations function also works over hashes.
Combinations and permutations are produced in lexicographic order (except in the case of hashes).
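Python's itertools is a close analogue: given sorted input, its results also come out in lexicographic order.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

# Non-repeating variants, lexicographic order over sorted input:
assert list(combinations("abc", 2)) == [("a", "b"), ("a", "c"), ("b", "c")]
assert list(permutations("ab")) == [("a", "b"), ("b", "a")]
# Repeating variants:
assert list(combinations_with_replacement("ab", 2)) == \
    [("a", "a"), ("a", "b"), ("b", "b")]
assert list(product("ab", repeat=2)) == \
    [("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")]
```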

Command line arguments in TXR's pattern-based extraction language can be treated as the lines of a text stream, which is arranged using the directive @(next :args). Thus TXR's text parsing capabilities work over the argument list.
This @(next :args) should be written as the first line of the TXR program, because TXR otherwise interprets the first argument as the name of an input file to open.

Arguments are also available via two predefined variables: *full-args* and *args*, which are lists of strings, such that *args* is a suffix of *full-args*. *full-args* includes the arguments that were processed by TXR itself; *args* omits them.
Here is an example program which requires exactly three arguments. Note how ldiff is used to compute the arguments that are processed by TXR (the interpreter name, any special arguments and the script name), to print an accurate usage message.

In TXR, most directives are conditionals, because they specify some kind of match. Given some directive D, the underlying logic in the language is, roughly, "if D does not match at the current position in the input, then fail, otherwise the input advances according to the semantics of D".
An easy analogy to regular expressions may be drawn. The regex /abc/ means something like "if a doesn't match, then fail, otherwise consume a character; and if b doesn't match, then fail, otherwise consume another character; and if c doesn't match, then fail, otherwise consume another character and succeed." The expressive power comes, in part, from not having to write all these decisions and bookkeeping.
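The decisions and bookkeeping that /abc/ hides can be written out by hand; here is a deliberately explicit Python sketch (an illustration of the idea, not how a regex engine is actually implemented):

```python
# Explicit fail/consume bookkeeping for the regex /abc/.
def match_abc(s, pos=0):
    for ch in "abc":
        if pos >= len(s) or s[pos] != ch:
            return None        # fail
        pos += 1               # otherwise consume a character
    return pos                 # succeed, reporting the new position

assert match_abc("abcdef") == 3
assert match_abc("abx") is None
```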
The interesting conditional-like structures in TXR are the parallel directives, which apply separate clauses to the same input, and then integrate the results in various ways.
For instance, the choose construct will select, from among those clauses which match successfully, the one which maximizes or minimizes the length of an extracted variable binding:

@(choose :shortest x)
@x:@y
@(or)
@x<--@y
@(or)
@x+@y
@(end)

Suppose the input is something which can match all three patterns in different ways:

foo
The outcome (with txr -B) will be:

x="foo"
y="bar:baz+xyzzy"

because this match minimizes the length of x. If we change this to :longest x, we get:

x="foo
The cases, all and none directives most resemble control structures because they have short-circuiting behavior. For instance:

If any subclause fails to match, then all stops processing subsequent clauses. There are subtleties though, because an earlier clause can produce variable bindings which are visible to later clauses. If a previously bound variable is bound again, it must be to an identical piece of text:

@# match a line which contains some piece of text x
@# after the rightmost occurrence of : such that the same piece
@# of text also occurs at the start of the line preceded by -->
@(all)
@*junk:@x
@(and)
-->@x@/.*/
@(end)

Pattern Matching

@(bind a "")

If a is unbound, a binding is created, containing the empty string. If a is already bound, bind succeeds if a contains the empty string, and the pattern matching continues at the next directive. Or else a failure occurs, triggering backtracking behavior.

A recently added gather directive is useful for extracting multiple items of data from an unordered stream of this kind (not only the environment vector):

@(next :env)
@(gather)
HOME=@home
USER=@user
PATH=@path
@(end)

What if some of the variables might not exist? Gather has some discipline for that. The following means that three variables are required (the gather construct fails if they are not found), but shell is optional, with a default value of /bin/sh if it is not extracted from the data:

Here, the hash is being used as a function to filter several environment keys to their values via mapcar.
Platform note: on POSIX, environment variables, which are extracted using extern char **environ, are assumed to contain UTF-8. On Windows, the GetEnvironmentStringsW function is used to obtain the environment vector as wide character data.

Here is a complicated exceptions example straight from the manual.
This is a deliberately convoluted way to process input consisting of lines which have the form:

{monkey | gorilla | human}

Some custom exceptions are defined, and arranged into a hierarchy via @(defex) directives. An exception precedence hierarchy is established. A gorilla is a kind of ape, and an ape is a kind of primate. A monkey is a kind of primate, and so is a human.
In the main @(collect) clause, we have a try protect block in which we collect three different cases of primate. For each one, we throw an exception with the primate type symbol, and its name. This is caught in the catch clause as the argument "name". The catch clause performs another pattern match, @kind @name. This match is being applied to exactly the same line of data for which the exception was thrown (backtracking!). Therefore the @kind variable will collect the primate type. However @name already has a binding since it is the argument of the catch. Since it has a value already, that value has to match what is in the data. Of course, it does since it was derived from that data. The data and the variable unify against each other.

In TXR, there are pattern functions which are predicates that perform pattern matching and variable capture. A call to this type of function can specify unbound variables. If the function succeeds, it can establish bindings for those variables.
Here is how to make a pattern function that multiplies, and call it. To multiply the numbers, we break out of the pattern language and invoke Lisp evaluation:
@(* a b)

@(define multiply (a b out))
@(bind out @(* a b))
@(end)
@(multiply 3 4 result)

$ txr -B multiply.txr
result="12"

In the embedded Lisp dialect, it is possible to write an ordinary function that returns a value:

TXR was originally conceived out of the need to have "there documents": parse a document and extract variables, but in a style similar to the generation of here documents. Here-doc output was added later.
We use @(maybe)/@(or)/@(end) to set up some default values for variables which are overridden from the command line. Unification fails for an overridden variable, which is why we have to separate out the bind directives into the branches of a maybe.
By passing the script to txr using -f, we can pass additional command arguments to the resulting script, which are interpreted by txr.

The second line is a shorthand which defines a lab to be a kind of dog, and at the same time a dog to be a kind of animal.
If we throw an exception of type lab, it can be caught in a catch for a dog or for an animal. Continuing with the query:

Parsing

The following implements the parsing half of the task. It is a parser closely based on the JSON grammar www.json.org/fatfree.html.
It is implemented with recursive horizontal pattern matching functions, and so the definition basically resembles a grammar. Horizontal functions are a new feature in TXR, and basically allow the language to easily specify LL grammars with indefinite lookahead, not restricted to regular languages (thanks to TXR's backtracking). The numerous occurrences of @\ in the code are line continuations. Horizontal functions must be written on one logical line; @\ eats the whitespace at the start of the next physical line, to allow indentation.
The parser translates to a nested list structure in which the types are labeled with the strings "O", "A", "N", "S" and "K" (object, array, number, string, and keyword).
The largest grammar rule handles JSON string literals. The strategy is to generate an HTML string and then filter from HTML using the :from_html filter in TXR. For instance, \uABCD is translated to &#xABCD; and then the filter will produce the proper Unicode character. Similarly, \" is translated to &quot;, \n is translated to &#10;, etc.
A little liberty is taken: the useless commas in JSON are treated as optional.
Superfluous terminating commas (not generated by the JSON grammar but accepted by some other parsers) are not allowed by this parser.
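The escape-to-HTML-entity trick can be sketched in Python as well (a hypothetical helper, not part of the TXR solution): rewrite the JSON escape sequences as HTML character references, then let an HTML-decoding filter produce the real characters.

```python
import html
import re

# Hypothetical sketch: JSON escapes -> HTML entities -> real characters.
def json_escapes_to_html(s):
    s = s.replace(r'\"', "&quot;")
    s = s.replace(r"\n", "&#10;")
    return re.sub(r"\\u([0-9A-Fa-f]{4})", r"&#x\1;", s)

decoded = html.unescape(json_escapes_to_html(r"caf\u00e9\nline"))
assert decoded == "caf\u00e9\nline"   # "café", a real newline, "line"
```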

A few tests. Note, the badsyntax variable is bound to any trailing portion of the input that does not match the syntax. The call to the parser, @(value v), extracts the longest prefix of the input which is consistent with the syntax, leaving the remainder to be matched into badsyntax.

TXR Lisp, using each

Translation of Scheme

Translation of Scheme

No "srfi-43" required:

@(do
  ;; Scheme's vector-for-each: a one-liner in TXR
  ;; that happily works over strings and lists.
  (defun vector-for-each (fun . vecs)
    [apply mapcar fun (range) vecs])

  (defun display (obj : (stream *stdout*))
    (pprint obj stream))

  (defun newline (: (stream *stdout*))
    (display #\newline stream))

  (let ((a (vec "a" "b" "c"))
        (b (vec "A" "B" "C"))
        (c (vec 1 2 3)))
    (vector-for-each (lambda (current-index i1 i2 i3)
                       (display i1)
                       (display i2)
                       (display i3)
                       (newline))
                     a b c)))

Quality Breadth-First

The following is a complete, self-contained command line utility.
The algorithm is quite different from the previous. This version is not recursive. This algorithm divides the maze cells into visited cells, frontier cells and unvisited cells. As in the DFS version, border cells outside of the maze area are pre-initialized as visited, for convenience. The frontier set initially contains the upper left hand corner.
The algorithm's main loop iterates while there are frontier cells. As the generation progresses, unvisited cells adjacent to frontier cells are added to the frontier set. Frontier cells that are only surrounded by other frontier cells or visited cells are removed from the frontier set and become visited cells. Eventually, all unvisited cells become frontier cells and then visited cells, at which point the frontier set becomes empty and the algorithm terminates.
At every step, the algorithm picks the first cell in the frontier list. In the code, the frontier cells are kept in a hash called fr and also in a queue q. The algorithm tries to extend the frontier around the frontier cell which is at the head of the queue q by randomly choosing an adjacent unvisited cell. (If there is no such cell, the node is not a frontier node any more and is popped from the queue and the fr set.) If an unvisited node is picked, then a two-way path is broken from the given frontier cell to that cell, and that cell is added to the frontier set. '''Important:''' the new frontier cell is added to the head of the queue, rather than the tail.
The algorithm is modified by a "straightness" parameter, which is used to initialize a counter. Every time a new frontier node is added to the front of the queue, the counter decrements. When it reaches zero, the frontier queue is scrambled, and the counter is reset. As long as the count is nonzero, the maze growth proceeds from the previously traversed node, because the new node is placed at the head of the queue. This behavior mimics the DFS algorithm, resulting in long corridors without a lot of branching.
At the user interface level, the straightness parameter is represented as a percentage. This percentage is converted to a number of cells based on the width and height of the maze. For instance if the straightness parameter is 15, and the maze size is 20x20, it means that 15% out of 400 cells, or 60 cells will be traversed before the queue is scrambled. Then another 60 will be traversed and the queue will be scrambled, and so forth.
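The frontier loop described above can be sketched in Python (an illustrative analogue only; it assumes a 4-connected grid, returns the set of broken-down walls, and omits the border pre-initialization and rendering of the actual utility):

```python
import random

def gen_maze(w, h, straightness=60, seed=42):
    # Cells move unvisited -> frontier -> visited; new frontier cells go to
    # the HEAD of the queue; every `straightness` steps the queue is scrambled.
    rng = random.Random(seed)
    UNSEEN, FRONTIER, VISITED = 0, 1, 2
    state = {(x, y): UNSEEN for x in range(w) for y in range(h)}
    passages = set()                 # two-way paths broken between cells
    q = [(0, 0)]                     # frontier queue; head is extended first
    state[(0, 0)] = FRONTIER
    count = straightness
    while q:
        cx, cy = q[0]
        unseen = [(cx + dx, cy + dy)
                  for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if state.get((cx + dx, cy + dy)) == UNSEEN]
        if not unseen:               # surrounded: no longer a frontier cell
            state[q.pop(0)] = VISITED
            continue
        cell = rng.choice(unseen)    # randomly chosen adjacent unvisited cell
        passages.add(frozenset([(cx, cy), cell]))
        state[cell] = FRONTIER
        q.insert(0, cell)            # head, not tail: DFS-like corridors
        count -= 1
        if count == 0:               # straightness budget spent: scramble
            rng.shuffle(q)
            count = straightness
    return passages
```

Since every cell except the start enters the frontier through exactly one broken path, the passages form a spanning tree of w*h - 1 edges.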

Example: define a while loop which supports break and continue. Two forms of break are supported: break, which causes the loop to terminate with the return value nil, and (break <form>), which returns the specified value.

Using text-extraction pattern language

Here, the separators are embedded into the syntax rather than appearing as a datum. Nevertheless, this illustrates how to do that small tokenizing task with various separators.
The clauses of choose are applied in parallel, and all potentially match at the current position in the text. However, :shortest tok means that only that clause survives (gets to propagate its bindings and position advancement) which minimizes the length of the string which is bound to the tok variable.
The :gap 0 makes the horizontal collect repetitions strictly adjacent. This means that coll will quit when faced with a nonmatching suffix portion of the data rather than scan forward (no gap allowed!). This creates an opportunity for the tail variable to grab the suffix which remains, which may be an empty string.

Informal proof.
We consider what happens if we make an alteration to the code and feed it to the original.
Changing any character of narcissist.txr can be divided into two cases.

Case 1: the modification is done to line 1. This difference will be caught by the @(bind firstline ...) directive later down in the query, causing a matching failure. The first line is verified to have exactly the form that it does, with the given base 64 string embedded.

Case 2: modification is done in some other line. This difference will be caught by @(bind in64 my64) because a modification to the data after the first line changes the base 64 string that is computed from that data, making in64 not equivalent to my64, leading to a match failure.

These cases are an exhaustive partitioning of the possibilities; there are no ways to modify the data which do not land into one of these cases.
Nothing in the query calls for any iteration or recursion. Termination depends on the base64 and sed utilities munching through the input, which presumably process an input of size N in O(N) steps. On that note, we could limit how many lines of the input are passed to base64 by using sed -n -e '2,20p'.

Here is a somewhat verbose program showing a different approach.
The idea is to start with the last two verses of the song, and then work backwards to produce the earlier verses. This is done by recursively pattern matching on the song to extract text and produce the earlier verse, which is then prepended to the song.
The later verse does not contain one key piece of information we need to produce the prior verse: the animal-specific answer line for the prior animal. So we look this up by scanning a text which serves as a table.
The recursion terminates when the second pattern case matches the first verse: the third line is "Perhaps she'll die". In this case the song is not lengthened any more, and a terminating flag variable is bound to true.
Note one detail: in the first verse we have "... don't know why she swallowed the fly". But in subsequent verses it is "that fly", not "the fly". So we do a lookup on the fly also to substitute the appropriate line, and in the fly case we skip the original line (see the first @(maybe)).

This solution is a little long because it works by translating RPN to fully parenthesized prefix (Lisp notation).
Also, it improves upon the problem slightly. Note that for the operators * and +, the associativity is configured as nil ("no associativity") rather than left-to-right. This is because these operators obey the associative property: (a + b) + c is a + (b + c), and so we usually write a + b + c or a * b * c without any parentheses, leaving it ambiguous which addition is done first. Associativity is not important for these operators.
The lisp-to-infix filter then takes advantage of this non-associativity in minimizing the parentheses.

Partial application is built in via the op operator, so there is no need to create all these named functions, which defeats the purpose and beauty of partial application: which is to partially apply arguments to functions in an anonymous, implicit way, possibly in multiple places in a single expression.
Indeed, functional language purists would probably say that even the explicit op operator spoils it, somewhat.

Note how in the above, '''no''' function arguments are explicitly mentioned at all, except the necessary reference @1 to an argument whose existence is implicit.
Now, without further ado, we surrender the concept of partial application to meet the task requirements:
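For comparison, the anonymous, inline style of partial application can be sketched in Python with functools.partial and lambdas (a rough analogue of op, not TXR):

```python
from functools import partial
from operator import add, mul

# partial applies arguments anonymously, at the point of use:
assert partial(add, 1)(41) == 42
assert list(map(partial(mul, 3), [1, 2, 3])) == [3, 6, 9]
# a lambda plays the role of op with its implicit @1 argument:
assert list(map(lambda x: x - 1, [1, 2, 3])) == [0, 1, 2]
```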

A suite of four variations on a theme. The first three use HTML encoding to avoid solving the quoting problem. The third stops using &#10; to encode newlines, and instead represents the coded portion of the program as a list of lines rather than a string containing newlines encoded in some other way. The fourth dispenses with the HTML crutch and solves the quoting problem with a filter defined in the program itself.

TXR 50 has a PRNG API, and uses a re-implementation of WELL 512 (avoiding contagion by the "contact authors for commercial uses" virus present in the reference implementation, which attacks BSD licenses). Mersenne Twister was a runner up. There is an object of type random-state, and a global variable *random-state* which holds the default random state. Programs can create random states which are snapshots of existing ones, or which are seeded using an integer value (which can be a bignum). The random function produces a random number modulo some integer value, which can have arbitrary precision. The random-fixnum function produces a non-heap-allocated positive integer with random bits.

Note how the junk in the last example does not contain the trailing comma. This is because the rangelist grammar production allows for an empty range, so syntax like "5," is valid: it's an entry followed by a comma and a rangelist, where the rangelist is empty.

From the top

Variable "line" matches and takes the eighth line of input:

@(skip nil 7)
@line

From the bottom

Take the third line from the bottom of the file, if it exists.

@(skip)
@line
@(skip 1 2)
@(eof)

How this works is that the first skip will skip enough lines until the rest of the query successfully matches the input. The rest of the query matches a line, then skips two lines, and matches on EOF. So @line can only match at one location: three lines up from the end of the file. If the file doesn't have at least three lines, the query fails.
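The net effect is easy to mirror in Python (an analogue, not TXR): the third line from the bottom exists only when the file has at least three lines.

```python
# Third line from the bottom, if it exists.
def third_from_bottom(lines):
    return lines[-3] if len(lines) >= 3 else None

assert third_from_bottom(["a", "b", "c", "d", "e"]) == "c"
assert third_from_bottom(["a", "b"]) is None
```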

The freeform directive in TXR causes the remaining lines of the text stream to be treated as one big line, catenated together. The default line terminator is the newline "\n". This lets the entire input be captured into a single variable as a whole-line match.

Search and replace: simple

How it works is that the body of the coll uses a double-variable match: an unbound variable followed by a regex-match variable. The meaning of this combination is, "search for the regular expression, and if successful, then bind all the characters which were skipped over by the search to the first variable, and the matching text to the second variable."
So we collect pairs: pieces of mismatching text, and pieces of text which match the regex dog. At the end, there is usually going to be a piece of text which does not match the body, because it has no match for the regex. Because :gap 0 is specified, the coll construct will terminate when faced with this nonmatching text, rather than skipping it in a vain search for a match, which allows @suffix to take on this trailing text.
To output the substitution, we simply spit out the mismatching texts followed by the replacement text, and then add the suffix.

Search and replace: strip comments from C source

Based on the technique of the previous example, here is a query for stripping C comments from a source file, replacing them with a space. Here, the "non-greedy" version of the regex Kleene operator is used, denoted by %. This allows for a very simple, straightforward regex which correctly matches C comments. The freeform operator allows the entire input stream to be treated as one big line, so this works across multi-line comments.
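The same idea carries over to Python, where non-greediness is spelled ".*?" rather than % (a sketch for comparison, not the TXR query):

```python
import re

# Treat the whole source as one string (like freeform), so the non-greedy
# regex also matches multi-line comments.
def strip_c_comments(src):
    return re.sub(r"/\*.*?\*/", " ", src, flags=re.DOTALL)

assert strip_c_comments("int x; /* one\n two */ int y;") == "int x;   int y;"
```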

TXR functions return material by binding unbound variables.
The following function potentially returns three values, which will happen if called with three arguments, each of which is an unbound variable:

@(define func (x y z))
@ (bind w "discarded")
@ (bind (x y z) ("a" "b" "c"))
@(end)

The binding w, if created, is discarded because w is not in the list of formal parameters. However, w can cause the function to fail, because there can already exist a variable w with a value which doesn't match "discarded".
Call:

@(func t r s)

If t, r and s are unbound variables, they get bound to "a", "b" and "c", respectively, via a renaming mechanism. This may look like C++ reference parameters or Pascal "var" parameters, and can be used that way, but isn't really the same at all.
Failed call ("1" doesn't match "a"):

The :vars () argument to collect means that it still iterates, but doesn't actually collect anything (an empty list of variables). This is important, so that there isn't a growing data structure being accumulated as the input is processed.
Via TXR Lisp:

Translation of Common Lisp

In TXR's embedded Lisp dialect, we can implement the same solution as Lisp or Scheme: transform the code fragment by wrapping a let around it which binds a variable, and then evaluating the whole thing:

Explicit environment manipulation has the disadvantage of being hostile to compiling. (See the notes about compilation in the Common Lisp example.) There is an eval function which takes an environment parameter. However, currently there isn't any access to the manipulation of environment objects. It's probably a bad idea anyway, because run-time tricks with lexical environments lead to programs that are not compilable.
Lastly, we can also solve this problem using dynamically scoped (a.k.a "special") variables. The problem description specifically says that the solution is not to use global variables. Though we must define the variables as global, we do not use the global bindings; we use dynamic bindings.
There is a hidden global variable, namely the dynamic environment itself. That's how eval is able to resolve the free variable x occurring in code-fragment without receiving any environment parameter.
However, our two let constructs carefully save and restore the dynamic environment (and therefore any prior value of x), even in the face of exceptions, and

However, note that the @ character has a special meaning: @# turns into (sys:var #). TXR's printer right now does not convert this back to @ notation upon printing (fixed in git master now). (The purpose of this notation is to support Lisp code that requires meta-variables: variables distinguished from ordinary variables, for instance in logic pattern matching or unification code. Instead of hacks like name-based conventions (for instance, x? is a meta-variable, x is ordinary), why not build it into the language: @x is a meta-var, identifiable by special abstract syntax, and x is just an atom, a symbol.) There is also @(foo ...), which expands into (sys:expr foo ...), doing a similar thing for expressions.
The following solution avoids "cheating" in this way with the built-in parser; it implements a from-scratch S-exp parser which treats !@# as just a symbol.
The grammar is roughly as follows:

TODO: Note that the recognizer for string literals does not actually process the interior escape sequences \"; these remain as part of the string data. The only processing is the stripping of the outer quotes from the lexeme.
Explanation of most confusing line:

@/\s*\(\s*/@(coll :vars (e))@(expr e)@/\s*/@(last))@(end)

First, we match an open parenthesis that can be embedded in whitespace. Then we have a @(coll) construct which terminates with @(end). This is a repetition construct for collecting zero or more items. The :vars (e) argument makes the collect strict: each repetition must bind the variable e. More importantly, in this case, if nothing is collected, then e gets bound to nil (the empty list). The collect construct does not look at context beyond itself. To terminate the collect at the closing parenthesis we use @(last)). The second closing parenthesis here is literal text to be matched, not TXR syntax. This special clause establishes the terminating context without which the collect will munge all input. When the last clause matches, whatever it matches is consumed and the collect ends. (There is a related @(until) clause which terminates the collect, but leaves its own match unconsumed.)

Functions and filters are global in TXR. Variables are pattern matching variables and have a dynamically scoped discipline. The binding established in a clause is visible to other clauses invoked from that clause, including functions. Whether or not bindings survive from a given scope usually depends on whether the scope, overall, failed or succeeded. Bindings established in scopes that terminate by failing (or by an exception) are rolled back and undone. The @(local) or @(forget) directives, which are synonyms, are used for breaking the relationship between variables occurring in a scope, and any bindings those variables may have. If a clause declares a variable forgotten, but then fails, then this forgetting is also undone; the variable is known once again. But in successful situations, the effects of forgetting can be passed down.
Functions have special scoping and calling rules. No binding for a variable established in a function survives the execution of the function, except if its symbol matches one of the function parameters, call it P, and that parameter is unbound (i.e. the caller specified some unbound variable A as the argument). In that case, the new binding for unbound parameter P within the function is translated into a new binding for unbound argument A at the call site. Of course, this only happens if the function succeeds, otherwise the function call is a failure with no effect on the bindings.
Illustration using named blocks. In the first example, the block succeeds and so its binding passes on:

Translation of Common Lisp

Mostly the same logic. The count-and-say function is based on the same steps, but stays in the string domain instead of converting the input to a list and then the output back to a string. It also avoids building the output backwards and reversing it, so out must be accessed on the right side inside the loop. This is easy thanks to Python-inspired array indexing semantics: -1 means the last element, -2 the second last, and so on.
Like in Common Lisp, TXR's sort is destructive, so we take care to use copy-str.
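The stay-in-the-string-domain approach, with its negative indexing into the output being built, can be sketched in Python. The function name count_and_say and its exact structure are illustrative, not the TXR code:

```python
def count_and_say(s: str) -> str:
    """One look-and-say step, building the output left to right.

    The output is inspected on the right side inside the loop:
    out[-1] is the current run's digit and out[-2] its count,
    mirroring the negative array indexing described above.
    """
    out = ""
    for c in s:
        if out and out[-1] == c:
            # same digit as the current run: bump the count at out[-2]
            # (safe because look-and-say runs never exceed 3)
            out = out[:-2] + str(int(out[-2]) + 1) + c
        else:
            out += "1" + c
    return out
```

For example, `count_and_say("1211")` yields `"111221"`.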

Translation of Racket

@(do ;; Macro very similar to Racket's for/fold
     (defmacro for-accum (accum-var-inits each-vars . body)
       (let ((accum-vars [mapcar first accum-var-inits])
             (block-sym (gensym))
             (next-args [mapcar (ret (progn @rest (gensym))) accum-var-inits])
             (nvars (length accum-var-inits)))
         ^(let ,accum-var-inits
            (flet ((iter (,*next-args)
                     ,*[mapcar (ret ^(set ,@1 ,@2)) accum-vars next-args]))
              (each ,each-vars ,*body)
              (list ,*accum-vars)))))
     (defun next (s)
       (let ((v (vector 10 0)))
         (each ((c s))
           (inc [v (- #\9 c)]))
         (cat-str (collect-each ((x v) (i (range 9 0 -1)))
                    (when (> x 0) `@x@i`)))))
     (defun seq-of (s)
       (for* ((ns ()))
             ((not (member s ns)) (reverse ns))
             ((push s ns) (set s (next s)))))
     (defun sort-string (s)
       [sort (copy s) >])
     (tree-bind (len nums seq)
       (for-accum ((*len nil) (*nums nil) (*seq nil))
                  ((n (range 1000000 0 -1))) ;; start at the high end
         (let* ((s (tostring n))
                (sorted (sort-string s)))
           (if (equal s sorted)
               (let* ((seq (seq-of s))
                      (len (length seq)))
                 (cond ((or (not *len) (> len *len))
                        (iter len (list s) seq))
                       ((= len *len)
                        (iter len (cons s *nums) seq))))
             (iter *len
                   (if (and *nums (member sorted *nums))
                       (cons s *nums)
                     *nums)
                   *seq))))
       (put-line `Numbers: @{nums ", "}\nLength: @len`)
       (each ((n seq))
         (put-line ` @n`))))

Most useful TXR queries consist of multiple lines, and the line structure is significant. Multi-liners can easily be passed via -c, but there is no provision in the syntax that would allow multi-liners to be written as one physical line. There are opposite provisions for splitting long logical lines into multiple physical lines.
The -e (evaluate) and -p (evaluate and print) options provide shell one-liner access to TXR Lisp:

$ txr short-circuit-bool.txr
a(0) and b(0):
a (0) called
a(0) or b(0):
a (0) called
b (0) called
a(0) and b(1):
a (0) called
a(0) or b(1):
a (0) called
b (1) called
a(1) and b(0):
a (1) called
b (0) called
a(1) or b(0):
a (1) called
a(1) and b(1):
a (1) called
b (1) called
a(1) or b(1):
a (1) called

The a and b functions are defined such that the second parameter is intended to be an unbound variable. When the function binds out, that value propagates back to the unbound variable at the call site. But the way calls work in this language allows us to specify a value instead, such as "1". In that case the directive @(bind out x) performs unification instead: if x doesn't match "1", the function fails; otherwise it succeeds.
So simply by placing two calls consecutively, we get a short-circuiting conjunction: the second will not execute if the first one fails.
Short-circuiting disjunction is provided by @(cases).
The @(maybe) construct stops failure from propagating out of the enclosed subquery. The @(accept) directive bails out of the closest enclosing anonymous block (the function body) with a success. It prevents the @(cases) from failing the function if neither case is successful.
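The unifying behavior of @(bind) and the resulting short-circuit conjunction can be sketched in Python. The names bind and conj, and the dict-based environment, are made up for illustration:

```python
def bind(env, var, value):
    """Sketch of @(bind) as unification: bind the variable if it is
    unbound; otherwise the existing binding must match.  Returns a
    new environment on success, or None on failure, so a failure
    leaves the caller's environment untouched."""
    if var in env:
        return dict(env) if env[var] == value else None
    new_env = dict(env)
    new_env[var] = value
    return new_env

def conj(env, *goals):
    """Short-circuiting conjunction: the second goal never runs if
    the first one fails, like two consecutive pattern-function calls."""
    for goal in goals:
        env = goal(env)
        if env is None:
            return None
    return env

# binding then unifying against the same value succeeds...
ok = conj({}, lambda e: bind(e, "out", "1"),
              lambda e: bind(e, "out", "1"))
# ...but unifying against a different value fails the whole conjunction
fail = conj({}, lambda e: bind(e, "out", "1"),
                lambda e: bind(e, "out", "0"))
```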

TXR Pattern Language

This implements the full Soundex described on the U.S. National Archives website. Doubled letters are condensed before the first letter is separated, so that, for instance, "Lloyd" is treated not as L followed by the coding of LOYD, but as L followed by the coding of OYD. Consecutive consonants which map to the same code are not condensed to a single occurrence of the code if they are separated by vowels, but an intervening W or H does not break the run in this way. Names with common prefixes are encoded in two ways.

Where expr is Lispy syntax, which can be an atom, a list of atoms or lists in parentheses, or possibly a dotted list (terminated by an atom other than nil):

(elem1 elem2 ... elemn) proper

(elem1 elem2 ... elemn . atom) dotted

Atoms can be:

ABc123_4 symbols, represented by tokens consisting of letters, underscores and digits, beginning with a letter. Symbols have packages, e.g., system:foo, but this is not accessible from the TXR lexical conventions.

:FoO42 keyword symbols, denoted by colon, which is not part of the symbol name.

"string literals"

`quasi @literals` with embedded @ syntax

'c' characters

123 integers

/reg/ regular expressions

Within literals and regexes:

\r various backslash escapes similar to C

\\ single backslash

Within literals, quasiliterals and character constants:

\' \" \` escape any of the quotes: not available within regex.

The regex syntax is fairly standard fare, with these extensions:

~R complement of R: set of strings other than those that match R

R%S match shortest number of repetitions of R prior to S.

R&S match R and S simultaneously: the intersection of the set of strings matching S and the set matching R.

TXR Lisp

@(do (tree-case *args*
       ((big small)
        (cond ((< (length big) (length small))
               (put-line `@big is shorter than @small`))
              ((str= big small)
               (put-line `@big and @small are equal`))
              ((match-str big small)
               (put-line `@small is a prefix of @big`))
              ((match-str big small -1)
               (put-line `@small is a suffix of @big`))
              (t (let ((pos (search-str big small)))
                   (if pos
                       (put-line `@small occurs in @big at position @pos`)
                     (put-line `@small does not occur in @big`))))))
       (otherwise
        (put-line `usage: @(ldiff *full-args* *args*) <bigstring> <smallstring>`))))

Pattern Language

@line
@(cases)
@line
@(output)
second line is the same as first line
@(end)
@(or)
@(skip)@line
@(output)
first line is a suffix of the second line
@(end)
@(or)
@line@(skip)
@(output)
first line is a prefix of the second line
@(end)
@(or)
@prefix@line@(skip)
@(output)
first line is embedded in the second line at position @(length prefix)
@(end)
@(or)
@(output)
first line is not found in the second line
@(end)
@(end)

Output:

$ txr cmatch.txr -
123
01234
first line is embedded in the second line at position 1
$ txr cmatch.txr -
123
0123
first line is a suffix of the second line

This solution builds up a regular expression in a hygienic way from the set of characters given as a string. The string is broken into a list, which is used to construct a regex abstract syntax tree for a character-set match, using a Lisp quasiquote. This is fed to the regex compiler, which produces an executable machine that is then used with regsub.
On the practical side, some basic structural pattern matching is used to process the command line argument list. Since the partial argument list (the arguments belonging to the TXR script) is a suffix of the full argument list (the complete arguments, which include the invoking command and the script name), the classic Lisp function ldiff comes in handy for obtaining just the prefix, for printing the usage:
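As a cross-language illustration of the hygienic construction described above: the point is that every character lands in the set literally, with no metacharacter surprises. A Python sketch (strip_chars is a hypothetical name) achieves the same safety with re.escape:

```python
import re

def strip_chars(s: str, chars: str) -> str:
    """Remove all characters in `chars` from `s`, building the
    character-set regex safely from arbitrary characters."""
    # re.escape plays the role of constructing the regex AST
    # hygienically: '-', ']' and '.' are all taken literally
    pattern = re.compile("[" + re.escape(chars) + "]")
    return pattern.sub("", s)
```

For example, `strip_chars("a.b.c", ".")` removes only literal dots rather than matching any character.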

Output:

Now here is a rewrite of strip-chars which just uses classic Lisp that has been generalized to work over strings, plus the do syntax (a sibling of the op operator), which provides syntactic sugar for a lambda function whose body is an operator or macro form.
(do if (memq @1 set) (list @1)) is just (lambda (item) (if (memq item set) (list item))). mappend happily maps over strings, and since the leftmost input sequence is a string and the return values of the lambda are sequences of characters, mappend produces a string.
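The string-producing mappend behavior can be approximated in Python (a minimal sketch; the real mappend is generic over sequence kinds):

```python
def mappend(f, s: str) -> str:
    """Map f over the characters of a string and concatenate the
    returned character sequences, yielding a string."""
    return "".join(f(c) for c in s)

# keep only characters in the set, in the spirit of
# (do if (memq @1 set) (list @1))
keep = set("aeiou")
vowels_only = mappend(lambda c: c if c in keep else "", "hello world")
```

Here `vowels_only` is `"eoo"`: characters outside the set contribute the empty sequence.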

Explanation: the basic structure is [function " a b "], where the function is an anonymous lambda generated using the do operator. The function is applied to the string " a b ".
The structure of the do is (do progn (blah @1) @1), where the forms make references to the implicit argument @1, and so the generated lambda has one argument, essentially being (lambda (arg) (blah arg) arg): do something with the argument (the string) and then return it.
What is done with the argument is this: (del [@1 0..(match-regex @1 #/\s+/)]). The match-regex function returns the number of characters at the front of the string which match the regex \s+: one or more spaces. The return value is used to express a range 0..length which is applied to the string. The syntax (del [str from..to]) deletes a range of characters in the string.
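The same match-then-delete-a-range idea can be sketched in Python (trim_left is a hypothetical name; re.match anchors at the front of the string like match-regex):

```python
import re

def trim_left(s: str) -> str:
    """Delete the run of leading whitespace, mirroring
    (del [@1 0..(match-regex @1 #/\s+/)])."""
    m = re.match(r"\s+", s)
    # m.end() is the count of matched characters at the front,
    # i.e. the end of the 0..length range to delete
    return s[m.end():] if m else s
```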
Lastly, a pedestrian right trim:

@(do (defun trim-right (str)
       (for ()
            ((and (> (length str) 0)
                  (chr-isspace [str -1]))
             str)
            ((del [str -1]))))
     (format t "{~a}\n" (trim-right " a a "))
     (format t "{~a}\n" (trim-right " "))
     (format t "{~a}\n" (trim-right "a "))
     (format t "{~a}\n" (trim-right "")))

Template Output Version

This version massages the data in a way that is suitable for generating the output template
-wise with an
@(output)
block.
The data is in a file, exactly as given in the problem. Parameter N is accepted from command line.

Now iterate over the data, requiring a variable called record to be bound in each iteration, and suppressing all other variables from emerging. In the body of the collect, bind four variables. Then use these four variables to create a four-element list which is bound to the variable record. The int-str function converts the textual variable salary to an integer:

This code binds some successive variables. n is an integer conversion of the command line argument. dept-hash is a hash whose keys are department strings and whose values are lists of records belonging to each respective department (the records collected previously). The hash keys, the departments, are extracted into a variable called dept for later use. The ranked variable takes the ranking information.
The salary ranking info is obtained by sorting each department's records by descending salary and then taking a 0..n slice of the list.
The "apply mapcar list" combination is a Lisp idiom for performing a matrix transpose. We use it twice: once within each department over the list of records, and then over the list of lists of records.
The reason for these transpositions is to convert the data into individual nested lists, one for each field. This is the format needed by the TXR @(output) clause:

Here, all these variables are individual lists. The dept variable is a flat list; one nesting of @(repeat) iterates over it. The other variables are nested lists; a nested repeat drills into these.
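The "apply mapcar list" transpose corresponds to zip(*rows) in Python. A sketch with made-up records, turning row-oriented data into the parallel per-field lists a template iteration wants:

```python
# rows of (name, salary) records; the data is illustrative
records = [("Alice", 120000), ("Bob", 95000), ("Carol", 110000)]

# the transpose: a list of rows becomes one list per field
names, salaries = map(list, zip(*records))
```

After this, `names` is `["Alice", "Bob", "Carol"]` and `salaries` is `[120000, 95000, 110000]`, ready to be iterated in parallel.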

Lisp Output Version

In this version, the Lisp processing block performs the output, so the conversion of records into lists for the template language is omitted, simplifying the code.
The output is identical to the previous version.

TXR source code and I/O are all assumed to be text which is UTF-8 encoded. This is a self-contained implementation, not relying on any encoding library. TXR ignores LANG and similar environment variables.
One of the regression test cases uses Japanese text.
Characters can be coded directly, or encoded indirectly with hexadecimal escape sequences.
The regular expression engine, also an original implementation self-contained within TXR, supports full Unicode (not only the Basic Multilingual Plane, but all planes).
However, as of version 89, identifiers such as variables are restricted to English letters, numbers and underscores.
Whether text outside of the Basic Multilingual Plane can actually be represented by a given port of TXR depends on the width of the C compiler's wchar_t type. A 16-bit wchar_t restricts the program to the BMP.
Japanese test case:

This is a general solution which implements a command-line tool for updating the config file. Omitted are the trivial steps for writing the configuration back into the same file; the final result is printed on standard output.
The first argument is the name of the config file. The remaining arguments are of this form:
This works by reading the configuration into a variable and then making multiple passes over it, using the same constructs that normally operate on files or pipes. The first 30% of the script deals with reading the configuration file, parsing each command line argument, and converting its syntax into configuration syntax, stored in new_opt_line. For each argument, the configuration is then scanned and filtered from config to new_config, using the same syntax which could be used to do the same job with temporary files. When the interesting variable is encountered in the config, via one of the applicable pattern matches, the prepared configuration line is substituted for it. While this is going on, the encountered variable names (bindings for var_other) are also being collected into a list. This list is later used, via the directive @(bind opt_there option), to determine whether the option occurred in the configuration or not. The bind construct will not only check whether the left and right hand sides are equal; if nested lists are involved, it checks whether either side occurs in the other as a subtree. Thus option binds with opt_other if it matches one of the option names in opt_other. Finally, the updated config is regurgitated.

$ txr configfile2.txr configfile NEEDSPEELING= seedsREMOVED NUMBEROFBANANAS=1024 NUMBEROFSTRAWBERRIES=62000
# This is a configuration file in standard configuration file format
#
# Lines begininning with a hash or a semicolon are ignored by the application
# program. Blank lines are also ignored by the application program.
# The first word on each non comment line is the configuration option.
# Remaining words or numbers on the line are configuration parameter
# data fields.
# Note that configuration option names are not case sensitive. However,
# configuration parameter data is case sensitive and the lettercase must
# be preserved.
# This is a favourite fruit
FAVOURITEFRUIT banana
# This is a boolean that should be set
; NEEDSPEELING
# This boolean is commented out
SEEDSREMOVED
# How many bananas we have
NUMBEROFBANANAS 1024
NUMBEROFSTRAWBERRIES 62000

In TXR, the preferred way to render data into octets is to convert it to a character string. Character strings are Unicode, which serializes to UTF-8 when sent to text streams.

@(do ;; show the utf8 bytes from byte stream as hex
     (defun put-utf8 (str : stream)
       (set stream (or stream *stdout*))
       (for ((s (make-string-byte-input-stream str)) byte)
            ((set byte (get-byte s)))
            ((format stream "\\x~,02x" byte))))
     ;; print
     (put-utf8 (tostring 0))
     (put-line "")
     (put-utf8 (tostring 42))
     (put-line "")
     (put-utf8 (tostring #x200000))
     (put-line "")
     (put-utf8 (tostring #x1fffff))
     (put-line "")
     ;; print to string and recover
     (format t "~a\n" (read (tostring #x200000)))
     (format t "~a\n" (read (tostring #x1f0000))))
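The same string-to-octets rendering, dumping the UTF-8 serialization as hex escapes, can be sketched in Python (utf8_hex is a hypothetical name):

```python
def utf8_hex(s: str) -> str:
    """Render a string's UTF-8 serialization as \\xNN escapes,
    analogous to the put-utf8 function above."""
    return "".join("\\x%02x" % b for b in s.encode("utf-8"))
```

For instance, `utf8_hex("é")` gives the two-byte sequence `\xc3\xa9`.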

Variables have a form of pervasive dynamic scope in TXR. Each statement ("directive") of the query inherits the binding environment of the previous, invoking, or surrounding directive, as the case may be. The initial contents of the binding environment may be initialized on the interpreter's command line. The environment isn't simply a global dictionary. Each directive which modifies the environment creates a new version of the environment. When a subquery fails and TXR backtracks to some earlier directive, the original binding environment of that directive is restored, and the binding environment versions generated by backtracked portions of the query turn to garbage.
Simple example: the cases directive:

@(cases)
hey @a
how are you
@(or)
hey @b
long time no see
@(end)

This directive has two clauses, matching two possible input cases, which have a common first line. The semantics of cases is short-circuiting: the first successful clause causes it to succeed and stops the processing of subsequent clauses. Suppose that the input matches the second clause. This means that the first clause will also match the first line, thereby establishing a binding for the variable a. However, the first clause fails to match on the second line, which means that it fails. The interpreter then moves to the second clause, which is tried at the original input position, under the original binding environment, which is devoid of the a variable. Whichever clause of the cases is successful passes both its environment modifications and its input position increment to the next element of the query.
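The environment-versioning behavior can be sketched in Python (all names are made up; each clause works on its own copy of the environment, so a failing clause's bindings are discarded automatically):

```python
def cases(env, lines, *clauses):
    """Sketch of @(cases): try clauses in order against the input."""
    for clause in clauses:
        trial = dict(env)           # a new version of the environment
        if clause(trial, lines):    # the clause may add bindings to trial
            return trial            # short-circuit: first success wins
    return None                     # all clauses failed; env is untouched

def clause1(env, lines):
    if lines[0].startswith("hey "):
        env["a"] = lines[0][4:]     # binding made before the failure...
        return lines[1] == "how are you"
    return False

def clause2(env, lines):
    if lines[0].startswith("hey "):
        env["b"] = lines[0][4:]
        return lines[1] == "long time no see"
    return False

env = cases({}, ["hey joe", "long time no see"], clause1, clause2)
# env == {"b": "joe"}: clause1's binding of "a" was rolled back
```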
Under some other constructs, environments may be merged:

@(maybe)
@a bar
@(or)
foo @b
@(end)

The maybe directive matches multiple clauses such that it succeeds no matter what, even if none of the clauses succeed. Clauses which fail have no effect, but the effects of all successful clauses are merged. This means that if the input facing the above maybe is the line "foo bar", the first clause will match and bind a to foo, and the second clause will also match and bind b to bar. The interpreter integrates these results, and the environment which emerges has both bindings.

Here, the TXR pattern language is used to scan letters out of two arguments and convert them to upper case. The embedded TXR Lisp dialect handles the Vigenère logic, in just a few lines of code.
Lisp programmers may do a "double take" at what is going on here: yes, mapcar can operate on strings and return strings in TXR Lisp. (repeat key) produces an infinite lazy list; but that's okay, because mapcar stops after the shortest input runs out of items.
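The same endless-key trick translates to Python, where itertools.cycle plays the role of the lazy (repeat key) and zip stops at the shorter input. An illustrative sketch:

```python
from itertools import cycle

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Vigenère over A-Z.  cycle(key) is an endless key stream,
    harmless because zip stops when the message runs out."""
    sign = -1 if decrypt else 1
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    return "".join(
        chr((ord(c) - 65 + sign * (ord(k) - 65)) % 26 + 65)
        for c, k in zip(letters, cycle(key.upper())))
```

For example, `vigenere("ATTACK AT DAWN", "LEMON")` yields `"LXFOPVEFRNHR"`.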
Run:

Robust

Large amounts of the document are matched (in fact the entire thing!), rather than blindly looking for some small amount of context.
If the web page changes too much, the query will fail to match. TXR will print the word "false" and terminate with a failed exit status. This is preferable to finding a false positive match and printing a wrong result (e.g. any random garbage that happened to be in a line of HTML accidentally containing the string UTC).

This program shows how most of the information in the XML can be extracted
with very little code, which doesn't actually understand XML.
The name Émily is properly converted from the HTML/XML escape syntax.

Both the op and do operators are syntactic sugar for currying, in two different flavors. The forms within do that are symbols are evaluated in the normal Lisp-2 style, and the first symbol can be an operator. Under op, any forms that are symbols are evaluated in the Lisp-2 style, and the first form is expected to evaluate to a function. The name do stems from the fact that the operator is used for currying over special forms like if in the above example, where there is evaluation control. Operators can have side effects: they can "do" something. Consider (do set a @1), which yields a function of one argument that assigns that argument to a.
The compounded @@ is new in TXR 77. When the currying syntax is nested, code in an inner op/do can refer to numbered implicit parameters in an outer op/do. Each additional @ "escapes" out one nesting level.

The following gives us a shell utility which we can invoke with arguments like "rosetta 0" to get the first page of search results for "rosetta".
The two arguments are handled as if they were two lines of text from a data source using @(next :args). We throw an exception if there is no match (insufficient arguments are supplied). The @(cases) directive has strictly ordered evaluation, so the throw in the second branch does not happen if the first branch has a successful pattern match. If the similar @(maybe) or @(some) directives were used, this wouldn't work.
A little sprinkling of regex is used.