Beyond the need (or not) for such a function, my immediate thought was, How would I write that in OCaml? Sadly, it seems I've become monolingual enough that I have to translate most of what I "hear" into my own internal language. The complaint is neither here nor there; the point is that I reasoned pretty operationally: "I have 1 zero, 2 ones, 3 twos and 4 threes", hence:
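The snippet is missing from this copy; here is a hedged reconstruction, consistent with the inferred type and the wrong output shown below (the names rep_append and go, and the exact shape, are my guesses):

```ocaml
(* rep_append e n l prepends e, repeated n times, onto l *)
let where l =
  let rec rep_append e n l =
    if n = 0 then l else rep_append e (n - 1) (e :: l) in
  let rec go i acc = function
    | []      -> List.rev acc
    | x :: xs -> go (i + 1) (rep_append x i acc) xs  (* the bug: x and i swapped *)
  in
  go 0 [] l
```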

with an accumulating parameter tacked in by tail-recursive reflex. I smelled something wrong when the compiler told me that:

val where : 'a list -> 'a list = <fun>

"Polymorphic?" I thought, and I tried:

# where [1;2;3;4] ;;
- : int list = [2; 3; 3; 4; 4; 4]

Yes, wrong, but…

Of course, I didn't debug the function: the type was too wrong, and too obvious a clue for me to need to look any further. Type-checking by hand, I saw that, at the end, the returned list has type α list; that must be because rep_append returns an α list. Type-checking that, its result type is α list because it appends e to an α list, so that e must have type α instead of int. That variable takes its value from x, the head of the argument list, but that is the number of times to repeat n, that very element's index.

The arguments are swapped: I wrote above that rep_append appends a list of x repeated n times, when I should've written that rep_append appends a list of n repeated x times. The correct function is:
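The function itself is elided here; a sketch of the corrected version, under the same guessed names, now passing the index as the element to repeat and the list's element as the count:

```ocaml
(* rep_append n x l prepends n, repeated x times, onto l *)
let where l =
  let rec rep_append n x l =
    if x = 0 then l else rep_append n (x - 1) (n :: l) in
  let rec go i acc = function
    | []      -> List.rev acc
    | x :: xs -> go (i + 1) (rep_append i x acc) xs
  in
  go 0 [] l
```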

The inferred type, val where : int list -> int list = <fun>, is satisfactory. The function works correctly, too, but that was to be expected.

This is a typical example of proof-directed debugging. That is, a concrete answer to the question, What good is a strong statically-typed language? It's not only that the types prevent you from connecting the pieces together when they don't fit, even though that's 90% of their usefulness. It is also, and especially, the fact that the type of a function is a proof that it does what you think it does. If the type is wrong, the function must be wrong too, automatically, and you don't need to run it to know that.

Much as with a real proof, the trick is to work backwards to see where an error crept in. Classical examples are the trick "proofs" that purport to demonstrate that 1 = -1 or some such, "proofs" that crucially depend on dividing by zero. Working mathematicians everywhere are confronted daily with this, and therein lies the rub.

A crucial objection to the discipline of programming with a strong, statically-typed language is the view that "you need a degree in Mathematics to program in it". It is often repeated that it is "more productive" (easier, in non-obfuscated terms) to program in a dynamic language, and to debug at run time, or to use test-driven development.

The pragmatics of using a statically-typed language are not as onerous as that might suggest. Unfortunately, you really have to try it for yourself to see that (but the same is true of the extreme and agile methodologies). It looks difficult, and it is difficult to program with a strict type system if you're not used to it. But, and this is important, the best languages (i.e., not Java) lend you a very helpful hand: you don't have to prove any theorems, because the compiler does it for you. I checked a proof; I didn't have to write one. That's the compiler's job. And this particular check, that every term has a given type, is not very difficult once you have the final term; you just have to acquire the knack of working back the types of everything until you get to the source.

So, don't just take my word for it; give it a spin. You'll just have to trust me that strong statically-typed languages are an effective alternative to TDD and agile methodologies.

By the way, that function is ridiculously operational. What would SPJ do? Easy: just be lazy:

let where l = concat % map (fun (i, x) -> take x $ all i) % index $ l

That is (from right to left): index the list's elements; convert each pair of an index and a repetition count into the corresponding list, and flatten the result. Of course, this requires a couple of definitions:
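Those definitions are elided in this copy; here is a sketch of what they could be, where I use the standard Seq module (OCaml 4.14 or later) to play the role of the lazy, infinite all i. The names take, all and index, and the two operators, are assumptions on my part:

```ocaml
let (%) f g x = f (g x)                        (* composition *)
let ($) f x = f x                              (* application *)
let concat = List.concat
let map = List.map
let index l = List.mapi (fun i x -> (i, x)) l  (* pair each element with its index *)
let all i = Seq.repeat i                       (* the "infinite list" of i's *)
let take n s = List.of_seq (Seq.take n s)      (* force the first n elements *)

let where l = concat % map (fun (i, x) -> take x $ all i) % index $ l
```

Note that in OCaml an operator spelled with % binds tighter than one spelled with $, so the pipeline parses as ((concat % map f) % index) $ l, as intended.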

(yes, it's the Project Gutenberg ebook for Bram Stoker's Dracula, text version; a suitably large corpus to run tests with). That would read: "Safely open the named file as input; parse it into lines; skip until finding the line that matches the regular expression, and skip that line too. Avoid empty lines, and split the remainder into words. Change the words to upper case, take the first twenty and make a list of those." Well, this can be done with very nice code. But first, I'd like to do away with the somewhat "normal" bits.

Regular expressions in OCaml live in the Str module; they're low-level and inconvenient to use. A little wrapping-up makes them more palatable:
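The wrapper is elided; a minimal sketch of a matches along the lines described below (the name and signature are my guesses) might be:

```ocaml
(* Compile the regexp once; the returned closure reuses it on every call.
   Requires the str library. *)
let matches pat =
  let re = Str.regexp pat in
  fun s ->
    try ignore (Str.search_forward re s 0); true
    with Not_found -> false
```

Compile with the str library linked in (for instance, ocamlfind ocamlopt -package str -linkpkg).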

Note that I close over the compilation of the regular expression, so that repeated calls to matches with the same regexp don't waste effort. Of course, is_empty is simply:

let is_empty = matches "^[\t ]*$"

The mysterious "piping arrow" is nothing more than function composition in disguise:

let (|>) f g x = g (f x)

This is the reverse (or categorical) composition; its more popular sibling (one could say, its twin) is:

let (%) f g x = f (g x)

The magic sauce in the program is the use of OCaml's imperative streams. They are like lists, but with a lazy tail which is accessed by destructively taking the head. Like lazy lists, they allow for computation with infinite data, and they use just as much as they need. Unlike lazy lists, they don't support persistence, that is, the ability to keep reference to arbitrary versions of a value. For stream processing of text files they're a perfect match. The idea is to work with functions from streams to streams, or stream transformers. This is made very pleasing by the CamlP4 syntax extension for streams and stream parsers. To use it, you must compile your code with camlc -pp camlp4o, or #load "camlp4o.cma" (or #camlp4o if you're using findlib) in your source file, as explained in the manual.

By far the simplest transformer is the one that skips the head of a stream, and returns the (now headless) result:
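The transformer is elided; in the camlp4 stream-parser syntax it might read as follows (a sketch; it needs the camlp4o preprocessor to compile):

```ocaml
let skip = parser [< '_x; s >] -> s   (* consume the head, return the rest *)
```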

Here the single quote indicates that we're talking about a concrete stream element, as opposed to a substream. It's natural to extend this to a transformer that keeps dropping elements as long as they satisfy a test, or predicate, p:
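The code is elided; here is a sketch that fits the description that follows, again assuming camlp4o:

```ocaml
let rec skip_while p str = match str with parser
  | [< 'x when p x >] -> skip_while p str
  | [< >] -> str        (* done chopping: return the passed-in stream *)
```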

This is the first pattern common to all transformers: return the passed in stream once we're done consuming, or chopping off, elements at its head. This is signalled by the use of the empty pattern [< >] which matches any stream. Of course the opposite filter is also handy and comes for free:

let skip_until p = skip_while (not % p)

We now know how skip_until (matches "^\\*\\*\\* START OF") |> skip works. A variation in the beheading game is to drop a number of elements from the stream:
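The function is elided in this copy; since the pipeline further down uses take 20, and given the description that follows, I read it as the transformer that beheads the stream after n elements, dropping the rest. A sketch, assuming camlp4o:

```ocaml
(* Keep the first n elements, dropping everything after them. Purposefully
   partial: it fails if the stream holds fewer than n elements. *)
let rec take n str =
  if n = 0 then [< >] else
  match str with parser
  | [< 'x >] -> [< 'x; take (n - 1) str >]
```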

(note that this function is also purposefully partial). This is the second pattern common to all transformers: insert a substream on the right side of a matching by recursively invoking the transformer. Similar to this code is the filtering of a stream to weed out the elements satisfying a predicate:
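The filter is elided; a sketch matching the explanation below (camlp4o assumed):

```ocaml
(* Weed out the elements satisfying p. *)
let rec filter p = parser
  | [< 'x when p x; s >] -> filter p s     (* eager recursion: keep weeding *)
  | [< 'x; s >] -> [< 'x; filter p s >]    (* lazy: emit x, recurse inside brackets *)
  | [< >] -> [< >]
```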

The fact that in the first matching the recursive call to filter is outside a stream bracket pair means that it will eagerly call itself recursively until it finds an element not matching the predicate.

It's not too difficult to devise a fold over the elements of a stream:
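The fold is elided; here is a sketch, together with the to_list the pipeline below relies on (camlp4o assumed):

```ocaml
let rec fold f e = parser
  | [< 'x; s >] -> fold f (f e x) s
  | [< >] -> e

let to_list str = List.rev (fold (fun acc x -> x :: acc) [] str)
```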

With this, a pipeline like skip_until (matches "^\\*\\*\\* START OF") |> skip |> filter is_empty |> take 20 |> to_list just works, if only we had something to feed it. It could be handy to zip two streams together. Unfortunately, I can find no straightforward way to synchronize the advance of both streams, as there isn't a parallel match of two stream parsers. If you run camlp4o on a file containing the definitions so far, you'll see that the syntax gets rewritten (or expanded) to direct OCaml code that calls into the functions marked For system use only, not for the casual user in stream.mli. It is possible to emulate the result of a hypothetical stream parser doing the zipping by writing directly:
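That direct emulation is elided in this copy; here is a hedged sketch using Stream.from (the legacy Stream module, pre-OCaml 5), with a nat that is a function so that every use starts a fresh stream of naturals:

```ocaml
(* Pull one element from each stream in lockstep; stop when either ends. *)
let zip a b =
  Stream.from (fun _ ->
    try Some (Stream.next a, Stream.next b)
    with Stream.Failure -> None)

let nat () = Stream.from (fun i -> Some i)  (* a stream as a function over the naturals *)
```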

(why is it a function? Also, note Conal Elliott's remark in his recent post about streams being functions over the natural numbers), a transformer for numbering a stream is as easy as (nat |> zip) (). Another way to build streams is to use the syntax:

let rec repeat e = [< 'e; repeat e >]

This is possibly the simplest infinite stream you can have. Stream expressions are not only used in parsers, as the right-hand sides of the matchings in the preceding functions hint at.

As it is well-known to Haskellers, streams have algebraic and categorical structure. For instance, they are functors; in other words, they can be mapped over their elements with a function that transforms them pointwise:
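The definitions are elided; here are sketches of fmap, and of the bind referred to below, assuming camlp4o:

```ocaml
let rec fmap f = parser
  | [< 'x; s >] -> [< 'f x; fmap f s >]   (* the quote: f x is a single element *)
  | [< >] -> [< >]

let rec bind x f = match x with parser
  | [< 'e; s >] -> [< f e; bind s f >]    (* no quote: f e is a whole substream *)
  | [< >] -> [< >]
```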

Note that the difference between fmap and bind is a single quote! This is a direct consequence of the fact that every monad is a functor, via the identity fmap f x = bind x (return % f). Now words is straightforward:

Only one last piece of the puzzle remains: how does an input channel get transformed into a stream of lines? With a transformer (a parser, actually), of course. This is a rather simple one (especially compared to the imperative monstrosity it replaced; you don't want to know), once you dissect it into its components:
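The parser is elided in this copy; here is a sketch reconstructed from the dissection that follows (buf, get, islf and read are the names the text uses; the overall shape is my guess; camlp4o assumed):

```ocaml
let lines cstr =
  let buf = Buffer.create 80 in
  let get () = let s = Buffer.contents buf in Buffer.clear buf; s in
  (* eat the LF of a CRLF pair; the empty match means it cannot fail *)
  let islf = parser
    | [< ''\n' >] -> ()
    | [< >] -> () in
  let rec read str = match str with parser
    | [< ''\n' >] -> [< 'get (); read str >]
    | [< ''\r' >] -> islf str; [< 'get (); read str >]
    | [< 'c >] -> Buffer.add_char buf c; read str
    | [< >] -> if Buffer.length buf = 0 then [< >] else [< 'get () >] in
  read cstr
```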

The characters in a line are accumulated in the buf buffer. Once a line is complete, get resets the buffer and returns its contents. The sub-parser islf tries to eat the LF in a CRLF pair. Since it includes an empty match, it cannot fail. The recursive parser read scans for an LF, or for a CR or CRLF terminator, accumulating characters until a complete line is included in the result stream. Invoking islf in the right-hand side of the CR match in read is safe, in the sense that it cannot lead to back-tracking.

At last we get to the point where we can read from a file. For that Stream.of_channel directly and efficiently turns the contents of a file into a stream of characters:

where k is the continuation that is safely wrapped in an exception handler that closes the file even in the event of an error. Safe input-output in the face of exceptions is an idiom borrowed from LISP, the well-known WITH-OPEN-FILE.
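The combinator itself is elided; a sketch of such a wrapper (with_input_file is my name for it) could be:

```ocaml
(* Open the named file, hand the channel to the continuation k, and
   close the channel whether k returns normally or raises. *)
let with_input_file name k =
  let inchan = open_in name in
  try
    let r = k inchan in
    close_in inchan;
    r
  with e -> close_in inchan; raise e
```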

You could find various ways to exercise this small DSL; by way of example, a word count:

This is, of course, not very accurate: the first line should be numbered 1, not 0, and the line numbers correspond not to the text but to the filtered stream. Let's try something else: define the arrow parts of the pair projection functions:

let first f (x, y) = (f x, y)
and second g (x, y) = (x, g y)

Now, number the lines as read, and then apply the processing to the second component of the paired-up lines:
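The final pipeline is elided in this copy; as a plain-list illustration of the idea (number the items, then let second apply the processing to the right component), under assumed data:

```ocaml
let second g (x, y) = (x, g y)

let () =
  (* number the lines starting from 1, then process only the line text *)
  let numbered = List.mapi (fun i line -> (i + 1, line)) ["one"; "two"] in
  assert (List.map (second String.uppercase_ascii) numbered
          = [(1, "ONE"); (2, "TWO")])
```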
