Functional Lens – On Mathematics and Computation
https://functionallens.wordpress.com

Typed Functional Prototyping of A Database Import Library
(25 Jan 2013, https://functionallens.wordpress.com/2013/01/25/typed-functional-prototyping-of-a-database-import-library/)

I frequently have to "import" data from one database to another in order to get on with more interesting work, a process sometimes ingloriously referred to as "ETL: Extract, Transform, Load". Usually this must be done in some programming language already established at an organization, such as Python or Scala. However, in developing a library to do this quickly and fairly declaratively, I found it useful to write a mostly-undefined prototype in Haskell, to "type-check" my thoughts. This essay is very much about databases as they exist in the wild, but draws quite a bit of inspiration from the philosophical aspects of Rich Hickey's talks on programming and Datomic / the database as a value. There is undoubtedly extensive database theory literature that would also be helpful; please do point me towards it in the comments.

Databases, datums, and imports

It is useful to step back to basic principles to model this scenario sufficiently abstractly. In particular, I want to have no particular notion of a database, except that it is a place that one stores things one learns. So I will define abstract types DB for storage and Datum for the input.

> import Control.Monad.Reader  -- Ignore for now, imports are just required up top
>
> data Datum  -- Essentially a row of the source database, if row-based
> data DB     -- The target database

An Import is a function that takes just one additional datum and incorporates all the learned knowledge into the database.

> type Import = Datum -> DB -> DB

Not just any function is suitable to be an Import. Learning from the same datum twice – in this setting – should not result in more new information. In other words, for f :: Import we require that f be idempotent:

f datum == f datum . f datum

Moreover, if we treat the analogy of database contents with knowledge very strictly, we should be unable to ever "know" a contradiction. We can add information, but never remove or change it. This idea is usually discussed as the "information ordering", but I will just call it <= where no information (aka NULL or ⊥) is less than any value, and each other value is related only to itself:

db <= f datum db

Note that this is stronger than monotonicity, as a constant function is always monotonic. The best word for this property in the context of databases is that f is consistent.

Knowing these properties of an import, I can be assured that it is safe to run the import on all the data available as many times as I like. The order may affect how many runs it takes, since we do not require commutativity:

f datum1 . f datum2 =?= f datum2 . f datum1

However, we can be assured that re-running will eventually hit a fixed point. In practice, it is usually very easy to order an import so that a single run suffices, or two in more complex scenarios.

There are two equivalent yet legitimately interesting ways to think about importing a list of data. The first is the obvious one: For each piece of data, transform the database and pass on the result.

The second considers functions DB -> DB to be more central, and does not even bind the variable db in its definition: It first composes all the single transformations into one mondo transformation, which is then applied to the input database.
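
The original code for these two definitions is not preserved in this copy; here is a minimal sketch of both readings (the names importAll and importAll' are mine, not the post's):

> importAll :: Import -> [Datum] -> DB -> DB
> importAll f ds db = foldl (flip f) db ds
>
> importAll' :: Import -> [Datum] -> DB -> DB
> importAll' f ds = foldr (\d rest -> rest . f d) id ds

The second version never names the database; it builds one composite DB -> DB transformation and applies it.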

Conclusions, Deductions, and Translations

The above definition is complete and flexible, but there is more structure to most databases, hence most imports. To model the extremely likely scenario that the database has an atomic element, such as rows for SQL or documents for various flavors of NoSQL, call these things Conclusions with a single fundamental operation save.
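
The post leaves these abstract; a sketch of the declarations in the same mostly-undefined spirit (only the names Conclusion and save come from the text, and save's body is deliberately left undefined here):

> data Conclusion              -- a row or document in the target database
>
> save :: Conclusion -> DB -> DB
> save = undefined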

Now a typical import will consist of drawing some set of Conclusions for each Datum encountered, possibly by combining the datum with information already stored in the database. For lack of a better name, I will call this a Deduction, and transform it into an Import with the help of save.
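
The original Deduction code is not in this dump; a plausible reconstruction (the exact type of Deduction and the body of deduce are my guesses):

> type Deduction = Datum -> DB -> [Conclusion]
>
> deduce :: Deduction -> Import
> deduce d datum db = foldr save db (d datum db)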

However, it may be that things are even simpler and that each Datum results in a single new Conclusion. This is more the usual notion of a "data import" and means – to stretch an already thin analogy – that each Datum is sort of already a Conclusion but with respect to the wrong context, so I’ll call this a Translation.
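
Again the original code is missing; a sketch of what Translation and its lifting into an Import might look like (signatures are my guesses):

> type Translation = Datum -> Conclusion
>
> translate :: Translation -> Import
> translate t datum = save (t datum)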

However, there is a major problem: neither translate nor deduce results in a consistent import, because multiple Datums may result in a Conclusion with the same primary key but different attributes. This is almost never desirable; when two translations or deductions emit a conclusion with the same primary key, it is intended to be consistent with the database, i.e. they should only emit conclusions con such that save con is consistent:

db <= save con db

In normal database parlance, this is like an "upsert", except on all attributes as well. At the level of rows or documents, we must first fetch the document that would be created, and then modify it according to the new conclusion. Any conflicting attribute is an error (hacks excepted, of course). I will break these apart into Lookup and Augment, which are then recombined in the brilliantly named lookupAndAugment.
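
The original definitions are not preserved here; one way the pieces might fit together, where only the names Lookup, Augment, and lookupAndAugment come from the post and every signature is a guess:

> type Lookup  = Conclusion -> DB -> Conclusion          -- fetch what is already stored (or a blank document)
> type Augment = Conclusion -> Conclusion -> Conclusion  -- merge; conflicting attributes are an error
>
> lookupAndAugment :: Lookup -> Augment -> Conclusion -> DB -> DB
> lookupAndAugment look augment con db = save (augment (look con db) con) db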

One could implement lookupAndAugment to enforce that the output conclusion is consistent with the input. 1

Making it imperative

To get a step closer to the imperative scripting that this prototype targets, this section adjusts the definitions above to stop passing the database around quite so much.

A first step is to note that the database is "always there" as part of the environment, which is exactly what the Reader monad represents. Here are all of the above definitions rewritten without ever taking the database as input.
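
The rewritten definitions are missing from this copy; as one small illustration of the idea (importR and ImportR are my names), an Import can be phrased in Reader form with asks, so the database is read from the environment rather than passed explicitly:

> type ImportR = Datum -> Reader DB DB
>
> importR :: Import -> ImportR
> importR f datum = asks (f datum)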

And it is nice to see that moving from Reader DB to IO does not change the text of lookupAndAugment, so I have some confidence that it is a canonical definition.

That’s it! Just a bit of how I do typed functional prototyping before committing to the task of implementing in a lower-level scripting language.

In a test-edit-debug cycle, you’ll need a way to turn off consistency checking, unless you snapshot and reset the target database with each run. A good idea, but slow, and it is usually fine to operate on a test database and just mutate it repeatedly.

What is Defunctionalization?
(24 May 2008, https://functionallens.wordpress.com/2008/05/24/what-is-defunctionalization/)

I recently gave a little demonstration entitled "What is Defunctionalization?" for UCSC TWIGS (the acronym, stolen from a similar seminar in the U. Mass. math department, stands for The "What Is … ?" Graduate Seminar). The inspiration for this talk was just to present what I'd learned after Conor McBride's brilliant presentation at POPL'08 drove me to put the words "Olivier Danvy defunctionalize continuation" into Google.

I coded the simplest examples from

Defunctionalization at work. O Danvy, LR Nielsen. PPDP 2001.

in literate Haskell for the audience, and also showed off QuickCheck a little to make sure the translation was correct (finding one error, if I recall).

This blog post is a merging of my talk outline and new stuff that came up live. Try loading it up in GHCi or Haskell-mode and running the examples and QuickCheck properties.

Broadly, defunctionalization is transforming a program to eliminate higher-order functions. Rather than focus on its use for compilation (see this H Cejtin, S Jagannathan, S Weeks paper on MLton) or analyses (see Firstify from N Mitchell and C Runciman), I wanted to emphasize its use in understanding your own program, along the lines of Wand's Continuation-Based Program Transformation Strategies (JACM 1980).

Defunctionalization replaces all the first-class functions with an explicit data structure Lam1 and a global apply1 function, essentially embedding a mini-interpreter for just those lambda terms occurring in the program.
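
The post's literate code is not preserved in this copy, so here is a small reconstruction in the Danvy–Nielsen spirit. Only the names Lam1 and apply1 come from the post; aux, AddOne, AddY, and the rest are made up for illustration:

> aux :: (Int -> Int) -> Int
> aux f = f 1 + f 10
>
> mainHO :: Int -> Int
> mainHO y = aux (\x -> x + 1) * aux (\x -> x + y)
>
> -- One constructor per lambda in the program, carrying its free variables.
> data Lam1 = AddOne | AddY Int
>
> apply1 :: Lam1 -> Int -> Int
> apply1 AddOne   x = x + 1
> apply1 (AddY y) x = x + y
>
> auxDefun :: Lam1 -> Int
> auxDefun f = apply1 f 1 + apply1 f 10
>
> mainFO :: Int -> Int
> mainFO y = auxDefun AddOne * auxDefun (AddY y)

mainFO computes exactly what mainHO does, but the program is now first-order: the only "functions" passed around are values of Lam1, interpreted by the global apply1.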

Note how LamBTree looks just like the definition of BinaryTree, because it is a catamorphism, hence a hylomorphism, i.e. a recursive function with a call tree that looks like a BinaryTree or whatever structure you are hylo'ing over. (See Sorting morphisms (L Augusteijn, AFP 1998) for a beautiful example of using this to understand your program.) So walk is pretty much the identity function on trees, and then applyBTree is a flatten function with an accumulating parameter. Ignoring the intermediate structure, we see defunctionalization as a way to derive accumulating parameters.

applyTree is just fold over a tree, as promised, and we've recovered the tree data structure.

Defunctionalize your continuation

Suppose you have a first-order (boring!) program. You can't have any fun until you find a way to introduce some first-class functions. A classic way to introduce a gratuitous number of them is to convert your code into continuation-passing style. Let's try it.

This is Danvy's example of a parser to recognize the language 0^n 1^n. It is written with an auxiliary function in the Maybe monad to simulate throwing an exception as soon as we can reject the string.
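
The literate code itself is missing from this copy; here is a sketch of such a recognizer in the Maybe monad (recognize, walk, and expectOne are my names; Danvy's original is in ML):

> recognize :: String -> Bool
> recognize s = walk s == Just ""
>   where
>     -- walk consumes a block of 0s followed by a matching block of 1s and
>     -- returns the remaining input; Nothing plays the role of an exception
>     -- thrown as soon as the string can be rejected.
>     walk ('0' : cs) = walk cs >>= expectOne
>     walk cs         = Just cs
>
>     expectOne ('1' : rest) = Just rest
>     expectOne _            = Nothing

For example, recognize "0011" is True, while recognize "001" and recognize "010" are False.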

I didn’t get to the rest of this in my talk, and anyhow it is most interesting to people who play with operational semantics a lot. This last bit is from Danvy’s paper On Evaluation Contexts, Continuations, and The Rest of Computation from the continuation workshop in 2004.

We have a simple arithmetic language, and two ways of giving it a semantics: We can either reduce the expression a single small step, using reduceAllTheWay to normalize it, or we can eval the expression directly to a result.
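
The post's definitions are not preserved here; a minimal sketch of such a language with a big-step eval and a single small step (only eval and reduceAllTheWay are names from the post; Expr and step are mine):

> data Expr = Lit Int | Add Expr Expr
>
> -- Big-step: evaluate directly to a result.
> eval :: Expr -> Int
> eval (Lit n)     = n
> eval (Add e1 e2) = eval e1 + eval e2
>
> -- Small-step: perform one reduction, if any remain.
> step :: Expr -> Maybe Expr
> step (Lit _)               = Nothing
> step (Add (Lit m) (Lit n)) = Just (Lit (m + n))
> step (Add (Lit m) e2)      = fmap (Add (Lit m)) (step e2)
> step (Add e1 e2)           = fmap (\e1' -> Add e1' e2) (step e1)
>
> reduceAllTheWay :: Expr -> Expr
> reduceAllTheWay e = maybe e reduceAllTheWay (step e)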

The data type ContReduce is the now-common notion of an "evaluation context" which some researchers prefer because it separates the important rules about how terms are reduced from the rules that just tell you where in a term reduction happens.

Hey it looks like almost the same thing! The difference is in how we interpret the data structure. In the previous case, applyContReduce just used it for navigation. In this case, applyContEval calls back into evalCPSdefun to keep the evaluation rolling.

If, like myself, you liked this because you feel there is important and interesting structure underlying operational semantics that is hidden by its many superficial forms, then you’ll probably like this additional reading:

Debugging with Open Recursion Mixins
(10 May 2008, https://functionallens.wordpress.com/2008/05/10/debugging-with-open-recursion-mixins/)

The call is out for submissions to the next issue of The Monad.Reader! To get an idea of the content (and because D Stewart told us all to read every past issue) I cracked open Issue 10, which has a nice tutorial by B Pope on the GHCi debugger.

But having just finished a post using open recursion, it immediately cried out to me that open-recursive functions already have some debugging hooks for tracing/breakpoints/etc. Naturally, some complications arose, and I got to try out some other cool ideas from the literature.
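
The post's tracing mixin itself is not preserved in this copy. Roughly, such a mixin might look like the following (the name inspect and the fact that it fixes the underlying monad to IO come from the text below; the details are my guess):

> type Gen a = a -> a
>
> inspect :: (Show a, Show b) => String -> Gen (a -> IO b)
> inspect name super x = do
>   putStrLn (name ++ " called with " ++ show x)
>   result <- super x
>   putStrLn (name ++ " returned " ++ show result)
>   return result

Because the function is open-recursive, the mixin wraps every recursive call, not just the outermost one, which is exactly what you want for tracing.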

To combine the State in which I store the memoization table with the IO I use for debugging, I use

That was easy! Now when I also mix in the memoization I should see a lot of those recursive calls drop away. But I cannot simply write fix (gmFib . inspect "fib" . memoize) because mixing in inspect fixes the underlying monad to IO, while mixing in memoize fixes it to Memoized Int Int. I need to run this computation in a monad that supports the operations of both IO and State. Well, in category theory terms, the smallest "thing" that contains two other "things" is their coproduct, so this is exactly what the Luth-Ghani paper mentioned above is for!

I’ll be inlining and de-generalizing a bunch of the (beautiful) code from the paper to make it look more like something an "in the trenches" programmer would write.

This data type is not exactly the coproduct, but rather a data type that can represent it, like using a list to represent a set — there are more lists than sets, but if you respect the abstraction you are OK. Most of the ways of processing this data structure can be written in Haskell using only Functor instances for the underlying structure, but to make sure we only use it in the appropriate places I’ve just made the stronger requirement that m1 and m2 be Monads everywhere. But I still want fmap so I turn on undecidable instances and add the following.
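
The declaration itself is missing from this dump; the representation described is presumably something along these lines (the type Plus is named later in the post, the constructor names are mine):

> data Plus m1 m2 a
>   = Done a
>   | InL (m1 (Plus m1 m2 a))
>   | InR (m2 (Plus m1 m2 a))

There are more values of this type than elements of the coproduct (for example, nested layers from the same side), which is the "more lists than sets" point above.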

> instance Monad m => Functor m where
>   fmap f m = m >>= (return . f)

Now you might ask why I’m not using monad transformers. Four reasons come to mind:

I wanted to try out the contents of this paper.

The coproduct is defined for two arbitrary monads, without writing a special version of either that "holds" another inside.

The coproduct can have the two "layers" interleaved in more arbitrary ways.

The coproduct is theoretically simpler and more fundamental.

This is now one of those structures that is so abstract that you can figure out how to process it just by writing the only function of the proper type.

But wait, how do I run this thing? It has IO and Memoized layers all mixed up! Intuitively, I’m sure you believe that if I start with an empty memo table and start running an IO that has some memoized bits in it, I can thread the memo table throughout.

In classic Haskell style, we can separate the "threading" concern from the "running" by writing an untangling function of type Plus m1 m2 a -> m1 (m2 a). But in fact, we don’t even need to do that much work. Discussed but not hacked up in the Luth-Ghani paper is the idea of a distributivity law, which in hacking terms means a function that just does one bit of the untangling, specifically a single "untwist" forall a. m2 (m1 a) -> (m1 (m2 a)). If we can write an untwist function, then a fold over the monad coproduct does the rest of the untangling.

This function essentially corresponds to the MonadIO instance of the StateT monad transformer. More generally, Lüth and Ghani show that when you can write one of these distributivity laws, then using the coproduct is isomorphic to using monad transformers, so I already knew this part would work out.

Another way to convince yourself that your function is correct is to think: how many functions even have the necessary type? Not very many, since you need the higher-rank type for the parameter for this guy to even type check! When dealing with very abstract functions, you often gain enough via parametricity to make up for the loss in intuitive clarity.

I have a vague feeling that a real debugging package could be made from this approach, but even if not, at least today was some fun.

CTL Model Checking in Haskell: A Classic Algorithm Explained as Memoization
(7 May 2008, https://functionallens.wordpress.com/2008/05/07/ctl-model-checking-in-haskell-a-classic-algorithm-explained-as-memoization/)

As an exercise, since my reading group was discussing model checking this week, I implemented the classic model checker for CTL specifications from the 1986 paper

which actually eliminated an auxiliary function from the algorithm, and made the Haskell specification of the meaning of CTL connectives clearer than my English prose! (But I'll still explain it in English.)

Here is the easy example from the memoization paper, before getting on to model checking.

To enable functional mixins, we write our functions using open recursion. Instead of a function of type a -> b we write one of type Gen (a -> b) and then later “tie the knot” with fix (reproduced here for reference)

> type Gen a = (a -> a)

> fix :: Gen a -> a
> fix f = f (fix f)

The classic example they start with is fibonacci, but don’t stop reading! It is just to illustrate the technique.

> fib :: Int -> Int
> fib 0 = 0
> fib 1 = 1
> fib (n+2) = fib n + fib (n+1)

By the time you get to fib 30 it takes a dozen or so seconds to return on my poor old computer. Rewritten in open recursion it looks like the following.
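
(The original open-recursive definition is not preserved in this copy; here is a minimal reconstruction, where the name gFib is my guess.)

> gFib :: Gen (Int -> Int)
> gFib self 0 = 0
> gFib self 1 = 1
> gFib self n = self (n - 2) + self (n - 1)

Tying the knot with fix gFib recovers the original fib, and mixins such as memoization can now intercept the recursive calls that go through self.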

The language for which Clarke et al give a graph-based algorithm is called CTL (Computation Tree Logic). It is a logic for specifying certain restricted kinds of predicates over the states of a state machine, for today a finite state machine.
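
The post's CTL data type is not preserved here; the primitive connectives used by Clarke et al would give a datatype roughly like this (constructor names are mine; the remaining CTL operators are derivable from these):

> data CTL prop
>   = Atom prop
>   | Not (CTL prop)
>   | And (CTL prop) (CTL prop)
>   | EX  (CTL prop)             -- some successor state satisfies it
>   | EU  (CTL prop) (CTL prop)  -- along some path, the first holds until the second
>   | AU  (CTL prop) (CTL prop)  -- along every path, the first holds until the second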

In the Clarke et al paper, the algorithm is expressed by induction on the formula f you want to check: first, label your state-space graph with all the atomic formulas that hold at each state. Then label each state with the compound formulas of height two that hold. Etc., etc.; at each step you are guaranteed that the graph is already labelled with the subformulas.

Like dynamic programming, this is simply a complicated way of expressing memoization. In fact, they even use a depth-first search helper function that is completely eliminated by expressing it as a memoized function. This code is considerably shorter and, I think, clearer than the pseudocode in the paper.

Today we have fancy algorithms involving BDDs and abstraction, so I’m not claiming anything useful except pedagogically. I do wonder, though, if this code gains something through laziness. It certainly traverses the state space fewer times (but I’m sure an implementation of their algorithm would do similar optimizations).

One reason is simply that I need a curry/uncurry wrapper for my two-argument monadified function. The deeper thing is that cyclicMemoize2 False inserts a fake memoization entry while a computation is in progress. If there is ever a "back edge" in the search, it will return this dummy entry. For CTL, the auxiliary depth-first search used in the paper for AllUntil returns False in these cases, so I seed the memo table accordingly. By the time you have recursed around a cycle, the f2 you are searching for did not occur on the cycle, so it never will.

To play with it, I’ve only made a couple of examples involving stop lights (of occasionally curious colors). I’d love more, and you’ll undoubtedly find bugs if you actually run something significant.

Drawing fractals in Haskell with a cursor graphics DSEL and a cute list representation
(16 Apr 2008, https://functionallens.wordpress.com/2008/04/16/drawing-fractals-in-haskell-with-a-cursor-graphics-dsel-and-a-cute-list-representation/)

I'm reading the very fun Measure, Topology, and Fractal Geometry by GA Edgar, and thought I'd hack up some of the examples in Haskell. So this post implements cursor graphics in OpenGL in (I think) DSEL style, demonstrating the StateT and Writer monad gadgets from the standard library and a cool "novel representation of lists" due to R Hughes. On the fractal side, I'll try to convince you that fractals are not just cute pictures, but extremely important illustrations that the real numbers are weird. As usual, you can save this post to Fractals.lhs and compile it with ghc --make Fractals

It seems that a couple of people have gone before me making actually useful fractal packages (the packages are more specifically for "Iterated Function Systems" and "L-systems", respectively) or prettier pictures in their blog posts.

The state of the cursor is a position, direction, and whether the ink is activated (so I can move it about without drawing lines everywhere). The neutral state is at the origin, pointed due east, with the ink on.
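
The actual declaration is missing from this copy; a sketch of what it might look like (the field names are my guesses, and the post itself uses a Point2 type for positions rather than a plain pair):

> data CursorState = CursorState
>   { position  :: (Double, Double)
>   , direction :: Double            -- angle in radians; 0 means due east
>   , inkOn     :: Bool
>   }
>
> neutral :: CursorState
> neutral = CursorState { position = (0, 0), direction = 0, inkOn = True }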

A first approach to the semantics is that a sequence of commands should have the state threaded through it. The type of a program would be State CursorState () (the unit is because there is no return value). I would then get the final state of a program, starting from the neutral state, with execState program neutral.

But I don’t actually care about the final state; I want to evaluate this program only for its side effects: whenever I run a forward command, it should leave a line segment if ink was enabled. This situation is just what the Writer monad is for. When I move from s1 to s2, I call tell [(s1,s2)] in order to "log" this line segment.

I actually need to carry the state and the log around, so how do I combine these monads? Well, there’s a huge trail of literature to follow on that! If you are interested, Composing Monads Using Coproducts by C Lüth, N Ghani has a canonical way, and lots of references. But for today, the officially sanctioned approach is to use a monad transformer; in many practical cases this coincides with the coproduct.

So a first attempt at the type of a cursor program would be:

type CursorProgram = StateT CursorState (Writer [(Point2,Point2)]) ()

What is all this? Well, CursorState is the state I want to pass around, and Writer [(Point2, Point2)] is the internal monad. The type is large, but it says a lot! I have a state that is getting passed along, and a log that is being kept. The only thing I have to watch for is to use lift . tell instead of tell because I need to apply it to the inner Writer.

But you shouldn’t use list append for a log in real life. In the above hypothetical definition, the log is a list, so every time computations are combined with >>= the writer monad will invoke a potentially-costly list append operation. The log will always grow from its tail, so I can build the list backwards and it would be efficient, but there is a cooler trick (actually already available as the dlist library, named for "difference lists"), from this paper:

A novel representation of lists and its application to the function "reverse". RJM Hughes. Information Processing Letters. 1986

In a nutshell: lists and partial applications of (++) are in bijection, so I can swap them. Here’s the definition and bijection.
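
The definitions are missing from this copy; the standard difference-list construction looks like this (toDList and fromDList witness the bijection; the real dlist library differs in names and details):

> newtype DList a = DList ([a] -> [a])
>
> toDList :: [a] -> DList a
> toDList xs = DList (xs ++)
>
> fromDList :: DList a -> [a]
> fromDList (DList f) = f []
>
> instance Semigroup (DList a) where
>   DList f <> DList g = DList (f . g)
>
> instance Monoid (DList a) where
>   mempty = DList id

Appending is now just function composition, so each append is O(1) and the full list is only materialized once, by fromDList.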

Notice how if I append a bunch of singletons, it is the same number of applications of : as if I had built the list backwards. Then when I recover the list it costs O(n), the same as efficient reversal, so the two are equally good strategies in this case. It would be best to make a newtype for backwards lists with its own monoid instance anyhow, so the programming overhead is also the same.

Now I just wrap all the instructions to operate on this more-complicated state, adding logging to forward.

Viewing that won’t be very interesting; it is just an excuse to talk about it. But Wikipedia has a nice image:

Take the segment [0, 1] and remove the center third of it, keeping the endpoints intact. Now remove the center third of each of those segments, and again, and again. Taking the intersection of all of these sets (i.e. the limit) gives the Cantor Set.

So what does it look like? Well, it isn’t empty, since every point that is ever an endpoint sticks around forever. But those aren’t the only points: Convince yourself that is in the set. I’m pretty sure this can be phrased as a coinductive proof ([link to metric coinduction]).

The classic way of understanding the Cantor Set is to use ternary digits. See if you can convince yourself that the Cantor Set contains every real number that doesn't require a 1 in its ternary expansion (hint: in ternary, 0.0222222… = 0.1, so 0.1 doesn't require a 1 in its ternary expansion).

So any number made of a possibly-infinite string of 0s and 2s is in there. Sound familiar? Well, if we use "1" instead of "2" then we are talking about all possibly-infinite binary strings, which a programmer should intuitively see is all real numbers in [0, 1]. So the Cantor Set is, in fact, uncountable!

This one, the dragon curve, is what you get if you just keep folding a piece of paper in half in the same direction, then unfold it and set every fold to a right angle. Rather than recite facts from Wikipedia, I'll highly recommend following the link as it is an article of rare quality. In fact, all of the articles about these curves were so unexpectedly satisfying that I ended up not feeling the need to write much.

What all of the above fractal curves except the Koch Snowflake have in common is self-similarity. The Cantor Set is essentially identical to each of its left and right hand sides, i.e. it is identical to the union of two scaled-down copies of itself, as the cursor-graphics code makes obvious. I said I wouldn't talk about it, so I'll just mention that if you write this as C = f1(C) ∪ f2(C) then f1 and f2 are the functions in "iterated function systems". I highly recommend a googling, or better yet, the book which prompted this post.

Below is the actual nitty-gritty OpenGL, Gtk, and IO code that plugs it together.

Using OpenGL's blending to visualize congestion in convex routing (in Haskell)
(23 Mar 2008, https://functionallens.wordpress.com/2008/03/23/using-opengls-blending-to-visualize-congestion-in-convex-routing-in-haskell/)

This is a question posed in my randomized algorithms class. If you are routing in a network whose connectivity looks "more or less" like a convex figure, what does the congestion look like? A quick way to make an educated guess is to draw a bunch of random line segments in such a convex shape and see where the colors get the brightest:

Congestion

This post is literate Haskell that will output that image, so save it to something like Congestion.lhs and run ghc --make Congestion.lhs. I started with the code from an old post and cut out the bits I didn’t need. The libraries used can be found here:

There are a bunch of Gtk and OpenGL calls to add, yielding drawSegments below. The only thing to note is the setting of blendFunc and blendAdd which tell OpenGL to add the color that is already on a pixel to the color I’m trying to draw. It is really a cheap trick to get OpenGL to intersect my line segments and add up the totals for me. One gotcha that held me up is that these settings have to be within the glDrawableGLBegin and glDrawableGLEnd calls.

Infinite lazy Knuth-Bendix completion for monoids in Haskell
(20 Dec 2007, https://functionallens.wordpress.com/2007/12/20/infinite-lazy-knuth-bendix-completion-for-monoids-in-haskell/)

The Knuth-Bendix completion procedure (when it succeeds) transforms a collection of equations into a confluent, terminating rewrite system. Sometimes the procedure fails, and sometimes it does not terminate, but The Handbook of Computational Group Theory by D Holt remarked that even in this case it generates an infinite set of rewrite rules that is complete, and An Introduction to Knuth-Bendix Completion by AJJ Dick also mentions that in the nonterminating case one can derive a semi-decision procedure. I naturally had to hack this up in Haskell, to create an infinite set of rewrite rules as a lazy list. This illustrates the very real software engineering benefit of decoupling creation and consumption of infinite data. As usual, this post is a valid literate Haskell file, so save it to something like KnuthBendix.lhs and compile with ghc --make KnuthBendix or load it up with ghci KnuthBendix.lhs

To give a little background, something I realize I’ve neglected in the past, Knuth-Bendix completion is a technique in universal algebra, which is essentially the study of unityped syntax trees for operator/variable/constant expression languages, like these:

> data Term op a = Operator op [Term op a]
>                | Variable String
>                | Constant a

Your usual algebraic structures are for the most part special cases in universal algebra – anything that has an ambient set with some bunch of operators and equational axioms qualifies, and universal algebra supplies the variables to represent unspecified quantities.

For example, a monoid is a set S with an operator * and a special constant e obeying these axioms, where x, y, z are variables that can be replaced by any term: e * x = x, x * e = x, and (x * y) * z = x * (y * z).

Aside: Consider the obvious way to axiomatize a group in this framework. I think it is a nice example of the interaction of constructive logic and computation.

But anyhow today I’m not going to use this structure because I can explain and explore Knuth-Bendix more quickly by sticking to monoids. The full completion procedure, and its modern enhancements, works on terms with variables and uses unification where I use equality, and superposition where I use string matching. In the case of a monoid, the associative law lets me simplify the term structure from a tree to just a list, and since I’m not including variables, I deal just with words over my alphabet a:

> type Word a = [a]

A presentation is just a formalism as above, specifying the ambient set X (here, the type parameter a), and some equalities R called relations, written in mathematical notation as

⟨X ∣ R⟩

and in Haskell

> type Relation a = (Word a, Word a)
> type Presentation a = [Relation a]

For an easy example of a monoid and its presentation, Bool forms a monoid using the && operator which has identity True. Here is a presentation for the monoid in each notation (in general, presentations are not unique, and there’s a whole theory of messing about with them, which is exactly what we are about to do!)
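
The Haskell version of this presentation is not preserved in this copy; one possible encoding, where 't' stands for True, 'f' for False, and the name andMonoid is mine:

> andMonoid :: Presentation Char
> andMonoid = [ ("tt", "t")
>             , ("tf", "f")
>             , ("ft", "f")
>             , ("ff", "f")
>             , ("t" , "" )   -- True is the identity, i.e. the empty word
>             ]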

In this case, the equations are just the definition for &&. Another monoid you’ve certainly seen as a programmer is the free monoid over X, which looks like this:

⟨X ∣ ⟩

> freeMonoid = []

In other words, it is just lists of elements of X since there are no rules for manipulating the words. The List monad is intimately related to this monoid.

Another good example is the following – see if you can figure out what it represents before going on.

⟨x ∣ xⁿ = e⟩

Yes, it is a presentation for the monoid (in fact, group) Zₙ, the integers mod n. You are intended to interpret the group operation as addition modulo n, x as 1, and e as the identity 0, hence xⁿ is really n. Of course, the abstractness of the presentation meshes well with this group's other name, the "cyclic group of order n".

So, formalism is great fun and all, but sane folk prefer for computers to deal with it if possible. Our goal is to decide whether two words are equal according to the identities in the presentation. In general, this is undecidable, see

The basic thing we are going to do is interpret equalities as rewriting rules. Everywhere the left-hand side appears, we insert the right-hand side. This function rewrites just once, if it finds an opportunity. I’m not going to even try for efficiency, since I get such a kick out of writing these pithy little Haskell functions.
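
The original single-step rewriter is missing from this copy; a sketch in the same pithy spirit (the name rewriteOnce is mine):

> rewriteOnce :: Eq a => Presentation a -> Word a -> Maybe (Word a)
> rewriteOnce rules w = go w
>   where
>     go [] = Nothing
>     go xs@(x : rest) =
>       case [ rhs ++ drop (length lhs) xs
>            | (lhs, rhs) <- rules
>            , lhs == take (length lhs) xs ] of
>         (w' : _) -> Just w'
>         []       -> fmap (x :) (go rest)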

This function fully reduces a word, assuming the presentation has “good” properties like always making words smaller according to a well-founded ordering. To make sure of this, we can orient any relation according to such an ordering.
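
In terms of the rewriteOnce sketch above, full reduction is one line (reduce is the name used later in the post; this body is my reconstruction):

> reduce :: Eq a => Presentation a -> Word a -> Word a
> reduce rules w = maybe w (reduce rules) (rewriteOnce rules w)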

But the results may be provably equal even though their normal forms are not, for example if I have this monoid

⟨x, y, z ∣ xy = yx = z⟩

> xyzExample = [ ("yx", "z")
> , ("xy", "z") ]

I know that "xz" == "zx" but I cannot obviously prove it

*Main> reduce xyzExample “xz”
“xz”
*Main> reduce xyzExample “zx”
“zx”

because the proof goes through xz = xyx = zx. Knuth calls these “critical pairs” and they, in some sense, represent the point in a proof where someone had to be more clever than just cranking on the rules. But just to preview how completion works,

Now on to the Knuth-Bendix procedure. The primary idea is to find all the possible critical pairs as described above, and to make new rewrite rules so they aren’t critical anymore. In other terminology, we look for exceptions to local confluence, and patch them up.

First, partitions is a list of ways to split a word into two nonempty parts.
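
The definition is not preserved here; it is presumably something like this (partitions is the post's name, the body is my guess):

> partitions :: Word a -> [(Word a, Word a)]
> partitions w = [ splitAt i w | i <- [1 .. length w - 1] ]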

Then criticalPairs takes all the superpositions (x,y,z) where xy is reducible by one relation, and yz is reducible by the second, and returns the result of the aforementioned reductions. The last function, allCriticalPairs just filters these for inequivalent pairs.
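
The original code is missing from this copy; for two relations, the superpositions described above can be computed roughly like this (criticalPairs is the post's name, the body is my reconstruction):

> criticalPairs :: Eq a => Relation a -> Relation a -> [(Word a, Word a)]
> criticalPairs (l1, r1) (l2, r2) =
>   [ (r1 ++ z, x ++ r2)                -- two ways to rewrite the word x ++ y ++ z
>   | (x, y) <- partitions l1           -- l1 = x ++ y
>   , y == take (length y) l2           -- y is also a prefix of l2 ...
>   , let z = drop (length y) l2        -- ... so l2 = y ++ z
>   ]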

Just to save some redundant modification, we’ll assume the input presentation is reduced and oriented, and maintain that invariant ourselves with the help of this function. (Note how we run into the annoying aspect of Haskell where instances are global – so you can’t override the ordering on lists; there are good theoretical reasons for this annoyance but it still sucks!).

Then completion is pretty simple – just add the first non-reducible critical pair until there are no more.

But this version of completion simplifies the presentation at every step, as per the descriptions of the algorithm I've seen – I obviously can't do that if the result is infinite. The best I can think of is to track the finite number of relations I've already processed, and reduce each relation as I consider it according to those. And since I bet the order in which generated rewrite rules are visited matters, I use the interleave function to make sure that all rewrite rules are eventually hit.
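
The interleave function itself is not preserved in this copy; the usual definition is:

> interleave :: [a] -> [a] -> [a]
> interleave []       ys = ys
> interleave (x : xs) ys = x : interleave ys xs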

The above function does not give the same completions as the finite version, but it does seem to work. Here is a QuickCheck property to test it. Since interesting presentations are probably hard to autogenerate, I just specialize it to the xyzExample.

Calculating the reflect-rotate-translate normal form for an isometry of the plane in Haskell, and verifying it with QuickCheck
(3 Dec 2007, https://functionallens.wordpress.com/2007/12/03/calculating-the-reflect-rotate-translate-normal-form-for-an-isometry-of-the-plane-in-haskell-and-verifying-it-with-quickcheck/)

Any isometry of the plane has a unique normal form as the composition of a translation, rotation and reflection. This note computes this normal form and tests the implementation using the QuickCheck automated testing tool for Haskell. To generate random test data, I use another characterization of isometries as products of up to three reflections. This post is a valid literate Haskell file, so save it to something like Isometries.lhs and run ghc --make Isometries. Then check it with quickCheck +names Isometries.lhs.

Two aspects of this post are given about equal weight:

The mathematical content is elementary and can be understood by anyone familiar with basic trigonometry, as you might learn in high school. It is inspired by the book Symmetries by DL Johnson, one of the very excellent Springer Undergraduate Mathematics Series.

The tool QuickCheck is a fairly brilliant and easy-to-use automatic testing library for Haskell. I use it to verify each step of the post. All but the first of my QuickCheck properties found real errors!

> module Main where
> import Test.QuickCheck

Now, the reflect-rotate-translate normal form is defined relative to a point P (the center of rotation) and a line L (of reflection). Concisely: f = t s r where t is a translation, s is a rotation about P, and r is a reflection about L (allowing the identity to be considered a reflection).

We can now express the normalForm function. As input, it takes an arbitrary "black-box" isometry as a Haskell function (the type doesn’t enforce that the function is actually an isometry, of course). As each component of the normal form is computed, the inverse of that component is applied before calculating the next component.

The rest of this post is writing and specifying the translation, rotation, and reflection helper functions. As an example, I’ve created this isometry using GeoGebra. I will maintain the convention that the source objects are blue and the output of a transformation is red.

Since reflections and rotations fix the origin, the translation is just wherever the origin gets sent.
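
The original helper is not preserved in this copy; the idea is a one-liner (the type aliases and the name translationPart are mine):

> type Pt = (Double, Double)
> type Isometry = Pt -> Pt
>
> -- The translation component is wherever the isometry sends the origin.
> translationPart :: Isometry -> Pt
> translationPart f = f (0, 0)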

On translations, this should be the identity, and we express that fact with the first of these QuickCheck properties. The second indicates that for an arbitrary isometry f = t s r, composing with the translation's inverse should fix the origin, because s and r leave the origin where it is: t^{-1} f = t^{-1} t s r = s r. Or in pictures:

The operator =~= is an "approximate" equality operator for floating point numbers.

To test this function, we use extensional equality on rotation functions rather than intensional equality on the angle, since rotations do not have a unique representation (our function returns a canonical representation between 0 and 2π). As inverting the translation component of an isometry fixes the origin, inverting this rotation should fix the point (1,0) and by implication the entire X axis. In pictures:
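
The post's own code is missing here; the rotation component of an origin-fixing isometry g can be read off from where g sends (1,0) (rotationAngle is my name, and it reuses the Isometry alias sketched above; atan2 gives an angle in (-π, π], which the post then canonicalizes to [0, 2π)):

> rotationAngle :: Isometry -> Double
> rotationAngle g = let (x, y) = g (1, 0) in atan2 y x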

And we are done! To test, though, we need to tell QuickCheck how to generate isometries. I could reuse the basic isometries, but code duplication is desirable for consistency checking, so I’ll use another mathematical property to generate random isometries: they are all the composition of three reflections, which may each be the identity, of course.

Reflecting about an arbitrary line is pretty easy: translate so the line passes through the origin, rotate the line onto the horizontal axis, then reflect (sound familiar?). You can read more at Planet Math if you like, or figure out the formulae yourself with some high school trigonometry, or just let the computer compose the functions for you. Because I want to decouple my specifications and implementation, I worked out the formulae directly.

And normalForm should also be the identity on normal forms, to check that I’ve written apply correctly. A lot of these properties overlap so they fail together, but it doesn’t hurt to have a lot of properties.

The QuickCheck page has a script to run your tests in hugs, but I had to edit it somewhat to run it on my machine. In case you don’t want to do that, this file can just be compiled and run. Either way you run the checks, then you should see something like this:

Visualizing 2D convex hull using Gtk and OpenGL in Haskell
(20 Nov 2007, https://functionallens.wordpress.com/2007/11/20/visualizing-2d-convex-hull-using-gtk-and-opengl-in-haskell/)

This note shows how to use OpenGL with Gtk in Haskell. The result is a little visualization to check our implementation of the classic iterative convex hull algorithm.

This post is a valid literate Haskell file so save it to something like ConvexHull.lhs and compile with ghc --make ConvexHull. What you see above is what you’ll get when you run `./ConvexHull`

The best OpenGL tutorial for Haskell that I’ve found is this one from Michi’s blog, using GLUT to interface with X. For this tutorial we are going to use the Gtk GLDrawingArea widget, to illustrate the differences, which can be rather hard to find in the documentation.

Now, Haskell’s OpenGL binding has some quirks with regards to numeric overloading, so it helps to define some type aliases. Since I want to take cross products I’ll work in three dimensions, and define some basic operations on my points. The OpenGL binding has separate types for points and vectors, but I’m going to abuse the point type to represent both.

Now for the quirks with using Gtk for OpenGL – there are many more setup calls to make. First, you need to explicitly grab a graphics context (glContext) and GL drawing window (glWin). Then, we manage the viewport manually to scale our rendering up to fill the window. Finally, there are Gtk calls to start and end OpenGL rendering calls. It took me a while to discover them.

I use the terminology “draw” to refer to Gtk drawing code, which tends to be bookkeeping, while I use “render” to refer to sequences of OpenGL calls. Here is the code to actually render the points and their convex hull. Note the color3f specialization, to help the type inferencer.

This is an iterative algorithm that computes the upper half-hull by travelling left-to-right across the plane making sure to always make right turns; if ever a left turn occurs, it backtracks as far as necessary, patching up the hull. I defer the obvious helper isLeftOf to the end of the file.

There is a divide-and-conquer algorithm which is probably more idiomatic, and has the same asymptotic complexity (different pathological cases) but this is the one I was trying out.

This last helper function only makes sense when points are all on the z=0 plane. It takes a point and a directed line segment, and indicates whether the point lies to the left of the line defined by that segment.
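
The post's own code (which works with 3D points and cross products) is not preserved in this copy; here is a 2D sketch of the same left-to-right scan and the left-of test, where all names are mine except isLeftOf and the points are assumed pre-sorted by x:

> type P2 = (Double, Double)
>
> -- True when p lies to the left of the directed segment from a to b (2D cross product test).
> isLeftOf :: P2 -> (P2, P2) -> Bool
> isLeftOf (px, py) ((ax, ay), (bx, by)) = (bx - ax) * (py - ay) - (by - ay) * (px - ax) > 0
>
> -- Scan points left to right, keeping only right turns; on a left turn,
> -- backtrack as far as necessary, patching up the hull.
> upperHull :: [P2] -> [P2]
> upperHull = reverse . foldl step []
>   where
>     step (b : a : rest) p
>       | p `isLeftOf` (a, b) = step (a : rest) p
>     step hull p = p : hull

For example, upperHull [(0,0), (1,1), (2,0)] keeps all three points, while upperHull [(0,0), (1,-1), (2,0)] drops the dip at (1,-1).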