Monday, April 29, 2013

I know I posted earlier, but this has to come out of my head. And I don't think I'll have time to do much more writing this week, despite the fact that there are at least three other things on my mind that I need to idea peel.

So...

That's what I'm thinking.

And here's an attempt at forcing this primitive tool called "English" to explain what it means in my head.

First off, this is what I was thinking previously. Which is to say, not much in particular; just working my way up to meaningfully tearing into the problem. The basic problem is that there are lots of ways to dissect a square into smaller, integer-sized squares, and brute-forcing it isn't going to be viable for very high ns. In fact, from what I've observed, it took multiple days of compute time to solve for 16, and that's not encouraging.

Thing is, there are lots of little ways to cut out a lot of the brute-force work required for the calculation. An approach that starts from the universe of possibilities and filters isn't going to get very far, but we can constrain that universe pretty significantly if we pick our model carefully, and I think I have.

Characteristics of the Optimal Dissection Model

Cut at the base

A huge contribution to the final tally of work is going to be figuring out where 1x1 squares can fit. If you look at each of them as a Placement, you'll be overwhelmed pretty quickly. An easy way out of that is picking a model that lets you ignore 1x1 squares, and the easiest obvious way to do that is using a sparse array with the understanding that any unrecorded placements actually represent 1x1 squares. This lets you ignore a bunch of StartingPoints too (more on that in a moment).

Take symmetry into account

A second huge contribution to the bottom line of work for this problem is that you need to recompute a lot of dissections which you must then de-duplicate later. I don't think we can lick this one entirely; we'll still need to do Set insertions at some point in the process just to make sure we're not counting anything twice. But. We can cut out a lot of them. Specifically, since the problem already mentions rotations and reflections, we don't need to Place any squares that we know will be picked up by some other permutations' reflections or rotations. That can greatly reduce the number potential starting nodes these placement trees. It also means we'll want a representation where rotate and reflect are really, really fast operations. The sparse array approach seems to be better than using an explicit grid, but I'm not quite convinced it's optimal.

Bring it all together

So what we've got as a 10000-foot-view for the process is this:

find all StartingPoints

unfold each starting point, starting with squares of size nxn and working your way down to 2x2

record the null dissection (a grid full of 1x1 squares)

return the size of the set of recorded dissections

A StartingPoint is any point on our grid from '(0 0) to `(,(limit n) ,(limit n)).

The limit of n is (- (ceiling (/ n 2)) 1)

To unfold a StartingPoint, insert the solitary placement as a dissection (remember, any empty spaces are treated as "placed 1x1 squares"), then find all freeSecondPoints and unfold them.

A SecondPoint is any freeStartingPoint, or any point between `(,(limit n) ,y) and `(,(- n 2) ,y). Again, any others should be represented among reflections/rotateions of other dissections.

To unfold a SecondPoint, insert it as a dissection, then unfold all remaining free points on the grid (there's probably a way to cut this step down, but I can't see it yet).

To unfold any other point, insert it as a dissection, and unfold all remaining free points until there's no more room.

To insert a dissection, perform Set insertion on each of its rotateions and reflections into the set of all insertions for this particular grid.

A free point is one where there's enough room to put a square larger than 1x1 (this implies that place could probably remove some additional squares from the potential starting pool to make calculating this easier).

Granted, that's easier said than done. I get the feeling that when I actually go to implement this fully, a whole bunch of problems are going to crop up, but at least I have a half-way decent starting point.

Granted the second set looks more complicated, except for the Haskell line, but it has some advantages. Firstly, while the first bunch of lists is bounded, this second bunch is infinite, and you do sometimes want to express that. More to the point though, the reason these can be infinite is that they're lazy. They only compute as much of the remainder of the sequence as you actually ask for, which means they save you space in two ways

they don't keep the entire list in memory by default; they deal with only one element at a time and any used ones are garbage collected unless you decide to keep a pointer to them yourself

they never bother computing parts of the list you don't call for, so they don't waste space storing values that'll never actually get used somewhere

That's basic stuff, it should be pretty obvious. Less obvious, but more significant, is how this saves you time.

But what about when we compose multiple operations on one sequence? You'll recall that back when I sat down to optimize Life for a couple of days, I made a big deal of reducing the number of iterations of our corpus. Specifically, recall that the Haskell version of Gridless Life started out like this:

which means six total trips over the entire list. That's bad because every element you touch adds to your run time, and each element you have to touch again as part of the computation is one that you can't throw out of memory. On the other hand, expressing a computation by composing smaller, easy computations is very straightforward[2]. This is a place where the code you want to run, and the code you want to write are completely different.

What you want to run is a tight loop that walks over the entire corpus once, and applies all of the chained functions at once per element. What you want to write is the naive composition of those chained functions, because you've otherwise created an algorithm that will only ever be useful for your particular computation, and that computation will be burdened with the specifics of iteration which will make it non-trivial at best and impenetrable at worst.

Now, granted, Common Lisp has other ways of dealing of dealing with this[3], but laziness is one general solution. If each of those functions were lazy (as, in fact, they are in Haskell), what you'd get instead is exactly what you want. One, tight loop running over the entire corpus, applying only as much grouping, sorting, concating and neighborsing as it needed to for the next part of the computation. That saves you a few trips over the input with no additional effort on your part. Talk about low-hanging fruit.

I'll be honest, I was also going to talk about my latest thoughts on the square dissection problem, but this ended up being longer than I'd like as it is. It'll probably happen next time, I guess. Probably. Stay tuned if you're into that sort of thing.

Footnotes

1 - [back] - A note on functional, lazy sorts, because I was wondering about this both back when tuning the Haskell version of Life and as I was re-visiting it for today's feature. The way that lazy sorts seem to work is basically by using a destructured heapsort. Specifically, if you take a look at this pseudocode, what's happening is that a lazy sort runs heapify right away and passes up the first element, then pulls out the next element each time it's asked for one. That results in On performance for finding the first element (which as far as I'm aware is what you'd have to do in order to get the "nextest" element in the general case anyway), followed by O(log n) performance on looking up each next element. That's good because it means you don't have to do it all at once, and it means that if you only want the first 5 sorted elements of a list of 10000, you get to avoid most of the work. On the other hand, note that this doesn't save you much memory, if any; you still need to store that heap for the whole list, even if you only want the first few chunklets.

2 - [back] - Which is why I did this for the first pass of that program, not realizing that Haskell's lazy-by-default outlook would also make it about as efficient as it could be.

3 - [back] - In fact, the situation of "I want to write the pretty code, but run the ugly code" should send any Lisp veterans off thinking about how you'd defmacro your way out of this particular annoyance.

Tuesday, April 23, 2013

Ok, so I've been coding something for the past little while in between thinking about squares and reading up on concurrency in haskell. I'm not going to tell you what it is, because we've established my track record, but I did want to note that it looks like it's going to rely on Conduits. A minimal module that provides some interesting additional options for defpackage.

It doesn't have a quicklisp installation package yet, so I've written a basic .asd and chucked it up onto my github in the meantime.

More Square Thoughts

I haven't been actively working on this, just sort of letting it percolate at the back of my mind while other stuff is going on. Having taken a closer look at the original solution for 1..4, and imagining how its author went about getting it, it looks like we might be able to cut a lot of placements out of the process if we picked a representation that was easy to reflect and rotate. Then the actual solution would look something like

Assuming this works, we could easily write a solution to the "hard" version too; just call insert-dissection directly rather than going through collect-dissection and friends.

I'm carefully refraining from defining dissection-count, insert-dissection, grid, and indeed rotate and reflect. What I know is

insert-dissection is going to have to do the work of ensuring that duplicate dissections are discarded, which implies storing them as a set or hash

the definitions of rotate and reflect will depend heavily on the definition of grid, and square storage needs to be thought out to make sure they're all fast

it seems like there should be a better way to solve this problem than "brute force", but I'm not seeing it yet. I've joked about it before, but this may actually be the problem that gets me off my ass and into seriously learning about genetic algorithms

That's that for now. I was going to talk a bit about how work has been going, and the shape of small-scale development in Toronto's medical industry, but it looks like a client has finally decided to respond to me, so back to the grindstone I go.

I'm not going to endorse SATA3 for anyone, because it's honestly not a mind-blowing change. I mean load times are noticeably faster, but the difference isn't anything like what you get from switching to a SATA2 SSD from a traditional platter drive. If you end up trading up to this from one of the slower SSDs, be prepared to hear yourself say "Oh, that's nice" rather than "This is fucking glorious". Honestly, my purchase was a result of the fact that this sale price came under my may-as-well threshold for hardware and that my previous HD was literally held in by rubber stoppers and electrical tape.

Dissecting Squares

Suppose you have a grid of squares nxn large. There is a finite number of ways you can split that square grid up into squares of at least 1x1 size. For example, for a 1x1 grid, there is only 1 possible dissection. For a 2x2 grid, there are 2 (either 4 1x1 squares or 1 2x2 square). For a 3x3 grid there are 6; (either 9 1x1 squares, 1 3x3 square, or 4 separate combinations of 1 2x2 square and 5 1x1 squares). This is the Number of Square Dissections.

"Easy" Mode: Write a function that takes a number N and returns the Number of Square Dissections of an N by N grid.

"Hard" Mode: Write a second function with the same input as above that returns the number of unique, under rotation and reflection, dissections.

Before you chuckle at this, bear in mind that a general solution is publishable[1]. According to Dann, the sequence is solved up to an N of 15 or so, and that's already some ungodly number best expressed by exponents.

A couple of people took a stab at working "Hard" mode solutions[2], but as far as I know, no one managed to get to a working solution for "Easy" mode. This is the first Code Retreat I've gone to where I've felt deeply inadequate as a programmer, and that the most popular tool "a notebook" rather than a particular programming language.

As to the problem, I tried a bunch of approaches, none of which seemed to do anything other than annoy me. In terms of problem-solving, I'm usually a data-representation guy. Maybe it's the design degree making me focus on that, I don't fucking know, but it works fairly well. Given a thorny problem, my first reflex is to figure out a way of storing the data such that the mechanical components of the solution are going to be as simple as possible with no more moving parts than necessary. Which is why my train of thought as the problem was coming in was something like...

Can we do this as a naive area calculation? No, that actually fails on the 3x3 grid. We need to represent area and position, or we might represent the remaining area of the board in a more explicit way. Too complicated. Ok, we need to fit things into other things; is this actually a factoring problem? No, we're looking for the count of unique ways of factoring a thing rather than actual factorization. Doing it that way sounds like it would be computationally expensive. Can we model it as a tree of possible square placements? We'd need to represent each square as a width x y triple, at minimum, starting with (list (- n 2) 0 0) and working from there. But there are multiple interconnected combinations that can start with each 2x2 square on a 4x4 grid, and we can't actually prune a lot of sub-branches. Can we do this as a graph traversal? Sounds pretty go... wait, no. No, we wouldn't be visiting each node once. In fact, we can't know a-priori how many times a given node would have to be visited, or what the endpoints of any valid traversal are. Can we model this as an inventory system? Each grid has yae space and adding a square depletes it; when you get stuck, fill the rest with ones, each fill is a different dissection, count the results. Wait, isn't that going to be O^scary^fucking-scary? You need to generate all possible inventory combinations and de-duplicate. I guess you could keep a set of dissections somewhere and keep a canonical sorting order for placed squares. That still implies a representation that tracks position and size rather than just a size and a count. Is it possible that a 2D grid is actually a better representation for this one?-Inaimathi's brain

As it happens, sitting down and throwing together a simple program to test some of these didn't do anything to help. I'd start, get about a third of the way through, then realize that I wasn't accounting for this or that edge case. Or I'd figure out that edge-cases were covered, but the approach doesn't scale up to a general solution. The most promising I've got so far, and this is nowhere near a working solution yet, looks like

which is godawful, and doesn't work yet, but it isn't obvious that it can't work so I'm rolling with it. The (very) general high-level approach is starting from the largest possible sub-squares of a given grid and doing a depth-first placement traversal. Each successful placement will then be sorted and stored in a set, which will ensure uniqueness[3]. There's a few places where we can definitely prune possibilities, but not as many as I thought there would be, which means that this will ultimately be a fairly expensive operation in any case.

That's the most basic example I could find/cobble together of using SSEs in Wai. That's the library called Network.Wai.EventSource up there, and you can see the channel represented in the expressions involving newChan, eventSourceAppChan and writeChan. Basically, we set up a Chan[2] at server startup, we hand out an endpoint whenever someone requests /source, and we write to all endpoints whenever someone requests /post.

The file index.html is exactly what you think it is; about 10 lines each of HTML and JavaScript that set up the front-end EventSource hooks and make sure the chat list gets updated with each new message. You could write it yourself without very much trouble.

This isn't particularly interesting. Firstly because, as you can see, it's ridiculously simple, and secondly because it doesn't scale. I mean it scales with users, sure. According to the Warp benchmarks, we can expect this to support somewhere between 20k and 50k people chatting depending on their loquaciousness, but since they'll all be chatting anonymously in the same room, the experience will stop being useful well before that. The next step confounded me for a little while because I had the assumption that using state in Haskell meant using the State monad[3]. It turns out that's probably not what you'd want here.

What we're after is a system where you can start up arbitrary new rooms, and post to a specific one. In other words, something like

The significant operations there all involve something called an IORef, which is Haskell-talk for "a pointer". You can think of it an IO-based global variable that you can store stuff in[4], in this case, a map of channel names to channel streams.

That index.html file has a bunch of front-end changes too, mostly to do with acquiring and displaying multiple SSE sources, but we're not interested in that today. In the back-end, you'll notice that we've got a new function, getOrCreateChannel, which takes a "pointer" to our channel map and a name, and either returns the result of looking up that name, or inserts and returns a corresponding entry. readIORef "dereferences" that "pointer" to our map, and atomicModifyIORef mutates it. The rest of it should be self-explanatory.

Because we need to do a channel lookup before calling postMessage or eventSourceAppChan, our routes get a bit more complicated. We need to call getOrCreateChannel on the passed in channelName, then pass that to the appropriate function and return the response[5].

Finally, instead of passing a single channel to our app, we need to pass it a "pointer" to our lookup table. That happens in main at the bottom there.

The result of this exercise, as long as we put the front-end together appropriately, is a multi-room, anonymous, HTML chat system. More importantly though, this is a demonstration of how to handle simple global states in Haskell without tearing all your hair out.

I really wish someone else had written this before I started thinking about it...

Footnotes

1 - [back] - No, still not Yesod. Feel perfectly free to use it if that's your thing, but I'd still recommend Happstack if you absolutely, positively need a framework..

3 - [back] - Also, because I'm still not quite awesome enough at this that I can manipulate type expressions in my head. As a result, successful signature changes rarely happen first try, and I often find myself commenting them out then resorting to :t in GHCi and following the compilers' lead. I assume that's mechanical rather than a conceptual problem though, and talking about how I need more practice won't really help you out in any way.

4 - [back] - The IORef docs warn that using more than one of these in a program makes them unreliable in a multi-threaded setting. The thing is:

This chat program is extremely simple, needing only one global map to store open channels

If it ever got to the point of needing a more complex model, I'd hook it up to AcidState rather than trying to fiddle with MVars myself.

5 - [back] - You can see that happening in the branch labeled [channelName, action] ->, though we easily could have separated it into an external function rather than nesting cases.

Ruby and Erlang each come with their own modes, and recent Emacs versions ship with a built-in Python mode and shell. Smalltalk uses its own environment (though GNU Smalltalk does have its own mode), and I'd really rather not talk about PHP. If you're writing in it, chances are you're using Eclipse or an IDE anyway.