assorted ramblings by Jamis Buck

Maze Generation: Eller's Algorithm

29 December 2010. A clever technique is demonstrated for generating a random maze, one row at a time. (11-minute read)

Last time I talked about the recursive backtracker algorithm for maze generation. That’s probably always going to be my favorite algorithm for generating mazes, for a variety of reasons, but that’s not going to stop me from looking at others.

For one thing, there are some pretty crazy algorithms out there for generating mazes.

Eller’s algorithm is one of the craziest. It’s also one of the fastest. And it’s the only one I know that lets you generate mazes of an infinite size. In linear time.

Yeah, it’s that crazy.

It does this by building the maze one row at a time, using sets to keep track of which columns are ultimately connected. But it never needs to look at more than a single row, and when it finishes, it always produces a perfect maze.

Like I did for the recursive backtracking algorithm, here’s the “mile-high” overview of Eller’s algorithm:

Initialize the cells of the first row to each exist in their own set.

Now, randomly join adjacent cells, but only if they are not in the same set. When joining adjacent cells, merge the cells of both sets into a single set, indicating that all cells in both sets are now connected (there is a path that connects any two cells in the set).

For each set, randomly create vertical connections downward to the next row. Each remaining set must have at least one vertical connection. The cells in the next row thus connected must share the set of the cell above them.

Flesh out the next row by putting any remaining cells into their own sets.

Repeat until the last row is reached.

For the last row, join all adjacent cells that do not share a set, and omit the vertical connections, and you’re done!
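The steps above can be sketched in a few dozen lines of Ruby. This is only an illustration under some assumptions of my own: the name `ellers_maze`, the 50/50 coin flip for horizontal joins, and the choice to return the maze as a list of passages (pairs of `[row, col]` cells) are not from the original implementation. Each row is just an array of set ids, one per column.

```ruby
# Sketch of Eller's algorithm. Returns the maze as a list of passages,
# each passage being a pair of [row, col] cells that are joined.
# (Hypothetical helper; the names and return format are assumptions.)
def ellers_maze(width, height, rng = Random.new)
  passages = []
  next_id = 0

  # Step 1: every cell of the first row starts in its own set.
  row = Array.new(width) { next_id += 1 }

  height.times do |y|
    last_row = (y == height - 1)

    # Step 2: join adjacent cells, but only if they are in different sets.
    # On the last row every such pair is joined; otherwise it's a coin flip.
    (width - 1).times do |x|
      next if row[x] == row[x + 1]
      next unless last_row || rng.rand(2).zero?

      old_id = row[x + 1]
      new_id = row[x]
      row.map! { |id| id == old_id ? new_id : id } # merge the two sets
      passages << [[y, x], [y, x + 1]]
    end
    break if last_row

    # Step 3: every set carves at least one passage down to the next row,
    # and the cell below inherits the set of the cell above it.
    next_row = Array.new(width)
    row.each_index.group_by { |x| row[x] }.each do |id, columns|
      count = 1 + rng.rand(columns.size)
      columns.shuffle(random: rng).first(count).each do |x|
        next_row[x] = id
        passages << [[y, x], [y + 1, x]]
      end
    end

    # Step 4: any remaining cells in the next row get brand-new sets.
    row = next_row.map { |id| id || (next_id += 1) }
  end

  passages
end
```

Because the result is a perfect maze, a grid of `width * height` cells always ends up with exactly `width * height - 1` passages, so `ellers_maze(5, 5).size` is 24 no matter what the random choices were. Note that only `row` and `next_row` are ever consulted; the `passages` list is just the output.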

If you’re at all like me, your head is probably spinning at this point. Let’s back up and work through an example manually, to help you see how the algorithm works in practice. Let’s begin with a simple 5-column row.

An example

First, we initialize each cell in that row to be in its own set. I’ll just assign each cell a number, indicating the set it belongs to:

At this point, we can actually discard the first row, because the algorithm is done with it. We’ll keep it around for now, though, for the sake of illustration. I’ll just put a little space between the previous rows, and the current row:

Analysis

Let’s analyze that a bit. It seemed to come together pretty magically, considering we weren’t looking at anything but the current row (and the next row, briefly). The key to it all is the sets.

The set that a cell belongs to tells the algorithm who its siblings were, are, and will be. It’s the crystal ball that lets the algorithm gaze into the future (and the past!) and avoid adding cycles and isolates to the maze.

Cells that share a set also share a path between them. (If you don’t believe me, look at the example I just gave, above. Every cell that shares a set identifier is connected; cells in different sets are not connected.)

If the algorithm allowed us to create a passage between two cells that shared a set, it would be introducing a second path between those two cells. That’s essentially the definition of a loop or cycle in the graph, and since we don’t want cycles in our maze, we disallow that.

Conversely, cells that do not share a set are not connected (they are disjoint). By the time we reach the end of the maze, every cell must be connected to every other cell, and the only way we can do that is if every set is eventually merged into a single set.

We can’t do that if a set does not propagate itself to the next row. This is why the algorithm requires that at least one vertical passage be created for each set in the row. Otherwise, any set that didn’t create a vertical passage would become extinct after the current row. The result would be an isolate, an orphaned collection of cells that could never be reached from outside that set.

Then, at the end, the algorithm joins all disjoint sets, allowing every cell in the entire maze to be connected by a single, unique path to any other cell in the maze. And we’re done!
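The two failure modes discussed above (cycles and isolates) can be checked mechanically. Here is a small sketch, assuming the maze is described as a list of cells plus a list of passages between them; the name `perfect_maze?` and that representation are my own choices for illustration. It relies on a standard fact about graphs: a graph on n cells with exactly n - 1 passages in which every cell is reachable has no cycles.

```ruby
require "set"

# cells:    array of cell identifiers (any hashable values)
# passages: array of [cell_a, cell_b] pairs
# Returns true when the maze has no cycles and no isolates.
def perfect_maze?(cells, passages)
  # A maze without cycles or isolates has exactly n - 1 passages.
  return false unless passages.size == cells.size - 1

  neighbors = Hash.new { |hash, cell| hash[cell] = [] }
  passages.each do |a, b|
    neighbors[a] << b
    neighbors[b] << a
  end

  # Flood fill from the first cell; an isolate would never be reached.
  seen = Set.new([cells.first])
  frontier = [cells.first]
  until frontier.empty?
    neighbors[frontier.pop].each do |cell|
      frontier << cell if seen.add?(cell) # add? is nil when already seen
    end
  end

  seen.size == cells.size
end
```

For four cells `:a` through `:d`, the passages `[[:a, :b], [:b, :c], [:c, :d]]` pass the check, while `[[:a, :b], [:b, :c], [:c, :a]]` fail it: that maze has the right number of passages, but it contains a cycle and leaves `:d` as an isolate.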

Implementation

How would you implement this? The key, for me, turned out to be implementing the sets. You need to be able to quickly determine the set of any given cell in a row, as well as determine the list of cells in any given set. I did this by maintaining a hash of arrays that mapped sets to cells, and another hash that mapped cells to sets. As I did in the example above, I simply used a unique integer to identify each set.
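To make that concrete, here is a minimal sketch of what such a class might look like. Only the `State` name, `State#next`, the two hashes, and the integer set ids come from the description; every other method name (`populate`, `add`, `set_of`, `cells_in`, `same?`, `merge`) is an assumption made for illustration, not the original code.

```ruby
# Hypothetical sketch of the per-row set bookkeeping.
class State
  attr_reader :width

  def initialize(width, next_set = 0)
    @width = width
    @next_set = next_set
    @sets = Hash.new { |hash, set| hash[set] = [] } # set id => cells (columns)
    @cells = {}                                     # cell (column) => set id
  end

  # Put every cell that has no set yet into a brand-new set of its own.
  def populate
    width.times do |cell|
      add(cell, @next_set += 1) unless @cells.key?(cell)
    end
    self
  end

  def add(cell, set)
    @cells[cell] = set
    @sets[set] << cell
    self
  end

  def set_of(cell)
    @cells[cell]
  end

  def cells_in(set)
    @sets.fetch(set, [])
  end

  def same?(cell_a, cell_b)
    set_of(cell_a) == set_of(cell_b)
  end

  # Move every cell of the loser's set into the winner's set.
  # Assumes both cells have already been assigned a set.
  def merge(winner_cell, loser_cell)
    winner = set_of(winner_cell)
    loser = set_of(loser_cell)
    return self if winner == loser

    @sets.delete(loser).each { |cell| add(cell, winner) }
    self
  end

  # Fresh state for the following row; set ids keep counting upward
  # so that a new set never collides with one carried down from above.
  def next
    State.new(width, @next_set)
  end
end
```

With this sketch, `State.new(5).populate` gives the five cells the sets 1 through 5, and after `merge(0, 1)` both of the first two cells report set 1. Both lookups (cell to set, set to cells) are single hash accesses; the cost is concentrated in `merge`, which has to touch every cell of the losing set.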

Once I had these routines (encapsulated into a State class), the algorithm itself came fairly neatly. It works in two steps, plus a third to convert the representation into something easier to display.

The first step looks over the row and randomly connects adjacent cells (if they exist in disjoint sets):

As you can see, the process simply looks at each cell and its neighbor, comparing their states and then either adding the cells to a “connected set” (a series of adjacent cells that are all horizontally connected) and merging the sets together, or creating a new connected set when the two cells should not be merged.

The finish variable is used to change the behavior for the final row; it is false for the rest of the rows.
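As a rough illustration of that first step, here is a sketch that works on a plain array of set ids (one per column) rather than on a State object, so that it stands on its own. The method name `join_row`, the keyword arguments, and the 50/50 chance of leaving a wall are assumptions; only the “connected set” idea and the `finish` flag come from the description above.

```ruby
# row: array of set ids, one per column (at least one column); mutated
# in place as sets are merged.
# Returns the connected sets: runs of adjacent columns joined horizontally.
def join_row(row, finish: false, rng: Random.new)
  connected_sets = []
  connected_set = [0]

  (row.length - 1).times do |c|
    same_set = row[c] == row[c + 1]

    if same_set || (!finish && rng.rand(2).zero?)
      # Leave the wall standing: close the current run and start a new one.
      # Cells already in the same set must never be joined (that's a cycle).
      connected_sets << connected_set
      connected_set = [c + 1]
    else
      # Knock the wall down and merge the neighbor's whole set into ours.
      old_id = row[c + 1]
      new_id = row[c]
      row.map! { |id| id == old_id ? new_id : id }
      connected_set << (c + 1)
    end
  end

  connected_sets << connected_set
end
```

On the final row (`finish: true`) the coin flip is skipped, so a row like `[1, 2, 3, 4, 5]` collapses into the single connected set `[[0, 1, 2, 3, 4]]`, while `[1, 1, 2, 1]` yields `[[0], [1, 2], [3]]` because the walls between cells that already share a set are kept.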

The second step looks at the available sets and randomly adds vertical connections:

State#next just returns a new State object (that we’re using for the next row). Then, for each set of cells, we randomly pick some number of them and add them to the list of verticals we’re going to create. (The verticals are also added to the next row, in the same set.)
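The second step might be sketched like this. As with any standalone illustration, it makes assumptions: it uses a plain array of set ids instead of a State object, the name `carve_down` is invented, and “some number of them” is interpreted as a uniformly random count between one and the size of the set.

```ruby
# row:     array of set ids for the current row, one per column
# next_id: the highest set id handed out so far
# Returns [verticals, next_row, next_id], where verticals lists the columns
# that get a passage downward and next_row holds the set ids of the new row.
def carve_down(row, next_id, rng: Random.new)
  verticals = []
  next_row = Array.new(row.length)

  # Group the columns by set, then pick at least one column from each set.
  row.each_index.group_by { |c| row[c] }.each do |id, columns|
    count = 1 + rng.rand(columns.length)
    columns.shuffle(random: rng).first(count).each do |c|
      verticals << c
      next_row[c] = id # the cell below joins the set of the cell above
    end
  end

  # Cells that received no vertical passage start out in brand-new sets.
  next_row.map! { |id| id || (next_id += 1) }

  [verticals.sort, next_row, next_id]
end
```

Whatever the random choices, every set id present in `row` shows up at least once in `next_row`, which is exactly the guarantee that keeps a set from going extinct.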

The algorithm itself then loops over these steps repeatedly, setting state to next_state at the end of each pass, until it is done. (In my case, I trapped the INT signal, so ctrl-C can be used to terminate the algorithm and gracefully finish the maze.)

For those of you not using IE (which will make a total mess of this), here are two demos you can play with to see the algorithm in action:

I think Eller’s algorithm is harder to customize than the recursive backtracking algorithm, but it can be done. Consider it an exercise, if you want: how would you introduce horizontal or vertical bias into the maze? (I.e., how would you make the maze prefer longer corridors, either horizontally or vertically?) How would you implement weave mazes, where the passages move over or under other passages? Especially tricky: how would you introduce symmetry into the output, given that the algorithm itself doesn’t look at anything more than the single row?

Reader Comments

All the mazes you’ve generated in these articles have no start/end openings. Is it true that no matter where you poke two holes in the outer wall, they will be connected? It seems like each of the algorithms produces a single non-disjoint “tiling” of the maze, but do these algorithms guarantee that?

Jim Menard, 29 Dec 2010

@Jim, each algorithm I’ve presented (or will present) generates “perfect” mazes, which means they are connected, acyclic undirected graphs (that is, trees). Thus, there is exactly one path that connects any two points in the maze, and this means you can poke two holes in the outer wall anywhere you like, and there is guaranteed to be a path between them.

Now, that doesn’t guarantee it will be an interesting or challenging path. To do that you’d want to find two points on the boundary of the maze that have the longest path between them, which is an interesting problem on its own. However, it’s not directly related to maze generation, so I probably won’t go into it.

Jamis, 29 Dec 2010

Can this be tweaked to generate uni-cursal mazes, in other words, Labyrinths instead of mazes?

I have been designing labyrinths manually and this would certainly be a great help.

Thanks.

HiMY SYeD, 29 Dec 2010

@HiMY, unicursal mazes can be generated from virtually any perfect orthogonal (rectangular) maze. My Theseus library (https://github.com/jamis/theseus) can do this, actually. The idea is to generate a maze first, remove the exit so only the entrance remains, and then bisect each passage with a wall. This makes dead-ends into u-turns, and will give you a unicursal labyrinth that is twice the size of the original maze.

If you have Ruby 1.9 installed, you can “gem install theseus”, and then “theseus -w 5 -u” to see a randomly generated 10×10 unicursal maze.

Very good read. I’ve come across this algorithm before and didn’t fully understand it, but your walkthrough makes much more sense to me now.

A while back when bored I was able to implement a modified/misinterpreted Eller’s algorithm using only formulas and conditional formatting in Excel or OpenOffice.

Every row’s leftmost cell contained a 1. For every other row in the grid, a random function determines whether the next cell was 1+Previous (same set) or reset to 1 (new set). (You can add vertical bias by favoring the reset option, or horizontal bias by favoring the 1+Previous option.)

Every row also had a random digit generated between 2 and the maximum number used in that row to determine where a vertical connection was made upward (call it X).

Conditional formatting worked like this:
- Add a left-side wall if the cell contains a 1.
- Add a top-side wall if:
... the cell contains a digit higher than X
... the cell contains a digit lower than X AND the cell immediately to its right does NOT contain 1. (This rule is used for when a set’s width is smaller than X to ensure that a vertical path is available.)

I’m not sure my explanation entirely makes sense, so you can grab this OpenOffice spreadsheet if you’re interested in seeing what I mean:

Is there a date for Eller’s Algorithm? I ask because I remember coming up with this exact solution in high school in 1984-5 or 1986 at the latest, and implementing it in Applesoft Basic on an Apple ][. Sadly, back then, awesome programming skills did not attract the attention of girls. I know it is different today with all girls checking over the elegance of your Haskell code.

@Xalem, I honestly don’t know anything more about the origin of the algorithm than its name. Googling shows that pretty much every other mention that cites a reference refers to the “Think Labyrinth” site (http://www.astrolog.org/labyrnth/algrithm.htm#perfect), which is also where I was introduced to it. (Maybe the author of that site has more information?) Google Books has only a single mention, and that book is from 2005.

If anyone has any info on the origin of Eller’s algorithm, please share! It would be pretty cool if Xalem’s discovery of it predated any other published source. :)

Jamis, 29 Dec 2010

I encountered a program written in BASIC around 1983 or so that must have been implementing this algorithm or one like it. I could never understand how it did it, and sadly I lost the listing when I upgraded from my Commodore CBM-8032 to a more modern MS-DOS compatible 8086 computer. It’s only taken 27 years, but thanks for finally explaining the mystery for me!

Eric TF Bat, 30 Dec 2010

excellent article, very good illustrations!

ar, 30 Dec 2010

That’s incredible.

William, 30 Dec 2010

I attended a programming competition in high school, years ago, where we were challenged to write a program that drew random, solvable mazes. We had to complete the task in 1 hour. The implementation we came up with was naive at best, and didn’t meet the requirements. Had we been aware of such an algorithm, we would have definitely had a solution. However, our high school never taught algorithms, only simple programming languages: unfortunately, Turing. The focus was on how to write in some language, but not actually get into algorithms.

This was a pleasant read.

dave, 30 Dec 2010

Also see John Tromp’s IOCCC winner, described at http://homepages.cwi.nl/~tromp/maze.html, and Carl Shapiro’s IOCCC winner before that.

I asked the maintainer of “Think Labyrinth” about the origin of Eller’s algorithm and here was his response:

“Eller’s algorithm is named after computer programmer Marlin Eller, CEO of sunhawk.com. He invented this algorithm in 1982, which is the earliest use of it I know of. He never published it, but he did tell me about it, so I chose to name the algorithm after him.”

Jeffrey Winter, 3 Jan 2011

@Jeffrey, thanks for the follow-up. That’s great information, and straight from the horse’s mouth, as it were.

Jamis, 3 Jan 2011

What would you call a maze that is “perfect” w.r.t. the solution path, i.e. there is only a single path from start to finish, but the non-solution areas can have loops? In other words, something between perfect and braid. To see what I mean, download Amaze from qtamaze.sf.net, run it, and slide the “Cuts” value to a non-zero value. B.t.w. the biggest single improvement in maze quality for that program’s algorithm comes from cutting short any loops in the naive solution path, where the path passes by itself one cell over; this tends to cut the path length by 70-90%.

chaered, 24 Feb 2011

@chaered, the instant you add a loop anywhere in the maze, you have multiple solutions. This is because the solution can go around the loop and retrace its steps, and then go to the end. I think what you’re referring to is a shortest path being “the” solution, and that’s going to be the ideal case regardless of how many loops a maze has. In other words, what you’re describing is just a special case of the multiply-connected graph. I don’t believe there is any special term for it.

Jamis, 25 Feb 2011

@Jamis, I think it depends on where the loop occurs. Analogously (topologically equivalent) you can think of the solution path as a long corridor with doors going off into different rooms, which may contain further walls. In a perfect maze, the rooms are not interconnected, except through that corridor. My question concerns the case where the rooms themselves may have free-standing walls inside them, so you can circle around inside a room, but any such loop will never take you between two points on the corridor; you can only enter/exit a room through the same door. The definition of “perfect” seems to preclude free-standing walls even if they never lead to connections between the rooms, i.e. never cause alternate solution paths to exist.

chaered, 25 Feb 2011

@chaered, I understand what you’re suggesting, but I don’t agree that it is substantially different from the general multiply-connected case. Imagine if you were placed in this maze. All you know is that there is a path from point A to point B. The existence of a shortest path does not guarantee that you won’t wander from it—you may very well find yourself in one of those side passages, and wander around that free-standing wall. The solution you then find for the maze would include that loop around the wall. The maze would have multiple solutions, even though some of those involve retracing your steps. Remember that any shortest path through a multiply-connected maze will avoid the loops anyway—from the perspective of someone following the shortest path, the loops will always appear to be in the “side-passages”.

Thus, a perfect maze cannot be multiply-connected in any way; as soon as it is, it ceases to be perfect. What you’re describing is simply a multiply-connected maze, where all but one of the solutions involve retracing your steps.

Jamis, 25 Feb 2011

@Jamis, one source of confusion may be how you enter the maze: you say “placed inside the maze”, whereas I assume the maze has a single defined entrance. My earlier comment only applies to mazes with a single defined entrance and exit point. I think retracing steps would not count as a loop, otherwise a loop-free maze cannot exist: imagine a T-shaped, 4-cell maze, enter on top left, exit top right; you can walk either straight from the entrance to the exit, or take a one-step detour into the dangling single cell below. The second path, where you go off the path and then retrace through the same cell, should not, I think, be counted as an alternative path. What do you think?
(PS: This is a fascinating topic, thanks for taking the time to reply!)

chaered, 25 Feb 2011

@chaered, it actually does not matter how you define “placed in the maze.” By definition, every cell in the maze must be reachable by every other cell, so regardless of where you are placed, there will be a shortest path to any other cell you designate. And since a shortest path will never include loops (since loops can be trivially removed from the path), any path between two cells will be “direct” (in the sense that the path will never intersect itself).

Also, when I’ve said “retrace steps”, I’m not talking about encountering a dead-end and doing an about-face. To reuse your T-shaped maze, imagine a loop at the bottom, making something like a lowercase b with a crossbar at the top. You could start at the top-left, turn right, walk all the way down, around the loop, back up, turn right, and exit at the top-right. All without encountering a dead-end, and yet for part of that you’d be retracing your steps. More technically, your solution path would intersect itself. This doesn’t invalidate the solution, but it does mean it can’t be the shortest path through the maze.

Maybe what I need to do is define “solution”. When I say solution, I mean any (absolutely any) path through the maze. If you place a mouse in the maze and record its progress, it will eventually get from the beginning to the end. Retracing its steps will take you through the maze. However, that path may consist of hitting dead-ends, turning around, etc.

The “shortest path” through the maze is the solution you’re referring to. And what I’m saying is that even with that definition, there will always be a shortest path between any two points in the maze. For multiply-connected mazes, there may even be multiple shortest paths. But any shortest path will never contain any loops (or self intersections), because those can be removed trivially, resulting in a shorter path.

Regardless of whether the maze is perfect or multiply-connected in any degree, to a viewer walking only the shortest path, the maze would appear to be perfect. Any path not on the shortest path would be a “side passage”, and whether there were loops there or not would be irrelevant. I think this is the point you’ve been making. However, my point is that to someone entering the maze, the “shortest path” is simply a hypothetical nicety. They don’t know the shortest path, and their own path through the maze will almost certainly deviate from the shortest path. As a result, they may encounter loops, dead-ends, and other features of the maze that complicate the solution they discover.

I guess what I’m saying is that, if the entity traversing the maze already knows the solution, then the concept of perfect versus multiply-connected is moot. But in that case, there is little interesting about the traversal of the maze. In practice, the maze is an unknown to the one traversing the maze, and whether their path finds loops or dead-ends, the end result is the same: a self-intersecting path which nonetheless results in a valid (if long) solution.

(Thanks to you, too, for the discussion!)

Jamis, 25 Feb 2011

@Jamis, thanks for the definitions of terms, it clears up a lot of confusion; what I thought of as a “path” corresponds to what you define as “shortest path”. So using your definition, if you have a 2-cell maze with cells 1 and 2, and start/end S and E, then S->1->2->E is the shortest path and a path, but S->1->2->1->2->E is just a path, right? Any path containing two visits to the same cell is non-shortest, and the subpath between those two visits is a “loop”. A dead-end is a special case of a loop, where the first half mirrors the second.
And a “perfect maze” is a maze where there is exactly 1 shortest-path between any two cells. And a braid is a maze where there is at least 1 shortest-path between any two cells.

In that case, the class of mazes I was talking about could be defined as having exactly 1 shortest-path between the entrance and exit cell, and at least 1 shortest-path between any two cells in the maze. In other words, the non-self-intersecting traversal path from start to end is unique, but the same for any other pair of cells may not be.

chaered, 25 Feb 2011

@Jamis, btw. sorry if I seem to be harping on about the specific cases. Just wanted to clarify that in order for a maze (as a puzzle) to have a single valid “answer”/”solution” (non-self-intersecting path from start to end), it is not required to be a “perfect maze”.

chaered, 25 Feb 2011

@chaered, the definition of perfect vs. braid is actually purely based on the absence or presence of loops in the graph. A maze is perfect if there are no loops anywhere in the maze. It is braided otherwise. A side-effect of that is that a perfect maze always has exactly one (and only one) path between any two cells. (Routes that include dead-ends are not usually counted as solutions.)

Now, given the class of maze that you were describing (which I still assert is simply a multiply-connected, or braided, maze), let’s assume as you said that there exists some path from the entrance to the exit, which is shortest and non-intersecting. However, a path exists meeting those same criteria between any two cells in the maze, whether it is braided or not. There will always be a non-intersecting path between any two cells; but in a braided maze, there may be other solutions that do intersect. However, the shortest path between any two cells in a maze, braided or perfect, will be non-intersecting.

A corollary to that is that, as you said, a maze can be braided and still have a non-intersecting path from start to end. Because, as I said, this will be true for any maze. As long as there exists a path between two cells (and there must, for it to be a true maze), there must exist a non-intersecting version of that path.

It’s easy enough to prove: assume that there exists a path between two cells that intersects itself somewhere. You can then “pinch” that loop out of the path entirely, since when the path reaches the intersection, you can just take the non-loop route instead of the looped route, and reach the exit. If the path intersects itself, this will always be the case. So you can show that there will always be a non-intersecting path between any two cells in any maze.
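The “pinching” argument is constructive, so it can be written down directly. The sketch below is only an illustration; the name `pinch_loops` and the representation of a path as an array of cells are assumptions. Whenever the walk returns to a cell it has already visited, everything walked since the first visit is thrown away.

```ruby
# path: array of cells in the order they were walked (cells may repeat).
# Returns a path with the same start and end that never visits a cell twice.
def pinch_loops(path)
  direct = []
  position = {} # cell => its index in direct

  path.each do |cell|
    if position.key?(cell)
      # We've been here before: drop everything walked since that visit.
      keep = position[cell] + 1
      direct.slice!(keep..-1).each { |dropped| position.delete(dropped) }
    else
      position[cell] = direct.length
      direct << cell
    end
  end

  direct
end
```

For example, the walk `[:s, 1, 2, 1, 2, :e]` pinches down to `[:s, 1, 2, :e]`. Each step that survives is a step the original walk actually took, so the result is still a valid path through the maze, just a non-intersecting one.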

Jamis, 25 Feb 2011

@Jamis, I agree with your assertions above. However, the distinction I was making between a braided maze and the (nameless so far) other maze class is a single issue about those non-self-intersecting paths: in a braided maze, there may be multiple non-self-intersecting paths between any two cells (and there will be at least one, as you state); in the “other” kind of maze, for the specific case only of the start and end cells, there cannot be multiple non-self-intersecting paths.
This means that if “solving” a maze is defined as finding any non-self-intersecting path from start to end, then for a braided maze you might come up with multiple valid solutions, but for the other maze kind (just like for a perfect maze) the solution is guaranteed to be unique.

chaered, 25 Feb 2011

Here’s a picture of an example: there is only 1 non-self-intersecting path between the entry and exit (the dotted line), but the right half does contain a loop.
Link: https://picasaweb.google.com/lh/photo/qGOcjje4fjjoomfN22gWgXqdrwFdBJ6iIsF9YwO7n70?feat=directlink

chaered, 25 Feb 2011

@chaered, thanks for the picture, that confirms that I was understanding you correctly. :) It seems to me that you’re not describing a class of maze, so much as you’re describing a particular constraint on the start and end points of the solution. The maze, as I said, is still braided; you’re just selecting start and end points so that the solution meets a particular set of constraints.

Jamis, 25 Feb 2011

@Jamis, you’re right, the difference I described is only in the constraints, not the general topology. As a technicality, the selection of the start and end points in this program takes place before building the maze, so the constraint is actually a feature of the maze building algorithm. Maybe the classification in my mind was more about maze generation algorithms than the overall topology of the result (comes from spending more time programming mazes than actually looking at them). In that case, allow me to reformulate: An algorithm that builds a cell-based maze from a given start and end point can fulfill the constraint of producing a fully connected maze with only a single non-self-intersecting path between those two points, without having to produce a perfect maze. Would you agree with this more limited statement?

(Detail: This algorithm does actually start off producing a perfect maze. It then shortens the solution by short-circuiting near-loops. Then it divides the maze into zones corresponding to interconnected cells from a single side-passage of the solution, and chops up walls inside zones.)

chaered, 26 Feb 2011

@chaered, well stated! I can agree with that statement. :) Do you have a write-up of the algorithm anywhere? I’d love to know more about it.

Jamis, 28 Feb 2011

@Jamis, thanks! There’s no good write-up of the algorithm right now, but I’m adding one to the qtamaze development documentation. I’ll post a link when it’s all done, looking forward to your comments.

chaered, 28 Feb 2011

@jamis, I put a better description of the algorithm into the dev doc, it is parts 4.1 and 4.2, on pages 4 through 7, of this PDF: http://qtamaze.sourceforge.net/dev.pdf (enjoy!)