My last post was about how the first stage of ARToolKit’s marker detection works. Chris has just started a Student Robotics internship, and is working towards a new vision system for the SR kit. As the first part of the journey towards this, he’s continuing the work on dissecting ARToolKit. Find part 2 of the dissection on his blog, in which Chris covers how ARToolKit finds the boundaries of the regions found in part 1.

Tonight’s PhD creative/productive escape was to continue teasing apart some of the functionality of ARToolKit. I’m pursuing getting some fiducial marker detection, à la ARToolKit, into the core of Student Robotics 2012’s kit (the one that’ll be shipped in October). We won’t be using the exact algorithms found in ARToolKit, as it frequently reports markers that aren’t there, but learning some of what I perceived to be the “magic” of the library’s guts seemed like a good idea.

I first hit Google Scholar to find papers about how ARToolKit and other similar libraries work. Luckily, I’m currently a student at an institution that provides me with access to journal and conference papers. Sadly this is not the case for everyone, which sucks. I read some papers and skimmed a few more, which gave me an idea of what’s around. Unfortunately I didn’t find a thorough description of ARToolKit’s algorithms. Even ARToolKit’s own description of its vision algorithm left big gaps. There are a few papers out there that compare different marker detection libraries. I’d link to them, but they’re behind a paywall, so I’d rather not.

What I suspect is happening is that people take one look at the more interesting functions within ARToolKit’s source and then run a mile. Take, for example, the function labeling2 found in arLabeling.c. Take a look at it now. To understand what’s going on in there you need to be really determined. You need to wade through some manual loop unrolling, preprocessor fudging, and arrays of ints being treated like they should be arrays of structs. More on this in a bit.

What’s interesting is that this code works well enough for people to use it. Wikipedia says:

“ARToolKit is a very widely used AR tracking library with over 160,000 downloads since 2004”

So, either I’m going crazy and I have lost the ability to read someone else’s code, or the ARToolKit code leaves a lot to be desired. For what are hopefully intuitive reasons, I’m going to opt for the latter explanation. So here I get to apply some rather extended extrapolation (for more of this see Freakonomics) about how we reached a situation in which at least thousands of people are using a library that’s quite impenetrable. I think it’s a pretty good demonstration of two things. First, usability can count for more than implementation detail: ARToolKit works fine as a library and has a usable interface, and most users don’t need to care about the internal implementation until it breaks. Second, it’s a demonstration that organically developed things can work, and that one doesn’t need to follow software engineering formalisms to the letter to get the job done.

Still, all this mustn’t belittle the achievement of the authors of ARToolKit. They’ve obviously made a significant contribution to a lot of good projects that people have done, and I’m sure that many of the ideas wouldn’t have nucleated had the library not existed at all. So, time to quit whinging about how impenetrable the code is, and get to work on deciphering it! I’ll be doing this over a series of blog posts. This is the first one, and is about the first stage of ARToolKit’s image processing: labelling. I’ll be ignoring the image acquisition itself because this is a pretty run-of-the-mill operation.

Ok, I lied, I’m actually going to cover thresholding and labelling here. The thresholding step is really simple and happens in the same pass as labelling, so it doesn’t really count. Both the thresholding and labelling happen in the labeling2() function in arLabeling.c. I’ve spent several hours working through this function, simplifying it so that I could understand what was going on. Whilst I was doing this, I committed my changes to a git repository:

git clone https://xgoat.com/pubgit/artoolkit-revenge.git

The contents of the above git repository are not intended to be run. The only reason for this code existing is to help me (and maybe others) understand how ARToolKit works. In doing this, I removed several things that I didn’t care about, such as support for different pixel colour formats. These things weren’t important to understanding how ARToolKit’s algorithms work. At the time of writing, I’ve got labeling2() down from 344 lines of quite unapproachable code to 169 lines of fodder that I find easier to grok.

Thresholding

ARToolKit identifies black-and-white markers in images from the camera, such as the one shown on the right. It converts the colour image that comes from the camera into a black-and-white image using a threshold specified by the user. If one is working with an RGB image, then the sum of the R, G, and B components of each pixel is compared against the threshold that the user specifies like so:

r + g + b < threshold * 3

Pixels that satisfy this inequality (that is, pixels whose average channel value is darker than the threshold) are candidates for labelling. Other pixels are ignored.
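As a rough sketch of that comparison (this is not ARToolKit's code; the function names and the packed three-bytes-per-pixel RGB layout are assumptions of mine for illustration), the thresholding step could look something like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ARToolKit function: returns 1 if an RGB
 * pixel is dark enough to be considered for labelling, 0 otherwise. */
int pixel_is_dark(uint8_t r, uint8_t g, uint8_t b, int threshold)
{
    /* Same test as in the text: r + g + b < threshold * 3 */
    return (r + g + b) < threshold * 3;
}

/* Threshold a packed RGB image (3 bytes per pixel, an assumed layout)
 * into a mask with one byte per pixel: 1 = candidate, 0 = ignored. */
void threshold_rgb(const uint8_t *rgb, uint8_t *mask,
                   size_t n_pixels, int threshold)
{
    for (size_t i = 0; i < n_pixels; i++) {
        const uint8_t *p = &rgb[i * 3];
        mask[i] = (uint8_t)pixel_is_dark(p[0], p[1], p[2], threshold);
    }
}
```

Multiplying the threshold by three avoids dividing the channel sum for every pixel. Bear in mind that ARToolKit does this test inside the labelling loop rather than building a separate mask; the mask here just keeps the sketch easy to follow.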

Labelling

The pixels that got through the thresholding step are now grouped into sets of connected pixels. Each group of connected pixels is assigned a label, which is just an integer greater than zero. This is done by iterating through the thresholded image pixels, row-by-row from top-left to bottom-right, and storing each pixel's calculated label number in a second image buffer. A pixel's label is decided as follows:

If the pixel above is labelled, take the same label as it.

Otherwise, if the pixel to the top-right is labelled, then some checks are made to determine whether two groups of (what are currently) differently labelled pixels need merging so that they share a label. If the pixel to the top-left or the pixel to the left of the current pixel is also labelled, then two labelled regions are indeed connected. ARToolKit makes the labels of the two intersecting groups equivalent simply by recording the fact that these two labels are equivalent; it doesn't go renumbering the labels stored in the label image buffer (this'd be quite inefficient).

Otherwise, take the label of the pixel to the top-left if it has one.

Otherwise, take the label of the pixel to the left if it has one.

Finally, if none of the above conditions were met, then a new region has been found. A new label number is assigned to the pixel.
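To make the rules above a bit more concrete, here's a minimal sketch of them in C. It is my own simplification rather than ARToolKit's code: the names label_image, label_at and merge_labels, the MAX_LABELS limit, and the separate mask input are all hypothetical. I've also assumed that a pixel with only a labelled top-right neighbour simply takes that neighbour's label, and that pixels outside the image count as unlabelled.

```c
#define MAX_LABELS 1024  /* arbitrary limit chosen for this sketch */

/* Label of the pixel at (x, y), or 0 if it lies outside the image. */
static int label_at(const int *labels, int w, int h, int x, int y)
{
    if (x < 0 || x >= w || y < 0 || y >= h)
        return 0;
    return labels[y * w + x];
}

/* Record that labels a and b name the same region. equiv[l] holds the
 * representative of label l; the smaller representative survives.
 * The label image itself is not touched. */
static int merge_labels(int *equiv, int n_labels, int a, int b)
{
    int ra = equiv[a], rb = equiv[b];
    int keep = ra < rb ? ra : rb;
    int drop = ra < rb ? rb : ra;
    for (int l = 1; l <= n_labels; l++)
        if (equiv[l] == drop)
            equiv[l] = keep;
    return keep;
}

/* mask:   1 for pixels that passed thresholding, 0 otherwise.
 * labels: output, same size as mask, 0 = unlabelled.
 * equiv:  output table with room for MAX_LABELS + 1 ints.
 * Returns the number of labels handed out, or -1 if MAX_LABELS is hit. */
int label_image(const unsigned char *mask, int *labels,
                int w, int h, int *equiv)
{
    int n_labels = 0;

    for (int i = 0; i < w * h; i++)
        labels[i] = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (!mask[y * w + x])
                continue;

            int up       = label_at(labels, w, h, x,     y - 1);
            int up_right = label_at(labels, w, h, x + 1, y - 1);
            int up_left  = label_at(labels, w, h, x - 1, y - 1);
            int left     = label_at(labels, w, h, x - 1, y);
            int label;

            if (up) {
                label = up;
            } else if (up_right) {
                if (up_left)
                    label = merge_labels(equiv, n_labels, up_right, up_left);
                else if (left)
                    label = merge_labels(equiv, n_labels, up_right, left);
                else
                    label = up_right;  /* assumption, see lead-in */
            } else if (up_left) {
                label = up_left;
            } else if (left) {
                label = left;
            } else {
                if (n_labels == MAX_LABELS)
                    return -1;
                label = ++n_labels;    /* new region found */
                equiv[label] = label;
            }
            labels[y * w + x] = label;
        }
    }
    return n_labels;
}
```

For a U-shaped blob such as the 3x2 mask {1,0,1, 1,1,1}, the two arms start out as labels 1 and 2, and the pixel at the bottom of the U is where the sketch records that label 2 is equivalent to label 1.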

Whilst all this labelling is going on, statistics about each label are built up as well. The following stats are collected for each label:

The number of pixels assigned this label.

The sum of the x coordinates, as well as the sum of the y coordinates of each pixel of this label.

The minimum and maximum x and y coordinates of the pixels with this label (i.e. the region's bounding box).

The first two of those statistics are used to calculate the centre-point of a labelled region, whilst the other numbers will be passed back to the caller of labeling2.
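Here's one way those per-label statistics might be represented and updated. This is a sketch under my own assumptions: the struct and function names are hypothetical, and ARToolKit itself keeps these numbers in plain arrays rather than a struct like this.

```c
/* Hypothetical per-label statistics. Must start out zeroed. */
struct region_stats {
    int area;           /* number of pixels assigned this label */
    long sum_x, sum_y;  /* sums of the x and y pixel coordinates */
    int min_x, max_x;   /* bounding box of the region */
    int min_y, max_y;
};

/* Account for one more pixel at (x, y) carrying this label. */
void stats_add_pixel(struct region_stats *s, int x, int y)
{
    if (s->area == 0) {
        s->min_x = s->max_x = x;
        s->min_y = s->max_y = y;
    } else {
        if (x < s->min_x) s->min_x = x;
        if (x > s->max_x) s->max_x = x;
        if (y < s->min_y) s->min_y = y;
        if (y > s->max_y) s->max_y = y;
    }
    s->area++;
    s->sum_x += x;
    s->sum_y += y;
}

/* Fold the statistics of src into dst, as would be needed when two
 * labels turn out to belong to the same region. */
void stats_merge(struct region_stats *dst, const struct region_stats *src)
{
    if (src->area == 0)
        return;
    if (dst->area == 0) {
        *dst = *src;
        return;
    }
    if (src->min_x < dst->min_x) dst->min_x = src->min_x;
    if (src->max_x > dst->max_x) dst->max_x = src->max_x;
    if (src->min_y < dst->min_y) dst->min_y = src->min_y;
    if (src->max_y > dst->max_y) dst->max_y = src->max_y;
    dst->area += src->area;
    dst->sum_x += src->sum_x;
    dst->sum_y += src->sum_y;
}

/* Centre-point of the region: mean x and mean y. Requires area > 0. */
void stats_centre(const struct region_stats *s, double *cx, double *cy)
{
    *cx = (double)s->sum_x / s->area;
    *cy = (double)s->sum_y / s->area;
}
```

Keeping sums rather than a running mean means merging two labels is just a matter of adding their counts and sums and taking the union of their bounding boxes.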

As I said above, label numbers can be made equivalent to each other during the labelling process. This means that after the labelling is complete, there can be redundancy in label numbers. ARToolKit performs a little bit of label number shuffling to remove this redundancy, and ensures that all label numbers are consecutive.
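The shuffling can be sketched roughly as follows. I'm assuming here (my own framing, not necessarily ARToolKit's exact data layout) that the equivalences are held in a table where equiv[l] is the representative label for l, that a representative maps to itself, and that a representative is never larger than the labels that point at it. The function name compact_labels is hypothetical.

```c
/* equiv[1..n_labels]: representative of each label, with equiv[l] <= l
 * and equiv[r] == r for every representative r (assumed invariants).
 * On return, equiv[l] is the new label for l, numbered 1, 2, 3, ...
 * with no gaps. Returns the number of distinct regions. */
int compact_labels(int *equiv, int n_labels)
{
    int n_regions = 0;

    for (int l = 1; l <= n_labels; l++) {
        if (equiv[l] == l) {
            /* A representative: give it the next free number. */
            equiv[l] = ++n_regions;
        } else {
            /* Points at a smaller label, which has already been
             * renumbered earlier in this loop. */
            equiv[l] = equiv[equiv[l]];
        }
    }
    return n_regions;
}
```

So a table of {1, 1, 3, 3, 5} for labels 1 to 5 would become {1, 1, 2, 2, 3}: three regions, numbered consecutively. The labels stored in the image buffer can then be looked up through this table whenever they're read.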

These statistics are passed back along with the labelled image buffer to the caller. I won't go into the precise details here. If you want to know more, have a look at the labeling2 function in the modified sources that I linked to above. I've changed the prototype of the labeling2 function so that it uses arrays of structs that are easier to decipher, so hopefully it'll all make sense.

That's where I'm up to right now with parsing ARToolKit's behaviour. The next instalment will be on the behaviour of arDetectMarker2(), which will use some of the information collected by the labelling process.