My last post was about how the first stage of ARToolKit's marker detection works. Chris has just started a Student Robotics internship, and is working towards a new vision system for the SR kit. As the first part of the journey towards this, he's continuing the work on dissecting ARToolKit. Find part 2 of the dissection on his blog, in which Chris covers how ARToolKit finds the boundaries of the regions found in part 1.
Tonight's PhD creative/productive escape was to continue teasing apart some of the functionality of ARToolKit. I'm working towards getting some fiducial marker detection, à la ARToolKit, into the core of Student Robotics 2012's kit (the one that'll be shipped in October). We won't be using the exact algorithms found in ARToolKit, as it frequently reports markers that aren't there, but learning some of what I perceived to be the "magic" of the library's guts seemed like a good idea.
I first hit Google Scholar to find papers about how ARToolKit and other similar libraries work. Luckily, I'm currently a student at an institution that provides me with access to journal and conference papers. Sadly this is not the case for everyone, which sucks. I read, and skimmed through, a few papers, which gave me an idea of what's around. Unfortunately I didn't find a thorough description of ARToolKit's algorithms. Even ARToolKit's own description of its vision algorithm left big gaps. There are a few papers out there that compare different marker detection libraries. I'd link to them, but they're behind a paywall, so I'd rather not.
What I suspect is happening is that people take one look at the more interesting functions within ARToolKit’s source and then run a mile. Take, for example, the function labeling2 found in arLabeling.c. Take a look at it now. To understand what’s going on in there you need to be really determined. You need to wade through some manual loop unrolling, preprocessor fudging, and arrays of ints being treated like they should be arrays of structs. More on this in a bit.
What’s interesting is that this code works well enough for people to use it. Wikipedia says:
“ARToolKit is a very widely used AR tracking library with over 160,000 downloads since 2004”
So, either I'm going crazy and have lost the ability to read someone else's code, or the ARToolKit code leaves a lot to be desired. For what are hopefully intuitive reasons, I'm going to opt for the latter explanation. So here I get to apply some rather extended extrapolation (for more of this see Freakonomics) about how we reached a situation in which at least thousands of people are using a library that's quite impenetrable. I think it's a pretty good demonstration of two things. First: usability can count for more than implementation detail. ARToolKit functions fine as a library, and has a usable interface. Most users don't need to care about the internal implementation until it breaks. Secondly, it's a demonstration that organically developed things can work, and that one doesn't need to follow software engineering formalisms to the letter to get the job done.
Still, all this mustn’t belittle the achievement of the authors of ARToolKit. They’ve obviously made a significant contribution to a lot of good projects that people have done, and I’m sure that many of the ideas wouldn’t have nucleated had the library not existed at all. So, time to quit whinging about how impenetrable the code is, and get to work on deciphering it! I’ll be doing this over a series of blog posts. This is the first one, and is about the first stage of ARToolKit’s image processing: labelling. I’ll be ignoring the image acquisition itself because this is a pretty run-of-the-mill operation.
Ok, I lied, I’m actually going to cover thresholding and labelling here. The thresholding step is really simple and happens in the same pass as labelling, so it doesn’t really count. Both the thresholding and labelling happen in the labeling2() function in arLabeling.c. I’ve spent several hours working through this function, simplifying it so that I could understand what was going on. Whilst I was doing this, I committed my changes to a git repository:
git clone https://xgoat.com/pubgit/artoolkit-revenge.git
The contents of the above git repository are not intended to be run. The only reason this code exists is to help me (and maybe others) understand how ARToolKit works. In doing this, I removed several things that I didn't care about, such as different pixel colour formats. These things weren't important to understanding how ARToolKit's algorithms work. At the time of writing, I've got labeling2() down from 344 lines of quite unapproachable code to 169 lines of fodder that I find easier to grok.
Thresholding
ARToolKit identifies black-and-white markers in images from the camera, such as the one shown on the right. It converts the colour image that comes from the camera into a black-and-white image using a threshold specified by the user. If one is working with an RGB image, then the sum of the R, G, and B components of each pixel is compared against the user's threshold like so:
r + g + b < threshold * 3
Pixels that satisfy this threshold are suitable for labelling. Other pixels are ignored.
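To make the comparison concrete, here is a minimal sketch of such a threshold pass over a packed 8-bit RGB buffer. It's my own illustration of the test above, not ARToolKit's actual code, and the function and buffer names are made up:

#include <stdint.h>

/* Illustrative threshold pass: mark pixels darker than the threshold as
 * candidates for labelling.  Not ARToolKit's code; names are made up. */
void threshold_rgb(const uint8_t *rgb, int width, int height,
                   int threshold, uint8_t *candidate)
{
    for (int i = 0; i < width * height; i++) {
        int r = rgb[i * 3 + 0];
        int g = rgb[i * 3 + 1];
        int b = rgb[i * 3 + 2];

        /* Same test as above: compares against threshold * 3 rather than
         * averaging the components, which avoids a division per pixel. */
        candidate[i] = (r + g + b < threshold * 3) ? 1 : 0;
    }
}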
Labelling
The pixels that got through the thresholding step are now grouped into sets of connected pixels. Each group of connected pixels is assigned a label, which is just an integer greater than zero. This is done by iterating through the thresholded image pixels, row-by-row from top-left to bottom-right, and storing each pixel's calculated label number in a second image buffer. A pixel's label is decided as follows (a simplified sketch of this decision process follows the list):
- If the pixel above is labelled, take the same label as it.
- Otherwise, if the pixel to the top-right is labelled, then some checks are made to determine whether two groups of (what are currently) differently labelled pixels need merging so that they share the same label. If at least one of the pixels to the left or top-left of the current pixel is labelled, then two labelled regions are indeed connected. ARToolKit makes the labels of the two intersecting groups equivalent, simply by recording the fact that these two labels are equivalent -- it doesn't go renumbering the labels already stored in the label image buffer (that would be quite inefficient).
- Otherwise, take the label of the pixel to the top-left if it has one.
- Otherwise, take the label of the pixel to the left if it has one.
- Finally, if none of the above conditions were met, then a new region has been found. A new label number is assigned to the pixel.
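Here's a simplified sketch of that per-pixel decision, continuing from the thresholding sketch above. It's my reconstruction to illustrate the five rules, using a naive equivalence table; it is not labeling2() itself, and bounds checking on the number of labels is omitted:

/* Illustrative labelling rules.  "label" is a width*height buffer of ints,
 * initialised to 0 (meaning unlabelled).  equiv[a] records which label a is
 * equivalent to.  A reconstruction for explanation, not ARToolKit's code. */
#define MAX_LABELS 4096

static int equiv[MAX_LABELS];
static int next_label = 1;

static int resolve(int l)
{
    while (equiv[l] != l)
        l = equiv[l];
    return l;
}

static void label_pixel(int *label, const uint8_t *candidate,
                        int width, int x, int y)
{
    int i = y * width + x;
    if (!candidate[i])
        return;

    int above  = (y > 0)                  ? label[i - width]     : 0;
    int tright = (y > 0 && x < width - 1) ? label[i - width + 1] : 0;
    int tleft  = (y > 0 && x > 0)         ? label[i - width - 1] : 0;
    int left   = (x > 0)                  ? label[i - 1]         : 0;

    if (above) {
        label[i] = above;                              /* rule 1 */
    } else if (tright) {
        label[i] = tright;                             /* rule 2 */
        /* If the left or top-left pixel carries a different label, two
         * regions meet here: record the two labels as equivalent rather
         * than renumbering the buffer. */
        if (tleft && resolve(tleft) != resolve(tright))
            equiv[resolve(tleft)] = resolve(tright);
        else if (left && resolve(left) != resolve(tright))
            equiv[resolve(left)] = resolve(tright);
    } else if (tleft) {
        label[i] = tleft;                              /* rule 3 */
    } else if (left) {
        label[i] = left;                               /* rule 4 */
    } else {
        equiv[next_label] = next_label;                /* rule 5: new region */
        label[i] = next_label++;
    }
}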
Whilst all this labelling is going on, statistics about each label are built up as well. The following stats are collected for each label:
- The number of pixels assigned this label.
- The sum of the x coordinates, as well as the sum of the y coordinates of each pixel of this label.
- The minimum and maximum x and y coordinates of the pixels with this label (i.e. the region's bounding box).
The first two of those statistics are used to calculate the centre-point of a labelled region, whilst the other numbers will be passed back to the caller of labeling2.
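In struct form, the per-label statistics amount to something like the following. The field names are mine, for illustration; they aren't necessarily the ones used in the modified sources linked above:

/* Illustrative per-label statistics (names are mine, not ARToolKit's). */
typedef struct {
    int area;           /* number of pixels carrying this label          */
    int sum_x, sum_y;   /* coordinate sums, used to compute the centroid */
    int min_x, max_x;   /* bounding box of the labelled region           */
    int min_y, max_y;
} label_stats;

/* Centre of a region: cx = sum_x / area, cy = sum_y / area. */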
As I said above, label numbers can be made equivalent to each other during the labelling process. This means that after the labelling is complete, there can be redundancy in label numbers. ARToolKit performs a little bit of label number shuffling to remove this redundancy, and ensures that all label numbers are consecutive.
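The idea of that tidy-up, continuing the illustrative code from the labelling sketch, looks roughly like this; whether the real code rewrites the label image buffer or only its per-label tables is a detail I'm glossing over here:

/* Illustrative clean-up: resolve each provisional label to its equivalence-
 * class representative, then renumber the representatives 1..n so that the
 * final label numbers are consecutive.  Not ARToolKit's actual code. */
static void compact_labels(int *label, int width, int height)
{
    int remap[MAX_LABELS] = {0};
    int n = 0;

    for (int i = 0; i < width * height; i++) {
        if (!label[i])
            continue;
        int rep = resolve(label[i]);
        if (!remap[rep])
            remap[rep] = ++n;
        label[i] = remap[rep];
    }
}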
These statistics are passed back along with the labelled image buffer to the caller. I won't go into the precise details here. If you want to know more, have a look at the labeling2 function in the modified sources that I linked to above. I've changed the prototype of the labeling2 function so that it uses arrays of structs that are easier to decipher, so hopefully it'll all make sense.
That's where I'm up to right now with parsing ARToolKit's behaviour. The next instalment will be on the behaviour of arDetectMarker2(), which will use some of the information collected by the labelling process.
It’s recently become warmer here in Southampton. So warm that when presented with assembling 42 Student Robotics servo boards, we opted to assemble them outside in the garden. It got colder as the sun went down, so I got the oil-filled radiator out of the shed and put it underneath the garden table, which we embellished with a blanket skirt to keep the heat in.
After placing one side, we retreated into the house for baking and second side placement.
Looks like it’s going to be a good year of hacking.
Sam‘s Bristol Ignite talk about Student Robotics has just appeared:
Good job Sam!
What do you do with 70-odd silicone sponge shims that you've just washed after laser cutting? That's right, bake them at 150°C for half an hour to dry them out:
I just watched Jérémie Zimmermann's excellent 27C3 talk "Copyright Enforcement Vs. Freedoms", because I missed it the first time around for some reason. Watch it:
Then I got a bit carried away, and generated this only very mildly related sticker for my laptop:
SVG, PDF, thing. Contains 2mm bleed. Original artefact.
So Johannes and I developed a need to create a giant keyboard cat.
(Now on youtube as well.)
I was in the process of setting up to record a screencast of how to use dynpk, when I discovered that my dynpk fu no longer ran on the RHEL machine I was using. Running my program resulted in this friendly output:
FATAL: kernel too old
Turns out this is a message brought to us by glibc. glibc has a configure option '--enable-kernel'. The argument to this option is a Linux version, e.g. 2.6.18. It tells glibc the minimum Linux version that the resulting build has to support. Apparently giving it a more recent kernel version can improve performance, because glibc doesn't have to include as much compatibility cruft.
The current Fedora 14 glibc is built with this argument set to 2.6.32. Running a statically linked binary generated on F14 on a RHEL 5 machine running 2.6.18 is thus no longer possible without some additional work. What additional work is that? Rebuilding the C library with the '--enable-kernel' option set to something compatible with the target system. glibc will still present the same API to the programs we want to run, but will include the additional fudge needed for old-kernel compatibility. We don't have to rebuild all of our programs, just shoehorn our own glibc into the bundle that dynpk makes.
“Rebuild the C library from source, you’re crazy.” I hear you say whilst backing away slowly. It’s not a massive deal. All I have to do is:
% fedpkg co -a glibc
% cd glibc
... change 2.6.32 to 2.6.18 in glibc.spec ...
% fedpkg local
... crunch, bang, whizz, pop ...
And some time later, out spring some RPMs. You’ll find the ones I’m using for F14 on RHEL5 here (don’t install them!).
I've modified dynpk so that it can take uninstalled RPM files and install them into the bundle (it's a rather naive install, but it does the job right now). It can also take some glibc RPMs to build its own wrapper against. Here's how: the build process is modified slightly:
% make GLIBC_RPMS="glibc-2.12.90-18.i386.rpm glibc-static-2.12.90-18.i386.rpm"
... nom nom nom ...
One would then add this to the dynpk configuration:
local_rpms: glibc-2.12.90-18.i386.rpm glibc-common-2.12.90-18.i386.rpm glibc-static-2.12.90-18.i386.rpm
And BANG, the stuff that comes out of dynpk now runs on RHEL5 again.
(Photo copyright flickr user paperbits under CC by-nc-nd.)