More tea please.

Speeding up libkoki

The result of Chris’s internship was a library called libkoki. This library hasn’t really been released yet properly to the internets — we’re working on it, but various other things have had a higher priority, like shipping Student Robotics stuff etc. You’ll be hearing more about it soon… The quick summary is that libkoki is a library for finding visual markers in images.

So the important thing for Chris during his internship was to make libkoki robust. It is more robust than ARToolkit, which is a popular choice, in a few ways. Its markers feature CRCs in their patterns, which prevent things like windows and other rectangular structures being misidentified as markers. libkoki also uses adaptive thresholding to be able to detect markers in heterogeneous lighting conditions, such as when one side of a marker is better lit than another.

Since much of the focus has been on getting libkoki robust (and I think rightly so) and into a working state (which it is in), not much time has yet gone into getting it to perform well. We’re now shipping it as the main part of the Student Robotics vision API, and so we’re quite interested in getting it to perform well.


So to get things to perform well, one needs to know where to focus one’s efforts. In the past I’ve used gprof, which is OK, but has various limitations — like not being able to profile shared libraries, which is what libkoki is. I’d read some things about Linux perf recently, so I decided to try it out. (I’d link to it, but their wiki seems to have been down for a while…) In conclusion, perf is pretty amazing. Just running `perf top` results in an summary of which functions are currently taking up CPU time for the whole system, including userspace and kernel time. This is pretty amazing. No binary modifications required. As long as debug symbols are available, it’ll tell you function names.

We use libkoki on a BeagleBoard, which is an ARM Cortex-A8 platform. We ship a slightly old 2.6.32 kernel (it does the job, and developer hours are limited…), in which perf did not support ARM platforms. It turns out that our BeagleBoard’s kernel image has OProfile support compiled in. OProfile appears to do similar things to perf, so I used that.

So, I profiled some simple code that used libkoki and got the following profile from oprofile:

root@beagleboard:~# opreport -l pyenv/lib/libkoki.so 
Overflow stats not available
CPU: ARM V7 PMNC, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1955     56.0493  koki_label_image
1018     29.1858  koki_threshold_adaptive
203       5.8200  koki_v4l_YUYV_frame_to_RGB_image

So, much time was spent in koki_label_image. This function splits up a black-and-white thresholded image into connected regions. On closer inspection of some annotated source code that oprofile gave me, it turned out that koki_label_image was spending a lot of time when the code discovered that two regions were in fact the same. After changing the way that libkoki goes about aliasing one region’s label with another, this changed to:

samples  %        symbol name
2326     40.8572  koki_threshold_adaptive
2208     38.7845  koki_label_image
427       7.5004  koki_v4l_YUYV_frame_to_RGB_image

So the 1.5 second processing time on the BeagleBoard has now been reduced to 0.78 seconds processing time. Fun times, but still some way to go!

Posted at 9:01 pm on Saturday 22nd October 2011

Site by Robert Spanton. © 2008 - 2011