The result of Chris’s internship was a library called libkoki. This library hasn’t really been released yet properly to the internets — we’re working on it, but various other things have had a higher priority, like shipping Student Robotics stuff etc. You’ll be hearing more about it soon… The quick summary is that libkoki is a library for finding visual markers in images.
So the important thing for Chris during his internship was to make libkoki robust. It is more robust than ARToolkit, which is a popular choice, in a few ways. Its markers feature CRCs in their patterns, which prevent things like windows and other rectangular structures being misidentified as markers. libkoki also uses adaptive thresholding to be able to detect markers in heterogeneous lighting conditions, such as when one side of a marker is better lit than another.
Since much of the focus has been on getting libkoki robust (and I think rightly so) and into a working state (which it is in), not much time has yet gone into getting it to perform well. We’re now shipping it as the main part of the Student Robotics vision API, and so we’re quite interested in getting it to perform well.
So to get things to perform well, one needs to know where to focus one’s efforts. In the past I’ve used gprof, which is OK, but has various limitations — like not being able to profile shared libraries, which is what libkoki is. I’d read some things about Linux perf recently, so I decided to try it out. (I’d link to it, but their wiki seems to have been down for a while…) In conclusion, perf is pretty amazing. Just running
`perf top` results in an summary of which functions are currently taking up CPU time for the whole system, including userspace and kernel time. This is pretty amazing. No binary modifications required. As long as debug symbols are available, it’ll tell you function names.
We use libkoki on a BeagleBoard, which is an ARM Cortex-A8 platform. We ship a slightly old 2.6.32 kernel (it does the job, and developer hours are limited…), in which perf did not support ARM platforms. It turns out that our BeagleBoard’s kernel image has OProfile support compiled in. OProfile appears to do similar things to perf, so I used that.
So, I profiled some simple code that used libkoki and got the following profile from oprofile:
root@beagleboard:~# opreport -l pyenv/lib/libkoki.so Overflow stats not available CPU: ARM V7 PMNC, speed 0 MHz (estimated) Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 100000 samples % symbol name 1955 56.0493 koki_label_image 1018 29.1858 koki_threshold_adaptive 203 5.8200 koki_v4l_YUYV_frame_to_RGB_image
So, much time was spent in
koki_label_image. This function splits up a black-and-white thresholded image into connected regions. On closer inspection of some annotated source code that oprofile gave me, it turned out that
koki_label_image was spending a lot of time when the code discovered that two regions were in fact the same. After changing the way that libkoki goes about aliasing one region’s label with another, this changed to:
samples % symbol name 2326 40.8572 koki_threshold_adaptive 2208 38.7845 koki_label_image 427 7.5004 koki_v4l_YUYV_frame_to_RGB_image
So the 1.5 second processing time on the BeagleBoard has now been reduced to 0.78 seconds processing time. Fun times, but still some way to go!
Site by Robert Spanton. © 2008 - 2011