So Johannes and I developed a need to create a giant keyboard cat.
(Now on youtube as well.)
I was in the process of setting up to record a screencast of how to use dynpk, when I discovered that my dynpk fu no longer ran on the RHEL machine I was using. Running my program resulted in this friendly output:
FATAL: kernel too old
Turns out this is a message brought to us by glibc. glibc has a configure option ‘--enable-kernel’. The argument to this option is a Linux version, e.g. 2.6.18. This tells glibc the minimum Linux version that the resulting built stuff has to support. Apparently telling it the name of a more recent kernel version can improve performance because it doesn’t have to include as much compatibility cruft.
The current Fedora 14 glibc is built with this argument set to 2.6.32. Running a statically linked binary generated on F14 on a RHEL 5 machine running 2.6.18 is thus no longer possible without some additional work. What additional work is that? Rebuilding the C library with the ‘--enable-kernel’ option set to something compatible with the target system. glibc will still present the same API to the programs we want to run, but includes the additional fudge for old kernel version compatibility. We don’t have to rebuild all of our programs, just shoehorn our own glibc into the bundle dynpk makes.
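To make the idea of a minimum kernel version concrete, here is a minimal sketch in C that compares a kernel release string against a required version. It is not how glibc implements its own startup check; the function names release_at_least and running_kernel_at_least are made up for illustration.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if a release string such as "2.6.18-194.el5" is at least
 * major.minor.patch, 0 otherwise (or if it cannot be parsed).
 * Hypothetical helper, not part of glibc or dynpk. */
int release_at_least(const char *release, int major, int minor, int patch)
{
    int have[3] = {0, 0, 0};
    const int want[3] = {major, minor, patch};

    /* sscanf stops at the first non-numeric suffix, e.g. "-194.el5". */
    if (sscanf(release, "%d.%d.%d", &have[0], &have[1], &have[2]) < 2)
        return 0;

    /* Compare field by field: the first differing field decides. */
    for (int i = 0; i < 3; i++) {
        if (have[i] != want[i])
            return have[i] > want[i];
    }
    return 1; /* exactly equal */
}

/* Same check against the kernel we are actually running on. */
int running_kernel_at_least(int major, int minor, int patch)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return release_at_least(u.release, major, minor, patch);
}
```

With this, a RHEL 5 style release of 2.6.18 passes a 2.6.18 requirement but fails a 2.6.32 one, which is the mismatch described above.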
“Rebuild the C library from source? You’re crazy,” I hear you say whilst backing away slowly. It’s not a massive deal. All I have to do is:
% fedpkg co -a glibc
% cd glibc
... change 2.6.32 to 2.6.18 in glibc.spec ...
% fedpkg local
... crunch, bang, whizz, pop ...
And some time later, out spring some RPMs. You’ll find the ones I’m using for F14 on RHEL5 here (don’t install them!).
I’ve modified dynpk so that it can take uninstalled RPM files and install them into the bundle (it’s a rather naive install, but it does the job right now). It can also take some glibc RPMs to build its own wrapper against. To do that, the build process is modified slightly:
% make GLIBC_RPMS="glibc-2.12.90-18.i386.rpm glibc-static-2.12.90-18.i386.rpm"
... nom nom nom ...
One would then add this to the dynpk configuration:
local_rpms: glibc-2.12.90-18.i386.rpm glibc-common-2.12.90-18.i386.rpm glibc-static-2.12.90-18.i386.rpm
And BANG, the stuff that comes out of dynpk now runs on RHEL5 again.
(Photo copyright flickr user paperbits under CC by-nc-nd.)
For my PhD, I’ve been using the University of Southampton’s Iridis Compute Cluster, a.k.a. “supercomputer”. I’m using this to run the fitness tests for some genetic algorithm optimisation things I’m working on. Each fitness test takes 10 to 30 seconds, so the more I can run in parallel, the better (up to a point…). Using this cluster, I can run my work much faster. Getting to this point took a lot of beating though.
In what appears to have been some kind of twisted marketing stunt, many places report that the Iridis cluster runs Windows. It doesn’t. If it did, I wouldn’t entertain going near it. It runs Red Hat EL5.
After filling in the paperwork to get an account on this wondrous cluster, I shelled in and went about compiling my work so I could run it on the cluster. I was expecting some packages not to be installed, as I do on most systems that I initially approach. Unfortunately one library that I wanted to use, CGAL, wasn’t installed, nor was it in the repositories. So a request for this to be installed would have involved getting one of the sysadmins (who are already stretched on fixing some killer, and I really do mean killer, GPFS performance issues) to install it from source, and would take far too long.
Option 1: Build from Source
So, I went about building it myself from source. Like a good little source-building monkey, I climbed the dependency tree, building the various other things that I needed for this. Things like cmake and boost… This became painful, as the 32-bit headers for libc weren’t installed. CGAL has some sort of fatal bug on 64-bit systems. Damn. I wasn’t going to build the C library. That’s where I drew the line. I went home, drank some tea, watched some Family Guy, slept.
Option 2: Static linking
With a freshly caffeinated brain, I decided to try static linking. This’d hopefully solve my problems because I could compile all the libraries from my local machine into one giant executable that I’d transport to the remote machine. Sounded good. Then I found that Fedora only provides static libraries for a small set of packages, and has a general dislike of these things. I was a little perturbed by this.
Option 3: Bundling the dynamic linker
I nosed around with the dynamic linker. I found that the dynamic linker can be invoked with the program that it should dynamically link as its argument, e.g.:
% /lib/ld-linux.so.2 /bin/bash
I realised that this could fix the issues I was having. So I wrote a small utility that’d bundle together the dynamic linker from my Fedora system along with the executables that I wanted to run and the shared libraries they required. This was nice. I could take a binary from my Fedora system, pass it through this utility, and get a wrapped binary that I could run on RHEL.
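As a rough sketch of what such a wrapper could look like in C: it builds an argument list that runs the bundled dynamic linker on the bundled program, pointing it at the bundled libraries with ld.so's --library-path option. This is an illustration under assumptions, not dynpk's actual wrapper; the function names and any paths passed in are hypothetical.

```c
#include <stdlib.h>
#include <unistd.h>

/* Build: ld_so --library-path lib_dir program [original args after argv[0]].
 * The returned array borrows the caller's strings; free() it when done.
 * Hypothetical helper, not taken from dynpk. */
char **build_linker_argv(const char *ld_so, const char *lib_dir,
                         const char *program, int argc, char *const argv[])
{
    int extra = argc > 1 ? argc - 1 : 0;
    char **out = malloc((size_t)(4 + extra + 1) * sizeof *out);
    int n = 0;

    if (out == NULL)
        return NULL;

    out[n++] = (char *)ld_so;            /* becomes argv[0] of the linker */
    out[n++] = (char *)"--library-path"; /* search here, not the host's /lib */
    out[n++] = (char *)lib_dir;
    out[n++] = (char *)program;          /* the binary to link and run */
    for (int i = 1; i < argc; i++)
        out[n++] = argv[i];              /* pass the user's arguments through */
    out[n] = NULL;
    return out;
}

/* Replace the current process with the bundled program.
 * Only returns (with -1) if something went wrong. */
int exec_bundled(const char *ld_so, const char *lib_dir,
                 const char *program, int argc, char *const argv[])
{
    char **args = build_linker_argv(ld_so, lib_dir, program, argc, argv);

    if (args == NULL)
        return -1;
    execv(ld_so, args);
    free(args); /* reached only when execv failed */
    return -1;
}
```

A wrapper's main() would call exec_bundled() with paths worked out relative to wherever the bundle was unpacked, so the host's own linker and libraries never get a look in.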
A further nicety that I discovered was the “auditing” functionality of the dynamic linker. This allowed me to write a function that’d get called every time the dynamic linker went to load a new shared library from disk. My auditing library would scream its head off if this library wasn’t from the set within the bundle. This meant I could be sure that the code I was running was the same as the code I was running on my own machine.
There’s more information about the auditing API in the rtld-audit man page.
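For illustration, an audit library along those lines might look roughly like the sketch below. la_version and la_objopen are the real hooks documented in rtld-audit; the DYNPK_BUNDLE_ROOT environment variable and the path_in_bundle helper are assumptions invented for this example and are not taken from dynpk.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if path lies underneath the directory root, 0 otherwise.
 * Hypothetical helper; trailing slashes on root are ignored. */
int path_in_bundle(const char *path, const char *root)
{
    size_t n = strlen(root);

    while (n > 0 && root[n - 1] == '/')
        n--;
    /* Require a '/' right after the prefix so "/b/bundle2" is not
     * mistaken for something inside "/b/bundle". */
    return strncmp(path, root, n) == 0 && path[n] == '/';
}

/* Handshake with the dynamic linker: report the audit interface version. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Called by the dynamic linker each time it loads an object. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    /* Assumed variable name, purely for this sketch. */
    const char *root = getenv("DYNPK_BUNDLE_ROOT");
    const char *name = map->l_name;

    (void)lmid;
    (void)cookie;

    /* Only absolute paths are checked; the main program has an empty name. */
    if (root != NULL && name != NULL && name[0] == '/' &&
        !path_in_bundle(name, root))
        fprintf(stderr, "audit: %s was loaded from outside %s\n", name, root);

    return 0; /* no interest in auditing symbol bindings */
}
```

Built as a shared object and named in the LD_AUDIT environment variable, a library like this gets its la_objopen called for each object the linker maps in.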
I stuck with this solution for a few hours, until I discovered some issues with it. Many programs rely on configuration files and other executables on the system they’re running on. ImageMagick is one of these programs. It often invokes other programs to convert between formats, and has a configuration file that modifies this process too. My statically linked program used ImageMagick’s C API to manipulate some images, so it ended up invoking some stuff from the host system. This turned nasty when RHEL’s buggy librsvg was used to convert an SVG to a PNG :(
Option 4: fakechroot
It was clear at this point that I needed some kind of chroot environment. Unfortunately chroot itself requires superuser rights on the system that it’s run on (for some reason that isn’t completely clear to me). So I looked around and found fakechroot. I extended my bundling utility to support bundling entire RPMs from my system. It would then scan through all the files, find the dynamically linked ones, and replace them with a wrapper script that ensured the correct dynamic linker was used.
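One way to spot the dynamically linked files during such a scan is to look for a PT_INTERP program header, which is where an ELF executable names the dynamic linker it wants. The sketch below does that on a file already read into memory. It is an assumption about how one might do the scan, not a description of dynpk's code, and the function name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the ELF image in buf has a PT_INTERP program header,
 * i.e. it asks to be run via a dynamic linker; 0 otherwise.
 * Assumes the file's byte order matches the host's. */
int elf_wants_interpreter(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0; /* not an ELF file at all */

    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;

        if (len < sizeof eh)
            return 0;
        memcpy(&eh, buf, sizeof eh);
        for (size_t i = 0; i < eh.e_phnum; i++) {
            Elf32_Phdr ph;
            size_t off = (size_t)eh.e_phoff + i * (size_t)eh.e_phentsize;

            if (off > len || len - off < sizeof ph)
                return 0; /* truncated or corrupt header table */
            memcpy(&ph, buf + off, sizeof ph);
            if (ph.p_type == PT_INTERP)
                return 1;
        }
    } else if (buf[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;

        if (len < sizeof eh)
            return 0;
        memcpy(&eh, buf, sizeof eh);
        for (size_t i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            size_t off = (size_t)eh.e_phoff + i * (size_t)eh.e_phentsize;

            if (off > len || len - off < sizeof ph)
                return 0; /* truncated or corrupt header table */
            memcpy(&ph, buf + off, sizeof ph);
            if (ph.p_type == PT_INTERP)
                return 1;
        }
    }
    return 0; /* static binary, or an object with no interpreter */
}
```

Shell scripts and statically linked binaries come back as 0, so they would be left alone rather than wrapped.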
Suddenly, horror. ImageMagick’s convert would segfault whilst it was somewhere inside pixman. I started debugging this, and rapidly discovered that most of the values I wanted to see in pixman’s code were unavailable to me, as gdb reported they were “optimised out”. I installed Fedora 14 in a VM so I could use the later version of gcc and gdb’s combined power to be able to see these variables. I wrangled on what was going on for quite a while, and roped Jeremy into the situation for a few hours too. We found that the dynamic linker was incorrectly returning a NULL pointer when the address of a specific piece of thread-local storage was requested. This was ghastly. I waded around in this situation for a while. Then, through a random stroke of luck, I found that if I didn’t load my audit library it all worked! Something about the rather unique situation I’d created evoked a bug from glibc. Exciting. I don’t have time to work on it, so I’ll have to leave it there and just not use the audit library.
The result: dynpk
The result of all of this is a tool called dynpk (“din-pack”). You provide it with a list of the RPMs from your own system that you’d like to be bundled, baked, and wrapped into something you can transport to a Linux system of your choice. It’s nice to be able to run Fedora’s ipython on top of a RHEL machine, for example!
Instructions on how to get hold of and use this tool can be found here.
This makes me wonder what the MATLAB and LabVIEW-loving license-server junkies do on the cluster when they find a bug…
I can tell another Red Hat EL5 related tale as well. All of the Linux computers in the undergraduate computing lab in ECS run Red Hat EL5. Sounds great. However, EL5 is made of old software. It’s probably fine if you’re using it to perform simple office tasks, but seeing bugs that were fixed years ago still romping around on those desktops is incredibly frustrating.
There’s a student-run project called CSLib that has, for many years now, attempted to solve the lack of software that the undergraduate machines have. Unfortunately, CSLib is never going to match the man power, and hence freedom from bugs and number of packages, that $MAJOR_DISTRO (e.g. Fedora, Ubuntu) achieves. It’s a brave effort, but in order for it to be a catch-all solution, it really needs to use the power of the larger free-software community.
dynpk can provide some relief to people who are in situations where they are essentially forced to use a system that they are not in control of that lacks the software they desire.
It seems to me that a long-term solution for both the supercomputer and public-machine problems is virtual machines. Yes, I know I’m late to the “let’s all wave our hands in the air about virtual machines” party, but I think this invasion needs to continue much further. The compute cluster should run virtual machine images. Amazon’s EC2 already does this. The supercomputing posse should follow suit. The lab machines I speak of above would also massively benefit from allowing each user to have their own VM image that’s transferred to the machine they’re using when they log in.
Fedora 14 came out yesterday. Amongst various run-of-the-mill updates, a couple of things stand out in it for me. The newer gdb and gcc combination allows for a much improved debugging experience that rarely involves one cursing at the words “value optimised out”. I happen to know this is as good as it says on the tin because I resorted to using an alpha of F14 in qemu to attack a particularly grisly situation I got myself into about a month back. systemd also looks quite interesting, and I think it’s possible to switch over to that in F14.
So I waited patiently for Fedora 14 to appear in preupgrade, which downloads all the updates, reboots and installs them without the need for a CD, DVD, USB key, or data-laden Armadillo. It downloaded everything and I rebooted into the updater to be presented with a prompt asking me to insert a driver disk. Not only did it not tell me what driver it was looking for, it also wouldn’t let me proceed with the upgrade without this “driver disk”.
Now I knew that it didn’t genuinely require some kind of magic driver to continue. My machine contains no exotic hardware, and has been running a plain-old Fedora kernel since its inception. I decided to delve somewhat deeper.
With the help of the Anaconda wiki pages, I quickly worked out how to replace the Anaconda instance that preupgrade was running with one that I’d massaged myself. I got Anaconda to run gdbserver, which I then connected to. Unfortunately 99% of the variables I wanted to look at had been optimised away. I then spent a fair while injecting print statements throughout the bits of relevant code.
Eventually, I discovered the problem. “stage1” of Anaconda, whose responsibility is to load the more graphical “stage2”, searches through all the block devices in the machine to find disk drives that are worth inspecting. Part of this “worthiness” test involves it inspecting the size of the disk to determine if it’s above some small size that no disks today could be below. This test reads the contents of the ‘size’ file sysfs provides for the device that’s being examined (e.g. /sys/block/sda/size). This file contains a number. This number is the number of 512-byte blocks that the device has. On my machine this file contains “2930277168”.
This bit of Anaconda then used strtol() on this string to convert it to an integer. An important feature of strtol is that it returns a long int. On most 32-bit systems, this is a 32-bit number. It’s signed, so its maximum value is 2147483647. Note that the value from my hard disk, 2930277168, is larger than 2147483647. So strtol returns LONG_MAX and sets errno to ERANGE, indicating the value’s out of range. Anaconda’s device listing stuff immediately explodes because of this, and then the (somewhat disturbing) state machine decides that we need a driver disk to solve the problem of us not having a device list.
Solution: A patch to Anaconda to use strtoll instead, which returns a long long int. These are at least 64 bits wide, and so there won’t be a problem until disks are measured in zettabytes.
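To see the difference in isolation, here is a small sketch in C that parses a sysfs-style size string with strtoll and checks whether the result would have fitted in a 32-bit long. It is not the actual Anaconda patch; parse_sector_count and fits_in_32bit_long are names made up for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a count of 512-byte sectors, as found in /sys/block/<dev>/size.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 * Hypothetical helper, not Anaconda's code. */
int parse_sector_count(const char *text, long long *out)
{
    char *end;
    long long value;

    errno = 0;
    value = strtoll(text, &end, 10); /* long long is at least 64 bits */

    /* No digits consumed, overflow reported via errno, or a negative size. */
    if (end == text || errno == ERANGE || value < 0)
        return -1;

    *out = value;
    return 0;
}

/* Would this value have survived a trip through a 32-bit signed long? */
int fits_in_32bit_long(long long value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

Fed the string from my disk, parse_sector_count yields 2930277168, which fits_in_32bit_long rejects. Multiplying by 512 gives 1500301910016 bytes, i.e. a 1.5 TB drive.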
Site by Rob Gilton. © 2008 - 2019