Entries tagged with hacking

Winding down ahead of the start of my Christmas leave, I finally found the time to dig into a little side project to extend python distutils to allow me to bundle up various tools which install in non-standard locations — principally sbin and libexec directories.

After reading my way through the various commands included in distutils, I came up with a series of modules to deal with the problem:

a build_sbin.py module that subclasses build_scripts, replacing the normal install location with a specific sbin version

an install_sbin.py module that duplicates much of the functionality of install_scripts.py, picking up the targets from the sbin build directory and installing them into either the sbin directory under the standard installation or the value set in setup.cfg

a dist.py module that subclasses distutils.dist.Distribution to create CustomDistribution, adding the attributes self.sbin and self.libexec to allow these keywords to be passed through from distutils.core.setup() and used by the custom build and install modules.

With all this in place, it was simply a matter of adding the keyword distclass=CustomDistribution to setup() and adding the appropriate libraries to cmdclass — I eventually moved this into CustomDistribution even though I'm pretty sure it's not the right thing to do, because I thought I was going to want to do it every time.

This may all seem a bit baroque, but I've got two significant bits of code that need to be installed in this fashion, and I need something to reduce the complexity of the install process and to provide a single command that can be used to regress the current version to an older one in the event of problems. And finally, after a morning of hackery, I think I've got something that will fulfil my requirements.

Despite having used git on small projects for quite a while, I'd always felt that I'd simply taken my experience from my previous versioning system — I'm a long-time CVS maven — and mapped it onto the corresponding git commands. And although I'd skimmed through enough of the Git Book to make this work, I wasn't entirely sure I understood some of the more esoteric features.

Finally having a project large enough to experiment with — and having built up enough confidence to know how to clone and pull without damaging my master copy — I decided to spend a little time working with the book to firm up my theoretical understanding of what was actually going on.

Almost immediately, I had a revelation: I'd completely misunderstood staging.

Moving from CVS, I'd assumed that git add simply notified git that the specified file was under version control. I'd also assumed, again from CVS, that I either needed to supply the full path to each file I wanted to commit or use git commit -a to push the lot. But I quickly realised that what I'd actually done was to hit on a particular set of options that combined the two steps of moving the file to the staging area and committing it to the repository.

So I'd actually been missing a really useful feature: the ability to stage up copies of particular files ready for a commit whilst still working on others — something that seems extremely well suited to my usual cyclical method of debugging and testing.

I've discovered that much of the truly horrible recent performance of my iMac is not the result of age and resource deprivation as I'd supposed, but rather the consequence of a gradually failing hard disk. Unfortunately, I learnt this under less than ideal conditions: the machine detected problems partway through an upgrade and refused to continue installing the new OS whilst also declining to boot the old installation, leaving me in something of a limbo.

Unable to recover the system with either Disk Utility or a single user mode fsck, I've been reduced to running my system off an SD card while I consider how to back up and move my old data — because, of course, my Time Machine backup turns out to have been almost a year out of date. Annoyingly, using an SD card as my boot volume actually seems to be greatly preferable to running off the internal hard disk: both much quieter and much quicker. Which brings me back to my original realisation: most of my recent performance problems seem to have been down to failing IO operations...

Satisfying day finally sorting out cgroups integration, only briefly interrupted by a fire drill. I can't really complain: I can't remember the last time we were evacuated, the whole thing was nice and efficient, and we were back inside before my postprandial mug of tea could get cold...

Finally getting my act together to sort out LDAP, I got the server going but discovered a couple of problems whilst trying to get Linux to use it for authentication — essentially, some commands, e.g. finger and getent passwd, could look up users perfectly well, but when I tried to obtain information on individual accounts using id or a targeted getent, the lookups failed.

I eventually traced the problems back to a combination of nscd and a lack of indices on the LDAP databases. Shutting down the caching daemon allowed me to use the slapd log events to see what was going on, at which point I was able to see what I needed to add to slapd.conf and to build the new indices with slapindex. With that done, I restarted nscd and found that both forward and reverse lookups worked exactly as expected.
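For the record, the indices were of this general shape — the exact attribute list depends on which lookups NSS performs, so treat this fragment as illustrative rather than a copy of my configuration:

```
# slapd.conf: equality indices for the attributes used by NSS lookups
index uid        eq
index uidNumber  eq
index gidNumber  eq
index memberUid  eq
```

With slapd stopped, running slapindex rebuilds the index files to match the new configuration, after which the daemon can be restarted.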

Exactly? Well, not quite. I enabled LDAP support on the storage appliances, only for the Lustre metadata servers to take themselves down for an unexpected reboot — not something that was mentioned in the documentation, but which is apparently mentioned in a later version of the software...

A series of interesting, if exotic and unnecessary, experiments with ELF files, changing some of the fields with a binary editor in order to fix a run-time linking problem without resorting to either $LD_LIBRARY_PATH or ld.so.conf — madness, yet there is method in it, for most of the time the system is being used for cross-compilation.

Eventually I realised that in order to get NumPy to pick up the MKL LAPACK and BLAS routines, I had to use patchelf on the shared objects to explicitly add the MKL directory to the RPATH in the various .so objects. Not pleasant but better, perhaps, than risking all sorts of compile time confusion by setting the library paths for the host system rather than the target one, simply to allow people to use numpy.
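The incantation is along these lines — both paths here are illustrative rather than the ones I actually used:

```
# add the MKL library directory to the RPATH of each NumPy extension
for so in /path/to/site-packages/numpy/*/*.so; do
    patchelf --set-rpath /opt/intel/mkl/lib/intel64 "$so"
done

# confirm the change on one of the shared objects
patchelf --print-rpath /path/to/site-packages/numpy/core/multiarray.so
```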

Fortunately, a potential saviour is already close at hand: conda, as championed by the Iris people. This essentially automates the steps of downloading, building, and binary editing the rpaths to create a relocatable set of packages, all of which can be neatly contained in their own distribution channels — ideal for those of us who want to use it to install in a shared root file system...

An interesting poser came my way today: given various sets of objects, some of which may be members of other sets, how many distinct collections of sets exist such that every object falls in one collection or another, but never in two? After a bit of pondering and graphing, I discovered that in my case the answer appeared to be one. Which, given that the whole exercise was intended to allow me to partition the objects up into groups that could safely be processed in parallel, wasn't really the answer I wanted...
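For what it's worth, the question reduces to finding the connected components of the sets under a "shares an object" relation. A minimal union-find sketch — the function name and data are mine, purely for illustration:

```python
def independent_groups(sets):
    """Partition a list of (non-empty) sets into groups such that sets
    in different groups share no objects: union-find over the objects."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # link every object in a set to the set's first object
    for s in sets:
        for obj in s:
            parent.setdefault(obj, obj)
        first = next(iter(s))
        for obj in s:
            union(first, obj)

    # sets whose objects share a root belong to the same group
    groups = {}
    for i, s in enumerate(sets):
        root = find(next(iter(s)))
        groups.setdefault(root, []).append(i)
    return list(groups.values())
```

Whenever the result is a single group — as it was for me — no safe partition exists.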

In need of something interesting to do, I spent part of today digging into Trac, trying to work out whether we could use it to replace both our change logging and source code management systems. Although I think it may require a little bit of customisation to get it to interface with the wider organisation's change control software, all the other desired features — including configurable ticket templating — look like they ought to be pretty simple to implement.

A couple of weeks ago, on a bit of a whim, I bought myself a Raspberry Pi as a bit of a toy. Having done this, I promptly noticed a minor snag: my total lack of USB input devices and the absence of HDMI ports on any of my monitors.

After a bit of thought, I realised that I was being a fool and that I could use my long-honed command line skills to set up a headless box in no time at all. Thus, I grabbed a copy of Raspbian, dd'd it to my SD card, booted the Pi off the network, used ssh to install a VNC server, installed and configured Avahi to register the VNC with OS X, and used the Screen Sharing App on my iMac as my VNC client.

My little slug looks like it may have finally breathed its last: it spontaneously powered down sometime yesterday and now, when I attempt to bring it back up, the status LED flashes orange for a second and then it powers off again, which makes me suspect the power supply may be faulty. If I can't get it working again soon, I think I'm going to bite the bullet, move to a Raspberry Pi running Raspbian, and hang the slug's USB disk off it to act as backup for the SD card — I was never able to get the disk to spin down on the slug and I'm looking forward to having a mail server that's a bit quieter.

Unfortunately the slug acted as my mail server — I know, I'm so trad! — so until I get a replacement, I'm stuck with my ISP's web interface to Exchange. Great...

ETA: testing appears to confirm my initial hypothesis, with the power dying more quickly when the network adapter is connected, suggesting that the supply is no longer providing the current required to boot the slug. Replacement power supplies seem to be pretty easy to get hold of, so I'm going to give that a try before carrying out a full-scale migration.

During testing of my current project — a file tree analyser that uses the GPFS API to speed up access to the metadata — we discovered an interesting bug that resulted in some of the file ownerships being incorrectly attributed.

Investigating, I quickly realised that the fault was caused by a race condition in the code: because the dump of the file namespace took a non-trivial amount of time to complete, it was possible for a file to be removed after its name had been retrieved but before its inode data had been read, and for the inode number to be reused by a new file. This meant that when the file names were joined with the file attributes using the inode number as a unique key, the original file path was assigned the attributes — including the ownership — of the new file. Once I understood this, I changed the join to use the inode number and generation ID in place of the inode number alone, only to find that this completely broke the matching process.

Digging deeper into the code and printing out some of the generation numbers, I discovered that the values returned by gpfs_ireaddir() in the form of a gpfs_direntx_t failed to match those returned by gpfs_next_inode() in a gpfs_iattr_t structure. From the sorts of values being returned, I wondered whether the problem might be caused by a mismatch between the variable types and replaced the 32 bit routines with their 64 bit equivalents only to experience exactly the same problem.

Looking more closely, I eventually realised that the lowest 16 bits of the two generation IDs were the same while the highest 16 bits were only set in gpfs_iattr_t.ia_gen. Masking the field appropriately, I was able to combine the generation ID with the inode number to create a compound key that I could use to join both structures in a coherent way, trading one form of inaccuracy — incorrectly assigned ownerships — for a more acceptable one — ignoring files deleted and recreated during the scan.
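Stripped of the GPFS specifics, the join looks something like this — a simplified sketch using plain tuples in place of the real gpfs_direntx_t and gpfs_iattr_t structures:

```python
GEN_MASK = 0xFFFF  # only the low 16 bits of the generation ID agree
                   # between the directory entries and the inode scan

def compound_key(inode, gen):
    # combine the inode number with the masked generation ID
    return (inode, gen & GEN_MASK)

def join_names_attrs(names, attrs):
    """names: (inode, gen, path) tuples from the namespace dump;
    attrs: (inode, ia_gen, owner) tuples from the inode scan.
    Files whose inode was reused mid-scan simply fail to match."""
    attr_index = {compound_key(inode, gen): owner
                  for inode, gen, owner in attrs}
    return {path: attr_index[compound_key(inode, gen)]
            for inode, gen, path in names
            if compound_key(inode, gen) in attr_index}
```

A name whose generation ID no longer matches — i.e. a file deleted and its inode reused during the scan — is silently dropped rather than being given the new file's ownership.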

I've spent a big chunk of today trying to track down a strange deadlock in the parallelised version of my current python script. Eventually, after much digging, I noticed that the hang always occurred when there were 32,767 items in the multiprocessing.Queue, even though the queue had been declared with an unlimited size and the put attempt had not generated a Full exception as might be expected.

I was then able to confirm that a Queue created with q = Queue(2**15) failed with an exception, while one created with q = Queue(2**15-1) worked as expected and raised a Full condition when the limit was reached. Inspecting the source code of multiprocessing/queues.py, I noticed that maxsize defaults to _multiprocessing.SemLock.SEM_VALUE_MAX, which inevitably comes out at 32,767 and explains the whole problem — although it's unfortunate that the code simply hangs rather than raising an exception when the limit is reached.

So given that I know I need a work queue that can hold more than 32K items, it looks like I'm going to have to roll my own class to support a larger number of active semaphores — something that I hope will be as simple as subclassing Queue and overriding the semaphore routines...

ETA: I finally fixed this by creating a wrapper class to add a local, producer-side buffer to hold any items that could not be immediately added to the queue. By attempting a flush of the buffer ahead of every put, and by only accessing the Queue directly when there were no pending items in the buffer, I was able to greatly reduce the number of Full exceptions I had to take.
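A simplified sketch of the wrapper — the names are mine, and the real version also has to deal with flushing and draining at shutdown:

```python
import multiprocessing
from queue import Full

class BufferedQueue:
    """Producer-side wrapper around multiprocessing.Queue: items that
    cannot be queued immediately are held in a local buffer, which is
    flushed ahead of every put."""

    def __init__(self, maxsize=0):
        self._queue = multiprocessing.Queue(maxsize)
        self._buffer = []

    def _flush(self):
        # push as many buffered items as the queue will currently take
        while self._buffer:
            try:
                self._queue.put_nowait(self._buffer[0])
            except Full:
                return
            self._buffer.pop(0)

    def put(self, item):
        self._flush()
        if self._buffer:
            # still backed up: keep ordering by buffering the new item
            self._buffer.append(item)
            return
        try:
            self._queue.put_nowait(item)
        except Full:
            self._buffer.append(item)

    def get(self, *args, **kwargs):
        return self._queue.get(*args, **kwargs)
```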

Today I finally bit the bullet and parallelised my file scanning program. I'd been hoping I could get away with a few tuning tweaks to the core code, but when I ran a test case with 30 million items, I found the execution time was completely dominated by the cost of the name query routine. Fortunately, this turned out to be relatively easy to divide up: all I had to do was replace the current recursive code with a queue and a series of worker tasks, and I was able to drop the run-time to something like a fifth of the serial version. I also looked into parallelising the metadata matching routine — something that would have required a distributed hash to implement — but after realising that the routine scaled with the size of the file system rather than the number of items, I decided not to bother.

In the process of parallelising the code with multiprocessing.Process and Cython, I discovered a weird problem with the argument values: when I called the new parallel routine directly, everything worked as expected; but when I called the routine from a higher level, I got an exception that appeared to suggest that the first item in my Queue was corrupt.

Digging into the problem, I realised that it was caused by my choice of argument in the function definition. In the parallel routine I'd chosen to use char *value to mirror the data type used by the serial version; something that worked when the routine was called directly with a string rather than a python variable. But when the calling value was replaced with a python object I found that the value was replaced by assert_spawning, causing the first worker thread to fail. Once I realised this I was able to fix the problem by changing the type to object value, but it took me a while to work out what was going on, not least because the routines worked when tested separately and only failed when used as a complete entity...

I've discovered a couple of elegant spin-offs from my current work on GPFS. Playing around with the pool structures and calls in the API, I've found it to be much faster than mmdf, largely because the latter goes out to each NSD and queries it for its usage information — something that's not of much interest if you just want to know whether the pools are full or not.

I've also realised that if I combine a couple of other calls, I can both identify the fileset that owns a particular directory and pull out the three levels of quota — user, group, fileset — for it. This makes it trivially easy to summarise all the limits imposed on a particular user in a way that indicates their current level of consumption, how much they have left, and the level at which the resource is being constrained.

And best of all, the code to do all this can be wrapped up in a handful of Cython calls. OK, so the fileset identification code seems to need an inode scan, which seems to mean it can't be run as a normal user, and I haven't yet found a way to dump out a complete list of filesets defined within the file system — although this information must be available, because mmlsfileset and mmrepquota can do it — but it's an extremely handy thing to get out of something originally intended for a completely different purpose...

Spent a moderately successful afternoon profiling and refactoring one of my codes to try to reduce its epic memory footprint and improve its dismal performance. After noticing that almost all the time was being spent performing metadata lookups — the code matches names against metadata information by inode number — I realised that if I replaced the general walk with some targeted seeks, I ought to be able to speed things up and reduce the amount of memory needed to reconcile the two lists. After rejigging the code to merge the seek operations into the name lookups, I was able to reduce the runtime by two-thirds on my smallest test case — which represents a huge saving, given that my largest dataset contains 100+ million items.

And with that triumph behind me, I dashed off to the quay for an evening of climbing and general snarkiness. I broke in my new shoes — G soundly mocked me for wearing women's shoes (the only difference is they're turquoise rather than orange) but in my defence they were the only ones in my size in stock that I could actually get my feet into. And although they're not entirely discomfort free, they don't apply pressure to my poor battered achilles tendons, which counts for a lot given all the time I've spent over recent weeks icing my feet to keep the swelling down.

Through chance we bumped into A, who was hanging around waiting for his buddy to arrive. I completely failed to realise that it might have been polite to offer to belay for him — we were just standing around gossiping at the time — so R made me go over and suggest it, only for A to say that he was happy to wait. In revenge I persuaded R to do her nemesis route which, thankfully, she polished off with no trouble and then graciously accepted my apology for being such a meanie...

Thanks to a casual suggestion from someone at work, I've discovered the delights of Django. Having read through some of the documentation and done the first few tutorials, it has opened my eyes to a world of possibilities. Not only is the process of implementing persistent objects in python pretty easy, but the admin interface is powerful and flexible enough to make it almost trivial to glue the objects into a useful whole.

Working on a big bit of data analysis — distilling something like 600TB down into a summary — it occurred to me that instead of using multiprocessing.Pool to run in parallel on a single node, I could really benefit from something more scalable.

While I'd like to be able to use one of the python MPI libraries, just because it's what I know, I suspect I might be better off using multiprocessing.managers to distribute the work and using the environment LoadLeveler sets up for POE to trigger the right number of tasks on each of the nodes in the job. Which makes me wonder if I mightn't be better off investigating hadoop, with a view to seeing (a) whether it might not be of use to other people and (b) whether we can come up with a way to get it to play nicely with our regular batch system.
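A rough sketch of what the managers approach might look like — the port, authkey, and hostname below are illustrative, and the real thing would need proper error handling:

```python
from multiprocessing.managers import BaseManager
import queue

class WorkManager(BaseManager):
    """Serves a single shared work queue over the network, so that
    tasks started on other nodes can connect and pull work from it."""

_work_queue = queue.Queue()
WorkManager.register('get_work_queue', callable=lambda: _work_queue)

# On the head node (address and authkey are illustrative):
#   manager = WorkManager(address=('', 50000), authkey=b'not-a-secret')
#   manager.start()
#   q = manager.get_work_queue()
#   q.put(work_item)
#
# On each compute node:
#   manager = WorkManager(address=('headnode', 50000),
#                         authkey=b'not-a-secret')
#   manager.connect()
#   q = manager.get_work_queue()
#   item = q.get()
```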

Fortunately I don't think I really need to worry about any of this: my deadline isn't until Tuesday and if my analysis requires more than 2500 CPU hours to complete, then I need to completely rethink my approach...

ETA: Unwilling to risk not having my results in time I added an extra level of decomposition, re-ran my analysis over a much larger number of CPUs, and got my answers back in around half an hour. FTW!

I've been messing around with an attempt to glue python on to an API layer. I didn't think it'd be all that difficult: I had a couple of pieces of sample code, some reasonably good documentation on the API and the cython manual to hand. But when I tried to convert the API header file into cython definitions, I hit an interesting snag: some of the type definitions I needed to use were incompletely defined.

At first I was thrown by the incomplete definitions, assuming that I'd missed an #include somewhere. Once I realised this wasn't the case — I imagine that the structures are only ever defined in a private set of development headers — I tried to work out why the examples worked and why my code failed. Eventually I spotted the pattern: because the incomplete types were only ever used as pointers, their actual contents didn't matter because they were simply being used to reference chunks of memory and, consequently, that all I needed to do to get them working in cython was to define them as void pointers.
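In cython terms, the trick amounts to something like this — the header and function names here are illustrative, not the real API:

```
cdef extern from "example_api.h":
    # the header only ever declares "typedef struct _handle handle_t;"
    # without defining the struct; since the API hands the type around
    # purely by pointer, declaring it as void makes handle_t * a plain
    # void pointer as far as cython is concerned
    ctypedef void handle_t

    handle_t *api_open(const char *path)
    int api_close(handle_t *handle)
```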

It's not a particularly attractive solution to the problem but it seems to work...

I've been having problems getting my windows 7 laptop to talk to my wifi point. All the parameters appear to be correct and the MAC address seems to be in the ACL, but for some reason the thing just won't connect unless I drop the access controls on the router.

So there I was, in the middle of trying to fix the problem once and for all, when my network dropped. I tried to log back in to the router. No dice. I tried to ping my linux box. Again, nothing. I checked my IP address settings and discovered they were completely wrong: the router had clearly crashed and reset itself to its factory defaults. And, of course, being a professional computer scientist, I had no backups, no note of my DSL password, and the only ISP documentation I was able to find dated back to 1997. Oops.

Fortunately, I was able to guess the default password for the admin account, configure the LAN with the right set of addresses, locate the phone number of my ISP in a recent email — which, perhaps unsurprisingly, was the same as 16 years ago — and get an extremely helpful person on the service desk to reset my password for me. After a minor glitch — I mistyped the password he gave me — I got myself back on-line after an outage of around an hour.

I feel like I've learnt a few valuable lessons: take regular backups; keep a hardcopy of important documentation; and ensure passwords are kept somewhere safe. And, as a corollary, consider whether the time has come to get a smartphone...

I've been noodling around with something that needs to read in a configuration. Having sketched out a rough and ready parser, I'm now starting to wonder whether I oughtn't to cut my losses and do it properly with a spot of lex & yacc. It's a difficult balance: the project is small enough that a proper parser feels like overkill; but adding one would make the code more robust and allow me to add some interesting new features.

But in the process of messing around with this thing, I've been struck by just how unnecessary it all seems to be. Thanks to good high-level languages with built-in data structures and powerful bundles of libraries, getting down and dirty with old fashioned strings and linked lists feels positively antediluvian.