tail -f /var/log/messages | grep vegard

Wednesday, August 31, 2016

Having done quite a bit of kernel fuzzing and debugging lately I’ve decided to take one of the very latest crashes and write up the whole process from start to finish as I work through it. As you will see, I'm not very familiar with the site of this particular crash, the block layer. Being familiar with some existing kernel code helps, of course, since you recognise a lot of code patterns, but the kernel is so large that nobody can be familiar with everything and the crashes found by trinity and syzkaller can show up almost anywhere.

Because we’re using KASAN we can’t look at CR2 to find the bad pointer because KASAN triggers before the page fault (or to be completely honest, KASAN tries to access the shadow memory for the bad pointer, which is itself a bad pointer and causes the GPF above).

Let’s look at the “Code:” line to try to find the exact dereference causing the error:

I’m using CONFIG_KASAN_INLINE=y so most of the code above is actually generated by KASAN which makes things a bit harder to read. The movabs with a weird 0xdffff… address is how it generates the address for the shadow memory bytemap and the cmpb that crashed is where it tries to read the value of the shadow byte.

The address is %rdx + %rax and we know that %rax is 0xdffffc0000000000. Let’s look at %rdx in the crash above… RDX: 0000000000000097; yup, that’s a NULL pointer dereference all right.

But the line in question has two pointer dereferences, bdev->bd_disk and bd_disk->queue, and which one is the crash? The lea 0x4b8(%rbx), %rdi is what gives it away, since that gives us the offset into the structure that is being dereferenced (also, NOT coincidentally, %rbx is 0). Let’s use pahole:

1208 is ->queue, so that fits well with what we’re seeing; therefore, bdev->bd_disk must be NULL.

At this point I would go up the stack of function to see if anything sticks out – although unlikely, it’s possible that it’s an “easy” bug where you can tell just from looking at the code in a single function that it sets the pointer to NULL just before calling the function that crashed or something like that.

Probably the most interesting function in the stack trace (at a glance) is iterate_bdevs() in fs/block_dev.c:

I can’t quite put my finger on it, but it looks interesting because it has a bunch of locking in it and it seems to be what’s getting the block device from a given inode. I ran git blame on the file/function in question since that might point to a recent change there, but the most interesting thing is commit 74278da9f7 changing some locking logic. Maybe relevant, maybe not, but let’s keep it in mind.

Remember that bd->bd_disk is NULL. Let’s try to check if ->bd_disk is assigned NULL anywhere:

This by no means necessarily includes the code that set ->bd_disk to NULL in our case (since there could be code that looks like x = NULL; bdev->bd_disk = x; which wouldn’t be found with the regex above), but this is a good start and I’ll look at the functions above just to see if it might be relevant. Actually, for this I’ll just add -W to the git grep above to quickly look at the functions.

The first two and last hits are comparisons so they are uninteresting. The third and fourth ones are part of error paths in __blkdev_get(). That might be interesting if the process that crashed somehow managed to get a reference to the block device just after the NULL assignment (if so, that would probably be a locking bug in either __blkdev_get() or one of the functions in the crash stack trace – OR it might be a bug where the struct block_device * is made visible/reachable before it’s ready). The fifth one is in __blkdev_put(). I’m going to read over __blkdev_get() and __blkdev_put() to figure out what they do and if there’s maybe something going on in either of those.

In all these cases, it seems to me that &bdev->bd_mutex is locked; that’s a good sign. That’s also maybe an indication that we should be taking &bdev->bd_mutex in the other code path, so let’s check if we are. There’s nothing that I can see in any of the functions from inode_to_bdi() and up. Although inode_to_bdi() itself looks interesting, because that’s where the block device pointer comes from; it calls I_BDEV(inode) which returns a struct block_device *. Although if we follow the stack even further up, we see that fdatawrite_one_bdev() in fs/sync.c also knows about a struct block_device *. This by the way appears to be what is called through the function pointer in iterate_bdevs():

1908 func(I_BDEV(inode), arg);

This in turn is called from the sync() system call. In other words, I cannot see any caller that takes &bdev->bd_mutex. There may yet be another mechanism (maybe a lock) intended to prevent somebody from seeing bdev->bd_disk == NULL, but this seems like a strong indication of what the problem might be.

Let’s try to figure out more about ->bd_mutex, maybe there’s some documentation somewhere telling us what it’s supposed to protect. There is this:

__blkdev_get() is called as part of blkdev_get(), which is what is called when you open a block device. In other words, it seems likely that we may have a race between opening/closing a block device and calling sync() – although for the sync() call to reach the block device, we should have some inode open on that block device (since we start out with an inode that is mapped to a block device with I_BDEV(inode)).

Looking at the syzkaller log file, there is a sync() call just before the crash, and I also see references to [sr0] unaligned transfer (and sr0 is a block device, so that seems slightly suspicious):

However, I can’t see any references to any block devices in fs/ramfs, so I think this is unlikely to be it. I do still wonder how opening /dev/sr0 can do anything for us if it doesn’t have a filesystem or even a medium. [Note from the future: block devices are represented as inodes on the “bdev” pseudo-filesystem. Go figure!] Grepping for sr0 in the rest of the syzkaller log shows this bit, which seems to indicate we do in fact have inodes for sr0:

VFS: Dirty inode writeback failed for block device sr0 (err=-5).

Grepping for “Dirty inode writeback failed”, I find bdev_write_inode() in fs/block_dev.c, called only from… __blkdev_put(). It definitely feels like we’re on to something now – maybe a race between sync() and open()/close() for /dev/sr0.

syzkaller comes with some scripts to rerun the programs from a log file. I’m going to try that and see where it gets us – if we can reproduce the crash. I’ll first try to convert the two programs (the one with sync() and the one with the open(/dev/sr0)) to C and compile them. If that doesn’t work, syzkaller also has an option to auto-reproduce based on all the programs in the log file, but that’s likely slower and not always likely to succeed.

I use syz-prog2c and launch the two programs in parallel in a VM, but it doesn’t show anything at all. I switch to syz-repro to see if it can reproduce anything given the log file, but this fails too. I see that there are other sr0-related messages in the kernel log, so there must be a way to open the device without just getting ENOMEDIUM. I do a stat on /dev/sr0 to find the device numbers:

So the device major is 0xb (11 decimal). We can find this in include/uapi/linux/major.h and it gives us:

include/uapi/linux/major.h:#define SCSI_CDROM_MAJOR 11

We see that this is the driver responsible for /dev/sr0:

drivers/scsi/sr.c: rc = register_blkdev(SCSI_CDROM_MAJOR, "sr");

(I could have guessed this as well, but there are so many systems and subsystems and drivers that I often double check just to make sure I’m in the right place.) I look for an open() function and I find two – sr_open() and sr_block_open(). sr_block_open() does cdrom_open() – from drivers/cdrom/cdrom.c – and this has an interesting line:

/* if this was a O_NONBLOCK open and we should honor the flags,
* do a quick open without drive/disc integrity checks. */
cdi->use_count++;
if ((mode & FMODE_NDELAY) && (cdi->options & CDO_USE_FFLAGS)) {
ret = cdi->ops->open(cdi, 1);

So we need to pass O_NONBLOCK to get the device to open. When I add this to the test program from the syzkaller log and run sync() in parallel… ta-da!

This is not exactly the same oops that we saw before, but it’s close enough that it’s very likely to be a related crash. The reproducer is actually taking quite a while to trigger the issue, though. Even though I’ve reduced to two threads/processes executing just a handful of syscalls it still takes nearly half an hour to reproduce in a tight loop. I spend some time playing with the reproducer, trying out different things (read() instead of readahead(), just open()/close() with no reading at all, 2 threads doing sync(), etc.) to see if I can get it to trigger faster. In the end, I find that having many threads doing sync() in parallel seems to be the key to a quick reproducer, on the order of a couple of seconds.

Now that I have a fairly small reproducer it should be a lot easier to figure out the rest. I can add as many printk()s as I need to validate my theory that sync() should be taking the bd_mutex. For cases like this I set up a VM so that I can start the VM and run the reproducer by running a single command. I also actually like to use trace_printk() instead of plain printk() and boot with ftrace_dump_on_oops on the kernel command line – this way, the messages don’t get printed until the crash actually happens (and have a lower probability of interfering with the race itself; printk() goes directly to the console, which is usually pretty slow).

Since __blkdev_put() is the very last line of output before the crash (and I don’t see any other call setting ->bd_disk to NULL in the last few hundred lines or so), there is a very strong indication that this is the problematic assignment. Rerunning this a couple of times shows that it tends to crash with the same symptoms every time.

To get slightly more information about the context in which __blkdev_put() is called in, I apply this patch instead:

One thing that’s a bit surprising to me is that this actually isn’t called directly from close(), but as a delayed work item on a workqueue. But in any case we can tell it comes from close() since fput() is called when closing a file descriptor.

Now that I have a fairly good idea of what’s going wrong, it’s time to focus on the fix. This is almost more difficult than what we’ve done so far because it’s such an open-ended problem. Of course I could add a brand new global spinlock to provide mutual exclusion between sync() and clone(), but that would be a bad solution and the wrong thing to do. Usually the author of the code in question had a specific locking scheme or design in mind and the bug is just due to a small flaw or omission somewhere. In other words, it’s usually not a bug in the general architecture of the code (which might require big changes to fix), but a small bug somewhere in the implementation, which would typically require just a few changed lines to fix. It’s fairly obvious that close() is trying to prevent somebody else from seeing bdev->bd_disk == NULL by wrapping most of the __blkdev_put() code in the ->bdev_mutex. This makes me think that it’s the sync() code path that is missing some locking.

Looking around __blkdev_put() and iterate_bdevs(), another thing that strikes me is that iterate_bdevs() is able to get a reference to a block device which is nevertheless in the process of being destroyed – maybe the real problem is that the block device is being destroyed too soon (while iterate_bdevs() is holding a reference to it). So it’s possible that iterate_bdevs() simply needs to formally take a reference to the block device by bumping its reference count while it does its work.

There is a function called bdgrab() which is supposed to take an extra reference to a block device – but only if you aready have one. Thus, using this would be just as racy, since we’re not already formally holding a reference to it. Another function, bd_acquire() seems to formally acquire a reference through a struct inode *. That seems quite promising. It is using the bdev_lock spinlock to prevent the block device from disappearing. I try this tentative patch:

My reasoning is that the call to bd_acquire() will prevent close() from actually reaching the bits in __blkdev_put() that do the final cleanup (i.e. setting bdev->bd_disk to NULL) and so prevent the crash from happening.

Unfortunately, running the reproducer again shows no change that I can see. It seems that I was wrong about this preventing __blkdev_put() from running: blkdev_close() calls blkdev_put() unconditionally, which calls __blkdev_put() unconditionally.

Another idea might be to remove the block device from the list that iterate_bdevs() is traversing before setting bdev->bd_disk to NULL. However, it seems that this is all handled by the VFS and we can’t really change it just for block devices.

Reading over most of fs/block_dev.c, I decide to fall back to my first (and more obvious) idea: take bd_mutex in iterate_bdevs(). This should be safe since both the s_inode_list_lock and inode->i_lock are dropped before calling the iterate_bdevs() callback function. However, I am still getting the same crash… On second thought, even taking bd_mutex is not enough because bdev->bd_disk will still be NULL when __blkdev_put() releases the mutex. Maybe there’s a condition we can test while holding the mutex that will tell us whether the block device is “useable” or not. We could test ->bd_disk directly, which is what we’re really interested in, but that seems like a derived property and not a real indication of whether the block device has been closed or not; ->bd_holders or ->bd_openers MAY be better candidates.

While digging around trying to figure out whether to check ->bd_disk, ->bd_holders, or ->bd_openers, I came across this comment in one of the functions in the crashing call chain:

In particular, the “This function can only be called if @bdev is opened” requirement seems to be violated in our case.

Taking bdev->bd_mutex and checking bdev->bd_disk actually seems to be a fairly reliable test of whether it’s safe to call filemap_fdatawrite() for the block device inode. The underlying problem here is that sync() is able to get a reference to a struct block_device without having it open as a file. Doing something like this does fix the bug:

What I don’t like about this patch is that it simply skips block devices which we don’t have any open file descriptors for. That seems wrong to me because sync() should do writeback on (and wait for) all devices, not just the ones that we happen to have an open file descriptor for. Imagine if we opened a device, wrote a lot of data to it, closed it, called sync(), and sync() returns. Now we should be guaranteed the data was written, but I’m not sure we are in this case.

Another slightly ugly thing is that we’re now holding a new mutex over a potentially big chunk of code (everything that happens inside filemap_fdatawrite()).

I’m not sure I can do much better in terms of a small patch at the moment, so I will submit this to the linux-block mailing list with a few relevant people on Cc (Jens Axboe for being the block maintainer, Tejun Heo for having written a lot of the code involved according to git blame, Jan Kara for writing iterate_bdevs(), and Al Viro for probably knowing both the block layer and VFS quite well).

Rabin Vincent answered pretty quickly that he already sent a fix for the very same issue. Oh well, at least his patch is quite close to what I came up with and I learned quite a few new things about the kernel.

Tejun Heo also responded that a better fix would probably be to prevent the disk from going away by getting a reference to it. I tried a couple of different patches without much luck. The currently last patch from me in that thread seemed to prevent the crash, but as I only realised a few minutes after sending it: we’re decrementing the reference count without doing anything when it reaches 0! Of course we don’t get a NULL pointer dereference if we never do the cleanup/freeing in the first place…

If you liked this post and you enjoy fixing bugs like this one, you may enjoy working with us in the Ksplice group at Oracle. Ping me at my Oracle email address :-)

Thursday, May 26, 2016

WARNING/DISCLAIMER: Audio programming always carries the risk of damaging your speakers and/or your ears if you make a mistake. Therefore, remember to always turn down the volume completely before and after testing your program. And whatever you do, don't use headphones or earphones. I take no responsibility for damage that may occur as a result of this blog post!

Have you ever wondered how a reverb filter works? I have... and here's what I came up with.

Reverb is the sound effect you commonly get when you make sound inside a room or building, as opposed to when you are outdoors. The stairwell in my old apartment building had an excellent reverb. Most live musicians hate reverb because it muddles the sound they're trying to create and can even throw them off while playing. On the other hand, reverb is very often used (and overused) in the studio vocals because it also has the effect of smoothing out rough edges and imperfections in a recording.

We typically distinguish reverb from echo in that an echo is a single delayed "replay" of the original sound you made. The delay is also typically rather large (think yelling into a distant hill- or mountainside and hearing your HEY! come back a second or more later). In more detail, the two things that distinguish reverb from an echo are:

The reverb inside a room or a hall has a much shorter delay than an echo. The speed of sound is roughly 340 meters/second, so if you're in the middle of a room that is 20 meters by 20 meters, the sound will come back to you (from one wall) after (20 / 2) / 340 = ~0.029 seconds, which is such a short duration of time that we can hardly notice it (by comparison, a 30 FPS video would display each frame for ~0.033 seconds).

After bouncing off one wall, the sound reflects back and reflects off the other wall. It also reflects off the perpendicular walls and any and all objects that are in the room. Even more, the sound has to travel slightly longer to reach the corners of the room (~14 meters instead of 10). All these echoes themselves go on to combine and echo off all the other surfaces in the room until all the energy of the original sound has dissipated.

Intuitively, it should be possible to use multiple echoes at different delays to simulate reverb.

The constructor takes one argument: the number of samples in the buffer, which is exactly how much time we will delay the signal by; when we write a sample to the buffer using the add() function, it will come back after a delay of exactly nr_samples using the get() function. Easy, right?

Since this is an audio filter, we need to be able to read an input signal and write an output signal. For simplicity, I'm going to use stdin and stdout for this -- we will read 8 KiB at a time using read(), process that, and then use write() to output the result. It will look something like this:

(-c means 1 channel; -f s16 means "signed 16-bit" which corresponds to the int16_t type we've used for our buffers; -r 44100 means a sample rate of 44100 samples per second; and ./reverb is the name of our executable.)

So how do we use class FeedbackBuffer to generate the reverb effect?

Remember how I said that reverb is essentially many echoes? Let's add a few of them at the top of main():

The buffer sizes that I've chosen here are somewhat arbitrary (I played with a bunch of different combinations and this sounded okay to me). But I used this as a rough guideline: simulating the 20m-by-20m room at a sample rate of 44100 samples per second means we would need delays roughly on the order of 44100 / (20 / 340) = 2594 samples.

Another thing to keep in mind is that we generally do not want our feedback buffers to be multiples of each other. The reason for this is that it creates a consonance between them and will cause certain frequencies to be amplified much more than others. As an example, if you count from 1 to 500 (and continue again from 1), and you have a friend who counts from 1 to 1000 (and continues again from 1), then you would start out 1-1, 2-2, 3-3, etc. up to 500-500, then you would go 1-501, 2-502, 3-504, etc. up to 500-1000. But then, as you both wrap around, you start at 1-1 again. And your friend will always be on 1 when you are on 1. This has everything to do with periodicity and -- in fact -- prime numbers! If you want to maximise the combined period of two counters, you have to make sure that they are relatively coprime, i.e. that they don't share any common factors. The easiest way to achieve this is to only pick prime numbers to start with, so that's what I did for my feedback buffers above.

Having created the feedback buffers (which each represent one echo of the original sound), it's time to put them to use. The effect I want to create is not simply overlaying echoes at fixed intervals, but to have the echos bounce off each other and feed back into each other. The way we do this is by first combining them into the output signal... (since we have 8 signals to combine including the original one, I give each one a 1/8 weight)

And finally we also write the result back into the buffer. I found that the original signal loses some of its power, so I use a factor 4 gain to bring it roughly back to its original strength; this number is an arbitrary choice by me, I don't have any specific calculations to support it:

buf[j] = 4 * out;

That's it! 88 lines of code is enough to write a very basic reverb filter from first principles. Be careful when you run it, though, even the smallest mistake could cause very loud and unpleasant sounds to be played.

If you play with different buffer sizes or a different number of feedback buffers, let me know if you discover anything interesting :-)

Wednesday, July 11, 2012

The standard C++ pattern for comparing and ordering objects is to overload operator<() for your class. This function is what all the standard containers (std::set, std::priority_queue, etc.) and algorithms (std::sort, std::binary_search, etc.) use for comparing objects.

So if you have a class that you would like to put in a standard container, you have to overload operator<(). Your class usually has more than one member, and you usually want to order your objects first by the value of one member, then by the value of another member, and so on. For example, if you have a class person with string members firstname and lastname, and you would like a "phone book" ordering, you would order by lastname first, then by firstname, e.g.:

This code is not optimal. To see why, consider a case where you have two persons with the same lastname. The first check will fall through (lastname < other.lastname is false); so will the second (other.lastname < lastname is also false). Each check had to loop over the lastname when a single loop should have sufficed.

Fortunately, std::string happens to implement a member function compare() which returns an int. The semantics are similar to C's strcmp(a, b), which also returns an int. The return value is less than 0 if a < b, 0 if they are equal, or greater than 0 if a > b. Using this function, we can implement a much more efficient operator<():

To see the difference in running time, we can implement a simple loop which inserts the same element many times to a multiset:

int main(int argc, char *argv[])
{
std::multiset<person> persons;
for (unsigned int i = 0; i < 1000000; ++i)
persons.insert(person("Thomas", "Anderson with a rather long last name for the sake of the argument"));
}

We exaggerate the number of elements and the length of the lastname in order to amplify the difference. Running this on my laptop, it takes approximately 3.9 seconds to run the original code and 2.9 seconds to run the new version.

This is all well and good, but the actual problem runs a bit deeper than this. What if our code uses other compound data types than strings? Take e.g. std::pair. It is hopelessly broken, precisely because it doesn't implement a compare() function! std::pair only implements operator<() and only expects its elements to implement operator<() too. Have a look at the libstdc++ implementation (simplified):

Can you spot the performance bug? Yep, it calls checks both __x.first < __y.first and __y.first < __x.first in the worst case. This also means that std::pair<std::string, std::string> has exactly the same problem as our first implementation of struct person.

In order to show just how bad this really is, consider the case where we have nested pairs:

Intuitively, we should only need to perform 128 comparisons in the worst case (two comparisons for each pair of corresponding elements from the two objects). If we had two arrays of 64 ints, we would do it like this:

In order to answer the question posed ("How many comparisons between two ints are we actually making when we compare x6 against itself?"), we can simply program it using a wrapper class for int that counts the number of comparisons:

Because std::pair has three comparisons in the worst case, instead of just two (which would have been sufficient), the complexity of comparing nested pairs has just climbed from 2^n (where n is the number of levels of pairs) to 3^n. That's pretty bad!

Part of the solution, as employed by std::string and our improved struct person, is to define the most fundamental comparison between objects as a function that can tell you whether a < b, a == b, or a > b in a single call rather than determine this using two calls. It is clear that the two are semantically equivalent (you can both define operator<() in terms of compare() and the other way around), but one is computationally superior for compound objects.

This is still not the full story, however. We wouldn't get any performance improvement from std::set if we changed it to call compare() instead of operator<() — because it doesn't need to compare both ways! In fact, if we changed it, we would now find that std::set<int> operations take more time, since calling compare() on an int would have to check both a < b and b < a even when std::set only needs to know the result of one of them! (This is of course tighly coupled to the std::set implementation — implemented as a tree, lookups would typically traverse the tree by comparing the element that is searched for with the current node, and update the current node depending on the result of the comparison.)

The real solution, therefore, seems to be to implement bothoperator<() and compare() for all types. std::pair's would look something like this:

Now containers and algorithms can use the most efficient one — std::set would still use operator<() on its elements, but std::pair::operator<() would now use compare() (for its first element), and so on.

The video is a bit poor, mostly because the frame rate is supposed to be 60 FPS, but the Ogg/Theora capture was 25 FPS while the actual gameplay was probably closer to ~10 while doing the capture. Need to fix that. It looks better when playing.

Thursday, September 10, 2009

So I somewhat promised to say something more on my project for the Google Summer of Code.

I participated for the first time, and it was really a great experience. This is the first time I've been paid to do what I would probably have done in either case (because I immensely enjoy it), namely to work on an interesting free software project. So this time it was Jato, a new implementation of the Java Virtual Machine.

Jato is written in C and aims to be a JIT-only compiler/VM. So before the summer started, it wouldn't run a lot of Java programs. It couldn't even run the trivial "Hello World", but more about that later. It could do a few things like (java.lang.)String operations, but that was mostly because it was riding piggyback on another VM, Jam VM, which would load classes and interpret a lot of the library code (i.e. most of the standard Java API implemented by GNU Classpath).

My own project was to replace the parts that were borrowed from Jam VM with completely new, shiny code. Well, in fact, I had already worked on a standalone library called cafebabe, which would parse the Java .class file and present it to C programs using a very thin layer of structs -- the idea was to do as little as possible with the actual data and just make it easily accessible, e.g. so cafebabe doesn't do any verification of the bytecode or make sure that a class is its own superclass.

So we linked this library with Jato, and that worked pretty well. (I believe strongly in modularisation; with the class loader in a separate library, there's no way to do stupid things like making the class loader itself depend on the rest of the VM for functioning properly.) There were problems, of course, the biggest being that after developing in my own private branch for a couple of weeks, the rest of the project with its three other GSoC student participants was moving ahead too quickly for me to keep up; I had to "fix" just about every new mainline commit, since most of the VM data structures (classes, methods, fields) now had different names, different fields, etc.

At around this point, we started to notice that Jam VM wasn't being used for nearly as little as just class loading. In fact, static initializers (the "<clinit>" methods) and their calls were all being interpreted and executed by Jam VM, not the JIT compiler of Jato. And of course, once we started using Jato for those methods which had previously been executed by Jam VM, we found tons of bugs everywhere: The core VM (mostly my code replacing that of Jam VM), including improperly implemented Java semantics, missing bytecode instruction handlers, and the compiler (instruction selection, liveness analysis, register allocation). But it was fun, and extremely educative.

Did you know that a simple System.out.println() will make a whopping 650550 Java-method calls? That's using GNU Classpath for the Java API.

So we had to implement a lot of stuff just to make that work. In fact, we have so much infrastructure that we can even run a few other programs. And thus it was decided that it was time to make a release. The version 0.0.1 announcement:

The "next big" missing feature is garbage collection. I expect that'll be version 0.0.2. In other words, if you're interested in Java, garbage collection, or just compilers in general, you're welcome to join us...

Our team consisted of:

Pekka Enberg; Original creator, project manager and mentor.

Tomek Grabiec; His original project proposal included implementing exception handling, but he ended up somwhat like Ingo Molnar of the Linux kernel; Tomek also implemented a generic radix tree, implemented the basic threading support (java.lang.Thread), rewrote large parts of the register allocator, wrote the VM side of the JNI interface from scratch, and also did a lot of other core-VM work. Tons of other things too, including a lot of debugging.

Arthur Huillet; Floating-point support, register allocator fixes, other missing bytecodes, also a lot of debugging. Various other things.

Eduard-Gabriel Munteanu; Ported Jato mostly to x86-64. A tough task, since the i386 parts were moving at close to light speed.

Wednesday, June 17, 2009

First of all, I should say that on April 22, I went to Denmark to give a talk on kmemcheck. I was invited to DIKU (Datalogisk Institut på Københavns Universitet) by Julia Lawall, who held a workshop on Coccinelle (or, more generally, "finding bugs in operating system software"). It was really nice to be there, not (just) because I got a chance to talk about kmemcheck, but because I learned so much from all the other talks! Feel free to check out my slides.

I had asked my university (the University of Oslo) beforehand to sponsor the trip in exchange for a trip report, but I got no reply to my e-mail. I guess they don't see any value in collaboration with universities abroad.

I had also asked Redpill Linpro (a Norwegian/Swedish open source developer) beforehand to sponsor my work on kmemcheck for the summer, but they apparently weren't interested. At least I got a polite reply. (Instead, I got a stipend from Google for working on Jato for the summer, but more on that in a later post!)