Beagle sure has come a long way in terms of maturity over the last few months.

I’ve been getting involved with Beagle’s interaction with dotLucene, the C# port of Apache Lucene, a very powerful text search library. Beagle stores the text content of indexed files within Lucene ‘databases’ and uses Lucene’s impressive search features to query on behalf of the user.

We previously used dotLucene 1.4.3 within Beagle, but I recently upgraded us to 1.9 RC1. Beagle is mostly unaffected by the changes, but there are some bug fixes and optimizations included. Perhaps the biggest win was the result of my extensive testing to make sure the upgrade didn’t break anything – I did identify and fix two bugs, and they were both also present in the 1.4 code.

The first bug was a file descriptor leak in a common code path (inside Beagle code). The other was a fairly significant locking bug, which often caused the locking to have no effect at all. This explains some of the strange behaviour that has cropped up from time to time in the past and that we’ve never been able to pinpoint.

I also looked at some traces through the codepaths. I noticed that dotLucene was dealing with throwing and catching exceptions a hell of a lot – hundreds of exceptions being dealt with while indexing a small range of files. dotLucene was using exception catching where simple if/else combinations would work just fine. Exception handling is expensive as the runtime must jump through hoops keeping track of where to jump to if a certain type of exception occurs, so by greatly reducing the amount of exception handling that takes place, we have a nice small optimization in place.
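To illustrate the kind of change described above, here is a minimal Python sketch. It is not dotLucene code: the function names and the dictionary lookup are made up for illustration. It contrasts signalling a miss by raising and catching an exception with handling it through a plain conditional.

```python
import timeit


def lookup_with_exception(table, key, default=None):
    """Exception-driven: a miss is signalled by raising and catching KeyError."""
    try:
        return table[key]
    except KeyError:
        return default


def lookup_with_check(table, key, default=None):
    """Check-driven: a miss is handled with a plain if/else, nothing is raised."""
    if key in table:
        return table[key]
    return default


if __name__ == "__main__":
    table = {"a": 1}
    # Time a lookup that always misses, so the exception path is always taken.
    exc_time = timeit.timeit(
        lambda: lookup_with_exception(table, "missing"), number=200_000
    )
    chk_time = timeit.timeit(
        lambda: lookup_with_check(table, "missing"), number=200_000
    )
    print(f"exception path: {exc_time:.3f}s, if/else path: {chk_time:.3f}s")
```

Both functions return the same results; only the miss path differs. The actual timings depend on the runtime and on how often the miss occurs, so treat the comparison as indicative only.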

After landing dotLucene 1.9, I’ve now turned some attention to another aspect of Beagle’s data storage mechanism. Beagle uses SQLite to store file attributes when extended attributes are not available, and for its file text cache.

Currently, Beagle only uses SQLite 2.x. Attempting to ‘port’ it to SQLite 3 revealed a problem in our SQLite interaction. You must always query a SQLite database from the same thread on which the connection was originally established. Beagle is multi-threaded and we are using the same connection across multiple threads, which is (apparently) unsafe, and SQLite 3 explicitly checks for this and returns an error if you use the connection outside its original thread.
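The same-thread rule can be demonstrated outside Beagle. The sketch below uses Python's standard sqlite3 module, which by default enforces a similar restriction in the module itself (it is not the check Beagle hit in the C library). The helper names, the `attrs` table and the per-thread connection cache are assumptions made up for illustration, showing one common workaround: give each thread its own connection.

```python
import os
import sqlite3
import tempfile
import threading


def cross_thread_use_fails():
    """Open a connection here, then try to use it from a second thread."""
    conn = sqlite3.connect(":memory:")  # check_same_thread defaults to True
    failures = []

    def worker():
        try:
            conn.execute("SELECT 1")
        except sqlite3.ProgrammingError as exc:
            failures.append(exc)  # rejected: wrong thread

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    conn.close()
    return len(failures) == 1


_local = threading.local()  # holds one connection cache per thread


def get_connection(db_path):
    """Return the calling thread's own connection to db_path (hypothetical helper)."""
    conns = getattr(_local, "conns", None)
    if conns is None:
        conns = _local.conns = {}
    if db_path not in conns:
        conns[db_path] = sqlite3.connect(db_path)
    return conns[db_path]


def count_rows_from_worker(db_path):
    """Write one row on this thread, then count rows from a worker thread."""
    main_conn = get_connection(db_path)
    main_conn.execute("CREATE TABLE IF NOT EXISTS attrs (path TEXT, value TEXT)")
    main_conn.execute("DELETE FROM attrs")
    main_conn.execute("INSERT INTO attrs VALUES (?, ?)", ("/home/user/a.txt", "indexed"))
    main_conn.commit()
    counts = []

    def worker():
        conn = get_connection(db_path)  # a separate connection for this thread
        counts.append(conn.execute("SELECT COUNT(*) FROM attrs").fetchone()[0])
        conn.close()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return counts[0]


if __name__ == "__main__":
    print("cross-thread use rejected:", cross_thread_use_fails())
    db_file = os.path.join(tempfile.mkdtemp(), "attrs.db")
    print("rows seen by worker:", count_rows_from_worker(db_file))
```

With a thread pool like Beagle's, per-thread connections mean more open handles and no shared transaction state, which is part of why the change is non-trivial.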

This creates a non-trivial problem to solve, and is a poor design decision from the SQLite developers. We’re going to stick with SQLite 2.x only, as it seems to work just fine despite sharing the connection across our thread pool. SQLite 3 wouldn’t bring any major benefits to us, and we are unable to use it due to its new explicit thread-checking restriction. Sigh.

2.6.13 is almost ready to go stable in Gentoo, especially now that the evil AMD64 SMP bug has been solved (this also affected the last few kernel releases).

Beagle 0.1.0 is out, the result of much hacking from all directions over the summer. The release announcement pretty much says it all. On a side note, this will be available in Gentoo’s package tree sometime soon.

The Alauda driver is pretty much finished: reading, writing and hotswapping all work for both xD and SmartMedia cards, even simultaneously on two devices. The only problem right now is that a tester has reported that reading 8MB SmartMedia cards does not work. This is difficult to track down as I do not own any cards this small, and the address space is different on this media (but the driver is written so that this should work…)

I’ve been donated an MS keyboard with a fingerprint reader, with the task of getting the fingerprint reader working on Linux. There is a major complication here though: the device appears to simply send an image of the fingerprint to the host computer, but I think the image is encrypted. Can’t be an impossible problem to solve, right?

Regarding the spam attacks on the Gentoo-hosted weblogs, I can globally remove and blacklist spam (based on keyword or URL) very easily, so please just report it to me. If anyone knows of good ways to automatically combat spam in b2evolution, or feels like hacking something up, then please let me know. I’m not too fond of the “type the letters from this image” schemes, but something like an additional confirmation screen (where the user just has to click a button) whenever a comment includes 3 or more URLs would probably confuse the spambots enough to quieten things down.
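The URL-count idea above could be sketched roughly as follows. This is plain Python, not b2evolution code; the regular expression, the function names and the way the threshold is passed are all assumptions for illustration.

```python
import re

# Deliberately simplistic: anything starting with http://, https:// or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Threshold taken from the idea above: 3 or more URLs triggers the extra step.
URL_THRESHOLD = 3


def count_urls(comment):
    """Count URL-like substrings in a comment body."""
    return len(URL_PATTERN.findall(comment))


def needs_confirmation(comment, threshold=URL_THRESHOLD):
    """Return True if the comment should go through the extra confirmation screen."""
    return count_urls(comment) >= threshold


if __name__ == "__main__":
    ordinary = "Nice post! See http://example.com for details."
    spammy = "cheap stuff http://a.example http://b.example http://c.example"
    print(needs_confirmation(ordinary), needs_confirmation(spammy))
```

A real deployment would need to hook this into the comment submission flow and decide how to count links inside HTML markup, which this sketch ignores.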

Update: Missed this earlier, but it looks like the new b2evo release has improved antispam capabilities. Will see how this turns out…

I’ll be offline for a while as of Monday, moving back up to Manchester into a new house to start my 2nd year of university.

entagged-sharp has now been imported into Beagle CVS, replacing the filtering code we had previously. This pretty much closed all the audio-filtering bugs that we had, and added support for more formats (m4a, m4p, xm, sm, it, mod). Hopefully nothing broke at the same time :)

I’ve just committed a new version of the Mono extended attribute bindings, creating a transparent layer so that they work exactly the same on FreeBSD as they do on Linux. On FreeBSD systems, this will make use of FreeBSD’s extattr API, effectively making the use of extended attributes somewhat portable. The interface which Mono exposes is still the Linux xattr API, but the slight differences between xattr and extattr are handled accordingly by the Mono runtime. Thanks to Stephen Bennett (spb from Gentoo) for letting me test things on his FreeBSD install.

Extended attributes are metadata (key/value pairs) that you can apply to files, directories, and symlinks. For example, a program could store the mime-type of a file in an attribute to prevent the need to look it up in future. Extended attributes are nice, because they are stored in/near the file inode, making them cheap to use if you are going to be using the file anyway. Beagle uses them internally and suffers quite a bit when it has to resort to using a traditional database instead.
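As a rough illustration of the key/value model described above, the sketch below stores a MIME type on a file using Python's `os.setxattr` and `os.getxattr` (available on Linux only). It is not Beagle or Mono code. The attribute name `user.mime_type`, the helper names and the in-memory dictionary standing in for the "traditional database" fallback are all assumptions.

```python
import os
import tempfile

# Stand-in for a database fallback, used when xattrs are unavailable.
_fallback_store = {}


def set_attr(path, name, value):
    """Store value (bytes) on path; return which backend was used."""
    try:
        os.setxattr(path, name, value)  # Linux-only API
        return "xattr"
    except (AttributeError, OSError):
        # AttributeError: platform has no os.setxattr.
        # OSError: filesystem lacks xattr support, or permission was denied.
        _fallback_store[(os.path.abspath(path), name)] = value
        return "fallback"


def get_attr(path, name):
    """Read the attribute back, consulting the fallback store if needed."""
    try:
        return os.getxattr(path, name)
    except (AttributeError, OSError):
        return _fallback_store.get((os.path.abspath(path), name))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        backend = set_attr(f.name, "user.mime_type", b"text/plain")
        print(backend, get_attr(f.name, "user.mime_type"))
```

Which backend is used depends on the platform and on whether the filesystem holding the file supports user-namespace attributes.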

I’m looking forward to going out to Stuttgart for GUADEC this weekend. I’m flying out Friday afternoon, immediately after an exam. I’m staying in the youth hostel. If you’re out there, make sure you say hi, but remember that unfortunately I won’t be at the Monday night social (or any of Tuesday) — I have to fly back Monday evening to sit my final exam on Tuesday :(
Here’s a mugshot:

After seeing it requested, I recently ported some of the Beagle code for manipulating extended attributes on files into the core Mono library base. It’s been added in the Mono.Unix.Syscall class, which is set to replace Mono.Posix.Syscall once Mono 1.2 is released and widespread.

This is my first contribution to Mono, with thanks to Jon Pryor (Mono.Unix maintainer) who pointed out that my first attempt at this was incorrect, and kindly went out of his way to describe the possible solutions in depth. Now to write some usage documentation…

Beginning to wonder if GConf for the configuration system was such a good choice after all. I’m getting seemingly random segfaults (yes, segmentation faults from a Mono application, I thought it was odd too...).

Everything had been going according to plan until I started adding live update notification, so that beagled can immediately realise when a config option has been changed. Now (most of the time), when I change the configuration, I get segfaults before control is passed back to beagled. I’ve been advised that this might be a threading issue. I can’t get my head around why this might be the case, but then again, I can’t reproduce it in a simple test app... Hmm…

On another note, I managed to sort out a problem with the timing of one of my exams which means I can now make GUADEC – I’ve got my flight booked :)

Finished the basic implementation of the configuration system. As previously mentioned, this is based on GConf.

It’s far from complete: you have to choose your settings before beagled is running, or restart beagled after changing the configuration for the new settings to take effect. You can’t do much in the way of configuration right now, but you can add more filesystem roots to be indexed.

It includes a new command line tool, beagle-config. For a demo, read the full post.

I’ve posted it to bug 172283 (direct patch link here). There are some implementation details included at the top of the patch.

Read on for an example of beagle-config… It’s quite fancy, as the configuration app itself is quite simple (and will be simplified further): it pulls everything (the list of configuration “sections”, the operations you can perform, their descriptions, the invocation output) from Beagle.Util.Conf.