Sure, IPv6 is going to save us all from the apocalypse, defeat communism, cure the swine flu, and bake you the most delicious brownies you’ve ever tasted. Someday. But in the meantime, for real people trying to do real work, it’s a fucking nuisance.

As more systems have started shipping with the technology, little compatibility issues continue to crop up. One of the more recurrent problems I’ve encountered is an incompatibility between Java and IPv6 on Linux – specifically Ubuntu. Up until recently, it was quite easy to eliminate the problem by merely blacklisting the appropriate kernel modules, thusly:
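Something along these lines in the modprobe configuration did the trick (the exact file name varies by release – newer Ubuntus want a .conf suffix – so treat this as a sketch):

```shell
# /etc/modprobe.d/blacklist (file name varies by release)
alias net-pf-10 off
blacklist ipv6
```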

However, as of Ubuntu 9.04 (Jaunty), IPv6 support is no longer a module – it’s hard-compiled into the shipping kernels. No big deal, though, because there’s a system control variable that allows you to remove IPv6 support from the kernel.

# echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6

Except that doesn’t work. It seems there was a kernel bug where that setting was just plain broken, and the fix hasn’t shipped with the normal Ubuntu kernels yet. So what is one to do, short of recompiling your own kernel?

Here is a copy-paste from an IM exchange I had with Java earlier:

# Java has entered the chat.

Java: Hey dude, what’s up?

Ardvaark: hey, i’m having a problem getting you to listen to an ipv4 socket when ipv6 is installed on my ubuntu box

Java: Yeah! I totally support IPv6 now! You didn’t even have to do anything because I abstract you from the OS details! Isn’t that great?!

Ardvaark: awesome, i guess, except that it doesn’t work.

Ardvaark: i really need you to just listen on ipv4, because the tivo just doesn’t like galleon on ipv6

Ardvaark: so sit the hell down, shut the hell up, and use ipv4

Ardvaark: pretty please

Java: Okay, geez, no need to get all pissy about it.

Ardvaark: and while you’re at it, could you please stop using like half a gig RAM just for a silly hello world program?

Java: Don’t push your luck.

# Java has left the chat.

And now that we’re back in reality, the magic word is -Djava.net.preferIPv4Stack=true.
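That flag goes on the java command line – something like java -Djava.net.preferIPv4Stack=true -jar whatever.jar. You can also set the property programmatically, as long as it happens before any networking classes load; a minimal sketch:

```java
import java.net.ServerSocket;

public class V4Listener {
    public static void main(String[] args) throws Exception {
        // Must run before any networking classes initialize; passing
        // -Djava.net.preferIPv4Stack=true on the command line is safer.
        System.setProperty("java.net.preferIPv4Stack", "true");

        ServerSocket server = new ServerSocket(0); // binds an IPv4 socket
        System.out.println("listening on port " + server.getLocalPort());
        server.close();
    }
}
```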

I had the privilege of attending Hadoop World 2009 on Friday. It was amazing to meet, listen to, and pick the brains of so many smart people. The quantity of good work being done on this project is simply stunning, but it is equally stunning how much farther there remains to go. Some interesting points for me include:

Yahoo’s Enormous Clusters

Eric Baldeschwieler from Yahoo gave an impressive talk about what they’re doing with Hadoop. Yahoo is running clusters at a simply amazing scale. They have several different clusters, totaling some 86 PB of disk space, but their largest is a 4000-node cluster with 16 PB of disk, 64 TB of RAM, and 32,000 CPU cores. One of the most compelling points they made was that Yahoo’s experiences prove that Hadoop really does scale as designed. If you start with a small grid now, you can be sure that it will scale up – way up.

Eric made it clear that Yahoo uses Hadoop because it so vastly improves the productivity of their engineers. He noted that, though the hardware is commodity, the grid isn’t necessarily a cheaper solution; however, it easily pays for itself through the increased turnaround on problems. In the old days, it was difficult for engineers to try out new ideas, but now you can try out a Big Data idea in a few hours, and see how it goes.

A great example is the search suggestion feature on the front page. Using Hadoop, they cut the time to generate those suggestions from 26 days to 20 minutes. Wow! For the icing on the cake, the code was converted from C++ to Python, and development time went from 2-3 weeks to 2-3 days.

HDFS For Archiving

HDFS hasn’t been used much as an archival system yet, especially not with the time horizons of someplace like my employer. When I asked him about it, Eric told me that the oldest data on Yahoo’s clusters is not much more than a year old. Ironically, they tend to be concerned more about removing data from the servers due to legal mandates and privacy requirements, rather than keeping it around for a Very Long Time. But he sees the need to hold some data for longer periods coming soon, and has promised he’ll be thinking about it.

Facebook, though, is already making moves in this area. They currently “back up” their production HDFS grid using Hive replication to a secondary grid, but they are working on (or already have – it wasn’t quite clear how far along this all was) an “archival cluster” solution. A daemon would scan for least-recently used files and opportunistically move them to a cluster built with more storage-heavy nodes, leaving a symlink stub in place of the file. When a request for that stub file comes in, the daemon intercepts it and begins pulling the data back off the archive grid. This is quite similar to how SAM-QFS works today. I had a chance to speak with Dhruba Borthakur for a bit afterwards, and he had some interesting ideas about modifying the HDFS block scheduler to make it friendly for something like MAID.

Jaesun Han from NexR gave a talk on Terapot, a system for long-term storage and discovery of emails due to legal requirements and litigation. I asked him about whether they were relying on HDFS as their primary storage mechanism, or if they “backed up” to some external solution. He laughed, and said that they weren’t using one now, but would probably get some sort of tape solution in the near future. He also said that he believed HDFS was quite capable of being the sole archival solution, and I believe he was implying that it was fear from the legal and/or management folks that was driving a “back up” solution. At this point, the Cloudera CTO noted that both Yahoo and Facebook had no “back up” solution for HDFS, except for other HDFS clusters. It certainly seems like at least a couple multi-million dollar companies are willing to put their data where their mouth is on the reliability of HDFS.

What’s Coming

There is a tremendous sense that Hadoop has really matured in the last year or so. But it’s also been noted that the APIs are still thrashing a bit, and it’s still awfully Java-centric. Now that the underlying core is pretty solid, it seems like a lot of the work is moving towards making your Hadoop grid accessible to the rest of the company – not just the uber-geek Java coders.

Doug Cutting talked about how they’re working on building some solid, future-proof APIs for 0.21. Included in this is switching the RPC format to Avro, which is intended to solve some of the underlying issues with Thrift and Protocol Buffers while opening up the RPC and data format to a broader class of languages. It’s worth noting that Avro and JSON are pretty easily transcoded to one another. Also, they’ll finally be putting some serious thought into a real authentication and authorization scheme. Yahoo (I think) mentioned Kerberos – let’s hope we get some OpenID up in that joint, too.

There is a sudden push towards making Hadoop accessible via various UIs. Cloudera introduced their Hadoop Desktop, Karmasphere gave a whirlwind tour of their Netbeans-based IDE, and IBM was showing off a spreadsheet metaphor on top of Hadoop called M2 (I can’t find any good links for it). I hadn’t thought about that before, and it seemed so simple it was brilliant; Doug Cutting mentioned the idea, too, so it has some cachet.

One of the new features in the BagIt Library will be multi-threading CPU-intensive bag processing operations, such as bag creation and verification. Modern processors are all multi-core, but because the current version of the BagIt Library is not utilizing those cores, bag operations take longer than they should. The new version of BIL should create and verify bags significantly faster than the old version. Of course, as we add CPUs, we shift the bottleneck to the hard disk and IO bus, but it’s an improvement nonetheless.

Writing proper multi-threaded code is a tricky proposition, though. Threading is a notorious minefield of subtle errors and difficult-to-reproduce bugs. When we turned on multi-threading in our tests, we ran into some interesting issues with the Apache Commons VFS library we use to keep track of file locations. It turns out that VFS is not really designed to be thread-safe. Some recent list traffic seems to indicate that this might be fixed sometime in the future, but it’s certainly not the case now.

Now, we don’t want to lose VFS – it’s a huge boon. Its support for various serialization formats and virtual files makes modeling serialized and holey bags a lot easier. So we had to figure out how to make VFS work cleanly across multiple threads.

The FileSystemManager is the root of one’s access to the VFS API. It does a lot of caching internally, and the child objects coming from its methods often hold links back to each other via the FileSystemManager. If you can isolate a FileSystemManager object per-thread, then you should be good to go.
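Here’s a minimal sketch of that per-thread isolation. A trivial stand-in class takes the place of the real FileSystemManager so the pattern is runnable on its own; in actual code the initialValue() would construct a StandardFileSystemManager and call init() on it:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadManager {
    // Stand-in for org.apache.commons.vfs.FileSystemManager.
    static class Manager {}

    private static final ThreadLocal<Manager> MANAGERS =
        new ThreadLocal<Manager>() {
            @Override protected Manager initialValue() { return new Manager(); }
        };

    // Every caller on the same thread sees the same manager;
    // different threads never share one.
    public static Manager getManager() { return MANAGERS.get(); }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Manager> task = new Callable<Manager>() {
            public Manager call() { return getManager(); }
        };
        Future<Manager> a = pool.submit(task);
        Future<Manager> b = pool.submit(task);
        System.out.println("distinct per thread: " + (a.get() != b.get()));
        pool.shutdown();
    }
}
```

Since the manager never crosses a thread boundary, VFS’s internal caching stops being a liability.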

We’re currently working on a new version of the BagIt Library: adding some new functionality, making some bug fixes, and refactoring the interfaces pretty heavily. If you happen to be one of the people currently using the programmatic interface, the next version will likely break your code. Sorry about that.

The BagIt spec is pretty clear about what makes a bag valid or complete, and it might seem a no-brainer to strictly implement validation based on the spec. Unfortunately, the real world is not so simple. For example, the spec is unambiguous about the required existence of the bagit.txt, but we have real bags on-disk (from before the spec existed) that lack the bag declaration and yet need to be processed. As another example, hidden files are not mentioned at all by the spec, and the current version of the code treats them in an unspecified manner. On Windows, when the bag being validated has been checked out from Subversion, the hidden .svn folders cause unit tests to fail all over the place.

It seems an easy enough feature to add some flags to make the bag processing a bit more lenient. In fact, the checkValid() method already had an overloaded version which took a boolean indicating whether or not to tolerate a missing bagit.txt. I began by creating an enum which contained two flags (TOLERATE_MISSING_DECLARATION and IGNORE_HIDDEN_FILES), and began retrofitting the enum in place of the boolean.
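In rough sketch form, it looks like this – the enum values follow the ones described above, but the surrounding class and the checkValid() signature are simplified stand-ins, not the library’s real interface:

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class BagValidator {
    public enum Leniency { TOLERATE_MISSING_DECLARATION, IGNORE_HIDDEN_FILES }

    private final Set<Leniency> flags;

    public BagValidator(Leniency... flags) {
        this.flags = flags.length == 0
            ? EnumSet.noneOf(Leniency.class)
            : EnumSet.copyOf(Arrays.asList(flags));
    }

    // Toy stand-in for the real validation: a bag is a list of relative
    // file names, and only the two rules discussed above are checked.
    public boolean checkValid(List<String> files) {
        if (!files.contains("bagit.txt")
                && !flags.contains(Leniency.TOLERATE_MISSING_DECLARATION)) {
            return false; // missing bag declaration
        }
        for (String f : files) {
            boolean hidden = f.startsWith(".") || f.contains("/.");
            if (hidden && !flags.contains(Leniency.IGNORE_HIDDEN_FILES)) {
                return false; // unexpected hidden file, e.g. .svn droppings
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> preSpecBag = Arrays.asList("data/a.txt");
        System.out.println(new BagValidator().checkValid(preSpecBag));
        System.out.println(new BagValidator(
            Leniency.TOLERATE_MISSING_DECLARATION).checkValid(preSpecBag));
    }
}
```

A strict validator rejects the pre-spec bag; passing TOLERATE_MISSING_DECLARATION lets it slide.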

I found that, internally, the various validation methods call one another, passing the same parameters over and over. Additionally, the validation methods weren’t using any privileged internal information during processing – only public methods were being called.

I called Justin this morning to discuss refactoring the validation operations using a Strategy pattern. This would allow us to:

Encapsulate the parameters to the algorithm, making the code easier to read and maintain. No more long lists of parameters passed from function call to function call.

Vary the algorithm used for processing based on the needs of the caller.

He had also come to the same conclusion, although driven by a different parameter set. It’s a good sign you’re headed in the right direction when two developers independently hacking on the code come up with the same solution to the same problem.
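The idea in sketch form – all names here are illustrative, and the actual interfaces we settle on will differ:

```java
import java.util.Arrays;
import java.util.List;

public class StrategyDemo {
    // Minimal stand-in for the library's bag abstraction.
    interface Bag {
        List<String> fileNames();
    }

    // The strategy encapsulates its parameters once, instead of the same
    // arguments being threaded through every internal validation call.
    interface ValidationStrategy {
        boolean isValid(Bag bag);
    }

    static class RequireDeclaration implements ValidationStrategy {
        public boolean isValid(Bag bag) {
            return bag.fileNames().contains("bagit.txt");
        }
    }

    static class TolerateMissingDeclaration implements ValidationStrategy {
        public boolean isValid(Bag bag) {
            return true; // lenient: pre-spec bags lack the declaration
        }
    }

    // Callers vary the algorithm just by handing over a different strategy.
    static boolean checkValid(Bag bag, ValidationStrategy strategy) {
        return strategy.isValid(bag);
    }

    public static void main(String[] args) {
        Bag preSpecBag = new Bag() {
            public List<String> fileNames() { return Arrays.asList("data/a.txt"); }
        };
        System.out.println(checkValid(preSpecBag, new RequireDeclaration()));
        System.out.println(checkValid(preSpecBag, new TolerateMissingDeclaration()));
    }
}
```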

I have been working hard over the last several weeks to port our system at work from our x86 Linux development environment to the PowerPC AIX production environment. Fortunately for us, most of the platform differences are well hidden because our code is generally platform independent: Java, XSLT, and JavaScript. There are a few cases where we make calls to a JNI library, but the libraries exist and are supported for the varying platforms, and we have had no trouble with those.

What we unexpectedly had trouble with, though, was our fulltext Lucene index. Weighing in at a massive 55 GB, and only expected to get bigger, the index had been handled by our development environment with no hiccups and consistently speedy search times, which duly impressed us. When I moved it to AIX, however, something went amiss. We started receiving this exception, which the stack trace revealed was coming from Lucene’s index reading code:

java.io.IOException: Unknown format version:-16056063

We confirmed with MD5 hashes that the files were identical in both environments, and we confirmed that the Lucene libraries were all correct. That left us with some obscure platform difference we had to track down.

Using a smaller test index, we confirmed that Lucene could successfully open an index on AIX, vouching for Lucene’s own touted endian agnosticism. We also lifted the file-size ulimits on certain users to rule out that limit unintentionally affecting the ability to read files as well.

Finally, we discovered through some documentation (of all places!) that 32-bit IBM programs are limited to file reads of no more than ~2 GB – that magic 2^31 – 1 limit – and our Java virtual machine was only 32-bit! Simply upgrading to the 64-bit JVM solved the problem.

We hadn’t thought of this because we were using a 32-bit JVM in development, with no problems, but the crucial difference is that it was the Sun JVM. We later installed the 32-bit IBM JVM onto a development environment and confirmed that it could not open our index file there, either. Notably, however, it provided a much more useful error message:

java.io.IOException: Value too large for defined data type
at java.io.RandomAccessFile.length(Native Method)
at org.apache.lucene.store.FSIndexInput.&lt;init&gt;(FSDirectory.java:440)

Rather than throwing an IOException from the java.io code, the IBM JVM on AIX simply returned bogus data. This caused Lucene’s index reader to throw an exception because, coincidentally, the number it was trying to read at that magic signed integer limit was expected to be a file version number. It was expecting to see -1, but instead got -16056063.

And so everything seems to be running swimmingly now. The moral of the story is: Beware of big files on 32-bit machines.
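For anyone wanting to avoid the same trap, here’s a quick check of which data model a JVM is running. The sun.arch.data.model property is set by both the Sun and IBM JVMs, but it isn’t guaranteed everywhere, hence the crude fallback:

```java
public class Bitness {
    public static String dataModel() {
        // Set by Sun and IBM JVMs; may be absent on other vendors' JVMs.
        String bits = System.getProperty("sun.arch.data.model");
        if (bits != null) {
            return bits;
        }
        // Fall back to a rough guess from the architecture name.
        String arch = System.getProperty("os.arch", "unknown");
        return arch.contains("64") ? "64" : "32";
    }

    public static void main(String[] args) {
        System.out.println(dataModel() + "-bit JVM");
    }
}
```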

Despite the author’s numerous warnings that this syntax is arbitrary and just for exemplifying the general idea, I have to say this syntax sucks. How about we just reuse the recently introduced support for annotations and add a Published attribute to Java? No need to change the language, no need to add another arbitrary source file. No mess, no fuss.

I’ve got an import statement that references a deprecated class. However, it’s old and tested, and I don’t want to change it. For a long time, I ignored the nagging warnings about such problems because there wasn’t anything I could do about them.

Along comes Java 1.5 with support for annotations (that’s attributes for you .NET folk). In particular, the SuppressWarnings attribute provides the ability to selectively disable warnings – precisely what I want.

And everything works great, except for that pesky import statement. It still tosses a warning at me, and I can’t get it to go away! Putting the attribute into the package-info.java file doesn’t work, either, because the SuppressWarnings attribute isn’t declared as a package annotation target.

For now, I’ve worked around the problem by removing the import and fully qualifying the class name in my code, but that’s lame. If anybody knows the “right” way to do this, please contact me.
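For the curious, the workaround looks like this. OldApi is a stand-in for the real deprecated class (which in real code lives in another package, so dropping its import is what kills the import-statement warning):

```java
public class Workaround {
    @Deprecated
    static class OldApi { // stand-in for the deprecated class I'd rather not import
        String greet() { return "hello"; }
    }

    // Suppresses the deprecation warning at the use site; with no import
    // statement left, there's nothing else for the compiler to nag about.
    @SuppressWarnings("deprecation")
    public static String useIt() {
        Workaround.OldApi old = new Workaround.OldApi(); // fully qualified
        return old.greet();
    }

    public static void main(String[] args) {
        System.out.println(useIt());
    }
}
```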