Wednesday, September 30, 2009

Virtual machines, such as the Java Virtual Machine, are astoundingly complex.

My friend Andrew sent me a link the other day to this great puzzle from the C# world. Similar things arise in Java, although the situation isn't identical, since you have to explicitly invoke the equals() method to get logical equality (the == operator always performs reference equality on objects).
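For instance, a small sketch of the Java behavior just described (the variable names are invented for illustration):

```java
public class EqualityDemo {
    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello");
        System.out.println(a == b);          // false: two distinct object references
        System.out.println(a.equals(b));     // true: same character content
        System.out.println(a == b.intern()); // true: intern() returns the pooled instance
    }
}
```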

Lippert's blog post is great, and very clearly written, even if I have no idea whatsoever about the interview question that he parenthetically notes ("invent a performant algorithm for determining intransitivities in a simplified version of the 'better method' algorithm").

The discussion of string interning in C# reminded me, somewhat, of a fascinating recent performance optimization that Knut Anders Hatlen investigated in Derby: DERBY-3883. In this very specialized case, converting a logical equality test into a more sophisticated "check the hash equality first, then check the logical equality" implementation brought surprising benefits. It's worth reading the issue comments thoroughly for some clear explanations from Knut about why and when the performance optimization is useful.
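A sketch of the general idea (this is not Derby's actual implementation; the method and variable names are invented): if hash codes are cheap to obtain, say because they are cached, then a hash mismatch can reject unequal values without ever doing the character-by-character comparison.

```java
public class HashFirstCompare {
    // Compare cached hash codes first; fall back to the full comparison
    // only when the hashes agree.
    static boolean fastEquals(String x, int cachedHashX, String y, int cachedHashY) {
        if (cachedHashX != cachedHashY) {
            return false; // different hashes guarantee different contents
        }
        return x.equals(y); // hashes match: confirm with the full comparison
    }

    public static void main(String[] args) {
        String s1 = "database";
        String s2 = new String("database"); // same contents, different object
        String s3 = "datacase";             // different contents, different hash
        System.out.println(fastEquals(s1, s1.hashCode(), s2, s2.hashCode())); // true
        System.out.println(fastEquals(s1, s1.hashCode(), s3, s3.hashCode())); // false, by the hash check alone
    }
}
```

The win comes when most comparisons are between unequal values: the common case is then a single integer comparison rather than a full scan.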

The fact that Java performance behaviors can be unexpected and surprising is intriguing. A decade ago, optimizing and improving Java program performance seemed substantially easier than it is nowadays. Partly this is because I find myself working to squeeze further levels of performance out of code bases that have already been thoroughly and extensively examined, but partly it is also because modern VMs and modern hardware architectures have made performance analysis very tricky.

I guess I should be pleased that (most of) the topics raised in this laundry list are familiar to me, but mostly I'm just reminded of how complicated modern systems software is, and how tricky it is to get it just right.

Speaking of GPU parallelism, I happened to see that NVIDIA has released its latest OpenCL libraries as part of the CUDA effort. I'm not sure how I feel about the notion of a new "C-like language", since C itself is hard enough, but I may have a go at learning more about some of the Java-related CUDA APIs.

Tuesday, September 29, 2009

My object-relational library at work has been acting up. We have an internal protocol for handling object modifications, which is designed to maintain the clarity of the object cache: when an object is updated, we move it from the primary cache into a per-transaction lookaside cache for the duration of the modification process; at commit time, the updated objects are restored to the main cache.
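A minimal sketch of that protocol, with invented class and method names (the real library's API is more elaborate than this):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheProtocolSketch {
    final Map<Long, Object> primaryCache = new HashMap<>();
    final Map<Long, Object> txnLookaside = new HashMap<>();

    // When an object is updated, it leaves the primary cache and sits in the
    // per-transaction lookaside cache for the duration of the modification.
    void beginUpdate(long id) {
        Object obj = primaryCache.remove(id);
        if (obj != null) {
            txnLookaside.put(id, obj);
        }
    }

    // At commit time, every updated object is restored to the main cache.
    void commit() {
        primaryCache.putAll(txnLookaside);
        txnLookaside.clear();
    }
}
```

The bug described below is, in effect, a violation of the invariant that commit() restores every object that beginUpdate() removed.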

Unfortunately, once in a blue moon, under quite high load, the cache loses track of an object. Something goes wrong during the commit process, and at least one object fails to be restored to the cache.

Because of the way the cache works, the symptoms of this are almost unnoticeable, as the library code simply reloads the object from the database the next time it is referenced. So for the longest time all we noticed was that we were doing slightly more database access than we expected. Eventually we noticed in our database traces that we were occasionally reading objects which ought to have been found in the cache, and slowly over time I've been trying to track down why that is.

I've been generating theories about how this could be, and constructing experiments or building assertions and scaffolding into the code, to prove or disprove the theories.

So far, all the theories have been disproven.

I call this the "Sherlock Holmes" technique of debugging: "when you have eliminated the impossible, whatever remains, however improbable, must be the truth." Unfortunately, I still have more impossibilities to eliminate, and I haven't yet found the truth.

My friend Walt says: "if you aren't struggling, you aren't learning", and he's absolutely right. It takes a challenging and infuriating problem to drag you deeply enough into the code to reach that breakthrough of discovery and insight.

As a believer in rationality and logic, I know that eventually, this problem too will succumb, and I'll arrive at a full explanation of the behavior I'm seeing.

Tuesday, September 22, 2009

Several years ago I looked into using drag-and-drop in an internal web application that I support at work. I was using YUI, and YUI has considerable drag-and-drop support. I didn't take this very far, but it did what I needed it to do.

HTML 5 is one of the most important languages of the future. Many very significant applications will be written using it. People have been sweating the HTML 5 details for a decade. We want it to be the best language it can be.

Drag-and-drop is a very important UI technique, and it's long past time that web developers be able to use it easily and naturally in simple standards-compliant HTML.

Up to this point, HTML 5 has been a nearly uninterrupted series of success stories, so to have such a big black eye is unfortunate.

Maybe I'm giving too much weight to the naysayers. However, PPK and Hixie are as respected as they come in web-developer-land, so if they say it, it's certainly so.

On the plus side, I'm pleased to have stumbled across this link to Francisco Tolmasky's web log. The 280Slides work is extremely impressive -- subscribed!

"Some people on our team think this is the way that problems will be solved in the future," said Chris Hefele (updated) of The Ensemble, one of the two qualifying teams. "Large problems, with large teams all across the globe collaborating by internet."

The field of so-called "artificial intelligence" has been around for a long time, and goes in and out of favor. Some of the modern techniques, such as neural networks, Bayesian statistics, etc., are now extremely well developed, and obviously quite successful. Perhaps this success will lead to both renewed interest in and renewed respectability for these fields.

Meanwhile, Netflix is so pleased with the results that they're already planning Prize 2!

Now, there is no such thing as Year 0, nor Day 0, nor Month 0, so you might expect this code to throw an exception and refuse the date as illegal. But it doesn't! Instead, it prints:

0002-11-30 15:47:28.0

Now, this is strange in several ways:

Why is the year printed as year 2?

Why are the month and day printed as November 30?
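Here's one way to provoke this behavior (an assumption on my part: the all-zeros string and the valueOf entry point are invented for illustration, and may not match the original snippet):

```java
import java.sql.Timestamp;

public class ZeroDateDemo {
    public static void main(String[] args) {
        // Timestamp.valueOf does no range checking; the zero fields flow into
        // the deprecated, lenient java.util.Date machinery, which normalizes
        // month 0 to December of the previous year, day 0 to the last day of
        // the previous month, and so on.
        Timestamp ts = Timestamp.valueOf("0000-00-00 15:47:28.0");
        System.out.println(ts); // prints 0002-11-30 15:47:28.0
    }
}
```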

Now, it turns out that the java.sql.Timestamp class is built atop java.util.Date, which in turn relies on java.util.Calendar for its date arithmetic. So now we might wonder how this code uses Calendar, and, in particular, what the following code might do:

Wednesday, September 16, 2009

Here's a nice essay about some folks that used Markov chains to write a nifty test data generator.

The article does a good job of explaining the underlying theory behind Markov chains, and also explains some of the implementation details they had to overcome, such as how they used a reference implementation as an oracle, to decide whether their code had passed a particular test or not, and how they had to train their test data generator by feeding it samples of test cases that they already had in place.
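The core idea can be sketched in a few lines of Java: build a table recording which token follows which in the training data, then walk the table randomly to emit a new sequence. (The sample data and all names here are invented; a real generator would train on a large corpus of test cases.)

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class MarkovSketch {
    // Build a bigram chain from the sample, then walk it to emit new text.
    static String generate(String sample, String seed, int steps, long rngSeed) {
        String[] words = sample.split(" ");
        // each word maps to the list of words observed immediately after it
        Map<String, List<String>> chain = new HashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            chain.computeIfAbsent(words[i], k -> new ArrayList<>()).add(words[i + 1]);
        }
        // starting at the seed word, repeatedly pick a random successor
        Random rnd = new Random(rngSeed);
        StringBuilder out = new StringBuilder(seed);
        String current = seed;
        for (int i = 0; i < steps && chain.containsKey(current); i++) {
            List<String> successors = chain.get(current);
            current = successors.get(rnd.nextInt(successors.size()));
            out.append(' ').append(current);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // invented sample: a handful of SQL-ish statements as training data
        String sample = "SELECT A FROM T WHERE A > 1 ; SELECT B FROM T ; DELETE FROM T WHERE B = 2 ;";
        System.out.println(generate(sample, "SELECT", 10, 42));
    }
}
```

The output is plausible-looking but novel input, which is exactly what you want when fuzzing a parser or query engine; the reference implementation then serves as the oracle for whether each generated case behaved correctly.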

I also like the fact that they didn't turn to this technique until they had already developed a large testbed of test cases via other mechanisms:

internally-developed test cases

test cases contributed by customers

test cases constructed by harvesting data off the internet

These sorts of test case generation techniques seem more likely to be fruitful first; it makes sense that once they felt they had exhausted the reach of such test cases, they got more innovative and turned to the Markov chain technique to generate still more test cases.

You're never done testing. When you think you are, consider a new technique.

Tuesday, September 15, 2009

They say it's a poor craftsman who blames his tools, but jhat just about did me in.

I was trying to diagnose a complicated out-of-memory situation in a very busy application server instance. We had run the server with the -XX:+HeapDumpOnOutOfMemoryError flag, so we had a full dump of the heap, and I was trying to figure out what was going wrong. jhat showed me that there were 185,000 instances of class A, and I knew that each instance of class B has two class A pointers, but jhat said that there were only 203 instances of class B. What gives?

I think that I was running into the problem encountered by this user: jhat just doesn't handle multiple class loaders very well. In my case, one class loader did indeed have 185,000 instances of class A, and a different class loader did indeed have 203 instances of class B, but I was interested in following the reference relationships between all the objects in the first class loader, and jhat wasn't helping.

Apart from one small installation snafu (which, to give them credit, is documented), downloading and installing the Eclipse Memory Analyzer (MAT) was a breeze. And when I pointed it at my heap dump, it read the dump and showed me just what I was expecting: 92,500 instances of class B.

Moreover, with just a couple clicks on 'Immediate Dominators' and 'Merge shortest paths to root', I was able to see the ArrayList instance in a background thread which was holding on to almost all of the 92,500 instances. Wonderful tool!

Unfortunately, the documentation seems rather chaotic, being organized primarily as a webcast and several years of blog posts.

But that's OK: for a tool this good, I'll take the time to read the old blog posts and watch the webcast, because having a solid, reliable, and high-performing memory analysis tool is about as good as it gets in the programmer's world of development tools.

P.S. Markus, I know that you pointed me at the M.A.T. over two months ago, in a comment to an earlier post of mine. Thanks very much, and I'm just sorry it took me so long to follow up on your suggestion!

Friday, September 11, 2009

I've been considering using a reader-writer lock in a project I'm doing at the office, but I'm concerned about the fair scheduling problem. So let's define those terms, briefly:

A reader-writer lock is a synchronization primitive which is more sophisticated than a simple mutual-exclusion lock. With a mutual-exclusion lock, only one thread at a time is allowed to enter the critical section. With a reader-writer lock, callers must identify themselves as readers or writers; multiple readers may enter the critical section simultaneously, but a writer may only enter the critical section alone.

Fair scheduling ensures that waiting threads aren't starved of access to the critical section. Many naive implementations of the reader-writer lock suffer from write starvation, in which a writer may wait an arbitrarily long time if readers continue to arrive at the critical section while the writer is waiting.

The documentation for Java's ReentrantReadWriteLock describes its fair mode this way: when constructed as fair, threads contend for entry using an approximately arrival-order policy. When the currently held lock is released, either the longest-waiting single writer thread will be assigned the write lock, or, if there is a group of reader threads waiting longer than all waiting writer threads, that group will be assigned the read lock.
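Java's ReentrantReadWriteLock offers exactly this choice via a constructor flag. A small sketch of the basic semantics:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairRwLockDemo {
    public static void main(String[] args) throws InterruptedException {
        // passing 'true' requests the fair, approximately arrival-order policy
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

        lock.readLock().lock();

        // a second reader can enter the critical section concurrently
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            System.out.println("second reader entered alongside the first");
            lock.readLock().unlock();
        });
        reader.start();
        reader.join();

        // a writer is excluded while any reader holds the lock
        // (note: tryLock() does not honor the fairness setting; it is used
        // here only to probe availability without blocking)
        System.out.println("writer while reader holds lock: " + lock.writeLock().tryLock());

        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        System.out.println("writer after readers leave: " + acquired);
        if (acquired) {
            lock.writeLock().unlock();
        }
    }
}
```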

Design is a process of making choices, removing ambiguity, and getting specific. Which cabinets would we use, where would they go, and how would they fit with the appliances? What sort of doors, handles, shelves, knobs, and racks would go in the cabinets? What material should be used for the countertop (we chose a lovely piece of "Tropical Green" granite), and what size and color should it be?

Several early design decisions were crucial:

Since the granite countertop was the most important part of the upgrade, everything else had to work with it. The counter was to be large and striking, the centerpiece of the new kitchen.

To increase overall storage space, we decided to make the counter extra-large, and install a second set of base cabinets on the "far side" of the counter, with an overhang so people didn't bump their feet on them when sitting at the counter.

We would use IKEA cabinets.

IKEA provides software for kitchen planning, downloadable from their web site. It's quite powerful software, with tools for selecting different cabinets in different styles, arranging and assembling them into position, and presenting you with a 3-D view of the resulting kitchen, with a fly-through feature so you can zoom in-and-out and walk around in your new simulated kitchen. Unfortunately the software crashed routinely on my main computer, but it worked reliably on another computer so we used it there.

There are lots of hidden features within the IKEA planning software, which the IKEA employees are all aware of. So we took advantage of the feature that let us save our in-progress design to the IKEA web site, then we travelled to the store and opened the design at the IKEA planning booth inside the store, and various IKEA employees helped us improve and refine our choices.

We spent many, many hours with that IKEA planning software. I have no idea how hard it would have been to communicate our design desires to our contractors without it.

After we reached a design we were comfortable with, the planning software printed out a complete bill of materials, with every last panel, door, handle, and accessory listed, and gave us an extremely detailed price quote that we were able to use for our final budgeting and adjustment. And when we actually ordered and paid for the cabinets, we simply opened our saved kitchen design at the IKEA planning booth, and the IKEA employee activated the order.

Software design has lots of design tools, too: there are word processors, sketching and storyboarding tools, diagramming tools, rapid prototyping systems, and so forth. I use these tools quite frequently during software design, but I find that a major part of software design is the identification and clarification of constraints. There are usually some major constraints that govern the implementation of software systems, and elaborating and achieving agreement on these constraints is a crucial component of design. Constraints are typically restrictions on the software's behavior, for example:

All data must be tracked in such a way that a historical audit trail, showing all incremental changes over time, can be produced on demand,

The system must be deployable on a low-end laptop with 1 GB of physical memory, for demonstrations and early use, but must scale up to fully utilize 32-way server systems with 64 GB of memory.

As the design starts to emerge, I try to capture the most important choices and decisions and invariants in writing, so that we can all agree on them.

It would be great if there were an IKEA planning tool for server software design; maybe one day I'll see such a thing!

Wednesday, September 9, 2009

The first decision we met in our kitchen remodelling was a requirements question.

Kitchens have a vast number of different aspects:

cabinets

countertops

appliances

lighting

plumbing, electricity, ventilation

flooring

and so forth. The question was: which aspects were most important for our particular kitchen?

It didn't take very long to come to an agreement about this. Our previous kitchen's countertop was a 30-year-old white ceramic tile counter; the tiles were cracked and chipped, and the dark grout lines between them made us feel like the counter was always dirty, even when it was spotlessly clean.

So we decided to put the bulk of our budget into a custom granite countertop, 4 feet wide and 10 feet long, with a deep sunken sink along one edge.

Software projects are constantly running afoul of requirements issues. For one thing, engineers are always eager to get on with the coding, which provides an under-the-surface pressure to finish the requirements discussions prematurely.

For another, software can be quite abstract and nebulous, which makes it hard to have specific discussions about requirements.

And, of course, there is that classic software engineering chestnut: fast, cheap, reliable; choose two. (Or more likely, choose one, if you're even that lucky!)

I'm working on a new project at work. To start, I picked up a bug, and tried to fix it. After several weeks, it's become clear that this "bug" is actually a requirements problem: nobody was sure what a certain feature was intended to do, and so whatever it does, it doesn't really seem to be doing the right thing.

In the end, I'll make that feature do something, but for the next release, I'll try to build a different feature which gets closer (I hope) to what the customer actually wanted.

Tuesday, September 8, 2009

We've been extremely busy over the last two months, remodelling our kitchen. We didn't really intend to remodel the kitchen this summer, but our dishwasher broke and leaked water under the floorboards, ruining the flooring and cabinets and making our decision for us.

It's been interesting work, and it's caused me to think a lot about how remodelling your kitchen has a lot of similarities to running a software project. I'll be writing more about some of these ideas in some future posts, but here's a simple summary:

The Derby storage layer uses the concept of a conglomerate, which is a collection of related data stored in a single file. Each table in your database is stored in a conglomerate, and each index of each table is stored in its own conglomerate, so you have one file per table-or-index in your database.

Note that sometimes indexes are created explicitly, via CREATE INDEX, and sometimes indexes are created automatically, such as when you declare a UNIQUE constraint or a PRIMARY KEY constraint.

Derby tracks each conglomerate separately, by its assigned conglomerate number. The conglomerate number is used to create the file name by converting it to hex and constructing the file name cHHH.dat, where HHH is the conglomerate number in hex.
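The naming rule just described can be sketched directly (a toy illustration, not Derby's actual code):

```java
public class ConglomerateFileName {
    // conglomerate number in hex, wrapped as cHHH.dat
    static String fileName(long conglomerateNumber) {
        return "c" + Long.toHexString(conglomerateNumber) + ".dat";
    }

    public static void main(String[] args) {
        System.out.println(fileName(1280)); // c500.dat
    }
}
```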

Derby records information about each conglomerate in its system catalogs, so you can simply run a query to find out which conglomerate is used by which table:
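For example, a query along these lines (a sketch against Derby's SYS catalogs) lists each table's conglomerates, indexes included:

```sql
SELECT t.TABLENAME, c.CONGLOMERATENUMBER, c.ISINDEX
FROM SYS.SYSCONGLOMERATES c
JOIN SYS.SYSTABLES t ON c.TABLEID = t.TABLEID
ORDER BY t.TABLENAME;
```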

Tuesday, September 1, 2009

I was mouthing off about Perforce administration the other day, and thought this might be worth sharing with others.

Write to Perforce support directly. They're very helpful, even when you haven't yet bought the software. They will be willing to arrange an evaluation license so that you can install and try out the software on your own machines ahead of time.

Make sure you use a Linux or Solaris server; do NOT use a Windows server! Perforce handles Windows clients just fine, but the server does vastly better on Linux.

Perforce loves physical memory. We have 32 GB of memory on our Perforce server, and when we upgraded it from 16 GB a few years back it made a BIG difference. I have read that large sites routinely run Perforce on machines with 256 GB of memory, even more. But you can increase the memory over time; when we started out we had just 4 GB of memory and for the first 18 months that was fine. But memory is cheap.

Perforce doesn't need much special disk or CPU hardware. You'll need enough disk to hold all the data that you want to store in Perforce, plus some extra to take backups, etc. You can add disk space over time without any special procedures needed. Use a RAID configuration if you can to add an extra level of protection. Perforce is VERY CPU efficient. Give the machine a good network hookup if you can as it spends a lot of time transmitting data to the clients. I think we have about 100 GB of disk on our machine.

I'm very attentive to the backups. Currently we back up once a week, although for years we backed up daily. Have a good close look at http://perforce.com/perforce/doc.091/manuals/p4sag/02_backup.html#1043336. All I did was to implement that procedure. The duration of the backup basically depends on the amount of data you have, so it tends to grow over time; when you start out, the backup won't take much time at all, so you can run it routinely. Oh, and test your backup (do a test restore) before you go live! You can also find a bunch of other contributed backup scripts in the Perforce public depot.

Browse through the user papers from the Perforce user conferences at http://www.perforce.com/perforce/conferences/. There are lots of good 'real world experience' presentations there about what various Perforce sites have experienced.