This was a fun one. It's my second Star Battle puzzle, and it's
definitely more approachable than my previous dual-grid Star
Duel. (Incidentally, that giant Star Duel was noted as one of the
best object-placement puzzles of 2014 at GM Puzzles. I'm happy to see
that people liked it.)

I got the idea for today's Star Battle when reading that best-of
post. One of the other puzzles noted there was this lovely
9-pentomino Star
Battle
by Zoltán Horváth. The post mentioned that another designer, Jiří
Hrdina, had independently designed a similar 9-pentomino Star
Battle. (I can't link directly to that one since it's not freely
available on the web; it's contained in The Art of Puzzles: Star
Battle e-book, which is available for sale.) Then, in the comments,
Matúš Demiger mentioned
that he had also independently constructed a third 9-pentomino Star
Battle
as part of the 2014 24-Hour Puzzle
Competition.

So at least three puzzle authors all happened to construct
pentomino-themed Star Battle puzzles last year, and all three happened
to choose the fairly-standard 10x10 grid size, (forcing the puzzle to
include only 9 of the 12 possible pentomino shapes). Matúš's comment
was "I hope someone will try to include all twelve pentominoes" and I
couldn't resist the challenge.

And it was an interesting challenge since putting 12 pentominoes into
a Star Battle requires a 13x13 grid. (In a Star Battle, an n-by-n
grid is divided into exactly n regions, so 12 pentomino regions plus
the one leftover background region force a 13x13 grid.) That's not
too much of a problem in and of itself, (my Star Duel used a 15x15
grid, for example). But with this particular theme, as the puzzle
grows the total area of the pentominoes grows linearly, while the
total puzzle area grows quadratically. In my final puzzle there are 12
regions of size 5 and then one giant outer region with 109 cells, yet
each of the 13 regions contains only two stars. So the real challenge
here was to ensure that when solving the puzzle the stars in the huge
region didn't get determined early, (causing a bunch of cells to be
wasted and forcing the user to tediously mark off all of the unused
cells).
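If you want to double-check that arithmetic, here's the one-liner
version, (just my own sanity check in Python, not part of any puzzle
tooling):

# A two-star Star Battle on an n-by-n grid has exactly n regions.
n = 12 + 1                  # 12 pentomino regions plus the outer region
total_cells = n * n         # 169 cells in the 13x13 grid
pentomino_cells = 12 * 5    # 60 cells covered by the pentominoes
print(total_cells - pentomino_cells)   # prints 109, the outer region's size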

Luckily, I think it just worked out. In all of my test-solving, the
stars in the large region are among the very last determined.

Anyway, give this puzzle a try if you'd like. The large number of tiny
regions means there are a lot of easy steps early on in the
puzzle. But there are still a few more interesting deductions in store
later on, (though nothing ever all that difficult in this
Wednesday-level puzzle).

I should also thank Thomas Snyder for his editorial help. He found and
fixed a small ambiguity in the first version of this puzzle that I
submitted. I've since coded up a deductive Star Battle solver just to
be able to verify uniqueness for puzzles I construct. But maybe I'll
talk about that in a future post.

PS. If anyone is following closely, I neglected to mention my fourth
puzzle
when it was published a few weeks ago on a Friday. It's a Pentominous
puzzle with almost no clues other than 12 F's, and I named it "F is
for Fiendish". The title is a warning, and I think it deserves it. I
think this is the hardest puzzle I've published so far. My Star Duel
earned a longer estimated "Expert" time, (37 minutes compared to 20
minutes for "F is for Fiendish"), but that's mostly because Star Duel
is so much bigger, (2 15x15 grids compared to a single 10x10
grid). The deductions required here are definitely harder to find.

My sister is kind enough to do some of the initial testing of several
of my puzzles. After I handed her a copy of "F is for Fiendish" one
evening, she called me later that night to ask, "Can you email me a
fresh copy of that puzzle? My husband and I have been trying it over
and over and the paper is all disintegrating after so much erasing." That's a beautiful thing for a puzzle designer to hear---that someone is terribly frustrated with a puzzle, but still determined to stick with it and keep trying.

So if you want a challenge, give "F is for
Fiendish"
a try. There are logical steps that can be found at every point to
solve the puzzle without needing any guessing or back-tracking, (but
they may not be easy to find). Good luck, and happy puzzling!

This puzzle is a Pentominous puzzle, a really fun style which I
believe was invented by Grant Fikes, a prolific puzzle contributor to
GM Puzzles. To solve the puzzle, you simply need to divide the 10x10
grid into 20 pentominoes such that each given letter in the grid is
contained in a pentomino of that shape, and no two pentominoes of the
same shape share an edge.
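If it helps to make "same shape" precise, here's a small Python
sketch of my own, (purely illustrative, not from any published
solver), that reduces a pentomino's five cells to a canonical form so
two placements can be compared regardless of rotation or reflection:

# Canonical form of a pentomino: the lexicographically smallest of its
# 8 rotations/reflections, translated so its minimum coordinates are 0.
def canonical(cells):
    shapes = []
    for flip in (False, True):
        q = [(x, -y) for x, y in cells] if flip else list(cells)
        for _ in range(4):
            q = [(y, -x) for x, y in q]   # rotate 90 degrees
            mx = min(x for x, y in q)
            my = min(y for x, y in q)
            shapes.append(tuple(sorted((x - mx, y - my) for x, y in q)))
    return min(shapes)

# An L-pentomino and one of its rotated reflections compare equal:
a = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1)]
b = [(0, 1), (0, 0), (1, 0), (2, 0), (3, 0)]
print(canonical(a) == canonical(b))   # prints True

Two placements count as the same pentomino exactly when their
canonical forms match, which is what the "no two pentominoes of the
same shape" rule is comparing.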

My puzzle today is a Monday-level difficulty, meaning it's not too
tough. And I think this puzzle in particular is a great introduction
to Pentominous. When I first started constructing logic puzzles, I had
a hard time making good and fun puzzles at an introductory level. Part
of the problem is that any puzzle I make is usually harder than I
expect it will be, (I think that's common for many beginning puzzle
designers), so I'd often force harder deductions into the puzzle
thinking that was required to make the puzzle interesting.

But this puzzle is probably the first time I made what I think is a
really successful introductory puzzle. I've given this puzzle to a lot
of people, and they end up really liking the way the puzzle just sort
of walks you through a series of forced pentomino placements. I've
often heard feedback along the lines of, "Well, I still say that I
can't do puzzles like this, but that one was actually a lot of fun,"
and that's really rewarding to hear.

So take a look if you're interested and let me know how it goes for
you.

Oh, and I should comment on the words I ended up including in the
puzzle's presentation. First, it's really tempting to include words in
Pentominous puzzles. Second, it's really hard to pull them off. As a
case in point, here is an example puzzle by Tapio Saarinen: he
started with four English words, but by the time he was done, only
one remained an actual word.

For today's puzzle, I set myself up for a bigger challenge, because I
wanted to get a meaningful phrase, (but I only have the letters
FILNPTUVWXYZ to work with---quite limited). The phrase I chose,
"Intuit, Nix, Fix", is specifically the strategy I don't recommend
using to solve puzzles. That is, you can just make a guess (intuit),
then work with the puzzle for a while until you encounter a
contradiction, then cancel your guess and try again (nix, fix). I
suppose "guess and check" would have been an easier way to say that,
but you can't spell that with the twelve Pentomino letters.
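For what it's worth, here's the trivial Python check, (an
after-the-fact illustration of mine, not a tool I actually used), for
whether a candidate phrase survives the restricted alphabet:

PENTOMINO_LETTERS = set("FILNPTUVWXYZ")

def spellable(phrase):
    letters = {c for c in phrase.upper() if c.isalpha()}
    return letters <= PENTOMINO_LETTERS

print(spellable("Intuit, Nix, Fix"))   # prints True
print(spellable("guess and check"))    # prints False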

Anyway, a lot of people have been trained to use guess-and-check as a
strategy because they've been solving puzzles generated randomly by
computer programs, and many of these puzzles don't afford any other
strategy. A huge benefit of solving good, hand-made puzzles like those
published at gmpuzzles.com is that you don't need that
strategy. Instead, there's a nice logical path from start to finish. So
resist the temptation to guess, and go give this puzzle a try.

My boys just finished this year's Christmas puzzle hunt, (known in our
family as "The Code"). Read below for a summary of how things went, or
check out the actual puzzles if
you want to solve them yourself or just wonder at the amount of work
my poor boys have to go through before they get any Christmas
presents. I've also written up a brief history of
the Christmas code linking to all the puzzles from previous years.

I'm really pleased with how things went this year. I scaled the
difficulty back compared to last year, and that was good for
everyone's sanity. It's easy to see the difference in just the
numbers. Last year's hunt had 24 puzzles, 5 metapuzzles, and 1 final
metametapuzzle. This year there were only 9 puzzles and 2 small
metapuzzles. I was nervous the boys might think I did too little, but
I think I actually got the difficulty just right this year.

They did solve it a few days before Christmas this year, but that's a
good victory for them. Last year, there was a tough Battleships
metapuzzle and the deciphering of a Playfair cipher that all came
together only late at night on Christmas Eve.

Of course, the most important feedback comes from the boys
themselves. There was a priceless moment last night as one-by-one, the
boys in turn had the final flash of insight needed to solve the last
clue. There's nothing better for a puzzle designer than getting to
watch that "ah-ha" moment hit with its explosion of laughter. And that
laughter is always mixed---appreciation for the disguise of the clue
and a bit of disgust at not being able to see through it all
earlier. And last night, it was a perfect cascade of these moments
from one boy to the next. After the first one finally got it, just
knowing it was possible was all the others needed to reach the same
insight themselves. It was perfect.

This morning the boys reported that this year's code was the "best one
yet". I think I've heard that three years in a row now. That's what
gives me the motivation to do all this again. I guess I had better
start planning now if I want to improve again next year!

As one might hope, I believe my puzzle-designing skills are improving
with practice. So hopefully the more recent puzzles above show some of
my better work.

This tradition has become sufficiently involved that the history of
The Code really deserves to be captured. Here's the story of how it
all started, and how it's changed over the years.

On Keeping Christmas a Surprise: The Invention of The Code

One year, my wife and I were lamenting to some friends that Christmas
gifts were losing their mystery at our house. Our boys were getting
good enough at poking, prodding, and predicting their presents that
there weren't a lot of surprises left for Christmas morning. Our
friends had had similar problems with their children, and we ended up
comparing notes on things we had done to keep gifts secret.

One obvious tactic is to disguise a gift while wrapping it. This
technique has already been developed into an art form by our boys. They
love giving each other a large, wrapped gift which, when opened,
reveals a smaller, wrapped gift inside. Then, like a set of matryoshka
dolls, each layer reveals a smaller and smaller gift. In our family,
as soon as this unwrapping process is started, everyone knows what the
end result will be. The final few layers are so tiny that there's no
room for boxes anymore, just many, many layers of different colored
gift wrap and excessive amounts of tape. These last layers take a
tremendous amount of effort to get through. And the reward for all of
this hard work of unwrapping? Inevitably, it's a single penny. We
don't recall who wrapped the first penny, but it's now a standing
tradition where the boys try to outdo each other each year by
disguising a penny in the most elaborate way possible.

Clearly, we're not willing to go to such heroic efforts to disguise
every gift we wrap. We had experimented with waiting to put the
Christmas gifts out under the tree until just before Christmas, but
that wasn't much fun. It robbed the boys of a lot of the fun
anticipation of seeing the gifts under the tree throughout December.

Our friends shared an idea that had worked well in their family. What
they did was not write names on any of the gifts, but instead wrote a
single number on each gift indicating the recipient. And the numbers
were chosen in a new and unpredictable way
each year. For example, one year my friend (who happens to be a
dentist) wrote the number of teeth that each child had lost on their
gifts.

We thought this was a great idea, and we decided to give it a try. And
none of our Christmas gifts have had any of the boys' names on them
since, (though they've had just about everything else possible). Read
on for a rundown of what we have done for the code for each year.

The Early Years: Locking the Code up Tight

For the first year, we adopted a very simple strategy. We wrapped each
boy's gifts in a unique color of wrapping paper. We didn't explain
anything, and when the boys asked why none of the gifts had names on
them, we were evasive. Then, on Christmas morning, we told them which
color of gift wrap corresponded to each boy.

The second year, the boys correctly predicted that we wouldn't use the
same technique, and they immediately set about trying to crack "the
code" for the Christmas presents. They started convening secret
meetings to discuss theories. We discovered some of the notes from one
of the meetings and found that they had constructed an entire table
mapping out the following variables for each gift under the tree:
Wrapping paper color, Number of bows, Colors of bows, Picture on the
tag, Names on the tag. You see, this year, I did put tags on the
presents, but instead of their names I wrote the names of characters
from nonsense poetry: "To: The cat; From: The fiddle", "To: The cow;
From: The moon", "To: The dish; From: The spoon", etc.

Their table was fairly effective. They were able to eliminate many
variables that couldn't work. If there were more colors of wrapping
paper than children, they assumed they could ignore that variable. If
there were only one or two bows per present, they assumed bows
couldn't identify one of the four boys. Fortunately for me, they
didn't crack the code that year, but only because their table hadn't
accounted for the color of ribbon on each present.

By the third year, I realized that I had to put more thought into
designing the code. Here was an active group of intelligent agents
determined to find the information I was trying to hide. I was careful
this time to imagine every variable they could track, and ensure that
each variable appeared with four different values, evenly distributed
among the presents. I also ensured there was no correlation between
any of the variables. Grouping the presents by wrapping-paper color,
bows, ribbons, or anything else would always yield four entirely
different sets of presents. (So yes, this meant that now my wife and I
needed to consult our own table before we could know how to correctly
wrap each gift).

Then, for the actual information, I chose two variables I thought they
would never track. We carefully folded the flaps on each gift either
in the same direction on each side or in opposite directions on each
side. Then we either folded all cut edges away to leave clean creases,
or left the raw edges exposed. This gave us four sets of presents:
Matching flaps creased, Matching flaps raw, Opposite flaps creased,
and Opposite flaps raw.

Of course, the boys never even looked at the flaps, and all of their
attempts to find logical groups were foiled. When I revealed the
answer on Christmas morning, I was smug, thinking I had "won" by
creating a code they couldn't crack. Of course, the boys called me out
saying that what I had done was totally unfair. (How could it have
been unfair, I thought? This was a game that I had invented myself!)
But the boys were totally right. My problem was thinking that this was
a game, when in fact, this should have been a puzzle.

Years later, I read the book "Puzzle Craft" by Mike Selinker and
Thomas Snyder. In the introduction, Mike Selinker describes the lesson
that my boys were teaching me. He says that a game is a contest
between two equal sides where the outcome is in doubt, (either side
has a roughly equal chance to win). In contrast, a puzzle is a contest
between two wildly-unequal sides where the outcome is never in doubt,
(the weaker side will always win). When I treated The Code as a game, it
wasn't fun for any of us. The odds were stacked too much in my
favor---I could always create an impossible-to-solve puzzle, but who
has fun with a puzzle that's impossible to solve? That just leads to
frustration and giving up. A good puzzle, in contrast, has plenty of
frustration, but enough fun and reward that the solvers stick it
through to the end. So I needed to learn to create a puzzle.

The Code Today: The Code as a Puzzle

My boys taught me that the code needed to be fair. That is, they
needed to be given enough information to be able to solve the
code. There could still be lots of deception and trickery, but they
needed to know that with perseverance, patience, and creativity they
could actually find the answer. They also gave me a second ground
rule: The solution to the code must be relevant and interesting. A
final answer of "you get the presents with the matching flaps with the
raw edges" doesn't cut it. Instead, the solution to the code should
actually point to the boys themselves. Basically, they were telling me
"Design us a puzzle", and "Make it a good puzzle", and it just took me
some time to figure that out.

Christmas 2012 was the first year I approached The
Code as a puzzle. For that year, I labeled each present with nothing
more than a small QR code. Scanning the QR code linked to a web page
with a silly animated GIF, a solid-color background, some nonsense
poetry in the title, etc. Somewhere in all of that was a hidden
indication of who the intended recipient of each present was. So this
was simply one puzzle, and a lot of obfuscation. I thought this puzzle
would be easier than it was, (a common problem for early puzzle
designers, from what I understand). But the boys had a lot of fun with
it, and with some hints at the end, they figured things out by
Christmas.

Christmas 2013 was the first year I stepped up and
instead of designing just one puzzle, I designed an entire puzzle
hunt. A puzzle hunt is a connected series of several puzzles. Many
puzzle hunts also include "metapuzzles", where the solutions to
several puzzles combine to form a new puzzle. This first hunt of mine
included 24 puzzles, 5 metapuzzles, and 1 final metametapuzzle,
(where the solutions to 4 previous metapuzzles had to be combined in
another puzzle). That was probably over-ambitious for my first puzzle
hunt, but it worked fairly well. There were a couple of
bugs in the puzzles that I should have caught with better testing in
advance.

One thing I was really pleased with was that I intentionally
included every element from the previous puzzle, (animated GIFs,
random background colors, nonsense poetry, etc.). But where in 2012
many of these elements were meaningless red herrings, in this year's
hunt, every element was used in at least one puzzle. I was also happy
that I included a mechanism for providing additional hints along the
way. (And I did this in a way that I could revise those hints before
the boys encountered them, so I could fine-tune the hints based on
where they were getting stuck.) That was very useful, and I used that
to my advantage again in 2014.

Christmas 2014 was my second puzzle hunt, so it's
clearly now a new tradition that we won't give up for some
time. (We'll just need to adapt things when some of the boys move out
of the house, etc.). I do feel like I'm getting better at puzzle
design, and the boys are still having a lot of fun solving this. I
wrote up a blog post giving some of
my feedback on how the solving experience went this year.

This puzzle is a Star Battle, which is an elegant and easy-to-learn
puzzle type. All you have to do is place some stars into the grid such
that exactly one star, (or more stars for larger puzzles), appears in
each row, column, and bold region. Also, no two stars can be placed in
adjacent cells, not even diagonally.
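If the rules are easier to absorb as code, here's a rough Python
validity check, (a sketch of my own, not any official
implementation), for a finished grid with k stars per row, column,
and region:

from collections import Counter
from itertools import combinations

def valid_star_battle(stars, regions, n, k):
    # stars: set of (row, col) pairs; regions: dict mapping every
    # (row, col) cell to one of the n region ids; n: grid size.
    rows = Counter(r for r, c in stars)
    cols = Counter(c for r, c in stars)
    regs = Counter(regions[cell] for cell in stars)
    if any(rows[i] != k or cols[i] != k for i in range(n)):
        return False              # some row or column is off-count
    if len(regs) != n or any(v != k for v in regs.values()):
        return False              # some region is off-count
    # No two stars may be adjacent, not even diagonally.
    return all(max(abs(r1 - r2), abs(c1 - c2)) > 1
               for (r1, c1), (r2, c2) in combinations(stars, 2))

A brute-force uniqueness test can then just count how many star
placements pass this check, though enumerating them efficiently is
another matter.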

For the puzzle I published today, I took a couple of steps beyond the
basic Star Battle formula. First, I made the grid quite large, (15x15
with 3 stars per row/column/region). Then I also provided two
grids. Each grid has the same solution of stars, so you work back and
forth between the grids to solve them both simultaneously. Finally,
instead of dividing the regions into random-looking arbitrary shapes,
I tried to make some recognizable pictures with the grids. If you
click through above, hopefully you'll be able to recognize the shapes
I was going for. This puzzle is the first in a series where I chose
the name of the puzzle type as the theme. So, I drew a star battle for
a Star Battle puzzle. And with
dual grids, I named this battle a Star Duel.

Finally, I should point out that where my previous puzzle
was published on a Monday,
today's puzzle is published on a Saturday. The GM Puzzles website
publishes puzzles that get increasingly difficult throughout the
week. So if you've never attempted a star-battle puzzle before, I
don't actually recommend you start with my puzzle from today.

Instead, you might start with this simpler star
battle puzzle that I wrote as part of a Christmas puzzle hunt for my
boys. It's a very gentle introduction to the puzzle type. If you try
it, you can ignore the second grid with the C, H, A, and S
regions. That was part of a metapuzzle included in the puzzle hunt.

I've really been impressed by the puzzles published at
gmpuzzles.com. They do a great job of showcasing how much more
interesting a logic puzzle can be when it's hand-crafted by a puzzle
designer, rather than computer generated. I've also enjoyed learning
many new kinds of logic puzzles, and decided that I wanted to try my
hand at composing some puzzles of my own.

My first published puzzle is a Nanro puzzle, which is a puzzle type I
only recently discovered. In a typical Nanro puzzle, the solver is
presented with a grid that is segmented into regions of various shapes
and sizes, with some of the cells labeled with numbers. The goal is to
label additional cells with numbers so that all of the labeled cells
form a single connected group, (with no 2x2 square of cells fully
labeled). Additionally, within each region there must be at least one
labeled cell, and every number within a region must be equal to the
number of labeled cells within that region. (That is, a region may
contain one 1 or two 2s or three 3s, etc.) Finally, no two occurrences
of the same number can appear adjacent to each other across a region
boundary.
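That's a mouthful, so here is the same ruleset as a rough Python
check, (my own sketch, not any official implementation):

from collections import Counter, deque

def valid_nanro(labels, regions):
    # labels: dict mapping each labeled (row, col) cell to its number;
    # regions: dict mapping every cell in the grid to a region id.
    cells = set(labels)

    # All labeled cells must form one orthogonally connected group.
    seen, queue = set(), deque([next(iter(cells))])
    while queue:
        r, c = queue.popleft()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        queue.extend(p for p in ((r+1, c), (r-1, c), (r, c+1), (r, c-1))
                     if p in cells)
    if seen != cells:
        return False

    # No 2x2 square of cells may be fully labeled.
    if any({(r, c), (r+1, c), (r, c+1), (r+1, c+1)} <= cells
           for r, c in cells):
        return False

    # Every region needs at least one label, and each number must equal
    # the count of labeled cells in its region.
    counts = Counter(regions[cell] for cell in cells)
    if set(counts) != set(regions.values()):
        return False
    if any(labels[cell] != counts[regions[cell]] for cell in cells):
        return False

    # Equal numbers may not touch across a region boundary.
    return not any((r+dr, c+dc) in labels
                   and regions[(r+dr, c+dc)] != regions[(r, c)]
                   and labels[(r+dr, c+dc)] == labels[(r, c)]
                   for r, c in cells for dr, dc in ((1, 0), (0, 1)))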

OK. So, I'll be the first to admit that the ruleset of Nanro isn't the
most elegant. The rules can't be captured as succinctly as those of
that other puzzle, ("no repeated numbers in any row, column, or
region"). Fortunately, the solving experience is smoother than the
description of the rules.

Most puzzles at gmpuzzles.com have some sort of "theme". Often there's
an interesting visual presentation. Almost always there's a
well-designed "solving path" of logical deductions that can be fun to
discover. The idea is that there's something creative that goes into
each puzzle that distinguishes it from something
computer-generated. And that creativity usually translates to a more
interesting experience for the solver.

For this particular puzzle, the theme I wanted to explore was a
"clueless" puzzle. That is, my puzzle contains a grid segmented into
regions as typical for Nanro, but it doesn't include any initial
numbers in the presentation. So that's something unique, I think, (at
least as far as I've seen of Nanro). I'm not sure how successful I
was at exploring the theme in this puzzle. Hopefully it's an
interesting puzzle to solve, but I am a bit nervous that I'm bringing
down the average quality of puzzles at gmpuzzles.com with this one.

I hadn't seen it prior to composing my puzzle, but before mine was
published I found this spectacular Nanro by the inimitable Prasanna
Seshadri. It's not entirely clueless, but it's as close as possible,
(just one given number), and it thoroughly explores some of the ideas
that my puzzle just touches briefly. Prasanna's puzzle has a really
striking presentation that translates to a fun solving path. And it's
obviously something that's been constructed by hand, not generated by
computer, (at least given the state of the typical bland
computer-generated puzzle today). So that's
a great example of what gmpuzzles is all about and the standard I'll
be aiming for with future puzzles.

So if you're interested in logic puzzles, take a look at my first
puzzle. But even more, look around at the other puzzles at
gmpuzzles.com. I doubt you'll be disappointed.

For quite some time I've been struggling with hundreds of email
messages a day, and hundreds of thousands of email messages in
archives to search. I've used several different email programs to try
to handle my mail, (pine, vm, mh-e, wanderlust, evolution, mutt), and
experimented with several others, but not until now did I find one
that really helps me process mail the way I want to.

Before I introduce sup, though, I want to
at least mention in passing a mail indexer named
mairix since I've been
meaning to write a blog entry about it for several months. Anthony
Towns first let me know about mairix earlier this year when I was
complaining about email near him. He described his approach to mail as
"mutt for reading mail and mairix for searching".

The way mairix works is that it creates an index of a mail archive,
then provides a command-line search that delivers results by creating
a new maildir, (containing just symlinks to the original messages if
the original archive is also maildir-based).
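To make that concrete, here's a toy Python version of the idea, (my
own sketch of the concept, not how mairix is actually implemented,
and the paths in the example are made up), that runs a crude word
search over a maildir and exposes the hits as a new maildir of
symlinks:

import os

def search_to_maildir(archive_dir, results_dir, word):
    # Create the results maildir that the mail client will point at.
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(results_dir, sub), exist_ok=True)
    # Symlink every archived message that mentions the word.
    for sub in ("cur", "new"):
        src = os.path.join(archive_dir, sub)
        for name in os.listdir(src):
            path = os.path.join(src, name)
            with open(path, "rb") as f:
                if word.lower().encode() in f.read().lower():
                    link = os.path.join(results_dir, "cur", name)
                    if not os.path.islink(link):
                        os.symlink(os.path.abspath(path), link)

# The client then reads "mail/search-results" as an ordinary folder:
search_to_maildir("mail/archive", "mail/search-results", "cairo")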

So mairix can give you a simple way to add fast, full-archive
searching to a program like evolution when its builtin search
capability is hopelessly inadequate for your large mail archive. The
mail client can be configured to look at a single search-results
maildir folder, and mairix can be used to create the results there.
Do note that mairix is somewhat limited as a mail search engine since
it has no proximity or phrase-based searching, (the best you get is
whether messages contain given words). But its maildir-in,
maildir-out approach makes it easy to integrate, and it is the
kind of concept that can perhaps let us avoid some of the disasters
out there in the form of monolithic email clients.

But I've got a much more interesting story in sup. I was just a few
days into reinventing my email workflow around mutt when I got into an
interesting conversation with Jamey Sharp about what the ideal
mail-handling system would look like. Here are some of the ideas we
came up with:

Program presents me with messages that need my attention, (a
chronological sort of the archive with unread items in bold does
not count).

When I decide a message does not need my attention I can make it
"go away" into the archive, (and "mark as read" does not
count---I want to file away messages but still track whether
they are read or not in the archive).

Threads must be first-level objects, (in particular I want a
kill-thread command that archives not only the current messages,
but all future messages received in a given thread).

System should support arbitrary tags on messages. Tags can be
applied automatically, (such as applying categories to mail
received on lists), and applied ad-hoc by the user.

System should allow searching over the entire archive with most
recent results appearing instantly. Search terms can include
tags, message headers, or full-body search (including phrases).

In addition to full-archive search, incremental refinement
should be possible, (a new search based on the results of a
previous search).

There's no need for folders in this system. Tags and
incrementally refined, tag-based searching provide all the
benefits, (without the bug in folder-based systems where a user
has to hunt among many folders to find one where a search
returns non-empty results). In particular, the "inbox" view
should just be a search for unread and non-archived messages.
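To illustrate that last point, here's a tiny conceptual sketch in
Python, (mine, not sup's actual data model or API), of folders as
nothing more than tag-based searches:

# Every message carries a set of tags; every "folder" is just a query.
messages = [
    {"subject": "cairo patch review", "tags": {"unread", "list:cairo"}},
    {"subject": "lunch tomorrow?",    "tags": {"archived"}},
    {"subject": "sup crash report",   "tags": {"unread", "list:sup-talk"}},
]

def search(msgs, include=frozenset(), exclude=frozenset()):
    return [m for m in msgs
            if include <= m["tags"] and not (exclude & m["tags"])]

inbox = search(messages, include={"unread"}, exclude={"archived"})
refined = search(inbox, include={"list:cairo"})   # incremental refinement

The inbox is just the unread-and-not-archived query, and refining a
search is just searching again over the previous results.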

That was basically our list of core features. Beyond that, we also
discussed some further ideas for improving the prioritization of email
threads to the user. One primary job of the system is to present the
user with the most important thread to be read next, (based on tags,
etc.). The user supplies feedback by either reading the thread or
archiving it, so there's opportunity to apply machine learning to
improve the prioritization.

During the course of this conversation, Jamey mentioned that I should
go look at sup which he thought might
implement some of these features, but he hadn't tried it out yet. I
was delighted to find that each one of our core features already
exists in sup, (and sup would make a fine platform for research into
the learning-based approaches).

It also does a few things I hadn't specified, such as displaying email
in a full-thread nested view, (rather than one message at a time),
with quoted elements and signatures folded out of view until the user
asks to see them. Another nice touch is that the single-line
presentation of a thread includes the first names of participants in
the thread, (with bold names to indicate unread messages). This
provides some of the essential information needed for applying Joey
Hess's thread
patterns, but
without the tree view at this point.

[Note: I have been told that several of the above features are also
implemented in gmail. I've never tried gmail myself, since it fails to
provide some even more fundamental features: 1. offline
usage, (thanks to both Mark and Greg for pointing out that gmail
has offline support in beta via gears) 2. personal ownership of email
storage, 3. free-software implementation for customization.]

In the few days I've been using sup, it's definitely transformed the
way I process mail. Keeping my inbox empty is simple now, and I now
easily tag messages for specific, future actions without fearing that
I will forget, (and without leaving them cluttering up my
inbox). Searching through my entire archive is fast---the first
results come back instantly and they're generally what I want, (the
index is optimized to return the most-recent messages first).

Sup is implemented in ruby, and the author and maintainers on the
sup-talk mailing
list have been friendly and receptive to me, (in spite of my lack of
ruby-clue), so I've already got a patch or two in the upstream git
repository for sup. The indexing in sup has been performed by
ferret, but is currently switching to
xapian.

Sup is not entirely mature yet, (I discovered fairly easy ways to make
the 0.8.1 version in Debian crash), but I'm happily running the latest
code from git now, and trusting my mail to it. I did find that a bug
in a dependent package causes sup to crash when running sup from git
while sup is also installed from the Debian package. So I recommend
doing "apt-get install sup-mail" to get the dependencies installed,
then "apt-get remove sup-mail" (but don't autoremove the
dependencies), then run sup from:

git clone git://gitorious.org/sup/mainline.git

I've also posted a short
list
of some changes I'd like to see in sup, (mostly in the client
stuff---the indexing and searching stuff seems just fine).

There's a nice New User's Guide included with sup, but no reference
manual yet, so the sup wiki
is an essential (and short) substitute for a reference manual for now.

One thing sup doesn't do yet is separate the
mail-indexing/tagging/searching system from the mail client. So if
you're not happy with a curses client written in ruby, (and perhaps
more significantly, a different client than what you're currently
using for composing messages, making attachments, viewing mime stuff,
etc.) then you're currently out of luck. Fortunately, William Morgan
has plans to reinvent the most
interesting aspects of sup as a server process and he recently
reported making progress on those plans over the last year. (This
isn't too surprising, since why would William want to maintain all that
code to do things like dealing with mbox, maildir, attachments,
etc. when those are already solved problems for users everywhere?)

And hey, I would love to have a mailing list archiver based on sup's
threading. So if anyone wants to cook something like that up, please
feel free, (sup is licensed under the GNU GPLv2).

Trying to get a handle on 2D graphics rendering performance can be a
difficult task. Obviously, people care about the performance of their
2D applications. Nobody wants to wait for a web browser to scroll past
tacky banner ads or for an email client to render a screen full of
spam. And it's easy for users to notice "my programs aren't rendering
as fast with the latest drivers". But what developers need is a way to
quantify exactly what that means, in order to track improvements and
avoid regressions. And that measurement is the hard part. Or at least
it always has been hard, until Chris Wilson's recent cairo-perf-trace.

Previous attempts at 2D benchmarking

Various attempts at 2D-rendering benchmark suites have appeared and
even become popular. Notable examples are x11perf and gtkperf. My
claim is that these tools range from useless to actively harmful when
the task is understanding performance of real applications.

These traditional benchmark suites are collections of synthetic
micro-benchmarks. Within a given benchmark, some tiny operation, (such
as "render a line of text" or "draw a radio box"), is performed
hundreds of times in a tight loop and the total time is measured. The
hope is that these operations will simulate the workload of actual
applications.

Unfortunately, the workloads of things like x11perf and gtkperf rarely
come close to simulating practical workloads. In the worst case, the
operation being tested might never be used at all in modern
applications, (notice that x11perf tests things like stippled fills
and wide ellipses which are obsolete graphics operations). Similarly,
even if the operation is used, (such as a GTK+ radio button), it might
not represent a significant fraction of time spent rendering by the
application, (which might spend most of its time drawing its primary
display area rather than any stock widget).

So far, that's just the well-known advice not to focus on the
performance of things other than the primary bottlenecks. But even
when we have
identified a bottleneck in an application, x11perf can still be the
wrong answer for measurement. For example, "text rendering" is a
common bottleneck for 2D applications. However, a test like "x11perf
aa10text" which seems like a tempting way to measure text performance
is far from ideal. This benchmark draws a small number of glyphs from
a single font at a single size over and over. Meanwhile, a real
application will use many glyphs from many fonts at many sizes. With
layers and layers of caches throughout the graphics stack, it's really
not possible to accurately simulate what "text rendering" means for a
real application without actually running the application itself.

And yes, I myself have used and perhaps indirectly advocated for using
things like x11perf in the past. I won't recommend it again in the
future. See below for what I suggest instead.

What do the 3D folks do?

For 3D performance, everybody knows this lesson already. Nobody
measures the performance of "draw the same triangles over and
over". And if someone does, (by seriously quoting glxgear fps numbers,
for example), then everybody gets a good laugh. In fact, the phrase
"glxgears is not a benchmark" is a catchphrase among 3D
developers. Instead, 3D measurement is made with "benchmark modes" in
the 3D applications that people actually care about, (which as far as
I can tell is just games for some reason). In the benchmark mode, a
sample session of recorded input is replayed as quickly as possible
and a performance measurement is reported.

As a rule, our 2D applications don't have similar benchmark
modes. (There are some exceptions such as the trender utility for
mozilla and special command-line options for the swfdec player.) And
coding up application-specific benchmarking code for every interesting
application isn't something that anyone is signing up to do right now.

Introducing cairo-perf-trace

Over the past year or so, Chris "ickle" Wilson has been putting a lot
of work into a debugging utility known as cairo-trace, (inspired by
work on an earlier tool known as libcairowrap by Benjamin Otte and
Jeff Muizelaar). The cairo-trace utility produces a trace of all
cairo-based rendering operations made by an application. The trace is
complete and accurate enough to allow all operations to be replayed
with a separate tool.

The cairo-trace utility has long proven invaluable as a way to capture
otherwise hard-to-reproduce test cases. People with complex
applications that exhibit cairo bugs can generate a cairo-trace and
often easily trim it down to a minimal test case. Then after
submitting this trace, a developer can replicate the bug without
needing a copy of the complex application or its state.

More recently, Chris wrote a new "cairo-trace --profile" mode and a
tool named cairo-perf-trace
for replaying traces for benchmarking purposes. These tools are
currently available by obtaining the cairo source
code, (either from git or in the
1.9.2 development snapshot or eventually the 1.10 release or
later). Hopefully we'll see them get packaged up so they're easier to
use soon.

With cairo-perf-trace, it's a simple matter to get rendering
performance measurements of real applications without having to do any
modification of the application itself. And you can collect a trace
based on exactly the workload you want, (as long as the application
you are interested in performs its rendering with cairo). Simply run:

cairo-trace --profile some-application

This will generate a compressed file named something like
some-application.$pid.lzma. To later benchmark this trace, first
uncompress it:

lzma -cd some-application.$pid.lzma > some-application.trace

And then run cairo-perf-trace on the trace file:

cairo-perf-trace some-application.trace

The cairo-perf-trace utility will replay several iterations of the
trace, (waiting for the standard deviation among reported times to
drop below a threshold), and will report timing results for both the
"image" backend (cairo's software backend) and whatever native backend
is compiled into cairo, (xlib, quartz, win32, etc.). So one
immediately useful result is that it's easy to see whether the native
backend is slower than the all-software backend. Then, after making
changes to
the graphics stack, subsequent runs can be compared to ensure
regressions are avoided and performance improvements actually help.

Finally, Chris has also established a cairo-traces git
repository which collects
useful traces that can be shared and compared. It already contains
several different browsing sessions with firefox, swfdec traces (one
with youtube), and traces of poppler, gnome-terminal, and
evolution. Obviously, anyone should feel free to generate and propose
new traces to contribute.

Putting cairo-perf-trace to use

In the few days that cairo-perf-trace has existed, we're already
seeing great results from it. When Kristian Høgsberg recently proposed
a memory-saving
patch
for the Intel driver, Chris Wilson followed up with a
cairo-perf-trace
report
showing that the memory-saving had no negative impact on a traced
firefox session, which addressed the
concern
that Eric had about the patch.

As another example, we've known that there's been a performance
regression in UXA (compared to EXA) for trapezoid rendering. The
problem was that UXA was allocating a pixmap only to then use
software-based rasterization to that pixmap (resulting in slow
read-modify-write cycles). The obvious fix I implemented is to simply
malloc a buffer, do the rasterization, and only then copy the result
to a pixmap.

After I wrote the patch, it was very satisfying to be able to validate
its real-world impact with a swfdec-based trace. This trace is based
on using swfdec to view the Giant
Steps
movie. When running this trace, sysprof makes it obvious that
trapezoid rendering is the primary bottleneck. Running
cairo-perf-trace on a GM965 machine before my patch made the
performance problem quite plain: replaying the swfdec trace to the X
server takes 194 seconds, compared to only 45 seconds to replay it to
cairo's all-software image backend. Note that 194 seconds is longer
than the full video clip, meaning that my system isn't going to be
able to keep up without skipping. That's obviously not what we want.

After my patch, the xlib result improved from 194 seconds to 81
seconds. That's a 2.4x improvement, and fast enough to now play the
movie without skipping. It's very satisfying to validate performance
patches with real-world application code like this. This commit is in
the recent 2.7.99.901 snapshot of the Intel driver, by the way. (Of course,
there's still a 1.8x slowdown of the xlib backend compared to the
image backend, so there's still more to be fixed here.)

The punchline is that we now have an easy way to benchmark 2D
rendering in actual, real-world applications. If you see someone
benchmarking with only toys like x11perf or gtkperf, go ahead and
point them to this post, or the cairo-perf-trace
entry in the cairo FAQ, and
insist on benchmarks from real applications.

It's hard for me to believe that my last technical post on the
progress of the Intel graphics driver was nearly a year
ago. Certainly, a lot has happened in that year, and recent
developments have been covered well by
Keith,
Eric, and
Jesse. So
I won't go into any of those details, but I do want to give my
impression of where things stand.

As my colleagues have described so well, a major restructuring of the
driver is now complete, (KMS, GEM, dri2, removal of XAA, EXA, dri1,
etc.). This restructuring caused months of upheaval, followed by
months of stabilization during which many bugs were shaken out. The
end result is that the primary developers are all optimistic about the
current state of the driver. Things seem better than ever.

In the meantime, however, it's also obvious that some users are not
happy with the driver, and some have gotten the impression that the
quality is getting consistently worse with time. Some have theorized
that no quality-assurance testing is happening, or that the developers
just plain don't care about regressions. Neither of those theories is
true, so why is there such a disconnect between the developers and the
users?

One reason for unhappy users is the lag between development happening
and users getting their hands on the code. So the phase of upheaval
and instability might be behind the developers, but some users are
just entering into it. This is exacerbated by the fact that the major
restructuring puts several important components of the driver into the
kernel. It used to be that getting the latest Intel driver just meant
updating an xf86-video-intel module or upgrading an
xserver-xorg-video-intel package. But now it's essential to get a
recent kernel as well, (note that almost all of the major fixes that
Eric describes happened in the kernel, not the user-level driver). And
these fixes aren't in a major kernel release until 2.6.30 which
appeared only today.

In addition, distributions and users have a tendency to update the
kernel less frequently than user-level components. So when a user
checks out the latest xf86-video-intel from git, (without a kernel
upgrade), not only is that user not getting the "latest driver" as
recommended by the developers, but they might also be running a
combination that the developers and QA never saw nor tested.

For example, I recently attempted to run the latest-from-git Intel
driver with a kernel version as installed by current Debian
installation CDs. This kernel is ancient with respect to our driver
development (2.6.25 I think). What I found was that the X server would
not start at all. Obviously, we had neglected to test this particular
combination, (so yes, QA isn't perfect---but it also can't cover the
infinite number of possible combinations). Fortunately, the fix was
simple, and appears in the 2.7.99.901 snapshot I just released
(today!), so it will appear in the 2.8.0 release very soon.

The packagers of X for Ubuntu have been working hard to address this
exact problem of the lag between developers and users. They've set up a
repository of packages named
xorg-edgers where users can
easily get packages built directly from git. I don't think it includes
the kernel yet, though hopefully it will soon, but this is definitely a
step in the right direction. Update: It does include a 2.6.30 kernel
now---maybe all that time I spent writing this blog post wasn't wasted
after all.

But what about the users that did upgrade their kernel and driver
and are still seeing some major problems? In this case, we really
want to hear from you. We take pride in our work and are committed to
doing our best to make quality releases without regressions. When you
hit some awful behavior, (GPU hang, scrambled mode, abysmal
performance), it may be that you're just hitting a combination of
components or application behavior that we've never seen before. We
want those details and we want to fix the problem. File good bug
reports and
please make use of the "regression" keyword as appropriate. We pay
attention to that.

Performance regressions are an especially important issue. In fact,
performance is so important that I'll make a separate post about
that. I've got some exciting news to share about performance
measurement.

In June of 2008 I started rock climbing at a local gym, and instantly
got hooked. I still can't fully explain the fascination I have with
climbing, but it certainly includes an aspect of solving a
three-dimensional puzzle. When I started, the easiest routes in the
gym (graded 5.7) were difficult for me, but satisfying.

Within a few months I had already acquired a fair amount of gear,
and I had the chance to climb outdoors (on toprope) with a good
friend, (Scott DV), who was rediscovering climbing, (having climbed in
Idaho years ago). We both found that there is some fine climbing to be
had in the Portland area. We were just disappointed that we hadn't
started sooner, because Oregon's wet season soon started interfering
with outdoor climbing.

I got through the winter months by being lucky enough to find a
climbing partner, (Scott D), willing to train with me 3 times a week,
(and helping to motivate me to get up at ridiculously early hours to
climb before work). We both took technique classes that took our
climbing up a notch. By the time we were ticking off routes at 5.9, we
took a lead-climbing class and started leading in the gym in January
2009.

An unseasonably cold and wet month of March kept us in the gym, where
we kept at our routine, leading at 5.8/5.9 and toproping at 5.10-
(with several 5.10+ projects). In April, the sun finally came out and
we took the first chance we could to get outdoors with a road trip to
Smith Rock and some of the best climbing in
Oregon. Our first experience leading outdoors was very satisfying,
(and very different than in the gym), with each of us succeeding at
5.8 leads.

Back at the gym two days later, we decided that if we're going to
start leading at 5.9, we'd better be working 5.11 in the gym. So we
started working our first 5.11- project. It was grueling, (and we
didn't make it through the crux without cheating with some weight on
the rope), but we did get to the top, and it was a blast.

One of my favorite things about climbing is that the routes are graded
on an open-ended scale. It's fantastic to know that no matter how much
I improve there will always be routes ready to punish me and put me
back in my place.