I compared the original four models (DSGE, BVAR, FOMC Greenbook, and the IT model) with four more: constant 2% (annual) inflation, an AR(1) process, and two LOESS smoothings of the data. The latter two aren't actually forecasts; they're the actual data, just smoothed. I included them because I wanted to see what the best possible model could achieve, depending on how much "noise" you believe the data contains. I set the smoothing parameter to either 0.03 (achieving an R² of about 0.7, per this) or 0.54 (in order to match the R² of the IT model one quarter ahead). Here is what the four models look like forecasting one quarter ahead over the testing period (1992-2006):

So how about the performance metric (the R² of forecast vs realized inflation)? Here are the values, sorted by average rank over the 1Q to 6Q horizons:

First, note that I reproduce the finding (per Noah Smith) that an AR process does better than the DSGE model. Actually, it does better than anything except what is practically data itself!
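For readers who want to see what an AR(1) benchmark involves, here is a minimal sketch. It is not the code behind the results above: the least-squares fit, the function names `fit_ar1` and `forecast_ar1`, and the synthetic series are all assumptions made for illustration.

```python
import random

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi

def forecast_ar1(last_value, c, phi, horizon):
    """Iterate the fitted recursion `horizon` quarters ahead."""
    value = last_value
    for _ in range(horizon):
        value = c + phi * value
    return value

# Synthetic quarterly "inflation" (made-up numbers, NOT the data from this post).
random.seed(0)
true_c, true_phi = 0.4, 0.8
series = [2.0]
for _ in range(199):
    series.append(true_c + true_phi * series[-1] + random.gauss(0, 0.3))

# Fit on a training window only, then forecast from its last observation.
c_hat, phi_hat = fit_ar1(series[:140])
one_q = forecast_ar1(series[139], c_hat, phi_hat, horizon=1)
six_q = forecast_ar1(series[139], c_hat, phi_hat, horizon=6)
print(f"phi = {phi_hat:.2f}, 1Q forecast = {one_q:.2f}, 6Q forecast = {six_q:.2f}")
```

With a fitted `phi` between 0 and 1, longer-horizon forecasts decay toward the long-run mean `c / (1 - phi)`, so this kind of model only tracks the near-term fluctuations.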

The IT model does almost exactly as well as a smoothing of the data (LOESS 0.54), which is what it is supposed to do: it is a model of the macroeconomic trend, not the fluctuations. In fact, it is outperformed only by an AR process (a pure model of the fluctuations) and a light smoothing of the data (LOESS 0.03). I was actually surprised by the almost identical performance of the LOESS smoothing and the IT model for 2Q through 6Q, because I had only adjusted the smoothing parameter until I got (roughly) the same value as the IT model for 1Q.
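The parameter-matching step described above can be sketched roughly as follows. This is a simplified LOESS-style smoother (local linear fits with tricube weights on equally spaced points), not necessarily the implementation used here; the names `loess`, `r_squared`, and `match_span`, the candidate spans, the target value, and the synthetic data are all assumptions.

```python
import math
import random

def tricube(u):
    """Tricube kernel weight, zero outside |u| < 1."""
    return (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0

def loess(y, span):
    """Local linear smoothing; `span` is the fraction of points in each window."""
    n = len(y)
    k = max(3, math.ceil(span * n))
    smoothed = []
    for i in range(n):
        nearest = sorted(range(n), key=lambda j: abs(j - i))[:k]
        # Simplification: the bandwidth reaches one step past the farthest
        # neighbor, so every point in the window gets a positive weight.
        h = max(abs(j - i) for j in nearest) + 1
        sw = swd = swdd = swy = swdy = 0.0
        for j in nearest:
            d = j - i
            w = tricube(d / h)
            sw += w
            swd += w * d
            swdd += w * d * d
            swy += w * y[j]
            swdy += w * d * y[j]
        det = sw * swdd - swd * swd
        # Intercept of the weighted line y = a + b*d, i.e. the fit at d = 0.
        a = (swdd * swy - swd * swdy) / det if det > 1e-12 else swy / sw
        smoothed.append(a)
    return smoothed

def r_squared(actual, fitted):
    """Assumed definition: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def match_span(y, target_r2, candidates):
    """Pick the candidate span whose smoothing has R² closest to the target."""
    return min(candidates, key=lambda s: abs(r_squared(y, loess(y, s)) - target_r2))

# Synthetic series: slow trend plus noise (made-up, NOT the data from this post).
random.seed(1)
data = [2 + math.sin(2 * math.pi * t / 120) + random.gauss(0, 0.5) for t in range(120)]
spans = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
best = match_span(data, target_r2=0.5, candidates=spans)  # 0.5 is a hypothetical target
print(best, round(r_squared(data, loess(data, best)), 3))
```

A coarse scan over candidate spans is enough for this purpose, since only a rough match at one horizon is needed; the fit at the other horizons is then whatever it turns out to be.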

The DSGE model, on the other hand, is only slightly better than constant inflation, the worst model of the bunch.
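For anyone wanting to reproduce this kind of comparison, here is a rough sketch of the metric and the ordering. It assumes R² is defined as 1 - SS_res/SS_tot (the squared correlation is another common choice, and the two can differ), and the model names and scores below are placeholders, not the results from this post.

```python
def r_squared(actual, forecast):
    """Assumed definition: 1 - SS_res / SS_tot (can be negative for poor forecasts)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def average_ranks(scores):
    """scores maps model name -> list of R² values, one per forecast horizon.

    Rank 1 is the highest R² at a given horizon; ties are not handled specially.
    """
    models = list(scores)
    horizons = len(next(iter(scores.values())))
    totals = {m: 0 for m in models}
    for h in range(horizons):
        ordered = sorted(models, key=lambda m: scores[m][h], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: totals[m] / horizons for m in models}

# Placeholder R² values for three hypothetical models over three horizons.
scores = {
    "model_a": [0.60, 0.50, 0.40],
    "model_b": [0.55, 0.52, 0.45],
    "model_c": [0.10, 0.05, 0.02],
}
ranks = average_ranks(scores)
ordering = sorted(ranks, key=ranks.get)  # best average rank first
print(ordering)  # ['model_b', 'model_a', 'model_c']
```

Sorting by average rank rather than by average R² keeps one unusually good or bad horizon from dominating the ordering.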