Opinion: Performance benchmarks are worthless, here’s how to make them better

AMD is getting ready to launch its next-generation architecture (code-named Trinity) and invited a bunch of us out to Austin to see it. I can’t talk about the technology until it is launched, but one of the events at the show was a head-to-head comparison between this AMD technology and Intel’s top-shelf products. In each test (including productivity, video enhancement, and file compression) AMD Trinity technology wasn’t just faster, it was substantially faster.

Though the demonstration was impressive, it also reminded me why benchmarks really aren’t that useful anymore. Not only do they fail to reflect what each of us individually does, they don’t factor in cost, device size, or design, each of which might be more important than any direct performance measure.

For instance, Apple hasn’t led benchmarks in years. Side-by-side with competitors, the iPad and iPhone actually tend to appear relatively slow (they often use older networking, processor, or storage technology). They are also relatively expensive, yet lots of folks still prefer them, suggesting benchmarks as they currently exist are worthless to these buyers. They rank other things higher.

So what would a perfect benchmark look like?

How do you work?

The perfect benchmark would be derived from an ongoing analysis of how you use your hardware. We all change as we age, and what we do even changes from day to night, from weekdays to weekends, and on vacation, so the capture should occur over a period of time.

It should also look for critical points, like what annoys us and what thrills us — not only in terms of what we are doing, but what we are talking about. In short, factor in our social-networking activity in things like Facebook and Pinterest.

Finally, it would rank all aspects of our interest and factor in cost: not only the cost of buying the product, but the cost in time of putting the product into service and maintaining it, as well as our sensitivity to downtime.

Analyzing the device

Since it has proven impractical to go into a store and run a benchmark on a shelved PC, and impossible to do the same thing if we want to buy online, the ideal benchmark would also need to capture the performance of systems on the market. Alongside this objective data, it would also capture subjective data on design, expected reliability, and time to obsolescence. While the latter two could come from historic data (much like Consumer Reports does with its rankings), the design analysis would be based on how someone similar to you in personality type and taste would rank the product.

Finally, given that we live in an online “cloud” world, a major portion of the data captured would need to be on the services the device connected to, the apps it would load, and the overall end-to-end user experience.

In the end, everything would be mathematically rendered.

The result

The result would be accessible on a site where you could go, log in, and either specify the type of product you were looking for or enter a number of products you were considering. The system would then give you a set of choices listing the key analytical elements of each. So if you saw something that wasn’t current, or that you didn’t agree with, you could change the element and thus change the ranking.

You could see an overall ranking of around 10 products with some specific ones flagged: the lowest priced, the best match to you, and the most balanced (best value for the money as defined by your unique needs and tastes). This is also somewhat similar to what Consumer Reports tries to do, but more advanced.
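To make the idea concrete, here is a minimal sketch of how such a ranking might be computed. Everything in it is an assumption for illustration: the product names, the prices, the 0-to-1 ratings, the attribute names, and the choice of a simple weighted average are all hypothetical, and "most balanced" is approximated here as match score per dollar. A real system would need far richer data than this.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float   # purchase price in dollars (hypothetical figures below)
    ratings: dict  # attribute name -> rating between 0.0 and 1.0


def match_score(product, weights):
    """Weighted average of a product's ratings, using one buyer's weights."""
    total_weight = sum(weights.values())
    weighted = sum(w * product.ratings.get(attr, 0.0) for attr, w in weights.items())
    return weighted / total_weight


def rank_products(products, weights):
    """Return products sorted by match score, plus the three flagged picks."""
    ranked = sorted(products, key=lambda p: match_score(p, weights), reverse=True)
    flags = {
        "lowest_priced": min(products, key=lambda p: p.price).name,
        "best_match": ranked[0].name,
        # Assumption: "most balanced" means the best match score per dollar.
        # Prices must be greater than zero for this ratio to make sense.
        "most_balanced": max(
            products, key=lambda p: match_score(p, weights) / p.price
        ).name,
    }
    return ranked, flags


# Hypothetical catalog; none of these numbers are real measurements.
CATALOG = [
    Product("Laptop A", 1000.0, {"performance": 0.9, "design": 0.5, "reliability": 0.6}),
    Product("Laptop B", 500.0, {"performance": 0.5, "design": 0.8, "reliability": 0.7}),
    Product("Laptop C", 1500.0, {"performance": 0.7, "design": 0.95, "reliability": 0.9}),
]

# Two hypothetical buyers who care about different things.
design_first = {"performance": 1, "design": 3, "reliability": 2}
speed_first = {"performance": 5, "design": 1, "reliability": 1}

if __name__ == "__main__":
    for label, weights in (("design first", design_first), ("speed first", speed_first)):
        ranked, flags = rank_products(CATALOG, weights)
        print(label, [p.name for p in ranked], flags)
```

Swapping one set of weights for the other is the code equivalent of changing an element you disagree with: the same catalog produces a different best match for a different buyer, while the lowest-priced pick stays the same.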

You would end up with a list of top choices that would be more likely to thrill you. It could also analyze products you already own to flag when performance degraded to a point that would begin to irritate you, or when the extra performance of a new system was great enough to make it worth it for you – specifically based on your needs.

Benchmarks don’t have to suck

When I first ran into benchmarks, Intel was complaining that it built systems that were better rounded, while AMD was using benchmarks to drive people to systems they would like less. Intel tried to get the industry to drop the benchmarks, failed, and now largely optimizes for benchmarks.

If you focus on what people want to do, you’ll provide a better experience, but still likely get slammed by benchmarks. At AMD’s event, the company was pointing to the reasons benchmarks suck.

I think the answer here is to create benchmarks that don’t suck. We have online tools that capture a ton of information about us to sell to advertisers, so it doesn’t seem to be such a stretch to use some of this technology to create a tool that makes us happier consumers. Considering all this information is compiled about us and should belong to us, it would be really nice if it were used to make us happier, rather than just milk us for money. This would be a way to do that. What do you think?