Benchmarks for Whole Phones Needed

MADISON, Wis. — Benchmarking is a tricky business. There are benchmarks and there are benchmarks. I'm sure I'm not alone in my weariness with reading reports on yet another batch of benchmark results.

Among the more notorious examples is the recent news about Intel's Z2580 application processor, codenamed Clover Trail+ prior to launch, which outperformed competitors' processors in a benchmarking exercise. The report, issued by ABI Research in early June, concluded that Intel had succeeded in significantly reducing the power consumption of its smartphone application processor, which now rivals equivalent processors based on the ARM architecture licensed from ARM.

Subsequent reporting and investigation, however, revealed that ABI's conclusions were derived from one outlying benchmark (done by AnTuTu). The market research firm neglected to compare results from a suite of benchmarks.

To be clear, in the electronics industry, there's no shortage of benchmarks. Benchmarking exercises are carried out by various outfits for just about every purpose -- from CPUs, GPUs, and DSPs to FPGAs and more.

"There are surprisingly very few benchmarks available that capture the performance of the whole phone," said Jeff Bier, president of Berkeley Design Technology, Inc. (BDTI). Dissecting the performance of technology on a component level is one thing. "But what about measuring technology in terms of what matters to consumers, such as battery life, application speed, and network performance for a variety of mobile phones and tablets?" Bier asked.

BDTI last week announced that it has teamed with Qualcomm to create a new user-experience rating for mobile devices such as smartphones and tablets.

For BDTI, a technology consulting firm known for developing its own benchmarks for processor cores and other technologies over the last 20 years, this ascent up the food chain to focus on system-level performance struck me as a radical departure. Bier, however, stressed that BDTI is uniquely qualified to meet the challenge precisely because of its decades of expertise in independent benchmarking, which has earned widespread trust in the technology industry. "We know what we're doing," said Bier.

Pitfalls
Of course, I couldn't help but point out that BDTI is taking money from Qualcomm to develop this new benchmarking for the consumer experience of smartphones and tablets. How could it be "independent"?

Bier said, "Of course, no model is perfect." The new efforts are being funded by Qualcomm, but BDTI is independently designing benchmarks and experience ratings based on technical merit, he asserted.

"How we actually designed benchmarks will be transparent, and it can be viewed by network operators, OEMs, and savvy technology analysts." "In the interest of checks and balances, our policy is to let others see how we've done it," he added.

For those with legitimate business interests, the source code of BDTI's benchmarking will be available for a fee, said Bier. Journalists and industry analysts can see the results of the new benchmarks for free.

Bier also noted that BDTI is fully aware of many pitfalls and challenges associated with benchmarking efforts.

History suggests that all benchmarks are subject to manipulation. Even if they're developed within an industry consortium, some members may know how to skew a benchmark in ways that stack the deck against competing technologies, pressure other members, and win votes from their allies.

One of the most effective ways to benchmark these devices (for performance and other aspects) would be to give them to different groups of teenage schoolchildren. They are probably the most prolific users of these devices and hence can test them best. Their feedback is more authentic and less prejudiced. If teenagers like and approve of a mobile device, that will be a good indication of success, too.

That said, simplicity of a user interface does win a lot of "average" consumers' hearts.

Indeed, and that's been a major factor in Apple's success. They put a lot of thought into their UI, and users of Apple gear can tap an icon or select a menu choice and largely expect it to do what they think it will do, the way they think it should do it.

Apple also enforces UI guidelines on third-party apps, so Apple's overall approach is "Have it our way." Apple preserves the quality of the user's experience by placing restrictions on what that experience is. Since Apple users can generally do what they want, or get an app that does, they don't see this as an imposition. Folks who want more control over and ability to customize their device won't agree, but aren't likely to buy Apple gear in the first place.

As an average consumer, I would pay attention to such matters, even if that's not the kind of thing benchmarks are good at quantifying.

Which is why the tech sites all carry lengthy reviews that delve into usage, to give users an idea of what to expect if they get the device.

And you touched on the key point. Benchmarks quantify, and UI quality is the sort of thing that largely can't be quantified. Benchmark values bear on it indirectly, by measuring the capacity of the underlying hardware to do what the user wants when they tap something. You can assume more powerful hardware will be more responsive, and may even make possible actions that can't be performed on a less powerful device. But this, too, is something I expect a review to cover.

I think the question boils down to this: since an average consumer of smartphones is not all that benchmark savvy, do these metrics really matter when selecting a smartphone?

Probably not.

Consumer Reports, for example, offers the service of comparing mobile phones. But I am not sure that's regarded as highly as their reports on cars. Part of the reason is that mobile handsets are cheaper (than cars) and reflect more individual tastes.

But I believe what BDTI is trying to do is not to market its benchmark results directly to consumers, but to give the electronics industry an opportunity to look at the performance of a whole phone (rather than individual components).

Which phone is best? It's like asking which shirt fits everyone... Every user has different expectations of a phone. I have a Galaxy S4, and if I dial down the network usage, it can sustain battery life for up to four days for me. Good enough; beats the iPhone. But for other users who are looking to stay connected, the iPhone performed better (for one person I know :) ).

Benchmarks can only compare particular aspects of a phone. OK, the graphics are genius in a particular phone, but how many seconds (not even minutes) of a phone's life have we pushed the graphics to that level? LTE can achieve 100Mbps, but in how many places do we get a signal to hit that speed, and even if we do, how many of us need it? Battery life depends purely on each user's settings; a phone can vary its battery life from 1x to 7x (or more) depending on which functions we enable and how often.

I don't believe in a benchmark that can clearly tell which phone is better than another.

There is a good conversation on a separate but related topic going on over at SemiWiki. It's a follow-on thread to an excellent post by Paul McLellan specifically about benchmarking processors. http://www.semiwiki.com/forum/content/2675-how-benchmark-processor.html

Well, maybe it's high time for some bright marketing people to come up with UI benchmarketing!

That would be hilarious if someone tried, because UIs are so highly subjective. Your idea of a good UI might drive me screaming from the room. Which is "best"?

Consider Linux, where you have a variety of desktop environments -- GNOME, KDE, Xfce, LXDE, Unity... each somewhat to very different from the others, and each with a following that insists it is best. (One reason many folks like Linux is that you have that range of choice in UIs.)

"The least keystrokes/touches needed to accomplish a task" may not be a meaningful benchmark. What if it takes only one or two to run the app and perform a function, but using more gets you extra options and better control? (Like when you're taking a picture with your phone's camera, and you might get a better result if you don't just use the photo app's presets.) An app that takes one tap to bring up the photo app and snap a picture might win on simplicity, but a competitor that used more taps might take a better photo. Who wins?
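A toy sketch makes the point. Here a raw tap-count metric is computed for two entirely hypothetical camera apps (the names, numbers, and attributes are invented for illustration, not real benchmark data):

```python
# Two hypothetical camera apps scored on a "taps to take a photo" metric.
apps = {
    "QuickSnap": {"taps_to_photo": 1, "manual_controls": False},
    "ProShot":   {"taps_to_photo": 4, "manual_controls": True},
}

def simplest(apps):
    """Return the app that needs the fewest taps to take a photo."""
    return min(apps, key=lambda name: apps[name]["taps_to_photo"])

winner = simplest(apps)
print(winner)  # → QuickSnap
# QuickSnap wins on tap count, but the metric is blind to the fact that
# ProShot's extra taps buy manual controls that may yield a better photo.
```

The metric is trivially computable and perfectly repeatable, which is exactly what makes it seductive and exactly why it can mislead: everything it doesn't measure simply vanishes from the comparison.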

I'll settle for marketing and reviews that do a decent job of explaining the design assumptions and giving a feel for how you use the device.