Qualcomm Snapdragon S4 Benchmarking Day

Today, Qualcomm invited approximately 30 writers from across the world to attend its Snapdragon S4 Benchmarking Day. This wasn’t a regular benchmarking session, however, as it was filled with hours of informative talks about Qualcomm’s CPUs, GPUs and the software that ties them together. Qualcomm also invited respected industry figures such as Jon Peddie of Jon Peddie Research and Teemu Uotila of Rightware. They came to talk about the importance of good benchmarks and where they see mobile benchmarks heading in the future.

The day kicked off with Jon Peddie talking about the importance of recognizing a good benchmark. He followed this up by stating that he had approached many mobile companies and asked them how they foresaw solving the mobile benchmarking problem. Many companies were receptive, but only one was responsive: Qualcomm. Jon Peddie talked about the importance of having a real-world benchmark and went through the list of different things on a smartphone that could be benchmarked. He concluded that one of the most powerful benchmarks we could employ today and into the future would be an AR (Augmented Reality) based benchmark. Such a benchmark would let reviewers and users alike measure the AR experience on any device while simultaneously evaluating the effectiveness of all the different sensors and processors involved in AR. Some members of the audience seemed to have reservations about this approach. We disagree with them and believe that AR could quite possibly be one of the most powerful benchmarking tools for today and the future when you consider all of the different parts of the device that it puts to use.

Following Jon Peddie was Travis Lanier of Qualcomm, who spoke at length about Qualcomm’s involvement in benchmarking and its benchmarking practices, focusing squarely on the CPU. Since his expertise and involvement were primarily in CPUs, he talked a lot about the practices involved in benchmarking devices, mostly Android ones, as well as the different ARM instruction sets that Qualcomm implements in order to maintain compatibility with the ARM ecosystem. Since Qualcomm designs its own cores and licenses ARM’s instruction set, it spends quite a bit of time and money making sure that its processors are as compatible with ARM’s instruction sets as possible while delivering the best performance per watt it can. Mr. Lanier also went back and talked about the differences between multi-core ARM SoC implementations, how some are fundamentally different from others, and why he considers Qualcomm’s to be better.

Looking at the slide, you can see how Qualcomm compares its Krait CPU performance against ‘other offerings’ with four cores (obviously NVIDIA’s Tegra 3). The slide covers instruction issues per core and shows that even Qualcomm’s dual-core part has more instruction issues per core than its competitor’s because of the fundamental design of the CPU architecture. Note that Qualcomm’s earlier argument that more cores are not necessarily better has died down now that it has a quad-core processor of its own. However, the company still maintains that its cores are fundamentally faster than its competitors’ and that, as a result, its quad-core should be significantly faster and better.

Travis also talked about the difference between asynchronous symmetric multiprocessing and synchronous symmetric multiprocessing. The difference is most notable where power is a concern: a synchronous design runs all cores at the same voltage and frequency, while an asynchronous design varies each core’s frequency and voltage based on the load it needs to handle. He also talked about the effect this has on device temperature and, by extension, battery life. He then followed up by talking about the different ways of benchmarking an Android device featuring Qualcomm’s Krait processors and eventually pivoted towards HTML5 and the importance of the CPU to HTML5 performance. He also wanted to make sure that attendees understood that workloads like video playback, audio playback and games are wonderful real-world benchmarks, but are not actual measures of CPU power. He stated that a credible benchmark has to be a real-world benchmark made by someone reputable and knowledgeable who is willing to build an entire application rather than just snippets of code.
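To make the power argument concrete, here is a minimal sketch of the two approaches using the textbook approximation that dynamic CPU power scales with capacitance times voltage squared times frequency. The operating points, function names and load figures are all made up for illustration and are not Krait or Tegra 3 numbers; the model also ignores leakage and power gating of idle cores.

```python
# Illustrative model only: dynamic power is approximated as C * V^2 * f.
# The operating points below are invented for this sketch, not vendor data.

# Hypothetical (frequency in GHz, voltage in V) pairs, sorted by frequency.
OPERATING_POINTS = [(0.4, 0.80), (0.8, 0.90), (1.2, 1.00), (1.5, 1.10)]


def point_for_load(load):
    """Return the slowest operating point that can serve `load`.

    `load` is the fraction (0.0 to 1.0) of peak frequency a core needs.
    """
    peak_freq = OPERATING_POINTS[-1][0]
    needed = load * peak_freq
    for freq, volt in OPERATING_POINTS:
        if freq >= needed:
            return freq, volt
    return OPERATING_POINTS[-1]


def dynamic_power(freq, volt, capacitance=1.0):
    """Dynamic power in arbitrary units: C * V^2 * f."""
    return capacitance * volt ** 2 * freq


def synchronous_power(loads):
    """All cores share one voltage/frequency, set by the busiest core."""
    freq, volt = point_for_load(max(loads))
    return len(loads) * dynamic_power(freq, volt)


def asynchronous_power(loads):
    """Each core gets its own voltage/frequency based on its own load."""
    return sum(dynamic_power(*point_for_load(load)) for load in loads)


if __name__ == "__main__":
    # One busy core and three lightly loaded cores.
    loads = [1.0, 0.2, 0.2, 0.2]
    print("synchronous :", synchronous_power(loads))
    print("asynchronous:", asynchronous_power(loads))
```

In this toy model the two designs draw the same power when every core carries the same load, and the asynchronous design pulls ahead as the load becomes more uneven, which is the situation Travis described.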

Travis was followed by none other than Tim Leland, the product manager responsible for Qualcomm’s Adreno GPU group. He explained that the Adreno GPU in Snapdragon is used primarily in three ways, with emphasis on graphics rendering, composition and GPGPU compute. He went into detail about how the Adreno GPU is based upon a unified shader architecture, similar to what we’ve seen in desktop GPUs for quite some time, and the benefits associated with a unified GPU architecture. As a result of the Adreno team’s advances, Qualcomm’s Adreno 320 delivers 3-4x the performance of its predecessor, the Adreno 225, and also improves overall frame rates and the quality of lighting and shaders.

The real new feature of the Adreno 320 is FlexRender, Qualcomm’s new technology for switching intelligently between tiled and direct rendering modes to maximize application performance and minimize power consumption. Tile-based rendering is the GPU’s default, and FlexRender lets an application use an API to tell the GPU to switch from tile-based rendering to direct rendering. Tim also stated that Qualcomm is working on a tool that would let the GPU analyze the situation frame by frame and render each individual frame using the most power-efficient method, be it direct or tile-based deferred rendering.

The GPGPU features of the Adreno 320 are enabled primarily through OpenCL 1.2 and RenderScript Compute. The use cases that Qualcomm presented were mainly object cloning, object removal, and low-light noise reduction. However, there is a strong possibility that these features could also be used for more serious applications such as computational photography, where a photo could be captured at multiple depths to produce an image that is perfectly in focus from front to back.

Tim also talked about current trends in GPU benchmarks and what makes a good mobile graphics benchmark. He described good benchmarks as well-written tests that predict graphical performance in real-world applications and are written with current and relevant graphics APIs like OpenGL ES 2.0 and greater. They should include meaningful metrics and configurations to ensure correct results and should run on the latest version of the target operating systems. Tim stated that the best graphics benchmarks are generally developed by industry experts at reputable companies like Rightware, Kishonti and Tactel, all of whom test the GPU in all relevant ways. In the end, these benchmarks need to let users determine whether their device is capable of playing the best games available for it, and nothing less.

He went on to mention that the new Adreno 320 GPU would also support the Khronos Group’s new OpenGL ES 3.0 standard, codenamed Halti, which has yet to be announced by the working group. He did mention that it enables more efficient and flexible programming and simplifies development across different platforms. It should also enable multiple enhancements in terms of advanced visual effects as well as high-quality texture compression formats that could, in theory, reduce the size of texture packs in high-quality games.

Following Tim was Mike Genewich, who is part of the team that helped build the Vellamo benchmarking tool. This tool has proven to be one of the best HTML5 benchmarks out there. It was born out of Qualcomm’s internal need to track web performance regressions in OEMs’ implementations and eventually became good enough for Qualcomm to release. The four main components of Vellamo are rendering, JavaScript (SunSpider and V8), user experience tests, and networking. The scores from these four components are combined into a single Vellamo score, which should, in theory, be a measure of a device’s browser performance.
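As a rough illustration of how four category results can roll up into one number, here is a minimal sketch. The presentation did not describe Vellamo’s actual formula, so the equal weights, the category keys and the `combined_score` function are all assumptions made for this example.

```python
# Hypothetical weights: the real Vellamo weighting was not disclosed in the
# talk, so every category is simply weighted equally here.
WEIGHTS = {
    "rendering": 1.0,
    "javascript": 1.0,
    "user_experience": 1.0,
    "networking": 1.0,
}


def combined_score(category_scores, weights=WEIGHTS):
    """Combine per-category scores into one weighted total.

    Raises ValueError if any expected category is missing, so a partial
    run is never reported as a full score.
    """
    missing = set(weights) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(weights[name] * category_scores[name] for name in weights)


if __name__ == "__main__":
    # Made-up category results for a single device.
    example = {
        "rendering": 300,
        "javascript": 400,
        "user_experience": 250,
        "networking": 150,
    }
    print(combined_score(example))
```

A single combined figure is convenient for comparing devices at a glance, but the per-category scores remain the better guide to where a browser is actually fast or slow.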

After Mike, Teemu from Rightware came on stage and talked about the role the company plays in benchmarking mobile devices and its pedigree as part of a gaming benchmarking company. Rightware focuses on graphics benchmarking, browser benchmarking and overall system benchmarking, and has one benchmark for each. Basemark ES 2.0 Taiji Free, previously known as 3DMarkMobile ES 2.0, focuses on 3D graphics performance and has become an objective and reliable de facto industry benchmark for OpenGL ES 2.0. The company has also created Basemark OS, a collection of tests that attempt to exercise the device from as many different angles as possible, including media playback tests, program startup tests, JVM speed tests, compression tests, database operations, and loading and scaling images, among many other things. Rightware will also be launching Browsermark Next Generation, which is timed for an August release, meaning that it will likely line up with SIGGRAPH. Teemu finished his presentation by showing off a very high-quality benchmark preview that appeared to be built for OpenGL ES 3.0, which Tim had spoken about earlier.

Not to be outdone, Liat Ben-Zur, Senior Director of Software Strategy and Business Development at Qualcomm, talked about the software innovation that Qualcomm is helping to facilitate through APIs like AllJoyn, which enables peer-to-peer communication across different platforms. Capabilities like Augmented Reality, Computer Vision and Facial Processing are all software benefits born out of Qualcomm’s different SDKs, including the Vuforia AR SDK and the Snapdragon SDK, to touch on a few. She also talked about the role these software tools play in making our devices vastly more useful than any of us could ever imagine, as demonstrated at UPLINQ by a developer who had created an application that analyzes facial responses to determine whether a child has Down syndrome. She also pointed out that Qualcomm’s QDevNet is a vast resource for developers building applications with any of Qualcomm’s tools, and that it is constantly being updated and improved.

After lunch, we were greeted by Raj Talluri, SVP at Qualcomm, who quickly broke down the Snapdragon S4 Pro APQ8064-based MDP tablet, which features a 1366×768 display and the 1.5 GHz quad-core APQ8064 processor with the Adreno 320 GPU. He then got out of our way, informing us that the tablets already had 12 benchmark applications pre-installed and that we could run or install any applications we wanted to test the tablets’ performance. However, there were not enough tablets to go around, and one tablet generally had to be shared between two or three people. This was a bit disappointing considering that Qualcomm had brought so many of us so far to test the tablets and was not able to secure enough of them to give each member of the press a device to test on. Furthermore, Qualcomm only gave away three of the tablets for the press to keep, and the rest of the press went home tabletless. There is no doubt in our minds that the shortage of 28nm S4 processors was one of the culprits behind this shortage of tablets, but we would have liked to see more tablets ending up in the hands of the press considering how far some of them had come. Oh, and the price tag for the APQ8064-based MDP? A cool $1299. And they are available for purchase now!

After many hours of benchmarking (no thanks to the W Hotel’s dreadful Wi-Fi), we were able to get an idea of what kind of performance the APQ8064 is able to deliver and, needless to say, we were surprised. We will be delivering our results shortly, once we have compiled all of the data that we collected from our various devices.