Following today’s early-on coverage of the Day 2 coverage of Qualcomm’s Tech Summit event in Maui, Hawaii, we recap the major story of the day: The new Snapdragon 855 platform. The new platform follows this year’s extremely successful Snapdragon 845 SoC, which we saw power pretty much the vast majority of 2018’s flagship devices.

Qualcomm isn’t standing still, and the Snapdragon 855 represents a new generation, bringing a refresh of the SoC IPs as well a brand new 7nm manufacturing process. Let’s dwell more into today’s details and analyse how the new SoC platform will raise the bar for 2019.

Qualcomm’s take on implementing the new Cortex-A76 cores is quite a bit different than what we’ve seen from HiSilicon. Overall there’s still 4 Cortex A76 derived cores (Kryo 485 Gold as Qualcomm markets them), alongside four Cortex-A55 derived CPUs. The differences here lie in the frequencies, the apparent cache configurations, as well as apparent changes in some microarchitectural tuneables.

Interestingly, for the first time since Qualcomm has adopted Arm’s “Built on Cortex Technology” license and with the third iteration of its implementation, Qualcomm has finally released details on the kind of changes that have been commissioned to Arm in terms of changes to the IP. Here Qualcomm reveals that the Cortex-A76 variation in the Snapdragon 855 allows for a bigger out-of-order execution window, most likely referring to an increase in the size of the reorder buffer. The stock A76 has a 128 instruction buffer, whereas Qualcomm’s modified A76 has been increased to an undisclosed size.

Alongside what seems to be the ROB increase, Qualcomm has also revealed that the data prefetchers have been optimised for better efficiency. It’s not clear if the “efficiency” here refers to power efficiency or the efficiency in the way data is prefetched, nor are the disclosures here what exactly has changes, whether there’s more or less prefetch streams or if there’s been changes in the other types of prefetchers.

While HiSilicon opted for a 2+2 design, where one pair of A76’s were optimised for high frequencies and the second pair were optimised for higher power efficiency, Qualcomm opted to go with a 1+3 configuration.

The highest performance core, “Kryo 485 Gold Prime” as Qualcomm calls it, is clocked in at 2.84GHz – putting it on its own clock domain – and is seemingly configured with a 512KB L2 cache. The other three cores are clocked at 2.42GHz and retain smaller 256KB L2 caches. This configuration is quite odd – you also would expect Qualcomm to take advantage of the new DynamIQ cluster design, which is able to support different frequency and voltage planes, however things get even odder. The prime core actually doesn’t have its own power plane, and thus it has to share its power plane with the other three big cores.

This revelation of the prime core not having its own power domain is quite shocking and it invalidates a lot of the benefits of actually having a separate clock plane for a core. In effect the real-world benefit here isn’t any different than simply clock-gating the core.

It is true that there’s a large amount of scenarios where there’s predominantly a single larger thread active, this is particularly true in web browsing workloads. Such a 1+3 configuration would achieve better performance and possible better efficiency than a 2+2 configuration, but because the cores aren’t running on separate voltage planes it means the actual benefits here in real-world applications are just going to be quite minor. The net result is that when the prime core gets powered up, the other big cores have to follow, regardless of whether there’s any work for them to do.

Qualcomm’s 2.84GHz clock is 9.2% higher than HiSilicon 2.6GHz frequency. A big question here is just how far Qualcomm has driven the core up on the power curve – I am expecting it to be less efficient than the Kirin 980 by some margin, how big that margin will be is something we won’t see until we get our hands on commercial devices.

Most interestingly for today’s presentation is that Qualcomm hadn’t made a single concrete mention about CPU power efficiency of the Snapdragon 855, and I’m not sure if this means there’s no improvements or rather just downplaying this aspect of the SoC given the other significant changes.

Lastly, I do find it odd that Qualcomm went for smaller L2 caches on the remaining 3 high performance cores. I still expect these to end up higher performance than HiSilicon’s 1.92GHz A76 units with 512KB L2’s – but it’s nevertheless interesting to see both companies try to achieve the same goal in different ways.

Moving on, we see the four Cortex-A55 derived efficiency cores, which are running at 1.8GHz and coupled with 128KB L2 caches. In this regard, it seems the Snapdragon 855 doesn’t differ from the Snapdragon 845. Here the company has seemingly put all the process node advantages into improving power efficiency of the little cores.

The DynamiQ Shared Unit’s L3 cache should come in at 4MB – which would be a doubling over the 2MB configuration on the Snapdragon 845. It’s to be noted that we haven’t yet fully confirmed the cache configurations at the time of writing, but I’m strongly leaning towards these figures to be correct. We’ve by now confirmed that the L3 cache has remained at 2MB – this is quite conservative on Qualcomm’s part and there will be an IPC impact compared to the Kirin 980’s 4MB implementation.

In terms of performance, all that Qualcomm publishes is a claim of up to a 45% performance increase over the Snapdragon 845. As with last year, it’s a bit of a mystery exactly what this figure represents, but the number pretty much falls in line exactly where the Kirin 980 performs in relation to the Snapdragon 845 in SPEC2006. The big question for the S855 is how the new generation system level cache will behave in terms of memory latency, as this will be among the biggest aspects differentiating Qualcomm’s new SoC from its Kirin competition.

Another interesting performance comparison that was published today is a showcase of performance figures between the Snapdragon 855, Apple A12, and the Kirin 980 in terms of app launch times. Though Qualcomm doesn’t directly name their competitors, competitors A and B should be the Apple A12 and the Kirin 980 respectively, assuming Qualcomm’s colour scheme is also consistent across the GPU comparisons. For me it’s not to surprising to see the Snpadragon 855 perform this well – one thing I did note in my Huawei Mate 20 review is that the Pixel 3 and OnePlus 3 still felt faster in terms of application launch times. Though this could all just be a side-effect of the scheduler and framework of the Snapdragon chipset rather than the raw CPU performance of the hardware. Of course, software still matters immensely and over the last two years Qualcomm has demonstrated absolute leadership in terms of milking out responsiveness and reactivity out of the hardware through its software designs.

Adreno 640 GPU – Iterative Features and Performance

The Adreno 640 graphics block will be the focus for Qualcomm’s gaming efforts. The company went to great lengths to detail how they felt mobile gaming is on the rise, while other platforms for video games are either stagnating or in decline.

In terms of technical specifications, as is traditional with Qualcomm, we didn’t see much in the way of detailed disclosures on the new GPU. What we did get are more conservative figures, such as a 20% increase in performance. This increase is quite small compared to what we tend to usually see, especially given the fact that the Snapdragon 855 is able to take advantage of a major process node transition.

The Snapdragon 845’s GPU was already the smallest among flagship mobile SoCs at a mere 10.69mm², so unless Qualcomm has significantly increased the number of processing elements inside the GPU cores, this generation should be even smaller. Meanwhile in the event presentation there was one actual titbit about the GPU; Qualcomm is saying that they’ve increased the number of ALUs for FP32 and FP16 operations by 50%. If my previous estimates about the Adreno 630 were correct, then this would mean the new Adreno 640 sports 384 ALUs per core for a total of 768 ALUs. This ALU increase doesn’t match up with the claimed performance increase, so it’s possible Qualcomm is running the GPU at a lower frequency, or the performance claims were made in regards to possibly less ALU sensitive workloads.

Qualcomm showed a side-by-side comparison between the Snapdragon 845 and the new 855 running PUBG on a cycled script at 40fps. The new chipset was able to showcase a 28% reduction in power on this identical workload. It’s to be noted we don’t really know exactly what point on the power curve this measurement is done, so it’s always a bit of a mystery in terms of direct power comparisons when you do the testing at certain capped performance states.

While the performance gains remain a bit vague at time of writing, Qualcomm did disclose a lot in terms of new graphical features. Here we saw claims that the Adreno 640 graphics in the S855 will enable true HDR gaming, as well games built around Physically Based Rendering. The graphics pipeline will support 10-bit color depth and the Rec 2020 gamut to enable HDR, as well as enabling S855 devices to support the HDR10+ and Dolby Vision formats, which QC states is a world’s first. With the Adreno 640, along with the display IP, devices can support 120fps gaming as well as smooth 8K 360-degree video playback (resolving a major complaint about Snapdragon-power). Just don’t ask how much space those 8K 360-degree videos take up.

Qualcomm’s support for Physically Based Rendering in graphics is an interesting topic, one we’ll go into detail in a different article, but the concept is not new. In fact we’re a bit surprised to see it mentioned in the same breath as actual hardware changes, since conceptually it shouldn’t require any new hardware; PBR is just a shader program that all of the Adreno 600 family should be able to run.

In any case, the short version is that with this enabled, it will help add realism to gaming and augmented reality through more accurate lighting physics and material interactions. Qualcomm stated that through the Unity 4 engine, developers will be able to use real world materials designed from scientific values created by companies like Quixels and Allegorithmic that will make their environments more lifelike, such as the correct surface roughness / audio reflections or material-on-material interactions. This will also help with lighting and depth perception. More details to come.