ModeratorLegendVeteran

BTW I think that having two L/S ports is on the low side for a design that wide.

I agree, although at least the extra registers of ARMv8 should help compared to ARMv7 or x86-64, right? Also, an extra load unit might be difficult to feed from the L1 data cache. I'm not sure what trade-offs they are making there in terms of bandwidth/banks/etc. given their huge 128KiB capacity.

Coming from a GPU background, I have a maybe irrational dislike of register spilling, but in extreme cases I've wondered whether it'd be more energy efficient for the compiler to recompute certain results rather than store them in L1 and then reload them (if the data needed to recompute them is being kept in registers anyway for another reason; I'm not sure how common that is in CPU workloads to be honest, it might be too rare to focus on). I don't think modern compilers do this? Anyway, it probably wouldn't make much difference and it's a bit academic...

BTW, do we know what the L1 data cache line size is for Apple? ARM's cores have 64-byte cache lines, but in my mind, that's partly because some customers will use memory controllers with 64-byte granularity. Since Apple controls the entire SoC, they might (or might not) have decided that 32-byte granularity is still beneficial despite the cost in the memory controller, at which point it might make sense for the L1 data cache to also have 32-byte cache lines. With smaller cache lines, prefetching also becomes slightly more important, but given Apple's performance levels it's obvious they must have good prefetching algorithms.

Regular

I was not thinking about having a third load, but rather about being able to issue two loads and one store per cycle; that's useful for some computing tasks that stream their data (e.g., summing two vectors). IIRC Intel has been able to do it since Haswell, and their CPUs are narrower.
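
To put rough numbers on the two-loads-plus-one-store point, here is a minimal Python sketch of a throughput bound for a streaming loop such as c[i] = a[i] + b[i]. The function names and the port configurations are illustrative assumptions, not measured properties of any Apple or Intel core, and the model looks only at memory-port pressure (it ignores cache bandwidth, banking, and everything else).

```python
from fractions import Fraction

def shared_port_bound(loads, stores, ports):
    """Cycles per iteration when every port can issue either a load or a store.

    Hypothetical model: all memory ops compete for the same pool of ports.
    """
    return Fraction(loads + stores, ports)

def dedicated_port_bound(loads, stores, load_ports, store_ports):
    """Cycles per iteration when loads and stores have separate ports.

    Hypothetical model: the slower of the two streams sets the pace.
    """
    return max(Fraction(loads, load_ports), Fraction(stores, store_ports))

# c[i] = a[i] + b[i] needs two loads and one store per element.
LOADS, STORES = 2, 1

two_shared = shared_port_bound(LOADS, STORES, ports=2)
two_plus_one = dedicated_port_bound(LOADS, STORES, load_ports=2, store_ports=1)

print("2 shared L/S ports:      ", float(two_shared), "cycles per element")
print("2 load + 1 store ports:  ", float(two_plus_one), "cycles per element")
```

Under these assumptions the vector sum is limited to one element every 1.5 cycles with two shared ports, versus one element per cycle with two loads and one store issued together.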

Regular

9to5mac claims that the rumored iPad Pro (2018) will have an A12X SoC.

No numbers are given, but I think a reasonable guess from previous -X SoCs is the following:

3 Vortex cores clocked slightly higher than in the A12

6? Tempest cores

128-bit memory interface

8 GPU cores (2x the A12)

16 Neural Engine cores (2x the A12), ~10 TOPS.

This may be a silly question, but do the numbers of Vortex and Tempest cores have to be in a fixed (1:2) ratio?

One of the more interesting rumors (Kuo, 9to5mac) is that the iPad Pro may have a USB-C port instead of the Lightning port, which allows for 4K output. This change further differentiates the iPad Pro from the iPhone and iPad (non-Pro) and seems to push it a bit closer to laptop territory.

Regular

AnandTech has released SPEC2006 estimates for the small CPU cores in the A11 and A12, as well as Neural Engine benchmarks.

Andrei Frumusanu said:

What did surprise me a lot was seeing just how well Apple’s small cores compare to Arm’s Cortex-A73 under SPECint. Here Apple’s small cores almost match the performance of Arm’s high-performance cores from just 2 years ago. In SPEC's integer workloads, A12 Tempest is nearly equivalent to a 2.1GHz A73.

However in the SPECfp workloads, the small cores aren’t competitive.
[…]
In recent years I’ve felt that Arm’s little core performance range has become insufficient in many workloads, and this may also be why we’re going to see a lot more three-tiered SoCs (such as the Kirin 980) in the coming future.

Is there any benefit for a future A-series SoC to have "big" cores, out-of-order "little" cores, and a third tier of in-order tiny cores?

Legend

I was referring to the Android SoCs - the middle gap is now quite big. We'll see some interesting solutions in the next gen for this.

BTW, your frequency measurements are a bit off.
All the precise frequencies I measured on A7, A9 and A11 are divisible by 24MHz (so 1587 or 2083 are just not possible).
24MHz is the CNTFRQ_EL0 timebase.

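The fastest way to read the timer is:
isb                  // serialize, so the counter is not read ahead of earlier instructions
mrs x0, CNTPCT_EL0   // read the physical counter into x0
ret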

While my early frequency timing code measured 2064MHz (on Monsoon), the later versions, with simultaneous measurements on N=3..6 cores, got the same 2304MHz Monsoon max frequency as with dual cores. But I think I need to re-check the min frequency in this situation.
I'm going to buy an iPhone XR and revisit my measurements.
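
The divisibility argument above is easy to check. This is a small Python sketch, assuming the 24MHz timebase stated in the post; the helper names are made up for illustration. It tests the frequencies mentioned and shows which multiples of 24MHz bracket the ones that do not fit.

```python
TIMEBASE_MHZ = 24  # timebase stated in the post

def is_multiple_of_timebase(freq_mhz, base=TIMEBASE_MHZ):
    """True if the frequency is an exact multiple of the timebase."""
    return freq_mhz % base == 0

def bracketing_multiples(freq_mhz, base=TIMEBASE_MHZ):
    """Multiples of the timebase just below and just above a non-multiple."""
    lower = (freq_mhz // base) * base
    return lower, lower + base

for freq in (1587, 2083, 2064, 2304):
    if is_multiple_of_timebase(freq):
        print(f"{freq} MHz = {freq // TIMEBASE_MHZ} x {TIMEBASE_MHZ} MHz")
    else:
        low, high = bracketing_multiples(freq)
        print(f"{freq} MHz is not a multiple; candidates are {low} or {high} MHz")
```

If the 24MHz rule holds, 1587 would have to be 1584 or 1608, and 2083 would have to be 2064 or 2088, while 2064 and 2304 are both exact multiples.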

Veteran

The new iPad Pros with the A12X SoC have been revealed.
Well, damn. They promise 35% higher single thread performance and over 90% better multithread over its predecessor, which in Geekbench 4 terms translates to a single thread score of 5400 and a multithread score of just under 20000. For ballpark reference, that's the performance level of Intel's Core i7 6700K.
And it has supporting computational functionality on the SoC that the x86 environment lacks.
The GPU seems to be effectively twice the performance of the previous iPad Pro, but with some twists. They made repeated references to game consoles, saying it was as fast as the XB1S but capable of feats that consoles cannot match, like portability and a 120Hz display. (Although, trying to nip a pointless discussion in the bud, consoles are obviously a different market altogether.) Rather, they targeted laptops, demonstrating that the iPad outsold all laptops even aggregated by manufacturer (Apple themselves being edited out of the comparison), and claiming that the iPad Pros also outperform the overwhelming majority of laptops.
It is certainly true that it stomps all over the new MacBook Air in terms of performance. The iPad editing speed on a 3GB Photoshop file sure was impressive.

As an aside to Nebuchadnezzar, at these performance levels, it would be really neat if you extended the performance/power comparisons beyond the ARM cores and mobile GPUs, and included desktop or portable GPUs as well. Even though it's a can of worms, it would be neat to see what ballpark we are in. CPU comparisons can be done back-of-the-envelope already, but the GPU comparison is trickier. Just toss in a single x86 core and a single desktop GPU, and people have data points to extrapolate from to other products.

LegendVeteranSubscriber

Which predecessor? A10X or A12?

Nonetheless, it sounds like an impressive SoC. I wonder if it has the same 4 x 32-bit channels. Using LPDDR4X at 4266MT/s, they'd get almost 70GB/s total bandwidth. That's a lot more than a GeForce MX150 (GP108) GPU, and about level with the Xbone without the ESRAM.
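
For reference, the "almost 70GB/s" figure follows from simple peak-bandwidth arithmetic. A minimal Python sketch, assuming the 4 x 32-bit LPDDR4X-4266 configuration speculated about above (the function name is just for illustration, and this is theoretical peak, not sustained bandwidth):

```python
def peak_bandwidth_gb_s(channels, bits_per_channel, transfers_mt_s):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = channels * bits_per_channel / 8
    # MT/s times bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfers_mt_s / 1000

# Assumed configuration: 4 channels x 32 bits, 4266 MT/s.
bandwidth = peak_bandwidth_gb_s(channels=4, bits_per_channel=32, transfers_mt_s=4266)
print(f"{bandwidth:.1f} GB/s")  # about 68.3 GB/s
```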

Veteran

They compared it to the A10X. The iPad Pros still have an aluminium body, which helps with heat dissipation, as opposed to the iPhones.

Regular

According to Steve Troughton-Smith, the 2018 iPad Pro features 6 GB RAM for both the 11" and 12.9", but only for the highest end storage (1 TB) configuration. The other storage capacities continue to have 4 GB.

I think this is the first time an iOS device has split RAM sizes by storage capacity. I was hoping for more RAM in the new iPad Pro (6 or 8 GB and regardless of storage) given the higher-end feature set compared to previous iPad Pros and since the iPhone XS and XS Max moved up to 4 GB this year.
