The GPU: Apple's Gift to Game Developers

The GPU side of the A5 is really what's most exciting. As we mentioned in our iPad 2 GPU Performance analysis, the A5 includes a dual-core PowerVR SGX 543 - also known as the SGX 543MP2. In our earlier article we showed the SGX 543MP2 easily beating both an iPad 1 and the Tegra 2 based Motorola Xoom.

To understand why the SGX 543MP2 has such a performance advantage we first need to remember that NVIDIA's Tegra 2 is nearly a year late. NVIDIA's first competitive ultra mobile GPU was supposed to be shipping in products in the first half of 2010; instead it found itself shipping in 2011. While NVIDIA is good at designing GPUs, it's not good enough that it can release a product and maintain a two-year performance advantage over the competition. Let's look at the architecture, shall we?

NVIDIA's Tegra 2 features a DirectX 9-class GPU. NVIDIA used to call it the GeForce ULP (Ultra Low Power) but now it's just GeForce. As a DX9 class GPU we're dealing with a conventional, non-unified shader architecture. While all OpenGL ES 2.0 GPUs can execute pixel and vertex shader instructions, the GeForce in Tegra 2 runs pixel and vertex shaders on separate groups of hardware.

NVIDIA calls each pixel and vertex shader ALU a core. The Tegra 2 has four pixel shader cores and four vertex shader cores. The four pixel shader ALUs make up a single Vec4 and the same goes for the four vertex shader ALUs. NVIDIA wouldn't elaborate on what limitations exist when dispatching operations to the cores. All pixel shader operations happen at 20 bits per component of precision while all vertex shader operations happen at 32 bits per component.

Each core is capable of executing one multiply+add (MAD) operation per clock. Do the math and that works out to be a peak rate of 8 MADs per clock for the entire GPU. The maximum operating frequency for the Tegra 2 GeForce GPU is 300MHz, however device vendors may run the GPU at a lower frequency to save on power. At 300MHz this works out to be 4.8 GFLOPS (counting a MAD as two FLOPs).

Imagination Technologies' PowerVR SGX 543MP2 is fundamentally a bigger GPU than the GeForce in NVIDIA's Tegra 2. Let's go through the math.

The SGX 543 features four USSE2 pipes. This is a unified shader architecture, so both vertex and pixel shader code run on the same set of hardware. The benefit of this approach is better performance in peaky situations, where the workload is dominated by either vertex or pixel shader code rather than a balance that's perfectly tailored to the architecture. The Tegra 2 will only run at peak efficiency if it encounters a mix of 50% vertex and 50% pixel shader code. The PowerVR SGX series will never have any of its execution pipes idle regardless of the instruction mix.
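A toy utilization model makes the difference concrete. The Python sketch below is an illustrative assumption, not a description of how either GPU actually schedules work: it treats a frame as a fixed amount of shader work, gives the split design four vertex ALUs and four pixel ALUs (as in Tegra 2), and assumes a unified design can hand any work to any ALU. The function names are hypothetical.

```python
def split_utilization(vertex_fraction, vertex_alus=4, pixel_alus=4):
    """Fraction of ALU capacity kept busy on a split (non-unified) design.

    vertex_fraction is the share of shader work that is vertex work (0..1).
    Toy assumption: the frame takes as long as the slower of the two pools.
    """
    pixel_fraction = 1.0 - vertex_fraction
    # Time for each pool to finish its share of one unit of total work.
    time_needed = max(vertex_fraction / vertex_alus, pixel_fraction / pixel_alus)
    total_alus = vertex_alus + pixel_alus
    # Work done divided by the capacity available over that time.
    return 1.0 / (time_needed * total_alus)


def unified_utilization(vertex_fraction):
    """Toy assumption: any ALU can run any shader, so the mix never idles one."""
    return 1.0


for mix in (0.5, 0.25, 0.1):
    print(f"{mix:.0%} vertex work: split {split_utilization(mix):.0%}, "
          f"unified {unified_utilization(mix):.0%}")
```

Under these assumptions the split design is fully busy only at a 50/50 mix. At 25% vertex work it keeps about two thirds of its ALUs busy, because the vertex pool finishes early and waits on the pixel pool.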

Each USSE2 pipe has a 4-wide vector ALU capable of cranking out 4 MADs per clock. Two of these pipes are enough to equal the peak throughput of what NVIDIA built in Tegra 2, but the PowerVR SGX 543 has four of them. As for the MP2? Go ahead and double that number again. The SGX 543MP2 is simply two 543s placed next to one another.

All of this works out to be 16 MADs per clock for the SGX 543 and 32 MADs per clock for the SGX 543MP2. At 200MHz that's 12.8 GFLOPS and at 250MHz we're talking about 16 GFLOPS.
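The arithmetic is easy to reproduce. The short Python sketch below assumes, as the text does, that one MAD counts as two FLOPs and that peak throughput is simply SIMD count times MADs per SIMD times clock. The function name and dictionary are hypothetical, and real-world throughput will be lower than these theoretical peaks.

```python
def peak_gflops(simds, mads_per_simd, clock_mhz, flops_per_mad=2):
    """Theoretical peak throughput, counting one MAD as two FLOPs."""
    mads_per_clock = simds * mads_per_simd
    return mads_per_clock * flops_per_mad * clock_mhz / 1000.0


# (SIMD count, MADs per SIMD per clock), as quoted in the text.
GPUS = {
    "GeForce (Tegra 2)": (8, 1),
    "SGX 543": (4, 4),
    "SGX 543MP2": (8, 4),
}

for name, (simds, mads) in GPUS.items():
    for mhz in (200, 250, 300):
        print(f"{name} @ {mhz}MHz: {peak_gflops(simds, mads, mhz):.1f} GFLOPS")
```

For example, `peak_gflops(8, 1, 300)` gives the 4.8 GFLOPS quoted for Tegra 2, and `peak_gflops(8, 4, 200)` gives the 12.8 GFLOPS quoted for the SGX 543MP2.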

Mobile SoC GPU Comparison

| | PowerVR SGX 530 | PowerVR SGX 535 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | GeForce ULP | Kal-El GeForce |
|---|---|---|---|---|---|---|---|
| SIMD Name | USSE | USSE | USSE | USSE2 | USSE2 | Core | Core |
| # of SIMDs | 2 | 2 | 4 | 4 | 8 | 8 | 12 |
| MADs per SIMD | 2 | 2 | 2 | 4 | 4 | 1 | ? |
| Total MADs | 4 | 4 | 8 | 16 | 32 | 8 | ? |
| GFLOPS @ 200MHz | 1.6 GFLOPS | 1.6 GFLOPS | 3.2 GFLOPS | 6.4 GFLOPS | 12.8 GFLOPS | 3.2 GFLOPS | ? |
| GFLOPS @ 300MHz | 2.4 GFLOPS | 2.4 GFLOPS | 4.8 GFLOPS | 9.6 GFLOPS | 19.2 GFLOPS | 4.8 GFLOPS | ? |

At its lowest expected clock speed, the 543MP2 already has over twice the compute power of the Tegra 2's GPU at its highest operating frequency. Add the fact that the A5 likely has more memory bandwidth than Tegra 2, and that the SGX 543MP2 is a tile-based architecture with lower bandwidth requirements, and the performance numbers we talked about last time shouldn't be all that surprising.

The real competition for the SGX 543MP2 will be NVIDIA's Kal-El. That part is expected to ship on time and will feature a boost in core count: from 8 to 12. The ratio of pixel to vertex shader cores is not known at this point but I'm guessing it won't be balanced anymore. NVIDIA is promising 3x the GPU performance out of Kal-El so I suspect that we'll see an increase in throughput per core.

GPU Performance

Taken from our iPad 2 GPU Performance Preview:

As always we turn to GLBenchmark 2.0, a benchmark crafted by a bunch of developers who either have or had experience doing development work for some of the big dev houses in the industry. We'll start with some of the synthetics.

Over the course of PC gaming evolution we noticed a significant increase in geometry complexity. We'll likely see a similar evolution with games in the ultra mobile space, and as a result this next round of ultra mobile GPUs will seriously ramp up geometry performance.

Here we look at two different geometry tests amounting to the (almost) best and worst case triangle throughput measured by GLBenchmark 2.0. First we have the best case scenario - a textured triangle:

The original iPad could manage 8.7 million triangles per second in this test. The iPad 2? 29 million. An increase of over 3x. Developers with existing titles on the iPad could conceivably triple geometry complexity with no impact on performance on the iPad 2.

Now for the more complex case - a fragment lit triangle test:

The performance gap widens. While the PowerVR SGX 535 in the A4 could barely break 4 million triangles per second in this test, the PowerVR SGX 543MP2 in the A5 manages just under 20 million. There's just no competition here.

I mentioned an improvement in texturing performance earlier. The GLBenchmark texture fetch test puts numbers to that statement:

We're talking about nearly a 5x increase in texture fetch performance. This has to be due to more than an increase in the amount of texturing hardware. An improvement in throughput? Increase in memory bandwidth? It's tough to say without knowing more at this point.

Apple iPad vs. iPad 2

| Test | Apple iPad (PowerVR SGX 535) | Apple iPad 2 (PowerVR SGX 543MP2) |
|---|---|---|
| Array test - uniform array access | 3412.4 kVertex/s | 3864.0 kVertex/s |
| Branching test - balanced | 2002.2 kShaders/s | 11412.4 kShaders/s |
| Branching test - fragment weighted | 5784.3 kFragments/s | 22402.6 kFragments/s |
| Branching test - vertex weighted | 3905.9 kVertex/s | 3870.6 kVertex/s |
| Common test - balanced | 1025.3 kShaders/s | 4092.5 kShaders/s |
| Common test - fragment weighted | 1603.7 kFragments/s | 3708.2 kFragments/s |
| Common test - vertex weighted | 1516.6 kVertex/s | 3714.0 kVertex/s |
| Geometric test - balanced | 1276.2 kShaders/s | 6238.4 kShaders/s |
| Geometric test - fragment weighted | 2000.6 kFragments/s | 6382.0 kFragments/s |
| Geometric test - vertex weighted | 1921.5 kVertex/s | 3780.9 kVertex/s |
| Exponential test - balanced | 2013.2 kShaders/s | 11758.0 kShaders/s |
| Exponential test - fragment weighted | 3632.3 kFragments/s | 11151.8 kFragments/s |
| Exponential test - vertex weighted | 3118.1 kVertex/s | 3634.1 kVertex/s |
| Fill test - texture fetch | 179116.2 kTexels/s | 890077.6 kTexels/s |
| For loop test - balanced | 1295.1 kShaders/s | 3719.1 kShaders/s |
| For loop test - fragment weighted | 1777.3 kFragments/s | 6182.8 kFragments/s |
| For loop test - vertex weighted | 1418.3 kVertex/s | 3813.5 kVertex/s |
| Triangle test - textured | 8691.5 kTriangles/s | 29019.9 kTriangles/s |
| Triangle test - textured, fragment lit | 4084.9 kTriangles/s | 19695.8 kTriangles/s |
| Triangle test - textured, vertex lit | 6912.4 kTriangles/s | 20907.1 kTriangles/s |
| Triangle test - white | 9621.7 kTriangles/s | 29771.1 kTriangles/s |
| Trigonometric test - balanced | 1292.6 kShaders/s | 3249.9 kShaders/s |
| Trigonometric test - fragment weighted | 1103.9 kFragments/s | 3502.5 kFragments/s |
| Trigonometric test - vertex weighted | 1018.8 kVertex/s | 3091.7 kVertex/s |
| Swapbuffer Speed | 600 | 599 |
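The multipliers quoted earlier can be checked against these raw numbers. The Python sketch below uses a handful of rows copied from the table above. The dictionary and function names are hypothetical, and only a sample of the tests is included.

```python
# (iPad, iPad 2) results copied from the table above.
RESULTS = {
    "Triangle test - textured (kTriangles/s)": (8691.5, 29019.9),
    "Triangle test - textured, fragment lit (kTriangles/s)": (4084.9, 19695.8),
    "Fill test - texture fetch (kTexels/s)": (179116.2, 890077.6),
    "Branching test - vertex weighted (kVertex/s)": (3905.9, 3870.6),
}


def speedup(old, new):
    """How many times faster the new result is than the old one."""
    return new / old


for test, (ipad, ipad2) in RESULTS.items():
    print(f"{test}: {speedup(ipad, ipad2):.2f}x")
```

The textured triangle test comes out at a little over 3.3x and the texture fetch test at just under 5x, in line with the figures in the text. Not every test scales, though: the vertex-weighted branching result is essentially flat between the two devices.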

Enough with the synthetics - how much of an improvement does all of this yield in the actual GLBenchmark 2.0 game tests? Oh it's big.

Without AA, the Egypt test runs at 5.4x the frame rate of the original iPad. It's even 3.7x the speed of the Tegra 2 in the Xoom running at 1280 x 800 (granted that's an iOS vs. Android comparison as well).

With AA enabled the iPad 2 advantage grows to 7x. In a game with the complexity of the Egypt test the original iPad wouldn't be remotely playable while the iPad 2 could run it smoothly.

The Pro test is a little more reasonable, showing a 3 - 4x increase in performance compared to the original iPad:

While we weren't able to reach the 9x figure claimed by Apple (I'm not sure that you'll ever see 9x running real game code), a range of 3 - 7x in GLBenchmark 2.0 is more reasonable. In practice I'd expect something less than 5x but that's nothing to complain about.

189 Comments

Smoking the magic of realism, that's the point this device needs to hit. You can take that back to your IPS gods.

I didn't say it was feasible to happen right away, but that's where it needs to be for the low-end devices. The upper-end of the spectrum should land in around $450.

BTW, you misunderstand R&D and cost procurement. Just because these devices have a hefty starting price does not mean the cost of materials is even 1/1000 of that price. Whether the service providers are eating the price, or not, it all comes down to the fact that these devices do have a large mark-up. I think you need to compare the cost of an iPod Touch to the iPhone if you need a simpler way to look at it - the 3G modem doesn't cost $300 lol and the Touch still had a high mark-up.

"when the competition with a 2 decade head start still hasn’t been able to compete on price" ... no one has had a 2-decade head start. Technology (manufacturing and supply chain) and costs have both shifted over the last 20 years to make things more affordable. You come at this with an emotional response of "that can't happen", when I say it can and it will. Be careful about being shortsighted; it will come back to bite you one way or another.

Look, if you want a crappy cheap computer, go out and buy one. They make $50 computers (none of that extravagant $100 OLPC nonsense) for India powered by 6502s.

But at the end of the day you are being disingenuous. You don't ACTUALLY want a $250 PC --- you can get something like that today if you buy a second hand Eee on eBay. What you want is an actual iPad, not something with a tenth of the functionality, but at $250. Good luck with that.

And spare us this "eventually". If you're not content with a $250 eBay Eee now, by the time the $250 iPad equivalent comes around, the real iPad 5 will be quadcore, 2GiB of RAM and a retina display screen, and you STILL won't find the cheap equivalent an acceptable choice, not when there's a real device at $600 that is so much better.

I still am amazed when people complain about the iPad being too expensive. I remember a little over a year ago everyone expected it to have a starting price of $999. It debuted at half that, and people still complain it's too much.

It's now a year later, and even Apple's competitors cannot make a device that is competitive with a $499 starting price point or less.

Here is where I see the iPad fitting in. The console and notebook have effectively replaced my PC. Everything I used to do on a PC I now do on either my notebook or my PS3. You're always going to have a cellphone. The tablet then does what you used to use a notebook for 10 years ago.

You end up with a cell phone, a tablet, a notebook and if you want to game, a console.

I don't know if a tablet will ever replace a notebook. Maybe it will for some who can't afford all 3, have to choose between a tablet and a notebook, and don't need the productivity and power you gain with a notebook. Like how a TV is just for media consumption, a tablet is the same but you carry it around with you.

People complain about the device being too expensive because, for what it is capable of doing and compared to other devices, it is overpriced. For Apple the price makes perfect sense for what they portray as a luxury device. It starts with enough room to drop the price (which they did, sort of) and to be able to introduce another smaller iPad at some point in the future without cutting into their sales of Macs. That's probably why they went with their keyboard choice.

I thought about buying one but came to the conclusion that it simply was far too expensive to justify, especially since all I needed was an ereader, and later a new laptop. But I ended up getting one for Christmas so I wasn't complaining. Turns out the iPad is for EXACTLY what Steve said it was for. This is essentially a couch companion. This takes care of all my computer needs when I'm at home and don't have to do work. But that's about all it's good for since it's too big to feasibly carry around and doesn't replace your laptop.

I still stand by my belief that the iPad is overpriced, though much more attractive at $400. I think tablets will be very important in the future, it's just that they are far away and Apple right now is only interested in making consumer devices while everyone else follows them. But right now everyone seems happy with just a new toy.

Yeah, and the problem with playing with them at the store is that they always look really gross. I was messing with a tablet at a store today, and immediately washed my hands afterwards. I'm not a huge germ-a-phobe, but when a guy blows his nose, then approaches the electronics, I just start getting uneasy. I guess the screen just shows what's on all the mice and keyboards there, too. :p