Why is overdraw / fill rate such an issue in mobile dev?

I am just really curious why overdraw / fill rate is such an issue in mobile dev (e.g. iOS/Android).
I am asking because, from a hardware perspective, the PowerVR MBX Lite is basically the Dreamcast GPU, in the same class as the NVIDIA TNT2, while the more capable PowerVR SGX is roughly GeForce 2 class (or is that GeForce 4?). Historically, overdraw / fill rate was never really an issue on desktop, but for some reason on smartphones like the iPhone and Android devices, as powerful as they are at processing polygons, a few 2D particles covering the screen can grind your game to a halt. I just don't get it. Why was this never an issue on desktop GPUs, and why haven't GPU vendors learned from the past?

Back then it was no issue because people didn't try to get next-gen console / current desktop-gen visual quality out of GPUs from five generations ago...

If you created games the way they were created back then... well, they would just look bad.

Well-done games on PowerVR hardware normally don't have fillrate problems, as opaque geometry gets early Z tested. But there are many people with little to no knowledge at all, and if we wanted to blame someone, it's Unity and similar engines that are to blame for inviting people onto more limited platforms without any background in how to create efficient mobile games, or efficient 3D games at all.
Those are commonly also the users you see running around the web blaming everyone but their own blatant lack of basic knowledge for their game's poor performance. It's the same reason desktop software is bloated and shitty: the hardware is fast enough to compensate for badly developed software.
That's what makes well-optimized software stand out so much in the end. Its authors aren't geniuses; they just did their job, unlike 95% of those lurking around the mobile marketplaces.
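The early-Z point above can be sketched with a toy model (hypothetical depths, not any real GPU's behavior): opaque fragments that fail the depth test are rejected before the pixel shader ever runs, while alpha-blended fragments always have to be shaded.

```python
# Toy model of early Z rejection for a single screen pixel (hypothetical
# depths, not any real GPU's behavior): opaque fragments that fail the
# depth test are rejected before shading; blended fragments always shade.

def shaded_fragments(depths, blended):
    """Count fragments that reach the pixel shader for one pixel.
    depths: fragment depths in draw order (smaller = closer)."""
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if blended:
            shaded += 1              # blending: every fragment is shaded
        elif z < nearest:            # early Z: only closer fragments shade
            shaded += 1
            nearest = z
    return shaded

layers = [0.2, 0.5, 0.9]  # opaque layers drawn front to back
print(shaded_fragments(layers, blended=False))  # 1 (two rejected early)
print(shaded_fragments(layers, blended=True))   # 3 (all shaded)
```

Drawn back to front instead (`[0.9, 0.5, 0.2]`), even opaque geometry shades all three fragments in this model, which is why early Z only pays off for opaque draws.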

No, I meant: what was in the hardware back in the day that is different from the mobile GPUs we have now?

Let's say I want to make alpha-blended particles that cover 30% of the screen. I am pretty sure this was not an issue back when the TNT2 was around, as I have seen many games use fire particle effects and never saw them slow down. But if I am not mistaken, fill rate / overdraw is STILL an issue even on the iPhone 4 / iPhone 4S, even though they can process an amazing number of polygons and a good amount of shaders.

It's no different; actually, it's significantly more efficient (it has today's efficiency level).

Back in those days there were just no shaders; half the hardware did not even fully support hardware transform and lighting.
Today's hardware is playing in a totally different world, and people are pushing it in a way that is in a totally different world too.

And if you look back at those days and at the games, you realize that most of these effects were done with single sprite texture animations, if not completely on the CPU. Today it's an issue because artists and devs take the lazy way, pushing in more and more particles to do a job that an image-strip animation on a single billboard does with equal or higher quality.

Also, there were no general-usage engines back then. There were single-purpose engines that worked in exactly this way if you used them exactly as intended. They had a level of optimization towards that one thing that general-purpose engines can never compete with, simply because the teams would be unpayable: you would need 2000 programmers to get a general-purpose engine optimized for every type of game the way a single-purpose engine is. (That's the reason top teams have their own in-house tech even nowadays, despite massive maintenance, research, and development costs that commonly exceed the licensing fee of a middleware significantly.)

Also, you are not considering how many games didn't even use hardware acceleration. The Command & Conquer series in those days didn't use the GPU at all, and neither did StarCraft. The GPU was a shooter thing, and the shooter dev teams known from those days were experts on a level where only a dozen or so teams in the mobile sector today could ever stand a chance of competing.

There is a reason the BSP format was the defining format for shooters for a whole decade (the only genre you can bring up when you want to compare with fillrate and similar problems today, because it was the only GPU-pushing genre back then at all): the fillrate back then would never have allowed Quake, Half-Life, or Unreal to exist without BSP.
Show me a single game besides Rage today that makes use of BSP on iOS.
BSP put severe limitations on the levels, limitations today's artists commonly couldn't even work under, yet for the hardware it made complete sense, and even today on mobile it would make sense up to a degree for similarly fast-paced games that need a steady 60 FPS. A static PVS is an attempt to replace BSP without the limitations; it comes with pros and cons.

I think I am starting to get it. Is it more a bandwidth issue (data transfer between CPU and GPU) that's causing all these fillrate / draw-call issues? The desktop GPUs probably had a much, much wider bus than the mobile GPUs of today, so it was never an issue in the past?

Bandwidth and Fillrate are different issues. You can max out fillrate without having any issues transferring data to the GPU.

The Dreamcast output a lower resolution and was designed with much more regard for efficiency than most current iOS games. Fillrate issues are also not going to be apparent at 30% of the screen; the iPhone 4 can be performant rendering 7-8x its screen, provided you use the correct shaders and don't hit a bottleneck somewhere else.
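A back-of-the-envelope check of what a figure like "7-8x its screen" implies (all numbers illustrative, not measured specs):

```python
# Back-of-the-envelope fillrate budget (illustrative numbers only, not
# measured specs). "Overdraw" = average number of times each screen pixel
# is written per frame.

def required_fillrate(width, height, fps, overdraw):
    """Pixels per second the GPU must fill to hold the target fps."""
    return width * height * fps * overdraw

# iPhone 4 Retina screen at a 60 fps target:
print(required_fillrate(960, 640, 60, overdraw=1.0) / 1e6)  # 36.864 Mpix/s
print(required_fillrate(960, 640, 60, overdraw=7.5) / 1e6)  # 276.48 Mpix/s
```

Redrawing the screen 7.5 times per frame multiplies the fill requirement by 7.5, which is where careless alpha blending eats the budget.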

Draw calls are not a GPU issue; they are a matter of the CPU.
The fillrate problem is caused by people throwing in transparent surfaces to fake things instead of doing it right: for example, 2D sprite games that use a quad plus alpha instead of more approximative meshes that would remove the wasted draw altogether.

2D games in the days of "desktop GPUs as fast as current mobile GPUs" (such GPUs never existed, because the GPUs with this performance had no shader support, nor support for blending on the level of the ES 1.1 combiners, which only makes the fillrate problem worse) were all done purely on the CPU, with no 3D acceleration at all.

The high-performance, dense 2D games from the iPhone 3G days were done purely on the CPU too, like Payback, which is a full-scale software renderer (the game was ported from mobile platforms with no GPU at all), and the same holds for many others.

I'm mentioning 2D because 2D and 2.5D are the prime examples of overdraw-caused fillrate problems: Cocos2D, SpriteManager2, the Corona SDK, and iT2D all have this very problem. They all throw quads with alpha pixels at the GPU and basically wait for it to choke and throw up. Particle systems are another prime example, as nearly all of them use quads to render the particles, which, together with the alpha-blend materials, will never come for free. But if you make the effects with either small particles or low particle numbers and good textures, they aren't killing anything.
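A rough way to see why small or few particles are survivable (screen and quad sizes below are assumed, purely for illustration): every alpha-blended quad pays full fill cost for all its pixels, transparent texels included.

```python
# Rough overdraw estimate for a quad-based particle system (assumed
# sizes, for illustration only). Every alpha-blended quad pays full fill
# cost for all of its pixels, transparent texels included.

def particle_overdraw(screen_px, num_particles, quad_px):
    """Average extra times each screen pixel gets filled by particles."""
    return num_particles * quad_px / screen_px

screen = 480 * 320                                     # 3GS-class screen
few_big   = particle_overdraw(screen, 20, 128 * 128)   # 20 large quads
many_tiny = particle_overdraw(screen, 200, 16 * 16)    # 200 small quads
print(f"{few_big:.2f}x extra fill")    # 2.13x
print(f"{many_tiny:.2f}x extra fill")  # 0.33x
```

A handful of big blended quads can cost more fill than ten times as many small ones, which matches the advice above: small particles, low counts, good textures.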

Fillrate problems are nearly always caused by bad to very bad use of transparency and a lack of optimization on that end, often a direct consequence of the developers' lack of experience and knowledge, or a lack of time to do it right even though they would know how...

Look at Mika Mobile for an example of how to do it right. They use meshes that follow the outline of their 2D art, not quad sprites with lots of wasted transparent pixels that still need to be drawn.

Digital Ape (Moderator):

The iPhone 4 GPU is the same as the iPhone 3GS's, but it has 4x as many pixels to drive. The Dreamcast never ran at 960x640....

--Eric


Amen.

Also, it never ran with anything except the equivalent of OpenGL ES 1.1, and it also didn't, in actual fact, spray particles all over the screen. Its strength was that it didn't overdraw opaque polygons. You can do more on the iPhone than you could ever do on the Dreamcast. Let's not forget that Quake 3 ran like crap on it, at a lower resolution, at 15-20 fps. And guess what? Quake 3 runs about the same on the 3G (if you compile the source).

As someone who was developing games for the class of video cards you're comparing these to, I definitely had fillrate problems on them. We had to use a lot of clever tricks to make things run without slowdown back then. If anything, I'd say people are just spoiled these days, because they rarely run into problems on desktop hardware. That "people" includes me too, by the way.

Actually, one interesting thing to note is that old GPUs such as the TNT series were built with very specific limitations for everything: a max polygon count, a max particle count, etc. Only with the release of the GeForce 8xxx series, I believe, did GPUs start to be able to dedicate their resources to whatever was needed. It was the birth of stream processors, which let the GPU dedicate any of its resources to anything you need. Now, if you want particles all over the screen, the GPU can dedicate all its power to that. Back in the day, you had to optimize games to utilize every aspect of the GPU as well as possible; the unused part just went unused. Go back and play some of these games and I bet you can spot the tricks used: maze-like environments that allowed only small numbers of enemies, items, and walls to be rendered; tons of baked lighting; bounding-box collisions only. Ah, back in the day. Modern GPUs that can allocate resources to anything are much nicer than what GPUs used to be. (Not sure if mobile GPUs still have this limitation or not.)

I would have hated to develop games in those days. If you want to use an old engine to see what it was like, check out Reality Factory. I learned to make games on it and lived with its horrible engine limitations for years, because it was the only free option for making games while I was in high school that didn't involve excessive code writing. I also learned about the entire process of making games from playing with it. RF runs on the Genesis3D engine, which originated back in 1998. Poke around with it for a while if you want; it will let you experience how difficult things were to do back then. I have great respect for all who made games before "modern times" because of my experience with Genesis and RF.

One should also mention that alpha tests are this bad because Imagination Technologies (the company behind all the PowerVRs) decided to put the alpha test very late in the graphics pipeline, contrary to other GPU designs, and omitted to change the Z-testing order in the pipeline when alpha testing. Older hardware did. As the quote below says, it could be due to shaders becoming longer and longer over time:

* PowerVR hardware - it's bonkers (in a good way), don't even try to understand it. There is a speed hit from alpha-testing, but it's purely relative - their effective non-alpha-test fillrate is so good anyway that the speed hit simply brings it down to normal levels. I never noticed any actual problems anyway.

* Old "conventional" hardware. For this stuff, if you're not alpha-testing, the pipeline goes:

- Z read
- Z reject
- Z write
- Pixel process
- Alpha-blend
- FB write

When you turn alpha-test on, the Z-write has to go right to the end of the pipeline, because the alpha-test can disable the Z write on a per-pixel basis. So we nominally have:

- Z read
- Z reject
- Pixel process
- Alpha-test
- Alpha-blend
- Z write
- FB write

But this means sending the Z-buffer information all the way from the top of the pipeline to the bottom of the pipeline, which is mucho extra gates. So generally what older hardware does is move the whole Z unit to the end (only when alpha-testing):

- Pixel process
- Alpha-test
- Z read
- Z reject
- Z write
- Alpha-blend
- FB write

However, now this means you don't get any early Z rejection - you're reading and shading all texels, even if they are Z-buffered away.

* New conventional hardware. Obviously you don't want to do the above, especially as shaders get longer. So there's a whole bunch of hacks and workarounds - not sure how much I can say, though, because of NDAs and whatnot. Note that these hacks should work moderately well for alpha-test and texkill, but pretty much nothing will help performance when outputting depth from the shader - it's just inherently slow.
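A toy cost model of the two pipeline orders described in the quote (illustrative only, hypothetical fragment data): with early Z, occluded fragments never reach the expensive pixel shader; with the whole Z unit moved after shading, every fragment is shaded.

```python
# Toy cost model of the two pipeline orders in the quote (illustrative):
# early Z rejects occluded fragments before the (expensive) pixel shader;
# with the Z unit moved after shading, everything is shaded first.

def shader_invocations(fragments, early_z):
    """fragments: (depth, passes_alpha_test) in draw order, one pixel."""
    nearest = float("inf")
    shaded = 0
    for z, passes_alpha in fragments:
        if early_z and z >= nearest:
            continue                  # rejected before the shader runs
        shaded += 1                   # shader + texture fetch happen here
        if z < nearest and (early_z or passes_alpha):
            nearest = z               # late Z unit only writes Z when
                                      # the alpha test passed
    return shaded

frags = [(0.3, True), (0.6, True), (0.8, False)]  # drawn front to back
print(shader_invocations(frags, early_z=True))    # 1
print(shader_invocations(frags, early_z=False))   # 3
```

Same fragments, same result on screen, three times the shading work once alpha-test forces the late Z order.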

The limitations of each generation are built in intentionally, mainly for two reasons:

a) They force you to think about what you're trying to do; it's a creative learning process.
b) Some mysterious creatures gain their power from all the time developers spend trying to figure out alternative ways to optimize their work, a little like in Momo.

The fillrate problem is caused by people throwing in transparent surfaces to fake things instead of doing it right: for example, 2D sprite games that use a quad plus alpha instead of more approximative meshes that would remove the wasted draw altogether.


Could you please explain a little what you mean by more approximative meshes? Say I try to render a smiley. Do you suggest that using an opaque shader and a disc mesh is more efficient than using a transparent shader and a quad? And are you basically saying to make a mesh that outlines the shape we wish to display?

Particle systems are another prime example, as nearly all of them use quads to render the particles, which, together with the alpha-blend materials, will never come for free. But if you make the effects with either small particles or low particle numbers and good textures, they aren't killing anything.


So is the number of pixels that transparent items occupy proportional to the performance penalty we get? Or is the number of transparent pixels that transparent items contain the problem?

Fillrate problems are nearly always caused by bad to very bad use of transparency and a lack of optimization on that end, often a direct consequence of the developers' lack of experience and knowledge, or a lack of time to do it right even though they would know how...


I fall into the category of "lack of experience and knowledge". Could you please suggest some guidelines or refer me to some sources that explain how this is done properly?
Thanks for the information,
Ippokratis

Hi,
This thread is an interesting read.
Could you please explain a little what you mean by more approximative meshes? Say I try to render a smiley. Do you suggest that using an opaque shader and a disc mesh is more efficient than using a transparent shader and a quad? And are you basically saying to make a mesh that outlines the shape we wish to display?


Yes, I think that's what he meant. Currently, it seems using transparent textures causes more fillrate/overdraw issues than using more polygons. That's why I found it ironic from a traditional perspective, where you would traditionally use a 2D transparent-texture stand-in (an impostor system), which was more efficient than using geometry.

So is the number of pixels that transparent items occupy proportional to the performance penalty we get? Or is the number of transparent pixels that transparent items contain the problem?


It's both, I think. Let's say you have a large transparent textured object in the foreground, then another transparent textured object in the background, then another one. The problem, as n0mad found in a gamedev.net thread, seems to stem from PowerVR's pipeline design, where alpha testing is done at a later stage.

I fall into the category of "lack of experience and knowledge". Could you please suggest some guidelines or refer me to some sources that explain how this is done properly?
Thanks for the information,
Ippokratis


Generally, don't use transparent textures/shaders (ones that use blending or alpha testing) if you can avoid it. So a large column of smoke particles is probably something you don't want to do, because it's multiple transparent textures constantly spawning.

As stated in the iOS Developer Library documentation:

"Tile-Based Deferred Rendering

The PowerVR SGX uses a technique known as tile based deferred rendering (TBDR). When you call OpenGL ES functions to submit rendering commands to the hardware, those commands are buffered until a large list of commands are accumulated. These commands are rendered by the hardware as a single operation. To render the image, the framebuffer is divided into tiles, and the commands are drawn once for each tile, with each tile rendering only the primitives that are visible within it. The key advantage to a deferred renderer is that it accesses memory very efficiently. Partitioning rendering into tiles allows the GPU to more effectively cache the pixel values from the framebuffer, making depth testing and blending more efficient.

Another advantage of deferred rendering is that it allows the GPU to perform hidden surface removal before fragments are processed. Pixels that are not visible are discarded without sampling textures or performing fragment processing, significantly reducing the calculations that the GPU must perform to render the tile. To gain the most benefit from this feature, draw as much of the frame with opaque content as possible and minimize use of blending, alpha testing, and the discard instruction in GLSL shaders. Because the hardware performs hidden surface removal, it is not necessary for your application to sort primitives from front to back.

Some operations under a deferred renderer are more expensive than they would be under a traditional stream renderer. The memory bandwidth and computational savings described above perform best when processing large scenes. When the hardware receives OpenGL ES commands that require it to render smaller scenes, the renderer loses much of its efficiency. For example, if your application renders batches of triangles using a texture, and then modifies the texture, the OpenGL ES implementation must either flush those commands immediately or duplicate the texture. Neither option uses the hardware efficiently. Similarly, any attempt to read pixel data from the framebuffer requires that preceding commands be processed if they would alter that framebuffer."

And from my limited understanding, I think the difference in PowerVR's architecture, which uses tile-based deferred rendering (TBDR) with hidden surface removal (HSR) as opposed to a traditional immediate-mode renderer (IMR), means you get penalties somewhere for using transparent objects/shaders.

Could you please explain a little what you mean by more approximative meshes? Say I try to render a smiley. Do you suggest that using an opaque shader and a disc mesh is more efficient than using a transparent shader and a quad? And are you basically saying to make a mesh that outlines the shape we wish to display?


Correct.
Don't use a mesh that draws primarily transparent pixels.

Even if you use the disc, you can still use alpha. In normal cases where overdraw hits, you still save up to 50% of the overdraw, and if you do it really well or use RageSpline, you can pull it off at 0% overdraw.
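To put rough numbers on the quad-versus-tight-mesh point, here is a sketch for a hypothetical circular sprite: a bounding quad fills the wasted corner texels, while a circumscribed octagon hugs the disc much more closely.

```python
# Hypothetical circular sprite: how much of the filled area a bounding
# quad wastes on fully transparent texels, versus a tight octagon mesh
# circumscribing the disc (the "approximative mesh" idea).

import math

def wasted_fraction(mesh_area, visible_area):
    """Share of filled pixels that are transparent waste."""
    return 1.0 - visible_area / mesh_area

r = 1.0
disc    = math.pi * r * r                      # the visible smiley disc
quad    = (2.0 * r) ** 2                       # bounding quad sprite
octagon = 8.0 * r * r * math.tan(math.pi / 8)  # circumscribed octagon

print(f"quad:    {wasted_fraction(quad, disc):.0%} wasted")     # 21%
print(f"octagon: {wasted_fraction(octagon, disc):.0%} wasted")  # 5%
```

How much a tight mesh saves depends on the sprite's shape; for shapes with large empty regions (flames, characters with limbs) the savings are far bigger than for a disc.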

So is the number of pixels that transparent items occupy proportional to the performance penalty we get? Or is the number of transparent pixels that transparent items contain the problem?


No. The number of pixels you draw defines how many frames per second you can draw. Fillrate is a fixed number, an absolute maximum: FrameRatePerSecond * PixelsPerFrame = Fillrate. This means that if you waste more pixels per frame on alpha overdraw, you consequently just get fewer frames per second.
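The identity above can be rearranged to show the effect directly; the fillrate figure below is an assumed budget for illustration, not a measured spec.

```python
# FrameRatePerSecond * PixelsPerFrame = Fillrate, rearranged: with a
# fixed fillrate budget, every extra layer of overdraw divides the
# achievable frame rate. The fillrate number here is hypothetical.

def max_fps(fillrate, width, height, overdraw):
    """FrameRatePerSecond = Fillrate / PixelsPerFrame."""
    return fillrate / (width * height * overdraw)

FILLRATE = 250e6  # pixels per second, assumed budget
for od in (1, 2, 4):
    print(f"{od}x overdraw -> {max_fps(FILLRATE, 960, 640, od):.0f} fps")
```

Doubling the overdraw halves the frame-rate ceiling, no matter how fast the GPU is.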

I fall into the category of "lack of experience and knowledge". Could you please suggest some guidelines or refer me to some sources that explain how this is done properly?


Any older book on how to develop games and game art efficiently covers it, and those targeted at mobile gaming even more so.
There are no direct sources to learn it from, as it is always an act of balancing, which also means it's a process of achieving the best performance over time by polishing and enhancing things within the game and deciding what you can omit without sacrificing enjoyment and visuals. It's nothing you can just define up front and hope works out, not unless you develop only for desktop.

So no need to worry that you fall into this category. It's a constant process of learning anyway; there is no one truth, only an "optimal enough result to support the vision and enjoyment the game is meant to offer".
