Some time ago I saw a game posted on glslsandbox which was quite simple, but ran entirely in a single shader. The game state was saved in the same texture that was displayed.

I've been thinking about porting the Flash game Creeper World to run almost completely on the GPU.

In other news: I got rid of the 12 additional bytes in the transform feedback version; performance is now 3.55 million particles with transform feedback, up from 3.00 million. I haven't ported the SLI version yet, but it'd most likely hit 7 million particles at 60 FPS.

GPUs perform very poorly at heavily-branching code. If you could get any kind of general-purpose VM running on a GPU at all, I imagine it would perform very poorly on any code that wasn't already suited for GPU execution, i.e. heavily vectorized algorithms.

To celebrate my exams being over and the start of winter break (well, okay, I do have a basic Java exam left.) I decided to create a new particle engine thingy. I've been working all day, but I finally got it done! It's a collection of old things I've posted here plus a few new features!

- It's now completely in 3D with particles bouncing around inside a huge box.
- Like before, updating is done using OpenGL transform feedback. Nothing new here.
- The particles are rendered as billboarded sprites using a geometry shader. The 4 vertices are generated in eye space and then sent through the projection matrix.
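The billboarding step can be sketched on the CPU for illustration (the real work happens in the geometry shader; the class and method names here are hypothetical, not from the actual engine): each particle's eye-space position is expanded into four quad corners offset along the eye-space X and Y axes, which are then fed through the projection matrix.

```java
// CPU sketch of the eye-space billboarding the geometry shader performs.
// Given a particle's eye-space position and a half-size, emit 4 corners
// offset along the eye-space X/Y axes; projection is applied afterwards.
public class Billboard {
    // Returns 4 corners as a 4x3 array: bottom-left, bottom-right,
    // top-left, top-right (a triangle-strip order).
    public static float[][] corners(float ex, float ey, float ez, float halfSize) {
        return new float[][] {
            { ex - halfSize, ey - halfSize, ez }, // bottom-left
            { ex + halfSize, ey - halfSize, ez }, // bottom-right
            { ex - halfSize, ey + halfSize, ez }, // top-left
            { ex + halfSize, ey + halfSize, ez }, // top-right
        };
    }
}
```

Since the offsets are applied in eye space, the quad always faces the camera regardless of its orientation.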

Nothing really huge here, just pretty much a mix of old stuff. The new feature is particle SORTING! I implemented a (very inefficient) radix sort using transform feedback to sort particles based on their depth. My algorithm needs 2 passes per bit of depth precision, meaning that for 24-bit integer depth I need 48 passes over the particles! Shit!

To combat this I decided to also do frustum culling when calculating the depth of each particle. That means that only particles that are actually on screen get sorted; all of them are still updated, of course. This gave me a big performance boost when only a small number of particles are visible, but that's kind of cheating... =S

Anyway, I'm getting 200k sorted particles at 63 FPS (one GPU) at the moment. The particle culling is extremely efficient, improving FPS to 600 when no particles are visible (it's still updating them too). Using OGL4 I could reduce the number of passes needed to sort by a factor of 4 or even 8 at the cost of a small amount of video memory, but for now I'm stuck on OGL3. If anyone knows a more efficient sorting algorithm available in OpenCL or something like that, I'd love to hear about it!

It's still too unoptimized. I need to find an algorithm that doesn't require so many passes... Right now it seems to be a lot slower than Arrays.sort() on the CPU in raw sorting performance. If I disable culling, rendering and updating, I can sort 450 000 particles with 18 bits of accuracy at 60 FPS (the algorithm scales linearly with the number of objects), so around 27 000 000 sorted particles per second. That's compared to real OpenCL sorting libraries which claim performance closer to a billion 32-bit keys sorted per second. I just have no idea how to implement this with OpenCL...

The algorithm also does not sort the particles themselves; it sorts their indices by their distance (8-byte keys). Since I have so many particles I have to use 32-bit indices. It turns out that randomly indexing into the particle array is a lot slower than just drawing them all sequentially. Sorting indices makes it possible to copy around less data when sorting, but it might be a good idea to actually reorder the particle buffer too. Since the order changes very slowly between frames, the indices would then stay essentially sequential, so few particles would move far enough to cause a cache miss.
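The index sort above is a least-significant-bit radix sort, which can be sketched on the CPU like this (class and method names are made up for illustration; the GPU version does each bit's stable partition with transform-feedback passes, one per bucket):

```java
// CPU sketch of a binary LSD radix sort over particle indices keyed by
// integer depth. One stable partition per bit: zero-bit indices first,
// then one-bit indices. On the GPU with single-stream transform feedback
// each bit costs two passes (one per bucket), hence 48 passes for 24 bits.
public class DepthRadixSort {
    public static int[] sortIndices(int[] depths, int bits) {
        int[] idx = new int[depths.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        int[] tmp = new int[idx.length];
        for (int bit = 0; bit < bits; bit++) {
            int n = 0;
            // stable partition on the current bit
            for (int i : idx) if (((depths[i] >> bit) & 1) == 0) tmp[n++] = i;
            for (int i : idx) if (((depths[i] >> bit) & 1) == 1) tmp[n++] = i;
            int[] t = idx; idx = tmp; tmp = t; // ping-pong buffers
        }
        return idx;
    }
}
```

The ping-pong buffer swap mirrors how transform feedback alternates between two vertex buffers each pass.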

It doesn't look that impressive in still images either. The coolest part is when I move the camera through that smoke cloud, and I can literally only see a few meters ahead. Without sorting, the cube of smoke looks hollow since you get the illusion that you can see inside it due to the incorrect blending of the particles. Sorting will also enable correct blending of things like fire and smoke.

If they are aligned to the camera orientation, then rotating the camera changes the sort order drastically, as the render order is determined by the (infinite line)-point distance, not the real distance to the camera.

If they are facing the camera (perpendicular to the view direction), you indeed have a relatively stable order, but then the sprites will intersect each other.


I did GPU sorting a while back and radix sort is the way to go; there is an open-source OpenCL implementation which is really fast. The fastest at the time was the CUDA implementation from the Nvidia SDK.

But for that many particles, try some iterative sorting algorithm (bubble sort) and do only a few passes each frame. You won't have perfect sorting at every frame, but it is a very good 99% solution.
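The incremental approach suggested here can be sketched as an odd-even (bubble-style) sort where only a few passes run per frame (names are illustrative, not from any actual engine): each frame the order gets a little closer to correct, which is good enough when particle depths change slowly.

```java
// Sketch of incremental sorting: run a few odd-even transposition passes
// per frame over an index array keyed by depth. The result is only
// approximately sorted on any single frame but converges over frames.
public class IncrementalSort {
    public static void passes(float[] depths, int[] idx, int passCount) {
        for (int p = 0; p < passCount; p++) {
            int start = p & 1; // alternate even/odd phases
            for (int i = start; i + 1 < idx.length; i += 2) {
                if (depths[idx[i]] > depths[idx[i + 1]]) {
                    int t = idx[i]; idx[i] = idx[i + 1]; idx[i + 1] = t;
                }
            }
        }
    }
}
```

With n passes per frame, an n-element array is guaranteed fully sorted after one frame; fewer passes trade exactness for speed.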

Also, if your particles are static then you have only a fixed number of orderings (2D: 4, 3D: 12).

Well, they're separate problems. I currently compute depth the same way as the depth is usually stored, so depth indeed depends highly on the orientation of the screen. It'll be easy to change it to use eye-space distance instead which would remain stable no matter the camera's orientation. How I render them to make them look good is a different problem. =S
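The two sort keys being discussed can be put side by side (a toy sketch with made-up names): projected depth is a dot product with the view direction and changes as the camera rotates, while eye-space distance is rotation-invariant.

```java
// The two candidate sort keys: depth along the view direction (rotation-
// dependent) vs. plain Euclidean distance to the camera (rotation-stable).
public class SortKey {
    // Depth of the particle along the camera's view direction.
    public static float projectedDepth(float[] viewDir, float[] toParticle) {
        return viewDir[0] * toParticle[0]
             + viewDir[1] * toParticle[1]
             + viewDir[2] * toParticle[2];
    }

    // Eye-space distance: unchanged by camera rotation.
    public static float distance(float[] toParticle) {
        return (float) Math.sqrt(toParticle[0] * toParticle[0]
                               + toParticle[1] * toParticle[1]
                               + toParticle[2] * toParticle[2]);
    }
}
```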

Ah, yes, that site. Funny that he posted that just when I did my sorting stuff. =S I've been monitoring his blog for TXAA info and he recently deleted everything concerning TXAA (10+ posts), so I was worried that he had been fired or something. ^^' We'll see what he comes up with...

How about adding features like environment lighting via cubemaps, dynamic lighting, self-shadowing, casting shadows, or dynamic force fields? Then you get to the point where the feasible number of particles is much lower than what you currently use, and that changes some things radically. Also, you can usually cull whole emitters first and sort particles locally: instead of sorting all particles, you first sort emitters and then sort particles per emitter.

I'm currently trying to figure out how to get environment + dynamic lighting and simulate vortices with a couple thousand particles in a mobile title, so it's GLES 2.0 only; shadows are a no-go, but luckily the game doesn't even need those.

My idea is to just dump all kinds of particles into a huge list and update them on the GPU. The update shader is an uber-shader which allows for lots of particle types, including emitter particles etc. Since the particles are sorted by distance, they will be somewhat grouped together by type since they're emitted from the same place so the branching will be relatively cheap. It's also worth noting that transform feedback has relatively high bandwidth cost for each vertex processed, so adding more work to the update shader won't affect performance at all in my tests. If I'm right, multiple vertex streams will also allow me to do frustum culling (just 6 dot-products) for multiple lights in one pass and output the indices to other buffers in the same pass.
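The per-particle frustum test mentioned here (6 dot products) can be sketched like this (class name and plane layout are illustrative assumptions): a particle is visible if it lies on the inner side of all six frustum planes, and each plane test is a single dot product plus an offset.

```java
// Sketch of point-vs-frustum culling: 6 plane tests, one dot product each.
// Each plane is stored as (nx, ny, nz, d) with the normal pointing into
// the frustum; a point with bounding radius r is kept if
// dot(n, p) + d >= -r for all six planes.
public class FrustumCull {
    public static boolean visible(float[][] planes,
                                  float px, float py, float pz, float radius) {
        for (float[] pl : planes) {
            if (pl[0] * px + pl[1] * py + pl[2] * pz + pl[3] < -radius) {
                return false; // outside this plane: cull
            }
        }
        return true;
    }
}
```

In the shader this maps to six `dot()` calls against plane vectors passed as uniforms, so the test is cheap enough to run per particle, per light.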

My current sorting algorithm is an abomination and I really need a more optimized one. To still do this with transform feedback (instead of, for example, OpenCL or CUDA) I really need support for multiple vertex streams = OGL4 to direct particles into buckets. I'm currently forced to do one pass over all visible particles for each bucket, which means that I have to do twice as many passes as the bit precision of the depth. With multiple vertex streams, I could sort 4 bits per pass using 16 buckets and reduce the number of passes from 48 to 6 for 24 bits of depth, or use just 4 buckets and get it done in 12 passes (if VRAM is a limitation). Like I said before, transform feedback is very memory limited, so this is the main bottleneck at the moment.
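The pass counts in this trade-off follow directly from the bucket count (a small sketch with made-up names): each pass sorts log2(buckets) bits, and without multiple vertex streams every digit pass must be repeated once per bucket.

```java
// Pass count for an LSD radix sort on the GPU:
//   digit passes = ceil(depthBits / log2(buckets))
// With multiple vertex streams (OGL4), one pass handles all buckets;
// with single-stream transform feedback (OGL3), each digit pass must be
// repeated once per bucket.
public class RadixPasses {
    public static int tfPasses(int depthBits, int buckets, boolean multiStream) {
        int bitsPerPass = 31 - Integer.numberOfLeadingZeros(buckets); // log2, power-of-two buckets
        int digitPasses = (depthBits + bitsPerPass - 1) / bitsPerPass;
        return multiStream ? digitPasses : digitPasses * buckets;
    }
}
```

This reproduces the numbers above: 2 buckets without streams gives 48 passes for 24 bits, while 16 or 4 buckets with streams give 6 or 12.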

I've taken a look at Fourier opacity mapping, and it seems to be an excellent way of doing particle shadowing and self-shadowing. Performance seems good since the resolution of the map can be kept very low while still giving a very good look thanks to the blurry nature of particles. The particles also do not have to be sorted when rendering the opacity map. My only problem is that I have absolutely no idea how it works. That kind of math goes waaaaay over my head. It's definitely somewhere on my todo list though.

Since my current particles are meant to simulate smoke, I also looked into fill-rate limitations. To get good-looking smoke you need a lot of overdraw, and with current 2-megapixel screens that becomes very expensive. Some games render the particles at half resolution to reduce the number of pixels drastically, then use a special upsampling filter to preserve sharp edges. Although the particles get slightly blurry, there's not much of a difference since particle effects are inherently blurry. The only artifacts possible are single-pixel errors that won't be visible at all. For a 4x reduction in overdraw I'd say it's definitely worth it.

http://www.bungie.net/Inside/publications.aspx
In the "Blowing S#!t Up the Bungie Way" paper they present some nice graphics tricks used in Halo 3. There is a nice tiling plate-texture animation trick that brings more life to particles and can help you reduce particle counts. Basically, there is a bigger tiling texture that has some shape in it, and on top of that they swim the actual particle texture by animating the UVs. It seems so simple yet effective.

Another good trick is to use a grayscale texture and palettize it with a 1x256 texture. This can save some bandwidth, reduce texture-packing artifacts, and give a lot more variation than simply using a tint color. An addition to this technique would be to pack albedo, spec mask and alpha into one texture. This should give enough variation that you could render liquids and gases with the same shader. Using world-space normal maps also works like a charm with cube mapping.
