My first one was a CPU-based particle engine which I worked on for quite some time. I tried most things that could be tried with it and I have 9 working versions of it at the moment. The oldest one can only handle a measly 360 000 particles; it's plagued by garbage collection overhead, linked particle lists and ByteBuffer loading. The newest one features multithreading for any number of cores, MappedObject particles to avoid an extra data copy, and other particle handling optimizations. Now, when my laptop broke I had it upgraded with another 4 GB of RAM. The exact same old stick is still in it, but a new one with the same timings and speeds has been added, meaning that it can now use dual channels. Heh, ever heard of dual channel making a difference? Well, now you have. This particle engine went from 1 100 000 particles to 1 600 000 particles, since it was so memory bottlenecked. Memory bandwidth makes a HUGE difference here, it seems. The advantage of this one is the flexibility of the CPU: we can do collision detection against terrain or animate particles however we want.
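Such a CPU update loop can be sketched roughly like this (a hypothetical structure-of-arrays layout, not the actual engine's code): the per-particle work is little more than position += velocity, so the loop mostly streams through memory, which is why RAM bandwidth dominates.

```java
// Hypothetical structure-of-arrays particle store. The update is almost
// pure memory traffic: a handful of float reads and writes per particle.
public class CpuParticles {
    final float[] posX, posY, velX, velY;
    final float[] life; // remaining life in seconds; <= 0 means dead
    int count;          // number of live particles at the front of the arrays

    CpuParticles(int capacity) {
        posX = new float[capacity]; posY = new float[capacity];
        velX = new float[capacity]; velY = new float[capacity];
        life = new float[capacity];
    }

    // One simulation step. Dead particles are overwritten by the last live
    // one ("swap-remove"), keeping the arrays densely packed but NOT ordered.
    void update(float dt) {
        for (int i = 0; i < count; ) {
            life[i] -= dt;
            if (life[i] <= 0) {
                int last = --count;
                posX[i] = posX[last]; posY[i] = posY[last];
                velX[i] = velX[last]; velY[i] = velY[last];
                life[i] = life[last];
            } else {
                posX[i] += velX[i] * dt;
                posY[i] += velY[i] * dt;
                i++;
            }
        }
    }
}
```

The swap-remove trick keeps the loop dense without searching for dead slots, at the cost of particle ordering, which is exactly the limitation discussed further down.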

The second engine I made moved everything from the CPU to the GPU. All particle data was stored in textures, and one shader was used for updating particles and one for rendering them from the data stored in the textures. This was quite a bit faster, though the difference isn't as big as before my RAM upgrade. It could handle 2 100 000 particles, but it was also the most complicated of the particle engines I've made, using 3 shaders (the third one just for uploading new particles), float textures and instancing. It also requires OGL3+ for the 32-bit float textures. It's awfully complicated, but wins big in performance, especially since it leaves the CPU completely free for whatever else you need to get done.

The third engine was also a GPU implementation but used OpenCL for updating particles instead. This simplified lots of things, since I could just use a basic VBO and update the data in it instead of having to use textures. It simplified a lot (well, except for the fact that I had to learn OpenCL) and had identical performance to the OpenGL one. This is very interesting in itself: despite the overhead of the shaders and textures, OpenGL had IDENTICAL performance to OpenCL, which is MADE for computing ---> OpenGL is optimized as hell! This one also requires an OGL3+ card since OpenCL requires it, but at least it is a bit less complicated (and less insane).

All three earlier engines had one big problem which limited their usability in a real game. They had excellent peak performance, meaning that they performed best when the number of live particles was close to the number of particles allocated. They all had particles stored all over a MappedObject, some textures or a VBO, meaning that it was difficult to find out which particles were alive each frame, and how many. All allocated particles had to be checked, updated and processed each frame, regardless of whether they were actually alive or not. Of course they bailed out early on dead particles, but the fact that they had to be checked at all was a serious performance problem. They also did not preserve the ordering of the particles, since they all had different tactics for avoiding having to find a dead particle to overwrite each time one was generated. These were really bad limitations, but I saw no way of solving them without severely impacting performance, by a factor of 10 or so, until...

Enter Transform Feedback! Transform feedback allows you to capture vertices before they're rasterized, letting you "render" vertices to a VBO. Its main use is processing expensive vertices (skinning, tessellation, animation) once and then rendering them multiple times, for example to a number of shadow maps and then to the screen, but it has a VERY interesting feature thrown in: it captures after the geometry shader! What this allows you to do is both generate new vertices and remove vertices in your geometry shader, and they will end up in your VBO in the same order that they were created! I have no idea what kind of black magic they're using to get it working, but particle engines must have been exactly what they had in mind when they added this!

Transform feedback is ridiculously easy to use and requires only a few lines of setup.
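That setup can be sketched in LWJGL-style Java like so (a non-runnable fragment: it assumes an existing OpenGL 3 context, a compiled update program, two VBOs to ping-pong between, and hypothetical varying names; the entry points themselves are the standard GL 3.0 ones):

```java
// Tell GL which geometry shader outputs to capture. Must be done
// BEFORE linking the program. "outPosition" etc. are placeholder names.
GL30.glTransformFeedbackVaryings(program,
        new CharSequence[]{"outPosition", "outVelocity", "outLife"},
        GL30.GL_INTERLEAVED_ATTRIBS);
GL20.glLinkProgram(program);

// Update pass: read particles from vbo[src] (assumed bound as vertex
// attributes), capture survivors into vbo[dst], skip rasterization.
GL11.glEnable(GL30.GL_RASTERIZER_DISCARD);
GL30.glBindBufferBase(GL30.GL_TRANSFORM_FEEDBACK_BUFFER, 0, vbo[dst]);
GL30.glBeginTransformFeedback(GL11.GL_POINTS);
GL11.glDrawArrays(GL11.GL_POINTS, 0, liveParticles); // survivors come out consolidated
GL30.glEndTransformFeedback();
GL11.glDisable(GL30.GL_RASTERIZER_DISCARD);
```

Each frame the source and destination buffers are swapped, so the consolidated output of one update becomes the input of the next.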

Magic! The particles are updated with a geometry shader, and if they die they are simply discarded; thanks to transform feedback, the output buffer is completely consolidated. When all old surviving particles are done, we just draw the new ones and they'll end up after the old ones. Transform feedback also has another godsent feature: glDrawTransformFeedback(). This function works like glDrawArrays(), but draws the number of vertices that transform feedback produced without forcing you to read the value back to the CPU (which would stall everything and kill performance). It can't possibly get easier than this. Draw a new particle to transform feedback and the engine automatically handles it until it dies. I mean, this is it!

Sadly, performance dropped a bit. This one only handles 1 200 000 particles. That's even less than the CPU implementation! Hopefully there's a solution though. Transform feedback isn't very flexible with its output types: only 4-byte ints and floats can be output from the shader, since those are the only two types that are supported. My earlier GPU implementations used 24-byte particles, storing color in 4 bytes and life+maxLife in 2 shorts. For my transform feedback test I simply made everything into floats, giving me 32-byte particles. That's a third more data! The only thing the shader does is pretty much position += velocity, so it was probably memory bottlenecked even before I made the particles bigger. I suspect that I can make it around 33.3% faster, to around 1 600 000 particles, by packing the data more efficiently.
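The byte math behind those numbers (the 24- and 32-byte totals are from the post; the exact field breakdown below is illustrative):

```java
public class ParticleLayout {
    // Packed layout of the earlier GPU engines.
    static final int PACKED_BYTES =
            4 * 4       // x, y, vx, vy as 32-bit floats
            + 4         // RGBA color packed into 4 bytes
            + 2 * 2;    // life + maxLife as two 16-bit shorts

    // Transform feedback only outputs 4-byte ints/floats, so every
    // field is widened to a full 4 bytes: 8 x 4 = 32 bytes.
    static final int FLOAT_ONLY_BYTES = 8 * 4;

    // If throughput is purely bandwidth-limited, the speedup from
    // packing is just the inverse of the size ratio.
    static double packingSpeedup() {
        return (double) FLOAT_ONLY_BYTES / PACKED_BYTES - 1.0;
    }
}
```

At a bandwidth-limited 1 200 000 particles, a 33.3% speedup lands right at the 1 600 000 estimate.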

Now I'm thinking about the tweet from Tiy (developer at Chucklefish, writing "Starbound") saying their engine would be able to render 10k particles.

Well, for 3D I'd need at least 8 bytes more for the additional dimension, and I'd also need a proper geometry shader to expand the points to quads. In the end it'll probably end up being fragment limited. You'd also probably want texturing, smooth particles (fade them out as they get closer to the ground to prevent a sharp edge) and maybe even lighting, in which case it'll get even more expensive. There are some tricks you can apply though, like rendering the particles at half or quarter resolution and then upscaling them with a smart filter. The blurriness is very hard to notice for smoke, fire, etc., and it reduces the fillrate needed by 4-16x. However, having 1M particles in a game isn't going to leave much CPU/GPU time left over for the actual game, is it? =S
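The 4-16x range is simply the square of the downscale factor, since fragment count scales with area. As a trivial sanity check:

```java
public class FillrateMath {
    // Rendering particles into an offscreen buffer scaled down by
    // 'factor' in each dimension cuts the shaded fragments quadratically.
    static int fillrateReduction(int downscaleFactor) {
        return downscaleFactor * downscaleFactor;
    }
}
```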

Worth mentioning is that I used GL_POINT_SMOOTH to anti-alias the radius-1 points I have. This basically causes each particle to cover 4 pixels instead of one and also increases the cost per pixel due to the coverage calculation. Blending was still left on though. The performance gain of disabling this differs widely between the 4 engines:

This is the cheapest possible particle we can draw, and doesn't mean much. Each particle covers 1 pixel which it just fills with a single color, so it's pretty much guaranteed not to be fragment limited at least. Let's keep GL_POINT_SMOOTH disabled and bump up the point size to 5, meaning each particle covers a square 5x5 pixel area:

The GL/CL version takes the biggest hit. It had the best vertex throughput, but when the bottleneck shifts to fragments they all approach the same performance. In this case, transform feedback wins since it is the most flexible one: it also preserves ordering and runs well with few particles too.

In the end I'd say that the CPU and the transform feedback versions are the most feasible in a real game, since they give you the most flexibility. The transform feedback version actually wins in performance since it runs solely on the GPU and leaves the CPU free for other things.

There's one final problem for the CPU version though: I'm rendering points! A real game would want to render textured quads! For something that was so RAM bandwidth limited, quadrupling the amount of data isn't a very good idea. Basically, we need a geometry shader (or instancing) to expand the points into quads on the GPU, and suddenly we lose the main advantage of the CPU engine since we need OpenGL 3! I think that might be what's holding back most in-game particle engines since they just don't want to lock themselves to OpenGL 3 hardware for some particles. Transform feedback doesn't have this problem since it requires OpenGL 3 in the first place.

Point is, you can't make a game which flat-out requires OpenGL 3. Not even the latest AAA games do that. But we talked about this of course ^^

No use writing a book in a language almost nobody speaks.

Quote

51% of the Minecraft user base have computers with graphics cards capable of OpenGL 3.0+.
38.8% of the Minecraft user base have computers with graphics cards capable of OpenGL 3.2+.
34.2% of the Minecraft user base have computers with graphics cards capable of OpenGL 3.3+.
19.6% of the Minecraft user base have computers with graphics cards capable of OpenGL 4.0+.

Yeah I know, I know. Just, if you want to sell it as a game, OpenGL 2 support would be better: you would sell MOAR

Okay... I have to correct myself now. I found the tweet from Tiy... he said their engine could handle 10k particles with 0 frame drop (he actually said "performance drop", but I think he meant frames...). Also, their engine (I'm pretty sure) uses OGL2, or even OGL1.?, and they have textured quads, just like in Terraria.

Another particle engine I've heard about - better: I've seen - is the one in Unreal Engine 4, which is really cool. But since you have DirectX there, and I'm sure they use DX11, it's harder to compare the two. But they actually handle some millions of particles in a fire-and-smoke animation, which cast shadows and are affected by lighting. THAT is crazy:

Lighting particles isn't much harder than lighting stuff without deferred shading. It's pretty inefficient though, so you usually put a maximum of, say, 4 lights or so that can affect each particle system and just calculate lighting and sample the shadow map. Making particles cast shadows isn't that hard either but you'll have to sort them by Z for each light which isn't exactly cheap. What's reeeaaally tricky is getting particles to cast shadows on other particles. There are ways of doing this for individual particle systems like they do in the UE4 video (Fourier Opacity Mapping for example), but I don't think that different particle systems can cast shadows on each other. I don't know, somebody's probably figured it out. =S

Anyway, particles casting shadows:
1. Render a shadow map of the normal geometry.
2. Render the particles sorted and depth tested against that geometry, with blending to determine how much light they block, and store that in a separate texture.
3. When sampling the shadow map, sample it as usual, but if it passes the depth test also read the other texture and modulate the light by that value.
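Step 2 of that recipe effectively accumulates, per shadow-map texel, how much light the particles in front of the geometry let through. Assuming multiplicative blending into a texture cleared to 1.0 (e.g. glBlendFunc(GL_ZERO, GL_ONE_MINUS_SRC_ALPHA), so dst = dst * (1 - srcAlpha)), the CPU-side equivalent is:

```java
public class ParticleShadow {
    // Remaining light after blending N particles with the given alphas
    // into a texel that starts fully lit (1.0).
    static float transmittance(float[] alphas) {
        float light = 1.0f;
        for (float a : alphas) {
            light *= (1.0f - a); // each particle blocks a fraction of what's left
        }
        return light;
    }
}
```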

Particles being shadowed by normal geometry:
1. Render shadow maps as usual.
2. Determine which lights affect the particle system and pick out the X lights affecting it the most.
3. Render the particles by sampling the depth buffer and calculating lighting for each particle.

Can you discuss a bit more about the texture-based particle system? I'm running OGL 2.1 and I'd like to try something like that myself thanks to GL_ARB_texture_float (float textures seem pretty widespread these days).

I suppose it's a 2D system, and you are using a 4-component texture to store (x, y, vx, vy)? How do you "write" new values to the texture? Shader + FBO? How are you rendering the particles? Something like your shader-based tile renderer -- a single quad across the screen? What about blending and overlapping of particle images? And would a geometry shader be moot for rendering, considering the sheer number of particles?

I need to pick your brain.

Theoretically it should be possible to implement with OGL 2.1, since FBOs are supported through extensions (DX9 hardware supports them, they just didn't make it into the OGL core until 3.0). I did indeed use an RGBA 32-bit float texture. Some OpenGL 2 hardware does not support bilinear filtering of 32-bit float textures, but that's not a problem since we don't need it. I used an RGBA32F texture for position and velocity, an RGB8 texture for color and an RG16 texture for current life and max life (alpha = current/max, that's why I needed both). I've heard that OGL2 hardware supports multiple render targets, but all the formats have to have the same number of bytes per pixel. I'd therefore recommend 2x RG32F for position and velocity and one RGBA16, with color packed in RG and life time in BA. All three would be 8 bytes per pixel, so it should work, assuming RG32F is supported by the extension.
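A quick check of the bytes-per-pixel claim (the format sizes are the standard OpenGL ones):

```java
public class FormatBytes {
    // Bytes per pixel for a texture format with the given channel count
    // and bits per channel.
    static int bytesPerPixel(int channels, int bitsPerChannel) {
        return channels * bitsPerChannel / 8;
    }
}
```

RG32F and RGBA16 both come out at 8 bytes per pixel, which is what makes them compatible as MRT attachments under the same-size restriction.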

For optimal particle uploading you'd need to keep track of which particle "slots" are empty in the textures, meaning that you'd have to update the particles on the CPU too. So I just hacked around this: I kept an index into the texture and simply uploaded new particles starting from that index. Assuming your particles do not have widely varying life times, the chance of a living particle being overwritten is minimal. I just generate a random particle on the CPU, upload it to a VBO and render it to all textures in one pass with an FBO and a pass-through shader that routes everything to the right textures.
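That "keep an index and wrap around" hack can be sketched like this (a hypothetical helper, not the actual code): new particles are written at a monotonically advancing slot, so a live particle only gets stomped if it outlives one full trip around the buffer.

```java
public class RingUploader {
    private final int capacity; // number of particle slots in the textures
    private int next;           // next slot to overwrite

    RingUploader(int capacity) { this.capacity = capacity; }

    // Returns the texture slot (e.g. pixel index) to upload the new
    // particle into, then advances, wrapping at the end.
    int allocateSlot() {
        int slot = next;
        next = (next + 1) % capacity;
        return slot;
    }
}
```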

Each particle is rendered as a GL_POINT by reading the particle data from the texture, so it's very different from my tile renderer.

Quote

And would a geometry shader be moot for rendering, considering the sheer number of particles?

On the contrary, I used a geometry shader to (at least try to) improve performance. We don't know which particles are alive, so we have to render a vertex for each allocated particle! That is the main drawback of this engine, but using a geometry shader we can at least discard (= not output) particles that are dead. Now that I think about it though, it might be faster to just ditch the geometry shader and cull dead particles by rendering them outside the screen. I was just interested in peak performance, so I never bothered to check.

The performance depends on the number of allocated particles more than on the number of alive particles.

Blending is easy, I just enabled it. =S It's worth noting that the particles are not kept in order, so blending which depends on the order of the objects will look very bad. Additive blending will work fine though.

If you want to render textured particles you could use point sprites, though you are still limited to a maximum point size of 64. I've also heard that they are buggy in some drivers, but I've never personally had a problem with them.

To summarize:
- It's possible with OGL 2.1.
- The particle data textures MIGHT have to have the same total number of bytes per pixel.
- Don't use a geometry shader when rendering particles. Instead cull dead ones by rendering them outside the screen.

In the end I strongly recommend against implementing it. It's complicated to implement and there's no real performance gain over CPU particles unless you constantly have millions of tiny tiny particles all the time. If your particles are fill-rate limited, this won't improve performance over CPU particles at all.

100 medals!!! I was actually at 99 until now, but I think it's showing 101 now, right? I think Riven explained that to me some time ago... =S

The forum allows you to give karma points to others. When these are awarded to a 'post' they are turned into a medal. Any other karma point is simply a database operation like user.karma++. This explains part of the discrepancy. The other part is explained by karma points that were awarded to posts back when there weren't medals yet.

Now get back to work!

Karma... isn't that what Buddhists have? You can have good and bad karma, and if you have an equal amount of good and bad karma, then you get taken to nirvana? Maybe I remember it wrong... I'll take a look at Wikipedia

Going from what I learned when I was a Buddhist, there's not so much "good" and "bad" karma as there's just karma, which is the cosmic force driving the whole cycle of life and death. The goal isn't about accumulating any type of karma, but escaping its influence altogether. So it's not so much about cancelling out any one kind of karma as it is about making it irrelevant.

I can see the philosophical connection between karma and particles (small details flying around and driving the universe), but can we please try to stay on topic?

Actually I'm going on a 2-day trip to a certain village, followed by returning home to Sweden next week, so I'll be pretty busy with lots of things... We'll see if I can find some time on Saturday though...

Hmmm... although having 1M+ particles is great, I think I would rather have 50k that look fantastic. From my very limited understanding, you want the best looking effect with the least amount of particles.

This is my current system that will be used in my game. It's in Java2D, so I think I'll opt for an unbelievably pathetic 1000-1500 particles.

Once it's rewritten in OpenGL I will go for 10-20k. Got to leave room for everything else.
