February 07, 2015, 03:35:10 am

So, I've been playing around with my Sphere SFML and ended up creating a rather fast particle engine by being slow. Sounds interesting, right? It's the tortoise vs. hare approach, and I'll show you how it's done.

So, first things first: we create a particle budget. This is the first trick of the system. We create a fixed-size array filled with dead particles. A dead particle has all the data fields it needs to do work, but is turned off once its action is complete. Usually when a particle fades out we can consider it dead, since you cannot see it anymore.
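A minimal sketch of such a budget might look like this (the names here are my own invention, not the actual engine code; the only detail taken from the posts below is that `time < 0` marks a particle as dead):

```javascript
// A fixed particle budget: allocate everything up front, all dead.
var MAX_PARTICLES = 1024;

function Particle() {
  this.x = 0;  this.y = 0;   // position
  this.vx = 0; this.vy = 0;  // velocity
  this.alpha = 0;            // opacity, used for the fade-out
  this.time = -1;            // time < 0 means the particle is dead
}

// The entire budget exists from frame one; nothing is ever allocated
// or freed after this, so the GC has nothing to collect.
var particles = new Array(MAX_PARTICLES);
for (var i = 0; i < MAX_PARTICLES; ++i) particles[i] = new Particle();
```

The point is that the array never grows or shrinks: "spawning" a particle later just means flipping one of these dead slots back on.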

There happen to be minor speed increases from caching functions like the above, but they seem to vary a lot depending on the engine used; this is just the safest, fastest approach. These particles are simple: they only fade out and move in a single direction. But they don't have to stay that way. We can always add more logic to make them more complex. It's really up to you.
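As an illustration of what that caching and the simple fade-out/drift behavior could look like (a sketch under my own naming assumptions; the `_cos`/`_sin` locals match the identifiers mentioned in the reply at the end of the thread):

```javascript
// Cache Math functions in locals once, up front. The gain is minor and
// engine-dependent, but it never hurts.
var _cos = Math.cos, _sin = Math.sin;

// Give a particle a velocity from a speed and an angle.
function setVelocity(p, speed, angle) {
  p.vx = speed * _cos(angle);
  p.vy = speed * _sin(angle);
}

// One per-frame step: drift in a single direction and fade out.
function updateParticle(p) {
  p.x += p.vx;
  p.y += p.vy;
  p.time -= 1;                            // below zero means dead
  p.alpha = Math.max(p.time / p.life, 0); // fade toward invisible
}
```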

Here is the good part. This is where dead particles get turned back on when an emitter emits particles. Rather than doing many O(n) lookups to find dead particles, we blindly loop through the array one particle at a time and turn them on. All emitters access the same global array in this same linear fashion, so all emitters are essentially sharing the same O(n) walk. The loop also skips over alive particles during a burst. It turns out that in practice alive particles rarely get stepped on, unless the buffer is full. Technically we could be more accurate here, but remember: on a full buffer that accuracy would mean repeatedly doing O(n) lookups and "busy-waiting" for a freshly dead particle, which is incredibly slow. I'd rather it step over alive particles while it continues to decrement the burst count, so that there is always a finite end to each burst (yes, it reduces accuracy, but in practice it's not bad, especially with very high particle budgets).

Another neat thing here is the modulus. It makes sure we wrap around the particle buffer freely and efficiently. Otherwise I'd have to add if statements, which are a tad more costly than a modulus and, again, depend heavily on the JS engine used. What about not wrapping at all (since wrapping really only happens for a fraction of the total size)? Well... we could forget about particles beyond the size limit and instead just skip the rest of the burst and reset the count. But then large particle bursts can look ugly if timed incorrectly, so for a gain in accuracy we use the modulus. Besides, not wrapping still takes if statements and other control logic, and so is slower.

Finally, the meat of the emitter is the "if (p.time < 0) p.setup(this)" statement. It tells that particular particle to come alive with parameters fed into it by the emitter in question. This maintains the static particle budget and completely stops GC passes from occurring. If we deleted and recreated particles each time, we would initially have a faster particle engine since we'd be maintaining smaller arrays, but then there would be an unforeseen cost in GC collection, plus constructor overhead as memory is acquired. The create/destroy approach is 100 times faster than this approach at small counts, but it scales horribly past the 500-particle mark. This static method scales beautifully even at 15000+ particles.
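Putting those three ideas together — the linear walk, the modulus wrap, and the `p.time < 0` revival test — the burst logic might look something like this (a sketch; `cursor`, `emit`, and the emitter's `life` field are names I've made up for illustration):

```javascript
// A small pool so the wraparound is easy to see in a test.
var MAX_PARTICLES = 8;
var particles = [];
for (var i = 0; i < MAX_PARTICLES; ++i) {
  particles.push({
    time: -1, // dead
    setup: function (emitter) { this.time = emitter.life; }
  });
}

var cursor = 0; // shared by every emitter; walks the pool linearly

function emit(emitter, burst) {
  while (burst-- > 0) {
    var p = particles[cursor];
    // The modulus wraps the cursor around the fixed-size buffer
    // with no branching.
    cursor = (cursor + 1) % MAX_PARTICLES;
    // Only revive dead particles. Alive ones are stepped over, but the
    // burst count still decrements, so every burst ends in finite time
    // even when the buffer is full -- trading a little accuracy for
    // never busy-waiting on a free slot.
    if (p.time < 0) p.setup(emitter);
  }
}
```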

So, to continue in the vein that slower is better, let's take a look at why. In a static system we must update and render all particles all the time, always checking whether they are alive or dead.

I could micro-optimize this, but the speed hit stays the same regardless: whether you draw 1 particle or the entire buffer, the script-side cost is nearly identical. So your game's base FPS will be lower with this method, but it'll end up steadily drawing 10000 particles where other methods crap out at 1000 or fewer, even if they can draw the first 50 thirty or more times faster than this method.
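The per-frame walk that makes the cost constant could be sketched like this (again with invented names; the real engine's update no doubt does more):

```javascript
// Every frame, touch every slot in the budget, alive or dead.
// The loop length never changes, so the script cost never changes.
function updateAll(particles) {
  for (var i = 0; i < particles.length; ++i) {
    var p = particles[i];
    if (p.time < 0) continue; // dead: pay only for the check
    p.x += p.vx;
    p.y += p.vy;
    p.time -= 1;
  }
}
```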

It just goes to show that an extremely high FPS is not always the answer to a fast 60 FPS game. What matters is a steady framerate that can hold up under larger loads.

Context: I'm currently creating a space-sim game in Sphere SFML (flexing my math muscles a bit), and the FPS in that game went from 3500 FPS with my old particle system to a more stable 1000 FPS with this one. At 100 particles the old system ran at 200 FPS versus 900 FPS on the new system. At 1000 particles the old system ran at 60 FPS versus 850+ on the new. The FPS only drops as more particles are rendered, but that's made up for by the fact that the JS cost is constant time, whereas before more JS was executed per particle, blowing up the complexity of the system. Past that point, hardware rendering becomes the next major bottleneck (which is rather good, so no problem there).

I have likewise used this approach to draw other things too, like a fixed-size spaceship pool, GUI element pool, etc. It really is a much faster approach, and probably closer to how games were originally coded on low-powered devices like the Game Boy, NES, etc.

Last Edit: February 07, 2015, 03:41:45 am by Radnen

If you use code to help you code you can use less code to code. Also, I have approximate knowledge of many things.

Good that you're realizing a stable framerate is better than a high framerate with drop-outs though. The tortoise-and-hare comparison is quite apt here (the tortoise keeps moving, if slowly, but the hare stops and takes a nap halfway through--i.e. lag--which costs him the race). I wish more game developers would realize this instead of just going "Hey look, my demo runs at 9,000 FPS with a thousand sprites on the screen!" but then it drops abysmally low in real-world use. A low framerate I can deal with (when I was a kid I played SNES games on a 486, so I'm rather tolerant of frame skipping), but intermittent lag is annoying.

I have likewise used this approach to draw other things too, like a fixed-size spaceship pool, GUI element pool, etc. It really is a much faster approach, and probably closer to how games were originally coded on low-powered devices like the Game Boy, NES, etc.

Indeed, which is why cheat devices like the Game Genie and GameShark worked so well--the memory budget on old consoles was so tight that everything was allocated a neat little fixed-size "box" in the same area of RAM every time. All the device had to do was continuously write to a known location in RAM, and voilà--infinite lives.

This makes both high and low framerates feel the same, even as low as 30 FPS. But with the old particle system, as more particles were used, the spaceship would 'jump' around the screen as if we had temporarily hit 2 FPS for a fraction of a second, causing a huge frame lag. With the new particle system the lag is far more consistent and smoothed out, so you barely notice it.

Yeah, time delta is by far the superior solution. You can always change the framerate throttle later (say, from 30 FPS to 60) without having to modify anything else, since your updaters aren't assuming a specific framerate. The only potential downside is that you need a fairly high-resolution timer (sub-millisecond, preferably); otherwise it falls apart at high framerates. Before I discovered Sphere, I used to have an engine that did exactly this, using the Win32 QueryPerformanceCounter() function. As this has sub-microsecond resolution, everything still worked even at 100k FPS (which, for the record, did happen, as I was using dirty rects). Not really sure now why I abandoned it...
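The idea of a delta-driven updater can be sketched in a few lines (hypothetical names; velocities are assumed to be in pixels per second, and in a real loop the delta would come from a high-resolution timer like the QueryPerformanceCounter() mentioned above):

```javascript
// Scale movement by elapsed time so the result is identical
// whether the game runs at 30 fps or 300.
function integrate(entity, dtSeconds) {
  entity.x += entity.vx * dtSeconds;
  entity.y += entity.vy * dtSeconds;
}

// Typical frame loop shape (illustrative only):
//   var last = now();
//   each frame: var t = now(); integrate(player, (t - last) / 1000); last = t;
```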

Like I said, a steady, consistent framerate is always better than a high framerate with intermittent lag. This is why most 3DS games run at 30fps even though the system is capable of doing 60--the console is actually pretty weak and the lag would be noticeable at 60fps.

This reminds me of how I wrote NPart for my NShoot demo, except I foolishly added and removed new particles as needed instead of simply caching the whole pool. I'm going to change NPart to use this method on my next edit, thanks!

I know you were giving simpler code for this example, but one thing I would change (though I don't immediately know the overhead cost) is your particle vx and vy calculation. You have parent.speed*_cos/_sin(parent.angle), where I would do parent.speed*parent.updateX/Y(parent.angle) or something (setting parent.updateX to Math.cos, etc.), for consistency and flexibility.
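In other words, something like this (a hypothetical illustration; I don't know the emitter's real field names):

```javascript
// Store the direction functions on the emitter itself so they can be
// swapped for any f(angle) -> number later (spirals, waves, etc.).
var parent = {
  speed: 2,
  angle: 0,
  updateX: Math.cos, // default to the usual trig pair
  updateY: Math.sin
};

function velocity(parent) {
  return {
    vx: parent.speed * parent.updateX(parent.angle),
    vy: parent.speed * parent.updateY(parent.angle)
  };
}
```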