The Particle Sessions : Part One, Is this it ?

Part One of the Particle Sessions kicks off by looking at a system similar to, but simpler than, this Bloom particle experiment (music made for it by the fantastic artist Rich Bologna, thank you again for it, Rich!). We’ll take slow steps through a series of optimizations and, by the end, be prepared for the next level. To get up to speed with how the video particles work, we’ll start from the beginning and work our way up. For that reason, we’ll start very simple, look at a basic unoptimized particle system with next to zero functionality, and modify the code for performance.

Basis : The Particle

The opening picture of this post is the album cover of The Strokes – Is This It. It being one of my favorite albums, I have listened to it many times, but never noticed what the cover really was. It’s amazing & inspiring what imagery one can get from science. Now put on your headphones and listen to that album while reading this post.

Over the last few years I’ve started enjoying introductory books about theoretical physics more and more. In complete honesty, it wasn’t even that long ago that I didn’t know that the Boson in the Higgs Boson wasn’t the other guy discovering it, next to Mr. Higgs. You know, like the Hale-Bopp comet was discovered by Hale & Bopp.

The simple core particle we will be using in this session gets its complexity from displaying large numbers of them, not from per-instance properties. To illustrate that, let’s start off with a simple particle object in AS3.
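A minimal sketch of such a particle object, assuming only the four properties named in this post (the zero defaults are an assumption):

```actionscript
// File: Particle.as
package {
    // Minimal particle: only the state needed to make it move.
    public class Particle {
        // Position in cartesian (absolute) coordinates.
        public var x:Number = 0;
        public var y:Number = 0;

        // Velocity in pixels per frame, also stored as x and y components.
        public var velX:Number = 0;
        public var velY:Number = 0;
    }
}
```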

As you can see, this particle object is pretty simple, and it contains the core properties we really need to make the particles move. Its position is stored in x and y, and its velocity in velX and velY. For the purposes of this post, that’s more than enough to do what we want.

I do want to point out two things about the simplicity of this. First of all, we express all values in absolute x and y. For the positions, this would be called a cartesian system. In some cases it might be handier to express velocity in polar coordinates, but for performance reasons we’ll keep it like this.

OOP (s)

It’s very common within the AS3 industry to “objectify” everything in nicely wrapped classes first and think about performance later. In the case of our particle and a classic “update” loop as seen in game engines, the most natural-feeling way to implement the particle’s movement is as follows.
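A sketch of that object-oriented version. The method name updateAndDraw is the one used later in this post; passing the BitmapData in as a parameter is an assumption.

```actionscript
// File: Particle.as
package {
    import flash.display.BitmapData;

    public class Particle {
        public var x:Number = 0;
        public var y:Number = 0;
        public var velX:Number = 0;
        public var velY:Number = 0;

        // Called once per frame by the main loop: move, then plot one white pixel.
        public function updateAndDraw(target:BitmapData):void {
            x += velX;
            y += velY;
            // setPixel does its own bounds checking, so offscreen particles are simply not drawn.
            target.setPixel(int(x), int(y), 0xFFFFFF);
        }
    }
}
```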

This is a very logical thing to do and is suitable for most things. Especially in larger code bases, it certainly helps readability. It also matches many things you’ll see in OOP and AS3 books, like car.drive() or lasership.shoot() or avatar.updatePosition(). From a pure OO perspective, it makes sense. From a performance perspective, though, it isn’t as good. We’ll tackle that in a bit.

Main

Now, let’s say we’d have a very simple Main class to run this system, as such :
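A sketch of what such a Main class could look like, storing the particles in a plain Array. The stage size, particle count and random start velocities are assumptions chosen for illustration.

```actionscript
// File: Particle.as
package {
    import flash.display.BitmapData;

    public class Particle {
        public var x:Number = 0;
        public var y:Number = 0;
        public var velX:Number = 0;
        public var velY:Number = 0;

        public function updateAndDraw(target:BitmapData):void {
            x += velX;
            y += velY;
            target.setPixel(int(x), int(y), 0xFFFFFF);
        }
    }
}

// File: Main.as
package {
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.events.Event;

    public class Main extends Sprite {
        // Assumed values, chosen only for illustration.
        private static const WIDTH:int = 500;
        private static const HEIGHT:int = 500;
        private static const NUM_PARTICLES:int = 1000;

        private var bitmapData:BitmapData;
        private var particles:Array;

        public function Main() {
            bitmapData = new BitmapData(WIDTH, HEIGHT, false, 0x000000);
            addChild(new Bitmap(bitmapData));

            // All particles start in the centre with a small random velocity.
            particles = new Array();
            for (var i:int = 0; i < NUM_PARTICLES; i++) {
                var p:Particle = new Particle();
                p.x = WIDTH * 0.5;
                p.y = HEIGHT * 0.5;
                p.velX = Math.random() * 2 - 1;
                p.velY = Math.random() * 2 - 1;
                particles.push(p);
            }
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        // Clear the screen, then let every particle move and draw itself.
        private function onEnterFrame(event:Event):void {
            bitmapData.fillRect(bitmapData.rect, 0x000000);
            var count:int = particles.length;
            for (var i:int = 0; i < count; i++) {
                (particles[i] as Particle).updateAndDraw(bitmapData);
            }
        }
    }
}
```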

For clarity’s sake I’ve omitted a couple of things, but this code runs. If you run it, you’ll see a bunch of particles moving at a constant speed until they move offscreen. Our simplest of simple particle systems works, so now we can start looking at performance and why we need to change the structure to improve it.

Performance, the small issues

If you are aware of the performance caveats within Flash, you’ll immediately notice a couple of things, among them lesser problems like this one :

bitmapData.fillRect(bitmapData.rect,0x000000);

A lesser known fact about this piece of code is that the rect property of the BitmapData is not a static read-only Rectangle: every time you query this property, a new Rectangle is instantiated. Instantiation is rather expensive in the AVM2, but in this case it happens once per frame, so it’s not a major issue. Still, it’s unneeded. Keep this in mind when you do many things per frame with bitmapData.rect. In general, keep in mind that instantiation and allocation are expensive.
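One way to avoid that per-frame allocation, as a small sketch: read rect once and reuse the resulting Rectangle. The class and member names (ScreenClearer, bounds) are made up for this example.

```actionscript
// File: ScreenClearer.as
package {
    import flash.display.BitmapData;
    import flash.geom.Rectangle;

    public class ScreenClearer {
        private var bitmapData:BitmapData;
        // Cached once; a BitmapData never changes size after construction.
        private var bounds:Rectangle;

        public function ScreenClearer(target:BitmapData) {
            bitmapData = target;
            bounds = target.rect; // the only Rectangle allocated for clearing
        }

        // Call once per frame instead of fillRect(bitmapData.rect, ...).
        public function clear():void {
            bitmapData.fillRect(bounds, 0x000000);
        }
    }
}
```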

If you ever had one of those SWFs which seemed to hiccup every now and then, chances are that it was the garbage collector kicking in. It doesn’t have to be one specific place where you are instantiating too many objects; it might be more than one. So keep things clean from the beginning. Also keep in mind that many small performance losses add up to one big one.

The larger performance costs

There are multiple things we can optimize in this piece of code. For now let’s keep it a bit simple and look at the larger issues, where we get the most gain. One of the main issues is how we store and access the individual particles.

particles = new Array();

We currently store them in an (as always in ActionScript) untyped Array. If we’d like to keep the “storage” an array-like structure, we would be better off using a Vector, but we can get it even a bit faster by using a linked list instead. For the purposes of this simple particle system, a list that is linked in one direction only is enough, as we are only going to iterate linearly over the particle set in one direction. This is called a singly linked list.

Let’s have a look at how that works.

Linked List

First, we modify our Particle class, to contain a link to the next particle in the list, by adding this to its properties.

public var next:Particle;

Then we modify the Main class to instantiate the particles like this :
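A sketch of that instantiation. To keep the example small it lives in its own ParticleList class rather than in Main; that class, the firstParticle name and the random start velocities are assumptions.

```actionscript
// File: Particle.as
package {
    public class Particle {
        public var x:Number = 0;
        public var y:Number = 0;
        public var velX:Number = 0;
        public var velY:Number = 0;
        // Link to the next particle in the list; null marks the end.
        public var next:Particle;
    }
}

// File: ParticleList.as
package {
    public class ParticleList {
        // Head of the singly linked list: the last particle created.
        public var firstParticle:Particle;

        public function ParticleList(count:int) {
            var previous:Particle = null;
            for (var i:int = 0; i < count; i++) {
                var p:Particle = new Particle();
                p.velX = Math.random() * 2 - 1;
                p.velY = Math.random() * 2 - 1;
                p.next = previous; // link to the particle created just before this one
                previous = p;
            }
            firstParticle = previous;
        }

        // Walks the list once, in one direction, and returns how many particles it holds.
        public function countParticles():int {
            var n:int = 0;
            var p:Particle = firstParticle;
            while (p) {
                n++;
                p = p.next;
            }
            return n;
        }
    }
}
```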

For every particle we create we store a reference to the previous particle in that particle. Then when we are done with the loop we keep a reference to the last created particle. That particle refers to the particle created before that, and so on, until we reach the first particle which has no reference to any particle.

Final(ly) ?
Using a linked list instead of an Array helped, but we can get even more performance for free by adding the “final” keyword to the particle class.

final public class Particle {

You would normally use the final keyword to prevent classes from being subclassed, but there is an interesting piece about final in Flash AS3 and AVM2 performance tuning, which by now is an aged document. It indicates that sealing classes with the final keyword helps the AVM because the properties are already resolved and bound.

I find that in Flash Player 10.1 the advantage is less noticeable than in 10 and before, but it still gives a slight performance gain on my machine, although I’ve heard it debated more than once. It also seems to me that some platforms/architectures benefit more than others. My naive way of thinking about it is that if you’re not going to subclass it anyway, you might as well finalize it. Simplicity.

Function calling
It may seem obvious to call a method of an object (like with car.drive()), but if you want maximum performance you should avoid function calls. With this in mind, let’s take another look at our draw loop :
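A sketch of such a draw loop over the linked list, with one method call per particle. The DrawLoop class and its member names are assumptions made to keep the example self-contained.

```actionscript
// File: Particle.as
package {
    import flash.display.BitmapData;

    final public class Particle {
        public var x:Number = 0;
        public var y:Number = 0;
        public var velX:Number = 0;
        public var velY:Number = 0;
        public var next:Particle;

        public function updateAndDraw(target:BitmapData):void {
            x += velX;
            y += velY;
            target.setPixel(int(x), int(y), 0xFFFFFF);
        }
    }
}

// File: DrawLoop.as
package {
    import flash.display.BitmapData;

    public class DrawLoop {
        public var bitmapData:BitmapData;
        public var firstParticle:Particle; // head of the linked list

        // Runs once per frame.
        public function draw():void {
            bitmapData.fillRect(bitmapData.rect, 0x000000);
            var p:Particle = firstParticle;
            while (p) {
                p.updateAndDraw(bitmapData); // one function call per particle
                p = p.next;
            }
        }
    }
}
```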

The updateAndDraw call here is the costliest thing in the main loop. Not because of what it does, but because it is a function call. In general, function calling is always going to have some form of overhead, but in the case of Flash / AVM2 / AS3, the cost is exceptionally high. Although this situation has improved with Flash 10.1, the overall hit of just calling a function inside the loop is way too high to be acceptable if we want to achieve high numbers of particles.

Inlining

Inlining is something a compiler does for you: it replaces a method call with the body of that method, in place of the call.

AS3 does not have an inline keyword natively, and the compiler wouldn’t know what to do with it. Joa Ebert’s Apparat tool suite, written in Scala, provides, amongst other incredibly handy things, a way to do inlining in AS3. I highly recommend using it for larger projects where code style and performance are important. I actually would also highly recommend it for any project where performance is a priority. For this instance though, let’s go for a simpler solution first.

We’ll remove the updateAndDraw method from the particle class and make our main loop look like this.
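A sketch of the inlined version: the particle becomes a plain data holder and the main loop does the moving and drawing itself. The InlinedLoop class and its member names are assumptions.

```actionscript
// File: Particle.as
package {
    final public class Particle {
        public var x:Number = 0;
        public var y:Number = 0;
        public var velX:Number = 0;
        public var velY:Number = 0;
        public var next:Particle;
    }
}

// File: InlinedLoop.as
package {
    import flash.display.BitmapData;

    public class InlinedLoop {
        public var bitmapData:BitmapData;
        public var firstParticle:Particle; // head of the linked list

        // Runs once per frame; no calls to our own methods inside the loop.
        public function draw():void {
            bitmapData.fillRect(bitmapData.rect, 0x000000);
            var p:Particle = firstParticle;
            while (p) {
                // Former body of updateAndDraw, written out in place.
                p.x += p.velX;
                p.y += p.velY;
                bitmapData.setPixel(p.x, p.y, 0xFFFFFF);
                p = p.next;
            }
        }
    }
}
```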

Yay. For now you’ll have to take my word for it, but this is faster. (Later on in this series I will prove this with benchmark tests; I wanted to avoid this post just being a bunch of charts and performance graphs.)

Yet another function call.

Looking at this new snippet, you’ll immediately notice another function call in our main loop.

bitmapData.setPixel(p.x,p.y,0xFFFFFF);

The bitmapData.setPixel call is yet another function call, made as many times as there are particles. What’s important to note though is that calls to the native Flash API are faster than calls to methods in non-“API” classes. So although the bitmapData.setPixel call is costing us more than it should, it still doesn’t cost as much as calling a method on one of your own objects.

Still, there is a cost associated with this function call, and we can avoid it by inlining the method again. setPixel() is an incredibly handy utility function though, so we need to replace its functionality with a piece of code which mimics its behaviour. Also, we need to write to the bitmapdata’s data by hand.

I’m not going to jump into using Alchemy / FastMemory just yet, so for now we’re going to use the next fastest accessible type that is available in AS3. Vector.

Drawing to a Vector instead

One can avoid drawing to a bitmap using setPixel and use a faster way to do so using a Vector object. While certainly not the fastest way to write to a bitmapdata, it’s probably the fastest way to do it in pure actionscript without any external tools. In comparison to using setPixel in our main loop, it should be a very good performance improvement.

One of the things setPixel did for us was bounds checking. Since we won’t use it anymore, we need to do it ourselves, so that we don’t write to a position in the vector that is outside of the bounds of the vector. At the same time, we’ll use this to resolve another issue. The particles were randomly flying offscreen. Instead of that we’ll respawn them at their starting position. We’ll add these properties to the particle class.
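A sketch of how this could look. The property names startX and startY, the VectorRenderer class and the choice to respawn as soon as a particle leaves the bitmap are assumptions; the start position is assumed to lie inside the bitmap.

```actionscript
// File: Particle.as
package {
    final public class Particle {
        public var x:Number = 0;
        public var y:Number = 0;
        public var velX:Number = 0;
        public var velY:Number = 0;
        // Where the particle respawns after leaving the screen.
        public var startX:Number = 0;
        public var startY:Number = 0;
        public var next:Particle;
    }
}

// File: VectorRenderer.as
package {
    import flash.display.BitmapData;
    import flash.geom.Rectangle;

    public class VectorRenderer {
        private var bitmapData:BitmapData;
        private var bounds:Rectangle;
        private var width:int;
        private var height:int;
        public var firstParticle:Particle;

        public function VectorRenderer(target:BitmapData) {
            bitmapData = target;
            bounds = target.rect; // cached, so we don't allocate a Rectangle per frame
            width = target.width;
            height = target.height;
        }

        public function render():void {
            bitmapData.fillRect(bounds, 0x000000);
            // Copy of the cleared pixels; getVector allocates a new Vector on every call.
            var pixels:Vector.<uint> = bitmapData.getVector(bounds);
            var p:Particle = firstParticle;
            while (p) {
                p.x += p.velX;
                p.y += p.velY;
                // Manual bounds check, replacing the one setPixel did for us.
                if (p.x < 0 || p.x >= width || p.y < 0 || p.y >= height) {
                    p.x = p.startX;
                    p.y = p.startY;
                }
                // Row-major index: y * width + x, position truncated to whole pixels.
                pixels[int(p.y) * width + int(p.x)] = 0xFFFFFFFF;
                p = p.next;
            }
            // Write all pixels back in one call.
            bitmapData.setVector(bounds, pixels);
        }
    }
}
```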

While this setup is still far from optimal from a performance perspective, we’ve got a reasonably fast way to render 10 to 20k particles to screen.

One note here concerns bitmapData.getVector and setVector.
This still isn’t that optimized – the function call “BitmapData.getVector” creates a new Vector object every time we call it, and as you know, creating objects costs CPU time and memory allocation. Ideally the function would allow us to pass it a pre-existing Vector object for it to populate with data. There are a couple of examples of this throughout the Flash APIs, and I truly hope Adobe will address these in future versions of the Flash Player. The benefit here of using getVector is that we can quickly clear the entire vector. Or, if you want to play around with this, you could skip the fillRect on the bitmapdata and blur it slightly every frame instead.

So far, simple enough, right ?

I’ve kept everything quite simple so far to illustrate a couple of mostly obvious performance hints, and to set a frame of reference for the follow-up article, part two. Technically this solution is far from the optimal way to run particles. But for experimenting with moving the particles in a more interesting fashion, it’s good enough for Part One of this series. The benefit is that we can still express all the code in ActionScript. We’ll add a little bit of PixelBender to the mix to get comfortable with it and use it as an example of manipulating data with PixelBender.

Velocity Field ?

Visually this example is completely boring, so let’s make it a bit more interesting. We’ll start by making the particles move to a background pattern. We’ll use some perlin noise and pixelbender to generate a vector field for the particles to read from.

We’ll use a vector similar to the one we used earlier to draw to the bitmapdata, and use PixelBender to convert grayscale perlin noise into velocities that the particles can look up quickly. We start by adding some code to the Main class, beginning with the init method.
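A sketch of that initialization, written as a small standalone class rather than as Main’s init method. The class name VelocityField, the file name centralDifference.pbj, the noise parameters and the use of ShaderFilter with applyFilter are assumptions; the kernel is assumed to take a single image input.

```actionscript
// File: VelocityField.as
package {
    import flash.display.BitmapData;
    import flash.display.Shader;
    import flash.filters.ShaderFilter;
    import flash.geom.Point;
    import flash.utils.ByteArray;

    public class VelocityField {
        // Hypothetical file name; the compiled kernel is embedded as raw bytes.
        [Embed(source="centralDifference.pbj", mimeType="application/octet-stream")]
        private static const KernelBytes:Class;

        // One uint per pixel, packed like a color.
        public var velocities:Vector.<uint>;

        public function VelocityField(width:int, height:int) {
            // Input: grayscale perlin noise (assumed parameters).
            var noise:BitmapData = new BitmapData(width, height, false, 0x000000);
            noise.perlinNoise(width / 4, height / 4, 3, int(Math.random() * 1000), false, true, 7, true);

            // Output: a second bitmapdata we allocate ourselves for the filter to render into.
            var output:BitmapData = new BitmapData(width, height, false, 0x000000);

            // Run the kernel once, with the noise as its source image.
            var shader:Shader = new Shader(new KernelBytes() as ByteArray);
            output.applyFilter(noise, noise.rect, new Point(0, 0), new ShaderFilter(shader));

            // Pull the result out as a Vector for fast lookups in the main loop.
            velocities = output.getVector(output.rect);
        }
    }
}
```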

As you can see we added some code and a new property to our Main class, to hold a reference to a “velocity vector”. This Vector will contain values much like colors. There are many ways to get PixelBender to output data. To be able to read the data quickly in our main loop, we need a Vector again. The easiest way to get a Vector of uints out of PixelBender is to actually let it render to a bitmapdata.

The reason we are using two bitmapdatas is that it’s significantly cheaper to instantiate them yourself and pass them in than to pass one bitmapdata for PixelBender to use as both input and output, as it will allocate a block of memory for the output in any case. This way it’s faster and we have more control.

After the PixelBender filter has done its processing, we simply do a getVector on the bitmapdata to get the result into a Vector.<uint>.

A tiny bit of pixelbending

To get familiar with PixelBender kernels, we’re going to use one to calculate the data in our velocity field. The filter is of a very simple kind; let’s have a look at it. It only has one input and it samples four pixels for every pixel it evaluates. Sampling data within PixelBender is quite heavy, but for now we will only run this filter during the initialization of our swf. It might even be fast enough to run every frame.

What this filter does is calculate the so-called central difference for every pixel, using its directly connected neighbours, and store the results back into the color channels. Let’s look at its input and output to have a visual reference before discussing it further.

The input & output.

Although that Wikipedia page about the central difference might have scared you off, all in all what it does is very simple. For every pixel you look at the connecting neighbours in both directions, take the difference between them and store that. Note that we are normalizing the value, so that the maximum negative value of -1 and the maximum value of +1 fit in a range of 0 to 1. This way we can read these values later on in our main loop. We store the result for the horizontal axis in the green channel and the result for the vertical axis in the blue channel, hence the blue-green tint of the image.
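To make the calculation concrete, here is a plain ActionScript sketch of the same idea, running on the CPU instead of in a PixelBender kernel. It is not the kernel itself: the class name, the clamping at the image edges and the sign convention (right minus left, down minus up) are assumptions.

```actionscript
// File: CentralDifference.as
package {
    public class CentralDifference {
        // src holds grayscale pixels in row-major order; the gray level is read from the blue channel.
        // Returns one uint per pixel: green = horizontal difference, blue = vertical difference.
        public static function compute(src:Vector.<uint>, width:int, height:int):Vector.<uint> {
            var out:Vector.<uint> = new Vector.<uint>(width * height, true);
            for (var y:int = 0; y < height; y++) {
                for (var x:int = 0; x < width; x++) {
                    // Clamp neighbour coordinates at the edges (an assumption).
                    var left:int = x > 0 ? x - 1 : x;
                    var right:int = x < width - 1 ? x + 1 : x;
                    var up:int = y > 0 ? y - 1 : y;
                    var down:int = y < height - 1 ? y + 1 : y;

                    // The four samples, as gray levels 0..255.
                    var gl:int = src[y * width + left] & 0xFF;
                    var gr:int = src[y * width + right] & 0xFF;
                    var gu:int = src[up * width + x] & 0xFF;
                    var gd:int = src[down * width + x] & 0xFF;

                    // Differences scaled to -1..+1, as PixelBender sees colors in 0..1.
                    var dx:Number = (gr - gl) / 255;
                    var dy:Number = (gd - gu) / 255;

                    // Normalize -1..+1 into 0..1, then store as a 0..255 channel value.
                    var g:uint = uint((dx * 0.5 + 0.5) * 255);
                    var b:uint = uint((dy * 0.5 + 0.5) * 255);
                    out[y * width + x] = 0xFF000000 | (g << 8) | b;
                }
            }
            return out;
        }
    }
}
```

A completely flat input has no difference anywhere, so every output pixel ends up with both channels in the middle of their range, which is the “no force” value.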

It’s good to note that PixelBender really makes these types of image calculations fast and simple. Long before PixelBender we had to do this using combinations of filters and copy channels to do this quickly enough. My first experiment with fast bumpmapping from a long, long time ago used that way of doing it. When I started working on Papervision3D it resulted in this experiment.

Making things move

Now that we have this new set of data, how do we make our particles dance to it ? The reason we wrote to a Vector of uints is so that we can read from it quickly. But we store the data in two different channels, blue and green. Let’s take that data out and manipulate our particles’ velocities using it.
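A sketch of that lookup and unpacking. The FieldSampler class, the applyField function and the FORCE factor are assumptions; the caller is expected to have made sure the particle is inside the field.

```actionscript
// File: Particle.as
package {
    final public class Particle {
        public var x:Number = 0;
        public var y:Number = 0;
        public var velX:Number = 0;
        public var velY:Number = 0;
        public var next:Particle;
    }
}

// File: FieldSampler.as
package {
    public class FieldSampler {
        // Assumed scale factor that keeps the velocities at reasonable values.
        public static const FORCE:Number = 0.001;

        // Adds the force stored at the particle's position to its velocity.
        public static function applyField(p:Particle, field:Vector.<uint>, width:int):void {
            // One Vector lookup per particle; the position is truncated to a pixel index.
            var c:uint = field[int(p.y) * width + int(p.x)];

            // Green channel lives in bits 8..15, blue in bits 0..7. Each is 0..255.
            var green:int = (c >> 8) & 0xFF;
            var blue:int = c & 0xFF;

            // Re-centre around zero (-127.5 .. +127.5) and scale down.
            p.velX += (green - 127.5) * FORCE;
            p.velY += (blue - 127.5) * FORCE;
        }
    }
}
```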

If you’re unfamiliar with masking and bitwise operations, this might look slightly strange. But if you have worked with colors using setPixel before, you might be familiar with the pattern here. Basically we extract the “color” as output by PixelBender from the green and blue channels. I put color in quotes there, because it’s not what we use it for. We simply use the byte packing into uints as a way to access one value in a Vector (the lookup being the heaviest part), and then unpack it into two of its components. A uint in Flash is 32 bits, so it contains 4 bytes. In theory we have 2 bytes left for storing other data. More on that in part two.

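It does almost exactly the opposite of what the normalization step did in the PixelBender kernel: there a signed difference was squeezed into the 0 to 1 color range, and here we expand it back into a signed value.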

The only difference is that pixelbender treats the color value as a range between 0-1 and that our actionscript values will now be filled with a value between -127.5 and 127.5. We handle putting the particle velocities to reasonable values by multiplying them by a rather small value. Play around with the values if you like.

To assure that the particles don’t pick up an infinite speed we dampen it by multiplying by a value near to one, as to only slightly lower it every loop.

p.velX = p.velX*.99;
p.velY = p.velY*.99;

So is it magical yet ?

If we look at what it looks like now with a thousand particles, we see this:

So, no, it’s still rather uninteresting. Although our particles do seem to have taken on a life of their own.

Adding complexity by adding the numbers

The particles seem to settle in areas, and the randomness of their movement and location is worsened by the fact that they are moving over perlin noise, where the noise part of that implies it’s somewhat random.

As I said at the beginning of this post, part of the perceived complexity of a particle system is in its numbers. What happens if we replace the perlin noise by a picture and add a lot more particles, let’s say 20.000 ? And then tweak the values a bit ?

The length of this simple introductory Session One already got way out of hand, so it’s time to close for this session, which means that unfortunately Alchemy is saved for next week. We’ve covered the basics of a reasonably performant particle system in pure, directly compilable AS3.

To continue this series and make the example code do a lot more than it does now, we need to start diving much deeper into PixelBender and the Alchemy memory within Flash Player. Using these techniques we’ll be able to increase the number of particles to a much higher count than 20.000, and hopefully be able to fully understand why and how. Along the way we’ll be able to look at how these technologies function inside the Flash Player.

Hopefully, we’ll polish up this example to a point where it’s starting to become interesting to look at. You’d be surprised to see how close we are to achieving something like the Particle Reactor with the current source code. But to be able to fully explain ways of using PixelBender and Alchemy, we need about another post of this length.

Until then, I would love to see your experiments based upon the source code above. If you do happen to do anything with it, please send it over and if there is enough response, maybe we’ll add it to the next post. I’ll give you some hints; dynamic vector fields, colors, trails.

I would like to thank Seb Lee-Delisle for reviewing this post and helping me to get it cleaner, better and more to the point. Another big thank you to Tim Knip for reading and pointing out the missing explanations.

I would also greatly appreciate any feedback on this article from you. I’m new to writing large articles / tutorials like this, and would love to hear what you think and what you would like to see.

53 Responses to The Particle Sessions : Part One, Is this it ?

This is a great tutorial and examples. Your writing style and detailed code explanation is excellent. I think I need some time to completely digest this, and no doubt will need to re-read it several times. Looking forward to the particle sessions series

This might be the vectors we are using. The way it’s being treated might change from system to system. As I said, the getVector implementation in flash api is suboptimal. In part 2 we will move away from this approach to a totally gc free implementation.

Fantastic to get some master-class insight in this area, there really isn’t much out there in this regard!

Having followed the article one thing that wasn’t even mentioned was how to use a pixelbender shader (by embedding the .pbj file). I’ve played with pixel bender before so I was okay, but I can imagine some one with out any pixel bender experience would be rather confused. Other than that the article read very well. I especially like the link to the mathematical concepts behind what you’re doing. Too often tutorials just tell you what to do without explaining the why behind it. Also really good to see some approachable insight on pixel bender and alchemy, maybe you could do the same for joa’s apparat?

curious side note: in most territories the cover of the strokes album was not particle collisions, but rather a woman touching her naked buttocks.

In light of keeping the post somewhat short I just provided the source code for it. The full Pixelbender part was out of scope for this session. I will dig a bit deeper in the next and will try and include the basics of getting it running. I’m planning to get apparat in this series, although that is not written as a linear follow up example yet.

Cheers!
I’ve doubled the size now, mapped the pixel colours to the Kinect’s depth camera values and instead of clearing the pixels every onEnterFrame, I add a blur and colormatrix filter. It’s really fun to play about with. http://twitpic.com/3re3fj

This article is fantastic! Very well-written and well-explained even for someone who doesn’t have his head around every piece (me).

One thing to note – and this is sort of ridiculous – is the usage of “then” instead of “than” when making comparative statements. I know that’s super nitpicky, but it seemed like a consistent mistake, and I thought you might like to know. If I’m totally out of line, forgive me.

Anyway, this post really blew my mind. I might be showing my naiveté here, but I had never even considered building the particle system on the Main class’s bitmapData instead of just adding a bunch of tiny Sprites as the particles, so this is a real paradigm shift for me!

I doubt that linked list will be faster than this solution, as this is as barebones as it gets. If it’s using alchemy, which I doubt, then it might be faster. That being said, in the next part we will be looking at restructuring the data for alchemy, which can be much faster.

I’m moving through the backlog of things to post about, these are on the list. That being said, I’m taking the blog as; either post something thorough, or not post at all. So it might take a bit. The raymarcher and the distance fields are going to be lots more interesting with Molehill on the horizon though.

this would be great to use live. I would love to be able to have a camera set up at a concert and project the particles on a huge screen behind me. let me know if you plan to make something like that as stand alone software or as an app at somepoint!!!