One of the big limitations of GPGPU computing is that, since the architecture is designed to work in parallel, rearranging or compacting data is a non-trivial task on the GPU. To overcome this limitation, different stream compaction methods have been created; histopyramids are the one discussed in this post. Histopyramids are useful because they allow the data to be packed at the beginning of a texture, avoiding the evaluation of complex shaders on fragments that hold no useful information, which is why this process is quite important when doing general computation on graphics cards.

The following image shows what is achieved with stream compaction. The texture represents a set of slices from a simulated 3D texture showing the different Z levels; in this 2D representation the data is scattered all over the texture. The colored line represents the same data compacted, displaying the 3D position of the reallocated voxels.

Algorithm:

Implementing histopyramids is not a straightforward process; it requires two different steps, reduction and traversal (compaction). The first phase, reduction, is the process where the algorithm generates a set of texture levels based on the original texture (base texture) holding the scattered data; the number of textures to generate in the reduction process is derived from the size of the original texture using the following equation.

Textures to generate = log2(original texture size) (for power-of-two textures) (1)

For the reduction process a base texture has to be generated from the original data texture. The idea is to define a binary image where each occupied position is marked with an RGBA value of vec4(1.0) in the new binary texture. This defines a transparent texture where the white texels represent the positions of the data allocated in the scattered original texture.
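As an illustration, the base pass can be as simple as the following fragment shader (a sketch, not the original listing; the uData and vUV names are ours, and the "occupied" test depends on how the scattered data is actually encoded):

```glsl
precision highp float;
uniform sampler2D uData;   // scattered data texture
varying vec2 vUV;
void main() {
    vec4 data = texture2D(uData, vUV);
    // mark the texel with vec4(1.0) if it holds any data, vec4(0.0) otherwise
    float occupied = step(0.0001, dot(abs(data.rgb), vec3(1.0)));
    gl_FragColor = vec4(occupied);
}
```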

This first step is quite similar to the mipmap generation done by GPUs, but instead of using an averaging reduction to generate each texture, each pixel of the parent level of the pyramid is calculated as the sum of the four contiguous pixels of the level below. At the end of the reduction process a pyramid of images is created where each level holds the sums of the pixels from the level below it. The top level of the pyramid, a 1×1 texture, represents the sum of all the data present in the base texture.

Once the reduction process is completed and the pyramid is generated, a traversal is required to compact the data. The total amount of data to be allocated in the compacted texture is defined in the top level of the histopyramid (the 1×1 texture). Each one of the output elements of the resulting stream has to traverse the pyramid to finally obtain a UV position (texture coordinates). This texture position relates the output stream with a data value from the scattered texture.

The traversal is done using a key index for each texel of the output texture, meaning that each pixel is associated with an incremental, unique 1D index that is compared against the different levels of the pyramid. Each texel of the pyramid is in turn reinterpreted as a key interval.

This second step starts at the first level of the pyramid, checking whether the key of the texel being evaluated is inside the total range of the data to compact; to do so, the key value has to be lower than the total sum allocated at the top level of the pyramid. If the key is lower than the total range, the traversal moves to the second level of the histopyramid, the 2×2 texture, where the four texel values can be labelled “a, b, c, d” and the following ranges can be generated:

Each traversal step evaluates which of the ranges the key index of the output stream falls into; that range gives a 2D texture position for each level traversed. On each new level a new set of ranges has to be generated and a new UV position is extracted based on the range where the key falls. It’s important to note that the ranges are generated in a “Z” order from the four adjacent pixels, as the sketch below illustrates.
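One level of this selection can be resolved with vector masking along these lines (an illustrative sketch, not the original shader: a, b, c, d are the four child values fetched for the current level, and key is the output index rebased to the start of the parent’s range):

```glsl
// Exclusive ends of the four ranges, built in "Z" order from the child values.
vec4 ends = vec4(a, a + b, a + b + c, a + b + c + d);
// 1.0 for every range whose end the key has already passed (no if-branching).
vec3 passed = step(ends.xyz, vec3(key));
float child = passed.x + passed.y + passed.z;              // selected child: 0, 1, 2 or 3
vec2 offset = vec2(mod(child, 2.0), floor(child * 0.5));   // its position inside the 2x2 block
key -= dot(passed, vec3(a, b, c));                         // rebase the key for the next level
```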

In the final phase the base texture is used and the ranges have a unit difference, meaning that this step gives a UV coordinate that can be used to read the corresponding data from the scattered texture and place it in the output texel of the stream.

The previous image shows the traversal across two levels for the output texel with a key value of “4”. The sum of all the values is represented in the 1×1 texture, whose value is “9”; the first level, represented by a 2×2 texture, has four values that define the following ranges:

A = [0, 3), B = [3, 5), C = [5, 8), D = [8, 9)

The key evaluated falls in the B range, hence for the next level the algorithm should use the four values allocated in the [0, 1] position; in this level the new ranges are defined as:

A = [3, 3), B = [3, 4), C = [4, 5), D = [5, 5)

In this case the key falls into the [2, 1] position of the base texture; since this texture is a 4×4 texture, the corresponding UV coordinate is [0.5, 0.25]. This coordinate is used to read the data from that position in the scattered texture and write the value read into the position relative to the key used to traverse the pyramid.

Implementation of the required shaders:

The implementation in WebGL is done using plain JavaScript, which makes it easy to port the code to whatever library the user prefers (three.js, regl, babylon.js). This tutorial also assumes a certain level of understanding of basic WebGL operations like shader compilation, quad generation and the use of the different tests (scissor, depth, stencil) that can be applied to framebuffers. If the user feels the need to revisit these concepts, the following URL (http://learningwebgl.com/blog/?page_id=1217) is a good place to visit. First the required shaders are discussed based on the algorithm explained above, and then the JavaScript implementation is addressed based on the programs generated.

Reduction Step:

Since all the important operations are done in shaders, there are two main programs that have to be created: a reduction program and a traversal program. All the programs are run over a quad, meaning that the vertex shader is only responsible for generating the vertex positions of the quad and providing a UV coordinate for each fragment in the fragment shader.

The vertex shader evaluates all the data for the vertices at execution time, but the user can also provide attributes with positions and UV coordinates for each vertex if desired.

The fragment shader works like a mipmapping shader: four texels from the input texture are read and the resulting color is the sum of those four values. This can be done with a shader along the following lines.
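(This is a sketch rather than the original listing; uPreviousLevel and vUV are assumed names, and uSize is the inverse size discussed below.)

```glsl
precision highp float;
uniform sampler2D uPreviousLevel;  // level being reduced (the base texture on the first pass)
uniform float uSize;               // inverse side length of that level
varying vec2 vUV;
void main() {
    // vUV falls on the corner shared by the four child texels; offset by half a texel
    // in each direction to read them and accumulate their values.
    float sum = texture2D(uPreviousLevel, vUV + vec2(-0.5, -0.5) * uSize).r
              + texture2D(uPreviousLevel, vUV + vec2( 0.5, -0.5) * uSize).r
              + texture2D(uPreviousLevel, vUV + vec2(-0.5,  0.5) * uSize).r
              + texture2D(uPreviousLevel, vUV + vec2( 0.5,  0.5) * uSize).r;
    gl_FragColor = vec4(sum);
}
```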

The fragment shader reads the information adjacent to the evaluated fragment and sums that data to obtain the final fragment color. Since this shader is applied to all the different levels of the pyramid during the reduction process, it needs to know the actual size of the texture being evaluated; this is provided in the uSize uniform, more conveniently defined as the inverse of the size value.

For the traversal process there are two ways of addressing the problem: writing the traversal in the vertex shader or writing it in the fragment shader. Both ways require knowing how many vertices or fragments are needed to traverse the pyramid, so this should be controlled from JavaScript. Since a vertex shader for quad generation is already defined, this new program performs the traversal in the fragment shader.

The traversal shader requires four important inputs, which are defined below:

- uTexture3D: this vector has four values that represent the following data:
  * uTexture3D.x: size (length) of the base texture.
  * uTexture3D.y: size of the 3D texture simulated in the 2D texture (e.g. a 128x128x128 3D texture can be allocated in a 2048×2048 texture; the corresponding value would be 128).
  * uTexture3D.z: number of buckets of the 3D texture allocated on one side of the 2D texture (based on the previous example, the 2048 texture side can allocate up to 16 buckets of 128x128 layers).
  * uTexture3D.w: number of levels generated in the reduction process.

Notice that this data is mostly used to evaluate the 3D position based on the UV position obtained in the final traversal (this traversal is adjusted for a marching cubes application), so in a generic case only two values would be required for this shader: the size of the base texture and the number of levels generated in the reduction passes.

- uBase: this would be the base texture generated from the scattered data.

- uTotalData: this is the 1×1 level texture that defines the total data to compress.

- uPyramid: a single 2D texture that holds all the different levels generated from the base texture (each level is half the size of the previous one). This texture is generated in JavaScript by copying the data from each level, as it is generated, into a single texture using the copyTexSubImage2D command.

Having a single pyramid texture avoids the use of a different uniform for each level, which would also limit the shader to a fixed maximum number of levels based on those uniforms. The image below shows an example of a pyramid texture generated in the reduction process.

Another advantage of this single image is that it makes debugging the reduction process straightforward, since any issue that arises while the reductions are generated can be visualized directly.

The traversal is done using a for loop limited by a constant upper bound, since WebGL only allows constants to define loops; this shader is limited to up to 14 traversal iterations, hence textures up to 4096×4096 can be used with it. To avoid if-branching, the traversal evaluates the first three ranges using vector masking and logical comparisons; the final range can be evaluated as the difference between the total range of the whole level and the three partial ranges already evaluated, which avoids another texture read.

Since the number of loop iterations is the same for every thread, and there is no if-branching inside each iteration, all the threads execute the same number of instructions, ensuring that the shader has no divergence.

The use of a single pyramid texture requires an offset position that has to be updated on each new iteration; this offset has to be included in the UV coordinate obtained to traverse the next level. Since the base texture is not included in the pyramid texture, the last traversal step is done outside the loop; this final step uses the base texture to define the final UV positions required.

The traversal is done in the fragment shader, where each fragment has a unique 1D key defined by its position in the texture; this generated key is used for the traversal. If the user would rather write the traversal in the vertex shader, it’s important to define two attributes for the vertices: the first one is the position of the vertex in the final texture and the second one is the key that vertex uses in the traversal process.

One important property of this algorithm is that it doesn’t require scattering (or repositioning) of the data (vertices), hence all the final positions of the data are known in advance. In that regard, using the fragment shader for the traversal has the advantage of avoiding repositioning vertices in the vertex shader.

As a final note, the user can observe that if the traversal shader is applied to the whole texture, there will be fragments that generate keys outside of the total amount of data to compress. To avoid this, the level-0 pyramid texture (the 1×1 texture) is provided as a uniform, and its value (the total data to compress) is compared against the key generated from the fragment’s position; if the key is higher than the value from the texture, the fragment can be discarded. This could also be done with the final level allocated in the pyramid texture, since the 1×1 texture is also copied into the pyramid.

Packing and unpacking of floating point values.

The last two programs are required to read floating point values in WebGL using the gl.readPixels() command. This command only accepts 8-bit RGBA values, hence the sum stored in the 1×1 texture can only be read by packing the floating point value into an RGBA 8-bit value that represents the 32 bits of the original data. To do so, the following fragment shader is run over a 1×1 quad.
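(A sketch of such a packing pass is shown below, since the original listing is not reproduced. The uniform names are assumptions; uMaxValue corresponds to the “max value available to encode” mentioned later, and the JavaScript side recovers the count with the symmetric dot product multiplied by that same maximum.)

```glsl
precision highp float;
uniform sampler2D uTotalData;  // the 1x1 top level of the pyramid
uniform float uMaxValue;       // largest count the encoding has to represent
void main() {
    // normalize the count to [0, 1) and spread it over the four 8-bit channels
    float value = texture2D(uTotalData, vec2(0.5)).r / uMaxValue;
    vec4 enc = fract(value * vec4(1.0, 255.0, 65025.0, 16581375.0));
    enc.xyz -= enc.yzw / 255.0;  // remove the part carried by the next channel
    gl_FragColor = enc;
}
```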

Besides the shaders, two JavaScript helper functions are used to create the textures and framebuffers needed along the way. These two functions are pretty straightforward: the first one generates a texture with clamp-to-edge wrapping, letting the user define whether the texture is floating point or unsigned byte, and if data is provided it is uploaded to the texture during its creation. The second function generates a framebuffer from a provided texture, with a color attachment for the texture and a depth attachment for z-buffering if required.
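A minimal sketch of what those helpers could look like (names and signatures are illustrative):

```js
function createTexture2D(gl, size, type, data) {
    // type is gl.FLOAT or gl.UNSIGNED_BYTE; data can be null
    // (floating point textures need the OES_texture_float extension in WebGL1)
    var texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, size, size, 0, gl.RGBA, type, data);
    return texture;
}

function createFramebuffer(gl, texture) {
    // the original also attaches a depth renderbuffer when z-buffering is needed; omitted here
    var framebuffer = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
    return framebuffer;
}
```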

With the previous functions a set of textures is generated for the pyramid using the following code:
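(Again a sketch with assumed names, not the original listing.)

```js
var totalLevels = Math.ceil(Math.log(baseTextureSize) / Math.log(2));  // equation (1)
var pyramidTextures = [];
var pyramidFramebuffers = [];
for (var i = 0; i < totalLevels; i++) {
    var levelSize = baseTextureSize / Math.pow(2, i + 1);  // each level is half the previous one
    var texture = createTexture2D(gl, levelSize, gl.FLOAT, null);
    pyramidTextures.push(texture);
    pyramidFramebuffers.push(createFramebuffer(gl, texture));
}
// single texture that will hold copies of all the levels side by side
var pyramidTexture = createTexture2D(gl, baseTextureSize, gl.FLOAT, null);
// 1x1 unsigned byte target used to read back the packed total with gl.readPixels
var tRead = createTexture2D(gl, 1, gl.UNSIGNED_BYTE, null);
var fbRead = createFramebuffer(gl, tRead);
```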

Notice that the total number of levels to generate is based on equation (1); since the logarithm function in JavaScript is the natural logarithm, the resulting value has to be divided by the constant Math.log(2) to convert it to base 2. The baseTextureSize variable represents the side length of the original scattered texture. The textures and framebuffers are saved in two arrays to be used in the compaction function.

tRead and fbRead refer to the texture and framebuffer used to pack the floating point value present at the top level of the pyramid, transferring it to this texture using the RGBA 8-bit encoding.

The compaction function is where the shaders are executed; to do so, three programs have to be generated beforehand using the shaders explained in the previous sections. The JavaScript application saves these programs in an array, using string keys to invoke the corresponding program. The three required programs are:

“GENERATE_PYRAMID” program: this is generated using the quad vertex shader and the reduction fragment shader. It requires an ID attribute for the four vertices that generate the quad. It also requires declaring the uniforms needed for the reduction step, mainly the sampler used for the reduction and the size of each level.

“PACK_DATA” program: to compile this program the quad vertex shader is used and the packing shader is the corresponding fragment shader. It requires the same attribute as the previous program, and the uniforms are the 1×1 texture sampler input and a value representing the maximum value the shader can encode.

“PARSE_PYRAMID” program: this last program uses the same vertex shader as the previous two, and its fragment shader is the one that defines the traversal. The uniforms used were defined in the previous explanation of that shader.

To use the compaction function the three programs have to be generated and compiled, with their uniform and attribute declarations ready. Pointers to the shader uniforms are saved in the same array position that holds each program, as dynamic attributes; these pointers are used to send the data to the GPU on each invocation of the compaction function.

Compaction function

Once the textures are generated, a compaction function is written to be used every time the application requires it; compaction has to be done every time the original data changes, hence it’s a good idea to wrap it in a helper function that can be called whenever it’s needed. The JavaScript calls that run the shaders are outlined below.

The function requires the initial texture to be compacted, the framebuffer where the result will be saved, and an expected amount of data to process for when the user prefers not to use the gl.readPixels method, which accelerates the process. The function itself reflects the two steps of the algorithm (reduction and compaction) as two different program uses; in between, the user can decide to read the amount of data in the original texture using the readCells boolean variable, defining an optional third step (between the reduction and compaction steps).

The first step, reduction, invokes a set of quad draw calls over the different pyramid textures using the reduction program; for each iteration a new size is calculated and the corresponding uniforms are sent. After each new level is generated, the result is copied into the pyramid texture using the copyTexSubImage2D method, using an offset to separate each level of the pyramid inside the same texture.

This first step doesn’t require a depth test, but since the function can be run on every frame it is important to clear the color buffer for each level. Notice that the program used for this step is the “GENERATE_PYRAMID” program.
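A sketch of that first step (the layout of the levels inside the pyramid texture and the uniform names are assumptions):

```js
gl.useProgram(programs["GENERATE_PYRAMID"].program);
var levelSize = baseTextureSize;
var offsetX = 0;
var sourceTexture = baseTexture;  // the binary base texture on the first pass
for (var i = 0; i < totalLevels; i++) {
    levelSize *= 0.5;
    gl.bindFramebuffer(gl.FRAMEBUFFER, pyramidFramebuffers[i]);
    gl.viewport(0, 0, levelSize, levelSize);
    gl.clear(gl.COLOR_BUFFER_BIT);
    gl.uniform1f(programs["GENERATE_PYRAMID"].uSize, 1.0 / (2.0 * levelSize));  // inverse size of the level being read
    gl.activeTexture(gl.TEXTURE0);
    gl.bindTexture(gl.TEXTURE_2D, sourceTexture);
    gl.uniform1i(programs["GENERATE_PYRAMID"].uPreviousLevel, 0);
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
    // copy the freshly rendered level into the single pyramid texture, offset so levels don't overlap
    gl.bindTexture(gl.TEXTURE_2D, pyramidTexture);
    gl.copyTexSubImage2D(gl.TEXTURE_2D, 0, offsetX, 0, 0, 0, levelSize, levelSize);
    offsetX += levelSize;
    sourceTexture = pyramidTextures[i];  // the next iteration reduces the level just generated
}
```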

Reading this data back in JavaScript is the bottleneck of this function, since it requires synchronizing the CPU with the GPU response, meaning that the application can sit idle while waiting for this information. This means that if the user opts to read the data on every frame, the total framerate of the application can suffer just to obtain a precise amount of data to evaluate after the compaction process.

To overcome this limitation, an expected amount of data is defined as part of the arguments of the function, providing a way to avoid the use of the gl.readPixels method. The idea is that, since the GPU can read the total inside the traversal shader, any fragment that doesn’t need to be traversed is discarded anyway, and for the following steps the user can safely assume that the amount of data processed is below the expected value defined by the user.

The traversal process invokes the “PARSE_PYRAMID” program and generates a quad draw using the scissor test to avoid any operations on fragments that are outside the range of data to compact. The scissor region is calculated from the size of the texture and the ratio between totalCells and the size of the texture; if the total cells are not read in the second step, the expected amount of cells is used instead. With the scissor test and the discard instruction, the traversal function is accelerated.
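A sketch of that traversal invocation (names are illustrative):

```js
gl.useProgram(programs["PARSE_PYRAMID"].program);
gl.bindFramebuffer(gl.FRAMEBUFFER, outputFramebuffer);
gl.viewport(0, 0, outputSize, outputSize);
var cells = readCells ? totalCells : expectedCells;
var rows = Math.ceil(cells / outputSize);   // rows of fragments that actually need to be traversed
gl.enable(gl.SCISSOR_TEST);
gl.scissor(0, 0, outputSize, rows);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
gl.disable(gl.SCISSOR_TEST);
```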

The function returns the total cells read in the second step, or the expected cells defined by the user (the value is returned unmodified); it also saves the compacted data in the provided framebuffer, ready to be used for subsequent GPGPU operations.

The previous function uses two helper functions, one to bind textures to the corresponding program and another to bind the attributes; they are outlined below.
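A minimal sketch of those helpers (assumed names and signatures):

```js
function bindTexture(program, texture, uniformLocation, unit) {
    gl.useProgram(program);
    gl.activeTexture(gl.TEXTURE0 + unit);
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.uniform1i(uniformLocation, unit);
}

function bindAttribute(program, buffer, attributeName, size) {
    var location = gl.getAttribLocation(program, attributeName);
    gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
    gl.enableVertexAttribArray(location);
    gl.vertexAttribPointer(location, size, gl.FLOAT, false, 0, 0);
}
```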

The previous shaders are a straightforward implementation of the algorithm explained before, but there is a quite nice improvement that can be made in the traversal shader to go from three texture reads per level to only one; this is done using vec4 histopyramids. The following image illustrates the modifications to the algorithm needed to implement this variant.

This kind of pyramid performs the reduction in a different manner: since the GPU allows working with four channels per pixel, the reduction shader can be rewritten to save the partial sums of the four pixels read instead of only the total sum used before. This can be seen in the following fragment shader.
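(A sketch, not the original listing; the channel layout, partial sums in x, y, z and the total in w, is an assumption that simply has to match whatever the traversal expects.)

```glsl
precision highp float;
uniform sampler2D uPreviousLevel;  // level being reduced
uniform float uSize;               // inverse side length of that level
varying vec2 vUV;
void main() {
    // the w channel of each child holds its own total (1.0 for texels of the base texture)
    float a = texture2D(uPreviousLevel, vUV + vec2(-0.5, -0.5) * uSize).w;
    float b = texture2D(uPreviousLevel, vUV + vec2( 0.5, -0.5) * uSize).w;
    float c = texture2D(uPreviousLevel, vUV + vec2(-0.5,  0.5) * uSize).w;
    float d = texture2D(uPreviousLevel, vUV + vec2( 0.5,  0.5) * uSize).w;
    // partial sums in x, y, z; the total of the four children in w
    gl_FragColor = vec4(a, a + b, a + b + c, a + b + c + d);
}
```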

In the previous shader the program saves the partial sums in the channels of the final color; this allows reading the ranges using only the parent pixel, which means that the traversal only requires fetching one texel from the level being traversed. The ranges can also be evaluated faster, since once the end of each range is known, generating the start values of the ranges is trivial; the traversal can then be implemented with a single texture read per level.

Notice that the traversal shader doesn’t change much: the texture fetches and the range evaluation are based on a new variable, “partialSums”. This variable holds the partial sums read from the parent level, so there is no need to make multiple texture fetches. Finally, these two shaders can replace the previous ones with the same JavaScript implementation.

In the next post a marching cubes implementation for WebGL will be explained using this compaction method.

Some time has passed since we last posted anything, so we are going to try to write something somewhat interesting. This post is about the making of one sweet project developed in our studio, called #translatingmelvin.

Project:

In December of 2011 we received a call from Ruben Martínez of Hommu Studio; he wanted to talk to us about a crazy and funny idea that we loved from the first minute we heard about it…

The idea consisted of translating the language of plants.

The project consists of a webcam transmitting the image of a plant in real time; the plant had to mount some control sensors (humidity, temperature, lighting), and with the values obtained from them we had to define the mood of a plant called Melvin. The development also required an artificial intelligence that allowed the user to hold a conversation with Melvin, and the answers would differ depending on the mood of the plant.

Talking with Ruben we realized that there was no interactivity with the installation, so we gave him some ideas, like activating an air pump that could create bubbles in a water container, or blowing soap bubbles, or maybe turning a light on or off, perhaps watering the plant, or opening a curtain close to the plant: something that could help the user feel that the installation really existed and could be interacted with. Sadly, none of these ideas prospered.

Melvin also had to publish on Twitter the answers generated in the conversations with every user. We also had to develop a system to talk with Melvin from Twitter using the hashtag #translatingmelvin. With this, the social part of the project was covered.

Production

Installation

We had been working for some time with a data acquisition hardware unit from National Instruments, using LabView and some socket connections to control from Flash the hardware we needed. Even so, for this project we decided to give Arduino and Openframeworks a try.

From this blog we would like to congratulate and send our admiration to the teams of both projects; Arduino and Openframeworks are very powerful, very well documented solutions, with quite a big community of people giving solutions to issues of varying difficulty. And the best part is that all this information is given altruistically, and it is available to anyone with a little interest in learning about these projects.

The components

The first step was to choose the electronic components we should use. We found big help in an internet store called Farnell.com: they have very good prices and almost any component you may need. The delivery is also very fast; we had what we needed the next day before 12h. This last paragraph sounds like a paid plug inside a post, but when you try to find a humidity sensor in your local electronics store you feel relieved that Farnell exists.

We knew that we were going to connect the components to an Arduino. At the beginning we were going to use an Arduino UNO, but in the end we decided to connect everything to an Arduino Mega, since we needed more analog outputs to drive seven relays.

To sense the humidity we used a sensor called “Sensirion SHT11”. This is a digital sensor that gave us some trouble, since we had to use a cable longer than 5 cm and the signal was lost along the way. We had to use a capacitor and shielded cable, as explained in the sensor’s datasheet. The capacitor keeps the signal stable along the cable and allows it to be read with enough intensity without losing any bit along the journey.

To obtain the light intensity we used a Vishay Siliconix BPV22NF photodiode. This is an analog sensor, and it was as easy as sending some current to the input pin and reading the current intensity at the output pin.

For the temperature we used an LM35 sensor. This analog unit works in a similar manner to the photodiode: you feed it with current and you obtain a variable output that reflects the actual temperature.

Circuits

To turn on the light bulbs that represent the mood of the plant, we had to switch 220V alternating current using the 5V direct current from the Arduino, so we also needed a set of solid state relays. These are the Sharp PR39MF51NSZF; following the instructions from the datasheet we had everything mounted in a couple of minutes.

After we had the switching system for the 220V bulbs, we started to run some tests on the plant’s installation and found that the image from the webcam got burned out even in low lighting conditions, so we had to create a system that allowed us to dim the light depending on the current lighting of the surroundings. The bad news was that this was not possible with the components we had selected.

We had to change the bulbs, because those bulbs did not allow us to dim the light intensity, so we adopted a solution of 3 LED bulbs that we could dim using transistors as current amplifiers; the LEDs work with 12V DC, hence we didn’t need the relays anymore. The basic circuit is shown in the next image.

The previous circuit is controlled using the values obtained from the photodiode: as the lighting in the room increases, the lighting from the LEDs increases too, and vice versa. This way the image from the webcam was never burned out and people could read the mood icons in low-light conditions.

Software

In order to control the whole hardware setup we wrote an application in C++ using the Openframeworks libraries.

In the first place we set up a connection system with the Arduino to control the sensors; to do so we used the “ofArduino” class, which requires the Arduino to be flashed with the “Standard Firmata” sketch. Once Firmata was installed, we could set up and manage the pins of the board directly from the C++ code running on the computer.

This way we could send and read voltages from the sensor pins, send voltage to any given relay to turn the 220V bulbs on or off, or modulate the current applied to the transistors for the LED dimming.

The application logic is quite simple: it is just a state machine.

One of the issues we had to deal with was that the SHT1x sensor (the one for humidity and temperature) has a protocol that requires sending pulses with very tight, nanosecond-scale timing. Using Firmata to communicate with this sensor is not adequate.

The solution for this problem was to move the communication code onto the Arduino board, since we can send high frequency pulses from the board to a specific pin. In the end we had to merge the Standard Firmata code with the communication code for the sensor.

Once we obtained the humidity and temperature values on the Arduino, we had to send those values back to Openframeworks; this was done by hacking the Firmata code. We defined two virtual Firmata pins to be the ones that send the values obtained from the sensor, and we also had to insert an exception in Firmata’s pin read/write loop so it would not read or write anything on those pins.

In the Openframeworks code the values arrived without any issue, as if they were two more regular pins.

The communication between the Flash application and Openframeworks was made with a socket server using the ofxNetwork addon, specifically the ofxTCPServer class.

Flash’s crossdomain policy requires that, if you want to establish a network connection via sockets from the browser, the socket server must send, through a specific port, a string containing a crossdomain.xml granting access to as many domains as the user wishes. This was done by leaving port 2048 open and dedicated to this single task; when someone established a socket connection to this port, a dedicated handler served the policy file.

Once the exchange of files was made, the connection between the designated port and the client could be initiated, which allowed us to obtain a persistent real-time connection with a web browser. It is very easy to implement and very stable: we tested with more than 100 simultaneous connections and the application worked just fine.

The next video was recorded in real time with an AMD Radeon 6970M HD (2GB); feel free to watch the video if the simulation does not run on your computer.

Some links to start

When I was reading about shadowed particles for the previous posts, I saw one demo that really caught my attention; this demo used a technique for fluid simulation that I really wanted to understand. If you have seen the demo you will probably know this blog; there I found that the fluids were made using SPH (Smoothed Particle Hydrodynamics), so I finally had a place to start searching for information. What I found were some good links that explain the math behind the system very well.

If you want to understand how the simulation works you should read the paper from Matthias Müller; it explains the basics of fluid simulation using particles to calculate the properties of the fluid. You can also read this paper from Harada, which is better suited for a GPU implementation.

There are two more links you should read. Micky Kelager also explains SPH, but the most important thing there is how to deal with the collisions in the simulation using distance fields; he also defines some ways to calculate some important coefficients for the simulation. Another good link is the post from Iñigo Quilez, where he defines some useful analytic distance fields that can be used to create collision objects in the simulation.

Finally, GPU Gems 3 has a good chapter on rigid body simulation that explains very well the creation of uniform grids to find the neighbors of a given particle.

The implementation

The process has a few steps, explained in the next image; the most important of them are the neighborhood search and the velocity update. I will not explain too much here because the previous links explain the implementation of SPH on a GPU much better than I could.

Neighborhood search & Velocity update:

One of the most important things in the simulation is to find the particles around the particle being evaluated, because they are responsible for the forces that will be applied in the simulation. We implemented a uniform grid based on the paper from Harada. This grid is a 3D voxel partition of space that keeps track of up to four particles located in each voxel. A closer look at the algorithm is presented in the next image (taken from GPU Gems 3).
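As an illustration of how such a grid can live in a 2D texture (an assumption about the layout, in the spirit of the slice buckets used elsewhere in these posts, not necessarily the exact code of this demo), the voxel of a particle can be mapped to a texel like this:

```glsl
// Sketch: map a particle position in the unit cube to the texel of its voxel,
// with the 3D grid flattened into a 2D texture of uBucketsPerSide x uBucketsPerSide slices.
uniform float uGridSize;        // voxels per side of the 3D grid
uniform float uBucketsPerSide;  // Z slices per row/column of the 2D texture
vec2 voxelTexel(vec3 position) {
    vec3 voxel = floor(position * uGridSize);  // integer voxel coordinates
    vec2 slice = vec2(mod(voxel.z, uBucketsPerSide), floor(voxel.z / uBucketsPerSide));
    return (slice * uGridSize + voxel.xy + 0.5) / (uGridSize * uBucketsPerSide);
}
```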

The velocity update needs the collision information between the colliding objects and the particles, so it has to be implemented in this step. What we did was to create spring forces based on the paper from Kelager using distance fields, with one signed box as defined in the post from Iñigo.

The coefficients

Using SPH to simulate fluids requires defining some coefficients that alter the properties of the fluid. These are:

gridSize: changes the number of voxels used for the uniform grid. A bigger gridSize gives more definition to the simulation, but the overall performance suffers. If you change this parameter, also change the number of particles in the simulation.

volume: the volume defines the total space that the particles occupy at rest; it is also used to define the mass of each particle, which means that if you change the total rest volume you are also changing the mass of the particles.

pressureK: this variable changes the strength of the pressure of each particle. It is important because it keeps particles from collapsing into each other; on the other hand, if you make it too strong, the particles will repel each other too much and the simulation will become unstable.

viscosity: the viscosity is a damping factor applied to the relative velocities of the particles; it keeps particles moving smoothly relative to each other.

maxSearchRatio: this is perhaps the most delicate variable in the simulation; it defines the maximum smoothing radius used by the weighting kernels. If this value is too small, there will be no particles for the evaluated particle to interact with, so there will be no simulation; on the other hand, if the radius is too big, the forces will not be smoothed properly and there will be some instabilities in the simulation.

restitution: this value is responsible for the strength of the collision forces; it acts as a spring force at the boundaries, so you can control how the particles react to a collision (in an elastic or inelastic way).

dt: this is the integration time step of the simulation. Since we are using a simple Euler integration, A(t + dt) = A(t) + dt * A'(t), values should be small to avoid the errors produced by big changes over big time steps.

quality: this variable defines the number of particles used from each voxel to calculate the densities and forces; lowering it speeds up the process, but the simulation suffers from lack of information, so the visual quality is somewhat hurt.

You can also see the position, velocity and density fields applied as colors to the particles by changing the shader option in the control panel. Finally, you can switch from point rendering to billboard sphere rendering. All these variables can be updated from the control panel; once you update them you only have to restart the simulation.

Finally here are some images of our first breaking dam simulation.

Simulation showing the densities of each particle.

Simulation showing the position field of the particles.

Simulation showing the velocity field of the particles.

We would like to thank AlteredQ for helping to test the simulator. Sadly we couldn’t find a solution for some configurations, so we have made a video in case you cannot play the simulation correctly. The simulation works fine with these configurations:

Feel free to try it, and if it’s broken please let us know your configuration and exactly what you saw while you played the simulation. This is our first version of the implementation, so we hope to make it faster using a more convenient neighborhood search. We hope that future posts will follow two branches: SPH optimizations and particle rendering using marching cubes.

When we try to code anything in our studio, the first thing we do is search for information about the subject, in order to find papers and techniques that could help us develop what we want. The bad thing about it is that you lose a lot of time in Google trying to find that specific paper that reveals the “secrets” of the math behind the code.

But sometimes things are even worse, because we do not even know what we are searching for, and that is when a good starting link is a real gift. In our case, finding this blog has been a perfect starting point to find GPU techniques and the papers related to those techniques.

The work made by these guys is amazing, but more importantly they talk about all the methods used in their work, so you can start searching on your own if you would like to do something similar. In our case we wanted to make some particle animations like the ones found here, and they provide the perfect starting point in this post.

There are two main things we had to deal with in order to get a simple particle animation:

Create a fluid like animation for the particles.

Shade the particles to give some volumetric effect.

Curl Noise:

In the post from DirectToVideo we found two lines that caught our attention: they talked about a technique to fake velocity fields for procedural fluid flow called “Curl Noise”, created by Robert Bridson, and this is how we found this paper. In it the author defines a divergence-free noise suitable for velocity simulations.

Using divergence-free potentials is important for a flow simulation because it avoids sinks in the flow (think of it this way: you will not get all the particles converging to a single final position, like a hole in space). Implementing it is quite simple because you only have to calculate the differentials of the potentials (noises) used to create your field.

To create the velocities from the potentials we used the equations shown in the next image. The only assumption we made is that every scalar component of the potential vector field is a 2D function, which still satisfies the curl (rotor) operator.
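In standard notation (reconstructed here, since the equation images are not reproduced), the first group of equations is the curl that turns the potential field into a divergence-free velocity, and the second group is the 2D assumption made for each component:

```latex
\mathbf{v} = \nabla \times \boldsymbol{\psi} =
\left(
  \frac{\partial \psi_z}{\partial y} - \frac{\partial \psi_y}{\partial z},\;
  \frac{\partial \psi_x}{\partial z} - \frac{\partial \psi_z}{\partial x},\;
  \frac{\partial \psi_y}{\partial x} - \frac{\partial \psi_x}{\partial y}
\right)

\psi_x = \psi_x(y, z), \qquad
\psi_y = \psi_y(x, z), \qquad
\psi_z = \psi_z(x, y)
```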

The second group of equations explains our assumption: we defined each axis potential as a function of the other two axes in order to use 2D textures for the three potentials; this way we do not need a 3D volume texture for each potential. With this assumption we ended up with three 2D textures that define our potential, so the velocity is defined as the curl operator applied to the potential field, as you can see in the equations. To calculate the partial derivatives we only had to take finite differences of the given textures, and the good thing is that, since our point is defined in texture space, we can use this position to read the potential value from the three defined potentials.

Now that we knew how to create the velocity field, we needed three good noise textures to work with. This is an issue, because if we loaded some external textures our development would be tied to those images, so we decided to create the noise textures ourselves.

Our first approach was to make the textures using Perlin noise functions, but we found that they are somewhat expensive to calculate, so we opted for a more practical solution, simplex noise, and the best thing is that we found a WebGL implementation here. With the noise function solved we only needed to make it turbulent, and we found that we could add higher frequencies to get a turbulent noise; you can read a full explanation here.
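A sketch of that idea, assuming the 2D simplex noise function snoise from the implementation linked above:

```glsl
// Turbulence sketch: sum octaves of the base noise with increasing frequency
// and decreasing amplitude (the number of octaves here is arbitrary).
float turbulence(vec2 p) {
    float value = 0.0;
    float amplitude = 0.5;
    for (int i = 0; i < 4; i++) {
        value += amplitude * snoise(p);
        p *= 2.0;
        amplitude *= 0.5;
    }
    return value;
}
```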

This is how we ended up creating our own textures, which we can change in real time (though not every step), and which allow us to find the noise best suited to our needs. In the next image you can see the same base noise with and without turbulence.

Saving the potentials as textures adds a new problem to solve: textures in WebGL only store 8 bits per channel, so if you naively try to save the potential value in one channel you only get 256 distinct values. This means that if your noise is defined in the range [0, 1] you will find jumps of 1/256 in your noise. To solve this you can pack your one-dimensional value into the RGB components of the texture, which gives you a resolution of 2^24 for a given value.

We use two functions to pack and unpack floating point values in the textures; they are defined like this:
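(The original listings are not reproduced; the usual GLSL version of this packing for a value in the [0, 1) range is sketched below, with names of our own.)

```glsl
// Pack a value in [0, 1) into the RGB channels (about 24 bits of precision).
vec3 packValue(float value) {
    vec3 enc = fract(value * vec3(1.0, 255.0, 65025.0));
    enc.xy -= enc.yz / 255.0;  // remove the part carried by the next channel
    return enc;
}

// Recover the original value from the RGB channels.
float unpackValue(vec3 enc) {
    return dot(enc, vec3(1.0, 1.0 / 255.0, 1.0 / 65025.0));
}
```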

The first function wraps one value into RGB components, so you can save one value with very high resolution; the second function unwraps the RGB components back to the original value. Note that the functions do not handle negative values, so it is up to the programmer to define how the unwrapped value is going to be treated. The bad news is that you need a whole texture to save one value, so if you want to save one 3D position per texel you will need three textures for it.

The last paragraph describes the main bottleneck of the application: since we have to save the particle positions to run the simulation, we need to traverse all the particles three times to save each axis position in a different texture. In the next image you can see one of the axis-aligned position textures used.

Particles shading:

To shade the particles we applied the same technique from our previous posts (http://www.miaumiau.cat/2011/06/shadow-particles-part-ii-optimizations/), but we modified it to accept a variable bucket count so we could adapt the final resolution of the shadows in the volume. Since we can set many more slices, we saw that the interpolation used between two slices was no longer necessary, so we took it out to gain some speed.

Now that we can change the bucket count, we had to look for the best compromise between the bucket size, the quality of the shadow volume and the quality of the floor shadow. For a 2048 texture size we found that the initial 8×8 bucket count is a very good compromise for the given parameters. With this in mind, if you want to implement good quality shadows in your volume we recommend 16×16 bucket slices, but the final shadow for the floor will be very small. In the next image you can see one test of the shadows with and without color.

The process:

With the curl noise, the volume shadows and the floating point issues solved, we ended up with an algorithm of seven steps; this process requires five render target changes and we need to traverse all the particles five times: three times to save the particles’ current position, once for the shadows and one final time to paint them all.

We also perform some window (2D) calculations in the fragment shader to get the correct blurring for the final composition, and we also calculate the potentials using the fragment shader.

Two things should be implemented to get a correct particle rendering. First of all, if we want to render all the particles we need to use blending, but this requires that all the particles get sorted (if not, you will see far fewer particles than you expect). The other thing is to use another texture to control the life of each particle; this way we don’t have to reposition the particle at the emitter every time it reaches its “y” limit.

Some workarounds:

The process described above has a huge bottleneck saving all the particle positions, so we wanted something faster to get the particle positions. Instead of calculating the positions in a stepwise way, we defined the positions based on a path (a Bezier path); this Bezier is affected by the noise, so we only need one texture to save the Bezier points.

One good thing about using a Bezier path is that you can use the linear interpolation of the textures to get the current particle position on the curve. This way we don’t have to save all the particle positions; instead we only have to save some Bezier points in a texture (512 points, to be concrete) and then we read the particle positions using the gl.LINEAR filter, which calculates the interpolation from the texture read (all of this for free).

So you can have as many paths as the height of your texture, and each path can have as many points as the width of the texture. In our case we only used two paths; you can see some images of the implementation.

The previous image shows the positions texture using two paths, and the 3D result of the Bezier after applying the curl noise. This way we don’t need to save the positions in three textures. The bad thing about it is that all the particles are constrained to the Bezier path, so we cannot perform much particle dispersion.

Instead of applying the curl noise to the Bezier path, we could use some of the total particles to create a volumetric path, and use some more to add some particle dispersion with the first technique. In the next image you can see how the Bezier paths are rendered using just one texture.

So if we use the Bezier path to move some initial particles, move the emitter position over the path, and finally define the initial exit velocity of the dispersion particles at the emitter using the Bezier’s tangent, we think we could make something like this, but with far fewer particles.

In the last post we talked about the implementation of volume shadows in a particle system; that approach used a couple of for loops to define the final shadow intensity for each particle. Those loops could make the application evaluate the shadow map 32 × nParticles times, making the process very slow.

So we tried a pair of optimizations to deal with the defects of the previous work. The first thing we did was to move all the texture reads to the fragment shader, and then we also accumulated the shadows for each step in the fragment shader. At the end of the working process we defined the following steps to get the shadows done.

First Step: Shadow Buckets

The first step is no different from the previous approach: we define 64 buckets in the vertex shader using the depth of the particles, assuming that all of them are inside a bounding volume, then we render them into a shadow framebuffer with a color that depends on the shadowDarkness variable. For this step we use the same getDepthSlice function in the vertex shader.

There is one very important difference in this function compared with the previous one: this new version requires two parameters, an offset and the transformMatrix. The second one is used to tell the function which view (camera view or light view) is used to define the buckets; this is useful if you try to use the bucket system to sort the particles and blend the buckets.

The first parameter offers the possibility of obtaining the next or previous bucket for a given depth; with this we can interpolate between two buckets or layers, which gives a linear gradient of shadow between two layers instead of 64 fixed steps.

Second Step: Shadow blending

This step is the main optimization of this code: it blends the previous bucket into the next one, so all the shadows for each layer can be defined in 64 passes. Blending the buckets requires enabling gl.BLEND and defining the gl.blendFunc using gl.ONE and gl.ONE.
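On the JavaScript side that amounts to:

```js
gl.enable(gl.BLEND);
gl.blendFunc(gl.ONE, gl.ONE);  // additive blending: each pass accumulates the previous bucket
```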

Then you have to render one quad into the shadow framebuffer using the shadow map of that same framebuffer as a texture; the most important thing here is to get the bucket coordinates for each of the 64 passes of the quad. This is also done in the vertex shader: on each pass a depth is defined for the quad and the vertex shader derives the UV texture coordinates of the corresponding bucket.

Third Step: Shadow blur

This step has no particular difficulty, but it requires the use of a composition framebuffer as an intermediate destination; this is done to avoid feedback loops reading and writing the same texture. We defined the blur in two different passes, one for the X component and another for the Y component; we used a simple box blur, but you can perform a Gaussian blur or any other blur you like.

Fourth Step: Particles rendering

In this final step we render all the particles to the window framebuffer using the shadow texture, the lightTransformMatrix and the cameraTransformMatrix; first we transform the particles to the light view to read the correct shadow intensity from the map.

In this part of the process you could read the shadow intensity for a particle depending only on its depth and bucket, but that would make all the particles in the same bucket show the same intensity even though their actual depths are different. This is where the offset parameter of the getDepthSlice function is used: for a given depth and bucket we can obtain the current bucket and the next one, and interpolate the shadow intensity for the current depth between those two layers.

Once the shadow intensity is defined, we have to transform the particle position to the camera view to obtain the light attenuation and the position of the particle, and finally render it.

At this point we get the uvTexture coordinates for the current bucket and the next one (notice that we call the getDepthSlice function using offsets of 0.0 and 1.0); the attenuation for each particle is also calculated in this shader.

We can also define a color tint for the shadow; the shadow is read from the texture at the current bucket and at the next one to perform the interpolation for the final shadow intensity.

And that is it: four steps to render the shadows. These steps give you the chance to blur the shadows, interpolate between two shadow layers and tint the shadows. But there are some disadvantages too: this process requires multiple render target changes, and it requires careful variable definitions to make it work.

In the example (click on the first image if you haven’t done it yet) you can see these variables to play with:

- shadowBlur: it defines the blur in the shadow map.

- shadowDarkness: as its name says, it makes the shadow darker or lighter, both inside the volume and in the shadow applied to the floor.

- shadowTint: it “tints” the shadow with the particles’ color.

- lightAngle: it allows you to change the light direction from 0 to 45 degrees; this helps you see how the shadows change in the volume and on the floor depending on the position of the directional light (remember that it’s always pointing to the center of the object).

- cameraRatio: this ratio gets you closer to or farther from the volume.

- useAttenuation: it toggles the light attenuation over distance (squared); if on, you will see the effect of the shadows plus the attenuation, if off you will see only the shadows in the volume.

- useInterpolation: it toggles the interpolation between two buckets for a given depth of a particle lying between those two layers. With a high bRatio and this variable off you should see the segmentation of the shadow in the volume.

- showTexture: it shows the final shadow map used for the volume shadows and the floor shadow. To read it correctly you should go from bottom to top: the points closest to the light are in the bottom left corner of the image, and the final shadow (the one used on the floor) is in the top right corner of the image.

- bRatio: this is the most important variable of the whole process. This float defines the radius of the bounding sphere that contains the volume; you must define a sphere that holds all the particles inside, or you will find some glitches in your rendering. But beware: you could be tempted to define a big bounding sphere for the current volume, but what you are really doing is assigning fewer layers to the volume.

For this example it is really easy to define the bRatio, but when you are animating the lights and the particles things start to get messy, and the only thing left is to define a big bounding sphere. If this is the case you could end up with 16 or even 8 layers for one volume, and that is when the interpolation of the bucket information helps. Try playing around with this variable, turning the interpolation on and off, and you will see its effect.

As we said before, we hope to write a third article using this method to do some animations.

It has been about a year without writing, so this is the first of three very verbose posts about shadowed particles…

A year and a half ago Félix and I were talking about how we could implement a good shading effect for a particle system made in Flash. In those days people could render 100,000 particles with no problem using point lighting or color shading based on the position of the particle, but we wanted something more, and that is when we started to talk about shadows on particles.

Shadowing the particles was not an easy task (for Flash in those days), but we could find very useful information about it on the web: GPU Gems has a very good article about rendering volumetric objects, and we also found a very handy paper about smoke particles written by Nvidia.

Both articles talk about “half-angle rendering”; this kind of rendering uses the half angle between the view direction and the light direction to sort the particles and calculate their shadows. The bad news is that the particles must be sorted for that algorithm.

We tried to implement the algorithm in Flash but it was a total failure, so we parked the idea for a not-so-immediate future. Some time later we got news about WebGL and Stage3D (Molehill), and we got very excited about the idea of having hardware acceleration (and the possibility of implementing the shadows). Even so, we still love Flash (with its future acceleration).

Two weeks ago we finally got some time to play around with WebGL, so we started to learn its internals. We really recommend reading this page, it has many tutorials that actually helped us. We also took a look at the OpenGL ES 2.0 specification in order to see which functions we could use for the shading.

Another good way to get in touch with WebGL is to read two “classes” of the Three.js framework, WebGLRenderer and WebGLShaders: the first one is very explicit about how objects are rendered with WebGL, but the second one is a must-read, since it wraps all the shaders written for Three.js and you can explore them to get an idea of how to program your own shaders.

Shadow Particles, “our” approach:

Shadowing particles is quite easy in OpenGL if you have all the extensions it offers on the desktop (the full OpenGL version), but on the web (OpenGL ES 2.0) there are some issues that have to be solved to make it work; these issues are:

- There are no 3D textures in WebGL.
- There is no gl_FragData in the fragment shader.
- Sorting particles on the web has to be done in Javascript or, in the best case, on the graphics card.

If we don’t have 3D textures we can’t save the shadow information in layers as we first wanted; on the other hand, if there is no way to use gl_FragData in the fragment shader, we must render each piece of information in a different pass for each channel of data we want. Finally, we decided that sorting in JS is not an option, and trying to sort on the graphics card is one task we didn’t even consider (too hard for us ☹).

So in the end we decided to use the following algorithm:

- All the particles would be “part” of one “object”: by this we mean that we would make only one call to the graphics card to paint all the particles, providing one array buffer with the number of particles and their initial positions. This means that all the animation would be done in the vertex shader.

- Render the shadow information into a layered 2D texture: we would use an 8×8 layered texture in order to have 64 shadow layers. The main idea behind it is that each layer represents the shadow that affects the next layer. It is as if you sliced the volume into parts and rendered the shadow that the first part casts onto the second, the second onto the third, and so on.

- Use render-to-texture from a framebuffer to define the 2D shadow texture: since OpenGL ES gives us the possibility of using framebuffers, we render the information into one buffer in order to get the texture; in the next pass we can use the texture in the shaders to define the point colors.

- Use a second pass to define the final color of the particles and paint them into the window framebuffer.

So, as you could read before, we use two passes rendering all the particles at once: the first pass gathers the shadow information and the final pass renders the particles with the “correct” color.

Rendering the shadow map:

The shadow map we wanted to obtain is something like the following picture: it has 64 “buckets” for 64 different depths, defined by transforming the vertex positions to the light UCS. Once the transformation is done, the 64 layers must lie between the minimum and the maximum z of the volume containing all the particles.

The bad news is that you must have a bounding box to define the zMin and the zMax, and it gets worse when you animate your geometry. To solve this, instead of a bounding box we decided to use a bounding sphere, which gives us one measure (the radius) that contains all the geometry; this is how the “shadowBoundingRatio” was born.

The “shadowBoundingRatio” is a radius defined by the user (the programmer) based on the geometry that is going to be rendered (if you are going to animate the geometry, there are some hacks that can be done to properly update this variable).

Now that we have a way to define the 64 depth steps, we only have to use the vertex shader to render the particles into their bucket; the bucket is defined using these equations:
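(The original listing is not reproduced; the following is a reconstruction sketch that matches the line-by-line description below, and the exact normalisation of the first line is our assumption.)

```glsl
// line 1: depth step in [0, 63] from the light-space z, uShadowRatio and uBoundingRadio
float depthStep = floor(64.0 * (lightPosition.z / uShadowRatio + uBoundingRadio) / (2.0 * uBoundingRadio));
// lines 2 and 3: x and y bucket position, from 0 to 7, inside the 8x8 shadow map
float bucketX = mod(depthStep, 8.0);
float bucketY = floor(depthStep / 8.0);
```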

The first line of code defines the actual depth step of the particle based on the shadowBoundingRatio (uniform uBoundingRadio in the vertex shader) and the uShadowRatio (explained later). The second and third lines define the x and y bucket position, from 0 to 7, in the 64-bucket texture, to properly render the particle.

The uniform variable uShadowRatio is the equivalent of the camera distance to a target. It allows us to adapt the slice rendering to its position in the texture (think of it as the only way to see all the information without it spilling into other buckets or being rendered too small).

Once the depth step and the actual bucket are defined we only have to place the particle in the framebuffer. We alter the gl_Position of the particle using the previous information, so getting the actual bucket and defining gl_Position is done with this function in the vertex shader:
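A minimal GLSL sketch of the idea (hypothetical names, not the original function; it assumes clipPosition is the particle already projected with the light's matrices and that bucketX/bucketY come from the previous equations):

vec4 placeInBucket(vec4 clipPosition, float bucketX, float bucketY) {
    // shrink the projected slice so it fits inside one bucket (1/8 of the render target)
    clipPosition.xy *= 1.0 / 8.0;
    // offset to the centre of the bucket; it is scaled by w because the GPU
    // divides x, y and z by w after the vertex shader runs
    vec2 offset = vec2(bucketX, bucketY) * 0.25 - 0.875;
    clipPosition.xy += offset * clipPosition.w;
    return clipPosition;
}

gl_Position would then be the value returned by this function.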

Every move has to be done taking the "w" component of the position into account; that is why we offset the particles relative to "w" in the previous function. Further reading on this is highly recommended here.

With the previous function in the vertex shader we obtain a framebuffer that we can render to a texture for the next render pass; you can see an example of it by clicking on the next image. This example is the shadow map for the main demo, so you can see how the shadow changes with the light's movement.

The last example shows a white map with black particles; actually, in the framebuffer we use a vec4(0.0, 0.0, 0.0, 0.0) RGBA background color, and the particles are painted with a 1.0 / 64.0 * vec4(1.0, 1.0, 1.0, 0.0) particle color. We do this so that a particle only loses all light incidence when all 64 layers cast shadow on it (64 layers × 1/64 = full shadow); the fewer layers that affect the particle's position, the lighter the shadow on that particle will be.

Once the particles are drawn in the framebuffer we only have to pass the resulting texture to the next pass; a good example of how to do this appears in this tutorial from LearningWebGL.

Rendering the second pass:

The second pass is used to simulate the light's interaction with each particle. Since particles do not have a normal, not much can be done in terms of real lighting simulation, but one thing that can be done is to define a light attenuation based on the distance from the particle to the light source. If the distance is measured to the light position we get a point light; if it is measured to the plane defined by the light vector (light.target – light.position) and the light position, we get a directional light.

Since the shadows in the layers are rendered parallel to each layer, the resulting shadows correspond to a directional light source, but in our approximation we have a mix of point-light attenuation and directional-light shadows (the point-to-point distance is easier to calculate than the point-to-plane distance). If you want to be rigorous you can change the attenuation distance calculation to match a directional light.

The final attenuation is, by definition, based on the square of the distance obtained (for either the directional or the point light), but any power of the fractional (normalized) distance can be used to obtain the desired effect (linear, square or cubic attenuation).
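A hedged GLSL sketch of both distance definitions and the attenuation (the uniform names are made up for the example and the exact normalization is a design choice):

// point light: plain distance from the particle to the light position
float dPoint = distance(particlePosition, uLightPosition);
// directional light: distance from the particle to the plane through the light position,
// perpendicular to the light vector (light.target - light.position)
vec3 lightDir = normalize(uLightTarget - uLightPosition);
float dPlane = abs(dot(particlePosition - uLightPosition, lightDir));
// pick one of the two distances, normalize it, and raise it to the chosen power
// (1.0 linear, 2.0 square, 3.0 cubic)
float dNorm = clamp(dPoint / uBoundingRadio, 0.0, 1.0);
float attenuation = 1.0 - pow(dNorm, 2.0);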

To change the shadow darkness you can change the divisor of the vec4 color used in the vertex shader (the 1.0 / 64.0): a divisor lower than 64.0 gives darker shadows, while higher values make the shadow softer.

If you think the attenuation is right and the shadows are fine, but the whole thing is too dark, you can always add an ambient light (adding a fractional color to all the particles) after you implement the shadow and attenuation.

Not everything is perfect….

The second pass has a little issue that we will address in a future post: each particle receives a shadow contribution from all the previous depth layers, so you have to read the shadow information from all the previous layers for each particle and then sum the collected values to get the shadow intensity for that particle.

This means that "for" loops must be used in the vertex shader: for the first depth there are no contributions from previous depths, but for the n-th depth we have contributions from (n − 1) depths, and this makes the shading quite slow compared to using only the attenuation. How slow? Adding up the contributions across the 64 depths gives roughly 64 · 64 / 2 ≈ 2048 texture reads (it is not THAT bad), but dividing this number by the total number of depths (64) means an average of about 32 texture readings per depth, and this slows down the performance a lot.
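As a hedged sketch, the naive loop in the vertex shader could look like this (uShadowMap, particleUV and depthStep are illustrative names; GLSL ES needs the constant loop bound with the comparison inside):

float shadow = 0.0;
for (int i = 0; i < 64; i++) {
    // only the slices in front of this particle's depth step contribute
    if (float(i) < depthStep) {
        // locate bucket i inside the 8x8 tiled shadow texture
        vec2 bucketOffset = vec2(mod(float(i), 8.0), floor(float(i) / 8.0)) / 8.0;
        shadow += texture2D(uShadowMap, bucketOffset + particleUV / 8.0).r;
    }
}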

There is a workaround for this problem, and it is as simple as applying 64 more passes to the framebuffer to accumulate the individual buckets: the first bucket keeps no previous information, the second becomes the sum of the first and the second, the third the sum of the first, the second and the third, and so on.

In each step of the composition we add the accumulated sum of the previous step to the next bucket, so each bucket ends up holding all the information it needs. This means we do not need "for" loops to read the texture and sum the contributions in the shader. Another benefit of this composition is that the final bucket contains all the shadow information that the volume casts on a surface, so you can project the shadow from the particles onto any object in your scene.
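A minimal sketch of one of those composition passes, written as a fragment shader (this is our reading of the idea, not the original code; it assumes the accumulation ping-pongs between two framebuffers and that each pass draws a quad covering only the bucket with index uPass, with additive blending enabled):

uniform sampler2D uShadowMap; // tiled shadow texture accumulated so far
uniform float uPass;          // index of the bucket written by this pass (1..63)
varying vec2 vUV;             // uv inside a single bucket, in [0, 1]

void main() {
    // sample the already-accumulated previous bucket; additive blending adds it
    // on top of the bucket currently being drawn
    float prev = uPass - 1.0;
    vec2 prevOffset = vec2(mod(prev, 8.0), floor(prev / 8.0)) / 8.0;
    gl_FragColor = texture2D(uShadowMap, prevOffset + vUV / 8.0);
}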

This optimization also allows us to redefine the second rendering pass in order to recreate blending among the particles: we can render each depth with its final color, as we did for the shadow texture, and then we have the particles sorted in layers. Finally we only have to add each part of the framebuffer color texture to get proper blending for transparent particles.

There is another hack for the "for" issue, and it is actually what we are doing in this example: "assume" that the strongest shadows for one layer are the ones closest to it, so you don't have to sum all the previous layers for a given depth, only a limited number of them (we are using only ten layers). The drawback of this hack is that it produces many artifacts (but it is a little faster).

In the next posts we will make the improvements discussed above: a second post will cover the composition in the framebuffer (for the shadow texture and the second pass of the color texture), and a third post will cover how to use this algorithm to animate particles (taking into account the limitations of the bounding sphere radius) and how to render the particles' shadow onto a surface.

For now we have a little example that shows everything discussed in this post: a simple volume where you can press on the scene to drag the particles, turning the light attenuation and shadows on and off to see the shading on them. You can see it by clicking on the image at the beginning of the post.

Many sites have been built around the interaction between video and photos, text, or any other additional assets inserted into the original video to enhance the final experience. Tracking simple planes is very easy with the help of After Effects or any other post-production software, but when it comes to tracking complex surfaces there is no simple way to do it.
We were asked to develop a way to track surfaces for a project designed by David Navarro (www.davidnavarro.net). David wanted to track a t-shirt in order to place the user's photo on its chest. After thinking about how to do it, we came up with the main idea in five simple steps:
1. Track predefined control points placed on the t-shirt with the help of After Effects.
2. Modify the video in order to create the mask and the shading that will affect the generated mesh.
3. Define a Nurbs mesh using the control points tracked on the t-shirt.
4. Create a color correction for the image inserted by the user.
5. Integrate the mesh and the video for the final result.

Tracking Control Points:
When it comes to tracking control points on a t-shirt, the first idea that came to us was to draw a grid of points on the t-shirt and try to track them with Motion. The main problem is that if you define an 8×8 grid you end up tracking 64 points per frame (and they would probably not be tracked well, so you would have to edit many frames to get the required result).
So instead of tracking points we ended up tracking regions; these regions are defined as four-corner areas distributed equidistantly over the original grid of points (see next image).

In the image below there is a rectangular region of the chest that will be replaced with an image chosen by the user. This region contains rectangular areas that are tracked using Mocha (a plug-in for After Effects). Since all the squares share at least one corner with each other, only the black squares on the t-shirt must be tracked in order to get all the control points needed.
Once the squares are tracked for all the needed frames, you should reorder the four corners obtained for each square so the points are rewritten in a row-column ordered format, which is used to create the interpolated Nurbs mesh. Finally you can export these points with their X, Y and frame information in an XML file to be used from ActionScript.

Video Masking and Shading:
Since we are not specialists in video post-production, we counted on the help of Manolo Calvo (www.freelancetv.es). He was responsible for the final tracking of the rectangular regions, but most importantly he did the masking and shading of the video in order to superimpose lights and shadows on the generated mesh. Video masking can be done using the mesh generated from the original control points; one thing to take into account is that this mask should be smaller than the final integrated mesh in order to avoid holes between the mesh and the video.
In this project that was not the case: Manolo made the masking by hand, and he also painted the lights and shadows over the video to give us a final composition with the alpha channel ready for integrating the final mesh. In the next image you can see the final result of the video composition with the lights and shadows added.

Creating the Nurbs Mesh:
Using the classes discussed in these previous posts (http://labs.miaumiau.cat/?p=178, http://labs.miaumiau.cat/?p=338), we could interpolate a mesh from the original control points; the final mesh can have whatever resolution is needed to display the inserted image, and the classes also let us modify the behavior of the interpolation (the degree of the final mesh in the U and V directions). Depending on the topology of the surface you can decide how to interpolate it: if the internal points do not affect the final interpolation, you can interpolate using only the border curves of the surface; on the other hand, if the surface changes strongly in its interior, you should interpolate using all the control points defined in the grid on the t-shirt (or whatever surface you are tracking in the video).

Color Correction:
The final video made by Manolo had desaturated colors and low values of contrast and brightness, so in order to adapt the final mesh to the video some color correction has to be applied to the image inserted by the user. To do so we programmed a set of modifiers for a given BitmapData and tested many pictures in order to define the final values of contrast, brightness, saturation and blur applied to each inserted image. Every time an image is inserted into the t-shirt these parameters are adjusted to make the final color correction.
Using this correction together with the shading painted over the video gives the illusion that the mesh and the video are truly integrated.

Video and Mesh Integration:
With the video post-produced, including the alpha channel and the lights and shadows painted into the final composition, integrating the mesh for the final result is very easy. The mesh must sit one layer under the video so that the video can affect the mesh like a multiply blend mode.
Using the five steps discussed above, we have programmed a simple example (click on the first image) that shows the final integration with the video. You can switch between an image and the webcam, change the depth of the video and the mesh to see how the video affects the mesh, and see how the mesh is created.

Finally we would like to thank Manolo & Ivan for their invaluable help with the video tracking and post-production for this project.

Nurbs surfaces require an understanding of how Nurbs curves work; we have written a post about it, so if you haven't read it yet press here. Surfaces are an extension of the curves: the only difference is that you need two parameters (u, v) to define the surface, so when you evaluate the surface you calculate the curves in one direction, "V", and those results are used to calculate the curves in the other direction, "U".
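For reference, the standard evaluation of a rational surface of degrees p and q, with basis functions N, weights w and control points P, is:

S(u,v) = Σi Σj N(i,p)(u) · N(j,q)(v) · w(i,j) · P(i,j) / Σi Σj N(i,p)(u) · N(j,q)(v) · w(i,j)

The inner sums over j are exactly the curves evaluated in the "V" direction, and their results feed the outer sums in the "U" direction.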

The main advantage of Nurbs surfaces over Bezier surfaces is the control of the degree in each direction: for each surface you have two independent degree values (degreeU, degreeV) that define the local control of the interpolation in each direction. Nurbs surfaces also offer the possibility of adjusting the weight of each control point in order to pull the surface closer to that point.

The modeller presented in this post allows you to work with a Nurbs surface of 7×7 control points; you can change the degree in each direction, the segmentation of the whole surface and the distribution of the domain region parameters.

Surface Tessellation:

Nurbs surfaces can be seen as a two-dimensional non-linear transformation from a rectangular domain region [0-1] [0-1] to a parameter region. This behaviour presents a problem when you try to tessellate the surface: if the parameters in the domain space are equally distributed, the resulting tessellation in the parameter space will not show the same even distribution. To overcome this issue a couple of cubic transformations are used to define the distribution of the parameters; these curves are located in the surface editor of the modeller (the light blue boxes).

If you rotate the surface (try it while it is planar) and change the sliders of the boxes you will see how the distribution of the segments changes; this should be done in the planar case for each change of degree in order to get an even distribution in the parameter space.

These curves don't solve the distribution problem exactly. The best thing to do is to calculate the parameter to use at each step, forcing a fixed arc length between consecutive points on the curve. This process requires precalculating the curve, then calculating the curve's length to define the step between each pair of points, and finally using the first derivatives of the curve to find the required parameter. The downside is that it takes many iterations per parameter to check whether it satisfies the fixed step, so it is not a good solution for real-time rendering.

Degree U,V:

Changing the degree in each direction changes the interpolation factor (local control) in that direction: since the degree in each direction can go from 1 up to the number of control points minus one, the same control points (with the same weights) can produce (N − 1) × (M − 1) different surfaces, N being the number of control points in the U direction and M the number in the V direction. So you can have a linear interpolation in one direction and a cubic interpolation in the other (second image), or a quadratic interpolation in one direction and a cubic interpolation in the other (third image).

Segmentation (level of detail):

One advantage of working with parametric surfaces (Bezier or Nurbs) is the total control over the level of detail of a given surface. The tessellation can be driven by evaluating the error of the approximation to the parametric surface: by defining the error (the distance from a given parametric point to the corresponding tessellated point on the surface) you can know exactly the segmentation needed to satisfy that error. In the modeller you have an option to change the segmentation of the surface if you want a better or worse approximation.

Changes to the segmentation can be used in real time to satisfy a required frame rate and level of detail: if you have many surfaces in a scene, you can define more segments for the surfaces close to the camera and fewer for the surfaces away from the focal view, which helps speed up the rendering of the scene. The last option in the Nurbs editor controls the total number of segments used to tessellate the surface; you can model with a low segmentation (keeping a high frame rate while moving points and rotating the shape), and then see the final result by giving the surface more segments. If you want to try the modeller just press on the next image.

There will be a third part of this series of posts (Nurbs in Flash) about how the surfaces can be rendered, taking advantage of the parametrization for the vertex normal calculation and other features that speed up real-time rendering (only for parametric surfaces).

This post is a little long so I will publish it in two parts: the first one is about NURBS in 2D (curves) and the next one will be about NURBS in 3D (surfaces)…

Ever since I started working in Flash I have wanted to write a class that would allow me to work with NURBS curves and surfaces the way I used to in 3dsMax, but understanding how they work is somewhat difficult because there is some math that needs to be applied to get it right.

NURBS is the acronym for Non Uniform Rational B-Splines; Nurbs curves and surfaces are a precise mathematical representation of a free-form curve or surface. All the theory needed to understand them is well explained here, but I'll give a short introduction to the Nurbs concepts for a better understanding of the example below.

B-Splines:

B-Splines are a generalization of Bezier curves. These curves use a set of interpolation functions called basis functions, hence the "B" in "B-Splines"; this kind of interpolation has the advantage of allowing local control of the curve through the control points, unlike Bezier curves. This means that while the whole Bezier curve is affected by all the control points, each control point of a B-Spline curve only affects one portion of the curve. The construction of the basis functions can be read here.

Rational B-Splines:

Rational B-Splines are a generalization of B-Splines: the rational part makes it possible to change a control point's influence on the curve using a weight factor for each control point; changing the weight of a control point makes the curve move closer to or farther from that point. Using rational B-Splines, conic sections can be represented exactly.
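For reference, a rational B-Spline of degree p with control points Pi, weights wi and basis functions N(i,p) is evaluated as:

C(u) = Σi N(i,p)(u) · wi · Pi / Σi N(i,p)(u) · wi

so raising one weight wi increases the influence of its control point and pulls the curve towards it.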

Non Uniform Rational B-Spline:

The non-uniform part of the acronym is not another generalization; it refers to the way rational B-Splines are made to start and end at the desired end points. The basis functions needed to interpolate the spline require a set of knots that define how each basis function contributes to the global interpolation; the knots can be created with a uniform distribution, using the same distance between them, or with a non-uniform distribution that forces the interpolation to start at the first control point and finish at the last one. You can see how the knots affect the resulting curve here.

The example below has two curves with seven control points: the red one is a Nurbs curve and the blue one is a Bezier curve; the light blue lines represent the convex hull of both curves. You can move the control points to edit both curves and see the differences between them; you can also change the weights of the control points and the degree of the Nurbs curve to see how the degree changes the overall interpolation.

One of the most interesting things about Nurbs and Bezier curves is that for a given set of "n" control points (all with the same weight), the Nurbs curve of degree n−1 is equal to the Bezier curve created with the same control points. The lower the degree of the Nurbs curve, the closer the curve will be to the convex hull; if a degree of 1 is defined, the curve coincides exactly with the convex hull.

The AS3 Class:

The ActionScript class calculates Nurbs points on a curve and on a surface. It has three static functions: Nurbs.nurbs(), Nurbs.nurbsCurve() and Nurbs.surface(); the parameters needed by each function follow from the theory of Nurbs curves and surfaces. The whole class is not shown here, only the first function, which finds the point on a Nurbs curve for a given parameter.

The function Nurbs.nurbs() requires four parameters:

p… parameter (value between 0 and 1).

controlPoints… vector of 3D vectors that defines all the control points of the curve.

degree… degree of the curve interpolation.

order… the order alters the way the knot vector is created (uniform or non-uniform); usually it does not need to be used.

Some time ago I wanted to program a real-time ray tracer. The main problem I faced is that ray tracing is expensive to compute, because the amount of calculation grows as you place more objects in the scene. In order to get a good frame rate I needed to reduce the calculations per frame to a minimum.

So I started with something very simple: one cube, one sphere and one light. The algorithm throws one ray per pixel of the image from the camera into the scene, checking whether it collides with any object. In the first tries, despite the simplicity of the scene (five squares and one sphere), the performance was very bad. Since I couldn't reduce the number of objects any further, I had to make some optimizations, sacrificing code legibility to the point of placing all the functionality in one single function in the style of spaghetti code. I wish I had a goto instruction in AS3!

Finally, as a last optimization, every image is drawn into a very small BitmapData; this data is then scaled up using the smoothing option of the Bitmap class, obtaining a final image of a more useful size. In this case the distortion is very noticeable, so I "killed" the sphere.

Bezier curves are widely used in Flash for many effects, from tweening a property to drawing complex curves and surfaces, but we never found a class that would allow us to work with them the way we wanted.

Our main goal in writing a class for Bezier manipulation was to have a tool that calculates Bezier interpolations for curves and surfaces of any degree, so the first thing to do was to study the mathematical model of Bezier curves in this useful article. In the article the Bezier curves are expressed as a combination of the Bernstein polynomials, which use the factorial function and therefore require many calculations for high degrees.
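For reference, the Bernstein form of a Bezier curve of degree n with control points Pi is:

B(t) = Σ (i = 0..n) C(n,i) · (1 − t)^(n − i) · t^i · Pi, with C(n,i) = n! / (i! · (n − i)!)

which is where the factorials, and the coefficients we pre-calculate below, come from.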

This problem is solved by calculating the coefficients of the curve for the given control points and saving them in a buffer Array, which speeds up the calculation for further interpolations. As the article suggests, the interpolation of any curve is given in the [0 - 1] domain.

As you can see, all the comments are written in Spanish and many functions have so many parameters that it is somewhat difficult to understand what they do, so I'll try to explain every function and its implementation in this post and in upcoming posts. For now I'll comment on some of its important features.

The combinatoriaData Array is a set of coefficients of the Bernstein polynomials that are pre-calculated for degrees lower than 12; this helps speed up the initial calculation of an interpolation. If you work with a higher degree, the program calculates the coefficients for that degree and saves the values in the array.

The bezier function is the simplest function of the class; it interpolates a single 3D point from a set of controlPoints and a parameter value between 0 and 1. You can use this function to get the interpolation point by point.

The bezierPoints function does the same thing but returns all the points of the interpolation in one array; note that the "m" value is the number of points you want to interpolate. Since all the points are equidistant in the domain space you can't control the point-to-point distance in the parameter space, but there is one parameter that at least lets you control the initial distance between the borders and the next/previous parameter point, in case you want a margin when dealing with surfaces (I promise I'll make an example of this, because it's difficult to understand just by reading it).

The surface function does the same thing as Away3D's Bezier surface function, except that you can define the M×N control points as you want (not just a 4×4 surface). To use the function you have to pass the M dimension, the Vector with the M×N points and the segmentation in the U and V directions of the surface. The last parameter, "force", is the repetition of control points in the surface to simulate weights for given controlPoints; this makes the surface approximate the controlPoints more closely, behaving more like a Nurbs surface with the same controlPoints. This function is used to calculate the surface in this post.

The getPatch function also returns a surface, but this surface has the property of being controlled only by the curves that define its borders; this means that the internal values of the surface depend only on the borders and there are no internal controlPoints. This function is more complicated to use, but it is more versatile in some applications such as transitions and deformations.
Finally, we have written a simple implementation of this class using only the first function, "Bezier.bezier()". If you want to see it, just click on the image above and relax!

After we programmed the surface renderer for our last project, we decided to work a little more and implement a simple editor that gives us the chance to model simple objects based on a single surface. This editor lets us experiment further with materials and lighting, and it will also allow us to make simple morphing surface animations in the near future.

The above image is a saddle modeled with the editor; the initial surface is a 7×7 RationalBezier surface and all we did was move points around to get the shape. The editor has a very easy interface: you can rotate the view by pressing on the screen and moving the mouse around, and when you press a control point you can drag it in 3D to adapt the shape to the convex hull made by the points.

There are some commands that enable you to view your model in different ways; these are:

"w": sets the wireframe mode.

"t": sets the texture mode.

"l": shows or hides the yellow guide lines.

"c": shows or hides the control points and the lights.

"r": resets the mesh and the lights.

As you can see, you can model your object using the control points and guide lines, switch to the wireframe view to see the mesh without lighting, and switch to the texture mode to see how the lights affect the mesh. You can also move the lights at any moment to see how the shading changes with the light's position.

So far we have made two shapes with this editor: the first one is a simple attempt at a basic surface with the shape of a mouse, and the second is the saddle shown at the beginning of the post. The first mesh, the mouse surface, is shown in the next two images.

Since the surface is a rational Bezier surface, some artifacts appear at the borders of the texture; this is because the weight of the control points at the borders makes the surface tessellate with less spacing around them. If you want to give it a try just press on the editor's image (next image) and have fun!…

These days some of our projects require the use of 3D, and dealing with 3D objects in Flash can be very painful if you need to show many polygons and apply dynamic materials to those polygons. In order to explore some of the "new" features of Flash, we decided to skip the mainstream 3D libraries (Away and Papervision) and try to create a simple renderer for our projects and our needs.

The renderer is intended to "bring to life" any kind of parametric surface; this kind of surface gives us a domain space where the surface tessellation is defined. Working with the parametric space and the domain space gives us the opportunity to calculate texture maps in the domain space that can be applied to the parametric space. The domain space and the tessellation also let us simplify the calculations (every step of the domain space is formed by two triangles for tessellation purposes), so every texel in the texture corresponds to a step in the domain space, which can be interpreted as an interpolation of the information of the two triangles that form that step.

Working on the texture texel by texel (as the product of mixing the two triangles) allows us to calculate the lighting for every surface taking into account the distance of every triangle from the light; it also allows us to use multiple lights in the scene and recreate Phong materials with specular values defined through the material definition.

In some future posts we will try to clean up the code for this simple renderer if you would like to play with it.

I've been playing around with the webcam in AS3 and finally I came up with something I liked.

Basically what I'm doing here is generating a grid of shapes, keeping references between objects using a linked list. I am working with a small video size to optimize performance, and I keep two positions as properties of the Particle class: origin and staticPosition.

origin is a Point keeping the coordinates of the pixel on the Video object which corresponds to the particle.

staticPosition is a Point keeping the coordinates that correspond to the particle on the stage.

Here is a sample of the grid generation method; it has comments, and the complete source is available for download at the end of this post.

[as]
// storing the next particle at the current one
p.next = new Particle();
// define the coordinate on the source video that corresponds to this iteration
var origin : Point = new Point(_x / particleWidth, _y / particleHeight);
// send it to the origin property of the particle
p.origin = origin;
// define the current position at stage
var point : Point = new Point(_x, _y);
// place it to two properties at the particle
// at ‘staticPosition’ , this value will not change
p.staticPosition = new Point(point.x, point.y);
// and at ‘position’, this value will change to animate flight
p.position = point;
// place the particle at stage coords
p.x = point.x;
p.y = point.y;
// add the displayObject to the container
particles.addChild(p);
// refresh the 'p' variable to store the next particle in the next iteration of this loop
p = p.next;
// add a particle width to the ‘_x’ var for the next iteration
_x += particleWidth;

}

}
[/as]

We can color each particle based on the RGB value of the pixel at its origin coordinates; doing so we get a nice halftone effect for the video capture.

On the other hand, the MotionTracker class has a very simple motion detection implementation: it draws each frame of the Video object over the last frame using BlendMode.DIFFERENCE and then thresholds the result, filtering the pixels with a color higher than 0x111111.

Here is the code of the detection:

[as]
private function tictac(event:TimerEvent) : void{

// locking bitmapdatas for better performance
current.lock();
output.lock();
input.lock();
// a current copy from the source
current.draw(source);
// making a temporary copy of currentFrame to work with both input and current
var temp : BitmapData = current.clone();
// here we are drawing the previous frame on top of the current and the difference blendMode will make the rest
temp.draw(input, null, null, BlendMode.DIFFERENCE);
// filter the pixels that are greater than dark grey #111111 and threshold them into blue #0000FF, overwriting the previous bmp
temp.threshold(temp, temp.rect, temp.rect.topLeft, ">", 0xFF111111, 0xFF0000FF, 0x00FFFFFF, true);
// resets output’s color information
output.fillRect(output.rect, 0xFF000000);
// stores the current frame which will be the previous one on the next tictac
input = current.clone();
// thresholding the temp bitmapdata into the output bitmapdata with ‘equal’ colors
output.threshold(temp, output.rect, output.rect.topLeft, "==", 0xFF0000FF, 0xFF0000FF, 0x00FFFFFF, false);
// unlocking bitmapdatas
current.unlock();
output.unlock();
input.unlock();
// cleaning memory
temp.dispose();

}
[/as]

I think it would be interesting if the particles moved randomly at the points where movement was detected by the motionTracker, so here we go.

The Particle class has two methods to animate its scale and position, implementing a wonderful bouncing easing equation seen on Erik Natzke's blog, plus one method to set the internal position Point to a random value.