I recently found Galaxial while searching the net for rendering techniques to use in large space scenes. I've read that you were having performance problems with rendering transparent objects; hopefully I can help a bit. I've tried several algorithms, but I found most too complex (A-buffers, depth peeling, bleh), they required MSAA, or they were slow. The only one I really liked is weighted blended average. It works surprisingly well AND is really easy to implement. It's not as precise as alpha blending, but for Galaxial's art style it could work very well, and the weight function can be fine-tuned easily.

The variant I use is kind of a double pass. You draw opaque objects first as usual, but you also draw the alpha cutouts where a > 0.9 (or so). Then you can draw the alpha cutouts where a < 0.9 and the transparent objects in the second pass.
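To make the pass split concrete, here's a per-fragment Python sketch (the 0.9 threshold matches the "or so" value above; the function names are just for illustration — on the GPU this would be an alpha test/discard in the shader):

```python
# Per-fragment sketch of the double pass: pass 1 alpha-tests cutout pixels
# against the threshold, pass 2 blends whatever pass 1 discarded.

CUTOUT_THRESHOLD = 0.9  # the "0.9 or so" value from the post

def pass1_keeps(alpha):
    """Opaque pass: depth write on, alpha test discards soft pixels."""
    return alpha > CUTOUT_THRESHOLD

def pass2_keeps(alpha):
    """Blended pass: depth write off, draws only the soft/transparent pixels."""
    return 0.0 < alpha <= CUTOUT_THRESHOLD
```

Every fragment with non-zero alpha ends up drawn by exactly one of the two passes.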

I use a point list to draw objects (basically just a position and a few properties) and unpack them to quads with a geometry shader. I can render ~100k sprites with ~12% CPU usage on an i5-3570K and an AMD 7870. Most of the time is spent copying to the vertex buffer (and actually rendering) and sorting by textures (which I'm planning to get rid of soon).
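A CPU-side Python sketch of what the geometry shader does with each point (the vertex layout with UVs is an assumption):

```python
# Expand one point sprite (center + half-size) into a 4-vertex quad,
# like a geometry shader emitting a triangle strip would.

def expand_point_to_quad(x, y, half_size):
    """Return the 4 corner vertices (x, y, u, v) of a screen-aligned quad."""
    return [
        (x - half_size, y - half_size, 0.0, 0.0),  # bottom-left
        (x + half_size, y - half_size, 1.0, 0.0),  # bottom-right
        (x - half_size, y + half_size, 0.0, 1.0),  # top-left
        (x + half_size, y + half_size, 1.0, 1.0),  # top-right
    ]
```

The win is bandwidth: only one vertex per sprite crosses the bus instead of four.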

Btw, what kind of spatial partitioning do you use? The most efficient I found is a hierarchical grid (more like a quad-grid); that's for frustum culling (I'm not planning collisions, only forces).
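A tiny Python sketch of such a two-level quad-grid for view queries (the cell sizes and the rectangle query are made up for the demo; the point is that an empty coarse cell skips all of its fine cells at once):

```python
# Two-level "quad-grid": coarse cells each hold a dict of fine cells.
# Empty coarse cells are skipped wholesale during a query.

class QuadGrid:
    def __init__(self, coarse_size, fine_div):
        self.coarse_size = coarse_size            # world units per coarse cell
        self.fine_size = coarse_size / fine_div   # world units per fine cell
        self.coarse = {}                          # (cx, cy) -> {(fx, fy): [objs]}

    def insert(self, obj, x, y):
        ckey = (int(x // self.coarse_size), int(y // self.coarse_size))
        fkey = (int(x // self.fine_size), int(y // self.fine_size))
        self.coarse.setdefault(ckey, {}).setdefault(fkey, []).append(obj)

    def query(self, xmin, ymin, xmax, ymax):
        """All objects in fine cells overlapping the view rectangle."""
        found = []
        for cx in range(int(xmin // self.coarse_size), int(xmax // self.coarse_size) + 1):
            for cy in range(int(ymin // self.coarse_size), int(ymax // self.coarse_size) + 1):
                fine = self.coarse.get((cx, cy))
                if fine is None:
                    continue                      # whole coarse cell empty
                for (fx, fy), objs in fine.items():
                    x0, y0 = fx * self.fine_size, fy * self.fine_size
                    if (x0 <= xmax and x0 + self.fine_size >= xmin and
                            y0 <= ymax and y0 + self.fine_size >= ymin):
                        found.extend(objs)
        return found
```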

marton wrote: The variant I use is kind of a double pass. You draw opaque objects first as usual, but you also draw the alpha cutouts where a > 0.9 (or so). Then you can draw the alpha cutouts where a < 0.9 and the transparent objects in the second pass.

Draw the object first with depth writing and alpha testing enabled, discarding pixels with alpha < 1.0. Then in a second pass, draw with alpha blending on and depth writing off but depth testing on. This should give smooth edges where no pixel was drawn before.

The problem is that it doesn't work with transparency, only with opaque cutout things like ships.

Ideally I would like to use A-buffers, as there isn't significant overdraw at most pixels in Galaxial. But that would mean requiring OpenGL 4.0+ to run, and currently it only needs 2.0...

marton wrote: I use a point list to draw objects (basically just a position and a few properties) and unpack them to quads with a geometry shader. I can render ~100k sprites with ~12% CPU usage on an i5-3570K and an AMD 7870. Most of the time is spent copying to the vertex buffer (and actually rendering) and sorting by textures (which I'm planning to get rid of soon).

Nice, are you able to render them in perfect batches (one draw call per texture/material)?

marton wrote: Btw, what kind of spatial partitioning do you use? The most efficient I found is a hierarchical grid (more like a quad-grid); that's for frustum culling.

Just simple frustum culling (using the radar approach, not 6 planes) to check whether to render something. That's all that's really necessary in Galaxial.
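For reference, a Python sketch of the radar-style point test (camera basis vectors and parameter names are assumptions; a sphere test would additionally pad the extents by the radius). Instead of extracting 6 planes, the point is projected onto the camera's axes and its x/y are compared against the frustum's half-extents at that depth:

```python
import math

def radar_point_visible(p, cam_pos, forward, up, right,
                        near, far, fov_y, aspect):
    """Radar-style frustum test for a point; all vectors are 3-tuples."""
    v = (p[0] - cam_pos[0], p[1] - cam_pos[1], p[2] - cam_pos[2])
    # depth along the view direction
    z = sum(a * b for a, b in zip(v, forward))
    if z < near or z > far:
        return False
    tang = math.tan(fov_y / 2.0)
    # vertical half-extent of the frustum grows linearly with depth
    y = sum(a * b for a, b in zip(v, up))
    if abs(y) > z * tang:
        return False
    # horizontal half-extent is the vertical one times the aspect ratio
    x = sum(a * b for a, b in zip(v, right))
    if abs(x) > z * tang * aspect:
        return False
    return True
```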

Still unsure if I should be making drastic changes to the rendering system. The game is fast already but the engine programmer side of me knows it could be so much better...

StuartMorgan wrote: Hi marton, thanks for the links. I'll give them a look.

marton wrote: The variant I use is kind of a double pass. You draw opaque objects first as usual, but you also draw the alpha cutouts where a > 0.9 (or so). Then you can draw the alpha cutouts where a < 0.9 and the transparent objects in the second pass.

Draw the object first with depth writing and alpha testing enabled, discarding pixels with alpha < 1.0. Then in a second pass, draw with alpha blending on and depth writing off but depth testing on. This should give smooth edges where no pixel was drawn before.

The problem is that it doesn't work with transparency, only with opaque cutout things like ships.

Ideally I would like to use A-buffers, as there isn't significant overdraw at most pixels in Galaxial. But that would mean requiring OpenGL 4.0+ to run, and currently it only needs 2.0...

The technique is only similar in how it handles alpha cutouts, since weighted blending doesn't support high opacities: two objects where a = 1 would appear 50% transparent. It also doesn't need multisampling or alpha to coverage.
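That failure mode is easy to check numerically. Below is a scalar Python sketch of the weighted-average composite with unit weights (the helper name is made up):

```python
# C = (sum(w*a*c) / sum(w*a)) * (1 - prod(1-a)) + C_bg * prod(1-a), with w = 1.
# Colors are scalars for brevity.

def weighted_average(surfaces, bg):
    """surfaces: list of (color, alpha) in draw order; bg: background color."""
    num = sum(a * c for c, a in surfaces)
    den = sum(a for c, a in surfaces)
    transmittance = 1.0
    for _, a in surfaces:
        transmittance *= (1.0 - a)
    avg = num / den if den > 0 else 0.0
    return avg * (1.0 - transmittance) + bg * transmittance

# Two fully opaque surfaces (white over black, then black over white):
white_over_black = weighted_average([(1.0, 1.0), (0.0, 1.0)], bg=0.5)
black_over_white = weighted_average([(0.0, 1.0), (1.0, 1.0)], bg=0.5)
```

Both orders produce the same 50/50 mix: the nearer opaque surface fails to hide the farther one, exactly the artifact described above.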

StuartMorgan wrote:

marton wrote: I use a point list to draw objects (basically just a position and a few properties) and unpack them to quads with a geometry shader. I can render ~100k sprites with ~12% CPU usage on an i5-3570K and an AMD 7870. Most of the time is spent copying to the vertex buffer (and actually rendering) and sorting by textures (which I'm planning to get rid of soon).

Nice, are you able to render them in perfect batches (one draw call per texture/material)?

Yes. I'd love to get rid of having to sort by textures (by caching the order per spatial partition, where a spatial partition = a star system and its surrounding space). Also, if I could just memcpy the draw info to the vertex buffer, CPU usage would be around ~3% for the same amount of sprites. Unfortunately, using structs in C# with an ECS would complicate things a lot, so copying it is.
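For illustration, here's roughly what the per-sprite copy looks like, sketched in Python with the `struct` module (the 12-byte vertex layout of position plus a packed RGBA color is an assumption):

```python
import struct

# Pack per-sprite draw info into an interleaved vertex buffer at consecutive
# offsets -- the field-by-field copy that a single memcpy of a struct array
# would replace.

VERTEX = struct.Struct("<ffI")   # 12 bytes per sprite: x, y, packed RGBA

def fill_vertex_buffer(sprites, buf):
    """Write each sprite's (x, y, rgba) into buf, one vertex per sprite."""
    for i, (x, y, rgba) in enumerate(sprites):
        VERTEX.pack_into(buf, i * VERTEX.size, x, y, rgba)

sprites = [(1.0, 2.0, 0xFF00FF00), (3.0, 4.0, 0xFFFF0000)]
buf = bytearray(VERTEX.size * len(sprites))
fill_vertex_buffer(sprites, buf)
```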

StuartMorgan wrote: Still unsure if I should be making drastic changes to the rendering system. The game is fast already but the engine programmer side of me knows it could be so much better...

Basically, as described in the links, the process looks like this:
- Draw opaques and alpha-test the cutouts (a > 0.9), with depth buffer write (this can go into the backbuffer I guess).
- Draw the alpha cutouts (a < 0.9) and the transparent objects into two render targets (color, opacity) with a weighted blended shader, with depth buffer read.
- Merge the two render targets to the backbuffer with a resolve shader.
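A scalar Python sketch of the transparent pass and the resolve step (the weight function below is a made-up placeholder; real implementations use more elaborate depth-based weights, and colors are scalars for brevity):

```python
# Accumulate fragments into the two render targets, then resolve over the
# backbuffer. RT0 holds weighted color/weight sums (additive blend); RT1
# holds the revealage, the product of (1 - alpha).

def weight(depth, alpha):
    # Placeholder: monotonically decreasing in depth so nearer fragments
    # count more. The weight function is the main tuning knob.
    return alpha * max(0.01, 1.0 / (1.0 + depth))

def accumulate(fragments):
    """fragments: list of (color, alpha, depth), any order."""
    accum_c = accum_w = 0.0
    revealage = 1.0
    for c, a, d in fragments:
        w = weight(d, a)
        accum_c += c * a * w         # RT0.color, blend ONE/ONE
        accum_w += a * w             # RT0.weight
        revealage *= (1.0 - a)       # RT1, multiplicative blend
    return accum_c, accum_w, revealage

def resolve(accum_c, accum_w, revealage, bg):
    """Composite the two targets over the backbuffer color bg."""
    avg = accum_c / accum_w if accum_w > 0 else 0.0
    return avg * (1.0 - revealage) + bg * revealage
```

Note the accumulation is order-independent, which is the whole point: no sorting of transparent fragments is needed.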

You can tweak the weight function if things don't look right. A-buffers are extremely complex compared to this, and from my tests I couldn't notice any significant blending errors. (Edit: because there WILL be blending errors. Whether it's good enough depends on the graphics style. I think it would fit well with Galaxial's. In the worst case, the blending errors would be 'part' of the art style, haha.)

The blending and resolving part needs special blending modes as mentioned in the articles.

Well, as it turns out, the weighted average algorithm is really sensitive to its variables, and it pretty much fell apart when I started using it to render anything more than the 3 test planes, haha. It works nicely for... particles. Alpha cutouts? Not so much, especially when rendering particles on top of them. It took too much time to fine-tune each effect to make it look right, and then the colors changed as the depth changed. Oh well.

For now I'm rendering everything alpha blended back-to-front. I didn't see much improvement from early z rejection, so I'm not drawing opaques separately (I guess it doesn't matter on modern hardware even with a moderate amount of overdraw, when the pixel shader is only a texture fetch). It works okay (10k sprites with particles in between, ~1400 batches, 10% CPU, of which ~35% is sorting... okay, it's horrible).
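The back-to-front pass boils down to the painter's algorithm (sketched here with scalar colors and a hypothetical helper):

```python
# Sort by depth, farthest first, then standard "over" alpha blending.
# This is where the ~35% sorting cost mentioned above comes from.

def render_back_to_front(sprites, bg):
    """sprites: list of (depth, color, alpha); larger depth = farther away."""
    dst = bg
    for _, c, a in sorted(sprites, key=lambda s: -s[0]):
        dst = c * a + dst * (1.0 - a)    # classic alpha blend
    return dst
```

Unlike the weighted blend, this is exact but requires the per-frame sort, and the result depends on getting that order right.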

Another thing I have in mind is getting rid of alpha cutouts and drawing those as opaques, then using a post-effect to blur the edges (kind of like a fake SSAO). Could be enough.