Tell me about separable gaussian convolutions...

I've experimented with gaussian blurs in GLSL -- they work great actually, but the performance is fairly poor on my 5200. I get about 6 or 7 fps running a 5x5 kernel on an 800x600 rect texture. I can't say I'm surprised, but hey, that's what experimenting is for.

Now, I've already done some experimenting with running the blur on a downsampled image. Say, I downsample my 800x600 screen texture to 256x256, run the blur on that, then draw it scaled back up to 800x600. This works -- and runs pretty quickly -- but I get some nasty sampling artifacts.
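For what it's worth, those artifacts make sense if the downsample is effectively point-sampling: whole texels just get dropped. Here's a toy numpy sketch ( a little 8x8 "texture" standing in for the real 800x600 one ) comparing point-sampling against a 2x2 box average:

```python
import numpy as np

# Source "texture": 8x8 with a one-pixel-wide bright vertical line.
src = np.zeros((8, 8), dtype=np.float32)
src[:, 3] = 1.0

# Naive downsample: point-sample every other texel (what a stretched
# quad with nearest-style fetches effectively does).
nearest = src[::2, ::2]

# Box-filtered downsample: average each 2x2 block instead.
box = src.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# The point-sampled version misses the line entirely -- thin features
# flicker in and out depending on phase, hence the nasty artifacts.
print(nearest[:, 1])  # all zeros: the line vanished
print(box[:, 1])      # 0.5 everywhere: the line's energy survives
```

The box-averaged version keeps the line's energy ( at reduced amplitude ) instead of losing it outright, which is why a filtered downsample looks so much cleaner when scaled back up.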

That said, when I run GooBall, I see a gorgeous fullscreen gaussian blur that looks as if it were something large, like 21x21 or so, running at a fast rate in 1024x768. Significantly, I don't see what appear to be the kinds of artifacts I'd expect if this were being performed the way I describe above ( with the downsampling->blur->upsample approach ).

So I googled a bit and saw it described that gaussian blurs are "separable" -- they can be performed in two blurring passes, one horizontal, one vertical.

What I haven't seen is any good description of why this would result in acceptable output -- it seems to me it would look horrible.

Can somebody explain to me why it should work? I'm willing to try.

TomorrowPlusX Wrote:That said, when I run GooBall, I see a gorgeous fullscreen gaussian blur that looks as if it were something large, like 21x21 or so, running at a fast rate in 1024x768. Significantly, I don't see what appear to be the kinds of artifacts I'd expect if this were being performed the way I describe above ( with the downsampling->blur->upsample approach ).

There was a thread about light bloom with Gooball screenshots being thrown around over at the Unity forums a while back:

arekkusu Wrote:This paper is also worth a read -- recall that the hardware is capable of sampling more than 4 texels during a texture fetch, so you might as well take advantage of that.

Offsetting by half a pixel while downsampling seems like a clever way to perform high-quality downsampling. I'm going to look into that.
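If I understand the trick correctly, sampling at the corner shared by a 2x2 block of texels makes the hardware's bilinear weights all come out to 0.25, so a single fetch returns a 2x2 box average for free. A little numpy sketch simulating a GL-style bilinear fetch ( the helper function is mine, not from the paper ):

```python
import numpy as np

def bilinear(tex, x, y):
    """Bilinear fetch at continuous texel coords (x, y), with texel
    centers at integer + 0.5, as in GL. Edges clamp."""
    x -= 0.5  # shift so texel centers land on integers
    y -= 0.5
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    h, w = tex.shape
    def t(i, j):  # clamped texel read: i = column, j = row
        return tex[np.clip(j, 0, h - 1), np.clip(i, 0, w - 1)]
    return ((1 - fx) * (1 - fy) * t(x0, y0)
            + fx * (1 - fy) * t(x0 + 1, y0)
            + (1 - fx) * fy * t(x0, y0 + 1)
            + fx * fy * t(x0 + 1, y0 + 1))

rng = np.random.default_rng(0)
tex = rng.random((4, 4)).astype(np.float64)

# One fetch at (1.0, 1.0) -- the corner between texels (0,0), (1,0),
# (0,1), (1,1) -- weights each of the four texels by exactly 0.25.
sample = bilinear(tex, 1.0, 1.0)
print(sample, tex[0:2, 0:2].mean())  # identical: a free 2x2 average
```

So the half-pixel offset turns every downsampling fetch into a small box filter, which is why it beats point-sampling for quality at no extra cost.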

Also, I implemented a simple full 800x600 separated gaussian convolution this morning. I compared my brute-force 5x5 kernel ( 25 texture lookups per fragment ) to a 13x13 separable ( 26 lookups across the two passes, versus 169 for a brute-force 13x13 ), and even though the separable requires two passes, the performance was almost identical ( slow! ) -- but the latter looked for all the world like a real 13x13. Excellent.
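As far as I can tell, the reason the two-pass result looks like a real 13x13 is that it *is* one: a 2D gaussian kernel is the outer product of two 1D gaussians, so convolving rows then columns gives exactly the same image as the full 2D convolution, just in 13 + 13 taps instead of 169. A little numpy sketch checking this ( the 13-tap kernel and sigma are arbitrary choices ):

```python
import numpy as np

def gauss1d(radius, sigma):
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

k1 = gauss1d(6, 2.0)   # 13-tap 1D kernel
k2 = np.outer(k1, k1)  # the full 13x13 2D kernel it implies

rng = np.random.default_rng(1)
img = rng.random((32, 32))

def conv2d(img, ker):
    # Brute force: full 2D convolution, 169 products per pixel.
    r = ker.shape[0] // 2
    pad = np.pad(img, r, mode='edge')
    out = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += ker[dy + r, dx + r] * pad[r + dy:r + dy + img.shape[0],
                                             r + dx:r + dx + img.shape[1]]
    return out

def conv1d_rows(img, ker):
    # One separable pass: horizontal blur, 13 taps per pixel.
    r = len(ker) // 2
    pad = np.pad(img, ((0, 0), (r, r)), mode='edge')
    out = np.zeros_like(img)
    for dx in range(-r, r + 1):
        out += ker[dx + r] * pad[:, r + dx:r + dx + img.shape[1]]
    return out

# Horizontal pass, then vertical pass (via transpose).
two_pass = conv1d_rows(conv1d_rows(img, k1).T, k1).T
full = conv2d(img, k2)
print(np.abs(two_pass - full).max())  # agrees to floating-point precision
```

So there's no quality trade-off at all -- the separation is mathematically exact, and the savings scale with kernel size ( 2N lookups instead of N squared ).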

I'm going to look into the high quality downsampling approach described in the PDF. Thanks, arekkusu.

Well, I did a lot of work over the weekend and integrated it into my game, with classloading so filters can be specified in the .world files my game uses. It's working great -- except it's too slow for my machine.

That said, it'll be easy enough to turn them on or off from preferences.