This time I'm wondering how I can achieve a blooming result with OpenGL... So far here's what I did:

1. Render the whole scene once to a texture (using a FBO) which I'll call basetexture

2. Display that rendered texture using a bright-pass shader and render that into a new texture (let's call it brightpasstexture)

3. Display the brightpass texture to the screen and filter it using a gaussian blur shader, and render it to finalpasstexture

4. Render the whole scene using multitexturing, with basetexture into texunit 0 and finalpasstexture into texunit 1, with GL_ADD

Here are the questions:

- Is there a better way to render each step than rendering to texture, then displaying using the next shader and rendering to texture again?
- I know I have to generate several textures from the brightpass one (or whatever texture is going to be used as "bright spots" one) to create the blooming effect. I know that using several gaussian blur shaders with bigger and bigger kernels is too slow, so I read I have to use the same kernel (usually 3x3) BUT each time with a downsampled texture (i.e. first time 256x256 then 128x128 then 64x64...). The question is: how do I generate these downsampled textures? Is there a fast way, or should I resize it at each pass? (I bet this is SLOW)
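One way to read the 256, 128, 64 sequence is as a chain of render targets, each half the size of the previous one, into which a quad is drawn with the same small kernel. The sketch below only computes the target sizes; `downsampleChain` is a hypothetical helper name, and the halving rule (round down, never below 1) is assumed to match the usual mipmap convention.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sizes of successive half-resolution render targets, starting at (width, height).
// Each level halves both dimensions, rounding down and never going below 1.
std::vector<std::pair<int, int>> downsampleChain(int width, int height, int levels) {
    assert(width > 0 && height > 0 && levels >= 0);
    std::vector<std::pair<int, int>> sizes;
    for (int i = 0; i < levels; ++i) {
        sizes.emplace_back(width, height);
        width = std::max(1, width / 2);
        height = std::max(1, height / 2);
    }
    return sizes;
}
```

Calling `downsampleChain(256, 256, 3)` yields the three sizes mentioned above; one FBO-backed texture would then be allocated per entry.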

Here's the code I'm currently using (note I only use one blur pass at this stage):

Search the FBO extension spec for 'GenerateMipmapEXT' to generate the downsampled mipmaps.

hardtop

06-26-2006, 03:22 PM

Thanks Zbuffer. As for the way I render each step, is what I do correct? Should I really render each stage to a separate FBO and then blend each into a multitextured final stage?

HT

k_szczech

06-26-2006, 03:41 PM

Hi!
My first HDR + bloom implementation was based on mipmaps, but I eventually quit that when I got to the point that GenerateMipmapEXT threw an exception on ATI (all parameters and texture valid).
Here is what I do now:

1. Render scene to texture

2. Render this texture into one 4 times smaller: use a pixel shader that computes the average of 4 samples, and place each sample at the center between 4 pixels. This way I downsize 1280x1024 to 320x256 in a single pass.

3. Render this quarter-size texture into the next texture, performing a horizontal blur, again using a combination of pixel shader and native texture filtering to blur more samples as fast as possible.

4. Render to yet another texture, this time performing a vertical blur.

5. Render to screen using that final texture and base texture.

k_szczech

06-26-2006, 03:47 PM

By the way:
glBlendFunc(GL_ONE, GL_ZERO);
Why not just:
glDisable(GL_BLEND);

If you do not want textures to blend then disable blending. Perhaps the driver will optimize it, but if it doesn't, you will get better performance by disabling blending when it isn't used.

zed

06-27-2006, 01:04 AM

i haven't done the mipmapping path, does it give acceptable results, as good as doing the blur yourself, ie ping-ponging horizontal + vertical blurs?

ZbuffeR

06-27-2006, 02:15 AM

the mipmap is not for the blur itself, only to downsample original image so blur is faster.

k_szczech

06-27-2006, 02:45 AM

"the mipmap is not for the blur itself"
I've actually used mipmap for simple blur ;)
Final color:
0.95*level0 + 0.05*level1 + 0.02*level2 ...

You can imagine that it didn't look too good.

"only to downsample original image so blur is faster"
To access the downsampled image you have to specify a texture LOD bias in the texture2D function in the shader. This works fine on GeForce 7800, but on my GeForce 6600GT using:
texture2D( tex, coord, 0.0);
is noticeably slower than:
texture2D( tex, coord );

So I actually got a penalty for using mipmaps. Downsampling image yourself and then blurring it actually works faster if you do it right (4x4 -> 1x1 downsample in single pass).

Yet another reason for me to stop using mipmaps for fake blurring.

zed

06-27-2006, 11:22 PM

Originally posted by ZbuffeR:
the mipmap is not for the blur itself, only to downsample original image so blur is faster.

ta, in that case it's worthless, as IMO bloom should not use the onscreen framebuffer, since it produces worse results than a selected bloom

hardtop

06-29-2006, 04:01 AM

Just to know... here's what I do to render my post-processing effect: (I won't use mips, but downsample shader instead)

Do you think it's correct? how many "downsample + blur" ping-pong passes do you think could be needed in order to get a nice blooming effect? Is one enough, or should I render several ones? (I'm concerned about these FBO and shader changes actually)

Moreover, should I add some overhead by splitting the 4x4 blur into separate vertical and horizontal blurs? I guess it's cheaper in terms of texture sample accesses (3+3 instead of 3*3), but I'm also concerned about the overhead caused by the FBO ping-pong and the shader switch... What's best/"least worse"? ^^

m_fboBasePass = new HT_FBO(width, height);
m_fboBrightPass = new HT_FBO(width, height);
m_fboBlur0Pass = new HT_FBO(width/HT_Constants::BLUR_STAGE_0, height/HT_Constants::BLUR_STAGE_0);
m_fboBlur1Pass = new HT_FBO(width/HT_Constants::BLUR_STAGE_1, height/HT_Constants::BLUR_STAGE_1);

So I have downsampled FBOs for the blur passes, each with linear-filtered textures. I guess that's the reason why my edges are jaggy: not because of the downscale, but because the blur is probably not strong enough. I'm using this gaussian blur shader:

The banding in the sky is intended?
I suspect 16 bit textures or 16 bit framebuffers.

You could eliminate the 1.0 weight multiplications of the kernel by leaving them unscaled, unrolling the loop and multiplying by 1/16 on the final result. That could also increase the precision.
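A CPU-side sketch of that suggestion follows. The shader under discussion is not shown here, so the kernel is an assumption: the common 3x3 weights 1 2 1 / 2 4 2 / 1 2 1, which sum to 16. `blur3x3` is a hypothetical name, and the image is a single-channel, row-major array.

```cpp
#include <cassert>
#include <vector>

// Blur one interior pixel with the (assumed) 3x3 kernel
//   1 2 1
//   2 4 2   (sum = 16)
//   1 2 1
// Taps with weight 1 are added without a multiply, and the 1/16
// normalisation is applied once, on the final sum.
float blur3x3(const std::vector<float>& img, int width, int x, int y) {
    assert(x >= 1 && y >= 1 && x + 1 < width);
    assert((y + 2) * width <= static_cast<int>(img.size()));
    auto at = [&](int dx, int dy) { return img[(y + dy) * width + (x + dx)]; };
    float corners = at(-1, -1) + at(1, -1) + at(-1, 1) + at(1, 1);
    float edges = at(0, -1) + at(-1, 0) + at(1, 0) + at(0, 1);
    float centre = at(0, 0);
    return (corners + 2.0f * edges + 4.0f * centre) * (1.0f / 16.0f);
}
```

Grouping the taps this way leaves three multiplies per pixel instead of nine, and the intermediate sum keeps its full range until the single final scale.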

hardtop

06-29-2006, 06:47 AM

yeah sorry, 800x600 is for debug purposes.

about the banding in the sky, I don't know, I think it's the texture depth.

as for the final point, why not, but I don't think that would prevent me from having that blocky effect I see on my bloom

any other ideas?

zed

06-29-2006, 11:16 PM

A/ you're mixing fixed function stuff eg glEnable( GL_TEXTURE_2D ) with glsl shaders
B/ a better blur is to first blur vertical + then use that resulting texture as the input to a horizontal blur, repeat this a few times

hardtop

06-30-2006, 06:38 AM

A/ I disable shaders first before enabling states/rendering:

HT_Graphics::SetActiveShaderProgram(HT_Constants::SH_NONE);

B/ I noticed it's not only nicer, it's also kind of faster: with a 7x7 kernel size gaussian blur performed vertically then horizontally, I only sample 7+7 = 14 times, and I get a nicer result than when I sample 3*3 = 9 in one pass, with no noticeable performance loss.
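The reason two 7-tap passes can stand in for a full 7x7 kernel is that a Gaussian kernel is separable. Here is a small CPU sketch of that, not the shader used in this thread: the binomial weights are assumed values approximating a Gaussian, and `Grid`, `blurAxis` and `blurSeparable` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kBlurRadius = 3;  // 7-tap kernel
// Binomial approximation of a Gaussian (assumed values); the weights sum to 1.
constexpr std::array<float, 7> kBlurWeights = {
    1 / 64.f, 6 / 64.f, 15 / 64.f, 20 / 64.f, 15 / 64.f, 6 / 64.f, 1 / 64.f};

using Grid = std::vector<float>;  // row-major, size * size texels

// One 1D pass along x (dx=1, dy=0) or y (dx=0, dy=1).
// Samples that fall outside the image count as 0.
Grid blurAxis(const Grid& src, int size, int dx, int dy) {
    assert(static_cast<int>(src.size()) == size * size);
    Grid dst(src.size(), 0.0f);
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            float sum = 0.0f;
            for (int k = -kBlurRadius; k <= kBlurRadius; ++k) {
                int sx = x + k * dx, sy = y + k * dy;
                if (sx >= 0 && sx < size && sy >= 0 && sy < size)
                    sum += kBlurWeights[k + kBlurRadius] * src[sy * size + sx];
            }
            dst[y * size + x] = sum;
        }
    }
    return dst;
}

// Vertical pass, then horizontal pass on its result: 7 + 7 = 14 taps per pixel.
Grid blurSeparable(const Grid& src, int size) {
    return blurAxis(blurAxis(src, size, 0, 1), size, 1, 0);
}
```

Blurring a single bright texel this way produces, at offset (i, j), the product of the two 1D weights, which is exactly the corresponding entry of the 49-tap 2D kernel.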

Check out www.hardtopnet.net (http://www.hardtopnet.net) to see the results!

HT

k_szczech

06-30-2006, 07:08 AM

The more times you downsample your base texture, the more 'blocky' the effect you'll get.
So if you use a 3x3 kernel on a 4x downsampled image, you can expect such an effect.
I'm also using 4x downsampled image, but 15x15 kernel. Here is the result:

http://ks-media.ehost.pl/images/boats/screen/1027.jpg

Four hints:

#1
I see you are performing the brightpass first - why do it on the full-size image?
Combine brightpass + downsample into one pass - you will save memory bandwidth.

#2
Split kernel into vertical and horizontal (I see you've already done that while I wrote this post).

#3
My kernel is 15x15, but I'm using only 18 texture lookups for that - 2 times 9x1 kernel. How do I implement a 15x1 kernel in just 9 texture lookups?
I combine texture lookups into pairs (except for the middle sample). If you use linear filtering then:
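The general idea of pairing taps under linear filtering can be sketched on the CPU as below. This is not k_szczech's shader, only an illustration of the principle: two neighbouring taps with weights w0 and w1 are replaced by one linear-filtered fetch placed between them, with the combined weight. `fetchLinear`, `Tap` and `mergeTaps` are hypothetical names, and the "texture" is a 1D array addressed in texel units.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Linear-filtered fetch from a 1D "texture". pos is in texel units, with
// integer positions landing exactly on texel centres.
float fetchLinear(const std::vector<float>& tex, float pos) {
    int i0 = static_cast<int>(std::floor(pos));
    assert(i0 >= 0 && i0 + 1 < static_cast<int>(tex.size()));
    float f = pos - static_cast<float>(i0);
    return tex[i0] * (1.0f - f) + tex[i0 + 1] * f;
}

struct Tap {
    float pos;     // where to fetch, in texel units
    float weight;  // what to multiply the fetched value by
};

// Merge the taps at pos and pos + 1 (weights w0 and w1) into a single fetch.
// The fetch position slides towards the heavier tap so that the hardware
// interpolation reproduces the ratio w0 : w1.
Tap mergeTaps(float pos, float w0, float w1) {
    return Tap{pos + w1 / (w0 + w1), w0 + w1};
}
```

For taps at texels 5 and 6 with weights 0.3 and 0.1, `mergeTaps` gives one fetch at 5.25 with weight 0.4, which evaluates to the same value as the two separate weighted fetches.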

#1: I'm doing it already :) I thought about that yesterday. The brightpass I'm now using is texel = texel³. This gives a nice contrast effect.

#2: also done that yesterday

#3: nice move, I'll give that a try, it'll save even more samples

#4: I've seen many people doing so, but I never thought of doing that.

I'll stick to a smaller kernel than 15x15 I think, with V/H split 7x7 kernel I got 14 samples lookups... Uh oh well, 18 for a 15x15 is maybe interesting all the same ;)

I've got a question for you: I'm using two blur passes consisting of 2x downsample and 4x downsample with 7x7 gaussian blur.

I've seen many implementations (such as Humus) using 4 passes (1x, 2x, 4x and 8x).
Do you think I should use only one blur pass (4x probably) with a bigger blur kernel, or stick to 2 passes with a lighter kernel? I mean performance-wise and quality-wise.

Thanks a lot for your interest

HT

k_szczech

06-30-2006, 09:51 AM

I'm confused - do you ask for downsampling or blurring?
I'm using single pass 4x downsample (linear) and then 2-pass blur (15x1) on that image - that's it.

I think you answered my question: the second choice is what you chose, and that's what I'm going to test.

Thanks, I'll post feedback :)

HT

hardtop

06-30-2006, 10:23 AM

I tried your solution #3 (18 lookups for 15x15 kernel) but to no avail, I must have missed something. Do you think you could post a more comprehensive example on how to do that? I think I messed up with the coordinates, which are expressed as float in the range 0.0 - 1.0

In the second approach you do 4 texture lookups in the fragment shader - each texture lookup samples the texture in the middle of 4 texels, thus performing a 2x2 blur, and the shader averages these 4 lookups, performing a 4x4 blur.
As for implementation - my shader is not aware of screen size - I just pass 4 texture coordinates at each corner of fullscreen quad from my application.
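To make the 4-lookup downsample concrete, here is a CPU emulation of it. It is a sketch rather than the shader described above: `Image`, `fetchBilinear` and `downsample4x` are hypothetical names, coordinates are in texel units instead of the 0.0 - 1.0 range, and texel (x, y) is assumed to have its centre at (x + 0.5, y + 0.5), as in OpenGL.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Image {
    int width, height;
    std::vector<float> texels;  // row-major, one channel
    float at(int x, int y) const { return texels[y * width + x]; }
};

// Bilinear fetch in texel units; texel (x, y) is centred at (x + 0.5, y + 0.5).
float fetchBilinear(const Image& img, float u, float v) {
    float fx = u - 0.5f, fy = v - 0.5f;
    int x0 = static_cast<int>(std::floor(fx));
    int y0 = static_cast<int>(std::floor(fy));
    assert(x0 >= 0 && y0 >= 0 && x0 + 1 < img.width && y0 + 1 < img.height);
    float tx = fx - x0, ty = fy - y0;
    float top = img.at(x0, y0) * (1 - tx) + img.at(x0 + 1, y0) * tx;
    float bottom = img.at(x0, y0 + 1) * (1 - tx) + img.at(x0 + 1, y0 + 1) * tx;
    return top * (1 - ty) + bottom * ty;
}

// One output texel of a 4x downsample: four fetches, each at the corner
// shared by a 2x2 group of source texels, averaged together.
float downsample4x(const Image& src, int outX, int outY) {
    float u = 4.0f * outX, v = 4.0f * outY;  // top-left of the 4x4 source block
    return 0.25f * (fetchBilinear(src, u + 1, v + 1) + fetchBilinear(src, u + 3, v + 1) +
                    fetchBilinear(src, u + 1, v + 3) + fetchBilinear(src, u + 3, v + 3));
}
```

Each fetch lands exactly between four texels, so the filtering hardware returns their mean; averaging the four fetches then gives the mean of the whole 4x4 block. In normalized coordinates the same offsets would be divided by the source width and height.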

Right now I don't have time to write anything more detailed - I'm sure you'll figure it out. I hope I'll find some time to post a few coding articles on my webpage soon.

macarter

07-01-2006, 01:33 PM

Blocky artifacts are a result of the limitations of the 2x2 box filter used for downsampling with automatic mipmap generation, and of the standard bilinear filter used to upsample the bloom texture when producing the final image. The fastest artifact-free method for bloom generation I have found uses ZbuffeR's suggested automatic mipmap generation as the first stage of bloom filtering, to reduce the number of texels passed through the additional stages of gaussian blurring. Each gaussian blurring stage also performs a two-to-one downsample. Since each gaussian blur pass reduces the number of texels by a factor of four, the time required to do extreme blurring is quite reasonable. The gaussian blurs also repair the blockiness introduced by automatic mipmap generation. The final blurred image is then upsampled and combined with the original image using a bicubic filter as documented in GPU Gems 2. The bicubic filter avoids the blocky aliasing artifacts associated with bilinear filtering.

knackered

07-02-2006, 07:05 AM

like the wake effect k_szczech, very nice.
how're you doing it?

k_szczech

07-02-2006, 10:52 AM

It's fake. :)
I haven't implemented it properly yet. One of the nearest releases should have this fixed.

knackered

07-02-2006, 03:50 PM

just visited your site and downloaded your videos...wow, now that motorboatfury game does look very good indeed, well done.
couldn't be bothered with the 'email me to download the exe' thing....put me right off.

hardtop

07-04-2006, 02:04 AM

Okay; what I currently do is I render once downsampled to (w/2, h/2), then blur 7x7, then downsample again to (w/8, h/8) - this limits the number of processed fragments a lot - then blur 7x7 again. I then blend everything: base + blur2 + blur8
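A rough way to see how much the smaller targets save is to count texture fetches per blur stage. The sketch below assumes a separable blur (one horizontal and one vertical pass) and ignores the downsample and final blend passes; `blurTextureFetches` is a hypothetical helper.

```cpp
#include <cassert>

// Texture fetches for one separable blur stage on a target that is
// (width / divisor) x (height / divisor) texels: every texel is written once
// per pass, and each pass reads tapsPerAxis samples.
long long blurTextureFetches(int width, int height, int divisor, int tapsPerAxis) {
    assert(width > 0 && height > 0 && divisor > 0 && tapsPerAxis > 0);
    long long w = width / divisor;
    long long h = height / divisor;
    return w * h * 2LL * tapsPerAxis;  // horizontal pass + vertical pass
}
```

At the 800x600 debug resolution mentioned earlier, with 7 taps per axis, this counts 1,680,000 fetches for the (w/2, h/2) stage and 105,000 for the (w/8, h/8) stage, against 6,720,000 for the same blur at full resolution.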

Here's what I get:

http://www.hardtopnet.net/images/bloom2.JPG

http://www.hardtopnet.net/images/bloom3.JPG

I'm not using mipmaps, I'm just rendering to smaller viewports. I think the result is quite nice, but I'd still like to find something faster... I'm thinking of performing HDR rendering soon, so I don't want my bloom effect to clog performance, as there is still HDR tone-mapping to perform.

HT

KRONOS

07-09-2006, 05:01 PM

I'm also currently implementing an HDRI pipeline and this thread was a great help. My problem now is with tone mapping: I'm having a few problems with its implementation (I've googled and searched OpenGL's forums, and I keep finding different approaches).

I calculate the average luminance of my scene into a texture and generate its mipmaps, so that the final mipmap has the avg luminance. Now what do I do with it?
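For reference, the reduction described here can be emulated on the CPU as follows. The sketch assumes a square, power-of-two luminance image and a plain 2x2 box filter per level, which is the usual but not guaranteed behaviour of automatic mipmap generation; `averageByMipReduction` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Repeated 2x2 box reduction of a square, power-of-two luminance image down
// to 1x1. With a box filter, the last level holds the mean of all texels.
float averageByMipReduction(std::vector<float> lum, int size) {
    assert(size > 0 && (size & (size - 1)) == 0);
    assert(static_cast<int>(lum.size()) == size * size);
    while (size > 1) {
        int half = size / 2;
        std::vector<float> next(half * half);
        for (int y = 0; y < half; ++y) {
            for (int x = 0; x < half; ++x) {
                int row0 = (2 * y) * size, row1 = (2 * y + 1) * size;
                next[y * half + x] = 0.25f * (lum[row0 + 2 * x] + lum[row0 + 2 * x + 1] +
                                              lum[row1 + 2 * x] + lum[row1 + 2 * x + 1]);
            }
        }
        lum = std::move(next);
        size = half;
    }
    return lum[0];
}
```

In a shader, the equivalent value would be read from the smallest mip level of the luminance texture.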

Here (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directx9_c/HDRLighting_Sample.asp) it says that the bright pass should already be tone mapped. But then, how is the final image composed? Should it be baseScene tone mapped ADD glow? Or baseScene ADD glow then tone mapped?