Since there are a lot of cool screenshots flying around, here's an update on Marathon's *very slow* progress.

Three new features last week:

1) HDR rendering. Very easy to implement (with GLSL + EXT_fbo), but difficult to set up right. It's the same problem FarCry had: once you've built something that assumes LDR rendering (e.g. lighting & material parameters), you can't expect much better quality when you switch to HDR (actually it looked worse in most cases). So, we're now in the process of modifying how our materials are defined, how the sunlight behaves in different conditions, etc. The first step will probably be adding a float "intensity" setting alongside each 24-bit color (so that everything will look OK both with and without HDR rendering).
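As a sketch of what such a color-plus-intensity setting could look like (the class and field names are hypothetical, not Marathon's actual API; gamma 2.2 assumed):

```java
// Hypothetical material color: a packed 24-bit sRGB color plus a float
// intensity, so one definition works for both LDR and HDR rendering.
public class MaterialColor {
    final int rgb;         // packed 24-bit sRGB color, 0xRRGGBB
    final float intensity; // linear-space multiplier; 1.0 = old LDR behaviour

    MaterialColor(int rgb, float intensity) {
        this.rgb = rgb;
        this.intensity = intensity;
    }

    // Linear-space value of one channel, ready for HDR lighting math.
    double linear(int shift) {
        int c = (rgb >> shift) & 0xFF;
        return Math.pow(c / 255.0, 2.2) * intensity;
    }

    double r() { return linear(16); }
    double g() { return linear(8); }
    double b() { return linear(0); }
}
```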

2) Bloom. Super cool! Works with and without HDR, but obviously looks much better when you're dealing with fp values.

3) My favorite, gamma correction. When I started reading about HDR, it occurred to me that I've never thought about gamma before. So, I started reading about that too.

a. You're forced to do it when doing HDR (the 24-bit colors are in 2.2 gamma space, but all the calculations are done in 1.0 linear space, and then there's tone mapping at the end that brings it back to an appropriate space for viewing), else everything looks washed out.

b. It is actually necessary without HDR too. If you multiply a 2.2 value with a 1.0 value in the shader (e.g. a modulation) it works, but addition fails (e.g. the gamma-correct result of 64 + 64 is 88, not the naive 128). This affects anything involving an addition (specular addition, fog blending, etc). Of course, I couldn't believe that all this time I was watching a wrong image, but when I tried it the difference was amazing! Before, the specular highlights were picking up hues from the surface color, but now they are crystal clear. Fog looks more natural too; before, it was "blending" with the background colors. The only problem is how to do it in the register combiners path. I obviously can't raise to a 2.2 power (but a power of 2.0 is a simple multiplication, a very good compromise), I'm not sure if the internal precision will be good enough, and I still have to do a final fragment shader pass to do the sqrt. We'll see.
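To make the addition example concrete, here's a tiny sketch of the math (plain Java, gamma 2.2 assumed; helper names are illustrative):

```java
// Adding two 8-bit values naively in gamma space vs. correctly in linear space.
public class GammaAdd {
    // 8-bit gamma-encoded value -> linear intensity in [0, 1]
    static double toLinear(int v) { return Math.pow(v / 255.0, 2.2); }

    // linear intensity -> 8-bit gamma-encoded value
    static int toGamma(double v) { return (int) Math.round(Math.pow(v, 1.0 / 2.2) * 255.0); }

    public static void main(String[] args) {
        int naive   = 64 + 64;                              // 128: wrong, done in gamma space
        int correct = toGamma(toLinear(64) + toLinear(64)); // 88: add in linear, re-encode
        System.out.println(naive + " vs " + correct);
    }
}
```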

c. Gamma-correct mipmap creation!!! That made a huge difference with most textures! Before, I was using Java's built-in scaling for RGBA values and a custom filter for normal maps. But, guess what, you're *adding* values in the process! So, the proper way to do it is this (for RGB colors): degamma the image and save the result in floating point precision, do the filtering and downscaling in fp precision, then re-gamma and convert each LOD back to 24 bits. Here's the difference:

Photoshop's filtering:

Marathon's filtering:

The original image contains black and white stripes (0 & 255). Photoshop produces a seemingly correct value of 128. But a displayed 128 is not what the eye sees as halfway between black and white; it's something much darker. The proper value is 186, which is what you get if you do the gamma correction.
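The 186 falls straight out of the same math; a sketch of gamma-correct averaging (illustrative Java, gamma 2.2 assumed):

```java
// Averaging a black (0) and a white (255) pixel the gamma-correct way.
public class GammaMip {
    static double toLinear(int v) { return Math.pow(v / 255.0, 2.2); }
    static int toGamma(double v) { return (int) Math.round(Math.pow(v, 1.0 / 2.2) * 255.0); }

    public static void main(String[] args) {
        int naive   = (0 + 255) / 2;                                // 127: averaged in gamma space
        int correct = toGamma((toLinear(0) + toLinear(255)) / 2.0); // 186: averaged in linear space
        System.out.println(naive + " vs " + correct);
    }
}
```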

The coolest thing though is how I actually implemented the above. When I was deciding how to do it, I thought it was too much for a 100% Java implementation and would slow down the process (especially since I now had to do my own fp filtering, without Java2D's very good implementation). So, I hacked the Marathon engine to start without a window (just a pbuffer to get a GL context) and used GLSL, fp textures and EXT_fbo for off-screen rendering! A few hours later, after working around some of EXT_fbo's current bugs/limitations and learning how cubic filtering works, it was ready. We now have an *almost instantaneous*, gamma-correct mipmap creation process! And since it is so fast, I also ported the simpler normal map mipmap creation, and I plan to do the height-map => normal-map conversion, add more texture types/targets, etc. Very helpful.

I noticed that your HDR rendering gave the artifacts I feared it would: color saturation due to downclipping to the RGB color space, most noticeable in the sky in the first screenshot, where there's a cyan band between the white (brightness saturated) and blue (correctly rendered) areas.

Any idea how to deal with that? =/

I thought about perhaps only doing black and white hdr rendering, then adding color in a separate pass (that's how I do the bloom/glow in wurm to avoid color saturation and banding), but that might end up using too many passes.

Quote

I noticed that your HDR rendering gave the artifacts I feared it would: color saturation due to downclipping to the RGB color space, most noticeable in the sky in the first screenshot, where there's a cyan band between the white (brightness saturated) and blue (correctly rendered) areas.

Any idea how to deal with that? =/

I thought about perhaps only doing black and white hdr rendering, then adding color in a separate pass (that's how I do the bloom/glow in wurm to avoid color saturation and banding), but that might end up using too many passes.

Yes, that's the problem with the bad material-lighting settings I mentioned.

The sky looks so bad for two reasons:

a) It's just an LDR gradient. In the real world, the sky has much greater intensity than everything else. In a realistic rendering, clouds and the sun position would also affect this intensity.

b) The bloom is performed in 8-bit precision, obviously for performance reasons. Tone mapping is actually performed in two places now: at the end of the normal rendering and when creating the initial bloom texture. I think that's where the problem occurs, either because of the lower precision or a bad selection of glow sources (which is done by a simple threshold value). The bands you see are created after compositing the bloom texture. Tone mapping the normal rendering has a pretty good result and the sky gradient is maintained.

I'm pretty confident that with proper settings everything will look much better. All I've done in those images is set some extreme values to the sun parameters and "play" with the overall scene intensity.

Indeed, most problems come from the bloom pass. There's a lot of saturation, extreme glowing, objects that shouldn't glow are glowing, etc. I'm going to try a few things: add better controls of the overall intensities, thresholds, etc. Although the sky will be rarely visible, it is the first thing that should be "updated" to HDR, because when it is visible, it affects the whole image (especially when it is too dark in HDR space, everything else glows).

That's the same problem I had in wurm. I solved it by doing bloom in greyscale with some funky blending.

Thanks, I'll try that.

Also, if anyone's noticed, the black pixels at the end of the hoplite's cape are there because the GeFX doesn't support fp rendering with blending or alpha testing. It is too slow for HDR rendering anyway.

OK, I implemented the gamma correction in the register combiners path too. It will be optional for the GeFXs, because it requires a separate fragment shader pass, to sqrt the colors, with a significant performance hit. I'm now sure that the precision is not good enough, but the final result is very good, at least there wasn't any noticeable quality loss in the final render. It will probably affect darker textures though. Also, I had to fight with the math order, to keep additions towards the end. The cost was one more general combiner, but the FX is super-fast with RCs anyway.

From today, about a year after we switched to a 100% GLSL pipeline, Marathon runs again on ATI cards with the 5.8 beta Catalyst!

The problem was character skeletal animation with vertex shaders, which uses a uniform array to hold the bone transforms. Previous drivers failed to optimize such shaders (the low-level code generated uses too many temporary registers) and fell back to software mode. This bug fix tells me that they're working on optimizations lately, which is, er, very good.

Now, only a few non-critical bugs remain (that affect Marathon at least), but from my correspondence with ATI's developer relations, they should be fixed by the 5.9 release. Hmm, we might be able to release a tech demo this fall...

BTW, why isn't there an OpenGL discussion forum under Game Development Topics?

I'm using standard shadow mapping for the Marathon shadows. There's a single shadow map for the whole scene, which looks decent at 1024x1024 and almost pixel perfect at 2048x2048. There's also support for soft shadows when using the fragment shader pipeline.

It took me one day to have it working and ~2-3 months (!) to get rid of the artifacts. I started with an implementation of Trapezoidal Shadow Maps and then tried all kinds of tricks/optimizations to make it stable, to make it robust*, to minimize what needs to be rendered, to minimize the depth bounds, etc. It was a lot of work and brought my (not so good) math skills to the limit. It's working great, except for a couple of extreme cases where the math breaks and we get ugly artifacts or even disappearing shadows. I'm either going to fight these cases a little more some time, or just limit the camera movements (which are completely free now).

Anyway, I'm pretty happy with the final result. For a while, I was thinking of doing something even more complicated, like using multiple shadow maps to get perfect quality everywhere, but I found that what we have is enough. There are only two real issues now:

1) Additional lights (other than the sun/moon). The standard way to implement shadows in an engine is to render an ambient pass and then do a light pass for each light. Well, I made a hard decision back then and got rid of the ambient pass to boost the geometry we can render (the skeletal animated characters are very expensive). There's no need for additional lights currently, but if we ever need them, we'll have to do it in the shaders (e.g. two lights in a single pass). But this can only be accomplished with fragment shaders...

2) Disappearing bump, that is, there's no normal map detail where shadows fall, just the ambient "component". This doesn't matter in, say, Doom 3, but it looks awful in an open and shiny environment like Marathon, where you're supposed to see the effects of global illumination/radiosity. This cannot be fixed, of course, when you have dynamic lighting (we're a few generations away from that), but after pressure from our artists, I added a little hack to make the diffuse component somewhat visible, even when the fragment is completely in shadow. The effect is wrong of course, but it's subtle enough and makes a pleasant difference. Also, because there's no ambient pass, we get free parallax mapping (when enabled) in the shadowed areas, which looks awesome on certain models.

Tell me if you need more details.

* The original TSM implementation works great for a Wurm Online kind of game, where the camera is almost at ground level and looks at a wide area, but not so good for a strategy game, where the camera looks downwards most of the time.

I take it you're using projected shadows and ARB_depth_texture for hard shadows then? And the fragment shader pipeline simply applies a filter on top of that? Or is the fragment shader pipeline a complete rewrite of the shadows? I'm also interested in the speed difference between the two pipelines; a person said that the CPU implementation is in fact faster (!) than the fragment shader one. I don't know the validity of that comment, so I was wondering if you might shed some light on that...

I was under the impression that each light required a single pass to generate the depth values, how are you combining two lights in a single pass?

Quote

I take it you're using projected shadows and ARB_depth_texture for hard shadows then? And the fragment shader pipeline simply applies a filter on top of that? Or is the fragment shader pipeline a complete rewrite of the shadows? I'm also interested in the speed difference between the two pipelines; a person said that the CPU implementation is in fact faster (!) than the fragment shader one. I don't know the validity of that comment, so I was wondering if you might shed some light on that...

Yes, I'm rendering the shadow pass into an ARB_depth_texture (with EXT_fbo now) and sample that in the fragment shader (or in an NV_register_combiners shader). When using fragment shaders, you can choose between 4 different "shadow shaders":

1) Single sample, which is the fastest. This produces "antialiased" shadows on NV cards (with the built-in PCF), but blocky shadows on ATI cards.

2) Hand-written PCF (4 samples). This makes sense only for ATI cards.

3) "GPUGems" filter (4 samples). This is a filter I found in GPU Gems 1, which produces soft shadows but works only on NV cards (requires the built-in PCF to look correct).

4) 4x4 filter (16 samples). Takes 16 jittered samples to produce soft shadows; very slow.
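As a sketch of what a hand-written 4-sample PCF does, here's an illustrative CPU version in Java (not the engine's actual shader code; names and the texel-addressing convention are assumptions):

```java
// Illustrative 4-sample percentage-closer filtering (PCF): compare the
// fragment's depth against the four nearest shadow map texels and
// average the four binary shadow tests.
public class Pcf4 {
    // depthMap is size x size, depth values in [0,1]; (s, t) and fragDepth also in [0,1].
    static double shadow(double[][] depthMap, int size, double s, double t, double fragDepth) {
        double x = s * size - 0.5, y = t * size - 0.5;
        int x0 = (int) Math.floor(x), y0 = (int) Math.floor(y);
        double lit = 0.0;
        for (int dy = 0; dy <= 1; dy++) {
            for (int dx = 0; dx <= 1; dx++) {
                // clamp to the map edges
                int xi = Math.min(Math.max(x0 + dx, 0), size - 1);
                int yi = Math.min(Math.max(y0 + dy, 0), size - 1);
                if (fragDepth <= depthMap[yi][xi]) lit += 1.0; // this sample is lit
            }
        }
        return lit / 4.0; // fraction of lit samples: 0 = full shadow, 1 = fully lit
    }
}
```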

The way I've done it, it's very easy to write any other shadow shader, so I'm planning to write the fancy "Percentage-Closer Soft Shadows" when I get my hands on an NV40+.

I've no idea about the CPU implementation you mentioned, do you have any references?

Although none of the above are as good as the Trapezoidal Shadow Mapping technique... that thing looks terrific. It's definitely more complicated than the other implementations, but its advantages are definitely noticeable. Thanks for the link.

spasi, found some articles that might be of interest to you, you might have come across some of them...

Omnidirectional lighting is not well supported with shadow mapping, as cube mapping or spherical mapping has to be used; cube mapping requires 6 shadow maps (each requiring 2 passes), so 12 passes in total per light. This paper solves that problem: www.mpi-sb.mpg.de/~tannen/papers/cgi_02.pdf

Another, completely different, shadow mapping algorithm is called deep shadow mapping; this apparently works with translucent objects and gives very high resolution maps that are prefiltered (so no need for PCF filtering): graphics.stanford.edu/papers/deepshadows/deepshad.pdf

Quote

Omnidirectional lighting is not well supported with shadow mapping, as cube mapping or spherical mapping has to be used; cube mapping requires 6 shadow maps (each requiring 2 passes), so 12 passes in total per light

Shadow mapping with paraboloid shadow maps is a well known technique with a well known limitation: the scene geometry needs to be sufficiently tessellated. It's a big drawback in many circumstances.

Quote

Bias calculations are done using ARBfp/vp in trapezoidal shadow mapping; this paper calculates the bias using dual-layer shadow mapping

Cool technique, but requires two passes to construct the shadow map, using a fragment shader too.

Anyway, I've never had any serious artifacts in Marathon, only a few on surfaces almost parallel to the light direction, which are almost unavoidable (see AOE 3 demo*). Actually I'm using a little hack for this, I don't know how stupid, but it works great: after calculating the trapezoidal MVP matrix, I'm substituting the Z-axis row with the one I'd have with classic shadow mapping. The final matrix applies the trapezoidal transformation on the XY axes (in light space) but does not distort the depth values.
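The row substitution might look like this, assuming column-major OpenGL-style 4x4 matrices stored as flat arrays (a hypothetical sketch, not Marathon's actual code):

```java
// Copy the Z row of the classic shadow-map MVP into the trapezoidal MVP,
// so the trapezoidal warp affects X/Y only and leaves depth undistorted.
public class TsmDepthFix {
    // Column-major, OpenGL convention: element (row, col) lives at [col * 4 + row].
    static float[] fixDepthRow(float[] trapezoidalMvp, float[] standardMvp) {
        float[] m = trapezoidalMvp.clone();
        for (int col = 0; col < 4; col++) {
            m[col * 4 + 2] = standardMvp[col * 4 + 2]; // row 2 carries the depth
        }
        return m;
    }
}
```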

Nope, horizon maps are used to create shadows from the bumps (on the lit side of a surface). The problem lies on the back side.

Coincidentally, I fixed this problem yesterday. Instead of letting the diffuse component show through the ambience (which is completely wrong and actually not visible at all on self-shadowed parts of the geometry), I'm modulating the ambient component with the Z component of the normal map (which roughly gives a sense of height). And with parallax mapping activated, I can even use the height map for more correct results. Luckily, I was able to fit the necessary calculations in the register combiners shaders too. Here's the difference (compare with above):
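The modulation described above is tiny; a hypothetical sketch (the exact formula Marathon uses may differ):

```java
// Modulate the ambient term by the Z component of the tangent-space normal,
// so the tops of bumps stay brighter than the crevices even inside shadows.
public class AmbientBump {
    static double shadowedAmbient(double ambient, double nz) {
        // nz is ~1.0 on flat areas of the normal map and smaller on slopes
        return ambient * Math.max(nz, 0.0);
    }
}
```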

Quote

Another, completely different, shadow mapping algorithm is called deep shadow mapping; this apparently works with translucent objects and gives very high resolution maps that are prefiltered (so no need for PCF filtering)

Also, you could use separable convolution to do soft shadows; this reduces the number of pixels that need to be touched from n² to 2n, although I haven't investigated this enough to know if it works or not with shadow mapping

Yeah, I'm using separate X/Y filters for blurring (e.g. bloom) and I also plan to test some kind of image-space filtering for shadow mapping. Though, I'm not sure if I could do it without an extra pass.
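The separable-filter idea can be sketched on the CPU; here's a 3-tap box filter done as two 1D passes (purely illustrative; a gaussian works the same way, just with weighted taps):

```java
// Separable blur: a horizontal 1D pass followed by a vertical 1D pass
// gives the same result as one 2D box filter, with 2n taps instead of n^2.
public class SeparableBlur {
    static double[][] blur1D(double[][] src, boolean horizontal) {
        int h = src.length, w = src[0].length;
        double[][] dst = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double sum = 0.0;
                for (int k = -1; k <= 1; k++) {
                    // clamp coordinates at the image edges
                    int yy = horizontal ? y : Math.min(Math.max(y + k, 0), h - 1);
                    int xx = horizontal ? Math.min(Math.max(x + k, 0), w - 1) : x;
                    sum += src[yy][xx];
                }
                dst[y][x] = sum / 3.0;
            }
        }
        return dst;
    }

    static double[][] blur(double[][] src) {
        return blur1D(blur1D(src, true), false);
    }
}
```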

I've read most of the above in the past, but thanks for the links anyway!

* OMG, I couldn't believe my eyes! Yeah, it has awesome graphics, but they actually had the balls to use front-face culling for the shadow pass to avoid artifacts! Just look under the buildings at their back side (enable camera rotation first), or under the walls in the second scenario. I'm sure the casual gamer won't spot it easily, but it's a significant enough error to spoil an otherwise perfect rendering.

Thanks for the pointers, I just skimmed over the articles to get a gist of what they are about; I didn't really consider any of their limitations.

Is there no way to combine omnidirectional dual-paraboloid shadow mapping with trapezoidal shadow mapping, so that each face is rendered using trapezoidal shadow mapping while the actual technique holding the shadow map is the dual paraboloid? Again, this is just me conjecturing...

Quote

Anyway, I've never had any serious artifacts in Marathon, only a few on surfaces almost parallel with the light direction, which are almost unavoidable (see AOE 3 demo*). Actually I'm using a little hack for this, I don't know how stupid, but works great: After calculating the trapezoidal MVP matrix, I'm substituting the Z-axis row with the one I'd have with classic shadow mapping. The final matrix is applying the trapezoidal transformation on the XY-axis (in light space), but does not distort the depth values.

Nice, but doesn't that cause the jittering effect of shadow mapping at large distances?

Quote

Yes, dynamic allocation of shadow maps (multiple sizes, multiple frustums) is the way to go for complex scenes. I wish I had time to waste on testing various such techniques...

Looks like I'm the one who's doing all the testing then

I'm really looking forward to testing TSM and ASM and comparing the two results, as both seem to solve the same problems that normal shadow mapping has... But my priority is getting omnidirectional lighting working with the minimum number of passes (maybe even using sphere mapping)

Quote

Is there no way to combine omnidirectional dual-paraboloid shadow mapping with trapezoidal shadow mapping, so that each face is rendered using trapezoidal shadow mapping while the actual technique holding the shadow map is the dual paraboloid? Again, this is just me conjecturing...

Hm, I don't think you can, because DPSM uses its own coordinate mapping; it is not a standard projection that you could modify. Though, even if you could, there would be no point. TSM (and PSM) are useful in large scenes with directional lights, where you want to distribute shadow map space where it matters the most (near the camera). DPSM solves the problem of rendering point light shadow maps, in totally different situations, usually indoors and with distance attenuation (only a few objects affected).

BTW, I've read a post from Stefan Brabec, that describes a nice optimization for DPSM. Instead of rendering the highly tessellated geometry in all the passes, you could just use it in the shadow pass (where it's necessary) and then render a low poly version of it for the light pass. It would still work without problems.

Quote

Nice, but doesn't that cause the jittering effect of shadow mapping at large distances?

No, the jittering effect is irrelevant, depth biasing is the issue. The problem with TSM is that it distorts the depth values and you can't use a constant depth bias for the whole scene. That's why the TSM authors used a fragment shader correction (or a vertex shader that approximates it, as an optimization), to fix that distortion. I'm basically doing the same, but without the overhead.

The jittering effect is a common problem to all shadow mapping techniques. Especially with PSM/TSM, when they fail to distribute the shadow map space efficiently (usually giving too much near the camera), the jittering effect is worsened on distant surfaces.

Quote

I'm really looking forward to testing TSM and ASM and comparing the two results, as both seem to solve the same problems that normal shadow mapping has... But my priority is getting omnidirectional lighting working with the minimum number of passes (maybe even using sphere mapping)

As I said, TSM should not be very useful for point lights. You've got a whole lot of other problems to solve first and you should concentrate on those. Basically, everything depends on the engine and the world you want to create. If point lights are what you need, I found a thread that might help you:

Thanks for the link, I've read some things by Yann L in the past regarding shadow mapping, but never in detail. Looks like I'm about to start doing so...

Quote

Keep me informed!

Sure... I'll probably either post back here or PM you, but it won't be soon, as I need to do other things with the engine first. Also, I'm still a newbie when it comes to shadow mapping, so excuse the stupid remarks.

The above videos illustrate the improvements that have been recently made to the HDR implementation. I'll post tomorrow with details.

Warning 1: The frame capture was not real-time (any animation except the sun rotation is completely wrong).

Warning 2: I didn't have time to use a non-Windows-friendly compression; both videos require the WMV9 codec.

The following improvements have been made to Marathon's HDR implementation:

- Fixed all gamma correction issues. I had significant trouble with this, but after correcting a couple of stupid mistakes I made in the first implementation, it was quite simple. You bring everything to linear space when rendering (most importantly when sampling textures; EXT_texture_sRGB helps a lot here) and you go back to gamma space at the very end.

- Improved the bloom quality in 3 ways:

a. All textures used for blooming are floating point now. With 8bit textures there was no way to conserve decent luminance values, especially from bright but tiny light sources. Simply moving to FP textures fixed this, with no significant loss in performance or memory (the bloom textures are small anyway).

b. A new blurring algorithm is now used. First a chain of bloom "mips" is generated (from 1/4x1/4 down to 1/32x1/32 of the framebuffer dimensions) and then each one is gaussian blurred with the *same* number of samples. This has the effect of progressively fading but also expanding the bloom. This way, even tiny bright spots can generate bloom on big parts of the screen, and generally it looks much better.

One note here: the gaussian blur is separable and implemented as two passes (one horizontal and one vertical). But I wanted to avoid having two textures for each bloom level. So, I reused the initial bright-pass texture as a temporary buffer (e.g. 80x64 bloom => (horizontal blur) => 320x256 temp => (vertical blur) => 80x64 bloom). The problem of course was that the vertical blur would grab samples from outside the current bloom level in the temp buffer. I fixed this by clamping the y texture coordinate to the appropriate range for each bloom level.

c. I figured out how to fix the color saturation mentioned by Markus on the second post of this thread. Code:

So, suppose we have a pixel (4.0, 2.0, 0.0) and we've set the threshold at 2.0. The old code would result in a bloom pixel of (2.0, 0.0, 0.0) and the new one (0.76, 0.38, 0.0). It's clear that, from the original yellowish color, we get plain red with the old code, whereas the new code produces the correct yellowish hue (the lower intensity is not a problem).
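The original code snippet didn't survive here, but the example numbers pin down the idea. Here's a hypothetical reconstruction of the two bright-pass variants (the 0.30/0.59/0.11 luminance weights and the exact scale formula are assumptions that happen to reproduce the (0.76, 0.38, 0.0) result):

```java
// Two ways to extract bloom sources from an HDR pixel.
public class BrightPass {
    static final double THRESHOLD = 2.0;

    // Old: per-channel subtract-and-clamp. Bright channels survive,
    // dim ones are cut to zero, so the hue shifts (yellow -> red).
    static double[] oldPass(double r, double g, double b) {
        return new double[] {
            Math.max(r - THRESHOLD, 0.0),
            Math.max(g - THRESHOLD, 0.0),
            Math.max(b - THRESHOLD, 0.0)
        };
    }

    // New: scale the whole color by how much its luminance exceeds the
    // threshold, so the channel ratios (the hue) are preserved.
    static double[] newPass(double r, double g, double b) {
        double lum = 0.30 * r + 0.59 * g + 0.11 * b;          // assumed weights
        double scale = Math.max(lum - THRESHOLD, 0.0) / THRESHOLD;
        return new double[] { r * scale, g * scale, b * scale };
    }
}
```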

- When downscaling the scene luminance to find the average, I now also calculate the minimum and maximum of the whole scene. These have the following uses:

a. ATI cards do not support filtering on FP textures, so I'm using 16-bit fixed point for the bloom textures on ATI. The maximum luminance is necessary for an efficient conversion from FP to fixed point and back.
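A sketch of such an FP-to-fixed round trip, using the scene's maximum luminance as the scale (an assumed scheme; the engine's actual packing may differ):

```java
// Pack an HDR value into 16-bit fixed point and back, scaled by maxLum.
public class FixedPoint16 {
    static int encode(double v, double maxLum) {
        return (int) Math.round(Math.min(v / maxLum, 1.0) * 65535.0);
    }

    static double decode(int fixed, double maxLum) {
        return fixed / 65535.0 * maxLum;
    }
}
```

The quantization error is at most maxLum / 65535 per round trip, which is why a tight maximum matters.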

b. It is possible to automatically calculate the tone mapping parameters from the min, max and average luminances, in a way that produces good quality in a variety of situations. I haven't tried this yet though.

Other than the HDR improvements, and if anyone noticed, the shadows in the videos are *true* area light shadows (especially in the 2nd, 32 jittered samples are used). The technique I'm trying is still problematic (produces excellent quality with great performance, but flickers a bit), so I'll post details when (if) I perfect it.

kapta, without HDR it runs on the minimum spec (GeFX+ and R9500+) at full speed. There are many settings, of course, that can give you a good balance of quality/performance. With HDR on my 6800 AGP and high (but not extreme) quality settings I get an average of ~45 fps. The most important issue is memory usage though. If you want everything maxed out, 256MB on the GPU may not be enough.

java-gaming.org is not responsible for the content posted by its members, including references to external websites, and other references that may or may not have a relation with our primarily gaming and game production oriented community. Inquiries and complaints can be sent via email to the info-account of the company managing the website of java-gaming.org.