TOPIC: Advanced Lift Gain Gamma... could use help please.

...If the above cannot be done, any brute-force approach will give no different result from the AL implementation, only different accuracy and performance.
Reducing samples is basically that, and you would not need random samples for an already random source/image.

How is the source random? If sampling from the frame buffer, it would be from the game's rendered image, which means it would certainly lack fully ordered patterns, but it would still contain structure and similarities, as opposed to being fully random. So using random samples may improve accuracy, possibly at the same or better performance than AL (or possibly not). But until we get an answer from crosire about whether we can simply copy the frame buffer over as a usable texture, it really doesn't matter at this point; you're definitely right there.

LuciferHawk wrote:

By the way, source should mainly be C++.

*Smacks self in forehead* I should have realized that from the fact that we use .h header files. As I said, my brain doesn't seem to be working like it should.

The image is a "random" constellation of pixels in every frame. Structured sampling (an even distribution) gives you a better average than a random one might in its worst case. Plus, if you sample randomly, your average would differ when the same image is sampled twice.

Only if we had a contrived, structured image in which, by accident, all structured samples fell into the same colored part of the image (so the image structure follows the sample structure) would we get the wrong average.
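A minimal CPU-side C++ sketch of that failure case (not shader code; the `Image` struct, `makeStripes`, and the 64x64 image with an 8-pixel stripe period are hypothetical, chosen only to trigger the effect). When the stripe period matches the grid spacing, every structured sample lands on a white stripe:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Luma image stored row-major, values in [0, 1].
struct Image {
    std::size_t width, height;
    std::vector<float> pixels;
    float at(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }
};

// Vertical stripes: columns where x % period == 0 are white, the rest black.
Image makeStripes(std::size_t width, std::size_t height, std::size_t period) {
    assert(period > 0);
    Image img{width, height, std::vector<float>(width * height, 0.0f)};
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; x += period)
            img.pixels[y * width + x] = 1.0f;
    return img;
}

// Average of every pixel: the ground truth.
float fullAverage(const Image& img) {
    double sum = 0.0;
    for (float p : img.pixels) sum += p;
    return static_cast<float>(sum / img.pixels.size());
}

// Evenly spaced n x n grid of samples, starting at the top-left pixel.
float gridAverage(const Image& img, std::size_t n) {
    assert(n > 0 && n <= img.width && n <= img.height);
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            sum += img.at(i * img.width / n, j * img.height / n);
    return static_cast<float>(sum / (n * n));
}
```

On `makeStripes(64, 64, 8)` the true average is 0.125, but an 8x8 grid reports 1.0. A 7x7 grid on the same image only hits one stripe column and reports 1/7, so the error depends on how the grid happens to line up with the image.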

If it is C++ or C#, I wouldn't mind seeing the source code and poking around, just to try and learn a little more about graphics programming in general by looking through some actual code. But I understand completely, crosire, if you aren't comfortable with giving the source out.

The source base is quite huge (it contains a lot of components after all: hooking, rendering, a full language compiler, ...) and not easy for an outside individual to get into. Plus it's currently closed source for reasons that were mentioned already (a search in the forum should bring those up). =)

DrakePhoenix wrote:

Also, back to the subject more at hand... I tried using crosire's code for full sampling of the entire frame buffer. When it tried to run, I received an error (repeated errors, actually) saying that the loop had attempted too many iterations.

Try to put a "[loop]" in front of the "for" keyword.

[loop] for (int x = 0; x < BUFFER_WIDTH; x++) ...
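For reference, a CPU-side C++ sketch of what a full-sampling loop computes (the function name and the 1920x1080 size are illustrative assumptions). It reads every pixel exactly once, so the iteration count grows with the resolution, which helps explain why the shader compiler balks at such a loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FullAverageResult {
    double average;
    std::size_t reads;  // how many pixel reads the brute-force loop performed
};

// Brute-force reference average: the CPU equivalent of nested loops over
// the buffer's width and height, visiting each pixel once.
FullAverageResult bruteForceAverage(const std::vector<float>& pixels,
                                    std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height && !pixels.empty());
    double sum = 0.0;
    std::size_t reads = 0;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            sum += pixels[y * width + x];
            ++reads;
        }
    }
    return {sum / static_cast<double>(reads), reads};
}
```

On a constant 1920x1080 image of 0.5 this returns an average of 0.5 after 2,073,600 reads.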

DrakePhoenix wrote:

On the subject of getting the frame buffer as a texture and storing it to another texture... how do games normally present the frame buffer in the first place? We obviously can access it and the information there, because we can sample from it. This suggests to me that it is already stored as a texture in its own right, isn't it? So wouldn't we really only need a way to copy the texture in its entirety without having to sample through it? Something along the lines of (pseudocode) manipTexture = frameBufferTexture?

The graphics API allocates a fixed buffer on the graphics card to render the game image into. That's the so-called backbuffer. After the game finishes rendering, it requests a buffer swap. The graphics API then swaps the back and front buffers, the frontbuffer being another fixed-size buffer which the monitor reads to display the image. Those two buffers cannot be accessed directly; only the backbuffer can be written to.
ReShade waits for the game to request a buffer swap. It then kicks in right before the swap, copying the current backbuffer content (which is what moves to the frontbuffer, and thus to the monitor, in the next step) into a new texture. This texture is bound to the textures with the "COLOR" or "SV_Target" semantics in your shader code ("texture MyTex : COLOR;"), so sampling from those will return the backbuffer image. This procedure is repeated for each pass: ReShade copies the backbuffer to a second texture and binds that to the shader.

There is another reason this is needed. It's impossible to read from and write to a texture at the same time (for obvious reasons). So things wouldn't work if the shader renders to the backbuffer (which ReShade defaults to if no render target is specified) and tries to sample from the backbuffer at the same time. Having a second texture in between solves that.
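The read/write hazard can be sketched on the CPU in C++ (hypothetical helper names; a 3-tap blur along one row stands in for a shader pass). `blurInto` reads one buffer and writes another, the way ReShade's copy of the backbuffer lets a pass sample the image while rendering over it; `blurInPlace` reads neighbours it has already overwritten. On a GPU the pixels run in parallel, so the real outcome is undefined rather than this particular wrong answer; the sequential version just makes the problem visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 3-tap box blur along a row; the first and last pixels are left unchanged.
// Reads from `src` and writes to `dst`, which must be a different buffer.
void blurInto(const std::vector<float>& src, std::vector<float>& dst) {
    assert(&src != &dst && src.size() == dst.size());
    dst = src;  // copy so the untouched edge pixels are carried over
    for (std::size_t i = 1; i + 1 < src.size(); ++i)
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0f;
}

// The mistake described in the thread: reading and writing the same buffer.
// Each pixel after the first reads a left neighbour that was already blurred.
void blurInPlace(std::vector<float>& buf) {
    for (std::size_t i = 1; i + 1 < buf.size(); ++i)
        buf[i] = (buf[i - 1] + buf[i] + buf[i + 1]) / 3.0f;
}
```

For the row {0, 0, 3, 0, 0}, `blurInto` produces {0, 1, 1, 1, 0}, while `blurInPlace` drifts to roughly {0, 1, 1.33, 0.44, 0}.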

DrakePhoenix wrote:

Also I had another thought. Probably not a very good one, because my brain doesn't seem to be working very well today (I think I'm getting sick). But for trying to determine an approximate average of the scene's colors and/or brightness, what about taking and averaging a number of random samples, instead of ordered samples or sampling the entire set? So when calling tex2D, instead of passing it 0.5 or texcoord, pass it float2(noise(xVector), noise(yVector)). Run that multiple times through an iterated for-loop.

This sounds like a good idea actually. SSAO works similarly. Simplified, it casts rays in random directions to calculate the occlusion shadow and then uses a simple blur to hide the noisy artifacts that this produces over multiple frames.
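A CPU-side C++ sketch of the random-sampling idea (hypothetical names; `std::mt19937` stands in for the shader's noise function or noise texture, which is an assumption about how the sample positions would be produced):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Estimate the average luma from `count` uniformly random pixel positions.
double randomSampleAverage(const std::vector<float>& pixels, std::size_t width,
                           std::size_t height, std::size_t count, unsigned seed) {
    assert(pixels.size() == width * height && count > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pickX(0, width - 1);
    std::uniform_int_distribution<std::size_t> pickY(0, height - 1);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t x = pickX(rng);  // draw x and y in a fixed order
        std::size_t y = pickY(rng);
        sum += pixels[y * width + x];
    }
    return sum / static_cast<double>(count);
}
```

With 4096 samples on an image that is half black and half white, the estimate lands close to 0.5, but a different seed gives a slightly different value, which is the objection about sampling the same image twice.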

LuciferHawk wrote:

Is a copy (in source) of a renderTarget/texture to a texture with custom size internally the same as sampling it in HLSL?

In OpenGL it is: the API runs a pixel shader internally when copying between FBOs. Direct3D 10+ has an API member for direct memory copying between textures, which can sometimes be a little faster. But the difference should be negligible if you sample the source texture using point sampling, thus disabling interpolation.

DrakePhoenix wrote:

the possible ability to simply copy the frame buffer over as a usable texture

It already is. Everything attached to shaders in ReShade is either a texture or a constant buffer (separate uniforms in D3D9 and a uniform buffer in OpenGL).

Is a copy (in source) of a renderTarget/texture to a texture with custom size internally the same as sampling it in HLSL?

In OpenGL it is: the API runs a pixel shader internally when copying between FBOs. Direct3D 10+ has an API member for direct memory copying between textures, which can sometimes be a little faster. But the difference should be negligible if you sample the source texture using point sampling, thus disabling interpolation.

Thanks! That OpenGL uses a pixel shader internally is unfortunate. I wonder if we could still use D3D10+ memory copying. It is not about the speed but about the possible results: loading a texture from disk into a 1x1 texture gives a different result than sampling it into a 1x1 texture. The first is the one we would need for the cheapest average, provided it does not flicker (which we cannot properly check unless we try).
Who knows, maybe even the "internal" pixel shader renders a different result than if we do it.
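One plausible reading of that difference, sketched in C++ on the CPU: a point-filtered tap at texcoord (0.5, 0.5) reads a single texel, while a box-filtered reduction (the kind an image downscaler or mip generation performs) weights every texel equally. Whether loading from disk actually box-filters depends on the loader, so treat that part as an assumption; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Point-filtered lookup at normalized coordinates (u, v): picks one texel,
// using floor(u * width) the way nearest-neighbour sampling does.
float pointSample(const std::vector<float>& tex, std::size_t w, std::size_t h,
                  float u, float v) {
    assert(tex.size() == w * h);
    std::size_t x = static_cast<std::size_t>(std::floor(u * w));
    std::size_t y = static_cast<std::size_t>(std::floor(v * h));
    if (x >= w) x = w - 1;  // clamp addressing at the right/bottom edge
    if (y >= h) y = h - 1;
    return tex[y * w + x];
}

// Box-filtered reduction to 1x1: every texel contributes equally.
float boxAverage(const std::vector<float>& tex) {
    assert(!tex.empty());
    double sum = 0.0;
    for (float t : tex) sum += t;
    return static_cast<float>(sum / tex.size());
}
```

On a 4x4 texture that is black except for texel (2, 2), the centre tap returns 1.0 while the box average is 0.0625.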

... snip …
Plus it's closed source currently for reasons that were mentioned already (a search in the forum should bring those up). =)

No need, as I'm sure the reasons are fairly standard fare (and even if they aren't, it's really your own business, not mine). But I definitely understand. I assumed it was closed source (since the source isn't available for download anywhere that I've seen), but developers are occasionally willing to release code to specific individuals anyway (rarely, but it happens).

As for it being large, so were the EQ and WoW emulators I used to dig into (though I never paid attention to the 3D programming code, just the back end). So I would generally ignore whatever wasn't of interest to me, unless tracing something back through function calls led me somewhere I wasn't expecting to go (which often happens). But since it would just be a learning aid for me, and since there are, I'm sure, plenty of other learning aids for 3D programming, it really isn't necessary. I just often find it easier to learn when the material relates directly to something I'm currently interested in, whereas most tutorials and books and whatnot deal with things more abstractly, to make the information relevant to the widest audience.

I'm rambling a bit again... the short version: no worries, I would have been surprised if you had said yes.

crosire wrote:

Try to put a "[loop]" in front of the "for" keyword.

[loop] for (int x = 0; x < BUFFER_WIDTH; x++) ...

I'll try that. Thanks. But that reminds me... In some cases I've seen “[flatten]” in front of if/else flow control segments. What is that for? I've noticed that sometimes the compiler doesn't seem to like simple if/else, and prefers instead #if/#elif/#else/#endif, but in other places it handles basic if/else just fine. Does [flatten] allow one to use basic if/else where the compiler might otherwise want to see #if/etc.? Or is it for some other purpose?

crosire wrote:

… snip …
… redirect to another section of post...
It already is. Everything attached to shaders in ReShade is either a texture or a constant buffer (separate uniforms in D3D9 and a uniform buffer in OpenGL).

First, yes, I understand perfectly why you can't read from and write to something at the same time (even aside from the potential IO issues themselves). One of the first mistakes I ever made when learning C was to read from an array and then, at each index, write back to it. It should be obvious that the resulting array wasn't what I was trying to get from the processing.

Anyway... So if we set up something like “texture MyTex : COLOR;”, then what we would get is a texture that contains what was in the backbuffer at the time the definition/assignment is processed, right? And obviously, MyTex would then be something that could be manipulated or otherwise processed, or passed on to other textures as well.

Which then brings the next question to LuciferHawk: Can we then simply take such a texture and use abstraction to reduce it down quickly and efficiently so that brute sampling can be limited to only a handful of samples? Something along the lines of defining a texture (say MySmallTex) with 5x5 dimensions, set that as the render target for one of the passes, and subsequently pass the MyTex texture to it. Then sample all 25 pixels from MySmallTex? And if so, do you think the accuracy loss (if any) and the performance gain (also if any) compared to your method in AL would be worthwhile? In other words, could that be a workable means of determining average color/luma in a cheaper way than how you currently do it in AL?
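A CPU-side C++ sketch of two ways such a small reduction could behave (hypothetical names; the 10x10 checkerboard reduced to 5x5 is an assumed test image). `reduceBox` averages every source pixel in each block, so averaging the small image reproduces the full average; `reduceOneTap` reads one pixel per block, roughly what a point-filtered pass into a small render target amounts to (bilinear filtering would blend four texels instead, which still ignores most of each block):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<double>;  // row-major, w * h values

// Downscale to n x n by averaging each block of source pixels (a box filter).
Image reduceBox(const Image& src, std::size_t w, std::size_t h, std::size_t n) {
    assert(src.size() == w * h && n > 0 && w % n == 0 && h % n == 0);
    const std::size_t bw = w / n, bh = h / n;  // block size in pixels
    Image dst(n * n, 0.0);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            dst[(y / bh) * n + (x / bw)] += src[y * w + x] / static_cast<double>(bw * bh);
    return dst;
}

// Downscale to n x n by reading a single pixel at each block centre.
Image reduceOneTap(const Image& src, std::size_t w, std::size_t h, std::size_t n) {
    assert(src.size() == w * h && n > 0 && w % n == 0 && h % n == 0);
    const std::size_t bw = w / n, bh = h / n;
    Image dst(n * n);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            dst[j * n + i] = src[(j * bh + bh / 2) * w + (i * bw + bw / 2)];
    return dst;
}

// Plain mean, e.g. of the 25 pixels of a 5x5 reduction.
double mean(const Image& img) {
    assert(!img.empty());
    double sum = 0.0;
    for (double v : img) sum += v;
    return sum / static_cast<double>(img.size());
}
```

On the checkerboard, the box reduction keeps the true average of 0.5, while the one-tap version lands on the same colour in every block and reports 0.0.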

crosire wrote:

… snip ...
This sounds like a good idea actually. … snip ...

Well, the way LuciferHawk described the issue, I actually see his point, particularly about how you could end up with a different “average” from the same image (which highlights the accuracy issue with the method). In any such case, whether randomized sampling or evenly distributed sampling, there is the issue of accuracy vs. performance: the more samples you take, the better the accuracy but the worse the performance, either way.

What I had in mind was the possibility that perhaps random sampling might be more efficient (better accuracy at the same performance level, or better performance at the same accuracy level, or whatever). But I think that LuciferHawk is correct (now that I better understand what he was saying) that the pixels from any given frame are effectively random (at least when taken relative to any other given frame from the same game output). That being the case, I'm now not convinced that random sampling would actually increase efficiency, as the number of samples necessary for accuracy would likely have to be just as high as with a structured or evenly distributed sampling method. Additionally, the accuracy of the result would not be static. In fact, a Gaussian structured sampling might be superior to random sampling because of these issues (and I imagine LuciferHawk does something like that in AL).

This could be tested, though, I suppose: LuciferHawk could create a testbed rewrite of AL that uses random sampling instead of structured sampling (I can't at this point) and compare the accuracy and performance of each version, if he wanted to take the time. But with my almost non-existent understanding of AL so far, that's beyond me, so it would be his call whether he wants to make the effort or not (at least at this point in time). And it would require multiple test runs of the random sampling method to get a sense of its minimum, maximum, and average accuracy, since the accuracy would vary.

Anyway, I think that's about it for now (I believe I covered everything so far).

One last thing: I'd like to apologize for my long posts. I'm not skilled at being concise/succinct. LuciferHawk has shown much better skill at that.

I'll try that. Thanks. But that reminds me... In some cases I've seen “[flatten]” in front of if/else flow control segments. What is that for? I've noticed that sometimes the compiler doesn't seem to like simple if/else, and prefers instead #if/#elif/#else/#endif, but in other places it handles basic if/else just fine. Does [flatten] allow one to use basic if/else where the compiler might otherwise want to see #if/etc.? Or is it for some other purpose?

#if/#elif/#else/#endif are for the preprocessor. They are not real language statements; they are evaluated before the compiler kicks in and can be used to change the source code according to custom text replacement definitions (#define).
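The same distinction exists in C++, which makes it easy to sketch (`USE_FAST_PATH`, `quality`, and `qualityAtRuntime` are hypothetical names). With `#if`, only one branch survives preprocessing; with a plain `if`, both branches are compiled and the choice is made when the code runs. That is also why `#if` cannot test a value that only exists at run time, such as a uniform.

```cpp
#include <cassert>

// Hypothetical switch, defined before compilation (e.g. in a settings header).
#define USE_FAST_PATH 1

// Preprocessor: only one return statement survives into the compiled code.
int quality() {
#if USE_FAST_PATH
    return 1;
#else
    return 3;
#endif
}

// Language-level if: both branches are compiled; the choice happens at run time.
int qualityAtRuntime(bool useFastPath) {
    assert(useFastPath || !useFastPath);  // any bool value is valid input
    if (useFastPath)
        return 1;
    else
        return 3;
}
```

Changing `USE_FAST_PATH` to 0 requires recompiling to take effect, whereas `qualityAtRuntime` can switch on every call.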

So if we set up something like “texture MyTex : COLOR;”, then what we would get is a texture that contains what was in the backbuffer at the time the definition/assignment is processed, right? And obviously, MyTex would then be something that could be manipulated or otherwise processed, or passed on to other textures as well.

It contains the backbuffer contents of the previous pass (= the game's backbuffer contents in the first pass). All textures marked with a semantic (so either "texture MyTex : COLOR;" or "texture MyTex : DEPTH;") are read-only. ReShade does not allow you to set them as a render target (for optimization reasons).

DrakePhoenix wrote:

Which then brings the next question to LuciferHawk: Can we then simply take such a texture and use abstraction to reduce it down quickly and efficiently so that brute sampling can be limited to only a handful of samples? Something along the lines of defining a texture (say MySmallTex) with 5x5 dimensions, set that as the render target for one of the passes, and subsequently pass the MyTex texture to it.

What I had in mind was the possibility that perhaps random sampling might be more efficient (better accuracy at the same performance level, or better performance at the same accuracy level, or whatever). But I think that LuciferHawk is correct (now that I better understand what he was saying) that the pixels from any given frame are effectively random (at least when taken relative to any other given frame from the same game output). That being the case, I'm now not convinced that random sampling would actually increase efficiency, as the number of samples necessary for accuracy would likely have to be just as high as with a structured or evenly distributed sampling method. Additionally, the accuracy of the result would not be static. In fact, a Gaussian structured sampling might be superior to random sampling because of these issues (and I imagine LuciferHawk does something like that in AL).

Just use a pseudo-random input that stays the same over frames. A simple noise texture will do.
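A CPU-side C++ sketch of that suggestion (hypothetical names; the `SamplePattern` table plays the role of the noise texture, and `std::mt19937` is only used once, to fill it). Because the positions are fixed, the same frame always yields the same average; a frame that shifts slightly can still yield a different one, which is the concern raised in the reply below.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// A fixed table of pseudo-random sample positions, generated once and then
// reused every frame: the CPU stand-in for a simple noise texture.
struct SamplePattern {
    std::vector<std::pair<std::size_t, std::size_t>> positions;

    SamplePattern(std::size_t count, std::size_t width, std::size_t height, unsigned seed) {
        assert(count > 0 && width > 0 && height > 0);
        std::mt19937 rng(seed);
        std::uniform_int_distribution<std::size_t> pickX(0, width - 1);
        std::uniform_int_distribution<std::size_t> pickY(0, height - 1);
        for (std::size_t i = 0; i < count; ++i) {
            std::size_t x = pickX(rng);
            std::size_t y = pickY(rng);
            positions.emplace_back(x, y);
        }
    }
};

// Average the frame at the pattern's positions. The positions never change
// between frames, so an identical frame always gives an identical result.
double patternAverage(const std::vector<float>& frame, std::size_t width,
                      const SamplePattern& pattern) {
    assert(width > 0 && !pattern.positions.empty());
    double sum = 0.0;
    for (const auto& p : pattern.positions)
        sum += frame[p.second * width + p.first];
    return sum / static_cast<double>(pattern.positions.size());
}
```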

I think you skipped my last post, crosire, but we could also move that to Skype?

crosire wrote:

Just use a pseudo-random input that stays the same over frames. A simple noise texture will do.

That would surely help to detach the random algorithm from the source image, but it would still produce random results for a similar/same image, even though a similar image should have a similar/same resulting average.

There is a good reason we use those algorithms for film grain noise: precisely because they produce different results for similar images.

Random samples can improve accuracy, but only if used properly for that purpose. Let's say we have 4 structured samples. To increase their accuracy on a "random" image, we would need, for example, 4x4 random samples around the 4 structured samples and calculate their average. That would increase accuracy, but it would also leave us with 4 structured vs. 16 random samples plus the random retrieval instructions, while having no advantage compared to 16 structured samples.

All that randomness gives you for this particular problem is more instruction cost and possibly artifacts in the end result due to random averages.
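To put numbers on the 16-vs-16 comparison, a CPU-side C++ sketch (hypothetical names; the smooth 64x64 gradient is an assumed test image that happens to favour an even grid, unlike the aligned-stripe case discussed earlier). It compares a 4x4 structured grid against 16 uniformly random samples, averaging the random estimator's error over many trials:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Horizontal gradient, w x h, values from 0 (left column) to 1 (right column).
std::vector<double> gradient(std::size_t w, std::size_t h) {
    assert(w > 1 && h > 0);
    std::vector<double> img(w * h);
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            img[y * w + x] = static_cast<double>(x) / static_cast<double>(w - 1);
    return img;
}

// n x n evenly spaced samples, one at the centre of each grid cell.
double structuredAverage(const std::vector<double>& img, std::size_t w, std::size_t h,
                         std::size_t n) {
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            sum += img[(j * h / n + h / (2 * n)) * w + (i * w / n + w / (2 * n))];
    return sum / static_cast<double>(n * n);
}

// `count` samples at uniformly random positions.
double randomAverage(const std::vector<double>& img, std::size_t w, std::size_t h,
                     std::size_t count, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pickX(0, w - 1), pickY(0, h - 1);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t x = pickX(rng);
        std::size_t y = pickY(rng);
        sum += img[y * w + x];
    }
    return sum / static_cast<double>(count);
}

// Mean absolute error of the random estimator over many independent trials.
double meanRandomError(const std::vector<double>& img, std::size_t w, std::size_t h,
                       std::size_t count, std::size_t trials, double truth) {
    std::mt19937 rng(1234);
    double err = 0.0;
    for (std::size_t t = 0; t < trials; ++t)
        err += std::fabs(randomAverage(img, w, h, count, rng) - truth);
    return err / static_cast<double>(trials);
}
```

Against the true average of 0.5, the 4x4 grid is off by about 0.008, while 16 random samples miss by roughly 0.06 on average.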

@DrakePhoenix
Yes, sampling only an abstraction or representative fraction is the general idea. I can tell you that I currently get about 98% accuracy, but I originally aimed for results that need to satisfy more than just the average. The sweet spot for the average might be lower than that. However, I already have an accurate result to compare against, so yeah, let's try even lower resolutions. (A 1x1 solution would have been perfect, for obvious reasons.)
On a side note regarding lower accuracy at lower resolutions: the low accuracy is the cause of flickering. An accurate result would not flicker. That of course does NOT mean that a non-flickering result IS accurate.
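A CPU-side C++ sketch of why sparse sampling flickers (hypothetical names; a single 64-pixel row with a 4-pixel bright object moving one pixel per frame is an assumed stand-in for a camera pan). The full average stays constant, but sampling every 8th pixel sometimes hits the object and sometimes misses it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One row of a frame: a bright object `size` pixels wide starting at `offset`,
// wrapping around at the right edge. Moving `offset` simulates a camera pan.
std::vector<double> frameRow(std::size_t width, std::size_t offset, std::size_t size) {
    assert(width > 0 && size <= width);
    std::vector<double> row(width, 0.0);
    for (std::size_t i = 0; i < size; ++i)
        row[(offset + i) % width] = 1.0;
    return row;
}

// Average of every `stride`-th pixel starting at pixel 0 (stride 1 = full average).
double sparseAverage(const std::vector<double>& row, std::size_t stride) {
    assert(stride > 0 && !row.empty());
    double sum = 0.0;
    std::size_t count = 0;
    for (std::size_t x = 0; x < row.size(); x += stride) {
        sum += row[x];
        ++count;
    }
    return sum / static_cast<double>(count);
}

// Largest minus smallest value over a sequence of per-frame results.
double flickerRange(const std::vector<double>& perFrame) {
    assert(!perFrame.empty());
    auto mm = std::minmax_element(perFrame.begin(), perFrame.end());
    return *mm.second - *mm.first;
}
```

Over eight frames the full average stays at 0.0625, while the stride-8 estimate jumps between 0 and 0.125: an inaccurate estimate that also changes from frame to frame.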

I want to thank you both for all of your help on this issue. And I particularly want to thank you, LuciferHawk, for your personal interest in it (you have your own reasons for that, I know, but I still appreciate it). I have gone to other forums for help with various things in the past (troubleshooting windows or a particular piece of software or for help with a game that I'm new to, etc.), and I regularly get responses from people who have clearly not read my posts, and/or are entirely condescending. I believe this is the first time that I have immediately gotten good and very real help right from the start and with no apparent condescension. I want you to know I appreciate that very much.

@crosire: Thanks for the clarification on how #if vs. if works, as well as the links. I'll read up on them some more later on. The partial code example you posted is pretty much exactly what I had in mind and was talking about, so I take that as confirmation that something like that might work. I'll try it out when I get the chance.

@LuciferHawk: What I'd like to do at this point is to take your custom.h testbed code, and use one abstraction resolution in the top third, and a different resolution in the middle third, to try and compare the two. This won't test the performance difference between the two, since it will be running both simultaneously and the performance loss would be the sum loss from both methods, but it will allow for a rough accuracy comparison.

I will also modify crosire's code to include the [loop] attribute to see if it will run beyond 255 iterations (I haven't tried that yet). If that works, I can use that as a basis for the middle third and move one of the abstraction resolutions to the bottom third (and use a left vs. right split screen to keep the original image in place as well). That will let me test accuracy against a fully true average. My system will probably not handle the performance well at all, but if I get at least 2-5 fps I should be able to compare accuracy, even if I don't get a good sense of flickering; then I can test flickering and performance for each resolution on its own. That should help me narrow down a good abstraction resolution.

In the end, my own goal would be to go with the minimum acceptable accuracy, because presumably that will allow for higher performance, which is important on my system. Laptops just aren't good gaming machines, even with dual graphics (sometimes especially with dual graphics). For my next system, I'm going back to a full desktop with a minimum of 2 GPU cards. It's possible these days to get "laptops" that include full GPU cards (occasionally even two SLI- or Crossfire-linked cards), but given the cost vs. performance I don't see the point unless mobility is a crucial investment consideration (which for me it isn't right now). Anyway...

Lastly, I mentioned before that I may be getting sick... Now I'm fairly certain that I am getting sick, as I slept for about 12 hours last night and still feel a bit run down this morning. So depending on how bad that all gets, I may not get much done in the way of additional testing etc. for another week or two. We'll see, I guess. But if I don't post anything new for a little while, please don't take that to mean I've lost interest.

Just letting you know that I reduced the resolution of my bright detection algorithm and it works to a certain extent (still almost as accurate: down from 98% to about 90%) and gives a huge performance increase (also for compilation). I think this was a very fruitful discussion, mate.

Update: I tailored it more toward calculating the average and adjusted it to be slightly slower (still a big performance increase compared to the original) but even more accurate than my original algorithm, at least for the average.

I will try to make the resulting values accessible in the Framework so it only has to be calculated once.

I tested out the new method, but the results rather surprised me. I set it up so that the top third would use one level of abstraction, the bottom third another level of abstraction, and the middle third a structured, evenly distributed sampling. I ran it in Guild Wars 2 and found that both the top and bottom still flickered, but flickered differently from one another. The middle third was constantly solid black, with no changes.

This may have all been a result of my testing code itself. Following is the code I used:

The variables are set in the listed settings file. I can provide the settings and undef files if necessary, but they're very standard.

I'm not sure what I've done wrong with the code there. Perhaps it simply cannot work in so simplistic a fashion, or perhaps my code is not correct. My first thought is that it is trying to read and write to and from the same buffer. I noticed in the AL code that there are in textures (input) and out textures (output), and something similar may be necessary here. I'm not sure though. If you know what I'm doing wrong here, please let me know.

@LuciferHawk: I'm glad you've found the discussion useful. I look forward to the changes being released once you've finalized it. I've found the discussion quite useful as well. Although I still haven't gotten my shader to work the way I want it to, I have learned a lot about both the HLSL language and ReShade during this process.

That's all for now. I may be away from this discussion for a few days at a time for a little while (life stuff), but again I appreciate your help, both of you.

However, @LuciferHawk, I've been looking through your code for Ambient Light, and I'd like to make sure I'm understanding it correctly...

Basically, you have multiple passes. The first pass takes the buffer data, processes it (DetectHigh), and then, using abstraction, crushes it down to 1/16th of the original size. You then run 12 sets of alternating passes in which you blur the image horizontally, then vertically, so that the combined result is a partially abstracted, then highly blurred image based on the original. All other AL effects are then processed after you've reached that blurred image point.

Is that essentially correct?
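For readers following along, a generic CPU-side C++ sketch of why alternating horizontal and vertical passes work (not AL's actual code; the 3-tap box kernel, clamp-to-edge addressing, and all names are assumptions). A horizontal pass followed by a vertical pass gives the same result as a direct 3x3 box blur, using 6 reads per pixel instead of 9:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Image = std::vector<double>;  // row-major, w * h values

// Clamp-to-edge read, like a sampler using clamp addressing.
double texel(const Image& img, std::size_t w, std::size_t h, long x, long y) {
    x = std::max(0L, std::min(x, static_cast<long>(w) - 1));
    y = std::max(0L, std::min(y, static_cast<long>(h) - 1));
    return img[static_cast<std::size_t>(y) * w + static_cast<std::size_t>(x)];
}

// One 3-tap box blur pass along x (dx = 1, dy = 0) or y (dx = 0, dy = 1).
// It reads src and writes a new image, as a separate render pass would.
Image blurPass(const Image& src, std::size_t w, std::size_t h, long dx, long dy) {
    assert(src.size() == w * h);
    Image dst(w * h);
    for (long y = 0; y < static_cast<long>(h); ++y)
        for (long x = 0; x < static_cast<long>(w); ++x)
            dst[static_cast<std::size_t>(y) * w + static_cast<std::size_t>(x)] =
                (texel(src, w, h, x - dx, y - dy) + texel(src, w, h, x, y) +
                 texel(src, w, h, x + dx, y + dy)) / 3.0;
    return dst;
}

// Direct 3x3 box blur, for comparison with the two separable passes.
Image blur3x3(const Image& src, std::size_t w, std::size_t h) {
    assert(src.size() == w * h);
    Image dst(w * h);
    for (long y = 0; y < static_cast<long>(h); ++y)
        for (long x = 0; x < static_cast<long>(w); ++x) {
            double sum = 0.0;
            for (long oy = -1; oy <= 1; ++oy)
                for (long ox = -1; ox <= 1; ++ox)
                    sum += texel(src, w, h, x + ox, y + oy);
            dst[static_cast<std::size_t>(y) * w + static_cast<std::size_t>(x)] = sum / 9.0;
        }
    return dst;
}

// Largest per-pixel difference between two images of the same size.
double maxAbsDiff(const Image& a, const Image& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}
```

Each additional horizontal/vertical pair widens the effective kernel, and repeating box passes makes the combined kernel approach a Gaussian shape.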

Also, I'm not sure I understand what the PS_AL_DetectHigh function does. I know that it is making a determination for the value of highR and then returning that value. But I'm not clear on what all the processing is for, other than to create the initial content of the alInTex abstracted texture. I'd appreciate a basic description of the function's job/purpose.

And lastly, you mentioned that you've now re-worked the AL code to be more optimized based on what's been said during this discussion. I'd love to see the revised code if you're willing to provide it prior to the next release update.

Just letting you know that I reduced the resolution of my bright detection algorithm and it works to a certain extent (still almost as accurate: down from 98% to about 90%) and gives a huge performance increase (also for compilation). I think this was a very fruitful discussion, mate.

Update: I tailored it more toward calculating the average and adjusted it to be slightly slower (still a big performance increase compared to the original) but even more accurate than my original algorithm, at least for the average.

I will try to make the resulting values accessible in the Framework so it only has to be calculated once.

It's been a while since I worked on this at all, but I was looking through the code for AL from Framework 1.1 and I noticed that you're still using HQ as a single on/off option. It may not be reasonable/worthwhile for AL, but what about changing it to a generalized quality setting with 3 options: 1 uses LowQ, 2 uses normal quality, and 3 uses HighQ?

Sorry for the late reply, Drake.
I am thinking about removing the HQ option altogether. It was only there for accurate results in screenshots anyway, but with the latest changes there remain no significant differences in terms of quality.

I still have the global release of the detection results pending on my to-do list. Sorry for that.

I came across this topic while searching for an averaging shader, and unfortunately no solution is posted here, although judging by the discussion one was found. I'd really appreciate it if someone would post working, performance-friendly averaging code that doesn't flicker. Thanks.