Thorsten wrote in Mon Aug 14, 2017 5:04 pm: I had the impression that drivers don't actually execute the code as written but run aggressive optimization at shader compile time - and that sometimes does weird things.

I have no good advice on how to isolate this further to give meaningful feedback to driver developers. The practical solution seems easy enough - don't use the agriculture effect while the driver exhibits the problem.

I agree - I have many more issues with the Mac drivers than with Linux. They usually go away when Apple updates the driver, and sometimes come back in a different form at a later date. Usually I just find the combination of slider settings that makes the problem go away.

Hooray wrote in Mon Aug 14, 2017 5:48 pm: Note that my response wasn't intended to imply that I disagree with anything you said; it's just that most GLSL drivers these days are based on the same compiler back-end (LLVM), so there are various ways to look at the generated code, and even to tell the optimizer to be less aggressive. Furthermore, a GLSL-enabled debug environment will provide much more GLSL-related information than we'd typically see in fgfs or osgviewer, even with all debugging flags enabled.

As far as I'm aware, while the other open source Gallium-based drivers (Nouveau, Radeon, LLVMpipe etc.) use the LLVM compiler, the Intel driver has its own GLSL compiler, as Intel had issues getting the LLVM compiler to work with their driver.

abassign wrote in Mon Aug 14, 2017 7:53 pm: I first described various tests, among them the one that replaces the value matrix with a constant: everything works normally if the value is in the range 0 to 1. If the value is much greater than 10, the defect colour is close to dark brown; if the constant value is less than zero, the defect appears as a cyan-coloured image.

can't a clamp be put in place so that a value less than zero is clamped to zero? is there a bug on the other end of the scale, too, that needs to be clamped at a certain high value?
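To make the clamp idea concrete, here is a minimal sketch in Python that models what GLSL's `clamp()` does (it is defined as `min(max(x, lo), hi)`). This is not the shader code itself, and the test constants are made up to resemble the ones tried in this thread.

```python
def clamp(x, lo, hi):
    """Model of GLSL clamp(): min(max(x, lo), hi)."""
    return min(max(x, lo), hi)

# Hypothetical constants: below zero, in range, marginally above 1, far above 1.
for value in (-0.5, 0.3, 1.000001, 12.0):
    print(value, "->", clamp(value, 0.0, 1.0))
```

A clamp like this pins both ends of the scale at once, but it only helps if the out-of-range value really is the cause, and it would have to be placed before the operation that misbehaves.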

"You get more air close to the ground," said Angalo. "I read that in a book. You get lots of air low down, and not much when you go up.""Why not?" said Gurder."Dunno. It's frightened of heights, I guess."

Personally I suspect there is a NaN generated somewhere, or something similar. Could you test what happens if you replace dist with a fixed value?

That much seems straightforward, but the NaN isn't generated by the algorithm as you can see it when you open the file (otherwise it'd be there for all drivers), it's generated by the algorithm as it runs after it has been optimized and compiled by the particular driver.
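For readers wondering why a single NaN would be enough to ruin a pixel, here is a small Python model of the final blend for one colour channel. Where the NaN actually comes from on the affected driver is not known; the `inf - inf` below is just one generic way to produce one, and the `shade` helper and its input values are hypothetical.

```python
import math

# One generic way a NaN can appear without any explicit error: inf - inf.
nan = float("inf") - float("inf")

def shade(color, texel, specular):
    """Single-channel model of the blend fragColor = color * texel + specular."""
    return color * texel + specular

good = shade(0.8, 0.5, 0.1)
bad = shade(0.8, nan, 0.1)   # texel poisoned by a NaN somewhere upstream

print(good, bad)
print(math.isnan(bad))       # the NaN survives the multiply and the add
print(bad == bad)            # a NaN does not even compare equal to itself
```

Once a NaN is in the chain, every later arithmetic step keeps it, so the visible defect can show up far from the operation that produced it.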

Also again - the same line runs perfectly fine in a different shader (the block it sits in is identical).

Anyway - I believe enrogue's test with software rendering in addition to what I've known before from my test setups pretty much concludes the issue.

If anyone still wants to test variants of distances or smoothstep functions, you're welcome to; I don't think it'll go anywhere, and I'll remove myself from further such discussions.

Thorsten wrote in Tue Aug 15, 2017 5:11 am: That much seems straightforward, but the NaN isn't generated by the algorithm as you can see it when you open the file (otherwise it'd be there for all drivers), it's generated by the algorithm as it runs after it has been optimized and compiled by the particular driver.

After thinking about it again I realized it might also be a clamping problem. For some operations a value below zero or greater than 1 could cause these kinds of problems, even if it is just a marginally small fraction outside the 0.0 .. 1.0 band, like 1.000001. This could be handled differently between drivers if one clamps the value by itself and another relies on the inputs being valid.
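A sketch of how such a marginal overshoot could matter, modelled in Python. GLSL leaves `sqrt(x)` undefined for `x < 0`, and NaN is a common outcome in practice, which is what the helper below assumes. The functions `glsl_like_sqrt`, `unsafe` and `safe` are made up for this illustration and do not come from the agriculture shader.

```python
import math

def glsl_like_sqrt(x):
    """Model of a GPU sqrt: negative input yields NaN instead of raising.
    (Assumption: GLSL only says the result is undefined for x < 0.)"""
    return math.sqrt(x) if x >= 0.0 else float("nan")

def unsafe(c):
    # Relies on the caller to guarantee that c is within [0, 1].
    return glsl_like_sqrt(1.0 - c)

def safe(c):
    # Clamps c into [0, 1] first, so the sqrt argument cannot go negative.
    c = min(max(c, 0.0), 1.0)
    return glsl_like_sqrt(1.0 - c)

c = 1.000001  # marginally outside the valid band
print(unsafe(c), safe(c))
```

If one driver effectively behaves like `safe` and another like `unsafe`, the same source would render fine on the first and break on the second.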

erik wrote in Tue Aug 15, 2017 7:30 am: For some operations a value below zero or greater than 1 could cause these kinds of problems, even if it is just a marginally small fraction outside the 0.0 .. 1.0 band, like 1.000001. This could be handled differently between drivers if one clamps the value by itself and another relies on the inputs being valid. That said, it could still be just a driver bug. Erik

In my programming experience, saying it is a "driver bug" is a way of saying "I do not know, so I will not do anything"! In short, it is honestly not a great answer and it does not solve the problem. In this case, it seems odd to me that it would be a trivial driver problem, because even when I replace the "smoothstep()" call with the equivalent function, I get exactly the same problem!
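For reference, this is what "the equivalent function" looks like when written out, following the definition in the GLSL reference and modelled here in Python rather than GLSL. The edges 2000.0 and 5000.0 are taken from the expression quoted further down in this post; the `distance_fade` wrapper is only a name chosen for this sketch.

```python
def smoothstep(edge0, edge1, x):
    """Model of GLSL smoothstep():
    t = clamp((x - edge0) / (edge1 - edge0), 0, 1); result = t*t*(3 - 2*t).
    GLSL leaves the result undefined if edge0 >= edge1."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def distance_fade(dist):
    # The factor discussed in this thread: 1.0 - smoothstep(2000.0, 5000.0, dist)
    return 1.0 - smoothstep(2000.0, 5000.0, dist)

for dist in (0.0, 2000.0, 3500.0, 5000.0, 20000.0):
    print(dist, distance_fade(dist))
```

Written this way, the factor stays inside 0 .. 1 for any finite dist, because the clamp is part of the definition. That the hand-written replacement shows the same defect is therefore not very surprising, and it does not by itself tell us where the bad value comes from.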

This fact should make us think differently and look for the problem elsewhere. For example, if I replace:

You see that when the variable is negative you get the error (cyan colour and flickering black spots); otherwise, agriculture seems to be correct. This obviously makes me think of some problem related to the length(relPos) function, which determines the length of a vector named "relPos". Could this function, with other drivers, correctly handle something that the Intel driver for Linux (?!) does not handle in the same way, or is this the correct function to get the distance?

Reading up on distance determination, I found the "distance()" function, which finds the distance between two points. I tried it this way:

It works the same way as the function: "float dist = length(relPos);" so the problem is elsewhere!
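That the two give the same result is expected, since GLSL defines `distance(p0, p1)` as `length(p0 - p1)`. A small Python model of both, with positions invented purely for illustration:

```python
import math

def length(v):
    """Model of GLSL length(): Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

def distance(p0, p1):
    """Model of GLSL distance(): defined as length(p0 - p1)."""
    return length(tuple(a - b for a, b in zip(p0, p1)))

# Hypothetical positions, for illustration only.
eye_pos = (0.0, 0.0, 0.0)
vertex_pos = (300.0, 400.0, 0.0)
rel_pos = tuple(a - b for a, b in zip(vertex_pos, eye_pos))

print(length(rel_pos), distance(vertex_pos, eye_pos))
```

So swapping one call for the other should not change anything, which matches what I observed.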

My impression is that the distance value varies too fast for the next interpolation step, the "Transition", to handle. In fact, if I take the "Transition" shader option out by setting it to a value of less than 5 (it seems to me that it works by simulating a kind of diffusion or fog effect ...), everything is back in place!

It reduces the variation frequency of the "texel" variable, which then becomes fragColor (via the fragColor = color * texel + specular; statement).

This explains why the problem becomes smaller at high altitudes and at lower angles, in situations where the frequency of distance variation tends to shrink.

How to reduce or eliminate the problem

1. I think it's acceptable (although some artifacts may still be visible) to remove "* (1.0 - smoothstep(2000.0, 5000.0, dist))"; this is the change that works best and does not seem to give noticeable artifacts. In this case, the local variation of the "texel" is rather contained and is not re-evaluated by the "Transition" shader process.

2. Much more radical is to replace "agriculture-ALS.frag" with another program. I used "terrain-ALS-detailed.frag", renaming it to "agriculture-ALS.frag". In this case, everything works much as with "agriculture-ALS.frag", but some features are not used.

I personally prefer the first solution, but maybe in time, working on "transition.frag" could also solve the problem for less powerful cards such as Intel.

I strongly suggest that you don't consider 'random deletion of lines' a valid strategy to create rendering code in the future. (...) We have quality settings for this very reason - if your setup can't handle something, you can dial quality a notch down.

The story of Maxwell's equations does not tell how he found them ... we see them in Oliver Heaviside's form, so we forget that Maxwell used the Laplace method to construct his theory. The Laplace method is based precisely on determining the characteristics of a mechanism hidden inside a closed box by observing what happens when we move, block, etc. the elements we can access from the outside. So I had to figure out where the problem was in this case ... otherwise I would be saying, superficially, that it is a driver defect!

Obviously I also took the problem as an OpenGL exercise, but at the end I learned a very important lesson:

Never say that a driver has a defect; it is more likely that the program using it is defective in a certain operating environment.

The reason behind this statement is simple: a driver is continually updated and improved but essentially does not change in functionality, whereas a program is always new, and therefore the development time per line is much lower than that of a driver ... So it's a good starting point to think that the defect is in the program and not in the driver. Then, of course, it may not be so, but it is still much more improbable.

In our case it is clear that I could understand one thing: it is very likely that the defect is not in "agriculture-ALS.frag" but in "transition *", and the problem is not an incorrect code, but something related to high frequency of variation of the distance that undermines the calculation algorithm. Mine is just a hypothesis; obviously I could be wrong, but if it were true I would not be surprised.

However, as long as I have to work with the current HW configuration and the problem is not resolved, I will make that little change that allows me to fly with pleasure.


i posted a link to an article some time back... this article is written by a video driver writer... in that article, it is stated that many of these companies take a lot of games and such and look at what they do... so many of them are doing things so wrong that the driver actually sucks up the game's driver code and modifies it in memory to correct the problems...

is that happening with flightgear's code and any video drivers? i don't have the first clue but knowing about it being done leads me to believe that it may very well be being done here... or maybe this is a time when it isn't and it could be... i can try to find the link again... john carmack, of doom fame, is the one i originally saw post the link... on twitter, i think... the article/blog made for some very interesting reading

(Sort of forced to respond in case anyone else reads this in the future and falls for it).

So it's a good starting point to think that the defect is in the program and not in the driver. Then, of course, it may not be so, but it is still much more improbable.

If you know a line of code executes fine for all but one driver, and the (literally) same block of code executes fine on the same driver in a different effect, elementary logic would suggest that it's pretty unlikely that the problem is with the code.

Suspecting a code problem with that kind of information has nothing to do with sane reasoning.

In our case it is clear that I could understand one thing: it is very likely that the defect is not in "agriculture-ALS.frag" but in "transition *",

To those of us who actually work with GLSL it is actually far from 'clear' how a bit of code that doesn't even run can cause you problems, but, well, the standards of reasoning seem to differ here.

and the problem is not an incorrect code, but something related to high frequency of variation of the distance that undermines the calculation algorithm.

A distance is a distance is a distance - it changes by one meter for every meter an object is further away. It's a smooth, perfectly linear function without any wiggles and hence the word 'frequency' can't be applied - distance doesn't oscillate with distance. So the sentence doesn't even conceptually make any sort of sense.

Mine is just a hypothesis; obviously I could be wrong, but if it were true I would not be surprised.

I on the other hand tend to adhere to the old-fashioned view that code needs to be compiled and run to cause any problems, and I would be completely and utterly surprised if problems can be traced to code that's never compiled or loaded into the GPU.

Thorsten wrote in Wed Aug 16, 2017 5:38 am: *sigh* (Sort of forced to respond in case anyone else reads this in the future and falls for it).

Do you know how important it is to try to be humble?!

Perhaps that is the first lesson to learn before writing a single line of code, especially if you work with complex systems such as GPUs rich in "mysterious" micro-code, with a parallelism difficult to understand for human minds. Certainly it is an exciting challenge, which makes us believe we can be powerful, but full of deadly traps.

If we were all humble, we would likely organize our work better. The project could set its goals so as to make better use of the time of all of us who are dedicated to it.

So many times I ask myself whether working this way really makes sense. For the graphics, I would love to start from a top-level product that is stable and already maintained by a community of developers. For the FDM code, I would focus only on JSBSim, which I find very good, and would start changing it (the project is currently stalled and maybe it's time for a fork) to incorporate some excellent ideas from YASim.

I would work on bindings to other languages such as Python (which, as it happens, is the language of the best FGFS software of recent years ... osm2city), Java etc., so as to widen the circle of those who want to develop their planes and new features.

And much more ...

But in the end everything can be done only if all of us are absolutely humble!

And now, as usual, I ask you, with humility, to post in this thread only on topics relevant to the issue discussed here.

So far, the need to be humble sadly did not influence the code I was trying to write - on the other hand, I found that debugging responds very well to logical reasoning.

Are you seriously suggesting that if you write a piece of nonsense, like the claim that code that's not part of the effect causes the issues you see, nobody should point out that this is plain wrong because we need to be humble?

This is a technical environment, code doesn't care for your attitude or beliefs, we're trying to find out what's wrong not trying to live in a monastery.

For the graphics, I would love to start from a top-level product that is stable and already maintained by a community of developers.

So would I - when are you coming up with the money to pay them all to work full-time for FG?

Richard wrote in Tue Aug 15, 2017 11:36 pm: Obviously I also took the problem as an OpenGL exercise, but at the end I learned a very important lesson: In my experience the quality of drivers varies, there may be implementation differences - or there could be a problem in the fg shader but unless the bug can be proved[1] it remains a theory. I've spent days trying to figure out why light point sprites don't work with my Radeon - it's frustrating but as yet I'm still working on it. [1] Usually the best proof of a problem is to find, explain and fix the problem.

What you say is true, and I have run into the same problems, but I have often noticed that what we consider a defect in the software we have to interface with is actually a defect born of the way we use that software. By definition all the software is incomplete, "Gödel's incompleteness theorems" has demonstrated it. So nothing guarantees that the way we use a driver is correct, nor that the driver will tell us explicitly when it is not.

If I look at the Agriculture* code, I note that it introduces a particular function, which combines the distance with a particular effect. I must say that in the past, when I had an NVIDIA card (now defunct) in my PC, I had already noticed something strange in the agricultural areas, such as an excessive frequency of scene changes that gave rise to artifacts. Looking quickly through the other frags (but I would not be surprised if I am saying something wrong ...), I did not find code that correlates a property with the distance and is then re-processed by "Transition". Therefore Agriculture* is the result of an experiment that is not compatible with all OpenGL implementations.

The Intel GPU is very common and FGFS is an open source application for which Linux is essential, so I believe it is right to correct the problem.

By definition all the software is incomplete, "Gödel's incompleteness theorems" has demonstrated it.

No, it hasn't. Do you actually know what it says? A GLSL shader certainly does not meet the criteria of the theorem (which is concerned with self-referencing systems).

The Intel GPU is very common and FGFS is an open source application for which Linux is essential, so I believe it is right to correct the problem.

So do I - and since the problem has been identified to be very likely with the driver because it occurs only with that particular driver and can be made to go away by changing driver (enrogue has done some very good work to confirm that idea), that's where the problem should be fixed.

While I do believe the evidence suggests this is best addressed driver-side, if anyone wants to give it a try, it may (or may not) be possible to find a workaround.

Provided the code can be re-structured (clamping added, order of operations changed...) so that it

- doesn't dramatically degrade performance
- gives the same graphics output as the present code
- does not create new problems with a different driver

I'll be happy to commit such a patch. Mutilating or otherwise degrading the code won't be considered a fix for obvious reasons (and we may picture certain people who suggest such mutilations argue in a different thread that we should under no circumstances make existing working code worse to fix problems of exotic graphics drivers ...)

Not to raise false hopes: from personal experience, the chances of finding such a solution are perhaps 20% if you know very well what you're doing - the usual case was that I could not find any solution.

(And, to spell out the obvious - I won't try it not because I'm mean but because I can't - I don't have a system with an Intel GPU, so I can't test what helps or not)