But I don't get it: what is the purpose of fixing conditional rendering when occlusion queries are so uselessly slow?

Because you're not supposed to read the result back. That is the entire point of conditional rendering: the GPU decides for itself whether to draw, so you never have to induce a CPU/GPU synchronization, and the GPU is free to run far out of sync with the CPU.

What Graham seems to be saying is that AMD's drivers like running very out of sync with the CPU, regardless of the load. That's their prerogative. Thus, inducing a synchronization is something that should be avoided. Conditional rendering is a means of avoiding it.

In short, don't read back occlusion query results; use conditional rendering to decide whether to render something.

03-04-2013, 11:10 PM

Nowhere-01

OK, I'll wait until it's fixed and see whether it actually works _effectively_. The reason I didn't understand it myself is that I've never seen conditional rendering work effectively. With modern NVIDIA GPUs, the occlusion query result is ready in 0-1 ms, so I couldn't see any benefit. And on AMD GPUs, where the occlusion query result is heavily delayed, I've never seen it work at all. I assumed it would always render the objects, because the query result isn't ready by the time I try to render. With a 100 ms delay, I'm not sure what the expected behavior is.

So really, should I expect conditional rendering to respond significantly faster? And is the delay of GL_QUERY_RESULT_AVAILABLE caused solely by synchronizing with the CPU? In that case, I see more point in this extension.

03-05-2013, 01:51 AM

Alfonse Reinheart

Quote:

with modern NVIDIA GPUs, the occlusion query result is ready in 0-1 ms

But that number comes from poor profiling, as you readily admit: you're not putting the scene under load. How long a query takes when you're not rendering much is irrelevant; what you need to know is how long a query will take when you render actual scenes. That's why it's always important to profile using data that is as close to the real thing as possible.

I'm not saying that in a real scene, NVIDIA's response time will jump to 100ms. But odds are good it's going to be rather more than 0-1ms.

03-05-2013, 02:22 AM

Nowhere-01

Quote:

Originally Posted by Nowhere-01

So really, should I expect conditional rendering to respond significantly faster? And is the delay of GL_QUERY_RESULT_AVAILABLE caused solely by synchronizing with the CPU? In that case, I see more point in this extension.

I'd like you to confirm or deny those statements directly.

Quote:

Originally Posted by Alfonse Reinheart

I'm not saying that in a real scene, NVIDIA's response time will jump to 100ms. But odds are good it's going to be rather more than 0-1ms.

At this point I can't reproduce a "real scene" because of the state my editor is currently in, but I did experiment and ran a test with 300 objects scattered around the scene. Most of them were 8k triangles, each divided into 4 surfaces (meaning 4 glDrawElements calls per object). Most of them were in the frustum, occluding each other and being occluded by bigger objects. The occlusion framebuffer was 256x256, depth-only. With a GeForce 560 Ti it worked nicely: the occlusion queries took about 10-12 ms to finish (with glFinish immediately after rendering the occlusion pass). I find that satisfying, and expected from a modern GPU. The HD 6670 took 25 ms with glFinish for only about 10-12 objects in the frustum, and 100+ ms the normal way, if I just wait until the query is ready, because in that case it takes several frames.

03-05-2013, 02:40 AM

Alfonse Reinheart

Quote:

occlusion query took about 10-12 ms to finish. I find that satisfying, and expected from a modern GPU.

For whatever application you're using, it's OK for you to be sitting there waiting on the query, doing nothing else for 12 ms? You're basically stalling your CPU for about three quarters of a 60 FPS frame (12 ms out of a roughly 16.7 ms budget). If a 12 ms CPU stall (not to mention the GPU bubble you're creating by not feeding it rendering data on time) is acceptable, are you sure you wouldn't get faster performance by just rendering the object without the occlusion test?
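To put the 12 ms figure in perspective, here is a small C helper that computes what share of a frame's time budget a blocking wait consumes. The function names are hypothetical and it is plain arithmetic, not a profiling tool: at 60 FPS the budget is about 16.7 ms, so a 12 ms stall eats roughly 72% of it.

```c
#include <assert.h>

/* Time available for one frame, in milliseconds. */
static double frame_budget_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Share of one frame's budget, rounded to a whole percent, that the CPU
 * loses while it blocks for stall_ms waiting on a query result. */
static int stall_percent_of_frame(double stall_ms, double fps)
{
    assert(stall_ms >= 0.0);
    double fraction = stall_ms / frame_budget_ms(fps);
    return (int)(fraction * 100.0 + 0.5);
}
```

For example, `stall_percent_of_frame(12.0, 60.0)` gives 72, and the same stall at 30 FPS gives 36.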

03-05-2013, 03:01 AM

Nowhere-01

Quote:

Originally Posted by Alfonse Reinheart

For whatever application you're using, it's OK for you to be sitting there waiting on the query, doing nothing else for 12ms?

I never said I plan to use it this way. That was a deliberately extreme test to see how it performs with a lot of big objects; I don't plan to have that many objects in the frustum and test all of them all the time. And no, I'm not using glFinish anywhere in the final code; it was just a lazy way to check how fast the occlusion query result becomes available. Normally I render to the occlusion query, do other work, and then ask for the query results. Without glFinish, in this test on NVIDIA, the occlusion query is still ready by the next frame, while on AMD even about 10-12 objects take 3-4 frames to be ready.
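The non-blocking pattern described above (issue the query, do other work, read the result in a later frame) is often organized as a small per-object ring of query slots. The C sketch below is a toy model under stated assumptions: `QueryRing`, `ring_issue`, `ring_poll` and `QUERY_RING_SIZE` are hypothetical names, and the `ready_frame` field stands in for what a real renderer would learn by calling `glGetQueryObjectuiv(id, GL_QUERY_RESULT_AVAILABLE, &avail)` on a real query object. No GL calls are made.

```c
#include <assert.h>
#include <stdbool.h>

#define QUERY_RING_SIZE 4 /* hypothetical cap on queries in flight per object */

/* One in-flight occlusion query. A real renderer would store a GL query
 * object name here; in this toy model ready_frame simulates the frame at
 * which GL_QUERY_RESULT_AVAILABLE would first read as true. */
typedef struct {
    bool pending;
    int  ready_frame;
    bool visible; /* simulated result: did any samples pass? */
} QuerySlot;

typedef struct {
    QuerySlot slots[QUERY_RING_SIZE];
    int  head;         /* next slot to issue into */
    int  tail;         /* oldest slot that may still be pending */
    bool last_visible; /* most recent result actually read back */
} QueryRing;

static void ring_init(QueryRing *r)
{
    *r = (QueryRing){0};
    r->last_visible = true; /* draw until a result says otherwise */
}

/* Issue a new query unless every slot is still in flight. */
static bool ring_issue(QueryRing *r, int ready_frame, bool visible)
{
    QuerySlot *s = &r->slots[r->head];
    if (s->pending)
        return false; /* ring full: skip this frame's test, never block */
    s->pending = true;
    s->ready_frame = ready_frame;
    s->visible = visible;
    r->head = (r->head + 1) % QUERY_RING_SIZE;
    return true;
}

/* Non-blocking poll at the start of frame `now`: consume every result
 * that is ready, oldest first, and stop at the first one that is not. */
static void ring_poll(QueryRing *r, int now)
{
    while (r->slots[r->tail].pending && r->slots[r->tail].ready_frame <= now) {
        r->last_visible = r->slots[r->tail].visible;
        r->slots[r->tail].pending = false;
        r->tail = (r->tail + 1) % QUERY_RING_SIZE;
    }
}
```

The CPU never waits: until a result arrives, the object keeps its last known visibility. When results arrive several frames late, that stale visibility is what shows up as popping while the camera moves.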

But I'd be glad if conditional rendering were much faster (by faster, I mean able to use the occlusion query result much earlier than I can with GL_QUERY_RESULT). I just don't know what to expect from it; you didn't answer my questions about how it should perform, or how you would expect it to perform on an AMD card once they fix it. Do you expect speed similar to NVIDIA's implementation?

03-05-2013, 01:23 PM

aqnuep

What you are saying is nonsense. Occlusion queries returning their values later on one card than on another does not necessarily have anything to do with the performance of the graphics processor. Maybe one driver queues up more work on the CPU before submitting it to the GPU.

What is this 100+ ms? I suppose it's CPU time. Well, guess what: you should use timer queries to figure out how much occlusion queries actually cost, as those measure GPU time, not CPU time. I'd bet they don't have any visible performance cost.

You are confusing the speed of an operation with the latency of an operation. These are two different things. Lower latency doesn't mean that the GPU is faster. Not to mention that, at least in D3D, it is very common for the GPU to lag two or three frames behind the CPU, in which case the occlusion query would also finish two or three frames late. Once again, this has nothing to do with speed, only with latency.
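The distinction between latency and speed can be illustrated with a toy model, sketched in C below. It assumes, purely for illustration, that the GPU trails the CPU by a fixed number of whole frames and that the CPU polls once at the start of each frame; real drivers promise neither. The function names are hypothetical. Note that the GPU cost of the query never appears in the model, yet the delay grows with queue depth.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy assumption: work the CPU submits in frame f is finished by the GPU
 * while the CPU is building frame f + gpu_lag_frames. So at the start of
 * cpu_frame, the GPU has finished every frame up to
 * cpu_frame - 1 - gpu_lag_frames. */
static bool result_available(int issued_frame, int cpu_frame, int gpu_lag_frames)
{
    return issued_frame <= cpu_frame - 1 - gpu_lag_frames;
}

/* Poll once per frame, starting with the frame after the query was issued,
 * and return how many frames later the result first shows up. */
static int frames_until_result(int issued_frame, int gpu_lag_frames)
{
    assert(gpu_lag_frames >= 0);
    int frame = issued_frame + 1;
    while (!result_available(issued_frame, frame, gpu_lag_frames))
        ++frame;
    return frame - issued_frame;
}
```

With no lag the result is readable one frame later (like the NVIDIA test described earlier in the thread); with a two-frame lag it takes three frames, in the range of the 3-4 frames reported for the AMD card.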

03-05-2013, 05:29 PM

Nowhere-01

The question I've been repeating for about the last three posts is:
should I expect conditional rendering to be able to access the occlusion query result significantly earlier than I do with glGetQueryObject? In the last post I specified: “if conditional rendering were much faster (by faster, I mean able to use the occlusion query result much earlier than I can with GL_QUERY_RESULT)”. Is that somehow ambiguous?

Why do I ask? Because I've never seen how it performs on an AMD GPU (it's confirmed to be broken currently), and I'm not sure what to expect given such latency in occlusion queries.

I'm not confusing anything; I've already learned that it doesn't depend much on raw performance directly. I just compared two GPUs to show how huge the difference is. Is that somehow nonsensical? My older post may have been less sensible because I wasn't sure, but I asked and got corrected.

Quote:

Originally Posted by aqnuep

What is this 100+ ms? I suppose it's CPU time. Well, guess what: you should use timer queries to figure out how much occlusion queries actually cost, as those measure GPU time, not CPU time. I'd bet they don't have any visible performance cost.

100 ms is the time it takes for the occlusion query results to become available in a very simple test (10-12 medium objects in the frustum, 256x256 occlusion FBO) on an AMD GPU. How did I get that? I rendered the objects with occlusion queries, did the rest of the scene processing, and then at the beginning of the next frame asked whether the query result for the last rendered object was available with glGetQueryObject(..., GL_QUERY_RESULT_AVAILABLE, ...). The results became available only after 3-4 frames. At 30 fps, that means roughly 100-130 ms of latency. That's what I care about, not how much time it takes to render (I know that's negligible). And I don't know how you could interpret my messages in such an unreasonable way. Maybe it's because I wasn't getting an answer and kept reformulating things in response to ridiculous, pedantic misinterpretations, but that only made things more confusing.
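The frames-to-milliseconds conversion behind the latency figure above, as a one-line C helper. The name is hypothetical, and it assumes a steady frame rate, which a real application rarely has exactly.

```c
#include <assert.h>

/* Wall-clock delay, in milliseconds, of a result that arrives `frames`
 * frames after it was issued, assuming a steady frame rate of `fps`. */
static double frames_to_ms(int frames, double fps)
{
    assert(frames >= 0 && fps > 0.0);
    return frames * (1000.0 / fps);
}
```

At 30 fps, 3 frames come to 100 ms and 4 frames to about 133 ms.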

Quote:

Not to mention that, at least in D3D, it is very common for the GPU to lag two or three frames behind the CPU, in which case the occlusion query would also finish two or three frames late. Once again, this has nothing to do with speed, only with latency.

If I had a 3-frame lag in rendering, would I see objects affected by the occlusion query pop up? I may be wrong, but I assume that if my application had a 3-frame render lag, then occlusion query results delayed by 3-4 frames would be visually fine. But they aren't: if I move the camera, it's obvious that the occlusion query results lag several frames behind rendering (I'm talking about the AMD card).

If you decide to answer, could you interpret my messages in a more reasonable way? Don't choose the most backwards interpretation. English is not my native language, but you manage fine understanding Chinese posters who use Google Translate. I don't think I'm worse.

When I talk about occlusion query delay/speed/performance, I mean the most important factor: how much time it takes for the results to become available after I submit the rendering commands.

03-06-2013, 05:50 AM

aqnuep

Okay, here are the things you should consider:

1. After issuing all the occlusion queries (i.e. the glBeginQuery/glEndQuery part) you can call glFlush. That will most likely ensure that no further commands are accumulated before the batch is sent to the GPU, so the latency of glGetQueryObject(..., GL_QUERY_RESULT, ...) will be smaller.

2. When you use conditional rendering, the decision is not made on the CPU, so the CPU-GPU latency (in your case this huge 100 ms) doesn't matter at all. All that matters is the GPU-GPU latency, i.e. the latency between a) the GPU processing all commands between glBeginQuery/glEndQuery, and b) the GPU processing glBeginConditionalRender(..., GL_QUERY_WAIT). You can expect this latency to be far smaller than what you've mentioned, as it's not a GPU-CPU sync but only a GPU-GPU sync. Sure, if you use GL_QUERY_NO_WAIT and there wasn't enough work between glBeginQuery/glEndQuery and glBeginConditionalRender (and here I mean GPU work, not CPU work), the query result might still not be ready in time, in which case the GL is allowed to simply render unconditionally. But as the GPU-GPU latency is much smaller, that's not really likely to cause a problem, and if it does, simply stick with GL_QUERY_WAIT.

Once again, just to emphasize it: getting the results back to the CPU (i.e. glGetQueryObject) is very different from getting the results on the GPU (i.e. glBeginConditionalRender, or the functionality introduced by AMD_query_buffer_object). The key difference is that you don't have to worry about how "late" the GPU is compared to the CPU. It can easily happen that while your CPU-GPU latency is in fact 100 ms, the GPU-GPU latency is well under 1 ms.
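The CPU-GPU versus GPU-GPU distinction comes from the GPU executing commands in submission order. The C toy model below (all names hypothetical, no GL calls) treats the command stream as an in-order queue: a conditional draw placed after the end of a query always finds the query finished by the time the GPU reaches it, while a CPU check made right after submission only succeeds once the GPU has caught up. It collapses the GPU's internal pipelining into a single step, which is exactly the gap that GL_QUERY_WAIT covers and GL_QUERY_NO_WAIT does not.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CMDS 64

/* Toy command kinds (hypothetical): the end of an occlusion query, and a
 * draw whose execution depends on that query's result. */
typedef enum { CMD_END_QUERY, CMD_COND_DRAW } CmdKind;

typedef struct {
    CmdKind cmds[MAX_CMDS];
    int  submitted;              /* commands written by the CPU */
    int  executed;               /* commands the GPU has processed, in order */
    bool query_done;             /* set when the GPU processes CMD_END_QUERY */
    int  cond_draws_with_result; /* conditional draws that found the result */
} CmdQueue;

/* CPU side: append a command without waiting for the GPU. */
static void submit(CmdQueue *q, CmdKind kind)
{
    assert(q->submitted < MAX_CMDS);
    q->cmds[q->submitted++] = kind;
}

/* GPU side: process up to n pending commands in submission order. */
static void gpu_run(CmdQueue *q, int n)
{
    while (n-- > 0 && q->executed < q->submitted) {
        CmdKind kind = q->cmds[q->executed++];
        if (kind == CMD_END_QUERY)
            q->query_done = true;
        else if (kind == CMD_COND_DRAW && q->query_done)
            q->cond_draws_with_result++;
    }
}

/* CPU-side check, the analogue of polling GL_QUERY_RESULT_AVAILABLE. */
static bool cpu_sees_result(const CmdQueue *q)
{
    return q->query_done;
}
```

However far behind the GPU runs, the conditional draw executes only after the query's end in the same stream, so it never waits on the CPU; only `cpu_sees_result` depends on how far the GPU has caught up.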

03-06-2013, 06:13 AM

Nowhere-01

Thank you, that was the answer I was waiting for. I wanted to be sure that I understood everything correctly, and I did. Now I'm going to wait for fixed drivers, and I will report the results here.

I did experiment with glFlush and glFinish. glFlush didn't noticeably affect latency in my case.