
If I have a huge static VB that will not fit in video memory:
1) Does the whole thing get placed in AGP memory, or will it be split between AGP and video memory?
2) If the whole thing is in AGP memory, is the performance then similar to that of a dynamic VB?
Links and stuff on this kind of info would be appreciated.
Thanks,

How big is your VB?! Even on some stupidly massive terrain engines I've not actually managed to create enough geometry to bother a modern card [smile]

If you're creating it in D3DPOOL_DEFAULT then it'll simply fail to create if there isn't enough VRAM.

If you're creating it in D3DPOOL_MANAGED then the runtime should handle the shifting to/from VRAM. Exactly where it exists at any point in time is largely unknown to the application (iirc, one of the queries allows some basic knowledge of where though).

The only paging behaviour I'm familiar with works on entire resources (aka "all or nothing"): the managed pool maintains a sys-mem "shadow" that it pages in/out of VRAM as needed. I'm not aware of how (or even if) it splits a resource into chunks.

Regardless of that, having such a huge single resource is likely to murder performance. If you've optimized the vertex/index ordering to get a near-linear fetch sequence it might not be so bad, but if you're getting random access on a resource that isn't entirely in VRAM then... well... ouch [smile]

I don't know how relevant it is for the latest-n-greatest hardware, but the advice from NV/ATI a year or so back was to use 1 MB dynamic VBs and 4 MB static VBs. Those sizes gave the best performance/utilization.

hth, Jack

My terrain engine allows up to 2049x2049 vert meshes and uses 40 bytes per vert. This comes out to ~168MB. I haven't hit my card's limit, but was wondering how things would be handled on a card with less mem.
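
For reference, the arithmetic behind that figure can be sketched as below. This is only an illustration: `gridBytes` is a made-up helper name, and the 40-byte stride and grid sizes are the ones quoted in this thread.

```cpp
#include <cstdint>

// Bytes needed for a square grid of side x side vertices.
// gridBytes is a hypothetical helper, not part of any API.
constexpr std::uint64_t gridBytes(std::uint64_t side, std::uint64_t bytesPerVertex) {
    return side * side * bytesPerVertex;
}

// Whole terrain: 2049 x 2049 vertices at 40 bytes each.
constexpr std::uint64_t kFullMeshBytes = gridBytes(2049, 40);  // 167,936,040 bytes, about 168 MB

// One 129 x 129 chunk at the same stride.
constexpr std::uint64_t kOneChunkBytes = gridBytes(129, 40);   // 665,640 bytes, about 0.67 MB
```

So a single chunk is well under 1 MB, while the full mesh is roughly 250 times that.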

Would a static VB in managed mem perform the same as a dynamic VB?

Good point on the random access remark. What I currently have is my mesh is broken up into 129x129 sized meshes, which I went with so I could use a 16-bit index buffer. Is this a reasonable size?

It just occurred to me that I can use bigger chunks and just offset with BaseVertexIndex in DIP (DrawIndexedPrimitive).
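
A rough sketch of how the BaseVertexIndex offset could be computed when several 129x129 chunks share one VB and one 16-bit IB. The struct and function names are hypothetical; the fields simply mirror the arguments of IDirect3DDevice9::DrawIndexedPrimitive, and the sketch assumes a triangle list with indices running from 0 to 16,640 for every chunk.

```cpp
#include <cstdint>

// Mirrors the arguments of IDirect3DDevice9::DrawIndexedPrimitive.
// ChunkDraw and drawForChunk are illustrative names only.
struct ChunkDraw {
    std::int32_t  baseVertexIndex;  // offset the device adds to every index
    std::uint32_t minVertexIndex;   // lowest index used, relative to the base
    std::uint32_t numVertices;      // vertices referenced by this draw
    std::uint32_t startIndex;       // first index read from the index buffer
    std::uint32_t primitiveCount;   // number of triangles
};

constexpr std::uint32_t kSide      = 129;
constexpr std::uint32_t kChunkVerts = kSide * kSide;                  // 16,641
constexpr std::uint32_t kChunkTris  = (kSide - 1) * (kSide - 1) * 2;  // 32,768 (triangle list assumed)

// chunkSlot is the position of the chunk inside the shared vertex buffer.
constexpr ChunkDraw drawForChunk(std::uint32_t chunkSlot) {
    return ChunkDraw{static_cast<std::int32_t>(chunkSlot * kChunkVerts),
                     0u, kChunkVerts, 0u, kChunkTris};
}

// With a real device the values would be passed along these lines:
//   device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, d.baseVertexIndex,
//                                d.minVertexIndex, d.numVertices,
//                                d.startIndex, d.primitiveCount);
```

Because the indices themselves never exceed 16,640, the same 16-bit index buffer can be reused for every chunk in the VB; only the base changes per draw.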

I would imagine that even if the runtime's resource scheduler can cope with such a large VB, you could still get better performance by breaking it apart.

Say you create a 16x16 grid of 128x128 patches (which, as you comment, is a good approach for 16-bit IBs). If you leave these as D3DPOOL_MANAGED you give the resource scheduler much more granularity. I'd put money on this being a more favourable setup from the runtime's perspective [smile]

Presuming you've got a decent (quadtree?) culling system, you're likely to have a lot of frame-to-frame coherency in which patches are actually required. Thus if a patch is already in VRAM then the runtime is likely to leave it there.

Quote:

Would a static VB in managed mem perform the same as a dynamic VB?

You can't have a D3DUSAGE_DYNAMIC / D3DPOOL_MANAGED resource, so it's a bit of a pointless comparison.

I've been working on a design for the most efficient (or close to [wink]) terrain renderer lately. A quadtree where the top-level node is 128x128 and sub-divides down to ~4x4 regions works well. Where you want more than 128x128 you just multiply the number of top-level nodes, hence my original statement about a 16x16 grid. A traditional quadtree has a single root node and 1x1 leaves, but it's not difficult to break that rule [wink]
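
To put numbers on that layout, here is a small sketch. Both helper names are invented for illustration, and it assumes each quadtree level halves the node's side length.

```cpp
#include <cstdint>

// How many top-level nodes are needed along one side of the terrain.
// quadsPerSide is the terrain width in quads (2048 for a 2049-vertex side).
constexpr std::uint32_t topLevelNodesPerSide(std::uint32_t quadsPerSide,
                                             std::uint32_t rootSize) {
    return (quadsPerSide + rootSize - 1) / rootSize;  // round up
}

// Levels in one tree, counting both the root and the leaf level,
// assuming the side length halves at every subdivision.
constexpr std::uint32_t quadtreeLevels(std::uint32_t rootSize,
                                       std::uint32_t leafSize) {
    return rootSize <= leafSize ? 1u
                                : 1u + quadtreeLevels(rootSize / 2, leafSize);
}

constexpr std::uint32_t kRootsPerSide = topLevelNodesPerSide(2048, 128);  // 16, i.e. a 16x16 grid
constexpr std::uint32_t kLevels       = quadtreeLevels(128, 4);           // 128,64,32,16,8,4
```

Under those assumptions a 2049x2049 terrain needs a 16x16 grid of roots, each six levels deep.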

I'd highly recommend looking at a multi-buffer approach.

hth, Jack

The ideal size for your vertex buffers is around 4 MB, and for your indices either 65x65 or 129x129 give the best results, depending on the exact scale of the terrain. (This last point is more related to geomipmapping style LoD.)

What this means is that the optimal perf point is where you have multiple chunks in a single VB, but draw a single chunk per draw call. For added points, you can fuse multiple chunks into a single draw call if they're in the same VB. Harder to do, but gives you absolutely beautiful batching behavior -- you end up with a small number of very large batches. In your case, you can fit 6 chunks into a 4 MB buffer, and 3 into an index buffer. So the absolute best case is two draw calls per vertex buffer, drawing only the set of patches that's visible.
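
The 6 / 3 / 2 figures can be reproduced with a quick calculation. This is a sketch under assumptions: 129x129 chunks at 40 bytes per vertex (from earlier in the thread), a 4 MB budget read as 4 x 1024 x 1024 bytes, and "3 into an index buffer" taken to mean the number of whole chunks a 16-bit index can address. The function names are made up.

```cpp
#include <cstdint>

constexpr std::uint32_t chunkVertices(std::uint32_t side) { return side * side; }

// Whole chunks that fit in a vertex buffer of budgetBytes.
constexpr std::uint32_t chunksPerVertexBuffer(std::uint32_t budgetBytes,
                                              std::uint32_t side,
                                              std::uint32_t stride) {
    return budgetBytes / (chunkVertices(side) * stride);
}

// Whole chunks addressable with 16-bit indices (65,536 distinct values).
constexpr std::uint32_t chunksPer16BitRange(std::uint32_t side) {
    return 65536u / chunkVertices(side);
}

// Minimum draw calls to cover every chunk in one vertex buffer.
constexpr std::uint32_t drawCallsPerBuffer(std::uint32_t chunksInVb,
                                           std::uint32_t chunksPerDraw) {
    return (chunksInVb + chunksPerDraw - 1) / chunksPerDraw;  // round up
}

constexpr std::uint32_t kInVb    = chunksPerVertexBuffer(4u * 1024u * 1024u, 129, 40);  // 6
constexpr std::uint32_t kPerDraw = chunksPer16BitRange(129);                            // 3
constexpr std::uint32_t kDraws   = drawCallsPerBuffer(kInVb, kPerDraw);                 // 2
```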

I don't know if that's too vague a description, if you need more details feel free to ask.

There's no guarantee where a buffer will be placed. A dynamic buffer can be copied to card RAM, for example. Normally AGP memory isn't larger than card memory, and I don't believe that the buffer will be split between them, so I would imagine that the allocation will fail. IIRC managed only loads complete resources, so that won't help either.

Are you planning to draw that entire buffer each frame? 168MB at 30FPS means reading 5GB of data a second just for the vertices (assuming no repeated readings, which usually happen). That's more than PCIe can provide anyway, so you'd have to put it on the card. Even then you're likely to have trouble keeping up a high frame rate.
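
The bandwidth estimate works out as follows. The sketch assumes each vertex is read exactly once per frame; the PCIe figure is a nominal 4 GB/s for a first-generation x16 link, included only for comparison and not taken from this thread.

```cpp
#include <cstdint>

// Bytes of vertex data read per second if the whole buffer is drawn every frame.
// vertexBytesPerSecond is an illustrative helper name.
constexpr std::uint64_t vertexBytesPerSecond(std::uint64_t bufferBytes,
                                             std::uint64_t framesPerSecond) {
    return bufferBytes * framesPerSecond;
}

constexpr std::uint64_t kMeshBytes = 2049ULL * 2049ULL * 40ULL;            // 167,936,040
constexpr std::uint64_t kTraffic   = vertexBytesPerSecond(kMeshBytes, 30); // 5,038,081,200, about 5 GB/s

// Assumption: nominal one-way bandwidth of a PCIe 1.x x16 link.
constexpr std::uint64_t kNominalPcieBytesPerSecond = 4000000000ULL;

constexpr bool kExceedsBus = kTraffic > kNominalPcieBytesPerSecond;        // true
```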

Like Jack suggested, breaking into patches and culling them is a good way to go.

Guest Anonymous Poster

Oh no. I have a culling and lod system already in place, and my entire mesh is already broken into groups of 129x129 verts. I was under the impression that fewer VBs were better, so I was considering using just 1 VB and indexing into it as needed. But I'm guessing rendering any part of it would send the whole thing to the card if it wasn't there?

Quote:

You can't have a D3DUSAGE_DYNAMIC / D3DPOOL_MANAGED resource, so it's a bit of a pointless comparison.

I worded my question horribly. What I meant to ask is: does sending static data to the card perform the same as sending dynamic data?

Although I am wondering why you can't have a resource that is both dynamic and managed. Is managed memory not the same as AGP memory?

btw, the answers have been great and thorough, thanks again.

Fewer VBs are better only up to a point. If they're too large they make memory management harder. Someone mentioned 4MB, which would be a good way to break things up. Although in your case you might actually make do with smaller buffers, since shuffling them between system and card RAM will be easier. It'd probably be best to actually test this.

Managed memory is not like AGP memory. Managed memory is simply a system RAM backup of a video memory resource, with an automatic mechanism to load it when necessary (which might pass through an AGP copy if the card doesn't support direct memory access). AGP memory is a special area of memory that's accessible to the card and isn't cached by the CPU (and is therefore slower for the CPU to access).

The way I see it, the reason dynamic buffers can't be managed is simply that there's not much point in keeping a RAM backup of something you're recreating every frame.