The point was also made that D3D allows mapping of textures, yet doesn't suffer from any of these hypothetical reasons for objection.

Tell me, how does D3D allow mapping of swizzled textures in any meaningful way on Windows? I've never heard of such a mechanism.

Originally Posted by mhagain

Operating systems and drivers may differ, but the underlying hardware is still the same.

No, the underlying hardware is not the same. NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor's GPUs might change the mechanism from one generation to the other, as I already mentioned.

Originally Posted by mhagain

It's great to theorize about the way things might work internally in hardware, but such theories don't really hold up in the face of a working example that refutes them.

Show me that working example.

Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
Technical Blog: http://www.rastergrid.com/blog/

Tell me, how does D3D allow mapping of swizzled textures in any meaningful way on Windows? I've never heard of such a mechanism.

Ask Microsoft and the hardware vendors. The point remains that it does happen, and it happens without any of the objections being raised affecting it. I'm willing to come back to that point as often as is necessary.

Originally Posted by aqnuep

No, the underlying hardware is not the same. NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor's GPUs might change the mechanism from one generation to the other, as I already mentioned.

That is not what I meant. Of course different vendors have different hardware, and of course different hardware generations from the same vendor may be different, but that's something that current mechanisms also have to deal with - so it's not relevant to this particular item.

It's also the case that for a given PC with a given generation of (say) NVIDIA hardware, it doesn't matter what the OS is - that NVIDIA hardware is the same. Likewise for a given PC with a given generation of AMD hardware or a given PC with a given generation of Intel hardware.

It's also the case that the purpose of an API abstraction is so that you as the developer do not have to worry about things like "NVIDIA hardware works differently than AMD, and both work differently than Intel. Also, even a single vendor's GPUs might change the mechanism from one generation to the other".

And it's also the case that this is not a problem for D3D. No amount of objections can detract from the fact that here is an example where it is not a problem and where it works.

How do you know that this mapping mechanism doesn't give you a pointer to memory that just holds a linearized copy of the texture data? Neither OpenGL nor D3D guarantees that when you map a memory area you will actually write directly to that area. The driver might just allocate a new piece of memory, copy the texture data there (unless you asked DISCARD) and then, when you are finished, re-upload it. Believe me, this will happen in most (if not all) cases, for the following reasons:

1. If the texture is tiled/swizzled (which is true for almost all textures, except for buffer textures or compressed textures that have kind-of "raw" data in them) then you have to do a copy in order to allow linear access.
2. If the texture is in memory not visible to the CPU (which is true for all, except a small range of video memory) then you have to do a copy in order to allow access at all.

So if you think about it, the API might look different in the case of D3D, but it is actually the D3D equivalent of pixel buffer objects, except that it does two-way communication at once (which can even backfire, as you may perform unnecessary reads if you don't use DISCARD and NO_OVERWRITE properly).
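
To make the copy argument concrete, here is a minimal sketch of what a driver might do behind a map/unmap pair on a swizzled texture. It assumes a square power-of-two texture and uses Morton (Z-order) interleaving purely as a stand-in, since real tiling layouts are vendor-specific and not public. The names `morton_index`, `linearize` and `swizzle` are hypothetical and belong to no real API.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y (Z-order), one illustrative swizzle."""
    index = 0
    for bit in range(bits):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


def linearize(swizzled, width, height):
    """Map time: copy swizzled texels into row-major order for the CPU."""
    linear = [None] * (width * height)
    for y in range(height):
        for x in range(width):
            linear[y * width + x] = swizzled[morton_index(x, y)]
    return linear


def swizzle(linear, width, height):
    """Unmap time: copy the edited row-major texels back into swizzled order."""
    swizzled = [None] * (width * height)
    for y in range(height):
        for x in range(width):
            swizzled[morton_index(x, y)] = linear[y * width + x]
    return swizzled
```

In this model the pointer the application sees is always the `linearize` result, never the swizzled storage itself, which is the same round trip a PBO upload and readback would perform.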


There are currently two primary ways to update a texture in OpenGL, both using glTexSubImage2D but with or without a PBO bound.

(1) Without a PBO you write into system memory and call glTexSubImage2D; the driver copies the data off and transfers it to the texture at some arbitrary future point in time (which may be immediately but is before the texture is next used in a draw call; if the texture is not currently being used for a pending draw call the copy off can potentially be skipped and the driver can transfer immediately).

(2) With a PBO you write into the PBO and call glTexSubImage2D; the driver transfers from the PBO to the texture at some arbitrary future point in time (which may be immediately but is before the texture is next used in a draw call).

Under both of these ways, and so far as OpenGL is concerned, any hypothetical top-secret proprietary vendor-specific internal representation does not exist. I am not suggesting that this should change if textures were to be mappable, and I do not know why so much focus was put on it seeming (or being made to seem) as if I were. The OpenGL API as it is exposed to the programmer has no business dealing with that kind of detail, and it should stay that way.

Let's look at what making a texture mappable can offer to both ways.

For the case without a PBO, the scenario should be easy and obvious. Instead of writing into your own system memory pointer you write into a pointer provided by the driver. This pointer may or may not be a direct pointer to the raw texture data, and - here's the thing - it does not matter which it is. The driver manages that part of it for you. If it can give you a pointer directly to the texture data then that's what you get. If it can't then you get a pointer to the driver's own internal backing storage. But either way, it's internal driver behaviour and the mechanics of it are completely irrelevant to this suggestion. What is relevant to this suggestion is that instead of having to go "raw data -> your storage -> driver storage" you get to go "raw data -> driver storage"; i.e. you get to avail of the reason why glMapBuffer was provided for buffer objects; avoiding an extra memory copy.

This is emphatically not a replacement for glTexSubImage2D from system memory data. It is expected that there would still be cases where glTexSubImage2D is the most appropriate code path to use, or where any potential performance advantage from avoiding the memory copy does not matter (texture loading - assuming use of glTexStorage - would be one such example). The intention is to provide an additional option for which drivers may provide a more optimal path, and which programs can take advantage of in cases where that additional performance is important and significant to them.

For the case with a PBO it's less clear, and I don't believe that the suggestion has merit here. First of all you're not updating a texture, you're updating a PBO (the driver updates the texture from the PBO). Secondly there is a clear use case for PBOs which this suggestion doesn't meet (and doesn't pretend to meet), and that's asynchronous pixel transfers.

By the way, "copy the texture data there (unless you asked DISCARD)" is untrue; this copy is also not needed if a texture were to be mapped with write-only access (which it is expected would be the normal case). In the worst case, all that the implementation needs is to allocate some scratch memory and give you a pointer to that; the implementation looks after everything else.

What is relevant to this suggestion is that instead of having to go "raw data -> your storage -> driver storage" you get to go "raw data -> driver storage"; i.e. you get to avail of the reason why glMapBuffer was provided for buffer objects; avoiding an extra memory copy.

TexSubImage without PBO is actually raw data -> driver storage.
Also, when using PBOs you don't have to first create a system memory copy that you then copy into your PBO; you can write into the PBO directly. I don't know, however, why people don't do this in the first place.

Originally Posted by mhagain

By the way, "copy the texture data there (unless you asked DISCARD)" is untrue; this copy is also not needed if a texture were to be mapped with write-only access (which it is expected would be the normal case). In the worst case, all that the implementation needs is to allocate some scratch memory and give you a pointer to that; the implementation looks after everything else.

That's not true. If I map a buffer range as WRITE_ONLY but without DISCARD/INVALIDATE, the user might write only a single byte to the range or, even worse, some disjoint sections inside it. With your approach, transferring the range back would then copy junk data into the places the user didn't write. Thus WRITE_ONLY does require a readback, as the driver cannot know which parts of the mapped range will actually be written and which are left untouched. That's why DISCARD/INVALIDATE was invented; otherwise there wouldn't be any point in having them.
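
The readback argument can be sketched with a toy model, assuming the worst case described above where the driver services the map through scratch memory. `StagedResource`, `map_write_only`, `unmap` and `write_one_byte` are hypothetical names for illustration only, and the 0xFF fill merely stands in for whatever junk uninitialised scratch memory would contain. A driver that can hand out a direct pointer to the storage is not covered by this model.

```python
class StagedResource:
    """Toy model of a driver that services maps through scratch memory."""

    JUNK = 0xFF  # stand-in for whatever uninitialised scratch memory holds

    def __init__(self, contents):
        self.storage = bytearray(contents)  # the resource's real contents
        self.scratch = None

    def map_write_only(self, readback):
        # Hand the application scratch memory instead of the real storage.
        self.scratch = bytearray([self.JUNK] * len(self.storage))
        if readback:
            # Without DISCARD/INVALIDATE, untouched bytes must survive the
            # unmap, so the current contents are copied in first.
            self.scratch[:] = self.storage
        return self.scratch

    def unmap(self):
        # The driver cannot tell which bytes were written, so it copies all.
        self.storage[:] = self.scratch
        self.scratch = None


def write_one_byte(readback):
    """Map, touch a single byte, unmap, and return the resulting contents."""
    resource = StagedResource(b"\x10\x20\x30\x40")
    pointer = resource.map_write_only(readback)
    pointer[0] = 0x01
    resource.unmap()
    return bytes(resource.storage)
```

Calling `write_one_byte(False)` leaves junk in the three bytes the application never touched, while `write_one_byte(True)` preserves them at the cost of the extra copy. Skipping the readback is only safe when the application has promised, via a DISCARD/INVALIDATE style flag, that it does not care about the old contents.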


Mapping textures is a feature that everybody has had for a long time. That's how Mesa accesses texture storage: Mesa drivers must internally expose the map/unmap interface for textures. All Mesa hardware drivers from DX7 to DX11.1-level hardware fully support and implement the interface. It's also the fastest codepath for uploading/streaming textures.

The only problem with OpenGL is that it doesn't expose component ordering, i.e. you don't know if a texture is internally stored as RGBA, BGRA, ABGR, or ARGB, RG or GR, etc. Also you don't really know the bpp either, because if you ask for GL_RGBA16, the implementation is allowed to give you a GL_RGBA8 texture. And if you ask for GL_LUMINANCE8_ALPHA8 or GL_RGB8, you can get GL_RGBA8 as well. The GL map/unmap interface just needs a way to query this info, so it's not a big deal.
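
A small sketch of why the component ordering has to be queryable before writing through a mapped pointer. It assumes 8 bits per component and four components; `pack_texel` and `unpack_texel` are hypothetical helpers, not part of OpenGL or Mesa.

```python
def pack_texel(color, ordering):
    """Return the texel bytes for a colour dict, laid out in the given order."""
    return bytes(color[channel] for channel in ordering)


def unpack_texel(raw, ordering):
    """Interpret texel bytes according to the storage order actually in use."""
    return {channel: raw[i] for i, channel in enumerate(ordering)}


orange = {"R": 255, "G": 128, "B": 0, "A": 255}

# The application assumes RGBA, but the implementation stores BGRA.
written = pack_texel(orange, "RGBA")
seen_by_gpu = unpack_texel(written, "BGRA")
```

Here `seen_by_gpu` comes out with red and blue swapped, so the orange texel turns blue-ish. With a query for the internal ordering, the application could call `pack_texel(orange, "BGRA")` instead and write the bytes the storage expects.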

I don't need to map/unmap textures in OpenGL, because I don't use OpenGL, I implement it. However if I used OpenGL, it's something I would definitely want.

The only problem with OpenGL is that it doesn't expose component ordering...

Why does everybody seem to ignore the problem of tiling/swizzling? Yes, a software implementation doesn't have to care about it. Neither does an implementation that only allows mapping linear textures. But as mentioned before, sampling linear textures is way slower than sampling tiled textures, so what you save at upload time you lose many times over, each time you actually use the texture.
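
One way to picture the locality difference is to count how many cache lines a small 2D sampling footprint touches in each layout. This is a toy model under stated assumptions: 4 bytes per texel, 64-byte cache lines, 4x4 tiles, a texture width that is a multiple of the tile size, and a footprint aligned to a tile. Real hardware uses different and undisclosed parameters, and all function names here are hypothetical.

```python
def linear_offset(x, y, width, bpp=4):
    """Byte offset of texel (x, y) in a row-major layout."""
    return (y * width + x) * bpp


def tiled_offset(x, y, width, tile=4, bpp=4):
    """Byte offset of texel (x, y) when texels are stored tile by tile."""
    tiles_per_row = width // tile
    tile_index = (y // tile) * tiles_per_row + (x // tile)
    within_tile = (y % tile) * tile + (x % tile)
    return (tile_index * tile * tile + within_tile) * bpp


def cache_lines_touched(offset_fn, x0, y0, size, width, line_bytes=64):
    """Count distinct cache lines hit by a size x size footprint at (x0, y0)."""
    lines = set()
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            lines.add(offset_fn(x, y, width) // line_bytes)
    return len(lines)


linear_lines = cache_lines_touched(linear_offset, 8, 8, 4, 256)
tiled_lines = cache_lines_touched(tiled_offset, 8, 8, 4, 256)
```

For this aligned footprint on a 256-texel-wide texture, the linear layout spreads the 16 texels over four cache lines (one per row) while the tiled layout keeps them in one. An unaligned footprint narrows the gap, so this should be read as an illustration of the effect rather than a measurement.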
