[*] Changes the GetBuffer() format from BYTE to uint32_t. This changes the render target format from palette indices to BGRA8.
[*] Introduces two new rendering variables (dc_light and ds_light) that allow dc_colormap to point at the original palette instead of remapping it for light with GETPALOOKUP. This effectively allows light levels to be rendered with fixed_t precision.
[*] Updates all the software renderers (R_DrawSpanP_C, vlinec4, etc.) to output RGBA8 instead of a palette index. This means the transparency functions do true alpha blending, and the opaque functions use dc_light/ds_light for true color shading.
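To make the two true color operations above concrete, here is a minimal sketch of per-channel shading and alpha blending on packed 32-bit pixels. It is not taken from the patch: `shade_bgra`, `blend_bgra`, the 0xAARRGGBB packing and the 0..256 scale for light and alpha are all assumptions chosen for illustration, with `light` standing in for the role the post gives dc_light/ds_light.

```c
#include <stdint.h>

/* Hypothetical helpers, not the patch's code. Pixels are packed as
   0xAARRGGBB, which is BGRA8 in memory on little-endian x86. */

/* Scale each color channel by a light factor in 0..256 (256 = full bright),
   standing in for dc_light/ds_light. No palette or colormap is involved. */
static uint32_t shade_bgra(uint32_t color, uint32_t light)
{
    uint32_t r = (((color >> 16) & 0xff) * light) >> 8;
    uint32_t g = (((color >> 8) & 0xff) * light) >> 8;
    uint32_t b = ((color & 0xff) * light) >> 8;
    return 0xff000000u | (r << 16) | (g << 8) | b;   /* output is opaque */
}

/* True alpha blend of fg over bg; alpha is 0..256 (256 = only fg). */
static uint32_t blend_bgra(uint32_t fg, uint32_t bg, uint32_t alpha)
{
    uint32_t inv = 256 - alpha;
    uint32_t r = (((fg >> 16) & 0xff) * alpha + ((bg >> 16) & 0xff) * inv) >> 8;
    uint32_t g = (((fg >> 8) & 0xff) * alpha + ((bg >> 8) & 0xff) * inv) >> 8;
    uint32_t b = ((fg & 0xff) * alpha + (bg & 0xff) * inv) >> 8;
    return 0xff000000u | (r << 16) | (g << 8) | b;
}
```

For example, `shade_bgra(0xff808080u, 128)` halves a mid gray to 0xff404040, and blending pure red over pure blue at alpha 128 gives 0xff7f007f. The light factor can take any of 257 values here, which is where the extra precision over a fixed set of colormaps comes from.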

I'd like to add the following things before doing a pull request:

[*] Currently it only works with the Direct3D 9 target, because the GetBuffer format change affects the platform-specific frame buffer code.
[*] Replace all the assembly render functions with SSE2 compiler intrinsics. My current patch only does this for two of the most critical functions.
[*] Maybe remove the palette from the renderer completely by changing the format of GetColumn from BYTE to uint32_t as well. This would enable true color textures, but I'm a bit cautious about this because I'm not sure whether Doom does any kind of 'trickery' with dc_colormap beyond GETPALOOKUP.
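As a rough idea of what replacing an assembly loop with SSE2 intrinsics can look like, the sketch below shades four packed pixels at once. It is an illustration under assumptions, not one of the two functions in the patch: `shade4_sse2`, the 0xAARRGGBB packing and the 0..256 light scale are hypothetical, and it needs an x86/x86-64 target with SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */
#include <stdint.h>

/* Hypothetical SSE2 shading: scales four packed 0xAARRGGBB pixels by a
   light factor in 0..256 in one pass. */
static void shade4_sse2(uint32_t pixels[4], uint32_t light)
{
    const __m128i zero   = _mm_setzero_si128();
    const __m128i mlight = _mm_set1_epi16((short)light);
    const __m128i alpha  = _mm_set1_epi32((int)0xff000000u);

    __m128i p  = _mm_loadu_si128((const __m128i *)pixels);
    /* Widen 8-bit channels to 16-bit lanes so channel * light (max 65280) fits. */
    __m128i lo = _mm_unpacklo_epi8(p, zero);   /* pixels 0 and 1 */
    __m128i hi = _mm_unpackhi_epi8(p, zero);   /* pixels 2 and 3 */
    lo = _mm_srli_epi16(_mm_mullo_epi16(lo, mlight), 8);
    hi = _mm_srli_epi16(_mm_mullo_epi16(hi, mlight), 8);
    /* Narrow back to 8 bits (every lane is <= 255, so nothing saturates)
       and force alpha back to opaque, matching a scalar shade. */
    p = _mm_or_si128(_mm_packus_epi16(lo, hi), alpha);
    _mm_storeu_si128((__m128i *)pixels, p);
}
```

The unaligned load/store keeps the sketch simple; a real span or column drawer would process runs of pixels and handle the leftover pixels that don't fill a group of four.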

Since there's a lot of work involved in completely finishing the patch I'm wondering what ZDoom's stance is on all of this. In particular:

[*] Replacing the assembly render functions with SSE compiler intrinsics. My assembly skills aren't remotely good enough to beat a compiler, but I do know how to write SSE2 intrinsics that should beat the performance of any assembly I could personally write. As a bonus, removing the assembly should make it significantly easier to refactor the render global variables and functions into classes.
[*] There is a slight performance cost to using 32 bpp for output, in particular for the Direct3D 9 target, where the final palette lookup is done by a GPU shader. I'm not sure there will be any real performance loss in the end if GetColumn is also changed to uint32_t, because then the software renderer does no palette lookups at all.
[*] My primary reason for doing all this was to get rid of the banding caused by GETPALOOKUP and its ugly palette colors at darker light levels. In my opinion this makes the game look a lot prettier (a bit more like GZDoom, but with diminishing light intact). Still, anyone who wants the game to look exactly like DOS Doom might not like the change.
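One source of the banding can be shown with a deliberately simplified model: a paletted renderer picks one of a fixed number of colormaps (Doom's COLORMAP lump holds 32 light levels), so nearby light values collapse to the same shade, while a true color shade uses the light value directly. The functions and the 0..255 light scale below are assumptions for illustration only; the sketch does not model Doom's actual light formula or the final snap to the nearest palette color.

```c
#include <stdint.h>

/* Simplified model, not Doom's exact light math: light is a 0..255 scale
   factor. The paletted path snaps it to one of 32 levels, the true color
   path uses it as-is. */
#define LIGHT_LEVELS 32
#define LEVEL_STEP (256 / LIGHT_LEVELS)   /* 8 light values per level */

/* Paletted-style: every light value inside one step gives the same result. */
static uint8_t shade_quantized(uint8_t c, uint32_t light)
{
    uint32_t snapped = (light / LEVEL_STEP) * LEVEL_STEP;
    return (uint8_t)((c * snapped) >> 8);
}

/* True color style: every light value gives its own result. */
static uint8_t shade_fine(uint8_t c, uint32_t light)
{
    return (uint8_t)((c * light) >> 8);
}
```

With a channel value of 200, light values 100 and 103 both give 75 through `shade_quantized`, but 78 and 80 through `shade_fine`. Across a gradually darkening wall, those flat runs become visible bands.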

So what do you think? What would it take to get such changes accepted - if at all?

Graf Zahl wrote: Lots and lots of work - work nobody here has time for.

No offense, Graf, but please, could you show more empathy in your responses?

This guy just registered and put in some effort to help improve the engine. I think the least anyone, especially a developer, can do is welcome the attempt and not be rude. Personally, I would feel hurt if I submitted such an attempt and were received in this manner.

I'm probably too sensitive, but I didn't find any politeness in this response and it bothers me a little.

Oops. Sorry, I didn't notice that there was a link. This is still up to Randi, as the software renderer is not my business. But still, as I see it, there's a major problem here that may well be enough to render the thing unusable:

The engine has to be able to switch between paletted and true color mode. In my eyes this is essential. This cannot be done with a #define.

As for the define in my branch, it could be split into separate functions with a little more work. It essentially just requires that the GetBuffer abstraction can return either format. I suppose that would also solve the problem of keeping the old assembly render functions for the version that uses a palette render target.
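A minimal sketch of run-time selection in place of a `#define`, assuming a canvas whose pixel format is chosen when it is created and draw functions picked through a function pointer. Every name here (`Canvas`, `RenderMode`, `select_fill_span`, etc.) is hypothetical and not ZDoom's API; the point is only that the paletted and true color paths can coexist in one binary.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical run-time switch between paletted and true color output. */
typedef enum { RENDER_PALETTED, RENDER_TRUECOLOR } RenderMode;

typedef struct {
    RenderMode mode;
    int width, height;
    void *pixels;              /* uint8_t[] if paletted, uint32_t[] if true color */
} Canvas;

typedef void (*FillSpanFn)(Canvas *c, int x1, int x2, int y, uint32_t color);

static void fill_span_pal(Canvas *c, int x1, int x2, int y, uint32_t color)
{
    uint8_t *row = (uint8_t *)c->pixels + (size_t)y * c->width;
    for (int x = x1; x <= x2; x++)
        row[x] = (uint8_t)color;             /* color is a palette index */
}

static void fill_span_rgb(Canvas *c, int x1, int x2, int y, uint32_t color)
{
    uint32_t *row = (uint32_t *)c->pixels + (size_t)y * c->width;
    for (int x = x1; x <= x2; x++)
        row[x] = color;                      /* color is packed 0xAARRGGBB */
}

/* Chosen once per mode switch, not at compile time. */
static FillSpanFn select_fill_span(RenderMode mode)
{
    return mode == RENDER_TRUECOLOR ? fill_span_rgb : fill_span_pal;
}

/* Allocates 1 or 4 bytes per pixel depending on mode; returns 0 on success. */
static int canvas_init(Canvas *c, RenderMode mode, int width, int height)
{
    size_t bytes_per_pixel = (mode == RENDER_TRUECOLOR) ? 4 : 1;
    c->mode = mode;
    c->width = width;
    c->height = height;
    c->pixels = calloc((size_t)width * height, bytes_per_pixel);
    return c->pixels ? 0 : -1;
}
```

Under this design the existing assembly routines could stay behind the paletted function pointers while the true color path uses C/SSE2 drawers.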

Just to be clear: I'm not suggesting my current branch is pull request ready right now. It is more what it would take on my behalf to make it so.

I don't know whether you were referring to the assembly that ships with ZDoom or to some you wrote yourself, but as far as the paletted routines are concerned: the point of the assembly isn't to beat the compiler through better code layout. The point is to beat the compiler by using self-modifying code, which isn't feasible with a high-level language. Without the ability to bake rarely changed but non-constant values directly into the functions, a lot more time is spent shuffling data between registers and memory, because the x86's seven usable general-purpose registers are not enough. With self-modification, the only memory accesses that need to occur during the inner loops are directly related to texture mapping. (A processor like the PowerPC, with its 32 registers, might have enough that writing the assembly is pointless. Too bad those are a dead end in the home computer market.)
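The register pressure argument is easiest to see in a plain C column drawer. The sketch below is modeled loosely on vanilla Doom's R_DrawColumn, not on ZDoom's assembly; `draw_column` and its parameters are illustrative. It keeps seven values live across the loop plus a temporary for the fetched texel, so on 32-bit x86 a compiler must keep some of them in memory, whereas self-modifying code can patch loop-invariant values (for example the pitch) into the instructions as immediates.

```c
#include <stdint.h>

typedef int32_t fixed_t;      /* 16.16 fixed point, as in Doom */
#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

/* Simplified paletted column drawer. Live across the loop: dest, count,
   frac, fracstep, pitch, source, colormap (seven values), plus a
   temporary for the texel index. */
static void draw_column(uint8_t *dest, int count, int pitch,
                        const uint8_t *source, const uint8_t *colormap,
                        fixed_t frac, fixed_t fracstep)
{
    while (count-- > 0) {
        /* Texture fetch (128-texel column, as in vanilla Doom), then light remap. */
        *dest = colormap[source[(frac >> FRACBITS) & 127]];
        dest += pitch;        /* next row of the frame buffer */
        frac += fracstep;     /* step through the texture in fixed point */
    }
}
```

Only `source[...]` and `colormap[...]` are reads that belong to texture mapping; in the C version, any of the other values that don't fit in registers add extra loads or stores on each iteration.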

dpJudas wrote: I'm wondering what ZDoom's stance is on all of this

There is interest, but like Graf said, it needs to be selectable at run-time.

I fixed the blurry texture lines issue (how I could have missed that, I don't know; maybe I need glasses). It was basically a big/little endian issue in my SSE code. I'm not able to reproduce the native resolution crash, but that might be because I had hardcoded the video mode list to a few select resolutions. There is an issue in ZDoom where it cuts off video modes if it finds too many available.
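The post doesn't show the faulty code, so purely as an illustration of this class of bug: a pixel packed as a 0xAARRGGBB integer is stored in memory as B, G, R, A on a little-endian CPU such as x86, so byte-wise SIMD code sees blue in the lowest lane. Mixing up the integer view and the byte view swaps channels or shifts data between lanes. The helpers below are hypothetical, and the byte order noted holds only on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Packs a pixel as 0xAARRGGBB. On little-endian x86 the bytes land in
   memory as B, G, R, A (the BGRA8 order), so code that works on bytes,
   e.g. after _mm_unpacklo_epi8, sees blue in the lowest lane. */
static uint32_t pack_argb(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Copies out the in-memory byte order of a packed pixel. */
static void pixel_bytes(uint32_t pixel, uint8_t out[4])
{
    memcpy(out, &pixel, sizeof pixel);
}
```

For example, `pixel_bytes(pack_argb(0x11, 0x22, 0x33, 0x44), b)` fills `b` with 0x33, 0x22, 0x11, 0x44 on x86: blue first, alpha last.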

I should soon have a new version ready with true color on/off toggle in place, if you'd like to help out with some more testing.