Mocha Doom TrueColor teaser...

I don't think you should bother with such micro optimizations. Personally I would approach speeding this up from a completely different perspective if I were working on a uni-paletted software renderer:

Given that we know all pixel RGB values in the scene come from a relatively small set of colors in the shared palette, I would optimize along the lines of batching together pixels in the frame by palette index (perhaps using something like a skip list, with the pixels ordered by Z-index), for processing by multiple threads on the one buffered frame. For example, thread A would process all pixels with color idx #1, thread B would process all pixels with color idx #2 (etc.). Each thread would then process each Z fade step only once as it navigates the skip list for its palette index.
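
A rough sketch of that batching idea, under several assumptions: a plain dictionary stands in for the skip list, `linear_fade` is a made-up fade function, and the outer loop runs serially where each iteration would be handed to its own thread.

```python
from collections import defaultdict

def batch_by_palette_index(indices, depths):
    """Group pixel positions by palette index, then by Z fade step."""
    batches = defaultdict(lambda: defaultdict(list))
    for pos, (idx, z) in enumerate(zip(indices, depths)):
        batches[idx][z].append(pos)
    return batches

def shade_batch(rgb, steps, fade, out):
    """Work for one thread: fade `rgb` once per Z step, then fan the result out."""
    for z, positions in sorted(steps.items()):
        shaded = fade(rgb, z)  # computed once per (palette index, Z step)
        for pos in positions:
            out[pos] = shaded

def linear_fade(rgb, z, max_z=32):
    """Hypothetical fade: brightness drops linearly with the Z step."""
    return tuple(c * (max_z - z) // max_z for c in rgb)

palette = {1: (255, 0, 0), 2: (0, 0, 255)}  # toy two-color palette
indices = [1, 2, 1, 1, 2, 2]                # palette index per pixel
depths = [0, 0, 16, 16, 16, 0]              # Z fade step per pixel
frame = [None] * len(indices)

# Each iteration is independent and could run on its own thread.
for idx, steps in batch_by_palette_index(indices, depths).items():
    shade_batch(palette[idx], steps, linear_fade, frame)
```

Each batch writes to its own set of pixel positions, so the batches never touch the same output slot.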

What you suggest would require an even more exhaustive search and memory bookkeeping for Z-depths: at worst, each pixel you see on the screen has its own Z-depth, if they are all far enough not to benefit from closeness. Lumping by color would make no sense from a workload balancing POV: you could just as well say "Thread 1/2 takes half of the pixels, thread 2/2 takes the other half". Syncing 256 threads 35 times a second? No thanks.

This would be, effectively, a first step towards a non-Doom software renderer. You might as well get rid of the whole column-based rendering merry-go-round and rewrite the renderer.

I got a weird idea. What if you were to use a 16-bit colormap that represented the same 8-bit palette being overlaid over itself with transparency? (similar to Hexen's TINTAB lump) NOT 16-bit RGB, but rather 16-bit indexed colors. Then you could use the normal 8-bit palette in rendering, but use temporal dithering and stuff for the mixed colors that are outside the main first 256 color palette.

The 16-bit indexed palette would have 65280 unique colors (256 x 256 = 65536 entries, minus the 256 entries where a color is mixed with itself, which are redundant).
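
The count can be checked with a little arithmetic. The packing below (first index in the high byte, second in the low byte) is only an assumed layout, since the post does not specify one. Note also that if the mix is a symmetric 50/50 blend, A-over-B and B-over-A give the same color, so the number of truly distinct mixes would be half of that.

```python
PALETTE_SIZE = 256

ordered_pairs = PALETTE_SIZE * PALETTE_SIZE                # every (A, B) combination
self_mixes = PALETTE_SIZE                                  # A mixed with A adds nothing new
distinct_ordered = ordered_pairs - self_mixes              # the figure quoted in the post
distinct_unordered = PALETTE_SIZE * (PALETTE_SIZE - 1) // 2  # if A+B equals B+A

def pack_extended(a, b):
    """Assumed layout: color A in the high byte, color B in the low byte."""
    return (a << 8) | b

def unpack_extended(ext):
    """Recover the two 8-bit palette indices from a 16-bit extended index."""
    return ext >> 8, ext & 0xFF
```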

I got a weird idea. What if you were to use a 16-bit colormap that represented the same 8-bit palette being overlaid over itself with transparency?

So, in essence, make a 16-bit Boom TRANMAP lump? It could be done, but it would give very little visual advantage, and it could only be used to blend the One True Doom Palette (or colormap #0) with itself.

The actual TRANMAP is supposed to return indexes so that after "blending", a further colormap effect can be applied (e.g. two colors A and B map to color C, which is then mapped to color Z due to sector lighting). If you make the TRANMAP contain precise RGB values instead, you won't be able to use colormaps for the final column rendering.
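
To make the index-in, index-out chain concrete, here is a toy sketch. The four-color palette, the 50/50 blend and the two-level `colormaps` table are all invented for illustration; a real TRANMAP covers 256 x 256 entries and is not necessarily built this way.

```python
def nearest_index(palette, rgb):
    """Palette index whose color is closest to `rgb` (squared distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(palette[i], rgb)),
    )

def build_tranmap(palette):
    """tranmap[a][b] is the palette index closest to a 50/50 mix of a and b."""
    size = len(palette)
    return [
        [
            nearest_index(
                palette,
                tuple((x + y) // 2 for x, y in zip(palette[a], palette[b])),
            )
            for b in range(size)
        ]
        for a in range(size)
    ]

# Toy palette: black, white, mid gray, dark gray.
palette = [(0, 0, 0), (255, 255, 255), (128, 128, 128), (64, 64, 64)]
tranmap = build_tranmap(palette)

# Hypothetical light levels: row 0 is full bright, row 1 darkens each color.
colormaps = [
    [0, 1, 2, 3],
    [0, 2, 3, 0],
]

blended = tranmap[0][1]       # colors A and B map to an index C ...
lit = colormaps[1][blended]   # ... which sector lighting then maps to Z
```

Because `blended` is still an index, the lighting lookup works unchanged. Had the table stored RGB triples, there would be nothing to feed into `colormaps`.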

To cover all such cases, you'd either need a whole load of LUTs (memory and cache killer), have to keep the 8-bit indexing until the very end (only the final colormap will be 16-bit), or have to go for full-blown RGB processing.

As usual, it's always a tradeoff between speed, convenience (from a programming standpoint) and flexibility. You can have any two of them at a time, but not all three.

E.g.

Mocha Doom/Doom Alpha approach: it has speed and convenience, but it's not flexible beyond what their fixed colormaps allow.

Full processing approach like _bruce_'s chocodoom truecolor branch: it has full flexibility and convenience, but sacrifices speed.

A theoretical approach that has flexibility and speed would combine elements of both (e.g. dynamically generated colormaps, mixed colormap/full processing with caching whenever possible) but would not be convenient from a programming standpoint. Mocha Doom is actually verging on this, in order to achieve palette/gamma combined effects.

Sodaholic said:

EDIT: Oh, and what if you could assign assets their own palette?

It's possible, I've seen it done in a commercial game that used 8-bit resources on a 16-bit canvas, but in Doom you have the brick wall of the lighting effects, which would force you to use full processing. However, that game (Warlords Battlecry series) "cheated" by using resources that not only were 8-bit with their own palette, but also carried their own precalculated palettes for certain common in-game effects (e.g. to make sprites appear darker during night, "frozen", "turned to stone" or "on fire"). The Doom equivalent would be to have each sprite carry its own colormaps for all possible light levels, in addition to its own palette. The sprites in WBC had only 3 or 4 alternative palettes, though, not 32.

Edit 2: I kinda like this last idea. Anyone willing to prepare me a prototype sprite using a UNIQUE 256-color palette (not Doom's), e.g. a recolored imp? I will do the rest. My idea would be to have an extra set of lump markers, e.g. XC_START and XC_END (eXtended Color), and have an extra lump for each sprite, e.g. TROOXC for imp (TROO). This lump would be just 32 colormaps in truecolor RGB format (32 x 768 bytes, 256 colors per map) or perhaps more. If the engine detects during loading that such a lump is available for a sprite, the sprite object will carry a reference to said colormap, and this colormap will be used instead of the common one for this sprite, thus allowing it to "break free", at least partially, from the One True Doom Palette.
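
A minimal sketch of how such a lump could be read, assuming the simplest possible layout: 32 colormaps stored back to back, each as 256 consecutive R, G, B byte triples. The layout, `parse_xc_lump` and `make_demo_lump` are assumptions for illustration, not an existing format or API.

```python
COLORMAPS = 32
COLORS = 256
LUMP_SIZE = COLORMAPS * COLORS * 3  # 32 x 768 bytes

def parse_xc_lump(data):
    """Split a raw lump into 32 colormaps of 256 (r, g, b) tuples each."""
    if len(data) != LUMP_SIZE:
        raise ValueError(f"expected {LUMP_SIZE} bytes, got {len(data)}")
    maps = []
    for m in range(COLORMAPS):
        base = m * COLORS * 3
        maps.append(
            [tuple(data[base + 3 * i: base + 3 * i + 3]) for i in range(COLORS)]
        )
    return maps

def make_demo_lump():
    """Stand-in data: a gray ramp that darkens from map 0 to map 31."""
    data = bytearray()
    for m in range(COLORMAPS):
        for i in range(COLORS):
            gray = i * (COLORMAPS - 1 - m) // (COLORMAPS - 1)
            data += bytes((gray, gray, gray))
    return bytes(data)

sprite_colormaps = parse_xc_lump(make_demo_lump())
```

A sprite carrying such a table would look up `sprite_colormaps[light_level][texel]` instead of going through the shared colormap.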

Engines that don't have extended color renderers can ignore it or try remapping the extended palettes to the Doom one.

What I was thinking is that there could be a fake "extended" palette, in what I was describing earlier. There'd be a fake palette of up to 65536 "extended palette" indices (likely fewer, to account for redundancy), and each of these indices would contain 2 "normal palette" indices (normal colors A and B).

There would be 2 video buffers: the render buffer, which uses the extended palette, and the display buffer, which uses the normal palette. The render buffer would be rendered using the "extended palette" indices. The results would be transferred to the display buffer, at which point it would display both normal colors A and B in a dithering pattern.
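
The transfer step could look roughly like this. The checkerboard pattern and the tiny `pairs` table are assumptions, since the post only says "a dithering pattern".

```python
def resolve(render_buffer, width, pairs):
    """Expand extended indices into normal palette indices on a checkerboard."""
    display = []
    for pos, ext in enumerate(render_buffer):
        x, y = pos % width, pos // width
        a, b = pairs[ext]
        # Even cells show color A, odd cells show color B.
        display.append(a if (x + y) % 2 == 0 else b)
    return display

# Hypothetical extended palette: entry 0 is plain color 0, entry 1 mixes 2 and 3.
pairs = {0: (0, 0), 1: (2, 3)}

render_buffer = [1, 1, 1, 1,
                 1, 1, 0, 0]  # 4 x 2 pixels of extended indices
display_buffer = resolve(render_buffer, 4, pairs)
```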

Do you understand how this method is intended to work, or was I too unclear in describing it again?

Do you understand how this method is intended to work, or was I too unclear in describing it again?

A practical example would be better, I think. What if I want to mix two colors from the One True Doom Palette? For ease, let's say that color #0 is RGB #000000 and color #1 is RGB #FFFFFF, and that their "middle" color, RGB #808080, does NOT exist in the palette, though it can appear in intermediate calculations. Let's say that the closest is color #255 with values #909090, just to fuck up with indexes ;-)

Exactly what would be rendered on screen? And what indexes would appear in your process, during all phases?

Let's say that this is the palette: index 0 is #000000, index 1 is #FFFFFF, index 2 is #909090, and index 3 is #707070. If you want the index that is a mix of 0 and 1, for which the intended result is #808080, what you would do is dither indices 2 and 3 together.

In terms of implementation, it's all arbitrary: all the engine itself ever does is use 2 palette indices per colormap index and dither those together. As for generating the colormap itself, I'll explain the exact procedure later (I'm currently at my cousin's house and don't have access to some tools that I could use to create an example with).
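
One possible way to pick the pair for a given target color, offered only as a guess and not as the procedure the poster had in mind: try every pair, keep the one whose average is closest to the target, and break ties in favor of the pair with the least contrast (so black + white loses to two near grays).

```python
from itertools import combinations_with_replacement

def best_dither_pair(palette, target):
    """Pair of palette indices whose 50/50 mix best approximates `target`."""
    def score(pair):
        a, b = palette[pair[0]], palette[pair[1]]
        # Compare a + b against 2 * target to stay in integers.
        error = sum(abs(ca + cb - 2 * ct) for ca, cb, ct in zip(a, b, target))
        contrast = sum(abs(ca - cb) for ca, cb in zip(a, b))
        return (error, contrast)
    candidates = combinations_with_replacement(range(len(palette)), 2)
    return min(candidates, key=score)

# The four-entry palette from the example above.
palette = [(0x00,) * 3, (0xFF,) * 3, (0x90,) * 3, (0x70,) * 3]
pair = best_dither_pair(palette, (0x80,) * 3)
```

With this palette the search lands on indices 2 and 3, matching the example: #909090 and #707070 average exactly to #808080, while black and white miss it by one step per channel.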

So in essence you're proposing real-time dithering for the Doom engine, while the display is still strictly 8-bit? That has nothing to do with hicolor or truecolor modes at all.

Apart from all the practical problems of implementing it (e.g. you need a minimum continuous surface with exactly the same "dithered color" to apply, so that means columns closer than a certain distance), dithered graphics in general look fugly in Doom (e.g. textures). Plus, if you really bothered to implement all of this real-time dithering, it would be a waste and pointless to use with a hicolor or truecolor display: you could just as well do full RGB processing instead.

Maybe you had an effect similar to this in mind?

This one actually uses (relatively) full-resolution pre-dithered textures that don't pixelate at any scale.

So in essence you're proposing real-time dithering for the Doom engine, while the display is still strictly 8-bit? That has nothing to do with hicolor or truecolor modes at all.

I know, but I figured it was somewhat related on the topic of extended colormaps.

Maes said:

dithered graphics in general look fugly in Doom (e.g. textures).

I'm not talking about dithered assets at all, but rather the screen itself being dithered. This would look acceptable at 320x240, and even more so at 640x480. This is because the dithering would be as small as the screen's pixels, not larger like assets being scaled.

Take a look at PrBoom's lighting interpolation (for which it only offers dithering) for an example of how I mean the dithering would be applied on the displayed screen. The difference here is that instead of just dithering existing 8-bit colormap indices together, it would use a fixed number of colormaps, and the color indices themselves would be dithered.

Maes said:

Maybe you had an effect similar to this in mind?

[ZX Spectrum Doom video]

This one actually uses (relatively) full-resolution pre-dithered textures that don't pixelate at any scale.

That's generally what I meant. Thing is, it'd actually look DECENT because the user would be using MUCH higher resolutions than that homebrew program was using.

No it wouldn't. Not even the sky looks "acceptable" at vanilla resolution with dithering on. As for higher resolutions, just try forcing 256-color mode on Windows (can be done, with some tweaking) and then try finding an app that won't barf or try changing the color depth: it will be force-dithered to 8-bit (or even to 4-bit, if you manage to break the drivers), and tell me how it looks.

Or try dithering a high-resolution screenshot from an OpenGL port to 256 colors with an image editor.

It might be OK for a one-time gimmick mode but playing continuously with it on...I think there was a port for early Windows CE devices that did use a real-time dithered display.

I beg to differ. Seriously, what is with all the hate on dithering around here at those resolutions? It looks just fine! SNES games used dithering, and the SNES had an even lower resolution than vanilla Doom!

Beyond that, remember that I mentioned temporal dithering. That means that it inverts the dithering pattern every screen refresh, thus making it look like a solid color.
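
A small sketch of the inversion, assuming a checkerboard whose phase flips with the frame number and a display that shows every frame for the same amount of time. The averaging at the end only models what the eye would ideally perceive.

```python
def dither_frame(width, height, pair, frame):
    """Checkerboard of the two colors; the pattern inverts on every frame."""
    a, b = pair
    return [
        a if (x + y + frame) % 2 == 0 else b
        for y in range(height)
        for x in range(width)
    ]

# Gray levels 0x90 and 0x70 on a 4 x 2 patch, two consecutive refreshes.
even = dither_frame(4, 2, (0x90, 0x70), 0)
odd = dither_frame(4, 2, (0x90, 0x70), 1)

# Idealized perception: each pixel averaged over the two frames.
averaged = [(p + q) // 2 for p, q in zip(even, odd)]
```

Every pixel alternates between the two levels, so under the equal-time assumption each one averages to 0x80.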

Beyond that, remember that I mentioned temporal dithering. That means that it inverts the dithering pattern every screen refresh, thus making it look like a solid color.

On a platform without a guaranteed screen refresh rate? Good luck with that. The most likely outcome would be that you get a very noticeable color flickering even if you hack the refresh rate to be higher (e.g. output 70 frames for 35 tics, with alternating colors). And with refresh rates capped at 60 Hz for modern monitors, that's a no-go.

Flickering and HAM color effects on 8-bit and 16-bit platforms worked because the games were coded in a single-tasking environment and with a host of dedicated hardware and strong assumptions about the video hardware (output was on 50 Hz or 60 Hz CRT monitors with their own natural scanline & color blending, driven precisely and palette updates were performed each cycle no-matter-what, with the aid of the hardware). The same effects look fugly in emulators, and for good reason.

Plus, in order to get a good dithering, you NEED to render a full-color scene somewhere in memory, even if you don't display it. Even to produce pre-dithered graphics you need a non-dithered source, and dithering is quite time-consuming. What would be the minimum unit to dither? A visplane line? A patch column?

As I said, it would be a kind of expensive "fancy" effect, not something that you can pop in in lieu of full color support.

What you suggest would require an even more exhaustive search and memory bookkeeping for Z-depths: at worst, each pixel you see on the screen has its own Z-depth, if they are all far enough not to benefit from closeness. Lumping by color would make no sense from a workload balancing POV: you could just as well say "Thread 1/2 takes half of the pixels, thread 2/2 takes the other half". Syncing 256 threads 35 times a second? No thanks.

I think you are missing the obvious - there would be no need to sync the threads. Thus it's a true divide and conquer subdivision of the work which is immediately suitable for parallelization.

This would be, effectively, a first step towards a non-Doom software renderer. You might as well get rid of the whole column-based rendering merry-go-round and rewrite the renderer.

So what if it is? You can't optimize effectively unless you are willing to change approach. Anything else is what I would class as micro-optimization through which you can't ever hope to beat your lower bound.

I think you are missing the obvious - there would be no need to sync the threads. Thus it's a true divide and conquer subdivision of the work which is immediately suitable for parallelization.

Well, if mid-work syncing was required, that would suck terribly and would not even be worth discussing here.

Even if it isn't, however, you'd still need to sync the threads on a barrier at the end of the frame (at the very least), and most thread barrier implementations are not exactly linear in efficiency with an increasing number of threads. Let alone that using 256 threads with a memory-intensive operation would be a terrible parallelization strategy, at least in general-purpose RAM. On a GPU, that's another story.

DaniJ said:

So what if it is? You can't optimize effectively unless you are willing to change approach. Anything else is what I would class as micro-optimization through which you can't ever hope to beat your lower bound.

I'd call that "changing the renderer into something else with its own ailments and quirks", rather than hyper-optimizing the old one.

Syncing at the end of the frame would only consist of waiting for all threads to complete. As such you don't need anything more complex than an atomic counter which each thread decrements when done. Consequently I don't see this as having any serious impact on the efficiency of the algorithm.
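
The end-of-frame wait could be sketched like this. Python has no lock-free atomic integer, so a lock-guarded counter stands in for the atomic decrement here; `CompletionCounter` and `render_frame` are illustrative names, not part of any port.

```python
import threading

class CompletionCounter:
    """Counts down as workers finish; wait() returns once it reaches zero."""

    def __init__(self, count):
        self._remaining = count
        self._lock = threading.Lock()   # emulates an atomic decrement
        self._done = threading.Event()
        if count == 0:
            self._done.set()

    def finish_one(self):
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._done.set()

    def wait(self):
        self._done.wait()

def render_frame(batches, work):
    """Run `work` on every batch in its own thread and wait for all of them."""
    results = [None] * len(batches)
    counter = CompletionCounter(len(batches))

    def run(slot, batch):
        try:
            results[slot] = work(batch)  # each thread writes only its own slot
        finally:
            counter.finish_one()

    for slot, batch in enumerate(batches):
        threading.Thread(target=run, args=(slot, batch)).start()
    counter.wait()  # the only synchronization point in the frame
    return results

totals = render_frame([[1, 2], [3], [4, 5, 6]], sum)
```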

Also, each thread would have its own work data and there would be no need to sync internally either. Furthermore, the total amount of storage needed is known ahead of time (as you say, a function of the maximum number of palette indexes and unique Z coordinates). All that varies is the amount allocated to each thread.

Certainly, from the POV of complexity analysis and predictability, such a strategy seems alarming upon first inspection. However, look a little closer at the worst-case scenarios (i.e., all unique palette indices and Z depths) and you should immediately notice how unlikely that is in practice. I agree it's an unconventional strategy, but I've yet to see anything that would convince me that it wouldn't achieve a significant performance improvement.

Now, traditionalists might argue that any parallel subdivision of work should strive to evenly distribute that work across N threads. However, that doesn't really matter, and in this case it ignores the properties of the job which can be exploited for efficiency.

I'm not suggesting you rip Mocha Doom's renderer apart to achieve this. My intention was to simply highlight the existence of an alternative approach that could theoretically achieve higher performance.

I'm not suggesting you rip Mocha Doom's renderer apart to achieve this. My intention was to simply highlight the existence of an alternative approach that could theoretically achieve higher performance.

I think this would just be a thin layer over full RGB processing techniques. With colormaps, once you lay a pixel down, you're finished with that pixel.

Laying only color (or palette) information down and applying the lighting a posteriori (using whatever parallelization strategy works best) using recorded Z-depth implies both using additional storage for Z-depths and, in essence, doing full RGB processing (brightness or hue modifications) on existing colors.
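
The difference between the two approaches can be sketched side by side. The `light` falloff and the toy palette are invented for illustration. The point is only that the deferred version needs a second buffer and per-channel arithmetic, while the colormap version is a single table lookup per pixel.

```python
def draw_with_colormap(framebuffer, pos, colormap, texel):
    """Classic path: one lookup, and the pixel is finished."""
    framebuffer[pos] = colormap[texel]

def deferred_lighting(index_buffer, z_buffer, palette, light):
    """Deferred path: needs a Z buffer plus RGB math on every pixel."""
    return [
        tuple(c * light(z) // 256 for c in palette[i])
        for i, z in zip(index_buffer, z_buffer)
    ]

def light(z):
    """Hypothetical falloff: 256 means full brightness, 0 means black."""
    return max(0, 256 - 8 * z)

palette = [(0, 0, 0), (200, 100, 50)]
index_buffer = [1, 1, 0]   # palette index per pixel
z_buffer = [0, 16, 4]      # extra storage the colormap path never needs
lit = deferred_lighting(index_buffer, z_buffer, palette, light)

framebuffer = [0, 0, 0]
draw_with_colormap(framebuffer, 1, [0, 5], 1)  # toy two-entry colormap
```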

It might be a good performance enhancement strategy for ports such as _bruce_'s truecolor chocolate branch, but it will never match the efficiency of simply writing down a pixel based on a value from a colormap. *

* With some reserve if you are trying to do really complex coloring or alterations and you don't only store Z-depth, but also other information, e.g. special sector properties such as colored light. In that case, it might be more efficient and less troublesome from a coding POV than using colormaps.

Needless to say, such a rendering scheme would have to be applied in three steps: one pass for solid walls, one pass for visplanes, and one pass for sprites and masked/translucent walls, each with their own Z-depth information and priorities.

Any index approach that has more than 8 bits requires 2 bytes for a color index, and that could just as easily be coded as 16-bit RGB. Once you have 16-bit RGB coding, transparency effects can be done by RGB, and intensity effects are even easier.
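
As an illustration of how cheap blending gets once the pixel itself is RGB, here is a sketch assuming the common 5-6-5 bit layout (the post only says "16bit RGB", so the layout is an assumption). The 50/50 blend drops the lowest bit of each channel with a mask, then halves and adds both pixels in one go.

```python
def pack565(r, g, b):
    """Pack 8-bit channels into a 16-bit pixel: 5 bits R, 6 bits G, 5 bits B."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def unpack565(pixel):
    """Expand a 16-bit pixel back to 8-bit channels, replicating the top bits."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 5) & 0x3F
    b = pixel & 0x1F
    return (r << 3 | r >> 2, g << 2 | g >> 4, b << 3 | b >> 2)

# 0xF7DE clears the lowest bit of each channel, so the shift cannot leak
# a bit from one channel into its neighbor.
HALF_MASK = 0xF7DE

def blend565(p, q):
    """50/50 blend of two RGB565 pixels, all three channels at once."""
    return ((p & HALF_MASK) >> 1) + ((q & HALF_MASK) >> 1)

white = pack565(255, 255, 255)
black = pack565(0, 0, 0)
gray = blend565(white, black)
```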

The only reason to avoid RGB would be to use 9 bit indexed lookup tables for transparency and light. If you want to commit the memory then use 9 bit indexed colormap and transparency maps that hold 9 bit indexes. Have to make your own 512 entry Doom colormap too.
Would have to use a 9 bit pixel framebuffer and translate it to RGB using a final colormap lookup (to RGB).
Fast, better color, but a huge waste of 7 bits per entry and it is not supported by any hardware.
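
A toy version of that pipeline, with a made-up gray-ramp palette and a single made-up darkening colormap, shows where the 7 wasted bits come from: the smallest practical storage unit for a 9-bit index is a 16-bit word.

```python
from array import array

INDEX_BITS = 9
ENTRIES = 1 << INDEX_BITS  # 512 palette entries

# Hypothetical 512-entry palette: a gray ramp, for illustration only.
palette_rgb = [(i // 2, i // 2, i // 2) for i in range(ENTRIES)]

# Hypothetical light-level colormap: 9-bit index in, 9-bit index out.
colormap_dark = array("H", (i // 2 for i in range(ENTRIES)))

# Framebuffer of 9-bit indices, stored in 16-bit words.
framebuffer = array("H", [511, 256, 0])

# Lighting stays index-to-index; only the final step produces RGB.
lit = array("H", (colormap_dark[p] for p in framebuffer))
rgb = [palette_rgb[p] for p in lit]

wasted_bits = framebuffer.itemsize * 8 - INDEX_BITS
```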

The dithering described seems to be at least as complicated as handling RGB.

Video cards and framebuffers have already transitioned to RGB, and if the draw library is going to translate your palette colors to RGB anyway, then why dither palette colors? For less trouble you can draw in the native RGB format, and any effort you would make to dither could be used to select a blended RGB color.

Transparency effects have always involved reprocessing framebuffer pixels, even with colormaps. That is the problem that prevents using transparency lookup tables with 32bit draw.

I find that playing at 800x600 with better RGB colors gives adequate speed and improved appearance. The color space is more important than the screen resolution, so putting execution time into RGB processing of pixels can be balanced against screen resolution. The game still has low-res sprites and textures that are readily apparent, so I find that screens above 1024 do not improve things much.

I have a weird idea for hicolor doom using nothing but 256 color VGA cards.

Run Doom in DOS (probably a modified Chocolate Doom), and have two computers running the game in sync simultaneously in just single player. They both use different sets of graphics with different palettes; the graphics are derived from the stock IWAD art, but adapted for these new palettes. The VGA cards are totally synced: the primary one has a grayscale palette of values from 0-63, and the secondary one uses a palette that contains only color information (hue and saturation, but nothing else).

The secondary palette shall be derived from a true color colormap, but will be optimized to use the most common and similar of the 256 values (when taking the 18-bit RGB color limits into account). Note that the secondary version does NOT contain any value/luma/brightness information.

Then have the two VGA cards output YUV video signals, use ONLY the luma from the primary, and use ONLY the chroma from the secondary, and then convert it back to RGB to be used by a VGA monitor.
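
In software terms, the recombination step amounts to the following. The sketch assumes the classic analog YUV weights (0.299/0.587/0.114 for luma, 0.492 and 0.877 for the chroma scales) and full 0-255 channels instead of the 6-bit VGA DAC range. It does no clamping, and says nothing about whether the two analog signals could actually be kept in sync.

```python
def rgb_to_yuv(r, g, b):
    """Split an RGB color into luma (Y) and two chroma components (U, V)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)

def yuv_to_rgb(y, u, v):
    """Exact inverse of rgb_to_yuv."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

def combine(primary_rgb, secondary_rgb):
    """Luma from the primary card's pixel, chroma from the secondary's."""
    y, _, _ = rgb_to_yuv(*primary_rgb)
    _, u, v = rgb_to_yuv(*secondary_rgb)
    return tuple(round(c) for c in yuv_to_rgb(y, u, v))

# Primary shows a gray whose level is the wanted brightness;
# secondary shows a pixel carrying the wanted hue and saturation.
pixel = combine((124, 124, 124), (200, 100, 50))
```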