What came after the Jaguar was the PS1 which for all its greatness, ushered in corporate development and with it the bleached, repetitive, bland titles which for the most part we're still playing today. - David Wightman

DML wrote: So if anyone wants to have a go at it on the Falcon (or another retro box with a RISC chip inside), here's how the mapper works:

First you need to build a table of coefficients for quadratic equations, offline.

This involves solving and storing the A, B, C terms in order to later query y = Ax^2 + Bx + C for any x, where x is essentially the scene z term for a given pixel and the result y approximates 1/z. I'm using a table of 1024 equations, but you can optimize in either direction to save space or gain range/accuracy.

It should be possible to use linear equations if the table is big enough, and perhaps save some cycles while still getting a decent(ish) approximation - but for the Falcon's DSP it's not much trouble to just do it properly.

Note that the table stores equations - triplets, not single values. This means you're performing a lookup on a set of curves - not single value samples.

Generating the table is a bit hard - it involves performing a best-fit on equations using a set of sample points on each curve. I use subdivision but random sampling may work. For most of the table entries, the same 3 points will converge to the best fit, but for entries near the ends of the table the choices will move due to clamping effects enforced on A,B,C for the legal fixedpoint range. It's important to be aware of this detail or you'll get stuck. There are a few gotchas involved in generating the table and due to the nature of best-fit algorithms, you can end up with a broken solver that looks like it is nearly working - beware.
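For anyone who wants to experiment before tackling fixed point, here is a minimal floating-point sketch of the table-building step. It is not DML's solver: instead of his best-fit search (which settles on particular points per curve and has to cope with clamping of A, B, C), it does a plain least-squares fit over evenly spaced samples and ignores clamping entirely. The depth range (X_MIN), segment width and sample count are made-up values; only the 1024-entry table size comes from the post. Each curve is written in a local coordinate u in [0, 1] across its own segment, which is an assumption that keeps the fit well conditioned.

```python
# Hypothetical parameters: only the 1024-entry table size comes from the post.
X_MIN = 128.0      # assumed smallest depth term the table covers
SEG_WIDTH = 16.0   # assumed width of each table segment
NUM_SEGS = 1024    # number of quadratic curves in the table
SAMPLES = 17       # evenly spaced sample points per segment


def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_segment(x0):
    """Least-squares fit of y = A*u^2 + B*u + C to 1/x, with u = (x - x0) / SEG_WIDTH."""
    us = [i / (SAMPLES - 1) for i in range(SAMPLES)]
    ys = [1.0 / (x0 + u * SEG_WIDTH) for u in us]
    s = [sum(u ** k for u in us) for k in range(5)]                # power sums of u
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    normal = [[s[4], s[3], s[2]],                                  # normal equations
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    return solve3(normal, [t[2], t[1], t[0]])                      # -> [A, B, C]


def build_table():
    """One (A, B, C) triplet per segment: the table stores curves, not samples."""
    return [fit_segment(X_MIN + k * SEG_WIDTH) for k in range(NUM_SEGS)]


def approx_recip(table, x):
    """Pick the curve covering x and evaluate it; x must lie inside the table range."""
    pos = (x - X_MIN) / SEG_WIDTH
    k = min(int(pos), NUM_SEGS - 1)
    u = pos - k
    a, b, c = table[k]
    return (a * u + b) * u + c      # Horner form: two multiply-adds
```

Evaluating an entry in Horner form costs two multiply-adds, which is why a MAC unit matters so much for this approach.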

Despite those problems, it's relatively easy to understand and test in floating point because A, B, C can be kept in their natural range. A fixed-point version, however, is much more difficult, since the terms need to be normalized to make the most of the available bits, and for optimal precision each term must be normalized differently. This part is a challenge, but it can be shown to (just) work with as few as 23 bits + sign for all source terms.
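To show what "differently normalized" can mean in practice, here is a hypothetical sketch of evaluating one table entry in fixed point. Each coefficient gets its own shift so that its mantissa fills the 23 magnitude bits, u is a Q0.23 fraction, and every product is shifted to the scale of the term it is added to. The (mantissa, shift) storage format and the function names are assumptions, not DML's layout, and because Python integers never overflow, the sketch ignores the accumulator-width limits a real DSP would impose.

```python
WORD_BITS = 24                           # DSP56k-style word: 1 sign bit + 23 bits
MANT_MAX = (1 << (WORD_BITS - 1)) - 1


def shift(v, n):
    """Arithmetic shift right by n, or left by -n when n is negative."""
    return v >> n if n >= 0 else v << -n


def normalize(value):
    """Store a coefficient as (mantissa, shift) with value ~= mantissa / 2**shift.

    The shift is chosen per coefficient so the mantissa uses as many of the
    23 magnitude bits as possible (hypothetical storage format)."""
    if value == 0:
        return 0, 0
    s = 0
    while abs(value) * 2 ** (s + 1) <= MANT_MAX:   # grow while it still fits
        s += 1
    while abs(value) * 2 ** s > MANT_MAX:          # shrink if it never fitted
        s -= 1
    return round(value * 2 ** s), s


def eval_fixed(coeffs, u_q23):
    """Evaluate y = (A*u + B)*u + C for a Q0.23 fraction u (0 <= u < 2**23).

    Each full-width product is shifted to the scale of the term it is added to.
    Returns (y_mantissa, y_shift) with y ~= y_mantissa / 2**y_shift."""
    (ma, sa), (mb, sb), (mc, sc) = coeffs
    t = shift(ma * u_q23, sa + 23 - sb) + mb    # A*u + B, at B's scale
    y = shift(t * u_q23, sb + 23 - sc) + mc     # (A*u + B)*u + C, at C's scale
    return y, sc
```

Note that every evaluation needs two variable shifts, which is exactly where a CPU or DSP with slow shifting suffers.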

For this to be efficient, you really need a RISC device with a multiply-accumulate and fast shifting capability, or at the very least a very fast multiplier and careful coding. Unfortunately the Falcon's DSP is terrible at shifting, which presents some problems of its own when trying to make this fast. Left as an exercise for the reader.

...
Now multiply x0 (1/z) by uz and vz and combine the results into a texture address. uz and vz should be pre-normalized.
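A rough sketch of that per-pixel step, assuming the usual perspective-correct arrangement: a depth term iz plus uz and vz are stepped linearly along the span, the reciprocal of iz is looked up (a plain division stands in for the quadratic table here), and the products are wrapped and packed into a texel index. The function name, the row-major power-of-two texture and the wrap-around addressing are assumptions for illustration; "pre-normalized" is taken to mean uz and vz are already scaled so the products land in texel units.

```python
import math


def texture_span(texture, log2_w, log2_h, iz, uz, vz, d_iz, d_uz, d_vz, count):
    """Return `count` texels along one span (hypothetical helper, not DML's code).

    iz, uz, vz: depth term and pre-normalized u/v terms at the span start,
    stepped linearly by d_iz, d_uz, d_vz per pixel.
    texture: row-major list of 2**log2_w by 2**log2_h texels, wrapped on both axes."""
    mask_u = (1 << log2_w) - 1
    mask_v = (1 << log2_h) - 1
    out = []
    for _ in range(count):
        recip = 1.0 / iz                          # stand-in for the quadratic-table lookup
        u = math.floor(uz * recip) & mask_u       # multiply, then wrap to texture width
        v = math.floor(vz * recip) & mask_v
        out.append(texture[(v << log2_w) | u])    # combine into a texel address
        iz += d_iz
        uz += d_uz
        vz += d_vz
    return out
```

For example, on a 4x4 texture holding 0..15, `texture_span(list(range(16)), 2, 2, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 5)` walks along row 1 and wraps: `[4, 5, 6, 7, 4]`.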

Interesting thing... that thread says the PSX has dedicated 3D hardware but the Jaguar didn't. That's actually not really true. The PSX does not actually have what most people would consider to be "dedicated 3D hardware". At least, not in a way that would distinguish it from Jaguar.

The main processor in the PSX is a MIPS R3000, which is a basic, general-purpose RISC processor otherwise known for its use in early Silicon Graphics workstations. Sony added a co-processor (the "GTE") that implements matrix math functions. This is used for the basic 3D transformations of polygon vertices but has nothing to do with the actual pixel pushing. By comparison, the Jaguar GPU has similar matrix math instructions, so the two machines are on fairly even ground at that point.

The rendering loop of a PSX game is basically this:

Mike, if you are reading this thread, do you have any idea how the PSX version of Quake II sidestepped the complex geometry lighting issue? DML hazarded a guess:

I guess it's probably something quite clever and PSX-specific. Maybe they subdivide the faces efficiently and vertex-light them (although it doesn't look that way to me; I didn't notice any Mach banding), or maybe they use the hardware triangle engine to compose lightmaps with textures in VRAM on the fly.

After experimenting with textures, he has gone back to flat-shaded polygon work.

DML wrote: FPS at the 'fatal1' start point rose from 12 to over 16 fps.

I'm going to keep hacking at this until I get completely stuck for ways to speed it up in TC mode, and then will switch back to texturing performance.

Which Quake II map is the 'fatal1' start point on?

There are no textures in this one - it is flat-fill @ 320x160 / 16bit TC. This is the format I use for performance testing the engine code.

Anyway I think it's starting to run up against some hardware limits at 16/32, or at least design limits for what I've done with the program. I'm sure there are still ways to optimize it, but it's getting harder and taking longer with each try. The last optimization I tested was nasty, complicated and didn't really make much difference in the end... 1-2%. So I'm going to finally stop with this and fix the newly added bugs before looking at textures again.

Trivia: ARMA5 is a map I used to play at lunchtimes while I was working on PC games. I had a 450MHz PII (P3?) at the time, with an early NV graphics card. It's quite heavy going, but the old bird just about copes.

Quick test demonstrating working transparency without a z-buffer, and z-clipping of transparencies with an exaggerated near plane.

Another speedup is on the way. After fixing most of the correctness issues and getting a stable render at all z-distances, I tried dropping the span arithmetic from 48-bit to 24-bit effective precision and got nearly the same result. So there will soon be a 6x reduction in the amount of code needed to set up each span, and a 50% reduction in the data transmitted to the DSP per face, which is nice.

This guy is amazing.

Did you guys see the Egyptian motif in the last minute of that video? That was awesome! I love Egyptian stuff! So badass.

DML's blog on the Atari forum, where he posts updates on the optimizations he has made since the last video, makes impressive reading. He keeps finding more ways to speed things up.

I made a few more simple improvements that got rid of all of the padding nops and doubled the size of the jumptower. The impact of this on the speed of complex scenery is actually quite good: it spends more time drawing pixels and less flyback time on very short spans.

For now it's partly a negative trade, because it meant pushing some other code out of DSP fast memory (slowing other areas down) and causing more cache misses on the CPU side (bad!), but these can be bought back later with other changes. The fact that it is faster everywhere despite this is a good sign.

Imagine him or someone else doing these optimizations on the 32X or Jaguar, where the caches are much larger and things won't get 'bumped out' so easily.

DML wrote: So the other thing I had been working on is an alternate way to perform square-root operations for realtime 3D.

These are very expensive to perform via the 882 FPU, and even more so using algorithms on the CPU. Tables help but consume a lot of space to make a real difference, which makes them useless on a DSP 56k with very little RAM. The excellent Carmack/3DLabs sqrt() trick, which exploits the floating-point bit representation, deserves a mention. But it requires an FPU and is therefore still expensive (and limiting) on a Falcon, and useless on the DSP.
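For reference, the float trick in question is presumably the well-known bit-level inverse square root (the one that shipped in the Quake III Arena source). Here is a sketch that uses `struct` to reinterpret the 32-bit float bits, followed by one Newton-Raphson step; the function names are illustrative. It makes the point concrete: the whole trick depends on the IEEE float layout, so an integer-only DSP has nothing to exploit.

```python
import struct


def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for a positive value via the float bit trick."""
    i = struct.unpack("<I", struct.pack("<f", x))[0]   # reinterpret float32 bits as uint32
    i = 0x5F3759DF - (i >> 1)                          # halve the exponent, subtract from magic
    y = struct.unpack("<f", struct.pack("<I", i))[0]   # reinterpret back to a float
    return y * (1.5 - 0.5 * x * y * y)                 # one Newton-Raphson refinement step


def fast_sqrt(x):
    """sqrt(x) = x * (1 / sqrt(x))."""
    return x * fast_inv_sqrt(x)
```

After the single Newton step, the relative error stays below about 0.2%, which is fine for graphics but is still float work that the Falcon has to route through the FPU.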

(I will point out that square-root is highly valuable for 3D graphics. Having access to a fast sqrt() makes a real difference to what is possible!)

So far I had been using a modified/improved bitwise algorithm on the DSP, both integer and fixedpoint versions. This works quite well but requires 23 iterations of a 5-instruction sequence. That's 23*5*2 = 230+ cycles (!!!). I tried translating other algorithms to DSP but this remained the general winner for speed/accuracy. There is a partial-table solution which should be faster but it didn't save much and consumed a lot more space and registers. In any case all methods tried are either so slow (or so inaccurate) that they have limited use.
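The "traditional" bitwise method is presumably something along the lines of the restoring digit-by-digit square root, which produces one result bit per iteration. Here is a sketch for a 23-bit fraction. The Q0.23 input/output format is an assumption chosen to match the 56k's fractional words, and this is the textbook loop, not DML's modified version. 23 iterations yield a 23-bit result; at 5 instructions per iteration and 2 clock cycles per 56k instruction, that is where the 23 * 5 * 2 = 230 cycles come from.

```python
def sqrt_q23(x):
    """Square root of a Q0.23 fraction (value = x / 2**23, 0 <= x < 2**23).

    Restoring digit-by-digit method: one result bit per iteration, 23 iterations
    for a 23-bit result. Returns r with r / 2**23 ~= sqrt(x / 2**23), truncated."""
    n = x << 23                  # sqrt(x * 2**23) = 2**23 * sqrt(x / 2**23): 46-bit radicand
    root = 0
    rem = 0
    for i in range(23):
        rem = (rem << 2) | ((n >> (44 - 2 * i)) & 3)   # bring down the next 2 radicand bits
        trial = (root << 2) | 1                         # 4*root + 1
        root <<= 1
        if rem >= trial:                                # does the next result bit fit?
            rem -= trial
            root |= 1
    return root
```

For example, `sqrt_q23(1 << 21)` (0.25) returns `1 << 22` (0.5).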

But I didn't give up!

After some experiments I developed a solution which closely approximates a 23bit fixedpoint sqrt() in just 10 cycles.

A modified/compound version can also approximate 1.0 / sqrt(x) - albeit less accurately - which can then be used to normalize 3D vectors very very quickly. I wouldn't use this for important math (!) but I think it should suffice for most graphics uses.

The fun part - this method is continuous, accurate enough to replace other methods and fast enough to use per-pixel.

There is some other stuff going on which ties in with this, but it is at an early stage and I'm not close to describing it yet.

Below is a dump from random samples using this integer-only sqrt() approximation. Only deviations of >= 0.01% from the expected result are reported. They show that accuracy decreases with small source values, which turns out to be OK for most common uses of sqrt() in graphics, and isn't much of a surprise for integer-based formulas anyway: unlike floats, they have fewer bits available for smaller numbers.
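DML doesn't describe his fast method, so the following is NOT it. As a stand-in, here is a sketch that borrows the texture mapper's idea (a table of quadratic curves) and pairs it with a harness in the spirit of the dump described above: random samples, reporting only deviations >= 0.01%. The input is shifted by an even amount into [0.25, 1), one of 64 quadratics is evaluated, and the shift is undone by half. The segment count, the float coefficients and the Q11 output format (sqrt(x) * 2^11, rounded) are all hypothetical choices. Because the output is rounded to an integer, small inputs show the same pattern of larger relative errors.

```python
import math
import random

SEGMENTS = 64                     # hypothetical table size
LO, HI = 0.25, 1.0                # normalized input range; sqrt maps it to [0.5, 1)
WIDTH = (HI - LO) / SEGMENTS


def _segment_coeffs(k):
    """Quadratic through sqrt at the start, middle and end of segment k (local d = t - t0)."""
    t0 = LO + k * WIDTH
    y0, y1, y2 = math.sqrt(t0), math.sqrt(t0 + WIDTH / 2), math.sqrt(t0 + WIDTH)
    a = 2.0 * (y2 - 2.0 * y1 + y0) / (WIDTH * WIDTH)
    b = (y2 - y0) / WIDTH - a * WIDTH
    return t0, a, b, y0


TABLE = [_segment_coeffs(k) for k in range(SEGMENTS)]


def approx_sqrt_q11(x):
    """Approximate sqrt(x) for an integer 1 <= x < 2**24, as round(sqrt(x) * 2**11)."""
    bits = x.bit_length()
    s = 24 - bits if (24 - bits) % 2 == 0 else 23 - bits    # even shift: sqrt halves it
    t = (x << s) / float(1 << 24)                            # now 0.25 <= t < 1
    k = min(int((t - LO) / WIDTH), SEGMENTS - 1)
    t0, a, b, c = TABLE[k]
    d = t - t0
    root_t = (a * d + b) * d + c                             # ~ sqrt(t)
    # sqrt(x) = sqrt(t * 2**24) / 2**(s/2) = root_t * 2**(12 - s/2), then scale by 2**11
    return round(root_t * 2 ** (23 - s // 2))


def report_deviations(samples, threshold=1e-4):
    """Return (x, approx, exact) for each sample with relative error >= threshold (0.01%)."""
    hits = []
    for x in samples:
        exact = math.sqrt(x) * 2 ** 11
        approx = approx_sqrt_q11(x)
        if abs(approx - exact) / exact >= threshold:
            hits.append((x, approx, exact))
    return hits


if __name__ == "__main__":
    rng = random.Random(1)
    samples = [rng.randrange(1, 1 << 24) for _ in range(100000)]
    samples += list(range(1, 64))          # make sure some small values are included
    for x, approx, exact in report_deviations(samples):
        print(f"x={x:8d} approx={approx:8d} exact={exact:12.3f}")
```

In this stand-in only a handful of very small inputs (such as x = 2) get reported; everything from x = 1024 upward stays well inside 0.01%.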

DML wrote: Maybe the DSP one can be improved, but 5 ops and a 23-bit result was as near as I got with the traditional way. Note that it operates on 23-bit fractions, so in/out values are shifted by 1 bit, as is typical for DSP.

It is possible to get rid of the lsr shift but seemingly not the parallel move, so 5 ops it is for now. BTW, unrolling it a bit could remove a few ops from the final iteration, I think, but I didn't bother and kept it small. It takes forever anyway.

DML wrote: I just got the DSP version of the approximate sqrt() working, have tested it, and am now pretty certain that it will be accurate enough to replace the other one in most cases.

The body of the calculation is 8 cycles (4 ops), but there is a 12-bit normalizing shift involved afterwards, and the fastest I could do this was 10 cycles (5 ops, beating my previous impl for a 48-bit dynamic shift by 2 ops). So the full arithmetic takes 18 cycles on the DSP after all...

There is also some addressing setup code, which can be amortized into a loop (same as was done for the texturemapper), but standalone it's another bunch of cycles. So let's say the first pass on the DSP is 18 cycles best case, up to 30 worst case if just called once. More than the 10 cycles I had sketched out, but I won't complain. Definitely better than 230, though.

For the texturemapper I was able to play with the normalization of each term, at the expense of accuracy, and removed nearly all of the shifting from the original version. Not sure I can do that here, but it's only the first iteration. Maybe another day.

It's definitely nice to see my test running waaaaay faster with this upgrade

Some great info in here. Just watched every video and was blown away by those graphics.
Hope this becomes a release someday, in whatever form the last build is in, demo or finished product. Some outstanding work.

Rumor going around that DML has reacquired an Alpine development system. What he intends to do with it is anyone's guess.

Here is a link to the PSX maps converted to PC format, if anyone is poking at this for themselves. Since they are now in PC format, they shouldn't present any new layer to learn if you already understand the PC format. However, they may hold clues to how the complexity was reduced, and answer the question below:

Some of the Q2 maps are extremely dense, yes. This is partly because textures never truly 'repeat', so flat surfaces must be broken into unique tiles, creating many more vertices than would be needed for a flat-filled version. This is why the floors of big rooms seem to be made of randomly sized tiles: it is necessary in order to texture each 'world pixel' uniquely. Unique texturing is required for unique per-pixel lighting (lightmaps). So it's really the Quake lighting that forces the polycount to be much higher than the 'geometric surface' count, which is already fairly high to give the game a decent look.

I think this was somehow sidestepped in the PSX port, but I'm not sure of the details. They did a very, very good job of that port, but a significant portion of the savings came from changing the content (e.g. the maps) to better suit the HW polygon engine in that box. They didn't need to retain anything that was designed with software rast in mind.

A procedural technique DML found that might translate well to the Jag/32X and other systems without any VRAM.

I have found that by interfering with the lighting system and modulating the lightmap while it is being built, it's possible to add some interesting effects to the scenery.

The case I was experimenting with: marking concrete procedurally with an irregular grid, based on distance from the lit point to the contour of each face. The visual result is something like multitexturing with a low detail texture, breaking up a tiling pattern so it looks like many more base textures are involved.
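DML gives no code for this, but the "distance from the lit point to the contour of each face" part can be sketched roughly. Here is a hypothetical version that darkens a lightmap sample (luxel) as it approaches the nearest edge of its face. On faces already broken into randomly sized tiles, that produces an irregular grid of joints. The function names, the 2D face-plane coordinates, the linear ramp and the groove/depth parameters are all assumptions; turbulence for gravel is not modelled.

```python
import math


def dist_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB (2D, in the face's plane)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_contour(px, py, polygon):
    """Distance from a point to the nearest edge of a polygon given as [(x, y), ...]."""
    n = len(polygon)
    return min(dist_to_segment(px, py, *polygon[i], *polygon[(i + 1) % n])
               for i in range(n))


def modulate_luxel(light, px, py, polygon, groove=2.0, depth=0.35):
    """Darken a lightmap sample near the face contour to suggest a joint between tiles.

    groove: hypothetical width (world units) of the darkened band along each edge.
    depth: hypothetical darkening factor applied right at the edge."""
    d = dist_to_contour(px, py, polygon)
    if d >= groove:
        return light
    return light * (1.0 - depth * (1.0 - d / groove))   # linear ramp into the joint
```

Because this runs only while the lightmap is being built, the runtime cost is zero, which matches the "free at runtime" point below.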

Some other simple procedures also provide nice effects, e.g. turbulence for gravel and other less regular surfaces. All for free at runtime, of course.

There's nothing special about this really from a technical standpoint, except for twisting the lighting pass to do something it's not really meant to do. The best things are sometimes free.
