Eero Tamminen wrote:I've just updated Hatari manual's debugger section (for new debug symbol handling, breakpoint stuff and profiler), so it would be good to get it right...

Ok, I'll do that but keep in mind it could just be me - trying to do too many things at once and not enough time studying some things that could use closer inspection (a bit of prioritizing ).

What I actually saw was a mixture of things: far too many 'share same address' warnings, too many .Lxxxx local symbols, and global symbols missing where expected (browsing through the C lib near rand(), strcpy() etc. shows lots of RTS instructions but almost no labels at the heads of the functions - an occasional one, but that's all) - and some functions carry the names of their object file, e.g. 'fileio.o', instead of the function name. I also noticed some symbols hiding in desynchronized disassembly: data following one function de-syncs the disassembly of the next function and absorbs the label, until you disassemble from the right address. This last one is to be expected really, but it might have been part of the reason labels were invisible generally.

Anyway it's probably better if I just send you a binary to study and you'll be able to tell if something is wrong or not. I'll look closer when I get a bit more time.

BM engine makefile is complete - 68k & DSP code all built with cross-tools. BM project restructured for cross-tools. Standalone bmview.ttp builds and runs as before. bmengine.a is generated with XDEFs for various API calls to the engine. bmview.ttp now uses the same API, so in theory Doom and bmview.ttp can both be used with the same engine.

The startup/shutdown sequence is quite ugly and some things require supervisor mode, or can only work before/after memory arena claims so there is still plenty to do - but getting there.

dml wrote:BM engine makefile is complete - 68k & DSP code all built with cross-tools. BM project restructured for cross-tools. Standalone bmview.ttp builds and runs as before. bmengine.a is generated with XDEFs for various API calls to the engine. bmview.ttp now uses the same API, so in theory Doom and bmview.ttp can both be used with the same engine.

Sounds great!

If you later on add the source somewhere (or have time to put it in mercurial/git), I would like to poke at it a bit to get the m68k code built with my native GCC 2.95 & Vasm setup inside Aranym, to see how much speed difference there is between m68k code generated by the latest & greatest version of GCC and that old Atari GCC. (So far I've been happy with that ancient GCC version; maybe I won't be after that test... )

Eero Tamminen wrote:If you later on add the source somewhere (or have time to put it in mercurial/git),

I have been working with hg for a long time anyway, so the project is already in a local repo - I just need to look for a public version of that later, e.g. sourceforge or equivalent.

Eero Tamminen wrote:I would like to poke at it a bit to get the m68k code built with my native GCC 2.95 & Vasm setup inside Aranym, to see how much speed difference there is between m68k code generated by the latest & greatest version of GCC and that old Atari GCC. (So far I've been happy with that ancient GCC version; maybe I won't be after that test... )

It might be possible to build it all natively using the new project - it only needs make, gcc, vasm & asm56000 all of which have native binaries.

I used gcc295 (or maybe it was 272) when I was working on the Q1 sources natively on the Atari - it worked fine, but it has a tendency to crash or silently exit when it runs out of memory, which happens very easily on a real machine.

dml wrote:I used gcc295 (or maybe it was 272) when I was working on the Q1 sources natively on the Atari - it worked fine, but it has a tendency to crash or silently exit when it runs out of memory, which happens very easily on a real machine.

That's not a problem with Aranym: one can give it as much FastRAM as one wants, and the JITed 040 emulation running at full speed is "fast enough" as long as one doesn't need to build monster projects with hundreds of C source files (or large C++ sources).

BM now has a C-friendly API for all events including setting the window size and player/camera position. A few bits and pieces left and it should be possible to link the two projects (Doom, BadMood) into the same binary (although not quite ready to run side by side).

The memory management and message printing stuff needs to be changed to work properly with both projects running together. This is probably the next thing I'll have to fix. The API is also missing a number of things, including level selection and practically anything to do with sprites ('things').

New startup sequence using exported API, from a cut down 'viewer' shell application. This will get interleaved with Doom's startup/shutdown sequence.

Last night got Doom and BadMood running in the same binary under TOS, with BM executing a partial startup sequence. The Doom memory allocation has also been routed through a single interface to allow memory sharing.

As of lunchtime, Doom is now allocating all memory via BM's arena allocator, so they can both claim memory arbitrarily without fighting. Doom seems to run happily with that. It only required a few changes and removing a direct Mxalloc call I had added previously for the Falcon display buffer.

Unfortunately Doom is quite greedy in that it claims a maximum of 6MB right at the start (separately from a number of direct malloc calls), and chops it up using its own internal arena (zone) manager, which does not map well to BM's allocation system. So this private zone arena will need to be shrunk down quite a lot as soon as BM takes over managing all the graphics and eventually the audio data.
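The "single interface" routing mentioned above can be sketched roughly like this (all names are hypothetical - this is not the actual BM/Doom code): Doom-side allocations go through one hook table, which defaults to malloc/free, and BM installs its arena allocator into it at startup.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook table: Doom-side code calls shared_alloc/shared_free
 * instead of malloc/Mxalloc, and BM can install its arena allocator
 * here during startup. */
typedef struct {
    void *(*alloc)(size_t size);
    void  (*release)(void *ptr);
} mem_iface_t;

/* Default to plain malloc/free until BM takes over. */
static void *default_alloc(size_t size) { return malloc(size); }
static void  default_release(void *ptr) { free(ptr); }

static mem_iface_t g_mem = { default_alloc, default_release };

void mem_set_interface(mem_iface_t iface) { g_mem = iface; }

void *shared_alloc(size_t size) { return g_mem.alloc(size); }
void  shared_free(void *ptr)    { g_mem.release(ptr); }
```

With something like this in place, swapping the backing allocator only requires one mem_set_interface() call, which matches the "only a few changes" observation above.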

Hopefully Doom won't need to claim much except for the BSP and a few other bits. I can't easily solve the fact that Doom and BM will both try to load separate copies of the BSP - and this is probably not helpful for running on a 4MB system. It might be possible to get some sharing (with a lot of extra work!) of the BSP and/or other level data, but the in-memory representation isn't the same on both sides of the fence and it may have to remain like that for now.

While BM's memory allocator handles defragmentation, it is still a linear memory-map system and could eventually choke after lots of random allocations. However, it's mainly used by the resource cache, and this can be fully flushed after each level - so in theory it shouldn't be necessary to route through Doom's zone stuff. If it becomes a problem I can do that instead, but with the current architecture (no persistent mallocs inside BM - everything flushable/reloadable) it should be ok (!).
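The flush-between-levels idea can be illustrated with a toy linear (bump) allocator - a sketch of the concept only, not BM's actual allocator: allocations just advance a cursor, and a flush resets the whole arena at once, which is cheap but relies on nothing persisting across the flush.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy bump/linear arena (sizes and names are illustrative). */
#define ARENA_SIZE (64 * 1024)

static uint8_t arena_mem[ARENA_SIZE];
static size_t  arena_top = 0;

void *arena_alloc(size_t size)
{
    size = (size + 3u) & ~(size_t)3u;   /* keep 68k-friendly alignment */
    if (arena_top + size > ARENA_SIZE)
        return NULL;                    /* out of arena space */
    void *p = &arena_mem[arena_top];
    arena_top += size;
    return p;
}

/* Reset everything between levels: O(1), no per-block bookkeeping. */
void arena_flush(void) { arena_top = 0; }
```

The trade-off is exactly the one described above: individual frees and long-lived random allocations don't fit this model, so it works best when everything in the arena is flushable/reloadable.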

It's looking reasonably good so far. Won't get more done until evening (or tomorrow) but getting quite close to having dual-engine rendering output now.

bmdoom.png


I noticed that a *huge* proportion of the time being spent in the Doom2 game code (nearly as much as the temporary truecolour screen copy!) was going into a single function which seemed to be scanning through the WAD file index repeatedly, from within the network message event loop....

Three other offending functions have been quickly recoded in 030 assembly and have since disappeared from the profile view. Unfortunately the Doom game code is nearly as expensive as the rendering code on a 16MHz Falcon, which means gluing BadMood into Doom isn't going to be the whole story. Some game code will need to be rewritten too...
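For reference, the shape of the repeated-scan problem can be sketched like this. Doom's lump lookup walks the WAD directory linearly on every call; a one-entry memo (hypothetical names - this is an illustration, not the actual patch) makes repeated queries for the same name cheap:

```c
#include <string.h>

/* Toy lump directory: Doom's real lookup scans a table like this
 * linearly on every call. */
#define MAX_LUMPS 16
static char lump_names[MAX_LUMPS][9];
static int  num_lumps = 0;

void add_lump(const char *name)
{
    strncpy(lump_names[num_lumps], name, 8);
    num_lumps++;
}

static int scan_for_lump(const char *name)
{
    for (int i = 0; i < num_lumps; i++)
        if (strncmp(lump_names[i], name, 8) == 0)
            return i;
    return -1;
}

/* One-entry memo: repeated queries for the same name (as in the
 * event-loop scan described above) skip the directory walk. */
int lookup_lump(const char *name)
{
    static char last_name[9];
    static int  last_idx = -2;   /* -2 = cache empty */
    if (last_idx != -2 && strncmp(last_name, name, 8) == 0)
        return last_idx;
    last_idx = scan_for_lump(name);
    strncpy(last_name, name, 8);
    last_name[8] = '\0';
    return last_idx;
}
```

Of course the better fix, as noted below, was to remove the erroneous scan from the event loop entirely; caching only softens the cost of a lookup that shouldn't be happening there at all.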

dml wrote:I noticed that a *huge* proportion of the time being spent in the Doom2 game code (nearly as much as the temporary truecolour screen copy!) was going into a single function which seemed to be scanning through the WAD file index repeatedly, from within the network message event loop....

And before this there was the MiNTlib gettimeofday().

On plain TOS, MiNTlib's gettimeofday() always calls mktime(), which always does expensive timezone work, at least when the "TZ" environment variable isn't set. And because the Doom2 code (wrongly) passed gettimeofday() the obsolete timezone argument, MiNTlib runs the timezone code one more time for good measure...

These calls to gettimeofday() in the Doom2 code were taking >60% of all CPU usage before the other optimizations. After mailing the MiNT mailing list, I heard that this mktime() behaviour is "a well known performance issue".

Another thing to avoid with MiNTlib is nanosleep(). That also calls gettimeofday() (without the timezone parameter), but because the select() call following gettimeofday() will fail on TOS, it calls gettimeofday() a second time to get the remaining time. Thus it too invokes the timezone stuff twice...

usleep() is safe: it will do a select() call, but after that fails on TOS it will just busy-loop (as it also does if the given sleep value is small enough).
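One possible mitigation for the timezone cost described above - an assumption on my part, worth testing against the MiNTlib version in use - is to pin TZ once at startup so the timezone machinery has a cheap, cached answer, and to pass NULL for gettimeofday()'s obsolete timezone argument:

```c
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>

/* Pin the timezone once at startup. With TZ set, mktime() (which
 * MiNTlib's gettimeofday() invokes on plain TOS) should not have to
 * re-derive the zone on every call. Assumption: the game does not
 * care about local-time correctness, only about elapsed time. */
void init_cheap_time(void)
{
    static char tzbuf[] = "TZ=UTC0";  /* putenv keeps a pointer to this */
    putenv(tzbuf);
    tzset();
}
```

After this, time queries would use gettimeofday(&tv, NULL) - never the obsolete second argument, which the post above identifies as triggering the timezone code an extra time.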

Yes, it was actually Eero who noticed the inflated time in both of those functions while I was looking at other stuff (thanks), although I did find the code for the WAD search pretty funny when I went looking for the cause.

Since then I have removed the erroneous WAD scan and the gettimeofday() crap, and replaced 4 other expensive game functions with assembly language, and the game seems to be running ok. Not fast enough yet, but getting there.

I think there is another more fundamental problem though, as I recognize this type of game architecture and it has a performance related flaw (but is otherwise very nice and consistent at a wide range of framerates).

The game operates with a fixed assumption about the time simulated by each tick - in this case 35 or 70Hz ticks. The ticks don't actually happen at this frequency - they happen in bursts in order to play catch-up with realtime, against a clock which may have variable resolution. The main advantage of this tick-burst architecture is that movements of players, enemies, bullets etc. do not need to be scaled by the current framerate, so they have perfectly predictable behaviour. It's like coding for a 50Hz scroller really.

The downside is that the actual CPU time spent ticking (relative to drawing) tends towards infinity as the cost of a tick approaches the tick interval, i.e. if a game tick can't complete its work in a 70Hz interval, it will chase after itself trying to make up the gap and never stop. This might be happening now, to some extent.

The good news is that any optimizations to the game tick will have an outsized (roughly exponential) impact on cost, so it should be easy to claw back a lot of time by doing not very much.
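The tick-burst architecture described above can be sketched as follows. This is a toy model, not Doom's actual loop; in particular the catch-up cap is my own assumption, added as one possible guard against the runaway case where ticking chases realtime forever:

```c
/* Fixed-timestep "tick burst" planning: the simulation always advances
 * in whole ticks, and when rendering falls behind, several ticks run
 * per frame. Capping the burst stops a too-slow tick from chasing
 * realtime indefinitely; backlog beyond the cap is dropped, so the
 * game dilates time briefly instead of locking up. */
#define MAX_CATCHUP 4   /* max ticks simulated per frame (assumption) */

typedef struct {
    long new_game_tick;  /* game clock after the burst */
    int  ticks_to_run;   /* how many tick updates to execute now */
} tick_plan_t;

tick_plan_t plan_ticks(long now_tick, long game_tick)
{
    long behind = now_tick - game_tick;
    tick_plan_t p;
    p.ticks_to_run = behind <= 0 ? 0
                   : behind > MAX_CATCHUP ? MAX_CATCHUP : (int)behind;
    p.new_game_tick = now_tick;  /* anything beyond the cap is dropped */
    return p;
}
```

The cap trades determinism for stability: dropping backlog desynchronizes recorded demos, which is exactly why the real game prefers to run every tick and simply slows down when it cannot keep up.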

I was thinking about looking for higher-resolution sprite replacements (the 3D sprites) from other Doom versions, as overlay PWADs if possible. This would work well with mipmapping because they can downsample properly.

The current 3D sprite resources are very low res and can only be upsampled and blurred, which looks quite bad IMO, especially with the silhouette/mask edges.

I also looked at dither/jitter on texture mapping and some other things, but few are a good fit at this low resolution and with only partial DSP shading. Will see how that develops. Need to deal with game code performance problems before I get back to those areas.

As for 2D sprites and graphics... there may be interesting news on that front before very long.

I've been quite busy tonight trying to reduce the *enormous* cost of line-of-sight testing between the game objects. Something like 30% of total game time is spent just checking vision between entities.

After looking at a few other source ports (ChocolateDoom, ADoom, Boom etc.) I don't see any significant optimizations in that area of the code, so I'm trying some of my own. I have been discussing this problem with Eero while profiling the game, comparing versions of the code and doing tests.

Making some positive progress but it's difficult to speed up game code without messing up the game state and interfering with things like recorded demo playback. It needs much more care than the graphics areas.

I have added a bounding-ball test to the BSP subsectors, to use as a trivial rejection on rays fired into the BSP for line of sight. It will take some time to determine if it has any useful impact.

There are already 2 kinds of trivial rejection test in the Doom code - one of them (bounding box around the ray being tested) is next to useless, and the other is probably doing a good job (PVS similar to Quake's) - but apparently not quite good enough.
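The bounding-ball rejection can be sketched as a point-to-segment distance test: if the sight ray passes farther from the ball's centre than its radius, nothing inside that subsector can block the ray. Floating point is used here for clarity; the real Falcon code would presumably use Doom's 16.16 fixed point, and this is an illustration of the geometry rather than the actual implementation.

```c
/* Returns 1 if the subsector's bounding ball (centre cx,cy, radius r)
 * may intersect the sight ray from (x1,y1) to (x2,y2); 0 means the
 * subsector can be trivially rejected. */
int ball_may_block(double x1, double y1, double x2, double y2,
                   double cx, double cy, double r)
{
    double dx = x2 - x1, dy = y2 - y1;
    double len2 = dx * dx + dy * dy;
    double t = 0.0;
    if (len2 > 0.0) {
        /* Project the centre onto the ray, clamped to the segment. */
        t = ((cx - x1) * dx + (cy - y1) * dy) / len2;
        if (t < 0.0) t = 0.0;
        if (t > 1.0) t = 1.0;
    }
    double px = x1 + t * dx - cx;
    double py = y1 + t * dy - cy;
    return px * px + py * py <= r * r;   /* squared-distance compare */
}
```

Working with squared distances avoids a square root per test, which matters when the test runs once per subsector per sight check.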

I think I have a workable strategy now for solving the gametick line-of-sight performance problem. It will take some time to implement and test so I'll come back to it when the BM integration is more complete.

How about a conservative method where only a percentage of the objects are updated each tick? In particular, if there is visibility between A and B, there's probably no need to check if it disappeared for 5 ticks or so, and it's not unrealistic.
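That suggestion could be sketched as a per-pair cache with a tick stamp (a hypothetical API, purely to illustrate the idea): the expensive BSP sight test only reruns once the cached answer is older than some small number of ticks.

```c
/* Reuse window: how many ticks a cached line-of-sight answer stays
 * valid (the "5 ticks or so" suggested above - an assumption). */
#define REUSE_TICKS 5

typedef struct {
    long last_tick;   /* tick when the cached result was computed; -1 = never */
    int  visible;     /* cached line-of-sight result */
} los_cache_t;

/* Is the cached answer still fresh enough to reuse? */
int los_cache_fresh(const los_cache_t *c, long now)
{
    return c->last_tick >= 0 && now - c->last_tick < REUSE_TICKS;
}

/* Record the result of a real (expensive) sight check. */
void los_cache_store(los_cache_t *c, long now, int visible)
{
    c->last_tick = now;
    c->visible = visible;
}
```

Staggering when each pair's cache expires (e.g. by hashing the pair into an initial offset) would also spread the remaining real checks evenly across ticks instead of bunching them. The caveat from the posts above applies: stale visibility changes game state, so this would break recorded demo playback compatibility.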