My emulator implements real timing* but isn't quite ready yet; in a few weeks (or so) I or anybody else with a Mac can probably perform some basic timing tests if that'd be helpful. Sorry for the vague future offer — I thought maybe it was a bit more helpful to say something while the topic is active.

* i.e. a spinning platter from which a PLL attempts to reconstruct input data, from which the drive controller can spot sectors via the real hardware means of waiting for an ID address mark with matching fields, then waiting for a data mark, etc.

Chema wrote: I thought iss was doing his own tests in this area. Any results?

I have a problem measuring time with T1. My tests failed because the sector reading routine runs after 'sei', so no T1 interrupts are handled and the count is completely wrong. I tried leaving interrupts enabled, but then the read simply hangs. Any ideas?

That would be great ThomH! Your emulator is going to have very nice features!

@iss Uh... it is going to be difficult to measure with T1, yes. I tried something similar a long time ago for other purposes, and could not get anything useful out of it.

However, in this case the timings are so different that I guess a couple of PINGs at the start and at the end of the load, plus a stopwatch, would do.

And I would test (some ideas):

1. A bulk of data (say 10K) loading from SEDORIC. Data can be just anything. A dump of the rom, for instance.
2. The same data loaded with FloppyBuilder with no compaction.
3. The same again, but with compaction.

If the greatest difference is clearly between 2 and 3, our guess that decompression is affecting performance would be correct. If the difference is between 1 and 2, then something is wrong with the loading routines.
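
The comparison can be written down as a small helper. This is only a sketch: the function name is made up, and the timings in the usage line are placeholders, not measurements from this thread.

```python
def biggest_slowdown(t_sedoric, t_fb_raw, t_fb_packed):
    """Say which step of the three-way test adds the most load time (seconds)."""
    loader_delta = t_fb_raw - t_sedoric      # test 1 -> test 2: loading routines
    unpack_delta = t_fb_packed - t_fb_raw    # test 2 -> test 3: decompression
    if unpack_delta > loader_delta:
        return "decompression"
    return "loading routines"

# Placeholder stopwatch readings, not real results.
print(biggest_slowdown(10.0, 12.0, 30.0))
```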

I measured the average time for the decompression routine and tried an interleave factor to minimize its effects, but had no luck. Either the supposition was not right or the interleave was not correct (or correctly done).

Makes sense?

On another note, I found the first memory problem in the game. In the scene I am working on (the cell) there will be a lot of text, as characters are introduced and many things are said to set up their background. Well, too much text, in fact, and my engine failed with an out of memory exception (a red bar, a code in a variable and an infinite loop, that is).

I had to split the bulk of the text in two different resources and load them on demand, so that when the second part is needed, the compaction routine gets rid of the first and other unused data, creating the free space I need. But I still have to put more things in, so let's see if that is enough or I have to think about something different.

It is quite an extreme example as the room is bigger than the screen and there are a lot of characters at the same time, and a lot of text! In the end, I should have kept the idea of storing text compressed in memory too!

Looking at how Sedoric reads, my first idea was that it uses the 'Multiple sectors read' FDC command.
I recompiled Oricutron with 'GENERAL_DISK_DEBUG' and it showed clearly that this is not the case:
Sedoric uses the 'Single sector read' FDC command, exactly as FloppyBuilder does (...or almost exactly, but more about this small detail tomorrow in my next post).

The next idea was ... buffering... hm, wait a minute... what about 'anti-buffering'?
I made some quick modifications to the FloppyBuilder loader code, so that if the file is uncompressed it is loaded directly in place, avoiding the buffer copy ...
...and we have a new champion: FloppyBuilder: 0:40 min - Eat my dust, Sedoric!
The winner image:

So, to sum up, the fact that uncompressed data was being buffered resulted in slow loading too. Probably due to the copy of each sector to the destination. Mmmmm... I wonder if that explains the difference between loading speeds.

8000x40/256=1250 sectors. How long could the copy process take? When I measured the decompression routine it took an average of 34000 cycles per sector, so around 42 seconds in total. We are still far from those 4 minutes, even if the pics were compressed!!!

So I still think that we have an interleave problem. Missing a sector and adding a full rotation would mean, with a rotation speed of 300 rpm, 0.2s extra per sector, so 0.2*1250=250 seconds lost waiting for the sector to come round again, or 4.16 minutes. Very close to your timings.
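
For anyone who wants to replay those numbers, here is a minimal sketch. It assumes a 1 MHz 6502 and 256-byte sectors; the function names are made up for illustration.

```python
CPU_HZ = 1_000_000      # assumption: 1 MHz 6502
SECTOR_BYTES = 256      # assumption: 256-byte sectors

def sectors_needed(total_bytes, sector_bytes=SECTOR_BYTES):
    """Number of sectors needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // sector_bytes)

def seconds_for_cycles(cycles_per_sector, sectors, cpu_hz=CPU_HZ):
    """CPU time spent when every sector costs cycles_per_sector cycles."""
    return cycles_per_sector * sectors / cpu_hz

def missed_rotation_penalty(sectors, rpm=300):
    """Seconds lost if every sector is missed once and costs one extra rotation."""
    return sectors * 60.0 / rpm

sectors = sectors_needed(8000 * 40)               # 40 pictures of 8000 bytes
decompress_s = seconds_for_cycles(34_000, sectors)
penalty_s = missed_rotation_penalty(sectors)
print(sectors, decompress_s, penalty_s, penalty_s / 60)
```

The decompression cost comes out at 42.5 seconds and the missed-rotation penalty at 250 seconds, so only the second one is in the range of the observed load times.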

What I can't explain is why, when adding interleave, things did not change dramatically. I am starting to think that I did something wrong in the process... The disk images were different, that I can tell.

Edit: One sec! Did you say you used HxC??? That does not simulate the missed sectors/extra rotations, I guess... Then everything I said could be nonsense?

LoadUncompressedData
    ;
    ; Loop to read all the sectors
    ;
read_sectors_loop
    jsr ReadNextSector
    ; Try to let time to an IRQ to play, and during that time copy the sector to the final location
    cli
    ldy #0
loop_copy
    lda loader_sector_buffer,y ; Load the byte from the sector buffer
__auto_write_address
    sta $c000,y ; Store it to the final location
    iny
    bne loop_copy
    nop
    nop
    sei
    ; Next sector
    inc __auto_write_address+2
    dec sectors_to_go
    bne read_sectors_loop

So based on your tests, just doing this copy slows down things tremendously, but did you just remove the copy, or did you also remove the cli/nop/nop/sei?

@Chema: Both images (FB and SED) are 'synthetic', i.e. generated with the OSDK tools and then converted in the same way to HxC format, so they should have the same 'none' interleave. IMO for test purposes it's better to eliminate interleave as a factor for now. I plan to run interleave tests with real floppies this weekend. Hm, thinking again now, maybe it's a good idea to run another test too:
Instead of: read->copy->read->copy->read->copy...
to try: read->read->read->...copy->copy->copy...
Of course more buffers will be needed, but with this approach full rotation waiting will be avoided.
If the performance is better then a properly selected interleave should allow the same faster speed!
Does it make sense?
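
Here is a rough model of that experiment. It assumes a hypothetical 17 sectors per track and measures time in sector slots (one slot is the time one sector takes to pass under the head). The function names and the 0.25-slot copy time are made up; the real figures need measuring on hardware.

```python
def layout(n, interleave):
    """Physical slot -> logical sector, for a track of n sectors."""
    slots = [None] * n
    pos = 0
    for logical in range(n):
        while slots[pos] is not None:     # slot already used: take the next free one
            pos = (pos + 1) % n
        slots[pos] = logical
        pos = (pos + interleave) % n
    return slots

def track_read_time(n, interleave, process_slots):
    """Slots needed to read logical sectors 0..n-1 in order, with
    process_slots of CPU work (copy, decompress) after each read."""
    where = {logical: slot for slot, logical in enumerate(layout(n, interleave))}
    t = 0.0                               # head sits at the start of slot 0
    for logical in range(n):
        wait = (where[logical] - t) % n   # wait for the sector to come round
        t += wait + 1                     # reading takes one slot
        t += process_slots                # work done before the next read
    return t

print(track_read_time(17, 1, 0.0))    # read->read->read, copy later
print(track_read_time(17, 1, 0.25))   # read->copy, no interleave
print(track_read_time(17, 2, 0.25))   # read->copy, interleave 2
```

In this idealised model any per-sector work with no interleave costs almost a full rotation per sector, while batching the reads or using an interleave that covers the work avoids it.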

@Dbug: Yes, this is the code. The NOPs are removed, but SEI/CLI are moved to surround the loop that reads from the FDC; at that place they are obligatory. I'm still using the old FloppyBuilder, the one noted as 0.19 in the generated 'floppy_description' file. In this version I made some optimizations and changes to allow writing. So for now I feel very familiar and comfortable with this old code, but it's no problem for me to switch to the new version. Does SVN contain the most recent code? I just want to be sure that we are working on the same sources. BTW, I can send my sources to you and Chema, just PM me your e-mail. I could attach the sources here too (they are basically not mine), but IMO there is no need to spam here with 'preliminary forks' and 'betas'.

Also, as I wrote, using Oricutron's disk debug I found another difference: Sedoric reads sectors with command 0x80 only briefly after boot, then it uses 0x8C and 0x88: the first once after a seek to a new track, the second otherwise. In FloppyBuilder the read command is hard-coded: always 0x80 for microdisk and always 0x8C for Jasmin.
As per the WD1793 PDF, if bit 2 (%00000100) is set then a 15ms delay is introduced; I think this is really important when working with real floppies. Bit 3 (%00001000), together with bit 1 (%00000010), is used for verifying that the physical ID matches the selected side.
IMO, this will not make things faster, but it should be implemented in FloppyBuilder too.
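
To make the three command bytes easier to compare, here is a small decoder sketch. The bit meanings for bits 1 to 3 follow the description above; bit 4 as the multiple-sector flag is taken from the WD179x Type II command layout. The function and key names are made up.

```python
def decode_read_sector(cmd):
    """Split a WD1793 'Read Sector' command byte into its flag bits."""
    if (cmd & 0xE0) != 0x80:
        raise ValueError("not a Read Sector command: 0x%02X" % cmd)
    return {
        "multiple": bool(cmd & 0x10),      # bit 4: multiple sectors read
        "side_flag": bool(cmd & 0x08),     # bit 3: side value to compare against
        "delay_15ms": bool(cmd & 0x04),    # bit 2: 15 ms delay before reading
        "side_compare": bool(cmd & 0x02),  # bit 1: enable the side comparison
    }

for cmd in (0x80, 0x88, 0x8C):
    print(hex(cmd), decode_read_sector(cmd))
```

Read this way, 0x8C differs from 0x88 only by the 15 ms delay bit, which fits it being issued once after a seek.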

The reason for the intermediate buffer was mostly that I was thinking of allowing the partial loading of sectors, thus for example allowing the loading of "37 bytes" in a 37 bytes large buffer.
I guess the code could try to load the entire sectors directly where they are needed, and only the final partial one using the intermediate buffer.
(Not sure what would happen if I did not read all the bytes from the sector, would that upset the FDC or would that be fine?)
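
A sketch of how such a split could be planned, assuming 256-byte sectors. The function name and the step labels are hypothetical and not part of FloppyBuilder.

```python
def plan_load(dest, length, sector_bytes=256):
    """List the (method, address, byte_count) steps for loading `length` bytes at `dest`.

    Whole sectors go straight to their final address; only a trailing
    partial sector passes through the intermediate buffer.
    """
    full, rest = divmod(length, sector_bytes)
    steps = [("direct", dest + i * sector_bytes, sector_bytes) for i in range(full)]
    if rest:
        steps.append(("via_buffer", dest + full * sector_bytes, rest))
    return steps

print(plan_load(0xA000, 37))    # the 37-byte case: buffer only
print(plan_load(0xA000, 600))   # two whole sectors, then 88 bytes via the buffer
```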

That being said, if Chema's data are mostly compressed, we will have to find another way of doing things.

For the timings, no idea: as I have said a number of times, the FDC code comes from Jede and Fabrice. I myself never really studied how all this stuff works, so suggestions are welcome.

Updated the ZeroFx code to use the new version of the FloppyBuilder.
Required changes were:

Description Script:
- Adding a 'FormatVersion 0.20' to the description file
- Adding defines for the location of the sector buffer and zero page locations
- Replacing AddFile by WriteLoader for the loader file
- Moving all the AddDefine AFTER the AddFile they are related to
- Removing the load address from all the AddFiles (because in all cases the loading code knows it)
- Adding some AddDefines to export useful information directly into the header files instead of wasting room in a binary table

Files:
- Removed the disk_info.h (all the relevant data is present in the boot sectors and loaders, a bit redundant, but less fragile)

Api:
- The exported table is still available, but it is now in loader_api.s
- LoadFile is now "LoadFileAt" and requires a loading address (which can be added in the description script and exported, just use AddDefine MY_EXPORTEDLOADINGADRESS_WHATEVER 0xA000)
- LoadFile and SetFileAddress are gone

I understand but the enormous amount of extra time cannot come from the copy loop at all. Just count cycles and multiply by the number of sectors.
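
A quick count for the copy loop quoted earlier, as a sketch. It assumes a 1 MHz 6502, a page-aligned sector buffer (so no page-crossing penalty on the indexed load) and the 1250 sectors estimated earlier in the thread.

```python
# Standard 6502 cycle counts for the four instructions of the copy loop.
LOOP_CYCLES = {
    "lda loader_sector_buffer,y": 4,   # absolute,Y without page crossing
    "sta $c000,y": 5,                  # absolute,Y store
    "iny": 2,
    "bne loop_copy": 3,                # branch taken, same page
}

CPU_HZ = 1_000_000  # assumption: 1 MHz 6502

def copy_cycles_per_sector(sector_bytes=256):
    per_byte = sum(LOOP_CYCLES.values())
    return per_byte * sector_bytes - 1   # the final bne falls through: 2 cycles, not 3

def copy_seconds(sectors, cpu_hz=CPU_HZ):
    return copy_cycles_per_sector() * sectors / cpu_hz

print(copy_cycles_per_sector())
print(copy_seconds(1250))
```

Under those assumptions the copy costs about 3.6k cycles per sector, or roughly 4.5 seconds over 1250 sectors, nowhere near minutes.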

I think that the unbuffered code being so much faster is a side effect of something else.

I'm not near a computer now, so cannot check, but imagine (not being serious, just a random stupidity to illustrate my point) something such as the ISR clearing a bit which affects the FDC, and this not being done with interrupts disabled for reading directly into destination.

Mmmm I don't remember anything strange, apart from the light remaining ON and long pauses between the typical read sounds (track-track). You can see it here (though nothing can be heard, anyway), loading something like 16K of data in 17 secs (just after I press reset):

In any case, and while someone comes up with a new idea, I'm still working on the game. And I am quite excited with the results.

I don't know how to show things without spoiling the game, so no video or pic this time, but I can assure you I am really pleased with how things are going and I bet that, if you ever play the game, you'll see things that seemed impossible for an Oric. And probably have never been done in similar machines either.

My only concern is that I cannot put in all the things I would like to... I would eat up all the memory and disk space! And that, with just one disk, the game is not going to be as large as I'd like. The latter is relatively good news, nevertheless, because that means I will finish it somewhat soon

I have nearly finished the first big puzzle in Episode II, which takes place aboard the London ship. As soon as I finish it, I will move on to the part of the game where the team gets the Liberator. And I already have nice ideas for that part too!

Chema wrote: My only concern is that I cannot put in all the things I would like to... I would eat up all the memory and disk space! And that, with just one disk, the game is not going to be as large as I'd like. The latter is relatively good news, nevertheless, because that means I will finish it somewhat soon

Just make it an episodic game

It's better to make something short and well designed, than something longer but not as good.
If people like it, just make a follow up adventure!

Congrats on your great tempo!
Just keep in mind: the start address of uncompressed files in FloppyBuilder must always be page aligned!
I found this today when I tried to load the 'main' uncompressed program from $550.
Also, I think that playing with interleave can really bring a speed improvement ... I will see and report back asap.
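
A tiny check for that constraint, as a sketch: page aligned here means the low byte of the address is zero. The helper names are made up.

```python
def is_page_aligned(address):
    """True when the address sits on a 256-byte page boundary."""
    return (address & 0xFF) == 0

def next_page(address):
    """Round an address up to the next page boundary (unchanged if already aligned)."""
    return (address + 0xFF) & ~0xFF

print(is_page_aligned(0x550), hex(next_page(0x550)))
```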

I got my emulator seemingly into a working state, with an attempt at fully-correct timing. If you have a Mac, grab the binary from its thread and do as you wish. But if you don't have a Mac and there's anything I can do to help, please don't hesitate to ask.