FAT corruption and stuck system (ARCAOS/DOS full screen/R249)

Description

I have a newly installed ArcaOS system.

My previous system was eCS 2.2 beta, presumably with FAT32 version 913 (2008), and it showed no ill symptoms.

I don't know which release is part of ArcaOS, but I found that running a CMD file from a FAT32-formatted disk could cause the contents of that file to vanish (not the file itself). There appeared to be other corruption associated with that problem.

Not unreasonably, Lewis Rosenthal recommended that I try R249, which I installed using the WPI package.

The above problem did not occur. However, when I ran a full-screen DOS session (which I need to do for other reasons) and ran a BAT file (which calls WordPerfect), all seemed well until I exited WordPerfect, at which point the system froze.

On the forced reboot, the partition showed so many errors that it said "use SCANDISK under Windows".

After reverting to the "stable release" 913, these problems do not occur.

I arranged full backups of the vulnerable partitions and tried again, with very similar results, so the effect is repeatable.

Change History (278)

Why did the system freeze? So, does it occur each time you start a .bat file with a DOS program? I don't have WordPerfect. I created a .bat file that starts Volkov Commander, and it seems to work fine, with no FS corruption. How could I reproduce this on my machine? What FAT errors were they? Trash in directories, FAT table corruption, or something else? What is the CHKDSK output? Could you take an f32mon log while starting WordPerfect? Is it a FAT16 or FAT32 partition?

PS: Sometimes I get trash in directories (while the FS remains accessible and files open fine, so you can always back up the data and reformat the FS). But I cannot reproduce this yet.

What FAT errors: I have no output, because the automatic CHKDSK ran on system restart and its output is thrown away (I think).

All my FAT partitions are FAT32.

I have been testing with r249 and, alternately, 913 (partly to distinguish FAT32 problems from ArcaOS problems). The following is part of a BAT file that seems to reliably freeze the system with r249 and NOT with 913:

==================
::@echo off
set popdir=%1
if "%popdir%"=="" set popdir=popdir
set popdrive=%1
if "%popdrive%"=="" set popdrive=popdrive

New comment by the reporter in ticket #47. As this ticket has much more information, I am adding the comment from #47 here and closing #47.

Since experiencing the stuck state (on ArcaOS, as I thought), I have experimented with both eCS and ArcaOS. The stuck states occur on both systems with all releases later than 913.
I now have a file on a FAT32-formatted volume that causes a stuck state on either system when I try to delete it (DEL command) in a DOS window. This effect is (so far) 100% consistent, and it persisted even after I remade the volume by copying the files and directories.
Please let me know what diagnostic information would be useful.


You'd better describe how I could reproduce this on my machine. And please try the latest version from ftp://osfree.org/upload/fat32/ again, and check whether the problem is still present. (It may be gone in newer versions. I tried deleting a file on a FAT32 partition from a DOS prompt: it is deleted as it should be, with no stuck state.) Also, you could give me a link to your DOS program so that I can try testing it on my machine.

The following causes a stuck state on my eCS 2.2 system (with any FAT32 version other than 913):

POPDIR.BAT is the small file I have attached. It is on a FAT32 volume (and, in case it matters, so is NE.EXE).

1) Run NE on the file:
NE popdir.bat
A command window opens showing the one line of data.
2) Press END to go to the end of the line, then Backspace to delete the last character.
3) Press F3 to exit.
4) At the prompt to update the file (or do other things), type y (for yes).

2landb: Ah, nice. You also need a second machine with a COM port, connected to the first one with the cable and running a terminal emulator program. It can run any OS. If the second machine has no COM port, you can use a USB-to-serial adapter (you plug the null-modem cable into it). For OS/2, I use Prolific PL2303-compatible adapters; for Linux, the choice is much wider. On the debuggee side, add the /monitor switch to the fat32.ifs command line in config.sys. It permanently enables the debug messages, which you should then see in the terminal. You can save the log and send it to me. I need a log taken when the system begins to hang. It is very good that you have a COM port and a cable.

OK, good. I guessed I would need a second computer :-) I have a T61 laptop, which does have a COM port (and I also have a docking station, which certainly does). I can run eCS 2.1 or Windows 7; does it matter which?

2landb: Yes, very good. I have a COM port on my ThinkPad X61T docking station, too. This is how I usually debug fat32.ifs. It does not matter which OS you use; what matters is whether the terminal emulator available on it is good enough.

No, I have not installed a debug kernel; I did not realise it was needed. Where do I get it?

On the OS/2 installation disks, of course. \os2image\debug\smp\os2krnlb is the half-strict kernel you need. Please also install the symbol file, os2krnlb.sym. But rename both to os2krnl.* first, of course.

What I am actually running at present, and using as the test system for FAT32, is eCS 2.2 beta, but it came as an ISO download and I don't know whether the DVD made from it has an alternate debug kernel (though I suppose it does). At present I cannot see the DVD, but I have the ISO file.

The install disks for eCS 2.1 and ArcaOS both include a half-strict kernel, so I think it is probably best if I switch to the ArcaOS system.

Do I just copy the kernel into the system (and the symbol table)?

(I used to do this a lot when kernel updates came out regularly, but I have forgotten the technique.)

And regarding the cable: in the past I have only used 25-pin to 25-pin connectors, so I don't know whether a 9-pin to 9-pin one will work.

> What I am actually running at present, and using as the test system for FAT32, is eCS 2.2 beta, but it came as an ISO download and I don't know whether the DVD made from it has an alternate debug kernel (though I suppose it does). At present I cannot see the DVD, but I have the ISO file.

I know nothing about ArcaOS, but I suspect it is the same as eCS. On IBM's OS/2 versions and on eCS it is the \os2image subdirectory in the root. If you have only an ISO image, you could burn it to a CD/DVD first (otherwise, how would you install it on real hardware? If it is not real hardware, the ISO can of course be used directly). You could also extract it with 7-Zip or mount it via the ISOFS NetDrive plugin. 7-Zip was available on the eCS boot partition after installation. I suppose you have it, or that ArcaOS has 7-Zip too (I only suspect this, as I don't have ArcaOS).

> Do I just copy the kernel into the system (and the symbol table)?

Yes, you need to copy the kernel, or add it to the QSINIT (ArcaLoader?) menu. With QSINIT, the older kernel remains available there too.

PS: Back up your kernel, of course, if you're not using the QSINIT menu. With QSINIT, you can copy your new kernel under another name and add both kernels to the menu.

However, during system initialisation (while processing config.sys), OS/2 reports that COM.SYS (or PSCOM.SYS) is ignored, as if the COM1 port were not present.

On which machine, the terminal or the debuggee? If the debuggee, that's normal, because the kernel debugger takes this port, and com.sys/pscom.sys should be disabled there. On the terminal machine they should be enabled, of course.

The debug kernel grabs the COM port itself, so COM.SYS/PSCOM.SYS should be disabled during testing. USBCOM.SYS does nothing with the physical COM port either; it handles USB COM ports only. And yes, the kernel grabs the COM port before config.sys is processed.

Having finally (I hope) got it set up correctly (I added the dbflags and port settings to os2ldr.ini and corrected the kernel line), the system would not complete the boot process: at (maybe) the end of config.sys processing, it stopped on a black screen.

I tried switching to VGA, but that didn't work very well (in either mode).

It stopped at the debugger prompt at the beginning of the boot process. Did you try the "g" (go) command at the debugger prompt? You can also create a file named "kdb.ini" in the root directory of the boot drive with these contents:

.b 115200t 3f8
g
ln
u
dd ss:esp
k

It sets the COM port speed and address, then continues with the "g" command. The remaining commands are executed in case of a trap.

Just ask me if something is not clear; anyone working with the kernel debugger needs to know these things anyway. The debugger stops at a very early stage to make early debugging easy. If you don't need it to stop there, just create a debugger ini file containing (at least) a "g" line. The extra lines I suggested are for my convenience :)

landb: You could try connecting terminal programs on both sides (before connecting the real debugger) and see whether it works, and with which settings. You'll need to re-enable pscom.sys, of course. This also shows whether your cable is defective. Did you set up all the terminal settings correctly? They should be: speed 9600, 8 data bits, 1 stop bit, no parity (aka "8N1").
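To make "8N1" concrete: each character on the wire is one start bit, eight data bits (least significant bit first), no parity bit and one stop bit, i.e. ten bit-times per character. If the two ends disagree on speed or framing, the receiver samples the wrong bits, which typically shows up as garbled characters. The Python sketch below is purely illustrative (the function names are hypothetical; this is not code from any terminal program or from the OS/2 kernel debugger) and just encodes and decodes a single 8N1 frame.

```python
from __future__ import annotations


def frame_8n1(byte: int) -> list[int]:
    """Line bits for one character in 8N1 framing.

    Layout: 1 start bit (0), 8 data bits sent least-significant bit first,
    no parity bit, 1 stop bit (1). The idle line level is 1.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("8N1 carries exactly 8 data bits per character")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]


def unframe_8n1(bits: list[int]) -> int:
    """Recover the byte from a 10-bit 8N1 frame, rejecting bad start/stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error: bad start or stop bit")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    print(frame_8n1(ord("A")))  # 'A' is 0x41
```

For example, frame_8n1(ord("A")) gives [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]: the start bit, 0x41 sent low bit first, then the stop bit.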

1) I will get the serial connection fitted professionally tomorrow, so there is no need for extra fuss until then.
2)
===============
It should at least boot further and not stop at the beginning. Do you see a black screen in the terminal or on the test machine? Did you copy kdb.ini to the root of the boot drive, and not to another drive?
==================

The black screen was on the MUT (machine under test). KDB.INI is on the MUT, in the root of the boot drive.
Might the lack of COM1 confuse it? Let's wait and see...

So, QSINIT boots the original kernel, your debug kernel is called os2krnl.dbg on disk with the same name in os2ldr.ini, and it began to boot? You can try adding "loglevel=3" to the [config] section of os2ldr.ini; QSINIT should then write its debug info to the debug terminal. Does this debug info appear or not?

Ah, good. Do you also have ".b 9600t 3f8" at the beginning of kdb.ini? (kdb.ini must be placed on the test machine, because it is the kernel debugger's .ini file and the kernel debugger runs on the test machine, not on the terminal.) BTW, we can try setting the speed to 115200 on both ends (".b 115200t 3f8") for faster debugging. The rest of the settings seem to be OK.
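Why the higher speed helps: with 8N1 each character costs 10 bit-times, so the line can carry at most baud/10 characters per second. The short Python calculation below is a back-of-the-envelope sketch; the 200,000-byte log size is an assumed example, and real throughput is lower once flow control and gaps between characters are counted.

```python
BITS_PER_CHAR_8N1 = 1 + 8 + 1  # start bit + 8 data bits + stop bit, no parity


def chars_per_second(baud: int, bits_per_char: int = BITS_PER_CHAR_8N1) -> float:
    """Upper bound on characters per second for an asynchronous serial line."""
    return baud / bits_per_char


def transfer_seconds(log_bytes: int, baud: int) -> float:
    """Minimum time to send log_bytes, ignoring flow control and inter-character gaps."""
    return log_bytes / chars_per_second(baud)


if __name__ == "__main__":
    log_size = 200_000  # hypothetical size of a debug log, in bytes
    for baud in (9600, 115200):
        print(f"{baud:>6} baud: {chars_per_second(baud):6.0f} chars/s, "
              f"{transfer_seconds(log_size, baud):6.1f} s for {log_size} bytes")
```

At 9600 baud that log needs roughly 208 seconds on the wire; at 115200 baud, about 17 seconds.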

It occurred to me to try the latest version of FAT32 (r257) on my T61. It produces the same stuck state, so it has something to do with the way the system is set up rather than any hardware oddity. I will try monitoring in the other direction, using the ThinkPad as the MUT.

I would recommend installing a terminal program on both machines and experimenting with their settings first. Use the same settings on both and check whether characters typed on one terminal appear on the other. But enable pscom.sys on that machine first; otherwise the COM port will not work.

Of course, if you want to use a COM port for a purpose other than the kernel debugger, you should not load the debug kernel; enable the COM port driver instead. Otherwise the kernel debugger will take over the port.

Of course. My point is that when I load the debug kernel to find out why my system gets stuck, the system gets into a different stuck state before the normal OS/2 screen appears. You suggested the kdb.ini file, which I put in place, but that made no difference. So why isn't the system getting further with the debug kernel?

It probably trapped somewhere. You need to get your terminal working first to see why and where. (Maybe it traps on mount; disk mounts occur just before PMSHELL starts.) When a debug kernel is installed, all traps go to the debug terminal and are not shown on the screen.

I recommend starting with terminal programs on both COM ports, without the debug kernel active and with the COM port driver enabled on both sides. If all options are set correctly, a string typed on one terminal should appear on the other. If you type on one side and only incorrect characters appear on the other, then some COM port settings are wrong, but the connection itself works. Also, if characters typed on side 1 appear correctly on side 2 but not vice versa, then side 1 has the correct COM port settings (number of data bits, stop bits and parity bits, e.g. 8N1), terminal emulation settings, etc. 8N1 on both sides should be sufficient. There are also settings for whether to react to the CD (Carrier Detect) and DTR (Data Terminal Ready) signals. Your cable may be a 2- or 3-wire one (which is common with today's cables) and so may lack the CD or DTR signal wires, which can make some terminal emulators misbehave. So check your terminal program's settings for options to disable CD and/or DTR. The debug kernel doesn't need these signals, but some terminal emulation programs may.

Once the terminal programs work, you can switch one side to the debug kernel instead of a terminal program, and all should work (first disable the COM port driver on that side and enable the debug kernel).

I tried leaving the COM port driver enabled: the driver then takes over the COM port, and from the moment it loads, messages stop arriving at the terminal. So it appears that the COM port driver prevents the debug kernel from working.

1) I installed r276. With both 269 and 276 I find that stuck states are more frequent, even though I don't do anything odd, for example just opening an OS/2 window (its startup commands come from a FAT32 partition).

2) I have DTerm on the terminal machine, and it showed the usual junk characters. I typed some text on the screen, and it suddenly started showing readable text produced by /monitor.

That's good. But the piece of log you've pasted here is too short, of course. There should be a DTerm.log file in the directory you started DTerm from. Please attach it here. (Please set the Log option to "replace" so that the log is cleared on startup. There is a status line at the bottom; you can switch between sets of options with the "arrows" at the top left, and options can be chosen by mouse or keyboard.)

This log contains no hang. All IFS functions complete normally and none of them hangs. Are you sure this log was taken after the system hung? Or perhaps it hung not in the IFS but somewhere else.

To get the DTerm window to show characters instead of rubbish, I started by running
echo xxx > com1
on the MUT. Why that should help is a mystery!

Are you sure you have disabled the com port driver?

echo xxx > com1

should not work, as there should be no "com1" device in the system, because you disabled the COM port driver!

Ah, so you load a non-debug kernel, output a string to COM1 with the echo command, and the log becomes OK on the other side? If so, you can use a non-debug kernel; everything should work. It appears that some program also accesses the COM port and reinitialises it. When you issue the "echo" command, the port is reinitialised again and the debug messages get through.

From which window do you issue the "echo" command, if it hangs after that?

And where does it "not end tidily"? I see it ending on a FindPathCluster call... Or did you cut the log? Could you please leave some of the repeating lines at the end?

Ah, "GetBuf3Access: FindPathCluster", so it is waiting on the semaphore. Then it is not a hang: only a program accessing the FAT disk should "freeze", and the rest of the system should remain responsive. What program are you using to access the FAT32 disk? The WPS?
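The distinction between a blocked caller and a system-wide hang can be illustrated with ordinary threads. In this Python sketch (all names are hypothetical; a threading.Lock merely stands in for the IFS buffer semaphore, and this is not how fat32.ifs is implemented), one thread models a program touching the FAT32 drive and blocks because the semaphore is held, while an unrelated thread keeps making progress. If the whole system stops responding instead, that points at something beyond a plain semaphore wait.

```python
import threading
import time


def demo(hold_seconds: float = 0.2) -> dict:
    """Show that a thread waiting on a held semaphore blocks only itself."""
    sem = threading.Lock()          # hypothetical stand-in for an IFS buffer semaphore
    reader_done = threading.Event()
    stop = threading.Event()
    ticks = [0]                     # work done by the unrelated thread

    def fat_reader() -> None:
        # Models a program accessing the FAT32 drive: waits until the semaphore is free.
        with sem:
            reader_done.set()

    def unrelated_program() -> None:
        # Models the rest of the system: never touches sem, so it keeps running.
        while not stop.is_set():
            ticks[0] += 1
            time.sleep(0.001)

    sem.acquire()                   # another holder keeps the semaphore for a while
    reader = threading.Thread(target=fat_reader)
    other = threading.Thread(target=unrelated_program)
    reader.start()
    other.start()
    time.sleep(hold_seconds)
    reader_was_blocked = not reader_done.is_set()
    ticks_while_blocked = ticks[0]
    sem.release()                   # releasing the semaphore lets the waiter proceed
    reader.join(timeout=5)
    stop.set()
    other.join(timeout=5)
    return {
        "reader_was_blocked": reader_was_blocked,
        "other_thread_progressed": ticks_while_blocked > 0,
        "reader_finished": reader_done.is_set(),
    }


if __name__ == "__main__":
    print(demo())
```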

OK, you meant the "OS/2 Window". It is a standard PM window, and I use OS/2 windows on my machines without any problems. I suspect that your autoexec.cmd is causing some problems. Could you share your autoexec.cmd? (BTW, why not use 4OS/2 for that purpose? It can execute a file on startup of each OS/2 window.) And could you try disabling your autoexec.cmd for testing purposes (just restore the standard values of the OS2_SHELL and COMSPEC variables in config.sys and reboot)? I am in the GMT+3 (Moscow) time zone, but I usually work closer to the evening.

Maybe email would be better, but I prefer IRC (if you agree). I am on the #osFree IRC channel on the EFnet network 24 hours a day (see www.efnet.org for the list of servers).

I could remove the automatic use of the startup command file, but I actually want to get to the bottom of this, not find ways around it. All that autoexec does is set up things I want in my window, such as the path, the prompt colour, a larger window for scrolling, etc. What do you mean by "share the autoexec.cmd"?

I have not tried to recreate the DOS window hang since getting this monitoring working, as I assumed the other hang was "just as good"...

OK, you could use email if you wish, but it isn't much better than writing here (my email is _valerius (dog) mail (dot) ru).

Removing autoexec is just for testing purposes: I want to see whether it is your autoexec that is causing the trouble. If so, you could just share the contents of your .cmd with me, and maybe I will be able to reproduce the problem on my machine. That could speed things up a lot.

1) Removing the /K from the OS2_SHELL line stopped my system from completing the boot process (the PM screen came up but remained empty). I totally do not understand this!

So instead I changed my references to d:\d\cmd (D: is FAT32) in config.sys to F:\D\CMD (F: is HPFS).

2) With that setup I installed 276 again, added /monitor and booted. The first OS/2 window worked fine. After exiting it and starting another one (by clicking on the icon), it was accessing the FAT32 volume and became stuck again. See putty2.txt.

3) After a reboot, this time I started a DOS window, and it quickly became stuck. See putty3.txt.

PS: Yes, you have a very strange setup. Why do you need .cmd files on a FAT32 drive and a PATH referencing it? Anyway, it helps to catch glitches ;) But I am still unable to reproduce this on my machine :)

PPS: Are you sure that your FAT32 disk does not contain errors, either ones fixable with CHKDSK or hardware ones? Bad sectors, or maybe a faulty drive?

1) I have done extensive and repeated error checking (my first thought!). In any event, the errors also occur on my ThinkPad, so I think a hardware error is ruled out.

2) My setup did not use to be like this. The current setup is all to do with data backup, migration and recovery.

I used to have CMD files with extensive lists of things to back up. I moved to using file sync on one (maybe two) trees. This tree is on the FAT32 partition so that Windows files also get backed up at the same time.

The first boot attempt failed.
After a long wait there was a message "failure to get semaphore" (the exact text has gone). I pressed ENTER to proceed; the screen cleared and a new message appeared:
FAT32: Semrequest getFatAccess Failed rc=121!
Press Enter to continue

I suspect you downloaded the second one, because I don't see any additional debug messages.

Regarding the semaphore problem: the current binary was compiled with an experimental option that works around another, unrelated problem. I have now rebuilt the binaries without this option. (That problem needs to be addressed separately, and I know about it.) The binary is at the first link above (note the "test" subdirectory).
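In the OS/2 (DOS-derived) error codes, rc=121 is ERROR_SEM_TIMEOUT: a semaphore request that gave up after its timeout instead of waiting indefinitely. The Python sketch below only illustrates that pattern with threading.Lock.acquire(timeout=...). The function names are hypothetical, and how the experimental build option actually chose or used a timeout is not described in this ticket.

```python
import threading

NO_ERROR = 0
ERROR_SEM_TIMEOUT = 121  # OS/2 error code: the semaphore timeout period has expired


def sem_request(sem: threading.Lock, timeout_s: float) -> int:
    """Hypothetical analogue of a timed semaphore request.

    Returns NO_ERROR if the semaphore was obtained within timeout_s seconds,
    otherwise ERROR_SEM_TIMEOUT instead of waiting forever.
    """
    return NO_ERROR if sem.acquire(timeout=timeout_s) else ERROR_SEM_TIMEOUT


def get_fat_access(sem: threading.Lock, timeout_s: float) -> int:
    """Hypothetical caller that reports a failed request, shaped like the boot message."""
    rc = sem_request(sem, timeout_s)
    if rc != NO_ERROR:
        print(f"FAT32: SemRequest getFatAccess Failed rc={rc}!")
    return rc


if __name__ == "__main__":
    fat_sem = threading.Lock()
    print(get_fat_access(fat_sem, 0.05))  # semaphore free: obtained, rc 0
    print(get_fat_access(fat_sem, 0.05))  # still held by the first request: rc 121
    fat_sem.release()
```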

I downloaded directly from the link in the post above (comment 102). Looking at the two links in comment 107, the one I downloaded and the one you quote above as correct (\test\) have the same size (799 KB compressed), and the non-test one has a different size (801 KB compressed).

The unofficial .WPI from Arca Noae may install it to \ecs\boot, but both my .WPI and my .ZIP have it packed into \os2\boot. And I didn't upload any .WPIs. You are expected to copy each file to its intended location manually, not just "unzip file.zip" into the root. Some files may be unnecessary for you, such as .sym files or docs.

Every boot with that version of FAT32.IFS hits the semaphore timeout; four tries so far.

With the last re-uploaded version? It should not... I disabled the corresponding code.

Apologies. It took me a very long time to realise you meant that you had changed the test version and I needed to download it again. Trying that now (with the \ecs -> \os2 change).

Yes, so you had not re-downloaded my test version.

PS: In your latest log (putty5.txt) I see no hang at all. Does it still look hung to you? I see that all IFS functions completed and there is no longer a hang in FindPathCluster.

PPS: Yes, the timings/delays are caused by the added debug code. If you remove the /monitor switch, there will be no delays. BTW, you can start f32mon.exe before testing; this temporarily enables debug messages, and when you close f32mon, they are switched off again. This is more convenient than enabling them permanently with the /monitor switch.

All earlier versions are available as WPIs, and I don't know who made them, then.

For eComStation, a lot of stuff, including FAT32.IFS, goes into \ecs\boot, and the WPI version puts it there. Obviously NOT for ArcaOS :-)

So long as the config is updated by the WPI process each time, it obviously doesn't matter. It did matter with a manual download, where I followed the same instructions I had received when testing the USB suite: unzip everything to the root. I decided the extra files wouldn't matter.

Yes, putty5 does record a stuck state. It is entirely conceivable that something is interacting with something else somewhere. I will try to produce something more obvious. Thanks for the tip about f32mon; that will be very useful.

> For eComStation, a lot of stuff, including FAT32.IFS, goes into \ecs\boot, and the WPI version puts it there. Obviously NOT for ArcaOS :-)

Yes, but my main machine runs MCP2, and even on eCS I keep all my IFSes in \os2\boot. And ArcaOS, I suspect, has no \ecs subdirectory. So it would not be very clever of me to install binaries to \ecs\ :) And having two different .WPIs for eCS and for other systems is not a clever idea either :)

It looks like it hangs when reading sectors. I added some more debug messages. All these logs seem to include calls to ReadSector from different places, and the hang is in ReadSector. I suspect that you have hit bad sectors and the DOVOLIO call hangs. This needs checking. Please run one of your test cases until it hangs.

What's the difference between logs 11 and 12? Both look incomplete, or else it hung outside fat32.ifs. Log 12 ends with

fpc008: GetNextCluster: ulCluster=fffffff

so the loop should end and reach the fpc009 mark, but I don't see fpc009 in the log. There is no place between fpc008 and fpc009 where it could hang. So it looks like it hung somewhere other than fat32.ifs. Are you sure it hung at fpc008 and there were no lines after it?
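For context on why fpc008 should lead straight to fpc009: a FAT32 entry uses only its low 28 bits, and any value from 0x0FFFFFF8 through 0x0FFFFFFF marks the end of a cluster chain, so ulCluster=fffffff (0x0FFFFFFF) is an end-of-chain marker and a chain-walking loop should stop there. The Python sketch below shows that termination logic over a toy in-memory FAT. It is not the FindPathCluster/GetNextCluster code from fat32.ifs, which is not shown in this ticket; the names and the error handling are assumptions for illustration.

```python
from __future__ import annotations

FAT32_MASK = 0x0FFFFFFF     # only the low 28 bits of a FAT32 entry are used
FAT32_BAD = 0x0FFFFFF7      # bad-cluster marker
FAT32_EOC_MIN = 0x0FFFFFF8  # 0x0FFFFFF8..0x0FFFFFFF all mean end of chain


def walk_chain(fat: dict[int, int], start: int) -> list[int]:
    """Follow a FAT32 cluster chain from `start` to its end-of-chain marker.

    `fat` is a toy in-memory FAT mapping cluster number -> raw 32-bit entry.
    Raises ValueError on a free/reserved/bad cluster or a loop in the chain.
    """
    chain = []
    seen = set()
    cluster = start
    while True:
        if not 2 <= cluster < FAT32_BAD:
            raise ValueError(f"invalid cluster {cluster:#x} in chain")
        if cluster in seen:
            raise ValueError(f"loop in cluster chain at {cluster:#x}")
        seen.add(cluster)
        chain.append(cluster)
        next_cluster = fat[cluster] & FAT32_MASK  # ignore the reserved top 4 bits
        if next_cluster >= FAT32_EOC_MIN:  # e.g. 0xFFFFFFF, as in the fpc008 log line
            return chain
        cluster = next_cluster
```

For example, with the toy FAT {2: 3, 3: 0x0FFFFFFF}, walk_chain(fat, 2) returns [2, 3] and stops at the end-of-chain marker.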

Hehe, "D:\FAT32.LOG". So the log is on your FAT32 drive; that's why it hangs when you start f32mon.exe. Try starting it with a different current drive. I still don't understand why it hangs when accessing the FAT32 drive, because the hang is somewhere else.

BTW, I don't know whether I told you that the drives in both my desktop and the T61 are SSDs, so bad sectors are less likely.

Why do you think so? SSDs are a special kind of flash memory, and flash can be corrupted. In theory, bad sectors are transparently replaced with spare sectors, but the number of spare sectors is limited (as with hard disks, too). So once the spares are exhausted, bad sectors can no longer be replaced, hence errors. And there may simply be faulty drives.
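The spare-sector argument can be illustrated with a toy model: failing sectors are silently redirected to spares until the pool runs out, after which errors reach the file system. This Python sketch is only an assumption-laden illustration (the class and method names are hypothetical) and does not describe any particular drive's firmware.

```python
from __future__ import annotations


class SpareRemapper:
    """Toy model of transparent bad-sector remapping with a finite spare pool.

    Hypothetical and heavily simplified: real drive firmware (and SSD flash
    translation layers) differ in detail, and the spare pool is normally
    invisible to the operating system.
    """

    def __init__(self, spare_count: int) -> None:
        self.free_spares = list(range(spare_count))
        self.remap = {}  # failing sector -> spare slot

    def retire(self, sector: int) -> bool:
        """Remap a failing sector; return False once no spares are left."""
        if sector in self.remap:
            return True                 # already remapped earlier
        if not self.free_spares:
            return False                # spares exhausted: the OS now sees an error
        self.remap[sector] = self.free_spares.pop(0)
        return True

    def resolve(self, sector: int) -> tuple[str, int]:
        """Where a read of `sector` is actually served from."""
        if sector in self.remap:
            return ("spare", self.remap[sector])
        return ("original", sector)
```

With two spares, the first two failing sectors are absorbed invisibly; the third one can no longer be remapped and surfaces as an error.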

So, nothing new. It hung in a place where it normally cannot hang: on exit from ReadSector, after the sectors were read successfully, where there is nowhere it could hang. It looks like it hung somewhere else, not in fat32.ifs. So, no ideas yet. Probably you're right and it is some timing issue: it hung earlier than in #15a/#15b, probably because of the delay added by the extra debug code.

It looks like it hangs some time after accessing the FAT32 disk. The more debug messages I add, the earlier in the code it hangs. And it hangs not immediately after accessing the media but after some delay, so no longer at the place of access.

I re-enabled the bad-sector checking code, which had been commented out since version 0.9.13. Maybe it will help.

It occurred to me to wonder whether there might be cases where you are writing outside a buffer and corrupting some other process's data, so that the IFS exits correctly but the calling process then fails.

Why are you so sure that traps in the kernel or ACPI are related to fat32? I suspect that is a different problem. If fat32 could corrupt kernel or ACPI memory, it would trap on my machine too, but I have not seen that. And rather than creating dumps, you'd better get a trap screen with symbols loaded and the /monitor option enabled, to see what was going on in fat32.ifs at that time.

I wonder why you are getting the black screen, because the debug kernel is the same as the non-debug kernel and needs nothing special. You just copy the kernel (renamed to os2krnl) plus the *.sym file to the root directory as usual, and create a kdb.ini file in the root.

Why do you need t*.zip? They are not needed anymore. You should use the latest SVN revision.

Debug kernel: as you recall, I am using QSINIT, so I don't need to rename the kernel.
I did create kdb.ini, and it did not help. After processing config.sys, the system did not initialise the PM screen but stayed on an empty black screen, not responding to anything. I even tried "g" on the terminal, to no effect.

Anyway, I don't need the debug kernel to cause these problems; why are you suggesting I use it?

T*.zip: the only reason I suggested using them was to change the timing (within FAT32.IFS), as that seemed to make more, and different, things fail, and I only got the traps in that state.

> Debug kernel: as you recall, I am using QSINIT, so I don't need to rename the kernel.

OK, it does not matter. Then don't rename it.

> I did create kdb.ini, and it did not help. After processing config.sys, the system did not initialise the PM screen but stayed on an empty black screen, not responding to anything. I even tried "g" on the terminal, to no effect.

Are you using an SMP or a UNI kernel? Are you sure your debug kernel is of the same type? If you change the kernel from UNI to SMP, you must change doscal1.dll accordingly. If you don't, many user-mode processes, including pmshell.exe, will trap.

> Anyway, I don't need the debug kernel to cause these problems; why are you suggesting I use it?

I need to see the last fat32 debug messages plus a trap screen on the terminal. For that, I need the debug kernel. The non-debug kernel only shows the trap screen on the display.

So, you are sure you are using the debug SMP kernel and the SMP version of doscal1.dll? If so, there is probably a trap somewhere, and that is why you have a black screen. The trap screen should be on the debug terminal.

I will try using the debug kernel again, but if that doesn't work I will set up SYSDUMP, which records the trap screen.

SYSDUMP doesn't show what happened in fat32.ifs just before the trap. I need both the fat32.ifs debug messages and a trap screen, with all debug symbols loaded.

PS: How do you use system dumps on newer systems? They have more than 2 GB of RAM. Here dumpfs.ifs could help, but its os2dump version does not understand disks larger than 512 GB with 127 or 255 sectors per track. (I have 2 TB disks and am unable to get dumpfs working.)

> PS: How do you use system dumps on newer systems? They have more than 2 GB of RAM. Here dumpfs.ifs could help, but its os2dump version does not understand disks larger than 512 GB with 127 or 255 sectors per track. (I have 2 TB disks and am unable to get dumpfs working.)

I haven't done it for a while, but the first thing is to use QSINIT to restrict memory to (say) 512 MB; then SYSDUMP should be fine.

Also, my SSDs are all 512 GB, so they should be fine.

> SYSDUMP doesn't show what happened in fat32.ifs just before the trap. I need both the fat32.ifs debug messages and a trap screen, with all debug symbols loaded.

Noted. If you recall, I never solved the problem of getting intelligible output without sending "echo xxx >com1" from the MUT, so I would not have seen the trap dump info on the terminal. Everything seemed to be set up correctly. Needs more thought.

The trap is something the kernel debugger has noticed, and typing gt gets past it (g for go and t to ignore kernel traps).

So after that I did various things and eventually went to a DOS window, used the NE editor to open a file, changed it a little, exited, and it "stuck" during write-back.
The log seems to indicate that FAT32 processing is incomplete.

Very strangely, the next thing that happened was a spontaneous reboot...

Booted as before (with gt); system stuck while still processing startup commands.

Why "gt"? Use "g". Which startup do you mean? The scripts executed on cmd.exe startup? It hit a breakpoint somewhere in the kernel. That should not happen normally; it is probably because you used "gt". Don't use "gt", use "g"!

putty42

The log output from fat32.ifs is entered in the kernel debugger. Why is that?

I saw that, and it was a total surprise to me. I cannot think of any place where this should be happening (though I agree the trace shows it). I have looked quite hard and not found it; I will have to look harder. I cannot even think of any reason for it. (Though I do use WordPerfect for DOS.)

What I meant was that all the commands called directly or indirectly during startup (as listed above) had apparently completed properly, BUT the mouse and keyboard were inoperative and the system appeared stuck.

Before running a test that I hope will show FAT32.IFS problems, I wait for the system to go quiet to avoid confusion. In this case it got stuck at that point.

Normally you can't get past a trap (it may help for a breakpoint trap, but not for a page fault or a protection violation). If it has trapped and you try to continue, it will inevitably trap again. I don't need to continue after the trap; I need it to stop so that I can see the trap screen.

Is there anything suspicious in autoexec.bat, startup.cmd, the script executed on cmd.exe start, other command files executed automatically or by a scheduler (if you have one), or the startup folders?

Can you share a 14.200 kernel with me (together with its .sym file, if possible)? Preferably the debug one. (You can try emailing it to me.) I want to check editing with your ne.exe on this kernel. Maybe this kernel has a broken VDM?

I asked you to test a situation like the one in putty45.txt, where popdir.bat is deleted into DELDIR (after closing the file in NE.EXE). So putty46.txt is the wrong log file. It hits an int 3 (a breakpoint), so it is a kernel problem. Please use the test case from putty45.txt.

Using R283 as requested, and getting past the initial trap by typing g.

Which initial trap? int 3? I only see int 3 in the log; there is nothing after it.

Doing my best! I cannot repeat the situation in putty45 unless the system initialises itself to the point where I can open a DOS window and use the keyboard. I had two tries, and both stopped on that trap 3, at which point I can do nothing.

I don't know why the earlier trap 1 isn't in the log; maybe PuTTY has a fixed-size buffer?

If you want to see trap 1, I can capture it, but it is so early in the process that it seems unlikely to be relevant (it is before the PM screen is initialised).

I took a quick look at putty46. VERY strangely, there is a reference to d:\d\qw98.exe, and I know of no reason why that should be accessed during system startup (but that is not your problem, I think!). This is very similar to the strange appearance of WP51 in an earlier log.

I have changed PuTTY to keep 100,000 lines, and a little experimentation shows that the trace up to the first trap 1 is 6,000 lines (202 KB) and the trace up to the int 3 (which does keep repeating) is 10,000 lines (377 KB).

I also booted the eCS system with the non-debug kernel, and the system initialised correctly. I then opened a DOS window as in putty45 and got the same freeze at the same place. The PuTTY transcript is very long; I could email it.

I realise now why there are mysterious references to various programs on D: during startup: OS/2 is verifying the desktop icons. This does NOT explain D:\wp51, as I don't have an icon for that, but it does explain all the others.

1) Boot with the normal kernel.
Wait for the system to complete initialisation.
Open an OS/2 window and run F32MON.
Open a DOS window: ne temp\popdir.bat
Make a change; F3; y to file it back.
--- system stuck
Recorded as putty53.

2) Boot with the debug kernel.
Same as (1),
BUT
after the y to file back, the system did a spontaneous reboot.
Recorded as putty54.

The only evidence I have for this is that I get the trap using r285 but not when using 913. With the retail kernel the system sticks; with the debug kernel I get the following (and nothing else) on the terminal:

Trying to use the debug kernel to investigate this, I find that although BLDLEVEL shows both FAT32.IFS and CACHEF32.EXE as R288, under the debug kernel CACHEF32.EXE, when called from config.sys, reports that FAT32.IFS is R285.

The boot will not proceed beyond that point.

I will check whether I have done something wrong, but is it identifying itself correctly?