I still experience this crash, funny enough it now occurs less often since I upgraded to Seamonkey 2.5 from Seamonkey 2.3.3. Maybe Seamonkey 2.5 uses less floating point.
Unfortunately I don't know the technical details behind VT-x (I could have a look into the Intel manual but I am sure I am lagging years behind ...).
I am using Intel Core2 Duo with Windows 7 as host. I had another trap on bootup just now, unfortunately I forgot to take a photo. The only thing I can say is that it happens pretty randomly. If you want me to test anything ...

There's one crucial piece of information missing here. This problem does not occur with the SMP OS/2 kernel. The reason being that the SMP kernel runs with the CR0.NE bit set. The actual number of processors in the guest does not matter.

CR0.NE bit set implies that there is some old interrupt controller around that generates IRQ 13 on a floating point exception, correct ?
Ok, I will now install the SMP kernel in virtual box and see if the problem goes away. Maybe that's also the reason why traps occur so frequently when Seamonkey is in use. I have the impression Seamonkey creates a lot of floating point exceptions that are then handled internally by the application. But the underlying mechanisms have to work ...

It's the other way around with CR0.NE. When it's set, it means the "new style" (implemented since the 286) math error handling should be used, i.e. the #MF exception. That's also the only way an SMP system can work. The OS/2 UNI kernels use the ancient FERR/IRQ13/IGNNE math error handling (CR0.NE clear) which clearly doesn't work right in VirtualBox. No modern OS uses that; Windows 9x was the only other important OS which used the old style math error handling. Besides DOS, of course.
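
For anyone reading a trap screen, the distinction above can be sketched in C. This is only an illustration: the function and enum names are made up for this example, and the only hard fact used is that NE is bit 5 of CR0.

```c
#include <stdint.h>

/* CR0 bit 5 is NE (Numeric Error). */
#define CR0_NE (UINT32_C(1) << 5)

/* Hypothetical names, used only for this illustration. */
enum math_error_route {
    ROUTE_FERR_IRQ13   = 0,  /* old style: FERR# -> external logic -> IRQ 13 */
    ROUTE_MF_EXCEPTION = 1   /* new style: CPU raises #MF (vector 16) itself */
};

/* Decide which math error reporting style a given CR0 value selects. */
enum math_error_route math_error_route_for(uint32_t cr0)
{
    return (cr0 & CR0_NE) ? ROUTE_MF_EXCEPTION : ROUTE_FERR_IRQ13;
}
```

Feeding it the CR0 value shown on a trap screen tells you which path the kernel expected: a value such as 0x80000011 (NE clear) selects the IRQ13 route, while 0x80000031 (NE set) selects #MF.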

Sorry, yes, I meant it the other way around for CR0.NE.
In any case, I have just upgraded to the SMP kernel along with enabled I/O APIC in VirtualBox configuration and OS2APIC.PSD (with no parameters) and I am using Seamonkey 2.5 under an OS/2 guest in VirtualBox. Should traps occur again, I will post them here.

By the way: when you say "OS/2 UNI kernels" do you mean only the "W4" kernel or also the "UNI" kernel ? I never really understood why there is yet a third kernel variant (UNI) besides the other 2 (W4, SMP).

I had a trap on bootup: SMP kernel, one CPU only, no I/O APIC VM emulation, no PSD loaded.
As I can tell from the trap screen, the CR0.NE bit is NOT set even though it's an SMP kernel. Does that mean that I either need a PSD to operate or that the OS needs its time to change from CR0.NE = 0 to CR0.NE = 1 ? How do you handle Win 9x and DOS guests ? As far as I understand they also set CR0.NE = 0.

No, you can run an SMP kernel without a PSD. But of course, you will only get to use the BSP and none of the APs. But it surely looks like running the SMP kernel with a PSD (OS2APIC.PSD or ACPI.PSD) is preventing the traps. I guess that OS2APIC.PSD will set CR0.NE very early in the boot process. For ACPI.PSD I could ask the developer and find out if it also explicitly sets CR0.NE (sets flag INIT_USE_FPERR_TRAP).
I have now readded OS2APIC.PSD to config.sys and enabled "I/O APIC" in the VM configuration. At the same time I have only enabled one CPU core (of two CPU cores available) in the VM configuration because mouse tends to get jerky with > 1 CPU core. I will see if this eliminates the traps on the long term.

It would be nice if someone could try it and check if the guest OS traps are gone. Please note that this is a development build and I'm not interested in anything other than whether the traps are gone or not.

Hmm, installed it over a 4.1.8 and it would not start due to "driver structure changed" or so. Had some trouble getting back to a working setup (now at 4.1.10), so I'm not too enthusiastic to try again.

The problem is simply that you still have the 4.1.10 Extension Pack installed. If you don't need USB2 for that VM, just disable USB2 in the VM settings, otherwise we could provide you a test build of the 4.1.51 Extension Pack. Do you need one?

O.K., first of all, I don't get the trap in the guest anymore. However, there still seems to be something not quite right. When running the test program above on a freshly booted up guest I get a SIGFPE (as expected). But the location appears to be somewhere in the runtime lib instead of in the program code itself. Also, when running the program three or more times in a row, no SIGFPE is thrown anymore. Instead it simply continues printing out the unmodified value of "d1" (i.e. 1.00000).

I am not sure if this is related, if not, just ignore:
I am running the rusty old 16-bit Microsoft C compiler for OS/2. It's CL.exe with its subcomponents C1.exe (preprocessor(?)), C2.exe (tokenizer(?), optimizer(!)), C3.exe (output generator(?)). There also exist large memory model variants C1L.exe, C2L.exe, C3L.exe that can deal with large source files. I have to use C2L.exe because I am using /Oe /Og (global optimizations) with rather large source files which require it (otherwise I get a warning that global optimizations cannot be performed for this and that routine).
I therefore specify /B2c2l.exe either on commandline or via CL env. var.

When I run the compiler with /B2... on a W4 kernel within VirtualBox, it just works but then I occasionally have these general trap problems.
When I run the compiler with /B2... on an SMP kernel within VirtualBox, I get "varying" results. I never get a trap but on the first run I might get a "C1001" compiler error, whereas on subsequent runs I will get a "Command line error D2030: INTERNAL COMPILER ERROR in 'P2'". But the internal compiler error might also occur on the very first run.
This is true for a source file of any size, small or big.

Unfortunately I don't have a native OS/2 on a multi-core system to test the SMP kernel on.
I would be grateful if anybody could test this behaviour on a multi-core system with SMP kernel on a native OS/2 installation and compare with behaviour in VirtualBox.

As to probs with C2L.EXE: I have to correct my statement. It keeps trapping but the trap address is pretty much random. Even though I compile the very same file with the very same command line switches. See attached POPUPLOG.OS2.
My gut feeling for this error is that it depends on how many segments the (segmented) executable contains. The more, the worse.

Some news:
at some point in time Scott Garfinkle from IBM modified the W4 kernel to also support using a PSD along with it so I took the chance:
1) if I use a W4 kernel with OS2APIC.PSD (and only 1 CPU of course), it looks like it gets rid of the traps and C2L.exe starts working again. I will need more observation time and report back
2) using a W4 kernel without any PSD leads to the random traps

[The kernel trap is caused by a defect in the Warp4 kernels. The Firefox code issues an fldcw which generates a math fault (#MF) exception, which does not push an exception-specific error code onto the stack. The kernel code should push a dummy error code onto the stack before entering the common exception handler code, but it does not. The common code assumes that the EFLAGS are at a specific stack offset and checks the EFLAGS VM bit to determine if the trap occurred in V86 mode. If the bit happens to be set, the result is a trap in V86FaultEntry + 17. If the bit is not set, the kernel will trap or hang somewhat later because the stack contains one less dword than the code expects.

The defect has been fixed in the SMP kernel, so running the SMP kernel in VirtualBox is a possible workaround.

It is not known why the fldcw generates a #MF exception. This might be a VirtualBox defect. ]
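
The stack mix-up described in the bracketed note can be modelled with a small C sketch. It is not kernel code: the frame layout, function names and parameters are assumptions chosen for illustration, and the only hard fact used is that VM is bit 17 of EFLAGS.

```c
#include <stdint.h>

#define EFLAGS_VM (UINT32_C(1) << 17)   /* virtual-8086 mode flag */

/* Assumed layout the common handler expects, counted from the stack top. */
enum { FRAME_ERRCODE, FRAME_EIP, FRAME_CS, FRAME_EFLAGS, FRAME_DWORDS };

/* Model the frame the common handler sees. 'stack' must hold FRAME_DWORDS
 * entries. 'beyond' stands for whatever dword lies past EFLAGS (for a
 * fault coming from user mode that would be the saved ESP). */
void build_frame(uint32_t *stack, uint32_t eip, uint32_t cs,
                 uint32_t eflags, uint32_t beyond, int push_dummy_errcode)
{
    int i = 0;
    if (push_dummy_errcode)
        stack[i++] = 0;          /* what the #MF entry stub should push */
    stack[i++] = eip;
    stack[i++] = cs;
    stack[i++] = eflags;
    while (i < FRAME_DWORDS)
        stack[i++] = beyond;     /* frame is one dword short: slot slides */
}

/* The common handler's V86 check, reading EFLAGS at the fixed offset. */
int handler_thinks_v86(const uint32_t *stack)
{
    return (stack[FRAME_EFLAGS] & EFLAGS_VM) != 0;
}
```

With the dummy error code in place the check reads the real EFLAGS. Without it, the check reads the dword past EFLAGS instead, so the V86 decision depends on unrelated stack contents, which would match the random nature of the traps.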

1) POPUPLOG.OS2 was taken with the SMP kernel 10.104a and OS2APIC.PSD in place. I was using only one CPU.
2) I was invoking cl.exe 3 times with the very same parameters and the very same source file
3) the traps however occurred at 3 different places in C2L.exe.

The only reason I was mentioning the W4 kernel is to state that C2L.exe does not trap when I use the W4 kernel in conjunction with OS2APIC.PSD.

Summary
changed from OS/2 guest crashes on floating point exception to OS/2 guest crashes on floating point exception => fixed in svn

This ticket has clearly outlived its usefulness. We really don't care about 20+ year old Microsoft compilers which have known problems running on modern systems.

The reported problem is now fixed and the fix should be included in the next VirtualBox release. The OS/2 kernel should no longer crash on Intel CPUs because it should never get a #MF exception anymore (unless it asked for it).

No, it doesn't. That's a completely different problem, which was visible on AMD CPUs since day one. Feel free to open a separate ticket, just don't expect it to be fixed anytime soon without giving some really good reason why we should spend time on that (it actually needs quite a bit of work).

would you create a new bug ? Unfortunately, Michal does not consider the OS/2 Microsoft C-Compiler 6.0 a valid test case. I am sure that once the "inconsistent behaviour" is fixed that then the OS/2 Microsoft C-Compiler 6.0 will happily exhibit consistent behaviour (trapping or not) provided the same command line switches and the same input source file is used.

I'm not convinced that the problems you are describing are really related to this issue. To summarize: We had a trap in the kernel because VirtualBox was delivering #MF, which is neither expected nor properly handled by the W4 kernel. To my understanding this has been fixed. Now we see two different problems:

1.) the SIGFPE fires only once or twice per guest session
2.) the reported exception location is wrong

I cannot tell if these two issues have a common cause (maybe a bug in the DOS-like FPE emulation) or if these are two separate things. I also don't know, if the location reporting is broken in general (i.e. not only for SIGFPE). Maybe Michal can tell and depending on this I might open one or two new ticket(s). Given the time it took until this one was addressed, expecting it to be fixed "anytime soon" is probably not realistic anyway...

The two issues probably have a common cause. They are also specific to floating-point exceptions because a) the delivery is very different, and b) the FPU has a whole own internal state that's different from the CPU.

If you have some paying customer who depends on accurate FPU exception reporting in OS/2 guests, that would greatly accelerate the process. But I suspect there's no such customer because very few applications even run with FP exceptions unmasked. I'm sure you understand that we have better things to do. Of course if someone wants to spend a fun few weeks with VirtualBox and submit a patch, we won't object :)

I highly doubt the problems with MS C 6.0 are related at all. MS C 6.0 is well known to have all sorts of problems running on modern systems, both Windows and OS/2. If you still depend on MS C 6.0 in 2012, you have no one but yourself to blame. So far I've seen no evidence that the MS C 6.0 compiler even uses the FPU at all (it might, but I wouldn't count on that).

2) Don't know if this has a bearing on the problem: see the 2. half of comment 31.
On the other hand I understood from Michal's comments that with the existing fix the CPU should no longer get a #MF exception at all.
But 2. half of comment 31 would explain why the W4 kernel traps on a #MF exception while the SMP kernel does not and it would turn out to be a W4 kernel bug that cannot be fixed in VirtualBox.

Yes, that fixes it for me.
The funny thing is, for the W4 kernel the program snippet reports 0x2003e to be the program trap address in the program whereas with the very same program the SMP kernel reports 0x2003b to be the program trap address. For both kernels, the program trap address is consistent across multiple invocations of the program.

Whatever, I consider this problem fixed at least for my AMD CPU.

Just for fun, I loaded a self written PSD in conjunction with the W4 kernel that enables the new way of floating point exception reporting (#MF exception) and disables IRQ13. Under this scenario I get the kernel trap as already shown in "newTrapScreen.PNG".

@dbsoft: you should make sure that you run the W4 kernel WITHOUT ANY PSD as it is supposed to be for the W4 kernel.

@lerdmann: The program trap address is probably the address of the instruction where the FP exception was detected, but not the address of the actual FP instruction which triggered the exception. I don't know why the reported address is different, but it should not cause problems.

The thing with the PSD is very interesting and yes, it basically exactly simulates the VirtualBox bug (FP exceptions are delivered as #MF and not IRQ13) which then triggers a bug/unexpected code path in the W4 kernel.

I can test on Intel... can boot my Mac in Windows or test a Mac build if one is available... but I am experiencing a trap still with the test program... a different one though. So not sure how valuable my test will be.

So I booted Windows, installed the same VirtualBox 4.3.17 and using the same image I no longer get the trap. Seems to be fixed on my Intel Mac in Windows 7... not sure if my AMD PC has something configured differently but I am still getting the trap there with the same software.

I also now tested on an older AMD Athlon 64 X2 running Windows 7 and I also get the trap 000e... seems to not be fixed on AMD for me. I also tested on an older Core 2 Duo Mac running Windows 7 which also seems to be fixed.

So my testing shows Intel is fixed, AMD is still bugged but with trap 000e now instead of 0008.

@dbsoft: find attached file "os2pcat.zip". It contains a PSD (and all the source code) that fixes the existing problem in the Warp 4 kernel.
Unzip OS2PCAT.PSD and OS2PCAT.SYM and place them into \os2\boot directory.
Add this line to config.sys:
PSD=OS2PCAT.PSD

That should fix your problem. It's no use fixing something in VirtualBox where in fact the Warp4 kernel is causing all the problems.

Add. info: with this PSD loaded in conjunction with the W4 kernel, the failing address (of the program snippet) displayed is exactly the same as for the SMP kernel.

I am not the virtualization expert but there must be some difference between AMD and Intel.
The point is that the PSD works around the bug in the W4 kernel and it correctly handles what VirtualBox does on an unmasked floating point exception.

About virtualization, here is a relevant excerpt from the Intel documentation (volume 3, chapter 23.8) and I suppose that AMD followed closely:

The first processors to support VMX operation require that the
following bits be 1 in VMX operation: CR0.PE, CR0.NE, CR0.PG, and
CR4.VMXE.

The necessity to set the CR0.NE bit translates to the generation of an #MF exception for floating point exceptions instead of taking the route via an external interrupt controller issuing an IRQ13 interrupt.
I would believe that your AMD CPU is an earlier model that requires CR0.NE to be set in order to properly operate in a virtualized environment. As a consequence the Warp4 kernel has to properly deal with the #MF exception which is what OS2PCAT.PSD ensures.
Later CPUs might offer additional capabilities where setting CR0.NE bit is not necessary, I don't know.
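
As a rough illustration of the Intel excerpt quoted above, the four required bits can be checked like this in C. The function name is invented for this example; the bit positions (CR0.PE = bit 0, CR0.NE = bit 5, CR0.PG = bit 31, CR4.VMXE = bit 13) are the architectural ones. It says nothing about AMD, which is my assumption only.

```c
#include <stdint.h>

#define CR0_PE   (UINT32_C(1) << 0)    /* protected mode enable */
#define CR0_NE   (UINT32_C(1) << 5)    /* numeric error: #MF instead of IRQ13 */
#define CR0_PG   (UINT32_C(1) << 31)   /* paging */
#define CR4_VMXE (UINT32_C(1) << 13)   /* VMX enable */

/* Hypothetical helper: returns 1 if the bits named in the excerpt are all
 * set in the given (real, not guest-visible) CR0/CR4 values, else 0. */
int vmx_required_bits_set(uint32_t cr0, uint32_t cr4)
{
    const uint32_t cr0_required = CR0_PE | CR0_NE | CR0_PG;
    return (cr0 & cr0_required) == cr0_required && (cr4 & CR4_VMXE) != 0;
}
```

The point for this ticket is the CR0.NE term: a real CR0 without NE fails the check, so the hypervisor has to emulate the IRQ13 route for a guest that runs with its own CR0.NE clear.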

The requirement to run with CR0.NE set (when using virtualization) applies to all Intel processors. The legacy FPU exception handling does not scale beyond a single CPU, which is why even OS/2 SMP kernels can't use it.

Intel probably plans to completely remove the old-style FPU exception handling in the future since it's not usable for any modern OS (where "modern" includes anything better than DOS, Windows 9x, and OS/2 W4-style kernels).

Anyway, if the PSD is necessary, it's a bug in VirtualBox (which we can't reproduce). Then again, the PSD isn't a bad solution and might actually make things slightly faster because FPU exceptions don't need to be intercepted.

That is kind of what I was thinking too since it works fine on Intel... the one processor I tried on is quite old but the newer one I purchased just a few months ago... it is Piledriver based which I think is the current series originally released at the end of 2012. So I don't think it is a problem with it being an old CPU.

I had successfully tested 4.3.17 (with the W4 kernel and without any PSD) with an Intel dual-Core CPU and NOT with an AMD CPU.
Combining with dbsoft's comments it looks like the problem is fixed for Intel CPUs but apparently not for AMD CPUs.

Here is another Windows test build which contains a fix for AMD hosts. And here is the extpack.

Initial testing seems to show it works, I only tested on the new processor and just commented out the PSD line in the CONFIG.SYS to remove the OS2PCAT mentioned above. I'll test some more to verify the PSD is actually not loading and that it works on the other processor later today. Thanks!

Tested with an image that I did not install the PSD in and also on the older AMD system, and both work correctly! Thanks, looks like it is fixed for AMD now too.