Cog

Can an arranged marriage be a perfect one, or can only a love-match result in the truly perfect union? Is perfection when two hearts beat as one, when Kate submits to Petruchio, or when Mr. accepts that Mrs. is wearing the trousers? Personally I don’t think there’s such a thing as a perfect marriage. All marriages, especially the long-lived ones, take work. There are complex issues to be resolved, and resolving them requires good communication and division of responsibility. And of course a marriage won’t last unless the partners are fundamentally compatible. And so it is with the two senior partners in the Cog JIT VM: the CoInterpreter, older spouse full of experience (the old interpreter, facelifted for the new context, the existing primitive set), and the Cogit, the younger partner, creative tear-away, needing lots of support from their other half, but bringing real energy to the marriage.

But is it prudent, the anxious parents wonder, to arrange a marriage between these two souls in the first place? After all, many JITs don’t have an interpreter. Well, doing without an interpreter means having to JIT all code at all times and this can be very tricky. For example, looking back at the Inline Cacheing post, think what has to happen when a send miss requires the allocation of a Closed PIC but the system runs out of code memory and needs to do a reclamation to free some up. It has to keep hold of the method containing the send site and the old and new target methods while these are moving beneath its feet as the reclamation compacts the code in the machine code zone. Or consider being faced with a huge autogenerated bytecode method whose jitted form may be ten times larger. The former problem is tricky; the latter is often terminal.

Aside: You may ask why restrict the amount of memory used to generate code? One reason is to bound footprint, but a better one is to increase performance; a smaller working set can result in far better instruction cache performance and hence better overall performance even if code reclamations are common. Still better for a dynamic language is that a limited contiguous code space is extremely easy to manage, for example when scanning all send sites in machine code to unlink sends when methods are redefined. Hence the way I’ve always implemented JITs is to compile code in a fixed-size space (the size of which can be determined at startup). It works well.

Further, specific to Cog, the precursor StackInterpreter already had lots of infrastructure I needed to either use directly or carry forward in some way, such as the context-to-stack mapping machinery and the existing primitive set. So a CoInterpreter solves a number of problems. It is already the home for the existing primitive set; it is a fall-back to interpreted execution at times when it would be tricky to continue in machine code (such as during machine code reclamations); and it allows avoiding jitting code in unproductive cases (seldom used or huge methods).

Stacked to Attract

The most fundamental area of compatibility is the stack, and we choose two different formats to allow interpreter and JIT to share the Smalltalk stack, and carefully manage the C stack to allow the Cogit to call on services in the CoInterpreter. If you haven’t already done so you might like to read the post on context to stack mapping that gives an in-depth account of Smalltalk stack management in Cog. In the StackInterpreter there was a single frame format:

the paper on the HiPE (High-Performance Erlang) VM that the implementors decided to keep the interpreter and machine code stacks separate. The paper mentions the apparently many bugs they had keeping the two different kinds of frames on the same stack, that the co-located scheme was abandoned and then goes into some detail explaining the difficulties keeping the two stacks separate imposes. If some thought is given to how the two kinds of frame can interface I contend there is manageable complexity and nothing particularly hairy about the approach and I offer Cog as an existence proof.

The two keys to the interpreter frame format are a) that it support the existing interpreter, hence having much the same layout as the above and b) that an interpreter caller frame can be returned to via a native return instruction executed by a machine code callee frame. Returning from machine code to a specific bytecode in an interpreted method can’t be easily done with a single return pointer (it can be done if one is prepared to synthesize a thunk of machine code for each particular return, but this doesn’t seem very sensible to me). Instead, the native return address can be that of a sequence of code that trampolines control back into the interpreter. We must be able to return to the interpreter from many points; for example in the middle of a process switch primitive, called from machine code that finds that the target process has an interpreted frame for its suspendedContext. Hence we establish a setjmp handler immediately prior to entry into the interpreter and return to the interpreter via longjmp. I’ll come back to cogit ceCaptureCStackPointers later.

CoInterpreter methods for initialization
enterSmalltalkExecutiveImplementation
    "Main entry-point into the interpreter at each execution level, where an execution
     level is either the start of execution or reentry for a callback. Capture the C stack
     pointers so that calls from machine-code into the C run-time occur at this level.
     This is the actual implementation, separated from enterSmalltalkExecutive so the
     simulator can wrap it in an exception handler and hence simulate the setjmp/longjmp."
    <inline: false>
    cogit assertCStackWellAligned.
    cogit ceCaptureCStackPointers.
    "Setjmp for reentry into interpreter from elsewhere, e.g. machine-code trampolines."
    self sigset: reenterInterpreter jmp: 0.
    (self isMachineCodeFrame: framePointer) ifTrue:
        [self returnToExecutive: false postContextSwitch: true
         "NOTREACHED"].
    self setMethod: (self iframeMethod: framePointer).
    instructionPointer = cogit ceReturnToInterpreterPC ifTrue:
        [instructionPointer := self iframeSavedIP: framePointer].
    self assertValidExecutionPointe: instructionPointer r: framePointer s: stackPointer imbar: true.
    self interpret.
    ^0

In Smalltalk the setjmp/longjmp pair is simulated via exception handling:

CoInterpreter methods for cog jit support
sigset: aJumpBuf jmp: sigSaveMask
    "Hack simulation of sigsetjmp/siglongjmp.
     Assign to reenterInterpreter the exception that when
     raised simulates a longjmp back to the interpreter."
    <doNotGenerate>
    reenterInterpreter := ReenterInterpreter new returnValue: 0; yourself.
    ^0

siglong: aJumpBuf jmp: returnValue
    "Hack simulation of sigsetjmp/siglongjmp.
     Signal the exception that simulates a longjmp back to the interpreter."
    <doNotGenerate>
    aJumpBuf == reenterInterpreter ifTrue:
        [self assertValidExecutionPointe: instructionPointer r: framePointer s: stackPointer imbar: true].
    aJumpBuf returnValue: returnValue; signal

CoInterpreter methods for initialization
enterSmalltalkExecutive
    "Main entry-point into the interpreter at each execution level, where an execution
     level is either the start of execution or reentry for a callback."
    <cmacro: '() enterSmalltalkExecutiveImplementation()'>
    "Simulation of the setjmp in enterSmalltalkExecutiveImplementation for reentry into interpreter."
    [([self enterSmalltalkExecutiveImplementation]
        on: ReenterInterpreter
        do: [:ex| ex return: ex returnValue]) = ReturnToInterpreter] whileTrue
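For readers who think in C rather than Smalltalk, the reentry pattern can be sketched roughly as follows. This is only an illustration under stated assumptions: every name in it is a hypothetical stand-in rather than anything from the Cog sources, it uses plain setjmp/longjmp instead of the sigsetjmp/siglongjmp pair, and "machine code" is just a few nested C calls.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins for VM state; none of these names are Cog's. */
static jmp_buf reenter_interpreter;  /* plays the role of reenterInterpreter */
static int interpreter_entries;      /* times the interpreter was (re)entered */
static int trampoline_returns;       /* times "machine code" bounced back */

/* Stand-in for the code a special return address points at: it never
   returns normally, it longjmps back into the interpreter. */
static void return_to_interpreter(void)
{
    trampoline_returns++;
    longjmp(reenter_interpreter, 1);
}

/* Stand-in for machine code and run-time calls, a few C frames deep. */
static void run_machine_code(int depth)
{
    assert(depth >= 0);
    if (depth > 0)
        run_machine_code(depth - 1);
    else
        return_to_interpreter();
}

/* Establish the setjmp handler, then "interpret". Every longjmp lands back
   at the setjmp with the intervening C frames discarded. Answers how many
   times the interpreter was entered. */
int enter_executive(int bounces_wanted)
{
    interpreter_entries = 0;
    trampoline_returns = 0;
    setjmp(reenter_interpreter);
    interpreter_entries++;
    if (trampoline_returns < bounces_wanted)
        run_machine_code(3);
    return interpreter_entries;
}
```

The counters are file-scope variables rather than locals of enter_executive because the values of non-volatile locals modified between a setjmp and the matching longjmp are indeterminate in C.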

We then save the bytecode pc at which to resume interpretation in an additional interpreter frame slot. Hence in Cog an interpreter frame has the following format:

Since with the JIT the VM spends the bulk of its time in machine code and since the JIT can use knowledge such as the method’s argument count at compile-time we can make the machine code frame two slots smaller:

Since machine code methods are aligned on an 8-byte boundary the least significant three bits of the method address are zero and can be used for three flags, two of which are used (had context and is block); see initializeFrameIndices below. The smallest frame, a zero-argument method with no local temps hence looks like

The minimum frame size is important; it defines the maximum number of contexts that may be allocated to flush a stack page to the heap, for which the garbage collector maintains reserve free space. Compared to a typical C frame layout, method and context are extra. The former makes finding the metadata associated with a machine code frame (its argument count etc) trivial since all this is accessible via the machine code method. The context slot is required to allow the Smalltalk-specific mapping of stack frames to contexts. Of course the two formats mean that the CoInterpreter, which naturally inherits from the StackInterpreter, must provide different offsets for the receiver, and in a number of cases must implement two variants of an accessor depending on the frame format:

CoInterpreter class methods for initialization
initializeFrameIndices
    "Format of a stack frame. Word-sized indices relative to the frame pointer.
     Terminology
        Frames are either single (have no context) or married (have a context).
        Contexts are either single (exist on the heap), married (have a context) or widowed (had a frame that has exited).
     Stacks grow down:

     In an interpreter frame
        frame flags holds
            the number of arguments (since argument temporaries are above the frame)
            the flag for a block activation
            and the flag indicating if the context field is valid (whether the frame is married).
        saved method ip holds the saved method ip when the callee frame is a machine code frame.
        This is because the saved method ip is actually the ceReturnToInterpreterTrampoline address.
     In a machine code frame
        the flag indicating if the context is valid is the least significant bit of the method pointer
        the flag for a block activation is the next most significant bit of the method pointer

     Interpreter frames are distinguished from method frames by the method field which will
     be a pointer into the heap for an interpreter frame and a pointer into the method zone for
     a machine code frame.

     The first frame in a stack page is the baseFrame and is marked as such by a saved fp being its stackPage,
     in which case the first word on the stack is the caller context (possibly hybrid) beneath the base frame."

    "For debugging nil out values that differ in the StackInterpreter."
    FrameSlots := nil.
    IFrameSlots := fxCallerSavedIP - fxIFReceiver + 1.
    MFrameSlots := fxCallerSavedIP - fxMFReceiver + 1.

    FoxCallerSavedIP := fxCallerSavedIP * BytesPerWord.
    "In Cog a base frame’s caller context is stored on the first word of the stack page."
    FoxCallerContext := nil.
    FoxSavedFP := fxSavedFP * BytesPerWord.
    FoxMethod := fxMethod * BytesPerWord.
    FoxThisContext := fxThisContext * BytesPerWord.
    FoxFrameFlags := nil.
    FoxIFrameFlags := fxIFrameFlags * BytesPerWord.
    FoxIFSavedIP := fxIFSavedIP * BytesPerWord.
    FoxReceiver := #undeclared asSymbol.
    FoxIFReceiver := fxIFReceiver * BytesPerWord.
    FoxMFReceiver := fxMFReceiver * BytesPerWord.

    "N.B. There is room for one more flag given the current 8 byte alignment of methods (which
     is at least needed to distinguish the checked and unchecked entry points by their alignment)."
    MFMethodFlagHasContextFlag := 1.
    MFMethodFlagIsBlockFlag := 2.
    MFMethodFlagsMask := MFMethodFlagHasContextFlag + MFMethodFlagIsBlockFlag.
    MFMethodMask := (MFMethodFlagsMask + 1) negated

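The flag encoding in the machine code frame's method field can be illustrated with a small C sketch. It assumes only what the listing states (8-byte aligned methods, a has-context flag of 1, an is-block flag of 2); the function and constant names are hypothetical and are not the VM's generated C.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values taken from the listing; the C names are illustrative only. */
enum {
    MF_HAS_CONTEXT = 1,
    MF_IS_BLOCK    = 2,
    MF_FLAGS_MASK  = MF_HAS_CONTEXT | MF_IS_BLOCK
};

/* Same bit pattern as (MFMethodFlagsMask + 1) negated in two's complement. */
#define MF_METHOD_MASK (~(uintptr_t)MF_FLAGS_MASK)

/* Pack a method address and its two flags into one frame slot. */
uintptr_t mframe_method_field(uintptr_t method, int has_context, int is_block)
{
    assert((method & 7) == 0);  /* methods are aligned on 8 bytes */
    return method
         | (has_context ? MF_HAS_CONTEXT : 0)
         | (is_block ? MF_IS_BLOCK : 0);
}

/* Recover the method address by masking off the flag bits. */
uintptr_t mframe_method(uintptr_t field)
{
    return field & MF_METHOD_MASK;
}

int mframe_has_context(uintptr_t field)
{
    return (field & MF_HAS_CONTEXT) != 0;
}

int mframe_is_block(uintptr_t field)
{
    return (field & MF_IS_BLOCK) != 0;
}
```

For example, a method at address 0x1000 in a married, non-block frame would be stored as 0x1001, and masking yields 0x1000 again.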
Returning from the interpreter to machine-code is quite straight-forward; the interpreter can test the return pc and since all machine code pcs are lower than the object heap (another convenient consequence of having a fixed-size code zone) the interpreter simply compares against the heap base (localIP asUnsignedInteger < objectMemory startOfMemory):

    localIP := self frameCallerSavedIP: localFP.
    localSP := localFP + (self frameStackedReceiverOffset: localFP).
    localFP := callersFPOrNull.
    localIP asUnsignedInteger < objectMemory startOfMemory ifTrue:
        [localIP asUnsignedInteger ~= cogit ceReturnToInterpreterPC ifTrue:
            ["localIP in the cog method zone indicates a return to machine code."
             ^self returnToMachineCodeFrame].
         localIP := self pointerForOop: (self iframeSavedIP: localFP)].
    self internalStackTopPut: localReturnValue.
    self setMethod: (self iframeMethod: localFP).
    ^self fetchNextBytecode

returnToMachineCodeFrame
    "Return to the previous context/frame after assigning localIP, localSP and localFP."
    <inline: true>
    cogit assertCStackWellAligned.
    self assert: localIP asUnsignedInteger < objectMemory startOfMemory.
    self assert: (self isMachineCodeFrame: localFP).
    self assertValidExecutionPointe: localIP asUnsignedInteger r: localFP s: localSP imbar: false.
    self internalStackTopPut: localIP.
    self internalPush: localReturnValue.
    self externalizeFPandSP.
    cogit ceEnterCogCodePopReceiverReg
    "NOTREACHED"
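The three-way decision made on the return pc can be summarised in a few lines of C. This is a sketch of the logic only, with hypothetical names and plain integers standing in for addresses; it assumes, as the text says, that the whole code zone (including the return-to-interpreter trampoline) lies below the start of the object heap.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    RETURN_TO_MACHINE_CODE,       /* pc is in the code zone: caller is a machine code frame */
    RESUME_AT_SAVED_BYTECODE_PC,  /* pc is the trampoline: real bytecode pc is in the frame's saved-ip slot */
    RESUME_AT_RETURN_PC           /* pc is already a bytecode pc in the heap */
} return_kind;

/* Classify a caller's saved pc when an interpreted frame returns. */
return_kind classify_return_pc(uintptr_t return_pc,
                               uintptr_t start_of_memory,
                               uintptr_t return_to_interpreter_pc)
{
    /* The trampoline itself lives in the code zone, below the heap. */
    assert(return_to_interpreter_pc < start_of_memory);

    if (return_pc >= start_of_memory)
        return RESUME_AT_RETURN_PC;
    if (return_pc != return_to_interpreter_pc)
        return RETURN_TO_MACHINE_CODE;
    return RESUME_AT_SAVED_BYTECODE_PC;
}
```

A single unsigned comparison separates the common interpreter-to-interpreter return from the two rarer cases, which is the convenience the fixed-size code zone buys.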

The baseFrameReturn phrase is necessary since the Smalltalk stack is actually organized as a set of small (4kb) stack pages, reasons for which are explained in the stack mapping post. Handling a base frame return in machine code is tricky since by the time the return instruction has executed the frame has already been torn down (the frame pointer set to the saved frame pointer, which in the case of a base frame is zero). Base frames are connected to other pages or the rest of the sender context chain via a reference to the sender context (which may itself be a context on the sender stack page). So we must be able to locate the spouse context and sender context of a base frame after the base frame has been torn down. The uppermost two words on a stack page hold these two contexts (<blush>and yes I should define accessors for these</blush>). Again the use of a special return address causes the VM to jump to the code for a machine code base frame return.

CoInterpreter methods for trampolines
ceBaseFrameReturn: returnValue
    "Return across a stack page boundary. The context to return to (which may be married)
     is stored in the first word of the stack. We get here when a return instruction jumps
     to the ceBaseFrameReturn: address that is the return pc for base frames. A consequence
     of this is that the current frame is no longer valid since an interrupt may have
     overwritten its state as soon as the stack pointer has been cut-back beyond the return pc.
     So to have a context to send the cannotReturn: message to we also store the base frame’s
     context in the second word of the stack page."
    <api>
    | contextToReturnTo contextToReturnFrom isAContext thePage newPage frameAbove |
    <var: #thePage type: #'StackPage *'>
    <var: #newPage type: #'StackPage *'>
    <var: #frameAbove type: #'char *'>
    contextToReturnTo := stackPages longAt: stackPage baseAddress.

    "The stack page is effectively free now, so free it. We must free it to be
     correct in determining if contextToReturnTo is still married, and in case
     makeBaseFrameFor: cogs a method, which may cause a code compaction,
     in which case the frame must be free to avoid the relocation machinery
     tracing the dead frame. Since freeing now temporarily violates the page-list
     ordering invariant, use the assert-free version."
    stackPages freeStackPageNoAssert: stackPage.
    isAContext := self isContext: contextToReturnTo.
    (isAContext
     and: [self isStillMarriedContext: contextToReturnTo])
        ifTrue:
            [framePointer := self frameOfMarriedContext: contextToReturnTo.
             thePage := stackPages stackPageFor: framePointer.
             framePointer = thePage headFP
                ifTrue:
                    [stackPointer := thePage headSP]
                ifFalse:
                    ["Returning to some interior frame, presumably because of a sender assignment.
                      Move the frames above to another page (they may be in use, e.g. via coroutining).
                      Make the interior frame the top frame."
                     frameAbove := self findFrameAbove: framePointer inPage: thePage.
                     "Since we’ve just deallocated a page we know that newStackPage won’t deallocate an existing one."
                     newPage := self newStackPage.
                     self assert: newPage = stackPage.
                     self moveFramesIn: thePage through: frameAbove toPage: newPage.
                     stackPages markStackPageMostRecentlyUsed: newPage.
                     framePointer := thePage headFP.
                     stackPointer := thePage headSP]]
        ifFalse:
            [(isAContext
              and: [objectMemory isIntegerObject: (objectMemory fetchPointer: InstructionPointerIndex ofObject: contextToReturnTo)]) ifFalse:
                [contextToReturnFrom := stackPages longAt: stackPage baseAddress - BytesPerWord.
                 self tearDownAndRebuildFrameForCannotReturnBaseFrameReturnFrom: contextToReturnFrom
                    to: contextToReturnTo
                    returnValue: returnValue.
                 ^self externalCannotReturn: returnValue from: contextToReturnFrom].
             "void the instructionPointer to stop it being incorrectly updated in a code
              compaction in makeBaseFrameFor:."
             instructionPointer := 0.
             thePage := self makeBaseFrameFor: contextToReturnTo.
             framePointer := thePage headFP.
             stackPointer := thePage headSP].
    self setStackPageAndLimit: thePage.
    self assert: (stackPages stackPageFor: framePointer) = stackPage.
    (self isMachineCodeFrame: framePointer) ifTrue:
        [self push: returnValue.
         cogit ceEnterCogCodePopReceiverReg.
         "NOTREACHED"].
    instructionPointer := self stackTop.
    instructionPointer = cogit ceReturnToInterpreterPC ifTrue:
        [instructionPointer := self iframeSavedIP: framePointer].
    self setMethod: (self iframeMethod: framePointer).
    self stackTopPut: returnValue. "a.k.a. pop saved ip then push result"
    self assert: (self checkIsStillMarriedContext: contextToReturnTo currentFP: framePointer).
    self siglong: reenterInterpreter jmp: ReturnToInterpreter.
    "NOTREACHED"
    ^nil

So between an interpreter caller and a machine code callee the return address must be ceReturnToInterpreterTrampoline and the caller’s iframeSavedIP: must be valid, whereas between a machine-code caller and an interpreted callee the machine code return address is fine. Similarly, the convention for the top frame on other than the active stack page is that it always contain a machine-code pc; ceReturnToInterpreter for an interpreted frame and the machine code return address for a machine code frame. The return pc for a base frame is ceBaseFrameReturnTrampoline.

Managing the C Stack

So far we’ve seen how trampolines are used to jump from machine-code to C, and how setjmp/longjmp can be used to get back into the interpreter from any point. But how do we manage the C stack so that we can call into the run-time and know that the C frame containing the setjmp is valid? Since the C stack is contiguous and grows in one direction (typically down), at least on the OSs Cog runs on today (Windows, Mac OS X, Linux), all that’s needed is to record the C stack in enterSmalltalkExecutiveImplementation via cogit ceCaptureCStackPointers. Any jump into machine code will occur from a point further down the C stack. Any jump back to the C stack discards those intervening frames, cutting the C stack back to the activation of enterSmalltalkExecutiveImplementation. What we mustn’t do is capture the C stack pointers every time we jump into machine-code. That could cause uncontrolled stack growth.

Cogit methods for initialization
generateStackPointerCapture
    "Generate a routine ceCaptureCStackPointers that will capture the C stack pointer,
     and, if it is in use, the C frame pointer. These are used in trampolines to call
     run-time routines in the interpreter from machine-code."

generateCaptureCStackPointers: captureFramePointer
    "Generate the routine that writes the current values of the C frame and stack pointers into
     variables. These are used to establish the C stack in trampolines back into the C run-time.

     This is a presumptuous quick hack for x86. It is presumptuous for two reasons. Firstly
     the system’s frame and stack pointers may differ from those we use in generated code,
     e.g. on register-rich RISCs. Secondly the ABI may not support a simple frameless call
     as written here (for example 128-bit stack alignment on Mac OS X)."
    | startAddress |
    <inline: false>
    self allocateOpcodes: 32 bytecodes: 0.
    initialPC := 0.
    endPC := numAbstractOpcodes - 1.
    startAddress := methodZoneBase.
    captureFramePointer ifTrue:
        [self MoveR: FPReg Aw: self cFramePointerAddress].
    "Capture the stack pointer prior to the call."
    backEnd leafCallStackPointerDelta = 0
        ifTrue: [self MoveR: SPReg Aw: self cStackPointerAddress]
        ifFalse: [self MoveR: SPReg R: TempReg.
                  self AddCq: backEnd leafCallStackPointerDelta R: TempReg.
                  self MoveR: TempReg Aw: self cStackPointerAddress].
    self RetN: 0.
    self outputInstructionsForGeneratedRuntimeAt: startAddress.
    self recordGeneratedRunTime: 'ceCaptureCStackPointers' address: startAddress.
    ceCaptureCStackPointers := self cCoerceSimple: startAddress to: #'void (*)(void)'

isCFramePointerInUse
    <doNotGenerate>
    "This should be implemented externally, e.g. in sqPlatMain.c."
    ^true

and from e.g. platforms/Mac OS/vm/sqMacMain.c

#if COGVM
/*
 * Support code for Cog.
 * a) Answer whether the C frame pointer is in use, for capture of the C stack
 *    pointers.
 */
# if defined(i386) || defined(__i386) || defined(__i386__)
/*
 * Cog has already captured CStackPointer before calling this routine. Record
 * the original value, capture the pointers again and determine if CFramePointer
 * lies between the two stack pointers and hence is likely in use. This is
 * necessary since optimizing C compilers for x86 may use %ebp as a general-
 * purpose register, in which case it must not be captured.
 */
int
isCFramePointerInUse()
{
    extern unsigned long CStackPointer, CFramePointer;
    extern void (*ceCaptureCStackPointers)(void);
    unsigned long currentCSP = CStackPointer;
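Why the pointers must be captured once per execution level, and not on every jump into machine code, can be seen in a toy model. The sketch below does not touch the real C stack; it merely counts frames, and all of its names and the idea of measuring depth in "frames" are assumptions made for illustration.

```c
#include <assert.h>

/* A toy model of the C stack: depths are counted in frames, not bytes. */
typedef struct {
    long captured;    /* depth recorded by the capture routine */
    long depth;       /* current depth */
    long high_water;  /* deepest point reached so far */
} c_stack_model;

/* One round trip: machine code calls a run-time routine through a
   trampoline (which switches to the captured C stack), the routine builds
   runtime_frames frames, then jumps back into machine code. */
static void round_trip(c_stack_model *s, long runtime_frames, int recapture_on_entry)
{
    s->depth = s->captured;        /* trampoline cuts the C stack back */
    s->depth += runtime_frames;    /* run-time routine runs in C */
    if (s->depth > s->high_water)
        s->high_water = s->depth;
    if (recapture_on_entry)
        s->captured = s->depth;    /* the mistake: capture at every entry */
}

/* Answer the deepest C stack depth seen over a number of round trips. */
long c_stack_high_water(long round_trips, long runtime_frames, int recapture_on_entry)
{
    c_stack_model s = {0, 0, 0};
    assert(round_trips >= 0 && runtime_frames >= 0);
    for (long i = 0; i < round_trips; i++)
        round_trip(&s, runtime_frames, recapture_on_entry);
    return s.high_water;
}
```

With a single capture the high-water mark stays at the depth of one run-time call however many round trips occur; recapturing on every entry makes it grow linearly with the number of round trips.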

The C stack needs to be captured and restored on callbacks. Let’s follow this through assuming a call to the FFI primitive from machine-code. First of all the machine-code method calls the interpreter’s FFI primitive, using the current values of CFramePointer and CStackPointer as established in a previous enterSmalltalkExecutiveImplementation invocation. The primitive then marshalls arguments to the C stack and calls out to external code. If that code calls-back into the VM the C stack between the FFI call-out and the call-back must be preserved for the VM to return from the call-back. When the callback enters the VM it must do so via enterSmalltalkExecutiveImplementation, and set-up a new reenterInterpreter jmpbuf for jumping into the interpreter at this level. So callbackEnter: must both save and restore the C stack pointers and the reenterInterpreter jmpbuf.

    "Suspend the currently active process"
    suspendedCallbacks at: jmpDepth put: self activeProcess.
    "We need to preserve newMethod explicitly since it is not activated yet
     and therefore no context has been created for it. If the caller primitive
     for any reason decides to fail we need to make sure we execute the correct
     method and not the one 'last used' in the call back"
    suspendedMethods at: jmpDepth put: newMethod.
    self transferTo: self wakeHighestPriority from: CSCallbackLeave.

    "Typically, invoking the callback means that some semaphore has been
     signaled to indicate the callback. Force an interrupt check as soon as possible."
    self forceInterruptCheck.

    "Restore the previous CStackPointers and interpreter entry jmp_buf."
    cogit setCStackPointer: currentCStackPointer.
    cogit setCFramePointer: currentCFramePointer.
    self mem: reenterInterpreter
        cp: (self cCoerceSimple: savedReenterInterpreter to: #'void *')
        y: (self sizeof: #'jmp_buf' asSymbol).

    "Transfer back to the previous process so that caller can push result"
    self putToSleep: self activeProcess yieldingIf: preemptionYields.
    self transferTo: (suspendedCallbacks at: jmpDepth) from: CSCallbackLeave.
    newMethod := suspendedMethods at: jmpDepth. "see comment above"
    argumentCount := self argumentCountOf: newMethod.
    self assert: wasInMachineCode = (self isMachineCodeFrame: framePointer).
    calledFromMachineCode
        ifTrue:
            [instructionPointer >= objectMemory startOfMemory ifTrue:
                [self iframeSavedIP: framePointer put: instructionPointer.
                 instructionPointer := cogit ceReturnToInterpreterPC]]
        ifFalse:
            ["Even if the context was flushed to the heap and rebuilt in transferTo:from:
              above it will remain an interpreted frame because the context’s pc would
              remain a bytecode pc. So the instructionPointer must also be a bytecode pc."
             self assert: (self isMachineCodeFrame: framePointer) not.
             self assert: instructionPointer > objectMemory startOfMemory].
    self assert: primFailCode = 0.
    jmpDepth := jmpDepth - 1.
    ^true

When the return from the callback is done, longjmp-ing back to the setjmp, the C stack and the reenterInterpreter jmpbuf set up in enterSmalltalkExecutiveImplementation are cut back and restored to their previous values.
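The save-and-restore discipline for nested execution levels can be sketched in C as below. This is an illustration under assumptions, not the VM's code: all names are hypothetical, each "level" simply longjmps to its own setjmp once, and copying a jmp_buf with memcpy (which the listing above also does) works on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

#define MAX_LEVELS 8

/* Hypothetical stand-ins; none of these names are Cog's. */
static jmp_buf reenter_interpreter;    /* reentry point of the current level */
static int current_level;
static int reentry_order[MAX_LEVELS];  /* levels, in the order they were re-entered */
static int reentry_count;

static void executive(int level, int max_level);

/* Enter a nested level on behalf of a callback, preserving the outer
   level's reentry point across it. */
static void callback_enter(int max_level)
{
    jmp_buf saved_reenter;
    int saved_level = current_level;

    memcpy(saved_reenter, reenter_interpreter, sizeof(jmp_buf));
    executive(saved_level + 1, max_level);  /* nested level runs, then returns */
    current_level = saved_level;
    memcpy(reenter_interpreter, saved_reenter, sizeof(jmp_buf));
}

/* One execution level: set the reentry point, optionally nest a callback,
   then jump back to this level's own reentry point exactly once. */
static void executive(int level, int max_level)
{
    current_level = level;
    if (setjmp(reenter_interpreter) != 0) {
        reentry_order[reentry_count++] = level;
        return;
    }
    if (level < max_level)
        callback_enter(max_level);
    /* Without the restore in callback_enter this would target the setjmp
       of a level whose C frame has already exited. */
    longjmp(reenter_interpreter, 1);
}

/* Run levels 0..max_level nested inside one another and report the order
   in which their reentry points were reached. Answers the count. */
int run_nested_callbacks(int max_level, int *order, int capacity)
{
    assert(max_level >= 0 && max_level < MAX_LEVELS);
    reentry_count = 0;
    executive(0, max_level);
    for (int i = 0; i < reentry_count && i < capacity; i++)
        order[i] = reentry_order[i];
    return reentry_count;
}
```

Each level's longjmp reaches that level's own setjmp, innermost first, which is only true because the outer jmp_buf is put back before control returns to the outer level.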

Keeping Up Appearances

Of course while the relationship between CoInterpreter and Cogit has to work in private (i.e. when compiled to C and executed in use) it also has to put on a good show in social settings. When being developed, simulated machine code must also be able to gain intimate access to the CoInterpreter’s internals. This is on the one hand about faithful simulation, and on the other hand about the difference between, in the production vm, machine code calling into C code, and, in the simulator, turning a machine code call into a Smalltalk message send to a given receiver. This needs to be done when simulating run-time calls, and so we may as well use the same mechanism for accessing CoInterpreter variables as well. This of course takes us into the heart of the machine-code simulator, and I think this bit is particularly interesting.

A simple way of breaking-out of machine code is simply to dole out illegal addresses for variables and run-time routines:

Cogit methods for initialization
simulatedAddressFor: anObject
    "Answer a simulated address for a block or a symbol. This is an address that
     can be called, read or written by generated machine code, and will be mapped
     into a Smalltalk message send or block evaluation."
    <doNotGenerate>
    ^simulatedAddresses
        at: anObject
        ifAbsentPut: [(simulatedAddresses size + 101 * BytesPerWord) negated bitAnd: self addressSpaceMask]

simulatedVariableAddress: getter in: receiver
    "Answer a simulated variable. This is a variable whose value can be read
     by generated machine code."
    <doNotGenerate>
    | address |
    address := self simulatedAddressFor: getter.
    simulatedVariableGetters
        at: address
        ifAbsentPut: [MessageSend receiver: receiver selector: getter].
    ^address

simulatedReadWriteVariableAddress: getter in: receiver
    "Answer a simulated variable. This is a variable whose value can be read
     and written by generated machine code."
    <doNotGenerate>
    | address |
    address := self simulatedVariableAddress: getter in: receiver.
    simulatedVariableSetters
        at: address
        ifAbsentPut:
            [| setter |
             setter := (getter, ':') asSymbol.
             [:value| receiver perform: setter with: value]].
    ^address

Cogit methods for trampoline support
genLoadStackPointers
    "Switch back to the Smalltalk stack. Assign SPReg first
     because typically it is used immediately afterwards."
    self MoveAw: coInterpreter stackPointerAddress R: SPReg.
    self MoveAw: coInterpreter framePointerAddress R: FPReg.
    ^0
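The illegal-address trick translates readily to C. The sketch below is an approximation with hypothetical names: it assumes a 32-bit address space and 4-byte words, and it recovers the table index arithmetically where the Smalltalk code uses dictionaries keyed by address. Note that Smalltalk evaluates binary operators strictly left to right, so size + 101 * BytesPerWord means (size + 101) * BytesPerWord.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_WORD 4u
#define FIRST_SLOT 101u      /* offset used in the listing */
#define MAX_SIMULATED 16

typedef long (*runtime_routine)(long arg);

static runtime_routine simulated_routines[MAX_SIMULATED];
static size_t simulated_count;

/* Address for the index-th registration, counting down from the top of a
   32-bit address space so it cannot collide with simulated memory. */
uint32_t simulated_address_at(size_t index)
{
    return (uint32_t)0 - (uint32_t)((index + FIRST_SLOT) * BYTES_PER_WORD);
}

/* Register a routine and answer the illegal address that stands for it. */
uint32_t register_simulated_routine(runtime_routine routine)
{
    assert(simulated_count < MAX_SIMULATED);
    simulated_routines[simulated_count] = routine;
    return simulated_address_at(simulated_count++);
}

/* What a trap handler might do when the processor simulator reports a
   call to an illegal address: map it back to the routine and run it.
   Answers 1 on success, 0 if the address is not a registered one. */
int dispatch_simulated_call(uint32_t address, long arg, long *result)
{
    uint32_t offset = (uint32_t)0 - address;  /* (index + 101) * 4 */
    size_t slot;

    if (offset % BYTES_PER_WORD != 0)
        return 0;
    slot = offset / BYTES_PER_WORD;
    if (slot < FIRST_SLOT || slot - FIRST_SLOT >= simulated_count)
        return 0;
    *result = simulated_routines[slot - FIRST_SLOT](arg);
    return 1;
}

/* A trivial routine to register in examples. */
long example_routine(long arg)
{
    return arg + 1;
}
```

Under these assumptions the first registered routine receives address 0xFFFFFE6C, far above any memory the simulated processor is allowed to touch, so a call to it traps and can be redirected.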

Up in Smalltalk the primitive failure gets turned into a suitable exception:

BochsIA32Alien methods for primitives
primitiveRunInMemory: memoryArray "<Bitmap|ByteArray>" minimumAddress: minimumAddress "<Integer>" readOnlyBelow: minimumWritableAddress "<Integer>"
    "Run the receiver using the argument as the store. Origin the argument at 0. i.e. the first byte of the
     memoryArray is address 0. Make addresses below minimumAddress illegal. Convert out-of-range
     call, jump and memory read/writes into register instructions into ProcessorSimulationTrap signals."
    <primitive: 'primitiveRunInMemoryMinimumAddressReadWrite' module: 'BochsIA32Plugin' error: ec>
    ^ec == #'inappropriate operation'
        ifTrue: [self handleExecutionPrimitiveFailureIn: memoryArray
                    minimumAddress: minimumAddress
                    readOnlyBelow: minimumWritableAddress]
        ifFalse: [self reportPrimitiveFailure]

BochsIA32Alien methods for error handling
handleCallFailureAt: pc "<Integer>" in: memoryArray "<Bitmap|ByteArray>" readOnlyBelow: minimumWritableAddress "<Integer>"
    "Convert an execution primitive failure for a call into a ProcessorSimulationTrap signal."
    | relativeJump |
    relativeJump := memoryArray longAt: pc + 2 bigEndian: false.
    ^(ProcessorSimulationTrap
            pc: pc
            nextpc: pc + 5
            address: (pc + 5 + relativeJump) signedIntToLong
            type: #call)
        signal

handleMovGvEvFailureAt: pc "<Integer>" in: memoryArray "<Bitmap|ByteArray>" readOnlyBelow: minimumWritableAddress "<Integer>"
    "Convert an execution primitive failure for a register load into a ProcessorSimulationTrap signal."
    | modrmByte |
    ^(((modrmByte := memoryArray byteAt: pc + 2) bitAnd: 16rC7) = 16r5) "ModRegInd & disp32"
        ifTrue:
            [(ProcessorSimulationTrap
                    pc: pc
                    nextpc: pc + 6
                    address: (memoryArray unsignedLongAt: pc + 3 bigEndian: false)
                    type: #read
                    accessor: (#(eax: ecx: edx: ebx: esp: ebp: esi: edi:) at: ((modrmByte >> 3 bitAnd: 7) + 1)))
                signal]
        ifFalse:
            [self reportPrimitiveFailure]
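The instruction decoding these two handlers perform can be mirrored in C. The sketch below uses hypothetical names and zero-based byte offsets (the Smalltalk accessors are one-based, hence their pc + 2 and pc + 3), and it checks the opcode bytes itself (0xE8 for CALL rel32, 0x8B for MOV r32, r/m32), which the handlers above leave to their caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative record of a trapped access; field names are assumptions. */
typedef struct {
    uint32_t pc;       /* address of the trapping instruction */
    uint32_t next_pc;  /* address of the following instruction */
    uint32_t address;  /* illegal address being called or read */
    int reg;           /* destination register number, or -1 for a call */
} simulation_trap;

/* Read a little-endian 32-bit value at a zero-based offset. */
static uint32_t read_le32(const uint8_t *memory, size_t at)
{
    return (uint32_t)memory[at]
         | (uint32_t)memory[at + 1] << 8
         | (uint32_t)memory[at + 2] << 16
         | (uint32_t)memory[at + 3] << 24;
}

/* CALL rel32: opcode 0xE8, then a displacement relative to the next
   instruction. Answers 1 and fills in trap if pc holds such a call. */
int decode_call_rel32(const uint8_t *memory, size_t size, uint32_t pc, simulation_trap *trap)
{
    if (pc >= size || size - pc < 5 || memory[pc] != 0xE8)
        return 0;
    trap->pc = pc;
    trap->next_pc = pc + 5;
    trap->address = pc + 5 + read_le32(memory, pc + 1);  /* wraps modulo 2^32 */
    trap->reg = -1;
    return 1;
}

/* MOV r32, [disp32]: opcode 0x8B, a ModRM byte with mod = 00 and
   r/m = 101, then a 32-bit absolute address. */
int decode_mov_load_disp32(const uint8_t *memory, size_t size, uint32_t pc, simulation_trap *trap)
{
    uint8_t modrm;

    if (pc >= size || size - pc < 6 || memory[pc] != 0x8B)
        return 0;
    modrm = memory[pc + 1];
    if ((modrm & 0xC7) != 0x05)
        return 0;
    trap->pc = pc;
    trap->next_pc = pc + 6;
    trap->address = read_le32(memory, pc + 2);
    trap->reg = (modrm >> 3) & 7;  /* 0 = eax, 1 = ecx, ... 7 = edi */
    return 1;
}
```

The register number extracted from the ModRM byte plays the role of the accessor selector in the Smalltalk version, telling the trap handler which simulated register should receive the variable's value.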

The exception handler for ProcessorSimulationTraps is in the Cogit’s core machine-code simulation entry-point. I’m going to include it warts and all, since I think there’s lots of interest here. The exception handler is almost right at the end.

One of the first things to notice is that it’s peppered with useful expressions I can evaluate when in the debugger. The second thing to notice is that when single-stepping this is like the In-Circuit-Emulator from the 22nd century. You can define just about any break-point function you can imagine and assign it to breakBlock, and it’ll be evaluated every instruction. The single-stepper can capture the last N instructions and register states so that one can look back at the instructions executed immediately before an error. It will print individual instructions and the registers to the transcript. It makes machine-code debugging verge on the enjoyable, and certainly leaves things like gdb in the dust.

handleCallOrJumpSimulationTrap: aProcessorSimulationTrap
    <doNotGenerate>
    | evaluable function result savedFramePointer savedStackPointer savedArgumentCount rpc |
    evaluable := simulatedTrampolines at: aProcessorSimulationTrap address.
    function := evaluable isBlock
                    ifTrue: ['aBlock; probably some plugin primitive']
                    ifFalse: [evaluable selector].
    function ~~ #ceBaseFrameReturn: ifTrue:
        [coInterpreter assertValidExternalStackPointers].
    (function beginsWith: 'ceShort') ifTrue:
        [^self perform: function with: aProcessorSimulationTrap].
    aProcessorSimulationTrap type = #call
        ifTrue:
            [processor
                simulateCallOf: aProcessorSimulationTrap address
                nextpc: aProcessorSimulationTrap nextpc
                memory: coInterpreter memory.
             self recordInstruction: {'(simulated call of '. aProcessorSimulationTrap address. '/'. function. ')'}]
        ifFalse:
            [processor
                simulateJumpCallOf: aProcessorSimulationTrap address
                memory: coInterpreter memory.
             self recordInstruction: {'(simulated jump to '. aProcessorSimulationTrap address. '/'. function. ')'}].
    savedFramePointer := coInterpreter framePointer.
    savedStackPointer := coInterpreter stackPointer.
    savedArgumentCount := coInterpreter argumentCount.
    result := ["self halt: evaluable selector."
               evaluable valueWithArguments: (processor
                                                postCallArgumentsNumArgs: evaluable numArgs
                                                in: coInterpreter memory)]
                on: ReenterMachineCode
                do: [:ex| ex return: ex returnValue].
    coInterpreter assertValidExternalStackPointers.
    "Verify the stack layout assumption compileInterpreterPrimitive: makes, provided we’ve
     not called something that has built a frame, such as closure value or evaluate method, or
     switched frames, such as primitiveSignal, primitiveWait, primitiveResume, primitiveSuspend et al."
    (function beginsWith: 'primitive') ifTrue:
        [coInterpreter primFailCode = 0
            ifTrue:
                [(#(primitiveClosureValue primitiveClosureValueWithArgs primitiveClosureValueNoContextSwitch
                    primitiveSignal primitiveWait primitiveResume primitiveSuspend primitiveYield
                    primitiveExecuteMethodArgsArray primitiveExecuteMethod
                    primitivePerform primitivePerformWithArgs primitivePerformInSuperclass
                    primitiveTerminateTo primitiveStoreStackp primitiveDoPrimitiveWithArgs)
                        includes: function) ifFalse:
                    [self assert: savedFramePointer = coInterpreter framePointer.
                     self assert: savedStackPointer + (savedArgumentCount * BytesPerWord)
                            = coInterpreter stackPointer]]
            ifFalse:
                [self assert: savedFramePointer = coInterpreter framePointer.
                 self assert: savedStackPointer = coInterpreter stackPointer]].
    result ~~ #continueNoReturn ifTrue:
        [self recordInstruction: {'(simulated return to '. processor retpcIn: coInterpreter memory. ')'}.
         rpc := processor retpcIn: coInterpreter memory.
         self assert: (rpc >= codeBase and: [rpc < methodZone zoneLimit]).
         processor
            smashCallerSavedRegistersWithValuesFrom: 16r80000000 by: BytesPerWord;
            simulateReturnIn: coInterpreter memory].
    self assert: (result isInteger "an oop result"
            or: [result == coInterpreter
            or: [result == objectMemory
            or: [#(nil continue continueNoReturn) includes: result]]]).
    processor cResultRegister: (result
                                    ifNil: [0]
                                    ifNotNil: [result isInteger
                                                ifTrue: [result]
                                                ifFalse: [16rF00BA222]])

Now handleCallOrJumpSimulationTrap: is anything but simple, but it does a lot. Its main complication comes from assert-checking. Most of the time when the simulator sends a message to the CoInterpreter (e.g. that of a run-time routine) we expect that the stack frame has not changed when we return. But for a particular set of primitives and run-time routines that isn’t so and hence the stack assertions must be avoided in their case. On return to machine code any caller-saved registers could contain arbitrary values, and on reentry to machine code any and all registers could contain arbitrary values so we smash them to avoid values persisting in the simulated processor from the time of a simulated call. This is also one place where the simulation of enilopmarts, jumps back into machine-code, has to be handled specially to avoid uncontrolled Smalltalk stack growth, hence the signal of ReenterMachineCode.

Cogit methods for simulation only
simulateEnilopmart: enilopmartAddress numArgs: n
    <doNotGenerate>
    "Enter Cog code, popping the class reg and receiver from the stack
     and then returning to the address beneath them.
     In the actual VM the enilopmart is a function pointer and so senders
     of this method end up calling the enilopmart to enter machine code.
     In simulation we either need to start simulating execution (if we’re in
     the interpreter) or return to the simulation (if we’re in the run-time
     called from machine code). We should also smash the register state
     since, being an abnormal entry, no saved registers will be restored."
    self assert: (coInterpreter isOnRumpCStack: processor sp).
    self assert: ((coInterpreter stackValue: n) between: guardPageSize and: methodZone zoneLimit - 1).
    (printInstructions or: [printRegisters]) ifTrue:
        [coInterpreter printExternalHeadFrame].
    processor
        smashRegistersWithValuesFrom: 16r80000000 by: BytesPerWord;
        simulateLeafCallOf: enilopmartAddress
        nextpc: 16rBADF00D
        memory: coInterpreter memory.
    "If we’re already simulating in the context of machine code then
     this will take us back to handleCallSimulationTrap:. Otherwise
     start executing machine code in the simulator."
    (ReenterMachineCode new returnValue: #continueNoReturn) signal.
    self simulateCogCodeAt: enilopmartAddress.
    "We should either longjmp back to the interpreter or
     stay in machine code so control should not reach here."
    self assert: false

With the above machinery machine code can freely access variables and methods of the written-in-Smalltalk simulated VM. So for example, here’s the trampoline that calls ceReturnToInterpreter:

To save time when simulating I hold CStackPointer and CFramePointer in the byte array that holds the entire heap. But framePointer & stackPointer, instance variables of the CoInterpreter, and ceReturnToInterpreter:, a method in the CoInterpreter, all have illegal addresses and get accessed via the ProcessorSimulationTrap exception machinery above. The end result is that in the simulator I generate exactly equivalent machine code to that executed by the production generated-to-C VM and I’m pretty confident that this approach has resulted in far fewer code generation bugs in unsimulated code than otherwise. Almost all machine code generation bugs show up in the simulator, and as you can imagine it’s quite a powerful development tool.

So an arranged marriage, certainly, with some complex communications issues, but a successful one I think.