X86/ARM Emulator

X86/ARM emulator written using C++ and assembler for the .NET environment.

Introduction

A quick search reveals that the WWW holds a vast array of material and tools classed as emulators, virtual machines and interpreters. These range from the highly sophisticated to simple academic exercises. If you narrow your search and look for X86 assembler emulators, you will find commercial DOS emulators, Basic language tools, Java utilities such as Jasmin, and a CodeProject article, ASM.net X86, amongst others. Another large class of related tools is standalone X86 debuggers and disassemblers, which can be used both to step through code fragments and to examine generated machine code statically or on the fly. This article offers something different: a Visual Studio Express based solution, written using C++ and assembler, that allows the user to write and execute X86, X86/64 and ARM assembler line by line as easily as using a text editor. As code is only emulated, errors are trapped and reported within the program without any wider impact.

Updates

The first version was a partial implementation for X86 code only. An updated set of solution files is now available via the DropBox link below that implements some ARM code, specifically the data processing sub-set of instructions. ARM operand syntax requires r1-r15 for registers, and numeric immediates must start with # (base 10) or #& (base 16). The Condition Code defaults to AL (Always) and Set Flags defaults to Not Set (blank).

Examples:

Label    Mnemonic  Cc  SetFlg  Rd  Rn  Rm/imm  Shift
label1:  add       al  s       r1  r2  #456
label2:  and       eq          r0  r1  r4
label3:  sub       al  s       r4  r2  r1      lsl#4
         adc       al          r3  r2  r4      ror#2
label4:  eor       al  s       r0  r1  #&7FF
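
The immediate syntax described above (# for base 10, #& for base 16) can be illustrated with a minimal C++ sketch. The function name parseImmediate and its signature are hypothetical and are not taken from the project source; the sketch only checks the text form and does not test whether the value can be encoded in an ARM instruction.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the project source): parse "#456" (base 10)
// or "#&7FF" (base 16). Returns false if the text matches neither form or
// the value does not fit in 32 bits.
bool parseImmediate(const std::string& text, std::uint32_t& out) {
    if (text.size() < 2 || text[0] != '#') return false;
    std::size_t pos = 1;
    int base = 10;
    if (text[pos] == '&') {          // "#&" selects hexadecimal
        base = 16;
        ++pos;
    }
    if (pos == text.size()) return false;   // no digits after the prefix
    std::uint64_t value = 0;
    for (; pos < text.size(); ++pos) {
        unsigned char c = static_cast<unsigned char>(text[pos]);
        int digit = 0;
        if (std::isdigit(c)) {
            digit = c - '0';
        } else if (base == 16 && std::isxdigit(c)) {
            digit = std::tolower(c) - 'a' + 10;
        } else {
            return false;            // character not valid in this base
        }
        value = value * base + digit;
        if (value > 0xFFFFFFFFu) return false;  // overflow of 32 bits
    }
    out = static_cast<std::uint32_t>(value);
    return true;
}
```

With this sketch, "#456" yields 456 and "#&7FF" yields 0x7FF, while text without the leading # is rejected.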

Background

This article is not aimed at expert readers or professional application, but rather at those seeking to develop their knowledge of C++ and assembler used in the .NET environment, or who like the idea of such a tool for some anorak fun. For convenience, I have chosen to use and provide materials based on Visual Studio Express 2012 running under Windows 7 64 bit (executables are X86 32 bit). My objective at the outset was to learn how to use Visual C++ to write multiform applications, to learn X86 assembler, and to learn how to combine the two to produce a single executable. Along the way I experimented with inline assembler, separate assembler modules, ARM assembler, and X64 assembler.

At this stage I am not presenting a finished project, but work in progress that others might like to experiment with or even develop. Under Windows 7 the basic framework executes reliably and the X86 virtual machine is fully functional for a core set of instructions (listed in the Help menu). The ARM and X64 elements are just illustrative at this stage, and X86 floating point is not yet implemented. I have utilized Windows Forms extensively, written classes and threading code, and experimented with some of the C standard library. I have looked at new material such as the Atomic Class, but have been prevented from quick adoption by issues arising from using a mixture of managed and native code. I have even experimented with linking to FORTRAN modules as a basis for interpreting X86 floating point instructions, with binary files for data exchange, but effective use of FORTRAN requires close integration with the .NET environment. Whilst such tools and compilers exist (Silverfrost FTN95, etc.), I do not have access to the full commercial offerings and associated development environments to trial such ideas.

If your interest is bog standard object orientated development, or you are a professional C++ or assembler programmer, look away. I have not been a professional programmer since the late 1970s, at which time I used ICL System 4 assembler and Fortran IV. I have had a lifetime in ICT, but you soon have to leave the hands-on technical work behind. C++ is the perfect vehicle for me; it can be used and abused in many guises. I am retired so I can play, but my real aim is continuous learning.

I have learnt a great deal from hands-on coding of this project. Professional Windows programmers may have no difficulty understanding the differences between Windows reference classes and classes, or the details of how header files for Windows Forms operate within Visual C++, or the issues arising from the need for static, global data sharing between assembler and multiple C++ forms. I have steered clear of a rigorous parent child form hierarchy, the niceties of professional parser and compiler writing, and strict object orientated development. I have even utilised the odd GOTO. My FORTRAN roots operate like weeds. If I were starting now, I would utilise a more hierarchical approach for header files and classes, with inheritance and conditional compilation to avoid the problems of duplicated definitions and to ease code reuse.

The Idea

My initial attempts at learning X86 assembler using MASM32 and similar tools were hampered by a lack of understanding of the low level Windows environment, the limitations of input/output using Windows Console, and the fact that assembler mistakes could, at least pre-Windows 7, result in critical errors that would crash the PC rather than just the program being tested. As a learning exercise I decided to write an X86 assembly language interpreter that could be executed as a standard Windows Form application. C++, with its capability for everything from object orientated usage to low level code, seemed the perfect vehicle for this project. A critical requirement was the ability to mix managed code with native code, and to use assembler. I toyed with Java, having looked at projects such as Jasmin, but I decided that the Visual Studio environment was less challenging than using Eclipse or Netbeans to develop a GUI application using Windows and assembler in conjunction with Java. I appreciate that I could have set my objective as a purely Java X86 assembly language interpreter, but that did not excite me. Usually in software development the lack of a stable specification against which to produce a finished 'product' is a challenge, if not a cause of failure. However, my first goal has been to produce a software framework that could be used as a 'test harness' for (my) learning, not just of C++ but for the target systems being emulated. The notion is similar to the prototypes often employed within business analysis to gauge the merits of competing technical solutions before real development work starts.

Pre-Requisites

I am currently developing this project using Visual Studio 2012 Express, running the executable directly from that environment. The executable is also capable of being run directly if you install the latest version of the Visual Studio 2012 X86 runtime, which can be downloaded free from Microsoft. My PC environment is Windows 7 64 bit, but I have also successfully run this project under Windows 8 (using VS2012 Express for Desktop).

Known Issues

When executing under VS2012, select Build and then 'Start without debugging' when the build completes. A variety of warning messages will be observed, mostly related to the use of native code and compiler command line options. If you opt for 'Start debugging' the application does not always run reliably. Very occasionally, and apparently randomly, I have experienced a generic GDI+ error during execution. If it occurs, execution can be 'continued', or you can simply restart the application. This is not obviously a GDI objects overflow problem, but nor is it straightforward to diagnose the actual cause.

Limitations

The X86 and X64 machines operate to a flat memory model and 32 bit and 64 bit operation are completely discrete. Segment registers are included but initialised to zero. Absolute addresses are calculated from the offset only, without reference to the segment registers. For the 32 bit emulation, data, code and stack base addresses are preconfigured as 0x00, 0x0600 and 0x0F00 respectively, with 0x01000 set as top of memory. Values in memory are held Little Endian. In X86 mode, only 32 bit instructions are allowed, excepting the use of 16 bit registers as defined for shift instructions, etc. Permitted instructions and operand types can be found in the Help menus for each machine. Interrupts, calling conventions, and call stacks are not implemented.
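
As an illustration of the flat, Little Endian memory described above, here is a minimal C++ sketch. The base addresses are the ones quoted in the text; the FlatMemory type, its member names, and the reading of 0x01000 as the total memory size are assumptions made for the example and are not taken from the project source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Layout values quoted in the text for the 32 bit emulation.
constexpr std::uint32_t kDataBase  = 0x0000;
constexpr std::uint32_t kCodeBase  = 0x0600;
constexpr std::uint32_t kStackBase = 0x0F00;
constexpr std::uint32_t kMemTop    = 0x1000;  // assumed to be the memory size

// Hypothetical flat memory: one byte array addressed by offset only,
// with no segment register arithmetic.
struct FlatMemory {
    std::array<std::uint8_t, kMemTop> bytes{};

    // Store a 32 bit value least significant byte first (Little Endian).
    // Returns false instead of writing past the top of memory.
    bool store32(std::uint32_t addr, std::uint32_t value) {
        if (addr > kMemTop - 4) return false;
        for (int i = 0; i < 4; ++i)
            bytes[addr + i] = static_cast<std::uint8_t>(value >> (8 * i));
        return true;
    }

    // Reassemble a 32 bit value from four consecutive bytes.
    bool load32(std::uint32_t addr, std::uint32_t& value) const {
        if (addr > kMemTop - 4) return false;
        value = 0;
        for (int i = 0; i < 4; ++i)
            value |= static_cast<std::uint32_t>(bytes[addr + i]) << (8 * i);
        return true;
    }
};
```

Storing 0x12345678 at kDataBase in this sketch leaves the bytes 78 56 34 12 at offsets 0 to 3, and an out of range address is reported back to the caller rather than touching memory, in keeping with the idea that errors are trapped inside the emulator.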

The X64 implementation is being developed to allow the use of both 64 bit and 32 bit registers. Segment registers are initialised to zero. Data, stack and code memory start at addresses 0x00, 0x01000 and 0x02000 respectively.

The Emulator application does not yet include exception handling for system events, such as DEP, address errors, system heap errors, etc.

A Visual Guide

The user interfaces for each emulation mode are illustrated below. Download the attached image files and view full screen to examine the detail.

Installation

The files provided via the DropBox link below comprise a set of screen snapshots (.jpgs) from the application running in each of the emulation modes, a compressed Windows 32 bit executable (.zip), and the Visual Studio solution folder, both zipped and uncompressed. Download the files and unzip the x64interprettest.zip to extract the executable. Install Visual Studio 2012 C++ X86 runtime, available on Microsoft's site, navigate to the folder containing the unzipped executable and double click the x64interprettest.exe file. Depending on your PC environment and user credentials, you may need to right click and run the .exe file with administrator rights, and/or override real time anti-virus warnings. Under Windows 8, if you see a box saying that the application is unauthorised, click more info and then select Run anyway. The MD5 checksum for the executable is:

(using WinMD5Free) Not yet available for this update

The Visual Studio 2012 solution used to generate this executable is available for download from DropBox using the following link:

https://www.dropbox.com/sh/i0ozt6expnepg37/iutkJytnH4

Extract the X86Emulator.zip file after downloading. This will build the folder ...\X86Emulator\interprettest, comprising the required subfolders and files. Use Visual Studio Express 2012 for Desktop to open the x64interprettest.sln file within the ...\interprettest folder. Before you Build you will need to confirm that the configuration and project properties are set correctly. First, ensure that the configuration on the ribbon strip menu at the top of the screen is set to Release and Win32. My tests showed that, provided the configuration is set to Release/Win32, the project properties are preserved in the solution files. However, to confirm this or to resolve any Build 'configuration' errors (see below), right click x64interprettest at the top of the Solution Explorer and select Properties. Using the tabbed pages, check and adjust the following as needed (ensure you click APPLY on each page):

Click Apply. For the last item above, ensure that there is no path so that assm1.obj is written to the default folder.

With the Properties and Configuration set as above, select Build and, when this completes, select Debug\Start Without Debugging. If the latter option is greyed out, use options to select 'expert mode'. Note, if the Build fails because PURE is selected, Safe event handlers is set to Yes, or the assm1.obj file is not found, recheck the above.

Use

The initial screen offers a choice of emulation modes. Select X86. Once loaded, the cursor is positioned on the code entry line. As an example, tab to the mnemonic field and enter add, then tab to the first and second operand fields and enter eax and 45. Click the Translate Code button and then the Execute button, both immediately above the code entry line. Further lines of code can then be entered one by one in the same manner.

Data can be entered directly into registers, the stack and memory using the Data Entry menu. For example, to change eax, click the box displaying the eax value, enter either a decimal number or a hex number starting 0x, and then select File/Registers/EAX. The value you entered should be confirmed as eax. If not, an appropriate error message is given in the Diagnostics box. To enter values into memory, complete the address and value boxes adjacent to the box displaying the relevant memory segment and then select the required option from the Data Entry menu. Similarly for the stack, complete the value field above the Stack display and use the Data Entry menu as before. For further instructions and operand types consult Help/Top Menu.

After entering a block of code, you can execute it as a whole by resetting eip to 0 or 0x600 and clicking the Execute Block button. Changing eip is achieved in a similar manner to changing eax: set the eip box to the required value and click the Reset/Change eip button on the menu line.

IN and OUT instructions can be used as a very simple Windows Console stream to input or output a numeric value. Note, if you get debug assert errors on running, especially when defining Constants, ensure the build configuration is set to Release & Win32 and that all the Project Properties are set as above.

Variables, Labels and Constants

The Emulator can accept data and label definitions. For labels, use the label field in the code line to enter a label name followed by : or :: (to define near and far labels) and then the required assembler code. Click Translate Code and then Execute as before. For data, enter the variable or constant definition on the .Data entry line and click Translate Data and then Execute. For example:

The Code

The Emulator is a C++ Windows Form application comprising a number of linked forms which in turn utilise external functions. These latter functions are a mix of C++, unmanaged C++ employing inline assembler, and an X86 assembler only module. As the original purpose of the application was as a learning aid, it has developed into a 'test harness' for trying out coding ideas. It is not spaghetti code, but nor is it illustrative of best practice in naming or structure. The software contains significant duplication of functions to trial different approaches, and also some redundant code from earlier versions (mostly commented out). Global variables are employed extensively to simplify sharing of data between forms and threads. Despite all this, the X86 emulator has a straightforward architecture.

A Translate function parses the code line entered, checking that the operands are valid for the instruction type, and converts the input line to a token stream which is then stored in a series of structured records in memory. When the user selects the Execute function, the token value representing the instruction is used as a parameter to select the required execution routine. The Execute function both interprets the instruction, generating the results of its execution, and emits the machine code representation. Code, data and stack memory, and registers, are updated. A range of ancillary functions are implemented to speed use, for example allowing direct data entry into registers and memory, and resetting various aspects of the virtual machine. Once several lines of code have been entered, the user can select the Execute Code Block function, which uses the structured records constructed from the previously parsed instructions to execute the complete 'program'. Unlike line at a time execution, code block execution is not sequential, but rather follows branches, loops and jumps as coded.
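
The translate-then-execute split described above can be sketched in a few lines of C++. This is only an illustration of the general technique: the names Op, Instruction, Machine, translate and execute, the three-instruction subset, and the "mnemonic register immediate" line format are all hypothetical, and the sketch does not emit machine code as the real Execute function does.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical token values; the article does not list the real ones.
enum class Op { Add, Sub, Mov, Invalid };
enum Reg { EAX, EBX, ECX, EDX, RegCount };

// One structured record per translated line: "mnemonic register immediate".
struct Instruction {
    Op op = Op::Invalid;
    int dest = 0;
    std::uint32_t imm = 0;
};

struct Machine {
    std::uint32_t regs[RegCount] = {0, 0, 0, 0};
};

// Translate: validate the operands and turn the text into a token record.
bool translate(const std::string& line, Instruction& out) {
    static const std::map<std::string, Op> ops = {
        {"add", Op::Add}, {"sub", Op::Sub}, {"mov", Op::Mov}};
    static const std::map<std::string, int> regs = {
        {"eax", EAX}, {"ebx", EBX}, {"ecx", ECX}, {"edx", EDX}};
    std::istringstream in(line);
    std::string mnemonic, dest;
    std::uint32_t imm = 0;
    if (!(in >> mnemonic >> dest >> imm)) return false;   // missing field
    auto op = ops.find(mnemonic);
    auto reg = regs.find(dest);
    if (op == ops.end() || reg == regs.end()) return false;
    out.op = op->second;
    out.dest = reg->second;
    out.imm = imm;
    return true;
}

// Execute: the token value selects the execution routine.
bool execute(const Instruction& ins, Machine& m) {
    assert(ins.dest >= 0 && ins.dest < RegCount);
    switch (ins.op) {
        case Op::Add: m.regs[ins.dest] += ins.imm; return true;
        case Op::Sub: m.regs[ins.dest] -= ins.imm; return true;
        case Op::Mov: m.regs[ins.dest]  = ins.imm; return true;
        default:      return false;  // unknown token: report, do not crash
    }
}
```

Translating "add eax 45" and executing the resulting record on a fresh Machine leaves 45 in regs[EAX]. Keeping the records around after translation is what would allow a block executor to replay them and follow jumps rather than stepping line by line.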

X64 and ARM Emulation

The X86 64 bit emulation is only a framework at this stage. You can enter code or data definitions in a similar manner to the X86 32 bit emulation, but the interface has been simplified such that everything is entered on one line and a single Assemble button is provided to translate and execute the code entered. If you download and examine the detail of the screen dump you can see the handful of instructions implemented to date. Most of the buttons on the menu strip are also operative, e.g. Reset stack. This emulation is being developed using discrete Classes for token parsing, virtual machine data handling, interpreted execution and machine code emission.

Finally, first steps towards an ARM emulation have been implemented, conceived around coding and decoding the ARM 32 bit instruction set. In the completed version, the user will be able to enter either assembler mnemonics, to be translated and executed, or a series of 32 bit binary values as a machine instruction stream. This framework is not yet operative. Clicking the Translate and Execute buttons merely illustrates the decoding of a few hard coded test values. This last component was added whilst waiting to acquire a Raspberry Pi. As educational use of the latter gathers pace there may well be more interest in simple ARM emulation, beyond the commercial tools utilised by those developing embedded systems and mobile device applications.
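
To give a feel for what decoding a 32 bit ARM value involves, the following C++ sketch splits a word into the standard fields of the classic ARM data processing format (condition in bits 31-28, immediate flag in bit 25, opcode in bits 24-21, set flags in bit 20, Rn in bits 19-16, Rd in bits 15-12, operand 2 in bits 11-0). The struct and function names are hypothetical and this is not the project's decoder. The sketch only checks that bits 27-26 are zero, so it does not separate out multiplies and other encodings that share those bits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record for the fields of an ARM data processing instruction.
struct DataProcessing {
    std::uint32_t cond;       // bits 31-28, e.g. 0x0 = EQ, 0xE = AL
    std::uint32_t immediate;  // bit 25, 1 when operand 2 is an immediate
    std::uint32_t opcode;     // bits 24-21
    std::uint32_t setFlags;   // bit 20, the S suffix
    std::uint32_t rn;         // bits 19-16, first operand register
    std::uint32_t rd;         // bits 15-12, destination register
    std::uint32_t operand2;   // bits 11-0, register/shift or rotated immediate
};

// Split a 32 bit word into fields. Returns false when bits 27-26 are not 00,
// i.e. the word is not in the data processing encoding space.
bool decodeDataProcessing(std::uint32_t word, DataProcessing& out) {
    if (((word >> 26) & 0x3u) != 0) return false;
    out.cond      = (word >> 28) & 0xFu;
    out.immediate = (word >> 25) & 0x1u;
    out.opcode    = (word >> 21) & 0xFu;
    out.setFlags  = (word >> 20) & 0x1u;
    out.rn        = (word >> 16) & 0xFu;
    out.rd        = (word >> 12) & 0xFu;
    out.operand2  = word & 0xFFFu;
    return true;
}

// Mnemonics for the sixteen data processing opcodes, indexed by opcode value.
std::string opcodeName(std::uint32_t opcode) {
    static const char* const names[16] = {
        "AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
        "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"};
    return names[opcode & 0xFu];
}
```

For example, the word 0xE0921003 decodes with this sketch as condition AL, opcode ADD, set flags on, Rn = r2, Rd = r1 and operand 2 = r3 with no shift.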

Conclusion

This ongoing project has very personal goals but I hope it may appeal to others with similar interests.

About the Author

I am a Chartered ICT Engineer. My career started in mainframe data centre operations and progressed through programming, systems analysis and project management to Director of ICT Services and Programmes in a large organisation. My speciality was security, and I was a member of the British Computer Society's ICT security standards group whilst leading work on standards for ICT security evaluations.

I have a strong interest in the topic as a learning vehicle but, despite being retired, there are many calls on my time. I am continuing to develop the project and will be publishing a new version later in the year.

I am planning a follow up article in a few months' time. I have implemented part of the ARM functionality covering the data processing group of instructions and I just need to add load and store to have a useful subset of instructions. As a training exercise, I am also updating the X86 emulator to allow code lines to be entered as free form text. I am attempting to code this using the lexing, parsing and code generation techniques typically found in modern compilers. This is an attempt to put into practice what I am learning from a Coursera/Stanford University free online course on compiler writing.

No. The X86 assembler runs and executes hand entered code as if operating on a 32 bit Windows machine. It does not provide disassembly from binary. The ARM assembler module is not yet fully implemented but it is not designed to provide disassembly either.